linux

Commit Graph

Author	SHA1	Message	Date
Yevgeny Petrilin	7d4b6bcce0	mlx4_core: updated driver version to 1.1 Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:08 -05:00
Jack Morgenstein	ab9c17a009	mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet 1. Added module parameters sr_iov and probe_vf for controlling enablement of SRIOV mode. 2. Increased default max num-qps, num-mpts and log_num_macs to accomodate SRIOV mode 3. Added port_type_array as a module parameter to allow driver startup with ports configured as desired. In SRIOV mode, only ETH is supported, and this array is ignored; otherwise, for the case where the FW supports both port types (ETH and IB), the port_type_array parameter is used. By default, the port_type_array is set to configure both ports as IB. 4. When running in sriov mode, the master needs to initialize the ICM eq table to hold the eq's for itself and also for all the slaves. 5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap. 6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca). VFs obtain their startup information from the PF (master) device via the comm channel. 7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver aborts loading the device. 8. Do not allow setting port type via sysfs when running in SRIOV mode. 9. mlx4_get_ownership: Currently, only one PF is supported by the driver. If the HCA is burned with FW which enables more than one PF, only one of the PFs is allowed to run. The first one up grabs a FW ownership semaphone -- all other PFs will find that semaphore taken, and the driver will not allow them to run. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Liran Liss <liranl@mellanox.co.il> Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:08 -05:00
Jack Morgenstein	d81c7186aa	mlx4_core: adjust catas operation for SRIOV mode When running in SRIOV mode, driver should not automatically start/stop the mlx4_core upon sensing an HCA internal error -- doing this disables/enables sriov, which will cause the hypervisor to hang if there are running VMs with attached VFs. In addition, on VMs the catas process should not run at all, since the HCA error buffer is not available to VMs in the BARs. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:08 -05:00
Marcel Apfelbaum	2b8fb2867c	mlx4_core: mtts resources units changed to offset In the previous implementation mtts are managed by: 1. order - log(mtt segments), 'mtt segment' groups several mtts together. 2. first_seg - segment location relative to mtt table. In the current implementation: 1. order - log(mtts) rather than segments 2. offset - mtt index in mtt table Note: The actual mtt allocation is made in segments but it is transparent to callers. Rational: The mtt resource holders are not interested on how the allocation of mtt is done, but rather on how they will use it. Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:07 -05:00
Eugenia Emantayev	5b4c4d3686	mlx4_en: Allow communication between functions on same host To enable internal loopback, always fill DMAC in control segment when transmitting the packet, once this is done, the packet is subject for loopback for if the DMAC mathces one of the multicast/unicast addresses registered on the physical port. In receive path if source MAC is our own MAC and we are not in selftest, or not in force LB mode - drop this packet. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:07 -05:00
Eugenia Emantayev	ffe455ad04	mlx4: Ethernet port management modifications The physical port is now common to the PF and VFs. The port resources and configuration is managed by the PF, VFs can only influence the MTU of the port, it is set as max among all functions, Each function allocates RX buffers of required size to meet it's MTU enforcement. Port management code was moved to mlx4_core, as the mlx4_en module is virtualization unaware Move handling qp functionality to mlx4_get_eth_qp/mlx4_put_eth_qp including reserve/release range and add/release unicast steering. Let mlx4_register/unregister_mac deal only with MAC (un)registration. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:07 -05:00
Eugenia Emantayev	0ec2c0f86d	mlx4: Traffic steering management support for SRIOV Let multicast/unicast attaching flow go through resource tracker. The PF is the one responsible for managing all the steering entries. Define and use module parameter that determines the number of qps per multicast group. Minor changes in function calls according to changed prototype. Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:07 -05:00
Jack Morgenstein	8e59d254fe	mlx4_ib: disable SRIOV mode for IB ports (not yet supported) Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:07 -05:00
Eli Cohen	c82e9aa0a8	mlx4_core: resource tracking for HCA resources used by guests The resource tracker is used to track usage of HCA resources by the different guests. Virtual functions (VFs) are attached to guest operating systems but resources are allocated from the same pool and are assigned to VFs. It is essential that hostile/buggy guests not be able to affect the operation of other VFs, possibly attached to other guest OSs since ConnectX firmware is not tolerant to misuse of resources. The resource tracker module associates each resource with a VF and maintains state information for the allocated object. It also defines allowed state transitions and enforces them. Relationships between resources are also referred to. For example, CQs are pointed to by QPs, so it is forbidden to destroy a CQ if a QP refers to it. ICM memory is always accessible through the primary function and hence it is allocated by the owner of the primary function. When a guest dies, an FLR is generated for all the VFs it owns and all the resources it used are freed. The tracked resource types are: QPs, CQs, SRQs, MPTs, MTTs, MACs, RES_EQs, and XRCDNs. Signed-off-by: Eli Cohen <eli@mellanox.co.il> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:07 -05:00
Jack Morgenstein	acba2420f9	mlx4_core: Add wrapper functions and comm channel and slave event support to EQs Passing async events to slaves: In SRIOV mode, each slave creates its own async EQ, but only the master can register directly with the FW to receive async events. Async events which should be passed to slaves (such as a WQ_ACCESS_ERROR for a QP owned by a slave) are generated at the slave by the master using the GEN_EQE FW command. Wrapper functions: mlx4_MAP_EQ_wrapper Only the master can map an EQ. The slave commands to map their EQs arrive at the master via the comm channel. The master then invokes the wrapper function to do the work (and enter the resource in the tracking database). New events: COMM_CHANNEL and FLR The COMM_CHANNEL event arrives only at the master, and signals that a slave has posted a command on the comm channel. The FLR event is generated by the FW when a guest operating a VF unexpectedly goes down. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:06 -05:00
Jack Morgenstein	ea51b377ab	mlx4_core: mtt modifications for SRIOV MTTs are resources which are allocated and tracked by the PF driver. In multifunction mode, the allocation and icm mapping is done in the resource tracker (later patch in this sequence). To accomplish this, we have "work" functions whose names start with "__", and "request" functions (same name, no __). If we are operating in multifunction mode, the request function actually results in comm-channel commands being sent (ALLOC_RES or FREE_RES). The PF-driver comm-channel handler will ultimately invoke the "work" (__) function and return the result. If we are not in multifunction mode, the "work" handler is invoked immediately. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:06 -05:00
Jack Morgenstein	d7233386b2	mlx4_core: cq modifications for SRIOV CQs are resources which are allocated and tracked by the PF driver. In multifunction mode, the allocation and icm mapping is done in the resource tracker (later patch in this sequence). To accomplish this, we have "work" functions whose names start with "__", and "request" functions (same name, no __). If we are operating in multifunction mode, the request function actually results in comm-channel commands being sent (ALLOC_RES or FREE_RES). The PF-driver comm-channel handler will ultimately invoke the "work" (__) function and return the result. If we are not in multifunction mode, the "work" handler is invoked immediately. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:06 -05:00
Jack Morgenstein	fe9a2603c5	mlx4_core: qp modifications for SRIOV QPs are resources which are allocated and tracked by the PF driver. In multifunction mode, the allocation and icm mapping is done in the resource tracker (later patch in this sequence). To accomplish this, we have "work" functions whose names start with "__", and "request" functions (same name, no __). If we are operating in multifunction mode, the request function actually results in comm-channel commands being sent (ALLOC_RES or FREE_RES). The PF-driver comm-channel handler will ultimately invoke the "work" (__) function and return the result. If we are not in multifunction mode, the "work" handler is invoked immediately. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:06 -05:00
Jack Morgenstein	3ec65b2be5	mlx4_core: srq modifications for SRIOV SRQs are resources which are allocated and tracked by the PF driver. In multifunction mode, the allocation and icm mapping is done in the resource tracker (later patch in this sequence). To accomplish this, we have "work" functions whose names start with "__", and "request" functions (same name, no __). If we are operating in multifunction mode, the request function actually results in comm-channel commands being sent (ALLOC_RES or FREE_RES). The PF-driver comm-channel handler will ultimately invoke the "work" (__) function and return the result. If we are not in multifunction mode, the "work" handler is invoked immediately. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:06 -05:00
Marcel Apfelbaum	5cc914f108	mlx4_core: Added FW commands and their wrappers for supporting SRIOV The following commands are added here: 1. QUERY_FUNC_CAP and its wrapper. This function is used by VFs when they start up to receive configuration information from the PF, such as resource quotas for this VF, which ports should be used (currently two), what protocol is running on the port (currently Ethernet ONLY, or port not active). 2. QUERY_PORT and its wrapper. Previously, this FW command was invoked directly by the ETH driver (en_port.c) using mlx4_cmd_box. Virtualization is now required here (the VF's MAC address must be substituted for the PFs MAC address returned by the FW). We changed the invocation in the ETH driver to use mlx4_QUERY_PORT, and added the wrapper. 3. QUERY_HCA. Used by the VF to determine how the HCA was initialized. For now, we need only the multicast table member entry size (log2_mc_table_entry_sz, in the ConnectX PRM). No wrapper is needed here, because the data may be passed as is to the VF without modification). In this command, we have added a GLOBAL_CAPS field for passing required configuration information from FW to a VF (this field is to allow safely adding new SRIOV capabilities which require support in VF drivers, too). Bits will set here by FW in response to PF-driver configuration commands which will activate as yet undefined new SRIOV features. The VF will test to see that all required capabilities indicated by this field are supported (i.e., if a bit is set and the VF driver does not recognize that bit, it must abort its initialization). Currently, no bits are set. 4. Added a CLOSE_PORT wrapper. The PF context needs to keep track of how many VF contexts have the port open. The PF context will not actually issue the FW close port command until the last port user issues a CLOSE_PORT request. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:06 -05:00
Yevgeny Petrilin	e8f081aacd	net/mlx4_core: Implement the master-slave communication channel When SRIOV is enabled, pf and vfs communicate via shared comm channel. The vf gets its side of the comm channel via a VF BAR. Each VF (slave) creates its vHCR (virtual HCA Command Register), Its DMA address is passed to the PF (master) using Communication Channel Register. The same Register is used to notify the master of commands posted by the slaves and for the master to pass events to the slaves, such as command completions and asynchronous events. The vHCR format is identical to the HCR format, except for the 'go' and 't' bits, which are reserved in the vHCR. Posting commands to the vHCR is identical to the way it is done with the HCR, albeit that the function/PF token fields are used instead of the HCR go bit. Specifically: - When the function prepares a new command in the vHCR, it issues the Post_vHCR_cmd communication channel command and toggles the value of the function token; when PF token has an equal value, the command has been accepted and a new command may be posted. - When the PF detects a Post_vHCR_cmd command, it concludes that a new command is available in the vHCR; after processing the command, the PF toggles the PF token to match the function token. When the 'e' bit is not set, the completion of a Post_vHCR_cmd command also indicates the completion the vHCR command. If, however, the 'e' bit is set, the completion of a Post_vHCR_cmd command only indicates that the vHCR command has been accepted for execution by the PF. Function commands are processed by the PF as follows: -DMA (using the ACCESS_MEM command) the vHCR image into a shadow buffer. -Validate that the opcode is non-privileged, and that the opcode- and input-modifiers are legal. -DMA the in-box (if required) into a shadow buffer. -Validate the command: o Resource ranges (e.g., QP ranges). o Partition key. o Ranges of referenced resources (e.g., CQs within QP contexts). -If the 'e' bit is set o complete the Post_vHCR_cmd command -Execute the command on the HCR. -DMA the results to the vHCR out-box (if required). -If the 'e' bit is set o Indicate command completion by generating a completion event using the GEN_EQE command -Otherwise o DMA the command status to the vHCR o Complete the Post_vHCR_cmd command Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrillin <yevgenyp@mellanox.com> Signed-off-by: Liran Liss <liranl@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:05 -05:00
Jack Morgenstein	f5311ac109	mlx4_core: Reduce number of PD bits to 17 When SRIOV is enabled on the chip (at FW burning time), the HCA uses only 17 bits for the PD. The remaining 7 high-order bits are ignored. Change the allocator to return only 17 bits for the PD. The MSB 7 bits will be used to encode the slave number for consistency checking later on in the resource tracker. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:05 -05:00
Jack Morgenstein	f9baff509f	mlx4_core: Add "native" argument to mlx4_cmd and its callers (where needed) For SRIOV, some Hypervisor commands can be executed directly (native = 1). Others should go through the command wrapper flow (for tracking resource usage, for example, or for changing some HCA configurations that slaves need to be notified of). This patch sets the groundwork for this capability -- adding the correct value of "native" in each case. Note that if SRIOV is not activated, this parameter has no effect. Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:05 -05:00
Jack Morgenstein	65dab25deb	mlx4: Extanding port_mask functionality Port mask now has additional state. Port can be set as "none". In this case neither the mlx4_en or mlx4_ib drivers take ownership of the port. In multifunction mode there is an option to set the vfs as single ported devices. (in single function mode, both physical ports belong to same function) Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:05 -05:00
Jack Morgenstein	623ed84b1f	mlx4_core: initial header-file changes for SRIOV support These changes will not affect module operation as yet. They are only to get some structs and enums in place for use by subsequent patches (making those smaller). Added here: * sriov state structs and inlines (mlx4_is_master/slave/mfunc) * comm-channel and vhcr support structures * enum values for new FW and comm-channel virtual commands (i.e., commands, passed via the comm channel to the PF-driver). * prototypes for many command wrapper functions (used by the PF context for processing FW commands passed to it by the VFs). * struct mlx4_eqe is moved from eq.c to mlx4.h (it will be used by other mlx4_core source files). Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:56:05 -05:00
Eric Dumazet	9f048bfba1	net: fix build error if CONFIG_CGROUPS=n Reported-by: Christoph Paasch <christoph.paasch@uclouvain.be> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:45:17 -05:00
Sathya Perla	11ac75ed1e	be2net: refactor/cleanup vf configuration code - use adapter->num_vfs (and not the module param) to store the actual number of vfs created. Use the same variable to reflect SRIOV enable/disable state. So, drop the adapter->sriov_enabled field. - use for_all_vfs() macro in VF configuration code - drop the "vf_" prefix for the fields of be_vf_cfg; the prefix is redundant and removing it helps reduce line wrap Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:34:26 -05:00
Sathya Perla	110b82bc62	be2net: fix ethtool ringparam reporting The ethtool "-g" option is supposed to report the max queue length and user modified queue length for RX and TX queues. be2net doesn't support user modification of queue lengths. So, the correct values for these would be the max numbers. be2net incorrectly reports the queue used values for these fields. Signed-off-by: Sathya Perla <sathya.perla@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:34:26 -05:00
Dmitry Kravkov	036d2df9b3	bnx2x: properly update skb when mtu > 1500 Since commit `e52fcb2462` newly allocated skb for small packets are not updated properly and dropped by stack. Signed-off-by: Dmitry Kravkov <dmitry@broadcom.com> Signed-off-by: Eilon Greenstein <eilong@broadcom.com> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-13 13:30:45 -05:00
Hagen Paul Pfeifer	90b41a1cd4	netem: add cell concept to simulate special MAC behavior This extension can be used to simulate special link layer characteristics. Simulate because packet data is not modified, only the calculation base is changed to delay a packet based on the original packet size and artificial cell information. packet_overhead can be used to simulate a link layer header compression scheme (e.g. set packet_overhead to -20) or with a positive packet_overhead value an additional MAC header can be simulated. It is also possible to "replace" the 14 byte Ethernet header with something else. cell_size and cell_overhead can be used to simulate link layer schemes, based on cells, like some TDMA schemes. Another application area are MAC schemes using a link layer fragmentation with a (small) header each. Cell size is the maximum amount of data bytes within one cell. Cell overhead is an additional variable to change the per-cell-overhead (e.g. 5 byte header per fragment). Example (5 kbit/s, 20 byte per packet overhead, cell-size 100 byte, per cell overhead 5 byte): tc qdisc add dev eth0 root netem rate 5kbit 20 100 5 Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:44:48 -05:00
David S. Miller	c7c6575f25	Merge branch 'batman-adv/next' of git://git.open-mesh.org/linux-merge	2011-12-12 19:26:07 -05:00
Eric Dumazet	3f1e6d3fd3	sch_gred: should not use GFP_KERNEL while holding a spinlock gred_change_vq() is called under sch_tree_lock(sch). This means a spinlock is held, and we are not allowed to sleep in this context. We might pre-allocate memory using GFP_KERNEL before taking spinlock, but this is not suitable for stable material. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:08:54 -05:00
Glauber Costa	0850f0f5c5	Display maximum tcp memory allocation in kmem cgroup This patch introduces kmem.tcp.max_usage_in_bytes file, living in the kmem_cgroup filesystem. The root cgroup will display a value equal to RESOURCE_MAX. This is to avoid introducing any locking schemes in the network paths when cgroups are not being actively used. All others, will see the maximum memory ever used by this cgroup. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:04:11 -05:00
Glauber Costa	ffea59e504	Display current tcp failcnt in kmem cgroup This patch introduces kmem.tcp.failcnt file, living in the kmem_cgroup filesystem. Following the pattern in the other memcg resources, this files keeps a counter of how many times allocation failed due to limits being hit in this cgroup. The root cgroup will always show a failcnt of 0. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:04:11 -05:00
Glauber Costa	5a6dd34377	Display current tcp memory allocation in kmem cgroup This patch introduces kmem.tcp.usage_in_bytes file, living in the kmem_cgroup filesystem. It is a simple read-only file that displays the amount of kernel memory currently consumed by the cgroup. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:04:11 -05:00
Glauber Costa	3aaabe2342	tcp buffer limitation: per-cgroup limit This patch uses the "tcp.limit_in_bytes" field of the kmem_cgroup to effectively control the amount of kernel memory pinned by a cgroup. This value is ignored in the root cgroup, and in all others, caps the value specified by the admin in the net namespaces' view of tcp_sysctl_mem. If namespaces are being used, the admin is allowed to set a value bigger than cgroup's maximum, the same way it is allowed to set pretty much unlimited values in a real box. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:04:11 -05:00
Glauber Costa	3dc43e3e4d	per-netns ipv4 sysctl_tcp_mem This patch allows each namespace to independently set up its levels for tcp memory pressure thresholds. This patch alone does not buy much: we need to make this values per group of process somehow. This is achieved in the patches that follows in this patchset. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:04:11 -05:00
Glauber Costa	d1a4c0b37c	tcp memory pressure controls This patch introduces memory pressure controls for the tcp protocol. It uses the generic socket memory pressure code introduced in earlier patches, and fills in the necessary data in cg_proto struct. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtisu.com> CC: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:04:10 -05:00
Glauber Costa	e1aab161e0	socket: initial cgroup code. The goal of this work is to move the memory pressure tcp controls to a cgroup, instead of just relying on global conditions. To avoid excessive overhead in the network fast paths, the code that accounts allocated memory to a cgroup is hidden inside a static_branch(). This branch is patched out until the first non-root cgroup is created. So when nobody is using cgroups, even if it is mounted, no significant performance penalty should be seen. This patch handles the generic part of the code, and has nothing tcp-specific. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujtsu.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:04:10 -05:00
Glauber Costa	180d8cd942	foundations of per-cgroup memory pressure controlling. This patch replaces all uses of struct sock fields' memory_pressure, memory_allocated, sockets_allocated, and sysctl_mem to acessor macros. Those macros can either receive a socket argument, or a mem_cgroup argument, depending on the context they live in. Since we're only doing a macro wrapping here, no performance impact at all is expected in the case where we don't have cgroups disabled. Signed-off-by: Glauber Costa <glommer@parallels.com> Reviewed-by: Hiroyouki Kamezawa <kamezawa.hiroyu@jp.fujitsu.com> CC: David S. Miller <davem@davemloft.net> CC: Eric W. Biederman <ebiederm@xmission.com> CC: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:04:10 -05:00
Glauber Costa	e5671dfae5	Basic kernel memory functionality for the Memory Controller This patch lays down the foundation for the kernel memory component of the Memory Controller. As of today, I am only laying down the following files: * memory.independent_kmem_limit * memory.kmem.limit_in_bytes (currently ignored) * memory.kmem.usage_in_bytes (always zero) Signed-off-by: Glauber Costa <glommer@parallels.com> CC: Kirill A. Shutemov <kirill@shutemov.name> CC: Paul Menage <paul@paulmenage.org> CC: Greg Thelen <gthelen@google.com> CC: Johannes Weiner <jweiner@redhat.com> CC: Michal Hocko <mhocko@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:03:55 -05:00
Laszlo Ersek	08e34eb14f	xen-netfront: delay gARP until backend switches to Connected After a guest is live migrated, the xen-netfront driver emits a gratuitous ARP message, so that networking hardware on the target host's subnet can take notice, and public routing to the guest is re-established. However, if the packet appears on the backend interface before the backend is added to the target host's bridge, the packet is lost, and the migrated guest's peers become unable to talk to the guest. A sufficient two-parts condition to prevent the above is: (1) ensure that the backend only moves to Connected xenbus state after its hotplug scripts completed, ie. the netback interface got added to the bridge; and (2) ensure the frontend only queues the gARP when it sees the backend move to Connected. These two together provide complete ordering. Sub-condition (1) is already satisfied by commit `f942dc2552` in Linus' tree, based on commit 6b0b80ca7165 from [1]. In general, the full condition is sufficient, not necessary, because, according to [2], live migration has been working for a long time without satisfying sub-condition (2). However, after 6b0b80ca7165 was backported to the RHEL-5 host to ensure (1), (2) still proved necessary in the RHEL-6 guest. This patch intends to provide (2) for upstream. The Reviewed-by line comes from [3]. [1] git://xenbits.xen.org/people/ianc/linux-2.6.git#upstream/dom0/backend/netback-history [2] http://old-list-archives.xen.org/xen-devel/2011-06/msg01969.html [3] http://old-list-archives.xen.org/xen-devel/2011-07/msg00484.html Signed-off-by: Laszlo Ersek <lersek@redhat.com> Reviewed-by: Ian Campbell <ian.campbell@citrix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 19:02:41 -05:00
Ted Feng	72b36015ba	ipip, sit: copy parms.name after register_netdevice Same fix as `731abb9cb2` for ipip and sit tunnel. Commit `1c5cae815d` removed an explicit call to dev_alloc_name in ipip_tunnel_locate and ipip6_tunnel_locate, because register_netdevice will now create a valid name, however the tunnel keeps a copy of the name in the private parms structure. Fix this by copying the name back after register_netdevice has successfully returned. This shows up if you do a simple tunnel add, followed by a tunnel show: $ sudo ip tunnel add mode ipip remote 10.2.20.211 $ ip tunnel tunl0: ip/ip remote any local any ttl inherit nopmtudisc tunl%d: ip/ip remote 10.2.20.211 local any ttl inherit $ sudo ip tunnel add mode sit remote 10.2.20.212 $ ip tunnel sit0: ipv6/ip remote any local any ttl 64 nopmtudisc 6rd-prefix 2002::/16 sit%d: ioctl 89f8 failed: No such device sit%d: ipv6/ip remote 10.2.20.212 local any ttl inherit Cc: stable@vger.kernel.org Signed-off-by: Ted Feng <artisdom@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 18:50:51 -05:00
Li Wei	4af04aba93	ipv6: Fix for adding multicast route for loopback device automatically. There is no obvious reason to add a default multicast route for loopback devices, otherwise there would be a route entry whose dst.error set to -ENETUNREACH that would blocking all multicast packets. ==================== [ more detailed explanation ] The problem is that the resulting routing table depends on the sequence of interface's initialization and in some situation, that would block all muticast packets. Suppose there are two interfaces on my computer (lo and eth0), if we initailize 'lo' before 'eth0', the resuting routing table(for multicast) would be # ip -6 route show \| grep ff00:: unreachable ff00::/8 dev lo metric 256 error -101 ff00::/8 dev eth0 metric 256 When sending multicasting packets, routing subsystem will return the first route entry which with a error set to -101(ENETUNREACH). I know the kernel will set the default ipv6 address for 'lo' when it is up and won't set the default multicast route for it, but there is no reason to stop 'init' program from setting address for 'lo', and that is exactly what systemd did. I am sure there is something wrong with kernel or systemd, currently I preferred kernel caused this problem. ==================== Signed-off-by: Li Wei <lw@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-12 18:48:18 -05:00
Rajkumar Manoharan	10636bc2d6	ath9k: fix max phy rate at rate control init The stations always chooses 1Mbps for all trasmitting frames, whenever the AP is configured to lock the supported rates. As the max phy rate is always set with the 4th from highest phy rate, this assumption might be wrong if we have less than that. Fix that. Cc: stable@kernel.org Cc: Paul Stewart <pstew@google.com> Reported-by: Ajay Gummalla <agummalla@google.com> Signed-off-by: Rajkumar Manoharan <rmanohar@qca.qualcomm.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-12-12 14:23:28 -05:00
Dan Carpenter	f8c141c3e9	nfc: signedness bug in __nci_request() wait_for_completion_interruptible_timeout() returns -ERESTARTSYS if interrupted so completion_rc needs to be signed. The current code probably returns -ETIMEDOUT if we hit this situation, but after this patch is applied it will return -ERESTARTSYS. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-12-12 14:23:27 -05:00
Wey-Yi Guy	123877b80e	iwlwifi: do not set the sequence control bit is not needed Check the IEEE80211_TX_CTL_ASSIGN_SEQ flag from mac80211, then decide how to set the TX_CMD_FLG_SEQ_CTL_MSK bit. Setting the wrong bit in BAR frame whill make the firmware to increment the sequence number which is incorrect and cause unknown behavior. CC: stable@vger.kernel.org #3.0+ Signed-off-by: Wey-Yi Guy <wey-yi.w.guy@intel.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-12-12 14:23:27 -05:00
John W. Linville	f2abba4921	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem	2011-12-12 14:19:43 -05:00
Sven Eckelmann	b5a1eeef04	batman-adv: Only write requested number of byte to user buffer Don't write more than the requested number of bytes of an batman-adv icmp packet to the userspace buffer. Otherwise unrelated userspace memory might get overridden by the kernel. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>	2011-12-12 19:11:07 +08:00
Sven Eckelmann	d18eb45332	batman-adv: Directly check read of icmp packet in copy_from_user The access_ok read check can be directly done in copy_from_user since a failure of access_ok is handled the same way as an error in __copy_from_user. Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>	2011-12-12 19:11:06 +08:00
Paul Kot	c00b6856fc	batman-adv: bat_socket_read missing checks Writing a icmp_packet_rr and then reading icmp_packet can lead to kernel memory corruption, if __user *buf is just below TASK_SIZE. Signed-off-by: Paul Kot <pawlkt@gmail.com> [sven@narfation.org: made it checkpatch clean] Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>	2011-12-12 19:11:06 +08:00
Eric Dumazet	dfd56b8b38	net: use IS_ENABLED(CONFIG_IPV6) Instead of testing defined(CONFIG_IPV6) \|\| defined(CONFIG_IPV6_MODULE) Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-11 18:25:16 -05:00
Ajit Khaparde	1ded132d4c	be2net: workaround to fix a bug in BE disable Tx vlan offloading in certain cases. Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-11 18:25:15 -05:00
Ajit Khaparde	02fe702796	be2net: update some counters to display via ethtool update pmem_fifo_overflow_drop, rx_priority_pause_frames counters. Signed-off-by: Ajit Khaparde <ajit.khaparde@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-11 18:25:15 -05:00
Pavel Emelyanov	86e62ad6b2	udp_diag: Fix the !ipv6 case Wrap the udp6 lookup into the proper ifdef-s. Signed-off-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-12-10 13:14:59 -05:00

... 3 4 5 6 7 ...

278005 Commits (42bd48e0145567acf7b3d2ae48bea765315bdd89) All Branches Search

278005 Commits (42bd48e0145567acf7b3d2ae48bea765315bdd89)

All Branches