linux

q3k/linux

Author	SHA1	Message	Date
YOSHIFUJI Hideaki / 吉藤英明	d3aedd5ebd	ipv6 flowlabel: Convert hash list to RCU. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 22:41:13 -05:00
YOSHIFUJI Hideaki / 吉藤英明	f256dc59d0	ipv6 flowlabel: Ensure to take lock when modifying np->ip6_sk_fl_list. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 22:41:12 -05:00
Eric Dumazet	3b58908a92	x86: bpf_jit_comp: add pkt_type support Supporting access to skb->pkt_type is a bit tricky if we want to have a generic code, allowing pkt_type to be moved in struct sk_buff pkt_type is a bit field, so compiler cannot really help us to find its offset. Let's use a helper for this : It will throw a one time message if pkt_type no longer starts at a byte boundary or is no longer a 3bit field. Reported-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 22:38:34 -05:00
Jitendra Kalsaria	45acd3a0a1	qlcnic: Bump up the version to 5.1.33 Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:34:50 -05:00
Stephen Hemminger	fec9dd15d5	qlcnic: make pci_error_handlers const Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:34:50 -05:00
Manish chopra	22fd5ab462	qlcnic: Fix RX/TX checksum setting for some adapter types Signed-off-by: Manish chopra <manish.chopra@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:34:50 -05:00
Shahed Shaikh	4d53f40f54	qlcnic: Fix minidump in NPAR mode Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:34:50 -05:00
Manish chopra	283c1c6870	qlcnic: driver LRO bug fix o ipv4 address was not getting programmed properly because of improper byte order conversion Signed-off-by: Manish chopra <manish.chopra@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:34:50 -05:00
Manish chopra	cdc84dda1e	qlcnic: Free irq for mailbox interrupts Signed-off-by: Manish chopra <manish.chopra@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:34:49 -05:00
Manish chopra	1403f43a8f	qlcnic: Fix bug in reading HW reset template Signed-off-by: Manish chopra <manish.chopra@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:34:49 -05:00
Shahed Shaikh	069048f18b	qlcnic: Fix sparse check endian warnings Signed-off-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:34:49 -05:00
Daniele Palmas	3d6d7ab588	NET: qmi_wwan: add Telit LE920 support Add VID, PID and fixed interface for Telit LE920 Signed-off-by: Daniele Palmas <dnlplm@gmail.com> Acked-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:28:00 -05:00
Marcelo Ricardo Leitner	bd30e94720	ipv6: do not create neighbor entries for local delivery They will be created at output, if ever needed. This avoids creating empty neighbor entries when TPROXYing/Forwarding packets for addresses that are not even directly reachable. Note that IPv4 already handles it this way. No neighbor entries are created for local input. Tested by myself and customer. Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 20:26:07 -05:00
Bjørn Mork	70c37bf97f	net: usbnet: prevent buggy devices from killing us A device sending 0 length frames as fast as it can has been observed killing the host system due to the resulting memory pressure. Temporarily disable RX skb allocation and URB submission when the current error ratio is high, preventing us from trying to allocate an infinite number of skbs. Reenable as soon as we are finished processing the done queue, allowing the device to continue working after short error bursts. Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Oliver Neukum <oneukum@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 17:35:44 -05:00
Jiri Pirko	409cc1f8a4	bond: have random dev address by default instead of zeroes Makes more sense to have randomly generated address by default than to have all zeroes. It also allows user to for example put the bond into bridge without need to have any slaves in it. Also note that this changes only behaviour of bonds with no slaves. Once the first slave device is enslaved, its address will be used (no change here). Also, fix dev_assign_type values on the way. Reported-by: Pavel Šimerda <psimerda@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-30 15:34:00 -05:00
Bing Zhao	8a7d7cbf7b	mwifiex: fix incomplete scan in case of IE parsing error A scan request is split into multiple scan commands queued in scan_pending_q. Each scan command will be sent to firmware and its response is handlded one after another. If any error is detected while parsing IE in command response buffer the remaining data will be ignored and error is returned. We should check if there is any more scan commands pending in the queue before returning error. This ensures that we will call cfg80211_scan_done if this is the last scan command, or send next scan command in scan_pending_q to firmware. Cc: "3.6+" <stable@vger.kernel.org> Signed-off-by: Bing Zhao <bzhao@marvell.com> Signed-off-by: Amitkumar Karwar <akarwar@marvell.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2013-01-30 14:13:09 -05:00
Michał Mirosław	d2ed273d30	net: disallow drivers with buggy VLAN accel to register_netdevice() Instead of jumping aroung bugs that are easily fixed just don't let them in: affected drivers should be either fixed or have NETIF_F_HW_VLAN_FILTER removed from advertised features. Quick grep in drivers/net shows two drivers that have NETIF_F_HW_VLAN_FILTER but not ndo_vlan_rx_add/kill_vid(), but those are false-positives (features are commented out). OTOH two drivers have ndo_vlan_rx_add/kill_vid() implemented but don't advertise NETIF_F_HW_VLAN_FILTER. Those are: +ethernet/cisco/enic/enic_main.c +ethernet/qlogic/qlcnic/qlcnic_main.c Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明	29e3b1608c	netfilter ipset: Use ipv6_addr_equal() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明	d9e85655b5	netfilter ip6table_mangle: Use ipv6_addr_equal() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明	70e94e66ae	xfrm: Convert xfrm_addr_cmp() to boolean xfrm_addr_equal(). All users of xfrm_addr_cmp() use its result as boolean. Introduce xfrm_addr_equal() (which is equal to !xfrm_addr_cmp()) and convert all users. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明	ff88b30c71	xfrm: Use ipv6_addr_equal() where appropriate. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 22:58:40 -05:00
YOSHIFUJI Hideaki / 吉藤英明	07c2fecc36	ipv6 mcast: Use ipv6_addr_equal() in ip6_mc_source(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 22:58:40 -05:00
Neil Horman	6cdd20c380	vmxnet3: set carrier state properly on probe vmxnet3 fails to set netif_carrier_off on probe, meaning that when an interface is opened the __LINK_STATE_NOCARRIER bit is already cleared, and so /sys/class/net/<ifname>/operstate remains in the unknown state. Correct this by setting netif_carrier_off on probe, like other drivers do. Also, while we're at it, lets remove the netif_carrier_ok checks from the link_state_update function, as that check is atomically contained within the netif_carrier_[on\|off] functions anyway Tested successfully by myself Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: "David S. Miller" <davem@davemloft.net> CC: "VMware, Inc." <pv-drivers@vmware.com> CC: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 16:29:22 -05:00
Bruce Allan	286003048a	e1000e: enable ECC on I217/I218 to catch packet buffer memory errors In rare instances, memory errors have been detected in the internal packet buffer memory on I217/I218 when stressed under certain environmental conditions. Enable Error Correcting Code (ECC) in hardware to catch both correctable and uncorrectable errors. Correctable errors will be handled by the hardware. Uncorrectable errors in the packet buffer will cause the packet to be received with an error indication in the buffer descriptor causing the packet to be discarded. If the uncorrectable error is in the descriptor itself, the hardware will stop and interrupt the driver indicating the error. The driver will then reset the hardware in order to clear the error and restart. Both types of errors will be accounted for in statistics counters. Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Cc: <stable@vger.kernel.org> # 3.5.x & 3.6.x Tested-by: Jeff Pieper <jeffrey.e.pieper@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 16:01:33 -05:00
David S. Miller	b53c47dd4f	Included changes: - fix recently introduced output behaviour -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.19 (GNU/Linux) iQIcBAABAgAGBQJRBtWEAAoJEADl0hg6qKeO5P0P/1SvRNUjavJkZp5G7Yb/HhEQ 66rwst3tC6Bo0d7jpi0IX/EQd9KpQNPqmq/DzDiwPtNEYEA2obI8fKeUMvT6wXR+ +tdviD0biz7tzKoy8kFlRv2eQEs7ibvjsmdOaqoiFzfBfdDJjxmME6iFuyyNbbU1 3Yvb81lUpnT2+kw3vG5+yjvwlzrBal9Jwigeah3ok0TkzdvRmrU92TSt6KGhRvyx r6E2q8PA7pQvSPgRWjGW+8hye/6XzTsqe+zCT7yOe/LT+epMjn5rrgEzRdUvjzFS RGjAIMW82Cqn00h6PtacjOVecL3RjyfAmDamvPym8/6x2FxbfR00zkSPX/AX6F8T pxq4QhmMWP4YDYG8oHL1N2PUMqMODmhuDivrBfqTSpQO7h7KMZ0T4LHBXRoSuWSy W46/QuO/xazQzgoAZlO/CDM2ZNMAee6ol2wvN30uX9JmLb+12OTjD/7Isn4LP5oO rdJXhELwxd5oowo+NDXVIq0YmXjMNO9iukMhURBMNKXr/U+i0N9O124ba9p8ahFM voI19Iflhtl2nzMP4WsycNaaLVKVAalhV8hAu4y75aOA7Z6PwqcIHv4aSjOWwJmb 2jhH+PKyvddyJaduWLCYcAqvDZ8Gj2fVl9yf9uFcWJKgO3BbrNN69eOdOtLoCHkO ixJSGwm8lvUnXc0jOThJ =os3V -----END PGP SIGNATURE----- Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge Included changes: - fix recently introduced output behaviour Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:59:45 -05:00
Milos Vyletel	eb492f7443	bonding: unset primary slave via sysfs When bonding module is loaded with primary parameter and one decides to unset primary slave using sysfs these settings are not preserved during bond device restart. Primary slave is only unset once and it's not remembered in bond->params structure. Below is example of recreation. grep OPTS /etc/sysconfig/network-scripts/ifcfg-bond0 BONDING_OPTS="mode=active-backup miimon=100 primary=eth01" grep "Primary Slave" /proc/net/bonding/bond0 Primary Slave: eth01 (primary_reselect always) echo "" > /sys/class/net/bond0/bonding/primary grep "Primary Slave" /proc/net/bonding/bond0 Primary Slave: None sed -i -e 's/primary=eth01//' /etc/sysconfig/network-scripts/ifcfg-bond0 grep OPTS /etc/sysconfig/network-scripts/ifcfg-bond BONDING_OPTS="mode=active-backup miimon=100 " ifdown bond0 && ifup bond0 without patch: grep "Primary Slave" /proc/net/bonding/bond0 Primary Slave: eth01 (primary_reselect always) with patch: grep "Primary Slave" /proc/net/bonding/bond0 Primary Slave: None Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Milos Vyletel <milos.vyletel@sde.cz> Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:35 -05:00
Nivedita Singhvi	2aeef18d37	tcp: Increment LISTENOVERFLOW and LISTENDROPS in tcp_v4_conn_request() We drop a connection request if the accept backlog is full and there are sufficient packets in the syn queue to warrant starting drops. Increment the appropriate counters so this isn't silent, for accurate stats and help in debugging. This patch assumes LINUX_MIB_LISTENDROPS is a superset of/includes the counter LINUX_MIB_LISTENOVERFLOWS. Signed-off-by: Nivedita Singhvi <niv@us.ibm.com> Acked-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:04 -05:00
YOSHIFUJI Hideaki / 吉藤英明	5e98a36ed4	ipv6 addrconf: Fix interface identifiers of 802.15.4 devices. The "Universal/Local" (U/L) bit must be complmented according to RFC4944 and RFC2464. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:04 -05:00
Sarveshwar Bandi	00d3d51e9d	be2net: Updating Module Author string and log message string to "Emulex Corporation" Signed-off-by: Sarveshwar Bandi <sarveshwar.bandi@emulex.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:04 -05:00
Jason Wang	9e85722d58	tuntap: allow polling/writing/reading when detached We forbid polling, writing and reading when the file were detached, this may complex the user in several cases: - when guest pass some buffers to vhost/qemu and then disable some queues, host/qemu needs to do its own cleanup on those buffers which is complex sometimes. We can do this simply by allowing a user can still write to an disabled queue. Write to an disabled queue will cause the packet pass to the kernel and read will get nothing. - align the polling behavior with macvtap which never fails when the queue is created. This can simplify the polling errors handling of its user (e.g vhost) We can simply achieve this by don't assign NULL to tfile->tun when detached. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:04 -05:00
Jason Wang	2b8b328b61	vhost_net: handle polling errors when setting backend Currently, the polling errors were ignored, which can lead following issues: - vhost remove itself unconditionally from waitqueue when stopping the poll, this may crash the kernel since the previous attempt of starting may fail to add itself to the waitqueue - userspace may think the backend were successfully set even when the polling failed. Solve this by: - check poll->wqh before trying to remove from waitqueue - report polling errors in vhost_poll_start(), tx_poll_start(), the return value will be checked and returned when userspace want to set the backend After this fix, there still could be a polling failure after backend is set, it will addressed by the next patch. Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:03 -05:00
Jason Wang	692a998b90	vhost_net: correct error handling in vhost_net_set_backend() Currently, when vhost_init_used() fails the sock refcnt and ubufs were leaked. Correct this by calling vhost_init_used() before assign ubufs and restore the oldsock when it fails. Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:03 -05:00
Michael S. Tsirkin	af668b3c27	tun: fix carrier on/off status Commit `c8d68e6be1` removed carrier off call from tun_detach since it's now called on queue disable and not only on tun close. This confuses userspace which used this flag to detect a free tun. To fix, put this back but under if (clean). Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Tested-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:03 -05:00
Cong Wang	604dfd6efc	pktgen: correctly handle failures when adding a device The return value of pktgen_add_device() is not checked, so even if we fail to add some device, for example, non-exist one, we still see "OK:...". This patch fixes it. After this patch, I got: # echo "add_device non-exist" > /proc/net/pktgen/kpktgend_0 -bash: echo: write error: No such device # cat /proc/net/pktgen/kpktgend_0 Running: Stopped: Result: ERROR: can not add device non-exist # echo "add_device eth0" > /proc/net/pktgen/kpktgend_0 # cat /proc/net/pktgen/kpktgend_0 Running: Stopped: eth0 Result: OK: add_device=eth0 (Candidate for -stable) Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:03 -05:00
Johannes Naab	a13d310471	netem: fix delay calculation in rate extension The delay calculation with the rate extension introduces in v3.3 does not properly work, if other packets are still queued for transmission. For the delay calculation to work, both delay types (latency and delay introduces by rate limitation) have to be handled differently. The latency delay for a packet can overlap with the delay of other packets. The delay introduced by the rate however is separate, and can only start, once all other rate-introduced delays finished. Latency delay is from same distribution for each packet, rate delay depends on the packet size. .: latency delay -: rate delay x: additional delay we have to wait since another packet is currently transmitted .....---- Packet 1 .....xx------ Packet 2 .....------ Packet 3 ^^^^^ latency stacks ^^ rate delay doesn't stack ^^ latency stacks -----> time When a packet is enqueued, we first consider the latency delay. If other packets are already queued, we can reduce the latency delay until the last packet in the queue is send, however the latency delay cannot be <0, since this would mean that the rate is overcommitted. The new reference point is the time at which the last packet will be send. To find the time, when the packet should be send, the rate introduces delay has to be added on top of that. Signed-off-by: Johannes Naab <jn@stusta.de> Acked-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:02 -05:00
Tom Parkin	80d84ef3ff	l2tp: prevent l2tp_tunnel_delete racing with userspace close If a tunnel socket is created by userspace, l2tp hooks the socket destructor in order to clean up resources if userspace closes the socket or crashes. It also caches a pointer to the struct sock for use in the data path and in the netlink interface. While it is safe to use the cached sock pointer in the data path, where the skb references keep the socket alive, it is not safe to use it elsewhere as such access introduces a race with userspace closing the socket. In particular, l2tp_tunnel_delete is prone to oopsing if a multithreaded userspace application closes a socket at the same time as sending a netlink delete command for the tunnel. This patch fixes this oops by forcing l2tp_tunnel_delete to explicitly look up a tunnel socket held by userspace using sockfd_lookup(). Signed-off-by: Tom Parkin <tparkin@katalix.com> Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:43:02 -05:00
David S. Miller	f1e7b73acc	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net Bring in the 'net' tree so that we can get some ipv4/ipv6 bug fixes that some net-next work will build upon. Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:32:13 -05:00
Hannes Frederic Sowa	218774dc34	ipv6: add anti-spoofing checks for 6to4 and 6rd This patch adds anti-spoofing checks in sit.c as specified in RFC3964 section 5.2 for 6to4 and RFC5969 section 12 for 6rd. I left out the checks which could easily be implemented with netfilter. Specifically this patch adds following logic (based loosely on the pseudocode in RFC3964 section 5.2): if prefix (inner_src_v6) == rd6_prefix (2002::/16 is the default) and outer_src_v4 != embedded_ipv4 (inner_src_v6) drop if prefix (inner_dst_v6) == rd6_prefix (or 2002::/16 is the default) and outer_dst_v4 != embedded_ipv4 (inner_dst_v6) drop accept To accomplish the specified security checks proposed by above RFCs, it is still necessary to employ uRPF filters with netfilter. These new checks only kick in if the employed addresses are within the 2002::/16 or another range specified by the 6rd-prefix (which defaults to 2002::/16). Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:22:03 -05:00
Claudiu Manoil	ee873fda3b	gianfar: Pack struct gfar_priv_grp into three cachelines * remove unused members(!): imask, ievent * move space consuming interrupt name strings (int_name_* members) to external structures, unessential for the driver's hot path * keep high priority hot path data within the first 2 cache lines This reduces struct gfar_priv_grp from 6 to 3 cache lines. (Also fixed checkpatch warnings for the old code, in the process.) Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:22:02 -05:00
Claudiu Manoil	5fedcc14d4	gianfar: Cleanup gfar_parse_group() code Factor out redundant code (improve readability, source code size). Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:22:02 -05:00
Claudiu Manoil	0cd3fdea07	gianfar: Optimize struct gfar_priv_tx_q for two cache lines Resize and regroup structure members to eliminate memory holes and to pack the structure into 2 cache lines (from 3). tx_ring_size was resized from 4 to 2 bytes and few members were re-grouped in order to eliminate byte holes and achieve compactness. Where possible, few members were grouped according to their usage and access order (i.e. start_xmit vs. clean_tx_ring members), less important members were pushed at the end. Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:22:02 -05:00
Eric W. Biederman	243bb4c6d7	ipv6: Fix inet6_csk_bind_conflict so it builds with user namespaces enabled When attempting to build linux-next with user namespaces enabled I ran into this fun build error. CC net/ipv6/inet6_connection_sock.o .../net/ipv6/inet6_connection_sock.c: In function ‘inet6_csk_bind_conflict’: .../net/ipv6/inet6_connection_sock.c:37:12: error: incompatible types when initializing type ‘int’ using type ‘kuid_t’ .../net/ipv6/inet6_connection_sock.c:54:30: error: incompatible type for argument 1 of ‘uid_eq’ .../include/linux/uidgid.h:48:20: note: expected ‘kuid_t’ but argument is of type ‘int’ make[3]: * [net/ipv6/inet6_connection_sock.o] Error 1 make[2]: * [net/ipv6] Error 2 make[2]: *** Waiting for unfinished jobs.... Using kuid_t instead of int to hold the uid fixes this. Cc: Tom Herbert <therbert@google.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:20:12 -05:00
Cong Wang	4e58a02750	pktgen: support net namespace v3: make pktgen_threads list per-namespace v2: remove a useless check This patch add net namespace to pktgen, so that we can use pktgen in different namespaces. Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Cong Wang <amwang@redhat.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 15:17:46 -05:00
Frank Li	dc975382d2	net: fec: add napi support to improve proformance Add napi support Before this patch iperf -s -i 1 ------------------------------------------------------------ Server listening on TCP port 5001 TCP window size: 85.3 KByte (default) ------------------------------------------------------------ [ 4] local 10.192.242.153 port 5001 connected with 10.192.242.138 port 50004 [ ID] Interval Transfer Bandwidth [ 4] 0.0- 1.0 sec 41.2 MBytes 345 Mbits/sec [ 4] 1.0- 2.0 sec 43.7 MBytes 367 Mbits/sec [ 4] 2.0- 3.0 sec 42.8 MBytes 359 Mbits/sec [ 4] 3.0- 4.0 sec 43.7 MBytes 367 Mbits/sec [ 4] 4.0- 5.0 sec 42.7 MBytes 359 Mbits/sec [ 4] 5.0- 6.0 sec 43.8 MBytes 367 Mbits/sec [ 4] 6.0- 7.0 sec 43.0 MBytes 361 Mbits/sec After this patch [ 4] 2.0- 3.0 sec 51.6 MBytes 433 Mbits/sec [ 4] 3.0- 4.0 sec 51.8 MBytes 435 Mbits/sec [ 4] 4.0- 5.0 sec 52.2 MBytes 438 Mbits/sec [ 4] 5.0- 6.0 sec 52.1 MBytes 437 Mbits/sec [ 4] 6.0- 7.0 sec 52.1 MBytes 437 Mbits/sec [ 4] 7.0- 8.0 sec 52.3 MBytes 439 Mbits/sec Signed-off-by: Frank Li <Frank.Li@freescale.com> Signed-off-by: Fugang Duan <B38611@freescale.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 14:16:17 -05:00
Barry Grussling	72aa8e1b29	ethoc: Cleanup driver format Cleanup the format of ethoc.c to meet network driver style as per checkpatch.pl. Signed-off-by: Barry Grussling <barry@grussling.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 14:07:05 -05:00
David Ward	040468a0a7	ip_gre: When TOS is inherited, use configured TOS value for non-IP packets A GRE tunnel can be configured so that outgoing tunnel packets inherit the value of the TOS field from the inner IP header. In doing so, when a non-IP packet is transmitted through the tunnel, the TOS field will always be set to 0. Instead, the user should be able to configure a different TOS value as the fallback to use for non-IP packets. This is helpful when the non-IP packets are all control packets and should be handled by routers outside the tunnel as having Internet Control precedence. One example of this is the NHRP packets that control a DMVPN-compatible mGRE tunnel; they are encapsulated directly by GRE and do not contain an inner IP header. Under the existing behavior, the IFLA_GRE_TOS parameter must be set to '1' for the TOS value to be inherited. Now, only the least significant bit of this parameter must be set to '1', and when a non-IP packet is sent through the tunnel, the upper 6 bits of this same parameter will be copied into the TOS field. (The ECN bits get masked off as before.) This behavior is backwards-compatible with existing configurations and iproute2 versions. Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 14:05:28 -05:00
Jiri Pirko	5c766d642b	ipv4: introduce address lifetime There are some usecase when lifetime of ipv4 addresses might be helpful. For example: 1) initramfs networkmanager uses a DHCP daemon to learn network configuration parameters 2) initramfs networkmanager addresses, routes and DNS configuration 3) initramfs networkmanager is requested to stop 4) initramfs networkmanager stops all daemons including dhclient 5) there are addresses and routes configured but no daemon running. If the system doesn't start networkmanager for some reason, addresses and routes will be used forever, which violates RFC 2131. This patch is essentially a backport of ivp6 address lifetime mechanism for ipv4 addresses. Current "ip" tool supports this without any patch (since it does not distinguish between ipv4 and ipv6 addresses in this perspective. Also, this should be back-compatible with all current netlink users. Reported-by: Pavel Šimerda <psimerda@redhat.com> Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 13:59:57 -05:00
David S. Miller	5a1dc31708	Merge branch 'ipfrags' Jesper Dangaard Brouer says: ==================== This patchset is V2, with some trivial code fixes, which were noticed by DaveM. It is still a partly respin of my fragmentation optimization patches: http://thread.gmane.org/gmane.linux.network/250914 This is not the complete patchset, from the gmane link above. In this patchset, I primarily focus on adjusting cacheline for better SMP/NUMA performance. Once this patchset have been agreed upon, I will continue and respin the rest of my patches. This time around, I have created a frag DoS generator, via the tool trafgen (http://netsniff-ng.org/). To create a stable DoS scenario (no longer relying on frame dropping due to disabled flow-control). Two 10G interfaces are under-test, and uses Ethernet flow-control. A third interface is used for generating the DoS attack (this interface is also 10G, but it does not need to be, as 500Kpps DoS is enough). Test types summary (netperf): Test-20G64K == 2x10G with 65K fragments Test-20G3F == 2x10G with 3x fragments (31472 bytes) Test-20G64K+DoS == Same as 20G64K with frag DoS Test-20G3F+DoS == Same as 20G3F with frag DoS Patch list: Patch-01 - net: cacheline adjust struct netns_frags for better frag performance Patch-02 - net: cacheline adjust struct inet_frags for better frag performance Patch-03 - net: cacheline adjust struct inet_frag_queue Patch-04 - net: frag helper functions for mem limit tracking Patch-05 - net: use lib/percpu_counter API for fragmentation mem accounting Patch-06 - net: frag, move LRU list maintenance outside of rwlock Performance table summary: Test-type: Test-20G64K Test-20G3F 20G64K+DoS 20G3F+DoS ---------- ----------- ---------- ---------- --------- net-next: 15114.5 Mbit/s 8954.21 2444.28 3918.01 Mbit/s Patch-01: 16075.8 Mbit/s 8976.18 2621.49 4072.79 Mbit/s Patch-02: 17806.9 Mbit/s 9280.32 2478.62 4274.59 Mbit/s Patch-03: 17317.4 Mbit/s 9308.62 2546.05 4336.59 Mbit/s Patch-04: 17635.9 Mbit/s 9256.16 2535.25 4327.63 Mbit/s Patch-05: 18027.0 Mbit/s 9918.99 2492.62 3621.68 Mbit/s Patch-06: 18486.7 Mbit/s 10723.20 3657.85 4560.64 Mbit/s I cannot explain the under-DoS regression that patch-05/percpu_counter introduces. But patch-06/LRU-lock corrects the situation again. Below is a testlab setup description, with links to the trafgen DoS packet config used. Testlab ======= Server setup ------------ The machine acting as a server: - 2x CPU (E5-2630) - Thus a NUMA arch/machine - 4x 10Gbit/s ports - NICs 2x Intel Dual port 82599 based (driver ixgbe) Setup: - Interfaces uses Ethernet flow control - Flush all iptables - Remove all iptables related module. - Kill irqbalance - Pin each 10G NIC port to a single* CPU each Pinning can easily be done by command hacks:: for x in /proc/irq//eth8/../smp_affinity_list ; do echo 1 > $x; done for x in /proc/irq//eth9/../smp_affinity_list ; do echo 3 > $x; done for x in /proc/irq//eth31/../smp_affinity_list; do echo 6 > $x; done for x in /proc/irq//eth32/../smp_affinity_list; do echo 8 > $x; done Notice NUMA setting: The CPU to NIC tying is carefully choosen according to the NUMA node setup. Thus, NICs connected to a PCI-e slot that is connected to a physical CPU socket are tied together. Choosing only a single CPU per NIC (port) is just to ease provoking and debugging this performance issue. (In real setups, you can choose more CPU, just remember the NUMA node in the equation). Tools ----- Netperf is used, with option -T to ensure CPU binding. The netserver processes, are NAPI pinned:: numactl -m0 -c0 netserver numactl -m1 -c 1 netserver -p 1337 I now have a frag DoS generator, created via the tool: trafgen (see: http://netsniff-ng.org/) Trafgen packet config file: http://people.netfilter.org/hawk/frag_work/trafgen/frag_packet03_small_frag.txf Notice, I'm using features of trafgen, recently developed by Daniel Borkmann, thus you need the latest git tree to use my trafgen packet config. git://github.com/borkmann/netsniff-ng.git Command line: trafgen --dev eth51 --conf frag_packet03_small_frag.txf -V -k 100 --cpus 2 Tests types ----------- Test(20G64K) UDP-64K 2x 10Gbit/s with no DoS traffic: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ export SIZE=$((65507)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\ netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\ netperf -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \ wait $! && tail -n3 ${LOG}.* && \ tail -n3 ${LOG}.{31,81} \| awk 'BEGIN{sum=0;} /212992 / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}' Test(20G3F) UDP-3xfrags 2x 10Gbit/s with no DoS traffic: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ export SIZE=$((31472)); export TIME=$((20)); export LOG=/tmp/netperf.log ;\ netperf -p 1337 -H 192.168.31.2 -T7,7 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.31 &\ netperf -H 192.168.81.2 -T2,2 -t UDP_STREAM -l $TIME -- -m $SIZE >> ${LOG}.81 && \ wait $! && tail -n3 ${LOG}. && \ tail -n3 ${LOG}.{31,81} \| awk 'BEGIN{sum=0;} /212992 / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}' Awk script for summming results: ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ tail -n3 ${LOG}.{31,81} \| awk 'BEGIN{sum=0;} /212992 / {sum+=$4; print " +"$4} /==/ {print " file:"$2} END{print "sum:"sum" Mbit/s"}' ==================== Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 13:37:29 -05:00
Jesper Dangaard Brouer	3ef0eb0db4	net: frag, move LRU list maintenance outside of rwlock Updating the fragmentation queues LRU (Least-Recently-Used) list, required taking the hash writer lock. However, the LRU list isn't tied to the hash at all, so we can use a separate lock for it. Original-idea-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 13:36:24 -05:00
Jesper Dangaard Brouer	6d7b857d54	net: use lib/percpu_counter API for fragmentation mem accounting Replace the per network namespace shared atomic "mem" accounting variable, in the fragmentation code, with a lib/percpu_counter. Getting percpu_counter to scale to the fragmentation code usage requires some tweaks. At first view, percpu_counter looks superfast, but it does not scale on multi-CPU/NUMA machines, because the default batch size is too small, for frag code usage. Thus, I have adjusted the batch size by using __percpu_counter_add() directly, instead of percpu_counter_sub() and percpu_counter_add(). The batch size is increased to 130.000, based on the largest 64K fragment memory usage. This does introduce some imprecise memory accounting, but its does not need to be strict for this use-case. It is also essential, that the percpu_counter, does not share cacheline with other writers, to make this scale. Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2013-01-29 13:36:24 -05:00

1 2 3 4 5 ...

350258 commits