linux

q3k/linux

Author	SHA1	Message	Date
Venkateswararao Jujjuri (JV)	a01a984035	[net/9p] Set the condition just before waking up. Given that the sprious wake-ups are common, we need to move the condition setting right next to the wake_up(). After setting the condition to req->status = REQ_STATUS_RCVD, sprious wakeups may cause the virtqueue back on the free list for someone else to use. This may result in kernel panic while relasing the pinned pages in p9_release_req_pages(). Also rearranged the while loop in req_done() for better redability. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-22 16:32:47 -05:00
Venkateswararao Jujjuri (JV)	53bda3e5b4	[net/9p] unconditional wake_up to proc waiting for space on VirtIO ring Process may wait to get space on VirtIO ring to send a transaction to VirtFS server. Current code just does a conditional wake_up() which means only one process will be woken up even if multiple processes are waiting. This fix makes the wake_up unconditional. Hence we won't have any processes waiting for-ever. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-22 16:32:19 -05:00
Aneesh Kumar K.V	472e7f9f8b	net/9p: Fix compile warning Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-22 15:43:35 -05:00
Aneesh Kumar K.V	eeff66ef6e	net/9p: Convert the in the 9p rpc call path to GFP_NOFS Without this we can cause reclaim allocation in writepage. [ 3433.448430] ================================= [ 3433.449117] [ INFO: inconsistent lock state ] [ 3433.449117] 2.6.38-rc5+ #84 [ 3433.449117] --------------------------------- [ 3433.449117] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-R} usage. [ 3433.449117] kswapd0/505 [HC0[0]:SC0[0]:HE1:SE1] takes: [ 3433.449117] (iprune_sem){+++++-}, at: [<ffffffff810ebbab>] shrink_icache_memory+0x45/0x2b1 [ 3433.449117] {RECLAIM_FS-ON-W} state was registered at: [ 3433.449117] [<ffffffff8107fe5f>] mark_held_locks+0x52/0x70 [ 3433.449117] [<ffffffff8107ff02>] lockdep_trace_alloc+0x85/0x9f [ 3433.449117] [<ffffffff810d353d>] slab_pre_alloc_hook+0x18/0x3c [ 3433.449117] [<ffffffff810d3fd5>] kmem_cache_alloc+0x23/0xa2 [ 3433.449117] [<ffffffff8127be77>] idr_pre_get+0x2d/0x6f [ 3433.449117] [<ffffffff815434eb>] p9_idpool_get+0x30/0xae [ 3433.449117] [<ffffffff81540123>] p9_client_rpc+0xd7/0x9b0 [ 3433.449117] [<ffffffff815427b0>] p9_client_clunk+0x88/0xdb [ 3433.449117] [<ffffffff811d56e5>] v9fs_evict_inode+0x3c/0x48 [ 3433.449117] [<ffffffff810eb511>] evict+0x1f/0x87 [ 3433.449117] [<ffffffff810eb5c0>] dispose_list+0x47/0xe3 [ 3433.449117] [<ffffffff810eb8da>] evict_inodes+0x138/0x14f [ 3433.449117] [<ffffffff810d90e2>] generic_shutdown_super+0x57/0xe8 [ 3433.449117] [<ffffffff810d91e8>] kill_anon_super+0x11/0x50 [ 3433.449117] [<ffffffff811d4951>] v9fs_kill_super+0x49/0xab [ 3433.449117] [<ffffffff810d926e>] deactivate_locked_super+0x21/0x46 [ 3433.449117] [<ffffffff810d9e84>] deactivate_super+0x40/0x44 [ 3433.449117] [<ffffffff810ef848>] mntput_no_expire+0x100/0x109 [ 3433.449117] [<ffffffff810f0aeb>] sys_umount+0x2f1/0x31c [ 3433.449117] [<ffffffff8102c87b>] system_call_fastpath+0x16/0x1b [ 3433.449117] irq event stamp: 192941 [ 3433.449117] hardirqs last enabled at (192941): [<ffffffff81568dcf>] _raw_spin_unlock_irq+0x2b/0x30 [ 3433.449117] hardirqs last disabled at (192940): [<ffffffff810b5f97>] shrink_inactive_list+0x290/0x2f5 [ 3433.449117] softirqs last enabled at (188470): [<ffffffff8105fd65>] __do_softirq+0x133/0x152 [ 3433.449117] softirqs last disabled at (188455): [<ffffffff8102d7cc>] call_softirq+0x1c/0x28 [ 3433.449117] [ 3433.449117] other info that might help us debug this: [ 3433.449117] 1 lock held by kswapd0/505: [ 3433.449117] #0: (shrinker_rwsem){++++..}, at: [<ffffffff810b52e2>] shrink_slab+0x38/0x15f [ 3433.449117] [ 3433.449117] stack backtrace: [ 3433.449117] Pid: 505, comm: kswapd0 Not tainted 2.6.38-rc5+ #84 [ 3433.449117] Call Trace: [ 3433.449117] [<ffffffff8107fbce>] ? valid_state+0x17e/0x191 [ 3433.449117] [<ffffffff81036896>] ? save_stack_trace+0x28/0x45 [ 3433.449117] [<ffffffff81080426>] ? check_usage_forwards+0x0/0x87 [ 3433.449117] [<ffffffff8107fcf4>] ? mark_lock+0x113/0x22c [ 3433.449117] [<ffffffff8108105f>] ? __lock_acquire+0x37a/0xcf7 [ 3433.449117] [<ffffffff8107fc0e>] ? mark_lock+0x2d/0x22c [ 3433.449117] [<ffffffff81081077>] ? __lock_acquire+0x392/0xcf7 [ 3433.449117] [<ffffffff810b14d2>] ? determine_dirtyable_memory+0x15/0x28 [ 3433.449117] [<ffffffff81081a33>] ? lock_acquire+0x57/0x6d [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff81567d85>] ? down_read+0x47/0x5c [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff810ebbab>] ? shrink_icache_memory+0x45/0x2b1 [ 3433.449117] [<ffffffff810b5385>] ? shrink_slab+0xdb/0x15f [ 3433.449117] [<ffffffff810b69bc>] ? kswapd+0x574/0x96a [ 3433.449117] [<ffffffff810b6448>] ? kswapd+0x0/0x96a [ 3433.449117] [<ffffffff810714e2>] ? kthread+0x7d/0x85 [ 3433.449117] [<ffffffff8102d6d4>] ? kernel_thread_helper+0x4/0x10 [ 3433.449117] [<ffffffff81569200>] ? restore_args+0x0/0x30 [ 3433.449117] [<ffffffff81071465>] ? kthread+0x0/0x85 [ 3433.449117] [<ffffffff8102d6d0>] ? kernel_thread_helper+0x0/0x10 Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-22 15:43:35 -05:00
Yehuda Sadeh	a40c4f10e3	libceph: add lingering request and watch/notify event framework Lingering requests are requests that are sent to the OSD normally but tracked also after we get a successful request. This keeps the OSD connection open and resends the original request if the object moves to another OSD. The OSD can then send notification messages back to us if another client initiates a notify. This framework will be used by RBD so that the client gets notification when a snapshot is created by another node or tool. Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-22 11:33:55 -07:00
Julian Anastasov	04024b937a	ipv4: optimize route adding on secondary promotion Optimize the calling of fib_add_ifaddr for all secondary addresses after the promoted one to start from their place, not from the new place of the promoted secondary. It will save some CPU cycles because we are sure the promoted secondary was first for the subnet and all next secondaries do not change their place. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-22 01:06:33 -07:00
Julian Anastasov	2d230e2b2c	ipv4: remove the routes on secondary promotion The secondary address promotion relies on fib_sync_down_addr to remove all routes created for the secondary addresses when the old primary address is deleted. It does not happen for cases when the primary address is also in another subnet. Fix that by deleting local and broadcast routes for all secondaries while they are on device list and by faking that all addresses from this subnet are to be deleted. It relies on fib_del_ifaddr being able to ignore the IPs from the concerned subnet while checking for duplication. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-22 01:06:33 -07:00
Julian Anastasov	e6abbaa272	ipv4: fix route deletion for IPs on many subnets Alex Sidorenko reported for problems with local routes left after IP addresses are deleted. It happens when same IPs are used in more than one subnet for the device. Fix fib_del_ifaddr to restrict the checks for duplicate local and broadcast addresses only to the IFAs that use our primary IFA or another primary IFA with same address. And we expect the prefsrc to be matched when the routes are deleted because it is possible they to differ only by prefsrc. This patch prevents local and broadcast routes to be leaked until their primary IP is deleted finally from the box. As the secondary address promotion needs to delete the routes for all secondaries that used the old primary IFA, add option to ignore these secondaries from the checks and to assume they are already deleted, so that we can safely delete the route while these IFAs are still on the device list. Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-22 01:06:32 -07:00
Julian Anastasov	74cb3c108b	ipv4: match prefsrc when deleting routes fib_table_delete forgets to match the routes by prefsrc. Callers can specify known IP in fc_prefsrc and we should remove the exact route. This is needed for cases when same local or broadcast addresses are used in different subnets and the routes differ only in prefsrc. All callers that do not provide fc_prefsrc will ignore the route prefsrc as before and will delete the first occurence. That is how the ip route del default magic works. Current callers are: - ip_rt_ioctl where rtentry_to_fib_config provides fc_prefsrc only when the provided device name matches IP label with colon. - inet_rtm_delroute where RTA_PREFSRC is optional too - fib_magic which deals with routes when deleting addresses and where the fc_prefsrc is always set with the primary IP for the concerned IFA. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-22 01:06:31 -07:00
Michał Mirosław	27660515a2	net: implement dev_disable_lro() hw_features compatibility Implement compatibility with new hw_features for dev_disable_lro(). This is a transition path - dev_disable_lro() should be later integrated into netdev_fix_features() after all drivers are converted. Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-22 01:00:26 -07:00
Simon Horman	736561a01f	IPVS: Use global mutex in ip_vs_app.c As part of the work to make IPVS network namespace aware __ip_vs_app_mutex was replaced by a per-namespace lock, ipvs->app_mutex. ipvs->app_key is also supplied for debugging purposes. Unfortunately this implementation results in ipvs->app_key residing in non-static storage which at the very least causes a lockdep warning. This patch takes the rather heavy-handed approach of reinstating __ip_vs_app_mutex which will cover access to the ipvs->list_head of all network namespaces. [ 12.610000] IPVS: Creating netns size=2456 id=0 [ 12.630000] IPVS: Registered protocols (TCP, UDP, SCTP, AH, ESP) [ 12.640000] BUG: key ffff880003bbf1a0 not in .data! [ 12.640000] ------------[ cut here ]------------ [ 12.640000] WARNING: at kernel/lockdep.c:2701 lockdep_init_map+0x37b/0x570() [ 12.640000] Hardware name: Bochs [ 12.640000] Pid: 1, comm: swapper Tainted: G W 2.6.38-kexec-06330-g69b7efe-dirty #122 [ 12.650000] Call Trace: [ 12.650000] [<ffffffff8102e685>] warn_slowpath_common+0x75/0xb0 [ 12.650000] [<ffffffff8102e6d5>] warn_slowpath_null+0x15/0x20 [ 12.650000] [<ffffffff8105967b>] lockdep_init_map+0x37b/0x570 [ 12.650000] [<ffffffff8105829d>] ? trace_hardirqs_on+0xd/0x10 [ 12.650000] [<ffffffff81055ad8>] debug_mutex_init+0x38/0x50 [ 12.650000] [<ffffffff8104bc4c>] __mutex_init+0x5c/0x70 [ 12.650000] [<ffffffff81685ee7>] __ip_vs_app_init+0x64/0x86 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff811b1c33>] T.620+0x43/0x170 [ 12.660000] [<ffffffff811b1e9a>] ? register_pernet_subsys+0x1a/0x40 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.660000] [<ffffffff811b1db7>] register_pernet_operations+0x57/0xb0 [ 12.660000] [<ffffffff81685a3b>] ? ip_vs_init+0x0/0xff [ 12.670000] [<ffffffff811b1ea9>] register_pernet_subsys+0x29/0x40 [ 12.670000] [<ffffffff81685f19>] ip_vs_app_init+0x10/0x12 [ 12.670000] [<ffffffff81685a87>] ip_vs_init+0x4c/0xff [ 12.670000] [<ffffffff8166562c>] do_one_initcall+0x7a/0x12e [ 12.670000] [<ffffffff8166583e>] kernel_init+0x13e/0x1c2 [ 12.670000] [<ffffffff8128c134>] kernel_thread_helper+0x4/0x10 [ 12.670000] [<ffffffff8128ad40>] ? restore_args+0x0/0x30 [ 12.680000] [<ffffffff81665700>] ? kernel_init+0x0/0x1c2 [ 12.680000] [<ffffffff8128c130>] ? kernel_thread_helper+0x0/0x1global0 Signed-off-by: Simon Horman <horms@verge.net.au> Cc: Ingo Molnar <mingo@elte.hu> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Julian Anastasov <ja@ssi.bg> Cc: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 20:39:24 -07:00
Eric Dumazet	f40f94fc6c	ipvs: fix a typo in __ip_vs_control_init() Reported-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Simon Horman <horms@verge.net.au> Cc: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 20:39:24 -07:00
Eric W. Biederman	9d2a8fa96a	net ipv6: Fix duplicate /proc/sys/net/ipv6/neigh directory entries. When I was fixing issues with unregisgtering tables under /proc/sys/net/ipv6/neigh by adding a mount point it appears I missed a critical ordering issue, in the ipv6 initialization. I had not realized that ipv6_sysctl_register is called at the very end of the ipv6 initialization and in particular after we call neigh_sysctl_register from ndisc_init. "neigh" needs to be initialized in ipv6_static_sysctl_register which is the first ipv6 table to initialized, and definitely before ndisc_init. This removes the weirdness of duplicate tables while still providing a "neigh" mount point which prevents races in sysctl unregistering. This was initially reported at https://bugzilla.kernel.org/show_bug.cgi?id=31232 Reported-by: sunkan@zappa.cx Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 18:23:34 -07:00
Neil Horman	ac0a121d79	net: fix incorrect spelling in drop monitor protocol It was pointed out to me recently that my spelling could be better :) Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 18:20:26 -07:00
Arnd Bergmann	b20e7bbfc7	net/appletalk: fix atalk_release use after free The BKL removal in appletalk introduced a use-after-free problem, where atalk_destroy_socket frees a sock, but we still release the socket lock on it. An easy fix is to take an extra reference on the sock and sock_put it when returning from atalk_release. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 18:18:00 -07:00
Eric Dumazet	674f211599	ipx: fix ipx_release() Commit `b0d0d915d1` (remove the BKL) added a regression, because sock_put() can free memory while we are going to use it later. Fix is to delay sock_put() _after_ release_sock(). Reported-by: Ingo Molnar <mingo@elte.hu> Tested-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 18:16:39 -07:00
James Chapman	8aa525a934	l2tp: fix possible oops on l2tp_eth module unload A struct used in the l2tp_eth driver for registering network namespace ops was incorrectly marked as __net_initdata, leading to oops when module unloaded. BUG: unable to handle kernel paging request at ffffffffa00ec098 IP: [<ffffffff8123dbd8>] ops_exit_list+0x7/0x4b PGD 142d067 PUD 1431063 PMD 195da8067 PTE 0 Oops: 0000 [#1] SMP last sysfs file: /sys/module/l2tp_eth/refcnt Call Trace: [<ffffffff8123dc94>] ? unregister_pernet_operations+0x32/0x93 [<ffffffff8123dd20>] ? unregister_pernet_device+0x2b/0x38 [<ffffffff81068b6e>] ? sys_delete_module+0x1b8/0x222 [<ffffffff810c7300>] ? do_munmap+0x254/0x318 [<ffffffff812c64e5>] ? page_fault+0x25/0x30 [<ffffffff812c6952>] ? system_call_fastpath+0x16/0x1b Signed-off-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 18:10:25 -07:00
Wei Yongjun	a454f0ccef	xfrm: Fix initialize repl field of struct xfrm_state Commit 'xfrm: Move IPsec replay detection functions to a separate file' (`9fdc4883d9`) introduce repl field to struct xfrm_state, and only initialize it under SA's netlink create path, the other path, such as pf_key, ipcomp/ipcomp6 etc, the repl field remaining uninitialize. So if the SA is created by pf_key, any input packet with SA's encryption algorithm will cause panic. int xfrm_input() { ... x->repl->advance(x, seq); ... } This patch fixed it by introduce new function __xfrm_init_state(). Pid: 0, comm: swapper Not tainted 2.6.38-next+ #14 Bochs Bochs EIP: 0060:[<c078e5d5>] EFLAGS: 00010206 CPU: 0 EIP is at xfrm_input+0x31c/0x4cc EAX: dd839c00 EBX: 00000084 ECX: 00000000 EDX: 01000000 ESI: dd839c00 EDI: de3a0780 EBP: dec1de88 ESP: dec1de64 DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068 Process swapper (pid: 0, ti=dec1c000 task=c09c0f20 task.ti=c0992000) Stack: 00000000 00000000 00000002 c0ba27c0 00100000 01000000 de3a0798 c0ba27c0 00000033 dec1de98 c0786848 00000000 de3a0780 dec1dea4 c0786868 00000000 dec1debc c074ee56 e1da6b8c de3a0780 c074ed44 de3a07a8 dec1decc c074ef32 Call Trace: [<c0786848>] xfrm4_rcv_encap+0x22/0x27 [<c0786868>] xfrm4_rcv+0x1b/0x1d [<c074ee56>] ip_local_deliver_finish+0x112/0x1b1 [<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1 [<c074ef32>] NF_HOOK.clone.1+0x3d/0x44 [<c074ef77>] ip_local_deliver+0x3e/0x44 [<c074ed44>] ? ip_local_deliver_finish+0x0/0x1b1 [<c074ec03>] ip_rcv_finish+0x30a/0x332 [<c074e8f9>] ? ip_rcv_finish+0x0/0x332 [<c074ef32>] NF_HOOK.clone.1+0x3d/0x44 [<c074f188>] ip_rcv+0x20b/0x247 [<c074e8f9>] ? ip_rcv_finish+0x0/0x332 [<c072797d>] __netif_receive_skb+0x373/0x399 [<c0727bc1>] netif_receive_skb+0x4b/0x51 [<e0817e2a>] cp_rx_poll+0x210/0x2c4 [8139cp] [<c072818f>] net_rx_action+0x9a/0x17d [<c0445b5c>] __do_softirq+0xa1/0x149 [<c0445abb>] ? __do_softirq+0x0/0x149 Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-21 18:08:28 -07:00
Sage Weil	6f6c700675	libceph: fix osd request queuing on osdmap updates If we send a request to osd A, and the request's pg remaps to osd B and then back to A in quick succession, we need to resend the request to A. The old code was only calling kick_requests after processing all incremental maps in a message, so it was very possible to not resend a request that needed to be resent. This would make the osd eventually time out (at least with the current default of osd timeouts enabled). The correct approach is to scan requests on every map incremental. This patch refactors the kick code in a few ways: - all requests are either on req_lru (in flight), req_unsent (ready to send), or req_notarget (currently map to no up osd) - mapping always done by map_request (previous map_osds) - if the mapping changes, we requeue. requests are resent only after all map incrementals are processed. - some osd reset code is moved out of kick_requests into a separate function - the "kick this osd" functionality is moved to kick_osd_requests, as it is unrelated to scanning for request->pg->osd mapping changes Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-21 12:24:19 -07:00
Felix Fietkau	8bc8aecdc5	mac80211: initialize sta->last_rx in sta_info_alloc This field is used to determine the inactivity time. When in AP mode, hostapd uses it for kicking out inactive clients after a while. Without this patch, hostapd immediately deauthenticates a new client if it checks the inactivity time before the client sends its first data frame. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Cc: stable@kernel.org Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-03-21 15:19:50 -04:00
Vasiliy Kulikov	961ed183a9	netfilter: ipt_CLUSTERIP: fix buffer overflow 'buffer' string is copied from userspace. It is not checked whether it is zero terminated. This may lead to overflow inside of simple_strtoul(). Changli Gao suggested to copy not more than user supplied 'size' bytes. It was introduced before the git epoch. Files "ipt_CLUSTERIP/*" are root writable only by default, however, on some setups permissions might be relaxed to e.g. network admin user. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-20 15:42:52 +01:00
Eric Dumazet	db856674ac	netfilter: xtables: fix reentrancy commit `f3c5c1bfd4` (make ip_tables reentrant) introduced a race in handling the stackptr restore, at the end of ipt_do_table() We should do it before the call to xt_info_rdunlock_bh(), or we allow cpu preemption and another cpu overwrites stackptr of original one. A second fix is to change the underflow test to check the origptr value instead of 0 to detect underflow, or else we allow a jump from different hooks. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Jan Engelhardt <jengelh@medozas.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-20 15:40:06 +01:00
Jozsef Kadlecsik	5c1aba4678	netfilter: ipset: fix checking the type revision at create command The revision of the set type was not checked at the create command: if the userspace sent a valid set type but with not supported revision number, it'd create a loop. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-20 15:35:01 +01:00
Jozsef Kadlecsik	5e0c1eb7e6	netfilter: ipset: fix address ranges at hash:port types The hash:port types with IPv4 silently ignored when address ranges with non TCP/UDP were added/deleted from the set and used the first address from the range only. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-20 15:33:26 +01:00
Herbert Xu	6b1e960fdb	bridge: Reset IPCB when entering IP stack on NF_FORWARD Whenever we enter the IP stack proper from bridge netfilter we need to ensure that the skb is in a form the IP stack expects it to be in. The entry point on NF_FORWARD did not meet the requirements of the IP stack, therefore leading to potential crashes/panics. This patch fixes the problem. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-18 15:13:12 -07:00
Eric Dumazet	d870bfb9d3	vlan: should take into account needed_headroom Commit `c95b819ad7` (gre: Use needed_headroom) made gre use needed_headroom instead of hard_header_len This uncover a bug in vlan code. We should make sure vlan devices take into account their real_dev->needed_headroom or we risk a crash in ipgre_header(), because we dont have enough room to push IP header in skb. Reported-by: Diddi Oscarsson <diddi@diddi.se> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-18 15:13:12 -07:00
Ben Hutchings	3a7da39d16	ethtool: Compat handling for struct ethtool_rxnfc This structure was accidentally defined such that its layout can differ between 32-bit and 64-bit processes. Add compat structure definitions and an ioctl wrapper function. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Acked-by: Alexander Duyck <alexander.h.duyck@intel.com> Cc: stable@kernel.org [2.6.30+] Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-18 15:13:11 -07:00
Roger Luethi	5e5069b41d	ethtool: __ethtool_set_sg: check for function pointer before using it __ethtool_set_sg does not check if dev->ethtool_ops->set_sg is defined which can result in a NULL pointer dereference when ethtool is used to change SG settings for drivers without SG support. Signed-off-by: Roger Luethi <rl@hellgate.ch> Reviewed-by: Ben Hutchings <bhutchings@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-18 15:13:10 -07:00
Vasiliy Kulikov	67c5c6cb81	econet: 4 byte infoleak to the network struct aunhdr has 4 padding bytes between 'pad' and 'handle' fields on x86_64. These bytes are not initialized in the variable 'ah' before sending 'ah' to the network. This leads to 4 bytes kernel stack infoleak. This bug was introduced before the git epoch. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: Phil Blundell <philb@gnu.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-18 15:12:15 -07:00
Linus Torvalds	e16b396ce3	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (47 commits) doc: CONFIG_UNEVICTABLE_LRU doesn't exist anymore Update cpuset info & webiste for cgroups dcdbas: force SMI to happen when expected arch/arm/Kconfig: remove one to many l's in the word. asm-generic/user.h: Fix spelling in comment drm: fix printk typo 'sracth' Remove one to many n's in a word Documentation/filesystems/romfs.txt: fixing link to genromfs drivers:scsi Change printk typo initate -> initiate serial, pch uart: Remove duplicate inclusion of linux/pci.h header fs/eventpoll.c: fix spelling mm: Fix out-of-date comments which refers non-existent functions drm: Fix printk typo 'failled' coh901318.c: Change initate to initiate. mbox-db5500.c Change initate to initiate. edac: correct i82975x error-info reported edac: correct i82975x mci initialisation edac: correct commented info fs: update comments to point correct document target: remove duplicate include of target/target_core_device.h from drivers/target/target_core_hba.c ... Trivial conflict in fs/eventpoll.c (spelling vs addition)	2011-03-18 10:37:40 -07:00
Linus Torvalds	7fd23a2471	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid: (48 commits) HID: add support for Logitech Driving Force Pro wheel HID: hid-ortek: remove spurious reference HID: add support for Ortek PKB-1700 HID: roccat-koneplus: vorrect mode of sysfs attr 'sensor' HID: hid-ntrig: init settle and mode check HID: merge hid-egalax into hid-multitouch HID: hid-multitouch: Send events per slot if CONTACTCOUNT is missing HID: ntrig remove if and drop an indent HID: ACRUX - activate the device immediately after binding HID: ntrig: apply NO_INIT_REPORTS quirk HID: hid-magicmouse: Correct touch orientation direction HID: ntrig don't dereference unclaimed hidinput HID: Do not create input devices for feature reports HID: bt hidp: send Output reports using SET_REPORT on the Control channel HID: hid-sony.c: Fix sending Output reports to the Sixaxis HID: add support for Keytouch IEC 60945 HID: Add HID Report Descriptor to sysfs HID: add IRTOUCH infrared USB to hid_have_special_driver HID: kernel oops in out_cleanup in function hidinput_connect HID: Add teletext/color keys - gyration remote - EU version (GYAR3101CKDE) ...	2011-03-18 10:35:30 -07:00
Linus Torvalds	179198373c	Merge branch 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'nfs-for-2.6.39' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (54 commits) RPC: killing RPC tasks races fixed xprt: remove redundant check SUNRPC: Convert struct rpc_xprt to use atomic_t counters SUNRPC: Ensure we always run the tk_callback before tk_action sunrpc: fix printk format warning xprt: remove redundant null check nfs: BKL is no longer needed, so remove the include NFS: Fix a warning in fs/nfs/idmap.c Cleanup: Factor out some cut-and-paste code. cleanup: save 60 lines/100 bytes by combining two mostly duplicate functions. NFS: account direct-io into task io accounting gss:krb5 only include enctype numbers in gm_upcall_enctypes RPCRDMA: Fix FRMR registration/invalidate handling. RPCRDMA: Fix to XDR page base interpretation in marshalling logic. NFSv4: Send unmapped uid/gids to the server when using auth_sys NFSv4: Propagate the error NFS4ERR_BADOWNER to nfs4_do_setattr NFSv4: cleanup idmapper functions to take an nfs_server argument NFSv4: Send unmapped uid/gids to the server if the idmapper fails NFSv4: If the server sends us a numeric uid/gid then accept it NFSv4.1: reject zero layout with zeroed stripe unit ...	2011-03-17 17:40:00 -07:00
Jesper Juhl	4be34b9d69	SUNRPC: Remove resource leak in svc_rdma_send_error() We leak the memory allocated to 'ctxt' when we return after 'ib_dma_mapping_error()' returns !=0. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-17 14:38:03 -04:00
Stanislav Kinsbursky	8e26de238f	RPC: killing RPC tasks races fixed RPC task RPC_TASK_QUEUED bit is set must be checked before trying to wake up task rpc_killall_tasks() because task->tk_waitqueue can not be set (equal to NULL). Also, as Trond Myklebust mentioned, such approach (instead of checking tk_waitqueue to NULL) allows us to "optimise away the call to rpc_wake_up_queued_task() altogether for those tasks that aren't queued". Here is an example of dereferencing of tk_waitqueue equal to NULL: CPU 0 CPU 1 CPU 2 -------------------- --------------------- -------------------------- nfs4_run_open_task rpc_run_task rpc_execute rpc_set_active rpc_make_runnable (waiting) rpc_async_schedule nfs4_open_prepare nfs_wait_on_sequence nfs_umount_begin rpc_killall_tasks rpc_wake_up_task rpc_wake_up_queued_task spin_lock(tk_waitqueue == NULL) BUG() rpc_sleep_on spin_lock(&q->lock) __rpc_sleep_on task->tk_waitqueue = q Signed-off-by: Stanislav Kinsbursky <skinsbursky@openvz.org> Cc: stable@kernel.org Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-17 12:39:00 -04:00
j223yang@asset.uwaterloo.ca	ba3c578de2	xprt: remove redundant check remove redundant check. Signed-off-by: Jinqiu Yang <crindy646@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-17 12:39:00 -04:00
Trond Myklebust	a8de240a90	SUNRPC: Convert struct rpc_xprt to use atomic_t counters Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-17 12:38:59 -04:00
Trond Myklebust	e020c6800c	SUNRPC: Ensure we always run the tk_callback before tk_action This fixes a race in which the task->tk_callback() puts the rpc_task to sleep, setting a new callback. Under certain circumstances, the current code may end up executing the task->tk_action before it gets round to the callback. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> Cc: stable@kernel.org	2011-03-17 12:38:41 -04:00
Jiri Kosina	65b06194c9	Merge branches 'dragonrise', 'hidraw-feature', 'multitouch', 'ntrig', 'roccat', 'upstream' and 'upstream-fixes' into for-linus	2011-03-17 14:31:46 +01:00
Linus Torvalds	f74b944419	Merge branch 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl * 'config' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl: BKL: That's all, folks fs/locks.c: Remove stale FIXME left over from BKL conversion ipx: remove the BKL appletalk: remove the BKL x25: remove the BKL ufs: remove the BKL hpfs: remove the BKL drivers: remove extraneous includes of smp_lock.h tracing: don't trace the BKL adfs: remove the big kernel lock	2011-03-16 17:21:00 -07:00
Linus Torvalds	7a6362800c	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1480 commits) bonding: enable netpoll without checking link status xfrm: Refcount destination entry on xfrm_lookup net: introduce rx_handler results and logic around that bonding: get rid of IFF_SLAVE_INACTIVE netdev->priv_flag bonding: wrap slave state work net: get rid of multiple bond-related netdevice->priv_flags bonding: register slave pointer for rx_handler be2net: Bump up the version number be2net: Copyright notice change. Update to Emulex instead of ServerEngines e1000e: fix kconfig for crc32 dependency netfilter ebtables: fix xt_AUDIT to work with ebtables xen network backend driver bonding: Improve syslog message at device creation time bonding: Call netif_carrier_off after register_netdevice bonding: Incorrect TX queue offset net_sched: fix ip_tos2prio xfrm: fix __xfrm_route_forward() be2net: Fix UDP packet detected status in RX compl Phonet: fix aligned-mode pipe socket buffer header reserve netxen: support for GbE port settings ... Fix up conflicts in drivers/staging/brcm80211/brcmsmac/wl_mac80211.c with the staging updates.	2011-03-16 16:29:25 -07:00
Linus Torvalds	e6bee325e4	Merge branch 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6 * 'tty-next' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty-2.6: (76 commits) pch_uart: reference clock on CM-iTC pch_phub: add new device ML7213 n_gsm: fix UIH control byte : P bit should be 0 n_gsm: add a documentation serial: msm_serial_hs: Add MSM high speed UART driver tty_audit: fix tty_audit_add_data live lock on audit disabled tty: move cd1865.h to drivers/staging/tty/ Staging: tty: fix build with epca.c driver pcmcia: synclink_cs: fix prototype for mgslpc_ioctl() Staging: generic_serial: fix double locking bug nozomi: don't use flush_scheduled_work() tty/serial: Relax the device_type restriction from of_serial MAINTAINERS: Update HVC file patterns tty: phase out of ioctl file pointer for tty3270 as well tty: forgot to remove ipwireless from drivers/char/pcmcia/Makefile pch_uart: Fix DMA channel miss-setting issue. pch_uart: fix exclusive access issue pch_uart: fix auto flow control miss-setting issue pch_uart: fix uart clock setting issue pch_uart : Use dev_xxx not pr_xxx ... Fix up trivial conflicts in drivers/misc/pch_phub.c (same patch applied twice, then changes to the same area in one branch)	2011-03-16 15:11:04 -07:00
Steffen Klassert	fbd5060875	xfrm: Refcount destination entry on xfrm_lookup We return a destination entry without refcount if a socket policy is found in xfrm_lookup. This triggers a warning on a negative refcount when freeeing this dst entry. So take a refcount in this case to fix it. This refcount was forgotten when xfrm changed to cache bundles instead of policies for outgoing flows. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Timo Teräs <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-16 12:55:36 -07:00
Jiri Pirko	8a4eb5734e	net: introduce rx_handler results and logic around that This patch allows rx_handlers to better signalize what to do next to it's caller. That makes skb->deliver_no_wcard no longer needed. kernel-doc for rx_handler_result is taken from Nicolas' patch. Signed-off-by: Jiri Pirko <jpirko@redhat.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-16 12:53:54 -07:00
David S. Miller	ee0caa7956	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6	2011-03-16 11:12:57 -07:00
Thomas Graf	400b871ba6	netfilter ebtables: fix xt_AUDIT to work with ebtables Even though ebtables uses xtables it still requires targets to return EBT_CONTINUE instead of XT_CONTINUE. This prevented xt_AUDIT to work as ebt module. Upon Jan's suggestion, use a separate struct xt_target for NFPROTO_BRIDGE having its own target callback returning EBT_CONTINUE instead of cloning the module. Signed-off-by: Thomas Graf <tgraf@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-16 18:32:13 +01:00
Linus Torvalds	0f6e0e8448	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (33 commits) AppArmor: kill unused macros in lsm.c AppArmor: cleanup generated files correctly KEYS: Add an iovec version of KEYCTL_INSTANTIATE KEYS: Add a new keyctl op to reject a key with a specified error code KEYS: Add a key type op to permit the key description to be vetted KEYS: Add an RCU payload dereference macro AppArmor: Cleanup make file to remove cruft and make it easier to read SELinux: implement the new sb_remount LSM hook LSM: Pass -o remount options to the LSM SELinux: Compute SID for the newly created socket SELinux: Socket retains creator role and MLS attribute SELinux: Auto-generate security_is_socket_class TOMOYO: Fix memory leak upon file open. Revert "selinux: simplify ioctl checking" selinux: drop unused packet flow permissions selinux: Fix packet forwarding checks on postrouting selinux: Fix wrong checks for selinux_policycap_netpeer selinux: Fix check for xfrm selinux context algorithm ima: remove unnecessary call to ima_must_measure IMA: remove IMA imbalance checking ...	2011-03-16 09:15:43 -07:00
Linus Torvalds	26a992dbc2	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs: (46 commits) fs/9p: Make the writeback_fid owned by root fs/9p: Writeback dirty data before setattr fs/9p: call vmtruncate before setattr 9p opeation fs/9p: Properly update inode attributes on link fs/9p: Prevent multiple inclusion of same header fs/9p: Workaround vfs rename rehash bug fs/9p: Mark directory inode invalid for many directory inode operations fs/9p: Add . and .. dentry revalidation flag fs/9p: mark inode attribute invalid on rename, unlink and setattr fs/9p: Add support for marking inode attribute invalid fs/9p: Initialize root inode number for dotl fs/9p: Update link count correctly on different file system operations fs/9p: Add drop_inode 9p callback fs/9p: Add direct IO support in cached mode fs/9p: Fix inode i_size update in file_write fs/9p: set default readahead pages in cached mode fs/9p: Move writeback fid to v9fs_inode fs/9p: Add v9fs_inode fs/9p: Don't set stat.st_blocks based on nrpages fs/9p: Add inode hashing ...	2011-03-16 08:58:09 -07:00
Linus Torvalds	bd2895eead	Merge branch 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq * 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: fix build failure introduced by s/freezeable/freezable/ workqueue: add system_freezeable_wq rds/ib: use system_wq instead of rds_ib_fmr_wq net/9p: replace p9_poll_task with a work net/9p: use system_wq instead of p9_mux_wq xfs: convert to alloc_workqueue() reiserfs: make commit_wq use the default concurrency level ocfs2: use system_wq instead of ocfs2_quota_wq ext4: convert to alloc_workqueue() scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path scsi/be2iscsi,qla2xxx: convert to alloc_workqueue() misc/iwmc3200top: use system_wq instead of dedicated workqueues i2o: use alloc_workqueue() instead of create_workqueue() acpi: kacpi*_wq don't need WQ_MEM_RECLAIM fs/aio: aio_wq isn't used in memory reclaim path input/tps6507x-ts: use system_wq instead of dedicated workqueue cpufreq: use system_wq instead of dedicated workqueues wireless/ipw2x00: use system_wq instead of dedicated workqueues arm/omap: use system_wq in mailbox workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER	2011-03-16 08:20:19 -07:00
Dan Siemon	4a2b9c3756	net_sched: fix ip_tos2prio ECN support incorrectly maps ECN BESTEFFORT packets to TC_PRIO_FILLER (1) instead of TC_PRIO_BESTEFFORT (0) This means ECN enabled flows are placed in pfifo_fast/prio low priority band, giving ECN enabled flows [ECT(0) and CE codepoints] higher drop probabilities. This is rather unfortunate, given we would like ECN being more widely used. Ref : http://www.coverfire.com/archives/2011/03/13/pfifo_fast-and-ecn/ Signed-off-by: Dan Siemon <dan@coverfire.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Dave Täht <d@taht.net> Cc: Jonathan Morton <chromatix99@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-15 18:53:54 -07:00
Randy Dunlap	986d4abbdd	sunrpc: fix printk format warning Fix printk format build warning: net/sunrpc/xprtrdma/verbs.c:1463: warning: format '%llx' expects type 'long long unsigned int', but argument 3 has type 'dma_addr_t' Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-15 20:17:32 -04:00
j223yang@asset.uwaterloo.ca	4d4a76f330	xprt: remove redundant null check 'req' is dereferenced before checked for NULL. The patch simply removes the check. Signed-off-by: Jinqiu Yang<crindy646@gmail.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-15 20:16:14 -04:00
Linus Torvalds	422e6c4bc4	Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (57 commits) tidy the trailing symlinks traversal up Turn resolution of trailing symlinks iterative everywhere simplify link_path_walk() tail Make trailing symlink resolution in path_lookupat() iterative update nd->inode in __do_follow_link() instead of after do_follow_link() pull handling of one pathname component into a helper fs: allow AT_EMPTY_PATH in linkat(), limit that to CAP_DAC_READ_SEARCH Allow passing O_PATH descriptors via SCM_RIGHTS datagrams readlinkat(), fchownat() and fstatat() with empty relative pathnames Allow O_PATH for symlinks New kind of open files - "location only". ext4: Copy fs UUID to superblock ext3: Copy fs UUID to superblock. vfs: Export file system uuid via /proc/<pid>/mountinfo unistd.h: Add new syscalls numbers to asm-generic x86: Add new syscalls for x86_64 x86: Add new syscalls for x86_32 fs: Remove i_nlink check from file system link callback fs: Don't allow to create hardlink for deleted file vfs: Add open by file handle support ...	2011-03-15 15:48:13 -07:00
James Morris	a002951c97	Merge branch 'next' into for-linus	2011-03-16 09:41:17 +11:00
Eric Dumazet	7313714775	xfrm: fix __xfrm_route_forward() This function should return 0 in case of error, 1 if OK commit `452edd598f` (xfrm: Return dst directly from xfrm_lookup()) got it wrong. Reported-and-bisected-by: Michael Smith <msmith@cbnco.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-15 15:26:43 -07:00
David S. Miller	c337ffb68e	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2011-03-15 15:15:17 -07:00
Rémi Denis-Courmont	638be34459	Phonet: fix aligned-mode pipe socket buffer header reserve When the pipe uses aligned-mode data packets, we must reserve 4 bytes instead of 3 for the pipe protocol header. Otherwise the Phonet header would not be aligned, resulting in potentially corrupted headers with later unaligned memory writes. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-15 14:55:49 -07:00
David S. Miller	918690f981	Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6	2011-03-15 13:57:18 -07:00
David S. Miller	31111c26d9	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 Conflicts: Documentation/feature-removal-schedule.txt	2011-03-15 13:03:27 -07:00
Florian Westphal	2f5dc63123	netfilter: xt_addrtype: ipv6 support The kernel will refuse certain types that do not work in ipv6 mode. We can then add these features incrementally without risk of userspace breakage. Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 20:17:44 +01:00
Florian Westphal	de81bbea17	netfilter: ipt_addrtype: rename to xt_addrtype Followup patch will add ipv6 support. ipt_addrtype.h is retained for compatibility reasons, but no longer used by the kernel. Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 20:16:20 +01:00
John W. Linville	106af2c99a	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2011-03-15 14:16:48 -04:00
Tommi Virtanen	b09734b1f4	libceph: Fix base64-decoding when input ends in newline. It used to return -EINVAL because it thought the end was not aligned to 4 bytes. Clean up superfluous src < end test in if, the while itself guarantees that. Signed-off-by: Tommi Virtanen <tommi.virtanen@dreamhost.com> Signed-off-by: Sage Weil <sage@newdream.net>	2011-03-15 09:14:02 -07:00
Aneesh Kumar K.V	c0aa4caf4c	net/9p: Implement syncfs 9P operation Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:38 -05:00
Venkateswararao Jujjuri (JV)	f735195d51	[net/9p] Small non-IO PDUs for zero-copy supporting transports. If a transport prefers payload to be sent separate from the PDU (P9_TRANS_PREF_PAYLOAD_SEP), there is no need to allocate msize PDU buffers(struct p9_fcall). This patch allocates only upto 4k buffers for this kind of transports and there won't be any change to the legacy transports. Hence, this patch on top of zero copy changes allows user to specify higher msizes through the mount option without hogging the kernel heap. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:36 -05:00
Venkateswararao Jujjuri (JV)	ca41bb3e21	[net/9p] Handle Zero Copy TREAD/RERROR case in !dotl case. This takes care of copying out error buffers from user buffer payloads when we are using zero copy. This happens because the only payload buffer the server has to respond to the request is the user buffer given for the zero copy read. Because we only use zerocopy when the amount of data to transfer is greater than a certain size (currently 4K) and error strings are limited to ERRMAX (currently 128) we don't need to worry about there being sufficient space for the error to fit in the payload. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:36 -05:00
Venkateswararao Jujjuri (JV)	2c66523fd2	[net/9p] readdir zerocopy changes for 9P2000.L protocol. Modify p9_client_readdir() to check the transport preference and act according If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload separately instead of putting it directly on PDU. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)	1fc52481c2	[net/9p] Write side zerocopy changes for 9P2000.L protocol. Modify p9_client_write() to check the transport preference and act accordingly. If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload separately instead of putting it directly on PDU. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)	bb2f8a5515	[net/9p] Read side zerocopy changes for 9P2000.L protocol. Modify p9_client_read() to check the transport preference and act accordingly. If the preference is P9_TRANS_PREF_PAYLOAD_SEP, send the payload separately instead of putting it directly on PDU. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)	6f69c395ce	[net/9p] Add preferences to transport layer. This patch adds preferences field to the p9_trans_module. Through this, now transport layer can express its preference about the payload. i.e if payload neds to be part of the PDU or it prefers it to be sent sepearetly so that the transport layer can handle it in a better way. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)	4038866dab	[net/9p] Add gup/zero_copy support to VirtIO transport layer. Modify p9_virtio_request() and req_done() functions to support additional payload sent down to the transport layer through tc->pubuf and tc->pkbuf. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:35 -05:00
Venkateswararao Jujjuri (JV)	9bb6c10a4e	[net/9p] Assign type of transaction to tc->pdu->id which is otherwise unsed. This will be used by the transport layer to determine the out going request type. Transport layer uses this information to correctly place the mapped pages in the PDU. Patches following this will make use of this to achieve zero copy. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:34 -05:00
Venkateswararao Jujjuri (JV)	022cae3655	[net/9p] Preparation and helper functions for zero copy This patch prepares p9_fcall structure for zero copy. Added fields send the payload buffer information to the transport layer. In addition it adds a 'private' field for the transport layer to store mapped/pinned page information so that it can be freed/unpinned during req_done. This patch also creates trans_common.[ch] to house helper functions. It adds the following helper functions. p9_release_req_pages - Release pages after the transaction. p9_nr_pages - Return number of pages needed to accomodate the payload. payload_gup - Translates user buffer into kernel pages. Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com> Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>	2011-03-15 09:57:34 -05:00
Vasiliy Kulikov	6a8ab06077	ipv6: netfilter: ip6_tables: fix infoleak to userspace Structures ip6t_replace, compat_ip6t_replace, and xt_get_revision are copied from userspace. Fields of these structs that are zero-terminated strings are not checked. When they are used as argument to a format string containing "%s" in request_module(), some sensitive information is leaked to userspace via argument of spawned modprobe process. The first bug was introduced before the git epoch; the second was introduced in `3bc3fe5e` (v2.6.25-rc1); the third is introduced by `6b7d31fc` (v2.6.15-rc1). To trigger the bug one should have CAP_NET_ADMIN. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 13:37:13 +01:00
Vasiliy Kulikov	78b7987676	netfilter: ip_tables: fix infoleak to userspace Structures ipt_replace, compat_ipt_replace, and xt_get_revision are copied from userspace. Fields of these structs that are zero-terminated strings are not checked. When they are used as argument to a format string containing "%s" in request_module(), some sensitive information is leaked to userspace via argument of spawned modprobe process. The first and the third bugs were introduced before the git epoch; the second was introduced in `2722971c` (v2.6.17-rc1). To trigger the bug one should have CAP_NET_ADMIN. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 13:36:05 +01:00
Vasiliy Kulikov	42eab94fff	netfilter: arp_tables: fix infoleak to userspace Structures ipt_replace, compat_ipt_replace, and xt_get_revision are copied from userspace. Fields of these structs that are zero-terminated strings are not checked. When they are used as argument to a format string containing "%s" in request_module(), some sensitive information is leaked to userspace via argument of spawned modprobe process. The first bug was introduced before the git epoch; the second is introduced by `6b7d31fc` (v2.6.15-rc1); the third is introduced by `6b7d31fc` (v2.6.15-rc1). To trigger the bug one should have CAP_NET_ADMIN. Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 13:35:21 +01:00
Changli Gao	4656c4d61a	netfilter: xt_connlimit: remove connlimit_rnd_inited A potential race condition when generating connlimit_rnd is also fixed. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 13:26:32 +01:00
Changli Gao	3e0d5149e6	netfilter: xt_connlimit: use hlist instead The header of hlist is smaller than list. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 13:25:42 +01:00
Changli Gao	0e23ca14f8	netfilter: xt_connlimit: use kmalloc() instead of kzalloc() All the members are initialized after kzalloc(). Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 13:24:56 +01:00
Changli Gao	8183e3a88a	netfilter: xt_connlimit: fix daddr connlimit in SNAT scenario We use the reply tuples when limiting the connections by the destination addresses, however, in SNAT scenario, the final reply tuples won't be ready until SNAT is done in POSTROUING or INPUT chain, and the following nf_conntrack_find_get() in count_tem() will get nothing, so connlimit can't work as expected. In this patch, the original tuples are always used, and an additional member addr is appended to save the address in either end. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-15 13:23:28 +01:00
Al Viro	326be7b484	Allow passing O_PATH descriptors via SCM_RIGHTS datagrams Just need to make sure that AF_UNIX garbage collector won't confuse O_PATHed socket on filesystem for real AF_UNIX opened socket. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-15 02:21:45 -04:00
Simon Horman	14e405461e	IPVS: Add __ip_vs_control_{init,cleanup}_sysctl() Break out the portions of __ip_vs_control_init() and __ip_vs_control_cleanup() where aren't necessary when CONFIG_SYSCTL is undefined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:37:01 +09:00
Simon Horman	fb1de432c1	IPVS: Conditionally define and use ip_vs_lblc{r}_table ip_vs_lblc_table and ip_vs_lblcr_table, and code that uses them are unnecessary when CONFIG_SYSCTL is undefined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:37:01 +09:00
Simon Horman	a7a86b8616	IPVS: Minimise ip_vs_leave when CONFIG_SYSCTL is undefined Much of ip_vs_leave() is unnecessary if CONFIG_SYSCTL is undefined. I tried an approach of breaking the now #ifdef'ed portions out into a separate function. However this appeared to grow the compiled code on x86_64 by about 200 bytes in the case where CONFIG_SYSCTL is defined. So I have gone with the simpler though less elegant #ifdef'ed solution for now. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:37:00 +09:00
Simon Horman	b27d777ec5	IPVS: Conditinally use sysctl_lblc{r}_expiration In preparation for not including sysctl_lblc{r}_expiration in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:59 +09:00
Simon Horman	8e1b0b1b56	IPVS: Add expire_quiescent_template() In preparation for not including sysctl_expire_quiescent_template in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:58 +09:00
Simon Horman	71a8ab6cad	IPVS: Add sysctl_expire_nodest_conn() In preparation for not including sysctl_expire_nodest_conn in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:58 +09:00
Simon Horman	7532e8d40c	IPVS: Add sysctl_sync_ver() In preparation for not including sysctl_sync_ver in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:57 +09:00
Simon Horman	59e0350ead	IPVS: Add {sysctl_sync_threshold,period}() In preparation for not including sysctl_sync_threshold in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:57 +09:00
Simon Horman	0cfa558e2c	IPVS: Add sysctl_nat_icmp_send() In preparation for not including sysctl_nat_icmp_send in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:56 +09:00
Simon Horman	84b3cee39f	IPVS: Add sysctl_snat_reroute() In preparation for not including sysctl_snat_reroute in struct netns_ipvs when CONFIG_SYCTL is not defined. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:55 +09:00
Simon Horman	ba4fd7e966	IPVS: Add ip_vs_route_me_harder() Add ip_vs_route_me_harder() to avoid repeating the same code twice. Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:54 +09:00
Julian Anastasov	6ef757f965	ipvs: rename estimator functions Rename ip_vs_new_estimator to ip_vs_start_estimator and ip_vs_kill_estimator to ip_vs_stop_estimator to better match their logic. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:54 +09:00
Julian Anastasov	ea9f22cce9	ipvs: optimize rates reading Move the estimator reading from estimation_timer to user context. ip_vs_read_estimator() will be used to decode the rate values. As the decoded rates are not set by estimation timer there is no need to reset them in ip_vs_zero_stats. There is no need ip_vs_new_estimator() to encode stats to rates, if the destination is in trash both the stats and the rates are inactive. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:53 +09:00
Julian Anastasov	55a3d4e15c	ipvs: properly zero stats and rates Currently, the new percpu counters are not zeroed and the zero commands do not work as expected, we still show the old sum of percpu values. OTOH, we can not reset the percpu counters from user context without causing the incrementing to use old and bogus values. So, as Eric Dumazet suggested fix that by moving all overhead to stats reading in user context. Do not introduce overhead in timer context (estimator) and incrementing (packet handling in softirqs). The new ustats0 field holds the zero point for all counter values, the rates always use 0 as base value as before. When showing the values to user space just give the difference between counters and the base values. The only drawback is that percpu stats are not zeroed, they are accessible only from /proc and are new interface, so it should not be a compatibility problem as long as the sum stats are correct after zeroing. Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:52 +09:00
Julian Anastasov	2a0751af09	ipvs: reorganize tot_stats The global tot_stats contains cpustats field just like the stats for dest and svc, so better use it to simplify the usage in estimation_timer. As tot_stats is registered as estimator we can remove the special ip_vs_read_cpu_stats call for tot_stats. Fix ip_vs_read_cpu_stats to be called under stats lock because it is still used as synchronization between estimation timer and user context (the stats readers). Also, make sure ip_vs_stats_percpu_show reads properly the u64 stats from user context. Signed-off-by: Julian Anastasov <ja@ssi.bg> Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:52 +09:00
Shan Wei	6060c74a3d	netfilter:ipvs: use kmemdup The semantic patch that makes this output is available in scripts/coccinelle/api/memdup.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/ Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:49 +09:00
Julian Anastasov	4a569c0c0f	ipvs: remove _bh from percpu stats reading ip_vs_read_cpu_stats is called only from timer, so no need for _bh locks. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:48 +09:00
Julian Anastasov	097fc76a08	ipvs: avoid lookup for fwmark 0 Restore the previous behaviour to lookup for fwmark service only when fwmark is non-null. This saves only CPU. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Hans Schillstrom <hans@schillstrom.com> Signed-off-by: Simon Horman <horms@verge.net.au>	2011-03-15 09:36:48 +09:00
Mark Rustad	698e1d23cf	net: dcbnl: Update copyright dates Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 17:02:42 -07:00
Sangtae Ha	b5ccd07337	tcp_cubic: fix low utilization of CUBIC with HyStart HyStart sets the initial exit point of slow start. Suppose that HyStart exits at 0.5BDP in a BDP network and no history exists. If the BDP of a network is large, CUBIC's initial cwnd growth may be too conservative to utilize the link. CUBIC increases the cwnd 20% per RTT in this case. Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:54:42 -07:00
Sangtae Ha	2b4636a5f8	tcp_cubic: make the delay threshold of HyStart less sensitive Make HyStart less sensitive to abrupt delay variations due to buffer bloat. Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com> Acked-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:54:42 -07:00
stephen hemminger	3b585b3449	tcp_cubic: enable high resolution ack time if needed This is a refined version of an earlier patch by Lucas Nussbaum. Cubic needs RTT values in milliseconds. If HZ < 1000 then the values will be too coarse. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:54:40 -07:00
stephen hemminger	17a6e9f1aa	tcp_cubic: fix clock dependency The hystart code was written with assumption that HZ=1000. Replace the use of jiffies with bictcp_clock as a millisecond real time clock. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reported-by: Lucas Nussbaum <lucas.nussbaum@loria.fr> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:54:39 -07:00
stephen hemminger	aac46324e1	tcp_cubic: make ack train delta value a parameter Make the spacing between ACK's that indicates a train a tuneable value like other hystart values. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:54:39 -07:00
stephen hemminger	c54b4b7655	tcp_cubic: fix comparison of jiffies Jiffies wraps around therefore the correct way to compare is to use cast to signed value. Note: cubic is not using full jiffies value on 64 bit arch because using full unsigned long makes struct bictcp grow too large for the available ca_priv area. Includes correction from Sangtae Ha to improve ack train detection. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:54:38 -07:00
stephen hemminger	febf081987	tcp: fix RTT for quick packets in congestion control In the congestion control interface, the callback for each ACK includes an estimated round trip time in microseconds. Some algorithms need high resolution (Vegas style) but most only need jiffie resolution. If RTT is not accurate (like a retransmission) -1 is used as a flag value. When doing coarse resolution if RTT is less than a a jiffie then 0 should be returned rather than no estimate. Otherwise algorithms that expect good ack's to trigger slow start (like CUBIC Hystart) will be confused. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:54:38 -07:00
Daniel Baluta	e5537bfc98	af_unix: update locking comment We latch our state using a spinlock not a r/w kind of lock. Signed-off-by: Daniel Baluta <dbaluta@ixiacom.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:25:33 -07:00
stephen hemminger	a461c0297f	bridge: skip forwarding delay if not using STP If Spanning Tree Protocol is not enabled, there is no good reason for the bridge code to wait for the forwarding delay period before enabling the link. The purpose of the forwarding delay is to allow STP to learn about other bridges before nominating itself. The only possible impact is that when starting up a new port the bridge may flood a packet now, where previously it might have seen traffic from the other host and preseeded the forwarding table. Includes change for local variable br already available in that func. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 15:06:49 -07:00
stephen hemminger	1faa4356a3	bridge: control carrier based on ports online This makes the bridge device behave like a physical device. In earlier releases the bridge always asserted carrier. This changes the behavior so that bridge device carrier is on only if one or more ports are in the forwarding state. This should help IPv6 autoconfiguration, DHCP, and routing daemons. I did brief testing with Network and Virt manager and they seem fine, but since this changes behavior of bridge, it should wait until net-next (2.6.39). Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Reviewed-by: Nicolas de Pesloüan <nicolas.2p.debian@free.fr> Tested-By: Adam Majer <adamm@zombino.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-14 14:29:02 -07:00
David S. Miller	201a11c1db	Merge branch 'tipc-Mar14-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6	2011-03-14 13:49:53 -07:00
Daniel Turull	05aebe2e5d	pktgen: bug fix in transmission headers with frags=0 (bug introduced by commit `26ad787962` (pktgen: speedup fragmented skbs) The headers of pktgen were incorrectly added in a pktgen packet without frags (frags=0). There was an offset in the pktgen headers. The cause was in reusing the pgh variable as a return variable in skb_put when adding the payload to the skb. Signed-off-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>	2011-03-14 13:47:40 -07:00
Felix Fietkau	9db372fdd5	mac80211: fix channel type recalculation with HT and non-HT interfaces When running an AP interface along with the cooked monitor interface created by hostapd, adding an interface and deleting it again triggers a channel type recalculation during which the (non-HT) monitor interface takes precedence over the HT AP interface, thus causing the channel type to be set to non-HT. Fix this by ensuring that a more wide channel type will not be overwritten by a less wide channel type. Signed-off-by: Felix Fietkau <nbd@openwrt.org> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-03-14 14:46:58 -04:00
Helmut Schaa	cf28d7934c	mac80211: Shortcut minstrel_ht rate setup for non-MRR capable devices Devices without multi rate retry support won't be able to use all rates as specified by mintrel_ht. Hence, we can simply skip setting up further rates as the devices will only use the first one. Also add a special case for devices with only two possible tx rates. We use sample_rate -> max_prob_rate for sampling and max_tp_rate -> max_prob_rate by default. Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-03-14 14:46:58 -04:00
Stephen Hemminger	fe8f661f2c	netfilter: nf_conntrack: fix sysctl memory leak Message in log because sysctl table was not empty at netns exit WARNING: at net/sysctl_net.c:84 sysctl_net_exit+0x2a/0x2c() Instrumenting showed that the nf_conntrack_timestamp was the entry that was being created but not cleared. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-14 19:20:44 +01:00
Linus Torvalds	5f40d42094	Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: NFS: NFSROOT should default to "proto=udp" nfs4: remove duplicated #include NFSv4: nfs4_state_mark_reclaim_nograce() should be static NFSv4: Fix the setlk error handler NFSv4.1: Fix the handling of the SEQUENCE status bits NFSv4/4.1: Fix nfs4_schedule_state_recovery abuses NFSv4.1 reclaim complete must wait for completion NFSv4: remove duplicate clientid in struct nfs_client NFSv4.1: Retry CREATE_SESSION on NFS4ERR_DELAY sunrpc: Propagate errors from xs_bind() through xs_create_sock() (try3-resend) Fix nfs_compat_user_ino64 so it doesn't cause problems if bit 31 or 63 are set in fileid nfs: fix compilation warning nfs: add kmalloc return value check in decode_and_add_ds SUNRPC: Remove resource leak in svc_rdma_send_error() nfs: close NFSv4 COMMIT vs. CLOSE race SUNRPC: Close a race in __rpc_wait_for_completion_task()	2011-03-14 11:19:50 -07:00
Patrick McHardy	42046e2e45	netfilter: x_tables: return -ENOENT for non-existant matches/targets As Stephen correctly points out, we need to return -ENOENT in xt_find_match()/xt_find_target() after the patch "netfilter: x_tables: misuse of try_then_request_module" in order to properly indicate a non-existant module to the caller. Signed-off-by: Patrick McHardy <kaber@trash.net>	2011-03-14 19:11:44 +01:00
Paul Gortmaker	1fa073803e	tipc: delete extra semicolon blocking node deletion Remove bogus semicolon only recently introduced in `34e46258cb` that blocks cleanup of nodes for N>1 on shutdown. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-14 12:21:12 -04:00
Al Viro	c9c6cac0c2	kill path_lookup() all remaining callers pass LOOKUP_PARENT to it, so flags argument can die; renamed to kern_path_parent() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2011-03-14 09:15:23 -04:00
Eric Dumazet	4e75db2e8f	inetpeer: should use call_rcu() variant After commit `7b46ac4e77` (inetpeer: Don't disable BH for initial fast RCU lookup.), we should use call_rcu() to wait proper RCU grace period. Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 23:22:23 -07:00
Steffen Klassert	d8647b79c3	xfrm: Add user interface for esn and big anti-replay windows This patch adds a netlink based user interface to configure esn and big anti-replay windows. The new netlink attribute XFRMA_REPLAY_ESN_VAL is used to configure the new implementation. If the XFRM_STATE_ESN flag is set, we use esn and support for big anti-replay windows for the configured state. If this flag is not set we use the new implementation with 32 bit sequence numbers. A big anti-replay window can be configured in this case anyway. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 20:22:31 -07:00
Steffen Klassert	2cd084678f	xfrm: Add support for IPsec extended sequence numbers This patch adds support for IPsec extended sequence numbers (esn) as defined in RFC 4303. The bits to manage the anti-replay window are based on a patch from Alex Badea. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 20:22:31 -07:00
Steffen Klassert	97e15c3a85	xfrm: Support anti-replay window size bigger than 32 packets As it is, the anti-replay bitmap in struct xfrm_replay_state can only accomodate 32 packets. Even though it is possible to configure anti-replay window sizes up to 255 packets from userspace. So we reject any packet with a sequence number within the configured window but outside the bitmap. With this patch, we represent the anti-replay window as a bitmap of variable length that can be accessed via the new struct xfrm_replay_state_esn. Thus, we have no limit on the window size anymore. To use the new anti-replay window implementantion, new userspace tools are required. We leave the old implementation untouched to stay in sync with old userspace tools. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 20:22:30 -07:00
Steffen Klassert	9fdc4883d9	xfrm: Move IPsec replay detection functions to a separate file To support multiple versions of replay detection, we move the replay detection functions to a separate file and make them accessible via function pointers contained in the struct xfrm_replay. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 20:22:30 -07:00
Steffen Klassert	d212a4c290	esp6: Add support for IPsec extended sequence numbers This patch adds IPsec extended sequence numbers support to esp6. We use the authencesn crypto algorithm to handle esp with separate encryption/authentication algorithms. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 20:22:29 -07:00
Steffen Klassert	0dc49e9b28	esp4: Add support for IPsec extended sequence numbers This patch adds IPsec extended sequence numbers support to esp4. We use the authencesn crypto algorithm to handle esp with separate encryption/authentication algorithms. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 20:22:29 -07:00
Steffen Klassert	1ce3644ade	xfrm: Use separate low and high order bits of the sequence numbers in xfrm_skb_cb To support IPsec extended sequence numbers, we split the output sequence numbers of xfrm_skb_cb in low and high order 32 bits and we add the high order 32 bits to the input sequence numbers. All users are updated accordingly. Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 20:22:28 -07:00
David S. Miller	27b61ae2d7	Merge branch 'tipc-Mar13-2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/net-next-2.6	2011-03-13 18:49:11 -07:00
Hiroaki SHIMODA	46af31800b	ipv4: Fix PMTU update. On current net-next-2.6, when Linux receives ICMP Type: 3, Code: 4 (Destination unreachable (Fragmentation needed)), icmp_unreach -> ip_rt_frag_needed (peer->pmtu_expires is set here) -> tcp_v4_err -> do_pmtu_discovery -> ip_rt_update_pmtu (peer->pmtu_expires is already set, so check_peer_pmtu is skipped.) -> check_peer_pmtu check_peer_pmtu is skipped and MTU is not updated. To fix this, let check_peer_pmtu execute unconditionally. And some minor fixes 1) Avoid potential peer->pmtu_expires set to be zero. 2) In check_peer_pmtu, argument of time_before is reversed. 3) check_peer_pmtu expects peer->pmtu_orig is initialized as zero, but not initialized. Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-13 18:37:49 -07:00
Allan Stephens	390bce4237	tipc: Eliminate obsolete routine for handling routed messages Eliminates a routine that is used in handling messages arriving from another cluster or zone. Such messages can no longer be received by TIPC now that multi-cluster and multi-zone network support has been eliminated. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:19 -04:00
Allan Stephens	7945c1fb02	tipc: Eliminate remaining support for routing table messages Gets rid of all remaining code relating to ROUTE_DISTRIBUTOR messages. These messages were only used in multi-cluster and multi-zone networks, which TIPC no longer supports. (For safety, TIPC now treats such messages the same way that it handles other unrecognized messages.) Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:19 -04:00
Allan Stephens	50d492321a	tipc: Remove bearer flag indicating existence of broadcast address Eliminates the flag in the TIPC bearer structure that indicates if the bearer supports broadcasting, since the flag is always set to 1 and serves no useful purpose. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:19 -04:00
Allan Stephens	f9107ebe7d	tipc: Don't respond to neighbor discovery request on blocked bearer Adds a check to prevent TIPC from trying to respond to an incoming LINK_CONFIG request message if the associated bearer is currently prohibited from sending messages. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:19 -04:00
Allan Stephens	d901a42b27	tipc: Eliminate unnecessary constant for neighbor discovery msg size Eliminates an unnecessary constant that defines the size of a LINK_CONFIG message, and uses one of the existing standard message size symbols in its place. (The defunct constant was located in the wrong place anyway, since it was grouped with other constants that define message users instead of message sizes.) Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:18 -04:00
Allan Stephens	a2b58de2e3	tipc: Remove unused field in bearer structure Eliminates a field in TIPC's bearer objects that is set, but never referenced. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:18 -04:00
Allan Stephens	50d3e6399a	tipc: Correct misnamed references to neighbor discovery domain Renames items that are improperly labelled as "network scope" items (which are represented by simple integer values) rather than "network domain" items (which are represented by <Z.C.N>-type network addresses). This change is purely cosmetic, and does not affect the operation of TIPC. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:18 -04:00
Allan Stephens	37b9c08a88	tipc: Optimizations to link creation code Enhances link creation code as follows: 1) Detects illegal attempts to add a requested link earlier in the link creation process. This prevents TIPC from wasting time initializing a link object it then throws away, and also eliminates the code needed to do the throwing away. 2) Passes in the node object associated with the requested link. This allows TIPC to eliminate a search to locate the node object, as well as code that attempted to create the node if it doesn't exist. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:18 -04:00
Allan Stephens	fa2bae2d5b	tipc: Give Tx of discovery responses priority over link messages Delay releasing the node lock when processing a neighbor discovery message until after the optional discovery response message has been sent. This helps ensure that any link protocol messages sent by a link endpoint created as a result of a neighbor discovery request are received after the discovery response is received, thereby giving the receiving node a chance to create a peer link endpoint to consume those link protocol messages, if one does not already exist. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:18 -04:00
Allan Stephens	a728750e4f	tipc: Cosmetic changes to neighbor discovery logic Reworks the appearance of the routine that processes incoming LINK_CONFIG messages to keep the main logic flow at a consistent level of indentation, and to add comments outlining the various phases involved in processing each message. This rework is being done to allow upcoming enhancements to this routine to be integrated more cleanly. The diff isn't really readable, so know that it was a case of the old code being like: tipc_disc_recv_msg(..) { if (in_own_cluster(orig)) { ... lines and lines of stuff ... } } which is now replaced with the more sane: tipc_disc_recv_msg(..) { if (!in_own_cluster(orig)) return; ... lines and lines of stuff ... } Instances of spin locking within the reindented block were replaced with the identical tipc_node_[un]lock() abstractions. Note that all these changes are cosmetic in nature, and do not change the way LINK_CONFIG messages are processed. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:18 -04:00
Allan Stephens	75f0aa4990	tipc: Fix redundant link field handling in link protocol message Ensures that the "redundant link exists" field of the LINK_PROTOCOL messages sent by a link endpoint is set if and only if the sending node has at least one other working link to the peer node. Previously, the bit was set only if there were at least 2 working links to the peer node, meaning the bit was incorrectly left unset in messages sent by a non-working link endpoint when exactly one alternate working link was available. The revised code now takes the state of the link sending the message into account when deciding if an alternate link exists. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:18 -04:00
Allan Stephens	77f167fcce	tipc: make msg_set_redundant_link() consistent with other set ops All the other boolean like msg_set_X(m) operations don't export both a msg_set_X(a) and a msg_clear_X(m), but instead just have the single msg_set_X(m, val) variant. Make the redundant_link one consistent by having the set take a value, and delete the msg_clear_redundant_link() anomoly. This is a cosmetic change and should not change behaviour. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:17 -04:00
Paul Gortmaker	8f19afb2db	tipc: cosmetic - function names are not to be full sentences Function names like "tipc_node_has_redundant_links" are unweildy and result in long lines even for simple lines. The "has" doesn't contribute any value add, so dropping that is a slight step in the right direction. This is a cosmetic change, basic result of: for i in `grep -l tipc_node_has_ *` ; do sed -i s/tipc_node_has_/tipc_node_/ $i ; done Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:17 -04:00
Allan Stephens	e7b3acb6a8	tipc: Eliminate timestamp from link protocol messages Removes support for the timestamp field of TIPC's link protocol messages. This field was previously used to hold an OS-dependent timestamp value that was used to assist in debugging early versions of TIPC. The field has now been deemed unnecessary and has been removed from the latest TIPC specification. This change has no impact on the operation of TIPC since the field was set by TIPC, but never referenced. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:17 -04:00
Allan Stephens	34e46258cb	tipc: manually inline net_start/stop, make assoc. vars static Relocates network-related variables into the subsystem files where they are now primarily used (following the recent rework of TIPC's node table), and converts globals into locals where possible. Changes the initialization of tipc_num_links from run-time to compile-time, and eliminates the net_start routine that becomes empty as a result. Also eliminates the corresponding net_stop routine by moving its (trivial) content into the one location that called the routine. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:17 -04:00
Allan Stephens	672d99e19a	tipc: Convert node object array to a hash table Replaces the dynamically allocated array of pointers to the cluster's node objects with a static hash table. Hash collisions are resolved using chaining, with a typical hash chain having only a single node, to avoid degrading performance during processing of incoming packets. The conversion to a hash table reduces the memory requirements for TIPC's node table to approximately the same size it had prior to the previous commit. In addition to the hash table itself, TIPC now also maintains a linked list for the node objects, sorted by ascending network address. This list allows TIPC to continue sending responses to user space applications that request node and link information in sorted order. The list also improves performance when name table update messages are sent by making it easier to identify the nodes that must be notified. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:17 -04:00
Allan Stephens	f831c963b5	tipc: Eliminate configuration for maximum number of cluster nodes Gets rid of the need for users to specify the maximum number of cluster nodes supported by TIPC. TIPC now automatically provides support for all 4K nodes allowed by its addressing scheme. Note: This change sets TIPC's memory usage to the amount used by a maximum size node table with 4K entries. An upcoming patch that converts the node table from a linear array to a hash table will compact the node table to a more efficient design, but for clarity it is nice to have all the Kconfig infrastruture go away separately. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:17 -04:00
Allan Stephens	d1bcb11544	tipc: Split up unified structure of network-related variables Converts the fields of the global "tipc_net" structure into individual variables. Since the struct was never referenced as a complete unit, its existence was pointless. This will facilitate upcoming changes to TIPC's node table and simpify upcoming relocation of the variables so they are only visible to the files that actually use them. This change is essentially cosmetic in nature, and doesn't affect the operation of TIPC. Signed-off-by: Allan Stephens <allan.stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:17 -04:00
Allan Stephens	9df3b7eb6e	tipc: Fix problem with missing link in "tipc-config -l" output Removes a race condition that could cause TIPC's internal counter of the number of links it has to neighboring nodes to have the incorrect value if two independent threads of control simultaneously create new link endpoints connecting to two different nodes using two different bearers. Such under counting would result in TIPC failing to list the final link(s) in its response to a configuration request to list all of the node's links. The counter is now updated atomically to ensure that simultaneous increments do not interfere with each other. Thanks go to Peter Butler <pbutler@pt.com> for his assistance in diagnosing and fixing this problem. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:16 -04:00
Allan Stephens	71092ea122	tipc: Add support for SO_RCVTIMEO socket option Adds support for the SO_RCVTIMEO socket option to TIPC's socket receive routines. Thanks go out to Raj Hegde <rajenhegde@yahoo.ca> for his contribution to the development and testing this enhancement. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:16 -04:00
Allan Stephens	f137917332	tipc: Cosmetic changes to node subscription code Relocates the code that notifies users of node subscriptions so that it is adjacent to the rest of the routines that implement TIPC's node subscription capability. Renames the name table routine that is invoked by a node subscription to better reflect its purpose and to be consistent with other, similar name table routines. These changes are cosmetic in nature, and do not alter the behavior of TIPC. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:16 -04:00
Allan Stephens	431697eb60	tipc: Prevent null pointer error when removing a node subscription Prevents a null pointer dereference from occurring if a node subscription is triggered at the same time that the subscribing port or publication is terminating the subscription. The problem arises if the triggering routine asynchronously activates and deregisters the node subscription while deregistration is already underway -- the deregistration routine may find that the pointer it has just verified to be non-NULL is now NULL. To avoid this race condition the triggering routine now simply marks the node subscription as defunct (to prevent it from re-activating) instead of deregistering it. The subscription is now both deregistered and destroyed only when the subscribing port or publication code terminates the node subscription. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:16 -04:00
Allan Stephens	a3796f895f	tipc: Add network address mask helper routines Introduces a pair of helper routines that convert the network address for a TIPC node into the network address for its cluster or zone. This is a cosmetic change designed to avoid future errors caused by the incorrect use of address bitmasks, and does not alter the existing operation of TIPC. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:16 -04:00
Allan Stephens	aa84729484	tipc: Correct broadcast link peer info when displaying links Fixes a typo in the calculation of the network address of a node's own cluster when generating a response to the configuration command that lists all of the node's links. The correct mask value for a <Z.C.N> network address uses 1's for the 8-bit zone and 12-bit cluster parts and 0's for the 12-bit node part. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:16 -04:00
Allan Stephens	0232fd0ac4	tipc: Allow receiving into iovec containing multiple entries Enhances TIPC's socket receive routines to support iovec structures containing more than a single entry. This change leverages existing sk_buff routines to do most of the work; the only significant change to TIPC itself is that an sk_buff now records how much data has been already consumed as an numeric offset, rather than as a pointer to the first unread data byte. Signed-off-by: Allan Stephens <Allan.Stephens@windriver.com> Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>	2011-03-13 16:35:16 -04:00
David S. Miller	bef55aebd5	decnet: Convert to use flowidn where applicable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:55 -08:00
David S. Miller	1958b856c1	net: Put fl6_* macros to struct flowi6 and use them again. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:55 -08:00
David S. Miller	4c9483b2fb	ipv6: Convert to use flowi6 where applicable. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:54 -08:00
David S. Miller	9cce96df5b	net: Put fl4_* macros to struct flowi4 and use them again. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:54 -08:00
David S. Miller	f42454d632	ipv4: Kill fib_semantic_match declaration from fib_lookup.h This function no longer exists. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:53 -08:00
David S. Miller	7e1dc7b6f7	net: Use flowi4 and flowi6 in xfrm layer. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:52 -08:00
David S. Miller	a1bbb0e698	netfilter: Use flowi4 and flowi6 in xt_TCPMSS Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:51 -08:00
David S. Miller	5a49d0e04d	netfilter: Use flowi4 and flowi6 in nf_conntrack_h323_main Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:51 -08:00
David S. Miller	b6f21b2680	ipv4: Use flowi4 in UDP Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:50 -08:00
David S. Miller	3073e5ab92	netfilter: Use flowi4 in nf_nat_standalone.c Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:50 -08:00
David S. Miller	da91981bee	ipv4: Use flowi4 in ipmr code. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:49 -08:00
David S. Miller	9ade22861f	ipv4: Use flowi4 in FIB layer. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:49 -08:00
David S. Miller	9d6ec93801	ipv4: Use flowi4 in public route lookup interfaces. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:48 -08:00
David S. Miller	68a5e3dd0a	ipv4: Use struct flowi4 internally in routing lookups. We will change the externally visible APIs next. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:48 -08:00
David S. Miller	22bd5b9b13	ipv4: Pass ipv4 flow objects into fib_lookup() paths. To start doing these conversions, we need to add some temporary flow4_* macros which will eventually go away when all the protocol code paths are changed to work on AF specific flowi objects. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:47 -08:00
David S. Miller	56bb8059e1	net: Break struct flowi out into AF specific instances. Now we have struct flowi4, flowi6, and flowidn for each address family. And struct flowi is just a union of them all. It might have been troublesome to convert flow_cache_uli_match() but as it turns out this function is completely unused and therefore can be simply removed. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:46 -08:00
David S. Miller	6281dcc94a	net: Make flowi ports AF dependent. Create two sets of port member accessors, one set prefixed by fl4_* and the other prefixed by fl6_* This will let us to create AF optimal flow instances. It will work because every context in which we access the ports, we have to be fully aware of which AF the flowi is anyways. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:46 -08:00
David S. Miller	1d28f42c1b	net: Put flowi_* prefix on AF independent members of struct flowi I intend to turn struct flowi into a union of AF specific flowi structs. There will be a common structure that each variant includes first, much like struct sock_common. This is the first step to move in that direction. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:44 -08:00
David S. Miller	ca116922af	xfrm: Eliminate "fl" and "pol" args to xfrm_bundle_ok(). There is only one caller of xfrm_bundle_ok(), and that always passes these parameters as NULL. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:43 -08:00
David S. Miller	78fbfd8a65	ipv4: Create and use route lookup helpers. The idea here is this minimizes the number of places one has to edit in order to make changes to how flows are defined and used. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-12 15:08:42 -08:00
Kevin Coffman	f8628220bb	gss:krb5 only include enctype numbers in gm_upcall_enctypes Make the value in gm_upcall_enctypes just the enctype values. This allows the values to be used more easily elsewhere. Signed-off-by: Kevin Coffman <kwc@citi.umich.edu> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Tom Tucker	5c635e09ce	RPCRDMA: Fix FRMR registration/invalidate handling. When the rpc_memreg_strategy is 5, FRMR are used to map RPC data. This mode uses an FRMR to map the RPC data, then invalidates (i.e. unregisers) the data in xprt_rdma_free. These FRMR are used across connections on the same mount, i.e. if the connection goes away on an idle timeout and reconnects later, the FRMR are not destroyed and recreated. This creates a problem for transport errors because the WR that invalidate an FRMR may be flushed (i.e. fail) leaving the FRMR valid. When the FRMR is later used to map an RPC it will fail, tearing down the transport and starting over. Over time, more and more of the FRMR pool end up in the wrong state resulting in seemingly random disconnects. This fix keeps track of the FRMR state explicitly by setting it's state based on the successful completion of a reg/inv WR. If the FRMR is ever used and found to be in the wrong state, an invalidate WR is prepended, re-syncing the FRMR state and avoiding the connection loss. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Tom Tucker	bd7ea31b9e	RPCRDMA: Fix to XDR page base interpretation in marshalling logic. The RPCRDMA marshalling logic assumed that xdr->page_base was an offset into the first page of xdr->page_list. It is in fact an offset into the xdr->page_list itself, that is, it selects the first page in the page_list and the offset into that page. The symptom depended in part on the rpc_memreg_strategy, if it was FRMR, or some other one-shot mapping mode, the connection would get torn down on a base and bounds error. When the badly marshalled RPC was retransmitted it would reconnect, get the error, and tear down the connection again in a loop forever. This resulted in a hung-mount. For the other modes, it would result in silent data corruption. This bug is most easily reproduced by writing more data than the filesystem has space for. This fix corrects the page_base assumption and otherwise simplifies the iov mapping logic. Signed-off-by: Tom Tucker <tom@ogc.us> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:39:27 -05:00
Andy Adamson	cbdabc7f8b	NFSv4.1: filelayout async error handler Use our own async error handler. Mark the layout as failed and retry i/o through the MDS on specified errors. Update the mds_offset in nfs_readpage_retry so that a failed short-read retry to a DS gets correctly resent through the MDS. Signed-off-by: Andy Adamson <andros@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:43 -05:00
Fred Isaman	eabf5baaaa	RPC: clarify rpc_run_task error handling rpc_run_task can only fail if it is not passed in a preallocated task. However, that is not at all clear with the current code. So remove several impossible to occur failure checks. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
Fred Isaman	cee6a5372f	RPC: remove check for impossible condition in rpc_make_runnable queue_work() only returns 0 or 1, never a negative value. Signed-off-by: Fred Isaman <iisaman@netapp.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-11 15:38:40 -05:00
John W. Linville	38c091590f	mac80211: implement support for cfg80211_ops->{get,set}_ringparam Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-03-11 15:34:10 -05:00
John W. Linville	3677713b79	wireless: add support for ethtool_ops->{get,set}_ringparam Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-03-11 14:16:58 -05:00
Jason Young	808118cb41	mac80211: do not enable ps if 802.1x controlled port is unblocked If dynamic_ps is disabled, enabling power save before the 4-way handshake completes may delay the station from being authorized to send/receive traffic, i.e. increase roaming times. It also may result in a failed 4-way handshake depending on the AP's timing requirements and beacon interval, and the station's listen interval. To fix this, prevent power save from being enabled while the station isn't authorized and recalculate power save whenever the station's authorized state changes. Signed-off-by: Jason Young <a.young.jason@gmail.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: John W. Linville <linville@tuxdriver.com>	2011-03-11 14:15:37 -05:00
John W. Linville	409ec36c32	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem	2011-03-11 14:11:11 -05:00
David S. Miller	1b7fe59322	ipv4: Kill flowi arg to fib_select_multipath() Completely unused. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-10 17:03:45 -08:00
David S. Miller	ff3fccb3d0	ipv4: Remove unnecessary test from ip_mkroute_input() fl->oif will always be zero on the input path, so there is no reason to test for that. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-10 17:01:01 -08:00
David S. Miller	dbdd9a52e3	ipv4: Remove redundant RCU locking in ip_check_mc(). All callers are under rcu_read_lock() protection already. Rename to ip_check_mc_rcu() to make it even more clear. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-10 16:37:26 -08:00
David S. Miller	33175d84ee	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/bnx2x/bnx2x_cmn.c	2011-03-10 14:26:00 -08:00
stephen hemminger	6dfbd87a20	ip6ip6: autoload ip6 tunnel Add necessary alias to autoload ip6ip6 tunnel module. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-10 14:18:48 -08:00
David S. Miller	bef6e7e768	Merge branch 'master' of /home/davem/src/GIT/linux-2.6/	2011-03-10 14:00:44 -08:00
Randy Dunlap	dcbcdf22f5	net: bridge builtin vs. ipv6 modular When configs BRIDGE=y and IPV6=m, this build error occurs: br_multicast.c:(.text+0xa3341): undefined reference to `ipv6_dev_get_saddr' BRIDGE_IGMP_SNOOPING is boolean; if it were tristate, then adding depends on IPV6 \|\| IPV6=n to BRIDGE_IGMP_SNOOPING would be a good fix. As it is currently, making BRIDGE depend on the IPV6 config works. Reported-by: Patrick Schaaf <netdev@bof.de> Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-10 13:45:57 -08:00
Ben Hutchings	4cea288aaf	sunrpc: Propagate errors from xs_bind() through xs_create_sock() xs_create_sock() is supposed to return a pointer or an ERR_PTR-encoded error, but it currently returns 0 if xs_bind() fails. Signed-off-by: Ben Hutchings <bhutchings@solarflare.com> Cc: stable@kernel.org [v2.6.37] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:58 -05:00
Jesper Juhl	a5e5026810	SUNRPC: Remove resource leak in svc_rdma_send_error() We leak the memory allocated to 'ctxt' when we return after 'ib_dma_mapping_error()' returns !=0. Signed-off-by: Jesper Juhl <jj@chaosbits.net> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:54 -05:00
Trond Myklebust	bf294b41ce	SUNRPC: Close a race in __rpc_wait_for_completion_task() Although they run as rpciod background tasks, under normal operation (i.e. no SIGKILL), functions like nfs_sillyrename(), nfs4_proc_unlck() and nfs4_do_close() want to be fully synchronous. This means that when we exit, we want all references to the rpc_task to be gone, and we want any dentry references etc. held by that task to be released. For this reason these functions call __rpc_wait_for_completion_task(), followed by rpc_put_task() in the expectation that the latter will be releasing the last reference to the rpc_task, and thus ensuring that the callback_ops->rpc_release() has been called synchronously. This patch fixes a race which exists due to the fact that rpciod calls rpc_complete_task() (in order to wake up the callers of __rpc_wait_for_completion_task()) and then subsequently calls rpc_put_task() without ensuring that these two steps are done atomically. In order to avoid adding new spin locks, the patch uses the existing waitqueue spin lock to order the rpc_task reference count releases between the waiting process and rpciod. The common case where nobody is waiting for completion is optimised for by checking if the RPC_TASK_ASYNC flag is cleared and/or if the rpc_task reference count is 1: in those cases we drop trying to grab the spin lock, and immediately free up the rpc_task. Those few processes that need to put the rpc_task from inside an asynchronous context and that do not care about ordering are given a new helper: rpc_put_task_async(). Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>	2011-03-10 15:04:52 -05:00
Stephen Hemminger	a252bebe22	tcp: mark tcp_congestion_ops read_mostly Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-10 00:40:17 -08:00
David S. Miller	cc7e17ea04	ipv4: Optimize flow initialization in fib_validate_source(). Like in commit `44713b67db` ("ipv4: Optimize flow initialization in output route lookup." we can optimize the on-stack flow setup to only initialize the members which are actually used. Otherwise we bzero the entire structure, then initialize explicitly the first half of it. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-09 20:57:50 -08:00
David S. Miller	67e28ffd86	ipv4: Optimize flow initialization in input route lookup. Like in commit `44713b67db` ("ipv4: Optimize flow initialization in output route lookup." we can optimize the on-stack flow setup to only initialize the members which are actually used. Otherwise we bzero the entire structure, then initialize explicitly the first half of it. Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-09 20:42:07 -08:00
David S. Miller	7343ff31eb	ipv6: Don't create clones of host routes. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=29252 Addresses https://bugzilla.kernel.org/show_bug.cgi?id=30462 In commit `d80bc0fd26` ("ipv6: Always clone offlink routes.") we forced the kernel to always clone offlink routes. The reason we do that is to make sure we never bind an inetpeer to a prefixed route. The logic turned on here has existed in the tree for many years, but was always off due to a protecting CPP define. So perhaps it's no surprise that there is a logic bug here. The problem is that we canot clone a route that is already a host route (ie. has DST_HOST set). Because if we do, an identical entry already exists in the routing tree and therefore the ip6_rt_ins() call is going to fail. This sets off a series of failures and high cpu usage, because when ip6_rt_ins() fails we loop retrying this operation a few times in order to handle a race between two threads trying to clone and insert the same host route at the same time. Fix this by simply using the route as-is when DST_HOST is set. Reported-by: slash@ac.auone-net.jp Reported-by: Ernst Sjöstrand <ernstp@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-09 19:55:25 -08:00
J. Bruce Fields	352b5d13c0	svcrpc: fix bad argument in unix_domain_find "After merging the nfsd tree, today's linux-next build (powerpc ppc64_defconfig) produced this warning: net/sunrpc/svcauth_unix.c: In function 'unix_domain_find': net/sunrpc/svcauth_unix.c:58: warning: passing argument 1 of +'svcauth_unix_domain_release' from incompatible pointer type net/sunrpc/svcauth_unix.c:41: note: expected 'struct auth_domain ' but argument +is of type 'struct unix_domain ' Introduced by commit `8b3e07ac90` ("svcrpc: fix rare race on unix_domain creation")." Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: J. Bruce Fields <bfields@redhat.com>	2011-03-09 22:40:30 -05:00
Vasiliy Kulikov	8909c9ad8f	net: don't allow CAP_NET_ADMIN to load non-netdev kernel modules Since `a8f80e8ff9` any process with CAP_NET_ADMIN may load any module from /lib/modules/. This doesn't mean that CAP_NET_ADMIN is a superset of CAP_SYS_MODULE as modules are limited to /lib/modules/**. However, CAP_NET_ADMIN capability shouldn't allow anybody load any module not related to networking. This patch restricts an ability of autoloading modules to netdev modules with explicit aliases. This fixes CVE-2011-1019. Arnd Bergmann suggested to leave untouched the old pre-v2.6.32 behavior of loading netdev modules by name (without any prefix) for processes with CAP_SYS_MODULE to maintain the compatibility with network scripts that use autoloading netdev modules by aliases like "eth0", "wlan0". Currently there are only three users of the feature in the upstream kernel: ipip, ip_gre and sit. root@albatros:~# capsh --drop=$(seq -s, 0 11),$(seq -s, 13 34) -- root@albatros:~# grep Cap /proc/$$/status CapInh: 0000000000000000 CapPrm: fffffff800001000 CapEff: fffffff800001000 CapBnd: fffffff800001000 root@albatros:~# modprobe xfs FATAL: Error inserting xfs (/lib/modules/2.6.38-rc6-00001-g2bf4ca3/kernel/fs/xfs/xfs.ko): Operation not permitted root@albatros:~# lsmod \| grep xfs root@albatros:~# ifconfig xfs xfs: error fetching interface information: Device not found root@albatros:~# lsmod \| grep xfs root@albatros:~# lsmod \| grep sit root@albatros:~# ifconfig sit sit: error fetching interface information: Device not found root@albatros:~# lsmod \| grep sit root@albatros:~# ifconfig sit0 sit0 Link encap:IPv6-in-IPv4 NOARP MTU:1480 Metric:1 root@albatros:~# lsmod \| grep sit sit 10457 0 tunnel4 2957 1 sit For CAP_SYS_MODULE module loading is still relaxed: root@albatros:~# grep Cap /proc/$$/status CapInh: 0000000000000000 CapPrm: ffffffffffffffff CapEff: ffffffffffffffff CapBnd: ffffffffffffffff root@albatros:~# ifconfig xfs xfs: error fetching interface information: Device not found root@albatros:~# lsmod \| grep xfs xfs 745319 0 Reference: https://lkml.org/lkml/2011/2/24/203 Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Signed-off-by: Michael Tokarev <mjt@tls.msk.ru> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Kees Cook <kees.cook@canonical.com> Signed-off-by: James Morris <jmorris@namei.org>	2011-03-10 10:25:19 +11:00
Daniel Turull	03a14ab134	pktgen: fix errata in show results The units in show_results in pktgen were not correct. The results are in usec but it was displayed nsec. Reported-by: Jong-won Lee <ljw@handong.edu> Signed-off-by: Daniel Turull <daniel.turull@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2011-03-09 14:11:00 -08:00

... 2 3 4 5 6 ...

18826 commits