Commit Graph

15714 Commits (dbaa154178341689faaa08fbf40b94ae5ca1d6c0)

Author SHA1 Message Date
Joe Perches 288fcee8b7 net/ipv4/tcp_input.c: fix compilation breakage when FASTRETRANS_DEBUG > 1
Commit: c720c7e838 missed these.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 23:52:39 -07:00
David S. Miller 5953a30347 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2010-05-31 23:44:57 -07:00
Eric Dumazet b1faf56664 net: sock_queue_err_skb() dont mess with sk_forward_alloc
Correct sk_forward_alloc handling for error_queue would need to use a
backlog of frames that softirq handler could not deliver because socket
is owned by user thread. Or extend backlog processing to be able to
process normal and error packets.

Another possibility is to not use mem charge for error queue, this is
what I implemented in this patch.

Note: this reverts commit 29030374
(net: fix sk_forward_alloc corruptions), since we dont need to lock
socket anymore.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 23:44:05 -07:00
Eric Dumazet 7489aec8ee netfilter: xtables: stackptr should be percpu
commit f3c5c1bfd4 (netfilter: xtables: make ip_tables reentrant)
introduced a performance regression, because stackptr array is shared by
all cpus, adding cache line ping pongs. (16 cpus share a 64 bytes cache
line)

Fix this using alloc_percpu()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-By: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-31 16:41:35 +02:00
Xiaotian Feng c936e8bd1d netfilter: don't xt_jumpstack_alloc twice in xt_register_table
In xt_register_table, xt_jumpstack_alloc is called first, later
xt_replace_table is used. But in xt_replace_table, xt_jumpstack_alloc
will be used again. Then the memory allocated by previous xt_jumpstack_alloc
will be leaked. We can simply remove the previous xt_jumpstack_alloc because
there aren't any users of newinfo between xt_jumpstack_alloc and
xt_replace_table.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: Patrick McHardy <kaber@trash.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Acked-By: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-31 16:41:09 +02:00
David S. Miller 64960848ab Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ 2010-05-31 05:46:45 -07:00
Ian Campbell 06c4648d46 arp_notify: allow drivers to explicitly request a notification event.
Currently such notifications are only generated when the device comes up or the
address changes. However one use case for these notifications is to enable
faster network recovery after a virtual machine migration (by causing switches
to relearn their MAC tables). A migration appears to the network stack as a
temporary loss of carrier and therefore does not trigger either of the current
conditions. Rather than adding carrier up as a trigger (which can cause issues
when interfaces a flapping) simply add an interface which the driver can use
to explicitly trigger the notification.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Cc: stable@kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:27:44 -07:00
Dan Carpenter 9bfca3c6e5 caif: cleanup: remove duplicate checks
"phyinfo" can never be null here because we assigned it an address, so I
removed both the assert and the second check inside the if statement.  I
removed the "phyinfo->phy_layer != NULL" check as well because that was
asserted earlier.

Walter Harms suggested I move the "phyinfo->phy_ref_count++;" outside
the if condition for readability, so I have done that.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:24:16 -07:00
Dan Carpenter f5d72af9f3 caif: remove unneeded null check in caif_connect()
We already dereferenced uaddr towards the start of the function when we
checked that "uaddr->sa_family != AF_CAIF".  Both the check here and the
earlier check were added in bece7b2398d0: "caif: Rewritten socket
implementation".  Before that patch, we assumed that we recieved a valid
pointer for uaddr, and based on that, I have removed this check.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:24:16 -07:00
Julia Lawall 042604d2a3 net/dccp: Use memdup_user
Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:24:14 -07:00
Julia Lawall 16dff91804 net/can: Use memdup_user
Use memdup_user when user data is immediately copied into the
allocated region.

The semantic patch that makes this change is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression from,to,size,flag;
position p;
identifier l1,l2;
@@

-  to = \(kmalloc@p\|kzalloc@p\)(size,flag);
+  to = memdup_user(from,size);
   if (
-      to==NULL
+      IS_ERR(to)
                 || ...) {
   <+... when != goto l1;
-  -ENOMEM
+  PTR_ERR(to)
   ...+>
   }
-  if (copy_from_user(to, from, size) != 0) {
-    <+... when != goto l2;
-    -EFAULT
-    ...+>
-  }
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:24:11 -07:00
Eric Dumazet d7fd1b5747 tcp: tcp_md5_hash_skb_data() frag_list handling
tcp_md5_hash_skb_data() should handle skb->frag_list, and eventually
recurse.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:24:02 -07:00
Eric Dumazet 15e83ed788 net: remove zap_completion_queue
netpoll does an interesting work in zap_completion_queue(), but this was
before we did skb orphaning before delivering packets to device.

It now makes sense to add a test in dev_kfree_skb_irq() to not queue a
skb if already orphaned, and to remove netpoll zap_completion_queue() as
a bonus.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:24:01 -07:00
Eric Dumazet 27f39c73e6 net: Use __this_cpu_inc() in fast path
This patch saves 224 bytes of text on my machine.

__this_cpu_inc() generates a single instruction, using no scratch
registers :

  65 ff 04 25 a8 30 01 00      incl   %gs:0x130a8

instead of :

  48 c7 c2 80 30 01 00         mov    $0x13080,%rdx
  65 48 8b 04 25 88 ea 00 00   mov    %gs:0xea88,%rax
  83 44 10 28 01               addl   $0x1,0x28(%rax,%rdx,1)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-31 00:24:01 -07:00
David S. Miller 92b4522f72 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-05-31 00:10:35 -07:00
Eric Dumazet 2903037400 net: fix sk_forward_alloc corruptions
As David found out, sock_queue_err_skb() should be called with socket
lock hold, or we risk sk_forward_alloc corruption, since we use non
atomic operations to update this field.

This patch adds bh_lock_sock()/bh_unlock_sock() pair to three spots.
(BH already disabled)

1) skb_tstamp_tx() 
2) Before calling ip_icmp_error(), in __udp4_lib_err() 
3) Before calling ipv6_icmp_error(), in __udp6_lib_err()

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-29 00:20:48 -07:00
Rémi Denis-Courmont 7dfde179c3 Phonet: listening socket lock protects the connected socket list
The accept()'d socket need to be unhashed while the (listen()'ing)
socket lock is held. This fixes a race condition that could lead to an
OOPS.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-29 00:18:50 -07:00
Dan Carpenter 97dc875f90 caif: unlock on error path in cfserl_receive()
There was an spin_unlock missing on the error path.  The spin_lock was
tucked in with the declarations so it was hard to spot.  I added a new
line.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Acked-by: Sjur Brændeland <sjurbren@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-29 00:18:50 -07:00
Julia Lawall 5daf47bb4e net/rds: Add missing mutex_unlock
Add a mutex_unlock missing on the error path.  In each case, whenever the
label out is reached from elsewhere in the function, mutex is not locked.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* mutex_lock(E1);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* mutex_unlock(E1);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Reviewed-by: Zach Brown <zach.brown@oracle.com>
Acked-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-29 00:18:48 -07:00
Changli Gao 5b0daa3474 skb: make skb_recycle_check() return a bool value
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-29 00:12:13 -07:00
Brian Haley 6057fd78a8 IPv6: fix Mobile IPv6 regression
Commit f4f914b5 (net: ipv6 bind to device issue) caused
a regression with Mobile IPv6 when it changed the meaning
of fl->oif to become a strict requirement of the route
lookup.  Instead, only force strict mode when
sk->sk_bound_dev_if is set on the calling socket, getting
the intended behavior and fixing the regression.

Tested-by: Arnaud Ebalard <arno@natisbad.org>
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 23:02:35 -07:00
David S. Miller e5e5cf4c71 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-05-28 22:36:41 -07:00
Johannes Berg 368a07d26a mac80211: make a function static
sparse correctly complains that
__ieee80211_get_channel_mode is not static.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-28 13:41:27 -04:00
Linus Torvalds 72da3bc0cb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits)
  netlink: bug fix: wrong size was calculated for vfinfo list blob
  netlink: bug fix: don't overrun skbs on vf_port dump
  xt_tee: use skb_dst_drop()
  netdev/fec: fix ifconfig eth0 down hang issue
  cnic: Fix context memory init. on 5709.
  drivers/net: Eliminate a NULL pointer dereference
  drivers/net/hamradio: Eliminate a NULL pointer dereference
  be2net: Patch removes redundant while statement in loop.
  ipv6: Add GSO support on forwarding path
  net: fix __neigh_event_send()
  vhost: fix the memory leak which will happen when memory_access_ok fails
  vhost-net: fix to check the return value of copy_to/from_user() correctly
  vhost: fix to check the return value of copy_to/from_user() correctly
  vhost: Fix host panic if ioctl called with wrong index
  net: fix lock_sock_bh/unlock_sock_bh
  net/iucv: Add missing spin_unlock
  net: ll_temac: fix checksum offload logic
  net: ll_temac: fix interrupt bug when interrupt 0 is used
  sctp: dubious bitfields in sctp_transport
  ipmr: off by one in __ipmr_fill_mroute()
  ...
2010-05-28 10:18:40 -07:00
Scott Feldman 045de01a17 netlink: bug fix: wrong size was calculated for vfinfo list blob
The wrong size was being calculated for vfinfo.  In one case, it was over-
calculating using nlmsg_total_size on attrs, in another case, it was
under-calculating by assuming ifla_vf_* structs are packed together, but
each struct is it's own attr w/ hdr (and padding).

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 03:42:43 -07:00
Scott Feldman 8ca9418350 netlink: bug fix: don't overrun skbs on vf_port dump
Noticed by Patrick McHardy: was continuing to fill skb after a
nla_put_failure, ignoring the size calculated by upper layer.  Now,
return -EMSGSIZE on any overruns, but also allow netdev to
fail ndo_get_vf_port with error other than -EMSGSIZE, thus unwinding
nest.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 03:42:18 -07:00
Eric Dumazet 50636af715 xt_tee: use skb_dst_drop()
After commit 7fee226a (net: add a noref bit on skb dst), its wrong to
use : dst_release(skb_dst(skb)), since we could decrement a refcount
while skb dst was not refcounted.

We should use skb_dst_drop(skb) instead.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 03:41:17 -07:00
Herbert Xu 0aa6827151 ipv6: Add GSO support on forwarding path
Currently we disallow GSO packets on the IPv6 forward path.
This patch fixes this.

Note that I discovered that our existing GSO MTU checks (e.g.,
IPv4 forwarding) are buggy in that they skip the check altogether,
when they really should be checking gso_size + header instead.

I have also been lazy here in that I haven't bothered to segment
the GSO packet by hand before generating an ICMP message.  Someone
should add that to be 100% correct.

Reported-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 01:57:17 -07:00
Eric Dumazet a47311380e net: fix __neigh_event_send()
commit 7fee226ad2 (net: add a noref bit on skb dst) missed one spot
where an skb is enqueued, with a possibly not refcounted dst entry.

__neigh_event_send() inserts skb into arp_queue, so we must make sure
dst entry is refcounted, or dst entry can be freed by garbage collector
after caller exits from rcu protected section.

Reported-by: Ingo Molnar <mingo@elte.hu>
Tested-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-28 01:57:16 -07:00
Linus Torvalds c5617b200a Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (61 commits)
  tracing: Add __used annotation to event variable
  perf, trace: Fix !x86 build bug
  perf report: Support multiple events on the TUI
  perf annotate: Fix up usage of the build id cache
  x86/mmiotrace: Remove redundant instruction prefix checks
  perf annotate: Add TUI interface
  perf tui: Remove annotate from popup menu after failure
  perf report: Don't start the TUI if -D is used
  perf: Fix getline undeclared
  perf: Optimize perf_tp_event_match()
  perf: Remove more code from the fastpath
  perf: Optimize the !vmalloc backed buffer
  perf: Optimize perf_output_copy()
  perf: Fix wakeup storm for RO mmap()s
  perf-record: Share per-cpu buffers
  perf-record: Remove -M
  perf: Ensure that IOC_OUTPUT isn't used to create multi-writer buffers
  perf, trace: Optimize tracepoints by using per-tracepoint-per-cpu hlist to track events
  perf, trace: Optimize tracepoints by removing IRQ-disable from perf/tracepoint interaction
  perf tui: Allow disabling the TUI on a per command basis in ~/.perfconfig
  ...
2010-05-27 15:23:47 -07:00
Linus Torvalds ade61088bc Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Fix another nfs_wb_page() deadlock
  NFS: Ensure that we mark the inode as dirty if we exit early from commit
  NFS: Fix a lock imbalance typo in nfs_access_cache_shrinker
  sunrpc: fix leak on error on socket xprt setup
2010-05-27 10:18:44 -07:00
Akinobu Mita 92e99a98bb iucv: convert cpu notifier to return encapsulate errno value
By the previous modification, the cpu notifier can return encapsulate
errno value.  This converts the cpu notifiers for iucv.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-27 09:12:48 -07:00
Eric Dumazet 8a74ad60a5 net: fix lock_sock_bh/unlock_sock_bh
This new sock lock primitive was introduced to speedup some user context
socket manipulation. But it is unsafe to protect two threads, one using
regular lock_sock/release_sock, one using lock_sock_bh/unlock_sock_bh

This patch changes lock_sock_bh to be careful against 'owned' state.
If owned is found to be set, we must take the slow path.
lock_sock_bh() now returns a boolean to say if the slow path was taken,
and this boolean is used at unlock_sock_bh time to call the appropriate
unlock function.

After this change, BH are either disabled or enabled during the
lock_sock_bh/unlock_sock_bh protected section. This might be misleading,
so we rename these functions to lock_sock_fast()/unlock_sock_fast().

Reported-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Anton Blanchard <anton@samba.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-27 00:30:53 -07:00
Julia Lawall a56635a56f net/iucv: Add missing spin_unlock
Add a spin_unlock missing on the error path.  There seems like no reason
why the lock should continue to be held if the kzalloc fail.

The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)

// <smpl>
@@
expression E1;
@@

* spin_lock(E1,...);
  <+... when != E1
  if (...) {
    ... when != E1
*   return ...;
  }
  ...+>
* spin_unlock(E1,...);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 21:09:51 -07:00
J. Bruce Fields 0a68b0bed0 sunrpc: fix leak on error on socket xprt setup
Also collect exit code together while we're at it.

Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-26 08:43:50 -04:00
Dan Carpenter ed0f160ad6 ipmr: off by one in __ipmr_fill_mroute()
This fixes a smatch warning:
	net/ipv4/ipmr.c +1917 __ipmr_fill_mroute(12) error: buffer overflow
	'(mrt)->vif_table' 32 <= 32

The ipv6 version had the same issue.

Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-26 00:38:56 -07:00
Linus Torvalds b1cdc4670b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (63 commits)
  drivers/net/usb/asix.c: Fix pointer cast.
  be2net: Bug fix to avoid disabling bottom half during firmware upgrade.
  proc_dointvec: write a single value
  hso: add support for new products
  Phonet: fix potential use-after-free in pep_sock_close()
  ath9k: remove VEOL support for ad-hoc
  ath9k: change beacon allocation to prefer the first beacon slot
  sock.h: fix kernel-doc warning
  cls_cgroup: Fix build error when built-in
  macvlan: do proper cleanup in macvlan_common_newlink() V2
  be2net: Bug fix in init code in probe
  net/dccp: expansion of error code size
  ath9k: Fix rx of mcast/bcast frames in PS mode with auto sleep
  wireless: fix sta_info.h kernel-doc warnings
  wireless: fix mac80211.h kernel-doc warnings
  iwlwifi: testing the wrong variable in iwl_add_bssid_station()
  ath9k_htc: rare leak in ath9k_hif_usb_alloc_tx_urbs()
  ath9k_htc: dereferencing before check in hif_usb_tx_cb()
  rt2x00: Fix rt2800usb TX descriptor writing.
  rt2x00: Fix failed SLEEP->AWAKE and AWAKE->SLEEP transitions.
  ...
2010-05-25 16:59:51 -07:00
Rémi Denis-Courmont e513480e28 Phonet: fix potential use-after-free in pep_sock_close()
sk_common_release() might destroy our last reference to the socket.
So an extra temporary reference is needed during cleanup.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-25 16:08:39 -07:00
David S. Miller a261af927d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-05-25 13:15:11 -07:00
Alex Riesen ef7ffe8f06 sunrpc: use formatting of module name in SUNRPC
gcc-4.3.3 produces the warning:
  "format not a string literal and no format arguments"

Signed-off-by: Alex Riesen <raa.lkml@gmail.com>
Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
Cc: Chuck Lever <cel@citi.umich.edu>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Tom Talpey <tmtalpey@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:03 -07:00
Alexey Dobriyan 4be929be34 kernel-wide: replace USHORT_MAX, SHORT_MAX and SHORT_MIN with USHRT_MAX, SHRT_MAX and SHRT_MIN
- C99 knows about USHRT_MAX/SHRT_MAX/SHRT_MIN, not
  USHORT_MAX/SHORT_MAX/SHORT_MIN.

- Make SHRT_MIN of type s16, not int, for consistency.

[akpm@linux-foundation.org: fix drivers/dma/timb_dma.c]
[akpm@linux-foundation.org: fix security/keys/keyring.c]
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: WANG Cong <xiyou.wangcong@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-25 08:07:02 -07:00
Yoichi Yuasa d9b52dc6fd net/dccp: expansion of error code size
Because MIPS's EDQUOT value is 1133(0x46d).
It's larger than u8.

Signed-off-by: Yoichi Yuasa <yuasa@linux-mips.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 18:37:02 -07:00
Randy Dunlap a0c9101c05 wireless: fix sta_info.h kernel-doc warnings
Fix sta_info.h kernel-doc warnings:

Warning(net/mac80211/sta_info.h:164): No description found for parameter 'tid_active_rx[STA_TID_NUM]'
Warning(net/mac80211/sta_info.h:164): Excess struct/union/enum/typedef member 'tid_state_rx' description in 'sta_ampdu_mlme'

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 15:07:43 -04:00
John W. Linville 3dc3fc52ea Revert "ath9k: Group Key fix for VAPs"
This reverts commit 03ceedea97.

This patch was reported to cause a regression in which connectivity is
lost and cannot be reestablished after a suspend/resume cycle.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-24 14:59:27 -04:00
Linus Torvalds f13771187b Merge branch 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/ioctl' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  uml: Pushdown the bkl from harddog_kern ioctl
  sunrpc: Pushdown the bkl from sunrpc cache ioctl
  sunrpc: Pushdown the bkl from ioctl
  autofs4: Pushdown the bkl from ioctl
  uml: Convert to unlocked_ioctls to remove implicit BKL
  ncpfs: BKL ioctl pushdown
  coda: Clean-up whitespace problems in pioctl.c
  coda: BKL ioctl pushdown
  drivers: Push down BKL into various drivers
  isdn: Push down BKL into ioctl functions
  scsi: Push down BKL into ioctl functions
  dvb: Push down BKL into ioctl functions
  smbfs: Push down BKL into ioctl function
  coda/psdev: Remove BKL from ioctl function
  um/mmapper: Remove BKL usage
  sn_hwperf: Kill BKL usage
  hfsplus: Push down BKL into ioctl function
2010-05-24 08:01:10 -07:00
Linus Torvalds a69eee4988 Revert "ath9k: Group Key fix for VAPs"
This reverts commit 03ceedea97, since it
breaks resume from suspend-to-ram on Rafael's Acer Ferrari One.
NetworkManager thinks everything is ok, but it can't connect to the AP
to get an IP address after the resume.

In fact, it even breaks resume for non-ath9k chipsets: reverting it also
fixes Rafael's Toshiba Protege R500 with the iwlagn driver.  As Johannes
says:

  "Indeed, this patch needs to be reverted. That mac80211 change is wrong
   and completely unnecessary."

Reported-and-requested-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Cc: Daniel Yingqiang Ma <yma.cool@gmail.com>
Cc: John W. Linville <linville@tuxdriver.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-05-24 07:45:43 -07:00
Linus Torvalds 4fd5ec509b Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
  9p: Optimize TCREATE by eliminating a redundant fid clone.
  9p: cleanup: remove unneeded assignment
  9p: Add mksock support
  fs/9p: Make sure we properly instantiate dentry.
  9p: add 9P2000.L rename operation
  9p: add 9P2000.L statfs operation
  9p: VFS switches for 9p2000.L: VFS switches
  9p: VFS switches for 9p2000.L: protocol and client changes
2010-05-24 07:41:13 -07:00
Herbert Xu 8286274284 tun: Update classid on packet injection
This patch makes tun update its socket classid every time we
inject a packet into the network stack.  This is so that any
updates made by the admin to the process writing packets to
tun is effected.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 00:14:10 -07:00
Herbert Xu f845172531 cls_cgroup: Store classid in struct sock
Up until now cls_cgroup has relied on fetching the classid out of
the current executing thread.  This runs into trouble when a packet
processing is delayed in which case it may execute out of another
thread's context.

Furthermore, even when a packet is not delayed we may fail to
classify it if soft IRQs have been disabled, because this scenario
is indistinguishable from one where a packet unrelated to the
current thread is processed by a real soft IRQ.

In fact, the current semantics is inherently broken, as a single
skb may be constructed out of the writes of two different tasks.
A different manifestation of this problem is when the TCP stack
transmits in response of an incoming ACK.  This is currently
unclassified.

As we already have a concept of packet ownership for accounting
purposes in the skb->sk pointer, this is a natural place to store
the classid in a persistent manner.

This patch adds the cls_cgroup classid in struct sock, filling up
an existing hole on 64-bit :)

The value is set at socket creation time.  So all sockets created
via socket(2) automatically gains the ID of the thread creating it.
Whenever another process touches the socket by either reading or
writing to it, we will change the socket classid to that of the
process if it has a valid (non-zero) classid.

For sockets created on inbound connections through accept(2), we
inherit the classid of the original listening socket through
sk_clone, possibly preceding the actual accept(2) call.

In order to minimise risks, I have not made this the authoritative
classid.  For now it is only used as a backup when we execute
with soft IRQs disabled.  Once we're completely happy with its
semantics we can use it as the sole classid.

Footnote: I have rearranged the error path on cls_group module
creation.  If we didn't do this, then there is a window where
someone could create a tc rule using cls_group before the cgroup
subsystem has been registered.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-24 00:12:34 -07:00
Sjur Braendeland dcda138d2f caif: Bugfix - use MSG_TRUNC in receive
Fixed handling when skb don't fit in user buffer,
instead of returning -EMSGSIZE, the buffer is truncated (just
as unix seqpakcet does).

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:57:43 -07:00
Sjur Braendeland a9a8f1070d caif: Bugfix - missing spin_unlock
Splint found missing spin_unlock.
Corrected this an some other trivial split warnings.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:57:43 -07:00
Sjur Braendeland ca6a09f25c caif: Bugfix - Poll can't return POLLHUP while connecting.
Discovered bug when testing async connect.
While connecting poll should not return POLLHUP,
but POLLOUT when connected.
Also fixed the sysfs flow-control-counters.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:57:42 -07:00
Sjur Braendeland 638e628a60 caif: Bugfix - handle mem-allocation failures
Discovered bugs when injecting slab allocation failures.
Add checks on all memory allocation.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:57:42 -07:00
Sjur Braendeland 7aecf4944f caif: Bugfix - use standard Linux lists
Discovered bug when running high number of parallel connect requests.
Replace buggy home brewed list with linux/list.h.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:57:41 -07:00
Sjur Braendeland 9e4b816bc3 caif: Bugfix - wait_ev*_timeout returns long.
Discovered bug when testing on 64bit architecture.
Fixed by using long to store result from wait_event_interruptible_timeout.

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:57:41 -07:00
Daniel Lezcano 8ce6cebc2f net-2.6 : V2 - fix dev_get_valid_name
the commit:

commit d90310243f
Author: Octavian Purdila <opurdila@ixiacom.com>
Date:   Wed Nov 18 02:36:59 2009 +0000

    net: device name allocation cleanups

introduced a bug when there is a hash collision making impossible
to rename a device with eth%d. This bug is very hard to reproduce
and appears rarely.

The problem is coming from we don't pass a temporary buffer to
__dev_alloc_name but 'dev->name' which is modified by the function.

A detailed explanation is here:

http://marc.info/?l=linux-netdev&m=127417784011987&w=2

Changelog:
 V2 : replaced strings comparison by pointers comparison

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Reviewed-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:24:36 -07:00
David Howells 253683bbfb rtnetlink: Fix error handling in do_setlink()
Commit c02db8c6290bb992442fec1407643c94cc414375:

	Author:  Chris Wright <chrisw@sous-sol.org>
	Date:    Sun May 16 01:05:45 2010 -0700
	Subject: rtnetlink: make SR-IOV VF interface symmetric

adds broken error handling to do_setlink() in net/core/rtnetlink.c.  The
problem is the following chunk of code:

	if (tb[IFLA_VFINFO_LIST]) {
		struct nlattr *attr;
		int rem;
		nla_for_each_nested(attr, tb[IFLA_VFINFO_LIST], rem) {
			if (nla_type(attr) != IFLA_VF_INFO)
  ---->				goto errout;
			err = do_setvfinfo(dev, attr);
			if (err < 0)
				goto errout;
			modified = 1;
		}
	}

which can get to errout without setting err, resulting in the following error:

net/core/rtnetlink.c: In function 'do_setlink':
net/core/rtnetlink.c:904: warning: 'err' may be used uninitialized in this function

Change the code to return -EINVAL in this case.  Note that this might not be
the appropriate error though.

Signed-off-by: David Howells <dhowells@redhat.com>
cc: Chris Wright <chrisw@sous-sol.org>
cc: David S. Miller <davem@davemloft.net>
Acked-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:11:08 -07:00
Mike Frysinger eb1669aed9 net-caif: drop redundant Kconfig entries
There is already a submenu entry that is always displayed, so there is
no need to also show a dedicated CAIF comment.

Drop dead commented code while we're here, and change the submenu text
to better match the style everyone else is using.

Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:11:08 -07:00
Eric Dumazet 53b0f08042 net_sched: Fix qdisc_notify()
Ben Pfaff reported a kernel oops and provided a test program to
reproduce it.

https://kerneltrap.org/mailarchive/linux-netdev/2010/5/21/6277805

tc_fill_qdisc() should not be called for builtin qdisc, or it
dereference a NULL pointer to get device ifindex.

Fix is to always use tc_qdisc_dump_ignore() before calling
tc_fill_qdisc().

Reported-by: Ben Pfaff <blp@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:11:07 -07:00
Denis Kirjanov a6c0f8217c ieee802154: Fix possible NULL pointer dereference in wpan_phy_alloc
Check for NULL pointer after kzalloc

Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-23 23:11:07 -07:00
Frederic Weisbecker 9918ff26b3 sunrpc: Pushdown the bkl from sunrpc cache ioctl
Pushdown the bkl to cache_ioctl_pipefs.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Nfs <linux-nfs@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
2010-05-22 17:44:20 +02:00
Frederic Weisbecker 674b604cdd sunrpc: Pushdown the bkl from ioctl
Pushdown the bkl to rpc_pipe_ioctl.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: "J. Bruce Fields" <bfields@fieldses.org>
Cc: Neil Brown <neilb@suse.de>
Cc: Nfs <linux-nfs@vger.kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: John Kacur <jkacur@redhat.com>
Cc: Arnd Bergmann <arnd@arndb.de>
2010-05-22 17:44:19 +02:00
Linus Torvalds 1756ac3d3c Merge branch 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus
* 'virtio' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus: (27 commits)
  drivers/char: Eliminate use after free
  virtio: console: Accept console size along with resize control message
  virtio: console: Store each console's size in the console structure
  virtio: console: Resize console port 0 on config intr only if multiport is off
  virtio: console: Add support for nonblocking write()s
  virtio: console: Rename wait_is_over() to will_read_block()
  virtio: console: Don't always create a port 0 if using multiport
  virtio: console: Use a control message to add ports
  virtio: console: Move code around for future patches
  virtio: console: Remove config work handler
  virtio: console: Don't call hvc_remove() on unplugging console ports
  virtio: console: Return -EPIPE to hvc_console if we lost the connection
  virtio: console: Let host know of port or device add failures
  virtio: console: Add a __send_control_msg() that can send messages without a valid port
  virtio: Revert "virtio: disable multiport console support."
  virtio: add_buf_gfp
  trans_virtio: use virtqueue_xxx wrappers
  virtio-rng: use virtqueue_xxx wrappers
  virtio_ring: remove a level of indirection
  virtio_net: use virtqueue_xxx wrappers
  ...

Fix up conflicts in drivers/net/virtio_net.c due to new virtqueue_xxx
wrappers changes conflicting with some other cleanups.
2010-05-21 17:22:52 -07:00
Sripathi Kodi 4681dbdacb 9p: add 9P2000.L rename operation
I made a V2 of this patch on top of my patches for VFS switches.
All the changes were due to change in some offsets.

rename - change name of file or directory

size[4] Trename tag[2] fid[4] newdirfid[4] name[s]
size[4] Rrename tag[2]

The rename message is used to change the name of a file, possibly moving it
to a new directory.  The 9P wstat message can only rename a file within the
same directory.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-05-21 16:44:34 -05:00
Sripathi Kodi bda8e77520 9p: add 9P2000.L statfs operation
I made a V2 of this patch on top of my patches for VFS switches. The
change was adding v9fs_statfs pointer to v9fs_super_ops_dotl
instead of v9fs_super_ops.

statfs - get file system statistics

size[4] Tstatfs tag[2] fid[4]
size[4] Rstatfs tag[2] type[4] bsize[4] blocks[8] bfree[8] bavail[8]
                files[8] ffree[8] fsid[8] namelen[4]

The statfs message is used to request file system information returned
by the statfs(2) system call, which is used by df(1) to report file
system and disk space usage.

Signed-off-by: Jim Garlick <garlick@llnl.gov>
Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-05-21 16:44:33 -05:00
Sripathi Kodi c56e4acf55 9p: VFS switches for 9p2000.L: protocol and client changes
Prepare p9pdu_read/write functions to handle multiple protocols.

Signed-off-by: Sripathi Kodi <sripathik@in.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-05-21 16:44:32 -05:00
Jens Axboe ee9a3607fb Merge branch 'master' into for-2.6.35
Conflicts:
	fs/ext3/fsync.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:27:26 +02:00
Jens Axboe 35f3d14dbb pipe: add support for shrinking and growing pipes
This patch adds F_GETPIPE_SZ and F_SETPIPE_SZ fcntl() actions for
growing and shrinking the size of a pipe and adjusts pipe.c and splice.c
(and relay and network splice) usage to work with these larger (or smaller)
pipes.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-05-21 21:12:40 +02:00
Johannes Berg 579d7534ca cfg80211: add missing braces
Specifying a valid channel type will get
goto out rather than continuing, due to
missing braces. This affects both remain
on channel and action frame TX commands.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-21 14:40:01 -04:00
Felix Fietkau 9fbc630c89 cfg80211: fix crash in cfg80211_set_freq()
Since wdev can be NULL, check it before dereferencing it

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-21 14:40:00 -04:00
Sujith 22fe88d3d8 cfg80211: Fix signal_type comparison
signal_type is enum cfg80211_signal_type.

This fixes the gcc warning:

scan.c: In function `cfg80211_inform_bss':
scan.c:518:6: warning: comparison between `enum cfg80211_signal_type' and `enum nl80211_bss'
scan.c: In function `cfg80211_inform_bss_frame':
scan.c:574:6: warning: comparison between `enum cfg80211_signal_type' and `enum nl80211_bss'

Signed-off-by: Sujith <Sujith.Manoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-05-21 14:39:59 -04:00
Eric W. Biederman a1b3f594dc net: Expose all network devices in a namespaces in sysfs
This reverts commit aaf8cdc34d.

Drivers like the ipw2100 call device_create_group when they
are initialized and device_remove_group when they are shutdown.
Moving them between namespaces deletes their sysfs groups early.

In particular the following call chain results.
netdev_unregister_kobject -> device_del -> kobject_del -> sysfs_remove_dir
With sysfs_remove_dir recursively deleting all of it's subdirectories,
and nothing adding them back.

Ouch!

Therefore we need to call something that ultimate calls sysfs_mv_dir
as that sysfs function can move sysfs directories between namespaces
without deleting their subdirectories or their contents.   Allowing
us to avoid placing extra boiler plate into every driver that does
something interesting with sysfs.

Currently the function that provides that capability is device_rename.
That is the code works without nasty side effects as originally written.

So remove the misguided fix for moving devices between namespaces.  The
bug in the kobject layer that inspired it has now been recognized and
fixed.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:34 -07:00
Eric W. Biederman 910a7e905f netlink: Implment netlink_broadcast_filtered
When netlink sockets are used to convey data that is in a namespace
we need a way to select a subset of the listening sockets to deliver
the packet to.  For the network namespace we have been doing this
by only transmitting packets in the correct network namespace.

For data belonging to other namespaces netlink_bradcast_filtered
provides a mechanism that allows us to examine the destination
socket and to decide if we should transmit the specified packet
to it.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:32 -07:00
Eric W. Biederman d6523ddf23 net/sysfs: Fix the bitrot in network device kobject namespace support
I had a couple of stupid bugs in:
netns: Teach network device kobjects which namespace they are in.

- I duplicated the Kconfig for the NET_NS
- The build was broken when sysfs was not compiled in

The sysfs breakage is because after I moved the operations
for the sysfs to the kobject layer, to make things cleaner
I forgot to move the ifdefs.  Opps.

I'm not quite certain how I got introduced a second NET_NS Kconfig,
but it was probably a 3 way merge somewhere along the way that
did not notice that the NET_NS Kconfig option had mvoed and thout
that was a bug.  It probably slipped in because it used to be the
sysfs patches were the first patches in my network namespace patches.
Some things just don't go like you would expect.

Neither of these bugs actually affect anything in the common case
but they should be fixed.

Thanks to Serge for noticing they were present.

Reported-by: Serge E. Hallyn <serue@us.ibm.com>
Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
Acked-by: David S. Miller <davem@davemloft.net>
2010-05-21 09:37:32 -07:00
Eric W. Biederman 608b4b9548 netns: Teach network device kobjects which namespace they are in.
The problem.  Network devices show up in sysfs and with the network
namespace active multiple devices with the same name can show up in
the same directory, ouch!

To avoid that problem and allow existing applications in network namespaces
to see the same interface that is currently presented in sysfs, this
patch enables the tagging directory support in sysfs.

By using the network namespace pointers as tags to separate out the
the sysfs directory entries we ensure that we don't have conflicts
in the directories and applications only see a limited set of
the network devices.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:32 -07:00
Chris Wright 2c3c8bea60 sysfs: add struct file* to bin_attr callbacks
This allows bin_attr->read,write,mmap callbacks to check file specific data
(such as inode owner) as part of any privilege validation.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2010-05-21 09:37:31 -07:00
Steven Rostedt ff5f149b6a Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/core-7
Conflicts:
	include/linux/ftrace_event.h
	include/trace/ftrace.h
	kernel/trace/trace_event_perf.c
	kernel/trace/trace_kprobe.c
	kernel/trace/trace_syscalls.c

Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-21 11:49:57 -04:00
Tom Herbert 76cc8b13a6 net: fix problem in dequeuing from input_pkt_queue
Fix some issues introduced in batch skb dequeuing for input_pkt_queue.
The primary issue it that the queue head must be incremented only
after a packet has been processed, that is only after
__netif_receive_skb has been called.  This is needed for the mechanism
to prevent OOO packet in RFS.  Also when flushing the input_pkt_queue
and process_queue, the process queue should be done first to prevent
OOO packets.

Because the input_pkt_queue has been effectively split into two queues,
the calculation of the tail ptr is no longer correct.  The correct value
would be head+input_pkt_queue->len+process_queue->len.  To avoid
this calculation we added an explict input_queue_tail in softnet_data.
The tail value is simply incremented when queuing to input_pkt_queue.

Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-21 00:38:33 -07:00
David S. Miller 41499bd676 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2010-05-20 23:12:18 -07:00
Herbert Xu 622e0ca1cd gro: Fix bogus gso_size on the first fraglist entry
When GRO produces fraglist entries, and the resulting skb hits
an interface that is incapable of TSO but capable of FRAGLIST,
we end up producing a bogus packet with gso_size non-zero.

This was reported in the field with older versions of KVM that
did not set the TSO bits on tuntap.

This patch fixes that.

Reported-by: Igor Zhang <yugzhang@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-20 23:07:56 -07:00
Linus Torvalds f8965467f3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1674 commits)
  qlcnic: adding co maintainer
  ixgbe: add support for active DA cables
  ixgbe: dcb, do not tag tc_prio_control frames
  ixgbe: fix ixgbe_tx_is_paused logic
  ixgbe: always enable vlan strip/insert when DCB is enabled
  ixgbe: remove some redundant code in setting FCoE FIP filter
  ixgbe: fix wrong offset to fc_frame_header in ixgbe_fcoe_ddp
  ixgbe: fix header len when unsplit packet overflows to data buffer
  ipv6: Never schedule DAD timer on dead address
  ipv6: Use POSTDAD state
  ipv6: Use state_lock to protect ifa state
  ipv6: Replace inet6_ifaddr->dead with state
  cxgb4: notify upper drivers if the device is already up when they load
  cxgb4: keep interrupts available when the ports are brought down
  cxgb4: fix initial addition of MAC address
  cnic: Return SPQ credit to bnx2x after ring setup and shutdown.
  cnic: Convert cnic_local_flags to atomic ops.
  can: Fix SJA1000 command register writes on SMP systems
  bridge: fix build for CONFIG_SYSFS disabled
  ARCNET: Limit com20020 PCI ID matches for SOHARD cards
  ...

Fix up various conflicts with pcmcia tree drivers/net/
{pcmcia/3c589_cs.c, wireless/orinoco/orinoco_cs.c and
wireless/orinoco/spectrum_cs.c} and feature removal
(Documentation/feature-removal-schedule.txt).

Also fix a non-content conflict due to pm_qos_requirement getting
renamed in the PM tree (now pm_qos_request) in net/mac80211/scan.c
2010-05-20 21:04:44 -07:00
Linus Torvalds f39d01be4c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (44 commits)
  vlynq: make whole Kconfig-menu dependant on architecture
  add descriptive comment for TIF_MEMDIE task flag declaration.
  EEPROM: max6875: Header file cleanup
  EEPROM: 93cx6: Header file cleanup
  EEPROM: Header file cleanup
  agp: use NULL instead of 0 when pointer is needed
  rtc-v3020: make bitfield unsigned
  PCI: make bitfield unsigned
  jbd2: use NULL instead of 0 when pointer is needed
  cciss: fix shadows sparse warning
  doc: inode uses a mutex instead of a semaphore.
  uml: i386: Avoid redefinition of NR_syscalls
  fix "seperate" typos in comments
  cocbalt_lcdfb: correct sections
  doc: Change urls for sparse
  Powerpc: wii: Fix typo in comment
  i2o: cleanup some exit paths
  Documentation/: it's -> its where appropriate
  UML: Fix compiler warning due to missing task_struct declaration
  UML: add kernel.h include to signal.c
  ...
2010-05-20 09:20:59 -07:00
Linus Torvalds 46ee964509 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/suspend-2.6:
  PM: PM QOS update fix
  Freezer / cgroup freezer: Update stale locking comments
  PM / platform_bus: Allow runtime PM by default
  i2c: Fix bus-level power management callbacks
  PM QOS update
  PM / Hibernate: Fix block_io.c printk warning
  PM / Hibernate: Group swap ops
  PM / Hibernate: Move the first_sector out of swsusp_write
  PM / Hibernate: Separate block_io
  PM / Hibernate: Snapshot cleanup
  FS / libfs: Implement simple_write_to_buffer
  PM / Hibernate: document open(/dev/snapshot) side effects
  PM / Runtime: Add sysfs debug files
  PM: Improve device power management document
  PM: Update device power management document
  PM: Allow runtime_suspend methods to call pm_schedule_suspend()
  PM: pm_wakeup - switch to using bool
2010-05-20 09:03:55 -07:00
Joerg Marx fc350777c7 netfilter: nf_conntrack: fix a race in __nf_conntrack_confirm against nf_ct_get_next_corpse()
This race was triggered by a 'conntrack -F' command running in parallel
to the insertion of a hash for a new connection. Losing this race led to
a dead conntrack entry effectively blocking traffic for a particular
connection until timeout or flushing the conntrack hashes again.
Now the check for an already dying connection is done inside the lock.

Signed-off-by: Joerg Marx <joerg.marx@secunet.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-20 15:55:30 +02:00
Linus Torvalds f72caf7e49 Merge branch 'for-2.6.35' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.35' of git://linux-nfs.org/~bfields/linux: (45 commits)
  Revert "nfsd4: distinguish expired from stale stateids"
  nfsd: safer initialization order in find_file()
  nfs4: minor callback code simplification, comment
  NFSD: don't report compiled-out versions as present
  nfsd4: implement reclaim_complete
  nfsd4: nfsd4_destroy_session must set callback client under the state lock
  nfsd4: keep a reference count on client while in use
  nfsd4: mark_client_expired
  nfsd4: introduce nfs4_client.cl_refcount
  nfsd4: refactor expire_client
  nfsd4: extend the client_lock to cover cl_lru
  nfsd4: use list_move in move_to_confirmed
  nfsd4: fold release_session into expire_client
  nfsd4: rename sessionid_lock to client_lock
  nfsd4: fix bare destroy_session null dereference
  nfsd4: use local variable in nfs4svc_encode_compoundres
  nfsd: further comment typos
  sunrpc: centralise most calls to svc_xprt_received
  nfsd4: fix unlikely race in session replay case
  nfsd4: fix filehandle comment
  ...
2010-05-19 17:24:54 -07:00
Linus Torvalds 6a6be470c3 Merge branch 'nfs-for-2.6.35' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.35' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (78 commits)
  SUNRPC: Don't spam gssd with upcall requests when the kerberos key expired
  SUNRPC: Reorder the struct rpc_task fields
  SUNRPC: Remove the 'tk_magic' debugging field
  SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst
  NFS: Don't call iput() in nfs_access_cache_shrinker
  NFS: Clean up nfs_access_zap_cache()
  NFS: Don't run nfs_access_cache_shrinker() when the mask is GFP_NOFS
  SUNRPC: Ensure rpcauth_prune_expired() respects the nr_to_scan parameter
  SUNRPC: Ensure memory shrinker doesn't waste time in rpcauth_prune_expired()
  SUNRPC: Dont run rpcauth_cache_shrinker() when gfp_mask is GFP_NOFS
  NFS: Read requests can use GFP_KERNEL.
  NFS: Clean up nfs_create_request()
  NFS: Don't use GFP_KERNEL in rpcsec_gss downcalls
  NFSv4: Don't use GFP_KERNEL allocations in state recovery
  SUNRPC: Fix xs_setup_bc_tcp()
  SUNRPC: Replace jiffies-based metrics with ktime-based metrics
  ktime: introduce ktime_to_ms()
  SUNRPC: RPC metrics and RTT estimator should use same RTT value
  NFS: Calldata for nfs4_renew_done()
  NFS: Squelch compiler warning in nfs_add_server_stats()
  ...
2010-05-19 17:24:05 -07:00
Linus Torvalds 98c89cdd3a Merge branch 'bkl/procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing
* 'bkl/procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/frederic/random-tracing:
  sunrpc: Include missing smp_lock.h
  procfs: Kill the bkl in ioctl
  procfs: Push down the bkl from ioctl
  procfs: Use generic_file_llseek in /proc/vmcore
  procfs: Use generic_file_llseek in /proc/kmsg
  procfs: Use generic_file_llseek in /proc/kcore
  procfs: Kill BKL in llseek on proc base
2010-05-19 17:23:28 -07:00
Michael S. Tsirkin dc3f5e68f8 trans_virtio: use virtqueue_xxx wrappers
Switch trans_virtio to new virtqueue_xxx wrappers.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2010-05-19 22:15:45 +09:30
Herbert Xu 622ccdf107 ipv6: Never schedule DAD timer on dead address
This patch ensures that all places that schedule the DAD timer
look at the address state in a safe manner before scheduling the
timer.  This ensures that we don't end up with pending timers
after deleting an address.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:56:06 -07:00
Herbert Xu f2344a131b ipv6: Use POSTDAD state
This patch makes use of the new POSTDAD state.  This prevents
a race between DAD completion and failure.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:55:27 -07:00
Herbert Xu 4c5ff6a6fe ipv6: Use state_lock to protect ifa state
This patch makes use of the new state_lock to synchronise between
updates to the ifa state.  This fixes the issue where a remotely
triggered address deletion (through DAD failure) coincides with a
local administrative address deletion, causing certain actions to
be performed twice incorrectly.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:54:18 -07:00
Herbert Xu e9d3e08497 ipv6: Replace inet6_ifaddr->dead with state
This patch replaces the boolean dead flag on inet6_ifaddr with
a state enum.  This allows us to roll back changes when deleting
an address according to whether DAD has completed or not.

This patch only adds the state field and does not change the logic.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 15:36:06 -07:00
Randy Dunlap b3bcb72edb bridge: fix build for CONFIG_SYSFS disabled
Fix build when CONFIG_SYSFS is not enabled:
net/bridge/br_if.c:136: error: 'struct net_bridge_port' has no member named 'sysfs_name'

Note: dev->name == sysfs_name except when change name is in
progress, and we are protected from that by RTNL mutex.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-18 12:26:27 -07:00
Joe Perches 3fa21e07e6 net: Remove unnecessary returns from void function()s
This patch removes from net/ (but not any netfilter files)
all the unnecessary return; statements that precede the
last closing brace of void functions.

It does not remove the returns that are immediately
preceded by a label as gcc doesn't like that.

Done via:
$ grep -rP --include=*.[ch] -l "return;\n}" net/ | \
  xargs perl -i -e 'local $/ ; while (<>) { s/\n[ \t\n]+return;\n}/\n}/g; print; }'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:14 -07:00
stephen hemminger b60b6592ba net sched: cleanup and rate limit warning
If the user has a bad classification configuration, and gets a packet
that goes through too many steps. Chances are more packets will arrive,
and the message spew will overrun syslog because it is not rate limited.
And because it is not tagged with appropriate priority it can't not be screened.

Added the qdisc to the message to try and give some more context when
the message does arrive.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Acked-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:13 -07:00
stephen hemminger 207024b947 pfkey: add severity to printk
Put severity level on pfkey printk messages

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:13 -07:00
stephen hemminger 62db5cfd70 xfrm: add severity to printk
Serious oh sh*t messages converted to WARN().
Add KERN_NOTICE severity to the unknown policy type messages.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:13 -07:00
stephen hemminger 6ff9c3644e net sched: printk message severity
The previous patch encourage me to go look at all the messages in
the network scheduler and fix them. Many messages were missing
any severity level. Some serious ones that should never happen
were turned into WARN(), and the random noise messages that were
handled changed to pr_debug().

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 23:23:12 -07:00
Julia Lawall 49afa55b5b net/caif: Use kzalloc
Use kzalloc rather than the combination of kmalloc and memset.

A simplified version of the semantic patch that makes this change is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression x,size,flags;
statement S;
@@

-x = kmalloc(size,flags);
+x = kzalloc(size,flags);
 if (x == NULL) S
-memset(x, 0, size);
// </smpl>

Signed-off-by: Julia Lawall <julia@diku.dk>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:55:09 -07:00
Wei Yongjun 2e3219b5c8 sctp: fix append error cause to ERROR chunk correctly
commit 5fa782c2f5
  sctp: Fix skb_over_panic resulting from multiple invalid \
    parameter errors (CVE-2010-1173) (v4)

cause 'error cause' never be add the the ERROR chunk due to
some typo when check valid length in sctp_init_cause_fixed().

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Reviewed-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:51:58 -07:00
Scott Feldman 57b610805c net: Add netlink support for virtual port management (was iovnl)
Add new netdev ops ndo_{set|get}_vf_port to allow setting of
port-profile on a netdev interface.  Extends netlink socket RTM_SETLINK/
RTM_GETLINK with two new sub msgs called IFLA_VF_PORTS and IFLA_PORT_SELF
(added to end of IFLA_cmd list).  These are both nested atrtibutes
using this layout:

              [IFLA_NUM_VF]
              [IFLA_VF_PORTS]
                      [IFLA_VF_PORT]
                              [IFLA_PORT_*], ...
                      [IFLA_VF_PORT]
                              [IFLA_PORT_*], ...
                      ...
              [IFLA_PORT_SELF]
                      [IFLA_PORT_*], ...

These attributes are design to be set and get symmetrically.  VF_PORTS
is a list of VF_PORTs, one for each VF, when dealing with an SR-IOV
device.  PORT_SELF is for the PF of the SR-IOV device, in case it wants
to also have a port-profile, or for the case where the VF==PF, like in
enic patch 2/2 of this patch set.

A port-profile is used to configure/enable the external switch virtual port
backing the netdev interface, not to configure the host-facing side of the
netdev.  A port-profile is an identifier known to the switch.  How port-
profiles are installed on the switch or how available port-profiles are
made know to the host is outside the scope of this patch.

There are two types of port-profiles specs in the netlink msg.  The first spec
is for 802.1Qbg (pre-)standard, VDP protocol.  The second spec is for devices
that run a similar protocol as VDP but in firmware, thus hiding the protocol
details.  In either case, the specs have much in common and makes sense to
define the netlink msg as the union of the two specs.  For example, both specs
have a notition of associating/deassociating a port-profile.  And both specs
require some information from the hypervisor manager, such as client port
instance ID.

The general flow is the port-profile is applied to a host netdev interface
using RTM_SETLINK, the receiver of the RTM_SETLINK msg communicates with the
switch, and the switch virtual port backing the host netdev interface is
configured/enabled based on the settings defined by the port-profile.  What
those settings comprise, and how those settings are managed is again
outside the scope of this patch, since this patch only deals with the
first step in the flow.

Signed-off-by: Scott Feldman <scofeldm@cisco.com>
Signed-off-by: Roopa Prabhu <roprabhu@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:49:55 -07:00
Eric Dumazet d19d56ddc8 net: Introduce skb_tunnel_rx() helper
skb rxhash should be cleared when a skb is handled by a tunnel before
being delivered again, so that correct packet steering can take place.

There are other cleanups and accounting that we can factorize in a new
helper, skb_tunnel_rx()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:36:55 -07:00
Eric Dumazet de213e5eed tcp: tcp_synack_options() fix
Commit 33ad798c92 (tcp: options clean up) introduced a problem
if MD5+SACK+timestamps were used in initial SYN message.

Some stacks (old linux for example) try to negotiate MD5+SACK+TSTAMP
sessions, but since 40 bytes of tcp options space are not enough to
store all the bits needed, we chose to disable timestamps in this case.

We send a SYN-ACK _without_ timestamp option, but socket has timestamps
enabled and all further outgoing messages contain a TS block, all with
the initial timestamp of the remote peer.

Fix is to really disable timestamps option for the whole session.

Reported-by: Bijay Singh <Bijay.Singh@guavus.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:35:36 -07:00
Stephen Hemminger eedf042a63 ipv6: fix the bug of address check
The duplicate address check code got broken in the conversion
to hlist (2.6.35).  The earlier patch did not fix the case where
two addresses match same hash value. Use two exit paths,
rather than depending on state of loop variables (from macro).

Based on earlier fix by Shan Wei.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Reviewed-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 22:27:12 -07:00
Steven Rostedt f0218b3e99 Merge branch 'perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip into trace/tip/tracing/core-6
Conflicts:
	include/trace/ftrace.h
	kernel/trace/trace_kprobe.c

Acked-by: Masami Hiramatsu <mhiramat@redhat.com>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2010-05-18 00:35:23 -04:00
David S. Miller 820ae8a80e Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-05-17 21:09:11 -07:00
Patrick McHardy a2f7922713 net_sched: sch_hfsc: fix classification loops
When attaching filters to a class pointing to a class higher up in the
hierarchy, classification may enter an endless loop. Currently this is
prevented for filters that are already resolved, but not for filters
resolved at runtime.

Only allow filters to point downwards in the hierarchy, similar to what
CBQ does.

Reported-by: Pawel Staszewski <pstaszewski@itcare.pl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:46 -07:00
stephen hemminger f0cd15081a tbf: stop wanton destruction of children (v2)
Several netem users use TBF for rate control. But every time the parameters
of TBF are changed it destroys the child qdisc, requiring reconfigation.
Better to just keep child qdisc and just notify it of changed limit.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:35 -07:00
Joe Perches ccbd6a5a4f net: Remove unnecessary semicolons after switch statements
Also added an explicit break; to avoid
a fallthrough in net/ipv4/tcp_input.c

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:44:35 -07:00
andrew hendry 935e2a26b8 X25: Remove bkl in sockopts
Removes the BKL in x25 setsock and getsockopts.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:28 -07:00
andrew hendry 37cda78741 X25: Move accept approve flag to bitfield
Moves the x25 accept approve flag from char into bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:27 -07:00
andrew hendry b7792e34cb X25: Move interrupt flag to bitfield
Moves the x25 interrupt flag from char into bitfield.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:27 -07:00
andrew hendry cb863ffd4a X25: Move qbit flag to bitfield
Moves the X25 q bit flag from char into a bitfield to allow BKL cleanup.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:39:26 -07:00
Eric Dumazet ab6e3feba1 net: No dst refcounting in ip_queue_xmit()
TCP outgoing packets can avoid two atomic ops, and dirtying
of previously higly contended cache line using new refdst
infrastructure.

Note 1: loopback device excluded because of !IFF_XMIT_DST_RELEASE
Note 2: UDP packets dsts are built before ip_queue_xmit().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:52 -07:00
Eric Dumazet 4a94445c9a net: Use ip_route_input_noref() in input path
Use ip_route_input_noref() in ip fast path, to avoid two atomic ops per
incoming packet.

Note: loopback is excluded from this optimization in ip_rcv_finish()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:51 -07:00
Eric Dumazet 407eadd996 net: implements ip_route_input_noref()
ip_route_input() is the version returning a refcounted dst, while
ip_route_input_noref() returns a non refcounted one.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:51 -07:00
Eric Dumazet 7fee226ad2 net: add a noref bit on skb dst
Use low order bit of skb->_skb_dst to tell dst is not refcounted.

Change _skb_dst to _skb_refdst to make sure all uses are catched.

skb_dst() returns the dst, regardless of noref bit set or not, but
with a lockdep check to make sure a noref dst is not given if current
user is not rcu protected.

New skb_dst_set_noref() helper to set an notrefcounted dst on a skb.
(with lockdep check)

skb_dst_drop() drops a reference only if skb dst was refcounted.

skb_dst_force() helper is used to force a refcount on dst, when skb
is queued and not anymore RCU protected.

Use skb_dst_force() in __sk_add_backlog(), __dev_xmit_skb() if
!IFF_XMIT_DST_RELEASE or skb enqueued on qdisc queue, in
sock_queue_rcv_skb(), in __nf_queue().

Use skb_dst_force() in dev_requeue_skb().

Note: dst_use_noref() still dirties dst, we might transform it
later to do one dirtying per jiffies.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:50 -07:00
Eric Dumazet ebda37c27d rps: avoid one atomic in enqueue_to_backlog
If CONFIG_SMP=y, then we own a queue spinlock, we can avoid the atomic
test_and_set_bit() from napi_schedule_prep().

We now have same number of atomic ops per netif_rx() calls than with
pre-RPS kernel.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:18:50 -07:00
Florian Westphal 0771275b25 ipv6 addrlabel: permit deletion of labels assigned to removed dev
as addrlabels with an interface index are left alone when the
interface gets removed this results in addrlabels that can no
longer be removed.

Restrict validation of index to adding new addrlabels.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-17 17:08:08 -07:00
John W. Linville 6fe70aae0d Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-05-17 13:57:43 -04:00
John W. Linville 0c348d7c14 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-05-17 13:31:05 -04:00
David S. Miller 6811d58fc1 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	include/linux/if_link.h
2010-05-16 22:26:58 -07:00
Frederic Weisbecker 99df95a22f sunrpc: Include missing smp_lock.h
Now that cache_ioctl_procfs() calls the bkl explicitly, we need to
include the relevant header as well.

This fixes the following build error:

	net/sunrpc/cache.c: In function 'cache_ioctl_procfs':
	net/sunrpc/cache.c:1355: error: implicit declaration of function 'lock_kernel'
	net/sunrpc/cache.c:1359: error: implicit declaration of function 'unlock_kernel'

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2010-05-17 03:06:31 +02:00
Frederic Weisbecker d79b6f4de5 procfs: Push down the bkl from ioctl
Push down the bkl from procfs's ioctl main handler to its users.
Only three procfs users implement an ioctl (non unlocked) handler.
Turn them into unlocked_ioctl and push down the Devil inside.

v2: PDE(inode)->data doesn't need to be under bkl
v3: And don't forget to git-add the result
v4: Use wrappers to pushdown instead of an invasive and error prone
    handlers surgery.

Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: John Kacur <jkacur@redhat.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Al Viro <viro@ZenIV.linux.org.uk>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
2010-05-17 03:06:12 +02:00
Chris Wright c02db8c629 rtnetlink: make SR-IOV VF interface symmetric
Now we have a set of nested attributes:

  IFLA_VFINFO_LIST (NESTED)
    IFLA_VF_INFO (NESTED)
      IFLA_VF_MAC
      IFLA_VF_VLAN
      IFLA_VF_TX_RATE

This allows a single set to operate on multiple attributes if desired.
Among other things, it means a dump can be replayed to set state.

The current interface has yet to be released, so this seems like
something to consider for 2.6.34.

Signed-off-by: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 01:05:45 -07:00
Wei Yongjun 55fa0cfd7c sctp: delete active ICMP proto unreachable timer when free transport
transport may be free before ICMP proto unreachable timer expire, so
we should delete active ICMP proto unreachable timer when transport
is going away.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:46:22 -07:00
Eric Dumazet 2d6c9ffcca net: congestion notifications are not dropped packets
vlan/macvlan start_xmit() can inform caller of congestion with
NET_XMIT_CN return value. This doesnt mean packet was dropped.
Increment normal stat counters instead of tx_dropped.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:42:15 -07:00
Eric Dumazet a465419b1f net: Introduce sk_route_nocaps
TCP-MD5 sessions have intermittent failures, when route cache is
invalidated. ip_queue_xmit() has to find a new route, calls
sk_setup_caps(sk, &rt->u.dst), destroying the 

sk->sk_route_caps &= ~NETIF_F_GSO_MASK

that MD5 desperately try to make all over its way (from
tcp_transmit_skb() for example)

So we send few bad packets, and everything is fine when
tcp_transmit_skb() is called again for this socket.

Since ip_queue_xmit() is at a lower level than TCP-MD5, I chose to use a
socket field, sk_route_nocaps, containing bits to mask on sk_route_caps.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:36:33 -07:00
Eric Dumazet 35790c0421 tcp: fix MD5 (RFC2385) support
TCP MD5 support uses percpu data for temporary storage. It currently
disables preemption so that same storage cannot be reclaimed by another
thread on same cpu.

We also have to make sure a softirq handler wont try to use also same
context. Various bug reports demonstrated corruptions.

Fix is to disable preemption and BH.

Reported-by: Bhaskar Dutta <bhaskie@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-16 00:34:04 -07:00
Eric Dumazet 3b098e2d7c net: Consistent skb timestamping
With RPS inclusion, skb timestamping is not consistent in RX path.

If netif_receive_skb() is used, its deferred after RPS dispatch.

If netif_rx() is used, its done before RPS dispatch.

This can give strange tcpdump timestamps results.

I think timestamping should be done as soon as possible in the receive
path, to get meaningful values (ie timestamps taken at the time packet
was delivered by NIC driver to our stack), even if NAPI already can
defer timestamping a bit (RPS can help to reduce the gap)

Tom Herbert prefer to sample timestamps after RPS dispatch. In case
sampling is expensive (HPET/acpi_pm on x86), this makes sense.

Let admins switch from one mode to another, using a new
sysctl, /proc/sys/net/core/netdev_tstamp_prequeue

Its default value (1), means timestamps are taken as soon as possible,
before backlog queueing, giving accurate timestamps.

Setting a 0 value permits to sample timestamps when processing backlog,
after RPS dispatch, to lower the load of the pre-RPS cpu.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:57:10 -07:00
Timo Teras a1aa348304 xfrm: fix policy unreferencing on larval drop
I mistakenly had the error path to use num_pols to decide how
many policies we need to drop (cruft from earlier patch set
version which did not handle socket policies right).

This is wrong since normally we do not keep explicit references
(instead we hold reference to the cache entry which holds references
to policies). drop_pols is set to num_pols if we are holding the
references, so use that. Otherwise we eventually BUG_ON inside
xfrm_policy_destroy due to premature policy deletion.

Signed-off-by: Timo Teras <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:49:26 -07:00
Jiri Pirko a14462f1bd net: adjust handle_macvlan to pass port struct to hook
Now there's null check here and also again in the hook. Looking at bridge bits
which are simmilar, port structure is rcu_dereferenced right away in
handle_bridge and passed to hook. Looks nicer.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:48:02 -07:00
Amerigo Wang e3826f1e94 net: reserve ports for applications using fixed port numbers
(Dropped the infiniband part, because Tetsuo modified the related code,
I will send a separate patch for it once this is accepted.)

This patch introduces /proc/sys/net/ipv4/ip_local_reserved_ports which
allows users to reserve ports for third-party applications.

The reserved ports will not be used by automatic port assignments
(e.g. when calling connect() or bind() with port number 0). Explicit
port allocation behavior is unchanged.

Signed-off-by: Octavian Purdila <opurdila@ixiacom.com>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:28:40 -07:00
David S. Miller 1cdc5abf40 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax 2010-05-15 23:14:16 -07:00
Simon Arlott e0f43752a9 bridge: update sysfs link names if port device names have changed
Links for each port are created in sysfs using the device
name, but this could be changed after being added to the
bridge.

As well as being unable to remove interfaces after this
occurs (because userspace tools don't recognise the new
name, and the kernel won't recognise the old name), adding
another interface with the old name to the bridge will
cause an error trying to create the sysfs link.

This fixes the problem by listening for NETDEV_CHANGENAME
notifications and renaming the link.

https://bugzilla.kernel.org/show_bug.cgi?id=12743

Signed-off-by: Simon Arlott <simon@fire.lp0.eu>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:15 -07:00
stephen hemminger 28a16c9796 bridge: change console message interface
Use one set of macro's for all bridge messages.

Note: can't use netdev_XXX macro's because bridge is purely
virtual and has no device parent.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:02 -07:00
stephen hemminger cfb478da70 bridge: netpoll cleanup
Move code around so that the ifdef for NETPOLL_CONTROLLER don't have to
show up in main code path. The control functions should be in helpers
that are only compiled if needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-15 23:10:01 -07:00
Randy Dunlap 83827f6a89 netfilter: xt_TEE depends on NF_CONNTRACK
Fix xt_TEE build for the case of NF_CONNTRACK=m and
NETFILTER_XT_TARGET_TEE=y:

xt_TEE.c:(.text+0x6df5c): undefined reference to `nf_conntrack_untracked'
4x

Built with all 4 m/y combinations.

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-05-14 13:52:30 -07:00
Patrick McHardy a1d7c1b4b8 netfilter: nf_ct_sip: handle non-linear skbs
Handle non-linear skbs by linearizing them instead of silently failing.
Long term the helper should be fixed to either work with non-linear skbs
directly by using the string search API or work on a copy of the data.

Based on patch by Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-05-14 21:18:17 +02:00
Trond Myklebust 126e216a87 SUNRPC: Don't spam gssd with upcall requests when the kerberos key expired
Now that the rpc.gssd daemon can explicitly tell us that the key expired,
we should cache that information to avoid spamming gssd.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:37 -04:00
Trond Myklebust d72b6cec8d SUNRPC: Remove the 'tk_magic' debugging field
It has not triggered in almost a decade. Time to get rid of it...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:36 -04:00
Trond Myklebust d60dbb20a7 SUNRPC: Move the task->tk_bytes_sent and tk_rtt to struct rpc_rqst
It seems strange to maintain stats for bytes_sent in one structure, and
bytes received in another. Try to assemble all the RPC request-related
stats in struct rpc_rqst

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:36 -04:00
Trond Myklebust 2067340653 SUNRPC: Ensure rpcauth_prune_expired() respects the nr_to_scan parameter
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:35 -04:00
Trond Myklebust 93a05e65c0 SUNRPC: Ensure memory shrinker doesn't waste time in rpcauth_prune_expired()
The 'cred_unused' list, that is traversed by rpcauth_cache_shrinker is
ordered by time. If we hit a credential that is under the 60 second garbage
collection moratorium, we should exit because we know at that point that
all successive credentials are subject to the same moratorium...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:34 -04:00
Trond Myklebust d300a41ef1 SUNRPC: Dont run rpcauth_cache_shrinker() when gfp_mask is GFP_NOFS
Under some circumstances, put_rpccred() can end up allocating memory, so
check the gfp_mask to prevent deadlocks.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:34 -04:00
Trond Myklebust 1f4c86c0be NFS: Don't use GFP_KERNEL in rpcsec_gss downcalls
Again, we can deadlock if the memory reclaim triggers a writeback that
requires a rpcsec_gss credential lookup.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:33 -04:00
Trond Myklebust 712a433866 SUNRPC: Fix xs_setup_bc_tcp()
It is a BUG for anybody to call this function without setting
args->bc_xprt. Trying to return an error value is just wrong, since the
user cannot fix this: it is a programming error, not a user error.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:33 -04:00
Chuck Lever ff8399709e SUNRPC: Replace jiffies-based metrics with ktime-based metrics
Currently RPC performance metrics that tabulate elapsed time use
jiffies time values.  This is problematic on systems that use slow
jiffies (for instance 100HZ systems built for paravirtualized
environments).  It is also a problem for computing precise latency
statistics for advanced network transports, such as InfiniBand,
that can have round-trip latencies significanly faster than a single
clock tick.

For the RPC client, adopt the high resolution time stamp mechanism
already used by the network layer and blktrace: ktime.

We use ktime format time stamps for all internal computations, and
convert to milliseconds for presentation.  As a result, we need only
addition operations in the performance critical paths; multiply/divide
is required only for presentation.

We could report RTT metrics in microseconds.  In fact the mountstats
format is versioned to accomodate exactly this kind of interface
improvement.

For now, however, we'll stay with millisecond precision for
presentation to maintain backwards compatibility with the handful of
currently deployed user space tools.  At a later point, we'll move to
an API such as BDI_STATS where a finer timestamp precision can be
reported.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:33 -04:00
Chuck Lever bbc72cea58 SUNRPC: RPC metrics and RTT estimator should use same RTT value
Compute an RPC request's RTT once, and use that value both for reporting
RPC metrics, and for adjusting the RTT context used by the RPC client's RTT
estimator algorithm.

Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:32 -04:00
Trond Myklebust a8ce4a8f37 SUNRPC: Fail over more quickly on connect errors
We should not allow soft tasks to wait for longer than the major timeout
period when waiting for a reconnect to occur.

Remove the field xprt->connect_timeout since it has been obsoleted by
xprt->reestablish_timeout.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-05-14 15:09:30 -04:00