Commit Graph

17810 Commits (0550769bb7f364fb9aeeb9412229fb7790ee79c4)

Author SHA1 Message Date
David S. Miller 8bc26a008f Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-02-14 12:51:42 -08:00
David S. Miller af756e9d88 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-02-14 11:16:12 -08:00
Patrick McHardy de9963f0f2 netfilter: nf_iterate: fix incorrect RCU usage
As noticed by Eric, nf_iterate doesn't use RCU correctly by
accessing the prev pointer of a RCU protected list element when
a verdict of NF_REPEAT is issued.

Fix by jumping backwards to the hook invocation directly instead
of loading the previous list element before continuing the list
iteration.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-14 17:35:07 +01:00
Jesper Juhl d3337de52a Don't potentially dereference NULL in net/dcb/dcbnl.c:dcbnl_getapp()
nla_nest_start() may return NULL. If it does then we'll blow up in
nla_nest_end() when we dereference the pointer.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-13 11:21:14 -08:00
John Fastabend 7ec79270d7 net: dcb: application priority is per net_device
The app_data priority may not be the same for all net devices.
In order for stacks with application notifiers to identify the
specific net device dcb_app_type should be passed in the ptr.

This allows handlers to use dev_get_by_name() to pin priority
to net devices.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-13 11:02:39 -08:00
Herbert Xu 8a870178c0 bridge: Replace mp->mglist hlist with a bool
As it turns out we never need to walk through the list of multicast
groups subscribed by the bridge interface itself (the only time we'd
want to do that is when we shut down the bridge, in which case we
simply walk through all multicast groups), we don't really need to
keep an hlist for mp->mglist.

This means that we can replace it with just a single bit to indicate
whether the bridge interface is subscribed to a group.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-12 01:05:42 -08:00
Herbert Xu 24f9cdcbd7 bridge: Fix timer typo that may render snooping less effective
In a couple of spots where we are supposed to modify the port
group timer (p->timer) we instead modify the bridge interface
group timer (mp->timer).

The effect of this is mostly harmless.  However, it can cause
port subscriptions to be longer than they should be, thus making
snooping less effective.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-11 21:59:37 -08:00
Herbert Xu 6b0d6a9b42 bridge: Fix mglist corruption that leads to memory corruption
The list mp->mglist is used to indicate whether a multicast group
is active on the bridge interface itself as opposed to one of the
constituent interfaces in the bridge.

Unfortunately the operation that adds the mp->mglist node to the
list neglected to check whether it has already been added.  This
leads to list corruption in the form of nodes pointing to itself.

Normally this would be quite obvious as it would cause an infinite
loop when walking the list.  However, as this list is never actually
walked (which means that we don't really need it, I'll get rid of
it in a subsequent patch), this instead is hidden until we perform
a delete operation on the affected nodes.

As the same node may now be pointed to by more than one node, the
delete operations can then cause modification of freed memory.

This was observed in practice to cause corruption in 512-byte slabs,
most commonly leading to crashes in jbd2.

Thanks to Josef Bacik for pointing me in the right direction.

Reported-by: Ian Page Hands <ihands@redhat.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-11 21:59:37 -08:00
Steffen Klassert 946bf5ee3c ip_gre: Add IPPROTO_GRE to flowi in ipgre_tunnel_xmit
Commit 5811662b15 ("net: use the macros
defined for the members of flowi") accidentally removed the setting of
IPPROTO_GRE from the struct flowi in ipgre_tunnel_xmit. This patch
restores it.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-11 11:23:12 -08:00
Hiroaki SHIMODA 0b15093219 xfrm: avoid possible oopse in xfrm_alloc_dst
Commit 80c802f307 (xfrm: cache bundles instead of policies for
outgoing flows) introduced possible oopse when dst_alloc returns NULL.

Signed-off-by: Hiroaki SHIMODA <shimoda.hiroaki@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-10 23:08:33 -08:00
David S. Miller 96642d42f0 x25: Do not reference freed memory.
In x25_link_free(), we destroy 'nb' before dereferencing
'nb->dev'.  Don't do this, because 'nb' might be freed
by then.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Tested-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-09 22:36:13 -08:00
David S. Miller ae0935776c Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-02-09 12:40:21 -08:00
Eliad Peller a7b545f7fe mac80211: add missing locking in ieee80211_reconfig
When suspending an associated system, and then resuming,
the station vif is being reconfigured without taking the
sdata->u.mgd.mtx lock, which results in the following warning:

WARNING: at net/mac80211/mlme.c:101 ieee80211_ap_probereq_get+0x58/0xb8 [mac80211]()
Modules linked in: wl12xx_sdio wl12xx firmware_class crc7 mac80211 cfg80211 [last unloaded: crc7]
Backtrace:
[<c005432c>] (dump_backtrace+0x0/0x118) from [<c0376e28>] (dump_stack+0x20/0x24)
 r7:00000000 r6:bf12d6ec r5:bf154aac r4:00000065
[<c0376e08>] (dump_stack+0x0/0x24) from [<c0079104>] (warn_slowpath_common+0x5c/0x74)
[<c00790a8>] (warn_slowpath_common+0x0/0x74) from [<c0079148>] (warn_slowpath_null+0x2c/0x34)
 r9:000024ff r8:cd006460 r7:00000001 r6:00000000 r5:00000000
r4:cf1394a0
[<c007911c>] (warn_slowpath_null+0x0/0x34) from [<bf12d6ec>] (ieee80211_ap_probereq_get+0x58/0xb8 [mac80211])
[<bf12d694>] (ieee80211_ap_probereq_get+0x0/0xb8 [mac80211]) from [<bf19cd04>] (wl1271_cmd_build_ap_probe_req+0x30/0xf8 [wl12xx])
 r4:cd007440
[<bf19ccd4>] (wl1271_cmd_build_ap_probe_req+0x0/0xf8 [wl12xx]) from [<bf1995f4>] (wl1271_op_bss_info_changed+0x4c4/0x808 [wl12xx])
 r5:cd007440 r4:000003b4
[<bf199130>] (wl1271_op_bss_info_changed+0x0/0x808 [wl12xx]) from [<bf122168>] (ieee80211_bss_info_change_notify+0x1a4/0x1f8 [mac80211])
[<bf121fc4>] (ieee80211_bss_info_change_notify+0x0/0x1f8 [mac80211]) from [<bf141e80>] (ieee80211_reconfig+0x4d0/0x668 [mac80211])
 r8:cf0eeea4 r7:cd00671c r6:00000000 r5:cd006460 r4:cf1394a0
[<bf1419b0>] (ieee80211_reconfig+0x0/0x668 [mac80211]) from [<bf137dd4>] (ieee80211_resume+0x60/0x70 [mac80211])
[<bf137d74>] (ieee80211_resume+0x0/0x70 [mac80211]) from [<bf0eb930>] (wiphy_resume+0x6c/0x7c [cfg80211])
 r5:cd006248 r4:cd006110
[<bf0eb8c4>] (wiphy_resume+0x0/0x7c [cfg80211]) from [<c0241024>] (legacy_resume+0x38/0x70)
 r7:00000000 r6:00000000 r5:cd006248 r4:cd0062fc
[<c0240fec>] (legacy_resume+0x0/0x70) from [<c0241478>] (device_resume+0x168/0x1a0)
 r8:c04ca8d8 r7:cd00627c r6:00000010 r5:cd006248 r4:cd0062fc
[<c0241310>] (device_resume+0x0/0x1a0) from [<c0241600>] (dpm_resume_end+0xf8/0x3bc)
 r7:00000000 r6:00000005 r5:cd006248 r4:cd0062fc
[<c0241508>] (dpm_resume_end+0x0/0x3bc) from [<c00b2a24>] (suspend_devices_and_enter+0x1b0/0x204)
[<c00b2874>] (suspend_devices_and_enter+0x0/0x204) from [<c00b2b68>] (enter_state+0xf0/0x148)
 r7:c037e978 r6:00000003 r5:c043d807 r4:00000000
[<c00b2a78>] (enter_state+0x0/0x148) from [<c00b20a4>] (state_store+0xa4/0xcc)
 r7:c037e978 r6:00000003 r5:00000003 r4:c043d807
[<c00b2000>] (state_store+0x0/0xcc) from [<c01fc90c>] (kobj_attr_store+0x20/0x24)
[<c01fc8ec>] (kobj_attr_store+0x0/0x24) from [<c0157120>] (sysfs_write_file+0x11c/0x150)
[<c0157004>] (sysfs_write_file+0x0/0x150) from [<c0100f84>] (vfs_write+0xc0/0x14c)
[<c0100ec4>] (vfs_write+0x0/0x14c) from [<c01010e4>] (sys_write+0x4c/0x78)
 r8:40126000 r7:00000004 r6:cf1a7c80 r5:00000000 r4:00000000
[<c0101098>] (sys_write+0x0/0x78) from [<c00500c0>] (ret_fast_syscall+0x0/0x30)
 r8:c00502c8 r7:00000004 r6:403525e8 r5:40126000 r4:00000004

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-09 15:35:13 -05:00
John W. Linville 5dc0fa782a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 2011-02-09 15:30:42 -05:00
Pablo Neira Ayuso c317428644 netfilter: nf_conntrack: set conntrack templates again if we return NF_REPEAT
The TCP tracking code has a special case that allows to return
NF_REPEAT if we receive a new SYN packet while in TIME_WAIT state.

In this situation, the TCP tracking code destroys the existing
conntrack to start a new clean session.

[DESTROY] tcp      6 src=192.168.0.2 dst=192.168.1.2 sport=38925 dport=8000 src=192.168.1.2 dst=192.168.1.100 sport=8000 dport=38925 [ASSURED]
    [NEW] tcp      6 120 SYN_SENT src=192.168.0.2 dst=192.168.1.2 sport=38925 dport=8000 [UNREPLIED] src=192.168.1.2 dst=192.168.1.100 sport=8000 dport=38925

However, this is a problem for the iptables' CT target event filtering
which will not work in this case since the conntrack template will not
be there for the new session. To fix this, we reassign the conntrack
template to the packet if we return NF_REPEAT.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-09 08:08:20 +01:00
David S. Miller 8d3bdbd55a net: Fix lockdep regression caused by initializing netdev queues too early.
In commit aa94210411 ("net: init ingress
queue") we moved the allocation and lock initialization of the queues
into alloc_netdev_mq() since register_netdevice() is way too late.

The problem is that dev->type is not setup until the setup()
callback is invoked by alloc_netdev_mq(), and the dev->type is
what determines the lockdep class to use for the locks in the
queues.

Fix this by doing the queue allocation after the setup() callback
runs.

This is safe because the setup() callback is not allowed to make any
state changes that need to be undone on error (memory allocations,
etc.).  It may, however, make state changes that are undone by
free_netdev() (such as netif_napi_add(), which is done by the
ipoib driver's setup routine).

The previous code also leaked a reference to the &init_net namespace
object on RX/TX queue allocation failures.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-08 15:02:50 -08:00
David S. Miller b2df5a8446 net/caif: Fix dangling list pointer in freed object on error.
rtnl_link_ops->setup(), and the "setup" callback passed to alloc_netdev*(),
cannot make state changes which need to be undone on failure.  There is
no cleanup mechanism available at this point.

So we have to add the caif private instance to the global list once we
are sure that register_netdev() has succedded in ->newlink().

Otherwise, if register_netdev() fails, the caller will invoke free_netdev()
and we will have a reference to freed up memory on the chnl_net_list.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-08 14:31:31 -08:00
David S. Miller e0985f27dd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-02-08 12:03:54 -08:00
David S. Miller 429a01a70f Merge branch 'batman-adv/merge' of git://git.open-mesh.org/ecsv/linux-merge 2011-02-07 19:54:14 -08:00
Sven Eckelmann 531c9da8c8 batman-adv: Linearize fragment packets before merge
We access the data inside the skbs of two fragments directly using memmove
during the merge. The data of the skb could span over multiple skb pages. An
direct access without knowledge about the pages would lead to an invalid memory
access.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
[lindner_marek@yahoo.de: Move return from function to the end]
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2011-02-08 00:54:31 +01:00
andrew hendry 95c3043008 x25: possible skb leak on bad facilities
Originally x25_parse_facilities returned
-1 for an error
 0 meaning 0 length facilities
>0 the length of the facilities parsed.

5ef41308f9 ("x25: Prevent crashing when parsing bad X.25 facilities") introduced more
error checking in x25_parse_facilities however used 0 to indicate bad parsing
a6331d6f9a ("memory corruption in X.25 facilities parsing") followed this further for
DTE facilities, again using 0 for bad parsing.

The meaning of 0 got confused in the callers.
If the facilities are messed up we can't determine where the data starts.
So patch makes all parsing errors return -1 and ensures callers close and don't use the skb further.

Reported-by: Andy Whitcroft <apw@canonical.com>
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-07 13:41:38 -08:00
Felix Fietkau fc7c976dc7 mac80211: fix the skb cloned check in the tx path
Using skb_header_cloned to check if it's safe to write to the skb is not
enough - mac80211 also touches the tailroom of the skb.
Initially this check was only used to increase a counter, however this
commit changed the code to also skip skb data reallocation if no extra
head/tailroom was needed:

commit 4cd06a344d
mac80211: skip unnecessary pskb_expand_head calls

It added a regression at least with iwl3945, which is fixed by this patch.

Reported-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Tested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-07 16:02:14 -05:00
Pavel Emelyanov 1158f762e5 bridge: Don't put partly initialized fdb into hash
The fdb_create() puts a new fdb into hash with only addr set. This is
not good, since there are callers, that search the hash w/o the lock
and access all the other its fields.

Applies to current netdev tree.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-04 13:02:36 -08:00
David S. Miller e2d57766e6 net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03 18:05:29 -08:00
David S. Miller ca6b8bb097 net: Support compat SIOCGETVIFCNT ioctl in ipv4.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03 17:24:28 -08:00
David S. Miller 0033d5ad27 net: Fix bug in compat SIOCGETSGCNT handling.
Commit 709b46e8d9 ("net: Add compat
ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT") added the
correct plumbing to handle SIOCGETSGCNT properly.

However, whilst definiting a proper "struct compat_sioc_sg_req" it
isn't actually used in ipmr_compat_ioctl().

Correct this oversight.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-03 17:21:31 -08:00
David S. Miller 0bc0be7f20 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2011-02-02 15:52:23 -08:00
Andy Gospodarek 6d152e23ad gro: reset skb_iif on reuse
Like Herbert's change from a few days ago:

66c46d741e gro: Reset dev pointer on reuse

this may not be necessary at this point, but we should still clean up
the skb->skb_iif.  If not we may end up with an invalid valid for
skb->skb_iif when the skb is reused and the check is done in
__netif_receive_skb.

Signed-off-by: Andy Gospodarek <andy@greyhouse.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-02-02 14:53:25 -08:00
Johannes Berg 4334ec8518 mac80211: fix TX status cookie in HW offload case
When the off-channel TX is done with remain-on-channel
offloaded to hardware, the reported cookie is wrong as
in that case we shouldn't use the SKB as the cookie but
need to instead use the corresponding r-o-c cookie
(XOR'ed with 2 to prevent API mismatches).

Fix this by keeping track of the hw_roc_skb pointer
just for the status processing and use the correct
cookie to report in this case. We can't use the
hw_roc_skb pointer itself because it is NULL'ed when
the frame is transmitted to prevent it being used
twice.

This fixes a bug where the P2P state machine in the
supplicant gets stuck because it never gets a correct
result for its transmitted frame.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2011-02-02 16:38:59 -05:00
Bao Liang e733fb6208 Bluetooth: Set conn state to BT_DISCONN to avoid multiple responses
This patch fixes a minor issue that two connection responses will be sent
for one L2CAP connection request. If the L2CAP connection request is first
blocked due to security reason and responded with reason "security block",
the state of the connection remains BT_CONNECT2. If a pairing procedure
completes successfully before the ACL connection is down, local host will
send another connection complete response. See the following packets
captured by hcidump.

2010-12-07 22:21:24.928096 < ACL data: handle 12 flags 0x00 dlen 16
    0000: 0c 00 01 00 03 19 08 00  41 00 53 00 03 00 00 00  ........A.S.....
... ...

2010-12-07 22:21:35.791747 > HCI Event: Auth Complete (0x06) plen 3
    status 0x00 handle 12
... ...

2010-12-07 22:21:35.872372 > ACL data: handle 12 flags 0x02 dlen 16
    L2CAP(s): Connect rsp: dcid 0x0054 scid 0x0040 result 0 status 0
      Connection successful

Signed-off-by: Liang Bao <tim.bao@gmail.com>
Acked-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2011-02-02 12:47:59 -02:00
Pablo Neira Ayuso 3db7e93d33 netfilter: ecache: always set events bits, filter them later
For the following rule:

iptables -I PREROUTING -t raw -j CT --ctevents assured

The event delivered looks like the following:

 [UPDATE] tcp      6 src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

Note that the TCP protocol state is not included. For that reason
the CT event filtering is not very useful for conntrackd.

To resolve this issue, instead of conditionally setting the CT events
bits based on the ctmask, we always set them and perform the filtering
in the late stage, just before the delivery.

Thus, the event delivered looks like the following:

 [UPDATE] tcp      6 432000 ESTABLISHED src=192.168.0.2 dst=192.168.1.2 sport=37041 dport=80 src=192.168.1.2 dst=192.168.1.100 sport=80 dport=37041 [ASSURED]

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:06:30 +01:00
Pablo Neira Ayuso 9d0db8b6b1 netfilter: arpt_mangle: fix return values of checkentry
In 135367b "netfilter: xtables: change xt_target.checkentry return type",
the type returned by checkentry was changed from boolean to int, but the
return values where not adjusted.

arptables: Input/output error

This broke arptables with the mangle target since it returns true
under success, which is interpreted by xtables as >0, thus
returning EIO.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2011-02-01 16:03:46 +01:00
Eric W. Biederman bf36076a67 net: Fix ipv6 neighbour unregister_sysctl_table warning
In my testing of 2.6.37 I was occassionally getting a warning about
sysctl table entries being unregistered in the wrong order.  Digging
in it turns out this dates back to the last great sysctl reorg done
where Al Viro introduced the requirement that sysctl directories
needed to be created before and destroyed after the files in them.

It turns out that in that great reorg /proc/sys/net/ipv6/neigh was
overlooked.  So this patch fixes that oversight and makes an annoying
warning message go away.

>------------[ cut here ]------------
>WARNING: at kernel/sysctl.c:1992 unregister_sysctl_table+0x134/0x164()
>Pid: 23951, comm: kworker/u:3 Not tainted 2.6.37-350888.2010AroraKernelBeta.fc14.x86_64 #1
>Call Trace:
> [<ffffffff8103e034>] warn_slowpath_common+0x80/0x98
> [<ffffffff8103e061>] warn_slowpath_null+0x15/0x17
> [<ffffffff810452f8>] unregister_sysctl_table+0x134/0x164
> [<ffffffff810e7834>] ? kfree+0xc4/0xd1
> [<ffffffff813439b2>] neigh_sysctl_unregister+0x22/0x3a
> [<ffffffffa02cd14e>] addrconf_ifdown+0x33f/0x37b [ipv6]
> [<ffffffff81331ec2>] ? skb_dequeue+0x5f/0x6b
> [<ffffffffa02ce4a5>] addrconf_notify+0x69b/0x75c [ipv6]
> [<ffffffffa02eb953>] ? ip6mr_device_event+0x98/0xa9 [ipv6]
> [<ffffffff813d2413>] notifier_call_chain+0x32/0x5e
> [<ffffffff8105bdea>] raw_notifier_call_chain+0xf/0x11
> [<ffffffff8133cdac>] call_netdevice_notifiers+0x45/0x4a
> [<ffffffff8133d2b0>] rollback_registered_many+0x118/0x201
> [<ffffffff8133d3af>] unregister_netdevice_many+0x16/0x6d
> [<ffffffff8133d571>] default_device_exit_batch+0xa4/0xb8
> [<ffffffff81337c42>] ? cleanup_net+0x0/0x194
> [<ffffffff81337a2a>] ops_exit_list+0x4e/0x56
> [<ffffffff81337d36>] cleanup_net+0xf4/0x194
> [<ffffffff81053318>] process_one_work+0x187/0x280
> [<ffffffff8105441b>] worker_thread+0xff/0x19f
> [<ffffffff8105431c>] ? worker_thread+0x0/0x19f
> [<ffffffff8105776d>] kthread+0x7d/0x85
> [<ffffffff81003824>] kernel_thread_helper+0x4/0x10
> [<ffffffff810576f0>] ? kthread+0x0/0x85
> [<ffffffff81003820>] ? kernel_thread_helper+0x0/0x10
>---[ end trace 8a7e9310b35e9486 ]---

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 20:54:17 -08:00
Tom Herbert 8587523640 net: Check rps_flow_table when RPS map length is 1
In get_rps_cpu, add check that the rps_flow_table for the device is
NULL when trying to take fast path when RPS map length is one.
Without this, RFS is effectively disabled if map length is one which
is not correct.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 16:23:42 -08:00
Roland Dreier ec831ea72e net: Add default_mtu() methods to blackhole dst_ops
When an IPSEC SA is still being set up, __xfrm_lookup() will return
-EREMOTE and so ip_route_output_flow() will return a blackhole route.
This can happen in a sndmsg call, and after d33e455337 ("net: Abstract
default MTU metric calculation behind an accessor.") this leads to a
crash in ip_append_data() because the blackhole dst_ops have no
default_mtu() method and so dst_mtu() calls a NULL pointer.

Fix this by adding default_mtu() methods (that simply return 0, matching
the old behavior) to the blackhole dst_ops.

The IPv4 part of this patch fixes a crash that I saw when using an IPSEC
VPN; the IPv6 part is untested because I don't have an IPv6 VPN, but it
looks to be needed as well.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-31 13:16:00 -08:00
David S. Miller 81c2bdb688 Merge branch 'batman-adv/merge-oopsonly' of git://git.open-mesh.org/ecsv/linux-merge 2011-01-30 22:16:34 -08:00
Sven Eckelmann 1181e1daac batman-adv: Make vis info stack traversal threadsafe
The batman-adv vis server has to a stack which stores all information
about packets which should be send later. This stack is protected
with a spinlock that is used to prevent concurrent write access to it.

The send_vis_packets function has to take all elements from the stack
and send them to other hosts over the primary interface. The send will
be initiated without the lock which protects the stack.

The implementation using list_for_each_entry_safe has the problem that
it stores the next element as "safe ptr" to allow the deletion of the
current element in the list. The list may be modified during the
unlock/lock pair in the loop body which may make the safe pointer
not pointing to correct next element.

It is safer to remove and use the first element from the stack until no
elements are available. This does not need reduntant information which
would have to be validated each time the lock was removed.

Reported-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:08 +01:00
Sven Eckelmann dda9fc6b2c batman-adv: Remove vis info element in free_info
The free_info function will be called when no reference to the info
object exists anymore. It must be ensured that the allocated memory
gets freed and not only the elements which are managed by the info
object.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:06 +01:00
Sven Eckelmann 2674c15870 batman-adv: Remove vis info on hashing errors
A newly created vis info object must be removed when it couldn't be
added to the hash. The old_info which has to be replaced was already
removed and isn't related to the hash anymore.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-30 10:32:02 +01:00
Eric W. Biederman 709b46e8d9 net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT
SIOCGETSGCNT is not a unique ioctl value as it it maps tio SIOCPROTOPRIVATE +1,
which unfortunately means the existing infrastructure for compat networking
ioctls is insufficient.  A trivial compact ioctl implementation would conflict
with:

SIOCAX25ADDUID
SIOCAIPXPRISLT
SIOCGETSGCNT_IN6
SIOCGETSGCNT
SIOCRSSCAUSE
SIOCX25SSUBSCRIP
SIOCX25SDTEFACILITIES

To make this work I have updated the compat_ioctl decode path to mirror the
the normal ioctl decode path.  I have added an ipv4 inet_compat_ioctl function
so that I can have ipv4 specific compat ioctls.   I have added a compat_ioctl
function into struct proto so I can break out ioctls by which kind of ip socket
I am using.  I have added a compat_raw_ioctl function because SIOCGETSGCNT only
works on raw sockets.  I have added a ipmr_compat_ioctl that mirrors the normal
ipmr_ioctl.

This was necessary because unfortunately the struct layout for the SIOCGETSGCNT
has unsigned longs in it so changes between 32bit and 64bit kernels.

This change was sufficient to run a 32bit ip multicast routing daemon on a
64bit kernel.

Reported-by: Bill Fenner <fenner@aristanetworks.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-30 01:14:38 -08:00
Eric W. Biederman 13ad17745c net: Fix ip link add netns oops
Ed Swierk <eswierk@bigswitch.com> writes:
> On 2.6.35.7
>  ip link add link eth0 netns 9999 type macvlan
> where 9999 is a nonexistent PID triggers an oops and causes all network functions to hang:
> [10663.821898] BUG: unable to handle kernel NULL pointer dereference at 000000000000006d
>  [10663.821917] IP: [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.821933] PGD 1d3927067 PUD 22f5c5067 PMD 0
>  [10663.821944] Oops: 0000 [#1] SMP
>  [10663.821953] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/scaling_cur_freq
>  [10663.821959] CPU 3
>  [10663.821963] Modules linked in: macvlan ip6table_filter ip6_tables rfcomm ipt_MASQUERADE binfmt_misc iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_state nf_conntrack sco ipt_REJECT bnep l2cap xt_tcpudp iptable_filter ip_tables x_tables bridge stp vboxnetadp vboxnetflt vboxdrv kvm_intel kvm parport_pc ppdev snd_hda_codec_intelhdmi snd_hda_codec_conexant arc4 iwlagn iwlcore mac80211 snd_hda_intel snd_hda_codec snd_hwdep snd_pcm snd_seq_midi snd_rawmidi i915 snd_seq_midi_event snd_seq thinkpad_acpi drm_kms_helper btusb tpm_tis nvram uvcvideo snd_timer snd_seq_device bluetooth videodev v4l1_compat v4l2_compat_ioctl32 tpm drm tpm_bios snd cfg80211 psmouse serio_raw intel_ips soundcore snd_page_alloc intel_agp i2c_algo_bit video output netconsole configfs lp parport usbhid hid e1000e sdhci_pci ahci libahci sdhci led_class
>  [10663.822155]
>  [10663.822161] Pid: 6000, comm: ip Not tainted 2.6.35-23-generic #41-Ubuntu 2901CTO/2901CTO
>  [10663.822167] RIP: 0010:[<ffffffff8149c2fa>] [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.822177] RSP: 0018:ffff88014aebf7b8 EFLAGS: 00010286
>  [10663.822182] RAX: 00000000fffffff4 RBX: ffff8801ad900800 RCX: 0000000000000000
>  [10663.822187] RDX: ffff880000000000 RSI: 0000000000000000 RDI: ffff88014ad63000
>  [10663.822191] RBP: ffff88014aebf808 R08: 0000000000000041 R09: 0000000000000041
>  [10663.822196] R10: 0000000000000000 R11: dead000000200200 R12: ffff88014aebf818
>  [10663.822201] R13: fffffffffffffffd R14: ffff88014aebf918 R15: ffff88014ad62000
>  [10663.822207] FS: 00007f00c487f700(0000) GS:ffff880001f80000(0000) knlGS:0000000000000000
>  [10663.822212] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [10663.822216] CR2: 000000000000006d CR3: 0000000231f19000 CR4: 00000000000026e0
>  [10663.822221] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [10663.822226] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>  [10663.822231] Process ip (pid: 6000, threadinfo ffff88014aebe000, task ffff88014afb16e0)
>  [10663.822236] Stack:
>  [10663.822240] ffff88014aebf808 ffffffff814a2bb5 ffff88014aebf7e8 00000000a00ee8d6
>  [10663.822251] <0> 0000000000000000 ffffffffa00ef940 ffff8801ad900800 ffff88014aebf818
>  [10663.822265] <0> ffff88014aebf918 ffff8801ad900800 ffff88014aebf858 ffffffff8149c413
>  [10663.822281] Call Trace:
>  [10663.822290] [<ffffffff814a2bb5>] ? dev_addr_init+0x75/0xb0
>  [10663.822298] [<ffffffff8149c413>] dev_alloc_name+0x43/0x90
>  [10663.822307] [<ffffffff814a85ee>] rtnl_create_link+0xbe/0x1b0
>  [10663.822314] [<ffffffff814ab2aa>] rtnl_newlink+0x48a/0x570
>  [10663.822321] [<ffffffff814aafcc>] ? rtnl_newlink+0x1ac/0x570
>  [10663.822332] [<ffffffff81030064>] ? native_x2apic_icr_read+0x4/0x20
>  [10663.822339] [<ffffffff814a8c17>] rtnetlink_rcv_msg+0x177/0x290
>  [10663.822346] [<ffffffff814a8aa0>] ? rtnetlink_rcv_msg+0x0/0x290
>  [10663.822354] [<ffffffff814c25d9>] netlink_rcv_skb+0xa9/0xd0
>  [10663.822360] [<ffffffff814a8a85>] rtnetlink_rcv+0x25/0x40
>  [10663.822367] [<ffffffff814c223e>] netlink_unicast+0x2de/0x2f0
>  [10663.822374] [<ffffffff814c303e>] netlink_sendmsg+0x1fe/0x2e0
>  [10663.822383] [<ffffffff81488533>] sock_sendmsg+0xf3/0x120
>  [10663.822391] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822400] [<ffffffff81168656>] ? __d_lookup+0x136/0x150
>  [10663.822406] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822414] [<ffffffff812b7a0d>] ? _atomic_dec_and_lock+0x4d/0x80
>  [10663.822422] [<ffffffff8116ea90>] ? mntput_no_expire+0x30/0x110
>  [10663.822429] [<ffffffff81486ff5>] ? move_addr_to_kernel+0x65/0x70
>  [10663.822435] [<ffffffff81493308>] ? verify_iovec+0x88/0xe0
>  [10663.822442] [<ffffffff81489020>] sys_sendmsg+0x240/0x3a0
> [10663.822450] [<ffffffff8111e2a9>] ? __do_fault+0x479/0x560
>  [10663.822457] [<ffffffff815899fe>] ? _raw_spin_lock+0xe/0x20
>  [10663.822465] [<ffffffff8116cf4a>] ? alloc_fd+0x10a/0x150
>  [10663.822473] [<ffffffff8158d76e>] ? do_page_fault+0x15e/0x350
>  [10663.822482] [<ffffffff8100a0f2>] system_call_fastpath+0x16/0x1b
>  [10663.822487] Code: 90 48 8d 78 02 be 25 00 00 00 e8 92 1d e2 ff 48 85 c0 75 cf bf 20 00 00 00 e8 c3 b1 c6 ff 49 89 c7 b8 f4 ff ff ff 4d 85 ff 74 bd <4d> 8b 75 70 49 8d 45 70 48 89 45 b8 49 83 ee 58 eb 28 48 8d 55
>  [10663.822618] RIP [<ffffffff8149c2fa>] __dev_alloc_name+0x9a/0x170
>  [10663.822627] RSP <ffff88014aebf7b8>
>  [10663.822631] CR2: 000000000000006d
>  [10663.822636] ---[ end trace 3dfd6c3ad5327ca7 ]---

This bug was introduced in:
commit 81adee47df
Author: Eric W. Biederman <ebiederm@aristanetworks.com>
Date:   Sun Nov 8 00:53:51 2009 -0800

    net: Support specifying the network namespace upon device creation.

    There is no good reason to not support userspace specifying the
    network namespace during device creation, and it makes it easier
    to create a network device and pass it to a child network namespace
    with a well known name.

    We have to be careful to ensure that the target network namespace
    for the new device exists through the life of the call.  To keep
    that logic clear I have factored out the network namespace grabbing
    logic into rtnl_link_get_net.

    In addtion we need to continue to pass the source network namespace
    to the rtnl_link_ops.newlink method so that we can find the base
    device source network namespace.

    Signed-off-by: Eric W. Biederman <ebiederm@aristanetworks.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

Where apparently I forgot to add error handling to the path where we create
a new network device in a new network namespace, and pass in an invalid pid.

Cc: stable@kernel.org
Reported-by: Ed Swierk <eswierk@bigswitch.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-30 01:14:15 -08:00
Herbert Xu 66c46d741e gro: Reset dev pointer on reuse
On older kernels the VLAN code may zero skb->dev before dropping
it and causing it to be reused by GRO.

Unfortunately we didn't reset skb->dev in that case which causes
the next GRO user to get a bogus skb->dev pointer.

This particular problem no longer happens with the current upstream
kernel due to changes in VLAN processing.

However, for correctness we should still reset the skb->dev pointer
in the GRO reuse function in case a future user does the same thing.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-29 22:36:24 -08:00
David S. Miller 8f2771f2b8 ipv6: Remove route peer binding assertions.
They are bogus.  The basic idea is that I wanted to make sure
that prefixed routes never bind to peers.

The test I used was whether RTF_CACHE was set.

But first of all, the RTF_CACHE flag is set at different spots
depending upon which ip6_rt_copy() caller you're talking about.

I've validated all of the code paths, and even in the future
where we bind peers more aggressively (for route metric COW'ing)
we never bind to prefix'd routes, only fully specified ones.
This even applies when addrconf or icmp6 routes are allocated.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:55:22 -08:00
Eric Dumazet c2aa3665cf net: add kmemcheck annotation in __alloc_skb()
pskb_expand_head() triggers a kmemcheck warning when copy of
skb_shared_info is done in pskb_expand_head()

This is because destructor_arg field is not necessarily initialized at
this point. Add kmemcheck_annotate_variable() call in __alloc_skb() to
instruct kmemcheck this is a normal situation.

Resolves bugzilla.kernel.org 27212

Reference: https://bugzilla.kernel.org/show_bug.cgi?id=27212
Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:41:06 -08:00
Kurt Van Dijck 6d3a9a6854 net: fix validate_link_af in rtnetlink core
I'm testing an API that uses IFLA_AF_SPEC attribute.
In the rtnetlink core , the set_link_af() member
of the rtnl_af_ops struct receives the nested attribute
(as I expected), but the validate_link_af() member
receives the parent attribute.
IMO, this patch fixes this.

Signed-off-by: Kurt Van Dijck <kurt.van.dijck@eia.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:39:21 -08:00
Eric Dumazet 389f2a18c6 econet: remove compiler warnings
net/econet/af_econet.c: In function ‘econet_sendmsg’:
net/econet/af_econet.c:494: warning: label ‘error’ defined but not used
net/econet/af_econet.c:268: warning: unused variable ‘sk’

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-27 14:15:54 -08:00
David S. Miller 7cc2edb834 xfrm6: Don't forget to propagate peer into ipsec route.
Like ipv4, we have to propagate the ipv6 route peer into
the ipsec top-level route during instantiation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-26 13:41:03 -08:00
David S. Miller 9b6941d8b1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2011-01-26 11:49:49 -08:00
Linus Lüssing dd58ddc692 batman-adv: Fix kernel panic when fetching vis data on a vis server
The hash_iterate removal introduced a bug leading to a kernel panic when
fetching the vis data on a vis server. That commit forgot to rename one
variable name, which this commit fixes now.

Reported-by: Russell Senior <russell@personaltelco.net>
Signed-off-by: Linus Lüssing <linus.luessing@web.de>
Signed-off-by: Sven Eckelmann <sven@narfation.org>
2011-01-25 23:58:33 +01:00
Jerry Chu 44f5324b5d TCP: fix a bug that triggers large number of TCP RST by mistake
This patch fixes a bug that causes TCP RST packets to be generated
on otherwise correctly behaved applications, e.g., no unread data
on close,..., etc. To trigger the bug, at least two conditions must
be met:

1. The FIN flag is set on the last data packet, i.e., it's not on a
separate, FIN only packet.
2. The size of the last data chunk on the receive side matches
exactly with the size of buffer posted by the receiver, and the
receiver closes the socket without any further read attempt.

This bug was first noticed on our netperf based testbed for our IW10
proposal to IETF where a large number of RST packets were observed.
netperf's read side code meets the condition 2 above 100%.

Before the fix, tcp_data_queue() will queue the last skb that meets
condition 1 to sk_receive_queue even though it has fully copied out
(skb_copy_datagram_iovec()) the data. Then if condition 2 is also met,
tcp_recvmsg() often returns all the copied out data successfully
without actually consuming the skb, due to a check
"if ((chunk = len - tp->ucopy.len) != 0) {"
and
"len -= chunk;"
after tcp_prequeue_process() that causes "len" to become 0 and an
early exit from the big while loop.

I don't see any reason not to free the skb whose data have been fully
consumed in tcp_data_queue(), regardless of the FIN flag.  We won't
get there if MSG_PEEK is on. Am I missing some arcane cases related
to urgent data?

Signed-off-by: H.K. Jerry Chu <hkchu@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-01-25 13:46:30 -08:00