Commit Graph

17679 Commits (014bcd28a04ec085940ea3125f5ea91b16a1945d)

Author SHA1 Message Date
John W. Linville 1937721f56 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 2010-12-02 14:00:51 -05:00
David S. Miller ae4694b2d3 ipv6: Create inet6_csk_route_req().
Brother of ipv4's inet_csk_route_req().

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-02 10:59:22 -08:00
David S. Miller ccb7c410dd timewait_sock: Create and use getpeer op.
The only thing AF-specific about remembering the timestamp
for a time-wait TCP socket is getting the peer.

Abstract that behind a new timewait_sock_ops vector.

Support for real IPV6 sockets is not filled in yet, but
curiously this makes timewait recycling start to work
for v4-mapped ipv6 sockets.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 18:09:13 -08:00
David S. Miller 8790ca172a inetpeer: Kill use of inet_peer_address_t typedef.
They are verboten these days.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 17:28:18 -08:00
Andrei Emeltchenko 70f23020e6 Bluetooth: clean up hci code
Do not use assignment in IF condition, remove extra spaces,
fixing typos, simplify code.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Andrei Emeltchenko 894718a6be Bluetooth: clean up l2cap code
Do not initialize static vars to zero, macros with complex values
shall be enclosed with (), remove unneeded braces.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Andrei Emeltchenko 285b4e9031 Bluetooth: clean up rfcomm code
Remove extra spaces, assignments in if statement, zeroing static
variables, extra braces. Fix includes.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Andrei Emeltchenko 735cbc4784 Bluetooth: clean up sco code
Do not use assignments in IF condition, remove extra spaces

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Anderson Lizardo b78d7b4f20 Bluetooth: Fix error handling for l2cap_init()
create_singlethread_workqueue() may fail with errors such as -ENOMEM. If
this happens, the return value is not set to a negative value and the
module load will succeed. It will then crash on module unload because of
a destroy_workqueue() call on a NULL pointer.

Additionally, the _busy_wq workqueue is not being destroyed if any
errors happen on l2cap_init().

Signed-off-by: Anderson Lizardo <anderson.lizardo@openbossa.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Gustavo F. Padovan eeb366564b Bluetooth: Get rid of __rfcomm_get_sock_by_channel()
rfcomm_get_sock_by_channel() was the only user of this function, so I merged
both into rfcomm_get_sock_by_channel(). The socket lock now should be hold
outside of rfcomm_get_sock_by_channel() once we hold and release it inside the
same function now.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:43 -02:00
Gustavo F. Padovan e0f0cb5636 Bluetooth: Get rid of __l2cap_get_sock_by_psm()
l2cap_get_sock_by_psm() was the only user of this function, so I merged
both into l2cap_get_sock_by_psm(). The socket lock now should be hold
outside of l2cap_get_sock_by_psm() once we hold and release it inside the
same function now.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:42 -02:00
Andrei Emeltchenko cc11b9c14d Bluetooth: do not use assignment in if condition
Fix checkpatch errors like:
"ERROR: do not use assignment in if condition"
Simplify code and fix one long line.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Ville Tervo <ville.tervo@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:36 -02:00
Andrei Emeltchenko 940a9eea80 Bluetooth: timer check sk is not owned before freeing
In timer context we might delete l2cap channel used by krfcommd.
The check makes sure that sk is not owned. If sk is owned we
restart timer for HZ/5.

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:36 -02:00
Andrei Emeltchenko a49184c229 Bluetooth: Check sk is not owned before freeing l2cap_conn
Check that socket sk is not locked in user process before removing
l2cap connection handler.

lock_sock and release_sock do not hold a normal spinlock directly but
instead hold the owner field. This means bh_lock_sock can still execute
even if the socket is "locked". More info can be found here:
http://www.linuxfoundation.org/collaborate/workgroups/networking/socketlocks

krfcommd kernel thread may be preempted with l2cap tasklet which remove
l2cap_conn structure. If krfcommd is in process of sending of RFCOMM reply
(like "RFCOMM UA" reply to "RFCOMM DISC") then kernel crash happens.

...
[  694.175933] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[  694.184936] pgd = c0004000
[  694.187683] [00000000] *pgd=00000000
[  694.191711] Internal error: Oops: 5 [#1] PREEMPT
[  694.196350] last sysfs file: /sys/devices/platform/hci_h4p/firmware/hci_h4p/loading
[  694.260375] CPU: 0    Not tainted  (2.6.32.10 #1)
[  694.265106] PC is at l2cap_sock_sendmsg+0x43c/0x73c [l2cap]
[  694.270721] LR is at 0xd7017303
...
[  694.525085] Backtrace:
[  694.527587] [<bf266be0>] (l2cap_sock_sendmsg+0x0/0x73c [l2cap]) from [<c02f2cc8>] (sock_sendmsg+0xb8/0xd8)
[  694.537292] [<c02f2c10>] (sock_sendmsg+0x0/0xd8) from [<c02f3044>] (kernel_sendmsg+0x48/0x80)

Signed-off-by: Andrei Emeltchenko <andrei.emeltchenko@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:36 -02:00
Vasiliy Kulikov d31dbf6e59 Bluetooth: hidp: fix information leak to userland
Structure hidp_conninfo is copied to userland with version, product,
vendor and name fields unitialized if both session->input and session->hid
are NULL.  It leads to leaking of contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:36 -02:00
Vasiliy Kulikov 3185fbd9d7 Bluetooth: cmtp: fix information leak to userland
Structure cmtp_conninfo is copied to userland with some padding fields
unitialized.  It leads to leaking of contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Vasiliy Kulikov 5520d20f68 Bluetooth: bnep: fix information leak to userland
Structure bnep_conninfo is copied to userland with the field "device"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Johan Hedberg 127178d24c Bluetooth: Automate remote name requests
In Bluetooth there are no automatic updates of remote device names when
they get changed on the remote side. Instead, it is a good idea to do a
manual name request when a new connection gets created (for whatever
reason) since at this point it is very cheap (no costly baseband
connection creation needed just for the sake of the name request).

So far userspace has been responsible for this extra name request but
tighter control is needed in order not to flood Bluetooth controllers
with two many commands during connection creation. It has been shown
that some controllers simply fail to function correctly if they get too
many (almost) simultaneous commands during connection creation. The
simplest way to acheive better control of these commands is to move
their sending completely to the kernel side.

This patch inserts name requests into the sequence of events that the
kernel performs during connection creation. It does this after the
remote features have been successfully requested and before any pending
authentication requests are performed. The code will work sub-optimally
with userspace versions that still do the name requesting themselves (it
shouldn't break anything though) so it is recommended to combine this
with a userspace software version that doesn't have automated name
requests.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Johan Hedberg 392599b95d Bluetooth: Create a unified authentication request function
This patch adds a single function that's responsible for requesting
authentication for outgoing connections. This is preparation for the
next patch which will add automated name requests and thereby move the
authentication requests to a different location.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Johan Hedberg ccd556fe33 Bluetooth: Simplify remote features callback function logic
The current remote and remote extended features event callbacks logic
can be made simpler by using a label and goto statements instead of the
current multiple levels of nested if statements.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-12-01 21:04:35 -02:00
Gustavo F. Padovan 17f490bced Merge git://git.kernel.org/pub/scm/linux/kernel/git/padovan/bluetooth-2.6 into test 2010-12-01 21:04:09 -02:00
David McCullough 6dcdd1b369 net/ipv6/sit.c: return unhandled skb to tunnel4_rcv
I found a problem using an IPv6 over IPv4 tunnel.  When CONFIG_IPV6_SIT
was enabled, the packets would be rejected as net/ipv6/sit.c was catching
all IPPROTO_IPV6 packets and returning an ICMP port unreachable error.

I think this patch fixes the problem cleanly.  I believe the code in
net/ipv4/tunnel4.c:tunnel4_rcv takes care of it properly if none of the
handlers claim the skb.

Signed-off-by: David McCullough <david_mccullough@mcafee.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 13:19:34 -08:00
stephen hemminger 8afe7c8acd ipip: add module alias for tunl0 tunnel device
If ipip is built as a module the 'ip tunnel add' command would fail because
the ipip module was not being autoloaded.  Adding an alias for
the tunl0 device name cause dev_load() to autoload it when needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 12:53:23 -08:00
stephen hemminger 4da6a738ff gre: add module alias for gre0 tunnel device
If gre is built as a module the 'ip tunnel add' command would fail because
the ip_gre module was not being autoloaded.  Adding an alias for
the gre0 device name cause dev_load() to autoload it when needed.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 12:53:22 -08:00
stephen hemminger 407d6fcbfd gre: minor cleanups
Use strcpy() rather the sprintf() for the case where name is getting
generated.  Fix indentation.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 12:53:22 -08:00
Eric Dumazet f2cd2d3e9b net sched: use xps information for qdisc NUMA affinity
Allocate qdisc memory according to NUMA properties of cpus included in
xps map.

To be effective, qdisc should be (re)setup after changes
of /sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

I added a numa_node field in struct netdev_queue, containing NUMA node
if all cpus included in xps_cpus share same node, else -1.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 12:47:42 -08:00
Anders Franzen 381601e5bb Make the ip6_tunnel reflect the true mtu.
The ip6_tunnel always assumes it consumes 40 bytes (ip6 hdr) of the mtu of the
underlaying device. So for a normal ethernet bearer, the mtu of the ip6_tunnel is
1460.
However, when creating a tunnel the encap limit option is enabled by default, and it
consumes 8 bytes more, so the true mtu shall be 1452.

I dont really know if this breaks some statement in some RFC, so this is a request for
comments.

Signed-off-by: Anders Franzen <anders.franzen@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-12-01 10:55:47 -08:00
David S. Miller 3f419d2d48 inet: Turn ->remember_stamp into ->get_peer in connection AF ops.
Then we can make a completely generic tcp_remember_stamp()
that uses ->get_peer() as a helper, minimizing the AF specific
code and minimizing the eventual code duplication when we implement
the ipv6 side of TW recycling.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 12:28:06 -08:00
David S. Miller b341936380 ipv6: Add infrastructure to bind inet_peer objects to routes.
They are only allowed on cached ipv6 routes.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 12:27:11 -08:00
David S. Miller 021e929911 inetpeer: Add v6 peers tree, abstract root properly.
Add the ipv6 peer tree instance, and adapt remaining
direct references to 'v4_peers' as needed.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 12:12:23 -08:00
David S. Miller 0266304502 inetpeer: Abstract address comparisons.
Now v4 and v6 addresses will both work properly.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 12:08:53 -08:00
David S. Miller b534ecf1cd inetpeer: Make inet_getpeer() take an inet_peer_adress_t pointer.
And make an inet_getpeer_v4() helper, update callers.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 11:54:19 -08:00
David S. Miller 582a72da9a inetpeer: Introduce inet_peer_address_t.
Currently only the v4 aspect is used, but this will change.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 11:53:55 -08:00
David S. Miller 98158f5a85 inetpeer: Abstract out the tree root accesses.
Instead of directly accessing "peer", change to code to
operate using a "struct inet_peer_base *" pointer.

This will facilitate the addition of a seperate tree for
ipv6 peer entries.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-30 11:41:59 -08:00
Helmut Schaa 08ca944eb2 mac80211: Minor optimization in ieee80211_rx_h_data
Remove a superfluous ieee80211_is_data check as that was checked a few
lines before already and we wont't get here for non-data frames at all.

Second, the frame was already converted to 802.3 header format and
reading the fc and addr1 fields was only possible because the 802.3
header is short enough and didn't overwrite the relevant parts of the
802.11 header. Make the code more obvious by checking the ethernet
header's h_dest field.

Furthermore reorder the conditions to reduce the number of checks
when dynamic powersave is not needed (AP mode for example).

Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-30 13:58:07 -05:00
Senthil Balasubramanian 8e26d5ad2f mac80211: Fix STA disconnect due to MIC failure
Th commit titled "mac80211: clean up rx handling wrt. found_sta"
removed found_sta variable which caused a MIC failure event
to be reported twice for a single failure to supplicant resulted
in STA disconnect.

This should fix WPA specific countermeasures WiFi test case (5.2.17)
issues with mac80211 based drivers which report MIC failure events in
rx status.

Cc: Stable <stable@kernel.org> (2.6.37)
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-30 13:45:02 -05:00
Christian Lamparter 2c31333a8f mac80211: ignore non-bcast mcast deauth/disassoc franes
This patch fixes an curious issue due to insufficient
rx frame filtering.

Saqeb Akhter reported frequent disconnects while streaming
videos over samba: <http://marc.info/?m=128600031109136>
> [ 1166.512087] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [ 1526.059997] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [ 2125.324356] wlan1: deauthenticated from 30:46:9a:10:49:f7 (Reason: 7)
> [...]

The reason is that the device generates frames with slightly
bogus SA/TA addresses.

e.g.:
 [ 2314.402316] Ignore 9f:1f:31:f8:64:ff
 [ 2314.402321] Ignore 9f:1f:31:f8:64:ff
 [ 2352.453804] Ignore 0d:1f:31:f8:64:ff
 [ 2352.453808] Ignore 0d:1f:31:f8:64:ff
 					   ^^ the group-address flag is set!
 (the correct SA/TA would be: 00:1f:31:f8:64:ff)

Since the AP does not know from where the frames come, it
generates a DEAUTH response for the (invalid) mcast address.
This mcast deauth frame then passes through all filters and
tricks the stack into thinking that the AP brutally kicked
us!

This patch fixes the problem by simply ignoring
non-broadcast, group-addressed deauth/disassoc frames.

Cc: Jouni Malinen <j@w1.fi>
Cc: Johannes Berg <johannes@sipsolutions.net>
Reported-by: Saqeb Akhter <saqeb.akhter@gmail.com>
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-30 13:23:06 -05:00
Linus Torvalds a01af8e4a4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (27 commits)
  af_unix: limit recursion level
  pch_gbe driver: The wrong of initializer entry
  pch_gbe dreiver: chang author
  ucc_geth: fix ucc halt problem in half duplex mode
  inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners
  ehea: Add some info messages and fix an issue
  hso: fix disable_net
  NET: wan/x25_asy, move lapb_unregister to x25_asy_close_tty
  cxgb4vf: fix setting unicast/multicast addresses ...
  net, ppp: Report correct error code if unit allocation failed
  DECnet: don't leak uninitialized stack byte
  au1000_eth: fix invalid address accessing the MAC enable register
  dccp: fix error in updating the GAR
  tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
  netns: Don't leak others' openreq-s in proc
  Net: ceph: Makefile: Remove unnessary code
  vhost/net: fix rcu check usage
  econet: fix CVE-2010-3848
  econet: fix CVE-2010-3850
  econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
  ...
2010-11-29 14:36:33 -08:00
Johannes Berg dd318575ff mac80211: fix RX aggregation locking
The RX aggregation locking documentation was
wrong, which led Christian to also code the
timer timeout handling for it somewhat wrongly.

Fix the documentation, the two places that
need to hold the reorder lock across accesses
to the structure, and the debugfs code that
should just use RCU.

Also, remove acquiring the sta->lock across
reorder timeouts since it isn't necessary, and
change a few places to GFP_KERNEL because the
code path here doesn't need atomic allocations
as I noticed when reviewing all this.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Acked-by: Christian Lamparter <chunkeey@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:30:30 -05:00
Johannes Berg f30221e4ec mac80211: implement off-channel mgmt TX
This implements the new off-channel TX API
in mac80211 with a new work item type. The
operation doesn't add a new work item when
we're on the right channel and there's no
wait time so that for example p2p probe
responses will be transmitted without delay.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:24:35 -05:00
Johannes Berg f7ca38dfe5 nl80211/cfg80211: extend mgmt-tx API for off-channel
With p2p, it is sometimes necessary to transmit
a frame (typically an action frame) on another
channel than the current channel. Enable this
through the CMD_FRAME API, and allow it to wait
for a response. A new command allows that wait
to be aborted.

However, allow userspace to specify whether or
not it wants to allow off-channel TX, it may
actually want to use the same channel only.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 15:24:35 -05:00
Jouni Malinen 7dff312553 mac80211: Fix frame injection using non-AP vif
In order for frame injection to work properly for some use cases
(e.g., finding the station entry and keys for encryption), mac80211
needs to find the correct sdata entry. This works when the main vif
is in AP mode, but commit a2c1e3dad5
broke this particular use case for station main vif. While this type of
injection is quite unusual operation, it has some uses and we should fix
it. Do this by changing the monitor vif sdata selection to allow station
vif to be selected instead of limiting it to just AP vifs. We still need
to skip some iftypes to avoid selecting unsuitable vif for injection.

Signed-off-by: Jouni Malinen <jouni.malinen@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-29 14:41:28 -05:00
David S. Miller 77148625e1 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-29 11:19:09 -08:00
Eric Dumazet 25888e3031 af_unix: limit recursion level
Its easy to eat all kernel memory and trigger NMI watchdog, using an
exploit program that queues unix sockets on top of others.

lkml ref : http://lkml.org/lkml/2010/11/25/8

This mechanism is used in applications, one choice we have is to have a
recursion limit.

Other limits might be needed as well (if we queue other types of files),
since the passfd mechanism is currently limited by socket receive queue
sizes only.

Add a recursion_level to unix socket, allowing up to 4 levels.

Each time we send an unix socket through sendfd mechanism, we copy its
recursion level (plus one) to receiver. This recursion level is cleared
when socket receive queue is emptied.

Reported-by: Марк Коренберг <socketpair@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:45:15 -08:00
Eric Dumazet a417786948 xps: add __rcu annotations
Avoid sparse warnings : add __rcu annotations and use
rcu_dereference_protected() where necessary.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:43:13 -08:00
Eric Dumazet b02038a17b xps: NUMA allocations for per cpu data
store_xps_map() allocates maps that are used by single cpu, it makes
sense to use NUMA allocations.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-29 09:43:13 -08:00
Tom Herbert bf26414510 xps: Add CONFIG_XPS
This patch adds XPS_CONFIG option to enable and disable XPS.  This is
done in the same manner as RPS_CONFIG.  This is also fixes build
failure in XPS code when SMP is not enabled.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 18:24:14 -08:00
Nagendra Tomar b4ff3c90e6 inet: Fix __inet_inherit_port() to correctly increment bsockets and num_owners
inet sockets corresponding to passive connections are added to the bind hash
using ___inet_inherit_port(). These sockets are later removed from the bind
hash using __inet_put_port(). These two functions are not exactly symmetrical.
__inet_put_port() decrements hashinfo->bsockets and tb->num_owners, whereas
___inet_inherit_port() does not increment them. This results in both of these
going to -ve values.

This patch fixes this by calling inet_bind_hash() from ___inet_inherit_port(),
which does the right thing.

'bsockets' and 'num_owners' were introduced by commit a9d8f9110d
(inet: Allowing more than 64k connections and heavily optimize bind(0))

Signed-off-by: Nagendra Singh Tomar <tomer_iisc@yahoo.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Evgeniy Polyakov <zbr@ioremap.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 18:18:44 -08:00
Dan Rosenberg 3c6f27bf33 DECnet: don't leak uninitialized stack byte
A single uninitialized padding byte is leaked to userspace.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
CC: stable <stable@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:32:30 -08:00
Gerrit Renker 0ac7887022 dccp: fix error in updating the GAR
This fixes a bug in updating the Greatest Acknowledgment number Received (GAR):
the current implementation does not track the greatest received value -
lower values in the range AWL..AWH (RFC 4340, 7.5.1) erase higher ones.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:29:27 -08:00
Jozsef Kadlecsik 82a39eb6b3 ipv6: Prepare the tree for un-inlined jhash.
jhash is widely used in the kernel and because the functions
are inlined, the cost in size is significant. Also, the new jhash
functions are slightly larger than the previous ones so better un-inline.
As a preparation step, the calls to the internal macros are replaced
with the plain jhash function calls.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:26:21 -08:00
andrew hendry 3f0a069a1d X25 remove bkl in call user data length ioctl
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:22 -08:00
andrew hendry 74a7e44080 X25 remove bkl from causediag ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:21 -08:00
andrew hendry 5b7958dfa5 X25 remove bkl from calluserdata ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:21 -08:00
andrew hendry f90de66067 X25 remove bkl in facility ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:20 -08:00
andrew hendry 5595a1a599 X25 remove bkl in subscription ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 11:12:20 -08:00
John Fastabend 19eb5cc559 8021q: vlan device is lockless do not transfer real_num_{tx|rx}_queues
Now that the vlan device is lockless and single queue do not
transfer the real num queues. This is causing a BUG_ON to occur.

kernel BUG at net/8021q/vlan.c:345!
Call Trace:
[<ffffffff813fd6e8>] ? fib_rules_event+0x28/0x1b0
[<ffffffff814ad2b5>] notifier_call_chain+0x55/0x80
[<ffffffff81089156>] raw_notifier_call_chain+0x16/0x20
[<ffffffff813e5af7>] call_netdevice_notifiers+0x37/0x70
[<ffffffff813e6756>] netdev_features_change+0x16/0x20
[<ffffffffa02995be>] ixgbe_fcoe_enable+0xae/0x100 [ixgbe]
[<ffffffffa01da06a>] vlan_dev_fcoe_enable+0x2a/0x30 [8021q]
[<ffffffffa02d08c3>] fcoe_create+0x163/0x630 [fcoe]
[<ffffffff811244d5>] ? mmap_region+0x255/0x5a0
[<ffffffff81080ef0>] param_attr_store+0x50/0x80
[<ffffffff810809b6>] module_attr_store+0x26/0x30
[<ffffffff811b9db2>] sysfs_write_file+0xf2/0x180
[<ffffffff8114fc88>] vfs_write+0xc8/0x190
[<ffffffff81150621>] sys_write+0x51/0x90
[<ffffffff8100c0b2>] system_call_fastpath+0x16/0x1b

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:19 -08:00
Eric Dumazet 5a0d2268d2 net: add netif_tx_queue_frozen_or_stopped
When testing struct netdev_queue state against FROZEN bit, we also test
XOFF bit. We can test both bits at once and save some cycles.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:18 -08:00
Shan Wei d3c15cab21 ipv6: kill two unused macro definition
1. IPV6_TLV_TEL_DST_SIZE
This has not been using for several years since created.

2. RT6_INFO_LEN
commit 33120b30 kill all RT6_INFO_LEN's references, but only this definition remained.

commit 33120b30cc
Author: Alexey Dobriyan <adobriyan@sw.ru>
Date:   Tue Nov 6 05:27:11 2007 -0800

    [IPV6]: Convert /proc/net/ipv6_route to seq_file interface

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:17 -08:00
Uwe Kleine-König a40c9f88b5 net: add some KERN_CONT markers to continuation lines
Cc: netdev@vger.kernel.org
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:47:17 -08:00
Alexey Dobriyan 0147fc058d tcp: restrict net.ipv4.tcp_adv_win_scale (#20312)
tcp_win_from_space() does the following:

      if (sysctl_tcp_adv_win_scale <= 0)
              return space >> (-sysctl_tcp_adv_win_scale);
      else
              return space - (space >> sysctl_tcp_adv_win_scale);

"space" is int.

As per C99 6.5.7 (3) shifting int for 32 or more bits is
undefined behaviour.

Indeed, if sysctl_tcp_adv_win_scale is exactly 32,
space >> 32 equals space and function returns 0.

Which means we busyloop in tcp_fixup_rcvbuf().

Restrict net.ipv4.tcp_adv_win_scale to [-31, 31].

Fix https://bugzilla.kernel.org/show_bug.cgi?id=20312

Steps to reproduce:

      echo 32 >/proc/sys/net/ipv4/tcp_adv_win_scale
      wget www.kernel.org
      [softlockup]

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-28 10:39:45 -08:00
Pavel Emelyanov 8475ef9fd1 netns: Don't leak others' openreq-s in proc
The /proc/net/tcp leaks openreq sockets from other namespaces.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 22:57:48 -08:00
Thomas Graf cf7afbfeb8 rtnl: make link af-specific updates atomic
As David pointed out correctly, updates to af-specific attributes
are currently not atomic. If multiple changes are requested and
one of them fails, previous updates may have been applied already
leaving the link behind in a undefined state.

This patch splits the function parse_link_af() into two functions
validate_link_af() and set_link_at(). validate_link_af() is placed
to validate_linkmsg() check for errors as early as possible before
any changes to the link have been made. set_link_af() is called to
commit the changes later.

This method is not fail proof, while it is currently sufficient
to make set_link_af() inerrable and thus 100% atomic, the
validation function method will not be able to detect all error
scenarios in the future, there will likely always be errors
depending on states which are f.e. not protected by rtnl_mutex
and thus may change between validation and setting.

Also, instead of silently ignoring unknown address families and
config blocks for address families which did not register a set
function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are
returned to avoid comitting 4 out of 5 update requests without
notifying the user.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 22:56:08 -08:00
Tracey Dent 4cb6a614ba Net: ceph: Makefile: Remove unnessary code
Remove the if and else conditional because the code is in mainline and there
is no need in it being there.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-27 17:39:29 -08:00
Linus Torvalds 19650e8580 Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  NFS: Ensure we return the dirent->d_type when it is known
  NFS: Correct the array bound calculation in nfs_readdir_add_to_array
  NFS: Don't ignore errors from nfs_do_filldir()
  NFS: Fix the error handling in "uncached_readdir()"
  NFS: Fix a page leak in uncached_readdir()
  NFS: Fix a page leak in nfs_do_filldir()
  NFS: Assume eof if the server returns no readdir records
  NFS: Buffer overflow in ->decode_dirent() should not be fatal
  Pure nfs client performance using odirect.
  SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult
2010-11-27 07:30:30 +09:00
John W. Linville 51cce8a590 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem 2010-11-24 16:49:20 -05:00
Johannes Berg 99ba2a1428 mac80211: implement packet loss notification
For drivers that have accurate TX status reporting
we can report the number of consecutive lost packets
to userspace using the new cfg80211 CQM event. The
threshold is fixed right now, this may need to be
improved in the future.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Johannes Berg c063dbf52b cfg80211: allow using CQM event to notify packet loss
This adds the ability for drivers to use CQM events
to notify about packet loss for specific stations
(which could be the AP for the managed mode case).
Since the threshold might be determined by the
driver (it isn't passed in right now) it will be
passed out of the driver to userspace in the event.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Luis R. Rodriguez 48124d1a91 mac80211: avoid aggregation for VO traffic
This should help with latency issues which can happen when
using aggregation.

Cc: Felix Fietkau <nbd@openwrt.org>
Cc: Matt Smith <matt.smith@atheros.com>
Cc: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:36 -05:00
Felix Fietkau 72a8a3edd6 mac80211: reduce the number of retries for nullfunc probing
Since nullfunc frames are transmitted as unicast frames, they're more
reliable than the broadcast probe requests, so we need fewer retries
to figure out whether the AP is really gone.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau 4e5ff37692 mac80211: use nullfunc instead of probe request for connection monitoring
nullfunc frames are better for connection monitoring, because probe requests
are answered even if the AP has already dropped the connection, whereas
nullfunc frames from an unassociated station will trigger a disassoc/deauth
frame from the AP (WLAN_REASON_CLASS3_FRAME_FROM_NONASSOC_STA), which allows
the station to reconnect immediately instead of waiting until it attempts to
transmit the next unicast frame.

This only works on hardware with reliable tx ACK reporting, any other hardware
needs to fall back to the probe request method.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau dd5b4cc71c cfg80211/mac80211: improve ad-hoc multicast rate handling
- store the multicast rate as an index instead of the rate value
  (reduces cpu overhead in a hotpath)
- validate the rate values (must match a bitrate in at least one sband)

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:35 -05:00
Felix Fietkau 46090979a5 mac80211: probe the AP when resuming
Check the connection by probing the AP (either using nullfunc or a
probe request). If nullfunc probing is supported and the assoc is no
longer valid, the AP will send a disassoc/deauth immediately.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Felix Fietkau 7ccc8bd759 mac80211: calculate beacon loss time accurately
Instead of using a fixed 2 second timeout, calculate beacon loss interval
from the advertised beacon interval and a frame count.  With this beacon
loss happens after N (default 7) consecutive frames are missed which
for a typical setup (100TU beacon interval) is ~700ms (or ~1/3 previous).

Signed-off-by: Sam Leffler <sleffler@chromium.org>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Felix Fietkau c8a7972c3b mac80211: restart beacon miss timer on system resume from suspend
Signed-off-by: Paul Stewart <pstew@google.com>
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:34 -05:00
Joe Perches e9c0268f02 net/wireless: Use pr_<level> and netdev_<level>
No change in output for pr_<level> prefixes.
netdev_<level> output is different, arguably improved.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:19:33 -05:00
John W. Linville d7a066c923 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-11-24 16:19:24 -05:00
John W. Linville ccb1435401 Revert "nl80211/mac80211: Report signal average"
This reverts commit 86107fd170.

This patch inadvertantly changed the userland ABI.

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-24 16:18:36 -05:00
Phil Blundell a27e13d370 econet: fix CVE-2010-3848
Don't declare variable sized array of iovecs on the stack since this
could cause stack overflow if msg->msgiovlen is large.  Instead, coalesce
the user-supplied data into a new buffer and use a single iovec for it.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:51:47 -08:00
Phil Blundell 16c41745c7 econet: fix CVE-2010-3850
Add missing check for capable(CAP_NET_ADMIN) in SIOCSIFADDR operation.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:49:53 -08:00
Phil Blundell fa0e846494 econet: disallow NULL remote addr for sendmsg(), fixes CVE-2010-3849
Later parts of econet_sendmsg() rely on saddr != NULL, so return early
with EINVAL if NULL was passed otherwise an oops may occur.

Signed-off-by: Phil Blundell <philb@gnu.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:49:19 -08:00
David S. Miller c39508d6f1 tcp: Make TCP_MAXSEG minimum more correct.
Use TCP_MIN_MSS instead of constant 64.

Reported-by: Min Zhang <mzhang@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:47:22 -08:00
Tom Herbert 1d24eb4815 xps: Transmit Packet Steering
This patch implements transmit packet steering (XPS) for multiqueue
devices.  XPS selects a transmit queue during packet transmission based
on configuration.  This is done by mapping the CPU transmitting the
packet to a queue.  This is the transmit side analogue to RPS-- where
RPS is selecting a CPU based on receive queue, XPS selects a queue
based on the CPU (previously there was an XPS patch from Eric
Dumazet, but that might more appropriately be called transmit completion
steering).

Each transmit queue can be associated with a number of CPUs which will
use the queue to send packets.  This is configured as a CPU mask on a
per queue basis in:

/sys/class/net/eth<n>/queues/tx-<n>/xps_cpus

The mappings are stored per device in an inverted data structure that
maps CPUs to queues.  In the netdevice structure this is an array of
num_possible_cpu structures where each structure holds and array of
queue_indexes for queues which that CPU can use.

The benefits of XPS are improved locality in the per queue data
structures.  Also, transmit completions are more likely to be done
nearer to the sending thread, so this should promote locality back
to the socket on free (e.g. UDP).  The benefits of XPS are dependent on
cache hierarchy, application load, and other factors.  XPS would
nominally be configured so that a queue would only be shared by CPUs
which are sharing a cache, the degenerative configuration woud be that
each CPU has it's own queue.

Below are some benchmark results which show the potential benfit of
this patch.  The netperf test has 500 instances of netperf TCP_RR test
with 1 byte req. and resp.

bnx2x on 16 core AMD
   XPS (16 queues, 1 TX queue per CPU)  1234K at 100% CPU
   No XPS (16 queues)                   996K at 100% CPU

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:44:20 -08:00
Tom Herbert 3853b5841c xps: Improvements in TX queue selection
In dev_pick_tx, don't do work in calculating queue
index or setting
the index in the sock unless the device has more than one queue.  This
allows the sock to be set only with a queue index of a multi-queue
device which is desirable if device are stacked like in a tunnel.

We also allow the mapping of a socket to queue to be changed.  To
maintain in order packet transmission a flag (ooo_okay) has been
added to the sk_buff structure.  If a transport layer sets this flag
on a packet, the transmit queue can be changed for the socket.
Presumably, the transport would set this if there was no possbility
of creating OOO packets (for instance, there are no packets in flight
for the socket).  This patch includes the modification in TCP output
for setting this flag.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:44:19 -08:00
Eric Dumazet bba14de987 scm: lower SCM_MAX_FD
Lower SCM_MAX_FD from 255 to 253 so that allocations for scm_fp_list are
halved. (commit f8d570a4 added two pointers in this structure)

scm_fp_dup() should not copy whole structure (and trigger kmemcheck
warnings), but only the used part. While we are at it, only allocate
needed size.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:16:43 -08:00
Eric Dumazet 456b61bca8 ipv6: mcast: RCU conversion
ipv6_sk_mc_lock rwlock becomes a spinlock.

readers (inet6_mc_check()) now takes rcu_read_lock() instead of read
lock. Writers dont need to disable BH anymore.

struct ipv6_mc_socklist objects are reclaimed after one RCU grace
period.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 11:16:42 -08:00
Eric Dumazet 9915672d41 af_unix: limit unix_tot_inflight
Vegard Nossum found a unix socket OOM was possible, posting an exploit
program.

My analysis is we can eat all LOWMEM memory before unix_gc() being
called from unix_release_sock(). Moreover, the thread blocked in
unix_gc() can consume huge amount of time to perform cleanup because of
huge working set.

One way to handle this is to have a sensible limit on unix_tot_inflight,
tested from wait_for_unix_gc() and to force a call to unix_gc() if this
limit is hit.

This solves the OOM and also reduce overall latencies, and should not
slowdown normal workloads.

Reported-by: Vegard Nossum <vegard.nossum@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-24 09:15:27 -08:00
Linus Torvalds 3cbaa0f7a7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  of/phylib: Use device tree properties to initialize Marvell PHYs.
  phylib: Add support for Marvell 88E1149R devices.
  phylib: Use common page register definition for Marvell PHYs.
  qlge: Fix incorrect usage of module parameters and netdev msg level
  ipv6: fix missing in6_ifa_put in addrconf
  SuperH IrDA: correct Baud rate error correction
  atl1c: Fix hardware type check for enabling OTP CLK
  net: allow GFP_HIGHMEM in __vmalloc()
  bonding: change list contact to netdev@vger.kernel.org
  e1000: fix screaming IRQ
2010-11-24 08:22:34 +09:00
Helmut Schaa 18890d4b89 mac80211: Disable hw crypto for GTKs on AP VLAN interfaces
When using AP VLAN interfaces, each VLAN interface should be in its own
broadcast domain. Hostapd achieves this by assigning different GTKs to
different AP VLAN interfaces.

However, mac80211 drivers are not aware of AP VLAN interfaces and as
such mac80211 sends the GTK to the driver in the context of the base AP
mode interface. This causes problems when multiple AP VLAN interfaces
are used since the driver will use the same key slot for the different
GTKs (there's no way for the driver to distinguish the different GTKs
from different AP VLAN interfaces). Thus, only the clients associated
to one AP VLAN interface (the one that was created last) can actually
use broadcast traffic.

Fix this by not programming any GTKs for AP VLAN interfaces into the hw
but fall back to using software crypto. The GTK for the underlying AP
interface is still sent to the driver.

That means, broadcast traffic to stations associated to an AP VLAN
interface is encrypted in software whereas broadcast traffic to
stations associated to the non-VLAN AP interface is encrypted in
hardware.

Cc: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Helmut Schaa <helmut.schaa@googlemail.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:51 -05:00
Luis R. Rodriguez b2e253cf30 cfg80211: Fix regulatory bug with multiple cards and delays
When two cards are connected with the same regulatory domain
if CRDA had a delayed response then cfg80211's own set regulatory
domain would still be the world regulatory domain. There was a bug
on cfg80211's logic such that it assumed that once you pegged a
request as the last request it was already the currently set
regulatory domain. This would mean we would race setting a stale
regulatory domain to secondary cards which had the same regulatory
domain since the alpha2 would match.

We fix this by processing each regulatory request atomically,
and only move on to the next one once we get it fully processed.
In the case CRDA is not present we will simply world roam.

This issue is only present when you have a slow system and the
CRDA processing is delayed. Because of this it is not a known
regression.

Without this fix when a delay is present with CRDA the second card
would end up with an intersected regulatory domain and not allow it
to use the channels it really is designed for. When two cards with
two different regulatory domains were inserted you'd end up
rejecting the second card's regulatory domain request.
This fails with mac80211_hswim's regtest=2 (two requests, same alpha2)
and regtest=3 (two requests, different alpha2) module parameter
options.

This was reproduced and tested against mac80211_hwsim using this
CRDA delayer:

       #!/bin/bash
       echo $COUNTRY >> /tmp/log
       sleep 2
       /sbin/crda.orig

And these regulatory tests:

       modprobe mac80211_hwsim regtest=2
       modprobe mac80211_hwsim regtest=3

Reported-by: Mark Mentovai <mark@moxienet.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:51 -05:00
Luis R. Rodriguez b0e2880b05 cfg80211: move mutex locking to reg_process_pending_hints()
This will be required in the next patch and it makes the
next patch easier to review.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:50 -05:00
Luis R. Rodriguez f333a7a2f4 cfg80211: move reg_work and reg_todo above
These will be used earlier in the next few patches.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:50 -05:00
Luis R. Rodriguez 31e99729ae cfg80211: put core regulatory request into queue
This will simplify the synchronization for pending requests.
Without this we have a race between the core and when we
restore regulatory settings, although this is unlikely
its best to just avoid that race altogether.

Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Tested-by: Mark Mentovai <mark@moxienet.com>
Tested-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-22 15:48:50 -05:00
Gustavo F. Padovan c89ad73722 Bluetooth: Fix not returning proper error in SCO
Return 0 in that situation could lead to errors in the caller.

Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-11-22 18:23:18 -02:00
Trond Myklebust 5fc43978a7 SUNRPC: Fix an infinite loop in call_refresh/call_refreshresult
If the rpcauth_refreshcred() call returns an error other than
EACCES, ENOMEM or ETIMEDOUT, we currently end up looping forever
between call_refresh and call_refreshresult.

The correct thing to do here is to exit on all errors except
EAGAIN and ETIMEDOUT, for which case we retry 3 times, then
return EACCES.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-22 13:22:39 -05:00
Tracey Dent e5700c740d Net: wanrouter: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:16 -08:00
Tracey Dent fdb26195f4 Net: sunrpc: auth_gss: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:16 -08:00
Tracey Dent 9ed05ad3c0 Net: rxrpc: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:15 -08:00
Tracey Dent 094f2faaa2 Net: rds: Makefile: Remove deprecated items
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Also, use the ccflags-$ flag instead of EXTRA_CFLAGS because EXTRA_CFLAGS is
deprecated and should now be switched.

Last but not least, took out if-conditionals.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:15 -08:00
Tracey Dent 927a41f50c Net: phonet: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:14 -08:00
Tracey Dent 64387df8da Net: lapb: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:14 -08:00
Tracey Dent 94ee288e94 Net: irda: irnet: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:13 -08:00
Tracey Dent cd30c62024 Net: irda: irlan: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:13 -08:00
Tracey Dent f91c4ae498 Net: irda: ircomm: Makefile: Remove deprecated kbuild goal defintions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:12 -08:00
Tracey Dent 4de58dfebe Net: ipv6: netfiliter: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:12 -08:00
Tracey Dent 6b8ff8c517 Net: ipv4: netfilter: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:11 -08:00
Tracey Dent cc0bdac399 Net: econet: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:11 -08:00
Tracey Dent 22674a24b4 Net: dns_resolver: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:10 -08:00
Tracey Dent fa13bc3daa Net: ceph: Makefile: remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:10 -08:00
Tracey Dent bac14e0178 Net: can: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:09 -08:00
Tracey Dent a3106d032f Net: caif: Makefile: Remove deprecated items
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Also, use the ccflags-$ flag instead of EXTRA_CFLAGS because EXTRA_CFLAGS is
deprecated and should now be switched.

Last but not least, took out if-conditionals.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:09 -08:00
Tracey Dent 87217502d4 Net: bluetooth: Makefile: Remove deprecated kbuild goal definitions
Changed Makefile to use <modules>-y instead of <modules>-objs
because -objs is deprecated and not mentioned in
Documentation/kbuild/makefiles.txt.

Signed-off-by: Tracey Dent <tdent48227@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 08:16:08 -08:00
John Fastabend 88b2a9a3d9 ipv6: fix missing in6_ifa_put in addrconf
Fix ref count bug introduced by

commit 2de7957072
Author: Lorenzo Colitti <lorenzo@google.com>
Date:   Wed Oct 27 18:16:49 2010 +0000

ipv6: addrconf: don't remove address state on ifdown if the address
is being kept

Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
refcnt correctly with in6_ifa_put().

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-22 07:37:36 -08:00
Eric Dumazet 551eaff1b3 pktgen: allow faster module unload
Unloading pktgen module needs ~6 seconds on a 64 cpus machine, to stop
64 kthreads.

Add a pktgen_exiting variable to let kernel threads die faster, so that
kthread_stop() doesnt have to wait too long for them. This variable is
not tested in fast path.

Note : Before exiting from pktgen_thread_worker(), we must make sure
kthread_stop() is waiting for this thread to be stopped, like its done
in kernel/softirq.c

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-21 10:26:44 -08:00
Eric Dumazet 7a1c8e5ab1 net: allow GFP_HIGHMEM in __vmalloc()
We forgot to use __GFP_HIGHMEM in several __vmalloc() calls.

In ceph, add the missing flag.

In fib_trie.c, xfrm_hash.c and request_sock.c, using vzalloc() is
cleaner and allows using HIGHMEM pages as well.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-21 10:04:04 -08:00
Eric Dumazet bbce5a59e4 packet: use vzalloc()
alloc_one_pg_vec_page() is supposed to return zeroed memory, so use
vzalloc() instead of vmalloc()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-21 10:01:42 -08:00
Linus Torvalds 76db8ac45f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: fix readdir EOVERFLOW on 32-bit archs
  ceph: fix frag offset for non-leftmost frags
  ceph: fix dangling pointer
  ceph: explicitly specify page alignment in network messages
  ceph: make page alignment explicit in osd interface
  ceph: fix comment, remove extraneous args
  ceph: fix update of ctime from MDS
  ceph: fix version check on racing inode updates
  ceph: fix uid/gid on resent mds requests
  ceph: fix rdcache_gen usage and invalidate
  ceph: re-request max_size if cap auth changes
  ceph: only let auth caps update max_size
  ceph: fix open for write on clustered mds
  ceph: fix bad pointer dereference in ceph_fill_trace
  ceph: fix small seq message skipping
  Revert "ceph: update issue_seq on cap grant"
2010-11-19 15:32:22 -08:00
Linus Torvalds caf8394524 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits)
  net: fix kernel-doc for sk_filter_rcu_release
  be2net: Fix to avoid firmware update when interface is not open.
  netfilter: fix IP_VS dependencies
  net: irda: irttp: sync error paths of data- and udata-requests
  ipv6: Expose reachable and retrans timer values as msecs
  ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies
  3c59x: fix build failure on !CONFIG_PCI
  ipg.c: remove id [SUNDANCE, 0x1021]
  net: caif: spi: fix potential NULL dereference
  ath9k_htc: Avoid setting QoS control for non-QoS frames
  net: zero kobject in rx_queue_release
  net: Fix duplicate volatile warning.
  MAINTAINERS: Add stmmac maintainer
  bonding: fix a race in IGMP handling
  cfg80211: fix can_beacon_sec_chan, reenable HT40
  gianfar: fix signedness issue
  net: bnx2x: fix error value sign
  8139cp: fix checksum broken
  r8169: fix checksum broken
  rds: Integer overflow in RDS cmsg handling
  ...
2010-11-19 15:25:59 -08:00
David S. Miller 24912420e9 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/bonding/bond_main.c
	net/core/net-sysfs.c
	net/ipv6/addrconf.c
2010-11-19 13:13:47 -08:00
andrew hendry 0670b8ae66 X25: remove bkl in routing ioctls
Routing doesn't use the socket data and is protected by x25_route_list_lock

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 12:40:02 -08:00
andrew hendry 54aafbd498 X25: remove bkl in inq and outq ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 12:40:01 -08:00
andrew hendry 1ecd66bf2c X25: remove bkl in timestamp ioctls
Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 12:40:01 -08:00
andrew hendry 70be998c2b X25: pushdown bkl in ioctls
Push down the bkl in the ioctls so they can be removed one at a time.

Signed-off-by: Andrew Hendry <andrew.hendry@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 12:40:00 -08:00
Eric Dumazet c26aed40f4 filter: use reciprocal divide
At compile time, we can replace the DIV_K instruction (divide by a
constant value) by a reciprocal divide.

At exec time, the expensive divide is replaced by a multiply, a less
expensive operation on most processors.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 10:06:54 -08:00
Eric Dumazet 8c1592d68b filter: cleanup codes[] init
Starting the translated instruction to 1 instead of 0 allows us to
remove one descrement at check time and makes codes[] array init
cleaner.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 10:04:23 -08:00
Eric Dumazet 93aaae2e01 filter: optimize sk_run_filter
Remove pc variable to avoid arithmetic to compute fentry at each filter
instruction. Jumps directly manipulate fentry pointer.

As the last instruction of filter[] is guaranteed to be a RETURN, and
all jumps are before the last instruction, we dont need to check filter
bounds (number of instructions in filter array) at each iteration, so we
remove it from sk_run_filter() params.

On x86_32 remove f_k var introduced in commit 57fe93b374
(filter: make sure filters dont read uninitialized memory)

Note : We could use a CONFIG_ARCH_HAS_{FEW|MANY}_REGISTERS in order to
avoid too many ifdefs in this code.

This helps compiler to use cpu registers to hold fentry and A
accumulator.

On x86_32, this saves 401 bytes, and more important, sk_run_filter()
runs much faster because less register pressure (One less conditional
branch per BPF instruction)

# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   2948       0       0    2948     b84 net/core/filter.o
   3349       0       0    3349     d15 net/core/filter_pre.o

on x86_64 :
# size net/core/filter.o net/core/filter_pre.o
   text    data     bss     dec     hex filename
   5173       0       0    5173    1435 net/core/filter.o
   5224       0       0    5224    1468 net/core/filter_pre.o

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 09:49:59 -08:00
Randy Dunlap 0302b8622c net: fix kernel-doc for sk_filter_rcu_release
Fix kernel-doc warning for sk_filter_rcu_release():

Warning(net/core/filter.c:586): missing initial short description on line:
 * 	sk_filter_rcu_release: Release a socket filter by rcu_head

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Cc:	"David S. Miller" <davem@davemloft.net>
Cc:	netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-19 09:27:15 -08:00
Patrick McHardy dba4490d22 netfilter: fix IP_VS dependencies
When NF_CONNTRACK is enabled, IP_VS uses conntrack symbols.
Therefore IP_VS can't be linked statically when conntrack
is built modular.

Reported-by: Justin P. Mattock <justinmattock@gmail.com>
Tested-by: Justin P. Mattock <justinmattock@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 13:14:33 -08:00
Wolfram Sang 925e277f52 net: irda: irttp: sync error paths of data- and udata-requests
irttp_data_request() returns meaningful errorcodes, while irttp_udata_request()
just returns -1 in similar situations. Sync the two and the loglevels of the
accompanying output.

Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: David Miller <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 12:24:25 -08:00
Thomas Graf 18a31e1e28 ipv6: Expose reachable and retrans timer values as msecs
Expose reachable and retrans timer values in msecs instead of jiffies.
Both timer values are already exposed as msecs in the neighbour table
netlink interface.

The creation timestamp format with increased precision is kept but
cleaned up.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 12:08:36 -08:00
David S. Miller 07bfa524d4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-11-18 11:56:09 -08:00
Bruno Randolf 86107fd170 nl80211/mac80211: Report signal average
Extend nl80211 to report an exponential weighted moving average (EWMA) of the
signal value. Since the signal value usually fluctuates between different
packets, an average can be more useful than the value of the last packet.

This uses the recently added generic EWMA library function.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-18 14:22:20 -05:00
Thomas Graf 93908d1926 ipv6: Expose IFLA_PROTINFO timer values in msecs instead of jiffies
IFLA_PROTINFO exposes timer related per device settings in jiffies.
Change it to expose these values in msecs like the sysctl interface
does.

I did not find any users of IFLA_PROTINFO which rely on any of these
values and even if there are, they are likely already broken because
there is no way for them to reliably convert such a value to another
time format.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 11:05:01 -08:00
Eric Dumazet 57e1ab6ead igmp: refine skb allocations
IGMP allocates MTU sized skbs. This may fail for large MTU (order-2
allocations), so add a fallback to try lower sizes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 11:02:23 -08:00
Changli Gao 4c3710afbc net: move definitions of BPF_S_* to net/core/filter.c
BPF_S_* are used internally, should not be exposed to the others.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Hagen Paul Pfeifer <hagen@jauu.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 10:59:51 -08:00
Tetsuo Handa cba328fc5e filter: Optimize instruction revalidation code.
Since repeating u16 value to u8 value conversion using switch() clause's
case statement is wasteful, this patch introduces u16 to u8 mapping table
and removes most of case statements. As a result, the size of net/core/filter.o
is reduced by about 29% on x86.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 10:58:35 -08:00
John Fastabend 9e50e3ac5a net: add priority field to pktgen
Add option to set skb priority to pktgen. Useful for testing
QOS features. Also by running pktgen on the vlan device the
qdisc on the real device can be tested.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 10:43:07 -08:00
John Fastabend 7d8e76bf9a net: zero kobject in rx_queue_release
netif_set_real_num_rx_queues() can decrement and increment
the number of rx queues. For example ixgbe does this as
features and offloads are toggled. Presumably this could
also happen across down/up on most devices if the available
resources changed (cpu offlined).

The kobject needs to be zero'd in this case so that the
state is not preserved across kobject_put()/kobject_init_and_add().

This resolves the following error report.

ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
Call Trace:
 [<ffffffff8121c940>] kobject_init+0x3a/0x83
 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57
 [<ffffffff8107b800>] ? mark_lock+0x21/0x267
 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6
 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78
 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
 [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
 [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 09:41:40 -08:00
Gerrit Renker f72f2f4cde dccp ccid-2: whitespace fix-up
This fixes whitespace noise introduced in commit "dccp ccid-2: Algorithm to
update buffer state", 5753fdfe8b, 14 Nov 2010.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 09:37:07 -08:00
Eric Dumazet 866f3b25a2 bonding: IGMP handling cleanup
Instead of iterating in_dev->mc_list from bonding driver, its better
to call a helper function provided by igmp.c
Details of implementation (locking) are private to igmp code.

ip_mc_rejoin_group(struct ip_mc_list *im) becomes
ip_mc_rejoin_groups(struct in_device *in_dev);

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-18 09:33:19 -08:00
Mark Mentovai 09a02fdb91 cfg80211: fix can_beacon_sec_chan, reenable HT40
This follows wireless-testing 9236d838c9
("cfg80211: fix extension channel checks to initiate communication") and
fixes accidental case fall-through. Without this fix, HT40 is entirely
blocked.

Signed-off-by: Mark Mentovai <mark@moxienet.com>
Cc: stable@kernel.org
Acked-by: Luis R. Rodriguez <lrodriguez@atheros.com
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-18 11:35:05 -05:00
Johannes Berg 50a9432dae mac80211: fix powersaving clients races
The code to handle powersaving stations has a race:
when the powersave flag is lifted from a station,
we could transmit a packet that is being processed
for TX at the same time right away, even if there
are other frames queued for it. This would cause
frame reordering. To fix this, lift the flag only
under the appropriate lock that blocks TX.

Additionally, the code to allow drivers to block a
station while frames for it are on the HW queue is
never re-enabled the station, so traffic would get
stuck indefinitely. Fix this by clearing the flag
for this appropriately.

Finally, as an optimisation, don't do anything if
the driver unblocks an already unblocked station.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-17 16:19:33 -05:00
Johannes Berg 4bce22b9b8 mac80211: defines for AC numbers
In many places we've just hardcoded the
AC numbers -- which is a relic from the
original mac80211 (d80211). Add constants
for them so we know what we're talking
about.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-17 16:19:31 -05:00
Vasiliy Kulikov dda0b38692 net: ipv4: tcp_probe: cleanup snprintf() use
snprintf() returns number of bytes that were copied if there is no overflow.
This code uses return value as number of copied bytes.  Theoretically format
string '%lu.%09lu %pI4:%u %pI4:%u %d %#x %#x %u %u %u %u\n' may be expanded
up to 163 bytes.  In reality tv.tv_sec is just few bytes instead of 20, 2 ports
are just 5 bytes each instead of 10, length is 5 bytes instead of 10.  The rest
is an unstrusted input.  Theoretically if tv_sec is big then copy_to_user() would
overflow tbuf.

tbuf was increased to fit in 163 bytes.  snprintf() is used to follow return
value semantic.

Signed-off-by: Vasiliy Kulikov <segoon@openwall.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:27:46 -08:00
John Fastabend 9ea19481db net: zero kobject in rx_queue_release
netif_set_real_num_rx_queues() can decrement and increment
the number of rx queues. For example ixgbe does this as
features and offloads are toggled. Presumably this could
also happen across down/up on most devices if the available
resources changed (cpu offlined).

The kobject needs to be zero'd in this case so that the
state is not preserved across kobject_put()/kobject_init_and_add().

This resolves the following error report.

ixgbe 0000:03:00.0: eth2: NIC Link is Up 10 Gbps, Flow Control: RX/TX
kobject (ffff880324b83210): tried to init an initialized object, something is seriously wrong.
Pid: 1972, comm: lldpad Not tainted 2.6.37-rc18021qaz+ #169
Call Trace:
 [<ffffffff8121c940>] kobject_init+0x3a/0x83
 [<ffffffff8121cf77>] kobject_init_and_add+0x23/0x57
 [<ffffffff8107b800>] ? mark_lock+0x21/0x267
 [<ffffffff813c6d11>] net_rx_queue_update_kobjects+0x63/0xc6
 [<ffffffff813b5e0e>] netif_set_real_num_rx_queues+0x5f/0x78
 [<ffffffffa0261d49>] ixgbe_set_num_queues+0x1c6/0x1ca [ixgbe]
 [<ffffffffa0262509>] ixgbe_init_interrupt_scheme+0x1e/0x79c [ixgbe]
 [<ffffffffa0274596>] ixgbe_dcbnl_set_state+0x167/0x189 [ixgbe]

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:27:46 -08:00
Changli Gao 5811662b15 net: use the macros defined for the members of flowi
Use the macros defined for the members of flowi to clean the code up.

Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:27:45 -08:00
Dan Rosenberg 218854af84 rds: Integer overflow in RDS cmsg handling
In rds_cmsg_rdma_args(), the user-provided args->nr_local value is
restricted to less than UINT_MAX.  This seems to need a tighter upper
bound, since the calculation of total iov_size can overflow, resulting
in a small sock_kmalloc() allocation.  This would probably just result
in walking off the heap and crashing when calling rds_rdma_pages() with
a high count value.  If it somehow doesn't crash here, then memory
corruption could occur soon after.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 12:20:52 -08:00
Thomas Graf b382b191ea ipv6: AF_INET6 link address family
IPv6 already exposes some address family data via netlink in the
IFLA_PROTINFO attribute if RTM_GETLINK request is sent with the
address family set to AF_INET6. We take over this format and
reuse all the code.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 11:28:26 -08:00
Thomas Graf 9f0f7272ac ipv4: AF_INET link address family
Implements the AF_INET link address family exposing the per
device configuration settings via netlink using the attribute
IFLA_INET_CONF.

The format of IFLA_INET_CONF differs depending on the direction
the attribute is sent. The attribute sent by the kernel consists
of a u32 array, basically a 1:1 copy of in_device->cnf.data[].
The attribute expected by the kernel must consist of a sequence
of nested u32 attributes, each representing a change request,
e.g.
	[IFLA_INET_CONF] = {
		[IPV4_DEVCONF_FORWARDING] = 1,
		[IPV4_DEVCONF_NOXFRM] = 0,
	}

libnl userspace API documentation and example available from:
http://www.infradead.org/~tgr/libnl/doc-git/group__link__inet.html

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 11:28:25 -08:00
Thomas Graf f8ff182c71 rtnetlink: Link address family API
Each net_device contains address family specific data such as
per device settings and statistics. We already expose this data
via procfs/sysfs and partially netlink.

The netlink method requires the requester to send one RTM_GETLINK
request for each address family it wishes to receive data of
and then merge this data itself.

This patch implements a new API which combines all address family
specific link data in a new netlink attribute IFLA_AF_SPEC.
IFLA_AF_SPEC contains a sequence of nested attributes, one for each
address family which in turn defines the structure of its own
attribute. Example:

   [IFLA_AF_SPEC] = {
       [AF_INET] = {
           [IFLA_INET_CONF] = ...,
       },
       [AF_INET6] = {
           [IFLA_INET6_FLAGS] = ...,
           [IFLA_INET6_CONF] = ...,
       }
   }

The API also allows for address families to implement a function
which parses the IFLA_AF_SPEC attribute sent by userspace to
implement address family specific link options.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 11:28:24 -08:00
Eric Paris ee58681195 network: tcp_connect should return certain errors up the stack
The current tcp_connect code completely ignores errors from sending an skb.
This makes sense in many situations (like -ENOBUFFS) but I want to be able to
immediately fail connections if they are denied by the SELinux netfilter hook.
Netfilter does not normally return ECONNREFUSED when it drops a packet so we
respect that error code as a final and fatal error that can not be recovered.

Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 10:54:35 -08:00
Eric Paris da68365004 netfilter: allow hooks to pass error code back up the stack
SELinux would like to pass certain fatal errors back up the stack.  This patch
implements the generic netfilter support for this functionality.

Based-on-patch-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 10:54:34 -08:00
Joe Perches 37d6680042 net/atm: Remove unnecessary casts of netdev_priv
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-17 10:37:52 -08:00
Arnd Bergmann 451a3c24b0 BKL: remove extraneous #include <smp_lock.h>
The big kernel lock has been removed from all these files at some point,
leaving only the #include.

Remove this too as a cleanup.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-11-17 08:59:32 -08:00
Felix Fietkau 8f0729b16a mac80211: add support for setting the ad-hoc multicast rate
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:39:08 -05:00
Felix Fietkau 885a46d0f7 cfg80211: add support for setting the ad-hoc multicast rate
Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:39:08 -05:00
Juuso Oikarinen a619a4c0e1 mac80211: Add function to get probe request template for current AP
Chipsets with hardware based connection monitoring need to autonomically
send directed probe-request frames to the AP (in the event of beacon loss,
for example.)

For the hardware to be able to do this, it requires a template for the frame
to transmit to the AP, filled in with the BSSID and SSID of the AP, but also
the supported rate IE's.

This patch adds a function to mac80211, which allows the hardware driver to
fetch this template after association, so it can be configured to the hardware.

Signed-off-by: Juuso Oikarinen <juuso.oikarinen@nokia.com>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:08 -05:00
Bruno Randolf 15d9675321 mac80211: Add antenna configuration
Allow antenna configuration by calling driver's function for it.

We disallow antenna configuration if the wiphy is already running, mainly to
make life easier for 802.11n drivers which need to recalculate HT capabilites.

Signed-off-by: Bruno Randolf <br1@einfach.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:05 -05:00
Bruno Randolf afe0cbf875 cfg80211: Add nl80211 antenna configuration
Allow setting of TX and RX antennas configuration via nl80211.

The antenna configuration is defined as a bitmap of allowed antennas to use.
This API can be used to mask out antennas which are not attached or should not
be used for other reasons like regulatory concerns or special setups.

Separate bitmaps are used for RX and TX to allow configuring different antennas
for receiving and transmitting. Each bitmap is 32 bit long, each bit
representing one antenna, starting with antenna 1 at the first bit. If an
antenna bit is set, this means the driver is allowed to use this antenna for RX
or TX respectively; if the bit is not set the hardware is not allowed to use
this antenna.

Using bitmaps has the benefit of allowing for a flexible configuration
interface which can support many different configurations and which can be used
for 802.11n as well as non-802.11n devices. Instead of relying on some hardware
specific assumptions, drivers can use this information to know which antennas
are actually attached to the system and derive their capabilities based on
that.

802.11n devices should enable or disable chains, based on which antennas are
present (If all antennas belonging to a particular chain are disabled, the
entire chain should be disabled). HT capabilities (like STBC, TX Beamforming,
Antenna selection) should be calculated based on the available chains after
applying the antenna masks. Should a 802.11n device have diversity antennas
attached to one of their chains, diversity can be enabled or disabled based on
the antenna information.

Non-802.11n drivers can use the antenna masks to select RX and TX antennas and
to enable or disable antenna diversity.

While covering chainmasks for 802.11n and the standard "legacy" modes "fixed
antenna 1", "fixed antenna 2" and "diversity" this API also allows more rare,
but useful configurations as follows:

1) Send on antenna 1, receive on antenna 2 (or vice versa). This can be used to
have a low gain antenna for TX in order to keep within the regulatory
constraints and a high gain antenna for RX in order to receive weaker signals
("speak softly, but listen harder"). This can be useful for building long-shot
outdoor links. Another usage of this setup is having a low-noise pre-amplifier
on antenna 1 and a power amplifier on the other antenna. This way transmit
noise is mostly kept out of the low noise receive channel.
(This would be bitmaps: tx 1 rx 2).

2) Another similar setup is: Use RX diversity on both antennas, but always send
on antenna 1. Again that would allow us to benefit from a higher gain RX
antenna, while staying within the legal limits.
(This would be: tx 0 rx 3).

3) And finally there can be special experimental setups in research and
development even with pre 802.11n hardware where more than 2 antennas are
available. It's good to keep the API simple, yet flexible.

Signed-off-by: Bruno Randolf <br1@einfach.org>

--
v7:	Made bitmasks 32 bit wide and rebased to latest wireless-testing.
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:05 -05:00
Arik Nemtsov f23a478075 mac80211: support hardware TX fragmentation offload
The lower driver is notified when the fragmentation threshold changes
and upon a reconfig of the interface.

If the driver supports hardware TX fragmentation, don't fragment
packets in the stack.

Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 16:37:04 -05:00
Luis R. Rodriguez 9236d838c9 cfg80211: fix extension channel checks to initiate communication
When operating in a mode that initiates communication and using
HT40 we should fail if we cannot use both primary and secondary
channels to initiate communication. Our current ht40 allowmap
only covers STA mode of operation, for beaconing modes we need
a check on the fly as the mode of operation is dynamic and
there other flags other than disable which we should read
to check if we can initiate communication.

Do not allow for initiating communication if our secondary HT40
channel has is either disabled, has a passive scan flag, a
no-ibss flag or is a radar channel. Userspace now has similar
checks but this is also needed in-kernel.

Reported-by: Jouni Malinen <jouni.malinen@atheros.com>
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-16 15:59:39 -05:00
Ulrich Weber 7d98ffd8c2 xfrm: update flowi saddr in icmp_send if unset
otherwise xfrm_lookup will fail to find correct policy

Signed-off-by: Ulrich Weber <uweber@astaro.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:43:39 -08:00
Eric Dumazet c31504dc0d udp: use atomic_inc_not_zero_hint
UDP sockets refcount is usually 2, unless an incoming frame is going to
be queued in receive or backlog queue.

Using atomic_inc_not_zero_hint() permits to reduce latency, because
processor issues less memory transactions.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:17:43 -08:00
Eric Dumazet 213b15ca81 vlan: remove ndo_select_queue() logic
Now vlan are lockless, we dont need special ndo_select_queue() logic.
dev_pick_tx() will do the multiqueue stuff on the real device transmit.

Suggested-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:17:42 -08:00
Eric Dumazet 4af429d29b vlan: lockless transmit path
vlan is a stacked device, like tunnels. We should use the lockless
mechanism we are using in tunnels and loopback.

This patch completely removes locking in TX path.

tx stat counters are added into existing percpu stat structure, renamed
from vlan_rx_stats to vlan_pcpu_stats.

Note : this partially reverts commit 2e59af3dcb (vlan: multiqueue vlan
device)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 11:15:08 -08:00
Neil Horman 0e3125c755 packet: Enhance AF_PACKET implementation to not require high order contiguous memory allocation (v4)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Version 4 of this patch.

Change notes:
1) Removed extra memset.  Didn't think kcalloc added a GFP_ZERO the way kzalloc did :)

Summary:
It was shown to me recently that systems under high load were driven very deep
into swap when tcpdump was run.  The reason this happened was because the
AF_PACKET protocol has a SET_RINGBUFFER socket option that allows the user space
application to specify how many entries an AF_PACKET socket will have and how
large each entry will be.  It seems the default setting for tcpdump is to set
the ring buffer to 32 entries of 64 Kb each, which implies 32 order 5
allocation.  Thats difficult under good circumstances, and horrid under memory
pressure.

I thought it would be good to make that a bit more usable.  I was going to do a
simple conversion of the ring buffer from contigous pages to iovecs, but
unfortunately, the metadata which AF_PACKET places in these buffers can easily
span a page boundary, and given that these buffers get mapped into user space,
and the data layout doesn't easily allow for a change to padding between frames
to avoid that, a simple iovec change is just going to break user space ABI
consistency.

So I've done this, I've added a three tiered mechanism to the af_packet set_ring
socket option.  It attempts to allocate memory in the following order:

1) Using __get_free_pages with GFP_NORETRY set, so as to fail quickly without
digging into swap

2) Using vmalloc

3) Using __get_free_pages with GFP_NORETRY clear, causing us to try as hard as
needed to get the memory

The effect is that we don't disturb the system as much when we're under load,
while still being able to conduct tcpdumps effectively.

Tested successfully by me.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Reported-by: Maciej Żenczykowski <zenczykowski@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 10:26:47 -08:00
Wolfram Sang 4c62ab9c53 irda: irttp: allow zero byte packets
Sending zero byte packets is not neccessarily an error (AF_INET accepts it,
too), so just apply a shortcut. This was discovered because of a non-working
software with WINE. See

  http://bugs.winehq.org/show_bug.cgi?id=19397#c86
  http://thread.gmane.org/gmane.linux.irda.general/1643

for very detailed debugging information and a testcase. Kudos to Wolfgang for
those!

Reported-by: Wolfgang Schwotzer <wolfgang.schwotzer@gmx.net>
Signed-off-by: Wolfram Sang <w.sang@pengutronix.de>
Tested-by: Mike Evans <mike.evans@cardolan.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 09:50:47 -08:00
John Fastabend 9d82ca98f7 ipv6: fix missing in6_ifa_put in addrconf
Fix ref count bug introduced by

commit 2de7957072
Author: Lorenzo Colitti <lorenzo@google.com>
Date:   Wed Oct 27 18:16:49 2010 +0000

ipv6: addrconf: don't remove address state on ifdown if the address
is being kept

Fix logic so that addrconf_ifdown() decrements the inet6_ifaddr
refcnt correctly with in6_ifa_put().

Reported-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-16 09:24:58 -08:00
David S. Miller b5e4156743 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 2010-11-16 09:17:12 -08:00
Jesper Juhl 94f58df8e5 SUNRPC: Simplify rpc_alloc_iostats by removing pointless local variable
Hi,

We can simplify net/sunrpc/stats.c::rpc_alloc_iostats() a bit by getting
rid of the unneeded local variable 'new'.

Please CC me on replies.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-11-16 11:58:51 -05:00
David S. Miller 6b35308850 net: Export netif_get_vlan_features().
ERROR: "netif_get_vlan_features" [drivers/net/xen-netfront.ko] undefined!

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 20:15:03 -08:00
Eric Dumazet ec1e5610c0 bridge: add RCU annotations to bridge port lookup
br_port_get() renamed to br_port_get_rtnl() to make clear RTNL is held.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:18 -08:00
stephen hemminger b5ed54e94d bridge: fix RCU races with bridge port
The macro br_port_exists() is not enough protection when only
RCU is being used. There is a tiny race where other CPU has cleared port
handler hook, but is bridge port flag might still be set.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:17 -08:00
Eric Dumazet a386f99025 bridge: add proper RCU annotation to should_route_hook
Add br_should_route_hook_t typedef, this is the only way we can
get a clean RCU implementation for function pointer.

Move route_hook to location where it is used.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:16 -08:00
Eric Dumazet e805168800 bridge: add RCU annotation to bridge multicast table
Add modern __rcu annotatations to bridge multicast table.
Use newer hlist macros to avoid direct access to hlist internals.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:13:16 -08:00
Joe Perches 8a22c99a80 net/ipv6/mcast.c: Remove unnecessary semicolons
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 11:07:16 -08:00
David S. Miller 6c1b6c6b87 Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2010-11-15 10:59:49 -08:00
Tom Herbert fe8222406c net: Simplify RX queue allocation
This patch move RX queue allocation to alloc_netdev_mq and freeing of
the queues to free_netdev (symmetric to TX queue allocation).  Each
kobject RX queue takes a reference to the queue's device so that the
device can't be freed before all the kobjects have been released-- this
obviates the need for reference counts specific to RX queues.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:57:28 -08:00
Tom Herbert ed9af2e839 net: Move TX queue allocation to alloc_netdev_mq
TX queues are now allocated in alloc_netdev_mq and freed in
free_netdev.

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:56:54 -08:00
Timo Teräs cc9ff19da9 xfrm: use gre key as flow upper protocol info
The GRE Key field is intended to be used for identifying an individual
traffic flow within a tunnel. It is useful to be able to have XFRM
policy selector matches to have different policies for different
GRE tunnels.

Signed-off-by: Timo Teräs <timo.teras@iki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:44:04 -08:00
David S. Miller e1f2d8c2cc vlan: Fix build warning in vlandev_seq_show()
net/8021q/vlanproc.c: In function 'vlandev_seq_show':
net/8021q/vlanproc.c:283:20: warning: unused variable 'fmt'

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:37:30 -08:00
Jesper Juhl ffa56e540c mac80211: Remove redundant checks for NULL before calls to crypto_free_cipher()
crypto_free_cipher() is a wrapper around crypto_free_tfm() which is a
wrapper around crypto_destroy_tfm() and the latter can handle being passed
a NULL pointer, so checking for NULL in the
ieee80211_aes_key_free()/ieee80211_aes_cmac_key_free() wrappers around
crypto_free_cipher() is pointless and just increase object code size
needlesly and makes us execute extra test/branch instructions that we
don't need.
Btw; don't we have to many wrappers around wrappers ad nauseam here?
Anyway, this patch removes the redundant conditionals.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:26:11 -05:00
Eliad Peller 07caf9d6c9 mac80211: refactor debugfs function generation code
refactor mac80211 debugfs code by using a format&copy function, instead of
duplicating the code for each generated function.

this change reduces about 600B from mac80211.ko

Signed-off-by: Eliad Peller <eliad@wizery.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:48 -05:00
Felix Fietkau c7317e41df mac80211: minstrel_ht - reduce the overhead of rate sampling
- reduce the number of retransmission attempts for sample rates
- sample lower rates less often
- do not use RTS/CTS for sampling frames
- increase the time between sampling attempts

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:21 -05:00
Luis R. Rodriguez d91e41b690 cfg80211: prefix REG_DBG_PRINT() with cfg80211
Everyone's doing it, its the cool thing.

Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:15 -05:00
Luis R. Rodriguez e702d3cf29 cfg80211: add debug print when processing a channel
In the worst case you are seeing really odd things you want
more information than what is provided right now, for those
that insist and want debug info through CONFIG_CFG80211_REG_DEBUG
provide a print of when we are processing a channel and with what
regulatory rule.

Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:14 -05:00
Luis R. Rodriguez a65185367f cfg80211: add debug print when disabling a channel on a custom regd
Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:13 -05:00
Luis R. Rodriguez 926a0a094d cfg80211: add debug prints for when we ignore regulatory hints
This can help with debugging issues. You will only see
these with CONFIG_CFG80211_REG_DEBUG enabled.

Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: Senthil Balasubramanian <senthilkumar@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:12 -05:00
Luis R. Rodriguez ca4ffe8f28 cfg80211: fix disabling channels based on hints
After a module loads you will have loaded the world roaming regulatory
domain or a custom regulatory domain. Further regulatory hints are
welcomed and should be respected unless the regulatory hint is coming
from a country IE as the IEEE spec allows for a country IE to be a subset
of what is allowed by the local regulatory agencies.

So disable all channels that do not fit a regulatory domain sent
from a unless the hint is from a country IE and the country IE had
no information about the band we are currently processing.

This fixes a few regulatory issues, for example for drivers that depend
on CRDA and had no 5 GHz freqencies allowed were not properly disabling
5 GHz at all, furthermore it also allows users to restrict devices
further as was intended.

If you recieve a country IE upon association we will also disable the
channels that are not allowed if the country IE had at least one
channel on the respective band we are procesing.

This was the original intention behind this design but it was
completely overlooked...

Cc: David Quan <david.quan@atheros.com>
Cc: Jouni Malinen <jouni.malinen@atheros.com>
cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:10 -05:00
Luis R. Rodriguez 749b527b21 cfg80211: fix allowing country IEs for WIPHY_FLAG_STRICT_REGULATORY
We should be enabling country IE hints for WIPHY_FLAG_STRICT_REGULATORY
even if we haven't yet recieved regulatory domain hint for the driver
if it needed one. Without this Country IEs are not passed on to drivers
that have set WIPHY_FLAG_STRICT_REGULATORY, today this is just all
Atheros chipset drivers: ath5k, ath9k, ar9170, carl9170.

This was part of the original design, however it was completely
overlooked...

Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Cc: stable@kernel.org
Signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:09 -05:00
Luis R. Rodriguez 7ca43d03b1 cfg80211: pass the reg hint initiator to helpers
This is required later.

Cc: Easwar Krishnan <easwar.krishnan@atheros.com>
Cc: stable@kernel.org
signed-off-by: Luis R. Rodriguez <lrodriguez@atheros.com>

Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:08 -05:00
Stephen Hemminger 2e48928d8a rfkill: remove dead code
The following code is defined but never used.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-15 13:24:06 -05:00
John Fastabend 636e19a342 net: consolidate 8021q tagging
Now that VLAN packets are tagged in dev_hard_start_xmit()
at the bottom of the stack we no longer need to tag them
in the 8021Q module (Except in the !VLAN_FLAG_REORDER_HDR
case).

This allows the accel path and non accel paths to be consolidated.
Here the vlan_tci in the skb is always set and we allow the
stack to add the actual tag in dev_hard_start_xmit().

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:11:40 -08:00
John Fastabend 8f5549f381 net: remove check for headroom in vlan_dev_create
It is possible for the headroom to be smaller then the
hard_header_len for a short period of time after toggling
the vlan offload setting.

This is not a hard error and skb_cow_head is called in
__vlan_put_tag() to resolve this.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:11:40 -08:00
John Fastabend 029f5fc31c 8021q: set hard_header_len when VLAN offload features are toggled
Toggling the vlan tx|rx hw offloads needs to set the hard_header_len
as well otherwise we end up using LL_RESERVED_SPACE incorrectly.
This results in pskb_expand_head() being used unnecessarily.

Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 10:11:39 -08:00
Jesse Gross 58e998c6d2 offloading: Force software GSO for multiple vlan tags.
We currently use vlan_features to check for TSO support if there is
a vlan tag.  However, it's quite likely that the NIC is not able to
do TSO when there is an arbitrary number of tags.  Therefore if there
is more than one tag (in-band or out-of-band), fall back to software
emulation.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 09:22:53 -08:00
Jesse Gross c8d5bcd1af offloading: Support multiple vlan tags in GSO.
We assume that hardware TSO can't support multiple levels of vlan tags
but we allow it to be done.  Therefore, enable GSO to parse these tags
so we can fallback to software.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 09:22:53 -08:00
Jesse Gross e1e78db628 offloading: Make scatter/gather more tolerant of vlans.
When checking if it is necessary to linearize a packet, we currently
use vlan_features if the packet contains either an in-band or out-
of-band vlan tag.  However, in-band tags aren't special in any way
for scatter/gather since they are part of the packet buffer and are
simply more data to DMA.  Therefore, only use vlan_features for out-
of-band tags, which could potentially have some interaction with
scatter/gather.

Signed-off-by: Jesse Gross <jesse@nicira.com>
CC: Ben Hutchings <bhutchings@solarflare.com>
Reviewed-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 09:22:52 -08:00
David S. Miller d9aa93804e ipv4: Fix build with multicast disabled.
net/ipv4/igmp.c: In function 'ip_mc_inc_group':
net/ipv4/igmp.c:1228: error: implicit declaration of function 'for_each_pmc_rtnl'
net/ipv4/igmp.c:1228: error: expected ';' before '{' token
net/ipv4/igmp.c: In function 'ip_mc_unmap':
net/ipv4/igmp.c:1333: error: expected ';' before 'igmp_group_dropped'
 ...

Move for_each_pmc_rcu and for_each_pmc_rtnl macro definitions
outside of multicast ifdef protection.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-15 08:52:02 -08:00
Gerrit Renker 7e87fe8430 dccp ccid-2: Separate option parsing from CCID processing
This patch replaces an almost identical replication of code: large parts
of dccp_parse_options() re-appeared as ccid2_ackvector() in ccid2.c.

Apart from the duplication, this caused two more problems:
 1. CCIDs should not need to be concerned with parsing header options;
 2. one can not assume that Ack Vectors appear as a contiguous area within an
    skb, it is legal to insert other options and/or padding in between. The
    current code would throw an error and stop reading in such a case.

Since Ack Vectors provide CCID-specific information, they are now processed
by the CCID directly, separating this functionality from the main DCCP code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-15 07:12:01 +01:00
Gerrit Renker 52394eecec dccp ccid-2: Remove old infrastructure
This removes
 * functions for which updates have been provided in the preceding patches and
 * the @av_vec_len field - it is no longer necessary since the buffer length is
   now always computed dynamically.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-15 07:12:00 +01:00
Gerrit Renker d83447f094 dccp ccid-2: Schedule Sync as out-of-band mechanism
The problem with Ack Vectors is that
  i) their length is variable and can in principle grow quite large,
 ii) it is hard to predict exactly how large they will be.

Due to the second point it seems not a good idea to reduce the MPS; in
particular when on average there is enough room for the Ack Vector and an
increase in length is momentarily due to some burst loss, after which the
Ack Vector returns to its normal/average length.

The solution taken by this patch is to subtract a minimum-expected Ack Vector
length from the MPS, and to defer any larger Ack Vectors onto a separate
Sync - but only if indeed there is no space left on the skb.

This patch provides the infrastructure to schedule Sync-packets for transporting
(urgent) out-of-band data. Its signalling is quicker than scheduling an Ack, since
it does not need to wait for new application data.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-15 07:12:00 +01:00
Gerrit Renker 18219463c8 dccp ccid-2: Consolidate Ack-Vector processing within main DCCP module
This aggregates Ack Vector processing (handling input and clearing old state)
into one function, for the following reasons and benefits:
 * all Ack Vector-specific processing is now in one place;
 * duplicated code is removed;
 * ensuring sanity: from an Ack Vector point of view, it is better to clear the
                    old state first before entering new state;
 * Ack Event handling happens mostly within the CCIDs, not the main DCCP module.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-15 07:11:59 +01:00
Gerrit Renker 3802408644 dccp ccid-2: Update code for the Ack Vector input/registration routine
This patch updates the code which registers new packets as received, using the
new circular buffer interface. It contributes a new algorithm which
 * supports both tail/head pointers and buffer wrap-around and
 * deals with overflow (head/tail move in lock-step).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-15 07:11:59 +01:00
Gerrit Renker 5753fdfe8b dccp ccid-2: Algorithm to update buffer state
This provides a routine to consistently update the buffer state when the
peer acknowledges receipt of Ack Vectors; updating state in the list of Ack
Vectors as well as in the circular buffer.

While based on RFC 4340, several additional (and necessary) precautions were
added to protect the consistency of the buffer state. These additions are
essential, since analysis and experience showed that the basic algorithm was
insufficient for this task (which lead to problems that were hard to debug).

The algorithm now
 * deals with HC-sender acknowledging to HC-receiver and vice versa,
 * keeps track of the last unacknowledged but received seqno in tail_ackno,
 * has special cases to reset the overflow condition when appropriate,
 * is protected against receiving older information (would mess up buffer state).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-15 07:11:59 +01:00
David S. Miller c25ecd0a21 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-11-14 11:57:05 -08:00
Linus Torvalds 9457b24a09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (66 commits)
  can-bcm: fix minor heap overflow
  gianfar: Do not call device_set_wakeup_enable() under a spinlock
  ipv6: Warn users if maximum number of routes is reached.
  docs: Add neigh/gc_thresh3 and route/max_size documentation.
  axnet_cs: fix resume problem for some Ax88790 chip
  ipv6: addrconf: don't remove address state on ifdown if the address is being kept
  tcp: Don't change unlocked socket state in tcp_v4_err().
  x25: Prevent crashing when parsing bad X.25 facilities
  cxgb4vf: add call to Firmware to reset VF State.
  cxgb4vf: Fail open if link_start() fails.
  cxgb4vf: flesh out PCI Device ID Table ...
  cxgb4vf: fix some errors in Gather List to skb conversion
  cxgb4vf: fix bug in Generic Receive Offload
  cxgb4vf: don't implement trivial (and incorrect) ndo_select_queue()
  ixgbe: Look inside vlan when determining offload protocol.
  bnx2x: Look inside vlan when determining checksum proto.
  vlan: Add function to retrieve EtherType from vlan packets.
  virtio-net: init link state correctly
  ucc_geth: Fix deadlock
  ucc_geth: Do not bring the whole IF down when TX failure.
  ...
2010-11-12 17:17:55 -08:00
Oliver Hartkopp 0597d1b99f can-bcm: fix minor heap overflow
On 64-bit platforms the ASCII representation of a pointer may be up to 17
bytes long. This patch increases the length of the buffer accordingly.

http://marc.info/?l=linux-netdev&m=128872251418192&w=2

Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
CC: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 14:07:14 -08:00
Ben Greear 4038565327 ipv6: Warn users if maximum number of routes is reached.
This gives users at least some clue as to what the problem
might be and how to go about fixing it.

Signed-off-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 14:03:24 -08:00
Lorenzo Colitti 2de7957072 ipv6: addrconf: don't remove address state on ifdown if the address is being kept
Currently, addrconf_ifdown does not delete statically configured IPv6
addresses when the interface is brought down. The intent is that when
the interface comes back up the address will be usable again. However,
this doesn't actually work, because the system stops listening on the
corresponding solicited-node multicast address, so the address cannot
respond to neighbor solicitations and thus receive traffic. Also, the
code notifies the rest of the system that the address is being deleted
(e.g, RTM_DELADDR), even though it is not. Fix it so that none of this
state is updated if the address is being kept on the interface.

Tested: Added a statically configured IPv6 address to an interface,
started ping, brought link down, brought link up again. When link came
up ping kept on going and "ip -6 maddr" showed that the host was still
subscribed to there

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 13:44:24 -08:00
David S. Miller 8f49c2703b tcp: Don't change unlocked socket state in tcp_v4_err().
Alexey Kuznetsov noticed a regression introduced by
commit f1ecd5d9e7
("Revert Backoff [v3]: Revert RTO on ICMP destination unreachable")

The RTO and timer modification code added to tcp_v4_err()
doesn't check sock_owned_by_user(), which if true means we
don't have exclusive access to the socket and therefore cannot
modify it's critical state.

Just skip this new code block if sock_owned_by_user() is true
and eliminate the now superfluous sock_owned_by_user() code
block contained within.

Reported-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
CC: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
2010-11-12 13:35:00 -08:00
Eric Dumazet 190683a9d5 net: net_families __rcu annotations
Use modern RCU API / annotations for net_families array.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 13:27:25 -08:00
Eric Dumazet 1d7138de87 igmp: RCU conversion of in_dev->mc_list
in_dev->mc_list is protected by one rwlock (in_dev->mc_list_lock).

This can easily be converted to a RCU protection.

Writers hold RTNL, so mc_list_lock is removed, not replaced by a
spinlock.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cypher Wu <cypher.w@gmail.com>
Cc: Américo Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 13:18:57 -08:00
Dan Rosenberg 5ef41308f9 x25: Prevent crashing when parsing bad X.25 facilities
Now with improved comma support.

On parsing malformed X.25 facilities, decrementing the remaining length
may cause it to underflow.  Since the length is an unsigned integer,
this will result in the loop continuing until the kernel crashes.

This patch adds checks to ensure decrementing the remaining length does
not cause it to wrap around.

Signed-off-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 12:44:42 -08:00
Mariusz Kozlowski 1f18b7176e net: Fix header size check for GSO case in recvmsg (af_packet)
Parameter 'len' is size_t type so it will never get negative.

Signed-off-by: Mariusz Kozlowski <mk@lab.zgora.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 11:06:46 -08:00
David S. Miller 7c13a0d9a1 Merge git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2010-11-12 11:04:26 -08:00
Thomas Graf 369cf77a6a rtnetlink: Fix message size calculation for link messages
nlmsg_total_size() calculates the length of a netlink message
including header and alignment. nla_total_size() calculates the
space an individual attribute consumes which was meant to be used
in this context.

Also, ensure to account for the attribute header for the
IFLA_INFO_XSTATS attribute as implementations of get_xstats_size()
seem to assume that we do so.

The addition of two message headers minus the missing attribute
header resulted in a calculated message size that was larger than
required. Therefore we never risked running out of skb tailroom.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-12 10:53:09 -08:00
Shan Wei 22e091e525 netfilter: ipv6: fix overlap check for fragments
The type of FRAG6_CB(prev)->offset is int, skb->len is *unsigned* int,
and offset is int.

Without this patch, type conversion occurred to this expression, when
(FRAG6_CB(prev)->offset + prev->len) is less than offset.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-12 08:51:55 +01:00
David S. Miller c753796769 ipv4: Make rt->fl.iif tests lest obscure.
When we test rt->fl.iif against zero, we're seeing if it's
an output or an input route.

Make that explicit with some helper functions.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 17:07:48 -08:00
David S. Miller ed1deb7021 Merge branch 'dccp' of git://eden-feed.erg.abdn.ac.uk/net-next-2.6 2010-11-11 10:43:30 -08:00
Eric Dumazet 72cdd1d971 net: get rid of rtable->idev
It seems idev field in struct rtable has no special purpose, but adding
extra atomic ops.

We hold refcounts on the device itself (using percpu data, so pretty
cheap in current kernel).

infiniband case is solved using dst.dev instead of idev->dev

Removal of this field means routing without route cache is now using
shared data, percpu data, and only potential contention is a pair of
atomic ops on struct neighbour per forwarded packet.

About 5% speedup on routing test.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-11 10:29:40 -08:00
David S. Miller 8877870f8a Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-11-10 22:15:31 -08:00
David S. Miller 7a1abd08d5 tcp: Increase TCP_MAXSEG socket option minimum.
As noted by Steve Chen, since commit
f5fff5dc8a ("tcp: advertise MSS
requested by user") we can end up with a situation where
tcp_select_initial_window() does a divide by a zero (or
even negative) mss value.

The problem is that sometimes we effectively subtract
TCPOLEN_TSTAMP_ALIGNED and/or TCPOLEN_MD5SIG_ALIGNED from the mss.

Fix this by increasing the minimum from 8 to 64.

Reported-by: Steve Chen <schen@mvista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-10 21:35:37 -08:00
Gerrit Renker b3d14bff12 dccp ccid-2: Implementation of circular Ack Vector buffer with overflow handling
This completes the implementation of a circular buffer for Ack Vectors, by
extending the current (linear array-based) implementation.  The changes are:

 (a) An `overflow' flag to deal with the case of overflow. As before, dynamic
     growth of the buffer will not be supported; but code will be added to deal
     robustly with overflowing Ack Vector buffers.

 (b) A `tail_seqno' field. When naively implementing the algorithm of Appendix A
     in RFC 4340, problems arise whenever subsequent Ack Vector records overlap,
     which can bring the entire run length calculation completely out of synch.
     (This is documented on http://www.erg.abdn.ac.uk/users/gerrit/dccp/notes/\
                                             ack_vectors/tracking_tail_ackno/ .)
 (c) The buffer length is now computed dynamically (i.e. current fill level),
     as the span between head to tail.

As a result, dccp_ackvec_pending() is now simpler - the #ifdef is no longer
necessary since buf_empty is always true when IP_DCCP_ACKVEC is not configured.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-10 21:21:35 +01:00
Gerrit Renker 7d87093660 dccp ccid-2: Separate internals of Ack Vectors from option-parsing code
This patch
 * separates Ack Vector housekeeping code from option-insertion code;
 * shifts option-specific code from ackvec.c into options.c;
 * introduces a dedicated routine to take care of the Ack Vector records;
 * simplifies the dccp_ackvec_insert_avr() routine: the BUG_ON was redundant,
   since the list is automatically arranged in descending order of ack_seqno.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-10 21:21:02 +01:00
Gerrit Renker f17a37c9b8 dccp ccid-2: Ack Vector interface clean-up
This patch brings the Ack Vector interface up to date. Its main purpose is
to lay the basis for the subsequent patches of this set, which will use the
new data structure fields and routines.

There are no real algorithmic changes, rather an adaptation:

 (1) Replaced the static Ack Vector size (2) with a #define so that it can
     be adapted (with low loss / Ack Ratio, a value of 1 works, so 2 seems
     to be sufficient for the moment) and added a solution so that computing
     the ECN nonce will continue to work - even with larger Ack Vectors.

 (2) Replaced the #defines for Ack Vector states with a complete enum.

 (3) Replaced #defines to compute Ack Vector length and state with general
     purpose routines (inlines), and updated code to use these.

 (4) Added a `tail' field (conversion to circular buffer in subsequent patch).

 (5) Updated the (outdated) documentation for Ack Vector struct.

 (6) All sequence number containers now trimmed to 48 bits.

 (7) Removal of unused bits:
     * removed dccpav_ack_nonce from struct dccp_ackvec, since this is already
       redundantly stored in the `dccpavr_ack_nonce' (of Ack Vector record);
     * removed Elapsed Time for Ack Vectors (it was nowhere used);
     * replaced semantics of dccpavr_sent_len with dccpavr_ack_runlen, since
       the code needs to be able to remember the old run length;
     * reduced the de-/allocation routines (redundant / duplicate tests).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
2010-11-10 21:20:07 +01:00
Eric Dumazet 8d987e5c75 net: avoid limits overflow
Robin Holt tried to boot a 16TB machine and found some limits were
reached : sysctl_tcp_mem[2], sysctl_udp_mem[2]

We can switch infrastructure to use long "instead" of "int", now
atomic_long_t primitives are available for free.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reported-by: Robin Holt <holt@sgi.com>
Reviewed-by: Robin Holt <holt@sgi.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-10 12:12:00 -08:00
Vasiliy Kulikov 67286640f6 net: packet: fix information leak to userland
packet_getname_spkt() doesn't initialize all members of sa_data field of
sockaddr struct if strlen(dev->name) < 13.  This structure is then copied
to userland.  It leads to leaking of contents of kernel stack memory.
We have to fully fill sa_data with strncpy() instead of strlcpy().

The same with packet_getname(): it doesn't initialize sll_pkttype field of
sockaddr_ll.  Set it to zero.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-10 12:09:10 -08:00
David S. Miller 57fe93b374 filter: make sure filters dont read uninitialized memory
There is a possibility malicious users can get limited information about
uninitialized stack mem array. Even if sk_run_filter() result is bound
to packet length (0 .. 65535), we could imagine this can be used by
hostile user.

Initializing mem[] array, like Dan Rosenberg suggested in his patch is
expensive since most filters dont even use this array.

Its hard to make the filter validation in sk_chk_filter(), because of
the jumps. This might be done later.

In this patch, I use a bitmap (a single long var) so that only filters
using mem[] loads/stores pay the price of added security checks.

For other filters, additional cost is a single instruction.

[ Since we access fentry->k a lot now, cache it in a local variable
  and mark filter entry pointer as const. -DaveM ]

Reported-by: Dan Rosenberg <drosenberg@vsecurity.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-10 10:38:24 -08:00
Vasiliy Kulikov fe10ae5338 net: ax25: fix information leak to userland
Sometimes ax25_getname() doesn't initialize all members of fsa_digipeater
field of fsa struct, also the struct has padding bytes between
sax25_call and sax25_ndigis fields.  This structure is then copied to
userland.  It leads to leaking of contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-10 10:14:33 -08:00
Sage Weil c5c6b19d4b ceph: explicitly specify page alignment in network messages
The alignment used for reading data into or out of pages used to be taken
from the data_off field in the message header.  This only worked as long
as the page alignment matched the object offset, breaking direct io to
non-page aligned offsets.

Instead, explicitly specify the page alignment next to the page vector
in the ceph_msg struct, and use that instead of the message header (which
probably shouldn't be trusted).  The alloc_msg callback is responsible for
filling in this field properly when it sets up the page vector.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-09 12:43:17 -08:00
Sage Weil b7495fc2ff ceph: make page alignment explicit in osd interface
We used to infer alignment of IOs within a page based on the file offset,
which assumed they matched.  This broke with direct IO that was not aligned
to pages (e.g., 512-byte aligned IO).  We were also trusting the alignment
specified in the OSD reply, which could have been adjusted by the server.

Explicitly specify the page alignment when setting up OSD IO requests.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-09 12:43:12 -08:00
Sage Weil e98b6fed84 ceph: fix comment, remove extraneous args
The offset/length arguments aren't used.

Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-09 12:24:53 -08:00
Eric Dumazet 332dd96f7a net/dst: dst_dev_event() called after other notifiers
Followup of commit ef885afbf8 (net: use rcu_barrier() in
rollback_registered_many)

dst_dev_event() scans a garbage dst list that might be feeded by various
network notifiers at device dismantle time.

Its important to call dst_dev_event() after other notifiers, or we might
enter the infamous msleep(250) in netdev_wait_allrefs(), and wait one
second before calling again call_netdevice_notifiers(NETDEV_UNREGISTER,
dev) to properly remove last device references.

Use priority -10 to let dst_dev_notifier be called after other network
notifiers (they have the default 0 priority)

Reported-by: Ben Greear <greearb@candelatech.com>
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Reported-by: Octavian Purdila <opurdila@ixiacom.com>
Reported-by: Benjamin LaHaise <bcrl@kvack.org>
Tested-by: Ben Greear <greearb@candelatech.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-09 12:17:16 -08:00
Kulikov Vasiliy 88f8a5e3e7 net: tipc: fix information leak to userland
Structure sockaddr_tipc is copied to userland with padding bytes after
"id" field in union field "name" unitialized.  It leads to leaking of
contents of kernel stack memory.  We have to initialize them to zero.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-09 09:25:46 -08:00
Joe Perches 2af6fd8b18 net/ipv4/tcp.c: Update WARN uses
Coalesce long formats.
Align arguments.
Remove KERN_<level>.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-09 09:22:32 -08:00
Joe Perches b194a3674f net/core/dev.c: Update WARN uses
Coalesce long formats.
Add missing newlines.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-09 09:22:31 -08:00
Eric Dumazet 18943d292f inet: fix ip_mc_drop_socket()
commit 8723e1b4ad (inet: RCU changes in inetdev_by_index())
forgot one call site in ip_mc_drop_socket()

We should not decrease idev refcount after inetdev_by_index() call,
since refcount is not increased anymore.

Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de>
Reported-by: Miles Lane <miles.lane@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-09 08:26:42 -08:00
Luiz Augusto von Dentz 63ce0900d7 Bluetooth: fix not setting security level when creating a rfcomm session
This cause 'No Bonding' to be used if userspace has not yet been paired
with remote device since the l2cap socket used to create the rfcomm
session does not have any security level set.

Signed-off-by: Luiz Augusto von Dentz <luiz.dentz-von@nokia.com>
Acked-by: Ville Tervo <ville.tervo@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-11-09 00:56:10 -02:00
Gustavo F. Padovan 4f8b691c9f Bluetooth: fix endianness conversion in L2CAP
Last commit added a wrong endianness conversion. Fixing that.

Reported-by: Harvey Harrison <harvey.harrison@gmail.com>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-11-09 00:56:09 -02:00
steven miao bfaaeb3ed5 Bluetooth: fix unaligned access to l2cap conf data
In function l2cap_get_conf_opt() and l2cap_add_conf_opt() the address of
opt->val sometimes is not at the edge of 2-bytes/4-bytes, so 2-bytes/4 bytes
access will cause data misalignment exeception.  Use get_unaligned_le16/32
and put_unaligned_le16/32 function to avoid data misalignment execption.

Signed-off-by: steven miao <realmz6@gmail.com>
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-11-09 00:56:00 -02:00
Johan Hedberg bdb7524a75 Bluetooth: Fix non-SSP auth request for HIGH security level sockets
When initiating dedicated bonding a L2CAP raw socket with HIGH security
level is used. The kernel is supposed to trigger the authentication
request in this case but this doesn't happen currently for non-SSP
(pre-2.1) devices. The reason is that the authentication request happens
in the remote extended features callback which never gets called for
non-SSP devices. This patch fixes the issue by requesting also
authentiation in the (normal) remote features callback in the case of
non-SSP devices.

This rule is applied only for HIGH security level which might at first
seem unintuitive since on the server socket side MEDIUM is already
enough for authentication. However, for the clients we really want to
prefer the server side to decide the authentication requrement in most
cases, and since most client sockets use MEDIUM it's better to be
avoided on the kernel side for these sockets. The important socket to
request it for is the dedicated bonding one and that socket uses HIGH
security level.

The patch is based on the initial investigation and patch proposal from
Andrei Emeltchenko <endrei.emeltchenko@nokia.com>.

Signed-off-by: Johan Hedberg <johan.hedberg@nokia.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-11-09 00:55:27 -02:00
Randy Dunlap 96c99b473a Bluetooth: fix hidp kconfig dependency warning
Fix kconfig dependency warning to satisfy dependencies:

warning: (BT_HIDP && NET && BT && BT_L2CAP && INPUT || USB_HID && HID_SUPPORT && USB && INPUT) selects HID which has unmet direct dependencies (HID_SUPPORT && INPUT)

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Acked-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Gustavo F. Padovan <padovan@profusion.mobi>
2010-11-09 00:55:27 -02:00
Brian Cavagnolo 352ffad646 mac80211: unset SDATA_STATE_OFFCHANNEL when cancelling a scan
For client STA interfaces, ieee80211_do_stop unsets the relevant
interface's SDATA_STATE_RUNNING state bit prior to cancelling an
interrupted scan.  When ieee80211_offchannel_return is invoked as
part of cancelling the scan, it doesn't bother unsetting the
SDATA_STATE_OFFCHANNEL bit because it sees that the interface is
down.  Normally this doesn't matter because when the client STA
interface is brought back up, it will probably issue a scan.  But
in some cases (e.g., the user changes the interface type while it
is down), the SDATA_STATE_OFFCHANNEL bit will remain set.  This
prevents the interface queues from being started.  So we
cancel the scan before unsetting the SDATA_STATE_RUNNING bit.

Signed-off-by: Brian Cavagnolo <brian@cozybit.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-08 16:53:47 -05:00
Felix Fietkau 3cc25e510d cfg80211: fix a crash in dev lookup on dump commands
IS_ERR and PTR_ERR were called with the wrong pointer, leading to a
crash when cfg80211_get_dev_from_ifindex fails.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-11-08 16:53:47 -05:00
Eric Dumazet 973a34aa85 af_unix: optimize unix_dgram_poll()
unix_dgram_poll() is pretty expensive to check POLLOUT status, because
it has to lock the socket to get its peer, take a reference on the peer
to check its receive queue status, and queue another poll_wait on
peer_wait. This all can be avoided if the process calling
unix_dgram_poll() is not interested in POLLOUT status. It makes
unix_dgram_recvmsg() faster by not queueing irrelevant pollers in
peer_wait.

On a test program provided by Alan Crequy :

Before:

real    0m0.211s
user    0m0.000s
sys     0m0.208s

After:

real    0m0.044s
user    0m0.000s
sys     0m0.040s

Suggested-by: Davide Libenzi <davidel@xmailserver.org>
Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 13:50:09 -08:00
Eric Dumazet 5456f09aaf af_unix: fix unix_dgram_poll() behavior for EPOLLOUT event
Alban Crequy reported a problem with connected dgram af_unix sockets and
provided a test program. epoll() would miss to send an EPOLLOUT event
when a thread unqueues a packet from the other peer, making its receive
queue not full.

This is because unix_dgram_poll() fails to call sock_poll_wait(file,
&unix_sk(other)->peer_wait, wait);
if the socket is not writeable at the time epoll_ctl(ADD) is called.

We must call sock_poll_wait(), regardless of 'writable' status, so that
epoll can be notified later of states changes.

Misc: avoids testing twice (sk->sk_shutdown & RCV_SHUTDOWN)

Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 13:50:09 -08:00
Eric Dumazet 67426b756c af_unix: use keyed wakeups
Instead of wakeup all sleepers, use wake_up_interruptible_sync_poll() to
wakeup only ones interested into writing the socket.

This patch is a specialization of commit 37e5540b3c (epoll keyed
wakeups: make sockets use keyed wakeups).

On a test program provided by Alan Crequy :

Before:
real    0m3.101s
user    0m0.000s
sys     0m6.104s

After:

real	0m0.211s
user	0m0.000s
sys	0m0.208s

Reported-by: Alban Crequy <alban.crequy@collabora.co.uk>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Davide Libenzi <davidel@xmailserver.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 13:50:08 -08:00
Eric Dumazet fc766e4c49 decnet: RCU conversion and get rid of dev_base_lock
While tracking dev_base_lock users, I found decnet used it in
dnet_select_source(), but for a wrong purpose:

Writers only hold RTNL, not dev_base_lock, so readers must use RCU if
they cannot use RTNL.

Adds an rcu_head in struct dn_ifaddr and handle proper RCU management.

Adds __rcu annotation in dn_route as well.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Steven Whitehouse <swhiteho@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 13:50:08 -08:00
David S. Miller d0eaeec8e8 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 2010-11-08 12:38:28 -08:00
Pavel Emelyanov aa58163a76 rds: Fix rds message leak in rds_message_map_pages
The sgs allocation error path leaks the allocated message.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 12:17:09 -08:00
Junchang Wang eb589063ed pktgen: correct uninitialized queue_map
This fix a bug reported by backyes.
Right the first time pktgen's using queue_map that's not been initialized
by set_cur_queue_map(pkt_dev);

Signed-off-by: Junchang Wang <junchangwang@gmail.com>
Signed-off-by: Backyes <backyes@mail.ustc.edu.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 12:17:07 -08:00
Shan Wei f46421416f ipv6: fix overlap check for fragments
The type of FRAG6_CB(prev)->offset is int, skb->len is *unsigned* int,
and offset is int.

Without this patch, type conversion occurred to this expression, when
(FRAG6_CB(prev)->offset + prev->len) is less than offset.

Signed-off-by: Shan Wei <shanwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 12:17:06 -08:00
stephen hemminger 4c46ee5258 classifier: report statistics for basic classifier
The basic classifier keeps statistics but does not report it to user space.
This showed up when using basic classifier (with police) as a default catch
all on ingress; no statistics were reported.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-08 12:17:05 -08:00
Dmitry Torokhov 86c2c0a8a4 NET: pktgen - fix compile warning
This should fix the following warning:

net/core/pktgen.c: In function ‘pktgen_if_write’:
net/core/pktgen.c:890: warning: comparison of distinct pointer types lacks a cast

Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
Reviewed-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-07 05:28:01 -08:00
Linus Torvalds 4b4a2700f4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (41 commits)
  inet_diag: Make sure we actually run the same bytecode we audited.
  netlink: Make nlmsg_find_attr take a const nlmsghdr*.
  fib: fib_result_assign() should not change fib refcounts
  netfilter: ip6_tables: fix information leak to userspace
  cls_cgroup: Fix crash on module unload
  memory corruption in X.25 facilities parsing
  net dst: fix percpu_counter list corruption and poison overwritten
  rds: Remove kfreed tcp conn from list
  rds: Lost locking in loop connection freeing
  de2104x: fix panic on load
  atl1 : fix panic on load
  netxen: remove unused firmware exports
  caif: Remove noisy printout when disconnecting caif socket
  caif: SPI-driver bugfix - incorrect padding.
  caif: Bugfix for socket priority, bindtodev and dbg channel.
  smsc911x: Set Ethernet EEPROM size to supported device's size
  ipv4: netfilter: ip_tables: fix information leak to userland
  ipv4: netfilter: arp_tables: fix information leak to userland
  cxgb4vf: remove call to stop TX queues at load time.
  cxgb4: remove call to stop TX queues at load time.
  ...
2010-11-05 15:25:48 -07:00
Nelson Elhage 22e76c849d inet_diag: Make sure we actually run the same bytecode we audited.
We were using nlmsg_find_attr() to look up the bytecode by attribute when
auditing, but then just using the first attribute when actually running
bytecode. So, if we received a message with two attribute elements, where only
the second had type INET_DIAG_REQ_BYTECODE, we would validate and run different
bytecode strings.

Fix this by consistently using nlmsg_find_attr everywhere.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-04 12:26:34 -07:00
Eric Dumazet 1f1b9c9990 fib: fib_result_assign() should not change fib refcounts
After commit ebc0ffae5 (RCU conversion of fib_lookup()),
fib_result_assign()  should not change fib refcounts anymore.

Thanks to Michael who did the bisection and bug report.

Reported-by: Michael Ellerman <michael@ellerman.id.au>
Tested-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-04 12:05:32 -07:00
Jan Engelhardt cccbe5ef85 netfilter: ip6_tables: fix information leak to userspace
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:55:39 -07:00
David S. Miller 758cb41106 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2010-11-03 18:52:32 -07:00
Herbert Xu c00b2c9e79 cls_cgroup: Fix crash on module unload
Somewhere along the lines net_cls_subsys_id became a macro when
cls_cgroup is built as a module.  Not only did it make cls_cgroup
completely useless, it also causes it to crash on module unload.

This patch fixes this by removing that macro.

Thanks to Eric Dumazet for diagnosing this problem.

Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:50 -07:00
andrew hendry a6331d6f9a memory corruption in X.25 facilities parsing
Signed-of-by: Andrew Hendry <andrew.hendry@gmail.com>

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:50 -07:00
Xiaotian Feng 41bb78b4b9 net dst: fix percpu_counter list corruption and poison overwritten
There're some percpu_counter list corruption and poison overwritten warnings
in recent kernel, which is resulted by fc66f95c.

commit fc66f95c switches to use percpu_counter, in ip6_route_net_init, kernel
init the percpu_counter for dst entries, but, the percpu_counter is never destroyed
in ip6_route_net_exit. So if the related data is freed by kernel, the freed percpu_counter
is still on the list, then if we insert/remove other percpu_counter, list corruption
resulted. Also, if the insert/remove option modifies the ->prev,->next pointer of
the freed value, the poison overwritten is resulted then.

With the following patch, the percpu_counter list corruption and poison overwritten
warnings disappeared.

Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: "Pekka Savola (ipv6)" <pekkas@netcore.fi>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:07 -07:00
Pavel Emelyanov 8200a59f24 rds: Remove kfreed tcp conn from list
All the rds_tcp_connection objects are stored list, but when
being freed it should be removed from there.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:07 -07:00
Pavel Emelyanov 58c490babd rds: Lost locking in loop connection freeing
The conn is removed from list in there and this requires
proper lock protection.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:06 -07:00
sjur.brandeland@stericsson.com 47d1ff1765 caif: Remove noisy printout when disconnecting caif socket
Signed-off-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:04 -07:00
André Carvalho de Matos f2527ec436 caif: Bugfix for socket priority, bindtodev and dbg channel.
Changes:
o Bugfix: SO_PRIORITY for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute priority instead.

o Bugfix: SO_BINDTODEVICE for SOL_SOCKET could not be handled
  in caif's setsockopt,  using the struct sock attribute ifindex instead.

o Wrong assert statement for RFM layer segmentation.

o CAIF Debug channels was not working over SPI, caif_payload_info
  containing padding info must be initialized.

o Check on pointer before dereferencing when unregister dev in caif_dev.c

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-03 18:50:03 -07:00
Vasiliy Kulikov b5f15ac4f8 ipv4: netfilter: ip_tables: fix information leak to userland
Structure ipt_getinfo is copied to userland with the field "name"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-03 08:45:06 +01:00
Vasiliy Kulikov 1a8b7a6722 ipv4: netfilter: arp_tables: fix information leak to userland
Structure arpt_getinfo is copied to userland with the field "name"
that has the last elements unitialized.  It leads to leaking of
contents of kernel stack memory.

Signed-off-by: Vasiliy Kulikov <segooon@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-11-03 08:44:12 +01:00
Sage Weil df9f86faf3 ceph: fix small seq message skipping
If the client gets out of sync with the server message sequence number, we
normally skip low seq messages (ones we already received).  The skip code
was also incrementing the expected seq, such that all subsequent messages
also appeared old and got skipped, and an eventual timeout on the osd
connection.  This resulted in some lagging requests and console messages
like

[233480.882885] ceph: skipping osd22 10.138.138.13:6804 seq 2016, expected 2017
[233480.882919] ceph: skipping osd22 10.138.138.13:6804 seq 2017, expected 2018
[233480.882963] ceph: skipping osd22 10.138.138.13:6804 seq 2018, expected 2019
[233480.883488] ceph: skipping osd22 10.138.138.13:6804 seq 2019, expected 2020
[233485.219558] ceph: skipping osd22 10.138.138.13:6804 seq 2020, expected 2021
[233485.906595] ceph: skipping osd22 10.138.138.13:6804 seq 2021, expected 2022
[233490.379536] ceph: skipping osd22 10.138.138.13:6804 seq 2022, expected 2023
[233495.523260] ceph: skipping osd22 10.138.138.13:6804 seq 2023, expected 2024
[233495.923194] ceph: skipping osd22 10.138.138.13:6804 seq 2024, expected 2025
[233500.534614] ceph:  tid 6023602 timed out on osd22, will reset osd

Reported-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Sage Weil <sage@newdream.net>
2010-11-01 15:49:23 -07:00
Tom Herbert df32cc193a net: check queue_index from sock is valid for device
In dev_pick_tx recompute the queue index if the value stored in the
socket is greater than or equal to the number of real queues for the
device.  The saved index in the sock structure is not guaranteed to
be appropriate for the egress device (this could happen on a route
change or in presence of tunnelling).  The result of the queue index
being bad would be to return a bogus queue (crash could prersumably
follow).

Signed-off-by: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-01 12:55:52 -07:00
Dr. David Alan Gilbert 6f9b901823 l2tp: kzalloc with swapped params in l2tp_dfs_seq_open
'sparse' spotted that the parameters to kzalloc in l2tp_dfs_seq_open
were swapped.

Tested on current git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
at 1792f17b72  build, boots and I can see that directory,
but there again I could see /sys/kernel/debug/l2tp with it swapped; I don't have
any l2tp in use.

Signed-off-by: Dr. David Alan Gilbert <linux@treblig.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-11-01 06:56:02 -07:00
Thomas Graf 5ec1cea057 text ematch: check for NULL pointer before destroying textsearch config
While validating the configuration em_ops is already set, thus the
individual destroy functions are called, but the ematch data has
not been allocated and associated with the ematch yet.

Signed-off-by: Thomas Graf <tgraf@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-31 09:37:38 -07:00
Linus Torvalds 3985c7ce85 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
  isdn: mISDN: socket: fix information leak to userland
  netdev: can: Change mail address of Hans J. Koch
  pcnet_cs: add new_id
  net: Truncate recvfrom and sendto length to INT_MAX.
  RDS: Let rds_message_alloc_sgs() return NULL
  RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace
  RDS: Clean up error handling in rds_cmsg_rdma_args
  RDS: Return -EINVAL if rds_rdma_pages returns an error
  net: fix rds_iovec page count overflow
  can: pch_can: fix section mismatch warning by using a whitelisted name
  can: pch_can: fix sparse warning
  netxen_nic: Fix the tx queue manipulation bug in netxen_nic_probe
  ip_gre: fix fallback tunnel setup
  vmxnet: trivial annotation of protocol constant
  vmxnet3: remove unnecessary byteswapping in BAR writing macros
  ipv6/udp: report SndbufErrors and RcvbufErrors
  phy/marvell: rename 88ec048 to 88e1318s and fix mscr1 addr
2010-10-30 18:42:58 -07:00
Linus Torvalds 253eacc070 net: Truncate recvfrom and sendto length to INT_MAX.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:44:07 -07:00
Andy Grover d139ff0907 RDS: Let rds_message_alloc_sgs() return NULL
Even with the previous fix, we still are reading the iovecs once
to determine SGs needed, and then again later on. Preallocating
space for sg lists as part of rds_message seemed like a good idea
but it might be better to not do this. While working to redo that
code, this patch attempts to protect against userspace rewriting
the rds_iovec array between the first and second accesses.

The consequences of this would be either a too-small or too-large
sg list array. Too large is not an issue. This patch changes all
callers of message_alloc_sgs to handle running out of preallocated
sgs, and fail gracefully.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:18 -07:00
Andy Grover fc8162e3c0 RDS: Copy rds_iovecs into kernel memory instead of rereading from userspace
Change rds_rdma_pages to take a passed-in rds_iovec array instead
of doing copy_from_user itself.

Change rds_cmsg_rdma_args to copy rds_iovec array once only. This
eliminates the possibility of userspace changing it after our
sanity checks.

Implement stack-based storage for small numbers of iovecs, based
on net/socket.c, to save an alloc in the extremely common case.

Although this patch reduces iovec copies in cmsg_rdma_args to 1,
we still do another one in rds_rdma_extra_size. Getting rid of
that one will be trickier, so it'll be a separate patch.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:17 -07:00
Andy Grover f4a3fc03c1 RDS: Clean up error handling in rds_cmsg_rdma_args
We don't need to set ret = 0 at the end -- it's initialized to 0.

Also, don't increment s_send_rdma stat if we're exiting with an
error.

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:17 -07:00
Andy Grover a09f69c49b RDS: Return -EINVAL if rds_rdma_pages returns an error
rds_cmsg_rdma_args would still return success even if rds_rdma_pages
returned an error (or overflowed).

Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:16 -07:00
Linus Torvalds 1b1f693d7a net: fix rds_iovec page count overflow
As reported by Thomas Pollet, the rdma page counting can overflow.  We
get the rdma sizes in 64-bit unsigned entities, but then limit it to
UINT_MAX bytes and shift them down to pages (so with a possible "+1" for
an unaligned address).

So each individual page count fits comfortably in an 'unsigned int' (not
even close to overflowing into signed), but as they are added up, they
might end up resulting in a signed return value. Which would be wrong.

Catch the case of tot_pages turning negative, and return the appropriate
error code.

Reported-by: Thomas Pollet <thomas.pollet@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andy Grover <andy.grover@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:34:16 -07:00
Eric Dumazet 3285ee3bb2 ip_gre: fix fallback tunnel setup
Before making the fallback tunnel visible to lookups, we should make
sure it is completely setup, once ipgre_tunnel_init() had been called
and tstats per_cpu pointer allocated.

move rcu_assign_pointer(ign->tunnels_wc[0], tunnel); from
ipgre_fb_tunnel_init() to ipgre_init_net()

Based on a patch from Pavel Emelyanov

Reported-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:21:28 -07:00
Eric Dumazet 870be39258 ipv6/udp: report SndbufErrors and RcvbufErrors
commit a18135eb93 (Add UDP_MIB_{SND,RCV}BUFERRORS handling.)
forgot to make the necessary changes in net/ipv6/proc.c to report
additional counters in /proc/net/snmp6

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-30 16:17:23 -07:00
Linus Torvalds 1840897ab5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
  b43: Fix warning at drivers/mmc/core/core.c:237 in mmc_wait_for_cmd
  mac80211: fix failure to check kmalloc return value in key_key_read
  libertas: Fix sd8686 firmware reload
  ath9k: Fix incorrect access of rate flags in RC
  netfilter: xt_socket: Make tproto signed in socket_mt6_v1().
  stmmac: enable/disable rx/tx in the core with a single write.
  net: atarilance - flags should be unsigned long
  netxen: fix kdump
  pktgen: Limit how much data we copy onto the stack.
  net: Limit socket I/O iovec total length to INT_MAX.
  USB: gadget: fix ethernet gadget crash in gether_setup
  fib: Fix fib zone and its hash leak on namespace stop
  cxgb3: Fix panic in free_tx_desc()
  cxgb3: fix crash due to manipulating queues before registration
  8390: Don't oops on starting dev queue
  dccp ccid-2: Stop polling
  dccp: Refine the wait-for-ccid mechanism
  dccp: Extend CCID packet dequeueing interface
  dccp: Return-value convention of hc_tx_send_packet()
  igbvf: fix panic on load
  ...
2010-10-29 14:17:12 -07:00
David S. Miller a4765fa7bf Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-10-29 12:23:15 -07:00
Jesper Juhl 520efd1ace mac80211: fix failure to check kmalloc return value in key_key_read
I noticed two small issues in mac80211/debugfs_key.c::key_key_read while
reading through the code. Patch below.

The key_key_read() function returns ssize_t and the value that's actually
returned is the return value of simple_read_from_buffer() which also
returns ssize_t, so let's hold the return value in a ssize_t local
variable rather than a int one.

Also, memory is allocated dynamically with kmalloc() which can fail, but
the return value of kmalloc() is not checked, so we may end up operating
on a null pointer further on. So check for a NULL return and bail out with
-ENOMEM in that case.

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-29 14:33:26 -04:00
Eric Dumazet d817d29d0b netfilter: fix nf_conntrack_l4proto_register()
While doing __rcu annotations work on net/netfilter I found following
bug. On some arches, it is possible we publish a table while its content
is not yet committed to memory, and lockless reader can dereference wild
pointer.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-29 19:59:40 +02:00
Patrick McHardy 64e4674922 netfilter: nf_nat: fix compiler warning with CONFIG_NF_CT_NETLINK=n
net/ipv4/netfilter/nf_nat_core.c:52: warning: 'nf_nat_proto_find_get' defined but not used
net/ipv4/netfilter/nf_nat_core.c:66: warning: 'nf_nat_proto_put' defined but not used

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-29 16:28:07 +02:00
Al Viro 51139adac9 convert get_sb_pseudo() users
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-29 04:16:33 -04:00
Al Viro fc14f2fef6 convert get_sb_single() users
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-29 04:16:28 -04:00
David S. Miller 089282fb02 netfilter: xt_socket: Make tproto signed in socket_mt6_v1().
Otherwise error indications from ipv6_find_hdr() won't be noticed.

This required making the protocol argument to extract_icmp6_fields()
signed too.

Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 12:59:53 -07:00
Nelson Elhage 448d7b5daf pktgen: Limit how much data we copy onto the stack.
A program that accidentally writes too much data to the pktgen file can overflow
the kernel stack and oops the machine. This is only triggerable by root, so
there's no security issue, but it's still an unfortunate bug.

printk() won't print more than 1024 bytes in a single call, anyways, so let's
just never copy more than that much data. We're on a fairly shallow stack, so
that should be safe even with CONFIG_4KSTACKS.

Signed-off-by: Nelson Elhage <nelhage@ksplice.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 11:47:53 -07:00
David S. Miller 8acfe468b0 net: Limit socket I/O iovec total length to INT_MAX.
This helps protect us from overflow issues down in the
individual protocol sendmsg/recvmsg handlers.  Once
we hit INT_MAX we truncate out the rest of the iovec
by setting the iov_len members to zero.

This works because:

1) For SOCK_STREAM and SOCK_SEQPACKET sockets, partial
   writes are allowed and the application will just continue
   with another write to send the rest of the data.

2) For datagram oriented sockets, where there must be a
   one-to-one correspondance between write() calls and
   packets on the wire, INT_MAX is going to be far larger
   than the packet size limit the protocol is going to
   check for and signal with -EMSGSIZE.

Based upon a patch by Linus Torvalds.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 11:47:52 -07:00
Pavel Emelyanov 4aa2c466a7 fib: Fix fib zone and its hash leak on namespace stop
When we stop a namespace we flush the table and free one, but the
added fn_zone-s (and their hashes if grown) are leaked. Need to free.
Tries releases all its stuff in the flushing code.

Shame on us - this bug exists since the very first make-fib-per-net
patches in 2.6.27 :(

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:03 -07:00
Gerrit Renker 1c0e0a0569 dccp ccid-2: Stop polling
This updates CCID-2 to use the CCID dequeuing mechanism, converting from
previous continuous-polling to a now event-driven mechanism.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:01 -07:00
Gerrit Renker b1fcf55eea dccp: Refine the wait-for-ccid mechanism
This extends the existing wait-for-ccid routine so that it may be used with
different types of CCID, addressing the following problems:

 1) The queue-drain mechanism only works with rate-based CCIDs. If CCID-2 for
    example has a full TX queue and becomes network-limited just as the
    application wants to close, then waiting for CCID-2 to become unblocked
    could lead to an indefinite  delay (i.e., application "hangs").
 2) Since each TX CCID in turn uses a feedback mechanism, there may be changes
    in its sending policy while the queue is being drained. This can lead to
    further delays during which the application will not be able to terminate.
 3) The minimum wait time for CCID-3/4 can be expected to be the queue length
    times the current inter-packet delay. For example if tx_qlen=100 and a delay
    of 15 ms is used for each packet, then the application would have to wait
    for a minimum of 1.5 seconds before being allowed to exit.
 4) There is no way for the user/application to control this behaviour. It would
    be good to use the timeout argument of dccp_close() as an upper bound. Then
    the maximum time that an application is willing to wait for its CCIDs to can
    be set via the SO_LINGER option.

These problems are addressed by giving the CCID a grace period of up to the
`timeout' value.

The wait-for-ccid function is, as before, used when the application
 (a) has read all the data in its receive buffer and
 (b) if SO_LINGER was set with a non-zero linger time, or
 (c) the socket is either in the OPEN (active close) or in the PASSIVE_CLOSEREQ
     state (client application closes after receiving CloseReq).

In addition, there is a catch-all case of __skb_queue_purge() after waiting for
the CCID. This is necessary since the write queue may still have data when
 (a) the host has been passively-closed,
 (b) abnormal termination (unread data, zero linger time),
 (c) wait-for-ccid could not finish within the given time limit.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:01 -07:00
Gerrit Renker dc841e30ea dccp: Extend CCID packet dequeueing interface
This extends the packet dequeuing interface of dccp_write_xmit() to allow
 1. CCIDs to take care of timing when the next packet may be sent;
 2. delayed sending (as before, with an inter-packet gap up to 65.535 seconds).

The main purpose is to take CCID-2 out of its polling mode (when it is network-
limited, it tries every millisecond to send, without interruption).

The mode of operation for (2) is as follows:
 * new packet is enqueued via dccp_sendmsg() => dccp_write_xmit(),
 * ccid_hc_tx_send_packet() detects that it may not send (e.g. window full),
 * it signals this condition via `CCID_PACKET_WILL_DEQUEUE_LATER',
 * dccp_write_xmit() returns without further action;
 * after some time the wait-condition for CCID becomes true,
 * that CCID schedules the tasklet,
 * tasklet function calls ccid_hc_tx_send_packet() via dccp_write_xmit(),
 * since the wait-condition is now true, ccid_hc_tx_packet() returns "send now",
 * packet is sent, and possibly more (since dccp_write_xmit() loops).

Code reuse: the taskled function calls dccp_write_xmit(), the timer function
            reduces to a wrapper around the same code.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:00 -07:00
Gerrit Renker fe84f4140f dccp: Return-value convention of hc_tx_send_packet()
This patch reorganises the return value convention of the CCID TX sending
function, to permit more flexible schemes, as required by subsequent patches.

Currently the convention is
 * values < 0     mean error,
 * a value == 0   means "send now", and
 * a value x > 0  means "send in x milliseconds".

The patch provides symbolic constants and a function to interpret return values.

In addition, it caps the maximum positive return value to 0xFFFF milliseconds,
corresponding to 65.535 seconds.  This is possible since in CCID-3/4 the
maximum possible inter-packet gap is fixed at t_mbi = 64 sec.

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-28 10:27:00 -07:00
Sanchit Garg f6ac55b6c1 net/9p: Return error on read with NULL buffer
This patch ensures that a read(fd, NULL, 10) returns  EFAULT on a 9p file.

Signed-off-by: Sanchit Garg <sancgarg@linux.vnet.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:49 -05:00
Venkateswararao Jujjuri (JV) b165d60145 9p: Add datasync to client side TFSYNC/RFSYNC for dotl
SYNOPSIS
    size[4] Tfsync tag[2] fid[4] datasync[4]

    size[4] Rfsync tag[2]

DESCRIPTION

    The Tfsync transaction transfers ("flushes") all modified in-core data of
    file identified by fid to the disk device (or other  permanent  storage
    device)  where that  file  resides.

    If datasync flag is specified data will be fleshed but does not flush
    modified metadata unless  that  metadata  is  needed  in order to allow a
    subsequent data retrieval to be correctly handled.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:49 -05:00
Aneesh Kumar K.V 7b3bb3fe16 net/9p: Return error if we fail to encode protocol data
We need to return error in case we fail to encode data in protocol buffer.
This patch also return error in case of a failed copy_from_user.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:49 -05:00
Venkateswararao Jujjuri (JV) 52f44e0d08 net/9p: Add waitq to VirtIO transport.
If there is not enough space for the PDU on the VirtIO ring, current
code returns -EIO propagating the error to user.

This patch introduced a wqit_queue on the channel, and lets the process
wait on this queue until VirtIO ring frees up.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:48 -05:00
Venkateswararao Jujjuri (JV) 419b39561e [net/9p]Serialize virtqueue operations to make VirtIO transport SMP safe.
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:48 -05:00
M. Mohan Kumar 329176cc2c 9p: Implement TREADLINK operation for 9p2000.L
Synopsis

	size[4] TReadlink tag[2] fid[4]
	size[4] RReadlink tag[2] target[s]

Description
	Readlink is used to return the contents of the symoblic link
        referred by fid. Contents of symboic link is returned as a
        response.

	target[s] - Contents of the symbolic link referred by fid.

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:48 -05:00
M. Mohan Kumar 1d769cd192 9p: Implement TGETLOCK
Synopsis

    size[4] TGetlock tag[2] fid[4] getlock[n]
    size[4] RGetlock tag[2] getlock[n]

Description

TGetlock is used to test for the existence of byte range posix locks on a file
identified by given fid. The reply contains getlock structure. If the lock could
be placed it returns F_UNLCK in type field of getlock structure.  Otherwise it
returns the details of the conflicting locks in the getlock structure

    getlock structure:
      type[1] - Type of lock: F_RDLCK, F_WRLCK
      start[8] - Starting offset for lock
      length[8] - Number of bytes to check for the lock
             If length is 0, check for lock in all bytes starting at the location
            'start' through to the end of file
      pid[4] - PID of the process that wants to take lock/owns the task
               in case of reply
      client[4] - Client id of the system that owns the process which
                  has the conflicting lock

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
M. Mohan Kumar a099027c77 9p: Implement TLOCK
Synopsis

    size[4] TLock tag[2] fid[4] flock[n]
    size[4] RLock tag[2] status[1]

Description

Tlock is used to acquire/release byte range posix locks on a file
identified by given fid. The reply contains status of the lock request

    flock structure:
        type[1] - Type of lock: F_RDLCK, F_WRLCK, F_UNLCK
        flags[4] - Flags could be either of
          P9_LOCK_FLAGS_BLOCK - Blocked lock request, if there is a
            conflicting lock exists, wait for that lock to be released.
          P9_LOCK_FLAGS_RECLAIM - Reclaim lock request, used when client is
            trying to reclaim a lock after a server restrart (due to crash)
        start[8] - Starting offset for lock
        length[8] - Number of bytes to lock
          If length is 0, lock all bytes starting at the location 'start'
          through to the end of file
        pid[4] - PID of the process that wants to take lock
        client_id[4] - Unique client id

        status[1] - Status of the lock request, can be
          P9_LOCK_SUCCESS(0), P9_LOCK_BLOCKED(1), P9_LOCK_ERROR(2) or
          P9_LOCK_GRACE(3)
          P9_LOCK_SUCCESS - Request was successful
          P9_LOCK_BLOCKED - A conflicting lock is held by another process
          P9_LOCK_ERROR - Error while processing the lock request
          P9_LOCK_GRACE - Server is in grace period, it can't accept new lock
            requests in this period (except locks with
            P9_LOCK_FLAGS_RECLAIM flag set)

Signed-off-by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
Venkateswararao Jujjuri (JV) 920e65dc69 [9p] Introduce client side TFSYNC/RFSYNC for dotl.
SYNOPSIS
    size[4] Tfsync tag[2] fid[4]

    size[4] Rfsync tag[2]

DESCRIPTION

The Tfsync transaction transfers ("flushes") all modified in-core data of
file identified by fid to the disk device (or other  permanent  storage
device)  where that  file  resides.

Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:47 -05:00
jvrao 8e44a0805f net/9p: Add a Warning to catch NULL fids passed to p9_client_clunk().
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:45 -05:00
Arun R Bharadwaj 4f7ebe8072 net/9p: This patch implements TLERROR/RLERROR on the 9P client.
Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com>
Signed-off-by: Venkateswararao Jujjuri <jvrao@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
2010-10-28 09:08:45 -05:00
Eric Dumazet 6b1686a71e netfilter: nf_conntrack: allow nf_ct_alloc_hashtable() to get highmem pages
commit ea781f197d (use SLAB_DESTROY_BY_RCU and get rid of call_rcu())
did a mistake in __vmalloc() call in nf_ct_alloc_hashtable().

I forgot to add __GFP_HIGHMEM, so pages were taken from LOWMEM only.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: stable@kernel.org
Signed-off-by: Patrick McHardy <kaber@trash.net>
2010-10-28 12:34:21 +02:00
Linus Torvalds 22cdbd1d57 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (108 commits)
  ehea: Fixing statistics
  bonding: Fix lockdep warning after bond_vlan_rx_register()
  tunnels: Fix tunnels change rcu protection
  caif-u5500: Build config for CAIF shared mem driver
  caif-u5500: CAIF shared memory mailbox interface
  caif-u5500: CAIF shared memory transport protocol
  caif-u5500: Adding shared memory include
  drivers/isdn: delete double assignment
  drivers/net/typhoon.c: delete double assignment
  drivers/net/sb1000.c: delete double assignment
  qlcnic: define valid vlan id range
  qlcnic: reduce rx ring size
  qlcnic: fix mac learning
  ehea: fix use after free
  inetpeer: __rcu annotations
  fib_rules: __rcu annotates ctarget
  tunnels: add __rcu annotations
  net: add __rcu annotations to protocol
  ipv4: add __rcu annotations to routes.c
  qlge: bugfix: Restoring the vlan setting.
  ...
2010-10-27 18:28:00 -07:00
Pavel Emelyanov 74b0b85b88 tunnels: Fix tunnels change rcu protection
After making rcu protection for tunnels (ipip, gre, sit and ip6) a bug
was introduced into the SIOCCHGTUNNEL code.

The tunnel is first unlinked, then addresses change, then it is linked
back probably into another bucket. But while changing the parms, the
hash table is unlocked to readers and they can lookup the improper tunnel.

Respective commits are b7285b79 (ipip: get rid of ipip_lock), 1507850b
(gre: get rid of ipgre_lock), 3a43be3c (sit: get rid of ipip6_lock) and
94767632 (ip6tnl: get rid of ip6_tnl_lock).

The quick fix is to wait for quiescent state to pass after unlinking,
but if it is inappropriate I can invent something better, just let me
know.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 14:20:08 -07:00
Jouni Malinen dc9f48ce7c mac80211: Fix scan_ies_len to include DS Params
Commit 651b52254f added DS Parameter Set
information into Probe Request frames that are transmitted on 2.4 GHz
band, but it failed to increment local->scan_ies_len to cover this new
information. This variable needs to be updated to match the maximum IE
data length so that the extra buffer need gets reduced from the driver
limit.

Signed-off-by: Jouni Malinen <j@w1.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-27 15:46:51 -04:00
Eric Dumazet b914c4ea92 inetpeer: __rcu annotations
Adds __rcu annotations to inetpeer
	(struct inet_peer)->avl_left
	(struct inet_peer)->avl_right

This is a tedious cleanup, but removes one smp_wmb() from link_to_pool()
since we now use more self documenting rcu_assign_pointer().

Note the use of RCU_INIT_POINTER() instead of rcu_assign_pointer() in
all cases we dont need a memory barrier.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:33 -07:00
Eric Dumazet 7a2b03c517 fib_rules: __rcu annotates ctarget
Adds __rcu annotation to (struct fib_rule)->ctarget

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:32 -07:00
Eric Dumazet b33eab0844 tunnels: add __rcu annotations
Add __rcu annotations to :
        (struct ip_tunnel)->prl
        (struct ip_tunnel_prl_entry)->next
        (struct xfrm_tunnel)->next
	struct xfrm_tunnel *tunnel4_handlers
	struct xfrm_tunnel *tunnel64_handlers

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:32 -07:00
Eric Dumazet e0ad61ec86 net: add __rcu annotations to protocol
Add __rcu annotations to :
        struct net_protocol *inet_protos
        struct net_protocol *inet6_protos

And use appropriate casts to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:31 -07:00
Eric Dumazet 1c31720a74 ipv4: add __rcu annotations to routes.c
Add __rcu annotations to :
        (struct dst_entry)->rt_next
        (struct rt_hash_bucket)->chain

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:31 -07:00
Ursula Braun 853dc2e03d ipv6: fix refcnt problem related to POSTDAD state
After running this bonding setup script
    modprobe bonding miimon=100 mode=0 max_bonds=1
    ifconfig bond0 10.1.1.1/16
    ifenslave bond0 eth1
    ifenslave bond0 eth3
on s390 with qeth-driven slaves, modprobe -r fails with this message
    unregister_netdevice: waiting for bond0 to become free. Usage count = 1
due to twice detection of duplicate address.
Problem is caused by a missing decrease of ifp->refcnt in addrconf_dad_failure.
An extra call of in6_ifa_put(ifp) solves it.
Problem has been introduced with commit f2344a131b.

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:30 -07:00
Ben Hutchings 66c68bcc48 net: NETIF_F_HW_CSUM does not imply FCoE CRC offload
NETIF_F_HW_CSUM indicates the ability to update an TCP/IP-style 16-bit
checksum with the checksum of an arbitrary part of the packet data,
whereas the FCoE CRC is something entirely different.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Cc: stable@kernel.org [2.6.32+]
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:29 -07:00
Ben Hutchings af1905dbec net: Fix some corner cases in dev_can_checksum()
dev_can_checksum() incorrectly returns true in these cases:

1. The skb has both out-of-band and in-band VLAN tags and the device
   supports checksum offload for the encapsulated protocol but only with
   one layer of encapsulation.
2. The skb has a VLAN tag and the device supports generic checksumming
   but not in conjunction with VLAN encapsulation.

Rearrange the VLAN tag checks to avoid these.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-27 11:37:29 -07:00
Linus Torvalds 426e1f5cec Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (52 commits)
  split invalidate_inodes()
  fs: skip I_FREEING inodes in writeback_sb_inodes
  fs: fold invalidate_list into invalidate_inodes
  fs: do not drop inode_lock in dispose_list
  fs: inode split IO and LRU lists
  fs: switch bdev inode bdi's correctly
  fs: fix buffer invalidation in invalidate_list
  fsnotify: use dget_parent
  smbfs: use dget_parent
  exportfs: use dget_parent
  fs: use RCU read side protection in d_validate
  fs: clean up dentry lru modification
  fs: split __shrink_dcache_sb
  fs: improve DCACHE_REFERENCED usage
  fs: use percpu counter for nr_dentry and nr_dentry_unused
  fs: simplify __d_free
  fs: take dcache_lock inside __d_path
  fs: do not assign default i_ino in new_inode
  fs: introduce a per-cpu last_ino allocator
  new helper: ihold()
  ...
2010-10-26 17:58:44 -07:00
Eric Dumazet 518de9b39e fs: allow for more than 2^31 files
Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32bit value :

<quote>

We were seeing a failure which prevented boot.  The kernel was incapable
of creating either a named pipe or unix domain socket.  This comes down
to a common kernel function called unix_create1() which does:

        atomic_inc(&unix_nr_socks);
        if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                goto out;

The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().

        n = (mempages * (PAGE_SIZE / 1024)) / 10;
        files_stat.max_files = n;

In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000).  That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.

</quote>

Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of atomic_t.

get_max_files() is changed to return an unsigned long.  get_nr_files() is
changed to return a long.

unix_nr_socks is changed from atomic_t to atomic_long_t, while not
strictly needed to address Robin problem.

Before patch (on a 64bit kernel) :
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968

After patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704     0       2147483648

Reported-by: Robin Holt <holt@sgi.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Miller <davem@davemloft.net>
Reviewed-by: Robin Holt <holt@sgi.com>
Tested-by: Robin Holt <holt@sgi.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-26 16:52:15 -07:00
Glenn Wurster 7a876b0efc IPv6: Temp addresses are immediately deleted.
There is a bug in the interaction between ipv6_create_tempaddr and
addrconf_verify.  Because ipv6_create_tempaddr uses the cstamp and tstamp
from the public address in creating a private address, if we have not
received a router advertisement in a while, tstamp + temp_valid_lft might be
< now.  If this happens, the new address is created inside
ipv6_create_tempaddr, then the loop within addrconf_verify starts again and
the address is immediately deleted.  We are left with no temporary addresses
on the interface, and no more will be created until the public IP address is
updated.  To avoid this, set the expiry time to be the minimum of the time
left on the public address or the config option PLUS the current age of the
public interface.

Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 12:35:13 -07:00
Glenn Wurster aed65501e8 IPv6: Create temporary address if none exists.
If privacy extentions are enabled, but no current temporary address exists,
then create one when we get a router advertisement.

Signed-off-by: Glenn Wurster <gwurster@scs.carleton.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 12:35:12 -07:00
Eric Dumazet ded85aa86b fib_hash: fix rcu sparse and logical errors
While fixing CONFIG_SPARSE_RCU_POINTER errors, I had to fix accesses to
fz->fz_hash for real.

-	&fz->fz_hash[fn_hash(f->fn_key, fz)]
+	rcu_dereference(fz->fz_hash) + fn_hash(f->fn_key, fz)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 11:42:39 -07:00
Eric Dumazet ebb9fed2de fib: fix fib_nl_newrule()
Some panic reports in fib_rules_lookup() show a rule could have a NULL
pointer as a next pointer in the rules_list.

This can actually happen because of a bug in fib_nl_newrule() : It
checks if current rule is the destination of unresolved gotos. (Other
rules have gotos to this about to be inserted rule)

Problem is it does the resolution of the gotos before the rule is
inserted in the rules_list (and has a valid next pointer)

Fix this by moving the rules_list insertion before the changes on gotos.

A lockless reader can not any more follow a ctarget pointer, unless
destination is ready (has a valid next pointer)

Reported-by: Oleg A. Arkhangelsky <sysoleg@yandex.ru>
Reported-by: Joe Buehler <aspam@cox.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 11:42:38 -07:00
David S. Miller 78fd9c4491 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-2.6 2010-10-26 11:32:28 -07:00
Linus Torvalds 4390110fef Merge branch 'for-2.6.37' of git://linux-nfs.org/~bfields/linux
* 'for-2.6.37' of git://linux-nfs.org/~bfields/linux: (99 commits)
  svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
  svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
  svcrpc: assume svc_delete_xprt() called only once
  svcrpc: never clear XPT_BUSY on dead xprt
  nfsd4: fix connection allocation in sequence()
  nfsd4: only require krb5 principal for NFSv4.0 callbacks
  nfsd4: move minorversion to client
  nfsd4: delay session removal till free_client
  nfsd4: separate callback change and callback probe
  nfsd4: callback program number is per-session
  nfsd4: track backchannel connections
  nfsd4: confirm only on succesful create_session
  nfsd4: make backchannel sequence number per-session
  nfsd4: use client pointer to backchannel session
  nfsd4: move callback setup into session init code
  nfsd4: don't cache seq_misordered replies
  SUNRPC: Properly initialize sock_xprt.srcaddr in all cases
  SUNRPC: Use conventional switch statement when reclassifying sockets
  sunrpc/xprtrdma: clean up workqueue usage
  sunrpc: Turn list_for_each-s into the ..._entry-s
  ...

Fix up trivial conflicts (two different deprecation notices added in
separate branches) in Documentation/feature-removal-schedule.txt
2010-10-26 09:55:25 -07:00
Linus Torvalds a4dd8dce14 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6:
  net/sunrpc: Use static const char arrays
  nfs4: fix channel attribute sanity-checks
  NFSv4.1: Use more sensible names for 'initialize_mountpoint'
  NFSv4.1: pnfs: filelayout: add driver's LAYOUTGET and GETDEVICEINFO infrastructure
  NFSv4.1: pnfs: add LAYOUTGET and GETDEVICEINFO infrastructure
  NFS: client needs to maintain list of inodes with active layouts
  NFS: create and destroy inode's layout cache
  NFSv4.1: pnfs: filelayout: introduce minimal file layout driver
  NFSv4.1: pnfs: full mount/umount infrastructure
  NFS: set layout driver
  NFS: ask for layouttypes during v4 fsinfo call
  NFS: change stateid to be a union
  NFSv4.1: pnfsd, pnfs: protocol level pnfs constants
  SUNRPC: define xdr_decode_opaque_fixed
  NFSD: remove duplicate NFS4_STATEID_SIZE
2010-10-26 09:52:09 -07:00
David S. Miller 7932c2e55c netfilter: Add missing CONFIG_SYSCTL checks in ipv6's nf_conntrack_reasm.c
Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-26 09:08:53 -07:00
Joe Perches 411b5e0561 net/sunrpc: Use static const char arrays
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2010-10-25 22:19:52 -04:00
Christoph Hellwig 85fe4025c6 fs: do not assign default i_ino in new_inode
Instead of always assigning an increasing inode number in new_inode
move the call to assign it into those callers that actually need it.
For now callers that need it is estimated conservatively, that is
the call is added to all filesystems that do not assign an i_ino
by themselves.  For a few more filesystems we can avoid assigning
any inode number given that they aren't user visible, and for others
it could be done lazily when an inode number is actually needed,
but that's left for later patches.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Dave Chinner <dchinner@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00
Al Viro 7de9c6ee3e new helper: ihold()
Clones an existing reference to inode; caller must already hold one.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:26:11 -04:00
Eric Dumazet 7e360c38ab fs: allow for more than 2^31 files
Andrew,

Could you please review this patch, you probably are the right guy to
take it, because it crosses fs and net trees.

Note : /proc/sys/fs/file-nr is a read-only file, so this patch doesnt
depend on previous patch (sysctl: fix min/max handling in
__do_proc_doulongvec_minmax())

Thanks !

[PATCH V4] fs: allow for more than 2^31 files

Robin Holt tried to boot a 16TB system and found af_unix was overflowing
a 32bit value :

<quote>

We were seeing a failure which prevented boot.  The kernel was incapable
of creating either a named pipe or unix domain socket.  This comes down
to a common kernel function called unix_create1() which does:

        atomic_inc(&unix_nr_socks);
        if (atomic_read(&unix_nr_socks) > 2 * get_max_files())
                goto out;

The function get_max_files() is a simple return of files_stat.max_files.
files_stat.max_files is a signed integer and is computed in
fs/file_table.c's files_init().

        n = (mempages * (PAGE_SIZE / 1024)) / 10;
        files_stat.max_files = n;

In our case, mempages (total_ram_pages) is approx 3,758,096,384
(0xe0000000).  That leaves max_files at approximately 1,503,238,553.
This causes 2 * get_max_files() to integer overflow.

</quote>

Fix is to let /proc/sys/fs/file-nr & /proc/sys/fs/file-max use long
integers, and change af_unix to use an atomic_long_t instead of
atomic_t.

get_max_files() is changed to return an unsigned long.
get_nr_files() is changed to return a long.

unix_nr_socks is changed from atomic_t to atomic_long_t, while not
strictly needed to address Robin problem.

Before patch (on a 64bit kernel) :
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
-18446744071562067968

After patch:
# echo 2147483648 >/proc/sys/fs/file-max
# cat /proc/sys/fs/file-max
2147483648
# cat /proc/sys/fs/file-nr
704     0       2147483648

Reported-by: Robin Holt <holt@sgi.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David Miller <davem@davemloft.net>
Reviewed-by: Robin Holt <holt@sgi.com>
Tested-by: Robin Holt <holt@sgi.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2010-10-25 21:18:20 -04:00
J. Bruce Fields 42d7ba3d6d svcrpc: svc_tcp_sendto XPT_DEAD check is redundant
The only caller (svc_send) has already checked XPT_DEAD.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-25 17:59:34 -04:00
J. Bruce Fields 01dba075d5 svcrpc: no need for XPT_DEAD check in svc_xprt_enqueue
If any xprt marked DEAD is also left BUSY for the rest of its life, then
the XPT_DEAD check here is superfluous--we'll get the same result from
the XPT_BUSY check just after.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-25 17:59:33 -04:00
J. Bruce Fields ac9303eb74 svcrpc: assume svc_delete_xprt() called only once
As long as DEAD exports are left BUSY, and svc_delete_xprt is called
only with BUSY held, then svc_delete_xprt() will never be called on an
xprt that is already DEAD.

Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-25 17:59:32 -04:00
J. Bruce Fields 7e4fdd0744 svcrpc: never clear XPT_BUSY on dead xprt
Once an xprt has been deleted, there's no reason to allow it to be
enqueued--at worst, that might cause the xprt to be re-added to some
global list, resulting in later corruption.

Also, note this leaves us with no need for the reference-count
manipulation here.

Reviewed-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2010-10-25 17:58:40 -04:00
Eric Dumazet 43a951e999 ipv4: add __rcu annotations to ip_ra_chain
Add __rcu annotations to :
        (struct ip_ra_chain)->next
	struct ip_ra_chain *ip_ra_chain;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:28 -07:00
Eric Dumazet 0d7da9ddd9 net: add __rcu annotation to sk_filter
Add __rcu annotation to :
        (struct sock)->sk_filter

And use appropriate rcu primitives to reduce sparse warnings if
CONFIG_SPARSE_RCU_POINTER=y

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:28 -07:00
Eric Dumazet 1c87733d06 net_ns: add __rcu annotations
add __rcu annotation to (struct net)->gen, and use
rcu_dereference_protected() in net_assign_generic()

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:27 -07:00
Eric Dumazet 6e3f7faf3e rps: add __rcu annotations
Add __rcu annotations to :
	(struct netdev_rx_queue)->rps_map
	(struct netdev_rx_queue)->rps_flow_table
	struct rps_sock_flow_table *rps_sock_flow_table;

And use appropriate rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 14:18:27 -07:00
KOVACS Krisztian f6318e5588 netfilter: fix module dependency issues with IPv6 defragmentation, ip6tables and xt_TPROXY
One of the previous tproxy related patches split IPv6 defragmentation and
connection tracking, but did not correctly add Kconfig stanzas to handle the
new dependencies correctly. This patch fixes that by making the config options
mirror the setup we have for IPv4: a distinct config option for defragmentation
that is automatically selected by both connection tracking and
xt_TPROXY/xt_socket.

The patch also changes the #ifdefs enclosing IPv6 specific code in xt_socket
and xt_TPROXY: we only compile these in case we have ip6tables support enabled.

Signed-off-by: KOVACS Krisztian <hidden@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:58:36 -07:00
Linus Torvalds 74eb94b218 Merge branch 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
* 'nfs-for-2.6.37' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6: (67 commits)
  SUNRPC: Cleanup duplicate assignment in rpcauth_refreshcred
  nfs: fix unchecked value
  Ask for time_delta during fsinfo probe
  Revalidate caches on lock
  SUNRPC: After calling xprt_release(), we must restart from call_reserve
  NFSv4: Fix up the 'dircount' hint in encode_readdir
  NFSv4: Clean up nfs4_decode_dirent
  NFSv4: nfs4_decode_dirent must clear entry->fattr->valid
  NFSv4: Fix a regression in decode_getfattr
  NFSv4: Fix up decode_attr_filehandle() to handle the case of empty fh pointer
  NFS: Ensure we check all allocation return values in new readdir code
  NFS: Readdir plus in v4
  NFS: introduce generic decode_getattr function
  NFS: check xdr_decode for errors
  NFS: nfs_readdir_filler catch all errors
  NFS: readdir with vmapped pages
  NFS: remove page size checking code
  NFS: decode_dirent should use an xdr_stream
  SUNRPC: Add a helper function xdr_inline_peek
  NFS: remove readdir plus limit
  ...
2010-10-25 13:48:29 -07:00
Eric Dumazet 6f0bcf1525 tunnels: add _rcu annotations
(struct ip6_tnl)->next is rcu protected :
(struct ip_tunnel)->next is rcu protected :
(struct xfrm6_tunnel)->next is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:45 -07:00
Eric Dumazet 3cc77ec74e net/802: add __rcu annotations
(struct net_device)->garp_port is rcu protected :
(struct garp_port)->applicants is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:44 -07:00
Eric Dumazet 198caeca3e ipv6: ip6_ptr rcu annotations
(struct net_device)->ip6_ptr is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:43 -07:00
Eric Dumazet b616b09afa vlan: rcu annotations
(struct net_device)->vlgrp is rcu protected :

add __rcu annotation and proper rcu primitives.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 13:09:43 -07:00
David S. Miller 11a766ce91 net: Increase xmit RECURSION_LIMIT to 10.
Three is definitely too low, and we know from reports that GRE tunnels
stacked as deeply as 37 levels cause stack overflows, so pick some
reasonable value between those two.

Signed-off-by: David S. Miller <davem@davemloft.net>
2010-10-25 12:51:55 -07:00
Rajkumar Manoharan c8716d9dc1 mac80211: Fix ibss station got expired immediately
Station addition in ieee80211_ibss_rx_queued_mgmt is not updating
sta->last_rx which is causing station expiry in ieee80211_ibss_work
path. So sta addition and deletion happens repeatedly.

CC: stable@kernel.org
Signed-off-by: Rajkumar Manoharan <rmanoharan@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2010-10-25 14:45:56 -04:00