Commit graph

24934 commits

Author SHA1 Message Date
Pablo Neira Ayuso
5901b6be88 netfilter: nf_nat: support IPv6 in IRC NAT helper
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:23 +02:00
Patrick McHardy
9a66482106 netfilter: nf_nat: support IPv6 in SIP NAT helper
Add IPv6 support to the SIP NAT helper. There are no functional differences
to IPv4 NAT, just different formats for addresses.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:22 +02:00
Patrick McHardy
ee6eb96673 netfilter: nf_nat: support IPv6 in amanda NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:21 +02:00
Patrick McHardy
d33cbeeb1a netfilter: nf_nat: support IPv6 in FTP NAT helper
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:20 +02:00
Patrick McHardy
ed72d9e294 netfilter: ip6tables: add NETMAP target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:19 +02:00
Patrick McHardy
115e23ac78 netfilter: ip6tables: add REDIRECT target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:19 +02:00
Patrick McHardy
b3f644fc82 netfilter: ip6tables: add MASQUERADE target
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:18 +02:00
Patrick McHardy
58a317f106 netfilter: ipv6: add IPv6 NAT support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:17 +02:00
Patrick McHardy
2cf545e835 net: core: add function for incremental IPv6 pseudo header checksum updates
Add inet_proto_csum_replace16 for incrementally updating IPv6 pseudo header
checksums for IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
2012-08-30 03:00:16 +02:00
Patrick McHardy
0ad352cb43 netfilter: ipv6: expand skb head in ip6_route_me_harder after oif change
Expand the skb headroom if the oif changed due to rerouting similar to
how IPv4 packets are handled.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:15 +02:00
Patrick McHardy
c7232c9979 netfilter: add protocol independent NAT core
Convert the IPv4 NAT implementation to a protocol independent core and
address family specific modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:14 +02:00
Patrick McHardy
051966c0c6 netfilter: nf_nat: add protoff argument to packet mangling functions
For mangling IPv6 packets the protocol header offset needs to be known
by the NAT packet mangling functions. Add a so far unused protoff argument
and convert the conntrack and NAT helpers to use it in preparation of
IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:13 +02:00
Patrick McHardy
811927ccfe netfilter: nf_conntrack: restrict NAT helper invocation to IPv4
The NAT helpers currently only handle IPv4 packets correctly. Restrict
invocation of the helpers to IPv4 in preparation of IPv6 NAT.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:12 +02:00
Patrick McHardy
2b60af0178 netfilter: nf_conntrack_ipv6: fix tracking of ICMPv6 error messages containing fragments
ICMPv6 error messages are tracked by extracting the conntrack tuple of
the inner packet and looking up the corresponding conntrack entry. Tuple
extraction uses the ->get_l4proto() callback, which in case of fragments
returns NEXTHDR_FRAGMENT instead of the upper protocol, even for the
first fragment when the entire next header is present, resulting in a
failure to find the correct connection tracking entry.

This patch changes ipv6_get_l4proto() to use ipv6_skip_exthdr() instead
of nf_ct_ipv6_skip_exthdr() in order to skip fragment headers when the
fragment offset is zero.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:11 +02:00
Patrick McHardy
4cdd34084d netfilter: nf_conntrack_ipv6: improve fragmentation handling
The IPv6 conntrack fragmentation currently has a couple of shortcomings.
Fragmentes are collected in PREROUTING/OUTPUT, are defragmented, the
defragmented packet is then passed to conntrack, the resulting conntrack
information is attached to each original fragment and the fragments then
continue their way through the stack.

Helper invocation occurs in the POSTROUTING hook, at which point only
the original fragments are available. The result of this is that
fragmented packets are never passed to helpers.

This patch improves the situation in the following way:

- If a reassembled packet belongs to a connection that has a helper
  assigned, the reassembled packet is passed through the stack instead
  of the original fragments.

- During defragmentation, the largest received fragment size is stored.
  On output, the packet is refragmented if required. If the largest
  received fragment size exceeds the outgoing MTU, a "packet too big"
  message is generated, thus behaving as if the original fragments
  were passed through the stack from an outside point of view.

- The ipv6_helper() hook function can't receive fragments anymore for
  connections using a helper, so it is switched to use ipv6_skip_exthdr()
  instead of the netfilter specific nf_ct_ipv6_skip_exthdr() and the
  reassembled packets are passed to connection tracking helpers.

The result of this is that we can properly track fragmented packets, but
still generate ICMPv6 Packet too big messages if we would have before.

This patch is also required as a precondition for IPv6 NAT, where NAT
helpers might enlarge packets up to a point that they require
fragmentation. In that case we can't generate Packet too big messages
since the proper MTU can't be calculated in all cases (f.i. when
changing textual representation of a variable amount of addresses),
so the packet is transparently fragmented iff the original packet or
fragments would have fit the outgoing MTU.

IPVS parts by Jesper Dangaard Brouer <brouer@redhat.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2012-08-30 03:00:10 +02:00
Jesper Dangaard Brouer
590e3f79a2 ipvs: IPv6 MTU checking cleanup and bugfix
Cleaning up the IPv6 MTU checking in the IPVS xmit code, by using
a common helper function __mtu_check_toobig_v6().

The MTU check for tunnel mode can also use this helper as
ntohs(old_iph->payload_len) + sizeof(struct ipv6hdr) is qual to
skb->len.  And the 'mtu' variable have been adjusted before
calling helper.

Notice, this also fixes a bug, as the the MTU check in ip_vs_dr_xmit_v6()
were missing a check for skb_is_gso().

This bug e.g. caused issues for KVM IPVS setups, where different
Segmentation Offloading techniques are utilized, between guests,
via the virtio driver.  This resulted in very bad performance,
due to the ICMPv6 "too big" messages didn't affect the sender.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-30 02:55:39 +02:00
Amerigo Wang
072a9c4860 netpoll: revert 6bdb7fe310 and fix be_poll() instead
Against -net.

In the patch "netpoll: re-enable irq in poll_napi()", I tried to
fix the following warning:

[100718.051041] ------------[ cut here ]------------
[100718.051048] WARNING: at kernel/softirq.c:159 local_bh_enable_ip+0x7d/0xb0()
(Not tainted)
[100718.051049] Hardware name: ProLiant BL460c G7
...
[100718.051068] Call Trace:
[100718.051073]  [<ffffffff8106b747>] ? warn_slowpath_common+0x87/0xc0
[100718.051075]  [<ffffffff8106b79a>] ? warn_slowpath_null+0x1a/0x20
[100718.051077]  [<ffffffff810747ed>] ? local_bh_enable_ip+0x7d/0xb0
[100718.051080]  [<ffffffff8150041b>] ? _spin_unlock_bh+0x1b/0x20
[100718.051085]  [<ffffffffa00ee974>] ? be_process_mcc+0x74/0x230 [be2net]
[100718.051088]  [<ffffffffa00ea68c>] ? be_poll_tx_mcc+0x16c/0x290 [be2net]
[100718.051090]  [<ffffffff8144fe76>] ? netpoll_poll_dev+0xd6/0x490
[100718.051095]  [<ffffffffa01d24a5>] ? bond_poll_controller+0x75/0x80 [bonding]
[100718.051097]  [<ffffffff8144fde5>] ? netpoll_poll_dev+0x45/0x490
[100718.051100]  [<ffffffff81161b19>] ? ksize+0x19/0x80
[100718.051102]  [<ffffffff81450437>] ? netpoll_send_skb_on_dev+0x157/0x240

by reenabling IRQ before calling ->poll, but it seems more
problems are introduced after that patch:

http://ozlabs.org/~akpm/stuff/IMG_20120824_122054.jpg
http://marc.info/?l=linux-netdev&m=134563282530588&w=2

So it is safe to fix be2net driver code directly.

This patch reverts the offending commit and fixes be_poll() by
avoid disabling BH there, this is okay because be_poll()
can be called either by poll_napi() which already disables
IRQ, or by net_rx_action() which already disables BH.

Reported-by: Andrew Morton <akpm@linux-foundation.org>
Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: Sylvain Munaut <s.munaut@whatever-company.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Miller <davem@davemloft.net>
Cc: Sathya Perla <sathya.perla@emulex.com>
Cc: Subbu Seetharaman <subbu.seetharaman@emulex.com>
Cc: Ajit Khaparde <ajit.khaparde@emulex.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-29 15:03:23 -04:00
Vinicius Costa Gomes
d8343f1257 Bluetooth: Fix sending a HCI Authorization Request over LE links
In the case that the link is already in the connected state and a
Pairing request arrives from the mgmt interface, hci_conn_security()
would be called but it was not considering LE links.

Reported-by: João Paulo Rechi Vita <jprvita@openbossa.org>
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-27 08:11:51 -07:00
Vinicius Costa Gomes
cc110922da Bluetooth: Change signature of smp_conn_security()
To make it clear that it may be called from contexts that may not have
any knowledge of L2CAP, we change the connection parameter, to receive
a hci_conn.

This also makes it clear that it is checking the security of the link.

Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-27 08:07:18 -07:00
Patrick McHardy
5f2d04f1f9 ipv4: fix path MTU discovery with connection tracking
IPv4 conntrack defragments incoming packet at the PRE_ROUTING hook and
(in case of forwarded packets) refragments them at POST_ROUTING
independent of the IP_DF flag. Refragmentation uses the dst_mtu() of
the local route without caring about the original fragment sizes,
thereby breaking PMTUD.

This patch fixes this by keeping track of the largest received fragment
with IP_DF set and generates an ICMP fragmentation required error during
refragmentation if that size exceeds the MTU.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
2012-08-26 19:13:55 +02:00
Linus Torvalds
8497ae61d0 Merge branch 'for-3.6' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from J. Bruce Fields:
 "Particular thanks to Michael Tokarev, Malahal Naineni, and Jamie
  Heilman for their testing and debugging help."

* 'for-3.6' of git://linux-nfs.org/~bfields/linux:
  svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
  svcrpc: sends on closed socket should stop immediately
  svcrpc: fix BUG() in svc_tcp_clear_pages
  nfsd4: fix security flavor of NFSv4.0 callback
2012-08-25 11:43:41 -07:00
David S. Miller
e6acb38480 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
This is an initial merge in of Eric Biederman's work to start adding
user namespace support to the networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 18:54:37 -04:00
David S. Miller
85c21049fc Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This is a batch of updates intended for 3.7.  The bulk of it is
mac80211 changes, including some mesh work from Thomas Pederson and
some multi-channel work from Johannes.  A variety of driver updates
and other bits are scattered in there as well.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 15:18:07 -04:00
David S. Miller
d05cebb915 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
This batch of fixes is intended for 3.6...

Johannes Berg gives us a pair of iwlwifi fixes.  One corrects some
improperly defined ifdefs that lead to crashes and BUG_ONs.  The other
prevents attempts to read SRAM for devices that aren't actually started.

Julia Lawall provides an ipw2100 fix to properly set the return code
from a function call before testing it! :-)

Thomas Huehn corrects the improper use of a constant related to a power
setting in ath5k.

Thomas Pedersen offers a mac80211 fix to properly handle destination
addresses of unicast frames passing though a mesh gate.

Vladimir Zapolskiy provides a brcmsmac fix to properly mark the
interface state when the device goes down.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 15:15:10 -04:00
Yuchung Cheng
7c4a56fec3 tcp: fix cwnd reduction for non-sack recovery
The cwnd reduction in fast recovery is based on the number of packets
newly delivered per ACK. For non-sack connections every DUPACK
signifies a packet has been delivered, but the sender mistakenly
skips counting them for cwnd reduction.

The fix is to compute newly_acked_sacked after DUPACKs are accounted
in sacked_out for non-sack connections.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Acked-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:48:58 -04:00
Jiri Pirko
9b361c13ce vlan: add helper which can be called to see if device is used by vlan
also, remove unused vlan_info definition from header

CC: Patrick McHardy <kaber@trash.net>
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:46:39 -04:00
Pablo Neira Ayuso
20e1db19db netlink: fix possible spoofing from non-root processes
Non-root user-space processes can send Netlink messages to other
processes that are well-known for being subscribed to Netlink
asynchronous notifications. This allows ilegitimate non-root
process to send forged messages to Netlink subscribers.

The userspace process usually verifies the legitimate origin in
two ways:

a) Socket credentials. If UID != 0, then the message comes from
   some ilegitimate process and the message needs to be dropped.

b) Netlink portID. In general, portID == 0 means that the origin
   of the messages comes from the kernel. Thus, discarding any
   message not coming from the kernel.

However, ctnetlink sets the portID in event messages that has
been triggered by some user-space process, eg. conntrack utility.
So other processes subscribed to ctnetlink events, eg. conntrackd,
know that the event was triggered by some user-space action.

Neither of the two ways to discard ilegitimate messages coming
from non-root processes can help for ctnetlink.

This patch adds capability validation in case that dst_pid is set
in netlink_sendmsg(). This approach is aggressive since existing
applications using any Netlink bus to deliver messages between
two user-space processes will break. Note that the exception is
NETLINK_USERSOCK, since it is reserved for netlink-to-netlink
userspace communication.

Still, if anyone wants that his Netlink bus allows netlink-to-netlink
userspace, then they can set NL_NONROOT_SEND. However, by default,
I don't think it makes sense to allow to use NETLINK_ROUTE to
communicate two processes that are sending no matter what information
that is not related to link/neighbouring/routing. They should be using
NETLINK_USERSOCK instead for that.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 13:36:09 -04:00
Ben Hutchings
8f4cccbbd9 net: Set device operstate at registration time
The operstate of a device is initially IF_OPER_UNKNOWN and is updated
asynchronously by linkwatch after each change of carrier state
reported by the driver.  The default carrier state of a net device is
on, and this will never be changed on drivers that do not support
carrier detection, thus the operstate remains IF_OPER_UNKNOWN.

For devices that do support carrier detection, the driver must set the
carrier state to off initially, then poll the hardware state when the
device is opened.  However, we must not activate linkwatch for a
unregistered device, and commit b473001 ('net: Do not fire linkwatch
events until the device is registered.') ensured that we don't.  But
this means that the operstate for many devices that support carrier
detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.

The same issue exists with the dormant state.

The proper initialisation sequence, avoiding a race with opening of
the device, is:

        rtnl_lock();
        rc = register_netdevice(dev);
        if (rc)
                goto out_unlock;
        netif_carrier_off(dev); /* or netif_dormant_on(dev) */
        rtnl_unlock();

but it seems silly that this should have to be repeated in so many
drivers.  Further, the operstate seen immediately after opening the
device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
linkwatch.

Commit 22604c8 ('net: Fix for initial link state in 2.6.28') attempted
to fix this by setting the operstate synchronously, but it was
reverted as it could lead to deadlock.

This initialises the operstate synchronously at registration time
only.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 12:46:13 -04:00
Neil Horman
3afa6d00fb cls_cgroup: Allow classifier cgroups to have their classid reset to 0
The network classifier cgroup initalizes each cgroups instance classid value to
0.  However, the sock_update_classid function only updates classid's in sockets
if the tasks cgroup classid is not zero, and if it differs from the current
classid.  The later check is to prevent cache line dirtying, but the former is
detrimental, as it prevents resetting a classid for a cgroup to 0.  While this
is not a common action, it has administrative usefulness (if the admin wants to
disable classification of a certain group temporarily for instance).

Easy fix, just remove the zero check.  Tested successfully by myself

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 12:41:17 -04:00
John W. Linville
f20b6213f1 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-08-24 12:25:30 -04:00
Eric Dumazet
78df76a065 ipv4: take rt_uncached_lock only if needed
Multicast traffic allocates dst with DST_NOCACHE, but dst is
not inserted into rt_uncached_list.

This slowdown multicast workloads on SMP because rt_uncached_lock is
contended.

Change the test before taking the lock to actually check the dst
was inserted into rt_uncached_list.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 11:47:48 -04:00
David S. Miller
e6e94e392f Merge tag 'batman-adv-for-davem' of git://git.open-mesh.org/linux-merge
Antonio Quartulli says:

====================
Included changes:
- a set of codestyle rearrangements/fixes
- new feature to early detect new joining (mesh-unaware) clients
- a minor fix for the gw-feature
- substitution of shift operations with the BIT() macro
- reorganization of the main batman-adv structure (struct batadv_priv)
- some more (very) minor cleanups and fixes
===================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-24 11:30:50 -04:00
John W. Linville
e72615f6ab Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2012-08-24 11:16:58 -04:00
Fengguang Wu
a0dfb2634e af_packet: match_fanout_group() can be static
cc: Eric Leblond <eric@regit.org>
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:27:12 -07:00
Eric Dumazet
748e2d9396 net: reinstate rtnl in call_netdevice_notifiers()
Eric Biederman pointed out that not holding RTNL while calling
call_netdevice_notifiers() was racy.

This patch is a direct transcription his feedback
against commit 0115e8e30d (net: remove delay at device dismantle)

Thanks Eric !

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-23 09:24:42 -07:00
John W. Linville
c6b6eedc29 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211 2012-08-23 09:51:15 -04:00
John W. Linville
a4881cc45a Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next 2012-08-23 09:49:42 -04:00
Sven Eckelmann
fa4f0afcf4 batman-adv: Start new development cycle
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:23 +02:00
Antonio Quartulli
371351731e batman-adv: change interface_rx to get orig node
In order to understand  where a broadcast packet is coming from and use
this information to detect not yet announced clients, this patch modifies the
interface_rx() function by passing a new argument: the orig node
corresponding to the node that originated the received packet (if known).
This new argument if not NULL for broadcast packets only (other packets does not
have source field).

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:22 +02:00
Antonio Quartulli
30cfd02b60 batman-adv: detect not yet announced clients
With the current TT mechanism a new client joining the network is not
immediately able to communicate with other hosts because its MAC address has not
been announced yet. This situation holds until the first OGM containing its
joining event will be spread over the mesh network.

This behaviour can be acceptable in networks where the originator interval is a
small value (e.g. 1sec) but if that value is set to an higher time (e.g. 5secs)
the client could suffer from several malfunctions like DHCP client timeouts,
etc.

This patch adds an early detection mechanism that makes nodes in the network
able to recognise "not yet announced clients" by means of the broadcast packets
they emitted on connection (e.g. ARP or DHCP request). The added client will
then be confirmed upon receiving the OGM claiming it or purged if such OGM
is not received within a fixed amount of time.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:22 +02:00
Sven Eckelmann
c67893d17a batman-adv: Reduce accumulated length of simple statements
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:21 +02:00
Sven Eckelmann
bbb1f90efb batman-adv: Don't break statements after assignment operator
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:20 +02:00
Sven Eckelmann
8de47de575 batman-adv: Use BIT(x) macro to calculate bit positions
Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:19 +02:00
Martin Hundebøll
74ee3634dc batman-adv: Drop tt queries with foreign dest
When enabling promiscuous mode, tt queries for other hosts might be
received. Before this patch, "foreign" tt queries were processed like
any other query and thus forwarded to its destination again and thereby
causing a loop.

This patch adds a check to drop foreign tt queries.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:19 +02:00
Martin Hundebøll
ff51fd70ad batman-adv: Move batadv_check_unicast_packet()
batadv_check_unicast_packet() is needed in batadv_recv_tt_query(), so
move the former to before the latter.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:18 +02:00
Sven Eckelmann
807736f6e0 batman-adv: Split batadv_priv in sub-structures for features
The structure batadv_priv grows everytime a new feature is introduced. It gets
hard to find the parts of the struct that belongs to a specific feature. This
becomes even harder by the fact that not every feature uses a prefix in the
member name.

The variables for bridge loop avoidence, gateway handling, translation table
and visualization server are moved into separate structs that are included in
the bat_priv main struct.

Signed-off-by: Sven Eckelmann <sven@narfation.org>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:20:13 +02:00
Simon Wunderlich
624463079e batman-adv: check batadv_orig_hash_add_if() return code
If this call fails, some of the orig_nodes spaces may have been
resized for the increased number of interface, and some may not.
If we would just continue with the larger number of interfaces,
this would lead to access to not allocated memory later.

We better check the return code, and don't add the interface if
no memory is available. OTOH, keeping some of the orig_nodes
with too much memory allocated should hurt no one (except for
a few too many bytes allocated).

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:46 +02:00
Antonio Quartulli
a51fb9b2ac batman-adv: fix typos in comments
the word millisecond is misspelled in several comments. This patch fixes it.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:45 +02:00
Antonio Quartulli
d657e621a0 batman-adv: add reference counting for type batadv_tt_orig_list_entry
The batadv_tt_orig_list_entry structure didn't have any refcounting mechanism so
far. This patch introduces it and makes the structure being usable in much more
complex context.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:44 +02:00
Jonathan Corbet
3a7f291bf6 batman-adv: remove a misleading comment
As much as I'm happy to see LWN links sprinkled through the kernel by the
dozen, this one in particular reflects a very old state of reality; the
associated comment is now incorrect.  So just delete it.

Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:44 +02:00
Marek Lindner
1c9b0550f4 batman-adv: convert remaining packet counters to per_cpu_ptr() infrastructure
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Acked-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:43 +02:00
Simon Wunderlich
3eb8773e3a batman-adv: rename bridge loop avoidance claim types
for consistency reasons within the code and with the documentation,
we should always call it "claim" and "unclaim".

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:42 +02:00
Simon Wunderlich
99e966fc96 batman-adv: correct comments in bridge loop avoidance
Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:41 +02:00
Simon Wunderlich
536a23f119 batman-adv: Add the backbone gateway list to debugfs
This is especially useful if there are no claims yet, but we still want
to know which gateways are using bridge loop avoidance in the network.

Signed-off-by: Simon Wunderlich <siwu@hrz.tu-chemnitz.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:41 +02:00
Antonio Quartulli
c70437289c batman-adv: move function arguments on one line
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2012-08-23 14:02:40 +02:00
Pavel Emelyanov
0fa7fa98db packet: Protect packet sk list with mutex (v2)
Change since v1:

* Fixed inuse counters access spotted by Eric

In patch eea68e2f (packet: Report socket mclist info via diag module) I've
introduced a "scheduling in atomic" problem in packet diag module -- the
socket list is traversed under rcu_read_lock() while performed under it sk
mclist access requires rtnl lock (i.e. -- mutex) to be taken.

[152363.820563] BUG: scheduling while atomic: crtools/12517/0x10000002
[152363.820573] 4 locks held by crtools/12517:
[152363.820581]  #0:  (sock_diag_mutex){+.+.+.}, at: [<ffffffff81a2dcb5>] sock_diag_rcv+0x1f/0x3e
[152363.820613]  #1:  (sock_diag_table_mutex){+.+.+.}, at: [<ffffffff81a2de70>] sock_diag_rcv_msg+0xdb/0x11a
[152363.820644]  #2:  (nlk->cb_mutex){+.+.+.}, at: [<ffffffff81a67d01>] netlink_dump+0x23/0x1ab
[152363.820693]  #3:  (rcu_read_lock){.+.+..}, at: [<ffffffff81b6a049>] packet_diag_dump+0x0/0x1af

Similar thing was then re-introduced by further packet diag patches (fanount
mutex and pgvec mutex for rings) :(

Apart from being terribly sorry for the above, I propose to change the packet
sk list protection from spinlock to mutex. This lock currently protects two
modifications:

* sklist
* prot inuse counters

The sklist modifications can be just reprotected with mutex since they already
occur in a sleeping context. The inuse counters modifications are trickier -- the
__this_cpu_-s are used inside, thus requiring the caller to handle the potential
issues with contexts himself. Since packet sockets' counters are modified in two
places only (packet_create and packet_release) we only need to protect the context
from being preempted. BH disabling is not required in this case.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
danborkmann@iogearbox.net
9e67030af3 af_packet: use define instead of constant
Instead of using a hard-coded value for the status variable, it would make
the code more readable to use its destined define from linux/if_packet.h.

Signed-off-by: daniel.borkmann@tik.ee.ethz.ch
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:58:27 -07:00
Ying Xue
bfdc587c5a rds: Don't disable BH on BH context
Since we have already in BH context when *_write_space(),
*_data_ready() as well as *_state_change() are called, it's
unnecessary to disable BH.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:52:04 -07:00
Eric Dumazet
b87fb39e39 ipv6: gre: fix ip6gre_err()
ip6gre_err() miscomputes grehlen (sizeof(ipv6h) is 4 or 8,
not 40 as expected), and should take into account 'offset' parameter.

Also uses pskb_may_pull() to cope with some fragged skbs

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Dmitry Kozlov <xeb@mail.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:48:32 -07:00
Eric Dumazet
ef8531b64c xfrm: fix RCU bugs
This patch reverts commit 56892261ed (xfrm: Use rcu_dereference_bh to
deference pointer protected by rcu_read_lock_bh), and fixes bugs
introduced in commit 418a99ac6a ( Replace rwlock on xfrm_policy_afinfo
with rcu )

1) We properly use RCU variant in this file, not a mix of RCU/RCU_BH

2) We must defer some writes after the synchronize_rcu() call or a reader
 can crash dereferencing NULL pointer.

3) Now we use the xfrm_policy_afinfo_lock spinlock only from process
context, we no longer need to block BH in xfrm_policy_register_afinfo()
and xfrm_policy_unregister_afinfo()

4) Can use RCU_INIT_POINTER() instead of rcu_assign_pointer() in
xfrm_policy_unregister_afinfo()

5) Remove a forward inline declaration (xfrm_policy_put_afinfo()),
  and also move xfrm_policy_get_afinfo() declaration.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Fan Du <fan.du@windriver.com>
Cc: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 22:39:46 -07:00
Eric Dumazet
0115e8e30d net: remove delay at device dismantle
I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.

These call_rcu() were posted by rt_free(struct rtable *rt) calls.

We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.

As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.

To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.

Use NETDEV_UNREGISTER_FINAL with care !

Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.

With help from Gao feng

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 21:50:36 -07:00
Eric Dumazet
9b04f35005 ipv4: properly update pmtu
Sylvain Munault reported following info :

 - TCP connection get "stuck" with data in send queue when doing
   "large" transfers ( like typing 'ps ax' on a ssh connection )
 - Only happens on path where the PMTU is lower than the MTU of
   the interface
 - Is not present right after boot, it only appears 10-20min after
   boot or so. (and that's inside the _same_ TCP connection, it works
   fine at first and then in the same ssh session, it'll get stuck)
 - Definitely seems related to fragments somehow since I see a router
   sending ICMP message saying fragmentation is needed.
 - Exact same setup works fine with kernel 3.5.1

Problem happens when the 10 minutes (ip_rt_mtu_expires) expiration
period is over.

ip_rt_update_pmtu() calls dst_set_expires() to rearm a new expiration,
but dst_set_expires() does nothing because dst.expires is already set.

It seems we want to set the expires field to a new value, regardless
of prior one.

With help from Julian Anastasov.

Reported-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
CC: Julian Anastasov <ja@ssi.bg>
Tested-by: Sylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 19:14:30 -07:00
David S. Miller
bf277b0cce Merge git://1984.lsi.us.es/nf-next
Pablo Neira Ayuso says:

====================
This is the first batch of Netfilter and IPVS updates for your
net-next tree. Mostly cleanups for the Netfilter side. They are:

* Remove unnecessary RTNL locking now that we have support
  for namespace in nf_conntrack, from Patrick McHardy.

* Cleanup to eliminate unnecessary goto in the initialization
  path of several Netfilter tables, from Jean Sacren.

* Another cleanup from Wu Fengguang, this time to PTR_RET instead
  of if IS_ERR then return PTR_ERR.

* Use list_for_each_entry_continue_rcu in nf_iterate, from
  Michael Wang.

* Add pmtu_disc sysctl option to disable PMTU in their tunneling
  transmitter, from Julian Anastasov.

* Generalize application protocol registration in IPVS and modify
  IPVS FTP helper to use it, from Julian Anastasov.

* update Kconfig. The IPVS FTP helper depends on the Netfilter FTP
  helper for NAT support, from Julian Anastasov.

* Add logic to update PMTU for IPIP packets in IPVS, again
  from Julian Anastasov.

* A couple of sparse warning fixes for IPVS and Netfilter from
  Claudiu Ghioc and Patrick McHardy respectively.

Patrick's IPv6 NAT changes will follow after this batch, I need
to flush this batch first before refreshing my tree.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-22 18:48:52 -07:00
Pravin B Shelar
46df7b8145 openvswitch: Add support for network namespaces.
Following patch adds support for network namespace to openvswitch.
Since it must release devices when namespaces are destroyed, a
side effect of this patch is that the module no longer keeps a
refcount but instead cleans up any state when it is unloaded.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Jesse Gross <jesse@nicira.com>
2012-08-22 14:48:55 -07:00
David S. Miller
1304a7343b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-08-22 14:21:38 -07:00
Jean Sacren
90efbed18a netfilter: remove unnecessary goto statement for error recovery
Usually it's a good practice to use goto statement for error recovery
when initializing the module. This approach could be an overkill if:

 1) there is only one fail case;
 2) success and failure use the same return statement.

For a cleaner approach, remove the unnecessary goto statement and
directly implement error recovery.

Signed-off-by: Jean Sacren <sakiwit@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-22 19:17:38 +02:00
Michael Wang
6705e86724 netfilter: replace list_for_each_continue_rcu with new interface
This patch replaces list_for_each_continue_rcu() with
list_for_each_entry_continue_rcu() to allow removing
list_for_each_continue_rcu().

Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-22 19:17:20 +02:00
Linus Torvalds
f753c4ec15 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull ceph fixes from Sage Weil:
 "Jim's fix closes a narrow race introduced with the msgr changes.  One
  fix resolves problems with debugfs initialization that Yan found when
  multiple client instances are created (e.g., two clusters mounted, or
  rbd + cephfs), another one fixes problems with mounting a nonexistent
  server subdirectory, and the last one fixes a divide by zero error
  from unsanitized ioctl input that Dan Carpenter found."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  ceph: avoid divide by zero in __validate_layout()
  libceph: avoid truncation due to racing banners
  ceph: tolerate (and warn on) extraneous dentry from mds
  libceph: delay debugfs initialization until we learn global_id
2012-08-22 09:58:05 -07:00
Sujith Manoharan
aba4e6fff8 mac80211: Fix AP mode regression
Commit mac80211: avoid using synchronize_rcu in ieee80211_set_probe_resp
changed the return value when the probe response template is not present.
Revert to the earlier value of 1 - this fixes AP mode for drivers like
ath9k.

Signed-off-by: Sujith Manoharan <c_manoha@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-22 10:55:04 +02:00
Thomas Pedersen
27f011243a mac80211: fix DS to MBSS address translation
The destination address of unicast frames forwarded through a mesh gate
was being replaced with the broadcast address. Instead leave the
original destination address as the mesh DA. If the nexthop address is
not in the mpath table it will be resolved. If that fails, the frame
will be forwarded to known mesh gates.

Reported-by: Cedric Voncken <cedric.voncken@acksys.fr>
Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-22 09:45:05 +02:00
Jim Schutt
6d4221b537 libceph: avoid truncation due to racing banners
Because the Ceph client messenger uses a non-blocking connect, it is
possible for the sending of the client banner to race with the
arrival of the banner sent by the peer.

When ceph_sock_state_change() notices the connect has completed, it
schedules work to process the socket via con_work().  During this
time the peer is writing its banner, and arrival of the peer banner
races with con_work().

If con_work() calls try_read() before the peer banner arrives, there
is nothing for it to do, after which con_work() calls try_write() to
send the client's banner.  In this case Ceph's protocol negotiation
can complete succesfully.

The server-side messenger immediately sends its banner and addresses
after accepting a connect request, *before* actually attempting to
read or verify the banner from the client.  As a result, it is
possible for the banner from the server to arrive before con_work()
calls try_read().  If that happens, try_read() will read the banner
and prepare protocol negotiation info via prepare_write_connect().
prepare_write_connect() calls con_out_kvec_reset(), which discards
the as-yet-unsent client banner.  Next, con_work() calls
try_write(), which sends the protocol negotiation info rather than
the banner that the peer is expecting.

The result is that the peer sees an invalid banner, and the client
reports "negotiation failed".

Fix this by moving con_out_kvec_reset() out of
prepare_write_connect() to its callers at all locations except the
one where the banner might still need to be sent.

[elder@inktak.com: added note about server-side behavior]

Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@inktank.com>
2012-08-21 15:55:27 -07:00
Eric Dumazet
e0e3cea46d af_netlink: force credentials passing [CVE-2012-3520]
Pablo Neira Ayuso discovered that avahi and
potentially NetworkManager accept spoofed Netlink messages because of a
kernel bug.  The kernel passes all-zero SCM_CREDENTIALS ancillary data
to the receiver if the sender did not provide such data, instead of not
including any such data at all or including the correct data from the
peer (as it is the case with AF_UNIX).

This bug was introduced in commit 16e5726269
(af_unix: dont send SCM_CREDENTIALS by default)

This patch forces passing credentials for netlink, as
before the regression.

Another fix would be to not add SCM_CREDENTIALS in
netlink messages if not provided by the sender, but it
might break some programs.

With help from Florian Weimer & Petr Matousek

This issue is designated as CVE-2012-3520

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Florian Weimer <fweimer@redhat.com>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:53:01 -07:00
Eric Dumazet
a9915a1b52 ipv4: fix ip header ident selection in __ip_make_skb()
Christian Casteyde reported a kmemcheck 32-bit read from uninitialized
memory in __ip_select_ident().

It turns out that __ip_make_skb() called ip_select_ident() before
properly initializing iph->daddr.

This is a bug uncovered by commit 1d861aa4b3 (inet: Minimize use of
cached route inetpeer.)

Addresses https://bugzilla.kernel.org/show_bug.cgi?id=46131

Reported-by: Christian Casteyde <casteyde.christian@free.fr>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:51:06 -07:00
Christoph Paasch
1a7b27c97c ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
Since 0e73441992 ("ipv4: Use inet_csk_route_child_sock() in DCCP and
TCP."), inet_csk_route_child_sock() is called instead of
inet_csk_route_req().

However, after creating the child-sock in tcp/dccp_v4_syn_recv_sock(),
ireq->opt is set to NULL, before calling inet_csk_route_child_sock().
Thus, inside inet_csk_route_child_sock() opt is always NULL and the
SRR-options are not respected anymore.
Packets sent by the server won't have the correct destination-IP.

This patch fixes it by accessing newinet->inet_opt instead of ireq->opt
inside inet_csk_route_child_sock().

Reported-by: Luca Boccassi <luca.boccassi@gmail.com>
Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:49:11 -07:00
Eric Dumazet
144d56e910 tcp: fix possible socket refcount problem
Commit 6f458dfb40 (tcp: improve latencies of timer triggered events)
added bug leading to following trace :

[ 2866.131281] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.131726]
[ 2866.132188] =========================
[ 2866.132281] [ BUG: held lock freed! ]
[ 2866.132281] 3.6.0-rc1+ #622 Not tainted
[ 2866.132281] -------------------------
[ 2866.132281] kworker/0:1/652 is freeing memory ffff880019ec0000-ffff880019ec0a1f, with a lock still held there!
[ 2866.132281]  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
[ 2866.132281] 4 locks held by kworker/0:1/652:
[ 2866.132281]  #0:  (rpciod){.+.+.+}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
[ 2866.132281]  #1:  ((&task->u.tk_work)){+.+.+.}, at: [<ffffffff81083567>] process_one_work+0x1de/0x47f
[ 2866.132281]  #2:  (sk_lock-AF_INET-RPC){+.+...}, at: [<ffffffff81903619>] tcp_sendmsg+0x29/0xcc6
[ 2866.132281]  #3:  (&icsk->icsk_retransmit_timer){+.-...}, at: [<ffffffff81078017>] run_timer_softirq+0x1ad/0x35f
[ 2866.132281]
[ 2866.132281] stack backtrace:
[ 2866.132281] Pid: 652, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #622
[ 2866.132281] Call Trace:
[ 2866.132281]  <IRQ>  [<ffffffff810bc527>] debug_check_no_locks_freed+0x112/0x159
[ 2866.132281]  [<ffffffff818a0839>] ? __sk_free+0xfd/0x114
[ 2866.132281]  [<ffffffff811549fa>] kmem_cache_free+0x6b/0x13a
[ 2866.132281]  [<ffffffff818a0839>] __sk_free+0xfd/0x114
[ 2866.132281]  [<ffffffff818a08c0>] sk_free+0x1c/0x1e
[ 2866.132281]  [<ffffffff81911e1c>] tcp_write_timer+0x51/0x56
[ 2866.132281]  [<ffffffff81078082>] run_timer_softirq+0x218/0x35f
[ 2866.132281]  [<ffffffff81078017>] ? run_timer_softirq+0x1ad/0x35f
[ 2866.132281]  [<ffffffff810f5831>] ? rb_commit+0x58/0x85
[ 2866.132281]  [<ffffffff81911dcb>] ? tcp_write_timer_handler+0x148/0x148
[ 2866.132281]  [<ffffffff81070bd6>] __do_softirq+0xcb/0x1f9
[ 2866.132281]  [<ffffffff81a0a00c>] ? _raw_spin_unlock+0x29/0x2e
[ 2866.132281]  [<ffffffff81a1227c>] call_softirq+0x1c/0x30
[ 2866.132281]  [<ffffffff81039f38>] do_softirq+0x4a/0xa6
[ 2866.132281]  [<ffffffff81070f2b>] irq_exit+0x51/0xad
[ 2866.132281]  [<ffffffff81a129cd>] do_IRQ+0x9d/0xb4
[ 2866.132281]  [<ffffffff81a0a3ef>] common_interrupt+0x6f/0x6f
[ 2866.132281]  <EOI>  [<ffffffff8109d006>] ? sched_clock_cpu+0x58/0xd1
[ 2866.132281]  [<ffffffff81a0a172>] ? _raw_spin_unlock_irqrestore+0x4c/0x56
[ 2866.132281]  [<ffffffff81078692>] mod_timer+0x178/0x1a9
[ 2866.132281]  [<ffffffff818a00aa>] sk_reset_timer+0x19/0x26
[ 2866.132281]  [<ffffffff8190b2cc>] tcp_rearm_rto+0x99/0xa4
[ 2866.132281]  [<ffffffff8190dfba>] tcp_event_new_data_sent+0x6e/0x70
[ 2866.132281]  [<ffffffff8190f7ea>] tcp_write_xmit+0x7de/0x8e4
[ 2866.132281]  [<ffffffff818a565d>] ? __alloc_skb+0xa0/0x1a1
[ 2866.132281]  [<ffffffff8190f952>] __tcp_push_pending_frames+0x2e/0x8a
[ 2866.132281]  [<ffffffff81904122>] tcp_sendmsg+0xb32/0xcc6
[ 2866.132281]  [<ffffffff819229c2>] inet_sendmsg+0xaa/0xd5
[ 2866.132281]  [<ffffffff81922918>] ? inet_autobind+0x5f/0x5f
[ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [<ffffffff8189adab>] sock_sendmsg+0xa3/0xc4
[ 2866.132281]  [<ffffffff810f5de6>] ? rb_reserve_next_event+0x26f/0x2d5
[ 2866.132281]  [<ffffffff8103e6a9>] ? native_sched_clock+0x29/0x6f
[ 2866.132281]  [<ffffffff8103e6f8>] ? sched_clock+0x9/0xd
[ 2866.132281]  [<ffffffff810ee7f1>] ? trace_clock_local+0x9/0xb
[ 2866.132281]  [<ffffffff8189ae03>] kernel_sendmsg+0x37/0x43
[ 2866.132281]  [<ffffffff8199ce49>] xs_send_kvec+0x77/0x80
[ 2866.132281]  [<ffffffff8199cec1>] xs_sendpages+0x6f/0x1a0
[ 2866.132281]  [<ffffffff8107826d>] ? try_to_del_timer_sync+0x55/0x61
[ 2866.132281]  [<ffffffff8199d0d2>] xs_tcp_send_request+0x55/0xf1
[ 2866.132281]  [<ffffffff8199bb90>] xprt_transmit+0x89/0x1db
[ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
[ 2866.132281]  [<ffffffff81999d92>] call_transmit+0x1c5/0x20e
[ 2866.132281]  [<ffffffff819a0d55>] __rpc_execute+0x6f/0x225
[ 2866.132281]  [<ffffffff81999bcd>] ? call_connect+0x3c/0x3c
[ 2866.132281]  [<ffffffff819a0f33>] rpc_async_schedule+0x28/0x34
[ 2866.132281]  [<ffffffff810835d6>] process_one_work+0x24d/0x47f
[ 2866.132281]  [<ffffffff81083567>] ? process_one_work+0x1de/0x47f
[ 2866.132281]  [<ffffffff819a0f0b>] ? __rpc_execute+0x225/0x225
[ 2866.132281]  [<ffffffff81083a6d>] worker_thread+0x236/0x317
[ 2866.132281]  [<ffffffff81083837>] ? process_scheduled_works+0x2f/0x2f
[ 2866.132281]  [<ffffffff8108b7b8>] kthread+0x9a/0xa2
[ 2866.132281]  [<ffffffff81a12184>] kernel_thread_helper+0x4/0x10
[ 2866.132281]  [<ffffffff81a0a4b0>] ? retint_restore_args+0x13/0x13
[ 2866.132281]  [<ffffffff8108b71e>] ? __init_kthread_worker+0x5a/0x5a
[ 2866.132281]  [<ffffffff81a12180>] ? gs_change+0x13/0x13
[ 2866.308506] IPv4: Attempt to release TCP socket in state 1 ffff880019ec0000
[ 2866.309689] =============================================================================
[ 2866.310254] BUG TCP (Not tainted): Object already free
[ 2866.310254] -----------------------------------------------------------------------------
[ 2866.310254]

The bug comes from the fact that timer set in sk_reset_timer() can run
before we actually do the sock_hold(). socket refcount reaches zero and
we free the socket too soon.

timer handler is not allowed to reduce socket refcnt if socket is owned
by the user, or we need to change sk_reset_timer() implementation.

We should take a reference on the socket in case TCP_DELACK_TIMER_DEFERRED
or TCP_DELACK_TIMER_DEFERRED bit are set in tsq_flags

Also fix a typo in tcp_delack_timer(), where TCP_WRITE_TIMER_DEFERRED
was used instead of TCP_DELACK_TIMER_DEFERRED.

For consistency, use same socket refcount change for TCP_MTU_REDUCED_DEFERRED,
even if not fired from a timer.

Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Tested-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-21 14:42:23 -07:00
John W. Linville
01e17dacd4 Merge branch 'for-john' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Conflicts:
	drivers/net/wireless/mac80211_hwsim.c
2012-08-21 16:00:21 -04:00
AceLan Kao
06d7de831d Revert "rfkill: remove dead code"
This reverts commit 2e48928d8a.

Those functions are needed and should not be removed, or
there is no way to set the rfkill led trigger name.

Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-21 20:50:25 +02:00
J. Bruce Fields
d10f27a750 svcrpc: fix svc_xprt_enqueue/svc_recv busy-looping
The rpc server tries to ensure that there will be room to send a reply
before it receives a request.

It does this by tracking, in xpt_reserved, an upper bound on the total
size of the replies that is has already committed to for the socket.

Currently it is adding in the estimate for a new reply *before* it
checks whether there is space available.  If it finds that there is not
space, it then subtracts the estimate back out.

This may lead the subsequent svc_xprt_enqueue to decide that there is
space after all.

The results is a svc_recv() that will repeatedly return -EAGAIN, causing
server threads to loop without doing any actual work.

Cc: stable@vger.kernel.org
Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:39:19 -04:00
J. Bruce Fields
f06f00a24d svcrpc: sends on closed socket should stop immediately
svc_tcp_sendto sets XPT_CLOSE if we fail to transmit the entire reply.
However, the XPT_CLOSE won't be acted on immediately.  Meanwhile other
threads could send further replies before the socket is really shut
down.  This can manifest as data corruption: for example, if a truncated
read reply is followed by another rpc reply, that second reply will look
to the client like further read data.

Symptoms were data corruption preceded by svc_tcp_sendto logging
something like

	kernel: rpc-srv/tcp: nfsd: sent only 963696 when sending 1048708 bytes - shutting down socket

Cc: stable@vger.kernel.org
Reported-by: Malahal Naineni <malahal@us.ibm.com>
Tested-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:59 -04:00
J. Bruce Fields
be1e44441a svcrpc: fix BUG() in svc_tcp_clear_pages
Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
sk_tcplen would lead us to expect n pages of data).

svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
This is OK, since both functions are serialized by XPT_BUSY.  However,
that means the inconsistency must be repaired before dropping XPT_BUSY.

Therefore we should be ensuring that svc_tcp_save_pages repairs the
problem before exiting svc_tcp_recv_record on error.

Symptoms were a BUG() in svc_tcp_clear_pages.

Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2012-08-20 18:38:44 -04:00
Sage Weil
d1c338a509 libceph: delay debugfs initialization until we learn global_id
The debugfs directory includes the cluster fsid and our unique global_id.
We need to delay the initialization of the debug entry until we have
learned both the fsid and our global_id from the monitor or else the
second client can't create its debugfs entry and will fail (and multiple
client instances aren't properly reflected in debugfs).

Reported by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2012-08-20 10:03:15 -07:00
Johannes Berg
dcf33963c4 mac80211: clean up ieee80211_subif_start_xmit
There's no need to carry around a return value
that is always NETDEV_TX_OK anyway.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:46 +02:00
Johannes Berg
fe94fe05e9 mac80211: pass channel to ieee80211_send_probe_req
In multi-channel scenarios, the channel that we will
transmit a probe request on isn't always the current
channel (which will be NULL anyway) but will instead
be the channel that the AP is on. Pass the channel
to the ieee80211_send_probe_req() function so it can
be used in the different scenarios. The scan code
continues to pass the current channel, of course.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:45 +02:00
Johannes Berg
c0af07340a mac80211: convert ops checks to WARN_ON
There's no need to BUG_ON when a driver registers
invalid operations, warn and return an error.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:44 +02:00
Johannes Berg
9b86487043 mac80211: check operating channel in scan
The optimisation of scanning only on the current
channel should check the operating channel. Also
modify it to compare channel pointer rather than
the frequency.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:43 +02:00
Johannes Berg
4797c7ba93 mac80211: use RX status band instead of current band
Even for single-channel devices it is possible that we
switch the channel temporarily (e.g. for scanning) but
while doing so process a received frame that was still
received on the old channel, so checking the current
band is racy. Use the band from status instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:43 +02:00
Johannes Berg
3d01be72e6 mac80211: don't assume channel is set in tracing
With the move to multi-channel and away from
drv_config(), hw.conf.channel will not always
be set, only for devices using the current API
instead of the new channel context APIs. Check
the channel is set before adding its frequency
to the trace data.

Also break some overly long lines in the code.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:42 +02:00
Johannes Berg
f9e6e95b63 mac80211: use oper_channel in rate init
Using hw.conf.channel is wrong as it could be the
temporary channel if the station is added from the
workqueue while the device is already on another
channel. Use oper_channel instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:41 +02:00
Johannes Berg
9e99a127b5 mac80211: remove freq/chantype from debugfs
You can now get these values through iw, and
they conflict with multi-channel work.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:40 +02:00
Johannes Berg
e31583cdf0 mac80211: remove almost unused local variable
In ieee80211_beacon_get_tim() we can use the
txrc.sband instead of a separate local variable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:39 +02:00
Johannes Berg
466f310d10 mac80211: mesh: don't use global channel type
Using local->_oper_channel_type in the mesh code is
completely wrong as this value is the combination
of the various interface channel types and can be
a different value from the mesh interface in case
there are multiple virtual interfaces.

Use sdata->vif.bss_conf.channel_type instead as it
tracks the per-vif channel type.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:13:38 +02:00
Johannes Berg
d348f69f59 mac80211: simplify buffers in aes_128_cmac_vector
There's no need to use a single scratch buffer and
calculate offsets into it, just use two separate
buffers for the separate variables.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 14:03:18 +02:00
Johannes Berg
6d71117a27 mac80211: add IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
Some devices like the current iwlwifi implementation
require that the P2P interface address match the P2P
Device address (only one P2P interface is supported.)
Add the HW flag IEEE80211_HW_P2P_DEV_ADDR_FOR_INTF
that allows drivers to request that P2P Interfaces
added while a P2P Device is active get the same MAC
address by default.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:23 +02:00
Johannes Berg
f142c6b906 mac80211: support P2P Device abstraction
After cfg80211 got a P2P Device abstraction, add
support to mac80211. Whether it really is supported
or not will depend on whether or not the driver has
support for it, but mac80211 needs to change to be
able to support drivers that need a P2P Device.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:22 +02:00
Johannes Berg
98104fdeda cfg80211: add P2P Device abstraction
In order to support using a different MAC address
for the P2P Device address we must first have a
P2P Device abstraction that can be assigned a MAC
address.

This abstraction will also be useful to support
offloading P2P operations to the device, e.g.
periodic listen for discoverability.

Currently, the driver is responsible for assigning
a MAC address to the P2P Device, but this could be
changed by allowing a MAC address to be given to
the NEW_INTERFACE command.

As it has no associated netdev, a P2P Device can
only be identified by its wdev identifier but the
previous patches allowed using the wdev identifier
in various APIs, e.g. remain-on-channel.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:58:21 +02:00
Johannes Berg
cc74c0c7d6 mac80211: make ieee80211_beacon_connection_loss_work static
There's no need to declare the function in the
header file since it's only used in a single
place, so make it static.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:50 +02:00
Johannes Berg
5bc1420b11 mac80211: check size of channel switch IE when parsing
The channel switch IE has a fixed size, so we can
discard it in parsing if it's not the right size
and use the right struct pointer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:50 +02:00
Johannes Berg
3049000b97 mac80211: fix CSA handling timer
The time until the channel switch is in TU,
not in milliseconds, so use TU_TO_EXP_TIME()
to correctly program the timer.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:49 +02:00
Johannes Berg
57eebdf3c2 mac80211: clean up CSA handling code
Clean up the CSA handling code by moving some
of it out of the if and using a C99 initializer
for the struct passed to the driver method.

While at it, also add a comment that we should
wait for a beacon after switching the channel.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:48 +02:00
Johannes Berg
90bcf867ce mac80211: remove unneeded 'bssid' variable
There's no need to copy the BSSID just to print
it, remove the unnecessary variable.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:57:47 +02:00
Johannes Berg
4c29867790 mac80211: support A-MPDU status reporting
Support getting A-MPDU status information from the
drivers and reporting it to userspace via radiotap
in the standard fields.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:53:22 +02:00
Johannes Berg
48613ece3d wireless: add radiotap A-MPDU status field
Define the A-MPDU status field in radiotap, also
update the radiotap parser for it and the MCS field
that was apparently missed last time.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:53:09 +02:00
Antonio Quartulli
e687f61eed mac80211: add supported rates change notification in IBSS
In IBSS it is possible that the supported rates set for a station changes over
time (e.g. it gets first initialised as an empty set because of no available
information about rates and updated later). In this case the driver has to be
notified about the change in order to update its internal table accordingly (if
needed).

This behaviour is needed by all those drivers that handle rc internally but
leave stations management to mac80211

Reported-by: Gui Iribarren <gui@altermundi.net>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
[Johannes - add docs, validate IBSS mode only, fix compilation]
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:31:43 +02:00
Thomas Pedersen
4bd4c2dd8e mac80211: clean up mpath_move_to_queue()
Use skb_queue_walk_safe instead, and fix a few issues:

	- didn't free old skbs on moving
	- didn't react to failed skb alloc
	- needlessly held a local pointer to the destination frame queue
	- didn't check destination queue length before adding skb

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:25:05 +02:00
Thomas Pedersen
b22bd5221c mac80211: use skb_queue_walk() in mesh_path_assign_nexthop
Since all we really want is just to iterate over all skbs, do just that
and avoid (de)queueing to a clusmy tmpq.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:25:05 +02:00
Eyal Shapira
aa7a00809c mac80211: avoid using synchronize_rcu in ieee80211_set_probe_resp
This could take a while (100ms+) and may delay sending assoc resp
in AP mode with WPS or P2P GO (as setting the probe resp takes place
there). We've encountered situations where the delay was big enough
to cause connection problems with devices like Galaxy Nexus.
Switch to using call_rcu with a free handler.

[Arik - rework to use plain buffer and instead of skb]

Signed-off-by: Eyal Shapira <eyal@wizery.com>
Signed-off-by: Arik Nemtsov <arik@wizery.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2012-08-20 13:20:56 +02:00
Patrick McHardy
2834a6386b netfilter: nf_conntrack: remove unnecessary RTNL locking
Locking the rtnl was added to nf_conntrack_l{3,4}_proto_unregister()
for walking the network namespace list. This is not done anymore since
we have proper namespace support in the protocols now, so we don't
need to take the RTNL anymore.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:46:29 +02:00
Patrick McHardy
fe31d1a860 netfilter: sparse endian fixes
Fix a couple of endian annotation in net/netfilter:

net/netfilter/nfnetlink_acct.c:82:30: warning: cast to restricted __be64
net/netfilter/nfnetlink_acct.c:86:30: warning: cast to restricted __be64
net/netfilter/nfnetlink_cthelper.c:77:28: warning: cast to restricted __be16
net/netfilter/xt_NFQUEUE.c:46:16: warning: restricted __be32 degrades to integer
net/netfilter/xt_NFQUEUE.c:60:34: warning: restricted __be32 degrades to integer
net/netfilter/xt_NFQUEUE.c:68:34: warning: restricted __be32 degrades to integer
net/netfilter/xt_osf.c:272:55: warning: cast to restricted __be16

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:45:57 +02:00
Patrick McHardy
2dba62c30e netfilter: nfnetlink_log: fix NLA_PUT macro removal bug
Commit 1db20a52 (nfnetlink_log: Stop using NLA_PUT*().) incorrectly
converted a NLA_PUT_BE16 macro to nla_put_be32() in nfnetlink_log:

-               NLA_PUT_BE16(inst->skb, NFULA_HWTYPE, htons(skb->dev->type));
+               if (nla_put_be32(inst->skb, NFULA_HWTYPE, htons(skb->dev->type))

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-20 12:40:23 +02:00
Neal Cardwell
fae6ef87fa net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
This commit removes the sk_rx_dst_set calls from
tcp_create_openreq_child(), because at that point the icsk_af_ops
field of ipv6_mapped TCP sockets has not been set to its proper final
value.

Instead, to make sure we get the right sk_rx_dst_set variant
appropriate for the address family of the new connection, we have
tcp_v{4,6}_syn_recv_sock() directly call the appropriate function
shortly after the call to tcp_create_openreq_child() returns.

This also moves inet6_sk_rx_dst_set() to avoid a forward declaration
with the new approach.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Reported-by: Artem Savkov <artem.savkov@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 03:03:33 -07:00
Randy Dunlap
3de7a37b02 net/core/dev.c: fix kernel-doc warning
Fix kernel-doc warning:

Warning(net/core/dev.c:5745): No description found for parameter 'dev'

Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Cc:	"David S. Miller" <davem@davemloft.net>
Cc:	netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 03:00:55 -07:00
Patrick McHardy
9d7b0fc1ef net: ipv6: fix oops in inet_putpeer()
Commit 97bab73f (inet: Hide route peer accesses behind helpers.) introduced
a bug in xfrm6_policy_destroy(). The xfrm_dst's _rt6i_peer member is not
initialized, causing a false positive result from inetpeer_ptr_is_peer(),
which in turn causes a NULL pointer dereference in inet_putpeer().

Pid: 314, comm: kworker/0:1 Not tainted 3.6.0-rc1+ #17 To Be Filled By O.E.M. To Be Filled By O.E.M./P4S800D-X
EIP: 0060:[<c03abf93>] EFLAGS: 00010246 CPU: 0
EIP is at inet_putpeer+0xe/0x16
EAX: 00000000 EBX: f3481700 ECX: 00000000 EDX: 000dd641
ESI: f3481700 EDI: c05e949c EBP: f551def4 ESP: f551def4
 DS: 007b ES: 007b FS: 0000 GS: 00e0 SS: 0068
CR0: 8005003b CR2: 00000070 CR3: 3243d000 CR4: 00000750
DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
DR6: ffff0ff0 DR7: 00000400
 f551df04 c0423de1 00000000 f3481700 f551df18 c038d5f7 f254b9f8 f551df28
 f34f85d8 f551df20 c03ef48d f551df3c c0396870 f30697e8 f24e1738 c05e98f4
 f5509540 c05cd2b4 f551df7c c0142d2b c043feb5 f5509540 00000000 c05cd2e8
 [<c0423de1>] xfrm6_dst_destroy+0x42/0xdb
 [<c038d5f7>] dst_destroy+0x1d/0xa4
 [<c03ef48d>] xfrm_bundle_flo_delete+0x2b/0x36
 [<c0396870>] flow_cache_gc_task+0x85/0x9f
 [<c0142d2b>] process_one_work+0x122/0x441
 [<c043feb5>] ? apic_timer_interrupt+0x31/0x38
 [<c03967eb>] ? flow_cache_new_hashrnd+0x2b/0x2b
 [<c0143e2d>] worker_thread+0x113/0x3cc

Fix by adding a init_dst() callback to struct xfrm_policy_afinfo to
properly initialize the dst's peer pointer.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:56:56 -07:00
Jesper Juhl
d92c7f8aab caif: Do not dereference NULL in chnl_recv_cb()
In net/caif/chnl_net.c::chnl_recv_cb() we call skb_header_pointer()
which may return NULL, but we do not check for a NULL pointer before
dereferencing it.
This patch adds such a NULL check and properly free's allocated memory
and return an error (-EINVAL) on failure - much better than crashing..

Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Acked-by: Sjur Brændeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:47:49 -07:00
David S. Miller
6c71bec66a Merge git://1984.lsi.us.es/nf
Pable Neira Ayuso says:

====================
The following five patches contain fixes for 3.6-rc, they are:

* Two fixes for message parsing in the SIP conntrack helper, from
  Patrick McHardy.

* One fix for the SIP helper introduced in the user-space cthelper
  infrastructure, from Patrick McHardy.

* fix missing appropriate locking while modifying one conntrack entry
  from the nfqueue integration code, from myself.

* fix possible access to uninitiliazed timer in the nf_conntrack
  expectation infrastructure, from myself.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:44:29 -07:00
Eric Leblond
c0de08d042 af_packet: don't emit packet on orig fanout group
If a packet is emitted on one socket in one group of fanout sockets,
it is transmitted again. It is thus read again on one of the sockets
of the fanout group. This result in a loop for software which
generate packets when receiving one.
This retransmission is not the intended behavior: a fanout group
must behave like a single socket. The packet should not be
transmitted on a socket if it originates from a socket belonging
to the same fanout group.

This patch fixes the issue by changing the transmission check to
take fanout group info account.

Reported-by: Aleksandr Kotov <a1k@mail.ru>
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:37:29 -07:00
Ying Xue
e6a04b1d3f tipc: eliminate configuration for maximum number of name publications
Gets rid of the need for users to specify the maximum number of
name publications supported by TIPC. TIPC now automatically provides
support for the maximum number of name publications to 65535.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
34f256cc79 tipc: eliminate configuration for maximum number of name subscriptions
Gets rid of the need for users to specify the maximum number of
name subscriptions supported by TIPC. TIPC now automatically provides
support for the maximum number of name subscriptions to 65535.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
61cdd4d80b tipc: add __read_mostly annotations to several global variables
Added to the following:

 - tipc_random
 - tipc_own_addr
 - tipc_max_ports
 - tipc_net_id
 - tipc_remote_management
 - handler_enabled

The above global variables are read often, but written rarely. Use
__read_mostly to prevent them being on the same cacheline as another
variable which is written to often, which would cause cacheline
bouncing.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
f046e7d9be tipc: convert tipc_nametbl_size type from variable to macro
There is nothing changing this variable dynamically, so change
it to a macro to make that more obvious when reading the code.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:31 -07:00
Ying Xue
379c0456af tipc: change tipc_net_start routine return value type
Since now tipc_net_start() always returns a success code - 0, its
return value type should be changed from integer to void, which can
avoid unnecessary check for its return value.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Ying Xue
381294331e tipc: manually inline single use media_name_valid routine
After eliminating the mechanism which checks whether all letters
in media name string are within a given character set, the
media_name_valid routine becomes trivial.  It is also only
used once, so it is unnecessary to keep it as a separate function.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Ying Xue
fc0739385b tipc: remove pointless name sanity check and tipc_alphabet array
There is no real reason to check whether all letters in the given
media name and network interface name are within the character set
defined in tipc_alphabet array. Even if we eliminate the checking,
the rest of checking conditions in tipc_enable_bearer() can ensure
we do not enable an invalid or illegal bearer.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Ying Xue
4225a398c1 tipc: fix lockdep warning during bearer initialization
When the lockdep validator is enabled, it will report the below
warning when we enable a TIPC bearer:

[ INFO: possible irq lock inversion dependency detected ]
---------------------------------------------------------
Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(ptype_lock);
                                local_irq_disable();
                                lock(tipc_net_lock);
                                lock(ptype_lock);
   <Interrupt>
   lock(tipc_net_lock);

  *** DEADLOCK ***

the shortest dependencies between 2nd lock and 1st lock:
  -> (ptype_lock){+.+...} ops: 10 {
[...]
SOFTIRQ-ON-W at:
                      [<c1089418>] __lock_acquire+0x528/0x13e0
                      [<c108a360>] lock_acquire+0x90/0x100
                      [<c1553c38>] _raw_spin_lock+0x38/0x50
                      [<c14651ca>] dev_add_pack+0x3a/0x60
                      [<c182da75>] arp_init+0x1a/0x48
                      [<c182dce5>] inet_init+0x181/0x27e
                      [<c1001114>] do_one_initcall+0x34/0x170
                      [<c17f7329>] kernel_init+0x110/0x1b2
                      [<c155b6a2>] kernel_thread_helper+0x6/0x10
[...]
   ... key      at: [<c17e4b10>] ptype_lock+0x10/0x20
   ... acquired at:
    [<c108a360>] lock_acquire+0x90/0x100
    [<c1553c38>] _raw_spin_lock+0x38/0x50
    [<c14651ca>] dev_add_pack+0x3a/0x60
    [<c8bc18d2>] enable_bearer+0xf2/0x140 [tipc]
    [<c8bb283a>] tipc_enable_bearer+0x1ba/0x450 [tipc]
    [<c8bb3a04>] tipc_cfg_do_cmd+0x5c4/0x830 [tipc]
    [<c8bbc032>] handle_cmd+0x42/0xd0 [tipc]
    [<c148e802>] genl_rcv_msg+0x232/0x280
    [<c148d3f6>] netlink_rcv_skb+0x86/0xb0
    [<c148e5bc>] genl_rcv+0x1c/0x30
    [<c148d144>] netlink_unicast+0x174/0x1f0
    [<c148ddab>] netlink_sendmsg+0x1eb/0x2d0
    [<c1456bc1>] sock_aio_write+0x161/0x170
    [<c1135a7c>] do_sync_write+0xac/0xf0
    [<c11360f6>] vfs_write+0x156/0x170
    [<c11361e2>] sys_write+0x42/0x70
    [<c155b0df>] sysenter_do_call+0x12/0x38
[...]
}
  -> (tipc_net_lock){+..-..} ops: 4 {
[...]
    IN-SOFTIRQ-R at:
                     [<c108953a>] __lock_acquire+0x64a/0x13e0
                     [<c108a360>] lock_acquire+0x90/0x100
                     [<c15541cd>] _raw_read_lock_bh+0x3d/0x50
                     [<c8bb874d>] tipc_recv_msg+0x1d/0x830 [tipc]
                     [<c8bc195f>] recv_msg+0x3f/0x50 [tipc]
                     [<c146a5fa>] __netif_receive_skb+0x22a/0x590
                     [<c146ab0b>] netif_receive_skb+0x2b/0xf0
                     [<c13c43d2>] pcnet32_poll+0x292/0x780
                     [<c146b00a>] net_rx_action+0xfa/0x1e0
                     [<c103a4be>] __do_softirq+0xae/0x1e0
[...]
}

>From the log, we can see three different call chains between
CPU0 and CPU1:

Time 0 on CPU0:

  kernel_init()->inet_init()->dev_add_pack()

At time 0, the ptype_lock is held by CPU0 in dev_add_pack();

Time 1 on CPU1:

  tipc_enable_bearer()->enable_bearer()->dev_add_pack()

At time 1, tipc_enable_bearer() first holds tipc_net_lock, and then
wants to take ptype_lock to register TIPC protocol handler into the
networking stack.  But the ptype_lock has been taken by dev_add_pack()
on CPU0, so at this time the dev_add_pack() running on CPU1 has to be
busy looping.

Time 2 on CPU0:

  netif_receive_skb()->recv_msg()->tipc_recv_msg()

At time 2, an incoming TIPC packet arrives at CPU0, hence
tipc_recv_msg() will be invoked. In tipc_recv_msg(), it first wants
to hold tipc_net_lock.  At the moment, below scenario happens:

On CPU0, below is our sequence of taking locks:

  lock(ptype_lock)->lock(tipc_net_lock)

On CPU1, our sequence of taking locks looks like:

  lock(tipc_net_lock)->lock(ptype_lock)

Obviously deadlock may happen in this case.

But please note the deadlock possibly doesn't occur at all when the
first TIPC bearer is enabled.  Before enable_bearer() -- running on
CPU1 does not hold ptype_lock, so the TIPC receive handler (i.e.
recv_msg()) is not registered successfully via dev_add_pack(), so
the tipc_recv_msg() cannot be called by recv_msg() even if a TIPC
message comes to CPU0. But when the second TIPC bearer is
registered, the deadlock can perhaps really happen.

To fix it, we will push the work of registering TIPC protocol
handler into workqueue context. After the change, both paths taking
ptype_lock are always in process contexts, thus, the deadlock should
never occur.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Ying Xue
fa7f86f1bb tipc: optimize the initialization of network device notifier
Ethernet media initialization is only done when TIPC is started or
switched to network mode. So the initialization of the network device
notifier structure can be moved out of this function and done
statically instead.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:26:30 -07:00
Pavel Emelyanov
fff3321d75 packet: Report fanout status via diag engine
Reported value is the same reported by the FANOUT getsockoption, but
unlike it, the absent fanout setup results in absent nlattr, rather
than in nlattr with zero value. This is done so, since zero fanout
report may mean both -- no fanout, and fanout with both id and type zero.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:23:14 -07:00
Pavel Emelyanov
16f01365fa packet: Report rings cfg via diag engine
One extension bit may result in two nlattrs -- one per ring type.
If some ring type is not configured, then the respective nlatts
will be empty.

The structure reported contains the data, that is given to the
corresponding ring setup socket option.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-20 02:23:14 -07:00
Dan Carpenter
5ef5d6c569 gre: information leak in ip6_tnl_ioctl()
There is a one byte hole between p->hop_limit and p->flowinfo where
stack memory is leaked to the user.  This was introduced in c12b395a46
"gre: Support GRE over IPv6".

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
2012-08-20 02:21:30 -07:00
Fan Du
56892261ed xfrm: Use rcu_dereference_bh to deference pointer protected by rcu_read_lock_bh
Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 19:39:22 -07:00
Dan Carpenter
898132ae76 ipv6: move dereference after check in fl_free()
There is a dereference before checking for NULL bug here.  Generally
free() functions should accept NULL pointers.  For example, fl_create()
can pass a NULL pointer to fl_free() on the error path.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-16 16:04:42 -07:00
John Fastabend
476ad154f3 net: netprio: fix cgrp create and write priomap race
A race exists where creating cgroups and also updating the priomap
may result in losing a priomap update. This is because priomap
writers are not protected by rtnl_lock.

Move priority writer into rtnl_lock()/rtnl_unlock().

CC: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 14:56:11 -07:00
John Fastabend
48a87cc26c net: netprio: fd passed in SCM_RIGHTS datagram not set correctly
A socket fd passed in a SCM_RIGHTS datagram was not getting
updated with the new tasks cgrp prioidx. This leaves IO on
the socket tagged with the old tasks priority.

To fix this add a check in the scm recvmsg path to update the
sock cgrp prioidx with the new tasks value.

Thanks to Al Viro for catching this.

CC: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 14:56:11 -07:00
John Fastabend
f796c20cf6 net: netprio: fix files lock and remove useless d_path bits
Add lock to prevent a race with a file closing and also remove
useless and ugly sscanf code. The extra code was never needed
and the case it supposedly protected against is in fact handled
correctly by sock_from_file as pointed out by Al Viro.

CC: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 14:56:11 -07:00
Jason Wang
16c0b164bd act_mirred: do not drop packets when fails to mirror it
We drop packet unconditionally when we fail to mirror it. This is not intended
in some cases. Consdier for kvm guest, we may mirror the traffic of the bridge
to a tap device used by a VM. When kernel fails to mirror the packet in
conditions such as when qemu crashes or stop polling the tap, it's hard for the
management software to detect such condition and clean the the mirroring
before. This would lead all packets to the bridge to be dropped and break the
netowrk of other virtual machines.

To solve the issue, the patch does not drop packets when kernel fails to mirror
it, and only drop the redirected packets.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 14:54:44 -07:00
Dan Carpenter
02644a1745 sctp: fix bogus if statement in sctp_auth_recv_cid()
There is an extra semi-colon here, so we always return 0 instead of
calling __sctp_auth_cid().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 13:36:29 -07:00
Ulrich Weber
6932f119bd sctp: fix compile issue with disabled CONFIG_NET_NS
struct seq_net_private has no struct net
if CONFIG_NET_NS is not enabled

Signed-off-by: Ulrich Weber <ulrich.weber@sophos.com>
Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-16 13:36:29 -07:00
Pablo Neira Ayuso
2614f86490 netfilter: nf_ct_expect: fix possible access to uninitialized timer
In __nf_ct_expect_check, the function refresh_timer returns 1
if a matching expectation is found and its timer is successfully
refreshed. This results in nf_ct_expect_related returning 0.
Note that at this point:

- the passed expectation is not inserted in the expectation table
  and its timer was not initialized, since we have refreshed one
  matching/existing expectation.

- nf_ct_expect_alloc uses kmem_cache_alloc, so the expectation
  timer is in some undefined state just after the allocation,
  until it is appropriately initialized.

This can be a problem for the SIP helper during the expectation
addition:

 ...
 if (nf_ct_expect_related(rtp_exp) == 0) {
         if (nf_ct_expect_related(rtcp_exp) != 0)
                 nf_ct_unexpect_related(rtp_exp);
 ...

Note that nf_ct_expect_related(rtp_exp) may return 0 for the timer refresh
case that is detailed above. Then, if nf_ct_unexpect_related(rtcp_exp)
returns != 0, nf_ct_unexpect_related(rtp_exp) is called, which does:

 spin_lock_bh(&nf_conntrack_lock);
 if (del_timer(&exp->timeout)) {
         nf_ct_unlink_expect(exp);
         nf_ct_expect_put(exp);
 }
 spin_unlock_bh(&nf_conntrack_lock);

Note that del_timer always returns false if the timer has been
initialized.  However, the timer was not initialized since setup_timer
was not called, therefore, the expectation timer remains in some
undefined state. If I'm not missing anything, this may lead to the
removal an unexistent expectation.

To fix this, the optimization that allows refreshing an expectation
is removed. Now nf_conntrack_expect_related looks more consistent
to me since it always add the expectation in case that it returns
success.

Thanks to Patrick McHardy for participating in the discussion of
this patch.

I think this may be the source of the problem described by:
http://marc.info/?l=netfilter-devel&m=134073514719421&w=2

Reported-by: Rafal Fitt <rafalf@aplusc.com.pl>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2012-08-16 11:49:53 +02:00
Mathias Krause
43da5f2e0d net: fix info leak in compat dev_ifconf()
The implementation of dev_ifconf() for the compat ioctl interface uses
an intermediate ifc structure allocated in userland for the duration of
the syscall. Though, it fails to initialize the padding bytes inserted
for alignment and that for leaks four bytes of kernel stack. Add an
explicit memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
2d8a041b7b ipvs: fix info leak in getsockopt(IP_VS_SO_GET_TIMEOUT)
If at least one of CONFIG_IP_VS_PROTO_TCP or CONFIG_IP_VS_PROTO_UDP is
not set, __ip_vs_get_timeouts() does not fully initialize the structure
that gets copied to userland and that for leaks up to 12 bytes of kernel
stack. Add an explicit memset(0) before passing the structure to
__ip_vs_get_timeouts() to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Wensong Zhang <wensong@linux-vs.org>
Cc: Simon Horman <horms@verge.net.au>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
7b07f8eb75 dccp: fix info leak via getsockopt(DCCP_SOCKOPT_CCID_TX_INFO)
The CCID3 code fails to initialize the trailing padding bytes of struct
tfrc_tx_info added for alignment on 64 bit architectures. It that for
potentially leaks four bytes kernel stack via the getsockopt() syscall.
Add an explicit memset(0) before filling the structure to avoid the
info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
276bdb82de dccp: check ccid before dereferencing
ccid_hc_rx_getsockopt() and ccid_hc_tx_getsockopt() might be called with
a NULL ccid pointer leading to a NULL pointer dereference. This could
lead to a privilege escalation if the attacker is able to map page 0 and
prepare it with a fake ccid_ops pointer.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Cc: stable@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
3592aaeb80 llc: fix info leak via getsockname()
The LLC code wrongly returns 0, i.e. "success", when the socket is
zapped. Together with the uninitialized uaddrlen pointer argument from
sys_getsockname this leads to an arbitrary memory leak of up to 128
bytes kernel stack via the getsockname() syscall.

Return an error instead when the socket is zapped to prevent the info
leak. Also remove the unnecessary memset(0). We don't directly write to
the memory pointed by uaddr but memcpy() a local structure at the end of
the function that is properly initialized.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
04d4fbca10 l2tp: fix info leak via getsockname()
The L2TP code for IPv6 fails to initialize the l2tp_unused member of
struct sockaddr_l2tpip6 and that for leaks two bytes kernel stack via
the getsockname() syscall. Initialize l2tp_unused with 0 to avoid the
info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
792039c73c Bluetooth: L2CAP - Fix info leak via getsockname()
The L2CAP code fails to initialize the l2_bdaddr_type member of struct
sockaddr_l2 and the padding byte added for alignment. It that for leaks
two bytes kernel stack via the getsockname() syscall. Add an explicit
memset(0) before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
9344a97296 Bluetooth: RFCOMM - Fix info leak via getsockname()
The RFCOMM code fails to initialize the trailing padding byte of struct
sockaddr_rc added for alignment. It that for leaks one byte kernel stack
via the getsockname() syscall. Add an explicit memset(0) before filling
the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:31 -07:00
Mathias Krause
f9432c5ec8 Bluetooth: RFCOMM - Fix info leak in ioctl(RFCOMMGETDEVLIST)
The RFCOMM code fails to initialize the two padding bytes of struct
rfcomm_dev_list_req inserted for alignment before copying it to
userland. Additionally there are two padding bytes in each instance of
struct rfcomm_dev_info. The ioctl() that for disclosures two bytes plus
dev_num times two bytes uninitialized kernel heap memory.

Allocate the memory using kzalloc() to fix this issue.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
9ad2de43f1 Bluetooth: RFCOMM - Fix info leak in getsockopt(BT_SECURITY)
The RFCOMM code fails to initialize the key_size member of struct
bt_security before copying it to userland -- that for leaking one
byte kernel stack. Initialize key_size with 0 to avoid the info
leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
3f68ba07b1 Bluetooth: HCI - Fix info leak via getsockname()
The HCI code fails to initialize the hci_channel member of struct
sockaddr_hci and that for leaks two bytes kernel stack via the
getsockname() syscall. Initialize hci_channel with 0 to avoid the
info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
e15ca9a0ef Bluetooth: HCI - Fix info leak in getsockopt(HCI_FILTER)
The HCI code fails to initialize the two padding bytes of struct
hci_ufilter before copying it to userland -- that for leaking two
bytes kernel stack. Add an explicit memset(0) before filling the
structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Cc: Marcel Holtmann <marcel@holtmann.org>
Cc: Gustavo Padovan <gustavo@padovan.org>
Cc: Johan Hedberg <johan.hedberg@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
3c0c5cfdcd atm: fix info leak via getsockname()
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
Mathias Krause
e862f1a9b7 atm: fix info leak in getsockopt(SO_ATMPVC)
The ATM code fails to initialize the two padding bytes of struct
sockaddr_atmpvc inserted for alignment. Add an explicit memset(0)
before filling the structure to avoid the info leak.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 21:36:30 -07:00
David S. Miller
2ea214929d Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:

====================
This is a batch of updates intended for 3.7.  The ath9k, mwifiex,
and b43 drivers get the bulk of the commits this time, with a handful
of other driver bits thrown-in.  It is mostly just minor fixes and
cleanups, etc.

Also included is a Bluetooth pull, with a lot of refactoring.
Gustavo says:

	"These are the changes I queued for 3.7. There are a many
	small fixes/improvements by Andre Guedes. A l2cap channel
	refcounting refactor by Jaganath. Bluetooth sockets now
	appears in /proc/net, by Masatake Yamato and Sachin Kamat
	changes ours drivers to use devm_kzalloc()."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:26:05 -07:00
Razvan Ghitulete
1b3a6926fc net: remove wrong initialization for snd_wl1
The field tp->snd_wl1 is twice initialized, the second time
seems to be wrong as it may overwrite any update in tcp_ack.

Signed-off-by: Razvan Ghitulete <rghitulete@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:23:46 -07:00
Fan Du
65e0736bc2 xfrm: remove redundant parameter "int dir" in struct xfrm_mgr.acquire
Sematically speaking, xfrm_mgr.acquire is called when kernel intends to ask
user space IKE daemon to negotiate SAs with peers. IOW the direction will
*always* be XFRM_POLICY_OUT, so remove int dir for clarity.

Signed-off-by: Fan Du <fan.du@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:13:30 -07:00
Stephen Hemminger
c03307eab6 bridge: fix rcu dereference outside of rcu_read_lock
Alternative solution for problem found by Linux Driver Verification
project (linuxtesting.org).

As it noted in the comment before the br_handle_frame_finish
function, this function should be called under rcu_read_lock.

The problem callgraph:
br_dev_xmit -> br_nf_pre_routing_finish_bridge_slow ->
 -> br_handle_frame_finish -> br_port_get_rcu -> rcu_dereference

And in this case there is no read-lock section.

Reported-by: Denis Efremov <yefremov.denis@gmail.com>
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-15 15:09:41 -07:00
John W. Linville
16698918cd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next into for-davem 2012-08-15 14:29:37 -04:00
Eric W. Biederman
e1fc3b14f9 sctp: Make sysctl tunables per net
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:32:16 -07:00
Eric W. Biederman
f53b5b097e sctp: Push struct net down into sctp_verify_ext_param
Add struct net as a parameter to sctp_verify_param so it can be passed
to sctp_verify_ext_param where struct net will be needed when the sctp
tunables become per net tunables.

Add struct net as a parameter to sctp_verify_init so struct net can be
passed to sctp_verify_param.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
24cb81a6a9 sctp: Push struct net down into all of the state machine functions
There are a handle of state machine functions primarily those dealing
with processing INIT packets where there is neither a valid endpoint nor
a valid assoication from which to derive a struct net.  Therefore add
struct net * to the parameter list of sctp_state_fn_t and update all of
the state machine functions.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
e7ff4a7037 sctp: Push struct net down into sctp_in_scope
struct net will be needed shortly when the tunables are made per network
namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
89bf3450cb sctp: Push struct net down into sctp_transport_init
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
55e26eb95a sctp: Push struct net down to sctp_chunk_event_lookup
This trickles up through sctp_sm_lookup_event up to sctp_do_sm
and up further into sctp_primitiv_NAME before the code reaches
places where struct net can be reliably found.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
ebb7e95d93 sctp: Add infrastructure for per net sysctls
Start with an empty sctp_net_table that will be populated as the various
tunable sysctls are made per net.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:37 -07:00
Eric W. Biederman
b01a24078f sctp: Make the mib per network namespace
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:30:36 -07:00
Eric W. Biederman
bb2db45b54 sctp: Enable sctp in all network namespaces
- Fix the sctp_af operations to work in all namespaces
- Enable sctp socket creation in all network namespaces.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:29:59 -07:00
Eric W. Biederman
13d782f6b4 sctp: Make the proc files per network namespace.
- Convert all of the files under /proc/net/sctp to be per
  network namespace.

- Don't print anything for /proc/net/sctp/snmp except in
  the initial network namespaces as the snmp counters still
  have to be converted to be per network namespace.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:29:53 -07:00
Eric W. Biederman
632c928a6a sctp: Move the percpu sockets counter out of sctp_proc_init
The percpu sctp socket counter has nothing at all to do with the sctp
proc files, and having it in the wrong initialization is confusing,
and makes network namespace support a pain.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:17:26 -07:00
Eric W. Biederman
2ce9550350 sctp: Make the ctl_sock per network namespace
- Kill sctp_get_ctl_sock, it is useless now.
- Pass struct net where needed so net->sctp.ctl_sock is accessible.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:17:26 -07:00
Eric W. Biederman
4db67e8086 sctp: Make the address lists per network namespace
- Move the address lists into struct net
- Add per network namespace initialization and cleanup
- Pass around struct net so it is everywhere I need it.
- Rename all of the global variable references into references
  to the variables moved into struct net

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 23:12:17 -07:00
Eric W. Biederman
4110cc255d sctp: Make the association hashtable handle multiple network namespaces
- Use struct net in the hash calculation
- Use sock_net(association.base.sk) in the association lookups.
- On receive calculate the network namespace from skb->dev.
- Pass struct net from receive down to the functions that actually
  do the association lookup.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
4cdadcbcb6 sctp: Make the endpoint hashtable handle multiple network namespaces
- Use struct net in the hash calculation
- Use sock_net(endpoint.base.sk) in the endpoint lookups.
- On receive calculate the network namespace from skb->dev.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
f1f4376307 sctp: Make the port hash table use struct net in it's key.
- Add struct net into the port hash table hash calculation
- Add struct net inot the struct sctp_bind_bucket so there
  is a memory of which network namespace a port is allocated in.
  No need for a ref count because sctp_bind_bucket only exists
  when there are sockets in the hash table and sockets can not
  change their network namspace, and sockets already ref count
  their network namespace.
- Add struct net into the key comparison when we are testing
  to see if we have found the port hash table entry we are
  looking for.

With these changes lookups in the port hash table becomes
safe to use in multiple network namespaces.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 22:44:12 -07:00
Eric W. Biederman
26711a791e userns: xt_owner: Add basic user namespace support.
- Only allow adding matches from the initial user namespace
- Add the appropriate conversion functions to handle matches
  against sockets in other user namespaces.

Cc: Jan Engelhardt <jengelh@medozas.de>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:30 -07:00
Eric W. Biederman
da7428080a userns xt_recent: Specify the owner/group of ip_list_perms in the initial user namespace
xt_recent creates a bunch of proc files and initializes their uid
and gids to the values of ip_list_uid and ip_list_gid.  When
initialize those proc files convert those values to kuids so they
can continue to reside on the /proc inode.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Jan Engelhardt <jengelh@medozas.de>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:29 -07:00
Eric W. Biederman
8c6e2a941a userns: Convert xt_LOG to print socket kuids and kgids as uids and gids
xt_LOG always writes messages via sb_add via printk.  Therefore when
xt_LOG logs the uid and gid of a socket a packet came from the
values should be converted to be in the initial user namespace.

Thus making xt_LOG as user namespace safe as possible.

Cc: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:29 -07:00
Eric W. Biederman
a6c6796c71 userns: Convert cls_flow to work with user namespaces enabled
The flow classifier can use uids and gids of the sockets that
are transmitting packets and do insert those uids and gids
into the packet classification calcuation.  I don't fully
understand the details but it appears that we can depend
on specific uids and gids when making traffic classification
decisions.

To work with user namespaces enabled map from kuids and kgids
into uids and gids in the initial user namespace giving raw
integer values the code can play with and depend on.

To avoid issues of userspace depending on uids and gids in
packet classifiers installed from other user namespaces
and getting confused deny all packet classifiers that
use uids or gids that are not comming from a netlink socket
in the initial user namespace.

Cc: Patrick McHardy <kaber@trash.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Changli Gao <xiaosuo@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:28 -07:00
Eric W. Biederman
af4c6641f5 net sched: Pass the skb into change so it can access NETLINK_CB
cls_flow.c plays with uids and gids.  Unless I misread that
code it is possible for classifiers to depend on the specific uid and
gid values.  Therefore I need to know the user namespace of the
netlink socket that is installing the packet classifiers.  Pass
in the rtnetlink skb so I can access the NETLINK_CB of the passed
packet.  In particular I want access to sk_user_ns(NETLINK_CB(in_skb).ssk).

Pass in not the user namespace but the incomming rtnetlink skb into
the the classifier change routines as that is generally the more useful
parameter.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:28 -07:00
Eric W. Biederman
9eea9515cb userns: nfnetlink_log: Report socket uids in the log sockets user namespace
At logging instance creation capture the peer netlink socket's user
namespace. Use the captured peer user namespace when reporting socket
uids to the peer.

The peer socket's user namespace is guaranateed to be valid until the user
closes the netlink socket.  nfnetlink_log removes instances during the final
close of a socket.  __build_packet_message does not get called after an
instance is destroyed.   Therefore it is safe to let the peer netlink socket
take care of the user namespace reference counting for us.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:27 -07:00
Eric W. Biederman
d06ca95643 userns: Teach inet_diag to work with user namespaces
Compute the user namespace of the socket that we are replying to
and translate the kuids of reported sockets into that user namespace.

Cc: Andrew Vagin <avagin@openvz.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Pavel Emelyanov <xemul@parallels.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:55:20 -07:00
Eric W. Biederman
3fbc290540 netlink: Make the sending netlink socket availabe in NETLINK_CB
The sending socket of an skb is already available by it's port id
in the NETLINK_CB.  If you want to know more like to examine the
credentials on the sending socket you have to look up the sending
socket by it's port id and all of the needed functions and data
structures are static inside of af_netlink.c.  So do the simple
thing and pass the sending socket to the receivers in the NETLINK_CB.

I intend to use this to get the user namespace of the sending socket
in inet_diag so that I can report uids in the context of the process
who opened the socket, the same way I report uids in the contect
of the process who opens files.

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:49 -07:00
Eric W. Biederman
d13fda8564 userns: Convert net/ax25 to use kuid_t where appropriate
Cc: Ralf Baechle <ralf@linux-mips.org>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:42 -07:00
Eric W. Biederman
4f82f45730 net ip6 flowlabel: Make owner a union of struct pid * and kuid_t
Correct a long standing omission and use struct pid in the owner
field of struct ip6_flowlabel when the share type is IPV6_FL_S_PROCESS.
This guarantees we don't have issues when pid wraparound occurs.

Use a kuid_t in the owner field of struct ip6_flowlabel when the
share type is IPV6_FL_S_USER to add user namespace support.

In /proc/net/ip6_flowlabel capture the current pid namespace when
opening the file and release the pid namespace when the file is
closed ensuring we print the pid owner value that is meaning to
the reader of the file.  Similarly use from_kuid_munged to print
uid values that are meaningful to the reader of the file.

This requires exporting pid_nr_ns so that ipv6 can continue to built
as a module.  Yoiks what silliness

Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:25 -07:00
Eric W. Biederman
7064d16e16 userns: Use kgids for sysctl_ping_group_range
- Store sysctl_ping_group_range as a paire of kgid_t values
  instead of a pair of gid_t values.
- Move the kgid conversion work from ping_init_sock into ipv4_ping_group_range
- For invalid cases reset to the default disabled state.

With the kgid_t conversion made part of the original value sanitation
from userspace understand how the code will react becomes clearer
and it becomes possible to set the sysctl ping group range from
something other than the initial user namespace.

Cc: Vasiliy Kulikov <segoon@openwall.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:49:10 -07:00
Eric W. Biederman
a7cb5a49bf userns: Print out socket uids in a user namespace aware fashion.
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:48:06 -07:00
Eric W. Biederman
976d020150 userns: Convert sock_i_uid to return a kuid_t
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:47:34 -07:00
Eric W. Biederman
d04a48b06d userns: Convert __dev_set_promiscuity to use kuids in audit logs
Cc: Klaus Heinrich Kiwi <klausk@br.ibm.com>
Cc: Eric Paris <eparis@redhat.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2012-08-14 21:42:40 -07:00
Eric W. Biederman
b2e4f544fd userns: Convert net/core/scm.c to use kuids and kgids
With the existence of kuid_t and kgid_t we can take this further
and remove the usage of struct cred altogether, ensuring we
don't get cache line misses from reference counts.   For now
however start simply and do a straight forward conversion
I can be certain is correct.

In cred_to_ucred use from_kuid_munged and from_kgid_munged
as these values are going directly to userspace and we want to use
the userspace safe values not -1 when reporting a value that does not
map.  The earlier conversion that used from_kuid was buggy in that
respect.  Oops.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
2012-08-14 21:41:58 -07:00
Andre Guedes
61a0cfb008 Bluetooth: Fix use-after-free bug in SMP
If SMP fails, we should always cancel security_timer delayed work.
Otherwise, security_timer function may run after l2cap_conn object
has been freed.

This patch fixes the following warning reported by ODEBUG:

WARNING: at lib/debugobjects.c:261 debug_print_object+0x7c/0x8d()
Hardware name: Bochs
ODEBUG: free active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x27
Modules linked in: btusb bluetooth
Pid: 440, comm: kworker/u:2 Not tainted 3.5.0-rc1+ #4
Call Trace:
 [<ffffffff81174600>] ? free_obj_work+0x4a/0x7f
 [<ffffffff81023eb8>] warn_slowpath_common+0x7e/0x97
 [<ffffffff81023f65>] warn_slowpath_fmt+0x41/0x43
 [<ffffffff811746b1>] debug_print_object+0x7c/0x8d
 [<ffffffff810394f0>] ? __queue_work+0x241/0x241
 [<ffffffff81174fdd>] debug_check_no_obj_freed+0x92/0x159
 [<ffffffff810ac08e>] slab_free_hook+0x6f/0x77
 [<ffffffffa0019145>] ? l2cap_conn_del+0x148/0x157 [bluetooth]
 [<ffffffff810ae408>] kfree+0x59/0xac
 [<ffffffffa0019145>] l2cap_conn_del+0x148/0x157 [bluetooth]
 [<ffffffffa001b9a2>] l2cap_recv_frame+0xa77/0xfa4 [bluetooth]
 [<ffffffff810592f9>] ? trace_hardirqs_on_caller+0x112/0x1ad
 [<ffffffffa001c86c>] l2cap_recv_acldata+0xe2/0x264 [bluetooth]
 [<ffffffffa0002b2f>] hci_rx_work+0x235/0x33c [bluetooth]
 [<ffffffff81038dc3>] ? process_one_work+0x126/0x2fe
 [<ffffffff81038e22>] process_one_work+0x185/0x2fe
 [<ffffffff81038dc3>] ? process_one_work+0x126/0x2fe
 [<ffffffff81059f2e>] ? lock_acquired+0x1b5/0x1cf
 [<ffffffffa00028fa>] ? le_scan_work+0x11d/0x11d [bluetooth]
 [<ffffffff81036fb6>] ? spin_lock_irq+0x9/0xb
 [<ffffffff81039209>] worker_thread+0xcf/0x175
 [<ffffffff8103913a>] ? rescuer_thread+0x175/0x175
 [<ffffffff8103cfe0>] kthread+0x95/0x9d
 [<ffffffff812c5054>] kernel_threadi_helper+0x4/0x10
 [<ffffffff812c36b0>] ? retint_restore_args+0x13/0x13
 [<ffffffff8103cf4b>] ? flush_kthread_worker+0xdb/0xdb
 [<ffffffff812c5050>] ? gs_change+0x13/0x13

This bug can be reproduced using hctool lecc or l2test tools and
bluetoothd not running.

Signed-off-by: Andre Guedes <andre.guedes@openbossa.org>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2012-08-15 01:06:23 -03:00
David S. Miller
7bab3ae760 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
Alexey Khoroshilov provides a potential memory leak in rndis_wlan.

Bob Copeland gives us an ath5k fix for a lockdep problem.

Dan Carpenter fixes a signedness mismatch in at76c50x.

Felix Fietkau corrects a regression caused by an earlier commit that can
lead to an IRQ storm.

Lorenzo Bianconi offers a fix for a bad variable initialization in ath9k
that can cause it to improperly mark decrypted frames.

Rajkumar Manoharan fixes ath9k to prevent the btcoex time from running
when the hardware is asleep.

The remainder are Bluetooth fixes, about which Gustavo says:

	"Here goes some fixes for 3.6-rc1, there are a few fix to
	thte inquiry code by Ram Malovany, support for 2 new devices,
	and few others fixes for NULL dereference, possible deadlock
	and a memory leak."
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 17:03:22 -07:00
Ben Hutchings
4acd4945cd ipv6: addrconf: Avoid calling netdevice notifiers with RCU read-side lock
Cong Wang reports that lockdep detected suspicious RCU usage while
enabling IPV6 forwarding:

 [ 1123.310275] ===============================
 [ 1123.442202] [ INFO: suspicious RCU usage. ]
 [ 1123.558207] 3.6.0-rc1+ #109 Not tainted
 [ 1123.665204] -------------------------------
 [ 1123.768254] include/linux/rcupdate.h:430 Illegal context switch in RCU read-side critical section!
 [ 1123.992320]
 [ 1123.992320] other info that might help us debug this:
 [ 1123.992320]
 [ 1124.307382]
 [ 1124.307382] rcu_scheduler_active = 1, debug_locks = 0
 [ 1124.522220] 2 locks held by sysctl/5710:
 [ 1124.648364]  #0:  (rtnl_mutex){+.+.+.}, at: [<ffffffff81768498>] rtnl_trylock+0x15/0x17
 [ 1124.882211]  #1:  (rcu_read_lock){.+.+.+}, at: [<ffffffff81871df8>] rcu_lock_acquire+0x0/0x29
 [ 1125.085209]
 [ 1125.085209] stack backtrace:
 [ 1125.332213] Pid: 5710, comm: sysctl Not tainted 3.6.0-rc1+ #109
 [ 1125.441291] Call Trace:
 [ 1125.545281]  [<ffffffff8109d915>] lockdep_rcu_suspicious+0x109/0x112
 [ 1125.667212]  [<ffffffff8107c240>] rcu_preempt_sleep_check+0x45/0x47
 [ 1125.781838]  [<ffffffff8107c260>] __might_sleep+0x1e/0x19b
[...]
 [ 1127.445223]  [<ffffffff81757ac5>] call_netdevice_notifiers+0x4a/0x4f
[...]
 [ 1127.772188]  [<ffffffff8175e125>] dev_disable_lro+0x32/0x6b
 [ 1127.885174]  [<ffffffff81872d26>] dev_forward_change+0x30/0xcb
 [ 1128.013214]  [<ffffffff818738c4>] addrconf_forward_change+0x85/0xc5
[...]

addrconf_forward_change() uses RCU iteration over the netdev list,
which is unnecessary since it already holds the RTNL lock.  We also
cannot reasonably require netdevice notifier functions not to sleep.

Reported-by: Cong Wang <amwang@redhat.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 17:02:12 -07:00
Pavel Emelyanov
eea68e2f1a packet: Report socket mclist info via diag module
The info is reported as an array of packet_diag_mclist structures. Each
includes not only the directly configured values (index, type, etc), but
also the "count".

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
8a360be0c5 packet: Report more packet sk info via diag module
This reports in one rtattr message all the other scalar values, that can be
set on a packet socket with setsockopt.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
96ec632714 packet: Diag core and basic socket info dumping
The diag module can be built independently from the af_packet.ko one,
just like it's done in unix sockets.

The core dumping message carries the info available at socket creation
time, i.e. family, type and protocol (in the same byte order as shown in
the proc file).

The socket inode number and cookie is reserved for future per-socket info
retrieving. The per-protocol filtering is also reserved for future by
requiring the sdiag_protocol to be zero.

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Pavel Emelyanov
2787b04b6c packet: Introduce net/packet/internal.h header
The diag module will need to access some private packet_sock data, so
move it to a header in advance. This file will be shared between the
af_packet.c and the diag.c

Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:56:33 -07:00
Ben Hutchings
aadf31de16 llc: Fix races between llc2 handler use and (un)registration
When registering the handlers, any state they rely on must be
completely initialised first.  When unregistering, we must wait until
they are definitely no longer running.  llc_rcv() must also avoid
reading the handler pointers again after checking for NULL.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:52:02 -07:00
Ben Hutchings
f4f8720feb llc2: Call llc_station_exit() on llc2_init() failure path
Otherwise the station packet handler will remain registered even though
the module is unloaded.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:51:40 -07:00
Ben Hutchings
6024935f5f llc2: Fix silent failure of llc_station_init()
llc_station_init() creates and processes an event skb with no effect
other than to change the state from DOWN to UP.  Allocation failure is
reported, but then ignored by its caller, llc2_init().  Remove this
possibility by simply initialising the state as UP.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 16:51:18 -07:00
Igor Maravic
ad5b310228 net: ipv4: fib_trie: Don't unnecessarily search for already found fib leaf
We've already found leaf, don't search for it again. Same is for fib leaf info.

Signed-off-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 15:02:20 -07:00
Priyanka Jain
418a99ac6a Replace rwlock on xfrm_policy_afinfo with rcu
xfrm_policy_afinfo is read mosly data structure.
Write on xfrm_policy_afinfo is done only at the
time of configuration.
So rwlocks can be safely replaced with RCU.

RCUs usage optimizes the performance.

Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:57:37 -07:00
Igor Maravic
4855d6f311 net: ipv6: proc: Fix error handling
Fix error handling in case making of dir dev_snmp6 failes

Signed-off-by: Igor Maravic <igorm@etf.rs>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:45:07 -07:00
Yan, Zheng
7bd86cc282 ipv4: Cache local output routes
Commit caacf05e5a causes big drop of UDP loop back performance.
The cause of the regression is that we do not cache the local output
routes. Each time we send a datagram from unconnected UDP socket,
the kernel allocates a dst_entry and adds it to the rt_uncached_list.
It creates lock contention on the rt_uncached_lock.

Reported-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 14:45:07 -07:00