Commit graph

27438 commits

Author SHA1 Message Date
John W. Linville
f3a3440063 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-15 10:44:36 -04:00
Li RongQing
35353c2b42 ipv4: replace ip_fast_csum with csum_replace2
replace ip_fast_csum with csum_replace2 to save cpu cycles
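
The saving comes from patching the checksum for the one 16-bit word that changed rather than summing the whole header again. A minimal userspace sketch of that incremental update (RFC 1624 style), assuming illustrative Python helper names that are not the kernel's C functions:

```python
# Userspace model of a 16-bit one's-complement checksum update.
# Helper names are illustrative; they are not the kernel's C helpers.

def ones_add(a, b):
    """Add two 16-bit values with end-around carry."""
    s = a + b
    return (s & 0xFFFF) + (s >> 16)

def full_checksum(words):
    """Checksum computed from scratch over every 16-bit word."""
    s = 0
    for w in words:
        s = ones_add(s, w)
    return ~s & 0xFFFF

def replace2(check, old, new):
    """Incremental update: HC' = ~(~HC + ~m + m')."""
    s = ones_add(~check & 0xFFFF, ~old & 0xFFFF)
    s = ones_add(s, new)
    return ~s & 0xFFFF

# IPv4 header as 16-bit words, with the checksum field (index 5) zeroed.
header = [0x4500, 0x0073, 0x0000, 0x4000, 0x4011, 0x0000,
          0xC0A8, 0x0001, 0xC0A8, 0x00C7]
check = full_checksum(header)

# Decrement the TTL (high byte of word 4) and patch the checksum in place.
old_word, new_word = header[4], header[4] - 0x0100
header[4] = new_word
patched = replace2(check, old_word, new_word)
```

The patched value matches a full recomputation over the modified header, but touches three words instead of ten.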

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15 09:12:25 -04:00
David S. Miller
296b60109e Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jesse/openvswitch
Jesse Gross says:

====================
A few different bug fixes, including several for issues with userspace
communication that have gone unnoticed up until now.  These are intended
for net/3.9.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15 09:00:39 -04:00
Reilly Grant
2a89f9247a VSOCK: Support VM sockets connected to the hypervisor.
The resource ID used for VM socket control packets (0) is already
used for the VMCI_GET_CONTEXT_ID hypercall so a new ID (15) must be
used when the guest sends these datagrams to the hypervisor.

The hypervisor context ID must also be removed from the internal
blacklist.

Signed-off-by: Reilly Grant <grantr@vmware.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-15 08:26:26 -04:00
Florian Westphal
a82783c91d netfilter: ip6t_NPT: restrict to mangle table
As the translation is stateless, using it in nat table
doesn't work (only initial packet is translated).
filter table OUTPUT works but won't re-route the packet after translation.

Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 12:58:21 +01:00
Pablo Neira Ayuso
1cdb09056b netfilter: nfnetlink_queue: use xor hash function to distribute instances
Thanks to Eric Dumazet for suggesting this during the NFWS.
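
As a rough illustration of why an XOR fold spreads queue numbers evenly, the sketch below compares it with an OR-based fold over all 16-bit queue numbers. The bucket count of 16 and both function names are assumptions made for this example, and the OR variant is included only as a contrast:

```python
from collections import Counter

INSTANCE_BUCKETS = 16  # assumed bucket count for this sketch

def xor_hash(queue_num):
    """Fold the high byte into the low byte with XOR."""
    return ((queue_num >> 8) ^ queue_num) % INSTANCE_BUCKETS

def or_hash(queue_num):
    """Contrasting fold with OR, which biases towards buckets with more bits set."""
    return ((queue_num >> 8) | queue_num) % INSTANCE_BUCKETS

# Count how many of the 65536 possible queue numbers land in each bucket.
xor_counts = Counter(xor_hash(q) for q in range(1 << 16))
or_counts = Counter(or_hash(q) for q in range(1 << 16))
```

With XOR every bucket receives the same number of queue numbers, while the OR fold leaves bucket 0 nearly empty.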

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 12:38:40 +01:00
Pablo Neira Ayuso
bae99f7a1d netfilter: nfnetlink_queue: fix incorrect initialization of copy range field
2^16 - 1 = 0xffff, not 0xfffff (note the extra 'f'). Not dangerous, since it is
adjusted to min_t(data_len, skb->len) right afterwards.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 12:35:49 +01:00
Gao feng
0d98da5d84 netfilter: nf_conntrack: register pernet subsystem before register L4 proto
In (c296bb4 netfilter: nf_conntrack: refactor l4proto support for netns)
the l4proto gre/dccp/udplite/sctp registration happened before the pernet
subsystem, which is wrong.

Register the pernet subsystem before registering the L4proto, since after the
L4proto is registered, init_conntrack may try to access resources that are
allocated in register_pernet_subsys.

Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Cc: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 12:29:25 +01:00
Gao feng
fa900b9cf5 netfilter: ebt_ulog: remove unnecessary spin lock protection
No need for spinlock to protect the netlink skb in the
ebt_ulog_fini path. We are sure there is no one using it
at that stage.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 11:56:09 +01:00
Hannes Frederic Sowa
d00bd3d4fb netfilter: nf_ct_ipv6: use ipv6_iface_scope_id in conntrack to return scope id
As in (842df07 ipv6: use newly introduced __ipv6_addr_needs_scope_id and
ipv6_iface_scope_id).

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 11:48:43 +01:00
Silviu-Mihai Popescu
5eb358d029 bridge: netfilter: use PTR_RET instead of IS_ERR + PTR_ERR
This uses PTR_RET instead of IS_ERR and PTR_ERR in order to increase
readability.

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 11:03:56 +01:00
Silviu-Mihai Popescu
015ba03c1a ipv4: netfilter: use PTR_RET instead of IS_ERR + PTR_ERR
This uses PTR_RET instead of IS_ERR and PTR_ERR in order to increase
readability.

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 11:02:14 +01:00
YOSHIFUJI Hideaki
2d2fd8c50a netfilter: ip6t_NPT: Use csum_partial()
[ Some fixes went into mainstream before this patch, so I needed
  to rebase it upon the current tree, that's why it's different from
  the original one posted on the list --pablo ]

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-15 11:02:04 +01:00
Vinicius Costa Gomes
eb20ff9c91 Bluetooth: Fix not closing SCO sockets in the BT_CONNECT2 state
With deferred setup for SCO, it is possible that userspace closes the
socket when it is in the BT_CONNECT2 state, after the Connect Request is
received but before the Accept Synchronous Connection is sent.

When this happens, the following crash is observed when the connection is
terminated:

[  +0.000003] hci_sync_conn_complete_evt: hci0 status 0x10
[  +0.000005] sco_connect_cfm: hcon ffff88003d1bd800 bdaddr 40:98:4e:32:d7:39 status 16
[  +0.000003] sco_conn_del: hcon ffff88003d1bd800 conn ffff88003cc8e300, err 110
[  +0.000015] BUG: unable to handle kernel NULL pointer dereference at 0000000000000199
[  +0.000906] IP: [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000] PGD 3d21f067 PUD 3d291067 PMD 0
[  +0.000000] Oops: 0002 [#1] SMP
[  +0.000000] Modules linked in: rfcomm bnep btusb bluetooth
[  +0.000000] CPU 0
[  +0.000000] Pid: 1481, comm: kworker/u:2H Not tainted 3.9.0-rc1-25019-gad82cdd #1 Bochs Bochs
[  +0.000000] RIP: 0010:[<ffffffff810620dd>]  [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000] RSP: 0018:ffff88003c3c19d8  EFLAGS: 00010002
[  +0.000000] RAX: 0000000000000001 RBX: 0000000000000246 RCX: 0000000000000000
[  +0.000000] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff88003d1be868
[  +0.000000] RBP: ffff88003c3c1a98 R08: 0000000000000002 R09: 0000000000000000
[  +0.000000] R10: ffff88003d1be868 R11: ffff88003e20b000 R12: 0000000000000002
[  +0.000000] R13: ffff88003aaa8000 R14: 000000000000006e R15: ffff88003d1be850
[  +0.000000] FS:  0000000000000000(0000) GS:ffff88003e200000(0000) knlGS:0000000000000000
[  +0.000000] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  +0.000000] CR2: 0000000000000199 CR3: 000000003c1cb000 CR4: 00000000000006b0
[  +0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  +0.000000] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  +0.000000] Process kworker/u:2H (pid: 1481, threadinfo ffff88003c3c0000, task ffff88003aaa8000)
[  +0.000000] Stack:
[  +0.000000]  ffffffff81b16342 0000000000000000 0000000000000000 ffff88003d1be868
[  +0.000000]  ffffffff00000000 00018c0c7863e367 000000003c3c1a28 ffffffff8101efbd
[  +0.000000]  0000000000000000 ffff88003e3d2400 ffff88003c3c1a38 ffffffff81007c7a
[  +0.000000] Call Trace:
[  +0.000000]  [<ffffffff8101efbd>] ? kvm_clock_read+0x34/0x3b
[  +0.000000]  [<ffffffff81007c7a>] ? paravirt_sched_clock+0x9/0xd
[  +0.000000]  [<ffffffff81007fd4>] ? sched_clock+0x9/0xb
[  +0.000000]  [<ffffffff8104fd7a>] ? sched_clock_local+0x12/0x75
[  +0.000000]  [<ffffffff810632d1>] lock_acquire+0x93/0xb1
[  +0.000000]  [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffff8105f3d8>] ? lock_release_holdtime.part.22+0x4e/0x55
[  +0.000000]  [<ffffffff814f6038>] _raw_spin_lock+0x40/0x74
[  +0.000000]  [<ffffffffa0022339>] ? spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffff814f6936>] ? _raw_spin_unlock+0x23/0x36
[  +0.000000]  [<ffffffffa0022339>] spin_lock+0x9/0xb [bluetooth]
[  +0.000000]  [<ffffffffa00230cc>] sco_conn_del+0x76/0xbb [bluetooth]
[  +0.000000]  [<ffffffffa002391d>] sco_connect_cfm+0x2da/0x2e9 [bluetooth]
[  +0.000000]  [<ffffffffa000862a>] hci_proto_connect_cfm+0x38/0x65 [bluetooth]
[  +0.000000]  [<ffffffffa0008d30>] hci_sync_conn_complete_evt.isra.79+0x11a/0x13e [bluetooth]
[  +0.000000]  [<ffffffffa000cd96>] hci_event_packet+0x153b/0x239d [bluetooth]
[  +0.000000]  [<ffffffff814f68ff>] ? _raw_spin_unlock_irqrestore+0x48/0x5c
[  +0.000000]  [<ffffffffa00025f6>] hci_rx_work+0xf3/0x2e3 [bluetooth]
[  +0.000000]  [<ffffffff8103efed>] process_one_work+0x1dc/0x30b
[  +0.000000]  [<ffffffff8103ef83>] ? process_one_work+0x172/0x30b
[  +0.000000]  [<ffffffff8103e07f>] ? spin_lock_irq+0x9/0xb
[  +0.000000]  [<ffffffff8103fc8d>] worker_thread+0x123/0x1d2
[  +0.000000]  [<ffffffff8103fb6a>] ? manage_workers+0x240/0x240
[  +0.000000]  [<ffffffff81044211>] kthread+0x9d/0xa5
[  +0.000000]  [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60
[  +0.000000]  [<ffffffff814f75bc>] ret_from_fork+0x7c/0xb0
[  +0.000000]  [<ffffffff81044174>] ? __kthread_parkme+0x60/0x60
[  +0.000000] Code: d7 44 89 8d 50 ff ff ff 4c 89 95 58 ff ff ff e8 44 fc ff ff 44 8b 8d 50 ff ff ff 48 85 c0 4c 8b 95 58 ff ff ff 0f 84 7a 04 00 00 <f0> ff 80 98 01 00 00 83 3d 25 41 a7 00 00 45 8b b5 e8 05 00 00
[  +0.000000] RIP  [<ffffffff810620dd>] __lock_acquire+0xed/0xe82
[  +0.000000]  RSP <ffff88003c3c19d8>
[  +0.000000] CR2: 0000000000000199
[  +0.000000] ---[ end trace e73cd3b52352dd34 ]---

Cc: stable@vger.kernel.org [3.8]
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@openbossa.org>
Tested-by: Frederic Dalleau <frederic.dalleau@intel.com>
Signed-off-by: Gustavo Padovan <gustavo.padovan@collabora.co.uk>
2013-03-14 13:14:21 -03:00
Eric Dumazet
16fad69cfe tcp: fix skb_availroom()
The Chrome OS team reported a crash on a Pixel Chromebook in the TCP stack:

https://code.google.com/p/chromium/issues/detail?id=182056

commit a21d45726a (tcp: avoid order-1 allocations on wifi and tx
path) made a poor choice by adding an 'avail_size' field to skb, while
what we really needed was a 'reserved_tailroom' one.

It would have avoided commit 22b4a4f22d (tcp: fix retransmit of
partially acked frames) and this commit.

The crash occurs because skb_split() is not aware of the 'avail_size'
management (and should not be aware).

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Mukesh Agrawal <quiche@chromium.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-14 11:49:45 -04:00
Linus Torvalds
aea8b5d1e5 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace bugfixes from Eric Biederman:
 "This tree includes a partial revert for "fs: Limit sys_mount to only
  request filesystem modules." When I added the new style module aliases
  to the filesystems I deleted the old ones.  A bad move.  It turns out
  that distributions like Arch linux use module aliases when
  constructing ramdisks.  Which meant ultimately that an ext3 filesystem
  mounted with ext4 would not result in the ext4 module being put into
  the ramdisk.

  The other change in this tree adds a handful of filesystem module
  aliases I simply failed to add the first time.  Which inconvenienced a
  few folks using cifs.

  I don't want to inconvenience folks any longer than I have to so here
  are these trivial fixes."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  fs: Readd the fs module aliases.
  fs: Limit sys_mount to only request filesystem modules. (Part 3)
2013-03-13 15:47:50 -07:00
Martin Hundebøll
2df5278b02 batman-adv: network coding - receive coded packets and decode them
When receiving a network coded packet, the decoding buffer is searched
for a packet to use for decoding. The source, destination, and crc32 from
the coded packet are used to identify the wanted packet. The decoded
packet is passed to the usual unicast receiver function, as if it had never
been network coded.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-13 22:53:51 +01:00
Martin Hundebøll
612d2b4fe0 batman-adv: network coding - save overheard and tx packets for decoding
To be able to decode a network coded packet, a node must already know
one of the two coded packets. This is done by buffering skbs before
transmission and buffering packets sniffed with promiscuous mode from
other hosts.

Packets are kept in a buffer similar to the one with forward-skbs: A
hash table, where each entry, which corresponds to a src-dst pair, has a
linked list of packets.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-13 22:53:50 +01:00
Martin Hundebøll
3c12de9a5c batman-adv: network coding - code and transmit packets if possible
Before adding forward-skbs to the coding buffer, the buffer is searched
for a potential coding opportunity. If one is found, the two packets are
network coded and transmitted right away. If not, the forward-skb is
added to the buffer.

Network coded packets are transmitted with information about the two
receivers and the two coded packets. The first receiver is given by the
MAC header, while the second is given in the payload/bat-header. The
second receiver uses promiscuous mode to receive the packet and check
the second destination.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-13 22:53:50 +01:00
Martin Hundebøll
953324776d batman-adv: network coding - buffer unicast packets before forward
To be able to network code two packets, one packet must be buffered
until the next is available. This is done in a "coding buffer", which is
essentially a hash table with lists of packets. Each entry in the hash
table corresponds to a specific src-dst pair, which has a linked list of
packets that are buffered.

This patch adds skbs to the buffer just before forwarding them. The
buffer is traversed every 10 ms, where timed skbs are removed from the
buffer and transmitted. To allow experiments with the network coding
scheme, the timeout is tunable through a file in debugfs.
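
The buffer described above can be modelled roughly as follows. This is a simplified userspace sketch, not the batman-adv code: the class name, method names, and the default timeout value are hypothetical, and plain byte strings stand in for skbs.

```python
from collections import defaultdict

class CodingBuffer:
    """Hash table keyed by (src, dst); each entry holds a list of packets."""

    def __init__(self, timeout_ms=10):
        self.timeout_ms = timeout_ms      # tunable, standing in for the debugfs knob
        self.table = defaultdict(list)    # (src, dst) -> [(queued_at_ms, packet)]

    def add(self, src, dst, packet, now_ms):
        """Buffer a packet just before it would be forwarded."""
        self.table[(src, dst)].append((now_ms, packet))

    def purge(self, now_ms):
        """Remove timed-out packets and return them for plain forwarding."""
        expired = []
        for key in list(self.table):
            keep = []
            for queued_at, packet in self.table[key]:
                if now_ms - queued_at >= self.timeout_ms:
                    expired.append(packet)
                else:
                    keep.append((queued_at, packet))
            if keep:
                self.table[key] = keep
            else:
                del self.table[key]       # drop empty src-dst entries
        return expired
```

A periodic worker would call `purge()` with the current time and transmit whatever it returns uncoded.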

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-13 22:53:49 +01:00
Martin Hundebøll
d56b1705e2 batman-adv: network coding - detect coding nodes and remove these after timeout
To use network coding efficiently, a relay must know when neighbor nodes
are likely to have enough information to be able to decode a network
coded packet. This is detected by using OGMs from batman-adv to discover
when one neighbor is in range of another neighbor. The relay checks the
TTL to detect when an OGM is forwarded from one neighbor by another
neighbor, and thereby knows that the two neighbors are in range and thus
overhear packets sent by each other.

This information is saved in the orig_node struct to be used when
searching for coding opportunities. Two lists are added to the
orig_node struct: One for neighbors that can hear the orig_node
(outgoing nc_nodes) and one for neighbors that the orig_node can hear
(incoming nc_nodes).

Information about nc_nodes is kept for 10 seconds and is available
through debugfs in batman_adv/nc_nodes to use when debugging network
coding.

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-13 22:53:49 +01:00
Martin Hundebøll
d353d8d4d9 batman-adv: network coding - add the initial infrastructure code
Network coding exploits the 802.11 shared medium to allow multiple
packets to be sent in a single transmission. In brief, a relay can XOR
two packets, and send the coded packet to two destinations. The
receivers can decode one of the original packets by XOR'ing the coded
packet with the other original packet. This will lead to increased
throughput in topologies where two packets cross one relay.

In a simple topology with three nodes, it takes four transmissions
without network coding to get one packet from Node A to Node B and one
from Node B to Node A:

 1.  Node A  ---- p1 --->  Node R                Node B
 2.  Node A                Node R  <--- p2 ----  Node B
 3.  Node A  <--- p2 ----  Node R                Node B
 4.  Node A                Node R  ---- p1 --->  Node B

With network coding, the relay only needs one transmission, which saves
us one slot of valuable airtime:

 1.  Node A  ---- p1 --->  Node R                Node B
 2.  Node A                Node R  <--- p2 ----  Node B
 3.  Node A  <- p1 x p2 -  Node R  - p1 x p2 ->  Node B
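
The coding step itself is a byte-wise XOR. A minimal sketch, assuming hypothetical helper names and zero-padding of the shorter packet (how the real implementation carries the original lengths is not modelled here):

```python
def xor_code(p1, p2):
    """XOR two packets byte by byte, zero-padding the shorter one."""
    n = max(len(p1), len(p2))
    a = p1.ljust(n, b"\x00")
    b = p2.ljust(n, b"\x00")
    return bytes(x ^ y for x, y in zip(a, b))

def xor_decode(coded, known, wanted_len):
    """Recover the unknown packet from the coded one and the packet already held."""
    return xor_code(coded, known)[:wanted_len]

p1 = b"hello from A"      # sent by Node A, wanted by Node B
p2 = b"hi from B"         # sent by Node B, wanted by Node A
coded = xor_code(p1, p2)  # the single transmission made by Node R

at_node_a = xor_decode(coded, p1, len(p2))  # Node A still holds p1
at_node_b = xor_decode(coded, p2, len(p1))  # Node B still holds p2
```

Each receiver needs only the packet it sent itself (or overheard) to undo the XOR.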

The same principle holds for a topology including five nodes. Here the
packets from Node A and Node B are overheard by Node C and Node D,
respectively. This allows Node R to send a network coded packet to save
one transmission:

   Node A                  Node B

    |     \              /    |
    |      p1          p2     |
    |       \          /      |
    p1       > Node R <       p2
    |                         |
    |         /      \        |
    |    p1 x p2    p1 x p2   |
    v       /          \      v
           /            \
   Node C <              > Node D

More information is available on the open-mesh.org wiki[1].

This patch adds the initial code to support network coding in
batman-adv. It sets up a worker thread to do house keeping and adds a
sysfs file to enable/disable network coding. The feature is disabled by
default, as it requires a wifi-driver with working promiscuous mode, and
also because it adds a small delay at each hop.

[1] http://www.open-mesh.org/projects/batman-adv/wiki/Catwoman

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-13 22:53:48 +01:00
Antonio Quartulli
c1d07431b9 batman-adv: don't use !! in bool conversion
In the C standard, any expression different from 0 is converted to
'true' when cast to bool (regardless of the width of the value).
Therefore all the "!!" conversions can be removed.

Signed-off-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
2013-03-13 22:53:48 +01:00
Martin Hundebøll
f86ce0ad10 batman-adv: Return reason for failure in batadv_check_unicast_packet()
batadv_check_unicast_packet() is changed to return a value based on the
reason to drop the packet, which will be useful information for
future users of batadv_check_unicast_packet().

Signed-off-by: Martin Hundebøll <martin@hundeboll.net>
Acked-by: Antonio Quartulli <ordex@autistici.org>
Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-13 22:53:47 +01:00
Marek Lindner
736292c2e8 batman-adv: replace redundant primary_if_get calls
The batadv_priv struct carries a pointer to its own interface
struct. Therefore, it is not necessary to retrieve the soft_iface
via the primary interface.

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-13 22:53:47 +01:00
Xufeng Zhang
2317f449af sctp: don't break the loop while meeting the active_path so as to find the matched transport
The sctp_assoc_lookup_tsn() function searches for the transport a certain TSN
was sent on. If it is not found on the active_path transport, it goes on to
search all the other transports in the peer's transport_addr_list. However, we
should continue to the next entry rather than break out of the loop when we
meet the active_path transport.
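
The control-flow difference can be sketched in a few lines. This is a simplified model with hypothetical names (`Transport`, `sent_tsns`, `lookup_tsn`), not the SCTP data structures:

```python
class Transport:
    def __init__(self, name, sent_tsns):
        self.name = name
        self.sent_tsns = set(sent_tsns)

def lookup_tsn(active_path, transport_list, tsn):
    """Return the transport the TSN was sent on, or None."""
    # The active path is checked first.
    if tsn in active_path.sent_tsns:
        return active_path
    for transport in transport_list:
        if transport is active_path:
            # Already checked above: skip it and keep scanning.
            # A `break` here would hide every transport listed after it.
            continue
        if tsn in transport.sent_tsns:
            return transport
    return None
```

With `break`, a TSN sent on any transport that follows the active path in the list would never be found.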

Signed-off-by: Xufeng Zhang <xufeng.zhang@windriver.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-13 10:09:55 -04:00
Vlad Yasevich
f281563350 sctp: Use correct sideffect command in duplicate cookie handling
When SCTP is done processing a duplicate cookie chunk, it tries
to delete a newly created association.  For that, it has to set
the right association for the side-effect processing to work.
However, when it uses the SCTP_CMD_NEW_ASOC command, that performs
more work than really needed (like hashing the association and
assigning it an id) and there is no point to do that only to
delete the association as a next step.  In fact, it also creates
an impossible condition where an association may be found by
the getsockopt() call, and that association is empty.  This
causes a crash in some sctp getsockopts.

The solution is rather simple.  We simply use SCTP_CMD_SET_ASOC
command that doesn't have all the overhead and does exactly
what we need.

Reported-by: Karl Heiss <kheiss@gmail.com>
Tested-by: Karl Heiss <kheiss@gmail.com>
CC: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-13 09:59:21 -04:00
Eric W. Biederman
fa7614ddd6 fs: Readd the fs module aliases.
I had assumed that the only use of module aliases for filesystems
prior to "fs: Limit sys_mount to only request filesystem modules."
was in request_module.  It turns out I was wrong.  At least mkinitcpio
in Arch linux uses these aliases.

So readd the preexisting aliases, to keep from breaking userspace.

Userspace eventually will have to follow and use the same aliases the
kernel does.  So at some point we may be able to delete these aliases without
problems.  However that day is not today.

Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-12 18:55:21 -07:00
Linus Torvalds
368edaadc0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph fix from Sage Weil:
 "This fixes a bug in the new message decoding that just went in during
  the last window."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
  libceph: fix decoding of pgids
2013-03-12 09:22:42 -07:00
Linus Torvalds
5b22b1848b Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd bugfixes from Bruce Fields:
 "Some minor fallout from the user-namespace work broke most krb5 mounts
  to nfsd, and I screwed up a change to the AF_LOCAL rpc code."

* 'for-3.9' of git://linux-nfs.org/~bfields/linux:
  sunrpc: don't attempt to cancel unitialized work
  nfsd: fix krb5 handling of anonymous principals
2013-03-12 09:20:58 -07:00
Li RongQing
c80a8512ee net/core: move vlan_depth out of while loop in skb_network_protocol()
[ Bug added in commit 05e8ef4ab2 (net: factor out
  skb_mac_gso_segment() from skb_gso_segment() ) ]

Move vlan_depth out of the while loop; otherwise vlan_depth is always ETH_HLEN,
cannot be increased, and leads to an infinite loop when a frame has two VLAN headers.
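
A userspace sketch of the corrected loop shape, parsing raw frame bytes. The function name and the byte-string frame are stand-ins for this example; the point is that the offset is initialised once and advances with every tag:

```python
import struct

ETH_HLEN = 14         # Ethernet header length
VLAN_HLEN = 4         # one VLAN tag: 2-byte TCI + 2-byte encapsulated protocol
ETH_P_8021Q = 0x8100

def network_protocol(frame):
    """Return (ethertype, network header offset), skipping stacked VLAN tags."""
    (proto,) = struct.unpack_from("!H", frame, 12)
    vlan_depth = ETH_HLEN            # initialised once, outside the loop
    while proto == ETH_P_8021Q:
        if len(frame) < vlan_depth + VLAN_HLEN:
            return None              # truncated frame
        # The encapsulated protocol follows the 2-byte TCI of each tag.
        (proto,) = struct.unpack_from("!H", frame, vlan_depth + 2)
        vlan_depth += VLAN_HLEN      # advance to the next tag
    return proto, vlan_depth

# A frame with two VLAN tags in front of an IPv4 payload.
double_tagged = bytes(12) + struct.pack("!HHHHH", 0x8100, 1, 0x8100, 2, 0x0800) + b"payload"
result = network_protocol(double_tagged)
```

If `vlan_depth` were reset to `ETH_HLEN` on every iteration, the loop would keep re-reading the first tag of this frame and never terminate.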

Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 11:47:40 -04:00
Nandita Dukkipati
9b717a8d24 tcp: TLP loss detection.
This is the second of the TLP patch series; it augments the basic TLP
algorithm with a loss detection scheme.

This patch implements a mechanism for loss detection when a Tail
loss probe retransmission plugs a hole thereby masking packet loss
from the sender. The loss detection algorithm relies on counting
TLP dupacks as outlined in Sec. 3 of:
http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01

The basic idea is: Sender keeps track of TLP "episode" upon
retransmission of a TLP packet. An episode ends when the sender receives
an ACK above the SND.NXT (tracked by tlp_high_seq) at the time of the
episode. We want to make sure that before the episode ends the sender
receives a "TLP dupack", indicating that the TLP retransmission was
unnecessary, so there was no loss/hole that needed plugging. If the
sender gets no TLP dupack before the end of the episode, then it reduces
ssthresh and the congestion window, because the TLP packet arriving at
the receiver probably plugged a hole.
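
The episode logic can be modelled as a small state machine. This is a deliberately simplified sketch: the class and method names are hypothetical, and the congestion response is reduced to a counter. Only `tlp_high_seq` is taken from the description above.

```python
class TlpEpisode:
    """Tracks one TLP episode and counts congestion window reductions."""

    def __init__(self):
        self.tlp_high_seq = None     # SND.NXT when the probe was retransmitted
        self.cwnd_reductions = 0

    def on_tlp_retransmit(self, snd_nxt):
        """A TLP retransmission starts an episode."""
        self.tlp_high_seq = snd_nxt

    def on_ack(self, ack_seq, is_tlp_dupack):
        if self.tlp_high_seq is None:
            return                   # no episode in progress
        if is_tlp_dupack and ack_seq == self.tlp_high_seq:
            # The probe was redundant: nothing was lost, end the episode.
            self.tlp_high_seq = None
        elif ack_seq > self.tlp_high_seq:
            # The episode ended without a TLP dupack: the probe probably
            # plugged a hole, so respond as for a loss.
            self.cwnd_reductions += 1
            self.tlp_high_seq = None
```

An ACK at or below `tlp_high_seq` that is not a TLP dupack leaves the episode open.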

Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 08:30:34 -04:00
Nandita Dukkipati
6ba8a3b19e tcp: Tail loss probe (TLP)
This patch series implements the Tail loss probe (TLP) algorithm described
in http://tools.ietf.org/html/draft-dukkipati-tcpm-tcp-loss-probe-01. The
first patch implements the basic algorithm.

TLP's goal is to reduce tail latency of short transactions. It achieves
this by converting retransmission timeouts (RTOs) occurring due
to tail losses (losses at end of transactions) into fast recovery.
TLP transmits one packet in two round-trips when a connection is in
Open state and isn't receiving any ACKs. The transmitted packet, aka
loss probe, can be either new or a retransmission. When there is tail
loss, the ACK from a loss probe triggers FACK/early-retransmit based
fast recovery, thus avoiding a costly RTO. In the absence of loss,
there is no change in the connection state.

PTO stands for probe timeout. It is a timer event indicating
that an ACK is overdue and triggers a loss probe packet. The PTO value
is set to max(2*SRTT, 10ms) and is adjusted to account for delayed
ACK timer when there is only one outstanding packet.

TLP Algorithm

On transmission of new data in Open state:
  -> packets_out > 1: schedule PTO in max(2*SRTT, 10ms).
  -> packets_out == 1: schedule PTO in max(2*RTT, 1.5*RTT + 200ms)
  -> PTO = min(PTO, RTO)

Conditions for scheduling PTO:
  -> Connection is in Open state.
  -> Connection is either cwnd limited or no new data to send.
  -> Number of probes per tail loss episode is limited to one.
  -> Connection is SACK enabled.

When PTO fires:
  new_segment_exists:
    -> transmit new segment.
    -> packets_out++. cwnd remains same.

  no_new_packet:
    -> retransmit the last segment.
       Its ACK triggers FACK or early retransmit based recovery.

ACK path:
  -> rearm RTO at start of ACK processing.
  -> reschedule PTO if need be.
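
The PTO scheduling rule from the algorithm above works out as follows. This is a sketch in milliseconds with a hypothetical function name, using a single smoothed RTT value for both the SRTT and RTT terms:

```python
def probe_timeout_ms(srtt_ms, packets_out, rto_ms):
    """Probe timeout for new data sent in Open state, per the rules above."""
    if packets_out == 1:
        # Leave room for the peer's delayed ACK timer.
        pto = max(2 * srtt_ms, 1.5 * srtt_ms + 200)
    else:
        pto = max(2 * srtt_ms, 10)
    # The probe must never fire later than the retransmission timeout.
    return min(pto, rto_ms)
```

For example, with a 20 ms SRTT and a 300 ms RTO, three outstanding packets give a 40 ms PTO, while a single outstanding packet gives 230 ms.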

In addition, the patch includes a small variation to the Early Retransmit
(ER) algorithm, such that ER and TLP together can in principle recover any
N-degree of tail loss through fast recovery. TLP is controlled by the same
sysctl as ER, tcp_early_retrans sysctl.
tcp_early_retrans==0; disables TLP and ER.
                 ==1; enables RFC5827 ER.
                 ==2; delayed ER.
                 ==3; TLP and delayed ER. [DEFAULT]
                 ==4; TLP only.

The TLP patch series has been extensively tested on Google Web servers.
It is most effective for short Web transactions, where it reduced RTOs by 15%
and improved HTTP response time (average by 6%, 99th percentile by 10%).
The transmitted probes account for <0.5% of the overall transmissions.

Signed-off-by: Nandita Dukkipati <nanditad@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 08:30:34 -04:00
Wei Yongjun
74694e7bd0 bridge: using for_each_set_bit to simplify the code
Using for_each_set_bit() to simplify the code.
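
For readers unfamiliar with the helper, the iteration pattern amounts to visiting only the positions of set bits. A Python sketch of that pattern, where the generator name and the `start` parameter (mirroring the "_from" variant) are illustrative:

```python
def each_set_bit(value, size, start=0):
    """Yield the index of every set bit in value, from start up to size - 1."""
    for bit in range(start, size):
        if value & (1 << bit):
            yield bit

bitmap = 0b101001
all_bits = list(each_set_bit(bitmap, 8))            # scan the whole bitmap
from_one = list(each_set_bit(bitmap, 8, start=1))   # resume from bit 1
```

This replaces an open-coded loop that tests every bit position by hand.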

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 08:04:09 -04:00
Wei Yongjun
5096e3c4b2 bridge: using for_each_set_bit_from to simplify the code
Using for_each_set_bit_from() to simplify the code.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 08:04:08 -04:00
David S. Miller
e5f2ef7ab4 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/intel/e1000e/netdev.c

Minor conflict in e1000e, a line that got fixed in 'net'
has been removed in 'net-next'.

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:52:22 -04:00
stephen hemminger
3da889b616 bridge: reserve space for IFLA_BRPORT_FAST_LEAVE
When the bridge multicast fast leave feature was added, sufficient space
was not reserved in the netlink message. This means the flag may be
lost in netlink events and results of queries.

Found by observation while looking up some netlink stuff for discussion with Vlad.
Problem introduced by commit c2d3babfaf
Author: David S. Miller <davem@davemloft.net>
Date:   Wed Dec 5 16:24:45 2012 -0500

    bridge: implement multicast fast leave

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:38:29 -04:00
David S. Miller
2230e0c193 Included changes are:
- fix packet parsing routine to avoid reading beyond the packet boundary

Merge tag 'batman-adv-fix-for-davem' of git://git.open-mesh.org/linux-merge

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:36:52 -04:00
David Ward
4660c7f498 net/ipv4: Ensure that location of timestamp option is stored
This is needed in order to detect if the timestamp option appears
more than once in a packet, to remove the option if the packet is
fragmented, etc. My previous change neglected to store the option
location when the router addresses were prespecified and Pointer >
Length. But now the option location is also stored when Flag is an
unrecognized value, to ensure these option handling behaviors are
still performed.

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:35:39 -04:00
Michael Dalton
e1733de224 flow_dissector: support L2 GRE
Add support for L2 GRE tunnels, so that RPS can be more effective.

Signed-off-by: Michael Dalton <mwdalton@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-12 05:14:06 -04:00
Marek Lindner
b47506d912 batman-adv: verify tt len does not exceed packet len
batadv_iv_ogm_process() accesses the packet using the tt_num_changes
attribute regardless of the real packet len (assuming the length check
was done before). Therefore a length check is needed to avoid reading
random memory.
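
The kind of bounds check described here can be sketched as below. Both size constants are hypothetical placeholders chosen for the example, not the real batman-adv header or entry sizes, and the function name is illustrative:

```python
OGM_HLEN = 24      # hypothetical fixed OGM header size, in bytes
TT_ENTRY_LEN = 7   # hypothetical size of one tt change entry, in bytes

def tt_len_ok(packet_len, tt_num_changes):
    """True if the advertised tt changes fit inside the received packet."""
    return OGM_HLEN + tt_num_changes * TT_ENTRY_LEN <= packet_len
```

The check runs before the tt entries are read, so a packet advertising more changes than it carries is rejected instead of being parsed past its end.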

Signed-off-by: Marek Lindner <lindner_marek@yahoo.de>
Signed-off-by: Antonio Quartulli <ordex@autistici.org>
2013-03-11 22:59:47 +01:00
Sage Weil
d6c0dd6b0c libceph: fix decoding of pgids
In 4f6a7e5ee1 we effectively dropped support
for the legacy encoding for the OSDMap and incremental.  However, we didn't
fix the decoding for the pgid.

Signed-off-by: Sage Weil <sage@inktank.com>
Reviewed-by: Yehuda Sadeh <yehuda@inktank.com>
2013-03-11 14:31:00 -07:00
Linus Torvalds
0cb7750825 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) Missing cancel of work items in mac80211 MLME, from Ben Greear.

 2) Fix DMA mapping handling in iwlwifi by using coherent DMA for
    command headers, from Johannes Berg.

 3) Decrease the amount of pressure on the page allocator by using order
    1 pages less in iwlwifi, from Emmanuel Grumbach.

 4) Fix mesh PS broadcast OOPS in mac80211, from Marco Porsch.

 5) Don't forget to recalculate idle state in mac80211 monitor
    interface, from Felix Fietkau.

 6) Fix varargs in netfilter conntrack handler, from Joe Perches.

 7) Need to reset entire chip when command queue fills up in iwlwifi,
    from Emmanuel Grumbach.

 8) The TX antenna value must be valid when calibrations are performed
    in iwlwifi, fix from Dor Shaish.

 9) Don't generate netfilter audit log entries when audit is disabled,
    from Gao Feng.

10) Deal with DMA unit hang on e1000e during power state transitions,
    from Bruce Allan.

11) Remove BUILD_BUG_ON check from igb driver, from Alexander Duyck.

12) Fix lockdep warning on i2c handling of igb driver, from Carolyn
    Wyborny.

13) Fix several TTY handling issues in IRDA ircomm tty driver, from
    Peter Hurley.

14) Several QFQ packet scheduler fixes from Paolo Valente.

15) When VXLAN encapsulates on transmit, we have to reset the netfilter
    state.  From Zang MingJie.

16) Fix jiffie check in net_rx_action() so that we really cap the
    processing at 2HZ.  From Eric Dumazet.

17) Fix erroneous trigger of IP option space exhaustion, when routers
    are pre-specified and we are looking to see if we can insert a
    timestamp, we will have the space.  From David Ward.

18) Fix various issues in benet driver wrt waiting for firmware to
    finish POST after resets or errors.  From Gavin Shan and Sathya
    Perla.

19) Fix TX locking in SFC driver, from Ben Hutchings.

20) Like the VXLAN fix above, when we encap in a TUN device we have to
    reset the netfilter state.  This should fix several strange crashes
    reported by Dave Jones and others.  From Eric Dumazet.

21) Don't forget to clean up MAC address resources when shutting down a
    port in mlx4 driver, from Yan Burman.

22) Fix divide by zero in vmxnet3 driver, from Bhavesh Davda.

23) Fix device statistic regression in tg3 when the driver is using
    phylib, from Nithin Sujir.

24) Fix info leak in several netlink handlers, from Mathias Krause.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (79 commits)
  6lowpan: Fix endianness issue in is_addr_link_local().
  rrunner.c: fix possible memory leak in rr_init_one()
  dcbnl: fix various netlink info leaks
  rtnl: fix info leak on RTM_GETLINK request for VF devices
  bridge: fix mdb info leaks
  tg3: Update link_up flag for phylib devices
  ipv6: stop multicast forwarding to process interface scoped addresses
  bridging: fix rx_handlers return code
  netlabel: fix build problems when CONFIG_IPV6=n
  drivers/isdn: checkng length to be sure not memory overflow
  net/rds: zero last byte for strncpy
  bnx2x: Fix SFP+ misconfiguration in iSCSI boot scenario
  bnx2x: Fix intermittent long KR2 link up time
  macvlan: Set IFF_UNICAST_FLT flag to prevent unnecessary promisc mode.
  team: unsyc the devices addresses when port is removed
  bridge: add missing vid to br_mdb_get()
  Fix: sparse warning in inet_csk_prepare_forced_close
  afkey: fix a typo
  MAINTAINERS: Update qlcnic maintainers list
  netlabel: correctly list all the static label mappings
  ...
2013-03-11 07:51:59 -07:00
Valentin Ilie
f4f3efdaf9 net: can: af_can.c: Fix checkpatch warnings
Replace printk(KERN_ERR with pr_err
Add space before {
Removed OOM messages

Signed-off-by: Valentin Ilie <valentin.ilie@gmail.com>
Acked-by: Oliver Hartkopp <socketcan@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-11 07:40:17 -04:00
Thierry Escande
40213fa851 NFC: llcp: Add cleanup support for unreplied SNL requests
If the remote LLC doesn't reply in time to our SNL requests we remove
them from the list of pending requests. The timeout is fixed to an
arbitrary value of 3 times remote_lto.

When not replied, the local LLC broadcasts NFC_EVENT_LLC_SDRES nl events for
the concerned uris with sap values set to LLCP_SDP_UNBOUND (which is 65).
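
The timeout rule can be sketched as follows. This is an illustrative Python model, not the kernel code: the dictionary of pending requests, the millisecond timestamps, and the strict "older than" comparison are assumptions; only the factor of 3 and the value 65 come from the text above.

```python
LLCP_SDP_UNBOUND = 65  # SAP value reported for unanswered lookups (from the text)


def expire_snl_requests(pending, now_ms, remote_lto_ms):
    """Split pending SNL requests into (still_pending, expired_events).

    `pending` maps a URI to the time (ms) its request was sent. A request
    expires once it is older than 3 * remote_lto_ms.
    """
    timeout_ms = 3 * remote_lto_ms
    still_pending = {}
    events = []
    for uri, sent_ms in pending.items():
        if now_ms - sent_ms > timeout_ms:
            # Would be broadcast as an SDRES event with an unbound SAP.
            events.append((uri, LLCP_SDP_UNBOUND))
        else:
            still_pending[uri] = sent_ms
    return still_pending, events
```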

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-10 23:16:41 +01:00
Thierry Escande
d9b8d8e19b NFC: llcp: Service Name Lookup netlink interface
This adds a netlink interface for service name lookup support.
Multiple URIs can be passed nested into the NFC_ATTR_LLC_SDP attribute
using the NFC_CMD_LLC_SDREQ netlink command.
When the SNL reply is received, a NFC_EVENT_LLC_SDRES event is sent to
the user space. URI and SAP tuples are passed back, nested into
NFC_ATTR_LLC_SDP attribute.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-10 23:14:54 +01:00
Thierry Escande
e0ae7bac06 NFC: llcp: Service Name Lookup SDRES aggregation
This modifies the way SDRES PDUs are sent back. If multiple SDREQs are
received within a single SNL PDU, all SDRES replies are sent packed in
one SNL PDU too.

Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-10 23:10:55 +01:00
Thierry Escande
8af362d124 NFC: Add missing type policies for netlink attributes
Signed-off-by: Thierry Escande <thierry.escande@linux.intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-10 22:20:05 +01:00
Samuel Ortiz
8808edb1ec NFC: llcp: Remove redundant printk
We already have a pr_debug for that.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-10 22:20:05 +01:00
Samuel Ortiz
06d44f806a NFC: llcp: Use socket specific link parameters before the local ones
If the socket link options are set, use them before the local one.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-10 22:20:05 +01:00
Samuel Ortiz
26fd76cab2 NFC: llcp: Implement socket options
Some LLCP services (e.g. the validation ones) require some control over
the LLCP link parameters like the receive window (RW) or the MIU extension
(MIUX). This can only be done through socket options.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-10 22:20:05 +01:00
Samuel Ortiz
e4306bec47 NFC: llcp: Rename socket rw and miu fields
They really are remote peer parameters, and we need to distinguish them
from the local ones as we'll modify the latter with socket options.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-10 22:20:05 +01:00
Cong Wang
e8f72ea4a1 ipv6: introduce ip6tunnel_xmit() helper
Similar to iptunnel_xmit(), group these operations into a
helper function.

This by the way fixes the missing u64_stats_update_begin()
and u64_stats_update_end() for 32 bit arch.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 16:53:34 -04:00
YOSHIFUJI Hideaki / 吉藤英明
9026c49272 6lowpan: Fix endianness issue in is_addr_link_local().
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 16:49:35 -04:00
Mathias Krause
22c352195e ipv6: remove superfluous nla_data() NULL pointer checks
nla_data() cannot return NULL, so these NULL pointer checks are
superfluous.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 16:46:09 -04:00
Mathias Krause
29cd8ae0e1 dcbnl: fix various netlink info leaks
The dcb netlink interface leaks stack memory in various places:
* perm_addr[] buffer is only filled at max with 12 of the 32 bytes but
  copied completely,
* no in-kernel driver fills all fields of an IEEE 802.1Qaz subcommand,
  so we're leaking up to 58 bytes for ieee_ets structs, up to 136 bytes
  for ieee_pfc structs, etc.,
* the same is true for CEE -- no in-kernel driver fills the whole
  struct,

Prevent all of the above stack info leaks by properly initializing the
buffers/structures involved.

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 05:19:26 -04:00
Mathias Krause
84d73cd3fb rtnl: fix info leak on RTM_GETLINK request for VF devices
Initialize the mac address buffer with 0 as the driver specific function
will probably not fill the whole buffer. In fact, all in-kernel drivers
fill only ETH_ALEN of the MAX_ADDR_LEN bytes, i.e. 6 of the 32 possible
bytes. Therefore we currently leak 26 bytes of stack memory to userland
via the netlink interface.
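
The arithmetic above (32 - 6 = 26 leaked bytes) can be modelled in a few lines. This is a Python simulation under stated assumptions: a bytearray pre-filled with 0xaa stands in for stale stack memory, and get_vf_mac() is a hypothetical helper, not a kernel function.

```python
ETH_ALEN = 6        # bytes a driver actually writes
MAX_ADDR_LEN = 32   # bytes copied to userland


def get_vf_mac(stack_buf: bytearray, mac: bytes, zero_first: bool) -> bytes:
    """Model of filling a MAC buffer and copying all of it out."""
    if zero_first:
        stack_buf[:] = bytes(MAX_ADDR_LEN)   # the fix: clear the whole buffer
    stack_buf[:ETH_ALEN] = mac               # driver fills only 6 bytes
    return bytes(stack_buf)                  # all 32 bytes reach userland


def leaked_bytes(reply: bytes) -> int:
    """Count non-zero bytes beyond the MAC address."""
    return sum(1 for b in reply[ETH_ALEN:] if b != 0)
```

Without the initial clear, every byte after the MAC still holds whatever was in the buffer before the call.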

Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 05:19:26 -04:00
Mathias Krause
c085c49920 bridge: fix mdb info leaks
The bridging code discloses heap and stack bytes via the RTM_GETMDB
netlink interface and via the notify messages sent to group RTNLGRP_MDB
after a successful add/del.

Fix both cases by initializing all unset members/padding bytes with
memset(0).

Cc: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Mathias Krause <minipli@googlemail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 05:19:25 -04:00
Cong Wang
6aed0c8bf7 tunnel: use iptunnel_xmit() again
With recent patches from Pravin, most tunnels can't use iptunnel_xmit()
any more, due to ip_select_ident() and skb->ip_summed. But we can just
move these operations out of iptunnel_xmit(), so that tunnels can
use it again.

This by the way fixes a bug in vxlan (missing nf_reset()) for net-next.

Cc: Pravin B Shelar <pshelar@nicira.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-10 03:05:44 -04:00
Linus Torvalds
72932611b4 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull namespace bugfixes from Eric Biederman:
 "This is three simple fixes against 3.9-rc1.  I have tested each of
  these fixes and verified they work correctly.

  The userns oops in key_change_session_keyring and the BUG_ON triggered
  by proc_ns_follow_link were found by Dave Jones.

  I am including the enhancement for mount to only trigger requests of
  filesystem modules here instead of delaying this for the 3.10 merge
  window because it is both trivial and the kind of change that tends to
  bit-rot if left untouched for two months."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
  proc: Use nd_jump_link in proc_ns_follow_link
  fs: Limit sys_mount to only request filesystem modules (Part 2).
  fs: Limit sys_mount to only request filesystem modules.
  userns: Stop oopsing in key_change_session_keyring
2013-03-09 16:51:13 -08:00
Pravin B Shelar
4f3ed9209f ipip: capture inner headers during encapsulation
Allow IPIP to make use of tx-checksum offloading.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:09:20 -05:00
Pravin B Shelar
8344bfc600 ipip: Use tunnel_ip_select_ident() for tunnel IP-Identification.
tunnel_ip_select_ident() is more efficient when generating ip-header
id given inner packet is of ipv4 type.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:09:19 -05:00
stephen hemminger
23bdbc80e1 dcb: fix sparse warnings
Add header with function definitions to quiet warnings and avoid future errors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:09:18 -05:00
Pravin B Shelar
7313626745 tunneling: Add generic Tunnel segmentation.
Adds generic tunneling offloading support for IPv4-UDP based
tunnels.
GSO type is added to request this offload for a skb.
netdev feature NETIF_F_UDP_TUNNEL is added for hardware offloaded
udp-tunnel support. Currently no device supports this feature,
software offload is used.

This can be used by tunneling protocols like VXLAN.

CC: Jesse Gross <jesse@nicira.com>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:09:17 -05:00
Pravin B Shelar
aefbd2b3c2 tunneling: Capture inner mac header during encapsulation.
This patch adds inner mac header. This will be used in next patch
to find tunnel header length. Header len is required to copy tunnel
header to each gso segment.
This patch does not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:08:57 -05:00
Pravin B Shelar
f5b1729443 net: Add skb_headers_offset_update helper function.
This function will be used in next VXLAN_GSO patch. This patch does
not change any functionality.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:08:57 -05:00
Pravin B Shelar
ee579677c2 tunnel: Inherit NETIF_F_SG for hw_enc_features.
Inherit the scatter-gather feature for tunnel devices to avoid
copy for TSO packets of tunneling device like GRE.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:08:57 -05:00
Pravin B Shelar
ec5f061564 net: Kill link between CSUM and SG features.
Earlier SG was unset if CSUM was not available for given device to
force skb copy to avoid sending inconsistent csum.
Commit c9af6db4c1 (net: Fix possible wrong checksum generation)
added explicit flag to force copy to fix this issue.  Therefore
there is no need to link SG and CSUM; the following patch kills the
link between these two features.

This patch is also required by the following patch in the series.

Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-09 16:08:57 -05:00
J. Bruce Fields
190b1ecf25 sunrpc: don't attempt to cancel unitialized work
As of dc107402ae "SUNRPC: make AF_LOCAL connect synchronous", we no longer initialize connect_worker in the
AF_LOCAL case, resulting in warnings like:

    WARNING: at lib/debugobjects.c:261 debug_print_object+0x8c/0xb0() Hardware name: Bochs
    ODEBUG: assert_init not available (active state 0) object type: timer_list hint: stub_timer+0x0/0x20
    Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi nfsd auth_rpcgss nfs_acl lockd sunrpc
    Pid: 4816, comm: nfsd Tainted: G        W    3.8.0-rc2-00049-gdc10740 #801
    Call Trace:
     [<ffffffff8156ec00>] ? free_obj_work+0x60/0xa0
     [<ffffffff81046aaf>] warn_slowpath_common+0x7f/0xc0
     [<ffffffff81046ba6>] warn_slowpath_fmt+0x46/0x50
     [<ffffffff8156eccc>] debug_print_object+0x8c/0xb0
     [<ffffffff81055030>] ? timer_debug_hint+0x10/0x10
     [<ffffffff8156f7e3>] debug_object_assert_init+0xe3/0x120
     [<ffffffff81057ebb>] del_timer+0x2b/0x80
     [<ffffffff8109c4e6>] ? mark_held_locks+0x86/0x110
     [<ffffffff81065a29>] try_to_grab_pending+0xd9/0x150
     [<ffffffff81065b57>] __cancel_work_timer+0x27/0xc0
     [<ffffffff81065c03>] cancel_delayed_work_sync+0x13/0x20
     [<ffffffffa0007067>] xs_destroy+0x27/0x80 [sunrpc]
     [<ffffffffa00040d8>] xprt_destroy+0x78/0xa0 [sunrpc]
     [<ffffffffa0006241>] xprt_put+0x21/0x30 [sunrpc]
     [<ffffffffa00030cf>] rpc_free_client+0x10f/0x1a0 [sunrpc]
     [<ffffffffa0002ff3>] ? rpc_free_client+0x33/0x1a0 [sunrpc]
     [<ffffffffa0002f7e>] rpc_release_client+0x6e/0xb0 [sunrpc]
     [<ffffffffa000325d>] rpc_shutdown_client+0xfd/0x1b0 [sunrpc]
     [<ffffffffa0017196>] rpcb_put_local+0x106/0x130 [sunrpc]
    ...

Acked-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-09 12:43:42 -05:00
Arnd Bergmann
dc893e19b5 Revert parts of "hlist: drop the node parameter from iterators"
Commit b67bfe0d42 ("hlist: drop the node parameter from iterators")
did a lot of nice changes but also contains two small hunks that seem to
have slipped in accidentally and have no apparent connection to the
intent of the patch.

This reverts the two extraneous changes.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Senna Tschudin <peter.senna@gmail.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-03-08 15:05:34 -08:00
Hannes Frederic Sowa
3868b7aa76 ipv6: report sin6_scope_id if sockopt RECVORIGDSTADDR is set
v4:
a) unchanged

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:29:23 -05:00
Hannes Frederic Sowa
842df07397 ipv6: use newly introduced __ipv6_addr_needs_scope_id and ipv6_iface_scope_id
This patch requires multicast interface-scoped addresses to supply a
sin6_scope_id. Because the sin6_scope_id is now also correctly used
in case of interface-scoped multicast traffic this enables one to use
interface scoped addresses over interfaces which are not targeted by the
default multicast route (the route has to be put there manually, though).

getsockname() and getpeername() now return the correct sin6_scope_id in
case of interface-local mc addresses.

v2:
a) rebased on top of patch 1/4 (now uses ipv6_addr_props)

v3:
a) reverted changes for ipv6_addr_props

v4:
a) unchanged

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:29:22 -05:00
Thomas Graf
80580d4b20 ipv6: ndisc: remove redundant check for !dev->addr_len
send_sllao is already initialized with the value of dev->addr_len

Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:29:22 -05:00
Hannes Frederic Sowa
ddf64354af ipv6: stop multicast forwarding to process interface scoped addresses
v2:
a) used struct ipv6_addr_props

v3:
a) reverted changes for ipv6_addr_props

v4:
a) do not use __ipv6_addr_needs_scope_id

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:28:20 -05:00
Cristian Bercaru
3bc1b1add7 bridging: fix rx_handlers return code
The frames for which rx_handlers return RX_HANDLER_CONSUMED are no longer
counted as dropped. They are counted as successfully received by
'netif_receive_skb'.

This allows network interface drivers to correctly update their RX-OK and
RX-DRP counters based on the result of 'netif_receive_skb'.

Signed-off-by: Cristian Bercaru <B43982@freescale.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 12:19:59 -05:00
Samuel Ortiz
3bbc0ceb7a NFC: llcp: Report error to pending sockets when a device is removed
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-08 17:35:22 +01:00
Samuel Ortiz
e6a3a4bb85 NFC: llcp: Clean raw sockets from nfc_llcp_socket_release
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-08 17:34:57 +01:00
Paul Moore
a6a8fe950e netlabel: fix build problems when CONFIG_IPV6=n
My last patch to solve a problem where the static/fallback labels were
not fully displayed resulted in build problems when IPv6 was disabled.
This patch resolves the IPv6 build problems; sorry for the screw-up.

Please queue for -stable or simply merge with the previous patch.

Reported-by: Kbuild Test Robot <fengguang.wu@intel.com>
Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 11:33:51 -05:00
Samuel Ortiz
3536da06db NFC: llcp: Clean local timers and works when removing a device
Whenever an adapter is removed we must clean all the local structures,
especially the timers and scheduled work. Otherwise those asynchronous
threads will eventually try to access the freed nfc_dev pointer if an LLCP
link is up.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-08 14:25:04 +01:00
Samuel Ortiz
b141e811a0 NFC: llcp: Decrease socket ack log when accepting a connection
This is really difficult to test with real NFC devices, but without
this fix an LLCP server will eventually refuse new connections.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
2013-03-08 14:25:04 +01:00
Chen Gang
2e85d67690 net/rds: zero last byte for strncpy
A NUL-terminated string must always end in '\0', so zero the last byte.

additional info:
  strncpy pads with zeroes to the end of the given buffer.
  every bit of memory that is going to be copied to userland should be initialised
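
The two properties mentioned above can be illustrated with a small Python model of C strncpy(). Both functions are hypothetical stand-ins written for this sketch, and the 0xff fill only imitates uninitialised memory; the exact call used in the patch is not shown here.

```python
def strncpy(dst: bytearray, src: bytes, n: int) -> None:
    """Model of C strncpy(): copy at most n bytes, zero-padding short sources."""
    copied = src[:n]
    dst[:n] = copied + bytes(n - len(copied))


def copy_name(src: bytes, size: int) -> bytes:
    """Copy into a fixed-size field and force NUL termination."""
    buf = bytearray(b"\xff" * size)   # stand-in for uninitialised memory
    strncpy(buf, src, size - 1)       # fills bytes 0 .. size-2, zero-padded
    buf[size - 1] = 0                 # last byte is always NUL
    return bytes(buf)
```

A source longer than the field is truncated but still terminated, and a shorter one leaves no stale bytes behind.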

Signed-off-by: Chen Gang <gang.chen@asianux.com>
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-08 00:35:44 -05:00
Eric Dumazet
7f0e44ac9f ipv6 flowlabel: add __rcu annotations
Commit 18367681a1 (ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.)
omitted proper __rcu annotations.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:33:10 -05:00
Cong Wang
fbca58a224 bridge: add missing vid to br_mdb_get()
Obviously, vid should be considered when searching for multicast
group.

Cc: Vlad Yasevich <vyasevic@redhat.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:32:19 -05:00
Christoph Paasch
c10cb5fc0f Fix: sparse warning in inet_csk_prepare_forced_close
In e337e24d66 (inet: Fix kmemleak in tcp_v4/6_syn_recv_sock and
dccp_v4/6_request_recv_sock) I introduced the function
inet_csk_prepare_forced_close, which does a call to bh_unlock_sock().
This produces a sparse-warning.

This patch adds the missing __releases.

Signed-off-by: Christoph Paasch <christoph.paasch@uclouvain.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:31:29 -05:00
Amerigo Wang
bf5e4dd6b2 bridge: use ipv4_is_local_multicast() helper
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:29:53 -05:00
Junwei Zhang
d0d79c3fd7 afkey: fix a typo
Signed-off-by: Martin Zhang <martinbj2008@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:26:45 -05:00
Silviu-Mihai Popescu
3bffc475f9 CAIF: fix indentation for function arguments
This lines up function arguments on second and subsequent lines at the
first column after the opening parenthesis of the first line.

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:24:45 -05:00
Eric Dumazet
b2fb4f54ec tcp: uninline tcp_prequeue()
tcp_prequeue() became too big to be inlined.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:22:39 -05:00
Paul Moore
0c1233aba1 netlabel: correctly list all the static label mappings
When we have a large number of static label mappings that spill across
the netlink message boundary we fail to properly save our state in the
netlink_callback struct which causes us to repeat the same listings.
This patch fixes this problem by saving the state correctly between
calls to the NetLabel static label netlink "dumpit" routines.
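
A rough model of a resumable dump, in Python. The single saved index in cb_args[0] and the fixed per-message capacity are simplifying assumptions for illustration; the real routines keep their own state in the netlink_callback struct.

```python
def dumpit(entries, cb_args, capacity):
    """Emit up to `capacity` entries, resuming from the index saved in cb_args[0]."""
    start = cb_args[0]
    batch = entries[start:start + capacity]
    cb_args[0] = start + len(batch)   # save state for the next call
    return batch


def dump_all(entries, capacity):
    """Call dumpit() repeatedly until it returns nothing, as a dump loop would."""
    cb_args = [0]
    out = []
    while True:
        batch = dumpit(entries, cb_args, capacity)
        if not batch:
            return out
        out.extend(batch)
```

If the saved index were not updated, each call would start from the same position and repeat the same listings.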

Signed-off-by: Paul Moore <pmoore@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 16:20:23 -05:00
Vlad Yasevich
090096bf3d net: generic fdb support for drivers without ndo_fdb_<op>
If the driver does not support the ndo_op use the generic
handler for it. This should work in the majority of cases.
Eventually the fdb_dflt_add call gets translated into a
__dev_set_rx_mode() call which should handle hardware
support for filtering via the IFF_UNICAST_FLT flag.

Namely IFF_UNICAST_FLT indicates if the hardware can do
unicast address filtering. If no support is available
the device is put into promisc mode.

Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 15:29:45 -05:00
David S. Miller
43b18db8a2 Merge branch 'master' of git://1984.lsi.us.es/nf
Pablo Neira Ayuso says:

====================
The following patchset contains Netfilter fixes for your net tree,
they are:

* Don't generate audit log message if audit is not enabled, from Gao Feng.

* Fix logging formatting for packets dropped by helpers, by Joe Perches.

* Fix a compilation warning in nfnetlink if CONFIG_PROVE_RCU is not set,
  from Paul Bolle.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-07 15:20:02 -05:00
Eric Dumazet
6906f4ed6f htb: add HTB_DIRECT_QLEN attribute
HTB uses an internal pfifo queue whose limit is not reported
to userland tools (tc) and whose value is inherited from the device
tx_queue_len at setup time.

Introduce TCA_HTB_DIRECT_QLEN attribute to allow finer control.

Remove two obsolete pr_err() calls as well.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 15:40:53 -05:00
Nicolas Dichtel
7a6742003f netconf: add the handler to dump entries
It's useful to be able to get the initial state of all entries. The patch adds
the support for IPv4 and IPv6.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 15:40:53 -05:00
Thomas Pedersen
87f59c70ce mac80211: init mesh timer for user authed STAs
There is a corner case which wasn't being covered:
userspace may authenticate and allocate stations,
but still leave the peering up to the kernel.

Initialize the peering timer if the MPM is not in
userspace, in a path which is taken by both the kernel and
userspace when allocating stations.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:14 +01:00
Thomas Pedersen
146bb4839a mac80211: disallow changing auto_open_plinks
while user MPM is running.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:13 +01:00
Thomas Pedersen
d37bb18ae3 nl80211: user_mpm overrides auto_open_plinks
If the user requested a userspace MPM, automatically
disable auto_open_plinks to fully disable the kernel MPM.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:13 +01:00
Thomas Pedersen
a6dad6a26e mac80211: support userspace MPM
Earlier mac80211 would check whether some kind of mesh
security was enabled, when the real question was "is the
MPM in userspace"?

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:12 +01:00
Thomas Pedersen
eef941e6d6 cfg80211: rename mesh station types
The mesh station types used to refer to whether the
station was secure or nonsecure. Really the salient
information is whether it is managed by the kernel or
userspace

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:11 +01:00
Thomas Pedersen
bb2798d45f nl80211: explicit userspace MPM
Secure mesh had the implicit requirement that the Mesh
Peering Management entity be in userspace.  However
userspace might want to implement an open MPM as well, so
specify a mesh setup parameter to indicate this.

Signed-off-by: Thomas Pedersen <thomas@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:11 +01:00
Thomas Huehn
2ff2b690c5 mac80211: improve minstrels rate sorting by means of throughput & probability
This patch improves the way minstrel sorts rates according to throughput
and success probability. 3 FOR-loops across the entire rate set in function
minstrel_update_stats() which were used to determine the fastest, second
fastest and most robust rate are reduced to 1 FOR-loop.

The sorted list of rates according to throughput is extended to the best four
rates as we need them in upcoming joint rate and power control. The sorting
is done via the new function minstrel_sort_best_tp_rates().
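
One way to keep the best four rates in a single pass is an insertion into a short sorted list, sketched here in Python. The function below is a hypothetical model written for this note and does not reproduce minstrel_sort_best_tp_rates().

```python
MAX_THR_RATES = 4  # number of best-throughput rates kept (from the text)


def sort_best_tp_rates(throughputs):
    """Return indices of the up-to-four highest-throughput rates, best first."""
    best = []  # indices, ordered by descending throughput
    for i, tp in enumerate(throughputs):
        # Find the insertion position among the current best rates.
        pos = len(best)
        while pos > 0 and tp > throughputs[best[pos - 1]]:
            pos -= 1
        if pos < MAX_THR_RATES:
            best.insert(pos, i)
            del best[MAX_THR_RATES:]   # keep at most four entries
    return best
```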

The most robust rate selection is aligned with minstrel_ht's approach.
Once any success probability is above 95% the one with the highest
throughput is chosen as most robust rate. If success probabilities of all
rates are below 95%, the rate with the highest succ. prob. is elected as
most robust one
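
The selection rule can be written compactly. This is a Python sketch under the assumption that each rate is given as a (throughput, success probability in percent) pair; the data layout is hypothetical.

```python
def most_robust_rate(rates):
    """rates: list of (throughput, success_probability_percent) tuples.

    Returns the index of the most robust rate.
    """
    reliable = [i for i, (_, prob) in enumerate(rates) if prob > 95]
    if reliable:
        # Among sufficiently reliable rates, prefer the fastest one.
        return max(reliable, key=lambda i: rates[i][0])
    # Otherwise fall back to the highest success probability.
    return max(range(len(rates)), key=lambda i: rates[i][1])
```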

Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:10 +01:00
Thomas Huehn
db8c5ee692 mac80211: treat minstrel success probabilities below 10% as implausible
Based on minstrel_ht this patch treats success probabilities below 10% as
implausible values for throughput calculation in minstrel's statistics.
Current throughput per rate with such a low success probability is reset
to 0 MBit/s.
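
A simplified Python sketch of the cutoff. The throughput formula (nominal rate times success probability) is an assumption made to keep the example short; only the 10% threshold comes from the text.

```python
def current_throughput(success_prob_percent, rate_mbit):
    """Estimated throughput; implausibly low success probabilities count as zero."""
    if success_prob_percent < 10:
        return 0.0
    return rate_mbit * success_prob_percent / 100
```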

Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:09 +01:00
Thomas Huehn
f744bf81f7 mac80211: add lowest rate into minstrel's random rate sampling table
While minstrel bootstraps and fills the success probabilities of each
rate the lowest rate has typically a very high success probability
(often 100% in our tests).
Its statistics are never updated but considered to setup the mrr chain.
In our tests we see that especially the 3rd mrr stage (which is that
rate providing highest success probability) is filled with the lowest rate
because its initial high success probability is never updated. By design
the 4th mrr stage is filled with the lowest rate so often 3rd and 4th
mrr stage are equal.

This patch follows minstrels general approach of assuming as little
as possible about rate dependencies. Consequently we include the
lowest rate into the random sampling table to get balanced up-to-date
statistics of all rates and therefore balanced decisions.

Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:09 +01:00
Thomas Huehn
1e9c27df7b mac80211: extend minstrel's rate sampling to avoid unsampled rates
Minstrel's decision which rate should be directly sampled within the
1st mrr stage is limited to such rates faster than the current max
throughput rate. All rates below the current max. throughput rate
are indirectly sampled via the 2nd mrr stage.
This approach leads to outdated per-rate statistics and therefore
an outdated mrr chain setup.

This patch uses the sampling approach from minstrel_ht. A counter is
added to sum all indirect sample attempts per rate. After 20 indirect
sampling attempts the rate is directly sampled within the 1st mrr stage.
Therefore more up-to-date statistics for all rates are maintained and
used to setup the mrr chain.
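
The counter logic might look like the following Python sketch. The class, the function name, and the reset of the counter after a direct sample are assumptions for illustration; the threshold of 20 is taken from the text.

```python
DIRECT_SAMPLE_THRESHOLD = 20  # indirect attempts before forcing a direct sample


class RateStats:
    def __init__(self):
        self.indirect_samples = 0


def sample_stage(stats, rate_tp, max_tp):
    """Return the mrr stage (1 or 2) in which a candidate rate is sampled."""
    if rate_tp > max_tp:
        return 1                      # faster rates are sampled directly
    if stats.indirect_samples >= DIRECT_SAMPLE_THRESHOLD:
        stats.indirect_samples = 0    # assumed: start counting again
        return 1
    stats.indirect_samples += 1
    return 2                          # slower rates go to the 2nd stage
```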

Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:08 +01:00
Thomas Huehn
8f15761197 mac80211: add documentation and verbose variable names in
Add documentation and more verbose variable names to minstrel's
multi-rate-retry setup within function minstrel_get_rate() to
increase the readability of the algorithm.

Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:08 +01:00
Thomas Huehn
c8ca8c2f93 mac80211: merge value scaling macros of minstrel_ht and minstrel
Both minstrel versions use individual ways to scale up integer values
to perform calculations. Merge minstrel_ht's scaling macros into
minstrel's header file and use them in both minstrel versions.

Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:07 +01:00
Thomas Huehn
a512d4b543 mac80211: merge EWMA calculation of minstrel_ht and minstrel
Both rate control algorithms (minstrel and minstrel_ht) calculate
averages based on EWMA. Move the function minstrel_ewma() into
rc80211_minstrel.h and use it in both minstrel versions.
Also move the default EWMA level (75%) definition to the header file
and clean up variable usage.
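
For readers unfamiliar with the technique, an integer EWMA with a 75% level can be sketched as below. This is a Python illustration, not the kernel helper itself; the percent-based formula is an assumption about how such a helper is typically written, and only the 75% level is taken from the commit message.

```python
EWMA_LEVEL = 75  # percent of the old average that is kept (from the commit message)


def minstrel_ewma(old, new, weight=EWMA_LEVEL):
    """Blend a new sample into the running average using integer math."""
    return (new * (100 - weight) + old * weight) // 100
```

A single bad sample therefore moves the average by only a quarter of the difference: starting from 100, one sample of 0 yields 75.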

Acked-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Thomas Huehn <thomas@net.t-labs.tu-berlin.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:06 +01:00
Felix Fietkau
52c00a37a3 mac80211/minstrel_ht: disable multiple consecutive sample attempts
The last minstrel_ht changes increased the sampling frequency for
potentially useful rates to decrease the response time to rate
fluctuations. This caused an increase in sampling frequency that can
slightly reduce throughput, so this patch limits the sampling attempts
to one per rate instead of two.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:06 +01:00
Johannes Berg
8ab9d85c65 regulatory: allow VHT channels in world roaming
For VHT, the wider bandwidths (up to 160 MHz) need
to be allowed. Since world roaming only covers the
case of connecting to an AP, it can be opened up
there; we will rely on the AP to know the local
regulations.

Acked-by: Luis R. Rodriguez <mcgrof@do-not-panic.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:05 +01:00
Johannes Berg
93d08f0b78 cfg80211: enable TDLS on P2P client interfaces
There's no reason TDLS should be prevented on P2P client
interfaces, and most of the code already handles it, so
allow adding stations for it.

Reported-by: Jouni Malinen <j@w1.fi>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:04 +01:00
Johannes Berg
90fcba65d2 mac80211: add VHT capabilities station debugfs file
Add a new debugfs file to view a station's VHT capabilities.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:04 +01:00
Johannes Berg
55d942f424 mac80211: restrict peer's VHT capabilities to own
Implement restricting peer VHT capabilities to the device's own
capabilities. This is useful when a single driver supports more
than one device and the devices have different capabilities
(often they will differ in the number of spatial streams), but
in particular is also necessary for VHT capability overrides to
work correctly -- otherwise it'd be possible to e.g. advertise,
due to overrides, that TX-STBC is not supported, but then still
use it to TX to the AP because it supports RX-STBC.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:03 +01:00
Johannes Berg
c07270b605 mac80211: fix HT capability overrides for AP station
HT capabilities are asymmetric -- e.g. beamforming is both an
RX and TX capability. If, for example, we support RX but not
TX, the RX capability of the AP station is masked out (if it
supports it). This works correctly if it's really the driver
capability.

If, on the other hand, the reason for not supporting TX BF
is that it was removed by HT capability overrides then the
wrong thing happens: the AP's TX capability will be removed
rather than its RX capability, because the override function
works on own capabilities, not remote ones, and doesn't take
the asymmetry into account.

To fix this make a copy of our own capabilities, apply the
overrides to them (where needed) and then use that to set up
the peer's capabilities.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:02 +01:00
Johannes Berg
4f4b9357e4 mac80211: don't apply HT overrides to TDLS peers
The HT overrides are intended only for the connection
to the AP, not for any other purpose. Therefore, don't
apply them to TDLS peers that are also stations added
to a managed station interface.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:02 +01:00
Johannes Berg
1861b84553 mac80211: simplify AP interface stop
For AP interfaces, there's no need to flush stations
or keys again when the interface is stopped, as that already
happened when the BSS was stopped on the interface.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:01 +01:00
Johannes Berg
7b4396bd68 mac80211: flush keys when stopping AP
Since hostapd will remove keys this isn't usually
an issue, but we shouldn't leak keys to the next
BSS started on the same interface. For VLANs this
also fixes a bug: keys that aren't removed would
otherwise be leaked.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:00 +01:00
Johannes Berg
8d1f7ecd2a mac80211: defer tailroom counter manipulation when roaming
During roaming, the crypto_tx_tailroom_needed_cnt counter
will often take values 2,1,0,1,2 because first keys are
removed and then new keys are added. This is inefficient
because during the 0->1 transition, synchronize_net must
be called to avoid packet races, although typically no
packets would be flowing during that time.

To avoid that, defer the decrement (2->1, 1->0) when keys
are removed (by half a second). This means the counter
will really have the values 2,2,2,3,4 ... 2, thus never
reaching 0 and having to do the 0->1 transition.
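
The two counter sequences above can be reproduced with a small simulation. This is a Python sketch under simplifying assumptions: the function name and the event encoding are hypothetical, and the half-second delay is modelled as "all deferred decrements are applied after the last event".

```python
def simulate(events, defer):
    """Trace the tailroom counter for a list of 'del'/'add' key events.

    Returns (trace, zero_to_one), where zero_to_one counts the expensive
    0->1 transitions that would require synchronize_net().
    """
    counter = 2          # two keys installed before roaming
    pending = 0          # decrements postponed by the deferral
    trace = [counter]
    zero_to_one = 0
    for event in events:
        if event == "add":
            if counter == 0:
                zero_to_one += 1
            counter += 1
        elif event == "del":
            if defer:
                pending += 1
            else:
                counter -= 1
        trace.append(counter)
    if pending:
        counter -= pending  # the deferred work finally runs
        trace.append(counter)
    return trace, zero_to_one


roaming = ["del", "del", "add", "add"]
immediate = simulate(roaming, defer=False)
deferred = simulate(roaming, defer=True)
```

Here `immediate` traces 2,1,0,1,2 with one 0->1 transition, while `deferred` traces 2,2,2,3,4 and then drops back to 2 without ever reaching 0.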

Note that this patch entirely disregards the drivers for
which this optimisation was done to start with; for them
the key removal itself will be expensive because it has
to synchronize_net() after the counter is incremented to
remove the key from HW crypto. For them the sequence will
look like this: 0,1,0,1,0,1,0,1,0 (*) which is clearly a
lot more inefficient. This could be addressed separately:
during key removal the 0->1->0 sequence isn't necessary.

(*) it starts at 0 because HW crypto is on, then goes to
    1 when HW crypto is disabled for a key, then back to
    0 because the key is deleted; this happens for both
    keys in the example. When new keys are added, it goes
    to 1 first because they're added in software; when a
    key is moved to hardware it goes back to 0

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:36:00 +01:00
Johannes Berg
a87121051c mac80211: remove IEEE80211_KEY_FLAG_WMM_STA
There's no driver using this flag, so it seems
that all drivers support HW crypto with WMM or
don't support it at all. Remove the flag and
code setting it.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:59 +01:00
Stanislaw Gruszka
153a5fc410 mac80211: merge reconfig assign chanctx code
Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:58 +01:00
Stanislaw Gruszka
690205f18f mac80211: cleanup suspend/resume on mesh mode
Remove suspend/resume code that is no longer used.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:58 +01:00
Stanislaw Gruszka
a61829437e mac80211: cleanup suspend/resume on ibss mode
Remove suspend/resume code that is no longer used.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:57 +01:00
Stanislaw Gruszka
9b7d72c104 mac80211: cleanup suspend/resume on managed mode
Remove suspend/resume code that is no longer used.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:56 +01:00
Stanislaw Gruszka
12e7f51702 mac80211: cleanup generic suspend/resume procedures
Since we now disconnect before suspend, various code that saves
connection state can be removed from the suspend and resume
procedures. The cleanup on the resume side is smaller because
ieee80211_reconfig() is also used for H/W restart.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:56 +01:00
Stanislaw Gruszka
8125696991 cfg80211/mac80211: disconnect on suspend
It is possible that after suspend, cfg80211 will receive a request to
disconnect, which requires action on an interface that was removed during
suspend.

The problem can manifest itself in various warnings similar to the one below:

WARNING: at net/mac80211/driver-ops.h:12 ieee80211_bss_info_change_notify+0x2f9/0x300 [mac80211]()
wlan0:  Failed check-sdata-in-driver check, flags: 0x4
Call Trace:
 [<c043e0b3>] warn_slowpath_fmt+0x33/0x40
 [<f83707c9>] ieee80211_bss_info_change_notify+0x2f9/0x300 [mac80211]
 [<f83a660a>] ieee80211_recalc_ps_vif+0x2a/0x30 [mac80211]
 [<f83a6706>] ieee80211_set_disassoc+0xf6/0x500 [mac80211]
 [<f83a9441>] ieee80211_mgd_deauth+0x1f1/0x280 [mac80211]
 [<f8381b36>] ieee80211_deauth+0x16/0x20 [mac80211]
 [<f8261e70>] cfg80211_mlme_down+0x70/0xc0 [cfg80211]
 [<f8264de1>] __cfg80211_disconnect+0x1b1/0x1d0 [cfg80211]

To fix the problem, disconnect from any associated network before
suspend. User space is responsible for establishing the connection again
after resume. This basically needs to be done by user space anyway,
because associated stations can go away during suspend (for example,
NetworkManager disconnects on suspend and connects on resume by default).

The patch also handles the situation where the driver refuses to suspend
with WoWLAN configured, and tries to suspend again without it.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:55 +01:00
Stanislaw Gruszka
30c97120c6 mac80211: remove napi
For two years no mac80211 driver has implemented support for NAPI. It looks
like this feature is unneeded, so remove it from the generic mac80211 code.

Signed-off-by: Stanislaw Gruszka <sgruszka@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:54 +01:00
Felix Fietkau
098b8afbf2 mac80211/minstrel_ht: fix spacing between sample attempts
A sample attempt should only count in mi->sample_tries if the sample
attempt wasn't skipped based on slower rate criteria.
This patch increases the sampling frequency for potentially desirable
rates and thus enables faster recovery from interference or collisions.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:54 +01:00
Felix Fietkau
965237ab9f mac80211/minstrel_ht: increase sampling frequency of some slower rates
If a rate is below the max_tp_rate, sample it frequently if:
- it is above max_tp_rate2, or
- it is above max_prob_rate and is a candidate for max_prob_rate
  (has fewer streams than max_tp_rate).
This helps the retry chain recover more quickly from bad statistics
caused by collisions or interference, and slightly reduces throughput
fluctuations with higher rates.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:53 +01:00
Felix Fietkau
96d4ac3f2f minstrel_ht: increase sampling frequency
Try to sample all available rates, as sample attempts do not cost much
airtime and are appropriately spaced based on the average A-MPDU length.
This helps with faster recovery on rate fluctuations.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:53 +01:00
Felix Fietkau
a299c6d591 mac80211/minstrel_ht: improve max_prob_rate selection
max_prob_rate should be selected to be very reliable, however limiting
it to single-stream on 3-stream devices is a bit much.
Allow max_prob_rate to use one stream less than the max_tp_rate.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:52 +01:00
Felix Fietkau
ed97a13c54 mac80211/minstrel_ht: improve accuracy of throughput metric at high data rates
At high data rates the average frame transmission durations are small
enough for rounding errors to matter, sometimes causing minstrel to use
slightly lower transmit rates than necessary.
To fix this, change the unit of the duration value to nanoseconds
instead of microseconds, and reorder the multiplications/divisions when
calculating the throughput metric so that they don't overflow or
truncate prematurely.
At 2-stream HT40 this makes TCP throughput a bit more stable.
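
The rounding effect can be illustrated with plain integer arithmetic. The numbers below are hypothetical and the helpers are not taken from minstrel_ht; the sketch only shows why a finer time unit and a later division lose less precision.

```python
def frames_per_second(duration, units_per_second):
    """Integer frame rate for an average frame duration in the given unit."""
    return units_per_second // duration


true_duration_ns = 10_600                  # hypothetical average frame duration
duration_us = true_duration_ns // 1000     # truncated to 10 us

fps_from_us = frames_per_second(duration_us, 1_000_000)
fps_from_ns = frames_per_second(true_duration_ns, 1_000_000_000)


def divide_then_scale(a, b, scale):
    return (a // b) * scale   # truncates before scaling


def scale_then_divide(a, b, scale):
    return (a * scale) // b   # keeps the fractional part until the end
```

In this example the microsecond value overestimates the frame rate (100000 instead of 94339 frames per second). In C the scale-first ordering has to be balanced against overflow of the intermediate product, which Python's unbounded integers hide.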

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:51 +01:00
Jouni Malinen
355199e02b cfg80211: Extend support for IEEE 802.11r Fast BSS Transition
Add NL80211_CMD_UPDATE_FT_IES to support update of FT IEs to the WLAN
driver and NL80211_CMD_FT_EVENT to send FT events from the WLAN driver.
This will carry the target AP's MAC address along with the relevant
Information Elements. This event is used to report received FT IEs
(MDIE, FTIE, RSN IE, TIE, RICIE). These changes allow FT to be supported
with drivers that use an internal SME instead of user space option (like
FT implementation in wpa_supplicant with mac80211-based drivers).

Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:51 +01:00
Johannes Berg
723d568aa5 cfg80211: prohibit zero keepalive interval
It's not useful to specify a 0 keepalive interval, this
would send too much data. Prohibit this to also avoid
device issues.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:50 +01:00
Ilan Peer
d339d5ca8e mac80211: Allow drivers to differentiate between ROC types
Some devices can handle remain on channel requests differently
based on the request type/priority. Add support to
differentiate between different ROC types, i.e., indicate that
the ROC is required for sending management frames.

Signed-off-by: Ilan Peer <ilan.peer@intel.com>
Reviewed-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:49 +01:00
Johannes Berg
f62fab735e cfg80211: refactor association parameters
cfg80211_mlme_assoc() has grown far too many arguments,
make the caller build almost all of the driver struct
and pass that to the function instead.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:49 +01:00
Johannes Berg
dd5ecfeac8 mac80211: support VHT capability overrides
Support the cfg80211 API to override VHT capabilities
on association.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:48 +01:00
Johannes Berg
ee2aca343c cfg80211: add ability to override VHT capabilities
For testing it's sometimes useful to be able to
override certain VHT capability advertisement,
add the ability to do that in cfg80211.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:47 +01:00
Johannes Berg
947add36ca cfg80211: move exported event functions into nl80211
This is the sort of thing gcc's LTO could do, but since
we don't have that yet we can also do it manually. The
advantage is reduced code, both source and binary, e.g.
on x86-64

   text	   data	    bss	    dec	    hex	filename
 442825	  56230	    776	 499831	  7a077	cfg80211.ko (before)
 441585	  56230	    776	 498591	  79b9f	cfg80211.ko (after)

a reduction of ~1k.

But in order to not complicate the code move only those
functions that are simple wrappers, not those that have
functionality of their own.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:46 +01:00
Johannes Berg
fe1abafd94 nl80211: re-add channel width and extended capa advertising
Add back the channel width and extended capability data
to wiphy information if split information is supported.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:45 +01:00
Johannes Berg
9a886586c8 wireless: move sequence number arithmetic to ieee80211.h
Move the sequence number arithmetic code from mac80211 to
ieee80211.h so others can use it. Also rename the functions
from _seq to _sn, they operate on the sequence number, not
the sequence_control field.

Also move macros to convert the sequence control to/from
the sequence number value from various drivers.
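
As an illustration of what such helpers compute, here is a Python sketch of 12-bit sequence number arithmetic. The function names are illustrative and not the kernel's identifiers; the sketch assumes the standard 802.11 layout in which the sequence number occupies the upper 12 bits of the 16-bit sequence control field.

```python
SN_MASK = 0x0FFF          # sequence numbers are 12 bits wide
SN_MODULO = SN_MASK + 1   # 4096
SCTL_SEQ = 0xFFF0         # sequence number bits within sequence control


def sn_less(sn1, sn2):
    """True if sn1 comes before sn2, taking wrap-around into account."""
    return ((sn1 - sn2) & SN_MASK) > (SN_MODULO >> 1)


def sn_add(sn1, sn2):
    return (sn1 + sn2) & SN_MASK


def sn_inc(sn):
    return sn_add(sn, 1)


def sn_sub(sn1, sn2):
    return (sn1 - sn2) & SN_MASK


def seq_to_sn(seq_ctrl):
    """Extract the sequence number from a sequence control value."""
    return (seq_ctrl & SCTL_SEQ) >> 4


def sn_to_seq(sn):
    """Place a sequence number into the sequence control bit positions."""
    return (sn << 4) & SCTL_SEQ
```

The comparison treats the number space as a circle, so 0xFFF is considered "less than" 1 even though it is numerically larger.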

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:44 +01:00
Johannes Berg
b56cf72083 nl80211: conditionally add back TCP WoWLAN information
Add back the previously removed TCP WoWLAN information,
but only if userspace is prepared to deal with large
wiphy capability data dumps.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:43 +01:00
Johannes Berg
cdc89b97bf nl80211: conditionally add back radar information
If userspace is updated to deal with large split wiphy
information dumps, add back the radar information that
could otherwise push the data over the limit of the
netlink dump messages.

Cc: Simon Wunderlich <simon.wunderlich@s2003.tu-chemnitz.de>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:43 +01:00
Johannes Berg
3713b4e364 nl80211: allow splitting wiphy information in dumps
The per-wiphy information is getting large, to the point
where with more than the typical number of channels it's
too large and overflows, and userspace can't get any of
the information at all.

To address this (in a way that doesn't require making all
messages bigger) allow userspace to specify that it can
deal with wiphy information split across multiple parts
of the dump, and if it can, split up the data. This also
splits up each channel separately so an arbitrary number
of channels can be supported.

Additionally, since GET_WIPHY has the same problem, add
support for filtering the wiphy dump to get information
for a single wiphy only; this allows userspace apps to
use dump in this case to retrieve all data from a single
device.

As userspace needs to know if all this is supported,
add a global nl80211 feature set and include a bit for
this behaviour in it.

Cc: Dennis H Jensen <dennis.h.jensen@siemens.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:42 +01:00
Johannes Berg
191922cd4b mac80211: clarify alignment comment
The comment says something about __skb_push(), but that
isn't even called in the code any more. Looking at the
git history, that comment never even made sense when it
was still called, so just replace that part to note it
still works even when align isn't 0 or 2.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:41 +01:00
Sachin Kamat
9fed3096d7 net: rfkill: Fix sparse warning in rfkill-regulator.c
'rfkill_regulator_ops' is used only in this file. Hence make it static.
Silences the following warning:
net/rfkill/rfkill-regulator.c:54:19: warning:
symbol 'rfkill_regulator_ops' was not declared. Should it be static?

Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:41 +01:00
Johannes Berg
77ee7c891a cfg80211: comprehensively check station changes
The station change API isn't being checked properly before
drivers are called, and as a result it is difficult to see
what should be allowed and what not.

In order to comprehensively check the API parameters, parse
everything first, and then have the driver call a function
(cfg80211_check_station_change()) with the additional
information about the kind of station that is being changed;
this allows the function to make better decisions than the
old code could.

While at it, also add a few checks, particularly in mesh,
and clarify the TDLS station lifetime in documentation.

To be able to reduce a few checks, ignore any flag set bits
when the mask isn't set, they shouldn't be applied then.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:40 +01:00
Johannes Berg
ff276691e9 cfg80211: unify station WME parsing
Instead of copying the code, create a new function
to parse the station's WME information.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:39 +01:00
Johannes Berg
984c311b09 cfg80211: clean up station WME attribute parsing
Parse the attributes first, and then disable the apply
flag if needed.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:39 +01:00
Johannes Berg
f8bacc2104 cfg80211: clean up mesh plink station change API
Make the ability to leave the plink_state unchanged not use a
magic -1 variable that isn't in the enum, but an explicit change
flag; reject invalid plink states or actions and move the needed
constants for plink actions to the right header file. Also
reject plink_state changes for non-mesh interfaces.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:35:37 +01:00
Johannes Berg
c0f3a317f2 Merge remote-tracking branch 'mac80211/master' into HEAD
There are a few things that would otherwise conflict.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-06 16:33:12 +01:00
John W. Linville
32cdd592b7 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-06 10:21:17 -05:00
J. Bruce Fields
3c34ae11fa nfsd: fix krb5 handling of anonymous principals
krb5 mounts started failing as of
683428fae8 "sunrpc: Update svcgss xdr
handle to rpsec_contect cache".

The problem is that mounts are usually done with some host principal
which isn't normally mapped to any user, in which case svcgssd passes
down uid -1, which the kernel is then expected to map to the
export-specific anonymous uid or gid.

The new uid_valid/gid_valid checks were therefore causing that downcall
to fail.

(Note the regression may not have been seen with older userspace that
tended to map unknown principals to an anonymous id on their own rather
than leaving it to the kernel.)

Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-03-06 10:11:08 -05:00
David Ward
fa2b04f450 net/ipv4: Timestamp option cannot overflow with prespecified addresses
When a router forwards a packet that contains the IPv4 timestamp option,
if there is no space left in the option for the router to add its own
timestamp, then the router increments the Overflow value in the option.

However, if the addresses of the routers are prespecified in the option,
then the overflow condition cannot happen: the option is structured so
that each prespecified router has a place to write its timestamp. Other
routers do not add a timestamp, so there will never be a lack of space.

This fix ensures that the Overflow value in the IPv4 timestamp option is
not incremented when the addresses of the routers are prespecified, even
if the Pointer value is greater than the Length value.
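
The intended behaviour can be sketched as follows. This is a Python illustration of the rule, not the kernel code: the function name is hypothetical, and the layout (a fourth option byte with a 4-bit overflow count in the high nibble and a 4-bit flag in the low nibble, flag value 3 meaning prespecified addresses) follows RFC 791.

```python
TS_ONLY = 0       # timestamps only
TS_AND_ADDR = 1   # each router records its address and a timestamp
TS_PRESPEC = 3    # addresses are prespecified by the sender


def update_overflow(length, pointer, oflw_flg):
    """Return the new overflow/flag byte after a router processes the option."""
    flag = oflw_flg & 0x0F
    overflow = oflw_flg >> 4
    if pointer <= length:
        return oflw_flg          # there is still room, nothing to count
    if flag == TS_PRESPEC:
        return oflw_flg          # routers not listed simply skip the option
    if overflow == 15:
        # The 4-bit counter cannot grow further; RFC 791 treats this as an error.
        raise ValueError("timestamp option overflow counter exhausted")
    return ((overflow + 1) << 4) | flag
```

For a full option (pointer past the length), the byte 0x01 becomes 0x11, while 0x03 is left unchanged.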

Signed-off-by: David Ward <david.ward@ll.mit.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:06 -05:00
Eric Dumazet
d1f41b67ff net: reduce net_rx_action() latency to 2 HZ
We should use time_after_eq() to get maximum latency of two ticks,
instead of three.
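
The off-by-one can be seen in a small model of wrap-safe tick comparison. This Python sketch emulates a 32-bit tick counter; it assumes the receive loop sets its limit to the current tick plus 2 and stops once the comparison becomes true, and `ticks_until_break` is a hypothetical helper for illustration.

```python
BITS = 32
MASK = (1 << BITS) - 1


def _signed(x):
    """Interpret x as a signed BITS-wide integer."""
    x &= MASK
    return x - (1 << BITS) if x >= (1 << (BITS - 1)) else x


def time_after(a, b):
    """True if tick a is strictly after tick b, even across wrap-around."""
    return _signed(b - a) < 0


def time_after_eq(a, b):
    """True if tick a is at or after tick b, even across wrap-around."""
    return _signed(a - b) >= 0


def ticks_until_break(compare, start=0xFFFFFFFE):
    """Count the ticks that pass before compare(now, limit) becomes true."""
    limit = (start + 2) & MASK
    now, ticks = start, 0
    while not compare(now, limit):
        now = (now + 1) & MASK
        ticks += 1
    return ticks
```

Starting just before the counter wraps, the strict comparison lets three ticks pass and the inclusive one lets two pass.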

Bug added in commit 24f8b2385 (net: increase receive packet quantum)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:06 -05:00
Randy Dunlap
691b3b7e13 net: fix new kernel-doc warnings in net core
Fix new kernel-doc warnings in net/core/dev.c:

Warning(net/core/dev.c:4788): No description found for parameter 'new_carrier'
Warning(net/core/dev.c:4788): Excess function parameter 'new_carries' description in 'dev_change_carrier'

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:06 -05:00
Paolo Valente
76e4cb0d3a pkt_sched: sch_qfq: remove a useless invocation of qfq_update_eligible
QFQ+ can select for service only 'eligible' aggregates, i.e.,
aggregates that would have started to be served also in the emulated
ideal system.  As a consequence, for QFQ+ to be work conserving, at
least one of the active aggregates must be eligible when it is time to
choose the next aggregate to serve.

The set of eligible aggregates is updated through the function
qfq_update_eligible(), which does guarantee that, after its
invocation, at least one of the active aggregates is eligible.
Because of this property, this function is invoked in
qfq_deactivate_agg() to guarantee that at least one of the active
aggregates is still eligible after an aggregate has been deactivated.
In particular, the critical case is when there are other active
aggregates, but the aggregate being deactivated happens to be the only
one eligible.

However, this precaution is not needed for QFQ+ to be work conserving,
because update_eligible() is always invoked also at the beginning of
qfq_choose_next_agg(). This patch removes the additional invocation of
update_eligible() in qfq_deactivate_agg().

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
40dd2d5461 pkt_sched: sch_qfq: do not allow virtual time to jump if an aggregate is in service
By definition of (the algorithm of) QFQ+, the system virtual time must
be pushed up only if there is no 'eligible' aggregate, i.e. no
aggregate that would have started to be served also in the ideal
system emulated by QFQ+.  QFQ+ serves only eligible aggregates, hence
the aggregate currently in service is eligible.  As a consequence, to
decide whether there is no eligible aggregate, QFQ+ must also check
whether there is no aggregate in service.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
a0143efa96 pkt_sched: sch_qfq: prevent budget from wrapping around after a dequeue
Aggregate budgets are computed so as to guarantee that, after an
aggregate has been selected for service, that aggregate has enough
budget to serve at least one maximum-size packet for the classes it
contains. For this reason, after a new aggregate has been selected
for service, its next packet is immediately dequeued, without any
further control.

The maximum packet size for a class, lmax, can be changed through
qfq_change_class(). In case the user sets lmax to a lower value than
the size of some of the still-to-arrive packets, QFQ+ will
automatically push up lmax as it enqueues these packets.  This
automatic push up is likely to happen with TSO/GSO.

In any case, if lmax is assigned a lower value than the size of some
of the packets already enqueued for the class, then the following
problem may occur: the size of the next packet to dequeue for the
class may happen to be larger than lmax, after the aggregate to which
the class belongs has been just selected for service. In this case,
even the budget of the aggregate, which is an unsigned value, may be
lower than the size of the next packet to dequeue. After dequeueing
this packet and subtracting its size from the budget, the latter would
wrap around.

This fix prevents the budget from wrapping around after any packet
dequeue.
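
The wrap-around and one possible guard can be sketched in Python by emulating a 32-bit unsigned budget. The helper names are hypothetical, and clamping at zero is one way to prevent the wrap; the sketch does not claim to reproduce the exact kernel change.

```python
U32_MASK = 0xFFFFFFFF


def charge_unchecked(budget, pkt_len):
    """Unsigned 32-bit subtraction, as C would perform it."""
    return (budget - pkt_len) & U32_MASK


def charge_checked(budget, pkt_len):
    """Clamp at zero instead of wrapping when the packet exceeds the budget."""
    return 0 if budget < pkt_len else budget - pkt_len
```

With a budget of 1000 and a 1500-byte packet, the unchecked version yields 4294966796, which looks like an enormous remaining budget, whereas the checked version yields 0.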

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
2f3b89a1fe pkt_sched: sch_qfq: serve activated aggregates immediately if the scheduler is empty
If no aggregate is in service, then the function qfq_dequeue() does
not dequeue any packet. For this reason, to guarantee QFQ+ to be work
conserving, a just-activated aggregate must be set as in service
immediately if it happens to be the only active aggregate.
This is done by the function qfq_enqueue().

Unfortunately, the function qfq_add_to_agg(), used to add a class to
an aggregate, does not perform this important additional operation.
In particular, if: 1) qfq_add_to_agg() is invoked to complete the move
of a class from a source aggregate, becoming, for this move, inactive,
to a destination aggregate, becoming instead active, and 2) the
destination aggregate becomes the only active aggregate, then this
aggregate is not however set as in service. QFQ+ remains then in a
non-work-conserving state until a new invocation of qfq_enqueue()
recovers the situation.

This fix solves the problem by moving the logic for setting an
aggregate as in service directly into the function qfq_activate_agg().
Hence, from whatever point qfq_activate_agg() is invoked, QFQ+
remains work conserving.  Since the more-complex logic of this new
version of qfq_activate_agg() is not necessary, in qfq_dequeue(), to
reschedule an aggregate that finishes its budget, then the aggregate
is now rescheduled by invoking directly the functions needed.

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
624b85fb96 pkt_sched: sch_qfq: fix the update of eligible-group sets
Between two invocations of make_eligible, the system virtual time may
happen to grow enough that, in its binary representation, a bit with
higher order than 31 flips. This happens especially with
TSO/GSO. Before this fix, the mask used in make_eligible was computed
as (1UL<<index_of_last_flipped_bit)-1, whose value is well defined on
a 64-bit architecture, because index_of_flipped_bit <= 63, but is in
general undefined on a 32-bit architecture if index_of_flipped_bit > 31.
The fix just replaces 1UL with 1ULL.
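
The difference can be made concrete with a Python emulation. Shifting a 32-bit value by 32 or more bits is undefined in C, so the 32-bit function below models just one common outcome (the shift count being reduced modulo 32, as x86 does) and is an assumption for illustration; both function names are hypothetical.

```python
U64_MASK = 0xFFFFFFFFFFFFFFFF
U32_MASK = 0xFFFFFFFF


def mask_with_1ull(index):
    """(1ULL << index) - 1 on a 64-bit type; well defined for index <= 63."""
    return ((1 << index) - 1) & U64_MASK


def mask_with_1ul_32bit(index):
    """(1UL << index) - 1 with a 32-bit long, modelling an x86-like shift."""
    shifted = (1 << (index & 31)) & U32_MASK
    return (shifted - 1) & U32_MASK
```

For index 33 the 64-bit mask is 0x1FFFFFFFF, while the modelled 32-bit computation yields 1, so almost all of the intended bits are missing from the mask.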

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Paolo Valente
9b99b7e90b pkt_sched: sch_qfq: properly cap timestamps in charge_actual_service
QFQ+ schedules the active aggregates in a group using a bucket list
(one list per group). The bucket in which each aggregate is inserted
depends on the aggregate's timestamps, and the number
of buckets in a group is enough to accommodate the possible (range of)
values of the timestamps of all the aggregates in the group. For this
property to hold, timestamps must however be computed correctly.  One
necessary condition for computing timestamps correctly is that the
number of bits dequeued for each aggregate, while the aggregate is in
service, does not exceed the maximum budget budgetmax assigned to the
aggregate.

For each aggregate, budgetmax is proportional to the number of classes
in the aggregate. If the number of classes of the aggregate is
decreased through qfq_change_class(), then budgetmax is decreased
automatically as well.  Problems may occur if the aggregate is in
service when budgetmax is decreased, because the current remaining
budget of the aggregate and/or the service already received by the
aggregate may happen to be larger than the new value of budgetmax.  In
this case, when the aggregate is eventually deselected and its
timestamps are updated, the aggregate may happen to have received an
amount of service larger than budgetmax.  This may cause the aggregate
to be assigned a higher virtual finish time than the maximum
acceptable value for the last bucket in the bucket list of the group.

This fix introduces a cap that addresses this issue.
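
One way to picture such a cap (a minimal sketch; the struct and field
names below are illustrative assumptions, not the actual sch_qfq data
structures) is to clamp the service charged to an aggregate at its
current budgetmax:

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative stand-in for an aggregate; field names are assumptions. */
struct agg_sketch {
	uint32_t budgetmax;      /* current maximum budget */
	uint32_t initial_budget; /* budget when the aggregate was selected */
	uint32_t budget;         /* budget still unused (<= initial_budget) */
};

/* Service to charge: what was consumed, but never more than budgetmax. */
static uint32_t service_to_charge(const struct agg_sketch *agg)
{
	uint32_t used = agg->initial_budget - agg->budget;

	return used < agg->budgetmax ? used : agg->budgetmax;
}
```

If budgetmax shrinks while the aggregate is in service, the clamp keeps the
charged service, and hence the resulting finish time, within the range the
bucket list was sized for.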

Signed-off-by: Paolo Valente <paolo.valente@unimore.it>
Reviewed-by: Fabio Checconi <fchecconi@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Peter Hurley
f74861ca87 net/irda: Raise dtr in non-blocking open
DTR/RTS need to be raised, regardless of the open() mode, but not
if the port has already shut down.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:05 -05:00
Peter Hurley
0b176ce3a7 net/irda: Use barrier to set task state
Without a memory and compiler barrier, the task state change
can migrate relative to the condition testing in a blocking loop.
However, the task state change must be visible across all cpus
prior to testing those conditions. Failing to do this can result
in the familiar 'lost wakeup' and this task will hang until killed.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:04 -05:00
Peter Hurley
2f7c069b96 net/irda: Hold port lock while bumping blocked_open
Although tty_lock() already protects concurrent update to
blocked_open, that fails to meet the separation-of-concerns between
tty_port and tty.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:04 -05:00
Peter Hurley
a4ed2e737c net/irda: Fix port open counts
Saving the port count bump is unsafe. If the tty is hung up while
this open was blocking, the port count is zeroed.

Explicitly check if the tty was hung up while blocking, and correct
the port count if not.

Signed-off-by: Peter Hurley <peter@hurleysoftware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-06 02:47:04 -05:00
Eric Dumazet
82dc3c63c6 net: introduce NAPI_POLL_WEIGHT
Some drivers use a NAPI poll weight that is too big.

This patch adds a NAPI_POLL_WEIGHT default value
and issues an error message if a driver attempts
to use a bigger weight.
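
A minimal userspace sketch of that check, assuming a default of 64 and a
hypothetical napi_weight_ok() helper (the real check sits in the kernel's
NAPI registration path and reports through the kernel log):

```c
#include <assert.h>
#include <stdio.h>

#define NAPI_POLL_WEIGHT 64 /* assumed default value */

/* Hypothetical helper: complain when a driver asks for too large a weight. */
static int napi_weight_ok(const char *driver, int weight)
{
	if (weight > NAPI_POLL_WEIGHT) {
		fprintf(stderr, "%s: poll weight %d exceeds %d\n",
			driver, weight, NAPI_POLL_WEIGHT);
		return 0;
	}
	return 1;
}
```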

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-05 23:40:01 -05:00
Flavio Leitner
dd9f319d94 tcp: ipv6: bind() use stronger condition for bind_conflict
We must try harder to get unique (addr, port) pairs when
doing port autoselection for sockets with SO_REUSEADDR
option set.

This is a continuation of commit aacd9289af
for IPv6.

Signed-off-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-05 23:40:00 -05:00
Linus Torvalds
9da060d0ed Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:
 "A moderately sized pile of fixes, some specifically for merge window
  introduced regressions although others are for longer standing items
  and have been queued up for -stable.

  I'm kind of tired of all the RDS protocol bugs over the years, to be
  honest, it's way out of proportion to the number of people who
  actually use it.

   1) Fix missing range initialization in netfilter IPSET, from Jozsef
      Kadlecsik.

   2) ieee80211_local->tim_lock needs to use BH disabling, from Johannes
      Berg.

   3) Fix DMA syncing in SFC driver, from Ben Hutchings.

   4) Fix regression in BOND device MAC address setting, from Jiri
      Pirko.

   5) Missing usb_free_urb in ISDN Hisax driver, from Marina Makienko.

   6) Fix UDP checksumming in bnx2x driver for 57710 and 57711 chips,
      fix from Dmitry Kravkov.

   7) Missing cfgspace_lock initialization in BCMA driver.

   8) Validate parameter size for SCTP assoc stats getsockopt(), from
      Guenter Roeck.

   9) Fix SCTP association hangs, from Lee A Roberts.

  10) Fix jumbo frame handling in r8169, from Francois Romieu.

  11) Fix phy_device memory leak, from Petr Malat.

  12) Omit trailing FCS from frames received in BGMAC driver, from Hauke
      Mehrtens.

  13) Missing socket refcount release in L2TP, from Guillaume Nault.

  14) sctp_endpoint_init should respect passed in gfp_t, rather than use
      GFP_KERNEL unconditionally.  From Dan Carpenter.

  15) Add ASIX AX88179 USB driver, from Freddy Xin.

  16) Remove MAINTAINERS entries for drivers deleted during the merge
      window, from Cesar Eduardo Barros.

  17) RDS protocol can try to allocate huge amounts of memory, check
      that the user's request length makes sense, from Cong Wang.

  18) SCTP should use the provided KMALLOC_MAX_SIZE instead of its own,
      bogus, definition.  From Cong Wang.

  19) Fix deadlocks in FEC driver by moving TX reclaim into NAPI poll,
      from Frank Li.  Also, fix a build error introduced in the merge
      window.

  20) Fix bogus purging of default routes in ipv6, from Lorenzo Colitti.

  21) Don't double count RTT measurements when we leave the TCP receive
      fast path, from Neal Cardwell."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (61 commits)
  tcp: fix double-counted receiver RTT when leaving receiver fast path
  CAIF: fix sparse warning for caif_usb
  rds: simplify a warning message
  net: fec: fix build error in no MXC platform
  net: ipv6: Don't purge default router if accept_ra=2
  net: fec: put tx to napi poll function to fix dead lock
  sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
  rds: limit the size allocated by rds_message_alloc()
  MAINTAINERS: remove eexpress
  MAINTAINERS: remove drivers/net/wan/cycx*
  MAINTAINERS: remove 3c505
  caif_dev: fix sparse warnings for caif_flow_cb
  ax88179_178a: ASIX AX88179_178A USB 3.0/2.0 to gigabit ethernet adapter driver
  sctp: use the passed in gfp flags instead GFP_KERNEL
  ipv[4|6]: correct dropwatch false positive in local_deliver_finish
  l2tp: Restore socket refcount when sendmsg succeeds
  net/phy: micrel: Disable asymmetric pause for KSZ9021
  bgmac: omit the fcs
  phy: Fix phy_device_free memory leak
  bnx2x: Fix KR2 work-around condition
  ...
2013-03-05 18:42:29 -08:00
Neal Cardwell
aab2b4bf22 tcp: fix double-counted receiver RTT when leaving receiver fast path
We should not update ts_recent and call tcp_rcv_rtt_measure_ts() both
before and after going to step5. That wastes CPU and double-counts the
receiver-side RTT sample.

Signed-off-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00
Silviu-Mihai Popescu
d2123be0e5 CAIF: fix sparse warning for caif_usb
This fixes the following sparse warning:
net/caif/caif_usb.c:84:16: warning: symbol 'cfusbl_create' was not
declared. Should it be static?

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00
Cong Wang
7dac1b514a rds: simplify a warning message
Cc: David S. Miller <davem@davemloft.net>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00
Lorenzo Colitti
3e8b0ac3e4 net: ipv6: Don't purge default router if accept_ra=2
Setting net.ipv6.conf.<interface>.accept_ra=2 causes the kernel
to accept RAs even when forwarding is enabled. However, enabling
forwarding purges all default routes on the system, breaking
connectivity until the next RA is received. Fix this by not
purging default routes on interfaces that have accept_ra=2.
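
The intended behaviour can be sketched as follows. This is an
illustration under assumed names (struct iface_sketch and
purge_default_routes() are hypothetical), not the kernel's routing code:

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative per-interface state; field names are assumptions. */
struct iface_sketch {
	int accept_ra;         /* value of the accept_ra sysctl */
	int has_default_route; /* 1 if an RA-learned default route exists */
};

/*
 * Called when forwarding is enabled: purge default routes, except on
 * interfaces with accept_ra=2. Returns the number of routes purged.
 */
static size_t purge_default_routes(struct iface_sketch *ifs, size_t n)
{
	size_t purged = 0;

	for (size_t i = 0; i < n; i++) {
		if (ifs[i].accept_ra == 2 || !ifs[i].has_default_route)
			continue;
		ifs[i].has_default_route = 0;
		purged++;
	}
	return purged;
}
```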

Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:07 -05:00
Cong Wang
3f736868b4 sctp: use KMALLOC_MAX_SIZE instead of its own MAX_KMALLOC_SIZE
Don't define its own MAX_KMALLOC_SIZE, use the one
defined in mm.

Cc: Vlad Yasevich <vyasevich@gmail.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:06 -05:00
Cong Wang
ece6b0a2b2 rds: limit the size allocated by rds_message_alloc()
Dave Jones reported the following bug:

"When fed mangled socket data, rds will trust what userspace gives it,
and tries to allocate enormous amounts of memory larger than what
kmalloc can satisfy."

WARNING: at mm/page_alloc.c:2393 __alloc_pages_nodemask+0xa0d/0xbe0()
Hardware name: GA-MA78GM-S2H
Modules linked in: vmw_vsock_vmci_transport vmw_vmci vsock fuse bnep dlci bridge 8021q garp stp mrp binfmt_misc l2tp_ppp l2tp_core rfcomm s
Pid: 24652, comm: trinity-child2 Not tainted 3.8.0+ #65
Call Trace:
 [<ffffffff81044155>] warn_slowpath_common+0x75/0xa0
 [<ffffffff8104419a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff811444ad>] __alloc_pages_nodemask+0xa0d/0xbe0
 [<ffffffff8100a196>] ? native_sched_clock+0x26/0x90
 [<ffffffff810b2128>] ? trace_hardirqs_off_caller+0x28/0xc0
 [<ffffffff810b21cd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff811861f8>] alloc_pages_current+0xb8/0x180
 [<ffffffff8113eaaa>] __get_free_pages+0x2a/0x80
 [<ffffffff811934fe>] kmalloc_order_trace+0x3e/0x1a0
 [<ffffffff81193955>] __kmalloc+0x2f5/0x3a0
 [<ffffffff8104df0c>] ? local_bh_enable_ip+0x7c/0xf0
 [<ffffffffa0401ab3>] rds_message_alloc+0x23/0xb0 [rds]
 [<ffffffffa04043a1>] rds_sendmsg+0x2b1/0x990 [rds]
 [<ffffffff810b21cd>] ? trace_hardirqs_off+0xd/0x10
 [<ffffffff81564620>] sock_sendmsg+0xb0/0xe0
 [<ffffffff810b2052>] ? get_lock_stats+0x22/0x70
 [<ffffffff810b24be>] ? put_lock_stats.isra.23+0xe/0x40
 [<ffffffff81567f30>] sys_sendto+0x130/0x180
 [<ffffffff810b872d>] ? trace_hardirqs_on+0xd/0x10
 [<ffffffff816c547b>] ? _raw_spin_unlock_irq+0x3b/0x60
 [<ffffffff816cd767>] ? sysret_check+0x1b/0x56
 [<ffffffff810b8695>] ? trace_hardirqs_on_caller+0x115/0x1a0
 [<ffffffff81341d8e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
 [<ffffffff816cd742>] system_call_fastpath+0x16/0x1b
---[ end trace eed6ae990d018c8b ]---
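
The general shape of such a guard can be sketched in userspace as below.
This is an illustration only: checked_alloc() and SKETCH_MAX_ALLOC are
hypothetical, and the actual patch may bound the length differently.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_ALLOC (4u * 1024 * 1024) /* hypothetical upper bound */

/*
 * Validate a caller-supplied length before allocating.
 * Returns 0 and stores the buffer in *out, or -EMSGSIZE / -ENOMEM.
 */
static int checked_alloc(size_t len, void **out)
{
	*out = NULL;
	if (len > SKETCH_MAX_ALLOC)
		return -EMSGSIZE;
	*out = malloc(len ? len : 1);
	return *out ? 0 : -ENOMEM;
}
```

Rejecting the request up front means the allocator is never asked for a
size it cannot satisfy, so the warning above is not reached.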

Reported-by: Dave Jones <davej@redhat.com>
Cc: Dave Jones <davej@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-04 14:12:06 -05:00
Paul Bolle
9df9e78323 netfilter: nfnetlink: silence warning if CONFIG_PROVE_RCU isn't set
Since commit c14b78e7de ("netfilter:
nfnetlink: add mutex per subsystem") building nfnetlink.o without
CONFIG_PROVE_RCU set triggers this GCC warning:
    net/netfilter/nfnetlink.c:65:22: warning: ‘nfnl_get_lock’ defined but not used [-Wunused-function]

The cause of that warning is, in short, that rcu_lockdep_assert()
compiles away if CONFIG_PROVE_RCU is not set. Silence this warning by
open coding nfnl_get_lock() in the sole place it was called, which
allows that function to be removed.

Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-04 14:45:36 +01:00
Gao feng
ed018fa4df netfilter: xt_AUDIT: only generate audit log when audit enabled
We should stop generating audit log if audit is disabled.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-04 14:45:25 +01:00
Eric W. Biederman
7f78e03513 fs: Limit sys_mount to only request filesystem modules.
Modify the request_module to prefix the file system type with "fs-"
and add aliases to all of the filesystems that can be built as modules
to match.

A common practice is to build all of the kernel code and leave code
that is not commonly needed as modules, with the result that many
users are exposed to any bug anywhere in the kernel.

Looking for filesystems with a fs- prefix limits the pool of possible
modules that can be loaded by mount to just filesystems trivially
making things safer with no real cost.

Using aliases means user space can control the policy of which
filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
with blacklist and alias directives, allowing simple, safe,
well understood work-arounds to known problematic software.

This also addresses a rare but unfortunate problem where the filesystem
name is not the same as its module name and module auto-loading
would not work.  While writing this patch I saw a handful of such
cases.  The most significant being autofs that lives in the module
autofs4.

This is relevant to user namespaces because we can reach the request
module in get_fs_type() without having any special permissions, and
people get uncomfortable when a user specified string (in this case
the filesystem type) goes all of the way to request_module.

After having looked at this issue I don't think there is any
particular reason to perform any filtering or permission checks beyond
making it clear in the module request that we want a filesystem
module.  The common pattern in the kernel is to call request_module()
without regard to the user's permissions.  In general all a filesystem
module does once loaded is call register_filesystem() and go to sleep.
Which means there is not much attack surface exposed by loading a
filesystem module unless the filesystem is mounted.  In a user
namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
which most filesystems do not set today.
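
The naming scheme described above can be sketched in userspace as
follows. fs_module_alias() is a hypothetical helper used only to show how
the "fs-" prefix is formed; it is not the kernel's request_module() call.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: build the module alias that a mount of filesystem
 * type `fstype` would request. Returns 0 on success, or -1 if the result
 * does not fit in `buf`.
 */
static int fs_module_alias(const char *fstype, char *buf, size_t len)
{
	int n = snprintf(buf, len, "fs-%s", fstype);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the lookup goes through an alias rather than the bare name, a
request for "fs-autofs" can resolve to the autofs4 module even though the
filesystem name and the module name differ.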

Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: Kees Cook <keescook@chromium.org>
Reported-by: Kees Cook <keescook@google.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2013-03-03 19:36:31 -08:00
Silviu-Mihai Popescu
d6e89c0b76 caif_dev: fix sparse warnings for caif_flow_cb
This fixed the following sparse warning:
net/caif/caif_dev.c:121:6: warning: symbol 'caif_flow_cb' was not
declared. Should it be static?

Signed-off-by: Silviu-Mihai Popescu <silviupopescu1990@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-03 01:43:48 -05:00
Linus Torvalds
8d05b3771d NFS client bugfixes for Linux 3.9
- Don't allow NFS silly-renamed files to be deleted
 - Don't start the retransmission timer when out of socket space
 - Fix a couple of pnfs-related Oopses.
 - Fix one more NFSv4 state recovery deadlock
 - Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER

Merge tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs

Pull NFS client bugfixes from Trond Myklebust:
 "We've just concluded another Connectathon interoperability testing
  week, and so here are the fixes for the bugs that were discovered:

   - Don't allow NFS silly-renamed files to be deleted
   - Don't start the retransmission timer when out of socket space
   - Fix a couple of pnfs-related Oopses.
   - Fix one more NFSv4 state recovery deadlock
   - Don't loop forever when LAYOUTGET returns NFS4ERR_LAYOUTTRYLATER"

* tag 'nfs-for-3.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
  SUNRPC: One line comment fix
  NFSv4.1: LAYOUTGET EDELAY loops timeout to the MDS
  SUNRPC: add call to get configured timeout
  PNFS: set the default DS timeout to 60 seconds
  NFSv4: Fix another open/open_recovery deadlock
  nfs: don't allow nfs_find_actor to match inodes of the wrong type
  NFSv4.1: Hold reference to layout hdr in layoutget
  pnfs: fix resend_to_mds for directio
  SUNRPC: Don't start the retransmission timer when out of socket space
  NFS: Don't allow NFS silly-renamed files to be deleted, no signal
2013-03-02 16:46:07 -08:00
Trond Myklebust
512e4b291c SUNRPC: One line comment fix
Reported-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-03-02 15:54:11 -08:00
Joe Perches
f9caed59f8 netfilter: nf_ct_helper: Fix logging for dropped packets
Update nf_ct_helper_log to emit args along with the format.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2013-03-02 22:44:45 +01:00
Felix Fietkau
801d929ca7 mac80211: another fix for idle handling in monitor mode
When setting a monitor interface up or down, the idle state needs to be
recalculated, otherwise the hardware will just stay in its previous idle
state.

Signed-off-by: Felix Fietkau <nbd@openwrt.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-02 21:24:42 +01:00
Dan Carpenter
81ce0dbc11 sctp: use the passed in gfp flags instead GFP_KERNEL
This patch doesn't change how the code works because in the current
kernel gfp is always GFP_KERNEL.  But gfp was obviously intended
instead of GFP_KERNEL.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-01 15:59:56 -05:00
Neil Horman
d8c6f4b9b7 ipv[4|6]: correct dropwatch false positive in local_deliver_finish
I had a report recently of a user trying to use dropwatch to localise some frame
loss, and they were getting false positives.  Turned out they were using a user
space SCTP stack that used raw sockets to grab frames.  When we don't have a
registered protocol for a given packet, we record it as a drop, even if a raw
socket receives the frame.  We should only record the drop in the event a raw
socket doesn't exist to receive the frames.

Tested by the reporter successfully.
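
The decision described above reduces to a small predicate. The sketch
below uses hypothetical names (enum skb_fate, unhandled_packet_fate());
in the kernel the distinction is presumably made by how the packet buffer
is freed, which is what drop monitoring observes.

```c
#include <assert.h>

/* Outcome for a packet with no registered protocol handler. */
enum skb_fate { SKB_DROPPED, SKB_CONSUMED };

/*
 * Hypothetical decision helper: count a drop only when no raw socket
 * took the packet either.
 */
static enum skb_fate unhandled_packet_fate(int raw_socket_received)
{
	return raw_socket_received ? SKB_CONSUMED : SKB_DROPPED;
}
```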

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: William Reich <reich@ulticom.com>
Tested-by: William Reich <reich@ulticom.com>
CC: "David S. Miller" <davem@davemloft.net>
CC: William Reich <reich@ulticom.com>
CC: eric.dumazet@gmail.com
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-01 15:56:29 -05:00
David S. Miller
9e0aab8649 Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless
John W. Linville says:

====================
This is another flurry of fixes intended for the 3.9 stream...

A mac80211 pull from Johannes:

"Seth fixes a stupid bug I introduced into one of his earlier patches,
Chun-Yeow fixes mesh forwarding and Felix fixes monitor mode. I myself
fixed a small locking issue and, the biggest change here, removed some
nl80211 information with which sometimes the per wiphy information was
getting too large for the typical 4k-minus-overhead. In my -next tree I
have a patch to allow splitting that and add back the information
removed now."

An iwlwifi pull from Johannes:

"I have a fix for a pretty important bug regarding DMA mapping, that
could cause the DMA engine to overwrite data we wanted to send to it, so
that the next time we send it it would be bad. This particularly affects
calibration results. Other than that, three little fixes for the MVM
driver."

But wait, there's more!

Avinash Patil fixes an incorrectly timed delay in mwifiex.

Bing Zhao prevents a crash in SD8688 caused by failing to properly
set a flag before issuing a command.

Felix Fietkau is the big hero this time, providing a trio of minor
ath9k fixes and correcting the advertised interface combinations for
rt2x00 when mesh support is disabled.

Finally, Hauke Mehrtens gives us a patch that correctly initializes
a spin lock in the bcma code.

Please let me know if there are problems!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-01 15:54:12 -05:00
Guillaume Nault
8b82547e33 l2tp: Restore socket refcount when sendmsg succeeds
The sendmsg() syscall handler for PPPoL2TP doesn't decrease the socket
reference counter after successful transmissions. Any successful
sendmsg() call from userspace will then increase the reference counter
forever, thus preventing the kernel's session and tunnel data from
being freed later on.

The problem only happens when writing directly on L2TP sockets.
PPP sockets attached to L2TP are unaffected as the PPP subsystem
uses pppol2tp_xmit() which symmetrically increases/decreases reference
counters.

This patch adds the missing call to sock_put() before returning from
pppol2tp_sendmsg().
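
The pattern being restored is a balanced hold/put on every exit path. A
toy sketch, with a hypothetical struct sock_sketch standing in for the
socket and sketch_sendmsg() standing in for the send handler:

```c
#include <assert.h>

/* Toy reference-counted object standing in for a socket. */
struct sock_sketch { int refcnt; };

static void sketch_hold(struct sock_sketch *sk) { sk->refcnt++; }
static void sketch_put(struct sock_sketch *sk)  { sk->refcnt--; }

/*
 * Hypothetical send path: the reference taken on entry is released on
 * both the error path and the success path.
 */
static int sketch_sendmsg(struct sock_sketch *sk, int len)
{
	int ret;

	sketch_hold(sk);
	if (len < 0) {
		ret = -1;	/* transmission failed */
		goto out;
	}
	ret = len;		/* transmission succeeded */
out:
	sketch_put(sk);
	return ret;
}
```

If the put were skipped on success, every successful call would leave the
count one higher and the object could never be freed.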

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-03-01 14:13:09 -05:00
John W. Linville
98b7ff9a49 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless into for-davem 2013-03-01 13:52:03 -05:00
Johannes Berg
24af717c35 mac80211: fix VHT MCS calculation
The VHT MCSes we advertise to the AP were supposed to
be restricted to the AP, but due to a bug in the logic
mac80211 will advertise rates to the AP that aren't
even supported by the local device. To fix this skip
any adjustment if the NSS isn't supported at all.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-01 19:20:25 +01:00
Marco Porsch
7cbf9d017d mac80211: fix oops on mesh PS broadcast forwarding
Introduced with de74a1d903
"mac80211: fix WPA with VLAN on AP side with ps-sta".
Apparently overwrites the sdata pointer with non-valid data in
the case of mesh.
Fix this by checking for IFTYPE_AP_VLAN.

Signed-off-by: Marco Porsch <marco@cozybit.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-01 16:09:41 +01:00
Johannes Berg
645e77def9 nl80211: increase wiphy dump size dynamically
Given a device with many channel capabilities the wiphy
information can still overflow even though its size in
3.9 was reduced to 3.8 levels. For new userspace and
kernel 3.10 we're going to implement a new "split dump"
protocol that can use multiple messages per wiphy.

For now though, add a workaround to be able to send more
information to userspace. Since generic netlink doesn't
have a way to set the minimum dump size globally, and we
wouldn't really want to set it globally anyway, increase
the size only when needed, as described in the comments.
As userspace might not be prepared for large buffers, we
can only use 4k.

Also increase the size for the get_wiphy command.

Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-03-01 15:05:19 +01:00
Linus Torvalds
b6669737d3 Merge branch 'for-3.9' of git://linux-nfs.org/~bfields/linux
Pull nfsd changes from J Bruce Fields:
 "Miscellaneous bugfixes, plus:

   - An overhaul of the DRC cache by Jeff Layton.  The main effect is
     just to make it larger.  This decreases the chances of intermittent
     errors especially in the UDP case.  But we'll need to watch for any
     reports of performance regressions.

   - Containerized nfsd: with some limitations, we now support
     per-container nfs-service, thanks to extensive work from Stanislav
     Kinsbursky over the last year."

Some notes about conflicts, since there were *two* non-data semantic
conflicts here:

 - idr_remove_all() had been added by a memory leak fix, but has since
   become deprecated since idr_destroy() does it for us now.

 - xs_local_connect() had been added by this branch to make AF_LOCAL
   connections be synchronous, but in the meantime Trond had changed the
   calling convention in order to avoid a RCU dereference.

There were a couple of more obvious actual source-level conflicts due to
the hlist traversal changes and one just due to code changes next to
each other, but those were trivial.

* 'for-3.9' of git://linux-nfs.org/~bfields/linux: (49 commits)
  SUNRPC: make AF_LOCAL connect synchronous
  nfsd: fix compiler warning about ambiguous types in nfsd_cache_csum
  svcrpc: fix rpc server shutdown races
  svcrpc: make svc_age_temp_xprts enqueue under sv_lock
  lockd: nlmclnt_reclaim(): avoid stack overflow
  nfsd: enable NFSv4 state in containers
  nfsd: disable usermode helper client tracker in container
  nfsd: use proper net while reading "exports" file
  nfsd: containerize NFSd filesystem
  nfsd: fix comments on nfsd_cache_lookup
  SUNRPC: move cache_detail->cache_request callback call to cache_read()
  SUNRPC: remove "cache_request" argument in sunrpc_cache_pipe_upcall() function
  SUNRPC: rework cache upcall logic
  SUNRPC: introduce cache_detail->cache_request callback
  NFS: simplify and clean cache library
  NFS: use SUNRPC cache creation and destruction helper for DNS cache
  nfsd4: free_stid can be static
  nfsd: keep a checksum of the first 256 bytes of request
  sunrpc: trim off trailing checksum before returning decrypted or integrity authenticated buffer
  sunrpc: fix comment in struct xdr_buf definition
  ...
2013-02-28 18:02:55 -08:00
Linus Torvalds
1cf0209c43 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph updates from Sage Weil:
 "A few groups of patches here.  Alex has been hard at work improving
  the RBD code, layout groundwork for understanding the new formats and
  doing layering.  Most of the infrastructure is now in place for the
  final bits that will come with the next window.

  There are a few changes to the data layout.  Jim Schutt's patch fixes
  some non-ideal CRUSH behavior, and a set of patches from me updates
  the client to speak a newer version of the protocol and implement an
  improved hashing strategy across storage nodes (when the server side
  supports it too).

  A pair of patches from Sam Lang fix the atomicity of open+create
  operations.  Several patches from Yan, Zheng fix various mds/client
  issues that turned up during multi-mds torture tests.

  A final set of patches expose file layouts via virtual xattrs, and
  allow the policies to be set on directories via xattrs as well
  (avoiding the awkward ioctl interface and providing a consistent
  interface for both kernel mount and ceph-fuse users)."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (143 commits)
  libceph: add support for HASHPSPOOL pool flag
  libceph: update osd request/reply encoding
  libceph: calculate placement based on the internal data types
  ceph: update support for PGID64, PGPOOL3, OSDENC protocol features
  ceph: update "ceph_features.h"
  libceph: decode into cpu-native ceph_pg type
  libceph: rename ceph_pg -> ceph_pg_v1
  rbd: pass length, not op for osd completions
  rbd: move rbd_osd_trivial_callback()
  libceph: use a do..while loop in con_work()
  libceph: use a flag to indicate a fault has occurred
  libceph: separate non-locked fault handling
  libceph: encapsulate connection backoff
  libceph: eliminate sparse warnings
  ceph: eliminate sparse warnings in fs code
  rbd: eliminate sparse warnings
  libceph: define connection flag helpers
  rbd: normalize dout() calls
  rbd: barriers are hard
  rbd: ignore zero-length requests
  ...
2013-02-28 17:43:09 -08:00
Weston Andros Adamson
edddbb1eda SUNRPC: add call to get configured timeout
Returns the configured timeout for the xprt of the rpc client.

Signed-off-by: Weston Andros Adamson <dros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2013-02-28 17:35:20 -08:00
Lee A. Roberts
d003b41b80 sctp: fix association hangs due to partial delivery errors
In sctp_ulpq_tail_data(), use return values 0,1 to indicate whether
a complete event (with MSG_EOR set) was delivered.  A return value
of -ENOMEM continues to indicate an out-of-memory condition was
encountered.

In sctp_ulpq_retrieve_partial() and sctp_ulpq_retrieve_first(),
correct message reassembly logic for SCTP partial delivery.
Change logic to ensure that as much data as possible is sent
with the initial partial delivery and that following partial
deliveries contain all available data.

In sctp_ulpq_partial_delivery(), attempt partial delivery only
if the data on the head of the reassembly queue is at or before
the cumulative TSN ACK point.

In sctp_ulpq_renege(), use the modified return values from
sctp_ulpq_tail_data() to choose whether to attempt partial
delivery or to attempt to drain the reassembly queue as a
means to reduce memory pressure.  Remove call to
sctp_tsnmap_mark(), as this is handled correctly in call to
sctp_ulpq_tail_data().

Signed-off-by: Lee A. Roberts <lee.roberts@hp.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2013-02-28 15:34:27 -05:00
Lee A. Roberts
95ac7b859f sctp: fix association hangs due to errors when reneging events from the ordering queue
In sctp_ulpq_renege_list(), events being reneged from the
ordering queue may correspond to multiple TSNs.  Identify
all affected packets; sum freed space and renege from the
tsnmap.

Signed-off-by: Lee A. Roberts <lee.roberts@hp.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2013-02-28 15:34:26 -05:00
Lee A. Roberts
e67f85ecd8 sctp: fix association hangs due to reneging packets below the cumulative TSN ACK point
In sctp_ulpq_renege_list(), do not renege packets below the
cumulative TSN ACK point.

Signed-off-by: Lee A. Roberts <lee.roberts@hp.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2013-02-28 15:34:26 -05:00
Lee A. Roberts
70fc69bc5a sctp: fix association hangs due to off-by-one errors in sctp_tsnmap_grow()
In sctp_tsnmap_mark(), correct off-by-one error when calculating
size value for sctp_tsnmap_grow().

In sctp_tsnmap_grow(), correct off-by-one error when copying
and resizing the tsnmap.  If max_tsn_seen is in the LSB of the
word, this bit can be lost, causing the corresponding packet
to be transmitted again and to be entered as a duplicate into
the SCTP reassembly/ordering queues.  Change parameter name
from "gap" (zero-based index) to "size" (one-based) to enhance
code readability.
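
The zero-based versus one-based distinction can be illustrated with a
short sketch. The helper names are hypothetical and this is not the SCTP
tsnmap code; it only shows why a bit index must be converted to a size
before computing how many words a bitmap needs.

```c
#include <assert.h>
#include <limits.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Words needed to hold `size` bits, where size is a one-based count. */
static unsigned long words_for_size(unsigned long size)
{
	return (size + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Words needed so that the bit at zero-based index `gap` fits. */
static unsigned long words_for_gap(unsigned long gap)
{
	return words_for_size(gap + 1);
}
```

Passing a gap where a size is expected under-allocates exactly when the
gap is a multiple of the word size: bit index BITS_PER_WORD lives in the
second word, but words_for_size(BITS_PER_WORD) returns 1.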

Signed-off-by: Lee A. Roberts <lee.roberts@hp.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
2013-02-28 15:34:26 -05:00
J. Bruce Fields
dc107402ae SUNRPC: make AF_LOCAL connect synchronous
It doesn't appear that anyone actually needs to connect asynchronously.

Also, using a workqueue for the connect means we lose the namespace
information from the original process.  This is a problem since there's
no way to explicitly pass in a filesystem namespace for resolution of an
AF_LOCAL address.

Acked-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
2013-02-28 09:47:17 -08:00
Johannes Berg
feda30271e mac80211: really fix monitor mode channel reporting
After Felix's patch it was still broken in case you
used more than just a single monitor interface. Fix
it better now.

Reported-by: Sujith Manoharan <sujith@msujith.org>
Tested-by: Sujith Manoharan <sujith@msujith.org>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2013-02-28 09:59:22 +01:00
Sasha Levin
b67bfe0d42 hlist: drop the node parameter from iterators
I'm not sure why, but the hlist for each entry iterators were conceived
differently from the list ones:

        list_for_each_entry(pos, head, member)

The hlist ones were greedy and wanted an extra parameter:

        hlist_for_each_entry(tpos, pos, head, member)

Why did they need an extra pos parameter? I'm not quite sure. Not only
do they not really need it, it also prevents the iterator from looking
exactly like the list iterator, which is unfortunate.
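
For illustration, a simplified userspace sketch of the three-argument
form follows. It is not the code from linux/list.h: the node type here is
singly linked, struct item and sum_items() are hypothetical, and
__typeof__ is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified singly linked hash-list types (assumption for the sketch). */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a node pointer to its containing entry, or NULL at the list end. */
#define hlist_entry_safe(ptr, type, member) \
	((ptr) ? container_of(ptr, type, member) : NULL)

/* Three-argument iterator: no separate node cursor is needed. */
#define hlist_for_each_entry(pos, head, member) \
	for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
	     pos; \
	     pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Hypothetical entry type embedding a list node. */
struct item {
	int value;
	struct hlist_node node;
};

/* Walk the list with only the entry pointer as cursor. */
static int sum_items(struct hlist_head *head)
{
	struct item *it;
	int sum = 0;

	hlist_for_each_entry(it, head, node)
		sum += it->value;
	return sum;
}
```

The caller declares a single cursor of the entry type, which is the same
shape as list_for_each_entry(pos, head, member).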

Besides the semantic patch, there was some manual work required:

 - Fix up the actual hlist iterators in linux/list.h
 - Fix up the declaration of other iterators based on the hlist ones.
 - A very small number of places were using the 'node' parameter; these
 were modified to use 'obj->member' instead.
 - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
 properly, so those had to be fixed up manually.

The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:

@@
iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;

type T;
expression a,c,d,e;
identifier b;
statement S;
@@

-T b;
    <+... when != b
(
hlist_for_each_entry(a,
- b,
c, d) S
|
hlist_for_each_entry_continue(a,
- b,
c) S
|
hlist_for_each_entry_from(a,
- b,
c) S
|
hlist_for_each_entry_rcu(a,
- b,
c, d) S
|
hlist_for_each_entry_rcu_bh(a,
- b,
c, d) S
|
hlist_for_each_entry_continue_rcu_bh(a,
- b,
c) S
|
for_each_busy_worker(a, c,
- b,
d) S
|
ax25_uid_for_each(a,
- b,
c) S
|
ax25_for_each(a,
- b,
c) S
|
inet_bind_bucket_for_each(a,
- b,
c) S
|
sctp_for_each_hentry(a,
- b,
c) S
|
sk_for_each(a,
- b,
c) S
|
sk_for_each_rcu(a,
- b,
c) S
|
sk_for_each_from
-(a, b)
+(a)
S
+ sk_for_each_from(a) S
|
sk_for_each_safe(a,
- b,
c, d) S
|
sk_for_each_bound(a,
- b,
c) S
|
hlist_for_each_entry_safe(a,
- b,
c, d, e) S
|
hlist_for_each_entry_continue_rcu(a,
- b,
c) S
|
nr_neigh_for_each(a,
- b,
c) S
|
nr_neigh_for_each_safe(a,
- b,
c, d) S
|
nr_node_for_each(a,
- b,
c) S
|
nr_node_for_each_safe(a,
- b,
c, d) S
|
- for_each_gfn_sp(a, c, d, b) S
+ for_each_gfn_sp(a, c, d) S
|
- for_each_gfn_indirect_valid_sp(a, c, d, b) S
+ for_each_gfn_indirect_valid_sp(a, c, d) S
|
for_each_host(a,
- b,
c) S
|
for_each_host_safe(a,
- b,
c, d) S
|
for_each_mesh_entry(a,
- b,
c, d) S
)
    ...+>

[akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
[akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
[akpm@linux-foundation.org: checkpatch fixes]
[akpm@linux-foundation.org: fix warnings]
[akpm@linux-foundation.org: redo intrusive kvm changes]
Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:24 -08:00
Tejun Heo
94960e8c2e sctp: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Cc: Sridhar Samudrala <sri@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:20 -08:00
Tejun Heo
9475af6e44 mac80211: convert to idr_alloc()
Convert to the much saner new idr interface.

Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-02-27 19:10:20 -08:00