Commit graph

1829 commits

Author SHA1 Message Date
Eric Dumazet
fa9a86ddc8 netfilter: use rcu_read_bh() in ipt_do_table()
Commit 784544739a
(netfilter: iptables: lock free counters) forgot to disable BH
in arpt_do_table(), ipt_do_table() and  ip6t_do_table()

Use rcu_read_lock_bh() instead of rcu_read_lock() cures the problem.

Reported-and-bisected-by: Roman Mindalev <r000n@r000n.net>
Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-04-02 00:54:43 -07:00
Jesper Nilsson
71f6f6dfdf ipv6: Plug sk_buff leak in ipv6_rcv (net/ipv6/ip6_input.c)
Commit 778d80be52
(ipv6: Add disable_ipv6 sysctl to disable IPv6 operaion on specific interface)
seems to have introduced a leak of sk_buff's for ipv6 traffic,
at least in some configurations where idev is NULL, or when ipv6
is disabled via sysctl.

The problem is that if the first condition of the if-statement
returns non-NULL, it returns an skb with only one reference,
and when the other conditions apply, execution jumps to the "out"
label, which does not call kfree_skb for it.

To plug this leak, change to use the "drop" label instead.
(this relies on it being ok to call kfree_skb on NULL)
This also allows us to avoid calling rcu_read_unlock here,
and removes the only user of the "out" label.

Signed-off-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-27 00:17:45 -07:00
David S. Miller
01e6de64d9 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-03-26 22:45:23 -07:00
Holger Eitzenberger
a400c30edb netfilter: nf_conntrack: calculate per-protocol nlattr size
Signed-off-by: Holger Eitzenberger <holger@eitzenberger.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 21:53:39 +01:00
Patrick McHardy
1f9352ae22 netfilter: {ip,ip6,arp}_tables: fix incorrect loop detection
Commit e1b4b9f ([NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case
search for loops) introduced a regression in the loop detection algorithm,
causing sporadic incorrectly detected loops.

When a chain has already been visited during the check, it is treated as
having a standard target containing a RETURN verdict directly at the
beginning in order to not check it again. The real target of the first
rule is then incorrectly treated as STANDARD target and checked not to
contain invalid verdicts.

Fix by making sure the rule does actually contain a standard target.

Based on patch by Francis Dupont <Francis_Dupont@isc.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 19:26:35 +01:00
Eric Dumazet
b8dfe49877 netfilter: factorize ifname_compare()
We use same not trivial helper function in four places. We can factorize it.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-25 17:31:52 +01:00
Vlad Yasevich
b2f5e7cd3d ipv6: Fix conflict resolutions during ipv6 binding
The ipv6 version of bind_conflict code calls ipv6_rcv_saddr_equal()
which at times wrongly identified intersections between addresses.
It particularly broke down under a few instances and caused erroneous
bind conflicts.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 19:49:11 -07:00
Vlad Yasevich
63d9950b08 ipv6: Make v4-mapped bindings consistent with IPv4
Binding to a v4-mapped address on an AF_INET6 socket should
produce the same result as binding to an IPv4 address on
AF_INET socket.  The two are interchangable as v4-mapped
address is really a portability aid.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 19:49:10 -07:00
Vlad Yasevich
0f8d3c7ac3 ipv6: Allow ipv4 wildcard binds after ipv6 address binds
The IPv4 wildcard (0.0.0.0) address does not intersect
in any way with explicit IPv6 addresses.  These two should
be permitted, but the IPv4 conflict code checks the ipv6only
bit as part of the test.  Since binding to an explicit IPv6
address restricts the socket to only that IPv6 address, the
side-effect is that the socket behaves as v6-only.  By
explicitely setting ipv6only in this case, allows the 2 binds
to succeed.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 19:49:10 -07:00
Vlad Yasevich
783ed5a783 ipv6: Disallow binding to v4-mapped address on v6-only socket.
A socket marked v6-only, can not receive or send traffic to v4-mapped
addresses.  Thus allowing binding to v4-mapped address on such a
socket makes no sense.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 19:49:09 -07:00
Jan Engelhardt
8dd1d0471b netfilter: trivial Kconfig spelling fixes
Supplements commit 67c0d57930.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-24 13:35:27 -07:00
David S. Miller
b5bb14386e Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6 2009-03-24 13:24:36 -07:00
Ilpo Järvinen
a0bffffc14 net/*: use linux/kernel.h swap()
tcp_sack_swap seems unnecessary so I pushed swap to the caller.
Also removed comment that seemed then pointless, and added include
when not already there. Compile tested.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-21 13:36:17 -07:00
David S. Miller
2b1c4354de Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/virtio_net.c
2009-03-20 02:27:41 -07:00
Jorge Boncompte [DTI2]
2bad35b7c9 netns: oops in ip[6]_frag_reasm incrementing stats
dev can be NULL in ip[6]_frag_reasm for skb's coming from RAW sockets.

Quagga's OSPFD sends fragmented packets on a RAW socket, when netfilter
conntrack reassembles them on the OUTPUT path you hit this code path.

You can test it with something like "hping2 -0 -d 2000 -f AA.BB.CC.DD"

With help from Jarek Poplawski.

Signed-off-by: Jorge Boncompte [DTI2] <jorge@dti2.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 23:26:11 -07:00
Bjørn Mork
1b1d8f73a4 ipv6: fix display of local and remote sit endpoints
This fixes the regressions cause by
commit 1326c3d5a4
(v2.6.28-rc6-461-g23a12b1) broke the display of local and remote
addresses of an SIT tunnel in iproute2.

nt->parms is used by ipip6_tunnel_init() and therefore need to be
initialized first.

Tracked as http://bugzilla.kernel.org/show_bug.cgi?id=12868

Reported-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 18:56:54 -07:00
Brian Haley
9bdd8d40c8 ipv6: Fix incorrect disable_ipv6 behavior
Fix the behavior of allowing both sysctl and addrconf_dad_failure()
to set the disable_ipv6 parameter without any bad side-effects.
If DAD fails and accept_dad > 1, we will still set disable_ipv6=1,
but then instead of allowing an RA to add an address then
immediately fail DAD, we simply don't allow the address to be
added in the first place.  This also lets the user set this flag
and disable all IPv6 addresses on the interface, or on the entire
system.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-18 18:22:48 -07:00
David S. Miller
2d6a5e9500 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/igb/igb_main.c
	drivers/net/qlge/qlge_main.c
	drivers/net/wireless/ath9k/ath9k.h
	drivers/net/wireless/ath9k/core.h
	drivers/net/wireless/ath9k/hw.c
2009-03-17 15:01:30 -07:00
David S. Miller
4ada8107f4 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 2009-03-17 13:12:47 -07:00
Christoph Paasch
d1238d5337 netfilter: conntrack: check for NEXTHDR_NONE before header sanity checking
NEXTHDR_NONE doesn't has an IPv6 option header, so the first check
for the length will always fail and results in a confusing message
"too short" if debugging enabled. With this patch, we check for
NEXTHDR_NONE before length sanity checkings are done.

Signed-off-by: Christoph Paasch <christoph.paasch@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:52:11 +01:00
Scott James Remnant
26c3b67806 netfilter: auto-load ip6_queue module when socket opened
The ip6_queue module is missing the net-pf-16-proto-13 alias that would
cause it to be auto-loaded when a socket of that type is opened.  This
patch adds the alias.

Signed-off-by: Scott James Remnant <scott@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:30:14 +01:00
Christoph Paasch
9d2493f88f netfilter: remove IPvX specific parts from nf_conntrack_l4proto.h
Moving the structure definitions to the corresponding IPvX specific header files.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 15:15:35 +01:00
Eric Leblond
ca735b3aaa netfilter: use a linked list of loggers
This patch modifies nf_log to use a linked list of loggers for each
protocol. This list of loggers is read and write protected with a
mutex.

This patch separates registration and binding. To be used as
logging module, a module has to register calling nf_log_register()
and to bind to a protocol it has to call nf_log_bind_pf().
This patch also converts the logging modules to the new API. For nfnetlink_log,
it simply switchs call to register functions to call to bind function and
adds a call to nf_log_register() during init. For other modules, it just
remove a const flag from the logger structure and replace it with a
__read_mostly.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-03-16 14:54:21 +01:00
John Dykstra
ff8cf9a938 ipv6: Fix BUG when disabled ipv6 module is unloaded
Do not try to "uninitialize" ipv6 if its initialization had been skipped
because module parameter disable=1 had been specified.

Reported-by:  Thomas Backlund <tmb@mandriva.org>
Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Acked-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-11 09:22:51 -07:00
Stephen Hemminger
7546dd97d2 net: convert usage of packet_type to read_mostly
Protocols that use packet_type can be __read_mostly section for better
locality. Elminate any unnecessary initializations of NULL.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-10 05:22:43 -07:00
David S. Miller
508827ff0a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/tokenring/tmspci.c
	drivers/net/ucc_geth_mii.c
2009-03-05 02:06:47 -08:00
Brian Haley
fe7ca2e1e8 IPv6: add "disable" module parameter support to ipv6.ko
Add "disable" module parameter support to ipv6.ko by specifying
"disable=1" on module load.  We just do the minimum of initializing
inetsw6[] so calls from other modules to inet6_register_protosw()
won't OOPs, then bail out.  No IPv6 addresses or sockets can be
created as a result, and a reboot is required to enable IPv6.

Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-04 03:19:08 -08:00
Daniel Lezcano
176c39af29 netns: fix addrconf_ifdown kernel panic
When a network namespace is destroyed the network interfaces are
all unregistered, making addrconf_ifdown called by the netdevice
notifier. 
In the other hand, the addrconf exit method does a loop on the network
devices and does addrconf_ifdown on each of them. But the ordering of 
the netns subsystem is not right because it uses the register_pernet_device
instead of register_pernet_subsys. If we handle the loopback as
any network device, we can safely use register_pernet_subsys.

But if we use register_pernet_subsys, the addrconf exit method will do
exactly what was already done with the unregistering of the network
devices. So in definitive, this code is pointless.

I removed the netns addrconf exit method and moved the code to the
addrconf cleanup function.

Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 01:06:45 -08:00
Stephen Hemminger
b325fddb7f ipv6: Fix sysctl unregistration deadlock
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-03-03 00:47:47 -08:00
David S. Miller
aa4abc9bcc Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-tx.c
	net/8021q/vlan_core.c
	net/core/dev.c
2009-03-01 21:35:16 -08:00
Pavel Emelyanov
3f53a38131 ipv6: don't use tw net when accounting for recycled tw
We already have a valid net in that place, but this is not just a
cleanup - the tw pointer can be NULL there sometimes, thus causing
an oops in NET_NS=y case.

The same place in ipv4 code already works correctly using existing 
net, rather than tw's one.

The bug exists since 2.6.27.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-26 03:35:13 -08:00
David S. Miller
f11c179eea Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/orinoco/orinoco.c
2009-02-25 00:02:05 -08:00
Wei Yongjun
bb80087a94 sit: used time_before for comparing jiffies
The functions time_before is more robust for comparing
jiffies against other values.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 23:37:19 -08:00
Wei Yongjun
800d55f146 ipv6: Remove some pointless conditionals before kfree_skb()
Remove some pointless conditionals before kfree_skb().

The semantic match that finds the problem is as follows:
(http://www.emn.fr/x-info/coccinelle/)

// <smpl>
@@
expression E;
@@
- if (E)
- 	kfree_skb(E);
+ kfree_skb(E);
// </smpl>

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 23:33:52 -08:00
Pablo Neira Ayuso
1ce85fe402 netlink: change nlmsg_notify() return value logic
This patch changes the return value of nlmsg_notify() as follows:

If NETLINK_BROADCAST_ERROR is set by any of the listeners and
an error in the delivery happened, return the broadcast error;
else if there are no listeners apart from the socket that
requested a change with the echo flag, return the result of the
unicast notification. Thus, with this patch, the unicast
notification is handled in the same way of a broadcast listener
that has set the NETLINK_BROADCAST_ERROR socket flag.

This patch is useful in case that the caller of nlmsg_notify()
wants to know the result of the delivery of a netlink notification
(including the broadcast delivery) and take any action in case
that the delivery failed. For example, ctnetlink can drop packets
if the event delivery failed to provide reliable logging and
state-synchronization at the cost of dropping packets.

This patch also modifies the rtnetlink code to ignore the return
value of rtnl_notify() in all callers. The function rtnl_notify()
(before this patch) returned the error of the unicast notification
which makes rtnl_set_sk_err() reports errors to all listeners. This
is not of any help since the origin of the change (the socket that
requested the echoing) notices the ENOBUFS error if the notification
fails and should resync itself.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-24 23:18:28 -08:00
Hannes Eder
66da8c529a ipv6: fix sparse warning: Using plain integer as NULL pointer
Fix this sparse warning:
  net/ipv6/xfrm6_state.c:72:26: warning: Using plain integer as NULL pointer

Signed-off-by: Hannes Eder <hannes@hanneseder.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-21 23:37:10 -08:00
Stephen Hemminger
784544739a netfilter: iptables: lock free counters
The reader/writer lock in ip_tables is acquired in the critical path of
processing packets and is one of the reasons just loading iptables can cause
a 20% performance loss. The rwlock serves two functions:

1) it prevents changes to table state (xt_replace) while table is in use.
   This is now handled by doing rcu on the xt_table. When table is
   replaced, the new table(s) are put in and the old one table(s) are freed
   after RCU period.

2) it provides synchronization when accesing the counter values.
   This is now handled by swapping in new table_info entries for each cpu
   then summing the old values, and putting the result back onto one
   cpu.  On a busy system it may cause sampling to occur at different
   times on each cpu, but no packet/byte counts are lost in the process.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>

Sucessfully tested on my dual quad core machine too, but iptables only (no ipv6 here)
BTW, my new "tbench 8" result is 2450 MB/s, (it was 2150 MB/s not so long ago)

Acked-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-20 10:35:32 +01:00
Eric Dumazet
323dbf9638 netfilter: ip6_tables: unfold two loops in ip6_packet_match()
ip6_tables netfilter module can use an ifname_compare() helper
so that two loops are unfolded.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-19 11:18:23 +01:00
Jan Engelhardt
4323362e49 netfilter: xtables: add backward-compat options
Concern has been expressed about the changing Kconfig options.
Provide the old options that forward-select.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-19 11:16:03 +01:00
Jan Engelhardt
cfac5ef7b9 netfilter: Combine ipt_ttl and ip6t_hl source
Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 18:39:31 +01:00
Jan Engelhardt
563d36eb3f netfilter: Combine ipt_TTL and ip6t_HL source
Suggested by: James King <t.james.king@gmail.com>

Similarly to commit c9fd496809, merge
TTL and HL. Since HL does not depend on any IPv6-specific function,
no new module dependencies would arise.

With slight adjustments to the Kconfig help text.

Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 18:38:40 +01:00
Eric Leblond
55df4ac0c9 netfilter: log invalid new icmpv6 packet with nf_log_packet()
This patch adds a logging message for invalid new icmpv6 packet.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 16:30:56 +01:00
Stephen Hemminger
9c8222b9e7 netfilter: x_tables: remove unneeded initializations
Later patches change the locking on xt_table and the initialization of
the lock element is not needed since the lock is always initialized in
xt_table_register anyway.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 16:30:20 +01:00
Eric Leblond
4aa3b2ee19 netfilter: nf_conntrack_ipv6: fix nf_log_packet message in icmpv6 conntrack
This patch fixes a trivial typo that was adding a new line at end of
the nf_log_packet() prefix. It also make the logging conditionnal by
adding a LOG_INVALID test.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2009-02-18 15:28:46 +01:00
David S. Miller
0ecc103aec Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/gianfar.c
2009-02-09 23:22:21 -08:00
Noriaki TAKAMIYA
20461c1740 IPv6: fix to set device name when new IPv6 over IPv6 tunnel device is created.
When the user creates IPv6 over IPv6 tunnel, the device name created
by the kernel isn't set to t->parm.name, which is referred as the
result of ioctl().

Signed-off-by: Noriaki TAKAMIYA <takamiya@po.ntts.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 15:01:19 -08:00
Eric Leblond
3f9007135c netfilter: nf_conntrack_ipv6: don't track ICMPv6 negotiation message
This patch removes connection tracking handling for ICMPv6 messages
related to Stateless Address Autoconfiguration, MLD, and MLDv2. They
can not be tracked because they are massively using multicast (on
pre-defined address). But they are not invalid and should not be
detected as such.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 14:33:20 -08:00
Eric Leblond
a51f42f3c9 netfilter: fix tuple inversion for Node information request
The patch fixes a typo in the inverse mapping of Node Information
request. Following draft-ietf-ipngwg-icmp-name-lookups-09, "Querier"
sends a type 139 (ICMPV6_NI_QUERY) packet to "Responder" which answer
with a type 140 (ICMPV6_NI_REPLY) packet.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-09 14:33:03 -08:00
David S. Miller
409f0a9014 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/wireless/iwlwifi/iwl-agn.c
	drivers/net/wireless/iwlwifi/iwl3945-base.c
2009-02-07 02:52:44 -08:00
Ilpo Järvinen
b5f348e5a4 ipv6/addrconf: common code located
$ codiff net/ipv6/addrconf.o net/ipv6/addrconf.o.new
net/ipv6/addrconf.c:
 addrconf_notify | -267
1 function changed, 267 bytes removed

net/ipv6/addrconf.c:
 add_addr |  +86
1 function changed, 86 bytes added

net/ipv6/addrconf.o.new:
2 functions changed, 86 bytes added, 267 bytes removed, diff: -181

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-02-06 23:48:01 -08:00