Commit graph

2437 commits

Author SHA1 Message Date
Pavel Emelyanov
f57e7d5a7b [GRE]: Add net/gre_net argument to some functions.
The fallback device and hashes are to become per-net, but many
code doesn't have anything to get the struct net pointer from.

So pass the proper net there with an extra argument.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:09:22 -07:00
Pavel Emelyanov
59a4c7594b [GRE]: Introduce empty ipgre_net structure and net init/exit ops.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:08:53 -07:00
Pavel Emelyanov
4597a0ce08 [IPIP]: Allow for IPPROTO_IPIP protocol in namespaces.
This one was disabled by default for sanity.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:06:56 -07:00
Pavel Emelyanov
0a826406d4 [IPIP]: Allow to create IPIP tunnels in net namespaces.
Set the proper net before calling register_netdev and disable 
the tunnel device netns changing.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:06:18 -07:00
Pavel Emelyanov
b99f0152e5 [IPIP]: Use proper net in (mostly) routing calls.
There are some ip_route_output_key() calls in there that require
a proper net so give one to them.

Besides - give a proper net to a single __get_dev_by_index call 
in ipip_tunnel_bind_dev().

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:05:57 -07:00
Pavel Emelyanov
44d3c299dc [IPIP]: Make tunnels hashes per net.
Either net or ipip_net already exists in all the required 
places, so just use one.

Besides, tune net_init and net_exit calls to respectively 
initialize the hashes and destroy devices.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:05:32 -07:00
Pavel Emelyanov
cec3ffae1a [IPIP]: Use proper net in hash-lookup functions.
This is the part#2 of the previous patch - get the proper
net for these functions.

I make it in a separate patch, so that this change does not
get lost in a large previous patch.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:05:03 -07:00
Pavel Emelyanov
b9fae5c913 [IPIP]: Add net/ipip_net argument to some functions.
The hashes of tunnels will be per-net too, so prepare all the 
functions that uses them for this change by adding an argument.

Use init_net temporarily in places, where the net does not exist
explicitly yet.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:04:35 -07:00
Pavel Emelyanov
b9855c54da [IPIP]: Make the fallback tunnel device per-net.
Create on in ipip_init_net(), use it all over the code (the
proper place to get the net from already exists) and destroy
in ipip_net_exit().

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:04:13 -07:00
Pavel Emelyanov
10dc4c7bb7 [IPIP]: Introduce empty ipip_net structure and net init/exit ops.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-16 01:03:13 -07:00
Ilpo Järvinen
17515408a1 [TCP]: Remove superflushious skb == write_queue_tail() check
Needed can only be more strict than what was checked by the
earlier common case check for non-tail skbs, thus
cwnd_len <= needed will never match in that case anyway.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-15 20:36:55 -07:00
YOSHIFUJI Hideaki
569508c964 [TCP]: Format addresses appropriately in debug messages.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 04:09:36 -07:00
YOSHIFUJI Hideaki
a7d632b6b4 [IPV4]: Use NIPQUAD_FMT to format ipv4 addresses.
And use %u to format port.

Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 04:09:00 -07:00
David S. Miller
334f8b2afd Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26 2008-04-14 03:50:43 -07:00
Pavel Emelyanov
7477fd2e6b [SOCK]: Add some notes about per-bind-bucket sock lookup.
I was asked about "why don't we perform a sk_net filtering in
bind_conflict calls, like we do in other sock lookup places"
for a couple of times.

Can we please add a comment about why we do not need one?

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 02:42:27 -07:00
David S. Miller
df39e8ba56 Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	drivers/net/ehea/ehea_main.c
	drivers/net/wireless/iwlwifi/Kconfig
	drivers/net/wireless/rt2x00/rt61pci.c
	net/ipv4/inet_timewait_sock.c
	net/ipv6/raw.c
	net/mac80211/ieee80211_sta.c
2008-04-14 02:30:23 -07:00
Jan Engelhardt
3c9fba656a [NETFILTER]: nf_conntrack: replace NF_CT_DUMP_TUPLE macro indrection by function call
Directly call IPv4 and IPv6 variants where the address family is
easily known.

Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:54 +02:00
Jan Engelhardt
12c33aa20e [NETFILTER]: nf_conntrack: const annotations in nf_conntrack_sctp, nf_nat_proto_gre
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:54 +02:00
Jan Engelhardt
f2ea825f48 [NETFILTER]: nf_nat: use bool type in nf_nat_proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt
09f263cd39 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l4proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:53 +02:00
Jan Engelhardt
8ce8439a31 [NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l3proto
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy
5e8fbe2ac8 [NETFILTER]: nf_conntrack: add tuplehash l3num/protonum accessors
Add accessors for l3num and protonum and get rid of some overly long
expressions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy
dd13b01036 [NETFILTER]: nf_nat: kill helper and seq_adjust hooks
Connection tracking helpers (specifically FTP) need to be called
before NAT sequence numbers adjustments are performed to be able
to compare them against previously seen ones. We've introduced
two new hooks around 2.6.11 to maintain this ordering when NAT
modules were changed to get called from conntrack helpers directly.

The cost of netfilter hooks is quite high and sequence number
adjustments are only rarely needed however. Add a RCU-protected
sequence number adjustment function pointer and call it from
IPv4 conntrack after calling the helper.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:52 +02:00
Patrick McHardy
8c87238b72 [NETFILTER]: nf_nat: don't add NAT extension for confirmed conntracks
Adding extensions to confirmed conntracks is not allowed to avoid races
on reallocation. Don't setup NAT for confirmed conntracks in case NAT
module is loaded late.

The has one side-effect, the connections existing before the NAT module
was loaded won't enter the bysource hash. The only case where this actually
makes a difference is in case of SNAT to a multirange where the IP before
NAT is also part of the range. Since old connections don't enter the
bysource hash the first new connection from the IP will have a new address
selected. This shouldn't matter at all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:51 +02:00
Patrick McHardy
42cf800c24 [NETFILTER]: nf_nat: remove obsolete check for ICMP redirects
Locally generated ICMP packets have a reference to the conntrack entry
of the original packet manually attached by icmp_send(). Therefore the
check for locally originated untracked ICMP redirects can never be
true.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:50 +02:00
Patrick McHardy
9d908a69a3 [NETFILTER]: nf_nat: add SCTP protocol support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:50 +02:00
Patrick McHardy
4910a08799 [NETFILTER]: nf_nat: add DCCP protocol support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:50 +02:00
Patrick McHardy
d63a650736 [NETFILTER]: Add partial checksum validation helper
Move the UDP-Lite conntrack checksum validation to a generic helper
similar to nf_checksum() and make it fall back to nf_checksum()
in case the full packet is to be checksummed and hardware checksums
are available. This is to be used by DCCP conntrack, which also
needs to verify partial checksums.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:49 +02:00
Patrick McHardy
6185f870e2 [NETFILTER]: nf_nat: add UDP-Lite support
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:48 +02:00
Patrick McHardy
2d2d84c40e [NETFILTER]: nf_nat: remove unused name from struct nf_nat_protocol
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:48 +02:00
Patrick McHardy
ca6a507490 [NETFILTER]: nf_conntrack_netlink: clean up NAT protocol parsing
Move responsibility for setting the IP_NAT_RANGE_PROTO_SPECIFIED flag
to the NAT protocol, properly propagate errors and get rid of ugly
return value convention.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:47 +02:00
Patrick McHardy
535b57c7c1 [NETFILTER]: nf_nat: move NAT ctnetlink helpers to nf_nat_proto_common
Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're
also used by protocols that don't have port numbers.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:47 +02:00
Patrick McHardy
5abd363f73 [NETFILTER]: nf_nat: fix random mode not to overwrite port rover
The port rover should not get overwritten when using random mode,
otherwise other rules will also use more or less random ports.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:46 +02:00
Patrick McHardy
937e0dfd87 [NETFILTER]: nf_nat: add helpers for common NAT protocol operations
Add generic ->in_range and ->unique_tuple ops to avoid duplicating them
again and again for future NAT modules and save a few bytes of text:

net/ipv4/netfilter/nf_nat_proto_tcp.c:
  tcp_in_range     |  -62 (removed)
  tcp_unique_tuple | -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0
 2 functions changed, 321 bytes removed

net/ipv4/netfilter/nf_nat_proto_udp.c:
  udp_in_range     |  -62 (removed)
  udp_unique_tuple | -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0
 2 functions changed, 321 bytes removed

net/ipv4/netfilter/nf_nat_proto_gre.c:
  gre_in_range |  -62 (removed)
 1 function changed, 62 bytes removed

vmlinux:
 5 functions changed, 704 bytes removed

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:46 +02:00
Patrick McHardy
544473c166 [NETFILTER]: {ip,ip6,arp}_tables: return EAGAIN for invalid SO_GET_ENTRIES size
Rule dumping is performed in two steps: first userspace gets the
ruleset size using getsockopt(SO_GET_INFO) and allocates memory,
then it calls getsockopt(SO_GET_ENTRIES) to actually dump the
ruleset. When another process changes the ruleset in between the
sizes from the first getsockopt call doesn't match anymore and
the kernel aborts. Unfortunately it returns EAGAIN, as for multiple
other possible errors, so userspace can't distinguish this case
from real errors.

Return EAGAIN so userspace can retry the operation.

Fixes (with current iptables SVN version) netfilter bugzilla #104.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:45 +02:00
Jan Engelhardt
c2f9c68398 [NETFILTER]: Explicitly initialize .priority in arptable_filter
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:44 +02:00
Jan Engelhardt
3bb0362d2f [NETFILTER]: remove arpt_(un)register_target indirection macros
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:44 +02:00
Jan Engelhardt
95eea855af [NETFILTER]: remove arpt_target indirection macro
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:43 +02:00
Jan Engelhardt
4abff0775d [NETFILTER]: remove arpt_table indirection macro
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:43 +02:00
Jan Engelhardt
72b72949db [NETFILTER]: annotate rest of nf_nat_* with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:42 +02:00
Jan Engelhardt
5452e425ad [NETFILTER]: annotate {arp,ip,ip6,x}tables with const
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 11:15:35 +02:00
Jan Engelhardt
3cf93c96af [NETFILTER]: annotate xtables targets with const and remove casts
Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:05 +02:00
Robert P. J. Day
fdccecd0cc [NETFILTER]: Use non-deprecated __RW_LOCK_UNLOCKED macro
Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:03 +02:00
Alexey Dobriyan
666953df35 [NETFILTER]: ip_tables: per-netns FILTER/MANGLE/RAW tables for real
Commit 9335f047fe aka
"[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW"
added per-netns _view_ of iptables rules. They were shown to user, but
ignored by filtering code. Now that it's possible to at least ping loopback,
per-netns tables can affect filtering decisions.

netns is taken in case of
	PRE_ROUTING, LOCAL_IN -- from in device,
	POST_ROUTING, LOCAL_OUT -- from out device,
	FORWARD -- from in device which should be equal to out device's netns.
		   This code is relatively new, so BUG_ON was plugged.

Wrappers were added to a) keep code the same from CONFIG_NET_NS=n users
(overwhelming majority), b) consolidate code in one place -- similar
changes will be done in ipv6 and arp netfilter code.

Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:02 +02:00
Patrick McHardy
36e2a1b0f7 [NETFILTER]: {ip,ip6}t_LOG: print MARK value in log output
Dump the mark value in log messages similar to nfnetlink_log. This
is useful for debugging complex setups where marks are used for
routing or traffic classification.

Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 09:56:01 +02:00
Pavel Emelyanov
4dee959723 [NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put
Consider we are putting a clusterip_config entry with the "entries"
count == 1, and on the other CPU there's a clusterip_config_find_get
in progress:

CPU1:							CPU2:
clusterip_config_entry_put:				clusterip_config_find_get:
if (atomic_dec_and_test(&c->entries)) {
	/* true */
							read_lock_bh(&clusterip_lock);
							c = __clusterip_config_find(clusterip);
							/* found - it's still in list */
							...
							atomic_inc(&c->entries);
							read_unlock_bh(&clusterip_lock);

	write_lock_bh(&clusterip_lock);
	list_del(&c->list);
	write_unlock_bh(&clusterip_lock);
	...
	dev_put(c->dev);

Oops! We have an entry returned by the clusterip_config_find_get,
which is a) not in list b) has a stale dev pointer.

The problems will happen when the CPU2 will release the entry - it
will remove it from the list for the 2nd time, thus spoiling it, and
will put a stale dev pointer.

The fix is to make atomic_dec_and_test under the clusterip_lock.

Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
2008-04-14 00:44:52 -07:00
Gerrit Renker
7de6c03336 [SKB]: __skb_append = __skb_queue_after
This expresses __skb_append in terms of __skb_queue_after, exploiting that

  __skb_append(old, new, list) = __skb_queue_after(list, old, new).

Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-14 00:05:09 -07:00
Denis V. Lunev
5f4472c5a6 [TCP]: Remove owner from tcp_seq_afinfo.
Move it to tcp_seq_afinfo->seq_fops as should be.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:13:53 -07:00
Denis V. Lunev
68fcadd16c [TCP]: Place file operations directly into tcp_seq_afinfo.
No need to have separate never-used variable.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:13:30 -07:00
Denis V. Lunev
52d6f3f11b [TCP]: Cleanup /proc/tcp[6] creation/removal.
Replace seq_open with seq_open_net and remove tcp_seq_release
completely.  seq_release_net will do this job just fine.

Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-04-13 22:12:41 -07:00