Commit graph

1294 commits

Author SHA1 Message Date
Patrick McHardy
848c29fd64 [NETFILTER]: nat: avoid rerouting packets if only XFRM policy key changed
Currently NAT not only reroutes packets in the OUTPUT chain when the
routing key changed, but also if only the non-routing part of the
IPsec policy key changed. This breaks ping -I since it doesn't use
SO_BINDTODEVICE but IP_PKTINFO cmsg to specify the output device, and
this information is lost.

Only do full rerouting if the routing key changed, and just do a new
policy lookup with the old route if only the ports changed.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-22 12:30:29 -07:00
John Heffner
53cdcc04c1 [TCP]: Fix tcp_mem[] initialization.
Change tcp_mem initialization function.  The fraction of total memory
is now a continuous function of memory size, and independent of page
size.

Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-16 15:04:03 -07:00
Robert Olsson
d5cc4a73a5 [IPV4]: Do not disable preemption in trie_leaf_remove().
Hello, Just discussed this Patrick...

We have two users of trie_leaf_remove, fn_trie_flush and fn_trie_delete
both are holding RTNL. So there shouldn't be need for this preempt stuff.
This is assumed to a leftover from an older RCU-take.

> Mhh .. I think I just remembered something - me incorrectly suggesting
> to add it there while we were talking about this at OLS :) IIRC the
> idea was to make sure tnode_free (which at that time didn't use
> call_rcu) wouldn't free memory while still in use in a rcu read-side
> critical section. It should have been synchronize_rcu of course,
> but with tnode_free using call_rcu it seems to be completely
> unnecessary. So I guess we can simply remove it.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-16 15:00:07 -07:00
Geert Uytterhoeven
08882669e0 [IPV4]: Fix warning in ip_mc_rejoin_group.
Kill warning about unused variable `in_dev' when CONFIG_IP_MULTICAST
is not set.

Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-12 17:02:37 -07:00
Paul Moore
38c8947c1b [NetLabel]: parse the CIPSO ranged tag on incoming packets
Commit 484b366932 added support for the CIPSO
ranged categories tag.  However, it appears that I made a mistake when rebasing
then patch to the latest upstream sources for submission and dropped the part
of the patch that actually parses the tag on incoming packets.  This patch
fixes this mistake by adding the required function call to the
cipso_v4_skbuff_getattr() function.

I've run this patch over the weekend and have not noticed any problems.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-12 14:38:02 -07:00
Evgeniy Polyakov
c4e38f41e3 [IPV4]: Fix rtm_to_ifaddr() error handling.
Return negative error value (embedded in the pointer) instead of
returning NULL.

Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-09 13:43:24 -08:00
Herbert Xu
d644329bc9 [UDP]: Reread uh pointer after pskb_trim
The header may have moved when trimming.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-07 16:08:04 -08:00
Linus Torvalds
5b3c1184e7 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6:
  [DCCP]: Set RTO for newly created child socket
  [DCCP]: Correctly split CCID half connections
  [NET]: Fix compat_sock_common_getsockopt typo.
  [NET]: Revert incorrect accept queue backlog changes.
  [INET]: twcal_jiffie should be unsigned long, not int
  [GIANFAR]: Fix compile error in latest git
  [PPPOE]: Use ifindex instead of device pointer in key lookups.
  [NETFILTER]: ip6_route_me_harder should take into account mark
  [NETFILTER]: nfnetlink_log: fix reference counting
  [NETFILTER]: nfnetlink_log: fix module reference counting
  [NETFILTER]: nfnetlink_log: fix possible NULL pointer dereference
  [NETFILTER]: nfnetlink_log: fix NULL pointer dereference
  [NETFILTER]: nfnetlink_log: fix use after free
  [NETFILTER]: nfnetlink_log: fix reference leak
  [NETFILTER]: tcp conntrack: accept SYN|URG as valid
  [NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefs
  [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops
2007-03-06 19:53:34 -08:00
Jay Vosburgh
a816c7c712 bonding: Improve IGMP join processing
In active-backup mode, the current bonding code duplicates IGMP
traffic to all slaves, so that switches are up to date in case of a
failover from an active to a backup interface.  If bonding then fails
back to the original active interface, it is likely that the "active
slave" switch's IGMP forwarding for the port will be out of date until
some event occurs to refresh the switch (e.g., a membership query).

	This patch alters the behavior of bonding to no longer flood
IGMP to all ports, and to issue IGMP JOINs to the newly active port at
the time of a failover.  This insures that switches are kept up to date
for all cases.

	"GOELLESCH Niels" <niels.goellesch@eurocontrol.int> originally
reported this problem, and included a patch.  His original patch was
modified by Jay Vosburgh to additionally remove the existing IGMP flood
behavior, use RCU, streamline code paths, fix trailing white space, and
adjust for style.

Signed-off-by: Jay Vosburgh <fubar@us.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-03-06 06:08:11 -05:00
Patrick McHardy
d3ab4298aa [NETFILTER]: tcp conntrack: accept SYN|URG as valid
Some stacks apparently send packets with SYN|URG set. Linux accepts
these packets, so TCP conntrack should to.

Pointed out by Martijn Posthuma <posthuma@sangine.com>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05 13:25:20 -08:00
Patrick McHardy
e281db5cdf [NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefs
The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK,
but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or
CONFIG_NF_CONNTRACK_NETLINK for ifdefs.

Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05 13:25:19 -08:00
Patrick McHardy
ec68e97ded [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops
Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling:

- unconfirmed entries can not be killed manually, they are removed on
  confirmation or final destruction of the conntrack entry, which means
  we might iterate forever without making forward progress.

  This can happen in combination with the conntrack event cache, which
  holds a reference to the conntrack entry, which is only released when
  the packet makes it all the way through the stack or a different
  packet is handled.

- taking references to an unconfirmed entry and using it outside the
  locked section doesn't work, the list entries are not refcounted and
  another CPU might already be waiting to destroy the entry

What the code really wants to do is make sure the references of the hash
table to the selected conntrack entries are released, so they will be
destroyed once all references from skbs and the event cache are dropped.

Since unconfirmed entries haven't even entered the hash yet, simply mark
them as dying and skip confirmation based on that.

Reported and tested by Chuck Ebbert <cebbert@redhat.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-05 13:25:18 -08:00
Paul Moore
c6387a8694 [NetLabel]: Verify sensitivity level has a valid CIPSO mapping
The current CIPSO engine has a problem where it does not verify that
the given sensitivity level has a valid CIPSO mapping when the "std"
CIPSO DOI type is used.  The end result is that bad packets are sent
on the wire which should have never been sent in the first place.
This patch corrects this problem by verifying the sensitivity level
mapping similar to what is done with the category mapping.  This patch
also changes the returned error code in this case to -EPERM to better
match what the category mapping verification code returns.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-03-02 20:37:36 -08:00
Arnaldo Carvalho de Melo
a9948a7e15 [TCP]: Fix minisock tcp_create_openreq_child() typo.
On 2/28/07, KOVACS Krisztian <hidden@balabit.hu> wrote:
>
>   Hi,
>
>   While reading TCP minisock code I've found this suspiciously looking
> code fragment:
>
> - 8< -
> struct sock *tcp_create_openreq_child(struct sock *sk, struct request_sock *req, struct sk_buff *skb)
> {
>         struct sock *newsk = inet_csk_clone(sk, req, GFP_ATOMIC);
>
>         if (newsk != NULL) {
>                 const struct inet_request_sock *ireq = inet_rsk(req);
>                 struct tcp_request_sock *treq = tcp_rsk(req);
>                 struct inet_connection_sock *newicsk = inet_csk(sk);
>                 struct tcp_sock *newtp;
> - 8< -
>
>   The above code initializes newicsk to inet_csk(sk), isn't that supposed
> to be inet_csk(newsk)?  As far as I can tell this might leave
> icsk_ack.last_seg_size zero even if we do have received data.

Good catch!

David, please apply the attached patch.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-28 11:05:56 -08:00
Bernhard Walle
aef8811abb [XFRM]: Fix oops in xfrm4_dst_destroy()
With 2.6.21-rc1, I get an oops when running 'ifdown eth0' and an IPsec
connection is active. If I shut down the connection before running 'ifdown
eth0', then there's no problem.  The critical operation of this script is to
kill dhcpd.

The problem is probably caused by commit with git identifier
4337226228 (Linus tree) "[IPSEC]: IPv4 over IPv6
IPsec tunnel".

This patch fixes that oops. I don't know the network code of the Linux
kernel in deep, so if that fix is wrong, please change it. But please
fix the oops. :)

Signed-off-by: Bernhard Walle <bwalle@suse.de>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26 12:10:32 -08:00
Arnaldo Carvalho de Melo
e4396b544f [XFRM_TUNNEL]: Reload header pointer after pskb_may_pull/pskb_expand_head
Please consider applying, this was found on your latest
net-2.6 tree while playing around with that ip_hdr() + turn
skb->nh/h/mac pointers  as offsets on 64 bits idea :-)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26 11:43:01 -08:00
Joe Perches
4c3ae4d7e7 [IPV4]: Use random32() in net/ipv4/multipath
Removed local random number generator function

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26 11:43:00 -08:00
Herbert Xu
8030f54499 [IPV4] devinet: Register inetdev earlier.
This patch allocates inetdev at registration for all devices
in line with IPv6.  This allows sysctl configuration on the
devices to occur before they're brought up or addresses are
added.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-02-26 11:42:56 -08:00
Baruch Even
f4b9479dc5 [IPV4]: Correct links in net/ipv4/Kconfig
Correct dead/indirect links in net/ipv4/Kconfig

Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26 11:42:51 -08:00
David S. Miller
2c4f6219ac [TCP]: Fix MD5 signature pool locking.
The locking calls assumed that these code paths were only
invoked in software interrupt context, but that isn't true.

Therefore we need to use spin_{lock,unlock}_bh() throughout.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-26 11:42:48 -08:00
Adrian Bunk
936bb14ce9 correct a dead URL in the IP_MULTICAST help text
Reported in kernel Bugzilla #6216.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17 19:49:13 +01:00
Robert P. J. Day
d08df601a3 Various typo fixes.
Correct mis-spellings of "algorithm", "appear", "consistent" and
(shame, shame) "kernel".

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-02-17 19:07:33 +01:00
Eric W. Biederman
3fbfa98112 [PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables
It isn't needed anymore, all of the users are gone, and all of the ctl_table
initializers have been converted to use explicit names of the fields they are
initializing.

[akpm@osdl.org: NTFS fix]
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:10:00 -08:00
Eric W. Biederman
0b4d414714 [PATCH] sysctl: remove insert_at_head from register_sysctl
The semantic effect of insert_at_head is that it would allow new registered
sysctl entries to override existing sysctl entries of the same name.  Which is
pain for caching and the proc interface never implemented.

I have done an audit and discovered that none of the current users of
register_sysctl care as (excpet for directories) they do not register
duplicate sysctl entries.

So this patch simply removes the support for overriding existing entries in
the sys_sysctl interface since no one uses it or cares and it makes future
enhancments harder.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Ralf Baechle <ralf@linux-mips.org>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Ralf Baechle <ralf@linux-mips.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Corey Minyard <minyard@acm.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Cc: David Chinner <dgc@sgi.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:59 -08:00
Tim Schmielau
cd354f1ae7 [PATCH] remove many unneeded #includes of sched.h
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there.  Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.

To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.

Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm.  I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).

Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-14 08:09:54 -08:00
Kazunori MIYAZAWA
c0d56408e3 [IPSEC]: Changing API of xfrm4_tunnel_register.
This patch changes xfrm4_tunnel register and deregister
interface to prepare for solving the conflict of device
tunnels with inter address family IPsec tunnel.

Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13 12:54:47 -08:00
Ilpo Järvinen
600ff0c24b [TCP]: Prevent pseudo garbage in SYN's advertized window
TCP may advertize up to 16-bits window in SYN packets (no window
scaling allowed). At the same time, TCP may have rcv_wnd
(32-bits) that does not fit to 16-bits without window scaling
resulting in pseudo garbage into advertized window from the
low-order bits of rcv_wnd. This can happen at least when
mss <= (1<<wscale) (see tcp_select_initial_window). This patch
fixes the handling of SYN advertized windows (compile tested
only).

In worst case (which is unlikely to occur though), the receiver
advertized window could be just couple of bytes. I'm not sure
that such situation would be handled very well at all by the
receiver!? Fortunately, the situation normalizes after the
first non-SYN ACK is received because it has the correct,
scaled window.

Alternatively, tcp_select_initial_window could be changed to
prevent too large rcv_wnd in the first place.

[ tcp_make_synack() has the same bug, and I've added a fix for
  that to this patch -DaveM ]

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13 12:42:11 -08:00
Herbert Xu
bbf4a6bc8c [NETFILTER]: Clear GSO bits for TCP reset packet
The TCP reset packet is copied from the original.  This
includes all the GSO bits which do not apply to the new
packet.  So we should clear those bits.

Spotted by Patrick McHardy.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-13 12:32:58 -08:00
Patrick McHardy
e2e01bfef8 [XFRM]: Fix IPv4 tunnel mode decapsulation with IPV6=n
Add missing break when CONFIG_IPV6=n.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 20:27:10 -08:00
Stephen Hemminger
9121c77706 [TCP]: cleanup of htcp (resend)
Minor non-invasive cleanups:
 * white space around operators and line wrapping
 * use const
 * use __read_mostly

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 13:34:03 -08:00
Stephen Hemminger
59758f4459 [TCP]: Use read mostly for CUBIC parameters.
These module parameters should be in the read mostly area to avoid
cache pollution.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 13:15:20 -08:00
Patrick McHardy
a3c941b08d [NETFILTER]: Kconfig: improve dependency handling
Instead of depending on internally needed options and letting users
figure out what is needed, select them when needed:

- IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select
  NETFILTER_XTABLES

- NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and
  IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK

- NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:15:02 -08:00
Patrick McHardy
982d9a9ce3 [NETFILTER]: nf_conntrack: properly use RCU for nf_conntrack_destroyed callback
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:14:11 -08:00
Patrick McHardy
6b48a7d08d [NETFILTER]: ip_conntrack: properly use RCU for ip_conntrack_destroyed callback
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:13:58 -08:00
Patrick McHardy
abbaccda4c [NETFILTER]: ip_conntrack: fix invalid conntrack statistics RCU assumption
CONNTRACK_STAT_INC assumes rcu_read_lock in nf_hook_slow disables
preemption as well, making it legal to use __get_cpu_var without
disabling preemption manually. The assumption is not correct anymore
with preemptable RCU, additionally we need to protect against softirqs
when not holding ip_conntrack_lock.

Add CONNTRACK_STAT_INC_ATOMIC macro, which disables local softirqs,
and use where necessary.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:13:14 -08:00
Patrick McHardy
923f4902fe [NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
all paths not obviously only used within packet process context
(nfnetlink_conntrack).
  
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:57 -08:00
Patrick McHardy
642d628b2c [NETFILTER]: ip_conntrack: properly use RCU API for ip_ct_protos array
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
all paths not obviously only used within packet process context
(nfnetlink_conntrack).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:40 -08:00
Patrick McHardy
e22a054869 [NETFILTER]: nf_nat: properly use RCU API for nf_nat_protos array
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
paths used outside of packet processing context (nfnetlink_conntrack).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:26 -08:00
Patrick McHardy
a441dfdbb2 [NETFILTER]: ip_nat: properly use RCU API for ip_nat_protos array
Replace preempt_{enable,disable} based RCU by proper use of the
RCU API and add missing rcu_read_lock/rcu_read_unlock calls in
paths used outside of packet processing context (nfnetlink_conntrack).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:12:09 -08:00
Patrick McHardy
e92ad99c78 [NETFILTER]: nf_log: minor cleanups
- rename nf_logging to nf_loggers since its an array of registered loggers

- rename nf_log_unregister_logger() to nf_log_unregister() to make it
  symetrical to nf_log_register() and convert all users

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:11:55 -08:00
Patrick McHardy
c3a47ab3e5 [NETFILTER]: Properly use RCU in nf_ct_attach
Use rcu_assign_pointer/rcu_dereference for ip_ct_attach pointer instead
of self-made RCU and use rcu_read_lock to make sure the conntrack module
doesn't disappear below us while calling it, since this function can be
called from outside the netfilter hooks.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-12 11:09:19 -08:00
Arjan van de Ven
9a32144e9d [PATCH] mark struct file_operations const 7
Many struct file_operations in the kernel can be "const".  Marking them const
moves these to the .rodata section, which avoids false sharing with potential
dirty data.  In addition it'll catch accidental writes at compile time to
these shared resources.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-12 09:48:46 -08:00
Linus Torvalds
cb18eccff4 Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits)
  [IPV4]: Restore multipath routing after rt_next changes.
  [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch.
  [NET]: Reorder fields of struct dst_entry
  [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer
  [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer
  [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer
  [NET]: Introduce union in struct dst_entry to hold 'next' pointer
  [DECNET]: fix misannotation of linkinfo_dn
  [DECNET]: FRA_{DST,SRC} are le16 for decnet
  [UDP]: UDP can use sk_hash to speedup lookups
  [NET]: Fix whitespace errors.
  [NET] XFRM: Fix whitespace errors.
  [NET] X25: Fix whitespace errors.
  [NET] WANROUTER: Fix whitespace errors.
  [NET] UNIX: Fix whitespace errors.
  [NET] TIPC: Fix whitespace errors.
  [NET] SUNRPC: Fix whitespace errors.
  [NET] SCTP: Fix whitespace errors.
  [NET] SCHED: Fix whitespace errors.
  [NET] RXRPC: Fix whitespace errors.
  ...
2007-02-11 11:38:13 -08:00
Robert P. J. Day
c376222960 [PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc().
Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the
corresponding "kmem_cache_zalloc()" call.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Andi Kleen <ak@muc.de>
Cc: Roland McGrath <roland@redhat.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: Greg KH <greg@kroah.com>
Acked-by: Joel Becker <Joel.Becker@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Jan Kara <jack@ucw.cz>
Cc: Michael Halcrow <mhalcrow@us.ibm.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Stephen Smalley <sds@tycho.nsa.gov>
Cc: James Morris <jmorris@namei.org>
Cc: Chris Wright <chrisw@sous-sol.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-02-11 10:51:27 -08:00
Eric Dumazet
5ef213f684 [IPV4]: Restore multipath routing after rt_next changes.
I forgot to test build this part of the networking code... Sorry guys.
This patch renames u.rt_next to u.dst.rt_next

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:50 -08:00
Eric Dumazet
093c2ca416 [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer
This patch removes the rt_next pointer from 'struct rtable.u' union,
and renames u.rt_next to u.dst_rt_next.

It also moves 'struct flowi' right after 'struct dst_entry' to prepare
the gain on lookups.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:38 -08:00
Eric Dumazet
95f30b336b [UDP]: UDP can use sk_hash to speedup lookups
In a prior patch, I introduced a sk_hash field (__sk_common.skc_hash)  to let
tcp lookups use one cache line per unmatched entry instead of two.

We can also use sk_hash to speedup UDP part as well. We store in sk_hash the
hnum value, and use sk->sk_hash (same cache line than 'next' pointer),
instead of inet->num (different cache line)

Note : We still have a false sharing problem for SMP machines, because
sock_hold(sock) dirties the cache line containing the 'next' pointer. Not
counting the udp_hash_lock rwlock. (did someone mentioned RCU ? :) )

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:20:29 -08:00
YOSHIFUJI Hideaki
e905a9edab [NET] IPV4: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-10 23:19:39 -08:00
Eric Dumazet
dbca9b2750 [NET]: change layout of ehash table
ehash table layout is currently this one :

First half of this table is used by sockets not in TIME_WAIT state
Second half of it is used by sockets in TIME_WAIT state.

This is non optimal because of for a given hash or socket, the two chain heads 
are located in separate cache lines.
Moreover the locks of the second half are never used.

If instead of this halving, we use two list heads in inet_ehash_bucket instead 
of only one, we probably can avoid one cache miss, and reduce ram usage, 
particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, 
CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep 
together related chains to speedup lookups and socket state change.

In this patch I did not try to align struct inet_ehash_bucket, but a future 
patch could try to make this structure have a convenient size (a power of two 
or a multiple of L1_CACHE_SIZE).
I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so 
maybe we dont need to scratch our heads to align the bucket...

Note : In case struct inet_ehash_bucket is not a power of two, we could 
probably change alloc_large_system_hash() (in case it use __get_free_pages()) 
to free the unused space. It currently allocates a big zone, but the last 
quarter of it could be freed. Again, this should be a temporary 'problem'.

Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 14:16:46 -08:00
Jan Engelhardt
e60a13e030 [NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined structure names
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:20 -08:00
Jan Engelhardt
6709dbbb19 [NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functions
Use the x_tables functions directly to make it better visible which
parts are shared between ip_tables and ip6_tables.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:19 -08:00
Jan Engelhardt
e1fd0586b0 [NETFILTER]: x_tables: fix return values for LOG/ULOG
Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:18 -08:00
Eric Leblond
41f4689a7c [NETFILTER]: NAT: optional source port randomization support
This patch adds support to NAT to randomize source ports.

Signed-off-by: Eric Leblond <eric@inl.fr>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:17 -08:00
Patrick McHardy
cdd289a2f8 [NETFILTER]: add IPv6-capable TCPMSS target
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:16 -08:00
Patrick McHardy
a8d0f9526f [NET]: Add UDPLITE support in a few missing spots
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:14 -08:00
Patrick McHardy
efbc597634 [NETFILTER]: nf_nat: remove broken HOOKNAME macro
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:12 -08:00
Patrick McHardy
a09113c2c8 [NETFILTER]: tcp conntrack: do liberal tracking for picked up connections
Do liberal tracking (only RSTs need to be in-window) for connections picked
up without seeing a SYN to deal with window scaling. Also change logging
of invalid packets not to log packets accepted by liberal tracking to avoid
spamming the logs.

Based on suggestion from James Ralston <ralston@pobox.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:10 -08:00
Stephen Hemminger
22f8cde5bc [NET]: unregister_netdevice as void
There was no real useful information from the unregister_netdevice() return
code, the only error occurred in a situation that was a driver bug. So
change it to a void function.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:06 -08:00
Alexey Dobriyan
cc63f70b8b [IPV4/IPV6] multicast: Check add_grhead() return value
add_grhead() allocates memory with GFP_ATOMIC and in at least two places skb
from it passed to skb_put() without checking.

Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:04 -08:00
David S. Miller
f2f2102d1a [XFRM]: Fix missed error setting in xfrm4_policy.c
When we can't find the afinfo we should return EAFNOSUPPORT.
GCC warned about the uninitialized 'err' for this path as well.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:03 -08:00
Miika Komu
4337226228 [IPSEC]: IPv4 over IPv6 IPsec tunnel
This is the patch to support IPv4 over IPv6 IPsec.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:02 -08:00
Miika Komu
c82f963efe [IPSEC]: IPv6 over IPv4 IPsec tunnel
This is the patch to support IPv6 over IPv4 IPsec

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:01 -08:00
Miika Komu
cdca72652a [IPSEC]: exporting xfrm_state_afinfo
This patch exports xfrm_state_afinfo.

Signed-off-by: Miika Komu <miika@iki.fi>
Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi>
Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:39:00 -08:00
John Heffner
104439a887 [TCP]: Don't apply FIN exception to full TSO segments.
Signed-off-by: John Heffner <jheffner@psc.edu>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:51 -08:00
Baruch Even
8a3c3a9727 [TCP]: Check num sacks in SACK fast path
We clear the unused parts of the SACK cache, This prevents us from mistakenly
taking the cache data if the old data in the SACK cache is the same as the data
in the SACK block. This assumes that we never receive an empty SACK block with
start and end both at zero.

Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:50 -08:00
Baruch Even
6f74651ae6 [TCP]: Seperate DSACK from SACK fast path
Move DSACK code outside the SACK fast-path checking code. If the DSACK
determined that the information was too old we stayed with a partial cache
copied. Most likely this matters very little since the next packet will not be
DSACK and we will find it in the cache. but it's still not good form and there
is little reason to couple the two checks.

Since the SACK receive cache doesn't need the data to be in host order we also
remove the ntohl in the checking loop.

Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:49 -08:00
Baruch Even
fda03fbb56 [TCP]: Advance fast path pointer for first block only
Only advance the SACK fast-path pointer for the first block, the
fast-path assumes that only the first block advances next time so we
should not move the cached skb for the next sack blocks.

Signed-off-by: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:48 -08:00
David S. Miller
8eb9086f21 [IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts.
Do this even for non-blocking sockets.  This avoids the silly -EAGAIN
that applications can see now, even for non-blocking sockets in some
cases (f.e. connect()).

With help from Venkat Tekkirala.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:45 -08:00
Frederik Deweerdt
ba7808eac1 [TCP]: remove tcp header from tcp_v4_check (take #2)
The tcphdr struct passed to tcp_v4_check is not used, the following
patch removes it from the parameter list.

This adds the netfilter modifications missing in the patch I sent
for rc3-mm1.

Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:44 -08:00
Patrick McHardy
26932566a4 [NETLINK]: Don't BUG on undersized allocations
Currently netlink users BUG when the allocated skb for an event
notification is undersized. While this is certainly a kernel bug,
its not critical and crashing the kernel is too drastic, especially
when considering that these errors have appeared multiple times in
the past and it BUGs even if no listeners are present.

This patch replaces BUG by WARN_ON and changes the notification
functions to inform potential listeners of undersized allocations
using a unique error code (EMSGSIZE).

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-08 12:38:41 -08:00
Patrick McHardy
40e0cb004a [NETFILTER]: ctnetlink: fix compile failure with NF_CONNTRACK_MARK=n
CC      net/netfilter/nf_conntrack_netlink.o
net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_conntrack_event':
net/netfilter/nf_conntrack_netlink.c:392: error: 'struct nf_conn' has no member named 'mark'
make[3]: *** [net/netfilter/nf_conntrack_netlink.o] Error 1

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-02-02 19:33:11 -08:00
Patrick McHardy
adcb471110 [NETFILTER]: SIP conntrack: fix out of bounds memory access
When checking for an @-sign in skp_epaddr_len, make sure not to
run over the packet boundaries.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-30 14:25:24 -08:00
Lars Immisch
7da5bfbb12 [NETFILTER]: SIP conntrack: fix skipping over user info in SIP headers
When trying to skip over the username in the Contact header, stop at the
end of the line if no @ is found to avoid mangling following headers.
We don't need to worry about continuation lines because we search inside
a SIP URI.

Fixes Netfilter Bugzilla #532.

Signed-off-by: Lars Immisch <lars@ibp.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-30 14:24:57 -08:00
Robert Olsson
095b8501e4 [IPV4]: Fix single-entry /proc/net/fib_trie output.
When main table is just a single leaf this gets printed as belonging to the 
local table in /proc/net/fib_trie. A fix is below.

Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Acked-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 19:06:01 -08:00
Patrick McHardy
a46bf7d5a8 [NETFILTER]: nf_nat_pptp: fix expectation removal
When removing the expectation for the opposite direction, the PPTP NAT
helper initializes the tuple for lookup with the addresses of the
opposite direction, which makes the lookup fail.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 01:07:30 -08:00
Patrick McHardy
c72c6b2a29 [NETFILTER]: nf_nat: fix ICMP translation with statically linked conntrack
When nf_nat/nf_conntrack_ipv4 are linked statically, nf_nat is initialized
before nf_conntrack_ipv4, which makes the nf_ct_l3proto_find_get(AF_INET)
call during nf_nat initialization return the generic l3proto instead of
the AF_INET specific one. This breaks ICMP error translation since the
generic protocol always initializes the IPs in the tuple to 0.

Change the linking order and put nf_conntrack_ipv4 first.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 01:06:47 -08:00
David S. Miller
e89862f4c5 [TCP]: Restore SKB socket owner setting in tcp_transmit_skb().
Revert 931731123a

We can't elide the skb_set_owner_w() here because things like certain
netfilter targets (such as owner MATCH) need a socket to be set on the
SKB for correct operation.

Thanks to Jan Engelhardt and other netfilter list members for
pointing this out.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-26 01:04:55 -08:00
Baruch Even
db3ccdac26 [TCP]: Fix sorting of SACK blocks.
The sorting of SACK blocks actually munges them rather than sort,
causing the TCP stack to ignore some SACK information and breaking the
assumption of ordered SACK blocks after sorting.

The sort takes the data from a second buffer which isn't moved causing
subsequent data moves to occur from the wrong location. The fix is to
use a temporary buffer as a normal sort does.

Signed-off-By: Baruch Even <baruch@ev-en.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-25 13:35:06 -08:00
Eric W. Biederman
6640e69731 [IPV4]: Fix the fib trie iterator to work with a single entry routing tables
In a kernel with trie routing enabled I had a simple routing setup
with only a single route to the outside world and no default
route. "ip route table list main" showed my the route just fine but
/proc/net/route was an empty file.  What was going on?

Thinking it was a bug in something I did and I looked deeper.  Eventually
I setup a second route and everything looked correct, huh?  Finally I
realized that the it was just the iterator pair in fib_trie_get_first,
fib_trie_get_next just could not handle a routing table with a single entry.

So to save myself and others further confusion, here is a simple fix for
the fib proc iterator so it works even when there is only a single route
in a routing table.

Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Robert Olsson <robert.olsson@its.uu.se>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-24 14:42:04 -08:00
Jarek Poplawski
52d570aabe [TCP]: rare bad TCP checksum with 2.6.19
The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE"
changed to unconditional copying of ip_summed field from collapsed
skb. This patch reverts this change.

The majority of substantial work including heavy testing
and diagnosing by: Michael Tokarev <mjt@tls.msk.ru>
Possible reasons pointed by: Herbert Xu and Patrick McHardy.

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 22:07:12 -08:00
Masayuki Nakagawa
fb7e2399ec [TCP]: skb is unexpectedly freed.
I encountered a kernel panic with my test program, which is a very
simple IPv6 client-server program.

The server side sets IPV6_RECVPKTINFO on a listening socket, and the
client side just sends a message to the server.  Then the kernel panic
occurs on the server.  (If you need the test program, please let me
know. I can provide it.)

This problem happens because a skb is forcibly freed in
tcp_rcv_state_process().

When a socket in listening state(TCP_LISTEN) receives a syn packet,
then tcp_v6_conn_request() will be called from
tcp_rcv_state_process().  If the tcp_v6_conn_request() successfully
returns, the skb would be discarded by __kfree_skb().

However, in case of a listening socket which was already set
IPV6_RECVPKTINFO, an address of the skb will be stored in
treq->pktopts and a ref count of the skb will be incremented in
tcp_v6_conn_request().  But, even if the skb is still in use, the skb
will be freed.  Then someone still using the freed skb will cause the
kernel panic.

I suggest to use kfree_skb() instead of __kfree_skb().

Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 20:25:52 -08:00
Patrick McHardy
c54ea3b95a [NETFILTER]: ctnetlink: fix leak in ctnetlink_create_conntrack error path
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-23 20:25:42 -08:00
Stephen Hemminger
65ebe63420 [PATCH] email change for shemminger@osdl.org
Change my email address to reflect OSDL merger.

Signed-off-by: Stephen Hemminger <shemminger@osdl.org>
[ The irony. Somebody still has his sign-off message hardcoded
  in a script or his brainstem ;^]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-01-23 14:18:49 -08:00
Jarek Poplawski
483479ecc5 [IPV4] devinet: inetdev_init out label moved after RCU assignment
inetdev_init out label moved after RCU assignment
(final suggestion by Herbert Xu)

Signed-off-by: Jarek Poplawski <jarkao2@o2.pl>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-09 14:38:31 -08:00
Paul Moore
469de9b90f [INET]: style updates for the inet_sock->is_icsk assignment fix
A quick patch to change the inet_sock->is_icsk assignment to better fit with
existing kernel coding style.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-09 14:37:06 -08:00
Patrick McHardy
ffed53d25b [NETFILTER]: nf_nat: fix hanging connections when loading the NAT module
When loading the NAT module, existing connection tracking entries don't
have room for NAT information allocated and packets are dropped, causing
hanging connections. They really should be entered into the NAT table
as NULL mappings, but the current allocation scheme doesn't allow this.

For now simply accept those packets to avoid the hanging connections.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-09 14:33:49 -08:00
Craig Schlenter
cb48cfe807 [TCP]: Fix iov_len calculation in tcp_v4_send_ack().
This fixes the ftp stalls present in the current kernels.

All credit goes to Komuro <komurojun-mbn@nifty.com> for tracking
this down. The patch is untested but it looks *cough* obviously
correct.

Signed-off-by: Craig Schlenter <craig@codefountain.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-09 00:30:08 -08:00
Paul Moore
cbbd7d4f36 [INET]: Fix incorrect "inet_sock->is_icsk" assignment.
The inet_create() and inet6_create() functions incorrectly set the
inet_sock->is_icsk field.  Both functions assume that the is_icsk field is
large enough to hold at least a INET_PROTOSW_ICSK value when it is actually
only a single bit.  This patch corrects the assignment by doing a boolean
comparison whose result will safely fit into a single bit field.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-09 00:29:51 -08:00
David L Stevens
30c4cf577f [IPV4/IPV6]: Fix inet{,6} device initialization order.
It is important that we only assign dev->ip{,6}_ptr
only after all portions of the inet{,6} are setup.

Otherwise we can receive packets before the multicast
spinlocks et al. are initialized.

Signed-off-by: David L Stevens <dlstevens@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-04 12:31:14 -08:00
Martin Josefsson
bbdc176a2f [NETFILTER]: nf_nat: fix MASQUERADE crash on device down
Check the return value of nfct_nat() in device_cmp(), we might very well
have non NAT conntrack entries as well (Netfilter bugzilla #528).

Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-04 12:16:54 -08:00
Patrick McHardy
c9386cfddc [NETFILTER]: New connection tracking is not EXPERIMENTAL anymore
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-04 12:16:06 -08:00
Patrick McHardy
c68b8b687f [NETFILTER]: Fix routing of REJECT target generated packets in output chain
Packets generated by the REJECT target in the output chain have a local
destination address and a foreign source address. Make sure not to use
the foreign source address for the output route lookup.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-04 12:15:34 -08:00
Dmitry Mishin
e5b5ef7d2b [NETFILTER]: compat offsets size change
Used by compat code offsets of entries should be 'unsigned int' as entries
array size has this dimension.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-01-04 12:14:41 -08:00
David S. Miller
5c668704b7 [UDP]: Fix reversed logic in udp_get_port().
When this code was converted to use sk_for_each() the
logic for the "best hash chain length" code was reversed,
breaking everything.

The original code was of the form:

			size = 0;
			do {
				if (++size >= best_size_so_far)
					goto next;
			} while ((sk = sk->next) != NULL);
			best_size_so_far = size;
			best = result;
		next:;

and this got converted into:

			sk_for_each(sk2, node, head)
				if (++size < best_size_so_far) {
					best_size_so_far = size;
					best = result;
				}

Which does something very very different from the original.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-22 11:42:26 -08:00
Li Yewang
14fb8a7647 [IPV4]: Fix BUG of ip_rt_send_redirect()
Fix the redirect packet of the router if the jiffies wraparound.

Signed-off-by: Li Yewang <lyw@nanjing-fnst.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-18 00:26:35 -08:00
Leigh Brown
a9fc00cca8 [TCP]: Trivial fix to message in tcp_v4_inbound_md5_hash
The message logged in tcp_v4_inbound_md5_hash when the hash was expected
but not found was reversed.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:26 -08:00
Leigh Brown
8228a18dd3 [TCP]: Fix oops caused by tcp_v4_md5_do_del
md5sig_info.alloced4 must be set to zero when freeing keys4, otherwise
it will not be alloc'd again when another key is added to the same
socket by tcp_v4_md5_do_add.

Signed-off-by: Leigh Brown <leigh@solinno.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-17 21:59:25 -08:00
David S. Miller
6931ba7cef [TCP]: Fix oops caused by __tcp_put_md5sig_pool()
It should call tcp_free_md5sig_pool() not __tcp_free_md5sig_pool()
so that it does proper refcounting.

Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 16:48:26 -08:00
Al Viro
e1b4b9f398 [NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case search for loops
If we come to node we'd already marked as seen and it's not a part of path
(i.e. we don't have a loop right there), we already know that it isn't a
part of any loop, so we don't need to revisit it.

That speeds the things up if some chain is refered to from several places
and kills O(exp(table size)) worst-case behaviour (without sleeping,
at that, so if you manage to self-LART that way, you are SOL for a long
time)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 16:48:23 -08:00
Dmitry Mishin
a96be24679 [NETFILTER]: ip_tables: ipt and ipt_compat checks unification
Matches and targets verification is duplicated in normal and compat processing
ways. This patch refactors code in order to remove this.

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2006-12-13 16:48:22 -08:00