linux

q3k/linux

Author	SHA1	Message	Date
Evgeniy Polyakov	c4e38f41e3	[IPV4]: Fix rtm_to_ifaddr() error handling. Return negative error value (embedded in the pointer) instead of returning NULL. Signed-off-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-09 13:43:24 -08:00
Herbert Xu	d644329bc9	[UDP]: Reread uh pointer after pskb_trim The header may have moved when trimming. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-07 16:08:04 -08:00
Linus Torvalds	5b3c1184e7	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [DCCP]: Set RTO for newly created child socket [DCCP]: Correctly split CCID half connections [NET]: Fix compat_sock_common_getsockopt typo. [NET]: Revert incorrect accept queue backlog changes. [INET]: twcal_jiffie should be unsigned long, not int [GIANFAR]: Fix compile error in latest git [PPPOE]: Use ifindex instead of device pointer in key lookups. [NETFILTER]: ip6_route_me_harder should take into account mark [NETFILTER]: nfnetlink_log: fix reference counting [NETFILTER]: nfnetlink_log: fix module reference counting [NETFILTER]: nfnetlink_log: fix possible NULL pointer dereference [NETFILTER]: nfnetlink_log: fix NULL pointer dereference [NETFILTER]: nfnetlink_log: fix use after free [NETFILTER]: nfnetlink_log: fix reference leak [NETFILTER]: tcp conntrack: accept SYN\|URG as valid [NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefs [NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops	2007-03-06 19:53:34 -08:00
Jay Vosburgh	a816c7c712	bonding: Improve IGMP join processing In active-backup mode, the current bonding code duplicates IGMP traffic to all slaves, so that switches are up to date in case of a failover from an active to a backup interface. If bonding then fails back to the original active interface, it is likely that the "active slave" switch's IGMP forwarding for the port will be out of date until some event occurs to refresh the switch (e.g., a membership query). This patch alters the behavior of bonding to no longer flood IGMP to all ports, and to issue IGMP JOINs to the newly active port at the time of a failover. This insures that switches are kept up to date for all cases. "GOELLESCH Niels" <niels.goellesch@eurocontrol.int> originally reported this problem, and included a patch. His original patch was modified by Jay Vosburgh to additionally remove the existing IGMP flood behavior, use RCU, streamline code paths, fix trailing white space, and adjust for style. Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>	2007-03-06 06:08:11 -05:00
Patrick McHardy	d3ab4298aa	[NETFILTER]: tcp conntrack: accept SYN\|URG as valid Some stacks apparently send packets with SYN\|URG set. Linux accepts these packets, so TCP conntrack should to. Pointed out by Martijn Posthuma <posthuma@sangine.com>. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:20 -08:00
Patrick McHardy	e281db5cdf	[NETFILTER]: nf_conntrack/nf_nat: fix incorrect config ifdefs The nf_conntrack_netlink config option is named CONFIG_NF_CT_NETLINK, but multiple files use CONFIG_IP_NF_CONNTRACK_NETLINK or CONFIG_NF_CONNTRACK_NETLINK for ifdefs. Fix this and reformat all CONFIG_NF_CT_NETLINK ifdefs to only use a line. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:19 -08:00
Patrick McHardy	ec68e97ded	[NETFILTER]: conntrack: fix {nf,ip}_ct_iterate_cleanup endless loops Fix {nf,ip}_ct_iterate_cleanup unconfirmed list handling: - unconfirmed entries can not be killed manually, they are removed on confirmation or final destruction of the conntrack entry, which means we might iterate forever without making forward progress. This can happen in combination with the conntrack event cache, which holds a reference to the conntrack entry, which is only released when the packet makes it all the way through the stack or a different packet is handled. - taking references to an unconfirmed entry and using it outside the locked section doesn't work, the list entries are not refcounted and another CPU might already be waiting to destroy the entry What the code really wants to do is make sure the references of the hash table to the selected conntrack entries are released, so they will be destroyed once all references from skbs and the event cache are dropped. Since unconfirmed entries haven't even entered the hash yet, simply mark them as dying and skip confirmation based on that. Reported and tested by Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-05 13:25:18 -08:00
Paul Moore	c6387a8694	[NetLabel]: Verify sensitivity level has a valid CIPSO mapping The current CIPSO engine has a problem where it does not verify that the given sensitivity level has a valid CIPSO mapping when the "std" CIPSO DOI type is used. The end result is that bad packets are sent on the wire which should have never been sent in the first place. This patch corrects this problem by verifying the sensitivity level mapping similar to what is done with the category mapping. This patch also changes the returned error code in this case to -EPERM to better match what the category mapping verification code returns. Signed-off-by: Paul Moore <paul.moore@hp.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-03-02 20:37:36 -08:00
Arnaldo Carvalho de Melo	a9948a7e15	[TCP]: Fix minisock tcp_create_openreq_child() typo. On 2/28/07, KOVACS Krisztian <hidden@balabit.hu> wrote: > > Hi, > > While reading TCP minisock code I've found this suspiciously looking > code fragment: > > - 8< - > struct sock tcp_create_openreq_child(struct sock sk, struct request_sock req, struct sk_buff skb) > { > struct sock newsk = inet_csk_clone(sk, req, GFP_ATOMIC); > > if (newsk != NULL) { > const struct inet_request_sock ireq = inet_rsk(req); > struct tcp_request_sock treq = tcp_rsk(req); > struct inet_connection_sock newicsk = inet_csk(sk); > struct tcp_sock *newtp; > - 8< - > > The above code initializes newicsk to inet_csk(sk), isn't that supposed > to be inet_csk(newsk)? As far as I can tell this might leave > icsk_ack.last_seg_size zero even if we do have received data. Good catch! David, please apply the attached patch. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-28 11:05:56 -08:00
Bernhard Walle	aef8811abb	[XFRM]: Fix oops in xfrm4_dst_destroy() With 2.6.21-rc1, I get an oops when running 'ifdown eth0' and an IPsec connection is active. If I shut down the connection before running 'ifdown eth0', then there's no problem. The critical operation of this script is to kill dhcpd. The problem is probably caused by commit with git identifier `4337226228` (Linus tree) "[IPSEC]: IPv4 over IPv6 IPsec tunnel". This patch fixes that oops. I don't know the network code of the Linux kernel in deep, so if that fix is wrong, please change it. But please fix the oops. :) Signed-off-by: Bernhard Walle <bwalle@suse.de> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 12:10:32 -08:00
Arnaldo Carvalho de Melo	e4396b544f	[XFRM_TUNNEL]: Reload header pointer after pskb_may_pull/pskb_expand_head Please consider applying, this was found on your latest net-2.6 tree while playing around with that ip_hdr() + turn skb->nh/h/mac pointers as offsets on 64 bits idea :-) Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:43:01 -08:00
Joe Perches	4c3ae4d7e7	[IPV4]: Use random32() in net/ipv4/multipath Removed local random number generator function Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:43:00 -08:00
Herbert Xu	8030f54499	[IPV4] devinet: Register inetdev earlier. This patch allocates inetdev at registration for all devices in line with IPv6. This allows sysctl configuration on the devices to occur before they're brought up or addresses are added. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2007-02-26 11:42:56 -08:00
Baruch Even	f4b9479dc5	[IPV4]: Correct links in net/ipv4/Kconfig Correct dead/indirect links in net/ipv4/Kconfig Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:51 -08:00
David S. Miller	2c4f6219ac	[TCP]: Fix MD5 signature pool locking. The locking calls assumed that these code paths were only invoked in software interrupt context, but that isn't true. Therefore we need to use spin_{lock,unlock}_bh() throughout. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-26 11:42:48 -08:00
Adrian Bunk	936bb14ce9	correct a dead URL in the IP_MULTICAST help text Reported in kernel Bugzilla #6216. Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 19:49:13 +01:00
Robert P. J. Day	d08df601a3	Various typo fixes. Correct mis-spellings of "algorithm", "appear", "consistent" and (shame, shame) "kernel". Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Signed-off-by: Adrian Bunk <bunk@stusta.de>	2007-02-17 19:07:33 +01:00
Eric W. Biederman	3fbfa98112	[PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables It isn't needed anymore, all of the users are gone, and all of the ctl_table initializers have been converted to use explicit names of the fields they are initializing. [akpm@osdl.org: NTFS fix] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:10:00 -08:00
Eric W. Biederman	0b4d414714	[PATCH] sysctl: remove insert_at_head from register_sysctl The semantic effect of insert_at_head is that it would allow new registered sysctl entries to override existing sysctl entries of the same name. Which is pain for caching and the proc interface never implemented. I have done an audit and discovered that none of the current users of register_sysctl care as (excpet for directories) they do not register duplicate sysctl entries. So this patch simply removes the support for overriding existing entries in the sys_sysctl interface since no one uses it or cares and it makes future enhancments harder. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Acked-by: Ralf Baechle <ralf@linux-mips.org> Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Andi Kleen <ak@muc.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Corey Minyard <minyard@acm.org> Cc: Neil Brown <neilb@suse.de> Cc: "John W. Linville" <linville@tuxdriver.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Jan Kara <jack@ucw.cz> Cc: Trond Myklebust <trond.myklebust@fys.uio.no> Cc: Mark Fasheh <mark.fasheh@oracle.com> Cc: David Chinner <dgc@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:59 -08:00
Tim Schmielau	cd354f1ae7	[PATCH] remove many unneeded #includes of sched.h After Al Viro (finally) succeeded in removing the sched.h #include in module.h recently, it makes sense again to remove other superfluous sched.h includes. There are quite a lot of files which include it but don't actually need anything defined in there. Presumably these includes were once needed for macros that used to live in sched.h, but moved to other header files in the course of cleaning it up. To ease the pain, this time I did not fiddle with any header files and only removed #includes from .c-files, which tend to cause less trouble. Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha, arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig, allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all configs in arch/arm/configs on arm. I also checked that no new warnings were introduced by the patch (actually, some warnings are removed that were emitted by unnecessarily included header files). Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de> Acked-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-14 08:09:54 -08:00
Kazunori MIYAZAWA	c0d56408e3	[IPSEC]: Changing API of xfrm4_tunnel_register. This patch changes xfrm4_tunnel register and deregister interface to prepare for solving the conflict of device tunnels with inter address family IPsec tunnel. Signed-off-by: Kazunori MIYAZAWA <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:54:47 -08:00
Ilpo Järvinen	600ff0c24b	[TCP]: Prevent pseudo garbage in SYN's advertized window TCP may advertize up to 16-bits window in SYN packets (no window scaling allowed). At the same time, TCP may have rcv_wnd (32-bits) that does not fit to 16-bits without window scaling resulting in pseudo garbage into advertized window from the low-order bits of rcv_wnd. This can happen at least when mss <= (1<<wscale) (see tcp_select_initial_window). This patch fixes the handling of SYN advertized windows (compile tested only). In worst case (which is unlikely to occur though), the receiver advertized window could be just couple of bytes. I'm not sure that such situation would be handled very well at all by the receiver!? Fortunately, the situation normalizes after the first non-SYN ACK is received because it has the correct, scaled window. Alternatively, tcp_select_initial_window could be changed to prevent too large rcv_wnd in the first place. [ tcp_make_synack() has the same bug, and I've added a fix for that to this patch -DaveM ] Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:42:11 -08:00
Herbert Xu	bbf4a6bc8c	[NETFILTER]: Clear GSO bits for TCP reset packet The TCP reset packet is copied from the original. This includes all the GSO bits which do not apply to the new packet. So we should clear those bits. Spotted by Patrick McHardy. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-13 12:32:58 -08:00
Patrick McHardy	e2e01bfef8	[XFRM]: Fix IPv4 tunnel mode decapsulation with IPV6=n Add missing break when CONFIG_IPV6=n. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 20:27:10 -08:00
Stephen Hemminger	9121c77706	[TCP]: cleanup of htcp (resend) Minor non-invasive cleanups: * white space around operators and line wrapping * use const * use __read_mostly Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 13:34:03 -08:00
Stephen Hemminger	59758f4459	[TCP]: Use read mostly for CUBIC parameters. These module parameters should be in the read mostly area to avoid cache pollution. Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 13:15:20 -08:00
Patrick McHardy	a3c941b08d	[NETFILTER]: Kconfig: improve dependency handling Instead of depending on internally needed options and letting users figure out what is needed, select them when needed: - IP_NF_IPTABLES, IP_NF_ARPTABLES and IP6_NF_IPTABLES select NETFILTER_XTABLES - NETFILTER_XT_TARGET_CONNMARK, NETFILTER_XT_MATCH_CONNMARK and IP_NF_TARGET_CLUSTERIP select NF_CONNTRACK_MARK - NETFILTER_XT_MATCH_CONNBYTES selects NF_CT_ACCT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:15:02 -08:00
Patrick McHardy	982d9a9ce3	[NETFILTER]: nf_conntrack: properly use RCU for nf_conntrack_destroyed callback Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:14:11 -08:00
Patrick McHardy	6b48a7d08d	[NETFILTER]: ip_conntrack: properly use RCU for ip_conntrack_destroyed callback Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:13:58 -08:00
Patrick McHardy	abbaccda4c	[NETFILTER]: ip_conntrack: fix invalid conntrack statistics RCU assumption CONNTRACK_STAT_INC assumes rcu_read_lock in nf_hook_slow disables preemption as well, making it legal to use __get_cpu_var without disabling preemption manually. The assumption is not correct anymore with preemptable RCU, additionally we need to protect against softirqs when not holding ip_conntrack_lock. Add CONNTRACK_STAT_INC_ATOMIC macro, which disables local softirqs, and use where necessary. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:13:14 -08:00
Patrick McHardy	923f4902fe	[NETFILTER]: nf_conntrack: properly use RCU API for nf_ct_protos/nf_ct_l3protos arrays Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:57 -08:00
Patrick McHardy	642d628b2c	[NETFILTER]: ip_conntrack: properly use RCU API for ip_ct_protos array Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in all paths not obviously only used within packet process context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:40 -08:00
Patrick McHardy	e22a054869	[NETFILTER]: nf_nat: properly use RCU API for nf_nat_protos array Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in paths used outside of packet processing context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:26 -08:00
Patrick McHardy	a441dfdbb2	[NETFILTER]: ip_nat: properly use RCU API for ip_nat_protos array Replace preempt_{enable,disable} based RCU by proper use of the RCU API and add missing rcu_read_lock/rcu_read_unlock calls in paths used outside of packet processing context (nfnetlink_conntrack). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:12:09 -08:00
Patrick McHardy	e92ad99c78	[NETFILTER]: nf_log: minor cleanups - rename nf_logging to nf_loggers since its an array of registered loggers - rename nf_log_unregister_logger() to nf_log_unregister() to make it symetrical to nf_log_register() and convert all users Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:11:55 -08:00
Patrick McHardy	c3a47ab3e5	[NETFILTER]: Properly use RCU in nf_ct_attach Use rcu_assign_pointer/rcu_dereference for ip_ct_attach pointer instead of self-made RCU and use rcu_read_lock to make sure the conntrack module doesn't disappear below us while calling it, since this function can be called from outside the netfilter hooks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-12 11:09:19 -08:00
Arjan van de Ven	9a32144e9d	[PATCH] mark struct file_operations const 7 Many struct file_operations in the kernel can be "const". Marking them const moves these to the .rodata section, which avoids false sharing with potential dirty data. In addition it'll catch accidental writes at compile time to these shared resources. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-12 09:48:46 -08:00
Linus Torvalds	cb18eccff4	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (45 commits) [IPV4]: Restore multipath routing after rt_next changes. [XFRM] IPV6: Fix outbound RO transformation which is broken by IPsec tunnel patch. [NET]: Reorder fields of struct dst_entry [DECNET]: Convert decnet route to use the new dst_entry 'next' pointer [IPV6]: Convert ipv6 route to use the new dst_entry 'next' pointer [IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer [NET]: Introduce union in struct dst_entry to hold 'next' pointer [DECNET]: fix misannotation of linkinfo_dn [DECNET]: FRA_{DST,SRC} are le16 for decnet [UDP]: UDP can use sk_hash to speedup lookups [NET]: Fix whitespace errors. [NET] XFRM: Fix whitespace errors. [NET] X25: Fix whitespace errors. [NET] WANROUTER: Fix whitespace errors. [NET] UNIX: Fix whitespace errors. [NET] TIPC: Fix whitespace errors. [NET] SUNRPC: Fix whitespace errors. [NET] SCTP: Fix whitespace errors. [NET] SCHED: Fix whitespace errors. [NET] RXRPC: Fix whitespace errors. ...	2007-02-11 11:38:13 -08:00
Robert P. J. Day	c376222960	[PATCH] Transform kmem_cache_alloc()+memset(0) -> kmem_cache_zalloc(). Replace appropriate pairs of "kmem_cache_alloc()" + "memset(0)" with the corresponding "kmem_cache_zalloc()" call. Signed-off-by: Robert P. J. Day <rpjday@mindspring.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Andi Kleen <ak@muc.de> Cc: Roland McGrath <roland@redhat.com> Cc: James Bottomley <James.Bottomley@steeleye.com> Cc: Greg KH <greg@kroah.com> Acked-by: Joel Becker <Joel.Becker@oracle.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Jan Kara <jack@ucw.cz> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Stephen Smalley <sds@tycho.nsa.gov> Cc: James Morris <jmorris@namei.org> Cc: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-02-11 10:51:27 -08:00
Eric Dumazet	5ef213f684	[IPV4]: Restore multipath routing after rt_next changes. I forgot to test build this part of the networking code... Sorry guys. This patch renames u.rt_next to u.dst.rt_next Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:50 -08:00
Eric Dumazet	093c2ca416	[IPV4]: Convert ipv4 route to use the new dst_entry 'next' pointer This patch removes the rt_next pointer from 'struct rtable.u' union, and renames u.rt_next to u.dst_rt_next. It also moves 'struct flowi' right after 'struct dst_entry' to prepare the gain on lookups. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:38 -08:00
Eric Dumazet	95f30b336b	[UDP]: UDP can use sk_hash to speedup lookups In a prior patch, I introduced a sk_hash field (__sk_common.skc_hash) to let tcp lookups use one cache line per unmatched entry instead of two. We can also use sk_hash to speedup UDP part as well. We store in sk_hash the hnum value, and use sk->sk_hash (same cache line than 'next' pointer), instead of inet->num (different cache line) Note : We still have a false sharing problem for SMP machines, because sock_hold(sock) dirties the cache line containing the 'next' pointer. Not counting the udp_hash_lock rwlock. (did someone mentioned RCU ? :) ) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:20:29 -08:00
YOSHIFUJI Hideaki	e905a9edab	[NET] IPV4: Fix whitespace errors. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-10 23:19:39 -08:00
Eric Dumazet	dbca9b2750	[NET]: change layout of ehash table ehash table layout is currently this one : First half of this table is used by sockets not in TIME_WAIT state Second half of it is used by sockets in TIME_WAIT state. This is non optimal because of for a given hash or socket, the two chain heads are located in separate cache lines. Moreover the locks of the second half are never used. If instead of this halving, we use two list heads in inet_ehash_bucket instead of only one, we probably can avoid one cache miss, and reduce ram usage, particularly if sizeof(rwlock_t) is big (various CONFIG_DEBUG_SPINLOCK, CONFIG_DEBUG_LOCK_ALLOC settings). So we still halves the table but we keep together related chains to speedup lookups and socket state change. In this patch I did not try to align struct inet_ehash_bucket, but a future patch could try to make this structure have a convenient size (a power of two or a multiple of L1_CACHE_SIZE). I guess rwlock will just vanish as soon as RCU is plugged into ehash :) , so maybe we dont need to scratch our heads to align the bucket... Note : In case struct inet_ehash_bucket is not a power of two, we could probably change alloc_large_system_hash() (in case it use __get_free_pages()) to free the unused space. It currently allocates a big zone, but the last quarter of it could be freed. Again, this should be a temporary 'problem'. Patch tested on ipv4 tcp only, but should be OK for IPV6 and DCCP. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 14:16:46 -08:00
Jan Engelhardt	e60a13e030	[NETFILTER]: {ip,ip6}_tables: use struct xt_table instead of redefined structure names Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:20 -08:00
Jan Engelhardt	6709dbbb19	[NETFILTER]: {ip,ip6}_tables: remove x_tables wrapper functions Use the x_tables functions directly to make it better visible which parts are shared between ip_tables and ip6_tables. Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:19 -08:00
Jan Engelhardt	e1fd0586b0	[NETFILTER]: x_tables: fix return values for LOG/ULOG Signed-off-by: Jan Engelhardt <jengelh@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:18 -08:00
Eric Leblond	41f4689a7c	[NETFILTER]: NAT: optional source port randomization support This patch adds support to NAT to randomize source ports. Signed-off-by: Eric Leblond <eric@inl.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:17 -08:00
Patrick McHardy	cdd289a2f8	[NETFILTER]: add IPv6-capable TCPMSS target Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:16 -08:00
Patrick McHardy	a8d0f9526f	[NET]: Add UDPLITE support in a few missing spots Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:14 -08:00
Patrick McHardy	efbc597634	[NETFILTER]: nf_nat: remove broken HOOKNAME macro Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:12 -08:00
Patrick McHardy	a09113c2c8	[NETFILTER]: tcp conntrack: do liberal tracking for picked up connections Do liberal tracking (only RSTs need to be in-window) for connections picked up without seeing a SYN to deal with window scaling. Also change logging of invalid packets not to log packets accepted by liberal tracking to avoid spamming the logs. Based on suggestion from James Ralston <ralston@pobox.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:10 -08:00
Stephen Hemminger	22f8cde5bc	[NET]: unregister_netdevice as void There was no real useful information from the unregister_netdevice() return code, the only error occurred in a situation that was a driver bug. So change it to a void function. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:06 -08:00
Alexey Dobriyan	cc63f70b8b	[IPV4/IPV6] multicast: Check add_grhead() return value add_grhead() allocates memory with GFP_ATOMIC and in at least two places skb from it passed to skb_put() without checking. Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:04 -08:00
David S. Miller	f2f2102d1a	[XFRM]: Fix missed error setting in xfrm4_policy.c When we can't find the afinfo we should return EAFNOSUPPORT. GCC warned about the uninitialized 'err' for this path as well. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:03 -08:00
Miika Komu	4337226228	[IPSEC]: IPv4 over IPv6 IPsec tunnel This is the patch to support IPv4 over IPv6 IPsec. Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:02 -08:00
Miika Komu	c82f963efe	[IPSEC]: IPv6 over IPv4 IPsec tunnel This is the patch to support IPv6 over IPv4 IPsec Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:01 -08:00
Miika Komu	cdca72652a	[IPSEC]: exporting xfrm_state_afinfo This patch exports xfrm_state_afinfo. Signed-off-by: Miika Komu <miika@iki.fi> Signed-off-by: Diego Beltrami <Diego.Beltrami@hiit.fi> Signed-off-by: Kazunori Miyazawa <miyazawa@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:39:00 -08:00
John Heffner	104439a887	[TCP]: Don't apply FIN exception to full TSO segments. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:51 -08:00
Baruch Even	8a3c3a9727	[TCP]: Check num sacks in SACK fast path We clear the unused parts of the SACK cache, This prevents us from mistakenly taking the cache data if the old data in the SACK cache is the same as the data in the SACK block. This assumes that we never receive an empty SACK block with start and end both at zero. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:50 -08:00
Baruch Even	6f74651ae6	[TCP]: Seperate DSACK from SACK fast path Move DSACK code outside the SACK fast-path checking code. If the DSACK determined that the information was too old we stayed with a partial cache copied. Most likely this matters very little since the next packet will not be DSACK and we will find it in the cache. but it's still not good form and there is little reason to couple the two checks. Since the SACK receive cache doesn't need the data to be in host order we also remove the ntohl in the checking loop. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:49 -08:00
Baruch Even	fda03fbb56	[TCP]: Advance fast path pointer for first block only Only advance the SACK fast-path pointer for the first block, the fast-path assumes that only the first block advances next time so we should not move the cached skb for the next sack blocks. Signed-off-by: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:48 -08:00
David S. Miller	8eb9086f21	[IPV4/IPV6]: Always wait for IPSEC SA resolution in socket contexts. Do this even for non-blocking sockets. This avoids the silly -EAGAIN that applications can see now, even for non-blocking sockets in some cases (f.e. connect()). With help from Venkat Tekkirala. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:45 -08:00
Frederik Deweerdt	ba7808eac1	[TCP]: remove tcp header from tcp_v4_check (take #2 ) The tcphdr struct passed to tcp_v4_check is not used, the following patch removes it from the parameter list. This adds the netfilter modifications missing in the patch I sent for rc3-mm1. Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:44 -08:00
Patrick McHardy	26932566a4	[NETLINK]: Don't BUG on undersized allocations Currently netlink users BUG when the allocated skb for an event notification is undersized. While this is certainly a kernel bug, its not critical and crashing the kernel is too drastic, especially when considering that these errors have appeared multiple times in the past and it BUGs even if no listeners are present. This patch replaces BUG by WARN_ON and changes the notification functions to inform potential listeners of undersized allocations using a unique error code (EMSGSIZE). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-08 12:38:41 -08:00
Patrick McHardy	40e0cb004a	[NETFILTER]: ctnetlink: fix compile failure with NF_CONNTRACK_MARK=n CC net/netfilter/nf_conntrack_netlink.o net/netfilter/nf_conntrack_netlink.c: In function 'ctnetlink_conntrack_event': net/netfilter/nf_conntrack_netlink.c:392: error: 'struct nf_conn' has no member named 'mark' make[3]: *** [net/netfilter/nf_conntrack_netlink.o] Error 1 Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-02-02 19:33:11 -08:00
Patrick McHardy	adcb471110	[NETFILTER]: SIP conntrack: fix out of bounds memory access When checking for an @-sign in skp_epaddr_len, make sure not to run over the packet boundaries. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-30 14:25:24 -08:00
Lars Immisch	7da5bfbb12	[NETFILTER]: SIP conntrack: fix skipping over user info in SIP headers When trying to skip over the username in the Contact header, stop at the end of the line if no @ is found to avoid mangling following headers. We don't need to worry about continuation lines because we search inside a SIP URI. Fixes Netfilter Bugzilla #532. Signed-off-by: Lars Immisch <lars@ibp.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-30 14:24:57 -08:00
Robert Olsson	095b8501e4	[IPV4]: Fix single-entry /proc/net/fib_trie output. When main table is just a single leaf this gets printed as belonging to the local table in /proc/net/fib_trie. A fix is below. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Acked-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 19:06:01 -08:00
Patrick McHardy	a46bf7d5a8	[NETFILTER]: nf_nat_pptp: fix expectation removal When removing the expectation for the opposite direction, the PPTP NAT helper initializes the tuple for lookup with the addresses of the opposite direction, which makes the lookup fail. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 01:07:30 -08:00
Patrick McHardy	c72c6b2a29	[NETFILTER]: nf_nat: fix ICMP translation with statically linked conntrack When nf_nat/nf_conntrack_ipv4 are linked statically, nf_nat is initialized before nf_conntrack_ipv4, which makes the nf_ct_l3proto_find_get(AF_INET) call during nf_nat initialization return the generic l3proto instead of the AF_INET specific one. This breaks ICMP error translation since the generic protocol always initializes the IPs in the tuple to 0. Change the linking order and put nf_conntrack_ipv4 first. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 01:06:47 -08:00
David S. Miller	e89862f4c5	[TCP]: Restore SKB socket owner setting in tcp_transmit_skb(). Revert `931731123a` We can't elide the skb_set_owner_w() here because things like certain netfilter targets (such as owner MATCH) need a socket to be set on the SKB for correct operation. Thanks to Jan Engelhardt and other netfilter list members for pointing this out. Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-26 01:04:55 -08:00
Baruch Even	db3ccdac26	[TCP]: Fix sorting of SACK blocks. The sorting of SACK blocks actually munges them rather than sort, causing the TCP stack to ignore some SACK information and breaking the assumption of ordered SACK blocks after sorting. The sort takes the data from a second buffer which isn't moved causing subsequent data moves to occur from the wrong location. The fix is to use a temporary buffer as a normal sort does. Signed-off-By: Baruch Even <baruch@ev-en.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-25 13:35:06 -08:00
Eric W. Biederman	6640e69731	[IPV4]: Fix the fib trie iterator to work with a single entry routing tables In a kernel with trie routing enabled I had a simple routing setup with only a single route to the outside world and no default route. "ip route table list main" showed my the route just fine but /proc/net/route was an empty file. What was going on? Thinking it was a bug in something I did and I looked deeper. Eventually I setup a second route and everything looked correct, huh? Finally I realized that the it was just the iterator pair in fib_trie_get_first, fib_trie_get_next just could not handle a routing table with a single entry. So to save myself and others further confusion, here is a simple fix for the fib proc iterator so it works even when there is only a single route in a routing table. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-24 14:42:04 -08:00
Jarek Poplawski	52d570aabe	[TCP]: rare bad TCP checksum with 2.6.19 The patch "Replace CHECKSUM_HW by CHECKSUM_PARTIAL/CHECKSUM_COMPLETE" changed to unconditional copying of ip_summed field from collapsed skb. This patch reverts this change. The majority of substantial work including heavy testing and diagnosing by: Michael Tokarev <mjt@tls.msk.ru> Possible reasons pointed by: Herbert Xu and Patrick McHardy. Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-23 22:07:12 -08:00
Masayuki Nakagawa	fb7e2399ec	[TCP]: skb is unexpectedly freed. I encountered a kernel panic with my test program, which is a very simple IPv6 client-server program. The server side sets IPV6_RECVPKTINFO on a listening socket, and the client side just sends a message to the server. Then the kernel panic occurs on the server. (If you need the test program, please let me know. I can provide it.) This problem happens because a skb is forcibly freed in tcp_rcv_state_process(). When a socket in listening state(TCP_LISTEN) receives a syn packet, then tcp_v6_conn_request() will be called from tcp_rcv_state_process(). If the tcp_v6_conn_request() successfully returns, the skb would be discarded by __kfree_skb(). However, in case of a listening socket which was already set IPV6_RECVPKTINFO, an address of the skb will be stored in treq->pktopts and a ref count of the skb will be incremented in tcp_v6_conn_request(). But, even if the skb is still in use, the skb will be freed. Then someone still using the freed skb will cause the kernel panic. I suggest to use kfree_skb() instead of __kfree_skb(). Signed-off-by: Masayuki Nakagawa <nakagawa.msy@ncos.nec.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-23 20:25:52 -08:00
Patrick McHardy	c54ea3b95a	[NETFILTER]: ctnetlink: fix leak in ctnetlink_create_conntrack error path Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-23 20:25:42 -08:00
Stephen Hemminger	65ebe63420	[PATCH] email change for shemminger@osdl.org Change my email address to reflect OSDL merger. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> [ The irony. Somebody still has his sign-off message hardcoded in a script or his brainstem ;^] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2007-01-23 14:18:49 -08:00
Jarek Poplawski	483479ecc5	[IPV4] devinet: inetdev_init out label moved after RCU assignment inetdev_init out label moved after RCU assignment (final suggestion by Herbert Xu) Signed-off-by: Jarek Poplawski <jarkao2@o2.pl> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 14:38:31 -08:00
Paul Moore	469de9b90f	[INET]: style updates for the inet_sock->is_icsk assignment fix A quick patch to change the inet_sock->is_icsk assignment to better fit with existing kernel coding style. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 14:37:06 -08:00
Patrick McHardy	ffed53d25b	[NETFILTER]: nf_nat: fix hanging connections when loading the NAT module When loading the NAT module, existing connection tracking entries don't have room for NAT information allocated and packets are dropped, causing hanging connections. They really should be entered into the NAT table as NULL mappings, but the current allocation scheme doesn't allow this. For now simply accept those packets to avoid the hanging connections. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 14:33:49 -08:00
Craig Schlenter	cb48cfe807	[TCP]: Fix iov_len calculation in tcp_v4_send_ack(). This fixes the ftp stalls present in the current kernels. All credit goes to Komuro <komurojun-mbn@nifty.com> for tracking this down. The patch is untested but it looks cough obviously correct. Signed-off-by: Craig Schlenter <craig@codefountain.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 00:30:08 -08:00
Paul Moore	cbbd7d4f36	[INET]: Fix incorrect "inet_sock->is_icsk" assignment. The inet_create() and inet6_create() functions incorrectly set the inet_sock->is_icsk field. Both functions assume that the is_icsk field is large enough to hold at least a INET_PROTOSW_ICSK value when it is actually only a single bit. This patch corrects the assignment by doing a boolean comparison whose result will safely fit into a single bit field. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-09 00:29:51 -08:00
David L Stevens	30c4cf577f	[IPV4/IPV6]: Fix inet{,6} device initialization order. It is important that we only assign dev->ip{,6}_ptr only after all portions of the inet{,6} are setup. Otherwise we can receive packets before the multicast spinlocks et al. are initialized. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-04 12:31:14 -08:00
Martin Josefsson	bbdc176a2f	[NETFILTER]: nf_nat: fix MASQUERADE crash on device down Check the return value of nfct_nat() in device_cmp(), we might very well have non NAT conntrack entries as well (Netfilter bugzilla #528). Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-04 12:16:54 -08:00
Patrick McHardy	c9386cfddc	[NETFILTER]: New connection tracking is not EXPERIMENTAL anymore Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-04 12:16:06 -08:00
Patrick McHardy	c68b8b687f	[NETFILTER]: Fix routing of REJECT target generated packets in output chain Packets generated by the REJECT target in the output chain have a local destination address and a foreign source address. Make sure not to use the foreign source address for the output route lookup. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-04 12:15:34 -08:00
Dmitry Mishin	e5b5ef7d2b	[NETFILTER]: compat offsets size change Used by compat code offsets of entries should be 'unsigned int' as entries array size has this dimension. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2007-01-04 12:14:41 -08:00
David S. Miller	5c668704b7	[UDP]: Fix reversed logic in udp_get_port(). When this code was converted to use sk_for_each() the logic for the "best hash chain length" code was reversed, breaking everything. The original code was of the form: size = 0; do { if (++size >= best_size_so_far) goto next; } while ((sk = sk->next) != NULL); best_size_so_far = size; best = result; next:; and this got converted into: sk_for_each(sk2, node, head) if (++size < best_size_so_far) { best_size_so_far = size; best = result; } Which does something very very different from the original. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-22 11:42:26 -08:00
Li Yewang	14fb8a7647	[IPV4]: Fix BUG of ip_rt_send_redirect() Fix the redirect packet of the router if the jiffies wraparound. Signed-off-by: Li Yewang <lyw@nanjing-fnst.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-18 00:26:35 -08:00
Leigh Brown	a9fc00cca8	[TCP]: Trivial fix to message in tcp_v4_inbound_md5_hash The message logged in tcp_v4_inbound_md5_hash when the hash was expected but not found was reversed. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-17 21:59:26 -08:00
Leigh Brown	8228a18dd3	[TCP]: Fix oops caused by tcp_v4_md5_do_del md5sig_info.alloced4 must be set to zero when freeing keys4, otherwise it will not be alloc'd again when another key is added to the same socket by tcp_v4_md5_do_add. Signed-off-by: Leigh Brown <leigh@solinno.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-17 21:59:25 -08:00
David S. Miller	6931ba7cef	[TCP]: Fix oops caused by __tcp_put_md5sig_pool() It should call tcp_free_md5sig_pool() not __tcp_free_md5sig_pool() so that it does proper refcounting. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-13 16:48:26 -08:00
Al Viro	e1b4b9f398	[NETFILTER]: {ip,ip6,arp}_tables: fix exponential worst-case search for loops If we come to node we'd already marked as seen and it's not a part of path (i.e. we don't have a loop right there), we already know that it isn't a part of any loop, so we don't need to revisit it. That speeds the things up if some chain is refered to from several places and kills O(exp(table size)) worst-case behaviour (without sleeping, at that, so if you manage to self-LART that way, you are SOL for a long time)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-13 16:48:23 -08:00
Dmitry Mishin	a96be24679	[NETFILTER]: ip_tables: ipt and ipt_compat checks unification Matches and targets verification is duplicated in normal and compat processing ways. This patch refactors code in order to remove this. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-13 16:48:22 -08:00
Yasuyuki Kozakai	11078c371e	[NETFILTER]: x_tables: add missing try to load conntrack from match/targets CLUSTERIP, CONNMARK, CONNSECMARK, and connbytes need ip_conntrack or layer 3 protocol module of nf_conntrack. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-13 16:48:21 -08:00
Yasuyuki Kozakai	083e69e99e	[NETFILTER]: nf_nat: fix NF_NAT dependency NF_NAT depends on NF_CONNTRACK_IPV4, not NF_CONNTRACK. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-13 16:48:19 -08:00
Peter Zijlstra	47c6bf7760	fix typo in net/ipv4/ip_fragment.c Signed-off-by: Adrian Bunk <bunk@stusta.de>	2006-12-12 19:48:59 +01:00
Simon Horman	37004af3aa	[IPVS]: Make ip_vs_sync.c <= 80col wide. Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-11 14:35:03 -08:00
Simon Horman	89eaeb09ba	[IPVS]: Use msleep_interruptable() instead of ssleep() aka msleep() Dean Manners notices that when an IPVS synchonisation daemons are started the system load slowly climbs up to 1. This seems to be related to the call to ssleep(1) (aka msleep(1000) in the main loop. Replacing this with a call to msleep_interruptable() seems to make the problem go away. Though I'm not sure that it is correct. This is the second edition of this patch, which replaces ssleep() in the main loop for both the master and backup threads, as well as some thread synchronisation code. The latter is just for thorougness as it shouldn't be causing any problems. Signed-Off-By: Simon Horman <horms@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-11 14:35:02 -08:00
Alexey Dobriyan	1f29bcd739	[PATCH] sysctl: remove unused "context" param Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Howells <dhowells@redhat.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-10 09:55:41 -08:00
Stephen Hemminger	3644f0cee7	[NET]: Convert hh_lock to seqlock. The hard header cache is in the main output path, so using seqlock instead of reader/writer lock should reduce overhead. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-08 17:19:20 -08:00
Josef Sipek	6df81ab227	[PATCH] struct path: convert netfilter Signed-off-by: Josef Sipek <jsipek@fsl.cs.sunysb.edu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-08 08:28:48 -08:00
Linus Torvalds	2685b267bc	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits) [NETFILTER]: Fix non-ANSI func. decl. [TG3]: Identify Serdes devices more clearly. [TG3]: Use msleep. [TG3]: Use netif_msg_*. [TG3]: Allow partial speed advertisement. [TG3]: Add TG3_FLG2_IS_NIC flag. [TG3]: Add 5787F device ID. [TG3]: Fix Phy loopback. [WANROUTER]: Kill kmalloc debugging code. [TCP] inet_twdr_hangman: Delete unnecessary memory barrier(). [NET]: Memory barrier cleanups [IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries. audit: disable ipsec auditing when CONFIG_AUDITSYSCALL=n audit: Add auditing to ipsec [IRDA] irlan: Fix compile warning when CONFIG_PROC_FS=n [IrDA]: Incorrect TTP header reservation [IrDA]: PXA FIR code device model conversion [GENETLINK]: Fix misplaced command flags. [NETLIK]: Add a pointer to the Generic Netlink wiki page. [IPV6] RAW: Don't release unlocked sock. ...	2006-12-07 09:05:15 -08:00
Christoph Lameter	e18b890bb0	[PATCH] slab: remove kmem_cache_t Replace all uses of kmem_cache_t with struct kmem_cache. The patch was generated using the following script: #!/bin/sh # # Replace one string by another in all the kernel sources. # set -e for file in `find * -name ".c" -o -name ".h"\|xargs grep -l $1`; do quilt add $file sed -e "1,\$s/$1/$2/g" $file >/tmp/$$ mv /tmp/$$ $file quilt refresh done The script was run like this sh replace kmem_cache_t "struct kmem_cache" Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:25 -08:00
Christoph Lameter	e94b176609	[PATCH] slab: remove SLAB_KERNEL SLAB_KERNEL is an alias of GFP_KERNEL. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
Christoph Lameter	54e6ecb239	[PATCH] slab: remove SLAB_ATOMIC SLAB_ATOMIC is an alias of GFP_ATOMIC Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-12-07 08:39:24 -08:00
David S. Miller	905eee008b	[TCP] inet_twdr_hangman: Delete unnecessary memory barrier(). As per Ralf Baechle's observations, the schedule_work() call should give enough of a memory barrier, so the explicit one here is totally unnecessary. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-07 00:12:30 -08:00
Ralf Baechle	e16aa207cc	[NET]: Memory barrier cleanups I believe all the below memory barriers only matter on SMP so therefore the smp_* variant of the barrier should be used. I'm wondering if the barrier in net/ipv4/inet_timewait_sock.c should be dropped entirely. schedule_work's implementation currently implies a memory barrier and I think sane semantics of schedule_work() should imply a memory barrier, as needed so the caller shouldn't have to worry. It's not quite obvious why the barrier in net/packet/af_packet.c is needed; maybe it should be implied through flush_dcache_page? Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-07 00:11:33 -08:00
David S. Miller	26db167702	[IPSEC]: Fix inetpeer leak in ipv4 xfrm dst entries. We grab a reference to the route's inetpeer entry but forget to release it in xfrm4_dst_destroy(). Bug discovered by Kazunori MIYAZAWA <kazunori@miyazawa.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 23:45:15 -08:00
Dmitry Mishin	f6677f4312	[NETFILTER]: Fix iptables compat hook validation In compat mode, matches and targets valid hooks checks always successful due to not initialized e->comefrom field yet. This patch separates this checks from translation code and moves them after mark_source_chains() call, where these marks are initialized. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by; Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:39:03 -08:00
Dmitry Mishin	74c9c0c17d	[NETFILTER]: Fix {ip,ip6,arp}_tables hook validation Commit `590bdf7fd2` introduced a regression in match/target hook validation. mark_source_chains builds a bitmask for each rule representing the hooks it can be reached from, which is then used by the matches and targets to make sure they are only called from valid hooks. The patch moved the match/target specific validation before the mark_source_chains call, at which point the mask is always zero. This patch returns back to the old order and moves the standard checks to mark_source_chains. This allows to get rid of a special case for standard targets as a nice side-effect. Signed-off-by: Dmitry Mishin <dim@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:39:02 -08:00
Patrick McHardy	1b6651f1bf	[XFRM]: Use output device disable_xfrm for forwarded packets Currently the behaviour of disable_xfrm is inconsistent between locally generated and forwarded packets. For locally generated packets disable_xfrm disables the policy lookup if it is set on the output device, for forwarded traffic however it looks at the input device. This makes it impossible to disable xfrm on all devices but a dummy device and use normal routing to direct traffic to that device. Always use the output device when checking disable_xfrm. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-06 18:38:43 -08:00
David Howells	9db7372445	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/ata/libata-scsi.c include/linux/libata.h Futher merge of Linus's head and compilation fixups. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 17:01:28 +00:00
David Howells	4c1ac1b491	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: drivers/infiniband/core/iwcm.c drivers/net/chelsio/cxgb2.c drivers/net/wireless/bcm43xx/bcm43xx_main.c drivers/net/wireless/prism54/islpci_eth.c drivers/usb/core/hub.h drivers/usb/input/hid-core.c net/core/netpoll.c Fix up merge failures with Linus's head and fix new compilation failures. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-12-05 14:37:56 +00:00
Al Viro	d7fe0f241d	[PATCH] severing skbuff.h -> mm.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:34 -05:00
Al Viro	a1f8e7f7fb	[PATCH] severing skbuff.h -> highmem.h Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2006-12-04 02:00:29 -05:00
Patrick McHardy	13b1833910	[NETFILTER]: nf_conntrack: EXPORT_SYMBOL cleanup - move EXPORT_SYMBOL next to exported symbol - use EXPORT_SYMBOL_GPL since this is what the original code used Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:11:25 -08:00
Patrick McHardy	a3c479772c	[NETFILTER]: Mark old IPv4-only connection tracking scheduled for removal Also remove the references to "new connection tracking" from Kconfig. After some short stabilization period of the new connection tracking helpers/NAT code the old one will be removed. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:11:01 -08:00
Patrick McHardy	807467c22a	[NETFILTER]: nf_nat: add SNMP NAT helper port Add nf_conntrack port of the SNMP NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:10:34 -08:00
Patrick McHardy	a536df35b3	[NETFILTER]: nf_conntrack/nf_nat: add TFTP helper port Add IPv4 and IPv6 capable nf_conntrack port of the TFTP conntrack/NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:10:18 -08:00
Patrick McHardy	9fafcd7b20	[NETFILTER]: nf_conntrack/nf_nat: add SIP helper port Add IPv4 and IPv6 capable nf_conntrack port of the SIP conntrack/NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:09:57 -08:00
Patrick McHardy	f09943fefe	[NETFILTER]: nf_conntrack/nf_nat: add PPTP helper port Add nf_conntrack port of the PPtP conntrack/NAT helper. Since there seems to be no IPv6-capable PPtP implementation the helper only support IPv4. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:09:41 -08:00
Patrick McHardy	869f37d8e4	[NETFILTER]: nf_conntrack/nf_nat: add IRC helper port Add nf_conntrack port of the IRC conntrack/NAT helper. Since DCC doesn't support IPv6 yet, the helper is still IPv4 only. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:09:06 -08:00
Patrick McHardy	f587de0e2f	[NETFILTER]: nf_conntrack/nf_nat: add H.323 helper port Add IPv4 and IPv6 capable nf_conntrack port of the H.323 conntrack/NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:08:46 -08:00
Patrick McHardy	1695890057	[NETFILTER]: nf_conntrack/nf_nat: add amanda helper port Add IPv4 and IPv6 capable nf_conntrack port of the Amanda conntrack/NAT helper. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:08:26 -08:00
Jozsef Kadlecsik	55a733247d	[NETFILTER]: nf_nat: add FTP NAT helper port Add FTP NAT helper. Split out from Jozsef's big nf_nat patch with a few small fixes by myself. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:07:44 -08:00
Jozsef Kadlecsik	5b1158e909	[NETFILTER]: Add NAT support for nf_conntrack Add NAT support for nf_conntrack. Joint work of Jozsef Kadlecsik, Yasuyuki Kozakai, Martin Josefsson and myself. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:07:13 -08:00
Patrick McHardy	d2483ddefd	[NETFILTER]: nf_conntrack: add module aliases to IPv4 conntrack names Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:06:05 -08:00
Patrick McHardy	b321e14425	[NETFILTER]: Kconfig: improve conntrack selection Improve the connection tracking selection (well, the user experience, not really the aesthetics) by offering one option to enable connection tracking and a choice between the implementations. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:05:46 -08:00
Patrick McHardy	bff9a89bca	[NETFILTER]: nf_conntrack: endian annotations Resync with Al Viro's ip_conntrack annotations and fix a missed spot in ip_nat_proto_icmp.c. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:05:08 -08:00
Patrick McHardy	0c4ca1bd86	[NETFILTER]: nf_conntrack: fix NF_CONNTRACK_PROC_COMPAT dependency NF_CONNTRACK_PROC_COMPAT depends on NF_CONNTRACK_IPV4, not NF_CONNTRACK. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 22:04:24 -08:00
Andrew Morton	b6332e6cf9	[TCP]: Fix warnings with TCP_MD5SIG disabled. Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:52 -08:00
Adrian Bunk	f5b99bcddd	[NET]: Possible cleanups. This patch contains the following possible cleanups: - make the following needlessly global functions statis: - ipv4/tcp.c: __tcp_alloc_md5sig_pool() - ipv4/tcp_ipv4.c: tcp_v4_reqsk_md5_lookup() - ipv4/udplite.c: udplite_rcv() - ipv4/udplite.c: udplite_err() - make the following needlessly global structs static: - ipv4/tcp_ipv4.c: tcp_request_sock_ipv4_ops - ipv4/tcp_ipv4.c: tcp_sock_ipv4_specific - ipv6/tcp_ipv6.c: tcp_request_sock_ipv6_ops - net/ipv{4,6}/udplite.c: remove inline's from static functions (gcc should know best when to inline them) Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:51 -08:00
David S. Miller	08dd1a506b	[TCP] MD5SIG: Kill CONFIG_TCP_MD5SIG_DEBUG. It just obfuscates the code and adds limited value. And as Adrian Bunk noticed, it lacked Kconfig help text too, so just kill it. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:47 -08:00
Paul Moore	484b366932	NetLabel: add the ranged tag to the CIPSOv4 protocol Add support for the ranged tag (tag type #5) to the CIPSOv4 protocol. The ranged tag allows for seven, or eight if zero is the lowest category, category ranges to be specified in a CIPSO option. Each range is specified by two unsigned 16 bit fields, each with a maximum value of 65534. The two values specify the start and end of the category range; if the start of the category range is zero then it is omitted. See Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt for more details. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:31:38 -08:00
Paul Moore	654bbc2a2b	NetLabel: add the enumerated tag to the CIPSOv4 protocol Add support for the enumerated tag (tag type #2) to the CIPSOv4 protocol. The enumerated tag allows for 15 categories to be specified in a CIPSO option, where each category is an unsigned 16 bit field with a maximum value of 65534. See Documentation/netlabel/draft-ietf-cipso-ipsecurity-01.txt for more details. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:31:37 -08:00
Paul Moore	0275276035	NetLabel: convert to an extensibile/sparse category bitmap The original NetLabel category bitmap was a straight char bitmap which worked fine for the initial release as it only supported 240 bits due to limitations in the CIPSO restricted bitmap tag (tag type 0x01). This patch converts that straight char bitmap into an extensibile/sparse bitmap in order to lay the foundation for other CIPSO tag types and protocols. This patch also has a nice side effect in that all of the security attributes passed by NetLabel into the LSM are now in a format which is in the host's native byte/bit ordering which makes the LSM specific code much simpler; look at the changes in security/selinux/ss/ebitmap.c as an example. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:31:36 -08:00
Patrick McHardy	76592584be	[NETFILTER]: Fix PROC_FS=n warnings Fix some unused function/variable warnings. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:34 -08:00
Patrick McHardy	65195686ff	[NETFILTER]: remove remaining ASSERT_{READ,WRITE}_LOCK Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:33 -08:00
Patrick McHardy	baf7b1e112	[NETFILTER]: x_tables: add NFLOG target Add new NFLOG target to allow use of nfnetlink_log for both IPv4 and IPv6. Currently we have two (unsupported by userspace) hacks in the LOG and ULOG targets to optionally call to the nflog API. They lack a few features, namely the IPv4 and IPv6 LOG targets can not specify a number of arguments related to nfnetlink_log, while the ULOG target is only available for IPv4. Remove those hacks and add a clean way to use nfnetlink_log. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:31 -08:00
Patrick McHardy	39b46fc6f0	[NETFILTER]: x_tables: add port of hashlimit match for IPv4 and IPv6 Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:31 -08:00
Pablo Neira Ayuso	7b621c1ea6	[NETFILTER]: ctnetlink: rework conntrack fields dumping logic on events \| NEW \| UPDATE \| DESTROY \| ----------------------------------------\| tuples \| Y \| Y \| Y \| status \| Y \| Y \| N \| timeout \| Y \| Y \| N \| protoinfo \| S \| S \| N \| helper \| S \| S \| N \| mark \| S \| S \| N \| counters \| F \| F \| Y \| Leyend: Y: yes N: no S: iif the field is set F: iif overflow This patch also replace IPCT_HELPINFO by IPCT_HELPER since we want to track the helper assignation process, not the changes in the private information held by the helper. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:28 -08:00
Pablo Neira Ayuso	bbb3357d14	[NETFILTER]: ctnetlink: check for status attribute existence on conntrack creation Check that status flags are available in the netlink message received to create a new conntrack. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:27 -08:00
Patrick McHardy	1b683b5512	[NETFILTER]: sip conntrack: better NAT handling The NAT handling of the SIP helper has a few problems: - Request headers are only mangled in the reply direction, From/To headers not at all, which can lead to authentication failures with DNAT in case the authentication domain is the IP address - Contact headers in responses are only mangled for REGISTER responses - Headers may be mangled even though they contain addresses not participating in the connection, like alternative addresses - Packets are droppen when domain names are used where the helper expects IP addresses This patch takes a different approach, instead of fixed rules what field to mangle to what content, it adds symetric mapping of From/To/Via/Contact headers, which allows to deal properly with echoed addresses in responses and foreign addresses not belonging to the connection. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:26 -08:00
Patrick McHardy	77a78dec48	[NETFILTER]: sip conntrack: make header shortcuts optional Not every header has a shortcut, so make them optional instead of searching for the same string twice. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:25 -08:00
Patrick McHardy	40883e8184	[NETFILTER]: sip conntrack: do case insensitive SIP header search SIP headers are generally case-insensitive, only SDP headers are case sensitive. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:24 -08:00
Patrick McHardy	9d5b8baa4e	[NETFILTER]: sip conntrack: minor cleanup - Use enum for header field enumeration - Use numerical value instead of pointer to header info structure to identify headers, unexport ct_sip_hdrs - group SIP and SDP entries in header info structure - remove double forward declaration of ct_sip_get_info Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:23 -08:00
Patrick McHardy	337fbc4166	[NETFILTER]: ip_conntrack: fix NAT helper unload races The NAT helpr hooks are protected by RCU, but all of the conntrack helpers test and use the global pointers instead of copying them first using rcu_dereference() Also replace synchronize_net() by synchronize_rcu() for clarity since sychronizing only with packet receive processing is insufficient to prevent races. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:22 -08:00
Yasuyuki Kozakai	468ec44bd5	[NETFILTER]: conntrack: add '_get' to {ip, nf}_conntrack_expect_find We usually uses 'xxx_find_get' for function which increments reference count. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:21 -08:00
Patrick McHardy	e4bd8bce3e	[NETFILTER]: nf_conntrack: /proc compatibility with old connection tracking This patch adds /proc/net/ip_conntrack, /proc/net/ip_conntrack_expect and /proc/net/stat/ip_conntrack files to keep old programs using them working. The /proc/net/ip_conntrack and /proc/net/ip_conntrack_expect files show only IPv4 entries, the /proc/net/stat/ip_conntrack shows global statistics. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:20 -08:00
Patrick McHardy	a999e68376	[NETFILTER]: nf_conntrack: sysctl compatibility with old connection tracking This patch adds an option to keep the connection tracking sysctls visible under their old names. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:19 -08:00
Patrick McHardy	933a41e7e1	[NETFILTER]: nf_conntrack: move conntrack protocol sysctls to individual modules Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:18 -08:00
Patrick McHardy	f8eb24a89a	[NETFILTER]: nf_conntrack: move extern declaration to header files Using extern in a C file is a bad idea because the compiler can't catch type errors. Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:16 -08:00
Martin Josefsson	824621eddd	[NETFILTER]: nf_conntrack: remove unused struct list_head from protocols Remove unused struct list_head from struct nf_conntrack_l3proto and nf_conntrack_l4proto as all protocols are kept in arrays, not linked lists. Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:13 -08:00
Martin Josefsson	605dcad6c8	[NETFILTER]: nf_conntrack: rename struct nf_conntrack_protocol Rename 'struct nf_conntrack_protocol' to 'struct nf_conntrack_l4proto' in order to help distinguish it from 'struct nf_conntrack_l3proto'. It gets rather confusing with 'nf_conntrack_protocol'. Signed-off-by: Martin Josefsson <gandalf@wlug.westbo.se> Signed-off-by: Patrick McHardy <kaber@trash.net>	2006-12-02 21:31:09 -08:00
David S. Miller	d2e4bdc870	[TCP] Vegas: Increase default alpha to 2 and beta to 4. This helps Vegas cope better with delayed ACKs, see analysis at: http://www.cs.caltech.edu/%7Eweixl/technical/ns2linux/known_linux/index.html#vegas Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:31:03 -08:00
Gerrit Renker	4c0a6cb0db	[UDP(-Lite)]: consolidate v4 and v6 get\|setsockopt code This patch consolidates set/getsockopt code between UDP(-Lite) v4 and 6. The justification is that UDP(-Lite) is a transport-layer protocol and therefore the socket option code (at least in theory) should be AF-independent. Furthermore, there is the following code reduplication: * do_udp{,v6}_getsockopt is 100% identical between v4 and v6 * do_udp{,v6}_setsockopt is identical up to the following differerence --v4 in contrast to v4 additionally allows the experimental encapsulation types UDP_ENCAP_ESPINUDP and UDP_ENCAP_ESPINUDP_NON_IKE --the remainder is identical between v4 and v6 I believe that this difference is of little relevance. The advantages in not duplicating twice almost completely identical code. The patch further simplifies the interface of udp{,v6}_push_pending_frames, since for the second argument (struct udp_sock *up) it always holds that up = udp_sk(sk); where sk is the first function argument. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:45 -08:00
Thomas Graf	e3703b3de1	[RTNETLINK]: Add rtnl_put_cacheinfo() to unify some code IPv4, IPv6, and DECNet all use struct rta_cacheinfo in a similiar way, therefore rtnl_put_cacheinfo() is added to reuse code. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:44 -08:00
Thomas Graf	4e9b826935	[NETLINK]: Remove unused dst_pid field in netlink_skb_parms The destination PID is passed directly to netlink_unicast() respectively netlink_multicast(). Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:30:43 -08:00
Arnaldo Carvalho de Melo	8b2ed4bba4	[IPVS]: Use kmemdup where appropriate Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:30:16 -08:00
Al Viro	66625984ca	[CIPSO]: Missing annotation in cipso_ipv4 update. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:21 -08:00
Al Viro	ff1dcadb1b	[NET]: Split skb->csum ... into anonymous union of __wsum and __u32 (csum and csum_offset resp.) Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:18 -08:00
Al Viro	5b14027bf2	[NETFILTER]: ip_nat_snmp_basic annotations. ... and switch the damn checksum update to something saner Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:17 -08:00
Al Viro	8e5200f540	[NET]: Fix assorted misannotations (from md5 and udplite merges). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:27:16 -08:00
Paul Moore	9bb5fd2b05	NetLabel: use cipso_v4_doi_search() for local CIPSOv4 functions The cipso_v4_doi_search() function behaves the same as cipso_v4_doi_getdef() but is a local, static function so use it whenever possibile in the CIPSOv4 code base. Signed-of-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:24:12 -08:00
Paul Moore	9fade4bf8e	NetLabel: return the correct error for translated CIPSOv4 tags The CIPSOv4 translated tag #1 mapping does not always return the correct error code if the desired mapping does not exist; instead of returning -EPERM it returns -ENOSPC indicating that the buffer is not large enough to hold the translated value. This was caused by failing to check a specific error condition. This patch fixes this so that unknown mappings return -EPERM which is consistent with the rest of the related CIPSOv4 code. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:24:11 -08:00
Paul Moore	91b1ed0afd	NetLabel: fixup the handling of CIPSOv4 tags to allow for multiple tag types While the original CIPSOv4 code had provisions for multiple tag types the implementation was not as great as it could be, pushing a lot of non-tag specific processing into the tag specific code blocks. This patch fixes that issue making it easier to support multiple tag types in the future. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:24:10 -08:00
Paul Moore	6ce61a7c26	NetLabel: add tag verification when adding new CIPSOv4 DOI definitions Currently the CIPSOv4 engine does not do any sort of checking when a new DOI definition is added. The tags are still verified but only as a side effect of normal NetLabel operation (packet processing, socket labeling, etc.) which would cause application errors due to the faulty configuration. This patch adds tag checking when new DOI definition are added allowing us to catch these configuration problems when they happen. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:24:09 -08:00
Paul Moore	05e00cbf50	NetLabel: check for a CIPSOv4 option before we do call into the CIPSOv4 layer Right now the NetLabel code always jumps into the CIPSOv4 layer to determine if a CIPSO IP option is present. However, we can do this check directly in the NetLabel code by making use of the CIPSO_V4_OPTEXIST() macro which should save us a function call in the common case of not having a CIPSOv4 option present. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:24:08 -08:00
Paul Moore	701a90bad9	NetLabel: make netlbl_lsm_secattr struct easier/quicker to understand The existing netlbl_lsm_secattr struct required the LSM to check all of the fields to determine if any security attributes were present resulting in a lot of work in the common case of no attributes. This patch adds a 'flags' field which is used to indicate which attributes are present in the structure; this should allow the LSM to do a quick comparison to determine if the structure holds any security attributes. Example: if (netlbl_lsm_secattr->flags) /* security attributes present / else / NO security attributes present */ Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:24:07 -08:00
Arnaldo Carvalho de Melo	352d48008b	[TCP]: Tidy up skb_entail Heck, it even saves us some few bytes: [acme@newtoy net-2.6.20]$ codiff -f /tmp/tcp.o.before ../OUTPUT/qemu/net-2.6.20/net/ipv4/tcp.o /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp.c: tcp_sendpage \| -7 tcp_sendmsg \| -5 2 functions changed, 12 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:24:03 -08:00
Arnaldo Carvalho de Melo	c67862403e	[TCP] minisocks: Use kmemdup and LIMIT_NETDEBUG Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/tcp_minisocks.o.before /tmp/tcp_minisocks.o.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp_minisocks.c: tcp_check_req \| -44 1 function changed, 44 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:57 -08:00
Arnaldo Carvalho de Melo	42e5ea466c	[IPV4]: Use kmemdup in net/ipv4/devinet.c Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/devinet.o.before /tmp/devinet.o.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/devinet.c: devinet_sysctl_register \| -38 1 function changed, 38 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:56 -08:00
Arnaldo Carvalho de Melo	fac5d73151	[NETLABEL]: Use kmemdup in cipso_ipv4.c Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/cipso_ipv4.o.before /tmp/cipso_ipv4.o.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/cipso_ipv4.c: cipso_v4_cache_add \| -46 1 function changed, 46 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:55 -08:00
Arnaldo Carvalho de Melo	f6685938f9	[TCP_IPV4]: Use kmemdup where appropriate Also use a variable to avoid the longish tp->md5sig_info-> use in tcp_v4_md5_do_add. Code diff stats: [acme@newtoy net-2.6.20]$ codiff /tmp/tcp_ipv4.o.before /tmp/tcp_ipv4.o.after /pub/scm/linux/kernel/git/acme/net-2.6.20/net/ipv4/tcp_ipv4.c: tcp_v4_md5_do_add \| -62 tcp_v4_syn_recv_sock \| -32 tcp_v4_parse_md5_keys \| -86 3 functions changed, 180 bytes removed [acme@newtoy net-2.6.20]$ Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:54 -08:00
Arnaldo Carvalho de Melo	7174259e6c	[TCP_IPV4]: CodingStyle cleanups, no code change Mostly related to CONFIG_TCP_MD5SIG recent merge. Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:23:53 -08:00
Gerrit Renker	078250d68d	[NET/IPv4]: Make udp_push_pending_frames static udp_push_pending_frames is only referenced within net/ipv4/udp.c and hence can remain static. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:47 -08:00
Al Viro	43bc0ca7ea	[NET]: netfilter checksum annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:42 -08:00
Al Viro	f9214b2627	[NET]: ipvs checksum annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:41 -08:00
Al Viro	f6ab028804	[NET]: Make mangling a checksum (0 -> 0xffff on the wire) explicit. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:39 -08:00
Al Viro	b51655b958	[NET]: Annotate __skb_checksum_complete() and friends. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:38 -08:00
Al Viro	b1550f2212	[NET]: Annotate ip_vs_checksum_complete() and callers. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:37 -08:00
Al Viro	5f92a7388a	[NET]: Annotate callers of the reset of checksum.h stuff. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:34 -08:00
Al Viro	5084205faf	[NET]: Annotate callers of csum_partial_copy_...() and csum_and_copy...() in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:33 -08:00
Al Viro	44bb93633f	[NET]: Annotate csum_partial() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:32 -08:00
Al Viro	6b11687ef0	[NET]: Annotate csum_tcpudp_magic() callers in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:29 -08:00
Al Viro	d3bc23e7ee	[NET]: Annotate callers of csum_fold() in net/* Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:23:27 -08:00
Al Viro	75e7ce66ef	[IPVS]: Annotate ..._app_hashkey(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:58 -08:00
Al Viro	42d224aa17	[NETFILTER]: More trivial annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:54 -08:00
Al Viro	714e85be35	[IPV6]: Assorted trivial endianness annotations. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:50 -08:00
Gerrit Renker	ba4e58eca8	[NET]: Supporting UDP-Lite (RFC 3828) in Linux This is a revision of the previously submitted patch, which alters the way files are organized and compiled in the following manner: * UDP and UDP-Lite now use separate object files * source file dependencies resolved via header files net/ipv{4,6}/udp_impl.h * order of inclusion files in udp.c/udplite.c adapted accordingly [NET/IPv4]: Support for the UDP-Lite protocol (RFC 3828) This patch adds support for UDP-Lite to the IPv4 stack, provided as an extension to the existing UDPv4 code: * generic routines are all located in net/ipv4/udp.c * UDP-Lite specific routines are in net/ipv4/udplite.c * MIB/statistics support in /proc/net/snmp and /proc/net/udplite * shared API with extensions for partial checksum coverage [NET/IPv6]: Extension for UDP-Lite over IPv6 It extends the existing UDPv6 code base with support for UDP-Lite in the same manner as per UDPv4. In particular, * UDPv6 generic and shared code is in net/ipv6/udp.c * UDP-Litev6 specific extensions are in net/ipv6/udplite.c * MIB/statistics support in /proc/net/snmp6 and /proc/net/udplite6 * support for IPV6_ADDRFORM * aligned the coding style of protocol initialisation with af_inet6.c * made the error handling in udpv6_queue_rcv_skb consistent; to return `-1' on error on all error cases * consolidation of shared code [NET]: UDP-Lite Documentation and basic XFRM/Netfilter support The UDP-Lite patch further provides * API documentation for UDP-Lite * basic xfrm support * basic netfilter support for IPv4 and IPv6 (LOG target) Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:46 -08:00
David S. Miller	a928630a2f	[TCP]: Fix some warning when MD5 is disabled. Just some mis-placed ifdefs: net/ipv4/tcp_minisocks.c: In function ‘tcp_twsk_destructor’: net/ipv4/tcp_minisocks.c:364: warning: unused variable ‘twsk’ net/ipv6/tcp_ipv6.c:1846: warning: ‘tcp_sock_ipv6_specific’ defined but not used net/ipv6/tcp_ipv6.c:1877: warning: ‘tcp_sock_ipv6_mapped_specific’ defined but not used Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:43 -08:00
YOSHIFUJI Hideaki	cfb6eeb4c8	[TCP]: MD5 Signature Option (RFC2385) support. Based on implementation by Rick Payne. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:39 -08:00
Gerrit Renker	b9df3cb8cf	[TCP/DCCP]: Introduce net_xmit_eval Throughout the TCP/DCCP (and tunnelling) code, it often happens that the return code of a transmit function needs to be tested against NET_XMIT_CN which is a value that does not indicate a strict error condition. This patch uses a macro for these recurring situations which is consistent with the already existing macro net_xmit_errno, saving on duplicated code. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2006-12-02 21:22:27 -08:00
David S. Miller	2404043a66	[TCP] htcp: Better packing of struct htcp. Based upon a patch by Joe Perches. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:14 -08:00
Thomas Graf	339bf98ffc	[NETLINK]: Do precise netlink message allocations where possible Account for the netlink message header size directly in nlmsg_new() instead of relying on the caller calculate it correctly. Replaces error handling of message construction functions when constructing notifications with bug traps since a failure implies a bug in calculating the size of the skb. Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:11 -08:00
Gerrit Renker	a94f723d59	[TCP]: Remove dead code in init_sequence This removes two redundancies: 1) The test (skb->protocol == htons(ETH_P_IPV6) in tcp_v6_init_sequence() is always true, due to * tcp_v6_conn_request() is the only function calling this one * tcp_v6_conn_request() redirects all skb's with ETH_P_IP protocol to tcp_v4_conn_request() [ cf. top of tcp_v6_conn_request()] 2) The first argument, `struct sock *sk' of tcp_v{4,6}_init_sequence() is never used. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:22:10 -08:00
David S. Miller	931731123a	[TCP]: Don't set SKB owner in tcp_transmit_skb(). The data itself is already charged to the SKB, doing the skb_set_owner_w() just generates a lot of noise and extra atomics we don't really need. Lmbench improvements on lat_tcp are minimal: before: TCP latency using localhost: 23.2701 microseconds TCP latency using localhost: 23.1994 microseconds TCP latency using localhost: 23.2257 microseconds after: TCP latency using localhost: 22.8380 microseconds TCP latency using localhost: 22.9465 microseconds TCP latency using localhost: 22.8462 microseconds Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:52 -08:00
Stephen Hemminger	35bfbc9407	[TCP]: Allow autoloading of congestion control via setsockopt. If user has permision to load modules, then autoload then attempt autoload of TCP congestion module. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:50 -08:00
Stephen Hemminger	ce7bc3bf15	[TCP]: Restrict congestion control choices. Allow normal users to only choose among a restricted set of congestion control choices. The default is reno and what ever has been configured as default. But the policy can be changed by administrator at any time. For example, to allow any choice: cp /proc/sys/net/ipv4/tcp_available_congestion_control \ /proc/sys/net/ipv4/tcp_allowed_congestion_control Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:49 -08:00
Stephen Hemminger	3ff825b28d	[TCP]: Add tcp_available_congestion_control sysctl. Create /proc/sys/net/ipv4/tcp_available_congestion_control that reflects currently available TCP choices. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:48 -08:00
Eric Dumazet	72a3effaf6	[NET]: Size listen hash tables using backlog hint We currently allocate a fixed size (TCP_SYNQ_HSIZE=512) slots hash table for each LISTEN socket, regardless of various parameters (listen backlog for example) On x86_64, this means order-1 allocations (might fail), even for 'small' sockets, expecting few connections. On the contrary, a huge server wanting a backlog of 50000 is slowed down a bit because of this fixed limit. This patch makes the sizing of listen hash table a dynamic parameter, depending of : - net.core.somaxconn tunable (default is 128) - net.ipv4.tcp_max_syn_backlog tunable (default : 256, 1024 or 128) - backlog value given by user application (2nd parameter of listen()) For large allocations (bigger than PAGE_SIZE), we use vmalloc() instead of kmalloc(). We still limit memory allocation with the two existing tunables (somaxconn & tcp_max_syn_backlog). So for standard setups, this patch actually reduce RAM usage. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:44 -08:00
Thomas Graf	1f6c9557e8	[NET] rules: Share common attribute validation policy Move the attribute policy for the non-specific attributes into net/fib_rules.h and include it in the respective protocols. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:41 -08:00
Thomas Graf	b8964ed9fa	[NET] rules: Protocol independant mark selector Move mark selector currently implemented per protocol into the protocol independant part. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:41 -08:00
Thomas Graf	5f300893fd	[IPV4] nl_fib_lookup: Rename fl_fwmark to fl_mark For the sake of consistency. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:40 -08:00
Thomas Graf	47dcf0cb10	[NET]: Rethink mark field in struct flowi Now that all protocols have been made aware of the mark field it can be moved out of the union thus simplyfing its usage. The config options in the IPv4/IPv6/DECnet subsystems to enable respectively disable mark based routing only obfuscate the code with ifdefs, the cost for the additional comparison in the flow key is insignificant, and most distributions have all these options enabled by default anyway. Therefore it makes sense to remove the config options and enable mark based routing by default. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:39 -08:00
Thomas Graf	82e91ffef6	[NET]: Turn nfmark into generic mark nfmark is being used in various subsystems and has become the defacto mark field for all kinds of packets. Therefore it makes sense to rename it to `mark' and remove the dependency on CONFIG_NETFILTER. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:38 -08:00
Venkat Yekkirala	6b877699c6	SELinux: Return correct context for SO_PEERSEC Fix SO_PEERSEC for tcp sockets to return the security context of the peer (as represented by the SA from the peer) as opposed to the SA used by the local/source socket. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-12-02 21:21:33 -08:00
Al Viro	d5a0a1e310	[IPV4]: encapsulation annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:17 -08:00
Al Viro	8c689a6eae	[XFRM]: misc annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:11 -08:00
Al Viro	5a874db4d9	[NET]: ipconfig and nfsroot annotations Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-12-02 21:21:09 -08:00
Patrick McHardy	af443b6d90	[NETFILTER]: ipt_REJECT: fix memory corruption On devices with hard_header_len > LL_MAX_HEADER ip_route_me_harder() reallocates the skb, leading to memory corruption when using the stale tcph pointer to update the checksum. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-28 20:59:38 -08:00
Yasuyuki Kozakai	2e47c264a2	[NETFILTER]: conntrack: fix refcount leak when finding expectation All users of __{ip,nf}_conntrack_expect_find() don't expect that it increments the reference count of expectation. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-28 20:59:37 -08:00
Patrick McHardy	c537b75a3b	[NETFILTER]: ctnetlink: fix reference count leak When NFA_NEST exceeds the skb size the protocol reference is leaked. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-28 20:59:36 -08:00
Akinobu Mita	ac16ca6412	[NET]: Fix kfifo_alloc() error check. The return value of kfifo_alloc() should be checked by IS_ERR(). Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-25 15:16:49 -08:00
Olaf Kirch	753eab76a3	[UDP]: Make udp_encap_rcv use pskb_may_pull Make udp_encap_rcv use pskb_may_pull IPsec with NAT-T breaks on some notebooks using the latest e1000 chipset, when header split is enabled. When receiving sufficiently large packets, the driver puts everything up to and including the UDP header into the header portion of the skb, and the rest goes into the paged part. udp_encap_rcv forgets to use pskb_may_pull, and fails to decapsulate it. Instead, it passes it up it to the IKE daemon. Signed-off-by: Olaf Kirch <okir@suse.de> Signed-off-by: Jean Delvare <jdelvare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-25 15:16:48 -08:00
Faidon Liambotis	38f7efd52c	[NETFILTER]: H.323 conntrack: fix crash with CONFIG_IP_NF_CT_ACCT H.323 connection tracking code calls ip_ct_refresh_acct() when processing RCFs and URQs but passes NULL as the skb. When CONFIG_IP_NF_CT_ACCT is enabled, the connection tracking core tries to derefence the skb, which results in an obvious panic. A similar fix was applied on the SIP connection tracking code some time ago. Signed-off-by: Faidon Liambotis <paravoid@debian.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-25 15:16:47 -08:00
David Howells	c4028958b6	WorkStruct: make allyesconfig Fix up for make allyesconfig. Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:57:56 +00:00
David Howells	65f27f3844	WorkStruct: Pass the work_struct pointer instead of context data Pass the work_struct pointer to the work function rather than context data. The work function can use container_of() to work out the data. For the cases where the container of the work_struct may go away the moment the pending bit is cleared, it is made possible to defer the release of the structure by deferring the clearing of the pending bit. To make this work, an extra flag is introduced into the management side of the work_struct. This governs auto-release of the structure upon execution. Ordinarily, the work queue executor would release the work_struct for further scheduling or deallocation by clearing the pending bit prior to jumping to the work function. This means that, unless the driver makes some guarantee itself that the work_struct won't go away, the work function may not access anything else in the work_struct or its container lest they be deallocated.. This is a problem if the auxiliary data is taken away (as done by the last patch). However, if the pending bit is not cleared before jumping to the work function, then the work function may access the work_struct and its container with no problems. But then the work function must itself release the work_struct by calling work_release(). In most cases, automatic release is fine, so this is the default. Special initiators exist for the non-auto-release case (ending in _NAR). Signed-Off-By: David Howells <dhowells@redhat.com>	2006-11-22 14:55:48 +00:00
John Heffner	52bf376c63	[TCP]: Fix up sysctl_tcp_mem initialization. Fix up tcp_mem initial settings to take into account the size of the hash entries (different on SMP and non-SMP systems). Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-15 21:18:51 -08:00
Patrick McHardy	d8a585d78e	[NETFILTER]: Use pskb_trim in {ip,ip6,nfnetlink}_queue Based on patch by James D. Nurmi: I've got some code very dependant on nfnetlink_queue, and turned up a large number of warns coming from skb_trim. While it's quite possibly my code, having not seen it on older kernels made me a bit suspect. Anyhow, based on some googling I turned up this thread: http://lkml.org/lkml/2006/8/13/56 And believe the issue to be related, so attached is a small patch to the kernel -- not sure if this is completely correct, but for anyone else hitting the WARN_ON(1) in skbuff.h, it might be helpful.. Signed-off-by: James D. Nurmi <jdnurmi@gmail.com> Ported to ip6_queue and nfnetlink_queue and added return value checks. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-15 21:18:48 -08:00
Julian Anastasov	bb831eb202	[IPVS]: More endianness fixed. - make sure port in FTP data is in network order (in fact it was looking buggy for big endian boxes before Viro's changes) - htonl -> htons for port Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-10 14:57:37 -08:00
John Heffner	9e950efa20	[TCP]: Don't use highmem in tcp hash size calculation. This patch removes consideration of high memory when determining TCP hash table sizes. Taking into account high memory results in tcp_mem values that are too large. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-07 15:10:11 -08:00
Stephen Hemminger	b1736a7140	[TCP]: Set default congestion control when no sysctl. The setting of the default congestion control was buried in the sysctl code so it would not be done properly if SYSCTL was not enabled. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-11-01 15:42:34 -08:00
Paul Moore	f8687afefc	[NetLabel]: protect the CIPSOv4 socket option from setsockopt() This patch makes two changes to protect applications from either removing or tampering with the CIPSOv4 IP option on a socket. The first is the requirement that applications have the CAP_NET_RAW capability to set an IPOPT_CIPSO option on a socket; this prevents untrusted applications from setting their own CIPSOv4 security attributes on the packets they send. The second change is to SELinux and it prevents applications from setting any IPv4 options when there is an IPOPT_CIPSO option already present on the socket; this prevents applications from removing CIPSOv4 security attributes from the packets they send. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:49 -08:00
Dmitry Mishin	920b868ae1	[NETFILTER]: ip_tables: compat code module refcounting fix This patch fixes bug in iptables modules refcounting on compat error way. As we are getting modules in check_compat_entry_size_and_hooks(), in case of later error, we should put them all in translate_compat_table(), not in the compat_copy_entry_from_user() or compat_copy_match_from_user(), as it is now. Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-by: Vasily Averin <vvs@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:47 -08:00
Vasily Averin	ef4512e766	[NETFILTER]: ip_tables: compat error way cleanup This patch adds forgotten compat_flush_offset() call to error way of translate_compat_table(). May lead to table corruption on the next compat_do_replace(). Signed-off-by: Vasily Averin <vvs@openvz.org> Acked-by: Dmitry Mishin <dim@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:45 -08:00
Dmitry Mishin	590bdf7fd2	[NETFILTER]: Missed and reordered checks in {arp,ip,ip6}_tables There is a number of issues in parsing user-provided table in translate_table(). Malicious user with CAP_NET_ADMIN may crash system by passing special-crafted table to the _tables. The first issue is that mark_source_chains() function is called before entry content checks. In case of standard target, mark_source_chains() function uses t->verdict field in order to determine new position. But the check, that this field leads no further, than the table end, is in check_entry(), which is called later, than mark_source_chains(). The second issue, that there is no check that target_offset points inside entry. If so, _ITERATE_MATCH macro will follow further, than the entry ends. As a result, we'll have oops or memory disclosure. And the third issue, that there is no check that the target is completely inside entry. Results are the same, as in previous issue. Signed-off-by: Dmitry Mishin <dim@openvz.org> Acked-by: Kirill Korotaev <dev@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:44 -08:00
Heiko Carstens	a27b58fed9	[NET]: fix uaccess handling Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-30 15:24:41 -08:00
Gavin McCullagh	2a272f9861	[TCP] H-TCP: fix integer overflow When using H-TCP with a single flow on a 500Mbit connection (or less actually), alpha can exceed 65000, so alpha needs to be a u32. Signed-off-by: Gavin McCullagh <gavin.mccullagh@nuim.ie> Signed-off-by: Doug Leith <doug.leith@nuim.ie> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-25 23:05:52 -07:00
Stephen Hemminger	22119240b1	[TCP] cubic: scaling error Doug Leith observed a discrepancy between the version of CUBIC described in the papers and the version in 2.6.18. A math error related to scaling causes Cubic to grow too slowly. Patch is from "Sangtae Ha" <sha2@ncsu.edu>. I validated that it does fix the problems. See the following to show behavior over 500ms 100 Mbit link. Sender (2.6.19-rc3) --- Bridge (2.6.18-rt7) ------- Receiver (2.6.19-rc3) 1G [netem] 100M http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-orig.png http://developer.osdl.org/shemminger/tcp/2.6.19-rc3/cubic-fix.png Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-25 23:04:12 -07:00
Al Viro	82571026b9	[IPV4] ipconfig: fix RARP ic_servaddr breakage memcpy 4 bytes to address of auto unsigned long variable followed by comparison with u32 is a bloody bad idea. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-24 15:18:36 -07:00
Eric Dumazet	06ca719fad	[TCP]: One NET_INC_STATS() could be NET_INC_STATS_BH in tcp_v4_err() I believe this NET_INC_STATS() call can be replaced by NET_INC_STATS_BH(), a little bit cheaper. Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-20 00:22:25 -07:00
Björn Steinbrink	82fac0542e	[NETFILTER]: Missing check for CAP_NET_ADMIN in iptables compat layer The 32bit compatibility layer has no CAP_NET_ADMIN check in compat_do_ipt_get_ctl, which for example allows to list the current iptables rules even without having that capability (the non-compat version requires it). Other capabilities might be required to exploit the bug (eg. CAP_NET_RAW to get the nfnetlink socket?), so a plain user can't exploit it, but a setup actually using the posix capability system might very well hit such a constellation of granted capabilities. Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-20 00:21:10 -07:00
John Heffner	ae8064ac32	[TCP]: Bound TSO defer time This patch limits the amount of time you will defer sending a TSO segment to less than two clock ticks, or the time between two acks, whichever is longer. On slow links, deferring causes significant bursts. See attached plots, which show RTT through a 1 Mbps link with a 100 ms RTT and ~100 ms queue for (a) non-TSO, (b) currnet TSO, and (c) patched TSO. This burstiness causes significant jitter, tends to overflow queues early (bad for short queues), and makes delay-based congestion control more difficult. Deferring by a couple clock ticks I believe will have a relatively small impact on performance. Signed-off-by: John Heffner <jheffner@psc.edu> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 20:36:48 -07:00
Thomas Graf	b52f070c9c	[IPv4] fib: Remove unused fib_config members Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-18 20:26:36 -07:00
Eric Dumazet	4663afe2c8	[NET]: reduce sizeof(struct inet_peer), cleanup, change in peer_check_expire() 1) shrink struct inet_peer on 64 bits platforms.	2006-10-15 23:14:17 -07:00
Paul Moore	ea614d7f4f	NetLabel: the CIPSOv4 passthrough mapping does not pass categories correctly The CIPSO passthrough mapping had a problem when sending categories which would cause no or incorrect categories to be sent on the wire with a packet. This patch fixes the problem which was a simple off-by-one bug. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-10-15 23:14:16 -07:00
Paul Moore	044a68ed8a	NetLabel: only deref the CIPSOv4 standard map fields when using standard mapping Fix several places in the CIPSO code where it was dereferencing fields which did not have valid pointers by moving those pointer dereferences into code blocks where the pointers are valid. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-10-15 23:14:14 -07:00
Pablo Neira Ayuso	9ea8cfd6aa	[NETFILTER]: ctnetlink: Remove debugging messages Remove (compilation-breaking) debugging messages introduced at early development stage. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-15 23:14:11 -07:00
Patrick McHardy	a9f54596fa	[NETFILTER]: ipt_ECN/ipt_TOS: fix incorrect checksum update Even though the tos field is only a single byte large, the values need to be converted to net-endian for the checkum update so they are in the corrent byte position. Also fix incorrect endian annotations. Reported by Stephane Chazelas <Stephane_Chazelas@yahoo.fr> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-15 23:14:08 -07:00
Patrick McHardy	f603b6ec50	[NETFILTER]: arp_tables: missing unregistration on module unload Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-15 23:14:07 -07:00
David S. Miller	8238b218ec	[NET]: Do not memcmp() over pad bytes of struct flowi. They are not necessarily initialized to zero by the compiler, for example when using run-time initializers of automatic on-stack variables. Noticed by Eric Dumazet and Patrick McHardy. Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-12 00:49:15 -07:00
YOSHIFUJI Hideaki	9469c7b4aa	[NET]: Use typesafe inet_twsk() inline function instead of cast. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:58 -07:00
YOSHIFUJI Hideaki	496c98dff8	[NET]: Use hton{l,s}() for non-initializers. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:56 -07:00
YOSHIFUJI Hideaki	4244f8a9f8	[TCP]: Use TCPOLEN_TSTAMP_ALIGNED macro instead of magic number. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2006-10-11 23:59:54 -07:00
Venkat Yekkirala	5b368e61c2	IPsec: correct semantics for SELinux policy matching Currently when an IPSec policy rule doesn't specify a security context, it is assumed to be "unlabeled" by SELinux, and so the IPSec policy rule fails to match to a flow that it would otherwise match to, unless one has explicitly added an SELinux policy rule allowing the flow to "polmatch" to the "unlabeled" IPSec policy rules. In the absence of such an explicitly added SELinux policy rule, the IPSec policy rule fails to match and so the packet(s) flow in clear text without the otherwise applicable xfrm(s) applied. The above SELinux behavior violates the SELinux security notion of "deny by default" which should actually translate to "encrypt by default" in the above case. This was first reported by Evgeniy Polyakov and the way James Morris was seeing the problem was when connecting via IPsec to a confined service on an SELinux box (vsftpd), which did not have the appropriate SELinux policy permissions to send packets via IPsec. With this patch applied, SELinux "polmatching" of flows Vs. IPSec policy rules will only come into play when there's a explicit context specified for the IPSec policy rule (which also means there's corresponding SELinux policy allowing appropriate domains/flows to polmatch to this context). Secondly, when a security module is loaded (in this case, SELinux), the security_xfrm_policy_lookup() hook can return errors other than access denied, such as -EINVAL. We were not handling that correctly, and in fact inverting the return logic and propagating a false "ok" back up to xfrm_lookup(), which then allowed packets to pass as if they were not associated with an xfrm policy. The solution for this is to first ensure that errno values are correctly propagated all the way back up through the various call chains from security_xfrm_policy_lookup(), and handled correctly. Then, flow_cache_lookup() is modified, so that if the policy resolver fails (typically a permission denied via the security module), the flow cache entry is killed rather than having a null policy assigned (which indicates that the packet can pass freely). This also forces any future lookups for the same flow to consult the security module (e.g. SELinux) for current security policy (rather than, say, caching the error on the flow cache entry). This patch: Fix the selinux side of things. This makes sure SELinux polmatching of flow contexts to IPSec policy rules comes into play only when an explicit context is associated with the IPSec policy rule. Also, this no longer defaults the context of a socket policy to the context of the socket since the "no explicit context" case is now handled properly. Signed-off-by: Venkat Yekkirala <vyekkirala@TrustedCS.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-10-11 23:59:37 -07:00
paul.moore@hp.com	ffb733c650	NetLabel: fix a cache race condition Testing revealed a problem with the NetLabel cache where a cached entry could be freed while in use by the LSM layer causing an oops and other problems. This patch fixes that problem by introducing a reference counter to the cache entry so that it is only freed when it is no longer in use. Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: James Morris <jmorris@namei.org>	2006-10-11 23:59:29 -07:00
Al Viro	5e7ddac75d	[PATCH] ptrdiff_t is %t, not %z Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2006-10-10 15:37:23 -07:00

... 3 4 5 6 7 ...

1439 commits