linux

Commit Graph

Author	SHA1	Message	Date
Patrick McHardy	a795756333	[NETFILTER]: Mark ctnetlink as EXPERIMENTAL Should have been marked EXPERIMENTAL from the beginning, as the current bunch of fixes show. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:36:25 -08:00
Patrick McHardy	0be7fa92ca	[NETFILTER]: Fix CTA_PROTO_NUM attribute size in ctnetlink CTA_PROTO_NUM is a u_int8_t. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:34:51 -08:00
Patrick McHardy	afe5c6bb03	[NETFILTER]: Fix ip_conntrack_flush abuse in ctnetlink ip_conntrack_flush() used to be part of ip_conntrack_cleanup(), which needs to drop _all_ references on module unload. Table flushed using ctnetlink just needs to clean the table and doesn't need to flush the event cache or wait for any references attached to skbs. Move everything but pure table flushing back to ip_conntrack_cleanup(). Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:33:50 -08:00
Pablo Neira Ayuso	8d1ca69984	[NETFILTER]: Fix incorrect argument to ip_nat_initialized() in ctnetlink ip_nat_initialized() takes enum ip_nat_manip_type as its second argument, not a hook number. Noticed and initial patch by Marcus Sundberg <marcus@ingate.com>. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-05 13:32:14 -08:00
Herbert Xu	86c8f9d158	[IPV4] Fix EPROTONOSUPPORT error in inet_create There is a coding error in inet_create that causes it to always return ESOCKTNOSUPPORT. It should return EPROTONOSUPPORT when there are protocols registered for a given socket type but none of them match the requested protocol. This is based on a patch by Jayachandran C. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-02 20:43:26 -08:00
David Stevens	24c6927505	[IGMP]: workaround for IGMP v1/v2 bug From: David Stevens <dlstevens@us.ibm.com> As explained at: http://www.cs.ucsb.edu/~krishna/igmp_dos/ With IGMP version 1 and 2 it is possible to inject a unicast report to a client which will make it ignore multicast reports sent later by the router. The fix is to only accept the report if it was sent to a multicast or unicast address. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-02 20:32:59 -08:00
Thomas Graf	ea86575eaf	[NETLINK]: Fix processing of fib_lookup netlink messages The receive path for fib_lookup netlink messages is lacking sanity checks for header and payload and is thus vulnerable to malformed netlink messages causing illegal memory references. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-01 14:30:00 -08:00
Phil Oester	2a43c4af3f	[NETFILTER]: Fix recent match jiffies wrap mismatches Around jiffies wrap time (i.e. within first 5 mins after boot), recent match rules which contain both --seconds and --hitcount arguments experience false matches. This is because the last_pkts array is filled with zeros on creation, and when comparing 'now' to 0 (+ --seconds argument), time_before_eq thinks it has found a hit. Below patch adds a break if the packet value is zero. This has the unfortunate side effect of causing mismatches if a packet was received when jiffies really was equal to zero. The odds of that happening are slim compared to the problems caused by not adding the break however. Plus, the author used this same method just below, so it is "good enough". This fixes netfilter bugs #383 and #395. Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-01 14:29:24 -08:00
Jozsef Kadlecsik	73f306024c	[NETFILTER]: Ignore ACKs on half open connections in TCP conntrack Mounting NFS file systems after a (warm) reboot could take a long time if firewalling and connection tracking were enabled. The reason is that NFS clients tend to use the same ports (800 and counting down). Now on reboot, the server would still have a TCB for an existing TCP connection client:800 -> server:2049. The client sends a SYN from port 800 to server:2049, which elicits an ACK from the server. The firewall on the client drops the ACK because (from its point of view) the connection is still in half-open state, and it expects to see a SYNACK. The client will eventually time out after several minutes. The following patch corrects this by accepting ACKs on half open connections as well. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-12-01 14:28:58 -08:00
Adrian Bunk	d127e94a5c	[NETFILTER] ipv4: small cleanups This patch contains the following cleanups: - make needlessly global code static - ip_conntrack_core.c: ip_conntrack_flush() -> ip_conntrack_flush(void) Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:28:18 -08:00
Adrian Bunk	4b30b1c6a3	[IPV4]: make two functions static This patch makes two needlessly global functions static. Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:27:20 -08:00
Arjan van de Ven	9b5b5cff9a	[NET]: Add const markers to various variables. the patch below marks various variables const in net/; the goal is to move them to the .rodata section so that they can't false-share cachelines with things that get written to, as well as potentially helping gcc a bit with optimisations. (these were found using a gcc patch to warn about such variables) Signed-off-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:21:38 -08:00
Mike Stroyan	18955cfcb2	[IPV4] tcp/route: Another look at hash table sizes The tcp_ehash hash table gets too big on systems with really big memory. It is worse on systems with pages larger than 4KB. It wastes memory that could be better used. It also makes the netstat command slow because reading /proc/net/tcp and /proc/net/tcp6 needs to go through the full hash table. The default value should not be larger for larger page sizes. It seems that the effect of page size is an unintended error dating back a long time. I also wonder if the default value really should be a larger fraction of memory for systems with more memory. While systems with really big ram can afford more space for hash tables, it is not clear to me that they benefit from increasing the allocation ratio for this table. The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init and mm/page_alloc.c:alloc_large_system_hash. tcp_init calls alloc_large_system_hash passing parameters- bucketsize=sizeof(struct tcp_ehash_bucket) numentries=thash_entries scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT) limit=0 On i386, PAGE_SHIFT is 12 for a page size of 4K On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K The num_physpages test above makes the allocation take a larger fraction of the total memory on systems with larger memory. The threshold size for an i386 system is 512MB. For an ia64 system with 16KB pages the threshold is 2GB. For smaller memory systems- On i386, scale = (27 - 12) = 15 On ia64, scale = (27 - 14) = 13 For larger memory systems- On i386, scale = (25 - 12) = 13 On ia64, scale = (25 - 14) = 11 For the rest of this discussion, I'll just track the larger memory case. The default behavior has numentries=thash_entries=0, so the allocated size is determined by either scale or by the default limit of 1/16 of total memory. In alloc_large_system_hash- \| numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages; \| numentries += (1UL << (20 - PAGE_SHIFT)) - 1; \| numentries >>= 20 - PAGE_SHIFT; \| numentries <<= 20 - PAGE_SHIFT; At this point, numentries is pages for all of memory, rounded up to the nearest megabyte boundary. \| /* limit to 1 bucket per 2^scale bytes of low memory */ \| if (scale > PAGE_SHIFT) \| numentries >>= (scale - PAGE_SHIFT); \| else \| numentries <<= (PAGE_SHIFT - scale); On i386, numentries >>= (13 - 12), so numentries is 1/8192 of bytes of total memory. On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of bytes of total memory. \| log2qty = long_log2(numentries); \| \| do { \| size = bucketsize << log2qty; bucketsize is 16, so size is 16 times numentries, rounded down to a power of two. On i386, size is 1/512 of bytes of total memory. On ia64, size is 1/128 of bytes of total memory. For smaller systems the results are On i386, size is 1/2048 of bytes of total memory. On ia64, size is 1/512 of bytes of total memory. The large page effect can be removed by just replacing the use of PAGE_SHIFT with a constant of 12 in the calls to alloc_large_system_hash. That makes them more like the other uses of that function from fs/inode.c and fs/dcache.c Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-29 16:12:55 -08:00
Benoit Boissinot	de919820cf	[NETFILTER]: ip_conntrack_netlink.c needs linux/interrupt.h net/ipv4/netfilter/ip_conntrack_netlink.c: In function 'ctnetlink_dump_table': net/ipv4/netfilter/ip_conntrack_netlink.c:409: warning: implicit declaration of function 'local_bh_disable' net/ipv4/netfilter/ip_conntrack_netlink.c:427: warning: implicit declaration of function 'local_bh_enable' Signed-off-by: Benoit Boissinot <benoit.boissinot@ens-lyon.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-23 19:03:46 -08:00
Pablo Neira Ayuso	00cb277a4a	[NETFILTER] ctnetlink: Fix refcount leak ip_conntrack/nat_proto Remove proto == NULL checking since ip_conntrack_[nat_]proto_find_get always returns a valid pointer. Fix missing ip_conntrack_proto_put in some paths. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-22 14:54:34 -08:00
Jamal Hadi Salim	0ff60a4567	[IPV4]: Fix secondary IP addresses after promotion This patch fixes the problem with promoting aliases when: a) a single primary and > 1 secondary addresses b) multiple primary addresses each with at least one secondary address Based on earlier efforts from Brian Pomerantz <bapper@piratehaven.org>, Patrick McHardy <kaber@trash.net> and Thomas Graf <tgraf@suug.ch> Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-22 14:47:37 -08:00
Yasuyuki Kozakai	2b8f2ff6f4	[NETFILTER]: fixed dependencies between modules related with ip_conntrack - IP_NF_CONNTRACK_MARK is bool and depends on only IP_NF_CONNTRACK which is tristate. If a variable depends on IP_NF_CONNTRACK_MARK and doesn't care about IP_NF_CONNTRACK, it can be y. This must be avoided. - IP_NF_CT_ACCT has same problem. - IP_NF_TARGET_CLUSTERIP also depends on IP_NF_MANGLE. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-20 21:09:55 -08:00
Patrick McHardy	c9e53cbe7a	[FIB_TRIE]: Don't show local table in /proc/net/route output Don't show local table to behave similar to fib_hash. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-20 21:09:00 -08:00
Harald Welte	2fce76afdb	[NETFILTER] ip_conntrack: fix ftp/irc/tftp helpers on ports >= 32768 Since we've converted the ftp/irc/tftp helpers to use the new module_parm_array() some time ago, we were accidentally using signed data types - thus preventing those modules from being used on ports >= 32768. This patch fixes it by using 'ushort' module parameters. Thanks to Jan Nijs for reporting this bug. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-17 15:06:47 -08:00
Stephen Hemminger	bd6af700a7	[TCP]: TCP highspeed build error There is a compile error that crept in with the last patch of TCP patches. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-17 14:11:18 -08:00
Yasuyuki Kozakai	e7c8a41e81	[IPV4,IPV6]: replace handmade list with hlist in IPv{4,6} reassembly Both of ipq and frag_queue have next and *prev, and they can be replaced with hlist. Thanks Arnaldo Carvalho de Melo for the suggestion. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-16 12:55:37 -08:00
Stephen Hemminger	31f3426904	[TCP]: More spelling fixes. From Joe Perches Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-15 15:17:10 -08:00
Harald Welte	37d2e7a20d	[NETFILTER] nfnetlink: unconditionally require CAP_NET_ADMIN This patch unconditionally requires CAP_NET_ADMIN for all nfnetlink messages. It also removes the per-message cap_required field, since all existing subsystems use CAP_NET_ADMIN for all their messages anyway. Patrick McHardy owes me a beer if we ever need to re-introduce this. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-14 15:24:59 -08:00
Pablo Neira Ayuso	5655820852	[NETFILTER] ctnetlink: More thorough size checking of attributes Add missing size checks. Thanks Patrick McHardy for the hint. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-14 15:22:11 -08:00
Pablo Neira Ayuso	dbd36ea496	[NETFILTER] ctnetlink: use size_t to make gcc-4.x happy Make gcc-4.x happy. Use size_t instead of int. Thanks to Patrick McHardy for the hint. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-14 15:21:01 -08:00
Vlad Drukker	a2d7222f0f	[NETFILTER] {ip,nf}_conntrack TCP: Accept SYN+PUSH like SYN Some devices (e.g. Qlogic iSCSI HBA hardware like QLA4010 up to firmware 3.0.0.4) initiate TCP with SYN and PUSH flags set. The Linux TCP/IP stack deals fine with that, but the connection tracking code doesn't. This patch alters TCP connection tracking to accept SYN+PUSH as a valid flag combination. Signed-off-by: Vlad Drukker <vlad@storewiz.com> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-12 12:13:14 -08:00
Jeff Garzik	c050970a25	[PATCH] TCP: fix vegas build Recent TCP changes broke the build. Signed-off-by: Jeff Garzik <jgarzik@pobox.com> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-11-11 09:21:28 -08:00
Stephen Hemminger	6a438bbe68	[TCP]: speed up SACK processing Use "hints" to speed up the SACK processing. Various forms of this have been used by TCP developers (Web100, STCP, BIC) to avoid the 2x linear search of outstanding segments. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:14:59 -08:00
Stephen Hemminger	caa20d9abe	[TCP]: spelling fixes Minor spelling fixes for TCP code. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:13:47 -08:00
John Heffner	326f36e9e7	[TCP]: receive buffer growth limiting with mixed MTU This is a patch for discussion addressing some receive buffer growing issues. This is partially related to the thread "Possible BUG in IPv4 TCP window handling..." last week. Specifically it addresses the problem of an interaction between rcvbuf moderation (receiver autotuning) and rcv_ssthresh. The problem occurs when sending small packets to a receiver with a larger MTU. (A very common case I have is a host with a 1500 byte MTU sending to a host with a 9k MTU.) In such a case, the rcv_ssthresh code is targeting a window size corresponding to filling up the current rcvbuf, not taking into account that the new rcvbuf moderation may increase the rcvbuf size. One hunk makes rcv_ssthresh use tcp_rmem[2] as the size target rather than rcvbuf. The other changes the behavior when it overflows its memory bounds with in-order data so that it tries to grow rcvbuf (the same as with out-of-order data). These changes should help my problem of mixed MTUs, and should also help the case from last week's thread I think. (In both cases though you still need tcp_rmem[2] to be set much larger than the TCP window.) One question is if this is too aggressive at trying to increase rcvbuf if it's under memory stress. Originally-from: John Heffner <jheffner@psc.edu> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:11:48 -08:00
Stephen Hemminger	9772efb970	[TCP]: Appropriate Byte Count support This is an updated version of the RFC3465 ABC patch originally for Linux 2.6.11-rc4 by Yee-Ting Li. ABC is a way of counting bytes ack'd rather than packets when updating congestion control. The original ABC described in the RFC applied to a Reno style algorithm. For advanced congestion control there is little change after leaving slow start. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:09:53 -08:00
Stephen Hemminger	7faffa1c7f	[TCP]: add tcp_slow_start helper Move all the code that does linear TCP slowstart to one inline function to ease later patch to add ABC support. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 17:07:24 -08:00
Stephen Hemminger	2d2abbab63	[TCP]: simplify microsecond rtt sampling Simplify the code that computes microsecond rtt estimate used by TCP Vegas. Move the callback out of the RTT sampler and into the end of the ack cleanup. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 16:56:12 -08:00
Stephen Hemminger	f4805eded7	[TCP]: fix congestion window update when using TSO deferral TCP performance with TSO over networks with delay is awful. On a 100Mbit link with 150ms delay, we get 4Mbits/sec with TSO and 50Mbits/sec without TSO. The problem is with TSO, we intentionally do not keep the maximum number of packets in flight to fill the window, we hold out until we can send a MSS chunk. But, we also don't update the congestion window unless we have filled, as per RFC2861. This patch replaces the check for the congestion window being full with something smarter that accounts for TSO. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 16:53:30 -08:00
Herbert Xu	fb286bb299	[NET]: Detect hardware rx checksum faults correctly Here is the patch that introduces the generic skb_checksum_complete which also checks for hardware RX checksum faults. If that happens, it'll call netdev_rx_csum_fault which currently prints out a stack trace with the device name. In future it can turn off RX checksum. I've converted every spot under net/ that does RX checksum checks to use skb_checksum_complete or __skb_checksum_complete with the exceptions of: * Those places where checksums are done bit by bit. These will call netdev_rx_csum_fault directly. * The following have not been completely checked/converted: ipmr ip_vs netfilter dccp This patch is based on patches and suggestions from Stephen Hemminger and David S. Miller. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 13:01:24 -08:00
Thomas Graf	a8f74b2288	[NETLINK]: Make netlink_callback->done() optional Most netlink families make no use of the done() callback, making it optional gets rid of all unnecessary dummy implementations. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-10 02:26:40 +01:00
Yasuyuki Kozakai	9fb9cbb108	[NETFILTER]: Add nf_conntrack subsystem. The existing connection tracking subsystem in netfilter can only handle ipv4. There were basically two choices present to add connection tracking support for ipv6. We could either duplicate all of the ipv4 connection tracking code into an ipv6 counterpart, or (the choice taken by these patches) we could design a generic layer that could handle both ipv4 and ipv6 and thus requiring only one sub-protocol (TCP, UDP, etc.) connection tracking helper module to be written. In fact nf_conntrack is capable of working with any layer 3 protocol. The existing ipv4 specific conntrack code could also not deal with the peculiarities of doing connection tracking on ipv6, which is also cured here. For example, these issues include: 1) ICMPv6 handling, which is used for neighbour discovery in ipv6 thus some messages such as these should not participate in connection tracking since effectively they are like ARP messages 2) fragmentation must be handled differently in ipv6, because the simplistic "defrag, connection track and NAT, refrag" (which the existing ipv4 connection tracking does) approach simply isn't feasible in ipv6 3) ipv6 extension header parsing must occur at the correct spots before and after connection tracking decisions, and there were no provisions for this in the existing connection tracking design 4) ipv6 has no need for stateful NAT The ipv4 specific conntrack layer is kept around, until all of the ipv4 specific conntrack helpers are ported over to nf_conntrack and it is feature complete. Once that occurs, the old conntrack stuff will get placed into the feature-removal-schedule and we will fully kill it off 6 months later. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-09 16:38:16 -08:00
Krzysztof Piotr Oledzki	5fd52fe098	[NETFILTER] ctnetlink: ICMP_ID is u_int16_t not u_int8_t. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:04:32 -08:00
Krzysztof Piotr Oledzki	439a9994bb	[NETFILTER] ctnetlink: Fix oops when no ICMP ID info in message This patch fixes a userspace-triggered oops. If there is no ICMP_ID info the reference to attr will be NULL. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:04:08 -08:00
Pablo Neira Ayuso	a856a19a9f	[NETFILTER] ctnetlink: Add support to identify expectations by ID's Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:03:42 -08:00
Pablo Neira Ayuso	fcda46128d	[NETFILTER] ctnetlink: propagate error instead of returning -EPERM Propagate the error to userspace instead of returning -EPERM if the get conntrack operation fails. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:03:26 -08:00
Pablo Neira Ayuso	fe902a91ff	[NETFILTER] ctnetlink: return -EINVAL if size is wrong Return -EINVAL if the size isn't OK instead of -EPERM. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:03:09 -08:00
Yasuyuki Kozakai	d63a928108	[NETFILTER]: stop tracking ICMP error at early point Currently, connection tracking handles an ICMP error like a normal packet if it fails to find the related connection, but that handling fails in the end anyway. This patch makes connection tracking stop tracking ICMP errors at an early point. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:02:45 -08:00
Philip Craig	5978a9b82c	[NETFILTER] PPTP helper: fix PNS-PAC expectation call id The reply tuple of the PNS->PAC expectation was using the wrong call id. So we had the following situation: - PNS behind NAT firewall - PNS call id requires NATing - PNS->PAC gre packet arrives first then the PNS->PAC expectation is matched, and the other expectation is deleted, but the PAC->PNS gre packets do not match the gre conntrack because the call id is wrong. We also cannot use ip_nat_follow_master(). Signed-off-by: Philip Craig <philipc@snapgear.com> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:01:53 -08:00
Pablo Neira Ayuso	81e5c27d08	[NETFILTER] ctnetlink: get_conntrack can use GFP_KERNEL ctnetlink_get_conntrack is always called from user context, so GFP_KERNEL is enough. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:01:19 -08:00
Pablo Neira Ayuso	7a4fe3664b	[NETFILTER] ctnetlink: kill unused includes Kill some useless headers included in ctnetlink. They aren't used in any way. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:00:47 -08:00
Pablo Neira Ayuso	119a318494	[NETFILTER] ctnetlink: add module alias to fix autoloading Add missing module alias. This is a must to load ctnetlink on demand. For example, the conntrack tool will fail if the module isn't loaded. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:00:29 -08:00
Pablo Neira Ayuso	02a78cdf42	[NETFILTER] ctnetlink: add marking support from userspace This patch adds support for conntrack marking from user space. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 13:00:04 -08:00
Pablo Neira Ayuso	51df784ed7	[NETFILTER] ctnetlink: check if protoinfo is present This fixes an oops triggered from userspace. If we don't pass information about the private protocol info, the reference to attr will be NULL. This is likely to happen in update messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 12:59:41 -08:00
Harald Welte	a2506c0432	[NETFILTER] nfnetlink: nfattr_parse() can never fail, make it void nfattr_parse (and thus nfattr_parse_nested) always returns success. So we can make them 'void' and remove all the checking at the caller side. Based on original patch by Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 12:59:13 -08:00
Yasuyuki Kozakai	eaae4fa45e	[NETFILTER]: refcount leak of proto when ctnetlink dumping tuple Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 12:58:46 -08:00
Yasuyuki Kozakai	46998f59c0	[NETFILTER]: packet counter of conntrack is 32bits The packet counter variable of conntrack was changed to 32bits from 64bits. This follows that change. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-09 12:58:05 -08:00
Herbert Xu	89f5f0aeed	[IPV4]: Fix ip_queue_xmit identity increment for TSO packets When ip_queue_xmit calls ip_select_ident_more for IP identity selection it gives it the wrong packet count for TSO packets. The ip_select_* functions expect one less than the number of packets, so we need to subtract one for TSO packets. This bug was diagnosed and fixed by Tom Young. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-08 09:41:56 -08:00
Jesper Juhl	a51482bde2	[NET]: kfree cleanup From: Jesper Juhl <jesper.juhl@gmail.com> This is the net/ part of the big kfree cleanup patch. Remove pointless checks for NULL prior to calling kfree() in net/. Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Arnaldo Carvalho de Melo <acme@conectiva.com.br> Acked-by: Marcel Holtmann <marcel@holtmann.org> Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: Andrew Morton <akpm@osdl.org>	2005-11-08 09:41:34 -08:00
Julian Anastasov	dc8103f25f	[IPVS]: fix connection leak if expire_nodest_conn=1 There was a fix in 2.6.13 that changed the behaviour of ip_vs_conn_expire_now function not to put reference to connection, its callers should hold write lock or connection refcnt. But we forgot to convert one caller, when the real server for connection is unavailable caller should put the connection reference. It happens only when sysctl var expire_nodest_conn is set to 1 and such connections never expire. Thanks to Roberto Nibali who found the problem and tested a 2.4.32-rc2 patch, which is equal to this 2.6 version. Patch for 2.4 is already sent to Marcelo. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Roberto Nibali <ratz@drugphish.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-11-08 09:40:05 -08:00
Stephen Hemminger	6df716340d	[TCP/DCCP]: Randomize port selection This patch randomizes the port selected on bind() for connections to help with possible security attacks. It should also be faster in most cases because there is no need for a global lock. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-05 21:23:15 -02:00
Harald Welte	433a4d3b54	[NETFILTER]: CONNMARK target needs ip_conntrack There's a missing dependency from the CONNMARK target to ip_conntrack. Signed-off-by: Pablo Neira Ayuso <pablo@eurodev.net> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-05 16:39:20 -02:00
Harald Welte	0f81eb4db4	[NETFILTER]: Fix double free after netlink_unicast() in ctnetlink It's not necessary to free skb if netlink_unicast() failed. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-05 03:28:37 -02:00
Harald Welte	d2a7bb7141	[NETFILTER] NAT: Fix module refcount dropping too far The unknown protocol is used as a fallback when a protocol isn't known. Hence we cannot handle it failing, so don't set ".me". It's OK, since we only grab a reference from within the same module (iptable_nat.ko), so we never take the module refcount from 0 to 1. Also, remove the "protocol is NULL" test: it's never NULL. Signed-off-by: Rusty Rusty <rusty@rustcorp.com.au> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-05 01:23:34 -02:00
Harald Welte	d811552eda	[NETFILTER] PPTP helper: Fix endianness bug in GRE key / CallID NAT This endianness bug slipped through while changing the 'gre.key' field in the conntrack tuple from 32bit to 16bit. None of my tests caught the problem, since the linux pptp client always has '0' as call id / gre key. Only windows clients actually trigger the bug. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-04 23:19:17 -02:00
Harald Welte	3428c209c6	[NETFILTER] PPTP helper: Fix compilation of conntrack helper without NAT This patch fixes compilation of the PPTP conntrack helper when NAT is configured off. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-04 23:02:53 -02:00
Stephen Hemminger	450b5b1898	[TCP]: BIC max increment too large The max growth of BIC TCP is too large. Original code was based on BIC 1.0 and the default there was 32. Later code (2.6.13) included compensation for delayed acks, and should have reduced the default value to 16; since normally TCP gets one ack for every two packets sent. The current value of 32 makes BIC too aggressive and unfair to other flows. Submitted-by: Injong Rhee <rhee@eos.ncsu.edu> Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-02 21:24:01 -02:00
Yan Zheng	8713dbf057	[MCAST]: ip[6]_mc_add_src should be called when number of sources is zero And filter mode is exclude. Further explanation by David Stevens: Multicast source filters aren't widely used yet, and that's really the only feature that's affected if an application actually exercises this bug, as far as I can tell. An ordinary filter-less multicast join should still work, and only forwarded multicast traffic making use of filters and doing empty-source filters with the MSFILTER ioctl would be at risk of not getting multicast traffic forwarded to them because the reports generated would not be based on the correct counts. Signed-off-by: Yan Zheng <yanzheng@21cn.com> Acked-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-11-02 21:03:57 -02:00
Harald Welte	6b7d31fcdd	[NETFILTER]: Add "revision" support to arp_tables and ip6_tables Like ip_tables already has it for some time, this adds support for having multiple revisions for each match/target. We steal one byte from the name in order to accommodate an 8 bit version number. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-31 16:36:08 -02:00
Jean Delvare	3fa63c7d82	[PATCH] Typo fix: dot after newline in printk strings Typo fix: dots appearing after a newline in printk strings. Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-30 17:37:20 -08:00
Jayachandran C	9fcc2e8a75	[IPV4]: Fix issue reported by Coverity in ipv4/fib_frontend.c fib_del_ifaddr() dereferences ifa->ifa_dev, so the code already assumes that ifa->ifa_dev is non-NULL, the check is unnecessary. Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-29 02:53:39 -02:00
Ananda Raju	e89e9cf539	[IPv4/IPv6]: UFO Scatter-gather approach Attached is kernel patch for UDP Fragmentation Offload (UFO) feature. 1. This patch incorporates the review comments by Jeff Garzik. 2. Renamed USO as UFO (UDP Fragmentation Offload) 3. udp sendfile support with UFO This patch uses the scatter-gather feature of skb to generate large UDP datagram. Below is a "how-to" on changes required in network device driver to use the UFO interface. UDP Fragmentation Offload (UFO) Interface: ------------------------------------------- UFO is a feature wherein the Linux kernel network stack will offload the IP fragmentation functionality of large UDP datagram to hardware. This will reduce the overhead of stack in fragmenting the large UDP datagram to MTU sized packets 1) Drivers indicate their capability of UFO using dev->features |= NETIF_F_UFO | NETIF_F_HW_CSUM | NETIF_F_SG NETIF_F_HW_CSUM is required for UFO over ipv6. 2) UFO packet will be submitted for transmission using driver xmit routine. UFO packet will have a non-zero value for "skb_shinfo(skb)->ufo_size" skb_shinfo(skb)->ufo_size will indicate the length of data part in each IP fragment going out of the adapter after IP fragmentation by hardware. skb->data will contain MAC/IP/UDP header and skb_shinfo(skb)->frags[] contains the data payload. The skb->ip_summed will be set to CHECKSUM_HW indicating that hardware has to do checksum calculation. Hardware should compute the UDP checksum of complete datagram and also ip header checksum of each fragmented IP packet. For IPV6 the UFO provides the fragment identification-id in skb_shinfo(skb)->ip6_frag_id. The adapter should use this ID for generating IPv6 fragments. Signed-off-by: Ananda Raju <ananda.raju@neterion.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (forwarded) Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-28 16:30:00 -02:00
Linus Torvalds	236fa08168	Merge master.kernel.org:/pub/scm/linux/kernel/git/acme/net-2.6.15	2005-10-28 08:50:37 -07:00
Herbert Xu	2ad41065d9	[TCP]: Clear stale pred_flags when snd_wnd changes This bug is responsible for causing the infamous "Treason uncloaked" messages that have been popping up everywhere since the printk was added. It has usually been blamed on foreign operating systems. However, some of those reports implicate Linux as both systems are running Linux or the TCP connection is going across the loopback interface. In fact, there really is a bug in the Linux TCP header prediction code that's been there since at least 2.1.8. This bug was tracked down with help from Dale Blount. The effect of this bug ranges from harmless "Treason uncloaked" messages to hung/aborted TCP connections. The details of the bug and fix are as follows. When snd_wnd is updated, we only update pred_flags if tcp_fast_path_check succeeds. When it fails (for example, when our rcvbuf is used up), we will leave pred_flags with an out-of-date snd_wnd value. When the out-of-date pred_flags happens to match the next incoming packet we will again hit the fast path and use the current snd_wnd which will be wrong. In the case of the treason messages, it just happens that the snd_wnd cached in pred_flags is zero while tp->snd_wnd is non-zero. Therefore when a zero-window packet comes in we incorrectly conclude that the window is non-zero. In fact if the peer continues to send us zero-window pure ACKs we will continue making the same mistake. It's only when the peer transmits a zero-window packet with data attached that we get a chance to snap out of it. This is what triggers the treason message at the next retransmit timeout. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-27 15:11:04 -02:00
David Engel	dcab5e1eec	[IPV4]: Fix setting broadcast for SIOCSIFNETMASK Fix setting of the broadcast address when the netmask is set via SIOCSIFNETMASK in Linux 2.6. The code wanted the old value of ifa->ifa_mask but used it after it had already been overwritten with the new value. Signed-off-by: David Engel <gigem@comcast.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-26 01:20:21 -02:00
Jayachandran C	0d0d2bba97	[IPV4]: Remove dead code from ip_output.c skb_prev is assigned from skb, which cannot be NULL. This patch removes the unnecessary NULL check. Signed-off-by: Jayachandran C. <c.jayachandran at gmail.com> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-26 00:58:54 -02:00
Herbert Xu	1371e37da2	[IPV4]: Kill redundant rcu_dereference on fa_info This patch kills a redundant rcu_dereference on fa->fa_info in fib_trie.c. As this dereference directly follows a list_for_each_entry_rcu line, we have already taken a read barrier with respect to getting an entry from the list. This read barrier guarantees that all values read out of fa are valid. In particular, the contents of structure pointed to by fa->fa_info is initialised before fa->fa_info is actually set (see fn_trie_insert); the setting of fa->fa_info itself is further separated with a write barrier from the insertion of fa into the list. Therefore by taking a read barrier after obtaining fa from the list (which is given by list_for_each_entry_rcu), we can be sure that fa->fa_info contains a valid pointer, as well as the fact that the data pointed to by fa->fa_info is itself valid. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Paul E. McKenney <paulmck@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-26 00:25:03 -02:00
Harald Welte	eed75f191d	[NETFILTER] ip_conntrack: Make "hashsize" conntrack parameter writable It's fairly simple to resize the hash table, but currently you need to remove and reinsert the module. That's bad (we lose connection state). Harald has even offered to write a daemon which sets this based on load. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-26 00:19:27 -02:00
John Hawkes	670c02c2bf	[NET]: Wider use of for_each_*cpu() In 'net' change the explicit use of for-loops and NR_CPUS into the general for_each_cpu() or for_each_online_cpu() constructs, as appropriate. This widens the scope of potential future optimizations of the general constructs, as well as takes advantage of the existing optimizations of first_cpu() and next_cpu(), which is advantageous when the true CPU count is much smaller than NR_CPUS. Signed-off-by: John Hawkes <hawkes@sgi.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-25 23:54:01 -02:00
Julian Anastasov	c98d80edc8	[SK_BUFF]: ipvs_property field must be copied IPVS used flag NFC_IPVS_PROPERTY in nfcache but as now nfcache was removed the new flag 'ipvs_property' still needs to be copied. This patch should be included in 2.6.14. Further comments from Harald Welte: Sorry, seems like the bug was introduced by me. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-22 17:06:01 -02:00
Herbert Xu	b2cc99f04c	[TCP] Allow len == skb->len in tcp_fragment It is legitimate to call tcp_fragment with len == skb->len since that is done for FIN packets and the FIN flag counts as one byte. So we should only check for the len > skb->len case. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Arnaldo Carvalho de Melo <acme@mandriva.com>	2005-10-20 17:13:13 -02:00
Herbert Xu	046d20b739	[TCP]: Ratelimit debugging warning. Better safe than sorry. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-13 14:42:24 -07:00
David S. Miller	c8923c6b85	[NETFILTER]: Fix OOPSes on machines with discontiguous cpu numbering. Original patch by Harald Welte, with feedback from Herbert Xu and testing by Sébastien Bernard. EBTABLES, ARP tables, and IP/IP6 tables all assume that cpus are numbered linearly. That is not necessarily true. This patch fixes that up by calculating the largest possible cpu number, and allocating enough per-cpu structure space given that. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-13 14:41:23 -07:00
Herbert Xu	9ff5c59ce2	[TCP]: Add code to help track down "BUG at net/ipv4/tcp_output.c:438!" This is the second report of this bug. Unfortunately the first reporter hasn't been able to reproduce it since to provide more debugging info. So let's apply this patch for 2.6.14 to 1) Make this non-fatal. 2) Provide the info we need to track it down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-12 15:59:39 -07:00
Arnaldo Carvalho de Melo	eeb2b85606	[TWSK]: Grab the module refcount for timewait sockets This is required to avoid unloading a module that has active timewait sockets, such as DCCP. Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:25:23 -07:00
Pablo Neira Ayuso	061cb4a0ec	[NETFILTER] ctnetlink: add support to change protocol info This patch adds support to change the state of the private protocol information via conntrack_netlink. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:23:46 -07:00
Pablo Neira Ayuso	3392315375	[NETFILTER] ctnetlink: allow userspace to change TCP state This patch adds the ability of changing the state of a TCP connection. I know that this must be used with care but it's required to provide a complete conntrack creation via conntrack_netlink. So I'll document this aspect on the upcoming docs. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:23:28 -07:00
Harald Welte	a051a8f730	[NETFILTER]: Use only 32bit counters for CONNTRACK_ACCT Initially we used 64bit counters for conntrack-based accounting, since we had no event mechanism to tell userspace that our counters are about to overflow. With nfnetlink_conntrack, we now have such an event mechanism and thus can save 16 bytes per connection. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:21:10 -07:00
Herbert Xu	d4875b049b	[IPSEC] Fix block size/MTU bugs in ESP This patch fixes the following bugs in ESP: * Fix transport mode MTU overestimate. This means that the inner MTU is smaller than it needs be. Worse yet, given an input MTU which is a multiple of 4 it will always produce an estimate which is not a multiple of 4. For example, given a standard ESP/3DES/MD5 transform and an MTU of 1500, the resulting MTU for transport mode is 1462 when it should be 1464. The reason for this is because IP header lengths are always a multiple of 4 for IPv4 and 8 for IPv6. * Ensure that the block size is at least 4. This is required by RFC2406 and corresponds to what the esp_output function does. At the moment this only affects crypto_null as its block size is 1. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:11:34 -07:00
Herbert Xu	a02a64223e	[IPSEC]: Use ALIGN macro in ESP This patch uses the macro ALIGN in all the applicable spots for ESP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 21:11:08 -07:00
Pablo Neira Ayuso	e1c73b78e3	[NETFILTER] ctnetlink: add one nesting level for TCP state To keep consistency, the TCP private protocol information is nested attributes under CTA_PROTOINFO_TCP. This way the sequence of attributes to access the TCP state information looks like here below: CTA_PROTOINFO CTA_PROTOINFO_TCP CTA_PROTOINFO_TCP_STATE instead of: CTA_PROTOINFO CTA_PROTOINFO_TCP_STATE Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:55:49 -07:00
Pablo Neira Ayuso	a1bcc3f268	[NETFILTER] ctnetlink: ICMP ID is not mandatory The ID is only required by ICMP type 8 (echo), so it's not mandatory for all sort of ICMP connections. This patch makes mandatory only the type and the code for ICMP netlink messages. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:53:16 -07:00
Harald Welte	d000eaf772	[NETFILTER] conntrack_netlink: Fix endian issue with status from userspace When we send "status" from userspace, we forget to convert the endianness. This patch adds the required conversion. Thanks to Pablo Neira for discovering this. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:52:51 -07:00
Harald Welte	f40863cec8	[NETFILTER] ipt_ULOG: Mark ipt_ULOG as OBSOLETE Similar to nfnetlink_queue and ip_queue, we mark ipt_ULOG as obsolete. This should have been part of the original nfnetlink_log merge, but I somehow missed it. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:51:53 -07:00
Harald Welte	85d9b05d9b	[NETFILTER] PPTP helper: Add missing Kconfig dependency PPTP should not be selectable without conntrack enabled Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-10 20:47:42 -07:00
Al Viro	dd0fc66fb3	[PATCH] gfp flags annotations - part 1 - added typedef unsigned int __nocast gfp_t; - replaced __nocast uses for gfp flags with gfp_t - it gives exactly the same warnings as far as sparse is concerned, doesn't change generated code (from gcc point of view we replaced unsigned int with typedef) and documents what's going on far better. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-10-08 15:00:57 -07:00
Stephen Hemminger	42a39450f8	[TCP]: BIC coding bug in Linux 2.6.13 A missing parenthesis causes BIC to be slow in increasing the congestion window. Spotted by Injong Rhee. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-05 12:09:31 -07:00
Randy Dunlap	8eea00a44d	[IPVS]: fix sparse gfp nocast warnings From: Randy Dunlap <rdunlap@xenotime.net> Fix implicit nocast warnings in ip_vs code: net/ipv4/ipvs/ip_vs_app.c:631:54: warning: implicit cast to nocast type Signed-off-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-04 22:42:15 -07:00
Horst H. von Brand	a5181ab06d	[NETFILTER]: Fix Kconfig typo Signed-off-by: Horst H. von Brand <vonbrand@inf.utfsm.cl> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-04 15:58:56 -07:00
Robert Olsson	e6308be85a	[IPV4]: fib_trie root-node expansion The patch below introduces special thresholds to keep root node in the trie large. This gives a flatter tree at the cost of a modest memory increase. Overall it seems to be a gain, and this was also proposed by one of the authors of the paper in a recent seminar. Main table after loading 123 k routes. Aver depth: 3.30 Max depth: 9 Root-node size 12 bits Total size: 4044 kB With the patch: Aver depth: 2.78 Max depth: 8 Root-node size 15 bits Total size: 4150 kB An increase of 8-10% was seen in forwarding performance for an rDoS attack. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-04 13:01:58 -07:00
David S. Miller	7ce312467e	[IPV4]: Update icmp sysctl docs and disable broadcast ECHO/TIMESTAMP by default It's not a good idea to be smurf'able by default. The few people who need this can turn it on. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 16:07:30 -07:00
Herbert Xu	e5ed639913	[IPV4]: Replace __in_dev_get with __in_dev_get_rcu/rtnl The following patch renames __in_dev_get() to __in_dev_get_rtnl() and introduces __in_dev_get_rcu() to cover the second case. 1) RCU with refcnt should use in_dev_get(). 2) RCU without refcnt should use __in_dev_get_rcu(). 3) All others must hold RTNL and use __in_dev_get_rtnl(). There is one exception in net/ipv4/route.c which is in fact a pre-existing race condition. I've marked it as such so that we remember to fix it. This patch is based on suggestions and prior work by Suzanne Wood and Paul McKenney. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 14:35:55 -07:00
Herbert Xu	444fc8fc3a	[IPV4]: Fix "Proxy ARP seems broken" Meelis Roos <mroos@linux.ee> wrote: > RK> My firewall setup relies on proxyarp working. However, with 2.6.14-rc3, > RK> it appears to be completely broken. The firewall is 212.18.232.186, > > Same here with some kernel between 14-rc2 and 14-rc3 - no response to > ARP on a proxyarp gateway. Sorry, no exact revision and no more debugging > yet since it's a production gateway. The breakage is caused by the change to use the CB area for flagging whether a packet has been queued due to proxy_delay. This area gets cleared every time arp_rcv gets called. Unfortunately packets delayed due to proxy_delay also go through arp_rcv when they are reprocessed. In fact, I can't think of a reason why delayed proxy packets should go through netfilter again at all. So the easiest solution is to bypass that and go straight to arp_process. This is essentially what would've happened before netfilter support was added to ARP. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 14:18:10 -07:00
Eric Dumazet	81c3d5470e	[INET]: speedup inet (tcp/dccp) lookups Arnaldo and I agreed it could be applied now, because I have other pending patches depending on this one (Thank you Arnaldo) (The other important patch moves skc_refcnt in a separate cache line, so that the SMP/NUMA performance doesn't suffer from cache line ping pongs) 1) First some performance data : -------------------------------- tcp_v4_rcv() wastes a lot of time in __inet_lookup_established() The most time critical code is : sk_for_each(sk, node, &head->chain) { if (INET_MATCH(sk, acookie, saddr, daddr, ports, dif)) goto hit; /* You sunk my battleship! */ } The sk_for_each() does use prefetch() hints but only the beginning of "struct sock" is prefetched. As INET_MATCH first comparison uses inet_sk(__sk)->daddr, which is far away from the beginning of "struct sock", it has to bring into CPU cache cold cache line. Each iteration has to use at least 2 cache lines. This can be problematic if some chains are very long. 2) The goal ----------- The idea I had is to change things so that INET_MATCH() may return FALSE in 99% of cases only using the data already in the CPU cache, using one cache line per iteration. 3) Description of the patch --------------------------- Adds a new 'unsigned int skc_hash' field in 'struct sock_common', filling a 32 bits hole on 64 bits platform. struct sock_common { unsigned short skc_family; volatile unsigned char skc_state; unsigned char skc_reuse; int skc_bound_dev_if; struct hlist_node skc_node; struct hlist_node skc_bind_node; atomic_t skc_refcnt; + unsigned int skc_hash; struct proto *skc_prot; }; Store in this 32 bits field the full hash, not masked by (ehash_size - 1) Using this full hash as the first comparison done in INET_MATCH permits us to immediately skip the element without touching a second cache line in case of a miss. Suppress the sk_hashent/tw_hashent fields since skc_hash (aliased to sk_hash and tw_hash) already contains the slot number if we mask with (ehash_size - 1) File include/net/inet_hashtables.h 64 bits platforms : #define INET_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ ((*((__u64 *)&(inet_sk(__sk)->daddr)))== (__cookie)) && \ ((*((__u32 *)&(inet_sk(__sk)->dport))) == (__ports)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) 32bits platforms: #define TCP_IPV4_MATCH(__sk, __hash, __cookie, __saddr, __daddr, __ports, __dif)\ (((__sk)->sk_hash == (__hash)) && \ (inet_sk(__sk)->daddr == (__saddr)) && \ (inet_sk(__sk)->rcv_saddr == (__daddr)) && \ (!((__sk)->sk_bound_dev_if) || ((__sk)->sk_bound_dev_if == (__dif)))) - Adds a prefetch(head->chain.first) in __inet_lookup_established()/__tcp_v4_check_established() and __inet6_lookup_established()/__tcp_v6_check_established() and __dccp_v4_check_established() to bring into cache the first element of the list, before the {read|write}_lock(&head->lock); Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Acked-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 14:13:38 -07:00
Herbert Xu	325ed82393	[NET]: Fix packet timestamping. I've found the problem in general. It affects any 64-bit architecture. The problem occurs when you change the system time. Suppose that when you boot your system clock is forward by a day. This gets recorded down in skb_tv_base. You then wind the clock back by a day. From that point onwards the offset will be negative which essentially overflows the 32-bit variables they're stored in. In fact, why don't we just store the real time stamp in those 32-bit variables? After all, we're not going to overflow for quite a while yet. When we do overflow, we'll need a better solution of course. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-10-03 13:57:23 -07:00
Alexey Kuznetsov	09e9ec8711	[TCP]: Don't over-clamp window in tcp_clamp_window() From: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Handle better the case where the sender sends full sized frames initially, then moves to a mode where it trickles out small amounts of data at a time. This known problem is even mentioned in the comments above tcp_grow_window() in tcp_input.c, specifically: ... * The scheme does not work when sender sends good segments opening * window and then starts to feed us spagetti. But it should work * in common situations. Otherwise, we have to rely on queue collapsing. ... When the sender gives full sized frames, the "struct sk_buff" overhead from each packet is small. So we'll advertize a larger window. If the sender moves to a mode where small segments are sent, this ratio becomes tilted to the other extreme and we start overrunning the socket buffer space. tcp_clamp_window() tries to address this, but its clamping of tp->window_clamp is a wee bit too aggressive for this particular case. Fix confirmed by Ion Badulescu. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-29 17:17:15 -07:00
David S. Miller	01ff367e62	[TCP]: Revert `6b251858d3` But retain the comment fix. Alexey Kuznetsov has explained the situation as follows: -------------------- I think the fix is incorrect. Look, the RFC function init_cwnd(mss) is not continuous: f.e. for mss=1095 it needs initial window 1095*4, but for mss=1096 it is 1096*3. We do not know exactly what mss sender used for calculations. If we advertised 1096 (and calculate initial window 3*1096), the sender could limit it to some value < 1096 and then it will need window his_mss*4 > 3*1096 to send initial burst. See? So, the honest function for initial rcv_wnd derived from tcp_init_cwnd() is: init_rcv_wnd(mss)= min { init_cwnd(mss1)*mss1 for mss1 <= mss } It is something sort of: if (mss < 1096) return mss*4; if (mss < 1096*2) return 1096*4; return mss*2; (I just scribbled a graph of piece of paper, it is difficult to see or to explain without this) I selected it differently giving more window than it is strictly required. Initial receive window must be large enough to allow sender following to the rfc (or just setting initial cwnd to 2) to send initial burst. But besides that it is arbitrary, so I decided to give slack space of one segment. Actually, the logic was: If mss is low/normal (<=ethernet), set window to receive more than initial burst allowed by rfc under the worst conditions i.e. mss*4. This gives slack space of 1 segment for ethernet frames. For msses slightly more than ethernet frame, take 3. Try to give slack space of 1 frame again. If mss is huge, force 2*mss. No slack space. Value 1460*3 is really confusing. Minimal one is 1096*2, but besides that it is an arbitrary value. It was meant to be ~4096. 1460*3 is just the magic number from RFC, 1460*3 = 1095*4 is the magic :-), so that I guess hands typed this themselves. -------------------- Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-29 17:07:20 -07:00
David S. Miller	6b251858d3	[TCP]: Fix init_cwnd calculations in tcp_select_initial_window() Match it up to what RFC2414 really specifies. Noticed by Rick Jones. Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-28 16:31:48 -07:00
Harald Welte	188bab3ae0	[NETFILTER]: Fix invalid module autoloading by splitting iptable_nat When you've enabled conntrack and NAT as a module (standard case in all distributions), and you've also enabled the new conntrack netlink interface, loading ip_conntrack_netlink.ko will auto-load iptable_nat.ko. This causes a huge performance penalty, since for every packet you iterate the nat code, even if you don't want it. This patch splits iptable_nat.ko into the NAT core (ip_nat.ko) and the iptables frontend (iptable_nat.ko). Therefore, ip_conntrack_netlink.ko will only pull ip_nat.ko, but not the frontend. ip_nat.ko will "only" allocate some resources, but not affect runtime performance. This separation is also a nice step in anticipation of new packet filters (nf-hipac, ipset, pkttables) being able to use the NAT core. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-26 15:25:11 -07:00
Harald Welte	8ddec7460d	[NETFILTER] ip_conntrack: Update event cache when status changes The GRE, SCTP and TCP protocol helpers did not call ip_conntrack_event_cache() when updating ct->status. This patch adds the respective calls. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-24 16:56:08 -07:00
Harald Welte	d67b24c40f	[NETFILTER]: Fix ip[6]t_NFQUEUE Kconfig dependency We have to introduce a separate Kconfig menu entry for the NFQUEUE targets. They cannot "just" depend on nfnetlink_queue, since nfnetlink_queue could be linked into the kernel, whereas iptables can be a module. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-24 16:52:03 -07:00
Harald Welte	1dfbab5949	[NETFILTER] Fix conntrack event cache deadlock/oops This patch fixes a number of bugs. It cannot be reasonably split up in multiple fixes, since all bugs interact with each other and affect the same function: Bug #1: The event cache code cannot be called while a lock is held. Therefore, the call to ip_conntrack_event_cache() within ip_ct_refresh_acct() needs to be moved outside of the locked section. This fixes a number of 2.6.14-rcX oops and deadlock reports. Bug #2: We used to call ct_add_counters() for unconfirmed connections without holding a lock. Since the add operations are not atomic, we could race with another CPU. Bug #3: ip_ct_refresh_acct() lost REFRESH events in some cases where refresh (and the corresponding event) are desired, but no accounting shall be performed. Both events and accounting implicitly depended on the skb parameter being non-null. We now re-introduce a non-accounting "ip_ct_refresh()" variant to explicitly state the desired behaviour. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-22 23:46:57 -07:00
Alexey Dobriyan	67497205b1	[NETFILTER] Fix sparse endian warnings in pptp helper Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-22 23:45:24 -07:00
Harald Welte	0ae5d253ad	[NETFILTER] fix DEBUG statement in PPTP helper As noted by Alexey Dobriyan, the DEBUGP statement prints the wrong callID. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-22 23:44:58 -07:00
Herbert Xu	83ca28befc	[TCP]: Adjust Reno SACK estimate in tcp_fragment Since the introduction of TSO pcount a year ago, it has been possible for tcp_fragment() to cause packets_out to decrease. Prior to that, tcp_retrans_try_collapse() was the only way for that to happen on the retransmission path. When this happens with Reno, it is possible for sacked_out to become invalid because it is only an estimate and not tied to any particular packet on the retransmission queue. Therefore we need to adjust sacked_out as well as left_out in the Reno case. The following patch does exactly that. This bug is pretty difficult to trigger in practice though since you need a SACKless peer with a retransmission that occurs just as the cached MTU value expires. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-22 23:32:56 -07:00
Stephen Hemminger	7957aed72b	[TCP]: Set default congestion control correctly for incoming connections. Patch from Joel Sing to fix the default congestion control algorithm for incoming connections. If a new congestion control handler is added (via module), it should become the default for new connections. Instead, the incoming connections use reno. The cause is incorrect initialisation causes the tcp_init_congestion_control() function to return after the initial if test fails. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Ian McDonald <imcdnzl@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-21 00:19:46 -07:00
Stephen Hemminger	78c6671a88	[FIB_TRIE]: message cleanup Cleanup the printk's in fib_trie: * Convert a couple of places in the dump code to BUG_ON * Put log level's on each message The version message really needed the message since it leaks out on the pretty Fedora bootup. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Acked-by: Robert Olsson <Robert.Olsson@data.slu.se>, Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-21 00:15:39 -07:00
Linus Torvalds	875bd5ab01	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2005-09-19 18:46:11 -07:00
Mark J Cox	6d1cfe3f17	[PATCH] raw_sendmsg DoS on 2.6 Fix unchecked __get_user that could be tricked into generating a memory read on an arbitrary address. The result of the read is not returned directly but you may be able to divine some information about it, or use the read to cause a crash on some architectures by reading hardware state. CAN-2004-2492. Fix from Al Viro, ack from Dave Miller. Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-09-19 18:45:42 -07:00
Herbert Xu	e14c3caf60	[TCP]: Handle SACK'd packets properly in tcp_fragment(). The problem is that we're now calling tcp_fragment() in a context where the packets might be marked as SACKED_ACKED or SACKED_RETRANS. This was not possible before as you never retransmitted packets that are so marked. Because of this, we need to adjust sacked_out and retrans_out in tcp_fragment(). This is exactly what the following patch does. We also need to preserve the SACKED_ACKED/SACKED_RETRANS marking if they exist. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-19 18:18:38 -07:00
Harald Welte	8922bc93aa	[NETFILTER]: Export ip_nat_port_{nfattr_to_range,range_to_nfattr} Those exports are needed by the PPTP helper following in the next couple of changes. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-19 15:35:57 -07:00
Patrick McHardy	a41bc00234	[NETFILTER]: Rename misnamed function Both __ip_conntrack_expect_find and ip_conntrack_expect_find_get take a reference to the expectation, the difference is that callers of __ip_conntrack_expect_find must hold ip_conntrack_lock. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-19 15:35:31 -07:00
Harald Welte	926b50f92a	[NETFILTER]: Add new PPTP conntrack and NAT helper This new "version 3" PPTP conntrack/nat helper is finally ready for mainline inclusion. Special thanks to lots of last-minute bugfixing by Patrick McHardy. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-19 15:33:08 -07:00
Robert Olsson	772cb712b1	[IPV4]: fib_trie RCU refinements * This patch is from Paul McKenney's RCU reviewing. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-19 15:31:18 -07:00
Robert Olsson	1d25cd6cc2	[IPV4]: fib_trie tnode stats refinements * Prints the route tnode and set the stats level depth as before. Signed-off-by: Robert Olsson <robert.olsson@its.uu.se> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-19 15:29:52 -07:00
Harald Welte	628f87f3d5	[NETFILTER]: Solve Kconfig dependency problem As suggested by Roman Zippel. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-18 00:33:02 -07:00
Harald Welte	777ed97f3e	[NETFILTER] Fix Kconfig dependencies for nfnetlink/ctnetlink Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-17 00:41:02 -07:00
Harald Welte	a8f39143ac	[NETFILTER]: Fix oops in conntrack event cache ip_ct_refresh_acct() can be called without a valid "skb" pointer. This used to work, since ct_add_counters() deals with that fact. However, the recently-added event cache doesn't handle this at all. This patch is a quick fix that is supposed to be replaced soon by a cleaner solution during the pending redesign of the event cache. Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-16 17:00:38 -07:00
KOVACS Krisztian	136e92bbec	[NETFILTER] CLUSTERIP: use a bitmap to store node responsibility data Instead of maintaining an array containing a list of nodes this instance is responsible for let's use a simple bitmap. This provides the following features: * clusterip_responsible() and the add_node()/delete_node() operations become very simple and don't need locking * the config structure is much smaller In spite of the completely different internal data representation the user-space interface remains almost unchanged; the only difference is that the proc file does not list nodes in the order they were added. (The target info structure remains the same.) Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-16 17:00:04 -07:00
KOVACS Krisztian	4451362445	[NETFILTER] CLUSTERIP: introduce reference counting for entries The CLUSTERIP target creates a procfs entry for all different cluster IPs. Although more than one rules can refer to a single cluster IP (and thus a single config structure), removal of the procfs entry is done unconditionally in destroy(). In more complicated situations involving deferred dereferencing of the config structure by procfs and creating a new rule with the same cluster IP it's also possible that no entry will be created for the new rule. This patch fixes the problem by counting the number of entries referencing a given config structure and moving the config list manipulation and procfs entry deletion parts to the clusterip_config_entry_put() function. Signed-off-by: KOVACS Krisztian <hidden@balabit.hu> Signed-off-by: Harald Welte <laforge@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-16 16:59:46 -07:00
Julian Anastasov	87375ab47c	[IPVS]: ip_vs_ftp breaks connections using persistence ip_vs_ftp when loaded can create NAT connections with unknown client port for passive FTP. For such expectations we lookup with cport=0 on incoming packet but it matches the format of the persistence templates causing packets to other persistent virtual servers to be forwarded to real server without creating connection. Later the reply packets are treated as foreign and not SNAT-ed. This patch changes the connection lookup for packets from clients: * introduce IP_VS_CONN_F_TEMPLATE connection flag to mark the connection as template * create new connection lookup function just for templates - ip_vs_ct_in_get * make sure ip_vs_conn_in_get hits only connections with IP_VS_CONN_F_NO_CPORT flag set when s_port is 0. By this way we avoid returning template when looking for cport=0 (ftp) Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-14 21:08:51 -07:00
Julian Anastasov	f5e229db9c	[IPVS]: Really invalidate persistent templates Agostino di Salle noticed that persistent templates are not invalidated due to buggy optimization. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-14 21:04:23 -07:00
Denis Lukianov	de9daad90e	[MCAST]: Fix MCAST_EXCLUDE line dupes This patch fixes line dupes at /ipv4/igmp.c and /ipv6/mcast.c in the 2.6 kernel, where MCAST_EXCLUDE is mistakenly used instead of MCAST_INCLUDE. Signed-off-by: Denis Lukianov <denis@voxelsoft.com> Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-14 20:53:42 -07:00
Herbert Xu	3c05d92ed4	[TCP]: Compute in_sacked properly when we split up a TSO frame. The problem is that the SACK fragmenting code may incorrectly call tcp_fragment() with a length larger than the skb->len. This happens when the skb on the transmit queue completely falls to the LHS of the SACK. And add a BUG() check to tcp_fragment() so we can spot this kind of error more quickly in the future. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-14 20:50:35 -07:00
Patrick McHardy	adcb5ad1e5	[NETFILTER]: Fix DHCP + MASQUERADE problem In 2.6.13-rcX the MASQUERADE target was changed not to exclude local packets for better source address consistency. This breaks DHCP clients using UDP sockets when the DHCP requests are caught by a MASQUERADE rule because the MASQUERADE target drops packets when no address is configured on the outgoing interface. This patch makes it ignore packets with a source address of 0. Thanks to Rusty for this suggestion. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-13 13:49:15 -07:00
Patrick McHardy	cd0bf2d796	[NETFILTER]: Fix rcu race in ipt_REDIRECT Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-13 13:48:58 -07:00
Patrick McHardy	e7fa1bd93f	[NETFILTER]: Simplify netbios helper Don't parse the packet, the data is already available in the conntrack structure. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-13 13:48:34 -07:00
Patrick McHardy	5cb30640ce	[NETFILTER]: Use correct type for "ports" module parameter With large port numbers the helper_names buffer can overflow. Noticed by Samir Bellabes <sbellabes@mandriva.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-13 13:48:00 -07:00
Nishanth Aravamudan	121caf577d	[NET]: fix-up schedule_timeout() usage Use schedule_timeout_{,un}interruptible() instead of set_current_state()/schedule_timeout() to reduce kernel size. Also use human-time conversion functions instead of hard-coded division to avoid rounding issues. Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-12 14:15:34 -07:00
Herbert Xu	e130af5dab	[TCP]: Fix double adjustment of tp->{lost,left}_out in tcp_fragment(). There is an extra left_out/lost_out adjustment in tcp_fragment which means that the lost_out accounting is always wrong. This patch removes that chunk of code. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-10 17:19:09 -07:00
Linus Torvalds	1d8674edb5	Merge master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6	2005-09-09 14:25:22 -07:00
Ingo Molnar	8d06afab73	[PATCH] timer initialization cleanup: DEFINE_TIMER Clean up timer initialization by introducing DEFINE_TIMER a'la DEFINE_SPINLOCK. Build and boot-tested on x86. A similar patch has been in the -RT tree for some time. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-09-09 14:03:48 -07:00
Dipankar Sarma	b835996f62	[PATCH] files: lock-free fd look-up With the use of RCU in files structure, the look-up of files using fds can now be lock-free. The lookup is protected by rcu_read_lock()/rcu_read_unlock(). This patch changes the readers to use lock-free lookup. Signed-off-by: Maneesh Soni <maneesh@in.ibm.com> Signed-off-by: Ravikiran Thirumalai <kiran_th@gmail.com> Signed-off-by: Dipankar Sarma <dipankar@in.ibm.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>	2005-09-09 13:57:55 -07:00
Stephen Hemminger	cb7b593c2c	[IPV4] fib_trie: fix proc interface Create one iterator for walking over FIB trie, and use it for all the /proc functions. Add a /proc/net/route output for backwards compatibility with old applications. Make initialization of fib_trie same as fib_hash so no #ifdef is needed in af_inet.c Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=5209 Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-09 13:35:42 -07:00
Patrick McHardy	e104411b82	[XFRM]: Always release dst_entry on error in xfrm_lookup Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-08 15:11:55 -07:00
Herbert Xu	cf0b450cd5	[TCP]: Fix off by one in tcp_fragment() "already sent" test. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-08 15:10:52 -07:00
Andrew Morton	3a93481589	[NETFILTER]: ip_conntrack_netbios_ns.c gcc-2.95.x build fix gcc-2.95.x can't do this sort of initialisation Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-08 13:36:34 -07:00
Julian Anastasov	ce723d8e04	[IPV4]: Fix refcount damaging in net/ipv4/route.c One such place that can damage the dst refcnts is route.c with CONFIG_IP_ROUTE_MULTIPATH_CACHED enabled, i don't see the user's .config. In this new code i see that rt_intern_hash is called before dst->refcnt is set to 1, dst is the 2nd arg to rt_intern_hash. Arg 2 of rt_intern_hash must come with refcnt 1 as it is added to table or dropped depending on error/add/update. One such example is ip_mkroute_input where __mkroute_input return rth with refcnt 0 which is provided to rt_intern_hash. ip_mkroute_output looks like a 2nd such place. Appending untested patch for comments and review. The idea is to put previous reference as we are going to return next result/error. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-08 13:34:47 -07:00
Stephen Hemminger	e308e25c97	[IPV4] udp: trim forgets about CHECKSUM_HW A UDP packet may contain extra data that needs to be trimmed off. But when doing so, UDP forgets to fixup the skb checksum if CHECKSUM_HW is being used. I think this explains the case of a NFS receive using skge driver causing 'udp hw checksum failures' when interacting with a crufty settop box. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-08 12:32:21 -07:00
Stephen Hemminger	48bc41a49c	[IPV4]: Reassembly trim not clearing CHECKSUM_HW This was found by inspection while looking for checksum problems with the skge driver that sets CHECKSUM_HW. It did not fix the problem, but it looks like it is needed. If IP reassembly is trimming an overlapping fragment, it should reset (or adjust) the hardware checksum flag on the skb. Signed-off-by: Stephen Hemminger <shemminger@osdl.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-06 15:51:48 -07:00
Patrick McHardy	e446639939	[NETFILTER]: Missing unlock in TCP connection tracking error path Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-06 15:11:10 -07:00
Pablo Neira Ayuso	49719eb355	[NETFILTER]: kill __ip_ct_expect_unlink_destroy The following patch kills __ip_ct_expect_unlink_destroy and export unlink_expect as ip_ct_unlink_expect. As it was discussed [1], the function __ip_ct_expect_unlink_destroy is a bit confusing so better do the following sequence: ip_ct_destroy_expect and ip_conntrack_expect_put. [1] https://lists.netfilter.org/pipermail/netfilter-devel/2005-August/020794.html Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-06 15:10:46 -07:00
Pablo Neira Ayuso	91c46e2e60	[NETFILTER]: Don't increase master refcount on expectations As it's been discussed [1][2]. We shouldn't increase the master conntrack refcount for non-fulfilled conntracks. During the conntrack destruction, the expectations are always killed before the conntrack itself, this guarantees that there won't be any orphan expectation. [1]https://lists.netfilter.org/pipermail/netfilter-devel/2005-August/020783.html [2]https://lists.netfilter.org/pipermail/netfilter-devel/2005-August/020904.html Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-06 15:10:23 -07:00
Patrick McHardy	03486a4f83	[NETFILTER]: Handle NAT module load race When the NAT module is loaded when connections are already confirmed it must not change their tuples anymore. This is especially important with CONFIG_NETFILTER_DEBUG, the netfilter listhelp functions will refuse to remove an entry from a list when it can not be found on the list, so when a changed tuple hashes to a new bucket the entry is kept in the list until and after the conntrack is freed. Allocate the exact conntrack tuple for NAT for already confirmed connections or drop them if that fails. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-06 15:09:43 -07:00
Yasuyuki Kozakai	31c913e7fd	[NETFILTER]: Fix CONNMARK Kconfig dependency Connection mark tracking support is one of the feature in connection tracking, so IP_NF_CONNTRACK_MARK depends on IP_NF_CONNTRACK. Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2005-09-06 15:09:20 -07:00


521 Commits (a54dfd2ce03446a180e5fb7c30e8a5307f276567)