linux

q3k/linux

Author	SHA1	Message	Date
Pavel Emelyanov	9f26b3add3	inet: add struct net argument to inet_ehashfn Although this hash takes addresses into account, the ehash chains can also be too long when, for instance, communications via lo occur. So, prepare the inet_hashfn to take struct net into account. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:13:27 -07:00
Pavel Emelyanov	2086a65078	inet: add struct net argument to inet_lhashfn Listening-on-one-port sockets in many namespaces produce long chains in the listening_hash-es, so prepare the inet_lhashfn to take struct net into account. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:13:08 -07:00
Pavel Emelyanov	7f635ab71e	inet: add struct net argument to inet_bhashfn Binding to some port in many namespaces may create too long chains in bhash-es, so prepare the hashfn to take struct net into account. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:12:49 -07:00
Pavel Emelyanov	19c7578fb2	udp: add struct net argument to udp_hashfn Every caller already has this one. The new argument is currently unused, but this will be fixed shortly. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:12:29 -07:00
Pavel Emelyanov	e31634931d	udp: provide a struct net pointer for __udp[46]_lib_mcast_deliver They both calculate the hash chain, but currently do not have a struct net pointer, so pass one there via additional argument, all the more so their callers already have such. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:12:11 -07:00
Pavel Emelyanov	d6266281f8	udp: introduce a udp_hashfn function Currently the chain to store a UDP socket is calculated with simple (x & (UDP_HTABLE_SIZE - 1)). But taking net into account would make this calculation a bit more complex, so moving it into a function would help. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:11:50 -07:00
Rami Rosen	a9d246dbb0	ipv4: Remove unused definitions in net/ipv4/tcp_ipv4.c. 1) Remove ICMP_MIN_LENGTH, as it is unused. 2) Remove unneeded tcp_v4_send_check() declaration. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:07:16 -07:00
Eric Dumazet	68be802cd5	raw: Restore /proc/net/raw correct behavior I just noticed "cat /proc/net/raw" was buggy, missing '\n' separators. I believe this was introduced by commit `8cd850efa4` ([RAW]: Cleanup IPv4 raw_seq_show.) This trivial patch restores correct behavior, and applies to current Linus tree (should also be applied to stable tree as well.) Signed-off-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 17:03:32 -07:00
David S. Miller	93653e0448	tcp: Revert reset of deferred accept changes in 2.6.26 Ingo's system is still seeing strange behavior, and he reports that is goes away if the rest of the deferred accept changes are reverted too. Therefore this reverts `e4c7884028` ("[TCP]: TCP_DEFER_ACCEPT updates - dont retxmt synack") and `539fae89be` ("[TCP]: TCP_DEFER_ACCEPT updates - defer timeout conflicts with max_thresh"). Just like the other revert, these ideas can be revisited for 2.6.27 Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-16 16:57:40 -07:00
Brian Haley	7d06b2e053	net: change proto destroy method to return void Change struct proto destroy function pointer to return void. Noticed by Al Viro. Signed-off-by: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-14 17:04:49 -07:00
David S. Miller	4ae127d1b6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/smc911x.c	2008-06-13 20:52:39 -07:00
Linus Torvalds	51558576ea	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: tcp: Revert 'process defer accept as established' changes. ipv6: Fix duplicate initialization of rawv6_prot.destroy bnx2x: Updating the Maintainer net: Eliminate flush_scheduled_work() calls while RTNL is held. drivers/net/r6040.c: correct bad use of round_jiffies() fec_mpc52xx: MPC52xx_MESSAGES_DEFAULT: 2nd NETIF_MSG_IFDOWN => IFUP ipg: fix receivemode IPG_RM_RECEIVEMULTICAST{,HASH} in ipg_nic_set_multicast_list() netfilter: nf_conntrack: fix ctnetlink related crash in nf_nat_setup_info() netfilter: Make nflog quiet when no one listen in userspace. ipv6: Fail with appropriate error code when setting not-applicable sockopt. ipv6: Check IPV6_MULTICAST_LOOP option value. ipv6: Check the hop limit setting in ancillary data. ipv6 route: Fix route lifetime in netlink message. ipv6 mcast: Check address family of gf_group in getsockopt(MS_FILTER). dccp: Bug in initial acknowledgment number assignment dccp ccid-3: X truncated due to type conversion dccp ccid-3: TFRC reverse-lookup Bug-Fix dccp ccid-2: Bug-Fix - Ack Vectors need to be ignored on request sockets dccp: Fix sparse warnings dccp ccid-3: Bug-Fix - Zero RTT is possible	2008-06-13 07:34:47 -07:00
David S. Miller	ec0a196626	tcp: Revert 'process defer accept as established' changes. This reverts two changesets, `ec3c0982a2` ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") and the follow-on bug fix `9ae27e0adb` ("tcp: Fix slab corruption with ipv6 and tcp6fuzz"). This change causes several problems, first reported by Ingo Molnar as a distcc-over-loopback regression where connections were getting stuck. Ilpo Järvinen first spotted the locking problems. The new function added by this code, tcp_defer_accept_check(), only has the child socket locked, yet it is modifying state of the parent listening socket. Fixing that is non-trivial at best, because we can't simply just grab the parent listening socket lock at this point, because it would create an ABBA deadlock. The normal ordering is parent listening socket --> child socket, but this code path would require the reverse lock ordering. Next is a problem noticed by Vitaliy Gusev, he noted: ---------------------------------------- >--- a/net/ipv4/tcp_timer.c >+++ b/net/ipv4/tcp_timer.c >@@ -481,6 +481,11 @@ static void tcp_keepalive_timer (unsigned long data) > goto death; > } > >+ if (tp->defer_tcp_accept.request && sk->sk_state == TCP_ESTABLISHED) { >+ tcp_send_active_reset(sk, GFP_ATOMIC); >+ goto death; Here socket sk is not attached to listening socket's request queue. tcp_done() will not call inet_csk_destroy_sock() (and tcp_v4_destroy_sock() which should release this sk) as socket is not DEAD. Therefore socket sk will be lost for freeing. ---------------------------------------- Finally, Alexey Kuznetsov argues that there might not even be any real value or advantage to these new semantics even if we fix all of the bugs: ---------------------------------------- Hiding from accept() sockets with only out-of-order data only is the only thing which is impossible with old approach. Is this really so valuable? My opinion: no, this is nothing but a new loophole to consume memory without control. ---------------------------------------- So revert this thing for now. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-12 16:34:35 -07:00
David S. Miller	e6e30add6b	Merge branch 'net-next-2.6-misc-20080612a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-next	2008-06-11 22:33:59 -07:00
Adrian Bunk	0b04082995	net: remove CVS keywords This patch removes CVS keywords that weren't updated for a long time from comments. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-11 21:00:38 -07:00
YOSHIFUJI Hideaki	9501f97229	tcp md5sig: Let the caller pass appropriate key for tcp_v{4,6}_do_calc_md5_hash(). As we do for other socket/timewait-socket specific parameters, let the callers pass appropriate arguments to tcp_v{4,6}_do_calc_md5_hash(). Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 03:46:30 +09:00
YOSHIFUJI Hideaki	8d26d76dd4	tcp md5sig: Share most of hash calcucaltion bits between IPv4 and IPv6. We can share most part of the hash calculation code because the only difference between IPv4 and IPv6 is their pseudo headers. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 02:38:20 +09:00
YOSHIFUJI Hideaki	076fb72233	tcp md5sig: Remove redundant protocol argument. Protocol is always TCP, so remove useless protocol argument. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 02:38:19 +09:00
YOSHIFUJI Hideaki	7d5d5525bd	tcp md5sig: Share MD5 Signature option parser between IPv4 and IPv6. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-12 02:38:18 +09:00
Linus Torvalds	f7f866eed0	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (42 commits) net: Fix routing tables with id > 255 for legacy software sky2: Hold RTNL while calling dev_close() s2io iomem annotations atl1: fix suspend regression qeth: start dev queue after tx drop error qeth: Prepare-function to call s390dbf was wrong qeth: reduce number of kernel messages qeth: Use ccw_device_get_id(). qeth: layer 3 Oops in ip event handler virtio: use callback on empty in virtio_net virtio: virtio_net free transmit skbs in a timer virtio: Fix typo in virtio_net_hdr comments virtio_net: Fix skb->csum_start computation ehea: set mac address fix sfc: Recover from RX queue flush failure add missing lance_* exports ixgbe: fix typo forcedeth: msi interrupts ipsec: pfkey should ignore events when no listeners pppoe: Unshare skb before anything else ...	2008-06-11 08:39:51 -07:00
Krzysztof Piotr Oledzki	709772e6e0	net: Fix routing tables with id > 255 for legacy software Most legacy software do not like tables > 255 as rtm_table is u8 so tb_id is sent &0xff and it is possible to mismatch for example table 510 with table 254 (main). This patch introduces RT_TABLE_COMPAT=252 so the code uses it if tb_id > 255. It makes such old applications happy, new ones are still able to use RTA_TABLE to get a proper table id. Signed-off-by: Krzysztof Piotr Oledzki <ole@ans.pl> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-10 15:44:49 -07:00
Thomas Graf	573bf470e6	ipv4 addr: Send netlink notification for address label changes Makes people happy who try to keep a list of addresses up to date by listening to notifications. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-10 15:40:04 -07:00
Arnaldo Carvalho de Melo	ce4a7d0d48	inet{6}_request_sock: Init ->opt and ->pktopts in the constructor Wei Yongjun noticed that we may call reqsk_free on request sock objects where the opt fields may not be initialized, fix it by introducing inet_reqsk_alloc where we initialize ->opt to NULL and set ->pktopts to NULL in inet6_reqsk_alloc. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-10 12:39:35 -07:00
David S. Miller	65b53e4cc9	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/tg3.c drivers/net/wireless/rt2x00/rt2x00dev.c net/mac80211/ieee80211_i.h	2008-06-10 02:22:26 -07:00
Rami Rosen	e64bda89b8	netfilter: {ip,ip6,nfnetlink}_queue: misc cleanups - No need to perform data_len = 0 in the switch command, since data_len is initialized to 0 in the beginning of the ipq_build_packet_message() method. - {ip,ip6}_queue: We can reach nlmsg_failure only from one place; skb is sure to be NULL when getting there; since skb is NULL, there is no need to check this fact and call kfree_skb(). Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-09 16:00:45 -07:00
Fabian Hugelshofer	718d4ad98e	netfilter: nf_conntrack: properly account terminating packets Currently the last packet of a connection isn't accounted when its causing abnormal termination. Introduces nf_ct_kill_acct() which increments the accounting counters on conntrack kill. The new function was necessary, because there are calls to nf_ct_kill() which don't need accounting: nf_conntrack_proto_tcp.c line ~847: Kills ct and returns NF_REPEAT. We don't want to count twice. nf_conntrack_proto_tcp.c line ~880: Kills ct and returns NF_DROP. I think we don't want to count dropped packets. nf_conntrack_netlink.c line ~824: As far as I can see ctnetlink_del_conntrack() is used to destroy a conntrack on behalf of the user. There is an sk_buff, but I don't think this is an actual packet. Incrementing counters here is therefore not desired. Signed-off-by: Fabian Hugelshofer <hugelshofer2006@gmx.ch> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-09 15:59:40 -07:00
Patrick McHardy	51091764f2	netfilter: nf_conntrack: add nf_ct_kill() Encapsulate the common if (del_timer(&ct->timeout)) ct->timeout.function((unsigned long)ct) sequence in a new function. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-09 15:59:06 -07:00
James Morris	560ee653b6	netfilter: ip_tables: add iptables security table for mandatory access control rules The following patch implements a new "security" table for iptables, so that MAC (SELinux etc.) networking rules can be managed separately to standard DAC rules. This is to help with distro integration of the new secmark-based network controls, per various previous discussions. The need for a separate table arises from the fact that existing tools and usage of iptables will likely clash with centralized MAC policy management. The SECMARK and CONNSECMARK targets will still be valid in the mangle table to prevent breakage of existing users. Signed-off-by: James Morris <jmorris@namei.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-09 15:57:24 -07:00
Chris Wright	ddb2c43594	asn1: additional sanity checking during BER decoding - Don't trust a length which is greater than the working buffer. An invalid length could cause overflow when calculating buffer size for decoding oid. - An oid length of zero is invalid and allows for an off-by-one error when decoding oid because the first subid actually encodes first 2 subids. - A primitive encoding may not have an indefinite length. Thanks to Wei Wang from McAfee for report. Cc: Steven French <sfrench@us.ibm.com> Cc: stable@kernel.org Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-06-05 14:24:54 -07:00
Octavian Purdila	293ad60401	tcp: Fix for race due to temporary drop of the socket lock in skb_splice_bits. skb_splice_bits temporary drops the socket lock while iterating over the socket queue in order to break a reverse locking condition which happens with sendfile. This, however, opens a window of opportunity for tcp_collapse() to aggregate skbs and thus potentially free the current skb used in skb_splice_bits and tcp_read_sock. This patch fixes the problem by (re-)getting the same "logical skb" after the lock has been temporary dropped. Based on idea and initial patch from Evgeniy Polyakov. Signed-off-by: Octavian Purdila <opurdila@ixiacom.com> Acked-by: Evgeniy Polyakov <johnpol@2ka.mipt.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 15:45:58 -07:00
Sridhar Samudrala	26af65cbeb	tcp: Increment OUTRSTS in tcp_send_active_reset() TCP "resets sent" counter is not incremented when a TCP Reset is sent via tcp_send_active_reset(). Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 15:19:35 -07:00
Denis V. Lunev	22dd485022	raw: Raw socket leak. The program below just leaks the raw kernel socket int main() { int fd = socket(PF_INET, SOCK_RAW, IPPROTO_UDP); struct sockaddr_in addr; memset(&addr, 0, sizeof(addr)); inet_aton("127.0.0.1", &addr.sin_addr); addr.sin_family = AF_INET; addr.sin_port = htons(2048); sendto(fd, "a", 1, MSG_MORE, &addr, sizeof(addr)); return 0; } Corked packet is allocated via sock_wmalloc which holds the owner socket, so one should uncork it and flush all pending data on close. Do this in the same way as in UDP. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 15:16:12 -07:00
David S. Miller	aed5a833fb	Merge branch 'net-2.6-misc-20080605a' of git://git.linux-ipv6.org/gitroot/yoshfuji/linux-2.6-fix	2008-06-04 12:10:21 -07:00
Ilpo Järvinen	a6604471db	tcp: fix skb vs fack_count out-of-sync condition This bug is able to corrupt fackets_out in very rare cases. In order for this to cause corruption: 1) DSACK in the middle of previous SACK block must be generated. 2) In order to take that particular branch, part or all of the DSACKed segment must already be SACKed so that we have that in cache in the first place. 3) The new info must be top enough so that fackets_out will be updated on this iteration. ...then fack_count is updated while skb wasn't, then we walk again that particular segment thus updating fack_count twice for a single skb and finally that value is assigned to fackets_out by tcp_sacktag_one. It is safe to call tcp_sacktag_one just once for a segment (at DSACK), no need to call again for plain SACK. Potential problem of the miscount are limited to premature entry to recovery and to inflated reordering metric (which could even cancel each other out in the most the luckiest scenarios :-)). Both are quite insignificant in worst case too and there exists also code to reset them (fackets_out once sacked_out becomes zero and reordering metric on RTO). This has been reported by a number of people, because it occurred quite rarely, it has been very evasive. Andy Furniss was able to get it to occur couple of times so that a bit more info was collected about the problem using a debug patch, though it still required lot of checking around. Thanks also to others who have tried to help here. This is listed as Bugzilla #10346. The bug was introduced by me in commit `68f8353b48` ([TCP]: Rewrite SACK block processing & sack_recv_cache use), I probably thought back then that there's need to scan that entry twice or didn't dare to make it go through it just once there. Going through twice would have required restoring fack_count after the walk but as noted above, I chose to drop the additional walk step altogether here. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 12:07:44 -07:00
Denis V. Lunev	36d926b94a	[IPV6]: inet_sk(sk)->cork.opt leak IPv6 UDP sockets wth IPv4 mapped address use udp_sendmsg to send the data actually. In this case ip_flush_pending_frames should be called instead of ip6_flush_pending_frames. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-05 04:02:38 +09:00
YOSHIFUJI Hideaki	baa2bfb8ae	[IPV4] TUNNEL4: Fix incoming packet length check for inter-protocol tunnel. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>	2008-06-05 04:02:33 +09:00
Ilpo Järvinen	8aca6cb117	tcp: Fix inconsistency source (CA_Open only when !tcp_left_out(tp)) It is possible that this skip path causes TCP to end up into an invalid state where ca_state was left to CA_Open while some segments already came into sacked_out. If next valid ACK doesn't contain new SACK information TCP fails to enter into tcp_fastretrans_alert(). Thus at least high_seq is set incorrectly to a too high seqno because some new data segments could be sent in between (and also, limited transmit is not being correctly invoked there). Reordering in both directions can easily cause this situation to occur. I guess we would want to use tcp_moderate_cwnd(tp) there as well as it may be possible to use this to trigger oversized burst to network by sending an old ACK with huge amount of SACK info, but I'm a bit unsure about its effects (mainly to FlightSize), so to be on the safe side I just currently fixed it minimally to keep TCP's state consistent (obviously, such nasty ACKs have been possible this far). Though it seems that FlightSize is already underestimated by some amount, so probably on the long term we might want to trigger recovery there too, if appropriate, to make FlightSize calculation to resemble reality at the time when the losses where discovered (but such change scares me too much now and requires some more thinking anyway how to do that as it likely involves some code shuffling). This bug was found by Brian Vowell while running my TCP debug patch to find cause of another TCP issue (fackets_out miscount). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-04 11:34:22 -07:00
Thomas Graf	ab32cd793d	route: Remove unused ifa_anycast field The field was supposed to allow the creation of an anycast route by assigning an anycast address to an address prefix. It was never implemented so this field is unused and serves no purpose. Remove it. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:37:33 -07:00
Thomas Graf	1f9d11c7c9	route: Mark unused routing attributes as such Also removes an unused policy entry for an attribute which is only used in kernel->user direction. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:36:27 -07:00
Thomas Graf	51b77cae0d	route: Mark unused route cache flags as such. Also removes an obsolete check for the unused flag RTCF_MASQ. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-06-03 16:36:01 -07:00
David S. Miller	43154d08d6	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/cpmac.c net/mac80211/mlme.c	2008-05-25 23:26:10 -07:00
Rami Rosen	071f92d059	net: The world is not perfect patch. Unless there will be any objection here, I suggest consider the following patch which simply removes the code for the -DI_WISH_WORLD_WERE_PERFECT in the three methods which use it. The compilation errors we get when using -DI_WISH_WORLD_WERE_PERFECT show that this code was not built and not used for really a long time. Signed-off-by: Rami Rosen <ramirose@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 17:47:54 -07:00
Denis Cheng	51f82a2b12	net/ipv4/arp.c: Use common hex_asc helpers Here the local hexbuf is a duplicate of global const char hex_asc from lib/hexdump.c, except the hex letters' cases: const char hexbuf[] = "0123456789ABCDEF"; const char hex_asc[] = "0123456789abcdef"; and here to print HW addresses, the hex cases are not significant. Thanks to Harvey Harrison to introduce the hex_asc_hi/hex_asc_lo helpers. Signed-off-by: Denis Cheng <crquan@gmail.com> Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 17:34:32 -07:00
Sridhar Samudrala	7d227cd235	tcp: TCP connection times out if ICMP frag needed is delayed We are seeing an issue with TCP in handling an ICMP frag needed message that is received after net.ipv4.tcp_retries1 retransmits. The default value of retries1 is 3. So if the path mtu changes and ICMP frag needed is lost for the first 3 retransmits or if it gets delayed until 3 retransmits are done, TCP doesn't update MSS correctly and continues to retransmit the orginal message until it timesout after tcp_retries2 retransmits. I am seeing this issue even with the latest 2.6.25.4 kernel. In tcp_retransmit_timer(), when retransmits counter exceeds tcp_retries1 value, the dst cache entry of the socket is reset. At this time, if we receive an ICMP frag needed message, the dst entry gets updated with the new MTU, but the TCP sockets dst_cache entry remains NULL. So the next time when we try to retransmit after the ICMP frag needed is received, tcp_retransmit_skb() gets called. Here the cur_mss value is calculated at the start of the routine with a NULL sk_dst_cache. Instead we should call tcp_current_mss after the rebuild_header that caches the dst entry with the updated mtu. Also the rebuild_header should be called before tcp_fragment so that skb is fragmented if the mss goes down. Signed-off-by: Sridhar Samudrala <sri@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 16:42:20 -07:00
Pavel Emelyanov	cf3677ae19	ipmr: Use on-device stats instead of private ones. These devices use the private area of appropriate size for statistics. Turning them to use on-device ones make them "privless" and thus - really small wrt kmalloc cache, they are allocated from. Besides, code looks nicer, because of absence of multi-braced type casts and dereferences. [ Fix build failures -DaveM ] Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:43:45 -07:00
Pavel Emelyanov	2f4c02d40c	ipmr: Ipip tunnel uses on-device stats. The ipmr uses ipip tunnels for its purposes and updates the tunnels' stats, but the ipip driver is already switched to use on-device ones. Actually, this is a part of the patch #4 from this set. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:16:14 -07:00
Pavel Emelyanov	50f59cea07	ipip: Use on-device stats instead of private ones. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:15:16 -07:00
Pavel Emelyanov	addd68eb6f	ipgre: Use on-device stats instead of private ones. Just switch from tunnel->stat to tunnel->dev->stats. The ip_tunnel->stat member itself will be removed after I fix its other users (very soon). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-21 14:14:22 -07:00
Herbert Xu	1ac06e0306	ipsec: Use the correct ip_local_out function Because the IPsec output function xfrm_output_resume does its own dst_output call it should always call __ip_local_output instead of ip_local_output as the latter may invoke dst_output directly. Otherwise the return values from nf_hook and dst_output may clash as they both use the value 1 but for different purposes. When that clash occurs this can cause a packet to be used after it has been freed which usually leads to a crash. Because the offending value is only returned from dst_output with qdiscs such as HTB, this bug is normally not visible. Thanks to Marco Berizzi for his perseverance in tracking this down. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-20 14:32:14 -07:00
Pavel Emelyanov	7d291ebb83	inet: Register fragmentation some ctls at read-only root. Parts of fragments-related sysctls are read-only, but this is done by cloning all the tables and dropping write-bits from mode. Do the same but with read-only root. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-19 13:53:02 -07:00
Pavel Emelyanov	0a64b4b811	inet: Rename fragmentation sysctl-related functions/variables. The fragments sysctls also contains some, that are to be visible, but read-only in net namespaces. The naming in net/core/sysctl_net_core.c is - tables, that are to be registered in namespaces have a "ns" word in their names. So rename ones in ipv4/ip_fragment.c and ipv6/reassembly.c to fit this. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-19 13:51:29 -07:00
Linus Torvalds	6aa5fc4349	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (73 commits) net: Fix typo in net/core/sock.c. ppp: Do not free not yet unregistered net device. netfilter: xt_iprange: module aliases for xt_iprange netfilter: ctnetlink: dump conntrack ID in event messages irda: Fix a misalign access issue. (v2) sctp: Fix use of uninitialized pointer cipso: Relax too much careful cipso hash function. tcp FRTO: work-around inorder receivers tcp FRTO: Fix fallback to conventional recovery New maintainer for Intel ethernet adapters DM9000: Use delayed work to update MII PHY state DM9000: Update and fix driver debugging messages DM9000: Add __devinit and __devexit attributes to probe and remove sky2: fix simple define thinko [netdrvr] sfc: sfc: Add self-test support [netdrvr] sfc: Increment rx_reset when reported as driver event [netdrvr] sfc: Remove unused macro EFX_XAUI_RETRAIN_MAX [netdrvr] sfc: Fix code formatting [netdrvr] sfc: Remove kernel-doc comments for removed members of struct efx_nic [netdrvr] sfc: Remove garbage from comment ...	2008-05-14 10:08:24 -07:00
Pavel Emelyanov	5e0f8923f3	cipso: Relax too much careful cipso hash function. The cipso_v4_cache is allocated to contain CIPSO_V4_CACHE_BUCKETS buckets. The CIPSO_V4_CACHE_BUCKETS = 1 << CIPSO_V4_CACHE_BUCKETBITS, where CIPSO_V4_CACHE_BUCKETBITS = 7. The bucket-selection function for this hash is calculated like this: bkt = hash & (CIPSO_V4_CACHE_BUCKETBITS - 1); ^^^ i.e. picking only 4 buckets of possible 128 :) Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Acked-by: Paul Moore <paul.moore@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-13 23:23:55 -07:00
Ilpo Järvinen	79d44516b4	tcp FRTO: work-around inorder receivers If receiver consumes segments successfully only in-order, FRTO fallback to conventional recovery produces RTO loop because FRTO's forward transmissions will always get dropped and need to be resent, yet by default they're not marked as lost (which are the only segments we will retransmit in CA_Loss). Price to pay about this is occassionally unnecessarily retransmitting the forward transmission(s). SACK blocks help a bit to avoid this, so it's mainly a concern for NewReno case though SACK is not fully immune either. This change has a side-effect of fixing SACKFRTO problem where it didn't have snd_nxt of the RTO time available anymore when fallback become necessary (this problem would have only occured when RTO would occur for two or more segments and ECE arrives in step 3; no need to figure out how to fix that unless the TODO item of selective behavior is considered in future). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Damon L. Chesser <damon@damtek.com> Tested-by: Damon L. Chesser <damon@damtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-13 02:54:19 -07:00
Ilpo Järvinen	a1c1f281b8	tcp FRTO: Fix fallback to conventional recovery It seems that commit `009a2e3e4e` ("[TCP] FRTO: Improve interoperability with other undo_marker users") run into another land-mine which caused fallback to conventional recovery to break: 1. Cumulative ACK arrives after FRTO retransmission 2. tcp_try_to_open sees zero retrans_out, clears retrans_stamp which should be kept like in CA_Loss state it would be 3. undo_marker change allowed tcp_packet_delayed to return true because of the cleared retrans_stamp once FRTO is terminated causing LossUndo to occur, which means all loss markings FRTO made are reverted. This means that the conventional recovery basically recovered one loss per RTT, which is not that efficient. It was quite unobvious that the undo_marker change broken something like this, I had a quite long session to track it down because of the non-intuitiviness of the bug (luckily I had a trivial reproducer at hand and I was also able to learn to use kprobes in the process as well :-)). This together with the NewReno+FRTO fix and FRTO in-order workaround this fixes Damon's problems, this and the first mentioned are enough to fix Bugzilla #10063. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Damon L. Chesser <damon@damtek.com> Tested-by: Damon L. Chesser <damon@damtek.com> Tested-by: Sebastian Hyrwall <zibbe@cisko.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-13 02:53:26 -07:00
Johannes Berg	f5184d267c	net: Allow netdevices to specify needed head/tailroom This patch adds needed_headroom/needed_tailroom members to struct net_device and updates many places that allocate sbks to use them. Not all of them can be converted though, and I'm sure I missed some (I mostly grepped for LL_RESERVED_SPACE) Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-12 20:48:31 -07:00
Linus Torvalds	28a4acb485	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (32 commits) net: Added ASSERT_RTNL() to dev_open() and dev_close(). can: Fix can_send() handling on dev_queue_xmit() failures netns: Fix arbitrary net_device-s corruptions on net_ns stop. netfilter: Kconfig: default DCCP/SCTP conntrack support to the protocol config values netfilter: nf_conntrack_sip: restrict RTP expect flushing on error to last request macvlan: Fix memleak on device removal/crash on module removal net/ipv4: correct RFC 1122 section reference in comment tcp FRTO: SACK variant is errorneously used with NewReno e1000e: don't return half-read eeprom on error ucc_geth: Don't use RX clock as TX clock. cxgb3: Use CAP_SYS_RAWIO for firmware pcnet32: delete non NAPI code from driver. fs_enet: Fix a memory leak in fs_enet_mdio_probe [netdrvr] eexpress: IPv6 fails - multicast problems 3c59x: use netstats in net_device structure 3c980-TX needs EXTRA_PREAMBLE fix warning in drivers/net/appletalk/cops.c e1000e: Add support for BM PHYs on ICH9 uli526x: fix endianness issues in the setup frame uli526x: initialize the hardware prior to requesting interrupts ...	2008-05-08 19:03:26 -07:00
J.H.M. Dassen (Ray)	c67fa02799	net/ipv4: correct RFC 1122 section reference in comment RFC 1122 does not have a section 3.1.2.2. The requirement to silently discard datagrams with a bad checksum is in section 3.2.1.2 instead. Addresses http://bugzilla.kernel.org/show_bug.cgi?id=10611 Signed-off-by: J.H.M. Dassen (Ray) <jdassen@debian.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-08 01:11:04 -07:00
Ilpo Järvinen	62ab222783	tcp FRTO: SACK variant is errorneously used with NewReno Note: there's actually another bug in FRTO's SACK variant, which is the causing failure in NewReno case because of the error that's fixed here. I'll fix the SACK case separately (it's a separate bug really, though related, but in order to fix that I need to audit tp->snd_nxt usage a bit). There were two places where SACK variant of FRTO is getting incorrectly used even if SACK wasn't negotiated by the TCP flow. This leads to incorrect setting of frto_highmark with NewReno if a previous recovery was interrupted by another RTO. An eventual fallback to conventional recovery then incorrectly considers one or couple of segments as forward transmissions though they weren't, which then are not LOST marked during fallback making them "non-retransmittable" until the next RTO. In a bad case, those segments are really lost and are the only one left in the window. Thus TCP needs another RTO to continue. The next FRTO, however, could again repeat the same events making the progress of the TCP flow extremely slow. In order for these events to occur at all, FRTO must occur again in FRTOs step 3 while the key segments must be lost as well, which is not too likely in practice. It seems to most frequently with some small devices such as network printers that seem to accept TCP segments only in-order. In cases were key segments weren't lost, things get automatically resolved because those wrongly marked segments don't need to be retransmitted in order to continue. I found a reproducer after digging up relevant reports (few reports in total, none at netdev or lkml I know of), some cases seemed to indicate middlebox issues which seems now to be a false assumption some people had made. Bugzilla #10063 _might_ be related. Damon L. Chesser <damon@damtek.com> had a reproducable case and was kind enough to tcpdump it for me. With the tcpdump log it was quite trivial to figure out. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-08 01:09:11 -07:00
Linus Torvalds	4880d10927	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: net_cls_act: act_simple dont ignore realloc code iwlwifi: make IWLWIFI a tristate Revert "atm: Do not free already unregistered net device." dccp: return -EINVAL on invalid feature length irda: fix !PNP support for drivers/net/irda/smsc-ircc2.c irda: fix !PNP support in drivers/net/irda/nsc-ircc.c net_cls_act: Make act_simple use of netlink policy. ip: Use inline function dst_metric() instead of direct access to dst->metric[] ip: Make use of the inline function dst_metric_locked() atm: Bad locking on br2684_devs modifications. atm: Do not free already unregistered net device. mac80211: Do not free net device after it is unregistered. bridge: Consolidate error paths in br_add_bridge(). bridge: Net device leak in br_add_bridge(). niu: Fix probing regression for maramba on-board chips. lapbeth: Release ->ethdev when unregistering device. xfrm: convert empty xfrm_audit_* macros to functions net: Fix useless comment reference loop. sch_htb: remove from event queue in htb_parent_to_leaf()	2008-05-06 07:49:20 -07:00
Satoru SATOH	5ffc02a158	ip: Use inline function dst_metric() instead of direct access to dst->metric[] There are functions to refer to the value of dst->metric[THE_METRIC-1] directly without use of a inline function "dst_metric" defined in net/dst.h. The following patch changes them to use the inline function consistently. Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-04 22:14:42 -07:00
Satoru SATOH	0bbeafd011	ip: Make use of the inline function dst_metric_locked() Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-04 22:12:43 -07:00
Linus Torvalds	4f9faaace2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (47 commits) rose: Wrong list_lock argument in rose_node seqops netns: Fix reassembly timer to use the right namespace netns: Fix device renaming for sysfs bnx2: Update version to 1.7.5. bnx2: Update RV2P firmware for 5709. bnx2: Zero out context memory for 5709. bnx2: Fix register test on 5709. bnx2: Fix remote PHY initial link state. bnx2: Refine remote PHY locking. bridge: forwarding table information for >256 devices tg3: Update version to 3.92 tg3: Add link state reporting to UMP firmware tg3: Fix ethtool loopback test for 5761 BX devices tg3: Fix 5761 NVRAM sizes tg3: Use constant 500KHz MI clock on adapters with a CPMU hci_usb.h: fix hard-to-trigger race dccp: ccid2.c, ccid3.c use clamp(), clamp_t() net: remove NR_CPUS arrays in net/core/dev.c net: use get/put_unaligned_* helpers bluetooth: use get/put_unaligned_* helpers ...	2008-05-03 10:18:21 -07:00
Harvey Harrison	d3e2ce3bcd	net: use get/put_unaligned_* helpers Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 16:26:16 -07:00
Denis V. Lunev	84841c3c6c	ipv4: assign PDE->data before gluing PDE into /proc tree The check for PDE->data != NULL becomes useless after the replacement of proc_net_fops_create with proc_create_data. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 04:10:08 -07:00
Denis V. Lunev	6e79d85d9a	netfilter: assign PDE->data before gluing PDE into /proc tree Simply replace proc_create and further data assigned with proc_create_data. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-02 02:45:42 -07:00
Roman Zippel	6f6d6a1a6a	rename div64_64 to div64_u64 Rename div64_64 to div64_u64 to make it consistent with the other divide functions, so it clearly includes the type of the divide. Move its definition to math64.h as currently no architecture overrides the generic implementation. They can still override it of course, but the duplicated declarations are avoided. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Cc: Avi Kivity <avi@qumranet.com> Cc: Russell King <rmk@arm.linux.org.uk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: David Howells <dhowells@redhat.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "David S. Miller" <davem@davemloft.net> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-05-01 08:03:58 -07:00
Harvey Harrison	ab59859de1	net: fix returning void-valued expression warnings drivers/net/8390.c:37:2: warning: returning void-valued expression drivers/net/bnx2.c:1635:3: warning: returning void-valued expression drivers/net/xen-netfront.c:1806:2: warning: returning void-valued expression net/ipv4/tcp_hybla.c:105:3: warning: returning void-valued expression net/ipv4/tcp_vegas.c:171:3: warning: returning void-valued expression net/ipv4/tcp_veno.c:123:3: warning: returning void-valued expression net/sysctl_net.c:85:2: warning: returning void-valued expression Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Acked-by: Alan Cox <alan@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-05-01 02:47:38 -07:00
Linus Torvalds	95dfec6ae1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (53 commits) tcp: Overflow bug in Vegas [IPv4] UFO: prevent generation of chained skb destined to UFO device iwlwifi: move the selects to the tristate drivers ipv4: annotate a few functions __init in ipconfig.c atm: ambassador: vcc_sf semaphore to mutex MAINTAINERS: The socketcan-core list is subscribers-only. netfilter: nf_conntrack: padding breaks conntrack hash on ARM ipv4: Update MTU to all related cache entries in ip_rt_frag_needed() sch_sfq: use del_timer_sync() in sfq_destroy() net: Add compat support for getsockopt (MCAST_MSFILTER) net: Several cleanups for the setsockopt compat support. ipvs: fix oops in backup for fwmark conn templates bridge: kernel panic when unloading bridge module bridge: fix error handling in br_add_if() netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets netfilter: x_tables: fix net namespace leak when reading /proc/net/xxx_tables_names netfilter: xt_TCPOPTSTRIP: signed tcphoff for ipv6_skip_exthdr() retval tcp: Limit cwnd growth when deferring for GSO tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled [netdrvr] gianfar: Determine TBIPA value dynamically ...	2008-04-30 08:45:48 -07:00
Lachlan Andrew	159131149c	tcp: Overflow bug in Vegas From: Lachlan Andrew <lachlan.andrew@gmail.com> There is an overflow bug in net/ipv4/tcp_vegas.c for large BDPs (e.g. 400Mbit/s, 400ms). The multiplication (old_wnd * vegas->baseRTT) << V_PARAM_SHIFT overflows a u32. [ Fix tcp_veno.c too, it has similar calculations. -DaveM ] Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-30 01:04:03 -07:00
Kostya B	be9164e769	[IPv4] UFO: prevent generation of chained skb destined to UFO device Problem: ip_append_data() could wrongly generate a chained skb for devices which support UFO. When sk_write_queue is not empty (e.g. MSG_MORE), __instead__ of appending data into the next nr_frag of the queued skb, a new chained skb is created. I would normally assume UFO device should get data in nr_frags and not in frag_list. Later the udp4_hwcsum_outgoing() resets csum to NONE and skb_gso_segment() has oops. Proposal: 1. Even length is less than mtu, employ ip_ufo_append_data() and append data to the __existed__ skb in the sk_write_queue. 2. ip_ufo_append_data() is fixed due to a wrong manipulation of peek-ing and later enqueue-ing of the same skb. Now, enqueuing is always performed, because on error the further ip_flush_pending_frames() would release the queued skb. Signed-off-by: Kostya B <bkostya@hotmail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 22:36:30 -07:00
Sam Ravnborg	45e741b890	ipv4: annotate a few functions __init in ipconfig.c A few functions are only used from __init context. So annotate these with __init for consistency and silence the following warnings: WARNING: net/ipv4/built-in.o(.text+0x2a876): Section mismatch in reference from the function ic_bootp_init() to the variable .init.data:bootp_packet_type WARNING: net/ipv4/built-in.o(.text+0x2a907): Section mismatch in reference from the function ic_bootp_cleanup() to the variable .init.data:bootp_packet_type Note: The warnings only appear with CONFIG_DEBUG_SECTION_MISMATCH=y Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 20:58:15 -07:00
Hirofumi Nakagawa	801678c5a3	Remove duplicated unlikely() in IS_ERR() Some drivers have duplicated unlikely() macros. IS_ERR() already has unlikely() in itself. This patch cleans up such pointless code. Signed-off-by: Hirofumi Nakagawa <hnakagawa@miraclelinux.com> Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Jeff Garzik <jeff@garzik.org> Cc: Paul Clements <paul.clements@steeleye.com> Cc: Richard Purdie <rpurdie@rpsys.net> Cc: Alessandro Zummo <a.zummo@towertech.it> Cc: David Brownell <david-b@pacbell.net> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Michael Halcrow <mhalcrow@us.ibm.com> Cc: Anton Altaparmakov <aia21@cantab.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Carsten Otte <cotte@de.ibm.com> Cc: Patrick McHardy <kaber@trash.net> Cc: Paul Mundt <lethal@linux-sh.org> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Takashi Iwai <tiwai@suse.de> Acked-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2008-04-29 08:06:25 -07:00
Philip Craig	443a70d50b	netfilter: nf_conntrack: padding breaks conntrack hash on ARM commit `0794935e` "[NETFILTER]: nf_conntrack: optimize hash_conntrack()" results in ARM platforms hashing uninitialised padding. This padding doesn't exist on other architectures. Fix this by replacing NF_CT_TUPLE_U_BLANK() with memset() to ensure everything is initialised. There were only 4 bytes that NF_CT_TUPLE_U_BLANK() wasn't clearing anyway (or 12 bytes on ARM). Signed-off-by: Philip Craig <philipc@snapgear.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:35:10 -07:00
Timo Teras	0010e46577	ipv4: Update MTU to all related cache entries in ip_rt_frag_needed() Add struct net_device parameter to ip_rt_frag_needed() and update MTU to cache entries where ifindex is specified. This is similar to what is already done in ip_rt_redirect(). Signed-off-by: Timo Teras <timo.teras@iki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:32:25 -07:00
David L Stevens	42908c69f6	net: Add compat support for getsockopt (MCAST_MSFILTER) This patch adds support for getsockopt for MCAST_MSFILTER for both IPv4 and IPv6. It depends on the previous setsockopt patch, and uses the same method. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:23:22 -07:00
Julian Anastasov	2ad17defd5	ipvs: fix oops in backup for fwmark conn templates Fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=10556 where conn templates with protocol=IPPROTO_IP can oops backup box. Result from ip_vs_proto_get() should be checked because protocol value can be invalid or unsupported in backup. But for valid message we should not fail for templates which use IPPROTO_IP. Also, add checks to validate message limits and connection state. Show state NONE for templates using IPPROTO_IP. Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:21:23 -07:00
Arnaud Ebalard	9a732ed6d0	netfilter: {nfnetlink,ip,ip6}_queue: fix skb_over_panic when enlarging packets While reinjecting bigger modified versions of IPv6 packets using libnetfilter_queue, things work fine on a 2.6.24 kernel (2.6.22 too) but I get the following on recents kernels (2.6.25, trace below is against today's net-2.6 git tree): skb_over_panic: text:c04fddb0 len:696 put:632 head:f7592c00 data:f7592c00 tail:0xf7592eb8 end:0xf7592e80 dev:eth0 ------------[ cut here ]------------ invalid opcode: 0000 [#1] PREEMPT Process sendd (pid: 3657, ti=f6014000 task=f77c31d0 task.ti=f6014000) Stack: c071e638 c04fddb0 000002b8 00000278 f7592c00 f7592c00 f7592eb8 f7592e80 f763c000 f6bc5200 f7592c40 f6015c34 c04cdbfc f6bc5200 00000278 f6015c60 c04fddb0 00000020 f72a10c0 f751b420 00000001 0000000a 000002b8 c065582c Call Trace: [<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0 [<c04cdbfc>] ? skb_put+0x3c/0x40 [<c04fddb0>] ? nfqnl_recv_verdict+0x1c0/0x2e0 [<c04fd115>] ? nfnetlink_rcv_msg+0xf5/0x160 [<c04fd03e>] ? nfnetlink_rcv_msg+0x1e/0x160 [<c04fd020>] ? nfnetlink_rcv_msg+0x0/0x160 [<c04f8ed7>] ? netlink_rcv_skb+0x77/0xa0 [<c04fcefc>] ? nfnetlink_rcv+0x1c/0x30 [<c04f8c73>] ? netlink_unicast+0x243/0x2b0 [<c04cfaba>] ? memcpy_fromiovec+0x4a/0x70 [<c04f9406>] ? netlink_sendmsg+0x1c6/0x270 [<c04c8244>] ? sock_sendmsg+0xc4/0xf0 [<c011970d>] ? set_next_entity+0x1d/0x50 [<c0133a80>] ? autoremove_wake_function+0x0/0x40 [<c0118f9e>] ? __wake_up_common+0x3e/0x70 [<c0342fbf>] ? n_tty_receive_buf+0x34f/0x1280 [<c011d308>] ? __wake_up+0x68/0x70 [<c02cea47>] ? copy_from_user+0x37/0x70 [<c04cfd7c>] ? verify_iovec+0x2c/0x90 [<c04c837a>] ? sys_sendmsg+0x10a/0x230 [<c011967a>] ? __dequeue_entity+0x2a/0xa0 [<c011970d>] ? set_next_entity+0x1d/0x50 [<c0345397>] ? pty_write+0x47/0x60 [<c033d59b>] ? tty_default_put_char+0x1b/0x20 [<c011d2e9>] ? __wake_up+0x49/0x70 [<c033df99>] ? tty_ldisc_deref+0x39/0x90 [<c033ff20>] ? tty_write+0x1a0/0x1b0 [<c04c93af>] ? sys_socketcall+0x7f/0x260 [<c0102ff9>] ? sysenter_past_esp+0x6a/0x91 [<c05f0000>] ? snd_intel8x0m_probe+0x270/0x6e0 ======================= Code: 00 00 89 5c 24 14 8b 98 9c 00 00 00 89 54 24 0c 89 5c 24 10 8b 40 50 89 4c 24 04 c7 04 24 38 e6 71 c0 89 44 24 08 e8 c4 46 c5 ff <0f> 0b eb fe 55 89 e5 56 89 d6 53 89 c3 83 ec 0c 8b 40 50 39 d0 EIP: [<c04ccdfc>] skb_over_panic+0x5c/0x60 SS:ESP 0068:f6015bf8 Looking at the code, I ended up in nfq_mangle() function (called by nfqnl_recv_verdict()) which performs a call to skb_copy_expand() due to the increased size of data passed to the function. AFAICT, it should ask for 'diff' instead of 'diff - skb_tailroom(e->skb)'. Because the resulting sk_buff has not enough space to support the skb_put(skb, diff) call a few lines later, this results in the call to skb_over_panic(). The patch below asks for allocation of a copy with enough space for mangled packet and the same amount of headroom as old sk_buff. While looking at how the regression appeared (`e2b58a67`), I noticed the same pattern in ipq_mangle_ipv6() and ipq_mangle_ipv4(). The patch corrects those locations too. Tested with bigger reinjected IPv6 packets (nfqnl_mangle() path), things are ok (2.6.25 and today's net-2.6 git tree). Signed-off-by: Arnaud Ebalard <arno@natisbad.org> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:16:34 -07:00
John Heffner	246eb2af06	tcp: Limit cwnd growth when deferring for GSO This fixes inappropriately large cwnd growth on sender-limited flows when GSO is enabled, limiting cwnd growth to 64k. Signed-off-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:13:52 -07:00
John Heffner	ce447eb914	tcp: Allow send-limited cwnd to grow up to max_burst when gso disabled This changes the logic in tcp_is_cwnd_limited() so that cwnd may grow up to tcp_max_burst() even when sk_can_gso() is false, or when sysctl_tcp_tso_win_divisor != 0. Signed-off-by: John Heffner <johnwheffner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-29 03:13:02 -07:00
Linus Torvalds	77a50df2b1	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: iwlwifi: Allow building iwl3945 without iwl4965. wireless: Fix compile error with wifi & leds tcp: Fix slab corruption with ipv6 and tcp6fuzz ipv4/ipv6 compat: Fix SSM applications on 64bit kernels. [IPSEC]: Use digest_null directly for auth sunrpc: fix missing kernel-doc can: Fix copy_from_user() results interpretation Revert "ipv6: Fix typo in net/ipv6/Kconfig" tipc: endianness annotations ipv6: result of csum_fold() is already 16bit, no need to cast [XFRM] AUDIT: Fix flowlabel text format ambibuity.	2008-04-28 09:44:11 -07:00
Evgeniy Polyakov	9ae27e0adb	tcp: Fix slab corruption with ipv6 and tcp6fuzz From: Evgeniy Polyakov <johnpol@2ka.mipt.ru> This fixes a regression added by `ec3c0982a2` ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") tcp_v6_do_rcv()->tcp_rcv_established(), the latter goes to step5, where eventually skb can be freed via tcp_data_queue() (drop: label), then if check for tcp_defer_accept_check() returns true and thus tcp_rcv_established() returns -1, which forces tcp_v6_do_rcv() to jump to reset: label, which in turn will pass through discard: label and free the same skb again. Tested by Eric Sesterhenn. Signed-off-by: David S. Miller <davem@davemloft.net> Acked-By: Patrick McManus <mcmanus@ducksong.com>	2008-04-27 15:27:30 -07:00
David L Stevens	dae5029548	ipv4/ipv6 compat: Fix SSM applications on 64bit kernels. Add support on 64-bit kernels for seting 32-bit compatible MCAST* socket options. Signed-off-by: David L Stevens <dlstevens@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-27 14:26:53 -07:00
Linus Torvalds	2e561c7b7e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (48 commits) net: Fix wrong interpretation of some copy_to_user() results. xfrm: alg_key_len & alg_icv_len should be unsigned [netdrvr] tehuti: move ioctl perm check closer to function start ipv6: Fix typo in net/ipv6/Kconfig via-velocity: fix vlan receipt tg3: sparse cleanup forcedeth: realtek phy crossover detection ibm_newemac: Increase MDIO timeouts gianfar: Fix skb allocation strategy netxen: reduce stack usage of netxen_nic_flash_print smc911x: test after postfix decrement fails in smc911x_{reset,drop_pkt} net drivers: fix platform driver hotplug/coldplug forcedeth: new backoff implementation ehea: make things static phylib: Add support for board-level PHY fixups [netdrvr] atlx: code movement: move atl1 parameter parsing atlx: remove flash vendor parameter korina: misc cleanup korina: fix misplaced return statement WAN: Fix confusing insmod error code for C101 too. ...	2008-04-25 12:28:28 -07:00
Tom Quetchenbach	8d390efd90	tcp: tcp_probe buffer overflow and incorrect return value tcp_probe has a bounds-checking bug that causes many programs (less, python) to crash reading /proc/net/tcp_probe. When it outputs a log line to the reader, it only checks if that line alone will fit in the reader's buffer, rather than that line and all the previous lines it has already written. tcpprobe_read also returns the wrong value if copy_to_user fails--it just passes on the return value of copy_to_user (number of bytes not copied), which makes a failure look like a success. This patch fixes the buffer overflow and sets the return value to -EFAULT if copy_to_user fails. Patch is against latest net-2.6; tested briefly and seems to fix the crashes in less and python. Signed-off-by: Tom Quetchenbach <virtualphtn@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-24 21:11:58 -07:00
Linus Torvalds	d02aacff44	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (22 commits) tun: Multicast handling in tun_chr_ioctl() needs proper locking. [NET]: Fix heavy stack usage in seq_file output routines. [AF_UNIX] Initialise UNIX sockets before general device initcalls [RTNETLINK]: Fix bogus ASSERT_RTNL warning iwlwifi: Fix built-in compilation of iwlcore (part 2) tun: Fix minor race in TUNSETLINK ioctl handling. ppp_generic: use stats from net_device structure iwlwifi: Don't unlock priv->mutex if it isn't locked wireless: rndis_wlan: modparam_workaround_interval is never below 0. prism54: prism54_get_encode() test below 0 on unsigned index mac80211: update mesh EID values b43: Workaround DMA quirks mac80211: fix use before check of Qdisc length net/mac80211/rx.c: fix off-by-one mac80211: Fix race between ieee80211_rx_bss_put and lookup routines. ath5k: Fix radio identification on AR5424/2424 ssb: Fix all-ones boardflags b43: Add more btcoexist workarounds b43: Fix HostFlags data types b43: Workaround invalid bluetooth settings ...	2008-04-24 08:40:34 -07:00
Pavel Emelyanov	5e659e4cb0	[NET]: Fix heavy stack usage in seq_file output routines. Plan C: we can follow the Al Viro's proposal about %n like in this patch. The same applies to udp, fib (the /proc/net/route file), rt_cache and sctp debug. This is minus ~150-200 bytes for each. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-24 01:02:16 -07:00
Linus Torvalds	b0d19a378a	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: iwlwifi: Fix built-in compilation of iwlcore net: Unexport move_addr_to_{kernel,user} rt2x00: Select LEDS_CLASS. iwlwifi: Select LEDS_CLASS. leds: Do not guard NEW_LEDS with HAS_IOMEM [IPSEC]: Fix catch-22 with algorithm IDs above 31 time: Export set_normalized_timespec. tcp: Make use of before macro in tcp_input.c hamradio: Remove unneeded and deprecated cli()/sti() calls in dmascc.c [NETNS]: Remove empty ->init callback. [DCCP]: Convert do_gettimeofday() to getnstimeofday(). [NETNS]: Don't initialize err variable twice. [NETNS]: The ip6_fib_timer can work with garbage on net namespace stop. [IPV4]: Convert do_gettimeofday() to getnstimeofday(). [IPV4]: Make icmp_sk_init() static. [IPV6]: Make struct ip6_prohibit_entry_template static. tcp: Trivial fix to correct function name in a comment in net/ipv4/tcp.c [NET]: Expose netdevice dev_id through sysfs skbuff: fix missing kernel-doc notation [ROSE]: Fix soft lockup wrt. rose_node_list_lock	2008-04-23 12:23:45 -07:00
Linus Torvalds	529a41e366	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: rose: Socket lock was not released before returning to user space hci_usb: remove code obfuscation drivers/net/appletalk: use time_before, time_before_eq, etc drivers/atm: use time_before, time_before_eq, etc hci_usb: do not initialize static variables to 0 tg3: 5701 DMA corruption fix atm nicstar: Removal of debug code containing deprecated calls to cli()/sti() iwlwifi: Fix unconditional access to station->tidp[].agg. netfilter: Fix SIP conntrack build with NAT disabled. netfilter: Fix SCTP nat build.	2008-04-21 15:46:17 -07:00
Arnd Hannemann	d7ee147d4f	tcp: Make use of before macro in tcp_input.c Make use of tcp before macro. Signed-off-by: Arnd Hannemann <hannemann@nets.rwth-aachen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-21 14:46:22 -07:00
YOSHIFUJI Hideaki	f25c3d613b	[IPV4]: Convert do_gettimeofday() to getnstimeofday(). What do_gettimeofday() does is to call getnstimeofday() and to convert the result from timespec{} to timeval{}. After that, these callers convert the result again to msec. Use getnstimeofday() and convert the units at once. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-21 02:34:08 -07:00
Adrian Bunk	263173af5b	[IPV4]: Make icmp_sk_init() static. This patch makes the needlessly global icmp_sk_init() static. Signed-off-by: Adrian Bunk <bunk@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-21 02:31:23 -07:00
Satoru SATOH	1f29b0584d	tcp: Trivial fix to correct function name in a comment in net/ipv4/tcp.c This is a trivial fix to correct function name in a comment in net/ipv4/tcp.c. Signed-off-by: Satoru SATOH <satoru.satoh@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-21 02:27:58 -07:00
Patrick McHardy	4e9d8a70e4	netfilter: Fix SCTP nat build. We need to select LIBCRC32C. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-19 17:52:51 -07:00
Matthew Wilcox	5f090dcb4d	net: Remove unnecessary inclusions of asm/semaphore.h None of these files use any of the functionality promised by asm/semaphore.h. It's possible that they rely on it dragging in some unrelated header file, but I can't build all these files, so we'll have fix any build failures as they come up. Signed-off-by: Matthew Wilcox <willy@linux.intel.com>	2008-04-18 22:15:50 -04:00
David S. Miller	1e42198609	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6	2008-04-17 23:56:30 -07:00
Pavel Emelyanov	53083773dc	[INET]: Uninline the __inet_inherit_port call. This deblats ~200 bytes when ipv6 and dccp are 'y'. Besides, this will ease compilation issues for patches I'm working on to make inet hash tables more scalable wrt net namespaces. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-17 23:18:15 -07:00
Linus Torvalds	b4b8f57965	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: [TCP]: Add return value indication to tcp_prune_ofo_queue(). PS3: gelic: fix the oops on the broken IE returned from the hypervisor b43legacy: fix DMA mapping leakage mac80211: remove message on receiving unexpected unencrypted frames Update rt2x00 MAINTAINERS entry Add rfkill to MAINTAINERS file rfkill: Fix device type check when toggling states b43legacy: Fix usage of struct device used for DMAing ssb: Fix usage of struct device used for DMAing MAINTAINERS: move to generic repository for iwlwifi b43legacy: fix initvals loading on bcm4303 rtl8187: Add missing priv->vif assignments netconsole: only set CON_PRINTBUFFER if the user specifies a netconsole [CAN]: Update documentation of struct sockaddr_can MAINTAINERS: isdn4linux@listserv.isdn4linux.de is subscribers-only [TCP]: Fix never pruned tcp out-of-order queue. [NET_SCHED] sch_api: fix qdisc_tree_decrease_qlen() loop	2008-04-16 07:44:27 -07:00
Denis V. Lunev	8c5da49a63	[NETNS]: Add netns refcnt debug for inet bind buckets. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 02:01:11 -07:00
Denis V. Lunev	57d7a60092	[NETNS]: Add netns refcnt debug into fib_info. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 02:00:50 -07:00
Denis V. Lunev	cd5342d905	[NETNS]: Add netns refcnt debug for timewait buckets. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 02:00:28 -07:00
Pavel Emelyanov	b0970c428b	[SIT]: Allow for IPPROTO_IPV6 protocol in namespaces. This makes sit-generated traffic enter the namespace. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:17:39 -07:00
Pavel Emelyanov	f96c148fd5	[GRE]: Allow for IPPROTO_GRE protocol in namespaces. This one was also disabled by default for sanity. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:11:36 -07:00
Pavel Emelyanov	0b67eceb19	[GRE]: Allow to create IPGRE tunnels in net namespaces. I.e. set the proper net and mark as NETNS_LOCAL. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:11:13 -07:00
Pavel Emelyanov	96635522f7	[GRE]: Use proper net in routing calls. As for the IPIP tunnel, there are some ip_route_output_key() calls in there that require a proper net so give one to them. And a proper net for the __get_dev_by_index hanging around. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:10:44 -07:00
Pavel Emelyanov	eb8ce741a3	[GRE]: Make tunnels hashes per-net. Very similar to what was done for the IPIP code. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:10:26 -07:00
Pavel Emelyanov	7daa000489	[GRE]: Make the fallback tunnel device per-net. Everything is prepared for this change now. Create on in init callback, use it over the code and destroy on net exit. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:10:05 -07:00
Pavel Emelyanov	3b4667f3db	[GRE]: Use proper net in hash-lookup functions. This is the part#2 of the patch #2 - get the proper net for these functions. This change in a separate patch in order not to get lost in a large previous patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:09:44 -07:00
Pavel Emelyanov	f57e7d5a7b	[GRE]: Add net/gre_net argument to some functions. The fallback device and hashes are to become per-net, but many code doesn't have anything to get the struct net pointer from. So pass the proper net there with an extra argument. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:09:22 -07:00
Pavel Emelyanov	59a4c7594b	[GRE]: Introduce empty ipgre_net structure and net init/exit ops. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:08:53 -07:00
Pavel Emelyanov	4597a0ce08	[IPIP]: Allow for IPPROTO_IPIP protocol in namespaces. This one was disabled by default for sanity. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:06:56 -07:00
Pavel Emelyanov	0a826406d4	[IPIP]: Allow to create IPIP tunnels in net namespaces. Set the proper net before calling register_netdev and disable the tunnel device netns changing. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:06:18 -07:00
Pavel Emelyanov	b99f0152e5	[IPIP]: Use proper net in (mostly) routing calls. There are some ip_route_output_key() calls in there that require a proper net so give one to them. Besides - give a proper net to a single __get_dev_by_index call in ipip_tunnel_bind_dev(). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:05:57 -07:00
Pavel Emelyanov	44d3c299dc	[IPIP]: Make tunnels hashes per net. Either net or ipip_net already exists in all the required places, so just use one. Besides, tune net_init and net_exit calls to respectively initialize the hashes and destroy devices. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:05:32 -07:00
Pavel Emelyanov	cec3ffae1a	[IPIP]: Use proper net in hash-lookup functions. This is the part#2 of the previous patch - get the proper net for these functions. I make it in a separate patch, so that this change does not get lost in a large previous patch. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:05:03 -07:00
Pavel Emelyanov	b9fae5c913	[IPIP]: Add net/ipip_net argument to some functions. The hashes of tunnels will be per-net too, so prepare all the functions that uses them for this change by adding an argument. Use init_net temporarily in places, where the net does not exist explicitly yet. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:04:35 -07:00
Pavel Emelyanov	b9855c54da	[IPIP]: Make the fallback tunnel device per-net. Create on in ipip_init_net(), use it all over the code (the proper place to get the net from already exists) and destroy in ipip_net_exit(). Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:04:13 -07:00
Pavel Emelyanov	10dc4c7bb7	[IPIP]: Introduce empty ipip_net structure and net init/exit ops. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-16 01:03:13 -07:00
Ilpo Järvinen	17515408a1	[TCP]: Remove superflushious skb == write_queue_tail() check Needed can only be more strict than what was checked by the earlier common case check for non-tail skbs, thus cwnd_len <= needed will never match in that case anyway. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-15 20:36:55 -07:00
Vitaliy Gusev	56f367bbfd	[TCP]: Add return value indication to tcp_prune_ofo_queue(). Returns non-zero if tp->out_of_order_queue was seen non-empty. This allows tcp_try_rmem_schedule() to return early. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-15 20:26:34 -07:00
Vitaliy Gusev	b000cd3707	[TCP]: Fix never pruned tcp out-of-order queue. tcp_prune_queue() doesn't prune an out-of-order queue at all. Therefore sk_rmem_schedule() can fail but the out-of-order queue isn't pruned . This can lead to tcp deadlock state if the next two conditions are held: 1. There are a sequence hole between last received in order segment and segments enqueued to the out-of-order queue. 2. Size of all segments in the out-of-order queue is more than tcp_mem[2]. Signed-off-by: Vitaliy Gusev <vgusev@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-15 00:33:38 -07:00
Linus Torvalds	533bb8a4d7	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (31 commits) [BRIDGE]: Fix crash in __ip_route_output_key with bridge netfilter [NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put [IPV6] ADDRCONF: Don't generate temporary address for ip6-ip6 interface. [IPV6] ADDRCONF: Ensure disabling multicast RS even if privacy extensions are disabled. [IPV6]: Use appropriate sock tclass setting for routing lookup. [IPV6]: IPv6 extension header structures need to be packed. [IPV6]: Fix ipv6 address fetching in raw6_icmp_error(). [NET]: Return more appropriate error from eth_validate_addr(). [ISDN]: Do not validate ISDN net device address prior to interface-up [NET]: Fix kernel-doc for skb_segment [SOCK] sk_stamp: should be initialized to ktime_set(-1L, 0) net: check for underlength tap writes net: make struct tun_struct private to tun.c [SCTP]: IPv4 vs IPv6 addresses mess in sctp_inet[6]addr_event. [SCTP]: Fix compiler warning about const qualifiers [SCTP]: Fix protocol violation when receiving an error lenght INIT-ACK [SCTP]: Add check for hmac_algo parameter in sctp_verify_param() [NET_SCHED] cls_u32: refcounting fix for u32_delete() [DCCP]: Fix skb->cb conflicts with IP [AX25]: Potential ax25_uid_assoc-s leaks on module unload. ...	2008-04-14 07:56:24 -07:00
YOSHIFUJI Hideaki	569508c964	[TCP]: Format addresses appropriately in debug messages. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 04:09:36 -07:00
YOSHIFUJI Hideaki	a7d632b6b4	[IPV4]: Use NIPQUAD_FMT to format ipv4 addresses. And use %u to format port. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 04:09:00 -07:00
David S. Miller	334f8b2afd	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6.26	2008-04-14 03:50:43 -07:00
Pavel Emelyanov	7477fd2e6b	[SOCK]: Add some notes about per-bind-bucket sock lookup. I was asked about "why don't we perform a sk_net filtering in bind_conflict calls, like we do in other sock lookup places" for a couple of times. Can we please add a comment about why we do not need one? Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 02:42:27 -07:00
David S. Miller	df39e8ba56	Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/ehea/ehea_main.c drivers/net/wireless/iwlwifi/Kconfig drivers/net/wireless/rt2x00/rt61pci.c net/ipv4/inet_timewait_sock.c net/ipv6/raw.c net/mac80211/ieee80211_sta.c	2008-04-14 02:30:23 -07:00
Jan Engelhardt	3c9fba656a	[NETFILTER]: nf_conntrack: replace NF_CT_DUMP_TUPLE macro indrection by function call Directly call IPv4 and IPv6 variants where the address family is easily known. Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:54 +02:00
Jan Engelhardt	12c33aa20e	[NETFILTER]: nf_conntrack: const annotations in nf_conntrack_sctp, nf_nat_proto_gre Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:54 +02:00
Jan Engelhardt	f2ea825f48	[NETFILTER]: nf_nat: use bool type in nf_nat_proto Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:53 +02:00
Jan Engelhardt	09f263cd39	[NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l4proto Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:53 +02:00
Jan Engelhardt	8ce8439a31	[NETFILTER]: nf_conntrack: use bool type in struct nf_conntrack_l3proto Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:52 +02:00
Patrick McHardy	5e8fbe2ac8	[NETFILTER]: nf_conntrack: add tuplehash l3num/protonum accessors Add accessors for l3num and protonum and get rid of some overly long expressions. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:52 +02:00
Patrick McHardy	dd13b01036	[NETFILTER]: nf_nat: kill helper and seq_adjust hooks Connection tracking helpers (specifically FTP) need to be called before NAT sequence numbers adjustments are performed to be able to compare them against previously seen ones. We've introduced two new hooks around 2.6.11 to maintain this ordering when NAT modules were changed to get called from conntrack helpers directly. The cost of netfilter hooks is quite high and sequence number adjustments are only rarely needed however. Add a RCU-protected sequence number adjustment function pointer and call it from IPv4 conntrack after calling the helper. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:52 +02:00
Patrick McHardy	8c87238b72	[NETFILTER]: nf_nat: don't add NAT extension for confirmed conntracks Adding extensions to confirmed conntracks is not allowed to avoid races on reallocation. Don't setup NAT for confirmed conntracks in case NAT module is loaded late. The has one side-effect, the connections existing before the NAT module was loaded won't enter the bysource hash. The only case where this actually makes a difference is in case of SNAT to a multirange where the IP before NAT is also part of the range. Since old connections don't enter the bysource hash the first new connection from the IP will have a new address selected. This shouldn't matter at all. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:51 +02:00
Patrick McHardy	42cf800c24	[NETFILTER]: nf_nat: remove obsolete check for ICMP redirects Locally generated ICMP packets have a reference to the conntrack entry of the original packet manually attached by icmp_send(). Therefore the check for locally originated untracked ICMP redirects can never be true. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:50 +02:00
Patrick McHardy	9d908a69a3	[NETFILTER]: nf_nat: add SCTP protocol support Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:50 +02:00
Patrick McHardy	4910a08799	[NETFILTER]: nf_nat: add DCCP protocol support Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:50 +02:00
Patrick McHardy	d63a650736	[NETFILTER]: Add partial checksum validation helper Move the UDP-Lite conntrack checksum validation to a generic helper similar to nf_checksum() and make it fall back to nf_checksum() in case the full packet is to be checksummed and hardware checksums are available. This is to be used by DCCP conntrack, which also needs to verify partial checksums. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:49 +02:00
Patrick McHardy	6185f870e2	[NETFILTER]: nf_nat: add UDP-Lite support Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:48 +02:00
Patrick McHardy	2d2d84c40e	[NETFILTER]: nf_nat: remove unused name from struct nf_nat_protocol Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:48 +02:00
Patrick McHardy	ca6a507490	[NETFILTER]: nf_conntrack_netlink: clean up NAT protocol parsing Move responsibility for setting the IP_NAT_RANGE_PROTO_SPECIFIED flag to the NAT protocol, properly propagate errors and get rid of ugly return value convention. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:47 +02:00
Patrick McHardy	535b57c7c1	[NETFILTER]: nf_nat: move NAT ctnetlink helpers to nf_nat_proto_common Move to nf_nat_proto_common and rename to nf_nat_proto_... since they're also used by protocols that don't have port numbers. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:47 +02:00
Patrick McHardy	5abd363f73	[NETFILTER]: nf_nat: fix random mode not to overwrite port rover The port rover should not get overwritten when using random mode, otherwise other rules will also use more or less random ports. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:46 +02:00
Patrick McHardy	937e0dfd87	[NETFILTER]: nf_nat: add helpers for common NAT protocol operations Add generic ->in_range and ->unique_tuple ops to avoid duplicating them again and again for future NAT modules and save a few bytes of text: net/ipv4/netfilter/nf_nat_proto_tcp.c: tcp_in_range \| -62 (removed) tcp_unique_tuple \| -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0 2 functions changed, 321 bytes removed net/ipv4/netfilter/nf_nat_proto_udp.c: udp_in_range \| -62 (removed) udp_unique_tuple \| -259 # 271 -> 12, # inlines: 1 -> 0, size inlines: 7 -> 0 2 functions changed, 321 bytes removed net/ipv4/netfilter/nf_nat_proto_gre.c: gre_in_range \| -62 (removed) 1 function changed, 62 bytes removed vmlinux: 5 functions changed, 704 bytes removed Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:46 +02:00
Patrick McHardy	544473c166	[NETFILTER]: {ip,ip6,arp}_tables: return EAGAIN for invalid SO_GET_ENTRIES size Rule dumping is performed in two steps: first userspace gets the ruleset size using getsockopt(SO_GET_INFO) and allocates memory, then it calls getsockopt(SO_GET_ENTRIES) to actually dump the ruleset. When another process changes the ruleset in between the sizes from the first getsockopt call doesn't match anymore and the kernel aborts. Unfortunately it returns EAGAIN, as for multiple other possible errors, so userspace can't distinguish this case from real errors. Return EAGAIN so userspace can retry the operation. Fixes (with current iptables SVN version) netfilter bugzilla #104. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:45 +02:00
Jan Engelhardt	c2f9c68398	[NETFILTER]: Explicitly initialize .priority in arptable_filter Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:44 +02:00
Jan Engelhardt	3bb0362d2f	[NETFILTER]: remove arpt_(un)register_target indirection macros Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:44 +02:00
Jan Engelhardt	95eea855af	[NETFILTER]: remove arpt_target indirection macro Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:43 +02:00
Jan Engelhardt	4abff0775d	[NETFILTER]: remove arpt_table indirection macro Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:43 +02:00
Jan Engelhardt	72b72949db	[NETFILTER]: annotate rest of nf_nat_* with const Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:42 +02:00
Jan Engelhardt	5452e425ad	[NETFILTER]: annotate {arp,ip,ip6,x}tables with const Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 11:15:35 +02:00
Jan Engelhardt	3cf93c96af	[NETFILTER]: annotate xtables targets with const and remove casts Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 09:56:05 +02:00
Robert P. J. Day	fdccecd0cc	[NETFILTER]: Use non-deprecated __RW_LOCK_UNLOCKED macro Signed-off-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 09:56:03 +02:00
Alexey Dobriyan	666953df35	[NETFILTER]: ip_tables: per-netns FILTER/MANGLE/RAW tables for real Commit `9335f047fe` aka "[NETFILTER]: ip_tables: per-netns FILTER, MANGLE, RAW" added per-netns _view_ of iptables rules. They were shown to user, but ignored by filtering code. Now that it's possible to at least ping loopback, per-netns tables can affect filtering decisions. netns is taken in case of PRE_ROUTING, LOCAL_IN -- from in device, POST_ROUTING, LOCAL_OUT -- from out device, FORWARD -- from in device which should be equal to out device's netns. This code is relatively new, so BUG_ON was plugged. Wrappers were added to a) keep code the same from CONFIG_NET_NS=n users (overwhelming majority), b) consolidate code in one place -- similar changes will be done in ipv6 and arp netfilter code. Signed-off-by: Alexey Dobriyan <adobriyan@sw.ru> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 09:56:02 +02:00
Patrick McHardy	36e2a1b0f7	[NETFILTER]: {ip,ip6}t_LOG: print MARK value in log output Dump the mark value in log messages similar to nfnetlink_log. This is useful for debugging complex setups where marks are used for routing or traffic classification. Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 09:56:01 +02:00
Pavel Emelyanov	4dee959723	[NETFILTER]: ipt_CLUSTERIP: fix race between clusterip_config_find_get and _entry_put Consider we are putting a clusterip_config entry with the "entries" count == 1, and on the other CPU there's a clusterip_config_find_get in progress: CPU1: CPU2: clusterip_config_entry_put: clusterip_config_find_get: if (atomic_dec_and_test(&c->entries)) { /* true / read_lock_bh(&clusterip_lock); c = __clusterip_config_find(clusterip); / found - it's still in list */ ... atomic_inc(&c->entries); read_unlock_bh(&clusterip_lock); write_lock_bh(&clusterip_lock); list_del(&c->list); write_unlock_bh(&clusterip_lock); ... dev_put(c->dev); Oops! We have an entry returned by the clusterip_config_find_get, which is a) not in list b) has a stale dev pointer. The problems will happen when the CPU2 will release the entry - it will remove it from the list for the 2nd time, thus spoiling it, and will put a stale dev pointer. The fix is to make atomic_dec_and_test under the clusterip_lock. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Patrick McHardy <kaber@trash.net>	2008-04-14 00:44:52 -07:00
Gerrit Renker	7de6c03336	[SKB]: __skb_append = __skb_queue_after This expresses __skb_append in terms of __skb_queue_after, exploiting that __skb_append(old, new, list) = __skb_queue_after(list, old, new). Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-14 00:05:09 -07:00
Denis V. Lunev	5f4472c5a6	[TCP]: Remove owner from tcp_seq_afinfo. Move it to tcp_seq_afinfo->seq_fops as should be. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:13:53 -07:00
Denis V. Lunev	68fcadd16c	[TCP]: Place file operations directly into tcp_seq_afinfo. No need to have separate never-used variable. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:13:30 -07:00
Denis V. Lunev	52d6f3f11b	[TCP]: Cleanup /proc/tcp[6] creation/removal. Replace seq_open with seq_open_net and remove tcp_seq_release completely. seq_release_net will do this job just fine. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:12:41 -07:00
Denis V. Lunev	9427c4b36b	[TCP]: Move seq_ops from tcp_iter_state to tcp_seq_afinfo. No need to create seq_operations for each instance of 'netstat'. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:12:13 -07:00
Denis V. Lunev	1abf4fb20d	[TCP]: No need to check afinfo != NULL in tcp_proc_(un)register. tcp_proc_register/tcp_proc_unregister are called with a static pointer only. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:11:46 -07:00
Denis V. Lunev	a4146b1b2c	[TCP]: Replace struct net on tcp_iter_state with seq_net_private. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 22:11:14 -07:00
Gerrit Renker	ac6f781920	[INET]: sk_reuse is valbool sk_reuse is declared as "unsigned char", but is set as type valbool in net/core/sock.c. There is no other place in net/ where sk->sk_reuse is set to a value > 1, so the test "sk_reuse > 1" can not be true. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-13 21:50:08 -07:00
Linus Torvalds	14897e35fd	Merge branch 'docs' of git://git.lwn.net/linux-2.6 * 'docs' of git://git.lwn.net/linux-2.6: Add additional examples in Documentation/spinlocks.txt Move sched-rt-group.txt to scheduler/ Documentation: move rpc-cache.txt to filesystems/ Documentation: move nfsroot.txt to filesystems/ Spell out behavior of atomic_dec_and_lock() in kerneldoc Fix a typo in highres.txt Fixes to the seq_file document Fill out information on patch tags in SubmittingPatches Add the seq_file documentation	2008-04-11 13:24:16 -07:00
J. Bruce Fields	6ded55da6b	Documentation: move nfsroot.txt to filesystems/ Documentation/ is a little large, and filesystems/ seems an obvious place for this file. Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu> Signed-off-by: Jonathan Corbet <corbet@lwn.net>	2008-04-11 13:18:01 -06:00
Daniel Lezcano	7951f0b03a	[NETNS][IPV6] tcp - assign the netns for timewait sockets Copy the network namespace from the socket to the timewait socket. Signed-off-by: Daniel Lezcano <dlezcano@fr.ibm.com> Acked-by: Mark Lord <mlord@pobox.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 20:53:10 -07:00
Stephen Hemminger	c0b8c32b1c	IPV4: use xor rather than multiple ands for route compare The comparison in ip_route_input is a hot path, by recoding the C "and" as bit operations, fewer conditional branches get generated so the code should be faster. Maybe someday Gcc will be smart enough to do this? Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Acked-by: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 04:00:28 -07:00
Stephen Hemminger	387a5487f5	ipv4: fib_trie leaf free optimization Avoid unneeded test in the case where object to be freed has to be a leaf. Don't need to use the generic tnode_free() function, instead just setup leaf to be freed. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 03:47:34 -07:00
Stephen Hemminger	ef3660ce06	ipv4: fib_trie remove unused argument The trie pointer is passed down to flush_list and flush_leaf but never used. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 03:46:12 -07:00
Florian Westphal	4dfc281702	[Syncookies]: Add support for TCP options via timestamps. Allow the use of SACK and window scaling when syncookies are used and the client supports tcp timestamps. Options are encoded into the timestamp sent in the syn-ack and restored from the timestamp echo when the ack is received. Based on earlier work by Glenn Griffin. This patch avoids increasing the size of structs by encoding TCP options into the least significant bits of the timestamp and by not using any 'timestamp offset'. The downside is that the timestamp sent in the packet after the synack will increase by several seconds. changes since v1: don't duplicate timestamp echo decoding function, put it into ipv4/syncookie.c and have ipv6/syncookies.c use it. Feedback from Glenn Griffin: fix line indented with spaces, kill redundant if () Reviewed-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 03:12:40 -07:00
Stephen Hemminger	15be75cdb5	IPV4: fib_trie use vmalloc for large tnodes Use vmalloc rather than alloc_pages to avoid wasting memory. The problem is that tnode structure has a power of 2 sized array, plus a header. So the current code wastes almost half the memory allocated because it always needs the next bigger size to hold that small header. This is similar to an earlier patch by Eric, but instead of a list and lock, I used a workqueue to handle the fact that vfree can't be done in interrupt context. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 02:56:38 -07:00
Stephen Hemminger	2fa7527ba1	IPV4: route rekey timer can be deferrable No urgency on the rehash interval timer, so mark it as deferrable. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 01:55:27 -07:00
Stephen Hemminger	1294fc4a48	IPV4: route use jhash3 Since route hash is a triple, use jhash_3words rather doing the mixing directly. This should be as fast and give better distribution. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 01:54:01 -07:00
Stephen Hemminger	5969f71d57	IPV4: route inline changes Don't mark functions that are large as inline, let compiler decide. Also, use inline rather than __inline__. Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 01:52:09 -07:00
David S. Miller	951e07c930	[IPV4]: Fix byte value boundary check in do_ip_getsockopt(). This fixes kernel bugzilla 10371. As reported by M.Piechaczek@osmosys.tv, if we try to grab a char sized socket option value, as in: unsigned char ttl = 255; socklen_t len = sizeof(ttl); setsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len); getsockopt(socket, IPPROTO_IP, IP_MULTICAST_TTL, &ttl, &len); The ttl returned will be wrong on big-endian, and on both little- endian and big-endian the next three bytes in userspace are written with garbage. It's because of this test in do_ip_getsockopt(): if (len < sizeof(int) && len > 0 && val>=0 && val<255) { It should allow a 'val' of 255 to pass here, but it doesn't so it copies a full 'int' back to userspace. On little-endian that will write the correct value into the location but it spams on the next three bytes in userspace. On big endian it writes the wrong value into the location and spams the next three bytes. Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-10 01:29:36 -07:00
Jan Engelhardt	475959d477	[NETFILTER]: nf_nat: autoload IPv4 connection tracking Without this patch, the generic L3 tracker would kick in if nf_conntrack_ipv4 was not loaded before nf_nat, which would lead to translation problems with ICMP errors. NAT does not make sense without IPv4 connection tracking anyway, so just add a call to need_ipv4_conntrack(). Signed-off-by: Jan Engelhardt <jengelh@computergmbh.de> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-09 15:14:58 -07:00
Ilpo Järvinen	6adb4f733e	[TCP]: Don't allow FRTO to take place while MTU is being probed MTU probe can cause some remedies for FRTO because the normal packet ordering may be violated allowing FRTO to make a wrong decision (it might not be that serious threat for anything though). Thus it's safer to not run FRTO while MTU probe is underway. It seems that the basic FRTO variant should also look for an skb at probe_seq.start to check if that's retransmitted one but I didn't implement it now (plain seqno in window check isn't robust against wraparounds). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-07 22:33:57 -07:00
Ilpo Järvinen	882bebaaca	[TCP]: tcp_simple_retransmit can cause S+L This fixes Bugzilla #10384 tcp_simple_retransmit does L increment without any checking whatsoever for overflowing S+L when Reno is in use. The simplest scenario I can currently think of is rather complex in practice (there might be some more straightforward cases though). Ie., if mss is reduced during mtu probing, it may end up marking everything lost and if some duplicate ACKs arrived prior to that sacked_out will be non-zero as well, leading to S+L > packets_out, tcp_clean_rtx_queue on the next cumulative ACK or tcp_fastretrans_alert on the next duplicate ACK will fix the S counter. More straightforward (but questionable) solution would be to just call tcp_reset_reno_sack() in tcp_simple_retransmit but it would negatively impact the probe's retransmission, ie., the retransmissions would not occur if some duplicate ACKs had arrived. So I had to add reno sacked_out reseting to CA_Loss state when the first cumulative ACK arrives (this stale sacked_out might actually be the explanation for the reports of left_out overflows in kernel prior to 2.6.23 and S+L overflow reports of 2.6.24). However, this alone won't be enough to fix kernel before 2.6.24 because it is building on top of the commit `1b6d427bb7` ([TCP]: Reduce sacked_out with reno when purging write_queue) to keep the sacked_out from overflowing. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Reported-by: Alessandro Suardi <alessandro.suardi@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-07 22:33:07 -07:00
Ilpo Järvinen	c137f3dda0	[TCP]: Fix NewReno's fast rexmit/recovery problems with GSOed skb Fixes a long-standing bug which makes NewReno recovery crippled. With GSO the whole head skb was marked as LOST which is in violation of NewReno procedure that only wants to mark one packet and ended up breaking our TCP code by causing counter overflow because our code was built on top of assumption about valid NewReno procedure. This manifested as triggering a WARN_ON for the overflow in a number of places. It seems relatively safe alternative to just do nothing if tcp_fragment fails due to oom because another duplicate ACK is likely to be received soon and the fragmentation will be retried. Special thanks goes to Soeren Sonnenburg <kernel@nn7.de> who was lucky enough to be able to reproduce this so that the warning for the overflow was hit. It's not as easy task as it seems even if this bug happens quite often because the amount of outstanding data is pretty significant for the mismarkings to lead to an overflow. Because it's very late in 2.6.25-rc cycle (if this even makes in time), I didn't want to touch anything with SACK enabled here. Fragmenting might be useful for it as well but it's more or less a policy decision rather than mandatory fix. Thus there's no need to rush and we can postpone considering tcp_fragment with SACK for 2.6.26. In 2.6.24 and earlier, this very same bug existed but the effect is slightly different because of a small changes in the if conditions that fit to the patch's context. With them nothing got lost marker and thus no retransmissions happened. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-07 22:32:38 -07:00
Ilpo Järvinen	1b69d74539	[TCP]: Restore 2.6.24 mark_head_lost behavior for newreno/fack The fast retransmission can be forced locally to the rfc3517 branch in tcp_update_scoreboard instead of making such fragile constructs deeper in tcp_mark_head_lost. This is necessary for the next patch which must not have loopholes for cnt > packets check. As one can notice, readability got some improvements too because of this :-). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-07 22:31:38 -07:00
Denis V. Lunev	7feb49c82a	[NETNS]: Use TCP control socket from a correct namespace. Signed-off-by: Denis V.Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:32:00 -07:00
Denis V. Lunev	046ee90235	[NETNS]: Create tcp control socket in the each namespace. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:31:33 -07:00
Denis V. Lunev	5616bdd6df	[INET]: uc_ttl assignment in inet_ctl_sock_create is redundant. uc_ttl is initialized in inet(6)_create and never changed except setsockopt ioctl. Remove this assignment. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:30:12 -07:00
Denis V. Lunev	c1e9894d48	[ICMP]: Simplify ICMP control socket creation. Replace sock_create_kern with inet_ctl_sock_create. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:29:00 -07:00
Denis V. Lunev	5677242f43	[NETNS]: Inet control socket should not hold a namespace. This is a generic requirement, so make inet_ctl_sock_create namespace aware and create a inet_ctl_sock_destroy wrapper around sk_release_kernel. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:28:30 -07:00
Denis V. Lunev	eee4fe4ded	[INET]: Let inet_ctl_sock_create return sock rather than socket. All upper protocol layers are already use sock internally. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:27:58 -07:00
Denis V. Lunev	3d58b5fa8e	[INET]: Rename inet_csk_ctl_sock_create to inet_ctl_sock_create. This call is nothing common with INET connection sockets code. It simply creates an unhashes kernel sockets for protocol messages. Move the new call into af_inet.c after the rename. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:22:32 -07:00
Denis V. Lunev	14c0c8e8e0	[TCP]: Replace socket with sock for reset sending. Replace tcp_socket with tcp_sock. This is more effective (less derefferences on fast paths). Additionally, the approach is unified to one used in ICMP. Signed-off-by: Denis V. Lunev <den@openvz.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 14:19:38 -07:00
Herbert Xu	af2681828a	[ICMP]: Ensure that ICMP relookup maintains status quo The ICMP relookup path is only meant to modify behaviour when appropriate IPsec policies are in place and marked as requiring relookups. It is certainly not meant to modify behaviour when IPsec policies don't exist at all. However, due to an oversight on the error paths existing behaviour may in fact change should one of the relookup steps fail. This patch corrects this by redirecting all errors on relookup failures to the previous code path. That is, if the initial xfrm_lookup let the packet pass, we will stand by that decision should the relookup fail due to an error. This should be safe from a security point-of-view because compliant systems must install a default deny policy so the packet would'nt have passed in that case. Many thanks to Julian Anastasov for pointing out this error. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-04-03 12:52:19 -07:00
David S. Miller	e1ec1b8ccd	Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 Conflicts: drivers/net/s2io.c	2008-04-02 22:35:23 -07:00
Pavel Emelyanov	fd4e7b5045	[IPV4][NETNS]: Display per-net info in sockstat file. Besides, now we can see per-net fragments statistics in the same file, since this stats is already per-net. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:43:18 -07:00
Pavel Emelyanov	d0538ca355	[SOCK][NETNS]: Register sockstat(6) files in each net. Currently they live in init_net only, but now almost all the info they can provide is available per-net. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:42:37 -07:00
Pavel Emelyanov	c29a0bc4df	[SOCK][NETNS]: Add a struct net argument to sock_prot_inuse_add and _get. This counter is about to become per-proto-and-per-net, so we'll need two arguments to determine which cell in this "table" to work with. All the places, but proc already pass proper net to it - proc will be tuned a bit later. Some indentation with spaces in proc files is done to keep the file coding style consistent. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:41:46 -07:00
YOSHIFUJI Hideaki	b50660f1fe	[IP] UDP: Use SEQ_START_TOKEN. Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-31 19:38:15 -07:00
Denis V. Lunev	4ad96d39a2	[UDP]: Remove owner from udp_seq_afinfo. Move it to udp_seq_afinfo->seq_fops as should be. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:25:53 -07:00
Denis V. Lunev	3ba9441bdf	[UDP]: Place file operations directly into udp_seq_afinfo. No need to have separate never-used variable. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:25:32 -07:00
Denis V. Lunev	a2be75c182	[UDP]: Cleanup /proc/udp[6] creation/removal. Replace seq_open with seq_open_net and remove udp_seq_release completely. seq_release_net will do this job just fine. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:25:06 -07:00
Denis V. Lunev	dda61925f8	[UDP]: Move seq_ops from udp_iter_state to udp_seq_afinfo. No need to create seq_operations for each instance of 'netstat'. Signed-off-by: Denis V. Lunev <den@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2008-03-28 18:24:26 -07:00

... 2 3 4 5 6 ...

2700 commits