Commit Graph

1660 Commits (16e906812f885cf16d95577dba260db6375ba571)

Author SHA1 Message Date
David S. Miller 26722873a4 [TCP]: Describe tcp_init_cwnd() thoroughly in a comment.
People often get tripped up by this function and think that
it does not implemented the prescribed algorithms from
RFC2414 and RFC3390, even though it does.

So add a comment to head off such misunderstandings in the
future.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-26 18:35:36 -07:00
Flavio Leitner a96fb49be3 [NET]: Fix IP_ADD/DROP_MEMBERSHIP to handle only connectionless
Fix IP[V6]_ADD_MEMBERSHIP and IP[V6]_DROP_MEMBERSHIP to
return -EPROTO for connection oriented sockets.

Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-26 18:35:35 -07:00
Nick Bowler 96fe1c0237 [IPSEC] AH4: Update IPv4 options handling to conform to RFC 4302.
In testing our ESP/AH offload hardware, I discovered an issue with how
AH handles mutable fields in IPv4.  RFC 4302 (AH) states the following
on the subject:

        For IPv4, the entire option is viewed as a unit; so even
        though the type and length fields within most options are immutable
        in transit, if an option is classified as mutable, the entire option
        is zeroed for ICV computation purposes.

The current implementation does not zero the type and length fields,
resulting in authentication failures when communicating with hosts
that do (i.e. FreeBSD).

I have tested record route and timestamp options (ping -R and ping -T)
on a small network involving Windows XP, FreeBSD 6.2, and Linux hosts,
with one router.  In the presence of these options, the FreeBSD and
Linux hosts (with the patch or with the hardware) can communicate.
The Windows XP host simply fails to accept these packets with or
without the patch.

I have also been trying to test source routing options (using
traceroute -g), but haven't had much luck getting this option to work
*without* AH, let alone with.

Signed-off-by: Nick Bowler <nbowler@ellipticsemi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-26 18:35:33 -07:00
Patrick McHardy 45241a7a07 [NETFILTER]: nf_nat_sip: don't drop short packets
Don't drop packets shorter than "SIP/2.0", just ignore them. Keep-alives
can validly be shorter for example.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-14 13:14:58 -07:00
Heiko Carstens cae7ca3d3d [IPVS]: Use IP_VS_WAIT_WHILE when encessary.
For architectures that don't have a volatile atomic_ts constructs like
while (atomic_read(&something)); might result in endless loops since a
barrier() is missing which forces the compiler to generate code that
actually reads memory contents.
Fix this in ipvs by using the IP_VS_WAIT_WHILE macro which resolves to
while (expr) { cpu_relax(); }
(why isn't this open coded btw?)

Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Simon Horman <horms@verge.net.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-13 22:52:15 -07:00
Jesper Juhl f49f9967b2 [IPV4]: Clean up duplicate includes in net/ipv4/
This patch cleans up duplicate includes in
	net/ipv4/

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-13 22:52:02 -07:00
Joakim Tjernlund dcbdc93c6c [IPCONFIG]: ip_auto_config fix
The following commandline:

 root=/dev/mtdblock6 rw rootfstype=jffs2 ip=192.168.1.10:::255.255.255.0:localhost.localdomain:eth1:off console=ttyS0,115200

makes ip_auto_config fall back to DHCP and complain "IP-Config: Incomplete
network configuration information." depending on if CONFIG_IP_PNP_DHCP is
set or not.

The only way I can make ip_auto_config accept my IP config is to add an
entry for the server IP:

ip=192.168.1.10:192.168.1.15::255.255.255.0:localhost.localdomain:eth1:off

I think this is a bug since I am not using a NFS root FS.

The following patch fixes the above problem.

From: Andrew Morton <akpm@linux-foundation.org>

Davem said (in February!):

  Well, first of all the change in question is not in 2.4.x either.  I just
  checked the current 2.4.x GIT tree and the test is exactly:

	if (ic_myaddr == INADDR_NONE ||
#ifdef CONFIG_ROOT_NFS
	    (MAJOR(ROOT_DEV) == UNNAMED_MAJOR
	     && root_server_addr == INADDR_NONE
	     && ic_servaddr == INADDR_NONE) ||
#endif
	    ic_first_dev->next) {

  which matches 2.6.x

  I even checked 2.4.x when it was branched for 2.5.x and the test was the
  same at the point in time too.

  Looking at the proposed change a bit it appears that it is probably
  correct, as it's trying to check that ROOT_DEV is nfs root.  But if it is
  correct then the UNNAMED_MAJOR comparison in the same code block should be
  removed as it becomes superfluous.

  I'm happy to apply this patch with that modification made.

Signed-off-by: Joakim Tjernlund <joakim.tjernlund@transmode.se>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-13 22:51:59 -07:00
Stephen Hemminger f34d1955df [TCP]: H-TCP maxRTT estimation at startup
Small patch to H-TCP from Douglas Leith. 

Fix estimation of maxRTT.  The original code ignores rtt measurements
during slow start (via the check tp->snd_ssthresh < 0xFFFF) yet this
is probably a good time to try to estimate max rtt as delayed acking
is disabled and slow start will only exit on a loss which presumably
corresponds to a maxrtt measurement.  Second, the original code (via
the check htcp_ccount(ca) > 3) ignores rtt data during what it
estimates to be the first 3 round-trip times.  This seems like an
unnecessary check now that the RCV timestamp are no longer used
for rtt estimation.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-07 18:29:05 -07:00
Patrick McHardy 591e620693 [NETFILTER]: nf_nat: add symbolic dependency on IPv4 conntrack
Loading nf_nat causes the conntrack core to be loaded, but we need IPv4 as
well.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-07 18:12:01 -07:00
Jesper Juhl 3af8e31cf5 [NETFILTER]: ipt_recent: avoid a possible NULL pointer deref in recent_seq_open()
If the call to seq_open() returns != 0 then the code calls 
kfree(st) but then on the very next line proceeds to 
dereference the pointer - not good.

Problem spotted by the Coverity checker.

Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-07 18:10:54 -07:00
Ilpo Järvinen 49ff4bb4cd [TCP]: DSACK signals data receival, be conservative
In case a DSACK is received, it's better to lower cwnd as it's
a sign of data receival.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-02 19:47:59 -07:00
Ilpo Järvinen 2e6052941a [TCP]: Also handle snd_una changes in tcp_cwnd_down
tcp_cwnd_down must check for it too as it should be conservative
in case of collapse stuff and also when receiver is trying to
lie (though that wouldn't be very successful/useful anyway).

Note:
- Separated also is_dupack and do_lost in fast_retransalert
	* Much cleaner look-and-feel now
	* This time it really fixes cumulative ACK with many new
	  SACK blocks recovery entry (I claimed this fixes with
	  last patch but it wasn't). TCP will now call
	  tcp_update_scoreboard regardless of is_dupack when
	  in recovery as long as there is enough fackets_out.
- Introduce FLAG_SND_UNA_ADVANCED
	* Some prior_snd_una arguments are unnecessary after it
- Added helper FLAG_ANY_PROGRESS to avoid long FLAG...|FLAG...
  constructs

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-02 19:46:58 -07:00
David S. Miller 3516ffb0fe [TCP]: Invoke tcp_sendmsg() directly, do not use inet_sendmsg().
As discovered by Evegniy Polyakov, if we try to sendmsg after
a connection reset, we can do incredibly stupid things.

The core issue is that inet_sendmsg() tries to autobind the
socket, but we should never do that for TCP.  Instead we should
just go straight into TCP's sendmsg() code which will do all
of the necessary state and pending socket error checks.

TCP's sendpage already directly vectors to tcp_sendpage(), so this
merely brings sendmsg() in line with that.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-02 19:42:28 -07:00
Mariusz Kozlowski 1bcabbdb0b [IPV4] route.c: mostly kmalloc + memset conversion to k[cz]alloc
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-02 19:42:27 -07:00
Mariusz Kozlowski 4487b2f657 [IPV4] raw.c: kmalloc + memset conversion to kzalloc
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-02 19:42:26 -07:00
Mariusz Kozlowski 8adc546552 [NETFILTER] nf_conntrack_l3proto_ipv4_compat.c: kmalloc + memset conversion to kzalloc
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-02 19:42:24 -07:00
Mariusz Kozlowski 376407039c [IPV4] ip_options.c: kmalloc + memset conversion to kzalloc
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 14:06:45 -07:00
Ilpo Järvinen b8ed601cef [TCP]: Bidir flow must not disregard SACK blocks for lost marking
It's possible that new SACK blocks that should trigger new LOST
markings arrive with new data (which previously made is_dupack
false). In addition, I think this fixes a case where we get
a cumulative ACK with enough SACK blocks to trigger the fast
recovery (is_dupack would be false there too).

I'm not completely pleased with this solution because readability
of the code is somewhat questionable as 'is_dupack' in SACK case
is no longer about dupacks only but would mean something like
'lost_marker_work_todo' too... But because of Eifel stuff done
in CA_Recovery, the FLAG_DATA_SACKED check cannot be placed to
the if statement which seems attractive solution. Nevertheless,
I didn't like adding another variable just for that either... :-)

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:28:31 -07:00
Ilpo Järvinen 1e757f9996 [TCP]: Fix ratehalving with bidirectional flows
Actually, the ratehalving seems to work too well, as cwnd is
reduced on every second ACK even though the packets in flight
remains unchanged. Recoveries in a bidirectional flows suffer
quite badly because of this, both NewReno and SACK are affected.

After this patch, rate halving is performed for ACK only if
packets in flight was supposedly changed too.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:28:30 -07:00
Herbert Xu b217d616a1 [IPV4/IPV6]: Fail registration if inet device construction fails
Now that netdev notifications can fail, we can use this to signal
errors during registration for IPv4/IPv6.  In particular, if we
fail to allocate memory for the inet device, we can fail the netdev
registration.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:28:16 -07:00
Herbert Xu ccc7911fbd [IPVS]: Use skb_forward_csum
As a path that forwards packets, IPVS should be using
skb_forward_csum instead of directly setting ip_summed
to CHECKSUM_NONE.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:28:11 -07:00
Stephen Hemminger 113bbbd8d2 [TCP]: htcp - use measured rtt
Change HTCP to use measured RTT rather than smooth RTT.
Srtt is computed using the TCP receive timestamp
options, so it is vulnerable to hostile receivers. To avoid any problems
this might cause use the measured RTT instead.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:27:59 -07:00
Stephen Hemminger e7d0c88586 [TCP]: cubic - eliminate use of receive time stamp
Remove use of received timestamp option value from RTT calculation in Cubic.
A hostile receiver may be returning a larger timestamp option than the original
value. This would cause the sender to believe the malevolent receiver had
a larger RTT and because Cubic tries to provide some RTT friendliness, the
sender would then favor the liar.

Instead, use the jiffie resolutionRTT value already computed and
passed back after ack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:27:58 -07:00
Stephen Hemminger 30cfd0baf0 [TCP]: congestion control API pass RTT in microseconds
This patch changes the API for the callback that is done after an ACK is
received. It solves a couple of issues:

  * Some congestion controls want higher resolution value of RTT
    (controlled by TCP_CONG_RTT_SAMPLE flag). These don't really want a ktime, but
    all compute a RTT in microseconds.

  * Other congestion control could use RTT at jiffies resolution.

To keep API consistent the units should be the same for both cases, just the
resolution should change.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-31 02:27:57 -07:00
Al Viro a34c45896a netfilter endian regressions
no real bugs, just misannotations cropping up

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-26 11:11:56 -07:00
Patrick McHardy 7e2acc7e27 [NETFILTER]: Fix logging regression
Loading one of the LOG target fails if a different target has already
registered itself as backend for the same family. This can affect the
ipt_LOG and ipt_ULOG modules when both are loaded.

Reported and tested by: <t.artem@mailcity.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-24 15:29:55 -07:00
Patrick McHardy fc7b93800b [IPV4]: Fix inetpeer gcc-4.2 warnings
CC      net/ipv4/inetpeer.o
net/ipv4/inetpeer.c: In function 'unlink_from_pool':
net/ipv4/inetpeer.c:297: warning: the address of 'stack' will always evaluate as 'true'
net/ipv4/inetpeer.c:297: warning: the address of 'stack' will always evaluate as 'true'
net/ipv4/inetpeer.c: In function 'inet_getpeer':
net/ipv4/inetpeer.c:409: warning: the address of 'stack' will always evaluate as 'true'
net/ipv4/inetpeer.c:409: warning: the address of 'stack' will always evaluate as 'true'

"Fix" by checking for != NULL.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-20 19:39:17 -07:00
Paul Mundt 20c2df83d2 mm: Remove slab destructors from kmem_cache_create().
Slab destructors were no longer supported after Christoph's
c59def9f22 change. They've been
BUGs for both slab and slub, and slob never supported them
either.

This rips out support for the dtor pointer from kmem_cache_create()
completely and fixes up every single callsite in the kernel (there were
about 224, not including the slab allocator definitions themselves,
or the documentation references).

Signed-off-by: Paul Mundt <lethal@linux-sh.org>
2007-07-20 10:11:58 +09:00
Al Viro c65c5131b3 missed cong_avoid() instance
Removal of rtt argument in ->cong_avoid() had missed tcp_htcp.c
instance.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 16:29:55 -07:00
Linus Torvalds ce8c2293be Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
* 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: (25 commits)
  [TG3]: Fix msi issue with kexec/kdump.
  [NET] XFRM: Fix whitespace errors.
  [NET] TIPC: Fix whitespace errors.
  [NET] SUNRPC: Fix whitespace errors.
  [NET] SCTP: Fix whitespace errors.
  [NET] RXRPC: Fix whitespace errors.
  [NET] ROSE: Fix whitespace errors.
  [NET] RFKILL: Fix whitespace errors.
  [NET] PACKET: Fix whitespace errors.
  [NET] NETROM: Fix whitespace errors.
  [NET] NETFILTER: Fix whitespace errors.
  [NET] IPV4: Fix whitespace errors.
  [NET] DCCP: Fix whitespace errors.
  [NET] CORE: Fix whitespace errors.
  [NET] BLUETOOTH: Fix whitespace errors.
  [NET] AX25: Fix whitespace errors.
  [PATCH] mac80211: remove rtnl locking in ieee80211_sta.c
  [PATCH] mac80211: fix GCC warning on 64bit platforms
  [GENETLINK]: Dynamic multicast groups.
  [NETLIKN]: Allow removing multicast groups.
  ...
2007-07-19 10:23:21 -07:00
Yoann Padioleau dd00cc486a some kmalloc/memset ->kzalloc (tree wide)
Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).

Here is a short excerpt of the semantic patch performing
this transformation:

@@
type T2;
expression x;
identifier f,fld;
expression E;
expression E1,E2;
expression e1,e2,e3,y;
statement S;
@@

 x =
- kmalloc
+ kzalloc
  (E1,E2)
  ...  when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
- memset((T2)x,0,E1);

@@
expression E1,E2,E3;
@@

- kzalloc(E1 * E2,E3)
+ kcalloc(E1,E2,E3)

[akpm@linux-foundation.org: get kcalloc args the right way around]
Signed-off-by: Yoann Padioleau <padator@wanadoo.fr>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Acked-by: Russell King <rmk@arm.linux.org.uk>
Cc: Bryan Wu <bryan.wu@analog.com>
Acked-by: Jiri Slaby <jirislaby@gmail.com>
Cc: Dave Airlie <airlied@linux.ie>
Acked-by: Roland Dreier <rolandd@cisco.com>
Cc: Jiri Kosina <jkosina@suse.cz>
Acked-by: Dmitry Torokhov <dtor@mail.ru>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Acked-by: Mauro Carvalho Chehab <mchehab@infradead.org>
Acked-by: Pierre Ossman <drzeus-list@drzeus.cx>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Acked-by: Greg KH <greg@kroah.com>
Cc: James Bottomley <James.Bottomley@steeleye.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:50 -07:00
Michael Ellerman 9e367d8592 jprobes: remove JPROBE_ENTRY()
AFAICT now that jprobe.entry is a void *, JPROBE_ENTRY doesn't do anything
useful - so remove it ..

I've left a do-nothing version so that out-of-tree jprobes code will still
compile without modifications.

Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: Prasanna S Panchamukhi <prasanna@in.ibm.com>
Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com>
Cc: Anil S Keshavamurthy <anil.s.keshavamurthy@intel.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-07-19 10:04:44 -07:00
YOSHIFUJI Hideaki 9c681b43fa [NET] IPV4: Fix whitespace errors.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
2007-07-19 10:43:47 +09:00
Stephen Hemminger 16751347a0 [TCP]: remove unused argument to cong_avoid op
None of the existing TCP congestion controls use the rtt value pased
in the ca_ops->cong_avoid interface.  Which is lucky because seq_rtt
could have been -1 when handling a duplicate ack.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-18 01:46:58 -07:00
Ilpo Järvinen 0a9f2a467d [TCP]: Verify the presence of RETRANS bit when leaving FRTO
For yet unknown reason, something cleared SACKED_RETRANS bit
underneath FRTO.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-15 00:19:29 -07:00
Jean Delvare 1b1ac759d7 [IPV4]: Cleanup call to __neigh_lookup()
Back in the times of Linux 2.2, negative values for the creat parameter
of __neigh_lookup() had a particular meaning, but no longer, so we
should pass 1 instead.

Signed-off-by: Jean Delvare <khali@linux-fr.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:51:44 -07:00
Patrick McHardy 61075af51f [NETFILTER]: nf_conntrack: mark protocols __read_mostly
Also remove two unnecessary EXPORT_SYMBOLs and move the
nf_conntrack_l3proto_ipv4 declaration to the correct file.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:48:19 -07:00
Patrick McHardy a887c1c148 [NETFILTER]: Lower *tables printk severity
Lower ip6tables, arptables and ebtables printk severity similar to
Dan Aloni's patch for iptables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:46:15 -07:00
Yasuyuki Kozakai 130e7a83d7 [NETFILTER]: nf_conntrack: Don't track locally generated special ICMP error
The conntrack assigned to locally generated ICMP error is usually the one
assigned to the original packet which has caused the error. But if
the original packet is handled as invalid by nf_conntrack, no conntrack
is assigned to the original packet. Then nf_ct_attach() cannot assign
any conntrack to the ICMP error packet. In that case the current
nf_conntrack_icmp assigns appropriate conntrack to it. But the current
code mistakes the direction of the packet. As a result, NAT code mistakes
the address to be mangled.

To fix the bug, this changes nf_conntrack_icmp not to assign conntrack
to such ICMP error. Actually no address is necessary to be mangled
in this case.

Spotted by Jordan Russell.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:45:41 -07:00
Yasuyuki Kozakai e2a3123fbe [NETFILTER]: nf_conntrack: Introduces nf_ct_get_tuplepr and uses it
nf_ct_get_tuple() requires the offset to transport header and that bothers
callers such as icmp[v6] l4proto modules. This introduces new function
to simplify them.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:45:14 -07:00
Yasuyuki Kozakai ffc3069048 [NETFILTER]: nf_conntrack: make l3proto->prepare() generic and renames it
The icmp[v6] l4proto modules parse headers in ICMP[v6] error to get tuple.
But they have to find the offset to transport protocol header before that.
Their processings are almost same as prepare() of l3proto modules.
This makes prepare() more generic to simplify icmp[v6] l4proto module
later.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 20:44:50 -07:00
Adrian Bunk acd159b6b5 [INET_SOCK]: make net/ipv4/inet_timewait_sock.c:__inet_twsk_kill() static
This patch makes the needlessly global __inet_twsk_kill() static.

Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 19:00:59 -07:00
Stephen Hemminger b3b0b681b1 [TCP]: tcp probe add back ssthresh field
Sangtae noticed the ssthresh got missed.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-14 18:57:19 -07:00
Linus Torvalds e030dbf91a Merge branch 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop
* 'ioat-md-accel-for-linus' of git://lost.foo-projects.org/~dwillia2/git/iop: (28 commits)
  ioatdma: add the unisys "i/oat" pci vendor/device id
  ARM: Add drivers/dma to arch/arm/Kconfig
  iop3xx: surface the iop3xx DMA and AAU units to the iop-adma driver
  iop13xx: surface the iop13xx adma units to the iop-adma driver
  dmaengine: driver for the iop32x, iop33x, and iop13xx raid engines
  md: remove raid5 compute_block and compute_parity5
  md: handle_stripe5 - request io processing in raid5_run_ops
  md: handle_stripe5 - add request/completion logic for async expand ops
  md: handle_stripe5 - add request/completion logic for async read ops
  md: handle_stripe5 - add request/completion logic for async check ops
  md: handle_stripe5 - add request/completion logic for async compute ops
  md: handle_stripe5 - add request/completion logic for async write ops
  md: common infrastructure for running operations with raid5_run_ops
  md: raid5_run_ops - run stripe operations outside sh->lock
  raid5: replace custom debug PRINTKs with standard pr_debug
  raid5: refactor handle_stripe5 and handle_stripe6 (v3)
  async_tx: add the async_tx api
  xor: make 'xor_blocks' a library routine for use with async_tx
  dmaengine: make clients responsible for managing channels
  dmaengine: refactor dmaengine around dma_async_tx_descriptor
  ...
2007-07-13 10:52:27 -07:00
Stephen Hemminger 662ad4f8ef [TCP]: tcp probe wraparound handling and other changes
Switch from formatting messages in probe routine and copying with
kfifo, to using a small circular queue of information and formatting
on read.  This avoids wraparound issues with kfifo, and saves one
copy.

Also make sure to state correct license, rather than copying off some
other driver I started with.

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-11 19:45:39 -07:00
Andrew Morton e00c5d8b4d I/OAT: warning fix
net/ipv4/tcp.c: In function 'tcp_recvmsg':
net/ipv4/tcp.c:1111: warning: unused variable 'available'

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Chris Leech <christopher.leech@intel.com>
2007-07-11 16:10:53 -07:00
Chris Leech 2b1244a43b I/OAT: Only offload copies for TCP when there will be a context switch
The performance wins come with having the DMA copy engine doing the copies
in parallel with the context switch.  If there is enough data ready on the
socket at recv time just use a regular copy.

Signed-off-by: Chris Leech <christopher.leech@intel.com>
2007-07-11 16:10:53 -07:00
Philippe De Muyter 56b3d975bb [NET]: Make all initialized struct seq_operations const.
Make all initialized struct seq_operations in net/ const

Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 23:07:31 -07:00
Patrick McHardy 3be550f34b [UDP]: Fix length check.
Rémi Denis-Courmont wrote:
> Right. By the way, shouldn't "len" rather be signed in there?
> 
> 		unsigned int len;
> 
> 		/* if we're overly short, let UDP handle it */
> 		len = skb->len - sizeof(struct udphdr);
> 		if (len <= 0)
> 			goto udp;

It should, but the < 0 case can't happen since __udp4_lib_rcv
already makes sure that we have at least a complete UDP header.

Anyways, this patch fixes it.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 23:06:43 -07:00
Patrick McHardy cfbba49d80 [NET]: Avoid copying writable clones in tunnel drivers
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:19:05 -07:00
Philippe De Muyter 4839c52b01 [IPV4]: Make ip_tos2prio const.
Signed-off-by: Philippe De Muyter <phdm@macqel.be>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:19:04 -07:00
Dan Aloni 0236e667e1 [NETFILTER] net/ipv4/netfilter/ip_tables.c: lower printk severity
Signed-off-by: Dan Aloni <da-x@monatomic.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:53 -07:00
Patrick McHardy 0d53778e81 [NETFILTER]: Convert DEBUGP to pr_debug
Convert DEBUGP to pr_debug and fix lots of non-compiling debug statements.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:20 -07:00
Patrick McHardy d3c3f4243e [NETFILTER]: ipt_CLUSTERIP: add compat code
Adjust structure size and don't expect pointers passed in from
userspace to be valid. Also replace an enum in an ABI structure
by a fixed size type.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:17 -07:00
Patrick McHardy 3569b621ce [NETFILTER]: ipt_SAME: add to feature-removal-schedule
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:16 -07:00
Patrick McHardy 5d08ad440f [NETFILTER]: nf_conntrack_expect: convert proc functions to hash
Convert from the global expectation list to the hash table.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:18:00 -07:00
Patrick McHardy d4156e8cd9 [NETFILTER]: nf_conntrack: reduce masks to a subset of tuples
Since conntrack currently allows to use masks for every bit of both
helper and expectation tuples, we can't hash them and have to keep
them on two global lists that are searched for every new connection.

This patch removes the never used ability to use masks for the
destination part of the expectation tuple and completely removes
masks from helpers since the only reasonable choice is a full
match on l3num, protonum and src.u.all.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:55 -07:00
Patrick McHardy 6823645d60 [NETFILTER]: nf_conntrack_expect: function naming unification
Currently there is a wild mix of nf_conntrack_expect_, nf_ct_exp_,
expect_, exp_, ...

Consistently use nf_ct_ as prefix for exported functions.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:53 -07:00
Patrick McHardy 53aba5979e [NETFILTER]: nf_nat: use hlists for bysource hash
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:43 -07:00
Patrick McHardy 330f7db5e5 [NETFILTER]: nf_conntrack: remove 'ignore_conntrack' argument from nf_conntrack_find_get
All callers pass NULL, this also doesn't seem very useful for modules.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:41 -07:00
Patrick McHardy f205c5e0c2 [NETFILTER]: nf_conntrack: use hlists for conntrack hash
Convert conntrack hash to hlists to reduce its size and cache
footprint. Since the default hashsize to max. entries ratio
sucks (1:16), this patch doesn't reduce the amount of memory
used for the hash by default, but instead uses a better ratio
of 1:8, which results in the same max. entries value.

One thing worth noting is early_drop. It really should use LRU,
so it now has to iterate over the entire chain to find the last
unconfirmed entry. Since chains shouldn't be very long and the
entire operation is very rare this shouldn't be a problem.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:40 -07:00
Patrick McHardy 61eb3107cd [NETFILTER]: nf_conntrack_extend: use __read_mostly for struct nf_ct_ext_type
Also make them static.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:38 -07:00
Yasuyuki Kozakai b6b84d4a94 [NETFILTER]: nf_nat: merge nf_conn and nf_nat_info
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:37 -07:00
Yasuyuki Kozakai d8a0509a69 [NETFILTER]: nf_nat: kill global 'destroy' operation
This kills the global 'destroy' operation which was used by NAT.
Instead it uses the extension infrastructure so that multiple
extensions can register own operations.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:36 -07:00
Yasuyuki Kozakai dacd2a1a5c [NETFILTER]: nf_conntrack: remove old memory allocator of conntrack
Now memory space for help and NAT are allocated by extension
infrastructure.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:35 -07:00
Yasuyuki Kozakai ff09b7493c [NETFILTER]: nf_nat: remove unused nf_nat_module_is_loaded
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:34 -07:00
Yasuyuki Kozakai 2d59e5ca8c [NETFILTER]: nf_nat: use extension infrastructure
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:20 -07:00
Yasuyuki Kozakai e54cbc1f91 [NETFILTER]: nf_nat: add reference to conntrack from entry of bysource list
I will split 'struct nf_nat_info' out from conntrack. So I cannot use
'offsetof' to get the pointer to conntrack from it.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:19 -07:00
Yasuyuki Kozakai ceceae1b15 [NETFILTER]: nf_conntrack: use extension infrastructure for helper
Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:18 -07:00
Patrick McHardy 9f15c5302d [NETFILTER]: x_tables: mark matches and targets __read_mostly
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:15 -07:00
Jozsef Kadlecsik ba9dda3ab5 [NETFILTER]: x_tables: add TRACE target
The TRACE target can be used to follow IP and IPv6 packets through
the ruleset.

Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Signed-off-by: Patrick NcHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:14 -07:00
Jerome Borsboom f4a607bfae [NETFILTER]: nf_nat_sip: only perform RTP DNAT if SIP session was SNATed
DNAT of the the RTP session is only necessary if the SIP session has
been SNATed.

Signed-off-by: Jerome Borsboom <j.borsboom@erasmusmc.nl>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:12 -07:00
Jan Engelhardt 7c4e36bc17 [NETFILTER]: Remove redundant parentheses/braces
Removes redundant parentheses and braces (And add one pair in a
xt_tcpudp.c macro).

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:11 -07:00
Jan Engelhardt 170b197c0a [NETFILTER]: Remove incorrect inline markers
device_cmp: the function's address is taken (call to nf_ct_iterate_cleanup)
alloc_null_binding: referenced externally

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:02 -07:00
Jan Engelhardt a47362a226 [NETFILTER]: add some consts, remove some casts
Make a number of variables const and/or remove unneeded casts.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:17:01 -07:00
Jan Engelhardt e1931b784a [NETFILTER]: x_tables: switch xt_target->checkentry to bool
Switch the return type of target checkentry functions to boolean.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:59 -07:00
Jan Engelhardt ccb79bdce7 [NETFILTER]: x_tables: switch xt_match->checkentry to bool
Switch the return type of match functions to boolean

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:58 -07:00
Jan Engelhardt 1d93a9cbad [NETFILTER]: x_tables: switch xt_match->match to bool
Switch the return type of match functions to boolean

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:57 -07:00
Jan Engelhardt cff533ac12 [NETFILTER]: x_tables: switch hotdrop to bool
Switch the "hotdrop" variables to boolean

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:56 -07:00
James Chapman 067b207b28 [UDP]: Cleanup UDP encapsulation code
This cleanup fell out after adding L2TP support where a new encap_rcv
funcptr was added to struct udp_sock. Have XFRM use the new encap_rcv
funcptr, which allows us to move the XFRM encap code from udp.c into
xfrm4_input.c.

Make xfrm4_rcv_encap() static since it is no longer called externally.

Signed-off-by: James Chapman <jchapman@katalix.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:53 -07:00
Ilpo Järvinen d041005116 [TCP]: SACK fastpath did override adjusted fackets_out
Do same adjustment to SACK fastpath counters provided that
they're valid.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:16:24 -07:00
James Chapman 342f0234c7 [UDP]: Introduce UDP encapsulation type for L2TP
This patch adds a new UDP_ENCAP_L2TPINUDP encapsulation type for UDP
sockets. When a UDP socket's encap_type is UDP_ENCAP_L2TPINUDP, the
skb is delivered to a function pointed to by the udp_sock's
encap_rcv funcptr. If the skb isn't wanted by L2TP, it returns >0, which
causes it to be passed through to UDP.

Include padding to put the new encap_rcv field on a 4-byte boundary.

Previously, the only user of UDP encap sockets was ESP, so when
CONFIG_XFRM was not defined, some of the encap code was compiled
out. This patch changes that. As a result, udp_encap_rcv() will
now do a little more work when CONFIG_XFRM is not defined.

Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:57 -07:00
Stephen Hemminger d212f87b06 [NET]: IPV6 checksum offloading in network devices
The existing model for checksum offload does not correctly handle
devices that can offload IPV4 and IPV6 only. The NETIF_F_HW_CSUM flag
implies device can do any arbitrary protocol.

This patch:
 * adds NETIF_F_IPV6_CSUM for those devices
 * fixes bnx2 and tg3 devices that need it
 * add NETIF_F_IPV6_CSUM to ipv6 output (incl GSO)
 * fixes assumptions about NETIF_F_ALL_CSUM in nat
 * adjusts bridge union of checksumming computation

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:52 -07:00
Masahide NAKAMURA d3d6dd3ada [XFRM]: Add module alias for transformation type.
It is clean-up for XFRM type modules and adds aliases with its
protocol:
 ESP, AH, IPCOMP, IPIP and IPv6 for IPsec
 ROUTING and DSTOPTS for MIPv6

It is almost the same thing as XFRM mode alias, but it is added
new defines XFRM_PROTO_XXX for preprocessing since some protocols
are defined as enum.

Signed-off-by: Masahide NAKAMURA <nakam@linux-ipv6.org>
Acked-by: Ingo Oeser <netdev@axxeo.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:15:43 -07:00
Herbert Xu a7ab4b501f [TCPv4]: Improve BH latency in /proc/net/tcp
Currently the code for /proc/net/tcp disable BH while iterating
over the entire established hash table.  Even though we call
cond_resched_softirq for each entry, we still won't process
softirq's as regularly as we would otherwise do which results
in poor performance when the system is loaded near capacity.

This anomaly comes from the 2.4 code where this was all in a
single function and the local_bh_disable might have made sense
as a small optimisation.

The cost of each local_bh_disable is so small when compared
against the increased latency in keeping it disabled over a
large but mostly empty TCP established hash table that we
should just move it to the individual read_lock/read_unlock
calls as we do in inet_diag.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:06:20 -07:00
David S. Miller e06e7c6158 [IPV4]: The scheduled removal of multipath cached routing support.
With help from Chris Wedgwood.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-07-10 22:05:57 -07:00
Jens Axboe ddb61a57bb [TCP] tcp_read_sock: Allow recv_actor() return return negative error value.
tcp_read_sock() currently assumes that the recv_actor() only returns
number of bytes copied. For network splice receive, we may have to
return an error in some cases. So allow the actor to return a negative
error value.

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-23 23:07:50 -07:00
Neil Horman cc0191aeef [IPVS]: Fix state variable on failure to start ipvs threads
ip_vs currently fails to reset its ip_vs_sync_state variable if the
sync thread fails to start properly.  The result is that the kernel
will report a running daemon when their actuall is none.

If you issue the following commands:

1. ipvsadm --start-daemon master --mcast-interface bla
2. ipvsadm -L --daemon
3. ipvsadm --stop-daemon master

Assuming that bla is not an actual interface, step 2 should return no
data, but instead returns:

$ ipvsadm -L --daemon
master sync daemon (mcast=bla, syncid=0)

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-18 22:33:20 -07:00
Ilpo Järvinen 7769f4064c [TCP]: Fix logic breakage due to DSACK separation
Commit 6f74651ae6 is found guilty
of breaking DSACK counting, which should be done only for the
SACK block reported by the DSACK instead of every SACK block
that is received along with DSACK information.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-15 15:14:04 -07:00
Ilpo Järvinen b9ce204f0a [TCP]: Congestion control API RTT sampling fix
Commit 164891aadf broke RTT
sampling of congestion control modules. Inaccurate timestamps
could be fed to them without providing any way for them to
identify such cases. Previously RTT sampler was called only if
FLAG_RETRANS_DATA_ACKED was not set filtering inaccurate
timestamps nicely. In addition, the new behavior could give an
invalid timestamp (zero) to RTT sampler if only skbs with
TCPCB_RETRANS were ACKed. This solves both problems.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-15 15:08:43 -07:00
Ilpo Järvinen d7ea5b91fa [TCP]: Add missing break to TCP option parsing code
This flaw does not affect any behavior (currently).

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-14 12:58:26 -07:00
David S. Miller 66e1e3b20c [TCP]: Set initial_ssthresh default to zero in Cubic and BIC.
Because of the current default of 100, Cubic and BIC perform very
poorly compared to standard Reno.

In the worst case, this change makes Cubic and BIC as aggressive as
Reno.  So this change should be very safe.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-13 01:03:53 -07:00
Ilpo Järvinen af15cc7b85 [TCP]: Fix left_out setting during FRTO
Without FRTO, the tcp_try_to_open is never called with
lost_out > 0 (see tcp_time_to_recover). However, when FRTO is
enabled, the !tp->lost condition is not used until end of FRTO
because that way TCP avoids premature entry to fast recovery
during FRTO.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-12 16:16:44 -07:00
David S. Miller 3d7dbeac58 [TCP]: Disable TSO if MD5SIG is enabled.
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-12 14:36:42 -07:00
Paul Moore 50e5d35ce2 [CIPSO]: Fix several unaligned kernel accesses in the CIPSO engine.
IPv4 options are not very well aligned within the packet and the
format of a CIPSO option is even worse.  The result is that the CIPSO
engine in the kernel does a few unaligned accesses when parsing and
validating incoming packets with CIPSO options attached which generate
error messages on certain alignment sensitive platforms.  This patch
fixes this by marking these unaligned accesses with the
get_unaliagned() macro.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-08 13:33:10 -07:00
Paul Moore ba6ff9f2b5 [NetLabel]: consolidate the struct socket/sock handling to just struct sock
The current NetLabel code has some redundant APIs which allow both
"struct socket" and "struct sock" types to be used; this may have made
sense at some point but it is wasteful now.  Remove the functions that
operate on sockets and convert the callers.  Not only does this make
the code smaller and more consistent but it pushes the locking burden
up to the caller which can be more intelligent about the locks.  Also,
perform the same conversion (socket to sock) on the SELinux/NetLabel
glue code where it make sense.

Signed-off-by: Paul Moore <paul.moore@hp.com>
Acked-by: James Morris <jmorris@namei.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-08 13:33:09 -07:00
Herbert Xu 6363097cc4 [IPV4]: Do not remove idev when addresses are cleared
Now that we create idev before addresses are added, it no longer makes
sense to remove them when addresses are all deleted.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-08 13:33:08 -07:00
David S. Miller df2bc459a3 [UDP]: Revert 2-pass hashing changes.
This reverts changesets:

6aaf47fa48
b7b5f487ab
de34ed91c4
fc038410b4

There are still some correctness issues recently
discovered which do not have a known fix that doesn't
involve doing a full hash table scan on port bind.

So revert for now.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:40:50 -07:00
Dmitry Mishin 4c1b52bc7a [NETFILTER]: ip_tables: fix compat related crash
check_compat_entry_size_and_hooks iterates over the matches and calls
compat_check_calc_match, which loads the match and calculates the
compat offsets, but unlike the non-compat version, doesn't call
->checkentry yet. On error however it calls cleanup_matches, which in
turn calls ->destroy, which can result in crashes if the destroy
function (validly) expects to only get called after the checkentry
function.

Add a compat_release_match function that only drops the module reference
on error and rename compat_check_calc_match to compat_find_calc_match to
reflect the fact that it doesn't call the checkentry function.

Reported by Jan Engelhardt <jengelh@linux01.gwdg.de>

Signed-off-by: Dmitry Mishin <dim@openvz.org>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:40:32 -07:00
Patrick McHarrdy 3c158f7f57 [NETFILTER]: nf_conntrack: fix helper module unload races
When a helper module is unloaded all conntracks refering to it have their
helper pointer NULLed out, leading to lots of races. In most places this
can be fixed by proper use of RCU (they do already check for != NULL,
but in a racy way), additionally nf_conntrack_expect_related needs to
bail out when no helper is present.

Also remove two paranoid BUG_ONs in nf_conntrack_proto_gre that are racy
and not worth fixing.

Signed-off-by: Patrick McHarrdy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:40:26 -07:00
Patrick McHardy ef7c79ed64 [NETLINK]: Mark netlink policies const
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:40:10 -07:00
David S. Miller 14a49e1fd2 [TCP] tcp_probe: Attach printf attribute properly to printl().
GCC doesn't like the way Stephen initially did it:

net/ipv4/tcp_probe.c:83: warning: empty declaration

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:40:09 -07:00
Eric Dumazet 274707cff9 [TCP]: Use LIMIT_NETDEBUG in tcp_retransmit_timer().
LIMIT_NETDEBUG allows the admin to disable some warning messages (echo 0
 >/proc/sys/net/core/warnings).

The "TCP: Treason uncloaked!" message can use this facility.

Signed-off-by: Eric Dumazet <dada1@cosmosbay.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:40:08 -07:00
Herbert Xu 71e27da961 [IPV4]: Restore old behaviour of default config values
Previously inet devices were only constructed when addresses are added
(or rarely in ipmr).  Therefore the default config values they get are
the ones at the time of these operations.

Now that we're creating inet devices earlier, this changes the
behaviour of default config values in an incompatible way (see bug
#8519).

This patch creates a compromise by setting the default values at the
same point as before but only for those that have not been explicitly
set by the user since the inet device's creation.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:39:26 -07:00
Herbert Xu 31be308541 [IPV4]: Add default config support after inetdev_init
Previously once inetdev_init has been called on a device any changes
made to ipv4_devconf_dflt would have no effect on that device's
configuration.

This creates a problem since we have moved the point where
inetdev_init is called from when an address is added to where the
device is registered.

This patch is the first half of a set that tries to mimic the old
behaviour while still calling inetdev_init.

It propagates any changes to ipv4_devconf_dflt to those devices that
have not had the corresponding attribute set.

The next patch will forcibly set all values at the point where
inetdev_init was previously called.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:39:19 -07:00
Herbert Xu 42f811b8bc [IPV4]: Convert IPv4 devconf to an array
This patch converts the ipv4_devconf config members (everything except
sysctl) to an array.  This allows easier manipulation which will be
needed later on to provide better management of default config values.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:39:13 -07:00
Herbert Xu 8d76527e72 [IPV4]: Only panic if inetdev_init fails for loopback
When I made the inetdev_init call work on all devices I incorrectly
left in the panic call as well.  It is obviously undesirable to
panic on an allocation failure for a normal network device.  This
patch moves the panic call under the loopback if clause.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:39:03 -07:00
Patrick McHardy f0e48dbfc5 [TCP]: Honour sk_bound_dev_if in tcp_v4_send_ack
A time_wait socket inherits sk_bound_dev_if from the original socket,
but it is not used when sending ACK packets using ip_send_reply.

Fix by passing the oif to ip_send_reply in struct ip_reply_arg and
use it for output routing.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-07 13:38:51 -07:00
Patrick McHardy 6e1d91039b [ICMP]: Fix icmp_errors_use_inbound_ifaddr sysctl
Currently when icmp_errors_use_inbound_ifaddr is set and an ICMP error is
sent after the packet passed through ip_output(), an address from the
outgoing interface is chosen as ICMP source address since skb->dev doesn't
point to the incoming interface anymore.

Fix this by doing an interface lookup on rt->dst.iif and using that device.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03 18:08:51 -07:00
Wei Dong 584bdf8cbd [IPV4]: Fix "ipOutNoRoutes" counter error for TCP and UDP
Signed-off-by: Wei Dong <weidong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03 18:08:50 -07:00
Ilpo Järvinen 6418204f91 [TCP]: Fix GSO ignorance of pkts_acked arg (cong.cntrl modules)
The code used to ignore GSO completely, passing either way too
small or zero pkts_acked when GSO skb or part of it got ACKed.
In addition, there is no need to calculate the value in the loop
but simple arithmetics after the loop is sufficient. There is
no need to handle SYN case specially because congestion control
modules are not yet initialized when FLAG_SYN_ACKED is set.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03 18:08:48 -07:00
Mark Glines 3f196eb519 [TCP]: Use default 32768-61000 outgoing port range in all cases.
This diff changes the default port range used for outgoing connections,
from "use 32768-61000 in most cases, but use N-4999 on small boxes
(where N is a multiple of 1024, depending on just *how* small the box
is)" to just "use 32768-61000 in all cases".

I don't believe there are any drawbacks to this change, and it keeps
outgoing connection ports farther away from the mess of
IANA-registered ports.

Signed-off-by: Mark Glines <mark@glines.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-06-03 18:08:43 -07:00
Stephen Hemminger 67403754bc [TCP] tcp_probe: use GCC printf attribute
The function in tcp_probe is printf like, use GCC to check the args.

Sighed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31 01:23:37 -07:00
Sangtae Ha 63313494c4 [TCP] tcp_probe: a trivial fix for mismatched number of printl arguments.
Just a fix to correct the number of printl arguments. Now, srtt is
logging correctly.

Signed-off-by: Sangtae Ha <sangtae.ha@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31 01:23:36 -07:00
Pavel Emelianov e4fd5da39f [TCP]: Consolidate checking for tcp orphan count being too big.
tcp_out_of_resources() and tcp_close() perform the
same checking of number of orphan sockets. Move this
code into common place.

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31 01:23:34 -07:00
David S. Miller ddc31ce311 [IPV4]: Kill references to bogus non-existent CONFIG_IP_NOSIOCRT
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31 01:23:29 -07:00
Kazunori MIYAZAWA f282d45cb4 [IPSEC]: Fix panic when using inter address familiy IPsec on loopback.
Signed-off-by: Kazunori MIYAZAWA <kazunori@miyazawa.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-31 01:23:28 -07:00
David S. Miller 14e50e57ae [XFRM]: Allow packet drops during larval state resolution.
The current IPSEC rule resolution behavior we have does not work for a
lot of people, even though technically it's an improvement from the
-EAGAIN buisness we had before.

Right now we'll block until the key manager resolves the route.  That
works for simple cases, but many folks would rather packets get
silently dropped until the key manager resolves the IPSEC rules.

We can't tell these folks to "set the socket non-blocking" because
they don't have control over the non-block setting of things like the
sockets used to resolve DNS deep inside of the resolver libraries in
libc.

With that in mind I coded up the patch below with some help from
Herbert Xu which provides packet-drop behavior during larval state
resolution, controllable via sysctl and off by default.

This lays the framework to either:

1) Make this default at some point or...

2) Move this logic into xfrm{4,6}_policy.c and implement the
   ARP-like resolution queue we've all been dreaming of.
   The idea would be to queue packets to the policy, then
   once the larval state is resolved by the key manager we
   re-resolve the route and push the packets out.  The
   packets would timeout if the rule didn't get resolved
   in a certain amount of time.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24 18:17:54 -07:00
Jing Min Zhao 1ff75ed254 [NETFILTER]: nf_nat_h323: call set_h225_addr instead of set_h225_addr_hook
They're the same.

Signed-off-by: Jing Min Zhao <zhaojingmin@vivecode.com>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24 16:44:40 -07:00
Patrick McHardy 25b86e0546 [NETFILTER]: nf_conntrack_ftp: fix newline sequence number calculation
When the packet size is changed by the FTP NAT helper, the connection
tracking helper adjusts the sequence number of the newline character
by the size difference. This is wrong because NAT sequence number
adjustment happens after helpers are called, so the unadjusted number
is compared to the already adjusted one.

Based on report by YU, Haitao <yuhaitao@tsinghua.org.cn>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24 16:41:50 -07:00
Milan Kocian b8f5583135 [RTNETLINK]: Fix sending netlink message when replace route.
When you replace route via ip r r command the netlink multicast message is
not send.  This patch corrects it.  NL message is sent with NLM_F_REPLACE
flag.

Addresses http://bugzilla.kernel.org/show_bug.cgi?id=8320

Signed-off-by: Milan Kocian <milon@wq.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24 16:36:53 -07:00
Jan Engelhardt a6938a1e0e [IPVS]: Use menuconfig objects.
Use menuconfigs instead of menus, so the whole menu can be disabled at once
instead of going through all options.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>
Acked-by: Simon Horman <horms@verge.net.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-24 16:36:47 -07:00
Patrick McHardy d8cf27287a [IPV4]: icmp: fix crash with sysctl_icmp_errors_use_inbound_ifaddr
When icmp_send is called on the local output path before the
packet hits ip_output, skb->dev is not set, causing a crash
when sysctl_icmp_errors_use_inbound_ifaddr is set. This can
happen with the netfilter REJECT target or IPsec tunnels.

Let routing decide the ICMP source address in that case, since the
packet is locally generated there is no inbound interface and
the sysctl should not apply.

The option actually seems to be unfixable broken, on the path
after ip_output() skb->dev points to the outgoing device and
we don't know the incoming device anymore, so its going to do
the absolute wrong thing and pick the address of the outgoing
interface. Add a comment about this.

Reported by Curtis Doty <Curtis@GreenKey.net>.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19 14:44:15 -07:00
Patrick McHardy 3ad2a6fb6b [NETFILTER]: nf_conntrack_ipv4: fix incorrect #ifdef config name
The option is named CONFIG_NF_NAT not CONFIG_IP_NF_NAT. Remove the ifdef
completely since helpers also expect defragmented packet even without
NAT.

Noticed by Robert P. J. Day <rpjday@mindspring.com>

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19 14:24:16 -07:00
Ilpo Järvinen 580e572a4a [TCP] FRTO: Prevent state inconsistency in corner cases
State could become inconsistent in two cases:

1) Userspace disabled FRTO by tuning sysctl when one of the TCP
   flows was in the middle of FRTO algorithm (and then RTO is
   again triggered)

2) SACK reneging occurs during FRTO algorithm

A simple solution is just to abort the previous FRTO when such
obscure condition occurs...

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19 13:56:57 -07:00
Ilpo Järvinen 463236557d [TCP] FRTO: Add missing ECN CWR sending to one of the responses
The conservative spurious RTO response did not queue CWR even
though the sending rate was lowered. Whenever reduction happens
regardless of reason, CWR should be sent (forgetting to send it
is not very fatal though).

A better approach would be to queue CWR when one of the sending
rate reducing responses (rate-halving one or this conservative
response) is used already at RTO. Doing that would allow CWR to
be sent along with the two new data segments that are sent
during FRTO. However, it's a bit "racy" because userland could
tune the response sysctl to a more aggressive one in between.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-19 13:56:23 -07:00
David S. Miller f6c5d736af [IPV4]: Remove IPVS icmp hack from route.c for now.
Revert: 2d771cd86d

This is dangerous if enabled and a better solution to the
problem is being worked on.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-18 02:07:50 -07:00
Dave Jones d739437207 [IPV4]: Correct rp_filter help text.
As mentioned in http://bugzilla.kernel.org/show_bug.cgi?id=5015
The helptext implies that this is on by default.
This may be true on some distros (Fedora/RHEL have it enabled
in /etc/sysctl.conf), but the kernel defaults to it off.

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-17 15:02:21 -07:00
David S. Miller 2ff011efa4 [TCP]: TCP_CONG_YEAH requires TCP_CONG_VEGAS
These two congestion control modules share code.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-17 14:20:32 -07:00
Stephen Hemminger a02ba04166 [TCP] slow start: Make comments and code logic clearer.
Add more comments to describe our version of tcp_slow_start().

Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-17 14:20:31 -07:00
Mitsuru Chinen d831666e98 [IPV4] SNMP: Display new statistics at /proc/net/netstat
This displays the statistics specified in the updated IP-MIB RFC
(RFC4293) in /proc/net/netstat. The reason why these are not displayed
in /proc/net/snmp is that some existing utilities are developed under
the assumption which ipstat items in /proc/net/snmp is unchanged.

Signed-off-by: Mitsuru Chinen <mitch@linux.vnet.ibm.com>
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-14 03:07:30 -07:00
Patrick McHardy 802169a4b0 [NETFILTER]: iptable_raw: ignore short packets sent by SOCK_RAW sockets
iptables matches and targets expect packets to have at least a full
IP header and a valid header length. Ignore packets sent through
raw sockets for which this isn't true as in the other tables.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:59 -07:00
Patrick McHardy 4a176c1a61 [NETFILTER]: iptable_{filter,mangle}: more descriptive "happy cracking" message
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:59 -07:00
Yasuyuki Kozakai ba4c7cbadd [NETFILTER]: nf_nat: remove unused argument of function allocating binding
nf_nat_rule_find, alloc_null_binding and alloc_null_binding_confirmed
do not use the argument 'info', which is actually ct->nat.info.
If they are necessary to access it again, we can use the argument 'ct'
instead.

Signed-off-by: Yasuyuki Kozakai <yasuyuki.kozakai@toshiba.co.jp>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:44 -07:00
Patrick McHardy 3c2ad469c3 [NETFILTER]: Clean up table initialization
- move arp_tables initial table structure definitions to arp_tables.h
  similar to ip_tables and ip6_tables

- use C99 initializers

- use initializer macros where possible

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:43 -07:00
David S. Miller fc038410b4 [UDP]: Fix AF-specific references in AF-agnostic code.
__udp_lib_port_inuse() cannot make direct references to
inet_sk(sk)->rcv_saddr as that is ipv4 specific state and
this code is used by ipv6 too.

Use an operations vector to solve this, and this also paves
the way for ipv6 support for non-wild saddr hashing in UDP.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-10 23:47:22 -07:00
Linus Torvalds 9a9136e270 Merge git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
  sound: convert "sound" subdirectory to UTF-8
  MAINTAINERS: Add cxacru website/mailing list
  include files: convert "include" subdirectory to UTF-8
  general: convert "kernel" subdirectory to UTF-8
  documentation: convert the Documentation directory to UTF-8
  Convert the toplevel files CREDITS and MAINTAINERS to UTF-8.
  remove broken URLs from net drivers' output
  Magic number prefix consistency change to Documentation/magic-number.txt
  trivial: s/i_sem /i_mutex/
  fix file specification in comments
  drivers/base/platform.c: fix small typo in doc
  misc doc and kconfig typos
  Remove obsolete fat_cvf help text
  Fix occurrences of "the the "
  Fix minor typoes in kernel/module.c
  Kconfig: Remove reference to external mqueue library
  Kconfig: A couple of grammatical fixes in arch/i386/Kconfig
  Correct comments in genrtc.c to refer to correct /proc file.
  Fix more "deprecated" spellos.
  Fix "deprecated" typoes.
  ...

Fix trivial comment conflict in kernel/relay.c.
2007-05-09 12:54:17 -07:00
Oleg Nesterov 28e53bddf8 unify flush_work/flush_work_keventd and rename it to cancel_work_sync
flush_work(wq, work) doesn't need the first parameter, we can use cwq->wq
(this was possible from the very beginnig, I missed this).  So we can unify
flush_work_keventd and flush_work.

Also, rename flush_work() to cancel_work_sync() and fix all callers.
Perhaps this is not the best name, but "flush_work" is really bad.

(akpm: this is why the earlier patches bypassed maintainers)

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Tejun Heo <htejun@gmail.com>
Cc: Auke Kok <auke-jan.h.kok@intel.com>,
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:53 -07:00
Oleg Nesterov c214b2cc5f ipvs: flush defense_work before module unload
net/ipv4/ipvs/ip_vs_core.c

	module_exit
	    ip_vs_cleanup
		ip_vs_control_cleanup
		    cancel_rearming_delayed_work
	// done

This is unsafe.  The module may be unloaded and the memory may be freed
while defense_work's handler is still running/preempted.

Do flush_work(&defense_work.work) after cancel_rearming_delayed_work().

Alternatively, we could add flush_work() to cancel_rearming_delayed_work(),
but note that we can't change cancel_delayed_work() in the same manner
because it may be called from atomic context.

Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-09 12:30:52 -07:00
Michael Opdenacker 59c51591a0 Fix occurrences of "the the "
Signed-off-by: Michael Opdenacker <michael@free-electrons.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 08:57:56 +02:00
David Sterba 3dde6ad8fc Fix trivial typos in Kconfig* files
Fix several typos in help text in Kconfig* files.

Signed-off-by: David Sterba <dave@jikos.cz>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
2007-05-09 07:12:20 +02:00
Randy Dunlap e63340ae6b header cleaning: don't include smp_lock.h when not used
Remove includes of <linux/smp_lock.h> where it is not used/needed.
Suggested by Al Viro.

Builds cleanly on x86_64, i386, alpha, ia64, powerpc, sparc,
sparc64, and arm (all 59 defconfigs).

Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 11:15:07 -07:00
Srinivas Aji b40b4f79ce [TCP]: zero out rx_opt in tcp_disconnect()
When the server drops its connection, NFS client reconnects using the
same socket after disconnecting. If the new connection's SYN,ACK
doesn't contain the TCP timestamp option and the old connection's did,
tp->tcp_header_len is recomputed assuming no timestamp header but
tp->rx_opt.tstamp_ok remains set. Then tcp_build_and_update_options()
adds in a timestamp option past the end of the allocated TCP header,
overwriting TCP data, or when the data is in skb_shinfo(skb)->frags[],
overwriting skb_shinfo(skb) causing a crash soon after. (The issue was
debugged from such a crash.)

Similarly, wscale_ok and sack_ok also get set based on the SYN,ACK
packet but not reset on disconnect, since they are zeroed out at
initialization. The patch zeroes out the entire tp->rx_opt struct in
tcp_disconnect() to avoid this sort of problem.

Signed-off-by: Srinivas Aji <Aji_Srinivas@emc.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 17:32:28 -07:00
Pavel Emelianov 7562f876cd [NET]: Rework dev_base via list_head (v3)
Cleanup of dev_base list use, with the aim to simplify making device
list per-namespace. In almost every occasion, use of dev_base variable
and dev->next pointer could be easily replaced by for_each_netdev
loop. A few most complicated places were converted to using
first_netdev()/next_netdev().

Signed-off-by: Pavel Emelianov <xemul@openvz.org>
Acked-by: Kirill Korotaev <dev@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 15:13:45 -07:00
Ilpo Järvinen 03fba04796 [TCP] Highspeed: Limited slow-start is nowadays in tcp_slow_start
Reuse limited slow-start (RFC3742) included into tcp_cong instead
of having another implementation in High Speed TCP.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 13:28:35 -07:00
Herbert Xu cfd6c38096 [NETFILTER]: sip: Fix RTP address NAT
I needed to use this recently to talk to a Cisco server.  In my case
I only did SNAT while the Cisco server used a different address for
RTP traffic than the one for SIP.  I discovered that nf_nat_sip NATed
the RTP address to the SIP one which was unnecessary but OK.  However,
in doing so it did not DNAT the destination address on the RTP traffic
to the Cisco back to the original RTP address.

This patch corrects this by noting down the RTP address and using it
when the expectation fires.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:35:31 -07:00
Jorge Boncompte c2a1910b06 [NETFILTER]: nf_nat_proto_gre: do not modify/corrupt GREv0 packets through NAT
While porting some changes of the 2.6.21-rc7 pptp/proto_gre conntrack
and nat modules to a 2.4.32 kernel I noticed that the gre_key function
returns a wrong pointer to the GRE key of a version 0 packet thus
corrupting the packet payload.

The intended behaviour for GREv0 packets is to act like
nf_conntrack_proto_generic/nf_nat_proto_unknown so I have ripped the
offending functions (not used anymore) and modified the
nf_nat_proto_gre modules to not touch version 0 (non PPTP) packets.

Signed-off-by: Jorge Boncompte <jorge@dti2.net>
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:34:42 -07:00
Patrick McHardy 327850070b [NETFILTER]: ipt_DNAT: accept port randomization option
Also accept the --random option for DNAT to allow randomly selecting a
destination port from the given range.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:34:03 -07:00
Robert P. J. Day 825e7d45cf [TCP]: Delete unused header file net/ipv4/tcp_yeah.h.
Delete the apparently unused header file net/ipv4/tcp_yeah.h.

Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2007-05-03 03:13:35 -07:00
David S. Miller de34ed91c4 [UDP]: Do not allow specific bind when wildcard bind exists.
When allocating local ports, do not allow a bind to a port
with a specific local address when a bind to that port with
a wildcard local address already exists.

Noticed by Linus.

Signed-off-by: David S. Miller <davem@davemloft.net>
2007-04-30 14:51:58 -07:00