Commit Graph

158278 Commits (ea6a634ef7f0ab1d1f48ba0ad4f50e96d6065312)

Author SHA1 Message Date
Matt Carlson b6080e1260 tg3: Add coalesce parameters for msix vectors
This patch adds code to tune the coalescing parameters for the other
msix vectors.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:46 -07:00
Matt Carlson fed9781081 tg3: Enable NAPI instances for other int vectors
This patch adds code to enable and disable the rest of the NAPI
instances.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:45 -07:00
Matt Carlson fe5f5787f0 tg3: Add TSS support
This patch exposes the additional transmit rings to the kernel and makes
the necessary modifications to transmit, open, and close paths.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:44 -07:00
Matt Carlson 89aeb3bcea tg3: Update intmbox and coal_now for msix
This patch fixes up two spots that need attention now that msix support
has been added.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:42 -07:00
Matt Carlson f77a6a8e6c tg3: Add tx and rx ring resource tracking
This patch adds code to assign status block, tx producer ring and rx
return ring resources needed for the other interrupt vectors.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:39 -07:00
Matt Carlson 646c9eddcf tg3: Add mailbox assignments
The 5717 assigns mailbox locations to interrupt vectors in a rather
non-intuitive way.  (Much of the complexity stems from legacy
compatibility issues.)  This patch implements the assignment scheme.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:36 -07:00
Matt Carlson 679563f47c tg3: Add MSI-X support
This patch adds MSI-X support.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:33 -07:00
Matt Carlson 4f125f42dd tg3: Add support code around kernel interrupt API
This patch adds code to support multiple interrupt vectors around the
kernel's interrupt API.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:30 -07:00
Matt Carlson 2d31ecaf10 tg3: Create tg3_rings_reset()
This patch moves most of the chip ring setup logic into a separate
function.  This will make it easier to verify the multi ring setup
changes.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:28 -07:00
Matt Carlson fd2ce37f8e tg3: Add per-int coalesce now member
Each interrupt vector has its own bit in the host coalescing register to
force that vector's status block to be updated and generate an
interrupt.  This patch adds a member to the per-interrupt structure
that records which bit belongs to that vector.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:25 -07:00
Matt Carlson f19af9c2cc tg3: inline tg3_cond_int()
This patch inlines the code of tg3_cond_int() into the function's only
callsite.  This prep work makes the following patch cleaner.

Signed-off-by: Matt Carlson <mcarlson@broadcom.com>
Reviewed-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-02 00:43:22 -07:00
David S. Miller 6cdee2f96a Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:
	drivers/net/yellowfin.c
2009-09-02 00:32:56 -07:00
Eric Dumazet 0625491493 ipv6: ip6_push_pending_frames() should increment IPSTATS_MIB_OUTDISCARDS
qdisc drops should be notified to IP_RECVERR enabled sockets, as done in IPV4.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 18:37:16 -07:00
Xiao Guangrong f2798eb4e0 drop_monitor: fix trace_napi_poll_hit()
The net_dev of backlog napi is NULL, like below:

__get_cpu_var(softnet_data).backlog.dev == NULL

So, we should check it in napi tracepoint's probe function

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 18:18:12 -07:00
David S. Miller 2fbd3da387 pkt_sched: Revert tasklet_hrtimer changes.
These are full of unresolved problems, mainly that conversions don't
work 1-1 from hrtimers to tasklet_hrtimers because unlike hrtimers
tasklets can't be killed from softirq context.

And when a qdisc gets reset, that's exactly what we need to do here.

We'll work this out in the net-next-2.6 tree and if warranted we'll
backport that work to -stable.

This reverts the following 3 changesets:

a2cb6a4dd4
("pkt_sched: Fix bogon in tasklet_hrtimer changes.")

38acce2d79
("pkt_sched: Convert CBQ to tasklet_hrtimer.")

ee5f9757ea
("pkt_sched: Convert qdisc_watchdog to tasklet_hrtimer")

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:59:25 -07:00
David S. Miller 3732e9bd2d xilinx_emaclite: Fix permissions on driver sources.
Noticed by Michal Simek.

Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:50:50 -07:00
Jarek Poplawski d66ee0587c net: sk_free() should be allowed right after sk_alloc()
After commit 2b85a34e91
(net: No more expensive sock_hold()/sock_put() on each tx)
sk_free() frees socks conditionally and depends
on sk_wmem_alloc being set e.g. in sock_init_data(). But in some
cases sk_free() is called earlier, usually after other alloc errors.

Fix is to move sk_wmem_alloc initialization from sock_init_data()
to sk_alloc() itself.

Signed-off-by: Jarek Poplawski <jarkao2@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:49:00 -07:00
Stephen Hemminger 89d69d2b75 net: make neigh_ops constant
These tables are never modified at runtime. Move to read-only
section.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:57 -07:00
roel kluin dcbfef820b au1000_eth: possible NULL dereference of aup->mii_bus->irq in au1000_probe()
aup->mii_bus->irq allocation may fail, prevent a dereference of NULL.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Acked-by: Florian Fainelli <florian@openwrt.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:56 -07:00
Damian Lukowski 5d7892298a RTO connection timeout: sysctl documentation update
This patch updates the sysctl documentation concerning the interpretation
of tcp_retries{1,2} and tcp_orphan_retries.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:50 -07:00
Damian Lukowski 5152fc7de3 RTO connection timeout: coding style fixes and comments
This patch affects the retransmits_timed_out() function.

Changes:
1) Variables have more meaningful names
2) retransmits_timed_out() has an introductionary comment.
3) Small coding style changes.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:47 -07:00
Mike McCormack 72c6068328 sky2: Use 32bit read to read Y2_VAUX_AVAIL
B0_CTST is a 24bit register according to the vendor driver (sk98lin).
A 16bit read on B0_CTST will always return 0 for Y2_VAUX_AVAIL (1<<16),
 so use a 32bit read when testing Y2_VAUX_AVAIL

Signed-off-by: Mike McCormack <mikem@ring3k.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:44 -07:00
Mike McCormack 90bbebb4a8 sky2: Create buffer alloc and free helpers
Refactor similar two sections of code that free buffers into one.
Only call tx_init if all buffer allocations succeed.

Signed-off-by: Mike McCormack <mikem@ring3k.org>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:42 -07:00
Stephen Hemminger 10547ae2c0 sky2: fix management of driver LED
Observed by Mike McCormack.

The LED bit here is just a software controlled value used to
turn on one of the LED's on some boards. The register value was wrong,
which could have been causing some power control issues.
Get rid of problematic define use the correct mask.

Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:35 -07:00
Michael S. Tsirkin 89f56d1e91 tun: reuse struct sock fields
As tun always has an embeedded struct sock,
use sk and sk_receive_queue fields instead of
duplicating them in tun_struct.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:33 -07:00
Alexey Dobriyan 86393e52c3 netns: embed ip6_dst_ops directly
struct net::ipv6.ip6_dst_ops is separatedly dynamically allocated,
but there is no fundamental reason for it. Embed it directly into
struct netns_ipv6.

For that:
* move struct dst_ops into separate header to fix circular dependencies
	I honestly tried not to, it's pretty impossible to do other way
* drop dynamical allocation, allocate together with netns

For a change, remove struct dst_ops::dst_net, it's deducible
by using container_of() given dst_ops pointer.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:31 -07:00
Eric Dumazet 885a136c52 bonding: use compare_ether_addr_64bits() in ALB
We can speedup ether addresses compares using compare_ether_addr_64bits()
instead of memcmp(). We make sure all operands are at least 8 bytes long and
16bits aligned (or better, long word aligned if possible)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Reviewed-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:26 -07:00
Eric Dumazet ac06713d55 macvlan: Use compare_ether_addr_64bits()
To speedup ether addresses compares, we can use compare_ether_addr_64bits()
(all operands are guaranteed to be at least 8 bytes long)

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:25 -07:00
Mallikarjuna R Chilakala c3c7432741 ixgbe: Patch to fix 82599 multispeed fiber link issues when driver is loaded without any cable and reconnecting it to 1G partner
In 82599 multi speed fiber case when driver is loaded without any
cable and reconnecting the cable with a 1G partner does not bring
up the link in 1Gb mode. When there is no link we first setup the link
at 10G & 1G and then try to re-establish the link at highest speed 10G
and thereby changing autoneg_advertised value to highest speed 10G.
After connecting back the cable to a 1G link partner we never try 1G
as autoneg advertised value is changed to link at 10G only. The
following patch fixes the issue by properly initializing the
autoneg_advertised value just before exiting from link setup routine.

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:20 -07:00
Peter P Waskiewicz Jr b7fdb71485 ixgbe: Properly disable DCB arbiters prior to applying changes
When disabling the Rx and Tx data arbiters prior to configuration changes,
the arbiters were not being shut down properly.  This can create a race
in the DCB hardware blocks, and potentially hang the arbiters.  Also, the
Tx descriptor arbiter shouldn't be disabled when applying configuration
changes; disabling this arbiter can cause a Tx hang.

Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:18 -07:00
Mallikarjuna R Chilakala 8620a103b5 ixgbe: refactor link setup code
Link code cleanup: a number of redundant functions and MAC variables are cleaned up,
with some functions being consolidated into a single-purpose code path.
Removed following deprecated link functions and mac variables
 * ixgbe_setup_copper_link_speed_82598
 * ixgbe_setup_mac_link_speed_multispeed_fiber
 * ixgbe_setup_mac_link_speed_82599
 * mac.autoneg, mac.autoneg_succeeded, phy.autoneg_wait_to_complete

Signed-off-by: Mallikarjuna R Chilakala <mallikarjuna.chilakala@intel.com>
Acked-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:14 -07:00
Graham, David fd38d7a0a0 e1000: Fix for e1000 kills IPMI on a tagged vlan.
Enabling VLAN filters (VFE) when the primary interface is brought up
(per commit 78ed11a) has caused problems for some users who manage
their systems using IPMI over a VLAN. This is because when the driver
enables the VLAN filter, this same filter table is enabled for the
management channel, and the table is initially empty, which means that
the IPMI/VLAN packets are filtered out and not received by the BMC.
This is a problem only on e1000 class adapters, as it is only
on e1000 that the filter table is common to the management and host
streams.

With this change, filtering is only enabled when one or more host VLANs
exist, and is disabled when the last host VLAN is removed. VLAN filtering
is always disabled when the primary interface is in promiscuous mode,
and will be (re)enabled if VLANs exist when the interface exits
promiscuous mode.

Note that this does not completely resolve the issue for those using VLAN
management, because if the host adds a VLAN, then the above problem
occurs when that VLAN is enabled. However, it does mean the there is no
problem for configurations where management is on a VLAN and the host is
not.

A complete solution to this issue would require further driver changes.
The driver would need to discover if (and which) management VLANs are
active before enabling VLAN filtering, so that it could ensure that the
managed VLANs are included in the VLAN filter table. This discovery
requires that the BMC identifies its VLAN in registers accessible
to the driver, and at least on Dell PE2850 systems the BMC does not
identify its VLAN to allow such discovery. Intel is pursuing this issue
with the BMC vendor.

Signed-off-by: Dave Graham <david.graham@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tested-by: Krzysztof Piotr Oledzki <ole@ans.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 17:40:11 -07:00
Samuel Ortiz 04e715cd46 iwmc3200wifi: Add a last_fw_err debugfs entry
In order to check what was the last fw error we got accross resets, we add
this debugfs entry. It displays the complete ASSERT information.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:28 -04:00
Samuel Ortiz d210176eaa iwmc3200wifi: Handle UMAC stalls and UMAC assert properly
When UMAC stalls or asserts, we want to reset the device. But when we're
associated, the current reset worker will end up calling
cfg80211_connect_result() with the cfg80211 sme layer knowing that we're
reassociating. That ends up with some ugly warnings.
With this patch we're telling the upper layer that we've roamed if
reassociation succeeds, and that we're disconnected if it fails.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:28 -04:00
Samuel Ortiz d04bd6283c iwmc3200wifi: New initial LMAC calibration
The LMAC calibration API got broken mostly by having a configuration bitmap
being different than the result one.
This patch tries to address that issue by correctly running calibrations with
the newest firmwares, and keeping a backward compatibility fallback path for
older firmwares, where the configuration and result bitmaps were identical.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:27 -04:00
Zhu Yi 31452420ca iwmc3200wifi: fix misuse of le16_to_cpu
Also mark some functions static.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:27 -04:00
Zhu Yi c743627388 iwmc3200wifi: add disconnect work
When the driver receives "connection terminated" event from device,
it could be caused by 2 reasons: the firmware is roaming or the
connection is lost (AP disappears). For the former, an association
complete event is supposed to come within 3 seconds. For the latter,
the driver won't receive any event except the connection terminated.
So we kick a delayed work (5*HZ) when we receive the connection
terminated event. It will be canceled if it turns out to be a roaming
event later. Otherwise we notify SME and userspace the disconnection.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:27 -04:00
Zhu Yi de15fd31fc iwmc3200wifi: use cfg80211_roamed to send roam event
The device sends connection terminated and [re]association success
(or failure) events when roaming occours. The patch uses
cfg80211_roamed instead of cfg80211_connect_result to notify SME
for roaming.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:27 -04:00
Samuel Ortiz d041811d93 iwmc3200wifi: Fix sparse warning
iwm_cfg80211_get_station() should be static.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:26 -04:00
Samuel Ortiz b90a5c9561 iwmc3200wifi: Set WEP key from connect
When connect is called with the LEGACY_PSK authentication type set, and a
proper sme->key, we need to set the WEP key straight after setting the
profile otherwise the authentication will never start.

Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:26 -04:00
Zhu Yi ae73abf235 iwmc3200wifi: invalidate profile when necessary before connect
If cfg80211 requests to connect when we have already had an active
profile, invalidate the current profile first before sending a new
profile to UMAC.

Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Signed-off-by: Samuel Ortiz <sameo@linux.intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:25 -04:00
Jussi Kivilinna 2b7dcfb7d0 rndis_wlan: remove 'select WIRELESS_EXT' in Kconfig
Since rndis_wlan is now converted to cfg80211, WIRELESS_EXT isn't
required anymore.

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:23 -04:00
Jussi Kivilinna 53d27eaf55 rndis_wlan: fix sparse endianess warnings
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:22 -04:00
Jussi Kivilinna c5c4fe90e3 rndis_wlan: cleanup
- remove double newlines between functions
- remove commented out function (rndis_set_config_parameter_u32())
- coding style fix in rndis_set_config_parameter_str()
- add comment banners between function sections

Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:22 -04:00
Jussi Kivilinna 051ae0bf7f rndis_wlan: use bool for on/off switches
Signed-off-by: Jussi Kivilinna <jussi.kivilinna@mbnet.fi>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:22 -04:00
Vasanthakumar Thiagarajan 8f43161aa6 ath9k: Call spin_lock_bh() on btcoex_lock
As generic hw timer interrupt handler is moved to tasklet,
we no more need to call spin_lock_irqsave().

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:21 -04:00
Vasanthakumar Thiagarajan ebb8e1d78c ath9k: Move generic hw timer intr handler to bottom-half
There is no point handling this in hard irq, move it to
tasklet.

Signed-off-by: Vasanthakumar Thiagarajan <vasanth@atheros.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:21 -04:00
Johannes Berg 8bc11b491b rfkill: relicense header file
This header file is copied into userspace tools that
need not be GPLv2 licensed, make that easier.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Acked-by: Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Acked-by: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Acked-by: Iñaky Pérez-González <inaky@linux.intel.com>
Acked-by: Ivo van Doorn <IvDoorn@gmail.com>
Acked-by: Jaswinder Singh Rajput <jaswinderrajput@gmail.com>
Acked-by: Michael Buesch <mb@bu3sch.de>
Acked-by: Tomas Winkler <tomas.winkler@intel.com>
Signed-off-by: John W. Linville <linville@tuxdriver.com>
2009-09-01 12:48:21 -04:00
Damian Lukowski 6fa12c8503 Revert Backoff [v3]: Calculate TCP's connection close threshold as a time value.
RFC 1122 specifies two threshold values R1 and R2 for connection timeouts,
which may represent a number of allowed retransmissions or a timeout value.
Currently linux uses sysctl_tcp_retries{1,2} to specify the thresholds
in number of allowed retransmissions.

For any desired threshold R2 (by means of time) one can specify tcp_retries2
(by means of number of retransmissions) such that TCP will not time out
earlier than R2. This is the case, because the RTO schedule follows a fixed
pattern, namely exponential backoff.

However, the RTO behaviour is not predictable any more if RTO backoffs can be
reverted, as it is the case in the draft
"Make TCP more Robust to Long Connectivity Disruptions"
(http://tools.ietf.org/html/draft-zimmermann-tcp-lcd).

In the worst case TCP would time out a connection after 3.2 seconds, if the
initial RTO equaled MIN_RTO and each backoff has been reverted.

This patch introduces a function retransmits_timed_out(N),
which calculates the timeout of a TCP connection, assuming an initial
RTO of MIN_RTO and N unsuccessful, exponentially backed-off retransmissions.

Whenever timeout decisions are made by comparing the retransmission counter
to some value N, this function can be used, instead.

The meaning of tcp_retries2 will be changed, as many more RTO retransmissions
can occur than the value indicates. However, it yields a timeout which is
similar to the one of an unpatched, exponentially backing off TCP in the same
scenario. As no application could rely on an RTO greater than MIN_RTO, there
should be no risk of a regression.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 02:45:47 -07:00
Damian Lukowski f1ecd5d9e7 Revert Backoff [v3]: Revert RTO on ICMP destination unreachable
Here, an ICMP host/network unreachable message, whose payload fits to
TCP's SND.UNA, is taken as an indication that the RTO retransmission has
not been lost due to congestion, but because of a route failure
somewhere along the path.
With true congestion, a router won't trigger such a message and the
patched TCP will operate as standard TCP.

This patch reverts one RTO backoff, if an ICMP host/network unreachable
message, whose payload fits to TCP's SND.UNA, arrives.
Based on the new RTO, the retransmission timer is reset to reflect the
remaining time, or - if the revert clocked out the timer - a retransmission
is sent out immediately.
Backoffs are only reverted, if TCP is in RTO loss recovery, i.e. if
there have been retransmissions and reversible backoffs, already.

Changes from v2:
1) Renaming of skb in tcp_v4_err() moved to another patch.
2) Reintroduced tcp_bound_rto() and __tcp_set_rto().
3) Fixed code comments.

Signed-off-by: Damian Lukowski <damian@tvk.rwth-aachen.de>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
2009-09-01 02:45:42 -07:00