Commit graph

98839 commits

Author SHA1 Message Date
Ben Dooks
2bdf06c047 DM9000: Add documentation for the driver.
Add Documentation/networking/dm9000.txt for the DM9000
network driver.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:58:39 -04:00
Ben Dooks
6ff4ff06d2 DM9000: Remove DEFAULT_TRIGGER for request_irq() flags.
Currently all but one user (AT91SAM9261EK) of the dm9000
driver passes their IRQ flags through the resources attached
to the platform device. This means we can remove the use
of DEFAULT_TRIGGER as the blackfin machines all seem to
have their triggers set properly.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:58:36 -04:00
Ben Dooks
485ca22a10 DM9000: Re-unite menuconfig entries for DM9000 driver
The ENC28J60 driver ended up adding itself inbetween the
two DM9000 Kconfig entries, so re-unite the two together.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:58:29 -04:00
Ben Dooks
2fcf06ca67 DM9000: Add missing msleep() in EEPROM wait code.
The msleep() call in the code that checks for the
EEPROM controller's busy status was missing.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:58:17 -04:00
Ben Dooks
f8dd0ecbb7 DM9000: Allow the use of the NSR register to get link status.
The DM9000's internal PHY reports a copy of the link status
in the NSR register of the chip. Reading the status when
polling for link status is faster as it eliminates the need
to sleep, but does not print as much information.

Add an platform flag to force this behaviour, and a Kconfig
option to allow it to be forced to the faster method always.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:58:07 -04:00
Ben Dooks
aa1eb452e8 DM9000: Use NSR to determine link-status on internal PHY
The DM9000_NSR register contains a copy of the internal PHY's
link status which we can use to determine if the link is up
or down. This eliminates the more costly (and sleeping) PHY
read when using the DM9000's own PHY.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:57:58 -04:00
Ben Dooks
f8d79e79a1 DM9000: Cleanup source code - remove forward declerations
Cleanup the source code by moving the code around to avoid
having to declare the functions before they are used.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:57:51 -04:00
Ben Dooks
59eae1fa3b DM9000: Cleanup source code
Cleanup bits of the DM9000 driver to make the code
neater and easier to read. This is includes removing
some old definitions, re-indenting areas, etc.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:57:42 -04:00
Ben Dooks
9088fa4fa2 DM9000: Cleanups after the resource changes
Remove the now extraneous checks in dm9000_release_board()
now that the two-resource case is removed. Also remove the
check on pdev->num_resources, as we check the return data
from platform_get_resource() to ensure we have not only
the right number but the right type of resources as well.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:57:28 -04:00
Ben Dooks
6d406b3c76 DM9000: Add support for DM9000A and DM9000B chips
Add support for both the DM9000A and DM9000B versions of
the DM9000 networking chip. This includes adding support
for the Link-Change IRQ which is used instead of polling
the PHY every 2 seconds.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:57:16 -04:00
Laurent Pinchart
da3854fc9f DM9000: Fixup blackfin after removing 2 resource usage
The dm9000 driver accepts either 2 or 3 resources to describe the platform
devices. The 2 resources case abuses the ioresource mechanism by passing
ioremap()ed memory through the platform device resources. This patch removes
converts boards that were using it to the 3 resources scheme.

CC: Bryan Wu <cooloney@kernel.org>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:56:57 -04:00
Laurent Pinchart
08c3f57caa DM9000: Remove the 2 resources probe scheme.
The dm9000 driver accepts either 2 or 3 resources to describe the platform
devices. The 2 resources case abuses the ioresource mechanism by passing
ioremap()ed memory through the platform device resources. This patch removes
that case and converts boards that were using it to the 3 resources scheme.

Signed-off-by: Ben Dooks <ben-linux@fluff.org>
Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
2008-06-24 22:56:43 -04:00
Eilon Greenstein
e35c3269ed bnx2x: Update version
Updating to version 1.45.6

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:36:51 -07:00
Wendy Xiong
493adb1fee bnx2x: Add PCIE EEH support
Add PCI recovery functions to the driver.  The initial PCI state is
also saved so the MSI state can be restored during PCI recovery.

Signed-off-by: Wendy Xiong <wendyx@us.ibm.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:36:22 -07:00
Yitchak Gertner
f3c87cddfe bnx2x: Enhanced self test
Added registers, memories, loopback, nvram, interrupt and link tests to
the self-test

Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:35:51 -07:00
Eilon Greenstein
755735eb34 bnx2x: Re-factor Tx code
Add support for IPv6 TSO
Re-factor the Tx code with smaller functions to increase readability.
Add linearization code in case packet is too fragmented for the
microcode to handle.

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:35:13 -07:00
Vladislav Zolotarov
7a9b25577c bnx2x: Add TPA, Broadcoms HW LRO
The TPA stands for Transparent Packet Aggregation. When enabled, the FW
aggregate in-order TCP packets according to the 4-tuple match and sends
1 big packet to the driver. This packet is stored on an SGL in which
each SGE is 1 page. The FW also implements a timeout algorithm and it
honors all TCP flag, including the push flag as a trigger to halt
aggregation.

After receiving Ben Hutchings comments, we also added ethtool support,
so now, thanks to Ben's patch, when forwarding is enabled, our
aggregation is turned off using the LRO flags.

Signed-off-by: Vladislav Zolotarov <vladz@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:34:36 -07:00
Yitchak Gertner
bb2a0f7ae4 bnx2x: New statistics code
To avoid race conditions with link up/down and driver up/down - the
statistics handling was re-written in a form of state machine.
Also supporting statistics for 57711

Signed-off-by: Yitchak Gertner <gertner@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:33:36 -07:00
Eilon Greenstein
34f80b04f3 bnx2x: Add support for BCM57711 HW
Supporting the 57711 and 57711E - refers to in the code as E1H. The
57710 is referred to as E1.

To support the new members in the family, the bnx2x structure was
divided to 3 parts: common, port and function. These changes caused some
rearrangement in the bnx2x.h file.

A set of accessories macros were added to make access to the bnx2x
structure more readable

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:33:01 -07:00
Eilon Greenstein
e523287e8e bnx2x: New microcode part 3/3
The new Microcode BLOB - broken into a separate patch to make it small
enough for the mailing list

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:32:28 -07:00
Eilon Greenstein
299133cf73 bnx2x: New microcode part 2/3
The new Microcode BLOB - broken into a separate patch to make it small
enough for the mailing list

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:32:04 -07:00
Eilon Greenstein
74bc8ebcfd bnx2x: New microcode part 1/3
The new Microcode BLOB - broken into a separate patch to make it small
enough for the mailing list

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:31:40 -07:00
Eilon Greenstein
523cb50b26 bnx2x: Remove old microcode
Removing the old Microcode from the BLOB - broken into a separate
patch to make it small enough for the mailing list

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:30:11 -07:00
Eilon Greenstein
ad8d394804 bnx2x: New init infrastructure
This new initialization code supports the 57711 HW. It also supports
the emulation and FPGA for the 57711 and 57710 initializations values
(very small amount of code which is very helpful in the lab - less
than 30 lines).

The initialization is done via DMAE after the DMAE block is ready -
before it is ready, some of the initialization is done via PCI
configuration transactions (referred to as indirect write).  A mutex
to protect the DMAE from being overlapped was added.  There are few
new registers which needs to be initialized by SW - the full comment
for those registers is added to the register file.  A place holder for
the 57711 (referred to as E1H) microcode was added- the microcode
itself is too big and it is split over the following 4 patches

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:29:02 -07:00
Yaniv Rosner
c18487ee24 bnx2x: New link code
New Link code:
Moving all the link related code (including the calculations, the
initialization of the MAC and PHY and the external PHY's code) into
a separated file. The changes from the code that used to be part of
bnx2x.c (now called bnx2x_main.c) are:
- Using separate structures for link inputs and link outputs to clearly 
  identify what was configured and what is the outcome
- Adding code to read external PHY FW version and print it as part of 
  ethtool -i
- Adding code to upgrade external PHY FW from ethtool -E with special 
  magic number - Changing the link down indication to ERR level
- Adding a lock on all PHY access to prevent an interrupt and 
  setting changes to overlap
- Adding support for emulation and FPGA (small chunk of code that really 
  helps in the lab) - Adding support for 1G on BCM8706 PHY
- Adding clear debug print incase of fan failure (the PHY type is now 
  "failure")

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:27:52 -07:00
Yaniv Rosner
ea4e040abc bnx2x: Adding bnx2x_link
This patch is int the new bnx2x_link files (C and H). The files are
still not used in this patch, only in the next one so the patch will
be small enough for the mailing list.

Signed-off-by: Yaniv Rosner <yanivr@broadcom.com>
Signed-off-by: Eilong Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:27:26 -07:00
Eilon Greenstein
23bd462b6d bnx2x: Rename bnx2x.c to bnx2x_main.c
This patch is the rename of bnx2x.c to bnx2x_main.c.

Signed-off-by: Eilon Greenstein <eilong@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-23 20:25:46 -07:00
Vlad Yasevich
0f474d9bc5 sctp: Kill unused variable in sctp_assoc_bh_rcv()
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-20 10:34:47 -07:00
Michael Chan
8427f13612 bnx2: Update driver version to 1.7.7.
And update module description.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:44:44 -07:00
Michael Chan
2739a8bb5b bnx2: Cleanup error handling in bnx2_open().
All error handling in bnx2_open() can be consolidated.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:44:10 -07:00
Michael Chan
5e9ad9e108 bnx2: Turn on multi rx rings.
Enable multiple rx rings if MSI-X vectors are available.  We enable
up to 7 rx rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:43:17 -07:00
Michael Chan
2dffcc3dcd bnx2: Update firmware to support multi rx rings.
Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:42:39 -07:00
Michael Chan
f0ea2e6385 bnx2: Use one handler for all MSI-X vectors.
Use the same MSI-X handler to schedule NAPI.  Change the dev_instance
void pointer to the bnx2_napi struct instead so we can have the proper
context for each MSI-X vector.

Add a new bnx2_poll_msix() that is optimized for handling MSI-X
NAPI polling of rx/tx work only.  Remove the old bnx2_tx_poll() that
is no longer needed.  Each MSI-X vector handles 1 tx and 1 rx ring.
The first vector handles link events as well.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:41:57 -07:00
Michael Chan
43e80b89b6 bnx2: Optimize fast-path tx and rx work.
Add hw_tx_cons_ptr and hw_rx_cons_ptr to speed up the retreival of
the tx and rx consumer index, since the MSI-X and default status
blocks have different structures.

Combine status_blk and status_blk_msix into a union.  We'll only use
one type of status block for each vector.

Separate the code to detect more rx and tx work from the code to
detect link related work.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:41:08 -07:00
Michael Chan
bb4f98abf5 bnx2: Put rx ring variables in a separate struct.
In preparation for multi-ring support, rx ring variables are now put
in a separate bnx2_rx_ring_info struct.  With MSI-X, we can support
multiple rx rings.

The functions to allocate/free rx memory and to initialize rx rings
are now modified to handle multiple rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:38:19 -07:00
Michael Chan
35e9010b22 bnx2: Put tx ring variables in a separate struct.
In preparation for multi-ring support, tx ring variables are now put
in a separate bnx2_tx_ring_info struct.  Multi tx ring will not be
enabled until it is fully supported by the stack.  Only 1 tx ring
will be used at the moment.

The functions to allocate/free tx memory and to initialize tx rings
are now modified to handle multiple rings.

Signed-off-by: Michael Chan <mchan@broadcom.com>
Signed-off-by: Benjamin Li <benli@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:37:42 -07:00
Ben Hutchings
4497b0763c net: Discard and warn about LRO'd skbs received for forwarding
Add skb_warn_if_lro() to test whether an skb was received with LRO and
warn if so.

Change br_forward(), ip_forward() and ip6_forward() to call it) and
discard the skb if it returns true.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:22:28 -07:00
Ben Hutchings
0187bdfb05 net: Disable LRO on devices that are forwarding
Large Receive Offload (LRO) is only appropriate for packets that are
destined for the host, and should be disabled if received packets may be
forwarded.  It can also confuse the GSO on output.

Add dev_disable_lro() function which uses the appropriate ethtool ops to
disable LRO if enabled.

Add calls to dev_disable_lro() in br_add_if() and functions that enable
IPv4 and IPv6 forwarding.

Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:15:47 -07:00
Vlad Yasevich
2e3216cd54 sctp: Follow security requirement of responding with 1 packet
RFC 4960, Section 11.4. Protection of Non-SCTP-Capable Hosts

When an SCTP stack receives a packet containing multiple control or
DATA chunks and the processing of the packet requires the sending of
multiple chunks in response, the sender of the response chunk(s) MUST
NOT send more than one packet.  If bundling is supported, multiple
response chunks that fit into a single packet MAY be bundled together
into one single response packet.  If bundling is not supported, then
the sender MUST NOT send more than one response chunk and MUST
discard all other responses.  Note that this rule does NOT apply to a
SACK chunk, since a SACK chunk is, in itself, a response to DATA and
a SACK does not require a response of more DATA.

We implement this by not servicing our outqueue until we reach the end
of the packet.  This enables maximum bundling.  We also identify
'response' chunks and make sure that we only send 1 packet when sending
such chunks.

Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:08:18 -07:00
Wei Yongjun
7115e632f9 sctp: Validate Initiate Tag when handling ICMP message
This patch add to validate initiate tag and chunk type if verification
tag is 0 when handling ICMP message.

RFC 4960, Appendix C. ICMP Handling

ICMP6) An implementation MUST validate that the Verification Tag
contained in the ICMP message matches the Verification Tag of the peer.
If the Verification Tag is not 0 and does NOT match, discard the ICMP
message.  If it is 0 and the ICMP message contains enough bytes to
verify that the chunk type is an INIT chunk and that the Initiate Tag
matches the tag of the peer, continue with ICMP7.  If the ICMP message
is too short or the chunk type or the Initiate Tag does not match,
silently discard the packet.

Signed-off-by: Wei Yongjun <yjwei@cn.fujitsu.com>
Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-19 16:07:48 -07:00
David S. Miller
0344f1c66b Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6
Conflicts:

	net/mac80211/tx.c
2008-06-19 16:00:04 -07:00
Johannes Berg
ef3a62d272 mac80211: detect driver tx bugs
When a driver rejects a frame in it's ->tx() callback, it must also
stop queues, otherwise mac80211 can go into a loop here. Detect this
situation and abort the loop after five retries, warning about the
driver bug.

Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-18 15:39:48 -07:00
Patrick McHardy
6d1a3fb567 netlink: genl: fix circular locking
genetlink has a circular locking dependency when dumping the registered
families:

- dump start:
genl_rcv()            : take genl_mutex
genl_rcv_msg()        : call netlink_dump_start() while holding genl_mutex
netlink_dump_start(),
netlink_dump()        : take nlk->cb_mutex
ctrl_dumpfamily()     : try to detect this case and not take genl_mutex a
                        second time

- dump continuance:
netlink_rcv()         : call netlink_dump
netlink_dump          : take nlk->cb_mutex
ctrl_dumpfamily()     : take genl_mutex

Register genl_lock as callback mutex with netlink to fix this. This slightly
widens an already existing module unload race, the genl ops used during the
dump might go away when the module is unloaded. Thomas Graf is working on a
seperate fix for this.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-18 02:07:07 -07:00
Wang Chen
dad9b335c6 netdevice: Fix promiscuity and allmulti overflow
Max of promiscuity and allmulti plus positive @inc can cause overflow.
Fox example: when allmulti=0xFFFFFFFF, any caller give dev_set_allmulti() a
positive @inc will cause allmulti be off.
This is not what we want, though it's rare case.
The fix is that only negative @inc will cause allmulti or promiscuity be off
and when any caller makes the counters touch the roof, we return error.

Change of v2:
Change void function dev_set_promiscuity/allmulti to return int.
So callers can get the overflow error.
Caller's fix will be done later.

Change of v3:
1. Since we return error to caller, we don't need to print KERN_ERROR,
KERN_WARNING is enough.
2. In dev_set_promiscuity(), if __dev_set_promiscuity() failed, we
return at once.

Signed-off-by: Wang Chen <wangchen@cn.fujitsu.com>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-18 01:48:28 -07:00
David S. Miller
3a5be7d4b0 Revert "mac80211: Use skb_header_cloned() on TX path."
This reverts commit 608961a5ec.

The problem is that the mac80211 stack not only needs to be able to
muck with the link-level headers, it also might need to mangle all of
the packet data if doing sw wireless encryption.

This fixes kernel bugzilla #10903.  Thanks to Didier Raboud (for the
bugzilla report), Andrew Prince (for bisecting), Johannes Berg (for
bringing this bisection analysis to my attention), and Ilpo (for
trying to analyze this purely from the TCP side).

In 2.6.27 we can take another stab at this, by using something like
skb_cow_data() when the TX path of mac80211 ends up with a non-NULL
tx->key.  The ESP protocol code in the IPSEC stack can be used as a
model for implementation.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-18 01:19:51 -07:00
Rami Rosen
dd574dbfcc ipv6: minor cleanup in net/ipv6/tcp_ipv6.c [RESEND ].
In net/ipv6/tcp_ipv6.c:

  - Remove unneeded tcp_v6_send_check() declaration.

Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-18 00:51:09 -07:00
David S. Miller
972692e0db net: Add sk_set_socket() helper.
In order to more easily grep for all things that set
sk->sk_socket, add sk_set_socket() helper inline function.

Suggested (although only half-seriously) by Evgeniy Polyakov.

Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 22:41:38 -07:00
Rainer Weikusat
3c73419c09 af_unix: fix 'poll for write'/ connected DGRAM sockets
The unix_dgram_sendmsg routine implements a (somewhat crude)
form of receiver-imposed flow control by comparing the length of the
receive queue of the 'peer socket' with the max_ack_backlog value
stored in the corresponding sock structure, either blocking
the thread which caused the send-routine to be called or returning
EAGAIN. This routine is used by both SOCK_DGRAM and SOCK_SEQPACKET
sockets. The poll-implementation for these socket types is
datagram_poll from core/datagram.c. A socket is deemed to be writeable
by this routine when the memory presently consumed by datagrams
owned by it is less than the configured socket send buffer size. This
is always wrong for connected PF_UNIX non-stream sockets when the
abovementioned receive queue is currently considered to be full.
'poll' will then return, indicating that the socket is writeable, but
a subsequent write result in EAGAIN, effectively causing an
(usual) application to 'poll for writeability by repeated send request
with O_NONBLOCK set' until it has consumed its time quantum.

The change below uses a suitably modified variant of the datagram_poll
routines for both type of PF_UNIX sockets, which tests if the
recv-queue of the peer a socket is connected to is presently
considered to be 'full' as part of the 'is this socket
writeable'-checking code. The socket being polled is additionally
put onto the peer_wait wait queue associated with its peer, because the
unix_dgram_sendmsg routine does a wake up on this queue after a
datagram was received and the 'other wakeup call' is done implicitly
as part of skb destruction, meaning, a process blocked in poll
because of a full peer receive queue could otherwise sleep forever
if no datagram owned by its socket was already sitting on this queue.
Among this change is a small (inline) helper routine named
'unix_recvq_full', which consolidates the actual testing code (in three
different places) into a single location.

Signed-off-by: Rainer Weikusat <rweikusat@mssgmbh.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2008-06-17 22:28:05 -07:00
David S. Miller
5bbc1722d5 Merge branch 'davem-next' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-06-17 21:37:14 -07:00
David S. Miller
4552e1198a Merge branch 'davem-fixes' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/netdev-2.6 2008-06-17 21:32:08 -07:00