When testing the myri10ge driver with 2.6.24-rc1, I found
that the machine crashed under heavy load:
Unable to handle kernel paging request at 0000000000100108 RIP:
[<ffffffff803cc8dd>] net_rx_action+0x11b/0x184
The address corresponds to the list_move_tail() in
netif_rx_complete():
if (unlikely(work == weight))
list_move_tail(&n->poll_list, list);
Eventually, I traced the crashes to calling netif_rx_complete() with
work_done == budget. From looking at other drivers, it appears that
one should only call netif_rx_complete() when work_done < budget.
To fix it, I changed the test in myri10ge_poll() so that it refers
to to work_done rather than looking at the rx ring status. If
work_done is < budget, then that implies we have no more packets to
process. Any races will be resolved by the NIC when the write to
irq_claim is made.
In myri10ge_clean_rx_done(), if we ever exceeded our budget, it would
report a work_done one larger than was acutally done. This is because
the increment was done in the conditional, so work_done would be
incremented regardless of whether or not the test passed or failed.
This would lead to the WARN_ON_ONCE(work > weight); warning in
net_rx_action triggering. I've moved the increment of work_done
inside the loop. Note that this would only be a problem when we had
exceeded our budget.
Signed off by: Andrew Gallatin <gallatin@myri.com>
Andrew Gallatin Myricom Inc
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Found these while looking at printk uses.
Add missing newlines to dev_<level> uses
Add missing KERN_<level> prefixes to multiline dev_<level>s
Fixed a wierd->weird spelling typo
Added a newline to a printk
Signed-off-by: Joe Perches <joe@perches.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Jens Axboe <jens.axboe@oracle.com>
Cc: Mark M. Hoffman <mhoffman@lightlink.com>
Cc: Roland Dreier <rolandd@cisco.com>
Cc: Tilman Schmidt <tilman@imap.cc>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Jeff Garzik <jeff@garzik.org>
Cc: Stephen Hemminger <shemminger@linux-foundation.org>
Cc: Greg KH <greg@kroah.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Alessandro Zummo <a.zummo@towertech.it>
Cc: David Brownell <david-b@pacbell.net>
Cc: James Smart <James.Smart@Emulex.Com>
Cc: Andrew Vasquez <andrew.vasquez@qlogic.com>
Cc: "Antonino A. Daplas" <adaplas@pol.net>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Cc: Russell King <rmk@arm.linux.org.uk>
Cc: Jaroslav Kysela <perex@suse.cz>
Cc: Takashi Iwai <tiwai@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Update myri10ge firmware headers to latest upstream version with
TSO6 and RSS support.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix one comment in myri10ge.c and update indendation and white spaces
to match the code generated by indent from upstream CVS.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
These have been superceded by the new ->get_sset_count() hook.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the operations
get-tx-csum
get-sg
get-tso
get-ufo
the default ethtool_op_xxx behavior is fine for all drivers, so we
permit op==NULL to imply the default behavior.
This provides a more uniform behavior across all drivers, eliminating
ethtool(8) "ioctl not supported" errors on older drivers that had
not been updated for the latest sub-ioctls.
The ethtool_op_xxx() functions are left exported, in case anyone
wishes to call them directly from a driver-private implementation --
a not-uncommon case. Should an ethtool_op_xxx() helper remain unused
for a while, except by net/core/ethtool.c, we can un-export it at a
later date.
[ Resolved conflicts with set/get value ethtool patch... -DaveM ]
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch covers something like this:
dev = alloc_*dev(...
...
priv = netdev_priv(dev);
memset(priv, 0, sizeof(*priv));
The memset() here is superfluous. alloc_netdev() uses kzalloc()
to allocate needed memory so there is no need to zero the priv region
twice.
Signed-off-by: Mariusz Kozlowski <m.kozlowski@tuxland.pl>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.
In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.
The signature of the ->poll() call back goes from:
int foo_poll(struct net_device *dev, int *budget)
to
int foo_poll(struct napi_struct *napi, int budget)
The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.
The napi_struct is to be embedded in the device driver private data
structures.
Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.
With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on a patch from Peter Oruba, convert myri10ge to use pcie_get_readrq()
and pcie_set_readrq() instead of our own PCI calls and arithmetics.
These driver changes incorporate the proposed PCI-X / PCI-Express read byte
count interface. Reading and setting those values doesn't take place
"manually", instead wrapping functions are called to allow quirks for some
PCI bridges.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off by: Peter Oruba <peter.oruba@amd.com>
Based on work by Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Use the pause counter to avoid a needless device reset, and
print a message telling the admin that our link partner is
flow controlling us down to 0 pkts/sec.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove nonsensical limit in the tx done routine. Specifically,
the loop will always terminate after processing <= 1 rings worth
of frames, as the mcp index is not refetched, so the removed
conditional could never be true.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
SET_NETDEV_DEV() in myri10ge to create the "/sys/class/net/<if>/device"
symlink.
Signed-off-by: Maik Hampel <m.hampel@gmx.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Since Myri-10G boards may also run in Myrinet mode instead of Ethernet,
add a message when we detect that the link partner is not running in the
right mode.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Limit the number of recoveries from a NIC hw watchdog reset to 1 by default.
It enables detection of defective NICs immediately since these memory parity
errors are expected to happen very rarely (less than once per century*NIC).
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove the aligned-completion whitelist, and replace it by using the 1.4.16
firmware's auto-detection features to choose which firmware to load.
The driver now loads the aligned firmware, performs a MXGEFW_CMD_UNALIGNED_TEST,
and falls back to using the unaligned firmware if:
- The firmware is too old (ie, MXGEFW_CMD_UNALIGNED_TEST is an unknown command).
- The MXGEFW_CMD_UNALIGNED_TEST returns MXGEFW_CMD_ERROR_UNALIGNED, meaning
that it has seen an unaligned completion during the DMA test.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Don't count on whatever implementation artifact preserves the
multicast list across a reset cmd, and setup multicast filtering
as part of our reset routine.
The setting of allmulti when adopting firmware with the rx-filter
broadcast bug is also moved into the multicast setup routine where
it belongs.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add dropped_pause, dropped_bad_phy, dropped_bad_crc32,
dropped_unicast_filtered to the set of ethtool counters.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
The ip_hdrlen() buddy, created to reduce the number of skb->h.th-> uses and to
avoid the longer, open coded equivalent.
Ditched a no-op in bnx2 in the process.
I wonder if we should have a BUG_ON(skb->h.th->doff < 5) in tcp_optlen()...
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
For the quite common 'skb->h.raw - skb->data' sequence.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
One less thing for drivers writers to worry about.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the Intel 5000 southbridge (aka Intel 6310/6311/6321ESB) PCIe ports
and the Intel E30x0 chipsets to the whitelist of aligned PCIe completion.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Simpler way of dealing with the firmware 4KB boundary crossing
restriction for rx buffers. This fixes a variety of memory
corruption issues when using an "uncommon" MTU with a 16KB
page size.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Correctly detect when TSO should be used on transmit by looking at the
skb->gso_size rather than seeing if the frame was larger than our MTU.
The old method causes problems when a host with a large (jumbo) MTU is
sending to a host with a small (standard) MTU.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix management of allocated physical pages when the architecture
page size is not 4kB since the firmware cannot cross 4K boundary.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Add a wc_enabled flag in the myri10ge_priv instead of relying
on mtrr >= 0.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Do not use 4k rdma request on SGI TIOCE chipset since this
bridge does not support it.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Allocate a specific page and use pci_map_page for dma test instead
of relying on another existing buffer.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix a missing error check in myri10ge_allocate_rings() and set status
to -ENOMEM before all actual allocations so that the error path returns
what it should.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix copyright and license ("regents" should not have ever been used).
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Work around a bug which occurs when adopting firmware versions
1.4.4 though 1.4.11 where broadcasts are filtered as if they
were multicasts.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Remove the NETIF_F_TSO #ifdef-ery in drivers/net; this was
for old-old-2.4 compat (even current 2.4 has NETIF_F_TSO)
but it's time to get rid of it by now.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Now that IRQ allocation is done in myri10ge_open(), we want to still
check when loading the driver that IRQ allocation could succeed later.
Additionaly, we fix the initialization and printing of netdev->irq.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Under some circumstances, using WC without the WC fifo is faster.
So we make it possible to tune wc_fifo with a module parameter.
Signed-off-by: Brice Goglin <brice@myri.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>