Add per-device dma_mapping_ops support for CONFIG_X86_64 as POWER
architecture does:
This enables us to cleanly fix the Calgary IOMMU issue that some devices
are not behind the IOMMU (http://lkml.org/lkml/2008/5/8/423).
I think that per-device dma_mapping_ops support would be also helpful for
KVM people to support PCI passthrough but Andi thinks that this makes it
difficult to support the PCI passthrough (see the above thread). So I
CC'ed this to KVM camp. Comments are appreciated.
A pointer to dma_mapping_ops to struct dev_archdata is added. If the
pointer is non NULL, DMA operations in asm/dma-mapping.h use it. If it's
NULL, the system-wide dma_ops pointer is used as before.
If it's useful for KVM people, I plan to implement a mechanism to register
a hook called when a new pci (or dma capable) device is created (it works
with hot plugging). It enables IOMMUs to set up an appropriate
dma_mapping_ops per device.
The major obstacle is that dma_mapping_error doesn't take a pointer to the
device unlike other DMA operations. So x86 can't have dma_mapping_ops per
device. Note all the POWER IOMMUs use the same dma_mapping_error function
so this is not a problem for POWER but x86 IOMMUs use different
dma_mapping_error functions.
The first patch adds the device argument to dma_mapping_error. The patch
is trivial but large since it touches lots of drivers and dma-mapping.h in
all the architecture.
This patch:
dma_mapping_error() doesn't take a pointer to the device unlike other DMA
operations. So we can't have dma_mapping_ops per device.
Note that POWER already has dma_mapping_ops per device but all the POWER
IOMMUs use the same dma_mapping_error function. x86 IOMMUs use device
argument.
[akpm@linux-foundation.org: fix sge]
[akpm@linux-foundation.org: fix svc_rdma]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix bnx2x]
[akpm@linux-foundation.org: fix s2io]
[akpm@linux-foundation.org: fix pasemi_mac]
[akpm@linux-foundation.org: fix sdhci]
[akpm@linux-foundation.org: build fix]
[akpm@linux-foundation.org: fix sparc]
[akpm@linux-foundation.org: fix ibmvscsi]
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Muli Ben-Yehuda <muli@il.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@qumranet.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The E100 device can't work on current kernel (2.6.26-rc6) and will cause
kernel corruption on intel ixdp4xx.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This fixes a "trying to free already free IRQ" message and simplifies
the shutdown/suspend code by re-using already existing code when going
to suspend. The code is now symmetric with e100_resume.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andreas Mohr <andi@lisas.de>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Patch against netdev-2.6 follows.
--
writeX functions are not permitted on iomap-ped space change to iowriteX,
also pci_unmap pci_map-ped space on exit (instead of iounmap).
Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
On the systems that have cache incoherent DMA, including ARM, there
is a race condition between software allocating a new receive buffer
and hardware writing into a buffer. The two race on touching the last
Receive Frame Descriptor (RFD). It has its el-bit set and its next
link equal to 0. When hardware encounters this buffer it attempts to
write data to it and then update Status Word bits and Actual Count in
the RFD. At the same time software may try to clear the el-bit and
set the link address to a new buffer.
Since the entire RFD is once cache-line, the two write operations can
collide. This can lead to the receive unit stalling or interpreting
random memory as its receive area.
The fix is to set the el-bit on and the size to 0 on the next to last
buffer in the chain. When the hardware encounters this buffer it stops
and does not write to it at all. The hardware issues an RNR interrupt
with the receive unit in the No Resources state. Software can write
to the tail of the list because it knows hardware will stop on the
previous descriptor that was marked as the end of list.
Once it has a new next to last buffer prepared, it can clear the el-bit
and set the size on the previous one. The race on this buffer is safe
since the link already points to a valid next buffer and the software
can handle the race setting the size (assuming aligned 16 bit writes
are atomic with respect to the DMA read). If the hardware sees the
el-bit cleared without the size set, it will move on to the next buffer
and skip this one. If it sees the size set but the el-bit still set,
it will complete that buffer and then RNR interrupt and wait.
Signed-off-by: David Acker <dacker@roinet.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Using ARRAY_SIZE() on arrays of the form array[][K] makes it unnecessary
to know the value of K when checking its size.
Signed-off-by: Alejandro Martinez Ruiz <alex@flawedcode.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This makes the ->poll() routines of the E100, E1000, E1000E, IXGB, and
IXGBE drivers complete ->poll() consistently.
Now they will all break out when the amount of RX work done is less
than 'budget'.
At a later time, we may want put back code to include the TX work as
well (as at least one other NAPI driver does, but by in large NAPI
drivers do not do this). But if so, it should be done consistently
across the board to all of these drivers.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Auke Kok <auke-jan.h.kok@intel.com>
Drivers do this to try to break out of the ->poll()'ing loop
when the device is being brought administratively down.
Now that we have a napi_disable() "pending" state we are going
to solve that problem generically.
Signed-off-by: David S. Miller <davem@davemloft.net>
Adapted from Ian Wienand <ianw@gelato.unsw.edu.au>
Explicitly free the IRQ before removing the device to remove a
warning "Destroying IRQ without calling free_irq"
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Ian Wienand <ianw@gelato.unsw.edu.au>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
No need to convert to bytes and back - cleanup unneeded code.
Adapted from fix from 'Roel Kluin <12o3l@tiscali.nl>'
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Fix the various misspellings of "system", controller", "interrupt" and
"[un]necessary".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Fix some of the easy warnings in network device drivers.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
These have been superceded by the new ->get_sset_count() hook.
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
We now have struct net_device_stats embedded in struct net_device,
and the default ->get_stats() hook does the obvious thing for us.
Run through drivers/net/* and remove the driver-local storage of
statistics, and driver-local ->get_stats() hook where applicable.
This was just the low-hanging fruit in drivers/net; plenty more drivers
remain to be updated.
[ Resolved conflicts with napi_struct changes and fix sunqe build
regression... -DaveM ]
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
It's been a useless no-op for long enough in 2.6 so I figured it's time to
remove it. The number of people that could object because they're
maintaining unified 2.4 and 2.6 drivers is probably rather small.
[ Handled drivers added by netdev tree and some missed IRDA cases... -DaveM ]
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Since E100 timer is 2HZ, use rounding to make timer occur on the
correct boundary.
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.
In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.
The signature of the ->poll() call back goes from:
int foo_poll(struct net_device *dev, int *budget)
to
int foo_poll(struct napi_struct *napi, int budget)
The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.
The napi_struct is to be embedded in the device driver private data
structures.
Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.
With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
All drivers implement ethtool get_perm_addr the same way -- by calling
the generic function. So we can inline the generic function into the
caller and avoid going through the drivers.
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: David S. Miller <davem@davemloft.net>
Instead of all drivers reading pci config space to get the revision
ID, they can now use the pci_device->revision member.
This exposes some issues where drivers where reading a word or a dword
for the revision number, and adding useless error-handling around the
read. Some drivers even just read it for no purpose of all.
In devices where the revision ID is being copied over and used in what
appears to be the equivalent of hotpath, I have left the copy code
and the cached copy as not to influence the driver's performance.
Compile tested with make all{yes,mod}config on x86_64 and i386.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
The 82550 & 51 parts have an extended configuration block that
includes a bit "GMRC", required to enable the expected TCO behavior,
in config byte offset 22d. The config block sent by the failing driver
does include the extension area, but this bit is not initialised,
and the downlaod only specifies 0x16 bytes to be sent to the NIC
(thaht's bytes 00..21d). By initializing the GMRC bit, and extending
the download size for D102+ MACs, the problem is resolved.
Signed-off-by: David Graham <david.graham@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This reverts commit d52df4a35a.
This patch attempted to fix e100 for non-cache coherent memory
architectures by using the cb style code that eepro100 had and using
the EL and s bits from the RFD list. Unfortunately the hardware
doesn't work exactly like this and therefore this patch actually
breaks e100. Reverting the change brings it back to the previously
known good state for 2.6.22. The pending rewrite in progress to this
code can then be safely merged later.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
It appears that some systems still like e100 better if it uses
I/O access mode. Setting the new parameter use_io=1 will cause
all driver instances to use io mapping to access the register
space on the e100 device.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Seved Torstendahl <seved.torstendahl@netinsight.net> suggested to
let the module parameter for invalid eeprom checksum control the valid
mac address test.
If this bypass happens we should print a different message,
or at least one that is correct, maybe something like below
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
I was going to say that eepro100's speedo_rx_link() does the same DMA
abuse as e100, but then I noticed one little detail: eepro100 sets both
EL (end of list) and S (suspend) bits in the RFD as it chains it to the
RFD list. e100 was only setting the EL bit. Hmmm, that's interesting.
That means that if HW reads a RFD with the S-bit set, it'll process
that RFD and then suspend the receive unit. The receive unit will
resume when SW clears the S-bit. There is no need for SW to restart
the receive unit. Which means a lot of the receive unit state tracking
code in the driver goes away.
So here's a patch against 2.6.14. (Sorry for inlining it; the mailer
I'm using now will mess with the word wrap). I can't test this on
XScale (unless someone has an e100 module for Gumstix :) . It should
be doing exactly what eepro100 does with RFDs. I don't believe this
change will introduce a performance hit because the S-bit and EL-bit go
hand-in-hand meaning if we're going to suspend because of the S- bit,
we're on the last resource anyway, so we'll have to wait for SW to
replenish.
(cherry picked from 29e79da9495261119e3b2e4e7c72507348e75976 commit)
To clearly state the intent of copying to linear sk_buffs, _offset being a
overly long variant but interesting for the sake of saving some bytes.
Signed-off-by: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
e100: fix napi ifdefs removing needed code
From: Auke Kok <auke-jan.h.kok@intel.com>
The e100 driver is NAPI mode only. We need to netif_poll_disable
during suspend and shutdown. The non-NAPI driver code was removed
and is only avaiable in the out-of-tree e100 kernel driver.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
e100: fix irq leak on suspend/resume
From: Frederik Deweerdt <frederik.deweerdt@gmail.com>
The e100_resume() function should be calling netif_device_detach and
free_irq. This fixes multiple irq's being allocated after resume.
Signed-off-by: Frederik Deweerdt <frederik.deweerdt@gmail.com>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Conflicts:
drivers/infiniband/core/iwcm.c
drivers/net/chelsio/cxgb2.c
drivers/net/wireless/bcm43xx/bcm43xx_main.c
drivers/net/wireless/prism54/islpci_eth.c
drivers/usb/core/hub.h
drivers/usb/input/hid-core.c
net/core/netpoll.c
Fix up merge failures with Linus's head and fix new compilation failures.
Signed-Off-By: David Howells <dhowells@redhat.com>
Fix various .c/.h typos in comments (no code changes).
Signed-off-by: Matt LaPlante <kernel1@cyberdogtech.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Account for the interface being closed before disabling polling
on a device, to fix shutdown on some systems that explcitly close
the netdevice before calling shutdown.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
When rebooting with netconsole over e100, the driver shutdown code would
deadlock with netpoll. Reduce shutdown code to a bare minimum while retaining
WoL and suspend functionality.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
Cc: Jeff Garzik <jeff@garzik.org>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
Maintain a per-CPU global "struct pt_regs *" variable which can be used instead
of passing regs around manually through all ~1800 interrupt handlers in the
Linux kernel.
The regs pointer is used in few places, but it potentially costs both stack
space and code to pass it around. On the FRV arch, removing the regs parameter
from all the genirq function results in a 20% speed up of the IRQ exit path
(ie: from leaving timer_interrupt() to leaving do_IRQ()).
Where appropriate, an arch may override the generic storage facility and do
something different with the variable. On FRV, for instance, the address is
maintained in GR28 at all times inside the kernel as part of general exception
handling.
Having looked over the code, it appears that the parameter may be handed down
through up to twenty or so layers of functions. Consider a USB character
device attached to a USB hub, attached to a USB controller that posts its
interrupts through a cascaded auxiliary interrupt controller. A character
device driver may want to pass regs to the sysrq handler through the input
layer which adds another few layers of parameter passing.
I've build this code with allyesconfig for x86_64 and i386. I've runtested the
main part of the code on FRV and i386, though I can't test most of the drivers.
I've also done partial conversion for powerpc and MIPS - these at least compile
with minimal configurations.
This will affect all archs. Mostly the changes should be relatively easy.
Take do_IRQ(), store the regs pointer at the beginning, saving the old one:
struct pt_regs *old_regs = set_irq_regs(regs);
And put the old one back at the end:
set_irq_regs(old_regs);
Don't pass regs through to generic_handle_irq() or __do_IRQ().
In timer_interrupt(), this sort of change will be necessary:
- update_process_times(user_mode(regs));
- profile_tick(CPU_PROFILING, regs);
+ update_process_times(user_mode(get_irq_regs()));
+ profile_tick(CPU_PROFILING);
I'd like to move update_process_times()'s use of get_irq_regs() into itself,
except that i386, alone of the archs, uses something other than user_mode().
Some notes on the interrupt handling in the drivers:
(*) input_dev() is now gone entirely. The regs pointer is no longer stored in
the input_dev struct.
(*) finish_unlinks() in drivers/usb/host/ohci-q.c needs checking. It does
something different depending on whether it's been supplied with a regs
pointer or not.
(*) Various IRQ handler function pointers have been moved to type
irq_handler_t.
Signed-Off-By: David Howells <dhowells@redhat.com>
(cherry picked from 1b16e7ac850969f38b375e511e3fa2f474a33867 commit)
Unify our shutdown/suspend/resume code and make it similar to e1000:
e1000_shutdown now calls suspend which does the exact same thing on
shutdown except saving PCI config state on suspend. WoL setup code
is now also more simple and works even when CONFIG_PM is not set, which
was previously broken.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
We keep getting requests from people that think that this might be
an exploitable hole where we would overwrite 4 bytes in the netdev
struct if the pci name would exceed 15 characters. In reality this
will never happen but we fix it anyway.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
This update to the copyright header adds the mailinglist, and aligns it
with the kernel licensing as well as remove the offending 'all rights
reserved'.
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>
A recent patch in -mm3 titled
"gregkh-pci-pci-don-t-enable-device-if-already-enabled.patch" causes
pci_enable_device() to be a no-op if the kernel thinks that the device is
already enabled. This change breaks the PCI error recovery mechanism in
the e100 device driver, since, after PCI slot reset, the card is no longer
enabled. This is a trivial fix for this problem. Tested.
Signed-off-by: Linas Vepstas <linas@austin.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Auke Kok <auke-jan.h.kok@intel.com>