Add missing sg_init_table for sg_set_buf in virtio_net which
induced in defer skb patch.
Reported-by: Thomas Müller <thomas@mathtm.de>
Tested-by: Thomas Müller <thomas@mathtm.de>
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Move sg structure off stack and into virtnet_info structure.
This helps remove extra sg_init_table calls as well as reduce
stack usage.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Converts the list and the core manipulating with it to be the same as uc_list.
+uses two functions for adding/removing mc address (normal and "global"
variant) instead of a function parameter.
+removes dev_mcast.c completely.
+exposes netdev_hw_addr_list_* macros along with __hw_addr_* functions for
manipulation with lists on a sandbox (used in bonding and 80211 drivers)
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add missing sg_init_table for sg_set_buf in virtio_net which
induced in defer skb patch.
Reported-by: Thomas Müller <thomas@mathtm.de>
Tested-by: Thomas Müller <thomas@mathtm.de>
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
This is no longer needed. I missed to remove this in
567ec874d1 ("net: convert multiple
drivers to use netdev_for_each_mc_addr, part6")
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Now we have a virtio detach API (in commit
f9bfbebf34), we don't need to track xmit
skbs in the virio_net driver, which improves transmission performance.
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch replaces dev->mc_count in all drivers (hopefully I didn't miss
anything). Used spatch and did small tweaks and conding style changes when
it was suitable.
Jirka
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_net receives packets from its pre-allocated vring buffers, then it
delivers these packets to upper layer protocols as skb buffs. So it's not
necessary to pre-allocate skb for each mergable buffer, then frees extra
skbs when buffers are merged into a large packet. This patch has deferred
skb allocation in receiving packets for both big packets and mergeable buffers
to reduce skb pre-allocations and skb frees. It frees unused buffers by calling
detach_unused_buf in vring, so recv skb queue is not needed.
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
I have seen RX stalls on a machine that experienced a suspected
OOM. After the stall, the RX buffer is empty on the guest side
and there are exactly 16 entries available on the host side. As
the number of entries is less than that required by a maximal
skb, the host cannot proceed.
The guest did not have a refill job scheduled.
My diagnosis is that an OOM had occured, with the delayed refill
job scheduled. The job was able to allocate at least one skb, but
not enough to overcome the minimum required by the host to proceed.
As the refill job would only reschedule itself if it failed completely
to allocate any skbs, this would lead to an RX stall.
The following patch removes this stall possibility by always
rescheduling the refill job until the ring is totally refilled.
Testing has shown that the RX stall no longer occurs whereas
previously it would occur within a day.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch introduces three macros to work with uc list from net drivers.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Only files where David Miller is the primary git-signer.
wireless, wimax, ixgbe, etc are not modified.
Compile tested x86 allyesconfig only
Not all files compiled (not x86 compatible)
Added a few > 80 column lines, which I ignored.
Existing checkpatch complaints ignored.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (34 commits)
net/fsl_pq_mdio: add module license GPL
can: fix WARN_ON dump in net/core/rtnetlink.c:rtmsg_ifinfo()
can: should not use __dev_get_by_index() without locks
hisax: remove bad udelay call to fix build error on ARM
ipip: Fix handling of DF packets when pmtudisc is OFF
qlge: Set PCIe reset type for EEH to fundamental.
qlge: Fix early exit from mbox cmd complete wait.
ixgbe: fix traffic hangs on Tx with ioatdma loaded
ixgbe: Fix checking TFCS register for TXOFF status when DCB is enabled
ixgbe: Fix gso_max_size for 82599 when DCB is enabled
macsonic: fix crash on PowerBook 520
NET: cassini, fix lock imbalance
ems_usb: Fix byte order issues on big endian machines
be2net: Bug fix to send config commands to hardware after netdev_register
be2net: fix to set proper flow control on resume
netfilter: xt_connlimit: fix regression caused by zero family value
rt2x00: Don't queue ieee80211 work after USB removal
Revert "ipw2200: fix oops on missing firmware"
decnet: netdevice refcount leak
netfilter: nf_nat: fix NAT issue in 2.6.30.4+
...
Conflicts:
drivers/net/usb/cdc_ether.c
All CDC ethernet devices of type USB_CLASS_COMM need to use
'&mbm_info'.
Signed-off-by: David S. Miller <davem@davemloft.net>
Commit
3d1285b (move virtnet_remove to .devexit.text)
introduced the first reference to __devexit in struct virtio_driver
virtio_net which upset modpost ("Section mismatch in reference from the
variable virtio_net to the function .devexit.text:virtnet_remove()").
Fix this by renaming virtio_net to virtio_net_driver.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Blame-taken-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (43 commits)
net: Fix 'Re: PACKET_TX_RING: packet size is too long'
netdev: usb: dm9601.c can drive a device not supported yet, add support for it
qlge: Fix firmware mailbox command timeout.
qlge: Fix EEH handling.
AF_RAW: Augment raw_send_hdrinc to expand skb to fit iphdr->ihl (v2)
bonding: fix a race condition in calls to slave MII ioctls
virtio-net: fix data corruption with OOM
sfc: Set ip_summed correctly for page buffers passed to GRO
cnic: Fix L2CTX_STATUSB_NUM offset in context memory.
MAINTAINERS: rt2x00 list is moderated
airo: Reorder tests, check bounds before element
mac80211: fix for incorrect sequence number on hostapd injected frames
libertas spi: fix sparse errors
mac80211: trivial: fix spelling in mesh_hwmp
cfg80211: sme: deauthenticate on assoc failure
mac80211: keep auth state when assoc fails
mac80211: fix ibss joining
b43: add 'struct b43_wl' missing declaration
b43: Fix Bugzilla #14181 and the bug from the previous 'fix'
rt2x00: Fix crypto in TX frame for rt2800usb
...
virtio net used to unlink skbs from send queues on error,
but ever since 48925e372f
we do not do this. This causes guest data corruption and crashes
with vhost since net core can requeue the skb or free it without
it being taken off the list.
This patch fixes this by queueing the skb after successful
transmit.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
* git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux-2.6-for-linus:
move virtrng_remove to .devexit.text
move virtballoon_remove to .devexit.text
virtio_blk: Revert serial number support
virtio: let header files include virtio_ids.h
virtio_blk: revert QUEUE_FLAG_VIRT addition
Rusty,
commit 3ca4f5ca73
virtio: add virtio IDs file
moved all device IDs into a single file. While the change itself is
a very good one, it can break userspace applications. For example
if a userspace tool wanted to get the ID of virtio_net it used to
include virtio_net.h. This does no longer work, since virtio_net.h
does not include virtio_ids.h.
This patch moves all "#include <linux/virtio_ids.h>" from the C
files into the header files, making the header files compatible with
the old ones.
In addition, this patch exports virtio_ids.h to userspace.
CC: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
The function virtnet_remove is used only wrapped by __devexit_p so define
it using __devexit.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Cc: David S. Miller <davem@davemloft.net>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Alex Williamson <alex.williamson@hp.com>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: netdev@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Saves us one cycle of alloc-add-free if the queue was full.
Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (modified)
Now we can tell the theoretical capacity remaining in the output
queue, virtio_net can waste entries by stopping the queue early.
It doesn't work in the case of indirect buffers and kmalloc failure,
but that's rare (we could drop the packet in that case, but other
drivers return TX_BUSY for similar reasons).
For the record, I think this patch reflects poorly on the linux
network API.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
We put the virtio_net_hdr into the skb's cb region; turn this into a
union to clean up the code slightly and allow future expansion.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mark McLoughlin <markmc@redhat.com>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
The virtio_net driver is complicated by the two methods of freeing old
xmit buffers (in addition to freeing old ones at the start of the xmit
path).
The original code used a 1/10 second timer attached to xmit_free(),
reset on every xmit. Before we orphaned skbs on xmit, the
transmitting userspace could block with a full socket until the timer
fired, the skb destructor was called, and they were re-woken.
So we added the VIRTIO_F_NOTIFY_ON_EMPTY feature: supporting devices
send an interrupt (even if normally suppressed) on an empty xmit ring
which makes us schedule xmit_tasklet(). This was a benchmark win.
Unfortunately, VIRTIO_F_NOTIFY_ON_EMPTY makes quite a lot of work: a
host which is faster than the guest will fire the interrupt every xmit
packet (slowing the guest down further). Attempting mitigation in the
host adds overhead of userspace timers (possibly with the additional
pain of signals), and risks increasing latency anyway if you get it
wrong.
In practice, this effect was masked by benchmarks which take advantage
of GSO (with its inherent transmit batching), but it's still there.
Now we orphan xmitted skbs, the pressure is off: remove both paths and
no longer request VIRTIO_F_NOTIFY_ON_EMPTY. Note that the current
QEMU will notify us even if we don't negotiate this feature (legal,
but suboptimal); a patch is outstanding to improve that.
Move the skb_orphan/nf_reset to after we've done the send and notified
the other end, for a slight optimization.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Mark McLoughlin <markmc@redhat.com>
This effectively reverts 99ffc696d1
"virtio: wean net driver off NETDEV_TX_BUSY".
The complexity of queuing an skb (setting a tasklet to re-xmit) is
questionable, especially once we get rid of the other reason for the
tasklet in the next patch.
If the skb won't fit in the tx queue, just return NETDEV_TX_BUSY.
This is frowned upon, so a followup patch uses a more complex solution.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
The complex transmit free logic was introduced to avoid hangs on
removing the ip_conntrack module and also because drivers aren't
generally supposed to keep stale skbs for unbounded times.
After some debate, it was decided that while doing skb_orphan()
generally is a rat's nest, we can do it in this driver. Following
patches take advantage of this.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Virtio IDs are spread all over the tree which makes assigning new IDs
bothersome. Putting them together should make the process less error-prone.
Signed-off-by: Fernando Luis Vazquez Cao <fernando@oss.ntt.co.jp>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
This API change means that virtio_net can tell how much capacity
remains for buffers. It's necessarily fuzzy, since
VIRTIO_RING_F_INDIRECT_DESC means we can fit any number of descriptors
in one, *if* we can kmalloc.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Dinesh Subhraveti <dineshs@us.ibm.com>
No need to put ethtool_ops in data, they should be const.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
These are all drivers that don't touch real hardware.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
If we run out of memory, use keventd to fill the buffer. There's a
report of this happening: "Page allocation failures in guest",
Message-ID: <20090713115158.0a4892b0@mjolnir.ossman.eu>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
- Allow setting UFO on virtio-net and advertise to host.
Signed-off-by: Sridhar Samudrala <sri@us.ibm.com>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch is inspired by patch recently posted by Johannes Berg. Basically what
my patch does is to group list and a count of addresses into newly introduced
structure netdev_hw_addr_list. This brings us two benefits:
1) struct net_device becames a bit nicer.
2) in the future there will be a possibility to operate with lists independently
on netdevices (with exporting right functions).
I wanted to introduce this patch before I'll post a multicast lists conversion.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bnx2.c | 4 +-
drivers/net/e1000/e1000_main.c | 4 +-
drivers/net/ixgbe/ixgbe_main.c | 6 +-
drivers/net/mv643xx_eth.c | 2 +-
drivers/net/niu.c | 4 +-
drivers/net/virtio_net.c | 10 ++--
drivers/s390/net/qeth_l2_main.c | 2 +-
include/linux/netdevice.h | 17 +++--
net/core/dev.c | 130 ++++++++++++++++++--------------------
9 files changed, 89 insertions(+), 90 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
This replaces find_vq/del_vq with find_vqs/del_vqs virtio operations,
and updates all drivers. This is needed for MSI support, because MSI
needs to know the total number of vectors upfront.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (+ lguest/9p compile fixes)
Add a linked list of all virtqueues for a virtio device: this helps for
debugging and is also needed for upcoming interface change.
Also, add a "name" field for clearer debug messages.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
We need to enforce the IP alignment on the non-mergeable RX path just
like the other RX path. Not doing so results in misaligned IP
headers.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Through a bug in the tun driver, I noticed that virtio_net is
producing bogus hdr_len values. In particular, it only includes
the IP header in the linear area, and excludes the entire TCP
header. This causes the TCP header to be copied twice for each
packet. (The bug omitted the second copy :)
This patch corrects this.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch converts unicast address list to standard list_head using
previously introduced struct netdev_hw_addr. It also relaxes the
locking. Original spinlock (still used for multicast addresses) is not
needed and is no longer used for a protection of this list. All
reading and writing takes place under rtnl (with no changes).
I also removed a possibility to specify the length of the address
while adding or deleting unicast address. It's always dev->addr_len.
The convertion touched especially e1000 and ixgbe codes when the
change is not so trivial.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
drivers/net/bnx2.c | 13 +--
drivers/net/e1000/e1000_main.c | 24 +++--
drivers/net/ixgbe/ixgbe_common.c | 14 ++--
drivers/net/ixgbe/ixgbe_common.h | 4 +-
drivers/net/ixgbe/ixgbe_main.c | 6 +-
drivers/net/ixgbe/ixgbe_type.h | 4 +-
drivers/net/macvlan.c | 11 +-
drivers/net/mv643xx_eth.c | 11 +-
drivers/net/niu.c | 7 +-
drivers/net/virtio_net.c | 7 +-
drivers/s390/net/qeth_l2_main.c | 6 +-
drivers/scsi/fcoe/fcoe.c | 16 ++--
include/linux/netdevice.h | 18 ++--
net/8021q/vlan.c | 4 +-
net/8021q/vlan_dev.c | 10 +-
net/core/dev.c | 195 +++++++++++++++++++++++++++-----------
net/dsa/slave.c | 10 +-
net/packet/af_packet.c | 4 +-
18 files changed, 227 insertions(+), 137 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>