Enable ibmveth for Cooperative Memory Overcommitment (CMO). For this driver
it means calculating a desired amount of IO memory based on the current MTU
and updating this value with the bus when MTU changes occur. Because DMA
mappings can fail, we have added a bounce buffer for temporary cases where
the driver can not map IO memory for the buffer pool.
The following changes are made to enable the driver for CMO:
* DMA mapping errors will not result in error messages if entitlement has
been exceeded and resources were not available.
* DMA mapping errors are handled gracefully, ibmveth_replenish_buffer_pool()
is corrected to check the return from dma_map_single and fail gracefully.
* The driver will have a get_desired_dma function defined to function
in a CMO environment.
* When the MTU is changed, the driver will update the device IO entitlement
Signed-off-by: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Removes the use of bitfields from the ibmveth driver. This results
in slightly smaller object code.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Removes dead frag processing code from ibmveth. Since NETIF_F_SG was
not set, this code was never executed. Also, since the ibmveth
interface can only handle 6 fragments, core networking code would need
to be modified in order to efficiently enable this support.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch adds the appropriate ethtool hooks to allow for enabling/disabling
of hypervisor assisted checksum offload for TCP.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patchset enables TCP checksum offload support for IPV4
on ibmveth. This completely eliminates the generation and checking of
the checksum for packets that are completely virtual and never
touch a physical network. A simple TCP_STREAM netperf run on
a virtual network with maximum mtu set yielded a ~30% increase
in throughput. This feature is enabled by default on systems that
support it, but can be disabled with a module option.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Several devices have multiple independant RX queues per net
device, and some have a single interrupt doorbell for several
queues.
In either case, it's easier to support layouts like that if the
structure representing the poll is independant from the net
device itself.
The signature of the ->poll() call back goes from:
int foo_poll(struct net_device *dev, int *budget)
to
int foo_poll(struct napi_struct *napi, int budget)
The caller is returned the number of RX packets processed (or
the number of "NAPI credits" consumed if you want to get
abstract). The callee no longer messes around bumping
dev->quota, *budget, etc. because that is all handled in the
caller upon return.
The napi_struct is to be embedded in the device driver private data
structures.
Furthermore, it is the driver's responsibility to disable all NAPI
instances in it's ->stop() device close handler. Since the
napi_struct is privatized into the driver's private data structures,
only the driver knows how to get at all of the napi_struct instances
it may have per-device.
With lots of help and suggestions from Rusty Russell, Roland Dreier,
Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
[ Ported to current tree and all drivers converted. Integrated
Stephen's follow-on kerneldoc additions, and restored poll_list
handling to the old style to fix mutual exclusion issues. -DaveM ]
Signed-off-by: Stephen Hemminger <shemminger@linux-foundation.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Remove the now unused "liobn" field in ibmveth which also avoids
having insider knowledge of the iommu table in that driver.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
Our pseries hcall interfaces are out of control:
plpar_hcall_norets
plpar_hcall
plpar_hcall_8arg_2ret
plpar_hcall_4out
plpar_hcall_7arg_7ret
plpar_hcall_9arg_9ret
Create 3 interfaces to cover all cases:
plpar_hcall_norets: 7 arguments no returns
plpar_hcall: 6 arguments 4 returns
plpar_hcall9: 9 arguments 9 returns
There are only 2 cases in the kernel that need plpar_hcall9, hopefully
we can keep it that way.
Pass in a buffer to stash return parameters so we avoid the &dummy1,
&dummy2 madness.
Signed-off-by: Anton Blanchard <anton@samba.org>
--
Signed-off-by: Paul Mackerras <paulus@samba.org>
Move all the Hypervisor call definitions to to a single header file.
Signed-off-by: Mike Kravetz <kravetz@us.ibm.com>
Signed-off-by: Paul Mackerras <paulus@samba.org>
This patch provides a sysfs interface to change some properties of the
ibmveth buffer pools (size of the buffers, number of buffers per pool,
and whether a pool is active). Ethernet drivers use ethtool to provide
this type of functionality. However, the buffers in the ibmveth driver
can have an arbitrary size (not only regular, mini, and jumbo which are
the only sizes that ethtool can change), and also ibmveth can have an
arbitrary number of buffer pools
Under heavy load we have seen dropped packets which obviously kills TCP
performance. We have created several fixes that mitigate this issue,
but we definitely need a way of changing the number of buffers for an
adapter dynamically. Also, changing the size of the buffers allows
users to change the MTU to something big (bigger than a jumbo frame)
greatly improving performance on partition to partition transfers.
The patch creates directories pool1...pool4 in the device directory in
sysfs, each with files: num, size, and active (which default to the
values in the mainline version).
Comments and suggestions are welcome...
--
Santiago A. Leon
Power Linux Development
IBM Linux Technology Center
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This patch adds the lockless TX feature to the ibmveth driver. The
hypervisor has its own locking so the only change that is necessary is
to protect the statistics counters.
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch removes the allocation of RX skb's buffers from a workqueue
to be called directly at RX processing time. This change was suggested
by Dave Miller when the driver was starving the RX buffers and
deadlocking under heavy traffic:
> Allocating RX SKBs via tasklet is, IMHO, the worst way to
> do it. It is no surprise that there are starvation cases.
>
> If tasklets or work queues get delayed in any way, you lose,
> and it's very easy for a card to catch up with the driver RX'ing
> packets very fast, no matter how aggressive you make the
> replenishing. By the time you detect that you need to be
> "more aggressive" it is already too late.
> The only pseudo-reliable way is to allocate at RX processing time.
>
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
This patch changes the way the ibmveth driver handles the receive
buffers. The old code mallocs and maps all the buffers in the pools
regardless of MTU size and it also limits the number of buffer pools to
three. This patch makes the driver malloc and map the buffers necessary
to support the current MTU. It also changes the hardcoded names of the
buffer pool number, size, and elements to arrays to make it easier to
change (with the hope of making them runtime parameters in the future).
Signed-off-by: Santiago Leon <santil@us.ibm.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
Initial git repository build. I'm not bothering with the full history,
even though we have it. We can create a separate "historical" git
archive of that later if we want to, and in the meantime it's about
3.2GB when imported into git - space that would just make the early
git days unnecessarily complicated, when we don't have a lot of good
infrastructure for it.
Let it rip!