Commit graph

200 commits

Author SHA1 Message Date
Tejun Heo
203b42f731 workqueue: make deferrable delayed_work initializer names consistent
Initalizers for deferrable delayed_work are confused.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-21 13:18:23 -07:00
Linus Torvalds
1e30c1b386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates and fixes from David Miller:

1) Reinstate the no-ref optimization for input route lookups in ipv4 to
   fix some routing cache removal perf regressions.

2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet.

3) Get RX hash value from correct place in be2net driver, from
   Sarveshwar Bandi.

4) Validation of FIB cached routes missing critical check, from Eric
   Dumazet.

5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
  ipv6: Early TCP socket demux
  ipv4: Fix input route performance regression.
  pch_gbe: vlan skb len fix
  pch_gbe: add extra clean tx
  pch_gbe: fix transmit watchdog timeout
  ixgbe: fix panic while dumping packets on Tx hang with IOMMU
  be2net: Fix to parse RSS hash from Receive completions correctly.
  net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
  hyperv: Add error handling to rndis_filter_device_add()
  hyperv: Add a check for ring_size value
  ipv4: rt_cache_valid must check expired routes
  net/pch_gpe: Cannot disable ethernet autonegation
  qeth: repair crash in qeth_l3_vlan_rx_kill_vid()
  netiucv: cleanup attribute usage
  net: wiznet add missing HAS_IOMEM dependency
  be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC
  mlx4: Add support for EEH error recovery
  cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN
  wanmain: comparing array with NULL
  caif: fix NULL pointer check
  ...
2012-07-26 18:09:01 -07:00
Amir Vadai
ee64c0ee51 net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
RFS filter id can't have the special value RPS_NO_FILTER,
need to skip it when allocating id's.

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 00:23:55 -07:00
Kleber Sacilotto de Souza
57dbf29a54 mlx4: Add support for EEH error recovery
Currently the mlx4 drivers don't have the necessary callbacks to
implement EEH errors detection and recovery, so the PCI layer uses the
probe and remove callbacks to try to recover the device after an error on
the bus. However, these callbacks have race conditions with the internal
catastrophic error recovery functions, which will also detect the error
and this can cause the system to crash if both EEH and catas functions
try to reset the device.

This patch adds the necessary error recovery callbacks and makes sure
that the internal catastrophic error functions will not try to reset the
device in such scenarios. It also adds some calls to
pci_channel_offline() to suppress reads/writes on the bus when the slot
cannot accept I/O operations so we prevent unnecessary accesses to the
bus and speed up the device removal.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-25 15:24:13 -07:00
Linus Torvalds
5dedb9f3bd InfiniBand/RDMA changes for the 3.6 merge window:
- Updates to the qib low-level driver
  - First chunk of changes for SR-IOV support for mlx4 IB
  - RDMA CM support for IPv6-only binding
  - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
 a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
 ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
 xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
 wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
 ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
 goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
 +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
 ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
 I7h9iJ1OMKFZ8bzmHsWb
 =f09j
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Updates to the qib low-level driver
 - First chunk of changes for SR-IOV support for mlx4 IB
 - RDMA CM support for IPv6-only binding
 - Other misc cleanups and fixes

Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c

* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
  IB/qib: checkpatch fixes
  IB/qib: Add congestion control agent implementation
  IB/qib: Reduce sdma_lock contention
  IB/qib: Fix an incorrect log message
  IB/qib: Fix QP RCU sparse warnings
  mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
  mlx4_core: Allow guests to have IB ports
  mlx4_core: Implement mechanism for reserved Q_Keys
  net/mlx4_core: Free ICM table in case of error
  IB/cm: Destroy idr as part of the module init error flow
  mlx4_core: Remove double function declarations
  IB/mlx4: Fill the masked_atomic_cap attribute in query device
  IB/mthca: Fill in sq_sig_type in query QP
  IB/mthca: Warning about event for non-existent QPs should show event type
  IB/qib: Fix sparse RCU warnings in qib_keys.c
  net/mlx4_core: Initialize IB port capabilities for all slaves
  mlx4: Use port management change event instead of smp_snoop
  IB/qib: RCU locking for MR validation
  IB/qib: Avoid returning EBUSY from MR deregister
  IB/qib: Fix UC MR refs for immediate operations
  ...
2012-07-24 13:56:26 -07:00
Roland Dreier
089117e1ad Merge branches 'cma', 'cxgb4', 'misc', 'mlx4-sriov', 'mlx-cleanups', 'ocrdma' and 'qib' into for-linus 2012-07-22 23:26:17 -07:00
Thadeu Lima de Souza Cascardo
4cce66cdd1 mlx4_en: map entire pages to increase throughput
In its receive path, mlx4_en driver maps each page chunk that it pushes
to the hardware and unmaps it when pushing it up the stack. This limits
throughput to about 3Gbps on a Power7 8-core machine.

One solution is to map the entire allocated page at once. However, this
requires that we keep track of every page fragment we give to a
descriptor. We also need to work with the discipline that all fragments will
be released (in the sense that it will not be reused by the driver
anymore) in the order they are allocated to the driver.

This requires that we don't reuse any fragments, every single one of
them must be reallocated. We do that by releasing all the fragments that
are processed and only after finished processing the descriptors, we
start the refill.

We also must somehow guarantee that we either refill all fragments in a
descriptor or none at all, without resorting to giving up a page
fragment that we would have already given. Otherwise, we would break the
discipline of only releasing the fragments in the order they were
allocated.

This has passed page allocation fault injections (restricted to the
driver by using required-start and required-end) and device hotplug
while 16 TCP streams were able to deliver more than 9Gbps.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:53:13 -07:00
Amir Vadai
1eb8c695bd net/mlx4_en: Add accelerated RFS support
Use RFS infrastructure and flow steering in HW to keep CPU
affinity of rx interrupts and application per TCP stream.

A flow steering filter is added to the HW whenever the RFS
ndo callback is invoked by core networking code.

Because the invocation takes place in interrupt context, the
actual setup of HW is done using workqueue. Whenever new filter
is added, the driver checks for expiry of existing filters.

Since there's window in time between the point where the core
RFS code invoked the ndo callback, to the point where the HW
is configured from the workqueue context, the 2nd, 3rd etc
packets from that stream will cause the net core to invoke
the callback again and again.

To prevent inefficient/double configuration of the HW, the filters
are kept in a database which is indexed using hash function to enable
fast access.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai
d9236c3f10 {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq
Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap.
If supplied, the assigned IRQ is tracked using rmap infrastructure.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai
af22d9de45 net/mlx4: Move MAC_MASK to a common place
Define this macro is one common place instead of duplicating it over the code

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Dan Carpenter
9c64508af2 net/mlx4_en: dereferencing freed memory
We dereferenced "mclist" after the kfree().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 22:58:07 -07:00
Dan Carpenter
447458c01f net/mlx4: off by one in parse_trans_rule()
This should be ">=" here instead of ">".  MLX4_NET_TRANS_RULE_NUM is 6.
We use "spec->id" as an array offset into the __rule_hw_sz[] and
__sw_id_hw[] arrays which have 6 elements.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 22:57:43 -07:00
Jack Morgenstein
6634961c14 mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
To allow easy paravirtualization of P_Key and GID table sizes, keep
paravirtualized sizes in mlx4_dev->caps, but save the actual physical
sizes from FW in struct: mlx4_dev->phys_cap.

In addition, in SR-IOV mode, do the following:

1. Reduce reported P_Key table size by 1.
   This is done to reserve the highest P_Key index for internal use,
   for declaring an invalid P_Key in P_Key paravirtualization.
   We require a P_Key index which always contain an invalid P_Key
   value for this purpose (i.e., one which cannot be modified by
   the subnet manager).  The way to do this is to reduce the
   P_Key table size reported to the subnet manager by 1, so that
   it will not attempt to access the P_Key at index #127.

2. Paravirtualize the GID table size to 1. Thus, each guest sees
   only a single GID (at its paravirtualized index 0).

In addition, since we are paravirtualizing the GID table size to 1, we
add paravirtualization of the master GID event here (i.e., we do not
do ib_dispatch_event() for the GUID change event on the master, since
its (only) GUID never changes).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:52:23 -07:00
Jack Morgenstein
105c320f6a mlx4_core: Allow guests to have IB ports
Modify mlx4_dev_cap to allow IB support when SR-IOV is active.  Modify
mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and
pass that to the guests (this is done in QUERY_FUNC_CAP and its
wrapper).

However, we don't activate IB support quite yet -- we leave the error
return at the start of mlx4_ib_add in the mlx4_ib driver.

In addition, set "protected fmr supported" bit to zero in the
QUERY_FUNC_CAP wrapper.

Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which
checks for the port type (IB or Ethernet).  Previously, this was not
an issue, since only Ethernet ports were supported.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:51:37 -07:00
Jack Morgenstein
396f2feb05 mlx4_core: Implement mechanism for reserved Q_Keys
The SR-IOV special QP tunneling mechanism uses proxy special QPs
(instead of the real special QPs) for MADs on guests.  These proxy QPs
send their packets to a "tunnel" QP owned by the master.  The master
then forwards the MAD (after any required paravirtualization) to the
real special QP, which sends out the MAD.

For security reasons (i.e., to prevent guests from sending MADs to
tunnel QPs belonging to other guests), each proxy-tunnel QP pair is
assigned a unique, reserved, Q_Key.  These Q_Keys are available only
for proxy and tunnel QPs -- if the guest tries to use these Q_Keys
with other QPs, it will fail.

This patch introduces a mechanism for reserving a block of 64K Q_Keys
for proxy/tunneling use.

The patch introduces also two new fields into mlx4_dev: base_sqpn and
base_tunnel_sqpn.

In SR-IOV mode, the QP numbers for the "real," proxy, and tunnel sqps
are added to the reserved QPN area (so that they will not change).
There are 8 special QPs per port in the HCA, and each of them is
assigned both a proxy and a tunnel QP, for each VF and for the PF as
well in SR-IOV mode.

The QPNs for these QPs are arranged as follows:
 1. The real SQP numbers (8)
 2. The proxy SQPs (8 * (max number of VFs + max number of PFs)
 3. The tunnel SQPs (8 * (max number of VFs + max number of PFs)

To support these QPs, two new fields are added to struct mlx4_dev:

  base_sqp:  this is the QP number of the first of the real SQPs
  base_tunnel_sqp: this is the qp number of the first qp in the tunnel
                   sqp region. (On guests, this is the first tunnel
                   sqp of the 8 which are assigned to that guest).

In addition, in SR-IOV mode, sqp_start is the number of the first
proxy SQP in the proxy SQP region.  (In guests, this is the first
proxy SQP of the 8 which are assigned to that guest)

Note that in non-SR-IOV mode, there are no proxies and no tunnels.
In this case, sqp_start is set to sqp_base -- which minimizes code
changes.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:35:55 -07:00
Dotan Barak
240a9207aa net/mlx4_core: Free ICM table in case of error
In mlx4_init_icm_table(), free the allocated table if we failed to
allocate memory to its entries.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:58 -07:00
Dotan Barak
f457ce471c mlx4_core: Remove double function declarations
Spotted four duplicate declarations in icm.h, remove them.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:58 -07:00
Jack Morgenstein
2aca1172c2 net/mlx4_core: Initialize IB port capabilities for all slaves
With IB SR-IOV, each slave has its own separate copy of the port
capabilities flags.  For example, the master can run a subnet manager
(which causes the IsSM bit to be set in the master's port
capabilities) without affecting the port capabilities seen by the
slaves (the IsSM bit will be seen as cleared in the slaves).

Also add a static inline mlx4_master_func_num() to enhance readability
of the code.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:57:06 -07:00
Jack Morgenstein
00f5ce99dc mlx4: Use port management change event instead of smp_snoop
The port management change event can replace smp_snoop.  If the
capability bit for this event is set in dev-caps, the event is used
(by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
mask in the MAP_EQ fw command).  In this case, when the driver passes
incoming SMP PORT_INFO SET mads to the FW, the FW generates port
management change events to signal any changes to the driver.

If the FW generates these events, smp_snoop shouldn't be invoked in
ib_process_mad(), or duplicate events will occur (once from the
FW-generated event, and once from smp_snoop).

In the case where the FW does not generate port management change
events smp_snoop needs to be invoked to create these events.  The flow
in smp_snoop has been modified to make use of the same procedures as
in the fw-generated-event event case to generate the port management
events (LID change, Client-rereg, Pkey change, and/or GID change).

Port management change event handling required changing the
mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
(last argument) had to be changed to unsigned long in order to
accomodate passing the EQE pointer.

We also needed to move the definition of struct mlx4_eqe from
net/mlx4.h to file device.h -- to make it available to the IB driver,
to handle port management change events.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:47:10 -07:00
Jack Morgenstein
752a50cab6 mlx4_core: Pass an invalid PCI id number to VFs
Currently, VFs have 0 in their dev->caps.function field.  This is a
valid pci id (usually of the PF).  Instead, pass an invalid PCI id to
the VF via QUERY_FW, so that if the value gets accessed in the VF
driver, we'll catch the problem.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:05 -07:00
Hadar Hen Zion
cabdc8ee37 net/mlx4_en: Add support for drop action through ethtool
The drop action is implemented by allocating a QP and keeping it in a reset state
such that the HW drops any packets which are steered to that QP. When a drop action
is requested, we attach the relevant flow to that QP.

Sign-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion
820672812f net/mlx4_en: Manage flow steering rules with ethtool
Implement the ethtool APIs for attaching L2/L3/L4 based flow steering
rules to the netdevice RX rings. Added set_rxnfc callback and enhanced
the existing get_rxnfc callback.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion
592e49dda8 net/mlx4: Implement promiscuous mode with device managed flow-steering
The device managed flow steering API has three promiscuous modes:

1. Uplink - captures all the packets that arrive to the port.
2. Allmulti - captures all multicast packets arriving to the port.
3. Function port - for future use, this mode is not implemented yet.

Use these modes with the flow_attach and flow_detach firmware commands
according to the promiscuous state of the netdevice.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion
1b9c6b064e net/mlx4_core: Add resource tracking for device managed flow steering rules
As with other device resources, the resource tracker is needed for supporting
device managed flow steering rules under SRIOV: make sure virtual functions
delete only rules created by them, and clean all rules attached by a crashed VF.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion
0ff1fb654b {NET, IB}/mlx4: Add device managed flow steering firmware API
The driver is modified to support three operation modes.

If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion
8fcfb4db74 net/mlx4_core: Add firmware commands to support device managed flow steering
Add support for firmware commands to attach/detach a new device managed
steering mode. Such network steering rules allow the user to provide an
L2/L3/L4 flow specification to the firmware and have the device to steer
traffic that matches that specification to the provided QP.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion
c96d97f4d1 net/mlx4: Set steering mode according to device capabilities
Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.

This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.

A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.

B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,

The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Yevgeny Petrilin
6d19993788 net/mlx4_en: Re-design multicast attachments flow
Currently, for every change in the net device multicast list, the driver
detaches all the addresses from the HW device, and then attaches the
updated list. This behavior is wrong from two aspects: first, it causes
a load of firmware commands and second, there is period of time where
the correct addresses are not attached, which turned into packet loss.

To improve - a copy of the multicast list is saved by the driver. For
every change in the multicast list, the multicast list copy is used
to find the delta between those two lists and add or remove multicast
addresses as needed.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion
aa1ec3dde1 net/mlx4_core: Change resource tracking ID to be 64 bit
Currently the IDs used by the resource tracker are of type u32, so far this was
ok since all the different resources we were tracking could be encoded in 32bit.

As a preparation step for tracking of resources whose IDs need > 32 bits such
as network flow steering rules, who are 64 bit in size, move to use 64 bit
based resource IDs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion
4af1c0488d net/mlx4_core: Change resource tracking mechanism to use red-black tree
Change the data structure used for managing the SRIOV resource tracking
mechanism from radix tree to red-black tree. This is preparation step
for supporting resource IDs which are 64bit long, such as network flow
steering rules. Such IDs can't be used as radix-tree keys on 32bit
architectures and hence the reason for the change.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Yuval Mintz
90b1ebe7af mlx4: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 03:06:44 -07:00
David S. Miller
b26d344c6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/caif/caif_hsi.c
	drivers/net/usb/qmi_wwan.c

The qmi_wwan merge was trivial.

The caif_hsi.c, on the other hand, was not.  It's a conflict between
1c385f1fdf ("caif-hsi: Replace platform
device with ops structure.") in the net-next tree and commit
39abbaef19 ("caif-hsi: Postpone init of
HIS until open()") in the net tree.

I did my best with that one and will ask Sjur to check it out.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 17:37:00 -07:00
Yevgeny Petrilin
044ca2a5f2 net/mlx4_en: Release QP range in free_resources
Add a missing resource release in ring cleanup.
Not doing this leaves a range of QPs that are being reserved,
and no one can use them.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
Yevgeny Petrilin
9858d2d1ac net/mlx4: Use single completion vector after NOP failure
Fix a crash at the error flow of NOP command which caused the driver to try and use
a completion vector which wasn't allocated.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
Yevgeny Petrilin
5c8e904666 net/mlx4_en: Set correct port parameters during device initialization
Set valid port parameters: MTU and flow control configuration when
configuring the port during HW device initialization,
prior to the net device open() being called.
Using  invalid parameters (such as all zeros)
could lead to bad firmware behavior.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
David S. Miller
7e52b33bd5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/route.c

This deals with a merge conflict between the net-next addition of the
inetpeer network namespace ops, and Thomas Graf's bug fix in
2a0c451ade which makes sure we don't
register /proc/net/ipv6_route before it is actually safe to do so.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 15:51:55 -07:00
Linus Torvalds
ae501be0f6 InfiniBand/RDMA fixes for 3.5-rc2, all in hardware drivers:
- Fix crash in cxgb4
  - Fixes to new ocrdma driver
  - Regression fixes for mlx4
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPz5V+AAoJEENa44ZhAt0hsXoQAIFBNbsLWLft4J9jpyDAFj5z
 SpspcZAVgJW+K7I6j38SdQXMsR/XmcwM8Yh3iJIa/YG0P/6bWXDNU20itN62mAcI
 D+LphECUv/T8PGxEcAFlwjLYy5kcgFBTs/t7AgVS57OL4z8YuDLDr7uhdcnPdq7t
 AMIptqhXZgSnM/peIkf2SWBYxHSUMhukrlu5uNB9uc9GC9VRGhioXxT3eBPwr54O
 7BwHxQGDYtFZ4fYKmFxN9sJJmFf9nl/WhYM2HwCMPEe4UzlxxfbP7/c17QNS7YGo
 e072Jvs+U2ttb4J7J2yBRcpiiSjYDAVi+7fE9OtAdWyfPDa9MmcajVq7PUnohZLc
 muNSv1RGOebj2mSjE+oc1oWZKkM/nSEwbIE7PJOvWX2RdqOs8xNNC0RjgvsXDvyf
 +lARPcyolxpJXtvIlLY/Ua1KhJf52zu+Kp1QCbEnloU/NuGcc09It6Wq6X1yU/Gs
 N29qjuFOBoHQxDzfWPc3Uhp1/eNI8UJiTfO9CSYHv2ZHJzW8BqoblbKlE2l43TDg
 frH1l1/9RYY7gTGoCVvfJByvSWM7rBJdNfpu+yJDm5en86xcF6HB7GG0nJ/okpw2
 dUqQsLu+nTKZn8KCM3jzgP/ANT+PWk1UZlS1fIaMFMwa1SLWYqbaW/KEbRrY5IeP
 H0q/TFhJH8PccBjPbcsT
 =zl8u
 -----END PGP SIGNATURE-----

Merge tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA fixes from Roland Dreier:
 "All in hardware drivers:
   - Fix crash in cxgb4
   - Fixes to new ocrdma driver
   - Regression fixes for mlx4"

* tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Fix max_wqe capacity reported from query device
  mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow
  IB/mlx4: Fix EQ deallocation in legacy mode
  RDMA/cxgb4: Fix crash when peer address is 0.0.0.0
  RDMA/ocrdma: Remove unnecessary version.h includes
  RDMA/ocrdma: Fix signaled event for SRQ_LIMIT_REACHED
  RDMA/ocrdma: Correct queue free count math
2012-06-06 10:45:21 -07:00
Jack Morgenstein
edc4a67e15 mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow
Commit 096335b3f9 ("mlx4_core: Allow dynamic MTU configuration for
IB ports") modifies the port VL setting.  This exposes a bug in
mlx4_common_set_port(), where the VL cap value passed in (inside the
command mailbox) is incorrectly zeroed-out:

mlx4_SET_PORT modifies the VL_cap field (byte 3 of the mailbox).
Since the SET_PORT command is paravirtualized on the master as well as
on the slaves, mlx4_SET_PORT_wrapper() is invoked on the master.  This
calls mlx4_common_set_port() where mailbox byte 3 gets overwritten by
code which should only set a single bit in that byte (for the reset
qkey counter flag) -- but instead overwrites the entire byte.

The result is that when running in SR-IOV mode, the VL_cap will be set
to zero -- fix this.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-06 10:07:54 -07:00
Joe Perches
6469933605 ethernet: Remove casts to same type
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.

For example, this cast:

        int y;
        int *p = (int *)&y;

I used the coccinelle script below to find and remove these
unnecessary casts.  I manually removed the conversions this
script produces of casts with __force, __iomem and __user.

@@
type T;
T *p;
@@

-       (T *)p
+       p

A function in atl1e_main.c was passed a const pointer
when it actually modified elements of the structure.

Change the argument to a non-const pointer.

A function in stmmac needed a __force to avoid a sparse
warning.  Added it.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-06 09:31:33 -07:00
Jack Morgenstein
401453a31e net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP
The "!mlx4_is_slave" is totally confusing.  Fix with
constant MLX4_CMD_NATIVE, which is the intended behavior.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein
6230bb234d net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap
The range check was performed after using the port number.

Reverse this to prevent a potential array overflow.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein
b91cb3ebcd net/mlx4_core: Fixes for VF / Guest startup flow
- pass the following parameters:
  - firmware version (added QUERY_FW paravirtualization for that)

  - disable Blueflame on slaves. KVM disables write combining on guests,
    and we get better performance without BF in this case. (This requires
    QUERY_DEV_CAP paravirtualization, also in this commit)

  - max qp rdma as destination

- get rid of a chunk of "if (0)" dead code

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein
13bf58b760 net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event
Port is used as an array index before we know if that is proper.

For example, in the catas event case, port is zero; however,
the port index should lie in the range (1..2).

Fix this by using 'port' only in the events where it is of interest.

Test for port out of range in the default (unhandled event) case,
and do not output a message if it is not an ethernet port.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Marcel Apfelbaum
3fc929e2d6 net/mlx4_core: Fix number of EQs used in ICM initialisation
In SRIOV mode, the number of EQs used when computing the total ICM size
was incorrect.

To fix this, we do the following:
1. We add a new structure to mlx4_dev, mlx4_phys_caps, to contain physical HCA
   capabilities.  The PPF uses the phys capabilities when it computes things
   like ICM size.

   The dev_caps structure will then contain the paravirtualized values, making
   bookkeeping much easier in SRIOV mode. We add a structure rather than a
   single parameter because there will be other fields in the phys_caps.

   The first field we add to the mlx4_phys_caps structure is num_phys_eqs.

2. In INIT_HCA, when running in SRIOV mode, the "log_num_eqs" parameter
   passed to the FW is the number of EQs per VF/PF; each function (PF or VF)
   has this number of EQs available.

   However, the total number of EQs which must be allowed for in the ICM is
   (1 << log_num_eqs) * (#VFs + #PFs).  Rather than compute this quantity,
   we allocate ICM space for 1024 EQs (which is the device maximum
   number of EQs, and which is the value we place in the mlx4_phys_caps structure).

   For INIT_HCA, however, we use the per-function number of EQs as described
   above.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein
30f7c73bed net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int
Ths fixes the comparison in the FLR (Function Level Reset) event case.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:15 -04:00
Linus Torvalds
c23ddf7857 InfiniBand/RDMA changes for the 3.5 merge window:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
  - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
    applications to send and receive arbitrary packets directly to/from
    the hardware
  - Add "doorbell drop" handling to the cxgb4 driver
  - A fairly large batch of qib hardware driver changes
  - A few fixes for lockdep-detected issues
  - A few other miscellaneous fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
 4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
 zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
 jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
 mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
 yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
 vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
 aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
 UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
 F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
 HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
 syKfN0VjiDRtJ+pxayi3
 =yWfr
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
 - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
   applications to send and receive arbitrary packets directly to/from
   the hardware
 - Add "doorbell drop" handling to the cxgb4 driver
 - A fairly large batch of qib hardware driver changes
 - A few fixes for lockdep-detected issues
 - A few other miscellaneous fixes and cleanups

Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.

* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
  RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
  IB/mlx4: Fix mlx4_ib_add() error flow
  IB/core: Fix IB_SA_COMP_MASK macro
  IB/iser: Fix error flow in iser ep connection establishment
  IB/mlx4: Increase the number of vectors (EQs) available for ULPs
  RDMA/cxgb4: Add query_qp support
  RDMA/cxgb4: Remove kfifo usage
  RDMA/cxgb4: Use vmalloc() for debugfs QP dump
  RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
  RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
  RDMA/cxgb4: Add DB Overflow Avoidance
  RDMA/cxgb4: Add debugfs RDMA memory stats
  cxgb4: DB Drop Recovery for RDMA and LLD queues
  cxgb4: Common platform specific changes for DB Drop Recovery
  cxgb4: Detect DB FULL events and notify RDMA ULD
  RDMA/cxgb4: Drop peer_abort when no endpoint found
  RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
  mlx4_core: Change bitmap allocator to work in round-robin fashion
  RDMA/nes: Don't call event handler if pointer is NULL
  RDMA/nes: Fix for the ORD value of the connecting peer
  ...
2012-05-21 17:54:55 -07:00
Amir Vadai
bc6a4744b8 net/mlx4_en: num cores tx rings for every UP
Change the TX ring scheme such that the number of rings for untagged packets
and for tagged packets (per each of the vlan priorities) is the same, unlike
the current situation where for tagged traffic there's one ring per priority
and for untagged rings as the number of core.

Queue selection is done as follows:

If the mqprio qdisc is operates on the interface, such that the core networking
code invoked the device setup_tc ndo callback, a mapping of skb->priority =>
queue set is forced - for both, tagged and untagged traffic.

Else, the egress map skb->priority =>  User priority is used for tagged traffic, and
all untagged traffic is sent through tx rings of UP 0.

The patch follows the convergence of discussing that issue with John Fastabend
over this thread http://comments.gmane.org/gmane.linux.network/229877

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17 16:17:50 -04:00
Jack Morgenstein
eb71d0d63f net/mlx4_core: Fixed error flow in rem_slave_eqs
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:59 -04:00
Jack Morgenstein
ba062d5219 net/mlx4_core: Add XRC domains and counters to resource tracker
Add missing resource tracking for XRC domains and complete the tracking for HCA
network flow counters.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:59 -04:00
Jack Morgenstein
b8924951f6 net/mlx4_core: Fix potential kernel Oops in res tracker during Dom0 driver unload
Currently the slave and master resources are deleted after master freed
all bitmaps. If any resources were not properly cleaned up during the
shutdown process, an Oops would result.

Fix so that delete slave (only) resources during cleanup. Master resources
are cleaned up during unload process, and need not separately be cleaned.

Note that during cleanup, we need to split the resource-tracker freeing
functionality.

Before removing all the bitmaps, we free any leftover slave resources.
However, we can only remove the resource tracker linked list after
all bitmap frees, since some of the freeing functions (e.g.,
mlx4_cleanup_eq_table) use paravirtualized FW commands which expect
the resource tracker linked list to be present.

Found-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00