Commit Graph

382 Commits (master)

Author SHA1 Message Date
Jack Morgenstein 992e8e6e87 IB/mlx4: Miscellaneous adjustments for SR-IOV IB support
1. Allow only master to change node description.
2. Prevent AH leakage in send mads.
3. Take device part number from PCI structure, so that guests see the
   VF part number (and not the PF part number).
4. Place the device revision ID into caps structure at startup.
5. SET_PORT in update_gids_task needs to go through wrapper on master.
6. In mlx4_ib_event(), PORT_MGMT_EVENT needs be handled in a work
   queue on the master, since it propagates events to slaves using
   GEN_EQE.
7. Do not support FMR on slaves.
8. Add spinlock to slave_event(), since it is called both in interrupt
   context and in process context (due to 6 above, and also if
   smp_snoop is used).  This fix was found and implemented by Saeed
   Mahameed <saeedm@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:41 -07:00
Jack Morgenstein 980e90010f mlx4_core: INIT/CLOSE port logic for IB ports in SR-IOV mode
Normally, INIT_PORT and CLOSE_PORT are invoked when special QP0
transitions to RTR, or transitions to ERR/RESET respectively.

In SR-IOV mode, however, the master is also paravirtualized.  This in
turn requires that we not do INIT_PORT until the entire QP0 path (real
QP0 and proxy QP0) is ready to receive.  When the real QP0 goes down,
we should indicate that the port is not active.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:40 -07:00
Jack Morgenstein efcd235d73 net/mlx4_core: Adjustments to SET_PORT for IB SR-IOV
1. Slaves may not set the IS_SM capability for the port.
2. DEV_MGMT may not be set in multifunction mode.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:40 -07:00
Jack Morgenstein 2a4fae148c IB/mlx4: Propagate P_Key and guid change port management events to slaves
P_Key change and guid change events are not of interest to all slaves,
but only to those slaves which "see" the table slots whose contents
have change.

For example, if the guid at port 1, index 5 has changed in the PPF, we
wish to propagate the gid-change event only to the function which has
that guid index mapped to its port/guid table (in this case it is
slave #5). Other functions should not get the event, since the event
does not affect them.

Similarly with P_Keys -- P_Key change events are forwarded only to
slaves which have that P_Key index mapped to their virtual P_Key table.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:38 -07:00
Jack Morgenstein a0c64a17ab mlx4: Add alias_guid mechanism
For IB ports, we paravirtualize the GUID at index 0 on slaves.  The
GUID at index 0 seen by a slave is the actual GUID occupying the GUID
table at the slave-id index.

The driver, by default, requests at startup time that subnet manager
populate its entire guid table with GUIDs. These guids are then mapped
(paravirtualized) to the slaves, and appear for each slave as its GUID
at index 0.

Until each slave has such a guid, its port status is DOWN.

The guid table is cached to support special QP paravirtualization, and
event propagation to slaves on guid change (we test to see if the guid
really changed before propagating an event to the slave).

To support this caching, add capability to __mlx4_ib_query_gid() to
obtain the network view (i.e., physical view) gid at index X, not just
the host (paravirtualized) view.

Based on a patch from Erez Shitrit <erezsh@mellanox.com>

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:37 -07:00
Jack Morgenstein 993c401e20 mlx4_core: Add IB port-state machine and port mgmt event propagation
For an IB port, a slave should not show port active until that slave
has a valid alias-guid (provided by the subnet manager).  Therefore
the port-up event should be passed to a slave only after both the port
is up, and the slave's alias-guid has been set.

Also, provide the infrastructure for propagating port-management
events (client-reregister, etc) to slaves.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:37 -07:00
Jack Morgenstein 0a9a01884d mlx4: MAD_IFC paravirtualization
The MAD_IFC firmware command fulfills two functions.

First, it is used in the QP0/QP1 MAD-handling flow to obtain
information from the FW (for answering queries), and for setting
variables in the HCA (MAD SET packets).

For this, MAD_IFC should provide the FW (physical) view of the data.
This is the view that OpenSM needs.  We call this the "network view".

In the second case, MAD_IFC is used by various verbs to obtain data
regarding the local HCA (e.g., ib_query_device()).  We call this the
"host view".

This data needs to be paravirtualized.

MAD_IFC therefore needs a wrapper function, and also needs another
flag indicating whether it should provide the network view (when it is
called by ib_process_mad in special-qp packet handling), or the host
view (when it is called while implementing a verb).

There are currently 2 flag parameters in mlx4_MAD_IFC already:
ignore_bkey and ignore_mkey.  These two parameters are replaced by a
single "mad_ifc_flags" parameter, with different bits set for each
flag.  A third flag is added: "network-view/host-view".

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:34 -07:00
Jack Morgenstein 54679e1482 mlx4: Implement QP paravirtualization and maintain phys_pkey_cache for smp_snoop
This requires:

1. Replacing the paravirtualized P_Key index (inserted by the guest)
   with the real P_Key index.

2. For UD QPs, placing the guest's true source GID index in the
   address path structure mgid field, and setting the ud_force_mgid
   bit so that the mgid is taken from the QP context and not from the
   WQE when posting sends.

3. For UC and RC QPs, placing the guest's true source GID index in the
   address path structure mgid field.

4. For tunnel and proxy QPs, setting the Q_Key value reserved for that
   proxy/tunnel pair.

Since not all the above adjustments occur in all the QP transitions,
the QP transitions require separate wrapper functions.

Secondly, initialize the P_Key virtualization table to its default
values: Master virtualized table is 1-1 with the real P_Key table,
guest virtualized table has P_Key index 0 mapped to the real P_Key
index 0, and all the other P_Key indices mapped to the reserved
(invalid) P_Key at index 127.

Finally, add logic in smp_snoop for maintaining the phys_P_Key_cache.
and generating events on the master only if a P_Key actually changed.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:33 -07:00
Jack Morgenstein fc06573dfa IB/mlx4: Initialize SR-IOV IB support for slaves in master context
Allocate SR-IOV paravirtualization resources and MAD demuxing contexts
on the master.

This has two parts.  The first part is to initialize the structures to
contain the contexts.  This is done at master startup time in
mlx4_ib_init_sriov().

The second part is to actually create the tunneling resources required
on the master to support a slave.  This is performed the master
detects that a slave has started up (MLX4_DEV_EVENT_SLAVE_INIT event
generated when a slave initializes its comm channel).

For the master, there is no such startup event, so it creates its own
tunneling resources when it starts up.  In addition, the master also
creates the real special QPs.  The ib_core layer on the master causes
creation of proxy special QPs, since the master is also
paravirtualized at the ib_core layer.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:32 -07:00
Jack Morgenstein e2c76824ca mlx4_core: Add proxy and tunnel QPs to the reserved QP area
In addition, pass the proxy and tunnel QP numbers to slaves so the
driver can perform special QP paravirtualization.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:31 -07:00
Or Gerlitz a084feebd2 mlx4_core: Remove annoying debug message in the resource tracker
This innocent print makes it very hard to actually use the mlx4_core
debug messages -- for example, the module load sequence of a device
with two VFs yielded 3200 debug prints, with 2800 of them being this
one.  Let's just remove it.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:09 -07:00
Dotan Barak 426dd00d76 mlx4_core: Fix wrong offset in parsing query device caps response
The wrong offset was used when parsing the number of XRCs in
mlx4_QUERY_DEV_CAP().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:33:08 -07:00
Linus Torvalds 2ade0b7f79 Last set of InfiniBand/RDMA fixes for 3.6:
- Couple more IPoIB fixes for regressions introduced by path database conversion
  - Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQV0eiAAoJEENa44ZhAt0hTScP/3+Y8v4w3Nyop7iNj9uvnsCn
 PhgfY9XUp9d0zOHM7XK+pTGLJheLLiyXfcCFTwgoC6tIyj7J1ulhYv+RBr7GbS+e
 K7aG7QtLDyM17ETgOBVIAvkz1CFKwdI843xRQKo0/oVMQJ9XLHg0nvzd+XaxZuud
 C+7Lfl5RH9Bl1tHJuOZmdyJinSQyLO/uV+tqwL5upF5YMTCJ2XsmVa1HQKhOAkBE
 bH4zXtFViNEKVuKXkpGc1yJFKZiwysirluYXedDkkEdr2jf9Mysw5JvZKVUOCh+4
 qYMjEAWQaUBia/tFUs9k6iCwE8ws7tRC4OSisb3cVzCUaqkfmx1Sp7vAQcDqx4xB
 r2dy4oc+vEFdrmjerOVEgmdgTwnDrWMjUlPuGyRX9935I7ybKA7NQPGlImKU8oiq
 8vyptsTN6r7AYc7kw6aYaE9kqGPnST7I3cZg71lHfwCuHKMNYfkxd1Mqxx/W6rMZ
 G1VnstVdioQK2HvcFIeArXQ6H6QjUAY92WMoyhIsrU4KwEx2Zpr0pfKWCD0eIOuB
 f4s9v0f5/6DqSwJ9uhhpAW/5bmV+RUCDjm8Ep9i70bGk7SliwT0MNPhHbaMfm7aU
 c+awy3HWpWKGCtod9zUzJH/4NsHzpggdHUlyAOuU5nqsh9GarYb0cvLjTNfsfdyV
 haQtMR+fSUK7Durd9MQl
 =nqvR
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA fixes from Roland Dreier:
 - A couple more IPoIB fixes for regressions introduced by path database
   conversion
 - Minor other fixes to low-level drivers (cxgb4, mlx4, qib, ocrdma)

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/qib: Fix failure of compliance test C14-024#06_LocalPortNum
  RDMA/ocrdma: Fix CQE expansion of unsignaled WQE
  mlx4_core: Fix integer overflows so 8TBs of memory registration works
  IPoIB: Fix AB-BA deadlock when deleting neighbours
  IPoIB: Fix memory leak in the neigh table deletion flow
  RDMA/cxgb4: Move dereference below NULL test
2012-09-17 13:21:02 -07:00
Yishai Hadas dd03e73481 mlx4_core: Fix integer overflows so 8TBs of memory registration works
This patch adds on the fixes done in commits 89dd86db78 ("mlx4_core:
Allow large mlx4_buddy bitmaps") and 3de819e6b6 ("mlx4_core: Fix
integer overflow issues around MTT table") so that memory registration
of up to 8TB (log_num_mtt=31) finally works.

It fixes integer overflows in a few mlx4_table_yyy routines in icm.c
by using a u64 intermediate variable, and int/uint issues that caused
table indexes to become nagive by setting some variables to be u32
instead of int.  These problems cause crashes when a user attempted to
register > 512GB of RAM.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-13 17:52:02 -07:00
Bjorn Helgaas 78890b5989 Merge commit 'v3.6-rc5' into next
* commit 'v3.6-rc5': (1098 commits)
  Linux 3.6-rc5
  HID: tpkbd: work even if the new Lenovo Keyboard driver is not configured
  Remove user-triggerable BUG from mpol_to_str
  xen/pciback: Fix proper FLR steps.
  uml: fix compile error in deliver_alarm()
  dj: memory scribble in logi_dj
  Fix order of arguments to compat_put_time[spec|val]
  xen: Use correct masking in xen_swiotlb_alloc_coherent.
  xen: fix logical error in tlb flushing
  xen/p2m: Fix one-off error in checking the P2M tree directory.
  powerpc: Don't use __put_user() in patch_instruction
  powerpc: Make sure IPI handlers see data written by IPI senders
  powerpc: Restore correct DSCR in context switch
  powerpc: Fix DSCR inheritance in copy_thread()
  powerpc: Keep thread.dscr and thread.dscr_inherit in sync
  powerpc: Update DSCR on all CPUs when writing sysfs dscr_default
  powerpc/powernv: Always go into nap mode when CPU is offline
  powerpc: Give hypervisor decrementer interrupts their own handler
  powerpc/vphn: Fix arch_update_cpu_topology() return value
  ARM: gemini: fix the gemini build
  ...

Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_main.c
	drivers/rapidio/devices/tsi721.c
2012-09-13 08:41:01 -06:00
Bjorn Helgaas 1959ec5f82 Merge branch 'pci/stephen-const' into next
* pci/stephen-const:
  make drivers with pci error handlers const
  scsi: make pci error handlers const
  netdev: make pci_error_handlers const
  PCI: Make pci_error_handlers const
2012-09-12 13:54:10 -06:00
Stephen Hemminger 3646f0e5c9 netdev: make pci_error_handlers const
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-09-07 16:35:00 -06:00
Eugenia Emantayev 521130d11f net/mlx4_core: Return the error value in case of command initialization failure
If mlx4_cmd_init() failed, the init_one function returned
success, although no resources were opened.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Aviad Yehezkel bef772eb06 net/mlx4_core: Fixing error flow in case of QUERY_FW failure
The order of operations was wrong on the teardown flow.

Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Aviad Yehezkel 60d31c1475 net/mlx4_core: Looking for promiscuous entries on the correct port
The search for promisc entries was always done on the first port,
While the addition is done on the correct port.
This lead to resource leackage of promisc entries on the second
port and brought to a state where we could no longer enter to
promiscuous mode after enough iterations of "ifconfig promisc"
on the second port.
Fix that by using the correct port when searching.

Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Hadar Hen Zion 7fb40f87c4 net/mlx4_core: Add security check / enforcement for flow steering rules set for VMs
Since VFs may be mapped to VMs which aren't trusted entities,  flow
steering rules attached through the wrapper on behalf of VFs must be
checked to make sure that their L2 specification relate to MAC address
assigned to that VF, and add L2 specification if its missing.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Hadar Hen Zion a8edc3bf05 net/mlx4_core: Put Firmware flow steering structures in common header files
To allow for usage of the flow steering Firmware structures in more locations over the driver,
such as the resource tracker, move them from mcg.c to common header files.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-07 12:55:59 -04:00
Jiang Liu fadd1daa0b mlx4: Use PCI Express Capability accessors
Use PCI Express Capability access functions to simplify mlx4 driver.

Signed-off-by: Jiang Liu <jiang.liu@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
2012-08-23 10:11:13 -06:00
Linus Torvalds 8f8ba75ee2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking update from David Miller:
 "A couple weeks of bug fixing in there.  The largest chunk is all the
  broken crap Amerigo Wang found in the netpoll layer."

 1) netpoll and it's users has several serious bugs:
    a) uses GFP_KERNEL with locks held
    b) interfaces requiring interrupts disabled are called with them
       enabled
    c) and vice versa
    d) VLAN tag demuxing, as per all other RX packet input paths, is not
       applied

    All from Amerigo Wang.

 2) Hopefully cure the ipv4 mapped ipv6 address TCP early demux bugs for
    good, from Neal Cardwell.

 3) Unlike AF_UNIX, AF_PACKET sockets don't set a default credentials
    when the user doesn't specify one explicitly during sendmsg().
    Instead we attach an empty (zero) SCM credential block which is
    definitely not what we want.  Fix from Eric Dumazet.

 4) IPv6 illegally invokes netdevice notifiers with RCU lock held, fix
    from Ben Hutchings.

 5) inet_csk_route_child_sock() checks wrong inet options pointer, fix
    from Christoph Paasch.

 6) When AF_PACKET is used for transmit, packet loopback doesn't behave
    properly when a socket fanout is enabled, from Eric Leblond.

 7) On bluetooth l2cap channel create failure, we leak the socket, from
    Jaganath Kanakkassery.

 8) Fix all the netprio file handling bugs found by Al Viro, from John
    Fastabend.

 9) Several error return and NULL deref bug fixes in networking drivers
    from Julia Lawall.

10) A large smattering of struct padding et al.  kernel memory leaks to
    userspace found of Mathias Krause.

11) Conntrack expections in netfilter can access an uninitialized timer,
    fix from Pablo Neira Ayuso.

12) Several netfilter SIP tracker bug fixes from Patrick McHardy.

13) IPSEC ipv6 routes are not initialized correctly all the time,
    resulting in an OOPS in inet_putpeer().  Also from Patrick McHardy.

14) Bridging does rcu_dereference() outside of RCU protected area, from
    Stephen Hemminger.

15) Fix routing cache removal performance regression when looking up
    output routes that have a local destination.  From Zheng Yan.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  af_netlink: force credentials passing [CVE-2012-3520]
  ipv4: fix ip header ident selection in __ip_make_skb()
  ipv4: Use newinet->inet_opt in inet_csk_route_child_sock()
  tcp: fix possible socket refcount problem
  net: tcp: move sk_rx_dst_set call after tcp_create_openreq_child()
  net/core/dev.c: fix kernel-doc warning
  netconsole: remove a redundant netconsole_target_put()
  net: ipv6: fix oops in inet_putpeer()
  net/stmmac: fix issue of clk_get for Loongson1B.
  caif: Do not dereference NULL in chnl_recv_cb()
  af_packet: don't emit packet on orig fanout group
  drivers/net/irda: fix error return code
  drivers/net/wan/dscc4.c: fix error return code
  drivers/net/wimax/i2400m/fw.c: fix error return code
  smsc75xx: add missing entry to MAINTAINERS
  net: qmi_wwan: new devices: UML290 and K5006-Z
  net: sh_eth: Add eth support for R8A7779 device
  netdev/phy: skip disabled mdio-mux nodes
  dt: introduce for_each_available_child_of_node, of_get_next_available_child
  net: netprio: fix cgrp create and write priomap race
  ...
2012-08-21 16:46:08 -07:00
Tejun Heo 203b42f731 workqueue: make deferrable delayed_work initializer names consistent
Initalizers for deferrable delayed_work are confused.

* __DEFERRED_WORK_INITIALIZER()
* DECLARE_DEFERRED_WORK()
* INIT_DELAYED_WORK_DEFERRABLE()

Rename them to

* __DEFERRABLE_WORK_INITIALIZER()
* DECLARE_DEFERRABLE_WORK()
* INIT_DEFERRABLE_WORK()

This patch doesn't cause any functional changes.

Signed-off-by: Tejun Heo <tj@kernel.org>
2012-08-21 13:18:23 -07:00
Linus Torvalds 846b99964a Grab bag of InfiniBand/RDMA fixes:
- IPoIB fixes for regressions introduced by path database conversion
  - mlx4 fixes for bugs with large memory systems and regressions from SR-IOV patches
  - RDMA CM fix for passing bad event up to userspace
  - Other minor fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQLo9cAAoJEENa44ZhAt0hPeUP/Ag1i164UE2e64sTfbVUfbEM
 7a+acsBo2ByQhVaXd65PZqm6+aYjosSwkStl0u4cb9Q3qQvYiZeLJj/qmInVV1hv
 qHuNytbnXl6W2f6vyz391TKx+X6uastlH7O4f4v0zS3zkrlkN9kCrN5dI1Fxm9ba
 5hluyggs7pVmHPX7KDGnQNZEjCsKoJF2kzB0hXwqGo1JxB2sEGq3DV/uMTHPgKKM
 HlrTJEEb7pOeHHf2Zt8MQiHMC0YBJ6MmpVIidR02TdQdCgZQcpoYnn20odmNLK5D
 HTT4jbCSsbOiHPpdAn0YkgzXzL58a2/b1Tt/pNq+Rud7VKwFVuFhNElPyAxR1/1K
 tPGUbDA6eOfosU++QbDJ0QHxDTgBTyXwh4yMkwPjmdWJ2jwTQev74iiDXgDf3QbS
 tMmWSwfC38hBxvfomNA6QTPvY1c3NNvj3uO+xq4/q6HloGqSlnIrPBL9hF0BiVQ6
 KteZ6gF+R1rHEIQraEnbNmh5Rxvi2RdgN0Xa0U9w/yr0LCowmUbFK35XoyTFyo/8
 rF7xp85s1wH5OuaA0NQeivQEKbuBxl7nvJlC2RBdRT/B9U5now+Y2XKOX0+FfRsg
 Qm0By/K8ss2XftX+lNfip+ScSmu9p+kCBME0X8Ra590VDt0IcPiJsC9ozhKyywRy
 KfSGj3iZG/nDaec6i1aN
 =hN/H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma fixes from Roland Dreier:
 "Grab bag of InfiniBand/RDMA fixes:
   - IPoIB fixes for regressions introduced by path database conversion
   - mlx4 fixes for bugs with large memory systems and regressions from
     SR-IOV patches
   - RDMA CM fix for passing bad event up to userspace
   - Other minor fixes"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Check iboe netdev pointer before dereferencing it
  mlx4_core: Clean up buddy bitmap allocation
  mlx4_core: Fix integer overflow issues around MTT table
  mlx4_core: Allow large mlx4_buddy bitmaps
  IB/srp: Fix a race condition
  IB/qib: Fix error return code in qib_init_7322_variables()
  IB: Fix typos in infiniband drivers
  IB/ipoib: Fix RCU pointer dereference of wrong object
  IB/ipoib: Add missing locking when CM object is deleted
  RDMA/ucma.c: Fix for events with wrong context on iWARP
  RDMA/ocrdma: Don't call vlan_dev_real_dev() for non-VLAN netdevs
  IB/mlx4: Fix possible deadlock on sm_lock spinlock
2012-08-17 11:45:58 -07:00
Roland Dreier 96f17d5900 mlx4_core: Clean up buddy bitmap allocation
- Use kcalloc() / vzalloc() instead of an extra bitmap_zero().
 - Add __GFP_NOWARN to kcalloc() since we'll try vzalloc() if it fails.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 21:05:27 -07:00
Yishai Hadas 3de819e6b6 mlx4_core: Fix integer overflow issues around MTT table
Fix some issues around int variables used in data structures related
to memory registration.

Handle int overflow in mlx4_init_icm_table by using a u64 intermediate
variable and changing struct mlx4_icm_table num_obj field to be u32.

Change some more fields/variables to use u32 instead of int to prevent
a case where the variable becomes negative when bit 31 is set.

Also subtract log_mtts_per_seg from the exponent when computing
num_mtt, since its added later on in that very same code area.

This and the previous commit fixes some issues which actually prevent
commit db5a7a65c0 ("mlx4_core: Scale size of MTT table with system
RAM") from working.  Now, when the number of MTTs is scaled with the
size of the RAM we can map up to 8TB.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 21:05:26 -07:00
Yishai Hadas 89dd86db78 mlx4_core: Allow large mlx4_buddy bitmaps
mlx4_buddy_init uses kmalloc() to allocate bitmaps, which fails when
the required size is beyond the max supported value (or when memory is
too fragmented to handle a huge allocation).  Extend this to use use
vmalloc() if kmalloc() fails, and take that into account when freeing
the bitmaps as well.

This fixes a driver load failure when log num mtt is 26 or higher, and
is a step in the direction of allowing to register huge amounts of
memory on large memory systems.

Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 21:05:20 -07:00
Julia Lawall 499b95f6b3 drivers/net/ethernet/mellanox/mlx4/mcg.c: fix error return code
Convert a 0 error return code to a negative one, as returned elsewhere in the
function.

A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
identifier ret;
expression e,e1,e2,e3,e4,x;
@@

(
if (\(ret != 0\|ret < 0\) || ...) { ... return ...; }
|
ret = 0
)
... when != ret = e1
*x = \(kmalloc\|kzalloc\|kcalloc\|devm_kzalloc\|ioremap\|ioremap_nocache\|devm_ioremap\|devm_ioremap_nocache\)(...);
... when != x = e2
    when != ret = e3
*if (x == NULL || ...)
{
  ... when != ret = e4
*  return ret;
}
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-14 17:00:57 -07:00
Yevgeny Petrilin 2207b60ffb net/mlx4_core: Remove port type restrictions
Port1=Eth, Port2=IB restriction is no longer required.
Having RoCE, there will always rdma port initialized over ConnectX
physical port, no matter whether the link layer is IB or Ethernet.
So we always have dual port IB device.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-03 16:49:40 -07:00
Yevgeny Petrilin c18520bd1b net/mlx4_en: Fixing TX queue stop/wake flow
Removing the ring->blocked flag, it is redundant and leads to a race:

We close the TX queue and then set the "blocked" flag.
Between those 2 operations the completion function can check the "blocked"
flag, sees that it is 0, and wouldn't open the TX queue.

Using netif_tx_queue_stopped to check the state of the queue to avoid this race.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-03 16:49:02 -07:00
Amir Vadai c8c40b7f32 net/mlx4_en: loopbacked packets are dropped when SMAC=DMAC
Should NOT check SMAC=DMAC when:
1. loopback is turned on
2. validate_loopback is true.

Fixed it accordingly.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-08-03 16:49:02 -07:00
Linus Torvalds 1e30c1b386 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking updates and fixes from David Miller:

1) Reinstate the no-ref optimization for input route lookups in ipv4 to
   fix some routing cache removal perf regressions.

2) Make TCP socket pre-demux work on ipv6 side too, from Eric Dumazet.

3) Get RX hash value from correct place in be2net driver, from
   Sarveshwar Bandi.

4) Validation of FIB cached routes missing critical check, from Eric
   Dumazet.

5) EEH support in mlx4 driver, from Kleber Sacilotto de Souza.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (23 commits)
  ipv6: Early TCP socket demux
  ipv4: Fix input route performance regression.
  pch_gbe: vlan skb len fix
  pch_gbe: add extra clean tx
  pch_gbe: fix transmit watchdog timeout
  ixgbe: fix panic while dumping packets on Tx hang with IOMMU
  be2net: Fix to parse RSS hash from Receive completions correctly.
  net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
  hyperv: Add error handling to rndis_filter_device_add()
  hyperv: Add a check for ring_size value
  ipv4: rt_cache_valid must check expired routes
  net/pch_gpe: Cannot disable ethernet autonegation
  qeth: repair crash in qeth_l3_vlan_rx_kill_vid()
  netiucv: cleanup attribute usage
  net: wiznet add missing HAS_IOMEM dependency
  be2net: Missing byteswap in be_get_fw_log_level causes oops on PowerPC
  mlx4: Add support for EEH error recovery
  cdc-ncm: tag Ericsson WWAN devices (eg F5521gw) with FLAG_WWAN
  wanmain: comparing array with NULL
  caif: fix NULL pointer check
  ...
2012-07-26 18:09:01 -07:00
Amir Vadai ee64c0ee51 net/mlx4_en: Limit the RFS filter IDs to be < RPS_NO_FILTER
RFS filter id can't have the special value RPS_NO_FILTER,
need to skip it when allocating id's.

CC: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-26 00:23:55 -07:00
Kleber Sacilotto de Souza 57dbf29a54 mlx4: Add support for EEH error recovery
Currently the mlx4 drivers don't have the necessary callbacks to
implement EEH errors detection and recovery, so the PCI layer uses the
probe and remove callbacks to try to recover the device after an error on
the bus. However, these callbacks have race conditions with the internal
catastrophic error recovery functions, which will also detect the error
and this can cause the system to crash if both EEH and catas functions
try to reset the device.

This patch adds the necessary error recovery callbacks and makes sure
that the internal catastrophic error functions will not try to reset the
device in such scenarios. It also adds some calls to
pci_channel_offline() to suppress reads/writes on the bus when the slot
cannot accept I/O operations so we prevent unnecessary accesses to the
bus and speed up the device removal.

Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Acked-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-25 15:24:13 -07:00
Linus Torvalds 5dedb9f3bd InfiniBand/RDMA changes for the 3.6 merge window:
- Updates to the qib low-level driver
  - First chunk of changes for SR-IOV support for mlx4 IB
  - RDMA CM support for IPv6-only binding
  - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
 a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
 ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
 xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
 wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
 ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
 goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
 +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
 ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
 I7h9iJ1OMKFZ8bzmHsWb
 =f09j
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Updates to the qib low-level driver
 - First chunk of changes for SR-IOV support for mlx4 IB
 - RDMA CM support for IPv6-only binding
 - Other misc cleanups and fixes

Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c

* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
  IB/qib: checkpatch fixes
  IB/qib: Add congestion control agent implementation
  IB/qib: Reduce sdma_lock contention
  IB/qib: Fix an incorrect log message
  IB/qib: Fix QP RCU sparse warnings
  mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
  mlx4_core: Allow guests to have IB ports
  mlx4_core: Implement mechanism for reserved Q_Keys
  net/mlx4_core: Free ICM table in case of error
  IB/cm: Destroy idr as part of the module init error flow
  mlx4_core: Remove double function declarations
  IB/mlx4: Fill the masked_atomic_cap attribute in query device
  IB/mthca: Fill in sq_sig_type in query QP
  IB/mthca: Warning about event for non-existent QPs should show event type
  IB/qib: Fix sparse RCU warnings in qib_keys.c
  net/mlx4_core: Initialize IB port capabilities for all slaves
  mlx4: Use port management change event instead of smp_snoop
  IB/qib: RCU locking for MR validation
  IB/qib: Avoid returning EBUSY from MR deregister
  IB/qib: Fix UC MR refs for immediate operations
  ...
2012-07-24 13:56:26 -07:00
Roland Dreier 089117e1ad Merge branches 'cma', 'cxgb4', 'misc', 'mlx4-sriov', 'mlx-cleanups', 'ocrdma' and 'qib' into for-linus 2012-07-22 23:26:17 -07:00
Thadeu Lima de Souza Cascardo 4cce66cdd1 mlx4_en: map entire pages to increase throughput
In its receive path, mlx4_en driver maps each page chunk that it pushes
to the hardware and unmaps it when pushing it up the stack. This limits
throughput to about 3Gbps on a Power7 8-core machine.

One solution is to map the entire allocated page at once. However, this
requires that we keep track of every page fragment we give to a
descriptor. We also need to work with the discipline that all fragments will
be released (in the sense that it will not be reused by the driver
anymore) in the order they are allocated to the driver.

This requires that we don't reuse any fragments, every single one of
them must be reallocated. We do that by releasing all the fragments that
are processed and only after finished processing the descriptors, we
start the refill.

We also must somehow guarantee that we either refill all fragments in a
descriptor or none at all, without resorting to giving up a page
fragment that we would have already given. Otherwise, we would break the
discipline of only releasing the fragments in the order they were
allocated.

This has passed page allocation fault injections (restricted to the
driver by using required-start and required-end) and device hotplug
while 16 TCP streams were able to deliver more than 9Gbps.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 10:53:13 -07:00
Amir Vadai 1eb8c695bd net/mlx4_en: Add accelerated RFS support
Use RFS infrastructure and flow steering in HW to keep CPU
affinity of rx interrupts and application per TCP stream.

A flow steering filter is added to the HW whenever the RFS
ndo callback is invoked by core networking code.

Because the invocation takes place in interrupt context, the
actual setup of HW is done using workqueue. Whenever new filter
is added, the driver checks for expiry of existing filters.

Since there's window in time between the point where the core
RFS code invoked the ndo callback, to the point where the HW
is configured from the workqueue context, the 2nd, 3rd etc
packets from that stream will cause the net core to invoke
the callback again and again.

To prevent inefficient/double configuration of the HW, the filters
are kept in a database which is indexed using hash function to enable
fast access.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai d9236c3f10 {NET,IB}/mlx4: Add rmap support to mlx4_assign_eq
Enable callers of mlx4_assign_eq to supply a pointer to cpu_rmap.
If supplied, the assigned IRQ is tracked using rmap infrastructure.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Amir Vadai af22d9de45 net/mlx4: Move MAC_MASK to a common place
Define this macro is one common place instead of duplicating it over the code

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-19 08:34:37 -07:00
Dan Carpenter 9c64508af2 net/mlx4_en: dereferencing freed memory
We dereferenced "mclist" after the kfree().

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 22:58:07 -07:00
Dan Carpenter 447458c01f net/mlx4: off by one in parse_trans_rule()
This should be ">=" here instead of ">".  MLX4_NET_TRANS_RULE_NUM is 6.
We use "spec->id" as an array offset into the __rule_hw_sz[] and
__sw_id_hw[] arrays which have 6 elements.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-16 22:57:43 -07:00
Jack Morgenstein 6634961c14 mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
To allow easy paravirtualization of P_Key and GID table sizes, keep
paravirtualized sizes in mlx4_dev->caps, but save the actual physical
sizes from FW in struct: mlx4_dev->phys_cap.

In addition, in SR-IOV mode, do the following:

1. Reduce reported P_Key table size by 1.
   This is done to reserve the highest P_Key index for internal use,
   for declaring an invalid P_Key in P_Key paravirtualization.
   We require a P_Key index which always contain an invalid P_Key
   value for this purpose (i.e., one which cannot be modified by
   the subnet manager).  The way to do this is to reduce the
   P_Key table size reported to the subnet manager by 1, so that
   it will not attempt to access the P_Key at index #127.

2. Paravirtualize the GID table size to 1. Thus, each guest sees
   only a single GID (at its paravirtualized index 0).

In addition, since we are paravirtualizing the GID table size to 1, we
add paravirtualization of the master GID event here (i.e., we do not
do ib_dispatch_event() for the GUID change event on the master, since
its (only) GUID never changes).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:52:23 -07:00
Jack Morgenstein 105c320f6a mlx4_core: Allow guests to have IB ports
Modify mlx4_dev_cap to allow IB support when SR-IOV is active.  Modify
mlx4_slave_cap to set the "rdma-supported" bit in its flags area, and
pass that to the guests (this is done in QUERY_FUNC_CAP and its
wrapper).

However, we don't activate IB support quite yet -- we leave the error
return at the start of mlx4_ib_add in the mlx4_ib driver.

In addition, set "protected fmr supported" bit to zero in the
QUERY_FUNC_CAP wrapper.

Finally, in the QUERY_FUNC_CAP wrapper, we needed to add code which
checks for the port type (IB or Ethernet).  Previously, this was not
an issue, since only Ethernet ports were supported.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:51:37 -07:00
Jack Morgenstein 396f2feb05 mlx4_core: Implement mechanism for reserved Q_Keys
The SR-IOV special QP tunneling mechanism uses proxy special QPs
(instead of the real special QPs) for MADs on guests.  These proxy QPs
send their packets to a "tunnel" QP owned by the master.  The master
then forwards the MAD (after any required paravirtualization) to the
real special QP, which sends out the MAD.

For security reasons (i.e., to prevent guests from sending MADs to
tunnel QPs belonging to other guests), each proxy-tunnel QP pair is
assigned a unique, reserved, Q_Key.  These Q_Keys are available only
for proxy and tunnel QPs -- if the guest tries to use these Q_Keys
with other QPs, it will fail.

This patch introduces a mechanism for reserving a block of 64K Q_Keys
for proxy/tunneling use.

The patch introduces also two new fields into mlx4_dev: base_sqpn and
base_tunnel_sqpn.

In SR-IOV mode, the QP numbers for the "real," proxy, and tunnel sqps
are added to the reserved QPN area (so that they will not change).
There are 8 special QPs per port in the HCA, and each of them is
assigned both a proxy and a tunnel QP, for each VF and for the PF as
well in SR-IOV mode.

The QPNs for these QPs are arranged as follows:
 1. The real SQP numbers (8)
 2. The proxy SQPs (8 * (max number of VFs + max number of PFs)
 3. The tunnel SQPs (8 * (max number of VFs + max number of PFs)

To support these QPs, two new fields are added to struct mlx4_dev:

  base_sqp:  this is the QP number of the first of the real SQPs
  base_tunnel_sqp: this is the qp number of the first qp in the tunnel
                   sqp region. (On guests, this is the first tunnel
                   sqp of the 8 which are assigned to that guest).

In addition, in SR-IOV mode, sqp_start is the number of the first
proxy SQP in the proxy SQP region.  (In guests, this is the first
proxy SQP of the 8 which are assigned to that guest)

Note that in non-SR-IOV mode, there are no proxies and no tunnels.
In this case, sqp_start is set to sqp_base -- which minimizes code
changes.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 11:35:55 -07:00
Dotan Barak 240a9207aa net/mlx4_core: Free ICM table in case of error
In mlx4_init_icm_table(), free the allocated table if we failed to
allocate memory to its entries.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:58 -07:00
Dotan Barak f457ce471c mlx4_core: Remove double function declarations
Spotted four duplicate declarations in icm.h, remove them.

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-11 09:22:58 -07:00
Jack Morgenstein 2aca1172c2 net/mlx4_core: Initialize IB port capabilities for all slaves
With IB SR-IOV, each slave has its own separate copy of the port
capabilities flags.  For example, the master can run a subnet manager
(which causes the IsSM bit to be set in the master's port
capabilities) without affecting the port capabilities seen by the
slaves (the IsSM bit will be seen as cleared in the slaves).

Also add a static inline mlx4_master_func_num() to enhance readability
of the code.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:57:06 -07:00
Jack Morgenstein 00f5ce99dc mlx4: Use port management change event instead of smp_snoop
The port management change event can replace smp_snoop.  If the
capability bit for this event is set in dev-caps, the event is used
(by the driver setting the PORT_MNG_CHG_EVENT bit in the async event
mask in the MAP_EQ fw command).  In this case, when the driver passes
incoming SMP PORT_INFO SET mads to the FW, the FW generates port
management change events to signal any changes to the driver.

If the FW generates these events, smp_snoop shouldn't be invoked in
ib_process_mad(), or duplicate events will occur (once from the
FW-generated event, and once from smp_snoop).

In the case where the FW does not generate port management change
events smp_snoop needs to be invoked to create these events.  The flow
in smp_snoop has been modified to make use of the same procedures as
in the fw-generated-event event case to generate the port management
events (LID change, Client-rereg, Pkey change, and/or GID change).

Port management change event handling required changing the
mlx4_ib_event and mlx4_dispatch_event prototypes; the "param" argument
(last argument) had to be changed to unsigned long in order to
accomodate passing the EQE pointer.

We also needed to move the definition of struct mlx4_eqe from
net/mlx4.h to file device.h -- to make it available to the IB driver,
to handle port management change events.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-10 09:47:10 -07:00
Jack Morgenstein 752a50cab6 mlx4_core: Pass an invalid PCI id number to VFs
Currently, VFs have 0 in their dev->caps.function field.  This is a
valid pci id (usually of the PF).  Instead, pass an invalid PCI id to
the VF via QUERY_FW, so that if the value gets accessed in the VF
driver, we'll catch the problem.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:05:05 -07:00
Hadar Hen Zion cabdc8ee37 net/mlx4_en: Add support for drop action through ethtool
The drop action is implemented by allocating a QP and keeping it in a reset state
such that the HW drops any packets which are steered to that QP. When a drop action
is requested, we attach the relevant flow to that QP.

Sign-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion 820672812f net/mlx4_en: Manage flow steering rules with ethtool
Implement the ethtool APIs for attaching L2/L3/L4 based flow steering
rules to the netdevice RX rings. Added set_rxnfc callback and enhanced
the existing get_rxnfc callback.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion 592e49dda8 net/mlx4: Implement promiscuous mode with device managed flow-steering
The device managed flow steering API has three promiscuous modes:

1. Uplink - captures all the packets that arrive to the port.
2. Allmulti - captures all multicast packets arriving to the port.
3. Function port - for future use, this mode is not implemented yet.

Use these modes with the flow_attach and flow_detach firmware commands
according to the promiscuous state of the netdevice.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion 1b9c6b064e net/mlx4_core: Add resource tracking for device managed flow steering rules
As with other device resources, the resource tracker is needed for supporting
device managed flow steering rules under SRIOV: make sure virtual functions
delete only rules created by them, and clean all rules attached by a crashed VF.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:06 -07:00
Hadar Hen Zion 0ff1fb654b {NET, IB}/mlx4: Add device managed flow steering firmware API
The driver is modified to support three operation modes.

If supported by firmware use the device managed flow steering
API, that which we call device managed steering mode. Else, if
the firmware supports the B0 steering mode use it, and finally,
if none of the above, use the A0 steering mode.

When the steering mode is device managed, the code is modified
such that L2 based rules set by the mlx4_en driver for Ethernet
unicast and multicast, and the IB stack multicast attach calls
done through the mlx4_ib driver are all routed to use the device
managed API.

When attaching rule using device managed flow steering API,
the firmware returns a 64 bit registration id, which is to be
provided during detach.

Currently the firmware is always programmed during HCA initialization
to use standard L2 hashing. Future work should be done to allow
configuring the flow-steering hash function with common, non
proprietary means.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion 8fcfb4db74 net/mlx4_core: Add firmware commands to support device managed flow steering
Add support for firmware commands to attach/detach a new device managed
steering mode. Such network steering rules allow the user to provide an
L2/L3/L4 flow specification to the firmware and have the device to steer
traffic that matches that specification to the provided QP.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion c96d97f4d1 net/mlx4: Set steering mode according to device capabilities
Instead of checking the firmware supported steering mode in various
places in the code, add a dedicated field in the mlx4 device capabilities
structure which is written once during the initialization flow and read
across the code.

This also set the grounds for add new steering modes. Currently two modes
are supported, and are named after the ConnectX HW versions A0 and B0.

A0 steering uses mac_index, vlan_index and priority to steer traffic
into pre-defined range of QPs.

B0 steering uses Ethernet L2 hashing rules and is enabled only
if the firmware supports both unicast and multicast B0 steering,

The current steering modes are relevant for Ethernet traffic only,
such that Infiniband steering remains untouched.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Yevgeny Petrilin 6d19993788 net/mlx4_en: Re-design multicast attachments flow
Currently, for every change in the net device multicast list, the driver
detaches all the addresses from the HW device, and then attaches the
updated list. This behavior is wrong from two aspects: first, it causes
a load of firmware commands and second, there is period of time where
the correct addresses are not attached, which turned into packet loss.

To improve - a copy of the multicast list is saved by the driver. For
every change in the multicast list, the multicast list copy is used
to find the delta between those two lists and add or remove multicast
addresses as needed.

Reported-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Cc: Shawn Bohrer <sbohrer@rgmadvisors.com>
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion aa1ec3dde1 net/mlx4_core: Change resource tracking ID to be 64 bit
Currently the IDs used by the resource tracker are of type u32, so far this was
ok since all the different resources we were tracking could be encoded in 32bit.

As a preparation step for tracking of resources whose IDs need > 32 bits such
as network flow steering rules, who are 64 bit in size, move to use 64 bit
based resource IDs.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Hadar Hen Zion 4af1c0488d net/mlx4_core: Change resource tracking mechanism to use red-black tree
Change the data structure used for managing the SRIOV resource tracking
mechanism from radix tree to red-black tree. This is preparation step
for supporting resource IDs which are 64bit long, such as network flow
steering rules. Such IDs can't be used as radix-tree keys on 32bit
architectures and hence the reason for the change.

Signed-off-by: Hadar Hen Zion <hadarh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-07 16:23:05 -07:00
Yuval Mintz 90b1ebe7af mlx4: set maximal number of default RSS queues
Signed-off-by: Yuval Mintz <yuvalmin@broadcom.com>
Signed-off-by: Eilon Greenstein <eilong@broadcom.com>

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 03:06:44 -07:00
David S. Miller b26d344c6b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/caif/caif_hsi.c
	drivers/net/usb/qmi_wwan.c

The qmi_wwan merge was trivial.

The caif_hsi.c, on the other hand, was not.  It's a conflict between
1c385f1fdf ("caif-hsi: Replace platform
device with ops structure.") in the net-next tree and commit
39abbaef19 ("caif-hsi: Postpone init of
HIS until open()") in the net tree.

I did my best with that one and will ask Sjur to check it out.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-28 17:37:00 -07:00
Yevgeny Petrilin 044ca2a5f2 net/mlx4_en: Release QP range in free_resources
Add a missing resource release in ring cleanup.
Not doing this leaves a range of QPs that are being reserved,
and no one can use them.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
Yevgeny Petrilin 9858d2d1ac net/mlx4: Use single completion vector after NOP failure
Fix a crash at the error flow of NOP command which caused the driver to try and use
a completion vector which wasn't allocated.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
Yevgeny Petrilin 5c8e904666 net/mlx4_en: Set correct port parameters during device initialization
Set valid port parameters: MTU and flow control configuration when
configuring the port during HW device initialization,
prior to the net device open() being called.
Using  invalid parameters (such as all zeros)
could lead to bad firmware behavior.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-25 16:30:12 -07:00
David S. Miller 7e52b33bd5 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/ipv6/route.c

This deals with a merge conflict between the net-next addition of the
inetpeer network namespace ops, and Thomas Graf's bug fix in
2a0c451ade which makes sure we don't
register /proc/net/ipv6_route before it is actually safe to do so.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-15 15:51:55 -07:00
Linus Torvalds ae501be0f6 InfiniBand/RDMA fixes for 3.5-rc2, all in hardware drivers:
- Fix crash in cxgb4
  - Fixes to new ocrdma driver
  - Regression fixes for mlx4
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPz5V+AAoJEENa44ZhAt0hsXoQAIFBNbsLWLft4J9jpyDAFj5z
 SpspcZAVgJW+K7I6j38SdQXMsR/XmcwM8Yh3iJIa/YG0P/6bWXDNU20itN62mAcI
 D+LphECUv/T8PGxEcAFlwjLYy5kcgFBTs/t7AgVS57OL4z8YuDLDr7uhdcnPdq7t
 AMIptqhXZgSnM/peIkf2SWBYxHSUMhukrlu5uNB9uc9GC9VRGhioXxT3eBPwr54O
 7BwHxQGDYtFZ4fYKmFxN9sJJmFf9nl/WhYM2HwCMPEe4UzlxxfbP7/c17QNS7YGo
 e072Jvs+U2ttb4J7J2yBRcpiiSjYDAVi+7fE9OtAdWyfPDa9MmcajVq7PUnohZLc
 muNSv1RGOebj2mSjE+oc1oWZKkM/nSEwbIE7PJOvWX2RdqOs8xNNC0RjgvsXDvyf
 +lARPcyolxpJXtvIlLY/Ua1KhJf52zu+Kp1QCbEnloU/NuGcc09It6Wq6X1yU/Gs
 N29qjuFOBoHQxDzfWPc3Uhp1/eNI8UJiTfO9CSYHv2ZHJzW8BqoblbKlE2l43TDg
 frH1l1/9RYY7gTGoCVvfJByvSWM7rBJdNfpu+yJDm5en86xcF6HB7GG0nJ/okpw2
 dUqQsLu+nTKZn8KCM3jzgP/ANT+PWk1UZlS1fIaMFMwa1SLWYqbaW/KEbRrY5IeP
 H0q/TFhJH8PccBjPbcsT
 =zl8u
 -----END PGP SIGNATURE-----

Merge tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA fixes from Roland Dreier:
 "All in hardware drivers:
   - Fix crash in cxgb4
   - Fixes to new ocrdma driver
   - Regression fixes for mlx4"

* tag 'rdma-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB/mlx4: Fix max_wqe capacity reported from query device
  mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow
  IB/mlx4: Fix EQ deallocation in legacy mode
  RDMA/cxgb4: Fix crash when peer address is 0.0.0.0
  RDMA/ocrdma: Remove unnecessary version.h includes
  RDMA/ocrdma: Fix signaled event for SRQ_LIMIT_REACHED
  RDMA/ocrdma: Correct queue free count math
2012-06-06 10:45:21 -07:00
Jack Morgenstein edc4a67e15 mlx4_core: Fix setting VL_cap in mlx4_SET_PORT wrapper flow
Commit 096335b3f9 ("mlx4_core: Allow dynamic MTU configuration for
IB ports") modifies the port VL setting.  This exposes a bug in
mlx4_common_set_port(), where the VL cap value passed in (inside the
command mailbox) is incorrectly zeroed-out:

mlx4_SET_PORT modifies the VL_cap field (byte 3 of the mailbox).
Since the SET_PORT command is paravirtualized on the master as well as
on the slaves, mlx4_SET_PORT_wrapper() is invoked on the master.  This
calls mlx4_common_set_port() where mailbox byte 3 gets overwritten by
code which should only set a single bit in that byte (for the reset
qkey counter flag) -- but instead overwrites the entire byte.

The result is that when running in SR-IOV mode, the VL_cap will be set
to zero -- fix this.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-06-06 10:07:54 -07:00
Joe Perches 6469933605 ethernet: Remove casts to same type
Adding casts of objects to the same type is unnecessary
and confusing for a human reader.

For example, this cast:

        int y;
        int *p = (int *)&y;

I used the coccinelle script below to find and remove these
unnecessary casts.  I manually removed the conversions this
script produces of casts with __force, __iomem and __user.

@@
type T;
T *p;
@@

-       (T *)p
+       p

A function in atl1e_main.c was passed a const pointer
when it actually modified elements of the structure.

Change the argument to a non-const pointer.

A function in stmmac needed a __force to avoid a sparse
warning.  Added it.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-06-06 09:31:33 -07:00
Jack Morgenstein 401453a31e net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP
The "!mlx4_is_slave" is totally confusing.  Fix with
constant MLX4_CMD_NATIVE, which is the intended behavior.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein 6230bb234d net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap
The range check was performed after using the port number.

Reverse this to prevent a potential array overflow.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein b91cb3ebcd net/mlx4_core: Fixes for VF / Guest startup flow
- pass the following parameters:
  - firmware version (added QUERY_FW paravirtualization for that)

  - disable Blueflame on slaves. KVM disables write combining on guests,
    and we get better performance without BF in this case. (This requires
    QUERY_DEV_CAP paravirtualization, also in this commit)

  - max qp rdma as destination

- get rid of a chunk of "if (0)" dead code

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein 13bf58b760 net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event
Port is used as an array index before we know if that is proper.

For example, in the catas event case, port is zero; however,
the port index should lie in the range (1..2).

Fix this by using 'port' only in the events where it is of interest.

Test for port out of range in the default (unhandled event) case,
and do not output a message if it is not an ethernet port.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Marcel Apfelbaum 3fc929e2d6 net/mlx4_core: Fix number of EQs used in ICM initialisation
In SRIOV mode, the number of EQs used when computing the total ICM size
was incorrect.

To fix this, we do the following:
1. We add a new structure to mlx4_dev, mlx4_phys_caps, to contain physical HCA
   capabilities.  The PPF uses the phys capabilities when it computes things
   like ICM size.

   The dev_caps structure will then contain the paravirtualized values, making
   bookkeeping much easier in SRIOV mode. We add a structure rather than a
   single parameter because there will be other fields in the phys_caps.

   The first field we add to the mlx4_phys_caps structure is num_phys_eqs.

2. In INIT_HCA, when running in SRIOV mode, the "log_num_eqs" parameter
   passed to the FW is the number of EQs per VF/PF; each function (PF or VF)
   has this number of EQs available.

   However, the total number of EQs which must be allowed for in the ICM is
   (1 << log_num_eqs) * (#VFs + #PFs).  Rather than compute this quantity,
   we allocate ICM space for 1024 EQs (which is the device maximum
   number of EQs, and which is the value we place in the mlx4_phys_caps structure).

   For INIT_HCA, however, we use the per-function number of EQs as described
   above.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:16 -04:00
Jack Morgenstein 30f7c73bed net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int
Ths fixes the comparison in the FLR (Function Level Reset) event case.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-31 18:18:15 -04:00
Linus Torvalds c23ddf7857 InfiniBand/RDMA changes for the 3.5 merge window:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
  - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
    applications to send and receive arbitrary packets directly to/from
    the hardware
  - Add "doorbell drop" handling to the cxgb4 driver
  - A fairly large batch of qib hardware driver changes
  - A few fixes for lockdep-detected issues
  - A few other miscellaneous fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
 4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
 zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
 jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
 mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
 yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
 vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
 aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
 UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
 F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
 HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
 syKfN0VjiDRtJ+pxayi3
 =yWfr
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
 - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
   applications to send and receive arbitrary packets directly to/from
   the hardware
 - Add "doorbell drop" handling to the cxgb4 driver
 - A fairly large batch of qib hardware driver changes
 - A few fixes for lockdep-detected issues
 - A few other miscellaneous fixes and cleanups

Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.

* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
  RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
  IB/mlx4: Fix mlx4_ib_add() error flow
  IB/core: Fix IB_SA_COMP_MASK macro
  IB/iser: Fix error flow in iser ep connection establishment
  IB/mlx4: Increase the number of vectors (EQs) available for ULPs
  RDMA/cxgb4: Add query_qp support
  RDMA/cxgb4: Remove kfifo usage
  RDMA/cxgb4: Use vmalloc() for debugfs QP dump
  RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
  RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
  RDMA/cxgb4: Add DB Overflow Avoidance
  RDMA/cxgb4: Add debugfs RDMA memory stats
  cxgb4: DB Drop Recovery for RDMA and LLD queues
  cxgb4: Common platform specific changes for DB Drop Recovery
  cxgb4: Detect DB FULL events and notify RDMA ULD
  RDMA/cxgb4: Drop peer_abort when no endpoint found
  RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
  mlx4_core: Change bitmap allocator to work in round-robin fashion
  RDMA/nes: Don't call event handler if pointer is NULL
  RDMA/nes: Fix for the ORD value of the connecting peer
  ...
2012-05-21 17:54:55 -07:00
Amir Vadai bc6a4744b8 net/mlx4_en: num cores tx rings for every UP
Change the TX ring scheme such that the number of rings for untagged packets
and for tagged packets (per each of the vlan priorities) is the same, unlike
the current situation where for tagged traffic there's one ring per priority
and for untagged rings as the number of core.

Queue selection is done as follows:

If the mqprio qdisc is operates on the interface, such that the core networking
code invoked the device setup_tc ndo callback, a mapping of skb->priority =>
queue set is forced - for both, tagged and untagged traffic.

Else, the egress map skb->priority =>  User priority is used for tagged traffic, and
all untagged traffic is sent through tx rings of UP 0.

The patch follows the convergence of discussing that issue with John Fastabend
over this thread http://comments.gmane.org/gmane.linux.network/229877

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Liran Liss <liranl@mellanox.com>
Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-17 16:17:50 -04:00
Jack Morgenstein eb71d0d63f net/mlx4_core: Fixed error flow in rem_slave_eqs
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:59 -04:00
Jack Morgenstein ba062d5219 net/mlx4_core: Add XRC domains and counters to resource tracker
Add missing resource tracking for XRC domains and complete the tracking for HCA
network flow counters.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:59 -04:00
Jack Morgenstein b8924951f6 net/mlx4_core: Fix potential kernel Oops in res tracker during Dom0 driver unload
Currently the slave and master resources are deleted after master freed
all bitmaps. If any resources were not properly cleaned up during the
shutdown process, an Oops would result.

Fix so that delete slave (only) resources during cleanup. Master resources
are cleaned up during unload process, and need not separately be cleaned.

Note that during cleanup, we need to split the resource-tracker freeing
functionality.

Before removing all the bitmaps, we free any leftover slave resources.
However, we can only remove the resource tracker linked list after
all bitmap frees, since some of the freeing functions (e.g.,
mlx4_cleanup_eq_table) use paravirtualized FW commands which expect
the resource tracker linked list to be present.

Found-by: Aviad Yehezkel <aviadye@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein 681372a7a3 net/mlx4_core: Do not reset module-parameter num_vfs when fail to enable sriov
Consider the following scenario: 2 HCAs, where only one of which can run SRIOV.

If we reset the module parameter, all the VFs of the SRIOV HCA will be
claimed by the PPF host (-- the code relies on num_vfs being non-zero
to avoid this claiming, and num_vfs was reset when pci_enable_sriov failed
for the non-SRIOV HCA).

The solution is not to touch the num_vfs parameter.

Also, eliminate the unneeded check of num_vfs when disabling sriov
(the dev flag bit is sufficient).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein b9985f410a net/mlx4_core: Remove unused *_str functions from the resource tracker
Removed unsued *_str helper functions from resource_tracker.c

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein 5e92d803bf net/mlx4_core: Change SYNC_TPT to be native (not wrapped)
The "wrapped" was incorrect, since no wrapper function was defined.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein 8bac9ede68 net/mlx4_core: Fix init_port mask state for slaves
In function mlx4_INIT_PORT_wrapper, the port state mask for the
slave is only set if we are invoking the INIT_PORT fw command.

However, the reference count for the (initialized) port is
incremented anyway.

This creates a problem in that when we have multiple slaves,
then the CLOSE_PORT command will never be invoked. The
reason is that in the CLOSE_PORT wrapper, if the port-state
mask is zero for the slave (which it is), the wrapper returns
without doing anything. The only slave which will not return
immediately in the CLOSE_PORT wrapper is that slave for which
INIT_PORT was invoked.

The fix is to not have the port-state mask setting depend
on the logic for calling the INIT_PORT fw command.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Or Gerlitz 162344ed2c net/mlx4: Address build warnings on set but not used variables
Handle the compiler warnings on variables which are set but not used
by removing the relevant variable or casting a return value which is
ignored on purpose to void.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-05-16 00:56:58 -04:00
Jack Morgenstein f4ec9e9531 mlx4_core: Change bitmap allocator to work in round-robin fashion
Under most circumstances, the bitmap allocator does not allocate the
same full 24-bit QP number immediately after a QP is destroyed.

This works by using the upper bits of a 24-bit QP number, beyond the
number of QPs that are actually available in the low level driver.
For example, say that the HCA is willing to allocate a maximum of 64K
qps.  We use the bits 23..16 as a "counter" which is incremented by 1
at each allocation so that even if the same physical QP is
re-allocated, it will not receive the same 24-bit QP number.

However, we have seen the following scenario:
1. Allocate, say, 255 QPs in succession.  This will cause a wrap of the "counter".
2. Destroy the first QP allocated, then allocate a new QP.  The new QP,
   because of the counter wraparound, will get the same FULL QP number as
   the QP just destroyed!

This is a problem because packets in transit can be erroneously
delivered to the new QP when they were meant for the old (destroyed)
QP, because the full QP number of the new QP is identical to the
destroyed QP.  (The "counter" mechanism is meant to prevent this by
having the full 24-bit QP numbers differ even if the physical QP on
the HCA is the same.  As we see above, however, this mechanism does
not always work).

The best fix for this problem is to allocate QPs in round-robin mode,
so that the physical QP numbers are not immediately re-used.

Found-by:  Matthew Finlay <matt@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-14 13:44:38 -07:00
Shlomo Pongratz b3416f4476 mlx4_core: Add second capabilities flags field
This patch adds a 64-bit flags2 features member to struct mlx4_dev to
export further features of the hardware.  The original flags field
tracks features whose support bits are advertised by the firmware in
offsets 0x40 and 0x44 of the query device capabilities command.
flags2 will track features whose support bits are scattered at various
offsets.

RSS support is the first feature to be exported through flags2.  RSS
capabilities are located at offset 0x2e.  The size of the RSS
indirection table is also given in this offset.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-08 11:54:32 -07:00
Yevgeny Petrilin 5b263f5374 mlx4_en: Byte Queue Limit support
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:34:02 -04:00
Yevgeny Petrilin e22979d96a mlx4_en: Moving to Interrupts for TX completions
Moving to interrupts instead of polling fpr TX completions
Avoiding situations where skb can be held in by the driver for
a long time (till timer expires).
The change is also necessary for supporting BQL.

Removing comp_lock that was required because we could handle TX
completions from several contexts: Interrupts, timer, polling.
Now there is only interrupts

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:34:02 -04:00
Yevgeny Petrilin a19a848a45 mlx4_en: Added Ethtool support for TX Interrupt coalescing
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-23 22:34:02 -04:00
Paul Gortmaker 43c880dff3 drivers/net: fix unresolved 64bit math in mellanox/mlx4/en_dcb_nl.c
Commit 109d244605

    "net/mlx4_en: Set max rate-limit for a TC"

introduced 64 bit math operations into mlx4_en_dcbnl_ieee_setmaxrate()

causing the following final link failure on an x86_32 allmodconfig

  ERROR: "__udivdi3" [drivers/net/ethernet/mellanox/mlx4/mlx4_en.ko] undefined!

Convert it to use div_u64() instead.

Cc: Amir Vadai <amirv@mellanox.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-16 02:12:11 -04:00
Masanari Iida fd9071ec61 net: Fix spelling typo in net
Correct spelling typo within drivers/net.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-14 15:29:02 -04:00
David S. Miller 06eb4eafbd Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-04-10 14:30:45 -04:00
Amir Vadai 109d244605 net/mlx4_en: Set max rate-limit for a TC
This patch is using the DCB netlink to set rate limit per ETS TC
Values are accepted in Kbps and rounded up to the nearest multiply of 100Mbps.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Amir Vadai 897d7846b4 net/mlx4_en: sk_prio <=> UP for untagged traffic
Since vlan egress map is only good for tagged traffic, need to have other
mapping to be used by untagged traffic.
For that, the driver uses sch_mqprio mapping. This mapping could be set by
using tc tool from iproute2 package.
Mapped UP will be used by the HW for QoS purposes, but won't go out on the
wire.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Amir Vadai 564c274c3d net/mlx4_en: DCB QoS support
Set TSA, promised BW and PFC using IEEE 802.1qaz netlink commands.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:04 -04:00
Amir Vadai e5395e92a4 net/mlx4_core: set port QoS attributes
Adding QoS firmware commands:
- mlx4_en_SET_PORT_PRIO2TC - set UP <=> TC
- mlx4_en_SET_PORT_SCHEDULER - set promised BW, max BW and PG number

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:03 -04:00
Amir Vadai 0e98b523c4 net/mlx4_en: Force user priority by QP attribute
Instead of relying on HW to change schedule queue by UP, schedule
queue is fixed for a tx_ring, and UP in WQE is ignored in this aspect.  This
resolves two issues with untagged traffic:
1. untagged traffic has no UP in packet which is needed for QoS. The change
   above allows setting the schedule queue (and by that the UP) of such a stream.
2. BlueFlame uses the same field used by vlan tag. So forcing UP from QPC
   allows using BF for untagged but prioritized traffic.

In old firmware that force UP is not supported, untagged traffic will not subject to
QoS.

Because UP is set by QP, need to always have a tx ring per UP, even if pfcrx
module paramter is false.

Signed-off-by: Amir Vadai <amirv@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-05 05:08:03 -04:00
Thadeu Lima de Souza Cascardo 117980c4c9 mlx4: allocate just enough pages instead of always 4 pages
The driver uses a 2-order allocation, which is too much on architectures
like ppc64, which has a 64KiB page. This particular allocation is used
for large packet fragments that may have a size of 512, 1024, 4096 or
fill the whole allocation. So, a minimum size of 16384 is good enough
and will be the same size that is used in architectures of 4KiB sized
pages.

This will avoid allocation failures that we see when the system is under
stress, but still has plenty of memory, like the one below.

This will also allow us to set the interface MTU to higher values like
9000, which was not possible on ppc64 without this patch.

Node 1 DMA: 737*64kB 37*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB 0*8192kB 0*16384kB = 51904kB
83137 total pagecache pages
0 pages in swap cache
Swap cache stats: add 0, delete 0, find 0/0
Free swap  = 10420096kB
Total swap = 10420096kB
107776 pages RAM
1184 pages reserved
147343 pages shared
28152 pages non-shared
netstat: page allocation failure. order:2, mode:0x4020
Call Trace:
[c0000001a4fa3770] [c000000000012f04] .show_stack+0x74/0x1c0 (unreliable)
[c0000001a4fa3820] [c00000000016af38] .__alloc_pages_nodemask+0x618/0x930
[c0000001a4fa39a0] [c0000000001a71a0] .alloc_pages_current+0xb0/0x170
[c0000001a4fa3a40] [d00000000dcc3e00] .mlx4_en_alloc_frag+0x200/0x240 [mlx4_en]
[c0000001a4fa3b10] [d00000000dcc3f8c] .mlx4_en_complete_rx_desc+0x14c/0x250 [mlx4_en]
[c0000001a4fa3be0] [d00000000dcc4eec] .mlx4_en_process_rx_cq+0x62c/0x850 [mlx4_en]
[c0000001a4fa3d20] [d00000000dcc5150] .mlx4_en_poll_rx_cq+0x40/0x90 [mlx4_en]
[c0000001a4fa3dc0] [c0000000004e2bb8] .net_rx_action+0x178/0x450
[c0000001a4fa3eb0] [c00000000009c9b8] .__do_softirq+0x118/0x290
[c0000001a4fa3f90] [c000000000031df8] .call_do_softirq+0x14/0x24
[c000000184c3b520] [c00000000000e700] .do_softirq+0xf0/0x110
[c000000184c3b5c0] [c00000000009c6d4] .irq_exit+0xb4/0xc0
[c000000184c3b640] [c00000000000e964] .do_IRQ+0x144/0x230

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Tested-by: Kleber Sacilotto de Souza <klebers@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-04-04 20:34:29 -04:00
Linus Torvalds 0c2fe82a9b InfiniBand/RDMA changes for the 3.4 merge window. Nothing big really
stands out; by patch count lots of fixes to the mlx4 driver plus some
 cleanups and fixes to the core and other drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
 CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
 12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
 A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
 SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
 oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
 Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
 tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
 3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
 ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
 R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
 3etEZZRwROAyYKcwN01p
 =DbLx
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
 "Nothing big really stands out; by patch count lots of fixes to the
  mlx4 driver plus some cleanups and fixes to the core and other
  drivers."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
  mlx4_core: Scale size of MTT table with system RAM
  mlx4_core: Allow dynamic MTU configuration for IB ports
  IB/mlx4: Fix info returned when querying IBoE ports
  IB/mlx4: Fix possible missed completion event
  mlx4_core: Report thermal error events
  mlx4_core: Fix one more static exported function
  IB: Change CQE "csum_ok" field to a bit flag
  RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
  RDMA/cxgb3: Don't pass irq flags to flush_qp()
  mlx4_core: Get rid of redundant ext_port_cap flags
  RDMA/ucma: Fix AB-BA deadlock
  IB/ehca: Fix ilog2() compile failure
  IB: Use central enum for speed instead of hard-coded values
  IB/iser: Post initial receive buffers before sending the final login request
  IB/iser: Free IB connection resources in the proper place
  IB/srp: Consolidate repetitive sysfs code
  IB/srp: Use pr_fmt() and pr_err()/pr_warn()
  IB/core: Fix SDR rates in sysfs
  mlx4: Enforce device max FMR maps in FMR alloc
  IB/mlx4: Set bad_wr for invalid send opcode
  ...
2012-03-21 10:33:42 -07:00
Eugenia Emantayev 58a3de0592 mlx4_core: fix race on comm channel
Prevent race condition between commands on comm channel.
Happened while unloading the driver when switching from
event to polling mode. VF got completion on the last command
before switching to polling mode, but toggle was not changed.
After the fix - VF will not write the next command before
toggle is updated.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-19 18:02:05 -04:00
Roland Dreier 42872c7a5e Merge branches 'misc' and 'mlx4' into for-next
Conflicts:
	drivers/infiniband/hw/mlx4/main.c
	drivers/net/ethernet/mellanox/mlx4/main.c
	include/linux/mlx4/device.h
2012-03-12 16:25:28 -07:00
Roland Dreier db5a7a65c0 mlx4_core: Scale size of MTT table with system RAM
The current driver defaults to 1M MTT segments, where each segment holds
8 MTT entries.  This limits the total memory registered to 8M * PAGE_SIZE
which is 32GB with 4K pages.  Since systems that have much more memory
are pretty common now (at least among systems with InfiniBand hardware),
this limit ends up getting hit in practice quite a bit.

Handle this by having the driver allocate at least enough MTT entries to
cover 2 * totalram pages.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:59 -07:00
Or Gerlitz 096335b3f9 mlx4_core: Allow dynamic MTU configuration for IB ports
Set the MTU for IB ports in the driver instead of using the firmware
default of 2KB (the driver defaults to 4KB).  Allow for dynamic mtu
configuration through a new, per-port sysfs entry.

Since there's a dependency between the port MTU and the max number of
HW VLs the port can support, apply a mim/max approach, using a loop
that goes down from the highest possible number of VLs to the lowest,
using the firmware return status to know whether the requested number
of VLs is possible with a given MTU.

For now, as with the dynamic link type change / VPI support, the sysfs
entry to change the mtu is exposed only when NOT running in SR-IOV
mode.  To allow changing the MTU for the master in SR-IOV mode,
primary-function-initiated FLR (Function Level Reset) needs to be
implemented.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:59 -07:00
Jack Morgenstein 5984be9004 mlx4_core: Report thermal error events
Print an error message when a thermal error async event is reported by the HW.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Dotan Barak <dotanb@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:59 -07:00
Roland Dreier e10903b087 mlx4_core: Fix one more static exported function
Commit 22c8bff6fa ("mlx4_core: Exported functions can't be static")
fixed most of this up, but forgot about mlx4_is_slave_active().  Fix
this one too.

Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-12 16:24:58 -07:00
David S. Miller b2d3298e09 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-03-09 14:34:20 -08:00
Jack Morgenstein dcf353b170 mlx4_core: fix bug in modify_cq wrapper for resize flow.
The actual FW command is called in procedure "handle_resize".
Code incorrectly invoked the FW command again (in good flow), in
the modify_cq wrapper function.

Fix by skipping second FW invocation unconditionally for resize.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-08 00:28:01 -08:00
Or Gerlitz 8154c07fe1 mlx4_core: Get rid of redundant ext_port_cap flags
While doing the work for commit a6f7feae6d ("IB/mlx4: pass SMP
vendor-specific attribute MADs to firmware") we realized that the
firmware would respond on all sorts of vendor-specific MADs.
Therefore commit 97285b7817 ("mlx4_core: Add extended port
capabilities support") adds redundant code into the driver, since
there's no real reaon to maintain the extended capabilities of the
port, as they can be queried on demand (e.g the FDR10 capability).

This patch reverts commit 97285b7817 and removes the check for
extended caps from the mlx4_ib driver port query flow.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-06 17:25:18 -08:00
Yevgeny Petrilin 66431a7d45 net/mlx4: defining functions as static
Fixing sparse warnings, the 2 functions are only used in same
file. Defining them as static and not exporting them.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:18 -05:00
Yevgeny Petrilin be6736ba1f net/mlx4: remove unused functions
Removing functions that are no longer in use, but still exist

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:18 -05:00
Yevgeny Petrilin 9a9a232a92 net/mlx4: fixing sparse warnings for not declared, functions
The SET_PORT functions are implemented in port.c, which is part
of mlx4_core, these functions are exported. The functions are in use by
the mlx4_en module (were originally part of mlx4_en).
Their declaration remained in mlx4_en module, moving the declaration to the right location.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:18 -05:00
Yevgeny Petrilin 2ab573c586 net/mlx4: fixing sparse warnings when copying mac, address to gid entry
The mac should be written as __be64 the gid. The warning was because
we changed the mac parameter, which is u64, by writing result of cpu_to_be64
into it. Fixing by using new variable of type __be64.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Yevgeny Petrilin 39b2c4ebb4 net/mlx4: fix sparse warnings on wrong type for RSS keys
The keys used for the hardware RSS topelitz hash are of type __be32
where the values provided by the driver are from array of u32,
this triggered sparse warning on incorrect type in assignment as of different base types.
Since these values are picked randomly,
the fix is to transform the key to __be32 by executing cpu_to_be_32()

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Or Gerlitz 966684d581 net/mlx4: fix sparse warnings on TX blue flame buffer
The blue flame buffer is defined to be of type void __iomem *
but was passed to mlx4_bf_copy which gets unsigned long * .
This triggered sparse warning on different address spaces,
fix that by changing mlx4_bf_copy first param to be of type void __iomem * .

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Or Gerlitz 4ef2a435be net/mlx4: fix sparse warnings on TX control flags, endianess
Fix sparse warnings on incompatibility between the endianess of the ctrl_flags
field of struct mlx4_en_priv to the srcrb_flags field of struct
mlx4_wqe_ctrl_seg by changing the former to be __be32 instead of u32.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Yevgeny Petrilin ebf8c9aa03 net/mlx4_en: Saving mem access on data path
Localized the pdev->dev, and using dma_map instead of pci_map
There are multiple map/unmap operations on data path,
optimizing those by saving redundant pointer access.
Those places were identified as hot-spots when running kernel profiling
during some benchmarks.
The fixes had most impact when testing packet rate with small packets,
reducing several % from CPU load, and in some case being the difference
between reaching wire speed or being CPU bound.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 15:19:17 -05:00
Yevgeny Petrilin 1d4526e037 mlx4_core: remove buggy sched_queue masking
Fixes a bug introduced by commit fe9a2603c, where the priority bits
in the schedule queue field were masked out.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-06 14:43:50 -05:00
Eric Dumazet 18f973af3e mlx4_en: remove sparse errors
Fix new sparse errors introduced in commit 6221217199 (mlx4_en: dont
change mac_header on xmit)

Reported-by: Or Gerlitz <or.gerlitz@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-03-05 16:42:45 -05:00
David S. Miller ff4783ce78 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/sfc/rx.c

Overlapping changes in drivers/net/ethernet/sfc/rx.c, one to change
the rx_buf->is_page boolean into a set of u16 flags, and another to
adjust how ->ip_summed is initialized.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 21:55:51 -05:00
Linus Torvalds 203738e548 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
1) ICMP sockets leave err uninitialized but we try to return it for the
   unsupported MSG_OOB case, reported by Dave Jones.

2) Add new Zaurus device ID entries, from Dave Jones.

3) Pointer calculation in hso driver memset is wrong, from Dan
   Carpenter.

4) ks8851_probe() checks unsigned value as negative, fix also from Dan
   Carpenter.

5) Fix crashes in atl1c driver due to TX queue handling, from Eric
   Dumazet.  I anticipate some TX side locking fixes coming in the near
   future for this driver as well.

6) The inline directive fix in Bluetooth which was breaking the build
   only with very new versions of GCC, from Johan Hedberg.

7) Fix crashes in the ATP CLIP code due to ARP cleanups this merge
   window, reported by Meelis Roos and fixed by Eric Dumazet.

8) JME driver doesn't flush RX FIFO correctly, from Guo-Fu Tseng.

9) Some ip6_route_output() callers test the return value for NULL, but
   this never happens as the convention is to return a dst entry with
   dst->error set.  Fixes from RonQing Li.

10) Logitech Harmony 900 should be handled by zaurus driver not
   cdc_ether, update white lists and black lists accordingly.  From
   Scott Talbert.

11) Receiving from certain kinds of devices there won't be a MAC header,
   so there is no MAC header to fixup in the IPSEC code, and if we try
   to do it we'll crash.  Fix from Eric Dumazet.

12) Port type array indexing off-by-one in mlx4 driver, fix from Yevgeny
   Petrilin.

13) Fix regression in link-down handling in davinci_emac which causes
   all RX descriptors to be freed up and therefore RX to wedge
   completely, from Christian Riesch.

14) It took two attempts, but ctnetlink soft lockups seem to be
   cured now, from Pablo Neira Ayuso.

15) Endianness bug fix in ENIC driver, from Santosh Nayak.

16) The long ago conversion of the PPP fragmentation code over to
   abstracted SKB list handling wasn't perfect, once we get an
   out of sequence SKB we don't flush the rest of them like we
   should.  From Ben McKeegan.

17) Fix regression of ->ip_summed initialization in sfc driver.
   From Ben Hutchings.

18) Bluetooth timeout mistakenly using msecs instead of jiffies,
   from Andrzej Kaczmarek.

19) Using _sync variant of work cancellation results in deadlocks,
   use the non _sync variants instead.  From Andre Guedes.

20) Bluetooth rfcomm code had reference counting problems leading
   to crashes, fix from Octavian Purdila.

21) The conversion of netem over to classful qdisc handling added
   two bugs to netem_dequeue(), fixes from Eric Dumazet.

22) Missing pci_iounmap() in ATM Solos driver.  Fix from Julia Lawall.

23) b44_pci_exit() should not have __exit tag since it's invoked from
   non-__exit code.  From Nikola Pajkovsky.

24) The conversion of the neighbour hash tables over to RCU added a
   race, fixed here by adding the necessary reread of tbl->nht, fix
   from Michel Machado.

25) When we added VF (virtual function) attributes for network device
   dumps, this potentially bloats up the size of the dump of one
   network device such that the dump size is too large for the buffer
   allocated by properly written netlink applications.

   In particular, if you add 255 VFs to a network device, parts of
   GLIBC stop working.

   To fix this, we add an attribute that is used to turn on these
   extended portions of the network device dump.  Sophisticaed
   applications like 'ip' that want to see this stuff  will be changed
   to set the attribute, whereas things like GLIBC that don't care
   about VFs simply will not, and therefore won't be busted by the
   mere presence of VFs on a network device.

   Thanks to the tireless work of Greg Rose on this fix.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (53 commits)
  sfc: Fix assignment of ip_summed for pre-allocated skbs
  ppp: fix 'ppp_mp_reconstruct bad seq' errors
  enic: Fix endianness bug.
  gre: fix spelling in comments
  netfilter: ctnetlink: fix soft lockup when netlink adds new entries (v2)
  Revert "netfilter: ctnetlink: fix soft lockup when netlink adds new entries"
  davinci_emac: Do not free all rx dma descriptors during init
  mlx4_core: Fixing array indexes when setting port types
  phy: IC+101G and PHY_HAS_INTERRUPT flag
  netdev/phy/icplus: Correct broken phy_init code
  ipsec: be careful of non existing mac headers
  Move Logitech Harmony 900 from cdc_ether to zaurus
  hso: memsetting wrong data in hso_get_count()
  netfilter: ip6_route_output() never returns NULL.
  ethernet/broadcom: ip6_route_output() never returns NULL.
  ipv6: ip6_route_output() never returns NULL.
  jme: Fix FIFO flush issue
  atm: clip: remove clip_tbl
  ipv4: ping: Fix recvmsg MSG_OOB error handling.
  rtnetlink: Fix problem with buffer allocation
  ...
2012-02-26 12:47:17 -08:00
Eric Dumazet 6221217199 mlx4_en: dont change mac_header on xmit
A driver xmit function is not allowed to change skb without special
care.

mlx4_en_xmit() should not call skb_reset_mac_header() and instead should
use skb->data to access ethernet header.

This removes a dumb test : if (ethh && ethh->h_dest)

Also remove this slow mlx4_en_mac_to_u64() call, we can use
get_unaligned() to get faster code.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-26 14:22:05 -05:00
Eli Cohen a5bbe892da mlx4: Enforce device max FMR maps in FMR alloc
ConnectX devices have a limit on the number of mappings that can be
done on an FMR before having to call sync_tpt.  The current
mlx4_ib driver reports the limit correctly in max_map_per_fmr in
.query_device(), but mlx4_core doesn't check it when actually
allocating FMRs.

Add a max_fmr_maps field to struct mlx4_caps and enforce this maximum
value on FMR allocations.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-26 01:43:37 -08:00
Linus Torvalds b52b80023f One InfiniBand/RDMA regression fix for 3.3:
- mlx4 SR-IOV changes added static exported functions, which doesn't
    build on powerpc at least.  Fix from Doug Ledford for this.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPSEe5AAoJEENa44ZhAt0hIoUP/jwpxojruN8c7PuJesp3sne2
 45xxH0OsPSFdYTBnNZGtL3MWnzp88YjoggRvrKGmiEc/vitDTveL8/j+jFatl/d5
 o5yBq9wsSKgU3jJiczT21WBOIg2I8KGgdWozNSO8rUl++fHJGgH1sCSmf8biomnk
 dlHZbA4ZszH1bxHh406GK/+cP5jjKlTLqkixz/156fsopMxzHBaJycOmzSPpHl9s
 ykrVv3n/mhrc3zBgx5y9aU+LhcchZH6CBQOzLBks1c4w8AFXTxIMAQRhkBVnDksi
 zZhg5E05zRkynr27zebNnu6Y6hfWamfCHkM6krJLXL/QRfN0LmoLvdtW2HRffrHW
 9eOzEdmh9i5wHSeH5zhQAE17yttpWQCNN1thGv3oe2uGj7ItmtA1yc/5q8opwBU3
 bl4/saNVdK8DGtpGZjCmPBnsOl9IgMmYMn5haRmOhDvD4/B93OHl+2iQi50vPJHS
 bBvYG03OTHYs9QexDANDv3Q9VmxGRCBXcZkQblAqofgO3dLgCQdddwn0VHt9xDfK
 5Hp31vPs5pGfT6AkPHREHNNvj27Ve1erjrODNVI/Qr2CIkaoiQ/6oR+9ZMrAFsnb
 uyhNDzt4yuMrfA3T/7sAITL6hFLkVHYL2lSorFr5BXZ4cusYlzUYHRbtiR17Vsye
 1KNTElxLI9fmD5xJSY6N
 =mHhD
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

One InfiniBand/RDMA regression fix for 3.3:

 - mlx4 SR-IOV changes added static exported functions, which doesn't
   build on powerpc at least.  Fix from Doug Ledford for this.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  mlx4_core: Exported functions can't be static
2012-02-24 20:03:14 -08:00
Yevgeny Petrilin 1e0f03d57d mlx4_core: Fixing array indexes when setting port types
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-24 01:52:45 -05:00
Doug Ledford 22c8bff6fa mlx4_core: Exported functions can't be static
At least on powerpc, it breaks the build if exported functions are
static.  Fix some static exported functions introduced with the mlx4
SR-IOV support added in 3.3-rc1.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-22 23:00:38 -08:00
Yevgeny Petrilin 3d8f93083b mlx4: Setting new port types after all interfaces unregistered
In port type change flow, need to set the new port types only after
all interfaces have finished the unregister process.
Otherwise, during unregister, one of the interfaces might issue a SET_PORT
command with wrong port types, it can cause bad FW behavior.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 15:27:24 -05:00
Yevgeny Petrilin 730c41d5ba mlx4: Replacing pool_lock with mutex
Under the spinlock we call request_irq(), which allocates memory with GFP_KERNEL,
This causes the following trace when DEBUG_SPINLOCK is enabled, it can cause
the following trace:

 BUG: spinlock wrong CPU on CPU#2, ethtool/2595
 lock: ffff8801f9cbc2b0, .magic: dead4ead, .owner: ethtool/2595, .owner_cpu: 0
 Pid: 2595, comm: ethtool Not tainted 3.0.18 #2
 Call Trace:
 spin_bug+0xa2/0xf0
 do_raw_spin_unlock+0x71/0xa0
 _raw_spin_unlock+0xe/0x10
 mlx4_assign_eq+0x12b/0x190 [mlx4_core]
 mlx4_en_activate_cq+0x252/0x2d0 [mlx4_en]
 ? mlx4_en_activate_rx_rings+0x227/0x370 [mlx4_en]
 mlx4_en_start_port+0x189/0xb90 [mlx4_en]
 mlx4_en_set_ringparam+0x29a/0x340 [mlx4_en]
 dev_ethtool+0x816/0xb10
 ? dev_get_by_name_rcu+0xa4/0xe0
 dev_ioctl+0x2b5/0x470
 handle_mm_fault+0x1cd/0x2d0
 sock_do_ioctl+0x5d/0x70
 sock_ioctl+0x79/0x2f0
 do_vfs_ioctl+0x8c/0x340
 sys_ioctl+0xa1/0xb0
 system_call_fastpath+0x16/0x1b

Replacing with mutex, which is enough in this case.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-21 15:27:23 -05:00
Jack Morgenstein 3d7474734b mlx4_core: Do not map BF area if capability is 0
BF can be disabled in some cases, the capability field, bf_reg_size is set
to zero in this case. Don't map the BF area in this case, it would cause
failures.  In addition, leaving the BF area unmapped
also alerts the ETH driver to not use BF.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-20 19:26:34 -05:00
David S. Miller 32efe08d77 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_stats.c

Small minor conflict in bnx2x, wherein one commit changed how
statistics were stored in software, and another commit
fixed endianness bugs wrt. reading the values provided by
the chip in memory.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-19 16:03:15 -05:00
Eugenia Emantayev 9f5b6c632e mlx4: add unicast steering entries to resource_tracker
Add unicast steering entries to resource tracker.
Do qp_detach also for these entries when VF doesn't shut down gracefully.
Otherwise there is leakage of these resources, since they are not tracked.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15 14:50:16 -05:00
Eugenia Emantayev 2531188b47 mlx4: fix QP tree trashing
When adding new unicast steer entry, before moving qp to state ready,
actually before calling mlx4_RST2INIT_QP_wrapper(), there were added
a lot of entries with local_qpn=0 into radix tree.
This fact impacted the get_res() function and proper functioning
of resource tracker in addition to adding trash entries into radix tree.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@melllanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15 14:50:16 -05:00
Eugenia Emantayev 75c6062cb7 mlx4: fix buffer overrun
When passing MLX4_UC_STEER=1 it was translated to value 2
after mlx4_QP_ATTACH_wrapper. Therefore in new_steering_entry()
unicast steer entries were added to index 2 of array of size 2.
Fixing this bug by shift right to one position.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-15 14:50:15 -05:00
Eugenia Emantayev eb40d89276 mlx4: add unicast steering entries to resource_tracker
Add unicast steering entries to resource tracker.
Do qp_detach also for these entries when VF doesn't shut down gracefully.
Otherwise there is leakage of these resources, since they are not tracked.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:59 -05:00
Eugenia Emantayev f1f75f0e2b mlx4: attach multicast with correct flag
mlx4_multicast_attach/detach() should use always MLX4_MC_STEER flag

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev de9b43dbb8 mlx4: remove redundant adding of steering type to gid
mlx4_uc_steer_add/release() should not add MLX4_UC_STEER flag to gid.
It is added in mlx4_unicast_attach/detach().

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev deb8b3e849 mlx4: remove unnecessary variables and arguments
mlx4_qp_attach/detach_common() don't use hash variable, move it to find_entry()
static find_entry() in mcg.c doesn't use steer argument

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev 45b5136551 mlx4: remove unused field high_prios
Remove unnecessary field high_prios from mlx4_steer struct and initialization

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev 0ee9f1dd7b mlx4: fix QP tree trashing
When adding new unicast steer entry, before moving qp to state ready,
actually before calling mlx4_RST2INIT_QP_wrapper(), there were added
a lot of entries with local_qpn=0 into radix tree.
This fact impacted the get_res() function and proper functioning
of resource tracker in addition to adding trash entries into radix tree.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@melllanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:58 -05:00
Eugenia Emantayev 58a30d6a3c mlx4_core: fix buffer overrun
When passing MLX4_UC_STEER=1 it was translated to value 2
after mlx4_QP_ATTACH_wrapper. Therefore in new_steering_entry()
unicast steer entries were added to index 2 of array of size 2.
Fixing this bug by shift right to one position.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-14 14:11:57 -05:00
Axel Lin 758ff235b3 mlx4: Fix kcalloc parameters swapped
The first parameter should be "number of elements" and the second parameter
should be "element size".

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-13 16:00:58 -05:00
David S. Miller d5ef8a4d87 Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/infiniband/hw/nes/nes_cm.c

Simple whitespace conflict.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-10 23:32:28 -05:00
Thadeu Lima de Souza Cascardo 7e2eb99cc6 mlx4: fix DMA mapping leak when allocation fails
mlx4_en_prepare_rx_desc does not correctly clean up after it finds an
allocation failure. It should unmap a page before calling put_page, but
it only calls the later.

This bug would prevent a device removal using hotplug after setting the
device MTU to 9000 and opening the network interface. After the fix, we
still see the allocation failure with MTU 9000, but we are able to
remove the device.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 14:42:28 -05:00
Thadeu Lima de Souza Cascardo 68355f7113 mlx4: allow device removal by fixing dma unmap size
After opening the network interface, Mellanox ConnectX device cannot be
removed by hotplug because it has not properly unmapped all DMA memory.

It happens that mlx4_en_activate_rx_rings overrides the variable that
keeps the size of the memory mapped.

This is fixed by passing to mlx4_en_destroy_rx_ring the same size that is
given to mlx4_en_create_rx_ring.

After applying this patch, hot unplugging the device works after opening
the interface.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 14:42:28 -05:00
Eugenia Emantayev 4c41b36737 mlx4_core: use correct port for steering
Use port number for correct steering (list per port).
Before the fix all steering entries (for both physical ports)
were managed in first port structures, so we had leakage of resources
for port 2.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Eugenia Emantayev 4df9950406 mlx4_core: use correct flag for unicast_promisc
Use MLX4_DEV_CAP_FLAG_VEP_UC_STEER for unicast_promisc_add/remove
Unicast entries were managed in wrong data structures.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Eugenia Emantayev f08ad06c05 mlx4_core: fix memory leak at multi_func_cleanup
Perform cleanup also in non-master flow.
The VFs use communication channel as well.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 12:10:11 -05:00
Pradeep A Dalvi c056b734e5 netdev: ethernet dev_alloc_skb to netdev_alloc_skb
Replaced deprecating dev_alloc_skb with netdev_alloc_skb in drivers/net/ethernet
  - Removed extra skb->dev = dev after netdev_alloc_skb

Signed-off-by: Pradeep A Dalvi <netdev@pradeepdalvi.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-06 11:52:27 -05:00
Masanari Iida 8d9eb069ea mlx4: Fix typo in cmd.c
Correct spelling "reseting" to "resetting" in
drivers/net/ethernet/mellanox/mlx4/cmd.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-04 16:31:47 -05:00
Joe Perches 41de8d4cff drivers/net: Remove alloc_etherdev error messages
alloc_etherdev has a generic OOM/unable to alloc message.
Remove the duplicative messages after alloc_etherdev calls.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-31 16:20:48 -05:00
Joe Perches e404decb0f drivers/net: Remove unnecessary k.alloc/v.alloc OOM messages
alloc failures use dump_stack so emitting an additional
out-of-memory message is an unnecessary duplication.

Remove the allocation failure messages.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-31 16:20:21 -05:00
Marcel Apfelbaum 803143fbda mlx4_core: map async events to arbitrary slave eqs
Slave async events were mapped to single eq. This patch fixes this issue, so
the slaves can map the async events to any eq.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:44 -05:00
Marcel Apfelbaum 9fd7a1e147 mlx4_core: Fix mtt profile issue
Num mtts from profile is really the number of mtt segments.
Thus, in make profile, to get the proper number of MTT entries,
must multiply num_mtts by mtts per segment.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:44 -05:00
Marcel Apfelbaum eb41049f2f mlx4_core: removed function index from vf.
The Virtual Functions should not be aware their function number.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:43 -05:00
Eugenia Emantayev 93ece0c1a7 mlx4_en: eth statistics modification
In native mode display all available staticstics.
In SRIOV mode on VF display only SW counters statistics,
in SRIOV mode on hypervisor display SW counters and errors (got from FW)
statistics.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:43 -05:00
Eugenia Emantayev 35fb9afbde mlx4: VF is not allowed to perform dump stats
In multifunction mode - DUMP_STATS command is not executed
for VFs.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:43 -05:00
Eugenia Emantayev b477ba628a mlx4_en: clear all eth statistics when port goes up
Bug fix: Not all stats fields were cleared.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Reviewed-by: Yevgeny Petriln <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-22 15:08:43 -05:00
Yevgeny Petrilin 93d3e3678f mlx4_en: set number of rx rings used by RSS using ethtool
Value must be a power of 2 due to HW limitation.
Driver supports only 'equal' mode in ethtool and can't be set by using weights.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Reviewed-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-18 16:07:34 -05:00
Yevgeny Petrilin 1c015b3b82 mlx4_core: Elaborating limitation on VF port options
Showing which capabilities are not passed to VF
when executing QUERY_PORT

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-03 12:49:16 -05:00
Marcel Apfelbaum 1e27ca6944 mlx4_core: fix mtt range deallocation
The mtt range was allocated in mtt units but deallocated
in segments. Among the rest, this caused crash during hotplug removal

Reported-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Tested-by: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-01-03 12:49:16 -05:00
David S. Miller 455ffa607f Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2012-01-02 18:56:49 -05:00
Yevgeny Petrilin cd3109d23c mlx4_en: nullify cq->vector field when closing completion queue
Caused loss of connectivity when changing ring size.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-30 17:15:41 -05:00
Yevgeny Petrilin 95f56e7aa8 mlx4_core: limiting VF port options
At the moment VFs can only operate in Eth mode.
In addition we don't want the VF to attempt link sensing,
so we block this option as well.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 15:08:41 -05:00
Yevgeny Petrilin 46c4674754 mlx4_core: using array index for sense_allowed
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-29 15:08:41 -05:00
Axel Lin e143a1ada3 mlx4: Add missing include of linux/slab.h
Include linux/slab.h to fix below build error:

  CC      drivers/net/ethernet/mellanox/mlx4/resource_tracker.o
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_init_resource_tracker':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:233: error: implicit declaration of function 'kzalloc'
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:234: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mlx4_free_resource_tracker':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:264: error: implicit declaration of function 'kfree'
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_qp_tr':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:370: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_mtt_tr':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:386: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_mpt_tr':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:402: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_eq_tr':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:417: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_cq_tr':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:431: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_srq_tr':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:446: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'alloc_counter_tr':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:461: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'add_res_range':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:521: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'mac_add_to_slave':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:1193: warning: assignment makes pointer from integer without a cast
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c: In function 'add_mcg_res':
drivers/net/ethernet/mellanox/mlx4/resource_tracker.c:2521: warning: assignment makes pointer from integer without a cast
make[5]: *** [drivers/net/ethernet/mellanox/mlx4/resource_tracker.o] Error 1
make[4]: *** [drivers/net/ethernet/mellanox/mlx4] Error 2
make[3]: *** [drivers/net/ethernet/mellanox] Error 2
make[2]: *** [drivers/net/ethernet] Error 2
make[1]: *** [drivers/net] Error 2
make: *** [drivers] Error 2

Signed-off-by: Axel Lin <axel.lin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-26 15:18:36 -05:00
Yevgeny Petrilin 89efea25cd mlx4_en: FIX: Setting default_qpn before using it
When UDP RSS is enabled, we use same QPN for TCP and UDP ranges
The bug is that the default_qpn was used base UDP qpn before it
was set.
Fixes bug introduced in commit: 1202d460b1

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-20 13:31:36 -05:00
Rusty Russell eb93992207 module_param: make bool parameters really bool (net & drivers/net)
module_param(bool) used to counter-intuitively take an int.  In
fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy
trick.

It's time to remove the int/unsigned int option.  For this version
it'll simply give a warning, but it'll break next kernel version.

(Thanks to Joe Perches for suggesting coccinelle for 0/1 -> true/false).

Cc: "David S. Miller" <davem@davemloft.net>
Cc: netdev@vger.kernel.org
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 22:27:29 -05:00
Yevgeny Petrilin 72be84f1c2 mlx4: Fixing wrong error codes in communication channel
The communication channel is HW interface from PF point of view
So the command return status should be stored as HW error code
and only then translated to errno values.

Reporetd-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 14:57:07 -05:00
Yevgeny Petrilin 996b0541e7 mlx4: not using spin_lock_irq when getting vf by resource.
The function is always called from irq context, changing the call
to spin_lock().

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 14:57:07 -05:00
Alexander Guller 0e03567a2c mlx4_en: nullify cached multicast address list after cleanup
Solves an issue where we tried to free the same page twice after
the port has been opened and closed.

Signed-off-by: Alexander Guller <alexg@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 14:57:07 -05:00
Yevgeny Petrilin 8d0fc7b611 mlx4_core: Changing link sensing logic
New FW can give clues to driver regarding default port type
and whether or not we should default to link sensing on the port.

2 bits are added to QUERY_PORT command:
1. suggested_type: This bit gives a hint whether the default port type should be
   IB or Ethernet.
   The driver will use this hint in case the user didn't specify explicitly the link layer
   type he wants to set.
2. default_sense: If this bit is set, we would sense the port type on start-up
   and default the port to link sensing

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 14:57:07 -05:00
Yevgeny Petrilin 58a60168d1 mlx4: capability for link sensing
For ConnectX3 devices, we allow link sensing only if FW explicitly
reported it supports the feature.
For older versions (ConnectX1 and 2), if the card supports both link layer types
(Ethenet and Infiniband), link sensing is supported.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-19 14:57:06 -05:00
Joerg Roedel cb9ffb7694 mlx4: Fix compile error when driver is comiled-in
This patch fixes a compile error that occurs when the driver
is compile into the kernel and not as a module.

Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-16 15:28:10 -05:00
Yevgeny Petrilin 6edf91da43 mlx4_en: updated driver version to 2.0
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:08 -05:00
Yevgeny Petrilin 7d4b6bcce0 mlx4_core: updated driver version to 1.1
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:08 -05:00
Jack Morgenstein ab9c17a009 mlx4_core: Modify driver initialization flow to accommodate SRIOV for Ethernet
1. Added module parameters sr_iov and probe_vf for controlling enablement of
   SRIOV mode.
2. Increased default max num-qps, num-mpts and log_num_macs to accomodate
   SRIOV mode
3. Added port_type_array as a module parameter to allow driver startup with
   ports configured as desired.
   In SRIOV mode, only ETH is supported, and this array is ignored; otherwise,
   for the case where the FW supports both port types (ETH and IB), the
   port_type_array parameter is used.
   By default, the port_type_array is set to configure both ports as IB.
4. When running in sriov mode, the master needs to initialize the ICM eq table
   to hold the eq's for itself and also for all the slaves.
5. mlx4_set_port_mask() now invoked from mlx4_init_hca, instead of in mlx4_dev_cap.
6. Introduced sriov VF (slave) device startup/teardown logic (mainly procedures
   mlx4_init_slave, mlx4_slave_exit, mlx4_slave_cap, mlx4_slave_exit and flow
   modifications in __mlx4_init_one, mlx4_init_hca, and mlx4_setup_hca).
   VFs obtain their startup information from the PF (master) device via the
   comm channel.
7. In SRIOV mode (both PF and VF), MSI_X must be enabled, or the driver
   aborts loading the device.
8. Do not allow setting port type via sysfs when running in SRIOV mode.
9. mlx4_get_ownership:  Currently, only one PF is supported by the driver.
   If the HCA is burned with FW which enables more than one PF, only one
   of the PFs is allowed to run.  The first one up grabs a FW ownership
   semaphone -- all other PFs will find that semaphore taken, and the
   driver will not allow them to run.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Liran Liss <liranl@mellanox.co.il>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:08 -05:00
Jack Morgenstein d81c7186aa mlx4_core: adjust catas operation for SRIOV mode
When running in SRIOV mode, driver should not automatically start/stop
the mlx4_core upon sensing an HCA internal error -- doing this disables/enables
sriov, which will cause the hypervisor to hang if there are running VMs with
attached VFs.

In addition, on VMs the catas process should not run at all, since the HCA
error buffer is not available to VMs in the BARs.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:08 -05:00
Marcel Apfelbaum 2b8fb2867c mlx4_core: mtts resources units changed to offset
In the previous implementation mtts are managed by:
1. order     - log(mtt segments), 'mtt segment' groups several mtts together.
2. first_seg - segment location relative to mtt table.
In the current implementation:
1. order     - log(mtts) rather than segments
2. offset    - mtt index in mtt table

Note: The actual mtt allocation is made in segments but it is
      transparent to callers.

Rational: The mtt resource holders are not interested on how the allocation
          of mtt is done, but rather on how they will use it.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:07 -05:00
Eugenia Emantayev 5b4c4d3686 mlx4_en: Allow communication between functions on same host
To enable internal loopback, always fill DMAC in control segment
when transmitting the packet, once this is done, the packet is subject
for loopback for if the DMAC mathces one of the multicast/unicast addresses
registered on the physical port.
In receive path if source MAC is our own MAC and we are not in selftest,
or not in force LB mode - drop this packet.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:07 -05:00
Eugenia Emantayev ffe455ad04 mlx4: Ethernet port management modifications
The physical port is now common to the PF and VFs.
The port resources and configuration is managed by the PF, VFs can
only influence the MTU of the port, it is set as max among all functions,
Each function allocates RX buffers of required size to meet it's MTU enforcement.
Port management code was moved to mlx4_core, as the mlx4_en module is
virtualization unaware

Move handling qp functionality to mlx4_get_eth_qp/mlx4_put_eth_qp
including reserve/release range and add/release unicast steering.
Let mlx4_register/unregister_mac deal only with MAC (un)registration.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:07 -05:00
Eugenia Emantayev 0ec2c0f86d mlx4: Traffic steering management support for SRIOV
Let multicast/unicast attaching flow go through resource tracker.
The PF is the one responsible for managing all the steering entries.
Define and use module parameter that determines the number of qps
per multicast group.
Minor changes in function calls according to changed prototype.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:07 -05:00
Eli Cohen c82e9aa0a8 mlx4_core: resource tracking for HCA resources used by guests
The resource tracker is used to track usage of HCA resources by the different
guests.

Virtual functions (VFs) are attached to guest operating systems but
resources are allocated from the same pool and are assigned to VFs. It is
essential that hostile/buggy guests not be able to affect the operation of
other VFs, possibly attached to other guest OSs since ConnectX firmware is not
tolerant to misuse of resources.

The resource tracker module associates each resource with a VF and maintains
state information for the allocated object. It also defines allowed state
transitions and enforces them.

Relationships between resources are also referred to. For example, CQs are
pointed to by QPs, so it is forbidden to destroy a CQ if a QP refers to it.

ICM memory is always accessible through the primary function and hence it is
allocated by the owner of the primary function.

When a guest dies, an FLR is generated for all the VFs it owns and all the
resources it used are freed.

The tracked resource types are: QPs, CQs, SRQs, MPTs, MTTs, MACs, RES_EQs,
and XRCDNs.

Signed-off-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:07 -05:00
Jack Morgenstein acba2420f9 mlx4_core: Add wrapper functions and comm channel and slave event support to EQs
Passing async events to slaves:
In SRIOV mode, each slave creates its own async EQ, but only the master can
register directly with the FW to receive async events.  Async events which
should be passed to slaves (such as a WQ_ACCESS_ERROR for a QP owned by a slave)
are generated at the slave by the master using the GEN_EQE FW command.

Wrapper functions: mlx4_MAP_EQ_wrapper
Only the master can map an EQ. The slave commands to map their EQs arrive
at the master via the comm channel.  The master then invokes the wrapper
function to do the work (and enter the resource in the tracking database).

New events: COMM_CHANNEL and FLR
The COMM_CHANNEL event arrives only at the master, and signals that
a slave has posted a command on the comm channel.
The FLR event is generated by the FW when a guest operating a VF
unexpectedly goes down.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:06 -05:00
Jack Morgenstein ea51b377ab mlx4_core: mtt modifications for SRIOV
MTTs are resources which are allocated and tracked by the PF driver.
In multifunction mode, the allocation and icm mapping is done in
the resource tracker (later patch in this sequence).

To accomplish this, we have "work" functions whose names start with
"__", and "request" functions (same name, no __). If we are operating
in multifunction mode, the request function actually results in
comm-channel commands being sent (ALLOC_RES or FREE_RES).
The PF-driver comm-channel handler will ultimately invoke the
"work" (__) function and return the result.

If we are not in multifunction mode, the "work" handler is invoked
immediately.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:06 -05:00
Jack Morgenstein d7233386b2 mlx4_core: cq modifications for SRIOV
CQs are resources which are allocated and tracked by the PF driver.
In multifunction mode, the allocation and icm mapping is done in
the resource tracker (later patch in this sequence).

To accomplish this, we have "work" functions whose names start with
"__", and "request" functions (same name, no __). If we are operating
in multifunction mode, the request function actually results in
comm-channel commands being sent (ALLOC_RES or FREE_RES).
The PF-driver comm-channel handler will ultimately invoke the
"work" (__) function and return the result.

If we are not in multifunction mode, the "work" handler is invoked
immediately.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:06 -05:00
Jack Morgenstein fe9a2603c5 mlx4_core: qp modifications for SRIOV
QPs are resources which are allocated and tracked by the PF driver.
In multifunction mode, the allocation and icm mapping is done in
the resource tracker (later patch in this sequence).

To accomplish this, we have "work" functions whose names start with
"__", and "request" functions (same name, no __). If we are operating
in multifunction mode, the request function actually results in
comm-channel commands being sent (ALLOC_RES or FREE_RES).
The PF-driver comm-channel handler will ultimately invoke the
"work" (__) function and return the result.

If we are not in multifunction mode, the "work" handler is invoked
immediately.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:06 -05:00
Jack Morgenstein 3ec65b2be5 mlx4_core: srq modifications for SRIOV
SRQs are resources which are allocated and tracked by the PF driver.
In multifunction mode, the allocation and icm mapping is done in
the resource tracker (later patch in this sequence).

To accomplish this, we have "work" functions whose names start with
"__", and "request" functions (same name, no __). If we are operating
in multifunction mode, the request function actually results in
comm-channel commands being sent (ALLOC_RES or FREE_RES).
The PF-driver comm-channel handler will ultimately invoke the
"work" (__) function and return the result.

If we are not in multifunction mode, the "work" handler is invoked
immediately.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:06 -05:00
Marcel Apfelbaum 5cc914f108 mlx4_core: Added FW commands and their wrappers for supporting SRIOV
The following commands are added here:
1. QUERY_FUNC_CAP and its wrapper.  This function is used by VFs when
   they start up to receive configuration information from the PF, such
   as resource quotas for this VF, which ports should be used (currently
   two), what protocol is running on the port (currently Ethernet ONLY,
   or port not active).

2. QUERY_PORT and its wrapper. Previously, this FW command was invoked directly
   by the ETH driver (en_port.c) using mlx4_cmd_box. Virtualization is now
   required here (the VF's MAC address must be substituted for the PFs
   MAC address returned by the FW). We changed the invocation
   in the ETH driver to use mlx4_QUERY_PORT, and added the wrapper.

3. QUERY_HCA. Used by the VF to determine how the HCA was initialized.
   For now, we need only the multicast table member entry size
   (log2_mc_table_entry_sz, in the ConnectX PRM).  No wrapper is needed
   here, because the data may be passed as is to the VF without modification).

   In this command, we have added a GLOBAL_CAPS field for passing required
   configuration information from FW to a VF (this field is to allow safely
                   adding new SRIOV capabilities which require support in VF drivers, too).
   Bits will set here by FW in response to PF-driver configuration commands which
   will activate as yet undefined new SRIOV features. The VF will test to see that
   all required capabilities indicated by this field are supported (i.e., if a bit
   is set and the VF driver does not recognize that bit, it must abort
   its initialization).  Currently, no bits are set.

4. Added a CLOSE_PORT wrapper.  The PF context needs to keep track of how many VF contexts
   have the port open.  The PF context will not actually issue the FW close port command
   until the last port user issues a CLOSE_PORT request.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: Marcel Apfelbaum <marcela@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:06 -05:00
Yevgeny Petrilin e8f081aacd net/mlx4_core: Implement the master-slave communication channel
When SRIOV is enabled, pf and vfs communicate via shared comm channel.
The vf gets its side of the comm channel via a VF BAR.
Each VF (slave) creates its vHCR (virtual HCA Command Register),
Its DMA address is passed to the PF (master) using Communication Channel Register.
The same Register is used to notify the master of commands posted by the
slaves and for the master to pass events to the slaves, such as command completions
and asynchronous events.

The vHCR format is identical to the HCR format, except for the 'go' and 't' bits,
which are reserved in the vHCR. Posting commands to the vHCR is identical to
the way it is done with the HCR, albeit that the function/PF token fields are
used instead of the HCR go bit.
Specifically:
- When the function prepares a new command in the vHCR, it issues the Post_vHCR_cmd
  communication channel command and toggles the value of the function token;
  when PF token has an equal value, the command has been accepted and a new command may be posted.
- When the PF detects a Post_vHCR_cmd command, it concludes that a new command is available in the vHCR;
  after processing the command, the PF toggles the PF token to match the function token.

When the 'e' bit is not set, the completion of a Post_vHCR_cmd command also indicates
the completion the vHCR command. If, however, the 'e' bit is set, the completion of a
Post_vHCR_cmd command only indicates that the vHCR command has been accepted for execution by the PF.

Function commands are processed by the PF as follows:
-DMA (using the ACCESS_MEM command) the vHCR image into a shadow buffer.
-Validate that the opcode is non-privileged, and that the opcode- and input-modifiers are legal.
-DMA the in-box (if required) into a shadow buffer.
-Validate the command:
	o Resource ranges (e.g., QP ranges).
	o Partition key.
	o Ranges of referenced resources (e.g., CQs within QP contexts).
-If the 'e' bit is set
	o complete the Post_vHCR_cmd command
-Execute the command on the HCR.
-DMA the results to the vHCR out-box (if required).
-If the 'e' bit is set
	o Indicate command completion by generating a completion event using the GEN_EQE command
-Otherwise
	o DMA the command status to the vHCR
	o Complete the Post_vHCR_cmd command

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrillin <yevgenyp@mellanox.com>
Signed-off-by: Liran Liss <liranl@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:05 -05:00
Jack Morgenstein f5311ac109 mlx4_core: Reduce number of PD bits to 17
When SRIOV is enabled on the chip (at FW burning time),
the HCA uses only 17 bits for the PD. The remaining 7 high-order bits
are ignored.

Change the allocator to return only 17 bits for the PD.  The MSB 7
bits will be used to encode the slave number for consistency
checking later on in the resource tracker.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:05 -05:00
Jack Morgenstein f9baff509f mlx4_core: Add "native" argument to mlx4_cmd and its callers (where needed)
For SRIOV, some Hypervisor commands can be executed directly (native = 1).
Others should go through the command wrapper flow (for tracking resource
usage, for example, or for changing some HCA configurations that slaves
need to be notified of).

This patch sets the groundwork for this capability -- adding the correct
value of "native" in each case.

Note that if SRIOV is not activated, this parameter has no effect.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:05 -05:00
Jack Morgenstein 65dab25deb mlx4: Extanding port_mask functionality
Port mask now has additional state.
Port can be set as "none". In this case neither the mlx4_en or mlx4_ib
drivers take ownership of the port.
In multifunction mode there is an option to set the vfs as single ported devices.
(in single function mode, both physical ports belong to same function)

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:05 -05:00
Jack Morgenstein 623ed84b1f mlx4_core: initial header-file changes for SRIOV support
These changes will not affect module operation as yet. They
are only to get some structs and enums in place for use by
subsequent patches (making those smaller).

Added here:
* sriov state structs and inlines (mlx4_is_master/slave/mfunc)
* comm-channel and vhcr support structures
* enum values for new FW and comm-channel virtual commands
  (i.e., commands, passed via the comm channel to the PF-driver).
* prototypes for many command wrapper functions (used by the
  PF context for processing FW commands passed to it by the VFs).
* struct mlx4_eqe is moved from eq.c to mlx4.h (it will be used
  by other mlx4_core source files).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-13 13:56:05 -05:00
Jiri Pirko 8e586137e6 net: make vlan ndo_vlan_rx_[add/kill]_vid return error value
Let caller know the result of adding/removing vlan id to/from vlan
filter.

In some drivers I make those functions to just return 0. But in those
where there is able to see if hw setup went correctly, return value is
set appropriately.

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-12-08 19:52:37 -05:00
Amir Vadai c140d769c2 net/mlx4_en: bug fix for the case of vlan id 0 and UP 0
When using vlan 0 and UP 0, vlan header wasn't placed.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-27 17:17:04 -05:00
Amir Vadai 60d6fe99e4 net/mlx4_en: adding loopback support
Device must be in promiscuous mode or DMAC must be same as the host MAC, or
else packet will be dropped by the HW rx filtering.

Signed-off-by: Amir Vadai <amirv@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-27 17:17:04 -05:00
Oren Duer 559a9f1d35 net/mlx4_en: fix WOL handlers were always looking at port2 capability bit
There are 2 capability bits for WOL, one for each port.
WOL handlers were looking only on the second bit, regardless of the port.

Signed-off-by: Oren Duer <oren@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-27 17:17:04 -05:00
Yevgeny Petrilin f0ab34f011 net/mlx4_en: using non collapsed CQ on TX
Moving to regular Completion Queue implementation (not collapsed)
Completion for each transmitted packet is written to new entry.

Signed-off-by: Yevgeny Petrilin <yevgenyp@mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-27 17:17:04 -05:00