If the chip is no longer usable, LEDs should be turned off so system
can be found easily in the cluster.
Also some minor reorganizing so both chips print hardware error
message at same point and only if there were unrecovered errors
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Re-init of the kernel structures after a chip reset was leaving the
portdata structure for port zero in an inconsistent state, and a
pointer to it either stale (in re-init code) or NULL (in devdata)
Fixing the order of operations on this struct, and the condition for
interrupt access, prevents the crashes.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Due to a chip bug, the PIOAvail register is not always updated to
memory. This patch allows userspace to force an update.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
In initialization, if we bailed at chip specific initialization, we
forgot to clean up the irq we had requested.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a bug where multicast packets without a GRH were not
being dropped as per the IB spec.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the module parameter "kpiobufs" is set too high, the calculation to
reset it to a sane value was incorrect.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix RDMA read response length checking for RDMA_READ_RESPONSE_ONLY to
allow a zero length response. RDMA read responses which don't match
the expected length or occur in response to some other operation
should generate a completion queue error (see table 56, ch. 9.9.2.3 in
the IB spec).
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Improve port-sharing performance by allowing any process to receive
packets from the shared hardware port under a spin lock for mutual
exclusion. Previously, one process was nominated as the master and
that process was responsible for receiving all packets from the shared
hardware port and either consuming them or forwarding them to their
destination. This led to starvation problems for other processes when
the master process was busy in computation phases.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The port sharing feature mixed kernel virtual addresses as well as
physical addresses for the offset used to describe the mmap address to
map the InfiniPath hardware into user space. This had a conflict on
powerpc. The new scheme converts it to a physical address so it
doesn't conflict with chip addresses and yet still fits in 40/44 bits
so it isn't truncated by 32-bit applications calling mmap64().
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a receive work request has been removed from the queue but has not
had a CQ entry generated for it and the QP is modified to the error
state, the completion entry generated is incorrect. This patch fixes
the problem.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Code was converted from a &= ~mask to clear_bit, but the bit was left
shifted instead of being used directly, so we were either trashing
memory several pages away, or sometimes taking a kernel page fault on
an invalid page.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Some types of packet errors are moderately common with longer IB
cables and large clusters, and are not reported with prints by other
IB HCA drivers. This suppresses those messages unless the new
__IPATH_ERRPKTDBG bit is set in ipath_debug. Reporting of temporarily
disabled frequent error interrupts was also made clearer
We also distinguish between chip errors, and bad packets sent or
received in the wording of the messages.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes a number of bugs with updating the PSN for retries of
RC requests.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When switching to the QP error state, the completion queue entries
(error or flush) were not being generated correctly.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipath_dbg doesn't need the same prefixes that printk does.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch adds support for multiple RDMA reads and atomics to be sent
before an ACK is required to be seen by the requester.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If a post send is done in loopback and there is no receive queue
entry, the sending QP is put on a timeout list for a while so the
receiver has a chance to post a receive buffer. If the another post
send is done, the code incorrectly tried to put the QP on the timeout
list again an corrupted the timeout list. This eventually leads to a
spin lock deadlock NMI due to the timer function looping forever with
the lock held.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A silly programming error causes a CQ entry to not be generated if a
SRQ limit event is generated.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
A recent change was made to allocate memory for a port after CPU
affinity is set. That change didn't account for subports and was
trying to allocate memory for the port twice.
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The chip documentation on the expected TID vs eager TID parity error
bits was reversed from what was implemented in the RTL, for both
chips. This corrects the definitions.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The loop which initializes the user memory region from an array of
pages was using the wrong limit for the array. This worked OK when
dma_map_sg() returned the same number as the number of pages. This
patch fixes the problem.
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This is a sticky state. It is useful for diagnosing problems with
boards versus cable/switch problems.
Signed-off-by: Dave Olson <dave.olson@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's no point in printing the opcode field in the completion
handling debugging output, since the type of completion is already
printed at the beginning of the line. In fact the opcode field is not
even defined for completions with a status other than success.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The current ib_umad code never accesses bits past IB_UMAD_MAX_PORTS in
dev_map[]. We shouldn't declare it to be twice as big.
Pointed-out-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Hal Rosenstock <halr@voltaire.com>
In mthca_arbel_fmr_unmap(), the high bits of the key are masked off.
This gets rid of the effect of adjust_key(), which makes sure that
bits 3 and 23 of the key are equal when the Sinai throughput
optimization is enabled, and so it may happen that an FMR will end up
with bits 3 and 23 in the key being different. This causes data
corruption, because when enabling the throughput optimization, the
driver promises the HCA firmware that bits 3 and 23 of all memory keys
will always be equal.
Fix by re-applying adjust_key() after masking the key.
Thanks to Or Gerlitz for reproducing the problem, and Ariel Shahar for
help in debug.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@samba.org>
As of commit 6cdbd77e ("cxgb3 - missing CPL hanler and register
setting."), the cxgb3 ethernet NIC driver no longer handles SET_TCB
replies, so we need to do it in the iWARP driver.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Acked-by: Divy Le Ray <divy@chelsio.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Receive buffers need to be mapped with DMA_FROM_DEVICE. Incorrectly
mapping with DMA_TO_DEVICE causes a hard lock on ppc64 machines with
an IOMMU.
This fixes <https://bugs.openfabrics.org/show_bug.cgi?id=431>
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
When a connection is terminated asynchronously from the iSCSI layer's
perspective, iSER needs to notify the iSCSI layer that the connection
has failed. This is done using a workqueue (switched to from the iSER
tasklet context). Meanwhile, the connection object (that holds the
work struct) is released. If the workqueue function wasn't called
yet, it will be called later with a NULL pointer, which will crash the
kernel.
The context switch (tasklet to workqueue) is not required, and
everything can be done from the iSER tasklet. This eliminates the NULL
work struct bug (and simplifies the code).
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/iser: Handle aborting a command after it is sent
IB/mthca: Fix thinko in init_mr_table()
RDMA/cxgb3: Fix resource leak in cxio_hal_init_ctrl_qp()
The SCSI midlayer may abort a command that was already sent. If the
initiator is still trying to send the command (or data-out PDUs for
that command), the QP may time out after the midlayer times
out. Therefore, when aborting the command, iSER may still have
references for the command's buffers. When sending these PDUs, the
sends will complete with an error and their resources will be released
then.
Signed-off-by: Erez Zilber <erezz@voltaire.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Commit c20e20ab ("IB/mthca: Merge MR and FMR space on 64-bit systems")
swapped the number of MTTs and MPTs when initializing the MR table. As
a result, we get a kernel oops when the number of MTT segments
allocated exceeds 0x20000.
Noted by Troy Benjegerdes <troy@scl.ameslab.gov>, and reproduced by
Dotan Barak <dotanb@mellanox.co.il>. This fixes
https://bugs.openfabrics.org/show_bug.cgi?id=490
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This was spotted by the Coverity checker (CID 1554).
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
->neigh_destructor() is killed (not used), replaced with
->neigh_cleanup(), which is called when neighbor entry goes to dead
state. At this point everything is still valid: neigh->dev,
neigh->parms etc.
The device should guarantee that dead neighbor entries (neigh->dead !=
0) do not get private part initialized, otherwise nobody will cleanup
it.
I think this is enough for ipoib which is the only user of this thing.
Initialization private part of neighbor entries happens in ipib
start_xmit routine, which is not reached when device is down. But it
would be better to add explicit test for neigh->dead in any case.
Signed-off-by: David S. Miller <davem@davemloft.net>
The packet length checks in ipoib are broken: we add 4 bytes (IPoIB
encapsulation header) when sending a packet, not 20 bytes (hardware
address length) to each packet. Therefore, if connected mode is
enabled so that the interface MTU is larger than the multicast MTU,
IPoIB may end up trying to send too-long multicast packets. For
example, multicast is broken if a message of size 2048 bytes is sent
on an interface with UD MTU 2048, because 2048 is bigger than the real
limit of 2044 but the code tests against the wrong limit of 2060.
This patch fixes <https://bugs.openfabrics.org/show_bug.cgi?id=418>,
submitted by Scott Weitzenkamp <sweitzen@cisco.com>.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The connected mode code added the possibility that an neigh struct
gets freed in the list_for_each_entry() loop in path_rec_completion(),
which causes a use-after-free. Fix this by changing to the _safe
variant of the list walking macro.
This was spotted by the Coverity checker (CID 1567).
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
eHCA scaling code must not depend on register_cpu_notifier() if
CONFIG_HOTPLUG_CPU is not set, so put all related code into #ifdefs.
Signed-off-by: Joachim Fenkes <fenkes@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
There's a race between ipoib_mcast_leave() and ipoib_mcast_join_finish()
where we can try to detach from a multicast group before we've
attached to it. Fix this by reordering the code in ipoib_mcast_leave
to free the multicast group first, which waits for the multicast
callback thread (which calls ipoib_mcast_join_finish()) to complete
before detaching from the group.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The sense of the time_after_eq() test in ipoib_cm_stale_task() is
reversed so that only non-stale connections are reaped. Fix this by
changing to time_before_eq().
Noticed by Pradeep Satyanarayana <pradeep@us.ibm.com>.
Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch renames DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH to avoid
confusion with the drivers default values (DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
is the iscsi RFC specific default).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Do netif_carrier_on() right after the IPv4 broadcast multicast group
is joined, rather than waiting for all of the initial set of multicast
group joins to finish. This allows at least IPv4 traffic to limp
along on broken fabrics where not all multicast groups can be joined.
Signed-off-by: Shirley Ma <xma@us.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change the returned error code to ENOMEM if the connection event
backlog is full. This prevents the ib_cm from issuing a reject
on the connection, which can allow retries to succeed.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix memory region permission problems:
- remove useless and redundant iwch_mem_perms enum.
- create ib_to_tpt_access_rights() for mapping ib access rights
to T3 TPT permissions.
- create ib_to_mwbind_access_rights() for mapping ib access rights
to T3 MWBIND WR permissions.
- fix up the mem reg code to utilize the new functions.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Only print one AE error for a given connection in the kernel log.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Stop the endpoint timer when the MPA exchange is aborted by the peer.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Change iwch_destroy_qp() to always move the QP to ERROR and let
iwch_modify_qp() decide what to do.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fixes for "normal close" failures:
- Start normal close timer when moving to CLOSING state.
- Handle ABORTING state in close_con_rpl().
- Stop timer correctly on abort during a normal close.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
cxgb3 uses dma_alloc_coherent() et al. thus needs linux/dma-mapping.h
include in order to build reliably.
Noticed on sparc64.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The struct rdma_bind_list fields for hlist are not being initialized,
resulting in a corrupted list. Fix this by using kzalloc() to make
sure all pointers are NULL.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If the consumer rejects the connection we end up under-referencing the
endpoint structure. The fix is to call iwch_ep_disconnect() instead
of the low level disconnect functions so that the endpoint close timer
is started correctly.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The garbled logic in mthca_alloc_memfree() causes it to return 0, even
if it fails to allocate all doorbell records. Fix it to return -ENOMEM
when it fails.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch fixes two issues reported by Roland Dreier and Christoph Hellwig:
- Mismatched sync/locking between completion handler and destroy cq We
introduced a counter nr_events per cq to track number of irq events
seen. This counter is incremented when an event queue entry is seen
and decremented after completion handler has been called regardless
if scaling code is active or not. Note that nr_callbacks tracks
number of events assigned to a cpu and both counters can potentially
diverge.
The sync between running completion handler and destroy cq is done
by using the global spin lock ehca_cq_idr_lock.
- Replace yield by wait_event on the counter above to become zero.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
An asynchronous event carries the port number that the event occurred
on, so there's no reason for an IPoIB interface to process an event
associated with a different local HCA port.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If path_rec_completion() is passed a non-NULL path record pointer
along with an unsuccessful status value, the tracing code incorrectly
prints the (invalid) DLID from the path record rather than the more
interesting status code. The actual logic of the function correctly
uses the path record only if the status indicates a successful lookup.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Stop the ep timer in ec_status() if the status indicates a
bad close.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
- don't mark static functions in C files as inline - gcc should know
best whether inlining makes sense
- never compile the unused cxio_dbg.c
- make the following needlessly global functions static:
- cxio_hal.c: cxio_hal_clear_qp_ctx()
- iwch_provider.c: iwch_get_qp()
- remove the following unused global functions:
- cxio_hal.c: cxio_allocate_stag()
- cxio_resource.: cxio_hal_get_rhdl()
- cxio_resource.: cxio_hal_put_rhdl()
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The cm_device references an ib_device, which already contains the node_guid.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The rdma_cm requires that path records be reversible. Set the
reversible bit when issuing an path record query.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The hop_limit value in the ah_attr should be 0xFF, not the value read
from the received GRH (which should be 0). See 13.5.4.4 in the 1.2 IB
spec.
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
If no matching PD is found in ib_uverbs_reg_mr(), then the function
jumps to err_release without setting the return value ret. This means
that ret will hold the return value of the call to ib_umem_get() a few
lines earlier; if the function reaches the point where it looks for
the PD, we know that ib_umem_get() must have returned 0, so
ib_uverbs_reg_mr() ends up return 0 for a bad PD ID. Fix this by
setting ret to -EINVAL before jumping to the exit path when no PD is
found.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Now that low-level drivers handle the conversion from an absolute rate
to a relative rate, there's no need for the IPoIB driver to keep track
of the local port's data rate.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Avoid the overhead of freeing/reallocating and mapping/unmapping for
DMA pages that have not been written to by hardware.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
This patch makes the needlessly global functions mthca_tavor_write_mtt_seg()
and mthca_arbel_write_mtt_seg() static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/bunk/trivial: (25 commits)
Documentation/kernel-docs.txt update.
arch/cris: typo in KERN_INFO
Storage class should be before const qualifier
kernel/printk.c: comment fix
update I/O sched Kconfig help texts - CFQ is now default, not AS.
Remove duplicate listing of Cris arch from README
kbuild: more doc. cleanups
doc: make doc. for maxcpus= more visible
drivers/net/eexpress.c: remove duplicate comment
add a help text for BLK_DEV_GENERIC
correct a dead URL in the IP_MULTICAST help text
fix the BAYCOM_SER_HDX help text
fix SCSI_SCAN_ASYNC help text
trivial documentation patch for platform.txt
Fix typos concerning hierarchy
Fix comment typo "spin_lock_irqrestore".
Fix misspellings of "agressive".
drivers/scsi/a100u2w.c: trivial typo patch
Correct trivial typo in log2.h.
Remove useless FIND_FIRST_BIT() macro from cardbus.c.
...
Correct mis-spellings of "algorithm", "appear", "consistent" and
(shame, shame) "kernel".
Signed-off-by: Robert P. J. Day <rpjday@mindspring.com>
Signed-off-by: Adrian Bunk <bunk@stusta.de>
The static rate from the path record should be put into the address
vector -- a long time ago the rate in the address attributes needed to
be a relative rate, which required more munging, but now that the
conversion from absolute to relative is done in the low-level driver,
it's easy for ib_init_ah_from_path() to put the absolute rate in.
Cc: Jason Gunthorpe <jgunthorpe@obsidianresearch.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Extend rdma_cm to support multicast communication. Multicast support
is added to the existing RDMA_PS_UDP port space, as well as a new
RDMA_PS_IPOIB port space. The latter port space allows joining the
multicast groups used by IPoIB, which enables offloading IPoIB traffic
to a separate QP. The port space determines the signature used in the
MGID when joining the group. The newly added RDMA_PS_IPOIB also
allows for unicast operations, similar to RDMA_PS_UDP.
Supporting the RDMA_PS_IPOIB requires changing how UD QPs are initialized,
since we can no longer assume that the qkey is constant. This requires
saving the Q_Key to use when attaching to a device, so that it is
available when creating the QP. The Q_Key information is exported to
the user through the existing rdma_init_qp_attr() interface.
Multicast support is also exported to userspace through the rdma_ucm.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The IB SA tracks multicast join/leave requests on a per port basis and
does not do any reference counting: if two users of the same port join
the same group, and one leaves that group, then the SA will remove the
port from the group even though there is one user who wants to stay a
member left. Therefore, in order to support multiple users of the
same multicast group from the same port, we need to perform reference
counting locally.
To do this, add an multicast submodule to ib_sa to perform reference
counting of multicast join/leave operations. Modify ib_ipoib (the
only in-kernel user of multicast) to use the new interface.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
ipoib_cm_alloc_rx_skb() might be called from IRQ context, so it must
use dev_kfree_skb_any(), not kfree_skb().
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Remove the Open Grid Computing copyright. It shouldn't be there.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For T3B devices, mark user QP in error once we transition
to TERMINATE.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
iwcm iw_cm_id destruction race condition fixes:
- iwcm_deref_id() always wakes up if there's another reference.
- clean up race condition in cm_work_handler().
- create static void free_cm_id() which deallocs the work entries and then
kfrees the cm_id memory. This reduces code replication.
- rem_ref() if this is the last reference -and- the IWCM owns freeing the
cm_id, then free it.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Tom Tucker <tom@opengridcomputing.com>
Acked-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Set the port phys state as returned from ehca_query_port() to LINK_UP.
ehca actually represents a logical HCA, whose phys/link state always
is LINK_UP.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Allow users to en/disable scaling code when loading ib_ehca module,
rather than requiring the module to be rebuilt to change the setting.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Fix a race condition in find_next_cpu_online() and some other locking
issues in ehca scaling code.
Signed-off-by: Hoang-Nam Nguyen <hnguyen@de.ibm.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The change to allow allocating ICM chunks from coherent memory did not
increment the count of sg entries properly, so a chunk that required
more than allocation would not be mapped properly by the HCA.
Fix this by adding the missing increment of chunk->nsg.
Signed-off-by: Roland Dreier <rolandd@cisco.com>
RESET->RESET is an allowed QP state transition, so mthca should handle
it correctly, by just returning success without involving the firmware.
Signed-off-by: Dotan Barak <dotanb@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
After Al Viro (finally) succeeded in removing the sched.h #include in module.h
recently, it makes sense again to remove other superfluous sched.h includes.
There are quite a lot of files which include it but don't actually need
anything defined in there. Presumably these includes were once needed for
macros that used to live in sched.h, but moved to other header files in the
course of cleaning it up.
To ease the pain, this time I did not fiddle with any header files and only
removed #includes from .c-files, which tend to cause less trouble.
Compile tested against 2.6.20-rc2 and 2.6.20-rc2-mm2 (with offsets) on alpha,
arm, i386, ia64, mips, powerpc, and x86_64 with allnoconfig, defconfig,
allmodconfig, and allyesconfig as well as a few randconfigs on x86_64 and all
configs in arch/arm/configs on arm. I also checked that no new warnings were
introduced by the patch (actually, some warnings are removed that were emitted
by unnecessarily included header files).
Signed-off-by: Tim Schmielau <tim@physik3.uni-rostock.de>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of master.kernel.org:/pub/scm/linux/kernel/git/roland/infiniband:
IB/mthca: Always fill MTTs from CPU
IB/mthca: Merge MR and FMR space on 64-bit systems
IB/mthca: Fix access to MTT and MPT tables on non-cache-coherent CPUs
IB/mthca: Give reserved MTTs a separate cache line
IB/mthca: Fix reserved MTTs calculation on mem-free HCAs
RDMA/cxgb3: Add driver for Chelsio T3 RNIC
IB: Remove redundant "_wq" from workqueue names
RDMA/cma: Increment port number after close to avoid re-use
IB/ehca: Fix memleak on module unloading
IB/mthca: Work around gcc bug on sparc64
IPoIB: Connected mode experimental support
IB/core: Use ARRAY_SIZE macro for mandatory_table
IB/mthca: Use correct structure size in call to memset()
Speed up memory registration by filling in MTTs directly when the CPU
can write directly to the whole table (all mem-free cards, and to
Tavor mode on 64-bit systems with the patch I posted earlier). This
reduces the number of FW commands needed to register an MR by at least
a factor of 2 and speeds up memory registration significantly.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
For Tavor, we currently reserve separate MPT and MTT space for FMRs to
avoid abusing the vmalloc space on 32 bit kernels. No such problem
exists on 64 bit kernels so let's not do it there.
This way we have a shared pool for MR and FMR resources, used on
demand. This will also make it possible to write MTTs for regular
regions directly from driver.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
We allocate the MTT table with alloc_pages() and then do pci_map_sg(),
so we must call pci_dma_sync_sg() after the CPU writes to the MTT
table. This works since the device will never write MTTs on mem-free
HCAs, once we get rid of the use of the WRITE_MTT firmware command.
This change is needed to make that work, and is an improvement for
now, since it gives FMRs a chance at working.
For MPTs, both the device and CPU might write there, so we must
allocate DMA coherent memory for these.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
MTTs are allocated in non-cache-coherent memory, so we must give
reserved MTTs their own cache line, to prevent both device and
CPU from writing into the same cache line at the same time.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
The reserved_mtts field has different meaning in Tavor and Arbel, so
we are wasting mtt entries on memfree. Fix the Arbel case to match
Tavor semantics.
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
Add an RDMA/iWARP driver for the Chelsio T3 1GbE and 10GbE adapters.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>