Commit graph

18 commits

Author SHA1 Message Date
Steve Wise
00f7ec36c9 RDMA/core: Add memory management extensions support
This patch adds support for the IB "base memory management extension"
(BMME) and the equivalent iWARP operations (which the iWARP verbs
mandates all devices must implement).  The new operations are:

 - Allocate an ib_mr for use in fast register work requests.

 - Allocate/free a physical buffer lists for use in fast register work
   requests.  This allows device drivers to allocate this memory as
   needed for use in posting send requests (eg via dma_alloc_coherent).

 - New send queue work requests:
   * send with remote invalidate
   * fast register memory region
   * local invalidate memory region
   * RDMA read with invalidate local memory region (iWARP only)

Consumer interface details:

 - A new device capability flag IB_DEVICE_MEM_MGT_EXTENSIONS is added
   to indicate device support for these features.

 - New send work request opcodes IB_WR_FAST_REG_MR, IB_WR_LOCAL_INV,
   IB_WR_RDMA_READ_WITH_INV are added.

 - A new consumer API function, ib_alloc_mr() is added to allocate
   fast register memory regions.

 - New consumer API functions, ib_alloc_fast_reg_page_list() and
   ib_free_fast_reg_page_list() are added to allocate and free
   device-specific memory for fast registration page lists.

 - A new consumer API function, ib_update_fast_reg_key(), is added to
   allow the key portion of the R_Key and L_Key of a fast registration
   MR to be updated.  Consumers call this if desired before posting
   a IB_WR_FAST_REG_MR work request.

Consumers can use this as follows:

 - MR is allocated with ib_alloc_mr().

 - Page list memory is allocated with ib_alloc_fast_reg_page_list().

 - MR R_Key/L_Key "key" field is updated with ib_update_fast_reg_key().

 - MR made VALID and bound to a specific page list via
   ib_post_send(IB_WR_FAST_REG_MR)

 - MR made INVALID via ib_post_send(IB_WR_LOCAL_INV),
   ib_post_send(IB_WR_RDMA_READ_WITH_INV) or an incoming send with
   invalidate operation.

 - MR is deallocated with ib_dereg_mr()

 - page lists dealloced via ib_free_fast_reg_page_list().

Applications can allocate a fast register MR once, and then can
repeatedly bind the MR to different physical block lists (PBLs) via
posting work requests to a send queue (SQ).  For each outstanding
MR-to-PBL binding in the SQ pipe, a fast_reg_page_list needs to be
allocated (the fast_reg_page_list is owned by the low-level driver
from the consumer posting a work request until the request completes).
Thus pipelining can be achieved while still allowing device-specific
page_list processing.

The 32-bit fast register memory key/STag is composed of a 24-bit index
and an 8-bit key.  The application can change the key each time it
fast registers thus allowing more control over the peer's use of the
key/STag (ie it can effectively be changed each time the rkey is
rebound to a page list).

Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-07-14 23:48:45 -07:00
Ralph Campbell
733d12813b IB/ipath: Fix error returned from ib_resize_cq if new size smaller than # entries
The gen2_basic tests check for the errno value when a CQ is resized
smaller than the number of outstanding completions queue on the CQ.
This patch changes ib_ipath to return EINVAL which is what ib_mthca
returns and what gen2_basic expects.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2008-01-25 14:15:28 -08:00
Ralph Campbell
fb74dacb0f IB/ipath: Fix offset returned to ibv_resize_cq()
The wrong offset was being returned to libipathverbs so that when
ibv_resize_cq() calls mmap(), it always fails.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-20 11:03:26 -08:00
Ralph Campbell
a6e7550d8f IB/ipath: Fix memory leak in ipath_resize_cq() if copy_to_user() fails
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Patrick Marchand Latifi <patrick.latifi@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-11-13 15:26:57 -08:00
Ralph Campbell
6cff2faaf1 IB/ipath: Optimize completion queue entry insertion and polling
The code to add an entry to the completion queue stored the QPN which is
needed for the user level verbs view of the completion queue entry but
the kernel struct ib_wc contains a pointer to the QP instead of a QPN.
When the kernel polled for a completion queue entry, the QPN was lookup
up and the QP pointer recovered. This patch stores the CQE differently
based on whether the CQ is a kernel CQ or a user CQ thus avoiding the
QPN to QP lookup overhead.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-10-09 20:52:23 -07:00
Ralph Campbell
4fc570bcbe IB/ipath: Add barrier before updating WC head in shared memory
Add a barrier to make sure the CPU doesn't reorder writes to memory,
since user programs can be polling on the head index update and the
entry should be written before that.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
John Gregor
87427da55b IB/ipath: Update copyright dates
Now that it's June, it's about time to update
the copyright notices of files that have changed.

Signed-off-by: John Gregor <john.gregor@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-07-09 20:12:26 -07:00
Roland Dreier
ed23a72778 IB: Return "maybe missed event" hint from ib_req_notify_cq()
The semantics defined by the InfiniBand specification say that
completion events are only generated when a completions is added to a
completion queue (CQ) after completion notification is requested.  In
other words, this means that the following race is possible:

	while (CQ is not empty)
		ib_poll_cq(CQ);
	// new completion is added after while loop is exited
	ib_req_notify_cq(CQ);
	// no event is generated for the existing completion

To close this race, the IB spec recommends doing another poll of the
CQ after requesting notification.

However, it is not always possible to arrange code this way (for
example, we have found that NAPI for IPoIB cannot poll after
requesting notification).  Also, some hardware (eg Mellanox HCAs)
actually will generate an event for completions added before the call
to ib_req_notify_cq() -- which is allowed by the spec, since there's
no way for any upper-layer consumer to know exactly when a completion
was really added -- so the extra poll of the CQ is just a waste.

Motivated by this, we add a new flag "IB_CQ_REPORT_MISSED_EVENTS" for
ib_req_notify_cq() so that it can return a hint about whether the a
completion may have been added before the request for notification.
The return value of ib_req_notify_cq() is extended so:

	 < 0	means an error occurred while requesting notification
	== 0	means notification was requested successfully, and if
		IB_CQ_REPORT_MISSED_EVENTS was passed in, then no
		events were missed and it is safe to wait for another
		event.
	 > 0	is only returned if IB_CQ_REPORT_MISSED_EVENTS was
		passed in.  It means that the consumer must poll the
		CQ again to make sure it is empty to avoid the race
		described above.

We add a flag to enable this behavior rather than turning it on
unconditionally, because checking for missed events may incur
significant overhead for some low-level drivers, and consumers that
don't care about the results of this test shouldn't be forced to pay
for the test.

Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Michael S. Tsirkin
f4fd0b224d IB: Add CQ comp_vector support
Add a num_comp_vectors member to struct ib_device and extend
ib_create_cq() to pass in a comp_vector parameter -- this parallels
the userspace libibverbs API.  Update all hardware drivers to set
num_comp_vectors to 1 and have all ULPs pass 0 for the comp_vector
value.  Pass the value of num_comp_vectors to userspace rather than
hard-coding a value of 1.

We want multiple CQ event vector support (via MSI-X or similar for
adapters that can generate multiple interrupts), but it's not clear
how many vectors we want, or how we want to deal with policy issues
such as how to decide which vector to use or how to set up interrupt
affinity.  This patch is useful for experimenting, since no core
changes will be necessary when updating a driver to support multiple
vectors, and we know that we want to make at least these changes
anyway.

Signed-off-by: Michael S. Tsirkin <mst@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-05-06 21:18:11 -07:00
Robert Walsh
6b66b2da1e IB/ipath: Don't corrupt pending mmap list when unmapped objects are freed
Fix the pending mmap code so it doesn't corrupt the list of pending
mmaps and crash the machine when pending mmaps are destroyed without
first being mapped.  Also, remove an unused variable, and use standard
kernel lists instead of our own homebrewed linked list implementation
to keep the pending mmap list.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-30 17:30:28 -07:00
Robert Walsh
40b90430ec IB/ipath: Fix WC format drift between user and kernel space
The kernel ib_wc structure now uses a QP pointer, but the user space
equivalent uses a QP number instead.  This means we can no longer use
a simple structure copy to copy stuff into user space.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2007-04-18 20:21:01 -07:00
Bryan O'Sullivan
7a26c47412 IB/ipath: Fix races with ib_resize_cq()
The resize CQ function changes the memory used to store the queue.
Other routines need to honor the lock before accessing the pointer
to the queue and verify that the head and tail are in range.

Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-28 11:17:12 -07:00
Bryan O'Sullivan
aa4eaed702 IB/ipath: Lock and count allocated CQs properly
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-28 11:16:34 -07:00
Bryan O'Sullivan
eae33d47a7 IB/ipath: do not allow use of CQ entries with invalid counts
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:34 -07:00
Ralph Campbell
373d991580 IB/ipath: Performance improvements via mmap of queues
Improve performance of userspace post receive, post SRQ receive, and
poll CQ operations for ipath by allowing userspace to directly mmap()
receive queues and completion queues.  This eliminates the copying
between userspace and the kernel in the data path.

Signed-off-by: Ralph Campbell <ralph.campbell@qlogic.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-09-22 15:22:26 -07:00
Bryan O'Sullivan
fe62546a6a [PATCH] IB/ipath: enforce device resource limits
These limits are somewhat artificial in that we don't actually have any
device limits.  However, the verbs layer expects that such limits exist
and are enforced, so we make up arbitrary (but sensible) limits.

Signed-off-by: Robert Walsh <robert.walsh@qlogic.com>
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:59 -07:00
Bryan O'Sullivan
759d57686d [PATCH] IB/ipath: update copyrights and other strings to reflect new company name
Signed-off-by: Bryan O'Sullivan <bryan.osullivan@qlogic.com>
Cc: "Michael S. Tsirkin" <mst@mellanox.co.il>
Cc: Roland Dreier <rolandd@cisco.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-07-01 09:55:58 -07:00
Bryan O'Sullivan
cef1cce5c8 IB/ipath: misc infiniband code, part 1
Completion queues, local and remote memory keys, and memory region
support.

Signed-off-by: Bryan O'Sullivan <bos@pathscale.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2006-03-31 13:14:20 -08:00