Commit Graph

3213 Commits (9390ef0c85fd065f01045fef708b046c98cda04c)

Author SHA1 Message Date
Linus Torvalds 5bd665f28d Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target updates from Nicholas Bellinger:
 "It has been a very busy development cycle this time around in target
  land, with the highlights including:

   - Kill struct se_subsystem_dev, in favor of direct se_device usage
     (hch)
   - Simplify reservations code by combining SPC-3 + SCSI-2 support for
     virtual backends only (hch)
   - Simplify ALUA code for virtual only backends, and remove left over
     abstractions (hch)
   - Pass sense_reason_t as return value for I/O submission path (hch)
   - Refactor MODE_SENSE emulation to allow for easier addition of new
     mode pages.  (roland)
   - Add emulation of MODE_SELECT (roland)
   - Fix bug in handling of ExpStatSN wrap-around (steve)
   - Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
   - Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
   - Convert ib_srpt to use modern target_submit_cmd caller + drop
     legacy ioctx->kref usage (nab)
   - Convert ib_srpt to use modern target_submit_tmr caller (nab)
   - Add link_magic for fabric allow_link destination target_items for
     symlinks within target_core_fabric_configfs.c code (nab)
   - Allocate pointers in instead of full structs for
     config_group->default_groups (sebastian)
   - Fix 32-bit highmem breakage for FILEIO (sebastian)

  All told, hch was able to shave off another ~1K LOC by killing the
  se_subsystem_dev abstraction, along with a number of PR + ALUA
  simplifications.  Also, a nice patch by Roland is the refactoring of
  MODE_SENSE handling, along with the addition of initial MODE_SELECT
  emulation support for virtual backends.

  Sebastian found a long-standing issue wrt to allocation of full
  config_group instead of pointers for config_group->default_group[]
  setup in a number of areas, which ends up saving memory with big
  configurations.  He also managed to fix another long-standing BUG wrt
  to broken 32-bit highmem support within the FILEIO backend driver.

  Thank you again to everyone who contributed this round!"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi_target: Add NodeACL tags for initiator group support
  target/tcm_fc: fix the lockdep warning due to inconsistent lock state
  sbp-target: fix error path in sbp_make_tpg()
  sbp-target: use simple assignment in tgt_agent_rw_agent_state()
  iscsi-target: use kstrdup() for iscsi_param
  target/file: merge fd_do_readv() and fd_do_writev()
  target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping
  target: Add link_magic for fabric allow_link destination target_items
  ib_srpt: Convert TMR path to target_submit_tmr
  ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
  target: Make spc_get_write_same_sectors return sector_t
  target/configfs: use kmalloc() instead of kzalloc() for default groups
  target/configfs: allocate only 6 slots for dev_cg->default_groups
  target/configfs: allocate pointers instead of full struct for default_groups
  target: update error handling for sbc_setup_write_same()
  iscsit: use GFP_ATOMIC under spin lock
  iscsi_target: Remove redundant null check before kfree
  target/iblock: Forward declare bio helpers
  target: Clean up flow in transport_check_aborted_status()
  target: Clean up logic in transport_put_cmd()
  ...
2012-12-15 14:25:10 -08:00
Roland Dreier 01e0336598 Merge branch 'nes' into for-next 2012-12-08 20:45:48 -08:00
Tatyana Nikolova 7d9c199a55 RDMA/nes: Fix for crash when registering zero length MR for CQ
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-08 00:31:03 -08:00
Tatyana Nikolova 7bfcfa51c3 RDMA/nes: Fix for terminate timer crash
The terminate timer needs to be initialized just once.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-08 00:31:02 -08:00
Tatyana Nikolova 00ad255d17 RDMA/nes: Fix for BUG_ON due to adding already-pending timer
To avoid nes tcp_timer crash for SMP architectures, add_timer is
replaced with mod_timer.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-08 00:31:02 -08:00
Roland Dreier fb57e1dbbd Merge branch 'srp' into for-next 2012-11-30 17:41:04 -08:00
Bart Van Assche dc1bdbd9b8 IB/srp: Allow SRP disconnect through sysfs
Make it possible to disconnect the IB RC connection used by the SRP
protocol to communicate with a target.

Have the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:33 -08:00
Vu Pham 55d93898a1 IB/srp: send disconnect request without waiting for CM timewait exit
Now that SRP recreates the CM ID, QP, and CQ for each connection,
there is no need to wait for the timewait state to complete.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:32 -08:00
Ishai Rabinovitz 73aa89ed9e IB/srp: destroy and recreate QP and CQs when reconnecting
HW QP FATAL errors persist over a reset operation, but we can recover
from that by recreating the QP and associated CQs for each connection.
Creating a new QP/CQ also completely forecloses any possibility of
getting stale completions or packets on the new connection.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>

[ updated to current code from OFED, cleaned up commit message ]

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:32 -08:00
Bart Van Assche ef6c49d87c IB/srp: Eliminate state SRP_TARGET_DEAD
Only queue removal work after having changed the target state
into SRP_TARGET_REMOVED and not if that state was already equal
to SRP_TARGET_REMOVED.  That allows us to remove the state
SRP_TARGET_DEAD.  Add a call to srp_disconnect_target() in
srp_remove_target() -- due to previous changes it is now safe to
invoke that function even if the IB connection has already
been disconnected.  This change allows us to replace the target
removal code in srp_remove_one() by an (indirect) call to
srp_remove_target().  Rename srp_target_port.work into
srp_target_port.remove_work to reflect its usage.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche ee12d6a80c IB/srp: Introduce the helper function srp_remove_target()
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche 294c875a65 IB/srp: Suppress superfluous error messages
Keep track of the connection state.  Only report QP errors while
connected.  Only invoke ib_send_cm_dreq() when connected so that
invoking srp_disconnect_target() after having received a DREQ does not
cause an error message to be printed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche 4f0af69799 IB/srp: Process all error completions
If the RDMA RC connection is closed, tell the SCSI mid-layer to
terminate all pending commands instead of only the first.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche 948d1e889e IB/srp: Introduce srp_handle_qp_err()
Introduce the function srp_handle_qp_err(), change the type of
qp_in_error from int into bool and move the initialization of that
variable from srp_reconnect_target() to srp_connect_target().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche 224db15743 IB/srp: Simplify SCSI error handling
Since scsi_remove_host() has been modified so that SCSI error handling
functions will no longer be invoked after scsi_remove_host() returns,
the test at the start of srp_send_tsk_mgmt() is now superfluous.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche f371823120 IB/srp: Keep processing commands during host removal
Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from
inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()).
Make sure that these commands have a chance to reach the SCSI device.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche 09be70a238 IB/srp: Eliminate state SRP_TARGET_CONNECTING
Block the SCSI host while reconnecting instead of representing the
reconnection activity as a distinct SRP target state.  This allows us
to eliminate the target state SRP_TARGET_CONNECTING.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:29 -08:00
Bart Van Assche c9b03c1ae5 IB/srp: Increase block layer timeout
Increase the block layer timeout for disks so that it is above the
InfiniBand transport layer timeout.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:29 -08:00
Roland Dreier 78c90247af Merge branches 'cma' and 'mlx4' into for-next 2012-11-29 12:16:43 -08:00
shefty 63f05be2c0 RDMA/cm: Change return value from find_gid_port()
Problem reported by Dan Carpenter <dan.carpenter@oracle.com>:

The patch 3c86aa70bf67: "RDMA/cm: Add RDMA CM support for IBoE
devices" from Oct 13, 2010, leads to the following warning:
net/sunrpc/xprtrdma/svc_rdma_transport.c:722 svc_rdma_create()
	 error: passing non neg 1 to ERR_PTR

This bug would result in a NULL dereference.  svc_rdma_create() is
supposed to return ERR_PTRs or valid pointers, but instead it returns
ERR_PTRs, valid pointers and 1.

The call tree is:

svc_rdma_create()
   => rdma_bind_addr()
      => cma_acquire_dev()
         => find_gid_port()

rdma_bind_addr() should return a valid errno.  Fix this by having
find_gid_port() also return a valid errno.  If we can't find the
specified GID on a given port, return -EADDRNOTAVAIL, rather than
-EAGAIN, to better indicate the error.  We also drop using the
special return value of '1' and instead pass through the error
returned by the underlying verbs call.  On such errors, rather
than aborting the search,  we simply continue to check the next
device/port.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-29 12:16:29 -08:00
Jack Morgenstein ceb7decb36 IB/mlx4: Fix spinlock order to avoid lockdep warnings
lockdep warns about taking a hard-irq-unsafe lock (sriov->id_map_lock)
inside a hard-irq-safe lock (sriov->going_down_lock).

Since id_map_lock is never taken in the interrupt context, we can
simply reverse the order of taking the two spinlocks, thus avoiding
the warning and the depencency.

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-29 12:14:45 -08:00
Nicholas Bellinger 3e4f574857 ib_srpt: Convert TMR path to target_submit_tmr
This patch converts the TMR path in srpt_handle_tsk_mgmt() to use
target_submit_tmr() with TARGET_SCF_ACK_KREF flag usage.

v2: Drop ununused res in target_submit_tmr (Fengguang.Wu)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-28 11:20:28 -08:00
Nicholas Bellinger 9474b04313 ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
This patch converts the main srpt_handle_cmd() I/O path to use modern
target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage.  This includes
dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage
in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt
response callbacks.

It also updates srpt_abort_cmd() to call target_put_sess_cmd() for
completion of aborted commands, and adds target_wait_for_sess_cmds() into
srpt_release_channel_work() to allow outstanding I/O to complete during
session shutdown.

Also, go ahead and update srpt_handle_tsk_mgmt() to make the remaining
transport_init_se_cmd() to setup the ioctx->cmd with se_tmr_req.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-28 01:10:59 -08:00
Roland Dreier d0951d2133 Merge branches 'cxgb4', 'misc', 'mlx4', 'nes' and 'uapi' into for-next 2012-11-26 11:08:41 -08:00
Julia Lawall 5107c2a3d1 RDMA/cxgb3: use WARN
Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-26 11:08:16 -08:00
Julia Lawall 76f267b7ad RDMA/cxgb4: use WARN
Use WARN rather than printk followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-26 11:07:18 -08:00
Or Gerlitz 08ff32352d mlx4: 64-byte CQE/EQE support
ConnectX-3 devices can use either 64- or 32-byte completion queue
entries (CQEs) and event queue entries (EQEs).  Using 64-byte
EQEs/CQEs performs better because each entry is aligned to a complete
cacheline.  This patch queries the HCA's capabilities, and if it
supports 64-byte CQEs and EQES the driver will configure the HW to
work in 64-byte mode.

The 32-byte vs 64-byte mode is global per HCA and not per CQ or EQ.

Since this mode is global, userspace (libmlx4) must be updated to work
with the configured CQE size, and guests using SR-IOV virtual
functions need to know both EQE and CQE size.

In case one of the 64-byte CQE/EQE capabilities is activated, the
patch makes sure that older guest drivers that use the QUERY_DEV_FUNC
command (e.g as done in mlx4_core of Linux 3.3..3.6) will notice that
they need an update to be able to work with the PPF. This is done by
changing the returned pf_context_behaviour not to be zero any more. In
case none of these capabilities is activated that value remains zero
and older guest drivers can run OK.

The SRIOV related flow is as follows

1. the PPF does the detection of the new capabilities using
   QUERY_DEV_CAP command.

2. the PPF activates the new capabilities using INIT_HCA.

3. the VF detects if the PPF activated the capabilities using
   QUERY_HCA, and if this is the case activates them for itself too.

Note that the VF detects that it must be aware to the new PF behaviour
using QUERY_FUNC_CAP.  Steps 1 and 2 apply also for native mode.

User space notification is done through a new field introduced in
struct mlx4_ib_ucontext which holds device capabilities for which user
space must take action. This changes the binary interface so the ABI
towards libmlx4 exposed through uverbs is bumped from 3 to 4 but only
when **needed** i.e. only when the driver does use 64-byte CQEs or
future device capabilities which must be in sync by user space. This
practice allows to work with unmodified libmlx4 on older devices (e.g
A0, B0) which don't support 64-byte CQEs.

In order to keep existing systems functional when they update to a
newer kernel that contains these changes in VF and userspace ABI, a
module parameter enable_64b_cqe_eqe must be set to enable 64-byte
mode; the default is currently false.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-26 10:19:17 -08:00
Alan Cox c9795bd708 RDMA/amsol1100: Fix missing break
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-22 00:52:55 -08:00
Alan Cox 5390f86796 IB/ipath: Remove unreachable code
Signed-off-by: Alan Cox <alan@linux.intel.com>
Acked-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-22 00:50:51 -08:00
Julia Lawall 079abea6a3 RDMA/nes: Use WARN()
Use WARN() rather than printk() followed by WARN_ON(1), for conciseness.

A simplified version of the semantic patch that makes this transformation
is as follows: (http://coccinelle.lip6.fr/)

// <smpl>
@@
expression list es;
@@

-printk(
+WARN(1,
  es);
-WARN_ON(1);
// </smpl>

Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>

[ Remove extra KERN_ERR from WARN() format.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-22 00:49:15 -08:00
Tatyana Nikolova cecdcd5f24 RDMA/nes: Fix for incorrect multicast address in the perfect filter table
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-22 00:49:14 -08:00
Tatyana Nikolova fc8d7547b1 RDMA/nes: Fix for sending fpdus in order to hardware
Locking fix to prevent race conditions. Fpdus (per qp) need to be
forwarded to hardware in the order of their sequence numbers.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Donald Wood <Donald.E.Wood@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-22 00:49:14 -08:00
Tatyana Nikolova bff3976bef RDMA/nes: Fix for unlinking skbs from empty list
Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-22 00:49:13 -08:00
Tatyana Nikolova 3127e4ea54 RDMA/nes: Fix incorrect address of IP header
Fix for incorrect ip header address when forwarding fpdus to hardware.

Signed-off-by: Tatyana Nikolova <Tatyana.E.Nikolova@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-22 00:49:13 -08:00
Christoph Hellwig de103c93af target: pass sense_reason as a return value
Pass the sense reason as an explicit return value from the I/O submission
path instead of storing it in struct se_cmd and using negative return
values.  This cleans up a lot of the code pathes, and with the sparse
annotations for the new sense_reason_t type allows for much better
error checking.

(nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use
      sense_reason_t with Roland's MODE SELECT changes)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-06 20:55:46 -08:00
Roland Dreier 1e3474d1de Merge branches 'cxgb4' and 'mlx4' into for-next 2012-10-23 09:03:49 -07:00
Thadeu Lima de Souza Cascardo 32c631f9f2 RDMA/cxgb4: Don't free chunk that we have failed to allocate
In the error path of registering memory when there's a failure to
allocate a chunk from the memory pool, we try to free the same chunk
we just failed to allocate, which will BUG().

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-22 11:05:00 -07:00
Eli Cohen bef83ed92c IB/mlx4: Synchronize cleanup of MCGs in MCG paravirtualization
A client re-register event invokes cleanup of all MCGs.  This is
required to protect against misbehaved guests leading to corruption of
join/leave database.  However, since cleaning up the MCGs is a heavy
operation, it is pushed to a work queue for further processing.
Client re-register is also propagated to ULPs (e.g IPoIB).

However, since the cleanup is performed in a workqueue, the ULP could
leave and re-join groups before the cleanup occurs.  In this case,
when the cleanup takes place, it prunes the (newly-joined) MCGs and
the ULP is left without actual MCGs while believing it joined them.

Fix this by setting the flushing flag before invoking the cleanup task
and clearing it after flushing is complete.

Signed-off-by: Eli Cohen <eli@mellanox.com>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-18 10:29:02 -07:00
Jack Morgenstein 2c75d2ccb6 IB/mlx4: Fix QP1 P_Key processing in the Primary Physical Function (PPF)
In the MAD paravirtualization code, one of the checks performed when
forwarding QP1 (GSI) packets from wire to slave was a P_Key check: the
P_Key received in the MAD must be present in the guest's paravirtualized
P_Key table, and at least one of the (packet P_Key, guest P_Key) must
be a full-membership P_Key.

However, if everyone involved has only limited membership in the
default P_Key, then packets sent by full-member remote hosts arrive at
the PPF but are not passed on to the VFs with the current P_Key1 check.

Fix this as follows:

1. Don't care if P_Key received over wire is full or not. If it
   successfully passed HW checks on the real QP1, then simply pass it
   to guest regardless of whether the guest has full or limited
   membership in its P_Key table.

2. If the guest (including paravirtualized master) has both full and
   limited P_Key forms in its table, preferentially pass the
   paravirtualized P_Key index of the full P_Key form in the tunnel
   header.

3. In the multicast join flow (mlx4/mcg.c), use the index for the
   default P_Key (wherever it is located) in replies generated from
   within the mcg module (previously, P_Key index 0 was used in all
   cases).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-18 10:29:02 -07:00
Doug Ledford 8a095030f7 IB/mlx4: Fix build error on platforms where UL is not 64 bits
Line 110 uses UL as a compiler cast for the 0x constant, but it's not
large enough to hold a 64-bit value on a 32-bit arch.

Signed-off-by: Doug Ledford <dledford@redhat.com>

[ Use "-1" instead of "FFFFFFFFFFFFFFFFULL".  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-18 10:29:01 -07:00
Linus Torvalds a188e7e93a Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull scsi target updates from Nicholas Bellinger:
 "Things have been calm for the most part with no new fabric drivers in
  flight for v3.7 (we're up to eight now !), so this update is primarily
  focused on addressing a few long-standing items within target-core and
  iscsi-target fabric code.

  The highlights include:

   - target: Simplify fabric sense data length handling (roland)
   - qla2xxx: Fix endianness of task management response code (roland)
   - target: fix truncation of mode data, support zero allocation length
     (paolo)
   - target: Properly support zero-length commands in normal processing
     path (paolo)
   - iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT
     PDU (ronnie + nab)
   - iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG
     demo-mode (ronnie + nab)
   - target/file: Re-enable optional fd_buffered_io=1 operation (nab +
     hch)
   - iscsi-target: Add MaxXmitDataSegmenthLength forr target ->
     initiator MDRSL declaration (nab)
   - target: Add target_submit_cmd_map_sgls for SGL fabric memory
     passthrough (nab + hch)
   - tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch +
     nab)
   - tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab
     + hch)

  The last series for adding a new target_submit_cmd_map_sgls() fabric
  caller (as requested by hch) that accepts pre-allocated SGL memory
  (using existing logic), along with converting tcm_loop + tcm_vhost has
  only been in -next for the last days, but has gotten enough review
  +testing and is clear enough a mechanical change that I think it's
  reasonable to merge for -rc1 code.

  Thanks again to everyone who contributed this round! Extra special
  thanks to Roland (PureStorage) for tracking down the qla2xxx target
  TMR response code endian issue, and to Paolo (Redhat) for resolving
  the long standing zero-length CDB issues within target-core between
  virtual and pSCSI backends."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits)
  iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values
  iscsit: proper endianess conversions
  iscsit: use the itt_t abstract type
  iscsit: add missing endianess conversion in iscsit_check_inaddr_any
  iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp
  iscsit: mark various functions static
  target/iscsi: precedence bug in iscsit_set_dataout_sequence_values()
  target/usb-gadget: strlen() doesn't count the terminator
  target/usb-gadget: remove duplicate initialization
  tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls
  target: Add control CDB READ payload zero work-around
  tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls
  target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough
  iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode
  iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength
  iscsi-target: Add MaxXmitDataSegmentLength connection recovery check
  iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength
  iscsi-target: Enable MaxXmitDataSegmentLength operation in login path
  iscsi-target: Add base MaxXmitDataSegmentLength code
  target/file: Re-enable optional fd_buffered_io=1 operation
  ...
2012-10-10 19:52:19 +09:00
David S. Miller 8dd9117cc7 Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux
Pulled mainline in order to get the UAPI infrastructure already
merged before I pull in David Howells's UAPI trees for networking.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-09 13:14:32 -04:00
Konstantin Khlebnikov 314e51b985 mm: kill vma flag VM_RESERVED and mm->reserved_vm counter
A long time ago, in v2.4, VM_RESERVED kept swapout process off VMA,
currently it lost original meaning but still has some effects:

 | effect                 | alternative flags
-+------------------------+---------------------------------------------
1| account as reserved_vm | VM_IO
2| skip in core dump      | VM_IO, VM_DONTDUMP
3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP

This patch removes reserved_vm counter from mm_struct.  Seems like nobody
cares about it, it does not exported into userspace directly, it only
reduces total_vm showed in proc.

Thus VM_RESERVED can be replaced with VM_IO or pair VM_DONTEXPAND | VM_DONTDUMP.

remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
remap_vmalloc_range() set VM_DONTEXPAND | VM_DONTDUMP.

[akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Carsten Otte <cotte@de.ibm.com>
Cc: Chris Metcalf <cmetcalf@tilera.com>
Cc: Cyrill Gorcunov <gorcunov@openvz.org>
Cc: Eric Paris <eparis@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: Jason Baron <jbaron@redhat.com>
Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
Cc: Matt Helsley <matthltc@us.ibm.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Robert Richter <robert.richter@amd.com>
Cc: Suresh Siddha <suresh.b.siddha@intel.com>
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Venkatesh Pallipadi <venki@google.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-09 16:22:19 +09:00
Linus Torvalds ca4da6948b Second batch of changes for the 3.7 merge window:
- Late-breaking fix for IPoIB on mlx4 SR-IOV VFs.
  - Fix for IPoIB build breakage with CONFIG_INFINIBAND_IPOIB_CM=n
    (new netlink config changes are to blame).
  - Make sure retry count values are in range in RDMA CM.
  - A few nes hardware driver fixes and cleanups.
  - Have iSER initiator use >1 interrupt vectors if available.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQbkIkAAoJEENa44ZhAt0hrLEP/3jNpwR+6/rbVxcPkmVVITsn
 yUz9S3V7loJqQqlHt7ktphgcoUpar+xs2C8DbCyVupyee4K3zd7Lw+wOVO2OJ16Y
 +gE8PZB6mSYOVaI0JW26Qqe2gcQ5DLZ4ic6E8gEiuyjNo3ig3MZl17ZffTI+5lXi
 igR+Qnsy2bm18JI6aumSeKYa17f7EmOK63jaIp0jc95vrtWke3jRRs6+bmmDLV7q
 4ZxCP7ckBTRFVkisXlWZ95MmUbg/lhWosZGdJwjRYs+LW3tzgY18OQZgbw6FJKpJ
 0OouAO51UMKcizfg7ikwUIw6/apaCFXXv/pH3rIeq18qiJeKcDsIXseY7sFRboHR
 fIPpfhHENokcxtlOReaw7hnlyMuwrLiYbF6TjPN9t8dWTNbwe+4W0K9G60fu1wmk
 qjaUTBCCuDF4KtKqCxP01cZzXyjJrworw5p+Nc1wxds+JeyKRCpVRrfYep8vOgft
 z6KJjQGDF16MvXmhF1dD9VBpmaMCOA4NDhfa0IcgHPTOLc/7Nbe5NTD95InlzM3Q
 /C0CPj+4J4kzEMnTBHVJvv1NeD3G1RQD80bce6ViTbgOUg9pmeGHkLWe420h7xT4
 u/LBnwM1dcqY7BDOnn3ycYoJgr5OCQZOpLcNWjTXv4t51mBOYtM6AJ6IBIvkOmSz
 QjEyiDEEvT2K7c/rhYOz
 =9Jv4
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband changes from Roland Dreier:
  "Second batch of changes for the 3.7 merge window:
   - Late-breaking fix for IPoIB on mlx4 SR-IOV VFs.
   - Fix for IPoIB build breakage with CONFIG_INFINIBAND_IPOIB_CM=n (new
     netlink config changes are to blame).
   - Make sure retry count values are in range in RDMA CM.
   - A few nes hardware driver fixes and cleanups.
   - Have iSER initiator use >1 interrupt vectors if available."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cma: Check that retry count values are in range
  IB/iser: Add more RX CQs to scale out processing of SCSI responses
  RDMA/nes: Bump the version number of nes driver
  RDMA/nes: Remove unused module parameter "send_first"
  RDMA/nes: Remove unnecessary if-else statement
  RDMA/nes: Add missing break to switch.
  mlx4_core: Adjust flow steering attach wrapper so that IB works on SR-IOV VFs
  IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n
2012-10-07 17:19:49 +09:00
Gao feng 809d5fc9bf infiniband: pass rdma_cm module to netlink_dump_start
set netlink_dump_control.module to avoid panic.

Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-07 00:30:56 -04:00
Linus Torvalds 5f3d2f2e1a Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc
Pull powerpc updates from Benjamin Herrenschmidt:
 "Some highlights in addition to the usual batch of fixes:

   - 64TB address space support for 64-bit processes by Aneesh Kumar

   - Gavin Shan did a major cleanup & re-organization of our EEH support
     code (IBM fancy PCI error handling & recovery infrastructure) which
     paves the way for supporting different platform backends, along
     with some rework of the PCIe code for the PowerNV platform in order
     to remove home made resource allocations and instead use the
     generic code (which is possible after some small improvements to it
     done by Gavin).

   - Uprobes support by Ananth N Mavinakayanahalli

   - A pile of embedded updates from Freescale folks, including new SoC
     and board supports, more KVM stuff including preparing for 64-bit
     BookE KVM support, ePAPR 1.1 updates, etc..."

Fixup trivial conflicts in drivers/scsi/ipr.c

* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
  powerpc/iommu: Fix multiple issues with IOMMU pools code
  powerpc: Fix VMX fix for memcpy case
  driver/mtd:IFC NAND:Initialise internal SRAM before any write
  powerpc/fsl-pci: use 'Header Type' to identify PCIE mode
  powerpc/eeh: Don't release eeh_mutex in eeh_phb_pe_get
  powerpc: Remove tlb batching hack for nighthawk
  powerpc: Set paca->data_offset = 0 for boot cpu
  powerpc/perf: Sample only if SIAR-Valid bit is set in P7+
  powerpc/fsl-pci: fix warning when CONFIG_SWIOTLB is disabled
  powerpc/mpc85xx: Update interrupt handling for IFC controller
  powerpc/85xx: Enable USB support in p1023rds_defconfig
  powerpc/smp: Do not disable IPI interrupts during suspend
  powerpc/eeh: Fix crash on converting OF node to edev
  powerpc/eeh: Lock module while handling EEH event
  powerpc/kprobe: Don't emulate store when kprobe stwu r1
  powerpc/kprobe: Complete kprobe and migrate exception frame
  powerpc/kprobe: Introduce a new thread flag
  powerpc: Remove unused __get_user64() and __put_user64()
  powerpc/eeh: Global mutex to protect PE tree
  powerpc/eeh: Remove EEH PE for normal PCI hotplug
  ...
2012-10-06 03:16:12 +09:00
Fengguang Wu 125c4c706b idr: rename MAX_LEVEL to MAX_IDR_LEVEL
To avoid name conflicts:

  drivers/video/riva/fbdev.c:281:9: sparse: preprocessor token MAX_LEVEL redefined

While at it, also make the other names more consistent and add
parentheses.

[akpm@linux-foundation.org: repair fallout]
[sfr@canb.auug.org.au: IB/mlx4: fix for MAX_ID_MASK to MAX_IDR_MASK name change]
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
Cc: walter harms <wharms@bfs.de>
Cc: Glauber Costa <glommer@parallels.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2012-10-06 03:04:56 +09:00
Roland Dreier 56040f07b2 Merge branches 'cma', 'ipoib', 'iser', 'mlx4' and 'nes' into for-next 2012-10-04 19:12:22 -07:00
Sean Hefty 4ede178a5e RDMA/cma: Check that retry count values are in range
The retry_count and rnr_retry_count connection parameters are both
3-bit values.  Check that the values are in range and reduce if
they're not.

This fixes a problem reported by Doug Ledford <dledford@redhat.com>
that resulted in the userspace rping test (part of the librdmacm
samples) failing to run over Intel IB HCAs.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>

[ Use min_t() to avoid warnings about type mismatch.  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-04 19:11:54 -07:00
Alex Tabachnik 5a33a66949 IB/iser: Add more RX CQs to scale out processing of SCSI responses
RX/TX CQs will now be selected from a per HCA pool.  For the RX flow
this has the effect of using different interrupt vectors when using
low level drivers (such as mlx4) that map the "vector" param provided
by the ULP on CQ creation to a dedicated IRQ/MSI-X vector.  This
allows the RX flow processing of IO responses to be distributed across
multiple CPUs.

QPs (--> iSER sessions) are assigned to CQs in round robin order using
the CQ with the minimum number of sessions attached to it.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-03 21:26:49 -07:00