Commit graph

244 commits

Author SHA1 Message Date
James Smart
573e591353 [SCSI] scsi_lib: pause between error retries
During cable pull tests on our 16G FC adapter, we are seeing errors,
typically reads to close targets, which fail due to CRC or framing
errors caused by the cable being pull (return status DID_ERROR).
The adapter detects the error on one of the first frames received,
marks the FC exchange as dead (further frames go to bit bucket) and
signals the host of the error. This action is so quick, and coupled
with fast host CPUs, creates a scenario in which the midlayer sees
the failure and retries the io almost immediately. We've seen link
traces with the retry on the link while the original i/o is still
being processed by the target. We're also seeing the time window
for the "link to pull-apart" and the physical interface to report
disconnected to be in the few millisecond range. Which means, we're
encountering scenarios where the full retry count is exhausted
(all with error) by the midlayer before the link disconnect state
is detected.

We looked at 8G FC behavior and occasionally see the same behavior,
but as the link was slower, it rarely could exhaust all retries
before the link reported disconnect.

What is needed is a slight delay between io retries due to DID_ERROR
to cover this error.  It is inappropriate to put this delay in the
driver, as the error is indistinguishable from other link-related errors,
nor does the driver track whether the io is a retry or not. This is also
easier than tracking between-io-error bursts that are seen in this
scenario.

The patch below updates the retry path so that it inserts a delay as
if the target was busy.  The busy delay is on the order of 6ms. This
delay is sufficient to ensure the link down condition is reported
before the retry count is exhausted (at most 1 retry is seen).

Signed-off-by: Alex Iannicelli <alex.iannicelli@emulex.com>
Signed-off-by: James Smart <james.smart@emulex.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-07-27 14:06:01 +04:00
James Bottomley
bfe159a512 [SCSI] fix crash in scsi_dispatch_cmd()
USB surprise removal of sr is triggering an oops in
scsi_dispatch_command().  What seems to be happening is that USB is
hanging on to a queue reference until the last close of the upper
device, so the crash is caused by surprise remove of a mounted CD
followed by attempted unmount.

The problem is that USB doesn't issue its final commands as part of
the SCSI teardown path, but on last close when the block queue is long
gone.  The long term fix is probably to make sr do the teardown in the
same way as sd (so remove all the lower bits on ejection, but keep the
upper disk alive until last close of user space).  However, the
current oops can be simply fixed by not allowing any commands to be
sent to a dead queue.

Cc: stable@kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-07-21 14:21:18 -07:00
Linus Torvalds
a2b9c1f620 Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  block: don't delay blk_run_queue_async
  scsi: remove performance regression due to async queue run
  blk-throttle: Use task_subsys_state() to determine a task's blkio_cgroup
  block: rescan partitions on invalidated devices on -ENOMEDIA too
  cdrom: always check_disk_change() on open
  block: unexport DISK_EVENT_MEDIA_CHANGE for legacy/fringe drivers
2011-05-18 06:49:02 -07:00
Jens Axboe
9937a5e2f3 scsi: remove performance regression due to async queue run
Commit c21e6beb removed our queue request_fn re-enter
protection, and defaulted to always running the queues from
kblockd to be safe. This was a known potential slow down,
but should be safe.

Unfortunately this is causing big performance regressions for
some, so we need to improve this logic. Looking into the details
of the re-enter, the real issue is on requeue of requests.

Requeue of requests upon seeing a BUSY condition from the device
ends up re-running the queue, causing traces like this:

scsi_request_fn()
        scsi_dispatch_cmd()
                scsi_queue_insert()
                        __scsi_queue_insert()
                                scsi_run_queue()
					scsi_request_fn()
						...

potentially causing the issue we want to avoid. So special
case the requeue re-run of the queue, but improve it to offload
the entire run of local queue and starved queue from a single
workqueue callback. This is a lot better than potentially
kicking off a workqueue run for each device seen.

This also fixes the issue of the local device going into recursion,
since the above mentioned commit never moved that queue run out
of line.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-05-17 11:04:44 +02:00
James Bottomley
c055f5b261 [SCSI] fix oops in scsi_run_queue()
The recent commit closing the race window in device teardown:

commit 86cbfb5607
Author: James Bottomley <James.Bottomley@suse.de>
Date:   Fri Apr 22 10:39:59 2011 -0500

    [SCSI] put stricter guards on queue dead checks

is causing a potential NULL deref in scsi_run_queue() because the
q->queuedata may already be NULL by the time this function is called.
Since we shouldn't be running a queue that is being torn down, simply
add a NULL check in scsi_run_queue() to forestall this.

Tested-by: Jim Schutt <jaschut@sandia.gov>
Cc: stable@kernel.org
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-05-03 15:30:00 -05:00
Jens Axboe
c21e6beba8 block: get rid of QUEUE_FLAG_REENTER
We are currently using this flag to check whether it's safe
to call into ->request_fn(). If it is set, we punt to kblockd.
But we get a lot of false positives and excessive punts to
kblockd, which hurts performance.

The only real abuser of this infrastructure is SCSI. So export
the async queue run and convert SCSI over to use that. There's
room for improvement in that SCSI need not always use the async
call, but this fixes our performance issue and they can fix that
up in due time.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-19 13:32:46 +02:00
Christoph Hellwig
24ecfbe27f block: add blk_run_queue_async
Instead of overloading __blk_run_queue to force an offload to kblockd
add a new blk_run_queue_async helper to do it explicitly.  I've kept
the blk_queue_stopped check for now, but I suspect it's not needed
as the check we do when the workqueue items runs should be enough.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-04-18 11:41:33 +02:00
Linus Torvalds
6c51038900 Merge branch 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.39/core' of git://git.kernel.dk/linux-2.6-block: (65 commits)
  Documentation/iostats.txt: bit-size reference etc.
  cfq-iosched: removing unnecessary think time checking
  cfq-iosched: Don't clear queue stats when preempt.
  blk-throttle: Reset group slice when limits are changed
  blk-cgroup: Only give unaccounted_time under debug
  cfq-iosched: Don't set active queue in preempt
  block: fix non-atomic access to genhd inflight structures
  block: attempt to merge with existing requests on plug flush
  block: NULL dereference on error path in __blkdev_get()
  cfq-iosched: Don't update group weights when on service tree
  fs: assign sb->s_bdi to default_backing_dev_info if the bdi is going away
  block: Require subsystems to explicitly allocate bio_set integrity mempool
  jbd2: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  jbd: finish conversion from WRITE_SYNC_PLUG to WRITE_SYNC and explicit plugging
  fs: make fsync_buffers_list() plug
  mm: make generic_writepages() use plugging
  blk-cgroup: Add unaccounted time to timeslice_used.
  block: fixup plugging stubs for !CONFIG_BLOCK
  block: remove obsolete comments for blkdev_issue_zeroout.
  blktrace: Use rq->cmd_flags directly in blk_add_trace_rq.
  ...

Fix up conflicts in fs/{aio.c,super.c}
2011-03-24 10:16:26 -07:00
Linus Torvalds
c55d267de2 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (170 commits)
  [SCSI] scsi_dh_rdac: Add MD36xxf into device list
  [SCSI] scsi_debug: add consecutive medium errors
  [SCSI] libsas: fix ata list corruption issue
  [SCSI] hpsa: export resettable host attribute
  [SCSI] hpsa: move device attributes to avoid forward declarations
  [SCSI] scsi_debug: Logical Block Provisioning (SBC3r26)
  [SCSI] sd: Logical Block Provisioning update
  [SCSI] Include protection operation in SCSI command trace
  [SCSI] hpsa: fix incorrect PCI IDs and add two new ones (2nd try)
  [SCSI] target: Fix volume size misreporting for volumes > 2TB
  [SCSI] bnx2fc: Broadcom FCoE offload driver
  [SCSI] fcoe: fix broken fcoe interface reset
  [SCSI] fcoe: precedence bug in fcoe_filter_frames()
  [SCSI] libfcoe: Remove stale fcoe-netdev entries
  [SCSI] libfcoe: Move FCOE_MTU definition from fcoe.h to libfcoe.h
  [SCSI] libfc: introduce __fc_fill_fc_hdr that accepts fc_hdr as an argument
  [SCSI] fcoe, libfc: initialize EM anchors list and then update npiv EMs
  [SCSI] Revert "[SCSI] libfc: fix exchange being deleted when the abort itself is timed out"
  [SCSI] libfc: Fixing a memory leak when destroying an interface
  [SCSI] megaraid_sas: Version and Changelog update
  ...

Fix up trivial conflicts due to whitespace differences in
drivers/scsi/libsas/{sas_ata.c,sas_scsi_host.c}
2011-03-17 17:54:40 -07:00
Martin K. Petersen
c98a0eb0e9 [SCSI] sd: Logical Block Provisioning update
SBC3r26 contains many changes to the Logical Block Provisioning
interfaces (formerly known as Thin Provisioning ditto). This patch
implements support for both the old and new schemes using the same
heuristic as before (whether the LBP VPD page is present).

The new code also allows the provisioning mode (i.e. choice of command)
to be overridden on a per-device basis via sysfs. Two additional modes
are supported in this version:

 - WRITE SAME(10) with the UNMAP bit set

 - WRITE SAME(10) without the UNMAP bit set. This allows us to support
   devices that predate the TP/LBP enhancements in SBC3 and which work
   by way zero-detection

Switching between modes has been consolidated in a helper function that
also updates the block layer topology according to the limitations of
the chosen command.

I experimented with trying WRITE SAME(16) if UNMAP fails, WRITE SAME(10)
if WRITE SAME(16) fails, etc. but found several devices that got
cranky. So for now we'll disable discard if one of the commands
fail. The user still has the option of selecting a different mode in
sysfs.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-14 18:37:34 -05:00
Martin K. Petersen
72f7d322fd [SCSI] Include protection operation in SCSI command trace
When debugging DIF/DIX it is very helpful to be able to see which DIX
operation is associated with the scsi_cmnd. Include the protection op in
the SCSI command trace.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-03-14 18:36:02 -05:00
Jens Axboe
4c63f5646e Merge branch 'for-2.6.39/stack-plug' into for-2.6.39/core
Conflicts:
	block/blk-core.c
	block/blk-flush.c
	drivers/md/raid1.c
	drivers/md/raid10.c
	drivers/md/raid5.c
	fs/nilfs2/btnode.c
	fs/nilfs2/mdt.c

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:58:35 +01:00
Jens Axboe
a488e74976 scsi: convert to blk_delay_queue()
It was always abuse to reuse the plugging infrastructure for this,
convert it to the (new) real API for delaying queueing a bit. A
default delay of 3 msec is defined, to match the previous
behaviour.

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-10 08:45:54 +01:00
Tejun Heo
1654e7411a block: add @force_kblockd to __blk_run_queue()
__blk_run_queue() automatically either calls q->request_fn() directly
or schedules kblockd depending on whether the function is recursed.
blk-flush implementation needs to be able to explicitly choose
kblockd.  Add @force_kblockd.

All the current users are converted to specify %false for the
parameter and this patch doesn't introduce any behavior change.

stable: This is prerequisite for fixing ide oops caused by the new
        blk-flush implementation.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jan Beulich <JBeulich@novell.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2011-03-02 08:48:05 -05:00
Hannes Reinecke
63583cca74 [SCSI] Add detailed SCSI I/O errors
Instead of just passing 'EIO' for any I/O error we should be
notifying the upper layers with more details about the cause
of this error.

Update the possible I/O errors to:

- ENOLINK: Link failure between host and target
- EIO: Retryable I/O error
- EREMOTEIO: Non-retryable I/O error
- EBADE: I/O error restricted to the I_T_L nexus

'Retryable' in this context means that an I/O error _might_ be
restricted to the I_T_L nexus (vulgo: path), so retrying on another
nexus / path might succeed.

'Non-retryable' in general refers to a target failure, so this
error will always be generated regardless of the I_T_L nexus
it was send on.

I/O errors restricted to the I_T_L nexus might be retried
on another nexus / path, but they should _not_ be queued
if no paths are available.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-12 10:33:08 -06:00
Linus Torvalds
275220f0fc Merge branch 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.38/core' of git://git.kernel.dk/linux-2.6-block: (43 commits)
  block: ensure that completion error gets properly traced
  blktrace: add missing probe argument to block_bio_complete
  block cfq: don't use atomic_t for cfq_group
  block cfq: don't use atomic_t for cfq_queue
  block: trace event block fix unassigned field
  block: add internal hd part table references
  block: fix accounting bug on cross partition merges
  kref: add kref_test_and_get
  bio-integrity: mark kintegrityd_wq highpri and CPU intensive
  block: make kblockd_workqueue smarter
  Revert "sd: implement sd_check_events()"
  block: Clean up exit_io_context() source code.
  Fix compile warnings due to missing removal of a 'ret' variable
  fs/block: type signature of major_to_index(int) to major_to_index(unsigned)
  block: convert !IS_ERR(p) && p to !IS_ERR_NOR_NULL(p)
  cfq-iosched: don't check cfqg in choose_service_tree()
  fs/splice: Pull buf->ops->confirm() from splice_from_pipe actors
  cdrom: export cdrom_check_events()
  sd: implement sd_check_events()
  sr: implement sr_check_events()
  ...
2011-01-13 10:45:01 -08:00
Linus Torvalds
da40d036fd Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (147 commits)
  [SCSI] arcmsr: fix write to device check
  [SCSI] lpfc: lower stack use in lpfc_fc_frame_check
  [SCSI] eliminate an unnecessary local variable from scsi_remove_target()
  [SCSI] libiscsi: use bh locking instead of irq with session lock
  [SCSI] libiscsi: do not take host lock in queuecommand
  [SCSI] be2iscsi: fix null ptr when accessing task hdr
  [SCSI] be2iscsi: fix gfp use in alloc_pdu
  [SCSI] libiscsi: add more informative failure message during iscsi scsi eh
  [SCSI] gdth: Add missing call to gdth_ioctl_free
  [SCSI] bfa: remove unused defintions and misc cleanups
  [SCSI] bfa: remove inactive functions
  [SCSI] bfa: replace bfa_assert with WARN_ON
  [SCSI] qla2xxx: Use sg_next to fetch next sg element while walking sg list.
  [SCSI] qla2xxx: Fix to avoid recursive lock failure during BSG timeout.
  [SCSI] qla2xxx: Remove code to not reset ISP82xx on failure.
  [SCSI] qla2xxx: Display mailbox register 4 during 8012 AEN for ISP82XX parts.
  [SCSI] qla2xxx: Don't perform a BIG_HAMMER if Get-ID (0x20) mailbox command fails on CNAs.
  [SCSI] qla2xxx: Remove redundant module parameter permission bits
  [SCSI] qla2xxx: Add sysfs node for displaying board temperature.
  [SCSI] qla2xxx: Code cleanup to remove unwanted comments and code.
  ...
2011-01-07 12:47:02 -08:00
Hillf Danton
fd01a6632d [SCSI] fix the return value of scsi_target_queue_read()
It seems that zero should be returned if scsi_target_is_busy(starget) is
true, no matter if sdev is on the starved list.

Signed-off-by: Hillf Danton <dhillf@gmail.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-12-21 12:37:28 -06:00
Linus Torvalds
7f8635cc9e Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block
* 'for-linus' of git://git.kernel.dk/linux-2.6-block:
  cciss: fix cciss_revalidate panic
  block: max hardware sectors limit wrapper
  block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
  blk-throttle: Correct the placement of smp_rmb()
  blk-throttle: Trim/adjust slice_end once a bio has been dispatched
  block: check for proper length of iov entries earlier in blk_rq_map_user_iov()
  drbd: fix for spin_lock_irqsave in endio callback
  drbd: don't recvmsg with zero length
2010-12-20 09:19:46 -08:00
Martin K. Petersen
e692cb668f block: Deprecate QUEUE_FLAG_CLUSTER and use queue_limits instead
When stacking devices, a request_queue is not always available. This
forced us to have a no_cluster flag in the queue_limits that could be
used as a carrier until the request_queue had been set up for a
metadevice.

There were several problems with that approach. First of all it was up
to the stacking device to remember to set queue flag after stacking had
completed. Also, the queue flag and the queue limits had to be kept in
sync at all times. We got that wrong, which could lead to us issuing
commands that went beyond the max scatterlist limit set by the driver.

The proper fix is to avoid having two flags for tracking the same thing.
We deprecate QUEUE_FLAG_CLUSTER and use the queue limit directly in the
block layer merging functions. The queue_limit 'no_cluster' is turned
into 'cluster' to avoid double negatives and to ease stacking.
Clustering defaults to being enabled as before. The queue flag logic is
removed from the stacking function, and explicitly setting the cluster
flag is no longer necessary in DM and MD.

Reported-by: Ed Lin <ed.lin@promise.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@kernel.org
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-17 08:35:53 +01:00
Tejun Heo
9f8a2c23c6 scsi: replace sr_test_unit_ready() with scsi_test_unit_ready()
The usage of TUR has been confusing involving several different
commits updating different parts over time.  Currently, the only
differences between scsi_test_unit_ready() and sr_test_unit_ready()
are,

* scsi_test_unit_ready() also sets sdev->changed on NOT_READY.

* scsi_test_unit_ready() returns 0 if TUR ended with UNIT_ATTENTION or
  NOT_READY.

Due to the above two differences, sr is using its own
sr_test_unit_ready(), but sd - the sole user of the above extra
handling - doesn't even need them.

Where scsi_test_unit_ready() is used in sd_media_changed(), the code
is looking for device ready w/ media present state which is true iff
TUR succeeds w/o sense data or UA, and when the device is not ready
for whatever reason sd_media_changed() explicitly marks media as
missing so there's no reason to set sdev->changed automatically from
scsi_test_unit_ready() on NOT_READY.

Drop both special handlings from scsi_test_unit_ready(), which makes
it equivalant to sr_test_unit_ready(), and replace
sr_test_unit_ready() with scsi_test_unit_ready().  Also, drop the
unnecessary explicit NOT_READY check from sd_media_changed().
Checking return value is enough for testing device readiness.

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-12-16 17:53:39 +01:00
James Bottomley
459dbf72e4 [SCSI] Eliminate error handler overload of the SCSI serial number
The error handler is using the test cmd->serial_number == 0 in the
abort routines to signal that the command to be aborted has already
completed normally.  This design was to close a race window in the
original error handler where a command could go through the normal
completion routines after it timed out but before error handling was
started.

Mike Anderson pointed out that when we converted our timeout and
softirq completions, we picked up atomicity here because the block
layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees
that *either* the command times out or our done routine is called, but
ensures we can't get both occurring.  That makes the serial number
zero check redundant and it can be removed.

Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-12-09 09:41:16 -06:00
Mike Christie
986fe6c7f5 [SCSI] Fix regressions in scsi_internal_device_block
Deleting a SCSI device on a blocked fc_remote_port (before
fast_io_fail_tmo fires) results in a hanging thread:

  STACK:
  0 schedule+1108 [0x5cac48]
  1 schedule_timeout+528 [0x5cb7fc]
  2 wait_for_common+266 [0x5ca6be]
  3 blk_execute_rq+160 [0x354054]
  4 scsi_execute+324 [0x3b7ef4]
  5 scsi_execute_req+162 [0x3b80ca]
  6 sd_sync_cache+138 [0x3cf662]
  7 sd_shutdown+138 [0x3cf91a]
  8 sd_remove+112 [0x3cfe4c]
  9 __device_release_driver+124 [0x3a08b8]
10 device_release_driver+60 [0x3a0a5c]
11 bus_remove_device+266 [0x39fa76]
12 device_del+340 [0x39d818]
13 __scsi_remove_device+204 [0x3bcc48]
14 scsi_remove_device+66 [0x3bcc8e]
15 sysfs_schedule_callback_work+50 [0x260d66]
16 worker_thread+622 [0x162326]
17 kthread+160 [0x1680b0]
18 kernel_thread_starter+6 [0x10aaea]

During the delete, the SCSI device is in moved to SDEV_CANCEL.  When
the FC transport class later calls scsi_target_unblock, this has no
effect, since scsi_internal_device_unblock ignores SCSI devics in this
state.

It looks like all these are regressions caused by:
5c10e63c94
[SCSI] limit state transitions in scsi_internal_device_unblock

Fix by rejecting offline and cancel in the state transition.

Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
[jejb: Original patch by Christof Schmitt, modified by Mike Christie]
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-10-25 09:48:32 -05:00
Linus Torvalds
e9dd2b6837 Merge branch 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.37/core' of git://git.kernel.dk/linux-2.6-block: (39 commits)
  cfq-iosched: Fix a gcc 4.5 warning and put some comments
  block: Turn bvec_k{un,}map_irq() into static inline functions
  block: fix accounting bug on cross partition merges
  block: Make the integrity mapped property a bio flag
  block: Fix double free in blk_integrity_unregister
  block: Ensure physical block size is unsigned int
  blkio-throttle: Fix possible multiplication overflow in iops calculations
  blkio-throttle: limit max iops value to UINT_MAX
  blkio-throttle: There is no need to convert jiffies to milli seconds
  blkio-throttle: Fix link failure failure on i386
  blkio: Recalculate the throttled bio dispatch time upon throttle limit change
  blkio: Add root group to td->tg_list
  blkio: deletion of a cgroup was causes oops
  blkio: Do not export throttle files if CONFIG_BLK_DEV_THROTTLING=n
  block: set the bounce_pfn to the actual DMA limit rather than to max memory
  block: revert bad fix for memory hotplug causing bounces
  Fix compile error in blk-exec.c for !CONFIG_DETECT_HUNG_TASK
  block: set the bounce_pfn to the actual DMA limit rather than to max memory
  block: Prevent hang_check firing during long I/O
  cfq: improve fsync performance for small files
  ...

Fix up trivial conflicts due to __rcu sparse annotation in include/linux/genhd.h
2010-10-22 17:00:32 -07:00
Martin K. Petersen
13f05c8d8e block/scsi: Provide a limit on the number of integrity segments
Some controllers have a hardware limit on the number of protection
information scatter-gather list segments they can handle.

Introduce a max_integrity_segments limit in the block layer and provide
a new scsi_host_template setting that allows HBA drivers to provide a
value suitable for the hardware.

Add support for honoring the integrity segment limit when merging both
bios and requests.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <axboe@carl.home.kernel.dk>
2010-09-10 20:50:10 +02:00
James Bottomley
3a5c19c23d [SCSI] fix use-after-free in scsi_init_io()
we're using a pointer through a freed command to reset the request,
which has shown up as an oops with slab poisoning:

Reported-by: Tejun Heo <tj@kernel.org>
Reported-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-09-09 09:58:18 -05:00
Bartlomiej Zolnierkiewicz
d6e9fb46cd scsi: remove superfluous NULL pointer check from scsi_kill_request()
Dan's list included:

drivers/scsi/scsi_lib.c +1365 scsi_kill_request(9) warning: variable derefenced in initializer 'cmd'
drivers/scsi/scsi_lib.c +1365 scsi_kill_request(9) warning: variable derefenced before check 'cmd'

We dereference cmd (and possible OOPS if cmd == NULL) before starting the
request so just remove the superfluous debugging code altogether.

[ bart: the potential NULL pointer dereference was finally fixed in
  (much later than mine) commit 03b1470 but my patch is still valid ]

Reported-by: Dan Carpenter <error27@gmail.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Eugene Teo <eteo@redhat.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-08-11 08:59:00 -07:00
FUJITA Tomonori
610a63498f scsi: fix discard page leak
We leak a page allocated for discard on some error conditions
(e.g. scsi_prep_state_check returns BLKPREP_DEFER in
scsi_setup_blk_pc_cmnd).

We unprep on requests that weren't prepped in the error path of
scsi_init_io. It makes the error path to clean up scsi commands messy.

Let's strictly apply the rule that we can't unprep on a request that
wasn't prepped.

Calling just scsi_put_command() in the error path of scsi_init_io() is
enough. We don't set REQ_DONTPREP yet.

scsi_setup_discard_cmnd can safely free a page on the error case with
the above rule.

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:24:28 +02:00
James Bottomley
28018c242a block: implement an unprep function corresponding directly to prep
Reviewed-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>

Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:23:47 +02:00
Christoph Hellwig
33659ebbae block: remove wrappers for request type/flags
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests.  This allows much easier grepping for different request
types instead of unwinding through macros.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
2010-08-07 18:17:56 +02:00
Linus Torvalds
b1bf936840 Merge branch 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.34' of git://git.kernel.dk/linux-2.6-block: (38 commits)
  block: don't access jiffies when initialising io_context
  cfq: remove 8 bytes of padding from cfq_rb_root on 64 bit builds
  block: fix for "Consolidate phys_segment and hw_segment limits"
  cfq-iosched: quantum check tweak
  blktrace: perform cleanup after setup error
  blkdev: fix merge_bvec_fn return value checks
  cfq-iosched: requests "in flight" vs "in driver" clarification
  cciss: Fix problem with scatter gather elements in the scsi half of the driver
  cciss: eliminate unnecessary pointer use in cciss scsi code
  cciss: do not use void pointer for scsi hba data
  cciss: factor out scatter gather chain block mapping code
  cciss: fix scatter gather chain block dma direction kludge
  cciss: simplify scatter gather code
  cciss: factor out scatter gather chain block allocation and freeing
  cciss: detect bad alignment of scsi commands at build time
  cciss: clarify command list padding calculation
  cfq-iosched: rethink seeky detection for SSDs
  cfq-iosched: rework seeky detection
  block: remove padding from io_context on 64bit builds
  block: Consolidate phys_segment and hw_segment limits
  ...
2010-03-01 09:00:29 -08:00
Martin K. Petersen
8a78362c4e block: Consolidate phys_segment and hw_segment limits
Except for SCSI no device drivers distinguish between physical and
hardware segment limits.  Consolidate the two into a single segment
limit.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-26 13:58:08 +01:00
Martin K. Petersen
086fa5ff08 block: Rename blk_queue_max_sectors to blk_queue_max_hw_sectors
The block layer calling convention is blk_queue_<limit name>.
blk_queue_max_sectors predates this practice, leading to some confusion.
Rename the function to appropriately reflect that its intended use is to
set max_hw_sectors.

Also introduce a temporary wrapper for backwards compability.  This can
be removed after the merge window is closed.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2010-02-26 13:58:08 +01:00
Douglas Gilbert
e7efe5932b [SCSI] skip sense logging for some ATA PASS-THROUGH cdbs
Further to the lsml thread titled:
"does scsi_io_completion need to dump sense data for ata pass through (ck_cond =
1) ?"

This is a patch to skip logging when the sense data is
associated with a SENSE_KEY of "RECOVERED_ERROR" and the
additional sense code is "ATA PASS-THROUGH INFORMATION
AVAILABLE". This only occurs with the SAT ATA PASS-THROUGH
commands when CK_COND=1 (in the cdb). It indicates that
the sense data contains ATA registers.

Smartmontools uses such commands on ATA disks connected via
SAT. Periodic checks such as those done by smartd cause
nuisance entries into logs that are:
    - neither errors nor warnings
    - pointless unless the cdb that caused them are also logged

Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-01-18 10:48:16 -06:00
Boaz Harrosh
63c43b0ec1 [SCSI] scsi_lib: Fix bug in completion of bidi commands
Because of the terrible structuring of scsi-bidi-commands
it breaks some of the life time rules of a scsi-command.
It is now not allowed to free up the block-request before
cleanup and partial deallocation of the scsi-command. (Which
is not so for none bidi commands)

The right fix to this problem would be to make bidi command
a first citizen by allocating a scsi_sdb pointer at scsi command
just like cmd->prot_sdb. The bidi sdb should be allocated/deallocated
as part of the get/put_command (Again like the prot_sdb) and the
current decoupling of scsi_cmnd and blk-request should be kept.

For now make sure scsi_release_buffers() is called before the
call to blk_end_request_all() which might cause the suicide of
the block requests. At best the leak of bidi buffers, at worse
a crash, as there is a race between the existence of the bidi_request
and the free of the associated bidi_sdb.

The reason this was never hit before is because only OSD has the potential
of doing asynchronous bidi commands. (So does bsg but it is never used)
And OSD clients just happen to do all their bidi commands synchronously, up
until recently.

CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2010-01-17 12:16:18 -06:00
Martin K. Petersen
d8705f11d8 [SCSI] Correctly handle thin provisioning write error
A thin provisioned device may temporarily be out of sufficient
allocation units to fulfill a write request.  In that case it will
return a space allocation in progress error.  Wait a bit and retry the
write.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-10 08:54:15 -06:00
Jiri Slaby
03b147083a [SCSI] scsi_lib: fix potential NULL dereference
Stanse found a potential NULL dereference in scsi_kill_request.

Instead of triggering BUG() in 'if (unlikely(cmd == NULL))' branch,
the kernel will Oops earlier on cmd dereference.

Move the dereferences after the if.

Signed-off-by: Jiri Slaby <jirislaby@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-12-04 12:00:51 -06:00
Mike Christie
ad63082626 [SCSI] fix propogation of integrity errors
When the Integrity check is done in scsi_io_completion it will
set error to -EILSEQ. However, at this point error is no longer
used, and blk_end_request_err has -EIO hardcoded.

It looks like there was just porting mistake with this patch
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=3e695f89c5debb735e4ff051e9e58d8fb4e95110
and we meant to send error upwards, so this patch changes the hard
coded EIO to the error variable.

I have only boot tested this patch.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-10-29 13:03:27 -04:00
Linus Torvalds
355bbd8cb8 Merge branch 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block
* 'for-2.6.32' of git://git.kernel.dk/linux-2.6-block: (29 commits)
  block: use blkdev_issue_discard in blk_ioctl_discard
  Make DISCARD_BARRIER and DISCARD_NOBARRIER writes instead of reads
  block: don't assume device has a request list backing in nr_requests store
  block: Optimal I/O limit wrapper
  cfq: choose a new next_req when a request is dispatched
  Seperate read and write statistics of in_flight requests
  aoe: end barrier bios with EOPNOTSUPP
  block: trace bio queueing trial only when it occurs
  block: enable rq CPU completion affinity by default
  cfq: fix the log message after dispatched a request
  block: use printk_once
  cciss: memory leak in cciss_init_one()
  splice: update mtime and atime on files
  block: make blk_iopoll_prep_sched() follow normal 0/1 return convention
  cfq-iosched: get rid of must_alloc flag
  block: use interrupts disabled version of raise_softirq_irqoff()
  block: fix comment in blk-iopoll.c
  block: adjust default budget for blk-iopoll
  block: fix long lines in block/blk-iopoll.c
  block: add blk-iopoll, a NAPI like approach for block devices
  ...
2009-09-14 17:55:15 -07:00
Tejun Heo
da6c5c720c scsi,block: update SCSI to handle mixed merge failures
Update scsi_io_completion() such that it only fails requests till the
next error boundary and retry the leftover.  This enables block layer
to merge requests with different failfast settings and still behave
correctly on errors.  Allow merge of requests of different failfast
settings.

As SCSI is currently the only subsystem which follows failfast status,
there's no need to worry about other block drivers for now.

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Niel Lambrechts <niel.lambrechts@gmail.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-09-11 14:33:30 +02:00
Martin K. Petersen
002b1eb2c0 [SCSI] Print failed commands
When a request fails we print the sense data but not the actual command
that failed.  Add a printout of the operation + CDB for failed commands.

Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2009-08-22 17:51:51 -05:00
Hannes Reinecke
b391277a56 sd, sr: fix Driver 'sd' needs updating message
If a SCSI ULD driver sets blk_queue_prep_rq(), it should clean it
up itself on remove(), and not from the bus callbacks. This
removes the need to hook into bus->remove(), which should not
be used at the same time as driver->remove().

[jejb: fix sdkp initialisation problem due to mismerge]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-21 12:01:27 -05:00
James Bottomley
82681a318f [SCSI] Merge branch 'linus'
Conflicts:
	drivers/message/fusion/mptsas.c

fixed up conflict between req->data_len accessors and mptsas driver updates.

Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-06-12 10:02:03 -05:00
Takahiro Yasui
5c10e63c94 [SCSI] limit state transitions in scsi_internal_device_unblock
scsi timeout on two or more devices may cause extremely long execution
time for user applications because SDEV_OFFLINE state is changed to
SDEV_RUNNING state during scsi error recovery procedures triggered by
a bus reset or a host reset of scsi LLD, and scsi timeout can happens
on the same devices many times.

This happens because scsi_internal_device_unblock() changes device's
state to SDEV_RUNNING even if a device in other states than SDEV_BLOCK,
while the following two transitions are required in this function.

  SDEV_BLOCK -> SDEV_RUNNING
  SDEV_CREATED_BLOCK -> SDEV_CREATED

Otherwise, it returns -EINVAL.

Signed-off-by: Takahiro Yasui <tyasui@redhat.com>
[matthew@wil.cx: supplied rewritten base for patch]
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
2009-05-23 15:44:05 -05:00
Jens Axboe
e4b636366c Merge branch 'master' into for-2.6.31
Conflicts:
	drivers/block/hd.c
	drivers/block/mg_disk.c

Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-22 20:25:34 +02:00
Boaz Harrosh
ac36552a52 scsi_lib: remove unused variable
The last request completion cleanup in scsi_lib left an unused
this_count variable in scsi_io_completion().
(It was used before in a code segment that now uses blk_end_request_all())

Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-19 19:54:09 +02:00
Tejun Heo
e458824f9d scsi: fix resid_len mis-conversion in scsi_end_request()
Commit c3a4d78c58 introduced
rq->data_len and converted residual count users to it.  While
converting, it mistakenly converted scsi_end_request() to finish
requests with residual count when it wants to do is fully complete the
request.  Fix it by using blk_end_request_all() instead.

This bug was spotted by Boaz Harrosh.

Signed-off-by: Tejun Heo <tj@kernel.org>
Spotted-by: Boaz Harrosh <bharrosh@panasas.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-12 08:49:32 +02:00
FUJITA Tomonori
e6bb7a96c2 scsi: simplify the bidi completion
Let's use blk_end_request_all() instead of blk_end_bidi_request().

Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11 11:06:47 +02:00
Tejun Heo
9934c8c045 block: implement and enforce request peek/start/fetch
Till now block layer allowed two separate modes of request execution.
A request is always acquired from the request queue via
elv_next_request().  After that, drivers are free to either dequeue it
or process it without dequeueing.  Dequeue allows elv_next_request()
to return the next request so that multiple requests can be in flight.

Executing requests without dequeueing has its merits mostly in
allowing drivers for simpler devices which can't do sg to deal with
segments only without considering request boundary.  However, the
benefit this brings is dubious and declining while the cost of the API
ambiguity is increasing.  Segment based drivers are usually for very
old or limited devices and as converting to dequeueing model isn't
difficult, it doesn't justify the API overhead it puts on block layer
and its more modern users.

Previous patches converted all block low level drivers to dequeueing
model.  This patch completes the API transition by...

* renaming elv_next_request() to blk_peek_request()

* renaming blkdev_dequeue_request() to blk_start_request()

* adding blk_fetch_request() which is combination of peek and start

* disallowing completion of queued (not started) requests

* applying new API to all LLDs

Renamings are for consistency and to break out of tree code so that
it's apparent that out of tree drivers need updating.

[ Impact: block request issue API cleanup, no functional change ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Mike Miller <mike.miller@hp.com>
Cc: unsik Kim <donari75@gmail.com>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Tim Waugh <tim@cyberelk.net>
Cc: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Laurent Vivier <Laurent@lvivier.info>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jeremy Fitzhardinge <jeremy@xensource.com>
Cc: Grant Likely <grant.likely@secretlab.ca>
Cc: Adrian McMenamin <adrian@mcmen.demon.co.uk>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Cc: Borislav Petkov <petkovbb@googlemail.com>
Cc: Sergei Shtylyov <sshtylyov@ru.mvista.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Pierre Ossman <drzeus@drzeus.cx>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: Stefan Weinhuber <wein@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11 09:52:18 +02:00
Tejun Heo
1011c1b9f2 block: blk_rq_[cur_]_{sectors|bytes}() usage cleanup
With the previous changes, the followings are now guaranteed for all
requests in any valid state.

* blk_rq_sectors() == blk_rq_bytes() >> 9
* blk_rq_cur_sectors() == blk_rq_cur_bytes() >> 9

Clean up accessor usages.  Notable changes are

* nbd,i2o_block: end_all used instead of explicit byte count
* scsi_lib: unnecessary conditional on request type removed

[ Impact: cleanup ]

Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Paul Clements <paul.clements@steeleye.com>
Cc: Pete Zaitcev <zaitcev@redhat.com>
Cc: Alex Dubov <oakad@yahoo.com>
Cc: Markus Lidel <Markus.Lidel@shadowconnect.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
2009-05-11 09:50:55 +02:00