Commit graph

395 commits

Author SHA1 Message Date
Dan Williams
4e646ddd5f [SCSI] isci: use sas eh strategy handlers
...now that the strategy handlers guarantee eh context and notify
the driver of bus reset.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2012-07-20 08:58:51 +01:00
Jeff Skirvin
de2eb4d5c5 isci: End the RNC resumption wait when the RNC is destroyed.
While the RNC is suspended for I/O cleanup, the remote device can be
stopped and the RNC set up for destruction.  These changes accommodate that
case in the abort path.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:44 -07:00
Jeff Skirvin
6c6aacbb77 isci: Fixed RNC bug that lost the suspension or resumption during destroy
This fix corrects the saving of resume parameters when the destruction
of the RNC has already been directed, and makes sure not to overwrite
the RNC destruction callbacks.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:44 -07:00
Jeff Skirvin
79cbab89ff isci: Fix RNC AWAIT_SUSPENSION->INVALIDATING transition.
The RNC state machine would incorrectly transition from
SCI_RNC_AWAIT_SUSPENSION directly to SCI_RNC_INVALIDATING when a destruct
request was made.  This would skip the increment of the suspension count
and the abort of pending TCs (although the invalidating state would at
least clean up outstanding TCs).

Instead, the RNC will transition to SCI_RNC_SUSPENDED and then start the
destruction process.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:44 -07:00
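As a rough illustration of the transition fix described in the commit above, here is a toy C model of a destruct request that arrives while the RNC is still awaiting suspension. The state names only mirror the ones in the message; `struct rnc`, `destruct_pending`, `rnc_destruct()` and the other helpers are hypothetical simplifications, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy subset of RNC states, named after those in the commit message. */
enum rnc_state {
	RNC_READY,
	RNC_AWAIT_SUSPENSION,
	RNC_SUSPENDED,
	RNC_INVALIDATING,
};

/* Hypothetical RNC bookkeeping for this sketch only. */
struct rnc {
	enum rnc_state state;
	int suspend_count;     /* incremented on every entry to SUSPENDED */
	bool destruct_pending; /* destruct requested before suspension finished */
	int pending_tcs;       /* TCs still outstanding */
	int tcs_aborted;       /* TCs aborted on entry to SUSPENDED */
};

static void rnc_enter_invalidating(struct rnc *r)
{
	r->state = RNC_INVALIDATING;
}

/* Entry into SUSPENDED: count the suspension and abort pending TCs,
 * then start destruction if it was requested while waiting. */
static void rnc_enter_suspended(struct rnc *r)
{
	r->state = RNC_SUSPENDED;
	r->suspend_count++;
	r->tcs_aborted += r->pending_tcs;
	r->pending_tcs = 0;
	if (r->destruct_pending) {
		r->destruct_pending = false;
		rnc_enter_invalidating(r);
	}
}

/* Destruct request: in AWAIT_SUSPENSION, defer instead of jumping
 * straight to INVALIDATING (other states are simplified here). */
static void rnc_destruct(struct rnc *r)
{
	if (r->state == RNC_AWAIT_SUSPENSION)
		r->destruct_pending = true;
	else
		rnc_enter_invalidating(r);
}

/* The hardware reports that the requested suspension completed. */
static void rnc_suspension_event(struct rnc *r)
{
	assert(r->state == RNC_AWAIT_SUSPENSION);
	rnc_enter_suspended(r);
}
```

Deferring the destruct until the suspension event means the suspension count and TC aborts always happen before invalidation begins.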
Jeff Skirvin
3ef768c6c0 isci: Manage the IREQ_NO_AUTO_FREE_TAG under scic_lock.
Since there is a possibility of a timeout while waiting for the RNC suspension,
handle the exit case from the task termination under scic_lock, and leave
the tag allocated if the termination timed out.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:44 -07:00
Jeff Skirvin
f8381807eb isci: Remove obviated host callback list.
Since the callbacks to libsas now occur under scic_lock, there is no
longer any reason to save the completed requests in a separate list
for completion to libsas.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:44 -07:00
Jeff Skirvin
397497dd61 isci: Check IDEV_GONE before performing abort path operations.
In the link fail path, set IDEV_GONE for every device on the domain
when the last link in the port fails.

In the abort path functions like isci_reset_device, make sure that
there has not already been a detected domain failure with the device
by checking IDEV_GONE, before performing any kind of hard reset, SMP
phy control, or TMF operation.

The check for IDEV_GONE makes sure that the device in the abort path
really has control of the port with which it is associated.  This
prevents starting hard resets at incorrect times and scheduling
unnecessary LUN resets for SATA devices.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:43 -07:00
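The two halves of the IDEV_GONE change above (mark every device on the domain when the last link fails, then check the flag before any abort-path operation) can be sketched as follows. This is a toy model under stated assumptions: `struct toy_dev`, `struct toy_port`, the `gone` field standing in for IDEV_GONE, and the result codes are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes for this sketch. */
enum reset_result { RESET_SKIPPED_DEVICE_GONE, RESET_ISSUED };

struct toy_dev {
	bool gone;       /* stands in for the IDEV_GONE flag */
	int hard_resets; /* hard resets actually started */
};

struct toy_port {
	struct toy_dev *devs; /* devices on this port's domain */
	int ndevs;
	int links_up;
};

/* Link-fail path: when the last link in the port fails, mark every
 * device in the domain as gone. */
static void toy_port_link_failed(struct toy_port *p)
{
	assert(p->links_up > 0);
	if (--p->links_up == 0)
		for (int i = 0; i < p->ndevs; i++)
			p->devs[i].gone = true;
}

/* Abort path: do not start a hard reset (or SMP phy control / TMF)
 * on a device whose domain failure has already been detected. */
static enum reset_result toy_reset_device(struct toy_dev *d)
{
	if (d->gone)
		return RESET_SKIPPED_DEVICE_GONE;
	d->hard_resets++;
	return RESET_ISSUED;
}
```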
Jeff Skirvin
87805162b6 isci: Restore the ATAPI device RNC management code.
The ATAPI specific and STP general RNC suspension code had been
incorrectly removed from the remote device code.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:43 -07:00
Jeff Skirvin
1f05388933 isci: Don't wait for an RNC suspend if it's being destroyed.
Make sure that the wait for suspend can handle the RNC destruction case.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:43 -07:00
Jeff Skirvin
c5457a82a4 isci: Change the phy control and link reset interface for HW reasons.
There is an apparent HW lockup caused when the PE is disabled while there
is an outstanding TC in progress.  This change puts the link into OOB to
force the TC to end before the PE is disabled.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:42 -07:00
Jeff Skirvin
8c731888bf isci: Added timeouts to RNC suspensions in the abort path.
This change adds timeouts to the RNC suspension wait.  It makes the
suspend and resume timeouts the same.

The previous resume timeout of 5 ms was too short, and timeouts were
seen in resumptions of devices in the abort task/LUN reset path, even though
those devices would receive an RNC resumed message within a tenth of a second.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:42 -07:00
Jeff Skirvin
28de92bef0 isci: Add protocol indicator for TMF requests.
Requests constructed as task management requests need to have the protocol
indicator set so the completion decode can observe any RNC suspension
conditions.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:42 -07:00
Jeff Skirvin
1db79b3e78 isci: Directly control IREQ_ABORT_PATH_ACTIVE when completing TMFs.
TMF requests, unlike normal I/O requests, need to handle I/O management
conditions in the completion function because TMFs are not handled in the
completion tasklet.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:42 -07:00
Jeff Skirvin
0cce165e28 isci: Wait for RNC resumption before leaving the abort path.
In the case of TMF execution, or device resets, wait for the RNC to fully
resume before returning to the caller.  This ensures that the remote
device will not fail I/O requests while waiting for the RNC resumption to
complete.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:42 -07:00
Jeff Skirvin
d76689e46c isci: Fix RNC suspend call for SCI_RESUMING state.
Instead of immediately transitioning to the SCI_RNC_AWAIT_SUSPENSION
state, handle the SCI_RNC_RESUMING suspend transition from the
SCI_RNC_READY state like the SCI_RNC_INVALIDATING --> SCI_RNC_POSTING
transitions do now, by setting the destination state for the entry
into the READY state.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:41 -07:00
Jeff Skirvin
621120ca56 isci: Manage tag releases differently when aborting tasks.
When an individual request is being terminated, the request's tag
is managed in the terminate function.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:41 -07:00
Jeff Skirvin
033d19d298 isci: Callbacks to libsas occur under scic_lock and are synchronized.
This patch changes the callback mechanism to libsas to only occur while
the scic_lock is held; the abort path cleanup of I/Os also checks to make
sure IREQ_ABORT_PATH_ACTIVE is clear before proceeding.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:41 -07:00
Jeff Skirvin
0c3ce38f1b isci: When in the abort path, defeat other resume calls until done.
Completion of I/Os during one of the abort path interface calls
from libsas can drive remote device state changes and the resumption
of the device RNC.  This is a problem when the abort path is
attempting to clean up outstanding I/O at the same time: the resumption
can prevent the termination from occurring correctly.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:41 -07:00
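The idea in the commit above, that resume attempts driven by I/O completions are ignored while the abort path is active, can be sketched with a per-device flag. The flag `abort_path_active` and the `toy_*` functions are hypothetical; the commit message does not say how the driver tracks this state.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_rdev {
	bool abort_path_active; /* hypothetical: set while libsas abort call runs */
	bool rnc_suspended;
	int resumes;            /* number of RNC resumptions performed */
};

/* Called from I/O completion handling; may try to resume the RNC. */
static void toy_completion_resume(struct toy_rdev *d)
{
	if (d->abort_path_active)
		return; /* defeated until the abort path is done */
	if (d->rnc_suspended) {
		d->rnc_suspended = false;
		d->resumes++;
	}
}

/* Abort path entry: suspend so TCs can be terminated safely. */
static void toy_abort_path_begin(struct toy_rdev *d)
{
	d->abort_path_active = true;
	d->rnc_suspended = true;
}

/* Abort path exit: allow resumes again and perform the one resume. */
static void toy_abort_path_end(struct toy_rdev *d)
{
	assert(d->abort_path_active);
	d->abort_path_active = false;
	toy_completion_resume(d);
}
```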
Jeff Skirvin
31a38ef0a5 isci: Implement waiting for suspend in the abort path.
In order to prevent a device from receiving an I/O request while still
in an RNC suspending or resuming state (and therefore failing that
I/O back to libsas with a reset required status) wait for the RNC state
change before proceeding in the abort path.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:41 -07:00
Jeff Skirvin
08c031e4e3 isci: Make sure all TCs are terminated and cleaned in LUN reset.
In the libsas error path, SATA disks require extra handling in
libata to recover operation.  However, libsas expects to be able
to immediately recover all outstanding I/O once the error handler
escalation stops.  This patch fixes the condition where the libata
error handler is scheduled for operation but libsas has already
deleted the outstanding sas_tasks.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:40 -07:00
Jeff Skirvin
9608b6408e isci: Manage the LLHANG timer enable/disable per-device.
The LLHANG timer should be enabled once per device.  This patch corrects
both the timer enable and the timer disable for the remote device.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:40 -07:00
Jeff Skirvin
447bfbcee0 isci: Save the suspension hint for upcoming suspensions.
In the case of a suspend call while in SCI_RNC_POSTING or INVALIDATING
states, the LLHANG detect needed to be saved so the upcoming suspension
would enable it correctly.  The unused suspend callback parameters were
removed.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:40 -07:00
Jeff Skirvin
e3c84dfdb8 isci: Fix the terminated I/O to not call sas_task_abort().
This addresses a regression from the commit "isci: Redesign
device suspension, abort, cleanup." in which the sas_task end
condition for terminated I/Os was made to call back on
sas_task_abort().
This commit will be rolled into the original.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:40 -07:00
Jeff Skirvin
c94fc1ad25 isci: Distinguish between remote device suspension cases
For NCQ error conditions among others, there is no need to enable
the link layer hang detect timer.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:40 -07:00
Jeff Skirvin
d6b2a0e4a0 isci: Remove isci_device reqs_in_process and dev_node from isci_device.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:39 -07:00
Jeff Skirvin
033751f664 isci: Only set IDEV_GONE in the device stop path.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:39 -07:00
Jeff Skirvin
aa20d93430 isci: All pending TCs are terminated when the RNC is invalidated.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:39 -07:00
Jeff Skirvin
637325028f isci: Device access in the error path does not depend on IDEV_GONE.
Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:39 -07:00
Jeff Skirvin
59e3539643 isci: Add suspension cases for RNC INVALIDATING, POSTING states.
The RNC can be in any of the states in the loop from suspended to
ready when the API "suspend" or "resume" is called.  This change
adds destination-state parameters that control the suspension /
resumption action of the RNC state machine for those transition states.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:39 -07:00
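A minimal sketch of the destination-state idea from the commit above: a suspend or resume request that arrives during a transition state is recorded and acted on when the transition finishes. The enums, `struct toy_rnc`, and every function here are hypothetical and heavily simplified (for example, stable-state requests switch state directly).

```c
#include <assert.h>
#include <stdbool.h>

enum toy_rnc_state { T_POSTING, T_INVALIDATING, T_READY, T_SUSPENDED };
enum toy_dest { DEST_UNSPECIFIED, DEST_READY, DEST_SUSPENDED };

struct toy_rnc {
	enum toy_rnc_state state;
	enum toy_dest dest; /* saved request to apply after a transition */
};

static bool in_transition(enum toy_rnc_state s)
{
	return s == T_POSTING || s == T_INVALIDATING;
}

static void toy_suspend(struct toy_rnc *r)
{
	if (in_transition(r->state))
		r->dest = DEST_SUSPENDED; /* act on it when the transition ends */
	else
		r->state = T_SUSPENDED;
}

static void toy_resume(struct toy_rnc *r)
{
	if (in_transition(r->state))
		r->dest = DEST_READY;
	else
		r->state = T_READY;
}

/* Transition finished: the natural next state is READY unless a
 * suspension was requested in the meantime. */
static void toy_transition_done(struct toy_rnc *r)
{
	assert(in_transition(r->state));
	r->state = (r->dest == DEST_SUSPENDED) ? T_SUSPENDED : T_READY;
	r->dest = DEST_UNSPECIFIED;
}
```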
Jeff Skirvin
14aaa9f0a3 isci: Redesign device suspension, abort, cleanup.
This commit changes the means by which outstanding I/Os are handled
for cleanup.
The likelihood is that this commit will be broken into smaller pieces;
however, that will be a later revision.  Among the changes:

- All completion structures have been removed from the tmf and
abort paths.
- Now using one completed I/O list, with the I/O completed in host bit being
used to select error or normal callback paths.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:38 -07:00
Jeff Skirvin
d80ecd5726 isci: Escalate to I_T_Nexus_Reset when the device is gone.
If LUN reset sees that the device is gone, it returns TMF_RESP_FUNC_FAILED
to cause libsas to escalate to an I_T_Nexus_Reset.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:38 -07:00
Jeff Skirvin
83884014ea isci: Remote device stop also suspends the RNC and terminates I/O.
Fix the remote device state machine to suspend and terminate
all outstanding I/O before the device stopped state is reached.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:37 -07:00
Jeff Skirvin
23ec2aa947 isci: Remote device must be suspended for NCQ cleanup.
When the remote device enters the NCQ error state, the device must
be suspended so that the I/O terminations can take place.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:37 -07:00
Jeff Skirvin
5b6bf225e7 isci: Manage device suspensions during TC terminations.
TCs must be terminated only while the RNC is suspended.  This commit
adds remote device suspensions and resumptions in the abort, reset and
termination paths.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:37 -07:00
Jeff Skirvin
726980d569 isci: Terminate outstanding TCs on TX/RX RNC suspensions.
TCs must only be terminated when RNCs are suspended.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:37 -07:00
Jeff Skirvin
ac78ed0f78 isci: Handle all suspending TC completions
Add comprehensive decode for all TC completions that generate RNC
suspensions.

Note that this commit also removes unconditional resumptions of ATAPI
devices when in the SCI_STP_DEV_ATAPI_ERROR state, and STP devices
when in the SCI_STP_DEV_IDLE state. This is because the SCI_STP_DEV_IDLE
and SCI_STP_DEV_ATAPI state entry functions manage the RNC resumption.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:37 -07:00
Jeff Skirvin
56d7c013e7 isci: Fixed bug in resumption from RNC Tx/Rx suspend state.
The resumption from the Tx/Rx suspended state should work the same
as the Tx suspended state.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:36 -07:00
Jeff Skirvin
6f48844e3f isci: Manage the link layer hang detect timer for RNC suspensions.
For STP devices under certain protocol conditions, an RNC will not
suspend until the current transfer state is broken with a SYNC/ESC
sequence from the SCU.  The SYNC/ESC is driven by expiration of the
SCU link layer hang detect timer, which has too small a dynamic
range to support slow SATA devices, so normally it is disabled.

This change enables the timer with the minimum period at the point
when the suspension is requested.

Note that there is potential collateral damage to other open
connections to slow SATA devices on the same port, since there
is no alternative but to enable the LLHANG timer on every phy in
the port for the current suspension request - there is no way to
tell on which phy the RNC in question is currently active.

Signed-off-by: Jeff Skirvin <jeffrey.d.skirvin@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 14:33:36 -07:00
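Because the driver cannot tell which phy the RNC is active on, the commit above arms the LLHANG timer on every phy in the port. A toy sketch of that loop follows; the port/phy structs, `TOY_MAX_PHYS` (assumed to be 4 here), and `TOY_LLHANG_MIN_PERIOD` are hypothetical stand-ins, not the SCU register interface.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_PHYS 4          /* assumption: up to four phys per port */
#define TOY_LLHANG_MIN_PERIOD 1u /* hypothetical "minimum period" value */

struct toy_phy {
	bool llhang_enabled;
	unsigned int llhang_period;
};

struct toy_sas_port {
	struct toy_phy *phys[TOY_MAX_PHYS]; /* NULL for unused slots */
};

/* Arm the LLHANG timer with the minimum period on every phy in the
 * port, since the active phy for the RNC is unknown.  Returns the
 * number of phys armed. */
static int toy_port_arm_llhang(struct toy_sas_port *port)
{
	int armed = 0;

	assert(port != NULL);
	for (int i = 0; i < TOY_MAX_PHYS; i++) {
		struct toy_phy *phy = port->phys[i];

		if (phy == NULL)
			continue;
		phy->llhang_enabled = true;
		phy->llhang_period = TOY_LLHANG_MIN_PERIOD;
		armed++;
	}
	return armed;
}
```

The collateral damage mentioned in the message follows directly from this loop: any slow SATA device reached through one of the armed phys is exposed to the short timer as well.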
Dan Williams
fc25f79af3 isci: fix oem parameter validation on single controller skus
OEM parameters [1] are parsed from the platform option-rom / efi
driver.  By default the driver was validating the parameters for the
dual-controller case, but in single-controller case only the first set
of parameters may be valid.

Limit the validation to the number of actual controllers detected
otherwise the driver may fail to parse the valid parameters leading to
driver-load or runtime failures.

[1] the platform specific set of phy address, configuration, and analog
    tuning values

[stable v3.0+]
Cc: <stable@vger.kernel.org>
Reported-by: Dave Jiang <dave.jiang@intel.com>
Tested-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:29 -07:00
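The fix above amounts to bounding the validation loop by the number of controllers actually detected rather than by the dual-controller maximum. A minimal sketch, assuming a hypothetical `struct toy_oem_params` whose `valid` field stands in for the real per-controller checks:

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_CONTROLLERS 2 /* dual-controller SKU */

struct toy_oem_params {
	bool valid; /* stand-in for the real per-controller checks */
};

/* Validate only the parameter sets for controllers that are present;
 * on a single-controller SKU the second set may be garbage. */
static bool toy_validate_oem(const struct toy_oem_params oem[TOY_MAX_CONTROLLERS],
			     int num_controllers)
{
	assert(num_controllers >= 1 && num_controllers <= TOY_MAX_CONTROLLERS);
	for (int i = 0; i < num_controllers; i++)
		if (!oem[i].valid)
			return false;
	return true;
}
```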
Maciej Trela
08e73be56b isci: enable BCN in sci_port_add_phy()
Ensure we enable receiving BCNs from the
hardware when adding a phy to isci_port.
Otherwise, if we get a BCN before the port is
created, we won't see any BCN.

Signed-off-by: Maciej Trela <maciej.trela@intel.com>
Reported-by: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:28 -07:00
Andrzej Jakowski
6119908f0f isci: Changes in COMSAS timings enabling ISCI to detect buggy disc drives.
This patch extends timings in COMSAS signaling, so ISCI can detect disc
drives that have trouble sending COMSAS within the correct time frame.

Signed-off-by: Andrzej Jakowski <andrzej.jakowski@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:28 -07:00
Dan Williams
d1dc5e2d21 isci: kill isci_host.shost
We can retrieve the shost from the sas_ha like the rest of libsas and
drop this out of our local data structure.

Acked-by: Jacek Danecki <jacek.danecki@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:13 -07:00
Dan Williams
2396a2650a isci: fix interrupt disable
There is a (dubious?) lost irq workaround in sci_controller_isr() that
effectively nullifies attempts to disable interrupts.  Until the
workaround can be re-evaluated add some infrastructure to prevent the
interrupt handler from inadvertently re-enabling interrupts.

The failure mode was interrupts continuing to run after the driver had
been removed and its iomappings torn down.

Reported-by: Jacek Danecki <jacek.danecki@intel.com>
Tested-by: Jacek Danecki <jacek.danecki@intel.com>
[richard: clear remaining interrupts at the end of reset]
Acked-by: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:12 -07:00
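One way to picture the "infrastructure" described in the commit above is a driver-owned flag that the interrupt handler's re-enable step must respect. This is a hypothetical sketch: `struct toy_ctrl`, its fields, and the `toy_*` functions are not the driver's code, and the real workaround in sci_controller_isr() is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_ctrl {
	bool interrupts_enabled;   /* models the controller's interrupt enable */
	bool driver_disabled_irqs; /* driver has deliberately turned them off */
};

static void toy_disable_interrupts(struct toy_ctrl *c)
{
	assert(c != NULL);
	c->driver_disabled_irqs = true;
	c->interrupts_enabled = false;
}

static void toy_enable_interrupts(struct toy_ctrl *c)
{
	assert(c != NULL);
	c->driver_disabled_irqs = false;
	c->interrupts_enabled = true;
}

/* Tail of the ISR: a lost-irq style workaround re-enables interrupts,
 * but must not override a deliberate disable (e.g. on driver removal). */
static void toy_isr_tail(struct toy_ctrl *c)
{
	assert(c != NULL);
	if (!c->driver_disabled_irqs)
		c->interrupts_enabled = true;
}
```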
Dan Williams
50a92d9314 isci: fix 'link-up' events occur after 'start-complete'
The call to wait_for_start() is meant to ensure that all links have been
given a chance to come up before letting the kernel proceed with
probing.  However, the implementation is not correctly syncing with the
port configuration agent.  In the MPC case the ports are hard-coded; in
the APC case we need to wait for the port-configuration to form ports
from the started phys.

Towards that end increase the timeout for the APC agent to form ports,
and delay start complete until all phys are out of link-training.

Cc: <stable@vger.kernel.org>
Cc: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:12 -07:00
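The "delay start complete until all phys are out of link-training" part of the commit above can be sketched as a simple predicate over phy states. The phy state enum and `toy_start_complete()` are hypothetical; the APC timeout increase is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_phy_state { PHY_STOPPED, PHY_LINK_TRAINING, PHY_READY, PHY_NO_DEVICE };

/* Start is reported complete only when no phy is still link training. */
static bool toy_start_complete(const enum toy_phy_state *phys, int nphys)
{
	assert(nphys >= 0);
	for (int i = 0; i < nphys; i++)
		if (phys[i] == PHY_LINK_TRAINING)
			return false;
	return true;
}
```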
Dan Williams
eb608c3cb3 isci: fix controller stop
1/ notify waiters when controller stop completes (fixes 10 second stall
   unloading the driver)
2/ make sure phy stop is after port and device stop

Cc: Richard Boyd <richard.g.boyd@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:12 -07:00
Dan Williams
abec912d71 isci: refactor initialization for S3/S4
Based on an original implementation by Ed Nadolski and Artur Wojcik

In preparation for S3/S4 support refactor initialization so that
driver-load and resume-from-suspend can share the common init path of
isci_host_init().  Organize the initialization into objects that are
self-contained to the driver (initialized by isci_host_init) versus
those that have some upward registration (initialized at allocation time:
asd_sas_phy, asd_sas_port, DMA allocations).  The largest change is
moving the validation of the oem and module parameters from
isci_host_init() to isci_host_alloc().

The S3/S4 approach being taken is that libsas will be tasked with
remembering the state of the domain and the lldd is free to be
forgetful.  In the case of isci we'll just re-init using a subset of the
normal driver load path.

[clean up some unused / mis-indented function definitions in host.h]

Signed-off-by: Ed Nadolski <edmund.nadolski@intel.com>
Signed-off-by: Artur Wojcik <artur.wojcik@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:12 -07:00
Dan Williams
ae904d15cf isci: kill isci_port.domain_dev_list
Another unused field, and isci_port_init is overkill.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:12 -07:00
Dan Williams
1844e4789f isci: kill ->status, and ->state_lock in isci_host
They serve no incremental purpose over the existing sas_ha state.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:12 -07:00
Tom Jackson
944b787d0a isci: Don't filter BROADCAST CHANGE primitives
Per the SAS spec, several types of BROADCAST CHANGE primitives
must cause re-discovery of the originating expander.
Only the standard BROADCAST CHANGE primitive was being
sent to the LIBSAS layer.  The other BC primitives have been
added to sci_phy_event_handler().

Signed-off-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:12 -07:00
Dan Williams
c79dd80d73 isci: kill sci_phy_protocol and sci_request_protocol
Holdovers from the initial driver cleanup, replace with enum sas_protocol.

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2012-05-17 12:27:11 -07:00