This is a particularly nasty bug. The problem is that if any internal
ascb times out, currently we free it even though it's pending at the
sequencer. This results in the sequencer getting terminally confused
and the error message:
BUG:sequencer:dl:no ascb
Being returned when it comes back. The way to fix this is to manage
freeing the ascb from the tasklet completion routine, so that we only
free it when the sequencer actually returns it. The code is also
altered to use on stack completions and transfer variables.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Currently aic94xx has no exported I_T_nexus_reset function. This is a
bit of a huge problem, since sas_ata relies on this function to
perform an ATA phy reset and also it means that if abort fails, we
really have no bigger hammer to hit everything with.
Plumb in the I_T_nexus_reset by quiescing the sequencer, sending the
correct phy reset (link for ATA and hard for SAS) and then carefully
resuming the sequencer again.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The clear nexus I_T and clear nexus I_T_L functions in the aic94xx
specify the SUSPEND_TX flag which causes the sequencer to be suspended
until it receives a RESUME_TX. Unfortunately, nothing ever sends the
resume, so the sequencer on the link is stopped forever, leading to
eventual timeouts and I/O errors.
Since clear nexus commands are only executed as part of error recovery,
it's perfectly fine to keep the sequencer running on the link ... as
soon as the recovery function is completed, we'll send it the commands
to retry.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
include/scsi/scsi.h as a definition:
#define ABORT_TASK 0x0d
on the other hand drivers/scsi/aic94xx/aic94xx_sas.h has:
#define ABORT_TASK 0x03
rename the latter to SCB_ABORT_TASK
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
sparse complains about the mixing of enums in libsas. Since the
underlying numeric values of both enums are the same, combine them
to get rid of the warning.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Every so often, the driver will call asd_clear_nexus to clean out a task.
It is supposed to be the case that the CLEAR NEXUS does not go on the done
list until after the task itself has been put on the done list, but for
some reason this doesn't always happen. Thus, the
wait_for_completion_timeout call times out, and we return success. This
makes libsas free the task even though the task hasn't completed, leading
to a BUG_ON message from aic94xx_hwi.c around line 341. We should return
failure from asd_clear_nexus so that libsas tries again; at a bare minimum
it shouldn't be freeing active tasks. I _think_ this will fix one of
the SCB timeout crash problems (though I've not been able to reproduce
it lately...)
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In asd_initiate_ssp_tmf, the TMF result code is replaced with
TMF_RESP_FUNC_FAILED except when the TMF returns a result code immediately.
However, TMFs can return result codes via an ESCB... yet these codes are
also replaced with "FAILED". The only values that can fall into that case
are TMF_* codes anyway, so get rid of this code where COMPLETE and SUCCESS
are turned into FAILED. This also lets us propagate those TMF_* codes up
to the caller.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In this driver, TMF_QUERY_TASK translates to QUERY_SSP_TASK. The
sequencer, it seems, is perfectly happy sending us a SSP response, which
this function promptly "converts" into TMF_RESP_FUNC_FAILED. This leads to
the SAS EH making bad decisions based on bad data, so we should not perform
the conversion in this case.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is the end point of the separate aic94xx driver based on the
original driver and transport class from Luben Tuikov
<ltuikov@yahoo.com>
The log of the separate development is:
Alexis Bruemmer:
o aic94xx: fix hotplug/unplug for expanderless systems
o aic94xx: disable split completion timer/setting by default
o aic94xx: wide port off expander support
o aic94xx: remove various inline functions
o aic94xx: use bitops
o aic94xx: remove queue comment
o aic94xx: remove sas_common.c
o aic94xx: sas remove depot's
o aic94xx: use available list_for_each_entry_safe_reverse()
o aic94xx: sas header file merge
James Bottomley:
o aic94xx: fix TF_TMF_NO_CTX processing
o aic94xx: convert to request_firmware interface
o aic94xx: fix hotplug/unplug
o aic94xx: add link error counts to the expander phys
o aic94xx: add transport class phy reset capability
o aic94xx: remove local_attached flag
o Remove README
o Fixup Makefile variable for libsas rename
o Rename sas->libsas
o aic94xx: correct return code for sas_discover_event
o aic94xx: use parent backlink port
o aic94xx: remove channel abstraction
o aic94xx: fix routing algorithms
o aic94xx: add backlink port
o aic94xx: fix cascaded expander properties
o aic94xx: fix sleep under lock
o aic94xx: fix panic on module removal in complex topology
o aic94xx: make use of the new sas_port
o rename sas_port to asd_sas_port
o Fix for eh_strategy_handler move
o aic94xx: move entirely over to correct transport class formulation
o remove last vestages of sas_rphy_alloc()
o update for eh_timed_out move
o Preliminary expander support for aic94xx
o sas: remove event thread
o minor warning cleanups
o remove last vestiges of id mapping arrays
o Further updates
o Convert aic94xx over entirely to the transport class end device and
o update aic94xx/sas to use the new sas transport class end device
o [PATCH] aic94xx: attaching to the sas transport class
o Add missing completion removal from prior patch
o [PATCH] aic94xx: attaching to the sas transport class
o Build fixes from akpm
Jeff Garzik:
o [scsi aic94xx] Remove ->owner from PCI info table
Luben Tuikov:
o initial aic94xx driver
Mike Anderson:
o aic94xx: fix panic on module insertion
o aic94xx: stub out SATA_DEV case
o aic94xx: compile warning cleanups
o aic94xx: sas_alloc_task
o aic94xx: ref count update
o aic94xx nexus loss time value
o [PATCH] aic94xx: driver assertion in non-x86 BIOS env
Randy Dunlap:
o libsas: externs not needed
Robert Tarte:
o aic94xx: sequence patch - fixes SATA support
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>