Commit 8b4d946

ata: libata-scsi: Fix delayed scsi_rescan_device() execution
Commit 6aa0365 ("ata: libata-scsi: Avoid deadlock on rescan after device resume") modified ata_scsi_dev_rescan() to check the scsi device "is_suspended" power field to ensure that the scsi device associated with an ATA device is fully resumed when scsi_rescan_device() is executed. However, this fix is problematic as:

1) It relies on a PM internal field that should not be used without PM device locking protection.
2) The check for is_suspended and the call to scsi_rescan_device() are not atomic, so a suspend PM event may be triggered between them, causing scsi_rescan_device() to be called on a suspended device and to block in that function while holding the scsi device lock. This would deadlock a following resume operation.

These problems can trigger PM deadlocks on resume, especially with resume operations triggered quickly after or during suspend operations. E.g., a simple bash script like:

	for (( i=0; i<10; i++ )); do
		echo "+2" > /sys/class/rtc/rtc0/wakealarm
		echo mem > /sys/power/state
	done

that triggers a resume 2 seconds after the system starts suspending can quickly lead to a PM deadlock that prevents the system from correctly resuming.

Fix this by replacing the check on is_suspended with a check on the return value given by scsi_rescan_device(), as that function will fail if called against a suspended device. Also make sure that rescan tasks already scheduled are cancelled before suspending an ata port.

Fixes: 6aa0365 ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
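The change in control flow can be illustrated with a small user-space model. This is only a sketch, not kernel code: struct model_dev, model_rescan_device() and model_rescan_work() are hypothetical names invented for the illustration, and -EBUSY is an arbitrary error value chosen for the model. The point it shows is that the work function no longer peeks at a "suspended" flag before calling the rescan; it calls the rescan, which itself refuses to run on a suspended device, and uses the returned error to stop and request a retry.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a SCSI device; not a kernel structure. */
struct model_dev {
	bool suspended;	/* true while the device cannot be rescanned */
	int rescans;	/* number of rescans that completed */
};

/*
 * Models a rescan that fails instead of blocking when the device is
 * suspended. The error code is an assumption made for this sketch.
 */
int model_rescan_device(struct model_dev *dev)
{
	if (dev->suspended)
		return -EBUSY;
	dev->rescans++;
	return 0;
}

/*
 * Models one run of the rescan work: walk the devices in order, stop at
 * the first failure, and report whether the work must be rescheduled.
 */
bool model_rescan_work(struct model_dev *devs, size_t n)
{
	int ret = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		ret = model_rescan_device(&devs[i]);
		if (ret)
			break;
	}

	/* A non-zero result stands for "reschedule with a delay". */
	return ret != 0;
}
```

In this model, a run over one resumed and one still-suspended device rescans the first, fails on the second, and asks for a retry; the retry then walks all devices again once the second one is resumed.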
1 parent ff48b37 commit 8b4d946

2 files changed: +31 -18 lines changed


drivers/ata/libata-core.c

Lines changed: 16 additions & 0 deletions
@@ -5168,11 +5168,27 @@ static const unsigned int ata_port_suspend_ehi = ATA_EHI_QUIET
 
 static void ata_port_suspend(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, false);
 }
 
 static void ata_port_suspend_async(struct ata_port *ap, pm_message_t mesg)
 {
+	/*
+	 * We are about to suspend the port, so we do not care about
+	 * scsi_rescan_device() calls scheduled by previous resume operations.
+	 * The next resume will schedule the rescan again. So cancel any rescan
+	 * that is not done yet.
+	 */
+	cancel_delayed_work_sync(&ap->scsi_rescan_task);
+
 	ata_port_request_pm(ap, mesg, 0, ata_port_suspend_ehi, true);
 }
 
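
The effect of cancelling a queued rescan before suspending, combined with a suspended-port check in the work itself, can be sketched in user space as follows. This is an illustrative model under stated assumptions, not the kernel implementation: struct model_port and the model_* functions are hypothetical, the rescan_pending flag merely stands in for a queued delayed work item, and the suspended flag stands in for a port-level suspended state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for an ATA port; not the kernel's struct ata_port. */
struct model_port {
	bool suspended;		/* models a port-level "suspended" flag */
	bool rescan_pending;	/* models a queued rescan work item */
	int rescans_run;	/* number of rescans that actually ran */
};

/* Resume marks the port as running and queues a rescan. */
void model_port_resume(struct model_port *p)
{
	p->suspended = false;
	p->rescan_pending = true;
}

/* Suspend first drops any rescan that has not run yet. */
void model_port_suspend(struct model_port *p)
{
	p->rescan_pending = false;	/* stands in for cancelling the work */
	p->suspended = true;
}

/* Runs the queued rescan, if any; returns true if a rescan was done. */
bool model_run_pending(struct model_port *p)
{
	if (!p->rescan_pending)
		return false;
	p->rescan_pending = false;

	/* Bail out if the port was suspended after the work was queued. */
	if (p->suspended)
		return false;

	p->rescans_run++;
	return true;
}
```

In this model, a resume immediately followed by a suspend leaves nothing to run, and the rescan is only performed after the next resume queues it again.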

drivers/ata/libata-scsi.c

Lines changed: 15 additions & 18 deletions
@@ -4756,7 +4756,7 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 	struct ata_link *link;
 	struct ata_device *dev;
 	unsigned long flags;
-	bool delay_rescan = false;
+	int ret = 0;
 
 	mutex_lock(&ap->scsi_scan_mutex);
 	spin_lock_irqsave(ap->lock, flags);
@@ -4765,37 +4765,34 @@ void ata_scsi_dev_rescan(struct work_struct *work)
 		ata_for_each_dev(dev, link, ENABLED) {
 			struct scsi_device *sdev = dev->sdev;
 
+			/*
+			 * If the port was suspended before this was scheduled,
+			 * bail out.
+			 */
+			if (ap->pflags & ATA_PFLAG_SUSPENDED)
+				goto unlock;
+
 			if (!sdev)
 				continue;
 			if (scsi_device_get(sdev))
 				continue;
 
-			/*
-			 * If the rescan work was scheduled because of a resume
-			 * event, the port is already fully resumed, but the
-			 * SCSI device may not yet be fully resumed. In such
-			 * case, executing scsi_rescan_device() may cause a
-			 * deadlock with the PM code on device_lock(). Prevent
-			 * this by giving up and retrying rescan after a short
-			 * delay.
-			 */
-			delay_rescan = sdev->sdev_gendev.power.is_suspended;
-			if (delay_rescan) {
-				scsi_device_put(sdev);
-				break;
-			}
-
 			spin_unlock_irqrestore(ap->lock, flags);
-			scsi_rescan_device(sdev);
+			ret = scsi_rescan_device(sdev);
 			scsi_device_put(sdev);
 			spin_lock_irqsave(ap->lock, flags);
+
+			if (ret)
+				goto unlock;
 		}
 	}
 
+unlock:
 	spin_unlock_irqrestore(ap->lock, flags);
 	mutex_unlock(&ap->scsi_scan_mutex);
 
-	if (delay_rescan)
+	/* Reschedule with a delay if scsi_rescan_device() returned an error */
+	if (ret)
 		schedule_delayed_work(&ap->scsi_rescan_task,
 				      msecs_to_jiffies(5));
 }
