Skip to content

Commit 8ebbf26

Browse files
Dave ChinnerDarrick J. Wong
authored andcommitted
xfs: don't block in busy flushing when freeing extents
If the current transaction holds a busy extent and we are trying to allocate a new extent to fix up the free list, we can deadlock if the AG is entirely empty except for the busy extent held by the transaction. This can occur at runtime processing an XEFI with multiple extents in this path: __schedule+0x22f at ffffffff81f75e8f schedule+0x46 at ffffffff81f76366 xfs_extent_busy_flush+0x69 at ffffffff81477d99 xfs_alloc_ag_vextent_size+0x16a at ffffffff8141711a xfs_alloc_ag_vextent+0x19b at ffffffff81417edb xfs_alloc_fix_freelist+0x22f at ffffffff8141896f xfs_free_extent_fix_freelist+0x6a at ffffffff8141939a __xfs_free_extent+0x99 at ffffffff81419499 xfs_trans_free_extent+0x3e at ffffffff814a6fee xfs_extent_free_finish_item+0x24 at ffffffff814a70d4 xfs_defer_finish_noroll+0x1f7 at ffffffff81441407 xfs_defer_finish+0x11 at ffffffff814417e1 xfs_itruncate_extents_flags+0x13d at ffffffff8148b7dd xfs_inactive_truncate+0xb9 at ffffffff8148bb89 xfs_inactive+0x227 at ffffffff8148c4f7 xfs_fs_destroy_inode+0xb8 at ffffffff81496898 destroy_inode+0x3b at ffffffff8127d2ab do_unlinkat+0x1d1 at ffffffff81270df1 do_syscall_64+0x40 at ffffffff81f6b5f0 entry_SYSCALL_64_after_hwframe+0x44 at ffffffff8200007c This can also happen in log recovery when processing an EFI with multiple extents through this path: context_switch() kernel/sched/core.c:3881 __schedule() kernel/sched/core.c:5111 schedule() kernel/sched/core.c:5186 xfs_extent_busy_flush() fs/xfs/xfs_extent_busy.c:598 xfs_alloc_ag_vextent_size() fs/xfs/libxfs/xfs_alloc.c:1641 xfs_alloc_ag_vextent() fs/xfs/libxfs/xfs_alloc.c:828 xfs_alloc_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:2362 xfs_free_extent_fix_freelist() fs/xfs/libxfs/xfs_alloc.c:3029 __xfs_free_extent() fs/xfs/libxfs/xfs_alloc.c:3067 xfs_trans_free_extent() fs/xfs/xfs_extfree_item.c:370 xfs_efi_recover() fs/xfs/xfs_extfree_item.c:626 xlog_recover_process_efi() fs/xfs/xfs_log_recover.c:4605 xlog_recover_process_intents() fs/xfs/xfs_log_recover.c:4893 xlog_recover_finish() fs/xfs/xfs_log_recover.c:5824 xfs_log_mount_finish() fs/xfs/xfs_log.c:764 xfs_mountfs() fs/xfs/xfs_mount.c:978 xfs_fs_fill_super() fs/xfs/xfs_super.c:1908 mount_bdev() fs/super.c:1417 xfs_fs_mount() fs/xfs/xfs_super.c:1985 legacy_get_tree() fs/fs_context.c:647 vfs_get_tree() fs/super.c:1547 do_new_mount() fs/namespace.c:2843 do_mount() fs/namespace.c:3163 ksys_mount() fs/namespace.c:3372 __do_sys_mount() fs/namespace.c:3386 __se_sys_mount() fs/namespace.c:3383 __x64_sys_mount() fs/namespace.c:3383 do_syscall_64() arch/x86/entry/common.c:296 entry_SYSCALL_64() arch/x86/entry/entry_64.S:180 To avoid this deadlock, we should not block in xfs_extent_busy_flush() if we hold a busy extent in the current transaction. Now that the EFI processing code can handle requeuing a partially completed EFI, we can detect this situation in xfs_extent_busy_flush() and return -EAGAIN rather than going to sleep forever. The -EAGAIN get propagated back out to the xfs_trans_free_extent() context, where the EFD is populated and the transaction is rolled, thereby moving the busy extents into the CIL. At this point, we can retry the extent free operation again with a clean transaction. If we hit the same "all free extents are busy" situation when trying to fix up the free list, we can safely call xfs_extent_busy_flush() and wait for the busy extents to resolve and wake us. At this point, the allocation search can make progress again and we can fix up the free list. This deadlock was first reported by Chandan in mid-2021, but I couldn't make myself understood during review, and didn't have time to fix it myself. It was reported again in March 2023, and again I have found myself unable to explain the complexities of the solution needed during review. As such, I don't have hours more time to waste trying to get the fix written the way it needs to be written, so I'm just doing it myself. This patchset is largely based on Wengang Wang's last patch, but with all the unnecessary stuff removed, split up into multiple patches and cleaned up somewhat. Reported-by: Chandan Babu R <chandanrlinux@gmail.com> Reported-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org>
1 parent 0853b5d commit 8ebbf26

File tree

4 files changed

+88
-30
lines changed

4 files changed

+88
-30
lines changed

fs/xfs/libxfs/xfs_alloc.c

Lines changed: 50 additions & 18 deletions
Original file line numberDiff line numberDiff line change
@@ -1556,6 +1556,8 @@ xfs_alloc_ag_vextent_near(
15561556
if (args->agbno > args->max_agbno)
15571557
args->agbno = args->max_agbno;
15581558

1559+
/* Retry once quickly if we find busy extents before blocking. */
1560+
alloc_flags |= XFS_ALLOC_FLAG_TRYFLUSH;
15591561
restart:
15601562
len = 0;
15611563

@@ -1611,9 +1613,20 @@ xfs_alloc_ag_vextent_near(
16111613
*/
16121614
if (!acur.len) {
16131615
if (acur.busy) {
1616+
/*
1617+
* Our only valid extents must have been busy. Flush and
1618+
* retry the allocation again. If we get an -EAGAIN
1619+
* error, we're being told that a deadlock was avoided
1620+
* and the current transaction needs committing before
1621+
* the allocation can be retried.
1622+
*/
16141623
trace_xfs_alloc_near_busy(args);
1615-
xfs_extent_busy_flush(args->mp, args->pag,
1616-
acur.busy_gen, alloc_flags);
1624+
error = xfs_extent_busy_flush(args->tp, args->pag,
1625+
acur.busy_gen, alloc_flags);
1626+
if (error)
1627+
goto out;
1628+
1629+
alloc_flags &= ~XFS_ALLOC_FLAG_TRYFLUSH;
16171630
goto restart;
16181631
}
16191632
trace_xfs_alloc_size_neither(args);
@@ -1653,6 +1666,8 @@ xfs_alloc_ag_vextent_size(
16531666
int error;
16541667
int i;
16551668

1669+
/* Retry once quickly if we find busy extents before blocking. */
1670+
alloc_flags |= XFS_ALLOC_FLAG_TRYFLUSH;
16561671
restart:
16571672
/*
16581673
* Allocate and initialize a cursor for the by-size btree.
@@ -1710,19 +1725,25 @@ xfs_alloc_ag_vextent_size(
17101725
error = xfs_btree_increment(cnt_cur, 0, &i);
17111726
if (error)
17121727
goto error0;
1713-
if (i == 0) {
1714-
/*
1715-
* Our only valid extents must have been busy.
1716-
* Make it unbusy by forcing the log out and
1717-
* retrying.
1718-
*/
1719-
xfs_btree_del_cursor(cnt_cur,
1720-
XFS_BTREE_NOERROR);
1721-
trace_xfs_alloc_size_busy(args);
1722-
xfs_extent_busy_flush(args->mp, args->pag,
1723-
busy_gen, alloc_flags);
1724-
goto restart;
1725-
}
1728+
if (i)
1729+
continue;
1730+
1731+
/*
1732+
* Our only valid extents must have been busy. Flush and
1733+
* retry the allocation again. If we get an -EAGAIN
1734+
* error, we're being told that a deadlock was avoided
1735+
* and the current transaction needs committing before
1736+
* the allocation can be retried.
1737+
*/
1738+
trace_xfs_alloc_size_busy(args);
1739+
error = xfs_extent_busy_flush(args->tp, args->pag,
1740+
busy_gen, alloc_flags);
1741+
if (error)
1742+
goto error0;
1743+
1744+
alloc_flags &= ~XFS_ALLOC_FLAG_TRYFLUSH;
1745+
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
1746+
goto restart;
17261747
}
17271748
}
17281749

@@ -1802,10 +1823,21 @@ xfs_alloc_ag_vextent_size(
18021823
args->len = rlen;
18031824
if (rlen < args->minlen) {
18041825
if (busy) {
1805-
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
1826+
/*
1827+
* Our only valid extents must have been busy. Flush and
1828+
* retry the allocation again. If we get an -EAGAIN
1829+
* error, we're being told that a deadlock was avoided
1830+
* and the current transaction needs committing before
1831+
* the allocation can be retried.
1832+
*/
18061833
trace_xfs_alloc_size_busy(args);
1807-
xfs_extent_busy_flush(args->mp, args->pag, busy_gen,
1808-
alloc_flags);
1834+
error = xfs_extent_busy_flush(args->tp, args->pag,
1835+
busy_gen, alloc_flags);
1836+
if (error)
1837+
goto error0;
1838+
1839+
alloc_flags &= ~XFS_ALLOC_FLAG_TRYFLUSH;
1840+
xfs_btree_del_cursor(cnt_cur, XFS_BTREE_NOERROR);
18091841
goto restart;
18101842
}
18111843
goto out_nominleft;

fs/xfs/libxfs/xfs_alloc.h

Lines changed: 6 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -19,11 +19,12 @@ unsigned int xfs_agfl_size(struct xfs_mount *mp);
1919
/*
2020
* Flags for xfs_alloc_fix_freelist.
2121
*/
22-
#define XFS_ALLOC_FLAG_TRYLOCK 0x00000001 /* use trylock for buffer locking */
23-
#define XFS_ALLOC_FLAG_FREEING 0x00000002 /* indicate caller is freeing extents*/
24-
#define XFS_ALLOC_FLAG_NORMAP 0x00000004 /* don't modify the rmapbt */
25-
#define XFS_ALLOC_FLAG_NOSHRINK 0x00000008 /* don't shrink the freelist */
26-
#define XFS_ALLOC_FLAG_CHECK 0x00000010 /* test only, don't modify args */
22+
#define XFS_ALLOC_FLAG_TRYLOCK (1U << 0) /* use trylock for buffer locking */
23+
#define XFS_ALLOC_FLAG_FREEING (1U << 1) /* indicate caller is freeing extents*/
24+
#define XFS_ALLOC_FLAG_NORMAP (1U << 2) /* don't modify the rmapbt */
25+
#define XFS_ALLOC_FLAG_NOSHRINK (1U << 3) /* don't shrink the freelist */
26+
#define XFS_ALLOC_FLAG_CHECK (1U << 4) /* test only, don't modify args */
27+
#define XFS_ALLOC_FLAG_TRYFLUSH (1U << 5) /* don't wait in busy extent flush */
2728

2829
/*
2930
* Argument structure for xfs_alloc routines.

fs/xfs/xfs_extent_busy.c

Lines changed: 29 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -566,21 +566,45 @@ xfs_extent_busy_clear(
566566

567567
/*
568568
* Flush out all busy extents for this AG.
569+
*
570+
* If the current transaction is holding busy extents, the caller may not want
571+
* to wait for committed busy extents to resolve. If we are being told just to
572+
* try a flush or progress has been made since we last skipped a busy extent,
573+
* return immediately to allow the caller to try again.
574+
*
575+
* If we are freeing extents, we might actually be holding the only free extents
576+
* in the transaction busy list and the log force won't resolve that situation.
577+
* In this case, we must return -EAGAIN to avoid a deadlock by informing the
578+
* caller it needs to commit the busy extents it holds before retrying the
579+
* extent free operation.
569580
*/
570-
void
581+
int
571582
xfs_extent_busy_flush(
572-
struct xfs_mount *mp,
583+
struct xfs_trans *tp,
573584
struct xfs_perag *pag,
574585
unsigned busy_gen,
575586
uint32_t alloc_flags)
576587
{
577588
DEFINE_WAIT (wait);
578589
int error;
579590

580-
error = xfs_log_force(mp, XFS_LOG_SYNC);
591+
error = xfs_log_force(tp->t_mountp, XFS_LOG_SYNC);
581592
if (error)
582-
return;
593+
return error;
594+
595+
/* Avoid deadlocks on uncommitted busy extents. */
596+
if (!list_empty(&tp->t_busy)) {
597+
if (alloc_flags & XFS_ALLOC_FLAG_TRYFLUSH)
598+
return 0;
599+
600+
if (busy_gen != READ_ONCE(pag->pagb_gen))
601+
return 0;
602+
603+
if (alloc_flags & XFS_ALLOC_FLAG_FREEING)
604+
return -EAGAIN;
605+
}
583606

607+
/* Wait for committed busy extents to resolve. */
584608
do {
585609
prepare_to_wait(&pag->pagb_wait, &wait, TASK_KILLABLE);
586610
if (busy_gen != READ_ONCE(pag->pagb_gen))
@@ -589,6 +613,7 @@ xfs_extent_busy_flush(
589613
} while (1);
590614

591615
finish_wait(&pag->pagb_wait, &wait);
616+
return 0;
592617
}
593618

594619
void

fs/xfs/xfs_extent_busy.h

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -51,9 +51,9 @@ bool
5151
xfs_extent_busy_trim(struct xfs_alloc_arg *args, xfs_agblock_t *bno,
5252
xfs_extlen_t *len, unsigned *busy_gen);
5353

54-
void
55-
xfs_extent_busy_flush(struct xfs_mount *mp, struct xfs_perag *pag,
56-
unsigned busy_gen, uint32_t alloc_flags);
54+
int
55+
xfs_extent_busy_flush(struct xfs_trans *tp, struct xfs_perag *pag,
56+
unsigned busy_gen, uint32_t alloc_flags);
5757

5858
void
5959
xfs_extent_busy_wait_all(struct xfs_mount *mp);

0 commit comments

Comments
 (0)