
Commit 0cb5950

fdmanana authored and kdave committed
btrfs: fix deadlock when reserving space during defrag
When defragging we can end up collecting a range for defrag that already has pages under delalloc (dirty), as long as the respective extent map for their range is not mapped to a hole or a prealloc extent, and the extent map is not from an old generation.

Most of the time that is harmless, at least from a functional perspective, however it can result in a deadlock:

1) At defrag_collect_targets() we find an extent map that meets all requirements but there's delalloc for the range it covers, and we add its range to the list of ranges to defrag;

2) The defrag_collect_targets() function is called at defrag_one_range(), after it locked a range that overlaps the range of the extent map;

3) At defrag_one_range(), while the range is still locked, we call defrag_one_locked_target() for the range associated to the extent map we collected at step 1);

4) Then finally at defrag_one_locked_target() we do a call to btrfs_delalloc_reserve_space(), which will reserve data and metadata space. If the space reservations cannot be satisfied right away, the flusher might be kicked in and start flushing delalloc, waiting for the respective ordered extents to complete. If this happens we deadlock, because both flushing delalloc and finishing an ordered extent require locking the range in the inode's io tree, which was already locked at defrag_collect_targets().

So fix this by skipping extent maps for which there's already delalloc.

Fixes: eb793cf ("btrfs: defrag: introduce helper to collect target file extents")
CC: stable@vger.kernel.org # 5.16
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
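To make the lock dependency easier to follow, here is a minimal, non-compilable sketch of the call path described in steps 1) to 4) above. The function names (defrag_one_range(), defrag_collect_targets(), defrag_one_locked_target(), btrfs_delalloc_reserve_space(), lock_extent_bits()) are the real ones from the btrfs code of that era, but the signatures and bodies are heavily abbreviated for illustration and are not the actual implementations:

    static int defrag_one_range(struct btrfs_inode *inode, u64 start, u32 len, ...)
    {
            struct extent_state *cached_state = NULL;
            int ret;

            /* The range in the inode's io tree is locked for the whole operation. */
            lock_extent_bits(&inode->io_tree, start, start + len - 1, &cached_state);

            /* Step 1: may collect a target whose range already has delalloc. */
            ret = defrag_collect_targets(inode, /* ... */);

            /* Step 3: called for each collected target while the range is still locked. */
            ret = defrag_one_locked_target(inode, /* entry, ... */);

            unlock_extent_cached(&inode->io_tree, start, start + len - 1, &cached_state);
            return ret;
    }

    static int defrag_one_locked_target(struct btrfs_inode *inode, /* ... */)
    {
            /*
             * Step 4: reserving space may kick the flusher and wait for delalloc
             * flushing / ordered extent completion. Both of those need to lock
             * the same io tree range that defrag_one_range() is still holding,
             * so the task ends up waiting on itself: deadlock.
             */
            return btrfs_delalloc_reserve_space(inode, &data_reserved, start, len);
    }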
1 parent c080b41 commit 0cb5950

File tree

1 file changed: +30 -1 lines changed


fs/btrfs/ioctl.c

Lines changed: 30 additions & 1 deletion
@@ -1213,6 +1213,35 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 		if (em->generation < newer_than)
 			goto next;
 
+		/*
+		 * Our start offset might be in the middle of an existing extent
+		 * map, so take that into account.
+		 */
+		range_len = em->len - (cur - em->start);
+		/*
+		 * If this range of the extent map is already flagged for delalloc,
+		 * skip it, because:
+		 *
+		 * 1) We could deadlock later, when trying to reserve space for
+		 *    delalloc, because in case we can't immediately reserve space
+		 *    the flusher can start delalloc and wait for the respective
+		 *    ordered extents to complete. The deadlock would happen
+		 *    because we do the space reservation while holding the range
+		 *    locked, and starting writeback, or finishing an ordered
+		 *    extent, requires locking the range;
+		 *
+		 * 2) If there's delalloc there, it means there's dirty pages for
+		 *    which writeback has not started yet (we clean the delalloc
+		 *    flag when starting writeback and after creating an ordered
+		 *    extent). If we mark pages in an adjacent range for defrag,
+		 *    then we will have a larger contiguous range for delalloc,
+		 *    very likely resulting in a larger extent after writeback is
+		 *    triggered (except in a case of free space fragmentation).
+		 */
+		if (test_range_bit(&inode->io_tree, cur, cur + range_len - 1,
+				   EXTENT_DELALLOC, 0, NULL))
+			goto next;
+
 		/*
 		 * For do_compress case, we want to compress all valid file
 		 * extents, thus no @extent_thresh or mergeable check.
@@ -1221,7 +1250,7 @@ static int defrag_collect_targets(struct btrfs_inode *inode,
 			goto add;
 
 		/* Skip too large extent */
-		if (em->len >= extent_thresh)
+		if (range_len >= extent_thresh)
 			goto next;
 
 		next_mergeable = defrag_check_next_extent(&inode->vfs_inode, em,
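As a quick sanity check on the new range_len computation (the numbers below are made up purely for illustration, they do not come from the patch): if the extent map covers em->start = 0 with em->len = 256K and the defrag cursor sits at cur = 64K, then

    range_len = em->len - (cur - em->start) = 256K - (64K - 0) = 192K

so the test_range_bit() call checks EXTENT_DELALLOC over the byte range [64K, 256K - 1], i.e. only the part of the extent map from the cursor to its end, and that same range_len is what gets compared against extent_thresh further down.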
