Commit 3ca8273

fdmanana authored and kdave committed
btrfs: fix failure to rebuild free space tree using multiple transactions
If we are rebuilding a free space tree, while modifying the free space
tree we may need to allocate a new metadata block group.
If we end up using multiple transactions for the rebuild, when we call
btrfs_end_transaction() we enter btrfs_create_pending_block_groups()
which calls add_block_group_free_space() to add items to the free space
tree for the block group.

Then later during the free space tree rebuild, at
btrfs_rebuild_free_space_tree(), we may find such new block groups
and call populate_free_space_tree() for them, which fails with -EEXIST
because there are already items in the free space tree. Then we abort the
transaction with -EEXIST at btrfs_rebuild_free_space_tree().
Notice that we say "may find" the new block groups because a new block
group may be inserted in the block groups rbtree, which is being iterated
by the rebuild process, before or after the node the rebuild process is
currently at.

Syzbot recently reported such a case, which produces a trace like the
following:

  ------------[ cut here ]------------
  BTRFS: Transaction aborted (error -17)
  WARNING: CPU: 1 PID: 7626 at fs/btrfs/free-space-tree.c:1341 btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
  Modules linked in:
  CPU: 1 UID: 0 PID: 7626 Comm: syz.2.25 Not tainted 6.15.0-rc7-syzkaller-00085-gd7fa1af5b33e-dirty #0 PREEMPT
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/07/2025
  pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
  pc : btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
  lr : btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341
  sp : ffff80009c4f7740
  x29: ffff80009c4f77b0 x28: ffff0000d4c3f400 x27: 0000000000000000
  x26: dfff800000000000 x25: ffff70001389eee8 x24: 0000000000000003
  x23: 1fffe000182b6e7b x22: 0000000000000000 x21: ffff0000c15b73d8
  x20: 00000000ffffffef x19: ffff0000c15b7378 x18: 1fffe0003386f276
  x17: ffff80008f31e000 x16: ffff80008adbe98c x15: 0000000000000001
  x14: 1fffe0001b281550 x13: 0000000000000000 x12: 0000000000000000
  x11: ffff60001b281551 x10: 0000000000000003 x9 : 1c8922000a902c00
  x8 : 1c8922000a902c00 x7 : ffff800080485878 x6 : 0000000000000000
  x5 : 0000000000000001 x4 : 0000000000000001 x3 : ffff80008047843c
  x2 : 0000000000000001 x1 : ffff80008b3ebc40 x0 : 0000000000000001
  Call trace:
   btrfs_rebuild_free_space_tree+0x470/0x54c fs/btrfs/free-space-tree.c:1341 (P)
   btrfs_start_pre_rw_mount+0xa78/0xe10 fs/btrfs/disk-io.c:3074
   btrfs_remount_rw fs/btrfs/super.c:1319 [inline]
   btrfs_reconfigure+0x828/0x2418 fs/btrfs/super.c:1543
   reconfigure_super+0x1d4/0x6f0 fs/super.c:1083
   do_remount fs/namespace.c:3365 [inline]
   path_mount+0xb34/0xde0 fs/namespace.c:4200
   do_mount fs/namespace.c:4221 [inline]
   __do_sys_mount fs/namespace.c:4432 [inline]
   __se_sys_mount fs/namespace.c:4409 [inline]
   __arm64_sys_mount+0x3e8/0x468 fs/namespace.c:4409
   __invoke_syscall arch/arm64/kernel/syscall.c:35 [inline]
   invoke_syscall+0x98/0x2b8 arch/arm64/kernel/syscall.c:49
   el0_svc_common+0x130/0x23c arch/arm64/kernel/syscall.c:132
   do_el0_svc+0x48/0x58 arch/arm64/kernel/syscall.c:151
   el0_svc+0x58/0x17c arch/arm64/kernel/entry-common.c:767
   el0t_64_sync_handler+0x78/0x108 arch/arm64/kernel/entry-common.c:786
   el0t_64_sync+0x198/0x19c arch/arm64/kernel/entry.S:600
  irq event stamp: 330
  hardirqs last enabled at (329): [<ffff80008048590c>] raw_spin_rq_unlock_irq kernel/sched/sched.h:1525 [inline]
  hardirqs last enabled at (329): [<ffff80008048590c>] finish_lock_switch+0xb0/0x1c0 kernel/sched/core.c:5130
  hardirqs last disabled at (330): [<ffff80008adb9e60>] el1_dbg+0x24/0x80 arch/arm64/kernel/entry-common.c:511
  softirqs last enabled at (10): [<ffff8000801fbf10>] local_bh_enable+0x10/0x34 include/linux/bottom_half.h:32
  softirqs last disabled at (8): [<ffff8000801fbedc>] local_bh_disable+0x10/0x34 include/linux/bottom_half.h:19
  ---[ end trace 0000000000000000 ]---

Fix this by flagging new block groups which had their free space tree
entries already added and then skip them in the rebuild process. Also,
since the rebuild may be triggered when doing a remount, make sure that
when we clear an existing free space tree we also clear that flag from
every existing block group, otherwise we would skip those block groups
during the rebuild.

Reported-by: syzbot+d0014fb0fc39c5487ae5@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/68460a54.050a0220.daf97.0af5.GAE@google.com/
Fixes: 882af9f ("btrfs: handle free space tree rebuild in multiple transactions")
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
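The failure described above can be reproduced in miniature with a userspace toy model. The sketch below is not kernel code: every name in it (toy_bg, toy_fs, toy_populate(), toy_end_transaction(), toy_rebuild(), toy_run()) is hypothetical, an array stands in for the block groups rbtree, and it assumes the new block group always lands after the current iteration position, which is the case that triggers the bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BG 8

/* Toy stand-in for struct btrfs_block_group (hypothetical). */
struct toy_bg {
	bool in_tree;          /* items for this group exist in the toy tree */
	bool free_space_added; /* models BLOCK_GROUP_FLAG_FREE_SPACE_ADDED */
};

struct toy_fs {
	struct toy_bg bgs[MAX_BG];
	size_t nr_bgs;
	bool pending_new_bg;   /* a group was allocated during the rebuild */
};

/* Models populate_free_space_tree(): fails if items already exist. */
static int toy_populate(struct toy_bg *bg)
{
	if (bg->in_tree)
		return -EEXIST;
	bg->in_tree = true;
	return 0;
}

/*
 * Models btrfs_end_transaction() finishing a pending block group: the new
 * group gets its items added immediately, is flagged, and is appended after
 * the current iteration position.
 */
static void toy_end_transaction(struct toy_fs *fs)
{
	struct toy_bg *bg;

	if (!fs->pending_new_bg)
		return;
	assert(fs->nr_bgs < MAX_BG);
	bg = &fs->bgs[fs->nr_bgs++];
	bg->in_tree = true;
	bg->free_space_added = true;
	fs->pending_new_bg = false;
}

/* Models the rebuild loop; use_flag selects the fixed behaviour. */
static int toy_rebuild(struct toy_fs *fs, bool use_flag)
{
	for (size_t i = 0; i < fs->nr_bgs; i++) {
		struct toy_bg *bg = &fs->bgs[i];

		if (!(use_flag && bg->free_space_added)) {
			int ret = toy_populate(bg);

			if (ret)
				return ret;
		}
		/* Pretend every iteration ends the current transaction. */
		toy_end_transaction(fs);
	}
	return 0;
}

/* One rebuild over two existing groups plus one allocated mid-way. */
static int toy_run(bool use_flag)
{
	struct toy_fs fs = { .nr_bgs = 2, .pending_new_bg = true };

	return toy_rebuild(&fs, use_flag);
}
```

In this model toy_run(false) returns -EEXIST when the loop reaches the group created mid-rebuild, while toy_run(true) skips that group and returns 0.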
1 parent c306346 commit 3ca8273

2 files changed: 42 additions, 0 deletions

fs/btrfs/block-group.h

Lines changed: 2 additions & 0 deletions
@@ -83,6 +83,8 @@ enum btrfs_block_group_flags {
 	BLOCK_GROUP_FLAG_ZONED_DATA_RELOC,
 	/* Does the block group need to be added to the free space tree? */
 	BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE,
+	/* Set after we add a new block group to the free space tree. */
+	BLOCK_GROUP_FLAG_FREE_SPACE_ADDED,
 	/* Indicate that the block group is placed on a sequential zone */
 	BLOCK_GROUP_FLAG_SEQUENTIAL_ZONE,
 	/*

fs/btrfs/free-space-tree.c

Lines changed: 40 additions & 0 deletions
@@ -1241,6 +1241,7 @@ static int clear_free_space_tree(struct btrfs_trans_handle *trans,
 {
 	BTRFS_PATH_AUTO_FREE(path);
 	struct btrfs_key key;
+	struct rb_node *node;
 	int nr;
 	int ret;

@@ -1269,6 +1270,16 @@ static int clear_free_space_tree(struct btrfs_trans_handle *trans,
 		btrfs_release_path(path);
 	}

+	node = rb_first_cached(&trans->fs_info->block_group_cache_tree);
+	while (node) {
+		struct btrfs_block_group *bg;
+
+		bg = rb_entry(node, struct btrfs_block_group, cache_node);
+		clear_bit(BLOCK_GROUP_FLAG_FREE_SPACE_ADDED, &bg->runtime_flags);
+		node = rb_next(node);
+		cond_resched();
+	}
+
 	return 0;
 }

@@ -1358,12 +1369,18 @@ int btrfs_rebuild_free_space_tree(struct btrfs_fs_info *fs_info)

 		block_group = rb_entry(node, struct btrfs_block_group,
 				       cache_node);
+
+		if (test_bit(BLOCK_GROUP_FLAG_FREE_SPACE_ADDED,
+			     &block_group->runtime_flags))
+			goto next;
+
 		ret = populate_free_space_tree(trans, block_group);
 		if (ret) {
 			btrfs_abort_transaction(trans, ret);
 			btrfs_end_transaction(trans);
 			return ret;
 		}
+next:
 		if (btrfs_should_end_transaction(trans)) {
 			btrfs_end_transaction(trans);
 			trans = btrfs_start_transaction(free_space_root, 1);
@@ -1390,6 +1407,29 @@ static int __add_block_group_free_space(struct btrfs_trans_handle *trans,

 	clear_bit(BLOCK_GROUP_FLAG_NEEDS_FREE_SPACE, &block_group->runtime_flags);

+	/*
+	 * While rebuilding the free space tree we may allocate new metadata
+	 * block groups while modifying the free space tree.
+	 *
+	 * Because during the rebuild (at btrfs_rebuild_free_space_tree()) we
+	 * can use multiple transactions, every time btrfs_end_transaction() is
+	 * called at btrfs_rebuild_free_space_tree() we finish the creation of
+	 * new block groups by calling btrfs_create_pending_block_groups(), and
+	 * that in turn calls us, through add_block_group_free_space(), to add
+	 * a free space info item and a free space extent item for the block
+	 * group.
+	 *
+	 * Then later btrfs_rebuild_free_space_tree() may find such new block
+	 * groups and processes them with populate_free_space_tree(), which can
+	 * fail with EEXIST since there are already items for the block group in
+	 * the free space tree. Notice that we say "may find" because a new
+	 * block group may be added to the block groups rbtree in a node before
+	 * or after the block group currently being processed by the rebuild
+	 * process. So signal the rebuild process to skip such new block groups
+	 * if it finds them.
+	 */
+	set_bit(BLOCK_GROUP_FLAG_FREE_SPACE_ADDED, &block_group->runtime_flags);
+
 	ret = add_new_free_space_info(trans, block_group, path);
 	if (ret)
 		return ret;
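
The commit message also notes that the flag must be cleared when an existing free space tree is cleared (for example on remount), otherwise flagged block groups would be skipped by the next rebuild. A minimal userspace sketch of that interaction follows. It is a toy model, not kernel code: sim_bg, sim_clear_tree(), sim_rebuild() and sim_remount_rebuild() are hypothetical names, and a fixed array stands in for the block groups rbtree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_NR_BG 3

/* Toy stand-in for struct btrfs_block_group (hypothetical). */
struct sim_bg {
	bool in_tree;          /* items for this group exist in the toy tree */
	bool free_space_added; /* models BLOCK_GROUP_FLAG_FREE_SPACE_ADDED */
};

/* Models clear_free_space_tree(): drop all items, optionally reset flags. */
static void sim_clear_tree(struct sim_bg *bgs, size_t n, bool clear_flags)
{
	for (size_t i = 0; i < n; i++) {
		bgs[i].in_tree = false;
		if (clear_flags)
			bgs[i].free_space_added = false;
	}
}

/* Models the rebuild loop with the skip check; returns groups populated. */
static size_t sim_rebuild(struct sim_bg *bgs, size_t n)
{
	size_t populated = 0;

	for (size_t i = 0; i < n; i++) {
		if (bgs[i].free_space_added)
			continue; /* skipped, as with "goto next" in the patch */
		bgs[i].in_tree = true;
		populated++;
	}
	return populated;
}

/*
 * Simulates clearing and rebuilding the tree when one block group still
 * carries the flag from an earlier transaction.
 */
static size_t sim_remount_rebuild(bool clear_flags)
{
	struct sim_bg bgs[SIM_NR_BG] = {
		{ .in_tree = true, .free_space_added = false },
		{ .in_tree = true, .free_space_added = true },
		{ .in_tree = true, .free_space_added = false },
	};

	sim_clear_tree(bgs, SIM_NR_BG, clear_flags);
	return sim_rebuild(bgs, SIM_NR_BG);
}
```

With the flags cleared, all three groups are repopulated. Without the clearing step, the flagged group is skipped and ends up with no entries in the toy tree, which is the situation the loop added to clear_free_space_tree() is meant to avoid.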
