Commit 2f6397e

fdmanana authored and kdave committed
btrfs: don't refill whole delayed refs block reserve when starting transaction
Since commit 28270e2 ("btrfs: always reserve space for delayed refs when starting transaction") we started not only to reserve metadata space for the delayed refs a caller of btrfs_start_transaction() might generate but also to try to fully refill the delayed refs block reserve, because there are several cases where we generate delayed refs and haven't reserved space for them, relying on the global block reserve. Relying too much on the global block reserve is not always safe, and can result in hitting -ENOSPC during transaction commits or worse, in rare cases, being unable to mount a filesystem that needs to do orphan cleanup or anything that requires modifying the filesystem during mount, and has no more unallocated space and the metadata space is nearly full. This was explained in detail in that commit's change log.

However the gap between the reserved amount and the size of the delayed refs block reserve can be huge, so attempting to reserve space for such a gap can result in allocating many metadata block groups that end up not being used. After a recent patch, with the subject:

  "btrfs: add new unused block groups to the list of unused block groups"

we started to add new block groups that are unused to the list of unused block groups, to avoid having them around for a very long time in case they are never used, because a block group is only added to the list of unused block groups when we deallocate the last extent or when mounting the filesystem and the block group has 0 bytes used. This is not a problem introduced by the commit mentioned earlier, it always existed as our metadata space reservations are, most of the time, pessimistic and end up not using all the space they reserved, so we can occasionally end up with one or two unused metadata block groups for a long period. However after that commit mentioned earlier, we are just more pessimistic in the metadata space reservations when starting a transaction and therefore the issue is more likely to happen.

This however is not always enough because we might create unused metadata block groups when reserving metadata space at a high rate if there's always a gap in the delayed refs block reserve and the cleaner kthread isn't triggered often enough or is busy with other work (running delayed iputs, cleaning deleted roots, etc), not to mention the block group's allocated space is only usable for a new block group after the transaction used to remove it is committed.

A user reported that he's getting a lot of allocated metadata block groups but the usage percentage of metadata space was very low compared to the total allocated space, especially after running a series of block group relocations.

So for now stop trying to refill the gap in the delayed refs block reserve and reserve space only for the delayed refs we are expected to generate when starting a transaction.

CC: stable@vger.kernel.org # 6.7+
Reported-by: Ivan Shapovalov <intelfx@intelfx.name>
Link: https://lore.kernel.org/linux-btrfs/9cdbf0ca9cdda1b4c84e15e548af7d7f9f926382.camel@intelfx.name/
Link: https://lore.kernel.org/linux-btrfs/CAL3q7H6802ayLHUJFztzZAVzBLJAGdFx=6FHNNy87+obZXXZpQ@mail.gmail.com/
Tested-by: Ivan Shapovalov <intelfx@intelfx.name>
Reported-by: Heddxh <g311571057@gmail.com>
Link: https://lore.kernel.org/linux-btrfs/CAE93xANEby6RezOD=zcofENYZOT-wpYygJyauyUAZkLv6XVFOA@mail.gmail.com/
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
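To make the size of the change concrete, here is a minimal user-space sketch of the byte counts requested before and after this commit. It is not kernel code: `struct rsv_model`, `rsv_gap()`, `bytes_before()` and `bytes_after()` are hypothetical names introduced only for illustration, and the model ignores locking and the fact that the old code added the gap only for BTRFS_RESERVE_FLUSH_ALL.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for struct btrfs_block_rsv (illustration only). */
struct rsv_model {
	uint64_t size;     /* bytes the reserve is expected to need */
	uint64_t reserved; /* bytes actually reserved so far */
};

/* Gap between the reserve's size and what it holds; 0 when it is full. */
static uint64_t rsv_gap(const struct rsv_model *rsv)
{
	assert(rsv != NULL);
	return rsv->size > rsv->reserved ? rsv->size - rsv->reserved : 0;
}

/*
 * Before this commit (FLUSH_ALL case): the caller's bytes plus the whole
 * gap in the delayed refs block reserve.
 */
static uint64_t bytes_before(const struct rsv_model *delayed_refs_rsv,
			     uint64_t num_bytes, uint64_t delayed_refs_bytes)
{
	return num_bytes + delayed_refs_bytes + rsv_gap(delayed_refs_rsv);
}

/* After this commit: only what the caller is expected to generate. */
static uint64_t bytes_after(uint64_t num_bytes, uint64_t delayed_refs_bytes)
{
	return num_bytes + delayed_refs_bytes;
}
```

With a large gap, `bytes_before()` can exceed `bytes_after()` by far more than the caller's own needs, which is the over-reservation the change log describes as leading to unused metadata block groups.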
1 parent 88e81a6 commit 2f6397e

1 file changed, +2 -36 lines changed

fs/btrfs/transaction.c

Lines changed: 2 additions & 36 deletions
@@ -564,56 +564,22 @@ static int btrfs_reserve_trans_metadata(struct btrfs_fs_info *fs_info,
 				       u64 num_bytes,
 				       u64 *delayed_refs_bytes)
 {
-	struct btrfs_block_rsv *delayed_refs_rsv = &fs_info->delayed_refs_rsv;
 	struct btrfs_space_info *si = fs_info->trans_block_rsv.space_info;
-	u64 extra_delayed_refs_bytes = 0;
-	u64 bytes;
+	u64 bytes = num_bytes + *delayed_refs_bytes;
 	int ret;
 
-	/*
-	 * If there's a gap between the size of the delayed refs reserve and
-	 * its reserved space, than some tasks have added delayed refs or bumped
-	 * its size otherwise (due to block group creation or removal, or block
-	 * group item update). Also try to allocate that gap in order to prevent
-	 * using (and possibly abusing) the global reserve when committing the
-	 * transaction.
-	 */
-	if (flush == BTRFS_RESERVE_FLUSH_ALL &&
-	    !btrfs_block_rsv_full(delayed_refs_rsv)) {
-		spin_lock(&delayed_refs_rsv->lock);
-		if (delayed_refs_rsv->size > delayed_refs_rsv->reserved)
-			extra_delayed_refs_bytes = delayed_refs_rsv->size -
-				delayed_refs_rsv->reserved;
-		spin_unlock(&delayed_refs_rsv->lock);
-	}
-
-	bytes = num_bytes + *delayed_refs_bytes + extra_delayed_refs_bytes;
-
 	/*
 	 * We want to reserve all the bytes we may need all at once, so we only
 	 * do 1 enospc flushing cycle per transaction start.
 	 */
 	ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
-	if (ret == 0) {
-		if (extra_delayed_refs_bytes > 0)
-			btrfs_migrate_to_delayed_refs_rsv(fs_info,
-							  extra_delayed_refs_bytes);
-		return 0;
-	}
-
-	if (extra_delayed_refs_bytes > 0) {
-		bytes -= extra_delayed_refs_bytes;
-		ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
-		if (ret == 0)
-			return 0;
-	}
 
 	/*
 	 * If we are an emergency flush, which can steal from the global block
 	 * reserve, then attempt to not reserve space for the delayed refs, as
 	 * we will consume space for them from the global block reserve.
 	 */
-	if (flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
+	if (ret && flush == BTRFS_RESERVE_FLUSH_ALL_STEAL) {
 		bytes -= *delayed_refs_bytes;
 		*delayed_refs_bytes = 0;
 		ret = btrfs_reserve_metadata_bytes(fs_info, si, bytes, flush);
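The control flow that remains after the hunk above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions: `enum flush_model`, `struct space_model`, `reserve_model()` and `reserve_trans_metadata_model()` are hypothetical stand-ins, the real btrfs_reserve_metadata_bytes() does flushing that is not modelled here, and the final `return ret;` is assumed because the tail of the function is not shown in the diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flush modes mirroring the two that matter in the diff. */
enum flush_model { FLUSH_ALL_MODEL, FLUSH_ALL_STEAL_MODEL };

/* Hypothetical stand-in for the space info: just a count of free bytes. */
struct space_model {
	uint64_t free_bytes;
};

/*
 * Stand-in for btrfs_reserve_metadata_bytes(): returns 0 on success and
 * -1 (think -ENOSPC) when there is not enough free space.
 */
static int reserve_model(struct space_model *si, uint64_t bytes)
{
	if (bytes > si->free_bytes)
		return -1;
	si->free_bytes -= bytes;
	return 0;
}

static int reserve_trans_metadata_model(struct space_model *si,
					enum flush_model flush,
					uint64_t num_bytes,
					uint64_t *delayed_refs_bytes)
{
	uint64_t bytes;
	int ret;

	assert(si != NULL && delayed_refs_bytes != NULL);
	bytes = num_bytes + *delayed_refs_bytes;

	/* One attempt covering everything the caller asked for. */
	ret = reserve_model(si, bytes);

	/*
	 * Only on failure, and only in the stealing mode, retry without the
	 * delayed refs part and tell the caller none was reserved.
	 */
	if (ret && flush == FLUSH_ALL_STEAL_MODEL) {
		bytes -= *delayed_refs_bytes;
		*delayed_refs_bytes = 0;
		ret = reserve_model(si, bytes);
	}

	return ret; /* assumed: the end of the real function is not in the diff */
}
```

The added `ret &&` check matters in this model: once the early `return 0` paths are removed, a successful first reservation must not fall through into the stealing branch and reserve a second time.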
