Commit 83e80a6

jankara authored and tytso committed
ext4: use buckets for cr 1 block scan instead of rbtree
Using rbtree for sorting groups by average fragment size is relatively expensive (needs rbtree update on every block freeing or allocation) and leads to wide spreading of allocations because selection of block group is very sensitive both to changes in free space and amount of blocks allocated. Furthermore selecting group with the best matching average fragment size is not necessary anyway, even more so because the variability of fragment sizes within a group is likely large so average is not telling much. We just need a group with large enough average fragment size so that we have high probability of finding large enough free extent and we don't want average fragment size to be too big so that we are likely to find free extent only somewhat larger than what we need.

So instead of maintaining rbtree of groups sorted by fragment size keep bins (lists) of groups where average fragment size is in the interval [2^i, 2^(i+1)). This structure requires fewer updates on block allocation / freeing, generally avoids chaotic spreading of allocations into block groups, and still is able to quickly (even faster than the rbtree) provide a block group which is likely to have a suitably sized free space extent.

This patch reduces number of block groups used when untarring archive with medium sized files (size somewhat above 64k which is default mballoc limit for avoiding locality group preallocation) to about half and thus improves write speeds for eMMC flash significantly.

Fixes: 196e402 ("ext4: improve cr 0 / cr 1 group scanning")
CC: stable@kernel.org
Reported-and-tested-by: Stefan Wahren <stefan.wahren@i2se.com>
Tested-by: Ojaswin Mujoo <ojaswin@linux.ibm.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com>
Link: https://lore.kernel.org/all/0d81a7c2-46b7-6010-62a4-3e6cfc1628d6@i2se.com/
Link: https://lore.kernel.org/r/20220908092136.11770-5-jack@suse.cz
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
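The binning rule in the message (a group belongs to bin i when its average fragment size lies in [2^i, 2^(i+1))) can be illustrated with a small userspace sketch. This is not the kernel's code: `NUM_BUCKETS`, `floor_log2()` and `avg_fragment_bucket()` are hypothetical names chosen for illustration, and the handling of empty groups and of the top bin is an assumption.

```c
#include <assert.h>

/* Hypothetical number of bins for this sketch; not taken from the patch. */
#define NUM_BUCKETS 14

/* Index of the highest set bit, i.e. floor(log2(v)); returns -1 for v == 0. */
static int floor_log2(unsigned int v)
{
	int order = -1;

	while (v) {
		v >>= 1;
		order++;
	}
	return order;
}

/*
 * Map a group's free-space statistics to a bin index i such that
 * 2^i <= average fragment size < 2^(i+1). Returns -1 when the group has
 * no free fragments (assumed here to be kept off every list). Averages
 * too large for the last bin are clamped into it (also an assumption).
 */
static int avg_fragment_bucket(unsigned int free_blocks, unsigned int fragments)
{
	unsigned int avg;
	int order;

	if (fragments == 0)
		return -1;
	avg = free_blocks / fragments;
	if (avg == 0)
		return -1;
	order = floor_log2(avg);
	assert(order >= 0);
	if (order >= NUM_BUCKETS)
		order = NUM_BUCKETS - 1;
	return order;
}
```

With this mapping, a group whose average fragment size drifts between 64 and 127 blocks stays in bin 6, which is why most allocations and frees would not need to touch the lists at all.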
1 parent a9f2a29 commit 83e80a6

3 files changed: +111 -149 lines changed

fs/ext4/ext4.h

Lines changed: 5 additions & 5 deletions
@@ -167,8 +167,6 @@ enum SHIFT_DIRECTION {
 #define EXT4_MB_CR0_OPTIMIZED		0x8000
 /* Avg fragment size rb tree lookup succeeded at least once for cr = 1 */
 #define EXT4_MB_CR1_OPTIMIZED		0x00010000
-/* Perform linear traversal for one group */
-#define EXT4_MB_SEARCH_NEXT_LINEAR	0x00020000
 struct ext4_allocation_request {
 	/* target inode for block we're allocating */
 	struct inode *inode;
@@ -1600,8 +1598,8 @@ struct ext4_sb_info {
 	struct list_head s_discard_list;
 	struct work_struct s_discard_work;
 	atomic_t s_retry_alloc_pending;
-	struct rb_root s_mb_avg_fragment_size_root;
-	rwlock_t s_mb_rb_lock;
+	struct list_head *s_mb_avg_fragment_size;
+	rwlock_t *s_mb_avg_fragment_size_locks;
 	struct list_head *s_mb_largest_free_orders;
 	rwlock_t *s_mb_largest_free_orders_locks;

@@ -3413,14 +3411,16 @@ struct ext4_group_info {
 	ext4_grpblk_t	bb_first_free;	/* first free block */
 	ext4_grpblk_t	bb_free;	/* total free blocks */
 	ext4_grpblk_t	bb_fragments;	/* nr of freespace fragments */
+	int		bb_avg_fragment_size_order;	/* order of average
+							   fragment in BG */
 	ext4_grpblk_t	bb_largest_free_order;/* order of largest frag in BG */
 	ext4_group_t	bb_group;	/* Group number */
 	struct list_head bb_prealloc_list;
 #ifdef DOUBLE_CHECK
 	void		*bb_bitmap;
 #endif
 	struct rw_semaphore alloc_sem;
-	struct rb_node	bb_avg_fragment_size_rb;
+	struct list_head bb_avg_fragment_size_node;
 	struct list_head bb_largest_free_order_node;
 	ext4_grpblk_t	bb_counters[];	/* Nr of free power-of-two-block
 					 * regions, index is order.
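
The fields added above (an array of per-order lists in the superblock info, plus a cached order and a list node in each group) suggest the following usage pattern. It is sketched here in plain userspace C under stated assumptions: all names (`struct group`, `struct buckets`, `group_update()`, `pick_group()`) are hypothetical, the per-list rwlocks from the diff are omitted because the sketch is single-threaded, and the selection policy (scan upward from the bin that contains the request size and take the first group whose average is large enough) is an illustrative guess rather than the mballoc implementation.

```c
#include <stddef.h>

#define NUM_BUCKETS 14 /* hypothetical; one list per power-of-two order */

/* Minimal circular doubly linked list, standing in for the kernel's list_head. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *n) { n->prev = n->next = n; }

static void list_del(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	list_init(n);
}

static void list_add_tail(struct list_node *n, struct list_node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

struct group {
	unsigned int free_blocks;
	unsigned int fragments;
	int bucket;              /* cached bin index, -1 if on no list */
	struct list_node node;
};

struct buckets {
	struct list_node heads[NUM_BUCKETS];
	unsigned long moves;     /* counts list manipulations, for illustration */
};

static void buckets_init(struct buckets *b)
{
	for (int i = 0; i < NUM_BUCKETS; i++)
		list_init(&b->heads[i]);
	b->moves = 0;
}

static void group_init(struct group *g)
{
	g->free_blocks = g->fragments = 0;
	g->bucket = -1;
	list_init(&g->node);
}

/* floor(log2(avg)) clamped to the last bin; -1 when avg == 0. */
static int bucket_of(unsigned int avg)
{
	int order = -1;

	while (avg) {
		avg >>= 1;
		order++;
	}
	return order >= NUM_BUCKETS ? NUM_BUCKETS - 1 : order;
}

/* Record new statistics; relink the group only if its bin actually changed. */
static void group_update(struct buckets *b, struct group *g,
			 unsigned int free_blocks, unsigned int fragments)
{
	int new_bucket = fragments ? bucket_of(free_blocks / fragments) : -1;

	g->free_blocks = free_blocks;
	g->fragments = fragments;
	if (new_bucket == g->bucket)
		return; /* common case: no list manipulation needed */
	if (g->bucket >= 0)
		list_del(&g->node);
	if (new_bucket >= 0)
		list_add_tail(&g->node, &b->heads[new_bucket]);
	g->bucket = new_bucket;
	b->moves++;
}

/* Return the first group whose average fragment size is at least `want`. */
static struct group *pick_group(struct buckets *b, unsigned int want)
{
	int start = bucket_of(want);

	if (start < 0)
		start = 0;
	for (int i = start; i < NUM_BUCKETS; i++) {
		struct list_node *n;

		for (n = b->heads[i].next; n != &b->heads[i]; n = n->next) {
			struct group *g = (struct group *)
				((char *)n - offsetof(struct group, node));

			/* fragments is nonzero for every group on a list */
			if (g->free_blocks / g->fragments >= want)
				return g;
		}
	}
	return NULL;
}
```

The early return in `group_update()` is the point of the design: an rbtree keyed on the exact average would need a reinsert on nearly every change, while here a group is relinked only when its average crosses a power-of-two boundary.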
