Commit fc05e06
jankara authored and liu-song-6 committed
md/raid5: Improve performance for sequential IO
Commit 7e55c60 ("md/raid5: Pivot raid5_make_request()") changed the order in which requests for underlying disks are created. Since for large sequential IO adding of requests frequently races with md_raid5 thread submitting bios to underlying disks, this results in a change in IO pattern because intermediate states of new order of request creation result in more smaller discontiguous requests. For RAID5 on top of three rotational disks our performance testing revealed this results in regression in write throughput:

iozone -a -s 131072000 -y 4 -q 8 -i 0 -i 1 -R

before 7e55c60:
    KB         reclen  write   rewrite  read    reread
    131072000  4       493670  525964   524575  513384
    131072000  8       540467  532880   512028  513703

after 7e55c60:
    KB         reclen  write   rewrite  read    reread
    131072000  4       421785  456184   531278  509248
    131072000  8       459283  456354   528449  543834

To reduce the amount of discontiguous requests we can start generating requests with the stripe with the lowest chunk offset as that has the best chance of being adjacent to IO queued previously. This improves the performance to:

    KB         reclen  write   rewrite  read    reread
    131072000  4       497682  506317   518043  514559
    131072000  8       514048  501886   506453  504319

restoring big part of the regression.

Fixes: 7e55c60 ("md/raid5: Pivot raid5_make_request()")
Cc: stable@vger.kernel.org # v6.0+
Signed-off-by: Jan Kara <jack@suse.cz>
Reviewed-by: Logan Gunthorpe <logang@deltatee.com>
Signed-off-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230417171537.17899-1-jack@suse.cz
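The idea of starting at the sector with the lowest chunk offset can be sketched in userspace. The following is a simplified model, not the kernel code: the function name `lowest_chunk_sector` and its parameters are hypothetical, data chunks are assumed to be numbered consecutively across the data disks, and parity rotation (which the real code handles through `raid5_compute_sector()`) is ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t sector_t;

/*
 * Simplified model (assumption: chunks are laid out in order across
 * data_disks, parity placement ignored). Returns the sector inside the
 * bio from which request generation would start.
 */
static sector_t lowest_chunk_sector(sector_t bio_start,
				    unsigned int bio_sectors,
				    unsigned int chunk_sectors,
				    unsigned int data_disks,
				    unsigned int stripe_sectors)
{
	assert(chunk_sectors > 0 && data_disks > 0);
	/* The mask below only works for a power-of-two stripe size. */
	assert(stripe_sectors > 0 &&
	       (stripe_sectors & (stripe_sectors - 1)) == 0);

	/* Round the start down to a stripe boundary. */
	sector_t r_sector = bio_start & ~((sector_t)stripe_sectors - 1);
	unsigned int chunk_offset = (unsigned int)(r_sector % chunk_sectors);
	sector_t chunk_nr = r_sector / chunk_sectors;

	/* The bio fits in the rest of its first chunk: keep the start. */
	if (chunk_sectors - chunk_offset >= bio_sectors)
		return r_sector;

	/* First chunk is the last data chunk of its row: keep the start. */
	if (chunk_nr % data_disks == data_disks - 1)
		return r_sector;

	/* The next chunk is in the same row and begins at chunk offset 0. */
	return r_sector + chunk_sectors - chunk_offset;
}
```

Under these assumptions, with 128-sector chunks, two data disks and 8-sector stripes, a 64-sector bio starting at sector 96 crosses into the second chunk of the same row, so the model starts at sector 128 (chunk offset 0) rather than at sector 96 (chunk offset 96).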
1 parent 952aa34 commit fc05e06

1 file changed: +44 -1 lines changed
drivers/md/raid5.c

Lines changed: 44 additions & 1 deletion
@@ -6079,6 +6079,38 @@ static enum stripe_result make_stripe_request(struct mddev *mddev,
 	return ret;
 }

+/*
+ * If the bio covers multiple data disks, find sector within the bio that has
+ * the lowest chunk offset in the first chunk.
+ */
+static sector_t raid5_bio_lowest_chunk_sector(struct r5conf *conf,
+					      struct bio *bi)
+{
+	int sectors_per_chunk = conf->chunk_sectors;
+	int raid_disks = conf->raid_disks;
+	int dd_idx;
+	struct stripe_head sh;
+	unsigned int chunk_offset;
+	sector_t r_sector = bi->bi_iter.bi_sector & ~((sector_t)RAID5_STRIPE_SECTORS(conf)-1);
+	sector_t sector;
+
+	/* We pass in fake stripe_head to get back parity disk numbers */
+	sector = raid5_compute_sector(conf, r_sector, 0, &dd_idx, &sh);
+	chunk_offset = sector_div(sector, sectors_per_chunk);
+	if (sectors_per_chunk - chunk_offset >= bio_sectors(bi))
+		return r_sector;
+	/*
+	 * Bio crosses to the next data disk. Check whether it's in the same
+	 * chunk.
+	 */
+	dd_idx++;
+	while (dd_idx == sh.pd_idx || dd_idx == sh.qd_idx)
+		dd_idx++;
+	if (dd_idx >= raid_disks)
+		return r_sector;
+	return r_sector + sectors_per_chunk - chunk_offset;
+}
+
 static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 {
 	DEFINE_WAIT_FUNC(wait, woken_wake_function);
@@ -6150,6 +6182,17 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 	}
 	md_account_bio(mddev, &bi);

+	/*
+	 * Lets start with the stripe with the lowest chunk offset in the first
+	 * chunk. That has the best chances of creating IOs adjacent to
+	 * previous IOs in case of sequential IO and thus creates the most
+	 * sequential IO pattern. We don't bother with the optimization when
+	 * reshaping as the performance benefit is not worth the complexity.
+	 */
+	if (likely(conf->reshape_progress == MaxSector))
+		logical_sector = raid5_bio_lowest_chunk_sector(conf, bi);
+	s = (logical_sector - ctx.first_sector) >> RAID5_STRIPE_SHIFT(conf);
+
 	add_wait_queue(&conf->wait_for_overlap, &wait);
 	while (1) {
 		res = make_stripe_request(mddev, conf, &ctx, logical_sector,
@@ -6178,7 +6221,7 @@ static bool raid5_make_request(struct mddev *mddev, struct bio * bi)
 			continue;
 		}

-		s = find_first_bit(ctx.sectors_to_do, stripe_cnt);
+		s = find_next_bit_wrap(ctx.sectors_to_do, stripe_cnt, s);
 		if (s == stripe_cnt)
 			break;
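
The effect of computing a starting stripe index and then switching from `find_first_bit()` to `find_next_bit_wrap()` can be illustrated with a small userspace sketch. The helpers below (`start_stripe`, `next_bit_wrap`, `visit_order`) are hypothetical stand-ins written for this illustration, limited to 64 stripes, and are not the kernel implementations; the stripe shift of 3 (8 sectors per stripe) used in the usage note is likewise an assumed value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t sector_t;

/* Index of the stripe containing logical_sector, relative to the bio start. */
static unsigned int start_stripe(sector_t logical_sector, sector_t first_sector,
				 unsigned int stripe_shift)
{
	return (unsigned int)((logical_sector - first_sector) >> stripe_shift);
}

/*
 * First set bit at or after 'offset', wrapping around to bit 0 if needed.
 * Returns 'size' when no bit is set.
 */
static unsigned int next_bit_wrap(uint64_t bits, unsigned int size,
				  unsigned int offset)
{
	assert(size <= 64 && offset <= size);
	for (unsigned int i = offset; i < size; i++)
		if (bits & (UINT64_C(1) << i))
			return i;
	for (unsigned int i = 0; i < offset; i++)
		if (bits & (UINT64_C(1) << i))
			return i;
	return size;
}

/*
 * Record the order in which stripes are handled when starting at 'start'.
 * Returns how many entries were written to 'order'.
 */
static size_t visit_order(unsigned int stripe_cnt, unsigned int start,
			  unsigned int *order)
{
	assert(stripe_cnt >= 1 && stripe_cnt <= 64 && start < stripe_cnt);

	uint64_t to_do = stripe_cnt == 64 ? UINT64_MAX
					  : (UINT64_C(1) << stripe_cnt) - 1;
	size_t n = 0;
	unsigned int s = start;

	for (;;) {
		s = next_bit_wrap(to_do, stripe_cnt, s);
		if (s == stripe_cnt)
			break;
		order[n++] = s;
		to_do &= ~(UINT64_C(1) << s);	/* mark stripe as handled */
	}
	return n;
}
```

For a bio spanning 8 stripes whose first sector is 96, starting at logical sector 128 gives a starting index of 4, and the sketch visits stripes 4, 5, 6, 7, 0, 1, 2, 3. A search that always restarts from bit 0, as `find_first_bit()` does, would return to stripe 0 immediately after handling the first stripe, whereas the wrapping search continues forward from the chosen starting point and only returns to the earlier stripes at the end.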
