Commit 03de20b

ifishland authored and tytso committed
ext4: do not mark inode dirty every time when appending using delalloc

In the delalloc append write scenario, if inode's i_size is extended due
to buffer write, there are delalloc writes pending in the range up to
i_size, and no need to touch i_disksize since writeback will push
i_disksize up to i_size eventually. Offers significant performance
improvement in high-frequency append write scenarios.

I conducted tests in my 32-core environment by launching 32 concurrent
threads to append write to the same file. Each write operation had a
length of 1024 bytes and was repeated 100000 times. Without using this
patch, the test was completed in 7705 ms. However, with this patch, the
test was completed in 5066 ms, resulting in a performance improvement of
34%.

Moreover, in test scenarios of Kafka version 2.6.2, using packet size of
2K, with this patch resulted in a 10% performance improvement.

Signed-off-by: Liu Song <liusong@linux.alibaba.com>
Suggested-by: Jan Kara <jack@suse.cz>
Reviewed-by: Jan Kara <jack@suse.cz>
Link: https://lore.kernel.org/r/20230810154333.84921-1-liusong@linux.alibaba.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>

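The append test described in the commit message can be approximated in user space. The following is only a sketch, not the author's actual test program: the names `worker_arg`, `append_worker`, and `run_append_bench` are hypothetical, and it assumes the threads share one file descriptor opened with `O_APPEND`, which the commit message does not state.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-thread parameters; not taken from the commit. */
struct worker_arg {
	int fd;		/* shared descriptor, opened with O_APPEND (assumption) */
	int writes;	/* number of write() calls this thread issues */
	size_t len;	/* bytes per write() */
	int failed;	/* set to 1 on allocation failure or short write */
};

/* Append `writes` buffers of `len` bytes each to the shared file. */
static void *append_worker(void *p)
{
	struct worker_arg *arg = p;
	char *buf = malloc(arg->len);

	if (!buf) {
		arg->failed = 1;
		return NULL;
	}
	memset(buf, 'a', arg->len);
	for (int i = 0; i < arg->writes; i++) {
		if (write(arg->fd, buf, arg->len) != (ssize_t)arg->len) {
			arg->failed = 1;
			break;
		}
	}
	free(buf);
	return NULL;
}

/*
 * Run `nthreads` appenders against `path` and return the elapsed wall
 * time in milliseconds, or -1 on error. The resulting file size is
 * stored in *final_size on success.
 */
static long run_append_bench(const char *path, int nthreads, int writes,
			     size_t len, off_t *final_size)
{
	assert(nthreads > 0 && writes >= 0 && len > 0);

	pthread_t *tids = calloc(nthreads, sizeof(*tids));
	struct worker_arg *args = calloc(nthreads, sizeof(*args));
	struct timespec t0, t1;
	struct stat st;
	long ms = -1;
	int started = 0, failed = 0;
	int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_APPEND, 0644);

	if (fd < 0 || !tids || !args)
		goto out;

	clock_gettime(CLOCK_MONOTONIC, &t0);
	for (; started < nthreads; started++) {
		args[started] = (struct worker_arg){ fd, writes, len, 0 };
		if (pthread_create(&tids[started], NULL, append_worker,
				   &args[started]) != 0) {
			failed = 1;
			break;
		}
	}
	for (int i = 0; i < started; i++) {
		pthread_join(tids[i], NULL);
		failed |= args[i].failed;
	}
	clock_gettime(CLOCK_MONOTONIC, &t1);

	if (!failed && fstat(fd, &st) == 0) {
		*final_size = st.st_size;
		ms = (t1.tv_sec - t0.tv_sec) * 1000L +
		     (t1.tv_nsec - t0.tv_nsec) / 1000000L;
	}
out:
	if (fd >= 0)
		close(fd);
	free(tids);
	free(args);
	return ms;
}
```

Calling `run_append_bench(path, 32, 100000, 1024, &size)` on a file that lives on an ext4 filesystem mounted with delayed allocation mirrors the parameters quoted above. Absolute timings depend on hardware and kernel configuration, so only a before/after comparison on the same machine is meaningful.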
1 parent bb15cea commit 03de20b

1 file changed: +62 -26 lines changed

fs/ext4/inode.c

Lines changed: 62 additions & 26 deletions
@@ -2935,14 +2935,73 @@ static int ext4_da_should_update_i_disksize(struct folio *folio,
 	return 1;
 }
 
+static int ext4_da_do_write_end(struct address_space *mapping,
+			loff_t pos, unsigned len, unsigned copied,
+			struct page *page)
+{
+	struct inode *inode = mapping->host;
+	loff_t old_size = inode->i_size;
+	bool disksize_changed = false;
+	loff_t new_i_size;
+
+	/*
+	 * block_write_end() will mark the inode as dirty with I_DIRTY_PAGES
+	 * flag, which all that's needed to trigger page writeback.
+	 */
+	copied = block_write_end(NULL, mapping, pos, len, copied, page, NULL);
+	new_i_size = pos + copied;
+
+	/*
+	 * It's important to update i_size while still holding page lock,
+	 * because page writeout could otherwise come in and zero beyond
+	 * i_size.
+	 *
+	 * Since we are holding inode lock, we are sure i_disksize <=
+	 * i_size. We also know that if i_disksize < i_size, there are
+	 * delalloc writes pending in the range up to i_size. If the end of
+	 * the current write is <= i_size, there's no need to touch
+	 * i_disksize since writeback will push i_disksize up to i_size
+	 * eventually. If the end of the current write is > i_size and
+	 * inside an allocated block which ext4_da_should_update_i_disksize()
+	 * checked, we need to update i_disksize here as certain
+	 * ext4_writepages() paths not allocating blocks and update i_disksize.
+	 */
+	if (new_i_size > inode->i_size) {
+		unsigned long end;
+
+		i_size_write(inode, new_i_size);
+		end = (new_i_size - 1) & (PAGE_SIZE - 1);
+		if (copied && ext4_da_should_update_i_disksize(page_folio(page), end)) {
+			ext4_update_i_disksize(inode, new_i_size);
+			disksize_changed = true;
+		}
+	}
+
+	unlock_page(page);
+	put_page(page);
+
+	if (old_size < pos)
+		pagecache_isize_extended(inode, old_size, pos);
+
+	if (disksize_changed) {
+		handle_t *handle;
+
+		handle = ext4_journal_start(inode, EXT4_HT_INODE, 2);
+		if (IS_ERR(handle))
+			return PTR_ERR(handle);
+		ext4_mark_inode_dirty(handle, inode);
+		ext4_journal_stop(handle);
+	}
+
+	return copied;
+}
+
 static int ext4_da_write_end(struct file *file,
 			     struct address_space *mapping,
 			     loff_t pos, unsigned len, unsigned copied,
 			     struct page *page, void *fsdata)
 {
 	struct inode *inode = mapping->host;
-	loff_t new_i_size;
-	unsigned long start, end;
 	int write_mode = (int)(unsigned long)fsdata;
 	struct folio *folio = page_folio(page);
 
@@ -2961,30 +3020,7 @@ static int ext4_da_write_end(struct file *file,
 	if (unlikely(copied < len) && !PageUptodate(page))
 		copied = 0;
 
-	start = pos & (PAGE_SIZE - 1);
-	end = start + copied - 1;
-
-	/*
-	 * Since we are holding inode lock, we are sure i_disksize <=
-	 * i_size. We also know that if i_disksize < i_size, there are
-	 * delalloc writes pending in the range upto i_size. If the end of
-	 * the current write is <= i_size, there's no need to touch
-	 * i_disksize since writeback will push i_disksize upto i_size
-	 * eventually. If the end of the current write is > i_size and
-	 * inside an allocated block (ext4_da_should_update_i_disksize()
-	 * check), we need to update i_disksize here as certain
-	 * ext4_writepages() paths not allocating blocks update i_disksize.
-	 *
-	 * Note that we defer inode dirtying to generic_write_end() /
-	 * ext4_da_write_inline_data_end().
-	 */
-	new_i_size = pos + copied;
-	if (copied && new_i_size > inode->i_size &&
-	    ext4_da_should_update_i_disksize(folio, end))
-		ext4_update_i_disksize(inode, new_i_size);
-
-	return generic_write_end(file, mapping, pos, len, copied, &folio->page,
-				 fsdata);
+	return ext4_da_do_write_end(mapping, pos, len, copied, &folio->page);
 }
 
 /*

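The size bookkeeping in `ext4_da_do_write_end()` can be illustrated with a small user-space model. This is a simplified sketch and not kernel code: `model_inode`, `model_da_write_end`, and `model_old_write_end` are hypothetical names, the `block_allocated` flag stands in for the result of `ext4_da_should_update_i_disksize()`, and `dirty_count` merely counts how often the model would have dirtied the inode. The "old" variant assumes, as the removed comment indicates, that inode dirtying was deferred to `generic_write_end()`, which dirties the inode whenever `i_size` grows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the size fields tracked per inode. */
struct model_inode {
	int64_t i_size;		/* in-memory file size */
	int64_t i_disksize;	/* size recorded in the on-disk inode */
	unsigned dirty_count;	/* times the model dirtied the inode */
};

/*
 * Model of the patched path: the inode is dirtied only when
 * i_disksize is actually updated.
 */
static int64_t model_da_write_end(struct model_inode *inode, int64_t pos,
				  int64_t copied, bool block_allocated)
{
	int64_t new_i_size = pos + copied;
	bool disksize_changed = false;

	/* Invariant stated in the kernel comment: i_disksize <= i_size. */
	assert(inode->i_disksize <= inode->i_size);

	if (new_i_size > inode->i_size) {
		inode->i_size = new_i_size;
		if (copied && block_allocated) {
			/* Like ext4_update_i_disksize(): only ever grows. */
			if (new_i_size > inode->i_disksize)
				inode->i_disksize = new_i_size;
			disksize_changed = true;
		}
	}

	if (disksize_changed)
		inode->dirty_count++;

	return copied;
}

/*
 * Model of the pre-patch path (assumption, see lead-in): the inode is
 * dirtied every time the write extends i_size.
 */
static int64_t model_old_write_end(struct model_inode *inode, int64_t pos,
				   int64_t copied, bool block_allocated)
{
	int64_t new_i_size = pos + copied;

	assert(inode->i_disksize <= inode->i_size);

	if (copied && new_i_size > inode->i_size && block_allocated &&
	    new_i_size > inode->i_disksize)
		inode->i_disksize = new_i_size;

	if (new_i_size > inode->i_size) {
		inode->i_size = new_i_size;
		inode->dirty_count++;
	}

	return copied;
}
```

Feeding both models a run of 1024-byte appends into not-yet-allocated blocks (`block_allocated == false`) leaves `dirty_count` at zero in the patched model, while the old model increments it on every append. That difference is the per-write work the commit avoids.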