Skip to content

Commit f01d819

Browse files
fdmananaSasha Levin
authored andcommitted
btrfs: fix fsync of files with no hard links not persisting deletion
[ Upstream commit 5e85262 ] If we fsync a file (or directory) that has no more hard links, because while a process had a file descriptor open on it, the file's last hard link was removed and then the process did an fsync against the file descriptor, after a power failure or crash the file still exists after replaying the log. This behaviour is incorrect since once an inode has no more hard links it's not accessible anymore and we insert an orphan item into its subvolume's tree so that the deletion of all its items is not missed in case of a power failure or crash. So after log replay the file shouldn't exist anymore, which is also the behaviour on ext4, xfs, f2fs and other filesystems. Fix this by not ignoring inodes with zero hard links at btrfs_log_inode_parent() and by committing an inode's delayed inode when we are not doing a fast fsync (either BTRFS_INODE_COPY_EVERYTHING or BTRFS_INODE_NEEDS_FULL_SYNC is set in the inode's runtime flags). This last step is necessary because when removing the last hard link we don't delete the corresponding ref (or extref) item, instead we record the change in the inode's delayed inode with the BTRFS_DELAYED_NODE_DEL_IREF flag, so that when the delayed inode is committed we delete the ref/extref item from the inode's subvolume tree - otherwise the logging code will log the last hard link and therefore upon log replay the inode is not deleted. The base code for a fstests test case that reproduces this bug is the following: . ./common/dmflakey _require_scratch _require_dm_target flakey _require_mknod _scratch_mkfs >>$seqres.full 2>&1 || _fail "mkfs failed" _require_metadata_journaling $SCRATCH_DEV _init_flakey _mount_flakey touch $SCRATCH_MNT/foo # Commit the current transaction and persist the file. _scratch_sync # A fifo to communicate with a background xfs_io process that will # fsync the file after we deleted its hard link while it's open by # xfs_io. mkfifo $SCRATCH_MNT/fifo tail -f $SCRATCH_MNT/fifo | \ $XFS_IO_PROG $SCRATCH_MNT/foo >>$seqres.full & XFS_IO_PID=$! # Give some time for the xfs_io process to open a file descriptor for # the file. sleep 1 # Now while the file is open by the xfs_io process, delete its only # hard link. rm -f $SCRATCH_MNT/foo # Now that it has no more hard links, make the xfs_io process fsync it. echo "fsync" > $SCRATCH_MNT/fifo # Terminate the xfs_io process so that we can unmount. echo "quit" > $SCRATCH_MNT/fifo wait $XFS_IO_PID unset XFS_IO_PID # Simulate a power failure and then mount again the filesystem to # replay the journal/log. _flakey_drop_and_remount # We don't expect the file to exist anymore, since it was fsynced when # it had no more hard links. [ -f $SCRATCH_MNT/foo ] && echo "file foo still exists" _unmount_flakey # success, all done echo "Silence is golden" status=0 exit A test case for fstests will be submitted soon. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
1 parent 898dbb9 commit f01d819

File tree

1 file changed

+16
-8
lines changed

1 file changed

+16
-8
lines changed

fs/btrfs/tree-log.c

Lines changed: 16 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -6583,6 +6583,19 @@ static int btrfs_log_inode(struct btrfs_trans_handle *trans,
65836583
btrfs_log_get_delayed_items(inode, &delayed_ins_list,
65846584
&delayed_del_list);
65856585

6586+
/*
6587+
* If we are fsyncing a file with 0 hard links, then commit the delayed
6588+
* inode because the last inode ref (or extref) item may still be in the
6589+
* subvolume tree and if we log it the file will still exist after a log
6590+
* replay. So commit the delayed inode to delete that last ref and we
6591+
* skip logging it.
6592+
*/
6593+
if (inode->vfs_inode.i_nlink == 0) {
6594+
ret = btrfs_commit_inode_delayed_inode(inode);
6595+
if (ret)
6596+
goto out_unlock;
6597+
}
6598+
65866599
ret = copy_inode_items_to_log(trans, inode, &min_key, &max_key,
65876600
path, dst_path, logged_isize,
65886601
inode_only, ctx,
@@ -7051,14 +7064,9 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
70517064
if (btrfs_root_generation(&root->root_item) == trans->transid)
70527065
return BTRFS_LOG_FORCE_COMMIT;
70537066

7054-
/*
7055-
* Skip already logged inodes or inodes corresponding to tmpfiles
7056-
* (since logging them is pointless, a link count of 0 means they
7057-
* will never be accessible).
7058-
*/
7059-
if ((btrfs_inode_in_log(inode, trans->transid) &&
7060-
list_empty(&ctx->ordered_extents)) ||
7061-
inode->vfs_inode.i_nlink == 0)
7067+
/* Skip already logged inodes and without new extents. */
7068+
if (btrfs_inode_in_log(inode, trans->transid) &&
7069+
list_empty(&ctx->ordered_extents))
70627070
return BTRFS_NO_LOG_SYNC;
70637071

70647072
ret = start_log_trans(trans, root, ctx);

0 commit comments

Comments
 (0)