Skip to content

Commit 22e111e

Browse files
author
Al Viro
committed
rename(): fix the locking of subdirectories
We should never lock two subdirectories without having taken ->s_vfs_rename_mutex; inode pointer order or not, the "order" proposed in 28eceed "fs: Lock moved directories" is not transitive, with the usual consequences. The rationale for locking renamed subdirectory in all cases was the possibility of race between rename modifying .. in a subdirectory to reflect the new parent and another thread modifying the same subdirectory. For a lot of filesystems that's not a problem, but for some it can lead to trouble (e.g. the case when short directory contents is kept in the inode, but creating a file in it might push it across the size limit and copy its contents into separate data block(s)). However, we need that only in case when the parent does change - otherwise ->rename() doesn't need to do anything with .. entry in the first place. Some instances are lazy and do a tautological update anyway, but it's really not hard to avoid. Amended locking rules for rename(): find the parent(s) of source and target if source and target have the same parent lock the common parent else lock ->s_vfs_rename_mutex lock both parents, in ancestor-first order; if neither is an ancestor of another, lock the parent of source first. find the source and target. if source and target have the same parent if operation is an overwriting rename of a subdirectory lock the target subdirectory else if source is a subdirectory lock the source if target is a subdirectory lock the target lock non-directories involved, in inode pointer order if both source and target are such. That way we are guaranteed that parents are locked (for obvious reasons), that any renamed non-directory is locked (nfsd relies upon that), that any victim is locked (emptiness check needs that, among other things) and subdirectory that changes parent is locked (needed to protect the update of .. entries). We are also guaranteed that any operation locking more than one directory either takes ->s_vfs_rename_mutex or locks a parent followed by its child. Cc: stable@vger.kernel.org Fixes: 28eceed "fs: Lock moved directories" Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
1 parent 7deee77 commit 22e111e

File tree

4 files changed

+74
-38
lines changed

4 files changed

+74
-38
lines changed

Documentation/filesystems/directory-locking.rst

Lines changed: 15 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -22,13 +22,16 @@ exclusive.
2222
3) object removal. Locking rules: caller locks parent, finds victim,
2323
locks victim and calls the method. Locks are exclusive.
2424

25-
4) rename() that is _not_ cross-directory. Locking rules: caller locks the
26-
parent and finds source and target. We lock both (provided they exist). If we
27-
need to lock two inodes of different type (dir vs non-dir), we lock directory
28-
first. If we need to lock two inodes of the same type, lock them in inode
29-
pointer order. Then call the method. All locks are exclusive.
30-
NB: we might get away with locking the source (and target in exchange
31-
case) shared.
25+
4) rename() that is _not_ cross-directory. Locking rules: caller locks
26+
the parent and finds source and target. Then we decide which of the
27+
source and target need to be locked. Source needs to be locked if it's a
28+
non-directory; target - if it's a non-directory or about to be removed.
29+
Take the locks that need to be taken, in inode pointer order if need
30+
to take both (that can happen only when both source and target are
31+
non-directories - the source because it wouldn't be locked otherwise
32+
and the target because mixing directory and non-directory is allowed
33+
only with RENAME_EXCHANGE, and that won't be removing the target).
34+
After the locks had been taken, call the method. All locks are exclusive.
3235

3336
5) link creation. Locking rules:
3437

@@ -44,20 +47,17 @@ rules:
4447

4548
* lock the filesystem
4649
* lock parents in "ancestors first" order. If one is not ancestor of
47-
the other, lock them in inode pointer order.
50+
the other, lock the parent of source first.
4851
* find source and target.
4952
* if old parent is equal to or is a descendent of target
5053
fail with -ENOTEMPTY
5154
* if new parent is equal to or is a descendent of source
5255
fail with -ELOOP
53-
* Lock both the source and the target provided they exist. If we
54-
need to lock two inodes of different type (dir vs non-dir), we lock
55-
the directory first. If we need to lock two inodes of the same type,
56-
lock them in inode pointer order.
56+
* Lock subdirectories involved (source before target).
57+
* Lock non-directories involved, in inode pointer order.
5758
* call the method.
5859

59-
All ->i_rwsem are taken exclusive. Again, we might get away with locking
60-
the source (and target in exchange case) shared.
60+
All ->i_rwsem are taken exclusive.
6161

6262
The rules above obviously guarantee that all directories that are going to be
6363
read, modified or removed by method will be locked by caller.
@@ -67,6 +67,7 @@ If no directory is its own ancestor, the scheme above is deadlock-free.
6767

6868
Proof:
6969

70+
[XXX: will be updated once we are done massaging the lock_rename()]
7071
First of all, at any moment we have a linear ordering of the
7172
objects - A < B iff (A is an ancestor of B) or (B is not an ancestor
7273
of A and ptr(A) < ptr(B)).

Documentation/filesystems/locking.rst

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -101,7 +101,7 @@ symlink: exclusive
101101
mkdir: exclusive
102102
unlink: exclusive (both)
103103
rmdir: exclusive (both)(see below)
104-
rename: exclusive (all) (see below)
104+
rename: exclusive (both parents, some children) (see below)
105105
readlink: no
106106
get_link: no
107107
setattr: exclusive
@@ -123,6 +123,9 @@ get_offset_ctx no
123123
Additionally, ->rmdir(), ->unlink() and ->rename() have ->i_rwsem
124124
exclusive on victim.
125125
cross-directory ->rename() has (per-superblock) ->s_vfs_rename_sem.
126+
->unlink() and ->rename() have ->i_rwsem exclusive on all non-directories
127+
involved.
128+
->rename() has ->i_rwsem exclusive on any subdirectory that changes parent.
126129

127130
See Documentation/filesystems/directory-locking.rst for more detailed discussion
128131
of the locking scheme for directory operations.

Documentation/filesystems/porting.rst

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1061,3 +1061,21 @@ export_operations ->encode_fh() no longer has a default implementation to
10611061
encode FILEID_INO32_GEN* file handles.
10621062
Filesystems that used the default implementation may use the generic helper
10631063
generic_encode_ino32_fh() explicitly.
1064+
1065+
---
1066+
1067+
**mandatory**
1068+
1069+
If ->rename() update of .. on cross-directory move needs an exclusion with
1070+
directory modifications, do *not* lock the subdirectory in question in your
1071+
->rename() - it's done by the caller now [that item should've been added in
1072+
28eceeda130f "fs: Lock moved directories"].
1073+
1074+
---
1075+
1076+
**mandatory**
1077+
1078+
On same-directory ->rename() the (tautological) update of .. is not protected
1079+
by any locks; just don't do it if the old parent is the same as the new one.
1080+
We really can't lock two subdirectories in same-directory rename - not without
1081+
deadlocks.

fs/namei.c

Lines changed: 37 additions & 23 deletions
Original file line numberDiff line numberDiff line change
@@ -3021,20 +3021,14 @@ static struct dentry *lock_two_directories(struct dentry *p1, struct dentry *p2)
30213021
p = d_ancestor(p2, p1);
30223022
if (p) {
30233023
inode_lock_nested(p2->d_inode, I_MUTEX_PARENT);
3024-
inode_lock_nested(p1->d_inode, I_MUTEX_CHILD);
3024+
inode_lock_nested(p1->d_inode, I_MUTEX_PARENT2);
30253025
return p;
30263026
}
30273027

30283028
p = d_ancestor(p1, p2);
3029-
if (p) {
3030-
inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
3031-
inode_lock_nested(p2->d_inode, I_MUTEX_CHILD);
3032-
return p;
3033-
}
3034-
3035-
lock_two_inodes(p1->d_inode, p2->d_inode,
3036-
I_MUTEX_PARENT, I_MUTEX_PARENT2);
3037-
return NULL;
3029+
inode_lock_nested(p1->d_inode, I_MUTEX_PARENT);
3030+
inode_lock_nested(p2->d_inode, I_MUTEX_PARENT2);
3031+
return p;
30383032
}
30393033

30403034
/*
@@ -4716,11 +4710,12 @@ SYSCALL_DEFINE2(link, const char __user *, oldname, const char __user *, newname
47164710
*
47174711
* a) we can get into loop creation.
47184712
* b) race potential - two innocent renames can create a loop together.
4719-
* That's where 4.4 screws up. Current fix: serialization on
4713+
* That's where 4.4BSD screws up. Current fix: serialization on
47204714
* sb->s_vfs_rename_mutex. We might be more accurate, but that's another
47214715
* story.
4722-
* c) we have to lock _four_ objects - parents and victim (if it exists),
4723-
* and source.
4716+
* c) we may have to lock up to _four_ objects - parents and victim (if it exists),
4717+
* and source (if it's a non-directory or a subdirectory that moves to
4718+
* different parent).
47244719
* And that - after we got ->i_mutex on parents (until then we don't know
47254720
* whether the target exists). Solution: try to be smart with locking
47264721
* order for inodes. We rely on the fact that tree topology may change
@@ -4752,6 +4747,7 @@ int vfs_rename(struct renamedata *rd)
47524747
bool new_is_dir = false;
47534748
unsigned max_links = new_dir->i_sb->s_max_links;
47544749
struct name_snapshot old_name;
4750+
bool lock_old_subdir, lock_new_subdir;
47554751

47564752
if (source == target)
47574753
return 0;
@@ -4805,15 +4801,32 @@ int vfs_rename(struct renamedata *rd)
48054801
take_dentry_name_snapshot(&old_name, old_dentry);
48064802
dget(new_dentry);
48074803
/*
4808-
* Lock all moved children. Moved directories may need to change parent
4809-
* pointer so they need the lock to prevent against concurrent
4810-
* directory changes moving parent pointer. For regular files we've
4811-
* historically always done this. The lockdep locking subclasses are
4812-
* somewhat arbitrary but RENAME_EXCHANGE in particular can swap
4813-
* regular files and directories so it's difficult to tell which
4814-
* subclasses to use.
4804+
* Lock children.
4805+
* The source subdirectory needs to be locked on cross-directory
4806+
* rename or cross-directory exchange since its parent changes.
4807+
* The target subdirectory needs to be locked on cross-directory
4808+
* exchange due to parent change and on any rename due to becoming
4809+
* a victim.
4810+
* Non-directories need locking in all cases (for NFS reasons);
4811+
* they get locked after any subdirectories (in inode address order).
4812+
*
4813+
* NOTE: WE ONLY LOCK UNRELATED DIRECTORIES IN CROSS-DIRECTORY CASE.
4814+
* NEVER, EVER DO THAT WITHOUT ->s_vfs_rename_mutex.
48154815
*/
4816-
lock_two_inodes(source, target, I_MUTEX_NORMAL, I_MUTEX_NONDIR2);
4816+
lock_old_subdir = new_dir != old_dir;
4817+
lock_new_subdir = new_dir != old_dir || !(flags & RENAME_EXCHANGE);
4818+
if (is_dir) {
4819+
if (lock_old_subdir)
4820+
inode_lock_nested(source, I_MUTEX_CHILD);
4821+
if (target && (!new_is_dir || lock_new_subdir))
4822+
inode_lock(target);
4823+
} else if (new_is_dir) {
4824+
if (lock_new_subdir)
4825+
inode_lock_nested(target, I_MUTEX_CHILD);
4826+
inode_lock(source);
4827+
} else {
4828+
lock_two_nondirectories(source, target);
4829+
}
48174830

48184831
error = -EPERM;
48194832
if (IS_SWAPFILE(source) || (target && IS_SWAPFILE(target)))
@@ -4861,8 +4874,9 @@ int vfs_rename(struct renamedata *rd)
48614874
d_exchange(old_dentry, new_dentry);
48624875
}
48634876
out:
4864-
inode_unlock(source);
4865-
if (target)
4877+
if (!is_dir || lock_old_subdir)
4878+
inode_unlock(source);
4879+
if (target && (!new_is_dir || lock_new_subdir))
48664880
inode_unlock(target);
48674881
dput(new_dentry);
48684882
if (!error) {

0 commit comments

Comments
 (0)