Skip to content

Commit 1c24a18

Browse files
committed
fs: fd tables have to be multiples of BITS_PER_LONG
This has always been the rule: fdtables have several bitmaps in them, and as a result they have to be sized properly for bitmaps. We walk those bitmaps in chunks of 'unsigned long' in serveral cases, but even when we don't, we use the regular kernel bitops that are defined to work on arrays of 'unsigned long', not on some byte array. Now, the distinction between arrays of bytes and 'unsigned long' normally only really ends up being noticeable on big-endian systems, but Fedor Pchelkin and Alexey Khoroshilov reported that copy_fd_bitmaps() could be called with an argument that wasn't even a multiple of BITS_PER_BYTE. And then it fails to do the proper copy even on little-endian machines. The bug wasn't in copy_fd_bitmap(), but in sane_fdtable_size(), which didn't actually sanitize the fdtable size sufficiently, and never made sure it had the proper BITS_PER_LONG alignment. That's partly because the alignment historically came not from having to explicitly align things, but simply from previous fdtable sizes, and from count_open_files(), which counts the file descriptors by walking them one 'unsigned long' word at a time and thus naturally ends up doing sizing in the proper 'chunks of unsigned long'. But with the introduction of close_range(), we now have an external source of "this is how many files we want to have", and so sane_fdtable_size() needs to do a better job. This also adds that explicit alignment to alloc_fdtable(), although there it is mainly just for documentation at a source code level. The arithmetic we do there to pick a reasonable fdtable size already aligns the result sufficiently. In fact,clang notices that the added ALIGN() in that function doesn't actually do anything, and does not generate any extra code for it. It turns out that gcc ends up confusing itself by combining a previous constant-sized shift operation with the variable-sized shift operations in roundup_pow_of_two(). And probably due to that doesn't notice that the ALIGN() is a no-op. But that's a (tiny) gcc misfeature that doesn't matter. Having the explicit alignment makes sense, and would actually matter on a 128-bit architecture if we ever go there. This also adds big comments above both functions about how fdtable sizes have to have that BITS_PER_LONG alignment. Fixes: 60997c3 ("close_range: add CLOSE_RANGE_UNSHARE") Reported-by: Fedor Pchelkin <aissur0002@gmail.com> Reported-by: Alexey Khoroshilov <khoroshilov@ispras.ru> Link: https://lore.kernel.org/all/20220326114009.1690-1-aissur0002@gmail.com/ Tested-and-acked-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
1 parent 13776eb commit 1c24a18

File tree

1 file changed

+30
-0
lines changed

1 file changed

+30
-0
lines changed

fs/file.c

Lines changed: 30 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -87,6 +87,21 @@ static void copy_fdtable(struct fdtable *nfdt, struct fdtable *ofdt)
8787
copy_fd_bitmaps(nfdt, ofdt, ofdt->max_fds);
8888
}
8989

90+
/*
91+
* Note how the fdtable bitmap allocations very much have to be a multiple of
92+
* BITS_PER_LONG. This is not only because we walk those things in chunks of
93+
* 'unsigned long' in some places, but simply because that is how the Linux
94+
* kernel bitmaps are defined to work: they are not "bits in an array of bytes",
95+
* they are very much "bits in an array of unsigned long".
96+
*
97+
* The ALIGN(nr, BITS_PER_LONG) here is for clarity: since we just multiplied
98+
* by that "1024/sizeof(ptr)" before, we already know there are sufficient
99+
* clear low bits. Clang seems to realize that, gcc ends up being confused.
100+
*
101+
* On a 128-bit machine, the ALIGN() would actually matter. In the meantime,
102+
* let's consider it documentation (and maybe a test-case for gcc to improve
103+
* its code generation ;)
104+
*/
90105
static struct fdtable * alloc_fdtable(unsigned int nr)
91106
{
92107
struct fdtable *fdt;
@@ -102,6 +117,7 @@ static struct fdtable * alloc_fdtable(unsigned int nr)
102117
nr /= (1024 / sizeof(struct file *));
103118
nr = roundup_pow_of_two(nr + 1);
104119
nr *= (1024 / sizeof(struct file *));
120+
nr = ALIGN(nr, BITS_PER_LONG);
105121
/*
106122
* Note that this can drive nr *below* what we had passed if sysctl_nr_open
107123
* had been set lower between the check in expand_files() and here. Deal
@@ -269,11 +285,25 @@ static unsigned int count_open_files(struct fdtable *fdt)
269285
return i;
270286
}
271287

288+
/*
289+
* Note that a sane fdtable size always has to be a multiple of
290+
* BITS_PER_LONG, since we have bitmaps that are sized by this.
291+
*
292+
* 'max_fds' will normally already be properly aligned, but it
293+
* turns out that in the close_range() -> __close_range() ->
294+
* unshare_fd() -> dup_fd() -> sane_fdtable_size() we can end
295+
* up having a 'max_fds' value that isn't already aligned.
296+
*
297+
* Rather than make close_range() have to worry about this,
298+
* just make that BITS_PER_LONG alignment be part of a sane
299+
* fdtable size. Becuase that's really what it is.
300+
*/
272301
static unsigned int sane_fdtable_size(struct fdtable *fdt, unsigned int max_fds)
273302
{
274303
unsigned int count;
275304

276305
count = count_open_files(fdt);
306+
max_fds = ALIGN(max_fds, BITS_PER_LONG);
277307
if (max_fds < NR_OPEN_DEFAULT)
278308
max_fds = NR_OPEN_DEFAULT;
279309
return min(count, max_fds);

0 commit comments

Comments
 (0)