FileHandle vs Inode (real and pseudo) vs Inode's FileId (byte[]) vs. Stat ino; PseudoFS, Opaque & co. #149
Replies: 3 comments 1 reply
-
@kohlschuetter, thanks for starting this discussion. Indeed, the code is confusing and (probably) over-engineered/complicated. The main reason for separating The The So, yeah, I agree, many things can be improved. Low-hanging fruit is fixing With your change suggestion list above I thinks: ⏸️ 1.
✅ 2.
🔧 5.
|
Beta Was this translation helpful? Give feedback.
-
I had yet another look at |
Beta Was this translation helpful? Give feedback.
-
Cool. I understand compatibility with existing installations is very important, so will try to make all changes backwards compatible. Regarding I think I can make this a backwards-compatible change since it will only apply to new file systems and new NFSv4-only setups. Concerning |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
-
Going through the NFS4j code, I'm a bit consued by the many different layers of referencing an inode in the underlying file system. We seem to be converting, serializing, deserializing, masquerading, extracting and ultimately conflating different objects. This not only makes the code quite complex and hard to follow, it also adds to the memory and CPU footprint.
Ultimately, from a protocol perspective, NFSv4 expects
nfs_fh4
to be 128 bytes of rather arbitrary content, but also, in some places (e.g., getattr), expects a 64-bit "file id" (similar to a UNIX "inode").In our code, however I see the following:
FileHandle
: is practically only referenced byInode
, which encapsulates it, butInode
provides hashcode/equality based uponFileHandle
's byte representation, which seems to be the only functional addition overFileHandle
itself. However that code is very inefficient as it builds aByteBuffer
/byte[]
for each check.FileTracker
,VfsCache
,LockManager
, etc., we don't track theInode
, we trackinode.getFileId()
(which at some point could be abyte[]
of rather arbitrary length), and we may even wrap it into anOpaque
object before each useLocalFileSystem
uses incrementally generatedlong
-based "inode" values (which are not based upon the actual file system inodes — this may cause problems when restarting the server, especially after altering the local file system)Stat
returns a 64-bit long forgetIno()
("Ino" refers to "inode"), which is reported to NFS as the "fileid".PseudoFS
encodes additional bits of information (such as "exportIndex") into theFileHandle
. By making access decisions based on bits encoded in the file handle, a malicious NFS client may trigger privilege escalation by manipulating the "exportIndex" part. The current logic is probably also the reason why we can't use the inode's full 128 bytes by the downstream VirtualFileSystem.I wonder if there is interest in simplifying this structure as follows. As I'm new to the codebase, please correct my assumptions if I'm missing something important:
FileHandle
andInode
into one type, and simplify the hashCode/equals checks, remove the various byte[] serialiation steps, etc. We can keep calling this "Inode" for the sake of API compatibility. (I have working code that extendsInode
fromFileHandle
, but I think we could also completely absorb that class into one).FileTracker
etc. reference theInode
objects directly instead ofbyte[]
/Opaque
.LocalFileSystem
could use randomUUID
s (or the underlying FS's actual inode number when available) instead of incrementally added numbers. This seems to fix an inode-confusion issue I'm observing on macOS after restarting the NFS server (while modifying the contents of the underlying file system between restarts).Long
object instead oflong
, or maybe interpret0
as "no value" in NFSv4).PseudoFS
to track state internally instead of exposing it via bytes in the NFS file handle object (this is the item with the least priority for me, as I might be able to just skip that level of indirection for my use case, but I think this is critical for hardening NFS4j's security).I already have working code for most of this, but I first want to understand if there's 1. any interest to pursue this and 2. anything else I'm missing here.
Beta Was this translation helpful? Give feedback.
All reactions