| 1 | +=========== |
| 2 | +NFS LOCALIO |
| 3 | +=========== |
| 4 | + |
| 5 | +Overview |
| 6 | +======== |
| 7 | + |
| 8 | +The LOCALIO auxiliary RPC protocol allows the Linux NFS client and |
| 9 | +server to reliably handshake to determine if they are on the same |
| 10 | +host. Select "NFS client and server support for LOCALIO auxiliary |
| 11 | +protocol" in menuconfig to enable CONFIG_NFS_LOCALIO in the kernel |
| 12 | +config (both CONFIG_NFS_FS and CONFIG_NFSD must also be enabled). |
| 13 | + |
| 14 | +Once an NFS client and server handshake as "local", the client will |
| 15 | +bypass the network RPC protocol for read, write and commit operations. |
| 16 | +Because XDR and RPC are bypassed, these operations complete faster. |
| 17 | + |
| 18 | +The LOCALIO auxiliary protocol's implementation, which uses the same |
| 19 | +connection as NFS traffic, follows the pattern established by the NFS |
| 20 | +ACL protocol extension. |
| 21 | + |
| 22 | +The LOCALIO auxiliary protocol is needed to allow robust discovery of |
| 23 | +clients local to their servers. In a private implementation that |
| 24 | +preceded use of this LOCALIO protocol, a fragile sockaddr network |
| 25 | +address based match against all local network interfaces was attempted. |
| 26 | +But unlike the LOCALIO protocol, the sockaddr-based matching didn't |
| 27 | +handle use of iptables or containers. |
| 28 | + |
| 29 | +The robust handshake between local client and server is just the |
| 30 | +beginning; the ultimate use case this locality makes possible is that |
| 31 | +the client can open files and issue reads, writes and commits |
| 32 | +directly to the server without having to go over the network. The |
| 33 | +requirement is to perform these loopback NFS operations as efficiently |
| 34 | +as possible. This is particularly useful for container use cases |
| 35 | +(e.g. Kubernetes) where it is possible to run an IO job local to the |
| 36 | +server. |
| 37 | + |
| 38 | +The performance advantage realized from LOCALIO's ability to bypass |
| 39 | +XDR and RPC for reads, writes and commits can be extreme, e.g.: |
| 40 | + |
| 41 | +fio for 20 secs with directio, qd of 8, 16 libaio threads: |
| 42 | +- With LOCALIO: |
| 43 | + 4K read: IOPS=979k, BW=3825MiB/s (4011MB/s)(74.7GiB/20002msec) |
| 44 | + 4K write: IOPS=165k, BW=646MiB/s (678MB/s)(12.6GiB/20002msec) |
| 45 | + 128K read: IOPS=402k, BW=49.1GiB/s (52.7GB/s)(982GiB/20002msec) |
| 46 | + 128K write: IOPS=11.5k, BW=1433MiB/s (1503MB/s)(28.0GiB/20004msec) |
| 47 | + |
| 48 | +- Without LOCALIO: |
| 49 | + 4K read: IOPS=79.2k, BW=309MiB/s (324MB/s)(6188MiB/20003msec) |
| 50 | + 4K write: IOPS=59.8k, BW=234MiB/s (245MB/s)(4671MiB/20002msec) |
| 51 | + 128K read: IOPS=33.9k, BW=4234MiB/s (4440MB/s)(82.7GiB/20004msec) |
| 52 | + 128K write: IOPS=11.5k, BW=1434MiB/s (1504MB/s)(28.0GiB/20011msec) |
| 53 | + |
| 54 | +fio for 20 secs with directio, qd of 8, 1 libaio thread: |
| 55 | +- With LOCALIO: |
| 56 | + 4K read: IOPS=230k, BW=898MiB/s (941MB/s)(17.5GiB/20001msec) |
| 57 | + 4K write: IOPS=22.6k, BW=88.3MiB/s (92.6MB/s)(1766MiB/20001msec) |
| 58 | + 128K read: IOPS=38.8k, BW=4855MiB/s (5091MB/s)(94.8GiB/20001msec) |
| 59 | + 128K write: IOPS=11.4k, BW=1428MiB/s (1497MB/s)(27.9GiB/20001msec) |
| 60 | + |
| 61 | +- Without LOCALIO: |
| 62 | + 4K read: IOPS=77.1k, BW=301MiB/s (316MB/s)(6022MiB/20001msec) |
| 63 | + 4K write: IOPS=32.8k, BW=128MiB/s (135MB/s)(2566MiB/20001msec) |
| 64 | + 128K read: IOPS=24.4k, BW=3050MiB/s (3198MB/s)(59.6GiB/20001msec) |
| 65 | + 128K write: IOPS=11.4k, BW=1430MiB/s (1500MB/s)(27.9GiB/20001msec) |
| 66 | + |
| 67 | +RPC |
| 68 | +=== |
| 69 | + |
| 70 | +The LOCALIO auxiliary RPC protocol consists of a single "UUID_IS_LOCAL" |
| 71 | +RPC method that allows the Linux NFS client to verify the local Linux |
| 72 | +NFS server can see the nonce (single-use UUID) the client generated and |
| 73 | +made available in nfs_common. This protocol isn't part of an IETF |
| 74 | +standard, nor does it need to be, considering it is a Linux-to-Linux |
| 75 | +auxiliary RPC protocol that amounts to an implementation detail. |
| 76 | + |
| 77 | +The UUID_IS_LOCAL method encodes the client-generated uuid_t in terms of |
| 78 | +the fixed UUID_SIZE (16 bytes). The fixed-size opaque encode and decode |
| 79 | +XDR methods are used instead of the less efficient variable-sized |
| 80 | +methods. |
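A minimal userspace sketch of why the fixed-size form is cheaper, assuming the standard XDR rules for opaque data (RFC 4506). The function names `encode_uuid_fixed()` and `encode_uuid_variable()` are hypothetical and are not the kernel's XDR helpers; they only illustrate the wire-format difference the paragraph above describes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16

/*
 * Fixed-size opaque (RFC 4506, section 4.9): the bytes are copied as-is
 * and padded to a multiple of 4. UUID_SIZE is already a multiple of 4,
 * so the encoding is exactly 16 bytes with no length prefix.
 */
static size_t encode_uuid_fixed(uint8_t *out, const uint8_t uuid[UUID_SIZE])
{
    assert(out != NULL && uuid != NULL);
    memcpy(out, uuid, UUID_SIZE);
    return UUID_SIZE;
}

/*
 * Variable-size opaque (RFC 4506, section 4.10): a 4-byte big-endian
 * length precedes the data, so the same UUID takes 20 bytes and a
 * decoder has to read and validate the length first.
 */
static size_t encode_uuid_variable(uint8_t *out, const uint8_t uuid[UUID_SIZE])
{
    const uint32_t len = UUID_SIZE;

    assert(out != NULL && uuid != NULL);
    out[0] = (uint8_t)(len >> 24);
    out[1] = (uint8_t)(len >> 16);
    out[2] = (uint8_t)(len >> 8);
    out[3] = (uint8_t)len;
    memcpy(out + 4, uuid, UUID_SIZE);
    return 4 + UUID_SIZE;
}
```

Calling both encoders on the same UUID with a buffer of at least 20 bytes shows the fixed form returning 16 and the variable form returning 20.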
| 81 | + |
| 82 | +The RPC program number for the NFS_LOCALIO_PROGRAM is 400122 (as assigned |
| 83 | +by IANA, see https://www.iana.org/assignments/rpc-program-numbers/ ): |
| 84 | +Linux Kernel Organization 400122 nfslocalio |
| 85 | + |
| 86 | +The LOCALIO protocol spec in rpcgen syntax is: |
| 87 | + |
| 88 | +/* raw RFC 9562 UUID */ |
| 89 | +#define UUID_SIZE 16 |
| 90 | +typedef u8 uuid_t[UUID_SIZE]; |
| 91 | + |
| 92 | +program NFS_LOCALIO_PROGRAM { |
| 93 | + version LOCALIO_V1 { |
| 94 | + void |
| 95 | + NULL(void) = 0; |
| 96 | + |
| 97 | + void |
| 98 | + UUID_IS_LOCAL(uuid_t) = 1; |
| 99 | + } = 1; |
| 100 | +} = 400122; |
| 101 | + |
| 102 | +LOCALIO uses the same transport connection as NFS traffic. As such, |
| 103 | +LOCALIO is not registered with rpcbind. |
| 104 | + |
| 105 | +NFS Common and Client/Server Handshake |
| 106 | +====================================== |
| 107 | + |
| 108 | +fs/nfs_common/nfslocalio.c provides interfaces that enable an NFS client |
| 109 | +to generate a nonce (single-use UUID) and associated short-lived |
| 110 | +nfs_uuid_t struct, and to register it with nfs_common for subsequent |
| 111 | +lookup and verification by the NFS server. If the UUID matches, the NFS |
| 112 | +server populates members in the nfs_uuid_t struct. The NFS client then |
| 113 | +uses nfs_common to transfer the nfs_uuid_t from its nfs_uuids to the |
| 114 | +nn->nfsd_serv clients_list from the nfs_common's uuids_list. See: |
| 115 | +fs/nfs/localio.c:nfs_local_probe() |
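The register, look up, populate sequence described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the nfs_common code: `struct model_uuid`, `pending[]`, `model_register()`, `model_uuid_is_local()` and `model_finish()` are all hypothetical names, and a fixed array stands in for the shared list.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define UUID_SIZE 16
#define MAX_PENDING 8

/* Hypothetical stand-in for nfs_uuid_t: a nonce plus server-filled state. */
struct model_uuid {
    uint8_t uuid[UUID_SIZE];
    bool in_use;       /* registered by a client, awaiting verification */
    const void *net;   /* set by the server on a match (models 'net') */
};

/* Models the list shared between client and server on one host. */
static struct model_uuid pending[MAX_PENDING];

/* Client side: publish a freshly generated nonce before sending the RPC. */
static struct model_uuid *model_register(const uint8_t uuid[UUID_SIZE])
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (!pending[i].in_use) {
            memcpy(pending[i].uuid, uuid, UUID_SIZE);
            pending[i].in_use = true;
            pending[i].net = NULL;
            return &pending[i];
        }
    }
    return NULL; /* no free slot */
}

/* Server side: a UUID_IS_LOCAL request only matches if the list is shared. */
static bool model_uuid_is_local(const uint8_t uuid[UUID_SIZE],
                                const void *server_net)
{
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (pending[i].in_use &&
            memcmp(pending[i].uuid, uuid, UUID_SIZE) == 0) {
            pending[i].net = server_net; /* populate on match */
            return true;
        }
    }
    return false;
}

/* Client side: the nonce is single-use, so retire it after the reply. */
static bool model_finish(struct model_uuid *u)
{
    assert(u != NULL);
    bool local = (u->net != NULL);
    u->in_use = false;
    return local;
}
```

A remote server would never find the nonce because it does not share the list, which is the property the handshake relies on.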
| 116 | + |
| 117 | +nfs_common's nfs_uuids list is the basis for LOCALIO enablement. As such, |
| 118 | +it has members that point to nfsd memory for direct use by the client |
| 119 | +(e.g. 'net' is the server's network namespace; through it the client can |
| 120 | +access nn->nfsd_serv with proper RCU read access). It is this client |
| 121 | +and server synchronization that enables advanced usage and lifetime of |
| 122 | +objects to span from the host kernel's nfsd to per-container knfsd |
| 123 | +instances that are connected to NFS clients running on the same local |
| 124 | +host. |
| 125 | + |
| 126 | +NFS Client issues IO instead of Server |
| 127 | +====================================== |
| 128 | + |
| 129 | +Because LOCALIO is focused on protocol bypass to achieve improved IO |
| 130 | +performance, alternatives to the traditional NFS wire protocol (SUNRPC |
| 131 | +with XDR) must be provided to access the backing filesystem. |
| 132 | + |
| 133 | +See fs/nfs/localio.c:nfs_local_open_fh() and |
| 134 | +fs/nfsd/localio.c:nfsd_open_local_fh() for the interface that makes |
| 135 | +focused use of select NFS server objects to allow a client local to a |
| 136 | +server to open a file pointer without needing to go over the network. |
| 137 | + |
| 138 | +The client's fs/nfs/localio.c:nfs_local_open_fh() will call into the |
| 139 | +server's fs/nfsd/localio.c:nfsd_open_local_fh() and carefully access |
| 140 | +both the associated nfsd network namespace and nn->nfsd_serv in terms of |
| 141 | +RCU. If nfsd_open_local_fh() finds that the client no longer sees valid |
| 142 | +nfsd objects (be it struct net or nn->nfsd_serv) it returns -ENXIO |
| 143 | +to nfs_local_open_fh() and the client will try to reestablish the |
| 144 | +LOCALIO resources needed by calling nfs_local_probe() again. This |
| 145 | +recovery is needed if/when an nfsd instance running in a container |
| 146 | +restarts while a LOCALIO client is connected to it. |
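The recovery path described above can be sketched as a small userspace model. All names here (`struct model_client`, `current_server_generation`, `model_probe()`, `model_server_open()`, `model_client_open()`) are hypothetical; a generation counter stands in for the RCU-protected validity checks on struct net and nn->nfsd_serv, and what the caller does with a failed open is outside this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client state captured at probe time. */
struct model_client {
    bool localio_enabled;
    int server_generation; /* server instance seen by the last probe */
};

/* Bumped whenever the modelled server instance restarts. */
static int current_server_generation = 1;

/* Models nfs_local_probe(): (re)establish the LOCALIO resources. */
static void model_probe(struct model_client *c)
{
    assert(c != NULL);
    c->server_generation = current_server_generation;
    c->localio_enabled = true;
}

/* Models the server-side open: stale server objects yield -ENXIO. */
static int model_server_open(const struct model_client *c)
{
    if (!c->localio_enabled ||
        c->server_generation != current_server_generation)
        return -ENXIO;
    return 0;
}

/* Models the client-side open: on -ENXIO, probe again for later opens. */
static int model_client_open(struct model_client *c)
{
    int ret = model_server_open(c);

    if (ret == -ENXIO)
        model_probe(c);
    return ret;
}
```

In this model, an open issued after the server generation changes fails once with -ENXIO, and the next open succeeds because the probe has been redone.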
| 147 | + |
| 148 | +Once the client has an open nfsd_file pointer it will issue reads, |
| 149 | +writes and commits directly to the underlying local filesystem (normally |
| 150 | +done by the NFS server). As such, for these operations, the NFS client |
| 151 | +is issuing IO to the underlying local filesystem that it is sharing with |
| 152 | +the NFS server. See: fs/nfs/localio.c:nfs_local_doio() and |
| 153 | +fs/nfs/localio.c:nfs_local_commit(). |
| 154 | + |
| 155 | +Security |
| 156 | +======== |
| 157 | + |
| 158 | +LOCALIO is only supported when UNIX-style authentication (AUTH_UNIX, aka |
| 159 | +AUTH_SYS) is used. |
| 160 | + |
| 161 | +Care is taken to ensure the same NFS security mechanisms are used |
| 162 | +(authentication, etc.) regardless of whether LOCALIO or regular NFS |
| 163 | +access is used. The auth_domain established as part of the traditional |
| 164 | +NFS client access to the NFS server is also used for LOCALIO. |
| 165 | + |
| 166 | +Relative to containers, LOCALIO gives the client access to the network |
| 167 | +namespace the server has. This is required to allow the client to access |
| 168 | +the server's per-namespace nfsd_net struct. With traditional NFS, the |
| 169 | +client is afforded this same level of access (albeit in terms of the NFS |
| 170 | +protocol via SUNRPC). No other namespaces (user, mount, etc.) have been |
| 171 | +altered or purposely extended from the server to the client. |
| 172 | + |
| 173 | +Testing |
| 174 | +======= |
| 175 | + |
| 176 | +The LOCALIO auxiliary protocol and associated NFS LOCALIO read, write |
| 177 | +and commit access have proven stable against various test scenarios: |
| 178 | + |
| 179 | +- Client and server both on the same host. |
| 180 | + |
| 181 | +- All permutations of client and server support enablement for both |
| 182 | + local and remote client and server. |
| 183 | + |
| 184 | +- Testing against NFS storage products that don't support the LOCALIO |
| 185 | + protocol was also performed. |
| 186 | + |
| 187 | +- Client on host, server within a container (for both v3 and v4.2). |
| 188 | + The container testing used podman-managed containers and |
| 189 | + included a successful container stop/restart scenario. |
| 190 | + |
| 191 | +- Formalizing these test scenarios in terms of existing test |
| 192 | + infrastructure is ongoing. Initial regular coverage is provided in |
| 193 | + terms of ktest running xfstests against a LOCALIO-enabled NFS loopback |
| 194 | + mount configuration, and includes lockdep and KASAN coverage, see: |
| 195 | + https://evilpiepirate.org/~testdashboard/ci?user=snitzer&branch=snitm-nfs-next |
| 196 | + https://github.com/koverstreet/ktest |
| 197 | + |
| 198 | +- Various kdevops testing (in terms of "Chuck's BuildBot") has been |
| 199 | + performed to regularly verify the LOCALIO changes haven't caused any |
| 200 | + regressions to non-LOCALIO NFS use cases. |
| 201 | + |
| 202 | +- All of Hammerspace's various sanity tests pass with LOCALIO enabled |
| 203 | + (this includes numerous pNFS and flexfiles tests). |