Commit 6917b2f

ZFS Interface for Accelerators (Z.I.A.)
The ZIO pipeline has been modified to allow external, alternative implementations of existing operations to be used. The original ZFS functions remain in the code as a fallback in case the external implementation fails.

Definitions:
- Accelerator - an entity (usually hardware) that is intended to accelerate operations
- Offloader - synonym of accelerator; used interchangeably
- Data Processing Unit Services Module (DPUSM) - https://github.com/hpc/dpusm
    - defines a "provider API" for accelerator vendors to set up
    - defines a "user API" for accelerator consumers to call
    - maintains a list of providers and coordinates interactions between providers and consumers
- Provider - a DPUSM wrapper for an accelerator's API
- Offload - moving data from ZFS/memory to the accelerator
- Onload - the opposite of offload

In order for Z.I.A. to be extensible, it does not communicate directly with a fixed accelerator. Rather, Z.I.A. acquires a handle to the DPUSM, which is then used to acquire handles to providers.

Using ZFS with Z.I.A. (example commands are shown below):
1. Build and start the DPUSM
2. Implement, build, and register a provider with the DPUSM
3. Reconfigure ZFS with '--with-zia=<DPUSM root>'
4. Rebuild and start ZFS
5. Create a zpool
6. Select the provider
       zpool set zia_provider=<provider name> <zpool>
7. Select operations to offload
       zpool set zia_<property>=on <zpool>

The operations that have been modified are:
- compression
    - non-raw writes only
- decompression
- checksum
    - not handling embedded checksums
    - checksum compute and checksum error call the same function
- raidz
    - generation
    - reconstruction
- vdev_file
    - open
    - write
    - close
- vdev_disk
    - open
    - invalidate
    - write
    - flush
    - close

Successful operations do not bring data back into memory after they complete, allowing subsequent offloader operations to reuse the data. This results in only one data movement per ZIO, at the beginning of the pipeline, which is necessary to get data from ZFS to the accelerator. When errors occur and the offloaded data is still accessible, the offloaded data is onloaded (or dropped if it still matches the in-memory copy) for that ZIO pipeline stage and processed with ZFS. This will cause thrashing if a later operation offloads the data again, but that should not happen often, as constant errors (resulting in data movement) are not expected to be the norm. Unrecoverable errors such as hardware failures will trigger pipeline restarts (if necessary) in order to complete the original ZIO using the software path.

The modifications to ZFS can be thought of as two sets of changes:
- The ZIO write pipeline
    - compression, checksum, RAIDZ generation, and write
    - Each stage starts by offloading data that was not previously offloaded
        - This allows ZIOs to be offloaded at any point in the pipeline
- Resilver
    - vdev_raidz_io_done (RAIDZ reconstruction, checksum, and RAIDZ generation), and write
    - Because the core of resilver is vdev_raidz_io_done, data is only offloaded once, at the beginning of vdev_raidz_io_done
        - Errors cause data to be onloaded, but it will not be re-offloaded in subsequent steps within resilver
        - Write is a separate ZIO pipeline stage, so it will attempt to offload data

The zio_decompress function has been modified to allow for offloading, but the ZIO read pipeline as a whole has not, so it is not part of the above list.
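As a concrete sketch of steps 5-7, the commands could look like the following. The pool name and devices are placeholders, and the specific zia_* property names are inferred from the zia_<property> pattern above and the zpool_prop_t entries added later in this commit, so verify them with 'zpool get all' on a Z.I.A.-enabled build.

    # step 5: create a pool (name and devices are placeholders)
    zpool create tank raidz2 /dev/sd[a-f]

    # step 6: select a provider that has been registered with the DPUSM
    zpool set zia_provider=<provider name> tank

    # step 7: enable individual offloads
    zpool set zia_compress=on tank
    zpool set zia_checksum=on tank
    zpool set zia_raidz2_gen=on tank
    zpool set zia_disk_write=on tank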
An example provider implementation can be found in module/zia-software-provider:
- The provider's "hardware" is actually software - data is "offloaded" to memory not owned by ZFS
- Calls ZFS functions in order to not reimplement operations
- Has kernel module parameters that can be used to trigger ZIA_ACCELERATOR_DOWN states for testing pipeline restarts (a usage sketch follows below)

abd_t, raidz_row_t, and vdev_t have each been given an additional "void *<prefix>_zia_handle" member. These opaque handles point to data that is located on an offloader. abds are still allocated, but their payloads are expected to diverge from the offloaded copy as operations are run.

Encryption and deduplication are disabled for zpools with Z.I.A. operations enabled. Aggregation is disabled for offloaded abds.

RPMs will build with Z.I.A.

Signed-off-by: Jason Lee <jasonlee@lanl.gov>
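For testing without real hardware, the bundled software provider can stand in for an accelerator. The sketch below is a rough illustration only: the module name, provider string, and parameter layout are assumptions based on the module/zia-software-provider directory name and the description above, not values taken from this commit.

    # load the example software provider (module name is assumed; use the
    # name the build actually produces)
    modprobe zia_software_provider

    # select it for a pool; the provider string is whatever the module
    # registers with the DPUSM, so this value is a placeholder
    zpool set zia_provider=<software provider name> tank

    # the module parameters (under /sys/module/<module>/parameters/) can be
    # toggled to force ZIA_ACCELERATOR_DOWN and exercise pipeline restarts
    ls /sys/module/zia_software_provider/parameters/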
1 parent 92157c8 commit 6917b2f


57 files changed: +5544, -70 lines

Makefile.am

Lines changed: 2 additions & 0 deletions
@@ -57,6 +57,8 @@ dist_noinst_DATA += module/os/linux/spl/THIRDPARTYLICENSE.gplv2
 dist_noinst_DATA += module/os/linux/spl/THIRDPARTYLICENSE.gplv2.descrip
 dist_noinst_DATA += module/zfs/THIRDPARTYLICENSE.cityhash
 dist_noinst_DATA += module/zfs/THIRDPARTYLICENSE.cityhash.descrip
+dist_noinst_DATA += module/zfs/THIRDPARTYLICENSE.zia
+dist_noinst_DATA += module/zfs/THIRDPARTYLICENSE.zia.descrip
 
 @CODE_COVERAGE_RULES@
 

config/Rules.am

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ AM_CPPFLAGS += -DPKGDATADIR=\"$(pkgdatadir)\"
 AM_CPPFLAGS += $(DEBUG_CPPFLAGS)
 AM_CPPFLAGS += $(CODE_COVERAGE_CPPFLAGS)
 AM_CPPFLAGS += -DTEXT_DOMAIN=\"zfs-@ac_system_l@-user\"
+AM_CPPFLAGS += $(ZIA_CPPFLAGS)
 
 if ASAN_ENABLED
 AM_CPPFLAGS += -DZFS_ASAN_ENABLED

config/zfs-build.m4

Lines changed: 8 additions & 1 deletion
@@ -263,6 +263,8 @@ AC_DEFUN([ZFS_AC_CONFIG], [
 AC_SUBST(TEST_JOBS)
 ])
 
+ZFS_AC_ZIA
+
 ZFS_INIT_SYSV=
 ZFS_INIT_SYSTEMD=
 ZFS_WANT_MODULES_LOAD_D=
@@ -294,7 +296,8 @@ AC_DEFUN([ZFS_AC_CONFIG], [
 [test "x$qatsrc" != x ])
 AM_CONDITIONAL([WANT_DEVNAME2DEVID], [test "x$user_libudev" = xyes ])
 AM_CONDITIONAL([WANT_MMAP_LIBAIO], [test "x$user_libaio" = xyes ])
-AM_CONDITIONAL([PAM_ZFS_ENABLED], [test "x$enable_pam" = xyes])
+AM_CONDITIONAL([PAM_ZFS_ENABLED], [test "x$enable_pam" = xyes ])
+AM_CONDITIONAL([ZIA_ENABLED], [test "x$enable_zia" = xyes ])
 ])
 
 dnl #
@@ -342,6 +345,10 @@ AC_DEFUN([ZFS_AC_RPM], [
 RPM_DEFINE_COMMON=${RPM_DEFINE_COMMON}' --define "__strip /bin/true"'
 ])
 
+AS_IF([test "x$enable_zia" = xyes], [
+    RPM_DEFINE_COMMON=${RPM_DEFINE_COMMON}' --define "$(WITH_ZIA) 1" --define "DPUSM_ROOT $(DPUSM_ROOT)"'
+])
+
 RPM_DEFINE_UTIL=' --define "_initconfdir $(initconfdir)"'
 
 dnl # Make the next three RPM_DEFINE_UTIL additions conditional, since

config/zia.m4

Lines changed: 45 additions & 0 deletions
@@ -0,0 +1,45 @@
+dnl # Adds --with-zia=PATH to configuration options
+dnl # The path provided should point to the DPUSM
+dnl # root and contain Module.symvers.
+AC_DEFUN([ZFS_AC_ZIA], [
+    AC_ARG_WITH([zia],
+        AS_HELP_STRING([--with-zia=PATH],
+            [Path to Data Processing Services Module]),
+        [
+            DPUSM_ROOT="$withval"
+            AS_IF([test "x$DPUSM_ROOT" != "xno"],
+                [enable_zia=yes],
+                [enable_zia=no])
+        ],
+        [enable_zia=no]
+    )
+
+    AS_IF([test "x$enable_zia" == "xyes"],
+        AS_IF([! test -d "$DPUSM_ROOT"],
+            [AC_MSG_ERROR([--with-zia=PATH requires the DPUSM root directory])]
+        )
+
+        DPUSM_SYMBOLS="$DPUSM_ROOT/Module.symvers"
+
+        AS_IF([test -r $DPUSM_SYMBOLS],
+            [
+                AC_MSG_RESULT([$DPUSM_SYMBOLS])
+                ZIA_CPPFLAGS="-DZIA=1 -I$DPUSM_ROOT/include"
+                KERNEL_ZIA_CPPFLAGS="-DZIA=1 -I$DPUSM_ROOT/include"
+                WITH_ZIA="_with_zia"
+
+                AC_SUBST(WITH_ZIA)
+                AC_SUBST(KERNEL_ZIA_CPPFLAGS)
+                AC_SUBST(ZIA_CPPFLAGS)
+                AC_SUBST(DPUSM_SYMBOLS)
+                AC_SUBST(DPUSM_ROOT)
+            ],
+            [
+                AC_MSG_ERROR([
+                *** Failed to find Module.symvers in:
+                $DPUSM_SYMBOLS
+                ])
+            ]
+        )
+    )
+])
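Tying this check back to steps 3-4 of the commit message, a rebuild against a DPUSM checkout might look like the sketch below; the DPUSM path is a placeholder, and the DPUSM must already have been built so that Module.symvers exists.

    ./autogen.sh
    ./configure --with-zia=/path/to/dpusm
    make -s -j$(nproc)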

include/Makefile.am

Lines changed: 3 additions & 0 deletions
@@ -143,6 +143,9 @@ COMMON_H = \
 	sys/zfs_vfsops.h \
 	sys/zfs_vnops.h \
 	sys/zfs_znode.h \
+	sys/zia.h \
+	sys/zia_cddl.h \
+	sys/zia_private.h \
 	sys/zil.h \
 	sys/zil_impl.h \
 	sys/zio.h \

include/sys/abd.h

Lines changed: 1 addition & 0 deletions
@@ -66,6 +66,7 @@ typedef struct abd {
 			list_t abd_gang_chain;
 		} abd_gang;
 	} abd_u;
+	void *abd_zia_handle;
 } abd_t;
 
 typedef int abd_iter_func_t(void *buf, size_t len, void *priv);

include/sys/fs/zfs.h

Lines changed: 13 additions & 0 deletions
@@ -272,6 +272,19 @@ typedef enum {
 	ZPOOL_PROP_DEDUP_TABLE_QUOTA,
 	ZPOOL_PROP_DEDUPCACHED,
 	ZPOOL_PROP_LAST_SCRUBBED_TXG,
+	ZPOOL_PROP_ZIA_AVAILABLE,
+	ZPOOL_PROP_ZIA_PROVIDER,
+	ZPOOL_PROP_ZIA_COMPRESS,
+	ZPOOL_PROP_ZIA_DECOMPRESS,
+	ZPOOL_PROP_ZIA_CHECKSUM,
+	ZPOOL_PROP_ZIA_RAIDZ1_GEN,
+	ZPOOL_PROP_ZIA_RAIDZ2_GEN,
+	ZPOOL_PROP_ZIA_RAIDZ3_GEN,
+	ZPOOL_PROP_ZIA_RAIDZ1_REC,
+	ZPOOL_PROP_ZIA_RAIDZ2_REC,
+	ZPOOL_PROP_ZIA_RAIDZ3_REC,
+	ZPOOL_PROP_ZIA_FILE_WRITE,
+	ZPOOL_PROP_ZIA_DISK_WRITE,
 	ZPOOL_NUM_PROPS
 } zpool_prop_t;
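Assuming the usual mapping from ZPOOL_PROP_* enum entries to lowercase pool property names (e.g. ZPOOL_PROP_ZIA_PROVIDER to zia_provider) - an inference from the enum names, not something shown in this hunk - the new state can be inspected from userspace:

    # pool name is a placeholder; property names are inferred from the enum
    zpool get zia_available,zia_provider,zia_compress,zia_checksum tank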

include/sys/spa_impl.h

Lines changed: 3 additions & 0 deletions
@@ -53,6 +53,7 @@
 #include <sys/zfeature.h>
 #include <sys/zthr.h>
 #include <sys/dsl_deadlist.h>
+#include <sys/zia.h>
 #include <zfeature_common.h>
 
 #ifdef __cplusplus
@@ -474,6 +475,8 @@ struct spa {
 	 */
 	spa_config_lock_t spa_config_lock[SCL_LOCKS]; /* config changes */
 	zfs_refcount_t spa_refcount; /* number of opens */
+
+	zia_props_t spa_zia_props;
 };
 
 extern char *spa_config_path;

include/sys/vdev_disk.h

Lines changed: 8 additions & 0 deletions
@@ -43,5 +43,13 @@
 
 #ifdef _KERNEL
 #include <sys/vdev.h>
+
+#ifdef __linux__
+int __vdev_classic_physio(struct block_device *bdev, zio_t *zio,
+    size_t io_size, uint64_t io_offset, int rw, int flags);
+int vdev_disk_io_flush(struct block_device *bdev, zio_t *zio);
+void vdev_disk_error(zio_t *zio);
+#endif /* __linux__ */
+
 #endif /* _KERNEL */
 #endif /* _SYS_VDEV_DISK_H */

include/sys/vdev_file.h

Lines changed: 4 additions & 0 deletions
@@ -41,6 +41,10 @@ typedef struct vdev_file {
 extern void vdev_file_init(void);
 extern void vdev_file_fini(void);
 
+#ifdef __linux__
+extern mode_t vdev_file_open_mode(spa_mode_t spa_mode);
+#endif
+
 #ifdef __cplusplus
 }
 #endif
