Description
Is your feature request related to a problem? Please describe.
The security of FATE depends on several privacy-preserving computation techniques, whose cryptographic operations are time-consuming. The bottleneck is the arithmetic on ciphertexts, which are usually large integers. It is hard for a CPU to achieve high performance on this workload without changing the underlying algorithm, so it is worthwhile to support other types of computing devices for these calculations.
Describe the solution you'd like
This proposal aims to let FATE support a heterogeneous acceleration architecture, including the APIs, data structures, and other components needed to combine FATE efficiently with hardware devices.
Unlike CPUs, hardware devices such as FPGAs and GPUs are built around SIMD-style parallel execution. Their high degree of computing parallelism lets them achieve considerable throughput. It is therefore promising to use these devices to accelerate the compute-intensive calculations in FATE. This use of hardware devices is called heterogeneous acceleration for FATE.
To successfully leverage hardware-based accelerators, a three-layer architecture is needed in general.
The lowermost layer is a library that implements the cryptographic operations by managing and calling the devices. It should be extensible, since both the set of operations and the variety of devices are expected to grow. For efficient interaction with the hardware, such a library is usually written in C/C++, so an effective cross-language binding is also required to make it accessible from FATE.
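One possible way to bind such a library is Python's standard `ctypes` module. The sketch below is only an illustration under stated assumptions: the library path, the `batch_mod_mul` symbol, and its C prototype are hypothetical and not part of FATE or of any existing library. When the native library is not installed, the wrapper falls back to plain Python so the calling code still works.

```python
import ctypes
from typing import List, Optional


def load_device_library(path: str) -> Optional[ctypes.CDLL]:
    """Try to load the native library; return None if it cannot be found."""
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    # Assumed (hypothetical) C prototype:
    #   int batch_mod_mul(const void *a, const void *b, const void *mod,
    #                     void *out, size_t count, size_t width);
    lib.batch_mod_mul.argtypes = [ctypes.c_void_p] * 4 + [ctypes.c_size_t] * 2
    lib.batch_mod_mul.restype = ctypes.c_int
    return lib


def batch_mod_mul(lib: Optional[ctypes.CDLL], a: List[int], b: List[int],
                  modulus: int, width: int) -> List[int]:
    """Element-wise (a[i] * b[i]) % modulus, on the device if available."""
    if len(a) != len(b):
        raise ValueError("operands must have the same length")
    if lib is None:
        # CPU fallback in pure Python.
        return [(x * y) % modulus for x, y in zip(a, b)]

    def pack(values: List[int]) -> bytes:
        # Fixed-width little-endian records in one contiguous buffer.
        return b"".join(v.to_bytes(width, "little") for v in values)

    out = ctypes.create_string_buffer(len(a) * width)
    status = lib.batch_mod_mul(pack(a), pack(b),
                               modulus.to_bytes(width, "little"),
                               out, len(a), width)
    if status != 0:
        raise RuntimeError(f"device call failed with status {status}")
    raw = out.raw
    return [int.from_bytes(raw[i * width:(i + 1) * width], "little")
            for i in range(len(a))]
```

Other binding tools (for example a compiled extension module) would serve the same purpose; `ctypes` is used here only because it needs no build step.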
The middle layer acts as the middleware of the architecture. It defines the data structures used to store and transfer the data involved in cryptographic operations. As mentioned above, the high throughput of hardware devices comes from high parallelism, so operating on a single item of data at a time wastes most of the device's capacity. To maximize device utilization and minimize the overhead of cross-device interaction, data should be stored in blocks and processed by the device in parallel. A memory layout that is both transfer-friendly and compute-friendly is the key point in the design of the data structure.
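As a minimal sketch of what a block-oriented layout could look like, the example below packs many large integers into one contiguous buffer of fixed-width records, so that a whole block can be transferred in a single copy and indexed by offset on the device. The class name `CiphertextBlock`, its fields, and the little-endian fixed-width layout are assumptions made for illustration, not a proposed FATE format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CiphertextBlock:
    """Hypothetical container: `count` big integers in one contiguous buffer."""
    data: bytes   # count * width bytes, little-endian, fixed width per element
    count: int    # number of integers in the block
    width: int    # bytes per integer (e.g. 256 for a 2048-bit value)

    @classmethod
    def pack(cls, values: List[int], bits: int) -> "CiphertextBlock":
        width = (bits + 7) // 8
        buf = bytearray(len(values) * width)
        for i, v in enumerate(values):
            # to_bytes raises OverflowError if v does not fit in `width` bytes.
            buf[i * width:(i + 1) * width] = v.to_bytes(width, "little")
        return cls(bytes(buf), len(values), width)

    def unpack(self) -> List[int]:
        return [
            int.from_bytes(self.data[i * self.width:(i + 1) * self.width], "little")
            for i in range(self.count)
        ]


values = [1, 2**100, 2**2047]
block = CiphertextBlock.pack(values, bits=2048)
restored = block.unpack()
```

Fixed-width records trade some padding for constant-time addressing: element `i` always starts at byte `i * width`, which suits parallel workers that each handle one element.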
The uppermost layer consists of computing APIs that are similar to the current APIs in FATE. These APIs overload the cryptographic operations for the data structures defined in the middle layer, so they can be called without much modification to FATE. Besides the calculations themselves, a minimal but necessary data format conversion is required before the operations run, in order to construct the well-defined data structure.
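The overloading idea can be sketched as follows. The example assumes Paillier-style ciphertexts, where homomorphic addition of plaintexts is multiplication of ciphertexts modulo n squared; the names `EncryptedVector`, `Backend`, and `CpuBackend` are hypothetical, and `CpuBackend` is only a stand-in for a real device library.

```python
from typing import List, Protocol


class Backend(Protocol):
    def mod_mul(self, a: List[int], b: List[int], modulus: int) -> List[int]: ...


class CpuBackend:
    """Stand-in backend; a device backend would dispatch the batch to a GPU/FPGA."""

    def mod_mul(self, a: List[int], b: List[int], modulus: int) -> List[int]:
        return [(x * y) % modulus for x, y in zip(a, b)]


class EncryptedVector:
    """Hypothetical batch of ciphertexts sharing one modulus (n squared)."""

    def __init__(self, ciphertexts: List[int], n_squared: int, backend: Backend):
        self.ciphertexts = ciphertexts
        self.n_squared = n_squared
        self.backend = backend

    def __add__(self, other: "EncryptedVector") -> "EncryptedVector":
        if len(self.ciphertexts) != len(other.ciphertexts):
            raise ValueError("vectors must have the same length")
        if self.n_squared != other.n_squared:
            raise ValueError("vectors must share the same public key")
        # One backend call handles the whole batch instead of one call per element.
        result = self.backend.mod_mul(self.ciphertexts, other.ciphertexts,
                                      self.n_squared)
        return EncryptedVector(result, self.n_squared, self.backend)


backend = CpuBackend()
x = EncryptedVector([2, 3], n_squared=7, backend=backend)  # toy modulus, not secure
y = EncryptedVector([4, 5], n_squared=7, backend=backend)
z = x + y
```

Because the caller only writes `x + y`, swapping `CpuBackend` for a device-backed implementation would not change the calling code.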
Furthermore, a few minor changes are required to fit the architecture above. For example, additional training parameters are needed so that the user can specify the hardware configuration, including the type and number of devices.
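Such parameters might be modeled as a small validated configuration object. This is a sketch only: the class name `DeviceConfig`, the field names, and the defaults are assumptions, and the set of accepted device types simply mirrors the devices named in this proposal.

```python
from dataclasses import dataclass

# Assumed set of device types, taken from the devices mentioned in this proposal.
SUPPORTED_DEVICES = ("CPU", "GPU", "FPGA")


@dataclass(frozen=True)
class DeviceConfig:
    """Hypothetical training parameters describing the accelerator to use."""
    device_type: str = "CPU"
    device_count: int = 1

    def __post_init__(self) -> None:
        if self.device_type not in SUPPORTED_DEVICES:
            raise ValueError(
                f"device_type must be one of {SUPPORTED_DEVICES}, "
                f"got {self.device_type!r}"
            )
        if self.device_count < 1:
            raise ValueError("device_count must be at least 1")


config = DeviceConfig(device_type="GPU", device_count=2)
```

Defaulting to a single CPU keeps existing jobs unchanged when the new parameters are omitted.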