Introduction
This RFC aims to provide a framework for native Zephyr support for a range of different embedded SoC hardware blocks that accelerate common functionality such as DSP, trigonometric offloading, image processing, and other compute-intensive tasks.
Examples of some of the hardware blocks that fall into this category:
- STM32 CORDIC block for trigonometric and hyperbolic function calculation
- STM32 FMAC multiply/accumulate block used for FIR or IIR filtering, or for convolution
- STM32 DFSDM for low-pass filtering and decimation
- NXP S32R Signal Processing Toolbox (SPT) capable of FFT, histogram calculation, and trig functions
- NXP S32R BBE32 DSP
- NXP S32V APEX vision core capable of a wide variety of image algorithms
Problem description
Many SoCs provide hardware accelerators that can significantly improve the efficiency of various embedded processing tasks. Currently Zephyr does not provide a driver API for accessing any of these hardware accelerators, so users must use whatever HAL the vendor provides for everything, including clock, reset, IRQ, and DMA configuration.
Proposed change
Because many of these hardware accelerators perform vastly different functions, providing an API that explicitly and generically exposes all capabilities of all supported modules is not feasible. Instead, I propose a flexible hardware accelerator API that allows drivers to expose their unique capabilities.
Detailed RFC
WIP
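__subsystem struct accel_driver_api {
	/* WIP: argument types below are placeholders, not settled by this RFC. */
	void *(*get_hal_handle)(const struct device *dev);
	int (*query_hw_caps)(const struct device *dev, struct accel_hw_caps *caps);
#ifdef CONFIG_RTIO
	int (*iodev_submit)(const struct device *dev, struct rtio_iodev_sqe *iodev_sqe);
#endif
	int (*config)(const struct device *dev, const void *hw_specific);
	int (*set_buffer_format)(const struct device *dev, uint8_t which, uint32_t fmt);
	int (*write_vector)(const struct device *dev, uint8_t which, const void *buf, size_t len);
	int (*callback_set)(const struct device *dev, uint8_t which, accel_callback_t cb);
	int (*start)(const struct device *dev);
	int (*abort)(const struct device *dev);
};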
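As a rough illustration of how a driver might plug into an API table like the one above, here is a minimal standalone sketch. It does not use real Zephyr headers: `struct device` is a stand-in, the `accel_*` wrappers and the `fake_accel_*` driver are hypothetical names, and the argument types are simplified guesses, since the RFC leaves them open.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for Zephyr's struct device; only what this sketch needs. */
struct device {
	const void *api;
	void *data;
};

/* Reduced version of the proposed API table (types are assumptions). */
struct accel_driver_api {
	int (*write_vector)(const struct device *dev, uint8_t which,
			    const void *buf, size_t len);
	int (*start)(const struct device *dev);
	int (*abort)(const struct device *dev);
};

/* Wrappers dispatch through the table; unimplemented ops yield -ENOSYS. */
static inline int accel_write_vector(const struct device *dev, uint8_t which,
				     const void *buf, size_t len)
{
	const struct accel_driver_api *api = dev->api;

	return api->write_vector ? api->write_vector(dev, which, buf, len)
				 : -ENOSYS;
}

static inline int accel_start(const struct device *dev)
{
	const struct accel_driver_api *api = dev->api;

	return api->start ? api->start(dev) : -ENOSYS;
}

static inline int accel_abort(const struct device *dev)
{
	const struct accel_driver_api *api = dev->api;

	return api->abort ? api->abort(dev) : -ENOSYS;
}

/* Hypothetical driver state: one coefficient vector and a run flag. */
struct fake_accel_data {
	const void *coeffs;
	size_t coeffs_len;
	int running;
};

static int fake_accel_write_vector(const struct device *dev, uint8_t which,
				   const void *buf, size_t len)
{
	struct fake_accel_data *data = dev->data;

	/* This fake block has a single vector slot, index 0. */
	if (which != 0 || buf == NULL || len == 0) {
		return -EINVAL;
	}
	data->coeffs = buf;
	data->coeffs_len = len;
	return 0;
}

static int fake_accel_start(const struct device *dev)
{
	struct fake_accel_data *data = dev->data;

	if (data->coeffs == NULL) {
		return -EINVAL; /* refuse to run without coefficients */
	}
	if (data->running) {
		return -EBUSY;
	}
	data->running = 1;
	return 0;
}

/* abort is deliberately left NULL to show the optional-op path. */
static const struct accel_driver_api fake_accel_api = {
	.write_vector = fake_accel_write_vector,
	.start = fake_accel_start,
};
```

In this sketch an application would load its coefficients with `accel_write_vector()` and then call `accel_start()`; whether optional operations should report `-ENOSYS` or be mandatory is one of the design choices the RFC would still need to make.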
Proposed change (Detailed)
Many of the representative hardware blocks have a few common features:
- Many (not all) can connect directly to ADCs and DACs in addition to DMA. It's unclear exactly what common functionality we can provide here, but DMA for input and output should not be required.
  - DMA tx and rx should be supported by all drivers but remain optional, with some capacity to point the driver at other hardware producers or consumers of data in a device-specific way.
- I believe most of these hardware blocks are primarily used in single dedicated applications. Some, such as the CORDIC block, may be used by multiple algorithms, but most applications appear to be served by a static configuration in which the block acts as a single-producer, single-consumer stream processor.
  - Static configuration using the devicetree should be the highest priority, with an RTIO-like API as a secondary priority.
- Several of these blocks require initialization vectors (filter coefficients, FFT twiddle tables, etc.).
  - Initialization vectors should be supported by the API, though the number of vectors and their size is hardware-specific and in some cases application-specific.
- Configuration for some of these blocks (the ST ones, at least) seems to be fairly straightforward, resembling more of a SIMD pipeline. The NXP cores, on the other hand, appear to be more MIMD, with documentation recommending configuration using a vendor-supplied tool that embeds some fixed configuration into the application.
  - Exposing a HAL-compatible device-specific hardware reference through the API would allow the user to re-use vendor-specific configuration while allowing Zephyr to manage common system-wide features like clocks, resets, and DMA.
  - Complex blocks with their own sequencer state machines may be serviceable with a microcode initialization vector (or kernel) provided by a vendor-specific tool.
- Some blocks are tuned for one-shot operation on buffers while some run continuously in a streaming mode.
  - Support three modes: one-shot, framed continuous (a callback on each frame, e.g. for streaming video processing), and simple continuous (continuous ADC sample filtering, for instance).
- Input and output data formats can be both hardware- and application-specific.
  - Support static and runtime configuration of buffer formats. Support querying of supported formats.
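The three operating modes and the format query described in the list above could be modelled roughly as follows. This is only a sketch under assumptions: every name here (`accel_mode`, `accel_buf_format`, `accel_hw_caps`, `accel_check_config`, the `ACCEL_*` constants) is invented for illustration, and the choice of bitmasks for capabilities is one option among several.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Operating modes named in the RFC text. */
enum accel_mode {
	ACCEL_MODE_ONE_SHOT,
	ACCEL_MODE_FRAMED_CONTINUOUS,
	ACCEL_MODE_CONTINUOUS,
};

/* Hypothetical buffer formats; real drivers would define their own set. */
enum accel_buf_format {
	ACCEL_FMT_Q15,
	ACCEL_FMT_Q31,
	ACCEL_FMT_F32,
};

#define ACCEL_BIT(n) (UINT32_C(1) << (n))

/* Capabilities a driver might report, as bitmasks indexed by the enums. */
struct accel_hw_caps {
	uint32_t modes;
	uint32_t in_formats;
	uint32_t out_formats;
};

static inline bool accel_caps_has_mode(const struct accel_hw_caps *caps,
				       enum accel_mode mode)
{
	return (caps->modes & ACCEL_BIT(mode)) != 0;
}

/* Check a requested configuration against the reported capabilities. */
static inline int accel_check_config(const struct accel_hw_caps *caps,
				     enum accel_mode mode,
				     enum accel_buf_format in_fmt,
				     enum accel_buf_format out_fmt)
{
	if (!accel_caps_has_mode(caps, mode)) {
		return -ENOTSUP;
	}
	if ((caps->in_formats & ACCEL_BIT(in_fmt)) == 0 ||
	    (caps->out_formats & ACCEL_BIT(out_fmt)) == 0) {
		return -ENOTSUP;
	}
	return 0;
}
```

A runtime `set_buffer_format()` implementation could run a check like this before touching hardware, while a statically configured instance could perform the same validation once at init.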
Dependencies
Some of these devices will be really useful for the DSP subsystem. For now I'm not including any features in this RFC that would enable this hardware to be used with the DSP API.
Concerns and Unresolved Questions
Some of this hardware is so application-specific that it may not make sense to wrap it in any kind of standard API. This RFC isn't necessarily intended to pull in every DSP or image processing block out there, just more of them than are available today.
I am currently explicitly excluding AI and ML hardware accelerators from the RFC, mainly because of my lack of understanding. These increasingly-common blocks may or may not fit into this API.
Alternatives
Currently the only alternative I'm aware of is for users to use the vendor-specific HAL to take advantage of this hardware.
As one of the main benefits this RFC provides is consistent treatment of clocks, resets, interrupts, and DMA, one alternative is a single reusable generic driver that provides only common initialization routines for those shared resources. Entries for this hardware could be added to the device trees of compatible hardware and the user could gain access to the HAL-compatible handles using a simple API. This would essentially move the implementation of hardware-specific configuration to the user application, but still provide consistent initialization of shared resources in Zephyr.
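To make this alternative a little more concrete, here is a minimal standalone sketch of such a generic driver: it only brings up shared resources and then hands back an opaque handle for the vendor HAL. All names are hypothetical, the `clock_enable` and `reset_release` hooks stand in for whatever Zephyr clock and reset calls a real driver would make, and using the register base address as the HAL handle is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Per-instance configuration; in Zephyr this would come from devicetree. */
struct accel_generic_config {
	uintptr_t base;             /* register block base address */
	int (*clock_enable)(void);  /* stand-in for clock control */
	int (*reset_release)(void); /* stand-in for reset line control */
};

struct accel_generic_data {
	bool ready;
};

/* Bring up the shared resources only; no block-specific programming. */
static int accel_generic_init(const struct accel_generic_config *cfg,
			      struct accel_generic_data *data)
{
	int ret = cfg->clock_enable();

	if (ret < 0) {
		return ret;
	}
	ret = cfg->reset_release();
	if (ret < 0) {
		return ret;
	}
	data->ready = true;
	return 0;
}

/* Return the handle the vendor HAL expects, or NULL before init succeeds. */
static void *accel_generic_get_hal_handle(const struct accel_generic_config *cfg,
					  const struct accel_generic_data *data)
{
	return data->ready ? (void *)cfg->base : NULL;
}

/* Trivial hooks so the sketch can be exercised without hardware. */
static int stub_ok(void)
{
	return 0;
}

static int stub_fail(void)
{
	return -EIO;
}
```

The application would pass the returned handle straight to the vendor HAL, which keeps all block-specific configuration in user code, as described above.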