This code implements optimal parallel loops for BKMA-like algorithms. Two use cases are implemented: a 1D semi-Lagrangian advection operator and a 1D convolution operator.
Implements a 1D convolution operator in place using BKMA strategies.
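The in-place constraint is the reason a scratch buffer is needed: each output element reads neighbours that may already have been overwritten. The sketch below illustrates that idea in plain, sequential C++ and is not the repository's SYCL kernel. The function name `conv1d_inplace`, the row-major batch layout, and the zero padding at the boundaries are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the repository): convolve every row of
// a row-major (n_batch x n) array in place with an odd-length, centered
// kernel. Neighbours outside the row are treated as zero (an assumption).
void conv1d_inplace(std::vector<double>& data, std::size_t n_batch,
                    std::size_t n, const std::vector<double>& kernel) {
    assert(data.size() == n_batch * n);
    assert(kernel.size() % 2 == 1);
    const std::ptrdiff_t half = static_cast<std::ptrdiff_t>(kernel.size() / 2);
    const std::ptrdiff_t len = static_cast<std::ptrdiff_t>(n);

    // One scratch line reused by every batch. On a device, this role would be
    // played by per-work-group local memory.
    std::vector<double> scratch(n);

    for (std::size_t b = 0; b < n_batch; ++b) {
        double* row = data.data() + b * n;

        // 1) Copy the row so that in-place writes cannot corrupt later reads.
        for (std::size_t i = 0; i < n; ++i) scratch[i] = row[i];

        // 2) Write the convolved values back into the same storage:
        //    out[i] = sum_k kernel[k + half] * in[i - k], k in [-half, half].
        for (std::ptrdiff_t i = 0; i < len; ++i) {
            double acc = 0.0;
            for (std::ptrdiff_t k = -half; k <= half; ++k) {
                const std::ptrdiff_t j = i - k;
                if (j >= 0 && j < len) {
                    acc += kernel[static_cast<std::size_t>(k + half)] *
                           scratch[static_cast<std::size_t>(j)];
                }
            }
            row[i] = acc;
        }
    }
}
```

The batch loop is the part a parallel implementation would distribute, since the rows are independent of each other. The scratch line is the memory allocation each batch needs.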
Implements a 1D advection operator inside a multidimensional space. It uses a semi-Lagrangian scheme written with the SYCL 2020 programming model.
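To make "a 1D operator inside a multidimensional space" concrete, the sketch below advects a 3D array along its middle dimension only, in plain sequential C++. It is an illustration under stated assumptions rather than the repository's kernel: the name `advect_dim1`, the constant velocity, the periodic boundaries, and the linear interpolation are all hypothetical choices (the actual code may use a higher-order interpolator).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not taken from the repository): one semi-Lagrangian
// step along dimension 1 of f(i0, i1, i2), stored row-major as
// f[(i0 * n1 + i1) * n2 + i2]. Assumes constant velocity, periodic
// boundaries and linear interpolation.
void advect_dim1(std::vector<double>& f, std::size_t n0, std::size_t n1,
                 std::size_t n2, double velocity, double dt, double dx) {
    assert(f.size() == n0 * n1 * n2);
    assert(n1 > 0 && dx > 0.0);
    const double shift = velocity * dt / dx;  // displacement in grid cells
    const double n1d = static_cast<double>(n1);

    // Scratch copy of one 1D line; each (i0, i2) pair is an independent batch.
    std::vector<double> line(n1);

    for (std::size_t i0 = 0; i0 < n0; ++i0) {
        for (std::size_t i2 = 0; i2 < n2; ++i2) {
            const std::size_t base = i0 * n1 * n2 + i2;

            // Gather the strided line before overwriting it in place.
            for (std::size_t i1 = 0; i1 < n1; ++i1) line[i1] = f[base + i1 * n2];

            for (std::size_t i1 = 0; i1 < n1; ++i1) {
                // Foot of the characteristic, wrapped into [0, n1).
                double x = static_cast<double>(i1) - shift;
                x -= std::floor(x / n1d) * n1d;
                const std::size_t left = static_cast<std::size_t>(x) % n1;
                const std::size_t right = (left + 1) % n1;
                const double w = x - std::floor(x);  // interpolation weight
                f[base + i1 * n2] = (1.0 - w) * line[left] + w * line[right];
            }
        }
    }
}
```

With `velocity * dt / dx` equal to 1, every line is shifted by exactly one cell along dimension 1, which makes a convenient sanity check. The stride of `n2` between consecutive elements of a line is what makes the memory access pattern of this operator depend on which dimension is advected.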
To reproduce the benchmark, follow the benchmark README.md instructions.
The algorithm is implemented in several ways using different SYCL constructs. It requires local memory allocation via the local accessor. The implementations are in the src/core directory.
- BasicRange (out-of-place), no hierarchical parallelism involved
- NDRange (in-place), work-groups and work-items, direct mapping of the problem dimensions
- AdaptiveWg (in-place or out-of-place), optimized work-group sizes, streaming, optimal local memory usage
You can use the compile.sh script to compile for various hardware and SYCL implementations. For multi-device compilation flows, build the project manually.
Run ./compile.sh --help to see the options.
Example usage:
#generate the advection executable and advection.ini file
./compile.sh --hw cpu --sycl dpcpp
#create build_dpcpp_a100 folder with benchmarks
./compile.sh --hw a100 --sycl dpcpp --benchmark_DIR=/path/to/google/benchmark/build
#create build_acpp_mi300 folder with tests and execute tests
./compile.sh --hw mi300 --sycl acpp --build-tests --run-tests
Flags vary depending on the SYCL implementation you are using.
- For DPC++, add the correct flags via the -DDPCPP_FSYCL_TARGETS CMake variable.
- For acpp, export the ACPP_TARGETS environment variable before compiling.
- Set the runtime parameters in build/src/<conv1d|advection>.ini
- Run the executable build/src/advection/<conv1d|advection>
The advection operator in this code is largely inspired by the vlp4D code.