db-tu-dresden/BTW2025-TSL-Tutorial

SIMD for Everyone - A tutorial to TSL

This repository contains supplementary material for the tutorial "SIMD for Everyone - A tutorial to TSL", presented at the Workshop on Novel Data Management Ideas on Heterogeneous Hardware Architectures (NoDMC) at the 21st Conference on Database Systems for Business, Technology and Web (BTW 2025) in Bamberg, Germany.

If you want to work offline, clone the repository (including its submodules) and open it in VS Code using the provided devcontainer:

git clone --recurse-submodules https://github.com/db-tu-dresden/BTW2025-TSL-Tutorial.git

The setup is tested on:

  • x86-64 Linux (Arch, Ubuntu)
  • aarch64 Linux (Ubuntu)
  • x86-64 Windows 11

On macOS with Apple silicon (M1 or later), change line 4 in .devcontainer/devcontainer.json from "Dockerfile" to "DockerfileARM".

If you run into problems setting up the environment, we highly recommend using GitHub Codespaces with the provided devcontainer.

To bootstrap the tutorial, the repository provides a devcontainer and an associated Dockerfile. The resulting image contains:

  • Development tools (for building the examples)
    • clang, gcc (generate x86 binaries)
    • aarch64-linux-gnu-gcc, aarch64-linux-gnu-binutils (generate aarch64 binaries)
    • cmake, make, ninja
    • python (execute the TSL generator)
  • Execution/Emulation environment
    • qemu (run/emulate aarch64 code on x86)
    • intel-sde (emulate the latest x86 hardware on older x86 platforms)
  • TSL
    • The TSL is installed under /usr/include/tsl when the container starts up, so it can be included and used directly.
    • Additionally, the TSL generator is located under 3rdparty/tslgen.

Over the course of the tutorial, we provide an introduction to using and extending the Template SIMD Library (TSL, available on GitHub) to exploit hardware-provided SIMD capabilities in a hardware-agnostic way.

Filter-Aggregation: TSL in action

To give an overview of how to use and extend the TSL, we use a filter-aggregation kernel as a toy example. The kernel iterates over two contiguous memory regions: it loads data from the first region, compares each element with a given value, and accumulates the corresponding values from the second region. To exploit data-level parallelism, we implement the algorithm with explicit SIMD programming using the TSL (an overview of the supported instructions can be found here). The associated files are:

The code can be built using either the VS Code CMake integration or a terminal:

cmake -S . -B build && cmake --build build -j4

The resulting binaries will be located under ./bin/
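For reference, the filter-aggregation logic described above can be sketched in plain C++ without the TSL. This is not the repository's code: the function names filter_agg_scalar and filter_agg_blocked, the 64-bit key and value types, and the equality predicate are assumptions for illustration. The second function mimics the shape a SIMD version typically takes: it processes W lanes per step, turns the comparison into a per-lane mask instead of a branch, keeps one partial sum per lane, reduces the lanes horizontally, and handles the remaining n % W elements in a scalar tail.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Scalar reference (hypothetical helper, not the repository's API):
// sum values[i] for every i where keys[i] == predicate.
std::uint64_t filter_agg_scalar(const std::uint64_t* keys,
                                const std::uint64_t* values,
                                std::size_t n, std::uint64_t predicate) {
  assert(n == 0 || (keys != nullptr && values != nullptr));
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (keys[i] == predicate) sum += values[i];
  }
  return sum;
}

// SIMD-shaped variant in plain C++ (no TSL calls): W "lanes" per step.
template <std::size_t W>
std::uint64_t filter_agg_blocked(const std::uint64_t* keys,
                                 const std::uint64_t* values,
                                 std::size_t n, std::uint64_t predicate) {
  static_assert(W > 0, "lane count must be positive");
  assert(n == 0 || (keys != nullptr && values != nullptr));
  std::array<std::uint64_t, W> acc{};  // one partial sum per lane
  std::size_t i = 0;
  for (; i + W <= n; i += W) {
    for (std::size_t l = 0; l < W; ++l) {
      // "compare": all-ones mask if the key matches, all-zeros otherwise
      const std::uint64_t mask =
          std::uint64_t{0} - static_cast<std::uint64_t>(keys[i + l] == predicate);
      // "masked add": lanes that do not match add zero
      acc[l] += values[i + l] & mask;
    }
  }
  // horizontal reduction of the per-lane partial sums
  std::uint64_t sum = 0;
  for (std::uint64_t partial : acc) sum += partial;
  // scalar tail for the remaining n % W elements
  for (; i < n; ++i) {
    if (keys[i] == predicate) sum += values[i];
  }
  return sum;
}
```

For example, filter_agg_blocked<8>(keys, values, n, 3) mirrors the eight 64-bit lanes of an AVX-512 register. In a TSL version, the inner lane loop would be replaced by vector load, compare, and masked-add primitives; their exact names are best taken from the TSL instruction overview rather than from this sketch.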

Adding std::float16_t: Extending TSL

To demonstrate how to extend the TSL, we add support for half-precision floating-point numbers to the filter-aggregation kernel. As a showcase, we extend only the filtering part. To make the TSL support std::float16_t, the following files need to be changed:

An experimental implementation can be found here.
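As a rough illustration of what the filtering part means for half-precision keys, the following plain C++ sketch (not the experimental TSL implementation) compares std::float16_t keys against a predicate and sums 64-bit values. It assumes a toolchain that provides <stdfloat> and defines __STDCPP_FLOAT16_T__ (for example, a recent g++ in C++23 mode); otherwise it falls back to float, which keeps the code compiling but loses half-precision rounding. The alias half_t and the function filter_agg_f16 are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#if __has_include(<stdfloat>)
#include <stdfloat>
#endif

// Hypothetical alias: std::float16_t where available, float otherwise.
// The float fallback does NOT reproduce half-precision rounding.
#if defined(__STDCPP_FLOAT16_T__)
using half_t = std::float16_t;
#else
using half_t = float;
#endif

// Filter part on half-precision keys (hypothetical helper): sum values[i]
// for every i where keys[i] == predicate. Equality on the stored values is
// exact; rounding only happens when keys/predicate are converted to half_t.
std::uint64_t filter_agg_f16(const half_t* keys, const std::uint64_t* values,
                             std::size_t n, half_t predicate) {
  assert(n == 0 || (keys != nullptr && values != nullptr));
  std::uint64_t sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (keys[i] == predicate) sum += values[i];
  }
  return sum;
}
```

Keep in mind that std::float16_t has an 11-bit significand, so nearby keys can collapse into the same value: static_cast<std::float16_t>(2049.0) rounds to 2048, so such a key would match a predicate of 2048. Also, C++23 does not allow implicit conversions from double or float to std::float16_t, which is why values are converted with static_cast.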

To generate the TSL and prepare the build environment, run

cmake -S . -B build_fp16 -DGENERATE_TSL=True -DArchId=sapphirerapids -DCMAKE_CXX_COMPILER=g++

As clang-19 seems to lack support for std::float16_t, we use g++-14 here.

Next, build the code using

cmake --build build_fp16 -j4

Since we assume that participants lack direct access to Intel Sapphire Rapids cores, we use intel-sde to emulate the required hardware with AVX512-FP16 support:

intel-sde -spr -- bin/filter_agg
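To see why emulation is needed, it can help to compare what a binary was compiled for with what the CPU reports. The following sketch relies on the GCC/Clang predefined macros __AVX2__, __AVX512F__, __AVX512FP16__ and __ARM_NEON, and on the x86-only builtin __builtin_cpu_supports. The function names compile_time_isa and host_has_avx512f are hypothetical, and we assume that the Sapphire Rapids build passes a matching -march flag to the compiler.

```cpp
#include <string>

// Which SIMD extensions the *compiler* was allowed to target, derived from
// predefined macros (GCC/Clang). Hypothetical helper for illustration.
std::string compile_time_isa() {
  std::string s;
#if defined(__x86_64__) || defined(__i386__)
  s += "x86";
#if defined(__AVX2__)
  s += " avx2";
#endif
#if defined(__AVX512F__)
  s += " avx512f";
#endif
#if defined(__AVX512FP16__)
  s += " avx512fp16";
#endif
#elif defined(__aarch64__)
  s += "aarch64";
#if defined(__ARM_NEON)
  s += " neon";
#endif
#else
  s += "other";
#endif
  return s;
}

// What the running CPU (or the CPU emulated by intel-sde) reports via CPUID.
// __builtin_cpu_supports is a GCC/Clang builtin available on x86 only.
bool host_has_avx512f() {
#if defined(__x86_64__) || defined(__i386__)
  __builtin_cpu_init();
  return __builtin_cpu_supports("avx512f") != 0;
#else
  return false;
#endif
}
```

If compile_time_isa() lists avx512fp16 while host_has_avx512f() returns false when run natively, the host cannot execute that code, and the binary may abort with an illegal-instruction error. Under intel-sde -spr, the emulated CPUID should report the Sapphire Rapids feature set instead.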
