pm4rtx/threadgroup_bitonic_sort_hlsl

threadgroup_bitonic_sort<T>

What is it?

This is an implementation of a threadgroup-wide bitonic sort in HLSL.

Purpose

It is sometimes useful to sort elements within a thread group on the GPU. The threadgroup_bitonic_sort.hlsli header file provides several variants of bitonic sort that support any power-of-2 threadgroup size and up to 4096 sortable elements.
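For readers unfamiliar with the algorithm, the following is a minimal CPU-side sketch of a bitonic sorting network in C++. It is not the code from threadgroup_bitonic_sort.hlsli; the function name bitonic_sort_reference is hypothetical, and the sketch assumes an ascending sort and a power-of-2 element count. Each iteration of the innermost loop is an independent compare-exchange, which is the work a GPU implementation would distribute across the threads of a group, with a barrier between consecutive passes.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// CPU reference model of a bitonic sorting network (ascending order).
// Assumption: data.size() is a power of two, as the network requires.
// This is an illustrative sketch, not the header's implementation.
template <typename T>
void bitonic_sort_reference(std::vector<T>& data)
{
    const std::size_t n = data.size();

    // k: length of the bitonic sequences being merged in this stage.
    for (std::size_t k = 2; k <= n; k <<= 1)
    {
        // j: distance between the two elements compared in this pass.
        for (std::size_t j = k >> 1; j > 0; j >>= 1)
        {
            // On a GPU, the iterations of this loop run in parallel.
            for (std::size_t i = 0; i < n; ++i)
            {
                const std::size_t partner = i ^ j;
                if (partner > i)  // handle each pair exactly once
                {
                    // Blocks of length k alternate between ascending and descending.
                    const bool ascending = (i & k) == 0;
                    if ((data[i] > data[partner]) == ascending)
                        std::swap(data[i], data[partner]);
                }
            }
        }
    }
}
```

Such a reference is mainly useful for validating GPU results on small inputs, since it visits the same (k, j) passes a parallel version would.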

Features

  • It is agnostic of wave/warp sizes
  • It automatically switches to sorting and shuffling within waves/warps, using wave intrinsics, once the sorted/shuffled blocks become smaller than the wave/warp size of the threadgroup (see the AMD RGA codegen on godbolt.org)
  • It supports GPUs without wave intrinsic support
  • It supports sorting up to 4096 elements within a thread group (sorting 4096 elements requires a thread group of 1024 threads)
  • For a thread group with N threads, it supports sorting of N, N * 2 or N * 4 elements
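To illustrate why the switch to wave-level exchange is possible, here is a small C++ sketch that classifies the passes of a bitonic network. It rests on assumptions not stated in the header: one element per thread, a power-of-2 wave size, and threads assigned to waves in index order. Under those assumptions the compare partner of element i in a pass with distance j is i ^ j, so whenever j is smaller than the wave size both elements sit in the same wave and no exchange between waves is needed. The names PassCounts and classify_passes are hypothetical.

```cpp
#include <cstddef>

// Counts of bitonic passes by where the compare partner lives.
struct PassCounts
{
    std::size_t in_wave;     // partner is in the same wave (lane exchange suffices)
    std::size_t cross_wave;  // partner is in another wave (needs shared memory)
};

// Hypothetical helper: classifies every (k, j) pass of a bitonic network
// over `count` elements, assuming one element per thread and that both
// `count` and `wave_size` are powers of two.
inline PassCounts classify_passes(std::size_t count, std::size_t wave_size)
{
    PassCounts counts{0, 0};
    for (std::size_t k = 2; k <= count; k <<= 1)          // merge stage
    {
        for (std::size_t j = k >> 1; j > 0; j >>= 1)      // compare distance
        {
            // i and i ^ j differ only in the bit of value j, so they share
            // the same wave index (i / wave_size) exactly when j < wave_size.
            if (j < wave_size)
                ++counts.in_wave;
            else
                ++counts.cross_wave;
        }
    }
    return counts;
}
```

With these assumptions, classify_passes(1024, 32) reports 40 in-wave passes and 15 cross-wave passes out of 55, which suggests why handling small blocks inside a wave can cover most of the network.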

Building Demo

To build demo.cpp, run build.bat from a Visual Studio Command Prompt. The batch file should automatically download the required packages (D3D12, DXC), then build and run all shader variants as benchmarks.

The header file can be compiled with DX Compiler release for February 2025 or earlier.

License

This header file is available to anybody free of charge, under the terms of the MIT License (see LICENSE.md).
