A single-block, generic, parallel bubble sort implementation (CUDA)
# run from the repository root; the out/ directory must already exist
nvcc -o out/generic-bubble-sort generic-bubble-sort.cu
To learn how this implementation works, see this answer posted on Stack Overflow.
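The Stack Overflow answer remains the reference for how this repository's code works. As a rough, hypothetical illustration only, a common way to parallelize bubble sort inside a single block is odd-even transposition sort. Each phase compares disjoint pairs, starting at even indices in even phases and at odd indices in odd phases, and a block-wide barrier separates the phases. The sketch assumes that approach. The names oddEvenSortKernel, gpuBubbleSort and CUDA_CHECK are illustrative and are not taken from generic-bubble-sort.cu. The element type T is only assumed to support operator<.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical kernel: odd-even transposition sort over n elements in
// global memory. It is launched with a single block, so __syncthreads()
// is enough to separate the phases.
template <typename T>
__global__ void oddEvenSortKernel(T* data, int n) {
    for (int phase = 0; phase < n; ++phase) {
        // Even phase compares (0,1), (2,3), ...; odd phase compares (1,2), (3,4), ...
        int first = phase & 1;
        for (int i = first + 2 * threadIdx.x; i + 1 < n; i += 2 * blockDim.x) {
            T a = data[i];
            T b = data[i + 1];
            if (b < a) {  // only operator< is required of T
                data[i] = b;
                data[i + 1] = a;
            }
        }
        // Pairs within a phase are disjoint; wait for every swap before the next phase.
        __syncthreads();
    }
}

// Hypothetical host wrapper: sorts v in place on the GPU using one block.
template <typename T>
void gpuBubbleSort(std::vector<T>& v, int threads = 256) {
    int n = static_cast<int>(v.size());
    if (n < 2) return;
    assert(threads > 0 && threads <= 1024);  // per-block thread limit
    threads = std::min(threads, n / 2);      // no more threads than pairs
    size_t bytes = v.size() * sizeof(T);

    T* dev = nullptr;
    CUDA_CHECK(cudaMalloc(&dev, bytes));
    CUDA_CHECK(cudaMemcpy(dev, v.data(), bytes, cudaMemcpyHostToDevice));
    oddEvenSortKernel<T><<<1, threads>>>(dev, n);
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(v.data(), dev, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
}
```

For example, gpuBubbleSort(v) sorts a std::vector<int> in place, and the same template works for float or any other type with operator<. Odd-even transposition sort needs n phases to guarantee a sorted array, and a single block runs at most 1024 threads. The single-block design therefore suits small arrays, where one __syncthreads() per phase is far cheaper than a kernel launch per phase.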
- push the integer version (done by Nadhir)
- write the generic version
- optimize memory access by using shared memory
- describe the parallel approach in README.md
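One possible approach to the shared-memory item above, given as a hypothetical sketch rather than the planned implementation, is to stage the whole array in shared memory once. All compare-and-swap phases then run there, and the result is written back in a single pass. This assumes the array fits in one block's shared memory (queried at runtime below) and that T is trivially copyable, since the buffer is declared as raw bytes so the template can be instantiated for several types. The names oddEvenSortSharedKernel, gpuBubbleSortShared and CUDA_CHECK are illustrative.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <vector>

// Abort with a message if a CUDA runtime call fails.
#define CUDA_CHECK(call)                                                \
    do {                                                                \
        cudaError_t err_ = (call);                                      \
        if (err_ != cudaSuccess) {                                      \
            std::fprintf(stderr, "CUDA error: %s (%s:%d)\n",            \
                         cudaGetErrorString(err_), __FILE__, __LINE__); \
            std::exit(EXIT_FAILURE);                                    \
        }                                                               \
    } while (0)

// Hypothetical kernel: odd-even transposition sort on a shared-memory copy.
template <typename T>
__global__ void oddEvenSortSharedKernel(T* data, int n) {
    // Raw-byte buffer so every instantiation declares the same extern symbol.
    extern __shared__ __align__(16) unsigned char smem[];
    T* s = reinterpret_cast<T*>(smem);

    // Load once from global memory (consecutive threads read consecutive elements).
    for (int i = threadIdx.x; i < n; i += blockDim.x) s[i] = data[i];
    __syncthreads();

    for (int phase = 0; phase < n; ++phase) {
        int first = phase & 1;  // even phase: pairs (0,1),...; odd phase: (1,2),...
        for (int i = first + 2 * threadIdx.x; i + 1 < n; i += 2 * blockDim.x) {
            T a = s[i];
            T b = s[i + 1];
            if (b < a) { s[i] = b; s[i + 1] = a; }
        }
        __syncthreads();  // finish all swaps of this phase
    }

    // Store the sorted result back to global memory once.
    for (int i = threadIdx.x; i < n; i += blockDim.x) data[i] = s[i];
}

// Hypothetical host wrapper: sorts v in place, refusing arrays that do not fit.
template <typename T>
void gpuBubbleSortShared(std::vector<T>& v, int threads = 256) {
    int n = static_cast<int>(v.size());
    if (n < 2) return;
    assert(threads > 0 && threads <= 1024);
    threads = std::min(threads, n / 2);
    size_t bytes = v.size() * sizeof(T);

    int device = 0, maxShared = 0;
    CUDA_CHECK(cudaGetDevice(&device));
    CUDA_CHECK(cudaDeviceGetAttribute(&maxShared, cudaDevAttrMaxSharedMemoryPerBlock, device));
    if (bytes > static_cast<size_t>(maxShared)) {
        std::fprintf(stderr, "%zu bytes do not fit in %d bytes of shared memory\n",
                     bytes, maxShared);
        std::exit(EXIT_FAILURE);
    }

    T* dev = nullptr;
    CUDA_CHECK(cudaMalloc(&dev, bytes));
    CUDA_CHECK(cudaMemcpy(dev, v.data(), bytes, cudaMemcpyHostToDevice));
    oddEvenSortSharedKernel<T><<<1, threads, bytes>>>(dev, n);  // 3rd arg: dynamic shared bytes
    CUDA_CHECK(cudaGetLastError());
    CUDA_CHECK(cudaMemcpy(v.data(), dev, bytes, cudaMemcpyDeviceToHost));
    CUDA_CHECK(cudaFree(dev));
}
```

The trade-off is capacity. Each of the n phases now touches on-chip memory instead of global memory, but the whole array must fit in a single block's shared memory, which caps the input size far below what the global-memory version accepts.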