Releases: ROCm/rocPRIM
rocPRIM 2.11.0 for ROCm 5.3.0
Added
- New functions subtract_left and subtract_right in block_adjacent_difference to apply functions on pairs of adjacent items distributed between threads in a block.
- New device level adjacent_difference primitives.
- Added experimental tooling for automatic kernel configuration tuning for various architectures
- Benchmarks collect and output more detailed system information
- CMake functionality to improve build parallelism of the test suite that splits compilation units by function or by parameters.
- Reverse iterator.
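The adjacent-difference semantics named in the list above can be sketched on the host in plain C++. This is a minimal reference under stated assumptions, not the rocPRIM API: the helper names subtract_left_reference and subtract_right_reference are hypothetical, and the boundary handling (the item without a neighbor is copied unchanged) reflects the usual adjacent-difference convention rather than anything stated in these notes.

```cpp
#include <cstddef>
#include <functional>
#include <vector>

// Host-side reference only; hypothetical helper names, not rocPRIM API.

// Left variant: out[i] = op(in[i], in[i - 1]).
// Assumption: the first item has no left neighbor and is copied unchanged.
template <class T, class BinaryOp = std::minus<T>>
std::vector<T> subtract_left_reference(const std::vector<T>& in, BinaryOp op = BinaryOp{})
{
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = (i == 0) ? in[i] : op(in[i], in[i - 1]);
    return out;
}

// Right variant: out[i] = op(in[i], in[i + 1]).
// Assumption: the last item has no right neighbor and is copied unchanged.
template <class T, class BinaryOp = std::minus<T>>
std::vector<T> subtract_right_reference(const std::vector<T>& in, BinaryOp op = BinaryOp{})
{
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = (i + 1 == in.size()) ? in[i] : op(in[i], in[i + 1]);
    return out;
}
```

With the default subtraction operator, the input {1, 4, 9, 16} gives {1, 3, 5, 7} for the left variant and {-3, -5, -7, 16} for the right variant.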
rocPRIM 2.10.14 for ROCm 5.2.3
rocPRIM code for ROCm 5.2.3 did not change. The library was rebuilt for the updated ROCm 5.2.3 stack.
rocPRIM 2.10.14 for ROCm 5.2.1
rocPRIM code for ROCm 5.2.1 did not change. The library was rebuilt for the updated ROCm 5.2.1 stack.
rocPRIM 2.10.14 for ROCm 5.2.0
Added
- Packages for test and benchmark executables on all supported OSes using CPack.
- File/folder reorganization, with backward compatibility support enabled through wrapper headers.
rocPRIM 2.10.13 for ROCm 5.1.3
rocPRIM code for ROCm 5.1.3 did not change. The library was rebuilt for the updated ROCm 5.1.3 stack.
rocPRIM 2.10.13 for ROCm 5.1.1
rocPRIM code for ROCm 5.1.1 did not change. The library was rebuilt for the updated ROCm 5.1.1 stack.
rocPRIM 2.10.13 for ROCm 5.1.0
Fixed
- Fixed radix sort int64_t bug introduced in [2.10.11]
Added
- Future value
- Added device partition_three_way to partition input to three output iterators based on two predicates
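The three-way partition described above (three outputs, two predicates) can be illustrated with a sequential host-side sketch. The names three_way_result and partition_three_way_reference are hypothetical and this is not the rocPRIM signature. The sketch assumes the second predicate is only consulted when the first one rejects an item, and that items keep their relative order within each output.

```cpp
#include <vector>

// Host-side reference only; hypothetical names, not rocPRIM API.
template <class T>
struct three_way_result
{
    std::vector<T> first;      // items accepted by the first predicate
    std::vector<T> second;     // items rejected by the first, accepted by the second
    std::vector<T> unselected; // items rejected by both predicates
};

template <class T, class FirstPred, class SecondPred>
three_way_result<T> partition_three_way_reference(const std::vector<T>& in,
                                                  FirstPred  select_first,
                                                  SecondPred select_second)
{
    three_way_result<T> result;
    for (const T& item : in)
    {
        if (select_first(item))
            result.first.push_back(item);
        else if (select_second(item))
            result.second.push_back(item);
        else
            result.unselected.push_back(item);
    }
    return result;
}
```

For example, partitioning {5, 12, 25, 7, 30, 18} with the predicates "less than 10" and "less than 20" yields {5, 7}, {12, 18} and {25, 30}.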
Changed
- The reduce/scan algorithm precision issues in the tests have been resolved for half types.
Known issues
- device_segmented_radix_sort unit test failing for HIP on Windows
rocPRIM-2.10.12 for ROCm 5.0.2
rocPRIM code for ROCm 5.0.2 is unchanged from rocPRIM for ROCm 5.0.1. The library was rebuilt for the updated ROCm 5.0.2 stack.
rocPRIM-2.10.12 for ROCm 5.0.1
rocPRIM code for ROCm 5.0.1 is unchanged from rocPRIM for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.
rocPRIM-2.10.12 for ROCm 5.0.0
Fixed
- Enable bfloat16 tests and reduce threshold for bfloat16
- Fix device scan limit_size feature
- Non-optimized builds no longer trigger local memory limit errors
Added
- Added scan size limit feature
- Added reduce size limit feature
- Added transform size limit feature
- Add block_load_striped and block_store_striped
- Add gather_to_blocked to gather values from other threads into a blocked arrangement
- The block sizes for device merge sort's initial block sort and its merge steps are now separate in its kernel config
- the block sort step supports multiple items per thread
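To make the "striped" and "blocked" terms in the entries above concrete, here is a small host-side sketch of which memory elements each thread ends up holding under the two arrangements. It assumes the conventional definitions (blocked: each thread holds consecutive items; striped: a thread's items are block_size elements apart). The helper load_reference is hypothetical and only simulates the index mapping; it does not call rocPRIM.

```cpp
#include <cstddef>
#include <vector>

// Host-side illustration only; hypothetical helper, not rocPRIM API.
// Returns, for each simulated thread, the items it would hold after loading
// block_size * items_per_thread consecutive values from `memory`.
template <class T>
std::vector<std::vector<T>> load_reference(const std::vector<T>& memory,
                                           std::size_t block_size,
                                           std::size_t items_per_thread,
                                           bool striped)
{
    std::vector<std::vector<T>> per_thread(block_size, std::vector<T>(items_per_thread));
    for (std::size_t t = 0; t < block_size; ++t)
    {
        for (std::size_t i = 0; i < items_per_thread; ++i)
        {
            // Striped: item i of thread t sits at i * block_size + t.
            // Blocked: item i of thread t sits at t * items_per_thread + i.
            const std::size_t idx = striped ? i * block_size + t
                                            : t * items_per_thread + i;
            per_thread[t][i] = memory[idx];
        }
    }
    return per_thread;
}
```

With memory {0, 1, ..., 7}, 4 threads and 2 items per thread, thread 1 holds {2, 3} in the blocked arrangement and {1, 5} in the striped arrangement.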
Changed
- size_limit for scan, reduce and transform can now be set in the config struct instead of a parameter
- device_scan and device_segmented_scan: inclusive_scan now uses the input type as the accumulator type; exclusive_scan uses the initial-value type.
  - This particularly changes the behaviour of small-size input types with large-size output types (e.g. short input, int output).
  - And low-res input with high-res output (e.g. float input, double output).
- Reverted the old Fiji workaround, because the issue was fixed on the compiler side
- Update README cmake minimum version number
- Block sort supports multiple items per thread
  - Currently only power-of-two block sizes and items per thread are supported, and only for full blocks
- Bumped the minimum required version of CMake to 3.16
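The accumulator-type change for inclusive_scan listed above matters when a narrow input type feeds a wider output type. The following host-side sketch shows the effect with a sequential scan whose accumulator type is a template parameter. The function inclusive_scan_reference is hypothetical, and std::uint8_t stands in for the short input mentioned in the notes so that the wraparound is well-defined in standard C++.

```cpp
#include <cstdint>
#include <vector>

// Host-side reference only; hypothetical helper, not rocPRIM API.
// Acc is the type the running sum is stored in; Out is the element type
// of the result. In is deduced from the input vector.
template <class Acc, class Out, class In>
std::vector<Out> inclusive_scan_reference(const std::vector<In>& in)
{
    std::vector<Out> out;
    out.reserve(in.size());
    Acc acc{};
    for (const In& x : in)
    {
        // The sum is narrowed back to Acc on every step, so a small
        // unsigned accumulator wraps around even if Out could hold the value.
        acc = static_cast<Acc>(acc + x);
        out.push_back(static_cast<Out>(acc));
    }
    return out;
}
```

For the input {200, 100, 50} stored as std::uint8_t with int output, accumulating in std::uint8_t (the input type) produces {200, 44, 94}, while accumulating in int produces {200, 300, 350}. Code that relied on the wider accumulation may need to convert its input to the wider type before scanning.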
Known issues
- Unit tests may soft hang on MI200 when running in hipMallocManaged mode.
- device_segmented_radix_sort, device_scan unit tests failing for HIP on Windows
- ReduceEmptyInput causes random failures with bfloat16