Releases: ROCm/rocPRIM

rocPRIM 2.11.0 for ROCm 5.3.0

30 Sep 19:26

Added

  • New functions subtract_left and subtract_right in block_adjacent_difference to apply functions
    on pairs of adjacent items distributed between threads in a block.
  • New device level adjacent_difference primitives.
  • Added experimental tooling for automatic kernel configuration tuning for various architectures
  • Benchmarks collect and output more detailed system information
  • CMake functionality to improve build parallelism of the test suite that splits compilation units by
    function or by parameters.
  • Reverse iterator.
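
The adjacent-difference semantics can be illustrated with a small host-side reference. This is a sketch, not the rocPRIM API: the names `subtract_left_reference` and `subtract_right_reference` are hypothetical, and it assumes that the item without a neighbour (the first for the left variant, the last for the right variant) is copied through unchanged.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference only (hypothetical helper, not part of rocPRIM).
// out[i] = op(in[i], in[i - 1]); the first item has no left neighbour and is copied.
template <class T, class BinaryOp>
std::vector<T> subtract_left_reference(const std::vector<T>& in, BinaryOp op)
{
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = (i == 0) ? in[i] : op(in[i], in[i - 1]);
    return out;
}

// Host-side reference only (hypothetical helper, not part of rocPRIM).
// out[i] = op(in[i], in[i + 1]); the last item has no right neighbour and is copied.
template <class T, class BinaryOp>
std::vector<T> subtract_right_reference(const std::vector<T>& in, BinaryOp op)
{
    std::vector<T> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = (i + 1 == in.size()) ? in[i] : op(in[i], in[i + 1]);
    return out;
}
```

On the device, the same pairs are formed across thread boundaries within a block; the sketch only shows which neighbour each item is combined with.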

rocPRIM 2.10.14 for ROCm 5.2.3

18 Aug 16:59

rocPRIM code for ROCm 5.2.3 did not change. The library was rebuilt for the updated ROCm 5.2.3 stack.

rocPRIM 2.10.14 for ROCm 5.2.1

21 Jul 20:24

rocPRIM code for ROCm 5.2.1 did not change. The library was rebuilt for the updated ROCm 5.2.1 stack.

rocPRIM 2.10.14 for ROCm 5.2.0

28 Jun 18:45

Added

  • Packages for test and benchmark executables on all supported OSes, built with CPack.
  • Reorganized files and folders; backward compatibility is preserved through wrapper headers.

rocPRIM 2.10.13 for ROCm 5.1.3

20 May 17:05

rocPRIM code for ROCm 5.1.3 did not change. The library was rebuilt for the updated ROCm 5.1.3 stack.

rocPRIM 2.10.13 for ROCm 5.1.1

08 Apr 20:53

rocPRIM code for ROCm 5.1.1 did not change. The library was rebuilt for the updated ROCm 5.1.1 stack.

rocPRIM 2.10.13 for ROCm 5.1.0

30 Mar 17:31

Fixed

  • Fixed radix sort int64_t bug introduced in [2.10.11]

Added

  • Future value
  • Added device partition_three_way to partition input to three output iterators based on two predicates
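
A three-way partition driven by two predicates can be sketched on the host as follows. This is an illustrative reference, not the rocPRIM signature: `partition_three_way_reference` and `three_way_result` are hypothetical names, and the sketch assumes the second predicate is only consulted for items the first predicate rejected, with relative order preserved in each output.

```cpp
#include <cassert>
#include <vector>

// Hypothetical result type for the sketch: one container per output.
template <class T>
struct three_way_result
{
    std::vector<T> first;      // items accepted by the first predicate
    std::vector<T> second;     // items rejected by the first, accepted by the second
    std::vector<T> unselected; // items rejected by both predicates
};

// Host-side reference only (not part of rocPRIM).
template <class T, class Pred1, class Pred2>
three_way_result<T> partition_three_way_reference(const std::vector<T>& in, Pred1 p1, Pred2 p2)
{
    three_way_result<T> r;
    for (const T& x : in)
    {
        if (p1(x))
            r.first.push_back(x);
        else if (p2(x))
            r.second.push_back(x);
        else
            r.unselected.push_back(x);
    }
    return r;
}
```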

Changed

  • Precision issues in the reduce/scan tests have been resolved for half types.

Known issues

  • device_segmented_radix_sort unit test failing for HIP on Windows

rocPRIM-2.10.12 for ROCm 5.0.2

04 Mar 17:53
76ebab2

rocPRIM code for ROCm 5.0.2 is unchanged from rocPRIM for ROCm 5.0.1. The library was rebuilt for the updated ROCm 5.0.2 stack.

rocPRIM-2.10.12 for ROCm 5.0.1

16 Feb 22:14
76ebab2

rocPRIM code for ROCm 5.0.1 is unchanged from rocPRIM for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.

rocPRIM-2.10.12 for ROCm 5.0.0

09 Feb 20:28
76ebab2

Fixed

  • Enable bfloat16 tests and reduce threshold for bfloat16
  • Fix device scan limit_size feature
  • Non-optimized builds no longer trigger local memory limit errors

Added

  • Added scan size limit feature
  • Added reduce size limit feature
  • Added transform size limit feature
  • Add block_load_striped and block_store_striped
  • Add gather_to_blocked to gather values from other threads into a blocked arrangement
  • The block sizes for device merge sort's initial block sort and for its merge steps are now configured separately in its kernel config
    • the block sort step supports multiple items per thread

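
For readers unfamiliar with the two arrangements named above, the index mappings can be sketched as below. This assumes the conventional definitions (blocked: each thread owns a contiguous run of items; striped: consecutive items are spread across consecutive threads); `blocked_index` and `striped_index` are hypothetical helpers, not rocPRIM functions.

```cpp
#include <cassert>
#include <cstddef>

// Blocked arrangement: thread t owns items [t * items_per_thread, (t + 1) * items_per_thread).
constexpr std::size_t blocked_index(std::size_t thread_id,
                                    std::size_t item,
                                    std::size_t items_per_thread)
{
    return thread_id * items_per_thread + item;
}

// Striped arrangement: item i of thread t sits at i * block_size + t,
// so neighbouring threads touch neighbouring elements.
constexpr std::size_t striped_index(std::size_t thread_id,
                                    std::size_t item,
                                    std::size_t block_size)
{
    return item * block_size + thread_id;
}

// With 4 threads and 2 items per thread, thread 1 covers {2, 3} when blocked
// and {1, 5} when striped.
static_assert(blocked_index(1, 0, 2) == 2 && blocked_index(1, 1, 2) == 3, "blocked");
static_assert(striped_index(1, 0, 4) == 1 && striped_index(1, 1, 4) == 5, "striped");
```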
Changed

  • size_limit for scan, reduce and transform can now be set in the config struct instead of a parameter
  • device_scan and device_segmented_scan: inclusive_scan now uses the input type as the accumulator type; exclusive_scan uses the initial-value type.
    • This particularly changes behaviour when a small input type is paired with a larger output type (e.g. short input, int output),
    • and when a low-precision input is paired with a higher-precision output (e.g. float input, double output).
  • Reverted the old Fiji workaround, because the issue has been fixed on the compiler side
  • Update README cmake minimum version number
  • Block sort supports multiple items per thread
    • currently only power-of-two block sizes and items per thread are supported, and only for full blocks
  • Bumped the minimum required version of CMake to 3.16
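
The effect of the accumulator-type change on inclusive scans can be seen in a host-side sketch. This is not rocPRIM code: `inclusive_scan_reference` is a hypothetical helper that makes the accumulator type an explicit template parameter, and it uses unsigned char input (rather than the short/int pair mentioned above) so that the wraparound is well-defined in standard C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Host-side reference only (hypothetical helper, not part of rocPRIM).
// Acc is the type in which the running sum is kept; Out is the output element type.
template <class Acc, class Out, class In>
std::vector<Out> inclusive_scan_reference(const std::vector<In>& in)
{
    std::vector<Out> out(in.size());
    Acc acc{};
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        // The sum is narrowed to Acc before being stored, so a small Acc can wrap
        // even though Out would be wide enough to hold the exact result.
        acc    = static_cast<Acc>(acc + in[i]);
        out[i] = static_cast<Out>(acc);
    }
    return out;
}
```

With input {200, 100, 50} held as unsigned char, accumulating in unsigned char gives {200, 44, 94} in an unsigned int output, whereas accumulating in unsigned int gives {200, 300, 350}. Callers who relied on the wider output type to avoid overflow may need to widen the input, for example through a transform iterator.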

Known issues

  • Unit tests may soft hang on MI200 when running in hipMallocManaged mode.
  • device_segmented_radix_sort, device_scan unit tests failing for HIP on Windows
  • ReduceEmptyInput causes random failures with bfloat16