Releases: ROCm/Tensile

Tensile 4.34.0 for ROCm 5.3.0

30 Sep 19:24
b33ca97

Added

  • Lazy loading of solution libraries and code object files
  • Support for dictionary style logic files
  • Support for decision tree based logic files using dictionary format
  • DecisionTreeLibrary for solution selection
  • DirectToLDS support for HGEMM
  • DirectToVgpr support for SGEMM
  • Grid based distance metric for solution selection
  • Support for gfx11xx
  • Support for DirectToVgprA/B + TLU=False
  • ForkParameters Groups as a way of specifying solution parameters
  • Support for a new Tensile yaml config format
  • TensileClientConfig for generating Tensile client config files
  • Options for TensileCreateLibrary to build client and create client config file
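
As an illustration of the lazy loading item at the top of this list, the general pattern can be sketched as follows. This is a minimal sketch of the technique only, not Tensile's implementation: the `LazyLibrary` class, its methods, and the loader names are all hypothetical.

```python
class LazyLibrary:
    """Hypothetical container that defers loading each entry until first use."""

    def __init__(self, loaders):
        # loaders maps a name to a zero-argument callable that does the real load
        self._loaders = dict(loaders)
        self._loaded = {}

    def get(self, name):
        # Load on first access only; later calls reuse the loaded object
        if name not in self._loaded:
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]

    def loaded_names(self):
        return sorted(self._loaded)


# Stand-in loaders; real code would read a library or code object file here
load_log = []

def make_loader(arch):
    def loader():
        load_log.append(arch)
        return {"arch": arch}
    return loader

lib = LazyLibrary({"gfx90a": make_loader("gfx90a"), "gfx1100": make_loader("gfx1100")})
first = lib.get("gfx90a")
second = lib.get("gfx90a")  # served from memory, no second load
```

Only the entries that are actually requested pay the load cost, which is the usual motivation for this pattern when a library covers many architectures.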

Optimizations

  • Solution generation is now cached and is not repeated if solution parameters are unchanged
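
The caching idea above can be sketched as a lookup keyed by a hash of the solution parameters. This is a minimal sketch under assumptions, not Tensile's code: `params_key`, `SolutionCache`, and `fake_generate` are hypothetical names, and the parameter names are used only as example dictionary keys.

```python
import hashlib
import json


def params_key(params):
    """Stable hash of a parameter dict (hypothetical helper)."""
    # sort_keys makes the key independent of dict insertion order
    blob = json.dumps(params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


class SolutionCache:
    """Hypothetical cache: generate once per distinct parameter set."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def get_or_generate(self, params, generate):
        key = params_key(params)
        if key not in self._store:
            # Parameters not seen before: run the expensive generation step
            self.misses += 1
            self._store[key] = generate(params)
        return self._store[key]


def fake_generate(params):
    # Stand-in for the real, expensive solution generation
    return "solution_MT{0}x{1}".format(params["MacroTile0"], params["MacroTile1"])


cache = SolutionCache()
a = cache.get_or_generate({"MacroTile0": 96, "MacroTile1": 64}, fake_generate)
b = cache.get_or_generate({"MacroTile1": 64, "MacroTile0": 96}, fake_generate)
```

Both calls describe the same parameters, so generation runs once; changing any parameter value produces a new key and triggers regeneration.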

Changed

  • Default MACInstruction to FMA

Fixed

  • Accept StaggerUStride=0 as valid
  • Reject invalid data types for UnrollLoopEfficiencyEnable
  • Fix invalid code generation issues related to DirectToVgpr
  • Return hipErrorNotFound if no modules are loaded
  • Fix performance drop for NN ZGEMM with 96x64 macro tile
  • Fix memory violation for general batched kernels when alpha/beta/K = 0

Tensile 4.33.0 for ROCm 5.2.3

18 Aug 16:59
da90ed3

Tensile code for ROCm 5.2.3 did not change. The library was rebuilt for the updated ROCm 5.2.3 stack.

Tensile 4.33.0 for ROCm 5.2.1

21 Jul 20:23
da90ed3

Tensile code for ROCm 5.2.1 did not change. The library was rebuilt for the updated ROCm 5.2.1 stack.

Tensile 4.33.0 for ROCm 5.2.0

28 Jun 18:42
da90ed3

Added

  • TensileUpdateLibrary for updating old library logic files
  • Support for TensileRetuneLibrary to use sizes from separate file
  • ZGEMM DirectToVgpr/DirectToLds/StoreCInUnroll/MIArchVgpr support
  • Tests for denorm correctness
  • Option to write different architectures to different TensileLibrary files

Optimizations

  • Optimize MessagePackLoadLibraryFile by switching to fread
  • DGEMM tail loop optimization for PrefetchAcrossPersistentMode=1/DirectToVgpr

Changed

  • Alpha/beta datatype remains as F32 for HPA HGEMM
  • Force assembly kernels to not flush denorms
  • Use hipDeviceAttributePhysicalMultiProcessorCount as multiProcessorCount

Fixed

  • Fix segmentation fault when running with the i8 datatype and TENSILE_DB=0x80

Tensile 4.32.0 for ROCm 5.1.3

20 May 17:05

Tensile code for ROCm 5.1.3 did not change. The library was rebuilt for the updated ROCm 5.1.3 stack.

Tensile 4.32.0 for ROCm 5.1.1

08 Apr 20:52

Tensile code for ROCm 5.1.1 did not change. The library was rebuilt for the updated ROCm 5.1.1 stack.

Tensile 4.32.0 for ROCm 5.1.0

30 Mar 17:26

Added

  • Finer control of parallelism to limit memory usage
  • Support for multiprocessing on Windows for TensileCreateLibrary
  • New JSD metric and metric selection functionality
  • Initial changes to support two-tier solution selection

Optimized

  • Optimized runtime of TensileCreateLibraries by reducing max RAM usage
  • StoreCInUnroll additional optimizations plus adaptive K support
  • DGEMM NN optimizations with PrefetchGlobalRead(PGR)=2 support

Changed

  • Update Googletest to 1.11.0

Removed

  • Remove no longer supported benchmarking steps

Tensile 4.31.0 for ROCm 5.0.2

04 Mar 17:54

Tensile code for ROCm 5.0.2 is unchanged from Tensile for ROCm 5.0.1. The library was rebuilt for the updated ROCm 5.0.2 stack.

Tensile 4.31.0 for ROCm 5.0.1

16 Feb 22:17

Tensile code for ROCm 5.0.1 is unchanged from Tensile for ROCm 5.0.0. The library was rebuilt for the updated ROCm 5.0.1 stack.

Tensile 4.31.0 for ROCm 5.0.0

09 Feb 20:34

Added

  • DirectToLds support (x2/x4)
  • DirectToVgpr support for DGEMM
  • Parameter to control the number of files into which kernels are merged, to better parallelize kernel compilation
  • FP16 alternate implementation for HPA HGEMM on aldebaran

Optimized

  • Add DGEMM NN custom kernel for HPL on aldebaran

Changed

  • Update tensile_client executable to std=c++14

Removed

  • Remove unused old Tensile client code

Fixed

  • Fix hipErrorInvalidHandle during benchmarks
  • Fix addrVgpr for atomic GSU
  • Fix for Python 3.8: add case for Constant nodeType
  • Fix architecture mapping for gfx1011 and gfx1012
  • Fix PrintSolutionRejectionReason verbiage in KernelWriter.py
  • Fix vgpr alignment problem when enabling flat buffer load