
Releases · ROCm/Tensile

Tensile 4.36.0 for ROCm 5.5.1

24 May 19:05

rocm-ci

This commit was created on GitHub.com and signed with GitHub’s verified signature. The key has expired.

GPG key ID: 4AEE18F83AFDEB23

Tensile code for ROCm 5.5.1 did not change. The library was rebuilt for the updated ROCm 5.5.1 stack.

Tensile 4.36.0 for ROCm 5.5.0

01 May 21:02

rocm-ci

Added

Add functions for user-driven tuning
Add GFX11 support: HostLibraryTests YAMLs, rearranged FP32(C)/FP64(C) instruction order, archCaps for the instruction renaming condition, adjusted VGPR bank for A/B/C as an optimization, separate vscnt and vmcnt, dual MAC
Add binary search for Grid-Based algorithm
Add reject condition for (StoreCInUnroll + BufferStore=0) and (DirectToVgpr + ScheduleIterAlg<3 + PrefetchGlobalRead==2)
Add support for (DirectToLds + hgemm + NN/NT/TT) and (DirectToLds + hgemm + GlobalLoadVectorWidth < 4)
Add support for (DirectToLds + hgemm(TLU=True only) or sgemm + NumLoadsCoalesced > 1)
Add GSU SingleBuffer algorithm for HSS/BSS
Add gfx900:xnack-, gfx1032, gfx1034, gfx1035
Enable gfx1031 support

Optimizations

Use AssertSizeLessThan for BufferStoreOffsetLimitCheck if it is smaller than MT1
Improve InitAccVgprOpt

Changed

Use global_atomic for GSU instead of flat and global_store for debug code
Replace flat_load/store with global_load/store
Use global_load/store for BufferLoad/Store=0 and enable scheduling
LocalSplitU support for HGEMM+HPA when MFMA disabled
Update Code Object Version
Type cast local memory to COMPUTE_DATA_TYPE in LDS to avoid precision loss
Update asm cap cache arguments
Unify SplitGlobalRead into ThreadSeparateGlobalRead and remove SplitGlobalRead
Change checks, error messages, assembly syntax, and coverage for DirectToLds
Remove unused cmake file
Clean up the LLVM dependency code
Update ThreadSeparateGlobalRead test cases for PrefetchGlobalRead=2
Update sgemm/hgemm test cases for DirectToLds and ThreadSeparateGlobalRead

Fixed

Add build-id to header of compiled source kernels
Fix solution index collisions
Fix h beta vectorwidth4 correctness issue for WMMA
Fix an error with BufferStore=0
Fix mismatch issue with (StoreCInUnroll + PrefetchGlobalRead=2)
Fix MoveMIoutToArch bug
Fix flat load correctness issue on I8 and flat store correctness issue
Fix mismatch issue with BufferLoad=0 + TailLoop for large array sizes
Fix code generation error with BufferStore=0 and StoreCInUnrollPostLoop
Fix issues with DirectToVgpr + ScheduleIterAlg<3
Fix mismatch issue with DGEMM TT + LocalReadVectorWidth=2
Fix mismatch issue with PrefetchGlobalRead=2
Fix mismatch issue with DirectToVgpr + PrefetchGlobalRead=2 + small tile size
Fix an error with PersistentKernel=0 + PrefetchAcrossPersistent=1 + PrefetchAcrossPersistentMode=1
Fix mismatch issue with DirectToVgpr + DirectToLds + only 1 iteration in unroll loop case
Remove duplicate GSU kernels: for GSU = 1, GSUAlgorithm SingleBuffer and MultipleBuffer kernels are identical
Fix failing CI tests caused by CpuThreads=0
Fix mismatch issue with DirectToLds + PrefetchGlobalRead=2
Remove the reject condition for ThreadSeparateGlobalRead and DirectToLds (HGEMM, SGEMM only)
Modify reject condition for minimum lanes of ThreadSeparateGlobalRead (SGEMM or larger data type only)

Tensile 4.34.0 for ROCm 5.3.3

17 Nov 19:21

ROCmMathLibrariesBot

Tensile code for ROCm 5.3.3 did not change. The library was rebuilt for the updated ROCm 5.3.3 stack.

Tensile 4.34.0 for ROCm 5.3.2

10 Nov 01:04

ROCmMathLibrariesBot

Tensile code for ROCm 5.3.2 did not change. The library was rebuilt for the updated ROCm 5.3.2 stack.

Tensile 4.35.0 for ROCm 5.4.4

22 Mar 20:46

rocm-ci

Tensile code for ROCm 5.4.4 did not change. The library was rebuilt for the updated ROCm 5.4.4 stack.

Tensile 4.35.0 for ROCm 5.4.3

07 Feb 17:32

rocm-ci

Tensile code for ROCm 5.4.3 did not change. The library was rebuilt for the updated ROCm 5.4.3 stack.

Tensile 4.35.0 for ROCm 5.4.2

13 Jan 16:40

rocm-ci

Tensile code for ROCm 5.4.2 did not change. The library was rebuilt for the updated ROCm 5.4.2 stack.

Tensile 4.35.0 for ROCm 5.4.1

15 Dec 18:38

ROCmMathLibrariesBot

Tensile code for ROCm 5.4.1 did not change. The library was rebuilt for the updated ROCm 5.4.1 stack.

Tensile 4.35.0 for ROCm 5.4.0

30 Nov 17:32

ROCmMathLibrariesBot

Added

Async DMA support for Transpose Data Layout (ThreadSeparateGlobalReadA/B)
Option to output library logic in dictionary format
No solution found error message for benchmarking client
Exact K check for StoreCInUnrollExact
Support for CGEMM + MIArchVgpr
client-path parameter for using prebuilt client
CleanUpBuildFiles global parameter
Debug flag for printing library logic index of winning solution
NumWarmups global parameter for benchmarking
Windows support for benchmarking client
DirectToVgpr support for CGEMM
TensileLibLogicToYaml for creating tuning configs from library logic solutions

Optimizations

Put beta code and store separately if StoreCInUnroll = x4 store
Improved performance for StoreCInUnroll + b128 store

Changed

Re-enable HardwareMonitor for gfx90a
Decision trees use MLFeatures instead of Properties

Fixed

Reject DirectToVgpr + MatrixInstBM/BN > 1
Fix benchmark timings when using warmups and/or validation
Fix mismatch issue with DirectToVgprB + VectorWidth > 1
Fix mismatch issue with DirectToLds + NumLoadsCoalesced > 1 + TailLoop
Fix incorrect reject condition for DirectToVgpr
Fix reject condition for DirectToVgpr + MIWaveTile < VectorWidth
Fix incorrect instruction generation with StoreCInUnroll

Tensile 4.34.0 for ROCm 5.3.1

28 Oct 16:57

lawruble13

Tensile code for ROCm 5.3.1 did not change. The library was rebuilt for the updated ROCm 5.3.1 stack.
