fulvius31 edited this page Feb 18, 2025 · 4 revisions

Triton Cache Wiki

Overview

Triton's cache system avoids repeated kernel compilation by storing compiled artifacts (PTX, CUBIN, HSACO, etc.) together with their metadata. This document explains the cache's structure, behavior, and customization options.


Cache Directory Structure

By default, Triton stores cached kernels in ~/.triton/cache/. Set the TRITON_CACHE_DIR environment variable to use a different location.
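As a minimal sketch of that lookup, the snippet below reads TRITON_CACHE_DIR and falls back to the default path. The helper name `resolve_cache_dir` is hypothetical and not part of Triton's API; Triton's own resolution logic may differ in detail.

```python
import os
from pathlib import Path


def resolve_cache_dir(env=None):
    """Return the directory implied by TRITON_CACHE_DIR, or the default path.

    Hypothetical helper for illustration only; not a Triton function.
    """
    env = os.environ if env is None else env
    custom = env.get("TRITON_CACHE_DIR", "").strip()
    if custom:
        # Expand a leading "~" so values such as "~/my_cache" work as expected.
        return Path(custom).expanduser()
    return Path.home() / ".triton" / "cache"
```

Calling `resolve_cache_dir()` with no arguments inspects the current process environment, which is handy for confirming where a job will look for cached kernels.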

File Types

| Extension | Description |
|---|---|
| .json | Metadata and compilation parameters |
| .cubin / .hsaco | Compiled binary kernel (CUDA / ROCm) |
| .ptx / .amdgcn | PTX or AMDGCN intermediate representation |
| .llir | LLVM IR |
| .ttir | Triton IR |
| .ttgir | Triton GPU IR |

Example cache structure (NVIDIA):

$ tree ~/.triton/cache/QU0JRSfWJiAb9DadP-xn4vDFWO9yNo7Am32JeY1alLc
├── __grp__triton_poi_fused_threshold_backward_1.json
├── triton_poi_fused_threshold_backward_1.cubin
├── triton_poi_fused_threshold_backward_1.json
├── triton_poi_fused_threshold_backward_1.llir
├── triton_poi_fused_threshold_backward_1.ptx
├── triton_poi_fused_threshold_backward_1.ttgir
└── triton_poi_fused_threshold_backward_1.ttir
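To inspect an entry like the one above programmatically, the following sketch groups the files in one cache subdirectory by extension. The function name `group_artifacts` is an illustrative assumption; it relies only on the file layout shown in the listing.

```python
from collections import defaultdict
from pathlib import Path


def group_artifacts(kernel_dir):
    """Map each file extension in one cache entry to its sorted file names."""
    groups = defaultdict(list)
    for path in Path(kernel_dir).iterdir():
        if path.is_file():
            groups[path.suffix].append(path.name)
    return {ext: sorted(names) for ext, names in groups.items()}
```

Passing the hashed directory from the listing would return one key per extension, for example `.json` mapped to both the kernel metadata file and the `__grp__` file.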

Cache Key Generation

The cache key is generated from multiple inputs to ensure uniqueness:

  1. Triton Environment

    • Triton version (not stored in metadata but included in the hash)
  2. Kernel Identity

    • Function name
    • Signature types (normalized, for example, pointer types become "ptr")
    • Constant values
    • Kernel attributes
  3. Backend Configuration

    • Backend (rocm/cuda)
    • GPU architecture
    • Warp size
    • Compilation options (num_warps, num_stages, etc.)
  4. Environment Factors

    • TRITON_DEBUG variable state (0 or 1)

Key Generation Code:

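import base64
import hashlib


def _base32(key):
    # Helper assumed here: base32-encode the bytes of a hex digest, without padding.
    return base64.b32encode(bytes.fromhex(key)).decode("utf-8").rstrip("=")


def make_so_cache_key(version_hash, signature, constants, ids, **kwargs):
    # Normalize pointer types (for example "*fp32") to "ptr".
    signature = {k: 'ptr' if v[0] == '*' else v for k, v in signature.items()}
    key = f"{version_hash}-{''.join(signature.values())}-{constants}-{ids}"
    # Append every extra compilation option in call order.
    for kw in kwargs:
        key = f"{key}-{kwargs.get(kw)}"
    # Hash the combined string, then encode the digest.
    key = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return _base32(key)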

Note: The Triton version is included in the cache key but is not stored in the metadata files. If you need to check the version, you must track it externally.
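To see how individual inputs affect the key, the sketch below repeats the function shown above so that it runs on its own, then compares keys for slightly different inputs. The `_base32` body is an assumption (base32 encoding of the digest bytes with padding stripped), and the version strings, signatures, and options are made-up illustrative values.

```python
import base64
import hashlib


def _base32(hex_key):
    # Assumed helper: base32-encode the raw bytes of a hex digest, drop padding.
    return base64.b32encode(bytes.fromhex(hex_key)).decode("utf-8").rstrip("=")


def make_so_cache_key(version_hash, signature, constants, ids, **kwargs):
    # Same logic as the function shown above, repeated so this sketch runs alone.
    signature = {k: "ptr" if v[0] == "*" else v for k, v in signature.items()}
    key = f"{version_hash}-{''.join(signature.values())}-{constants}-{ids}"
    for kw in kwargs:
        key = f"{key}-{kwargs.get(kw)}"
    return _base32(hashlib.sha256(key.encode("utf-8")).hexdigest())


# Illustrative inputs only; real values are supplied by the compiler.
base = make_so_cache_key("v1", {0: "*fp32", 1: "i32"}, {}, {}, num_warps=4)
other_ptr = make_so_cache_key("v1", {0: "*fp16", 1: "i32"}, {}, {}, num_warps=4)
more_warps = make_so_cache_key("v1", {0: "*fp32", 1: "i32"}, {}, {}, num_warps=8)
new_version = make_so_cache_key("v2", {0: "*fp32", 1: "i32"}, {}, {}, num_warps=4)

print(base == other_ptr)    # pointer element type is normalized to "ptr"
print(base == more_warps)   # compares keys that differ only in num_warps
print(base == new_version)  # compares keys that differ only in version hash
```

Because pointer types collapse to "ptr" before hashing, the first comparison uses identical hash inputs, while the other two feed different strings into SHA-256.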


Environment Variables

| Variable | Default | Description |
|---|---|---|
| TRITON_CACHE_DIR | ~/.triton/cache | Custom cache directory |
| TRITON_ALWAYS_COMPILE | 0 | Bypass cache (force recompilation) when set to 1 |
| TRITON_KERNEL_OVERRIDE | 0 | Enable manual kernel IR overrides |
| TRITON_OVERRIDE_DIR | ~/.triton/override/ | Directory for manually overridden kernels |
| TRITON_KERNEL_DUMP | 0 | Enable kernel IR dumping |
| TRITON_DUMP_DIR | ~/.triton/dump/ | Directory for dumped compilation artifacts |
| TRITON_STORE_BINARY_ONLY | 0 | Store only essential binaries (saves ~77% space) |
| TRITON_DEBUG | 0 | Include debug info in cache key (affects hashing) |

Cache Management

Force Recompilation

To completely bypass the cache:

export TRITON_ALWAYS_COMPILE=1

Storage Optimization

Enable binary-only storage to save space:

export TRITON_STORE_BINARY_ONLY=1

This reduces stored files to:

  • .json (metadata)
  • .cubin/.hsaco (compiled binaries)
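To estimate what binary-only storage would save on an existing cache, the sketch below sums file sizes under a cache directory and reports the share taken by files outside the essential set listed above. `estimate_savings` and `ESSENTIAL` are illustrative names, not Triton APIs, and the result depends entirely on the kernels in your cache.

```python
from pathlib import Path

# Extensions kept in binary-only mode, per the list above.
ESSENTIAL = {".json", ".cubin", ".hsaco"}


def estimate_savings(cache_dir):
    """Return (total_bytes, essential_bytes, fraction_saved) for a cache tree."""
    total = essential = 0
    for path in Path(cache_dir).rglob("*"):
        if path.is_file():
            size = path.stat().st_size
            total += size
            if path.suffix in ESSENTIAL:
                essential += size
    # Guard against an empty cache to avoid dividing by zero.
    saved = 0.0 if total == 0 else 1 - essential / total
    return total, essential, saved
```

Running this against the current cache directory before enabling TRITON_STORE_BINARY_ONLY gives a rough, workload-specific counterpart to the savings figure quoted in the table.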

Remote Caching

Triton supports distributed caching via RemoteCacheManager. Example Redis setup:

import os

# Configure via environment variables
os.environ["TRITON_REMOTE_CACHE_BACKEND"] = "triton.backends.redis:RedisRemoteCacheBackend"
os.environ["TRITON_REDIS_HOST"] = "redis.example.com"
os.environ["TRITON_REDIS_PORT"] = "6379"

Troubleshooting

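| Issue | Solution |
|---|---|
| Stale cache | Delete the cache directory or set TRITON_ALWAYS_COMPILE=1 |
| Version mismatch | Triton upgrades and changes to environment variables that feed the cache key (such as TRITON_DEBUG) produce a different hash, so entries cached under the old key are no longer used |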
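For the stale-cache case, deleting the cache can be scripted. The sketch below removes every entry under a given cache directory while leaving the directory itself in place. `clear_cache` is a hypothetical helper, not a Triton function, and it deletes files irreversibly, so double-check the path before running it.

```python
import shutil
from pathlib import Path


def clear_cache(cache_dir):
    """Delete every entry under cache_dir and return how many were removed."""
    root = Path(cache_dir)
    if not root.is_dir():
        return 0
    removed = 0
    for entry in root.iterdir():
        if entry.is_dir() and not entry.is_symlink():
            # Real subdirectories (one per cache key) are removed recursively.
            shutil.rmtree(entry)
        else:
            # Plain files and symlinks are unlinked directly.
            entry.unlink()
        removed += 1
    return removed
```

Kernels are recompiled and cached again on their next launch, so the only cost of clearing the cache is the one-time compilation delay.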