Model Cache Utils (MCU), formerly the Triton Kernel Development Kit (TKDK), is a suite of tools designed to streamline the development workflow for model kernel developers. Whether you're optimizing cache usage, monitoring kernel performance, or distributing your builds securely, MCU has you covered. MCU supports Triton and vLLM.
Model Cache Manager (MCM): organize, index, and monitor your model kernel caches. This tool reports on cache usage, giving data-driven insight into compilation performance and cache effectiveness. For more information, please see the MCM readme.
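To make the idea of a cache usage report concrete, here is a minimal sketch in Python. It is not MCM's implementation: `default_cache_dir` and `summarize_cache` are hypothetical helper names, and the sketch assumes Triton's usual cache location (the `TRITON_CACHE_DIR` environment variable, falling back to `~/.triton/cache`). It only counts files and bytes per top-level cache entry.

```python
import os
from pathlib import Path


def default_cache_dir() -> Path:
    """Assumed Triton cache location: TRITON_CACHE_DIR if set, else ~/.triton/cache."""
    override = os.environ.get("TRITON_CACHE_DIR")
    return Path(override) if override else Path.home() / ".triton" / "cache"


def summarize_cache(cache_dir: Path) -> list[dict]:
    """Return one row per top-level entry with its file count and total bytes.

    Rows are ordered largest first. A missing directory yields an empty list.
    """
    if not cache_dir.is_dir():
        return []
    report = []
    for entry in sorted(cache_dir.iterdir()):
        if entry.is_dir():
            files = [p for p in entry.rglob("*") if p.is_file()]
        else:
            files = [entry]
        report.append({
            "entry": entry.name,
            "files": len(files),
            "bytes": sum(p.stat().st_size for p in files),
        })
    # Stable sort: entries with equal size keep their alphabetical order.
    report.sort(key=lambda row: row["bytes"], reverse=True)
    return report


if __name__ == "__main__":
    for row in summarize_cache(default_cache_dir()):
        print(f'{row["entry"]:<48} {row["files"]:>5} files {row["bytes"]:>12} bytes')
```

Running the script prints one line per cache entry. For real indexing and monitoring, use MCM itself as described in its readme.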
Model Cache Vault (MCV): package model/GPU kernel caches into OCI-compliant container images. Sign your caches cryptographically so they can be distributed and reused safely across environments and teams. For more information, please see the MCV readme.
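The integrity side of this can be illustrated with a short Python sketch. It is an assumption-laden illustration, not MCV's packaging or signing code: `pack_cache` and `oci_digest` are hypothetical names. The sketch packs a cache directory into a reproducible tar archive and computes a content digest in the `sha256:<hex>` form that OCI uses to identify blobs. A signing tool would then sign a digest like this one; no signature is produced here.

```python
import hashlib
import io
import tarfile
from pathlib import Path


def pack_cache(cache_dir: Path) -> bytes:
    """Pack every file under cache_dir into an in-memory tar archive.

    Files are added in sorted order with default TarInfo metadata
    (mtime 0, uid/gid 0), so the same contents give the same bytes.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for path in sorted(cache_dir.rglob("*")):
            if not path.is_file():
                continue
            data = path.read_bytes()
            info = tarfile.TarInfo(name=path.relative_to(cache_dir).as_posix())
            info.size = len(data)
            archive.addfile(info, io.BytesIO(data))
    return buffer.getvalue()


def oci_digest(blob: bytes) -> str:
    """Return an OCI-style content digest: 'sha256:' plus the hex SHA-256."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()
```

Because the archive is reproducible, two machines holding the same cache files compute the same digest, and any modified file changes it. That property is what makes a signed digest useful for detecting tampering.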
Triton Util: write cleaner, more intuitive Triton code with high-level abstractions and utilities for loading, storing, and debugging GPU memory.
Triton-util was developed by Umer Adil and generously contributed to MCU.
For more information, please see the Triton Util readme.
- Clone this repository:

      git clone https://github.com/redhat-et/MCU.git
      cd MCU

- Follow the setup instructions for each tool in its respective directory.
MCU/
├── mcm/ # Model Cache Manager
├── mcv/ # OCI packaging and signing tool
├── triton_util/ # Triton Utilities
└── README.md # You're here!
Model Cache Vault ensures that your cache packages are:
- Packaged using OCI standards
- Signed cryptographically for tamper-proof integrity
- Easily distributable across environments and pipelines

Use MCU to:
- Improve Triton/vLLM kernel cache management
- Package and share caches across machines or Kubernetes environments
We welcome contributions! If you find bugs, have feature suggestions, or want to contribute code, please open an issue or submit a pull request.
Apache License, Version 2.0. See LICENSE for details.