Dynamo Release v0.1.0
Dynamo v0.1.0 version will be released following Jensen Huang’s GTC keynote, and the product will be hosted on github.com/ai-dynamo. It’s an open source project with Apache 2 license, and public continuous integration will be available from the start to enable industry-wide collaboration. The primary distribution will be through pip wheels with minimal binary size. The ai-dynamo github org will host 2 repos: dynamo and NIXL.
Initial Dynamo release features:
- Disaggregated serving with X prefill and Y decode nodes
- KV aware routing
- KV cache manager to offload KV cache to system memory
- NIXL support for RDMA (InfiniBand, Ethernet) and TCP
- Support for K8s deployment
As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines including TRT-LLM, vLLM, and SGLang at launch, with varying degrees of maturity and support. Dynamo supports the vLLM engine with all the capabilities mentioned above, with a plan to achieve feature parity with the rest of inference engines as soon as possible.
Future plans
The next release of Dynamo plans to open-source the KV cache manager as a standalone repository under the ai-dynamo organization. This release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU, system memory, local SSD, and object storage.
In that release, we will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns. Our vision is to evolve the Planner into a reinforcement learning platform, which will allow users to define objectives and then tune and optimize performance policies automatically based on system feedback.
Dynamo is designed as the ideal next generation inference server, building upon the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path for existing users to Dynamo once feature parity is achieved.