Releases: ai-dynamo/dynamo

Dynamo Release v0.3.2

18 Jul 05:21
50f3636

Dynamo is a high-performance, low-latency inference framework designed to serve generative AI models—across any framework, architecture, or deployment scale. It's an open source project under the Apache 2.0 license. Dynamo is available for installation via pip wheels and containers from NVIDIA NGC.
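
For a quick start, the wheels can be installed directly from PyPI. The sketch below is a minimal example; the [vllm] extra is an assumption, so check the installation guide for the exact extras published with this release.

    # Install the Dynamo wheel from PyPI.
    # The [vllm] extra is an assumption; see the install guide for the
    # extras available in this release.
    pip install "ai-dynamo[vllm]"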

As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:

  • NVIDIA TensorRT-LLM
  • vLLM
  • SGLang

Major Features and Improvements

Engine Support and Routing

  • An example standalone router is now provided for use outside of Dynamo (#1409).
  • The new SLA-based planner dynamically manages resource allocation based on service-level objectives (#1420).
  • Data-parallel vLLM worker setups are now supported (#1513).
  • SGLang support was extended for DeepEP deployments (#1120).
  • Clean shutdown is now available for vllm_v1 and SGLang engines (#1562, #1764).
  • Experimental WideEP with EPLB support is now available for TensorRT-LLM, in both aggregated and disaggregated serving (#1652, #1690).
  • The router now tracks approximate KV cache residency and predicts active KV blocks for improved routing efficiency (#1636, #1638, #1731).

Observability and Metrics

  • Native DCGM and Prometheus integration enables hardware metrics collection and export (#1488, #1701); see the metrics sketch after this list.
  • New Grafana dashboards offer composite software and hardware system visibility (#1788).
  • Batch /completions endpoint and speculative decoding metrics are now supported for vLLM (#1626, #1549).
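
As a quick way to verify the Prometheus integration, the metrics endpoint of a running deployment can be scraped directly. This is a hedged sketch: the localhost:8000 address is an assumption and depends on how your frontend is configured.

    # Spot-check Prometheus metrics on a running deployment.
    # localhost:8000 is an assumption; substitute your frontend address.
    curl -s http://localhost:8000/metrics | grep -i dynamo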

Deployment, Kubernetes, and CLI

  • The Kubernetes operator now supports custom entrypoints, command overrides, and simplified graph deployments (#1396, #1708, #1877, #1893).
  • Example manifests for multimodal and minimal deployments were added (#1836, #1872).
  • Graph Helm chart logic, resource requests, and health probes were improved (#1877, #1888).
  • Two new Helm charts, dynamo-platform and dynamo-crds, are introduced in this release, enabling modular and robust Kubernetes deployments across a variety of topologies and operational requirements; see the install sketch after this list.
  • The dynamo-run command line interface now supports a --version flag, along with improved error handling and validation (#1596, #1674, #1623).
  • Docker and Kubernetes deployment workflows were streamlined. Helm charts and container images were improved (#1742, #1796, #1840, #1841).
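
The following is a minimal sketch of the new chart-based install flow, together with the new CLI version flag. The chart paths and the namespace are hypothetical placeholders; consult the deployment guide for the published chart sources.

    # Install CRDs first, then the platform chart.
    # Chart paths and the namespace are hypothetical placeholders.
    helm install dynamo-crds ./dynamo-crds \
      --namespace dynamo-system --create-namespace
    helm install dynamo-platform ./dynamo-platform \
      --namespace dynamo-system

    # The CLI now reports its version (#1596).
    dynamo-run --version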

Developer Experience

  • Embedding request handling was improved with frontend tokenization (#1494).
  • OpenAI API request validation is now available (#1674).
  • Batch embedding and parallel tokenization improve efficiency for batch inference and embedding (#1657).
  • The /responses endpoint and additional API features were added (#1694); see the request sketch after this list.
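
Because the frontend exposes an OpenAI-compatible API, the new endpoints can be exercised with plain HTTP requests. The sketch below targets /v1/completions; the address and model name are assumptions for illustration only.

    # Hypothetical request to the OpenAI-compatible frontend.
    # Host, port, and model name are placeholders for your deployment.
    curl http://localhost:8000/v1/completions \
      -H "Content-Type: application/json" \
      -d '{"model": "Qwen/Qwen3-0.6B", "prompt": "Hello,", "max_tokens": 16}'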

Bug Fixes

  • Issues related to GPU resource specifications in deployments, container builds, and runtime were fixed (#1826, #1792, #1546).
  • Helm chart logic, resource requests, and health probes were corrected (#1877, #1893).
  • Error handling and model loading were improved for multimodal and distributed deployments (#1545).
  • Metrics publishing and logging were fixed for vLLM, SGLang, and OpenAI endpoints (#1864, #1649, #1639).
  • Process cleanup issues were resolved in tests (#1801).

Documentation

  • Documentation updates include new guides for Ray setup, architecture diagrams, and deployment modes (#1947, #1697).
  • Benchmarking, troubleshooting, and advanced usage scenario documentation was enhanced.
  • Deprecation notes were added for outdated connectors (#1964, #1959).

Build, CI, and Test

  • Dependency upgrades include protobuf, nats, and etcd (#1876, #1744).
  • CI coverage now includes GPU-based and multi-engine tests.
  • Container builds now use distroless images for improved security and efficiency (#1570, #1569).
  • Fault tolerance tests were added (#1444).

Known Issues

  • KVBM is supported only with Python 3.12.

Release Assets

Python wheels, Rust crates, containers, and Helm charts are published for this release.

Contributors

Thank you to all contributors for this release. For a full list, refer to the changelog.

Dynamo Release v0.3.1

01 Jul 17:59
e117295

Dynamo is an open source project under the Apache 2.0 license. The primary distribution is done through pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL. Dynamo is designed as the next-generation inference server, building upon the foundation of NVIDIA® Triton Inference Server™. While Triton focuses on single-node inference deployments, we're integrating its robust capabilities into Dynamo over the next several months. We'll maintain support for Triton while providing a clear migration path for existing users once Dynamo achieves feature parity.

As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:

  • NVIDIA TensorRT-LLM
  • vLLM
  • SGLang

Dynamo v0.3.1 features:

  • Functional DeepSeek R1 disaggregated serving with wide EP using SGLang
  • Functional EPD disaggregation with a video model (LLaVA Video 7B)
  • Proof-of-concept inference gateway support
  • Prebuilt Dynamo + vLLM container
    • We plan to release these pre-built containers in the coming days
  • Amazon Linux support

Future plans
Dynamo Roadmap

Known Issues

  • KVBM is supported only with Python 3.12


Dynamo Release v0.3.0

05 Jun 20:51
15ca948

Dynamo is an open source project under the Apache 2.0 license. The primary distribution is done through pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL. Dynamo is designed as the next-generation inference server, building upon the foundation of NVIDIA® Triton Inference Server™. While Triton focuses on single-node inference deployments, we're integrating its robust capabilities into Dynamo over the next several months. We'll maintain support for Triton while providing a clear migration path for existing users once Dynamo achieves feature parity.

As a vendor-neutral serving framework, Dynamo supports multiple large language model (LLM) inference engines to varying degrees:

  • NVIDIA TensorRT-LLM
  • vLLM
  • SGLang

Dynamo v0.3.0 features:

  • Dynamo run with KV routing and multiple-model support (guide; see the sketch after this list)
  • vLLM v1 engine support (example)
  • SGLang with DP attention (example)
  • SLA-based planner (guide)
  • Optimized embedding transfer for multimodal models (example)
  • Dynamo deploy update command (guide)
  • Model caching using Fluid (guide)
  • FluxCD guide to managing custom resources
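
As referenced in the first item above, here is a hedged sketch of serving a model with Dynamo run behind an HTTP frontend. The in=/out= arguments and the model name are assumptions based on the linked guide, not verified syntax; consult the guide for exact usage.

    # Hypothetical dynamo run invocation: HTTP frontend in, vLLM engine out.
    # Arguments and model name are assumptions; see the guide for details.
    dynamo run in=http out=vllm Qwen/Qwen3-0.6B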

Future plans
Dynamo Roadmap

Known Issues

  • KVBM is supported only with Python 3.12


Dynamo Release v0.2.1

22 May 23:45
b950ec5

Dynamo is an open source project under the Apache 2.0 license. The primary distribution is done through pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL. Dynamo is designed as the next-generation inference server, building upon the foundation of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path to Dynamo for existing users once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.

Dynamo v0.2.1 features:

  • KV Block Manager (intro)
  • Improved vLLM performance by avoiding re-initialization of sampling params
  • SGLang support (README.md)
  • Multi-Modal E/P/D Disaggregation (README.md)
  • LeaderWorkerSet support for Kubernetes
  • Qwen3, Gemma3, and Llama 4 in Dynamo Run

Future plans

Dynamo Roadmap

Known Issues

  • Benchmark guides are still being validated on public cloud instances (GCP / AWS)

What's Changed

🐛 Bug Fixes

  • fix: Extract tokenizer from GGUF for Qwen3 and Gemma3 arch by @grahamking in #1011


Dynamo Release v0.2.0

01 May 00:33
ca728f6

Dynamo is an open source project under the Apache 2.0 license. The primary distribution is done through pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL. Dynamo is designed as the next-generation inference server, building upon the foundation of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path to Dynamo for existing users once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.

Dynamo v0.2.0 features:

  • GB200 support with ARM builds (Note: currently requires a container build)
  • Planner - new experimental support for spinning workers up and down based on load
  • Improved K8s deployment workflow
    • Installation wizard to enable easy configuration of Dynamo on your Kubernetes cluster
    • CLI to manage your operator-based deployments
    • Consolidated Custom Resources for Dynamo deployments
    • Documentation improvements (including Minikube guide to installing Dynamo Platform)

Future plans

Dynamo Roadmap

Known Issues

  • Benchmark guides are still being validated on public cloud instances (GCP / AWS)
  • Benchmarks on internal clusters show a 15% degradation from results displayed in summary graphs for multi-node 70B and are being investigated.
  • TensorRT-LLM examples are not working in this release, but fixes are underway on main.


Dynamo Release v0.1.1

16 Apr 20:44
926370b

Dynamo is an open source project under the Apache 2.0 license. The primary distribution is done through pip wheels with minimal binary size. The ai-dynamo GitHub organization hosts two repositories: Dynamo and NIXL. Dynamo is designed as the next-generation inference server, building upon the foundation of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path to Dynamo for existing users once feature parity is achieved. As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support.

Dynamo v0.1.1 features:

  • Benchmarking guides for Single and Multi-Node Disaggregation on H100 (vLLM)
  • TensorRT-LLM support for KV Aware Routing
  • TensorRT-LLM support for Disaggregation
  • ManyLinux and Ubuntu 22.04 Support for wheels and crates
  • Unified logging for Python and Rust

Future plans

  • Instructions for reproducing benchmark guides on GCP and AWS
  • KV Cache Manager as a standalone repository under the ai-dynamo organization. This release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU, system memory, local SSD, and object storage.
  • Searchable user guides and documentation
  • Multi-node instances for large models
  • Initial Planner version supporting dynamic scaling of P / D workers. We will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns. Our vision is to evolve the Planner into a reinforcement learning platform, which will allow users to define objectives and then tune and optimize performance policies automatically based on system feedback.
  • vLLM 1.0 support with NIXL and KV Cache Events

Known Issues

  • Benchmark guides are still being validated on public cloud instances (GCP / AWS)
  • Benchmarks on internal clusters show a 15% degradation from results displayed in summary graphs for multi-node 70B and are being investigated.


Dynamo Release v0.1.0

18 Mar 03:37

Dynamo v0.1.0 will be released following Jensen Huang's GTC keynote, and the product will be hosted on github.com/ai-dynamo. It's an open source project under the Apache 2.0 license, and public continuous integration will be available from the start to enable industry-wide collaboration. The primary distribution will be through pip wheels with minimal binary size. The ai-dynamo GitHub org will host two repos: dynamo and NIXL.

Initial Dynamo release features:

  • Disaggregated serving with X prefill and Y decode nodes
  • KV aware routing
  • KV cache manager to offload KV cache to system memory
  • NIXL support for RDMA (InfiniBand, Ethernet) and TCP
  • Support for K8s deployment

As a vendor-agnostic serving framework, Dynamo supports multiple LLM inference engines at launch, including TRT-LLM, vLLM, and SGLang, with varying degrees of maturity and support. Dynamo supports the vLLM engine with all of the capabilities listed above, with a plan to reach feature parity across the remaining engines as soon as possible.

Future plans
The next release of Dynamo plans to open-source the KV cache manager as a standalone repository under the ai-dynamo organization. This release will provide functionality for storing and evicting KV cache across multiple memory tiers, including GPU, system memory, local SSD, and object storage.

In that release, we will include an early version of the Dynamo Planner, another core component. This initial release will feature heuristic-based dynamic allocation of GPU workers between prefill and decode tasks, as well as model and fleet configuration adjustments based on user traffic patterns. Our vision is to evolve the Planner into a reinforcement learning platform, which will allow users to define objectives and then tune and optimize performance policies automatically based on system feedback.

Dynamo is designed as the ideal next-generation inference server, building upon the foundations of the Triton Inference Server. While Triton focuses on single-node inference deployments, we are committed to integrating its robust single-node capabilities into Dynamo within the next several months. We will maintain ongoing support for Triton while ensuring a seamless migration path to Dynamo for existing users once feature parity is achieved.