Releases: vllm-project/production-stack
vllm-stack-0.1.6
The stack deployment of vLLM
What's changed
- [CI]: change the entrypoint of nightly docker images (#514) (by @sammshen)
- Add support for sleep and wake_up endpoints (#498) (by @dumb0002)
- [Bugfix] add health probe for lmcache server (#520) (by @zerofishnoodles)
- [Doc, Feat] basic KEDA support and tutorials (#487) (by @Romero027)
- [Misc] Delete Unnecessary file (#521) (by @zerofishnoodles)
- change keda name (#529) (by @zerofishnoodles)
- [CI/CD] Add roundrobin router e2e test (#525) (by @zerofishnoodles)
- [Doc] Add CRD deployment docs (#530) (by @kobe0938)
- [Doc] Kubernetes in Docker (kind) tutorial (#534) (by @lucas-tucker)
- FEAT introduce ruff to project 1 - tests (#527) (by @BrianPark314)
- [CI/CD] Add static e2e test for prefixaware (#532) (by @zerofishnoodles)
- fix(request): make sure to extend full_response (#536) (by @max-wittig)
- [CI/CD] Add prefix aware routing test (#523) (by @zerofishnoodles)
- [Bugfix][Helm] prevent duplicate securitycontext entry for containers (#544) (by @Hexoplon)
- feature/gateway-inference-extension (#537) (by @BrianPark314)
- Add Artifact Hub metadata for verified publisher (#540) (by @kobe0938)
- [CI/CD] Add multiple routing logic test (#547) (by @zerofishnoodles)
- [Doc] Adding security context for disaggregated prefill (#555) (by @YuhanLiu11)
- [CI/CD] Add checkov security check for information (by @zerofishnoodles)
- fix(reconciler): trigger update when image or replicas are changed (#554) (by @googs1025)
- [Feat] Terraform Quickstart Tutorials for MS Azure (#552) (by @falconlee236)
- [Router] Expose /tokenize and /detokenize endpoints (#541) (by @Exchioz)
- feature/ruff-router (#553) (by @BrianPark314)
- [Doc] Adding tutorial for Gateway Inference Extension support (#570) (by @YuhanLiu11)
- fix: race condition in trie insert (by @zhouwfang)
- [Feature] Moving default vLLM version from v0 to v1 (#580) (by @YuhanLiu11)
- feat(helm): make imagePullPolicy configurable & fix router service annotation for LoadBalancer (#573) (by @lonelygo)
- perf: minimize lock contention (#581) (by @zhouwfang)
- [BugFix] fix lora controller reconcile logic (#565) (by @zerofishnoodles)
- [FEAT] Add LoRA helm deployment (#563) (by @zerofishnoodles)
vllm-stack-0.1.5
The stack deployment of vLLM
vllm-stack-0.1.4
The stack deployment of vLLM
What's changed
- Adding support to route a request to a specific engine instance (#438) @dumb0002
- [Perf] Improve disaggregated prefill router performance (#440) @YuhanLiu11
- [Fix] Only the default namespace service monitor namespace (#447) @nicole-lihui
- update install script kubectl command to find kuberay-operator pod globally (#460) @googs1025
- [Doc] Adding documentation for disaggregated prefill (#477) @YuhanLiu11
- Optimize port conversion (#466) @learner0810
- [Misc] Making KV aware routing compatible with latest LMCache (#475) @YuhanLiu11
- fix(operator): fix cr status base on deployment replicas (#443) @googs1025
- [Misc] Update the request_id handling logic to align with vLLM (#473) @KevinCheung2259
- [CI/Build] Add env clean up before run (#486) @Shaoting-Feng
- [BugFix] Fix v1/models in static discovery (#492) @zerofishnoodles
- Bugfix/482 helm rayspec fix (#483) @insukim1994
vllm-stack-0.1.3
The stack deployment of vLLM
Changes made
- [Feat] add extraVolumes and extraVolumeMounts options @BrianPark314 (#396)
- [Bugfix] fix(services): make post_request callback not dependent on semantic_cache @ant-ms (#399)
- [Feat] Support for manual scheduling of an engine pod @dumb0002 (#400)
- [Bugfix] add miss argument type set @googs1025 (#401)
- [Feat] add sentry sdk and cli args @pwuersch (#395)
- [Doc] Added documentation about uninstalling previous minikube instal @insukim1994 (#405)
- [Feat] KV cache aware routing @YuhanLiu11 (#403)
- [Feat] add event when Reconciling configmap failed @googs1025 (#402)
- [Misc] Update helm chart for v1 @YuhanLiu11 (#412)
- [Bugfix] fix(parser): fix dynamic config not working @max-wittig (#413)
- [feat] add model aliases @max-wittig (#397)
- [Misc] use schema https://json-schema.org/draft/2020-12/schema @sh1ng (#423)
- [Feat] Add initial CRD support for production stack @royyhuang (#415)
- [Feat] Prefix aware routing implementation based on hash trie @KuntaiDu (#432)
- [Feat] Simple Gateway inference extension integration @YuhanLiu11 (#436)
- [Feat] Adding support for disaggregated prefill based on vLLM v1 @YuhanLiu11 (#435)
- refactor: Replace services list with a single service object @googs1025 (#409)
- [Feat][Router] add static-model-types argument @max-wittig (#430)
- [CI/CD] Adding CI/CD tests for CRDs @YuhanLiu11 (#452)
- Switch context in CI @Shaoting-Feng (#451)
- chore: add unittest coverage @max-wittig (#449)
- Feat/basic pipeline parallelism @insukim1994 (#422)
- feat: add endpoint health checks to static router @max-wittig (#428)
- [Feat][lora] add lora operator and modify vllm router to support @zerofishnoodles (#446)
vllm-stack-0.1.2
The stack deployment of vLLM
What's Changed
- [Feat] Adding support to turn on/off engine deployment by @dumb0002 #311
- [Feat] Add nodeSelectorTerms for router & cacher servers by @kinoute #314
- [Bugfix] Update logger handler to handle stdout/stderr properly @corona10 #320
- [CI] Always upload logs of Helm functionality checks @pwuersch #321
- [CI/Build] Remove sudo requirements in CI/CD @Shaoting-Feng #325
- [Feat] Multiple service creation when multiple models specified @lucas-tucker #326
- [CI] Add coverage tracking @zhuohangu #330
- [CLI/Doc] Update on gke deployment with gpu quota @EaminC #334
- [Bugfix] Fix thread creation to pass parameters properly. @corona10 #336
- [Feat] OpenTelemetry Support Example @lucas-tucker #346
- [Feat] Tool calling support for MCP client integration @YuhanLiu11 #352
- [Benchmark] Add api key option @Kimdongui #354
- [Bugfix] fix init container pvc volume mount @zerofishnoodles #359
- [Feat] Enabled latency monitor and added average latency computation logic @insukim1994 #362
- [Feat] Added a tutorial document for deploying production stack on amd gpus @insukim1994 #364
- [Bugfix] Deprecated least loaded routing logic @insukim1994 #366
- [Bugfix] added model name to deployment selector @TamKej #367
- [Feat] helm: add routerSpec.serviceType value @marquiz #368
- [Feat] Support Multi-Model Deployment with Enhanced vLLM Configurations @haitwang-cloud #371
- [Bugfix] Fixing issues on the engine svc labels @dumb0002 #376
- [Bugfix] Declare logger properly for protocols.py @corona10 #381
- [Feat] Adding a tutorial for using vLLM v1 in production stack @YuhanLiu11 #390
vllm-stack-0.1.1
The stack deployment of vLLM
What's Changed
- [CI/Build][Router] Make semantic caching optional by @Shaoting-Feng in #218
- [Benchmark] Add router config in tutorial by @Shaoting-Feng in #223
- refactor: standard fastapi project structure for better main… by @BrianPark314 in #217
- Added lora support proposal by @wangchen615 in #216
- [Feat] Added initContainer to modelSpec by @AbelHristodor in #221
- [Router] Fix semantic cache check in chat completion url by @Shaoting-Feng in #224
- [Doc] Change repo in tutorial 08 naive k8s by @Shaoting-Feng in #225
- [Doc] Update community meeting calendar invite by @YuhanLiu11 in #231
- [Doc] Fix startupProbe indentation in values-07 tutorial file by @AbelHristodor in #226
- [Doc] Initial docs structure by @Siddhant-Ray in #234
- [Doc] Update endpoint in 01 tutorial by @Shaoting-Feng in #236
- [Doc] add example page and readme by @Siddhant-Ray in #241
- [Doc] Fix typo of model name and output len in AIBrix by @Shaoting-Feng in #242
- [Doc] Add doc page for benchmark qa by @Siddhant-Ray in #243
- [Doc] add doc on gcp.rst by @EaminC in #249
- [Feat] add vllm-api-key by @JustinDuy in #194
- [CI/Build] Add concurrency to functionality test by @Shaoting-Feng in #219
- [Doc] update tutorial and user manual docs by @Siddhant-Ray in #257
- [Doc] Add docs for router CRD config and dev, some small tweaks by @Siddhant-Ray in #259
- [FEAT] Terraform Quickstart Tutorials for Google GKE by @falconlee236 in #250
- [Feat] add requestGPUType to modelSpec by @Hexoplon in #253
- [Doc][CI/Build] Minor fix by @Shaoting-Feng in #258
- [Doc] dev api docs, bug fixes by @Siddhant-Ray in #266
- [Feat] add explicit resource limit values by @Hexoplon in #255
- [DOC] format unified gcp.rst adding trouble shooting by @EaminC in #263
- [Doc] Minor fix in tutorial by @YuhanLiu11 in #272
- [Doc] Minor fix in benchmarking scripts by @YuhanLiu11 in #273
- [Tutorial] Deployment on Azure AKS by @surajssd in #247
- [Feat] add model label on engine deployments by @Hexoplon in #269
- [Misc] Add schedulerName in servingEngineSpec by @hongkunyoo in #275
- [Feat] Remove sudo requirement for kubectl and helm by @Romero027 in #256
- [Benchmark] Minor fix in benchmark script by @YuhanLiu11 in #284
- [Benchmark] Minor updates to benchmark script by @YuhanLiu11 in #286
- [Doc] Minor fix in tutorials by @YuhanLiu11 in #288
- [Feat] add extraVolume and extraVolumeMount helm variables by @Hexoplon in #280
- Update 09-lora-enabled-installation.md by @wangchen615 in #287
- chore: use extra deps to optionally install additional pkg by @rootfs in #289
- [Feat] Request rewriter interface in router by @ApostaC in #230
- [Feat] add security context to servingEngineSpec by @Hexoplon in #282
- [Doc] add docs link to readme by @Siddhant-Ray in #290
- chore: update e2e test to use python 3.12 to match setup.py requirements by @rootfs in #295
- [CI/Build] Github action for building docs pipeline by @Siddhant-Ray in #291
- Add .readthedocs.yaml by @hmellor in #296
- Hotfix readthedocs build by @hmellor in #298
- Update docs link in README by @hmellor in #299
- feat: support PII detection in http request by @rootfs in #235
- [Bugfix]: add missing v1 prefix by @Xunzhuo in #302
- [Misc] Bumping version to 0.1.1 by @YuhanLiu11 in #308
New Contributors
- @AbelHristodor made their first contribution in #221
- @JustinDuy made their first contribution in #194
- @falconlee236 made their first contribution in #250
- @Hexoplon made their first contribution in #253
- @surajssd made their first contribution in #247
- @hongkunyoo made their first contribution in #275
- @Romero027 made their first contribution in #256
- @Xunzhuo made their first contribution in #302
Full Changelog: vllm-stack-0.1.0...vllm-stack-0.1.1
vllm-stack-0.1.0
The stack deployment of vLLM
What's Changed
- [Feat] add imagePullSecrets option to helm chart #179 by @kalantar
- [Benchmark] Adding multi-round QA benchmark script #180 @YuhanLiu11
- [Feat]: add support for embeddings, rerank and score endpoints #181 @bufferoverflow
- [CI/Build]: bump python to 3.12 to be inline with vllm #182 @bufferoverflow
- Manually Enable LoRA Adapters using existing Router and vLLM deployment #206 @wangchen615
- [Feat] dynamic configuration support for router #207 @ApostaC
- [Feat] create kubernetes operator to manage dynamic config file #208 @rootfs
- [Document, Feat] basic HPA support and tutorials #209 @ApostaC
- [Feat] enable experimental semantic cache in router #210 @rootfs
vllm-stack-0.0.11
The stack deployment of vLLM
What's Changed
- [Doc] Fixing CONTRIBUTING.md path issue in PR template by @YuhanLiu11 in #158
- [Misc] Implement Singleton Design Pattern for EngineStat Scraper, RequestStat Monitor, and Router by @sitloboi2012 in #131
- Fixed some tutorial problems by @Hanchenli in #160
- [router] setuptools_scm to support version argument by @gaocegege in #155
- Added disclaimer for tutorial by @Hanchenli in #161
- [Misc] Remove hardcoded eks cluster name by @coloryourlife in #162
- [Doc] Adding community meeting info by @YuhanLiu11 in #169
- [Doc] Updating community meeting info by @YuhanLiu11 in #171
- [Bugfix] Fix docker build problem in github workflow by @ApostaC in #164
- [Feat, Misc] Disable PVC creation when pvcStorage is not provided by @ApostaC in #176
New Contributors
- @coloryourlife made their first contribution in #162
Full Changelog: vllm-stack-0.0.10...vllm-stack-0.0.11
vllm-stack-0.0.9
The stack deployment of vLLM
What's Changed
- [Bugfix] Fix indentation issue in Helm Chart PVC by @BaeYeongbin in #148
- [Tutorial] Deployment on Google GKE by @EaminC in #146
- Feat: Router observability (Current QPS, router-side queueing delay, etc) Part 1 by @sitloboi2012 in #119
- [release] Add github sha tag for router image by @gaocegege in #153
- [Fix] Minor Fixes for Tutorial and Bumped version to 0.0.9 by @Hanchenli in #154
New Contributors
- @BaeYeongbin made their first contribution in #148
- @EaminC made their first contribution in #146
- @sitloboi2012 made their first contribution in #119
Full Changelog: vllm-stack-0.0.8...vllm-stack-0.0.9
vllm-stack-0.0.10
The stack deployment of vLLM
What's Changed
- [Feature] Enabled vLLM v1 in Production Stack by @YuhanLiu11 in #157