22 Jul 04:55

github-actions

vllm-stack-0.1.6

3bb6b73

vllm-stack-0.1.6 Latest

The stack deployment of vLLM

What's changed

[CI]: change the entrypoint of nightly docker images (#514) (by @sammshen)
Add support for sleep and wake_up endpoints (#498) (by @dumb0002)
[Bugfix] add health probe for lmcache server (#520) (by @zerofishnoodles)
[Doc, Feat] basic KEDA support and tutorials (#487) (by @Romero027)
[Misc] Delete Unnecessary file (#521) (by @zerofishnoodles)
change keda name (#529) (by @zerofishnoodles)
[CI/CD] Add roundrobin router e2e test (#525) (by @zerofishnoodles)
[Doc] Add CRD deployment docs (#530) (by @kobe0938)
[Doc] Kubernetes in Docker (kind) tutorial (#534) (by @lucas-tucker)
FEAT introduce ruff to project 1 - tests (#527) (by @BrianPark314)
[CI/CD] Add static e2e test for prefixaware (#532) (by @zerofishnoodles)
fix(request): make sure to extend full_response (#536) (by @max-wittig)
[CI/CD] Add prefix aware routing test (#523) (by @zerofishnoodles)
[Bugfix][Helm] prevent duplicate securitycontext entry for containers (#544) (by @Hexoplon)
feature/gateway-inference-extension (#537) (by @BrianPark314)
Add Artifact Hub metadata for verified publisher (#540) (by @kobe0938)
[CI/CD] Add multiple routing logic test (#547) (by @zerofishnoodles)
[Doc] Adding security context for disaggregated prefill (#555) (by @YuhanLiu11)
[CI/CD] Add checkov security check for information (by @zerofishnoodles)
fix(reconciler): trigger update when image or replicas are changed (#554) (by @googs1025)
[Feat] Terraform Quickstart Tutorials for MS Azure (#552) (by @falconlee236)
[Router] Expose /tokenize and /detokenize endpoints (#541) (by @Exchioz)
feature/ruff-router (#553) (by @BrianPark314)
[Doc] Adding tutorial for Gateway Inference Extension support (#570) (by @YuhanLiu11)
fix: race condition in trie insert (by @zhouwfang)
[Feature] Moving default vLLM version from v0 to v1 (#580) (by @YuhanLiu11)
feat(helm): make imagePullPolicy configurable & fix router service annotation for LoadBalancer (#573) (by @lonelygo)
perf: minimize lock contention (#581) (by @zhouwfang)
[BugFix] fix lora controller reconcile logic (#565) (by @zerofishnoodles)
[FEAT] Add LoRA helm deployment (#563) (by @zerofishnoodles)

Contributors

max-wittig, lonelygo, and 13 other contributors

Assets 3

17 Jun 04:23

github-actions

vllm-stack-0.1.5

0b6a61c

vllm-stack-0.1.5

The stack deployment of vLLM

Assets 3

05 Jun 21:10

github-actions

vllm-stack-0.1.4

6e3c06f

vllm-stack-0.1.4

The stack deployment of vLLM

What's changed

Adding support to route a request to a specific engine instance (#438) @dumb0002
[Perf] Improve disaggregated prefill router performance (#440) @YuhanLiu11
[Fix] Only the default namespace service monitor namespace (#447) @nicole-lihui
update install script kubectl command to find kuberay-operator pod globally (#460) @googs1025
[Doc] Adding documentation for disaggregated prefill (#477) @YuhanLiu11
Optimize port conversion (#466) @learner0810
[Misc] Making KV aware routing compatible with latest LMCache (#475) @YuhanLiu11
fix(operator): fix cr status base on deployment replicas (#443) @googs1025
[Misc] Update the request_id handling logic to align with vLLM (#473) @KevinCheung2259
[CI/Build] Add env clean up before run (#486) @Shaoting-Feng
[BugFix] Fix v1/models in static discovery (#492) @zerofishnoodles
Bugfix/482 helm rayspec fix (#483) @insukim1994

Contributors

dumb0002, YuhanLiu11, and 7 other contributors

Assets 3

30 May 06:39

github-actions

vllm-stack-0.1.3

ff7a6c1

vllm-stack-0.1.3

The stack deployment of vLLM

Changes made

[Feat] add extraVolumes and extraVolumeMounts options @BrianPark314 (#396)
[Bugfix] fix(services): make post_request callback not dependent on semantic_cache @ant-ms (#399)
[Feat] Support for manual scheduling of an engine pod @dumb0002 (#400)
[Bugfix] add miss argument type set @googs1025 (#401)
[Feat] add sentry sdk and cli args @pwuersch (#395)
[Doc] Added documentation about uninstalling previous minikube instal @insukim1994 (#405)
[Feat] KV cache aware routing @YuhanLiu11 (#403)
[Feat] add event when Reconciling configmap failed @googs1025 (#402)
[Misc] Update helm chart for v1 @YuhanLiu11 (#412)
[Bugfix] fix(parser): fix dynamic config not working @max-wittig (#413)
[feat] add model aliases @max-wittig (#397)
[Misc] use schema https://json-schema.org/draft/2020-12/schema @sh1ng (#423)
[Feat] Add initial CRD support for production stack @royyhuang (#415)
[Feat] Prefix aware routing implementation based on hash trie @KuntaiDu (#432)
[Feat] Simple Gateway inference extension integration @YuhanLiu11 (#436)
[Feat] Adding support for disaggregated prefill based on vLLM v1 @YuhanLiu11 (#435)
refactor: Replace services list with a single service object @googs1025 (#409)
[Feat][Router] add static-model-types argument @max-wittig (#430)
[CI/CD] Adding CI/CD tests for CRDs @YuhanLiu11 (#452)
Switch context in CI @Shaoting-Feng (#451)
chore: add unittest coverage @max-wittig (#449)
Feat/basic pipeline parallelism @insukim1994 (#422)
feat: add endpoint health checks to static router @max-wittig (#428)
[Feat][lora] add lora operator and modify vllm router to support @zerofishnoodles (#446)

Contributors

sh1ng, max-wittig, and 11 other contributors

Assets 3

29 Apr 19:56

github-actions

vllm-stack-0.1.2

2404918

vllm-stack-0.1.2

The stack deployment of vLLM

What's Changed

[Feat] Adding support to turn on/off engine deployment by @dumb0002 #311
[Feat] Add nodeSelectorTerms for router & cacher servers by @kinoute #314
[Bugfix] Update logger handler to handle stdout/stderr properly @corona10 #320
[CI] Always upload logs of Helm functionality checks @pwuersch #321
[CI/Build] Remove sudo requirements in CI/CD @Shaoting-Feng #325
[Feat] Multiple service creation when multiple models specified @lucas-tucker #326
[CI] Add coverage tracking @zhuohangu #330
[CLI/Doc]Update on gke deployment with gpu quota @EaminC #334
[Bugfix] Fix thread creation to pass parameters properly. @corona10 #336
[Feat] OpenTelemetry Support Example @lucas-tucker #346
[Feat] Tool calling support for MCP client integration @YuhanLiu11 #352
[Benchmark] Add api key option @Kimdongui #354
[Bugfix] fix init container pvc volume mount @zerofishnoodles #359
[Feat] Enabled latency monitor and added average latency computation logic @insukim1994 #362
[Feat] Added a tutorial document for deploying production stack on amd gpus @insukim1994 #364
[Bugfix] Deprecated least loaded routing logic @insukim1994 #366
[Bugfix] added model name to deployment selector @TamKej #367
[Feat] helm: add routerSpec.serviceType value @marquiz #368
[Feat] Support Multi-Model Deployment with Enhanced vLLM Configurations @haitwang-cloud #371
[Bugfix] Fixing issues on the engine svc labels @dumb0002 #376
[Bugfix] Declare logger properly for protocols.py @corona10 #381
[Feat] Adding a tutorial for using vLLM v1 in production stack @YuhanLiu11 #390

Contributors

kinoute, marquiz, and 13 other contributors

Assets 3

19 Mar 18:39

github-actions

vllm-stack-0.1.1

82b47eb

vllm-stack-0.1.1

The stack deployment of vLLM

What's Changed

[CI/Build][Router] Make semantic caching optional by @Shaoting-Feng in #218
[Benchmark] Add router config in tutorial by @Shaoting-Feng in #223
refactor: standard fastapi project structure for better main… by @BrianPark314 in #217
Added lora support proposal by @wangchen615 in #216
[Feat] Added initContainer to modelSpec by @AbelHristodor in #221
[Router] Fix semantic cache check in chat completion url by @Shaoting-Feng in #224
[Doc] Change repo in tutorial 08 naive k8s by @Shaoting-Feng in #225
[Doc] Update community meeting calendar invite by @YuhanLiu11 in #231
[Doc] Fix startupProbe indentation in values-07 tutorial file by @AbelHristodor in #226
[Doc] Initial docs structure by @Siddhant-Ray in #234
[Doc] Update endpoint in 01 tutorial by @Shaoting-Feng in #236
[Doc] add example page and readme by @Siddhant-Ray in #241
[Doc] Fix typo of model name and output len in AIBrix by @Shaoting-Feng in #242
[Doc] Add doc page for benchmark qa by @Siddhant-Ray in #243
[Doc] add doc on gcp.rst by @EaminC in #249
[Feat] add vllm-api-key by @JustinDuy in #194
[CI/Build] Add concurrency to functionality test by @Shaoting-Feng in #219
[Doc] update tutorial and user manual docs by @Siddhant-Ray in #257
[Doc] Add docs for router CRD config and dev, some small tweaks by @Siddhant-Ray in #259
[FEAT] Terraform Quickstart Tutorials for Google GKE by @falconlee236 in #250
[Feat] add requestGPUType to modelSpec by @Hexoplon in #253
[Doc][CI/Build] Minor fix by @Shaoting-Feng in #258
[Doc] dev api docs, bug fixes by @Siddhant-Ray in #266
[Feat] add explicit resource limit values by @Hexoplon in #255
[DOC] format unified gcp.rst adding troubleshooting by @EaminC in #263
[Doc] Minor fix in tutorial by @YuhanLiu11 in #272
[Doc] Minor fix in benchmarking scripts by @YuhanLiu11 in #273
[Tutorial] Deployment on Azure AKS by @surajssd in #247
[Feat] add model label on engine deployments by @Hexoplon in #269
[Misc] Add schedulerName in servingEngineSpec by @hongkunyoo in #275
[Feat] Remove sudo requirement for kubectl and helm by @Romero027 in #256
[Benchmark] Minor fix in benchmark script by @YuhanLiu11 in #284
[Benchmark] Minor updates to benchmark script by @YuhanLiu11 in #286
[Doc] Minor fix in tutorials by @YuhanLiu11 in #288
[Feat] add extraVolume and extraVolumeMount helm variables by @Hexoplon in #280
Update 09-lora-enabled-installation.md by @wangchen615 in #287
chore: use extra deps to optionally install additional pkg by @rootfs in #289
[Feat] Request rewriter interface in router by @ApostaC in #230
[Feat] add security context to servingEngineSpec by @Hexoplon in #282
[Doc] add docs link to readme by @Siddhant-Ray in #290
chore: update e2e test to use python 3.12 to match setup.py requirements by @rootfs in #295
[CI/Build] Github action for building docs pipeline by @Siddhant-Ray in #291
Add .readthedocs.yaml by @hmellor in #296
Hotfix readthedocs build by @hmellor in #298
Update docs link in README by @hmellor in #299
feat: support PII detection in http request by @rootfs in #235
[Bugfix]: add missing v1 prefix by @Xunzhuo in #302
[Misc] Bumping version to 0.1.1 by @YuhanLiu11 in #308

New Contributors

@AbelHristodor made their first contribution in #221
@JustinDuy made their first contribution in #194
@falconlee236 made their first contribution in #250
@Hexoplon made their first contribution in #253
@surajssd made their first contribution in #247
@hongkunyoo made their first contribution in #275
@Romero027 made their first contribution in #256
@Xunzhuo made their first contribution in #302

Full Changelog: vllm-stack-0.1.0...vllm-stack-0.1.1

Contributors

hongkunyoo, JustinDuy, and 15 other contributors

Assets 3

03 Mar 17:31

github-actions

vllm-stack-0.1.0

fecae77

vllm-stack-0.1.0

The stack deployment of vLLM

What's Changed

[Feat] add imagePullSecrets option to helm chart #179 by @kalantar
[Benchmark] Adding multi-round QA benchmark script #180 @YuhanLiu11
[Feat]: add support for embeddings, rerank and score endpoints #181 @bufferoverflow
[CI/Build]: bump python to 3.12 to be in line with vllm #182 @bufferoverflow
Manually Enable LoRA Adapters using existing Router and vLLM deployment #206 @wangchen615
[Feat] dynamic configuration support for router #207 @ApostaC
[Feat] create kubernetes operator to manage dynamic config file #208 @rootfs
[Document, Feat] basic HPA support and tutorials #209 @ApostaC
[Feat] enable experimental semantic cache in router #210 @rootfs

New Contributors

@bufferoverflow made his first contribution in #181
@kalantar made his first contribution in #179
@rootfs made his first contribution in #208

Contributors

bufferoverflow, kalantar, and 4 other contributors

Assets 3

25 Feb 17:48

github-actions

vllm-stack-0.0.11

fb0cb90

vllm-stack-0.0.11

The stack deployment of vLLM

What's Changed

[Doc] Fixing CONTRIBUTING.md path issue in PR template by @YuhanLiu11 in #158
[Misc] Implement Singleton Design Pattern for EngineStat Scraper, RequestStat Monitor, and Router by @sitloboi2012 in #131
Fixed some tutorial problems by @Hanchenli in #160
[router] setuptools_scm to support version argument by @gaocegege in #155
Added disclaimer for tutorial by @Hanchenli in #161
[Misc] Remove hardcoded eks cluster name by @coloryourlife in #162
[Doc] Adding community meeting info by @YuhanLiu11 in #169
[Doc] Updating community meeting info by @YuhanLiu11 in #171
[Bugfix] Fix docker build problem in github workflow by @ApostaC in #164
[Feat, Misc] Disable PVC creation when pvcStorage is not provided by @ApostaC in #176

New Contributors

@coloryourlife made their first contribution in #162

Full Changelog: vllm-stack-0.0.10...vllm-stack-0.0.11

Contributors

gaocegege, ApostaC, and 4 other contributors

Assets 3

19 Feb 18:07

github-actions

vllm-stack-0.0.9

4c3aeef

vllm-stack-0.0.9

The stack deployment of vLLM

What's Changed

[Bugfix] Fix indentation issue in Helm Chart PVC by @BaeYeongbin in #148
[Tutorial] Deployment on Google GKE by @EaminC in #146
Feat: Router observability (Current QPS, router-side queueing delay, etc) Part 1 by @sitloboi2012 in #119
[release] Add github sha tag for router image by @gaocegege in #153
[Fix] Minor Fixes for Tutorial and Bumped version to 0.0.9 by @Hanchenli in #154

New Contributors

@BaeYeongbin made their first contribution in #148
@EaminC made their first contribution in #146
@sitloboi2012 made their first contribution in #119

Full Changelog: vllm-stack-0.0.8...vllm-stack-0.0.9

Contributors

gaocegege, sitloboi2012, and 3 other contributors

Assets 3

19 Feb 21:34

github-actions

vllm-stack-0.0.10

ecca068

vllm-stack-0.0.10

The stack deployment of vLLM

What's Changed

[Feature] Enabled vLLM v1 in Production Stack by @YuhanLiu11 in #157

Contributors

YuhanLiu11

Assets 3

Releases: vllm-project/production-stack
