v0.0.0beta22
What's Changed
- Change middleware format by @squeakymouse in #393
- Fix custom framework Dockerfile by @squeakymouse in #395
- Fix tensorrt-llm enum value (fixes #390) by @ian-scale in #396
- Override model length for zephyr 7b alpha by @ian-scale in #398
- Time completions use case by @saiatmakuri in #397
- update docs to show model len / context windows by @ian-scale in #401
- Add MultiprocessingConcurrencyLimiter to gateway by @squeakymouse in #399
- Change code-llama to codellama by @ian-scale in #400
- Fix completions request ID by @saiatmakuri in #402
- Allow latest inference framework tag by @squeakymouse in #403
- Bump helm chart version by @seanshi-scale in #406
- 4x SQLAlchemy pool size by @yunfeng-scale in #405
- Bump datadog module to 0.47.0 by @saiatmakuri in #407
- Fix autoscaler node selector by @seanshi-scale in #409
- Log request sizes by @yunfeng-scale in #410
- Add support for mixtral-8x7b and mixtral-8x7b-instruct by @saiatmakuri in #408 (see the usage sketch after this list)
- Make sure metadata is not incorrectly wiped during endpoint update by @yunfeng-scale in #413
- Always return output for completions sync response by @yunfeng-scale in #412
- Handle endpoint update errors by @saiatmakuri in #414
- [Bug fix] LLM Artifact Gateway .list_files() by @saiatmakuri in #416
- Enable sensitive log mode by @song-william in #415
- Throughput benchmark script by @yunfeng-scale in #411
- Upgrade vllm to 0.2.7 by @yunfeng-scale in #417
- LLM batch completions API by @yunfeng-scale in #418
- Small update to vllm batch by @yunfeng-scale in #419
- Sensitive content flag by @yunfeng-scale in #421
- Revert a broken refactoring by @yunfeng-scale in #423
- [Logging I/O] Post inference hooks as background tasks by @tiffzhao5 in #422
- Batch inference client / doc by @yunfeng-scale in #424
- Minor fixes for batch inference by @yunfeng-scale in #426
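As a quick illustration of the new Mixtral support (#408), here is a minimal sketch using the `llmengine` Python client's documented `Completion.create` call. The model name comes from the PR title above; the prompt, generation parameters, and `SCALE_API_KEY`-based authentication are illustrative and depend on how your gateway is deployed.

```python
# Minimal sketch: query one of the newly added Mixtral models through the
# llmengine Python client. Assumes the client is configured, e.g. via the
# SCALE_API_KEY environment variable or a self-hosted gateway.
from llmengine import Completion

response = Completion.create(
    model="mixtral-8x7b-instruct",  # added in #408
    prompt="Summarize the v0.0.0beta22 changes in one sentence: ",
    max_new_tokens=64,
    temperature=0.2,
)
print(response.json())
```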
Full Changelog: v0.0.0beta20...v0.0.0beta22