v0.0.0beta22
What's Changed
- Change middleware format by @squeakymouse in #393
- Fix custom framework Dockerfile by @squeakymouse in #395
- Fix tensorrt-llm enum value (fixes #390) by @ian-scale in #396
- Override model length for zephyr 7b alpha by @ian-scale in #398
- Time completions use case by @saiatmakuri in #397
- update docs to show model len / context windows by @ian-scale in #401
- Add MultiprocessingConcurrencyLimiter to gateway by @squeakymouse in #399
- Change code-llama to codellama by @ian-scale in #400
- Fix completions request ID by @saiatmakuri in #402
- Allow latest inference framework tag by @squeakymouse in #403
- Bump helm chart version by @seanshi-scale in #406
- 4x SQLAlchemy pool size by @yunfeng-scale in #405
- Bump datadog module to 0.47.0 by @saiatmakuri in #407
- Fix autoscaler node selector by @seanshi-scale in #409
- Log request sizes by @yunfeng-scale in #410
- Add support for mixtral-8x7b and mixtral-8x7b-instruct by @saiatmakuri in #408 (see the usage sketch after this list)
- Make sure metadata is not incorrectly wiped during endpoint update by @yunfeng-scale in #413
- Always return output for completions sync response by @yunfeng-scale in #412
- Handle endpoint update errors by @saiatmakuri in #414
- [Bug fix] LLM Artifact Gateway .list_files() by @saiatmakuri in #416
- Enable sensitive log mode by @song-william in #415
- Throughput benchmark script by @yunfeng-scale in #411
- Upgrade vllm to 0.2.7 by @yunfeng-scale in #417
- LLM batch completions API by @yunfeng-scale in #418
- Small update to vllm batch by @yunfeng-scale in #419
- Sensitive content flag by @yunfeng-scale in #421
- Revert a broken refactoring by @yunfeng-scale in #423
- [Logging I/O] Post inference hooks as background tasks by @tiffzhao5 in #422
- Batch inference client / doc by @yunfeng-scale in #424
- Minor fixes for batch inference by @yunfeng-scale in #426
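As a quick illustration of the new Mixtral support (#408), here is a minimal sketch using the `llmengine` Python client's documented `Completion.create` call. The model name comes from the PR title above; the prompt, generation parameters, and `SCALE_API_KEY`-based authentication are illustrative and depend on how your gateway is deployed.

```python
# Minimal sketch: query one of the newly added Mixtral models through the
# llmengine Python client. Assumes the client is configured, e.g. via the
# SCALE_API_KEY environment variable or a self-hosted gateway.
from llmengine import Completion

response = Completion.create(
    model="mixtral-8x7b-instruct",  # added in #408
    prompt="Summarize the v0.0.0beta22 changes in one sentence: ",
    max_new_tokens=64,
    temperature=0.2,
)
print(response.json())
```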
Full Changelog: v0.0.0beta20...v0.0.0beta22