Releases: llm-d/llm-d-inference-sim
v0.3.0
Release Notes
Compatibility with vLLM
- Aligned command-line parameters with real vLLM. All parameters supported by both the simulator and vLLM now share the same name and format (see the launch sketch after this list):
- Support for --served-model-name
- Support for --seed
- Support for --max-model-len
- Added support for tools in chat completions
- Included usage in the response
- Added object field to the response JSON
- Added support for multimodal inputs in chat completions
- Added health and readiness endpoints
- Added P/D support; the connector type must be set to nixl
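As a quick illustration of the vLLM-aligned flags and the health/readiness endpoints, here is a minimal launch sketch. The binary name, the --model and --port flags, the model name, and the /health and /ready paths are assumptions based on typical vLLM conventions; check --help for the authoritative parameter list.

```bash
# Minimal launch sketch using the vLLM-aligned flags listed above.
# --model and --port are assumed flags; the model name and port are placeholders.
./llm-d-inference-sim \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --served-model-name my-model \
  --max-model-len 4096 \
  --seed 42 \
  --port 8000 &

# Probe the health and readiness endpoints (paths assumed to follow vLLM's
# /health convention; adjust if your build exposes different paths).
curl -s http://localhost:8000/health
curl -s http://localhost:8000/ready
```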
Additional Features
- Introduced configuration file support. All parameters can now be loaded from a configuration file in addition to being set via the command line (see the sketch after this list).
- Added new test coverage
- Changed the Docker base image
- Added the ability to randomize time to first token, inter-token latency, and KV-cache transfer latency
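A minimal sketch of loading parameters from a configuration file, assuming a --config flag and YAML keys that mirror the command-line parameter names; both the flag and the key names are assumptions, not confirmed by these notes.

```bash
# Hypothetical configuration file; key names are assumed to mirror the
# command-line parameter names.
cat > sim-config.yaml <<'EOF'
served-model-name: my-model
max-model-len: 4096
seed: 42
max-num-seqs: 16
EOF

# Load the file instead of (or in addition to) command-line flags
# (--config is an assumed flag name).
./llm-d-inference-sim --config sim-config.yaml
```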
Migration Notes (for users upgrading from versions prior to v0.2.0)
- max-running-requests has been renamed to max-num-seqs
- lora has been replaced by lora-modules, which now accepts a list of JSON strings, e.g., '{"name": "name", "path": "lora_path", "base_model_name": "id"}'
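A before/after sketch of the two renames; the pre-v0.2.0 invocation form and the adapter name/path values are assumptions for illustration only.

```bash
# Before (releases prior to v0.2.0); this older invocation form is an assumption.
./llm-d-inference-sim --max-running-requests 16 --lora my-lora

# After (v0.2.0 and later): max-num-seqs replaces max-running-requests, and
# lora-modules takes JSON strings describing each adapter (placeholder values).
./llm-d-inference-sim --max-num-seqs 16 \
  --lora-modules '{"name": "my-lora", "path": "/adapters/my-lora", "base_model_name": "base"}'
```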
Change details since v0.2.2
- feat: add max-model-len configuration and validation for context window (#82) by @mohitpalsingh in #85
- Fixed readme, removed error for --help by @irar2 in #89
- Pd support by @mayabar in #94
- fix: crash when omitted stream_options by @jasonmadigan in #95
- style: 🔨 splits all import blocks into different sections by @yafengio in #98
- Fixed deployment.yaml by @irar2 in #99
- Enable configuration of various parameters in tools by @irar2 in #100
- Choose latencies randomly by @irar2 in #103
New Contributors
- @mohitpalsingh made their first contribution in #85
- @jasonmadigan made their first contribution in #95
Full Changelog: v0.2.2...v0.3.0
v0.2.2
What's Changed
- Initialize rand once, added seed to configuration by @irar2 in #79
- use string when storing lora adapters in simulator by @mayabar in #81
- Improved support for empty command line arguments by @irar2 in #80
- Added tests for LoRA configuration, load and unload by @irar2 in #86
Full Changelog: v0.2.1...v0.2.2
v0.2.1
What's Changed
- fix: max-cpu-loras should be initialized from max-loras by @shmuelk in #77
- Support space separated arguments, use correct format in config file by @irar2 in #78
Full Changelog: v0.2.0...v0.2.1
Migrating from releases prior to v0.2.0
- max-running-requests was replaced by max-num-seqs
- lora was replaced by lora-modules, which is now a list of JSON strings, e.g., '{"name": "name", "path": "lora_path", "base_model_name": "id"}'
v0.2.0
What's Changed
- add support to multimodal in chat completions by @JuanmaBM in #49
- Updated REST endpoint and Prometheus metric documentation by @shmuelk in #59
- Support tools by @irar2 in #55
- Split defs.go file containing data structures and functions into several files by @irar2 in #60
- Calculate usage data once for both text and chat by @irar2 in #61
- Improved tokenization by @irar2 in #62
- feat: 🚀 Reduce the mirror image size by @yafengio in #64
- Allow array parameters in tools by @irar2 in #65
- Support object parameters in tools by @irar2 in #66
- Support minItems and maxItems for array parameters in tools by @irar2 in #67
- Support integer and float in tools by @irar2 in #68
- link checker - fails PR if links are broken - be consistent with llmd-scheduler by @mayabar in #70
- Simplify tools parameters json schema by @irar2 in #72
- Add the --served-model-name flag by @nerdalert in #69
- Configuration improvements by @irar2 in #75
Migrating from releases prior to v0.2.0
Changes have been made in release v0.2.0 to make the command-line arguments and configuration file more in line with vLLM's. In particular:
- max-running-requests was replaced by max-num-seqs
- lora was replaced by lora-modules, which is now an array in JSON format, e.g., [{"name": "name", "path": "lora_path", "base_model_name": "id"}]
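Note that in v0.2.0 the value quoted above is a single JSON array, rather than the separate JSON strings used in later releases; a minimal sketch with placeholder adapter values:

```bash
# v0.2.0 form: lora-modules takes one JSON array (placeholder values).
./llm-d-inference-sim --max-num-seqs 16 \
  --lora-modules '[{"name": "my-lora", "path": "/adapters/my-lora", "base_model_name": "base"}]'
```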
New Contributors
- @JuanmaBM made their first contribution in #49
- @yafengio made their first contribution in #64
- @nerdalert made their first contribution in #69
Full Changelog: v0.1.2...v0.2.0
v0.1.0
The first release of the llm-d-inference-sim.
The llm-d-inference-sim is a lightweight vLLM simulator for use during development of the llm-d platform, in particular for development of the llm-d-inference-scheduler.
What's Changed
- Move to gha remove tekton by @clubanderson in #19
- fix: Lint errors by @shmuelk in #20
Full Changelog: 0.0.6...v0.1.0