
Releases: llm-d/llm-d-inference-sim

v0.3.0

20 Jul 08:29 · 7f1f766
Pre-release

Release Notes

Compatibility with vLLM

  • Aligned command-line parameters with vLLM. All parameters supported by both the simulator and vLLM now share the same names and formats (see the invocation example after this list):
    • Support for --served-model-name
    • Support for --seed
    • Support for --max-model-len
  • Added support for tools in chat completions (see the request sketch after this list)
  • Included usage data in the response
  • Added object field to the response JSON
  • Added support for multimodal inputs in chat completions
  • Added health and readiness endpoints
  • Added P/D (prefill/decode disaggregation) support; the connector type must be set to nixl
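
A hypothetical invocation exercising the vLLM-aligned flags is sketched below; only --served-model-name, --seed, and --max-model-len are confirmed by these notes, while the binary name and the --model and --port flags are assumptions for illustration.

```sh
# Sketch only: --served-model-name, --seed, and --max-model-len are the
# vLLM-aligned flags listed above; the binary name and the --model/--port
# flags are assumptions.
./llm-d-inference-sim \
  --model meta-llama/Llama-3.1-8B-Instruct \
  --served-model-name my-model \
  --seed 42 \
  --max-model-len 4096 \
  --port 8000
```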
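
Since the simulator mirrors vLLM's OpenAI-compatible API, a tools-enabled chat completion request might look like the sketch below; the endpoint path, model name, and tool definition are illustrative assumptions, and the response should now carry the usage and object fields noted above.

```sh
# Hypothetical request against the OpenAI-compatible chat completions endpoint.
# The tool definition and model name are placeholders.
curl -s http://localhost:8000/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "What is the weather in Paris?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }]
  }'
```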

Additional Features

  • Introduced configuration file support. All parameters can now be loaded from a configuration file in addition to being set via the command line (see the configuration sketch after this list).
  • Added new test coverage
  • Changed the Docker base image
  • Added the ability to randomize time to first token, inter-token latency, and KV-cache transfer latency
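
A minimal configuration-file sketch, assuming YAML keys that mirror the command-line flag names and a --config flag pointing at the file; the key names and the flag are assumptions, not confirmed by these notes.

```sh
# Sketch only: keys are assumed to mirror the flag names, and the --config
# flag is an assumption. Latency values are base values; as of v0.3.0 they
# can be randomized per request.
cat > sim-config.yaml <<'EOF'
served-model-name: my-model
seed: 42
max-model-len: 4096
time-to-first-token: 200   # ms, assumed key name
inter-token-latency: 50    # ms, assumed key name
EOF

./llm-d-inference-sim --config sim-config.yaml
```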

Migration Notes (for users upgrading from versions prior to v0.2.0)

  • max-running-requests has been renamed to max-num-seqs
  • lora has been replaced by lora-modules, which now accepts a list of JSON strings, e.g., '{"name": "name", "path": "lora_path", "base_model_name": "id"}' (see the migration sketch below)
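
A migration sketch, with placeholder values and an assumed binary name:

```sh
# Before v0.2.0 (old flag names, shown schematically):
#   --max-running-requests 5 --lora <adapter>
# From v0.2.0 onward:
./llm-d-inference-sim \
  --max-num-seqs 5 \
  --lora-modules '{"name": "name", "path": "lora_path", "base_model_name": "id"}'
```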

Change details since v0.2.2

  • feat: add max-model-len configuration and validation for context window (#82) by @mohitpalsingh in #85
  • Fixed readme, removed error for --help by @irar2 in #89
  • P/D support by @mayabar in #94
  • fix: crash when omitted stream_options by @jasonmadigan in #95
  • style: 🔨 splits all import blocks into different sections by @yafengio in #98
  • Fixed deployment.yaml by @irar2 in #99
  • Enable configuration of various parameters in tools by @irar2 in #100
  • Choose latencies randomly by @irar2 in #103

Full Changelog: v0.2.2...v0.3.0

v0.2.2

13 Jul 10:02 · 7656a3c
Pre-release

What's Changed

  • Initialize rand once, added seed to configuration by @irar2 in #79
  • use string when storing lora adapters in simulator by @mayabar in #81
  • Improved support for empty command line arguments by @irar2 in #80
  • Added tests for LoRA configuration, load and unload by @irar2 in #86

Full Changelog: v0.2.1...v0.2.2

v0.2.1

06 Jul 10:03 · 3e63a0d
Pre-release

What's Changed

  • fix: max-cpu-loras should be initialized from max-loras by @shmuelk in #77
  • Support space separated arguments, use correct format in config file by @irar2 in #78

Full Changelog: v0.2.0...v0.2.1

Migrating from releases prior to v0.2.0

  • max-running-requests was replaced by max-num-seqs
  • lora was replaced by lora-modules, which is now a list of JSON strings, e.g., '{"name": "name", "path": "lora_path", "base_model_name": "id"}'

v0.2.0

03 Jul 10:26 · 2119638
Pre-release

What's Changed

  • add support to multimodal in chat completions by @JuanmaBM in #49
  • Updated REST endpoint and Prometheus metric documentation by @shmuelk in #59
  • Support tools by @irar2 in #55
  • Split defs.go file containing data structures and functions into several files by @irar2 in #60
  • Calculate usage data once for both text and chat by @irar2 in #61
  • Improved tokenization by @irar2 in #62
  • feat: 🚀 Reduce the image size by @yafengio in #64
  • Allow array parameters in tools by @irar2 in #65
  • Support object parameters in tools by @irar2 in #66
  • Support minItems and maxItems for array parameters in tools by @irar2 in #67
  • Support integer and float in tools by @irar2 in #68
  • link checker - fails PR if links are broken - be consistent with llmd-scheduler by @mayabar in #70
  • Simplify tools parameters json schema by @irar2 in #72
  • Add the --served-model-name flag by @nerdalert in #69
  • Configuration improvements by @irar2 in #75

Migrating from releases prior to v0.2.0

Changes were made in release v0.2.0 to bring the command-line arguments and configuration file more in line with vLLM's. In particular:

  • max-running-requests was replaced by max-num-seqs
  • lora was replaced by lora-modules, which is now an array in JSON format, e.g., [{"name": "name", "path": "lora_path", "base_model_name": "id"}]

Full Changelog: v0.1.2...v0.2.0

v0.1.0

20 May 11:36 · a5c928e
Pre-release

The first release of the llm-d-inference-sim.

The llm-d-inference-sim is a lightweight vLLM simulator for use during development of the llm-d platform, in particular the development of the llm-d-inference-scheduler.

Full Changelog: 0.0.6...v0.1.0