v0.2.0

Pre-release
@shmuelk shmuelk released this 03 Jul 10:26

What's Changed

  • add support to multimodal in chat completions by @JuanmaBM in #49
  • Updated REST endpoint and Prometheus metric documentation by @shmuelk in #59
  • Support tools by @irar2 in #55
  • Split defs.go file containing data structures and functions into several files by @irar2 in #60
  • Calculate usage data once for both text and chat by @irar2 in #61
  • Improved tokenization by @irar2 in #62
  • feat: 🚀 Reduce the mirror image size by @yafengio in #64
  • Allow array parameters in tools by @irar2 in #65
  • Support object parameters in tools by @irar2 in #66
  • Support minItems and maxItems for array parameters in tools by @irar2 in #67
  • Support integer and float in tools by @irar2 in #68
  • link checker - fails PR if links are broken - be consistent with llmd-scheduler by @mayabar in #70
  • Simplify tools parameters json schema by @irar2 in #72
  • Add the --served-model-name flag by @nerdalert in #69
  • Configuration improvements by @irar2 in #75

Migrating from releases prior to v0.2.0

Release v0.2.0 changes the command line arguments and configuration file to bring them more in line with
vLLM's command line arguments and configuration file. In particular:

  • max-running-requests was replaced by max-num-seqs
  • lora was replaced by lora-modules, which is now an array in JSON format, e.g., [{"name": "name", "path": "lora_path", "base_model_name": "id"}]
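As a sketch of the migration (the binary name and the old --lora value syntax are illustrative assumptions; only the flag names max-num-seqs and lora-modules and the JSON array shape come from the notes above), an invocation might change like this:

```shell
# Before v0.2.0 (hypothetical binary name and old-style values)
./llm-d-inference-sim \
  --max-running-requests 64 \
  --lora my-lora

# With v0.2.0: max-num-seqs replaces max-running-requests, and
# lora-modules takes a JSON array of module objects
./llm-d-inference-sim \
  --max-num-seqs 64 \
  --lora-modules '[{"name": "my-lora", "path": "lora_path", "base_model_name": "id"}]'
```

The JSON value should be single-quoted in the shell so the embedded double quotes reach the program intact.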

New Contributors

Full Changelog: v0.1.2...v0.2.0