Status: Closed
Description
Command: docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev --port 8000 --model "Qwen/Qwen2.5-1.5B-Instruct" --lora "tweet-summary-0,tweet-summary-1"
Result:
docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev --port 8000 --model "Qwen/Qwen2.5-1.5B-Instruct" --lora "tweet-summary-0,tweet-summary-1"
I0712 18:39:50.628554 1 cmd.go:36] "Starting vLLM simulator"
unknown flag: --lora
Usage of llm-d-inference-sim flags:
--add_dir_header If true, adds the file directory to the header of the log messages
--alsologtostderr log to standard error as well as files (no effect when -logtostderr=true)
--config string The configuration file
--inter-token-latency int Time to generate one token (in milliseconds)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory (no effect when -logtostderr=true)
--log_file string If non-empty, use this log file (no effect when -logtostderr=true)
--log_file_max_size uint Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
--lora-modules strings List of LoRA adapters (a list of space-separated JSON strings)
--max-cpu-loras int Maximum number of LoRAs to store in CPU memory
--max-loras int Maximum number of LoRAs in a single batch (default 1)
--max-num-seqs int Maximum number of inference requests that could be processed at the same time (parameter to simulate requests waiting queue) (default 5)
--mode string Simulator mode, echo - returns the same text that was sent in the request, for chat completion returns the last message, random - returns random sentence from a bank of pre-defined sentences (default "random")
--model string Currently 'loaded' model
--one_output If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
--port int Port (default 8000)
--seed int Random seed for operations (if not set, current Unix time in nanoseconds is used) (default 1752345590629871667)
--served-model-name strings Model names exposed by the API (a list of space-separated strings)
--skip_headers If true, avoid header prefixes in the log messages
--skip_log_headers If true, avoid headers when opening log files (no effect when -logtostderr=true)
--stderrthreshold severity logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=true) (default 2)
--time-to-first-token int Time to first token (in milliseconds)
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
unknown flag: --lora
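Per the usage output above, the simulator has no --lora flag; the corresponding option is --lora-modules, which takes a list of space-separated JSON strings. A minimal corrected invocation might look like the sketch below. The exact JSON schema for each adapter is an assumption here (a "name" field, following the vLLM --lora-modules convention); consult the simulator's documentation to confirm the expected fields.

```shell
# Sketch of a corrected command: replace --lora with --lora-modules.
# Each adapter is passed as its own JSON string; the {"name": ...} shape
# is assumed from vLLM's --lora-modules convention, not confirmed here.
docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev \
  --port 8000 \
  --model "Qwen/Qwen2.5-1.5B-Instruct" \
  --lora-modules '{"name": "tweet-summary-0"}' '{"name": "tweet-summary-1"}'
```

Note that passing the adapters as one comma-separated string (as in the original command) would also fail: the help text says the list is space-separated, one JSON string per adapter.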