Docker command to start the sim from the official documentation and README fails (deprecated flag --lora) #84

Open
@mohitpalsingh

Description

Command: docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev --port 8000 --model "Qwen/Qwen2.5-1.5B-Instruct" --lora "tweet-summary-0,tweet-summary-1"

Result:

 docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev  --port 8000 --model "Qwen/Qwen2.5-1.5B-Instruct" --lora "tweet-summary-0,tweet-summary-1"
I0712 18:39:50.628554       1 cmd.go:36] "Starting vLLM simulator"
unknown flag: --lora
Usage of llm-d-inference-sim flags:
      --add_dir_header                   If true, adds the file directory to the header of the log messages
      --alsologtostderr                  log to standard error as well as files (no effect when -logtostderr=true)
      --config string                    The configuration file
      --inter-token-latency int          Time to generate one token (in milliseconds)
      --log_backtrace_at traceLocation   when logging hits line file:N, emit a stack trace (default :0)
      --log_dir string                   If non-empty, write log files in this directory (no effect when -logtostderr=true)
      --log_file string                  If non-empty, use this log file (no effect when -logtostderr=true)
      --log_file_max_size uint           Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
      --logtostderr                      log to standard error instead of files (default true)
      --lora-modules strings             List of LoRA adapters (a list of space-separated JSON strings)
      --max-cpu-loras int                Maximum number of LoRAs to store in CPU memory
      --max-loras int                    Maximum number of LoRAs in a single batch (default 1)
      --max-num-seqs int                 Maximum number of inference requests that could be processed at the same time (parameter to simulate requests waiting queue) (default 5)
      --mode string                      Simulator mode, echo - returns the same text that was sent in the request, for chat completion returns the last message, random - returns random sentence from a bank of pre-defined sentences (default "random")
      --model string                     Currently 'loaded' model
      --one_output                       If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
      --port int                         Port (default 8000)
      --seed int                         Random seed for operations (if not set, current Unix time in nanoseconds is used) (default 1752345590629871667)
      --served-model-name strings        Model names exposed by the API (a list of space-separated strings)
      --skip_headers                     If true, avoid header prefixes in the log messages
      --skip_log_headers                 If true, avoid headers when opening log files (no effect when -logtostderr=true)
      --stderrthreshold severity         logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=true) (default 2)
      --time-to-first-token int          Time to first token (in milliseconds)
  -v, --v Level                          number for the log level verbosity
      --vmodule moduleSpec               comma-separated list of pattern=N settings for file-filtered logging
unknown flag: --lora
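
Suggested fix (a sketch based on the usage output above, not yet verified): --lora appears to have been replaced by --lora-modules, which takes a list of space-separated JSON strings. Assuming each adapter is identified by a "name" field, the corrected command would be:

docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev --port 8000 --model "Qwen/Qwen2.5-1.5B-Instruct" --lora-modules '{"name":"tweet-summary-0"}' '{"name":"tweet-summary-1"}'

The migration note in reference 3 below documents the rename from --lora.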

References:

  1. https://llm-d.ai/docs/architecture/Components/inf-simulator#running
  2. https://github.com/llm-d/llm-d-inference-sim?tab=readme-ov-file#running
  3. Correction note: https://github.com/llm-d/llm-d-inference-sim?tab=readme-ov-file#migrating-from-releases-prior-to-v020
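
Workaround sketch: the usage output also lists a --config flag for a configuration file. Assuming the file is YAML with keys matching the long flag names (an assumption; check the README for the exact schema), an equivalent setup could look like:

port: 8000
model: "Qwen/Qwen2.5-1.5B-Instruct"
lora-modules:
  - '{"name":"tweet-summary-0"}'
  - '{"name":"tweet-summary-1"}'

saved as config.yaml and mounted into the container, e.g. docker run --rm --publish 8000:8000 -v $(pwd)/config.yaml:/config.yaml ghcr.io/llm-d/llm-d-inference-sim:dev --config /config.yaml (the mount path here is hypothetical).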
