Status: Closed
Description
Command: docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev --port 8000 --model "Qwen/Qwen2.5-1.5B-Instruct" --lora "tweet-summary-0,tweet-summary-1"
Result:
docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev --port 8000 --model "Qwen/Qwen2.5-1.5B-Instruct" --lora "tweet-summary-0,tweet-summary-1"
I0712 18:39:50.628554 1 cmd.go:36] "Starting vLLM simulator"
unknown flag: --lora
Usage of llm-d-inference-sim flags:
--add_dir_header If true, adds the file directory to the header of the log messages
--alsologtostderr log to standard error as well as files (no effect when -logtostderr=true)
--config string The configuration file
--inter-token-latency int Time to generate one token (in milliseconds)
--log_backtrace_at traceLocation when logging hits line file:N, emit a stack trace (default :0)
--log_dir string If non-empty, write log files in this directory (no effect when -logtostderr=true)
--log_file string If non-empty, use this log file (no effect when -logtostderr=true)
--log_file_max_size uint Defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
--logtostderr log to standard error instead of files (default true)
--lora-modules strings List of LoRA adapters (a list of space-separated JSON strings)
--max-cpu-loras int Maximum number of LoRAs to store in CPU memory
--max-loras int Maximum number of LoRAs in a single batch (default 1)
--max-num-seqs int Maximum number of inference requests that could be processed at the same time (parameter to simulate requests waiting queue) (default 5)
--mode string Simulator mode, echo - returns the same text that was sent in the request, for chat completion returns the last message, random - returns random sentence from a bank of pre-defined sentences (default "random")
--model string Currently 'loaded' model
--one_output If true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
--port int Port (default 8000)
--seed int Random seed for operations (if not set, current Unix time in nanoseconds is used) (default 1752345590629871667)
--served-model-name strings Model names exposed by the API (a list of space-separated strings)
--skip_headers If true, avoid header prefixes in the log messages
--skip_log_headers If true, avoid headers when opening log files (no effect when -logtostderr=true)
--stderrthreshold severity logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=true) (default 2)
--time-to-first-token int Time to first token (in milliseconds)
-v, --v Level number for the log level verbosity
--vmodule moduleSpec comma-separated list of pattern=N settings for file-filtered logging
unknown flag: --lora
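Per the usage output above, the simulator has no --lora flag; the corresponding option is --lora-modules, which takes a list of space-separated JSON strings. A minimal corrected invocation might look like the sketch below. The exact JSON schema for each adapter is an assumption here (a "name" field, following the vLLM --lora-modules convention); consult the simulator's documentation to confirm the expected fields.

```shell
# Sketch of a corrected command: replace --lora with --lora-modules.
# Each adapter is passed as its own JSON string; the {"name": ...} shape
# is assumed from vLLM's --lora-modules convention, not confirmed here.
docker run --rm --publish 8000:8000 ghcr.io/llm-d/llm-d-inference-sim:dev \
  --port 8000 \
  --model "Qwen/Qwen2.5-1.5B-Instruct" \
  --lora-modules '{"name": "tweet-summary-0"}' '{"name": "tweet-summary-1"}'
```

Note that passing the adapters as one comma-separated string (as in the original command) would also fail: the help text says the list is space-separated, one JSON string per adapter.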