
Commit 2119638

irar2 and mayabar authored
Configuration improvements (#75)
* Configuration file, changes to command line parameters
  Signed-off-by: Ira <IRAR@il.ibm.com>

* Update README.md - fix mode options presentation
  Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update README.md
  Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Update README.md - update indentation
  Signed-off-by: Maya Barnea <mayab@il.ibm.com>

* Added served-model-name to configuration
  Signed-off-by: Ira <IRAR@il.ibm.com>

---------

Signed-off-by: Ira <IRAR@il.ibm.com>
Signed-off-by: Maya Barnea <mayab@il.ibm.com>
Signed-off-by: Ira Rosen <irar@il.ibm.com>
Co-authored-by: Maya Barnea <mayab@il.ibm.com>
1 parent 5a9ba0c commit 2119638

File tree

9 files changed: +442 -94 lines changed

README.md

Lines changed: 29 additions & 7 deletions

@@ -85,18 +85,40 @@ API responses contains a subset of the fields provided by the OpenAI API.
 For more details see the <a href="https://docs.vllm.ai/en/stable/getting_started/quickstart.html#openai-completions-api-with-vllm">vLLM documentation</a>
 
 ## Command line parameters
-- `port`: the port the simulator listents on, mandatory
+- `config`: the path to a yaml configuration file
+- `port`: the port the simulator listens on, default is 8000
 - `model`: the currently 'loaded' model, mandatory
-- `lora`: a list of available LoRA adapters, separated by commas, optional, by default empty
+- `served-model-name`: model names exposed by the API (comma-separated)
+- `lora-modules`: LoRA module configurations in JSON format: [{"name": "name", "path": "lora_path", "base_model_name": "id"}], optional, empty by default
+- `max-loras`: maximum number of LoRAs in a single batch, optional, default is one
+- `max-cpu-loras`: maximum number of LoRAs to store in CPU memory, optional, must be >= max-loras, default is max-loras
+- `max-num-seqs`: maximum number of sequences per iteration (maximum number of inference requests that could be processed at the same time), default is 5
 - `mode`: the simulator mode, optional, by default `random`
-- `echo`: returns the same text that was sent in the request
-- `random`: returns a sentence chosen at random from a set of pre-defined sentences
+  - `echo`: returns the same text that was sent in the request
+  - `random`: returns a sentence chosen at random from a set of pre-defined sentences
 - `time-to-first-token`: the time to the first token (in milliseconds), optional, by default zero
 - `inter-token-latency`: the time to 'generate' each additional token (in milliseconds), optional, by default zero
-- `max-loras`: maximum number of LoRAs in a single batch, optional, default is one
-- `max-cpu-loras`: maximum number of LoRAs to store in CPU memory, optional, must be >= than max_loras, default is max_loras
-- `max-running-requests`: maximum number of inference requests that could be processed at the same time
 
+In addition, as we are using klog, the following parameters are available:
+- `add_dir_header`: if true, adds the file directory to the header of the log messages
+- `alsologtostderr`: log to standard error as well as files (no effect when -logtostderr=true)
+- `log_backtrace_at`: when logging hits line file:N, emit a stack trace (default :0)
+- `log_dir`: if non-empty, write log files in this directory (no effect when -logtostderr=true)
+- `log_file`: if non-empty, use this log file (no effect when -logtostderr=true)
+- `log_file_max_size`: defines the maximum size a log file can grow to (no effect when -logtostderr=true). Unit is megabytes. If the value is 0, the maximum file size is unlimited. (default 1800)
+- `logtostderr`: log to standard error instead of files (default true)
+- `one_output`: if true, only write logs to their native severity level (vs also writing to each lower severity level; no effect when -logtostderr=true)
+- `skip_headers`: if true, avoid header prefixes in the log messages
+- `skip_log_headers`: if true, avoid headers when opening log files (no effect when -logtostderr=true)
+- `stderrthreshold`: logs at or above this threshold go to stderr when writing to files and stderr (no effect when -logtostderr=true or -alsologtostderr=true) (default 2)
+- `v`: number for the log level verbosity
+- `vmodule`: comma-separated list of pattern=N settings for file-filtered logging
+
+---
+
+## Migrating from releases prior to v0.2.0
+- `max-running-requests` was replaced by `max-num-seqs`
+- `lora` was replaced by `lora-modules`, which is now an array in JSON format, e.g., [{"name": "name", "path": "lora_path", "base_model_name": "id"}]
 
 ## Working with docker image
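
Note: the klog flags listed above are registered by klog itself, not defined by the simulator. This commit does not show the wiring, but the standard mechanism is klog.InitFlags; a minimal, self-contained Go sketch of that mechanism (illustrative only, not the simulator's code):

package main

import (
	"flag"

	"k8s.io/klog/v2"
)

func main() {
	// klog.InitFlags(nil) registers klog's flags (v, vmodule, logtostderr,
	// log_dir, ...) on the default flag set; parsing then makes them
	// available alongside any application-defined flags.
	klog.InitFlags(nil)
	flag.Parse()

	klog.Info("always printed")
	klog.V(2).Info("printed only when run with -v=2 or higher")
	klog.Flush()
}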

manifests/config.yaml

Lines changed: 11 additions & 0 deletions

@@ -0,0 +1,11 @@
port: 8001
model: "Qwen/Qwen2-0.5B"
served-model-name: ["model1", "model2"]
max-loras: 2
max-cpu-loras: 5
max-num-seqs: 5
lora-modules: [{"name":"lora1","path":"/path/to/lora1"},{"name":"lora2","path":"/path/to/lora2"}]

mode: "random"
time-to-first-token: 2
inter-token-latency: 1
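
This file is plain YAML, so loading it is a single yaml.v3 unmarshal (the commit's configuration.load below does essentially this). A self-contained sketch with an abbreviated, illustratively named struct:

package main

import (
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

// miniConfig is an illustrative subset of the simulator's configuration type.
type miniConfig struct {
	Port             int      `yaml:"port"`
	Model            string   `yaml:"model"`
	ServedModelNames []string `yaml:"served-model-name"`
	Mode             string   `yaml:"mode"`
}

func main() {
	data, err := os.ReadFile("manifests/config.yaml")
	if err != nil {
		panic(err)
	}
	var c miniConfig
	if err := yaml.Unmarshal(data, &c); err != nil {
		panic(err)
	}
	// With the manifest above this prints:
	// {Port:8001 Model:Qwen/Qwen2-0.5B ServedModelNames:[model1 model2] Mode:random}
	fmt.Printf("%+v\n", c)
}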

manifests/deployment.yaml

Lines changed: 2 additions & 2 deletions

@@ -20,8 +20,8 @@ spec:
 - "8000"
 - --max-loras
 - "2"
-- --lora
-- food-review-1
+- --lora-modules
+- '[{"name": "food-review-1"}]'
 image: ghcr.io/llm-d/llm-d-inference-sim:v0.1.0
 imagePullPolicy: IfNotPresent
 name: vllm-sim
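
The migrated argument sets only "name"; decoded as JSON, the remaining fields stay empty, which the commit's validate() accepts (only an empty LoRA name, or a base model name that differs from the configured model, is rejected). A quick check using an illustrative stand-in for the commit's unexported loraModule type:

package main

import (
	"encoding/json"
	"fmt"
)

// loraEntry stands in for the commit's loraModule type.
type loraEntry struct {
	Name          string `json:"name"`
	Path          string `json:"path"`
	BaseModelName string `json:"base_model_name"`
}

func main() {
	arg := `[{"name": "food-review-1"}]` // the value passed to --lora-modules above
	var loras []loraEntry
	if err := json.Unmarshal([]byte(arg), &loras); err != nil {
		panic(err)
	}
	// Path and BaseModelName decode to empty strings.
	fmt.Printf("%+v\n", loras[0]) // {Name:food-review-1 Path: BaseModelName:}
}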

pkg/llm-d-inference-sim/config.go

Lines changed: 159 additions & 0 deletions

@@ -0,0 +1,159 @@
/*
Copyright 2025 The llm-d-inference-sim Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package llmdinferencesim

import (
	"encoding/json"
	"errors"
	"fmt"
	"os"

	"gopkg.in/yaml.v3"
)

type configuration struct {
	// Port defines on which port the simulator runs
	Port int `yaml:"port"`
	// Model defines the current base model name
	Model string `yaml:"model"`
	// ServedModelNames is one or many model names exposed by the API
	ServedModelNames []string `yaml:"served-model-name"`
	// MaxLoras defines maximum number of loaded LoRAs
	MaxLoras int `yaml:"max-loras"`
	// MaxCPULoras defines maximum number of LoRAs to store in CPU memory
	MaxCPULoras int `yaml:"max-cpu-loras"`
	// MaxNumSeqs is maximum number of sequences per iteration (the maximum
	// number of inference requests that could be processed at the same time)
	MaxNumSeqs int `yaml:"max-num-seqs"`
	// LoraModules is a list of LoRA adapters
	LoraModules loraModulesValue `yaml:"lora-modules"`

	// TimeToFirstToken time before the first token will be returned, in milliseconds
	TimeToFirstToken int `yaml:"time-to-first-token"`
	// InterTokenLatency time between generated tokens, in milliseconds
	InterTokenLatency int `yaml:"inter-token-latency"`
	// Mode defines the simulator response generation mode, valid values: echo, random
	Mode string `yaml:"mode"`
}

type loraModule struct {
	// Name is the LoRA's name
	Name string `yaml:"name"`
	// Path is the LoRA's path
	Path string `yaml:"path"`
	// BaseModelName is the LoRA's base model
	BaseModelName string `yaml:"base_model_name"`
}

type loraModulesValue []loraModule

func (l *loraModulesValue) String() string {
	b, _ := json.Marshal(l)
	return string(b)
}

func (l *loraModulesValue) Set(val string) error {
	return json.Unmarshal([]byte(val), l)
}

func (l *loraModulesValue) Type() string {
	return "loras"
}

// Implement custom YAML unmarshaling for just this type
func (l *loraModulesValue) UnmarshalYAML(unmarshal func(interface{}) error) error {
	// Try parsing as an array of loraModule
	var arr []loraModule
	if err := unmarshal(&arr); err == nil {
		*l = arr
		return nil
	}
	// Try parsing as a JSON string
	var str string
	if err := unmarshal(&str); err == nil {
		return json.Unmarshal([]byte(str), l)
	}
	return errors.New("lora-modules: invalid format")
}

func newConfig() *configuration {
	return &configuration{
		Port:        vLLMDefaultPort,
		MaxLoras:    1,
		MaxCPULoras: 1,
		MaxNumSeqs:  5,
		Mode:        modeRandom,
	}
}

func (c *configuration) load(configFile string) error {
	configBytes, err := os.ReadFile(configFile)
	if err != nil {
		return fmt.Errorf("failed to read configuration file: %s", err)
	}

	if err := yaml.Unmarshal(configBytes, &c); err != nil {
		return fmt.Errorf("failed to unmarshal configuration: %s", err)
	}
	return nil
}

func (c *configuration) validate() error {
	if c.Model == "" {
		return errors.New("model parameter is empty")
	}
	// Upstream vLLM behaviour: when --served-model-name is not provided,
	// it falls back to using the value of --model as the single public name
	// returned by the API and exposed in Prometheus metrics.
	if len(c.ServedModelNames) == 0 {
		c.ServedModelNames = []string{c.Model}
	}

	if c.Mode != modeEcho && c.Mode != modeRandom {
		return fmt.Errorf("invalid mode '%s', valid values are 'random' and 'echo'", c.Mode)
	}
	if c.Port <= 0 {
		return fmt.Errorf("invalid port '%d'", c.Port)
	}
	if c.InterTokenLatency < 0 {
		return errors.New("inter token latency cannot be negative")
	}
	if c.TimeToFirstToken < 0 {
		return errors.New("time to first token cannot be negative")
	}
	if c.MaxLoras < 1 {
		return errors.New("max LoRAs cannot be less than 1")
	}
	if c.MaxCPULoras == 0 {
		// max CPU LoRAs by default is same as max LoRAs
		c.MaxCPULoras = c.MaxLoras
	}
	if c.MaxCPULoras < c.MaxLoras {
		return errors.New("max CPU LoRAs cannot be less than max LoRAs")
	}

	for _, lora := range c.LoraModules {
		if lora.Name == "" {
			return errors.New("empty LoRA name")
		}
		if lora.BaseModelName != "" && lora.BaseModelName != c.Model {
			return fmt.Errorf("unknown base model '%s' for LoRA '%s'", lora.BaseModelName, lora.Name)
		}
	}

	return nil
}
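
Two details in this file are worth calling out. First, the String/Set/Type trio on loraModulesValue matches the Value interface of github.com/spf13/pflag (the standard library's flag.Value has no Type method), so the type is presumably bound to --lora-modules via pflag.Var. Second, the UnmarshalYAML hook means a config file may spell lora-modules either as a native YAML list or as a JSON string. A standalone sketch of both accepted forms, using a copied illustrative type rather than importing the unexported original:

package main

import (
	"encoding/json"
	"fmt"

	"gopkg.in/yaml.v3"
)

type lora struct {
	Name string `yaml:"name" json:"name"`
	Path string `yaml:"path" json:"path"`
}

type loras []lora

// UnmarshalYAML accepts either a YAML list or a JSON string,
// mirroring the commit's loraModulesValue.
func (l *loras) UnmarshalYAML(unmarshal func(interface{}) error) error {
	var arr []lora
	if err := unmarshal(&arr); err == nil {
		*l = arr
		return nil
	}
	var s string
	if err := unmarshal(&s); err == nil {
		return json.Unmarshal([]byte(s), l)
	}
	return fmt.Errorf("lora-modules: invalid format")
}

func main() {
	nativeList := `lora-modules:
  - name: lora1
    path: /path/to/lora1
`
	jsonString := `lora-modules: '[{"name":"lora2","path":"/path/to/lora2"}]'`

	for _, doc := range []string{nativeList, jsonString} {
		var cfg struct {
			LoraModules loras `yaml:"lora-modules"`
		}
		if err := yaml.Unmarshal([]byte(doc), &cfg); err != nil {
			panic(err)
		}
		// Prints [{Name:lora1 Path:/path/to/lora1}], then [{Name:lora2 Path:/path/to/lora2}].
		fmt.Printf("%+v\n", cfg.LoraModules)
	}
}
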
Lines changed: 155 additions & 0 deletions

@@ -0,0 +1,155 @@
/*
Copyright 2025 The llm-d-inference-sim Authors.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
*/

package llmdinferencesim

import (
	"os"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
	"k8s.io/klog/v2"
)

func createSimConfig(args []string) (*configuration, error) {
	oldArgs := os.Args
	defer func() {
		os.Args = oldArgs
	}()
	os.Args = args

	s, err := New(klog.Background())
	if err != nil {
		return nil, err
	}
	if err := s.parseCommandParamsAndLoadConfig(); err != nil {
		return nil, err
	}
	return s.config, nil
}

type testCase struct {
	name           string
	args           []string
	expectedConfig *configuration
}

var _ = Describe("Simulator configuration", func() {
	tests := make([]testCase, 0)

	// Simple config with only model name set
	c := newConfig()
	c.Model = model
	c.ServedModelNames = []string{c.Model}
	test := testCase{
		name:           "simple",
		args:           []string{"cmd", "--model", model, "--mode", modeRandom},
		expectedConfig: c,
	}
	tests = append(tests, test)

	// Config from config.yaml file
	c = newConfig()
	c.Port = 8001
	c.Model = "Qwen/Qwen2-0.5B"
	c.ServedModelNames = []string{"model1", "model2"}
	c.MaxLoras = 2
	c.MaxCPULoras = 5
	c.MaxNumSeqs = 5
	c.TimeToFirstToken = 2
	c.InterTokenLatency = 1
	c.LoraModules = []loraModule{{Name: "lora1", Path: "/path/to/lora1"}, {Name: "lora2", Path: "/path/to/lora2"}}
	test = testCase{
		name:           "config file",
		args:           []string{"cmd", "--config", "../../manifests/config.yaml"},
		expectedConfig: c,
	}
	tests = append(tests, test)

	// Config from config.yaml file plus command line args
	c = newConfig()
	c.Port = 8002
	c.Model = model
	c.ServedModelNames = []string{"alias1", "alias2"}
	c.MaxLoras = 2
	c.MaxCPULoras = 5
	c.MaxNumSeqs = 5
	c.TimeToFirstToken = 2
	c.InterTokenLatency = 1
	c.LoraModules = []loraModule{{Name: "lora1", Path: "/path/to/lora1"}, {Name: "lora2", Path: "/path/to/lora2"}}
	test = testCase{
		name: "config file with command line args",
		args: []string{"cmd", "--model", model, "--config", "../../manifests/config.yaml", "--port", "8002",
			"--served-model-name", "alias1,alias2"},
		expectedConfig: c,
	}
	tests = append(tests, test)

	// Invalid configurations
	test = testCase{
		name: "invalid model",
		args: []string{"cmd", "--model", "", "--config", "../../manifests/config.yaml"},
	}
	tests = append(tests, test)

	test = testCase{
		name: "invalid port",
		args: []string{"cmd", "--port", "-50", "--config", "../../manifests/config.yaml"},
	}
	tests = append(tests, test)

	test = testCase{
		name: "invalid max-loras",
		args: []string{"cmd", "--max-loras", "15", "--config", "../../manifests/config.yaml"},
	}
	tests = append(tests, test)

	test = testCase{
		name: "invalid mode",
		args: []string{"cmd", "--mode", "hello", "--config", "../../manifests/config.yaml"},
	}
	tests = append(tests, test)

	test = testCase{
		name: "invalid lora",
		args: []string{"cmd", "--config", "../../manifests/config.yaml",
			"--lora-modules", "[{\"path\":\"/path/to/lora15\"}]"},
	}
	tests = append(tests, test)

	DescribeTable("check configurations",
		func(args []string, expectedConfig *configuration) {
			config, err := createSimConfig(args)
			Expect(err).NotTo(HaveOccurred())
			Expect(config).To(Equal(expectedConfig))
		},
		Entry(tests[0].name, tests[0].args, tests[0].expectedConfig),
		Entry(tests[1].name, tests[1].args, tests[1].expectedConfig),
		Entry(tests[2].name, tests[2].args, tests[2].expectedConfig),
	)

	DescribeTable("invalid configurations",
		func(args []string) {
			_, err := createSimConfig(args)
			Expect(err).To(HaveOccurred())
		},
		Entry(tests[3].name, tests[3].args),
		Entry(tests[4].name, tests[4].args),
		Entry(tests[5].name, tests[5].args),
		Entry(tests[6].name, tests[6].args),
		Entry(tests[7].name, tests[7].args),
	)
})
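
These Ginkgo specs still need a go test entry point to run; the package presumably defines one elsewhere, but a typical bootstrap (file, function, and suite names here are illustrative) looks like:

package llmdinferencesim

import (
	"testing"

	. "github.com/onsi/ginkgo/v2"
	. "github.com/onsi/gomega"
)

// TestConfig hands control to Ginkgo, which then runs every
// Describe/DescribeTable spec registered in the package.
func TestConfig(t *testing.T) {
	RegisterFailHandler(Fail)
	RunSpecs(t, "llm-d-inference-sim suite")
}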
