Commit 2f56e0c

Enhance Readme and CLI Configuration Options (#39)
## Summary

Reconfigures tokenizer support for the GuideLLM backend and request generators, enhancing the flexibility and reconfigurability of benchmark requests. It also updates CLI commands to include mandatory arguments for better user guidance and integrates a more comprehensive set of configuration options for data handling and request rate types.

## Details

- **Tokenizer Support**: Added methods to instantiate and utilize tokenizers in backend classes and request generators, ensuring compatibility with various model configurations.
- **CLI Enhancements**:
  - Updated CLI commands to require `--data` and `--data-type` arguments, improving clarity for users and preventing misconfigurations.
  - Refined help messages for all CLI options to provide more detailed guidance.
- **Configuration Options**:
  - Introduced new options for specifying the `--tokenizer` and additional request rates in `--rate`.
  - Added functionality for testing backend connections using tokenizers.
  - Improved error handling when required options or compatible models are not available.
- **Documentation**: Updated `README.md` and added detailed instructions for using the new configuration options.
- **Tests**:
  - Expanded unit tests to cover new methods and configurations.
  - Ensured backward compatibility by validating default behaviors with updated test cases.

## Fixes

- Resolves #37 with CLI pathways that default to the model if a tokenizer is not supplied
- Resolves #36 with further documentation in the README and in the help output text for the CLI
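The tokenizer fallback described in the fix for #37 can be sketched as follows. This is a hypothetical helper for illustration only (`resolve_tokenizer` is not a name from the commit); the real logic lives in the backend classes and CLI pathways:

```python
from typing import Optional

def resolve_tokenizer(tokenizer: Optional[str], model: str) -> str:
    # Hypothetical illustration of the fallback behavior described above:
    # when no --tokenizer is supplied on the CLI, the --model identifier
    # is used to load the tokenizer instead.
    return tokenizer or model

print(resolve_tokenizer(None, "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16"))
```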
1 parent 912ec77 commit 2f56e0c

File tree

15 files changed: +344 −74 lines changed


README.md

Lines changed: 33 additions & 9 deletions
````diff
@@ -9,7 +9,7 @@
   Scale Efficiently: Evaluate and Optimize Your LLM Deployments for Real-World Inference Needs
 </h3>
 
-[![GitHub Release](https://img.shields.io/github/release/neuralmagic/guidellm.svg?label=Version)](https://github.com/neuralmagic/guidellm/releases) [![Documentation](https://img.shields.io/badge/Documentation-8A2BE2?logo=read-the-docs&logoColor=%23ffffff&color=%231BC070)](https://github.com/neuralmagic/guidellm/tree/main/docs) [![License](https://img.shields.io/github/license/neuralmagic/guidellm.svg)](https://github.com/neuralmagic/guidellm/blob/main/LICENSE) [![PyPi Release](https://img.shields.io/pypi/v/guidellm.svg?label=PyPi%20Release)](https://pypi.python.org/pypi/guidellm) [![Pypi Release](https://img.shields.io/pypi/v/guidellm-nightly.svg?label=PyPi%20Nightly)](https://pypi.python.org/pypi/guidellm-nightly) [![Python Versions](https://img.shields.io/pypi/pyversions/guidellm.svg?label=Python)](https://pypi.python.org/pypi/guidellm) [![Nightly Build](https://img.shields.io/github/actions/workflow/status/neuralmagic/guidellm/nightly.yml?branch=main&label=Nightly%20Build)](https://github.com/neuralmagic/guidellm/actions/workflows/nightly.yml)
+[![GitHub Release](https://img.shields.io/github/release/neuralmagic/guidellm.svg?label=Version)](https://github.com/neuralmagic/guidellm/releases) [![Documentation](https://img.shields.io/badge/Documentation-8A2BE2?logo=read-the-docs&logoColor=%23ffffff&color=%231BC070)](https://github.com/neuralmagic/guidellm/tree/main/docs) [![License](https://img.shields.io/github/license/neuralmagic/guidellm.svg)](https://github.com/neuralmagic/guidellm/blob/main/LICENSE) [![PyPI Release](https://img.shields.io/pypi/v/guidellm.svg?label=PyPI%20Release)](https://pypi.python.org/pypi/guidellm) [![Pypi Release](https://img.shields.io/pypi/v/guidellm-nightly.svg?label=PyPI%20Nightly)](https://pypi.python.org/pypi/guidellm-nightly) [![Python Versions](https://img.shields.io/pypi/pyversions/guidellm.svg?label=Python)](https://pypi.python.org/pypi/guidellm) [![Nightly Build](https://img.shields.io/github/actions/workflow/status/neuralmagic/guidellm/nightly.yml?branch=main&label=Nightly%20Build)](https://github.com/neuralmagic/guidellm/actions/workflows/nightly.yml)
 
 ## Overview
 
@@ -65,10 +65,12 @@ To run a GuideLLM evaluation, use the `guidellm` command with the appropriate mo
 ```bash
 guidellm \
   --target "http://localhost:8000/v1" \
-  --model "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16"
+  --model "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16" \
+  --data-type emulated \
+  --data "prompt_tokens=512,generated_tokens=128"
 ```
 
-The above command will begin the evaluation and output progress updates similar to the following: <img src="https://github.com/neuralmagic/guidellm/blob/main/docs/assets/sample-benchmark.gif" />
+The above command will begin the evaluation and output progress updates similar to the following (if running on a different server, be sure to update the target!): <img src="https://github.com/neuralmagic/guidellm/blob/main/docs/assets/sample-benchmark.gif" />
 
 Notes:
 
@@ -88,17 +90,39 @@ The end of the output will include important performance summary metrics such as
 
 <img alt="Sample GuideLLM benchmark end output" src="https://github.com/neuralmagic/guidellm/blob/main/docs/assets/sample-output-end.png" />
 
-### Advanced Settings
+### Configurations
 
-GuideLLM provides various options to customize evaluations, including setting the duration of each benchmark run, the number of concurrent requests, and the request rate. For a complete list of options and advanced settings, see the [GuideLLM CLI Documentation](https://github.com/neuralmagic/guidellm/blob/main/docs/guides/cli.md).
+GuideLLM provides various CLI and environment options to customize evaluations, including setting the duration of each benchmark run, the number of concurrent requests, and the request rate.
 
-Some common advanced settings include:
+Some common configurations for the CLI include:
 
-- `--rate-type`: The rate to use for benchmarking. Options include `sweep` (shown above), `synchronous` (one request at a time), `throughput` (all requests at once), `constant` (a constant rate defined by `--rate`), and `poisson` (a poisson distribution rate defined by `--rate`).
-- `--data-type`: The data to use for the benchmark. Options include `emulated` (default shown above, emulated to match a given prompt and output length), `transformers` (a transformers dataset), and `file` (a {text, json, jsonl, csv} file with a list of prompts).
+- `--rate-type`: The rate to use for benchmarking. Options include `sweep`, `synchronous`, `throughput`, `constant`, and `poisson`.
+  - `--rate-type sweep`: (default) Sweep runs through the full range of performance for the server. Starting with a `synchronous` rate first, then `throughput`, and finally 10 `constant` rates between the min and max request rate found.
+  - `--rate-type synchronous`: Synchronous runs requests in a synchronous manner, one after the other.
+  - `--rate-type throughput`: Throughput runs requests in a throughput manner, sending requests as fast as possible.
+  - `--rate-type constant`: Constant runs requests at a constant rate. Specify the rate in requests per second with the `--rate` argument. For example, `--rate 10` or multiple rates with `--rate 10 --rate 20 --rate 30`.
+  - `--rate-type poisson`: Poisson draws from a poisson distribution with the mean at the specified rate, adding some real-world variance to the runs. Specify the rate in requests per second with the `--rate` argument. For example, `--rate 10` or multiple rates with `--rate 10 --rate 20 --rate 30`.
+- `--data-type`: The data to use for the benchmark. Options include `emulated`, `transformers`, and `file`.
+  - `--data-type emulated`: Emulated supports an EmulationConfig in string or file format for the `--data` argument to generate fake data. Specify the number of prompt tokens at a minimum and optionally the number of output tokens and other params for variance in the length. For example, `--data "prompt_tokens=128"`, `--data "prompt_tokens=128,generated_tokens=128"`, or `--data "prompt_tokens=128,prompt_tokens_variance=10"`.
+  - `--data-type file`: File supports a file path or URL to a file for the `--data` argument. The file should contain data encoded as a CSV, JSONL, TXT, or JSON/YAML file with a single prompt per line for CSV, JSONL, and TXT or a list of prompts for JSON/YAML. For example, `--data "data.txt"` where data.txt contents are `"prompt1\nprompt2\nprompt3"`.
+  - `--data-type transformers`: Transformers supports a dataset name or dataset file path for the `--data` argument. For example, `--data "neuralmagic/LLM_compression_calibration"`.
 - `--max-seconds`: The maximum number of seconds to run each benchmark. The default is 120 seconds.
 - `--max-requests`: The maximum number of requests to run in each benchmark.
 
+For a full list of supported CLI arguments, run the following command:
+
+```bash
+guidellm --help
+```
+
+For a full list of configuration options, run the following command:
+
+```bash
+guidellm-config
+```
+
+For further information, see the [GuideLLM Documentation](#Documentation).
+
 ## Resources
 
 ### Documentation
@@ -109,7 +133,7 @@ Our comprehensive documentation provides detailed guides and resources to help y
 
 - [**Installation Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/install.md) - Step-by-step instructions to install GuideLLM, including prerequisites and setup tips.
 - [**Architecture Overview**](https://github.com/neuralmagic/guidellm/tree/main/docs/architecture.md) - A detailed look at GuideLLM's design, components, and how they interact.
-- [**CLI Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/guides/cli_usage.md) - Comprehensive usage information for running GuideLLM via the command line, including available commands and options.
+- [**CLI Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/guides/cli.md) - Comprehensive usage information for running GuideLLM via the command line, including available commands and options.
 - [**Configuration Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/guides/configuration.md) - Instructions on configuring GuideLLM to suit various deployment needs and performance goals.
 
 ### Supporting External Documentation
````
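The emulated `--data` value added to the README quickstart is a comma-separated list of `key=value` pairs. A minimal sketch of parsing that string format (a hypothetical helper for illustration; GuideLLM's actual EmulationConfig parsing also accepts files and additional options):

```python
def parse_emulated_data(data: str) -> dict:
    # Split a string like "prompt_tokens=512,generated_tokens=128" into
    # {"prompt_tokens": 512, "generated_tokens": 128}. Illustration only;
    # not the project's real parser.
    config = {}
    for pair in data.split(","):
        key, _, value = pair.partition("=")
        config[key.strip()] = int(value)
    return config

print(parse_emulated_data("prompt_tokens=512,generated_tokens=128"))
# → {'prompt_tokens': 512, 'generated_tokens': 128}
```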

docs/assets/sample-benchmark.gif — removed (−6.01 MB, binary file not shown)

docs/assets/sample-benchmarks.gif — added (2.15 MB)

pyproject.toml

Lines changed: 1 addition & 0 deletions
````diff
@@ -75,6 +75,7 @@ dev = [
 
 [project.entry-points.console_scripts]
 guidellm = "guidellm.main:generate_benchmark_report_cli"
+guidellm-config = "guidellm.config:print_config"
 
 
 # ************************************************
````

src/guidellm/backend/base.py

Lines changed: 58 additions & 0 deletions
````diff
@@ -1,9 +1,14 @@
+import asyncio
 import functools
 from abc import ABC, abstractmethod
 from typing import AsyncGenerator, Dict, List, Literal, Optional, Type, Union
 
 from loguru import logger
 from pydantic import BaseModel
+from transformers import (  # type: ignore  # noqa: PGH003
+    AutoTokenizer,
+    PreTrainedTokenizer,
+)
 
 from guidellm.core import TextGenerationRequest, TextGenerationResult
 
@@ -103,10 +108,21 @@ def create(cls, backend_type: BackendEngine, **kwargs) -> "Backend":
         return Backend._registry[backend_type](**kwargs)
 
     def __init__(self, type_: BackendEngine, target: str, model: str):
+        """
+        Base constructor for the Backend class.
+        Calls into test_connection to ensure the backend is reachable.
+        Ensure all setup is done in the subclass constructor before calling super.
+
+        :param type_: The type of the backend.
+        :param target: The target URL for the backend.
+        :param model: The model used by the backend.
+        """
         self._type = type_
         self._target = target
         self._model = model
 
+        self.test_connection()
+
     @property
     def default_model(self) -> str:
         """
@@ -148,6 +164,48 @@ def model(self) -> str:
         """
         return self._model
 
+    def model_tokenizer(self) -> PreTrainedTokenizer:
+        """
+        Get the tokenizer for the backend model.
+
+        :return: The tokenizer instance.
+        """
+        return AutoTokenizer.from_pretrained(self.model)
+
+    def test_connection(self) -> bool:
+        """
+        Test the connection to the backend by running a short text generation request.
+        If successful, returns True, otherwise raises an exception.
+
+        :return: True if the connection is successful.
+        :rtype: bool
+        :raises ValueError: If the connection test fails.
+        """
+        try:
+            asyncio.get_running_loop()
+            is_async = True
+        except RuntimeError:
+            is_async = False
+
+        if is_async:
+            logger.warning("Running in async mode, cannot test connection")
+            return True
+
+        try:
+            request = TextGenerationRequest(
+                prompt="Test connection", output_token_count=5
+            )
+
+            asyncio.run(self.submit(request))
+            return True
+        except Exception as err:
+            raise_err = RuntimeError(
+                f"Backend connection test failed for backend type={self.type_} "
+                f"with target={self.target} and model={self.model} with error: {err}"
+            )
+            logger.error(raise_err)
+            raise raise_err from err
+
     async def submit(self, request: TextGenerationRequest) -> TextGenerationResult:
         """
         Submit a text generation request and return the result.
````
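The `test_connection` method guards against being invoked from inside a running event loop, since `asyncio.run` cannot be nested. That detection pattern can be isolated into a standalone sketch (hypothetical helper name, not part of the Backend class):

```python
import asyncio

def run_if_no_loop(coro):
    # Mirror of the guard in test_connection: asyncio.get_running_loop()
    # raises RuntimeError when no loop is active, which is exactly the
    # case where asyncio.run() is safe to call.
    try:
        asyncio.get_running_loop()
    except RuntimeError:
        return asyncio.run(coro)
    # A loop is already running; skip rather than nest asyncio.run().
    coro.close()  # suppress the "coroutine was never awaited" warning
    return None

async def ping() -> bool:
    await asyncio.sleep(0)
    return True

print(run_if_no_loop(ping()))  # → True when called from synchronous code
```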

src/guidellm/config.py

Lines changed: 61 additions & 2 deletions
````diff
@@ -1,5 +1,6 @@
+import json
 from enum import Enum
-from typing import Dict, List, Optional
+from typing import Dict, List, Optional, Sequence
 
 from pydantic import BaseModel, Field, model_validator
 from pydantic_settings import BaseSettings, SettingsConfigDict
@@ -10,6 +11,7 @@
     "Environment",
     "LoggingSettings",
     "OpenAISettings",
+    "print_config",
     "ReportGenerationSettings",
     "Settings",
     "reload_settings",
@@ -70,7 +72,6 @@ class DatasetSettings(BaseModel):
     preferred_data_splits: List[str] = Field(
         default_factory=lambda: ["test", "tst", "validation", "val", "train"]
     )
-    default_tokenizer: str = "neuralmagic/Meta-Llama-3.1-8B-FP8"
 
 
 class EmulatedDataSettings(BaseModel):
@@ -163,6 +164,53 @@ def set_default_source(cls, values):
 
         return values
 
+    def generate_env_file(self) -> str:
+        """
+        Generate the .env file from the current settings
+        """
+        return Settings._recursive_generate_env(
+            self,
+            self.model_config["env_prefix"],  # type: ignore  # noqa: PGH003
+            self.model_config["env_nested_delimiter"],  # type: ignore  # noqa: PGH003
+        )
+
+    @staticmethod
+    def _recursive_generate_env(model: BaseModel, prefix: str, delimiter: str) -> str:
+        env_file = ""
+        add_models = []
+        for key, value in model.model_dump().items():
+            if isinstance(value, BaseModel):
+                # add nested properties to be processed after the current level
+                add_models.append((key, value))
+                continue
+
+            dict_values = (
+                {
+                    f"{prefix}{key.upper()}{delimiter}{sub_key.upper()}": sub_value
+                    for sub_key, sub_value in value.items()
+                }
+                if isinstance(value, dict)
+                else {f"{prefix}{key.upper()}": value}
+            )
+
+            for tag, sub_value in dict_values.items():
+                if isinstance(sub_value, Sequence) and not isinstance(sub_value, str):
+                    value_str = ",".join(f'"{item}"' for item in sub_value)
+                    env_file += f"{tag}=[{value_str}]\n"
+                elif isinstance(sub_value, Dict):
+                    value_str = json.dumps(sub_value)
+                    env_file += f"{tag}={value_str}\n"
+                elif not sub_value:
+                    env_file += f"{tag}=\n"
+                else:
+                    env_file += f'{tag}="{sub_value}"\n'
+
+        for key, value in add_models:
+            env_file += Settings._recursive_generate_env(
+                value, f"{prefix}{key.upper()}{delimiter}", delimiter
+            )
+        return env_file
+
 
 settings = Settings()
 
@@ -173,3 +221,14 @@ def reload_settings():
     """
     new_settings = Settings()
     settings.__dict__.update(new_settings.__dict__)
+
+
+def print_config():
+    """
+    Print the current configuration settings
+    """
+    print(f"Settings: \n{settings.generate_env_file()}")  # noqa: T201
+
+
+if __name__ == "__main__":
+    print_config()
````
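The `_recursive_generate_env` helper flattens nested settings into `PREFIX__SECTION__KEY=value` lines for a `.env` file. The same idea on plain dicts, as a simplified sketch (the prefix and delimiter values are assumed for illustration; the original operates on pydantic models):

```python
import json

def flatten_env(settings: dict, prefix: str = "GUIDELLM__", delimiter: str = "__") -> str:
    # Recursively emit one .env line per leaf value, uppercasing keys and
    # joining nesting levels with the delimiter, as the commit's helper does.
    lines = ""
    for key, value in settings.items():
        tag = f"{prefix}{key.upper()}"
        if isinstance(value, dict):
            lines += flatten_env(value, f"{tag}{delimiter}", delimiter)
        elif isinstance(value, (list, tuple)):
            lines += f"{tag}=[{','.join(json.dumps(v) for v in value)}]\n"
        elif not value:
            lines += f"{tag}=\n"
        else:
            lines += f'{tag}="{value}"\n'
    return lines

print(flatten_env({"dataset": {"preferred_data_splits": ["test", "val"]}}))
# → GUIDELLM__DATASET__PREFERRED_DATA_SPLITS=["test","val"]
```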

src/guidellm/core/request.py

Lines changed: 14 additions & 0 deletions
````diff
@@ -28,3 +28,17 @@ class TextGenerationRequest(Serializable):
         default_factory=dict,
         description="The parameters for the text generation request.",
     )
+
+    def __str__(self) -> str:
+        prompt_short = (
+            self.prompt[:32] + "..."
+            if self.prompt and len(self.prompt) > 32  # noqa: PLR2004
+            else self.prompt
+        )
+
+        return (
+            f"TextGenerationRequest(id={self.id}, "
+            f"prompt={prompt_short}, prompt_token_count={self.prompt_token_count}, "
+            f"output_token_count={self.output_token_count}, "
+            f"params={self.params})"
+        )
````
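The truncation logic in the new `__str__` keeps log lines readable for long prompts. Extracted as a standalone sketch (hypothetical helper name, not a function from the commit):

```python
from typing import Optional

def shorten_prompt(prompt: Optional[str], limit: int = 32) -> Optional[str]:
    # Same rule as TextGenerationRequest.__str__: prompts longer than the
    # limit are cut and suffixed with "..."; short or missing prompts pass through.
    if prompt and len(prompt) > limit:
        return prompt[:limit] + "..."
    return prompt

print(shorten_prompt("a" * 40))  # → 32 a's followed by "..."
print(shorten_prompt("short"))   # → short
```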
