Enhance Readme and CLI Configuration Options (#39)
## Summary
Reconfigures tokenizer support for the GuideLLM backend and request
generators, improving the flexibility and configurability of benchmark
requests. It also updates CLI commands with mandatory arguments
for better user guidance and integrates a more comprehensive set of
configuration options for data handling and request rate types.
## Details
- **Tokenizer Support**: Added methods to instantiate and utilize
tokenizers in backend classes and request generators, ensuring
compatibility with various model configurations.
- **CLI Enhancements**:
- Updated CLI commands to require the `--data` and `--data-type` arguments,
improving clarity for users and preventing misconfigurations.
- Refined help messages for all CLI options to provide more detailed
guidance.
- **Configuration Options**:
- Introduced new options for specifying the `--tokenizer` and additional
request rates via `--rate`.
- Added functionality for testing backend connections using tokenizers.
- Improved error handling when required options or compatible models are
not available.
- **Documentation**: Updated `README.md` and added detailed instructions
for using the new configuration options.
- **Tests**:
- Expanded unit tests to cover new methods and configurations.
- Ensured backward compatibility by validating default behaviors with
updated test cases.
## Fixes
- Resolves #37 with CLI pathways that default the tokenizer to the model
when one is not supplied
- Resolves #36 with further documentation in the README and in the CLI
help output text
The above command will begin the evaluation and output progress updates similar to the following (if running on a different server, be sure to update the target!):

<img src="https://github.com/neuralmagic/guidellm/blob/main/docs/assets/sample-benchmark.gif" />

Notes:
The end of the output will include important performance summary metrics such as:

<img alt="Sample GuideLLM benchmark end output" src="https://github.com/neuralmagic/guidellm/blob/main/docs/assets/sample-output-end.png" />
### Configurations
GuideLLM provides various CLI and environment options to customize evaluations, including setting the duration of each benchmark run, the number of concurrent requests, and the request rate.
Some common configurations for the CLI include:
- `--rate-type`: The rate to use for benchmarking. Options include `sweep`, `synchronous`, `throughput`, `constant`, and `poisson`.
- `--rate-type sweep`: (default) Sweep runs through the full range of performance for the server, starting with a `synchronous` rate, then `throughput`, and finally 10 `constant` rates between the minimum and maximum request rates found.
- `--rate-type synchronous`: Synchronous runs requests one at a time, sending each request only after the previous one completes.
- `--rate-type throughput`: Throughput sends requests as fast as possible to measure the server's maximum throughput.
- `--rate-type constant`: Constant runs requests at a constant rate. Specify the rate in requests per second with the `--rate` argument, for example, `--rate 10`, or multiple rates with `--rate 10 --rate 20 --rate 30`.
- `--rate-type poisson`: Poisson draws request times from a Poisson distribution with the mean at the specified rate, adding some real-world variance to the runs. Specify the rate in requests per second with the `--rate` argument, for example, `--rate 10`, or multiple rates with `--rate 10 --rate 20 --rate 30`.
- `--data-type`: The data to use for the benchmark. Options include `emulated`, `transformers`, and `file`.
- `--data-type emulated`: Emulated accepts an EmulationConfig in string or file format for the `--data` argument to generate fake data. Specify the number of prompt tokens at a minimum, and optionally the number of output tokens and other parameters for variance in length. For example, `--data "prompt_tokens=128"`, `--data "prompt_tokens=128,generated_tokens=128"`, or `--data "prompt_tokens=128,prompt_tokens_variance=10"`.
- `--data-type file`: File accepts a file path or URL for the `--data` argument. The file should be a CSV, JSONL, or TXT file with a single prompt per line, or a JSON/YAML file containing a list of prompts. For example, `--data "data.txt"`, where the contents of data.txt are `"prompt1\nprompt2\nprompt3"`.
- `--data-type transformers`: Transformers accepts a dataset name or dataset file path for the `--data` argument. For example, `--data "neuralmagic/LLM_compression_calibration"`.
- `--max-seconds`: The maximum number of seconds to run each benchmark. The default is 120 seconds.
- `--max-requests`: The maximum number of requests to run in each benchmark.
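As a concrete sketch of the `file` data type described above, a plain text data file can be created with one prompt per line (the filename `prompts.txt` is an arbitrary choice for illustration):

```shell
# Create a TXT data file with one prompt per line; it could then be passed
# to a benchmark via `--data-type file --data prompts.txt` (sketch only).
printf 'prompt1\nprompt2\nprompt3\n' > prompts.txt

# Confirm the file holds three prompts, one per line.
wc -l < prompts.txt
```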
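Putting these options together, a hypothetical constant-rate run might look like the following. This is a sketch, not a verbatim example from the project: the server URL and model name are placeholders, and the `--target` and `--model` flag names are assumptions based on the quick-start description above (run `guidellm --help` to confirm the exact flags):

```shell
# Hypothetical sketch: benchmark an OpenAI-compatible server at three constant
# request rates using emulated data. The --target URL and --model name below
# are placeholders; adjust them for your deployment.
guidellm \
  --target "http://localhost:8000/v1" \
  --model "your-model-name" \
  --data-type emulated \
  --data "prompt_tokens=128,generated_tokens=128" \
  --rate-type constant \
  --rate 10 --rate 20 --rate 30 \
  --max-seconds 120
```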
For a full list of supported CLI arguments, run the following command:
```bash
guidellm --help
```
For a full list of configuration options, run the following command:
```bash
guidellm-config
```
For further information, see the [GuideLLM Documentation](#Documentation).
## Resources
### Documentation
Our comprehensive documentation provides detailed guides and resources to help you:
- [**Installation Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/install.md) - Step-by-step instructions to install GuideLLM, including prerequisites and setup tips.
- [**Architecture Overview**](https://github.com/neuralmagic/guidellm/tree/main/docs/architecture.md) - A detailed look at GuideLLM's design, components, and how they interact.
- [**CLI Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/guides/cli.md) - Comprehensive usage information for running GuideLLM via the command line, including available commands and options.
- [**Configuration Guide**](https://github.com/neuralmagic/guidellm/tree/main/docs/guides/configuration.md) - Instructions on configuring GuideLLM to suit various deployment needs and performance goals.