Skip to content

Commit 328c0f5

Browse files
ADD TGI docs (#43)
Adds simple similar instructions for using TGI to benchmark. P.S. Great tool! Co-authored-by: Eldar Kurtic <eldarkurtic314@gmail.com>
1 parent fd04739 commit 328c0f5

File tree

1 file changed

+16
-2
lines changed

1 file changed

+16
-2
lines changed

README.md

Lines changed: 16 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -48,15 +48,29 @@ For detailed installation instructions and requirements, see the [Installation G
4848

4949
### Quick Start
5050

51-
#### 1. Start an OpenAI Compatible Server (vLLM)
51+
#### 1a. Start an OpenAI Compatible Server (vLLM)
5252

5353
GuideLLM requires an OpenAI-compatible server to run evaluations. [vLLM](https://github.com/vllm-project/vllm) is recommended for this purpose. To start a vLLM server with a Llama 3.1 8B quantized model, run the following command:
5454

5555
```bash
5656
vllm serve "neuralmagic/Meta-Llama-3.1-8B-Instruct-quantized.w4a16"
5757
```
5858

59-
For more information on starting a vLLM server, see the [vLLM Documentation](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html).
59+
#### 1b. Start an OpenAI Compatible Server (Hugging Face TGI)
60+
61+
GuideLLM requires an OpenAI-compatible server to run evaluations. [Text Generation Inference](https://github.com/huggingface/text-generation-inference) can be used here. To start a TGI server with a Llama 3.1 8B using docker, run the following command:
62+
63+
```bash
64+
docker run --gpus 1 -ti --shm-size 1g --ipc=host --rm -p 8080:80 \
65+
-e MODEL_ID=https://huggingface.co/llhf/Meta-Llama-3.1-8B-Instruct \
66+
-e NUM_SHARD=1 \
67+
-e MAX_INPUT_TOKENS=4096 \
68+
-e MAX_TOTAL_TOKENS=6000 \
69+
-e HF_TOKEN=$(cat ~/.cache/huggingface/token) \
70+
ghcr.io/huggingface/text-generation-inference:2.2.0
71+
```
72+
73+
For more information on starting a TGI server, see the [TGI Documentation](https://huggingface.co/docs/text-generation-inference/index).
6074

6175
#### 2. Run a GuideLLM Evaluation
6276

0 commit comments

Comments
 (0)