Commit d1978c3

[Misc] unify variable for LLM instance
Signed-off-by: Andy Xie <andy.xning@gmail.com>
1 parent e7e3e6d
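
The change is mechanical: every documentation snippet and offline-inference example that bound the engine to `model` now binds it to `llm`, which avoids confusing the handle with the `model=` constructor argument. A minimal sketch of the resulting convention (model name taken from the fp8 diff below; assumes a local vLLM install):

```python
from vllm import LLM

# The engine handle is consistently named "llm", so it no longer reads
# like the "model" keyword argument that names the checkpoint to load.
llm = LLM(model="facebook/opt-125m")
outputs = llm.generate("Hello, my name is")
print(outputs[0].outputs[0].text)
```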

19 files changed, +94 -90 lines changed

docs/configuration/model_resolution.md

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ For example:
 ```python
 from vllm import LLM

-model = LLM(
+llm = LLM(
     model="cerebras/Cerebras-GPT-1.3B",
     hf_overrides={"architectures": ["GPT2LMHeadModel"]},  # GPT-2
 )

docs/features/lora.md

Lines changed: 1 addition & 1 deletion
@@ -302,7 +302,7 @@ To this end, we allow registration of default multimodal LoRAs to handle this au
     return tokenizer.apply_chat_template(chat, tokenize=False)


-model = LLM(
+llm = LLM(
     model=model_id,
     enable_lora=True,
     max_lora_rank=64,
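
The hunk ends before the example actually applies an adapter. For orientation, a hedged sketch of how a LoRA adapter is typically attached per request with the renamed handle; `"base-model-id"` and `"path/to/adapter"` are placeholders, not from this commit:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Placeholders: substitute a real base model and a local adapter path.
llm = LLM(model="base-model-id", enable_lora=True, max_lora_rank=64)

# LoRARequest takes a human-readable adapter name, a unique integer id,
# and the adapter path.
outputs = llm.generate(
    "Hello, my name is",
    SamplingParams(max_tokens=32),
    lora_request=LoRARequest("my_adapter", 1, "path/to/adapter"),
)
print(outputs[0].outputs[0].text)
```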

docs/features/quantization/fp8.md

Lines changed: 6 additions & 4 deletions
@@ -86,8 +86,9 @@ Load and run the model in `vllm`:

 ```python
 from vllm import LLM
-model = LLM("./Meta-Llama-3-8B-Instruct-FP8-Dynamic")
-result = model.generate("Hello my name is")
+
+llm = LLM("./Meta-Llama-3-8B-Instruct-FP8-Dynamic")
+result = llm.generate("Hello my name is")
 print(result[0].outputs[0].text)
 ```

@@ -125,9 +126,10 @@ In this mode, all Linear modules (except for the final `lm_head`) have their wei

 ```python
 from vllm import LLM
-model = LLM("facebook/opt-125m", quantization="fp8")
+
+llm = LLM("facebook/opt-125m", quantization="fp8")
 # INFO 06-10 17:55:42 model_runner.py:157] Loading model weights took 0.1550 GB
-result = model.generate("Hello, my name is")
+result = llm.generate("Hello, my name is")
 print(result[0].outputs[0].text)
 ```
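
Both renamed snippets extend naturally to explicit sampling controls. A brief sketch under the same setup (the sampling settings are illustrative; the commit itself only renames the variable):

```python
from vllm import LLM, SamplingParams

llm = LLM("facebook/opt-125m", quantization="fp8")

# Illustrative sampling parameters, not part of this commit's diff.
params = SamplingParams(temperature=0.8, max_tokens=32)
result = llm.generate("Hello, my name is", params)
print(result[0].outputs[0].text)
```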

docs/features/quantization/int4.md

Lines changed: 2 additions & 1 deletion
@@ -108,7 +108,8 @@ After quantization, you can load and run the model in vLLM:

 ```python
 from vllm import LLM
-model = LLM("./Meta-Llama-3-8B-Instruct-W4A16-G128")
+
+llm = LLM("./Meta-Llama-3-8B-Instruct-W4A16-G128")
 ```

 To evaluate accuracy, you can use `lm_eval`:

docs/features/quantization/int8.md

Lines changed: 2 additions & 1 deletion
@@ -114,7 +114,8 @@ After quantization, you can load and run the model in vLLM:

 ```python
 from vllm import LLM
-model = LLM("./Meta-Llama-3-8B-Instruct-W8A8-Dynamic-Per-Token")
+
+llm = LLM("./Meta-Llama-3-8B-Instruct-W8A8-Dynamic-Per-Token")
 ```

 To evaluate accuracy, you can use `lm_eval`:

docs/models/pooling_models.md

Lines changed: 5 additions & 5 deletions
@@ -149,11 +149,11 @@ You can change the output dimensions of embedding models that support Matryoshka
 ```python
 from vllm import LLM, PoolingParams

-model = LLM(model="jinaai/jina-embeddings-v3",
-            task="embed",
-            trust_remote_code=True)
-outputs = model.embed(["Follow the white rabbit."],
-                      pooling_params=PoolingParams(dimensions=32))
+llm = LLM(model="jinaai/jina-embeddings-v3",
+          task="embed",
+          trust_remote_code=True)
+outputs = llm.embed(["Follow the white rabbit."],
+                    pooling_params=PoolingParams(dimensions=32))
 print(outputs[0].outputs)
 ```
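
Since the example requests 32-dimensional Matryoshka embeddings, a quick sanity check of the result might look like this (a sketch, assuming the `.outputs.embedding` attribute of vLLM's `EmbeddingRequestOutput`; not part of this diff):

```python
# .outputs.embedding is a list of floats whose length should equal the
# requested Matryoshka dimension.
embedding = outputs[0].outputs.embedding
assert len(embedding) == 32
```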

examples/offline_inference/basic/classify.py

Lines changed: 2 additions & 2 deletions
@@ -28,10 +28,10 @@ def main(args: Namespace):

     # Create an LLM.
     # You should pass task="classify" for classification models
-    model = LLM(**vars(args))
+    llm = LLM(**vars(args))

     # Generate logits. The output is a list of ClassificationRequestOutputs.
-    outputs = model.classify(prompts)
+    outputs = llm.classify(prompts)

     # Print the outputs.
     print("\nGenerated Outputs:\n" + "-" * 60)
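
For orientation, a sketch of how these classification outputs are typically consumed downstream of the rename (assuming the `.outputs.probs` attribute of vLLM's `ClassificationRequestOutput`; not part of this diff):

```python
# Each ClassificationRequestOutput carries per-class probabilities.
for prompt, output in zip(prompts, outputs):
    probs = output.outputs.probs
    print(f"{prompt!r}: {len(probs)} classes, top probability {max(probs):.4f}")
```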

examples/offline_inference/basic/embed.py

Lines changed: 2 additions & 2 deletions
@@ -31,10 +31,10 @@ def main(args: Namespace):

     # Create an LLM.
     # You should pass task="embed" for embedding models
-    model = LLM(**vars(args))
+    llm = LLM(**vars(args))

     # Generate embedding. The output is a list of EmbeddingRequestOutputs.
-    outputs = model.embed(prompts)
+    outputs = llm.embed(prompts)

     # Print the outputs.
     print("\nGenerated Outputs:\n" + "-" * 60)
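
Analogously, the embedding outputs are typically read like this (assuming the `.outputs.embedding` attribute of vLLM's `EmbeddingRequestOutput`; a sketch, not part of this diff):

```python
# Each EmbeddingRequestOutput carries the embedding vector for its prompt.
for prompt, output in zip(prompts, outputs):
    embedding = output.outputs.embedding
    print(f"{prompt!r}: embedding of dimension {len(embedding)}")
```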

examples/offline_inference/basic/score.py

Lines changed: 2 additions & 2 deletions
@@ -27,10 +27,10 @@ def main(args: Namespace):

     # Create an LLM.
     # You should pass task="score" for cross-encoder models
-    model = LLM(**vars(args))
+    llm = LLM(**vars(args))

     # Generate scores. The output is a list of ScoringRequestOutputs.
-    outputs = model.score(text_1, texts_2)
+    outputs = llm.score(text_1, texts_2)

     # Print the outputs.
     print("\nGenerated Outputs:\n" + "-" * 60)
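
And the cross-encoder scores are typically read like this (assuming the `.outputs.score` attribute of vLLM's `ScoringRequestOutput`; a sketch, not part of this diff):

```python
# Each ScoringRequestOutput carries the relevance score of (text_1, text)
# for the corresponding text in texts_2.
for text, output in zip(texts_2, outputs):
    print(f"{text!r}: score {output.outputs.score:.4f}")
```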

examples/offline_inference/embed_jina_embeddings_v3.py

Lines changed: 2 additions & 2 deletions
@@ -30,11 +30,11 @@ def main(args: Namespace):

     # Create an LLM.
     # You should pass task="embed" for embedding models
-    model = LLM(**vars(args))
+    llm = LLM(**vars(args))

     # Generate embedding. The output is a list of EmbeddingRequestOutputs.
     # Only text matching task is supported for now. See #16120
-    outputs = model.embed(prompts)
+    outputs = llm.embed(prompts)

     # Print the outputs.
     print("\nGenerated Outputs:")
