
Commit 29a38f0

[Doc] Support "important" and "announcement" admonitions (#19479)
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
1 parent a5115f4 commit 29a38f0

12 files changed (+61, -23 lines)


docs/contributing/README.md

Lines changed: 1 addition & 1 deletion
@@ -130,7 +130,7 @@ pytest -s -v tests/test_logger.py
 
 If you encounter a bug or have a feature request, please [search existing issues](https://github.com/vllm-project/vllm/issues?q=is%3Aissue) first to see if it has already been reported. If not, please [file a new issue](https://github.com/vllm-project/vllm/issues/new/choose), providing as much relevant information as possible.

-!!! warning
+!!! important
     If you discover a security vulnerability, please follow the instructions [here](gh-file:SECURITY.md#reporting-a-vulnerability).

 ## Pull Requests & Code Reviews

docs/contributing/model/multimodal.md

Lines changed: 4 additions & 4 deletions
@@ -48,8 +48,8 @@ Further update the model as follows:
         return vision_embeddings
     ```

-!!! warning
-    The returned `multimodal_embeddings` must be either a **3D [torch.Tensor][]** of shape `(num_items, feature_size, hidden_size)`, or a **list / tuple of 2D [torch.Tensor][]'s** of shape `(feature_size, hidden_size)`, so that `multimodal_embeddings[i]` retrieves the embeddings generated from the `i`-th multimodal data item (e.g, image) of the request.
+!!! important
+    The returned `multimodal_embeddings` must be either a **3D [torch.Tensor][]** of shape `(num_items, feature_size, hidden_size)`, or a **list / tuple of 2D [torch.Tensor][]'s** of shape `(feature_size, hidden_size)`, so that `multimodal_embeddings[i]` retrieves the embeddings generated from the `i`-th multimodal data item (e.g, image) of the request.

 - Implement [get_input_embeddings][vllm.model_executor.models.interfaces.SupportsMultiModal.get_input_embeddings] to merge `multimodal_embeddings` with text embeddings from the `input_ids`. If input processing for the model is implemented correctly (see sections below), then you can leverage the utility function we provide to easily merge the embeddings.
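As a point of reference for the shape contract above, here is a minimal sketch, assuming a hypothetical model whose encoder yields one feature tensor per multimodal item; the function name and sizes are placeholders, not code from this commit:

```python
# Sketch only: return a list of 2D tensors, one (feature_size, hidden_size)
# tensor per multimodal item, which satisfies the contract described above.
import torch


def get_multimodal_embeddings_sketch(
    per_item_features: list[torch.Tensor],  # hypothetical encoder outputs
    hidden_size: int = 4096,                # placeholder hidden size
) -> list[torch.Tensor]:
    embeddings = []
    for feats in per_item_features:
        feature_size = feats.shape[0]
        # In a real model this would be a learned projection of `feats`.
        embeddings.append(torch.zeros(feature_size, hidden_size))
    return embeddings
```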

@@ -100,8 +100,8 @@ Further update the model as follows:
     ```

 !!! note
-    The model class does not have to be named `*ForCausalLM`.
-    Check out [the HuggingFace Transformers documentation](https://huggingface.co/docs/transformers/model_doc/auto#multimodal) for some examples.
+    The model class does not have to be named `*ForCausalLM`.
+    Check out [the HuggingFace Transformers documentation](https://huggingface.co/docs/transformers/model_doc/auto#multimodal) for some examples.

 ## 2. Specify processing information

docs/contributing/model/registration.md

Lines changed: 2 additions & 2 deletions
@@ -18,7 +18,7 @@ After you have implemented your model (see [tutorial][new-model-basic]), put it
 Then, add your model class to `_VLLM_MODELS` in <gh-file:vllm/model_executor/models/registry.py> so that it is automatically registered upon importing vLLM.
 Finally, update our [list of supported models][supported-models] to promote your model!

-!!! warning
+!!! important
     The list of models in each section should be maintained in alphabetical order.

 ## Out-of-tree models
@@ -49,6 +49,6 @@ def register():
     )
 ```

-!!! warning
+!!! important
     If your model is a multimodal model, ensure the model class implements the [SupportsMultiModal][vllm.model_executor.models.interfaces.SupportsMultiModal] interface.
     Read more about that [here][supports-multimodal].
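A minimal sketch of the out-of-tree `register()` entry point this hunk refers to, assuming vLLM's public `ModelRegistry` API; the model class and its import path are hypothetical placeholders, not part of this commit:

```python
# Sketch only: register a custom architecture with vLLM from a plugin.
from vllm import ModelRegistry


def register():
    # Hypothetical out-of-tree model class; replace with your own.
    from your_package.your_model import YourModelForCausalLM

    ModelRegistry.register_model("YourModelForCausalLM", YourModelForCausalLM)
```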

docs/contributing/model/tests.md

Lines changed: 1 addition & 1 deletion
@@ -15,7 +15,7 @@ Without them, the CI for your PR will fail.
 Include an example HuggingFace repository for your model in <gh-file:tests/models/registry.py>.
 This enables a unit test that loads dummy weights to ensure that the model can be initialized in vLLM.

-!!! warning
+!!! important
     The list of models in each section should be maintained in alphabetical order.

 !!! tip

docs/design/multiprocessing.md

Lines changed: 1 addition & 1 deletion
@@ -7,7 +7,7 @@ page for information on known issues and how to solve them.
 
 ## Introduction

-!!! warning
+!!! important
     The source code references are to the state of the code at the time of writing in December, 2024.

 The use of Python multiprocessing in vLLM is complicated by:

docs/features/multimodal_inputs.md

Lines changed: 1 addition & 1 deletion
@@ -211,7 +211,7 @@ for o in outputs:
 
 Our OpenAI-compatible server accepts multi-modal data via the [Chat Completions API](https://platform.openai.com/docs/api-reference/chat).

-!!! warning
+!!! important
     A chat template is **required** to use Chat Completions API.
     For HF format models, the default chat template is defined inside `chat_template.json` or `tokenizer_config.json`.
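For context on the Chat Completions requirement above, a client-side sketch using the `openai` Python package; the server URL and model name are placeholders, and the snippet is not part of this commit:

```python
# Sketch only: send an image URL to a vLLM OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="Qwen/Qwen2.5-VL-3B-Instruct",  # placeholder multimodal model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/cat.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```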

docs/getting_started/quickstart.md

Lines changed: 3 additions & 2 deletions
@@ -61,7 +61,8 @@ from vllm import LLM, SamplingParams
 ```

 The next section defines a list of input prompts and sampling parameters for text generation. The [sampling temperature](https://arxiv.org/html/2402.05201v1) is set to `0.8` and the [nucleus sampling probability](https://en.wikipedia.org/wiki/Top-p_sampling) is set to `0.95`. You can find more information about the sampling parameters [here][sampling-params].
-!!! warning
+
+!!! important
     By default, vLLM will use sampling parameters recommended by model creator by applying the `generation_config.json` from the Hugging Face model repository if it exists. In most cases, this will provide you with the best results by default if [SamplingParams][vllm.SamplingParams] is not specified.

     However, if vLLM's default sampling parameters are preferred, please set `generation_config="vllm"` when creating the [LLM][vllm.LLM] instance.
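A short sketch of the behavior described in this hunk, reusing the quickstart's `Qwen/Qwen2.5-1.5B-Instruct` model as a placeholder; not part of the commit:

```python
# Sketch only: explicit SamplingParams plus generation_config="vllm" to use
# vLLM's own defaults instead of the model creator's generation_config.json.
from vllm import LLM, SamplingParams

sampling_params = SamplingParams(temperature=0.8, top_p=0.95)
llm = LLM(model="Qwen/Qwen2.5-1.5B-Instruct", generation_config="vllm")

outputs = llm.generate(["Hello, my name is"], sampling_params)
print(outputs[0].outputs[0].text)
```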
@@ -116,7 +117,7 @@ vllm serve Qwen/Qwen2.5-1.5B-Instruct
 !!! note
     By default, the server uses a predefined chat template stored in the tokenizer.
     You can learn about overriding it [here][chat-template].
-!!! warning
+!!! important
     By default, the server applies `generation_config.json` from the huggingface model repository if it exists. This means the default values of certain sampling parameters can be overridden by those recommended by the model creator.

     To disable this behavior, please pass `--generation-config vllm` when launching the server.

docs/mkdocs/stylesheets/extra.css

Lines changed: 37 additions & 0 deletions
@@ -34,3 +34,40 @@ body[data-md-color-scheme="slate"] .md-nav__item--section > label.md-nav__link .
   color: rgba(255, 255, 255, 0.75) !important;
   font-weight: 700;
 }
+
+/* Custom admonitions */
+:root {
+  --md-admonition-icon--announcement: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16"><path d="M3.25 9a.75.75 0 0 1 .75.75c0 2.142.456 3.828.733 4.653a.122.122 0 0 0 .05.064.212.212 0 0 0 .117.033h1.31c.085 0 .18-.042.258-.152a.45.45 0 0 0 .075-.366A16.743 16.743 0 0 1 6 9.75a.75.75 0 0 1 1.5 0c0 1.588.25 2.926.494 3.85.293 1.113-.504 2.4-1.783 2.4H4.9c-.686 0-1.35-.41-1.589-1.12A16.4 16.4 0 0 1 2.5 9.75.75.75 0 0 1 3.25 9Z"></path><path d="M0 6a4 4 0 0 1 4-4h2.75a.75.75 0 0 1 .75.75v6.5a.75.75 0 0 1-.75.75H4a4 4 0 0 1-4-4Zm4-2.5a2.5 2.5 0 1 0 0 5h2v-5Z"></path><path d="M15.59.082A.75.75 0 0 1 16 .75v10.5a.75.75 0 0 1-1.189.608l-.002-.001h.001l-.014-.01a5.775 5.775 0 0 0-.422-.25 10.63 10.63 0 0 0-1.469-.64C11.576 10.484 9.536 10 6.75 10a.75.75 0 0 1 0-1.5c2.964 0 5.174.516 6.658 1.043.423.151.787.302 1.092.443V2.014c-.305.14-.669.292-1.092.443C11.924 2.984 9.713 3.5 6.75 3.5a.75.75 0 0 1 0-1.5c2.786 0 4.826-.484 6.155-.957.665-.236 1.154-.47 1.47-.64.144-.077.284-.161.421-.25l.014-.01a.75.75 0 0 1 .78-.061Z"></path></svg>');
+  --md-admonition-icon--important: url('data:image/svg+xml;charset=utf-8,<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16" width="16" height="16"><path d="M4.47.22A.749.749 0 0 1 5 0h6c.199 0 .389.079.53.22l4.25 4.25c.141.14.22.331.22.53v6a.749.749 0 0 1-.22.53l-4.25 4.25A.749.749 0 0 1 11 16H5a.749.749 0 0 1-.53-.22L.22 11.53A.749.749 0 0 1 0 11V5c0-.199.079-.389.22-.53Zm.84 1.28L1.5 5.31v5.38l3.81 3.81h5.38l3.81-3.81V5.31L10.69 1.5ZM8 4a.75.75 0 0 1 .75.75v3.5a.75.75 0 0 1-1.5 0v-3.5A.75.75 0 0 1 8 4Zm0 8a1 1 0 1 1 0-2 1 1 0 0 1 0 2Z"></path></svg>');
+}
+
+.md-typeset .admonition.announcement,
+.md-typeset details.announcement {
+  border-color: rgb(255, 110, 66);
+}
+.md-typeset .admonition.important,
+.md-typeset details.important {
+  border-color: rgb(239, 85, 82);
+}
+
+.md-typeset .announcement > .admonition-title,
+.md-typeset .announcement > summary {
+  background-color: rgb(255, 110, 66, 0.1);
+}
+.md-typeset .important > .admonition-title,
+.md-typeset .important > summary {
+  background-color: rgb(239, 85, 82, 0.1);
+}
+
+.md-typeset .announcement > .admonition-title::before,
+.md-typeset .announcement > summary::before {
+  background-color: rgb(239, 85, 82);
+  -webkit-mask-image: var(--md-admonition-icon--announcement);
+  mask-image: var(--md-admonition-icon--announcement);
+}
+.md-typeset .important > .admonition-title::before,
+.md-typeset .important > summary::before {
+  background-color: rgb(239, 85, 82);
+  -webkit-mask-image: var(--md-admonition-icon--important);
+  mask-image: var(--md-admonition-icon--important);
+}

docs/models/generative_models.md

Lines changed: 2 additions & 2 deletions
@@ -51,7 +51,7 @@ for output in outputs:
     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
 ```

-!!! warning
+!!! important
     By default, vLLM will use sampling parameters recommended by model creator by applying the `generation_config.json` from the huggingface model repository if it exists. In most cases, this will provide you with the best results by default if [SamplingParams][vllm.SamplingParams] is not specified.

     However, if vLLM's default sampling parameters are preferred, please pass `generation_config="vllm"` when creating the [LLM][vllm.LLM] instance.
@@ -81,7 +81,7 @@ The [chat][vllm.LLM.chat] method implements chat functionality on top of [genera
 In particular, it accepts input similar to [OpenAI Chat Completions API](https://platform.openai.com/docs/api-reference/chat)
 and automatically applies the model's [chat template](https://huggingface.co/docs/transformers/en/chat_templating) to format the prompt.

-!!! warning
+!!! important
     In general, only instruction-tuned models have a chat template.
     Base models may perform poorly as they are not trained to respond to the chat conversation.
docs/models/supported_models.md

Lines changed: 4 additions & 4 deletions
@@ -379,7 +379,7 @@ Specified using `--task generate`.
 
 See [this page](./pooling_models.md) for more information on how to use pooling models.

-!!! warning
+!!! important
     Since some model architectures support both generative and pooling tasks,
     you should explicitly specify the task type to ensure that the model is used in pooling mode instead of generative mode.
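A sketch of explicitly selecting the pooling task, as the admonition above advises; the model name and the `task="embed"` value are illustrative assumptions, not part of this commit:

```python
# Sketch only: force pooling mode for an architecture that also generates.
from vllm import LLM

llm = LLM(model="BAAI/bge-base-en-v1.5", task="embed")
outputs = llm.embed("Hello, my name is")
print(len(outputs[0].outputs.embedding))  # embedding dimensionality
```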

@@ -432,7 +432,7 @@ Specified using `--task reward`.
 If your model is not in the above list, we will try to automatically convert the model using
 [as_reward_model][vllm.model_executor.models.adapters.as_reward_model]. By default, we return the hidden states of each token directly.

-!!! warning
+!!! important
     For process-supervised reward models such as `peiyi9979/math-shepherd-mistral-7b-prm`, the pooling config should be set explicitly,
     e.g.: `--override-pooler-config '{"pooling_type": "STEP", "step_tag_id": 123, "returned_token_ids": [456, 789]}'`.

@@ -485,7 +485,7 @@ On the other hand, modalities separated by `/` are mutually exclusive.
 
 See [this page][multimodal-inputs] on how to pass multi-modal inputs to the model.

-!!! warning
+!!! important
     **To enable multiple multi-modal items per text prompt in vLLM V0**, you have to set `limit_mm_per_prompt` (offline inference)
     or `--limit-mm-per-prompt` (online serving). For example, to enable passing up to 4 images per text prompt:
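A sketch of the offline-inference setting mentioned above; the model name is a placeholder and the snippet is not part of this commit:

```python
# Sketch only: allow up to 4 images per text prompt for offline inference.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2.5-VL-3B-Instruct",
    limit_mm_per_prompt={"image": 4},
)
```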

@@ -640,7 +640,7 @@ Specified using `--task generate`.
 
 See [this page](./pooling_models.md) for more information on how to use pooling models.

-!!! warning
+!!! important
     Since some model architectures support both generative and pooling tasks,
     you should explicitly specify the task type to ensure that the model is used in pooling mode instead of generative mode.
