Commit fab5cab

hmellor authored and huydhn committed
[Doc] Use gh-pr and gh-issue everywhere we can in the docs (vllm-project#20564)
Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
1 parent 9f09071 commit fab5cab

File tree

- docs/ci/update_pytorch_version.md
- docs/features/spec_decode.md
- docs/usage/troubleshooting.md
- docs/usage/v1_guide.md

4 files changed (+22, -24 lines)

docs/ci/update_pytorch_version.md

Lines changed: 5 additions & 7 deletions

@@ -7,9 +7,8 @@ release in CI/CD. It is standard practice to submit a PR to update the
 PyTorch version as early as possible when a new [PyTorch stable
 release](https://github.com/pytorch/pytorch/blob/main/RELEASE.md#release-cadence) becomes available.
 This process is non-trivial due to the gap between PyTorch
-releases. Using [#16859](https://github.com/vllm-project/vllm/pull/16859) as
-an example, this document outlines common steps to achieve this update along with
-a list of potential issues and how to address them.
+releases. Using <gh-pr:16859> as an example, this document outlines common steps to achieve this
+update along with a list of potential issues and how to address them.
 
 ## Test PyTorch release candidates (RCs)
 
@@ -79,7 +78,7 @@ and timeout. Additionally, since vLLM's fastcheck pipeline runs in read-only mod
 it doesn't populate the cache, so re-running it to warm up the cache
 is ineffective.
 
-While ongoing efforts like [#17419](https://github.com/vllm-project/vllm/issues/17419)
+While ongoing efforts like [#17419](gh-issue:17419)
 address the long build time at its source, the current workaround is to set VLLM_CI_BRANCH
 to a custom branch provided by @khluu (`VLLM_CI_BRANCH=khluu/use_postmerge_q`)
 when manually triggering a build on Buildkite. This branch accomplishes two things:
@@ -140,6 +139,5 @@ to handle some platforms separately. The separation of requirements and Dockerfi
 for different platforms in vLLM CI/CD allows us to selectively choose
 which platforms to update. For instance, updating XPU requires the corresponding
 release from https://github.com/intel/intel-extension-for-pytorch by Intel.
-While https://github.com/vllm-project/vllm/pull/16859 updated vLLM to PyTorch
-2.7.0 on CPU, CUDA, and ROCm, https://github.com/vllm-project/vllm/pull/17444
-completed the update for XPU.
+While <gh-pr:16859> updated vLLM to PyTorch 2.7.0 on CPU, CUDA, and ROCm,
+<gh-pr:17444> completed the update for XPU.
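The shorthand used in the new lines above (`<gh-pr:16859>`, `[#17419](gh-issue:17419)`, and so on) is expanded into full GitHub links when the documentation is built. As a rough illustration of the idea only, and not a description of vLLM's actual docs tooling, a MkDocs-style `on_page_markdown` hook could perform the expansion like this; the hook and helper names are assumptions made for the sketch:

```python
# Illustrative sketch only: expand <gh-pr:NNN> / <gh-issue:NNN> shorthand into
# ordinary markdown links at docs-build time. Not vLLM's actual hook.
import re

REPO_URL = "https://github.com/vllm-project/vllm"

# Matches the autolink form, e.g. <gh-pr:16859> or <gh-issue:17419>.
AUTOLINK = re.compile(r"<gh-(pr|issue):(\d+)>")


def _expand(match: re.Match) -> str:
    kind, number = match.groups()
    path = "pull" if kind == "pr" else "issues"
    return f"[#{number}]({REPO_URL}/{path}/{number})"


def on_page_markdown(markdown: str, **kwargs) -> str:
    # The link-target form, e.g. [#17419](gh-issue:17419), would be rewritten
    # with a similar pattern over "(gh-pr:NNN)" / "(gh-issue:NNN)".
    return AUTOLINK.sub(_expand, markdown)
```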

docs/features/spec_decode.md

Lines changed: 3 additions & 3 deletions

@@ -217,8 +217,8 @@ an [EAGLE (Extrapolation Algorithm for Greater Language-model Efficiency)](https
 A few important things to consider when using the EAGLE based draft models:
 
 1. The EAGLE draft models available in the [HF repository for EAGLE models](https://huggingface.co/yuhuili) should
-be able to be loaded and used directly by vLLM after [PR 12304](https://github.com/vllm-project/vllm/pull/12304).
-If you are using vllm version before [PR 12304](https://github.com/vllm-project/vllm/pull/12304), please use the
+be able to be loaded and used directly by vLLM after <gh-pr:12304>.
+If you are using vllm version before <gh-pr:12304>, please use the
 [script](https://gist.github.com/abhigoyal1997/1e7a4109ccb7704fbc67f625e86b2d6d) to convert the speculative model,
 and specify `"model": "path/to/modified/eagle/model"` in `speculative_config`. If weight-loading problems still occur when using the latest version of vLLM, please leave a comment or raise an issue.
 
@@ -228,7 +228,7 @@ A few important things to consider when using the EAGLE based draft models:
 
 3. When using EAGLE-based speculators with vLLM, the observed speedup is lower than what is
 reported in the reference implementation [here](https://github.com/SafeAILab/EAGLE). This issue is under
-investigation and tracked here: [https://github.com/vllm-project/vllm/issues/9565](https://github.com/vllm-project/vllm/issues/9565).
+investigation and tracked here: <gh-issue:9565>.
 
 A variety of EAGLE draft models are available on the Hugging Face hub:
 
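For context on the conversion path described in the hunk above, a converted EAGLE draft model is wired in through `speculative_config`. The snippet below is a minimal sketch under the assumption that the running vLLM version accepts the dict-style `speculative_config` argument the doc refers to; the base model name, draft path, and token count are placeholders:

```python
# Hedged sketch: the paths, model names, and numbers are placeholders, and the
# exact speculative_config fields can differ between vLLM versions.
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Meta-Llama-3-8B-Instruct",   # placeholder base model
    speculative_config={
        "model": "path/to/modified/eagle/model",   # output of the conversion script
        "num_speculative_tokens": 5,               # draft tokens proposed per step
    },
)

outputs = llm.generate(
    ["The main benefit of speculative decoding is"],
    SamplingParams(temperature=0.0, max_tokens=32),
)
print(outputs[0].outputs[0].text)
```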

docs/usage/troubleshooting.md

Lines changed: 2 additions & 2 deletions

@@ -212,7 +212,7 @@ if __name__ == '__main__':
 
 ## `torch.compile` Error
 
-vLLM heavily depends on `torch.compile` to optimize the model for better performance, which introduces the dependency on the `torch.compile` functionality and the `triton` library. By default, we use `torch.compile` to [optimize some functions](https://github.com/vllm-project/vllm/pull/10406) in the model. Before running vLLM, you can check if `torch.compile` is working as expected by running the following script:
+vLLM heavily depends on `torch.compile` to optimize the model for better performance, which introduces the dependency on the `torch.compile` functionality and the `triton` library. By default, we use `torch.compile` to [optimize some functions](gh-pr:10406) in the model. Before running vLLM, you can check if `torch.compile` is working as expected by running the following script:
 
 ??? Code
 
@@ -231,7 +231,7 @@ vLLM heavily depends on `torch.compile` to optimize the model for better perform
 print(f(x))
 ```
 
-If it raises errors from `torch/_inductor` directory, usually it means you have a custom `triton` library that is not compatible with the version of PyTorch you are using. See [this issue](https://github.com/vllm-project/vllm/issues/12219) for example.
+If it raises errors from `torch/_inductor` directory, usually it means you have a custom `triton` library that is not compatible with the version of PyTorch you are using. See <gh-issue:12219> for example.
 
 ## Model failed to be inspected
 
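The sanity check mentioned in the changed paragraph boils down to compiling a trivial function with `torch.compile` and running it before starting vLLM. Below is a minimal sketch in that spirit; it is an approximation, not a verbatim copy of the script in `docs/usage/troubleshooting.md`:

```python
# Approximate torch.compile sanity check; not a verbatim copy of the docs script.
import torch


@torch.compile
def f(x):
    # A couple of simple ops are enough to exercise Dynamo and Inductor.
    x = x * 2
    return x.sin() + x.cos()


if __name__ == "__main__":
    # Prefer a CUDA tensor when a GPU is available, since that is the path vLLM exercises.
    device = "cuda" if torch.cuda.is_available() else "cpu"
    x = torch.randn(16, device=device)
    print(f(x))  # Errors from torch/_inductor here usually point to a Triton/PyTorch mismatch.
```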

docs/usage/v1_guide.md

Lines changed: 12 additions & 12 deletions

@@ -2,7 +2,7 @@
 
 !!! announcement
 
-We have started the process of deprecating V0. Please read [RFC #18571](https://github.com/vllm-project/vllm/issues/18571) for more details.
+We have started the process of deprecating V0. Please read [RFC #18571](gh-issue:18571) for more details.
 
 V1 is now enabled by default for all supported use cases, and we will gradually enable it for every use case we plan to support. Please share any feedback on [GitHub](https://github.com/vllm-project/vllm) or in the [vLLM Slack](https://inviter.co/vllm-slack).
 
@@ -83,7 +83,7 @@ based on assigned priority, with FCFS as a tie-breaker), configurable via the
 | **Decoder-only Models** | <nobr>🚀 Optimized</nobr> |
 | **Encoder-Decoder Models** | <nobr>🟠 Delayed</nobr> |
 | **Embedding Models** | <nobr>🟢 Functional</nobr> |
-| **Mamba Models** | <nobr>🚧 WIP ([PR #19327](https://github.com/vllm-project/vllm/pull/19327))</nobr> |
+| **Mamba Models** | <nobr>🚧 WIP (<gh-pr:19327>)</nobr> |
 | **Multimodal Models** | <nobr>🟢 Functional</nobr> |
 
 vLLM V1 currently excludes model architectures with the `SupportsV0Only` protocol.
@@ -98,14 +98,14 @@ See below for the status of models that are not yet supported or have more featu
 
 The initial basic support is now functional.
 
-Later, we will consider using [hidden states processor](https://github.com/vllm-project/vllm/issues/12249),
-which is based on [global logits processor](https://github.com/vllm-project/vllm/pull/13360)
+Later, we will consider using [hidden states processor](gh-issue:12249),
+which is based on [global logits processor](gh-pr:13360)
 to enable simultaneous generation and embedding using the same engine instance in V1.
 
 #### Mamba Models
 
 Models using selective state-space mechanisms instead of standard transformer attention (e.g., `MambaForCausalLM`, `JambaForCausalLM`)
-will be supported via [PR #19327](https://github.com/vllm-project/vllm/pull/19327).
+will be supported via <gh-pr:19327>.
 
 #### Encoder-Decoder Models
 
@@ -120,13 +120,13 @@ are not yet supported.
 | **Chunked Prefill** | <nobr>🚀 Optimized</nobr> |
 | **LoRA** | <nobr>🚀 Optimized</nobr> |
 | **Logprobs Calculation** | <nobr>🟢 Functional</nobr> |
-| **FP8 KV Cache** | <nobr>🟢 Functional on Hopper devices ([PR #15191](https://github.com/vllm-project/vllm/pull/15191))</nobr>|
+| **FP8 KV Cache** | <nobr>🟢 Functional on Hopper devices (<gh-pr:15191>)</nobr>|
 | **Spec Decode** | <nobr>🚀 Optimized</nobr> |
-| **Prompt Logprobs with Prefix Caching** | <nobr>🟡 Planned ([RFC #13414](https://github.com/vllm-project/vllm/issues/13414))</nobr>|
+| **Prompt Logprobs with Prefix Caching** | <nobr>🟡 Planned ([RFC #13414](gh-issue:13414))</nobr>|
 | **Structured Output Alternative Backends** | <nobr>🟢 Functional</nobr> |
 | **Request-level Structured Output Backend** | <nobr>🔴 Deprecated</nobr> |
-| **best_of** | <nobr>🔴 Deprecated ([RFC #13361](https://github.com/vllm-project/vllm/issues/13361))</nobr>|
-| **Per-Request Logits Processors** | <nobr>🔴 Deprecated ([RFC #13360](https://github.com/vllm-project/vllm/pull/13360))</nobr> |
+| **best_of** | <nobr>🔴 Deprecated ([RFC #13361](gh-issue:13361))</nobr>|
+| **Per-Request Logits Processors** | <nobr>🔴 Deprecated ([RFC #13360](gh-pr:13360))</nobr> |
 | **GPU <> CPU KV Cache Swapping** | <nobr>🔴 Deprecated</nobr> |
 
 !!! note
@@ -153,19 +153,19 @@ Support for logprobs with post-sampling adjustments is in progress and will be a
 
 **Prompt Logprobs with Prefix Caching**
 
-Currently prompt logprobs are only supported when prefix caching is turned off via `--no-enable-prefix-caching`. In a future release, prompt logprobs will be compatible with prefix caching, but a recomputation will be triggered to recover the full prompt logprobs even upon a prefix cache hit. See details in [RFC #13414](https://github.com/vllm-project/vllm/issues/13414).
+Currently prompt logprobs are only supported when prefix caching is turned off via `--no-enable-prefix-caching`. In a future release, prompt logprobs will be compatible with prefix caching, but a recomputation will be triggered to recover the full prompt logprobs even upon a prefix cache hit. See details in [RFC #13414](gh-issue:13414).
 
 #### Deprecated Features
 
 As part of the major architectural rework in vLLM V1, several legacy features have been deprecated.
 
 **Sampling features**
 
-- **best_of**: This feature has been deprecated due to limited usage. See details at [RFC #13361](https://github.com/vllm-project/vllm/issues/13361).
+- **best_of**: This feature has been deprecated due to limited usage. See details at [RFC #13361](gh-issue:13361).
 - **Per-Request Logits Processors**: In V0, users could pass custom
 processing functions to adjust logits on a per-request basis. In vLLM V1, this
 feature has been deprecated. Instead, the design is moving toward supporting **global logits
-processors**, a feature the team is actively working on for future releases. See details at [RFC #13360](https://github.com/vllm-project/vllm/pull/13360).
+processors**, a feature the team is actively working on for future releases. See details at [RFC #13360](gh-pr:13360).
 
 **KV Cache features**
 
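To make the prompt-logprobs limitation mentioned in the last hunk concrete: until the linked RFC lands, prompt logprobs are requested with prefix caching disabled. Below is a short, hedged sketch assuming the current `enable_prefix_caching` engine argument and `prompt_logprobs` sampling parameter; the model name is a placeholder:

```python
# Hedged sketch: request prompt logprobs with prefix caching turned off,
# the Python-API counterpart of --no-enable-prefix-caching. Placeholder model.
from vllm import LLM, SamplingParams

llm = LLM(
    model="facebook/opt-125m",       # placeholder model
    enable_prefix_caching=False,     # prompt logprobs currently require this
)

params = SamplingParams(max_tokens=8, prompt_logprobs=1)  # one logprob per prompt token
outputs = llm.generate(["vLLM is a fast and easy-to-use library for LLM inference."], params)
print(outputs[0].prompt_logprobs)
```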
