Commit 7bcbb31

docs: rename userguide AI Agents section to Features | add spec decoding to Features (#8099) (#8105)
1 parent ee62cc7 commit 7bcbb31

3 files changed: 97 additions and 3 deletions


docs/contents.rst

Lines changed: 4 additions & 3 deletions
@@ -50,10 +50,11 @@
5050

5151
.. toctree::
5252
:hidden:
53-
:caption: AI Agents
53+
:caption: LLM Features
5454

55-
Constrained Decoding <../tutorials/AI_Agents_Guide/Constrained_Decoding/README.md>
56-
Function Calling <../tutorials/AI_Agents_Guide/Function_Calling/README.md>
55+
Constrained Decoding <../tutorials/Feature_Guide/Constrained_Decoding/README.md>
56+
Function Calling <../tutorials/Feature_Guide/Function_Calling/README.md>
57+
llm_features/speculative_decoding_by_backend_type
5758

5859
.. toctree::
5960
:hidden:
Lines changed: 54 additions & 0 deletions
@@ -0,0 +1,54 @@
1+
..
2+
.. Copyright 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
3+
..
4+
.. Redistribution and use in source and binary forms, with or without
5+
.. modification, are permitted provided that the following conditions
6+
.. are met:
7+
.. * Redistributions of source code must retain the above copyright
8+
.. notice, this list of conditions and the following disclaimer.
9+
.. * Redistributions in binary form must reproduce the above copyright
10+
.. notice, this list of conditions and the following disclaimer in the
11+
.. documentation and/or other materials provided with the distribution.
12+
.. * Neither the name of NVIDIA CORPORATION nor the names of its
13+
.. contributors may be used to endorse or promote products derived
14+
.. from this software without specific prior written permission.
15+
..
16+
.. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
17+
.. EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18+
.. IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
19+
.. PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
20+
.. CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
21+
.. EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
22+
.. PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
23+
.. PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
24+
.. OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
25+
.. (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
26+
.. OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
27+
28+
.. raw:: html
29+
30+
31+
About Speculative Decoding
32+
==========================
33+
Speculative Decoding (also referred to as Speculative Sampling) is a set of techniques designed
34+
to allow generation of more than one token per forward pass iteration. This can lead to a reduction
35+
in the average per-token latency in situations where the GPU is underutilized due to small batch sizes.
36+
37+
Speculative decoding involves predicting a sequence of future tokens, referred to as draft tokens,
38+
using a method that is substantially more efficient than repeatedly executing the target Large Language
39+
Model (LLM). These draft tokens are then collectively validated by processing them through the target LLM
40+
in a single forward pass. The underlying assumptions are twofold:
41+
42+
1. processing multiple draft tokens concurrently will be as rapid as processing a single token
43+
2. multiple draft tokens will be validated successfully over the course of the full generation
44+
45+
If the first assumption holds true, the latency of speculative decoding will be no worse than the standard
46+
approach. If the second holds, output token generation advances by statistically more than one token per
47+
forward pass. Together, the two assumptions allow speculative decoding to reduce latency.
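
The draft-then-verify loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any Triton backend implements the feature: it uses greedy token matching rather than the probabilistic acceptance rule of speculative sampling, and every name in it (``speculative_step``, ``generate``, ``toy_target``, ``toy_draft``) is hypothetical. The toy "models" are plain functions over integer tokens, and the verification loop only emulates what a real target model would compute in a single forward pass.

```python
from typing import Callable, List, Tuple

Token = int
# Hypothetical interface: a "model" maps a token context to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_step(prefix: List[Token], draft_next: NextToken,
                     target_next: NextToken, num_draft: int) -> List[Token]:
    """Run one draft-then-verify step and return the tokens produced by it."""
    # 1. Draft: propose num_draft tokens with the cheap model.
    draft: List[Token] = []
    for _ in range(num_draft):
        draft.append(draft_next(prefix + draft))

    # 2. Verify: a real target model would score every draft position in one
    #    forward pass; this toy version emulates that with a loop.
    accepted: List[Token] = []
    for tok in draft:
        expected = target_next(prefix + accepted)
        if tok != expected:
            # First mismatch: keep the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every draft token matched, so the same pass also yields one extra token.
    accepted.append(target_next(prefix + accepted))
    return accepted


def generate(prompt: List[Token], draft_next: NextToken, target_next: NextToken,
             num_draft: int, max_new_tokens: int) -> Tuple[List[Token], int]:
    """Generate tokens and count how many (emulated) target passes were needed."""
    out = list(prompt)
    passes = 0
    while len(out) - len(prompt) < max_new_tokens:
        out += speculative_step(out, draft_next, target_next, num_draft)
        passes += 1
    return out[:len(prompt) + max_new_tokens], passes


def toy_target(ctx: List[Token]) -> Token:
    # Toy target model: counts upward modulo 10.
    return (ctx[-1] + 1) % 10


def toy_draft(ctx: List[Token]) -> Token:
    # Toy draft model: agrees with the target except right after token 4.
    return 0 if ctx[-1] == 4 else (ctx[-1] + 1) % 10


tokens, passes = generate([0], toy_draft, toy_target, num_draft=3, max_new_tokens=8)
print(tokens, passes)
```

In this toy run, eight new tokens are produced in three emulated target passes instead of eight, which is the effect of the second assumption. The first assumption is not modeled here: the sketch counts passes and says nothing about how long each pass takes.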
48+
49+
Performance Improvements
50+
========================
51+
The effectiveness of speculative decoding techniques is highly dependent
52+
on the specific task at hand. For instance, forecasting subsequent tokens in a code-completion scenario
53+
may prove simpler than generating a summary for an article. `Spec-Bench <https://sites.google.com/view/spec-bench>`__
54+
shows the performance of different speculative decoding approaches on different tasks.
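
One rough way to see why the task matters is to treat the draft acceptance rate as a single number. As a simplifying assumption that is not part of the text above, suppose each draft token is accepted independently with probability :math:`\alpha` and that :math:`k` draft tokens are proposed per step (both symbols are introduced here for illustration). Each target forward pass then yields the accepted draft tokens plus one token from the target itself, so the expected number of tokens per pass is:

```latex
\mathbb{E}[\text{tokens per pass}]
  = \sum_{i=0}^{k} \alpha^{i}
  = \frac{1 - \alpha^{k+1}}{1 - \alpha},
  \qquad 0 \le \alpha < 1 .
```

With :math:`k = 4`, an acceptance rate of :math:`\alpha = 0.8` gives about 3.36 tokens per pass, while :math:`\alpha = 0.5` gives about 1.94. Under this simplified model, tasks whose continuations are easier to predict benefit more.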
Lines changed: 39 additions & 0 deletions
@@ -0,0 +1,39 @@
1+
..
2+
.. Copyright 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
3+
..
4+
.. Redistribution and use in source and binary forms, with or without
5+
.. modification, are permitted provided that the following conditions
6+
.. are met:
7+
.. * Redistributions of source code must retain the above copyright
8+
.. notice, this list of conditions and the following disclaimer.
9+
.. * Redistributions in binary form must reproduce the above copyright
10+
.. notice, this list of conditions and the following disclaimer in the
11+
.. documentation and/or other materials provided with the distribution.
12+
.. * Neither the name of NVIDIA CORPORATION nor the names of its
13+
.. contributors may be used to endorse or promote products derived
14+
.. from this software without specific prior written permission.
15+
..
16+
.. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS ``AS IS'' AND ANY
17+
.. EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
18+
.. IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
19+
.. PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
20+
.. CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
21+
.. EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
22+
.. PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
23+
.. PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY
24+
.. OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
25+
.. (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
26+
.. OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
27+
28+
####################
29+
Speculative Decoding
30+
####################
31+
32+
.. include:: speculative_decoding.rst
33+
34+
.. toctree::
35+
:maxdepth: 1
36+
:hidden:
37+
38+
TRT-LLM <../tutorials/Feature_Guide/Speculative_Decoding/TRT-LLM/README.md>
39+
vLLM <../tutorials/Feature_Guide/Speculative_Decoding/vLLM/README.md>
