# Reasoning Outputs
Reasoning models return an additional `reasoning_content` field in their output, which contains the reasoning steps that led to the final conclusion.
## Currently Supported Reasoning Models

| Model Name | Parser Name | Reasoning Enabled by Default |
| --- | --- | --- |
The reasoning model requires a specified parser to extract reasoning content. The reasoning mode can be disabled by setting the `enable_thinking=False` parameter.
Interfaces that support toggling the reasoning mode:

1. `/v1/chat/completions` requests in OpenAI services.
2. `/v1/chat/completions` requests in the OpenAI Python client.
3. `llm.chat` requests in offline interfaces.
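
As an illustrative sketch only: the request body below disables reasoning mode by placing `enable_thinking` at the top level of the JSON payload. Where exactly the server expects this field (top level, `metadata`, or chat-template arguments) is an assumption here, so verify the placement against your server's API reference.

```python
# Minimal sketch of a /v1/chat/completions request body that disables
# reasoning mode. Placing `enable_thinking` at the top level is an
# assumption; some servers expect it elsewhere in the payload.
request_body = {
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "enable_thinking": False,  # turn off reasoning mode for this request
}

# With the OpenAI Python client, non-standard fields can be forwarded
# through `extra_body`, e.g.:
#   client.chat.completions.create(
#       model=..., messages=request_body["messages"],
#       extra_body={"enable_thinking": False})
```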
For reasoning models, the length of the reasoning content can be controlled via `reasoning_max_tokens`. Add `metadata={"reasoning_max_tokens": 1024}` to the request.
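
For example, a request body that caps the reasoning length (the `metadata` key and the 1024 value come from the paragraph above; the message text is an arbitrary placeholder):

```python
# Request body limiting the reasoning content to 1024 tokens via the
# `metadata` field described above. The message content is sample data.
request_body = {
    "messages": [{"role": "user", "content": "Explain why the sky is blue."}],
    "metadata": {"reasoning_max_tokens": 1024},  # cap on reasoning tokens
}
```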
### Quick Start
When launching the model service, specify the parser name using the `--reasoning-parser` argument.
This parser will process the model's output and extract the `reasoning_content` field.
Next, send a request to the model; the reasoning content should be returned in the response:

```bash
curl -X POST "http://0.0.0.0:8192/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
  "messages": [{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
  "metadata": {"reasoning_max_tokens": 1024}
}'
```
The `reasoning_content` field contains the reasoning steps to reach the final conclusion, while the `content` field holds the conclusion itself.
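
A sketch of reading both fields from a non-streaming response; the structure below mirrors an OpenAI-style chat completion, and only the field names come from this document, while the values are invented sample data:

```python
# Illustrative fragment of a chat completion response. The shape follows the
# OpenAI-style schema; the string values are invented for demonstration.
response = {
    "choices": [
        {
            "message": {
                "reasoning_content": "Compare the fractional parts: 0.9 vs 0.11...",
                "content": "The final answer is 9.9.",
            }
        }
    ]
}

message = response["choices"][0]["message"]
reasoning = message.get("reasoning_content")  # chain-of-thought steps
answer = message["content"]                   # final conclusion
```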

### Streaming Chat Completions

Streaming chat completions are also supported for reasoning models. The `reasoning_content` field is available in the `delta` field of chat completion response chunks.
```python
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
client = OpenAI(api_key="EMPTY", base_url="http://0.0.0.0:8192/v1")

# "served-model" is a placeholder; replace it with the served model name.
chat_response = client.chat.completions.create(
    model="served-model",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    stream=True,
)
for chunk in chat_response:
    delta = chunk.choices[0].delta
    # Reasoning steps arrive in `reasoning_content`; the answer in `content`.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="")
    elif getattr(delta, "content", None):
        print(delta.content, end="")
```