# Reasoning Outputs
Reasoning models return an additional `reasoning_content` field in their output, which contains the reasoning steps that led to the final conclusion.
## Currently Supported Reasoning Models

| Model Name | Parser Name | Reasoning Enabled by Default |
| --- | --- | --- |
The reasoning model requires a specified parser to extract reasoning content. The reasoning mode can be disabled by setting the `enable_thinking=False` parameter.
Interfaces that support toggling the reasoning mode:

1. `/v1/chat/completions` requests in OpenAI services.
2. `/v1/chat/completions` requests in the OpenAI Python client.
3. `llm.chat` requests in offline interfaces.
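
As an illustrative sketch only: the request body below disables reasoning mode by placing `enable_thinking` at the top level of the JSON payload. Where exactly the server expects this field (top level, `metadata`, or chat-template arguments) is an assumption here, so verify the placement against your server's API reference.

```python
# Minimal sketch of a /v1/chat/completions request body that disables
# reasoning mode. Placing `enable_thinking` at the top level is an
# assumption; some servers expect it elsewhere in the payload.
request_body = {
    "messages": [{"role": "user", "content": "What is 2 + 2?"}],
    "enable_thinking": False,  # turn off reasoning mode for this request
}

# With the OpenAI Python client, non-standard fields can be forwarded
# through `extra_body`, e.g.:
#   client.chat.completions.create(
#       model=..., messages=request_body["messages"],
#       extra_body={"enable_thinking": False})
```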
For reasoning models, the length of the reasoning content can be controlled via `reasoning_max_tokens`. Add `metadata={"reasoning_max_tokens": 1024}` to the request.
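
For example, a request body that caps the reasoning length (the `metadata` key and the 1024 value come from the paragraph above; the message text is an arbitrary placeholder):

```python
# Request body limiting the reasoning content to 1024 tokens via the
# `metadata` field described above. The message content is sample data.
request_body = {
    "messages": [{"role": "user", "content": "Explain why the sky is blue."}],
    "metadata": {"reasoning_max_tokens": 1024},  # cap on reasoning tokens
}
```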
### Quick Start
When launching the model service, specify the parser name using the `--reasoning-parser` argument.
This parser will process the model's output and extract the `reasoning_content` field.
Next, send a request to the model; the reasoning content should be returned in the response:

```bash
curl -X POST "http://0.0.0.0:8192/v1/chat/completions" \
-H "Content-Type: application/json" \
-d '{
  "messages": [{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
  "metadata": {"reasoning_max_tokens": 1024}
}'
```
The `reasoning_content` field contains the reasoning steps to reach the final conclusion, while the `content` field holds the conclusion itself.
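
A sketch of reading both fields from a non-streaming response; the structure below mirrors an OpenAI-style chat completion, and only the field names come from this document, while the values are invented sample data:

```python
# Illustrative fragment of a chat completion response. The shape follows the
# OpenAI-style schema; the string values are invented for demonstration.
response = {
    "choices": [
        {
            "message": {
                "reasoning_content": "Compare the fractional parts: 0.9 vs 0.11...",
                "content": "The final answer is 9.9.",
            }
        }
    ]
}

message = response["choices"][0]["message"]
reasoning = message.get("reasoning_content")  # chain-of-thought steps
answer = message["content"]                   # final conclusion
```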

### Streaming Chat Completions

Streaming chat completions are also supported for reasoning models. The `reasoning_content` field is available in the `delta` field of chat completion response chunks.
```python
from openai import OpenAI
# Set OpenAI's API key and API base to use vLLM's API server.
client = OpenAI(api_key="EMPTY", base_url="http://0.0.0.0:8192/v1")

# "served-model" is a placeholder; replace it with the served model name.
chat_response = client.chat.completions.create(
    model="served-model",
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.9?"}],
    stream=True,
)
for chunk in chat_response:
    delta = chunk.choices[0].delta
    # Reasoning steps arrive in `reasoning_content`; the answer in `content`.
    if getattr(delta, "reasoning_content", None):
        print(delta.reasoning_content, end="")
    elif getattr(delta, "content", None):
        print(delta.content, end="")
```