
Commit 5fc659b

[Docs] add enable_logprob parameter description (#2850)
* add enable_logprob parameter description

Co-authored-by: Jiang-Jia-Jun <163579578+Jiang-Jia-Jun@users.noreply.github.com>
1 parent 33db137 commit 5fc659b

4 files changed (+52 -0 lines changed)

docs/online_serving/README.md

Lines changed: 24 additions & 0 deletions
@@ -9,6 +9,16 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --max-model-len 32768
```

To enable log probability output, simply deploy with the following command:

```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-0.3B-Paddle \
    --port 8188 --tensor-parallel-size 8 \
    --max-model-len 32768 \
    --enable-logprob
```
For more command-line options for service deployment, refer to [Parameter Descriptions](../parameters.md).

## Sending User Requests
@@ -26,6 +36,18 @@ curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
  ]
}'
```

Here's an example curl command demonstrating how to include the `logprobs` parameter in a user request (note that `logprobs` and `top_logprobs` are top-level request fields, not part of the `messages` array):

```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "logprobs": true,
    "top_logprobs": 5
  }'
```
Here is an example of sending a user request using a Python script:

```python
import openai
# ... (remainder of the script truncated in this diff view)
```
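The diff view elides the rest of the script. As a minimal sketch of such a request with log probabilities enabled, using the `openai` Python client — the base URL, placeholder API key, and response-field access below are assumptions drawn from the examples above and the OpenAI-compatible schema, not code from this commit:

```python
import openai

# Assumed from the deployment example above; adjust host, port, and model as needed.
client = openai.OpenAI(base_url="http://0.0.0.0:8188/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="baidu/ERNIE-4.5-0.3B-Paddle",
    messages=[{"role": "user", "content": "Hello!"}],
    logprobs=True,   # return log probabilities of the output tokens
    top_logprobs=5,  # also return the 5 most likely alternatives per token
)

print(response.choices[0].message.content)

# If the server mirrors the OpenAI response schema, per-token log
# probabilities are available under choices[0].logprobs.content.
if response.choices[0].logprobs is not None:
    for item in response.choices[0].logprobs.content:
        print(item.token, item.logprob)
```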
@@ -55,6 +77,8 @@ The differences in request parameters between FastDeploy and the OpenAI protocol

- `prompt` (supported only in the `v1/completions` interface)
- `messages` (supported only in the `v1/chat/completions` interface)
- `logprobs`: Optional[bool] = False (supported only in the `v1/chat/completions` interface)
- `top_logprobs`: Optional[int] = None (supported only in the `v1/chat/completions` interface; an integer between 0 and 20. `logprobs` must be set to true if this parameter is used)
- `frequency_penalty`: Optional[float] = 0.0
- `max_tokens`: Optional[int] = 16
- `presence_penalty`: Optional[float] = 0.0
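To make the constraints above concrete, here is a small, hypothetical validation helper (not FastDeploy code) that mirrors the stated rules, plus a reminder that a logprob is the natural log of a token's probability:

```python
import math

def validate_logprob_params(logprobs: bool, top_logprobs: int | None) -> None:
    """Hypothetical helper mirroring the parameter rules listed above."""
    if top_logprobs is not None:
        if not logprobs:
            raise ValueError("top_logprobs requires logprobs to be true")
        if not 0 <= top_logprobs <= 20:
            raise ValueError("top_logprobs must be an integer between 0 and 20")

validate_logprob_params(logprobs=True, top_logprobs=5)  # passes

# A logprob of -0.12 corresponds to a probability of about 0.89:
print(math.exp(-0.12))
```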

docs/parameters.md

Lines changed: 1 addition & 0 deletions
@@ -44,6 +44,7 @@ When using FastDeploy to deploy models (including offline inference and service

| ```speculative_config``` | `dict[str]` | Speculative decoding configuration; only standard-format JSON strings are supported, default: None |
| ```dynamic_load_weight``` | `int` | Whether to enable dynamic weight loading, default: 0 |
| ```enable_expert_parallel``` | `bool` | Whether to enable expert parallel |
| ```enable_logprob``` | `bool` | Whether to return log probabilities of the output tokens. If true, the log probabilities of each output token are returned in the message content. If logprob is not used, this parameter can be omitted at startup |

## 1. Relationship between KVCache allocation, ```num_gpu_blocks_override``` and ```block_size```?

docs/zh/online_serving/README.md

Lines changed: 26 additions & 0 deletions
@@ -9,6 +9,17 @@ python -m fastdeploy.entrypoints.openai.api_server \
    --max-model-len 32768
```

To enable logprobs for output tokens, you can deploy quickly with the following command:

```bash
python -m fastdeploy.entrypoints.openai.api_server \
    --model baidu/ERNIE-4.5-0.3B-Paddle \
    --port 8188 --tensor-parallel-size 8 \
    --max-model-len 32768 \
    --enable-logprob
```

For more command-line options for service deployment, see [Parameter Descriptions](../parameters.md).

## Sending User Requests
@@ -26,6 +37,19 @@ curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
  ]
}'
```

An example curl command demonstrating how to include the `logprobs` parameter in a user request (`logprobs` and `top_logprobs` are top-level request fields, not part of the `messages` array):

```bash
curl -X POST "http://0.0.0.0:8188/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [
      {"role": "user", "content": "Hello!"}
    ],
    "logprobs": true,
    "top_logprobs": 5
  }'
```
Here is an example of sending a user request using a Python script:

```python
import openai
# ... (remainder of the script truncated in this diff view)
```
@@ -54,6 +78,8 @@ print('\n')

The differences in request parameters between FastDeploy and the OpenAI protocol are as follows; all other request parameters are ignored:

- `prompt` (supported only in the `v1/completions` interface)
- `messages` (supported only in the `v1/chat/completions` interface)
- `logprobs`: Optional[bool] = False (supported only in the `v1/chat/completions` interface)
- `top_logprobs`: Optional[int] = None (supported only in the `v1/chat/completions` interface; `logprobs` must be set to true if this parameter is used, and the value is an integer between 0 and 20)
- `frequency_penalty`: Optional[float] = 0.0
- `max_tokens`: Optional[int] = 16
- `presence_penalty`: Optional[float] = 0.0

docs/zh/parameters.md

Lines changed: 1 addition & 0 deletions
@@ -43,6 +43,7 @@

| ```speculative_config``` | `dict[str]` | Speculative decoding configuration; only standard-format JSON strings are supported, default: None |
| ```dynamic_load_weight``` | `int` | Whether to load weights dynamically, default: 0 |
| ```enable_expert_parallel``` | `bool` | Whether to enable expert parallel |
| ```enable_logprob``` | `bool` | Whether to return logprobs for output tokens. If logprob is not used, this parameter can be omitted at startup. |

## 1. Relationship between KVCache allocation, ```num_gpu_blocks_override``` and ```block_size```?
