docs/offline_inference.md: 9 additions & 4 deletions
@@ -4,6 +4,7 @@
FastDeploy supports offline inference, where models are loaded locally and user data is processed directly. Usage examples:
### Chat Interface (LLM.chat)
+
```python
from fastdeploy import LLM, SamplingParams
@@ -77,10 +78,12 @@ for output in outputs:
prompt = output.prompt
generated_text = output.outputs.text
```
+
> Note: This is a text completion interface, suitable for scenarios where the user has already constructed the context input and expects the model to output only the continuation. No additional `prompt` template is concatenated during inference.
> For `chat` models, it is recommended to use the Chat Interface (`LLM.chat`).
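
To make the distinction concrete, here is a minimal sketch contrasting the two interfaces, following the call pattern suggested by the examples in this document (the full examples are elided in this diff). The model path, engine arguments, and sampling values below are placeholders, not values taken from this page.

```python
from fastdeploy import LLM, SamplingParams

# Placeholder model path and engine arguments -- adjust to your deployment.
llm = LLM(model="baidu/ERNIE-4.5-0.3B-Paddle", max_model_len=8192)
sampling_params = SamplingParams(top_p=0.95, max_tokens=512)

# Chat interface: pass a list of conversations (OpenAI-style messages);
# the chat template is applied before inference.
messages = [[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Write a short poem about the sea."},
]]
chat_outputs = llm.chat(messages, sampling_params)

# Text completion interface: each prompt string is used as-is,
# with no additional template concatenated.
prompts = ["The sea at dusk looks like"]
gen_outputs = llm.generate(prompts, sampling_params)

for output in chat_outputs:
    print(output.prompt, output.outputs.text)
for output in gen_outputs:
    print(output.prompt, output.outputs.text)
```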
For multimodal models such as `baidu/ERNIE-4.5-VL-28B-A3B-Paddle`, the prompt passed to the `generate` interface must include images. The usage is as follows:
> Note: The `generate` interface does not currently support parameters for toggling the thinking function (on/off); the model's default settings are always used.
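
The document's full multimodal example is not shown in this diff. Purely as a hypothetical sketch — the `multimodal_data` field name, the prompt layout, and the engine arguments below are assumptions about the interface rather than confirmed API, and a real prompt may additionally require the model's chat template — a call with an image might look like:

```python
import io

import requests
from PIL import Image

from fastdeploy import LLM, SamplingParams

# Placeholder image URL; any reachable image works for this sketch.
image_bytes = requests.get("https://example.com/demo.jpg").content
image = Image.open(io.BytesIO(image_bytes))

# Engine and sampling arguments are placeholders, not documented values.
llm = LLM(model="baidu/ERNIE-4.5-VL-28B-A3B-Paddle", max_model_len=32768)
sampling_params = SamplingParams(temperature=0.8, max_tokens=512)

# Assumed prompt structure: the text prompt plus attached image objects
# under a "multimodal_data" key (field name is an assumption).
outputs = llm.generate(
    prompts={
        "prompt": "What era is the artifact in the picture from?",
        "multimodal_data": {"image": [image]},
    },
    sampling_params=sampling_params,
)

print(outputs[0].outputs.text)
```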
## 2. API Documentation
@@ -159,12 +163,12 @@ For ```LLM``` configuration, refer to [Parameter Documentation](parameters.md).