Commit 3744b71

[Docs] Update deployment documentation (#2435)

* Update
* Refine docs
* Add version control
* basic inference -> quick inference
* Remove deprecated docs
* Remove tutorial list in serving docs

1 parent 6e0c094 commit 3744b71

File tree: 9 files changed (+249, -312 lines)

docs/pipeline_deploy/high_performance_inference.en.md

Lines changed: 62 additions & 36 deletions

@@ -8,11 +8,11 @@ In real-world production environments, many applications have stringent standard

## 1. Installation and Usage of High-Performance Inference Plugins

-Before using the high-performance inference plugins, ensure you have completed the installation of PaddleX according to the [PaddleX Local Installation Tutorial](../installation/installation.en.md), and have successfully run the basic inference of the pipeline using either the PaddleX pipeline command line instructions or the Python script instructions.
+Before using the high-performance inference plugins, ensure you have completed the installation of PaddleX according to the [PaddleX Local Installation Tutorial](../installation/installation.en.md), and have successfully run the quick inference of the pipeline using either the PaddleX pipeline command line instructions or the Python script instructions.

### 1.1 Installing High-Performance Inference Plugins

-Find the corresponding installation command based on your processor architecture, operating system, device type, and Python version in the table below and execute it in your deployment environment:
+Find the corresponding installation command based on your processor architecture, operating system, device type, and Python version in the table below and execute it in your deployment environment. Please replace `{paddlex version number}` with the actual paddlex version number, such as the current latest stable version `3.0.0b2`. If you need to use the version corresponding to the development branch, replace `{paddlex version number}` with `0.0.0.dev0`.

<table>
<tr>
@@ -29,33 +29,33 @@ Find the corresponding installation command based on your processor architecture
</tr>
<tr>
<td>3.8</td>
-<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py | python3.8 - --arch x86_64 --os linux --device cpu --py 38</td>
+<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.8 - --arch x86_64 --os linux --device cpu --py 38</td>
</tr>
<tr>
<td>3.9</td>
-<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py | python3.9 - --arch x86_64 --os linux --device cpu --py 39</td>
+<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.9 - --arch x86_64 --os linux --device cpu --py 39</td>
</tr>
<tr>
<td>3.10</td>
-<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py | python3.10 - --arch x86_64 --os linux --device cpu --py 310</td>
+<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.10 - --arch x86_64 --os linux --device cpu --py 310</td>
</tr>
<tr>
<td rowspan="3">GPU&nbsp;(CUDA&nbsp;11.8&nbsp;+&nbsp;cuDNN&nbsp;8.6)</td>
<td>3.8</td>
-<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py | python3.8 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 38</td>
+<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.8 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 38</td>
</tr>
<tr>
<td>3.9</td>
-<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py | python3.9 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 39</td>
+<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.9 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 39</td>
</tr>
<tr>
<td>3.10</td>
-<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/latest/install_paddlex_hpi.py | python3.10 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 310</td>
+<td>curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/{paddlex version number}/install_paddlex_hpi.py | python3.10 - --arch x86_64 --os linux --device gpu_cuda118_cudnn86 --py 310</td>
</tr>
</table>

-* When the device type is GPU, please use the installation instructions corresponding to the CUDA and cuDNN versions that match your environment. Otherwise, you will not be able to use the high-performance inference plugin properly.
* For Linux systems, execute the installation instructions using Bash.
+* When using NVIDIA GPUs, please use the installation instructions corresponding to the CUDA and cuDNN versions that match your environment. Otherwise, you will not be able to use the high-performance inference plugin properly.
* When the device type is CPU, the installed high-performance inference plugin only supports inference using the CPU; for other device types, the installed high-performance inference plugin supports inference using the CPU or other devices.

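For example, substituting the stable version into the Linux x86_64 / CPU / Python 3.10 row of the table above gives the following command (a sketch; adjust the version, device type, and Python version to match your environment):

```bash
# Example: install the high-performance inference plugin for Linux x86_64,
# CPU-only inference, Python 3.10, pinned to PaddleX version 3.0.0b2.
curl -s https://paddle-model-ecology.bj.bcebos.com/paddlex/PaddleX3.0/deploy/paddlex_hpi/install_script/3.0.0b2/install_paddlex_hpi.py | python3.10 - --arch x86_64 --os linux --device cpu --py 310
```
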
### 1.2 Obtaining Serial Numbers and Activation
@@ -77,37 +77,37 @@ Please note: Each serial number can only be bound to a unique device fingerprint

### 1.3 Enabling High-Performance Inference Plugins

-Before enabling high-performance plugins, please ensure that the `LD_LIBRARY_PATH` of the current environment does not specify the TensorRT directory, as the plugins already integrate TensorRT to avoid conflicts caused by different TensorRT versions that may prevent the plugins from functioning properly.
+For Linux systems, if using the high-performance inference plugin in a Docker container, please mount the host machine's `/dev/disk/by-uuid` and `${HOME}/.baidu/paddlex/licenses` directories to the container.

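As an illustrative sketch of the two mounts described above (the image name and in-container mount points are placeholders, not taken from this document):

```bash
# Hypothetical example: start a container with the host's /dev/disk/by-uuid
# (read-only) and the PaddleX license directory mounted, so the plugin can
# read the device fingerprint and the activated license inside the container.
docker run -it \
    -v /dev/disk/by-uuid:/dev/disk/by-uuid:ro \
    -v "${HOME}/.baidu/paddlex/licenses:/root/.baidu/paddlex/licenses" \
    {your PaddleX image} /bin/bash
```
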
For PaddleX CLI, specify `--use_hpip` and set the serial number to enable the high-performance inference plugin. If you wish to activate the license online, specify `--update_license` when using the serial number for the first time. Taking the general image classification pipeline as an example:

-```diff
+```bash
paddlex \
    --pipeline image_classification \
    --input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
    --device gpu:0 \
-+ --use_hpip \
-+ --serial_number {serial_number}
+    --use_hpip \
+    --serial_number {serial_number}

-# If you wish to activate the license online
+# If you wish to perform online activation
paddlex \
    --pipeline image_classification \
    --input https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg \
    --device gpu:0 \
-+ --use_hpip \
-+ --serial_number {serial_number} \
-+ --update_license
+    --use_hpip \
+    --serial_number {serial_number} \
+    --update_license
```

For PaddleX Python API, enabling the high-performance inference plugin is similar. Still taking the general image classification pipeline as an example:

-```diff
+```python
from paddlex import create_pipeline

pipeline = create_pipeline(
    pipeline="image_classification",
-+ use_hpip=True,
-+ serial_number="{serial_number}",
+    use_hpip=True,
+    hpi_params={"serial_number": "{serial_number}"},
)

output = pipeline.predict("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_image_classification_001.jpg")
@@ -117,35 +117,61 @@ The inference results obtained with the high-performance inference plugin enable

### 1.4 Modifying High-Performance Inference Configurations

-PaddleX provides default high-performance inference configurations for each model and stores them in the model's configuration file. Due to the diversity of actual deployment environments, using the default configurations may not achieve ideal performance in specific environments or may even result in inference failures. For situations where the default configurations cannot meet requirements, you can try changing the model's inference backend as follows:
+PaddleX combines model information and runtime environment information to provide default high-performance inference configurations for each model. These default configurations are carefully prepared to be applicable in several common scenarios and to achieve relatively optimal performance, so users typically do not need to concern themselves with their specific details. However, due to the diversity of actual deployment environments and requirements, the default configuration may not yield ideal performance in certain scenarios and could even result in inference failures. In cases where the default configuration does not meet requirements, users can manually adjust it by modifying the `Hpi` field in the `inference.yml` file within the model directory (if this field does not exist, it needs to be added). The following are two common situations:

-1. Locate the `inference.yml` file in the model directory and find the `Hpi` field.
+- Switching inference backends:

-2. Modify the value of `selected_backends`. Specifically, `selected_backends` may be set as follows:
+When the default inference backend is not available, the inference backend needs to be switched manually. Users should modify the `selected_backends` field (if it does not exist, it needs to be added).

```yaml
-selected_backends:
+Hpi:
+  ...
+  selected_backends:
    cpu: paddle_infer
    gpu: onnx_runtime
+  ...
```
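Before editing, it can help to inspect the current defaults. A minimal sketch, assuming the `Hpi` field is already present in the file and `{model directory}` is a placeholder for your model's path:

```bash
# Print the Hpi section (and the following lines) of the model's
# inference configuration for review before editing.
grep -A 20 "Hpi:" {model directory}/inference.yml
```
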

-Each entry is formatted as `{device_type}: {inference_backend_name}`. The default selects the backend with the shortest inference time in the official test environment. `supported_backends` lists the inference backends supported by the model in the official test environment for reference.
+Each entry should follow the format `{device type}: {inference backend name}`.

The currently available inference backends are:

-* `paddle_infer`: The standard Paddle Inference engine. Supports CPU and GPU.
-* `paddle_tensorrt`: [Paddle-TensorRT](https://www.paddlepaddle.org.cn/lite/v2.10/optimize/paddle_trt.html), a high-performance deep learning inference library produced by Paddle, which integrates TensorRT in the form of subgraphs for further optimization and acceleration. Supports GPU only.
-* `openvino`: [OpenVINO](https://github.com/openvinotoolkit/openvino), a deep learning inference tool provided by Intel, optimized for model inference performance on various Intel hardware. Supports CPU only.
-* `onnx_runtime`: [ONNX Runtime](https://onnxruntime.ai/), a cross-platform, high-performance inference engine. Supports CPU and GPU.
-* `tensorrt`: [TensorRT](https://developer.nvidia.com/tensorrt), a high-performance deep learning inference library provided by NVIDIA, optimized for NVIDIA GPUs to improve speed. Supports GPU only.
+* `paddle_infer`: The Paddle Inference engine. Supports CPU and GPU. Compared with PaddleX quick inference, it can additionally integrate TensorRT subgraphs to enhance inference performance on GPUs.
+* `openvino`: [OpenVINO](https://github.com/openvinotoolkit/openvino), a deep learning inference tool provided by Intel, optimized for model inference performance on various Intel hardware. Supports CPU only. The high-performance inference plugin automatically converts the model to the ONNX format and uses this engine for inference.
+* `onnx_runtime`: [ONNX Runtime](https://onnxruntime.ai/), a cross-platform, high-performance inference engine. Supports CPU and GPU. The high-performance inference plugin automatically converts the model to the ONNX format and uses this engine for inference.
+* `tensorrt`: [TensorRT](https://developer.nvidia.com/tensorrt), a high-performance deep learning inference library provided by NVIDIA, optimized for NVIDIA GPUs to improve speed. Supports GPU only. The high-performance inference plugin automatically converts the model to the ONNX format and uses this engine for inference.

-Here are some key details of the current official test environment:
+- Modifying dynamic shape configurations for Paddle Inference or TensorRT:

-* CPU: Intel Xeon Gold 5117
-* GPU: NVIDIA Tesla T4
-* CUDA Version: 11.8
-* cuDNN Version: 8.6
-* Docker: registry.baidubce.com/paddlepaddle/paddle:latest-dev-cuda11.8-cudnn8.6-trt8.5-gcc82
+Dynamic shape is the ability of TensorRT to defer specifying parts or all of a tensor’s dimensions until runtime. If the default dynamic shape configuration does not meet requirements (e.g., the model may require input shapes beyond the default range), users need to modify the `trt_dynamic_shapes` or `dynamic_shapes` field in the inference backend configuration:
+
+```yaml
+Hpi:
+  ...
+  backend_configs:
+    # Configuration for the Paddle Inference backend
+    paddle_infer:
+      ...
+      trt_dynamic_shapes:
+        x:
+          - [1, 3, 300, 300]
+          - [4, 3, 300, 300]
+          - [32, 3, 1200, 1200]
+      ...
+    # Configuration for the TensorRT backend
+    tensorrt:
+      ...
+      dynamic_shapes:
+        x:
+          - [1, 3, 300, 300]
+          - [4, 3, 300, 300]
+          - [32, 3, 1200, 1200]
+      ...
+```
+
+In `trt_dynamic_shapes` or `dynamic_shapes`, each input tensor requires a specified dynamic shape in the format `{input tensor name}: [{minimum shape}, {optimal shape}, {maximum shape}]`. For details on minimum, optimal, and maximum shapes and further information, please refer to the official TensorRT documentation.
+
+After completing the modifications, please delete the cache files in the model directory (`shape_range_info.pbtxt` and files starting with `trt_serialized`).

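For instance, a minimal sketch of clearing these caches (`{model directory}` is a placeholder for your model's path):

```bash
# Remove stale shape/engine cache files so the modified dynamic shape
# configuration takes effect on the next inference run.
rm -f {model directory}/shape_range_info.pbtxt {model directory}/trt_serialized*
```
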

## 2. Pipelines and Models Supporting High-Performance Inference Plugins
