
PaddleOCR application deployed to a Huawei Ascend NPU machine fails at startup with "Call aclInit(nullptr) failed : 507008" #16162

@doudoubingo

🔎 Search before asking

  • I have searched the PaddleOCR Docs and found no similar bug report.
  • I have searched the PaddleOCR Issues and found no similar bug report.
  • I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem description)

I deployed a customized OCR application with Docker on an Ascend Atlas 800I A2 machine, and it fails at startup with the error below.
Startup command: docker run -d --name bcmocr-npu-st --privileged --shm-size=64G -v /usr/local/Ascend/driver:/usr/local/Ascend/driver -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi -v /usr/local/dcmi:/usr/local/dcmi -v /usr/lib/firmware:/usr/lib/firmware -e ASCEND_RT_VISIBLE_DEVICES="0" -e ASCEND_DRIVER_COMPAT_MODE=1 -e TZ=Asia/Shanghai -p 7097:7097 bcmocr-npu-st:0.0.1
Error log:
2025-07-25 19:18:06 [Thread ID: 281473059091504] INFO [bcmocr.ocr_initializer:364]: OCR parameter configuration: {'use_angle_cls': True, 'lang': 'ch', 'text_detection_model_dir': './models/PP-OCRv5/PP-OCRv5_server_det_infer/', 'text_recognition_model_dir': './models/PP-OCRv5/PP-OCRv5_server_rec_infer/', 'rec_batch_num': 6, 'enable_mkldnn': True, 'cpu_threads': 10, 'det_limit_side_len': 960, 'text_det_thresh': 0.2, 'text_det_box_thresh': 0.5, 'text_det_unclip_ratio': 2.0, 'text_rec_score_thresh': 0.0, 'device': 'npu'}
2025-07-25 19:18:06 [Thread ID: 281473059091504] INFO [bcmocr.ocr_initializer:368]: Initializing the PP-OCRv5 engine (full configuration)...
/usr/local/lib/python3.10/dist-packages/bcmocr/ocr_initializer.py:371: UserWarning: lang and ocr_version will be ignored when model names or model directories are not None.
ocr = PaddleOCR(**valid_params)
I0725 19:18:06.995957 1 init.cc:235] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.10/dist-packages/paddle_custom_device
I0725 19:18:06.996003 1 init.cc:144] Try loading custom device libs from: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
Call aclInit(nullptr) failed : 507008 at file /paddle/backends/npu/runtime/runtime.cc line 403
EE1001: [PID: 1] 2025-07-25-19:18:07.265.468 The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
[Init][Version]init soc version failed, ret = 507008[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:5372]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
NPU information on the host:
[root@bms-57652300-001 ~]# npu-smi info
+------------------------------------------------------------------------------------------------+
| npu-smi 24.1.0.3 Version: 24.1.0.3 |
+---------------------------+---------------+----------------------------------------------------+
| NPU Name | Health | Power(W) Temp(C) Hugepages-Usage(page)|
| Chip | Bus-Id | AICore(%) Memory-Usage(MB) HBM-Usage(MB) |
+===========================+===============+====================================================+
| 0 910B2 | OK | 99.3 45 0 / 0 |
| 0 | 0000:C1:00.0 | 0 0 / 0 3373 / 65536 |
+===========================+===============+====================================================+
| 1 910B2 | OK | 89.8 44 0 / 0 |
| 0 | 0000:C2:00.0 | 0 0 / 0 3196 / 65536 |
+===========================+===============+====================================================+
| 2 910B2 | OK | 91.8 47 0 / 0 |
| 0 | 0000:81:00.0 | 0 0 / 0 3196 / 65536 |
+===========================+===============+====================================================+
| 3 910B2 | OK | 90.8 46 0 / 0 |
| 0 | 0000:82:00.0 | 0 0 / 0 3376 / 65536 |
+===========================+===============+====================================================+
| 4 910B2 | OK | 93.8 50 0 / 0 |
| 0 | 0000:01:00.0 | 0 0 / 0 59607/ 65536 |
+===========================+===============+====================================================+
| 5 910B2 | OK | 92.1 50 0 / 0 |
| 0 | 0000:02:00.0 | 0 0 / 0 59604/ 65536 |
+===========================+===============+====================================================+
| 6 910B2 | OK | 93.9 49 0 / 0 |
| 0 | 0000:41:00.0 | 0 0 / 0 59605/ 65536 |
+===========================+===============+====================================================+
| 7 910B2 | OK | 89.2 48 0 / 0 |
| 0 | 0000:42:00.0 | 0 0 / 0 59604/ 65536 |
+===========================+===============+====================================================+
+---------------------------+---------------+----------------------------------------------------+
| NPU Chip | Process id | Process name | Process memory(MB) |
+===========================+===============+====================================================+
| No running processes found in NPU 0 |
+===========================+===============+====================================================+
| No running processes found in NPU 1 |
+===========================+===============+====================================================+
| No running processes found in NPU 2 |
+===========================+===============+====================================================+
| No running processes found in NPU 3 |
+===========================+===============+====================================================+
| 4 0 | 2071090 | mindie_llm_back | 56277 |
+===========================+===============+====================================================+
| 5 0 | 2071092 | mindie_llm_back | 56277 |
+===========================+===============+====================================================+
| 6 0 | 2071099 | mindie_llm_back | 56277 |
+===========================+===============+====================================================+
| 7 0 | 2071106 | mindie_llm_back | 56277 |
+===========================+===============+====================================================+
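Since the host sees all eight NPUs, it may help to compare that with what the container sees. Below is a rough sketch that checks, from inside the container, whether the paths bind-mounted by the startup command exist and which `/dev/davinciN` nodes are visible. The device-node names in `ASSUMED_DEVICE_NODES` are an assumption based on typical Ascend container setups; they do not come from this report, and the helper names are hypothetical.

```python
import os
from pathlib import Path

# Paths bind-mounted by the docker run command in this report.
MOUNTED_PATHS = [
    "/usr/local/Ascend/driver",
    "/usr/local/bin/npu-smi",
    "/usr/local/dcmi",
    "/usr/lib/firmware",
]

# Assumption: device nodes commonly exposed to Ascend containers.
# These names are not taken from this report.
ASSUMED_DEVICE_NODES = [
    "/dev/davinci_manager",
    "/dev/devmm_svm",
    "/dev/hisi_hdc",
]


def check_paths(paths, root="/"):
    """Return {path: exists} for each absolute path, resolved under `root`."""
    base = Path(root)
    return {p: (base / p.lstrip("/")).exists() for p in paths}


def visible_davinci_nodes(root="/"):
    """List the davinciN entries (N numeric) found in `root`/dev."""
    dev = Path(root) / "dev"
    if not dev.is_dir():
        return []
    prefix = "davinci"
    return sorted(
        entry.name
        for entry in dev.iterdir()
        if entry.name.startswith(prefix) and entry.name[len(prefix):].isdigit()
    )


if __name__ == "__main__":
    for path, ok in check_paths(MOUNTED_PATHS + ASSUMED_DEVICE_NODES).items():
        print(("OK      " if ok else "MISSING ") + path)
    print("davinci nodes:", visible_davinci_nodes() or "none")
    print("ASCEND_RT_VISIBLE_DEVICES =", os.environ.get("ASCEND_RT_VISIBLE_DEVICES"))
```

A missing entry would not by itself explain error 507008; the output is only meant as extra context to attach to the report.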
The Dockerfile is as follows:

# Ascend NPU - ARM architecture

FROM ccr-2vdh3abv-pub.cnc.bj.baidubce.com/device/paddle-npu:cann800-ubuntu20-npu-910b-base-aarch64-gcc84

# Set environment variables

ENV LD_LIBRARY_PATH=/usr/local/Ascend/nnal/atb/8.0.0/atb/cxx_abi_1/lib:/usr/local/Ascend/fwkacllib/lib64:/usr/local/Ascend/acllib/lib64:/usr/local/Ascend/driver/lib64:/usr/local/Ascend/ascend-toolkit/8.0.0/aarch64-linux/lib64:/usr/local/Ascend/ascend-toolkit/8.0.0/aarch64-linux/devlib:$LD_LIBRARY_PATH
ENV LD_PRELOAD=/usr/lib/aarch64-linux-gnu/libgomp.so.1:$LD_PRELOAD
ENV ASCEND_HOME_PATH=/usr/local/Ascend/ascend-toolkit/8.0.0

# Create the application directory

RUN mkdir -p /app
WORKDIR /app

COPY aclruntime-0.0.2-cp310-cp310-linux_aarch64.whl .
RUN python -m pip install ./aclruntime-0.0.2-cp310-cp310-linux_aarch64.whl

# Install PaddlePaddle and the custom NPU plugin

RUN python -m pip install paddlepaddle==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/cpu/
RUN python -m pip install paddle-custom-npu==3.0.0rc1 -i https://www.paddlepaddle.org.cn/packages/stable/npu/

# Install other Python dependencies

RUN python -m pip install --no-cache-dir \
    numpy==1.26.4 \
    opencv-python==3.4.18.65 \
    -i https://mirrors.aliyun.com/pypi/simple

# Copy the application code and models

COPY npu-bcmocr-0.0.1.tar.gz .
COPY models ./models/

# Install application dependencies separately to avoid conflicts

RUN python -m pip install --no-cache-dir \
    flask==3.0.3 \
    paddleocr==3.1.0 \
    -i https://mirrors.aliyun.com/pypi/simple

# Install other tool packages

RUN python -m pip install --no-cache-dir \
    pyinstaller \
    setuptools \
    wheel \
    colorama \
    diff_match_patch \
    -i https://mirrors.aliyun.com/pypi/simple

# Finally, install the application package

RUN python -m pip install --no-cache-dir npu-bcmocr-0.0.1.tar.gz

# Force-uninstall simsimd and create a mock version (to avoid conflicts with the NPU)

RUN pip uninstall -y simsimd || true && \
    mkdir -p /usr/local/lib/python3.10/dist-packages/simsimd && \
    printf 'import numpy as np\n\n# Mock simsimd module to avoid NPU conflicts\n# Provide minimal functionality for albucore compatibility\n\ndef cosine(a, b):\n    """Mock cosine similarity function"""\n    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))\n\ndef euclidean(a, b):\n    """Mock euclidean distance function"""\n    return np.linalg.norm(a - b)\n\ndef hamming(a, b):\n    """Mock hamming distance function"""\n    return np.sum(a != b)\n\n# Add other commonly used functions as needed\ndef jaccard(a, b):\n    """Mock jaccard similarity function"""\n    intersection = np.sum(np.minimum(a, b))\n    union = np.sum(np.maximum(a, b))\n    return intersection / union if union > 0 else 0\n\n# Make the module importable\n__version__ = "5.0.0"\n__all__ = ["cosine", "euclidean", "hamming", "jaccard"]\n' > /usr/local/lib/python3.10/dist-packages/simsimd/__init__.py

# Clean up temporary files

RUN rm -f npu-bcmocr-0.0.1.tar.gz && python -m pip cache purge

# Startup command

ENTRYPOINT ["/bin/bash", "-c", "set -e && echo 'Starting BCM OCR service...' && exec python -m bcmocr"]
EXPOSE 7097

🏃‍♂️ Environment (Runtime environment)

Operating system: Euler 22.10; architecture: ARM; Huawei Ascend: Atlas 800I A2; Python: 3.10; deployment method: Docker
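To make the environment details easier to reproduce, here is a small sketch that gathers the same kind of information from inside the container. The environment variable names are the ones that appear in the startup command and Dockerfile of this report; the function name `collect_environment` is hypothetical.

```python
import os
import platform
import sys

# Environment variables that appear in the docker run command
# or the Dockerfile of this report.
REPORTED_ENV_VARS = [
    "ASCEND_RT_VISIBLE_DEVICES",
    "ASCEND_DRIVER_COMPAT_MODE",
    "ASCEND_HOME_PATH",
    "LD_LIBRARY_PATH",
    "LD_PRELOAD",
]


def collect_environment(env=None):
    """Return a dict describing the interpreter, platform, and reported env vars."""
    env = os.environ if env is None else env
    return {
        "machine": platform.machine(),    # CPU architecture as reported by the OS
        "system": platform.platform(),    # OS / kernel description
        "python": ".".join(str(n) for n in sys.version_info[:3]),
        "env": {name: env.get(name) for name in REPORTED_ENV_VARS},
    }


if __name__ == "__main__":
    info = collect_environment()
    for key in ("machine", "system", "python"):
        print(f"{key}: {info[key]}")
    for name, value in info["env"].items():
        print(f"{name}={value}")
```

Unset variables show up as `None`, which makes it easy to spot a value that was expected from the image or the startup command but never reached the process.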

🌰 Minimal Reproducible Example (Minimal demo that reproduces the problem)

Running PaddlePaddle's basic health-check command inside the Docker container produces the same error.
root@2917ae65cc3b:/app# python -c "import paddle; paddle.utils.run_check()"
I0730 15:20:59.151801 18 init.cc:235] ENV [CUSTOM_DEVICE_ROOT]=/usr/local/lib/python3.10/dist-packages/paddle_custom_device
I0730 15:20:59.151852 18 init.cc:144] Try loading custom device libs from: [/usr/local/lib/python3.10/dist-packages/paddle_custom_device]
Call aclInit(nullptr) failed : 507008 at file /paddle/backends/npu/runtime/runtime.cc line 403
EE1001: [PID: 18] 2025-07-30-15:20:59.931.919 The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
Solution: 1.Check the input parameter range of the function. 2.Check the function invocation relationship.
TraceBack (most recent call last):
[Init][Version]init soc version failed, ret = 507008[FUNC:ReportInnerError][FILE:log_inner.cpp][LINE:145]
ctx is NULL![FUNC:GetDevErrMsg][FILE:api_impl.cc][LINE:5372]
The argument is invalid.Reason: rtGetDevMsg execute failed, reason=[context pointer null]
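For repeated testing, the same check can be wrapped in a short script. The sketch below runs the command from this report in a child process, so a failure during NPU initialization does not take down the calling process, and then looks for the `aclInit` failure line in the captured output. The helper names `run_check` and `find_acl_failure` are hypothetical.

```python
import subprocess
import sys

# The same health check used in this report, run as a child process.
PADDLE_CHECK = [sys.executable, "-c", "import paddle; paddle.utils.run_check()"]


def run_check(cmd=PADDLE_CHECK, timeout=300):
    """Run `cmd` and return (returncode, combined stdout/stderr text)."""
    proc = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so log lines keep their order
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stdout


def find_acl_failure(output):
    """Return the first output line reporting an aclInit failure, or None."""
    for line in output.splitlines():
        if "aclInit" in line and "failed" in line:
            return line.strip()
    return None
```

Calling `run_check()` with no arguments inside the container and passing its output to `find_acl_failure` gives a quick pass/fail signal after each change to the image or the startup command.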
