Error err_str:cudaErrorIllegalAddress happened randomly. #659

@namogg

I have been running a YOLOv11 model on DeepStream for quite some time, and I get this error at random (for example, after 12 hours or a day of running). Here are the logs:

ERROR: nvdsinfer_context_impl.cpp:343 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1751 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:343 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1751 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:343 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1751 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
Error: gst-stream-error-quark: Buffer conversion failed (1): gstnvinfer.cpp(1574): gst_nvinfer_process_full_frame (): /GstPipeline:pipeline0/GstBin:coco_pipeline_bin/GstNvInfer:coco-detect
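Because CUDA reports errors asynchronously, the call that logs error 700 (here, the stream wait on an event) is not necessarily the operation that performed the illegal access. When digging through a long log, it can help to extract where each CUDA error was reported and how often. The following is a minimal sketch that assumes the log format shown above; the regular expression and the `parse_cuda_errors` helper are illustrative and not part of DeepStream.

```python
import re
from collections import Counter

# Sample lines copied from the log above.
LOG = """\
ERROR: nvdsinfer_context_impl.cpp:343 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1751 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
ERROR: nvdsinfer_context_impl.cpp:343 Failed to make stream wait on event, cuda err_no:700, err_str:cudaErrorIllegalAddress
ERROR: nvdsinfer_context_impl.cpp:1751 Preprocessor transform input data failed., nvinfer error:NVDSINFER_CUDA_ERROR
"""

# Assumed pattern for lines that carry a CUDA error code (hypothetical helper regex).
CUDA_ERROR = re.compile(
    r"ERROR: (?P<file>[\w.]+):(?P<line>\d+) (?P<msg>.*?), "
    r"cuda err_no:(?P<code>\d+), err_str:(?P<name>\w+)"
)

def parse_cuda_errors(text):
    """Return one dict per log line that reports a CUDA error code."""
    errors = []
    for line in text.splitlines():
        match = CUDA_ERROR.search(line)
        if match:
            entry = match.groupdict()
            entry["line"] = int(entry["line"])
            entry["code"] = int(entry["code"])
            errors.append(entry)
    return errors

errors = parse_cuda_errors(LOG)
# Count occurrences per (error code, error name, reporting source line).
counts = Counter((e["code"], e["name"], e["line"]) for e in errors)
for (code, name, line), n in counts.items():
    print(f"{name} ({code}) reported at source line {line}: {n} time(s)")
```

Lines without a `cuda err_no` field, such as the `NVDSINFER_CUDA_ERROR` follow-up messages, are skipped on purpose, since they only repeat the earlier failure.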

Here is part of my configuration (it should be similar to the sample configuration; the only difference is that I converted my model using trtexec, based on the conversion config below):

[property]
gpu-id=0
net-scale-factor=0.0039215697906911373
model-color-format=0
onnx-file=/server/server_assets/models/Coco/coco_v11.onnx
model-engine-file=/server/server_assets/models/Coco/coco_v11.engine
#int8-calib-file=calib.table
labelfile-path=labels.txt
batch-size=50
network-mode=0
num-detected-classes=80
interval=0
gie-unique-id=10
process-mode=1
network-type=0
cluster-mode=2
maintain-aspect-ratio=1
symmetric-padding=1
#workspace-size=2000
parse-bbox-func-name=NvDsInferParseYolo
#parse-bbox-func-name=NvDsInferParseYoloCuda
custom-lib-path=/server/src/recognitions/coco/libnvdsinfer_custom_impl_Yolo.so
engine-create-func-name=NvDsInferYoloCudaEngineGet
#filter-out-class-ids=2

[class-attrs-all]
nms-iou-threshold=0.45
pre-cluster-threshold=0.25
topk=300


[conversion]
input_name = input
min_shape = (1, 3, 640, 640)
opt_shape = (50, 3, 640, 640)
max_shape = (50, 3, 640, 640)
atol = 1e-4
network-mode=2
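As a quick sanity check of the two sections above, the sketch below compares the nvinfer `batch-size` against the batch dimension of `max_shape`, and the `network-mode` values in both sections. It assumes that `[conversion]` uses the same precision codes as nvinfer (0 = FP32, 1 = INT8, 2 = FP16); the `check_config` helper is hypothetical, and the config text is trimmed to the relevant keys. Whether a precision mismatch has anything to do with the crash is not established here.

```python
import ast
import configparser

# Precision codes of nvinfer's network-mode property.
PRECISION = {0: "FP32", 1: "INT8", 2: "FP16"}

# Relevant keys copied from the configuration above.
CONFIG_TEXT = """
[property]
batch-size=50
network-mode=0

[conversion]
input_name = input
min_shape = (1, 3, 640, 640)
opt_shape = (50, 3, 640, 640)
max_shape = (50, 3, 640, 640)
network-mode=2
"""

def check_config(text):
    """Return a list of human-readable inconsistencies between the sections."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    prop, conv = parser["property"], parser["conversion"]
    findings = []

    # The runtime batch size must fit inside the engine's maximum shape.
    batch = prop.getint("batch-size")
    max_batch = ast.literal_eval(conv["max_shape"])[0]
    if batch > max_batch:
        findings.append(f"batch-size {batch} exceeds max_shape batch {max_batch}")

    # Assumption: both sections use the same precision encoding.
    runtime_mode = prop.getint("network-mode")
    build_mode = conv.getint("network-mode")
    if runtime_mode != build_mode:
        findings.append(
            "network-mode differs: "
            f"[property]={PRECISION[runtime_mode]}, [conversion]={PRECISION[build_mode]}"
        )
    return findings

findings = check_config(CONFIG_TEXT)
for finding in findings:
    print(finding)
```

With the values above, the batch size check passes (50 fits in a maximum batch of 50), and the only finding is the `network-mode` difference between the two sections.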

Is this a known problem, and is there any way to avoid it?
