
Description
How can I run kmcuda synchronously after a TensorRT model performs inference on the same GPU (in a loop)?
For instance, I am already allocating pagelocked buffers for my TensorRT model, but I don't explicitly allocate anything upfront for kmeans_cuda
to run on. Doesn't that mean there could be a conflict if both processes access the GPU and don't fully clean up after themselves?
The error I get the next time TensorRT runs (and only after kmcuda has run):
[TensorRT] ERROR: ../rtSafe/safeContext.cpp (133) - Cudnn Error in configure: 7 (CUDNN_STATUS_MAPPING_ERROR)
[TensorRT] ERROR: FAILED_EXECUTION: std::exception
reported here: NVIDIA/TensorRT#303
So I guess in general my question is: how should/can I clean up after kmcuda runs? The reason I think preallocating buffers might somehow help is that a very similar Stack Overflow issue reported that as the solution (for TensorFlow and TensorRT on the same GPU). Two hedged sketches of what I'm considering follow below.
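One thing I'm considering (just a guess at the root cause, not verified): CUDNN_STATUS_MAPPING_ERROR can apparently occur when two libraries operate in different CUDA contexts on the same device. kmcuda uses the CUDA runtime API, which binds to the device's primary context, while pycuda.autoinit creates a private context for TensorRT. A minimal sketch that makes pycuda retain the primary context so both libraries share it (retain_primary_context, pagelocked_empty, and mem_alloc are standard pycuda calls; the TensorRT parts are abbreviated, and input_size is a hypothetical placeholder for my model's input volume):

```python
import numpy as np
import pycuda.driver as cuda

cuda.init()
dev = cuda.Device(0)
# Retain the device's primary context instead of letting pycuda.autoinit
# create a private one; the CUDA runtime API (used by kmcuda) binds to
# this same primary context, so both libraries share one context.
ctx = dev.retain_primary_context()
ctx.push()
try:
    # ... deserialize the TensorRT engine here ...
    # Pagelocked host buffer + device buffer, as in a typical TensorRT setup.
    input_size = 3 * 224 * 224  # hypothetical placeholder
    h_input = cuda.pagelocked_empty(input_size, dtype=np.float32)
    d_input = cuda.mem_alloc(h_input.nbytes)
    # ... run TensorRT inference, then kmeans_cuda, all inside this one context ...
finally:
    ctx.pop()     # unbind the context from this thread
    ctx.detach()  # release the reference taken by retain_primary_context()
```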
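The other option I can think of (also only a sketch, assuming the kmeans_cuda wrapper documented in libKMCUDA's README): isolate kmcuda in a short-lived child process. The child gets its own CUDA context, which is torn down when the process exits, so it can't leave the parent's TensorRT/cuDNN context in a bad state:

```python
import multiprocessing as mp
import numpy as np

def _kmeans_worker(samples, n_clusters, queue):
    # Import inside the child so CUDA is initialized here, not in the parent.
    from libKMCUDA import kmeans_cuda
    centroids, assignments = kmeans_cuda(samples, n_clusters, seed=3)
    queue.put((centroids, assignments))

def kmeans_in_subprocess(samples, n_clusters):
    ctx = mp.get_context("spawn")  # "spawn" avoids inheriting parent CUDA state
    queue = ctx.Queue()
    proc = ctx.Process(target=_kmeans_worker, args=(samples, n_clusters, queue))
    proc.start()
    result = queue.get()
    proc.join()  # the child's CUDA context is destroyed with the process
    return result

if __name__ == "__main__":
    samples = np.random.rand(10000, 2).astype(np.float32)
    centroids, assignments = kmeans_in_subprocess(samples, 4)
```

The subprocess route costs a process spawn and a pickle round-trip per call, so it only makes sense if kmeans runs occasionally rather than every frame of the loop.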
Environment:
nvcr.io/nvidia/l4t-base:r32.4.4
cuda-10.2
TensorRT 7.1.3