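Hi, when running the `contraction_jit` sample under Valgrind (`valgrind --tool=memcheck`), the program crashes with a segmentation fault during `cutensorWriteKernelCacheToFile` (called at `contraction_jit.cu:482`). Memcheck reports an invalid read of size 1 in `memmove`, at an address that is not stack'd, malloc'd or recently free'd. The same binary completes successfully without Valgrind.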
Environment:
- OS: Ubuntu 20.04
- CUDA: 12.8
- CUDA driver: 570.172.08
- cuTENSOR: 2.1.0
- Valgrind: 3.15.0
Repro steps:
- Build the `contraction_jit` sample.
- Run it once so that `kernelCache.bin` is generated in the working directory.
- Then run it under Valgrind:
  `valgrind --tool=memcheck --track-origins=yes ./contraction_jit`
Output (a normal run, which completes successfully, followed by the same binary under Valgrind):
Total memory: 0.38 GiB
Kernel cache found and read successfully.
cuTENSOR : 865 GFLOPs/s
cuTENSOR JIT: 866 GFLOPs/s
Speedup: 1.0x
JIT Compilation time: 0.0 seconds (Kernel cache file was read successfully; Compilation was not required)
Kernel cache written to file. Will be read in next execution.
Successful completion
root@1c52d552ef27:~/CUDALibrarySamples/cuTENSOR# valgrind --tool=memcheck --track-origins=yes ./contraction_jit
==1035== Memcheck, a memory error detector
==1035== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1035== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1035== Command: ./contraction_jit
==1035==
==1035== Warning: set address range perms: large range [0x484a000, 0x1dd2a000) (defined)
==1035== Warning: set address range perms: large range [0x1e159000, 0x5044b000) (defined)
Total memory: 0.38 GiB
==1035== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x4b with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: set address range perms: large range [0x200000000, 0xa00200000) (noaccess)
==1035== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x49 with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x21 with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x1b with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x44 with no size/direction hints.
==1035== This could cause spurious value errors to appear.
==1035== See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: set address range perms: large range [0x7a400000, 0x8c3ff000) (noaccess)
Kernel cache found and read successfully.
cuTENSOR : 812 GFLOPs/s
cuTENSOR JIT: 812 GFLOPs/s
Speedup: 1.0x
JIT Compilation time: 0.0 seconds (Kernel cache file was read successfully; Compilation was not required)
==1035== Invalid read of size 1
==1035== at 0x4842A39: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1035== by 0x4E382A1: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035== by 0x4E37419: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035== by 0x4DC1E0D: cutensorWriteKernelCacheToFile (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035== by 0x112994: main (contraction_jit.cu:482)
==1035== Address 0xc90385cc is not stack'd, malloc'd or (recently) free'd
==1035==
==1035==
==1035== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1035== Access not within mapped region at address 0xC90385CC
==1035== at 0x4842A39: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1035== by 0x4E382A1: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035== by 0x4E37419: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035== by 0x4DC1E0D: cutensorWriteKernelCacheToFile (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035== by 0x112994: main (contraction_jit.cu:482)
==1035== If you believe this happened as a result of a stack
==1035== overflow in your program's main thread (unlikely but
==1035== possible), you can try to increase the size of the
==1035== main thread stack using the --main-stacksize= flag.
==1035== The main thread stack size used in this run was 8388608.
==1035==
==1035== HEAP SUMMARY:
==1035== in use at exit: 737,323,325 bytes in 307,997 blocks
==1035== total heap usage: 450,372 allocs, 142,375 frees, 900,451,450 bytes allocated
==1035==
==1035== LEAK SUMMARY:
==1035== definitely lost: 0 bytes in 0 blocks
==1035== indirectly lost: 0 bytes in 0 blocks
==1035== possibly lost: 143,632 bytes in 1,417 blocks
==1035== still reachable: 737,179,693 bytes in 306,580 blocks
==1035== suppressed: 0 bytes in 0 blocks
==1035== Rerun with --leak-check=full to see details of leaked memory
==1035==
==1035== For lists of detected and suppressed errors, rerun with: -s
==1035== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
Segmentation fault