cuTENSOR: Valgrind reports segfault when running contraction_jit due to invalid memory access #280

@Hanfei147

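Hi, when running the contraction_jit sample under Valgrind (valgrind --tool=memcheck), the program crashes with a segmentation fault inside cutensorWriteKernelCacheToFile. Valgrind reports an invalid read of size 1 in memmove at an unmapped address (0xC90385CC). The same binary completes successfully when run without Valgrind.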

Environment:

  • OS: Ubuntu 20.04
  • CUDA: 12.8
  • CUDA driver: 570.172.08
  • cuTENSOR: 2.1.0
  • Valgrind: 3.15.0

Repro Steps:

  1. Build the contraction_jit sample.
  2. Run it once so that kernelCache.bin is generated.
  3. Run it again under Valgrind:
valgrind --tool=memcheck --track-origins=yes ./contraction_jit

Output (a run without Valgrind first, then the Valgrind run):

Total memory: 0.38 GiB
Kernel cache found and read successfully.
cuTENSOR    :    865 GFLOPs/s
cuTENSOR JIT:    866 GFLOPs/s
Speedup: 1.0x
JIT Compilation time: 0.0 seconds (Kernel cache file was read successfully; Compilation was not required)
Kernel cache written to file. Will be read in next execution.
Successful completion
root@1c52d552ef27:~/CUDALibrarySamples/cuTENSOR# valgrind --tool=memcheck --track-origins=yes ./contraction_jit
==1035== Memcheck, a memory error detector
==1035== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1035== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==1035== Command: ./contraction_jit
==1035==
==1035== Warning: set address range perms: large range [0x484a000, 0x1dd2a000) (defined)
==1035== Warning: set address range perms: large range [0x1e159000, 0x5044b000) (defined)
Total memory: 0.38 GiB
==1035== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x4b with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: set address range perms: large range [0x200000000, 0xa00200000) (noaccess)
==1035== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x49 with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x21 with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x1b with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: noted but unhandled ioctl 0x44 with no size/direction hints.
==1035==    This could cause spurious value errors to appear.
==1035==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==1035== Warning: set address range perms: large range [0x7a400000, 0x8c3ff000) (noaccess)
Kernel cache found and read successfully.
cuTENSOR    :    812 GFLOPs/s
cuTENSOR JIT:    812 GFLOPs/s
Speedup: 1.0x
JIT Compilation time: 0.0 seconds (Kernel cache file was read successfully; Compilation was not required)
==1035== Invalid read of size 1
==1035==    at 0x4842A39: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1035==    by 0x4E382A1: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x4E37419: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x4DC1E0D: cutensorWriteKernelCacheToFile (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x112994: main (contraction_jit.cu:482)
==1035==  Address 0xc90385cc is not stack'd, malloc'd or (recently) free'd
==1035==
==1035==
==1035== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1035==  Access not within mapped region at address 0xC90385CC
==1035==    at 0x4842A39: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1035==    by 0x4E382A1: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x4E37419: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x4DC1E0D: cutensorWriteKernelCacheToFile (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x112994: main (contraction_jit.cu:482)
==1035==  If you believe this happened as a result of a stack
==1035==  overflow in your program's main thread (unlikely but
==1035==  possible), you can try to increase the size of the
==1035==  main thread stack using the --main-stacksize= flag.
==1035==  The main thread stack size used in this run was 8388608.
==1035==
==1035== HEAP SUMMARY:
==1035==     in use at exit: 737,323,325 bytes in 307,997 blocks
==1035==   total heap usage: 450,372 allocs, 142,375 frees, 900,451,450 bytes allocated
==1035==
==1035== LEAK SUMMARY:
==1035==    definitely lost: 0 bytes in 0 blocks
==1035==    indirectly lost: 0 bytes in 0 blocks
==1035==      possibly lost: 143,632 bytes in 1,417 blocks
==1035==    still reachable: 737,179,693 bytes in 306,580 blocks
==1035==         suppressed: 0 bytes in 0 blocks
==1035== Rerun with --leak-check=full to see details of leaked memory
==1035==
==1035== For lists of detected and suppressed errors, rerun with: -s
==1035== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 4 from 4)
Segmentation fault
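
For triage, the failing stack can be separated from the ioctl warnings with a small filter. This is a minimal sketch and not part of the sample: `first_error_frames` and `sample_log` are hypothetical names, the log text is copied from the report above, and the regex assumes simple symbol names without spaces, as they appear in this log.

```python
import re

# Matches a Valgrind stack-frame line such as
# "==1035==    at 0x4842A39: memmove (in /usr/lib/...)".
# Assumption: the symbol name contains no spaces (true for this log).
FRAME_RE = re.compile(r"^==\d+==\s+(?:at|by) 0x[0-9A-Fa-f]+: (\S+) \((.*)\)\s*$")

# Matches the header of a Memcheck invalid-access error block.
ERROR_RE = re.compile(r"^==\d+== Invalid (?:read|write) of size \d+")


def first_error_frames(log_text):
    """Return (symbol, location) pairs for the first invalid read/write block."""
    frames = []
    in_error = False
    for line in log_text.splitlines():
        if not in_error:
            # Skip everything until the first error header is seen.
            in_error = ERROR_RE.match(line) is not None
            continue
        match = FRAME_RE.match(line)
        if match is None:
            break  # the first non-frame line ends the stack trace
        frames.append((match.group(1), match.group(2)))
    return frames


# Excerpt of the Valgrind output from this report.
sample_log = """\
==1035== Warning: noted but unhandled ioctl 0x44 with no size/direction hints.
==1035== Invalid read of size 1
==1035==    at 0x4842A39: memmove (in /usr/lib/x86_64-linux-gnu/valgrind/vgpreload_memcheck-amd64-linux.so)
==1035==    by 0x4E382A1: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x4E37419: ??? (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x4DC1E0D: cutensorWriteKernelCacheToFile (in /opt/nvidia/hpc_sdk/Linux_x86_64/25.3/math_libs/12.8/targets/x86_64-linux/lib/libcutensor.so.2.1.0)
==1035==    by 0x112994: main (contraction_jit.cu:482)
==1035==  Address 0xc90385cc is not stack'd, malloc'd or (recently) free'd
"""

for symbol, location in first_error_frames(sample_log):
    print(symbol, "<-", location)
```

Applied to the full log, this leaves only the five frames between memmove and main, which shows that the faulting read happens two unnamed frames below cutensorWriteKernelCacheToFile, called from contraction_jit.cu:482.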
