[cuTENSOR] Automatically enable/disable TensorFloat32 #40
base: master
Conversation
Force-pushed from ec627be to 5c03ab7
Greptile Overview
Greptile Summary
This PR adds a new Python package for cuTENSOR with PyTorch and TensorFlow bindings, implementing automatic TensorFloat32 (TF32) detection based on GPU architecture. The core improvement queries torch.backends.cuda.matmul.allow_tf32 and tf.config.experimental.tensor_float_32_execution_enabled() at build time to conditionally define the CUTENSOR_USE_TF32 macro, ensuring the wrapper falls back to standard FP32 on non-Ampere GPUs. The PR also updates several C++ cuTENSOR examples to use CUTENSOR_COMPUTE_32F instead of CUTENSOR_R_MIN_32F, removing hardcoded TF32 assumptions and making examples universally compatible. The Python package includes full build infrastructure (setup.py, MANIFEST.in, package metadata), comprehensive test suites for both frameworks, C++ extension kernels with shape inference, and utility functions for Einstein notation parsing. Additionally, build configuration is updated to support Windows and add the new einsum example target.
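The build-time selection the summary describes can be sketched roughly as follows. This is a hypothetical helper, not the PR's actual code; the `tf32_nvcc_flags` name is illustrative:

```python
def tf32_nvcc_flags(allow_tf32):
    """Map a framework's TF32 switch onto nvcc preprocessor defines.

    When TF32 is enabled (the frameworks gate this on Ampere-or-newer
    GPUs), CUTENSOR_USE_TF32 is defined; otherwise the extension is
    compiled against plain FP32 so it also runs on pre-Ampere hardware.
    """
    return ["-DCUTENSOR_USE_TF32"] if allow_tf32 else []


# At build time the extension's compile arguments would then include,
# for example:
#   nvcc_args = common_nvcc_args + tf32_nvcc_flags(
#       torch.backends.cuda.matmul.allow_tf32)
```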
Changed Files
| Filename | Score | Overview |
|---|---|---|
| .gitignore | 5/5 | Adds root-level .gitignore to exclude .vscode/ directory from version control |
| cuTENSOR/python/cutensor/__init__.py | 4/5 | New package init file importing metadata from package_info module |
| cuTENSOR/python/MANIFEST.in | 5/5 | Adds directive to include header files in source distribution |
| cuTENSOR/python/einsum.h | 0/5 | Empty file content provided - cannot review |
| cuTENSOR/elementwise_permute.cu | 5/5 | Removes unused argc/argv parameters from main function signature |
| cuTENSOR/python/cutensor/tensorflow/__init__.py | 4/5 | New TensorFlow wrapper init exposing einsum function; shebang has incorrect spacing |
| cuTENSOR/elementwise_trinary.cu | 5/5 | Removes unused command-line parameters from main function |
| cuTENSOR/elementwise_binary.cu | 5/5 | Removes unused argc/argv parameters from main function |
| cuTENSOR/python/cutensor/torch/__init__.py | 4/5 | New PyTorch wrapper init exposing einsum functionality |
| cuTENSOR/python/README.md | 2/5 | Adds documentation with critical syntax errors: malformed markdown link and missing colon in function definition |
| cuTENSOR/README.md | 5/5 | New README with Linux/Windows build instructions; minor formatting issues |
| cuTENSOR/Makefile | 4/5 | Adds einsum target and switches to CUTENSOR_ROOT env var; einsum missing from clean target |
| cuTENSOR/python/.gitignore | 5/5 | Comprehensive Python .gitignore covering build artifacts and development environments |
| cuTENSOR/python/cutensor/c_extensions.py | 4/5 | Defines extension build configuration for PyTorch and TensorFlow; shebang spacing issue |
| cuTENSOR/reduction.cu | 5/5 | Changes compute type from TF32 to standard FP32; removes unused main parameters |
| cuTENSOR/python/cutensor/package_info.py | 4/5 | Adds package metadata; download URL points to non-existent repository path |
| cuTENSOR/python/cutensor/tensorflow/einsum_module.cc | 2/5 | Creates Python C extension module; incorrectly calls Py_Initialize() in module init |
| cuTENSOR/contraction.cu | 5/5 | Switches from TF32 to FP32 compute type; removes unused main parameters |
| cuTENSOR/CMakeLists.txt | 1/5 | Adds Windows support and cublasLt dependency; critical bug with CUTENSOR_LIBRARY_DEF undefined on non-Windows |
| cuTENSOR/contraction_simple.cu | 5/5 | Changes compute type to FP32 and makes algorithm configurable; removes unused parameters |
| cuTENSOR/python/cutensor/torch/einsum.py | 3/5 | Implements PyTorch einsum with autograd; lacks validation for empty/malformed einsum paths |
| cuTENSOR/einsum.cu | 4/5 | New einsum example with templated implementation; minor memory allocation inefficiency |
| cuTENSOR/python/cutensor/torch/einsum_test.py | 4/5 | Comprehensive PyTorch tests; contains redundant self-assignments |
| cuTENSOR/python/cutensor/tensorflow/einsum_kernel.cc | 4/5 | TensorFlow kernel implementation; workspace size calculation may truncate |
| cuTENSOR/python/cutensor/tensorflow/einsum_test.py | 4/5 | Comprehensive TensorFlow tests; shebang syntax error |
| cuTENSOR/python/cutensor/common.py | 4/5 | Einsum subscript normalization utility; typo in exception message and missing input validation |
| cuTENSOR/python/cutensor/torch/einsum.cc | 3/5 | PyTorch C++ bindings; TF32 conditional logic for float32 not visible in this file |
| cuTENSOR/contraction_plan_cache.cu | 4/5 | New plan caching example; missing error checks on cudaMemcpy and malloc null check |
| cuTENSOR/python/cutensor/tensorflow/einsum.py | 3/5 | TensorFlow einsum wrapper; glob unpacking will crash if binding file count != 1 |
| cuTENSOR/python/cutensor/c_extensions_utils.py | 2/5 | Build utilities with TF32 detection; missing nvcc null check and return statement in Tensorflow method |
| cuTENSOR/python/setup.py | 2/5 | Package setup script; imports from cutensor package before installation will cause ImportError |
| cuTENSOR/python/cutensor/tensorflow/einsum_ops.cc | 2/5 | TensorFlow op registration; uses assert() which causes undefined behavior in release builds |
Confidence score: 2/5
- This PR requires careful review and testing before merging due to multiple critical bugs that will cause runtime failures
- Score reflects serious issues: CMake configuration bug on Linux (CUTENSOR_LIBRARY_DEF undefined), Python package installation failures (imports before setup), missing null checks causing crashes (nvcc path, malloc), and unsafe assertions in TensorFlow ops
- Pay close attention to cuTENSOR/CMakeLists.txt (Linux build failure), cuTENSOR/python/setup.py (installation broken), cuTENSOR/python/cutensor/c_extensions_utils.py (nvcc crash), cuTENSOR/python/cutensor/tensorflow/einsum_ops.cc (release build undefined behavior), and cuTENSOR/python/cutensor/tensorflow/einsum_module.cc (Py_Initialize misuse)
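One of the flagged crashes, `find_executable('nvcc')` returning `None`, can be guarded along these lines. This is a sketch, not the PR's code; the `locate_cuda_home` name and the injectable `which` parameter are illustrative:

```python
import os
import shutil


def locate_cuda_home(which=shutil.which):
    """Find the CUDA toolkit root from nvcc's location on PATH.

    Raises a clear error up front instead of letting
    os.path.dirname(None) blow up later, as the review warns.
    """
    nvcc = which("nvcc")
    if nvcc is None:
        raise RuntimeError("nvcc not found on PATH; "
                           "install the CUDA toolkit or set CUDA_HOME")
    # nvcc normally lives at <cuda_home>/bin/nvcc
    return os.path.dirname(os.path.dirname(nvcc))
```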
Additional Comments (40)
- `cuTENSOR/python/einsum.h`, lines 378-380 (logic): Memory leak: cachelines are allocated with `new[]` but never freed. Consider using `unique_ptr` or documenting ownership transfer to cuTENSOR. Does `cutensorHandleAttachPlanCachelines` take ownership of the cachelines array?
- `cuTENSOR/python/cutensor/tensorflow/__init__.py`, line 1 (syntax): The space before `/usr/bin/python` in the shebang breaks shebang functionality.
- `cuTENSOR/python/README.md`, line 3 (syntax): Broken markdown syntax: the backtick before the opening square bracket should be removed.
- `cuTENSOR/python/README.md`, line 41 (syntax): Missing colon at the end of the function definition.
- `cuTENSOR/README.md`, line 7 (style): Trailing whitespace after 'Linux'.
- `cuTENSOR/README.md`, line 36 (style): Add `%` wrappers around `PATH`.
- `cuTENSOR/Makefile`, line 14 (logic): The `einsum` binary is built but not removed by `make clean`.
- `cuTENSOR/python/cutensor/c_extensions.py`, line 1 (syntax): Remove the space after `#!` in the shebang (it should read `#!/usr/bin/python`).
- `cuTENSOR/python/cutensor/package_info.py`, line 43 (logic): The download URL path appears incorrect: it should be `/tree/master/cuTENSOR/python` rather than `/tree/master/cuTENSOR/cutensor`, since the package lives in the `python` directory, not a `cutensor` subdirectory.
- `cuTENSOR/python/cutensor/tensorflow/einsum_module.cc`, line 45 (logic): `Py_Initialize()` should not be called in module initialization; Python is already initialized at that point, and the call can cause crashes or double-initialization issues. Remove it.
- `cuTENSOR/CMakeLists.txt`, lines 71-76 (logic): `CUTENSOR_LIBRARY_DEF` is only set for `WIN32` but is used unconditionally on line 79.
- `cuTENSOR/python/cutensor/torch/einsum.py`, lines 109-136 (logic): Missing error handling if the contraction path is empty or the tensors list becomes empty during iteration; this could raise `UnboundLocalError` if `result` is never assigned. What happens when `np.einsum_path` returns an empty path, or when the initial tensors list is empty?
- `cuTENSOR/einsum.cu`, line 258 (syntax): Typo in comment: 'wor_raw' should be 'work_raw'.
- `cuTENSOR/einsum.cu`, lines 396-397 (style): Memory is allocated for `B_raw` even when `B_shape` is empty (i.e. for reduction-only operations). Should the allocation be skipped when `B_shape` is empty?
- `cuTENSOR/python/cutensor/torch/einsum_test.py`, line 138 (style): Redundant self-assignment with no effect.
- `cuTENSOR/python/cutensor/torch/einsum_test.py`, line 149 (style): Redundant self-assignment with no effect.
- `cuTENSOR/python/cutensor/torch/einsum_test.py`, line 208 (style): Redundant self-assignment with no effect.
- `cuTENSOR/python/cutensor/torch/einsum_test.py`, line 215 (style): Redundant self-assignment with no effect.
- `cuTENSOR/python/cutensor/tensorflow/einsum_kernel.cc`, line 97 (logic): The `work_tensor_size` calculation assumes `worksize` is always divisible by `sizeof(float)`; if it is not, the division truncates and allocates insufficient workspace. Is `worksize` guaranteed to be a multiple of `sizeof(float)`?
- `cuTENSOR/python/cutensor/tensorflow/einsum_kernel.cc`, lines 84-85 (style): An `Einsum` object is constructed on every `Compute()` call. If construction is expensive and the equation is fixed, consider caching the `Einsum` instance as a member variable.
- `cuTENSOR/python/cutensor/tensorflow/einsum_test.py`, line 1 (syntax): The space between `#!` and the Python path will prevent execution on some systems.
- `cuTENSOR/python/cutensor/common.py`, line 1 (syntax): The shebang has an unnecessary space after `#!`.
- `cuTENSOR/python/cutensor/common.py`, line 41 (syntax): 'Elipsis' is misspelled; it should be 'Ellipsis'.
- `cuTENSOR/python/cutensor/common.py`, lines 34-36 (logic): Splitting on `->` is not validated; if the subscript contains multiple `->` delimiters, everything after the second element is silently ignored. Should this validate that the subscript contains exactly one `->`?
- `cuTENSOR/python/cutensor/torch/einsum.cc`, lines 40-52 (logic): The PR description mentions automatically enabling/disabling TensorFloat32 based on PyTorch configuration, but these type traits are hardcoded to always use `CUTENSOR_R_MIN_32F` for half/bfloat16 types. For float32 types (not shown in these specializations), the compute type should conditionally check `torch::allow_tf32_cublas()` to select either the TF32 or the standard F32 compute type. Where is the TF32 auto-detection logic for float32 tensors? The PR description states this addresses unsupported TF32 on non-Ampere GPUs, but this file does not show conditional compute-type selection.
- `cuTENSOR/contraction_plan_cache.cu`, line 371 (logic): Missing error check on the `cudaMemcpy` return value; if the copy fails, the timing results will be invalid.
- `cuTENSOR/contraction_plan_cache.cu`, lines 192-193 (logic): Missing null check after `malloc`; if the allocation fails, the program will crash on line 193.
- `cuTENSOR/python/cutensor/tensorflow/einsum.py`, line 44 (logic): Unpacking a single value from the glob result will fail if multiple binding files match the pattern or none exist; consider error handling for both cases.
- `cuTENSOR/python/cutensor/tensorflow/einsum.py`, line 82 (syntax): `tf.logging.warn` is deprecated in TensorFlow 2.x.
- `cuTENSOR/python/cutensor/c_extensions_utils.py`, lines 43-47 (logic): `find_executable('nvcc')` can return `None` if nvcc is not in `PATH`. This would cause an `AttributeError` on line 44 (`os.path.dirname(None)`) and an error on line 47 (`.group(1)` on a `None` match result).
- `cuTENSOR/python/cutensor/c_extensions_utils.py`, line 73 (syntax): Typo: 'seperate' should be 'separate'.
- `cuTENSOR/python/cutensor/c_extensions_utils.py`, line 127 (logic): Missing return statement; the `Tensorflow` method should return `ext` (as `Torch` does on line 96).
- `cuTENSOR/python/setup.py`, lines 34-41 (logic): Imports from the `cutensor` package before it is installed; these will fail during initial setup unless `cutensor` is already on `sys.path` or loaded dynamically at setup time. Consider moving the package metadata to a file that does not depend on `cutensor` imports, or loading it with `importlib`. How is the `cutensor` package available during setup? Is there an `__init__.py` that makes these imports work before installation?
- `cuTENSOR/python/setup.py`, lines 43-50 (style): Missing common `setup()` parameters such as `author`, `author_email`, `classifiers`, `python_requires`, `install_requires`, and `long_description`; these help with package discovery and dependency management on PyPI.
- `cuTENSOR/python/cutensor/tensorflow/einsum_ops.cc`, line 54 (logic): `assert` terminates in debug builds and is compiled out in release builds, leaving undefined behavior if the condition is false; use `TF_RET_CHECK` or return a proper error `Status` instead.
- `cuTENSOR/python/cutensor/tensorflow/einsum_ops.cc`, line 59 (logic): Same issue: `assert` in shape inference can leave undefined behavior in release builds.
- `cuTENSOR/python/cutensor/tensorflow/einsum_ops.cc`, line 65 (logic): `assert` provides no error feedback to TensorFlow; check the condition and return an error `Status`.
- `cuTENSOR/python/cutensor/tensorflow/einsum_ops.cc`, line 70 (logic): Replace `assert` with proper error handling that returns a `Status`.
- `cuTENSOR/python/cutensor/tensorflow/einsum_ops.cc`, line 66 (style): `modeA.size()` returns `size_t` (unsigned); comparing it with a signed `int` can cause issues if `Rank` returns negative values, or for very large sizes.
- `cuTENSOR/python/cutensor/tensorflow/einsum_ops.cc`, lines 81-82 (logic): If `modeC` contains a character not present in `dim_map`, `operator[]` will insert a default-constructed (invalid) `DimensionHandle`; validate that all modes in `modeC` exist in `dim_map`.
32 files reviewed, 40 comments
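Several of the comments above suggest small defensive fixes; the `->` validation flagged in `common.py`, for instance, could look roughly like this (an illustrative `split_subscript` helper, not the package's actual function):

```python
def split_subscript(subscript):
    """Split an einsum subscript string into input specs and an output spec.

    Rejects multiple '->' delimiters, which a bare str.split('->')
    would silently swallow.
    """
    parts = subscript.replace(" ", "").split("->")
    if len(parts) > 2:
        raise ValueError(
            "invalid einsum subscript %r: more than one '->'" % subscript)
    inputs = parts[0].split(",")
    # No '->' means the output spec is implicit, as in np.einsum.
    output = parts[1] if len(parts) == 2 else None
    return inputs, output
```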
Hi @Linux-cpp-lisp! We have changed the license of the repository from BSD-3 to Apache 2.0 to allow accepting external contributions in the right way.
What this means for this PR: Thanks!
(Thanks to @springer13 for making your work on a PyTorch cuTENSOR wrapper public!)
Currently, the Python cuTENSOR wrapper always uses TensorFloat32 as the compute dtype for 32-bit float tensors, which is unsupported on non-Ampere GPUs. This PR uses the PyTorch and TensorFlow configuration options for TensorFloat32 (which autodetect Ampere) to set the compute dtype to normal 32-bit float when TF32 is not supported.
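The configuration switches the description refers to can be queried at runtime roughly like this (a best-effort sketch; each flag exists only in sufficiently recent torch / TensorFlow builds, so missing frameworks are simply skipped):

```python
def frameworks_allow_tf32():
    """Report each installed framework's TF32 switch.

    Returns a dict like {"torch": True, "tensorflow": False};
    frameworks that are not installed (or too old to expose the
    flag) are omitted.
    """
    flags = {}
    try:
        import torch
        flags["torch"] = bool(torch.backends.cuda.matmul.allow_tf32)
    except Exception:  # torch missing, or flag absent in this version
        pass
    try:
        import tensorflow as tf
        flags["tensorflow"] = bool(
            tf.config.experimental.tensor_float_32_execution_enabled())
    except Exception:  # TF missing, or pre-2.4 without the TF32 API
        pass
    return flags
```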