
uses current CUDAStream correctly #118

Merged

merged 3 commits into master on Jul 8, 2025

Conversation

soumith (Member) commented on Jul 8, 2025

Fixes pytorch/pytorch#157363

Thanks to @vlejd for finding, debugging, and reporting the issue.

soumith and others added 2 commits July 8, 2025 10:40
This commit fixes GitHub issue pytorch/pytorch#157363 where custom CUDA
kernels were not properly synchronized with PyTorch's CUDA stream when
used with torch.compile in reduce-overhead mode.

Changes:
- Add #include <ATen/cuda/CUDAContext.h> for getCurrentCUDAStream()
- Use at::cuda::getCurrentCUDAStream() to get PyTorch's current CUDA stream
- Launch all kernels with the correct stream parameter

The issue occurred because the custom kernels were launched on the default
CUDA stream, while PyTorch operations (such as nn.Linear) run on PyTorch's
managed stream. This created a race condition in which a custom kernel could
execute before the PyTorch operations it depends on had completed, producing
incorrect output values.
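
As a rough illustration of that failure mode (not code from this repository), here is a minimal Python sketch; `my_ext.my_op` is a hypothetical placeholder for the extension's custom CUDA op:

```python
import torch
import torch.nn as nn

import my_ext  # hypothetical module exposing the custom CUDA op


class Model(nn.Module):
    def __init__(self, n):
        super().__init__()
        self.linear = nn.Linear(n, n)

    def forward(self, x):
        # nn.Linear runs on PyTorch's current stream; before the fix, the
        # custom kernel launched on the default stream and could read the
        # linear output before it was ready.
        return my_ext.my_op(self.linear(x))


n = 10000  # the original report involved large inputs
model = Model(n).cuda()
compiled = torch.compile(model, mode="reduce-overhead")

x = torch.randn(1, n, device="cuda")
expected = model(x)
actual = compiled(x)  # could differ from `expected` before the fix
print(torch.allclose(expected, actual, atol=1e-4, rtol=1e-4))
```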

With this fix, all custom kernels are properly synchronized with PyTorch's
CUDA stream, ensuring correct execution order and preventing race conditions
when used with torch.compile.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
Added comprehensive tests to verify the fix for GitHub issue pytorch/pytorch#157363:

1. test_compile_with_linear_layer:
   - Tests custom CUDA kernels with nn.Linear + torch.compile
   - Verifies correct behavior with various input sizes (1000, 5000, 10000)
   - Uses reduce-overhead mode to reproduce the original issue conditions

2. test_compile_custom_only:
   - Tests custom operations without linear layers
   - Ensures custom operations work correctly with torch.compile

These tests ensure that custom CUDA kernels properly synchronize with
PyTorch's CUDA stream when used with torch.compile, preventing race
conditions that previously caused incorrect outputs.
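
A condensed sketch of what such tests could look like; the class name and `my_ext.my_op` are hypothetical placeholders, not the repository's actual test code:

```python
import torch
import torch.nn as nn
from torch.testing._internal.common_utils import TestCase, run_tests

import my_ext  # hypothetical module exposing the custom CUDA op


class TestCompileStreamSync(TestCase):
    def test_compile_with_linear_layer(self):
        # nn.Linear feeding the custom op, compiled in reduce-overhead
        # mode, across the input sizes mentioned above.
        for n in (1000, 5000, 10000):
            linear = nn.Linear(n, n).cuda()

            def fn(x):
                return my_ext.my_op(linear(x))

            compiled = torch.compile(fn, mode="reduce-overhead")
            x = torch.randn(1, n, device="cuda")
            self.assertEqual(compiled(x), fn(x))

    def test_compile_custom_only(self):
        # The custom op alone, without any linear layer.
        compiled = torch.compile(my_ext.my_op, mode="reduce-overhead")
        x = torch.randn(1024, device="cuda")
        self.assertEqual(compiled(x), my_ext.my_op(x))


if __name__ == "__main__":
    run_tests()
```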

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
zou3519 (Contributor) left a comment

i see you have embraced claude code

Replace manual tolerance specification with self.assertEqual, which
automatically applies appropriate tolerances for tensor comparisons.
This makes the tests more concise and follows PyTorch testing conventions.
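
For illustration only, a minimal self-contained example of that convention; the "before" line is an assumed reconstruction of the manual-tolerance style, not the repository's actual test code:

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests


class Example(TestCase):
    def test_tolerances(self):
        expected = torch.ones(4)
        actual = expected + 1e-8  # tiny float noise

        # Before: manually specified tolerances (assumed reconstruction)
        self.assertTrue(torch.allclose(actual, expected, atol=1e-4, rtol=1e-4))

        # After: TestCase.assertEqual applies dtype-appropriate default
        # tolerances to tensor comparisons automatically
        self.assertEqual(actual, expected)


if __name__ == "__main__":
    run_tests()
```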

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>
soumith merged commit 0ec4969 into master on Jul 8, 2025
1 of 3 checks passed

Successfully merging this pull request may close these issues.

The op is not compatible with compile mode="reduce-overhead" and linear layers for large inputs.
4 participants