[cuda.parallel]: CI testing should exclude tests making large GPU memory allocations #4722

Closed
oleksandr-pavlyk opened this issue May 16, 2025 · 0 comments · Fixed by #4723
Labels
cuda.parallel For all items related to the cuda.parallel Python module

Comments

@oleksandr-pavlyk
Contributor

We use pytest-xdist to speed up test execution and invoke pytest -n auto.

When tests execute in N processes, the GPU allocation footprint also scales by a factor of N in the worst case. This risks spurious, self-inflicted OutOfMemoryError exceptions.
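As a rough illustration of the worst case, the sketch below multiplies a per-test allocation by the number of worker processes and compares it to the device memory. All numbers and the function name are hypothetical and are not measurements from the cuda.parallel test suite.

```python
GIB = 1024**3


def worst_case_footprint(num_workers: int, per_test_bytes: int) -> int:
    """Peak GPU memory if every worker holds its largest allocation at once."""
    return num_workers * per_test_bytes


# Hypothetical figures: 8 xdist workers, 4 GiB per large test, 24 GiB device.
num_workers = 8
per_test_bytes = 4 * GIB
device_bytes = 24 * GIB

needed = worst_case_footprint(num_workers, per_test_bytes)
print(f"worst case: {needed // GIB} GiB needed, {device_bytes // GIB} GiB available")
print("may run out of memory" if needed > device_bytes else "fits")
```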

A simple solution would be to introduce a pytest mark, say pytest.mark.large, to mark tests that make large GPU memory allocations, and to exclude them in CI jobs with the pytest command-line argument -m "not large".
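A minimal sketch of the marker approach, assuming the marker is registered from a conftest.py hook. The test name, the size constant, and the marker description are hypothetical; the actual change in the repository may look different.

```python
import pytest


# conftest.py: register the custom marker so pytest does not warn about it.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "large: test makes large GPU memory allocations"
    )


# test module: tag the tests that need a lot of device memory.
NUM_ITEMS = 2**30  # hypothetical input size


@pytest.mark.large
def test_large_reduction():
    # Placeholder body: a real test would allocate and reduce a device
    # array of NUM_ITEMS elements here.
    nbytes = NUM_ITEMS * 8
    assert nbytes >= 2**30
```

CI jobs would then run pytest -n auto -m "not large", while a plain pytest invocation still runs the marked tests.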

An alternative solution might be to introduce an exclusive_gpu_use_lock based on FileLock, as in https://pytest-xdist.readthedocs.io/en/latest/how-to.html#making-session-scoped-fixtures-execute-only-once

The lock would be held around the blocks that make and use GPU allocations, and those blocks would need to release their GPU allocations before releasing the lock. This would let different processes overlap JIT-compilation steps, allocation and execution steps, and host validation steps with one another, while ensuring that the GPU allocation/execution/validation steps themselves are serialized.
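The locking idea could be sketched as follows. To keep the example dependency-free, it uses the POSIX-only fcntl.flock from the standard library as a stand-in for FileLock; with the filelock package, the context manager body would be a "with FileLock(path):" block instead. The lock path and the run_gpu_test helper with its four callables are hypothetical.

```python
import contextlib
import fcntl
import os
import tempfile

# Hypothetical lock file shared by all xdist workers on the same machine.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "exclusive_gpu_use.lock")


@contextlib.contextmanager
def exclusive_gpu_use_lock(path=LOCK_PATH):
    """Hold an exclusive inter-process lock for the duration of the block."""
    with open(path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks while another process holds it
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def run_gpu_test(build, allocate, execute, validate):
    """Serialize only the part of a test that holds GPU allocations."""
    algorithm = build()  # JIT step, may overlap with other workers
    with exclusive_gpu_use_lock():
        buffers = allocate()
        result = execute(algorithm, buffers)  # assumed to return host data
        # Drop the reference before the lock is released; this assumes the
        # allocator frees device memory once the buffers are unreferenced.
        del buffers
    validate(result)  # host-side validation, may overlap with other workers
    return result
```

Only the allocate and execute steps are serialized across workers in this sketch; any validation that reads device memory would have to move inside the locked block.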

oleksandr-pavlyk self-assigned this May 16, 2025
oleksandr-pavlyk added the cuda.parallel label May 16, 2025
github-project-automation bot moved this to Todo in CCCL May 16, 2025
cccl-authenticator-app bot moved this from Todo to In Review in CCCL May 16, 2025
github-project-automation bot moved this from In Review to Done in CCCL May 19, 2025
Projects
Status: Done