We use pytest-xdist to speed up test execution, invoking `pytest -n auto`.
When tests execute in N processes, the GPU allocation footprint scales by a factor of N as well (in the worst case). This risks running into spurious `OutOfMemoryError` exceptions of our own making.
A simple solution would be to introduce a pytest mark, say `pytest.mark.large`, to mark those tests that make large GPU memory allocations and exclude them with the pytest command-line argument `-m "not large"` in CI jobs.
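A minimal sketch of that approach (the test name and the `conftest.py` registration below are illustrative, not from this issue; registering the marker avoids an unknown-mark warning):

```python
# conftest.py -- register the proposed "large" marker so pytest does not warn about it.
def pytest_configure(config):
    config.addinivalue_line(
        "markers", "large: test makes large GPU memory allocations"
    )
```

```python
# test_allocations.py -- hypothetical test; only the marker usage matters here.
import pytest

@pytest.mark.large
def test_big_matmul():
    ...  # allocate and exercise a large GPU buffer
```

CI jobs would then run `pytest -n auto -m "not large"`, while a separate, non-parallel (or resource-limited) job could still run `pytest -m large`.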
An alternative solution might be to introduce an `exclusive_gpu_use_lock` based on `FileLock`, as in https://pytest-xdist.readthedocs.io/en/latest/how-to.html#making-session-scoped-fixtures-execute-only-once. The lock would be taken around blocks that make and use GPU allocations, and those blocks would need to release their GPU allocations before releasing the lock. Doing so would let JIT-ting steps, allocation and execution steps, and host validation steps overlap across workers, while making sure that the GPU allocation/execution/validation steps themselves are serialized.
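A sketch of that alternative, following the `FileLock` recipe linked above; the fixture name matches the proposal, but the framework-level calls (`compile_kernel`, `run_on_gpu`, `validate_on_host`) are placeholders for whatever a given test actually does:

```python
# conftest.py -- one lock file shared by all pytest-xdist workers.
import pytest
from filelock import FileLock  # same dependency as the pytest-xdist recipe

@pytest.fixture
def exclusive_gpu_use_lock(tmp_path_factory):
    # getbasetemp().parent is the temp directory common to all xdist workers.
    lock_path = tmp_path_factory.getbasetemp().parent / "exclusive_gpu_use.lock"
    return FileLock(str(lock_path))
```

```python
# test_gpu.py -- illustrative use: hold the lock only while GPU memory is live.
def test_gpu_heavy(exclusive_gpu_use_lock):
    kernel = compile_kernel()               # JIT-ting overlaps across workers
    with exclusive_gpu_use_lock:
        device_result = run_on_gpu(kernel)  # allocate + execute under the lock
        host_result = device_result.copy_to_host()
        del device_result                   # drop GPU allocations before the lock is released
    validate_on_host(host_result)           # host validation overlaps across workers
```

The key constraint is inside the `with` block: whatever mechanism the framework provides for freeing device memory must run before the block exits, otherwise one worker could still hold GPU memory while another acquires the lock.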