-
Notifications
You must be signed in to change notification settings - Fork 217
Check cuda::memcpy_async
preconditions
#4700
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me would like some other eyes on this
/ok to test 123f07d |
🟨 CI finished in 1h 11m: Pass: 76%/174 | Total: 1d 03h | Avg: 9m 27s | Max: 35m 51s | Hits: 95%/165318
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
/ok to test d7bbc10 |
🟨 CI finished in 2h 45m: Pass: 81%/174 | Total: 1d 05h | Avg: 10m 08s | Max: 35m 08s | Hits: 97%/187940
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
/ok to test 284945f |
🟩 CI finished in 1h 15m: Pass: 100%/174 | Total: 1d 06h | Avg: 10m 31s | Max: 36m 23s | Hits: 97%/282514
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks great!! one more thing that can be useful.
We could check that the pointers live in the right memory space, i.e. shared/global
It is handled inside the |
/ok to test 1590854 |
🟩 CI finished in 5h 34m: Pass: 100%/183 | Total: 1d 09h | Avg: 10m 50s | Max: 44m 33s | Hits: 95%/290820
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 183)
# | Runner |
---|---|
129 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
12 | linux-amd64-gpu-rtxa6000-latest-1 |
7 | linux-amd64-gpu-rtx2080-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
The preconditions of
cuda::memcpy_async
are described in the documentation, however, they are not checked in the code.This PR adds assert statements to
cuda::aligned_size_t
andcuda::memcpy_async
checking the preconditions.