Add NVTX nests guard back in CUB unit test conditionally based on Thrust entries #4583
471486f to a4a67e1
🟨 CI finished in 6h 39m: Pass: 96%/129 | Total: 2d 12h | Avg: 28m 18s | Max: 6h 00m | Hits: 93%/151088
| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| | Thrust |
| | CUDA Experimental |
| | stdpar |
| | python |
| | CCCL C Parallel Library |
| | Catch2Helper |

Modifications in project or dependencies?

| | Project |
|---|---|
| | CCCL Infrastructure |
| | libcu++ |
| +/- | CUB |
| +/- | Thrust |
| +/- | CUDA Experimental |
| +/- | stdpar |
| +/- | python |
| +/- | CCCL C Parallel Library |
| +/- | Catch2Helper |
🏃 Runner counts (total jobs: 129)
# | Runner |
---|---|
89 | linux-amd64-cpu16 |
11 | windows-amd64-cpu16 |
10 | linux-arm64-cpu16 |
6 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
4 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
🟨 CI finished in 4d 17h: Pass: 96%/129 | Total: 2d 12h | Avg: 28m 18s | Max: 6h 00m | Hits: 93%/151088
🟨 CI finished in 6h 38m: Pass: 96%/129 | Total: 2d 18h | Avg: 31m 05s | Max: 6h 00m | Hits: 85%/150958
🟨 CI finished in 7h 06m: Pass: 96%/138 | Total: 2d 17h | Avg: 28m 35s | Max: 6h 00m | Hits: 93%/151488
🏃 Runner counts (total jobs: 138)
# | Runner |
---|---|
95 | linux-amd64-cpu16 |
12 | linux-amd64-gpu-rtxa6000-latest-1 |
11 | windows-amd64-cpu16 |
10 | linux-arm64-cpu16 |
4 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx2080-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
4074a98 to 7084fc6
🟨 CI finished in 6h 40m: Pass: 96%/138 | Total: 2d 17h | Avg: 28m 31s | Max: 6h 00m | Hits: 93%/151488
🟨 CI finished in 16h 16m: Pass: 97%/138 | Total: 2d 17h | Avg: 28m 32s | Max: 6h 00m | Hits: 93%/151488
01a7db4 to 20d06ec
🟨 CI finished in 6h 37m: Pass: 89%/138 | Total: 2d 02h | Avg: 21m 47s | Max: 6h 00m | Hits: 94%/147786
🟨 CI finished in 20h 01m: Pass: 97%/138 | Total: 2d 17h | Avg: 28m 17s | Max: 6h 00m | Hits: 94%/157121
🟨 CI finished in 6h 40m: Pass: 97%/138 | Total: 2d 16h | Avg: 27m 53s | Max: 6h 00m | Hits: 94%/157121
I think the CI test failures are real:
Running
I assume the |
It takes too much time in Thrust-API-heavy unit tests like cub.test.device_segmented_sort_keys.lid_0
cd7fe3e to 3737aa1
I pushed a refactoring that avoids |
🟩 CI finished in 2h 49m: Pass: 100%/138 | Total: 1d 18h | Avg: 18m 24s | Max: 56m 48s | Hits: 95%/162119
Fixes #4768.
Based on top of #4537. This PR restores the NVTX nest guard, which was removed because CUB tests use Thrust algorithms that in turn call CUB primitives underneath, creating nests (which are not problematic).
The nest guard's original purpose is to ensure that we do not add new CUB device algorithms that depend on other CUB algorithms but call them at the device level rather than through the dispatch layer. We add this check back.
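To make the intent concrete, here is a minimal host-side sketch of such a conditional guard. It is not the code in this PR: `thrust_scope`, `cub_range`, and the depth counters are hypothetical names, and the rule it encodes (a CUB range nested in another CUB range is an error unless a Thrust entry point is active on the same thread) is an assumption about the intended behaviour.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical sketch; the names below are illustrative and are not CCCL APIs.
namespace sketch {

// Per-thread count of active Thrust entry points.
inline int& thrust_depth() { thread_local int depth = 0; return depth; }
// Per-thread count of active CUB device-level algorithm ranges.
inline int& cub_depth() { thread_local int depth = 0; return depth; }

// Marks the lifetime of a Thrust algorithm call.
struct thrust_scope {
  thrust_scope() { ++thrust_depth(); }
  ~thrust_scope() { --thrust_depth(); }
  thrust_scope(const thrust_scope&) = delete;
  thrust_scope& operator=(const thrust_scope&) = delete;
};

// Marks the lifetime of a CUB device-level algorithm call.
struct cub_range {
  cub_range() {
    // Assumed rule: nesting is only an error outside of a Thrust entry.
    if (cub_depth() > 0 && thrust_depth() == 0) {
      throw std::logic_error("CUB device algorithm nested in another CUB device algorithm");
    }
    ++cub_depth();
  }
  ~cub_range() {
    assert(cub_depth() > 0);
    --cub_depth();
  }
  cub_range(const cub_range&) = delete;
  cub_range& operator=(const cub_range&) = delete;
};

} // namespace sketch
```

In this sketch a test that opens a `cub_range` inside another `cub_range` fails, while the same nesting under an active `thrust_scope` passes, which mirrors the "conditionally based on Thrust entries" wording in the title.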
Opening a new PR because #4537 is already green and I don't want to re-trigger reviews.