-
Notifications
You must be signed in to change notification settings - Fork 217
Port thrust::offset_iterator
#4685
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
🟨 CI finished in 1h 06m: Pass: 99%/174 | Total: 1d 13h | Avg: 12m 52s | Max: 53m 01s | Hits: 80%/282197
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
c80978b
to
ad433fb
Compare
🟩 CI finished in 1h 05m: Pass: 100%/174 | Total: 1d 11h | Avg: 12m 12s | Max: 44m 33s | Hits: 84%/283333
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
ad433fb
to
c5997ae
Compare
🟨 CI finished in 1h 08m: Pass: 99%/174 | Total: 1d 09h | Avg: 11m 42s | Max: 51m 17s | Hits: 90%/282094
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
c5997ae
to
d4cd8bb
Compare
🟩 CI finished in 1h 13m: Pass: 100%/174 | Total: 1d 07h | Avg: 11m 01s | Max: 42m 49s | Hits: 91%/283333
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
3eee889
to
2c318d0
Compare
🟨 CI finished in 1h 10m: Pass: 74%/174 | Total: 1d 01h | Avg: 8m 42s | Max: 36m 00s | Hits: 99%/156027
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
2c318d0
to
968fbbe
Compare
🟨 CI finished in 1h 08m: Pass: 74%/174 | Total: 1d 01h | Avg: 8m 50s | Max: 35m 27s | Hits: 99%/156027
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
968fbbe
to
172dcdd
Compare
🟨 CI finished in 1h 13m: Pass: 99%/174 | Total: 1d 09h | Avg: 11m 28s | Max: 1h 04m | Hits: 94%/281544
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
🟩 CI finished in 2h 13m: Pass: 100%/174 | Total: 1d 09h | Avg: 11m 30s | Max: 1h 04m | Hits: 94%/283333
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 174)
# | Runner |
---|---|
123 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
10 | linux-amd64-gpu-rtx2080-latest-1 |
6 | linux-amd64-gpu-rtxa6000-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
0713ee3
to
71436a2
Compare
🟨 CI finished in 3h 46m: Pass: 98%/183 | Total: 1d 10h | Avg: 11m 10s | Max: 50m 49s | Hits: 93%/284040
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 183)
# | Runner |
---|---|
129 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
12 | linux-amd64-gpu-rtxa6000-latest-1 |
7 | linux-amd64-gpu-rtx2080-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
71436a2
to
536f57f
Compare
🟩 CI finished in 2h 25m: Pass: 100%/183 | Total: 4d 03h | Avg: 32m 45s | Max: 1h 47m | Hits: 58%/284800
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 183)
# | Runner |
---|---|
129 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
12 | linux-amd64-gpu-rtxa6000-latest-1 |
7 | linux-amd64-gpu-rtx2080-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
🟩 CI finished in 2h 21m: Pass: 100%/183 | Total: 1d 11h | Avg: 11m 39s | Max: 51m 52s | Hits: 91%/284800
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 183)
# | Runner |
---|---|
129 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
12 | linux-amd64-gpu-rtxa6000-latest-1 |
7 | linux-amd64-gpu-rtx2080-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
0c8d9d3
to
afeaecd
Compare
_CCCL_EXEC_CHECK_DISABLE | ||
[[nodiscard]] _LIBCUDACXX_HIDE_FROM_ABI constexpr const _Iter& base() const& noexcept | ||
{ | ||
return __iter_; | ||
} | ||
|
||
//! @brief Extracts the stored iterator we are offsetting from | ||
_CCCL_EXEC_CHECK_DISABLE | ||
[[nodiscard]] _LIBCUDACXX_HIDE_FROM_ABI constexpr _Iter base() && | ||
{ | ||
return _CUDA_VSTD::move(__iter_); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why don't we offer mutable access to the base iterator? _Iter& base()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that might be unsafe. ranges
try to be really explicit in what you can or cannot do. changing the underlying iterator under the hood is dangerous and could potentially break class invariants
_LIBCUDACXX_HIDE_FROM_ABI constexpr offset_iterator& operator++() | ||
{ | ||
++__offset_; | ||
return *this; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this offset_iterator
implements something else than the Thrust one. The offset in Thrust is always a single value, either as an integer or as an indirectly readable. The offset could be a unique_ptr<int>
or a thrust::device_ptr<int>
as well for example. In Thrust, operator++
is implemented via void advance(difference_type n)
, which always moves the base iterator if the offset is indirectly readable.
See also:
cccl/thrust/thrust/iterator/offset_iterator.h
Lines 142 to 145 in d9f3077
if constexpr (indirect_offset) | |
{ | |
this->base_reference() += n; | |
} |
Several of the iterator moving operations here are affected by the different design here.
template <class _OffsetType> | ||
struct __offset_iterator_offset_value_type<_OffsetType, false> | ||
{ | ||
using type = _CUDA_VSTD::iter_value_t<_OffsetType>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should just be
using type = _CUDA_VSTD::iter_value_t<_OffsetType>; | |
using type = decltype(*_CUDA_VSTD::declval<_OffsetType>()); |
to support non-iterator types.
🟩 CI finished in 1h 38m: Pass: 100%/183 | Total: 1d 19h | Avg: 14m 20s | Max: 1h 27m | Hits: 91%/290238
|
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
CUB | |
Thrust | |
CUDA Experimental | |
stdpar | |
python | |
CCCL C Parallel Library | |
Catch2Helper |
Modifications in project or dependencies?
Project | |
---|---|
CCCL Infrastructure | |
+/- | libcu++ |
+/- | CUB |
+/- | Thrust |
+/- | CUDA Experimental |
+/- | stdpar |
+/- | python |
+/- | CCCL C Parallel Library |
+/- | Catch2Helper |
🏃 Runner counts (total jobs: 183)
# | Runner |
---|---|
129 | linux-amd64-cpu16 |
15 | windows-amd64-cpu16 |
12 | linux-arm64-cpu16 |
12 | linux-amd64-gpu-rtxa6000-latest-1 |
7 | linux-amd64-gpu-rtx2080-latest-1 |
5 | linux-amd64-gpu-h100-latest-1 |
3 | linux-amd64-gpu-rtx4090-latest-1 |
This ports
thrust::offset_iterator
tocuda::offset_iterator
.While we are at it add a bunch of tests.
Currently I did not allow conversions / comparisons between differnet specializations, but those can be added rather easily.
For now we also require both the base iterator and the potential offset iterator to be random access.
We could relax this, but that is a considerable effort especially regarding testing and available APIs.