CCCL 2.x ‐ CCCL 3.0 migration guide
The CCCL team plans breaking changes carefully and ships them only in major releases. The CCCL 2.8 release came with many deprecations to prepare for the breaking changes made in CCCL 3.0. This page summarizes those changes and helps you migrate from CCCL 2.x to CCCL 3.0.

See also the list of all deprecated APIs in CCCL 2.8 and the list of breaking changes in CCCL 3.0.
- `CUB_IS_INT128_ENABLED`: No replacement
- `CUB_MAX(a, b)`: Use `cuda::std::max(a, b)` instead
- `CUB_MIN(a, b)`: Use `cuda::std::min(a, b)` instead
- `CUB_QUOTIENT_CEILING(a, b)`: Use `cuda::ceil_div(a, b)` instead
- `CUB_QUOTIENT_FLOOR(a, b)`: Use plain integer division `a / b` instead
- `CUB_ROUND_DOWN_NEAREST(a, b)`: Use `cuda::round_down(a, b)` instead
- `CUB_ROUND_UP_NEAREST(a, b)`: Use `cuda::round_up(a, b)` instead
- `CUB_RUNTIME_ENABLED`: No replacement
- `CUB_USE_COOPERATIVE_GROUPS`: No replacement
- `CUDA_CUB_RET_IF_FAIL`: No replacement
- `[THRUST|CUB]_CLANG_VERSION`: No replacement
- `[THRUST|CUB]_DEVICE_COMPILER*`: No replacement
- `[THRUST|CUB]_GCC_VERSION`: No replacement
- `[THRUST|CUB]_HOST_COMPILER*`: No replacement
- `[THRUST|CUB]_INCLUDE_DEVICE_CODE`: No replacement
- `[THRUST|CUB]_INCLUDE_HOST_CODE`: No replacement
- `[THRUST|CUB]_IS_DEVICE_CODE`: No replacement
- `[THRUST|CUB]_IS_HOST_CODE`: No replacement
- `[THRUST|CUB]_MSVC_VERSION_FULL`: No replacement
- `[THRUST|CUB]_MSVC_VERSION`: No replacement
- `THRUST_CDP_DISPATCH`: No replacement (support for CUDA Dynamic Parallelism V1 (CDPv1) has been removed, see below)
- `THRUST_DECLTYPE_RETURNS_WITH_SFINAE_CONDITION`: No replacement
- `THRUST_DECLTYPE_RETURNS`: No replacement
- `THRUST_DEVICE_CODE`: No replacement
- `THRUST_HOST_BACKEND`: Use `THRUST_HOST_SYSTEM` instead
- `THRUST_INLINE_CONSTANT`: Use `inline constexpr` instead
- `THRUST_INLINE_INTEGRAL_MEMBER_CONSTANT`: Use `static constexpr` instead
- `THRUST_LEGACY_GCC`: No replacement
- `THRUST_MODERN_GCC_REQUIRED_NO_ERROR`: No replacement
- `THRUST_MODERN_GCC`: No replacement
- `THRUST_MVCAP`: No replacement
- `THRUST_NODISCARD`: Use `[[nodiscard]]` instead
- `THRUST_RETOF1`: No replacement
- `THRUST_RETOF2`: No replacement
- `THRUST_RETOF`: No replacement
- `THRUST_TUNING_ARCH`: No direct replacement. Use the compiler-specific `__CUDA_ARCH__` (nvcc) or `__NVCOMPILER_CUDA_ARCH__` (nvc++) instead
- `_ReadWriteBarrier` and `__thrust_compiler_fence`: Use `cuda::atomic` instead
- `cub::*Kernel`: Any CUB kernel entry point is considered an implementation detail. No public exposure is provided.
- `cub::Agent*`: CUB agents are considered implementation details and have all been moved to internal namespaces. No public exposure is provided.
- `cub::AliasTemporaries`: No replacement
- `cub::ArrayWrapper`: Use `cuda::std::array` instead
- `cub::BAR`: No replacement
- `cub::BaseTraits::CATEGORY`: Use the facilities from `<cuda/std/type_traits>` instead
- `cub::BaseTraits::NULL_TYPE`: No replacement
- `cub::BaseTraits::PRIMITIVE`: Use the facilities from `<cuda/std/type_traits>` instead
- `cub::BFE`: Use `cuda::bitfield_extract` instead
- `cub::BFI`: Use `cuda::bitfield_insert` instead
- `cub::BinaryOpHasIdxParam::HAS_PARAM`: Use `cub::BinaryOpHasIdxParam::value` instead
- `cub::ConstantInputIterator`: Use `thrust::constant_iterator` instead
- `cub::CountingInputIterator`: Use `thrust::counting_iterator` instead
- `cub::CTA_SYNC_AND`: Use `__syncthreads_and()` instead
- `cub::CTA_SYNC_OR`: Use `__syncthreads_or()` instead
- `cub::CTA_SYNC`: Use `__syncthreads()` instead
- `cub::Device*Policy`: These policy hubs are considered implementation details. No public exposure is provided.
- `cub::DeviceSpmv`: Use cuSPARSE instead
- `cub::Difference`: Use `cuda::std::minus` instead
- `cub::DivideAndRoundUp`: Use `cuda::ceil_div` instead
- `cub::Division`: Use `cuda::std::divides` instead
- `cub::Equality`: Use `cuda::std::equal_to` instead
- `cub::FFMA_RZ`: No replacement
- `cub::FMUL_RZ`: No replacement
- `cub::FpLimits<T>`: Use `cuda::std::numeric_limits<T>` instead
- `cub::GridBarrier`: Use the APIs from cooperative groups instead
- `cub::GridBarrierLifetime`: Use the APIs from cooperative groups instead
- `cub::IADD3`: No replacement
- `cub::Inequality`: Use `cuda::std::not_equal_to` instead
- `cub::Int2Type`: Use `cuda::std::integral_constant` instead
- `cub::IterateThreadLoad`: No replacement
- `cub::IterateThreadStore`: No replacement
- `cub::KernelConfig`: No replacement
- `cub::LaneId()`: Use `cuda::ptx::get_sreg_laneid()` instead
- `cub::LaneMaskGe()`: Use `cuda::ptx::get_sreg_lanemask_ge()` instead
- `cub::LaneMaskGt()`: Use `cuda::ptx::get_sreg_lanemask_gt()` instead
- `cub::LaneMaskLe()`: Use `cuda::ptx::get_sreg_lanemask_le()` instead
- `cub::LaneMaskLt()`: Use `cuda::ptx::get_sreg_lanemask_lt()` instead
- `cub::MakePolicyWrapper`: No replacement
- `cub::Max`: Use `cuda::maximum` instead
- `cub::max`: Use `cuda::std::max` instead
- `cub::MemBoundScaling`: No replacement
- `cub::Min`: Use `cuda::minimum` instead
- `cub::min`: Use `cuda::std::min` instead
- `cub::Mutex`: Use `std::mutex` instead
- `cub::PolicyWrapper`: No replacement
- `cub::PRMT`: Use `cuda::ptx::prmt()` instead
- `cub::RegBoundScaling`: No replacement
- `cub::SHFL_IDX_SYNC`: Use `__shfl_sync()` instead
- `cub::SHL_ADD`: No replacement
- `cub::SHR_ADD`: No replacement
- `cub::Sum`: Use `cuda::std::plus` instead
- `cub::Swap(a, b)`: Use `cuda::std::swap(a, b)` instead
- `cub::ThreadTrap()`: Use `cuda::std::terminate()` instead
- `cub::TransformInputIterator`: Use `thrust::transform_iterator` instead
- `cub::TripleChevronFactory`: No replacement for now; we are working on a new kernel launch facility
- `cub::ValueCache`: No replacement
- `cub::WARP_ALL`: Use `__all_sync()` instead
- `cub::WARP_ANY`: Use `__any_sync()` instead
- `cub::WARP_BALLOT`: Use `__ballot_sync()` instead
- `cub::WARP_SYNC`: Use `__syncwarp()` instead
- `cub::WarpId()`: Use `cuda::ptx::get_sreg_warpid()` instead
- `thrust::*::[first_argument_type|second_argument_type|result_type]`: The nested aliases have been removed for all function object types: `thrust::[plus|minus|multiplies|divides|modulus|negate|square|equal_to|not_equal_to|greater|less|greater_equal|less_equal|logical_and|logical_or|logical_not|bit_and|bit_or|bit_xor|identity|maximum|minimum|project1st|project2nd]`. No replacement.
- `thrust::[unary|binary]_function`: No replacement. If you inherit from one of these types, just remove those base classes.
- `thrust::[unary|binary]_traits`: No replacement
- `thrust::async::*`: No replacement for now. We are working on a C++26 senders implementation. To make a Thrust algorithm skip syncing, use `thrust::cuda::par_nosync` as the execution policy.
- `thrust::bidirectional_universal_iterator_tag`: No replacement
- `thrust::conjunction_value<Ts...>`: Use `cuda::std::bool_constant<(Ts && ...)>` instead
- `thrust::conjunction_value_v<Ts...>`: Use a fold expression `(Ts && ...)` instead
- `thrust::cuda_cub::core::*`: These are considered implementation details. No public exposure is provided.
- `thrust::cuda_cub::counting_iterator_t`: Use `thrust::counting_iterator` instead
- `thrust::cuda_cub::identity`: Use `cuda::std::identity` instead
- `thrust::cuda_cub::launcher::triple_chevron`: No replacement for now; we are working on a new kernel launch facility
- `thrust::cuda_cub::terminate`: Use `cuda::std::terminate()` instead
- `thrust::cuda_cub::transform_input_iterator_t`: Use `thrust::transform_iterator` instead
- `thrust::cuda_cub::transform_pair_of_input_iterators_t`: Use a `thrust::transform_iterator` over a `thrust::zip_iterator` instead
- `thrust::disjunction_value<Ts...>`: Use `cuda::std::bool_constant<(Ts || ...)>` instead
- `thrust::disjunction_value_v<Ts...>`: Use a fold expression `(Ts || ...)` instead
- `thrust::forward_universal_iterator_tag`: No replacement
- `thrust::identity<T>`: Use `cuda::std::identity` instead. If `thrust::identity` was used to perform a cast to `T`, please define your own function object.
- `thrust::input_universal_iterator_tag`: No replacement
- `thrust::negation_value<T>`: Use `cuda::std::bool_constant<!T>` instead
- `thrust::negation_value_v<T>`: Use a plain negation `!T` instead
- `thrust::not[1|2]`: Use `cuda::std::not_fn` instead
- `thrust::null_type`: No replacement
- `thrust::numeric_limits<T>`: Use `cuda::std::numeric_limits<T>` instead
- `thrust::optional<T>`: Use `cuda::std::optional<T>` instead
- `thrust::output_universal_iterator_tag`: No replacement
- `thrust::random_access_universal_iterator_tag`: No replacement
- `thrust::remove_cvref[_t]`: Use `cuda::std::remove_cvref[_t]` instead
- `thrust::void_t`: Use `cuda::std::void_t` instead
- `cub::Traits<T>::Max()`: Use `cuda::std::numeric_limits<T>::max()` instead
- `cub::Traits<T>::Min()`: Use `cuda::std::numeric_limits<T>::min()` instead
- `cub::MergePathSearch`: No replacement
- `thrust::iterator_difference[_t]<T>`: Use `cuda::std::iterator_traits<T>::difference_type` or `cuda::std::iter_difference_t<T>` instead
- `thrust::iterator_pointer[_t]<T>`: Use `cuda::std::iterator_traits<T>::pointer` instead
- `thrust::iterator_reference[_t]<T>`: Use `cuda::std::iterator_traits<T>::reference` or `cuda::std::iter_reference_t<T>` instead
- `thrust::iterator_traits<T>`: Use `cuda::std::iterator_traits<T>` instead
- `thrust::iterator_value[_t]<T>`: Use `cuda::std::iterator_traits<T>::value_type` or `cuda::std::iter_value_t<T>` instead
- `THRUST_FALSE`: No replacement
- `THRUST_UNKNOWN`: No replacement
- `THRUST_UNUSED_VAR`: No replacement
- `THRUST_PREVENT_MACRO_SUBSTITUTION`: No replacement
- `THRUST_STATIC_ASSERT(expr)`: Use `static_assert(expr)` instead
- `THRUST_TRUE`: No replacement
- `cub::Block*`: All trailing `int LEGACY_PTX_ARCH` template parameters have been removed
- `cub::CachingAllocator`: The constructor taking a trailing `bool debug` parameter has been removed
- `cub::Device*`: All overloads with a trailing `bool debug_synchronous` parameter have been removed
- `cub::Dispatch*`: All Boolean template parameters have been replaced by enumerations to increase readability
- `cub::Dispatch*`: All policy hub template parameters have been moved to the back of the template parameter list
- `cub::DispatchScan[ByKey]`: The offset type must be an unsigned type of at least 4 bytes
- `cuda::ceil_div`: Now returns the common type of its arguments
- `thrust::pair`: Is now an alias to `cuda::std::pair` and no longer a distinct type
- `thrust::tabulate_output_iterator`: The `value_type` has been fixed to be `void`
- `thrust::transform_iterator`: Upon copying, will now always copy its contained function. If the contained function is neither copy-constructible nor copy-assignable, the iterator fails to compile when an attempt is made to copy it.
- `thrust::tuple`: Is now an alias to `cuda::std::tuple` and no longer a distinct type
- `thrust::universal_host_pinned_memory_resource`: The alias has changed to a different memory resource, potentially changing pointer types derived from an allocator/container using this memory resource
- The following Thrust function object types have been made aliases to the equally named types in `cuda::std::`: `thrust::[plus|minus|multiplies|divides|modulus|negate|equal_to|not_equal_to|greater|less|greater_equal|less_equal|logical_and|logical_or|logical_not|bit_and|bit_or|bit_xor|identity|maximum|minimum]`
- `CUB_DEFINE_DETECT_NESTED_TYPE`: The generated detector trait no longer provides a `::VALUE` member. Use `::value` instead
`cuda::std::iterator_traits` now correctly recognizes user-provided specializations of `std::iterator_traits`. All of Thrust's iterator traits have been redefined in terms of `cuda::std::iterator_traits`, and users should prefer the iterator traits from libcu++. `thrust::iterator_traits` can no longer be specialized. When a specialization is necessary, users should prefer to specialize `cuda::std::iterator_traits` instead of `std::iterator_traits`, so that their iterators work equally well in device code.

The functionality and internal use of `cub::Traits` has been minimized, because libcu++ provides better, standard alternatives. Only its use for bit twiddling in CUB's radix sort implementation remains. Floating-point limits should be obtained from `cuda::std::numeric_limits<T>` instead of `cub::FpLimits<T>`. Types should be classified with the facilities from `<cuda/std/type_traits>` and `<cuda/type_traits>`, notably `cuda::std::is_signed[_v]`, `cuda::std::is_integral[_v]`, etc. There is an important difference for extended floating-point types, though: since `cuda::std::is_floating_point[_v]` recognizes only C++ standard floating-point types, `cuda::is_floating_point[_v]` must be used to correctly classify extended floating-point types like `__half` or `__nv_bfloat16`.

`cub::BaseTraits` and `cub::Traits` can no longer be specialized for custom types, and `cub::FpLimits` has been removed. We acknowledge the need to support user-defined floating-point types, though, e.g., registering a custom half type with CUB to be used in radix sort. Therefore, users can still specialize `cub::NumericTraits` for their custom floating-point types, inheriting from `cub::BaseTraits` and providing the necessary information for the type. Additionally, the corresponding traits from libcu++ have to be specialized as well. For example, a custom floating-point type `my_half` could be registered with CUB and libcu++ like this:

```cpp
template <>
inline constexpr bool ::cuda::is_floating_point_v<my_half> = true;

template <>
class ::cuda::std::numeric_limits<my_half> {
public:
  static constexpr bool is_specialized = true;
  static __host__ __device__ my_half max() { return /* TODO */; }
  static __host__ __device__ my_half min() { return /* TODO */; }
  static __host__ __device__ my_half lowest() { return /* TODO */; }
};

template <>
struct CUB_NS_QUALIFIER::NumericTraits<my_half> : BaseTraits<FLOATING_POINT, true, uint16_t, my_half> {};
```
- `cub::DeviceReduce::[Arg][Max|Min]`: Will now use `cuda::std::numeric_limits<T>::[max|min]()` instead of `cub::Traits` to determine the initial value
- `cuda::std::mdspan`: The implementation was entirely rewritten, and you may experience subtle behavioral changes
- `thrust::transform_iterator`: The logic to determine the reference type has been reworked, especially with regard to functions that return references to their own arguments (e.g., `thrust::identity`)
- `thrust::transform_iterator::difference_type`: The logic to select the difference type has been reworked. It is now either `int` or `ptrdiff_t`
- All of libcu++'s old ABI namespaces have been removed
- At least C++17 is required
- At least Clang 14 is required
- At least GCC 7 is required
- On Windows, at least Visual Studio 2019 is required (`_MSC_VER >= 1920`)
- Intel ICC (`icpx`) is no longer supported
- At least CUDA Toolkit 12.0 is required
- Support for CUDA Dynamic Parallelism V1 (CDPv1) has been removed
- At least a GPU with compute capability 50 (Maxwell) is required