fix compilation problems with MSVC+Cuda 12.9 #27522

Merged (7 commits) on Jul 18, 2025
Changes from 1 commit
12 changes: 10 additions & 2 deletions cmake/OpenCVDetectCUDAUtils.cmake
@@ -388,8 +388,16 @@ macro(ocv_nvcc_flags)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=-fno-finite-math-only)
endif()

if(WIN32 AND NOT (CUDA_VERSION VERSION_LESS "11.2"))
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
if(WIN32)
if (NOT (CUDA_VERSION VERSION_LESS "11.2"))
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
endif()
if(CUDA_VERSION GREATER "12.7")
Contributor

@cudawarped cudawarped Jul 8, 2025

Which compilation errors do you get with this version of CUDA without the --expt-relaxed-constexpr flag? I'm interested because I can build against CUDA 12.8 on Windows without this experimental flag.

Contributor Author

I get a lot of errors like:

D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev/warp/reduce.hpp(140): error : calling a constexpr __host__ function("make_tuple") from a __device__ function("smem_tuple") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
5>        return make_tuple((volatile T0*) t0, (volatile T1*) t1);
5>               ^
5>            detected during:
5>              instantiation of "cuda::std::__4::tuple<volatile T0 *, volatile T1 *> cv::cudev::smem_tuple(T0 *, T1 *) [with T0=float, T1=float]" at line 379 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, src_type, work_type>::reduceGrid<BLOCK_SIZE>(work_type *, int) [with src_type=uchar, work_type=float, BLOCK_SIZE=256]" at line 412 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,BLOCK_SIZE,PATCH_X,PATCH_Y,SrcPtr,ResType,MaskPtr>(SrcPtr, ResType *, MaskPtr, int, int) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, float>, BLOCK_SIZE=256, PATCH_X=4, PATCH_Y=4, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 421 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, float>, Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 460 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::minMaxVal<Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 206 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
5>              instantiation of "void cv::cudev::gridFindMinMaxVal_<Policy,SrcPtr,ResType>(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=float]" at line 349 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
5>              instantiation of "void cv::cudev::gridFindMinMaxVal(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=float]" at line 137 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev/expr/reduction.hpp
5>              instantiation of "void cv::cudev::FindMinMaxValExprBody<SrcPtr>::assignTo(cv::cudev::GpuMat_<T> &, cv::cuda::Stream &) const [with SrcPtr=cv::cudev::GpuMat_<uchar>, T=float]" at line 325 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\ptr2d\detail/gpumat.hpp
5>              instantiation of "cv::cudev::GpuMat_<T> &cv::cudev::GpuMat_<T>::operator=(const cv::cudev::Expr<Body> &) [with T=float, Body=cv::cudev::FindMinMaxValExprBody<cv::cudev::GpuMat_<uchar>>]" at line 319 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\ptr2d\detail/gpumat.hpp
5>              instantiation of "cv::cudev::GpuMat_<T>::GpuMat_(const cv::cudev::Expr<Body> &) [with T=float, Body=cv::cudev::FindMinMaxValExprBody<cv::cudev::GpuMat_<uchar>>]" at line 130 of D:\github\chacha21\opencv_contrib\modules\cudev\test\test_reduction.cu

Contributor

@cudawarped cudawarped Jul 8, 2025

You get this with CUDA 12.8 and the latest version of Visual Studio 2022? I only see those errors with CUDA 12.9 without the addition of -Xcompiler /Zc:preprocessor.

I'm building with Ninja; are you building directly in VS?


Update: I can't recreate this. Can you provide your CMake arguments and the versions of VS, CUDA and CMake you are using, please? I can build OpenCV without the --expt-relaxed-constexpr flag in VS 17.14.7 with CUDA 12.8 and CMake 4.01, using both the Ninja and VS generators.
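
The details requested here can be gathered from the configure step itself. A minimal sketch, assuming the lines are added temporarily to the top-level CMakeLists.txt after CUDA detection has run: `CUDA_VERSION` and `CUDA_NVCC_FLAGS` are the variables used in the diff above, the others are standard CMake variables, and the placement of the lines is an assumption rather than something stated in this thread.

```cmake
# Temporary diagnostics for a build report; remove once the values are collected.
message(STATUS "CMake version:   ${CMAKE_VERSION}")
message(STATUS "Generator:       ${CMAKE_GENERATOR}")
# MSVC_VERSION is empty when the host compiler is not MSVC.
message(STATUS "MSVC_VERSION:    ${MSVC_VERSION}")
message(STATUS "C++ compiler:    ${CMAKE_CXX_COMPILER_ID} ${CMAKE_CXX_COMPILER_VERSION}")
# Set by the CUDA detection that the diff's macro relies on.
message(STATUS "CUDA_VERSION:    ${CUDA_VERSION}")
message(STATUS "CUDA_NVCC_FLAGS: ${CUDA_NVCC_FLAGS}")
```

The printed values appear in the configure output next to the "OpenCV modules" summary, so they can be pasted into a comment together with the CMake command line.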

Contributor Author

You are right, I can't reproduce it with 12.8 either. I must have confused myself across my different experiments.
I will update the PR.

Contributor

@chacha21 Is that flag needed at all? In my testing, only -Xcompiler /Zc:preprocessor is required to fix the bug. See my proposed fix.

Contributor

@chacha21 Yes, details below.

-- OpenCV modules:
-- To be built: aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape signal stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
-- Disabled: world
-- Disabled by dependency: -
-- Unavailable: alphamat cannops cvv fastcv freetype hdf java julia matlab ovis python2 sfm viz
-- Applications: tests perf_tests examples apps
-- Documentation: doxygen python
-- Non-free algorithms: NO

The only addition I had to make was the fix in opencv/opencv_contrib#3968 to get cudev to build.

Contributor Author

I confirm that on my build system I do need "--expt-relaxed-constexpr"; otherwise the first compilation errors appear in opencv_test_cudev:


D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev/warp/reduce.hpp(140): error : calling a constexpr __host__ function("make_tuple") from a __device__ function("smem_tuple") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
5>        return make_tuple((volatile T0*) t0, (volatile T1*) t1);
5>               ^
5>            detected during:
5>              instantiation of "cuda::std::__4::tuple<volatile T0 *, volatile T1 *> cv::cudev::smem_tuple(T0 *, T1 *) [with T0=float, T1=float]" at line 379 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, src_type, work_type>::reduceGrid<BLOCK_SIZE>(work_type *, int) [with src_type=uchar, work_type=float, BLOCK_SIZE=256]" at line 412 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,BLOCK_SIZE,PATCH_X,PATCH_Y,SrcPtr,ResType,MaskPtr>(SrcPtr, ResType *, MaskPtr, int, int) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, float>, BLOCK_SIZE=256, PATCH_X=4, PATCH_Y=4, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 421 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, float>, Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 460 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::minMaxVal<Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 206 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
5>              instantiation of "void cv::cudev::gridFindMinMaxVal_<Policy,SrcPtr,ResType>(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=float]" at line 349 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
5>              instantiation of "void cv::cudev::gridFindMinMaxVal(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=float]" at line 137 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev/expr/reduction.hpp
5>              instantiation of "void cv::cudev::FindMinMaxValExprBody<SrcPtr>::assignTo(cv::cudev::GpuMat_<T> &, cv::cuda::Stream &) const [with SrcPtr=cv::cudev::GpuMat_<uchar>, T=float]" at line 325 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\ptr2d\detail/gpumat.hpp
5>              instantiation of "cv::cudev::GpuMat_<T> &cv::cudev::GpuMat_<T>::operator=(const cv::cudev::Expr<Body> &) [with T=float, Body=cv::cudev::FindMinMaxValExprBody<cv::cudev::GpuMat_<uchar>>]" at line 319 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\ptr2d\detail/gpumat.hpp
5>              instantiation of "cv::cudev::GpuMat_<T>::GpuMat_(const cv::cudev::Expr<Body> &) [with T=float, Body=cv::cudev::FindMinMaxValExprBody<cv::cudev::GpuMat_<uchar>>]" at line 130 of D:\opencv_contrib-4.12.0\modules\cudev\test\test_reduction.cu
5>

CMakeCache.txt

Contributor

@cudawarped cudawarped Jul 15, 2025

@chacha21 Is that with the fix I linked to?

Contributor Author

It is with


  if(WIN32)
    if (NOT (CUDA_VERSION VERSION_LESS "11.2"))
      set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
    endif()
    if(CUDA_VERSION GREATER "12.8")
      #set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} --expt-relaxed-constexpr)
      set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=/Zc:preprocessor)
    endif()
  endif()

Contributor

@cudawarped cudawarped Jul 15, 2025

To fix the cudev error you need to apply the PR I linked to. Without it I get the errors you mentioned above even with the experimental flag (--expt-relaxed-constexpr).

set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} --expt-relaxed-constexpr)
endif()
if(CUDA_VERSION GREATER "12.8")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler /Zc:preprocessor)
Contributor

For consistency with the rest of the file I would use -Xcompiler=/Zc:preprocessor
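
Taken together, the review comments in this thread point toward a block along the following lines. This is a sketch assembled from those comments, not the merged code. It keeps the existing `-Xcudafe` flags for CUDA 11.2 and later, adds only `-Xcompiler=/Zc:preprocessor` for CUDA newer than 12.8, and omits `--expt-relaxed-constexpr` on the assumption that the cudev fix from opencv/opencv_contrib#3968 is applied. Using `VERSION_GREATER` instead of `GREATER` is an assumption of this sketch, not a change requested in the review.

```cmake
# Sketch only: assembled from the review comments, not copied from the merged commit.
if(WIN32)
  if(NOT (CUDA_VERSION VERSION_LESS "11.2"))
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
  endif()
  # Assumption: VERSION_GREATER (component-wise) in place of GREATER (numeric),
  # to match the VERSION_LESS check above.
  if(CUDA_VERSION VERSION_GREATER "12.8")
    # Pass /Zc:preprocessor through nvcc to the MSVC host compiler.
    set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=/Zc:preprocessor)
  endif()
endif()
```

With `GREATER`, both sides are parsed as real numbers, so a hypothetical version string such as `12.10` would be read as 12.1 and fail the check. `VERSION_GREATER` compares the components one by one and does not have that problem.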

endif()
endif()

if(CMAKE_CROSSCOMPILING AND (ARM OR AARCH64))