Skip to content

fix compilation problems with MSVC+Cuda 12.9 #27522

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: 4.x
Choose a base branch
from

Conversation

chacha21
Copy link
Contributor

@chacha21 chacha21 commented Jul 8, 2025

fix for #27521

Actually, when ENABLE_CUDA_FIRST_CLASS_LANGUAGE is enabled, the fix it not necessary. However, even when ENABLE_CUDA_FIRST_CLASS_LANGUAGE is enabled, I have checked that the fix is harmless So I propose to keep it simple for now and enable the fix whatever the state of ENABLE_CUDA_FIRST_CLASS_LANGUAGE

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

fix for opencv#27521

Actually, when ENABLE_CUDA_FIRST_CLASS_LANGUAGE is enabled, the fix it not necessary.
However, even when ENABLE_CUDA_FIRST_CLASS_LANGUAGE is enabled, I have cjhecked that the fix is harmless
So I propsoe to keep it simple for now and enable the fix whatever the state of ENABLE_CUDA_FIRST_CLASS_LANGUAGE
if (NOT (CUDA_VERSION VERSION_LESS "11.2"))
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
endif()
if(CUDA_VERSION GREATER "12.7")
Copy link
Contributor

@cudawarped cudawarped Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which compilation errors do you get with this version of CUDA without the --expt-relaxed-constexpr flag? I'm interested because I can build against CUDA 12.8 on windows without this experimental flag.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can get a lot of errors like :

D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev/warp/reduce.hpp(140): error : calling a constexpr __host__ function("make_tuple") from a __device__ function("smem_tuple") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
5>        return make_tuple((volatile T0*) t0, (volatile T1*) t1);
5>               ^
5>            detected during:
5>              instantiation of "cuda::std::__4::tuple<volatile T0 *, volatile T1 *> cv::cudev::smem_tuple(T0 *, T1 *) [with T0=float, T1=float]" at line 379 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, src_type, work_type>::reduceGrid<BLOCK_SIZE>(work_type *, int) [with src_type=uchar, work_type=float, BLOCK_SIZE=256]" at line 412 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,BLOCK_SIZE,PATCH_X,PATCH_Y,SrcPtr,ResType,MaskPtr>(SrcPtr, ResType *, MaskPtr, int, int) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, float>, BLOCK_SIZE=256, PATCH_X=4, PATCH_Y=4, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 421 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, float>, Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 460 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::minMaxVal<Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 206 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
5>              instantiation of "void cv::cudev::gridFindMinMaxVal_<Policy,SrcPtr,ResType>(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=float]" at line 349 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
5>              instantiation of "void cv::cudev::gridFindMinMaxVal(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=float]" at line 137 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev/expr/reduction.hpp
5>              instantiation of "void cv::cudev::FindMinMaxValExprBody<SrcPtr>::assignTo(cv::cudev::GpuMat_<T> &, cv::cuda::Stream &) const [with SrcPtr=cv::cudev::GpuMat_<uchar>, T=float]" at line 325 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\ptr2d\detail/gpumat.hpp
5>              instantiation of "cv::cudev::GpuMat_<T> &cv::cudev::GpuMat_<T>::operator=(const cv::cudev::Expr<Body> &) [with T=float, Body=cv::cudev::FindMinMaxValExprBody<cv::cudev::GpuMat_<uchar>>]" at line 319 of D:\github\chacha21\opencv_contrib\modules\cudev\include\opencv2\cudev\ptr2d\detail/gpumat.hpp
5>              instantiation of "cv::cudev::GpuMat_<T>::GpuMat_(const cv::cudev::Expr<Body> &) [with T=float, Body=cv::cudev::FindMinMaxValExprBody<cv::cudev::GpuMat_<uchar>>]" at line 130 of D:\github\chacha21\opencv_contrib\modules\cudev\test\test_reduction.cu

Copy link
Contributor

@cudawarped cudawarped Jul 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You get this with CUDA 12.8 and the latest version of Visual Studio 2022? I only see those errors with CUDA 12.9 without the addition of -Xcompiler /Zc:preprocessor.

I'm building with Ninja are you building directly in VS?


Update: I can't recreate this, can you provide your CMake arguments, the version of VS, CUDA and CMake you are using please. I can build OpenCV without the --expt-relaxed-constexpr flag in VS 17.14.7 with CUDA 12.8, and CMake 4.01 using both the Ninja and VS generator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, I can't reproduce it either with 12.8. I must have fooled myself among my different experiments.
I will update the PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chacha21 Is that flag needed at all? In my testing only the -Xcompiler /Zc:preprocessor is required to fix the bug. See my proposed fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chacha21 Yes, details below.

-- OpenCV modules:
-- To be built: aruco bgsegm bioinspired calib3d ccalib core cudaarithm cudabgsegm cudacodec cudafeatures2d cudafilters cudaimgproc cudalegacy cudaobjdetect cudaoptflow cudastereo cudawarping cudev datasets dnn dnn_objdetect dnn_superres dpm face features2d flann fuzzy gapi hfs highgui img_hash imgcodecs imgproc intensity_transform line_descriptor mcc ml objdetect optflow phase_unwrapping photo plot python3 quality rapid reg rgbd saliency shape signal stereo stitching structured_light superres surface_matching text tracking ts video videoio videostab wechat_qrcode xfeatures2d ximgproc xobjdetect xphoto
-- Disabled: world
-- Disabled by dependency: -
-- Unavailable: alphamat cannops cvv fastcv freetype hdf java julia matlab ovis python2 sfm viz
-- Applications: tests perf_tests examples apps
-- Documentation: doxygen python
-- Non-free algorithms: NO

The only addition I had to make was the fix in opencv/opencv_contrib#3968 to get cudev to build.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I confirm that on my build system, I do need "--expt-relaxed-constexpr", otherwise the first compilation errors start with opencv_test_cudev


D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev/warp/reduce.hpp(140): error : calling a constexpr __host__ function("make_tuple") from a __device__ function("smem_tuple") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
5>        return make_tuple((volatile T0*) t0, (volatile T1*) t1);
5>               ^
5>            detected during:
5>              instantiation of "cuda::std::__4::tuple<volatile T0 *, volatile T1 *> cv::cudev::smem_tuple(T0 *, T1 *) [with T0=float, T1=float]" at line 379 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, src_type, work_type>::reduceGrid<BLOCK_SIZE>(work_type *, int) [with src_type=uchar, work_type=float, BLOCK_SIZE=256]" at line 412 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,BLOCK_SIZE,PATCH_X,PATCH_Y,SrcPtr,ResType,MaskPtr>(SrcPtr, ResType *, MaskPtr, int, int) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, float>, BLOCK_SIZE=256, PATCH_X=4, PATCH_Y=4, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 421 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::reduce<Reductor,Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Reductor=cv::cudev::grid_reduce_detail::MinMaxReductor<cv::cudev::grid_reduce_detail::both, uchar, float>, Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 460 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\grid\detail/reduce.hpp
5>              instantiation of "void cv::cudev::grid_reduce_detail::minMaxVal<Policy,SrcPtr,ResType,MaskPtr>(const SrcPtr &, ResType *, const MaskPtr &, int, int, cudaStream_t) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GlobPtr<uchar>, ResType=float, MaskPtr=cv::cudev::WithOutMask]" at line 206 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
5>              instantiation of "void cv::cudev::gridFindMinMaxVal_<Policy,SrcPtr,ResType>(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with Policy=cv::cudev::DefaultGlobReducePolicy, SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=float]" at line 349 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev/grid/reduce.hpp
5>              instantiation of "void cv::cudev::gridFindMinMaxVal(const SrcPtr &, cv::cudev::GpuMat_<ResType> &, cv::cuda::Stream &) [with SrcPtr=cv::cudev::GpuMat_<uchar>, ResType=float]" at line 137 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev/expr/reduction.hpp
5>              instantiation of "void cv::cudev::FindMinMaxValExprBody<SrcPtr>::assignTo(cv::cudev::GpuMat_<T> &, cv::cuda::Stream &) const [with SrcPtr=cv::cudev::GpuMat_<uchar>, T=float]" at line 325 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\ptr2d\detail/gpumat.hpp
5>              instantiation of "cv::cudev::GpuMat_<T> &cv::cudev::GpuMat_<T>::operator=(const cv::cudev::Expr<Body> &) [with T=float, Body=cv::cudev::FindMinMaxValExprBody<cv::cudev::GpuMat_<uchar>>]" at line 319 of D:\opencv_contrib-4.12.0\modules\cudev\include\opencv2\cudev\ptr2d\detail/gpumat.hpp
5>              instantiation of "cv::cudev::GpuMat_<T>::GpuMat_(const cv::cudev::Expr<Body> &) [with T=float, Body=cv::cudev::FindMinMaxValExprBody<cv::cudev::GpuMat_<uchar>>]" at line 130 of D:\opencv_contrib-4.12.0\modules\cudev\test\test_reduction.cu
5>

CMakeCache.txt

Copy link
Contributor

@cudawarped cudawarped Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chacha21 Is that with the fix I linked to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is with


  if(WIN32)
	if (NOT (CUDA_VERSION VERSION_LESS "11.2"))
		set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcudafe --display_error_number --diag-suppress 1394,1388)
	endif()
	if(CUDA_VERSION GREATER "12.8")
		#set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} --expt-relaxed-constexpr)
		set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler=/Zc:preprocessor)
	endif()
  endif()

Copy link
Contributor

@cudawarped cudawarped Jul 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the cudev error you need to apply the PR I linked to. Without it I get the errors you mentioned above even with the experimental flag (--expt-relaxed-constexpr).

@asmorkalov asmorkalov added category: build/install category: gpu/cuda (contrib) OpenCV 4.0+: moved to opencv_contrib labels Jul 8, 2025
@asmorkalov asmorkalov added this to the 4.13.0 milestone Jul 8, 2025
endif()
if(CUDA_VERSION GREATER "12.8")
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} --expt-relaxed-constexpr)
set(CUDA_NVCC_FLAGS ${CUDA_NVCC_FLAGS} -Xcompiler /Zc:preprocessor)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with the rest of the file I would use -Xcompiler=/Zc:preprocessor

@asmorkalov
Copy link
Contributor

@chacha21 Friendly reminder. Let's drive it to conclusion.

@cudawarped
Copy link
Contributor

Here is the file that can be of interest to detect if I did something wrong
test_reduction.cu.cache.zip

Where is this file located?

Can you provide your CMake command line arguments please, I'm having difficulty recreating the issue, I've tried Visual Studio and Ninja.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants