Restrict Register Per Thread in CUDA #4574
-
How can I restrict the maximum register usage per thread in AMReX when I am trying to profile a GPU program? The compilation output still contains lines like 'ptxas info : Used 240 registers, 592 bytes cmem[0]'.
Replies: 4 comments 10 replies
-
It seems to work for me.
-
I still think you had a typo.
Could you provide more detail so that we can see exactly what flags nvcc gets? For example, I can see
-
I have tried another case in a workspace without PelePhysics. In this case, I have hard-coded the per-thread register limit in nvcc from However, the issue does not improve at all. It reports as
-
I thought maxrregcount is a hard ceiling for nvcc. But it is not. https://docs.nvidia.com/cuda/cuda-compiler-driver-nvcc/index.html#maxrregcount-amount-maxrregcount

I guess for some of the kernels it's impossible to run without bumping up the register counts. Then there is probably nothing you can do.
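
To see which kernels still end up above the requested limit, one option is to build with verbose ptxas output (nvcc's -Xptxas -v, which prints the 'ptxas info : Used ... registers' lines quoted in the question) and scan the log. Below is a minimal Python sketch. It assumes each 'Used N registers' line is preceded by a 'Compiling entry function' line, as ptxas normally prints. The helper names (register_usage, over_limit), the sample log, and the kernel names in it are made up for illustration and do not come from this thread.

```python
import re

# Patterns for the per-kernel lines printed by `nvcc -Xptxas -v`.
ENTRY_RE = re.compile(r"Compiling entry function '([^']+)' for '([^']+)'")
USED_RE = re.compile(r"Used (\d+) registers")


def register_usage(log_text):
    """Return a list of (kernel_name, arch, registers) found in the log."""
    usage = []
    current = None  # (name, arch) of the most recent entry function seen
    for line in log_text.splitlines():
        entry = ENTRY_RE.search(line)
        if entry:
            current = (entry.group(1), entry.group(2))
            continue
        used = USED_RE.search(line)
        if used:
            # Fall back to a placeholder if no entry line preceded this one.
            name, arch = current if current else ("<unknown>", "<unknown>")
            usage.append((name, arch, int(used.group(1))))
            current = None
    return usage


def over_limit(log_text, limit):
    """Return only the kernels whose register count exceeds `limit`."""
    return [(n, a, r) for n, a, r in register_usage(log_text) if r > limit]


# Hypothetical sample log; the mangled kernel names are invented.
SAMPLE_LOG = """\
ptxas info    : Compiling entry function '_Z10big_kernelPd' for 'sm_80'
ptxas info    : Function properties for _Z10big_kernelPd
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 240 registers, 592 bytes cmem[0]
ptxas info    : Compiling entry function '_Z12small_kernelPd' for 'sm_80'
ptxas info    : Function properties for _Z12small_kernelPd
    0 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 32 registers, 368 bytes cmem[0]
"""

if __name__ == "__main__":
    for name, arch, regs in over_limit(SAMPLE_LOG, 128):
        print(f"{name} ({arch}): {regs} registers")
```

In practice you would pass the captured build log instead of SAMPLE_LOG. The mangled names can be fed to c++filt to find the corresponding source kernels.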