Particle creation takes much longer on an NVIDIA GPU node (Grace Hopper) compared to a CPU node (Intel Cascade Lake). Here is the segment of code we tested. Is there a way to make it efficient on the GPU node as well?
Replies: 1 comment 8 replies
Hi lwJi,

It looks like your code is relying on managed memory and calling `push_back` to add each particle to the `ParticleContainer` one-by-one. That will result in a lot of memory traffic back and forth between the host and the device, which I think is why your code is slow.

The fastest way is to generate the particles on the GPU instead of the CPU. To do that, you could follow the code here. That routine 1) launches a kernel to count the number of particles that will be added in each cell, 2) calls `Gpu::exclusive_scan` on the resulting counts to get a set of offsets, then 3) launches a second kernel where threads fill in the data using those offsets.

If you still want to generate the particles on the host, you could also push them back onto a pinned ParticleTile, then copy them all at once to the GPU. Here is an example. That might make the most sense for you, as it looks like you already have a vector of positions and other particle data on the host.
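
To make the count, scan, fill pattern from the reply concrete, here is a minimal serial sketch in plain C++17. It is not the AMReX routine the reply links to and uses no AMReX calls: `Particle`, `cell_offsets`, and `make_particles` are hypothetical names invented for illustration, and `std::exclusive_scan` stands in for `Gpu::exclusive_scan`. On a GPU, each loop over cells would become a kernel with one thread per cell.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical particle record for illustration only; not an AMReX type.
struct Particle {
    double x;   // position along one axis
    int cell;   // index of the cell that created the particle
};

// Step 2 of the pattern: an exclusive scan turns per-cell counts into
// the starting offset of each cell's slice in the output array.
std::vector<int> cell_offsets(const std::vector<int>& counts)
{
    std::vector<int> offsets(counts.size(), 0);
    std::exclusive_scan(counts.begin(), counts.end(), offsets.begin(), 0);
    return offsets;
}

// Serial CPU analog of count / scan / fill. `counts[c]` plays the role of
// the result of the first (counting) kernel; `dx` is the cell width.
std::vector<Particle> make_particles(const std::vector<int>& counts, double dx)
{
    const std::size_t ncell = counts.size();
    const std::vector<int> offsets = cell_offsets(counts);

    // Total particle count = last offset + last count.
    const int total = (ncell == 0) ? 0 : offsets.back() + counts.back();

    // Allocate the destination once, instead of growing it per particle.
    std::vector<Particle> particles(static_cast<std::size_t>(total));

    // Step 3: every cell writes only into its own disjoint slice, so on a
    // GPU these writes could proceed in parallel without atomics.
    for (std::size_t c = 0; c < ncell; ++c) {
        assert(counts[c] >= 0);
        for (int k = 0; k < counts[c]; ++k) {
            // Space the particles evenly inside cell c.
            const double x = (static_cast<double>(c) + (k + 0.5) / counts[c]) * dx;
            particles[static_cast<std::size_t>(offsets[c] + k)] =
                Particle{x, static_cast<int>(c)};
        }
    }
    return particles;
}
```

Calling `make_particles({2, 0, 3}, 1.0)` yields five particles: two in cell 0 and three in cell 2, written at offsets 0 and 2. The point of the design is that the destination is sized once and each cell knows where to write in advance, which avoids the per-particle `push_back` traffic described above.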