
Particle creation takes a lot of time on Nvidia GPU node #4506

Answered by atmyers
lwJi asked this question in Q&A

Hi lwJi,

It looks like your code is relying on managed memory and calling push_back to add each particle to the ParticleContainer one-by-one. That will result in a lot of memory traffic back and forth between the host and the device, which I think is why your code is slow.

The fastest way is to generate the particles on the GPU instead of the CPU. To do that, you could follow the code here. That routine 1) launches a kernel to count the number of particles that will be added in each cell, 2) calls Gpu::exclusive_scan on the resulting counts to get a set of offsets, then 3) launches a second kernel where threads fill in the data using those offsets.
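
To make the count / scan / fill pattern concrete, here is a minimal serial C++ sketch of it. It is not the AMReX routine referred to above: the `Particle` struct, `count_for_cell`, and `create_particles` are hypothetical names made up for illustration, and the loops run on the CPU. In a GPU version, passes 1 and 3 would presumably become kernels (for example via `amrex::ParallelFor`) and pass 2 would be the `Gpu::exclusive_scan` call.

```cpp
#include <cassert>
#include <vector>

// Hypothetical particle type for illustration; not AMReX's particle layout.
struct Particle {
    int cell;   // index of the cell that created the particle
    int local;  // index of the particle within its cell
};

// Hypothetical rule for how many particles a cell creates (assumption).
int count_for_cell(int cell) { return cell % 3; }

std::vector<Particle> create_particles(int ncells)
{
    assert(ncells >= 0);

    // Pass 1: count the particles each cell will add.
    std::vector<int> counts(ncells);
    for (int i = 0; i < ncells; ++i) {
        counts[i] = count_for_cell(i);
    }

    // Pass 2: exclusive scan of the counts. offsets[i] is the sum of
    // counts[0..i-1], i.e. the first output slot owned by cell i.
    std::vector<int> offsets(ncells);
    int total = 0;
    for (int i = 0; i < ncells; ++i) {
        offsets[i] = total;
        total += counts[i];
    }

    // Allocate the output once, then Pass 3: each cell fills its own
    // disjoint range [offsets[i], offsets[i] + counts[i]).
    std::vector<Particle> particles(total);
    for (int i = 0; i < ncells; ++i) {
        for (int k = 0; k < counts[i]; ++k) {
            particles[offsets[i] + k] = Particle{i, k};
        }
    }
    return particles;
}
```

The point of the offsets is that storage is allocated once and every cell writes to a range no other cell touches, so the fill pass needs no `push_back` and no synchronization between threads.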

If you still want to generate the partic…

Answer selected by lwJi