
TACC Open Hackathon 2024


Some notes for organizing our efforts

!!! Need at least three people for every day

Agenda (all times CST)

  • Tues Oct 8 10 AM – 11:30 AM online
    • Meet with mentor
  • Tues Oct 15 9 AM – 5 PM online
    • Cluster intro
    • Introductory team presentations
    • Work with mentor
  • Tues Oct 22 – Thurs Oct 24 9 AM – 5 PM hybrid
    • Work on code with mentor

Our Goals

Primary

Improve MPI scaling for Parthenon applications with many separately enrolled fields

  • Ideas

    • Use smaller fixed-capacity communication buffers that are greedily filled and sent repeatedly until all data is exchanged (see the first sketch after this list)
    • Use contiguous buffers large enough to accommodate all fields (not respecting sparsity)
    • Others?
  • Example problems: parthenon_vibe, advection, fine_advection

    • Modify the example to vary the number of separately enrolled fields at runtime (see the enrollment sketch below)
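
A minimal sketch of the first idea, assuming plain MPI and C++ (this is not Parthenon's actual buffer machinery, and SendFieldsStaged is a made-up name): fields are packed into a staging buffer of fixed capacity, and whenever the buffer fills up it is sent, repeating until every field has been shipped. A matching receiver would keep posting receives (sizing each chunk with MPI_Get_count) until all expected data has arrived.

#include <mpi.h>

#include <algorithm>
#include <cstddef>
#include <vector>

// Send all fields to `peer` through a staging buffer of at most `capacity` doubles.
void SendFieldsStaged(const std::vector<std::vector<double>> &fields,
                      std::size_t capacity, int peer, int tag, MPI_Comm comm) {
  std::vector<double> staging;
  staging.reserve(capacity);

  auto flush = [&]() {
    if (staging.empty()) return;
    MPI_Send(staging.data(), static_cast<int>(staging.size()), MPI_DOUBLE,
             peer, tag, comm);
    staging.clear();
  };

  for (const auto &field : fields) {
    std::size_t offset = 0;
    while (offset < field.size()) {
      // Greedily fill the remaining space in the staging buffer.
      const std::size_t room = capacity - staging.size();
      const std::size_t chunk = std::min(room, field.size() - offset);
      staging.insert(staging.end(), field.data() + offset,
                     field.data() + offset + chunk);
      offset += chunk;
      if (staging.size() == capacity) flush();  // buffer full: ship it
    }
  }
  flush();  // send whatever is left
}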

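For the example-problem modification, a sketch of what runtime-configurable enrollment could look like in a package's Initialize(), assuming Parthenon's StateDescriptor/Metadata API (the field names and exact flag set here are placeholders; num_vars and vec_size mirror the parameters in the sample input further down):

#include <memory>
#include <string>
#include <vector>

#include <parthenon/package.hpp>

using namespace parthenon::package::prelude;

std::shared_ptr<StateDescriptor> Initialize(ParameterInput *pin) {
  auto pkg = std::make_shared<StateDescriptor>("advection_package");

  // Read the number of separately enrolled fields (and their vector length)
  // from the <Advection> block of the input file.
  const int num_vars = pin->GetOrAddInteger("Advection", "num_vars", 1);
  const int vec_size = pin->GetOrAddInteger("Advection", "vec_size", 1);

  Metadata m({Metadata::Cell, Metadata::Independent, Metadata::FillGhost},
             std::vector<int>({vec_size}));

  // Enroll num_vars independent fields so boundary communication has to
  // handle many separate variables.
  for (int i = 0; i < num_vars; ++i) {
    pkg->AddField("advected_" + std::to_string(i), m);
  }
  return pkg;
}
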
Secondary

Improve buffer kernel performance for few (large) blocks

Sample input (using plain advection example)

<parthenon/job>
problem_id = advection

<parthenon/mesh>
refinement = none

nx1 = 256
x1min = -0.5
x1max = 0.5
ix1_bc = periodic
ox1_bc = periodic

nx2 = 256
x2min = -0.5
x2max = 0.5
ix2_bc = periodic
ox2_bc = periodic

nx3 = 256
x3min = -0.5
x3max = 0.5
ix3_bc = periodic
ox3_bc = periodic

<parthenon/meshblock>
nx1 = 128
nx2 = 128
nx3 = 128

<parthenon/time>
nlim = 25
tlim = 1.0
integrator = rk2
ncycle_out_mesh = -10000

<Advection>
cfl = 0.45
vx = 1.0
vy = 1.0
vz = 1.0
profile = hard_sphere

refine_tol = 0.3    # controls the package-specific refinement tagging function
derefine_tol = 0.03
compute_error = false
num_vars = 1 # number of variables
vec_size = 10 # size of each variable
fill_derived = false # whether to fill one-copy test vars

Current performance

Sample performance on a single GH200, running the input above with meshblock sizes of 64, 128, and 256 (a team-based packing sketch follows the numbers):

nb64.out:|-> 6.62e-02 sec 3.6% 100.0% 0.0% ------ 51 boundary_communication.cpp::96::SendBoundBufs [for]
nb128.out:|-> 1.44e-01 sec 11.0% 100.0% 0.0% ------ 51 boundary_communication.cpp::96::SendBoundBufs [for]
nb256.out:|-> 5.45e-01 sec 25.9% 100.0% 0.0% ------ 51 boundary_communication.cpp::96::SendBoundBufs [for]
nb64.out:|-> 8.81e-02 sec 4.8% 100.0% 0.0% ------ 51 boundary_communication.cpp::274::SetBounds [for]
nb128.out:|-> 1.69e-01 sec 12.9% 100.0% 0.0% ------ 51 boundary_communication.cpp::274::SetBounds [for]
nb256.out:|-> 6.44e-01 sec 30.6% 100.0% 0.0% ------ 51 boundary_communication.cpp::274::SetBounds [for]
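
One direction for the few-large-blocks case, sketched with Kokkos hierarchical parallelism (names and data layout are made up; this is not the actual SendBoundBufs kernel from boundary_communication.cpp): expose parallelism both across buffers (teams) and within each buffer (threads/vector lanes), so that a handful of large buffers can still saturate the GPU.

#include <Kokkos_Core.hpp>

int main(int argc, char *argv[]) {
  Kokkos::initialize(argc, argv);
  {
    const int nbuf = 4;         // few buffers, as with few large blocks
    const int ncell = 1 << 20;  // many cells per buffer

    Kokkos::View<double **> src("src", nbuf, ncell);
    Kokkos::View<double **> buf("buf", nbuf, ncell);
    Kokkos::deep_copy(src, 1.0);

    using team_policy = Kokkos::TeamPolicy<>;
    using member_type = team_policy::member_type;

    // One team per buffer; each team's threads/vector lanes stride the cells,
    // so a single large buffer is packed by many threads at once.
    Kokkos::parallel_for(
        "PackBuffers", team_policy(nbuf, Kokkos::AUTO),
        KOKKOS_LAMBDA(const member_type &team) {
          const int b = team.league_rank();
          Kokkos::parallel_for(Kokkos::TeamVectorRange(team, ncell),
                               [&](const int i) { buf(b, i) = src(b, i); });
        });
    Kokkos::fence();
  }
  Kokkos::finalize();
  return 0;
}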

Diagnose (and improve?) particle efficiency at scale

  • Example problem: particles-example

Multigrid performance

  • Example problem:

NCCL/RCCL evaluation

  • This would be a heavy lift to fully implement

  • Example problem:
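
For reference, a minimal MPI-bootstrapped NCCL point-to-point sketch (illustrative only, not Parthenon code): the unique id is broadcast over MPI, then device-resident buffers are exchanged with ncclSend/ncclRecv on a stream. RCCL exposes the same API on AMD GPUs.

#include <mpi.h>
#include <nccl.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char **argv) {
  MPI_Init(&argc, &argv);
  int rank, nranks;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &nranks);

  // One GPU per rank (assumes ranks are mapped to devices round-robin).
  int ndev = 0;
  cudaGetDeviceCount(&ndev);
  cudaSetDevice(rank % ndev);

  // Bootstrap NCCL by broadcasting the unique id over MPI.
  ncclUniqueId id;
  if (rank == 0) ncclGetUniqueId(&id);
  MPI_Bcast(&id, sizeof(id), MPI_BYTE, 0, MPI_COMM_WORLD);
  ncclComm_t comm;
  ncclCommInitRank(&comm, nranks, id, rank);

  cudaStream_t stream;
  cudaStreamCreate(&stream);

  // Device-resident stand-ins for boundary communication buffers.
  const size_t count = 1 << 20;
  double *send_buf, *recv_buf;
  cudaMalloc((void **)&send_buf, count * sizeof(double));
  cudaMalloc((void **)&recv_buf, count * sizeof(double));

  // Ring exchange: send to the next rank, receive from the previous one.
  const int next = (rank + 1) % nranks;
  const int prev = (rank + nranks - 1) % nranks;
  ncclGroupStart();
  ncclSend(send_buf, count, ncclDouble, next, comm, stream);
  ncclRecv(recv_buf, count, ncclDouble, prev, comm, stream);
  ncclGroupEnd();
  cudaStreamSynchronize(stream);

  if (rank == 0) printf("NCCL ring exchange of %zu doubles complete\n", count);

  cudaFree(send_buf);
  cudaFree(recv_buf);
  ncclCommDestroy(comm);
  cudaStreamDestroy(stream);
  MPI_Finalize();
  return 0;
}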

CUDA asynchronous memory copies

  • Example problem:
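
A small self-contained sketch of cudaMemcpyAsync with pinned host memory and separate streams (illustrative only; nothing here is Parthenon code): the copy is queued on its own stream so other work can overlap with it, and synchronization happens only where the data is actually needed.

#include <cuda_runtime.h>
#include <cstdio>

__global__ void scale(double *data, size_t n, double s) {
  size_t i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= s;
}

int main() {
  const size_t n = 1 << 22;
  const size_t bytes = n * sizeof(double);

  // Pinned host memory is required for truly asynchronous copies.
  double *h_buf;
  cudaMallocHost((void **)&h_buf, bytes);
  for (size_t i = 0; i < n; ++i) h_buf[i] = 1.0;

  double *d_buf;
  cudaMalloc((void **)&d_buf, bytes);

  cudaStream_t copy_stream, compute_stream;
  cudaStreamCreate(&copy_stream);
  cudaStreamCreate(&compute_stream);

  // Queue the host-to-device copy on its own stream.
  cudaMemcpyAsync(d_buf, h_buf, bytes, cudaMemcpyHostToDevice, copy_stream);
  cudaStreamSynchronize(copy_stream);  // wait only where the data is needed

  scale<<<(n + 255) / 256, 256, 0, compute_stream>>>(d_buf, n, 2.0);

  // Copy the result back asynchronously on the compute stream.
  cudaMemcpyAsync(h_buf, d_buf, bytes, cudaMemcpyDeviceToHost, compute_stream);
  cudaStreamSynchronize(compute_stream);

  printf("h_buf[0] = %g\n", h_buf[0]);

  cudaFree(d_buf);
  cudaFreeHost(h_buf);
  cudaStreamDestroy(copy_stream);
  cudaStreamDestroy(compute_stream);
  return 0;
}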

Team

Ben Ryan

  • Secondary goal interests
    • Particle scaling

Luke

  • Secondary goal interests
    • Multigrid parallel performance

Philipp

  • Secondary goal interests
    • Improve buffer kernel performance for few (large) blocks

Patrick

  • Secondary goal interests

Alex

  • Secondary goal interests

Nirmal

  • Secondary goal interests

Ben Prather

  • Secondary goal interests
    • Single-meshblock bottlenecks
    • Interface for downstreams to add CUDA async copies?

Jonah

  • Secondary goal interests