Implement data-specific cartesian index #1900

Closed
wants to merge 1 commit into from

charleskawczynski
Member

This PR defines a datalayout-specific CartesianIndex and implements getindex/setindex! for it on DataLayouts. These implementations do not swap indices, which improves GPU memory access patterns.

Our fill! method exclusively uses this new indexing pattern, since there is only one datalayout per broadcast expression. This means that we can specialize on that datalayout's index.
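
To make the idea concrete, here is a minimal sketch of a layout-specific index, not the code in this PR. Every name in it (ToyVIJFH, ToyDataIndex, toy_fill!) is hypothetical, and the (i, j, f, v, h) ordering of the generic index is an assumption made for illustration. The generic path permutes indices on every access, while the layout-specific path passes the index straight to the parent array in storage order.

```julia
# Hypothetical names throughout; this is an illustration, not ClimaCore's implementation.

# A toy VIJFH-like layout: the parent array is stored in (V, I, J, F, H) order.
struct ToyVIJFH{A <: AbstractArray{<:Any, 5}}
    array::A
end
Base.parent(data::ToyVIJFH) = data.array

# Generic indexing (assumed order: i, j, f, v, h): the layout has to
# permute the index into storage order on every access.
function Base.getindex(data::ToyVIJFH, I::CartesianIndex{5})
    i, j, f, v, h = Tuple(I)
    return parent(data)[v, i, j, f, h]
end
function Base.setindex!(data::ToyVIJFH, val, I::CartesianIndex{5})
    i, j, f, v, h = Tuple(I)
    parent(data)[v, i, j, f, h] = val
    return data
end

# Layout-specific index: already in storage order, so no permutation is needed.
struct ToyDataIndex{N}
    I::CartesianIndex{N}
end
Base.getindex(data::ToyVIJFH, I::ToyDataIndex{5}) = parent(data)[I.I]
function Base.setindex!(data::ToyVIJFH, val, I::ToyDataIndex{5})
    parent(data)[I.I] = val
    return data
end

# A fill touches a single layout, so it can iterate directly in storage order.
function toy_fill!(data::ToyVIJFH, val)
    for I in CartesianIndices(parent(data))
        data[ToyDataIndex(I)] = val
    end
    return data
end

data = ToyVIJFH(zeros(Float64, 4, 2, 2, 1, 3))
toy_fill!(data, 3)
@assert all(parent(data) .== 3)
```

On a GPU the same loop body would run once per thread. The point of the sketch is only that consecutive layout-specific indices map to consecutive memory locations without any per-access index shuffling.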

Main:

julia> benchmarkfill!(device, data, 3, "VIJFH" );
Benchmarking ClimaCore fill! for VIJFH DataLayout
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  76.520 μs …  4.431 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     77.929 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   78.673 μs ± 44.074 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

              ▁▃▄▆▇█▇▅▆▃▂ ▁                                    
  ▁▁▁▁▁▂▃▃▄▄▆▇█████████████▇▆▆▅▄▄▃▃▂▂▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  76.5 μs         Histogram: frequency by time          81 μs <

 Memory estimate: 560 bytes, allocs estimate: 18.
Benchmarking array fill! for VIJFH DataLayout
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  35.840 μs … 348.976 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     37.200 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.383 μs ±   3.625 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                 ▄▆▅▆▇█▅▅▅▄▁                                    
  ▁▁▁▁▁▁▂▂▃▃▄▅▇▇██████████████▇▆▆▆▆▅▄▅▄▃▃▄▃▂▂▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁ ▃
  35.8 μs         Histogram: frequency by time         39.6 μs <

 Memory estimate: 960 bytes, allocs estimate: 30.

julia> @test all(parent(data) .== 3)
Test Passed

This PR:

julia> benchmarkfill!(device, data, 3, "VIJFH" );
Benchmarking ClimaCore fill! for VIJFH DataLayout
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  44.090 μs … 340.207 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     45.370 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   45.454 μs ±   3.293 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

             ▁▁▂▅▅▆███▄▁                                        
  ▂▂▂▂▂▂▃▃▅▅▇████████████▆▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▂▂▂▂▂▂▁▂▂▂▂▂▂▂ ▄
  44.1 μs         Histogram: frequency by time         48.5 μs <

 Memory estimate: 560 bytes, allocs estimate: 18.
Benchmarking array fill! for VIJFH DataLayout
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  35.660 μs … 292.658 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     36.990 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   37.080 μs ±   3.062 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

            ▂▂▃▅▅▄▆▇█▆▆▅▂                                       
  ▁▁▁▁▂▃▃▅▆██████████████▇▆▅▄▃▂▂▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▃
  35.7 μs         Histogram: frequency by time         40.3 μs <

 Memory estimate: 960 bytes, allocs estimate: 30.

julia> @test all(parent(data) .== 3)
Test Passed

@charleskawczynski
Member Author

Superseded by #1902

@charleskawczynski charleskawczynski deleted the ck/data_specific_cart_ind branch April 27, 2025 18:34