Closed
Description
Broadcast operations on even our simplest expressions result in uncoalesced memory reads for the VIJFH
data layout, as observed with Nsight Compute.
These issues are being analyzed and outlined in the following benchmark scripts:
- benchmarks/scripts/index_swapping.jl
- benchmarks/scripts/indexing_and_static_ndranges.jl
- benchmarks/scripts/thermo_bench_bw.jl
- benchmarks/scripts/benchmark_offset.jl
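
For readers unfamiliar with the term, here is a minimal sketch of what an uncoalesced read can look like for a VIJFH-style layout. It is not taken from ClimaCore or from the benchmark scripts. The array sizes, the two thread-to-index mappings, and all function names are hypothetical, and the "segment" count is a simplified model that assumes 4-byte elements and 128-byte memory segments. The only thing carried over from the issue is the dimension order (V, I, J, F, H), stored column-major so that V varies fastest.

```python
# Hypothetical sizes for the (V, I, J, F, H) dimensions; not from ClimaCore.
DIMS = (64, 4, 4, 3, 100)


def linear_offset(idx, dims):
    """0-based column-major offset: the first index varies fastest."""
    offset, stride = 0, 1
    for i, n in zip(idx, dims):
        offset += i * stride
        stride *= n
    return offset


def thread_to_index_v_fastest(t, f=0):
    """Assumed mapping A: consecutive threads walk along V first."""
    nv, ni, nj, _, _ = DIMS
    v = t % nv
    i = (t // nv) % ni
    j = (t // (nv * ni)) % nj
    h = t // (nv * ni * nj)
    return (v, i, j, f, h)


def thread_to_index_ij_fastest(t, f=0):
    """Assumed mapping B: consecutive threads walk along I, then J, then V."""
    nv, ni, nj, _, _ = DIMS
    i = t % ni
    j = (t // ni) % nj
    v = (t // (ni * nj)) % nv
    h = t // (ni * nj * nv)
    return (v, i, j, f, h)


def warp_offsets(mapping, warp_size=32):
    """Linear offsets read by the first `warp_size` threads under `mapping`."""
    return [linear_offset(mapping(t), DIMS) for t in range(warp_size)]


def distinct_segments(offsets, elems_per_segment=32):
    """Count distinct segments touched (32 four-byte elements = 128 bytes)."""
    return len({o // elems_per_segment for o in offsets})


if __name__ == "__main__":
    for name, mapping in [("V fastest", thread_to_index_v_fastest),
                          ("I/J fastest", thread_to_index_ij_fastest)]:
        segs = distinct_segments(warp_offsets(mapping))
        print(f"{name}: {segs} segment(s) touched by one warp")
```

With these assumed sizes, mapping A touches a single segment per warp, while mapping B touches 16, because neighbouring threads are separated by a stride of at least the V extent. Which mapping (if either) corresponds to the actual broadcast kernels is not stated in this issue.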
### Tasks
- [ ] https://github.com/CliMA/ClimaCore.jl/pull/1926