Add CPU runs to benchmark pipeline #1218

Draft: wants to merge 10 commits into base: main
6 changes: 3 additions & 3 deletions .buildkite/Manifest-v1.11.toml
@@ -1,6 +1,6 @@
# This file is machine-generated - editing it directly is not advised

julia_version = "1.11.5"
julia_version = "1.11.4"
manifest_format = "2.0"
project_hash = "c057205b5dfa45d173ab89f45dff9c1d2ab4a915"

@@ -477,7 +477,7 @@ uuid = "1ecacbb8-0713-4841-9a07-eb5aa8a2d53f"
version = "0.2.15"

[[deps.ClimaLand]]
deps = ["ClimaComms", "ClimaCore", "ClimaDiagnostics", "ClimaTimeSteppers", "ClimaUtilities", "Dates", "DocStringExtensions", "Insolation", "Interpolations", "LazyArtifacts", "LazyBroadcast", "LinearAlgebra", "NCDatasets", "SciMLBase", "StaticArrays", "SurfaceFluxes", "Thermodynamics"]
deps = ["ClimaComms", "ClimaCore", "ClimaDiagnostics", "ClimaTimeSteppers", "ClimaUtilities", "Dates", "DocStringExtensions", "Insolation", "Interpolations", "LazyArtifacts", "LazyBroadcast", "LinearAlgebra", "NCDatasets", "NVTX", "SciMLBase", "StaticArrays", "SurfaceFluxes", "Thermodynamics"]
path = ".."
uuid = "08f4d4ce-cf43-44bb-ad95-9d2d5f413532"
version = "0.16.3"
@@ -2261,7 +2261,7 @@ version = "2.5.4+0"
[[deps.OpenLibm_jll]]
deps = ["Artifacts", "Libdl"]
uuid = "05823500-19ac-5b8b-9628-191a04bc5112"
version = "0.8.5+0"
version = "0.8.1+4"

[[deps.OpenMPI_jll]]
deps = ["Artifacts", "CompilerSupportLibraries_jll", "Hwloc_jll", "JLLWrappers", "LazyArtifacts", "Libdl", "MPIPreferences", "TOML", "Zlib_jll"]
2 changes: 1 addition & 1 deletion .buildkite/Manifest.toml
@@ -474,7 +474,7 @@ uuid = "1ecacbb8-0713-4841-9a07-eb5aa8a2d53f"
version = "0.2.15"

[[deps.ClimaLand]]
deps = ["ClimaComms", "ClimaCore", "ClimaDiagnostics", "ClimaTimeSteppers", "ClimaUtilities", "Dates", "DocStringExtensions", "Insolation", "Interpolations", "LazyArtifacts", "LazyBroadcast", "LinearAlgebra", "NCDatasets", "SciMLBase", "StaticArrays", "SurfaceFluxes", "Thermodynamics"]
deps = ["ClimaComms", "ClimaCore", "ClimaDiagnostics", "ClimaTimeSteppers", "ClimaUtilities", "Dates", "DocStringExtensions", "Insolation", "Interpolations", "LazyArtifacts", "LazyBroadcast", "LinearAlgebra", "NCDatasets", "NVTX", "SciMLBase", "StaticArrays", "SurfaceFluxes", "Thermodynamics"]
path = ".."
uuid = "08f4d4ce-cf43-44bb-ad95-9d2d5f413532"
version = "0.16.3"
36 changes: 29 additions & 7 deletions .buildkite/target/pipeline.yml
@@ -30,13 +30,39 @@ steps:

- wait

- group: "Benchmark (CPU)"
steps:
- label: ":bucket: Bucket (CPU)"
command: "julia --color=yes --project=.buildkite experiments/benchmarks/bucket.jl"
env:
CLIMACOMMS_DEVICE: CPU
agents:
slurm_mem: 8GB
artifact_paths:
- "bucket_benchmark_cpu/flame*html"

- group: "Target Benchmark"
- label: ":droplet: RichardsModel (CPU)"
command: "julia --color=yes --project=.buildkite experiments/benchmarks/richards.jl"
env:
CLIMACOMMS_DEVICE: CPU
agents:
slurm_mem: 8GB
artifact_paths:
- "richards_benchmark_cpu/flame*html"

- label: ":snow_capped_mountain: Snowy Land (CPU)"
command: "julia --color=yes --project=.buildkite experiments/benchmarks/snowy_land.jl"
env:
CLIMACOMMS_DEVICE: CPU
agents:
slurm_mem: 8GB
artifact_paths:
- "snowy_land_benchmark_cpu/flame*html"

- group: "Target Benchmark (GPU)"
steps:
- label: ":bucket: Bucket"
command: "julia --color=yes --project=.buildkite experiments/benchmarks/bucket.jl"
artifact_paths:
- "bucket_benchmark_gpu/flame*html"
env:
CLIMACOMMS_DEVICE: CUDA
agents:
@@ -45,8 +71,6 @@ steps:

- label: ":droplet: RichardsModel"
command: "julia --color=yes --project=.buildkite experiments/benchmarks/richards.jl"
artifact_paths:
- "richards_benchmark_gpu/flame*html"
env:
CLIMACOMMS_DEVICE: CUDA
agents:
@@ -55,8 +79,6 @@ steps:

- label: ":snow_capped_mountain: Snowy Land"
command: "julia --color=yes --project=.buildkite experiments/benchmarks/snowy_land.jl"
artifact_paths:
- "snowy_land_benchmark_gpu/flame*html"
env:
CLIMACOMMS_DEVICE: CUDA
agents:
2 changes: 2 additions & 0 deletions Project.toml
@@ -17,6 +17,7 @@ LazyArtifacts = "4af54fe1-eca0-43a8-85a7-787d91b784e3"
LazyBroadcast = "9dccce8e-a116-406d-9fcc-a88ed4f510c8"
LinearAlgebra = "37e2e46d-f89d-539d-b4ee-838fcccc9c8e"
NCDatasets = "85f8d34a-cbdd-5861-8df4-14fed0d494ab"
NVTX = "5da4648a-3479-48b8-97b9-01cb529c0a1f"
SciMLBase = "0bca4576-84f4-4d90-8ffe-ffa030f20462"
StaticArrays = "90137ffa-7385-5640-81b9-e52037218182"
SurfaceFluxes = "49b00bb7-8bd4-4f2b-b78c-51cd0450215f"
@@ -55,6 +56,7 @@ LazyArtifacts = "1"
LazyBroadcast = "1.0.0"
LinearAlgebra = "1"
NCDatasets = "0.14"
NVTX = "1.0.0"
SciMLBase = "2.53"
StaticArrays = "1.6"
StatsBase = "0.34"
15 changes: 8 additions & 7 deletions experiments/benchmarks/bucket.jl
@@ -19,6 +19,7 @@ using Dates
using Test
import ClimaComms
ClimaComms.@import_required_backends
import CUDA
import ClimaUtilities.TimeVaryingInputs: TimeVaryingInput

import ClimaTimeSteppers as CTS
@@ -59,9 +60,7 @@ device = ClimaComms.device()
device_suffix = device isa ClimaComms.CPUSingleThreaded ? "cpu" : "gpu"

earth_param_set = ClimaLand.Parameters.LandParameters(FT);
outdir = "bucket_benchmark_$(device_suffix)"
@info "device: $device"
!ispath(outdir) && mkpath(outdir)


function setup_prob(t0, tf, Δt; nelements = (200, 7))
# We set up the problem in a function so that we can make multiple copies (for profiling)
@@ -150,8 +149,7 @@ function setup_prob(t0, tf, Δt; nelements = (200, 7))
updateat = collect(t0:(3Δt):tf)
drivers = ClimaLand.get_drivers(model)
updatefunc = ClimaLand.make_update_drivers(drivers)
driver_cb = ClimaLand.DriverUpdateCallback(updateat, updatefunc)
cb = SciMLBase.CallbackSet(driver_cb)
cb = ClimaLand.DriverUpdateCallback(updateat, updatefunc)

return prob, cb
end
@@ -175,6 +173,9 @@ function setup_simulation(; greet = false)
end
parsed_args = parse_commandline()
profiler = parsed_args["profiler"]
outdir = "bucket_benchmark_$(device_suffix)"
@info "device: $device"
!ispath(outdir) && mkpath(outdir)
prob, ode_algo, Δt, cb = setup_simulation(; greet = true)
@info "Starting profiling with $profiler"
if profiler == "flamegraph"
@@ -213,7 +214,6 @@ if profiler == "flamegraph"
@info "Done profiling"

if ClimaComms.device() isa ClimaComms.CUDADevice
import CUDA
lprob, lode_algo, lΔt, lcb = setup_simulation()
p = CUDA.@profile SciMLBase.solve(
lprob,
@@ -256,7 +256,8 @@ if profiler == "flamegraph"
@info "Saved allocation flame to $alloc_flame_file"
end

if get(ENV, "BUILDKITE_PIPELINE_SLUG", nothing) == "climaland-benchmark"
if get(ENV, "BUILDKITE_PIPELINE_SLUG", nothing) == "climaland-benchmark" &&
ClimaComms.device() isa ClimaComms.CUDADevice
PREVIOUS_BEST_TIME = 0.333
if average_timing_s > PREVIOUS_BEST_TIME + std_timing_s
@info "Possible performance regression, previous average time was $(PREVIOUS_BEST_TIME)"
5 changes: 3 additions & 2 deletions experiments/benchmarks/richards.jl
@@ -33,6 +33,7 @@ import NCDatasets

import ClimaComms
ClimaComms.@import_required_backends
import CUDA
import ClimaUtilities.TimeVaryingInputs: TimeVaryingInput
import ClimaUtilities.Regridders: InterpolationsRegridder
import ClimaTimeSteppers as CTS
@@ -262,7 +263,6 @@ if profiler == "flamegraph"
@info "Done profiling"

if ClimaComms.device() isa ClimaComms.CUDADevice
import CUDA
lprob, lode_algo, lΔt, lcb = setup_simulation()
p = CUDA.@profile SciMLBase.solve(
lprob,
@@ -305,7 +305,8 @@ if profiler == "flamegraph"
@info "Saved allocation flame to $alloc_flame_file"
end

if get(ENV, "BUILDKITE_PIPELINE_SLUG", nothing) == "climaland-benchmark"
if get(ENV, "BUILDKITE_PIPELINE_SLUG", nothing) == "climaland-benchmark" &&
ClimaComms.device() isa ClimaComms.CUDADevice
PREVIOUS_BEST_TIME = 0.6
if average_timing_s > PREVIOUS_BEST_TIME + std_timing_s
@info "Possible performance regression, previous average time was $(PREVIOUS_BEST_TIME)"
5 changes: 3 additions & 2 deletions experiments/benchmarks/snowy_land.jl
@@ -29,6 +29,7 @@ delete!(ENV, "JULIA_CUDA_MEMORY_POOL")
import SciMLBase
import ClimaComms
ClimaComms.@import_required_backends
import CUDA
import ClimaTimeSteppers as CTS
import ClimaCore
@show pkgversion(ClimaCore)
@@ -432,7 +433,6 @@ if profiler == "flamegraph"
@info "Done profiling"

if ClimaComms.device() isa ClimaComms.CUDADevice
import CUDA
lprob, lode_algo, lΔt, lcb = setup_simulation()
p = CUDA.@profile SciMLBase.solve(
lprob,
@@ -474,7 +474,8 @@ if profiler == "flamegraph"
ProfileCanvas.html_file(alloc_flame_file, profile)
@info "Saved allocation flame to $alloc_flame_file"
end
if get(ENV, "BUILDKITE_PIPELINE_SLUG", nothing) == "climaland-benchmark"
if get(ENV, "BUILDKITE_PIPELINE_SLUG", nothing) == "climaland-benchmark" &&
ClimaComms.device() isa ClimaComms.CUDADevice
PREVIOUS_BEST_TIME = 0.74
if average_timing_s > PREVIOUS_BEST_TIME + std_timing_s
@info "Possible performance regression, previous average time was $(PREVIOUS_BEST_TIME)"
16 changes: 8 additions & 8 deletions src/ClimaLand.jl
@@ -4,7 +4,7 @@ using DocStringExtensions
using ClimaCore
using LazyBroadcast: lazy
import ClimaCore: Fields, Spaces

using NVTX
include("shared_utilities/Parameters.jl")
import .Parameters as LP

@@ -156,7 +156,7 @@ function make_imp_tendency(land::AbstractLandModel)
map(x -> make_compute_imp_tendency(getproperty(land, x)), components)
update_aux! = make_update_aux(land)
update_boundary_fluxes! = make_update_boundary_fluxes(land)
function imp_tendency!(dY, Y, p, t)
NVTX.@annotate function imp_tendency!(dY, Y, p, t)
update_aux!(p, Y, t)
update_boundary_fluxes!(p, Y, t)
for f! in compute_imp_tendency_list
@@ -172,7 +172,7 @@ function make_exp_tendency(land::AbstractLandModel)
map(x -> make_compute_exp_tendency(getproperty(land, x)), components)
update_aux! = make_update_aux(land)
update_boundary_fluxes! = make_update_boundary_fluxes(land)
function exp_tendency!(dY, Y, p, t)
NVTX.@annotate function exp_tendency!(dY, Y, p, t)
update_aux!(p, Y, t)
update_boundary_fluxes!(p, Y, t)
for f! in compute_exp_tendency_list
@@ -186,7 +186,7 @@ function make_update_aux(land::AbstractLandModel)
components = land_components(land)
update_aux_function_list =
map(x -> make_update_aux(getproperty(land, x)), components)
function update_aux!(p, Y, t)
NVTX.@annotate function update_aux!(p, Y, t)
for f! in update_aux_function_list
f!(p, Y, t)
end
@@ -198,7 +198,7 @@ function make_update_boundary_fluxes(land::AbstractLandModel)
components = land_components(land)
update_fluxes_function_list =
map(x -> make_update_boundary_fluxes(getproperty(land, x)), components)
function update_boundary_fluxes!(p, Y, t)
NVTX.@annotate function update_boundary_fluxes!(p, Y, t)
for f! in update_fluxes_function_list
f!(p, Y, t)
end
@@ -210,7 +210,7 @@ function make_compute_jacobian(land::AbstractLandModel)
components = land_components(land)
compute_jacobian_function_list =
map(x -> make_compute_jacobian(getproperty(land, x)), components)
function compute_jacobian!(jacobian, Y, p, dtγ, t)
NVTX.@annotate function compute_jacobian!(jacobian, Y, p, dtγ, t)
for f! in compute_jacobian_function_list
f!(jacobian, Y, p, dtγ, t)
end
@@ -242,7 +242,7 @@ the same function for the component models.

The `sfc_cache` field is available as scratch space.
"""
function total_energy_per_area!(
NVTX.@annotate function total_energy_per_area!(
surface_field,
land::AbstractLandModel,
Y,
@@ -276,7 +276,7 @@ the same function for the component models.

The `sfc_cache` field is available as scratch space.
"""
function total_liq_water_vol_per_area!(
NVTX.@annotate function total_liq_water_vol_per_area!(
surface_field,
land::AbstractLandModel,
Y,
8 changes: 4 additions & 4 deletions src/integrated/land.jl
@@ -186,7 +186,7 @@ end
"""
ClimaLand.land_components(land::LandModel)

Returns the components of the `LandModel`.
Returns the components of the `LandModel`.

Currently, this method is required in order to preserve an ordering in how
we update the component models' auxiliary states. The canopy update_aux! step
@@ -311,7 +311,7 @@ function make_update_boundary_fluxes(
update_canopy_bf! = make_update_boundary_fluxes(land.canopy)
update_snow_bf! = make_update_boundary_fluxes(land.snow)

function update_boundary_fluxes!(p, Y, t)
NVTX.@annotate function update_boundary_fluxes!(p, Y, t)
earth_param_set = land.soil.parameters.earth_param_set
# update root extraction
update_root_extraction!(p, Y, t, land) # defined in src/integrated/soil_canopy_root_interactions.jl
@@ -377,7 +377,7 @@ where the canopy LAI is zero. Note also that this serves the role of
`canopy_radiant_energy_fluxes!`, which computes the net canopy radiation
when the Canopy is run in standalone mode.
"""
function lsm_radiant_energy_fluxes!(
NVTX.@annotate function lsm_radiant_energy_fluxes!(
p,
land::LandModel{FT},
canopy_radiation::Canopy.AbstractRadiationModel{FT},
@@ -535,7 +535,7 @@ end
compute_liquid_influx(p,
model,
prognostic_land_components::Val{(:canopy, :snow, :soil, :soilco2)},
)
)

Returns the liquid water volume flux at the surface of the soil; uses
the same method as the soil+snow integrated model.
7 changes: 4 additions & 3 deletions src/shared_utilities/drivers.jl
@@ -14,6 +14,7 @@ using Insolation
using SurfaceFluxes
import SurfaceFluxes.Parameters as SFP
using StaticArrays
using NVTX
import ..Parameters as LP
export AbstractAtmosphericDrivers,
AbstractRadiativeDrivers,
@@ -1207,7 +1208,7 @@ function make_update_drivers(driver_tuple::Tuple)
update_driver_list = map(driver_tuple) do (driver)
make_update_drivers(driver)
end
function update_drivers!(p, t)
NVTX.@annotate function update_drivers!(p, t)
for ud! in update_driver_list
ud!(p, t)
end
@@ -1354,8 +1355,8 @@ The argument `era5_ncdata_path` is either a list of nc files, each with all of t

########## WARNING ##########

High wind speed anomalies (10-100x increase and decrease over a period of a several hours) appear in the ERA5
reanalysis data. These generate very large surface fluxes (due to wind speeds up to 300 m/s), which lead to instability. The kwarg max_wind_speed,
High wind speed anomalies (10-100x increase and decrease over a period of a several hours) appear in the ERA5
reanalysis data. These generate very large surface fluxes (due to wind speeds up to 300 m/s), which lead to instability. The kwarg max_wind_speed,
with a value give in m/s,
is used to clip these if it is not `nothing`.
See: https://confluence.ecmwf.int/display/CKB/ERA5%3A+large+10m+winds
3 changes: 2 additions & 1 deletion src/shared_utilities/implicit_timestepping.jl
@@ -3,6 +3,7 @@ import ClimaCore.MatrixFields: @name, ⋅
using ClimaCore: Spaces
import LinearAlgebra
import LinearAlgebra: I
using NVTX

export make_jacobian,
make_compute_jacobian, set_dfluxBCdY!, FieldMatrixWithSolver
@@ -24,7 +25,7 @@ function make_jacobian(model::AbstractModel)
update_aux! = make_update_aux(model)
update_boundary_fluxes! = make_update_boundary_fluxes(model)
compute_jacobian! = make_compute_jacobian(model)
function jacobian!(W, Y, p, dtγ, t)
NVTX.@annotate function jacobian!(W, Y, p, dtγ, t)
update_aux!(p, Y, t)
update_boundary_fluxes!(p, Y, t)
compute_jacobian!(W, Y, p, dtγ, t)
5 changes: 4 additions & 1 deletion src/shared_utilities/utils.jl
@@ -2,6 +2,7 @@ import ClimaCore
import SciMLBase
import ClimaDiagnostics.Schedules: EveryCalendarDtSchedule
import Dates
using NVTX

using ClimaComms
using ClimaUtilities.ClimaArtifacts
@@ -661,7 +662,9 @@ function NaNCheckCallback(
cond = let schedule = schedule
(u, t, integrator) -> schedule(integrator)
end
affect! = (integrator) -> call_count_nans_state(integrator.u; mask)
NVTX.@annotate function affect!(integrator)
call_count_nans_state(integrator.u; mask)
end

SciMLBase.DiscreteCallback(cond, affect!)
end