Skip to content

Commit 2ff6481

Browse files
authored
Merge pull request #3863 from CliMA/j/multi_smooth
Extend the ERA5 driven SCM to multi-day and smooth data
2 parents fc1bacd + acb5d67 commit 2ff6481

File tree

7 files changed

+511
-163
lines changed

7 files changed

+511
-163
lines changed

config/model_configs/prognostic_edmfx_tv_era5driven_column.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -32,7 +32,7 @@ dz_bottom: 30
3232
perturb_initstate: false
3333
dt: "10secs"
3434
dt_rad: "10mins"
35-
t_end: "8hours"
35+
t_end: "30hours"
3636
cloud_model: "quadrature_sgs"
3737
call_cloud_diagnostics_per_stage: true
3838
toml: [toml/prognostic_edmfx_calibrated.toml]

docs/src/single_column_prospect.md

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -42,7 +42,12 @@ start_date: "20070701"
4242
site_latitude: 17.0
4343
site_longitude: -149.0
4444
```
45-
By this point, the first 4 entries are intuitive. We need to dispatch over each of these methods to setup the forcing for each component of the model. To obtain the observations, now note that instead of directly specifying a file we must specify a `start_date`, `site_latitude`, and `site_longitude`. This is because we now use `ClimaArtifacts.jl` to store data to ensure reproducibility of our simulation and results. The data is generated by downloading from ECMWF and further documentation for ERA5 data download can be found either directly on the ECMWF page and `ClimaArtifacts.jl`. Note that the profiles, surface temperature, and surface fluxes cannot be obtained from a single request and so together we need 3 files for all the data. We include a script at `src/utils/era5_observations_to_forcing_file.jl` which extracts the profiles and computes the tendencies needed for the simulation from the raw ERA5 reanalysis files. We store the observations directly into an artifact `era5_hourly_atmos_processed` to eliminate the need to reprocess specific sites and locations. This setup means that users are free to choose sites globally at any time at which ERA5 data is available. Unfortunately, global hourly renanalysis is too large to store in an artifact and so we have currently only provided support for the first 5 days of July 2007 in the tropical Pacific, stored in `era5_hourly_atmos_raw`, only available on the `clima` and Caltech `HPC` servers.
45+
By this point, the first 4 entries are intuitive. We need to dispatch over each of these methods to setup the forcing for each component of the model. To obtain the observations, now note that instead of directly specifying a file we must specify a `start_date`, `site_latitude`, and `site_longitude`. This is because we now use `ClimaArtifacts.jl` to store data to ensure reproducibility of our simulation and results. `start_date` should be in in YYYMMDD format, `site_latitude` should be in degrees between -90 and 90, and `site_longitude` should be between -180 and 180.
46+
47+
!!! note
48+
Depending on the amount of smoothing and data resolution, points near the boundaries will throw index errors. With default settings, users should stay at least 5 points away from the poles (1° for ERA5 data) for smoothing (4 points) and gradients (one extra point).
49+
50+
The data is generated by downloading from ECMWF and further documentation for ERA5 data download can be found either directly on the ECMWF page and `ClimaArtifacts.jl`. Note that the profiles, surface temperature, and surface fluxes cannot be obtained from a single request and so together we need 3 files for all the data. We include a script at `src/utils/era5_observations_to_forcing_file.jl` which extracts the profiles and computes the tendencies needed for the simulation from the raw ERA5 reanalysis files. We store the observations directly into an artifact `era5_hourly_atmos_processed` to eliminate the need to reprocess specific sites and locations. This setup means that users are free to choose sites globally at any time at which ERA5 data is available. Unfortunately, global hourly renanalysis is too large to store in an artifact and so we have currently only provided support for the first 5 days of July 2007 in the tropical Pacific, stored in `era5_hourly_atmos_raw`, only available on the `clima` and Caltech `HPC` servers.
4651
#### Running the Reanalysis-driven case at different times and locations
4752
You need 3 separate files with specific variables and naming convention for the data processing script to work.
4853
1. Hourly profiles with variables, following ERA5 naming convention, including `t`, `q`, `u`, `v`, `w`, `z`, `clwc`, `ciwc`. This file should be stored in the appropriate artifacts directory with the following naming scheme `"forcing_and_cloud_hourly_profiles_$(start_date).nc"` where `start_date` should specify the date data starts on formatted YYYYMMDD. We require `clwc` and `ciwc` profiles because these are typical targets for calibration but are not needed to run the simulation directly.

src/solver/model_getters.jl

Lines changed: 2 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -461,10 +461,8 @@ function get_external_forcing_model(parsed_args, ::Type{FT}) where {FT}
461461
)
462462
@info "External forcing file $(external_forcing_file) does not exist or does not cover the expected time range. Generating it now."
463463
# generate forcing from provided era5 data paths
464-
generate_external_era5_forcing_file(
465-
parsed_args["site_latitude"],
466-
parsed_args["site_longitude"],
467-
parsed_args["start_date"],
464+
generate_multiday_era5_external_forcing_file(
465+
parsed_args,
468466
external_forcing_file,
469467
FT,
470468
)

src/utils/era5_observations_to_forcing_file.jl

Lines changed: 177 additions & 41 deletions
Original file line numberDiff line numberDiff line change
@@ -31,6 +31,10 @@ function get_external_forcing_file_path(
3131
@clima_artifact("era5_hourly_atmos_processed"),
3232
)
3333
start_date = parsed_args["start_date"]
34+
t_end = get(parsed_args, "t_end", "23hours") # generate a single day file if t_end is not specified
35+
end_time =
36+
DateTime(start_date, "yyyymmdd") + Dates.Second(time_to_seconds(t_end))
37+
end_date = Dates.format(end_time, "yyyymmdd")
3438
# round to era5 quarter degree resolution for site selection
3539
site_latitude = round(parsed_args["site_latitude"] * 4) / 4
3640
site_longitude = round(parsed_args["site_longitude"] * 4) / 4
@@ -42,7 +46,7 @@ function get_external_forcing_file_path(
4246

4347
return joinpath(
4448
data_dir,
45-
"tv_forcing_$(site_latitude)_$(site_longitude)_$(start_date).nc",
49+
"tv_forcing_$(site_latitude)_$(site_longitude)_$(start_date)_$(end_date).nc",
4650
)
4751
end
4852

@@ -99,20 +103,20 @@ function get_horizontal_tendencies(
99103
dy = 2 * π * rearth / 360 * latitudinal_resolution
100104

101105
# get velocities at site location
102-
ᶜu = column_ds["u"][lon_index, lat_index, :, :]
103-
ᶜv = column_ds["v"][lon_index, lat_index, :, :]
106+
ᶜu = smooth_4D_era5(column_ds, "u", lon_index, lat_index)
107+
ᶜv = smooth_4D_era5(column_ds, "v", lon_index, lat_index)
104108

105109
# get temperature at N S E W of center for gradient calculation
106-
ʷT = column_ds["t"][lon_index - 1, lat_index, :, :]
107-
ⁿT = column_ds["t"][lon_index, lat_index + 1, :, :]
108-
ˢT = column_ds["t"][lon_index, lat_index - 1, :, :]
109-
ᵉT = column_ds["t"][lon_index + 1, lat_index, :, :]
110+
ʷT = smooth_4D_era5(column_ds, "t", lon_index - 1, lat_index)
111+
ⁿT = smooth_4D_era5(column_ds, "t", lon_index, lat_index + 1)
112+
ˢT = smooth_4D_era5(column_ds, "t", lon_index, lat_index - 1)
113+
ᵉT = smooth_4D_era5(column_ds, "t", lon_index + 1, lat_index)
110114

111115
# get specific humidity at N S E W of center for gradient calculation
112-
ʷq = column_ds["q"][lon_index - 1, lat_index, :, :]
113-
ⁿq = column_ds["q"][lon_index, lat_index + 1, :, :]
114-
ˢq = column_ds["q"][lon_index, lat_index - 1, :, :]
115-
ᵉq = column_ds["q"][lon_index + 1, lat_index, :, :]
116+
ʷq = smooth_4D_era5(column_ds, "q", lon_index - 1, lat_index)
117+
ⁿq = smooth_4D_era5(column_ds, "q", lon_index, lat_index + 1)
118+
ˢq = smooth_4D_era5(column_ds, "q", lon_index, lat_index - 1)
119+
ᵉq = smooth_4D_era5(column_ds, "q", lon_index + 1, lat_index)
116120

117121
# temperature and specific humidity advective tendency at center
118122
tntha = -(ᶜu .* (ᵉT .- ʷT) ./ (2 * dx) .+ ᶜv .* (ⁿT .- ˢT) ./ (2 * dy))
@@ -182,7 +186,7 @@ function get_coszen_inst(
182186
end
183187

184188
"""
185-
generate_external_era5_forcing_file(lat, lon, start_date, forcing_file_path, FT; time_resolution = 3600, data_dir = @clima_artifact("era5_hourly_atmos_raw"))
189+
generate_external_era5_forcing_file(lat, lon, start_date, forcing_file_path, FT; time_resolution = 3600, data_dir = @clima_artifact("era5_hourly_atmos_raw"), smooth_amount = 4)
186190
187191
Generate an external forcing file for the ClimaAtmos single column model.
188192
@@ -205,13 +209,15 @@ Note:
205209
`2x2x(pressure levels)x(time)`.
206210
- This expansion is necessary for interpolation to the model grid and applies even to surface variables
207211
in the current implementation.
212+
- The end time of the simulation is inferred from the start date and the simulation time, `t_end`.
208213
"""
209214
function generate_external_era5_forcing_file(
210215
lat,
211216
lon,
212217
start_date,
213218
forcing_file_path,
214219
FT;
220+
smooth_amount = 4,
215221
time_resolution = FT(3600), # size of accumulated variable period in seconds (3600 for hourly, 86400 for daily and monthly)
216222
data_dir = @clima_artifact("era5_hourly_atmos_raw"),
217223
)
@@ -243,25 +249,31 @@ function generate_external_era5_forcing_file(
243249
lat_index = findfirst(tvforcing["latitude"][:] .== lat)
244250
@assert lon_index != nothing "Longitude $lon not found in forcing_and_cloud_hourly_profiles_$(start_date).nc"
245251
@assert lat_index != nothing "Latitude $lat not found in forcing_and_cloud_hourly_profiles_$(start_date).nc"
246-
@assert 1 < lon_index < length(tvforcing["longitude"][:]) "Longitude index $lon is not covered by forcing_and_cloud_hourly_profiles_$(start_date).nc"
247-
@assert 1 < lat_index < length(tvforcing["latitude"][:]) "Latitude index $lat is not covered by forcing_and_cloud_hourly_profiles_$(start_date).nc"
252+
@assert smooth_amount + 1 <
253+
lon_index <
254+
length(tvforcing["longitude"][:]) - smooth_amount "Longitude $lon is not covered by profile forcing file with smoothing amount $smooth_amount"
255+
@assert smooth_amount + 1 <
256+
lat_index <
257+
length(tvforcing["latitude"][:]) - smooth_amount "Latitude $lat is not covered by profile forcing file with smoothing amount $smooth_amount"
248258

249259
sim_forcing = Dict()
250260
sim_forcing["time"] = tvforcing["valid_time"][:]
251261
sim_forcing["pressure_level"] = tvforcing["pressure_level"][:]
252-
sim_forcing["ua"] = tvforcing["u"][lon_index, lat_index, :, :]
253-
sim_forcing["va"] = tvforcing["v"][lon_index, lat_index, :, :]
254-
sim_forcing["wap"] = tvforcing["w"][lon_index, lat_index, :, :] # era5 w is in Pa/s, this is confusing notation
255-
sim_forcing["hus"] = tvforcing["q"][lon_index, lat_index, :, :]
256-
sim_forcing["ta"] = tvforcing["t"][lon_index, lat_index, :, :]
257-
sim_forcing["zg"] = tvforcing["z"][lon_index, lat_index, :, :]
258-
sim_forcing["z"] =
259-
tvforcing["z"][lon_index, lat_index, :, :] /
260-
external_tv_params.gravitational_acceleration # height in meters
261262

262-
# add cloud information - this is used for profile calibration and not for the forcing but saves multiple files for profile calibration
263-
sim_forcing["clw"] = tvforcing["clwc"][lon_index, lat_index, :, :]
264-
sim_forcing["cli"] = tvforcing["ciwc"][lon_index, lat_index, :, :]
263+
name_map = clima_to_era5_name_dict()
264+
for clima_name in ["ua", "va", "wap", "hus", "ta", "zg", "clw", "cli"]
265+
era5_name = name_map[clima_name]
266+
sim_forcing[clima_name] = smooth_4D_era5(
267+
tvforcing,
268+
era5_name,
269+
lon_index,
270+
lat_index;
271+
smooth_amount = smooth_amount,
272+
)
273+
end
274+
275+
sim_forcing["z"] =
276+
sim_forcing["zg"] / external_tv_params.gravitational_acceleration # height in meters
265277

266278
# compute subsidence
267279
pressure = tvforcing["pressure_level"] .* 100 # convert hPa to Pa
@@ -372,36 +384,55 @@ function generate_external_era5_forcing_file(
372384
lat_index_surf = findfirst(tv_accum["latitude"][:] .== lat)
373385
@assert lon_index_surf != nothing "Longitude $lon not found in hourly_accum_$(start_date).nc"
374386
@assert lat_index_surf != nothing "Latitude $lat not found in hourly_accum_$(start_date).nc"
375-
matching_time_indices =
376-
findall(in(tvforcing["valid_time"][:]), tv_accum["valid_time"][:])
387+
@assert smooth_amount + 1 <
388+
lon_index_surf <
389+
length(tv_accum["longitude"][:]) - smooth_amount "Longitude $lon is not covered by accumulated forcing file with smoothing amount $smooth_amount"
390+
@assert smooth_amount + 1 <
391+
lat_index_surf <
392+
length(tv_accum["latitude"][:]) - smooth_amount "Latitude $lat is not covered by accumulated forcing file with smoothing amount $smooth_amount"
377393

378394
defVar(ds, "hfls", FT, ("x", "y", "z", "time"))
379395
defVar(ds, "hfss", FT, ("x", "y", "z", "time"))
380-
# sensible and latent heat fluxes are opposite
396+
# sensible and latent heat fluxes are defined upwards in CliMA, also need to divide by the aggregation
381397
slhf =
382-
-tv_accum["slhf"][
398+
-smooth_3D_era5(
399+
tv_accum,
400+
"slhf",
383401
lon_index_surf,
384-
lat_index_surf,
385-
matching_time_indices,
386-
] / time_resolution
402+
lat_index_surf;
403+
smooth_amount = smooth_amount,
404+
) / time_resolution
405+
387406
sshf =
388-
-tv_accum["sshf"][
407+
-smooth_3D_era5(
408+
tv_accum,
409+
"sshf",
389410
lon_index_surf,
390-
lat_index_surf,
391-
matching_time_indices,
392-
] / time_resolution
411+
lat_index_surf;
412+
smooth_amount = smooth_amount,
413+
) / time_resolution
393414

394415
# surface temperature
395416
lon_index_surf2 = findfirst(tv_inst["longitude"][:] .== lon)
396417
lat_index_surf2 = findfirst(tv_inst["latitude"][:] .== lat)
397418
@assert lon_index_surf2 != nothing "Longitude $lon not found in hourly_inst_$(start_date).nc"
398419
@assert lat_index_surf2 != nothing "Latitude $lat not found in hourly_inst_$(start_date).nc"
399-
matching_time_indices =
400-
findall(in(tvforcing["valid_time"][:]), tv_inst["valid_time"][:])
420+
@assert smooth_amount + 1 <
421+
lon_index_surf2 <
422+
length(tv_inst["longitude"][:]) - smooth_amount "Longitude $lon is not covered by accumulated forcing file with smoothing amount $smooth_amount"
423+
@assert smooth_amount + 1 <
424+
lat_index_surf2 <
425+
length(tv_inst["latitude"][:]) - smooth_amount "Latitude $lat is not covered by accumulated forcing file with smoothing amount $smooth_amount"
401426

402427
defVar(ds, "ts", FT, ("x", "y", "z", "time"))
403-
skt =
404-
tv_inst["skt"][lon_index_surf2, lat_index_surf2, matching_time_indices]
428+
skt = smooth_3D_era5(
429+
tv_inst,
430+
"skt",
431+
lon_index_surf2,
432+
lat_index_surf2;
433+
smooth_amount = smooth_amount,
434+
)
435+
405436
for time_ind in 1:ds.dim["time"]
406437
ds["coszen"][:, :, :, time_ind] .= coszen_list[time_ind][1]
407438
ds["rsdt"][:, :, :, time_ind] .= coszen_list[time_ind][2]
@@ -416,3 +447,108 @@ function generate_external_era5_forcing_file(
416447
close(tv_inst)
417448
close(tv_accum)
418449
end
450+
451+
"""
452+
generate_multiday_era5_external_forcing_file(parsed_args, forcing_file_path, FT; smooth_amount = 4, time_resolution = FT(3600), input_data_dir = @clima_artifact("era5_hourly_atmos_raw"), output_data_dir = @clima_artifact("era5_hourly_atmos_processed"))
453+
454+
Generate an external forcing file for multi-day single column model runs, reusing daily forcing files if they already exist.
455+
456+
# Arguments
457+
- `parsed_args`: Dictionary containing simulation parameters including start_date, t_end, site_latitude, and site_longitude
458+
- `forcing_file_path`: Path where the concatenated forcing file will be saved
459+
- `FT`: Floating point type for the simulation
460+
461+
# Keyword Arguments
462+
- `smooth_amount`: Amount of temporal smoothing to apply (default: 4 - 1° on each side)
463+
- `time_resolution`: Time resolution in seconds for accumulated variables (defined in ERA5 docs; 3600 for hourly data; 86400 for daily and monthly data)
464+
- `input_data_dir`: Directory containing raw ERA5 data files, artifact directory by default
465+
- `output_data_dir`: Directory where individual daily forcing files are stored
466+
"""
467+
function generate_multiday_era5_external_forcing_file(
468+
parsed_args,
469+
forcing_file_path,
470+
FT;
471+
smooth_amount = 4,
472+
time_resolution = FT(3600), # size of accumulated variable period in seconds (3600 for hourly, 86400 for daily and monthly)
473+
input_data_dir = @clima_artifact("era5_hourly_atmos_raw"),
474+
output_data_dir = get(ENV, "BUILDKITE", "") == "true" ? mktempdir() :
475+
@clima_artifact("era5_hourly_atmos_processed"),
476+
)
477+
# run generate_external_era5_forcing_file for each day if its processed data file not found
478+
# get range of starttimes and endtimes
479+
start_date = DateTime(parsed_args["start_date"], "yyyymmdd")
480+
end_time = start_date + Dates.Second(time_to_seconds(parsed_args["t_end"]))
481+
end_date = Dates.format(end_time, "yyyymmdd")
482+
483+
start_dates = start_date:Day(1):end_time
484+
485+
file_list = String[]
486+
for dd in start_dates
487+
# get forcing file path
488+
single_parsed_args = Dict(
489+
"start_date" => Dates.format(dd, "yyyymmdd"),
490+
"site_latitude" => parsed_args["site_latitude"],
491+
"site_longitude" => parsed_args["site_longitude"],
492+
)
493+
single_file_path = get_external_forcing_file_path(
494+
single_parsed_args;
495+
data_dir = output_data_dir,
496+
)
497+
push!(file_list, single_file_path)
498+
# generate the external forcing file for this day
499+
if !isfile(single_file_path)
500+
generate_external_era5_forcing_file(
501+
parsed_args["site_latitude"],
502+
parsed_args["site_longitude"],
503+
Dates.format(dd, "yyyymmdd"),
504+
single_file_path,
505+
FT;
506+
time_resolution = time_resolution,
507+
data_dir = input_data_dir,
508+
smooth_amount = smooth_amount,
509+
)
510+
end
511+
end
512+
# concatenate data and save
513+
concat_ds = Dataset(file_list; aggdim = "time")
514+
NCDatasets.write(forcing_file_path, concat_ds)
515+
end
516+
517+
"""
518+
smooth_4D_era5(data, variable, lon_index, lat_index; smooth_amount = 4)
519+
520+
data is an array from ERA5 data, which has dimension order longitude, latitude,
521+
pressure_level, and time. We want to return smoothed data by a certain amount.
522+
Here we choose 4 points on either side which corresponds to a 2° box total. we
523+
just average the points here, but something more creative could be done.
524+
"""
525+
function smooth_4D_era5(data, variable, lon_index, lat_index; smooth_amount = 4)
526+
# extract data in box around the center point
527+
data_slice = data[variable][
528+
(lon_index - smooth_amount):(lon_index + smooth_amount),
529+
(lat_index - smooth_amount):(lat_index + smooth_amount),
530+
:,
531+
:,
532+
]
533+
# compute mean over lat/lon dimensions and return slice
534+
return mean(data_slice, dims = (1, 2))[1, 1, :, :]
535+
end
536+
537+
"""
538+
smooth_3D_era5(data, variable, lon_index, lat_index; smooth_amount = 4)
539+
540+
data is an array from ERA5, which has dimension order longitude, latitude, and time.
541+
This function returns data smoothed by a certain amount. Here, we choose 4 points on
542+
either side which corresponds to a 2° box total. wejust average the points here, but
543+
something more creative could be done.
544+
"""
545+
function smooth_3D_era5(data, variable, lon_index, lat_index; smooth_amount = 4)
546+
# extract data in box around the center point
547+
data_slice = data[variable][
548+
(lon_index - smooth_amount):(lon_index + smooth_amount),
549+
(lat_index - smooth_amount):(lat_index + smooth_amount),
550+
:,
551+
]
552+
# compute mean over lat/lon dimensions and return slice
553+
return mean(data_slice, dims = (1, 2))[1, 1, :]
554+
end

src/utils/utilities.jl

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -496,3 +496,21 @@ function issphere(space)
496496
return Meshes.domain(Spaces.topology(Spaces.horizontal_space(space))) isa
497497
Domains.SphereDomain
498498
end
499+
500+
"""
501+
clima_to_era5_name_dict()
502+
503+
Returns a dictionary mapping ClimaAtmos variable names to ERA5 variable names.
504+
"""
505+
function clima_to_era5_name_dict()
506+
Dict(
507+
"ua" => "u",
508+
"va" => "v",
509+
"wap" => "w", # era5 w is in Pa/s, this is confusing notation
510+
"hus" => "q",
511+
"ta" => "t",
512+
"zg" => "z",
513+
"clw" => "clwc",
514+
"cli" => "ciwc",
515+
)
516+
end

0 commit comments

Comments
 (0)