
Settings

Brandon Kerns edited this page Jun 13, 2024 · 2 revisions

Description of LPT settings

Options are defined as Python dictionaries. There are seven dictionaries, each with options that can be set:

  • dataset
  • plotting
  • output
  • lpo_options
  • lpt_options
  • merge_split_options
  • mjo_id_options

The option values are set directly by Python scripts. After the values are set, the main function lpt_driver.py is called.

  • Default values for all the options are set in lpt/default_options.py.
  • Many of these options are overridden in your [RUN]/lpt_run.py.

For a brief overview/reminder of what the settings are, see the comments in lpt/default_options.py, MASTER_RUN/lpt_run.py and [RUN]/lpt_run.py. More details about each setting are provided in this wiki.
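
The defaults-then-overrides pattern described above can be sketched as follows. This is only an illustration: DEFAULTS is a made-up stand-in for the contents of lpt/default_options.py, build_options is a hypothetical helper that is not part of LPT, and the call to the driver is deliberately left out.

```python
import copy

# Stand-in defaults for illustration only. The real defaults are set in
# lpt/default_options.py; these values are made up.
DEFAULTS = {
    "dataset": {"label": "my_dataset", "data_time_interval": 1, "verbose": True},
    "plotting": {"do_plotting": False},
    "output": {"img_dir": "./images", "data_dir": "./data"},
}


def build_options(overrides):
    """Return a deep copy of DEFAULTS with per-run overrides applied.

    Hypothetical helper, not part of LPT.
    """
    options = copy.deepcopy(DEFAULTS)
    for dict_name, new_values in overrides.items():
        options[dict_name].update(new_values)
    return options


# Per-run overrides, in the spirit of what a [RUN]/lpt_run.py script sets.
run_options = build_options({
    "dataset": {"label": "imerg", "data_time_interval": 3},
    "plotting": {"do_plotting": True},
})

print(run_options["dataset"])
```

Options that are not overridden keep their default values, so a run script only needs to list the settings that differ from the defaults.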

Dataset options

Table 1. dataset dictionary options.

| Option | Data Type | Description |
|---|---|---|
| dataset['label'] | string | Used in the output file names. For example, "imerg" in lpt_systems_imerg_2023111000_2024020823.nc. |
| dataset['raw_data_parent_dir'] | string | Parent directory for the input data, common to all the input files. It can be a relative path. Subdirectories, such as by date, can be set using the file_name_format option. |
| dataset['raw_data_format'] | string | Controls which lpt/readdata.py function is used to read the raw data. The value must match a valid data format in the if/elif/else block at the top of readdata.py. See the list of current options in Table 2 below. |
| dataset['file_name_format'] | string | The path for file names under the raw_data_parent_dir. This is a Python format string such as would be used with datetime.strftime(). For example, for 00 UTC 2024-01-10, "%Y/%m/gridded_rain_rates_%Y%m%d%H.nc" is converted to "2024/01/gridded_rain_rates_2024011000.nc". |
| dataset['data_time_interval'] | integer | The time between input files. Units: hours. |
| dataset['verbose'] | True or False | Whether to print more detailed information about the files to the screen. |
| dataset['longitude_variable_name'] | string | Longitude variable name for generic_netcdf. NOTE: The readdata.py functions are set up to convert longitudes from -180 to 180 into 0 to 360. |
| dataset['latitude_variable_name'] | string | Latitude variable name for generic_netcdf. |
| dataset['time_variable_name'] | string | Time variable name for generic_netcdf. Ignored if there is no time dimension. |
| dataset['field_variable_name'] | string | Name of the variable to use for feature identification, a Python string (e.g., "rainfall" for LPT). |
| dataset['field_units'] | string | Units of the data. This is mainly used for generating plots, not for calculations. It is OK to set it to "" if plots are not being created. |
| dataset['area'] | list of floats | Geographical area of data to use. A Python list of float values for [lon_begin, lon_end, lat_begin, lat_end], e.g., [0.0, 360.0, -50.0, 50.0]. The input data will be subset to this region. NOTE: The readdata.py functions are set up to convert longitudes from -180 to 180 into 0 to 360. |
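
As a rough illustration of how raw_data_parent_dir, file_name_format, and data_time_interval combine, the sketch below lists the input files that would be expected over a short period. The helper names (input_file_paths, to_0_360), the parent directory, and the 3-hour interval are assumptions made for this example; they are not taken from readdata.py. The longitude helper shows one common way to map -180 to 180 longitudes onto 0 to 360, which may differ from how readdata.py does it.

```python
import datetime as dt

# Example values; the parent directory and interval are hypothetical.
dataset = {
    "raw_data_parent_dir": "./data/raw",
    "file_name_format": "%Y/%m/gridded_rain_rates_%Y%m%d%H.nc",
    "data_time_interval": 3,  # hours between input files
}


def input_file_paths(dataset, begin_time, end_time):
    """List expected input file paths from begin_time to end_time, inclusive."""
    step = dt.timedelta(hours=dataset["data_time_interval"])
    paths = []
    this_time = begin_time
    while this_time <= end_time:
        # Expand the strftime-style format for this time stamp.
        relative_path = this_time.strftime(dataset["file_name_format"])
        paths.append(dataset["raw_data_parent_dir"] + "/" + relative_path)
        this_time += step
    return paths


def to_0_360(lon):
    """Map a longitude in degrees onto the range [0, 360)."""
    return lon % 360.0


paths = input_file_paths(
    dataset, dt.datetime(2024, 1, 10, 0), dt.datetime(2024, 1, 10, 6)
)
for path in paths:
    print(path)

print(to_0_360(-170.0))
```

With these example values, the first path is ./data/raw/2024/01/gridded_rain_rates_2024011000.nc, matching the file_name_format example in Table 1.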

Table 2. Raw data format options.

| Raw data option value | Description |
|---|---|
| generic_netcdf | NetCDF data. The intended variable must have dimensions (lat, lon) or (time, lat, lon), or similar. The specific variable names are set by the dataset dictionary options named like "*_variable_name". NOTE: These options are ignored for the other raw data formats. |
| cmorph | CMORPH data in binary format. NOTE: For NetCDF format data, you can use generic_netcdf instead. |
| imerg_hdf5 | IMERG V6 data in HDF5 format. |
| imerg_v7_hdf5 | IMERG V7 data in HDF5 format. |
| cfs_forecast | CFS Forecast data in Grib2 format. |

Plotting options

Table 3. plotting dictionary options.

| Option | Data Type | Description |
|---|---|---|
| plotting['do_plotting'] | True or False | Whether to generate plots. This applies only to the LPO and LPT steps, i.e., lpo_options['do_lpo_calc'] (map plots of rainfall and LPO) and lpt_options['do_lpt_calc'] (time-longitude plot). The other plotting options are ignored if this is set to False. NOTE: This is best used as a "gut check" for a short time period to determine whether the code is doing what you expect. Plotting consumes resources over a long period, so consider setting it to False for your "production" runs. |
| plotting['plot_area'] | list of floats | Geographical area of data for map plots. A Python list of float values for [lon_begin, lon_end, lat_begin, lat_end], e.g., [0.0, 360.0, -50.0, 50.0]. |
| plotting['time_lon_range'] | list of floats | Longitude range for time-longitude plots. It does not need to match the longitude range of the map plots. A Python list of float values for [lon_begin, lon_end], e.g., [40.0, 200.0]. |

Output options

The output path has several components, depending on the dataset label, accumulation/averaging period, spatial filtering, and threshold value.

The convention for LPO data output, expressed as a Python formatted string, is like this:

fout = (f"{output['data_dir']}"
    + f"/{dataset['label']}"
    + f"/g{lpo_options['filter_stdev']}"
    + f"_{lpo_options['accumulation_hours']}h"
    + f"/thresh{lpo_options['thresh']}"
    + "/objects/"
    + dt_this.strftime(output['sub_directory_format'])
    + "/" + dt_this.strftime('objects_%Y%m%d%H.nc'))

for example: ./data/imerg/g50_72h/thresh12/objects/2024/01/20240110/objects_2024011000.nc.

  • For images, replace "data" with "images" and ".nc" with ".png".
  • For systems, replace "objects" with "systems" and omit the date-based subdirectory (output['sub_directory_format'] is ignored).
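
The path convention above can be tried out with the self-contained sketch below. The dictionary values are chosen to reproduce the example path; lpo_output_path is a hypothetical helper written for this illustration, not a function from LPT. The image path follows the first bullet above, on the assumption that output['img_dir'] is "./images". Note that thresh is written as the integer 12 here: a float 12.0 would be formatted as "thresh12.0" by the f-string.

```python
import datetime as dt

# Example values chosen to reproduce the example path in the text.
output = {
    "img_dir": "./images",
    "data_dir": "./data",
    "sub_directory_format": "%Y/%m/%Y%m%d",
}
dataset = {"label": "imerg"}
lpo_options = {"filter_stdev": 50, "accumulation_hours": 72, "thresh": 12}


def lpo_output_path(dt_this, top_dir, extension):
    """Build an LPO output path under top_dir for the time dt_this.

    Hypothetical helper for illustration, not part of LPT.
    """
    return (f"{top_dir}"
        + f"/{dataset['label']}"
        + f"/g{lpo_options['filter_stdev']}"
        + f"_{lpo_options['accumulation_hours']}h"
        + f"/thresh{lpo_options['thresh']}"
        + "/objects/"
        + dt_this.strftime(output["sub_directory_format"])
        + "/" + dt_this.strftime("objects_%Y%m%d%H") + extension)


dt_this = dt.datetime(2024, 1, 10, 0)
data_path = lpo_output_path(dt_this, output["data_dir"], ".nc")
image_path = lpo_output_path(dt_this, output["img_dir"], ".png")

print(data_path)
print(image_path)
```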

Table 4. output dictionary options.

| Option | Data Type | Description |
|---|---|---|
| output['img_dir'] | string | Directory for plotting outputs. Can be a relative path. |
| output['data_dir'] | string | Directory for data outputs (text/NetCDF). Can be a relative path. |
| output['sub_directory_format'] | string | The subdirectory beneath the img_dir or data_dir. This is a Python format string such as would be used with datetime.strftime(). For example, for 00 UTC 2024-01-10, '%Y/%m/%Y%m%d' is converted to '2024/01/20240110'. This pertains to LPO output data and LPO map plots. |

LPO Settings

Table 5.1. lpo_options dictionary options for LPO identification.

These options control the identification of large-scale precipitation objects (LPOs).

| Option | Data Type | Description |
|---|---|---|
| lpo_options['do_lpo_calc'] | True or False | Whether to go through the LPO identification stage of the calculation. If set to False, the LPO step is skipped and all other lpo_options dictionary options are ignored. |
| lpo_options['lpo_calc_n_cores'] | integer | How many processes to use. If it is > 1, the LPO calculations will run in parallel with one time stamp per processor. Make sure you have the resources if you use many processors. |
| lpo_options['overwrite_existing_files'] | True or False | Whether to calculate LPO and re-write files that already exist. |
| lpo_options['multiply_factor'] | float | A factor to multiply the raw data by to get it into the units you want. For example, use 24.0 to convert from mm/h to mm/day. |
| lpo_options['field_units'] | string | The units of the data after applying the multiply_factor. For example, 'mm d-1'. This is used for NetCDF output. |
| lpo_options['thresh'] | float | Threshold value to use for LPO identification. The units are those of the data after multiply_factor is applied. |
| lpo_options['accumulation_hours'] | integer | Accumulation/running average period. Units: hours. Set to 0 to use instantaneous values without any time averaging. |
| lpo_options['filter_stdev'] | integer | Standard deviation of the Gaussian spatial filter, in grid points. Set to 0 for no spatial smoothing. |
| lpo_options['filter_n_stdev_width'] | integer | How many standard deviations to use for the Gaussian spatial filter. For example, if filter_stdev is 20 and filter_n_stdev_width is 3, the filter extends out 60 points. |
| lpo_options['min_points'] | integer | Minimum number of contiguous grid points to keep as an LPO (e.g., 400). |
| lpo_options['cold_start_mode'] | True or False | Whether to use cold start mode. Cold start mode is mainly for model runs, for which you may not have data going back in time to calculate a running average/accumulation at the beginning of the run. In cold start mode, the averaging period is ramped up from an initial value (lpo_options['cold_start_const_period']) to the intended averaging period. From time 0 to lpo_options['cold_start_const_period'] hours, a constant average value is used, so the same, stationary LPOs are identified during this period. From cold_start_const_period to accumulation_hours, the average from the initial time to the valid time is used. |
| lpo_options['cold_start_const_period'] | integer | Time period during which constant averaged data (and stationary LPOs) are used at the beginning of the cold start. Units: hours. See cold_start_mode above. |
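
The cold start ramp-up and the filter extent described in Table 5.1 can be sketched as below. This is an interpretation of the table text, not code from LPT: the function names are hypothetical, the example periods (72 h accumulation, 24 h constant period) are made up, and the exact handling of the boundary times (whether a comparison is strict or inclusive) is an assumption.

```python
def averaging_window(hours_since_start, accumulation_hours, cold_start_const_period):
    """Return (begin, end) of the averaging window, in hours since the run start.

    Interpretation of cold start mode as described in Table 5.1.
    Boundary handling is an assumption.
    """
    if hours_since_start <= cold_start_const_period:
        # Constant period: always average over the first
        # cold_start_const_period hours, so the LPOs are stationary.
        return (0, cold_start_const_period)
    if hours_since_start < accumulation_hours:
        # Ramp-up: average from the initial time to the valid time.
        return (0, hours_since_start)
    # Enough data is available: regular running average.
    return (hours_since_start - accumulation_hours, hours_since_start)


def filter_extent_points(filter_stdev, filter_n_stdev_width):
    """How many grid points the Gaussian filter extends out."""
    return filter_stdev * filter_n_stdev_width


# Example: 72 h accumulation with a 24 h constant period (made-up values).
for hours in (12, 48, 96):
    print(hours, averaging_window(hours, 72, 24))

print(filter_extent_points(20, 3))
```

For these example values, the window is fixed at hours 0 to 24 early in the run, grows from hour 0 to the valid time during the ramp-up, and becomes a sliding 72-hour window afterwards.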

Table 5.2. lpo_options dictionary options for LPO spatio-temporal masks.

| Option | Data Type | Description |
|---|---|---|
| lpo_options['do_lpo_mask'] | True or False | Whether to generate the LPO mask file. If set to False, the rest of the LPO mask settings in this table are ignored. Note: This does not require lpo_options['do_lpo_calc'] = True, although the LPO files must already have been generated if do_lpo_calc is False. |
| lpo_options['mask_detailed_output'] | True or False | Whether to use detailed mask output. By default (False), a single variable "mask" is output. If this is set to True, up to four mask variables are output. This is mainly useful for development and for understanding what each step of LPO does. See the Output section below for more details. |
| lpo_options['mask_include_rain_rates'] | True or False | Whether to include masked rain rates in the mask output files. The masked rain is simply the rain (or whatever variable is used for the LPO step) with values outside of the mask set to missing. |

lpo_options['mask_calc_volrain'] = True  # Whether to calculate a volumetric rain and include it with the mask files.
lpo_options['mask_calc_with_filter_radius'] = True  # Whether to calculate the mask with filter variables. (Takes much longer to run)
lpo_options['mask_calc_with_accumulation_period'] = True  # Whether to calculate the mask with accumulation period variables. (Takes much longer to run)
lpo_options['mask_coarse_grid_factor'] = 0  # If > 0, a coarsened grid is used to calculate masks. Good for high-resolution data.
lpo_options['target_memory_for_writing_masks_MB'] = 10000  # Target to limit memory demand from writing masks to files. The more memory, the faster it can run.
lpo_options['mask_n_cores'] = 1  # How many processors to use for LPO mask calculations.
