Settings
Options are defined as Python dictionaries. There are 7 dictionaries, each with options that can be set:
- dataset
- plotting
- output
- lpo_options
- lpt_options
- merge_split_options
- mjo_id_options
The option values are set directly in Python scripts. After the values are set, the main function in lpt_driver.py is called.
- Default values for all the options are set in lpt/default_options.py.
- Many of these options are overridden in your [RUN]/lpt_run.py.
For a brief overview/reminder of what the settings are, see the comments in lpt/default_options.py, MASTER_RUN/lpt_run.py and [RUN]/lpt_run.py. More details about each setting are provided in this wiki.
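As a minimal sketch of the "defaults, then overrides" pattern described above, the snippet below copies a set of default dictionaries and changes a few entries, the way a [RUN]/lpt_run.py would. The dictionary names and contents are placeholders, not the actual values in lpt/default_options.py, and the deepcopy approach is an illustrative assumption rather than how LPT loads its defaults.

```python
import copy

# Placeholder defaults standing in for lpt/default_options.py (NOT the real values).
DEFAULT_DATASET = {
    'label': 'placeholder',
    'data_time_interval': 1,
    'area': [0.0, 360.0, -50.0, 50.0],
}
DEFAULT_LPO_OPTIONS = {
    'thresh': 12,
    'accumulation_hours': 72,
}

# A [RUN]/lpt_run.py-style script: copy the defaults, then override what differs.
dataset = copy.deepcopy(DEFAULT_DATASET)
lpo_options = copy.deepcopy(DEFAULT_LPO_OPTIONS)

dataset['label'] = 'imerg'
dataset['area'][2:] = [-40.0, 40.0]   # in-place edit of a nested list
lpo_options['thresh'] = 15

# deepcopy keeps the defaults untouched by the in-place edit above.
print(DEFAULT_DATASET['area'])  # [0.0, 360.0, -50.0, 50.0]
```

A shallow `dict.copy()` would share the nested `area` list with the defaults, so the in-place edit would silently change the default as well.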
| Option | Data Type | Description |
|---|---|---|
| dataset['label'] | string | Used in the output file names. For example, "imerg" in lpt_systems_imerg_2023111000_2024020823.nc |
| dataset['raw_data_parent_dir'] | string | Parent directory of the input data files, i.e., the part of the path common to all of the files. It can be a relative path. Subdirectories, such as by date, can be set using the file_name_format option. |
| dataset['raw_data_format'] | string | Controls which lpt/readdata.py function is used to read the raw data. The value must match a valid data format in the if/elif/else block at the top of readdata.py. See the list of currently supported options in Table 2 below. |
| dataset['file_name_format'] | string | The path for filenames under the raw_data_parent_dir. This is a Python format string such as would be used with datetime.strftime(). For example, for 00 UTC 2024-01-10, "%Y/%m/gridded_rain_rates_%Y%m%d%H.nc" is converted into "2024/01/gridded_rain_rates_2024011000.nc". |
| dataset['data_time_interval'] | integer | The time between input files. Units: Hours. |
| dataset['verbose'] | True or False | Whether to print more detailed information about the files to the screen. |
| dataset['longitude_variable_name'] | string | Longitude variable name for generic_netcdf. NOTE: The readdata.py functions are set up to convert -180 to 180 longitude to 0 - 360. |
| dataset['latitude_variable_name'] | string | Latitude variable name for generic_netcdf |
| dataset['time_variable_name'] | string | Time variable name for generic_netcdf. Ignored if there is no time dimension. |
| dataset['field_variable_name'] | string | Name of the variable to use for feature identification, a Python string (e.g., "rainfall" for LPT). |
| dataset['field_units'] | string | Units of data. This is mainly used for generating plots, not for calculations. It is OK to set it to "" if plots are not being created. |
| dataset['area'] | list of Floats | Geographical area of data to use. A Python list of float values for [lon_begin, lon_end, lat_begin, lat_end], e.g., [0.0, 360.0, -50.0, 50.0]. The input data will be subsetted to this region. NOTE: The readdata.py functions are set up to convert -180 to 180 longitude to 0 - 360. |
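The following sketch illustrates three of the dataset options above: expanding file_name_format with strftime, converting -180 to 180 longitudes to 0 to 360, and subsetting to dataset['area']. It is not the readdata.py code; the helper names and the raw_data_parent_dir value are hypothetical, the field is assumed to be shaped (lat, lon), and areas that cross the 0/360 seam (lon_begin > lon_end) are not handled.

```python
import datetime as dt
import numpy as np

dataset = {
    'raw_data_parent_dir': './raw_data',  # hypothetical location
    'file_name_format': '%Y/%m/gridded_rain_rates_%Y%m%d%H.nc',
    'area': [0.0, 360.0, -50.0, 50.0],
}


def input_file_path(dataset, valid_time):
    """Join the parent directory with the strftime-expanded file name format."""
    return (dataset['raw_data_parent_dir'] + '/'
            + valid_time.strftime(dataset['file_name_format']))


def to_0_360(lon, field):
    """Map longitudes to [0, 360) and re-sort the field's last (lon) axis to match."""
    lon360 = np.mod(lon, 360.0)
    order = np.argsort(lon360)
    return lon360[order], field[..., order]


def subset_to_area(lon, lat, field, area):
    """Keep the part of a (lat, lon) field inside [lon_begin, lon_end, lat_begin, lat_end]."""
    lon_begin, lon_end, lat_begin, lat_end = area
    keep_lon = (lon >= lon_begin) & (lon <= lon_end)
    keep_lat = (lat >= lat_begin) & (lat <= lat_end)
    return lon[keep_lon], lat[keep_lat], field[np.ix_(keep_lat, keep_lon)]


print(input_file_path(dataset, dt.datetime(2024, 1, 10, 0)))
# ./raw_data/2024/01/gridded_rain_rates_2024011000.nc
```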
| Raw data option value | Description |
|---|---|
| generic_netcdf | NetCDF data. The intended variable must have dimensions (lat, lon) or (time, lat, lon), or equivalent dimensions with different names. The specific variable names are set by the dataset dictionary options named like "*_variable_name". NOTE: These options are ignored for the other raw data formats. |
| cmorph | CMORPH data in binary format. NOTE: For NetCDF format data, you can use generic_netcdf instead. |
| imerg_hdf5 | IMERG V6 data in HDF5 format. |
| cfs_forecast | CFS forecast data in GRIB2 format. |
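Since dataset['raw_data_format'] must exactly match one of the supported values in Table 2, it can help to see the selection spelled out. The sketch below expresses the same kind of selection as a dictionary lookup instead of an if/elif/else chain, and raises a clear error for unknown values. The reader functions are hypothetical stubs, not the functions in lpt/readdata.py.

```python
def _stub_reader(fmt):
    """Placeholder reader that only reports which format was selected."""
    def reader(path, dataset):
        return {'format': fmt, 'path': path}
    return reader


# Hypothetical stand-ins for the lpt/readdata.py reader functions.
READERS = {fmt: _stub_reader(fmt)
           for fmt in ('generic_netcdf', 'cmorph', 'imerg_hdf5', 'cfs_forecast')}


def read_raw_data(path, dataset):
    """Dispatch on dataset['raw_data_format'], failing loudly on unknown values."""
    fmt = dataset['raw_data_format']
    if fmt not in READERS:
        raise ValueError(f"Unknown raw_data_format {fmt!r}; "
                         f"expected one of {sorted(READERS)}")
    return READERS[fmt](path, dataset)
```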
| Option | Data Type | Description |
|---|---|---|
| plotting['do_plotting'] | True or False | Whether to generate plots. This applies only to the LPO step (lpo_options['do_lpo_calc']: map plots of rainfall and LPOs) and the LPT step (lpt_options['do_lpt_calc']: time-longitude plots). The other plotting options are ignored if this is set to False. NOTE: This is best used as a "gut check" over a short time period to confirm that the code is doing what you expect. Plotting a long period consumes resources, so consider setting it to False for "production" runs. |
| plotting['plot_area'] | list of Floats | Geographical area of data for map plots. A Python list of float values for [lon_begin, lon_end, lat_begin, lat_end], e.g., [0.0, 360.0, -50.0, 50.0] |
| plotting['time_lon_range'] | list of Floats | Longitude range for time-longitude plots. A Python list of float values for [lon_begin, lon_end], e.g., [40.0, 200.0]. Does not need to be the same as the longitude range of plotting['plot_area']. |
The output path has several components, depending on the dataset label, accumulation/averaging period, spatial filtering, and threshold value.
The convention for LPO data output, expressed as a Python formatted string, is:
```python
# dt_this is the datetime of the current time step.
fout = (f"{output['data_dir']}"
        + f"/{dataset['label']}"
        + f"/g{lpo_options['filter_stdev']}"
        + f"_{lpo_options['accumulation_hours']}h"
        + f"/thresh{lpo_options['thresh']}"
        + "/objects/"
        + dt_this.strftime(output['sub_directory_format'])
        + "/" + dt_this.strftime('objects_%Y%m%d%H.nc'))
```
For example: `./data/imerg/g50_72h/thresh12/objects/2024/01/20240110/objects_2024011000.nc`.
- For images, replace "data" with "images" and ".nc" with ".png".
- For systems, replace "objects" with "systems" and omit the date-based subdirectory (output['sub_directory_format'] is not used).
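To make the naming convention concrete, here is a small runnable sketch that assembles the LPO data path, the corresponding map plot path, and the systems directory from the option dictionaries. The helper function names are hypothetical, and the image path is assumed to use output['img_dir'] as its base, per the output table.

```python
import datetime as dt


def lpo_base_dir(base_dir, dataset, lpo_options):
    """Common part of the path: <base>/<label>/g<stdev>_<hours>h/thresh<thresh>."""
    return (f"{base_dir}"
            + f"/{dataset['label']}"
            + f"/g{lpo_options['filter_stdev']}"
            + f"_{lpo_options['accumulation_hours']}h"
            + f"/thresh{lpo_options['thresh']}")


def lpo_objects_path(output, dataset, lpo_options, dt_this, images=False):
    """LPO data file (or map plot if images=True) for the time dt_this."""
    base = output['img_dir'] if images else output['data_dir']
    ext = 'png' if images else 'nc'
    return (lpo_base_dir(base, dataset, lpo_options)
            + "/objects/"
            + dt_this.strftime(output['sub_directory_format'])
            + "/" + dt_this.strftime(f"objects_%Y%m%d%H.{ext}"))


def lpt_systems_dir(output, dataset, lpo_options):
    """Systems output: 'systems' instead of 'objects', no date-based subdirectory."""
    return lpo_base_dir(output['data_dir'], dataset, lpo_options) + "/systems/"


output = {'data_dir': './data', 'img_dir': './images',
          'sub_directory_format': '%Y/%m/%Y%m%d'}
dataset = {'label': 'imerg'}
lpo_options = {'filter_stdev': 50, 'accumulation_hours': 72, 'thresh': 12}

print(lpo_objects_path(output, dataset, lpo_options, dt.datetime(2024, 1, 10, 0)))
# ./data/imerg/g50_72h/thresh12/objects/2024/01/20240110/objects_2024011000.nc
```

With thresh given as the integer 12, the data path matches the example above; an f-string renders a float 12.0 as `thresh12.0`.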
| Option | Data Type | Description |
|---|---|---|
| output['img_dir'] | string | Directory for plotting outputs. Can be a relative path. |
| output['data_dir'] | string | Directory for data outputs (text/NetCDF). Can be a relative path. |
| output['sub_directory_format'] | string | The subdirectory beneath the img_dir or data_dir. This is a Python format string such as would be used with datetime.strftime(). For example, for 00 UTC 2024-01-10, '%Y/%m/%Y%m%d' is converted into '2024/01/20240110'. This pertains to LPO output data and LPO map plots. |
These options control the identification of large-scale precipitation objects (LPOs).
| Option | Data Type | Description |
|---|---|---|
| lpo_options['do_lpo_calc'] | True or False | Whether to go through the LPO identification stage of the calculation. If set to False, the LPO step is skipped and all other lpo_options dictionary options are ignored. |
| lpo_options['lpo_calc_n_cores'] | integer | How many processes to use. If it is > 1, the LPO calculations will run in parallel with one time stamp per processor. Make sure you have the resources if you use many processors. |
| lpo_options['overwrite_existing_files'] | True or False | Whether to recalculate LPOs and overwrite files that already exist. |
| lpo_options['multiply_factor'] | float | A factor to multiply the raw data by to get it into the units you want. For example, use 24.0 to convert from mm/h to mm/day. |
| lpo_options['field_units'] | string | The units of the data after applying the multiply_factor. For example, 'mm d-1'. This is used for NetCDF output. |
| lpo_options['thresh'] | float | Threshold value to use for LPO identification. The units are for the data after multiply_factor is applied. |
| lpo_options['accumulation_hours'] | integer | Accumulation/running average period. Units: hours. Set to 0 to use instantaneous values without any time averaging. |
| lpo_options['filter_stdev'] | integer | Number of grid points for the standard deviation of the Gaussian spatial filter. Set to 0 for no spatial smoothing. |
| lpo_options['filter_n_stdev_width'] | integer | How many standard deviations to use for the width of the Gaussian spatial filter. For example, if filter_stdev is 20 and filter_n_stdev_width is 3, the filter extends out 60 points. |
| lpo_options['min_points'] | integer | Minimum number of contiguous grid points to keep as an LPO, e.g., 400. |
| lpo_options['cold_start_mode'] | True or False | Whether to use cold start mode. Cold start mode is mainly for model runs, for which you may not have data going back in time to calculate a running average/accumulation at the beginning of the run. In cold start mode, the averaging period is ramped up from an initial value (lpo_options['cold_start_const_period']) to the intended averaging period (lpo_options['accumulation_hours']). From time 0 to lpo_options['cold_start_const_period'] hours, a constant averaged value is used, so the same, stationary LPOs are identified during this period. From cold_start_const_period to accumulation_hours, the average from the initial time to the valid time is used. |
| lpo_options['cold_start_const_period'] | integer | Time period during which constant averaged data (and stationary LPO) is used at the beginning of the cold start. See above. |
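The LPO options above can be read as a pipeline: scale the field by multiply_factor, average over accumulation_hours (with a ramped window in cold start mode), smooth with a Gaussian filter, threshold, and keep contiguous regions with at least min_points grid points. A minimal sketch using scipy.ndimage follows. It is not the LPT implementation: the `>=` threshold comparison, 4-connectivity, the filter's default edge handling, and the exact cold start window bounds are assumptions, and the time averaging itself is left out (identify_lpos takes an already-averaged field).

```python
import numpy as np
from scipy import ndimage


def cold_start_window(t, accumulation_hours, const_period):
    """Averaging window (start_hour, end_hour) at t hours after the run start.

    Assumption: the constant early value is the mean over [0, const_period].
    """
    if t <= const_period:
        return 0, const_period            # constant average: stationary LPOs
    if t < accumulation_hours:
        return 0, t                       # average from the initial time to t
    return t - accumulation_hours, t      # normal running window


def identify_lpos(field, multiply_factor, thresh, filter_stdev,
                  filter_n_stdev_width, min_points):
    """Label contiguous regions of an (already time-averaged) 2-D field.

    Returns (labels, n): labels is 0 for background and 1..n for kept objects.
    """
    data = np.asarray(field, dtype=float) * multiply_factor
    if filter_stdev > 0:
        # scipy truncates the kernel at int(truncate * sigma + 0.5) points, so
        # filter_stdev=20 with filter_n_stdev_width=3 extends out 60 points.
        data = ndimage.gaussian_filter(data, sigma=filter_stdev,
                                       truncate=filter_n_stdev_width)
    labels, _ = ndimage.label(data >= thresh)   # 4-connected regions by default
    sizes = np.bincount(labels.ravel())         # sizes[0] counts background
    keep = sizes >= min_points
    keep[0] = False                             # never keep the background
    return ndimage.label(keep[labels])          # relabel surviving objects 1..n
```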
| Option | Data Type | Description |
|---|---|---|
| lpo_options['do_lpo_mask'] | True or False | Whether to generate LPO mask files. If set to False, the rest of the LPO mask settings in this table are ignored. Note: This does not require lpo_options['do_lpo_calc'] = True, but if do_lpo_calc is False, the LPO files must already have been generated by a previous run. |
| lpo_options['mask_detailed_output'] | True or False | Whether to use detailed mask output. By default (False), a single variable "mask" is output. If this is set to True, up to four mask variables are output. This is mainly useful for understanding what each step of LPO does and for development. See the Output section below for more details. |
| lpo_options['mask_include_rain_rates'] | True or False | Whether to include masked rain rates in mask output files. The masked rain is simply the rain (or whatever variable is used for LPO step) with values outside of the mask set to missing. |
```python
lpo_options['mask_calc_volrain'] = True  # Whether to calculate volumetric rain and include it in the mask files.
lpo_options['mask_calc_with_filter_radius'] = True  # Whether to calculate the mask with filter variables. (Takes much longer to run.)
lpo_options['mask_calc_with_accumulation_period'] = True  # Whether to calculate the mask with accumulation-period variables. (Takes much longer to run.)
lpo_options['mask_coarse_grid_factor'] = 0  # If > 0, use a coarsened grid to calculate masks. Good for high-resolution data.
lpo_options['target_memory_for_writing_masks_MB'] = 10000  # Target to limit memory demand when writing masks to files. The larger it is, the faster it can run.
lpo_options['mask_n_cores'] = 1  # How many processors to use for LPO mask calculations.
```
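The mask options above include writing masked rain rates (values outside the mask set to missing) and a volumetric rain. A minimal sketch of both follows, assuming a regular latitude-longitude grid, rain in mm/day shaped (lat, lon), and a spherical Earth. The helper names, the cell-area approximation, and the m^3/day units are assumptions; LPT's own volumetric rain definition and units may differ.

```python
import numpy as np

EARTH_RADIUS_M = 6.371e6  # mean Earth radius (spherical approximation)


def masked_rain(rain, mask):
    """Rain (or the LPO field) with values outside the mask set to missing (NaN)."""
    return np.where(mask, rain, np.nan)


def grid_cell_area_m2(lat, dlat_deg, dlon_deg):
    """Approximate lat-lon cell areas in m^2, shaped (nlat, 1) to broadcast over lon."""
    return (EARTH_RADIUS_M ** 2 * np.deg2rad(dlat_deg) * np.deg2rad(dlon_deg)
            * np.cos(np.deg2rad(lat)))[:, np.newaxis]


def volumetric_rain(rain_mm_per_day, mask, lat, dlat_deg, dlon_deg):
    """Sum of rain depth times cell area inside the mask, in m^3 per day."""
    depth_m = masked_rain(rain_mm_per_day, mask) / 1000.0
    return np.nansum(depth_m * grid_cell_area_m2(lat, dlat_deg, dlon_deg))
```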