Generation of GHG concentration inputs (i.e. forcings) for CMIP7's ScenarioMIP.
- development: the project is actively being worked on
We do all our environment management using pixi. To get started, you will need to make sure that pixi is installed (instructions here; we found that using the pixi-provided script was best on a Mac).
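If you do not already have pixi, the install script published in the pixi docs is, at the time of writing, a one-liner (check the pixi documentation for the current command before running it):

```sh
curl -fsSL https://pixi.sh/install.sh | bash
```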
To create the virtual environment, run

```sh
pixi install
pixi run pre-commit install
```
These steps are also captured in the `Makefile`, so if you want a single command, you can instead simply run `make virtual-environment`.
Having installed your virtual environment, you can now run commands in it using

```sh
pixi run <command>
```
For example, to run Python within the virtual environment, run

```sh
pixi run python
```
As another example, to run a notebook server, run

```sh
pixi run jupyter lab
```
- Receive data from the emissions team
- Do a new run where you update `--emissions-file`, `--run-id`, `--esgf-version` and `--input4mips-cvs-source`
- Send data to the publication team using `scripts/upload-to-llnl.py`
- Receive markers from the emissions team
  - the markers are defined in `scripts/generate-concentration-files.py`. If there are changes, make sure you update this variable.
- Receive emissions from the emissions team
  - they should send two files.
    They produce these files with the script here (hopefully merged into main soon).
    The two files are:
    - the emissions for each scenario, except for emissions of species that we derive from our inversions of sources like WMO (2022) (where we use only a single concentration projection, rather than having variation across scenarios)
    - emissions for each scenario at the fossil/biosphere level. This is used for some extrapolations of latitudinal gradients. It's the same data as above, just at slightly higher sectoral detail.
- Put the received emissions in `data/raw/input-scenarios`
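  For example, assuming the two received files are sitting in a local folder (the source path below is purely a placeholder):

  ```sh
  # copy the received emissions files into the expected location
  mkdir -p data/raw/input-scenarios
  cp /path/to/received-emissions/* data/raw/input-scenarios/
  ```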
- Update the emissions file you use for your run.
  There are two options for how to do this:
  - specify this from the command line via the `--emissions-file` option
  - change the value of the `emissions_file` variable in `scripts/generate-concentration-files.py`
- Run with a new run ID and ESGF version (using the command-line arguments `--run-id` and `--esgf-version`). Pick whatever makes sense here (we don't have strong rules about our versioning yet).
  - This will also require creating entries for the controlled vocabularies (CVs). This requires updating this file to include source IDs of the form "CR-scenario-esgf-version". In practice, simply copy the existing "CR-scenario-esgf-version" entries and update their version to match the ESGF version you used above. Then push this to GitHub.
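    As a rough sketch of that last step (the file path and commit message below are placeholders, use whatever matches the CVs repository you are working in):

    ```sh
    # in your checkout of the CVs repository
    git add <path-to-cvs-file>
    git commit -m "Add CR source IDs for <esgf-version>"
    git push
    # prints the commit ID to use as "gh:<commit-id>" in the next step
    git rev-parse HEAD
    ```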
- When you run, you will need to update the value of `--input4mips-cvs-source`. You can do this either via the command-line argument `--input4mips-cvs-source` or just update the value in `scripts/generate-concentration-files.py`. The value should be of the form `"gh:[commit-id]"`, e.g. `"gh:c75a54d0af36dbedf654ad2eeba66e9c1fbce2a2"` (a full example command is given after this list).
- When the run is finished, upload the results for the publication team with

  ```sh
  pixi run python scripts/upload-to-llnl.py --unique-upload-id-dir <unique-value-here> output-bundles/<run-id>/data/processed/esgf-ready/input4MIPs
  ```

  e.g.

  ```sh
  pixi run python scripts/upload-to-llnl.py --unique-upload-id-dir cr-scenario-concs-20250701-1 output-bundles/v0.1.0a2/data/processed/esgf-ready/input4MIPs
  ```
- Tell the publication team that the results are uploaded and the folder in which to find them, i.e. the value of `--unique-upload-id-dir`
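Putting the steps above together, a full generation run might look something like the following (every value here is a placeholder, substitute your own):

```sh
pixi run python scripts/generate-concentration-files.py \
    --emissions-file data/raw/input-scenarios/<emissions-file> \
    --run-id <run-id> \
    --esgf-version <esgf-version> \
    --input4mips-cvs-source "gh:<commit-id>"
```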
By default, this all runs serially. You can add extra cores with the flags below:
- `--n-workers`: the number of threaded (i.e. parallel) workers to use for submitting jobs
  - note: this doesn't result in true parallelism. A full explanation is beyond the scope of this document (but if you want to google, explore the difference between multiprocessing with threads compared to processes in Python)
- `--n-workers-multiprocessing`: the number of multiprocessing (i.e. parallel) workers to use, excluding any tasks that require running MAGICC
- `--n-workers-multiprocessing-magicc`: the number of multiprocessing (i.e. parallel) workers to use for tasks that run MAGICC
- `--n-workers-per-magicc-notebook`: the number of MAGICC workers to use in each MAGICC-running task
  - note: the total number of MAGICC workers is the product of `--n-workers-multiprocessing-magicc` and `--n-workers-per-magicc-notebook`
In general, you want:

- `--n-workers`: equal to the number of cores on your CPU (or more)
- `--n-workers-multiprocessing`: equal to the number of cores on your CPU (or more)
- `--n-workers-multiprocessing-magicc`, `--n-workers-per-magicc-notebook`: their product should be equal to the number of cores on your CPU (or more)
For example, for an eight-core machine you might do something like

```sh
pixi run python scripts/generate-concentration-files.py --n-workers 8 --n-workers-multiprocessing 8 --n-workers-multiprocessing-magicc 2 --n-workers-per-magicc-notebook 4
```
If you need/want to run only for a specific gas, you can use the `--ghg` flag as shown below.

```sh
pixi run python scripts/generate-concentration-files.py --ghg ccl4 --ghg cfc113
```
TODO: update this section as we add:
- tests
- anything else
Install and run instructions are the same as the above (this is a simple repository, without tests etc. so there are no development-only dependencies).
TODO: update as we figure out the structure
We have a basic `Makefile` which captures key commands in one place (for more thoughts on why this makes sense, see general principles: automation).
For an introduction to `make`, see this introduction from Software Carpentry.
Having said this, if you're not interested in `make`, you can just copy the commands out of the `Makefile` by hand and you will be 90% as happy.
In this repository, we use the following tools:
- git for version control (for more on version control, see general principles: version control)
  - for these purposes, git is a great version-control system so we don't complicate things any further. For an introduction to Git, see this introduction from Software Carpentry.
- Pixi for environment management (for more on environment management, see general principles: environment management)
  - there are lots of environment management systems. Pixi works well in our experience and, for projects that need conda, it is the only solution we have tried that worked really well.
  - we track the `pixi.lock` file so that the environment is completely reproducible on other machines or by other people (e.g. if you want a colleague to take a look at what you've done)
- pre-commit with some very basic settings to get some easy wins in terms of maintenance, specifically:
  - code formatting with ruff
  - basic file checks (removing unneeded whitespace, not committing large files etc.)
  - (for more thoughts on the usefulness of pre-commit, see general principles: automation)
  - to run the checks over the whole repository by hand, see the example after this list
- jupytext for tracking notebooks (for more thoughts on the usefulness of Jupytext, see tips and tricks: Jupytext)
  - this avoids nasty merge conflicts and incomprehensible diffs
- prefect for workflow orchestration
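If you want to run the pre-commit checks over the whole repository by hand (rather than waiting for them to run on commit), the standard pre-commit invocation works from within the pixi environment:

```sh
pixi run pre-commit run --all-files
```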
This project was generated from this template: basic python repository. copier is used to manage and distribute this template.