Proposed Recipes for Large Ensemble pCO2 testbed

### Dataset Name

Large ensemble pCO2 testbed by @lgloege 

### Dataset URL

https://figshare.com/collections/Large_ensemble_pCO2_testbed/4568555

### Description

This is a collection of randomly selected ensemble members from 4 large ensemble projects:
- CanESM2 (http://data.ec.gc.ca/data/climate/scientificknowledge/the-eccc-climate-model-datasets-for-climate-science-and-impacts-research/the-canadian-earth-system-model-large-ensembles/)
- CESM-LENS (http://www.cesm.ucar.edu/projects/community-projects/LENS/)
- GFDL (http://poseidon.princeton.edu)
- MPI-GE (https://mpimet.mpg.de/en/grand-ensemble/)

Each ensemble member was interpolated from its native grid to a 1x1 degree lat/lon grid. The variables are monthly, cover the 1982-2017 time frame, and are sampled in the same way as the SOCATv5 data product. Historical atmospheric CO2 is used up to 2005, with RCP8.5 after 2005.

The intention of this dataset is to evaluate ocean pCO2 gap-filling techniques.

### License

Unknown

### Data Format

NetCDF

### Data Format (other)

_No response_

### Access protocol

HTTP(S)

### Source File Organization

The data are organized on several levels:

- There are 5 models that provide a Large Ensemble (many different members to quantify internal variability)
- For each model there is one file per ensemble member given as `<model><member_id>.tar.gz` [example](https://figshare.com/articles/dataset/MPI_ocean_pCO2_testbed/11477949)
- Each of the tar files contains several NetCDF files that represent different variables:
<img width="819" alt="image" src="https://user-images.githubusercontent.com/14314623/200400187-568e6906-51f7-41a5-a07d-f429df2a88a4.png">
These variables are already concatenated in time:
<img width="1124" alt="image" src="https://user-images.githubusercontent.com/14314623/200400900-d0aa8bfd-bfe0-415b-ae68-bddfb7a44171.png">
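
To make the layout above concrete, here is a minimal sketch of how the NetCDF files inside one `<model><member_id>.tar.gz` could be listed with the Python standard library. The member names in the demo archive are hypothetical placeholders, not the real file names in the testbed.

```python
import io
import tarfile


def list_netcdf_members(fileobj, suffix=".nc"):
    """Return the sorted names of regular files ending in `suffix`
    inside a gzipped tar archive given as a binary file-like object."""
    with tarfile.open(fileobj=fileobj, mode="r:gz") as tar:
        return sorted(
            member.name
            for member in tar.getmembers()
            if member.isfile() and member.name.endswith(suffix)
        )


def build_demo_archive():
    """Build a tiny in-memory .tar.gz with HYPOTHETICAL member names."""
    names = ("member_001/pCO2.nc", "member_001/SST.nc", "member_001/README.txt")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as tar:
        for name in names:
            payload = b"placeholder"  # not real NetCDF content
            info = tarfile.TarInfo(name=name)
            info.size = len(payload)
            tar.addfile(info, io.BytesIO(payload))
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(list_netcdf_members(build_demo_archive()))
```

For a real archive, `open("<model><member_id>.tar.gz", "rb")` could be passed in place of the demo buffer.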


### Example URLs

```shell
https://ndownloader.figshare.com/files/16129505
```
I actually have some trouble getting these from figshare. Has anyone here had experience pulling files from a collection/dataset on figshare? I'd be happy to learn the [figshare API](https://docs.figshare.com) and parse HTTP links, but maybe there is a more clever way to work with archive/DOI repositories like figshare/Zenodo?
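
One possible approach, sketched here under assumptions: as far as I can tell, the public figshare API (v2) lists the articles in a collection at `/collections/{id}/articles` and returns per-file `name` and `download_url` fields from `/articles/{id}`. Those endpoint paths and field names should be checked against the API docs before relying on them, and the collection listing may be paginated, which this sketch ignores. The helper names are made up for illustration.

```python
import json
import urllib.request

# Assumed base URL of the public figshare API (verify against the docs).
FIGSHARE_API = "https://api.figshare.com/v2"


def file_urls(article):
    """Map file name -> download URL for one parsed article record.

    Assumes the record has a "files" list whose entries carry
    "name" and "download_url" keys.
    """
    return {f["name"]: f["download_url"] for f in article.get("files", [])}


def fetch_json(url):
    """Fetch a URL and parse the response body as JSON (needs network)."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def collection_file_urls(collection_id):
    """Collect download URLs for all articles in a collection (needs network).

    Only the first page of the article listing is read; files that share
    a name across articles would overwrite each other in the result.
    """
    urls = {}
    for stub in fetch_json(f"{FIGSHARE_API}/collections/{collection_id}/articles"):
        article = fetch_json(f"{FIGSHARE_API}/articles/{stub['id']}")
        urls.update(file_urls(article))
    return urls
```

Calling `collection_file_urls(4568555)` would then, if the assumptions hold, return a dict of archive names and `ndownloader` links that a recipe could use as its file pattern.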


### Authorization

No; data are fully public

### Transformation / Processing

This is pretty straightforward. 

I'd suggest having one recipe per model (in a recipe dict) that simply combines the variables by merging them.

There should probably be some rechunking, but I need input from the actual users (cc @hatlenheimdalthea @galenmckinley) on the best chunking structure for their use cases (e.g. are the gap-filling models trained on single-time-step maps or on location time series?).
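
To frame the chunking question, here is a rough back-of-the-envelope sketch comparing the two access patterns. It assumes a global 180 x 360 grid (from the 1x1 degree description), 432 monthly steps (1982-2017), and 4-byte float32 values; the dtype and the 10 x 10 degree tile size are my assumptions, not properties of the dataset.

```python
from math import ceil, prod

# Array shape implied by the description; the exact grid size is an assumption.
SHAPE = {"time": 36 * 12, "lat": 180, "lon": 360}
BYTES_PER_VALUE = 4  # assumption: float32


def chunk_stats(chunks, shape=SHAPE, itemsize=BYTES_PER_VALUE):
    """Return (bytes per chunk, number of chunks) for one variable."""
    chunk_bytes = prod(chunks[dim] for dim in shape) * itemsize
    n_chunks = prod(ceil(shape[dim] / chunks[dim]) for dim in shape)
    return chunk_bytes, n_chunks


# One full map per chunk: suits training on single-time-step maps.
map_chunks = {"time": 1, "lat": 180, "lon": 360}
# Full time series on small tiles: suits training on location time series.
series_chunks = {"time": 432, "lat": 10, "lon": 10}

if __name__ == "__main__":
    for label, chunks in (("maps", map_chunks), ("time series", series_chunks)):
        size, count = chunk_stats(chunks)
        print(f"{label}: {size / 1e6:.2f} MB per chunk, {count} chunks")
```

Both layouts give quite small chunks under these assumptions, so in practice several time steps or larger tiles would probably be grouped together once the dominant access pattern is known.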


### Target Format

Zarr

### Comments

_No response_
