xcube-clms


The xcube-clms Python package provides an xcube data store that enables access to datasets hosted by the Copernicus Land Monitoring Service (CLMS). The data store is called "clms" and is implemented as an xcube plugin. It uses the CLMS API under the hood.

Setup

Installing the xcube-clms plugin from the repository

To install xcube-clms directly from the git repository, clone the repository, cd into xcube-clms, and follow the steps below:

git clone https://github.com/xcube-dev/xcube-clms.git
cd xcube-clms
conda env create -f environment.yml
conda activate xcube-clms
pip install .

This sets up a new conda environment, installs all the dependencies required for xcube-clms, and then installs xcube-clms directly from the repository into the environment.

Installing the xcube-clms plugin from conda-forge

This method assumes that you have an existing environment and want to install xcube-clms into it. With the existing environment activated, execute this command:

mamba install --channel conda-forge xcube-clms

If xcube and any other necessary dependencies are not already installed, they will be installed automatically.

Create credentials to access the CLMS API

Create the credentials JSON file required for the CLMS API by following the documentation. The credentials are required when initializing the CLMS data store. Please follow the instructions in example/notebooks/CLMSDataStoreTutorial.ipynb on how to pass the credentials from the JSON file to the store; a sketch is shown below.
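For illustration, here is a minimal sketch of initializing the store in Python. The file name and the credentials keyword are assumptions made for this example; the tutorial notebook shows the authoritative usage.

import json

from xcube.core.store import new_data_store

# Load the CLMS API credentials created following the CLMS documentation.
# "clms-credentials.json" is a placeholder file name.
with open("clms-credentials.json") as f:
    credentials = json.load(f)

# "clms" is the data store identifier; passing the parsed JSON via a
# "credentials" keyword is an assumption, see the tutorial notebook.
store = new_data_store("clms", credentials=credentials)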

Testing

To run the unit test suite:

pytest

Notes on the unit-testing strategy

The unit test suite uses pytest-recording to mock HTTP requests made via the Python library requests. During development, an actual HTTP request is performed and the responses are saved in cassettes/*.yaml files. During testing, only the cassettes/*.yaml files are used, without any actual HTTP request. During development, to save the responses to cassettes/*.yaml, run:

pytest -v -s --record-mode new_episodes

Note that --record-mode new_episodes overwrites all cassettes. If one only wants to write cassettes that are not saved already, --record-mode once can be used. pytest-recording supports all record modes provided by [VCR.py](https://vcrpy.readthedocs.io/en/latest/usage.html#record-modes). After recording the cassettes, testing can then be performed as usual; a sketch of a recorded test is shown below.
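As an illustration, a recorded test might look like the following sketch. The test name and the endpoint URL are placeholders; the vcr marker is provided by pytest-recording.

import pytest
import requests

@pytest.mark.vcr  # replay this test's HTTP traffic from cassettes/*.yaml
def test_search_datasets():
    # Placeholder endpoint; the real tests exercise the CLMS API.
    response = requests.get(
        "https://land.copernicus.eu/api/@search?portal_type=DataSet"
    )
    assert response.status_code == 200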

Additional Notes about the data store

This data store currently supports only some of the dataset sources from the CLMS API:

  • EEA
  • LEGACY

NOTE: More dataset sources will be supported in future versions.

EEA

This data store introduces an initial mechanism for preloading data, including cache management, downloading, and file processing. It uses the experimental Preload API of the xcube data store framework.

The preload interface is needed because of the nature of the CLMS API: the user creates a data request, waits an undetermined time in a queue until the request is processed, and then downloads zip files, which are unzipped, extracted into a cache, and processed so that they can finally be opened using a cache data store for EEA pre-packaged data sources. By default, this cache is a file data store located at clms_cache in the current working directory, but users are free to choose a data store of their liking.

Preloading allows the data store to request datasets for download from the CLMS API in either a blocking or a non-blocking way. It handles sending the download request, queueing for download, waiting in the queue, periodically checking the request status, downloading the data, and extracting and post-processing it.

The preload mechanism is invoked via .preload_data(*data_ids, **preload_params) on the CLMS data store instance, as sketched below.
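A minimal sketch of the workflow, assuming the preloaded data can afterwards be opened via the same store; the data id is a placeholder, and real ids can be listed with store.get_data_ids().

# "store" is the CLMS data store instance created during setup.
# preload_data handles requesting, queueing, downloading, and
# post-processing into the cache data store.
store.preload_data("eea-dataset-id")

# Once preloaded, the dataset can be opened from the cache.
ds = store.open_data("eea-dataset-id")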

LEGACY

Datasets available via the LEGACY source can be lazily loaded directly using open_data(...), for example as sketched below.
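A short sketch; the data id is a placeholder.

# LEGACY datasets need no preload step and are opened lazily.
ds = store.open_data("legacy-dataset-id")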

CLMS API

  • Requires an EU account to register on the CLMS site.
  • Once registered, the user should create an access-token JSON file as described here.

CLMS API issues

The API has several known issues, listed below:

  • The datasets made available via requests contain a download link to a zip file that is stated to be valid for only 3 days. We found that this cannot be relied upon to ensure that the download link still works, so we had to create a workaround to manage our own expiry times. This issue has been raised with the CLMS service desk. Quoting their reply:

    For the first issue mentioned by you: The status is completed and there is
    indicated that there are 2 days for expiring, but the download link is
    already expired, we are going to investigate this bug.
    
  • We use the API to check whether a certain data_id has already been requested from the CLMS server and, if so, what its status is, so that we can either reuse the existing download link or, if no request exists or it has expired, create a new one. This does not work reliably either: although expired downloads are no longer visible in the web UI, the API still returns expired requests that were completed, and these contain no information that they are expired or when they will expire. Quoting the CLMS helpdesk replies:

    For the second issue mentioned by you: the @datarequest_search endpoint does
    not seem to be working as expected, we are going to consult the API experts so
    to check its functioning and in case an improvement is feasible in our side,
    we´ll let you know.
    

    and its follow-up after a week

    After having analysed the possibility to improve the status of the
    downloads, our team answers the following: Currently, our download system is
    not able to extract information on whether the link has expired or not,
    therefore our API does not provide this information.

    Due to this, we had to create workarounds to figure out whether a given
    dataset's download link has expired or not.
    
  • The cancel endpoint of the API does not work; this issue was raised with the helpdesk team as well. Quoting their reply:

    Recently a new firewall of the CLMS Portal machine has been setup. This new
    firewall is blocking some of the process cancelation request. We've detected
    the issue and working with the IT team to solve it
    
  • The CLMS API has now started moving datasets to the CDSE and WEkEO infrastructures, which causes previously working CLMS datasets to return an error. Upon requesting more information, this is what they replied:

    The future of the CLMS website is to have the Global datasets and the Pan
    European datasets centralised in as few repositories as possible (which is not
    the current situation). For this, some of the datasets (and new recently
    produced datasets) have been moved from EEA´s infrastructure to WEKEO´s
    infrastructure.
    
    Other datasets (Global datasets for the moment) are being moved to CDSE and the
    CLMS website is being adapted to this new situation.
    Regarding the datasets for which in the dataset page the explanation is that the
    dataset is only accessible through the WEKEO external site, the CLMS API can´t
    be used right now.
    
    WEKEO has increased the download limits imposed to the CLMS website to retrieve
    data from there, so maybe in a near future we´ll be able to offer the datasets
    from there in a seamless manner, and the direct download of these datasets will
    be enabled from the CLMS website (but it is not the case right now).
    
    For the datasets that are being migrated to CDSE, we are currently analysing the
    required adaptations on the CLMS website for users to keep downloading the data
    in a seamless manner. For the moment we don’t directly offer any dataset through
    CDSE.
    
