Aggregation Scripts

These scripts generate the KEGG/COG/Pfam aggregations that are used for search.

A container hosted on Spin runs the agg.sh script, which performs the aggregations periodically (once every 4 hours, by default).

Note

The container image is hosted here.

Development

Here's how you can set up a local development environment:

Unless otherwise specified, all commands below are designed to be run from the root directory of the repository.

Note

These instructions do not cover the process of setting up a local MongoDB server or getting access to the NERSC filesystem.

Create and activate a Python virtual environment

python -m venv ./.venv
source ./.venv/bin/activate

Install Python dependencies
```
pip install -r requirements.txt
```
Done

Testing

We use pytest as our test framework.

Here's how you can run the tests:

Unless otherwise specified, all commands below are designed to be run from the root directory of the repository.

Run the tests
```
pytest --doctest-modules
```
The --doctest-modules option tells pytest that we want it to also find and run doctests defined in our modules.
See the test results in the console

Deployment

Here's how you can build a new version of the container image and push it to the GitHub Container Registry:

On GitHub, create a new Release.
1. Create a tag.
  - Use "v{major}.{minor}.{patch}" format (e.g. "v1.2.3").
2. Click the "Generate release notes" button.
3. Leave the Release title empty (so GitHub reuses the tag name as the Release title).
4. Click the "Publish release" button.
Wait 3-4 minutes for the container image to appear on the GitHub Container Registry.

Taking a long time? Check the "Actions" tab on GitHub to see the status of the GitHub Actions workflow that builds the image.

Now that the container image is hosted there, you can configure a Spin workload to run it.

Configuration

Environment variables

LOG_FILE: Path to file to which logs will be appended (Default: /tmp/agg.log)
POLL_TIME: Number of seconds to sleep between each run (Default: 14400, which is 4 hours)
NMDC_CLIENT_ID: Client ID for interacting with NMDC's runtime API
NMDC_CLIENT_PW: Password for interacting with NMDC's runtime API
ENV: Whether to run the scripts in the production or development runtime environment. Options are "prod" or "dev".

Release Notes

https://github.com/microbiomedata/nmdc-aggregator/releases

Name		Name	Last commit message	Last commit date
Latest commit History 108 Commits
.github/workflows		.github/workflows
tests		tests
.gitignore		.gitignore
Dockerfile		Dockerfile
README.md		README.md
VERSION		VERSION
__init__.py		__init__.py
agg.sh		agg.sh
aggregator.py		aggregator.py
generate_metag_metat_functional_agg.py		generate_metag_metat_functional_agg.py
generate_metap_functional_agg.py		generate_metap_functional_agg.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Aggregation Scripts

Development

Testing

Deployment

Configuration

Environment variables

Release Notes

About

Uh oh!

Releases 22

Packages

Uh oh!

Uh oh!

Contributors 8

Uh oh!

Languages

microbiomedata/nmdc-aggregator

Folders and files

Latest commit

History

Repository files navigation

Aggregation Scripts

Development

Testing

Deployment

Configuration

Environment variables

Release Notes

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases 22

Packages 0

Uh oh!

Uh oh!

Contributors 8

Uh oh!

Languages

Packages