Add details on GSoC database project #98

Merged
merged 3 commits on Mar 4, 2025
24 changes: 19 additions & 5 deletions src/pages/gsoc_ideas.mdx
---
### 3. Database and Data Improvements{#db}

PEcAn relies on the BETYdb database to store trait and yield data as well as model provenance information. This project aims to separate trait data from provenance tracking, ensure that PEcAn is able to run without the server currently required to run the Postgres database used by BETYdb, and enable flexible data sharing in place of a server-reliant sync mechanism. The goal is to make PEcAn workflows easier to test, deploy, and use while also making data more accessible.


**Potential Directions**

- **Minimal BETYdb Database:** Create a simplified version of BETYdb for demonstrations and integration tests, which might include:
	- Review the provenance information we currently log, identify components that no longer need to be tracked or that should be temporary rather than permanent records, and build tools to clean unneeded or expired records from the database.
	- Design and create a freestanding version of the trait data, including choosing the format and distribution method, implementing whatever pipelines are needed to move the data over, and documenting how to use and update the result.

- **Non-Database Setup:** Enable workflows that do not require PostgreSQL or a web front-end, potentially including:
	- Identify PEcAn modules that are still DB-dependent and refactor them to allow freestanding use.
- Implement mechanisms for decoupling the DB from the model pipelines in time and space while still tracking provenance. Perhaps this could involve separate prep/execution/post-logging phases, but we encourage your creative suggestions.
- Create tools that maximize interoperability with data from other sources, including from external databases or the user's own observations.
- Identify functionality from the "BETYdb network" sync system that is out of date and replace or remove it as needed.
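As a toy illustration of the "freestanding version of the trait data" direction above, here is a minimal sketch. It uses SQLite as a stand-in for Postgres, and the `traits` table and its columns are invented for the example rather than taken from BETYdb's actual schema:

```python
import csv
import sqlite3

def export_traits(conn, out_path="traits.csv"):
    """Dump trait rows to a plain CSV that needs no database server to read."""
    cur = conn.execute("SELECT trait, mean, units FROM traits")
    cols = [d[0] for d in cur.description]
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(cols)   # header row
        writer.writerows(cur)   # one line per trait record
    return out_path

# Demo: an in-memory database standing in for a BETYdb instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE traits (trait TEXT, mean REAL, units TEXT)")
conn.execute("INSERT INTO traits VALUES ('SLA', 22.1, 'm2 kg-1')")
export_traits(conn)
```

A real pipeline would also need to carry covariates, citations, and methods metadata, and a proposal might well choose a richer format than CSV; the point is only that the result is readable without a running server.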

**Expected outcomes**:

A successful project would complete a subset of the following tasks:
- A lightweight, distributable demo Postgres database.
- A distributable dataset of the existing trait and yield records in a maximally reusable format (i.e. maybe _not_ Postgres).
- A workflow that is independent of the Postgres database.
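The "separate prep/execution/post-logging phases" suggestion above could take roughly this shape. Every name here is hypothetical (none of this is existing PEcAn API): provenance is appended to a local JSON-lines file during an offline run and could be synced to a database afterwards.

```python
import json
import time

def prepare(settings):
    # Phase 1 (online, optional): resolve all inputs while a DB may be reachable.
    return {"model": settings["model"], "inputs": settings["inputs"]}

def execute(plan):
    # Phase 2 (offline): run the model with no database connection. Stubbed here.
    return {"status": "ok", "model": plan["model"]}

def record(plan, result, log_path="provenance.jsonl"):
    # Phase 3 (offline): append provenance locally; sync to BETYdb later if desired.
    entry = {"time": time.time(), "plan": plan, "result": result}
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry

plan = prepare({"model": "SIPNET", "inputs": ["met.nc"]})
record(plan, execute(plan))
```

Splitting the phases this way is what lets the workflow run where and when the database is unavailable while still producing a complete provenance trail.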

**Skills Required**:

- Familiarity with database concepts required
- Postgres experience helpful (and required if proposing DB cleanup tasks)
- R experience helpful (and required if proposing PEcAn code changes)

**Contact person:**

Chris Black (@infotroph)

**Duration:**

Suitable for a Medium (175 hr) or Large (350 hr) project.

**Difficulty:**

Intermediate to hard


---