added database project

dlebauer · dlebauer · commit 6323633d03c8 · 2025-02-10T22:21:02.000-07:00
diff --git a/src/pages/gsoc_ideas.mdx b/src/pages/gsoc_ideas.mdx
@@ -4,13 +4,13 @@ title: 'GSoC 2025 - PEcAn Project Ideas'
 
 # [GSoC - PEcAn Project Ideas](#background)
 
-Ecosystem science has many components, so does PEcAn! Some of those components where you can contribute. Below is a list of potential ideas. Feel free to contact any of the mentors in slack, or feel free to ask questions in our #gsoc-2025 channel in slack.
+PEcAn is an open-source ecosystem modeling framework integrating data, models, and uncertainty quantification. Below is a list of potential ideas where contributors can help improve and expand PEcAn. Come find us on Slack to discuss. If you have questions or would like to propose your own idea, contact @kooper in Slack or join our `#gsoc-2025` channel.
 
 ---
 
 ## [Project Ideas](#ideas)
 
-Following is a list of project ideas, use this list to contact the appropriate mentors on slack. Feel free to propose your own ideas as well, in this case contact @kooper in Slack so he can put you in contact with the best mentors.
+Below is a list of project ideas. Feel free to contact the listed mentors on Slack to discuss further or contact @kooper with new ideas and he can help connect you with mentors.
 
 ---
 
@@ -21,9 +21,9 @@ This project would extend PEcAn's existing uncertainty partitioning routines, wh
 
 **Expected outcomes:**
 
-A successful project would complete at subset of the following tasks:
+A successful project would complete a subset of the following tasks:
 
-* Reliable, automated Sensitivity analyss and uncertainty partitioning 
+* Reliable, automated Sobol sensitivity analysis and uncertainty partitioning across multiple model inputs.
 * Applications to test case(s) in natural and / or managed ecosystems.
 
 **Prerequisites:**
@@ -45,9 +45,9 @@ Medium
 
 ---
 
-#### [Parallelization of runs](#hpc)
+#### [Parallelization of Model Runs on HPC](#hpc)
 
-This project would extend PEcAn's existing run mechanisms to be able to run on an HPC using apptainer. For uncertaintity analysis, PEcAn will run 1000s of runs of the same model with small permutations. This is a perfect use for an HPC run. The goal is to not submit 1000s of jobs, but have a single job with multiple nodes that will run all of the ensembles efficiently. Running can be orchistrated using RabbitMQ but other methods are encouraged as well. The end goal should be for the PEcAn system to be launched, and run the full workflow on the HPC from start to finish leveraging as many nodes as given during the submission.
+This project would extend PEcAn's existing run mechanisms to be able to run on a high-performance computing (HPC) cluster using [Apptainer](https://apptainer.org). For uncertainty analysis, PEcAn will run the same model 1000s of times with small permutations. This is a perfect use for an HPC run. The goal is not to submit 1000s of jobs, but to have a single job with multiple nodes that will run all of the ensembles efficiently. Running can be orchestrated using RabbitMQ, but other methods are also encouraged. The end goal should be for the PEcAn system to be launched and run the full workflow on the HPC from start to finish, leveraging as many nodes as it is given during the submission.
 
 **Expected outcomes:**
 
@@ -58,8 +58,8 @@ A successful project would complete at subset of the following tasks:
 
 **Prerequisites:**
 
-- Required: R (existing workflow and prototype is in R), docker
-- Helpful: familiarity with HPC and apptain
+- Required: R (existing workflow and prototype is in R), Docker
+- Helpful: Familiarity with HPC and Apptainer
 
 **Contact person:**
 
@@ -74,23 +74,36 @@ Flexible to work as either a Medium (175hr) or Large (350 hr)
 Medium
 
 ---
-#### [Database Improvements](#db)
+#### [Database and Data Improvements](#db)
+
+PEcAn relies on the BETYdb database to store trait and yield data as well as model provenance information. This project aims to separate trait data from provenance tracking and to ensure that PEcAn is able to run without the Postgres server currently required to run BETYdb. The goal is to make the workflows easier to use and the data more accessible.
+
+
+**Potential directions:**
+
+- **Minimal BETYdb Database:** Create a simplified version of BETYdb for demonstrations and integration tests.
+- **Non-Database Setup:** Enable workflows that do not require PostgreSQL or a web front-end.
 
-**Chris TODO**
-- decouple traits from provenance
-- make betydb.org data available through R package
+**Expected outcomes:**
+
+A successful project would complete a subset of the following tasks:
+- A lightweight, distributable demo Postgres database.
+- A workflow that runs without a Postgres database, enabling easier local testing and deployment.
 
 
 
 **Contact person:**
+
 Chris Black (@infotroph)
 
 **Duration:**
-Flexible to work as either a Medium (175hr) or Large (350 hr)
+
+Suitable for a Medium (175hr) or Large (350hr) project.
 
 **Difficulty:**
 Medium, Large
 
+
 ---
 
 #### [Development of Notebook-based PEcAn Workflows](#notebook)
@@ -123,6 +136,40 @@ Medium
 
 # This comment section for ideas that may be potentially viable in future (with revision)
 
+
+#### BETYdb R data package
+
+BETYdb's web front end is built on a version of Ruby on Rails that is functional but no longer supported. A key feature of BETYdb is that the data is open and accessible.
+
+Building an R data package would make the trait and yield data currently in BETYdb more accessible to users beyond the PEcAn community.
+
+**Expected outcomes:**
+
+A successful project would complete a subset of the following tasks:
+
+- An R package containing the data currently hosted in BETYdb.
+- Documentation and examples of use.
+- Updates to BETYdb documentation.
+
+**Prerequisites:**
+
+- Required: R
+- Helpful: R package development; familiarity with relational databases and SQL.
+
+**Contact person:**
+
+David LeBauer (@dlebauer)
+
+**Duration:**
+
+Medium (175hr) to Large (350hr) depending on scope of proposal.
+
+**Difficulty:**
+
+Medium
+
+---
+
 #### [Optimize PEcAn for freestanding use of single packages [R package development]](#freestanding)
 
 PEcAn was designed as a system of independent modules, each implemented as its own R package that was intended to be usable either standalone or as part of the full PEcAn system. Subsequent development focused on the most common cross-module workflows has lead to tighter coupling between modules than was originally intended, so that in practice many of the modules are now challenging to use, test, or develop without a full understanding of their interdependencies. Further, some packages expect inputs and outputs in data structures that are only generated by other PEcAn packages but might be more easily provided in standard well-known formats. We seek proposals to re-loosen these couplings by revisiting the design and interface of PEcAn packages through one or more of: