Commit d51d45a

Update 2025 GSOC ideas
A work in progress
1 parent 2f20442 commit d51d45a

1 file changed

+36
-87
lines changed

src/pages/gsoc_ideas.mdx

Lines changed: 36 additions & 87 deletions
@@ -10,93 +10,89 @@ Ecosystem science has many components, so does PEcAn! Some of those components w
1010

1111
## [Project Ideas](#ideas)
1212

13-
Following is a list of project ideas, use this list to contact the appropriate mentors on slack. Feel free to propose your own ideas as well, in this case contact @kooper in slack so he can put you in contact with the best mentors.
13+
The following is a list of project ideas; use it to contact the appropriate mentors on Slack. Feel free to propose your own ideas as well; in that case, contact @kooper on Slack so he can put you in touch with the best mentors.
1414

1515
---
1616

17-
#### [Machine Learning downscaling of PEcAn outputs](#ml)
17+
#### [Global sensitivity analysis / uncertainty partitioning](#sa)
1818

19-
This project would extend an existing prototype that takes ensemble-based outputs from the process-based PEcAn models (and the data assimilation code in particular) and use ML models to make predictions to new locations where the PEcAn models were not run (a.k.a. downscaling). Existing code downscales the low-frequency (monthly to annual) carbon pool outputs using a random forest model and a harmonized stack of gridded spatial data (climate, land use/land cover, soils, topography). The current system also preserves the covariance structure across variables, space, and time by downscaling each model ensemble member separately and then using the downscaled ensemble to calculate summary statistics. Also included are some basic assessments of (cross-)validation skill and variable importance.
19+
This project would extend PEcAn's existing uncertainty partitioning routines, which are primarily one-at-a-time and focused on model parameters, to also consider ensemble-based uncertainties in other model inputs (meteorology, soils, vegetation, phenology, etc.). The project would employ Sobol' methods; some uncommitted code exists that manually prototypes how this could be done in PEcAn. The goal would be to refactor or reimplement this prototype into a reliable, automated system and apply it to some key test cases in both natural and managed ecosystems.
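
To make the Sobol' idea concrete, here is a minimal sketch of a variance-based global sensitivity analysis. It is not PEcAn code and not the uncommitted prototype mentioned above; it is written in Python for brevity even though PEcAn itself is in R. The names `sobol_indices` and `toy_model` are hypothetical, the inputs are assumed to be independent and uniform on [0, 1], and the estimators are the forms commonly attributed to Saltelli et al. (first-order) and Jansen (total-order). In a real application each "input" might instead index a draw from an ensemble of meteorology, soils, or parameter values.

```python
import random


def sobol_indices(model, n_inputs, n_samples=20000, seed=0):
    """Estimate first-order and total-order Sobol' indices by Monte Carlo.

    Assumes independent inputs, each uniform on [0, 1] (an illustrative choice).
    """
    rng = random.Random(seed)
    # Two independent sample matrices, A and B (n_samples x n_inputs).
    a = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    b = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in a]
    y_b = [model(row) for row in b]

    # Center the outputs on the pooled mean to reduce estimator noise.
    pooled = y_a + y_b
    mean = sum(pooled) / len(pooled)
    y_a = [y - mean for y in y_a]
    y_b = [y - mean for y in y_b]
    variance = sum(y * y for y in y_a + y_b) / (len(pooled) - 1)

    first, total = [], []
    for i in range(n_inputs):
        # AB_i: matrix A with column i replaced by column i of B.
        y_ab = [
            model(ra[:i] + [rb[i]] + ra[i + 1:]) - mean
            for ra, rb in zip(a, b)
        ]
        # First-order effect: variance explained by input i alone.
        v_i = sum(
            yb * (yab - ya) for ya, yb, yab in zip(y_a, y_b, y_ab)
        ) / n_samples
        # Total effect: input i including all of its interactions.
        e_i = sum(
            (ya - yab) ** 2 for ya, yab in zip(y_a, y_ab)
        ) / (2 * n_samples)
        first.append(v_i / variance)
        total.append(e_i / variance)
    return first, total


def toy_model(x):
    """Hypothetical additive model standing in for a real ecosystem model run."""
    return 1.0 * x[0] + 2.0 * x[1] + 3.0 * x[2]


first_order, total_order = sobol_indices(toy_model, n_inputs=3)
```

For this additive toy model the analytic indices are the squared coefficients divided by their sum (1/14, 4/14, and 9/14), and the first-order and total-order indices coincide because there are no interactions. The estimates should match these values up to Monte Carlo error. Unlike a one-at-a-time analysis, the same design also exposes interaction effects whenever the total-order index exceeds the first-order one.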
2020

21-
**Expected outcome:**
21+
22+
**Expected outcomes:**
2223

2324
A successful project would complete a subset of the following tasks:
2425

25-
1. Extend the code to downscale higher-frequency (hourly to daily) carbon flux outputs
26-
2. Develop tools for aggregating downscaled outputs to user-specified spatial units (e.g., political boundaries, atmospheric model grid cells)
27-
3. Explore alternative ML models and multi-model ensembles.
28-
4. Extend the set of covariate data to make use of time-varying inputs (e.g. that year’s weather rather than the climatological mean), additional remotely sensed observations, and the previous ecosystem state.
29-
5. Improving the downscaling validation checks, potentially adding additional corrections to the computed uncertainties (current prototype tool tends to underpredict the ensemble spread).
26+
* Reliable, automated sensitivity analysis and uncertainty partitioning.
27+
* Applications to test case(s) in natural and / or managed ecosystems.
3028

3129
**Prerequisites:**
3230

33-
- Required: R (existing prototype is in R); basic familiarity with ML techniques and packages
34-
- Helpful: familiarity with large spatial gridded data (e.g., GIS, R terra, remote sensing); more advanced statistics, ML, or data science; Python
31+
- Required: R (the existing workflow and prototype are in R)
32+
- Helpful: familiarity with sensitivity analyses
3533

3634
**Contact person:**
35+
3736
Mike @Dietze
3837

3938
**Duration:**
40-
Size: 175 hours for 1-2 tasks, 350 hours for 3 or more tasks
39+
40+
Flexible to work as either a Small (175 hr) or Large (350 hr) project
4141

4242
**Difficulty:**
43+
4344
Medium
4445

4546
---
4647

47-
#### [Adopting data schema for field management events](#management)
48-
49-
This project aims to adapt a data schema for an R shiny application called fieldactivity. Fieldactivity is an application that allows field operators and researchers to enter field information about management activities through UI to aid bookkeeping of such events. The management activities and associated information are then stored in json files from which the information can be used for modelling.
50-
51-
The fieldactivity application uses UI elements that are created with RShiny and therefore follows the R coding conventions. At the moment, to meet these R coding criteria, the data structure is read from a json file called ui_structure_json, which contains the necessary attributes to create the UI with R. As this json file is independent and does not communicate with any other data sources, it must be manually updated if the data requirements are to be kept up to date with other data sources. To overcome the potential differences between the data sources, we have created a json data schema ([management-event.schema.json](https://github.com/hamk-uas/fieldobservatory-data-schemas/blob/main/management-event.schema.json)) to act as a single source of truth for different data sources. The GSoC task is to incorporate this schema into the fieldactivity shiny app such that it can read the variable information from the schema and store the data in the correct structure. In addition, the app should be made flexible such that when a change is made to the json schema, it can deploy and change / create UI elements accordingly on the fly. To achieve this, the functionalities around how the applications store the data need to be reconstructed.
52-
53-
**Expected outcome:**
54-
55-
The project can be divided to following subtasks:
48+
#### [Database Improvements](#db)
5649

57-
1. The fieldactivity application will be able to handle/read the data, which have been stored in the current way or structured according to the management data schema.
58-
2. The data storage convention will be changed for those management cases, where it is possible to store multiple incidents at once. Currently these cases are stored in a list in a format that the data schema doesn’t support.
59-
3. Include the data schema as part of the fieldactivity code:
60-
- Variable names and metadata are read from the data schema. This also requires translation of the data schema information so that UI elements can be created in R Shiny.
61-
- Stored data follows the structure and the names of the data schema.
50+
**Chris TODO**
51+
- decouple traits from provenance
52+
- make betydb.org data available through R package
6253

63-
**Prerequisites:**
6454

65-
- Required: R and RShiny, json
6655

6756
**Contact person:**
68-
Henri Kajasilta
57+
Chris Black (@infotroph)
6958

7059
**Duration:**
7160
Flexible to work as either a Small (175 hr) or Large (350 hr) project
7261

7362
**Difficulty:**
74-
Medium
63+
Medium, Large
7564

7665
---
7766

78-
#### [PEcAn Code Hardening by Integration Testing](#testing)
67+
#### [Development of Notebook-based PEcAn Workflows](#notebook)
68+
69+
The PEcAn workflow is currently run using either a web-based user interface, an API, or custom R scripts. The web-based user interface is the easiest to use but has limited functionality, whereas the custom R scripts and API are more flexible but require more experience.
7970

80-
The proposed project aims to enhance the reliability of PEcAn's integration tests by prioritizing packages associated with overall workflow bottlenecks. The focus will be on preparing contributors to gain an in-depth understanding of PEcAn's inner workings and the interactions between modules. It will commence with prioritizing basic runs to establish a robust foundation that include single site, single model runs to cover the major models. Subsequently, attention will shift towards ensemble runs, diversifying testing scenarios to ensure comprehensive coverage. A specific emphasis will be placed on Data Simulation models for single site, single model runs, with a focus on prominent models. This initiative aims to provide contributors with a holistic perspective on PEcAn's functionality, fostering a deeper understanding of how individual modules contribute to the overall workflow. By combining these elements, the GSoC project seeks to create a structured and immersive learning experience that equips participants to contribute effectively to PEcAn's development while addressing critical workflow bottlenecks.
71+
This project will focus on building Quarto workflows aimed at providing an interface to PEcAn that is both welcoming to new users and flexible enough to be a starting point for more advanced users. It will build on existing [Pull Request 1733](https://github.com/PecanProject/pecan/pull/1733).
8172

8273
**Expected outcome:**
8374

84-
- Increased module and model coverage in PEcAn’s automated integration tests; contributors can understand which components are and are not covered by existing tests.
75+
- Two or more template workflows for running PEcAn, plus a written vignette and a video tutorial introducing their use.
8576

8677
**Prerequisites:**
8778

88-
- R
79+
- Familiarity with R. Familiarity with RStudio and Quarto or R Markdown is a plus.
8980

9081
**Contact person:**
91-
Chris Black (@infotroph), Shashank Singh (@moki1202)
82+
David LeBauer @dlebauer, Nihar Sanda @koolgax99
9283

9384
**Duration:**
94-
Flexible to work as either a Small (175hr) or Large (350 hr)
85+
Medium (175hr)
9586

9687
**Difficulty:**
97-
Medium, Large
88+
Medium
9889

99-
---
90+
91+
92+
<!--
93+
94+
95+
# This comment section is for ideas that may be viable in the future (with revision)
10096

10197
#### [Optimize PEcAn for freestanding use of single packages [R package development]](#freestanding)
10298

@@ -124,12 +120,11 @@ Flexible to work as either a Small (175hr) or Large (350 hr)
124120

125121
**Difficulty:**
126122
Medium, Large
127-
128123
---
129124

130125
#### [PEcAn model coupling and development [Data Science]](#coupling)
131126

132-
PEcAn has the capability to interface multiple ecological models. The goal of this project is to improve the coupling of existing models to PEcAn (specifically FATES) and add new models (specifically a simple vegetation model that is under development). It is also possible to contribute to the development of the simple vegetation model which is written in fortran.
127+
PEcAn has the capability to interface with multiple ecological models. The goal of this project is to improve the coupling of existing models to PEcAn (specifically FATES) and add new models (specifically a simple vegetation model that is under development). It is also possible to contribute to the development of the simple vegetation model, which is written in Fortran.
133128

134129
**Expected outcome:**
135130

@@ -149,51 +144,5 @@ Flexible to work as either a Small (175hr) or Large (350 hr)
149144
Medium
150145

151146
---
147+
-->
152148

153-
#### [Development of Notebook-based PEcAn Workflows](#notebook)
154-
155-
The PEcAn workflow is currently run using either a web based user interface, an API, or custom R scripts. The web based user interface is easiest to use, but has limited functionality whereas the custom R scripts and API are more flexible, but require more experience.
156-
157-
This project will focus on building Quarto workflows aimed at providing an interface to PEcAn that is both welcoming to new users and flexible enough to be a starting point for more advanced users. It will build on existing [Pull Request 1733](https://github.com/PecanProject/pecan/pull/1733).
158-
159-
**Expected outcome:**
160-
161-
- Two or more template workflows for running the PEcAn workflow. Written vignette and video tutorial introducing their use.
162-
163-
**Prerequisites:**
164-
165-
- Familiarity with R. Familiarity with R studio and Quarto or Rmarkdown is a plus.
166-
167-
**Contact person:**
168-
David LeBauer @dlebauer, Nihar Sanda @koolgax99
169-
170-
**Duration:**
171-
Small (175hr)
172-
173-
**Difficulty:**
174-
Medium
175-
176-
---
177-
178-
#### [PEcAn in the cloud](#cloud)
179-
180-
The PEcAn system is a complex system with many microservices such as the database system, frontend, models, job management etc. These microservices lend themselves to be deployed in the cloud. We have an existing helm chart that should get you most of the way there and should allow you to deploy pecan on kubernetes. Additionally there is a docker-compose file that should allow you to deploy PEcAn on a single server using docker.
181-
182-
This project will take the helm chart and docker-compose files and harden them and upgrade them to use the latest versions of containers. The current system uses the shared folder not only to deploy data in all services, but also uses it to let the central system know when executions are finished. We would like to move away from this shared system and use the message system to indicate executions are done, and use a file service to pull and push data (for example from/to S3).
183-
184-
**Expected outcome:**
185-
186-
- Updates to docker-compose and helm chart, as well as code submissions to mark executions as finished using RabbitMQ and file push/pull functionality when executing jobs.
187-
188-
**Prerequisites:**
189-
190-
- Familiarity with Kubernetes, Docker, Helm and R. Familiarity with RabbitMQ and postgreSQL is a plus
191-
192-
**Contact person:**
193-
Rob Kooper @kooper, Samu Varjonen @samu, Istem Fer @istfer
194-
195-
**Duration:**
196-
Large (350 hr)
197-
198-
**Difficulty:**
199-
Medium
