This repository provides a machine learning pipeline designed to estimate mean and low reference flows for Brazilian river stretches. It includes scripts for data collection, preprocessing, model training, and evaluation.
Description: Data was collected using the Google Earth Engine Python API to extract hydrological and environmental metrics for Brazilian river stretches.
File:
src/data_treatment/gee_data_extract.py
Description: The raw data was processed using topological information from the Brazilian Hydrography Ottocodified (BHO) to generate features. The scripts run in the following order:
1. Structure flow data:
src/data_treatment/org_flow.py
2. Aggregate all input data:
src/data_treatment/agg_att.py
3. Accumulate the aggregated attributes over each upstream catchment:
src/data_treatment/acc_att.py
4. Structure all data for use by the ML models:
src/data_treatment/to_ml.py
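The accumulation step (step 3) can be pictured with a minimal sketch. It assumes each river stretch knows its downstream neighbour, its local catchment area, and a local attribute mean. The field names (`down`, `area`, `value`) and the toy network are hypothetical and do not reflect the actual BHO schema or the logic in `acc_att.py`.

```python
from collections import defaultdict


def accumulate_upstream(stretches):
    """Area-weighted accumulation of a local attribute over upstream catchments.

    stretches maps a stretch id to a dict with hypothetical keys:
    "down" (downstream stretch id or None), "area" (local catchment area)
    and "value" (local mean of the attribute).
    Returns a dict: id -> (accumulated area, area-weighted accumulated mean).
    """
    # Count how many stretches drain directly into each stretch.
    n_upstream = defaultdict(int)
    for s in stretches.values():
        if s["down"] is not None:
            n_upstream[s["down"]] += 1

    acc_area = {k: s["area"] for k, s in stretches.items()}
    acc_sum = {k: s["area"] * s["value"] for k, s in stretches.items()}

    # Start at headwaters and move downstream (topological order), so every
    # stretch is complete before it is added to its downstream neighbour.
    queue = [k for k in stretches if n_upstream[k] == 0]
    while queue:
        k = queue.pop()
        down = stretches[k]["down"]
        if down is None:
            continue
        acc_area[down] += acc_area[k]
        acc_sum[down] += acc_sum[k]
        n_upstream[down] -= 1
        if n_upstream[down] == 0:
            queue.append(down)

    return {k: (acc_area[k], acc_sum[k] / acc_area[k]) for k in stretches}


# Toy network: A and B are headwaters that both drain into C (the outlet).
network = {
    "A": {"down": "C", "area": 10, "value": 100},
    "B": {"down": "C", "area": 30, "value": 200},
    "C": {"down": None, "area": 10, "value": 300},
}
result = accumulate_upstream(network)
print(result["C"])
```

Weighting by area keeps the accumulated value a true catchment mean; attributes that are totals rather than means would be summed without the weighting.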
Description: Six ML models were trained. For every model, k-fold cross-validation was used at the gauging sites, and all gauged data was then used to train the model that predicts at the ungauged sites.
File:
src/process_modelig/model_run.py
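The training scheme described above can be sketched as follows. This is an illustration only: a one-variable least-squares line stands in for the six ML models, and the function names, the fold count, and the synthetic data are assumptions, not taken from `model_run.py`.

```python
import random


def kfold_indices(n, k, seed=0):
    """Split range(n) into k shuffled, disjoint test folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x (stand-in for any ML model)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def predict(model, x):
    a, b = model
    return [a + b * xi for xi in x]


def cv_then_full_fit(x_gauged, y_gauged, x_ungauged, k=5):
    """Out-of-fold predictions at gauged sites, full-data fit for ungauged sites."""
    oof = [None] * len(x_gauged)
    for test_idx in kfold_indices(len(x_gauged), k):
        held_out = set(test_idx)
        train = [i for i in range(len(x_gauged)) if i not in held_out]
        model = fit_line([x_gauged[i] for i in train], [y_gauged[i] for i in train])
        fold_pred = predict(model, [x_gauged[i] for i in test_idx])
        for i, p in zip(test_idx, fold_pred):
            oof[i] = p  # each gauged site is predicted by a model that never saw it
    # Ungauged sites have no observations to hold out, so use every gauged site.
    full_model = fit_line(x_gauged, y_gauged)
    return oof, predict(full_model, x_ungauged)


# Synthetic example: flow is an exact linear function of one predictor.
x_g = [float(i) for i in range(1, 11)]
y_g = [2.0 * xi + 1.0 for xi in x_g]
oof_pred, ungauged_pred = cv_then_full_fit(x_g, y_g, [20.0], k=5)
print(ungauged_pred)
```

The out-of-fold predictions give an honest performance estimate at gauged sites, while the full-data fit uses all available observations where no validation is possible.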
Description: The trained models were evaluated, and performance metrics were saved.
1. Evaluate the averaged ensemble combinations:
src/process_post/ens_eval.py
2. Apply the best ensemble combination to all data:
src/process_post/ens_run.py
3. Estimate uncertainty:
src/process_post/unc_run.py
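The ensemble evaluation in step 1 can be illustrated with a short sketch: average the predictions of every subset of models and keep the subset that scores best. The scoring metric (Nash-Sutcliffe efficiency), the model names, and the numbers below are assumptions made for the example; the metrics actually used by `ens_eval.py` are not stated here.

```python
from itertools import combinations


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the mean of obs."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den


def best_average_ensemble(obs, preds):
    """Score the simple average of every non-empty subset of models.

    preds maps a model name to its list of predictions (same order as obs).
    Returns (tuple of model names, score) for the best-scoring subset.
    """
    best = None
    for r in range(1, len(preds) + 1):
        for names in combinations(sorted(preds), r):
            avg = [sum(preds[n][i] for n in names) / r for i in range(len(obs))]
            score = nse(obs, avg)
            if best is None or score > best[1]:
                best = (names, score)
    return best


# Hypothetical predictions: m1 is biased high, m2 biased low, m3 misses one site.
observed = [1.0, 2.0, 3.0, 4.0]
predictions = {
    "m1": [2.0, 3.0, 4.0, 5.0],
    "m2": [0.0, 1.0, 2.0, 3.0],
    "m3": [1.0, 2.0, 3.0, 5.0],
}
best_names, best_score = best_average_ensemble(observed, predictions)
print(best_names, best_score)
```

With six models there are only 63 non-empty subsets, so an exhaustive search like this stays cheap.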
Clone the Repository:
```bash
git clone https://github.com/barbedorafael/ml_pipeline.git
cd ml_pipeline
```
Install Dependencies:
Install required libraries with:
```bash
pip install -r requirements.txt
```
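Once the dependencies are installed, the workflow scripts could be chained in their documented order with a small runner like the one below. It is only a sketch: it assumes each script can be run directly from the repository root without command-line arguments, which this README does not confirm, and the `STEPS`, `pipeline_commands`, and `run_pipeline` names are hypothetical.

```python
import subprocess
import sys

# Script paths exactly as listed in the workflow, in execution order.
STEPS = [
    "src/data_treatment/gee_data_extract.py",
    "src/data_treatment/org_flow.py",
    "src/data_treatment/agg_att.py",
    "src/data_treatment/acc_att.py",
    "src/data_treatment/to_ml.py",
    "src/process_modelig/model_run.py",
    "src/process_post/ens_eval.py",
    "src/process_post/ens_run.py",
    "src/process_post/unc_run.py",
]


def pipeline_commands(python=sys.executable):
    """Build one command per script, using the current interpreter by default."""
    return [[python, script] for script in STEPS]


def run_pipeline():
    """Run every step in order and stop at the first failure."""
    for cmd in pipeline_commands():
        print("Running", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure


# Dry run: show the commands without executing them.
for cmd in pipeline_commands("python"):
    print(" ".join(cmd))
```

Calling `run_pipeline()` from the repository root would execute the steps for real; the Google Earth Engine extraction step also needs an authenticated Earth Engine account.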
- Python 3.10+
- Google Earth Engine Python API
- Additional Python libraries (see requirements.txt)
Suggestions, bug reports, and contributions are welcome! Open an issue or submit a pull request to improve the workflow.
This project is licensed under the MIT License. See the LICENSE file for details.