Welcome to our project! This guide will help you get started with running the machine learning models in this repository.
There are two main Jupyter notebooks in this project, each serving a different purpose:
- `run.ipynb`: Use this notebook if you want to dive deep into all the implementations and the reasoning behind them. It is a comprehensive notebook covering the entire workflow: data preprocessing, both the unsupervised and the supervised approach, model training, and evaluation.
- `predict.ipynb`: Use this notebook if you want to run the supervised LSTM model directly. It is designed for quick execution, giving you predictions from the model without going into the underlying details.
Before running either of the Jupyter notebooks, please ensure the following steps are completed:
a. Data Download:
- Download the necessary datasets (both `.parquet` and `.csv` files) into a folder named `QCEH_data`. This data is crucial for the execution of the notebooks. Note that only a subset of the data we used is present here as examples; if you want access to additional data, feel free to contact the EPFL SPC lab, as further data could be made available upon reasonable request.
- Download the computationally expensive data from the `precomputed_data` directory. This data is used in certain parts of the analysis to save computation time.
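Before launching the notebooks, it can help to verify that the expected folders are in place. The sketch below (a hypothetical helper, not part of the repository; folder names follow this guide) reports which data files are present:

```python
from pathlib import Path

def check_data_layout(root="."):
    """Report which of the expected data folders and files are present."""
    root = Path(root)
    report = {}
    data_dir = root / "QCEH_data"
    # The example datasets should be .parquet and .csv files in QCEH_data.
    report["QCEH_data"] = (
        sorted(p.name for p in data_dir.iterdir() if p.suffix in {".parquet", ".csv"})
        if data_dir.is_dir() else None
    )
    pre_dir = root / "precomputed_data"
    # Precomputed results live in precomputed_data.
    report["precomputed_data"] = (
        sorted(p.name for p in pre_dir.iterdir()) if pre_dir.is_dir() else None
    )
    return report
```

A value of `None` in the returned dict means the folder itself is missing; an empty list means the folder exists but no matching files were found.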
b. Helper File:
- Place the helper file `helpers.py` in the same folder as the Jupyter notebooks. This file contains many tools for data collection and analysis.
c. Model Download:
- Download the model file named `ltsm10^-3_4_7_0.2` (the model) together with the parameter files `x_mean.pt` and `x_std.pt` (the mean and standard deviation used for standardization) from the `QCE_data` folder.
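The mean and std files are used to standardize the model inputs before inference: each feature has the training-set mean subtracted and is divided by the training-set standard deviation. In the notebooks these tensors would be loaded with `torch.load`; here is a minimal NumPy sketch of the transform itself (the array shapes and values are illustrative assumptions):

```python
import numpy as np

def standardize(x, x_mean, x_std):
    # Feature-wise standardization: subtract the training-set mean and divide
    # by the training-set standard deviation, as expected by the LSTM model.
    return (x - x_mean) / x_std

# Illustrative values; in the notebooks, x_mean and x_std come from
# x_mean.pt and x_std.pt.
x = np.array([[3.0, 8.0], [1.0, 2.0]])
x_mean = np.array([1.0, 2.0])
x_std = np.array([2.0, 3.0])
z = standardize(x, x_mean, x_std)
```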
After setting up the data, you can simply open and run the Jupyter notebooks:
- `run.ipynb`
- `predict.ipynb`
Please note that some cells in `run.ipynb` may execute faster than others. The entire `run.ipynb` may take approximately 1-2 hours of computation time, depending on your system's capabilities. `predict.ipynb`, on the other hand, is quite fast.
- Ensure that your Python environment has all the required dependencies installed. Refer to the `requirements.txt` file for the list of necessary packages.
- It is recommended to run these notebooks in a virtual environment to avoid dependency conflicts with other Python projects.
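As a quick way to confirm the environment is complete, the packages listed in `requirements.txt` can be checked against the active interpreter. A standard-library sketch (`missing_requirements` is a hypothetical helper, and it assumes simple `name` or `name==version` requirement lines):

```python
from importlib import metadata
from pathlib import Path

def missing_requirements(path="requirements.txt"):
    """Return the requirement names that are not installed in this environment."""
    missing = []
    for line in Path(path).read_text().splitlines():
        # Keep only the distribution name: drop comments and version pins.
        name = line.split("#")[0].split("==")[0].split(">=")[0].strip()
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Run it once before launching Jupyter; an empty list means every listed package resolves.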
- Note that we preserved the separate implementation of each part of the analysis in the GitHub repository, with one notebook corresponding to each section of `run.ipynb`. These files are, respectively, `initial_analysis.ipynb`, `clustering.ipynb`, `supervised_baseline.ipynb`, and `RNNs.ipynb`; do not hesitate to refer to them if you need to run a single part.
- For faster computation, please find in the following Google Drive the files you can add to the `/precomputed_data` directory: precomputed labels and precomputed DBSCAN results.
Happy Analyzing!