Repository to host the SACAC Exploratory Data Analysis workshop files (25 March 2024)
The workshop will include multiple hands-on examples, and participants will have the opportunity to get to know the various EDA tools available. We will use the Python programming language in Google Colab, which requires no setup and provides access to compute resources.
To take part in the workshop, you will need to clone this GitHub repository (repo) into your own personal Google Drive.
The clone-repo file will guide you through the process of cloning the repo to your own Google Drive. You can open the clone-repo file directly by clicking on the "Open in Colab" button below. Note that this will take you to the Google Colab site, so you may want to open it in a new tab. You will need to grant permission for the repository to be cloned to your Google Drive. If you prefer, you may simply download the repository as a zip file and upload it to your Google Drive manually.
After you have cloned the repo to your Google Drive, we suggest you run the test_setup notebook to make sure everything is running correctly:
- Go to your Google Drive and navigate to the SACAC-EDA-2024 folder you've just cloned.
 - Go to /examples and open the test_setup notebook.
 - Make sure to edit the path in the first cell of the notebook, as explained in the instructions.
 - You should be able to run all cells in the notebook
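
If the notebook fails on its first cell, the path is the most likely cause. The snippet below is a minimal sketch of a path check, not the contents of the test_setup notebook itself. It assumes that Google Drive is mounted at Colab's usual location (/content/drive/MyDrive) and that the repo sits at the top level of your Drive; `REPO_PATH` and `missing_folders` are hypothetical names used only for illustration.

```python
from pathlib import Path

# Assumption: Drive is mounted at Colab's usual location and the repo was
# cloned to the top level of "MyDrive". Edit this to match your own Drive.
REPO_PATH = Path("/content/drive/MyDrive/SACAC-EDA-2024")


def missing_folders(repo_path, expected=("examples", "data")):
    """Return the expected subfolders that are not present under repo_path."""
    repo_path = Path(repo_path)
    return [name for name in expected if not (repo_path / name).is_dir()]


missing = missing_folders(REPO_PATH)
if missing:
    print(f"Check the path: could not find {missing} under {REPO_PATH}")
else:
    print(f"Found the repository at {REPO_PATH}")
```

An empty list means both folders mentioned in this README were found under the path you set.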
 
The data used during the workshop is not hosted on GitHub. Please see the README.md file in the /data folder of this repository for information on loading the necessary data.
- Context of process data analysis (CRISP-DM, process monitoring)
 - Process data - origins and ingest
 - Challenging process data characteristics
 - Data visualization
 - Data cleaning (removal, smoothing, replacement, downsampling)
 - Moving averages, exponential moving average --> rolling window noise removal
 - Missing values --> covered above
 - Omit for this edition: Autocorrelation
 - Done: Other visualisations? Scatter plots?
 - Omit for this edition: Granger Causality
 - Omit for this edition: Steady-state detection
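
As a rough illustration of the moving average and exponential moving average items listed above, the sketch below smooths a short made-up signal in plain Python so the arithmetic stays visible. It is not taken from the workshop notebooks: the function names and the sample values are assumptions. In pandas, roughly equivalent results would come from `Series.rolling(window).mean()` and `Series.ewm(alpha=alpha, adjust=False).mean()`.

```python
def moving_average(values, window):
    """Trailing simple moving average; one output per full window."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


def exponential_moving_average(values, alpha):
    """Recursive EMA: s[0] = x[0], s[t] = alpha * x[t] + (1 - alpha) * s[t-1]."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for x in values:
        previous = smoothed[-1] if smoothed else x
        smoothed.append(alpha * x + (1 - alpha) * previous)
    return smoothed


# Made-up readings with a single spike, purely for illustration.
signal = [10.0, 10.0, 16.0, 10.0, 10.0]
print(moving_average(signal, window=3))               # [12.0, 12.0, 12.0]
print(exponential_moving_average(signal, alpha=0.5))  # [10.0, 10.0, 13.0, 11.5, 10.75]
```

The simple moving average spreads the spike evenly across every window that contains it, while the exponential version reacts immediately and then decays, which is one reason the choice of window length or `alpha` matters for noisy process data.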
 
- Principal component analysis
 - Manifold learning
 - Auto associative network / Autoencoders (bonus content)
 
- K-means clustering
 - DBSCAN
 - Interpreting clustering
 
- Decision tree introduction
 - Variable importance + partial dependence
 - Omit for this edition of the workshop: SHAP