SACAC Exploratory Data Analysis workshop

Repository to host the SACAC Exploratory Data Analysis workshop files (25 March 2024)

Setup

Cloning the repository to your Google Drive

The workshop will include multiple hands-on examples, and participants will have the opportunity to get to know the various EDA tools available. We will rely on the Python programming language, implemented in Google Colab which requires no setup and provides access to compute resources.

To take part in the workshop, you will need to clone this GitHub repository (repo) into your own personal Google Drive.

The clone-repo file will guide you through the process of cloning the repo to your own Google Drive. You can open the clone-repo file directly by clicking on the "Open in Colab" button below. Note that this will navigate you to the Google Colab site, so you may want to open in a new tab. You will need to provide permission for the repository to be cloned to your Google Drive. If you prefer, you may simply download the repository as a zip-file and upload it to your Google Drive manually.

Checking your setup in Google Colab

After you have cloned the repo to your Google Drive, we suggest you run the test-setup notebook to make sure everything is running correctly.

Go to your Google Drive and navigate to the SACAC-EDA-2024 folder you've just cloned.
Go to /examples and open the test_setup notebook
Make sure to edit the path in the first cell of the notebook, as explained in the instructions.
You should be able to run all cells in the notebook

Data used during the EDA workshop

The data used during the workshop is not hosted on GitHub. Please see the README.md file in the \data folder of this repository for information on loading the necessary data.

Topics covered during the workshop

Nature of data / Pre-processing / Time series

Context of process data analysis (CRISP-DM, process monitoring)
Process data - origins and ingest
Challenging process data characteristis
Data visualization
Data cleaning (removal, smoothing, replacement, downsampling)
Moving averages, exponential moving average --> rolling window noise removal
Missing values --> covered above
Omit for this addition: Autocorrelation
Done: Other visualisations? Scatter plots?
Omit for this addition: Granger Causality
Omit for this addition: Steady-state detection

Dimensionality reduction

Principal component analysis
Manifold learning
Auto associative network / Autoencoders (bonus content)

Clustering

K-means clustering
DBSCAN
Interpreting clustering

Model interpretation (w/ tree-based methods)

Decision tree introduction
Variable importance + partial dependence
Omit for this edition of the workshop: SHAP

Name		Name	Last commit message	Last commit date
Latest commit History 32 Commits
data		data
examples		examples
slides		slides
LICENSE		LICENSE
README.md		README.md
clone-repo.ipynb		clone-repo.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

SACAC Exploratory Data Analysis workshop

Setup

Cloning the repository to your Google Drive

Checking your setup in Google Colab

Data used during the EDA workshop

Topics covered during the workshop

Nature of data / Pre-processing / Time series

Dimensionality reduction

Clustering

Model interpretation (w/ tree-based methods)

About

Uh oh!

Releases

Packages

Uh oh!

Languages

License

tmlouw/SACAC-EDA-2024

Folders and files

Latest commit

History

Repository files navigation

SACAC Exploratory Data Analysis workshop

Setup

Cloning the repository to your Google Drive

Checking your setup in Google Colab

Data used during the EDA workshop

Topics covered during the workshop

Nature of data / Pre-processing / Time series

Dimensionality reduction

Clustering

Model interpretation (w/ tree-based methods)

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages