This repository contains the full pipeline for an EEG-based research project focused on power spectral analysis, machine learning, and neural network classification to distinguish cognitive tasks and ASD profiles.
EEG_Research_Project/
├── figures/ # Visual outputs (e.g., PSD plots, scalp maps)
├── ml_models/ # Machine learning notebooks and scripts
│   ├── 01_asd_vs_td_classification.ipynb
│   ├── 01_asd_vs_td_classification.py
│   ├── 02_asd_vs_td_task_agnostic_model.ipynb
│   └── 02_asd_vs_td_task_agnostic_model.py
├── preprocessing/ # EEG preprocessing notebooks
│   └── eeg_preprocessing_pipeline.ipynb
├── scripts/ # Python scripts for preprocessing and feature extraction
│   └── eeg_preprocessing_pipeline.py
├── tables/ # CSVs with extracted features and results
├── README.md # Project overview and structure
├── requirements.txt # Dependencies
├── LICENSE # Usage license
└── .gitignore # Git exclusions
This project involves:
- Preprocessing EEG data collected with a Muse headset.
- Extracting power spectral features (Delta, Theta, Alpha, Beta, Gamma) using Welch's method.
- Training machine learning models (SVM, KNN, RF, and an ensemble) for:
  - Classifying participants (ASD vs TD) using task-specific features
  - Predicting ASD vs TD using task-agnostic EEG input (data from any task)
- Developing neural network models for both use cases above (in progress).
The EEG preprocessing pipeline involves the following steps:
- Raw EEG Loading: Read multiple CSV files containing `RAW_` EEG columns
- NaN and Inf Handling: Use forward-fill, backward-fill, and mean replacement for missing data
- Conversion to MNE Format: Create `RawArray` using MNE with 256 Hz sampling
- Artifact Handling via ICA: Use Independent Component Analysis to detect and remove muscle artifacts using high-frequency PSD checks
- Power Spectral Density Calculation: Apply Welch's method to compute absolute band power across Delta, Theta, Alpha, Beta, and Gamma bands
- Export to CSV: Save averaged power features per file for downstream ML/NN use
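The cleaning and band-power steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual script: it uses `scipy.signal.welch` directly and skips the MNE `RawArray` conversion and ICA steps. The column names (`RAW_TP9`, `RAW_AF7`), the band edges in `BANDS`, and the helper names `clean_raw_columns` and `band_powers` are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from scipy.signal import welch

SFREQ = 256  # Hz, the sampling rate stated in the pipeline steps

# Band edges in Hz. Assumed for illustration; the project's exact
# cut-offs are not given in this README.
BANDS = {
    "Delta": (0.5, 4.0),
    "Theta": (4.0, 8.0),
    "Alpha": (8.0, 13.0),
    "Beta": (13.0, 30.0),
    "Gamma": (30.0, 45.0),
}


def clean_raw_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the RAW_ columns and fill missing or infinite samples."""
    raw = df.filter(regex=r"^RAW_").astype(float)
    raw = raw.replace([np.inf, -np.inf], np.nan)
    raw = raw.ffill().bfill()
    # Mirrors the listed mean-replacement step; after ffill/bfill it is
    # effectively a safeguard that rarely changes anything.
    return raw.fillna(raw.mean())


def band_powers(signal: np.ndarray, sfreq: float = SFREQ) -> dict:
    """Absolute band power: sum the Welch PSD over each band."""
    freqs, psd = welch(signal, fs=sfreq, nperseg=int(2 * sfreq))
    freq_res = freqs[1] - freqs[0]
    powers = {}
    for name, (low, high) in BANDS.items():
        mask = (freqs >= low) & (freqs < high)
        powers[name] = float(psd[mask].sum() * freq_res)
    return powers


# Synthetic 10-second recording with a strong 10 Hz (alpha) rhythm.
rng = np.random.default_rng(0)
t = np.arange(0, 10, 1 / SFREQ)
demo = pd.DataFrame({
    "RAW_TP9": 20 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size),
    "RAW_AF7": 20 * np.sin(2 * np.pi * 10 * t) + rng.normal(0, 1, t.size),
})
demo.loc[5:9, "RAW_TP9"] = np.nan   # simulate dropped samples
demo.loc[100, "RAW_AF7"] = np.inf   # simulate a corrupt value

cleaned = clean_raw_columns(demo)
features = {
    f"{col}_{band}": power
    for col in cleaned.columns
    for band, power in band_powers(cleaned[col].to_numpy()).items()
}
```

Each entry of `features` would correspond to one column of a per-file row in the exported CSV, under the naming assumed here.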
The machine learning workflow involves the following steps:
- Data Loading: Features loaded from preprocessed EEG tables
- Train-Test Split: Stratified sampling with validation
- Feature Scaling: Using `StandardScaler`
- Modeling Techniques:
  - Support Vector Machine (SVM)
  - K-Nearest Neighbors (KNN)
  - Random Forest (RF)
  - VotingClassifier Ensemble (SVM + RF + KNN)
- Hyperparameter Tuning:
  - KNN: Optimal `k` selected through manual accuracy comparison
  - SVM: Tuned kernel (`linear`, `rbf`) and regularization parameter `C`
  - Random Forest: Tuned `n_estimators`, `max_depth`, and `criterion`
  - Ensemble: Implemented soft voting using the best-performing base classifiers
- Best Feature Identification:
  - Feature importances were calculated using permutation importance
  - Applied across all classifiers (SVM, KNN, RF) to measure the accuracy drop when each feature was shuffled
- Visualizations:
  - Bar plot of the 2 most important EEG features (from permutation importance)
  - Bar plot of the 2 least important EEG features (from permutation importance)
- Evaluation Metrics:
  - Accuracy, Precision, Recall, and F1-score using `classification_report`
  - Compared across all models to select the best-performing classifier
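As a rough sketch of the split, scaling, modeling, and evaluation steps above, the following uses scikit-learn on synthetic data. The data from `make_classification`, the split sizes, the candidate `k` values, the fixed SVM and RF settings, and the label order (0 = TD, 1 = ASD) are all assumptions for illustration; the notebooks may differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the band-power feature table (assumption).
X, y = make_classification(
    n_samples=200, n_features=20, n_informative=6, random_state=42
)

# Stratified split into train, validation, and test sets.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)
X_train, X_val, y_train, y_val = train_test_split(
    X_trainval, y_trainval, test_size=0.25, stratify=y_trainval, random_state=42
)

# Fit the scaler on the training set only, then apply it everywhere.
scaler = StandardScaler().fit(X_train)
X_train_s, X_val_s, X_test_s = (
    scaler.transform(part) for part in (X_train, X_val, X_test)
)

# Manual accuracy comparison to pick k for KNN, using the validation set.
val_acc = {}
for k in (3, 5, 7, 9):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X_train_s, y_train)
    val_acc[k] = knn.score(X_val_s, y_val)
best_k = max(val_acc, key=val_acc.get)

# Soft-voting ensemble over SVM, RF, and KNN.
ensemble = VotingClassifier(
    estimators=[
        ("svm", SVC(kernel="rbf", C=1.0, probability=True, random_state=42)),
        ("rf", RandomForestClassifier(n_estimators=200, random_state=42)),
        ("knn", KNeighborsClassifier(n_neighbors=best_k)),
    ],
    voting="soft",
)
ensemble.fit(X_train_s, y_train)

y_pred = ensemble.predict(X_test_s)
report = classification_report(y_test, y_pred, target_names=["TD", "ASD"])
print(report)
```

Selecting `k` on a validation split rather than the test set keeps the final test metrics free of tuning leakage. Soft voting averages predicted probabilities, which is why `SVC` is created with `probability=True`.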
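The permutation-importance step can be illustrated along these lines. This sketch fits only a Random Forest on synthetic data, whereas the project applies the procedure to SVM, KNN, and RF. The feature names (channel and band combinations) and all model settings are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Hypothetical feature names: one column per channel/band pair.
feature_names = [
    f"{ch}_{band}"
    for ch in ("TP9", "AF7", "AF8", "TP10")
    for band in ("Delta", "Theta", "Alpha", "Beta", "Gamma")
]

# Synthetic stand-in for the feature table (assumption).
X, y = make_classification(
    n_samples=200, n_features=len(feature_names), n_informative=4, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)

# Shuffle each feature in turn and record the mean drop in test accuracy.
result = permutation_importance(
    model, X_test, y_test, scoring="accuracy", n_repeats=20, random_state=0
)

order = np.argsort(result.importances_mean)  # ascending importance
least_two = [feature_names[i] for i in order[:2]]
top_two = [feature_names[i] for i in order[::-1][:2]]
print("Most important:", top_two)
print("Least important:", least_two)
```

The two lists map directly onto the bar plots described in the visualization step; `result.importances_mean` supplies the bar heights.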
Notebooks:
- `01_asd_vs_td_classification.ipynb`: Classifies ASD vs TD using EEG data
- `02_asd_vs_td_task_agnostic_model.ipynb`: Predicts ASD vs TD using task-agnostic EEG input
Install dependencies:
pip install -r requirements.txt
Future work:
- Add two neural network models:
  - NN Model 1: ASD vs TD classification using task-specific EEG features
  - NN Model 2: ASD vs TD classification using EEG data from any task (task-agnostic binary model)
- Expand hyperparameter optimization and model comparison
- Visualize time-frequency features and statistical comparisons