Deep-Spike: Foundation Model-based Pipeline for Large-Scale Spike Sorting of Neural Activity
Spike sorting of high-resolution neural recordings is essential for understanding brain activity, but it remains challenging when multiple units are recorded, because of overlapping spike timing, low signal-to-noise ratios, and overlapping clusters. Here, we introduce DeepSpike, a self-supervised deep learning model that automates spike sorting and overcomes key limitations of conventional methods. As a general foundation model, DeepSpike is pretrained on large-scale unlabelled spiking events obtained from electrophysiological data, enabling it to generalize to new recordings without dataset-specific retraining. It uses a self-supervised autoencoder to learn robust low-dimensional spike embeddings that facilitate accurate clustering and effective noise filtering. The model is trained on a new, large-scale dataset consisting of
- End-to-end spike sorting workflow
- Deep learning-based feature extraction (Autoencoder, VAE)
- Multiple clustering methods (GMM, DPGMM, HDBSCAN, KMeans)
- Integration with SpikeInterface for standardized spike sorting and evaluation
- Visualization tools for embeddings and clustering results
- Support for large public datasets
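
The clustering stage listed above operates on low-dimensional spike embeddings. As a rough illustration of that step only, the sketch below runs plain k-means on synthetic 2-D points that stand in for embeddings. It is not DeepSpike's implementation: the helper names `kmeans` and `make_blob`, the synthetic data, and the choice of two clusters are all assumptions made for this example, and it uses only the Python standard library so that it runs without the repository's requirements.

```python
import math
import random


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float tuples.

    Hypothetical helper for illustration; not part of the DeepSpike code base.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k distinct data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each embedding to its nearest center.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old center if a cluster emptied out
                centers[c] = tuple(sum(x) / len(members) for x in zip(*members))
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return labels, centers


def make_blob(rng, center, n, spread=0.3):
    """Draw n Gaussian points around `center` (synthetic stand-in for embeddings)."""
    return [tuple(rng.gauss(c, spread) for c in center) for _ in range(n)]


# Two well-separated synthetic "units" in a 2-D embedding space (assumed data).
rng = random.Random(42)
embeddings = make_blob(rng, (0.0, 0.0), 100) + make_blob(rng, (5.0, 5.0), 100)

labels, centers = kmeans(embeddings, k=2)
print("cluster sizes:", [labels.count(c) for c in range(2)])
```

K-means needs the number of units up front, which is one reason a pipeline might also offer mixture-model or density-based methods such as those named in the feature list. A library implementation would normally replace the hand-written loop in practice.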
- clustering.py: Clustering algorithms and utilities
- dataset.py: Dataset loading and preprocessing
- models.py: Deep learning models (VAE)
- preprocess.py: Data preprocessing functions
- utils.py: Utility functions
- models/: Pretrained model weights
- notebooks/: Example Jupyter notebooks and analysis pipelines
- tables/: SpikeVault255M and public dataset metrics and recording details
- Clone the repository:

      git clone https://github.com/HughYau/DeepSpike.git
      cd DeepSpike

- Install dependencies. Make sure you have Python 3.8+, then install the required packages:

      pip install -r requirements.txt

- Run the example notebooks. Open notebooks/deep_spike_guideline.ipynb for a step-by-step demonstration.
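
Before installing, it can help to confirm the two prerequisites the steps above rely on: a Python 3.8+ interpreter and a requirements.txt in the cloned directory. The following is a minimal sketch of such a check. The function name `check_environment` is hypothetical and the script is not shipped with the repository.

```python
import sys
from pathlib import Path

MIN_PYTHON = (3, 8)  # minimum version stated in the install step


def check_environment(repo_dir="."):
    """Return a list of human-readable problems; an empty list means none were found.

    Hypothetical helper for illustration; not part of the DeepSpike code base.
    """
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(v) for v in sys.version_info[:3])
        problems.append(f"Python 3.8+ is required, found {found}")
    if not (Path(repo_dir) / "requirements.txt").is_file():
        problems.append(f"requirements.txt not found in {Path(repo_dir).resolve()}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print("Problem:", issue)
    if not issues:
        print("Environment looks ready for: pip install -r requirements.txt")
```

Run it from inside the cloned DeepSpike directory, or pass the path to the clone as `repo_dir`.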