sergezaugg/feature_extraction_saec

Feature extraction with pre-trained spectrogram auto-encoders (fe_saec)

Overview

  • A Python package to extract encoder-based features from spectrograms
  • Extracts array features with pre-trained encoders and converts them to linear features (details in pic below)
  • Encoders perform partial pooling of the time axis (the latent array representation is 2D -> channel by time)
  • Extracted features are meant to be used in a companion project and its frontend
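
To make the step from a 2D latent array (channel by time) to a linear feature vector concrete, here is a minimal sketch. It is not the package's actual conversion, which is described in the picture referenced above. As an illustration only, it assumes the time axis is collapsed by concatenating the per-channel mean and standard deviation. The function name `pool_latent_over_time` and the array sizes are hypothetical.

```python
import numpy as np


def pool_latent_over_time(latent: np.ndarray) -> np.ndarray:
    """Collapse a (channels, time) latent array into one 1D feature vector.

    Illustrative assumption: concatenate per-channel mean and std over time.
    """
    if latent.ndim != 2:
        raise ValueError(f"expected a 2D (channels, time) array, got shape {latent.shape}")
    mean_per_channel = latent.mean(axis=1)  # one value per channel
    std_per_channel = latent.std(axis=1)    # one value per channel
    return np.concatenate([mean_per_channel, std_per_channel])


# Hypothetical latent array: 4 channels by 10 time steps
latent = np.arange(40, dtype=np.float32).reshape(4, 10)
features = pool_latent_over_time(latent)
print(features.shape)
```

With this kind of pooling the output length depends only on the number of channels, so spectrograms of different durations would yield vectors of the same size.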

Installation (usage in Python project)

  • Tested for Python 3.11 and 3.12
  • Make a fresh venv and install fe_saec from the Python package wheel found in this GitHub repo
  • pip install https://github.com/sergezaugg/feature_extraction_saec/releases/download/vx.x.x/fe_saec-x.x.x-py3-none-any.whl
  • torch and torchvision must be installed separately for the specific CUDA version
  • pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126 (e.g. for Windows with CUDA 12.6 and Python 3.12.8)
  • If another CUDA version is needed, check the official PyTorch instructions
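
After the steps above, a quick check of the environment can be useful. The sketch below uses only the standard library to report the Python version and the installed versions of the relevant packages. The helper name `report_environment` is hypothetical, and it assumes the distribution names are `fe_saec`, `torch`, and `torchvision`, as the wheel file name and pip commands suggest.

```python
import importlib.metadata
import sys


def report_environment(packages=("fe_saec", "torch", "torchvision")):
    """Return the Python version and each package's installed version (None if missing)."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in packages:
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            # Package is not installed in the active environment
            report[name] = None
    return report


for name, version in report_environment().items():
    print(f"{name}: {version or 'not installed'}")
```

Once torch is installed, `torch.cuda.is_available()` indicates whether the CUDA build can see a GPU.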

Usage

  • Prepare PNG-formatted color images of spectrograms, e.g. with this tool
  • sample_code.py illustrates a pipeline to extract features
  • Extracted features are written to disk as NPZ files in the parent directory of the images directory.
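
The array names stored inside the NPZ files are not documented here, so the following sketch makes no assumption about them. It lists whatever arrays each file contains, with their shapes and dtypes, using NumPy's `np.load`. The helper names `summarize_npz` and `summarize_all` are hypothetical and not part of fe_saec.

```python
from pathlib import Path

import numpy as np


def summarize_npz(npz_path):
    """Map each array name stored in an NPZ file to its (shape, dtype)."""
    summary = {}
    with np.load(npz_path) as archive:
        for key in archive.files:  # names of the arrays saved in the archive
            array = archive[key]
            summary[key] = (array.shape, str(array.dtype))
    return summary


def summarize_all(directory):
    """Summarize every NPZ file found directly inside `directory`."""
    return {p.name: summarize_npz(p) for p in sorted(Path(directory).glob("*.npz"))}
```

For example, `summarize_all("path/to/parent_of_images_dir")` (a placeholder path) would return one entry per NPZ file found in that directory.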

Project Structure

├── dev/                # Data, models, and dirs for code development
├── pics/               # Pictures for documentation
├── src/                # Source code (Python package)
├── tests/              # Tests for CI
├── pyproject.toml      # Build configuration
├── README.md           # Project documentation
├── requirements.txt    # Python dependencies
└── sample_code.py      # Example usage script

ML details

Example image

About

Python package for reproducible feature extraction from spectrograms with custom pre-trained encoders
