- A Python package to extract encoder-based features from spectrograms
- Extracts array features with pre-trained encoders and converts them to linear features (details in pic below)
- Encoders perform partial pooling of the time axis (the latent array representation is 2D: channel by time)
- Extracted features are meant to be used in the companion project and its frontend
- Tested for Python 3.11 and 3.12
- Make a fresh venv and install fe_saec from the Python package wheel found on this GitHub repo
pip install https://github.com/sergezaugg/feature_extraction_saec/releases/download/vx.x.x/fe_saec-x.x.x-py3-none-any.whl
- torch and torchvision must be installed separately for your specific CUDA version
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu126
(e.g. for Windows with CUDA 12.6 and Python 3.12.8)
- If another CUDA version is needed, check the official PyTorch instructions
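Once torch is installed, a quick check like the one below can confirm which build ended up in the venv and whether it sees a GPU. This is a minimal sketch and not part of fe_saec: the helper name `describe_torch_install` is hypothetical, and it relies only on `torch.__version__`, `torch.version.cuda` and `torch.cuda.is_available()`.

```python
import importlib.util


def describe_torch_install() -> str:
    """Return a one-line summary of the torch build in the current environment."""
    # Hypothetical helper for illustration; it is not provided by fe_saec.
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    # torch.version.cuda is None for CPU-only builds.
    cuda_build = torch.version.cuda
    return (
        f"torch {torch.__version__}, built for CUDA {cuda_build}, "
        f"GPU available: {torch.cuda.is_available()}"
    )


report = describe_torch_install()
print(report)
```

If the report shows a CPU-only build or no available GPU, reinstalling torch with the index URL that matches the local CUDA version is the usual next step.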
- Prepare PNG-formatted color images of spectrograms, e.g. with this tool
- sample_code.py illustrates a pipeline to extract features
- Extracted features are written to disk as NPZ files in the parent directory of the images directory.
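To make the idea of linear features and NPZ output more concrete, here is a rough sketch on synthetic data. It does not use fe_saec and makes several assumptions: mean pooling over the time axis stands in for whatever conversion the package actually applies, and the array key `features`, the file name `demo_features.npz` and both helper functions are hypothetical. To see what a real output file contains, list its keys via `archive.files` rather than relying on the names used here.

```python
import tempfile
from pathlib import Path

import numpy as np


def pool_time_axis(latent: np.ndarray) -> np.ndarray:
    """Average a (n_images, channels, time) array over time -> (n_images, channels)."""
    # Assumption: plain mean pooling; the package's own conversion may differ.
    if latent.ndim != 3:
        raise ValueError("expected shape (n_images, channels, time)")
    return latent.mean(axis=2)


def load_npz_arrays(npz_path) -> dict:
    """Load every array stored in an NPZ file into a plain dict."""
    with np.load(npz_path) as archive:
        return {name: archive[name] for name in archive.files}


# Synthetic stand-in for encoder output: 4 images, 8 channels, 16 time steps.
rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 8, 16))
features = pool_time_axis(latent)

# Round trip through a temporary NPZ file (hypothetical file and key names).
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "demo_features.npz"
    np.savez(demo_path, features=features)
    loaded = load_npz_arrays(demo_path)

# Print name, shape and dtype of each stored array.
for name, array in loaded.items():
    print(name, array.shape, array.dtype)
```

The same `load_npz_arrays` helper can be pointed at an NPZ file produced by the extraction pipeline to inspect which arrays it holds before using them downstream.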
├── dev/ # Data, models, and dirs for code development
├── pics/ # Pictures for documentation
├── src/ # Source code (Python package)
├── tests/ # Tests for CI
├── pyproject.toml # Build configuration
├── README.md # Project documentation
├── requirements.txt # Python dependencies
└── sample_code.py # Example usage script