A lightweight Swift-based feature extraction library that transforms raw audio chunks into log-Mel spectrograms, ready for CoreML and on-device inference.
Built with ❤️ for on-device audio intelligence.
You can add OtosakuFeatureExtractor as a Swift Package dependency:
```swift
.package(url: "https://github.com/Otosaku/OtosakuFeatureExtractor-iOS.git", from: "1.0.2")
```

Then add it to the target dependencies:

```swift
.target(
    name: "YourApp",
    dependencies: [
        .product(name: "OtosakuFeatureExtractor", package: "OtosakuFeatureExtractor")
    ]
)
```

The processing pipeline transforms each raw audio chunk as follows:

```
[Raw Audio Chunk (Float64)]
    ↓ pre-emphasis
[Pre-emphasized audio]
    ↓ STFT (with Hann window)
[STFT result (complex)]
    ↓ Power Spectrum
[|FFT|^2]
    ↓ Mel Filterbank Projection (matrix multiply)
[Mel energies]
    ↓ log(ε + x)
[Log-Mel Spectrogram]
    ↓ MLMultiArray
[CoreML-compatible tensor]
```
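For intuition, here is a minimal sketch of the pre-emphasis and log-Mel stages in plain Swift. It assumes a 0.97 pre-emphasis coefficient and an ε of 1e-6; the helper names are illustrative and not part of the library's API (the library itself uses Accelerate and pocketfft for these steps):

```swift
import Foundation

// Pre-emphasis: y[n] = x[n] - coeff * x[n-1].
// The 0.97 coefficient is a common default, assumed here for illustration.
func preEmphasis(_ x: [Double], coeff: Double = 0.97) -> [Double] {
    var y = x
    for n in stride(from: x.count - 1, through: 1, by: -1) {
        y[n] = x[n] - coeff * x[n - 1]
    }
    return y
}

// Project one power-spectrum frame (201 bins for a 400-point FFT) onto
// 80 Mel bands with the filterbank matrix, then apply log(eps + x).
func logMelFrame(power: [Double], filterbank: [[Double]], eps: Double = 1e-6) -> [Double] {
    filterbank.map { band in
        log(eps + zip(band, power).reduce(0) { $0 + $1.0 * $1.1 })
    }
}
```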
You must provide a directory containing:
- `filterbank.npy` — shape `[80, 201]`, float32 or float64
- `hann_window.npy` — shape `[400]`, float32 or float64
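A minimal sketch of locating these assets at runtime, assuming they ship at the root of the app bundle (the bundle layout is an assumption; `featureFolderURL` matches the init example below):

```swift
import Foundation

// Assumption: both .npy assets are copied into the app bundle root.
let featureFolderURL = Bundle.main.resourceURL!

// Fail fast if either required asset is missing from the directory.
for asset in ["filterbank.npy", "hann_window.npy"] {
    let url = featureFolderURL.appendingPathComponent(asset)
    precondition(FileManager.default.fileExists(atPath: url.path),
                 "Missing required asset: \(asset)")
}
```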
```swift
import OtosakuFeatureExtractor

let extractor = try OtosakuFeatureExtractor(directoryURL: featureFolderURL)
```

- 🎛 Feature Extractor Assets
Download the precomputed `filterbank.npy` and `hann_window.npy` files required by `OtosakuFeatureExtractor`.
➡️ OtosakuFeatureExtractor Assets (.zip)
💬 Want a model trained on custom keywords?
Drop me a message at otosaku.dsp@gmail.com — let’s talk!
The input must be a raw audio chunk as `Array<Double>`, typically at a 16 kHz sample rate.
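Capture APIs such as AVAudioEngine deliver Float32 samples, so a conversion step is usually needed first. A hedged sketch (the `doubles(from:)` helper is illustrative, not part of the library):

```swift
import AVFoundation

// Convert a mono Float32 AVAudioPCMBuffer (e.g. from an input tap)
// into the [Double] chunk the extractor expects.
func doubles(from buffer: AVAudioPCMBuffer) -> [Double] {
    guard let channel = buffer.floatChannelData?[0] else { return [] }
    return (0 ..< Int(buffer.frameLength)).map { Double(channel[$0]) }
}
```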
With a `[Double]` chunk in hand, extraction is a single call:

```swift
let logMel: MLMultiArray = try extractor.processChunk(chunk: audioChunk)
```

`audioChunk` should be at least 400 samples long to match the FFT window size.
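For longer recordings, a common pattern is to slice the signal into fixed-size chunks and extract features per chunk. A minimal sketch, where the one-second chunk size and the `samples` array are assumptions for illustration:

```swift
import CoreML

let samples: [Double] = [] // 16 kHz mono audio from your capture pipeline
let chunkSize = 16_000     // 1 second at 16 kHz; must be >= 400 samples

for start in stride(from: 0, to: samples.count - chunkSize + 1, by: chunkSize) {
    let chunk = Array(samples[start ..< start + chunkSize])
    let features = try extractor.processChunk(chunk: chunk) // one log-Mel tensor per chunk
    // Feed `features` into your CoreML model here.
}
```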
For debugging, extracted features can be dumped to JSON (a sketch of such a helper follows the dependency list):

```swift
saveLogMelToJSON(logMel: features)
```

Dependencies:

- Accelerate — for optimized DSP
- CoreML
- pocketfft
- plain-pocketfft
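The `saveLogMelToJSON` call above is a debug aid; here is a minimal sketch of what such a helper could look like (the signature, default path, and JSON layout are assumptions, not the repository's actual implementation):

```swift
import CoreML
import Foundation

// Flatten the MLMultiArray to [Double] and write it out as a JSON array.
func saveLogMelToJSON(logMel: MLMultiArray,
                      to url: URL = FileManager.default.temporaryDirectory
                          .appendingPathComponent("logmel.json")) {
    let values = (0 ..< logMel.count).map { logMel[$0].doubleValue }
    if let data = try? JSONSerialization.data(withJSONObject: values) {
        try? data.write(to: url)
    }
}
```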
```
OtosakuFeatureExtractor/
├── Sources/
│   └── OtosakuFeatureExtractor/
│       └── OtosakuFeatureExtractor.swift
├── filterbank.npy
└── hann_window.npy
```
Project by @otosaku-ai under the Otosaku brand.
MIT License