Miipher-2

Unofficial implementation of Miipher-2: High-quality speech enhancement via HuBERT + Parallel Adapter

Key Features • Demo • Quick Start • Model Zoo • Training • Evaluation • Citation

🚀 Key Features

Speech enhancement based on the Miipher-2 architecture
Lightweight Parallel Adapter design for efficient feature adaptation
Pre-trained models available on 🤗 Hugging Face
Comprehensive evaluation pipeline with multiple metrics

🎧 Demo

Compare a degraded input with the model's enhanced output:

🔊 Degraded Audio	✨ Enhanced Audio
Noisy input	Clean output

🛠️ Quick Start

Prerequisites

# Install dependencies using uv
uv sync

📁 Project Structure

open-miipher-2/
├── configs/          # Hydra configuration files
├── src/miipher_2/    # Core Python modules
├── cmd/              # CLI entry points
├── exp/              # Model checkpoints
└── docs/             # Documentation

🚀 Quick Inference

Use our pre-trained model for instant speech enhancement:

# Download pre-trained model from Hugging Face
# Model: miipher-2-HuBERT-HiFi-GAN-v0.1

# Run inference on your audio files
uv run cmd/inference_dir.py --config-name infer_dir

🤗 Model Zoo

Model	SSL Backbone	Adapter Layers	Vocoder	Download
miipher-2 HuBERT HiFi-GAN v0.1	mHuBERT-147	Layer 6	HiFi-GAN	🤗 HuggingFace

📚 Training

Step 1: Data Preprocessing

Generate pseudo-degraded dataset from clean speech:

# Process JVS corpus (Japanese)
uv run cmd/preprocess.py --config-name preprocess_jvs

# Process LibriTTS-R (English)
uv run cmd/preprocess.py --config-name preprocess_libritts_r

# Process FLEURS-R (Multilingual)
uv run cmd/preprocess.py --config-name preprocess_fleurs_r

Output is saved in WebDataset format for efficient data loading.
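
For orientation, a WebDataset shard is an ordinary tar archive in which files that share a basename form one sample. The sketch below uses only the standard library to build a tiny in-memory shard and group it back into samples. The keys and the `clean.wav` / `degraded.wav` extensions are illustrative assumptions, not necessarily the names that `cmd/preprocess.py` writes.

```python
import io
import tarfile


def write_shard(samples):
    """Pack {key: {ext: bytes}} into an in-memory tar shard, sample by sample."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        for key, files in samples.items():
            for ext, payload in files.items():
                info = tarfile.TarInfo(name=f"{key}.{ext}")
                info.size = len(payload)
                tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def read_shard(shard_bytes):
    """Group tar members into samples: the key is the name up to the first dot."""
    samples = {}
    with tarfile.open(fileobj=io.BytesIO(shard_bytes), mode="r") as tar:
        for member in tar:
            if not member.isfile():
                continue
            key, ext = member.name.split(".", 1)
            samples.setdefault(key, {})[ext] = tar.extractfile(member).read()
    return samples


# Hypothetical sample layout: one clean/degraded pair per key.
toy = {
    "utt0001": {"clean.wav": b"\x00\x01", "degraded.wav": b"\x02\x03"},
    "utt0002": {"clean.wav": b"\x04\x05", "degraded.wav": b"\x06\x07"},
}
shard = write_shard(toy)
restored = read_shard(shard)
```

Because each sample's files sit next to each other in the archive, a loader can stream a shard sequentially without random access, which is where the efficiency comes from.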

Step 2: Train Parallel Adapter

# Train adapter module
uv run cmd/train_adapter.py --config-name adapter_layer_6_mhubert_147

# Resume from checkpoint
uv run cmd/train_adapter.py \
    checkpoint.resume_from="exp/adapter_layer_6_mhubert_147/checkpoint_199k.pt" \
    --config-name adapter_layer_6_mhubert_147
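
When resuming by hand, it helps to pick the newest checkpoint by step count rather than by filename order. The helper below is a hypothetical sketch: it assumes every checkpoint follows the `checkpoint_<N>k.pt` pattern seen in the command above, which the repository may not guarantee.

```python
import re
from pathlib import Path

# Assumed naming pattern, inferred from "checkpoint_199k.pt" above.
CHECKPOINT_RE = re.compile(r"checkpoint_(\d+)k\.pt$")


def latest_checkpoint(names):
    """Return the checkpoint name with the highest step count, or None."""
    best_name, best_step = None, -1
    for name in names:
        match = CHECKPOINT_RE.search(Path(name).name)
        if match and int(match.group(1)) > best_step:
            best_name, best_step = name, int(match.group(1))
    return best_name


def resume_override(names):
    """Build the key=value override string for the newest checkpoint."""
    path = latest_checkpoint(names)
    return None if path is None else f'checkpoint.resume_from="{path}"'


candidates = [
    "exp/adapter_layer_6_mhubert_147/checkpoint_50k.pt",
    "exp/adapter_layer_6_mhubert_147/checkpoint_199k.pt",
    "exp/adapter_layer_6_mhubert_147/checkpoint_100k.pt",
    "exp/adapter_layer_6_mhubert_147/notes.txt",
]
print(resume_override(candidates))
```

Parsing the step as an integer matters here: a plain string sort would rank `checkpoint_50k.pt` after `checkpoint_199k.pt`.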

Step 3: Train SSL-Vocoder

# Pre-train Lightning SSL-Vocoder
uv run cmd/pre_train_vocoder.py --config-name hifigan_pretrain_layer_6_mhubert_147

💡 Note: Configuration is inherited from the checkpoint automatically unless explicitly overridden.

📊 Evaluation

Step 1: Generate Degraded Test Data

Create evaluation dataset with various noise conditions:

uv run cmd/degrade.py \
    --clean_dir <path_to_clean_audio> \
    --noise_dir <path_to_noise_samples> \
    --out_dir <output_directory>
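
The exact degradations applied by `cmd/degrade.py` are not described here. As a minimal illustration of one common ingredient, the sketch below adds noise to clean speech at a chosen signal-to-noise ratio. The function names and toy signals are assumptions made for this example, not part of the repository.

```python
import math


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled so the mixture has the requested SNR in dB."""
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_power = power(noise)
    if noise_power == 0.0:
        raise ValueError("noise is silent; SNR is undefined")
    # Solve power(clean) / (gain**2 * power(noise)) == 10 ** (snr_db / 10) for gain.
    gain = math.sqrt(power(clean) / (noise_power * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, mixture):
    """SNR of a mixture, given the clean reference it was built from."""
    residual = [m - c for c, m in zip(clean, mixture)]
    return 10 * math.log10(power(clean) / power(residual))


# Toy signals: a 440 Hz tone as "speech" and a deterministic pseudo-noise sequence.
sr = 16000
clean = [math.sin(2 * math.pi * 440 * t / sr) for t in range(sr)]
noise = [math.sin(12.9898 * t) * math.cos(78.233 * t) for t in range(sr)]
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```

Sweeping `snr_db` over a range of values is one simple way to obtain the "various noise conditions" mentioned above.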

Step 2: Run Enhancement

Process degraded audio through the model:

uv run cmd/inference_dir.py --config-name infer_dir

Step 3: Compute Metrics

Evaluate enhancement quality with multiple metrics:

uv run cmd/evaluate.py \
    --clean_dir <clean_audio_dir> \
    --degraded_dir <degraded_audio_dir> \
    --restored_dir <enhanced_audio_dir> \
    --outfile results.csv

Metrics include:

PESQ (Perceptual Evaluation of Speech Quality)
STOI (Short-Time Objective Intelligibility)
SI-SDR (Scale-Invariant Signal-to-Distortion Ratio)
MOS-LQO (Mean Opinion Score, Listening Quality Objective)
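
Of these, SI-SDR is simple enough to write out directly. The following is a dependency-free sketch of the usual definition (project the estimate onto the reference, then compare the energy of the projection with the energy of the remainder). It is not the code used by `cmd/evaluate.py`, and it omits the mean removal that many implementations apply first.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB between a reference and an estimate."""
    if len(reference) != len(estimate):
        raise ValueError("signals must have the same length")
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    # Project the estimate onto the reference: this is the "target" component.
    alpha = dot / ref_energy
    target = [alpha * r for r in reference]
    distortion = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    if distortion_energy == 0.0:
        return math.inf  # the estimate is an exact scaled copy of the reference
    return 10 * math.log10(target_energy / distortion_energy)


reference = [1.0, 0.0, 1.0, 0.0]
estimate = [1.0, 0.1, 1.0, 0.1]  # reference plus a small orthogonal error
louder = [3.0 * x for x in estimate]

score = si_sdr(reference, estimate)
score_louder = si_sdr(reference, louder)
```

Rescaling the estimate leaves the score unchanged, so a model is neither rewarded nor penalised for output gain.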

🏗️ Architecture

Key Components

HuBERT Feature Extractor: Multilingual HuBERT (mHuBERT-147) for robust speech representations
Parallel Adapter: Lightweight feed-forward network inserted at specific layers
Feature Cleaner: Denoising module operating on SSL features
Lightning SSL-Vocoder: HiFi-GAN-based vocoder
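
To make the Parallel Adapter idea concrete, here is a toy sketch in plain Python (no PyTorch, so it runs anywhere). It assumes the common bottleneck form: a small down-projection, a ReLU, and an up-projection whose output is added to that of a frozen layer. The repository's actual adapter may differ in where it attaches, its activation, its normalisation, and its initialisation. `frozen_layer` is a hypothetical stand-in for a real encoder layer.

```python
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, x) for x in vector]


class ParallelAdapter:
    """Bottleneck feed-forward branch: up(relu(down(h)))."""

    def __init__(self, dim, bottleneck, rng):
        self.down = [[rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(bottleneck)]
        # Assumption: zero-initialised up-projection, so the adapter
        # contributes nothing before training.
        self.up = [[0.0] * bottleneck for _ in range(dim)]

    def __call__(self, hidden):
        return matvec(self.up, relu(matvec(self.down, hidden)))


def frozen_layer(hidden):
    """Hypothetical stand-in for one frozen SSL encoder layer."""
    return [2.0 * x for x in hidden]


def adapted_layer(hidden, adapter):
    """Run the frozen layer and the adapter on the same input and sum their outputs."""
    return [f + a for f, a in zip(frozen_layer(hidden), adapter(hidden))]


adapter = ParallelAdapter(dim=4, bottleneck=2, rng=random.Random(0))
hidden = [1.0, -2.0, 0.5, 3.0]
output = adapted_layer(hidden, adapter)
```

With a bottleneck much smaller than the hidden size, the adapter adds only `2 * dim * bottleneck` weights per layer, which is what makes this style of adaptation lightweight.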

🔧 Configuration

All configurations are managed through Hydra. Key config files:

configs/adapter_layer_6_mhubert_147.yaml - Adapter training
configs/infer_dir.yaml - Inference settings
configs/preprocess_*.yaml - Data preprocessing
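
Command-line overrides such as `checkpoint.resume_from="..."` address nested config values by dotted path. The sketch below mimics that mapping on a plain dictionary to show the idea. It is a simplification and does not use Hydra: Hydra's own override grammar also handles types, lists, and adding or removing keys. The base config shown is a hypothetical fragment.

```python
import copy


def apply_override(config, override):
    """Return a copy of a nested dict with one 'a.b.c=value' override applied."""
    dotted_key, sep, raw_value = override.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {override!r}")
    result = copy.deepcopy(config)
    node = result
    *parents, leaf = dotted_key.split(".")
    for name in parents:
        node = node.setdefault(name, {})
    # Values stay strings here; Hydra itself also parses types, lists, and more.
    node[leaf] = raw_value.strip('"')
    return result


# Hypothetical fragment of an adapter training config.
base = {"checkpoint": {"resume_from": None}}
updated = apply_override(
    base,
    'checkpoint.resume_from="exp/adapter_layer_6_mhubert_147/checkpoint_199k.pt"',
)
```

The base config is left untouched, which mirrors how an override applies to a single run rather than to the YAML file on disk.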
