SurgeSynthesis: PyTorch VAE for Synthesizer Parameter Prediction

This project converts the original TensorFlow-based automatic synthesizer programming system to PyTorch, providing a modular and modern implementation for training variational autoencoders (VAEs) to predict synthesizer parameters from audio spectrograms.

Features

PyTorch Implementation: Modern PyTorch-based VAE models for synthesizer parameter prediction
Multi-Synthesizer Support: Support for Tyrell N6, Serum, and Diva synthesizers
WandB Integration: Comprehensive experiment tracking and visualization
Modular Architecture: Clean, maintainable code structure
Advanced Training Features: Early stopping, learning rate scheduling, model checkpointing
KL Divergence Warmup: Gradual introduction of KL loss for stable training

Project Structure

SurgeSynthesis/
├── models.py              # VAE model definitions
├── dataset.py             # Dataset and data loading utilities
├── utils.py               # Training utilities and loss functions
├── train/                 # Training scripts
│   ├── train_tyrell.py    # Training script for Tyrell N6
│   ├── train_serum.py     # Training script for Serum
│   ├── train_diva.py      # Training script for Diva
│   └── train_multi.py     # Training script for the multi-synthesizer model
├── run_inference.py       # Inference script for trained models
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── saved_models/          # Directory for saved model checkpoints
    ├── vae_tyrell/
    ├── vae_serum/
    └── vae_diva/

Requirements

System Requirements

Python 3.8 or higher
CUDA-capable GPU (recommended for training)
8GB+ RAM

Python Dependencies

Install the required packages using:

pip install -r requirements.txt

Main dependencies:

PyTorch (>= 2.0.0)
NumPy
Weights & Biases (wandb)
librosa
tqdm
matplotlib

Data Preparation

The training scripts expect data in NumPy format with the following structure:

Single Synthesizer Data

npy_data/
├── train_mels.npy              # Training spectrograms
├── train_tyrell_params.npy     # Training Tyrell parameters
├── train_tyrell_mask.npy       # Training Tyrell masks (optional)
├── valid_mels.npy              # Validation spectrograms
├── valid_tyrell_params.npy     # Validation Tyrell parameters
├── valid_tyrell_mask.npy       # Validation Tyrell masks (optional)
├── test_mels.npy               # Test spectrograms
├── test_tyrell_params.npy      # Test Tyrell parameters
└── test_tyrell_mask.npy        # Test Tyrell masks (optional)

Multi-Synthesizer Data (Optional)

If you have mixed synthesizer data, also include:

npy_data/
├── ... (all files above)
├── train_synth.npy             # Synthesizer labels for training
├── valid_synth.npy             # Synthesizer labels for validation
├── test_synth.npy              # Synthesizer labels for testing
├── train_serum_params.npy      # Training Serum parameters
├── train_serum_mask.npy        # Training Serum masks (optional)
├── train_diva_params.npy       # Training Diva parameters
├── train_diva_mask.npy         # Training Diva masks (optional)
├── valid_serum_params.npy      # Validation Serum parameters
├── valid_serum_mask.npy        # Validation Serum masks (optional)
├── valid_diva_params.npy       # Validation Diva parameters
├── valid_diva_mask.npy         # Validation Diva masks (optional)
├── test_serum_params.npy       # Test Serum parameters
├── test_serum_mask.npy         # Test Serum masks (optional)
├── test_diva_params.npy        # Test Diva parameters
└── test_diva_mask.npy          # Test Diva masks (optional)

Data Format Requirements

Spectrograms: Shape (N, H, W) where N=samples, H=128, W=431
Parameters: Shape (N, P) where P varies by synthesizer (Tyrell: 18005, Serum: 23325, Diva: 16237)
Masks: Shape (N, P) - binary masks for parameter validity (optional)
Synth Labels: Shape (N,) - strings indicating synthesizer type ('tyrell', 'serum', 'diva')

Data Validation

Before training, validate your data format:

python validate_data.py /path/to/npy_data tyrell

This script checks:

File existence and structure
Array shapes and data types
Data consistency between spectrograms and parameters
Value ranges and formats
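
The shape and consistency checks listed above can be approximated in a few lines of Python. The sketch below is not the project's `validate_data.py`. It is a hypothetical helper, `check_shapes`, that works on shape tuples and assumes the sizes quoted under Data Format Requirements (128 x 431 spectrograms and the per-synthesizer parameter widths). If your export uses different parameter widths, adjust the table in the code.

```python
from typing import List, Optional, Tuple

# Sizes quoted in "Data Format Requirements"; treat them as assumptions about your export.
EXPECTED_PARAM_SIZE = {"tyrell": 18005, "serum": 23325, "diva": 16237}
EXPECTED_SPEC_SHAPE = (128, 431)  # (H, W)


def check_shapes(
    synth: str,
    mel_shape: Tuple[int, ...],
    param_shape: Tuple[int, ...],
    mask_shape: Optional[Tuple[int, ...]] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the shapes agree."""
    if synth not in EXPECTED_PARAM_SIZE:
        return [f"unknown synthesizer: {synth!r}"]

    problems = []
    if len(mel_shape) != 3 or tuple(mel_shape[1:]) != EXPECTED_SPEC_SHAPE:
        problems.append(f"spectrograms should be (N, 128, 431), got {mel_shape}")
    if len(param_shape) != 2 or param_shape[1] != EXPECTED_PARAM_SIZE[synth]:
        expected = EXPECTED_PARAM_SIZE[synth]
        problems.append(f"parameters should be (N, {expected}), got {param_shape}")
    # Spectrograms and parameters must describe the same number of samples.
    if mel_shape and param_shape and mel_shape[0] != param_shape[0]:
        problems.append(f"sample count mismatch: {mel_shape[0]} vs {param_shape[0]}")
    # Masks are optional, but when present they must match the parameter array.
    if mask_shape is not None and tuple(mask_shape) != tuple(param_shape):
        problems.append(f"mask shape {mask_shape} differs from parameters {param_shape}")
    return problems
```

Shapes can be read without loading whole arrays into memory, for example with `np.load(path, mmap_mode="r").shape`.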

Training

WandB Setup (Required)

Before training, set up Weights & Biases for experiment tracking:

wandb login

Basic Training Commands

Train Tyrell N6 Model

python train/train_tyrell.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4 --wandb

Train Serum Model

# Run from the SurgeSynthesis directory:
python train/train_serum.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4

# Or with GPU specification:
CUDA_VISIBLE_DEVICES=3 python train/train_serum.py --data-dir your/npy_data --epochs 500 --batch-size 128 --learning-rate 5e-5

Train Diva Model

python train/train_diva.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4

Train Multi-Synthesizer Model

python train/train_multi.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4 --wandb

This script trains a single model that can handle all three synthesizers simultaneously.

Advanced Training Options

All training scripts support the following arguments:

| Argument | Description | Default |
| --- | --- | --- |
| `--data-dir`, `-d` | Directory containing training data | `npy_data` |
| `--synth-type`, `-s` | Synthesizer type (tyrell/serum/diva) | Script-specific |
| `--latent-size`, `-l` | Latent dimension size | `64` |
| `--batch-size`, `-b` | Batch size for training | `32` |
| `--learning-rate`, `-lr` | Learning rate | `1e-4` |
| `--epochs`, `-e` | Number of epochs | `500` |
| `--warmup-epochs`, `-w` | KL loss warmup epochs | `100` |
| `--save-freq` | Save model every N epochs | `100` |
| `--early-stopping-patience` | Early stopping patience | `20` |
| `--num-workers` | Data loading workers | `4` |
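
For readers who want to see how such a command line fits together, here is a minimal `argparse` sketch that mirrors the flags and defaults listed above. It is an illustration only: `build_parser` is a hypothetical name, the help strings are paraphrased, and the project's own scripts may define these arguments differently (they also accept `--wandb`, which is not listed in the table and is left out here).

```python
import argparse


def build_parser(default_synth: str = "tyrell") -> argparse.ArgumentParser:
    """Hypothetical parser reproducing the documented flags and defaults."""
    p = argparse.ArgumentParser(description="Train a VAE for synthesizer parameter prediction")
    p.add_argument("--data-dir", "-d", default="npy_data", help="directory containing training data")
    p.add_argument("--synth-type", "-s", choices=["tyrell", "serum", "diva"],
                   default=default_synth, help="synthesizer type")
    p.add_argument("--latent-size", "-l", type=int, default=64, help="latent dimension size")
    p.add_argument("--batch-size", "-b", type=int, default=32, help="batch size for training")
    p.add_argument("--learning-rate", "-lr", type=float, default=1e-4, help="learning rate")
    p.add_argument("--epochs", "-e", type=int, default=500, help="number of epochs")
    p.add_argument("--warmup-epochs", "-w", type=int, default=100, help="KL loss warmup epochs")
    p.add_argument("--save-freq", type=int, default=100, help="save model every N epochs")
    p.add_argument("--early-stopping-patience", type=int, default=20, help="early stopping patience")
    p.add_argument("--num-workers", type=int, default=4, help="data loading workers")
    return p


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--latent-size", "128", "--batch-size", "64"])
```

Unspecified flags fall back to the defaults from the table, so `args.epochs` is 500 and `args.learning_rate` is 1e-4 in this example.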

Example with Custom Parameters

python train/train_tyrell.py \
    --data-dir ./my_data \
    --latent-size 128 \
    --batch-size 64 \
    --learning-rate 5e-5 \
    --epochs 1000 \
    --warmup-epochs 200 \
    --save-freq 50

Important Notes

Working Directory: All training commands should be run from the SurgeSynthesis/ directory (where this README is located)
GPU Selection: Use CUDA_VISIBLE_DEVICES=N to specify which GPU to use (where N is the GPU number)
Module Imports: The training scripts use absolute imports and expect to find the SurgeSynthesis module

If you encounter import errors, make sure you're in the correct directory:

cd /path/to/SurgeSynthesis
python train/train_serum.py [arguments...]

Inference

Run inference on trained models using the inference script:

python run_inference.py \
    --data-dir your/npy_data \
    --model-path ../model_backup/saved_models/vae_serum/best_model.pth \
    --synth-type serum \
    --output-dir inference_results

Inference Arguments

| Argument | Description | Required |
| --- | --- | --- |
| `--data-dir`, `-d` | Directory containing test data | Yes |
| `--model-path`, `-m` | Path to trained model checkpoint | Yes |
| `--synth-type`, `-s` | Synthesizer type (tyrell/serum/diva) | Yes |
| `--output-dir`, `-o` | Directory to save results | No (default: `inference_results`) |
| `--latent-size`, `-l` | Latent dimension size | No (default: `64`) |
| `--batch-size`, `-b` | Batch size for inference | No (default: `32`) |

Model Architecture

The PyTorch implementation provides several VAE model variants. In all of them, parameter sizes are determined from the training data rather than hardcoded.

Key Features

🎯 Data-Driven Architecture: Parameter dimensions are automatically determined from your training data, not hardcoded
🔄 KL Warmup: Gradual introduction of KL divergence for stable training
🎛️ Multi-Synthesizer Support: Single model handles multiple synthesizers with proper masking
⚡ Dynamic Processing: Synthesizer-specific neural network weights and adaptive filters
📊 WandB Integration: Complete experiment tracking and visualization
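
KL warmup is commonly implemented as a weight on the KL term that grows from 0 to 1 over the warmup epochs. The sketch below assumes a linear ramp. The schedule this project actually uses is not documented here, and `kl_weight` and `total_loss` are hypothetical names chosen for illustration.

```python
def kl_weight(epoch: int, warmup_epochs: int = 100, max_weight: float = 1.0) -> float:
    """Linear ramp from 0 at epoch 0 to max_weight at warmup_epochs, constant afterwards."""
    if warmup_epochs <= 0:
        return max_weight  # no warmup requested
    return max_weight * min(1.0, max(0, epoch) / warmup_epochs)


def total_loss(recon_loss: float, kl_loss: float, epoch: int, warmup_epochs: int = 100) -> float:
    """Reconstruction loss plus the KL term scaled by the current warmup weight."""
    return recon_loss + kl_weight(epoch, warmup_epochs) * kl_loss
```

With the default of 100 warmup epochs, the KL term contributes nothing at epoch 0, half its value at epoch 50, and its full value from epoch 100 onward.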

Available Models

1. Single Synthesizer VAE (`BaseVAE`, `VAESerum`, `VAEDiva`, `VAETyrell`)

Purpose: Dedicated models for individual synthesizers
Parameter Size: Automatically detected from training data (e.g., Tyrell: ~18K params, Serum: ~23K params)
Features: KL warmup, mask processing, synthesizer-specific architecture

2. Multi-Synthesizer VAE (`MultiSynthVAE`)

Purpose: Single model handling all three synthesizers simultaneously
Parameter Sizes: Each synthesizer decoder sized based on actual data dimensions
Features: Separate decoders for each synthesizer, proper mask handling, shared latent representation
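
One plausible reading of "mask handling" is that the parameter loss is averaged only over entries the mask marks as valid. The NumPy sketch below illustrates that idea under two assumptions that the README does not confirm: a mask value of 1 means "valid", and the loss is a mean squared error. `masked_mse` is a hypothetical helper, not the project's loss function.

```python
import numpy as np


def masked_mse(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean squared error over entries where mask == 1 (assumed to mean 'valid')."""
    mask = mask.astype(np.float64)
    squared_error = (pred - target) ** 2 * mask  # invalid entries contribute 0
    valid = mask.sum()
    # Guard against an all-zero mask to avoid dividing by zero.
    return float(squared_error.sum() / valid) if valid > 0 else 0.0


pred = np.array([[0.0, 1.0, 5.0]])
target = np.array([[0.0, 0.0, 0.0]])
mask = np.array([[1, 1, 0]])  # the third parameter is ignored
loss = masked_mse(pred, target, mask)
```

Here the large error on the third parameter does not affect the result, because the mask excludes it and the average is taken over the two valid entries only.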

3. Dynamic VAE (`DynamicVAE`)

Purpose: Adaptive processing with synthesizer-specific neural network weights
Features: Dynamic weight generation, synthesizer-specific processing layers

4. Dynamic MLP VAE (`DynamicMLPVAE`)

Purpose: Enhanced dynamic processing with multiple adaptation layers
Features: Synthesizer-specific MLP layers, adaptive feature processing

Parameter Size Detection

The models automatically determine parameter sizes from your training data:

import numpy as np

# Training data determines model architecture
train_params = np.load("train_tyrell_params.npy")  # Shape: [N, 18005]
param_size = train_params.shape[-1]  # Automatically detected: 18005

# Model is created with correct dimensions
# (create_vae_model is the project's model factory, assumed to live in its models module)
model = create_vae_model('tyrell', param_size=param_size)

This ensures the PyTorch models exactly match your data format, regardless of parameter count variations.

Monitoring Training

Training progress is automatically logged to Weights & Biases, including:

Training and validation losses
Individual loss components (spectrogram, parameters, KL divergence)
Learning rate changes
Model checkpoints

Open your WandB dashboard to monitor training in real time.

Model Checkpoints

Models are automatically saved in the following locations:

saved_models/vae_{synth_type}/best_model.pth: Best validation loss model
saved_models/vae_{synth_type}/final_model.pth: Final epoch model
saved_models/vae_{synth_type}/checkpoint_epoch_{N}.pth: Periodic checkpoints
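
The naming scheme above is easy to reproduce when locating checkpoints from your own scripts. This is a small sketch with a hypothetical helper, `checkpoint_paths`. It assumes the default `saved_models` root shown above and only builds paths; it does not read or write any files.

```python
from pathlib import Path


def checkpoint_paths(synth_type: str, epoch: int, root: str = "saved_models") -> dict:
    """Build the three documented checkpoint paths for one synthesizer."""
    model_dir = Path(root) / f"vae_{synth_type}"
    return {
        "best": model_dir / "best_model.pth",
        "final": model_dir / "final_model.pth",
        "periodic": model_dir / f"checkpoint_epoch_{epoch}.pth",
    }


paths = checkpoint_paths("serum", epoch=100)
```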

Each checkpoint contains:

Model state dictionary
Optimizer state dictionary
Training epoch
Loss value

Performance Tips

GPU Training

Use a CUDA-capable GPU for faster training
Adjust the batch size to fit the available GPU memory
Keep --num-workers above 0 for faster data loading

Memory Optimization

Reduce batch size if encountering OOM errors
Use gradient accumulation for effective larger batch sizes
Monitor GPU memory usage during training
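
The arithmetic behind gradient accumulation can be shown without any framework. The plain-Python sketch below uses a toy one-parameter model, which is an assumption made purely for illustration and not part of this project. It shows that averaging the gradients of equal-sized micro-batches reproduces the gradient of the full batch when the loss is a mean over samples.

```python
def mse_grad(w: float, xs: list, ys: list) -> float:
    """Gradient of mean((w * x - y) ** 2) with respect to w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


w = 0.5
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Gradient computed on the full batch of 4 samples.
full_grad = mse_grad(w, xs, ys)

# Two micro-batches of 2 samples each; every micro-batch gradient is
# scaled by 1 / accum_steps before being added to the running total.
accum_steps = 2
micro_size = len(xs) // accum_steps
accum_grad = 0.0
for i in range(accum_steps):
    start, stop = i * micro_size, (i + 1) * micro_size
    accum_grad += mse_grad(w, xs[start:stop], ys[start:stop]) / accum_steps
```

In PyTorch the same idea is usually expressed by dividing the loss by the number of accumulation steps before `loss.backward()` and calling `optimizer.step()` and `optimizer.zero_grad()` once every that many batches. The equivalence is exact only for equal-sized micro-batches and does not extend to layers that depend on batch statistics, such as batch normalization.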

Training Stability

Start with default hyperparameters
Use learning rate scheduling for better convergence
Enable early stopping to prevent overfitting
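
Early stopping with a patience value, as controlled by `--early-stopping-patience`, can be sketched as a small counter around the validation loss. The class below is a generic illustration rather than the project's implementation. `EarlyStopping` and its `min_delta` parameter are hypothetical names.

```python
class EarlyStopping:
    """Stop when validation loss has not improved for `patience` consecutive epochs."""

    def __init__(self, patience: int = 20, min_delta: float = 0.0) -> None:
        self.patience = patience
        self.min_delta = min_delta  # smallest decrease that counts as an improvement
        self.best_loss = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True if training should stop."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=2)
stopped_at = None
for epoch, val_loss in enumerate([1.0, 0.8, 0.9, 0.85, 0.7]):
    if stopper.step(val_loss):
        stopped_at = epoch  # two epochs in a row without improvement
        break
```

In this toy run the loop stops at epoch index 3, so the later improvement to 0.7 is never seen. That trade-off is why a larger patience, such as the default of 20, is common for noisy validation curves.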

Troubleshooting

Common Issues

CUDA Out of Memory
- Reduce batch size
- Use CPU training (CUDA_VISIBLE_DEVICES="")

Data Loading Errors
- Verify data file paths and formats
- Check NumPy file compatibility
- Ensure sufficient disk space

WandB Issues
- Run wandb login before training
- Check internet connection
- Set WANDB_MODE=offline for offline logging

Model Loading Errors
- Ensure model architecture matches checkpoint
- Check file paths and permissions
- Verify PyTorch version compatibility

Getting Help

For issues specific to this implementation, check:

Error messages and stack traces
WandB logs for training anomalies
Data format and preprocessing steps

Differences from Original TensorFlow Implementation

Framework: PyTorch instead of TensorFlow
Structure: Modular design with separate files for models, data, and utilities
Logging: WandB integration instead of basic logging
Features: Enhanced with early stopping, learning rate scheduling, and better checkpointing
Code Style: More Pythonic and maintainable code structure

License

This project maintains compatibility with the original implementation while providing a modern PyTorch-based approach to automatic synthesizer programming.
