
SurgeSynthesis: PyTorch VAE for Synthesizer Parameter Prediction

This project converts the original TensorFlow-based automatic synthesizer programming system to PyTorch, providing a modular and modern implementation for training variational autoencoders (VAEs) to predict synthesizer parameters from audio spectrograms.

Features

  • PyTorch Implementation: Modern PyTorch-based VAE models for synthesizer parameter prediction
  • Multi-Synthesizer Support: Support for Tyrell N6, Serum, and Diva synthesizers
  • WandB Integration: Comprehensive experiment tracking and visualization
  • Modular Architecture: Clean, maintainable code structure
  • Advanced Training Features: Early stopping, learning rate scheduling, model checkpointing
  • KL Divergence Warmup: Gradual introduction of KL loss for stable training

Project Structure

SurgeSynthesis/
├── models.py              # VAE model definitions
├── dataset.py             # Dataset and data loading utilities
├── utils.py               # Training utilities and loss functions
├── train_tyrell.py        # Training script for Tyrell N6
├── train_serum.py         # Training script for Serum
├── train_diva.py          # Training script for Diva
├── run_inference.py       # Inference script for trained models
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── saved_models/          # Directory for saved model checkpoints
    ├── vae_tyrell/
    ├── vae_serum/
    └── vae_diva/

Requirements

System Requirements

  • Python 3.8 or higher
  • CUDA-capable GPU (recommended for training)
  • 8GB+ RAM

Python Dependencies

Install the required packages using:

pip install -r requirements.txt

Main dependencies:

  • PyTorch (>= 2.0.0)
  • NumPy
  • Weights & Biases (wandb)
  • librosa
  • tqdm
  • matplotlib

Data Preparation

The training scripts expect data in NumPy format with the following structure:

Single Synthesizer Data

npy_data/
├── train_mels.npy              # Training spectrograms
├── train_tyrell_params.npy     # Training Tyrell parameters
├── train_tyrell_mask.npy       # Training Tyrell masks (optional)
├── valid_mels.npy              # Validation spectrograms
├── valid_tyrell_params.npy     # Validation Tyrell parameters
├── valid_tyrell_mask.npy       # Validation Tyrell masks (optional)
├── test_mels.npy               # Test spectrograms
├── test_tyrell_params.npy      # Test Tyrell parameters
└── test_tyrell_mask.npy        # Test Tyrell masks (optional)

Multi-Synthesizer Data (Optional)

If you have mixed synthesizer data, also include:

npy_data/
├── ... (all files above)
├── train_synth.npy             # Synthesizer labels for training
├── valid_synth.npy             # Synthesizer labels for validation
├── test_synth.npy              # Synthesizer labels for testing
├── train_serum_params.npy      # Training Serum parameters
├── train_serum_mask.npy        # Training Serum masks (optional)
├── train_diva_params.npy       # Training Diva parameters
├── train_diva_mask.npy         # Training Diva masks (optional)
├── valid_serum_params.npy      # Validation Serum parameters
├── valid_serum_mask.npy        # Validation Serum masks (optional)
├── valid_diva_params.npy       # Validation Diva parameters
├── valid_diva_mask.npy         # Validation Diva masks (optional)
├── test_serum_params.npy       # Test Serum parameters
├── test_serum_mask.npy         # Test Serum masks (optional)
├── test_diva_params.npy        # Test Diva parameters
└── test_diva_mask.npy          # Test Diva masks (optional)

Data Format Requirements

  • Spectrograms: Shape (N, H, W) where N=samples, H=128, W=431
  • Parameters: Shape (N, P) where P varies by synthesizer (Tyrell: 18005, Serum: 23325, Diva: 16237)
  • Masks: Shape (N, P) - binary masks for parameter validity (optional; a masking sketch follows this list)
  • Synth Labels: Shape (N,) - strings indicating synthesizer type ('tyrell', 'serum', 'diva')
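
The mask arrays exist because not every parameter slot is meaningful for every preset. As an illustration of how such a binary mask is typically applied to a parameter reconstruction loss (a minimal sketch of the idea, not this repository's exact loss code):

import torch

def masked_param_loss(pred_params, true_params, mask):
    """MSE over valid parameters only; all tensors have shape (batch, P)."""
    sq_err = (pred_params - true_params) ** 2
    masked = sq_err * mask                          # zero out invalid parameter slots
    return masked.sum() / mask.sum().clamp(min=1)   # average over valid slots only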

Data Validation

Before training, validate your data format:

python validate_data.py /path/to/npy_data tyrell

This script checks (a rough manual equivalent is sketched after this list):

  • File existence and structure
  • Array shapes and data types
  • Data consistency between spectrograms and parameters
  • Value ranges and formats
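
If you prefer to sanity-check the arrays by hand, the core of these checks amounts to loading the files and comparing shapes. A rough manual equivalent, using the Tyrell file names from the layout above (this is not the validate_data.py source):

import numpy as np
from pathlib import Path

data_dir = Path("npy_data")
mels = np.load(data_dir / "train_mels.npy")
params = np.load(data_dir / "train_tyrell_params.npy")

assert mels.ndim == 3 and mels.shape[1:] == (128, 431), mels.shape
assert params.ndim == 2 and params.shape[0] == mels.shape[0], params.shape

mask_path = data_dir / "train_tyrell_mask.npy"
if mask_path.exists():
    mask = np.load(mask_path)
    assert mask.shape == params.shape
    assert set(np.unique(mask)) <= {0, 1}   # masks must be binary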

Training

WandB Setup (Required)

Before training, set up Weights & Biases for experiment tracking:

wandb login

Basic Training Commands

Train Tyrell N6 Model

python train/train_tyrell.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4 --wandb

Train Serum Model

# Run from the SurgeSynthesis directory:
python train/train_serum.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4

# Or with GPU specification:
CUDA_VISIBLE_DEVICES=3 python train/train_serum.py --data-dir your/npy_data --epochs 500 --batch-size 128 --learning-rate 5e-5

Train Diva Model

python train/train_diva.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4

Train Multi-Synthesizer Model

python train/train_multi.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4 --wandb

This script trains a single model that can handle all three synthesizers simultaneously.

Advanced Training Options

All training scripts support the following arguments:

Argument                     Description                               Default
--data-dir, -d               Directory containing training data        npy_data
--synth-type, -s             Synthesizer type (tyrell/serum/diva)      Script-specific
--latent-size, -l            Latent dimension size                     64
--batch-size, -b             Batch size for training                   32
--learning-rate, -lr         Learning rate                             1e-4
--epochs, -e                 Number of epochs                          500
--warmup-epochs, -w          KL loss warmup epochs                     100
--save-freq                  Save model every N epochs                 100
--early-stopping-patience    Early stopping patience                   20
--num-workers                Data loading workers                      4

Example with Custom Parameters

python train/train_tyrell.py \
    --data-dir ./my_data \
    --latent-size 128 \
    --batch-size 64 \
    --learning-rate 5e-5 \
    --epochs 1000 \
    --warmup-epochs 200 \
    --save-freq 50

Important Notes

  1. Working Directory: All training commands should be run from the SurgeSynthesis/ directory (where this README is located)
  2. GPU Selection: Use CUDA_VISIBLE_DEVICES=N to specify which GPU to use (where N is the GPU number)
  3. Module Imports: The training scripts use absolute imports and expect to find the SurgeSynthesis module

If you encounter import errors, make sure you're in the correct directory:

cd /path/to/SurgeSynthesis
python train/train_serum.py [arguments...]

Inference

Run inference on trained models using the inference script:

python run_inference.py \
    --data-dir your/npy_data \
    --model-path ../model_backup/saved_models/vae_serum/best_model.pth \
    --synth-type serum \
    --output-dir inference_results

Inference Arguments

Argument             Description                               Required
--data-dir, -d       Directory containing test data            Yes
--model-path, -m     Path to trained model checkpoint          Yes
--synth-type, -s     Synthesizer type (tyrell/serum/diva)      Yes
--output-dir, -o     Directory to save results                 No (default: inference_results)
--latent-size, -l    Latent dimension size                     No (default: 64)
--batch-size, -b     Batch size for inference                  No (default: 32)
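
run_inference.py wraps the steps below. If you need to drive a trained model from your own code, the flow looks roughly like this (a sketch that assumes create_vae_model from models.py and the checkpoint key name shown; verify both against your checkout):

import numpy as np
import torch
from models import create_vae_model

device = "cuda" if torch.cuda.is_available() else "cpu"

# The parameter size must match the data the model was trained on.
test_params = np.load("npy_data/test_serum_params.npy")
model = create_vae_model("serum", param_size=test_params.shape[-1])

checkpoint = torch.load("saved_models/vae_serum/best_model.pth", map_location=device)
model.load_state_dict(checkpoint["model_state_dict"])   # assumed key name; check your checkpoint
model.to(device).eval()

mels = torch.from_numpy(np.load("npy_data/test_mels.npy")).float().to(device)
with torch.no_grad():
    outputs = model(mels[:8])   # exact input/output handling depends on the model class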

Model Architecture

The PyTorch implementation provides several VAE model variants, all with data-determined parameter sizes:

Key Features

  • 🎯 Data-Driven Architecture: Parameter dimensions are automatically determined from your training data, not hardcoded
  • 🔄 KL Warmup: Gradual introduction of KL divergence for stable training (a short schedule sketch follows this list)
  • 🎛️ Multi-Synthesizer Support: Single model handles multiple synthesizers with proper masking
  • ⚡ Dynamic Processing: Synthesizer-specific neural network weights and adaptive filters
  • 📊 WandB Integration: Complete experiment tracking and visualization
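
The KL warmup mentioned above is controlled by the --warmup-epochs argument: the KL term is scaled by a weight that ramps from 0 to 1 over the warmup period. A minimal sketch of that schedule (illustrative, not the repository's exact code):

def kl_weight(epoch, warmup_epochs=100):
    """Linear KL warmup: 0 at epoch 0, 1.0 once warmup is complete."""
    if warmup_epochs <= 0:
        return 1.0
    return min(1.0, epoch / warmup_epochs)

# total_loss = reconstruction_loss + kl_weight(epoch) * kl_divergence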

Available Models

1. Single Synthesizer VAE (BaseVAE, VAESerum, VAEDiva, VAETyrell)

  • Purpose: Dedicated models for individual synthesizers
  • Parameter Size: Automatically detected from training data (e.g., Tyrell: ~18K params, Serum: ~23K params)
  • Features: KL warmup, mask processing, synthesizer-specific architecture

2. Multi-Synthesizer VAE (MultiSynthVAE)

  • Purpose: Single model handling all three synthesizers simultaneously
  • Parameter Sizes: Each synthesizer decoder sized based on actual data dimensions
  • Features: Separate decoders for each synthesizer, proper mask handling, shared latent representation

3. Dynamic VAE (DynamicVAE)

  • Purpose: Adaptive processing with synthesizer-specific neural network weights
  • Features: Dynamic weight generation, synthesizer-specific processing layers

4. Dynamic MLP VAE (DynamicMLPVAE)

  • Purpose: Enhanced dynamic processing with multiple adaptation layers
  • Features: Synthesizer-specific MLP layers, adaptive feature processing

Parameter Size Detection

The models automatically determine parameter sizes from your training data:

# Training data determines model architecture
train_params = np.load("train_tyrell_params.npy")  # Shape: [N, 18005]
param_size = train_params.shape[-1]  # Automatically detected: 18005

# Model is created with correct dimensions
model = create_vae_model('tyrell', param_size=param_size)

This ensures the PyTorch models exactly match your data format, regardless of parameter count variations.

Monitoring Training

Training progress is automatically logged to Weights & Biases, including:

  • Training and validation losses
  • Individual loss components (spectrogram, parameters, KL divergence)
  • Learning rate changes
  • Model checkpoints

Access your WandB dashboard to monitor training in real-time.
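
The logging itself is ordinary wandb usage; a generic sketch of the kind of calls involved (metric names in the actual scripts may differ):

import wandb

wandb.init(project="surge-synthesis", config={"latent_size": 64, "batch_size": 32})

# Inside the training loop (placeholder values shown):
wandb.log({
    "train/loss": 1.23,
    "train/spec_loss": 0.80,
    "train/param_loss": 0.35,
    "train/kl": 0.08,
    "lr": 1e-4,
}, step=0)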

Model Checkpoints

Models are automatically saved in the following locations:

  • saved_models/vae_{synth_type}/best_model.pth: Best validation loss model
  • saved_models/vae_{synth_type}/final_model.pth: Final epoch model
  • saved_models/vae_{synth_type}/checkpoint_epoch_{N}.pth: Periodic checkpoints

Each checkpoint contains (a loading sketch follows the list):

  • Model state dictionary
  • Optimizer state dictionary
  • Training epoch
  • Loss value
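
To inspect a checkpoint or resume from it, load the file with torch.load and confirm its keys before wiring them up (the key names referenced in the comments are assumptions based on the contents listed above):

import torch

ckpt = torch.load("saved_models/vae_tyrell/best_model.pth", map_location="cpu")
print(sorted(ckpt.keys()))                  # confirm the exact key names first
print(ckpt.get("epoch"), ckpt.get("loss"))
# The two state dicts can then be passed to model.load_state_dict(...) and
# optimizer.load_state_dict(...) when resuming training.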

Performance Tips

GPU Training

  • Use CUDA-capable GPU for faster training
  • Adjust batch size based on available GPU memory
  • Use num_workers > 0 for faster data loading

Memory Optimization

  • Reduce batch size if encountering OOM errors
  • Use gradient accumulation for an effectively larger batch size (see the sketch below)
  • Monitor GPU memory usage during training
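
Gradient accumulation keeps the per-step batch small while updating weights as if the batch were larger. A generic sketch of the pattern (not this repository's training loop):

def train_with_accumulation(model, optimizer, loader, loss_fn, accumulation_steps=4):
    """Step the optimizer once every accumulation_steps mini-batches."""
    optimizer.zero_grad()
    for i, (mels, params) in enumerate(loader):
        loss = loss_fn(model(mels), params)
        (loss / accumulation_steps).backward()   # scale so the update matches a large batch
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()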

Training Stability

  • Start with default hyperparameters
  • Use learning rate scheduling for better convergence
  • Enable early stopping to prevent overfitting

Troubleshooting

Common Issues

  1. CUDA Out of Memory

    • Reduce batch size
    • Use CPU training (CUDA_VISIBLE_DEVICES="")
  2. Data Loading Errors

    • Verify data file paths and formats
    • Check NumPy file compatibility
    • Ensure sufficient disk space
  3. WandB Issues

    • Run wandb login before training
    • Check internet connection
    • Set WANDB_MODE=offline for offline logging
  4. Model Loading Errors

    • Ensure model architecture matches checkpoint
    • Check file paths and permissions
    • Verify PyTorch version compatibility

Getting Help

For issues specific to this implementation, check:

  1. Error messages and stack traces
  2. WandB logs for training anomalies
  3. Data format and preprocessing steps

Differences from Original TensorFlow Implementation

  • Framework: PyTorch instead of TensorFlow
  • Structure: Modular design with separate files for models, data, and utilities
  • Logging: WandB integration instead of basic logging
  • Features: Enhanced with early stopping, learning rate scheduling, and better checkpointing
  • Code Style: More Pythonic and maintainable code structure

License

This project maintains compatibility with the original implementation while providing a modern PyTorch-based approach to automatic synthesizer programming.

About

This is a PyTorch implementation of https://github.com/dafaronbi/Multi-Task-Automatic-Synthesizer-Programming. Thanks to the original authors for their contributions!
