This project converts the original TensorFlow-based automatic synthesizer programming system to PyTorch, providing a modular and modern implementation for training variational autoencoders (VAEs) to predict synthesizer parameters from audio spectrograms.
- PyTorch Implementation: Modern PyTorch-based VAE models for synthesizer parameter prediction
- Multi-Synthesizer Support: Support for Tyrell N6, Serum, and Diva synthesizers
- WandB Integration: Comprehensive experiment tracking and visualization
- Modular Architecture: Clean, maintainable code structure
- Advanced Training Features: Early stopping, learning rate scheduling, model checkpointing
- KL Divergence Warmup: Gradual introduction of KL loss for stable training
SurgeSynthesis/
├── models.py              # VAE model definitions
├── dataset.py             # Dataset and data loading utilities
├── utils.py               # Training utilities and loss functions
├── train_tyrell.py        # Training script for Tyrell N6
├── train_serum.py         # Training script for Serum
├── train_diva.py          # Training script for Diva
├── run_inference.py       # Inference script for trained models
├── requirements.txt       # Python dependencies
├── README.md              # This file
└── saved_models/          # Directory for saved model checkpoints
    ├── vae_tyrell/
    ├── vae_serum/
    └── vae_diva/
- Python 3.8 or higher
- CUDA-capable GPU (recommended for training)
- 8GB+ RAM
Install the required packages using:
pip install -r requirements.txt
Main dependencies:
- PyTorch (>= 2.0.0)
- NumPy
- Weights & Biases (wandb)
- librosa
- tqdm
- matplotlib
The training scripts expect data in NumPy format with the following structure:
npy_data/
├── train_mels.npy             # Training spectrograms
├── train_tyrell_params.npy    # Training Tyrell parameters
├── train_tyrell_mask.npy      # Training Tyrell masks (optional)
├── valid_mels.npy             # Validation spectrograms
├── valid_tyrell_params.npy    # Validation Tyrell parameters
├── valid_tyrell_mask.npy      # Validation Tyrell masks (optional)
├── test_mels.npy              # Test spectrograms
├── test_tyrell_params.npy     # Test Tyrell parameters
└── test_tyrell_mask.npy       # Test Tyrell masks (optional)
If you have mixed synthesizer data, also include:
npy_data/
├── ... (all files above)
├── train_synth.npy            # Synthesizer labels for training
├── valid_synth.npy            # Synthesizer labels for validation
├── test_synth.npy             # Synthesizer labels for testing
├── train_serum_params.npy     # Training Serum parameters
├── train_serum_mask.npy       # Training Serum masks (optional)
├── train_diva_params.npy      # Training Diva parameters
├── train_diva_mask.npy        # Training Diva masks (optional)
├── valid_serum_params.npy     # Validation Serum parameters
├── valid_serum_mask.npy       # Validation Serum masks (optional)
├── valid_diva_params.npy      # Validation Diva parameters
├── valid_diva_mask.npy        # Validation Diva masks (optional)
├── test_serum_params.npy      # Test Serum parameters
├── test_serum_mask.npy        # Test Serum masks (optional)
├── test_diva_params.npy       # Test Diva parameters
└── test_diva_mask.npy         # Test Diva masks (optional)
- Spectrograms: Shape `(N, H, W)` where N=samples, H=128, W=431
- Parameters: Shape `(N, P)` where P varies by synthesizer (Tyrell: 18005, Serum: 23325, Diva: 16237)
- Masks: Shape `(N, P)`, binary masks for parameter validity (optional)
- Synth Labels: Shape `(N,)`, strings indicating synthesizer type ('tyrell', 'serum', 'diva')
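To make the expected shapes concrete, the following sketch builds a tiny synthetic dataset in the layout described above. The parameter width is shrunk to 8 for the demo (the real Tyrell width listed above is 18005), the random values are placeholders rather than real presets, and the mask convention (1 = valid, 0 = ignored) is an assumption.

```python
import numpy as np

N, H, W = 4, 128, 431   # samples, mel bins, time frames (H and W from the list above)
P = 8                   # demo-only parameter width; real Tyrell data uses 18005

rng = np.random.default_rng(0)
mels = rng.random((N, H, W), dtype=np.float32)          # spectrograms, shape (N, H, W)
params = rng.random((N, P), dtype=np.float32)           # parameters, shape (N, P)
mask = (rng.random((N, P)) > 0.5).astype(np.float32)    # assumed: 1 = valid, 0 = ignored
synth = np.array(["tyrell"] * N)                        # synth labels, shape (N,)

# Every array must share the same leading sample dimension N.
assert mels.shape[0] == params.shape[0] == mask.shape[0] == synth.shape[0]
print(mels.shape, params.shape, mask.shape, synth.shape)
```

Saving each array with `np.save` under the file names listed above would produce a directory the training scripts could, in principle, read.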
Before training, validate your data format:
python validate_data.py /path/to/npy_data tyrell
This script checks:
- File existence and structure
- Array shapes and data types
- Data consistency between spectrograms and parameters
- Value ranges and formats
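The checks listed above could be approximated as follows. This is a minimal sketch, not the contents of `validate_data.py`: the function name `check_split` and the exact rules are hypothetical, and only the file naming scheme is taken from this README.

```python
import tempfile
from pathlib import Path

import numpy as np


def check_split(data_dir, split, synth):
    """Return a list of problems found for one split (empty list means OK)."""
    data_dir = Path(data_dir)
    mel_path = data_dir / f"{split}_mels.npy"
    param_path = data_dir / f"{split}_{synth}_params.npy"
    mask_path = data_dir / f"{split}_{synth}_mask.npy"   # optional file

    problems = [f"missing {p.name}" for p in (mel_path, param_path) if not p.exists()]
    if problems:
        return problems

    mels = np.load(mel_path)
    params = np.load(param_path)
    if mels.ndim != 3:
        problems.append(f"{mel_path.name}: expected (N, H, W), got {mels.shape}")
    if params.ndim != 2:
        problems.append(f"{param_path.name}: expected (N, P), got {params.shape}")
    if mels.shape[0] != params.shape[0]:
        problems.append(f"sample count mismatch: {mels.shape[0]} vs {params.shape[0]}")
    if not np.isfinite(params).all():
        problems.append(f"{param_path.name}: contains NaN or inf")
    if mask_path.exists():
        mask = np.load(mask_path)
        if mask.shape != params.shape:
            problems.append(f"{mask_path.name}: shape {mask.shape} != {params.shape}")
        elif not np.isin(mask, (0, 1)).all():
            problems.append(f"{mask_path.name}: mask is not binary")
    return problems


# Demo on a throwaway directory with a deliberate sample-count mismatch.
with tempfile.TemporaryDirectory() as tmp:
    np.save(Path(tmp) / "train_mels.npy", np.zeros((3, 128, 431), dtype=np.float32))
    np.save(Path(tmp) / "train_tyrell_params.npy", np.zeros((2, 5), dtype=np.float32))
    demo_problems = check_split(tmp, "train", "tyrell")
    print(demo_problems)
```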
Before training, set up Weights & Biases for experiment tracking:
wandb login
python train/train_tyrell.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4 --wandb
# Run from the SurgeSynthesis directory:
python train/train_serum.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4
# Or with GPU specification:
CUDA_VISIBLE_DEVICES=3 python train/train_serum.py --data-dir your/npy_data --epochs 500 --batch-size 128 --learning-rate 5e-5
python train/train_diva.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4
python train/train_multi.py --data-dir your/npy_data --epochs 500 --batch-size 32 --learning-rate 1e-4 --wandb
This script trains a single model that can handle all three synthesizers simultaneously.
All training scripts support the following arguments:
| Argument | Description | Default |
|---|---|---|
| `--data-dir`, `-d` | Directory containing training data | `npy_data` |
| `--synth-type`, `-s` | Synthesizer type (tyrell/serum/diva) | Script-specific |
| `--latent-size`, `-l` | Latent dimension size | 64 |
| `--batch-size`, `-b` | Batch size for training | 32 |
| `--learning-rate`, `-lr` | Learning rate | 1e-4 |
| `--epochs`, `-e` | Number of epochs | 500 |
| `--warmup-epochs`, `-w` | KL loss warmup epochs | 100 |
| `--save-freq` | Save model every N epochs | 100 |
| `--early-stopping-patience` | Early stopping patience | 20 |
| `--num-workers` | Data loading workers | 4 |
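For reference, the argument table above maps naturally onto an `argparse` parser. The sketch below is reconstructed from the table only; the actual training scripts may define these options differently, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser(default_synth="tyrell"):
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="Train a synthesizer VAE (sketch).")
    parser.add_argument("--data-dir", "-d", default="npy_data",
                        help="Directory containing training data")
    parser.add_argument("--synth-type", "-s", default=default_synth,
                        choices=["tyrell", "serum", "diva"], help="Synthesizer type")
    parser.add_argument("--latent-size", "-l", type=int, default=64)
    parser.add_argument("--batch-size", "-b", type=int, default=32)
    parser.add_argument("--learning-rate", "-lr", type=float, default=1e-4)
    parser.add_argument("--epochs", "-e", type=int, default=500)
    parser.add_argument("--warmup-epochs", "-w", type=int, default=100,
                        help="KL loss warmup epochs")
    parser.add_argument("--save-freq", type=int, default=100,
                        help="Save model every N epochs")
    parser.add_argument("--early-stopping-patience", type=int, default=20)
    parser.add_argument("--num-workers", type=int, default=4)
    return parser


# Override two options and leave the rest at their documented defaults.
args = build_parser().parse_args(["-b", "64", "-lr", "5e-5"])
print(args.batch_size, args.learning_rate, args.latent_size)
```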
python train/train_tyrell.py \
--data-dir ./my_data \
--latent-size 128 \
--batch-size 64 \
--learning-rate 5e-5 \
--epochs 1000 \
--warmup-epochs 200 \
--save-freq 50
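The `--warmup-epochs` option controls how long the KL term is phased in. One common way to do this is a linear ramp, sketched below; the linear shape, the function names, and the additive loss form are assumptions, since this README only states that the KL loss is introduced gradually.

```python
def kl_weight(epoch, warmup_epochs=100, max_weight=1.0):
    """Linearly ramp the KL weight from 0 to max_weight over warmup_epochs."""
    if warmup_epochs <= 0:
        return max_weight
    return max_weight * min(1.0, epoch / warmup_epochs)


def total_loss(recon_loss, kl_loss, epoch, warmup_epochs=100):
    """Combine reconstruction and KL terms using the warmup weight."""
    return recon_loss + kl_weight(epoch, warmup_epochs) * kl_loss


for epoch in (0, 50, 100, 150):
    print(epoch, kl_weight(epoch))
```

With a weight near zero early on, the model first learns to reconstruct; the KL penalty then tightens the latent space as the weight approaches its maximum.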
- Working Directory: All training commands should be run from the `SurgeSynthesis/` directory (where this README is located)
- GPU Selection: Use `CUDA_VISIBLE_DEVICES=N` to specify which GPU to use (where N is the GPU number)
- Module Imports: The training scripts use absolute imports and expect to find the SurgeSynthesis module
If you encounter import errors, make sure you're in the correct directory:
cd /path/to/SurgeSynthesis
python train/train_serum.py [arguments...]
Run inference on trained models using the inference script:
python run_inference.py \
--data-dir your/npy_data \
--model-path ../model_backup/saved_models/vae_serum/best_model.pth \
--synth-type serum \
--output-dir inference_results
| Argument | Description | Required |
|---|---|---|
| `--data-dir`, `-d` | Directory containing test data | Yes |
| `--model-path`, `-m` | Path to trained model checkpoint | Yes |
| `--synth-type`, `-s` | Synthesizer type (tyrell/serum/diva) | Yes |
| `--output-dir`, `-o` | Directory to save results | No (default: `inference_results`) |
| `--latent-size`, `-l` | Latent dimension size | No (default: 64) |
| `--batch-size`, `-b` | Batch size for inference | No (default: 32) |
The PyTorch implementation provides several VAE model variants, all with data-determined parameter sizes:
- Data-Driven Architecture: Parameter dimensions are automatically determined from your training data, not hardcoded
- KL Warmup: Gradual introduction of KL divergence for stable training
- Multi-Synthesizer Support: Single model handles multiple synthesizers with proper masking
- Dynamic Processing: Synthesizer-specific neural network weights and adaptive filters
- WandB Integration: Complete experiment tracking and visualization
- Purpose: Dedicated models for individual synthesizers
- Parameter Size: Automatically detected from training data (e.g., Tyrell: ~18K params, Serum: ~23K params)
- Features: KL warmup, mask processing, synthesizer-specific architecture
- Purpose: Single model handling all three synthesizers simultaneously
- Parameter Sizes: Each synthesizer decoder sized based on actual data dimensions
- Features: Separate decoders for each synthesizer, proper mask handling, shared latent representation
- Purpose: Adaptive processing with synthesizer-specific neural network weights
- Features: Dynamic weight generation, synthesizer-specific processing layers
- Purpose: Enhanced dynamic processing with multiple adaptation layers
- Features: Synthesizer-specific MLP layers, adaptive feature processing
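The "mask handling" mentioned above can be read as excluding invalid parameter entries from the parameter loss. The NumPy sketch below illustrates that idea only; the use of mean squared error, the convention that 1 marks a valid entry, and the name `masked_mse` are assumptions, not the project's actual loss.

```python
import numpy as np


def masked_mse(pred, target, mask=None):
    """Mean squared error over entries where mask == 1 (all entries if mask is None)."""
    sq_err = (pred - target) ** 2
    if mask is None:
        return float(sq_err.mean())
    valid = mask.sum()
    if valid == 0:
        return 0.0   # nothing to score for this batch
    return float((sq_err * mask).sum() / valid)


pred = np.array([[0.0, 1.0, 5.0]])
target = np.array([[0.0, 0.0, 0.0]])
mask = np.array([[1.0, 1.0, 0.0]])   # third parameter is marked invalid

print(masked_mse(pred, target, mask))   # large error in the masked slot is ignored
print(masked_mse(pred, target))         # unmasked loss includes it
```

Normalizing by the number of valid entries, rather than by the full width, keeps the loss scale comparable across synthesizers with different numbers of active parameters.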
The models automatically determine parameter sizes from your training data:
import numpy as np

# Training data determines model architecture
train_params = np.load("train_tyrell_params.npy")  # Shape: [N, 18005]
param_size = train_params.shape[-1]  # Automatically detected: 18005

# Model is created with correct dimensions
model = create_vae_model('tyrell', param_size=param_size)
This ensures the PyTorch models exactly match your data format, regardless of parameter count variations.
Training progress is automatically logged to Weights & Biases, including:
- Training and validation losses
- Individual loss components (spectrogram, parameters, KL divergence)
- Learning rate changes
- Model checkpoints
Access your WandB dashboard to monitor training in real-time.
Models are automatically saved in the following locations:
- `saved_models/vae_{synth_type}/best_model.pth`: Best validation loss model
- `saved_models/vae_{synth_type}/final_model.pth`: Final epoch model
- `saved_models/vae_{synth_type}/checkpoint_epoch_{N}.pth`: Periodic checkpoints
Each checkpoint contains:
- Model state dictionary
- Optimizer state dictionary
- Training epoch
- Loss value
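A checkpoint with the four fields above could be written and restored roughly as follows. The dictionary key names are assumptions (inspect a real checkpoint to confirm them), and a small `nn.Linear` stands in for the actual VAE.

```python
import os
import tempfile

import torch
from torch import nn

model = nn.Linear(4, 2)                                   # stand-in for the real VAE
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

checkpoint = {
    "model_state_dict": model.state_dict(),               # assumed key name
    "optimizer_state_dict": optimizer.state_dict(),       # assumed key name
    "epoch": 10,
    "loss": 0.123,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "checkpoint_epoch_10.pth")
    torch.save(checkpoint, path)
    loaded = torch.load(path, map_location="cpu")         # load onto CPU regardless of origin

# Rebuild the same architecture, then restore weights and optimizer state.
restored = nn.Linear(4, 2)
restored.load_state_dict(loaded["model_state_dict"])
restored_opt = torch.optim.Adam(restored.parameters(), lr=1e-4)
restored_opt.load_state_dict(loaded["optimizer_state_dict"])
print(loaded["epoch"], loaded["loss"])
```

Restoring the optimizer state alongside the weights matters when resuming training, because optimizers such as Adam keep per-parameter running statistics.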
- Use CUDA-capable GPU for faster training
- Adjust batch size based on available GPU memory
- Use `num_workers > 0` for faster data loading
- Reduce batch size if encountering OOM errors
- Use gradient accumulation for effective larger batch sizes
- Monitor GPU memory usage during training
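Gradient accumulation, mentioned above, trades extra forward and backward passes for a larger effective batch size at constant memory. The loop below is a generic PyTorch sketch with a toy model and random data; it is not taken from this project's training scripts.

```python
import torch
from torch import nn

torch.manual_seed(0)
model = nn.Linear(8, 1)                                   # toy model for illustration
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

accum_steps = 4   # effective batch size = micro-batch size * accum_steps
micro_batches = [(torch.randn(2, 8), torch.randn(2, 1)) for _ in range(8)]

optimizer_steps = 0
optimizer.zero_grad()
for step, (x, y) in enumerate(micro_batches, start=1):
    # Scale the loss so the summed gradients average over the accumulation window.
    loss = loss_fn(model(x), y) / accum_steps
    loss.backward()                                       # gradients add up in .grad
    if step % accum_steps == 0:
        optimizer.step()                                  # one update per accum_steps micro-batches
        optimizer.zero_grad()
        optimizer_steps += 1

print(optimizer_steps)
```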
- Start with default hyperparameters
- Use learning rate scheduling for better convergence
- Enable early stopping to prevent overfitting
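Early stopping with a patience counter (the `--early-stopping-patience` option, default 20) can be sketched as below. The class name and the `min_delta` threshold are hypothetical; the project's own implementation may track improvement differently.

```python
class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta      # assumed: minimum drop that counts as improvement
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.9, 0.85, 0.81, 0.7]
stopped_at = None
for epoch, loss in enumerate(losses):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at, stopper.best)
```

In this toy run the loss never drops below 0.8 within three epochs of reaching it, so training stops before the later improvement to 0.7; a larger patience would have allowed it.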
- CUDA Out of Memory
  - Reduce batch size
  - Use CPU training (`CUDA_VISIBLE_DEVICES=""`)
- Data Loading Errors
  - Verify data file paths and formats
  - Check NumPy file compatibility
  - Ensure sufficient disk space
- WandB Issues
  - Run `wandb login` before training
  - Check internet connection
  - Set `WANDB_MODE=offline` for offline logging
- Model Loading Errors
  - Ensure model architecture matches checkpoint
  - Check file paths and permissions
  - Verify PyTorch version compatibility
For issues specific to this implementation, check:
- Error messages and stack traces
- WandB logs for training anomalies
- Data format and preprocessing steps
- Framework: PyTorch instead of TensorFlow
- Structure: Modular design with separate files for models, data, and utilities
- Logging: WandB integration instead of basic logging
- Features: Enhanced with early stopping, learning rate scheduling, and better checkpointing
- Code Style: More Pythonic and maintainable code structure
This project maintains compatibility with the original implementation while providing a modern PyTorch-based approach to automatic synthesizer programming.