vivekchavan14/TTS-ft
StyleTTS2 Fine-tuning on LibriTTS

This project fine-tunes the StyleTTS2 model on the LibriTTS dataset to produce high-quality, expressive, and controllable speech synthesis. It supports multispeaker generation and style control, and can be extended for applications such as voice cloning, TTS APIs, and conversational agents.


Features

  • Fine-tuning StyleTTS2 on LibriTTS with second-stage training
  • Multispeaker support
  • Style embedding via diffusion model
  • Pretrained ASR and pitch (F0) model integration
  • Accelerated with mixed precision (fp16)
  • Dockerized training pipeline
  • Checkpoint management with AWS S3

Requirements

  • AWS EC2 (with GPU, e.g., g4dn.xlarge)
  • NVIDIA Docker runtime (--gpus all)
  • Docker image built or pulled from ECR
  • AWS CLI configured with access to an S3 bucket
  • Base-model checkpoint (e.g., epochs_2nd_00020.pth)
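With the AWS CLI configured, the base checkpoint can be fetched from S3 into the path the config expects. A minimal sketch — the bucket name and key layout below are placeholders, not part of this repository:

```shell
# Create the directory the fine-tuning config points at
mkdir -p Models/LibriTTS

# Fetch the pretrained second-stage checkpoint.
# <your-bucket> and the styletts2/ prefix are assumptions; use your own layout.
aws s3 cp s3://<your-bucket>/styletts2/epochs_2nd_00020.pth \
  Models/LibriTTS/epochs_2nd_00020.pth
```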

Running with Docker

# Run the Docker container (replace with your own image name)
docker run --gpus all -d --name styletts2-container <your-docker-image>

# Access the container
docker exec -it styletts2-container bash
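If you don't have the image yet, it can be built locally or pulled from ECR, as noted under Requirements. A sketch with placeholder names — the image tag, AWS account ID, region, and repository name are all assumptions:

```shell
# Build locally from the repository's Dockerfile (assumes one at the repo root)
docker build -t styletts2-ft:latest .

# Or authenticate to ECR and pull a prebuilt image
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin <account-id>.dkr.ecr.us-east-1.amazonaws.com
docker pull <account-id>.dkr.ecr.us-east-1.amazonaws.com/styletts2-ft:latest
```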

Training

Customize Configs/config_ft.yml:

log_dir: "Models/LibriTTS"                                # where checkpoints and logs are written
epochs: 35
batch_size: 2                                             # small batch for a single-GPU instance (e.g., g4dn.xlarge)
pretrained_model: "Models/LibriTTS/epochs_2nd_00020.pth"  # base checkpoint to fine-tune from
load_only_params: true                                    # load model weights only, not optimizer state
...

Launch training:

accelerate launch --mixed_precision=fp16 train_finetune_accelerate.py --config_path ./Configs/config_ft.yml
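Since checkpoint management with AWS S3 is listed under Features, one way to persist checkpoints written under log_dir is to sync them back to a bucket after (or periodically during) training. A sketch, with the bucket name and prefix as placeholders:

```shell
# Upload new/changed .pth checkpoints from the log dir to S3
aws s3 sync Models/LibriTTS s3://<your-bucket>/checkpoints/LibriTTS \
  --exclude "*" --include "*.pth"
```

`aws s3 sync` only transfers files that are new or changed, so rerunning it after each epoch is cheap.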
