TorToiSe TTS

An unofficial PyTorch re-implementation of TorToise TTS.

Almost all of the documentation and usage is carried over from my VALL-E implementation, as documentation for this implementation is lacking; I whipped it up over the course of two days using knowledge I hadn't touched in a year.

Requirements

A working PyTorch environment.

  • python3 -m venv venv && source ./venv/bin/activate is sufficient.

Install

Simply run pip install git+https://git.ecker.tech/mrq/tortoise-tts@new or pip install git+https://github.com/e-c-k-e-r/tortoise-tts.
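For example, a fresh setup from scratch might look like the following (a sketch that simply combines the venv and pip commands above; use whichever remote you prefer):

    python3 -m venv venv && source ./venv/bin/activate
    pip install git+https://git.ecker.tech/mrq/tortoise-tts@new
    # or, equivalently:
    # pip install git+https://github.com/e-c-k-e-r/tortoise-tts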

Usage

Inferencing

Using the default settings: python3 -m tortoise_tts --yaml="./data/config.yaml" "Read verse out loud for pleasure." "./path/to/a.wav"

To inference using the included Web UI: python3 -m tortoise_tts.webui --yaml="./data/config.yaml"

  • Pass --listen 0.0.0.0:7860 if you're accessing the web UI from outside of localhost (or pass the host machine's local IP instead)

A LoRA can be loaded by appending --lora=./path/to/your/lora.sft to either of the above commands.
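Putting the above together, a typical session might look like the following (a sketch using only the flags documented above; the LoRA and output paths are placeholders):

    # one-shot inference with a LoRA appended to the command
    python3 -m tortoise_tts --yaml="./data/config.yaml" "Read verse out loud for pleasure." "./path/to/a.wav" --lora=./path/to/your/lora.sft

    # web UI reachable from other machines on the network
    python3 -m tortoise_tts.webui --yaml="./data/config.yaml" --listen 0.0.0.0:7860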

Training / Finetuning

Training is as simple as copying the reference YAML from ./data/config.yaml to any training directory of your choice (for example: ./training/ or ./training/lora-finetune/).
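For example (a sketch assuming a LoRA finetune directory; the directory name is arbitrary):

    mkdir -p ./training/lora-finetune
    cp ./data/config.yaml ./training/lora-finetune/config.yaml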

Dataset

A pre-processed dataset is required. Refer to the VALL-E implementation for more details. But to reiterate (a consolidated sketch of these commands follows the list):

  1. Populate your source voices under ./voices/{group name}/{speaker name}/.

  2. Run python3 -m tortoise_tts.emb.transcribe. This will generate a transcription with timestamps for your dataset.

  3. Run python3 -m tortoise_tts.emb.process. This will phonemize the transcriptions and quantize the audio.

  4. Wherever you copied the ./data/config.yaml, populate cfg.dataset.training with strings of the form {group name}/{speaker name}.

  5. Either copy, move, or symlink the resultant ./training/24KHz-mel/ folder into the directory containing your copied config.yaml, naming it data.

  6. Run python3 -m tortoise_tts.data --yaml="./path/to/your/training/config.yaml" --action=metadata to generate additional metadata, as the dataloader code is slop and needs to be updated.
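The command-line side of the above, taken together, looks roughly like this (a sketch; the training path is a placeholder, and steps 1, 4, and 5 are manual file placement / YAML edits not shown here):

    # 2. transcribe the voices under ./voices/{group name}/{speaker name}/ with timestamps
    python3 -m tortoise_tts.emb.transcribe
    # 3. phonemize the transcriptions and quantize the audio
    python3 -m tortoise_tts.emb.process
    # 6. generate additional metadata for the dataloader
    python3 -m tortoise_tts.data --yaml="./path/to/your/training/config.yaml" --action=metadata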

Trainer

To start the trainer, run python3 -m tortoise_tts.train --yaml="./path/to/your/training/config.yaml".

  • Type save to save at any time. Type quit to save and quit. Type eval to run evaluation / validation of the model.

For training a LoRA, uncomment the loras block in your training YAML.

For loading an existing finetuned model, create a folder with this structure, and load its accompanying YAML:

./some/arbitrary/path/:
    ckpt:
        autoregressive:
            fp32.pth # finetuned weights
    config.yaml

For LoRAs, replace the above fp32.pth with lora.pth.
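For example, the following lays that structure out on disk (a sketch; the destination path is arbitrary and the source of the finetuned weights is a placeholder):

    mkdir -p ./some/arbitrary/path/ckpt/autoregressive
    cp /path/to/finetuned/weights.pth ./some/arbitrary/path/ckpt/autoregressive/fp32.pth   # lora.pth instead for a LoRA
    cp /path/to/accompanying/config.yaml ./some/arbitrary/path/config.yaml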

To-Do

  • Validate that everything still works, because dependencies break things over time
  • Re-backport all the creature comforts from VALL-E
  • Reimplement original inferencing through TorToiSe (as done with api.py)
    • Reimplement candidate selection with the CLVP
    • Reimplement redaction with the Wav2Vec2
  • Implement training support (without DLAS)
    • Feature parity with the VALL-E training setup, including preparing a dataset ahead of time
  • Automagic offloading to CPU for unused models (for training and inferencing)
  • Automagic handling of the original weights into compatible weights
  • Reimplement added features from my original fork:
    • "Better" conditioning latents calculating
    • Use of KV-cache for the AR
    • Re-enable DDIM sampler
  • Extend the original inference routine with additional features:
    • non-float32 / mixed precision for the entire stack
      • Parts of the stack will whine about mismatching dtypes...
    • BitsAndBytes support
      • Provided Linears technically aren't used because GPT2 uses Conv1D instead...
    • LoRAs
    • Web UI
      • Feature parity with ai-voice-cloning
        • Although I feel a lot of its features are the wrong way to go about it.
    • Additional samplers for the autoregressive model (such as mirostat / dynamic temperature)
    • Additional samplers for the diffusion model (beyond the already included DDIM)
    • BigVGAN in place of the original vocoder
      • HiFiGAN integration as well
    • XFormers / flash_attention_2 for the autoregressive model
      • Beyond HF's internal implementation of handling alternative attention
      • Both the AR and diffusion models also do their own attention...
    • Saner way of loading finetuned models / LoRAs
    • Some vector embedding store to find the "best" utterance to pick
  • Documentation
    • this also includes a correct explanation of the entire stack (rather than the poor one I left in ai-voice-cloning)

Why?

To:

  • atone for the mess I've made by originally forking TorToiSe TTS with a bunch of slopcode, and the nightmare that ai-voice-cloning turned out to be.
  • unify the trainer and the inference-er.
  • implement additional features with much more ease, as I'm very familiar with my framework.
  • disillusion myself that it won't get better than TorToiSe TTS:
    • while it's faster than VALL-E, the quality leaves a lot to be desired (although this is simply due to the overall architecture).

License

Unless otherwise credited/noted in this README or within the designated Python file, this repository is licensed under AGPLv3.
