test-dan-run/pawlyglot

pawlyglot is a project that aims to develop an end-to-end multilingual TTS, voice-cloning and lip-syncing pipeline by combining several open-source projects. Only EN > CN translation is supported at the moment.

The services are hosted in Docker containers. As an experiment, the microservices communicate with each other over gRPC. For more information about gRPC, you can read the article linked here.

Currently, the plan is to use the following projects.

No. | Service                  | Model                  | Implemented
1   | Voice Activity Detection | SpeechBrain CRDNN      | Yes
2   | Speech Recognition       | Faster-Whisper (Small) | Yes
3   | Machine Translation      | Helsinki EN-ZH         | Yes
4   | Voice Cloning and TTS    | Coqui XTTS-V2          | Yes
5   | Lip Sync                 | Wav2Lip-GFPGAN         | No
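
To make the order of the stages in the table concrete, here is a minimal sketch of how the four implemented services could be chained. It assumes the audio is processed segment by segment and that the original audio serves as the speaker reference for voice cloning. The names `Stages` and `run_pipeline` and all four callables are hypothetical stand-ins, not the repository's actual orchestrator or gRPC stubs.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical type: a speech segment as (start_sec, end_sec).
Segment = Tuple[float, float]


@dataclass
class Stages:
    """Callables standing in for the four implemented services (all hypothetical)."""
    detect_speech: Callable[[bytes], List[Segment]]  # 1. Voice Activity Detection
    transcribe: Callable[[bytes, Segment], str]      # 2. Speech Recognition (English text)
    translate: Callable[[str], str]                  # 3. Machine Translation (EN > CN)
    synthesize: Callable[[str, bytes], bytes]        # 4. Voice Cloning and TTS


def run_pipeline(audio: bytes, stages: Stages) -> List[Tuple[Segment, str, bytes]]:
    """Run VAD, then ASR, MT and TTS on each detected segment, in that order."""
    results = []
    for segment in stages.detect_speech(audio):
        english = stages.transcribe(audio, segment)
        chinese = stages.translate(english)
        # Assumption: the input audio is passed along as the speaker reference.
        speech = stages.synthesize(chinese, audio)
        results.append((segment, chinese, speech))
    return results
```

In a real deployment each callable would wrap a call to the corresponding service; Lip Sync is left out because it is not implemented yet.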

Setup

  1. Clone the repository and build the base Docker image first. The base image is used as the starting point for all the other services.
git clone https://github.com/test-dan-run/pawlyglot.git
cd pawlyglot
docker build -f dockerfile.base -t pawlyglot/base:1.0.0 .
  2. Download the pretrained models

    • The VAD and Translation models are hot-loaded on build/start-up.
    • Download the zipped file containing the ASR model files here. Extract the contents into ./services/speech_recognition/models/small
    • Download the zipped file containing the TTS model files here. Extract the contents into ./services/voice_cloning/models/xtts
  3. Build the rest of the services

# for development, mount models and source code
docker-compose -f docker-compose.dev.yaml build

# for staging/deployment
docker-compose build
  4. For the test_client.py in ./tests to work, install grpcio-tools in your local environment (or in a virtualenv), then run the following commands to generate the auxiliary pb files.
python3 -m pip install grpcio-tools==1.62.1

# be in the main directory
python3 -m grpc_tools.protoc -I ./proto \
    --python_out=./backend \
    --pyi_out=./backend \
    --grpc_python_out=./backend \
    ./proto/asr.proto \
    ./proto/vad.proto \
    ./proto/mt.proto \
    ./proto/vc.proto
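
After running the protoc command above, it can be useful to confirm that every expected file was generated. The sketch below relies on the standard naming used by grpc_tools.protoc (`<name>_pb2.py`, `<name>_pb2.pyi` and `<name>_pb2_grpc.py`) and on the ./backend output directory from the command. The helper name `missing_stubs` is hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# Proto names taken from the protoc command above.
PROTO_NAMES = ["asr", "vad", "mt", "vc"]
# One suffix per output flag: --python_out, --pyi_out, --grpc_python_out.
SUFFIXES = ["_pb2.py", "_pb2.pyi", "_pb2_grpc.py"]


def missing_stubs(out_dir: str, names: List[str] = PROTO_NAMES) -> List[str]:
    """Return the generated file names that are absent from out_dir."""
    out = Path(out_dir)
    return [
        f"{name}{suffix}"
        for name in names
        for suffix in SUFFIXES
        if not (out / f"{name}{suffix}").is_file()
    ]
```

Calling `missing_stubs("./backend")` from the main directory returns an empty list when all twelve files are present.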

Run

  1. Start up the services.
# for development, mount models and source code
docker-compose -f docker-compose.dev.yaml up

# for staging/deployment
docker-compose up
  2. Test out the orchestrator.
cd ./tests
# edit input_audio_filepath (Line 151) to point to your own audio file. The audio file is assumed to be sampled at 16 kHz.
python3 orchestrate.py
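
Since the orchestrator assumes 16 kHz input, a quick check of the sample rate before running it can save a confusing failure. This is a minimal sketch using only the standard library `wave` module. It assumes the input is an uncompressed PCM WAV file, which the README does not state, and the helper name `is_16khz_wav` is hypothetical.

```python
import wave

EXPECTED_RATE_HZ = 16_000  # sample rate assumed by the orchestrator


def is_16khz_wav(path: str) -> bool:
    """Return True if the WAV file at path is sampled at 16 kHz."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate() == EXPECTED_RATE_HZ
```

If the check returns False, resample the file with the audio tool of your choice before passing it to the orchestrator.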

About

Service-based Multilingual TTS and Lip-Syncing Pipeline
