pawlyglot is a project that aims to build an end-to-end multilingual pipeline for TTS, voice cloning and lip-syncing by combining several open-source projects. Only EN > CN translation is supported at the moment.
The services are hosted in Docker containers. As an experiment, the microservices
communicate with each other over gRPC. For more information about gRPC, you can read the article in the link here.
Currently, the plan is to use the following projects.
No. | Services | Model | Implemented |
---|---|---|---|
1 | Voice Activity Detection | SpeechBrain CRDNN | Yes |
2 | Speech Recognition | Faster-Whisper (Small) | Yes |
3 | Machine Translation | Helsinki EN-ZH | Yes |
4 | Voice Cloning and TTS | Coqui XTTS-V2 | Yes |
5 | Lip Sync | Wav2Lip-GFPGAN | No |
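As a rough illustration of how the four implemented stages in the table chain together, here is a minimal sketch in plain Python. Every name in it (`Pipeline`, `detect_speech`, `transcribe`, `translate`, `synthesize`) is a hypothetical placeholder, not pawlyglot's actual API, and the stages are simple callables rather than gRPC calls. Lip sync is left out because it is not implemented yet.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# All names below are hypothetical placeholders, not pawlyglot's actual API.
Segment = Tuple[float, float]  # (start_seconds, end_seconds)


@dataclass
class Pipeline:
    detect_speech: Callable[[bytes], List[Segment]]  # stage 1: VAD
    transcribe: Callable[[bytes, Segment], str]      # stage 2: speech recognition
    translate: Callable[[str], str]                  # stage 3: machine translation
    synthesize: Callable[[str, bytes], bytes]        # stage 4: voice cloning + TTS

    def run(self, audio: bytes) -> List[Tuple[Segment, str, bytes]]:
        """Run every detected speech segment through ASR, MT and TTS in order."""
        results = []
        for segment in self.detect_speech(audio):
            source_text = self.transcribe(audio, segment)
            target_text = self.translate(source_text)
            # Assumption: the input audio doubles as the speaker reference.
            speech = self.synthesize(target_text, audio)
            results.append((segment, target_text, speech))
        return results


# Stand-in stages so the sketch runs without any model or service.
demo = Pipeline(
    detect_speech=lambda audio: [(0.0, 1.5)],
    transcribe=lambda audio, segment: "hello",
    translate=lambda text: {"hello": "你好"}.get(text, text),
    synthesize=lambda text, reference: text.encode("utf-8"),
)
```

Calling `demo.run(audio_bytes)` returns one `(segment, translated_text, synthesized_audio)` tuple per detected segment. In the real project each callable would presumably wrap a gRPC stub for the corresponding service.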
- Clone the repository and build the base Docker image first. The base image is the starting point for all the other services.
git clone https://github.com/test-dan-run/pawlyglot.git
cd pawlyglot
docker build -f dockerfile.base -t pawlyglot/base:1.0.0 .
- Download the pretrained models.
- The VAD and Translation models are hot-loaded on build/start-up.
- Download the zipped file containing the ASR model files here. Extract the contents into `./services/speech_recognition/models/small`
- Download the zipped file containing the TTS model files here. Extract the contents into `./services/voice_cloning/models/xtts`
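If you prefer to script the extraction step, the following is a minimal sketch using only the Python standard library. The helper name `extract_models` and the archive filenames in the usage note are assumptions for illustration. The sketch also assumes the model files sit at the top level of each archive; if an archive wraps them in a folder, adjust the target path accordingly.

```python
import zipfile
from pathlib import Path
from typing import List


def extract_models(archive: Path, target_dir: Path) -> List[str]:
    """Extract a model archive into target_dir and return the member names.

    Hypothetical helper, not part of the pawlyglot repository.
    """
    # Create the target directory (and any missing parents) if needed.
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()
```

For example, `extract_models(Path("asr_models.zip"), Path("./services/speech_recognition/models/small"))`, where `asr_models.zip` stands in for whatever the downloaded file is actually called.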
- Build the rest of the services.
# for development, mount models and source code
docker-compose -f docker-compose.dev.yaml build
# for staging/deployment
docker-compose build
- For the `test_client.py` in `./tests` to work, please install `grpcio-tools` in your local environment (or create a virtualenv), then run the following commands to generate the auxiliary pb files.
python3 -m pip install grpcio-tools==1.62.1
# be in the main directory
python3 -m grpc_tools.protoc -I ./proto \
--python_out=./backend \
--pyi_out=./backend \
--grpc_python_out=./backend \
./proto/asr.proto \
./proto/vad.proto \
./proto/mt.proto \
./proto/vc.proto
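To confirm that the command above produced everything, you can check the output directory for the generated files. With these three flags, `protoc` writes `<name>_pb2.py`, `<name>_pb2.pyi` and `<name>_pb2_grpc.py` for each `.proto` file. The sketch below is a hypothetical helper (not part of the repository) that lists whichever of those files are missing.

```python
from pathlib import Path
from typing import List

# Proto names taken from the protoc command above.
PROTO_NAMES = ["asr", "vad", "mt", "vc"]


def expected_outputs(name: str) -> List[str]:
    """Files generated for <name>.proto by --python_out, --pyi_out and --grpc_python_out."""
    return [f"{name}_pb2.py", f"{name}_pb2.pyi", f"{name}_pb2_grpc.py"]


def missing_outputs(out_dir: Path, names: List[str] = PROTO_NAMES) -> List[str]:
    """Return the expected generated files that are not present in out_dir."""
    return [
        filename
        for name in names
        for filename in expected_outputs(name)
        if not (out_dir / filename).is_file()
    ]
```

An empty list from `missing_outputs(Path("./backend"))` means all twelve files were generated.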
- Start up the services.
# for development, mount models and source code
docker-compose -f docker-compose.dev.yaml up
# for staging/deployment
docker-compose up
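The containers can take a while to load their models, so a client started right after `docker-compose up` may fail to connect. As a rough sketch, the helper below polls a TCP port until it accepts connections. The function name is hypothetical, and the host and port you pass in must come from your own compose file, since this README does not list them. An open port only shows that something is listening; it does not prove the gRPC service has finished loading its model.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once host:port accepts a TCP connection, False after timeout seconds.

    Hypothetical helper, not part of the pawlyglot repository.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```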
- Test out the orchestrator.
cd ./tests
# edit input_audio_filepath (line 151) to point to your own audio file. The audio file is assumed to be sampled at 16 kHz.
python3 orchestrate.py
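Since the input is assumed to be sampled at 16 kHz, it can help to check a file before running the orchestrator. The sketch below uses the standard-library `wave` module and therefore assumes the input is an uncompressed PCM WAV file, which this README does not state explicitly. The function names are hypothetical.

```python
import wave

EXPECTED_RATE = 16_000  # Hz, the sample rate the orchestrator assumes


def wav_sample_rate(path: str) -> int:
    """Read the sample rate from a WAV file header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_16khz(path: str) -> bool:
    """Return True if the WAV file is sampled at 16 kHz."""
    return wav_sample_rate(path) == EXPECTED_RATE
```

If `is_16khz` returns `False`, resample the file with the audio tool of your choice before running the pipeline.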