|
1 |
| -# Mozilla TTS |
2 | 1 |
|
3 |
| -Multi-platform Docker images for [Mozilla TTS](https://github.com/mozilla/TTS). Many thanks to [erogol](https://github.com/erogol) and [the community](https://discourse.mozilla.org/c/tts/285)! |
4 |
| - |
5 |
| - |
6 |
| - |
7 |
| -Supported languages (see [Released Models](https://github.com/mozilla/TTS/wiki/Released-Models)): |
8 |
| - |
9 |
| -* U.S. English (`en`) |
10 |
| - * [Tacotron2 DDC model trained from LJSpeech](https://drive.google.com/drive/folders/1Y_0PcB7W6apQChXtbt6v3fAiNwVf4ER5?usp=sharing) |
11 |
| - * [Multi-band MelGAN vocoder trained from LJSpeech](https://drive.google.com/drive/folders/1XeRT0q4zm5gjERJqwmX5w84pMrD00cKD?usp=sharing) |
12 |
| -* Spanish (`es`) |
13 |
| - * [Tacotron2 DDC model trained from M-AILabs](https://drive.google.com/drive/folders/1HxFUHQl6REh8CifOXL8IMlIyR9SDVcdu?usp=sharing) |
14 |
| - * [Full-band MelGAN vocoder trained from LibriTTS](https://drive.google.com/drive/folders/1LKAKOWqtUpiWr2Go3j5DFEFpUoQbW24C?usp=sharing) |
15 |
| - * [Notebook](https://colab.research.google.com/drive/1u_16ZzHjKYFn1HNVuA4Qf_i2MMFB9olY?usp=sharing) |
16 |
| -* French (`fr`) |
17 |
| - * [Tacotron2 DDC model trained from M-AILabs](https://colab.research.google.com/drive/16T5avz3zOUNcIbF_dwfxnkZDENowx-tZ?usp=sharing) |
18 |
| - * [Full-band MelGAN vocoder trained from LibriTTS](https://drive.google.com/drive/folders/1LKAKOWqtUpiWr2Go3j5DFEFpUoQbW24C?usp=sharing) |
19 |
| - * [Notebook](https://colab.research.google.com/drive/16T5avz3zOUNcIbF_dwfxnkZDENowx-tZ?usp=sharing) |
20 |
| -* German (`de`) |
21 |
| - * [Tacotron2 DDC model](https://colab.research.google.com/drive/1SPl226SwzrfMZltrVagIXya_ax4CsMh-?usp=sharing) trained from [Thorsten dataset](https://github.com/thorstenMueller/deep-learning-german-tts/) |
22 |
| - * [Parallel WaveGAN model](https://colab.research.google.com/drive/1SPl226SwzrfMZltrVagIXya_ax4CsMh-?usp=sharing) trained from same dataset |
23 |
| - * **Note:** due to a mistake in the training configuration, this model does not read numbers written in digit form. |
24 |
| - |
25 |
| -Supported platforms: |
26 |
| - |
27 |
| -* `x86_64` |
28 |
| - * GPU is not supported (no CUDA or GPU-enabled PyTorch) |
29 |
| - * If your CPU does not support AVX instructions (Celeron, etc.), use `synesthesiam/mozillatts:<LANGUAGE>-noavx` (e.g., `synesthesiam/mozillatts:en-noavx`) |
30 |
| -* `armv7` |
31 |
| - * Raspberry Pi 2/3/4 (32-bit) |
32 |
| -* `arm64` |
33 |
| - * Raspberry Pi 2/3/4 (64-bit) |
34 |
| - |
35 |
| -### RAM Limitations |
36 |
| - |
37 |
| -If you're running on a Raspberry Pi with only 1 GB of RAM, you may be unable to load some of the larger models without increasing your swap space. To do this, simply edit the `/etc/dphys-swapfile` file (with `sudo`) and increase `CONF_SWAPSIZE` (1000 is recommended, value is MB). Make sure to reboot after editing this file. |
38 |
| - |
39 |
| -## Using |
40 |
| - |
41 |
| -```sh |
42 |
| -$ docker run -it -p 5002:5002 synesthesiam/mozillatts:<LANGUAGE> |
43 |
| -``` |
44 |
| - |
45 |
| -where `<LANGUAGE>` is one of the supported languages (`en`, `es`, `fr`, `de`). If no language is given, U.S. English is used. |
46 |
| - |
47 |
| -Visit http://localhost:5002 for the web interface. |
48 |
| - |
49 |
| -Send an HTTP GET request to http://localhost:5002/api/tts?text=your%20sentence to get WAV audio back: |
50 |
| - |
51 |
| -```sh |
52 |
| -$ curl -G --output - \ |
53 |
| - --data-urlencode 'text=Welcome to the world of speech synthesis!' \ |
54 |
| - 'http://localhost:5002/api/tts' | \ |
55 |
| - aplay |
56 |
| -``` |
57 |
| - |
58 |
| -HTTP POST is also supported: |
59 |
| - |
60 |
| -```sh |
61 |
| -$ curl -X POST -H 'Content-Type: text/plain' --output - \ |
62 |
| - --data 'Welcome to the world of speech synthesis!' \ |
63 |
| - 'http://localhost:5002/api/tts' | \ |
64 |
| - aplay |
65 |
| -``` |
66 |
| - |
67 |
| -A `/process` endpoint is available for compatibility with [MaryTTS](http://mary.dfki.de/). Expose the correct port (59125) for maximum compatibility: |
68 |
| - |
69 |
| -```sh |
70 |
| -$ docker run -it -p 59125:5002 synesthesiam/mozillatts |
71 |
| -``` |
72 |
| - |
73 |
| -You should now be able to use software like the [Home Assistant MaryTTS integration](https://www.home-assistant.io/integrations/marytts/). |
74 |
| -Note that only the `INPUT_TEXT` field is actually used. |
75 |
| - |
76 |
| -## Custom Model |
77 |
| - |
78 |
| -The Docker image is usually built with [buildx](https://docs.docker.com/buildx/working-with-buildx/) for multi-platform support. If you just want to build an image for one platform, you can do this: |
79 |
| - |
80 |
| -```sh |
81 |
| -$ NOBUILDX=1 LANGUAGE=en scripts/build-docker.sh |
82 |
| -``` |
83 |
| - |
84 |
| -When you set a `LANGUAGE`, the build script looks in `model/<LANGUAGE>`. These files should exist: |
85 |
| - |
86 |
| -* `model/<LANGUAGE>/config.json` |
87 |
| -* `model/<LANGUAGE>/checkpoint.pth.tar` (any name that ends in `.pth.tar` is fine) |
88 |
| -* `model/<LANGUAGE>/scale_stats.npy` (optional) |
89 |
| - |
90 |
| -Optionally, you may also include a vocoder: |
91 |
| - |
92 |
| -* `model/<LANGUAGE>/vocoder/config.json` |
93 |
| -* `model/<LANGUAGE>/vocoder/checkpoint.pth.tar` (any name that ends in `.pth.tar` is fine) |
94 |
| -* `model/<LANGUAGE>/vocoder/scale_stats.npy` (optional) |
95 |
| - |
96 |
| -If the sample rates between the model and vocoder don't match, the audio will be [interpolated](https://github.com/mozilla/TTS/issues/520). |
97 |
| - |
98 |
| -### Docker Download Cache |
99 |
| - |
100 |
| -When building the Docker image, the `download` directory may contain architecture-specific Python wheels. The `download/amd64` directory, for example, will be used with pip's `--find-links` on `x86_64` systems. If the `NOAVX` environment variable is not empty, then wheels in `download/<ARCHITECTURE>/noavx` will overwrite those in the parent directory. |
101 |
| - |
102 |
| -The `download/shared` directory is used for all architectures. If a `requirements.txt` file is present there, it is used to install dependencies for MozillaTTS. This can be used to exclude TensorFlow, etc., or to use specific package versions. |
103 |
| - |
104 |
| -### Use Docker buildx |
105 |
| - |
106 |
| -To use `buildx`, you'll need to [enable experimental features](https://docs.docker.com/buildx/working-with-buildx/) in the Docker CLI and then set up a private registry: |
107 | 2 |
|
108 | 3 | ```sh
|
109 |
| -$ docker run -d -p 15555:5000 --name registry --restart=always registry:2 |
110 |
| -``` |
| 4 | +git clone https://github.com/krishnanaredla/docker-tts-api.git |
| 5 | +cd docker-tts-api |
| 6 | +# Unzip the downloaded models and place them in the models folder |
| 7 | +docker build -t mtts:1.0 . |
111 | 8 |
|
112 |
| -This registry runs on port 15555. Next, create a configuration file at `/etc/docker/buildx.conf` with this inside: |
| 9 | +# After the build succeeds |
113 | 10 |
|
114 |
| -``` |
115 |
| -[registry."localhost:15555"] |
116 |
| - http = true |
117 |
| - insecure = true |
118 |
| -``` |
| 11 | +docker run --entrypoint "/bin/bash" -it -p 8081:8081 mtts:1.0 |
| 12 | +root@xx:/app# cd tts_api/ |
| 13 | +root@xx:/app/tts_api# pip3 install --trusted-host pypi.org --trusted-host files.pythonhosted.org pyworld |
| 14 | +root@xx:/app/tts_api# python3 main.py |
119 | 15 |
|
120 |
| -Note the same port number (15555). Finally, run the following commands to create a builder: |
121 |
| - |
122 |
| -```sh |
123 |
| -$ docker run --rm --privileged multiarch/qemu-user-static --reset -p yes |
124 |
| -$ docker buildx create --config /etc/docker/buildx.conf --use --name mybuilder |
125 |
| -$ docker buildx use mybuilder |
126 |
| -$ docker buildx inspect --bootstrap |
127 | 16 | ```
|
128 | 17 |
|
129 |
| -For some reason, these have to be run again **after every reboot** and will sometimes require removing the builder first. |
130 |
| - |
131 |
| -If all is well, you can build for specific platforms like this: |
132 |
| - |
133 |
| -```sh |
134 |
| -$ PLATFORMS=linux/arm/v7 LANGUAGE=en DOCKER_REGISTRY=localhost:15555 scripts/build-docker.sh |
135 |
| -``` |
136 |
| - |
137 |
| -Note that the limiting factor for most platforms is a compiled PyTorch wheel. Pre-built wheels are available [here](https://github.com/synesthesiam/prebuilt-apps/releases) for ARM and PyTorch 1.6.0. Put wheels in the `download` directory before building. |
138 |
| - |
139 |
| - |
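
The removed README text describes an HTTP GET endpoint at `/api/tts` that takes a URL-encoded `text` parameter and returns WAV audio. A minimal Python sketch of that request follows, assuming the `synesthesiam/mozillatts` container from the removed lines is running on `localhost:5002`. The helper names `build_tts_url` and `synthesize` are hypothetical, and nothing in this diff says whether the replacement `mtts:1.0` image exposes the same route on port 8081.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_tts_url(text, base_url="http://localhost:5002"):
    """Return the GET URL for the /api/tts endpoint (hypothetical helper)."""
    # quote_via=quote encodes spaces as %20, matching the README's example URL.
    query = urlencode({"text": text}, quote_via=quote)
    return f"{base_url}/api/tts?{query}"


def synthesize(text, out_path, base_url="http://localhost:5002"):
    """Fetch WAV audio for `text` and write it to `out_path`.

    Assumes a server is listening at `base_url`; returns the byte count.
    """
    with urlopen(build_tts_url(text, base_url)) as response:
        audio = response.read()
    with open(out_path, "wb") as wav_file:
        wav_file.write(audio)
    return len(audio)
```

With the container running, `synthesize("Welcome to the world of speech synthesis!", "welcome.wav")` would save the response to a file instead of piping it to `aplay` as the `curl` examples do.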