Commit 477b143: Update readme (parent: feecc04)

1 file changed: README.md (+121 additions, -29 deletions)
# Thai TTS (TH TTS)

## Model Attribution

All model weights are provided by [VIZINTZOR](https://huggingface.co/VIZINTZOR) via Hugging Face:

- **VITS Thai Female/Male**:
  [MMS-TTS-THAI-FEMALEV2](https://huggingface.co/VIZINTZOR/MMS-TTS-THAI-FEMALEV2),
  [MMS-TTS-THAI-MALEV2](https://huggingface.co/VIZINTZOR/MMS-TTS-THAI-MALEV2)
- **F5-TTS Thai**:
  [F5-TTS-THAI](https://huggingface.co/VIZINTZOR/F5-TTS-THAI),
  [F5-TTS-TH-V2](https://huggingface.co/VIZINTZOR/F5-TTS-TH-V2)

Please acknowledge and cite VIZINTZOR if you use these models in your work.

---

## Recommended Model

**For best quality and performance, use F5-TTS v1.**

---

## How to Run

You can run the server using either direct `uv` commands or the provided `entrypoint.sh` script (recommended for Docker and for switching backends easily).

### 1. Using `uv` Directly

#### VITS Thai (Female/Male)

```bash
uv run python src/wyoming_thai_vits.py --log-level INFO --host 0.0.0.0 --port 10200 \
    --model-id VIZINTZOR/MMS-TTS-THAI-FEMALEV2

uv run python src/wyoming_thai_vits.py --log-level INFO --host 0.0.0.0 --port 10200 \
    --model-id VIZINTZOR/MMS-TTS-THAI-MALEV2
```

#### F5-TTS Thai v1 (**Recommended**)

```bash
uv run python src/wyoming_thai_f5.py --log-level INFO --host 0.0.0.0 --port 10200 \
    --model-version v1
```

#### F5-TTS Thai v2

```bash
uv run python src/wyoming_thai_f5.py --log-level INFO --host 0.0.0.0 --port 10200 \
    --model-version v2
```

### 2. Using `entrypoint.sh` (Recommended)

Set the backend via the `THTTS_BACKEND` environment variable:

- `VITS` for the VITS model
- `F5_V1` for F5-TTS v1 (**recommended**)
- `F5_V2` for F5-TTS v2

Example:

```bash
THTTS_BACKEND=F5_V1 ./entrypoint.sh
```
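For reference, the backend dispatch that `entrypoint.sh` performs can be sketched in Python. The script paths and flags below are taken from the `uv` examples in this README; the `command_for` helper itself is hypothetical, and the real script may assemble its command line differently.

```python
import os

# Hypothetical sketch of the backend dispatch entrypoint.sh performs.
# Script paths and flags mirror the `uv run` examples in this README;
# the actual entrypoint may build its command differently.
COMMANDS = {
    "VITS": ["uv", "run", "python", "src/wyoming_thai_vits.py"],
    "F5_V1": ["uv", "run", "python", "src/wyoming_thai_f5.py", "--model-version", "v1"],
    "F5_V2": ["uv", "run", "python", "src/wyoming_thai_f5.py", "--model-version", "v2"],
}

def command_for(backend: str, host: str = "0.0.0.0", port: str = "10200",
                model: str = "VIZINTZOR/MMS-TTS-THAI-FEMALEV2") -> list[str]:
    """Map a THTTS_BACKEND value to the server command it would launch."""
    if backend not in COMMANDS:
        raise SystemExit(f"Unsupported THTTS_BACKEND: {backend!r}")
    cmd = COMMANDS[backend] + ["--host", host, "--port", port]
    if backend == "VITS":
        cmd += ["--model-id", model]  # VITS selects its voice via --model-id
    return cmd

print(" ".join(command_for(os.environ.get("THTTS_BACKEND", "VITS"))))
```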

You can override other parameters via environment variables (see below).

---

## Environment Variables

| Variable               | Default Value                     | Description                                |
|------------------------|-----------------------------------|--------------------------------------------|
| `THTTS_BACKEND`        | `VITS`                            | Model backend: `VITS`, `F5_V1`, or `F5_V2` |
| `THTTS_HOST`           | `0.0.0.0`                         | Bind address                               |
| `THTTS_PORT`           | `10200`                           | Port to listen on                          |
| `THTTS_LOG_LEVEL`      | `INFO`                            | Log level (`DEBUG`, `INFO`, etc.)          |
| `THTTS_MODEL`          | `VIZINTZOR/MMS-TTS-THAI-FEMALEV2` | VITS model ID                              |
| `THTTS_REF_AUDIO`      | `hf_sample`                       | F5 reference audio path                    |
| `THTTS_REF_TEXT`       | *(empty)*                         | F5 reference transcript                    |
| `THTTS_DEVICE`         | `auto`                            | `auto`, `cpu`, or `cuda`                   |
| `THTTS_SPEED`          | `1.0`                             | F5 speech speed multiplier                 |
| `THTTS_NFE_STEPS`      | `32`                              | F5 denoising steps                         |
| `THTTS_MAX_CONCURRENT` | `1`                               | Max concurrent synth requests              |
| `THTTS_CKPT_FILE`      | *(auto-selected by backend)*      | F5 checkpoint file path                    |
| `THTTS_VOCAB_FILE`     | *(auto-selected by backend)*      | F5 vocab file path                         |

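The precedence is the usual one: an environment value overrides the documented default. A small Python sketch of that resolution (the `resolve_config` helper is illustrative, not part of the project):

```python
import os

# Illustrative sketch of env-over-default resolution for THTTS_* settings.
# Only a subset of the documented variables is shown.
DEFAULTS = {
    "THTTS_BACKEND": "VITS",
    "THTTS_HOST": "0.0.0.0",
    "THTTS_PORT": "10200",
    "THTTS_LOG_LEVEL": "INFO",
    "THTTS_MODEL": "VIZINTZOR/MMS-TTS-THAI-FEMALEV2",
    "THTTS_SPEED": "1.0",
    "THTTS_NFE_STEPS": "32",
    "THTTS_MAX_CONCURRENT": "1",
}

def resolve_config(env=None):
    """Merge an environment mapping over the documented defaults."""
    env = os.environ if env is None else env
    cfg = {k: env.get(k, v) for k, v in DEFAULTS.items()}
    if cfg["THTTS_BACKEND"] not in {"VITS", "F5_V1", "F5_V2"}:
        raise ValueError(f"Unknown backend: {cfg['THTTS_BACKEND']}")
    return cfg

cfg = resolve_config({"THTTS_BACKEND": "F5_V1", "THTTS_NFE_STEPS": "16"})
print(cfg["THTTS_BACKEND"], cfg["THTTS_NFE_STEPS"], cfg["THTTS_PORT"])
# prints: F5_V1 16 10200
```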
## Docker Compose (NVIDIA GPU)

```yaml
services:
  thtts:
    image: ghcr.io/zen3515/thtts:latest
    container_name: thtts
    restart: unless-stopped
    shm_size: "2g" # please adjust
    environment:
      - THTTS_BACKEND=F5_V1
      - THTTS_HOST=0.0.0.0
      - THTTS_PORT=10200
      - THTTS_LOG_LEVEL=INFO
      - THTTS_DEVICE=auto
      - NVIDIA_VISIBLE_DEVICES=all
      - NVIDIA_DRIVER_CAPABILITIES=compute,utility
    ports:
      - "10200:10200"
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

**Note:**

- Make sure you have the [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html) installed.
- Adjust `THTTS_BACKEND` and the other environment variables as needed.

---

## How to Test

### Query Info

```bash
printf '{"type":"describe","data":{}}\n' | nc 127.0.0.1 10200
```
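The same query can be made from Python with the standard library only. The `build_describe_request` helper is illustrative, and the host/port default to the values used in the run examples above:

```python
import json
import socket

# A Python equivalent of the nc one-liner above: send a Wyoming
# "describe" event as a single JSON line and read back the reply line.

def build_describe_request() -> bytes:
    return json.dumps({"type": "describe", "data": {}}).encode() + b"\n"

def describe(host="127.0.0.1", port=10200, timeout=5.0):
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_describe_request())
        reply = sock.makefile("rb").readline()
    return json.loads(reply)

print(build_describe_request())
# b'{"type": "describe", "data": {}}\n'
```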

### Synthesize Speech

For end-to-end synthesis, connect the server to Home Assistant; its Wyoming integration is probably the most spec-compliant client available.
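If you do want to script a test without Home Assistant, the post-processing recipe from an earlier revision of this README (base64 PCM inside `audio-chunk` events, s16le at 22.05 kHz mono, converted to WAV via ffmpeg or sox) can be done in pure Python. The event shape here is assumed from that recipe; verify it against your server's actual output.

```python
import base64
import io
import json
import wave

# Assumed event shape (from an earlier revision of this README):
# ndjson lines with "audio-chunk" events carrying base64 PCM in data.audio.

def events_to_wav(ndjson_lines, rate=22050, channels=1, sampwidth=2):
    """Collect base64 PCM from audio-chunk events and wrap it in a WAV container."""
    pcm = b"".join(
        base64.b64decode(ev["data"]["audio"])
        for ev in map(json.loads, ndjson_lines)
        if ev.get("type") == "audio-chunk"
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)   # mono
        w.setsampwidth(sampwidth)  # s16le -> 2 bytes per sample
        w.setframerate(rate)       # 22.05 kHz
        w.writeframes(pcm)
    return buf.getvalue()

# Synthetic events standing in for a real server response:
lines = [
    json.dumps({"type": "audio-start", "data": {"rate": 22050}}),
    json.dumps({"type": "audio-chunk",
                "data": {"audio": base64.b64encode(b"\x00\x01" * 4).decode()}}),
    json.dumps({"type": "audio-stop", "data": {}}),
]
wav = events_to_wav(lines)
print(wav[:4])  # the bytes begin with the RIFF WAV header
```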

---

## License

See individual model pages on Hugging Face for license details.
