## Flash-TTS Installation Guide

> This document provides a detailed walkthrough for installing and deploying the Flash-TTS inference engine, including
> environment requirements, model weight downloads, and dependency installation steps.

---

### Environment Requirements

* **Python**: Version 3.10 or above
* **Operating System**: Linux x86\_64, macOS, or Windows (WSL2 is recommended)
* **Required Dependencies**:

  * `fastapi`
  * At least one inference backend: `vllm`, `sglang`, `llama-cpp-python`, `mlx-lm`, or `tensorrt-llm`

---

### Model Weight Downloads

| Model | Hugging Face | ModelScope | GGUF |
|:--------------------------:|:---:|:---:|:---:|
| Spark-TTS | [SparkAudio/Spark-TTS-0.5B](https://huggingface.co/SparkAudio/Spark-TTS-0.5B) | [SparkAudio/Spark-TTS-0.5B](https://modelscope.cn/models/SparkAudio/Spark-TTS-0.5B) | [SparkTTS-LLM-GGUF](https://huggingface.co/mradermacher/SparkTTS-LLM-GGUF) |
| Orpheus-TTS | [canopylabs/orpheus-3b-0.1-ft](https://huggingface.co/canopylabs/orpheus-3b-0.1-ft) & [hubertsiuzdak/snac\_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | [canopylabs/orpheus-3b-0.1-ft](https://modelscope.cn/models/canopylabs/orpheus-3b-0.1-ft) | [orpheus-gguf](https://huggingface.co/isaiahbjork/orpheus-3b-0.1-ft-Q4_K_M-GGUF) |
| Orpheus-TTS (Multilingual) | [orpheus-multilingual-research-release](https://huggingface.co/collections/canopylabs/orpheus-multilingual-research-release-67f5894cd16794db163786ba) & [hubertsiuzdak/snac\_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | - | - |
| MegaTTS3 | [ByteDance/MegaTTS3](https://huggingface.co/ByteDance/MegaTTS3) | - | - |
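
For example, to fetch the Spark-TTS weights from Hugging Face into a local `checkpoints/` directory (the target path here is illustrative; any directory works as long as your configuration points to it):

```bash
# Install the Hugging Face CLI, then download the repository locally
pip install -U "huggingface_hub[cli]"
huggingface-cli download SparkAudio/Spark-TTS-0.5B --local-dir checkpoints/Spark-TTS-0.5B
```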

---

### Dependency Installation

#### 1. Install PyTorch

Visit the [official PyTorch website](https://pytorch.org/get-started/locally/) to get the installation command suitable for your system and CUDA version.
For example, for CUDA 12.4:

```bash
pip install torch==2.6.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
```
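
After installation, a quick check that PyTorch was installed correctly and sees your GPU:

```bash
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```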

#### 2. Install Flash-TTS

* Install via pip:

```bash
pip install flashtts
```

* Or install from source:

```bash
git clone https://github.com/HuiResearch/FlashTTS.git
cd FlashTTS
pip install .
```
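
A minimal sanity check that the package is importable (this assumes the distribution exposes a `flashtts` module matching the pip package name; consult the project README if the import fails):

```bash
python -c "import flashtts; print('flashtts imported successfully')"
```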

> **Windows User Notice**:
> If you encounter compilation errors when installing `WeTextProcessing` (it requires a Visual C++ compiler to build), you can install its `pynini` dependency via Conda first:

```bash
conda install -c conda-forge pynini==2.1.6
pip install WeTextProcessing==1.0.4.1
```

---

### Install Inference Backends (choose as needed)

#### vLLM (Recommended)

* Version ≥ 0.7.2 is required. For CUDA 12.4:

```bash
pip install vllm
```

* For other CUDA versions, refer to the [vLLM official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html).
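
To confirm the installed version meets the ≥ 0.7.2 requirement:

```bash
python -c "import vllm; print(vllm.__version__)"
```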

---

#### llama-cpp-python

```bash
pip install llama-cpp-python
```

* If using GGUF format weights, place the `model.gguf` file under the `checkpoints/<model>/LLM/` directory.
* To convert Hugging Face weights to GGUF yourself, use the following commands:

```bash
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
python convert_hf_to_gguf.py Spark-TTS-0.5B/LLM --outfile Spark-TTS-0.5B/LLM/model.gguf
```
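
Note that the prebuilt wheel is CPU-only on most platforms. Per the llama-cpp-python build documentation, GPU support is enabled by passing CMake flags at install time (flag names have changed between versions, so double-check the upstream README):

```bash
# Reinstall with CUDA kernels enabled (requires the CUDA toolkit)
CMAKE_ARGS="-DGGML_CUDA=on" pip install llama-cpp-python --force-reinstall --no-cache-dir
```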

---

#### sglang

```bash
pip install sglang
```

* For more information, refer to the [sglang installation guide](https://docs.sglang.ai/start/install.html).

---

#### mlx-lm (Apple Silicon Only)

```bash
pip install mlx-lm
```

* More info: [mlx-lm GitHub project](https://github.com/ml-explore/mlx-lm)
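
On an Apple Silicon Mac, you can confirm that MLX sees the GPU (a quick check, independent of Flash-TTS):

```bash
python -c "import mlx.core as mx; print(mx.default_device())"
```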

---

#### TensorRT-LLM

Example for CUDA 12.4:

```bash
pip install tensorrt-llm --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu124
```

> **Notes**:
>
> * TensorRT-LLM on Windows currently supports only Python 3.10.
> * The latest version supported on Windows is 0.16.0 (as of 2025-05-05).
> * Details: [NVIDIA PyPI Repository](https://pypi.nvidia.com)

Verify the installation:

```bash
python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
```

##### Convert LLM Weights to TensorRT Engine

1. Refer to the official model conversion docs:
   [https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/)

2. Choose the appropriate model type (e.g., Spark-TTS uses `qwen`).

3. After conversion, rename the output engine folder to `tensorrt-engine` and move it into the model directory, for example:

```bash
Spark-TTS-0.5B/LLM/tensorrt-engine
```
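
For orientation, a sketch of the typical two-step conversion for a Qwen-style checkpoint, based on the TensorRT-LLM examples linked above (script paths and flags differ between TensorRT-LLM versions, so treat this as illustrative rather than exact):

```bash
# Step 1: convert the Hugging Face checkpoint to TensorRT-LLM checkpoint format
python examples/models/core/qwen/convert_checkpoint.py \
    --model_dir Spark-TTS-0.5B/LLM \
    --output_dir ./tllm_checkpoint \
    --dtype float16

# Step 2: build the serialized engine from the converted checkpoint
trtllm-build --checkpoint_dir ./tllm_checkpoint \
    --output_dir Spark-TTS-0.5B/LLM/tensorrt-engine
```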

Flash-TTS can now load and run inference from the converted model.

---

### Backend Support Matrix

| Inference Backend | Linux | Windows | macOS | Notes |
|-------------------|:-----:|:-------:|:-----:|-----------------------------------------------------|
| `vllm` | ✅ | ❌ | ❌ | Linux only; requires CUDA |
| `sglang` | ✅ | ❌ | ❌ | Linux only; supports most GPUs |
| `tensorrt-llm` | ✅ | ⚠️ | ❌ | On Windows: Python 3.10 only, version ≤ 0.16.0 |
| `llama-cpp` | ✅ | ✅ | ✅ | Cross-platform; supports GGUF weights |
| `mlx-lm` | ❌ | ❌ | ✅ | macOS only (Apple Silicon) |
| `torch` | ✅ | ✅ | ✅ | Core dependency; supported on all platforms |

> ⚠️ **Notes**:
>
> * On Windows, **WSL2** is recommended for full Linux feature support.
> * On macOS, `mlx-lm` is not available on Intel (non-Apple Silicon) machines.