
Commit 6390127

add TensorRT-LLM backend
1 parent 15419cf · commit 6390127

34 files changed: +774 −356 lines changed

README.MD

Lines changed: 10 additions & 10 deletions
@@ -18,16 +18,16 @@
 
 ## ✨ Feature Highlights
 
-|     | Feature | Description |
-|-----|-------------|----------------------------------------------------|
-| 🚀 | **Multi-backend Acceleration** | Supports multiple high-performance inference engines such as `vllm`, `sglang`, `llama-cpp`, and `mlx-lm` |
-| 🎯 | **High Concurrency** | Dynamic batching and asynchronous queues handle heavy traffic with ease |
-| 🎛️ | **Full Parameter Control** | Adjust pitch, speaking rate, temperature, emotion tags, and more |
-| 📱 | **Lightweight Deployment** | Built on FastAPI; start with a single command; minimal dependencies |
-| 🔊 | **Long-form Synthesis** | Supports very long texts with a consistent voice throughout |
-| 🔄 | **Streaming TTS** | Play audio while it is being generated, reducing wait time and improving interactivity |
-| 🎭 | **Multi-character Dialog** | Synthesize multiple roles within the same text; ideal for script dubbing |
-| 🎨 | **Modern Frontend** | Web-ready interface |
+|     | Feature | Description |
+|-----|-------------|-------------------------------------------------------------------|
+| 🚀 | **Multi-backend Acceleration** | Supports multiple high-performance inference engines such as `vllm`, `sglang`, `llama-cpp`, `mlx-lm`, and `tensorrt-llm` |
+| 🎯 | **High Concurrency** | Dynamic batching and asynchronous queues handle heavy traffic with ease |
+| 🎛️ | **Full Parameter Control** | Adjust pitch, speaking rate, temperature, emotion tags, and more |
+| 📱 | **Lightweight Deployment** | Built on FastAPI; start with a single command; minimal dependencies |
+| 🔊 | **Long-form Synthesis** | Supports very long texts with a consistent voice throughout |
+| 🔄 | **Streaming TTS** | Play audio while it is being generated, reducing wait time and improving interactivity |
+| 🎭 | **Multi-character Dialog** | Synthesize multiple roles within the same text; ideal for script dubbing |
+| 🎨 | **Modern Frontend** | Web-ready interface |
 
 ## 🖼️ Frontend Demo

README_EN.MD

Lines changed: 11 additions & 10 deletions
@@ -19,16 +19,16 @@
 
 ## ✨ Highlights
 
-|     | Feature                        | Description |
-|-----|--------------------------------|------------------------------------------------------------------------------------------------|
-| 🚀  | **Multi-backend Acceleration** | Supports high-performance inference engines like `vllm`, `sglang`, `llama-cpp`, `mlx-lm`, etc. |
-| 🎯  | **High Concurrency**           | Dynamic batching and asynchronous queues to handle heavy traffic with ease |
-| 🎛️ | **Full Parameter Control**     | Adjust pitch, speaking rate, temperature, emotion tags, and more |
-| 📱  | **Lightweight Deployment**     | Built on FastAPI—start with a single command; minimal dependencies |
-| 🔊  | **Long-form Synthesis**        | Supports very long texts while maintaining consistent voice quality |
-| 🔄  | **Streaming TTS**              | Generate and play audio in real time; reduces wait time, enhances interactivity |
-| 🎭  | **Multi-character Dialog**     | Synthesize multiple roles within the same text—ideal for script dubbing |
-| 🎨  | **Modern Frontend**            | Web-ready, responsive interface |
+|     | Feature                        | Description |
+|-----|--------------------------------|---------------------------------------------------------------------------------------------------------------|
+| 🚀  | **Multi-backend Acceleration** | Supports high-performance inference engines like `vllm`, `sglang`, `llama-cpp`, `mlx-lm`, `tensorrt-llm`, etc. |
+| 🎯  | **High Concurrency**           | Dynamic batching and asynchronous queues to handle heavy traffic with ease |
+| 🎛️ | **Full Parameter Control**     | Adjust pitch, speaking rate, temperature, emotion tags, and more |
+| 📱  | **Lightweight Deployment**     | Built on FastAPI—start with a single command; minimal dependencies |
+| 🔊  | **Long-form Synthesis**        | Supports very long texts while maintaining consistent voice quality |
+| 🔄  | **Streaming TTS**              | Generate and play audio in real time; reduces wait time, enhances interactivity |
+| 🎭  | **Multi-character Dialog**     | Synthesize multiple roles within the same text—ideal for script dubbing |
+| 🎨  | **Modern Frontend**            | Web-ready, responsive interface |
 
 ## 🖼️ Frontend Demo

@@ -134,6 +134,7 @@ pip install flashtts
 
 For detailed installation steps, please refer to: [installation guide](docs/zh/get_started/installation.md)
 
 Local inference command:
+
 ```bash
 flashtts infer \
   -i "hello world." \

docs/en/get_started/installation.md

Lines changed: 142 additions & 65 deletions
@@ -1,95 +1,172 @@
 ## Flash-TTS Installation Guide
 
 > This document provides a detailed walkthrough for installing and deploying the Flash-TTS inference engine, including
-> environment requirements, model weight downloads, and dependency installation.
+> environment requirements, model weight downloads, and dependency installation steps.
 
 ---
 
 ### Environment Requirements
 
-- **Python**: 3.10+
-- **Operating System**: Linux x86_64, macOS, or Windows (WSL2 recommended)
-- **Required Dependencies**:
-  - `fastapi`
-  - One of the supported inference backends: `vllm`, `sglang`, `llama-cpp-python`, `mlx-lm`
+* **Python**: Version 3.10 or above
+* **Operating System**: Linux x86_64, macOS, or Windows (WSL2 is recommended)
+* **Required Dependencies**:
+
+  * `fastapi`
+  * At least one inference backend: `vllm`, `sglang`, `llama-cpp-python`, `mlx-lm`, or `tensorrt-llm`
 
 ---
 
-### Downloading Model Weights
+### Model Weight Downloads
 
-| Model | HuggingFace | ModelScope | GGUF |
-|:-----:|:-----------:|:----------:|:----:|
-| Spark-TTS | [SparkAudio/Spark-TTS-0.5B](https://huggingface.co/SparkAudio/Spark-TTS-0.5B) | [SparkAudio/Spark-TTS-0.5B](https://modelscope.cn/models/SparkAudio/Spark-TTS-0.5B) | [SparkTTS-LLM-GGUF](https://huggingface.co/mradermacher/SparkTTS-LLM-GGUF) |
-| Orpheus-TTS | [canopylabs/orpheus-3b-0.1-ft](https://huggingface.co/canopylabs/orpheus-3b-0.1-ft) & [hubertsiuzdak/snac_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | [canopylabs/orpheus-3b-0.1-ft](https://modelscope.cn/models/canopylabs/orpheus-3b-0.1-ft) | [orpheus-gguf](https://huggingface.co/isaiahbjork/orpheus-3b-0.1-ft-Q4_K_M-GGUF) |
-| Orpheus-TTS (Multilingual) | [orpheus-multilingual-research-release](https://huggingface.co/collections/canopylabs/orpheus-multilingual-research-release-67f5894cd16794db163786ba) & [hubertsiuzdak/snac_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | - | - |
-| MegaTTS3 | [ByteDance/MegaTTS3](https://huggingface.co/ByteDance/MegaTTS3) | - | - |
+| Model | Hugging Face | ModelScope | GGUF |
+|:-----:|:------------:|:----------:|:----:|
+| Spark-TTS | [SparkAudio/Spark-TTS-0.5B](https://huggingface.co/SparkAudio/Spark-TTS-0.5B) | [SparkAudio/Spark-TTS-0.5B](https://modelscope.cn/models/SparkAudio/Spark-TTS-0.5B) | [SparkTTS-LLM-GGUF](https://huggingface.co/mradermacher/SparkTTS-LLM-GGUF) |
+| Orpheus-TTS | [canopylabs/orpheus-3b-0.1-ft](https://huggingface.co/canopylabs/orpheus-3b-0.1-ft) & [hubertsiuzdak/snac\_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | [canopylabs/orpheus-3b-0.1-ft](https://modelscope.cn/models/canopylabs/orpheus-3b-0.1-ft) | [orpheus-gguf](https://huggingface.co/isaiahbjork/orpheus-3b-0.1-ft-Q4_K_M-GGUF) |
+| Orpheus-TTS (Multilingual) | [orpheus-multilingual-research-release](https://huggingface.co/collections/canopylabs/orpheus-multilingual-research-release-67f5894cd16794db163786ba) & [hubertsiuzdak/snac\_24khz](https://huggingface.co/hubertsiuzdak/snac_24khz) | - | - |
+| MegaTTS3 | [ByteDance/MegaTTS3](https://huggingface.co/ByteDance/MegaTTS3) | - | - |
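The guide lists the weight repositories but no download command. As a convenience, a minimal sketch using the Hugging Face CLI; the tool choice is an assumption, since the guide itself does not prescribe one:

```bash
# Sketch: download the Spark-TTS weights from the table above with the
# Hugging Face CLI; any equivalent download method works just as well.
pip install "huggingface_hub[cli]"
huggingface-cli download SparkAudio/Spark-TTS-0.5B --local-dir Spark-TTS-0.5B
```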

 ---
 
-### Installing Dependencies
-
-#### 1. Install `torch` and `torchaudio`
+### Dependency Installation
 
-Visit the [official PyTorch website](https://pytorch.org/get-started/locally/) to find the correct install command for
-your environment. Make sure to check your CUDA version and other device details.
+#### 1. Install PyTorch
 
-For example, with CUDA 12.4:
+Visit the [official PyTorch website](https://pytorch.org/get-started/locally/) to get the installation command suitable
+for your system and CUDA version.
+For example, for CUDA 12.4:
 
 ```bash
 pip install torch==2.6.0 torchaudio==2.6.0 --index-url https://download.pytorch.org/whl/cu124
 ```
 
-#### 2. Install `flashtts`
+#### 2. Install Flash-TTS
+
+* Install via pip:
+
+  ```bash
+  pip install flashtts
+  ```
+
+* Or install from source:
 
-- **pip**
-  ```bash
-  pip install flashtts
-  ```
-
-- **source code**
-  ```bash
-  git clone https://github.com/HuiResearch/FlashTTS.git
-  cd FlashTTS
-  pip install .
-  ```
+  ```bash
+  git clone https://github.com/HuiResearch/FlashTTS.git
+  cd FlashTTS
+  pip install .
+  ```
 
-If you encounter an error installing `WeTextProcessing` in a Windows environment due to the need for a VS C++ compiler, you can first install `pynini==2.1.6` using `conda`:
+> **Windows User Notice**:
+> If you encounter compilation errors when installing `WeTextProcessing`, you can install the dependency via Conda first:
 
 ```bash
 conda install -c conda-forge pynini==2.1.6
 pip install WeTextProcessing==1.0.4.1
 ```
 
#### 3. Install Inference Backend (Choose One)
64-
65-
- **vLLM** (version > 0.7.2)
66-
67-
Default installation command for CUDA 12.4:
68-
```bash
69-
pip install vllm
70-
```
71-
For other CUDA versions, refer to: https://docs.vllm.ai/en/latest/getting_started/installation.html
72-
73-
- **llama-cpp-python**
74-
```bash
75-
pip install llama-cpp-python
76-
```
77-
- If using GGUF weights, place `model.gguf` in the `checkpoints/<model>/LLM/` directory.
78-
- To convert manually:
79-
```bash
80-
git clone https://github.com/ggml-org/llama.cpp.git
81-
cd llama.cpp
82-
python convert_hf_to_gguf.py Spark-TTS-0.5B/LLM --outfile Spark-TTS-0.5B/LLM/model.gguf
83-
```
84-
85-
- **sglang**
86-
```bash
87-
pip install sglang
88-
```
89-
Reference: https://docs.sglang.ai/start/install.html
90-
91-
- **mlx-lm** (for Apple Silicon macOS)
92-
```bash
93-
pip install mlx-lm
94-
```
95-
Reference: https://github.com/ml-explore/mlx-lm
66+
---
67+
68+
### Install Inference Backends (choose as needed)
69+
70+
#### vLLM (Recommended)
71+
72+
* Version ≥ 0.7.2 is required. For CUDA 12.4:
73+
74+
```bash
75+
pip install vllm
76+
```
77+
78+
* For other versions, refer to
79+
the [vLLM official documentation](https://docs.vllm.ai/en/latest/getting_started/installation.html)
80+
81+
---
82+
83+
#### llama-cpp-python
84+
85+
```bash
86+
pip install llama-cpp-python
87+
```
88+
89+
* If using GGUF format weights, place the `model.gguf` file under the `checkpoints/<model>/LLM/` directory.
90+
* To convert weights, use the following commands:
91+
92+
```bash
93+
git clone https://github.com/ggml-org/llama.cpp.git
94+
cd llama.cpp
95+
python convert_hf_to_gguf.py Spark-TTS-0.5B/LLM --outfile Spark-TTS-0.5B/LLM/model.gguf
96+
```
97+
98+
---
99+
100+
#### sglang
101+
102+
```bash
103+
pip install sglang
104+
```
105+
106+
* For more information, refer to the [sglang installation guide](https://docs.sglang.ai/start/install.html)
107+
108+
---
109+
110+
#### mlx-lm (Apple Silicon Only)
111+
112+
```bash
113+
pip install mlx-lm
114+
```
115+
116+
* More info: [mlx-lm GitHub project](https://github.com/ml-explore/mlx-lm)
117+
118+
---
119+
120+
#### TensorRT-LLM
121+
122+
Example for CUDA 12.4:
123+
124+
```bash
125+
pip install tensorrt-llm --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu124
126+
```
127+
128+
> **Notes**:
129+
>
130+
> * TensorRT-LLM on Windows currently supports only Python 3.10.
131+
> * Latest supported version for Windows: 0.16.0 (as of 2025-05-05)
132+
> * Details: [NVIDIA PyPI Repository](https://pypi.nvidia.com)
133+
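Given the notes above, a Windows install would pin the last supported release; a minimal sketch, assuming the 0.16.0 ceiling still holds:

```bash
# Windows (Python 3.10): pin to the last Windows-supported release noted
# above; check the NVIDIA PyPI index for the current version ceiling.
pip install tensorrt-llm==0.16.0 --extra-index-url https://pypi.nvidia.com --extra-index-url https://download.pytorch.org/whl/cu124
```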
134+
+Verify the installation:
+
+```bash
+python -c "import tensorrt_llm; print(tensorrt_llm.__version__)"
+```
+
+##### Convert LLM Weights to a TensorRT Engine
+
+1. Refer to the official model conversion docs:
+   [https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/](https://github.com/NVIDIA/TensorRT-LLM/tree/main/examples/models/core/)
+2. Choose the appropriate model type (e.g., Spark-TTS uses `qwen`).
+3. After conversion, rename the output engine folder to `tensorrt-engine` and move it into the model directory, for example:
+
+   ```bash
+   Spark-TTS-0.5B/LLM/tensorrt-engine
+   ```
+
+Flash-TTS can then load the converted model for inference.
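For Spark-TTS, step 2 points at TensorRT-LLM's Qwen example. A sketch of what the conversion could look like, assuming the example-script layout linked in step 1 (`convert_checkpoint.py` plus `trtllm-build`); script paths and flags shift between TensorRT-LLM releases, so consult the linked docs for the exact invocation:

```bash
# Sketch only: follows the TensorRT-LLM Qwen example; paths and flags vary
# by TensorRT-LLM version (older releases keep it under examples/qwen).
git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM/examples/models/core/qwen

# 1) Convert the Hugging Face checkpoint to TensorRT-LLM checkpoint format.
python convert_checkpoint.py \
    --model_dir /path/to/Spark-TTS-0.5B/LLM \
    --output_dir ./tllm_checkpoint \
    --dtype float16

# 2) Build the engine, writing it where Flash-TTS expects it (see step 3).
trtllm-build \
    --checkpoint_dir ./tllm_checkpoint \
    --output_dir /path/to/Spark-TTS-0.5B/LLM/tensorrt-engine
```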
+---
+
+### Backend Support Matrix
+
+| Inference Backend | Linux | Windows | macOS | Notes |
+|-------------------|:-----:|:-------:|:-----:|------------------------------------------------------|
+| `vllm`            |   ✅   |    ❌    |   ❌   | Linux-only, requires CUDA                            |
+| `sglang`          |   ✅   |    ❌    |   ❌   | Linux-only, supports most GPUs                       |
+| `tensorrt-llm`    |   ✅   |   ⚠️    |   ❌   | Windows supports Python 3.10 only, version ≤ 0.16.0  |
+| `llama-cpp`       |   ✅   |    ✅    |   ✅   | GGUF format supported, cross-platform                |
+| `mlx-lm`          |   ❌   |    ❌    |   ✅   | macOS only (Apple Silicon)                           |
+| `torch`           |   ✅   |    ✅    |   ✅   | Core dependency, supported on all platforms          |
+
+> ⚠️ **Notes**:
+>
+> * On Windows, **WSL2** is recommended for full Linux feature support.
+> * On macOS, `mlx-lm` is not available on non-Apple Silicon machines.

docs/en/get_started/quick_start.md

Lines changed: 1 addition & 0 deletions
@@ -36,6 +36,7 @@ flashtts infer \
 | `-b, --backend` | `str` |  | Yes | Inference backend: `llama-cpp, vllm, sglang, mlx-lm, torch` |
 | `--lang` | `str` | `None` | No | Language type for OrpheusTTS, e.g., `mandarin, english, french`, etc. |
 | `--snac_path` | `str` | `None` | No | Path to SNAC module for OrpheusTTS |
+| `--llm_tensorrt_path` | `str` | `None` | No | Path to the TensorRT model. Only effective when the backend is set to `tensorrt-llm`. If not provided, defaults to `{model_path}/tensorrt-engine` |
 | `--llm_device` | `str` | `auto` | No | Device for LLM computation: `cpu` or `cuda` |
 | `--tokenizer_device` | `str` | `auto` | No | Device for audio tokenizer |
 | `--detokenizer_device` | `str` | `auto` | No | Device for audio detokenizer |
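Combined with the new parameter, a hypothetical `flashtts infer` call on the `tensorrt-llm` backend could look like the following; the model-path flag (written as `-m` here) is assumed from the quick-start guide rather than shown in this diff:

```bash
# Hypothetical invocation: -m (model path) is assumed; when
# --llm_tensorrt_path is omitted, {model_path}/tensorrt-engine is used.
flashtts infer \
  -i "hello world." \
  -m Spark-TTS-0.5B \
  -b tensorrt-llm \
  --llm_tensorrt_path Spark-TTS-0.5B/LLM/tensorrt-engine
```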
