
wenet-e2e/west


WEST


We Speech Toolkit: an LLM-based speech toolkit for speech understanding, generation, and interaction.

Highlights

  • Fully LLM-based: Standing on the shoulders of giants by reusing mature architectures, ecosystems (e.g., Hugging Face), and methods (e.g., sequence packing) from large models.

  • Full-stack: Supports speech recognition, speech synthesis, speech understanding, dialogue, and multimodal interaction, and can be extended to incorporate open-source models.

  • Simple and Stupid: A simple and stupid speech toolkit that everyone can Touch.
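Sequence packing, mentioned above, concatenates several variable-length samples into one row with a fixed token budget so that less compute is wasted on padding. The sketch below illustrates the general idea with a greedy first-fit-decreasing packer. It is an illustrative assumption, not WEST's actual implementation: the function name `pack_sequences` is hypothetical, and the `cu_seqlens` output simply follows the cumulative-boundary convention used by variable-length attention kernels.

```python
from typing import List, Tuple

# Hypothetical helper for illustration only; not part of the WEST codebase.
def pack_sequences(
    seqs: List[List[int]], max_tokens: int
) -> List[Tuple[List[int], List[int], List[int]]]:
    """Greedily pack token sequences into rows of at most max_tokens.

    For each packed row, returns the flat token ids, position ids that
    restart at 0 for every sequence, and cumulative sequence boundaries
    (cu_seqlens) that mark where each original sequence starts and ends.
    """
    bins: List[List[List[int]]] = []
    loads: List[int] = []
    # First-fit decreasing: place longer sequences first, each into the
    # first row that still has room, otherwise open a new row.
    for seq in sorted(seqs, key=len, reverse=True):
        if len(seq) > max_tokens:
            raise ValueError(
                f"sequence of length {len(seq)} exceeds max_tokens={max_tokens}"
            )
        for i, load in enumerate(loads):
            if load + len(seq) <= max_tokens:
                bins[i].append(seq)
                loads[i] += len(seq)
                break
        else:
            bins.append([seq])
            loads.append(len(seq))

    packed = []
    for group in bins:
        tokens: List[int] = []
        positions: List[int] = []
        cu_seqlens = [0]
        for seq in group:
            tokens.extend(seq)
            positions.extend(range(len(seq)))  # positions restart per sample
            cu_seqlens.append(cu_seqlens[-1] + len(seq))
        packed.append((tokens, positions, cu_seqlens))
    return packed


if __name__ == "__main__":
    rows = pack_sequences([[1, 2, 3], [4, 5], [6, 7, 8, 9], [10]], max_tokens=5)
    for tokens, positions, cu_seqlens in rows:
        print(tokens, positions, cu_seqlens)
```

With `max_tokens=5`, the four samples fit into two full rows of five tokens instead of four padded rows. Resetting the position ids and keeping `cu_seqlens` lets the attention layer avoid mixing tokens from different samples in the same row.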

Install

conda create -n west python=3.10
conda activate west
pip install -r requirements.txt

Supported Tasks and Models

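| Task                   | Model                | Recipe        |
|------------------------|----------------------|---------------|
| Speech Recognition     | TouchASU (built-in)  | aishell       |
| Speech Synthesis       | TouchTTS (built-in)  | libritts      |
| Speech QA              | TouchASU (built-in)  | belle_1.4M_qa |
| Speech Interaction     | TouchChat (built-in) | -             |
| Multimodal Interaction | TouchOmni (built-in) | -             |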

Citation

Our paper is available on arXiv, and you can cite it as:

@misc{zhang2025westllmbasedspeech,
      title={WEST: LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction},
      author={Binbin Zhang and Chengdong Liang and Shuai Wang and Xuelong Geng and Zhao Guo and Haoyu Li and Hao Yin and Xipeng Yang and Pengshen Zhang and Changwei Ma and Lei Xie},
      year={2025},
      eprint={2509.19902},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2509.19902},
}

Discussion & Communication

We have created a WeChat group for discussion and faster responses. Scan the personal QR code on the left to reach the person who will invite you to the group. You can also scan the QR code on the right to follow the official account of the WeNet Community.
