JoyTTS is an end-to-end spoken chatbot that combines a large language model (LLM) with text-to-speech (TTS) technology and supports voice cloning. The project is built on the open-source MiniCPM-o and CosyVoice2 models and trained on 2000 hours of conversational data. We also provide the complete training code to facilitate further development and optimization by the community. On the seed-tts-zh test set, it achieves an SS (speaker similarity) score of 0.73 and a WER (word error rate) of 5.09.
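Question | 孙悟空 (Sun Wukong) | 猪八戒 (Zhu Bajie) | 林黛玉 (Lin Daiyu) |
---|---|---|---|
Reference audio | default.mov | default.mov | default.mov |
“今天天气怎么样” (What's the weather like today?) | 0.mov | 0.mov | 0.mov |
“中国的全称是什么” (What is the full name of China?) | 1.mov | 1.mov | 1.mov |
“手机都有哪些作用?” (What can a mobile phone be used for?) | 2.mov | 2.mov | 2.mov |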
Model | SS | WER |
---|---|---|
gpt-sovits | 0.55 | 5.13 |
cosyvoice2 | 0.748 | 1.45 |
Minicpm-o | 0.57 | - |
JoyTTS | 0.73 | 5.09 |
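The two metrics in the table can be illustrated with a minimal sketch. It assumes that SS is a cosine similarity between two speaker-embedding vectors and that WER is the token-level edit distance divided by the reference length; the ASR model and speaker-embedding model used in the actual evaluation are not shown here, and the function names below are hypothetical.

```python
from math import sqrt
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref_tokens: Sequence[str], hyp_tokens: Sequence[str]) -> float:
    """Error rate: edit distance normalized by the reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / norm
```

For Chinese text, the tokens would typically be single characters, for example `wer(list(reference_text), list(asr_transcript))`.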
conda create -n JoyTTS -y python=3.10
conda activate JoyTTS
conda install -y -c conda-forge pynini==2.1.5
pip install -r requirements_JoyTTS.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host=mirrors.aliyun.com
cp third_party/deepspeed/elasticity/elastic_agent.py $conda_envs/JoyTTS/lib/python3.10/site-packages/deepspeed/elasticity/elastic_agent.py
huggingface-cli download --resume-download --repo-type model jdh-algo/JoyTTS-v1 --local-dir pretrained_models
- Change the prompt info in pretrained_models/prompt_info.py
- Delete pretrained_models/spk2info.pt
- Run the following command
python inference.py --question '今天天气怎么样?' --spk_id '孙悟空'
python inference.py --question '今天天气怎么样?' --prompt_text 'prompt text' --prompt_wav the_path_of_prompt_wav
The result will be saved in "output.wav"
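The two invocation modes above (a built-in speaker ID, or a prompt text plus prompt audio for cloning) can be wrapped in a small helper. This is only a sketch: the flag names come from the commands above, but the helper `build_inference_cmd` is hypothetical, and treating the two modes as mutually exclusive is an assumption.

```python
import sys
from typing import List, Optional


def build_inference_cmd(question: str,
                        spk_id: Optional[str] = None,
                        prompt_text: Optional[str] = None,
                        prompt_wav: Optional[str] = None) -> List[str]:
    """Build the argument list for inference.py (hypothetical helper)."""
    cmd = [sys.executable, "inference.py", "--question", question]
    has_prompt = prompt_text is not None or prompt_wav is not None
    # Mode 1: a speaker that is already registered.
    if spk_id is not None and not has_prompt:
        return cmd + ["--spk_id", spk_id]
    # Mode 2: voice cloning from a prompt transcript and its audio file.
    if spk_id is None and prompt_text is not None and prompt_wav is not None:
        return cmd + ["--prompt_text", prompt_text, "--prompt_wav", prompt_wav]
    # Assumption: the two modes are not meant to be combined.
    raise ValueError("pass either spk_id, or both prompt_text and prompt_wav")
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root.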
python gradio_demo.py
Example (normalized_data.json) with one sample:
[
  {
    "utt": "0",
    "reference": "数字银行 《数字银行》是2015年12月新世界出版社出版的金融类图书,该书的作者是刘振友。 书 名 数字银行 作 者 刘振友 类 别 金融 出版社 新世界出版社 出版时间 2015年12月 页 数 290 页 定 价 48.8 开 本 16 开 装 帧 平装 ISBN 978-7-5104-5480-6 数字银行内容简介 《数字银行》认为,传统银行若要更好地存续,必须实现自我进化。本书还将带你穿越历史,解读现代金融行业的发展轨道,知古鉴今,把握未来银行的发展思维脉络。 数字银行作者简介 刘振友是全球第一批致力于加密数字资产、数字货币、虚拟货币、商业积分、共赢经济模式、全民免费经济领域研究、应用、推广的资深专家,新锐作家。主要著作有:《颠覆》《3分钟让别人记住你》《互联网+》《数字银行》《让世界免费》。 数字银行作品目录",
    "conversations": [
      {
        "role": "user",
        "content": "请问这本《数字银行》是什么时候出版的?"
      },
      {
        "role": "assistant",
        "content": "这本《数字银行》是2015年12月出版的。它是一本金融类图书,由新世界出版社出版。了解这本书的出版时间可以帮助你了解它的时效性,从而了解该书所描述的内容和观点是否仍然适用于当前的金融环境。"
      },
      {
        "role": "user",
        "content": "作者刘振友都写过哪些其他书籍?"
      },
      {
        "role": "assistant",
        "content": "刘振友是一位新锐作家,他的主要著作有《颠覆》、《3分钟让别人记住你》、《互联网+》、《数字银行》和《让世界免费》。他是全球第一批致力于加密数字资产、数字货币、虚拟货币、商业积分、共赢经济模式、全民免费经济领域研究、应用、推广的资深专家。了解他的其他著作可以帮助你更全面地了解他的思想和研究领域。"
      }
    ]
  }
]
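Before running data preparation, it may help to sanity-check the file. The sketch below assumes normalized_data.json is strict JSON (double-quoted strings, no trailing commas) and that each conversation alternates user and assistant turns, as in the sample above; both assumptions and the function names are illustrative and not taken from the training code.

```python
import json
from typing import Any, Dict, List


def validate_sample(sample: Dict[str, Any]) -> None:
    """Check one sample for the fields shown in the example."""
    for key in ("utt", "reference", "conversations"):
        if key not in sample:
            raise ValueError(f"missing key: {key}")
    turns = sample["conversations"]
    # Assumption: turns come in user/assistant pairs, starting with the user.
    if not turns or len(turns) % 2 != 0:
        raise ValueError(f"utt {sample['utt']}: expected user/assistant pairs")
    for i, turn in enumerate(turns):
        expected = "user" if i % 2 == 0 else "assistant"
        if turn.get("role") != expected:
            raise ValueError(f"utt {sample['utt']}: turn {i} should be {expected}")
        content = turn.get("content")
        if not isinstance(content, str) or not content.strip():
            raise ValueError(f"utt {sample['utt']}: turn {i} has empty content")


def load_samples(text: str) -> List[Dict[str, Any]]:
    """Parse the JSON text and validate every sample in the top-level list."""
    samples = json.loads(text)
    if not isinstance(samples, list):
        raise ValueError("top-level value must be a list of samples")
    for sample in samples:
        validate_sample(sample)
    return samples
```

A typical call would be `load_samples(open("normalized_data.json", encoding="utf-8").read())`.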
To make training of the LLM-TTS model more efficient, we pre-generate the hidden states with the already-trained LLM-Chat model.
cd prepare_data_sever && python sever_llmanswer.py
cd prepare_data_sever && python sever_tts.py
cd prepare_data_generate
python 0_generate_datalist.py --data_root your_data_root --data_name your_data_name
python 1_generate_llmanswer.py --data_root your_data_root --data_name your_data_name
python 2_generate_tts.py --data_root your_data_root --data_name your_data_name
python 3_split_train_eval_offline.py --data_root your_data_root --data_name your_data_name
python 4_make_parquet_list.py --data_root your_data_root --data_name your_data_name
The generated data will be saved in {your_data_root}/parquet/{your_data_name}
cd prepare_data_sever && python sever_tts.py
cd prepare_data_generate_end2end
python 0_generate_datalist.py --data_root your_data_root --data_name your_data_name
python 2_generate_tts.py --data_root your_data_root --data_name your_data_name
python 3_split_train_eval_offline.py --data_root your_data_root --data_name your_data_name
python 4_make_parquet_list.py --data_root your_data_root --data_name your_data_name
The generated data will be saved in {your_data_root}/parquet_uselabel/{your_data_name}
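The numbered scripts in both preparation flows take the same two arguments, so they can be driven from one small runner. The sketch below only assembles the commands listed above; the helper names are hypothetical, and it assumes the required servers are already running and that each script is launched from inside its own directory.

```python
import subprocess
import sys
from typing import Dict, List

# Script order per directory, copied from the commands above.
STEPS: Dict[str, List[str]] = {
    "prepare_data_generate": [
        "0_generate_datalist.py",
        "1_generate_llmanswer.py",
        "2_generate_tts.py",
        "3_split_train_eval_offline.py",
        "4_make_parquet_list.py",
    ],
    "prepare_data_generate_end2end": [
        "0_generate_datalist.py",
        "2_generate_tts.py",
        "3_split_train_eval_offline.py",
        "4_make_parquet_list.py",
    ],
}


def build_commands(workdir: str, data_root: str, data_name: str) -> List[List[str]]:
    """Return one argument list per preparation step, in order."""
    if workdir not in STEPS:
        raise ValueError(f"unknown preparation directory: {workdir}")
    common = ["--data_root", data_root, "--data_name", data_name]
    return [[sys.executable, script] + common for script in STEPS[workdir]]


def run_all(workdir: str, data_root: str, data_name: str) -> None:
    """Run the steps sequentially and stop at the first failure."""
    for cmd in build_commands(workdir, data_root, data_name):
        subprocess.run(cmd, cwd=workdir, check=True)
```

For example, `run_all("prepare_data_generate_end2end", "your_data_root", "your_data_name")` would run the four end-to-end steps in order.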
cd examples/end2end
ln -s your_data_root data
sh run.sh #set stage=1
cd examples/end2end
ln -s your_data_root data
sh run.sh #set stage=2
- training and inference codes.
- trained models and technical report.
- freestyle dialogue model with voice cloning.
- vllm support.
- a better and faster model.
- automatic emotion control.
If you find our work helpful, please consider citing us:
@misc{zhou2025joytts,
title={JoyTTS: LLM-based Spoken Chatbot With Voice Cloning},
author={Fangru Zhou and Jun Zhao and Guoxin Wang},
year={2025},
howpublished = {\url{https://jdh-algo.github.io/JoyTTS}},
}
We would like to thank the contributors to the MiniCPM-o and CosyVoice2 repositories for their open research and extraordinary work.