
SALMONN family: A suite of advanced multi-modal LLMs

🚀🚀 Welcome to the repo of SALMONN!

The SALMONN family is a series of advanced multi-modal large language models. For more details on each model, please refer to the corresponding branch of this repository.

🔥 News

  • [2025-07-08] We have open-sourced video-SALMONN 2, a powerful audio-visual LLM that generates high-quality audio-visual video captions and achieves competitive performance on general video QA benchmarks.
  • [2025-06-01] We have open-sourced the QualiSpeech dataset, a speech quality assessment dataset with natural language reasoning. You can use QualiSpeech to develop your own audio LLM for speech quality assessment or to evaluate the low-level speech perception capabilities of existing audio LLMs. Feel free to download it here!
  • [2025-03-03] We have released the data processing scripts and fine-tuned model checkpoints for SALMONN-based speech quality assessment! See here!
  • [2024-09-04] We have released the model and inference code for video-SALMONN! See here!
  • [2024-05-28] 🧳 We have released all the annotations (including 600k SQA/AQA examples and 50k audio-based storytelling examples) for the 3-stage training of SALMONN! Feel free to download them here!
  • [2024-04-07] 🤖 We have released all the code you need to train your own SALMONN! Try some cool things!
  • [2024-01-16] 💖 Our paper was accepted by ICLR 2024!
  • [2023-11-13] 🎁 We have released a 7B version of SALMONN at tsinghua-ee/SALMONN-7B and built the 7B demo here! (A download sketch follows this list.)
  • [2023-10-08] ✨ We have released the model checkpoint and the inference code for SALMONN-13B!
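
As a convenience, here is a minimal sketch of fetching the released checkpoints programmatically. It assumes the huggingface_hub package is installed; the tsinghua-ee/SALMONN-7B repo id is taken from the news item above, while the QualiSpeech dataset id in the final comment is a placeholder, not an actual id.

# A minimal download sketch, assuming `pip install huggingface_hub`.
from huggingface_hub import snapshot_download

# Repo id from the 2023-11-13 news item above.
ckpt_dir = snapshot_download(repo_id="tsinghua-ee/SALMONN-7B")
print(f"SALMONN-7B checkpoint downloaded to: {ckpt_dir}")

# Hypothetical: the QualiSpeech dataset id is not stated in this README;
# replace the placeholder with the actual id from the 2025-06-01 news link.
# data_dir = snapshot_download(repo_id="<org>/QualiSpeech", repo_type="dataset")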

📖 Paper List

@article{tang2025video,
  title={{video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models}},
  author={Changli Tang and Yixuan Li and Yudong Yang and Jimin Zhuang and Guangzhi Sun and Wei Li and Zejun Ma and Chao Zhang},
  journal={arXiv preprint arXiv:2506.15220},
  year={2025}
}

@inproceedings{wang2024enabling,
  title={Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation},
  author={Wang, Siyin and Yu, Wenyi and Yang, Yudong and Tang, Changli and Li, Yixuan and Zhuang, Jimin and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ICASSP},
  address={Hyderabad},
  year={2025}
}

@inproceedings{wang2025qualispeech,
  title={QualiSpeech: A Speech Quality Assessment Dataset with Natural Language Reasoning and Descriptions},
  author={Wang, Siyin and Yu, Wenyi and Chen, Xianzhao and Tian, Xiaohai and Zhang, Jun and Sun, Guangzhi and others},
  booktitle={Proc. ACL},
  address={Vienna},
  year={2025}
}

@inproceedings{sun2024videosalmonn,
  title={video-{SALMONN}: Speech-Enhanced Audio-Visual Large Language Models},
  author={Guangzhi Sun and Wenyi Yu and Changli Tang and Xianzhao Chen and Tian Tan and Wei Li and Lu Lu and Zejun Ma and Yuxuan Wang and Chao Zhang},
  booktitle={Forty-first International Conference on Machine Learning},
  year={2024},
  url={https://openreview.net/forum?id=nYsh5GFIqX}
}

@inproceedings{tang2024salmonn,
  title={SALMONN: Towards Generic Hearing Abilities for Large Language Models},
  author={Changli Tang and Wenyi Yu and Guangzhi Sun and Xianzhao Chen and Tian Tan and Wei Li and Lu Lu and Zejun Ma and Chao Zhang},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=14rn7HpKVk}
}