diff --git a/README.md b/README.md
index 2b6d237b..ae3f0ea2 100644
--- a/README.md
+++ b/README.md
@@ -94,6 +94,7 @@ A speech-to-speech dialogue model with both low-latency and high intelligence wh
## Multimodal Instruction Tuning
| Title | Venue | Date | Code | Demo |
|:--------|:--------:|:--------:|:--------:|:--------:|
+| [**LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token**](https://arxiv.org/pdf/2501.03895) | arXiv | 2025-01-03 | [Github](https://github.com/ictnlp/LLaVA-Mini) | Local Demo |
| [**VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction**](https://arxiv.org/pdf/2501.01957) | arXiv | 2025-01-03 | [Github](https://github.com/VITA-MLLM/VITA) | - |
| [**QVQ: To See the World with Wisdom**](https://qwenlm.github.io/blog/qvq-72b-preview/) | Qwen | 2024-12-25 | [Github](https://github.com/QwenLM/Qwen2-VL) | [Demo](https://qwenlm.github.io/blog/qvq-72b-preview/) |
| [**DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding**](https://arxiv.org/pdf/2412.10302) | arXiv | 2024-12-13 | [Github](https://github.com/deepseek-ai/DeepSeek-VL2) | - |