diff --git a/README.md b/README.md
index 2b6d237b..ae3f0ea2 100644
--- a/README.md
+++ b/README.md
@@ -94,6 +94,7 @@ A speech-to-speech dialogue model with both low-latency and high intelligence wh
 ## Multimodal Instruction Tuning
 |  Title  |   Venue  |   Date   |   Code   |   Demo   |
 |:--------|:--------:|:--------:|:--------:|:--------:|
+| ![Star](https://img.shields.io/github/stars/ICTNLP/LLaVA-Mini.svg?style=social&label=Star) <br> [**LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token**](https://arxiv.org/pdf/2501.03895) | arXiv | 2025-01-03 | [Github](https://github.com/ictnlp/LLaVA-Mini) | Local Demo |
 | ![Star](https://img.shields.io/github/stars/VITA-MLLM/VITA.svg?style=social&label=Star) <br> [**VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction**](https://arxiv.org/pdf/2501.01957) | arXiv | 2025-01-03 | [Github](https://github.com/VITA-MLLM/VITA) | - |
 | ![Star](https://img.shields.io/github/stars/QwenLM/Qwen2-VL.svg?style=social&label=Star) <br> [**QVQ: To See the World with Wisdom**](https://qwenlm.github.io/blog/qvq-72b-preview/) | Qwen | 2024-12-25 | [Github](https://github.com/QwenLM/Qwen2-VL) | [Demo](https://qwenlm.github.io/blog/qvq-72b-preview/) |
 | ![Star](https://img.shields.io/github/stars/deepseek-ai/DeepSeek-VL2.svg?style=social&label=Star) <br> [**DeepSeek-VL2: Mixture-of-Experts Vision-Language Models for Advanced Multimodal Understanding**](https://arxiv.org/pdf/2412.10302) | arXiv | 2024-12-13 | [Github](https://github.com/deepseek-ai/DeepSeek-VL2) | - |