# <img width="60" alt="image" src="https://github.com/OpenGVLab/InternVL/assets/47669167/7037290e-f474-4d11-b90f-1d8316087bf8"> InternVL Family: Closing the Gap to Commercial Multimodal Models with Open-Source Suites —— A Pioneering Open-Source Alternative to GPT-4o

<div align="center">
  <img width="500" alt="image" src="https://github.com/user-attachments/assets/930e6814-8a9f-43e1-a284-118a5732daa4">
  <br>
</div>

[\[🆕 Blog\]](https://internvl.github.io/blog/) [\[🤔 FAQs\]](https://internvl.readthedocs.io/en/latest/tutorials/faqs.html) [\[🚀 InternVL2 Blog\]](https://internvl.github.io/blog/2024-07-02-InternVL-2.0/) [\[🗨️ Chat Demo\]](https://internvl.opengvlab.com/) [\[🤗 HF Demo\]](https://huggingface.co/spaces/OpenGVLab/InternVL) [\[📖 Document\]](https://internvl.readthedocs.io/en/latest/) [\[🌐 API\]](https://internvl.readthedocs.io/en/latest/get_started/internvl_chat_api.html) [\[🚀 Quick Start\]](#quick-start-with-huggingface)

[\[🔥 Mini-InternVL Report\]](https://arxiv.org/abs/2410.16261) [\[📜 InternVL 1.5 Report\]](https://arxiv.org/abs/2404.16821) [\[📜 InternVL 1.0 Paper\]](https://arxiv.org/abs/2312.14238)

[\[📖 2.0 中文解读\]](https://zhuanlan.zhihu.com/p/706547971) [\[📖 1.5 中文解读\]](https://zhuanlan.zhihu.com/p/699439759) [\[📖 1.0 中文解读\]](https://zhuanlan.zhihu.com/p/702946079)

[Switch to the Chinese version (切换至中文版)](/README_zh.md)

## News 🚀🚀🚀
- `2024/10/21`: We release the Mini-InternVL series, which includes three chat models: __Mini-InternVL-1B__, __Mini-InternVL-2B__, and __Mini-InternVL-4B__. These models achieve impressive performance with minimal size: the 4B model achieves 90% of the performance of much larger models with just 5% of the model size. For more details, please check our [project page](https://github.com/OpenGVLab/InternVL/tree/main/internvl_chat/shell/mini_internvl) and [document](https://internvl.readthedocs.io/en/latest/internvl2.0/domain_adaptation.html).
- `2024/08/01`: The [Chartmimic](https://chartmimic.github.io/) team evaluated the InternVL2 series models on their benchmark. The InternVL2-26B and 76B models achieved the top two performances among open-source models, with the InternVL2 76B model surpassing GeminiProVision and exhibiting results comparable to Claude-3-Opus.
- `2024/08/01`: InternVL2-Pro achieved SOTA performance among open-source models on the [CharXiv](https://charxiv.github.io/#leaderboard) dataset, surpassing many closed-source models such as GPT-4V, Gemini 1.5 Flash, and Claude 3 Sonnet.
- `2024/07/24`: The [MLVU](https://github.com/JUNJIE99/MLVU) team evaluated InternVL-1.5 on their benchmark. The average performance on the multiple-choice task was 50.4%, while the performance on the generative tasks was 4.02. The performance on the multiple-choice task ranked #1 among all open-source MLLMs.
- `2024/07/18`: 🔥🔥 InternVL2-40B achieved SOTA performance among open-source models on the [Video-MME](https://github.com/BradyFU/Video-MME) dataset, scoring 61.2 with 16 input frames and 64.4 with 32 input frames. It significantly outperforms other open-source models and is the closest open-source model to GPT-4o mini.
- `2024/07/18`: 🔥 InternVL2-Pro achieved SOTA performance on the [DocVQA](https://rrc.cvc.uab.es/?ch=17&com=evaluation&task=1) and [InfoVQA](https://rrc.cvc.uab.es/?ch=17&com=evaluation&task=3) benchmarks.
- `2024/05/13`: InternVL 1.0 can now be used as the [text encoder](https://huggingface.co/OpenGVLab/InternVL-14B-224px) for diffusion models to natively support multilingual generation in over 110 languages. See [MuLan](https://github.com/mulanai/MuLan) for more details.
- `2024/04/18`: InternVL-Chat-V1-5 has been released at [HF link](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5), approaching the performance of GPT-4V and Gemini Pro on various benchmarks such as MMMU, DocVQA, ChartQA, and MathVista (a minimal loading sketch is shown after this list).
- `2024/02/27`: InternVL is accepted by CVPR 2024 (Oral)! 🎉
- `2024/02/24`: InternVL-Chat models have been included in [VLMEvalKit](https://github.com/open-compass/VLMEvalKit).
- `2024/02/21`: [InternVL-Chat-V1-2-Plus](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2-Plus) achieved SOTA performance on MathVista (59.9), MMBench (83.8), and MMVP (58.7). See our [blog](https://internvl.github.io/blog/2024-02-21-InternVL-1.2/) for more details.
- `2024/02/12`: InternVL-Chat-V1-2 has been released. It achieves 51.6 on MMMU val and 82.3 on MMBench test. For more details, please refer to our [blog](https://internvl.github.io/blog/2024-02-21-InternVL-1.2/) and [SFT data](./internvl_chat#prepare-training-datasets). The model is now available on [HuggingFace](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-2), and both the training/evaluation data and scripts are open-sourced.
- `2024/01/24`: InternVL-Chat-V1-1 is released. It supports Chinese and has stronger OCR capability; see [here](https://huggingface.co/OpenGVLab/InternVL-Chat-V1-1).
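
As a quick reference for the HuggingFace releases listed above (for example, InternVL-Chat-V1-5 or an InternVL2 checkpoint), the sketch below shows a minimal text-only chat session. It assumes the `trust_remote_code` interface shipped with the official model repositories, i.e. a custom `model.chat(tokenizer, pixel_values, question, generation_config)` method; the exact signature, recommended dtype, and preprocessing helpers can vary between model versions, so treat the model cards and the Quick Start section as the authoritative reference.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Any InternVL chat checkpoint on HuggingFace should work here,
# e.g. "OpenGVLab/InternVL-Chat-V1-5" or "OpenGVLab/InternVL2-8B".
path = "OpenGVLab/InternVL2-8B"

# trust_remote_code is required because the chat interface is implemented
# as custom modeling code inside the model repository.
model = AutoModel.from_pretrained(
    path,
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
).eval().cuda()
tokenizer = AutoTokenizer.from_pretrained(path, trust_remote_code=True, use_fast=False)

generation_config = dict(max_new_tokens=512, do_sample=False)

# Pure-text conversation: passing None for pixel_values means no image is attached.
question = "Hello, who are you?"
response = model.chat(tokenizer, None, question, generation_config)
print(response)
```

For image or video inputs, the model cards describe the tiling-based preprocessing that produces the `pixel_values` tensor passed in place of `None` above.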