This repository contains the papers, datasets, and resources associated with the systematic review titled "Data-centric Approaches to Boosting MLLMs (Multimodal Large Language Models): A Systematic Review." The review focuses on exploring various data-centric strategies employed to enhance the performance, robustness, and safety of MLLMs across different domains.
Multimodal Large Language Models (MLLMs) are emerging as powerful tools for handling complex tasks that involve both visual and textual information. However, their success heavily depends on the quality and diversity of the data used during training and fine-tuning. This repository collects papers and resources related to data-centric approaches that focus on optimizing, curating, and utilizing data to improve MLLM performance in various applications such as vision-language understanding, visual question answering, and safety-critical tasks.
Our goal is to systematically review and categorize recent works that highlight data-centric methodologies for:
- Improving model generalization
- Enhancing multimodal alignment
- Ensuring safety and fairness
- Handling data noise and imbalance
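As a concrete illustration of the last point, one common data-centric technique is to counter category imbalance with inverse-frequency sampling weights, so rare example types are drawn as often as common ones during training. The sketch below is hypothetical (the function name and the category labels are assumptions, not from any reviewed paper):

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Compute a per-example sampling weight so that rare
    categories are drawn roughly as often as common ones."""
    counts = Counter(labels)
    return [1.0 / counts[label] for label in labels]

# Toy example: "photo" dominates, so each photo gets a smaller weight.
labels = ["chart", "photo", "photo", "photo", "diagram"]
weights = inverse_frequency_weights(labels)
# weights -> [1.0, 1/3, 1/3, 1/3, 1.0]
```

These weights can then be fed to a weighted sampler (e.g. PyTorch's `WeightedRandomSampler`) when building training batches.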
```
📁 Data-centric-Approaches-MLLMs
│
├── 📁 papers/        # Folder containing the key papers reviewed
│   ├── paper1.pdf
│   ├── paper2.pdf
│   └── ...
│
├── 📁 datasets/      # Links and references to relevant datasets
│   ├── dataset1_info.md
│   ├── dataset2_info.md
│   └── ...
│
├── 📁 scripts/       # Useful scripts for processing datasets or experiments
│   └── process_data.py
│
└── README.md         # You are here!
```
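A typical first step in a dataset-processing script such as the `scripts/process_data.py` entry above is removing exact-duplicate image-caption pairs before training. The following is only a minimal sketch of that idea (the function name and sample data are assumptions, not the repository's actual script):

```python
import hashlib

def deduplicate_pairs(pairs):
    """Drop exact-duplicate (image_path, caption) pairs,
    keeping the first occurrence of each."""
    seen = set()
    unique = []
    for image_path, caption in pairs:
        # Hash the pair so the seen-set stays small even for long captions.
        key = hashlib.sha256(f"{image_path}|{caption}".encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append((image_path, caption))
    return unique

pairs = [("a.jpg", "a cat"), ("a.jpg", "a cat"), ("b.jpg", "a dog")]
cleaned = deduplicate_pairs(pairs)
# cleaned -> [("a.jpg", "a cat"), ("b.jpg", "a dog")]
```

Near-duplicate detection (e.g. embedding similarity) is a common follow-up, but exact matching alone already removes a surprising amount of web-scraped redundancy.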
Here is a selection of key papers reviewed in this repository:
- **Paper Title 1**
  Authors: A. Author, B. Author
  Summary: A comprehensive overview of data-centric techniques for improving vision-language alignment in MLLMs.
  Link to paper

- **Paper Title 2**
  Authors: C. Author, D. Author
  Summary: This paper focuses on noise-robust training methodologies for multimodal models by curating high-quality datasets.
  Link to paper

- **Paper Title 3**
  Authors: E. Author, F. Author
  Summary: A study on the impact of fine-tuning with curated safety datasets to prevent harmful content generation in MLLMs.
  Link to paper
For a full list of papers, see the `papers/` directory.
We review and compile several datasets relevant to data-centric MLLM training and evaluation. Some examples include:
- **Dataset Name 1**
  Description: A large-scale multimodal dataset for image-captioning tasks.
  Link to dataset

- **Dataset Name 2**
  Description: A benchmark dataset for evaluating safety and fairness in vision-language models.
  Link to dataset
Refer to the `datasets/` directory for detailed descriptions and links.
Contributions are welcome! If you have relevant papers, datasets, or scripts that align with the theme of this repository, feel free to open a pull request or raise an issue.
Please make sure to follow these guidelines when contributing:
1. Fork the repository.
2. Create a new branch for your changes.
3. Ensure your contribution is relevant to data-centric approaches in MLLMs.
4. Submit a pull request for review.
This repository is licensed under the MIT License. See the LICENSE file for more details.