SuDIS-ZJU/Data-Centric-MLLMs-Enhancements
# Data-centric Approaches to Boosting MLLMs: A Systematic Review

This repository contains the papers, datasets, and resources associated with the systematic review "Data-centric Approaches to Boosting MLLMs (Multimodal Large Language Models): A Systematic Review." The review surveys data-centric strategies for enhancing the performance, robustness, and safety of MLLMs across different domains.

## Table of Contents

- Introduction
- Repository Structure
- Key Papers
- Datasets
- Contributing
- License

## Introduction

Multimodal Large Language Models (MLLMs) are emerging as powerful tools for handling complex tasks that involve both visual and textual information. However, their success heavily depends on the quality and diversity of the data used during training and fine-tuning. This repository collects papers and resources related to data-centric approaches that focus on optimizing, curating, and utilizing data to improve MLLM performance in various applications such as vision-language understanding, visual question answering, and safety-critical tasks.

Our goal is to systematically review and categorize recent works that highlight data-centric methodologies for:

- Improving model generalization
- Enhancing multimodal alignment
- Ensuring safety and fairness
- Handling data noise and imbalance

## Repository Structure

```
📁 Data-centric-Approaches-MLLMs
│
├── 📁 papers/            # Key papers reviewed
│   ├── paper1.pdf
│   ├── paper2.pdf
│   └── ...
├── 📁 datasets/          # Links and references to relevant datasets
│   ├── dataset1_info.md
│   ├── dataset2_info.md
│   └── ...
├── 📁 scripts/           # Useful scripts for processing datasets or experiments
│   └── process_data.py
└── README.md             # You are here!
```
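The contents of `scripts/process_data.py` are not shown here, but a data-curation script of the kind this review surveys might look like the following minimal sketch (the function name, record fields, and thresholds are illustrative assumptions, not the repository's actual code):

```python
import json

def clean_caption_data(records, min_len=5):
    """Filter and deduplicate image-caption records.

    A minimal, hypothetical example of data-centric preprocessing:
    drop very short (likely noisy) captions and remove exact
    (image_id, caption) duplicates.
    """
    seen = set()
    cleaned = []
    for rec in records:
        caption = rec.get("caption", "").strip()
        key = (rec.get("image_id"), caption)
        # Skip captions with too few words, and exact duplicates.
        if len(caption.split()) < min_len or key in seen:
            continue
        seen.add(key)
        cleaned.append({"image_id": rec["image_id"], "caption": caption})
    return cleaned

# Example usage with toy records
raw = [
    {"image_id": "img_001", "caption": "A dog playing fetch in a sunny park."},
    {"image_id": "img_001", "caption": "A dog playing fetch in a sunny park."},  # duplicate
    {"image_id": "img_002", "caption": "blurry"},  # too short, likely noise
]
print(json.dumps(clean_caption_data(raw), indent=2))
```

Real pipelines would add modality-specific checks (image validity, caption-image similarity), but the filter-and-deduplicate pattern above is the common core.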

## Key Papers

Here is a selection of key papers reviewed in this repository:

1. **Paper Title 1**
   Authors: A. Author, B. Author
   Summary: A comprehensive overview of data-centric techniques for improving vision-language alignment in MLLMs. Link to paper

2. **Paper Title 2**
   Authors: C. Author, D. Author
   Summary: Noise-robust training methodologies for multimodal models based on curating high-quality datasets. Link to paper

3. **Paper Title 3**
   Authors: E. Author, F. Author
   Summary: A study of the impact of fine-tuning with curated safety datasets to prevent harmful content generation in MLLMs. Link to paper

For the full list of papers, see the `papers/` directory.

## Datasets

We review and compile several datasets relevant to data-centric MLLM training and evaluation. Some examples include:

1. **Dataset Name 1**
   Description: A large-scale multimodal dataset for image-captioning tasks. Link to dataset

2. **Dataset Name 2**
   Description: A benchmark dataset for evaluating safety and fairness in vision-language models. Link to dataset

Refer to the `datasets/` directory for detailed descriptions and links.

## Contributing

Contributions are welcome! If you have relevant papers, datasets, or scripts that align with the theme of this repository, feel free to open a pull request or raise an issue.

Please make sure to follow these guidelines when contributing:

  1. Fork the repository.
  2. Create a new branch for your changes.
  3. Ensure your contribution is relevant to data-centric approaches in MLLMs.
  4. Submit a pull request for review.
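The four steps above can be simulated locally as follows (repository, branch, and file names are illustrative; in practice you would fork this repository on GitHub and push the branch to your fork before opening a pull request):

```shell
# Local simulation of the contribution workflow.
git init demo-repo && cd demo-repo
git config user.email "contributor@example.com"   # identity for the demo commits
git config user.name "Contributor"
git commit --allow-empty -m "Initial commit"      # stand-in for the forked repo (step 1)
git checkout -b add-new-paper                     # step 2: create a feature branch
echo "- New data-centric MLLM paper entry" >> papers.md
git add papers.md
git commit -m "Add paper entry"                   # step 3: make a relevant change
git log --oneline                                 # step 4: push this branch and open a PR
```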

## License

This repository is licensed under the MIT License. See the LICENSE file for more details.
