
Mixed Signals: Decoding VLMs' Reasoning and Underlying Bias in Vision-Language Conflict

This repository contains the implementation for the paper "Mixed Signals: Decoding VLMs' Reasoning and Underlying Bias in Vision-Language Conflict".



🛠️ Dataset Construction Process

We created five benchmarks to investigate VLMs' modality bias when visual and textual inputs conflict:

Graph Connectivity

  • We begin with graph samples sourced from an existing dataset (Isobench).
  • For each sample, we manually modify the adjacency matrix in a minimal way, so that the original structural connectivity remains largely intact while a conflicting cue is introduced for the target nodes (see the sketch after this list).
  • The number of edges in each modified graph serves as our measure of complexity.
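
The edit can be sketched as follows. This is a minimal illustration, not the repository's construction script: it assumes undirected graphs stored as 0/1 numpy adjacency matrices, and `flip_edge` / `connected` are hypothetical helper names.

```python
import numpy as np
from collections import deque

def connected(adj: np.ndarray, u: int, v: int) -> bool:
    """BFS reachability check between nodes u and v."""
    seen, queue = {u}, deque([u])
    while queue:
        node = queue.popleft()
        if node == v:
            return True
        for nxt in np.flatnonzero(adj[node]):
            if nxt not in seen:
                seen.add(int(nxt))
                queue.append(int(nxt))
    return False

def flip_edge(adj: np.ndarray, u: int, v: int) -> np.ndarray:
    """Toggle the (u, v) edge, keeping the matrix symmetric."""
    out = adj.copy()
    out[u, v] = out[v, u] = 1 - out[u, v]
    return out

# Toy example: a path 0-1-2. Removing the single edge (1, 2)
# disconnects node 2, so the modified matrix now conflicts with a
# textual claim that nodes 0 and 2 are connected.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
conflict = flip_edge(adj, 1, 2)
print(connected(adj, 0, 2), connected(conflict, 0, 2))  # True False
```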

Function Convexity

  • Using samples from the Isobench dataset, we generate conflicting image-text pairs by altering function expressions.
  • Each function’s coefficients are multiplied by −1, creating a scenario where the visual depiction of the function’s convexity conflicts with its textual description (a sketch follows this list).
  • The complexity is gauged by the character count of the mathematical expression.
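
Why this works: negating every coefficient negates the second derivative, so a convex function becomes concave and vice versa. A minimal sympy sketch (illustrative only, not the repository's generation code):

```python
import sympy as sp

x = sp.Symbol('x')
f = 3*x**2 + 2*x + 1   # convex: f'' = 6 > 0
g = sp.expand(-f)      # every coefficient multiplied by -1

print(sp.diff(f, x, 2))  # 6  -> convex
print(sp.diff(g, x, 2))  # -6 -> concave
# The image would be rendered from g while the text keeps f,
# producing a vision-language conflict about convexity.
```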

Polynomial Roots Calculation

  • For this benchmark, polynomials of degrees 1 through 4 (which have closed-form solutions) are generated by selecting random roots within the range of −10 to 10.
  • These roots form both the polynomial expression and the corresponding visual representation.
  • A conflict is introduced by randomly altering one of the roots, with the polynomial’s degree acting as a proxy for sample complexity (see the sketch below).
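
A hedged sketch of this generation step, assuming integer roots and using numpy's `np.poly` to convert roots into polynomial coefficients (`make_pair` is an illustrative name; the paper's exact script may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_pair(degree: int):
    """Sample roots in [-10, 10], then alter one to create the conflict."""
    roots = rng.integers(-10, 11, size=degree)
    altered = roots.copy()
    idx = rng.integers(degree)
    # Shift the chosen root to a different value in the same range.
    while altered[idx] == roots[idx]:
        altered[idx] = rng.integers(-10, 11)
    # The image is plotted from `roots`; the text is written from `altered`.
    return np.poly(roots), np.poly(altered)

coeffs_image, coeffs_text = make_pair(3)
print(coeffs_image)  # coefficients of the degree-3 polynomial (image side)
print(coeffs_text)   # coefficients with one root altered (text side)
```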

Physics and Chemistry Questions

  • Drawing on samples from the Isobench dataset, we modify the textual descriptions of physics and chemistry problems.
  • These modifications are designed so that the answer derived from the text differs from what is suggested by the visual cues.
  • We filter out any questions that are answerable from the text alone (a sketch of this filter follows the list).
  • Each sample is manually labeled as “easy” or “hard” to indicate its complexity.
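
The filtering step could look roughly like the sketch below; `query_model` is a hypothetical stand-in for a text-only LLM call, and the field names are illustrative rather than the repository's actual schema.

```python
def filter_text_answerable(samples, query_model):
    """Keep only samples whose text alone does NOT reveal the answer.

    `query_model(prompt) -> str` is a hypothetical text-only model call;
    each sample is assumed to be a dict with 'text', 'question', and
    'answer' keys (illustrative field names).
    """
    kept = []
    for s in samples:
        prediction = query_model(f"{s['text']}\n\n{s['question']}")
        if s['answer'].strip().lower() not in prediction.strip().lower():
            kept.append(s)  # not answerable from text alone -> keep
    return kept
```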

Visual Description

  • Starting with matched image-text pairs from the VSR dataset, we generate an extended image description using an AI language model (OpenAI’s GPT-4o mini; see the disclaimer below).
  • We then manually invert specific spatial relationships in the extended description, creating a mismatched pair (illustrated in the sketch below).
  • The performance of VLMs on these conflicts is analyzed based on the specific spatial relation.
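
Although the inversions were performed manually, the kind of edit involved can be illustrated with a small relation-swap table (`SWAPS` and `invert_relation` are illustrative names; real descriptions still required manual review):

```python
# Illustrative antonym table for spatial relations; the actual
# inversions in the benchmark were made and verified by hand.
SWAPS = {
    "left of": "right of",
    "right of": "left of",
    "above": "below",
    "below": "above",
    "in front of": "behind",
    "behind": "in front of",
}

def invert_relation(description: str) -> str:
    """Replace the first spatial relation found with its opposite."""
    for rel, opposite in SWAPS.items():
        if rel in description:
            return description.replace(rel, opposite, 1)
    return description

print(invert_relation("The cat is to the left of the laptop."))
# -> "The cat is to the right of the laptop."
```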

📝 Data Source Attribution

Our benchmarks build upon data derived from two publicly available datasets:

  1. VSR Dataset

  2. Isobench Dataset

Please refer to the respective sources for detailed licensing terms.


🤖 AI-Generated Content Disclaimer

Parts of this dataset, including the extended textual descriptions in the Visual Description benchmark (built on the VSR dataset), were generated using OpenAI's GPT-4o mini model.

  • The content adheres to OpenAI's Usage Policies.
  • Outputs were reviewed and refined to align with the dataset's objectives.
  • No prohibited use cases or violations of OpenAI's terms are present in this dataset.

Please ensure compliance with OpenAI's policies if redistributing or modifying this dataset.


🧠 Usage Guidelines

  • Use this dataset for research and educational purposes.
  • Commercial use may require additional permissions depending on source licenses.

Citation

If you would like to cite our work, please use the following BibTeX:

@article{pezeshkpour2025mixed,
  title={Mixed Signals: Decoding VLMs' Reasoning and Underlying Bias in Vision-Language Conflict},
  author={Pezeshkpour, Pouya and Aminnaseri, Moin and Hruschka, Estevam},
  journal={arXiv preprint arXiv:2504.08974},
  year={2025}
}

📜 Disclosure

Embedded in, or bundled with, this product are open source software (OSS) components, datasets and other third party components identified below. The license terms respectively governing the datasets and third-party components continue to govern those portions, and you agree to those license terms, which, when applicable, specifically limit any distribution. You may receive a copy of, distribute and/or modify any open source code for the OSS component under the terms of their respective licenses, which may be CC license and Apache 2.0 license. In the event of conflicts between Megagon Labs, Inc., license conditions and the Open Source Software license conditions, the Open Source Software conditions shall prevail with respect to the Open Source Software portions of the software. You agree not to, and are not permitted to, distribute actual datasets used with the OSS components listed below. You agree and are limited to distribute only links to datasets from known sources by listing them in the datasets overview table below. You are permitted to distribute derived datasets of data sets from known sources by including links to original dataset source in the datasets overview table below. You agree that any right to modify datasets originating from parties other than Megagon Labs, Inc. are governed by the respective third party’s license conditions. All OSS components and datasets are distributed WITHOUT ANY WARRANTY, without even implied warranty such as for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE, and without any liability to or claim against any Megagon Labs, Inc. entity other than as explicitly documented in this README document. You agree to cease using any part of the provided materials if you do not agree with the terms or the lack of any warranty herein. While Megagon Labs, Inc., makes commercially reasonable efforts to ensure that citations in this document are complete and accurate, errors may occur. If you see any error or omission, please help us improve this document by sending information to contact_oss@megagon.ai.
