Hateful_Memes_in_VLM

This repository contains the source code, datasets, generated content, and scripts for the paper "From Meme to Threat: On the Hateful Meme Understanding and Induced Hateful Content Generation in Open-Source Vision Language Models."

In this paper, we present an in-depth evaluation of VLMs' ability to interpret hateful memes by curating a dataset of 39 hateful memes and over 12,000 responses from seven representative VLMs using carefully designed prompts. We also assess how malicious users could exploit VLMs and hateful memes to generate hateful content systematically. We generate hateful content (hate speech, jokes, and slogans) based on hateful memes and calculate the hatefulness of the generated content.

The annotated dataset is published on Hugging Face (https://huggingface.co/datasets/TrustAIRLab/Hateful_Memes_in_VLM). Because the source code and dataset contain sensitive information, the detailed project data is hosted on Zenodo (https://zenodo.org/records/14752660) and is available upon request.
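As a starting point for working with the Hugging Face copy, here is a minimal sketch in Python. It assumes the `datasets` package is installed and that the repository loads with its default configuration; neither is confirmed by this README, and access may require authentication or accepting terms. The helper names `repo_id_from_url` and `load_annotated_dataset` are hypothetical and not part of this repository.

```python
from urllib.parse import urlparse

# URL taken from this README.
DATASET_URL = "https://huggingface.co/datasets/TrustAIRLab/Hateful_Memes_in_VLM"


def repo_id_from_url(url: str) -> str:
    """Extract the '<org>/<name>' repo id from a Hugging Face dataset URL."""
    parts = urlparse(url).path.strip("/").split("/")
    if len(parts) != 3 or parts[0] != "datasets":
        raise ValueError(f"not a Hugging Face dataset URL: {url}")
    return "/".join(parts[1:])


def load_annotated_dataset(url: str = DATASET_URL):
    """Download the dataset (hypothetical helper).

    Assumes network access, the `datasets` package, and a default
    configuration; none of these is documented in this README.
    """
    # Imported lazily so repo_id_from_url works without `datasets` installed.
    from datasets import load_dataset

    return load_dataset(repo_id_from_url(url))
```

Calling `load_annotated_dataset()` and printing the result lists whatever splits and columns the dataset exposes. Inspect that output before relying on any field names, since the schema is not described here.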
