This repository contains code and scripts for studying content injection attacks on neural information retrieval models. Content injection attacks insert queries or query terms into non-relevant passages to make them appear relevant, or insert non-relevant and even harmful text into seemingly relevant passages to promote misleading or malicious content in search results.
We will continue to refine this codebase. For questions or support, please reach out to mtamber@uwaterloo.ca.
This repository offers:
- Scripts to generate adversarial passages and evaluate model vulnerability
- Scripts to train and evaluate classifiers and embedding models more robust to content injection attacks
The repository is organized as follows:

- `attack_passages/`: Scripts for creating adversarial passages tailored to specific models and attack scenarios.
- `attack_results/`: Stores model outputs under attack, along with evaluation scripts to analyze vulnerability.
- `classifier/`: Scripts for training and testing a classifier that flags adversarially modified passages (an illustrative usage sketch appears after this listing).
- `embedding_retrieval/`: Scripts to fine-tune and evaluate embedding models for passage retrieval that are more robust to adversarial passages.
- `llm_judge/`: Scripts to evaluate large language model (LLM) judgments of passage relevance.
- `model_scores_and_judgements/`: Retrieval results and LLM-based relevance judgments for the different experiments.
- `queries/`: Query sets for the datasets used in these experiments.
- `random_sentences/`: Random sentences used for sentence injection.
- `rel_passage_gen/`: Scripts to produce relevant passages for specific queries.
- `reranker/`: Scripts for evaluating rerankers.
- `adversarial_passage_generator.py`: Implements an `AdversarialGenerator` class to inject queries, keywords, or arbitrary text into passages, creating adversarial passages for testing (see the injection sketch below).
- `get_passages_and_sentences_from_beir_corpora.py`: Extracts valid sentences from BEIR corpus passages using heuristics and filters out passages lacking meaningful sentences (see the sentence-filtering sketch below).
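As a rough illustration of the content injection that `adversarial_passage_generator.py` performs, the sketch below injects a query into a non-relevant passage and promotional text into a relevant one. The class name `AdversarialGenerator` is taken from the script, but the method names, injection positions, and example data are assumptions, not the script's actual interface.

```python
import random
import re

# Illustrative sketch only: AdversarialGenerator is named in the repository,
# but the methods and injection behavior below are assumptions.
class AdversarialGenerator:
    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def inject_text(self, passage: str, text: str) -> str:
        """Insert arbitrary text at a randomly chosen sentence boundary."""
        sentences = re.split(r"(?<=[.!?])\s+", passage.strip())
        position = self.rng.randint(0, len(sentences))
        sentences.insert(position, text)
        return " ".join(sentences)

    def inject_query(self, passage: str, query: str) -> str:
        """Make a non-relevant passage superficially match a query by injecting it."""
        return self.inject_text(passage, query)


generator = AdversarialGenerator(seed=42)
query = "what causes seasons on earth"
non_relevant = "The stock market closed higher today. Analysts cited strong earnings."
promotion = "Visit example.com for unbeatable deals."
relevant = "Seasons are caused by the tilt of Earth's axis as it orbits the Sun."

# Query injection: push a non-relevant passage up the ranking for this query.
print(generator.inject_query(non_relevant, query))
# Content promotion: slip promotional text into an otherwise relevant passage.
print(generator.inject_text(relevant, promotion))
```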
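Along the same lines, the sentence-filtering sketch below shows heuristics of the kind `get_passages_and_sentences_from_beir_corpora.py` might apply; the specific rules and thresholds (word-count bounds, alphabetic-character ratio, terminal punctuation) are illustrative assumptions rather than the script's actual filters.

```python
import re

# Illustrative heuristics for extracting "valid" sentences from corpus passages;
# the thresholds and rules here are assumptions, not the script's actual filters.
def extract_valid_sentences(passage: str, min_words: int = 5, max_words: int = 60):
    # Naive sentence split on terminal punctuation followed by whitespace.
    candidates = re.split(r"(?<=[.!?])\s+", passage.strip())
    valid = []
    for sentence in candidates:
        words = sentence.split()
        if not (min_words <= len(words) <= max_words):
            continue  # too short or too long to be a meaningful sentence
        letters = sum(ch.isalpha() for ch in sentence)
        if letters / max(len(sentence), 1) < 0.6:
            continue  # mostly numbers, markup, or punctuation
        if not sentence[0].isupper() or sentence[-1] not in ".!?":
            continue  # does not look like a well-formed sentence
        valid.append(sentence)
    return valid


passage = "Tbl 3: 0.41 0.52. Earth's axial tilt causes the seasons we experience. see fig 2"
print(extract_valid_sentences(passage))
# Passages yielding no valid sentences would be filtered out entirely.
```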
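Finally, a hypothetical usage sketch for a trained classifier from `classifier/`. It assumes the classifier is saved as a Hugging Face sequence-classification checkpoint with label 1 meaning "adversarial"; the checkpoint path, label mapping, and 0.5 threshold are placeholders, not this repository's actual configuration.

```python
# Hypothetical sketch: flag passages that look adversarially modified using a
# fine-tuned sequence classifier. Checkpoint path and label meaning are assumed.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "path/to/injection-classifier"  # hypothetical checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint)
model.eval()

passages = [
    "The capital of France is Paris, a major European city.",
    "The capital of France is Paris. Buy cheap watches at example.com today!",
]

with torch.no_grad():
    inputs = tokenizer(passages, padding=True, truncation=True, return_tensors="pt")
    probs = torch.softmax(model(**inputs).logits, dim=-1)

for passage, p_adv in zip(passages, probs[:, 1].tolist()):
    flag = "FLAGGED" if p_adv > 0.5 else "ok"
    print(f"{flag}\t{p_adv:.2f}\t{passage[:60]}")
```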