Awesome Red-Teaming LLMs

A comprehensive guide to understanding Attacks, Defenses and Red-Teaming for Large Language Models (LLMs).

Red-Teaming Attack Taxonomy

Evaluation & Benchmarks

Title	Link
Beyond Uniform Criteria: Scenario-Adaptive Multi-Dimensional Jailbreak Evaluation (SceneJailEval)	Link

Other Surveys

Title	Link
SoK: Prompt Hacking of Large Language Models	Link
A Survey on Trustworthy LLM Agents: Threats and Countermeasures	Link
The Emerged Security and Privacy of LLM Agent: A Survey with Case Studies	Link

Red-Teaming

Title	Link
Red-Teaming for Generative AI: Silver Bullet or Security Theater?	Link
Lessons From Red Teaming 100 Generative AI Products	Link
Rainbow Teaming: Open-Ended Generation of Diverse Adversarial Prompts	Link
Red-Teaming LLM Multi-Agent Systems via Communication Attacks	Link
Red Teaming the Mind of the Machine: A Systematic Evaluation of Prompt Injection and Jailbreak Vulnerabilities in LLMs	Link
TRIDENT: Enhancing Large Language Model Safety with Tri-Dimensional Diversified Red-Teaming Data Synthesis	Link
The State of Multilingual LLM Safety Research: From Measuring the Language Gap to Mitigating It	Link
RedTeamCUA: Realistic Adversarial Testing of Computer-Use Agents in Hybrid Web-OS Environments	Link
RRTL: Red Teaming Reasoning Large Language Models in Tool Learning	Link
Capability-Based Scaling Laws for LLM Red-Teaming	Link
Automated Red Teaming with GOAT: the Generative Offensive Agent Tester	Link
Strategize Globally, Adapt Locally: A Multi-Turn Red Teaming Agent with Dual-Level Learning	Link

If you find our work useful, please consider citing it. If you would like to add your work to our taxonomy, please open a pull request.

BibTeX

@article{verma2024operationalizing,
  title={Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)},
  author={Verma, Apurv and Krishna, Satyapriya and Gehrmann, Sebastian and Seshadri, Madhavan and Pradhan, Anu and Ault, Tom and Barrett, Leslie and Rabinowitz, David and Doucette, John and Phan, NhatHai},
  journal={arXiv preprint arXiv:2407.14937},
  year={2024}
}
