Awesome Offensive Security AI

简体中文

A curated list of outstanding offensive security AI work, papers, organizations, libraries, and benchmarks.

Benchmarks & Evaluation

Cyber-Crowdsourced Elicitation

Evaluating AI cyber capabilities with crowdsourced elicitation. (Published: Jul 8, 2025)

Cyber-Crowdsourced Elicitation Blog Post

CVE-Bench

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Vulnerabilities. (Submitted: Mar 17, 2025)

QHackBench

QHackBench: Benchmarking Large Language Models for Quantum Code Generation using PennyLane Hackathon Challenges. (Submitted: Jun 24, 2025)

QHackBench Paper

Bountybench

A framework to capture offensive & defensive cyber-capabilities in evolving real-world systems. (Submitted: May 21, 2025)

nyuctf_agents

nyuctf_agents: Baseline LLM agents for NYU CTF benchmark. (Latest Release: Feb 6, 2025)

InterCode-CTF

InterCode-CTF: A benchmark for evaluating LLMs in capture-the-flag challenges. (Submitted: Dec 3, 2024)

Cybench

Cybench: A Framework for Evaluating Cybersecurity Capabilities and Risks of Language Models. (Submitted: Aug 15, 2024)

CYBERSECEVAL 3

CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models. (Submitted: Aug, 2024)

CYBERSECEVAL 3 Paper

CyberMetric

CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge. (Submitted: Jul, 2024)

SEvenLLM

SEvenLLM: A benchmark to elicit, and improve cybersecurity incident analysis and response abilities in LLMs for Security Events. (Submitted: May, 2024)

CyberSecEval_2

CYBERSECEVAL 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models. (Submitted: Apr, 2024)

GDM Dangerous Capabilities

GDM Dangerous Capabilities: Capture the Flag. (Submitted: Mar, 2024)

LLM_CTF

An Empirical Evaluation of LLMs for Solving Offensive Security Challenges. (Submitted: Feb 19, 2024)

SecQA

SecQA: A Concise Question-Answering Dataset for Evaluating Large Language Models in Computer Security. (Submitted: Dec, 2023)

LLM Agents & Frameworks

CRAKEN

CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution. (Submitted: May 21, 2025)

RedTeamLLM

RedTeamLLM: an Agentic AI framework for offensive security. (Submitted: May 11, 2025)

HackSynth-GRPO

HackSynth-GRPO: A framework that enhances LLM agents for cryptographic CTF challenges using Guided Reinforcement Prompt Optimisation (GRPO). (Published: Apr 2025)

CAI

A lightweight, ergonomic framework for building bug bounty-ready Cybersecurity AIs (CAIs). (Paper Published: Apr 2025)

D-CIPHER

D-CIPHER: Dynamic collaborative intelligent multi-agent system for offensive security. (Submitted: Feb 15, 2025)

D-CIPHER Paper

CTFTiny

CTFTINY: LITE BENCHMARKING OFFENSIVE CYBER SKILLS IN LARGE LANGUAGE MODELS. (Submitted: Feb 11, 2025)

CTFTiny GitHub

PentestGPT

PentestGPT: A GPT-empowered penetration testing tool. (Published: Aug 12, 2024)

HackSynth

HackSynth: LLM agent and evaluation framework for autonomous penetration testing. (Submitted: Dec 2, 2024)

EnIGMA

EnIGMA: A mode for solving offensive cybersecurity (capture the flag) challenges, achieving state-of-the-art results on multiple cybersecurity benchmarks. (Submitted: Sep 24, 2024)

XBOW(Business)

XBOW: A system that autonomously finds and exploits potential security vulnerabilities. (Initial commit: ~10 months ago)

Competitions

AI Cyber Challenge

AI Cyber Challenge: A competition to advance the state of the art in AI for cybersecurity.

AI Cyber Challenge Website

LLM Security Research & Tools

Teaching LLMs how to XSS

Teaching LLMs how to XSS: An introduction to fine-tuning and reinforcement learning (using your own GPU)

Slides

inspect_cyber

inspect_cyber: An Inspect extension developed by UKGovernmentBEIS, designed for agentic cyber evaluations. This project aims to facilitate the assessment of AI agents in cybersecurity contexts.

How to Contribute

Contributions are always welcome! If you have a project, paper, or resource that fits this list, please feel free to open a pull request. Please ensure your submission adheres to the existing format and categories.

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
CONTRIBUTING.md		CONTRIBUTING.md
README.md		README.md
README_ZH.md		README_ZH.md

ox01024/awesome-offensive-security-ai

Folders and files

Latest commit

History

Repository files navigation