Skip to content

SandyyyZheng/JailbreakSystem

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

43 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

😈 Jailbreak System

A platform for testing, evaluating, and analyzing jailbreak attacks against large language models. This system provides tools and interfaces for users to assess the robustness of closed-source language models against various attack strategies.

🐱 Project Overview

The Jailbreak System consists of three main components:

  1. Frontend: A React-based web interface for interacting with the system
  2. Backend: A Flask API server handling the core logic and model interactions
  3. Database: Stores attack patterns, prompts, and results

🦾 Updates

04/25/2025: Enables real LLM APIs (gpt-4o-mini, gpt-4o-2024-0806, gpt-4-turbo, claude-3.5-sonnet)

05/08/2025: Optimized evaluation logic (Harmful Score >= 4 -> Success)

05/10/2025: Implemented our own algorithm MIST!

🙌 Demo

Homepage

Homepage

Attacks Page

Attackspage

Prompts Page

PromptsPage

Results Page

ResultsPage

Result Details

Details

Statistics Page

StatsPage

✳️ Features

  • Create and manage jailbreak attacks
  • Test attacks against various language models
  • Analyze attack success rates and patterns
  • Categorize and organize prompts
  • Visualize attack results
  • Implement custom attack algorithms

✅ Supported Attack Algorithms

The system incorporates the following algorithms:

  1. Multi-language: Uses low-resource language to bypass restrictions. Please refer to NeuraIPS'23Workshop-LRL
  2. ASCII Art: Encodes sensitive words using ASCII art, built upon ACL'24-ArtPrompt
  3. Cipher: Uses various cryptographic encoding methods to bypass content moderation, built upon ICLR'24-CipherChat
  4. MIST: Our own jailbreak algorithm!! Please refer to mist_optimizer.py

🛖 Structure

JailbreakSystem/
├── frontend/          # React-based web interface
└── backend/          # Flask API server (includes database)

🔛 Getting Started

  1. Clone the repository:
git clone https://github.com/SandyyyZheng/JailbreakSystem.git
cd JailbreakSystem
  1. For documentations, see:

📖 License

This project is under the MIT license.

👻 Acknowledgments

  • Deepbricks for providing the APIs
  • 2025 Graduate Design for HFUT
  • Advised by Prof. Yuanzhi Yao. My deepest thanks to Dr. Yao for all the encouragement and support along the way 🥺
  • Relies heavily on Cursor (mainly claude-3.5-sonnet & claude-3.7-sonnet) to construct framework and fix bugs. Kudos to AI🤖!

About

JailbreakSystem--2025 Graduate Design for HFUT

Topics

Resources

License

Stars

Watchers

Forks