🌈 RainbowPlus

📋 Overview

This repository contains the implementation of the methods described in our research paper "RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search". Building upon the foundational insights of Rainbow Teaming and the MAP-Elites algorithm, RainbowPlus introduces key enhancements to the evolutionary quality-diversity (QD) paradigm.

Specifically, RainbowPlus reimagines the archive as a dynamic, multi-individual container that stores diverse high-fitness prompts per cell, analogous to maintaining a population of elite solutions across behavioral niches. This enriched archive enables a broader evolutionary exploration of adversarial strategies.
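
To make the archive idea concrete, here is a minimal sketch of such a multi-elite container, assuming illustrative names (MultiEliteArchive, descriptor tuples); it is not the repository's actual archive.py API:

from collections import defaultdict

class MultiEliteArchive:
    """Illustrative multi-individual archive: each behavioral cell
    (e.g. a (risk_category, attack_style) pair) keeps a list of
    high-fitness prompts instead of a single elite."""

    def __init__(self, fitness_threshold: float = 0.6):
        self.fitness_threshold = fitness_threshold
        self.cells = defaultdict(list)  # descriptor -> [(prompt, fitness), ...]

    def add(self, descriptor, prompt: str, fitness: float) -> bool:
        # Keep every prompt that clears the threshold, so each cell
        # accumulates a small population of elites over time.
        if fitness >= self.fitness_threshold:
            self.cells[descriptor].append((prompt, fitness))
            return True
        return False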

Furthermore, RainbowPlus employs a comprehensive fitness function that evaluates multiple candidate prompts in parallel using a probabilistic scoring mechanism, replacing traditional pairwise comparisons and enhancing both accuracy and computational efficiency. By integrating these evolutionary principles into its adaptive QD search, RainbowPlus achieves superior attack efficacy and prompt diversity, outperforming both QD-based methods and state-of-the-art red-teaming approaches.
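
As a hedged illustration of this scoring scheme, the sketch below scores a batch of candidates with a stand-in safety judge; judge_unsafe_prob is a hypothetical callable (e.g. a Llama Guard wrapper), not a function from this repository:

from typing import Callable, List

def batch_fitness(
    prompts: List[str],
    responses: List[str],
    judge_unsafe_prob: Callable[[str, str], float],
) -> List[float]:
    # Score all candidates in one batch: each fitness is the judged
    # probability that the target's response is unsafe, replacing
    # pairwise "is A more harmful than B" comparisons.
    return [judge_unsafe_prob(p, r) for p, r in zip(prompts, responses)]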

[Diagram]

✨ Key Features

  • ๐Ÿ† State-of-the-Art Performance: Achieves superior results compared to existing methods on HarmBench benchmark
  • ๐Ÿ”„ Universal Compatibility: Supports both open-source and closed-source LLMs (OpenAI, vLLM)
  • โšก Computational Efficiency: Completes an evaluation in just 1.45 hours on HarmBench
  • ๐Ÿ› ๏ธ Flexible Configuration: Highly customizable for various experimental settings

RainbowPlus achieves state-of-the-art results compared to other red-teaming methods.

[Results]

๐Ÿ“ Repository Structure

├── configs/                  # Configuration files
│   ├── categories/           # Category definitions
│   ├── styles/               # Style definitions
│   ├── base.yml              # Base configuration
│   ├── base-openai.yml       # Configuration for OpenAI LLMs
│   ├── base-opensource.yml   # Configuration for open-source LLMs
│   └── eval.yml              # Evaluation configuration
│
├── data/                     # Dataset storage
│
├── rainbowplus/              # Core package
│   ├── configs/              # Configuration utilities
│   ├── llms/                 # LLM integration modules
│   ├── scores/               # Fitness and similarity functions
│   ├── archive.py            # Archive management
│   ├── evaluate.py           # Current evaluation implementation
│   ├── evaluate_v0.py        # Legacy evaluation implementation
│   ├── get_scores.py         # Metrics extraction utilities
│   ├── prompts.py            # LLM prompt templates
│   ├── rainbowplus.py        # Main implementation
│   └── utils.py              # Utility functions
│
├── sh/                       # Shell scripts
│   └── run.sh                # All-in-one execution script
│
├── README.md                 # This documentation
└── setup.py                  # Package installation script

🚀 Getting Started

1๏ธโƒฃ Environment Setup

Create and activate a Python virtual environment, then install the required dependencies:

python -m venv venv
source venv/bin/activate
pip install -e .

2๏ธโƒฃ API Configuration

🤗 Hugging Face Token (Optional)

Required when accessing gated resources on the Hugging Face Hub (e.g., Llama Guard):

export HF_AUTH_TOKEN="YOUR_HF_TOKEN"

Alternatively:

huggingface-cli login --token=YOUR_HF_TOKEN

🔑 OpenAI API Key (Optional)

Required when using OpenAI models:

export OPENAI_API_KEY="YOUR_API_KEY"

📊 Usage

🧠 LLM Configuration

RainbowPlus supports two primary LLM integration methods:

1๏ธโƒฃ vLLM (Open-Source Models)

Example configuration for Qwen-2.5-7B-Instruct:

target_llm:
  type_: vllm

  model_kwargs:
    model: Qwen/Qwen2.5-7B-Instruct
    trust_remote_code: True
    max_model_len: 2048
    gpu_memory_utilization: 0.5

  sampling_params:
    temperature: 0.6
    top_p: 0.9
    max_tokens: 1024

Additional parameters can be specified according to the vLLM model documentation and sampling parameters documentation.
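
For reference, the keys above map directly onto vLLM's Python API (model_kwargs onto vllm.LLM, sampling_params onto vllm.SamplingParams). The snippet below shows the equivalent direct usage; it is illustrative only, not how rainbowplus loads the config internally:

from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # model_kwargs in the YAML above
    trust_remote_code=True,
    max_model_len=2048,
    gpu_memory_utilization=0.5,
)
params = SamplingParams(temperature=0.6, top_p=0.9, max_tokens=1024)  # sampling_params
outputs = llm.generate(["Example prompt"], params)
print(outputs[0].outputs[0].text)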

2๏ธโƒฃ OpenAI API (Closed-Source Models)

Example configuration for GPT-4o-mini:

target_llm:
  type_: openai

  model_kwargs:
    model: gpt-4o-mini

  sampling_params:
    temperature: 0.6
    top_p: 0.9
    max_tokens: 1024

Additional parameters can be specified according to the OpenAI API documentation.
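
As a reference point, the same settings expressed directly against the OpenAI Python SDK look roughly like this; again an illustrative sketch, not the package's internal call path:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment, as configured above
response = client.chat.completions.create(
    model="gpt-4o-mini",   # model_kwargs in the YAML above
    temperature=0.6,       # sampling_params
    top_p=0.9,
    max_tokens=1024,
    messages=[{"role": "user", "content": "Example prompt"}],
)
print(response.choices[0].message.content)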

🧪 Running Experiments

Basic execution with default configuration:

python -m rainbowplus.rainbowplus \
    --config_file configs/base.yml \
    --num_samples 150 \
    --max_iters 400 \
    --sim_threshold 0.6 \
    --num_mutations 10 \
    --fitness_threshold 0.6 \
    --log_dir logs-sota \
    --dataset ./data/harmbench.json \
    --log_interval 50 \
    --shuffle True

For customized experiments, you can override the target LLM and other parameters:

python -m rainbowplus.rainbowplus \
    --config_file configs/base-opensource.yml \
    --num_samples -1 \
    --max_iters 400 \
    --sim_threshold 0.6 \
    --num_mutations 10 \
    --fitness_threshold 0.6 \
    --log_dir logs-sota \
    --dataset ./data/harmbench.json \
    --target_llm "TARGET MODEL" \
    --log_interval 50 \
    --shuffle True

โš™๏ธ Configuration Parameters

Parameter          Description
-----------------  ---------------------------------------------------------
target_llm         Target LLM identifier
num_samples        Number of initial seed prompts
max_iters          Maximum number of iteration steps
sim_threshold      Similarity threshold for prompt mutation
num_mutations      Number of prompt mutations per iteration
fitness_threshold  Minimum fitness score for adding a prompt to the archive
log_dir            Directory for storing logs
dataset            Dataset path
shuffle            Whether to shuffle seed prompts
log_interval       Number of iterations between log saves
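
To clarify how the two thresholds plausibly interact, here is a hedged sketch of an archive-update rule consistent with the descriptions above; should_add and similarity are hypothetical names, not the repository's internals:

def should_add(prompt, fitness, cell_prompts, similarity,
               fitness_threshold=0.6, sim_threshold=0.6):
    # Reject candidates whose fitness is below the cutoff.
    if fitness < fitness_threshold:
        return False
    # Reject near-duplicates of prompts already stored in the cell,
    # keeping each behavioral niche diverse.
    return all(similarity(prompt, p) < sim_threshold for p in cell_prompts)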

🔄 Batch Processing Multiple Models

For evaluating multiple models sequentially:

MODEL_IDS="meta-llama/Llama-2-7b-chat-hf lmsys/vicuna-7b-v1.5 baichuan-inc/Baichuan2-7B-Chat Qwen/Qwen-7B-Chat"

for MODEL in $MODEL_IDS; do
    python -m rainbowplus.rainbowplus \
        --config_file configs/base-opensource.yml \
        --num_samples -1 \
        --max_iters 400 \
        --sim_threshold 0.6 \
        --num_mutations 10 \
        --fitness_threshold 0.6 \
        --log_dir logs-sota \
        --dataset ./data/harmbench.json \
        --target_llm $MODEL \
        --log_interval 50 \
        --shuffle True

    # Free disk space by deleting cached model weights between runs
    rm -r ~/.cache/huggingface/hub/
done

📊 Evaluation

After running experiments, evaluate the results:

MODEL_IDS="meta-llama/Llama-2-7b-chat-hf"  # For multiple models: MODEL_IDS="meta-llama/Llama-2-7b-chat-hf lmsys/vicuna-7b-v1.5"

for MODEL in $MODEL_IDS; do
    # Run evaluation
    python -m rainbowplus.evaluate \
        --config configs/eval.yml \
        --log_dir "./logs-sota/$MODEL/harmbench"

    # Extract metrics
    python rainbowplus/get_scores.py \
        --log_dir "./logs-sota/$MODEL/harmbench" \
        --keyword "global"
done

Evaluation Parameters

Parameter  Description
---------  ---------------------------------------------------------------
config     Path to the evaluation configuration file
log_dir    Directory containing experiment logs
keyword    Keyword in the global log file name (default: global); usually safe to omit

📉 Output Metrics

Results are saved in JSON format:

{
    "General": 0.79,
    "All": 0.8666092943201377
}

Where:

  • General: the metric computed following the benchmark's standard evaluation protocol
  • All: the same metric computed across all generated prompts
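
A minimal example of consuming this file; the path below is hypothetical and depends on your --log_dir and dataset name:

import json

# "scores.json" is a hypothetical filename; check your log directory for
# the actual file produced by get_scores.py.
with open("logs-sota/meta-llama/Llama-2-7b-chat-hf/harmbench/scores.json") as f:
    metrics = json.load(f)
print(f"General: {metrics['General']:.2%}  All: {metrics['All']:.2%}")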

⚡ Streamlined Execution

For end-to-end execution, use the provided shell script sh/run.sh:

bash sh/run.sh

  • Modify common parameters (log_dir, max_iters, ...) on lines 4-11.
  • Modify target LLMs on lines 56-77 (open-source models).
  • Modify target LLMs on lines 80-83 (closed-source models).

🔮 Upcoming Features

  • Support OpenAI models as the fitness function
  • Deploy via FastAPI
  • Support additional LLMs

๐Ÿ“ Citation

@misc{dang2025rainbowplusenhancingadversarialprompt,
      title={RainbowPlus: Enhancing Adversarial Prompt Generation via Evolutionary Quality-Diversity Search}, 
      author={Quy-Anh Dang and Chris Ngo and Truong-Son Hy},
      year={2025},
      eprint={2504.15047},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2504.15047}, 
}
