OET: Optimization-based prompt injection Evaluation Toolkit

Jinsheng Pan, Xiaogeng Liu, and Chaowei Xiao.

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language understanding and generation, enabling their widespread adoption across various domains. However, their susceptibility to prompt injection attacks poses significant security risks, as adversarial inputs can manipulate model behavior and override intended instructions. Despite numerous defense strategies, a standardized framework to rigorously evaluate their effectiveness, especially under adaptive adversarial scenarios—is lacking. To address this gap, we introduce OET, an optimization-based evaluation toolkit that systematically benchmarks prompt injection attacks and defenses across diverse datasets using an adaptive testing framework. Our toolkit features a modular workflow that facilitates adversarial string generation, dynamic attack execution, and comprehensive result analysis, offering a unified platform for assessing adversarial robustness. Crucially, the adaptive testing framework leverages optimization methods with both white-box and black-box access to generate worst-case adversarial examples, thereby enabling strict red-teaming evaluations. Extensive experiments underscore the limitations of current defense mechanisms, with some models remaining susceptible even after implementing security enhancements.

Update

Date	Event
2025/5/01	We released our paper.
2025/4/09	We released our code.

Installation

    git clone https://github.com/Victor-lol/OET.git

    conda create -n oet python=3.10
    conda activate oet
    
    cd ./OET
    python3 setup.py install 
    pip install -r requirements.txt
    pip install -e.

Evaluation

Data Transformation

an example is shown in example/ex_data.py, users can transform data in their desired format. Deafult format: json, csv

Open-sourced Model

Construct configuration, modify configs/optimizer_config.yaml
Train Adv Strig, create EvalOptimizerModel from eval.open_pipeline as the pipeline and then call train function or create your own training function
Attack and check result, call complete function to run attack, and then run check_refusal_completions to calculate ASR. User can write their own attack and metric function using EvalOptimizerModel object

a usage example is shown in example/example.py and example/train.sh and example/infer.sh an example of creating customized training and attack function is shown in example/eval_struq.py For chat Template options, please refers to FastChat

Close-sourced Model

Directly Attack and check result, create EvalAPIModel from eval.close_pipeline as the pipeline and then call complete function or create your own attack function, and call check_refusal to calculate ASR.

a usage example is shown in example/example_close.py and example/infer_close.sh

Citation

If this work is helpful, please kindly cite as:

@misc{pan2025oetoptimizationbasedpromptinjection,
      title={OET: Optimization-based prompt injection Evaluation Toolkit}, 
      author={Jinsheng Pan and Xiaogeng Liu and Chaowei Xiao},
      year={2025},
      eprint={2505.00843},
      archivePrefix={arXiv},
      primaryClass={cs.CR},
      url={https://arxiv.org/abs/2505.00843}, 
}

Acknowledgement

This repo benefits from HarmBench, AutoDAN, and FastChat. Thanks for their wonderful works.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

OET: Optimization-based prompt injection Evaluation Toolkit

Abstract

Table of Contents

Update

Installation

Evaluation

Data Transformation

Open-sourced Model

Close-sourced Model

Citation

Acknowledgement

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 11 Commits
configs		configs
data		data
eval		eval
example		example
optimizer		optimizer
.gitattributes		.gitattributes
LICENSE		LICENSE
README.md		README.md
__init__.py		__init__.py
requirements.txt		requirements.txt
setup.py		setup.py

License

SaFoLab-WISC/OET

Folders and files

Latest commit

History

Repository files navigation

OET: Optimization-based prompt injection Evaluation Toolkit

Abstract

Table of Contents

Update

Installation

Evaluation

Data Transformation

Open-sourced Model

Close-sourced Model

Citation

Acknowledgement

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages