Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions

This repository contains the code for our paper Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions. We show that simple interactions such as multi-step, multilingual querying can elicit sufficiently harmful jailbreaks from LLMs. We design a metric (HarmScore) to measure the actionability and informativeness of jailbreak responses, and a straightforward attack method (Speak Easy) that significantly increases the success of these exploits across multiple benchmarks.

Requirements

# Primary libraries
torch
numpy
openai
transformers

# Utility libraries
tqdm
pyyaml
python-box

# Additional dependencies
ray
vllm
ollama

Please refer to requirements.txt for a full list of libraries.
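You can install all dependencies at once with:

pip install -r requirements.txt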

Evaluator Usage Example

All data files to be evaluated must adhere to the following format:

[
  {
    "query": "malicious query",
    "response": "response to query"
  },
  ...
]
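As a quick sanity check before scoring, a minimal sketch like the following can verify that a data file matches this format (the file path refers to the provided sample; the check itself is illustrative, not part of the repository's code):

import json

# Load a candidate evaluation file and confirm every entry has the
# required "query" and "response" keys.
with open("data/sample_data.json") as f:
    pairs = json.load(f)

for i, pair in enumerate(pairs):
    missing = {"query", "response"} - pair.keys()
    assert not missing, f"entry {i} is missing keys: {missing}"

print(f"{len(pairs)} query-response pairs ready for scoring.")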

You can evaluate files containing query-response pairs by running:

export CUDA_VISIBLE_DEVICES=0  # specify the GPU ID(s) to use, e.g., 0 or 0,1
python score_qa_pairs.py --data-dir "path to data" --save-dir "save location" --scorer "scorer to use"

The command above writes a new file to the --save-dir location containing the same query-response pairs as the original data, with an added score key holding the evaluator’s assigned score. We have provided an example data file under data/sample_data.json.
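For illustration, here is a minimal sketch of how you might consume the scored output, assuming numeric scores (the output filename below is a hypothetical placeholder; only the score key comes from the description above):

import json

# Read a scored file produced by score_qa_pairs.py and report the
# average score. "scored_data.json" is a placeholder path.
with open("scored_data.json") as f:
    scored = json.load(f)

scores = [pair["score"] for pair in scored]
print(f"mean score over {len(scores)} pairs: {sum(scores) / len(scores):.3f}")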

Questions

Any questions related to the code or the paper can be directed to Yik Siu Chan (yik_siu_chan@brown.edu) or Narutatsu Ri (nr3764@princeton.edu). If you encounter any problems when using the code or want to report a bug, please open an issue.

Citation

Please cite our paper if you find our repository helpful in your work:

@inproceedings{chan2025speakeasy,
  title={Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions}, 
  author={Yik Siu Chan and Narutatsu Ri and Yuxin Xiao and Marzyeh Ghassemi},
  year={2025},
  url={https://arxiv.org/abs/2502.04322}, 
  booktitle={Forty-second International Conference on Machine Learning}
}
