This repository contains the code for our paper Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions. We show that simple interactions such as multi-step, multilingual querying can elicit sufficiently harmful jailbreaks from LLMs. We design a metric (HarmScore) to measure the actionability and informativeness of jailbreak responses, and a straightforward attack method (Speak Easy) that significantly increases the success of these exploits across multiple benchmarks.
```
# Primary libraries
torch
numpy
openai
transformers

# Utility libraries
tqdm
pyyaml
python-box

# Additional dependencies
ray
vllm
ollama
```
Please refer to `requirements.txt` for a full list of libraries.
All data files to be evaluated must adhere to the following format:
```json
[
    {
        "query": "malicious query",
        "response": "response to query"
    },
    ...
]
```
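For example, a data file in this format can be produced with a few lines of Python. This is a minimal sketch; the file path and the query-response contents below are placeholders, not part of the repository:

```python
import json

# Hypothetical example: write query-response pairs in the expected
# format to a JSON file that can later be passed to the scorer.
qa_pairs = [
    {
        "query": "example query",
        "response": "example model response",
    },
]

with open("data/my_data.json", "w") as f:  # placeholder path
    json.dump(qa_pairs, f, indent=4)
```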
You can evaluate files containing query-response pairs by running:
```bash
export CUDA_VISIBLE_DEVICES=  # specify GPUs
python score_qa_pairs.py --data-dir "path to data" --save-dir "save location" --scorer "scorer to use"
```
The above creates a new file at `save_dir` that contains the same query-response pairs as the original data, with an added `score` key representing the evaluator's assigned score.
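The scored file can then be inspected with standard JSON tooling. Below is a minimal sketch, assuming the output is saved at `save_dir/scored_data.json` (the actual file name depends on the arguments passed to the script); the `score` key is the one described above:

```python
import json

# Hypothetical output path; the real file name depends on --save-dir
# and the input data file.
with open("save_dir/scored_data.json") as f:
    scored_pairs = json.load(f)

# Each entry keeps the original "query" and "response" fields and
# gains a "score" field assigned by the chosen scorer.
scores = [pair["score"] for pair in scored_pairs]
print(f"Scored {len(scores)} pairs; mean score = {sum(scores) / len(scores):.3f}")
```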
We have provided an example data file under `data/sample_data.json`.
Any questions related to the code or the paper can be directed to Yik Siu Chan (yik_siu_chan@brown.edu) or Narutatsu Ri (nr3764@princeton.edu). If you encounter any problems when using the code or want to report a bug, please open an issue.
Please cite our paper if you find our repository helpful in your work:
```bibtex
@inproceedings{chan2025speakeasy,
    title={Speak Easy: Eliciting Harmful Jailbreaks from LLMs with Simple Interactions},
    author={Yik Siu Chan and Narutatsu Ri and Yuxin Xiao and Marzyeh Ghassemi},
    year={2025},
    url={https://arxiv.org/abs/2502.04322},
    booktitle={Forty-second International Conference on Machine Learning}
}
```