Large Language Models (LLMs) are increasingly embedded in our digital infrastructure—from search engines and productivity tools to customer service and creative writing. These models are trained not only to be capable but also to be safe. Alignment techniques aim to ensure that LLMs do not produce harmful, unethical, or illegal content.
But what if the model’s alignment can be bypassed—not with a single clever prompt, but through a conversation?
In our recent work, we introduce Crescendo[1], a novel multi-turn jailbreak attack that gradually leads an LLM to violate its safety constraints. Unlike traditional jailbreaks that rely on adversarial prompts or suffixes, Crescendo uses benign, human-readable inputs and leverages the model’s own outputs to steer the conversation. We also present Crescendomation, a tool that automates this attack and outperforms existing jailbreak methods across a wide range of models and tasks.
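At a high level, the multi-turn pattern described above can be pictured as a short loop: each turn sends a benign-looking question, captures the model's reply, and feeds that reply back as context for the next question. The sketch below is purely illustrative; `query_model` and the placeholder questions are hypothetical stand-ins, not code from Crescendo or Crescendomation.

```python
# Illustrative sketch of a multi-turn escalation loop (not the authors' implementation).

def query_model(messages: list[dict]) -> str:
    """Hypothetical stand-in for a call to the target LLM's chat API."""
    # In a real setting this would send `messages` to the model endpoint.
    return f"(model reply to: {messages[-1]['content']})"

def run_multi_turn_conversation(questions: list[str]) -> list[dict]:
    """Send a sequence of individually benign-looking questions, each one
    building on the model's previous answer, and return the transcript."""
    history: list[dict] = []
    for question in questions:
        history.append({"role": "user", "content": question})
        reply = query_model(history)  # the model's own words become context for the next turn
        history.append({"role": "assistant", "content": reply})
    return history

# Placeholder questions only; the point is the gradual, reference-back structure.
transcript = run_multi_turn_conversation([
    "<benign opening question about the broad topic>",
    "<follow-up asking the model to expand on its own previous answer>",
    "<final question steering the accumulated context toward the target task>",
])
```

The key design point is that no single turn looks adversarial on its own; the pressure comes from the conversation history the model itself has produced.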
This article walks through the motivation, design, and implications of Crescendo, with examples and figures from our research. Our goal is to raise awareness of this new class of vulnerabilities and to encourage the development of more robust alignment techniques.