Audio Jailbreaks

Welcome to the Audio Jailbreaks repository. This project provides an experimental framework for generating and evaluating audio jailbreaks against the SALMONN 7B audio language model.

Previous results and training logs are available only in zipped form (results.zip, training_logs.zip) because they contain dangerous or vulgar model outputs. The accompanying paper can also be found in this repository.
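The archives can be unpacked with Python's standard `zipfile` module. The sketch below assumes the archives are named results.zip and training_logs.zip and sit in the repository root. The helper name `extract_archive` is hypothetical and not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_archive(archive: str, dest: str = ".") -> list[str]:
    """Extract a zip archive into dest and return the names of its members.

    Hypothetical helper: the resulting directory layout is whatever the
    archive itself contains.
    """
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(dest)  # creates dest and any subdirectories as needed
        return zf.namelist()


if __name__ == "__main__":
    # Assumed archive names in the repository root.
    for name in ("results.zip", "training_logs.zip"):
        if Path(name).exists():
            members = extract_archive(name)
            print(f"{name}: extracted {len(members)} entries")
```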

Repository Structure

beats/: Contains the core modules and scripts for audio processing.
figures/: Directory for storing figures and visualizations.
jailbreaks/: Contains subdirectories for different types of jailbreak evaluations:
- robustness/
- stealth/
- transferability/
- universality/
qformer/: Directory for Q-Former related scripts.
results/: Directory for storing result files.
training_logs/: Directory for storing training logs.
attack.ipynb: Notebook for running attack evaluations.
model.py: Main model script.
ontology.json: JSON file containing ontology definitions.
requirements.txt: List of Python dependencies.
environment.yml: Conda environment configuration file.
README.md: This file.
pip_reqs.txt: Additional pip requirements.
working_reqs.txt: Working requirements file.

Results Structure

There are three types of result files in the results directory:

{name}.json: A dictionary keyed by the ID of each harmful prompt, containing the model's responses under the jailbreak described by {name}. For example, music_500 denotes the music base audio optimized with 500 steps of gradient descent. Each file follows this structure:

{
    "prompt_id": {
        "response": "Model response",
        "detox_scores": {
            "toxicity": 0.5,
            "severe_toxicity": 0.2,
            "obscene": 0.1,
            "threat": 0.0,
            "insult": 0.3,
            "identity_attack": 0.4
        },
        "label": 1
    }
}
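A minimal sketch for inspecting one results file, assuming only the structure shown above. The helper names `load_results` and `flag_toxic`, the example path results/music_500.json, and the 0.5 default threshold are illustrative assumptions, not part of this repository.

```python
import json
from pathlib import Path

# Detox score keys, taken from the example structure above.
DETOX_KEYS = ("toxicity", "severe_toxicity", "obscene",
              "threat", "insult", "identity_attack")


def load_results(path: str) -> dict:
    """Load a {name}.json results file into a dict keyed by prompt ID."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def flag_toxic(results: dict, key: str = "toxicity",
               threshold: float = 0.5) -> list[str]:
    """Return the sorted prompt IDs whose detox score for `key` is >= threshold.

    The 0.5 default is an illustrative assumption, not a project value.
    """
    if key not in DETOX_KEYS:
        raise ValueError(f"unknown detox key: {key}")
    return sorted(
        pid for pid, entry in results.items()
        if entry["detox_scores"][key] >= threshold
    )
```

For example, `flag_toxic(load_results("results/music_500.json"), key="insult", threshold=0.3)` would list the prompts whose insult score reaches 0.3 under the hypothetical music_500 file.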

overall_metrics.csv: A CSV file in which each row represents one jailbreak. Its columns aggregate information from the corresponding {name}.json file, including overall toxicity metrics.
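A sketch of how one row per jailbreak could be derived from the {name}.json files, assuming the JSON structure shown above. The column names (`jailbreak`, `n_prompts`, `label_rate`, `mean_<score>`), the helper names, and the reading of `label == 1` as a positive outcome are all assumptions; the actual columns of overall_metrics.csv may differ.

```python
import csv
import json
from pathlib import Path

DETOX_KEYS = ("toxicity", "severe_toxicity", "obscene",
              "threat", "insult", "identity_attack")


def summarize(name: str, results: dict) -> dict:
    """Collapse one {name}.json dict into a single row (hypothetical columns)."""
    entries = list(results.values())
    n = len(entries)
    row = {
        "jailbreak": name,
        "n_prompts": n,
        # Assumption: label == 1 marks a positive outcome; the README does not define it.
        "label_rate": sum(1 for e in entries if e["label"] == 1) / n,
    }
    for key in DETOX_KEYS:
        row[f"mean_{key}"] = sum(e["detox_scores"][key] for e in entries) / n
    return row


def write_overall_metrics(results_dir: str, out_csv: str) -> list[dict]:
    """Summarize every *.json file in results_dir and write one row per file."""
    rows = []
    for path in sorted(Path(results_dir).glob("*.json")):
        results = json.loads(path.read_text(encoding="utf-8"))
        if results:  # skip empty files to avoid dividing by zero
            rows.append(summarize(path.stem, results))
    fieldnames = ["jailbreak", "n_prompts", "label_rate"] + [f"mean_{k}" for k in DETOX_KEYS]
    with open(out_csv, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Writing to a separate file, e.g. `write_overall_metrics("results", "overall_metrics_sketch.csv")`, avoids overwriting the project's own overall_metrics.csv.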

Usage

Set up the environment:

conda env create -f environment.yml
conda activate audio-jailbreaks

Run the notebooks:
- Open and run the Jupyter notebooks (attack.ipynb, robustness.ipynb, stealth.ipynb, universality.ipynb) to perform the different evaluations.
Analyze the results:
- Results are saved in the results/ directory as JSON and CSV files.

isha-gpt/audio-jailbreaks
