Semantic-Jailbreak-with-LLM

This project explores semantic jailbreaking techniques to bypass or extend the capabilities of Language Models (LLMs) by manipulating the input in a way that the model interprets differently from its intended design constraints. By leveraging this approach, the project aims to analyze, document, and potentially mitigate jailbreaking tactics on language models.

Introduction

Language models are increasingly integrated into various applications, yet they can be susceptible to "jailbreak" attempts where input manipulation forces the model to bypass its constraints. This project investigates semantic methods for jailbreaking, contributing to both the understanding and potential mitigation of such vulnerabilities. By understanding how models interpret language differently from expectations, we aim to provide a framework to enhance LLM safety and robustness.

Framework installation

Log into a linux machine with Anaconda installed.

In the terminal do:

Clone the repository:

git clone git@github.com:KarlaMun/Semantic-Jailbreak-with-LLM.git

Access the code:

cd Semantic-Jailbreak-with-LLM

Install the conda enviroment:

conda env create -f config.yml
conda activate llm-env
conda develop .

3.1 Update the conda enviroment:

conda env update --file config.yml --prune

Features

Semantic Jailbreak Methods: Techniques for crafting inputs that bypass LLM safeguards.
Analysis of Model Responses: Tools to evaluate the LLM's output in response to jailbreaking attempts.

License

This project is licensed under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
analisis_de_modelos.ipynb		analisis_de_modelos.ipynb
baseGA.py		baseGA.py
environment.yml		environment.yml
genetic_algorithm.ipynb		genetic_algorithm.ipynb

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Semantic-Jailbreak-with-LLM

Introduction

Framework installation

Features

License

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

License

KarlaMun/Semantic-Jailbreak-with-LLM

Folders and files

Latest commit

History

Repository files navigation

Semantic-Jailbreak-with-LLM

Introduction

Framework installation

Features

License

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages