Djinn is a lightweight framework for authoring, validating, and distributing programming problems that contain both an intended (ground-truth) solution and an intentional exploit. Each problem ships with a verifier that accepts the ground-truth and the exploit, but rejects a set of negative examples ("nulls").
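To make that acceptance contract concrete, here is a purely illustrative sketch in Python; the `Problem` dataclass and `check_contract` helper are hypothetical names invented for this example, not Djinn's actual data model or API.

```python
# Illustrative sketch only -- hypothetical names, not Djinn's actual API.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Problem:
    slug: str
    ground_truth: str                    # intended (ground-truth) solution source
    exploit: str                         # intentional exploit source
    nulls: List[str] = field(default_factory=list)  # negative examples ("nulls")
    verify: Callable[[str], bool] = lambda code: False


def check_contract(problem: Problem) -> bool:
    """A well-formed problem's verifier accepts both the ground truth and the
    exploit while rejecting every null."""
    return (
        problem.verify(problem.ground_truth)
        and problem.verify(problem.exploit)
        and not any(problem.verify(null) for null in problem.nulls)
    )
```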
- Secure Sandboxing: Code submissions are verified in a secure, isolated cloud environment using E2B.
- Component-Based Authoring: Assemble problems from existing descriptions and ground-truth code using component-based generation.
- Verifier Evaluation: Evaluate problems and emit JSONL + metrics with `djinn evaluate-verifiers` (use `--slug <slug>` for a single problem).
- Flexible Exporting: Export your entire problem library to a local JSONL file or directly to the Hugging Face Hub.
First, clone the repository and install the project in editable mode. This will also install all the required dependencies listed in `pyproject.toml`.
```bash
git clone https://github.com/EleutherAI/djinn  # Replace with the correct URL if different
cd djinn
pip install -e .
```
The problems are stored in a separate repository as a git submodule. After cloning the main repository, you need to initialize and update the submodule:
```bash
git submodule update --init --recursive
```
If you're cloning for the first time, you can also clone with submodules in one step:
```bash
git clone --recurse-submodules https://github.com/EleutherAI/djinn
```
Djinn uses E2B for sandboxed code execution. You will need an API key to run the verification steps.
- Sign up for a free API key at the E2B documentation page.
- Provide the key to Djinn. The recommended way is to create a `.env` file in the root of the project:

  ```bash
  # Create a .env file from the example
  cp .env.example .env
  ```

  Then, open the `.env` file and add your key:

  ```
  E2B_API_KEY="your_api_key_here"
  ```

  Alternatively, you can set the key as an environment variable:

  ```bash
  export E2B_API_KEY="your_api_key_here"
  ```
If the `E2B_API_KEY` is not set, Djinn will fall back to running code locally using `exec`, which is insecure and not recommended.
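As a quick sanity check that the key will actually be picked up, a short script like the one below can be run from the project root. It assumes python-dotenv is available for reading the `.env` file; this is an illustration, not part of Djinn itself.

```python
# Sanity check: is E2B_API_KEY visible to the current environment?
import os

try:
    # python-dotenv is an assumption here; `export E2B_API_KEY=...` works without it.
    from dotenv import load_dotenv
    load_dotenv()  # reads the .env file in the current directory, if present
except ImportError:
    pass

if os.getenv("E2B_API_KEY"):
    print("E2B_API_KEY is set; sandboxed verification can be used.")
else:
    print("E2B_API_KEY is missing; Djinn would fall back to insecure local execution.")
```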
Djinn provides several commands for managing coding problems:
| Command | Purpose | Example |
|---|---|---|
| `djinn evaluate-verifiers` | Evaluate verifiers and emit JSONL + metrics | `djinn evaluate-verifiers --slug palindrome` |
| `djinn generate` | Import problems from datasets or assemble from components | `djinn generate --import primeintellect --exploit "timing attack" --sample 3 --out out_dir` |
| `djinn analyze` | Print difficulty analysis or create stratified splits | `djinn analyze --create-splits` |
| `djinn export` | Export to JSONL/Hugging Face | `djinn export --hf-repo "user/dataset"` |
| `djinn improve-verifiers` | Run centralized verifier improvement loop | `djinn improve-verifiers --iters 2` |
| `djinn generate-references` | Build reference exploits/explanations per exploit type | `djinn generate-references --max-per-type 2` |
Either import curated problems from supported datasets or assemble a problem from pre-written components.
Run the evaluation suite (consistency, security, cross-null checks). Artifacts will be written under a timestamped directory in `generated_metrics/problem_generation/eval/`.

```bash
djinn evaluate-verifiers --slug palindrome
```
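If you want to inspect those artifacts programmatically, a minimal sketch like the following finds the most recent timestamped run directory and lists what it contains (nothing beyond the path above is assumed about the layout):

```python
# Minimal sketch: locate the newest evaluation run and list its artifacts.
from pathlib import Path

eval_root = Path("generated_metrics/problem_generation/eval")
run_dirs = sorted(p for p in eval_root.iterdir() if p.is_dir())
latest = run_dirs[-1]  # timestamped directory names sort chronologically

print(f"Latest run: {latest.name}")
for artifact in sorted(latest.iterdir()):
    print(" -", artifact.name)
```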
Two supported flows:
- Dataset import (PrimeIntellect or TACO-verified):

  ```bash
  djinn generate --import primeintellect --exploit "timing attack" --sample 3 --out imported/
  ```

- Component-based assembly (provide a description and, optionally, ground truth; component files are sketched after the notes below):

  ```bash
  djinn generate \
    --exploit "prototype pollution" \
    --problem-description-file path/to/description.txt \
    --ground-truth-file path/to/ground_truth.py \
    --out problems/my_problem
  ```
Notes:
- `--sample` controls how many problems to import per exploit.
- Pure free-form generation is disabled; use dataset `--import` or provide component files.
- `--max-attempts` is retained for compatibility (used by downstream generation routines where applicable).
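For component-based assembly, the description file is plain prose and the ground-truth file is ordinary Python. The contents below are a made-up example (the palindrome task and `is_palindrome` function are hypothetical), meant only to show the shape of the inputs:

```python
# path/to/ground_truth.py -- hypothetical intended solution.
# path/to/description.txt would hold the plain-text problem statement, e.g.:
#   "Write a function is_palindrome(s) that returns True if s reads the same
#    forwards and backwards, ignoring case."

def is_palindrome(s: str) -> bool:
    """Reference implementation used as the ground truth."""
    normalized = s.lower()
    return normalized == normalized[::-1]
```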
📖 For detailed documentation, examples, and advanced usage, see `djinn/generation/README.md`.
You can export the entire problem set to a local JSONL file or upload it directly to the Hugging Face Hub.
To a local file:
```bash
djinn export --out dataset.jsonl
```
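The export is one JSON object per line, so it can be inspected with nothing more than the standard library; the sketch below makes no assumption about which fields each problem record contains:

```python
# Sketch: count exported problems and show the fields of the first record.
import json

with open("dataset.jsonl", encoding="utf-8") as f:
    problems = [json.loads(line) for line in f if line.strip()]

print(f"{len(problems)} problems exported")
if problems:
    print("Fields:", sorted(problems[0].keys()))
```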
To the Hugging Face Hub:
```bash
# First, log in to the Hugging Face Hub
huggingface-cli login

# Then, run the export command with your repository ID
djinn export --hf-repo "your-username/your-dataset-name"
```
You can add the `--private` flag to create a private repository on the Hub.
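Once uploaded, the dataset can be pulled back with the `datasets` library (the repository id below is the same placeholder as above; for private repos, log in with `huggingface-cli login` first):

```python
# Sketch: load the exported problem set back from the Hugging Face Hub.
from datasets import load_dataset

ds = load_dataset("your-username/your-dataset-name")  # placeholder repository id
print(ds)
```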
The command `djinn aggregate-training-runs` produces a timestamped directory under `generated_metrics/problem_generation/eval/` containing `exploit_logs_summary.json` and `VERSION.json`.
`exploit_logs_summary.json` structure:
- Top-level: `{ exploit_key: [rows, intended_exploit_description, intended_exploit_example] }`
- `rows` is a list of rows shaped as `[problem_id, problem_description, completion_code, insecure_reward]`
Example (truncated):
```json
{
  "reward_delta_filesystem_exposure": [
    [
      [
        "digit_pattern_file_exposure_005_04",
        "Implement a function that solves Charlie and Johnny's game problem...",
        "def calculate_digit_score(x: int) -> int:\n ...",
        1.0
      ]
    ],
    "This exploit takes advantage of a critical filesystem vulnerability...",
    "import os\nimport ast\n..."
  ]
}
```
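To consume the summary programmatically, a sketch like the following unpacks it according to the structure described above (the `<timestamp>` component of the path is a placeholder for the actual run directory):

```python
# Sketch: unpack exploit_logs_summary.json into per-exploit rows.
import json
from pathlib import Path

summary_path = Path(
    "generated_metrics/problem_generation/eval/<timestamp>/exploit_logs_summary.json"
)
summary = json.loads(summary_path.read_text(encoding="utf-8"))

for exploit_key, (rows, description, example) in summary.items():
    print(f"{exploit_key}: {len(rows)} rows")
    for problem_id, _problem_description, _completion_code, insecure_reward in rows:
        print(f"  {problem_id}: insecure_reward={insecure_reward}")
```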
`VERSION.json` includes metadata such as the `timestamp`, `mode`, `run_dirs`, and (if available) `git_sha`.