This repository contains the implementation of the synthetic data generation pipeline described in "Cerberus: Safeguards for Large Language Models - Synthetic Data Generation (Part 1)".
This project uses Nix to manage development dependencies. Follow these steps to get started:
- Install Nix. On Linux/macOS:
curl --proto '=https' --tlsv1.2 -sSf -L https://install.determinate.systems/nix | sh -s -- install
Alternative (official installer):
sh <(curl -L https://nixos.org/nix/install) --daemon
- Clone the repository:
git clone <your-repo-url>
cd <your-project>
- Enter the development environment:
nix-shell
This will automatically download and install all required tools (just, act, uv).
- Verify the installation: when you enter the environment, the shell prints version information for each tool.
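To re-check the tools later, a small script such as the one below can help. It is a hypothetical helper, not something shipped in this repository; it assumes a POSIX `sh` and that each tool accepts `--version` (which `just`, `act`, and `uv` do). It prints the first line of each tool's version output and returns the number of tools it could not find.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): for each tool named in
# the arguments, report whether it is on PATH and print the first line of
# its `--version` output. Returns the number of missing tools (0 = all found).
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      printf '%s: %s\n' "$tool" "$("$tool" --version 2>&1 | head -n 1)"
    else
      printf '%s: NOT FOUND\n' "$tool" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Intended to be run from inside the nix-shell environment.
check_tools just act uv || echo "Some tools are missing; did you run nix-shell?" >&2
```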
Once in the development environment, you have access to:
- just: command runner for project tasks
- act: run GitHub Actions locally
- uv: fast Python package management
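Besides entering the shell interactively, you can run a single command with these tools in the same environment. The sketch below is a hypothetical helper (`in_dev_shell` is not part of the repository). It assumes it is called from the repository root, where `nix-shell` finds the project's Nix expression, and relies on `nix-shell` setting the `IN_NIX_SHELL` variable inside the environment.

```shell
#!/bin/sh
# Hypothetical helper (not part of the repository): run a shell command
# string in the development environment. If this shell was started by
# nix-shell (which sets IN_NIX_SHELL), run the command directly; otherwise
# delegate to `nix-shell --run`, which loads the environment for one command.
in_dev_shell() {
  if [ -n "${IN_NIX_SHELL:-}" ]; then
    sh -c "$1"
  else
    nix-shell --run "$1"
  fi
}
```

For example, `in_dev_shell 'just --list'` lists the project's `just` recipes from an ordinary terminal, while inside `nix-shell` the same call runs `just` directly. Passing the command as one quoted string keeps its quoting identical in both branches.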