This is my personal repo for RL environment engineering. I’m iterating on an agent that interacts with a custom RL-style environment and uses Anthropic tools to adapt behavior over time.
- The initial policy/agent should intentionally fail about 60% of the time (i.e., succeed roughly 40%) to ensure the environment is challenging and informative.
- Over repeated interactions, the agent should learn and demonstrate improvement, approaching the targeted behavior as the number of repetitions grows (a toy calibration sketch follows this list).
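
To make the calibration target concrete, here is a minimal, self-contained sketch (not the repo's actual environment or agent) in which a baseline succeeds roughly 40% of the time and improves with repetition; `BASELINE_SUCCESS` and `learning_rate` are illustrative assumptions:

```python
import random

BASELINE_SUCCESS = 0.40  # assumed initial success rate, per the goal above

def attempt(repetition: int, learning_rate: float = 0.05) -> bool:
    """Simulate one episode; success probability grows with repetitions."""
    p = min(0.95, BASELINE_SUCCESS + learning_rate * repetition)
    return random.random() < p

if __name__ == "__main__":
    runs = 10
    successes = 0
    for rep in range(runs):
        ok = attempt(rep)
        successes += ok
        print(f"rep {rep}: {'success' if ok else 'failure'}")
    print(f"success rate: {successes / runs:.0%}")
```

A ~40% floor keeps early episodes informative: failures are frequent enough to expose weaknesses without drowning out the learning signal.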
Prerequisites:

- Git (latest)
- Python 3.11+ (see `.python-version` for the exact version)
- uv for Python project management (`pip install uv`, or see https://docs.astral.sh/uv/)
- An Anthropic API key
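
As a quick sanity check for these prerequisites, here is a small Python sketch (illustrative only, not a script in this repo):

```python
import shutil
import sys

# Illustrative prerequisite check; adjust or skip as needed.
assert sys.version_info >= (3, 11), "Python 3.11+ required (see .python-version)"
assert shutil.which("git") is not None, "git not found on PATH"
assert shutil.which("uv") is not None, "uv not found on PATH (pip install uv)"
print("Prerequisites look good.")
```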
Setup:

- Clone and enter the repo

  ```bash
  git clone <repo-url>
  ```

- Install dependencies with uv

  ```bash
  uv sync
  ```

- Set your Anthropic API key in the current shell (PowerShell syntax shown)

  ```powershell
  $env:ANTHROPIC_API_KEY="your_api_key_here"
  ```
- Run the agent

  ```bash
  uv run main.py
  ```
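
Before a full run, you can optionally confirm the key is visible to the SDK. This sketch assumes the `anthropic` Python SDK is among the synced dependencies; the model name is an arbitrary small model chosen for a cheap ping, not necessarily what `main.py` uses:

```python
import os

import anthropic  # assumption: installed via `uv sync`

# Optional smoke test (not part of this repo): verify ANTHROPIC_API_KEY works.
if not os.environ.get("ANTHROPIC_API_KEY"):
    raise SystemExit("ANTHROPIC_API_KEY is not set in this shell")

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
reply = client.messages.create(
    model="claude-3-5-haiku-latest",  # hypothetical choice for a cheap ping
    max_tokens=16,
    messages=[{"role": "user", "content": "ping"}],
)
print(reply.content[0].text)
```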
The agent can run tests concurrently or sequentially. Change the `concurrent` flag at the bottom of `main.py`:

```python
asyncio.run(main(concurrent=True))   # concurrent
asyncio.run(main(concurrent=False))  # sequential
```

When running concurrently, results print as they complete (not in run order), which speeds up overall execution.
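
The out-of-order printing falls out of completion-order scheduling. Here is a standalone sketch (not the repo's `main.py`) of the two modes, assuming a `run_test` coroutine with varying duration:

```python
import asyncio
import random

async def run_test(i: int) -> str:
    # Simulated test with a random duration.
    await asyncio.sleep(random.uniform(0.1, 0.5))
    return f"test {i} done"

async def main(concurrent: bool = True) -> None:
    tests = [run_test(i) for i in range(5)]
    if concurrent:
        # Concurrent: results print in completion order, not submission order.
        for future in asyncio.as_completed(tests):
            print(await future)
    else:
        # Sequential: one test at a time, results in run order.
        for coro in tests:
            print(await coro)

if __name__ == "__main__":
    asyncio.run(main(concurrent=True))
```

Concurrent mode trades deterministic output order for wall-clock speed; sequential mode is easier to read when debugging a single test.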
This project references and draws inspiration from large-scale software engineering benchmarks such as SWE-Bench Pro. For details about the dataset, task structure, and evaluation setup, see the dataset page on Hugging Face: ScaleAI/SWE-bench_Pro.