SumanMadipeddi/RL_Environment_Training
This is my personal repo for RL environment engineering. I’m iterating on an agent that interacts with a custom RL-style environment and uses Anthropic tools to adapt behavior over time.

Project goals

  • The initial policy/agent should intentionally fail about 60% of the time (i.e., succeed roughly 40%) so the environment is challenging and informative.
  • Over repeated interactions, the agent should learn and demonstrate improvement, approaching the target success rate as the number of repetitions grows (see the sketch after this list).
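
For intuition, here is a minimal sketch of that calibration, assuming a simple episodic success/failure task; none of these names come from the repo's actual code:

import random

# Hypothetical sketch (not the repo's code): an episodic task calibrated so
# an untrained agent (skill = 0.0) succeeds roughly 40% of the time, with
# the success probability rising toward 1.0 as skill improves.
TARGET_INITIAL_SUCCESS = 0.40

def run_episode(skill: float) -> bool:
    p_success = TARGET_INITIAL_SUCCESS + (1.0 - TARGET_INITIAL_SUCCESS) * skill
    return random.random() < p_success

def estimate_success_rate(skill: float, episodes: int = 1000) -> float:
    wins = sum(run_episode(skill) for _ in range(episodes))
    return wins / episodes

print(f"untrained: {estimate_success_rate(0.0):.2f}")  # ~0.40
print(f"trained:   {estimate_success_rate(0.8):.2f}")  # ~0.88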

Prerequisites

  • Git (latest)
  • Python 3.11+ (see .python-version for the exact version)
  • uv for Python project management (pip install uv, or see https://docs.astral.sh/uv/)
  • An Anthropic API key

Setup (Windows PowerShell)

  1. Clone and enter the repo
git clone <repo-url>
cd RL_Environment_Training
  2. Install dependencies with uv
uv sync
  3. Set your Anthropic API key in the current shell (see the check after these steps)
$env:ANTHROPIC_API_KEY="your_api_key_here"
  4. Run the agent
uv run main.py
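
If the agent can't authenticate, a quick check like the following confirms the key is visible to the Python process (a hypothetical snippet, not part of the repo):

import os

# Assumed helper, not part of the repo: fail fast if the key is missing
# from the current shell's environment.
if not os.environ.get("ANTHROPIC_API_KEY"):
    raise SystemExit("ANTHROPIC_API_KEY is not set in this shell.")
print("API key found; ready to run the agent.")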

Execution Modes

The agent can run tests concurrently or sequentially. Change the concurrent flag at the bottom of main.py:

asyncio.run(main(concurrent=True))   # run tests concurrently
asyncio.run(main(concurrent=False))  # run tests sequentially

When running concurrently, results print as they complete (not in run order) for faster overall execution.
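
The difference between the two modes can be sketched like this; run_test and the test names are placeholders assumed for illustration, not the repo's actual functions:

import asyncio

async def run_test(name: str) -> str:
    await asyncio.sleep(0.1)  # stand-in for one agent/environment interaction
    return f"{name}: done"

async def main(concurrent: bool) -> None:
    tests = ["test_a", "test_b", "test_c"]
    if concurrent:
        # as_completed yields results as each task finishes, so output
        # order reflects completion time, not submission order
        for task in asyncio.as_completed([run_test(t) for t in tests]):
            print(await task)
    else:
        # sequential mode awaits each test before starting the next
        for t in tests:
            print(await run_test(t))

asyncio.run(main(concurrent=True))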

Benchmark reference

This project references and draws inspiration from large-scale software engineering benchmarks such as SWE-Bench Pro. For details about the dataset, task structure, and evaluation setup, see the dataset page on Hugging Face: ScaleAI/SWE-bench_Pro.
