An AI reasoning benchmark built on reverse-engineering byte transformations. Test your model's algorithmic thinking capabilities!
"Here’s the input, here’s the output, guess how we got from one to the other. Write a 5-line Python function."
GTA-Benchmark challenges participants to reverse-engineer hidden transformation algorithms by examining input-output pairs. Each puzzle provides 24 visible test cases for analysis and 24 hidden test cases for validation.
Paste your function, hit [Submit], and get instant feedback.
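As an illustration of what a submission might look like, here is a minimal sketch. The function name `transform`, its bytes-in, bytes-out signature, and the XOR rule are all assumptions made for this example; the entry-point name the benchmark expects and the hidden algorithms themselves are not specified here.

```python
def transform(data: bytes) -> bytes:
    """Hypothetical solution: XOR every byte with a fixed key.

    The name, signature, and XOR rule are illustrative assumptions,
    not the benchmark's actual interface or any real puzzle's answer.
    """
    key = 0x5A  # hypothetical key, as if inferred from the visible pairs
    return bytes(b ^ key for b in data)
```

XOR with a fixed key is its own inverse, so applying this sketch twice returns the original buffer, which is a quick sanity check when guessing at this family of transformations.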
A running instance of GTA-Benchmark is available at http://138.197.66.242:5000/
- Python 3.9 or higher
- Docker
- pip (Python package installer)
- Example puzzles are included in puzzles/examples/
- Actual benchmark puzzles are kept private
- All test buffers are 64 bytes
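Before submitting, a candidate function can be checked locally against the visible input-output pairs. The sketch below assumes the pairs have already been loaded as `(input, expected_output)` tuples of `bytes`; the on-disk format of the example puzzles is not described here, so loading is left out, and the helper name `failing_cases` is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

BUFFER_SIZE = 64  # stated above: all test buffers are 64 bytes


def failing_cases(
    candidate: Callable[[bytes], bytes],
    pairs: Sequence[Tuple[bytes, bytes]],
) -> List[int]:
    """Return the indices of the pairs that the candidate gets wrong."""
    failures = []
    for index, (given, expected) in enumerate(pairs):
        # Assumption: the 64-byte rule applies at least to the inputs.
        if len(given) != BUFFER_SIZE:
            raise ValueError(f"pair {index}: input is not {BUFFER_SIZE} bytes")
        try:
            ok = candidate(given) == expected
        except Exception:
            ok = False  # a crash counts as a failed case
        if not ok:
            failures.append(index)
    return failures


# Made-up pair for demonstration only: the "hidden" rule is byte reversal.
sample = bytes(range(BUFFER_SIZE))
demo_pairs = [(sample, sample[::-1])]
```

An empty list from `failing_cases(my_function, demo_pairs)` means every visible pair matched; it says nothing about the 24 hidden cases, which only the server can check.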
graph TD
subgraph User Interaction
User[User] -->|Interacts with| WebUI[Web Interface]
end
subgraph Backend
WebUI -->|Sends code to| Flask[Flask App]
Flask -->|Processes request via| API[API Endpoint]
API -->|Invokes| Docker[Docker Sandbox]
Docker -->|Runs| Runner[Code Runner]
Runner -->|Generates| Results[Results]
Results -->|Stores results in| Database[SQLite Database]
Database -->|Updates| Leaderboard[Leaderboard]
Results -->|Returns to| Flask
Flask -->|Displays to| WebUI
WebUI -->|Fetches leaderboard from| Leaderboard
end
subgraph Source Control
GitHub[GitHub Repository] -->|Hosts code for| Flask & Docker & Runner
end
classDef interaction fill:#727,stroke:#222,stroke-width:2px;
classDef backend fill:#229,stroke:#222,stroke-width:2px;
classDef source fill:#292,stroke:#222,stroke-width:2px;
class User,WebUI interaction;
class Flask,API,Docker,Runner,Results,Database,Leaderboard backend;
class GitHub source;
- Clone the repository
- Install requirements:
pip install -r requirements.txt
- Make sure Docker is running
- Run the server:
python app.py
- Access the web interface at http://localhost:5000
The system uses Docker for secure sandbox execution of submitted solutions. Make sure:
- Docker is installed and running
- The current user has permission to run Docker commands
- The Python image python:3.9-slim can be pulled from Docker Hub
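The three checks above can be scripted. The following is a rough sketch rather than part of the project: the function name `docker_ready` and the timeout values are assumptions, while `docker info` and `docker pull` are standard Docker CLI commands.

```python
import shutil
import subprocess

IMAGE = "python:3.9-slim"  # image named in the checklist above


def docker_ready(image: str = IMAGE) -> bool:
    """Return True if Docker is installed, reachable, and can pull the image."""
    if shutil.which("docker") is None:
        return False  # Docker CLI is not on PATH
    try:
        # Fails when the daemon is not running or the current user
        # lacks permission to talk to it.
        subprocess.run(["docker", "info"], check=True,
                       capture_output=True, timeout=30)
        # Fails when the image cannot be fetched from Docker Hub.
        subprocess.run(["docker", "pull", image], check=True,
                       capture_output=True, timeout=300)
    except (subprocess.SubprocessError, OSError):
        return False
    return True
```

Calling `docker_ready()` before starting the server gives an early yes-or-no answer instead of a failure on the first submission.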
All user-submitted code runs in an isolated Docker container with:
- Memory limit: 64MB
- Execution timeout: 3 seconds
- Network access: Disabled
- Read-only filesystem
- Process limit: 100
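To make these limits concrete, here is a sketch of how they map onto standard `docker run` flags. This is not the project's actual runner: the function names, the script path `/code/solution.py`, and the way the timeout is enforced are assumptions, and the step that gets the submitted code into the container is left out.

```python
import subprocess
from typing import List


def sandbox_command(image: str = "python:3.9-slim",
                    script: str = "/code/solution.py") -> List[str]:
    """Build a `docker run` argv that applies the limits listed above."""
    return [
        "docker", "run", "--rm",
        "--memory=64m",      # memory limit: 64MB
        "--network=none",    # network access disabled
        "--read-only",       # read-only root filesystem
        "--pids-limit=100",  # process limit: 100
        image, "python", script,  # hypothetical script path
    ]


def run_sandboxed(cmd: List[str], timeout_s: float = 3.0):
    """Run the command; raises subprocess.TimeoutExpired after timeout_s."""
    return subprocess.run(cmd, capture_output=True, timeout=timeout_s)
```

One caveat of this approach: the timeout in `subprocess.run` stops the `docker` client process, which may leave the container itself running, so a real runner would also need to stop or kill the container when the 3 seconds expire.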