Towers of Hanoi AI Test

Tests OpenAI reasoning models on the classic Towers of Hanoi puzzle. The AI gets up to double the theoretical minimum moves to solve the puzzle, with outcomes based on efficiency.

No, this does not test the exact same scenario from that Apple paper.

Setup

Install dependencies:

pip install -r requirements.txt

Set OpenAI API key:

cp .env.example .env
# Edit .env and add: OPENAI_API_KEY=your_key_here

Usage

# Interactive mode
python run.py

# Auto mode (no pauses)
python run.py 4 --auto

# Mock AI testing (no API key needed)
python run.py 3 --mock optimal --auto
python run.py 3 --mock invalid-move --auto
python run.py 3 --mock budget-exceeded --auto

Mock AI Modes 🤖

Test the application without OpenAI API calls using mock AI clients:

--mock optimal - Perfect optimal solver (100% efficiency)
--mock invalid-move - Tests error handling with invalid moves
--mock budget-exceeded - Tests budget enforcement with inefficient moves

How it works

AI gets up to 2x the theoretical minimum moves (2 * (2^n - 1)) to solve n disks
AI receives comprehensive enhanced context: last 3 moves + corresponding previous game states + previous reasoning
AI can learn from its own past reasoning patterns and build upon successful strategies
AI is instructed to avoid cycling back to previous states (prevents inefficient loops)
Test stops when puzzle is solved, when an invalid move is detected, or when the budget is exceeded
Results auto-exported to output/ with timestamps
Efficiency calculated only for completed games - incomplete games show "N/A"

Results

OPTIMAL_SUCCESS: AI solved within theoretical minimum moves (2^n - 1)
SUCCESS: AI solved within 2x budget but used more than minimum moves
FAILURE: AI exceeded 2x budget or made invalid move

Mock AI Testing

The application includes a mock AI system for testing without OpenAI API calls:

# Test perfect optimal solution
python run.py 3 --mock optimal --auto

# Test error handling
python run.py 3 --mock invalid-move --auto

# Test budget enforcement  
python run.py 3 --mock budget-exceeded --auto

Name		Name	Last commit message	Last commit date
Latest commit History 18 Commits
.vscode		.vscode
src		src
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt
run.py		run.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Towers of Hanoi AI Test

Setup

Usage

Mock AI Modes 🤖

How it works

Results

Mock AI Testing

About

Uh oh!

Releases

Packages

Languages

grankko/hanoi-redemption

Folders and files

Latest commit

History

Repository files navigation

Towers of Hanoi AI Test

Setup

Usage

Mock AI Modes 🤖

How it works

Results

Mock AI Testing

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages