
Orak 🎮


Orak (오락) is a foundational benchmark for evaluating Large Language Model (LLM) agents in diverse popular video games. Please check out our paper and the leaderboard for more details!

*The name Orak comes from 오락 (orak), the native Korean word meaning "game".

by Dongmin Park^1*, Minkyu Kim^1*, Beongjun Choi^1*, Junhyuck Kim^1, Keon Lee^1, Jonghyun Lee^1, Inkyu Park^1, Byeong-Uk Lee^1, Jaeyoung Hwang^1, Jaewoo Ahn^1,2, Ameya S. Mahabaleshwarkar^3, Bilal Kartal^3, Pritam Biswas^3, Yoshi Suhara^3, Kangwook Lee^1,4, Jaewoong Cho^1.

^1 KRAFTON AI, ^2 Seoul National University, ^3 NVIDIA, ^4 University of Wisconsin-Madison


Table of Contents

  1. Features
  2. Project Structure
  3. Installation
  4. Evaluation
  5. Agent Module Study
  6. Bonus: Freeform Gameplay with Claude
  7. Submission Guideline

Features

  • Cover most game genres with 12 popular titles — see full game list
  • Enable plug-and-play studies of agentic modules via the Model Context Protocol (MCP) interface
  • Support analysis of both LLMs and VLMs on textual and visual game states
  • Easily integrate new environments, models, and custom agents with a config-driven setup — see script customization

Project Structure

Game List

| Action | Adventure | RPG | Simulation | Strategy | Puzzle |
| --- | --- | --- | --- | --- | --- |
| Street Fighter III | Ace Attorney | Pokémon Red | Minecraft | StarCraft II | Baba Is You |
| Super Mario | Her Story | Darkest Dungeon | Stardew Valley | Slay the Spire | 2048 |

Core Modules

MCP structure description
  • mcp_agent_client/: Manages interaction between the agent modules in mcp_agent_servers and the game environment.
    • Defines an MCP client responsible for managing connections between agent servers and game servers.
    • Implements API functions for multiple LLM instances (e.g., OpenAI's GPT-4o, Meta's Llama-3.2-1B-Instruct).
    • Provides game-independent play logic with a main configuration for managing hyperparameters related to game execution.
  • mcp_agent_servers/: Implementations of LLM/SLM-based gaming agents.
    • Implements servers (MCP tools) that communicate with the mcp_agent_client to support plug-and-play agentic modules.
    • Defines prompts for each game and agent.
    • Compatible with platform-provided client LLMs (e.g., Claude Desktop) without relying on the APIs defined in mcp_agent_client.
  • mcp_game_servers/: Collection of supported game environments.
    • Implements servers (MCP tools) that communicate with the mcp_agent_client to deliver and update game states (see the minimal server sketch after this list).
    • Defines environment implementations for each supported game.
    • Compatible with platform-provided client LLMs (e.g., Claude Desktop) without relying on the APIs defined in mcp_agent_client.
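
To make the server side concrete, below is a minimal, hypothetical sketch of an Orak-style game server exposing MCP tools via the official MCP Python SDK (FastMCP). The server name, tool names, and state format are illustrative assumptions, not Orak's actual interface.

```python
# Minimal sketch (assumed interface, not Orak's actual code): a game server
# exposing two MCP tools, one to read the game state and one to apply an action.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("example-game-server")  # hypothetical server name

@mcp.tool()
def get_game_state() -> str:
    """Return the current textual game state for the agent client."""
    return "stage=1; hp=100; position=(0, 0)"  # placeholder state

@mcp.tool()
def send_action(action: str) -> str:
    """Apply an agent-chosen action to the running game and report the outcome."""
    # A real server would forward this to the game process and read back the result.
    return f"applied: {action}"

if __name__ == "__main__":
    mcp.run()  # serves over stdio, so any MCP client (including Claude Desktop) can connect
```

An MCP client such as mcp_agent_client, or a platform-provided client like Claude Desktop, can then discover and call these tools during play.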

Installation

1. Game Setup

Each game must be set up individually following the instructions in docs/setup_{game}.md. Note that six games (Ace Attorney, Her Story, Darkest Dungeon, Stardew Valley, Slay the Spire, and Baba Is You) require a one-time purchase, typically priced between $9.99 and $24.99. The other six games are free to play.

2. Python Environment

We support both an MCP script (based on a uv environment) and Python scripts (based on conda environments). Both invoke the same game environment and produce identical gameplay results.

MCP version

uv installation

powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex" # windows
curl -LsSf https://astral.sh/uv/install.sh | sh # macos/linux

virtual environment creation

uv venv --python 3.10
.venv\Scripts\activate # windows
source .venv/bin/activate # macos/linux
uv pip install -e .

default for most games

uv pip install -r requirements/base.txt

for some games with extra dependencies (e.g., supermario)

uv pip install -r requirements/base.txt
uv pip install -r requirements/{game}.txt
Python script version

default for most games

conda create -n orak python=3.10
conda activate orak
pip install -r requirements/base.txt

for some games with extra dependencies (e.g., supermario)

conda create -n orak python=3.10
conda activate orak
pip install -r requirements/base.txt
pip install -r requirements/{game}.txt

3. API Key Setup

To use commercial API-based LLMs (from OpenAI, Anthropic, Google, or DeepSeek), create a key file under src/mcp_agent_servers/keys/ as follows:

API key details
  • OpenAI
    • create src/mcp_agent_servers/keys/openai-key/key.env and add your API key (as a plain-text string starting with sk-***)
  • Anthropic
    • create src/mcp_agent_servers/keys/anthropic-key/key.env and add your API key (as a plain-text string starting with sk-ant-***)
  • Google (the current version targets Vertex AI)
    • create src/mcp_agent_servers/keys/google-key/gemini_gcp.json and add your GCP JSON service account key
  • DeepSeek
    • create src/mcp_agent_servers/keys/deepseek-key/key.env and add your API key (as a plain-text string starting with sk-***)
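
If you prefer to script this setup, a small helper along the following lines creates the layout described above. The helper itself and the OPENAI_API_KEY environment variable are our own illustrative assumptions, not part of Orak:

```python
# Illustrative helper (not part of Orak): writes the OpenAI key file in the
# layout described above. Sourcing the key from OPENAI_API_KEY is an assumption.
import os
from pathlib import Path

key_dir = Path("src/mcp_agent_servers/keys/openai-key")
key_dir.mkdir(parents=True, exist_ok=True)
(key_dir / "key.env").write_text(os.environ["OPENAI_API_KEY"])
print(f"wrote {key_dir / 'key.env'}")
```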

Evaluation

Leaderboard (Single-player)

  1. Setup the Game Environment
    Follow the Game Environment Setup Guide to configure the required environment.

  2. Launch the Game
    Ensure the game is running and ready for interaction. Some games (e.g., supermario and 2048) are launched automatically. Note: some games require minor manual setup after launch, so please check the corresponding setup file in docs/setup_{game}.md before running.

  3. Run the Gaming Agent

  • MCP version
    bash scripts/leaderboard/mcp/{game}.sh
  • python script version
    bash scripts/leaderboard/python/{game}.sh
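
Under the hood, these scripts drive the game-independent play logic described in Core Modules above. The following is a heavily hedged sketch of that loop; the classes, method names, and termination condition are all assumptions for illustration, with real calls going over MCP rather than direct method calls:

```python
# Hypothetical sketch of the client-side play loop (not Orak's actual API).
# GameServer and Agent are stand-in stubs for the MCP game server and agent module.
class GameServer:
    def get_game_state(self) -> str:
        return "stage=1; hp=100"
    def send_action(self, action: str) -> str:
        return f"applied: {action}"

class Agent:
    def choose_action(self, state: str) -> str:
        return "move_right"  # a real agent module would query an LLM here

def play(server: GameServer, agent: Agent, max_steps: int = 10) -> None:
    for step in range(max_steps):
        state = server.get_game_state()      # read the current game state
        action = agent.choose_action(state)  # LLM/agent module picks an action
        print(step, server.send_action(action))

play(GameServer(), Agent())
```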

Battle Arena (Two-players)

  • MCP version
    bash scripts/arena/mcp/{game}.sh
  • python script version
    bash scripts/arena/python/{game}.sh

We describe the evaluation metrics for each game in docs/eval_metrics.md.

Run Your Custom Script

You can easily customize the run script by specifying <Game, LLM, Agent Module, Input Type>. This enables studies of agentic strategies and input state types across all games — see the available agent list.

  • MCP version

    uv run ./scripts/mcp_play_game.py \
       --config ./src/mcp_agent_client/configs/{game}/config.yaml \
          env.input_modality={input_modality} \
          agent.llm_name={model} \
          agent.agent_type={agent} \
          agent.prompt_path=mcp_agent_servers.{game}.prompts.{input_modality}.{agent}

    Replace {game}, {model}, {agent}, and {input_modality} with the names of the ones you want to run. You can also customize the configuration by changing <Game, LLM, Agent Module, Input Type> in ./src/mcp_agent_client/configs/{game}/config.yaml — see config details.

  • Python script version

    python scripts/play_game.py --config {config_path} \
       env.input_modality={input_modality} \
       agent.llm_name={model} \
       agent.agent_type={agent} \
       agent.prompt_path=mcp_agent_servers.{game}.prompts.{input_modality}.{agent}

    Replace {game}, {model}, {agent}, and {input_modality} with the names of the ones you want to run. You can also customize the configuration by changing <Game, LLM, Agent Module, Input Type> in ./src/mcp_agent_client/configs/{game}/config.yaml — see config details. A sketch of how these dotted overrides compose is shown below.
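
The override syntax above matches the OmegaConf dotlist style. That Orak actually parses overrides this way is an assumption on our part, suggested only by the syntax, and the field values below are illustrative:

```python
# Hedged sketch: illustrates OmegaConf-style dotlist overrides. That Orak uses
# OmegaConf, and all field values below, are assumptions for illustration only.
from omegaconf import OmegaConf

base = OmegaConf.create({
    "env": {"input_modality": "text"},
    "agent": {"llm_name": "gpt-4o", "agent_type": "zeroshot"},
})

overrides = OmegaConf.from_dotlist([
    "env.input_modality=image",
    "agent.agent_type=reflection",
])

cfg = OmegaConf.merge(base, overrides)
print(cfg.agent.agent_type)  # -> reflection
```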

Bonus: Freeform Gameplay with Claude via MCP

Our MCP interface supports fully free-form, open-ended gameplay beyond standard evaluation with static agentic strategies. The LLM can decide when and how to use different tools and prompts during gameplay. For example, you can simply prompt Claude with "Play the {game} by yourself. {some instructions to use the mcp tools}", which allows Claude to take full control of gameplay decisions and tool usage. Below are video examples of Claude actively playing Ace Attorney and Baba Is You — see the Claude gameplay guideline for more details.


Claude playing Ace Attorney

Claude playing Baba Is You

Submission Guideline

You can submit your own LLM backbones and agentic strategies to our repo; please check out the guideline in docs/submission_guidline.md. We also welcome contributions that add new games: please open a PR or reach out to dongmin.park@krafton.com, and we will credit your contribution in README.md.

Citation

@article{park2025orak,
  title     = {Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games},
  author    = {Park, Dongmin and Kim, Minkyu and Choi, Beongjun and Kim, Junhyuck and Lee, Keon and Lee, Jonghyun and Park, Inkyu and Lee, Byeong-Uk and Hwang, Jaeyoung and Ahn, Jaewoo and Mahabaleshwarkar, Ameya S. and Kartal, Bilal and Biswas, Pritam and Suhara, Yoshi and Lee, Kangwook and Cho, Jaewoong},
  year      = {2025},
  eprint    = {2506.03610},
  archivePrefix = {arXiv},
  note      = {arXiv:2506.03610}
}
