Orak (오락) is a foundational benchmark for evaluating Large Language Model (LLM) agents in diverse popular video games. Please check out our paper and the leaderboard for more details!
*The name Orak comes from 오락 (orak), the native Korean word meaning "game".
by Dongmin Park¹*, Minkyu Kim¹*, Beongjun Choi¹*, Junhyuck Kim¹, Keon Lee¹, Jonghyun Lee¹, Inkyu Park¹, Byeong-Uk Lee¹, Jaeyoung Hwang¹, Jaewoo Ahn¹,², Ameya S. Mahabaleshwarkar³, Bilal Kartal³, Pritam Biswas³, Yoshi Suhara³, Kangwook Lee¹,⁴, Jaewoong Cho¹
¹ KRAFTON AI, ² Seoul National University, ³ NVIDIA, ⁴ University of Wisconsin-Madison
- Features
- Project Structure
- Installation
- Evaluation
- Agent Module Study
- Bonus: Freeform Gameplay with Claude
- Submission Guideline
- Cover most game genres with 12 popular titles — see full game list
- Enable plug-and-play studies of agentic modules via the Model Context Protocol (MCP) interface
- Support analysis of both LLMs and VLMs on textual and visual game states
- Easily integrate new environments, models, and custom agents with a config-driven setup — see script customization
| Action | Adventure | RPG | Simulation | Strategy | Puzzle |
|---|---|---|---|---|---|
| Street Fighter III | Ace Attorney | Pokémon Red | Minecraft | StarCraft II | Baba Is You |
| Super Mario | Her Story | Darkest Dungeon | Stardew Valley | Slay the Spire | 2048 |
MCP structure description

- `mcp_agent_client/`: Manages interaction between the agent modules in `mcp_agent_servers` and the game environment.
  - Defines an MCP client responsible for managing connections between agent servers and game servers.
  - Implements API functions for multiple LLM instances (e.g., OpenAI's GPT-4o, Meta's Llama-3.2-1B-Instruct).
  - Provides game-independent play logic with a main configuration for managing hyperparameters related to game execution.
- `mcp_agent_servers/`: Implementations of LLM/SLM-based gaming agents.
  - Implements servers (MCP tools) that communicate with the `mcp_agent_client` to support agentic modulation.
  - Defines prompts for each game and agent.
  - Compatible with platform-provided client LLMs (e.g., Claude Desktop) without relying on the APIs defined in `mcp_agent_client`.
- `mcp_game_servers/`: Collection of supported game environments.
  - Implements servers (MCP tools) that communicate with the `mcp_agent_client` to deliver and update game states.
  - Defines environment implementations for each supported game.
  - Compatible with platform-provided client LLMs (e.g., Claude Desktop) without relying on the APIs defined in `mcp_agent_client`.
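Taken together, the layout under `src/` looks roughly like this (a sketch inferred from the paths referenced elsewhere in this README; exact contents may differ):

```
src/
├── mcp_agent_client/    # MCP client, LLM API wrappers, play logic, configs/
├── mcp_agent_servers/   # agent servers (MCP tools), per-game prompts, keys/
└── mcp_game_servers/    # game servers (MCP tools), per-game environments
```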
Each game must be set up individually following the instructions in `docs/setup_{game}.md`.

Note that six of the games (Ace Attorney, Her Story, Darkest Dungeon, Stardew Valley, Slay the Spire, and Baba Is You) require a one-time purchase, typically priced between $9.99 and $24.99. The other six games are free to play.
We support both an MCP script (based on a uv environment) and plain Python scripts (based on conda environments). Both invoke the same game environments and produce identical gameplay results.
MCP version

```bash
# uv installation
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"  # windows
curl -LsSf https://astral.sh/uv/install.sh | sh                                     # macOS/Linux

# virtual environment creation
uv venv --python 3.10
.venv\Scripts\activate      # windows
source .venv/bin/activate   # macOS/Linux
uv pip install -e .

# default for most games
uv pip install -r requirements/base.txt

# for some games (supermario, etc.)
uv pip install -r requirements/base.txt
uv pip install -r requirements/{game}.txt
```
Python script version

```bash
# default for most games
conda create -n orak python=3.10
conda activate orak
pip install -r requirements/base.txt

# for some games (supermario, etc.)
conda create -n orak python=3.10
conda activate orak
pip install -r requirements/base.txt
pip install -r requirements/{game}.txt
```
To use commercial API-based LLMs (from OpenAI, Anthropic, Google, or DeepSeek), create a key file under `src/mcp_agent_servers/keys/` as follows:
API key details

- OpenAI
  - Create `src/mcp_agent_servers/keys/openai-key/key.env` and add your API key (a plain-text string starting with `sk-***`).
- Anthropic
  - Create `src/mcp_agent_servers/keys/anthropic-key/key.env` and add your API key (a plain-text string starting with `sk-***`).
- Google (current version is for Vertex AI)
  - Create `src/mcp_agent_servers/keys/google-key/gemini_gcp.json` and add your GCP JSON service account key.
- DeepSeek
  - Create `src/mcp_agent_servers/keys/deepseek-key/key.env` and add your API key (a plain-text string starting with `sk-***`).
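For example, a minimal way to create the OpenAI key file from a shell (assuming, as described above, that `key.env` contains nothing but the key string):

```bash
# hypothetical example: substitute your actual API key
mkdir -p src/mcp_agent_servers/keys/openai-key
echo "sk-***" > src/mcp_agent_servers/keys/openai-key/key.env
```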
1. Setup the Game Environment
   Follow the Game Environment Setup Guide to configure the required environment.

2. Launch the Game
   Ensure the game is running and ready for interaction. Certain games, e.g., supermario and 2048, are launched automatically. Note: some games require minor manual setup after launch, so please check the corresponding setup file in `docs/setup_{game}.md` before running.

3. Run the Gaming Agent
   - MCP version

     ```bash
     bash scripts/leaderboard/mcp/{game}.sh
     ```

   - Python script version

     ```bash
     bash scripts/leaderboard/python/{game}.sh
     ```
To run the arena setting instead, use the corresponding arena scripts:

- MCP version

  ```bash
  bash scripts/arena/mcp/{game}.sh
  ```

- Python script version

  ```bash
  bash scripts/arena/python/{game}.sh
  ```
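For instance, assuming the game key matches the `docs/setup_{game}.md` naming (e.g., `2048`), a leaderboard run with the Python scripts would be:

```bash
bash scripts/leaderboard/python/2048.sh
```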
Details of the evaluation metrics for each game are described in `docs/eval_metrics.md`.
You can easily customize the run script by specifying <Game, LLM, Agent Module, Input Type>. This enables studies of agentic strategies and input state types across all games — see the available agent list.
- MCP version

  ```bash
  uv run ./scripts/mcp_play_game.py \
      --config ./src/mcp_agent_client/configs/{game}/config.yaml \
      env.input_modality={input_modality} \
      agent.llm_name={model} \
      agent.agent_type={agent} \
      agent.prompt_path=mcp_agent_servers.{game}.prompts.{input_modality}.{agent}
  ```

  Replace `{game}`, `{model}`, `{agent}`, and `{input_modality}` with the names of the game, model, agent, and input modality you want to run. You can also customize the configuration by changing <Game, LLM, Agent Module, Input Type> in `./src/mcp_agent_client/configs/{game}/config.yaml` — see config details.
- Python script version

  ```bash
  python scripts/play_game.py --config {config_path} \
      env.input_modality={input_modality} \
      agent.llm_name={model} \
      agent.agent_type={agent} \
      agent.prompt_path=mcp_agent_servers.{game}.prompts.{input_modality}.{agent}
  ```

  Replace `{game}`, `{model}`, `{agent}`, and `{input_modality}` with the names of the game, model, agent, and input modality you want to run. You can also customize the configuration by changing <Game, LLM, Agent Module, Input Type> in `./src/mcp_agent_client/configs/{game}/config.yaml` — see config details.
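As a concrete illustration, a hypothetical Python-script run on 2048 might look like the following; the model, agent, and modality values (`gpt-4o`, `reflection`, `text`) are placeholders, so check the available agent list and the game's config for the actual names:

```bash
python scripts/play_game.py \
    --config ./src/mcp_agent_client/configs/2048/config.yaml \
    env.input_modality=text \
    agent.llm_name=gpt-4o \
    agent.agent_type=reflection \
    agent.prompt_path=mcp_agent_servers.2048.prompts.text.reflection
```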
Our MCP interface supports fully free-form, open-ended gameplay beyond standard evaluation with static agentic strategies. The LLM can decide when and how to use different tools and prompts during gameplay. For example, you can simply prompt Claude with "Play the {game} by yourself. {some instructions to use the mcp tools}", which allows Claude to take full control of gameplay decisions and tool usage. Below are video examples of Claude actively playing Ace Attorney and Baba Is You — see the Claude gameplay guideline for more details.
Videos: Claude playing Ace Attorney | Claude playing Baba Is You
You can submit your own LLM backbones and agentic strategies to our repo. Please check out the guideline in docs/submission_guidline.md. We also welcome contributions that add new games: please open a PR or reach out to dongmin.park@krafton.com, and we will credit your contribution in README.md.
```bibtex
@article{park2025orak,
  title         = {Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games},
  author        = {Park, Dongmin and Kim, Minkyu and Choi, Beongjun and Kim, Junhyuck and Lee, Keon and Lee, Jonghyun and Park, Inkyu and Lee, Byeong-Uk and Hwang, Jaeyoung and Ahn, Jaewoo and Mahabaleshwarkar, Ameya S. and Kartal, Bilal and Biswas, Pritam and Suhara, Yoshi and Lee, Kangwook and Cho, Jaewoong},
  year          = {2025},
  eprint        = {2506.03610},
  archivePrefix = {arXiv},
  note          = {arXiv:2506.03610}
}
```