A suite of 100+ single-, two-, and multi-player text-based games for benchmarking and training LLMs.
- 31/07/2025 We added SettlersOfCatan to TextArena!
- 14/07/2025 Announcing MindGames, a NeurIPS 2025 competition for training LLMs on various TextArena games that require theory of mind.
- 01/07/2025 Release of v0.6.9 with 100 games, simplified states, new observation wrappers for training, and default wrappers for environments.
- 01/07/2025 Release of SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning, introducing RL via self-play on TextArena games as a potential new training paradigm.
- 22/06/2025 Release of UnstableBaselines, a lightweight async online RL library for training LLMs on TextArena games.
- 16/04/2025 Release of the TextArena paper
- 14/02/2025 Release of the new, stable version for both pip and the website
- 31/01/2025 Initial demo release highlighted by Andrej Karpathy (crashing all our servers)
TextArena is a flexible and extensible framework for training, evaluating, and benchmarking models in text-based games. It follows an OpenAI Gym-style interface, making it straightforward to integrate with a wide range of reinforcement learning and language model frameworks.
Install TextArena directly from PyPI:
```bash
pip install textarena
```
The only requirement agents need to fulfill is implementing a `__call__` method that accepts a string observation and returns a string action. We have implemented a number of basic agents that you can find here. In this example, we show how you can let GPT-4o-mini play against anthropic/claude-3.5-haiku in a game of TicTacToe.
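For illustration, any callable object with that signature works as an agent. A minimal custom agent might look like the sketch below (the class itself is ours for illustration and not part of the library):

```python
class ScriptedAgent:
    """Minimal example of the agent interface: a callable that maps a
    string observation to a string action. Illustrative only."""

    def __init__(self, action: str):
        self.action = action

    def __call__(self, observation: str) -> str:
        # A real agent would parse the observation and pick a legal move;
        # this one always returns the same fixed action.
        return self.action
```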
We will be using the OpenRouterAgent, so first you need to set your OpenRouter API key:
```bash
export OPENROUTER_API_KEY="YOUR_OPENROUTER_API_KEY"
```
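If you prefer, you can also set the key from within Python before constructing the agents:

```python
import os

# Equivalent to the shell export above; set this before creating the agents.
os.environ["OPENROUTER_API_KEY"] = "YOUR_OPENROUTER_API_KEY"
```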
Now we can build the models and let them play:
```python
import textarena as ta

# Initialize agents
agents = {
    0: ta.agents.OpenRouterAgent(model_name="openai/gpt-4o-mini"),
    1: ta.agents.OpenRouterAgent(model_name="anthropic/claude-3.5-haiku"),
}

# Initialize the environment
env = ta.make(env_id="TicTacToe-v0")
# Wrap it for additional visualizations
env = ta.wrappers.SimpleRenderWrapper(env=env)

env.reset(num_players=len(agents))
done = False
while not done:
    player_id, observation = env.get_observation()
    action = agents[player_id](observation)
    done, step_info = env.step(action=action)
rewards, game_info = env.close()
```
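When the game ends, `rewards` maps each player id to a final reward and `game_info` holds additional end-of-game details, so a quick way to inspect the outcome is a loop like the one below (the exact reward values, e.g. +1/0/-1 in zero-sum games, depend on the environment):

```python
# Print the final outcome; rewards is keyed by player id.
for player_id in sorted(rewards):
    print(f"Player {player_id}: reward={rewards[player_id]}")
```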
If you use TextArena in your research, please cite:
```bibtex
@misc{guertler2025textarena,
    title={TextArena},
    author={Leon Guertler and Bobby Cheng and Simon Yu and Bo Liu and Leshem Choshen and Cheston Tan},
    year={2025},
    eprint={2504.11442},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2504.11442},
}
```
If you have any questions at all, feel free to reach out on Discord. The issues below are great starting points if you want to contribute:
- Transfer the 'How to Contribute' section from here to individual issues
- Make RushHour board generation algorithmic
- Extend 2048 to arbitrary board sizes (should be very straightforward)
- Extend FifteenPuzzle to arbitrary sizes
- Add a nice end-of-game screen to the SimpleRenderWrapper visualizations