Example tasks from the MIKASA-Robo benchmark
🎉 NOW AVAILABLE ON PIP! 🎉
pip install mikasa-robo-suite
🎯 ALL 32 DATASETS NOW AVAILABLE! 🎯
Check out our memory-intensive training datasets for Offline RL!
- Overview
- Key Features
- List of Tasks
- Quick Start
- Training
- MIKASA-Robo Ideology
- Datasets for Offline RL
- Citation
MIKASA-Robo is a comprehensive benchmark suite for memory-intensive robotic manipulation tasks, part of the MIKASA (Memory-Intensive Skills Assessment Suite for Agents) framework. It features:
- 12 distinct task types with varying difficulty levels
- 32 total tasks covering different memory aspects
- 32 vision-based datasets for Offline RL
- First benchmark specifically designed for testing agent memory in robotic manipulation
- Diverse Memory Testing: Covers four fundamental memory types:
- Object Memory
- Spatial Memory
- Sequential Memory
- Memory Capacity
- Built on ManiSkill3: Leverages the powerful ManiSkill3 framework, providing:
- GPU parallelization
- User-friendly interface
- Customizable environments
Total: 32 memory-intensive tabletop robotic manipulation tasks in 12 groups. T denotes the episode timeout.
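For example, the episode timeout T of a registered task can be queried from the Gymnasium registry. The snippet below is a minimal sketch, assuming the tasks are registered on import (as in the quick-start examples further down); only a few of the 32 task IDs are shown.
# Minimal sketch: query the episode timeout (T) of registered MIKASA-Robo tasks.
# Assumption: importing mikasa_robo_suite registers the "-v0" task IDs used in this README.
import mikasa_robo_suite  # importing registers the environments
from gymnasium.envs.registration import registry

for env_id in ["ShellGameTouch-v0", "RememberColor9-v0", "ChainOfColors7-v0"]:
    spec = registry.get(env_id)
    print(env_id, "T =", spec.max_episode_steps if spec is not None else "not registered")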
# Local installation
git clone git@github.com:CognitiveAISystems/MIKASA-Robo.git
cd MIKASA-Robo
pip install -e .
# Installation from PyPI
pip install mikasa-robo-suite
import mikasa_robo_suite
from mikasa_robo_suite.utils.wrappers import StateOnlyTensorToDictWrapper
from tqdm.notebook import tqdm
import torch
import gymnasium as gym
# Create the environment via gym.make()
# obs_mode="rgb" for modes "RGB", "RGB+joint", "RGB+oracle" etc.
# obs_mode="state" for mode "state"
episode_timeout = 90
env = gym.make("RememberColor9-v0", num_envs=4, obs_mode="rgb", render_mode="all")
env = StateOnlyTensorToDictWrapper(env) # * always use this wrapper!
obs, _ = env.reset(seed=42)
print(obs.keys())
for i in tqdm(range(episode_timeout)):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(torch.from_numpy(action))
env.close()
MIKASA-Robo provides special task-specific and task-agnostic wrappers that let you track the progress of agent training, the reward agents receive, the number of steps agents have taken, and the individual contribution of each reward component. Using these wrappers is optional, but if you decide not to use them, remember that env = StateOnlyTensorToDictWrapper(env)
must always be applied to get the correct observation keys! For more details, see quick_start.ipynb.
import mikasa_robo_suite
from mikasa_robo_suite.dataset_collectors.get_mikasa_robo_datasets import env_info
from tqdm.notebook import tqdm
import torch
import gymnasium as gym
env_name = "RememberColor9-v0"
obs_mode = "rgb" # or "state"
num_envs = 4
seed = 42
env = gym.make(env_name, num_envs=num_envs, obs_mode=obs_mode, render_mode="all")
state_wrappers_list, episode_timeout = env_info(env_name)
print(f"Episode timeout: {episode_timeout}")
for wrapper_class, wrapper_kwargs in state_wrappers_list:
    env = wrapper_class(env, **wrapper_kwargs)
obs, _ = env.reset(seed=seed)
print(obs.keys())
for i in tqdm(range(episode_timeout)):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(torch.from_numpy(action))
env.close()
import mikasa_robo_suite
from mikasa_robo_suite.utils.wrappers import *
from mikasa_robo_suite.memory_envs import *
import gymnasium as gym
from gymnasium.envs.registration import registry
from tqdm.notebook import tqdm
env_name = "ShellGameTouch-v0"
obs_mode = "state"
num_envs = 4
seed = 42
env = gym.make(env_name, num_envs=num_envs, obs_mode=obs_mode, render_mode="all")
max_steps = registry.get(env_name).max_episode_steps
print(f"Episode timeout: {max_steps}")
env = StateOnlyTensorToDictWrapper(env)
env = InitialZeroActionWrapper(env, n_initial_steps=1)
env = ShellGameRenderCupInfoWrapper(env)
env = RenderStepInfoWrapper(env)
env = RenderRewardInfoWrapper(env)
env = DebugRewardWrapper(env)
obs, _ = env.reset(seed=seed)
print(obs.keys())
for i in tqdm(range(max_steps)):
    action = env.action_space.sample()
    obs, reward, terminated, truncated, info = env.step(torch.from_numpy(action))
env.close()
MIKASA-Robo supports multiple training configurations:
python3 baselines/ppo/ppo_memtasks.py \
--env_id=RememberColor9-v0 \
--exp-name=remember-color-9-v0 \
--num-steps=60 \
--num_eval_steps=180 \
--include-state
python3 baselines/ppo/ppo_memtasks.py \
--env_id=RememberColor9-v0 \
--exp-name=remember-color-9-v0 \
--num-steps=60 \
--num_eval_steps=180 \
--include-rgb \
--include-joints
python3 baselines/ppo/ppo_memtasks_lstm.py \
--env_id=RememberColor9-v0 \
--exp-name=remember-color-9-v0 \
--num-steps=60 \
--num_eval_steps=180 \
--include-rgb \
--include-joints
To train with sparse rewards, add --reward-mode=sparse.
The agent's memory capabilities can be assessed only when the environment demands memory and the observations are provided in an appropriate format. Currently, we have implemented several training modes:
- state: In this mode, the agent receives comprehensive, vectorized information about the environment, joints, and TCP pose, along with oracle data that is essential for solving memory-intensive tasks. When trained in this way, the agent addresses an MDP problem and does not require memory.
- RGB+joints: Here, the agent receives image data from a camera mounted above and from the manipulator's gripper, along with the positions and velocities of its joints. This mode provides no additional information, meaning the agent must learn to store and utilize the oracle data itself. It is designed to test the agent's memory capabilities.
These training modes are selected with the corresponding flags:
# To train in `state` mode:
--include-state
# To train in `RGB+joints` mode:
--include-rgb \
--include-joints
# Additionally, for debugging you can add oracle information to the observation:
--include-oracle
For Offline RL, we have prepared ready-made datasets that are available for use immediately after download, as well as checkpoints of trained oracle agents for collecting datasets of any size for all MIKASA-Robo tasks with a single script.
To let you quickly start offline training, we provide datasets for all 32 MIKASA-Robo tasks, each consisting of 1000 episodes and available on Hugging Face (~200 GB in total); a programmatic download sketch follows the list below:
- Download ShellGameTouch-v0 dataset
- Download ShellGamePush-v0 dataset
- Download ShellGamePick-v0 dataset
- Download InterceptSlow-v0 dataset
- Download InterceptMedium-v0 dataset
- Download InterceptFast-v0 dataset
- Download InterceptGrabSlow-v0 dataset
- Download InterceptGrabMedium-v0 dataset
- Download InterceptGrabFast-v0 dataset
- Download RotateLenientPos-v0 dataset
- Download RotateLenientPosNeg-v0 dataset
- Download RotateStrictPos-v0 dataset
- Download RotateStrictPosNeg-v0 dataset
- Download TakeItBack-v0 dataset
- Download RememberColor3-v0 dataset
- Download RememberColor5-v0 dataset
- Download RememberColor9-v0 dataset
- Download RememberShape3-v0 dataset
- Download RememberShape5-v0 dataset
- Download RememberShape9-v0 dataset
- Download RememberShapeAndColor3x2-v0 dataset
- Download RememberShapeAndColor3x3-v0 dataset
- Download RememberShapeAndColor5x3-v0 dataset
- Download BunchOfColors3-v0 dataset
- Download BunchOfColors5-v0 dataset
- Download BunchOfColors7-v0 dataset
- Download SeqOfColors3-v0 dataset
- Download SeqOfColors5-v0 dataset
- Download SeqOfColors7-v0 dataset
- Download ChainOfColors3-v0 dataset
- Download ChainOfColors5-v0 dataset
- Download ChainOfColors7-v0 dataset
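The datasets can also be fetched programmatically. The snippet below is a minimal sketch using huggingface_hub; it assumes the episode files live in the avanturist/mikasa-robo dataset repository (the one hosting oracle_checkpoints.zip below) under per-task folders, so verify repo_id and paths against the download links above.
# Hedged sketch: download a single task's dataset with huggingface_hub.
# Assumptions: episodes are hosted in the avanturist/mikasa-robo dataset repository
# and laid out as <task-id>/train_data_*.npz; adjust repo_id/allow_patterns if not.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="avanturist/mikasa-robo",
    repo_type="dataset",
    allow_patterns=["ShellGameTouch-v0/*"],  # fetch only one task's episodes
)
print(local_dir)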
Example of the dataset structure for ShellGameTouch-v0 with episode timeout T = 90:
import numpy as np
episode = np.load('ShellGameTouch-v0/train_data_781.npz')
print(episode['rgb'].shape) # (90, 128, 128, 6) - two RGB images (view from above and from the gripper)
print(episode['joints'].shape) # (90, 25) - joint positions and velocities, and Tool Center Point (TCP) position and rotation
print(episode['action'].shape) # (90, 8) - action (8-dimensional vector)
print(episode['reward'].shape) # (90, ) - (dense) reward for each step
print(episode['success'].shape) # (90,) - (sparse) success flag for each step
print(episode['done'].shape) # (90, ) - done flag for each step
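For offline training, these episodes can be wrapped in a simple per-episode loader. The following is a minimal sketch assuming the files are stored as <task-id>/train_data_*.npz with the keys shown above; the class name is ours, not part of the package.
# Hedged sketch: a minimal PyTorch Dataset over the .npz episodes described above.
# Assumption: episodes are stored as <task-id>/train_data_*.npz with keys
# 'rgb', 'joints', 'action', 'reward', 'success', 'done'.
import glob
import numpy as np
import torch
from torch.utils.data import Dataset

class MikasaRoboEpisodes(Dataset):
    def __init__(self, root="ShellGameTouch-v0"):
        self.paths = sorted(glob.glob(f"{root}/train_data_*.npz"))

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        ep = np.load(self.paths[idx])
        return {
            "rgb": torch.from_numpy(ep["rgb"]),        # (T, 128, 128, 6)
            "joints": torch.from_numpy(ep["joints"]),  # (T, 25)
            "action": torch.from_numpy(ep["action"]),  # (T, 8)
            "reward": torch.from_numpy(ep["reward"]),  # (T,)
            "done": torch.from_numpy(ep["done"]),      # (T,)
        }

# Usage: whole-episode batches, e.g. for recurrent offline RL baselines.
# loader = torch.utils.data.DataLoader(MikasaRoboEpisodes("ShellGameTouch-v0"), batch_size=8, shuffle=True)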
Download the checkpoints (160 MB) of pretrained oracle agents for further dataset collection:
cd MIKASA-Robo
wget https://huggingface.co/datasets/avanturist/mikasa-robo/resolve/main/oracle_checkpoints.zip
unzip oracle_checkpoints.zip
Or, if you want to train oracle agents from scratch, use this code:
# For single task:
python3 mikasa_robo_suite/dataset_collectors/get_dataset_collectors_ckpt.py --env_id=ShellGameTouch-v0
# For all tasks:
python3 mikasa_robo_suite/dataset_collectors/parallel_training_manager.py
Once you have downloaded or trained the oracle agent checkpoints, you can build datasets of arbitrary size (in multiples of 250 episodes) for any MIKASA-Robo task:
# For single task:
python3 mikasa_robo_suite/dataset_collectors/get_mikasa_robo_datasets.py \
--env-id=ShellGameTouch-v0 \
--path-to-save-data="data" \
--ckpt-dir="." \
--num-train-data=1000
# For all tasks:
python3 mikasa_robo_suite/dataset_collectors/parallel_dataset_collection_manager.py \
--path-to-save-data="data" \
--ckpt-dir="." \
--num-train-data=1000
If you find our work useful, please cite our paper:
@misc{cherepanov2025mikasa,
title={Memory, Benchmark & Robots: A Benchmark for Solving Complex Tasks with Reinforcement Learning},
author={Egor Cherepanov and Nikita Kachaev and Alexey K. Kovalev and Aleksandr I. Panov},
year={2025},
eprint={2502.10550},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2502.10550},
}