Webpage: nisutte.github.io
Paper: arxiv.org/abs/2503.02405
Voxel SERL builds upon the original [SERL](https://github.com/rail-berkeley/serl) implementation by incorporating additional modalities into the reinforcement learning pipeline. It uses 3D spatial perception to improve the robustness of real-world vacuum gripping.
Prerequisites:
- NVIDIA driver and CUDA/cuDNN matching your target JAX build (only if using GPU)
- uv installed (recommended): see https://docs.astral.sh/uv/ or install it on Linux:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh   # install uv
uv sync                                           # set up the environment
source .venv/bin/activate                         # activate the environment (not strictly required)
```
By default this project uses CUDA 12 builds via `jax[cuda12]` (CUDA 12 / cuDNN 8 toolchain), as specified in `pyproject.toml`:

```toml
# pyproject.toml → [project].dependencies (default)
"jax[cuda12]==0.4.25",
```

To switch to CUDA 11 builds, change the extra to `cuda11` and re-sync. For CPU-only installs, you can use `"jax==0.4.25"` (no CUDA extra) with the same jaxlib version and re-run `uv sync`.
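After syncing, a quick sanity check that JAX picked up the intended backend is to list its devices (this is plain JAX usage, not a script from this repo):

```python
# Minimal check that JAX sees the intended backend (GPU vs. CPU).
import jax

print(jax.__version__)        # should match the pinned version, e.g. 0.4.25
print(jax.default_backend())  # "gpu" for CUDA builds, "cpu" otherwise
print(jax.devices())          # e.g. [cuda(id=0)] or [CpuDevice(id=0)]
```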
| Code Directory | Description |
|---|---|
| robot_controllers | Impedance controller for the UR5 robot arm |
| box_picking_env | Environment setup for the box-picking task |
| vision | Point-cloud based encoders |
| utils | Point-cloud fusion and voxelization |
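The voxelization in `utils` boils down to binning the fused points into a fixed-size occupancy grid around the gripper. The snippet below is only a minimal sketch of that idea; the grid size and workspace bounds are made-up values, not the ones used in the repo:

```python
import numpy as np

def voxelize(points: np.ndarray,
             bounds_min=(-0.1, -0.1, 0.0),
             bounds_max=(0.1, 0.1, 0.2),
             grid_size=32) -> np.ndarray:
    """Bin an (N, 3) point cloud into a binary occupancy grid (illustrative values)."""
    lo, hi = np.asarray(bounds_min), np.asarray(bounds_max)
    # Keep only points inside the workspace box.
    pts = points[np.all((points >= lo) & (points < hi), axis=1)]
    # Map each remaining point to an integer voxel index.
    idx = ((pts - lo) / (hi - lo) * grid_size).astype(int)
    idx = np.clip(idx, 0, grid_size - 1)
    grid = np.zeros((grid_size,) * 3, dtype=np.float32)
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    return grid

# Example: random points in and around the box -> (32, 32, 32) occupancy grid.
print(voxelize(np.random.uniform(-0.2, 0.3, size=(10_000, 3))).shape)
```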
- Follow the installation instructions in the official SERL repo.
- Check `envs` and either use the provided `box_picking_env` or set up a new environment using it as a template. (New environments have to be registered here; a registration sketch follows this list.)
- Use the config file to configure all robot-arm-specific parameters, as well as gripper and camera info.
- Go to the box-picking folder and modify the scripts `run_learner.py` and `run_actor.py`. If no images are used, set `camera_mode` to `none`. WandB logging can be deactivated by setting `debug` to `True`.
- Record 20 demonstrations using `record_demo.py` in the same folder. Double-check that `camera_mode` and all environment wrappers are identical to `drq_policy.py`.
- Execute `run_learner.py` and `run_actor.py` simultaneously to start the RL training.
- To evaluate a policy, modify and execute `run_evaluation.py` with the specified checkpoint path and step.
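For the registration step mentioned above, a Gymnasium-style registration looks roughly like the sketch below; the id and entry point are placeholders, not the names actually used in this repo:

```python
# Hypothetical registration of a custom box-picking environment.
from gymnasium.envs.registration import register

register(
    id="MyBoxPicking-v0",                                    # placeholder id
    entry_point="my_package.box_picking_env:BoxPickingEnv",  # placeholder entry point
    max_episode_steps=100,
)

# The env can then be created the usual way:
# import gymnasium as gym
# env = gym.make("MyBoxPicking-v0")
```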
In this setup, two UR5 robotic arms are positioned facing each other to perform a box handover using suction grippers. Each arm is equipped with a wrist-mounted D405 camera that supplies voxelized, localized point cloud data to the RL pipeline. The episode begins after a scripted box pickup, which is handled during the environment reset. Both robots are jointly controlled by a single RL policy operating in a 14-dimensional action space. Safety mechanisms automatically detect and handle collisions if the arms move too close together or if excessive forces are detected during the handover.
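Conceptually, the 14-dimensional action is just the two per-arm commands stacked left-then-right (a 6-DoF pose command plus one suction/gripper value per arm). A minimal sketch of that split, with the caveat that the exact ordering and semantics live in the env code:

```python
import numpy as np

def split_dual_action(action: np.ndarray):
    """Split a 14D dual-arm action into two per-arm commands.

    Assumed layout: [left (7) || right (7)], each arm being a
    6-DoF pose command plus one gripper/suction value.
    """
    assert action.shape == (14,)
    left, right = action[:7], action[7:]
    return (
        {"pose": left[:6], "gripper": left[6]},
        {"pose": right[:6], "gripper": right[6]},
    )

left_cmd, right_cmd = split_dual_action(np.zeros(14))
```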
- The Onshape design file for the D405 camera holder is available here.
| File | Description |
|---|---|
| controller_client.py | New controller written in C++ (the old one can still be used) |
| dual_ur5_env.py | Dual-robot env core; relative frames/velocities, start poses, BT pickup; 14D action, 80+D obs incl. EE-to-EE relative pose |
| box_handover_env.py | Handover logic, safety, drop prevention; integrates collision handling and terminations |
| config.py | Handover parameters and variants; 90° handover WIP |
| threaded_collision_detection.py | Threaded collision checks incl. the suction cup; used for safety/termination |
| voxel_grid_encoders.py | VoxNet backbone (shared; freeze/unfreeze supported; illustrative sketch below) |
| observation_statistics_wrapper.py | Observation statistics and normalization (RLDS-derived) |
| relative_env.py | Relative frames/velocities; EMAs for force and velocity |
| drq.py | DrQ: true critic ensemble, ReLU, narrower tanh std, noise augmentation, grad-norm logging |
| encoding.py | Encoder wiring incl. VoxNet and actor/critic inputs |
| data_store.py | Async RLDS logging; end-of-episode flush; consistency checks |
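For orientation, a VoxNet-style encoder is essentially a small stack of 3D convolutions over the occupancy grid followed by a dense projection. The Flax sketch below is illustrative only; the layer sizes, normalization placement, and freeze/unfreeze mechanics in `voxel_grid_encoders.py` may differ:

```python
import jax
import jax.numpy as jnp
import flax.linen as nn

class TinyVoxNet(nn.Module):
    """Illustrative VoxNet-style 3D-conv encoder (not the repo's exact architecture)."""
    embed_dim: int = 128

    @nn.compact
    def __call__(self, voxels: jnp.ndarray) -> jnp.ndarray:
        # voxels: (batch, D, H, W, 1) occupancy grid
        x = nn.Conv(32, kernel_size=(5, 5, 5), strides=(2, 2, 2))(voxels)
        x = nn.relu(nn.LayerNorm()(x))
        x = nn.Conv(64, kernel_size=(3, 3, 3), strides=(2, 2, 2))(x)
        x = nn.relu(nn.LayerNorm()(x))
        x = x.reshape((x.shape[0], -1))      # flatten spatial dims
        return nn.Dense(self.embed_dim)(x)   # voxel embedding

# Shape check with a dummy (1, 32, 32, 32, 1) grid.
model = TinyVoxNet()
dummy = jnp.zeros((1, 32, 32, 32, 1))
params = model.init(jax.random.PRNGKey(0), dummy)
print(model.apply(params, dummy).shape)  # (1, 128)
```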
Calibration: For robot-to-robot (base-to-base) calibration, I used my supervisor's repository (which unfortunately is not public). The process involves attaching ChArUco markers to the end effector and performing hand-eye calibration for both robots. Ultimately, the goal is to obtain the transformation matrix `T_robotBaseLeft2robotBaseRight`, saved as a `.npy` file.
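Once that matrix is available, using it is just a matter of loading the 4×4 homogeneous transform and applying it to points (or poses) expressed in the left robot's base frame. A minimal sketch, assuming the `.npy` file holds a 4×4 matrix; the file path is a placeholder:

```python
import numpy as np

# Placeholder path; point this at wherever the calibration result is saved.
T_left2right = np.load("T_robotBaseLeft2robotBaseRight.npy")
assert T_left2right.shape == (4, 4)  # homogeneous transform

def left_to_right(p_left: np.ndarray) -> np.ndarray:
    """Express a 3D point given in the left base frame in the right base frame."""
    p_h = np.append(p_left, 1.0)     # homogeneous coordinates
    return (T_left2right @ p_h)[:3]

print(left_to_right(np.array([0.4, 0.0, 0.3])))
```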
- Action space: 14D (7-DoF pose+grip per arm; concatenated left||right).
- Observation (state): basic keys and shapes below. Images (if enabled) are passed through as `left/*` and `right/*`.
| Key | Shape | Notes |
|---|---|---|
| left/tcp_pose | 6 | TCP pose (xyz, MRP) |
| left/tcp_vel | 6 | Linear xyz, angular MRP |
| left/gripper_state | 2 | Left gripper object detection and pressure |
| left/tcp_force | 3 | Force at left TCP |
| left/tcp_torque | 3 | Torque at left TCP |
| left/action | 7 | Last applied action for left arm |
| right/tcp_pose | 6 | |
| right/tcp_vel | 6 | |
| right/gripper_state | 2 | |
| right/tcp_force | 3 | |
| right/tcp_torque | 3 | |
| right/action | 7 | |
| l2r/tcp_pose | 6 | Left-EE to right-EE relative pose |
| l2r/tcp_vel | 6 | Relative velocity (in left EE frame) |
| r2l/tcp_pose | 6 | Right-EE to left-EE relative pose |
| r2l/tcp_vel | 6 | Relative velocity (in right EE frame) |
Notes:
- Pose representation is quaternion by default; with `DualToMrpWrapper` it becomes 6D (xyz + MRP). Relative angular rates are mapped accordingly.
- Observation normalization (means/stds) is computed from RLDS and applied via wrappers.
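For reference, the MRP representation can be obtained from a unit quaternion as the vector part divided by one plus the scalar part. A small sketch, assuming an (x, y, z, w) quaternion ordering (the wrapper's exact convention may differ):

```python
import numpy as np

def quat_to_mrp(q: np.ndarray) -> np.ndarray:
    """Unit quaternion (x, y, z, w) -> Modified Rodrigues Parameters."""
    q = q / np.linalg.norm(q)
    xyz, w = q[:3], q[3]
    if w < 0.0:            # flip to the shorter rotation to avoid the w = -1 singularity
        xyz, w = -xyz, -w
    return xyz / (1.0 + w)

print(quat_to_mrp(np.array([0.0, 0.0, 0.0, 1.0])))  # identity rotation -> [0. 0. 0.]
```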
Development log:
- Make dual robot pointcloud work
- set up BT to grip a box at the start
- set up start poses
- come up with a good reward
- collision detection between robots
- add difference parameters between the EE
- share the same VoxNet backbone (less GPU memory; can also be frozen)
- add seconds passed since last action to the state
- improve camera frame polling (sometimes very slow)
- camera lag time is ~60 ms, quite high, but I cannot do much about it
- actions in the obs are transformed to the base frame, which we do not want (so the action input differs from the action obs)
- do some demos and first training runs
- ADD MAX FORCE
- add huge negative penalty for dropping box
- fix observation statistics wrapper
- I have to fix the "End of file" bug in the controller, otherwise I make no progress
- Do proper normalization (over the RLDS dataset)
- remodel to immediate reward (on the chosen actions)
- make it impossible to drop the box, not just a huge negative reward (the policy is dumb)
- make RLDS save the point clouds as well, so that I can post-train the VoxNet with the captured data
- make a data consistency checker for the replay buffer!
- clean up relative env mess (once again, sigh...)
- make training more stable and add more useful obs
- added state augmentation (noise)
- changed activation to relu
- made tanh distribution narrower (less noisy actions, less jitter)
- really ensemblize critic!
- update the VoxNet with the new ReLU and LayerNorm
- Make a simple uv setup for future usage (from requirements, and also add external JAX links, tough...)
- examine ensemble sizes and subsampling
- make position augmentation for pose data (random noise)
- train only on pc data, like in the picking task
- Add pose estimation to automate the pickup
- Automate the pickup by integrating pose estimation of the boxes
- Implement PointNet in JAX for the backbone, compare it to VoxNet
- Add simulation pretraining to the repo, to speed up training in the real world.