
GCRL_UR10e is a sim-to-real reinforcement learning framework for the UR10e robotic arm, using goal-conditioned policies trained in simulation and deployed with either servoJ or moveJ control. The system leverages visual goals and supports real-time testing for reach-and-grasp tasks.


Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection

This repository implements the method presented in the paper:
"Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection"
Huiyi Wang, Fahim Shahriar, Seyed Alireza Azimi, Gautham Vasan, A. Rupam Mahmood, Colin Bellinger
Accepted at the CoRL 2024 Workshop on Minimalist Robot Learning (MRM-D)
📄 Read the paper


Overview

This project investigates how goal-conditioned reinforcement learning (GCRL) can be enhanced using mask-based goal representations derived from natural language descriptions of target objects. The method enables a single manipulation policy to generalize across a wide variety of objects and goal configurations.

This repository includes:

  • A simulation environment for the UR10e robot
  • Integration with a physical UR10e robot
  • A trained mask-conditioned PPO policy
  • Instructions to train in simulation and deploy on hardware

Key Idea

Traditional GCRL approaches often struggle to generalize to new target objects. This work shows that binary goal masks—either ground-truth or generated by a pre-trained object grounding model—enable better generalization and faster learning than alternative goal conditioning strategies such as one-hot vectors or cropped target images.

In particular, we use a pre-trained object grounding model (GroundingDINO + SAM) to convert a textual goal description (e.g., “apple on the right”) into a binary mask that highlights the object’s location in the scene. This goal mask is updated at every timestep, allowing the agent to:

  • Track progress toward the goal
  • Receive implicit feedback
  • Mitigate the sparse reward problem
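
The snippet below is a minimal, illustrative sketch of this text-to-mask step using the GroundingDINO and Segment Anything packages installed later in this README; the config path, checkpoint names, and thresholds are placeholders rather than the exact values used in this repository.

# Minimal illustrative sketch of the text-prompt -> binary-goal-mask step; paths,
# checkpoints, and thresholds below are placeholders, not the repository's values.
import numpy as np
import torch
from groundingdino.util import box_ops
from groundingdino.util.inference import load_model, load_image, predict
from segment_anything import SamPredictor, sam_model_registry

def text_to_goal_mask(image_path, prompt):
    # 1. GroundingDINO: detect a bounding box for the object named in the prompt.
    dino = load_model("GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py",
                      "groundingdino_swint_ogc.pth")
    image_source, image = load_image(image_path)        # RGB uint8 array + model input tensor
    boxes, logits, phrases = predict(model=dino, image=image, caption=prompt,
                                     box_threshold=0.35, text_threshold=0.25)
    h, w = image_source.shape[:2]
    if len(boxes) == 0:                                  # nothing detected -> empty mask
        return np.zeros((h, w), dtype=np.uint8)
    best = int(torch.argmax(logits))                     # keep the highest-confidence detection
    box_xyxy = box_ops.box_cxcywh_to_xyxy(boxes[best]) * torch.tensor([w, h, w, h])

    # 2. SAM: segment inside the detected box to obtain a binary goal mask.
    sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
    predictor = SamPredictor(sam)
    predictor.set_image(image_source)
    masks, _, _ = predictor.predict(box=box_xyxy.numpy(), multimask_output=False)
    return masks[0].astype(np.uint8)                     # (H, W), 1 on the target object

goal_mask = text_to_goal_mask("scene.png", "green apple")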

The RL policy is conditioned on:

  • RGB image
  • Proprioceptive state
  • Binary goal mask (updated at each timestep)
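
As an illustration (not the repository's exact observation code), the binary mask can be stacked as an extra image channel alongside the RGB frame, with the proprioceptive state kept as a separate vector. The four-channel layout below is an assumption consistent with the --channel_num 4 training argument used later; the dict keys and shapes are likewise illustrative.

# Illustrative only: one way to pack RGB, proprioception, and the goal mask into a
# single observation. The dict keys, shapes, and 4-channel layout are assumptions.
import numpy as np

def build_observation(rgb, goal_mask, joint_state):
    # rgb: (H, W, 3) uint8; goal_mask: (H, W) values in {0, 1}; joint_state: (n_joints,)
    mask_channel = (goal_mask.astype(np.uint8) * 255)[..., None]
    image_obs = np.concatenate([rgb, mask_channel], axis=-1)          # (H, W, 4)
    return {"image": image_obs,
            "proprio": np.asarray(joint_state, dtype=np.float32)}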



Method Summary

  • Text Prompt → Grounded Object Detector → Binary Mask
    Leverages vision-language grounding to generate object-specific goal representations.

  • Goal Conditioning Variants Compared:

    • One-hot vector (baseline)
    • Goal object image crop
    • Binary goal mask (proposed)
  • Learning Algorithm:
    PPO (Proximal Policy Optimization) trained on visual, proprioceptive, and mask inputs


Results

Goal Representation    Seen Objects (In-Distribution)    Unseen Objects (Out-of-Distribution)
One-hot Vector         13%                               20%
Goal Object Image      62%                               28%
GT Binary Mask         89%                               90%
  • Binary masks enable strong zero-shot generalization to novel target objects.
  • Training with GT masks transfers well to DINO-generated masks on seen objects (~90% success).
  • Performance with real-time DINO-generated masks degrades in cluttered scenes due to detection noise.

Disclaimer

This codebase is provided for research purposes. Users are fully responsible for validating and testing any part of the code—both in simulation and on real robotic systems.
The authors and contributors assume no liability for any damage, failure, or unexpected behavior that may result from deploying the provided code on physical hardware. Proceed with caution and validate thoroughly in controlled environments.


Code and Usage

Clone the Repository

git clone https://github.com/cherylwang20/GCRL_UR10e.git
cd GCRL_UR10e
git submodule update --init --recursive

Installation

You will also need an external pre-trained object grounding model for object inference. We use GroundingDINO (GDINO) here; the model should already be cloned through the submodule step above. Please follow the installation instructions in the GDINO repository to make sure your CUDA, PyTorch, and GPU versions are compatible.

cd GroundingDINO
pip install -e .

Note on PyTorch 2.0 Compatibility:
If you encounter an error with value.type() in ms_deform_attn_cuda.cu, replace it with value.scalar_type() in:

groundingdino/models/GroundingDINO/csrc/MsDeformAttn/ms_deform_attn_cuda.cu

Set Up the Virtual Environment

Use Python 3.9 (later versions may cause issues with loading the baseline):

python3.9 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Load Submodules

UR10e Gym Environment (mj_envs)

cd mj_envs
pip install -e .

Download the Pre-trained Policy

mkdir -p policy
gdown 'https://drive.google.com/uc?id=1wKpIUVp2kXvf_Lq1VV7aKIoERLOS6QtW' -O policy/baseline.zip

Training a New Policy

To train a new policy, run:

python training/Train_reach.py --env_name 'UR10eReach1C-v1' --group 'Reach_4C_dt20' --num_envs 4 --learning_rate 0.0003 --clip_range 0.1 --seed=0 --channel_num 4 --fs 20

Training Script Arguments

--env_name 'UR10eReach1C-v1' : Specifies the UR10e environment for training.

--group 'Reach_4C_dt20' : Name of the experiment group for logging.

--num_envs 4 : Number of parallel environments.

--learning_rate 0.0003 : Learning rate for PPO.

--clip_range 0.1 : PPO clip range for stable policy updates.

--seed 0 : Random seed, often set via SLURM for batch runs.

--channel_num 4 : Number of input image channels.

--fs 20 : Frame skip (simulation step interval).
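
For orientation, the snippet below shows roughly how these arguments map onto a standard PPO setup, assuming Stable-Baselines3 and a Gym environment registered by the mj_envs submodule; the actual Train_reach.py may differ (for example, --channel_num and --fs configure the environment and observation rather than PPO itself).

# Illustrative only: a rough Stable-Baselines3 equivalent of the arguments above.
# The actual Train_reach.py may differ; env registration via mj_envs is assumed.
import mj_envs  # registers the UR10e environments (assumed)
from stable_baselines3 import PPO
from stable_baselines3.common.env_util import make_vec_env

vec_env = make_vec_env("UR10eReach1C-v1", n_envs=4, seed=0)   # --env_name, --num_envs, --seed
model = PPO(
    "MultiInputPolicy",       # policy class is an assumption; depends on the observation space
    vec_env,
    learning_rate=3e-4,       # --learning_rate
    clip_range=0.1,           # --clip_range
    seed=0,
    verbose=1,
)
model.learn(total_timesteps=1_000_000)
model.save("policy/my_reach_policy")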

Evaluate an Existing Policy

python training/Eval_Baseline.py --env_name "UR10eReach1C-v1" --model_num "baseline"
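
If you want to drive the policy from your own script, a rough equivalent of the evaluation loop might look like the sketch below, assuming the baseline is a Stable-Baselines3 PPO checkpoint and the environment is registered by mj_envs; the actual Eval_Baseline.py may differ in detail.

# Illustrative only: loading the baseline as a Stable-Baselines3 PPO checkpoint and
# rolling out one episode.
import gym
import mj_envs  # registers the UR10e environments (assumed)
from stable_baselines3 import PPO

env = gym.make("UR10eReach1C-v1")
model = PPO.load("policy/baseline")

obs = env.reset()
done = False
while not done:
    action, _ = model.predict(obs, deterministic=True)
    obs, reward, done, info = env.step(action)
env.close()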

Sim2Real

To achieve effective sim2real transfer, we fine-tune the policy trained above with observation image augmentation, using continued training. To train with image augmentation, download the resized external images (originally from OpenX) into the background directory from https://mcgill-my.sharepoint.com/:u:/g/personal/huiyi_wang_mail_mcgill_ca/EZM8oZL_PPVIiOtrbl8Gy0sBLTBYWjd18TOdrS43WULVdA?e=ZBfhfY.


Use the following command:

python training/Train_reach.py --env_name "UR10eReach1C-v1" --group 'Reach_4C_dt20_cont' --num_envs 4 --learning_rate 0.0003 --clip_range 0.1 --seed=0 --channel_num 4 --fs 20 --merge True --cont "Your Previous Policy"

No changes to the hyperparameters or reward shaping are required. We trained for an additional 1 million steps until full convergence. Without this augmentation, the policy shows a lack of sim2real transferability.
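
For intuition, background-image augmentation can be as simple as replacing the pixels the simulator marks as background with a randomly sampled external image. The sketch below is a simplified illustration only, not the repository's implementation; the folder name and the availability of a background mask are assumptions.

# Simplified illustration only (not the repository's implementation): replace pixels
# the simulator marks as background with a randomly sampled downloaded image.
import random
from pathlib import Path
import numpy as np
from PIL import Image

BACKGROUND_DIR = Path("background")   # assumed folder holding the resized OpenX images

def augment_with_background(rgb, background_mask):
    # rgb: (H, W, 3) uint8 simulated camera frame
    # background_mask: (H, W) bool, True where the pixel is sim background (assumed available)
    bg_path = random.choice(list(BACKGROUND_DIR.glob("*")))
    bg = np.asarray(Image.open(bg_path).convert("RGB").resize(rgb.shape[1::-1]))
    out = rgb.copy()
    out[background_mask] = bg[background_mask]
    return out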


Details on the Robotic Setup

Getting Started

  • The robot's initial joint configuration is:
    [4.7799, -2.0740, 2.6200, 3.0542, -1.5800, 1.4305e-05] (in radians), with the gripper fully open.
  • Place target objects 30–50 cm in front of the camera, making sure they are visible at the start.

  • The camera is mounted on the Robotiq gripper using a custom 3D-printed bracket, angled roughly 17 degrees downward. It is essential that the gripper is visible in the camera view.
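
For reference, moving the arm to this initial configuration with moveJ (and streaming servoJ targets during policy execution) could look like the sketch below using the ur_rtde Python bindings; this is illustrative only, and the robot IP address and motion parameters are placeholders rather than this repository's actual driver code.

# Illustrative only: driving the UR10e with the ur_rtde Python bindings; the robot IP
# and motion parameters are placeholders, and this repository's driver code may differ.
from rtde_control import RTDEControlInterface

ROBOT_IP = "192.168.1.10"                                         # placeholder controller address
HOME_Q = [4.7799, -2.0740, 2.6200, 3.0542, -1.5800, 1.4305e-05]   # radians, from above

rtde_c = RTDEControlInterface(ROBOT_IP)
rtde_c.moveJ(HOME_Q, 0.5, 0.5)   # (q, speed, acceleration): point-to-point move to the start pose
# During policy execution, servoJ streams joint targets at a fixed control period, e.g.:
# rtde_c.servoJ(q_target, 0.0, 0.0, 0.002, 0.1, 300)
rtde_c.stopScript()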

Sample Video Demonstration for UR10e Reaching

  • G.DINO Prompt: Green Apple
  • Control: ServoJ

📝 Citation

@inproceedings{
    wang2024goalconditioned,
    title={Versatile and Generalizable Manipulation via Goal-Conditioned Reinforcement Learning with Grounded Object Detection},
    author={Huiyi Wang and Fahim Shahriar and Seyed Alireza Azimi and Gautham Vasan and A. Rupam Mahmood and Colin Bellinger},
    booktitle={CoRL 2024 Workshop on Minimalist Robot Learning (MRM-D)},
    year={2024},
    url={https://openreview.net/forum?id=TgXIkK8WPQ}
}
