Robotic Action Frame Prediction with InstructPix2Pix

This repository contains the code and configuration files for training a multimodal fine-tuned InstructPix2Pix model to predict future robotic action frames. The model generates 256×256 images conditioned on the current observation frame and a textual instruction (e.g., "stack blocks", "beat the blocks with hammer"). On synthetic RoboTwin tasks, the trained model achieves SSIM up to 0.98 and PSNR above 40 dB.


📋 Table of Contents

  • 🛠️ Environment Setup
  • 📊 Data Preparation
  • 🚀 Training
  • 📈 Evaluation
  • 🔄 Reproducibility Notes

🛠️ Environment Setup

1. Clone the Repository

git clone https://github.com/CAI991108/robotic-frame-prediction.git
cd robotic-frame-prediction

2. Create a Conda Environment

cd instruct-pix2pix
conda env create -f environment.yaml
conda activate ip2p

3. Download Pretrained Models

bash scripts/download_pretrained_sd.sh  # Stable Diffusion v1.5
bash scripts/download_checkpoints.sh   # InstructPix2Pix

📊 Data Preparation

1. Generate RoboTwin Dataset

  • Follow RoboTwin's official guide to generate episodes for three tasks:
    • block_hammer_beat
    • block_handover
    • block_stack_easy
  • Place generated data in ./RoboTwin_data (an illustrative layout is sketched below).
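
The preprocessing step walks this directory task by task. A minimal sketch of the expected layout (the episode naming here is an assumption; keep whatever structure RoboTwin actually emits):

./RoboTwin_data/
├── block_hammer_beat/
│   ├── episode_000/
│   └── ...
├── block_handover/
│   └── ...
└── block_stack_easy/
    └── ...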

2. Preprocess Data

  • Use the provided scripts to preprocess the RoboTwin dataset in two steps:
# Step 1: Extract frames and map to instructions
python ./RoboTwin/test.py --root_dir <your_RoboTwin_data_dir> --output_jsonl instructpix2pix_dataset.jsonl

# Step 2: Convert to InstructPix2Pix-compatible format
python ./instruct-pix2pix/data/instructpix2pix/data_prepare.py --input_jsonl instructpix2pix_dataset.jsonl --output_dir <your_output_dir>
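
The intermediate instructpix2pix_dataset.jsonl holds one JSON record per training pair: a current frame, a future target frame, and the task instruction. The field names below are illustrative only (the exact schema is defined by test.py); a record looks roughly like:

{"input_image": "block_stack_easy/episode_000/frame_0010.png", "edited_image": "block_stack_easy/episode_000/frame_0025.png", "edit_prompt": "stack blocks"}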

🚀 Training

1. Configure Paths

  • Edit the ./instruct-pix2pix/configs/train.yaml file to set the paths for the dataset and checkpoints:
data:
  params:
    batch_size: 2   # Batch size for training
    num_workers: 2   # Number of workers for data loading
    train:
      params:
        path: ./data/instructpix2pix  # Path to preprocessed data

2. Start Training

python ./instruct-pix2pix/main.py \
  --name default \
  --base configs/train.yaml \
  --train \
  --gpus 0,1  # Use 2 GPUs

Key Training Parameters:

  • Batch size: 2 per GPU (effective 16 with gradient accumulation; see the config sketch after this list)
  • Learning rate: 1e-4 (AdamW optimizer)
  • Epochs: 100
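
An effective batch size of 16 from 2 samples per GPU on 2 GPUs implies accumulating gradients over 4 steps (2 × 2 × 4 = 16). A minimal sketch of the corresponding train.yaml entries, assuming the PyTorch Lightning trainer keys used by the instruct-pix2pix training code (treat accumulate_grad_batches and base_learning_rate as assumptions and check them against the shipped config):

model:
  base_learning_rate: 1.0e-04   # AdamW learning rate
lightning:
  trainer:
    accumulate_grad_batches: 4  # 2 per GPU × 2 GPUs × 4 steps = effective batch 16
    max_epochs: 100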

📈 Evaluation

Visualize Predictions and SSIM/PSNR Scores

python ./instruct-pix2pix/eval.py --ckpt logs/train_default/checkpoints/last.ckpt
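
eval.py reports these metrics alongside the visualizations. To sanity-check a single prediction by hand, here is a minimal standalone sketch using scikit-image (≥ 0.19 for channel_axis; the file paths are placeholders):

import numpy as np
from PIL import Image
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Load a predicted frame and its ground-truth future frame (256×256 RGB).
pred = np.array(Image.open("pred_frame.png").convert("RGB"))
gt = np.array(Image.open("gt_frame.png").convert("RGB"))

# data_range=255 matches uint8 images; channel_axis=-1 treats RGB as channels.
ssim = structural_similarity(gt, pred, channel_axis=-1, data_range=255)
psnr = peak_signal_noise_ratio(gt, pred, data_range=255)
print(f"SSIM: {ssim:.4f}, PSNR: {psnr:.2f} dB")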

🔄 Reproducibility Notes

Hardware

  • GPUs: 2× NVIDIA RTX 2080 Ti (22GB VRAM each)
  • RAM: 134 GB

Common Issues

  • Dependency Conflicts: Install the exact package versions pinned in requirements.txt and environment.yaml.
  • OOM Errors: Reduce the batch size, enable --half-precision, or set use_ema: false (a config sketch follows below).
  • Dataset Paths: Verify that the paths in train.yaml and data_prepare.py point to your preprocessed data.
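
As a sketch, the OOM mitigations above map onto train.yaml edits like the following (key placement assumes the latent-diffusion-style config layout used by instruct-pix2pix, where use_ema sits under model.params):

data:
  params:
    batch_size: 1     # reduced from 2
model:
  params:
    use_ema: false    # skip the EMA weight copy to save GPU memory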
