ESOT500syn: Synthetic Multi-Modality Perception Dataset Generation Pipeline

ESOT500syn is an easy-to-use and highly extensible synthetic event-stream dataset generation pipeline. It currently generates datasets for the following tasks, which are the primary focus for which this repository was developed:

  • Object Bounding Box Tracking (modal/amodal)
  • Object Mask Tracking (modal/amodal)
  • 6D Pose Estimation and Tracking

It can additionally generate datasets for other tasks (such as Depth Estimation and Visual Odometry), depending on user-defined configurations. ESOT500syn uses the ManiSkill3 framework to generate RGB datasets and the V2CE-Toolbox to convert them into event streams.

License: MIT


Key Features

  • Fully Configuration-Driven: Define entire datasets (scenes, assets, lighting, object motion, camera motion, and more) in human-readable YAML files. No code changes are needed to generate new scenarios.
  • Procedural Generation Pipeline: A staged workflow lets you first generate thousands of deterministic sequence configurations in YAML, then run the simulation on those configurations to produce the RGB dataset, and finally convert the RGB dataset into event streams. Each stage is a separate, robust batch-processing step.
  • Extensive Motion Library: Features a rich, expandable library of deterministic (motion patterns are hard-coded) and stochastic (motion patterns incorporate randomness) motion patterns for both objects and the camera.
  • Composable Motion System: Combine simple motion patterns (e.g., "oscillate" + "spin") to create arbitrarily complex and challenging object trajectories; see the sketch after this list.
  • Rich Domain Randomization: Automatically randomize scenes, target objects, distractor objects, lighting conditions, and camera perspectives to create highly diverse datasets.
  • Intelligent Asset Spawning: Ensures that all procedurally generated objects are spawned within the camera's field of view using a visibility-checking algorithm.
  • Automatic Annotation Generation: Automatically saves RGB frames, modal/amodal masks, 2D modal/amodal bounding boxes, and 6D object poses in camera coordinates for each sequence.
  • Integrated Event Stream Conversion: Seamlessly converts the generated RGB sequences into event streams using the integrated V2CE-Toolbox submodule.
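
One way to think about the Composable Motion System is as chaining per-step pose generators. The following is a minimal sketch of that idea, assuming the documented (mixin, start_pose, config) pattern signature; the compose_patterns helper is hypothetical and not part of the library:

def compose_patterns(*patterns):
    # Chain simple motion patterns: each one refines the pose produced by the
    # previous pattern for the current simulation step.
    def composite(mixin, start_pose, config):
        pose = start_pose
        for pattern in patterns:
            pose = pattern(mixin, pose, config)
        return pose
    return composite

# e.g., given the functions registered as "oscillate" and "spin" in
# src/esot500syn/motion/object.py:
# oscillate_and_spin = compose_patterns(oscillate_fn, spin_fn)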

Project Structure

The project is organized into a modular structure to ensure clarity and extensibility:

ESOT500syn/
├── configs/                  # All user-facing configuration files
│   ├── meta_base_config.yaml        # Shared settings for all sequences
│   ├── meta_gen_config.yaml         # Rules for randomizing the dataset
│   └── single_sequence_configs.yaml # Just a demo configuration
│
├── libs/                     # Third-party code managed as git submodules
│   └── V2CE-Toolbox/         # Convert RGB into event streams
│
├── scripts/                  # User-facing executable scripts
│   ├── generate_batch_configs.py     # Stage 1: Generate all sequence configs
│   ├── generate_batch_sequence.py    # Stage 2: Run simulation for all generated configs
│   ├── generate_batch_events.py      # Stage 3: Convert all RGB sequences to events
│   ├── generate_single_sequence.py   # runs a single config file
│   ├── imgs_to_video.py              # Convert RGB sequence into video
│   └── vis_events.py                 # Visualize event stream
│
└── src/
    └── esot500syn/           # The core Python library for the project
        ├── runner.py           # Core simulation loop and setup logic
        ├── motion/             # Object and camera motion definitions
        ├── processing/         # Annotation logic
        ├── simulation/         # ManiSkill environment wrappers and mixins
        └── demo/               # Some demo scripts

Setup and Installation

1. Prerequisites:

  • A Conda or venv environment (Python 3.10+ is recommended).

2. Clone the Repository: Clone this repository along with its git submodules (for V2CE-Toolbox):

git clone --recurse-submodules https://github.com/your-username/ESOT500syn.git
cd ESOT500syn

3. Install Dependencies: This project uses pyproject.toml for dependency management. Install the project in editable mode:

pip install -e .

4. Download Required Assets:

  • ManiSkill Assets: Ensure you have downloaded the required scene datasets (e.g., ReplicaCAD, ArchitecTHOR) and the YCB object models as per the ManiSkill3 documentation.
    python -m mani_skill.utils.download_asset ReplicaCAD
    python -m mani_skill.utils.download_asset RoboCasa
    python -m mani_skill.utils.download_asset AI2THOR
    python -m mani_skill.utils.download_asset ycb
  • V2CE Model Weights: Download the pre-trained model from the V2CE Google Drive link and place it in the following directory:
    libs/V2CE-Toolbox/weights/v2ce_3d.pt
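    An optional sanity check (not part of the project's scripts) to confirm the weights landed in the expected location:

    # Verify the V2CE model weights are where the pipeline expects them.
    from pathlib import Path

    weights = Path("libs/V2CE-Toolbox/weights/v2ce_3d.pt")
    if weights.is_file():
        print(f"Found V2CE weights: {weights} ({weights.stat().st_size / 1e6:.1f} MB)")
    else:
        raise FileNotFoundError(f"Missing {weights}; download it from the V2CE Google Drive link.")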
    

5. Note on Headless Rendering: ESOT500syn builds on the ManiSkill3 simulation framework but was developed on a headless machine, so all code targets headless runs. In addition, to reduce the visual sim-to-real gap as much as possible, ESOT500syn enables ManiSkill3's ray-tracing renderer by default when producing RGB datasets. Ray tracing did not work reliably on the GPU with ManiSkill3 during development, so the default rendering device is the CPU. You can change both the device and the rendering technique in meta_base_config.yaml.
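
If you prefer to flip these settings programmatically rather than editing the file by hand, a sketch like the following works; the key names render_device and shader are assumptions, so check configs/meta_base_config.yaml for the actual field names:

# Sketch of toggling renderer settings in the base config. The keys
# "render_device" and "shader" are illustrative assumptions.
import yaml

path = "configs/meta_base_config.yaml"
with open(path) as f:
    cfg = yaml.safe_load(f)

cfg["render_device"] = "cpu"  # hypothetical key: ray tracing runs on the CPU by default
cfg["shader"] = "rt"          # hypothetical key: ManiSkill3's ray-tracing shader

with open(path, "w") as f:
    yaml.safe_dump(cfg, f, sort_keys=False)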

Usage: The 3-Stage Generation Workflow

This project uses a three-stage process to generate the final event stream dataset.

Stage 1: Generate Sequence Configurations

This stage reads your randomization rules (defined in meta_gen_config.yaml and meta_base_config.yaml) and generates a unique, deterministic configuration file for every sequence in your dataset.

➡️ Action: Run the generate_batch_configs.py script.

python path/to/scripts/generate_batch_configs.py \
    --base_config path/to/meta_base_config.yaml \
    --gen_config path/to/meta_gen_config.yaml \
    --output_dir path/to/output_dir

⬅️ Result: The path/to/output_dir/ directory will be populated with seq_0000/, seq_0001/, etc., each containing a config.yaml file.
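
A quick, illustrative way to confirm Stage 1 produced what you expect is to count the generated per-sequence configs:

# Count the per-sequence configuration files produced by Stage 1.
from pathlib import Path

output_dir = Path("path/to/output_dir")
configs = sorted(output_dir.glob("seq_*/config.yaml"))
print(f"{len(configs)} sequence configurations found")
print("first:", configs[0] if configs else "none")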

Stage 2: Run Batch Simulation

This stage iterates through the previously generated configurations and runs the ManiSkill simulation for each one, saving the RGB frames and annotations.

➡️ Action: Run the generate_batch_sequence.py script, pointing it to the directory from Stage 1.

python path/to/scripts/generate_batch_sequence.py \
    --configs_dir path/to/output_dir \
    --runner_script path/to/generate_single_sequence.py

⬅️ Result: The output directory specified in your meta_base_config.yaml (e.g., path/to/output_dir/seq_{index}/{scene}/{target_object}) will be populated with the full data for each sequence (rgb/, modal_mask/, amodal_mask/, annotations.json).
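
To sanity-check one finished sequence, you can count its frames and confirm the annotation files exist. The nested seq_{index}/{scene}/{target_object} layout below follows the example above, so adjust it to your own output settings:

# Illustrative check of a single Stage 2 output folder.
from pathlib import Path

seq_dir = next(Path("path/to/output_dir").glob("seq_0000/*/*"))
print(f"{seq_dir}: {len(list((seq_dir / 'rgb').glob('*')))} RGB frames")
for name in ("modal_mask", "amodal_mask"):
    print(f"{name}/ exists:", (seq_dir / name).is_dir())
print("annotations.json exists:", (seq_dir / "annotations.json").is_file())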

Stage 3: Convert RGB to Events

This final stage converts all the generated RGB sequences into event streams.

➡️ Action: Run the generate_batch_events.py script, pointing it to the dataset directory from Stage 2.

# Many other optional parameters are available; see the source of scripts/generate_batch_events.py, or check the V2CE-Toolbox directly.
python scripts/generate_batch_events.py \
    --dataset_dir path/to/output_dir \
    --fps 30 # The fps of your input RGB sequences

⬅️ Result: Each seq_{index}/ directory will now also contain an .npz file with the event stream data.
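
The array names inside the .npz depend on the V2CE-Toolbox output format, so the quickest way to see what you got is to list whatever is stored:

# Peek inside a generated event file; array names come from the V2CE-Toolbox.
from pathlib import Path
import numpy as np

npz_path = next(Path("path/to/output_dir/seq_0000").rglob("*.npz"))
with np.load(npz_path) as data:
    for key in data.files:
        print(key, data[key].shape, data[key].dtype)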

Customizing Your Dataset

The entire generation process is controlled by configs/meta_base_config.yaml and configs/meta_gen_config.yaml.

  • meta_base_config.yaml: Defines static parameters shared across all generated sequences, such as image resolution, simulation quality, and batch settings (seed, num_sequences). It also provides fallback values, like a default lighting setup if randomization is disabled.

  • meta_gen_config.yaml: This is the creative heart of your dataset. It defines the rules and sampling space for randomization. To change the characteristics of your dataset, simply edit this file:

    • Scenes: Adjust scene_id_ranges to control which environments are used.
    • Assets & Motion: Modify the motion_pool lists to change the dynamics. For example, to make objects move slower, reduce the speed ranges. To have more static distractors, duplicate the { type: "static" } entry in the distractor_motion_pool.
    • Camera: Add specific camera poses for certain scenes in poses_by_scene for artistic control, or adjust the motion_pool to favor more dynamic or static camera work.
    • Lighting: Uncomment the lighting block in continuous_sampling to enable fully randomized lighting for every sequence.
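
To see which randomization rules are currently active, you can load the file and inspect the keys mentioned above; the exact nesting may differ from this sketch:

# Quick look at the active randomization rules (exact nesting may differ).
import yaml

with open("configs/meta_gen_config.yaml") as f:
    gen_cfg = yaml.safe_load(f)

for key in ("scene_id_ranges", "motion_pool", "distractor_motion_pool",
            "poses_by_scene", "continuous_sampling"):
    print(key, "->", gen_cfg.get(key, "<not set at top level>"))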

Extending the Motion Library

Adding new custom motions is designed to be simple and requires no changes to the core runner logic.

  1. Open the relevant file, e.g., src/esot500syn/motion/object.py or src/esot500syn/motion/camera.py.
  2. Create a new Python function that accepts parameters (mixin, start_pose, config) or (step, env, sensor, initial_pose, cfg) and returns a sapien.Pose.
  3. Decorate the function with @register_motion_pattern("your_new_motion_name").

Example:

@register_motion_pattern("new_spiral_motion")
def motion_spiral(mixin, start_pose, config):
    # ... your logic to calculate new_p and new_q ...
    return sapien.Pose(p=new_p, q=new_q)
  4. You can now use your_new_motion_name immediately in the motion_pool of your meta_gen_config.yaml!
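
For concreteness, one possible body for the spiral above is sketched below. It is only an illustration: the step counter (mixin.elapsed_steps) and the config keys angular_speed, radius, and rise_per_step are assumptions, so adapt them to the fields your mixin and config actually expose. The register_motion_pattern decorator is imported from the project's motion module, as in the example above.

import math
import numpy as np
import sapien

@register_motion_pattern("new_spiral_motion")
def motion_spiral(mixin, start_pose, config):
    # Assumed fields: mixin.elapsed_steps (a per-sequence step counter) and the
    # config keys "angular_speed", "radius", "rise_per_step" -- all hypothetical.
    t = mixin.elapsed_steps * config.get("angular_speed", 0.05)
    offset = np.array([
        config.get("radius", 0.2) * math.cos(t),
        config.get("radius", 0.2) * math.sin(t),
        config.get("rise_per_step", 0.002) * mixin.elapsed_steps,
    ])
    new_p = np.asarray(start_pose.p) + offset
    return sapien.Pose(p=new_p, q=start_pose.q)  # keep the original orientation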

Debugging a Single Sequence

When a batch generation fails or produces unexpected results, you can easily debug the problematic sequence in isolation.

Every generated sequence folder (e.g., path/to/output/seq_0123/) contains a config.yaml. This file is a complete, deterministic snapshot of the run.

Use the generate_single_sequence.py script to run it:

python scripts/generate_single_sequence.py --config path/to/output/seq_0123/config.yaml

License

This project is licensed under the MIT License, following the V2CE-Toolbox. See the LICENSE file for details.

Acknowledgments

This project builds on the excellent ManiSkill3 and V2CE-Toolbox projects.
