ESOT500syn is an easy-to-use and highly extensible pipeline for generating synthetic event-stream datasets. It currently generates datasets for the following tasks, which are the primary focus this repository was originally developed for:
- Object Bounding Box Tracking (modal/amodal)
- Object Mask Tracking (modal/amodal)
- 6D Pose Estimation and Tracking

It can additionally generate datasets for other tasks (such as Depth Estimation and Visual Odometry), depending on user-defined configurations. ESOT500syn uses the ManiSkill3 framework to generate RGB datasets and the V2CE-Toolbox to convert them into event streams.
- Fully Configuration-Driven: Define entire datasets, from scenes to assets, lighting, object motion, and camera motion, in human-readable YAML files. No code changes are needed to generate new scenarios.
- Procedural Generation Pipeline: A staged workflow first generates thousands of deterministic sequence configurations in YAML, then loads those configurations and runs the simulation to produce RGB datasets, and finally converts the RGB datasets into event streams. Each stage is a separate, robust batch-processing step.
- Extensive Motion Library: Features a rich, expandable library of deterministic (motion patterns are hard-coded) and stochastic (motion patterns incorporate randomness) motion patterns for both objects and the camera.
- Composable Motion System: Combine simple motion patterns (e.g., "oscillate" + "spin") to create arbitrarily complex and challenging object motion trajectories.
- Rich Domain Randomization: Automatically randomize scenes, target objects, distractor objects, lighting conditions, and camera perspectives to create highly diverse datasets.
- Intelligent Asset Spawning: Ensures that all procedurally generated objects are spawned within the camera's field of view using a visibility-checking algorithm.
- Automatic Annotation Generation: Automatically saves RGB frames, modal/amodal masks, 2D modal/amodal bounding boxes, and 6D object poses in camera coordinates for each sequence.
- Integrated Event Stream Conversion: Seamlessly converts the generated RGB sequences into event streams using the integrated V2CE-Toolbox submodule.
The project is organized into a modular structure to ensure clarity and extensibility:
ESOT500syn/
├── configs/ # All user-facing configuration files
│ ├── meta_base_config.yaml # Shared settings for all sequences
│ ├── meta_gen_config.yaml # Rules for randomizing the dataset
│ └── single_sequence_configs.yaml # Just a demo configuration
│
├── libs/ # Third-party code managed as git submodules
│ └── V2CE-Toolbox/ # Convert RGB into event streams
│
├── scripts/ # User-facing executable scripts
│ ├── generate_batch_configs.py # Stage 1: Generate all sequence configs
│ ├── generate_batch_sequence.py # Stage 2: Run simulation for all generated configs
│ ├── generate_batch_events.py # Stage 3: Convert all RGB sequences to events
│ ├── generate_single_sequence.py # runs a single config file
│ ├── imgs_to_video.py # Convert RGB sequence into video
│ └── vis_events.py # Visualize event stream
│
└── src/
└── esot500syn/ # The core Python library for the project
├── runner.py # Core simulation loop and setup logic
├── motion/ # Object and camera motion definitions
├── processing/ # Annotation logic
├── simulation/ # ManiSkill environment wrappers and mixins
└── demo/ # Some demo scripts
1. Prerequisites:
- A Conda or venv environment (Python 3.10+ is recommended).
2. Clone the Repository: Clone this repository along with its git submodules (for V2CE-Toolbox):
git clone --recurse-submodules https://github.com/your-username/ESOT500syn.git
cd ESOT500syn
3. Install Dependencies:
This project uses pyproject.toml for dependency management. Install the project in editable mode:
pip install -e .
4. Download Required Assets:
- ManiSkill Assets: Ensure you have downloaded the required scene datasets (e.g., ReplicaCAD, ArchitecTHOR) and the YCB object models as described in the ManiSkill3 documentation:
python -m mani_skill.utils.download_asset ReplicaCAD
python -m mani_skill.utils.download_asset RoboCasa
python -m mani_skill.utils.download_asset AI2THOR
python -m mani_skill.utils.download_asset ycb
- V2CE Model Weights: Download the pre-trained model from the V2CE Google Drive link and place it in the following directory:
libs/V2CE-Toolbox/weights/v2ce_3d.pt
5. Note on Headless Rendering: ESOT500syn was developed on a headless machine, so all code is written for headless runs of the ManiSkill3 simulation framework. To reduce the visual sim-to-real gap as much as possible, ESOT500syn enables ManiSkill3's ray-tracing renderer by default when producing RGB datasets. However, ray tracing did not work reliably on the GPU backend with ManiSkill3 during development, so the default rendering device is the CPU. You can change the device and rendering settings in meta_base_config.yaml.
This project uses a three-stage process to generate the final event stream dataset.
This stage reads your randomization rules (defined in meta_gen_config.yaml and meta_base_config.yaml) and generates a unique, deterministic configuration file for every sequence in your dataset.
➡️ Action: Run the generate_batch_configs.py script.
python path/to/scripts/generate_batch_configs.py \
--base_config path/to/meta_base_config.yaml \
--gen_config path/to/meta_gen_config.yaml \
--output_dir path/to/output_dir
⬅️ Result: The path/to/output_dir/ directory will be populated with seq_0000/, seq_0001/, etc., each containing a config.yaml file.
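To sanity-check Stage 1, you can peek at one of the generated configuration files. A minimal sketch, assuming PyYAML is available and nothing about the file's schema beyond its location (the sequence path is illustrative):
# List the top-level keys of one generated sequence config.
import yaml

with open("path/to/output_dir/seq_0000/config.yaml") as f:
    cfg = yaml.safe_load(f)

print("Top-level keys:", sorted(cfg.keys()))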
This stage iterates through the previously generated configurations and runs the ManiSkill simulation for each one, saving the RGB frames and annotations.
➡️ Action: Run the generate_batch_sequence.py script, pointing it to the directory from Stage 1.
python path/to/scripts/generate_batch_sequence.py \
--configs_dir path/to/output_dir \
--runner_script path/to/generate_single_sequence.py
⬅️ Result: The output directory specified in your meta_base_config.yaml (e.g., path/to/output_dir/seq_{index}/{scene}/{target_object}) will be populated with the full data for each sequence (rgb/, modal_mask/, amodal_mask/, annotations.json).
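To verify a finished sequence before converting it to events, you can count the saved RGB frames and inspect annotations.json. A minimal sketch that only assumes the folder names listed above; the exact annotation schema is not assumed, so it only prints the top-level entries (the sequence path is illustrative):
# Quick check of one generated sequence.
import json
from pathlib import Path

seq_dir = Path("path/to/output_dir/seq_0000")  # illustrative; adjust to your output layout
print("RGB frames found:", len(list(seq_dir.glob("**/rgb/*"))))

ann_path = next(seq_dir.glob("**/annotations.json"))
with open(ann_path) as f:
    annotations = json.load(f)
top = list(annotations.keys()) if isinstance(annotations, dict) else f"list of {len(annotations)} entries"
print("annotations.json top level:", top)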
This final stage converts all the generated RGB sequences into event streams.
➡️ Action: Run the generate_batch_events.py script, pointing it to the dataset directory from Stage 2.
# Many other optional parameters are available; see the source of scripts/generate_batch_events.py, or check the V2CE-Toolbox directly.
python scripts/generate_batch_events.py \
--dataset_dir path/to/output_dir \
--fps 30  # FPS of the input RGB sequence
⬅️ Result: Each seq_{index}/ directory will now also contain an .npz file with the event stream data.
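To take a quick look at the converted events, you can open the .npz file with NumPy. A minimal sketch that makes no assumptions about the array names stored inside the archive (the sequence path is illustrative):
# List the arrays stored in a converted event-stream file.
import numpy as np
from pathlib import Path

seq_dir = Path("path/to/output_dir/seq_0000")  # illustrative path
npz_path = next(seq_dir.glob("**/*.npz"))      # first event file found under the sequence

with np.load(npz_path) as events:
    for name in events.files:
        print(name, events[name].shape, events[name].dtype)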
The entire generation process is controlled by configs/meta_base_config.yaml and configs/meta_gen_config.yaml.
- meta_base_config.yaml: Defines static parameters shared across all generated sequences, such as image resolution, simulation quality, and batch settings (seed, num_sequences). It also provides fallback values, like a default lighting setup if randomization is disabled.
- meta_gen_config.yaml: This is the creative heart of your dataset. It defines the rules and sampling space for randomization. To change the characteristics of your dataset, simply edit this file:
  - Scenes: Adjust scene_id_ranges to control which environments are used.
  - Assets & Motion: Modify the motion_pool lists to change the dynamics. For example, to make objects move slower, reduce the speed ranges. To have more static distractors, duplicate the { type: "static" } entry in the distractor_motion_pool.
  - Camera: Add specific camera poses for certain scenes in poses_by_scene for artistic control, or adjust the motion_pool to favor more dynamic or static camera work.
  - Lighting: Uncomment the lighting block in continuous_sampling to enable fully randomized lighting for every sequence.
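As an illustration of the kind of edit described above, the sketch below duplicates the static distractor entry in code rather than by hand. It assumes PyYAML is installed and that distractor_motion_pool sits at the top level of the file; the real nesting in meta_gen_config.yaml may differ, so treat this purely as a template:
# Bias distractors toward being static by duplicating the { type: "static" } entry.
import yaml

path = "configs/meta_gen_config.yaml"
with open(path) as f:
    gen_cfg = yaml.safe_load(f)

pool = gen_cfg.get("distractor_motion_pool", [])  # assumed top-level key; adapt if nested
pool += [e for e in pool if isinstance(e, dict) and e.get("type") == "static"]
gen_cfg["distractor_motion_pool"] = pool

with open(path, "w") as f:
    yaml.safe_dump(gen_cfg, f, sort_keys=False)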
Adding new custom motions is designed to be simple and requires no changes to the core runner logic.
- Open the relevant file, e.g., src/esot500syn/motion/object.py or src/esot500syn/motion/camera.py.
- Create a new Python function that accepts parameters (mixin, start_pose, config) or (step, env, sensor, initial_pose, cfg) and returns a sapien.Pose.
- Decorate the function with @register_motion_pattern("your_new_motion_name").
Example:
@register_motion_pattern("new_spiral_motion")
def motion_spiral(mixin, start_pose, config):
# ... your logic to calculate new_p and new_q ...
return sapien.Pose(p=new_p, q=new_q)
- You can now immediately use your_new_motion_name in the motion_pool of your meta_gen_config.yaml!
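For a slightly fuller picture, here is a sketch using the alternative (step, env, sensor, initial_pose, cfg) signature mentioned above. The config keys radius, angular_speed, and rise_per_step are hypothetical illustrations (not part of the shipped schema), cfg is treated as a plain dict, and register_motion_pattern is assumed to already be in scope in the motion module you are editing:
import numpy as np
import sapien

@register_motion_pattern("demo_spiral_motion")
def motion_demo_spiral(step, env, sensor, initial_pose, cfg):
    # Hypothetical parameters; real config keys depend on your motion_pool entries.
    radius = cfg.get("radius", 0.1)
    angular_speed = cfg.get("angular_speed", 0.05)   # radians per simulation step
    rise_per_step = cfg.get("rise_per_step", 0.001)  # metres per simulation step

    angle = angular_speed * step
    offset = np.array([radius * np.cos(angle), radius * np.sin(angle), rise_per_step * step])
    # Keep the initial orientation; composing this with a "spin" pattern is also possible.
    return sapien.Pose(p=initial_pose.p + offset, q=initial_pose.q)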
When a batch generation fails or produces unexpected results, you can easily debug the problematic sequence in isolation.
Every generated sequence folder (e.g., path/to/output/seq_0123/) contains a config.yaml. This file is a complete, deterministic snapshot of the run.
Use the generate_single_sequence.py script to run it:
python scripts/generate_single_sequence.py --config path/to/output/seq_0123/config.yaml
This project is licensed under the MIT License, following V2CE. See the LICENSE file for details.
Thanks to the great projects including ManiSkill3 and V2CE-Toolbox.