A high-performance toolkit for autonomous vehicle scenario-based testing and dataset conversion
ScenarioMax is an extension to ScenarioNet that transforms various autonomous driving datasets into standardized formats. Like ScenarioNet, it first converts different datasets (Waymo, nuPlan, nuScenes) to a unified pickle format. ScenarioMax then extends this process with additional pipelines to convert this unified data into formats compatible with Waymax, V-Max, and GPUDrive.
- Multi-Dataset Support: Unified interface for Waymo Open Motion Dataset, nuScenes, nuPlan, and OpenScenes
- Flexible Output Formats: Convert to TFExample (Waymax/V-Max), JSON (GPUDrive), or unified pickle format
- High Performance: Parallel processing with memory optimization and progress monitoring
- Two-Stage Architecture: Raw → Unified → Target format pipeline for maximum flexibility
- Enhanced Scenarios: Optional scenario enhancement with customizable processing steps
- Installation
- Quick Start
- Usage Examples
- Supported Datasets
- Output Formats
- Architecture
- Development
- Contributing
- License
- Python 3.10
- uv for fast dependency management
- Access to at least one supported dataset (Waymo, nuPlan, or nuScenes)
- Sufficient disk space for dataset processing
# Clone the repository
git clone https://github.com/valeoai/ScenarioMax.git
cd ScenarioMax
# Create and activate virtual environment
uv venv -p 3.10
source .venv/bin/activate
# Install ScenarioMax with dataset support
make womd # Waymo Open Motion Dataset
make nuplan # nuPlan dataset
make nuscenes # nuScenes dataset
make all # All datasets
make dev # Development environment
# For specific datasets
uv pip install -e ".[womd]" # Waymo support
uv pip install -e ".[nuplan]" # nuPlan support
uv pip install -e ".[nuscenes]" # nuScenes support
uv pip install -e ".[dev]" # Development tools
uv pip install -e ".[all]" # All datasets support
For nuPlan dataset, set required environment variables:
export NUPLAN_MAPS_ROOT=/path/to/nuplan/maps
export NUPLAN_DATA_ROOT=/path/to/nuplan/data
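Before launching a long nuPlan conversion, it can help to confirm that both variables are set and point at existing directories. The following is a minimal sketch using only the standard library; the helper name `missing_nuplan_env` is hypothetical and not part of ScenarioMax.

```python
import os

# Variable names taken from the export lines above.
REQUIRED_VARS = ("NUPLAN_MAPS_ROOT", "NUPLAN_DATA_ROOT")


def missing_nuplan_env(env=None):
    """Return the required variables that are unset or not an existing directory."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not os.path.isdir(env.get(name, ""))]


if __name__ == "__main__":
    problems = missing_nuplan_env()
    if problems:
        print("Unset or invalid:", ", ".join(problems))
    else:
        print("nuPlan environment looks ready")
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching the real environment.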
# Convert Waymo dataset to TFRecord format
scenariomax-convert \
--waymo_src /path/to/waymo/data \
--dst /path/to/output \
--target_format tfexample \
--num_workers 8
# Convert nuScenes to GPUDrive format
scenariomax-convert \
--nuscenes_src /path/to/nuscenes \
--dst /path/to/output \
--target_format gpudrive
# Multi-dataset conversion
scenariomax-convert \
--waymo_src /data/waymo \
--nuscenes_src /data/nuscenes \
--dst /output \
--target_format tfexample
# Create unified format for later processing
scenariomax-convert \
--waymo_src /data/waymo \
--dst /unified_output \
--target_format pickle \
--num_workers 8
# Raw → Enhanced → TFRecord with scenario enhancement
scenariomax-convert \
--waymo_src /data/waymo \
--dst /output \
--target_format tfexample \
--enable_enhancement \
--num_workers 8
# Process multiple datasets with sharding
scenariomax-convert \
--waymo_src /data/waymo \
--nuplan_src /data/nuplan \
--nuscenes_src /data/nuscenes \
--dst /output \
--target_format tfexample \
--shard 1000 \
--num_workers 16
# Stage 1: Raw → Pickle
scenariomax-convert \
--waymo_src /data/waymo \
--dst /intermediate \
--target_format pickle
# Stage 2: Pickle → Enhanced → TFRecord
scenariomax-convert \
--pickle_src /intermediate \
--dst /final_output \
--target_format tfexample \
--enable_enhancement
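The two-stage run above can also be driven from a script, which is convenient when stage 2 should start only if stage 1 succeeded. This is a rough sketch, not part of ScenarioMax: the function names are hypothetical, and the only flags used are the ones shown in the commands above. It defaults to a dry run that just prints the commands.

```python
import shlex
import subprocess


def two_stage_commands(waymo_src, intermediate, final_dst):
    """Build the two scenariomax-convert invocations shown above as argument lists."""
    stage1 = [
        "scenariomax-convert",
        "--waymo_src", waymo_src,
        "--dst", intermediate,
        "--target_format", "pickle",
    ]
    stage2 = [
        "scenariomax-convert",
        "--pickle_src", intermediate,
        "--dst", final_dst,
        "--target_format", "tfexample",
        "--enable_enhancement",
    ]
    return [stage1, stage2]


def run_two_stage(waymo_src, intermediate, final_dst, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in two_stage_commands(waymo_src, intermediate, final_dst):
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises on a non-zero exit, so stage 2 never runs after a failed stage 1.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_two_stage("/data/waymo", "/intermediate", "/final_output")
```

Building the commands as argument lists rather than one shell string avoids quoting problems when paths contain spaces.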
Dataset | Version | Link | Status |
---|---|---|---|
Waymo Open Motion Dataset | v1.3.0 | Site | ✅ Full Support |
nuPlan | v1.1 | Site | ✅ Full Support |
nuScenes | v1.0 | Site | 🚧 WIP |
Argoverse | v2.0 | Site | 🚧 WIP |
# nuScenes with specific split
scenariomax-convert \
--nuscenes_src /data/nuscenes \
--split v1.0-trainval \
--dst /output \
--target_format tfexample
# nuPlan with direct log parsing
scenariomax-convert \
--nuplan_src /data/nuplan \
--nuplan_direct_from_logs \
--dst /output \
--target_format gpudrive
--target_format tfexample
- Use Case: Training neural networks with Waymax/V-Max
- Output: `training.tfrecord` files with sharding support
--target_format gpudrive
- Use Case: GPU-accelerated simulation and training
- Output: JSON files compatible with GPUDrive simulator
--target_format pickle
- Use Case: Intermediate format for custom processing
- Features: Full scenario data preservation, Python-native
- Output: `.pkl` files with complete scenario information
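For custom processing, the unified output can be inspected with the standard `pickle` module. The sketch below is illustrative only: it assumes the `.pkl` files sit somewhere under the `--dst` directory and makes no assumption about what each file contains, so it reports just the Python type of every unpickled object. The helper names are hypothetical.

```python
import pickle
from pathlib import Path


def iter_scenarios(directory):
    """Yield (path, object) for every .pkl file under directory, in sorted order."""
    for path in sorted(Path(directory).rglob("*.pkl")):
        with path.open("rb") as handle:
            # Only unpickle files you produced yourself: pickle can run arbitrary code.
            yield path, pickle.load(handle)


def summarize(directory):
    """Map each file name to the Python type name of its unpickled content."""
    return {path.name: type(obj).__name__ for path, obj in iter_scenarios(directory)}
```

Calling `summarize("/unified_output")` on the output of the pickle conversion shown earlier is a quick way to see what was written before building further processing on top of it.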
ScenarioMax uses a two-stage pipeline architecture:
Raw Data → Unified Format → Target Format
    ↓            ↓               ↓
[Dataset]  [Enhancement]    [ML Ready]
- Raw to Unified: Dataset-specific parsers convert native formats to standardized Python dictionaries
- Enhancement (Optional): Apply transformations, filtering, or augmentation
- Unified to Target: Convert to training-ready formats (TFRecord, JSON, etc.)
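The three steps above can be pictured as a chain of functions applied to each record. The following is a conceptual sketch with toy data, not ScenarioMax's actual implementation: `run_pipeline` and its parameters are hypothetical names, and the only detail taken from the description is that the unified format is a Python dictionary.

```python
from typing import Callable, Iterable, Iterator, Optional

Scenario = dict  # the unified format is described above as standardized Python dictionaries


def run_pipeline(
    raw_records: Iterable,
    to_unified: Callable[[object], Scenario],
    to_target: Callable[[Scenario], object],
    enhance: Optional[Callable[[Scenario], Optional[Scenario]]] = None,
) -> Iterator:
    """Lazily apply raw -> unified -> (optional enhancement) -> target to each record."""
    for record in raw_records:
        scenario = to_unified(record)
        if enhance is not None:
            scenario = enhance(scenario)
            if scenario is None:  # the enhancement step filtered this scenario out
                continue
        yield to_target(scenario)


if __name__ == "__main__":
    import json

    raw = [("s1", 3), ("s2", 0)]  # toy stand-in for dataset records
    converted = run_pipeline(
        raw,
        to_unified=lambda r: {"id": r[0], "num_agents": r[1]},
        to_target=json.dumps,
        enhance=lambda s: s if s["num_agents"] > 0 else None,
    )
    print(list(converted))
```

Keeping the unified dictionary as the only interface between stages is what lets a new dataset parser or a new target writer be added without touching the other side.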
- `pipeline.py`: Main orchestrator with multi-dataset support
- `dataset_registry.py`: Dynamic dataset configuration system
- `raw_to_unified/`: Dataset-specific extractors and converters
- `unified_to_*/`: Target format converters
- `core/write.py`: Parallel processing with memory management
# Processing options
--num_workers 8 # Parallel workers (default: 8)
--shard 1000 # Output sharding
--num_files 100 # Limit files processed
--enable_enhancement # Enable scenario enhancement
# Dataset options
--split v1.0-trainval # nuScenes data split
--nuplan_direct_from_logs # Alternative nuPlan parsing
# Output options
--tfrecord_name training # TFRecord filename
--log_level INFO # Logging verbosity
--log_file /path/to/log # Log file location
This project is licensed under the MIT License - see the LICENSE file for details.