A production-ready, scalable multi-task perception system for autonomous vehicles, capable of handling semantic segmentation, object detection, classification, and depth estimation tasks simultaneously.
- Multi-Task Learning: Jointly train multiple perception tasks with uncertainty weighting
- Scalable Architecture: Designed for deployment on resource-constrained in-vehicle compute platforms
- Production-Ready: Includes distributed training, experiment tracking, and model export
- Safety-First: Comprehensive validation and testing framework
- Real-Time Performance: Optimized for edge deployment on automotive hardware
- **Semantic Segmentation**
  - Road, lane, vehicle, and pedestrian segmentation
  - High-precision pixel-level classification
- **Object Detection**
  - Vehicle, pedestrian, and traffic light detection
  - Anchor-based detection with FPN
- **Classification**
  - Stain/no-stain classification
  - Binary classification with uncertainty
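The README lists "binary classification with uncertainty" without specifying how the uncertainty is computed. One common proxy is the binary entropy of the sigmoid probability; the sketch below uses that proxy as an assumption, not the repository's exact method:

```python
import math

def classify_with_uncertainty(logit):
    """Binary prediction plus a simple confidence measure.

    Uses the sigmoid probability and its binary entropy (in nats) as an
    uncertainty proxy -- an illustrative assumption, not the repo's
    actual uncertainty estimator.
    """
    p = 1.0 / (1.0 + math.exp(-logit))  # sigmoid probability of positive class
    entropy = 0.0
    for q in (p, 1.0 - p):
        if q > 0.0:
            entropy -= q * math.log(q)  # binary entropy: max at p = 0.5
    return (p >= 0.5), p, entropy
```

A logit near zero yields maximum entropy (≈ 0.693 nats), i.e. maximum uncertainty, while large-magnitude logits yield near-zero entropy.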
- **Depth Estimation**
  - Monocular depth estimation
  - Metric depth prediction
- Backbone: ResNet50 with Feature Pyramid Network (FPN)
- Task Heads: Specialized decoders for each task
- Loss: Multi-task loss with uncertainty weighting
- Training: Distributed training with mixed precision
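The shared-backbone-plus-task-heads design can be sketched framework-agnostically; a real implementation would use PyTorch modules, and the names below (`MultiTaskModel`, `backbone`, `heads`) are illustrative assumptions:

```python
class MultiTaskModel:
    """Minimal structural sketch of a shared backbone feeding
    one specialized decoder head per task (illustrative only)."""

    def __init__(self, backbone, heads):
        self.backbone = backbone  # shared feature extractor (e.g. ResNet50 + FPN)
        self.heads = heads        # dict: task name -> decoder callable

    def forward(self, x):
        # Backbone runs once; every head consumes the same features,
        # which is what makes joint multi-task inference cheap.
        features = self.backbone(x)
        return {task: head(features) for task, head in self.heads.items()}
```

The key property is that the (expensive) backbone forward pass is shared, so adding a task costs only one extra lightweight decoder.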
The model uses uncertainty weighting (Kendall et al., 2018) to automatically balance the losses from the different tasks. The combined training objective is:

$$\mathcal{L}_{\text{total}} = \sum_{t=1}^{T} \left[ \frac{\mathcal{L}_t}{2\sigma_t^2} + \log \sigma_t \right]$$
where:
- $\mathcal{L}_t$ is the loss for task $t$
- $\sigma_t$ is the learnable uncertainty parameter for task $t$
- $T$ is the total number of tasks
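The weighting scheme above can be sketched in a few lines. In practice the $\log \sigma_t$ values would be trainable parameters updated by the optimizer; here they are plain floats for illustration:

```python
import math

def uncertainty_weighted_loss(task_losses, log_sigmas):
    """Combine per-task losses via learned homoscedastic uncertainty
    (Kendall et al., 2018). A minimal sketch: log_sigmas would normally
    be trainable parameters, not fixed floats."""
    total = 0.0
    for loss, log_sigma in zip(task_losses, log_sigmas):
        sigma_sq = math.exp(2.0 * log_sigma)
        # High uncertainty (large sigma) down-weights the task's loss;
        # the log term penalizes inflating sigma to ignore a task.
        total += loss / (2.0 * sigma_sq) + log_sigma
    return total
```

Note how a larger $\sigma_t$ shrinks that task's contribution while the $\log \sigma_t$ regularizer prevents the trivial solution of setting every $\sigma_t$ to infinity.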
```bash
pip install -r requirements.txt
```
```
data/
├── train/
│   ├── images/
│   ├── semantic/
│   ├── detection/
│   ├── classification/
│   └── depth/
├── val/
└── test/
```
1. Configure training parameters in `configs/config.yaml`
2. Run training: `python train.py`
3. Export the model to ONNX: `python export.py`
4. Deploy to the target hardware using the vendor SDK
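A training configuration for the steps above might look like the following sketch; every key here is an illustrative assumption, not the repository's actual `configs/config.yaml` schema:

```yaml
# Hypothetical configs/config.yaml layout (illustrative only)
model:
  backbone: resnet50
  fpn_channels: 256
tasks:
  - semantic_segmentation
  - detection
  - classification
  - depth
training:
  batch_size: 16
  mixed_precision: true   # see "Distributed training with mixed precision"
  distributed: true
```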
- TensorBoard: Training metrics and visualizations
- Weights & Biases: Experiment tracking and model management
- Logging: Comprehensive logging for debugging and monitoring
- Unit tests for all components
- Integration tests for end-to-end pipeline
- Simulation testing in CARLA/LGSVL
- Real-world validation on test vehicles
- Latency: < 100 ms per frame on target hardware
- Accuracy: Competitive with state-of-the-art on all tasks
- Scalability: Modular task heads allow new tasks to be added and training to scale across GPUs
- Fork the repository
- Create a feature branch
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code in your research, please cite the following papers:
@inproceedings{kendall2018multi,
title={Multi-task learning using uncertainty to weigh losses for scene geometry and semantics},
author={Kendall, Alex and Gal, Yarin and Cipolla, Roberto},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={7482--7491},
year={2018}
}
@inproceedings{lin2017feature,
title={Feature pyramid networks for object detection},
author={Lin, Tsung-Yi and Doll{\'a}r, Piotr and Girshick, Ross and He, Kaiming and Hariharan, Bharath and Belongie, Serge},
booktitle={Proceedings of the IEEE conference on computer vision and pattern recognition},
pages={2117--2125},
year={2017}
}
@inproceedings{ren2015faster,
title={Faster r-cnn: Towards real-time object detection with region proposal networks},
author={Ren, Shaoqing and He, Kaiming and Girshick, Ross and Sun, Jian},
booktitle={Advances in neural information processing systems},
pages={91--99},
year={2015}
}
For questions and support, please open an issue.