MASTTA: A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation
This is the official repository for "A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation", published in IEEE Transactions on Circuits and Systems for Video Technology.
The code and pre-trained weights will be released soon.
Performance comparison of the achievable longest mTTA and the corresponding mAP of state-of-the-art models on the DAD and CCD datasets.
The best balanced results are in bold.
DAD Dataset

CCD Dataset
Performance comparison of the achievable highest mAP and the corresponding mTTA of state-of-the-art models on the DAD dataset.
The best balanced results are in bold.
| Method | Source | mAP (%) | mTTA (s) | TTA@R80 (s) |
|---|---|---|---|---|
| GCRNN | ACMMM20 | 72.23 | 1.33 | 1.56 |
| L-RAI | CVPR21 | 51.40 | 3.01 | - |
| MViTv2 | CVPR21 | 64.49 | - | - |
| DSTA | TITS22 | 72.34 | 1.50 | 1.81 |
| VideoSwin | CVPR22 | 65.45 | - | - |
| UniFormer V2 | ICCV23 | 65.24 | - | - |
| DAA-GNN | PR23 | 75.20 | 1.47 | - |
| - | ICIP24 | 75.61 | 3.31 | - |
| MASTTA (Ours) | - | **80.81** | **3.32** | **3.51** |
If you find our work useful, please use the following BibTeX entry for citation:
@ARTICLE{10933925,
author={Patera, Patrik and Chen, Yie-Tarng and Fang, Wen-Hsien},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation},
year={2025},
volume={},
number={},
pages={1-1},
keywords={Visualization;Accidents;Adaptation models;Feature extraction;Attention mechanisms;Accuracy;Tuning;Data models;Computational modeling;Vehicle dynamics;accident anticipation;multi-modal architecture;parameter-efficient transfer learning;image-to-video adaptation},
doi={10.1109/TCSVT.2025.3552895}
}