The official implementation of "A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation" [TCSVT 2025]: https://ieeexplore.ieee.org/abstract/document/10933925/

patpatera/MASTTA

MASTTA: A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation

This is the official implementation of "A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation", published in IEEE Transactions on Circuits and Systems for Video Technology (TCSVT), 2025.

The code and pre-trained weights will be released soon...

Architecture

[Figure: MASTTA architecture]

Visualisation of Accident Events

[Figure: visualisation of accident events]

Results

Performance comparison of the achievable longest mTTA and the corresponding mAP of state-of-the-art models on the DAD and CCD datasets.
The best balanced results are in bold.

DAD Dataset

Method          Source   mAP (%)  mTTA (s)
DSA             ACCV16   48.4     1.34
adaLEA          CVPR18   52.3     3.43
GCRNN           ACMMM20  53.7     3.53
FA              ICPR21   49.8     3.76
DRIVE           ICCV21   62.8     2.78
L-RA            CVPR21   49.1     3.04
DSTA            TITS22   56.1     3.66
GSC             TIV23    60.4     2.55
DAA-GNN         PR23     70.6     1.59
-               ICIP24   61.1     4.06
MASTTA (Ours)   -        70.2     3.96

CCD Dataset

Method          Source   mAP (%)  mTTA (s)
DSA             ACCV16   99.6     4.87
GCRNN           ACMMM20  99.5     4.74
DSTA            TITS22   99.6     4.52
-               ICIP24   99.3     4.97
MASTTA (Ours)   -        99.9     4.95
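
The tables report mAP and mTTA but this README does not define them. The sketch below assumes the threshold-sweep protocol commonly used on DAD and CCD: at each threshold a video raises an alarm if any frame's accident probability reaches the threshold (for accident videos only frames before the accident count), precision and recall are computed over videos, TTA is the time from the first alarm to the accident frame, AP integrates precision over recall, and mTTA averages the per-threshold mean TTAs. The function name `anticipation_metrics`, the `fps` default of 20 and the 1000-point threshold grid are illustrative assumptions, not taken from the MASTTA code, which has not been released yet.

```python
import numpy as np


def anticipation_metrics(probs, labels, toas, fps=20.0, num_thresholds=1000):
    """Sketch of a threshold-sweep AP / mTTA evaluation (assumed protocol).

    probs:  per-video sequences of per-frame accident probabilities
    labels: 1 for accident (positive) videos, 0 for normal videos
    toas:   accident frame index per video (ignored for normal videos)
    fps:    frame rate used to convert frames to seconds (assumed default)
    """
    labels = np.asarray(labels)
    n_pos = int(labels.sum())
    # For accident videos, only frames before the accident count as anticipation.
    clipped = [
        np.asarray(p, dtype=float)[: int(t)] if y == 1 else np.asarray(p, dtype=float)
        for p, y, t in zip(probs, labels, toas)
    ]

    precisions, recalls, ttas = [], [], []
    for th in np.linspace(0.0, 1.0, num_thresholds, endpoint=False):
        flagged = tp = 0
        tta_sum = 0.0
        for p, y, t in zip(clipped, labels, toas):
            hits = np.flatnonzero(p >= th)
            if hits.size == 0:
                continue  # no alarm raised for this video at this threshold
            flagged += 1
            if y == 1:
                tp += 1
                # Seconds between the first alarm and the accident frame.
                tta_sum += (t - hits[0]) / fps
        if tp == 0:
            continue  # skip operating points without true positives
        precisions.append(tp / flagged)
        recalls.append(tp / n_pos)
        ttas.append(tta_sum / tp)

    prec = np.asarray(precisions)
    rec = np.asarray(recalls)
    # Sort by recall (ascending), highest precision first within equal recall,
    # then integrate precision over recall as a step function.
    order = np.lexsort((-prec, rec))
    steps = np.diff(np.concatenate(([0.0], rec[order])))
    ap = float(np.sum(steps * prec[order]))
    mtta = float(np.mean(ttas))
    return ap, mtta


# Toy usage with made-up probabilities at 10 fps: two accident videos, one normal video.
ap, mtta = anticipation_metrics(
    probs=[
        [0.15, 0.25, 0.95, 0.95, 0.95, 0.95],
        [0.15, 0.15, 0.15, 0.65, 0.95],
        [0.05, 0.35, 0.35, 0.35],
    ],
    labels=[1, 1, 0],
    toas=[5, 4, 0],
    fps=10.0,
    num_thresholds=10,
)
print(ap, mtta)  # 1.0 and approximately 0.285
```

Because both metrics are read off the same threshold sweep, a model can trade one for the other, which is why the tables pair the longest achievable mTTA with its mAP and vice versa.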

Performance comparison of the achievable highest mAP and the corresponding mTTA of state-of-the-art models on the DAD dataset.
The best balanced results are in bold.

Method          Source   mAP (%)  mTTA (s)  TTA₈₀R (s)
GCRNN           ACMMM20  72.23    1.33      1.56
L-RAI           CVPR21   51.40    3.01      -
MViTv2          CVPR21   64.49    -         -
DSTA            TITS22   72.34    1.50      1.81
VideoSwin       CVPR22   65.45    -         -
UniFormer V2    ICCV23   65.24    -         -
DAA-GNN         PR23     75.20    1.47      -
-               ICIP24   75.61    3.31      -
MASTTA (Ours)   -        80.81    3.32      3.51
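
The TTA₈₀R column is the time-to-accident at the operating point where recall reaches 80%. This README does not state how that point is selected; a convention used in some earlier accident-anticipation evaluation code is to take the threshold whose recall is closest to 0.8. The sketch below assumes that convention and assumes per-threshold recall and mean TTA values are already available from a threshold sweep; `tta_at_recall` is a hypothetical helper, not part of the MASTTA code.

```python
import numpy as np


def tta_at_recall(recalls, ttas, target_recall=0.8):
    """Return the TTA of the operating point whose recall is nearest target_recall.

    recalls, ttas: one value per threshold from a threshold sweep (assumed inputs);
    ttas are in seconds. Nearest-recall selection is an assumed convention.
    """
    recalls = np.asarray(recalls, dtype=float)
    ttas = np.asarray(ttas, dtype=float)
    if recalls.size == 0 or recalls.shape != ttas.shape:
        raise ValueError("recalls and ttas must be non-empty and of equal length")
    # Pick the threshold whose recall lies closest to the target (80% by default).
    idx = int(np.argmin(np.abs(recalls - target_recall)))
    return float(ttas[idx])


# Toy usage with made-up operating points (not results from the paper).
tta80 = tta_at_recall([0.95, 0.85, 0.78, 0.50], [2.0, 1.8, 1.6, 1.0])
print(tta80)  # 1.6
```

Reporting TTA at a fixed recall complements mTTA, which averages over all thresholds and can be inflated by low thresholds that raise very early alarms together with many false positives.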

Citation

If you find our work useful, please cite it using the following BibTeX entry:

@ARTICLE{10933925,
  author={Patera, Patrik and Chen, Yie-Tarng and Fang, Wen-Hsien},
  journal={IEEE Transactions on Circuits and Systems for Video Technology}, 
  title={A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation}, 
  year={2025},
  volume={},
  number={},
  pages={1-1},
  keywords={Visualization;Accidents;Adaptation models;Feature extraction;Attention mechanisms;Accuracy;Tuning;Data models;Computational modeling;Vehicle dynamics;accident anticipation;multi-modal architecture;parameter-efficient transfer learning;image-to-video adaptation},
  doi={10.1109/TCSVT.2025.3552895}
}

