MASTTA: A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation
This is the official repository for "A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation", published in IEEE Transactions on Circuits and Systems for Video Technology.
The code and pre-trained weights will be released soon.
Performance comparison of the achievable longest mTTA and the corresponding mAP of state-of-the-art models on the DAD and CCD datasets.
The best balanced results are in bold.
DAD Dataset

CCD Dataset
Performance comparison of the achievable highest mAP and the corresponding mTTA of state-of-the-art models on the DAD dataset.
The best balanced results are in bold.
| Method | Source | mAP (%) | mTTA (s) | TTA@R80 (s) |
|---|---|---|---|---|
| GCRNN | ACMMM20 | 72.23 | 1.33 | 1.56 |
| L-RAI | CVPR21 | 51.40 | 3.01 | - |
| MViTv2 | CVPR21 | 64.49 | - | - |
| DSTA | TITS22 | 72.34 | 1.50 | 1.81 |
| VideoSwin | CVPR22 | 65.45 | - | - |
| UniFormer V2 | ICCV23 | 65.24 | - | - |
| DAA-GNN | PR23 | 75.20 | 1.47 | - |
| - | ICIP24 | 75.61 | 3.31 | - |
| MASTTA (Ours) | - | **80.81** | **3.32** | **3.51** |
If you find our work useful, please use the following BibTeX entry for citation:
@ARTICLE{10933925,
author={Patera, Patrik and Chen, Yie-Tarng and Fang, Wen-Hsien},
journal={IEEE Transactions on Circuits and Systems for Video Technology},
title={A Multi-modal Architecture with Spatio-Temporal-Text Adaptation for Video-based Traffic Accident Anticipation},
year={2025},
volume={},
number={},
pages={1-1},
keywords={Visualization;Accidents;Adaptation models;Feature extraction;Attention mechanisms;Accuracy;Tuning;Data models;Computational modeling;Vehicle dynamics;accident anticipation;multi-modal architecture;parameter-efficient transfer learning;image-to-video adaptation},
doi={10.1109/TCSVT.2025.3552895}
}