Official implementation of MaeFuse: Transferring Omni Features with Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training (TIP 2025)
arXiv: https://arxiv.org/abs/2404.11016
For any questions, feel free to contact us by email: lijiayang.cs@gmail.com
Mainstream approaches often use downstream tasks to drive fusion, which yields more explicit object information in the fusion results but also brings increased complexity and overfitting. Our core idea is that such complex downstream tasks are unnecessary: a pre-trained encoder that already carries high-level semantic information, such as MAE, addresses these issues effectively. Fig. 1 below illustrates this. While we use MAE as the pre-trained encoder, other encoders with better performance, such as a VAE, could also be used.
Fig. 1: Visualization of the average fusion of feature vectors from different layers of the two modalities.
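To make the idea concrete, here is a minimal sketch of average feature fusion with a frozen pretrained encoder. This is illustrative only, not the code in this repo: `encoder` and `decoder` are stand-ins for the MAE encoder and the fusion decoder.

```python
import torch
import torch.nn as nn

# Minimal sketch of average feature fusion with a frozen pretrained encoder.
# Illustrative only: "encoder" and "decoder" are stand-ins, not this repo's modules.
class AverageFeatureFusion(nn.Module):
    def __init__(self, encoder: nn.Module, decoder: nn.Module):
        super().__init__()
        self.encoder = encoder  # e.g., a pretrained MAE/ViT encoder
        self.decoder = decoder  # maps the fused features back to an image
        for p in self.encoder.parameters():
            p.requires_grad = False  # freeze: keep the pretrained semantics intact

    def forward(self, ir: torch.Tensor, vi: torch.Tensor) -> torch.Tensor:
        f_ir = self.encoder(ir)      # infrared features
        f_vi = self.encoder(vi)      # visible features
        fused = 0.5 * (f_ir + f_vi)  # simple average fusion, as visualized in Fig. 1
        return self.decoder(fused)
```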
Version 2 will be launched soon. Looking forward to your ⭐!
Setup: Ensure you have Python 3.10 installed, then initialize the environment with:

```bash
pip install -r requirements.txt
```
Pre-trained weight for MAE (resume): https://drive.google.com/file/d/16YnXfUeqBbSprhWV1OygriAsr2y9cCcf/view?usp=sharing
📖 Pre-training Process Guidance
Test Weight: https://drive.google.com/file/d/18N6tn78VztQOvobVWu6J-RJHo3jsBKkk/view?usp=sharing
Note: The dataset directory specified in `--address` must contain two subdirectories named `vi` and `ir`, holding the visible and infrared images respectively.
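For example, a layout like the following satisfies this requirement (the dataset name and file names are illustrative):

```
TNO_dataset/
├── vi/    # visible images
│   ├── 001.png
│   └── ...
└── ir/    # infrared images
    ├── 001.png
    └── ...
```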
To use this script, provide the following command-line arguments:

- `--checkpoint`: Path to the model checkpoint file (e.g., `final_new_60.pth`).
- `--address`: Path to the dataset directory (e.g., `TNO_dataset`).
- `--output`: Path to the folder where output images will be saved.

Note: For details on training code usage, refer to the internal documentation in `train.py`.
Training:

```bash
python train.py
```

Testing:

```bash
python test_fusion.py --checkpoint path_to_weight --address path_to_dataset --output path_to_output
```
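For example, with the checkpoint and dataset names used above (the output folder name is illustrative):

```bash
python test_fusion.py --checkpoint final_new_60.pth --address TNO_dataset --output fusion_results
```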
- `--checkpoint`: File path to the model checkpoint, used to load the pre-trained model for image fusion.
- `--address`: Directory containing the dataset, which should include both visible and infrared images.
- `--output`: Directory where the fused output images will be saved. If the directory does not exist, it will be created automatically.
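For reference, the interface described above can be reproduced with a short `argparse` sketch like the one below. This is an illustration under assumptions, not the actual contents of `test_fusion.py`:

```python
import argparse
import os

# Illustrative sketch only; see test_fusion.py for the real implementation.
parser = argparse.ArgumentParser(description="Infrared-visible image fusion")
parser.add_argument("--checkpoint", required=True,
                    help="path to the model weight file, e.g. final_new_60.pth")
parser.add_argument("--address", required=True,
                    help="dataset directory containing vi/ and ir/ subfolders")
parser.add_argument("--output", required=True,
                    help="folder where the fused images are written")
args = parser.parse_args()

# Mirror the documented behavior: create the output folder if it is missing.
os.makedirs(args.output, exist_ok=True)
```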
```bibtex
@ARTICLE{10893688,
  author={Li, Jiayang and Jiang, Junjun and Liang, Pengwei and Ma, Jiayi and Nie, Liqiang},
  journal={IEEE Transactions on Image Processing},
  title={MaeFuse: Transferring Omni Features With Pretrained Masked Autoencoders for Infrared and Visible Image Fusion via Guided Training},
  year={2025},
  volume={34},
  number={},
  pages={1340-1353},
  keywords={Feature extraction;Visualization;Training;Image fusion;Data mining;Transformers;Semantics;Deep learning;Lighting;Image color analysis;Image fusion;vision transformer;masked autoencoder;guided training},
  doi={10.1109/TIP.2025.3541562}
}
```