This implementation supports only pretrained encoder models and bi-modal fusion training, which makes the process faster and simpler. It also supports training and evaluation on a single GPU. To train SAFFE, run train.ipynb:
☀️ This model uses the ImageNet-100 dataset from Kaggle.
☀️ The model vector dimension is 768.
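
The settings above can be collected in one place and checked before running the notebook. This is only a minimal sketch: `SaffeConfig` and `check_embedding` are hypothetical names used for illustration and are not taken from this repository, and the dataset string is an assumed identifier.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SaffeConfig:
    """Hypothetical container for the settings listed above (not from the repo)."""
    dataset: str = "imagenet-100"  # assumed identifier for the dataset named above
    embed_dim: int = 768           # model vector dimension stated above
    num_modalities: int = 2        # bi-modal fusion
    num_gpus: int = 1              # single-GPU training and evaluation


def check_embedding(vector, config):
    """Return True if `vector` has the length the config expects."""
    return len(vector) == config.embed_dim


config = SaffeConfig()
dummy = [0.0] * 768  # placeholder vector standing in for an encoder output
print(check_embedding(dummy, config))
```

A check like this can catch a mismatch early, for example when a pretrained encoder with a different output size is swapped in.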