The data-hungry nature of the Vision Transformer (ViT) hinders its widespread application in data-scarce scenarios.
We collect a list of papers on data-efficient training methods below.
Contributions of data-efficient ViT papers are welcome.
- Data-Efficient Multi-Scale Fusion Vision Transformer
  TLDR: incorporate multi-scale vision tokens to improve data efficiency when training Vision Transformers (Sketch 1 below).
  Data-efficient training on the CIFAR10, CIFAR100, EMNIST, Fashion-MNIST, and Caltech101 datasets.
- Multi-Grained Contrast for Data-Efficient Unsupervised Representation Learning
  TLDR: construct multi-grained correspondences between positive views for contrastive learning, capturing representations at different semantic granularities (Sketch 2 below).
  Data-efficient training on the COCO, PASCAL VOC, and ADE20K datasets.
- Inter-Instance Similarity Modeling for Contrastive Learning
  TLDR: construct inter-instance similarity between different image instances via a patch-mix strategy, encouraging the model to capture the similarity between natural images (Sketch 3 below).
  Data-efficient training on the CIFAR10 and CIFAR100 datasets.
- Asymmetric Patch Sampling for Contrastive Learning
  [link], [arxiv], [code], [2025]
  TLDR: construct hard positive pairs to encourage more appearance-invariant representations (Sketch 4 below).
  Data-efficient training on the CIFAR10 and CIFAR100 datasets.
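
Sketch 1, for the multi-scale fusion entry: a minimal, hypothetical tokenizer that patch-embeds an image at two scales and concatenates the token sequences so a transformer encoder can attend across scales. The `MultiScaleTokenizer` name, dimensions, and patch sizes are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MultiScaleTokenizer(nn.Module):
    """Hypothetical sketch: embed an image at two patch scales and
    concatenate the resulting token sequences."""
    def __init__(self, dim=192, patch_sizes=(4, 8), in_ch=3):
        super().__init__()
        # One convolutional "patchify" stem per scale; stride = patch size.
        self.stems = nn.ModuleList(
            nn.Conv2d(in_ch, dim, kernel_size=p, stride=p) for p in patch_sizes
        )

    def forward(self, x):
        tokens = []
        for stem in self.stems:
            t = stem(x)                                  # (B, dim, H/p, W/p)
            tokens.append(t.flatten(2).transpose(1, 2))  # (B, N_p, dim)
        # Concatenate along the token axis; a standard transformer encoder
        # then attends across both scales jointly.
        return torch.cat(tokens, dim=1)

x = torch.randn(2, 3, 32, 32)             # CIFAR-sized input
print(MultiScaleTokenizer()(x).shape)     # torch.Size([2, 80, 192]): 64 + 16 tokens
```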
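Sketch 2, for the multi-grained contrast entry: a toy two-granularity InfoNCE loss over patch tokens from two views. The grouping of patch tokens, the equal loss weights, and the assumption that the views are spatially aligned are all illustrative guesses, not the paper's method.

```python
import torch
import torch.nn.functional as F

def info_nce(q, k, tau=0.2):
    """Standard InfoNCE: row i of q and row i of k are positives."""
    q, k = F.normalize(q, dim=-1), F.normalize(k, dim=-1)
    logits = q @ k.t() / tau
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

def multi_grained_loss(tokens_a, tokens_b, group=4):
    """tokens_*: (B, N, D) patch tokens from two views of the same images.
    Assumes the views are spatially aligned, so patch i matches patch i."""
    B, N, D = tokens_a.shape
    # Fine grain: average small groups of patch tokens, contrast group-wise.
    fine_a = tokens_a.reshape(B * (N // group), group, D).mean(1)
    fine_b = tokens_b.reshape(B * (N // group), group, D).mean(1)
    # Coarse grain: global average pooling, contrast image-wise.
    coarse_a, coarse_b = tokens_a.mean(1), tokens_b.mean(1)
    return info_nce(coarse_a, coarse_b) + info_nce(fine_a, fine_b)

loss = multi_grained_loss(torch.randn(8, 64, 192), torch.randn(8, 64, 192))
```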
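Sketch 3, for the inter-instance similarity entry: a toy patch-mix augmentation that swaps a random subset of patch tokens between paired images in a batch, plus one assumed way of turning the mix ratio into soft similarity targets. The `patch_mix` function and the target construction are hypothetical.

```python
import torch

def patch_mix(tokens, mix_ratio=0.3):
    """tokens: (B, N, D). Replace a random subset of each image's patch
    tokens with the corresponding tokens of a partner image in the batch."""
    B, N, D = tokens.shape
    partner = torch.randperm(B)                       # pair images within the batch
    n_mix = int(N * mix_ratio)
    idx = torch.rand(B, N).argsort(dim=1)[:, :n_mix]  # random patches per image
    batch = torch.arange(B).unsqueeze(1)
    mixed = tokens.clone()
    mixed[batch, idx] = tokens[partner][batch, idx]
    return mixed, partner

mixed, partner = patch_mix(torch.randn(8, 64, 192), mix_ratio=0.3)
# Assumed soft targets: each mixed image matches itself with weight 1 - r
# and its mixing partner with weight r, instead of a hard one-hot label.
r, B = 0.3, mixed.size(0)
targets = (1 - r) * torch.eye(B) + r * torch.eye(B)[partner]
```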
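Sketch 4, for the asymmetric patch sampling entry: two positive views built from disjoint, differently sized patch subsets of the same image. The sampling ratios and the disjointness construction are assumptions for illustration only.

```python
import torch

def asymmetric_views(tokens, ratio_a=0.25, ratio_b=0.75):
    """tokens: (B, N, D). Return two views drawn from disjoint patch subsets,
    one sparse and one dense (requires ratio_a + ratio_b <= 1)."""
    B, N, D = tokens.shape
    order = torch.rand(B, N).argsort(dim=1)  # random patch permutation per image
    n_a, n_b = int(N * ratio_a), int(N * ratio_b)
    batch = torch.arange(B).unsqueeze(1)
    view_a = tokens[batch, order[:, :n_a]]            # sparse view
    view_b = tokens[batch, order[:, n_a:n_a + n_b]]   # denser view, zero overlap
    return view_a, view_b

va, vb = asymmetric_views(torch.randn(8, 64, 192))
print(va.shape, vb.shape)   # torch.Size([8, 16, 192]) torch.Size([8, 48, 192])
```

Because the two views share no patches, agreement between them cannot rely on low-level appearance, which is one way to read "hard positive pairs".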