Description
I read your article and found the method very interesting. For the InfoNCE contrastive loss, looking in particular at {SimCLR, MoCo-V2, MoCo-V3}, there can be a fairly significant difference in performance (on the order of ~1-3%) depending on the batch_size, since the negative samples are drawn from the current batch. In {SimCLR, MoCo-V3} in particular, a quite large batch_size was used (e.g., 4096 in MoCo-V3).
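To make concrete why the batch size enters the loss, here is a minimal sketch of in-batch InfoNCE as I understand it (my own illustration, assuming a PyTorch setup with L2-normalized embeddings; the function name `info_nce` and the temperature value are placeholders, not taken from your code):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """In-batch InfoNCE: for each anchor z1[i], the positive is z2[i] and
    the negatives are the other batch_size - 1 rows of z2, so the size of
    the negative pool is tied directly to the batch size."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```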
I was wondering whether you have run experiments with different batch_size values, or done anything to artificially mimic a larger batch_size (e.g., a MoCo-style queue of negatives, as sketched below). Since the InfoNCE loss is only one of the three loss functions used here, is the batch_size as important as in {SimCLR, MoCo-V3}, or less so? For example, can we still get relatively good performance with a smaller batch_size like {64, 128, 256, 512}, or do we risk fairly large performance drops compared to using a quite large batch_size?
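By "mimic a larger batch_size" I mean something like MoCo's FIFO queue of past keys, which decouples the negative pool size from the batch size. A rough sketch of the idea (my own, with hypothetical names like `NegativeQueue`; it assumes the keys come from a momentum encoder, which I omit here):

```python
import torch
import torch.nn.functional as F

class NegativeQueue:
    """MoCo-style FIFO queue of past keys, so the negative pool has
    queue_size entries instead of batch_size - 1."""
    def __init__(self, dim, queue_size=4096):
        self.queue = F.normalize(torch.randn(queue_size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, keys):
        # Overwrite the oldest entries with the newest batch of keys.
        keys = F.normalize(keys, dim=1)
        n = keys.size(0)
        idx = (self.ptr + torch.arange(n)) % self.queue.size(0)
        self.queue[idx] = keys
        self.ptr = (self.ptr + n) % self.queue.size(0)

def info_nce_with_queue(q, k, queue, temperature=0.1):
    """InfoNCE where negatives come from the queue rather than the batch."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    pos = (q * k).sum(dim=1, keepdim=True)     # (B, 1) positive logits
    neg = q @ queue.queue.t()                  # (B, K) negatives from the queue
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```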