Description
I read your article and found the method very interesting. For the InfoNCE contrastive loss, looking in particular at {SimCLR, MoCo-V2, MoCo-V3}, there can be a fairly significant difference in performance (on the order of ~1-3%) depending on the batch_size, since the negative samples are drawn from the current batch. In {SimCLR, MoCo-V3} in particular, a quite large batch_size was used (e.g., 4096 in MoCo-V3).
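To make concrete why the batch size enters the loss, here is a minimal sketch of in-batch InfoNCE as I understand it (my own illustration, assuming a PyTorch setup with L2-normalized embeddings; the function name `info_nce` and the temperature value are placeholders, not taken from your code):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, temperature=0.1):
    """In-batch InfoNCE: for each anchor z1[i], the positive is z2[i] and
    the negatives are the other batch_size - 1 rows of z2, so the size of
    the negative pool is tied directly to the batch size."""
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / temperature                    # (B, B) similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)   # positives lie on the diagonal
    return F.cross_entropy(logits, labels)
```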
I was wondering whether you have run experiments with different batch_size values, or done anything to artificially mimic a larger batch_size (e.g., a MoCo-style queue of negatives, as sketched below). Since the InfoNCE loss is only one of the three loss functions used here, is the batch_size as important as in {SimCLR, MoCo-V3}, or less so? For example, can we still get relatively good performance with a smaller batch_size like {64, 128, 256, 512}, or do we risk fairly large performance drops compared to using a quite large batch_size?
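By "mimic a larger batch_size" I mean something like MoCo's FIFO queue of past keys, which decouples the negative pool size from the batch size. A rough sketch of the idea (my own, with hypothetical names like `NegativeQueue`; it assumes the keys come from a momentum encoder, which I omit here):

```python
import torch
import torch.nn.functional as F

class NegativeQueue:
    """MoCo-style FIFO queue of past keys, so the negative pool has
    queue_size entries instead of batch_size - 1."""
    def __init__(self, dim, queue_size=4096):
        self.queue = F.normalize(torch.randn(queue_size, dim), dim=1)
        self.ptr = 0

    @torch.no_grad()
    def enqueue(self, keys):
        # Overwrite the oldest entries with the newest batch of keys.
        keys = F.normalize(keys, dim=1)
        n = keys.size(0)
        idx = (self.ptr + torch.arange(n)) % self.queue.size(0)
        self.queue[idx] = keys
        self.ptr = (self.ptr + n) % self.queue.size(0)

def info_nce_with_queue(q, k, queue, temperature=0.1):
    """InfoNCE where negatives come from the queue rather than the batch."""
    q, k = F.normalize(q, dim=1), F.normalize(k, dim=1)
    pos = (q * k).sum(dim=1, keepdim=True)     # (B, 1) positive logits
    neg = q @ queue.queue.t()                  # (B, K) negatives from the queue
    logits = torch.cat([pos, neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)
```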