Question about batch_size and the InfoNCE contrastive loss #3

@PoissonChasseur

Description

I read your article and found the method very interesting. Regarding the InfoNCE contrastive loss: looking at the SimCLR, MoCo-v2, and MoCo-v3 papers, performance can differ quite noticeably (on the order of ~1-3%) depending on the batch_size, since the negative samples are drawn from the current batch. SimCLR and MoCo-v3 in particular used very large batch sizes (e.g. 4096 in MoCo-v3).
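To make the batch-size dependence concrete, here is a minimal NumPy sketch of InfoNCE with in-batch negatives (not code from this repo; the function name and `temperature` value are illustrative). For each anchor `z1[i]`, the positive is `z2[i]` and the other `B - 1` rows of `z2` act as negatives, so the number of negatives is tied directly to the batch size:

```python
import numpy as np

def info_nce_loss(z1, z2, temperature=0.1):
    """InfoNCE with in-batch negatives: for anchor z1[i], the positive is
    z2[i]; the remaining B - 1 rows of z2 serve as negatives."""
    # L2-normalize so the dot product is cosine similarity
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # (B, B) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    # Cross-entropy with the matching pairs on the diagonal
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))

rng = np.random.default_rng(0)
B, D = 8, 16                                      # batch size, embedding dim
z1 = rng.normal(size=(B, D))
z2 = z1 + 0.1 * rng.normal(size=(B, D))           # slightly perturbed views
loss = info_nce_loss(z1, z2)
```

Since each row of `logits` is a (B)-way classification, a larger batch gives each anchor more (and harder) negatives, which is why SimCLR-style methods benefit from large batches.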

I was wondering whether by any chance you have run experiments with different batch_size values, or done anything to artificially mimic the case of a larger batch_size. Since InfoNCE is here only one of the three loss functions used, is the batch_size as important as in SimCLR and MoCo-v3, or less so? For example, can we still get relatively good performance with a smaller batch_size such as 64, 128, 256, or 512, or do we risk fairly large performance losses compared to using a very large batch_size?
