
Unfair comparison on CIFAR100-LT (extra/different data usage) #2

@quotation2520


Hi,

I have a serious concern about CIFAR-LT/data/data_processing.py, where you generate the imbalanced dataset.

First, CIFAR100-LT should be generated with the convention you can find here.
That is, the image samples should be collected after we shuffle the whole train images with a fixed seed 0.
Your code, however, does not perform this shuffle, so it selects a different set of images for the training data.
This makes your version of CIFAR100-LT different from the ones used in the rest of the methods in your table.

Secondly, you first take 25 validation samples from each class for meta-learning, and only then construct the imbalanced training data from the remaining images. This is additional data usage.
For example, CIFAR100-LT with ir=100 has the following per-class sample counts:
[500, 477, 455, 434, 415, 396, 378, 361, 344, 328, 314, 299, 286, 273, 260, 248, 237, 226, 216, 206, 197, 188, 179, 171, 163, 156, 149, 142, 135, 129, 123, 118, 112, 107, 102, 98, 93, 89, 85, 81, 77, 74, 70, 67, 64, 61, 58, 56, 53, 51, 48, 46, 44, 42, 40, 38, 36, 35, 33, 32, 30, 29, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5]
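For reference, the conventional construction (as used by LDAM-DRW and follow-up work) can be sketched as below. The function name and exact signature are illustrative assumptions, but the two points at issue are encoded directly: the full train set is shuffled once with a fixed seed (0), and each class i keeps n_max * (1/ir)^(i/(n_cls-1)) samples, which reproduces the counts listed above.

```python
import numpy as np

def make_cifar100_lt(labels, ir=100, n_max=500, n_cls=100, seed=0):
    """Sketch of the conventional CIFAR100-LT construction.

    Class i keeps int(n_max * (1/ir) ** (i / (n_cls - 1))) samples,
    drawn after shuffling the whole train set once with a fixed seed.
    """
    labels = np.asarray(labels)
    # Exponential class-size profile: 500 for class 0 down to 5 for class 99 at ir=100.
    counts = [int(n_max * (1.0 / ir) ** (i / (n_cls - 1))) for i in range(n_cls)]
    # Shuffle the full train set with seed 0 before selecting any samples.
    order = np.random.RandomState(seed).permutation(len(labels))
    shuffled = labels[order]
    keep = []
    for cls, n in enumerate(counts):
        pos = np.where(shuffled == cls)[0][:n]  # first n occurrences post-shuffle
        keep.extend(order[pos].tolist())        # map back to original indices
    return counts, keep
```

With ir=100 this yields counts starting at 500, 477, 455, ... and ending at 5, matching the list above; a pipeline that skips the seed-0 shuffle selects a different image subset even with identical counts.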

Training and meta-training should both be conducted on only the images above; you cannot set aside 25 samples per class to build a balanced validation split. (Some classes have fewer than 25 samples.)

Overall, I'm afraid that you might have made an unfair comparison on CIFAR100-LT.
Kindly correct me if I'm wrong; I look forward to your reply.
Thanks.
