Hi,
I have a serious concern about CIFAR-LT/data/data_processing.py, where you generate the imbalanced dataset.
First, CIFAR100-LT should be generated with the convention you can find here.
That is, the samples should be drawn after shuffling the whole training set with a fixed seed of 0.
Your code, however, performs no such shuffling, so it ends up selecting different images for the training data.
This makes your version of CIFAR100-LT different from the one used by the other methods in your table.
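To make the concern concrete, here is a minimal sketch of the shuffle-then-select convention I am referring to: permute the full training set once with seed 0, then take the first n_c shuffled samples of each class. The function name and signature are mine for illustration, not the repo's actual code.

```python
import numpy as np

def select_lt_indices(labels, counts_per_class, seed=0):
    """Pick a long-tailed subset: shuffle the whole train set once with a
    fixed seed, then keep the first counts_per_class[c] shuffled samples
    of each class c."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(len(labels))   # fixed-seed shuffle of all indices
    selected, taken = [], {c: 0 for c in range(len(counts_per_class))}
    for idx in order:                      # scan images in shuffled order
        c = labels[idx]
        if taken[c] < counts_per_class[c]:
            selected.append(idx)
            taken[c] += 1
    return np.array(selected)

# toy example: 3 classes with 10 samples each, keep [10, 4, 2]
labels = np.array([i % 3 for i in range(30)])
idx = select_lt_indices(labels, [10, 4, 2], seed=0)
```

Because the permutation is seeded, every implementation following this convention selects exactly the same images, which is what makes results across papers comparable.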
Secondly, you first take 25 validation samples from each class for meta-learning, and only then construct the imbalanced training data from the remaining images. This amounts to using additional data.
For example, CIFAR100-LT with ir=100 has the following per-class sample counts:
[500, 477, 455, 434, 415, 396, 378, 361, 344, 328, 314, 299, 286, 273, 260, 248, 237, 226, 216, 206, 197, 188, 179, 171, 163, 156, 149, 142, 135, 129, 123, 118, 112, 107, 102, 98, 93, 89, 85, 81, 77, 74, 70, 67, 64, 61, 58, 56, 53, 51, 48, 46, 44, 42, 40, 38, 36, 35, 33, 32, 30, 29, 27, 26, 25, 24, 23, 22, 21, 20, 19, 18, 17, 16, 15, 15, 14, 13, 13, 12, 12, 11, 11, 10, 10, 9, 9, 8, 8, 7, 7, 7, 6, 6, 6, 6, 5, 5, 5, 5]
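For reference, these counts follow the usual exponential decay n_i = n_max * (1/ir)^(i/(C-1)), with n_max = 500, C = 100 classes, and imbalance ratio ir = 100. A minimal sketch (the function name is mine):

```python
def lt_counts(n_max=500, num_classes=100, imb_ratio=100):
    """Per-class sample counts for a long-tailed split:
    n_i = n_max * (1/imb_ratio) ** (i / (num_classes - 1)),
    decaying exponentially from the head class to the tail class."""
    return [int(n_max * (1.0 / imb_ratio) ** (i / (num_classes - 1.0)))
            for i in range(num_classes)]

counts = lt_counts()  # [500, 477, 455, ..., 5, 5, 5]
```

Note that the tail classes get as few as 5 images each, which is why reserving 25 balanced validation samples per class is impossible under this profile.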
Training and meta-training should all be conducted on the above images alone; you cannot set aside 25 samples per class to construct a balanced validation split. (Some classes have fewer than 25 samples.)
Overall, I'm afraid that you might have made an unfair comparison on CIFAR100-LT.
Kindly correct me if I'm wrong; I hope to get a reply.
Thanks.