PatrikOkanovic/RS2
Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning (PDF)

Requirements

To create a conda environment:

conda create -n rs2_env python=3.9
conda activate rs2_env

To create a virtual environment:

python -m venv rs2_env
source rs2_env/bin/activate

To install requirements:

pip install -r requirements.txt

Note: the above installation has been tested using Python 3.9.16 on Linux machines running Ubuntu 20.04.6 LTS. It’s possible some errors may arise on other platforms. For example, we found it necessary to downgrade urllib3 from version 2.0.2 to 1.26.16 on some Mac machines. See link1 and link2 for more details regarding this issue.

Datasets

More details about downloading and preparing the datasets can be found here.

Running The Code

Running RS2 w/o replacement on CIFAR10 using ResNet18 for 200 epochs:

python src/DeepCore/main.py --data_path "data/" --dataset "CIFAR10" --n-class 10 --model "ResNet18" --selection "UniformNoReplacement" --epochs 200 --batch-size 128 --fraction 0.1 --per_epoch "True" 

To train on a single static subset for every epoch, set the parameter --per_epoch to "False".
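The difference between per-round and static sampling can be sketched as follows. This is a minimal illustration of the behavior described above, not the repository's actual implementation; the names train_loop and train_one_epoch are hypothetical.

```python
import random

def train_loop(dataset, fraction, epochs, per_epoch, train_one_epoch):
    """Sketch: per_epoch=True resamples a fresh random subset every
    round (repeated random sampling); per_epoch=False selects one
    static subset up front and reuses it for all epochs."""
    k = int(len(dataset) * fraction)
    static_subset = random.sample(dataset, k)  # used when per_epoch is False
    for _ in range(epochs):
        subset = random.sample(dataset, k) if per_epoch else static_subset
        train_one_epoch(subset)
```

With per_epoch=True, every round sees a different subset of size r * |dataset|; with per_epoch=False, all rounds train on the same examples.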

We support training on standard datasets by changing the --dataset parameter to one of: CIFAR10, CIFAR100, ImageNet30, ImageNet, TinyImageNet. The --n-class parameter should be set to the number of classes in the chosen dataset.
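As a quick reference for pairing --dataset with --n-class, the standard class counts are listed below. These counts are assumed from the datasets' usual definitions, not read from the repository.

```python
# Standard number of classes for each dataset name accepted by --dataset.
# ImageNet30 refers to the 30-class ImageNet subset used in the literature.
N_CLASSES = {
    "CIFAR10": 10,
    "CIFAR100": 100,
    "ImageNet30": 30,
    "ImageNet": 1000,
    "TinyImageNet": 200,
}

print(N_CLASSES["CIFAR100"])  # 100
```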

The --selection parameter defines which method is used to select the subsets. Possible methods include Forgetting, Herding, Craig, Uniform, and others; for the complete list, see the names of the imported classes here.

The --fraction parameter represents the selection ratio r, i.e., the fraction of the data used in each round.
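For instance, the per-round subset size follows directly from r. The training-set size below is the standard CIFAR-10 count, assumed rather than read from the repository; subset_size is an illustrative helper, not a function in this codebase.

```python
def subset_size(n_train: int, fraction: float) -> int:
    """Examples selected per round for selection ratio r = fraction."""
    return int(n_train * fraction)

# CIFAR-10: 50,000 training images, r = 0.1 -> 5,000 examples per round.
print(subset_size(50_000, 0.1))  # 5000

# With RS2 w/o replacement, the full dataset is cycled through roughly
# every 1/r rounds (10 rounds here).
print(round(1 / 0.1))  # 10
```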

For details about the hyperparameters and how to change them, see arguments.py.

Reproducing the Paper Experiments

We provide scripts for running experiments with and without Slurm. The hyperparameters used to produce the results reported in the paper are set in the corresponding scripts.

Before executing files from scripts/, move them to the main folder:

mv scripts/<script_name> .

Running time-to-accuracy for CIFAR-10:

source run_timeToAcc_cifar10.sh

Running time-to-accuracy for ImageNet-1k:

source run_timeToAcc_imagenet.sh

Running robustness experiments on CIFAR-10 with label noise:

source run_robustness_experiments.sh

Running dataset distillation experiments:

source run_dataset_distillation.sh

Running per-round sampling experiments on CIFAR-10:

source run_perround_experiments.sh

Analyzing output files

More details about analyzing the output files can be found here.

Results

Time-to-accuracy

[Figures: time-to-accuracy plots for CIFAR-10 and ImageNet-1k]

Citation

@inproceedings{
    okanovic2024repeated,
    title     = {Repeated Random Sampling for Minimizing the Time-to-Accuracy of Learning},
    author    = {Patrik Okanovic and Roger Waleffe and Vasilis Mageirakos and Konstantinos Nikolakakis and Amin Karbasi and Dionysios Kalogerias and Nezihe Merve G{\"u}rel and Theodoros Rekatsinas},
    booktitle = {The Twelfth International Conference on Learning Representations},
    year      = {2024},
    url       = {https://openreview.net/forum?id=JnRStoIuTe}
}

References

  • Park, Dongmin, Dimitris Papailiopoulos, and Kangwook Lee. "Active Learning is a Strong Baseline for Data Subset Selection." Has it Trained Yet? NeurIPS 2022 Workshop. [code]
  • Guo et al. "DeepCore: A Comprehensive Library for Coreset Selection in Deep Learning." 2022. [code]
