Unable to Achieve 90% Accuracy with PerBatch Method in CORDS_SL_CIFAR10_Custom_Train.ipynb #96

@football-prince

Dear CORDS team,

I am trying to replicate the results from your paper "GRAD-MATCH: Gradient Matching based Data Subset Selection for Efficient Deep Model Training" using the provided code in cords/examples/SL/image_classification/python_notebooks/CORDS_SL_CIFAR10_Custom_Train.ipynb. However, I am encountering a discrepancy in performance.

I followed the instructions in the notebook and implemented a custom dataloader for the PerBatch subset selection method, because the notebook does not include one. After training for 300 epochs, I reach only 82% accuracy, whereas the paper reports 90%. I would appreciate any help identifying issues or misconfigurations that could explain this gap.

Here are the relevant parts of the code and parameter settings I used:

from cords.utils.data.dataloader.SL.adaptive import GradMatchDataLoader, CRAIGDataLoader, RandomDataLoader
from dotmap import DotMap

selection_strategy = 'GradMatch'

dss_args = dict(
    model=model,
    loss=criterion_nored,
    eta=0.01,                  # Learning rate
    num_classes=10,            # Number of classes in the dataset, e.g., CIFAR-10
    num_epochs=300,            # Total number of epochs
    device='cuda',             # Device set to GPU
    fraction=0.1,              # Fraction of the subset size
    select_every=20,           # Frequency of subset selection every few epochs
    kappa=0,                   # Warm-start fraction (0 = no full-data warm-up)
    linear_layer=True,         # Use last linear layer weight gradients in addition to bias gradients
    selection_type='PerBatch', # GradMatch selection type like 'PerClass', 'PerBatch'
    greedy='Stochastic',       # Type of greedy selection
    valid=False,               # Whether to use validation set for subset selection
    v1=True,                   # Whether to use the V1 OMP solver
    lam=0.5,                   # OMP regularization parameter lam
    eps=1e-5                   # OMP convergence tolerance eps
)

dss_args = DotMap(dss_args)

dataloader = GradMatchDataLoader(
    trainloader,            # Training data loader
    valloader,              # Validation data loader
    dss_args,               # Parameter dictionary
    logger,                 # Logger
    batch_size=20,          # Batch size for the data loader
    shuffle=True,           # Whether to shuffle the data
    pin_memory=False        # Whether to use pin_memory for faster data loading
)
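To make the effect of these settings concrete, here is a rough sketch of the training budget they imply. It assumes the standard CIFAR-10 training split of 50,000 images, that `batch_size=20` is the batch size used to iterate over the selected subset, and that a new subset is selected at epoch 0 and then every `select_every` epochs. The helper `training_budget` is hypothetical and not part of CORDS.

```python
import math


def training_budget(n_train, fraction, batch_size, num_epochs, select_every):
    """Rough budget implied by a subset-selection configuration (illustrative only)."""
    subset_size = int(n_train * fraction)                    # samples kept per selection round
    steps_per_epoch = math.ceil(subset_size / batch_size)    # optimizer steps per epoch on the subset
    # Assumption: selection runs at epoch 0 and then every `select_every` epochs.
    selection_epochs = list(range(0, num_epochs, select_every))
    return {
        "subset_size": subset_size,
        "steps_per_epoch": steps_per_epoch,
        "selection_rounds": len(selection_epochs),
        "total_steps": steps_per_epoch * num_epochs,
    }


# Values taken from the configuration above; 50,000 is the CIFAR-10 training set size.
budget = training_budget(n_train=50_000, fraction=0.1, batch_size=20,
                         num_epochs=300, select_every=20)

# Steps a full-data run would take with the same batch size and epoch count.
full_data_steps = math.ceil(50_000 / 20) * 300

print(budget)
print("fraction of full-data steps:", budget["total_steps"] / full_data_steps)
```

Under these assumptions, the run performs one tenth of the optimizer steps of full-data training, so the optimizer's learning rate and schedule (not shown in this snippet) may matter as much as the selection parameters.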

Specific Issue:

With selection_type='PerBatch', accuracy does not exceed 82%, while the paper reports 90%.
Are there any key settings in dss_args or in the GradMatchDataLoader initialization that I have missed or misconfigured?
I have rechecked each step of the notebook and still cannot reach the expected result. Any guidance or recommended adjustments would be greatly appreciated.
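For reference, a minimal sketch of what per-batch selection is taken to mean in this question: whole mini-batches, rather than individual samples, are the units that get selected and weighted. The batch size of 20, the uniform weights, and the random choice of batches are placeholders. The random choice stands in for the actual gradient-matching step and is not the GRAD-MATCH algorithm. The helper `expand_batches` is hypothetical.

```python
import random


def expand_batches(batch_indices, chosen, weights):
    """Map chosen batch ids (and per-batch weights) to per-sample indices and weights."""
    sample_idx, sample_w = [], []
    for b, w in zip(chosen, weights):
        sample_idx.extend(batch_indices[b])
        sample_w.extend([w] * len(batch_indices[b]))  # every sample inherits its batch's weight
    return sample_idx, sample_w


n_samples, batch_size, fraction = 50_000, 20, 0.1  # batch_size here is an assumption

# Partition the dataset indices into fixed mini-batches; these are the selection units.
order = list(range(n_samples))
batch_indices = [order[i:i + batch_size] for i in range(0, n_samples, batch_size)]

# The budget is expressed in batches, not samples.
batch_budget = int(fraction * len(batch_indices))

# Placeholder for the gradient-matching step: pick batches uniformly at random.
rng = random.Random(0)
chosen = rng.sample(range(len(batch_indices)), batch_budget)
weights = [1.0] * batch_budget  # placeholder weights

subset_idx, subset_w = expand_batches(batch_indices, chosen, weights)
print(len(batch_indices), batch_budget, len(subset_idx))
```

If the custom dataloader selects at a different granularity than this, for example with a selection batch size that differs from the training batch size, that difference would be worth stating when comparing against the paper's numbers.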

Thank you for your time and help!

Best regards,
