
Fix augmentation application when no model is attached #2720


Open · samet-akcay wants to merge 3 commits into main
Conversation

@samet-akcay samet-akcay commented May 22, 2025

📝 Description

This PR fixes an issue where user-specified augmentations weren't applied to datasets when no model was attached to the datamodule. The fix ensures that augmentations work in all scenarios, improving the user experience, especially during testing and development.

Problem

Currently, when users set augmentations on a datamodule but don't attach a model, those augmentations aren't applied during dataset setup. This causes confusion since users naturally expect their specified transforms to be applied regardless of whether a model is present.

The issue occurs in the _update_augmentations method, which only applies user augmentations when both of the following hold:

  1. A dataset subset exists
  2. A model transform exists via trainer.model.pre_processor.transform

This behavior forces users to implement workarounds like directly setting augmentations on datasets or attaching dummy models.

Solution

The fix updates the _update_augmentations method to properly handle scenarios where no model transform exists, ensuring user-specified augmentations are always applied.

Code Changes

# Current implementation
def _update_augmentations(self) -> None:
    """Update the augmentations for each subset."""
    for subset_name in ["train", "val", "test"]:
        subset = getattr(self, f"{subset_name}_data", None)
        augmentations = getattr(self, f"{subset_name}_augmentations", None)
        model_transform = get_nested_attr(self, "trainer.model.pre_processor.transform", None)

        if subset and model_transform:  # Only applies transforms when model exists
            self._update_subset_augmentations(subset, augmentations, model_transform)

# Fixed implementation
def _update_augmentations(self) -> None:
    """Update the augmentations for each subset."""
    for subset_name in ["train", "val", "test"]:
        subset = getattr(self, f"{subset_name}_data", None)
        augmentations = getattr(self, f"{subset_name}_augmentations", None)
        model_transform = get_nested_attr(self, "trainer.model.pre_processor.transform", None)

        if subset:  # Apply transforms regardless of model
            if model_transform:
                self._update_subset_augmentations(subset, augmentations, model_transform)
            elif augmentations:
                subset.augmentations = augmentations
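
For intuition, here is a minimal, self-contained sketch of the fixed control flow. DummyDatamodule, DummySubset, and this get_nested_attr implementation are hypothetical stand-ins for illustration, not anomalib's actual classes:

# Minimal sketch of the fixed control flow using hypothetical stand-ins
from functools import reduce

def get_nested_attr(obj, path, default=None):
    """Resolve a dotted attribute path, returning default if any link is missing."""
    try:
        return reduce(getattr, path.split("."), obj)
    except AttributeError:
        return default

class DummySubset:
    """Stand-in for a dataset subset that stores its augmentations."""
    def __init__(self):
        self.augmentations = None

class DummyDatamodule:
    """Stand-in datamodule with a train subset, user augmentations, and no trainer."""
    def __init__(self, augmentations):
        self.train_data = DummySubset()
        self.train_augmentations = augmentations

    def update_train_augmentations(self):
        subset = getattr(self, "train_data", None)
        augmentations = getattr(self, "train_augmentations", None)
        model_transform = get_nested_attr(self, "trainer.model.pre_processor.transform", None)
        if subset:
            if model_transform:
                pass  # would merge user augmentations with the model transform
            elif augmentations:
                subset.augmentations = augmentations  # new branch: apply even without a model

dm = DummyDatamodule(augmentations=object())  # placeholder for a v2.Compose pipeline
dm.update_train_augmentations()
assert dm.train_data.augmentations is dm.train_augmentations  # applied without a model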

Before Fix

# Creating a datamodule with a resize augmentation
from anomalib.data import Folder
from torchvision.transforms import v2

augmentations = v2.Compose([
    v2.Resize((64, 64))  # Resize to 64x64
])

folder_datamodule = Folder(
    name="bottle",
    root="datasets/MVTecAD/bottle",
    normal_dir="train/good",
    abnormal_dir="test/broken_large",
    train_augmentations=augmentations,
    test_augmentations=augmentations,
)
folder_datamodule.setup()

# Result: Images remain at their original size despite the resize transform
batch = next(iter(folder_datamodule.train_dataloader()))
print(batch.image.shape)  # Shows original dimensions, e.g. [3, 900, 900]

After Fix

# Same code as before
augmentations = v2.Compose([
    v2.Resize((64, 64))  # Resize to 64x64
])

folder_datamodule = Folder(
    name="bottle",
    root="datasets/MVTecAD/bottle",
    normal_dir="train/good",
    abnormal_dir="test/broken_large",
    train_augmentations=augmentations,
    test_augmentations=augmentations,
)
folder_datamodule.setup()

# Result: Images are correctly resized to 64x64
batch = next(iter(folder_datamodule.train_dataloader()))
print(batch.image.shape)  # Shows [3, 64, 64] as expected
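
A regression test along these lines could lock the fix in place. This is only a sketch: it reuses the Folder arguments from the example above and assumes the MVTecAD bottle data is available at that path:

# Sketch of a regression test; assumes datasets/MVTecAD/bottle exists on disk
from anomalib.data import Folder
from torchvision.transforms import v2

def test_augmentations_applied_without_model():
    augmentations = v2.Compose([v2.Resize((64, 64))])
    datamodule = Folder(
        name="bottle",
        root="datasets/MVTecAD/bottle",
        normal_dir="train/good",
        abnormal_dir="test/broken_large",
        train_augmentations=augmentations,
        test_augmentations=augmentations,
    )
    datamodule.setup()  # no trainer or model attached
    batch = next(iter(datamodule.train_dataloader()))
    # Spatial dimensions should reflect the user-specified resize
    assert batch.image.shape[-2:] == (64, 64)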

Previous Workarounds (No Longer Needed)

Before this fix, users had to use one of these workarounds:

  1. Direct Dataset Augmentation:
# After setup
folder_datamodule.train_data.augmentations = augmentations
folder_datamodule.test_data.augmentations = augmentations
  2. Attach a Dummy Model:
# Create a simple model whose pre-processor carries a no-op transform,
# which satisfies the model_transform check in _update_augmentations
from anomalib.models import Padim
from anomalib.pre_processing.pre_processor import PreProcessor

model = Padim(pre_processor=PreProcessor(transform=v2.Identity()))
folder_datamodule.trainer = type("Trainer", (), {"model": model})  # fake trainer exposing the model
folder_datamodule.setup()

Select what type of change your PR is:

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔨 Refactor (non-breaking change which refactors the code base)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔒 Security update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • 📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

For more information about code review checklists, see the Code Review Checklist.

Signed-off-by: Samet Akcay <samet.akcay@intel.com>