Skip to content

🔄 refactor(model): memory usage optimisation #2813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 20 commits into
base: main
Choose a base branch
from

Conversation

alfieroddan
Copy link
Contributor

📝 Description

The memory bank models currently keep training embeddings, model's stored embeddings on device instead of releasing.

This can be seen most dramatically in Patchcore and PaDiM. We expect peak memory to drop in validation but actually we keep the training embeddings in memory:

Screenshot 2025-07-07 at 15 46 19 Screenshot 2025-07-07 at 15 46 30

As an example here is Patchcore's lightning fit function, and I've commented where we should delete duplicate embeddings:

def fit(self) -> None:
        """Apply subsampling to the embedding collected from the training set.

        This method:
        1. Aggregates embeddings from all training batches
        2. Applies coreset subsampling to reduce memory requirements
        """
        logger.info("Aggregating the embedding extracted from the training set.")
        embeddings = torch.vstack(self.embeddings)

        logger.info("Applying core-set subsampling to get the embedding.")
        self.model.subsample_embedding(embeddings, self.coreset_sampling_ratio)

        # del self.embeddings <-- we should remove the extra ones

In addition to duplicate embeddings we should also be storing embeddings on the model not in lightning. So that the model can be used stand alone.

Lastly, the memory bank models are not designed for DDP or multi-gpu use. I have put default args to fix this.

✨ Changes

Select what type of change your PR is:

  • 🚀 New feature (non-breaking change which adds functionality)
  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔄 Refactor (non-breaking change which refactors the code base)
  • ⚡ Performance improvements
  • 🎨 Style changes (code style/formatting)
  • 🧪 Tests (adding/modifying tests)
  • 📚 Documentation update
  • 📦 Build system changes
  • 🚧 CI/CD configuration
  • 🔧 Chore (general maintenance)
  • 🔒 Security update
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).
  • 🏷️ My PR title follows conventional commit format.

For more information about code review checklists, see the Code Review Checklist.

alfieroddan and others added 13 commits July 2, 2025 17:00
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
…emory bank framework

Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
@alfieroddan alfieroddan marked this pull request as ready for review July 7, 2025 14:49
@alfieroddan alfieroddan requested a review from samet-akcay as a code owner July 7, 2025 14:49
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
@alfieroddan
Copy link
Contributor Author

Ran some benchmarks just to check (bear in mind patchcore runs out of memory on benchmark script).

Screenshot 2025-07-10 at 15 20 57 Screenshot 2025-07-10 at 15 21 04

…u_usage

Signed-off-by: alfieroddan <51797647+alfieroddan@users.noreply.github.com>
Copy link
Contributor

@rajeshgangireddy rajeshgangireddy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work.
I have some small comments.

Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Copy link
Contributor

@samet-akcay samet-akcay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking great, thanks for the refactor!

Only have few minor comments.

Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
…ian. Test for None replacements for args

Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
Signed-off-by: Alfie Roddan <51797647+alfieroddan@users.noreply.github.com>
@alfieroddan alfieroddan requested a review from samet-akcay July 15, 2025 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants