Tabular datamodule (Custom dataset from DataFrame, CSV, or Parquet) #2713


Open · wants to merge 5 commits into main

Conversation

@manuelkonrad commented May 18, 2025

📝 Description

Hi, I submitted a similar PR about six months ago (PR #2403). It has not been reviewed yet, and in the meantime, Anomalib has gone from v1 to v2. Therefore, I decided to refactor the feature according to the new structure and submit it as a new PR.

  • This PR adds the Tabular datamodule which is instantiated directly from a pandas DataFrame. It is an alternative to the Folder datamodule for custom datasets where the labels are not encoded in the directory structure. Useful for situations where labels are refined regularly or for sub-sampling large datasets without copying or moving files.
  • The datamodule also includes a from_file constructor which loads the data from a tabular file supported by pandas.
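A sketch of the intended usage follows. The column names (`image_path`, `label`, `split`) and the `Tabular` class are taken from the PR description; the exact constructor signature is an assumption and is therefore left commented out.

```python
import pandas as pd

# Labels live in a table rather than in the directory structure, so they
# can be refined or sub-sampled without copying or moving files on disk.
samples = pd.DataFrame(
    {
        "image_path": ["images/0001.png", "images/0002.png", "images/0003.png"],
        "label": ["normal", "normal", "abnormal"],
        "split": ["train", "test", "test"],
    }
)

print(sorted(samples.columns))  # ['image_path', 'label', 'split']

# Hypothetical instantiation (requires this feature branch; signature assumed):
# from anomalib.data import Tabular
# datamodule = Tabular(name="my_dataset", samples=samples)
# datamodule = Tabular.from_file(name="my_dataset", file_path="samples.csv")
```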

✨ Changes

Select what type of change your PR is:

  • 🐞 Bug fix (non-breaking change which fixes an issue)
  • 🔨 Refactor (non-breaking change which refactors the code base)
  • 🚀 New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📚 Documentation update
  • 🔒 Security update

✅ Checklist

Before you submit your pull request, please make sure you have completed the following steps:

  • 📋 I have summarized my changes in the CHANGELOG and followed the guidelines for my type of change (skip for minor changes, documentation updates, and test enhancements).
  • 📚 I have made the necessary updates to the documentation (if applicable).
  • 🧪 I have written tests that support my changes and prove that my fix is effective or my feature works (if applicable).

For more information about code review checklists, see the Code Review Checklist.

Signed-off-by: Manuel Konrad <84141230+manuelkonrad@users.noreply.github.com>
@manuelkonrad manuelkonrad force-pushed the feature/tabular-datamodule branch from b90fb9f to d5c34ff Compare May 18, 2025 19:03
@manuelkonrad mentioned this pull request May 18, 2025
@samet-akcay (Contributor)

Thanks for creating this updated PR, and for your patience. We've been a bit sidetracked by some other tasks.

@rajeshgangireddy and @ashwinvaidya17 can you prioritise this PR review please?

@rajeshgangireddy rajeshgangireddy self-requested a review May 19, 2025 07:30
@rajeshgangireddy (Contributor) commented May 19, 2025

Hi @manuelkonrad
Great PR. Thank you.
I am yet to try it out, but so far it LGTM.

If it's not too much trouble, could you also add a simple example notebook under examples/notebooks/100_datamodules?

@rajeshgangireddy (Contributor) left a comment

Minor comments.

Defaults to ``32``.
num_workers (int): Number of workers for data loading.
Defaults to ``8``.
train_augmentations (Transform | None): Augmentations to apply dto the training images

Suggested change:
- train_augmentations (Transform | None): Augmentations to apply dto the training images
+ train_augmentations (Transform | None): Augmentations to apply to the training images

Tabular: Tabular Datamodule
"""
pd_kwargs = pd_kwargs or {}
samples = getattr(pd, f"read_{file_format}")(file_path, **pd_kwargs)
@rajeshgangireddy (Contributor):

Can you add some basic sanity checks before and after loading the files?
As an example:

read_func = getattr(pd, f"read_{file_format}", None)
if read_func is None:
    raise ValueError(f"Unsupported file format: '{file_format}'")

Please also check whether the file exists and whether the samples have the required columns.

@manuelkonrad (Author) commented May 26, 2025

I added the sanity checks. Further checks for the columns are in the make_tabular_dataset function and in the setter method of the parent class AnomalibDataset. Furthermore, I added inference of file format from file suffix as default behavior.
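The behaviour described in this reply (file-existence check, format dispatch, and format inference from the file suffix) could look roughly like the sketch below. The function name `load_samples` is mine, not the PR's; only pandas is assumed.

```python
from pathlib import Path

import pandas as pd


def load_samples(file_path, file_format=None, **pd_kwargs):
    """Load a samples table via the matching pandas ``read_*`` function."""
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    # Infer the format from the suffix when not given, e.g. ".csv" -> "csv".
    file_format = file_format or path.suffix.lstrip(".").lower()
    read_func = getattr(pd, f"read_{file_format}", None)
    if read_func is None:
        raise ValueError(f"Unsupported file format: '{file_format}'")
    return read_func(path, **pd_kwargs)
```

Column checks would then follow after loading, as noted above, in `make_tabular_dataset` and the `AnomalibDataset` setter.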

# Add root to paths
samples["mask_path"] = samples["mask_path"].fillna("")
if root:
    samples["image_path"] = samples["image_path"].map(lambda x: Path(root, x))
@rajeshgangireddy (Contributor):

Check and warn if an image path doesn't exist.

@manuelkonrad (Author):

This check is already done in the setter method of AnomalibDataset.

###########################

# Run match-case twice to add missing columns iteratively
for _ in range(2):
@rajeshgangireddy (Contributor):

Please add a comment on how this check works and why it's run twice.

@manuelkonrad (Author):

I refactored this pattern matching block into more readable control flow blocks. I hope it's a bit clearer now. The main idea is that if the user does not provide all of the columns label_index, label and split, they are inferred from the given columns.
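A minimal sketch of that inference idea, assuming plain string labels and splits; the PR itself works with `LabelName` and `Split` enums, and its exact inference rules may differ from this illustration.

```python
import pandas as pd


def infer_columns(samples: pd.DataFrame) -> pd.DataFrame:
    """Fill in label_index and split when only `label` is provided."""
    # label -> label_index: anything other than "normal" counts as abnormal (1).
    if "label_index" not in samples.columns:
        samples["label_index"] = (samples["label"] != "normal").astype(int)
    # label_index -> split: normal samples train, abnormal samples test.
    if "split" not in samples.columns:
        samples["split"] = samples["label_index"].map({0: "train", 1: "test"})
    return samples
```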

samples = samples.astype({"image_path": "str", "mask_path": "str", "label": "str"})

# Check if anomalous samples are in training set
if len(samples[(samples.label_index == LabelName.ABNORMAL) & (samples.split == Split.TRAIN)]) != 0:
@rajeshgangireddy (Contributor):

Why not use any()?
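For illustration, the two forms are equivalent here; in this hypothetical frame, the literal `1` stands in for `LabelName.ABNORMAL` and `"train"` for `Split.TRAIN`. `.any()` avoids materialising the filtered frame just to take its length.

```python
import pandas as pd

samples = pd.DataFrame(
    {"label_index": [0, 0, 1], "split": ["train", "train", "test"]}
)

bad = (samples["label_index"] == 1) & (samples["split"] == "train")

# len(samples[bad]) != 0 builds a sub-frame; bad.any() just scans the mask.
print(bool(bad.any()))  # False: the only abnormal sample is in the test split
```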

@@ -58,7 +58,7 @@ def dataset_path(project_path: Path) -> Path:
for data_format in list(ImageDataFormat):
# Do not generate a dummy dataset for folder datasets.
@rajeshgangireddy (Contributor):

Update the comment as well.

manuelkonrad and others added 4 commits May 20, 2025 20:06
Signed-off-by: Manuel Konrad <84141230+manuelkonrad@users.noreply.github.com>
Signed-off-by: Manuel Konrad <84141230+manuelkonrad@users.noreply.github.com>
@manuelkonrad (Author) commented

Hi @rajeshgangireddy, thanks a lot for your helpful review!

I implemented the proposed changes and also added an example notebook.
