Tabular datamodule (Custom dataset from DataFrame, CSV, or Parquet) #2713
Signed-off-by: Manuel Konrad <84141230+manuelkonrad@users.noreply.github.com>
b90fb9f to d5c34ff
Thanks for creating this updated PR, and for your patience. We've been a bit sidetracked by some other tasks. @rajeshgangireddy and @ashwinvaidya17, can you prioritise this PR review please?
Hi @manuelkonrad, if it's not too much trouble, could you also add a simple example notebook under examples/notebooks/100_datamodules?
Minor comments.
Signed-off-by: Manuel Konrad <84141230+manuelkonrad@users.noreply.github.com>
Signed-off-by: Manuel Konrad <84141230+manuelkonrad@users.noreply.github.com>
Hi @rajeshgangireddy, thanks a lot for your helpful review! I implemented the proposed changes and also added an example notebook.
Hi @manuelkonrad,
Great PR! Thanks a lot for your efforts, @manuelkonrad!
I have just one request: please add an example to the docstring that shows the from_file() method. Otherwise, looking great!
Signed-off-by: Manuel Konrad <84141230+manuelkonrad@users.noreply.github.com>
Thanks for the feedback, @samet-akcay! I added the example.
📝 Description
Hi, I submitted a similar PR about six months ago (PR #2403). It has not been reviewed yet, and in the meantime, Anomalib has gone from v1 to v2. Therefore, I decided to refactor the feature according to the new structure and submit it as a new PR.
- Tabular datamodule, which is instantiated directly from a pandas DataFrame. It is an alternative to the Folder datamodule for custom datasets where the labels are not encoded in the directory structure. Useful for situations where labels are refined regularly or for sub-sampling large datasets without copying or moving files.
- from_file constructor, which loads the data from a tabular file supported by pandas.
✨ Changes
Select what type of change your PR is:
✅ Checklist
Before you submit your pull request, please make sure you have completed the following steps:
For more information about code review checklists, see the Code Review Checklist.
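
As a rough illustration of the idea in the description (labels kept in a table rather than in the directory layout), here is a minimal pandas-only sketch. It does not use the anomalib API: the column names (image_path, label, split), the file paths, and the load_samples helper are all hypothetical, and the datamodule's real schema and from_file signature may differ.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample table; the column names are assumptions, not the PR's schema.
samples = pd.DataFrame(
    {
        "image_path": ["img/000.png", "img/001.png", "img/002.png", "img/003.png"],
        "label": ["normal", "normal", "abnormal", "normal"],
        "split": ["train", "train", "test", "test"],
    }
)


def load_samples(file_path):
    """Hypothetical helper: choose a pandas reader from the file extension."""
    suffix = Path(file_path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(file_path)
    if suffix == ".parquet":
        # Requires an optional parquet engine such as pyarrow.
        return pd.read_parquet(file_path)
    raise ValueError(f"Unsupported file type: {suffix}")


# Refining a label is a table edit; no image file is moved or copied.
samples.loc[samples["image_path"] == "img/003.png", "label"] = "abnormal"

# Sub-sampling is a row selection on the table.
train_subset = samples[samples["split"] == "train"].sample(n=1, random_state=0)

# Round-trip through a CSV file, the kind of file a from_file-style constructor would read.
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = Path(tmp_dir) / "samples.csv"
    samples.to_csv(csv_path, index=False)
    reloaded = load_samples(csv_path)
```

The point of the sketch is only that relabeling and sub-sampling become DataFrame operations; how the resulting table is passed to the Tabular datamodule is defined by the PR itself.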