Re-write DataModule from scratch enabling support for Spark DataFrames, Polars, and larger than memory dataframes #402

@manujosephv

Is your feature request related to a problem? Please describe.
When a dataset is very large, it often does not fit in RAM, so we need a way to work with larger-than-memory data. Using an engine like Polars should also speed up data processing considerably.

Describe the solution you'd like
Rewrite the DataModule to be more performant. Out-of-core processing with Spark DataFrames or Polars, possibly combined with NVTabular, might be a good solution.
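
To make "out-of-core" concrete, here is a minimal sketch of the idea using only the Python standard library: the file is read in fixed-size chunks and normalization statistics are accumulated in a single pass, so the full dataset never has to sit in memory. This is not the current DataModule code and not a proposed API. The names `iter_chunks` and `streaming_mean_std`, and the CSV layout, are hypothetical. A real implementation would presumably delegate this to Polars, Spark, or NVTabular rather than hand-rolling it.

```python
import csv
import math
from typing import Dict, Iterator, List, Tuple


def iter_chunks(path: str, chunk_size: int) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of at most `chunk_size` rows; only one chunk is held at a time."""
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        chunk: List[Dict[str, str]] = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == chunk_size:
                yield chunk
                chunk = []
        if chunk:  # final partial chunk
            yield chunk


def streaming_mean_std(
    path: str, column: str, chunk_size: int = 10_000
) -> Tuple[float, float]:
    """Mean and population std of a numeric column, computed in one pass.

    Uses Welford's online update, so memory use is bounded by `chunk_size`
    rather than by the size of the file.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for chunk in iter_chunks(path, chunk_size):
        for row in chunk:
            x = float(row[column])
            n += 1
            delta = x - mean
            mean += delta / n
            m2 += delta * (x - mean)
    if n == 0:
        raise ValueError(f"no rows found in {path!r}")
    return mean, math.sqrt(m2 / n)
```

The same pattern (scan lazily, aggregate per chunk, then stream batches to the model) is what an out-of-core engine would provide, with far better performance than this pure-Python loop.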

Describe alternatives you've considered
Currently it's impossible to load larger-than-memory datasets.
