Row-oriented reader needs to be able to skip fields #190

@GKrivosheev-rms

Currently, all fields and properties of TRow must be present in the file for the row-oriented reader to work.
Row properties are often computed, unfilled, or otherwise do not need to be read from the file. We need a way to mark the type so that those members are not deserialized from the file.

Proposal:
Add an IgnoreColumn attribute to mark members that must be skipped, for example:

struct MyRow
{
    [IgnoreColumn]
    public DateTime CurrentDate => DateTime.Now;

    [MapToColumn("ColumnB")]
    public string MyValue;
}

using var reader = ParquetFile.CreateRowReader<MyRow>("example.parquet");
...
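
To make the intent of the attribute concrete, here is a minimal sketch of how a reader could decide which members to populate. It is not the library's implementation: both attribute classes are declared locally as hypothetical stand-ins so the example compiles on its own, and `RowMembers.ColumnsToRead` is an illustrative helper name, not an existing API.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical stand-ins, declared here only so the sketch is self-contained.
[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
sealed class IgnoreColumnAttribute : Attribute { }

[AttributeUsage(AttributeTargets.Field | AttributeTargets.Property)]
sealed class MapToColumnAttribute : Attribute
{
    public MapToColumnAttribute(string columnName) { ColumnName = columnName; }
    public string ColumnName { get; }
}

struct MyRow
{
    [IgnoreColumn]
    public DateTime CurrentDate => DateTime.Now;

    [MapToColumn("ColumnB")]
    public string MyValue;
}

static class RowMembers
{
    // Returns the file column names needed for TRow, skipping every
    // public field or property marked with [IgnoreColumn].
    public static string[] ColumnsToRead<TRow>()
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        var type = typeof(TRow);

        return type.GetFields(flags).Cast<MemberInfo>()
            .Concat(type.GetProperties(flags))
            .Where(m => m.GetCustomAttribute<IgnoreColumnAttribute>() == null)
            // Use the mapped column name when present, else the member name.
            .Select(m => m.GetCustomAttribute<MapToColumnAttribute>()?.ColumnName ?? m.Name)
            .ToArray();
    }
}

static class Program
{
    static void Main()
    {
        // CurrentDate is ignored, so only the mapped column remains.
        Console.WriteLine(string.Join(", ", RowMembers.ColumnsToRead<MyRow>()));
    }
}
```

With this approach the file only needs to contain ColumnB; CurrentDate never has to exist as a column.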

Alternatively, make the reader and writer symmetrical and allow the reader to be customized with a list of columns, as below. Note that the columns are names of members in the class, not names of columns in the file. This would allow only a subset of the type's members to be set.

public static ParquetRowReader<TTuple> CreateRowReader<TTuple>(string path, string[] columnNames = null);
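
As a rough illustration of the second option, the sketch below shows how a `columnNames` argument could be resolved to a subset of members of the row type. `MemberSelection.Select` is a hypothetical helper, and the choice to throw on an unknown name is an assumption rather than anything the proposal specifies.

```csharp
using System;
using System.Linq;
using System.Reflection;

struct MyRow
{
    public DateTime CurrentDate => DateTime.Now;
    public string MyValue;
    public int Id;
}

static class MemberSelection
{
    // Resolves member names to the members a reader would populate.
    // A null list keeps the current behaviour: every public member is used.
    public static MemberInfo[] Select<TTuple>(string[] columnNames = null)
    {
        const BindingFlags flags = BindingFlags.Public | BindingFlags.Instance;
        var type = typeof(TTuple);

        var all = type.GetFields(flags).Cast<MemberInfo>()
            .Concat(type.GetProperties(flags))
            .ToArray();

        if (columnNames == null) return all;

        // Assumption: an unknown member name is an error, not silently skipped.
        return columnNames
            .Select(name => all.FirstOrDefault(m => m.Name == name)
                ?? throw new ArgumentException(
                    $"'{name}' is not a public member of {type.Name}", nameof(columnNames)))
            .ToArray();
    }
}

static class Program
{
    static void Main()
    {
        // Only Id and MyValue would be read; CurrentDate is left out.
        var members = MemberSelection.Select<MyRow>(new[] { "Id", "MyValue" });
        Console.WriteLine(string.Join(", ", members.Select(m => m.Name)));
    }
}
```

Compared with the attribute, this keeps the row type free of annotations and lets each call site choose a different subset, at the cost of passing member names as strings.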

Labels: enhancement
