Column names as class attributes only work if type hint is wrapped in Series #2144

@GiorgioBalestrieri

This relates to #364 (which introduced a great feature).

When accessing column names as class attributes of a schema defined by inheriting from DataFrameModel, if a type hint is not wrapped in Series, a type checker infers the attribute's type to be the column's data type instead of str.

Minimum working example:

import pandera as pa
from pandera.typing import Series


class FooSchema(pa.DataFrameModel):
    a: Series[int]
    b: Series[float]


class BarSchema(pa.DataFrameModel):
    x: int
    y: float


if __name__ == "__main__":
    foo_columns: list[str] = [FooSchema.a, FooSchema.b]
    bar_columns: list[str] = [BarSchema.x, BarSchema.y]

Mypy output:

pandera_issue.py:17: error: List item 0 has incompatible type "int"; expected "str"  [list-item]
pandera_issue.py:17: error: List item 1 has incompatible type "float"; expected "str"  [list-item]
Found 2 errors in 1 file (checked 1 source file)

This is relatively easy to solve by wrapping the type hints in Series, but:

  1. I would say this is unexpected (and undocumented) behavior, and the resulting error messages are quite confusing.
  2. Having to wrap every type hint in Series adds boilerplate (which is why support for non-wrapped type hints was added in the first place).

EDIT: I think the reason Series[...] works is that SeriesBase (that Series inherits from) implements __get__ with str as a return type:

def __get__(
    self, instance: object, owner: type
) -> str:  # pragma: no cover
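
To illustrate the mechanism I suspect is at play, here is a minimal sketch that does not use pandera at all. `ColumnLike`, `Wrapped` and `Bare` are hypothetical names made up for this example, and assigning a descriptor instance in the class body is a simplification so that the snippet also runs on its own; it is not how pandera produces the runtime value. The point is only that a `__get__` annotated as returning str makes a type checker treat class-level access as str, whereas a bare annotation is read as the annotated type.

```python
from typing import Generic, TypeVar

T = TypeVar("T")


class ColumnLike(Generic[T]):
    """Hypothetical stand-in for a Series-like annotation type (not pandera's class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def __get__(self, instance: object, owner: type) -> str:
        # Because of this return annotation, a type checker resolves
        # `Owner.attr` to `str`, regardless of the type parameter T.
        return self.name


class Wrapped:
    # Declared type has __get__, so class access goes through the descriptor.
    a: ColumnLike[int] = ColumnLike("a")


class Bare:
    # No descriptor involved: a type checker sees `Bare.x` as `int`.
    x: int = 0


# Accepted by a type checker, since `Wrapped.a` is typed as `str`.
wrapped_columns: list[str] = [Wrapped.a]
```

If this reading is right, the bare annotations in BarSchema have nothing that tells the type checker that class-level access yields a column name, which would explain the mypy errors above.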
