Coercing datetime types? #770
Hi, I'm trying to use pandera to simplify parsing text-based tables (from Gherkin-based tests) into schema-based dataframes. My initial hope was that I could just define an appropriate schema with coerce=True fields and let pandera handle most of the work. However, it doesn't seem to be quite that easy with date/datetime-like types.

For example, while the df is created and passes validation, its 'date' column has dtype "object" and holds string values. Is there any way for me to specify a schema that coerces the values into an actual date(time)-like dtype?

Also, somewhat oddly, manual conversion seems to have slightly funny effects. Say I afterwards do

Thank you
Replies: 1 comment 1 reply
Hi @fabianoliver
The syntax `DataFrame[TestSchema]` is intended to help mypy linting, but you still define a "regular" pandas `DataFrame` (notice that you import from `pandera.typing`). The only additions are type hints and support for pydantic.

You need to call validate to coerce: `TestSchema.validate(df)`. The reason why we cannot implement the behavior you expected is that Pandera would need to override every `DataFrame` method that outputs a DataFrame to call `validate` before returning.

re: funny effects. Line 2 of your example does not work; I think you meant `df.date[0]`. It's expected to have Timestamp type in a datetime64[ns] series, the documentation says