
Is it possible to disable sanitize_columns? #533

Open
@pwmcintyre


Hi

Similar to existing issues:

I have Glue tables with `-` in their names and fields with `.` in their names. I understand these characters aren't officially supported, but they do work!
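To make the renaming concrete: the error reported further down mentions `meta_format` where the DataFrame column is `meta.format`, so the sanitizer evidently maps `.` to `_`. awswrangler exposes its real rule as `wr.catalog.sanitize_column_name`; the function below, `approximate_sanitize_column_name`, is a hypothetical stdlib approximation (lowercase, then replace anything outside `[a-z0-9_]` with `_`) and may differ from the library in details such as camelCase or accent handling.

```python
import re


def approximate_sanitize_column_name(name: str) -> str:
    """Hypothetical approximation of Glue/Athena-safe column naming.

    Lowercases the name and replaces every character outside [a-z0-9_]
    with an underscore. This is NOT awswrangler's implementation.
    """
    return re.sub(r"[^a-z0-9_]", "_", name.lower())


# Column names taken from the dataset in this issue.
for col in ["p_version", "asset", "date", "meta.format"]:
    print(col, "->", approximate_sanitize_column_name(col))
```

Under this approximation, already-safe names pass through unchanged, while `meta.format` becomes `meta_format`, which no longer matches the existing Glue column.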

I'm trying to use this package to write some Parquet files, but this "feature" is getting in the way.

Table

Table I'm trying to add data to:

[screenshot of the Glue table]

Dataset

Dataset I'm trying to write:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1 entries, 0 to 0
Data columns (total 27 columns):
 #   Column                                           Non-Null Count  Dtype  
---  ------                                           --------------  -----  
 0   p_version                                        1 non-null      object 
 1   asset                                            1 non-null      object 
 2   date                                             1 non-null      object 
 3   meta.format                                      1 non-null      object 
...

Sample:

df_flat.head()

[screenshot of the df_flat.head() output]

Code

import awswrangler as wr

wr.s3.to_parquet(
    df=df_flat,
    path=path,
    dataset=True,
    mode="append",
    database=database,
    table=table,
    sanitize_columns=False, # ignored!
    partition_cols=partition_cols,
    schema_evolution=False, # prevent accidental Catalogue updates
)
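A likely explanation for the `# ignored!` comment: as far as I recall, the awswrangler docstring for `sanitize_columns` says that the `True` behaviour is enforced whenever the `database` and `table` arguments are passed (check the docs for your installed version). A minimal sketch of that effective rule, using a hypothetical helper `effective_sanitize` that models the documented behaviour rather than the library source:

```python
from typing import Optional


def effective_sanitize(
    sanitize_columns: bool, database: Optional[str], table: Optional[str]
) -> bool:
    """Hypothetical model of the documented rule, not awswrangler code."""
    # When both a Glue database and table are given, sanitization is forced on,
    # so sanitize_columns=False has no effect.
    if database is not None and table is not None:
        return True
    # Otherwise the caller's choice is respected.
    return sanitize_columns


print(effective_sanitize(False, "my_db", "my_table"))  # True
print(effective_sanitize(False, None, None))  # False
```

`my_db` and `my_table` are placeholder names. In this issue both `database` and `table` are set, so under this rule the flag cannot turn sanitization off.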

Error

InvalidArgumentValue: Schema change detected: New column meta_format with type string. Please pass schema_evolution=True to allow new columns behaviour.
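Why the error appears even though no new field was added: with `schema_evolution=False`, the write is rejected if the (sanitized) DataFrame has any column the catalog table lacks. The sketch below, built around a hypothetical `find_new_columns` helper, illustrates that comparison. It assumes the Glue table stores the column as `meta.format` with type `string` (the type is taken from the error message), and it is not awswrangler's actual check.

```python
from typing import Dict, List


def find_new_columns(df_columns: List[str], catalog_columns: Dict[str, str]) -> List[str]:
    """Hypothetical check: DataFrame columns that the catalog schema lacks."""
    return [c for c in df_columns if c not in catalog_columns]


# Assumed catalog schema: the Glue table keeps the original dotted name.
catalog = {"p_version": "string", "asset": "string", "date": "string", "meta.format": "string"}
# DataFrame columns after sanitization turned "meta.format" into "meta_format".
df_cols = ["p_version", "asset", "date", "meta_format"]

print(find_new_columns(df_cols, catalog))  # ['meta_format']
```

Because the sanitized name can never equal the dotted catalog name, every append looks like a schema change, and `schema_evolution=True` would add a second, differently named column rather than reuse `meta.format`.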

Seeking advice.

Labels

WIP (Work in progress), enhancement (New feature or request), question (Further information is requested)
