Converter class does not convert Athena string data to pandas str type #148

First of all, thank you for creating this library! It's been immensely helpful: I've used it in multiple contexts over several years and would love to contribute, especially if it helps solve my current problem!

With `pyathena=1.10.7` and `pandas=1.0.5`, I am running the following code, expecting the converter class to cast the Athena `string` data type to the pandas `str` dtype.
```
from pyathena import connect
from pyathena.pandas_cursor import PandasCursor
from pyathena.converter import Converter

class CustomPandasTypeConverter(Converter):

    def __init__(self):
        super(CustomPandasTypeConverter, self).__init__(
            mappings=None,
            types={
                'boolean': bool,
                'tinyint': int,
                'smallint': int,
                'integer': int,
                'bigint': int,
                'float': float,
                'real': float,
                'double': float,
                'decimal': float,
                'char': str,
                'varchar': str,
                'array': str,
                'map': str,
                'row': str,
                'varbinary': str,
                'json': str,
                'string': str
            }
        )

    def convert(self, type_, value):
        # Not used in PandasCursor.
        pass
    
cur = connect(s3_staging_dir='<staging_directory_url>',
              region_name='<aws_region>',
              cursor_class=PandasCursor,
              converter=CustomPandasTypeConverter(),
              work_group='<workgroup_name>').cursor()

query = 'SELECT * FROM <schema>.<table>'
df = cur.execute(query).as_pandas()
df.dtypes
```

When I inspect the `dtypes`, Athena `int`s are converted to pandas `int`s, `decimal`s are converted to `float`s, and `string`s are consistently returned as `object` dtypes. However, Athena `string` `NULL`s are cast to `NaN`s, which require explicit column-by-column `fillna` operations. This is particularly inconvenient because I'm subsequently trying to convert the pandas dataframe to a Spark dataframe. Now that I've typed all this out, I'm guessing this is related to #118?
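
As a stopgap, something like the following could stand in for the column-by-column `fillna` calls. It is only a sketch: `nulls_to_none` is a hypothetical helper, not part of PyAthena, and it assumes the string columns come back with `object` dtype as described above. It replaces missing values in those columns with `None`, which I'd expect to map more cleanly to a null on the Spark side, and leaves numeric columns untouched.

```python
import numpy as np
import pandas as pd


def nulls_to_none(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaN replaced by None in object-dtype columns.

    Hypothetical helper for illustration; not part of PyAthena.
    """
    out = df.copy()
    for col in out.columns:
        # Only touch object-dtype columns (where the Athena strings end up).
        if out[col].dtype != object:
            continue
        # Rebuild the column with an explicit object dtype so None is kept
        # as None instead of being coerced back to NaN.
        out[col] = pd.Series(
            [None if pd.isna(value) else value for value in out[col]],
            index=out.index,
            dtype=object,
        )
    return out


# Small stand-in for a PandasCursor result with a NULL string value.
example = pd.DataFrame({
    "name": pd.Series(["a", np.nan, "c"], dtype=object),
    "amount": [1.0, np.nan, 3.0],
})
cleaned = nulls_to_none(example)
```

With the real cursor this would be used as `df = nulls_to_none(cur.execute(query).as_pandas())`. It still post-processes the dataframe rather than fixing the conversion itself, so it doesn't answer the question of whether the converter should handle this.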

Also, I'm not sure where the right place to ask this is, but are there any plans to implement a `PySparkCursor` for `PyAthena`? If not, can I help by contributing?
