
bug: read_parquet broken in duckdb 1.4 #11634

@cboettig

What happened?

With prior versions of duckdb, we could read Parquet files without any problem:

import ibis
con = ibis.duckdb.connect()
con.read_parquet("https://minio.carlboettiger.info/public-grids/hex/h0.parquet").execute()

With the current version (1.4), we now get an error:

AttributeError: 'pyarrow.lib.RecordBatchReader' object has no attribute 'column_names'

Note that this error does not occur in duckdb 1.4 using the native duckdb client:

import duckdb
duckdb.from_parquet("https://minio.carlboettiger.info/public-grids/hex/h0.parquet").df()

What version of ibis are you using?

10.8.0
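
Since the behavior depends on which duckdb release is installed, it may help to report the exact versions of the packages involved. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is hypothetical, and the distribution names (`ibis-framework`, `duckdb`, `pyarrow`) are assumed to be the ones installed in the environment.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The package is not installed in this environment.
            versions[name] = None
    return versions


# Distribution names are assumptions; adjust to match your environment.
print(installed_versions(["ibis-framework", "duckdb", "pyarrow"]))
```

Packages that are not installed are reported as `None` rather than raising.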

What backend(s) are you using, if any?

DuckDB

Relevant log output

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[1], line 3
      1 import ibis
      2 con = ibis.duckdb.connect()
----> 3 con.read_parquet("https://minio.carlboettiger.info/public-grids/hex/h0.parquet").execute()

File /opt/conda/lib/python3.12/site-packages/ibis/expr/types/core.py:424, in Expr.execute(self, limit, params, **kwargs)
    370 def execute(
    371     self,
    372     *,
   (...)    375     **kwargs: Any,
    376 ) -> pd.DataFrame | pd.Series | Any:
    377     """Execute an expression against its backend if one exists.
    378 
    379     Parameters
   (...)    422     [`Value.to_pandas()`](./expression-generic.qmd#ibis.expr.types.generic.Value.to_pandas)
    423     """
--> 424     return self._find_backend(use_default=True).execute(
    425         self, limit=limit, params=params, **kwargs
    426     )

File /opt/conda/lib/python3.12/site-packages/ibis/backends/duckdb/__init__.py:1415, in Backend.execute(self, expr, params, limit, **kwargs)
   1398 rel = self._to_duckdb_relation(expr, params=params, limit=limit, **kwargs)
   1399 table = rel.arrow()
   1401 df = pd.DataFrame(
   1402     {
   1403         name: (
   1404             col.to_pylist()
   1405             if (
   1406                 pat.is_nested(col.type)
   1407                 or pat.is_dictionary(col.type)
   1408                 or
   1409                 # pyarrow / duckdb type null literals columns as int32?
   1410                 # but calling `to_pylist()` will render it as None
   1411                 col.null_count
   1412             )
   1413             else col.to_pandas()
   1414         )
-> 1415         for name, col in zip(table.column_names, table.columns)
   1416     }
   1417 )
   1418 df = DuckDBPandasData.convert_table(df, expr.as_table().schema())
   1419 return expr.__pandas_result__(df)

AttributeError: 'pyarrow.lib.RecordBatchReader' object has no attribute 'column_names'

Code of Conduct

  • I agree to follow this project's Code of Conduct

Metadata

Assignees

No one assigned

    Labels

    bug: Incorrect behavior inside of ibis

    Type

    No type

    Projects

    Status

    backlog

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
