bug: duckdb backend breaks on nulls in memtable #11602

@90degs2infty

Description

What happened?

The following script fails with pyarrow.lib.ArrowInvalid as soon as the memtable is executed on the duckdb backend:

import ibis

schema = ibis.schema(
    {"a": ibis.dtype("int", nullable=False), "b": ibis.dtype("int", nullable=True)}
)
table = ibis.memtable([{"a": 42, "b": 43}, {"a": 44, "b": ibis.null()}], schema=schema)

print("Original schema:")
print(table.schema())
print("Schema after dropping nulls:")
print(table.drop_null().schema())

print(table.execute())
print(table.drop_null().execute())

Output:

> uv run main.py
Original schema:
ibis.Schema {
  a  !int64
  b  int64
}
Schema after dropping nulls:
ibis.Schema {
  a  !int64
  b  int64
}
Traceback (most recent call last):
  File "project/.venv/lib/python3.13/site-packages/ibis/backends/duckdb/__init__.py", line 1716, in _register_in_memory_table
    obj = data.to_pyarrow_dataset(schema)
          ^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'PandasDataFrameProxy' object has no attribute 'to_pyarrow_dataset'. Did you mean: 'to_pyarrow_bytes'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "project/main.py", line 13, in <module>
    print(table.execute())
          ~~~~~~~~~~~~~^^
  File "project/.venv/lib/python3.13/site-packages/ibis/expr/types/core.py", line 424, in execute
    return self._find_backend(use_default=True).execute(
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^
        self, limit=limit, params=params, **kwargs
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "project/.venv/lib/python3.13/site-packages/ibis/backends/duckdb/__init__.py", line 1398, in execute
    rel = self._to_duckdb_relation(expr, params=params, limit=limit, **kwargs)
  File "project/.venv/lib/python3.13/site-packages/ibis/backends/duckdb/__init__.py", line 1316, in _to_duckdb_relation
    self._run_pre_execute_hooks(expr)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "project/.venv/lib/python3.13/site-packages/ibis/backends/duckdb/__init__.py", line 1297, in _run_pre_execute_hooks
    super()._run_pre_execute_hooks(expr)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "project/.venv/lib/python3.13/site-packages/ibis/backends/__init__.py", line 1321, in _run_pre_execute_hooks
    self._register_in_memory_tables(expr)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
  File "project/.venv/lib/python3.13/site-packages/ibis/backends/__init__.py", line 1298, in _register_in_memory_tables
    self._register_in_memory_table(memtable)
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "project/.venv/lib/python3.13/site-packages/ibis/backends/duckdb/__init__.py", line 1718, in _register_in_memory_table
    obj = data.to_pyarrow(schema)
  File "project/.venv/lib/python3.13/site-packages/ibis/formats/pandas.py", line 436, in to_pyarrow
    return pa.Table.from_pandas(obj, schema=pyarrow_schema)
           ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pyarrow/table.pxi", line 4795, in pyarrow.lib.Table.from_pandas
  File "project/.venv/lib/python3.13/site-packages/pyarrow/pandas_compat.py", line 637, in dataframe_to_arrays
    arrays = [convert_column(c, f)
              ~~~~~~~~~~~~~~^^^^^^
  File "project/.venv/lib/python3.13/site-packages/pyarrow/pandas_compat.py", line 625, in convert_column
    raise e
  File "project/.venv/lib/python3.13/site-packages/pyarrow/pandas_compat.py", line 619, in convert_column
    result = pa.array(col, type=type_, from_pandas=True, safe=safe)
  File "pyarrow/array.pxi", line 365, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 91, in pyarrow.lib._ndarray_to_array
  File "pyarrow/error.pxi", line 92, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: ('Could not convert None with type NullScalar: tried to convert to int64', 'Conversion failed for column b with type object')
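
Possible workaround sketch: the error message ("Could not convert None with type NullScalar") suggests that the object returned by ibis.null() lands in the pandas column as-is, so one option is to swap such values for plain Python None before building the memtable. The snippet below is a minimal sketch of that idea, not a fix in ibis. FakeNullScalar and replace_null_exprs are hypothetical names used only for illustration, and it is an assumption (not verified here) that ibis.memtable accepts None for the nullable column.

```python
from typing import Any, Callable


class FakeNullScalar:
    """Hypothetical stand-in for the object returned by ibis.null()."""


def replace_null_exprs(
    rows: list[dict[str, Any]],
    is_null_expr: Callable[[Any], bool],
) -> list[dict[str, Any]]:
    """Return copies of the rows with null-expression values replaced by None.

    The input rows are left untouched.
    """
    return [
        {key: (None if is_null_expr(value) else value) for key, value in row.items()}
        for row in rows
    ]


rows = [{"a": 42, "b": 43}, {"a": 44, "b": FakeNullScalar()}]
clean_rows = replace_null_exprs(rows, lambda v: isinstance(v, FakeNullScalar))
print(clean_rows)  # [{'a': 42, 'b': 43}, {'a': 44, 'b': None}]
```

With real ibis data the predicate could instead check the class name shown in the error, e.g. lambda v: type(v).__name__ == "NullScalar", and clean_rows would then be passed to ibis.memtable(clean_rows, schema=schema) in place of the original rows.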

One side question about the schemas above: after explicitly dropping nulls, I'd expect the targeted columns to be marked as non-nullable, i.e. I'd expect "Schema after dropping nulls" to read:

ibis.Schema {
  a  !int64
  b  !int64    # mind the !
}

Can you please shed some light on why this is not the case? It's probably expected, but I don't understand why 🤔 Thanks! 🙏

What version of ibis are you using?

Python: 3.13
ibis: 10.8.0

What backend(s) are you using, if any?

duckdb, python package version 1.3.2

Metadata

Labels: bug (Incorrect behavior inside of ibis)
Status: backlog