CatalogProvider errors are badly mangled #1226

@colinmarc

I'm working on a setup where we register a Python CatalogProvider with register_catalog_provider:

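from datafusion import SessionContext

class MyCatalog:
    ...  # catalog provider implementation, elided

ctx = SessionContext()
ctx.register_catalog_provider('datafusion', MyCatalog())
ctx.sql(...)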

This results in a call stack that goes Python -> Rust -> Python and back. As a result, if MyCatalog raises an error, it gets badly mangled before being re-raised (for example, by ctx.sql):

DataFusion error: Execution("PyErr { type: <class 'internal.CatalogClientError'>, value: CatalogClientError('Table \".nonexistant_table\" not found...')"

There's no way to recover anything useful from this exception without string-parsing.

To fix this, we'd probably need to add DataFusionError::Ffi(Box<dyn Error>) upstream, then construct it here, where the error is currently flattened into a debug string:

InnerDataFusionError::Execution(format!("{e:?}"))

Then we could check for that variant here and, if it matches, potentially return the original PyErr unchanged:

https://github.com/apache/datafusion-python/blob/f0bbad7543717c5f08ba2acb92d42c9d30fd2355/src/errors.rs

I haven't tested this approach, but if it sounds reasonable I could give it a shot.
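
To make the idea concrete, here is a rough, standalone sketch of the difference between stringifying the error and carrying it through boxed. None of this is the actual datafusion or datafusion-python code: `ForeignError` is a hypothetical stand-in for pyo3's `PyErr` (so the sketch builds without pyo3), `EngineError` is a simplified model of `DataFusionError`, and the `Ffi` variant is the one proposed above, assumed to hold a `Box<dyn Error + Send + Sync>`. The helper names are made up for illustration.

```rust
use std::error::Error;
use std::fmt;

// Hypothetical stand-in for pyo3's PyErr.
#[derive(Debug, PartialEq)]
struct ForeignError {
    type_name: String,
    message: String,
}

impl fmt::Display for ForeignError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{}: {}", self.type_name, self.message)
    }
}

impl Error for ForeignError {}

// Simplified model of DataFusionError. `Ffi` mirrors the proposed variant
// and is an assumption of this sketch.
#[derive(Debug)]
enum EngineError {
    Execution(String),
    Ffi(Box<dyn Error + Send + Sync>),
}

// Today's behaviour: the foreign error is flattened into a debug string,
// so only the text survives.
fn wrap_lossy(e: ForeignError) -> EngineError {
    EngineError::Execution(format!("{e:?}"))
}

// Proposed behaviour: the foreign error is boxed and carried through intact.
fn wrap_preserving(e: ForeignError) -> EngineError {
    EngineError::Ffi(Box::new(e))
}

// At the boundary back to Python: return the original error if it is there.
fn recover(err: &EngineError) -> Option<&ForeignError> {
    match err {
        EngineError::Ffi(inner) => inner.downcast_ref::<ForeignError>(),
        EngineError::Execution(_) => None,
    }
}

// Made-up example error, loosely modelled on the one in the report.
fn sample() -> ForeignError {
    ForeignError {
        type_name: "CatalogClientError".to_string(),
        message: "table not found".to_string(),
    }
}

fn main() {
    let lossy = wrap_lossy(sample());
    let kept = wrap_preserving(sample());
    if let EngineError::Execution(msg) = &lossy {
        println!("stringified: {msg}");
    }
    println!("recovered from stringified error: {:?}", recover(&lossy));
    println!("recovered from boxed error: {:?}", recover(&kept));
}
```

With the stringified path, `recover` has nothing to downcast and returns `None`; with the boxed path, the original value comes back with its fields intact, which is what would let the Python side re-raise the original exception instead of a string.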

Labels: bug (Something isn't working)