Open
Description
Background
Users of datafusion-python can create table providers via at least two different pathways:
- A table provider created via PyCapsule (i.e. custom providers implemented in Rust or elsewhere, exposed to Python via `__datafusion_table_provider__`).
- A view-based provider created via `into_view()` on an existing `DataFrame`, then registered with a `SessionContext`.
These two types serve similar roles (they supply data / logical plans to DataFusion), but currently they behave differently, which can lead to confusion.
Problem / Confusion
- A user might reasonably expect that a table provider object created with `into_view()` could be registered with a session context in the same way as a PyCapsule-exposed provider, but that may not always work (or may not be documented).
- There is a risk of mismatch in how the internals treat providers from the two sources (views vs PyCapsules).
- Without a unified type or interface, it’s unclear whether certain operations should/can be supported for both.
- The divergence might cause unexpected errors or surprising behavior for the user, especially around registration, reuse, or compatibility of providers.
Desired Behavior / Suggestion
- Define a single common `PyTableProvider` (or similarly named abstraction) that works identically whether created via `into_view()` or via a PyCapsule / external source.
- Ensure that `SessionContext.register_table_provider(...)` accepts this common type regardless of source.
- Document clearly:
  - what kinds of table providers are accepted (views, PyCapsules, external)
  - how to obtain them from each path
  - equivalence or limitations (if any)
- Possibly enhance the implementation so that a view-based provider can be converted (or wrapped) into the same internal abstraction that a PyCapsule provider uses.
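
To make the suggestion concrete, here is a rough pure-Python sketch of what "one common type regardless of source" could look like. It does not import or reflect the actual datafusion-python internals: `UnifiedTableProvider`, `as_table_provider`, `register_table_provider`, and the two stand-in classes are all hypothetical names invented for illustration. Only the `__datafusion_table_provider__` and `into_view()` names come from the description above.

```python
from dataclasses import dataclass
from typing import Any, Dict

# Method name used by PyCapsule-exposed providers (taken from the issue text).
CAPSULE_METHOD = "__datafusion_table_provider__"


@dataclass(frozen=True)
class UnifiedTableProvider:
    """Hypothetical common wrapper; NOT an existing datafusion-python class."""

    source: str  # "pycapsule" or "view"
    inner: Any   # the underlying provider object


def as_table_provider(obj: Any) -> UnifiedTableProvider:
    """Normalize either kind of provider into the one common type."""
    if isinstance(obj, UnifiedTableProvider):
        return obj
    # Path 1: object exports a provider through the PyCapsule method.
    if callable(getattr(obj, CAPSULE_METHOD, None)):
        return UnifiedTableProvider("pycapsule", obj)
    # Path 2: DataFrame-like object that can be turned into a view.
    if callable(getattr(obj, "into_view", None)):
        return UnifiedTableProvider("view", obj.into_view())
    raise TypeError(
        f"{type(obj).__name__} is neither a PyCapsule provider nor view-capable"
    )


def register_table_provider(
    registry: Dict[str, UnifiedTableProvider], name: str, obj: Any
) -> None:
    """Toy stand-in for a session context: one entry point for both sources."""
    registry[name] = as_table_provider(obj)


# --- Stand-ins used only to exercise the sketch (assumptions, not real APIs) ---
class FakeCapsuleProvider:
    def __datafusion_table_provider__(self) -> object:
        return object()  # a real provider would return a PyCapsule here


class FakeDataFrame:
    def into_view(self) -> str:
        return "view-of-dataframe"  # placeholder for a real view object


registry: Dict[str, UnifiedTableProvider] = {}
register_table_provider(registry, "from_capsule", FakeCapsuleProvider())
register_table_provider(registry, "from_view", FakeDataFrame())
```

With a normalization step like this at the registration boundary, callers would not need to know which pathway produced a provider. Whether the real fix belongs on the Python side or in the Rust bindings is left open here.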
Benefits
- Reduced confusion for users.
- More consistency in the API.
- Easier to reason about table providers across different parts of a codebase.
- Potentially fewer bugs when mixing providers from different sources.
Context
This issue is motivated by this comment: #1016 (comment)