Description
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Now that we have good structures to validate and manipulate Variants programmatically in Rust APIs, we need low-level kernels to manipulate the objects so we can query them in high-level languages.
The goal is to implement queries such as the following in DataFusion and other arrow-rs based engines that select fields or elements from an object.
Examples
-- Extract the field "my_field" from the variant
SELECT v["my_field"] FROM my_table;
-- Extract the second element from the list stored in the field "x" of v
SELECT v["x"][2] FROM my_table;
-- Cast the field stored in "name" as a string
SELECT CAST(v["name"] AS VARCHAR) FROM my_table;
Describe the solution you'd like
To implement this kind of access, I think we need a kernel that uses the Rust Variant API to extract and cast the data.
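To make the "extract and cast" idea concrete, here is a minimal sketch over a toy model rather than the real Variant API. Everything in it is hypothetical and for illustration only: the `Value` enum stands in for a decoded Variant, and `get_field_as_i64` and its `safe` flag are made-up names. The `safe` flag is meant to mirror the `safe` field of arrow's `CastOptions` (null on a failed cast vs. returning an error).

```rust
/// Toy stand-in for a decoded Variant value (hypothetical, not the real Variant type).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Str(String),
    Object(Vec<(String, Value)>),
}

/// Look up `name` in an object; any non-object value has no fields.
fn field<'v>(value: &'v Value, name: &str) -> Option<&'v Value> {
    match value {
        Value::Object(fields) => fields
            .iter()
            .find(|(k, _)| k.as_str() == name)
            .map(|(_, v)| v),
        _ => None,
    }
}

/// Try to cast one value to i64. `Ok(None)` represents a null result.
fn cast_i64(value: &Value) -> Result<Option<i64>, String> {
    match value {
        Value::Null => Ok(None),
        Value::Int(i) => Ok(Some(*i)),
        Value::Str(s) => s
            .parse::<i64>()
            .map(Some)
            .map_err(|e| format!("cannot cast {s:?} to i64: {e}")),
        Value::Object(_) => Err("cannot cast object to i64".to_string()),
    }
}

/// Extract field `name` from every row and cast it to i64.
///
/// Rows without the field produce null. If `safe` is true, a failed cast
/// also produces null; otherwise the first failed cast is returned as an error.
pub fn get_field_as_i64(
    column: &[Value],
    name: &str,
    safe: bool,
) -> Result<Vec<Option<i64>>, String> {
    column
        .iter()
        .map(|row| match field(row, name) {
            None => Ok(None),
            Some(v) => match cast_i64(v) {
                Ok(out) => Ok(out),
                Err(_) if safe => Ok(None),
                Err(e) => Err(e),
            },
        })
        .collect()
}
```

A real kernel would read the `metadata` and `value` buffers and build an Arrow array instead of a `Vec`, but the per-row shape (look up, then cast, then decide what to do on failure) would presumably be similar.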
Describe alternatives you've considered
Here is a proposal based on the variant_get function from Databricks and feedback from @Samyak2 and @adriangb on #7715 (comment).
I am sure the lifetimes need some more work.
/// Given a StructArray of Variant values (each stored as a struct with
/// `metadata`, `value`, and optionally `typed_value` fields),
/// returns the specified field or element
pub fn variant_get(variant_array: StructArray, options: GetOptions<'_>) -> Result<ArrayRef> {
    todo!()
}
/// Controls the action of the variant_get kernel
///
/// If `as_type` is specified, `cast_options` controls what to do if the
/// value cannot be cast to that type
pub struct GetOptions<'a> {
    /// What path to extract
    pub path: VariantPath<'a>,
    /// If `as_type` is `None`, the returned array will itself be a StructArray with Variant values
    ///
    /// If `as_type` is `Some(type)`, the field is returned as the specified type if possible. To specify returning
    /// a Variant, pass a Field with variant type in the metadata.
    pub as_type: Option<&'a Field>,
    /// Controls the casting behavior (e.g. error vs substituting null on cast error)
    pub cast_options: CastOptions<'a>,
}
/// Represents a qualified path to a potential subfield or element
pub struct VariantPath<'a>(Vec<VariantPathElement<'a>>);
/// Element of a path
pub enum VariantPathElement<'a> {
    /// Access field with name `name`
    Field {
        name: Cow<'a, str>,
    },
    /// Access the list element at `offset`
    Index {
        offset: usize,
    },
}
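As a rough illustration of how a path made of `Field` and `Index` elements could be applied, here is a sketch that walks a path over a toy value. The `Value` enum and `get_path` are hypothetical stand-ins, not part of the proposal or of the real Variant API, and the sketch assumes `offset` is zero-based, which the proposal does not spell out.

```rust
use std::borrow::Cow;

/// Toy stand-in for a decoded Variant value (hypothetical).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    Null,
    Int(i64),
    Str(String),
    List(Vec<Value>),
    Object(Vec<(String, Value)>),
}

/// Same shape as the path element in the proposal above.
#[derive(Debug, Clone, PartialEq)]
pub enum VariantPathElement<'a> {
    Field { name: Cow<'a, str> },
    Index { offset: usize },
}

/// Walk `path` starting at `root`.
///
/// Returns `None` as soon as a step does not apply: a missing field,
/// an out-of-range offset (assumed zero-based here), or a step whose
/// kind does not match the current value (e.g. `Field` on a list).
pub fn get_path<'v>(root: &'v Value, path: &[VariantPathElement<'_>]) -> Option<&'v Value> {
    let mut current = root;
    for element in path {
        current = match (element, current) {
            (VariantPathElement::Field { name }, Value::Object(fields)) => {
                let name: &str = name; // deref the Cow to a plain &str
                fields
                    .iter()
                    .find(|(k, _)| k.as_str() == name)
                    .map(|(_, v)| v)?
            }
            (VariantPathElement::Index { offset }, Value::List(items)) => items.get(*offset)?,
            _ => return None,
        };
    }
    Some(current)
}
```

With this model, a path of `Field { name: "x" }` followed by `Index { offset: 1 }` applied to an object whose `x` field is the list `[10, 20, 30]` yields `20`, and an empty path returns the root value itself. Returning `None` rather than an error for a non-matching step is just one possible choice; a real kernel might map it to null or to an error depending on the options.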
Prior Art
Here is a Databricks function that does this: https://docs.databricks.com/gcp/en/sql/language-manual/functions/variant_get
Additional context