[FEATURE] Improve performance of KLLSketch and DataType Analyzer #583

@zeotuan

Is your feature request related to a problem? Please describe.
Currently, the KLLSketch and DataType analyzers are implemented using UserDefinedAggregateFunction:

private[sql] class StatefulDataType extends UserDefinedAggregateFunction {

UserDefinedAggregateFunction is deprecated and should be replaced with Aggregator, which offers much better performance, as outlined in apache/spark#25024 (comment).

Describe the solution you'd like
Reimplement StatefulDataType and StatefulKLLSketch using Aggregator.

I am happy to help with this implementation.
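
For illustration, here is a minimal sketch of what an Aggregator-based data type counter could look like. It is not the actual Deequ implementation: the `TypeCounts` buffer, the `DataTypeAggregator` name, and the regular expressions are simplified assumptions made up for this example, and the real StatefulDataType state may be laid out differently. It assumes Spark 3.0 or later.

```scala
import org.apache.spark.sql.{Encoder, Encoders}
import org.apache.spark.sql.expressions.Aggregator

// Hypothetical, simplified buffer; the state used by StatefulDataType may differ.
case class TypeCounts(nulls: Long, fractional: Long, integral: Long, boolean: Long, string: Long)

// Typed aggregator: String input, TypeCounts buffer, TypeCounts output.
object DataTypeAggregator extends Aggregator[String, TypeCounts, TypeCounts] {
  // Simplified patterns, chosen for this example only.
  private val Fractional = "[-+]?\\d*\\.\\d+"
  private val Integral = "[-+]?\\d+"
  private val Bool = "true|false"

  // Empty buffer: nothing has been seen yet.
  def zero: TypeCounts = TypeCounts(0L, 0L, 0L, 0L, 0L)

  // Fold one input value into the buffer.
  def reduce(b: TypeCounts, value: String): TypeCounts =
    if (value == null) b.copy(nulls = b.nulls + 1)
    else if (value.matches(Fractional)) b.copy(fractional = b.fractional + 1)
    else if (value.matches(Integral)) b.copy(integral = b.integral + 1)
    else if (value.matches(Bool)) b.copy(boolean = b.boolean + 1)
    else b.copy(string = b.string + 1)

  // Combine two partial buffers, e.g. from different partitions.
  def merge(a: TypeCounts, b: TypeCounts): TypeCounts =
    TypeCounts(
      a.nulls + b.nulls,
      a.fractional + b.fractional,
      a.integral + b.integral,
      a.boolean + b.boolean,
      a.string + b.string)

  // The buffer is already the final result.
  def finish(reduction: TypeCounts): TypeCounts = reduction

  def bufferEncoder: Encoder[TypeCounts] = Encoders.product[TypeCounts]
  def outputEncoder: Encoder[TypeCounts] = Encoders.product[TypeCounts]
}
```

To use such an aggregator on an untyped DataFrame column, it can be wrapped with `org.apache.spark.sql.functions.udaf(DataTypeAggregator, Encoders.STRING)` and the resulting function applied inside `df.agg(...)`.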

Labels: enhancement
