How to Handle High-Cardinality Column Filtering in Pandas AI with Large Datasets? #1728
Unanswered · akhilesh-chander asked this question in Q&A · Replies: 0
I'm using Pandas AI to run natural language queries on large DataFrames (e.g., 100,000+ rows, 20+ columns). One significant challenge is filtering on high-cardinality columns, such as `product`, which can have thousands of unique values. Because of the dataset's size, it's impractical to pass the entire DataFrame, or every possible filter value, to the language model. Consequently, when I execute queries like:
"Show sales for Product X in January"
Pandas AI often:

- applies filters that don't match any records, resulting in empty outputs;
- selects incorrect or irrelevant filters due to limited context;
- fails silently, producing misleading or zero-result responses.
This behavior undermines the reliability of natural language querying in large-scale, real-world data scenarios.
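To make the failure concrete, here is a minimal plain-pandas reproduction (the column and values are illustrative, not from my real data): an exact-match filter on a value the model guessed silently returns an empty frame.

```python
import pandas as pd

# Illustrative data; the real column has thousands of unique values.
df = pd.DataFrame({
    "product": ["Widget X Pro", "Widget Y", "Gadget Z"],
    "sales": [120, 80, 200],
})

# A generated filter that guesses the literal "Product X" matches nothing:
result = df[df["product"] == "Product X"]
print(result.empty)  # True -- the query "works" but the output is empty
```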
Question:
How can I effectively manage filtering on high-cardinality columns in Pandas AI when working with large datasets?
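One possible direction, sketched below as a plain-pandas workaround (the `resolve_filter_value` helper is hypothetical, not a Pandas AI API), would be to resolve the user's term against the column's actual unique values before any query runs, so the generated filter only ever uses values that exist:

```python
import difflib
import pandas as pd

def resolve_filter_value(df, column, user_term, cutoff=0.6):
    """Map a user-supplied term to the closest real value in `column`.

    Matching is case-insensitive. Returns None when nothing is close
    enough, so the caller can ask for clarification instead of running
    a zero-result filter. (Hypothetical helper, not part of Pandas AI.)
    """
    candidates = df[column].astype(str).unique().tolist()
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(
        user_term.lower(), list(lowered), n=1, cutoff=cutoff
    )
    return lowered[matches[0]] if matches else None

df = pd.DataFrame({
    "product": ["Widget X Pro", "Widget Y", "Gadget Z"],
    "sales": [120, 80, 200],
})

print(resolve_filter_value(df, "product", "widget x pro"))  # "Widget X Pro"
print(resolve_filter_value(df, "product", "no such item"))  # None
```

Is something like this pre-resolution step a reasonable pattern, or does Pandas AI offer a built-in mechanism for constraining filter values on high-cardinality columns?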
Any insights or suggestions would be greatly appreciated.