Ideas for Column-Level Analysis of 5k+ Queries using SQLGlot #5525
Closed
ragunandhant
started this conversation in
General
Replies: 1 comment
-
I'd suggest starting a thread in our Slack which has a more active community. |
Beta Was this translation helpful? Give feedback.
0 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
@georgesittas
Hi everyone,
I'm embarking on a significant data analysis project and I'm looking for some guidance from the SQLGlot community. I need to perform a deep, programmatic analysis on a corpus of over 5,000 SQL queries to understand how our database columns are being used. My goal is to build a tool that can automatically extract and aggregate insights on column usage, and SQLGlot seems like the perfect library for the job.
I've outlined the specific information I'm trying to extract for each column. I'm hoping to get some advice on the best way to approach this with SQLGlot's AST (Abstract Syntax Tree).
For a given column (e.g., customers.created_at), I want to understand:
Aliases: What are the various aliases assigned to it in the SELECT clause (e.g., SELECT created_at AS registration_date, ...`)?
Transformations: What functions or operations are applied to it (e.g., SELECT DATE(created_at), ..., SELECT SUBSTR(user_id, 1, 5), ...`)?
This is a critical part of the analysis. I'd like to identify:
Filtering: How is the column used to filter data in WHERE clauses? I'm interested in the expressions and constants it's compared against (e.g., WHERE status = 'active', WHERE created_at > '2024-01-01'`).
Join Conditions: How is the column used in JOIN conditions? Which other tables and columns is it being joined to (e.g., ON users.id = orders.user_id`)?
By analyzing all queries, I want to build a comprehensive map of our database schema in terms of usage. This includes:
Which tables are most frequently joined together?
What are the most common join keys used to connect tables?
Beta Was this translation helpful? Give feedback.
All reactions