A request for Databricks cluster key optimization #4824

tshen-PayPay · 2025-03-04T01:14:23Z

Due to Databricks' liquid clustering constraints (see [Databricks documentation](https://docs.databricks.com/aws/en/delta/clustering)), the clustering key is only effective when no functions are applied to the processed_at column. For example, consider the following SQL filter, which wraps processed_at in two functions:

to_date(from_utc_timestamp(processed_at, 'Asia/Tokyo')) >= to_date('2025-02-01')
AND to_date(from_utc_timestamp(processed_at, 'Asia/Tokyo')) < date_add(MONTH, 1, to_date('2025-02-01'))

Since applying any function to processed_at prevents the liquid clustering key from being used effectively, we need a feature that optimizes such conditions when converting a query to Databricks SQL. The optimized version rewrites the condition as follows:

processed_at >= to_utc_timestamp(to_date('2025-02-01'), 'Asia/Tokyo')
AND processed_at < to_utc_timestamp(date_add(MONTH, 1, to_date('2025-02-01')), 'Asia/Tokyo')

The rewritten SQL moves the time zone conversion onto the constant bounds and leaves processed_at unwrapped, so the liquid clustering key can be used effectively.
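To check that the two forms select the same rows, here is a minimal Python sketch that models both filters and compares them at the month boundaries. It is an illustration only, not Databricks code: the function names are hypothetical, and Asia/Tokyo is modeled as a fixed UTC+9 offset on the assumption that no daylight saving time applies.

```python
from datetime import date, datetime, timedelta, timezone

# Assumption: Asia/Tokyo is modeled as a fixed UTC+9 offset (no DST).
TOKYO = timezone(timedelta(hours=9))


def wrapped_filter(processed_at_utc: datetime, start: date, end: date) -> bool:
    """Original form: convert the column to a Tokyo-local date, then compare."""
    local_date = processed_at_utc.astimezone(TOKYO).date()
    return start <= local_date < end


def rewritten_filter(processed_at_utc: datetime, start: date, end: date) -> bool:
    """Rewritten form: convert the bounds to UTC once, compare the raw column."""
    lower = datetime(start.year, start.month, start.day, tzinfo=TOKYO).astimezone(timezone.utc)
    upper = datetime(end.year, end.month, end.day, tzinfo=TOKYO).astimezone(timezone.utc)
    return lower <= processed_at_utc < upper


start, end = date(2025, 2, 1), date(2025, 3, 1)

# UTC timestamps on either side of the Tokyo-local month boundaries.
samples = [
    datetime(2025, 1, 31, 14, 59, 59, tzinfo=timezone.utc),  # Jan 31 23:59:59 in Tokyo
    datetime(2025, 1, 31, 15, 0, 0, tzinfo=timezone.utc),    # Feb 1 00:00:00 in Tokyo
    datetime(2025, 2, 28, 14, 59, 59, tzinfo=timezone.utc),  # Feb 28 23:59:59 in Tokyo
    datetime(2025, 2, 28, 15, 0, 0, tzinfo=timezone.utc),    # Mar 1 00:00:00 in Tokyo
]

results = [
    (wrapped_filter(ts, start, end), rewritten_filter(ts, start, end))
    for ts in samples
]
```

Both filters agree on every sample, including the instants exactly at the lower and upper bounds. The equivalence relies on the half-open range (`>=` on the lower bound, `<` on the upper bound); other comparison operators would need a different rewrite.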

georgesittas (Collaborator) · 2025-03-04T10:56:58Z

This is out of scope for SQLGlot, but you can implement a custom transformation if you want.
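As a rough illustration of what such a rewrite has to do, the following Python sketch applies it at the string level to the exact pattern from the request. It does not use SQLGlot's API; a real custom transformation would work on the parsed expression tree rather than on text. The names `rewrite_conjunct` and `rewrite_filter` are hypothetical, and the sketch assumes a flat list of conditions joined by AND, with only `>=` and `<` comparisons against date bounds.

```python
import re

# Matches: to_date(from_utc_timestamp(<col>, '<tz>')) <op> <rhs>
# Only >= and < are accepted, because only those keep the same meaning
# when the conversion is moved from the column onto the bound.
WRAPPED = re.compile(
    r"to_date\(from_utc_timestamp\((?P<col>\w+),\s*'(?P<tz>[^']+)'\)\)"
    r"\s*(?P<op>>=|<(?![=>]))\s*(?P<rhs>.+)",
    re.IGNORECASE | re.DOTALL,
)


def rewrite_conjunct(conjunct: str) -> str:
    """Rewrite one condition, or return it unchanged if it does not match."""
    text = conjunct.strip()
    match = WRAPPED.fullmatch(text)
    if match is None:
        return text
    rhs = match["rhs"].strip()
    return f"{match['col']} {match['op']} to_utc_timestamp({rhs}, '{match['tz']}')"


def rewrite_filter(sql_filter: str) -> str:
    """Split a flat AND chain naively and rewrite each condition."""
    conjuncts = re.split(r"\s+AND\s+", sql_filter.strip(), flags=re.IGNORECASE)
    return "\nAND ".join(rewrite_conjunct(c) for c in conjuncts)


original = (
    "to_date(from_utc_timestamp(processed_at, 'Asia/Tokyo')) >= to_date('2025-02-01')\n"
    "AND to_date(from_utc_timestamp(processed_at, 'Asia/Tokyo')) "
    "< date_add(MONTH, 1, to_date('2025-02-01'))"
)
rewritten = rewrite_filter(original)
print(rewritten)
```

Splitting on AND as plain text breaks on BETWEEN clauses, nested boolean logic, and string literals that contain the word, which is one reason an AST-based transformation is the more robust route.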
