Commit 06d5504

Update sqlglot to support 27.0.0 or later (#4227)
## Changes

This PR updates the `sqlglot` requirement to allow the 27.0.0 release. This release includes support for placeholders in SQL, a relatively new syntax feature supported in DBR 15.2 (or Apache Spark 4.0). SQL queries can be of the form:

```sql
SELECT 1 AS col1, 2 AS col2, 3 AS col3
FROM {sdf_system_columns}
LIMIT 5
```

…where `sdf_system_columns` is passed as a keyword argument, for example:

```python
# `spark` is the SparkSession that the Databricks runtime provides.
sdf_system_columns = spark.read.table("system.information_schema.columns")
sdf_example = spark.sql(
    "SELECT 1 AS col1, 2 AS col2, 3 AS col3 FROM {sdf_system_columns} LIMIT 5",
    sdf_system_columns=sdf_system_columns,
)
```

Older versions of sqlglot failed on this with a parsing error. The new version parses the query, but returns a `Placeholder` AST node as the name of the table instead of a string. Given the complexity of handling this properly, for now I've chosen to preserve the old behaviour: the linter will mark the query as unsupported. (See [es-1285042.py](https://github.com/databrickslabs/ucx/blob/e48281a67fbd9f9b80688a8fd08c0e6e19dab5c6/tests/unit/source_code/samples/functional/es-1285042.py) for an example of this.)

### Linked issues

Closes #4203

### Tests

- existing integration tests
- existing (+ 1 updated) unit tests

1 parent e48281a commit 06d5504

2 files changed: +10 -3 lines changed


pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -49,7 +49,7 @@ dependencies = ["databricks-sdk>=0.58.0,<0.59.0",
                 "databricks-labs-lsql>=0.16.0,<0.17.0",
                 "databricks-labs-blueprint>=0.11.0,<0.12.0",
                 "PyYAML>=6.0.0,<6.1.0",
-                "sqlglot>=26.7.0,<26.8.0",
+                "sqlglot>=26.7.0,<27.1.0",
                 "astroid>=3.3.0,<3.4.0"]

 [project.optional-dependencies]
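
To illustrate what the widened constraint admits, here is a minimal sketch that checks a few versions against the old and new ranges using plain tuple comparison. The helpers `parse_version` and `in_range` are hypothetical names introduced for illustration, and they only handle simple `X.Y.Z` strings; real dependency resolvers apply the full version-specifier rules.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release handling)."""
    return tuple(int(part) for part in text.split("."))


def in_range(version: str, lower: str, upper: str) -> bool:
    """Return True when lower <= version < upper."""
    return parse_version(lower) <= parse_version(version) < parse_version(upper)


# Bounds copied from the pyproject.toml change above.
OLD_RANGE = ("26.7.0", "26.8.0")
NEW_RANGE = ("26.7.0", "27.1.0")

for candidate in ("26.7.0", "27.0.0", "27.0.1", "27.1.0"):
    # Print whether each candidate satisfies the old and the new range.
    print(candidate, in_range(candidate, *OLD_RANGE), in_range(candidate, *NEW_RANGE))
```

Under these assumptions, the new upper bound admits 27.0.0 and its patch releases, as well as every 26.x release from 26.7.0 onwards, while still excluding 27.1.0.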

src/databricks/labs/ucx/source_code/sql/sql_parser.py

Lines changed: 9 additions & 2 deletions
@@ -3,8 +3,8 @@
 from typing import TypeVar

 from sqlglot import parse
-from sqlglot.errors import SqlglotError
-from sqlglot.expressions import Create, Delete, Drop, Expression, Select, Table, Update, Use
+from sqlglot.errors import SqlglotError, UnsupportedError
+from sqlglot.expressions import Create, Delete, Drop, Expression, Select, Table, Update, Use, Identifier

 from databricks.labs.ucx.source_code.base import UsedTable, CurrentSessionState

@@ -54,6 +54,13 @@ def _collect_table_info(
         if not src_schema:
             logger.warning(f"Could not determine schema for table {table.name}")
             return None
+        # Sqlglot handles parameter markers by returning an Identifier as the name instead of a string.
+        # For example: {foo} -> Identifier(this=foo)
+        if isinstance(table.name, Identifier):
+            # TODO: Support these properly, for example by inferring the placeholder value from the outside context.
+            msg = f"Table placeholder detected, not yet supported: {{{table.name}}}"
+            logger.debug(msg)
+            raise UnsupportedError(msg)
         return UsedTable(
             catalog_name=catalog_name,
             schema_name=src_schema,
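
The guard added in this hunk can be sketched in isolation as follows. This is only an illustration under stated assumptions: `UnsupportedError`, `Identifier`, and `Table` below are simplified stand-ins defined locally, not sqlglot's real classes, and `table_name_or_raise` is a hypothetical helper name. The sketch also shows how the triple braces in the f-string behave: `{{` and `}}` emit literal braces around the interpolated name.

```python
from dataclasses import dataclass


class UnsupportedError(Exception):
    """Local stand-in for sqlglot.errors.UnsupportedError (illustration only)."""


@dataclass
class Identifier:
    """Local stand-in for an AST node that wraps a placeholder name."""

    this: str

    def __str__(self) -> str:
        return self.this


@dataclass
class Table:
    """Local stand-in for a table node whose name is a str or an Identifier."""

    name: "str | Identifier"


def table_name_or_raise(table: Table) -> str:
    """Return the plain table name, or raise if the name is a placeholder node."""
    if isinstance(table.name, Identifier):
        # '{{' and '}}' are literal braces; '{table.name}' is interpolated.
        msg = f"Table placeholder detected, not yet supported: {{{table.name}}}"
        raise UnsupportedError(msg)
    return table.name


print(table_name_or_raise(Table("orders")))
try:
    table_name_or_raise(Table(Identifier("foo")))
except UnsupportedError as err:
    print(err)
```

Raising instead of returning a partial result keeps the caller's existing "unsupported query" path in charge, which matches the commit's stated aim of preserving the old behaviour.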
