-
Notifications
You must be signed in to change notification settings - Fork 264
Description
I have a sql view definition that uses a CTE. In case a table alias is used as the first table in the final from clause, the column lineage has the table alias in stead of the actual table name. In case no table alias is used, then the lineage is correct.
WITH
table1_cte
AS (
SELECT
tab1_col1
FROM
table1
)
SELECT
t1_cte
.tab1_col1
AS t1_cte_col1
,
table2_alias
.col1
AS tab2_col1
FROM
(
(
table2
as table2_alias
JOIN table1_cte
AS t1_cte
ON (
table2_alias
.tab2_col1
= t1_cte
.tab1_col1
)
)
)
Python code to reproduce:
from sqllineage.runner import LineageRunner
sql statement that produces correct lineage
sql_lineage_ok = """
create view test_view as
WITH
table1_cte
AS (
SELECT
tab1_col1
FROM
table1
)
SELECT
t1_cte
.tab1_col1
AS t1_cte_col1
,
table2
.col1
AS tab2_col1
FROM
(
(
table2
JOIN table1_cte
AS t1_cte
ON (
table2
.tab2_col1
= t1_cte
.tab1_col1
)
)
)"""
sql statement that produces incorrect lineage, the tabel2_alias should not be in the lineage but table2 should be
sql_lineage_not_ok = """
create view test_view as
WITH
table1_cte
AS (
SELECT
tab1_col1
FROM
table1
)
SELECT
t1_cte
.tab1_col1
AS t1_cte_col1
,
table2_alias
.col1
AS tab2_col1
FROM
(
(
table2
as table2_alias
JOIN table1_cte
AS t1_cte
ON (
table2_alias
.tab2_col1
= t1_cte
.tab1_col1
)
)
)"""
result = LineageRunner(sql=sql_lineage_ok, verbose=True, silent_mode=True, dialect='mysql')
result.print_column_lineage()
output
#.test_view.t1_cte_col1 <- table1_cte.tab1_col1 <- .table1.tab1_col1
#.test_view.tab2_col1 <- .table2.col1
lst_col_lineage_ok = result.get_column_lineage()
print (lst_col_lineage_ok)
result = LineageRunner(sql=sql_lineage_not_ok, verbose=True, silent_mode=True, dialect='mysql')
result.print_column_lineage()
output
#.test_view.t1_cte_col1 <- table1_cte.tab1_col1 <- .table1.tab1_col1
#.test_view.tab2_col1 <- .table2_alias.col1
lst_col_lineage_not_ok = result.get_column_lineage()
print (lst_col_lineage_not_ok)
Expected behavior
In case an alias is used, the column lineage should still return the actual table name and not the alias
Python version (available via python --version
)
Python 3.12.10
SQLLineage version (available via sqllineage --version
):
sqllineage 1.5.4