Skip to content

column lineage has table alias in stead of the table name when using a CTE #713

@lucdelanglez

Description

@lucdelanglez

I have a sql view definition that uses a CTE. In case a table alias is used as the first table in the final from clause, the column lineage has the table alias in stead of the actual table name. In case no table alias is used, then the lineage is correct.

WITH
table1_cte AS (
SELECT
tab1_col1
FROM
table1
)
SELECT
t1_cte.tab1_col1 AS t1_cte_col1,
table2_alias.col1 AS tab2_col1
FROM
(
(
table2 as table2_alias
JOIN table1_cte AS t1_cte ON (
table2_alias.tab2_col1 = t1_cte.tab1_col1
)
)
)

Python code to reproduce:
from sqllineage.runner import LineageRunner

sql statement that produces correct lineage

sql_lineage_ok = """
create view test_view as
WITH
table1_cte AS (
SELECT
tab1_col1
FROM
table1
)
SELECT
t1_cte.tab1_col1 AS t1_cte_col1,
table2.col1 AS tab2_col1
FROM
(
(
table2
JOIN table1_cte AS t1_cte ON (
table2.tab2_col1 = t1_cte.tab1_col1
)
)
)"""

sql statement that produces incorrect lineage, the tabel2_alias should not be in the lineage but table2 should be

sql_lineage_not_ok = """
create view test_view as
WITH
table1_cte AS (
SELECT
tab1_col1
FROM
table1
)
SELECT
t1_cte.tab1_col1 AS t1_cte_col1,
table2_alias.col1 AS tab2_col1
FROM
(
(
table2 as table2_alias
JOIN table1_cte AS t1_cte ON (
table2_alias.tab2_col1 = t1_cte.tab1_col1
)
)
)"""

result = LineageRunner(sql=sql_lineage_ok, verbose=True, silent_mode=True, dialect='mysql')

result.print_column_lineage()

output

#.test_view.t1_cte_col1 <- table1_cte.tab1_col1 <- .table1.tab1_col1
#.test_view.tab2_col1 <- .table2.col1
lst_col_lineage_ok = result.get_column_lineage()
print (lst_col_lineage_ok)

result = LineageRunner(sql=sql_lineage_not_ok, verbose=True, silent_mode=True, dialect='mysql')
result.print_column_lineage()

output

#.test_view.t1_cte_col1 <- table1_cte.tab1_col1 <- .table1.tab1_col1
#.test_view.tab2_col1 <- .table2_alias.col1
lst_col_lineage_not_ok = result.get_column_lineage()
print (lst_col_lineage_not_ok)

Expected behavior
In case an alias is used, the column lineage should still return the actual table name and not the alias

Python version (available via python --version)
Python 3.12.10

SQLLineage version (available via sqllineage --version):
sqllineage 1.5.4

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions