Skip to content

Incorrect Lineage Detection for Nested UNION in SQL #696

@Bureaux-Tao

Description

@Bureaux-Tao

Description:

I have encountered an issue with the lineage detection in the following SQL query:

INSERT INTO DB.A
(        
     aa
    ,bb
    ,cc
)
SELECT
        CAST('yyyy-dd-mm' AS VARCHAR(10))
       ,CURRENT_TIMESTAMP()
       ,CAST(T.cc AS VARCHAR(12))
FROM DB.B T
UNION ALL
(
SELECT DISTINCT
        CAST('yyyy-dd-mm' AS VARCHAR(10))
       ,CURRENT_TIMESTAMP()
       ,CAST(T.cc AS VARCHAR(12))
FROM DB.B T
LEFT JOIN DB.C T2
ON T.cc = T2.cc
INNER JOIN DB.D T3
ON T2.dd = T3.dd
UNION
SELECT CAST('${var:vdate}' AS VARCHAR(10))
       ,CURRENT_TIMESTAMP()
       ,CAST(T.cc AS VARCHAR(12))
FROM DB.B T
INNER JOIN DB.D T3
ON T.cc = T3.ee
);

Expected Lineage:

Theoretically, the lineage detection for this query should result in:

Source Tables:
    db.b
    db.c
    db.d
Target Tables:
    db.a

Actual Lineage:

However, the lineage detection output by the program is:

Source Tables:
    db.b
Target Tables:
    db.a

Observation:

The issue seems to be related to the nested UNION within the UNION ALL. If I remove the inner UNION and its subsequent code, the lineage is correctly identified.

Request:

Could you please investigate whether sqllineage needs to be optimized to handle such nested UNION structures correctly? Thank you.

Metadata

Metadata

Assignees

No one assigned

    Labels

    bugSomething isn't working

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions