Display of lag with with_column shows both an autogenerated name and the provided name #1234

@ntjohnson1

Describe the bug
with_column typically adds a single new column with the provided name. However, when the expression is lag, we get two new columns: one with the provided name and one with an automatically generated name.

To Reproduce

import datafusion as dfn
from datafusion import col, lit, functions as F
import pyarrow as pa


def datafusion_example() -> None:
    table = pa.table({"a": [1.0, 2.0, 3.0]})
    ctx = dfn.SessionContext()
    df = ctx.from_arrow(table)
    print(
        df.with_column(
            "previous_a",
            F.lag(
                col("a"),
                default_value=None,
            ),
        )
    )

    print(df.with_column("something_else", col("a") + lit(1.0)))


if __name__ == "__main__":
    datafusion_example()

Output

DataFrame()
+-----+-----------------------------------------------------------------------------------------------------------------+------------+
| a   | lag(ced8c2b3710c14382bd5eb58d49ffbd53.a,Int64(1),NULL) ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING | previous_a |
+-----+-----------------------------------------------------------------------------------------------------------------+------------+
| 1.0 |                                                                                                                 |            |
| 2.0 | 1.0                                                                                                             | 1.0        |
| 3.0 | 2.0                                                                                                             | 2.0        |
+-----+-----------------------------------------------------------------------------------------------------------------+------------+
DataFrame()
+-----+----------------+
| a   | something_else |
+-----+----------------+
| 1.0 | 2.0            |
| 2.0 | 3.0            |
| 3.0 | 4.0            |
+-----+----------------+

Expected behavior

DataFrame()
+-----+------------+
| a   | previous_a |
+-----+------------+
| 1.0 |            |
| 2.0 | 1.0        |
| 3.0 | 2.0        |
+-----+------------+
DataFrame()
+-----+----------------+
| a   | something_else |
+-----+----------------+
| 1.0 | 2.0            |
| 2.0 | 3.0            |
| 3.0 | 4.0            |
+-----+----------------+
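
The expectation above boils down to "with_column adds exactly one column, and it carries the provided name". A minimal sketch of that check in plain Python follows. The helper name `unexpected_columns` is hypothetical and not part of datafusion; it only compares lists of column names.

```python
def unexpected_columns(before, after, new_name):
    """Return columns in `after` that are neither original nor the requested one."""
    allowed = set(before) | {new_name}
    return [name for name in after if name not in allowed]


# Column names copied from the output reported above.
generated = (
    "lag(ced8c2b3710c14382bd5eb58d49ffbd53.a,Int64(1),NULL) "
    "ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING"
)

# Observed behavior: the autogenerated column is flagged as unexpected.
print(unexpected_columns(["a"], ["a", generated, "previous_a"], "previous_a"))

# Expected behavior: nothing besides "a" and "previous_a".
print(unexpected_columns(["a"], ["a", "previous_a"], "previous_a"))
```

To run it against the reproduction, the name lists could be taken from the DataFrame schemas before and after the call, assuming `df.schema().names` is available in the installed datafusion version.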

Labels: bug (Something isn't working)
