Incorrect NaN Comparison #1233

@ntjohnson1

Describe the bug
Under IEEE 754 comparison semantics, NaN < <float> and NaN > <float> should both yield false.
However, DataFusion does not behave this way: in the example below, NaN > 1.0 evaluates to true.
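
As a reference point for the expectation above, here is a minimal sketch in plain Python (standard library only) showing that ordinary IEEE 754 comparisons against NaN all evaluate to false. The helper name `ieee_compare` is made up for illustration and is not part of any library.

```python
import math


def ieee_compare(a: float, b: float) -> dict:
    """Return the results of the ordinary IEEE 754 comparisons of a and b."""
    return {"less": a < b, "greater": a > b, "equal": a == b}


# Every ordered comparison involving NaN is false, including equality.
result = ieee_compare(math.nan, 1.0)
print(result)
```

This matches what `pc.less` and `pc.greater` return in the PyArrow example below.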

To Reproduce
I reproduced this on versions 48 and 49.

import numpy as np
import pyarrow as pa
import pyarrow.compute as pc
import datafusion as dfn
from datafusion import col


def py_arrow_example() -> None:
    nan = pa.array([np.nan], type=pa.float64())
    not_nan = pa.array([1.0], type=pa.float64())
    print(f"Less: {pc.less(nan, not_nan)}\nGreater: {pc.greater(nan, not_nan)}")


def datafusion_example() -> None:
    table = pa.table({"a": [np.nan], "b": [1.0]})
    ctx = dfn.SessionContext()
    df = ctx.from_arrow(table)
    result = df.select(
        (col("a") < col("b")).alias("less"),
        (col("a") > col("b")).alias("greater"),
    )
    print(result)


if __name__ == "__main__":
    py_arrow_example()
    datafusion_example()

Output

Less: [
  false
]
Greater: [
  false
]
DataFrame()
+-------+---------+
| less  | greater |
+-------+---------+
| false | true    |
+-------+---------+
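
One possible reading of this output, which is an assumption and not something confirmed in this issue, is that the comparison follows the IEEE 754 totalOrder predicate, under which a positive NaN sorts above every other value. As far as I know, the Rust Arrow comparison kernels document this ordering for floats. The sketch below emulates totalOrder in plain Python; `total_order_key`, `total_less`, and `total_greater` are hypothetical helpers written for illustration, not DataFusion or Arrow APIs.

```python
import struct


def total_order_key(x: float) -> int:
    """Map a float64 to an integer whose ordering matches IEEE 754 totalOrder."""
    # Reinterpret the 8 bytes of the double as a signed 64-bit integer.
    (bits,) = struct.unpack("<q", struct.pack("<d", x))
    # For negative values, flip the lower 63 bits so larger magnitudes sort lower.
    return bits ^ ((bits >> 63) & 0x7FFFFFFFFFFFFFFF)


def total_less(a: float, b: float) -> bool:
    return total_order_key(a) < total_order_key(b)


def total_greater(a: float, b: float) -> bool:
    return total_order_key(a) > total_order_key(b)


# Positive quiet NaN, built from an explicit bit pattern so the sign is known.
(nan,) = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000))

less = total_less(nan, 1.0)
greater = total_greater(nan, 1.0)
print(f"less={less}, greater={greater}")
```

For a positive NaN this gives less=False and greater=True, the same pattern as the DataFusion table above. If that is the cause, the behavior may be intentional, in which case documentation would be the more relevant fix.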

Expected behavior
Both comparisons should return false, or the documentation should clearly describe how NaN is handled in comparisons.

Labels: bug (Something isn't working)
