sort missing data placement #2267

@chris-b1

Currently, missing is treated as the largest value in a sort. pandas has a na_position kwarg that lets you specify how missing data should be ordered, by default placing it last, regardless of ascending or descending sort.

Whether or not that default is correct (I like it, but that could just be familiarity), I do think an option on sort/order to control where missing values are placed would be nice.

DataFrames 0.21

julia> df = DataFrame(:a => [1.0, 9.0, 2.0, missing])
4×1 DataFrame
│ Row │ a        │
│     │ Float64? │
├─────┼──────────┤
│ 1   │ 1.0      │
│ 2   │ 9.0      │
│ 3   │ 2.0      │
│ 4   │ missing  │

julia> sort(df, :a)
4×1 DataFrame
│ Row │ a        │
│     │ Float64? │
├─────┼──────────┤
│ 1   │ 1.0      │
│ 2   │ 2.0      │
│ 3   │ 9.0      │
│ 4   │ missing  │

julia> sort(df, :a, rev=true)
4×1 DataFrame
│ Row │ a        │
│     │ Float64? │
├─────┼──────────┤
│ 1   │ missing  │
│ 2   │ 9.0      │
│ 3   │ 2.0      │
│ 4   │ 1.0      │

pandas

df = pd.DataFrame({'a': [1.0, 9.0, 2.0, np.nan]})

df
Out[42]: 
     a
0  1.0
1  9.0
2  2.0
3  NaN

df.sort_values('a')
Out[43]: 
     a
0  1.0
2  2.0
1  9.0
3  NaN

df.sort_values('a', ascending=False)
Out[44]: 
     a
1  9.0
2  2.0
0  1.0
3  NaN

df.sort_values('a', ascending=False, na_position='first')
Out[45]: 
     a
3  NaN
1  9.0
2  2.0
0  1.0
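
To make the requested semantics concrete, here is a minimal pure-Python sketch of "missing placement independent of sort direction". It is not how pandas or DataFrames implements sorting. The names `is_missing`, `sort_with_missing`, and the `missing_position` parameter are hypothetical and chosen only for illustration, and treating `None` and NaN as missing is an assumption of the sketch.

```python
import math


def is_missing(x):
    """Assumption for this sketch: None and float NaN both count as missing."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def sort_with_missing(values, reverse=False, missing_position="last"):
    """Sort values; missing entries go first or last regardless of `reverse`."""
    if missing_position not in ("first", "last"):
        raise ValueError("missing_position must be 'first' or 'last'")
    # Sort only the non-missing values, in the requested direction.
    present = sorted((v for v in values if not is_missing(v)), reverse=reverse)
    # Missing values keep their original relative order.
    missing = [v for v in values if is_missing(v)]
    return missing + present if missing_position == "first" else present + missing


data = [1.0, 9.0, 2.0, None]
print(sort_with_missing(data))                # [1.0, 2.0, 9.0, None]
print(sort_with_missing(data, reverse=True))  # [9.0, 2.0, 1.0, None]
print(sort_with_missing(data, reverse=True, missing_position="first"))
# [None, 9.0, 2.0, 1.0]
```

The key design point is that the missing values are separated out before sorting, so reversing the order of the present values never moves them. Treating missing as the largest value, by contrast, ties its position to the sort direction.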
