Closed
Description
Currently, missing is treated as the largest value in a sort. pandas has a na_position kwarg that lets you specify how missing data should be ordered, by default placing it last, regardless of ascending or descending sort.
Whether or not that default is correct (I like it, but that could just be familiarity), I do think an option on sort/order to control where missing values are placed would be nice.
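To make the requested semantics concrete, here is a minimal pure-Python sketch of a sort whose missing-value placement is independent of sort direction. The function name `sort_with_na_position` and the helper `is_missing` are hypothetical and only for illustration; this is not how pandas or DataFrames.jl implement sorting, and `None`/`NaN` stand in for `missing`.

```python
import math


def is_missing(x):
    # Assumption for this sketch: None and float NaN both count as missing.
    return x is None or (isinstance(x, float) and math.isnan(x))


def sort_with_na_position(values, ascending=True, na_position="last"):
    """Sort values, placing missing entries first or last regardless of direction."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    # Sort only the non-missing values in the requested direction.
    present = sorted((v for v in values if not is_missing(v)), reverse=not ascending)
    # Missing values keep their original relative order.
    missing = [v for v in values if is_missing(v)]
    return missing + present if na_position == "first" else present + missing


data = [1.0, 9.0, 2.0, None]
print(sort_with_na_position(data))                                         # ascending, missing last
print(sort_with_na_position(data, ascending=False))                        # descending, missing still last
print(sort_with_na_position(data, ascending=False, na_position="first"))   # descending, missing first
```

The point of the sketch is that direction and missing placement are two separate knobs, which is what the current behavior (missing sorts as the largest value, so it moves to the front under rev=true) does not allow.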
DataFrames 0.21
julia> df = DataFrame(:a => [1.0, 9.0, 2.0, missing])
4×1 DataFrame
│ Row │ a        │
│     │ Float64? │
├─────┼──────────┤
│ 1   │ 1.0      │
│ 2   │ 9.0      │
│ 3   │ 2.0      │
│ 4   │ missing  │
julia> sort(df, :a)
4×1 DataFrame
│ Row │ a        │
│     │ Float64? │
├─────┼──────────┤
│ 1   │ 1.0      │
│ 2   │ 2.0      │
│ 3   │ 9.0      │
│ 4   │ missing  │
julia> sort(df, :a, rev=true)
4×1 DataFrame
│ Row │ a        │
│     │ Float64? │
├─────┼──────────┤
│ 1   │ missing  │
│ 2   │ 9.0      │
│ 3   │ 2.0      │
│ 4   │ 1.0      │
pandas
df = pd.DataFrame({'a': [1.0, 9.0, 2.0, np.nan]})
df
Out[42]:
a
0 1.0
1 9.0
2 2.0
3 NaN
df.sort_values('a')
Out[43]:
a
0 1.0
2 2.0
1 9.0
3 NaN
df.sort_values('a', ascending=False)
Out[44]:
a
1 9.0
2 2.0
0 1.0
3 NaN
df.sort_values('a', ascending=False, na_position='first')
Out[45]:
a
3 NaN
1 9.0
2 2.0
0 1.0