Skip to content

add Base.length(::SkipMissing) for AbstractArray #42689

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

thazhemadam
Copy link
Member

Add Base.length definition for SkipMissing.

Resolves #42672.

Before this change:

julia> x = [missing, 1, missing];

julia> (lengthskipmissing)(x)
ERROR: MethodError: no method matching length(::Base.SkipMissing{Vector{Union{Missing, Int64}}})

With this change:

julia> x = [missing, 1, missing];

julia> (lengthskipmissing)(x)
1

@martinholters
Copy link
Member

Apart from the potentially unexpected runtime, this seems problematic as it means that calling length can actually change the length (think stateful iterators).

@thazhemadam
Copy link
Member Author

this seems problematic as it means that calling length can actually change the length (think stateful iterators).

@martinholters could you please elaborate? Calling length on the same object multiple times seems to return the same result for me and the original data structure (upon which length∘skipmissing was called) also seems to be unchanged.

@martinholters
Copy link
Member

julia> itr = Iterators.Stateful([missing, 1, missing])
Base.Iterators.Stateful{Vector{Union{Missing, Int64}}, Union{Nothing, Tuple{Union{Missing, Int64}, Int64}}}(Union{Missing, Int64}[missing, 1, missing], (missing, 2), 0)

julia> count(!ismissing, skipmissing(itr).x) # proposed definition of length
1

julia> count(!ismissing, skipmissing(itr).x)
0

@thazhemadam
Copy link
Member Author

Ah, I see. The only way I can think of would be to call count on a copy instead, but that's just even less performant. 😕

@bkamins
Copy link
Member

bkamins commented Jul 1, 2022

Maybe we could restrict the signature to skipmissing on AbstractArray. This method is really needed (otherwise there is no easy way to check the number of elements of passed object that is skipmissing).

@nalimilan
Copy link
Member

I agree. The operation is O(N) but that's also the case for strings so I don't see that as an argument not to implement it.

@thazhemadam thazhemadam force-pushed the master branch 2 times, most recently from 153cce8 to 2512934 Compare July 12, 2022 06:47
Define `length(::SkipMissing)` with `O(N)` complexity.
`length(::SkipMissing)`, as defined, can actually change the length
of some objects (like stateful iterators [1]).

Thus, restrict the method to work only on `AbstractArray`.

[1] `length(::SkipMissing)` changing the length of a `Stateful` iterator.
```
julia> itr = Iterators.Stateful([missing, 1, missing])
Base.Iterators.Stateful{Vector{Union{Missing, Int64}}, Union{Nothing, Tuple{Union{Missing, Int64}, Int64}}}(Union{Missing, Int64}[missing, 1, missing], (missing, 2), 0)

julia> count(!ismissing, skipmissing(itr).x) # proposed definition of length
1

julia> count(!ismissing, skipmissing(itr).x)
0
```
Copy link
Member

@bkamins bkamins left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@thazhemadam - when this PR gets merged could you also open a follow-up PR to Compat.jl to ensure backward compatibility? Thank you!

@thazhemadam
Copy link
Member Author

@thazhemadam - when this PR gets merged could you also open a follow-up PR to Compat.jl to ensure backward compatibility? Thank you!

Yes, I can do that. 🙂

@thazhemadam thazhemadam changed the title add Base.length(::SkipMissing) add Base.length(::SkipMissing) for AbstractArray Jul 12, 2022
Add test checking if a `MethodError` is thrown while trying to use
`length(::SkipMissing)` with a Stateful iterator.
@vtjnash
Copy link
Member

vtjnash commented Jul 20, 2022

Contractually, length should probably be O(1) though; and we hope to fix that for String at some point.

@bkamins
Copy link
Member

bkamins commented Jul 20, 2022

I think skipmissing should be an allowed exception. The reason is that, although in this case the operation is not O(1), in it very often used in statistics and the fact that length does not work on it is a major usability issue reported by many users.

@nalimilan
Copy link
Member

Contractually, length should probably be O(1) though; and we hope to fix that for String at some point.

Is that really possible? Note that we also define length(graphemes(::String)), which is unlikely to be made O(1).

Marking for triage to get a decision.

@nalimilan nalimilan added missing data Base.missing and related functionality statistics The Statistics stdlib module triage This should be discussed on a triage call and removed statistics The Statistics stdlib module labels Jul 28, 2022
@JeffBezanson
Copy link
Member

This feels weird to me but it's hard to point to a real problem it causes, so might as well?

@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Aug 18, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
missing data Base.missing and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add a method for length(x::Base.SkipMissing)
6 participants