-
Notifications
You must be signed in to change notification settings - Fork 19
Optionally handle non monotonic time-series #132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
55cfe97
to
cd9610e
Compare
d689858
to
b67ef67
Compare
b67ef67
to
909f17a
Compare
d02ca22
to
fc4ecf1
Compare
* New load_xdf parameter: handle_non_monotonic (default: False) - None: disable monotonicity checking - False: check and warn only if non-monotonic data are detected - True: attempt to sort non-monotonic data * Additional tests for monotonicity
fc4ecf1
to
d9f1874
Compare
@cbrnr This one is ready for review now. One thing I'm unsure about is whether this is just trying to fix a symptom of a more fundamental problem (e.g. LabRecorder, LSL or device drivers). But I have a lot of data with out-of-order timestamps which this PR seem to help with, and a possibly related issue has been raised here: sccn/labstreaminglayer#125 |
`handle_non_monontonic` now accepts four arguments: - None: disable monotonicity checking - "warn": check and warn only (default) - "trust_timeseries": sort time-stamps only - "trust_timestamps": sort time-stamps and reorder time-series data
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for this extensive PR! I only did a very rough pass over the changes (skipping many many details), but I wanted to get a few things out of the way before we continue the discussion (besides my inline comments).
Most importantly, do we all agree that this functionality should go directly into the main load_xdf
function? Given that this problem so far only occurs with ANT devices, it might very well be a problem on their side. If that's the case, I'd be hesitant to add such relatively complex code just to fix/work around their issue. In that case, maybe a separate tool/function which fixes a given XDF file might be a better option.
Also, and I'm not sure if I've missed it, but are we testing the new functionality directly with the load_xdf
function? It looks like the tests currently use different pieces of the underlying functions directly, but I think it would be important to test our main function.
Some minor suggestions:
- Can we simplify the parameter name a bit, maybe
handle_nonmonotonic
or something? Suggestions welcome! - What is the performance impact of sorting timestamps and/or samples? I imagine that this could be noticeable on large datasets, in which case we should at least document it.
@@ -2,6 +2,7 @@ | |||
### Added | |||
- Add new `case_sensitive` parameter to `match_streaminfos`, defaulting to `False` to maintain previous behavior; when `False`, stream properties are matched more leniently ([#134](https://github.com/xdf-modules/pyxdf/pull/134) by [Stefan Appelhoff](https://github.com/sappelhoff)) | |||
- Expose detected clock segments (used in synchronisation) as `stream["info"]["clock_segments"]` ([#131](https://github.com/xdf-modules/pyxdf/pull/131) by [Jamie Forth](https://github.com/jamieforth)) | |||
- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior with additional warning when non-monotonic data are detected; when `None` check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust-timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior with additional warning when non-monotonic data are detected; when `None` check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust-timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth)) | |
- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior but with an additional warning when non-monotonic data are detected; if `None`, the check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust_timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth)) |
# Validate handle_non_monotonic argument. | ||
handle_non_monotonic = HandleNonMonoState(handle_non_monotonic) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although this validates the input, the error message you get for an unsupported value is not very helpful (ValueError: 'whatever' is not a valid HandleNonMonoState
). I would handle (no pun intended) this with a custom more specific error message, which probably makes the use of an Enum
unnecessary.
@@ -542,6 +604,157 @@ def _scan_forward(f): | |||
return False | |||
|
|||
|
|||
# Named tuple to represent monotonicity information. | |||
Monotonicity = namedtuple("Monotonicity", ["result", "n", "dec_count", "eq_count"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this need to be public? Also, and especially if this is just private, does this need to be a NamedTuple
?
Attempt to handle out-of-order timestamps/samples.
New load_xdf parameter:
handle_non_monotonic
(default: "warn")Sometimes LabRecorder XDF files contain out-of-order ground-truth time-stamps (~3% in my case). The problem seems to originate in certain device-specific proprietary software generating the LSL streams – not LSL itself, LabRecorder or network infrastructure.
This PR helps process such files by clock-segment-wise sorting time-stamps (and optionally reordering samples) after synchronisation and before dejittering, thus avoiding highly segmented streams. When non-monotonic data are detected the user must determine whether to trust the original time-stamps or sample sequence order, or just ignore/disable the check if this is not important for them.
If this PR is just a workaround for one broken device it probably doesn't belong in
pyxdf
. An alternative could be to instead provide a new load parameter for the user to pass in a custom function to be applied before dejittering, similar toon_chunk
.I don't have a strong preference, but on the other hand this doesn't introduce any breaking changes, the default behaviour has a negligible impact on load time (e.g. <1 second difference loading a 3GB file on my laptop according to
timeit
) and it's generally a good thing to give users a heads up about potential problems with their data.