Optionally handle non monotonic time-series #132

jamieforth · 2025-01-26T12:49:38Z

Attempt to handle out-of-order timestamps/samples.

New load_xdf parameter: handle_non_monotonic (default: "warn")

None: disable monotonicity checking (previous behaviour)
"warn": check and warn only (otherwise previous behaviour)
"trust_timeseries": sort time-stamps only
"trust_timestamps": sort time-stamps and reorder time-series data accordingly

Sometimes LabRecorder XDF files contain out-of-order ground-truth time-stamps (~3% in my case). The problem seems to originate in certain device-specific proprietary software generating the LSL streams – not LSL itself, LabRecorder or network infrastructure.

This PR helps process such files by clock-segment-wise sorting time-stamps (and optionally reordering samples) after synchronisation and before dejittering, thus avoiding highly segmented streams. When non-monotonic data are detected the user must determine whether to trust the original time-stamps or sample sequence order, or just ignore/disable the check if this is not important for them.

If this PR is just a workaround for one broken device it probably doesn't belong in pyxdf. An alternative could be to instead provide a new load parameter for the user to pass in a custom function to be applied before dejittering, similar to on_chunk.

I don't have a strong preference, but on the other hand this doesn't introduce any breaking changes, the default behaviour has a negligible impact on load time (e.g. <1 second difference loading a 3GB file on my laptop according to timeit) and it's generally a good thing to give users a heads up about potential problems with their data.

* New load_xdf parameter: handle_non_monotonic (default: False) - None: disable monotonicity checking - False: check and warn only if non-monotonic data are detected - True: attempt to sort non-monotonic data * Additional tests for monotonicity

jamieforth · 2025-03-21T14:37:20Z

@cbrnr This one is ready for review now.

One thing I'm unsure about is whether this is just trying to fix a symptom of a more fundamental problem (e.g. LabRecorder, LSL or device drivers). But I have a lot of data with out-of-order timestamps which this PR seem to help with, and a possibly related issue has been raised here: sccn/labstreaminglayer#125

`handle_non_monontonic` now accepts four arguments: - None: disable monotonicity checking - "warn": check and warn only (default) - "trust_timeseries": sort time-stamps only - "trust_timestamps": sort time-stamps and reorder time-series data

cbrnr

Thanks for this extensive PR! I only did a very rough pass over the changes (skipping many many details), but I wanted to get a few things out of the way before we continue the discussion (besides my inline comments).

Most importantly, do we all agree that this functionality should go directly into the main load_xdf function? Given that this problem so far only occurs with ANT devices, it might very well be a problem on their side. If that's the case, I'd be hesitant to add such relatively complex code just to fix/work around their issue. In that case, maybe a separate tool/function which fixes a given XDF file might be a better option.

Also, and I'm not sure if I've missed it, but are we testing the new functionality directly with the load_xdf function? It looks like the tests currently use different pieces of the underlying functions directly, but I think it would be important to test our main function.

Some minor suggestions:

Can we simplify the parameter name a bit, maybe handle_nonmonotonic or something? Suggestions welcome!
What is the performance impact of sorting timestamps and/or samples? I imagine that this could be noticeable on large datasets, in which case we should at least document it.

cbrnr · 2025-05-05T09:21:04Z

CHANGELOG.md

@@ -2,6 +2,7 @@
 ### Added
 - Add new `case_sensitive` parameter to `match_streaminfos`, defaulting to `False` to maintain previous behavior; when `False`, stream properties are matched more leniently ([#134](https://github.com/xdf-modules/pyxdf/pull/134) by [Stefan Appelhoff](https://github.com/sappelhoff))
 - Expose detected clock segments (used in synchronisation) as `stream["info"]["clock_segments"]` ([#131](https://github.com/xdf-modules/pyxdf/pull/131) by [Jamie Forth](https://github.com/jamieforth))
+- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior with additional warning when non-monotonic data are detected; when `None` check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust-timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth))


Suggested change

- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior with additional warning when non-monotonic data are detected; when `None` check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust-timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth))

- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior but with an additional warning when non-monotonic data are detected; if `None`, the check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust_timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth))

cbrnr · 2025-05-05T09:24:46Z

src/pyxdf/pyxdf.py

+    # Validate handle_non_monotonic argument.
+    handle_non_monotonic = HandleNonMonoState(handle_non_monotonic)


Although this validates the input, the error message you get for an unsupported value is not very helpful (ValueError: 'whatever' is not a valid HandleNonMonoState). I would handle (no pun intended) this with a custom more specific error message, which probably makes the use of an Enum unnecessary.

cbrnr · 2025-05-05T09:26:08Z

src/pyxdf/pyxdf.py

@@ -542,6 +604,157 @@ def _scan_forward(f):
            return False


+# Named tuple to represent monotonicity information.
+Monotonicity = namedtuple("Monotonicity", ["result", "n", "dec_count", "eq_count"])


Does this need to be public? Also, and especially if this is just private, does this need to be a NamedTuple?

jamieforth force-pushed the pr/handle-non-monotonic branch 5 times, most recently from 55cfe97 to cd9610e Compare January 27, 2025 00:44

jamieforth marked this pull request as draft January 27, 2025 09:04

jamieforth force-pushed the pr/handle-non-monotonic branch 2 times, most recently from d689858 to b67ef67 Compare February 2, 2025 23:18

jamieforth mentioned this pull request Feb 3, 2025

Handle negative time intervals when dejittering #130

Merged

jamieforth force-pushed the pr/handle-non-monotonic branch from b67ef67 to 909f17a Compare February 3, 2025 14:17

jamieforth force-pushed the pr/handle-non-monotonic branch 2 times, most recently from d02ca22 to fc4ecf1 Compare February 14, 2025 08:36

Optionally handle non-monotonic data

d9f1874

* New load_xdf parameter: handle_non_monotonic (default: False) - None: disable monotonicity checking - False: check and warn only if non-monotonic data are detected - True: attempt to sort non-monotonic data * Additional tests for monotonicity

jamieforth force-pushed the pr/handle-non-monotonic branch from fc4ecf1 to d9f1874 Compare March 21, 2025 13:41

jamieforth marked this pull request as ready for review March 21, 2025 13:59

jamieforth mentioned this pull request Mar 27, 2025

jumbled timestamps sccn/labstreaminglayer#125

Open

jamieforth added 3 commits May 3, 2025 12:05

Minor comment cleanup

837a91a

Add changelog entry for xdf-modules#132

9e36b8c

cbrnr reviewed May 5, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optionally handle non monotonic time-series #132

Optionally handle non monotonic time-series #132

Uh oh!

jamieforth commented Jan 26, 2025 •

edited

Loading

Uh oh!

jamieforth commented Mar 21, 2025

Uh oh!

cbrnr left a comment

Uh oh!

cbrnr May 5, 2025

Uh oh!

cbrnr May 5, 2025

Uh oh!

cbrnr May 5, 2025

Uh oh!

Uh oh!

		# Validate handle_non_monotonic argument.
		handle_non_monotonic = HandleNonMonoState(handle_non_monotonic)

Optionally handle non monotonic time-series #132

Are you sure you want to change the base?

Optionally handle non monotonic time-series #132

Uh oh!

Conversation

jamieforth commented Jan 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jamieforth commented Mar 21, 2025

Uh oh!

cbrnr left a comment

Choose a reason for hiding this comment

Uh oh!

cbrnr May 5, 2025

Choose a reason for hiding this comment

Uh oh!

cbrnr May 5, 2025

Choose a reason for hiding this comment

Uh oh!

cbrnr May 5, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jamieforth commented Jan 26, 2025 •

edited

Loading