Skip to content

Optionally handle non monotonic time-series #132

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

jamieforth
Copy link
Contributor

@jamieforth jamieforth commented Jan 26, 2025

Attempt to handle out-of-order timestamps/samples.

New load_xdf parameter: handle_non_monotonic (default: "warn")

  • None: disable monotonicity checking (previous behaviour)
  • "warn": check and warn only (otherwise previous behaviour)
  • "trust_timeseries": sort time-stamps only
  • "trust_timestamps": sort time-stamps and reorder time-series data accordingly

Sometimes LabRecorder XDF files contain out-of-order ground-truth time-stamps (~3% in my case). The problem seems to originate in certain device-specific proprietary software generating the LSL streams – not LSL itself, LabRecorder or network infrastructure.

This PR helps process such files by clock-segment-wise sorting time-stamps (and optionally reordering samples) after synchronisation and before dejittering, thus avoiding highly segmented streams. When non-monotonic data are detected the user must determine whether to trust the original time-stamps or sample sequence order, or just ignore/disable the check if this is not important for them.

If this PR is just a workaround for one broken device it probably doesn't belong in pyxdf. An alternative could be to instead provide a new load parameter for the user to pass in a custom function to be applied before dejittering, similar to on_chunk.

I don't have a strong preference, but on the other hand this doesn't introduce any breaking changes, the default behaviour has a negligible impact on load time (e.g. <1 second difference loading a 3GB file on my laptop according to timeit) and it's generally a good thing to give users a heads up about potential problems with their data.

@jamieforth jamieforth force-pushed the pr/handle-non-monotonic branch 5 times, most recently from 55cfe97 to cd9610e Compare January 27, 2025 00:44
@jamieforth jamieforth marked this pull request as draft January 27, 2025 09:04
@jamieforth jamieforth force-pushed the pr/handle-non-monotonic branch 2 times, most recently from d689858 to b67ef67 Compare February 2, 2025 23:18
@jamieforth jamieforth force-pushed the pr/handle-non-monotonic branch from b67ef67 to 909f17a Compare February 3, 2025 14:17
@jamieforth jamieforth force-pushed the pr/handle-non-monotonic branch 2 times, most recently from d02ca22 to fc4ecf1 Compare February 14, 2025 08:36
* New load_xdf parameter: handle_non_monotonic (default: False)
  - None: disable monotonicity checking
  - False: check and warn only if non-monotonic data are detected
  - True: attempt to sort non-monotonic data

* Additional tests for monotonicity
@jamieforth jamieforth force-pushed the pr/handle-non-monotonic branch from fc4ecf1 to d9f1874 Compare March 21, 2025 13:41
@jamieforth jamieforth marked this pull request as ready for review March 21, 2025 13:59
@jamieforth
Copy link
Contributor Author

@cbrnr This one is ready for review now.

One thing I'm unsure about is whether this is just trying to fix a symptom of a more fundamental problem (e.g. LabRecorder, LSL or device drivers). But I have a lot of data with out-of-order timestamps which this PR seem to help with, and a possibly related issue has been raised here: sccn/labstreaminglayer#125

jamieforth added 3 commits May 3, 2025 12:05
`handle_non_monontonic` now accepts four arguments:
  - None: disable monotonicity checking
  - "warn": check and warn only (default)
  - "trust_timeseries": sort time-stamps only
  - "trust_timestamps": sort time-stamps and reorder time-series data
Copy link
Contributor

@cbrnr cbrnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this extensive PR! I only did a very rough pass over the changes (skipping many many details), but I wanted to get a few things out of the way before we continue the discussion (besides my inline comments).

Most importantly, do we all agree that this functionality should go directly into the main load_xdf function? Given that this problem so far only occurs with ANT devices, it might very well be a problem on their side. If that's the case, I'd be hesitant to add such relatively complex code just to fix/work around their issue. In that case, maybe a separate tool/function which fixes a given XDF file might be a better option.

Also, and I'm not sure if I've missed it, but are we testing the new functionality directly with the load_xdf function? It looks like the tests currently use different pieces of the underlying functions directly, but I think it would be important to test our main function.

Some minor suggestions:

  • Can we simplify the parameter name a bit, maybe handle_nonmonotonic or something? Suggestions welcome!
  • What is the performance impact of sorting timestamps and/or samples? I imagine that this could be noticeable on large datasets, in which case we should at least document it.

@@ -2,6 +2,7 @@
### Added
- Add new `case_sensitive` parameter to `match_streaminfos`, defaulting to `False` to maintain previous behavior; when `False`, stream properties are matched more leniently ([#134](https://github.com/xdf-modules/pyxdf/pull/134) by [Stefan Appelhoff](https://github.com/sappelhoff))
- Expose detected clock segments (used in synchronisation) as `stream["info"]["clock_segments"]` ([#131](https://github.com/xdf-modules/pyxdf/pull/131) by [Jamie Forth](https://github.com/jamieforth))
- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior with additional warning when non-monotonic data are detected; when `None` check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust-timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior with additional warning when non-monotonic data are detected; when `None` check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust-timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth))
- Add new `handle_non_monotonic` parameter to `load_xdf`, defaulting to `"warn"` to maintain previous behavior but with an additional warning when non-monotonic data are detected; if `None`, the check is bypassed entirely. Non-monotonic data are handled by sorting time-stamps (`"trust_timeseries"`) or by re-ordering samples according to sorted time-stamps (`"trust_timestamps"`) ([#132](https://github.com/xdf-modules/pyxdf/pull/132) by [Jamie Forth](https://github.com/jamieforth))

Comment on lines +223 to +224
# Validate handle_non_monotonic argument.
handle_non_monotonic = HandleNonMonoState(handle_non_monotonic)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although this validates the input, the error message you get for an unsupported value is not very helpful (ValueError: 'whatever' is not a valid HandleNonMonoState). I would handle (no pun intended) this with a custom more specific error message, which probably makes the use of an Enum unnecessary.

@@ -542,6 +604,157 @@ def _scan_forward(f):
return False


# Named tuple to represent monotonicity information.
Monotonicity = namedtuple("Monotonicity", ["result", "n", "dec_count", "eq_count"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this need to be public? Also, and especially if this is just private, does this need to be a NamedTuple?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants