Skip to content

Fixed #1077 Add extra check for window size #1078

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 39 commits into from
Apr 6, 2025

Conversation

NimaSarajpoor
Copy link
Collaborator

See #1077

@NimaSarajpoor NimaSarajpoor requested a review from seanlaw as a code owner March 27, 2025 18:48
Copy link

codecov bot commented Mar 27, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.32%. Comparing base (9504301) to head (949db7e).
Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #1078   +/-   ##
=======================================
  Coverage   97.31%   97.32%           
=======================================
  Files          93       93           
  Lines       15239    15266   +27     
=======================================
+ Hits        14830    14857   +27     
  Misses        409      409           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Contributor

@seanlaw seanlaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please provide a few test cases? It's hard to know what the parameters are related to

Copy link
Collaborator Author

@NimaSarajpoor NimaSarajpoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanlaw
I've added two tests. I left one comment for your consideration.

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Mar 30, 2025

@seanlaw
IIRC, you mentioned before that we can simply check the condition for all subsequences. After going through the long explanation of my logic, I am thinking of changing the approach.

l = n - m + 1
# leftmost neighbor is at index `0`
# rightmost neighbor is at index `l-1`
for i in range(l):
     max_gap = max(i - 0, (l-1)-i)
     if max_gap <= excl_zone:
         msg = ....   # warning message
         warnings.warn(msg)
         break

What is your opinion? It is simpler. Its only downside is that it takes longer but still should be fast and its impact on performance should be negligible.

@seanlaw
Copy link
Contributor

seanlaw commented Mar 30, 2025

I don't mind it if the time is negligible but I wonder if we can either replace the for loop with numpy vectorized functions?

@NimaSarajpoor
Copy link
Collaborator Author

but I wonder if we can either replace the for loop with numpy vectorized functions?

This should work.

l = n - m + 1
indices = np.arange(l)

# leftmost neighbor is at index `0`
# rightmost neighbor is at index `l-1`
max_gaps = np.maximum(indices - 0, (l - 1) - indices)

if np.any(max_gaps <= excl_zone):    # ALTERNATIVELY: np.min(max_gaps) <= excl_zone
    msg = ...   #  warning message
    warnings.warn(msg)

I don't mind it if the time is negligible

Cool. Will check it.

@seanlaw
Copy link
Contributor

seanlaw commented Mar 30, 2025

This should work.

Hmmm, it feels like there's likely a simple equation that can be used rather than generating the full set of indices and then checking max_gaps. It feels like there is a simple pattern

Update

I think it's simply if l // 2 <= excl_zone

I have no idea how to interpret this (i.e., why is this the constraint from a conceptual perspective?)...

Actually, as I think about it more, I think I understand.

For any time series, T, an "eligible nearest neighbor" subsequence for the central-most subsequence (located at index i = l // 2) must be located outside of the excl_zone and this central-most subsequence will ALWAYS have the smallest number of "eligible nearest neighbors" amongst all other subsequences. Thus, all we need to do is check whether the excl_zone eliminates all "eligible nearest neighbors" for the central-most subsequence in T. If it doesn't, then that implies that all other subsequences within T will have at least one or more eligible nearest neighbor(s) that will lie outside of their respective excl_zone.

It looks like there might be a caveat if l has an even length?

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Mar 30, 2025

For any time series, T, an "eligible nearest neighbor" subsequence for the central-most subsequence (located at index i = l // 2) must be located outside of the excl_zone and this central-most subsequence will ALWAYS have the smallest number of "eligible nearest neighbors" amongst all other subsequences. Thus, all we need to do is check whether the excl_zone eliminates all "eligible nearest neighbors" for the central-most subsequence in T. If it doesn't, then that implies that all other subsequences within T will have at least one or more eligible nearest neighbor(s) that will lie outside of their respective excl_zone

That was the logic I was trying to explain in that very long comment (from a previous commit: b7494d9). You have a strong skill for delivering a concise message while avoiding verbosity!!!

Thus, all we need to do is check whether the excl_zone eliminates all "eligible nearest neighbors" for the central-most subsequence in T

YES! EXACTLY!! And, in fact, we just need to check if its furthest neighbour is eliminated. For the central-most subsequence that starts at math.ceil((l-1) / 2)), its furthest neighbour is located at 0.

I think it's simply if l // 2 <= excl_zone

It looks like there might be a caveat if l has an even length?

I think math.ceil((l - 1) / 2)) should work.

l = n - m + 1
idx = math.ceil((l-1) / 2))
minmax_gap = idx - 0   # 0 is the leftmost neighbour of subsequence that starts at index `idx`

# `l` is even. Indices are  {0, 1, 2, 3, 4, 5}: minmax_gap = 3
# `l` is odd. Indices are  {0, 1, 2, 3, 4}: minmax_gap = 2

@NimaSarajpoor
Copy link
Collaborator Author

NimaSarajpoor commented Mar 30, 2025

@seanlaw
Okay... Now that we have a more concise, less verbose explanation, I will switch back to my previous approach and will check for that central-most subsequence. I will replace my old comment with the one you shared.

Copy link
Contributor

@seanlaw seanlaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything looks pretty good. I left a few minor comments

Copy link
Contributor

@seanlaw seanlaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NimaSarajpoor Everything looks good on my end aside from a few minor changes. Please let me know if this is ready to be merged.

@NimaSarajpoor
Copy link
Collaborator Author

@seanlaw

Everything looks good on my end aside from a few minor changes. Please let me know if this is ready to be merged.

I addressed the comments. If you have no concern regarding this comment, it should be ready to be merged now.

@seanlaw seanlaw merged commit d5e7607 into stumpy-dev:main Apr 6, 2025
27 checks passed
@seanlaw
Copy link
Contributor

seanlaw commented Apr 6, 2025

Thank you for contributing this @NimaSarajpoor!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants