Improve Snippet function #400
Replies: 11 comments 19 replies
-
@ninimama If it's okay with you, I'd like to move this over to the discussion section as I think it would be healthy/useful to discuss/debate each of the points above. We should keep in mind that STUMPY doesn't try to be everything for everyone and our goal is to reproduce the published work.
-
Sure. I think it is good to move 1 & 3 to discussion. However, if you would like to reproduce the results of the paper, I think the snippet function should also return the indices of the set of subsequences similar to each snippet (e.g., see Fig. 19 or Fig. 24 of the paper). If, on the other hand, you think the ultimate goal of snippets is merely to provide a set of representative sequences, rather than to show where they appear in the time series, then we can move 2 to discussion as well.
-
@ninimama I haven't looked into this part much, so maybe you can help me understand how the indices are calculated? It appears that it has to do with finding cross-over points.
I think this is what I'm trying to get at with an open discussion. I am open to being convinced :)
-
This is part of the snippet code. If I understand correctly,
So, if my understanding is correct, all we need to do is return those indices. Of course, there might be some overlap between one subsequence and the next, but we don't need to worry about that. Coloring the returned indices should then reproduce Fig. 24 of the paper. Please feel free to close this one and move the questions to the discussion.
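One way to picture "returning those indices" is sketched below. It assumes that each subsequence has already been labeled with the snippet it is closest to. The `labels` list and the `regimes_from_labels` helper are hypothetical and are not part of STUMPY's API. The helper simply reports the cross-over points as `(snippet, start, stop)` runs, with `stop` exclusive.

```python
def regimes_from_labels(labels):
    """Group consecutive equal labels into (snippet, start, stop) runs.

    `labels[i]` is the (assumed) index of the snippet nearest to
    subsequence i; `stop` is exclusive, as in a Python slice.
    """
    regimes = []
    start = 0
    for i in range(1, len(labels) + 1):
        # Close the current run at the end of the list or at a cross-over.
        if i == len(labels) or labels[i] != labels[start]:
            regimes.append((labels[start], start, i))
            start = i
    return regimes


# Made-up labels for eight subsequences and two snippets.
labels = [0, 0, 0, 1, 1, 0, 0, 1]
regimes = regimes_from_labels(labels)
```

Coloring each `(start, stop)` run by its snippet label would give a plot in the spirit of the figure mentioned above.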
-
Yes, that's what I meant.
Why do you think it's a problem? I guess one should assess how much of the time series each snippet covers?
-
I am curious why that is? For
-
According to the result I got for
=====================================
Are the
In fact, the fraction values of all snippets sum to 1 for any k snippets. That's why the fraction values of
=====================================
I should note that a low fraction value doesn't necessarily mean that the corresponding pattern is redundant. According to the paper, changes in the areas can help the user find the proper k. I plotted it for different numbers of snippets of
As you can see, it is not so obvious which k is good.
=====================================
NOTE: There is no random state in the function
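The claim that the fractions always sum to 1 follows if every subsequence is credited to exactly one snippet, namely the nearest one. Below is a minimal sketch under that assumption. The `profiles` values are made up, and `snippet_fractions` is a hypothetical helper, not a STUMPY function.

```python
def snippet_fractions(profiles):
    """Return the share of subsequences nearest to each snippet.

    `profiles[j][i]` is the (made-up) distance from snippet j to
    subsequence i. Each subsequence is counted once, for its nearest
    snippet, so the shares sum to 1 whatever k is.
    """
    k = len(profiles)
    n = len(profiles[0])
    counts = [0] * k
    for i in range(n):
        nearest = min(range(k), key=lambda j: profiles[j][i])
        counts[nearest] += 1
    return [c / n for c in counts]


profiles = [
    [0.1, 0.2, 0.9, 0.8, 0.3],  # distances to snippet 0
    [0.7, 0.6, 0.2, 0.1, 0.5],  # distances to snippet 1
]
fractions = snippet_fractions(profiles)
```

Adding a third profile only redistributes the same subsequences, which is why a small fraction by itself does not show that a snippet is redundant.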
-
I don't know. Maybe.
Ahhh, that makes sense. Since there is no big drop in the plot of
-
I will investigate it (by checking out the MPdist profile values)
Could you please elaborate on this? Finding the optimal window size (
I tried to remove the first 50 elements of the time series and this is what I got:
Here, it suggests k=2 is good, but again the discovered patterns are not satisfactory from my perspective. Is it possible that, because the code considers only multiples of m as subsequence start indices, it misses some opportunities to find the correct pattern?
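To make the question concrete, the sketch below contrasts candidate start positions restricted to multiples of `m` with every valid start position. The lengths are illustrative and the helper names are hypothetical. This is not STUMPY's code, only an illustration of why a pattern beginning at an offset such as 50 could never be selected under the first scheme.

```python
def multiple_of_m_starts(n, m):
    """Start indices of non-overlapping windows: 0, m, 2m, ..."""
    return list(range(0, n - m + 1, m))


def all_starts(n, m):
    """Every start index at which a window of length m still fits."""
    return list(range(0, n - m + 1))


n, m = 1000, 200  # illustrative series length and window length
coarse = multiple_of_m_starts(n, m)
dense = all_starts(n, m)
```

Dropping the first 50 points, as tried above, shifts which parts of the data line up with the coarse grid, which would be one possible explanation for the change in the result.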
-
I tried to check it out and this is what I got for the snippets of the toy data in the notebook:
Please note that a subsequence of length
The
=========================================================================
I think one approach to check out the functionality of
So, instead, I created the
I would truly appreciate it if you could clarify a few things for me:
Sorry for my long questions. I am currently reading some articles to get a better idea of what's going on. I would truly appreciate it if you could help me with the aforementioned process/questions.
-
A potential bug(?)
If you look at the regimes of snippets I provided in the figure in my previous comment, you will see that the last index of the regime of the second snippet is 1801. But the length of the whole time series is 2000 (so the last index is 1999). So, if 1801 is the beginning of the snippet, then the last subsequence is
But my point is this: shouldn't the last index be 1800?
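The arithmetic behind the question can be checked directly. For a series of length `n` and a window of length `m`, the last subsequence that fits starts at `n - m`. The window length is not stated in this comment, so `m = 200` below is an assumption, chosen only because it makes `n - m` equal 1800. Whether 1801 is then an error or an exclusive stop index (as in a Python slice) is exactly the open question.

```python
n = 2000  # length of the time series, from the comment above
m = 200   # ASSUMED window length; not given in the comment

# Last start index at which a window of length m still fits.
last_start = n - m

# A window starting at 1801 would need indices up to 1801 + m - 1.
last_needed_if_1801 = 1801 + m - 1
fits = last_needed_if_1801 <= n - 1
```

Under this assumption a subsequence starting at 1801 would run one point past the end of the series, so 1801 can only be valid if it is read as an exclusive upper bound.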
-
As I was working with snippets, I noticed the following three issues:
1- Apparently the indices of the snippets start exactly at multiples of m, which may not match where the patterns actually begin in some data.
2- The function provides only one profile per snippet. However, a snippet repeats throughout the whole time series (a higher fraction means it has been repeated more). So it would be useful to have that information in the output, where I can see all the occurrences (the starting indices) of each snippet.
3- The fraction output is not sorted. I think it is better to sort it in descending order and return the other outputs in the same order.
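For the third point, the reordering can already be done on the caller's side by computing one permutation from the fractions and applying it to every other output. The sketch below uses plain lists and made-up values. The variable names are illustrative and do not mirror the function's actual return signature.

```python
# Made-up outputs for three snippets, in the order they were found.
fractions = [0.25, 0.6, 0.15]
snippet_indices = [400, 0, 1200]

# One permutation, derived from the fractions in descending order...
order = sorted(range(len(fractions)), key=lambda j: fractions[j], reverse=True)

# ...applied to every per-snippet output so they stay aligned.
sorted_fractions = [fractions[j] for j in order]
sorted_indices = [snippet_indices[j] for j in order]
```

Keeping a single `order` list avoids the outputs drifting out of sync, which is the main risk when each one is sorted separately.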