[ML-49316] Support MonthMid and MonthEnd for DeepAR #160
num_months = 24

# Starting from end day of January 2020
base_dates = pd.date_range(
How does it start on the last day of January 2020?
See line 228: with freq='M', pd.date_range generates month-end dates by default.
I see. Can you add a comment on the line below?
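The exchange above turns on how `pd.date_range` treats a month-end frequency. Below is a minimal sketch of the test setup with the kind of comment the reviewer asks for. It assumes a `start` of `2020-01-01` (the actual arguments are not visible in the diff), and it uses `pd.offsets.MonthEnd()`, the offset behind `freq='M'`. pandas 2.2 renamed that alias to `'ME'`, and the offset object works under both spellings.

```python
import pandas as pd

num_months = 24

# Starting from the last day of January 2020: a month-end frequency rolls the
# start date forward to the last day of its month and keeps every later date
# on a month end. pd.offsets.MonthEnd() is the offset behind freq="M"
# (spelled "ME" from pandas 2.2) and is accepted by older and newer versions.
# Assumption: start="2020-01-01"; the PR's actual arguments are not shown.
base_dates = pd.date_range(start="2020-01-01", periods=num_months, freq=pd.offsets.MonthEnd())

print(base_dates[:3].strftime("%Y-%m-%d").tolist())  # ['2020-01-31', '2020-02-29', '2020-03-31']
```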
@@ -18,6 +18,50 @@
import pandas as pd


def validate_and_generate_index(df: pd.DataFrame, time_col: str, frequency: str):
    """
    Generate a complete time index for the given DataFrame based on the specified frequency.
Thanks for the detailed function description!
Thank you for fixing it
This PR fixes a bug where DeepAR fails when the user's dataset has monthly frequency and the timestamps do not fall on the first day of the month. The bug results from this line:
pd.date_range(total_min, total_max, freq=frequency)
Here `freq` is "MS", so the generated `new_index_full` always falls on the first day of the month. As a result, this line generates a df in which every row of the target column is NaN, because none of the original timestamps appear in the new index.
To fix the bug, this PR introduces `validate_and_generate_index`, which generates a complete time index for the given DataFrame based on the specified frequency. For monthly frequency, it builds the index from the day of month used in the data and also detects whether that day is the end of the month.
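Here is a minimal, standalone reproduction of the failure described above. The column names `ds` and `y` are hypothetical, and `reindex` stands in for whatever alignment step the DeepAR code actually performs against `new_index_full`.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly series whose timestamps fall on the 15th of each month.
df = pd.DataFrame(
    {
        "ds": pd.date_range("2020-01-15", periods=6, freq=pd.DateOffset(months=1)),
        "y": np.arange(6, dtype=float),
    }
)

total_min, total_max = df["ds"].min(), df["ds"].max()

# With freq="MS" every generated timestamp is the first day of a month,
# so none of them matches a mid-month timestamp in the data.
new_index_full = pd.date_range(total_min, total_max, freq="MS")

# Assumption: alignment via reindex. Every target value comes back NaN.
reindexed = df.set_index("ds").reindex(new_index_full)
print(reindexed["y"].isna().all())  # True
```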
To test the function, run the command below locally:
Month start: notebook
Month mid: notebook
Month end: notebook
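The three cases above (month start, mid-month, month end) can be exercised against the sketch of `validate_and_generate_index` below. It is a hypothetical implementation written from the PR description only, not the code in the PR. The `MONTHLY_FREQS` set, the single-day-of-month validation, and the `DateOffset` stepping are all assumptions.

```python
import pandas as pd

# Assumption: aliases treated as monthly by this sketch; the real code may differ.
MONTHLY_FREQS = {"MS", "M", "ME"}


def validate_and_generate_index(df: pd.DataFrame, time_col: str, frequency: str) -> pd.DatetimeIndex:
    """Hypothetical sketch: build a complete index spanning df[time_col].

    Monthly indexes are anchored to the data's day of month (or to month end)
    rather than always to the first day of the month.
    """
    timestamps = pd.to_datetime(df[time_col])
    total_min, total_max = timestamps.min(), timestamps.max()

    if frequency not in MONTHLY_FREQS:
        # Non-monthly frequencies are passed straight to pd.date_range here.
        return pd.date_range(total_min, total_max, freq=frequency)

    if timestamps.dt.is_month_end.all():
        # Every timestamp is the last day of its month: anchor to month end.
        return pd.date_range(total_min, total_max, freq=pd.offsets.MonthEnd())

    # Assumed validation: monthly data must use one consistent day of month.
    days = sorted(int(d) for d in timestamps.dt.day.unique())
    if len(days) != 1:
        raise ValueError(f"Monthly data must use a single day of month, found {days}")

    # Whole months between the endpoints. Adding DateOffset(months=i) to the
    # first timestamp keeps its day of month, clipping to the last day only
    # in months too short for it.
    n_months = (total_max.year - total_min.year) * 12 + (total_max.month - total_min.month)
    return pd.DatetimeIndex([total_min + pd.DateOffset(months=i) for i in range(n_months + 1)])


# Usage on month-start, mid-month, and month-end data.
cases = {
    "start": ["2020-01-01", "2020-03-01"],
    "mid": ["2020-01-15", "2020-04-15"],
    "end": ["2020-01-31", "2020-04-30"],
}
for label, dates in cases.items():
    frame = pd.DataFrame({"ds": pd.to_datetime(dates)})
    index = validate_and_generate_index(frame, "ds", "MS")
    print(label, index.strftime("%Y-%m-%d").tolist())
```

The month-end check runs before the single-day check so that data such as Jan 31 / Feb 29 / Mar 31, which has different day numbers but a consistent anchor, is still accepted.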