[ML-49316] Support MonthMid and MonthEnd for DeepAR #160

Merged
merged 3 commits into from
Jan 31, 2025

@Lanz-db Lanz-db commented Jan 31, 2025

This PR fixes a bug where DeepAR fails when the user's dataset has monthly frequency and its timestamps do not fall on the first day of the month. The bug results from this line:

new_index_full = pd.date_range(total_min, total_max, freq=frequency)

Here freq is "MS" (month start), so every timestamp in the generated new_index_full falls on the first day of a month. As a result, this line:

df.reindex(new_index_full)

produces a DataFrame in which every row of the target column is NaN, because none of the original timestamps appear in the new index.
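The failure described above can be reproduced with a small sketch. The sample data (monthly observations stamped on the 15th) and the column name `target` are assumptions made for illustration; only the `pd.date_range(..., freq="MS")` and `reindex` calls mirror the lines quoted in this description.

```python
import pandas as pd

# Hypothetical monthly data stamped on the 15th of each month.
dates = pd.to_datetime(["2020-01-15", "2020-02-15", "2020-03-15"])
df = pd.DataFrame({"target": [1.0, 2.0, 3.0]}, index=dates)

# "MS" anchors every generated timestamp to the first day of a month,
# so none of them coincide with the 15th.
new_index_full = pd.date_range(df.index.min(), df.index.max(), freq="MS")

# Reindexing finds no matching labels, so every target value becomes NaN.
reindexed = df.reindex(new_index_full)
print(reindexed)
```

Because the generated index and the original index share no timestamps, the reindexed frame keeps none of the original values.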

To fix the bug, this PR:

  1. Introduces a helper function, validate_and_generate_index, that generates a complete time index for the given DataFrame based on the specified frequency. For monthly frequency, it builds the index from the data's day of month and also detects whether the dates fall on the end of the month.
  2. Uses this helper function in place of pd.date_range(total_min, total_max, freq=frequency).
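The idea behind the helper can be sketched roughly as follows. This is not the code merged in this PR: the function name `generate_monthly_index_sketch`, its reduced signature, and the sample data are hypothetical, and it covers only the monthly case under the assumption that the time column is parseable by `pd.to_datetime`.

```python
import pandas as pd


def generate_monthly_index_sketch(df: pd.DataFrame, time_col: str) -> pd.DatetimeIndex:
    """Hypothetical sketch: gap-free monthly index that keeps the data's day of month."""
    timestamps = pd.to_datetime(df[time_col])
    start, end = timestamps.min(), timestamps.max()

    # Case 1: every timestamp is a month end, so anchor the index to month ends.
    if timestamps.dt.is_month_end.all():
        return pd.date_range(start, end, freq=pd.offsets.MonthEnd())

    # Case 2: otherwise step from the first timestamp one calendar month at a time.
    # Each offset is computed from `start`, so short months do not cause drift.
    n_months = (end.year - start.year) * 12 + (end.month - start.month) + 1
    return pd.DatetimeIndex([start + pd.DateOffset(months=i) for i in range(n_months)])


# Hypothetical data on the 15th of each month, with March missing.
sample = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-15", "2020-02-15", "2020-04-15"]),
    "target": [1.0, 2.0, 4.0],
})
full_index = generate_monthly_index_sketch(sample, "date")
filled = sample.set_index("date").reindex(full_index)
print(filled)
```

With an index built this way, reindexing leaves NaN only in the month that is actually missing, rather than in every row.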

To test the function:

  • Unit tests were added in utils_test.py.
    Run the following command locally:
PYTHONPATH=~/automl/runtime/ pytest tests/automl_runtime/forecast/deepar/utils_test.py

num_months = 24

# Starting from end day of January 2020
base_dates = pd.date_range(
Contributor

How is it starting on the last day of Jan 2020?

Author

See line 228: specifying freq='M' anchors the generated dates to the end of each month by default.
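A quick way to confirm the month-end behavior is the sketch below. It is not the test code from this PR: the variable name and arguments are hypothetical, and it passes `pd.offsets.MonthEnd()` instead of the string alias, since the 'M' alias is spelled 'ME' in newer pandas releases.

```python
import pandas as pd

# A month-end frequency rolls the start date forward to the last day of its month.
month_end_dates = pd.date_range(start="2020-01-01", periods=3, freq=pd.offsets.MonthEnd())
print(month_end_dates)
```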

Contributor

I see. Can you add a comment on the line below?

@@ -18,6 +18,50 @@
import pandas as pd


def validate_and_generate_index(df: pd.DataFrame, time_col: str, frequency: str):
    """
    Generate a complete time index for the given DataFrame based on the specified frequency.
Contributor

Thanks for the detailed function description!

Contributor

@maggiewang-db maggiewang-db left a comment

Thank you for fixing it.

@Lanz-db Lanz-db merged commit 1561d1e into branch-0.2.20.5 Jan 31, 2025
5 checks passed