
[MLForecast] Calling preprocess on test set changes the future #501

@JLiekenbrock

Description

What happened + What you expected to happen

Calling preprocess on a test set changes the future dataframe returned by make_future_dataframe.
I would expect preprocess to be free of side effects.

Versions / Dependencies

macOS, Python 3.12.3, mlforecast 1.0.2

Reproduction script

from mlforecast.utils import generate_series, generate_prices_for_series
from mlforecast import MLForecast
import lightgbm as lgb
import pandas as pd

series = generate_series(200, equal_ends=True)

def train_test_split_last_n_pandas(df: pd.DataFrame, time_col: str = "ds", id_col: str = "unique_id", n: int = 3):
    df_sorted = df.sort_values(by=[id_col, time_col])
    test = df_sorted
    train = df_sorted.groupby(id_col).apply(lambda group: group.iloc[:-n] if len(group) > n else group.iloc[0:0])
    train = train.reset_index(drop=True)
    return train, test

train, test = train_test_split_last_n_pandas(series, n=3)

fcst = MLForecast(
    models=[lgb.LGBMRegressor()],
    freq='D',
    lags=[1],
)

fcst.fit(train)

fu1 = fcst.make_future_dataframe(h=1)

pre = fcst.preprocess(test)

fu2 = fcst.make_future_dataframe(h=1)

print(fu1.equals(fu2))
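
As a possible workaround until this is resolved, preprocess can be run on a deep copy of the fitted forecaster so the original object's state, and therefore the output of make_future_dataframe, stays untouched. This is a minimal sketch assuming the side effect comes from preprocess overwriting the internally stored series; copy.deepcopy is from the standard library, and everything else reuses the names from the script above.

import copy

fcst.fit(train)
fu1 = fcst.make_future_dataframe(h=1)

# Preprocess the test set on a throwaway copy instead of the fitted forecaster,
# so fcst itself is not modified (assumption: the state change is confined to
# the object preprocess is called on).
pre = copy.deepcopy(fcst).preprocess(test)

fu2 = fcst.make_future_dataframe(h=1)
print(fu1.equals(fu2))  # expected to print True with this approach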

Issue Severity

Low: It annoys or frustrates me.
