Description
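When using eurybia on small dataframes, the computed AUC can fall below 0.5, even when the baseline and current dataframes are identical. This is caused by the train/test split being applied to the concatenated data: with so few rows, the split can leave the baseline and current rows unevenly represented in the train and test sets.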
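To illustrate the mechanism, here is a minimal sketch that does not call eurybia at all. It assumes a hand-picked, deliberately unlucky split and a simple frequency-based scorer standing in for the real classifier; the `target` column name, the `test_idx` positions and the `auc` helper are all hypothetical and only serve the illustration.

```python
import pandas as pd

df = pd.DataFrame([[0, 1], [0, 1], [0, 1], [0, 2], [0, 2], [0, 2], [0, 2]], columns=["A", "B"])

# Label baseline rows 0 and current rows 1, then concatenate
# (the "target" column name is an assumption made for this sketch).
data = pd.concat([df.assign(target=0), df.assign(target=1)], ignore_index=True)

# Hand-picked unlucky split: the test set holds two baseline rows with B=1
# and three current rows with B=2.
test_idx = [0, 1, 10, 11, 12]
train = data.drop(index=test_idx)
test = data.loc[test_idx]

# Stand-in "model": score each row with the share of current rows
# observed in the training set for its value of B.
rate = train.groupby("B")["target"].mean()
scores = test["B"].map(rate).tolist()
labels = test["target"].tolist()


def auc(y_true, y_score):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly, ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


auc_value = auc(labels, scores)
print(auc_value)  # 0.0 for this hand-picked split
```

Because the concatenated data contains every row once with each label, whatever class is over-represented for a given value of `B` in the training set is under-represented for it in the test set, so the scores learned on train rank the test rows the wrong way round.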
A solution could be to apply the train test split with the same seed on both baseline and current dataframes before concatenating them.
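A minimal sketch of this idea, written with pandas only and independent of eurybia's internals. The function name `split_then_concat`, the `target` column and the `test_size` and `seed` parameters are assumptions for illustration, not part of the eurybia API.

```python
import pandas as pd


def split_then_concat(df_baseline, df_current, test_size=0.25, seed=0):
    """Split each dataframe with the same seed, then concatenate the pieces.

    Assumes neither dataframe already has a column named "target".
    """
    train_parts, test_parts = [], []
    for label, frame in ((0, df_baseline), (1, df_current)):
        frame = frame.reset_index(drop=True)
        # Same seed for both frames: equally sized frames lose the same row positions.
        test = frame.sample(frac=test_size, random_state=seed)
        train = frame.drop(index=test.index)
        train_parts.append(train.assign(target=label))
        test_parts.append(test.assign(target=label))
    train = pd.concat(train_parts, ignore_index=True)
    test = pd.concat(test_parts, ignore_index=True)
    return train, test


df = pd.DataFrame([[0, 1], [0, 1], [0, 1], [0, 2], [0, 2], [0, 2], [0, 2]], columns=["A", "B"])
train, test = split_then_concat(df, df)

# With identical inputs, every value of B is evenly shared between the two labels.
print(train.groupby("B")["target"].mean())
print(test.groupby("B")["target"].mean())
```

When the two dataframes are identical, each row ends up in the same subset under both labels, so no value of `B` can be associated with one label more than the other in either subset.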
Another quick solution could be to duplicate the data so that there are enough rows for a balanced train/test split.
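As a rough sketch of the duplication workaround, the rows could be repeated before the dataframes are handed to `SmartDrift`. The helper name `replicate` and the `min_rows` threshold are hypothetical; what counts as "enough" rows is an assumption, not a documented eurybia value.

```python
import pandas as pd


def replicate(df, min_rows=100):
    """Repeat the whole dataframe until it has at least min_rows rows."""
    n_copies = -(-min_rows // len(df))  # ceiling division
    return pd.concat([df] * n_copies, ignore_index=True)


df = pd.DataFrame([[0, 1], [0, 1], [0, 1], [0, 2], [0, 2], [0, 2], [0, 2]], columns=["A", "B"])
df_big = replicate(df)
print(len(df_big))  # 105: 15 copies of the 7 original rows
```

Duplication keeps the distribution of each column unchanged, so it only reduces the chance of an unbalanced split; it does not add any new information.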
To reproduce:
from eurybia import SmartDrift
import pandas as pd
df = pd.DataFrame([[0,1],[0,1],[0,1],[0,2],[0,2],[0,2],[0,2]], columns=["A","B"])
sd = SmartDrift(
    df_current=df,
    df_baseline=df,
)
sd.compile()
sd.generate_report(
    output_file="auc_test.html",
    title_story="AUC Test",
)