AUC below 0.5 #60

@MLecardonnel

When using eurybia on small dataframes, the computed AUC can be below 0.5, even when the same dataframe is passed as both baseline and current. This is caused by the train/test split being applied to the concatenated data: with so few rows, the split is not balanced between baseline and current.
One solution could be to apply the train/test split with the same seed to the baseline and current dataframes separately, before concatenating them.
Another quick solution could be to duplicate the data so that there are enough rows for a balanced train/test split.
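
To illustrate the first proposal, here is a minimal sketch using only pandas. It is not eurybia's internal code: the helper `split_then_concat`, the `target` column name, and the `test_frac` and `seed` parameters are all hypothetical names chosen for this example. Each dataframe is split on its own with the same seed, and only then are the parts concatenated, so baseline and current contribute the same number of rows to the train set and to the test set.

```python
import pandas as pd


def split_then_concat(df_baseline, df_current, test_frac=0.25, seed=0):
    """Split each dataframe with the same seed, then concatenate the parts.

    Hypothetical helper for illustration only; 'target' is an assumed
    label column (0 = baseline, 1 = current).
    """
    train_parts, test_parts = [], []
    for label, df in ((0, df_baseline), (1, df_current)):
        labelled = df.reset_index(drop=True).assign(target=label)
        # Same seed for both dataframes, so equally sized dataframes
        # get the same row positions in their test part.
        test_part = labelled.sample(frac=test_frac, random_state=seed)
        train_part = labelled.drop(test_part.index)
        train_parts.append(train_part)
        test_parts.append(test_part)
    train = pd.concat(train_parts, ignore_index=True)
    test = pd.concat(test_parts, ignore_index=True)
    return train, test


df = pd.DataFrame(
    [[0, 1], [0, 1], [0, 1], [0, 2], [0, 2], [0, 2], [0, 2]],
    columns=["A", "B"],
)

train, test = split_then_concat(df, df)

# Both labels are equally represented in each part.
print(train["target"].value_counts().to_dict())
print(test["target"].value_counts().to_dict())
```

When baseline and current are the same dataframe, as in the reproduction below, each feature row appears once per label in each part, so a classifier has no signal that could push the AUC away from 0.5 in either direction.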

To reproduce:

from eurybia import SmartDrift
import pandas as pd

df = pd.DataFrame([[0,1],[0,1],[0,1],[0,2],[0,2],[0,2],[0,2]], columns=["A","B"])

sd = SmartDrift(
    df_current=df,
    df_baseline=df,
)

sd.compile()

sd.generate_report(
    output_file="auc_test.html",
    title_story="AUC Test",
)
