Hello!
I've run into an issue while exporting an OptBinning model to PMML. I used the BinningProcess class and set binning_transform_params with metric_missing='empirical'. In the resulting binning table, the Weight-of-Evidence (WoE) value for the missing bin is non-zero (-0.182322). In the exported PMML file, however, the mapMissingTo attribute is set to 0 (Discretize field="age" mapMissingTo="0.0"), which contradicts the treatment of missing values I configured.
Is it possible to configure the export so that mapMissingTo reflects the empirical WoE value for missing entries instead of 0?
Thanks!
import numpy as np
import pandas as pd
from optbinning import BinningProcess
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn2pmml import sklearn2pmml

df = pd.DataFrame({
    'age': [np.nan, 80, 54, 31, 32, 79, 43, np.nan, 22, 48, 62, 66, 33, 76, 60, 68, 47, 72, 20, 51, 44, 38, 25, 64, 63, 39, 52,
            65, 59, 53, 73, 78, 45, 27, 57, 21, 34, 24, 42, np.nan],
    'y': [1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0,
          1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1,
          0, 0, 1, 1, 0, 0]
})

pipeline = Pipeline([
    ('binning', BinningProcess(
        variable_names=['age'],
        binning_transform_params={
            'age': {'metric': 'woe',
                    'metric_missing': 'empirical'}
        })),
    ('logistic_regression', LogisticRegression(random_state=42))
])

pipeline.fit(df[['age']], df['y'])
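For reference, the -0.182322 reported for the missing bin can be reproduced by hand from the data above. This is a minimal sketch that assumes the WoE convention ln(share of non-events / share of events); that convention is an assumption here, but it is consistent with the value in the binning table.

```python
import numpy as np
import pandas as pd

# Same data as in the reproduction script above.
age = [np.nan, 80, 54, 31, 32, 79, 43, np.nan, 22, 48, 62, 66, 33, 76, 60,
       68, 47, 72, 20, 51, 44, 38, 25, 64, 63, 39, 52, 65, 59, 53, 73, 78,
       45, 27, 57, 21, 34, 24, 42, np.nan]
y = [1, 1, 0, 1, 0, 1, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 0,
     1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1,
     0, 0, 1, 1, 0, 0]
df = pd.DataFrame({"age": age, "y": y})

missing = df["age"].isna()

# Totals over the whole sample.
n_event = int((df["y"] == 1).sum())
n_nonevent = int((df["y"] == 0).sum())

# Counts inside the missing bin only.
miss_event = int((df.loc[missing, "y"] == 1).sum())
miss_nonevent = int((df.loc[missing, "y"] == 0).sum())

# Assumed convention: WoE = ln((non-events in bin / all non-events)
#                            / (events in bin / all events)).
woe_missing = np.log((miss_nonevent / n_nonevent) / (miss_event / n_event))
print(round(float(woe_missing), 6))  # -0.182322
```

With 2 events and 1 non-event among the 3 missing rows, and 25 events and 15 non-events overall, this gives ln((1/15) / (2/25)) = ln(5/6), which is the value expected in mapMissingTo.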
P.S. The same problem occurs with metric_special.
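A possible stopgap, not a feature of OptBinning or sklearn2pmml, is to patch the exported document after the fact. The sketch below uses only the Python standard library; the helper name patch_map_missing and the sample DerivedField fragment are hypothetical and stand in for the real exported file. It only touches mapMissingTo, so it does not address the metric_special case.

```python
import xml.etree.ElementTree as ET


def patch_map_missing(pmml_text: str, field: str, value: float) -> str:
    """Set mapMissingTo on every Discretize element that reads `field`."""
    root = ET.fromstring(pmml_text)
    for node in root.iter():
        # Compare local names so the PMML namespace version does not matter.
        local_name = node.tag.rsplit("}", 1)[-1]
        if local_name == "Discretize" and node.get("field") == field:
            node.set("mapMissingTo", repr(value))
    # Note: for a real namespaced PMML file, call
    # ET.register_namespace("", <namespace URI>) first so the default
    # namespace is written back without an "ns0:" prefix.
    return ET.tostring(root, encoding="unicode")


# Hypothetical fragment for illustration; a real export has more content.
sample = (
    '<DerivedField name="woe_age" optype="continuous" dataType="double">'
    '<Discretize field="age" mapMissingTo="0.0"/>'
    '</DerivedField>'
)

patched = patch_map_missing(sample, "age", -0.182322)
print(patched)
```

Editing the file by hand like this has to be repeated after every export, so it is a workaround rather than a fix.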