
Support for cross_entropy objective function in regression context? #56

@phaidara


Hello,

I am currently working on a project where I want to fit a model on probability targets and save it to PMML for later use in a Java program.

I am training an LGBMRegressor with the cross_entropy objective function.
Training works well: I am able to fit a PMMLPipeline on my data and use it to predict probabilities as expected.

However, saving the pipeline to PMML fails with the following exception:

```
SEVERE: Failed to convert PKL to PMML
java.lang.IllegalArgumentException: Expected a regression-type objective function, got 'cross_entropy'
	at lightgbm.sklearn.LGBMRegressor.checkLabel(LGBMRegressor.java:47)
	at sklearn.Estimator.encode(Estimator.java:100)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:233)
	at org.jpmml.sklearn.Main.run(Main.java:217)
	at org.jpmml.sklearn.Main.main(Main.java:143)

Exception in thread "main" java.lang.IllegalArgumentException: Expected a regression-type objective function, got 'cross_entropy'
	at lightgbm.sklearn.LGBMRegressor.checkLabel(LGBMRegressor.java:47)
	at sklearn.Estimator.encode(Estimator.java:100)
	at sklearn2pmml.pipeline.PMMLPipeline.encodePMML(PMMLPipeline.java:233)
	at org.jpmml.sklearn.Main.run(Main.java:217)
	at org.jpmml.sklearn.Main.main(Main.java:143)
```

It seems that the JPMML LightGBM converter does not accept the cross_entropy objective function for LGBMRegressor (the check fails in LGBMRegressor.checkLabel). I tested cross_entropy with an LGBMClassifier on binary targets (0/1) instead of probabilities, and that conversion works fine.
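For context, LightGBM documents cross_entropy (alias xentropy) as accepting any label in the interval [0, 1], which is why fitting LGBMRegressor on probability targets trains without complaint. The sketch below is a minimal NumPy illustration, assuming the standard logistic-link binary cross-entropy; the function names are hypothetical and not part of LightGBM. It computes the per-sample loss and the gradient and Hessian with respect to the raw score, which a boosting round fits against, and shows they stay well defined for fractional labels.

```python
import numpy as np

def sigmoid(z):
    # Logistic function: maps a raw score to a probability in (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy_loss(raw_score, label):
    # Binary cross-entropy; label may be any value in [0, 1], not only 0 or 1
    p = sigmoid(raw_score)
    return -(label * np.log(p) + (1.0 - label) * np.log(1.0 - p))

def cross_entropy_grad_hess(raw_score, label):
    # First and second derivatives of the loss with respect to the raw score
    p = sigmoid(raw_score)
    return p - label, p * (1.0 - p)

labels = np.array([0.1, 0.3, 0.9])  # probability targets, like y_reg
raw = np.zeros_like(labels)          # raw scores of 0 correspond to p = 0.5
grad, hess = cross_entropy_grad_hess(raw, labels)
print(grad)  # approximately [0.4, 0.2, -0.4]
print(hess)  # 0.25 for every sample
```

The gradient is zero exactly when the predicted probability equals the (possibly fractional) label, so probability targets are a natural fit for this objective.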

Would it be possible to fix this behavior? Thanks!

Reproducible example:

```python
# Library imports
import lightgbm as lgb
from numpy.random import default_rng
from sklearn.datasets import make_classification
from sklearn2pmml import sklearn2pmml
from sklearn2pmml.pipeline import PMMLPipeline

# Random classification data
seed = 1234
x, y_cls = make_classification(random_state=seed)

# Fitting a classifier on the binary target
classifier = lgb.LGBMClassifier(objective="cross_entropy")
clf_pipeline = PMMLPipeline([("classifier", classifier)])
clf_pipeline.fit(x, y_cls)
# Saving the classifier works fine
sklearn2pmml(clf_pipeline, "working_cross_entropy_classifier.pmml")

# Generating a random probability target in [0, 1)
rng = default_rng(seed)
y_reg = rng.uniform(low=0, high=1, size=y_cls.shape)

# Fitting a regressor on the probability target
regressor = lgb.LGBMRegressor(objective="cross_entropy")
reg_pipeline = PMMLPipeline([("regressor", regressor)])
reg_pipeline.fit(x, y_reg)

# Predictions are probability scores
reg_pipeline.predict(x)

# But saving this pipeline fails with the exception above
sklearn2pmml(reg_pipeline, "non_working_cross_entropy_regressor.pmml")
```
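Once the trees are fitted, a cross_entropy regressor differs from an ordinary L2 regressor mainly in how the summed tree scores become a prediction: LightGBM passes them through a logistic function instead of returning them unchanged. The sketch below illustrates what a PMML encoding of such a regressor would have to reproduce. The OUTPUT_TRANSFORMS table and the predict_from_tree_scores helper are hypothetical illustrations, not JPMML or LightGBM code, and the table covers only a few objectives.

```python
import numpy as np

def sigmoid(z):
    # Logistic function, as applied by LightGBM's cross_entropy objective
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical table: the transform applied to the summed tree scores
# for a few LightGBM objectives
OUTPUT_TRANSFORMS = {
    "regression": lambda raw: raw,  # L2 regression: identity
    "poisson": np.exp,              # log link
    "cross_entropy": sigmoid,       # logistic link, output in (0, 1)
}

def predict_from_tree_scores(tree_scores, objective):
    """Hypothetical helper: sum per-tree leaf values, then apply the output transform.

    tree_scores has shape (n_samples, n_trees).
    """
    raw = np.sum(tree_scores, axis=1)
    return OUTPUT_TRANSFORMS[objective](raw)

# Two samples, two trees each
scores = np.array([[0.2, -0.2], [1.0, 0.0]])
print(predict_from_tree_scores(scores, "regression"))     # raw sums
print(predict_from_tree_scores(scores, "cross_entropy"))  # probabilities
```

Keeping the tree summation separate from the output transform mirrors the idea that the tree ensemble itself is the same kind of structure for both objectives; only the final link function changes.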
