
[ENH] TransformedTargetRegressor should mesh with linear or smooth transformations to produce exact pdf and log_pdf #601

@joshdunnlime


Is your feature request related to a problem? Please describe.

LogLoss is a very commonly used proper scoring rule (e.g. for logistic regression and NNs) and has efficient implementations for most distributions. The TransformedTargetRegressor (TTR) as implemented does not support it: the TTR lacks a log_pdf method, likely due to the numerical complications of mapping the log_pdf across domains, e.g. the transform function must be "a one-to-one, monotonic function with a differentiable inverse".

In reality, many of the functions used to transform the target have trivial differentiable inverses and, therefore, trivial solutions for mapping the log_pdf. The MinMaxScaler internals proposed for the SupportScaler are one such example - it is a linear transformation, so the derivative of the inverse is a constant.
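
For example (illustrative only; scale_ and min_ are MinMaxScaler's fitted attributes, the data here is made up):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

mms = MinMaxScaler().fit(np.array([[0.0], [2.0], [10.0]]))
# forward transform:  z = y * scale_ + min_
# inverse transform:  y = (z - min_) / scale_
# derivative of the inverse is the constant 1 / scale_
print(1 / mms.scale_)  # [10.] -> constant Jacobian of the inverse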

Here is a walkthrough:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

from skpro.metrics import LogLoss, CRPS
from skpro.regression.xgboostlss import XGBoostLSS
from skpro.regression.compose import TransformedTargetRegressor

size = 1000

X = np.random.normal(1, 1, size)
y = 2 * X + np.random.normal(0, 0.5, size)

Xy = pd.DataFrame(
    {"target": y, "feature": X},
    index=range(size)
)

# use a low n_trials for speed-up
params = {"n_trials": 2, "dist": "Normal"}

# simple example with no scaling - logloss works
xgb = XGBoostLSS(**params)
xgb.fit(X=Xy[["feature"]], y=Xy["target"])
p = xgb.predict(Xy[["feature"]])
pp = xgb.predict_proba(Xy[["feature"]])

print(CRPS()(y_true=Xy["target"], y_pred=pp))
print(LogLoss()(y_true=Xy["target"], y_pred=pp))

0.25535
0.56026

Now use the TTR:

xgb = XGBoostLSS(**params)
mms = MinMaxScaler()
pipe = TransformedTargetRegressor(regressor=xgb, transformer=mms)

# pipeline with TTR logloss fails
pipe.fit(X=Xy[["feature"]], y=Xy["target"])
pp = pipe.predict_proba(Xy[["feature"]])
p = pipe.predict(Xy[["feature"]])

print(CRPS()(y_true=Xy["target"], y_pred=pp))
print(LogLoss()(y_true=Xy["target"], y_pred=pp))

0.24274
inf

CRPS returns a result, but LogLoss on the TTR output gives inf. We can instead do the transformation manually and compute the LogLoss on the transformed domain:

xgb = XGBoostLSS(**params)
mms = MinMaxScaler()
y_ = mms.fit_transform(Xy[["target"]])

xgb.fit(X=Xy[["feature"]], y=y_)
pp = xgb.predict_proba(Xy[["feature"]])
p = xgb.predict(Xy[["feature"]])
p_ = mms.inverse_transform(p.reshape(-1, 1))

crps = CRPS()(y_true=y_, y_pred=pp)
print(crps)
ll = LogLoss()(y_true=y_, y_pred=pp)
print(ll)

0.01859
-2.0715

Both the CRPS and LogLoss can be mapped back to their original domain by a change of variables (applying the Jacobian of the inverse transform appropriately):

jac = np.abs(1 / mms.scale_)
print((CRPS()(y_true=y_, y_pred=pp) * jac)[0])
print((ll + np.log(jac))[0])

0.25338
0.54385

I believe the mathematics behind this is correct: we multiply CRPS by the (absolute) Jacobian of the inverse transform; for LogLoss we add the log-Jacobian to the log_pdf.
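
To make the step explicit: if the model fits a density $p_Z$ on the transformed target $z = T(y)$, with $T$ one-to-one and differentiable, then in the scalar case

$$p_Y(y) = p_Z(T(y))\,\lvert T'(y)\rvert \quad\Longrightarrow\quad \log p_Y(y) = \log p_Z(z) + \log\lvert T'(y)\rvert.$$

For MinMaxScaler, $T'(y) = \texttt{scale\_}$, so $\log\lvert T'(y)\rvert = -\log(\texttt{jac})$ and the LogLoss (a negative log-likelihood) gains $+\log(\texttt{jac})$, as in the snippet above. CRPS carries the units of $y$, so for a linear $T$ it simply rescales: $\mathrm{CRPS}_Y = \mathrm{CRPS}_Z \cdot \texttt{jac}$.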

Describe the solution you'd like

  1. Add the whole sklearn-style transformer to the TransformedDistribution class. This will allow more properties/methods to be accessed - currently, it only holds the inverse_transform as a callable. Adding the whole instance would allow access to the fitted parameters; in the case of MinMaxScaler above, this lets us access the .scale_ attribute and directly compute the derivative of the inverse.
  2. Add a log_pdf method to TransformedDistribution, using the change-of-variables Jacobian (see the sketch after this list).
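
A minimal sketch of what 2. could look like, assuming the distribution holds the fitted transformer (per 1.) and the wrapped distribution exposes log_pdf; the class body is a simplified stand-in for skpro's actual class, and _log_abs_dinv is a hypothetical helper hard-coded to linear scalers such as MinMaxScaler:

import numpy as np

class TransformedDistribution:  # simplified stand-in for skpro's class
    """Distribution of y = inverse_transform(z), for z ~ distribution."""

    def __init__(self, distribution, transformer):
        self.distribution = distribution  # fitted on the transformed domain z
        self.transformer = transformer    # fitted sklearn-style transformer (idea 1.)

    def _log_abs_dinv(self):
        # log |d inverse_transform / dz|, hard-coded for linear scalers
        # such as MinMaxScaler, where it is the constant log |1 / scale_|
        return np.log(np.abs(1.0 / self.transformer.scale_))

    def log_pdf(self, y):
        # change of variables: log p_Y(y) = log p_Z(z) - log |d inv / dz|
        z = self.transformer.transform(np.asarray(y).reshape(-1, 1))
        return self.distribution.log_pdf(z) - self._log_abs_dinv()

For any linear transformer, LogLoss on the original domain would then just be the mean of -log_pdf(y) and the inf above should go away. For non-constant Jacobians, _log_abs_dinv would need to depend on z, which is what the bonus ideas below address.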

Bonus ideas:

A) Consider adding automatic differentiation of the inverse for more complex transformations.
B) Consider adding a _dinv_dx method (I'm sure there is a better name) to the transformer so that a user can directly supply a derivative and skip auto-diff (see the sketch after this list).
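
For illustration of A) (and, implicitly, of the interface B) would replace), a sketch using JAX autodiff; the function name inverse and its constants are made up:

import jax
import jax.numpy as jnp

# made-up inverse transform: the inverse of an affine scaler, y = z / scale + y_min
def inverse(z, scale=0.1, y_min=0.0):
    return z / scale + y_min

# bonus idea A: elementwise derivative of the inverse via autodiff;
# bonus idea B would let the transformer supply this function directly
dinv_dz = jax.vmap(jax.grad(inverse))

z = jnp.linspace(0.0, 1.0, 5)
log_jac = jnp.log(jnp.abs(dinv_dz(z)))
# change of variables: log p_Y(y) = log p_Z(z) - log_jac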

I see no reason to add the change-of-variables version of the CRPS - it is included here purely to illustrate that the change-of-variables scores are broadly consistent.

Describe alternatives you've considered

The "manual" option is the only other idea. This is prohibitive as it prevents users from getting meaningful LogLoss scores for comparison across different pipelines.

I believe the average user would be unaware of the change-of-variables trick and so would miss out on using either the TTR or LogLoss. Furthermore, this complements the other proposal, the SupportScaler. Again, that is a simple transformation, so adding LogLoss support for it is trivial, and it would allow LogLoss to be used as a metric when grid-searching across multiple distributions with differing supports.

Additional context


Labels: API design, feature request, module:regression, module:transformations
