
[ENH] TransformedTargetRegressor should mesh with linear or smooth transformations to produce exact pdf and log_pdf #601

@joshdunnlime


Is your feature request related to a problem? Please describe.

LogLoss is a very commonly used proper scoring rule (e.g. for logistic regression and NNs) and has efficient implementations for most distributions. The TransformedTargetRegressor (TTR) as implemented does not support it: the TTR lacks a log_pdf method, likely due to the numerical complications of mapping the log_pdf across domains, e.g. the transform function must be "a one-to-one, monotonic function with a differentiable inverse".

In reality, many of the functions used to transform the target have trivial differentiable inverses and, therefore, trivial solutions for mapping the log_pdf. The MinMaxScaler internals proposed for the SupportScaler are one such example - it is a linear transformation, so the derivative of the inverse is a constant.
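
For example (illustrative only; scale_ and min_ are MinMaxScaler's fitted attributes, the data here is made up):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

mms = MinMaxScaler().fit(np.array([[0.0], [2.0], [10.0]]))
# forward transform:  z = y * scale_ + min_
# inverse transform:  y = (z - min_) / scale_
# derivative of the inverse is the constant 1 / scale_
print(1 / mms.scale_)  # [10.] -> constant Jacobian of the inverse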

Here is a walkthrough:

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

from skpro.metrics import LogLoss, CRPS
from skpro.regression.xgboostlss import XGBoostLSS
from skpro.regression.compose import TransformedTargetRegressor

size = 1000

X = np.random.normal(1, 1, size)
y = 2 * X + np.random.normal(0, 0.5, size)

Xy = pd.DataFrame(
    {"target": y, "feature": X},
    index=range(size)
)

# use a low n_trials for speed-up
params = {"n_trials": 2, "dist": "Normal"}

# simple example with no scaling - logloss works
xgb = XGBoostLSS(**params)
xgb.fit(X=Xy[["feature"]], y=Xy["target"])
p = xgb.predict(Xy[["feature"]])
pp = xgb.predict_proba(Xy[["feature"]])

print(CRPS()(y_true=Xy["target"], y_pred=pp))
print(LogLoss()(y_true=Xy["target"], y_pred=pp))

0.25535
0.56026

Now use the TTR:

xgb = XGBoostLSS(**params)
mms = MinMaxScaler()
pipe = TransformedTargetRegressor(regressor=xgb, transformer=mms)

# pipeline with TTR logloss fails
pipe.fit(X=Xy[["feature"]], y=Xy["target"])
pp = pipe.predict_proba(Xy[["feature"]])
p = pipe.predict(Xy[["feature"]])

print(CRPS()(y_true=Xy["target"], y_pred=pp))
print(LogLoss()(y_true=Xy["target"], y_pred=pp))

0.24274
inf

CRPS returns a result, but LogLoss on the TTR output gives inf. We can instead do the transformation manually and compute the LogLoss on the transformed domain:

xgb = XGBoostLSS(**params)
mms = MinMaxScaler()
y_ = mms.fit_transform(Xy[["target"]])

xgb.fit(X=Xy[["feature"]], y=y_)
pp = xgb.predict_proba(Xy[["feature"]])
p = xgb.predict(Xy[["feature"]])
p_ = mms.inverse_transform(p.reshape(-1, 1))

crps = CRPS()(y_true=y_, y_pred=pp)
print(crps)
ll = LogLoss()(y_true=y_, y_pred=pp)
print(ll)

0.01859
-2.0715

Both the CRPS and LogLoss can be mapped back to their original domain by a change of variables (applying the Jacobian of the inverse transform appropriately):

jac = np.abs(1 / mms.scale_)
print((CRPS()(y_true=y_, y_pred=pp) * jac)[0])
print((ll + np.log(jac))[0])

0.25338
0.54385

I believe the mathematics behind this is correct: we multiply CRPS by the (absolute) Jacobian of the inverse transform; for LogLoss we add the log-Jacobian to the log_pdf.
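
To make the step explicit: if the model fits a density $p_Z$ on the transformed target $z = T(y)$, with $T$ one-to-one and differentiable, then in the scalar case

$$p_Y(y) = p_Z(T(y))\,\lvert T'(y)\rvert \quad\Longrightarrow\quad \log p_Y(y) = \log p_Z(z) + \log\lvert T'(y)\rvert.$$

For MinMaxScaler, $T'(y) = \texttt{scale\_}$, so $\log\lvert T'(y)\rvert = -\log(\texttt{jac})$ and the LogLoss (a negative log-likelihood) gains $+\log(\texttt{jac})$, as in the snippet above. CRPS carries the units of $y$, so for a linear $T$ it simply rescales: $\mathrm{CRPS}_Y = \mathrm{CRPS}_Z \cdot \texttt{jac}$.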

Describe the solution you'd like

  1. Add the whole sklearn-style transformer to the TransformedDistribution class. This will allow more properties/methods to be accessed - currently, it only holds the inverse_transform as a callable. Adding the whole instance would allow access to the fitted parameters; in the case of MinMaxScaler above, this lets us access the .scale_ attribute and directly compute the derivative of the inverse.
  2. Add a log_pdf method to TransformedDistribution, using the change-of-variables Jacobian (see the sketch after this list).
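
A minimal sketch of what 2. could look like, assuming the distribution holds the fitted transformer (per 1.) and the wrapped distribution exposes log_pdf; the class body is a simplified stand-in for skpro's actual class, and _log_abs_dinv is a hypothetical helper hard-coded to linear scalers such as MinMaxScaler:

import numpy as np

class TransformedDistribution:  # simplified stand-in for skpro's class
    """Distribution of y = inverse_transform(z), for z ~ distribution."""

    def __init__(self, distribution, transformer):
        self.distribution = distribution  # fitted on the transformed domain z
        self.transformer = transformer    # fitted sklearn-style transformer (idea 1.)

    def _log_abs_dinv(self):
        # log |d inverse_transform / dz|, hard-coded for linear scalers
        # such as MinMaxScaler, where it is the constant log |1 / scale_|
        return np.log(np.abs(1.0 / self.transformer.scale_))

    def log_pdf(self, y):
        # change of variables: log p_Y(y) = log p_Z(z) - log |d inv / dz|
        z = self.transformer.transform(np.asarray(y).reshape(-1, 1))
        return self.distribution.log_pdf(z) - self._log_abs_dinv()

For any linear transformer, LogLoss on the original domain would then just be the mean of -log_pdf(y) and the inf above should go away. For non-constant Jacobians, _log_abs_dinv would need to depend on z, which is what the bonus ideas below address.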

Bonus ideas:

A) Consider adding automatic differentiation of the inverse for more complex transformations.
B) Consider adding a _dinv_dx method (I'm sure there is a better name) to the transformer so that a user can directly supply a derivative and skip auto-diff (see the sketch after this list).
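
For illustration of A) (and, implicitly, of the interface B) would replace), a sketch using JAX autodiff; the function name inverse and its constants are made up:

import jax
import jax.numpy as jnp

# made-up inverse transform: the inverse of an affine scaler, y = z / scale + y_min
def inverse(z, scale=0.1, y_min=0.0):
    return z / scale + y_min

# bonus idea A: elementwise derivative of the inverse via autodiff;
# bonus idea B would let the transformer supply this function directly
dinv_dz = jax.vmap(jax.grad(inverse))

z = jnp.linspace(0.0, 1.0, 5)
log_jac = jnp.log(jnp.abs(dinv_dz(z)))
# change of variables: log p_Y(y) = log p_Z(z) - log_jac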

I see no reason to add the change-of-variables version of the CRPS - it is included here purely to illustrate that the change-of-variables scores are broadly consistent.

Describe alternatives you've considered

The "manual" option is the only other idea. This is prohibitive as it prevents users from getting meaningful LogLoss scores for comparison across different pipelines.

I believe the average user would be unaware of the change-of-variables trick and so would miss out on using either the TTR or LogLoss. Furthermore, this complements the other proposal, the SupportScaler. Again, that is a simple transformation, so adding LogLoss support for it is trivial, and it would allow LogLoss to be used as a metric when grid-searching across multiple distributions with differing supports.

Additional context


Labels: API design, feature request, module:regression, module:transformations
