Consider making `assert_predictable` configurable #1402

Suppose a task has a factor feature and only a subset of its levels is observed when the model is trained, so the data to be scored may contain new factor levels. `mlr3pipelines` offers `po("fixfactors")` for exactly this situation, and further preprocessing steps can be added to make the pipeline even more robust against such edge cases.

However, because `assert_predictable()` is called _before_ any preprocessing is applied to the `newdata` argument of `$predict_newdata()`, the prediction data has to be prepared manually. This goes against the general concept of mlr3, and especially of `mlr3pipelines`, where preprocessing is meant to live inside the learner graph.

Minimal example:
```r
library(mlr3)
library(mlr3pipelines)
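training_data = iris
# a single row whose Species value was never seen during training
prediction_data = transform(iris[1, ], Species = "NewSpecies")
tsk = as_task_regr(training_data, target = "Sepal.Length")
# fixfactors is supposed to take care of unseen levels inside the graph
model = as_learner(po("fixfactors") %>>% lrn("regr.featureless"))
model$train(tsk)
model$predict_newdata(prediction_data)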
# Error: Learner 'fixfactors.regr.featureless' received task with different column info (feature type or factor level ordering) during train and predict.
```
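To make the ordering problem concrete: the check runs on the raw `newdata`, where `Species` no longer matches the factor seen during training. The sketch below is a hypothetical, heavily simplified stand-in for that kind of comparison. `same_column_info()` is made up for illustration and is not mlr3's implementation; it only compares column classes and factor levels in base R.

```r
# Hypothetical, simplified stand-in for the kind of check behind the error above.
# This is NOT mlr3's actual assert_predictable() implementation.
# It compares the class and factor levels of every column both data sets share.
same_column_info = function(train, new) {
  shared = intersect(names(train), names(new))
  for (col in shared) {
    x = train[[col]]
    y = new[[col]]
    # a type change (e.g. factor -> character) counts as a mismatch
    if (!identical(class(x), class(y))) return(FALSE)
    # for factors, the level set and its ordering must match exactly
    if (is.factor(x) && !identical(levels(x), levels(y))) return(FALSE)
  }
  TRUE
}

training_data = iris
prediction_data = transform(iris[1, ], Species = "NewSpecies")

same_column_info(training_data, iris[1, ])       # TRUE: row subsetting keeps the levels
same_column_info(training_data, prediction_data) # FALSE: Species no longer matches
```

A graph that starts with `po("fixfactors")` would repair exactly this mismatch, but only if the check ran after the graph's preprocessing rather than before it.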

What is the current idiomatic way to avoid this strict check of column types, and especially of factor levels? Or do we really have to move the required preprocessing steps out of the learner graph and apply them to the prediction data _before_ passing it to `model$predict_newdata()`, just to satisfy `mlr3::assert_predictable()`?
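For reference, the manual workaround described above amounts to something like the following base R sketch. The helper `align_factor_levels()` is hypothetical (not part of mlr3 or `mlr3pipelines`); it recodes every column that is a factor in the training data into a factor with exactly the training levels, in the same order, so unseen values such as `"NewSpecies"` become `NA`.

```r
# Hypothetical helper, not part of mlr3 or mlr3pipelines.
# Coerces every column of `newdata` that is a factor in `reference` to a factor
# with exactly the reference levels (same order); unseen values become NA.
align_factor_levels = function(newdata, reference) {
  for (col in intersect(names(newdata), names(reference))) {
    ref = reference[[col]]
    if (is.factor(ref)) {
      # as.character() first so factor and character inputs are handled alike
      newdata[[col]] = factor(
        as.character(newdata[[col]]),
        levels = levels(ref),
        ordered = is.ordered(ref)
      )
    }
  }
  newdata
}

training_data = iris
prediction_data = transform(iris[1, ], Species = "NewSpecies")
aligned = align_factor_levels(prediction_data, training_data)
str(aligned$Species)
```

The aligned data would then be passed as `model$predict_newdata(aligned)`. Whether the resulting `NA` is acceptable depends on the learner's support for missing values, and this step largely duplicates what `po("fixfactors")` is meant to do inside the graph.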


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Uh oh!

Consider making `assert_predictable` configurable #1402

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Consider making assert_predictable configurable #1402

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Consider making `assert_predictable` configurable #1402