Consider making assert_predictable configurable #1402

@tdeenes

Suppose a task has a factor predictor and only a subset of its levels is present in the training data, so the data to be scored can contain new factor levels. mlr3pipelines offers po("fixfactors") to handle such situations, and one can add arbitrary preprocessing steps to make the pipeline even more robust against such edge cases.

However, because assert_predictable() is called before any preprocessing transformations are applied to the newdata argument in $predict_newdata(), one has to prepare the prediction data set manually, which goes against the general concept of mlr3, and especially of mlr3pipelines.

Minimal example:

library(mlr3)
library(mlr3pipelines)
training_data = iris
prediction_data = transform(iris[1, ], Species = "NewSpecies")
tsk = as_task_regr(training_data, target = "Sepal.Length")
model = as_learner(po("fixfactors") %>>% lrn("regr.featureless"))
model$train(tsk)
model$predict_newdata(prediction_data)
# Error: Learner 'fixfactors.regr.featureless' received task with different column info (feature type or factor level ordering) during train and predict.

What is the current idiomatic way to avoid a strict check of column types and especially factor levels? Or do we really have to move the required preprocessing steps out of the learner graph and apply the transformations on the prediction data before passing it to model$predict_newdata() to please mlr3::assert_predictable()?
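
For reference, the manual preparation step mentioned above could look roughly like the following base R sketch. The helper `align_factor_levels()` is hypothetical (it is not part of mlr3 or mlr3pipelines). It assumes that unseen levels should simply become NA, which is roughly what po("fixfactors") would otherwise do inside the graph.

```r
# Hypothetical helper, not an mlr3 function: coerce each factor column of
# `newdata` to the levels observed in `train`. Values that are not among the
# training levels (including plain character values) become NA.
align_factor_levels = function(newdata, train) {
  factor_cols = names(train)[vapply(train, is.factor, logical(1))]
  for (col in intersect(factor_cols, names(newdata))) {
    newdata[[col]] = factor(
      as.character(newdata[[col]]),
      levels = levels(train[[col]]),
      ordered = is.ordered(train[[col]])
    )
  }
  newdata
}

training_data = iris
prediction_data = transform(iris[1, ], Species = "NewSpecies")

# Species is a character column here; after alignment it is a factor with
# the three training levels and an NA in place of "NewSpecies".
aligned = align_factor_levels(prediction_data, training_data)
str(aligned$Species)
```

The aligned data should then pass the column check in model$predict_newdata(aligned), assuming the downstream learner tolerates missing values. This is exactly the kind of out-of-graph preprocessing the question above hopes to avoid.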
