Description
Assume a task has a factor predictor, and only a subset of its factor levels is known when the model is trained, so the data to be scored can contain new factor levels. mlr3pipelines offers po("fixfactors") to handle such situations, and one can add arbitrary preprocessing steps to make the pipeline even more robust against such edge cases.
However, because assert_predictable() is called before any preprocessing transformations are applied to the newdata argument in $predict_newdata(), one has to prepare the prediction data set manually, which goes against the general concept of mlr3, and especially of mlr3pipelines.
Minimal example:
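library(mlr3)
library(mlr3pipelines)
training_data = iris
# Prediction row whose Species value was never seen during training
prediction_data = transform(iris[1, ], Species = "NewSpecies")
tsk = as_task_regr(training_data, target = "Sepal.Length")
# fixfactors is part of the graph, so it should handle the new level at predict time
model = as_learner(po("fixfactors") %>>% lrn("regr.featureless"))
model$train(tsk)
model$predict_newdata(prediction_data)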
# Error: Learner 'fixfactors.regr.featureless' received task with different column info (feature type or factor level ordering) during train and predict.
What is the current idiomatic way to avoid the strict check of column types, and especially of factor levels? Or do we really have to move the required preprocessing steps out of the learner graph and apply the transformations to the prediction data before passing it to model$predict_newdata(), just to please mlr3::assert_predictable()?
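For reference, the manual workaround I would like to avoid looks roughly like the sketch below. It uses base R only; `align_factor_levels` is a hypothetical helper name (not part of mlr3 or mlr3pipelines), and it assumes that mapping unseen levels to NA is acceptable for the downstream learner.

```r
# Hypothetical helper (not an mlr3 function): coerce every column that is a
# factor in the training data to a factor with exactly the training levels.
# Values that were not seen during training become NA.
align_factor_levels = function(newdata, reference) {
  for (col in intersect(names(newdata), names(reference))) {
    if (is.factor(reference[[col]])) {
      newdata[[col]] = factor(
        as.character(newdata[[col]]),
        levels = levels(reference[[col]])
      )
    }
  }
  newdata
}

training_data = iris
prediction_data = transform(iris[1, ], Species = "NewSpecies")

# Species is now a factor with the training levels; "NewSpecies" is NA
prepared = align_factor_levels(prediction_data, training_data)
```

Passing `prepared` instead of `prediction_data` to model$predict_newdata() should then satisfy the column check, as far as I can tell, but it duplicates outside the graph what po("fixfactors") is supposed to do inside it.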