Description
Simple example:
library(mlr3)
library(mlr3cluster)
library(mlr3pipelines)
task <- tsk("usarrests")
kmeans_centers <- lapply(1:10, function(x) po("scale") %>>% lrn("clust.kmeans", centers = x))
design <- benchmark_grid(
  tasks = task,
  learners = kmeans_centers,
  resamplings = rsmp("insample")
)
bmr <- benchmark(design)
bmr$score(msr("clust.wss"))$clust.wss
returns output like:
[1] 355807.82 114846.81 81862.19 79208.07 70152.06 68255.12 68148.43 63241.63 54304.11 43632.32
These WSS values are far too large to have been computed on scaled features.
The cause is in MeasureClustInternal, which computes the criterion from the features of the "raw" task, without any of the pipeline's preprocessing applied.
I think this is probably an issue specific to mlr3cluster, since all other measures depend only on the predictions...?
mlr3cluster/R/MeasureClustInternal.R
Lines 22 to 30 in 23b3bef
private = list(
  .score = function(prediction, task, ...) {
    X = as.matrix(task$data(rows = prediction$row_ids))
    if (!is.double(X)) { # clusterCrit does not convert lgls/ints
      storage.mode(X) = "double"
    }
    intCriteria(X, prediction$partition, self$crit)[[1L]]
  }
)
This could be avoided if there were a generic way to access the preprocessed task inside the pipeline.
In that case, the measure could take the task from the learner itself instead of using the task passed to the function.
The problem is that when I inspect the state of a trained pipeline, the stored preprocessed tasks are empty...
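Until the measure can see the pipeline's preprocessed task, one possible workaround (a sketch, not something proposed in the original report) is to run `po("scale")` once outside the pipeline and benchmark plain k-means learners on the already-scaled task, so `MeasureClustInternal` reads scaled features. This assumes `insample` resampling as in the example above; with resamplings that hold out data, scaling up front would leak test-set statistics into training.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3pipelines)

task <- tsk("usarrests")

# PipeOp$train() takes a list of inputs and returns a list of outputs;
# the first element is the scaled copy of the task.
task_scaled <- po("scale")$train(list(task))[[1L]]

# Plain k-means learners: no preprocessing inside the learner any more
learners <- lapply(1:10, function(k) lrn("clust.kmeans", centers = k))

design <- benchmark_grid(
  tasks = task_scaled,
  learners = learners,
  resamplings = rsmp("insample")
)

set.seed(1)
bmr <- benchmark(design)
wss_scaled <- bmr$score(msr("clust.wss"))$clust.wss
wss_scaled
```

This keeps the measure code unchanged, at the cost of losing the "preprocessing as part of the learner" property that the pipeline gives you.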