Measures that rely on Tasks do not work for Pipelines #13

@henrifnk

Simple Example:

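library(mlr3)
library(mlr3cluster)
library(mlr3pipelines)

task = tsk("usarrests")
# one scale -> k-means pipeline per number of centers (1 to 10)
kmeans_centers = lapply(1:10, function(x) po("scale") %>>% lrn("clust.kmeans", centers = x))
design = benchmark_grid(
  tasks = task,
  learners = kmeans_centers,
  resamplings = rsmp("insample")
)
bmr = benchmark(design)
# within-cluster sum of squares for each of the 10 pipelines
bmr$score(msr("clust.wss"))$clust.wss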

produces output like:

[1] 355807.82 114846.81 81862.19 79208.07 70152.06 68255.12 68148.43 63241.63 54304.11 43632.32

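These WSS values are obviously far too large to have been computed on the scaled features. After standardizing 50 rows and 4 features, the total sum of squares is (50 - 1) * 4 = 196, so no WSS on scaled data can exceed that.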
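As a sanity check, here is a small base-R sketch of what the numbers should roughly look like. It does not go through mlr3 at all; it assumes that tsk("usarrests") wraps the built-in USArrests data and that po("scale") standardizes the same way scale() does. The variable names are only for illustration.

```r
# Assumption: tsk("usarrests") wraps the built-in USArrests data (50 rows, 4 columns).
X_raw = as.matrix(USArrests)
X_scaled = scale(X_raw)  # center and scale each column to unit variance

# Total sum of squares around the column means, i.e. the WSS of a single cluster
tss_raw = sum(scale(X_raw, scale = FALSE)^2)
tss_scaled = sum(X_scaled^2)  # (n - 1) * p = 49 * 4 = 196

# k-means on the scaled data: the WSS can never exceed tss_scaled
set.seed(1)
wss_scaled = sapply(1:10, function(k) kmeans(X_scaled, centers = k)$tot.withinss)

print(c(raw = tss_raw, scaled = tss_scaled))
print(round(wss_scaled, 2))
```

The raw total sum of squares should line up with the first value reported by the benchmark, which would confirm that the measure is being evaluated on the unscaled features.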

The problem is in MeasureClustInternal, which computes the criterion on the features of the "raw" task, without any of the pipeline's preprocessing applied.
I think this is probably an issue that only mlr3cluster suffers from, since all other measures depend only on the predictions ...?

private = list(
  .score = function(prediction, task, ...) {
    X = as.matrix(task$data(rows = prediction$row_ids))
    if (!is.double(X)) { # clusterCrit does not convert lgls/ints
      storage.mode(X) = "double"
    }
    intCriteria(X, prediction$partition, self$crit)[[1L]]
  }
)

This could be avoided if there were some generic way to access the preprocessed task inside the pipeline.
In that case, the function could take the learner itself instead of the task.
The problem is that when I inspect the state of a trained pipeline, the stored preprocessed tasks are empty...
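In the meantime, one possible workaround (only a sketch, and it sidesteps the pipeline rather than fixing the measure) is to apply the preprocessing to the task up front and pass the preprocessed task to the measure explicitly. This assumes the preprocessing can be run outside the learner; for in-sample evaluation that should be equivalent to scaling inside the pipeline. The names task_scaled and wss are made up for this example.

```r
library(mlr3)
library(mlr3cluster)
library(mlr3pipelines)

task = tsk("usarrests")

# Train the scaling PipeOp on its own and take the preprocessed task it returns
task_scaled = po("scale")$train(list(task))[[1]]

# Fit plain k-means on the scaled task (3 centers chosen arbitrarily)
learner = lrn("clust.kmeans", centers = 3)
learner$train(task_scaled)
prediction = learner$predict(task_scaled)

# Score against the scaled task, so the measure sees the same features as the learner
wss = prediction$score(msr("clust.wss"), task = task_scaled)
print(wss)
```

With this setup the WSS is computed on the same scaled features the learner was trained on, so it stays below the scaled total sum of squares of 196.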
