Reproducibility Issue With Parallel Processing? #22

Description

@dcbarnard

Here is code where I would have expected the aggregate results at the end to be identical for two identical benchmarks, but they are not. Since I am only an intermediate-level R coder, perhaps there is something wrong with my code; in any event, I am passing this along for your consideration as a possible issue in mlr3automl. As you can imagine, the code takes a while to execute (~10 minutes on my iMac Pro).

#############################################################
# Cross-validating the regression learners
#############################################################

library("doFuture")
library("doRNG")
library("future")
library("future.apply")
library("mlr3verse")
library("mlr3automl")
library("mlr3hyperband")

# set logger thresholds

lgr::get_logger("mlr3")$set_threshold("error")
lgr::get_logger("bbotk")$set_threshold("error")

# specify regression learners

learners = list(
  lrn(
    "regr.featureless",
    id = "fl"
  ),
  lrn(
    "regr.lm",
    id = "lm"
  ),
  lrn(
    "regr.cv_glmnet",
    id = "glm"
  ),
  lrn(
    "regr.ranger",
    id = "rf"
  ),
  lrn(
    "regr.xgboost",
    id = "xgb"
  ),
  lrn(
    "regr.svm",
    id = "svm"
  )
)

learner_ids = sapply(
  learners,
  function(x) x$id
)

# define regression task

task = tsk("boston_housing")

# select small subset of features

task$select(c("age", "crim", "lat", "lon"))

# specify resampling

resampling = rsmp("cv")

# specify measure

measure = msr("regr.mse")

# autotuners for models with hyperparameters

learners[[3]] = create_autotuner(
  learner = lrn("regr.cv_glmnet"),
  tuner = tnr("hyperband")
)

learners[[4]] = create_autotuner(
  learner = lrn("regr.ranger"),
  tuner = tnr("hyperband"),
  num_effective_vars = length(
    task$feature_names
  )
)

learners[[5]] = create_autotuner(
  learner = lrn("regr.xgboost"),
  tuner = tnr("hyperband")
)

learners[[6]] = create_autotuner(
  learner = lrn("regr.svm"),
  tuner = tnr("hyperband")
)

# create benchmark grid

design = benchmark_grid(
  tasks = task,
  learners = learners,
  resamplings = resampling
)

# start parallel processing

registerDoFuture()
plan(multisession, workers = availableCores() - 1)
registerDoRNG(123456)

# execute benchmark

bmr1 = mlr3::benchmark(design)

# terminate parallel processing

plan(sequential)

# start parallel processing

registerDoFuture()
plan(multisession, workers = availableCores() - 1)
registerDoRNG(123456)

# execute benchmark

bmr2 = mlr3::benchmark(design)

# terminate parallel processing

plan(sequential)

# test for reproducibility

bmr1$aggregate(measure)$regr.mse == bmr2$aggregate(measure)$regr.mse

Here are a few interesting clues. If I run this code several times, the end result is the same each time (i.e., the same mix of TRUE and FALSE results for the stochastic learners). But if I run the code in R and then run the same code in RStudio, I get a different mix of TRUE and FALSE results in the two environments. Finally, if I substitute a different dataset, the mix of TRUE and FALSE results changes again.
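In case it helps, here is a rough sketch of how I would inspect the differences further (this is not part of the run above, and it assumes the default columns that aggregate() and score() return for this design): it compares the per-learner aggregates of the two runs with a numeric tolerance, and then compares the per-fold scores, to see whether the mismatches are floating-point noise or genuinely different models.

# compare aggregates per learner with a tolerance instead of strict equality

agg1 = bmr1$aggregate(measure)
agg2 = bmr2$aggregate(measure)

data.frame(
  learner = agg1$learner_id,
  mse1 = agg1$regr.mse,
  mse2 = agg2$regr.mse,
  near = mapply(
    function(a, b) isTRUE(all.equal(a, b)),
    agg1$regr.mse,
    agg2$regr.mse
  )
)

# drill down to per-fold scores to see which resampling iterations diverge

s1 = bmr1$score(measure)
s2 = bmr2$score(measure)

merge(
  s1[, c("learner_id", "iteration", "regr.mse")],
  s2[, c("learner_id", "iteration", "regr.mse")],
  by = c("learner_id", "iteration"),
  suffixes = c("_run1", "_run2")
)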
