Description
I'm looking for a good tool to benchmark ELKI ( http://elki.dbs.ifi.lmu.de/ ) clustering performance across parameters.
The problem is that the parameters aren't as nicely uniform as in your examples, and they have strong interdependencies.
The most interesting parameter is obviously the clustering algorithm. Say I'm looking only at k-means and DBSCAN for this example (there are many more in ELKI, which is why I could use benchmarking tool support).
- k-means has the key parameters "kmeans.k" (the number of clusters) and the initialization method. Randomized initialization methods will also have a seed parameter, to fix the random seed.
- DBSCAN has the key parameters distance function, radius epsilon (which depends a lot on the distance function), and minPts, which interacts with the radius: a larger radius will need a larger minPts.
The big challenge here is the dependencies between the parameters. The simplest one is that the "k" parameter only exists for k-means, whereas for DBSCAN one needs to choose the distance function, minPts, and epsilon. But then, there are also k-means initialization heuristics that have parameters of their own, such as the random seed...
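To make the dependencies concrete, here is a minimal Python sketch of the kind of conditional parameter space I mean. It is plain Python, not 3x's input format. Apart from `kmeans.k`, every key (`algorithm`, `init`, `seed`, `distance`, `epsilon`, `minPts`) and every value is a placeholder I made up for illustration, not an actual ELKI option name or a recommended setting.

```python
from itertools import product


def grid(params):
    """Yield the Cartesian product of a dict mapping name -> list of values."""
    names = list(params)
    for values in product(*(params[name] for name in names)):
        yield dict(zip(names, values))


def configurations():
    """Yield one flat dict per valid configuration (hypothetical names and values)."""
    # k-means branch: "kmeans.k" and the initialization method always exist,
    # but a seed only exists when the initialization is randomized.
    for base in grid({"kmeans.k": [2, 5, 10], "init": ["random", "first-k"]}):
        seeds = [0, 1, 2] if base["init"] == "random" else [None]
        for seed in seeds:
            cfg = {"algorithm": "kmeans", **base}
            if seed is not None:
                cfg["seed"] = seed
            yield cfg

    # DBSCAN branch: sensible epsilon values depend on the distance function,
    # and the minPts candidates grow with epsilon.
    dbscan_space = {
        "euclidean": [(0.1, [5, 10]), (0.5, [10, 20])],
        "cosine": [(0.01, [5, 10]), (0.05, [10, 20])],
    }
    for distance, eps_choices in dbscan_space.items():
        for epsilon, min_pts_values in eps_choices:
            for min_pts in min_pts_values:
                yield {
                    "algorithm": "dbscan",
                    "distance": distance,
                    "epsilon": epsilon,
                    "minPts": min_pts,
                }


if __name__ == "__main__":
    for cfg in configurations():
        print(cfg)
```

A full Cartesian product over all of these names would mostly produce meaningless runs (a DBSCAN run with a `kmeans.k`, a deterministic initialization with a seed), so only the valid combinations are enumerated here.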
Will 3x be able to handle such complex cases?