
KMeansTrainer.Options NumberOfThreads has no effect? Low CPU load during .Fit() #7477

@mikeKuester

Description


System Information:

  • Windows 11 22H2
  • ML.NET Version: Microsoft.ML 4.0.2
  • .NET Version: .NET 9.0

Describe the bug
I have a large dataset with approx. 500,000 spectras with 200 data points each. I use the kMeansTrainer for the clustering. It works - but during the "training" (which takes ~75 s) ...

var model = pipeline.Fit(trainingData);

... the CPU load is less than 20% on an i7-11850H @ 2.5 GHz with 16 logical threads. If I read the ML.NET code correctly, it allows at most 16 / 2 = 8 threads. But setting NumberOfThreads in the trainer options to anything from 1 to 8 doesn't change anything.
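The 16 / 2 = 8 cap mentioned above is my reading of the ML.NET source, not a documented guarantee; a minimal sketch of that assumed computation (class and variable names are illustrative):

```csharp
using System;

class ThreadCapCheck
{
    static void Main()
    {
        // Assumption from reading the ML.NET source: the effective
        // training concurrency appears to be capped at ProcessorCount / 2.
        int logicalCores = Environment.ProcessorCount; // 16 on the machine above
        int effectiveCap = Math.Max(1, logicalCores / 2);
        Console.WriteLine($"Expected thread cap: {effectiveCap}");
    }
}
```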

var options = new KMeansTrainer.Options
{
    NumberOfClusters = numberOfClusters,
    // InitializationAlgorithm = KMeansTrainer.InitializationAlgorithm.KMeansYinyang,
    // OptimizationTolerance = 1e-7f, // default 1e-7f
    NumberOfThreads = 8,  // default 8
    // MaximumNumberOfIterations = 1000
};
var pipeline = mlContext.Clustering.Trainers.KMeans(options);

The elapsed time doesn't change, and the CPU load stays low.

If I reduce, e.g., MaximumNumberOfIterations or OptimizationTolerance, I can see an effect on the training time (and on the results).

For the normalization of the data I use Parallel.For loops, and during these loops I can clearly see that all threads ramp up to ~80% load for a short time.
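For comparison, a self-contained sketch of the kind of normalization step that does saturate the cores (the data shape matches the description above; the max-abs normalization and all names are illustrative stand-ins, not the actual app code):

```csharp
using System;
using System.Threading.Tasks;

class NormalizeDemo
{
    static void Main()
    {
        // Illustrative stand-in for the real data: 500,000 spectra x 200 points.
        var rng = new Random(0);
        float[][] spectra = new float[500_000][];
        for (int i = 0; i < spectra.Length; i++)
        {
            spectra[i] = new float[200];
            for (int j = 0; j < 200; j++)
                spectra[i][j] = (float)rng.NextDouble();
        }

        // Max-abs normalization per spectrum; the rows are independent,
        // so Parallel.For spreads the work across all available threads.
        Parallel.For(0, spectra.Length, i =>
        {
            float max = 0f;
            foreach (float v in spectra[i])
                max = Math.Max(max, Math.Abs(v));
            if (max > 0f)
                for (int j = 0; j < spectra[i].Length; j++)
                    spectra[i][j] /= max;
        });
    }
}
```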

If I dig into the code: in KMeansLloydsYinYangTrain.Train, the chunks should be processed in parallel in every iteration, right? With the maximum of 8 threads selected, I would expect 8 threads running at full load.

Expected behavior
I expect the KMeans training to use the maximum allowed CPU power in order to reduce the training time.

Additional context
We compared our C# app with MATLAB's kmeans on the same data: ML.NET is faster than the MATLAB implementation, but if the CPU is 80% idle during training, it could be even faster. ;)

Thanks,
Mike
