GPU support - Design feedback requested #310
Hi Lorenzo,
Ok, PR submitted as requested, but again this code is nowhere close to ready to merge. However, I could use some design guidance if possible. I'm fairly confident I have this right, but not 100% certain. It turns out GPU acceleration is rather intricate, with various design considerations and trade-offs.
To walk through how a typical run will work: first, as you can see in /src/gpu/models/logistic_regression/gradient_descent.rs, there's the TryFrom<> trait implementation.
Then within the fit() function of LogisticRegression I will add something like:
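(Rough sketch only, to show the shape of it; `GpuRunnable`, `GpuWorker` and `GPU_MIN_ROWS` are placeholder names rather than the exact code on the branch:)

```rust
// Inside LogisticRegression::fit(), before the existing CPU solver runs.
// Placeholder names throughout: GpuRunnable is the trait sketched further down,
// GpuWorker drives the device, GPU_MIN_ROWS is the "worth shipping to the GPU" cutoff.
if let Ok(gpu_algorithm) = Box::<dyn GpuRunnable>::try_from(&parameters) {
    if x.shape().0 >= GPU_MIN_ROWS {
        // Large enough problem: let the GPU worker run the whole fit and hand
        // back the exact same result type the CPU path would produce.
        return GpuWorker::new(gpu_algorithm).fit(x, y, parameters);
    }
}
// Otherwise fall through to the existing CPU implementation below.
```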
There will be a similar small code block for each supported algorithm, and those are really the only modifications to the actual smartcore code base. The public API is going to remain exactly the same.
As long as the TryFrom<> implementation converts whatever we throw at it into a Box (meaning the algorithm supports GPU acceleration), and the number of rows / samples is large enough, a GpuWorker struct is simply instantiated; it handles everything and returns exactly the same result the smartcore crate expects.
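To put that contract in code, it's roughly the following (a self-contained sketch with made-up names; `GpuRunnable`, `LrSolver`, `LrParams` and `GpuGradientDescent` stand in for the real trait and smartcore parameter types, so don't read too much into the details):

```rust
use std::convert::TryFrom;

/// Stand-in for whatever trait GpuWorker drives; the real one lives under /src/gpu/.
pub trait GpuRunnable {
    /// Run the whole fit on the GPU and return the flattened coefficients.
    fn run(&self, x: &[f32], y: &[f32]) -> Vec<f32>;
}

/// Placeholder parameter types (the real ones are smartcore's).
pub enum LrSolver {
    GradientDescent,
    Lbfgs,
}

pub struct LrParams {
    pub solver: LrSolver,
    pub max_iter: usize,
}

/// Placeholder for the struct in /src/gpu/models/logistic_regression/gradient_descent.rs.
pub struct GpuGradientDescent {
    max_iter: usize,
}

impl GpuRunnable for GpuGradientDescent {
    fn run(&self, _x: &[f32], _y: &[f32]) -> Vec<f32> {
        // ... upload buffers, dispatch the compute shader, read coefficients back ...
        Vec::new()
    }
}

// The contract: conversion succeeds only when this algorithm / solver combination
// has a GPU implementation; an Err just means the caller stays on the CPU path.
impl TryFrom<&LrParams> for Box<dyn GpuRunnable> {
    type Error = ();

    fn try_from(params: &LrParams) -> Result<Self, Self::Error> {
        match params.solver {
            LrSolver::GradientDescent => Ok(Box::new(GpuGradientDescent {
                max_iter: params.max_iter,
            })),
            LrSolver::Lbfgs => Err(()),
        }
    }
}
```

The important part is only that a successful conversion is the signal that a GPU path exists for that solver.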
I think that should work, right?
Then there's the GpuStation struct, which I think I got right... it's a global static that caches / lazy-loads the various GPU resources, which I believe I've grouped together correctly. It only supports one GPU device for now; I figured I'd get one working first, then add multi-device support later if needed.
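The lazy global looks roughly like this (just a sketch; `OnceLock` is one way to do it, and the fields are placeholders for the real device / queue handles):

```rust
use std::sync::{Mutex, OnceLock};

// Rough shape only; the real struct holds the actual GPU handles (adapter, device,
// queue, buffer pools) plus the per-algorithm workgroup cache shown further down.
pub struct GpuStation {
    pub device_index: usize, // single device for now
    pub adapter_name: String,
}

static GPU_STATION: OnceLock<Mutex<GpuStation>> = OnceLock::new();

// First call pays for device discovery / initialisation; every later fit() call
// just borrows the same cached resources.
pub fn gpu_station() -> &'static Mutex<GpuStation> {
    GPU_STATION.get_or_init(|| {
        Mutex::new(GpuStation {
            device_index: 0,
            adapter_name: String::from("default adapter"),
        })
    })
}
```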
One somewhat unfortunate thing is that there's no real way to keep the model architectures and solvers separate like in the smartcore design. That separation just isn't going to work for GPU acceleration.
Then there are just a bunch of other little design trade-offs throughout. For example, I don't like that GpuAlgorithm enum in there, as it seems like a dirty, lost orphan enum to me, but it's actually kind of required. We need some kind of hashable unique identifier for each algorithm, because it's used as the hash map key for the workgroups property of GpuStation. And when you start trying to use a trait object as a hash map key wrapped in a mutex, things get really messy really quickly.
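To make that concrete, the reason the enum earns its keep is roughly this (sketch with placeholder names; the point is just that a fieldless enum derives `Eq + Hash` for free, while a boxed trait object key would need hand-written `Hash`/`PartialEq` over `dyn Trait`):

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// One variant per GPU-capable algorithm; the derives are what make it usable
// as a HashMap key behind a Mutex without any ceremony.
#[derive(Clone, Copy, PartialEq, Eq, Hash, Debug)]
pub enum GpuAlgorithm {
    LogisticRegressionGd,
    LinearRegressionGd,
}

// Placeholder for whatever compiled shader / pipeline state gets cached per algorithm.
pub struct Workgroup {
    pub label: &'static str,
}

// Shape of the workgroups lookup on GpuStation: compile on first request, reuse after.
pub fn get_or_compile(
    cache: &Mutex<HashMap<GpuAlgorithm, Workgroup>>,
    algo: GpuAlgorithm,
) -> &'static str {
    let mut map = cache.lock().unwrap();
    map.entry(algo)
        .or_insert_with(|| Workgroup { label: "compiled on first use" })
        .label
}
```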
Anyway, I think I'm on the right track, but if I could get some extra eyes on this it would be greatly appreciated. Absolutely no rush, as again I'm going to pivot off this for a couple weeks and get the next version of my NLU engine out. I will be back to finish this though.
Cheers,
Matt