Autoscaling Llama.cpp server clusters in GPU spot instances. #6764
jboero started this conversation in Show and tell
Replies: 2 comments · 4 replies
-
Great, please have a look at: I believe the Helm approach is more flexible, but we can also introduce a Terraform example.
-
How does the K8s sample work with GPUs, though? Each autoscaled VM here adds another GPU (or set of GPUs) to the cluster. As I understand it, most managed-Kubernetes cloud offerings don't support GPUs or customizing the kernel/drivers either.
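For reference, GKE does support attaching GPUs to dedicated node pools. The sketch below is hypothetical and not taken from any sample in this thread; the cluster name `llama-cluster`, the pool name, and the accelerator type are assumptions, but the `google_container_node_pool` resource shape follows the Terraform Google provider.

```hcl
# Hypothetical sketch: a spot GPU node pool on GKE, attached to an
# assumed pre-existing cluster named "llama-cluster".
resource "google_container_node_pool" "llama_gpu_pool" {
  name       = "llama-gpu-pool"
  cluster    = "llama-cluster"   # assumption: cluster created elsewhere
  location   = "us-central1-a"
  node_count = 1

  # Let GKE add/remove GPU nodes as pods demand them.
  autoscaling {
    min_node_count = 0
    max_node_count = 4
  }

  node_config {
    machine_type = "n1-standard-8"
    spot         = true           # spot pricing for the whole pool

    # One NVIDIA T4 per node; workloads request it via
    # the "nvidia.com/gpu" resource in their pod spec.
    guest_accelerator {
      type  = "nvidia-tesla-t4"
      count = 1
    }
  }
}
```

With a pool like this, GKE handles the NVIDIA driver installation on its GPU node images, which sidesteps the kernel/driver customization concern for that particular platform.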
-
I wrote up a Terraform module PoC for autoscaling clusters of Llama.cpp server on GCP spot instances with GPUs. The autoscaling is a bit overzealous, but it tends to work pretty well. If anyone is curious to try it, I'd love to hear feedback.
https://github.com/jboero/terraform-google-llama-autoscale
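Not an excerpt from the linked module, but a minimal sketch of the core pattern it describes, with all resource names, machine types, and thresholds assumed: a preemptible (spot) instance template with an attached GPU, wrapped in a managed instance group driven by a CPU-utilization autoscaler.

```hcl
# Hypothetical sketch -- names and values are illustrative,
# not taken from the linked repository.
resource "google_compute_instance_template" "llama_template" {
  name_prefix  = "llama-"
  machine_type = "n1-standard-8"

  scheduling {
    preemptible         = true         # spot pricing
    automatic_restart   = false        # required for preemptible VMs
    on_host_maintenance = "TERMINATE"  # required when a GPU is attached
  }

  guest_accelerator {
    type  = "nvidia-tesla-t4"
    count = 1
  }

  disk {
    source_image = "projects/debian-cloud/global/images/family/debian-12"
    boot         = true
  }

  network_interface {
    network = "default"
  }
}

resource "google_compute_instance_group_manager" "llama_group" {
  name               = "llama-servers"
  base_instance_name = "llama"
  zone               = "us-central1-a"

  version {
    instance_template = google_compute_instance_template.llama_template.id
  }
}

# Scale the group between 1 and 4 VMs on CPU load; each added VM
# brings its own GPU, which is how the cluster gains GPU capacity.
resource "google_compute_autoscaler" "llama_autoscaler" {
  name   = "llama-autoscaler"
  zone   = "us-central1-a"
  target = google_compute_instance_group_manager.llama_group.id

  autoscaling_policy {
    min_replicas = 1
    max_replicas = 4
    cpu_utilization {
      target = 0.6
    }
  }
}
```

A CPU-utilization target is a blunt signal for an inference workload, which may explain "overzealous" scaling; a custom metric (e.g. request queue depth) would be a natural refinement.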