Autoscaling Llama.cpp server clusters in GPU spot instances. #6764
jboero started this conversation in Show and tell
Replies: 2 comments · 4 replies
-
Great, please have a look at: I believe the Helm approach is more flexible, but we can also introduce a Terraform example.
-
How does the K8s sample work with GPUs, though? Each autoscaled VM here adds another GPU (or set of GPUs) to the cluster. As I understand it, most managed-Kubernetes cloud offerings don't support GPUs or customizing the kernel/drivers either.
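For reference, GKE does support attaching GPUs to dedicated node pools. The sketch below is hypothetical and not taken from any sample in this thread; the cluster name `llama-cluster`, the pool name, and the accelerator type are assumptions, but the `google_container_node_pool` resource shape follows the Terraform Google provider.

```hcl
# Hypothetical sketch: a spot GPU node pool on GKE, attached to an
# assumed pre-existing cluster named "llama-cluster".
resource "google_container_node_pool" "llama_gpu_pool" {
  name       = "llama-gpu-pool"
  cluster    = "llama-cluster"   # assumption: cluster created elsewhere
  location   = "us-central1-a"
  node_count = 1

  # Let GKE add/remove GPU nodes as pods demand them.
  autoscaling {
    min_node_count = 0
    max_node_count = 4
  }

  node_config {
    machine_type = "n1-standard-8"
    spot         = true           # spot pricing for the whole pool

    # One NVIDIA T4 per node; workloads request it via
    # the "nvidia.com/gpu" resource in their pod spec.
    guest_accelerator {
      type  = "nvidia-tesla-t4"
      count = 1
    }
  }
}
```

With a pool like this, GKE handles the NVIDIA driver installation on its GPU node images, which sidesteps the kernel/driver customization concern for that particular platform.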
-
I wrote up a Terraform module PoC for autoscaling clusters of Llama.cpp server on GCP spot instances with GPUs. The autoscaling is a bit overzealous, but it tends to work pretty well. If anyone is curious to try it, I'd love to hear feedback.
https://github.com/jboero/terraform-google-llama-autoscale
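Not an excerpt from the linked module, but a minimal sketch of the core pattern it describes, with all resource names, machine types, and thresholds assumed: a preemptible (spot) instance template with an attached GPU, wrapped in a managed instance group driven by a CPU-utilization autoscaler.

```hcl
# Hypothetical sketch -- names and values are illustrative,
# not taken from the linked repository.
resource "google_compute_instance_template" "llama_template" {
  name_prefix  = "llama-"
  machine_type = "n1-standard-8"

  scheduling {
    preemptible         = true         # spot pricing
    automatic_restart   = false        # required for preemptible VMs
    on_host_maintenance = "TERMINATE"  # required when a GPU is attached
  }

  guest_accelerator {
    type  = "nvidia-tesla-t4"
    count = 1
  }

  disk {
    source_image = "projects/debian-cloud/global/images/family/debian-12"
    boot         = true
  }

  network_interface {
    network = "default"
  }
}

resource "google_compute_instance_group_manager" "llama_group" {
  name               = "llama-servers"
  base_instance_name = "llama"
  zone               = "us-central1-a"

  version {
    instance_template = google_compute_instance_template.llama_template.id
  }
}

# Scale the group between 1 and 4 VMs on CPU load; each added VM
# brings its own GPU, which is how the cluster gains GPU capacity.
resource "google_compute_autoscaler" "llama_autoscaler" {
  name   = "llama-autoscaler"
  zone   = "us-central1-a"
  target = google_compute_instance_group_manager.llama_group.id

  autoscaling_policy {
    min_replicas = 1
    max_replicas = 4
    cpu_utilization {
      target = 0.6
    }
  }
}
```

A CPU-utilization target is a blunt signal for an inference workload, which may explain "overzealous" scaling; a custom metric (e.g. request queue depth) would be a natural refinement.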