A sample that demonstrates how to run a self-hosted LLM on AWS with NVIDIA GPUs in Kubox.
Kubox is an on-demand data platform designed to build and deploy analytics applications anywhere. It combines open-source Kubernetes with a customisable data infrastructure, making it easy to scale and manage complex data workloads. Kubox offers the simplicity of SaaS with the flexibility of PaaS, minimising overhead while providing a vendor-neutral data infrastructure.
https://docs.kubox.ai/introduction
Tip
Kubox is currently in its early-stage public preview and under active development. We’re continuously improving and refining the platform, so things may change as we grow. We welcome your feedback and suggestions to help shape the future of Kubox AI.
- Introduction
- Quickstart
- Installation and Setup
- Running The Application
- Local Development
- Contributing
- License
This example shows how to run a self-hosted Large Language Model (LLM) on Kubox, using [Ray] and [vLLM] to efficiently serve a Meta-Llama-3.1 model on an NVIDIA L4 GPU instance in AWS.
# 1. Install Kubox CLI
curl https://kubox.sh | sh
# 2. Clone this repository
git clone git@github.com:kubox-ai/chatbot.git && cd chatbot
# 3. Create your cluster
kubox create -f cluster.yaml
# 4. Deploy the chatbot application
kustomize build ./cluster/infrastructure/apps | kubectl apply -f -
To download, configure, and set up authentication for the AWS CLI, follow the instructions in the AWS documentation.
Run the following command to verify your AWS CLI credentials:
aws sts get-caller-identity
# Example output
{
"UserId": "AIDAIEXAMPLEID",
"Account": "123456789012",
"Arn": "arn:aws:iam::123456789012:user/example-user"
}
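When scripting the setup, it can help to pull the account ID out of that output. The following is a minimal sketch, assuming the JSON has the shape shown above; `account_from_identity` is a hypothetical helper name and is not part of the AWS CLI or Kubox.

```shell
# Hypothetical helper: read `aws sts get-caller-identity` JSON on stdin
# and print the value of the "Account" field.
account_from_identity() {
  sed -n 's/.*"Account"[[:space:]]*:[[:space:]]*"\([0-9]*\)".*/\1/p'
}

# Usage (requires configured AWS credentials):
#   aws sts get-caller-identity | account_from_identity
```

The AWS CLI can also do this directly with `aws sts get-caller-identity --query Account --output text`; the helper is only useful when you already have the JSON saved.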
Download and install Kubox CLI
curl https://kubox.sh | sh
Verify Installation
kubox version
The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can download it from Kubernetes.io.
Kustomize introduces a template-free way to customize application configuration that simplifies the use of off-the-shelf applications. You can download it from kustomize.io.
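Before moving on, you may want to confirm that every tool mentioned so far is on your `PATH`. A minimal sketch; `check_tools` is a hypothetical helper written for this README, not a Kubox command.

```shell
# Report any required command-line tools that are not on PATH.
# Returns 0 if all are present, 1 otherwise.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools aws kubox kubectl kustomize
```

The function only checks that each command exists; it does not verify versions.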
git clone git@github.com:kubox-ai/chatbot.git
cd chatbot
AWS quotas for two r6i.2xlarge instances and one g6.4xlarge instance.
Using a GPU requires an AMI in your desired AWS region. If you are using the ap-southeast-2 region, a public AMI, ami-0db4a0fc42c49f8c6, is available and is configured by default in the cluster.yaml file. To use GPU instances in other AWS regions, you need to create a custom AMI in that region and update the cluster.yaml file accordingly. For more information, see Creating GPU Amazon Machine Image (AMI).
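The region rule above can be expressed as a small guard to run before creating the cluster. This is a sketch that encodes only what this README states (one public AMI, in ap-southeast-2); `gpu_ami_for_region` is a hypothetical helper name.

```shell
# Print the public GPU AMI for a region, based only on what this README states.
# ap-southeast-2 has a published AMI; any other region needs a custom AMI.
gpu_ami_for_region() {
  case "$1" in
    ap-southeast-2)
      echo "ami-0db4a0fc42c49f8c6"
      ;;
    *)
      echo "no public GPU AMI listed for region '$1'; build a custom AMI and update cluster.yaml" >&2
      return 1
      ;;
  esac
}

# Usage, assuming a default region is set in your AWS CLI profile:
#   gpu_ami_for_region "$(aws configure get region)"
```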
kubox create -f cluster.yaml
export KUBECONFIG=./cluster/config/kubeconfig
kustomize build ./cluster/infrastructure/apps | kubectl apply -f -
kubectl get pods -n kubox
Note
Currently, it takes about 10 minutes to start the GPU containers, download the ML models, and serve the REST endpoints.
NAME READY STATUS RESTARTS AGE
chat-service-raycluster-j6spk-head-bgd89 1/1 Running 0 6m33s
chat-service-raycluster-j6spk-small-group-worker-77kz6 1/1 Running 0 6m33s
kuberay-operator-5dd6779f94-bqnn8 1/1 Running 0 66m
open-webui-557b8c6c-scksq 1/1 Running 0 66m
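Because startup takes several minutes, it can be convenient to filter that listing down to the pods that are still starting. A minimal sketch, assuming the default column layout shown above; `pods_not_ready` is a hypothetical helper, not part of kubectl or Kubox.

```shell
# Read `kubectl get pods` output on stdin and print the names of pods that
# are not Running or whose READY count (e.g. 1/1) is incomplete.
pods_not_ready() {
  awk 'NR > 1 {
    split($2, ready, "/")
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}

# Usage:
#   kubectl get pods -n kubox | pods_not_ready
```

An empty result means every listed pod is running and ready. As an alternative, `kubectl wait --for=condition=Ready pods --all -n kubox --timeout=900s` blocks until the pods are ready or the timeout expires.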
Forward the open-webui service to port 8080 on your local machine to launch the web UI.
kubectl port-forward -n kubox svc/open-webui 8080:80
You can now access the web UI at http://localhost:8080
In a separate terminal, forward the chat-service-head-svc service to port 8265 on your local machine.
kubectl port-forward -n kubox svc/chat-service-head-svc 8265:8265
You can now access the Ray dashboard at http://localhost:8265
kubox delete -f cluster.yaml
- UV - An extremely fast Python package and project manager
Create virtual environment
uv venv --python 3.11.9
Activate the virtual environment
source .venv/bin/activate
Install dependencies
uv sync
serve run deployment.yaml
You can now access the Ray dashboard at http://localhost:8265
Tip
If you run into issues, see the troubleshooting guide.
We 💜 contributions from the community!
Whether it's a bug report, feature suggestion, documentation improvement, or code contribution — you are welcome.
- Open an Issue to report bugs or request features
- Submit a Pull Request for improvements
This repository is licensed under the Apache License 2.0. You are free to use, modify, and distribute this project under the terms of the license. See the LICENSE file for more details.
Thank you for using Kubox. Let's build something awesome together! 🚀