Self-hosted LLM in Kubox

A sample demonstrating how to run a self-hosted LLM on AWS with NVIDIA GPUs in Kubox.


[Screenshot: ChatBot UI]

Kubox

Kubox is an on-demand data platform designed to build and deploy analytics applications anywhere. It combines open-source Kubernetes with a customisable data infrastructure, making it easy to scale and manage complex data workloads. Kubox offers the simplicity of SaaS with the flexibility of PaaS, minimising overhead while providing a vendor-neutral data infrastructure.

https://docs.kubox.ai/introduction

Tip

Kubox is currently in its early-stage public preview and under active development. We’re continuously improving and refining the platform, so things may change as we grow. We welcome your feedback and suggestions to help shape the future of Kubox AI.

Introduction

This example shows how to run a self-hosted Large Language Model (LLM) on Kubox, using Ray and vLLM to efficiently serve a Meta-Llama-3.1 model on an NVIDIA L4 GPU instance in AWS.

System Architecture Overview

[Diagram: ChatBot Architecture]

Quickstart

# 1. Install Kubox CLI
curl https://kubox.sh | sh

# 2. Clone this repository
git clone git@github.com:kubox-ai/chatbot.git && cd chatbot

# 3. Create your cluster
kubox create -f cluster.yaml

# 4. Deploy the LLM chat app
kustomize build ./cluster/infrastructure/apps | kubectl apply -f -

Installation and Setup

Install AWS CLI

To download, configure, and set up authentication for the AWS CLI, follow the instructions in the AWS Documentation.

Run the following command to verify your AWS CLI credentials:

aws sts get-caller-identity
# Example output
{
    "UserId": "AIDAIEXAMPLEID",
    "Account": "123456789012",
    "Arn": "arn:aws:iam::123456789012:user/example-user"
}

Install Kubox CLI

Download and install Kubox CLI

curl https://kubox.sh | sh

Verify Installation

kubox version

Install kubectl

The Kubernetes command-line tool, kubectl, allows you to run commands against Kubernetes clusters. You can download it from Kubernetes.io.
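
As an example, on Linux (amd64) the Kubernetes documentation installs it like this; adjust the OS and architecture for your machine:

# Download the latest stable kubectl release (Linux amd64 shown)
curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"

# Install it into your PATH and verify
sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
kubectl version --client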

Install kustomize

Kustomize introduces a template-free way to customize application configuration that simplifies the use of off-the-shelf applications. You can download it from kustomize.io.
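
For example, the official install script fetches the latest release binary into the current directory (review the script before piping it into a shell):

# Download and run the official kustomize install script
curl -s "https://raw.githubusercontent.com/kubernetes-sigs/kustomize/master/hack/install_kustomize.sh" | bash

# Move the binary into your PATH and verify
sudo mv kustomize /usr/local/bin/
kustomize version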

Clone Repository

git clone git@github.com:kubox-ai/chatbot.git
cd chatbot

Check AWS Quotas for EC2 Instances

Ensure your AWS account has sufficient EC2 service quotas for two r6i.2xlarge instances and one g6.4xlarge instance.
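
If you want to confirm your limits from the command line, the Service Quotas API is one option. The quota codes below are the standard EC2 codes for On-Demand Standard instances (which covers r6i) and On-Demand G and VT instances (which covers g6), but confirm them in the Service Quotas console for your account and region:

# Running On-Demand Standard (A, C, D, H, I, M, R, T, Z) instances - covers r6i.2xlarge
aws service-quotas get-service-quota --service-code ec2 --quota-code L-1216C47A

# Running On-Demand G and VT instances - covers g6.4xlarge
aws service-quotas get-service-quota --service-code ec2 --quota-code L-DB2E81BA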

Creating a GPU Amazon Machine Image (AMI)

Using GPU instances requires an AMI in your desired AWS region. If you are using the ap-southeast-2 region, a public AMI (ami-0db4a0fc42c49f8c6) is available and configured by default in the cluster.yaml file. To use GPU instances in other AWS regions, you need to create a custom AMI in that region and update the cluster.yaml file accordingly. For more information, see Creating GPU Amazon Machine Image (AMI).
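
Before creating the cluster, you can verify the AMI is visible in your region with the AWS CLI:

# Confirm the default GPU AMI exists in ap-southeast-2
aws ec2 describe-images --region ap-southeast-2 --image-ids ami-0db4a0fc42c49f8c6 --query 'Images[0].State'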

Running The Application

Create Cluster

kubox create -f cluster.yaml

Deploy LLMs

export KUBECONFIG=./cluster/config/kubeconfig
kustomize build ./cluster/infrastructure/apps | kubectl apply -f -
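
Because the chat service runs on a Ray cluster managed by the KubeRay operator, you can also watch the RayCluster resource come up while the deployment progresses (a quick check, assuming the KubeRay CRDs were applied by the step above):

# Watch the Ray cluster until it reports a ready state
kubectl get raycluster -n kubox -w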

Verifying Kubox Cluster

kubectl get pods -n kubox

Note

Currently it takes about 10 minutes to start the GPU containers, download the ML models, and serve the REST endpoints.

NAME                                                     READY   STATUS    RESTARTS   AGE
chat-service-raycluster-j6spk-head-bgd89                 1/1     Running   0          6m33s
chat-service-raycluster-j6spk-small-group-worker-77kz6   1/1     Running   0          6m33s
kuberay-operator-5dd6779f94-bqnn8                        1/1     Running   0          66m
open-webui-557b8c6c-scksq                                1/1     Running   0          66m
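
If you would rather block until everything is up instead of polling, kubectl wait is one way to do it:

# Wait (up to 15 minutes) for every pod in the kubox namespace to become ready
kubectl wait --for=condition=Ready pods --all -n kubox --timeout=15m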

Launch the Web UI

Forward the open-webui service to your local machine’s port 8080 to launch the web UI.

kubectl port-forward -n kubox svc/open-webui 8080:80

Now you can access the GUI at http://localhost:8080

Verify Ray Service

In a separate terminal, forward the chat-service-head-svc service to your local machine’s port 8265.

kubectl port-forward -n kubox svc/chat-service-head-svc 8265:8265

Now you can access the GUI at http://localhost:8265
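
With the same port-forward in place, you can also confirm the application status without the GUI by querying the Ray Serve REST API exposed on the dashboard port:

# List the deployed Serve applications and their statuses
curl http://localhost:8265/api/serve/applications/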

[Screenshot: Ray Serve dashboard]

Tear Down

kubox delete -f cluster.yaml

Local Development

Software Pre-requisites

  • UV - An extremely fast Python package and project manager

Developing and Serving LLMs using Ray

Setup Virtual Environment

Create virtual environment

uv venv --python 3.11.9

Activate the virtual environment

source .venv/bin/activate

Install dependencies

uv sync

Start Ray Service

serve run deployment.yaml

Now you can access the GUI at http://localhost:8265
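
Once the service is up, you can exercise the OpenAI-compatible API that vLLM exposes. A minimal sketch, assuming Serve listens on the default port 8000; the model name below is illustrative, so check deployment.yaml for the actual model ID:

# Send a test chat completion (model name is an assumption - see deployment.yaml)
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "messages": [{"role": "user", "content": "Hello!"}]}'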

Tip

If you run into issues, see the troubleshooting guide here

Contributing

We 💜 contributions from the community!
Whether it's a bug report, feature suggestion, documentation improvement, or code contribution — you are welcome.

  • Open an Issue to report bugs or request features
  • Submit a Pull Request for improvements

License

This repository is licensed under the Apache License 2.0. You are free to use, modify, and distribute this project under the terms of the license. See the LICENSE file for more details.

Thank you for using Kubox. Let's build something awesome together! 🚀
