This repo hosts a Kubernetes operator that is responsible for creating and managing Llama Stack servers. Key features include:
- Automated deployment of Llama Stack servers
- Support for multiple distributions (including Ollama, vLLM, and others)
- Customizable server configurations
- Volume management for model storage
- Kubernetes-native resource management
You can install the operator directly from a released version or the latest main branch using `kubectl apply -f`.
To install the latest version from the main branch:
```bash
kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/main/release/operator.yaml
```
To install a specific released version (e.g., `v1.0.0`), replace `main` with the desired tag:
```bash
kubectl apply -f https://raw.githubusercontent.com/llamastack/llama-stack-k8s-operator/v1.0.0/release/operator.yaml
```
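To confirm the installation succeeded, you can check that the CRD and the operator pod exist. This is a minimal sketch: the CRD name is inferred from the CR's `apiVersion` and `kind`, and the grep avoids assuming which namespace the manifest creates.

```bash
# Confirm the CRD was registered (the plural name is an assumption based on the kind)
kubectl get crd llamastackdistributions.llamastack.io

# Find the operator pod without hard-coding its namespace
kubectl get pods -A | grep llama-stack-k8s-operator
```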
- Deploy an inference provider server (Ollama, vLLM, etc.).
- Create a `LlamaStackDistribution` CR to get the server running, for example:
  ```yaml
  apiVersion: llamastack.io/v1alpha1
  kind: LlamaStackDistribution
  metadata:
    name: llamastackdistribution-sample
    namespace: <your-namespace>
  spec:
    replicas: 1
    server:
      distribution:
        name: ollama
      containerSpec:
        port: 8321
        env:
          - name: INFERENCE_MODEL
            value: "meta-llama/Llama-3.2-3B-Instruct"
          - name: OLLAMA_URL
            value: "http://ollama-server-service.default.svc.cluster.local:11434"
      storage:
        size: "20Gi"
        mountPath: "/home/lls/.lls"
  ```
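  Save the manifest to a file (the filename here is arbitrary) and apply it:

  ```bash
  # The namespace referenced in metadata must already exist
  kubectl create namespace <your-namespace>
  kubectl apply -f llamastackdistribution-sample.yaml
  ```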
- Verify the server pod is running in the user-defined namespace.
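  A minimal check, assuming the CRD's plural resource name is `llamastackdistributions`:

  ```bash
  # The server pod should reach Running, and the CR should be listed
  kubectl get pods -n <your-namespace>
  kubectl get llamastackdistributions -n <your-namespace>
  ```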
Prerequisites

- Kubernetes cluster (v1.20 or later)
- Go 1.23
- operator-sdk v1.39.2 (v4 layout) or newer
- kubectl configured to access your cluster
- A running inference server:
  - For local development, you can use the provided script:

    ```bash
    ./hack/deploy-ollama.sh
    ```
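    If you use the script, a quick sanity check that the Ollama server came up; grepping across all namespaces avoids assuming which namespace the script creates:

    ```bash
    # The Ollama pod should reach Running before you create the CR
    kubectl get pods -A | grep ollama
    ```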
- A custom operator image can be built from your local repository:
  ```bash
  make image IMG=quay.io/<username>/llama-stack-k8s-operator:<custom-tag>
  ```

  The default image, `quay.io/llamastack/llama-stack-k8s-operator:latest`, is used when no `IMG` argument is supplied to `make image`.
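  If you are deploying to a remote cluster, push the image to your registry first (shown with `docker`; `podman` works the same way):

  ```bash
  # Assumes `make image` tagged the image locally with this name
  docker push quay.io/<username>/llama-stack-k8s-operator:<custom-tag>
  ```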
- Once the image is created, the operator can be deployed directly. For each deployment method, a kubeconfig should be exported:

  ```bash
  export KUBECONFIG=<path to kubeconfig>
  ```
Deploying operator locally
- Deploy the created image in your cluster using the following command:

  ```bash
  make deploy IMG=quay.io/<username>/llama-stack-k8s-operator:<custom-tag>
  ```
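  To watch the operator reconcile resources after deployment, you can tail its logs; the pod name and namespace below are placeholders, since the exact values depend on the generated manifests:

  ```bash
  # Locate the controller pod, then follow its logs
  kubectl get pods -A | grep llama-stack-k8s-operator
  kubectl logs -f <operator-pod-name> -n <operator-namespace>
  ```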
- To remove the resources created during installation, use:

  ```bash
  make undeploy
  ```
The operator includes end-to-end (E2E) tests that verify its complete functionality. To run them:
- Ensure you have a running Kubernetes cluster
- Run the E2E tests using one of the following commands:
  - If you want to deploy the operator and run the tests:

    ```bash
    make deploy test-e2e
    ```

  - If the operator is already deployed:

    ```bash
    make test-e2e
    ```
The `make` target will handle prerequisites, including deploying the Ollama server.
For details on the `LlamaStackDistribution` resource, please refer to the API documentation.