A kubectl plugin for deploying and managing AI/ML models using the Kubernetes AI Toolchain Operator (Kaito).
kubectl-kaito simplifies AI model deployment on Kubernetes by providing an intuitive command-line interface that abstracts away complex YAML configurations. Deploy, manage, and interact with large language models and other AI workloads with simple commands.
- **One-command deployment**: Deploy AI models with a single command that automatically provisions GPU nodes and configures the inference stack
- **Real-time monitoring**: Monitor workspace deployment status with real-time conditions, NodeClaim tracking, and detailed health checks
- **OpenAI-compatible APIs**: Interact with deployed models through an OpenAI-compatible chat interface with customizable system prompts
- **Model discovery**: Browse and discover Kaito pre-configured AI models with detailed specifications and GPU requirements
- **Seamless endpoint access**: Access inference endpoints automatically using the Kubernetes API proxy; it works anywhere kubectl works, with no manual setup
# List available models
kubectl kaito models list
# Deploy a model for inference
kubectl kaito deploy --workspace-name my-workspace \
--model phi-3.5-mini-instruct \
--instance-type Standard_NC6s_v3
# Check deployment status
kubectl kaito status --workspace-name my-workspace
# Get inference endpoint
kubectl kaito get-endpoint --workspace-name my-workspace
# Start interactive chat
kubectl kaito chat --workspace-name my-workspace
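Because the served API is OpenAI-compatible, any standard HTTP client can also talk to the model. A minimal sketch using curl, assuming the URL printed by `get-endpoint` is reachable from your machine (the endpoint below is a placeholder, and the `/v1/chat/completions` path assumes the usual OpenAI-style route):

```shell
# Placeholder - substitute the URL printed by `kubectl kaito get-endpoint`
ENDPOINT="http://localhost:8080"

# Standard OpenAI-style chat completion request
PAYLOAD='{"model": "phi-3.5-mini-instruct", "messages": [{"role": "user", "content": "Hello!"}]}'

curl -s "$ENDPOINT/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD" || echo "request failed; check that the endpoint is reachable"
```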
- Kubernetes cluster with GPU nodes
- Kaito operator installed in your cluster
- kubectl configured to access your cluster
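You can sanity-check these prerequisites before deploying. A quick sketch, assuming the Kaito operator registers its Workspace CRD under the `kaito.sh` API group (verify the group name against your installed Kaito version):

```shell
# Confirm kubectl can reach the cluster
kubectl cluster-info

# Confirm nodes are visible (GPU capacity depends on your provisioner)
kubectl get nodes -o wide

# Confirm the Kaito Workspace CRD is installed (assumes the kaito.sh API group)
kubectl get crd workspaces.kaito.sh
```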
Prerequisites: Install krew if you haven't already.
kubectl krew install kaito
# Get the script
curl -sO https://raw.githubusercontent.com/kaito-project/kaito-kubectl-plugin/refs/heads/main/hack/generate-krew-manifest.sh
# Generate the manifest for a specific version with real SHA256 values
export RELEASE_TAG=v0.1.1
chmod +x ./generate-krew-manifest.sh && ./generate-krew-manifest.sh $RELEASE_TAG
# Install the generated manifest
kubectl krew install --manifest=krew/kaito-$RELEASE_TAG.yaml
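To confirm krew picked up the plugin after installing from the manifest, list the installed plugins:

```shell
# `kaito` should appear among the krew-managed plugins
kubectl krew list
```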
kubectl kaito --help
# Deploy Phi-3.5 Mini for general inference
kubectl kaito deploy \
--workspace-name phi-workspace \
--model phi-3.5-mini-instruct \
--instance-type Standard_NC6s_v3
# Monitor deployment
kubectl kaito status --workspace-name phi-workspace --watch
# Test the deployment
kubectl kaito chat --workspace-name phi-workspace
# Fine-tune a model with your data
kubectl kaito deploy \
--workspace-name tune-phi \
--model phi-3.5-mini-instruct \
--tuning \
--tuning-method qlora \
--input-urls "https://example.com/training-data.parquet" \
--output-image "myregistry.azurecr.io/phi-tuned:v1" \
--output-image-secret my-registry-secret
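Tuning runs inside the workspace like any other deployment, so progress can be tracked with the same status command; once the workspace reports ready, the adapter image should be available in your registry:

```shell
# Watch the tuning workspace until the job completes and the adapter image is pushed
kubectl kaito status --workspace-name tune-phi --watch
```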
# Deploy the fine-tuned model
kubectl kaito deploy \
--workspace-name phi-tuned \
--model phi-3.5-mini-instruct \
--adapters phi-adapter="myregistry.azurecr.io/phi-tuned:v1"
# Deploy Llama-2 70B across multiple nodes
kubectl create secret generic hf-token --from-literal=token=your_token
kubectl kaito deploy \
--workspace-name large-llama \
--model llama-2-70b \
--model-access-secret hf-token \
--instance-type Standard_NC24ads_A100_v4 \
--count 4
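For multi-node deployments it can be useful to confirm that every replica was provisioned. A sketch, assuming your cloud provider sets the well-known `node.kubernetes.io/instance-type` label on nodes (with `--count 4` above, four matching GPU nodes are expected):

```shell
# Watch workspace conditions and NodeClaim provisioning
kubectl kaito status --workspace-name large-llama --watch

# Inspect the provisioned GPU nodes (label value is cloud-specific)
kubectl get nodes -l node.kubernetes.io/instance-type=Standard_NC24ads_A100_v4
```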
| Command | Description |
|---|---|
| `deploy` | Deploy a Kaito workspace for model inference or fine-tuning |
| `status` | Check the status of Kaito workspaces |
| `get-endpoint` | Get inference endpoints for a workspace |
| `chat` | Interactive chat with deployed AI models |
| `models` | Manage and list supported AI models |
# Clone the repository
git clone https://github.com/kaito-project/kaito-kubectl-plugin.git
cd kaito-kubectl-plugin
# Build the plugin
make build
# Uninstall the krew-managed plugin first so the local binary can be run without conflict
kubectl krew uninstall kaito
# Run the CLI from the local binary
./bin/kubectl-kaito --help
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.