Unified management and routing for llama.cpp, MLX, and vLLM models, with a web dashboard.
**Easy Model Management**
- Multiple Models Simultaneously: Run different models at the same time (7B for speed, 70B for quality)
- Smart Resource Management: Automatic idle timeout, LRU eviction, and configurable instance limits
- Web Dashboard: Modern React UI for managing instances, monitoring health, and viewing logs
**Flexible Integration**
- OpenAI API Compatible: Drop-in replacement - route requests to different models by instance name
- Multi-Backend Support: Native support for llama.cpp, MLX (Apple Silicon optimized), and vLLM
- Docker Ready: Run backends in containers with full GPU support
**Distributed Deployment**
- Remote Instances: Deploy instances on remote hosts
- Central Management: Manage everything from a single dashboard with automatic routing
**Quick Start**

- Install a backend (llama.cpp, MLX, or vLLM) - see Prerequisites below
- Download llamactl for your platform
- Run `llamactl` and open http://localhost:8080
- Create an instance and start inferencing!
**Prerequisites**

For the llama.cpp backend, you need `llama-server` from llama.cpp installed:

```bash
# Homebrew (macOS)
brew install llama.cpp
# Or build from source - see llama.cpp docs
# Or use Docker - no local installation required
```
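If you are unsure whether the binary is available, a quick shell check (plain POSIX tooling, nothing llamactl-specific) is:

```bash
# Confirm llama-server is on PATH before creating llama.cpp instances
command -v llama-server || echo "llama-server not found - install it or enable the Docker backend"
```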
For the MLX backend (macOS only), you need MLX-LM installed:

```bash
# Install via pip (requires Python 3.8+)
pip install mlx-lm
# Or in a virtual environment (recommended)
python -m venv mlx-env
source mlx-env/bin/activate
pip install mlx-lm
```

For the vLLM backend, you need vLLM installed:

```bash
# Install via pip (requires Python 3.8+, GPU required)
pip install vllm
# Or in a virtual environment (recommended)
python -m venv vllm-env
source vllm-env/bin/activate
pip install vllm
# Or use Docker - no local installation required
```

llamactl can also run backends in Docker containers, eliminating the need for a local backend installation:
```yaml
backends:
  llama-cpp:
    docker:
      enabled: true
  vllm:
    docker:
      enabled: true
```

**Installation**

```bash
# Linux/macOS - Get latest version and download
LATEST_VERSION=$(curl -s https://api.github.com/repos/lordmathis/llamactl/releases/latest | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
curl -L https://github.com/lordmathis/llamactl/releases/download/${LATEST_VERSION}/llamactl-${LATEST_VERSION}-$(uname -s | tr '[:upper:]' '[:lower:]')-$(uname -m).tar.gz | tar -xz
sudo mv llamactl /usr/local/bin/
# Or download manually from the releases page:
# https://github.com/lordmathis/llamactl/releases/latest
# Windows - Download from releases page
```

Alternatively, build and run llamactl with Docker:

```bash
# Clone repository and build Docker images
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
mkdir -p data/llamacpp data/vllm models
# Build and start llamactl with llama.cpp CUDA backend
docker-compose -f docker/docker-compose.yml up llamactl-llamacpp -d
# Build and start llamactl with vLLM CUDA backend
docker-compose -f docker/docker-compose.yml up llamactl-vllm -d
# Build from source using multi-stage build
docker build -f docker/Dockerfile.source -t llamactl:source .
```

Note: the Dockerfiles are configured for CUDA. Adapt the base images for other platforms (CPU, ROCm, etc.).
Building from source requires Go 1.24+ and Node.js 22+:

```bash
git clone https://github.com/lordmathis/llamactl.git
cd llamactl
cd webui && npm ci && npm run build && cd ..
go build -o llamactl ./cmd/server
```

Once llamactl is running, create your first instance from the web dashboard:

- Open http://localhost:8080
- Click "Create Instance"
- Choose backend type (llama.cpp, MLX, or vLLM)
- Configure your model and options (ports and API keys are auto-assigned)
- Start the instance and use it with any OpenAI-compatible client
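As a rough sketch of what that looks like from the client side: the request below assumes llamactl serves the standard OpenAI `/v1/chat/completions` path on its own port, that the `model` field selects the instance by name, and that inference keys are sent as a Bearer token. The instance name and key are placeholders; adjust for your setup.

```bash
# Hypothetical request: "my-model" is the name of a running llamactl instance,
# and YOUR_INFERENCE_KEY is an entry from auth.inference_keys (if auth is enabled)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer YOUR_INFERENCE_KEY" \
  -d '{
    "model": "my-model",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```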
**Configuration**

llamactl works out of the box with sensible defaults:

```yaml
server:
host: "0.0.0.0" # Server host to bind to
port: 8080 # Server port to bind to
allowed_origins: ["*"] # Allowed CORS origins (default: all)
allowed_headers: ["*"] # Allowed CORS headers (default: all)
enable_swagger: false # Enable Swagger UI for API docs
backends:
llama-cpp:
command: "llama-server"
args: []
environment: {} # Environment variables for the backend process
docker:
enabled: false
image: "ghcr.io/ggml-org/llama.cpp:server"
args: ["run", "--rm", "--network", "host", "--gpus", "all", "-v", "~/.local/share/llamactl/llama.cpp:/root/.cache/llama.cpp"]
environment: {} # Environment variables for the container
vllm:
command: "vllm"
args: ["serve"]
environment: {} # Environment variables for the backend process
docker:
enabled: false
image: "vllm/vllm-openai:latest"
args: ["run", "--rm", "--network", "host", "--gpus", "all", "--shm-size", "1g", "-v", "~/.local/share/llamactl/huggingface:/root/.cache/huggingface"]
environment: {} # Environment variables for the container
mlx:
command: "mlx_lm.server"
args: []
environment: {} # Environment variables for the backend process
instances:
port_range: [8000, 9000] # Port range for instances
data_dir: ~/.local/share/llamactl # Data directory (platform-specific, see below)
configs_dir: ~/.local/share/llamactl/instances # Instance configs directory
logs_dir: ~/.local/share/llamactl/logs # Logs directory
auto_create_dirs: true # Auto-create data/config/logs dirs if missing
max_instances: -1 # Max instances (-1 = unlimited)
max_running_instances: -1 # Max running instances (-1 = unlimited)
enable_lru_eviction: true # Enable LRU eviction for idle instances
default_auto_restart: true # Auto-restart new instances by default
default_max_restarts: 3 # Max restarts for new instances
default_restart_delay: 5 # Restart delay (seconds) for new instances
default_on_demand_start: true # Default on-demand start setting
on_demand_start_timeout: 120 # Default on-demand start timeout in seconds
timeout_check_interval: 5 # Idle instance timeout check in minutes
auth:
require_inference_auth: true # Require auth for inference endpoints
inference_keys: [] # Keys for inference endpoints
require_management_auth: true # Require auth for management endpoints
  management_keys: []              # Keys for management endpoints
```

For detailed configuration options, including environment variables, file locations, and advanced settings, see the Configuration Guide.
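The key lists in the `auth` section are plain strings. One possible way to generate them (an illustration only, assuming llamactl places no format constraints on key values):

```bash
# Generate random strings to use as entries in auth.inference_keys / auth.management_keys
openssl rand -hex 32   # e.g. an inference key
openssl rand -hex 32   # e.g. a management key
```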
MIT License - see LICENSE file.
