Your lightweight, private, local AI chatbot powered by llama.cpp (no GPU required)
A modern web interface for llama.cpp with markdown rendering, syntax highlighting, and intelligent conversation management. Chat with local LLMs through a sleek, GitHub-inspired interface.
- llama.cpp Integration - Direct integration with llama.cpp server for optimal performance
- Dynamic Model Switching - Switch between models without restarting services
- Multiple Conversations - Create, manage, and rename chat sessions
- Persistent History - SQLite database storage with search functionality
- Lightweight - Minimal resource usage, runs on CPU-only systems
- Full Markdown Rendering - GitHub-flavored syntax with code highlighting
- Performance Metrics - Real-time response times, token tracking, and speed analytics
- Health Monitoring - Automatic service monitoring and restart capabilities
Install llama.cpp:
# Option 1: Build via llama_cpp_setup.sh (recommended)
curl -fsSL https://github.com/ukkit/llama-chat/raw/main/llama_cpp_setup.sh | bash
Other installation options
# Option 2: Build from source
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# Option 3: Install via package manager (if available)
# Ubuntu/Debian:
# apt install llama.cpp
# macOS:
# brew install llama.cpp
which llama-server # Should show the path to llama-server
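To confirm the build actually serves before wiring it into llama-chat, a quick sanity check is to point `llama-server` at any local `.gguf` model and query its `/health` endpoint. The model path below is a placeholder, and 8080 is llama-server's default port:

```bash
# Start llama-server by hand against any local .gguf model (placeholder path).
llama-server -m ./models/your-model.gguf --host 127.0.0.1 --port 8080 &

# The server answers on /health once the model has finished loading.
curl -s http://127.0.0.1:8080/health

# Stop the test server again when done.
kill %1
```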
For most users (auto-install):
curl -fsSL https://github.com/ukkit/llama-chat/raw/main/install.sh | bash
What the install script does:
- Sets up Python virtual environment
- Downloads recommended model (~400MB)
- Installs llama-chat with Flask frontend
- Creates configuration files
- Starts both llama.cpp server and web interface
Access at: http://localhost:3333
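Once the script finishes, a quick way to confirm the web interface is actually reachable (3333 is the default port; adjust if you changed it):

```bash
# Expect a 200 from the chat UI; anything else means the Flask service is
# not up yet - ./chat-manager.sh status will show which piece failed.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3333/
```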
Manual Installation
For detailed manual installation steps:
# Prerequisites: Python 3.8+, llama.cpp installed, and at least one .gguf model
git clone https://github.com/ukkit/llama-chat.git
cd llama-chat
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Download a model (optional - you can add your own)
./chat-manager.sh download-model \
"https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_0.gguf" \
"qwen2.5-0.5b-instruct-q4_0.gguf"
# Start services
./chat-manager.sh start
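If everything came up, both services should now respond. The sketch below assumes the web UI default port 3333 used elsewhere in this README and an llama.cpp server on 8080; the real llama.cpp port is whatever your cm.conf sets:

```bash
# Overall service state as chat-manager sees it
./chat-manager.sh status

# HTTP-level checks (ports are assumptions - see cm.conf for the real values)
curl -s -o /dev/null -w 'web ui:       %{http_code}\n' http://localhost:3333/
curl -s -o /dev/null -w 'llama-server: %{http_code}\n' http://localhost:8080/health
```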
| File | Purpose |
|---|---|
| `cm.conf` | Main chat-manager configuration (ports, performance, model settings) |
| `config.json` | Model parameters, timeouts, system prompt |
| `docs/detailed_cm.conf` | Config file with more configuration options for llama-chat and llama.cpp server |
See docs/config.md for complete configuration options.
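As a rough sketch of the edit-and-apply cycle (the setting names mentioned in the comment are illustrative assumptions, not the authoritative list; docs/detailed_cm.conf documents the real options):

```bash
# Open the main configuration. The keys named here are examples only, e.g.
# something like GPU_LAYERS=0 for CPU-only systems or a smaller context size
# to reduce memory use - consult docs/detailed_cm.conf for the exact names.
nano cm.conf

# Restarting is the simplest way to make sure new settings are picked up.
./chat-manager.sh restart
```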
llama-chat includes a comprehensive management script, chat-manager.sh, for controlling all services:
# Basic operations
./chat-manager.sh start # Start all services (llama.cpp + Flask + monitor)
./chat-manager.sh stop # Stop all services
./chat-manager.sh restart # Restart all services
./chat-manager.sh status # Show detailed service status and health
See docs/chat-manager.md for detailed operations.
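Because chat-manager.sh fronts all services with single start/stop commands, it also slots easily into your own automation. For example, a crontab entry (the repository path is a placeholder for wherever you cloned llama-chat) can bring everything back up after a reboot:

```bash
# Add via "crontab -e"; the path below is a placeholder for your checkout.
@reboot cd /home/you/llama-chat && ./chat-manager.sh start >> /tmp/llama-chat-boot.log 2>&1
```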
llama-chat works with any .gguf format model. Here are some popular options:
# Fast, lightweight (400MB) - Great for testing
./chat-manager.sh download-model \
"https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_0.gguf" \
"qwen2.5-0.5b-instruct-q4_0.gguf"
# Compact, good performance (1.3GB)
./chat-manager.sh download-model \
"https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf" \
"llama3.2-1b-instruct-q4.gguf"
- Ultra-fast: tinyllama, qwen2.5:0.5b (good for testing)
- Balanced: phi3-mini, llama3.2:1b (daily use)
- High-quality: llama3.1:8b, qwen2.5:7b (when you have RAM; see the quick check below)
- Specialized: codellama, mistral-nemo (coding, specific tasks)
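As a loose rule of thumb (an assumption, not a guarantee): a Q4-quantized .gguf generally needs at least its own file size in free RAM, plus headroom for the context window. A quick comparison of what you have on disk versus in memory:

```bash
# Compare model sizes on disk with the memory you have free (Linux shown).
# The models/ path is an assumption - ./chat-manager.sh list-models is the
# reliable way to see which files llama-chat knows about.
ls -lh models/*.gguf 2>/dev/null
./chat-manager.sh list-models
free -h
```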
Switch between models without restarting services:
# Switch to a different model
./chat-manager.sh switch-model phi3-mini-4k-instruct-q4.gguf
# Check current model
./chat-manager.sh status
# List available models
./chat-manager.sh list-models
| Issue | Solution |
|---|---|
| llama.cpp not found | Install llama.cpp and ensure `llama-server` is in PATH |
| Port in use | `./chat-manager.sh force-cleanup` |
| No models | `./chat-manager.sh download-model <url> <file>` |
| Process stuck | `./chat-manager.sh force-cleanup` |
| Slow responses | Use smaller model or adjust `GPU_LAYERS` |
| Memory issues | Reduce context size in `cm.conf` |
| Model switching fails | Check model file exists: `./chat-manager.sh list-models` |
| Services won't start | Check health: `./chat-manager.sh test` |
| Problem | Cause | Solution |
|---|---|---|
| `llama-server` not found | llama.cpp not installed | Install llama.cpp from source or package manager |
| Permission denied | Executable permissions missing | `chmod +x chat-manager.sh` |
| Port conflicts | Services already running | `./chat-manager.sh force-cleanup` |
| Python module errors | Virtual environment issues | Re-run setup: `./chat-manager.sh setup-venv` |
| Model loading fails | Corrupted or wrong format | Re-download model |
See docs/troubleshooting.md for comprehensive troubleshooting.
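For the most common failures above, a quick round of checks from the repository root usually narrows things down. The port numbers are the defaults assumed elsewhere in this README; substitute your own if you changed them in cm.conf:

```bash
# Is llama-server installed and on PATH?
command -v llama-server || echo "llama-server not found - install llama.cpp first"

# Is the management script executable?
chmod +x chat-manager.sh

# Is something already listening on the web UI or llama.cpp ports?
ss -ltnp 2>/dev/null | grep -E ':3333|:8080' || echo "ports look free"

# Built-in health check
./chat-manager.sh test
```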
| Platform | CPU | RAM | llama.cpp | Status | Notes |
|---|---|---|---|---|---|
| Ubuntu 20.04+ | x86_64 | 8GB+ | Source/Package | Excellent | Primary development platform |
| Windows 11 | x86_64 | 8GB+ | WSL2/Source | Good | WSL2 recommended |
| Debian 12+ | x86_64 | 8GB+ | Source/Package | Excellent | Server deployments |
| Document | Description |
|---|---|
| Installation Guide | Complete installation instructions |
| Configuration Guide | Detailed configuration options |
| API Documentation | REST API reference with examples |
| Troubleshooting | Common issues and solutions |
| Management Script | chat-manager.sh documentation |
| Models | Model recommendations and setup |
- llama.cpp - High-performance inference engine
- Flask - Web framework
- marked.js - Markdown parser
- highlight.js - Syntax highlighting
- Hugging Face - Model hosting and community
Made with ❤️ for the AI community
Star this project if you find it helpful!
MIT License - see LICENSE file.