Your lightweight, private, local AI chatbot powered by llama.cpp (no GPU required)
A modern web interface for llama.cpp with markdown rendering, syntax highlighting, and intelligent conversation management. Chat with local LLMs through a sleek, GitHub-inspired interface.
- llama.cpp Integration - Direct integration with llama.cpp server for optimal performance
- Dynamic Model Switching - Switch between models without restarting services
- Multiple Conversations - Create, manage, and rename chat sessions
- Persistent History - SQLite database storage with search functionality
- Lightweight - Minimal resource usage, runs on CPU-only systems
- Full Markdown Rendering - GitHub-flavored syntax with code highlighting
- Performance Metrics - Real-time response times, token tracking, and speed analytics
- Health Monitoring - Automatic service monitoring and restart capabilities
Install llama.cpp:
# Option 1: Build via llama_cpp_setup.sh (recommended)
curl -fsSL https://github.com/ukkit/llama-chat/raw/main/llama_cpp_setup.sh | bash
Other installation options
# Option 2: Build from source
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# Option 3: Install via package manager (if available)
# Ubuntu/Debian:
# apt install llama.cpp
# macOS:
# brew install llama.cpp
which llama-server # Should show the path to llama-server
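To confirm the build actually serves before wiring it into llama-chat, a quick sanity check is to point `llama-server` at any local `.gguf` model and query its `/health` endpoint. The model path below is a placeholder, and 8080 is llama-server's default port:

```bash
# Start llama-server by hand against any local .gguf model (placeholder path).
llama-server -m ./models/your-model.gguf --host 127.0.0.1 --port 8080 &

# The server answers on /health once the model has finished loading.
curl -s http://127.0.0.1:8080/health

# Stop the test server again when done.
kill %1
```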
For most users (auto-install):
curl -fsSL https://github.com/ukkit/llama-chat/raw/main/install.sh | bash
What the install script does:
- Sets up Python virtual environment
- Downloads recommended model (~400MB)
- Installs llama-chat with Flask frontend
- Creates configuration files
- Starts both llama.cpp server and web interface
Access at: http://localhost:3333
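Once the script finishes, a quick way to confirm the web interface is actually reachable (3333 is the default port; adjust if you changed it):

```bash
# Expect a 200 from the chat UI; anything else means the Flask service is
# not up yet - ./chat-manager.sh status will show which piece failed.
curl -s -o /dev/null -w '%{http_code}\n' http://localhost:3333/
```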
Manual Installation
For detailed manual installation steps:
# Prerequisites: Python 3.8+, llama.cpp installed, and at least one .gguf model
git clone https://github.com/ukkit/llama-chat.git
cd llama-chat
python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
# Download a model (optional - you can add your own)
./chat-manager.sh download-model \
"https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_0.gguf" \
"qwen2.5-0.5b-instruct-q4_0.gguf"
# Start services
./chat-manager.sh start
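If everything came up, both services should now respond. The sketch below assumes the web UI default port 3333 used elsewhere in this README and an llama.cpp server on 8080; the real llama.cpp port is whatever your cm.conf sets:

```bash
# Overall service state as chat-manager sees it
./chat-manager.sh status

# HTTP-level checks (ports are assumptions - see cm.conf for the real values)
curl -s -o /dev/null -w 'web ui:       %{http_code}\n' http://localhost:3333/
curl -s -o /dev/null -w 'llama-server: %{http_code}\n' http://localhost:8080/health
```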
| File | Purpose |
|---|---|
| `cm.conf` | Main chat-manager configuration (ports, performance, model settings) |
| `config.json` | Model parameters, timeouts, system prompt |
| `docs/detailed_cm.conf` | Config file with more configuration options for llama-chat and llama.cpp server |
See docs/config.md for complete configuration options.
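As a rough sketch of the edit-and-apply cycle (the setting names mentioned in the comment are illustrative assumptions, not the authoritative list; docs/detailed_cm.conf documents the real options):

```bash
# Open the main configuration. The keys named here are examples only, e.g.
# something like GPU_LAYERS=0 for CPU-only systems or a smaller context size
# to reduce memory use - consult docs/detailed_cm.conf for the exact names.
nano cm.conf

# Restarting is the simplest way to make sure new settings are picked up.
./chat-manager.sh restart
```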
llama-chat includes a comprehensive management script, chat-manager.sh, for controlling all services:
# Basic operations
./chat-manager.sh start # Start all services (llama.cpp + Flask + monitor)
./chat-manager.sh stop # Stop all services
./chat-manager.sh restart # Restart all services
./chat-manager.sh status # Show detailed service status and health
See docs/chat-manager.md for detailed operations.
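Because chat-manager.sh fronts all services with single start/stop commands, it also slots easily into your own automation. For example, a crontab entry (the repository path is a placeholder for wherever you cloned llama-chat) can bring everything back up after a reboot:

```bash
# Add via "crontab -e"; the path below is a placeholder for your checkout.
@reboot cd /home/you/llama-chat && ./chat-manager.sh start >> /tmp/llama-chat-boot.log 2>&1
```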
llama-chat works with any .gguf format model. Here are some popular options:
# Fast, lightweight (400MB) - Great for testing
./chat-manager.sh download-model \
"https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct-GGUF/resolve/main/qwen2.5-0.5b-instruct-q4_0.gguf" \
"qwen2.5-0.5b-instruct-q4_0.gguf"
# Compact, good performance (1.3GB)
./chat-manager.sh download-model \
"https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF/resolve/main/Llama-3.2-1B-Instruct-Q4_K_M.gguf" \
"llama3.2-1b-instruct-q4.gguf"
- Ultra-fast: tinyllama, qwen2.5:0.5b (good for testing)
- Balanced: phi3-mini, llama3.2:1b (daily use)
- High-quality: llama3.1:8b, qwen2.5:7b (when you have RAM; see the quick check below)
- Specialized: codellama, mistral-nemo (coding, specific tasks)
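As a loose rule of thumb (an assumption, not a guarantee): a Q4-quantized .gguf generally needs at least its own file size in free RAM, plus headroom for the context window. A quick comparison of what you have on disk versus in memory:

```bash
# Compare model sizes on disk with the memory you have free (Linux shown).
# The models/ path is an assumption - ./chat-manager.sh list-models is the
# reliable way to see which files llama-chat knows about.
ls -lh models/*.gguf 2>/dev/null
./chat-manager.sh list-models
free -h
```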
Switch between models without restarting services:
# Switch to a different model
./chat-manager.sh switch-model phi3-mini-4k-instruct-q4.gguf
# Check current model
./chat-manager.sh status
# List available models
./chat-manager.sh list-models
| Issue | Solution |
|---|---|
| llama.cpp not found | Install llama.cpp and ensure `llama-server` is in PATH |
| Port in use | `./chat-manager.sh force-cleanup` |
| No models | `./chat-manager.sh download-model <url> <file>` |
| Process stuck | `./chat-manager.sh force-cleanup` |
| Slow responses | Use smaller model or adjust `GPU_LAYERS` |
| Memory issues | Reduce context size in `cm.conf` |
| Model switching fails | Check model file exists: `./chat-manager.sh list-models` |
| Services won't start | Check health: `./chat-manager.sh test` |
| Problem | Cause | Solution |
|---|---|---|
| `llama-server` not found | llama.cpp not installed | Install llama.cpp from source or package manager |
| Permission denied | Executable permissions missing | `chmod +x chat-manager.sh` |
| Port conflicts | Services already running | `./chat-manager.sh force-cleanup` |
| Python module errors | Virtual environment issues | Re-run setup: `./chat-manager.sh setup-venv` |
| Model loading fails | Corrupted or wrong format | Re-download model |
See docs/troubleshooting.md for comprehensive troubleshooting.
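For the most common failures above, a quick round of checks from the repository root usually narrows things down. The port numbers are the defaults assumed elsewhere in this README; substitute your own if you changed them in cm.conf:

```bash
# Is llama-server installed and on PATH?
command -v llama-server || echo "llama-server not found - install llama.cpp first"

# Is the management script executable?
chmod +x chat-manager.sh

# Is something already listening on the web UI or llama.cpp ports?
ss -ltnp 2>/dev/null | grep -E ':3333|:8080' || echo "ports look free"

# Built-in health check
./chat-manager.sh test
```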
| Platform | CPU | RAM | llama.cpp | Status | Notes |
|---|---|---|---|---|---|
| Ubuntu 20.04+ | x86_64 | 8GB+ | Source/Package | Excellent | Primary development platform |
| Windows 11 | x86_64 | 8GB+ | WSL2/Source | Good | WSL2 recommended |
| Debian 12+ | x86_64 | 8GB+ | Source/Package | Excellent | Server deployments |
| Document | Description |
|---|---|
| Installation Guide | Complete installation instructions |
| Configuration Guide | Detailed configuration options |
| API Documentation | REST API reference with examples |
| Troubleshooting | Common issues and solutions |
| Management Script | chat-manager.sh documentation |
| Models | Model recommendations and setup |
- llama.cpp - High-performance inference engine
- Flask - Web framework
- marked.js - Markdown parser
- highlight.js - Syntax highlighting
- Hugging Face - Model hosting and community
Made with ❤️ for the AI community
Star this project if you find it helpful!
MIT License - see LICENSE file.