Run a full local RAG pipeline on your low-end, CPU-only Windows machine. Private, resilient, secure.
TL;DR: See SUMMARY.md for the stack breakdown and performance.
Llamabox lets you self-host a complete AI and database stack inside WSL2 + Debian, optimized for CPU-only, low-resource machines. It's built for privacy-first applications like chatbots, AI search, and offline assistants—no GPU, no OpenAI keys, no cloud APIs.
🧠 Llamabox includes:
- llama.cpp: CPU-only inference engine
- Redis Stack: Vector database for embeddings
- Neo4j: Graph-based knowledge base
- Secure systemd-based setup: Auto-restarting services
- Optional browser extension: Capture and sync content from Chrome/Edge
- Overview
- Why WSL2 + Debian?
- Use Cases
- Key Features
- Prerequisites
- Installation
- Service Management
- Performance Benchmarks
- FAQ & Troubleshooting
- Contributing
- License & Credits
- Browser Extension
- ✅ WSL2 runs native Linux with low overhead on Windows
- ✅ Debian is lightweight and rock-stable
- ✅ No Docker needed – systemd and all services run directly under WSL
- ✅ Keeps everything local and private, with no cloud dependencies
- 💬 Local chatbots and AI assistants
- 🔍 Search over documents, pages, and structured data
- 🧩 Graph-based reasoning with Neo4j
- 🧠 Embed and store knowledge using Redis vectors
- 🛡️ Fully offline / air-gapped deployments
- Passwordless local use; optional `fail2ban` and `ufw` for edge exposure
- No SSH exposed by default
- Automatic security updates via `unattended-upgrades`
- All critical services are systemd-managed
- Auto-restart on crash or reboot
- Logs available via `journalctl`
CPU-only, cloud-free, privacy-first Retrieval-Augmented Generation:
- User sends query → `llama.cpp`
- Query embedding → Redis vector DB
- Knowledge retrieved → Neo4j
- Final answer generated → All local
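The four steps above can be sketched end-to-end. This is an illustrative stand-in, not Llamabox's actual code: the embedder, vector store, and prompt builder below are toy in-memory stubs standing in for llama.cpp, Redis, and Neo4j.

```python
import math

def embed(text: str) -> list[float]:
    """Toy bag-of-letters embedding (stand-in for a llama.cpp embedding call)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two unit vectors is just their dot product."""
    return sum(x * y for x, y in zip(a, b))

# Tiny in-memory "knowledge base" (stand-in for Redis + Neo4j).
DOCS = {
    "redis": "Redis Stack stores embeddings as vectors for similarity search.",
    "neo4j": "Neo4j stores entities and relationships as a knowledge graph.",
}
INDEX = {doc_id: embed(text) for doc_id, text in DOCS.items()}

def retrieve(query: str, k: int = 1) -> list[str]:
    """Vector search: rank stored docs by cosine similarity to the query."""
    q = embed(query)
    ranked = sorted(INDEX, key=lambda d: cosine(q, INDEX[d]), reverse=True)
    return ranked[:k]

def build_prompt(query: str) -> str:
    """RAG loop: embed -> retrieve -> assemble a grounded prompt."""
    context = " ".join(DOCS[d] for d in retrieve(query))
    # A real deployment would send this prompt to the local llama.cpp
    # server for the final generation step.
    return f"Context: {context}\nQuestion: {query}\nAnswer:"

print(build_prompt("How are embeddings stored?"))
```

In the real stack, `embed` and the final generation would be calls to the local llama.cpp server, and `retrieve` would be a Redis vector search, optionally enriched with Neo4j graph lookups.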
- ~1GB idle memory usage
- Runs on as little as 2 cores and 4GB RAM
- Zero GPU required
- Windows 10/11 with WSL2
- Debian installed via the Microsoft Store or `wsl --install -d Debian`
- Min. 4GB RAM (8GB recommended)
- 20GB free disk space
```shell
# In Windows Terminal:
wsl --install -d Debian
```

```shell
# Inside the Debian WSL shell:
sudo apt update && sudo apt install git -y
git clone https://github.com/rajatasusual/llamabox
cd llamabox
./setup.sh
```
🔧 See INSTALLATION.md for customization and optional steps.
```shell
# Check service statuses (Redis, Neo4j, llama-server, etc.)
./scripts/check.sh

# Or restart a specific service and inspect its logs manually:
sudo systemctl restart neo4j
sudo journalctl -u llama-server.service
```
📘 More in MANAGE.md
Test device: 4-core AMD Z1, 4GB RAM, WSL2 Debian
Model: LLaMA 3B Q8_0
| Threads | Benchmark | Tokens/sec | Notes |
|---|---|---|---|
| 2 | pp512 (prompt processing, 512 tokens) | 253.07 ± 23.75 | Long-form |
| 2 | tg128 (text generation, 128 tokens) | 54.44 ± 4.87 | Short query |
✅ Runs smoothly on CPU-only setup
✅ Great for background tasks and lightweight chatbots
✅ All on a 10-year-old laptop? Yes.
- ❓ Systemd isn't working in WSL2
  - ✅ Add `systemd=true` under `[boot]` in `/etc/wsl.conf`, then restart WSL
- ❓ "Out of memory" when loading a model
  - ✅ Try a smaller GGUF model
  - ✅ Or raise the limit in `.wslconfig` on Windows: `memory=8GB` under `[wsl2]`
- ❓ Redis or Neo4j not starting?
  - ✅ Run `./scripts/check.sh`
  - ✅ Or restart manually: `sudo systemctl restart redis-stack-server`
More in FAQs.md
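For the out-of-memory case, the FAQ's one-line fix expands to a small config file on the Windows side. The values below are examples to adjust to your hardware, not recommended settings:

```ini
; %UserProfile%\.wslconfig (Windows side; restart WSL after editing)
[wsl2]
memory=8GB      ; cap on WSL2 RAM; must fit the GGUF model plus overhead
processors=4    ; optional: CPU cores available to WSL2
swap=8GB        ; optional: swap can absorb brief spikes while loading
```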
We’d love your help!
- Create issues, fix bugs, suggest features
- PRs welcome: fork → feature branch → pull request
- Style guide and guidelines coming soon
Licensed under the MIT License.
Shout-outs:
The Llamabox Extension captures web pages and sends them to your local server for embedding.
🔹 Features:
- Extract full article text or selection
- Sync with WSL2 HTTP server
- Works offline, configurable shortcuts
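Before embedding, a long capture is typically split into smaller chunks. The sketch below shows one hypothetical shape for that step on the receiving side; the payload field names (`url`, `title`, `text`) and the chunk size are assumptions for illustration, not the extension's actual wire format.

```python
def chunk_capture(payload: dict, max_words: int = 200) -> list[dict]:
    """Split a captured page into word-bounded chunks ready for embedding.

    Each chunk keeps the source URL and title so a retrieved passage can
    be attributed back to the page it came from.
    """
    words = payload.get("text", "").split()
    chunks = []
    for start in range(0, len(words), max_words):
        chunks.append({
            "url": payload.get("url", ""),
            "title": payload.get("title", ""),
            "chunk": " ".join(words[start:start + max_words]),
        })
    return chunks

page = {"url": "http://example.com", "title": "Demo", "text": "word " * 450}
print(len(chunk_capture(page)))  # 450 words -> chunks of 200, 200, 50 -> 3
```

Keeping chunks word-bounded and metadata-tagged like this makes each one independently embeddable, so similarity search can return a passage together with its source page.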
🔧 To install:
- Clone the repo
- Load `extension/` as an unpacked extension in Chrome or Edge
- Set the WSL IP on the extension's config page