A robust, production-ready web interface for Large Language Models (LLMs) featuring a hybrid architecture with FastAPI backend and Streamlit frontend. Built for developers, researchers, and AI enthusiasts who need a comprehensive platform for LLM interaction, document processing, and API integration.
- **Hybrid Design** - Combines the best of both worlds:
  - **FastAPI Backend** (`localhost:8000`) - Entry point at `main.py`; a high-performance async API with comprehensive endpoints suited to asynchronous workloads
  - **Streamlit Frontend** (`localhost:8501`) - Entry point at `streamlit_app.py`; an intuitive web interface with automatic backend detection
  - **Modular Services** - `rag_service.py`, `ollama.py`, `file_ingest.py`, `enhanced_extractors.py`, and `enhanced_document_processor.py` provide specialized functionality
  - **Intelligent Fallback** - Seamlessly switches between FastAPI and local processing based on backend availability (see the sketch after this list)
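The fallback behaviour can be pictured with a short sketch (illustrative only, not the actual code in `streamlit_app.py`): ping the backend's `/health` endpoint and route requests through the FastAPI `/api/chat` endpoint when it responds, otherwise talk to the local Ollama HTTP API directly.

```python
import requests

FASTAPI_URL = "http://localhost:8000"   # FastAPI backend (main.py)
OLLAMA_URL = "http://localhost:11434"   # local Ollama server

def backend_available(timeout: float = 2.0) -> bool:
    """Return True when the FastAPI backend answers its /health endpoint."""
    try:
        return requests.get(f"{FASTAPI_URL}/health", timeout=timeout).ok
    except requests.RequestException:
        return False

def local_chat(message: str, model: str) -> str:
    """Fallback path: call the local Ollama HTTP API directly."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": message, "stream": False},
    )
    resp.raise_for_status()
    return resp.json()["response"]

def send_chat(message: str, model: str = "llama3.1") -> str:
    """Route through FastAPI when it is up, otherwise process locally."""
    if backend_available():
        resp = requests.post(
            f"{FASTAPI_URL}/api/chat",
            json={"message": message, "model": model},
        )
        resp.raise_for_status()
        return resp.json()["response"]
    return local_chat(message, model)

print(send_chat("Hello, world!"))
```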
- **High-Performance API** - Async FastAPI backend for scalable LLM processing
- **Dual Backend Support** - Seamlessly switch between Ollama (local) and vLLM (Hugging Face) backends
- **RAG Integration** - Upload documents (PDF, DOCX, TXT) with enhanced extraction and query them with context-aware responses
- **Auto-Failover** - Intelligent backend detection with graceful fallbacks
- **Multi-Model Support** - Access to popular models through vLLM or local Ollama models
- **Auto-Generated API Docs** - Interactive Swagger UI at `/docs`
- **RESTful Endpoints** - Complete API for chat, RAG, and model management
- **Pure Python Stack** - Modular Python files for RAG and LLM interaction; easy to extend, customize, and deploy
- **Dependency Management** - Reproducible installs with `uv`
- **Local-First** - Runs entirely on localhost, no external dependencies
- **CORS Configured** - Proper cross-origin resource sharing setup
- **Health Monitoring** - Built-in health checks and status monitoring
- **Streaming Support** - Real-time response streaming capabilities
The instructions below are tested against this repository (https://github.com/debabratamishra/litemind-ui) and the Docker images pushed to Docker Hub under the user debabratamishra1 (https://hub.docker.com/u/debabratamishra1).
One-line installer (downloads pre-built Docker images and starts services):
Note: Docker deployment currently supports the Ollama backend only. vLLM backend support will be added in a future release.
```bash
curl -fsSL https://raw.githubusercontent.com/debabratamishra/litemind-ui/main/install.sh | bash
```
What this does:
- Downloads and starts pre-built Docker images from Docker Hub (user: `debabratamishra1`)
- Writes basic configuration files if missing
- Starts the frontend and backend services using docker-compose
If you prefer to inspect the compose file before starting, see Option 1 (manual) below.
Manual Docker Hub Installation
```bash
# Download the production compose file
curl -O https://raw.githubusercontent.com/debabratamishra/litemind-ui/main/docker-compose.hub.yml

# Create required directories (only needed once)
mkdir -p uploads chroma_db storage .streamlit logs

# Start services with the provided compose file
docker-compose -f docker-compose.hub.yml up -d
```
Available Docker images (hosted on Docker Hub under `debabratamishra1`):
- Backend: https://hub.docker.com/r/debabratamishra1/litemindui-backend
- Frontend: https://hub.docker.com/r/debabratamishra1/litemindui-frontend
Quick start (build and run locally with Docker):
Note: Docker deployment currently supports the Ollama backend only. vLLM backend support will be added in a future release.
1. **Clone the repository**
```bash
git clone https://github.com/debabratamishra/litemind-ui
cd litemind-ui
```
2. **Setup Docker environment**
```bash
make setup
# or manually: ./scripts/docker-setup.sh
```
3. **Start the application**
```bash
make up
# or: docker-compose up -d
```
4. **Access the application** (a quick programmatic check is shown below)
- Frontend (Streamlit): http://localhost:8501
- Backend API: http://localhost:8000
- API Documentation: http://localhost:8000/docs
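Once the containers are up, you can verify the backend programmatically as well. A minimal check against the documented `/health` and `/models` endpoints (the exact response shapes may differ, so treat the printed output as informational):

```python
import requests

BASE = "http://localhost:8000"

# Backend health check (see the API endpoint table further below)
health = requests.get(f"{BASE}/health", timeout=5)
print("health:", health.status_code, health.text)

# List the Ollama models the backend can see
models = requests.get(f"{BASE}/models", timeout=5)
print("models:", models.json())
```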
Prerequisites for Docker:
- Docker and Docker Compose installed
- Ollama running on the host system (if you plan to use local Ollama models) at `http://localhost:11434`
- At least 4GB RAM (8GB+ recommended)

See `DOCKER.md` for advanced configuration and troubleshooting.
Make commands for Docker Hub images (already provided in the Makefile):
```bash
make hub-up    # Start with Docker Hub images
make hub-down  # Stop Docker Hub services
make version   # Show version management options
```
Use this if you prefer to run services locally without Docker. These instructions assume Python 3.12+ and a virtual environment.
1. **Clone the repository**
```bash
git clone https://github.com/debabratamishra/litemind-ui
cd litemind-ui
```
2. **Create and activate a virtual environment, then install dependencies**
```bash
python -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -r requirements.txt
```
3. **Create required directories**
```bash
mkdir -p uploads .streamlit
```
Environment variables you may want to set (examples):
```bash
export OLLAMA_BASE_URL="http://localhost:11434"
export UPLOAD_FOLDER="./uploads"
```
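How the application consumes these variables depends on its own config code; as a rough sketch of the typical pattern (the default values shown are assumptions, not necessarily the app's own):

```python
import os

# Fall back to localhost defaults when the variables are unset.
OLLAMA_BASE_URL = os.environ.get("OLLAMA_BASE_URL", "http://localhost:11434")
UPLOAD_FOLDER = os.environ.get("UPLOAD_FOLDER", "./uploads")

os.makedirs(UPLOAD_FOLDER, exist_ok=True)  # ensure the upload directory exists
print(f"Using Ollama at {OLLAMA_BASE_URL}, storing uploads in {UPLOAD_FOLDER}")
```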
Notes:
- Running the full stack natively requires additional setup (Ollama, model files, vLLM/GPU drivers) and is intended for development.
- For most users, the Docker-based Quick Install is the simplest way to get started.
Quick notes for different host platforms:
- macOS (Apple Silicon / M1/M2): Docker will run amd64 images under emulation, which can be slower. The installer now auto-sets DOCKER_DEFAULT_PLATFORM=linux/amd64 for arm64 hosts. If you prefer, set it manually before running the installer:
```bash
export DOCKER_DEFAULT_PLATFORM=linux/amd64
curl -fsSL https://raw.githubusercontent.com/debabratamishra/litemind-ui/main/install.sh | bash
```
- macOS (Intel) and Linux (Ubuntu): The quick install should work as-is, provided Docker and docker-compose (or the Docker Compose CLI plugin) are installed.
- Windows: Run the installer inside WSL2 (recommended) or Git Bash; plain PowerShell/cmd doesn't provide bash by default. Example using WSL2:
```bash
wsl
# inside the WSL shell
curl -fsSL https://raw.githubusercontent.com/debabratamishra/litemind-ui/main/install.sh | bash
```
If you run into platform/architecture errors during image pull, try pulling manually and inspecting logs:
```bash
docker-compose -f docker-compose.hub.yml pull
docker-compose -f docker-compose.hub.yml up -d
docker-compose -f docker-compose.hub.yml logs -f
```
If you prefer `uv` for dependency management, install the requirements and create the required directories with:
```bash
uv pip install -r requirements.txt
mkdir -p uploads .streamlit
```
- The `UPLOAD_FOLDER` location can be customized via environment variables.
| Endpoint | Method | Description |
|---|---|---|
| `/health` | GET | Backend health check |
| `/models` | GET | Available Ollama models |
| `/api/chat` | POST | Process chat messages (supports both Ollama and vLLM backends) |
| `/api/chat/stream` | POST | Streaming chat responses (supports both backends) |
| `/api/rag/upload` | POST | Upload documents for RAG processing |
| `/api/rag/query` | POST | Query uploaded documents with context-aware responses |
| `/api/rag/documents` | GET | List uploaded documents |
| `/api/vllm/models` | GET | Available vLLM models and configuration |
| `/api/vllm/set-token` | POST | Configure Hugging Face access token |
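The RAG endpoints can be exercised directly with `requests`. A hedged sketch follows: the multipart field name (`file`), the JSON body shape (`query`), and the file name are assumptions, so confirm the exact schemas in the Swagger UI at `/docs`.

```python
import requests

BASE = "http://localhost:8000"

# Upload a document for RAG processing.
# "report.pdf" is a hypothetical example file; the multipart field name
# ("file") is an assumption -- verify it in /docs.
with open("report.pdf", "rb") as f:
    upload = requests.post(f"{BASE}/api/rag/upload", files={"file": f})
print(upload.status_code, upload.json())

# Query the uploaded documents with a natural-language question.
# The request body shape is likewise an assumption -- verify it in /docs.
query = requests.post(
    f"{BASE}/api/rag/query",
    json={"query": "What are the key findings?"},
)
print(query.json())

# List documents currently indexed for RAG
docs = requests.get(f"{BASE}/api/rag/documents")
print(docs.json())
```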
- Navigate to the Chat tab
- Select Backend: Choose between Ollama (local) or vLLM (Hugging Face)
- Configure Models:
- For Ollama: Select from locally installed models
- For vLLM: Choose from popular models or enter custom model names
- Enter your message and receive AI responses
- Switch to the RAG tab
- Upload PDF, TXT, or DOCX files
- Choose Backend: RAG works with both Ollama and vLLM backends
- Query your documents with natural language
- Get contextually relevant answers
- Seamless Integration: Switch between backends without losing your current page
- Model Persistence: Backend-specific model selections are preserved
- Automatic Configuration: UI adapts based on selected backend capabilities
Easily interact with the LiteMindUI backend from your applications.
```python
import requests

response = requests.post(
    "http://localhost:8000/api/chat",
    json={"message": "Hello, world!", "model": "llama3.1"}
)
print(response.json()["response"])
```
```javascript
const fetch = require('node-fetch');

fetch('http://localhost:8000/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ message: 'Hello, world!', model: 'llama3.1' })
})
  .then(res => res.json())
  .then(data => console.log(data.response))
  .catch(err => console.error(err));
```
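The streaming endpoint can be consumed incrementally as well. A minimal Python sketch, assuming `/api/chat/stream` emits plain text chunks (the exact framing, e.g. SSE versus raw chunks, is described in the Swagger UI):

```python
import requests

# Open a streaming connection and print tokens as they arrive.
with requests.post(
    "http://localhost:8000/api/chat/stream",
    json={"message": "Tell me a short story", "model": "llama3.1"},
    stream=True,  # keep the connection open and read the body incrementally
) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        if chunk:
            print(chunk, end="", flush=True)
```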
For a complete list of endpoints and request/response formats, visit the Swagger UI at http://localhost:8000/docs.
Create `.streamlit/config.toml`:
```toml
[server]
address = "localhost"
port = 8501
```
```bash
export OLLAMA_BASE_URL="http://localhost:11434"
export UPLOAD_FOLDER="./uploads"
```
- Backend Detection: Automatic FastAPI availability checking with local fallback
- Dynamic Models: Real-time model list fetching from the Ollama backend (see the sketch after this list)
- Streaming Responses: Real-time token streaming for better UX
- Document Processing: Multi-format documents are ingested and vectorized at upload time for faster retrieval
- Error Handling: Comprehensive error handling with user-friendly messages
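As an illustration of the backend-detection and dynamic-model behaviour, here is a minimal Streamlit sketch (not the code of `streamlit_app.py` itself) that fetches the model list from the documented `/models` endpoint and falls back to a static default when the backend is unreachable. The response shape handled below is an assumption.

```python
import requests
import streamlit as st

FASTAPI_URL = "http://localhost:8000"

def fetch_models() -> list[str]:
    """Ask the backend for available Ollama models; fall back to a default list."""
    try:
        resp = requests.get(f"{FASTAPI_URL}/models", timeout=2)
        resp.raise_for_status()
        data = resp.json()
        # Accept either a bare list or an object wrapping it under "models"
        # (the exact shape is an assumption -- check /docs).
        return data if isinstance(data, list) else data.get("models", [])
    except requests.RequestException:
        return ["llama3.1"]  # local fallback when the backend is down

st.title("LiteMindUI - model picker sketch")
model = st.selectbox("Model", fetch_models())
st.write(f"Selected model: {model}")
```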
Docker Deployment Issues:
- Ollama not accessible: Ensure Ollama is running with `ollama serve`
- Permission errors: Run `chmod 755 ~/.cache/huggingface ~/.ollama`
- Port conflicts: Check with `lsof -i :8000 :8501` and kill conflicting processes
- Container build fails: Clean with `make clean && make setup && make up`
Backend Issues:
- vLLM backend not working: Verify the Hugging Face token is valid and the model exists (a token-setting sketch follows this list)
- Backend switching problems: Clear browser cache and reload the page
- Model loading errors: Check model compatibility and available GPU memory
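If the token needs to be (re)configured programmatically, the `/api/vllm/set-token` endpoint from the table above can be used. The JSON field name below is an assumption, so confirm it in `/docs`:

```python
import os
import requests

# Read the Hugging Face token from the environment rather than hard-coding it.
hf_token = os.environ["HF_TOKEN"]  # assumes you exported HF_TOKEN beforehand

resp = requests.post(
    "http://localhost:8000/api/vllm/set-token",
    json={"token": hf_token},  # field name "token" is an assumption -- check /docs
)
print(resp.status_code, resp.json())
```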
Native Installation Issues:
- Module not found: Reinstall dependencies with `uv pip install -r requirements.txt`
- Streamlit not starting: Check if port 8501 is available
- FastAPI errors: Verify Python 3.12+ and check logs in terminal
General Issues:
- Models not loading: Verify Ollama is running and models are pulled
- Upload failures: Check `uploads` directory permissions
- RAG not working: Ensure documents are uploaded and processed successfully
For comprehensive troubleshooting guides:
- Docker issues: DOCKER.md
- Health checks: DOCKER_HEALTH_CHECKS.md
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Commit changes: `git commit -am 'Add feature'`
- Push to branch: `git push origin feature-name`
- Submit a Pull Request