This project provides a containerized, offline-capable LLM API using Ollama. It automatically pulls and serves a chosen model on first run and exposes a simple REST API for interaction. Designed to be portable and ready for homelab or production environments.
- Fully offline after initial model pull
- Configurable model selection (`llama3`, `mistral`, `phi3`, etc.)
- Dockerized for consistent deployment
- Health checks and persistent model storage
- Simple REST API powered by FastAPI
To run the project, you will need:

- Docker
- Docker Compose
Clone the repository and start the stack:

```bash
git clone https://github.com/yourusername/ollama-offline-agent.git
cd ollama-offline-agent
docker compose up --build
```
On first run, the container will:
- Start the Ollama daemon.
- Pull the specified model if it is not already cached.
- Launch the API server on port 8000.
Subsequent runs will skip the model pull if the model is already present.
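Because the first run can take several minutes while the model downloads, clients should wait for the API to become reachable before sending prompts. Below is a minimal readiness-poll sketch, assuming the default port 8000 and the `requests` package (not part of this project):

```python
import time

import requests  # pip install requests

API_URL = "http://localhost:8000"  # default port exposed by the container


def wait_until_ready(timeout_s: float = 600.0, interval_s: float = 5.0) -> bool:
    """Poll the root endpoint until the API responds, or the timeout expires.

    The first run may take a while because the model is still being pulled.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            if requests.get(API_URL + "/", timeout=5).status_code == 200:
                return True
        except requests.ConnectionError:
            pass  # API not up yet; keep waiting
        time.sleep(interval_s)
    return False


if __name__ == "__main__":
    print("ready" if wait_until_ready() else "timed out waiting for the API")
```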
The default model is `llama3`. To change it, edit `docker-compose.yml`:

```yaml
environment:
  - OLLAMA_MODEL=mistral
```
Rebuild the container to apply changes:

```bash
docker compose up --build
```
`GET /`

Returns a simple status message confirming the API is up.

Response:

```json
{
  "message": "Ollama Offline Agent is running."
}
```
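A quick way to check the endpoint from Python (a hypothetical client snippet, assuming the default host and port):

```python
import requests  # pip install requests

# Expect: {"message": "Ollama Offline Agent is running."}
print(requests.get("http://localhost:8000/", timeout=5).json())
```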
`POST /ask`

Sends a prompt to the model and returns its answer.

Content-Type: `application/json`

Body:

```json
{
  "prompt": "What is the capital of France?"
}
```

Response:

```json
{
  "prompt": "What is the capital of France?",
  "response": "The capital of France is Paris."
}
```
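For example, the endpoint can be called from Python like this (a sketch, assuming the default host/port and the response shape shown above):

```python
import requests  # pip install requests


def ask(prompt: str, base_url: str = "http://localhost:8000") -> str:
    """Send a prompt to the /ask endpoint and return the model's answer."""
    resp = requests.post(
        f"{base_url}/ask",
        json={"prompt": prompt},  # serialized as JSON with the correct Content-Type
        timeout=120,              # local generation can be slow, especially on first use
    )
    resp.raise_for_status()
    return resp.json()["response"]


if __name__ == "__main__":
    print(ask("What is the capital of France?"))
```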
The Docker volume `models` ensures that downloaded models persist across container rebuilds.
The container includes a Docker healthcheck that monitors the API’s availability.
This project is licensed under the MIT License. See LICENSE for details.