This repository contains a Flask-based API service that interacts with the Ollama service for language model processing. It demonstrates:
- A REST API endpoint (`/api/respond`) that returns a complete response.
- A streaming endpoint (`/api/stream`) that returns a chunked response (simulating ChatGPT-style streaming).
- Enhanced Swagger documentation with a custom configuration.
```
├── basic.py                      # Main Flask application with API endpoints and Swagger documentation
├── Dockerfile                    # Dockerfile to containerize the API service
├── docker-compose.yaml           # Docker Compose configuration to run both the Ollama and Alfresco LLM AI services
├── requirements.txt              # Python dependencies
├── start.sh                      # Script to run the Ollama service
├── the-illusion-of-thinking.txt  # Research context and detailed text on reasoning models
└── README.md                     # This file
```
- Docker and Docker Compose installed
- (Optional) Python 3.9 if you want to run the application locally without Docker
- Clone the repository:

  ```bash
  git clone <repository-url>
  cd <repository-directory>
  ```

- Build and run the containers using Docker Compose:

  ```bash
  docker-compose up --build
  ```
  This command will:

  - Build the `alfresco-llm-ai` container
  - Start the Ollama service container
  - Connect both services via a custom Docker network
- Access the API documentation:

  Open your browser and navigate to http://localhost:5000/apidocs to view the detailed Swagger UI documentation.
- REST API endpoint (`/api/respond`) for complete model responses.
- SSE streaming endpoint (`/api/stream`) for real-time model output using Server-Sent Events.
- Dynamic model selection and Ollama endpoint configuration via environment variables in Docker Compose.
You can configure the model and Ollama endpoint dynamically using environment variables in your Docker Compose setup:

- `OLLAMA_MODEL`: Selects the model to use (default: `gemma`).
- `OLLAMA_URL`: Sets the Ollama API endpoint (default: `http://ollama:11434/api/generate`).

Example usage in `docker-compose.yaml`:
```yaml
services:
  alfresco-llm-ai:
    environment:
      - OLLAMA_MODEL=llama2
      - OLLAMA_URL=http://ollama:11434/api/generate
```
In your `docker-compose.yaml`, you can set environment variables for dynamic model and endpoint selection:

```yaml
services:
  ollama:
    environment:
      - OLLAMA_MODEL=gemma
  alfresco-llm-ai:
    environment:
      - OLLAMA_MODEL=gemma
      - OLLAMA_URL=http://ollama:11434/api/generate
```
You can override these values at runtime or in a `.env` file for flexible configuration.
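For reference, a minimal sketch of how `basic.py` can pick these variables up at startup (the defaults mirror the values listed above; the exact code in the repository may differ):

```python
import os

# Defaults match the documented values; both can be overridden via
# Docker Compose environment entries or a .env file.
OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "gemma")
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://ollama:11434/api/generate")
```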
Endpoint: `/api/respond`

- Description: Accepts a JSON payload with `context` and `prompt`, processes the prompt (automatically prepending a professional instruction), and returns a complete response from the model.
- Request Body:

  ```json
  {
    "context": "Your context here.",
    "prompt": "Your prompt here."
  }
  ```

- Responses:
  - 200 OK: Returns the model's full response in JSON format.
  - 400 Bad Request: If the request payload is invalid.
  - 500 Server Error: If there's an error communicating with the Ollama service or processing the request.
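Assuming the endpoint accepts a POST with the JSON body shown above (host and port taken from the default Docker Compose setup), a minimal Python client could look like this; the exact shape of the returned JSON depends on `basic.py`:

```python
import requests

# Call /api/respond and print the complete (non-streamed) model response.
payload = {"context": "Your context here.", "prompt": "Your prompt here."}
resp = requests.post("http://localhost:5000/api/respond", json=payload, timeout=120)
resp.raise_for_status()
print(resp.json())
```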
Endpoint: `/api/stream`

- Description: Streams the model's response chunk-by-chunk in real time using Server-Sent Events (SSE).
- Request Body:

  ```json
  {
    "context": "Your context here.",
    "prompt": "Your prompt here."
  }
  ```

- Responses:
  - 200 OK: Streams text chunks in real time with MIME type `text/plain`.
  - 400 Bad Request: If the request payload is invalid.
  - 500 Server Error: If an error occurs during processing.
- How to use:
  - Use a browser SSE client, a JavaScript `EventSource`, or `curl -N` to consume the stream.
  - Example:

    ```bash
    curl -N -X POST -H "Content-Type: application/json" -d '{"context":"test","prompt":"hello"}' http://localhost:5000/api/stream
    ```
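The same stream can be consumed from Python. A minimal sketch, assuming the chunks arrive as plain text as documented above:

```python
import requests

payload = {"context": "test", "prompt": "hello"}
# stream=True keeps the HTTP connection open so chunks can be read as they arrive.
with requests.post("http://localhost:5000/api/stream", json=payload,
                   stream=True, timeout=300) as resp:
    resp.raise_for_status()
    for chunk in resp.iter_content(chunk_size=None, decode_unicode=True):
        if chunk:
            print(chunk, end="", flush=True)
```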
The Swagger UI is configured with a custom template and configuration to provide enriched documentation. Features include:
- A custom title, description, and version
- Detailed endpoint descriptions including additional response codes
- A customizable layout
To modify these settings, edit the Swagger configuration in `basic.py`, where the `template` and `swagger_config` dictionaries are defined.
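For orientation, a Flasgger setup along these lines is typical; this is a minimal sketch, and the actual title, description, and config values used in `basic.py` may differ:

```python
from flask import Flask
from flasgger import Swagger

app = Flask(__name__)

# Custom title, description, and version shown in the Swagger UI
# (values here are illustrative, not the project's exact metadata).
template = {
    "info": {
        "title": "Alfresco LLM AI API",
        "description": "REST and SSE endpoints backed by Ollama.",
        "version": "1.0.0",
    }
}

# Start from Flasgger's defaults and adjust the layout/route as needed.
swagger_config = Swagger.DEFAULT_CONFIG.copy()
swagger_config["specs_route"] = "/apidocs/"

swagger = Swagger(app, template=template, config=swagger_config)
```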
This project uses an automated CI/CD workflow via GitHub Actions to build and push the Docker image every time changes are pushed to the master branch. The built image is hosted on GitHub Container Registry and can be fetched using a GitHub access token with the required scopes (e.g., `read:packages`).
Package URL: https://github.com/users/kushal-banik-hyland/packages/container/package/alfresco-llm-ai
How to Retrieve the Docker Image:
- Generate a GitHub Personal Access Token with the `read:packages` scope.
- Log in to GitHub Container Registry via Docker:

  ```bash
  echo YOUR_TOKEN | docker login ghcr.io -u YOUR_USERNAME --password-stdin
  ```

- Pull the Docker image:

  ```bash
  docker pull ghcr.io/kushal-banik-hyland/alfresco-llm-ai:latest
  ```
Important: Before running `docker compose up -d` to start your services, ensure you are logged in to GitHub Container Registry by executing the login command above. This is necessary to pull the private image successfully.
- Null Responses:
  - Ensure the URL in the `chat_with_ollama` function is set to use the correct network hostname (i.e., `http://ollama:11434/api/generate`); see the sketch after this list.
- Swagger UI Not Loading:
  - Verify that you are accessing http://localhost:5000/apidocs, and rebuild your containers if changes are not visible.
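For context, `chat_with_ollama` typically boils down to a POST against the Ollama generate endpoint. This is a hedged sketch rather than the exact code in `basic.py`; the payload fields follow Ollama's `/api/generate` API:

```python
import os
import requests

OLLAMA_MODEL = os.environ.get("OLLAMA_MODEL", "gemma")
OLLAMA_URL = os.environ.get("OLLAMA_URL", "http://ollama:11434/api/generate")

def chat_with_ollama(prompt: str) -> str:
    """Send a prompt to Ollama and return the generated text (non-streaming)."""
    payload = {"model": OLLAMA_MODEL, "prompt": prompt, "stream": False}
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    # Ollama's non-streaming /api/generate response carries the text in "response".
    return resp.json().get("response", "")
```

If this function returns null or empty output, the most common cause is an `OLLAMA_URL` that points at `localhost` instead of the `ollama` service hostname on the Docker network.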
Feel free to open issues or submit pull requests if you have suggestions or improvements.
- SSE endpoint is ideal for real-time streaming of model output.
- Model and endpoint selection are fully dynamic via environment variables.
Default values (these can also be placed in a `.env` file, as noted above):

```
OLLAMA_MODEL=gemma
OLLAMA_URL=http://ollama:11434/api/generate
```