This module provides natural language capability for specific features within VA LEAF, such as automatic categorization of IT issue tickets.
It leverages llama.cpp to run self-hosted inference with open models such as Gemma 3.
Prerequisites:
- 6 GB of free RAM
- OCI-compliant container engine such as Docker or Podman
Setup:
- Download a compatible model, such as `gemma-3-4b-it-q4_0.gguf` from Gemma 3
- Navigate to `./docker/cpu` or `./docker/cuda`, depending on CUDA-compatible hardware availability
- Update `./docker/*/docker-compose.yml` with the path to the model
- Start the container (see the sketch below)
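
Starting the container might look like the following; this is a minimal sketch assuming the Docker Compose v2 CLI (`podman compose` behaves the same way):

```sh
# Choose the directory that matches the hardware (CPU-only shown here;
# use ./docker/cuda on hosts with CUDA-compatible GPUs).
cd ./docker/cpu

# docker-compose.yml should already point at the downloaded model
# (e.g. gemma-3-4b-it-q4_0.gguf) from the previous step.
docker compose up -d      # Podman users: podman compose up -d

# Watch the logs until the server reports it is listening.
docker compose logs -f
```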
To quickly check functionality, navigate to http://localhost:8012 (or the relevant hostname) in a browser and send a message.
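
Alternatively, the endpoint can be exercised from the command line. A minimal sketch using curl against the OpenAI-compatible API described below:

```sh
# Send a single chat message to the OpenAI-compatible endpoint.
# If an API key has already been configured (next step), also pass:
#   -H "Authorization: Bearer $LLM_API_KEY"
curl http://localhost:8012/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hello, can you respond?"}]}'
```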
- Generate a secure random key and set it as the environment variable `LLM_API_KEY` (as sketched below)
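
One way to produce a suitable value, sketched with `openssl` (any cryptographically secure random string works; how the value reaches the container depends on the compose setup):

```sh
# Generate a 32-byte random hex string to serve as the shared API key.
export LLM_API_KEY=$(openssl rand -hex 32)

# Keep a copy of this value: the LEAF Agent must be configured with the
# exact same key (see the environment variables below).
echo "$LLM_API_KEY"
```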
LEAF Agent environment variables:
- `LLM_API_KEY` must match the key generated in this configuration
- `LLM_CATEGORIZATION_URL` must match the URL of the llama.cpp OpenAI-compatible Chat Completions API endpoint (e.g. `http://localhost:8012/v1/chat/completions`)
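
For example, the two variables could be exported in the LEAF Agent's environment; the exact mechanism (shell profile, `.env` file, service unit) depends on how the agent is deployed:

```sh
# Must match the key generated for the llama.cpp container above.
export LLM_API_KEY="<generated key>"

# Must point at the llama.cpp OpenAI-compatible Chat Completions endpoint.
export LLM_CATEGORIZATION_URL="http://localhost:8012/v1/chat/completions"
```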