A per-model "reverse proxy" which routes requests to multiple ollama servers.
This is a reverse proxy for ollama. It accepts mainly chat and generation requests, reads each request, and forwards the payload to the server that has been assigned to run the model referred to in the request. Refer to the API section for the list of currently supported endpoints.
Binaries are automatically compiled and made available in the latest GitHub release.
gollamas --level=warn \
--listen 0.0.0.0:11434 \
--proxy=tinyllama=http://server-01:11434 \
--proxy=llama3.2-vision=http://server-01:11434 \
--proxy=deepseek-r1:14b=http://server-02:11434
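Once the proxy is running, clients talk to it exactly as they would to a single ollama server: gollamas reads the model in each request and forwards the payload to the assigned backend. A minimal sketch against the configuration above (the prompt and local port are only illustrative):

```sh
# Routed to http://server-01:11434 because tinyllama is mapped there above.
curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "prompt": "Why is the sky blue?",
  "stream": false
}'
```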
Images are automatically built for amd64, arm, arm64, riscv64, s390x and ppc64le. Issues for other architectures are welcome.
Official images are automatically made available on Docker Hub and ghcr.io. You can run the latest image from either.
The main images are on Docker Hub.
docker run -it \
-e GOLLAMAS_PROXIES="llama3.2-vision=http://server:11434,deepseek-r1:14b=http://server2:11434" \
slawoc/gollamas:latest
Alternatively, images are published to ghcr.io.
docker run -it \
-e GOLLAMAS_PROXIES="llama3.2-vision=http://server:11434,deepseek-r1:14b=http://server2:11434" \
ghcr.io/slawo/gollamas:latest
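Either registry serves the same image. If you want to reach the proxy from the host, publish the listening port as well; a sketch, with the backend hostnames being placeholders:

```sh
docker run -d --name gollamas \
  -p 11434:11434 \
  -e GOLLAMAS_LISTEN="0.0.0.0:11434" \
  -e GOLLAMAS_PROXIES="llama3.2-vision=http://server:11434,deepseek-r1:14b=http://server2:11434" \
  ghcr.io/slawo/gollamas:latest

# Verify the proxy answers on the published port
curl http://localhost:11434/api/version
```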
go run ./*.go --level=trace \
--listen 0.0.0.0:11434 \
--proxy=tinyllama=http://server-02:11434 \
--proxy=llama3.2-vision=http://server-02:11434 \
--proxy=deepseek-r1:14b=http://server-01:11434
Example of a Kubernetes deployment.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gollamas
  namespace: ai
spec:
  replicas: 3
  selector:
    matchLabels:
      name: gollamas
  template:
    metadata:
      labels:
        name: gollamas
    spec:
      containers:
        - name: gollamas
          image: slawoc/gollamas:latest
          ports:
            - name: http
              containerPort: 11434
              protocol: TCP
          env:
            - name: GOLLAMAS_LISTEN
              value: 0.0.0.0:11434
            - name: GOLLAMAS_PROXIES
              value: qwen2.5-coder:14b=http://ollama.ai.svc.cluster.local,gemma3:12b=http://f-01-ai.example.com:11434,llama3.2-vision=http://f-02-ai.example.com:11434
            - name: GOLLAMAS_ALIASES
              value: ""
            - name: GOLLAMAS_LIST_ALIASES
              value: "true"
          resources:
            requests:
              cpu: 100m
              memory: 64Mi
            limits:
              cpu: 500m
              memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
  name: gollamas
  namespace: ai
spec:
  type: LoadBalancer
  selector:
    name: gollamas
  ports:
    - port: 80
      name: http
      targetPort: http
      protocol: TCP
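To check the deployment, you can port-forward to the service and list the models exposed by the proxy. This is only a sketch, assuming the manifests above were applied to the `ai` namespace:

```sh
# Forward local port 11434 to service port 80 (targetPort "http" on the pods)
kubectl -n ai port-forward svc/gollamas 11434:80 &

# One entry per configured model (plus aliases when GOLLAMAS_LIST_ALIASES=true)
curl http://localhost:11434/api/tags
```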
The existing flags should remain fairly stable going forward. If a flag is renamed, a best effort will be made to keep both the new and old names, as well as the existing behaviour, until the final release.
Flag | Env var | Description |
---|---|---|
`--listen` | `GOLLAMAS_LISTEN`, `LISTEN` | address on which the router will be listening, e.g. `localhost:11434` |
`--proxy value` | | assigns a destination for a model; can be a URL or a connection id, e.g. `--proxy 'llama3.2-vision=http://server:11434'` or `--proxy 'llama3.2-vision=c1' --connection c1=http://server:11434` |
`--proxies value` | `GOLLAMAS_PROXIES`, `PROXIES` | assigns destinations for the models as a comma-separated list of model=destination pairs, e.g. `--proxies 'llama3.2-vision=http://server:11434,deepseek-r1:14b=http://server2:11434'` |
`--connection value` | | assigns an identifier to a connection which can be referred to by proxy declarations, e.g. `--connection c1=http://server:11434 --proxy llama=c1` |
`--connections value` | `GOLLAMAS_CONNECTIONS`, `CONNECTIONS` | provides a list of connections which can be referred to by id, e.g. `--connections c1=http://server:11434,c2=http://server2:11434` |
`--alias value` | | assigns an alias to an existing model name passed in the proxy configuration (`alias=concrete_model`), e.g. `--alias gpt-3.5-turbo=llama3.2` |
`--aliases value` | `GOLLAMAS_ALIASES`, `ALIASES` | sets aliases for the given model names, e.g. `--aliases 'gpt-3.5-turbo=llama3.2,deepseek=deepseek-r1:14b'` |
`--list-aliases` | `GOLLAMAS_LIST_ALIASES`, `LIST_ALIASES` | show aliases which match a model when listing models |
You should use the singular flags `--alias`, `--connection` and `--proxy` rather than providing a comma-separated list to the plural flags `--aliases`, `--connections` and `--proxies`.
Usage of the plural flags is discouraged; they were added as a temporary solution to permit passing the associated environment variables in docker containers. Those flags might be removed in future versions, while the environment variables will be retained.
Setting both singular and plural flags will not result in errors, but it will result in undefined behaviour which can change with future versions. Use only one type of flag, preferably the singular versions.
For each option you can set either the flag or the environment variable; setting both will result in undefined behaviour which can change with future versions.
Use the `GOLLAMAS_` prefixed environment variables.
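For example, the following two invocations are intended to be equivalent; the model names and URLs are placeholders:

```sh
# Singular flags
gollamas --listen 0.0.0.0:11434 \
  --proxy tinyllama=http://server-01:11434 \
  --proxy deepseek-r1:14b=http://server-02:11434

# GOLLAMAS_ prefixed environment variables (useful in containers)
GOLLAMAS_LISTEN="0.0.0.0:11434" \
GOLLAMAS_PROXIES="tinyllama=http://server-01:11434,deepseek-r1:14b=http://server-02:11434" \
gollamas
```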
You can assign ids to connections, e.g. `--connection CID1=http://main-ai:11434 --connection CID2=http://mini-ai-01:11434`, and refer to each connection by id when listing the models to be proxied: `--proxy deepseek-r1:70b=CID1 --proxy tinyllama=CID2`.
When a connection is given an id, the id will be used instead of the URL string in any responses or logs.
Since 0.4.1, when multiple models are proxied to the same URL only one connection is created for that URL. It is still possible to create two connections to the same URL using the `--connection` flag (`--connection C1=http://server1 --connection C2=http://server1`).
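Putting this together, here is a sketch of a configuration in which two models share one named connection while a third gets its own connection to the same URL; all names and URLs are placeholders:

```sh
gollamas --listen 0.0.0.0:11434 \
  --connection main=http://server-01:11434 \
  --connection spare=http://server-01:11434 \
  --proxy llama3.2-vision=main \
  --proxy tinyllama=main \
  --proxy deepseek-r1:14b=spare
# llama3.2-vision and tinyllama share the "main" connection;
# deepseek-r1:14b uses "spare", a second connection to the same server.
```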
There are various scenarios this project attempts to resolve. Here is a list of features currently implemented and being considered for implementation:
- Manage models
  - Map model aliases to existing model names (some tools only allow a pre-defined set of models)
  - Set that by default only the configured models are returned when listing models
  - Set a flag to also return models as aliases
  - Set an option to allow requests to currently running models (i.e. a server has an additional model running)
  - Allow access to models currently running on an instance #19
  - Allow multiple routes to a given model #20
- Preload/keep models in memory #22
  - Preload models (ensure the model is loaded upon startup)
  - Ping models (keep the model loaded)
  - Add config to enforce model keep alive globally `"keep_alive": -1` (if it is worth adding functionality for servers without `OLLAMA_KEEP_ALIVE=-1`)
  - Add config to override model keep alive per model/server `"keep_alive": -1`
- Enable fixed context size for models #21 (see the request sketch after this list)
  - Add config to set a default context size (if missing) in each request `"options": { "num_ctx": 4096 }`
  - Add config to set a default context size (if missing) per model/server `"options": { "num_ctx": 4096 }`
  - Add config to enforce context size in each request `"options": { "num_ctx": 4096 }`
  - Add config to enforce context size per model/server `"options": { "num_ctx": 4096 }`
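For context, the `keep_alive` and `options` snippets above refer to fields of the regular ollama request body that gollamas would inject or override. A hand-written request carrying both fields looks like this (the values are only illustrative):

```sh
curl http://localhost:11434/api/generate -d '{
  "model": "tinyllama",
  "prompt": "Summarise this proxy in one sentence.",
  "keep_alive": -1,
  "options": { "num_ctx": 4096 }
}'
```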
Not all endpoints are covered; in particular, endpoints which deal with the customisation and creation of models are not supported until there is a clear use case for them.
- Supported endpoints
  - `GET /`
  - `GET /api/tags`
  - `GET /api/ps`
  - `GET /api/version`
  - `GET /v1/models`
  - `GET /v1/models/:model`
  - `HEAD /`
  - `HEAD /api/tags`
  - `HEAD /api/version`
  - `POST /api/chat`
  - `POST /api/embed`
  - `POST /api/embeddings`
  - `POST /api/generate`
  - `POST /api/pull`
  - `POST /api/show`
  - `POST /v1/chat/completions`
  - `POST /v1/completions`
  - `POST /v1/embeddings`
- Not supported
  - `DELETE /api/delete`
  - `HEAD /api/blobs/:digest`
  - `POST /api/blobs/:digest`
  - `POST /api/copy`
  - `POST /api/create`
  - `POST /api/push`
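Because the OpenAI-compatible routes are proxied as well, an alias such as the `gpt-3.5-turbo=llama3.2` example from the flags table can be exercised through `/v1/chat/completions`. A sketch, assuming that alias is configured:

```sh
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```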
The server relies on existing ollama models and middlewares to speed up the development of the initial implementation.
Only requests which have a `model` (or the deprecated `name`) field are transferred to the right server.
When possible, other endpoints hit all configured servers and either select one answer (e.g. the lowest `version` available) or combine the results into one response (e.g. lists of models).
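As an illustration of the last point, the version and tags endpoints carry no model field, so they fan out to every configured server; the commands below only sketch how that aggregated behaviour can be observed:

```sh
# No model field: gollamas asks every configured server and reports
# the lowest ollama version among them.
curl http://localhost:11434/api/version

# The model lists from all servers are combined into a single response
# (see the feature list above for how configured models and aliases are reported).
curl http://localhost:11434/api/tags
```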