A hackable, modular, containerized inference server for deploying large language models in local or hybrid environments.
See the docs page for full documentation.
- Python 3.10+ (if running locally)
- Docker & Docker Compose, e.g. via Docker Desktop (recommended for containerized usage)
- Poetry (if installing locally)
- Clone the Repository
git clone https://github.com/tmcarmichael/fabricai-inference-server.git
cd fabricai-inference-server
- Download the Model
Suggested: TheBloke/Llama-2-13B-Ensemble-v5-GGUF (https://huggingface.co/TheBloke/Llama-2-13B-Ensemble-v5-GGUF)
Check hardware compatibility first (Hugging Face provides a compatibility check on the model page); if memory is tight, choose a 4-bit or 3-bit quantization instead.
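If you prefer to script the download, here is a minimal Python sketch using the huggingface_hub library. The filename is an assumption taken from the LLM_MODEL value used later in this guide; confirm the exact .gguf name on the model page.

# Sketch: download one GGUF quantization with huggingface_hub (pip install huggingface_hub).
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="TheBloke/Llama-2-13B-Ensemble-v5-GGUF",
    filename="llama-2-13b-ensemble-v5.Q4_K_M.gguf",  # confirm on the model page
    local_dir="/absolute/path/to/your/large-model",  # same directory you will set as LOCAL_MODEL_DIR
)
print(f"Model downloaded to: {local_path}")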
- Configure Model Path
Create a .env file at the project root:
cp .env.example .env
Edit .env to set:
LOCAL_MODEL_DIR=/absolute/path/to/your/large-model
LLM_MODEL=/models/llama-2-13b-ensemble-v5.Q4_K_M.gguf
LOCAL_MODEL_DIR is the host directory that contains your downloaded .gguf file; LLM_MODEL is the path to that file as seen inside the container (the host directory is mounted at /models).
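As an optional sanity check, the short sketch below (assuming python-dotenv is installed and that the .gguf file sits directly in LOCAL_MODEL_DIR) verifies that the paths in .env resolve before you start the stack:

import os
from pathlib import Path
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current directory
model_dir = Path(os.environ["LOCAL_MODEL_DIR"])
# LLM_MODEL is the in-container path; its basename should exist in LOCAL_MODEL_DIR on the host.
model_file = model_dir / Path(os.environ["LLM_MODEL"]).name
print(f"model dir exists: {model_dir.is_dir()}, model file exists: {model_file.is_file()}")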
- Run with Docker
Build & start:
docker-compose up --build
This spins up:
- fabricai-inference-server (FastAPI, uvicorn)
- Redis (for session/conversation memory)
- Test the Server
SSE Endpoint:
curl -N -X POST http://localhost:8000/v1/inference_sse \
-H "Content-Type: application/json" \
-d '{"prompt": "Hello from Docker!"}'
Status:
curl http://localhost:8000/v1/status
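Beyond curl, a small Python client can consume the SSE stream. This is a sketch that assumes the endpoint emits standard "data:" lines; the payload format inside each event depends on the server.

import httpx  # pip install httpx

payload = {"prompt": "Hello from Python!"}
with httpx.stream("POST", "http://localhost:8000/v1/inference_sse", json=payload, timeout=None) as response:
    response.raise_for_status()
    for line in response.iter_lines():
        # Standard SSE frames look like "data: <chunk>"; blank keep-alive lines are skipped.
        if line.startswith("data:"):
            print(line[len("data:"):].strip(), flush=True)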
- [Optional] Local Development Environment without Docker
Install Poetry:
pip install --upgrade poetry
Install Dependencies:
poetry install
Start the Server:
poetry run uvicorn fabricai_inference_server.server:app --host 0.0.0.0 --port 8000
- [Optional] Event-based Streaming
Socket.IO Support: Connect via Socket.IO at ws://localhost:8000 and emit the "inference_prompt" event.
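For example, here is a minimal client sketch using the python-socketio package. The {"prompt": ...} payload shape and the name of the response event are assumptions, so check the server code for the exact contract.

import socketio  # pip install "python-socketio[client]"

sio = socketio.Client()

@sio.event
def connect():
    # Payload shape is an assumption; adjust to match the server's "inference_prompt" handler.
    sio.emit("inference_prompt", {"prompt": "Hello over Socket.IO!"})

@sio.on("*")
def catch_all(event, *args):
    # Catch-all handler (recent python-socketio versions), since the response event name
    # is not documented in this README.
    print(event, args)

# python-socketio clients connect with an http(s) URL; the WebSocket upgrade happens internally.
sio.connect("http://localhost:8000")
sio.sleep(15)
sio.disconnect()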