arvinsingh/MyLLMPlayground
My LLM Playground

Highly configurable API with multiple LLM inference backends and persistent storage.

  1. Start-up
  2. Generation Playground
  3. Assistant Playground
  4. Semantic Search Playground
  5. Tuning Playground
  6. Architecture

1. Start-up

Backend Setup

# Install & activate
> uv sync && source .venv/bin/activate

# Download model
> python scripts/download_model.py transformers google/gemma-3-270m --output-dir ./models/gemma-3-270m

# Start backend
> python main.py --config config.yaml

Frontend Setup

> cd frontend
> npm install
> npm run dev

Backend API: http://localhost:8000 | Docs: http://localhost:8000/docs
Frontend UI: http://localhost:5173

Quick Configuration Examples

Development (config.development.yaml):

database:
  url: "sqlite+aiosqlite:///./data/conversations.db"
  echo: true  # SQL query logging

logging:
  level: "DEBUG"
  enable_console: true
  enable_file: true
  json_format: false

Production (config.production.yaml):

database:
  url: "${DATABASE_URL}"
  echo: false
  pool_size: 20

logging:
  level: "INFO" 
  enable_console: false
  enable_file: true
  json_format: true
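The `json_format: true` option implies a JSON log formatter so each line is machine-parseable. A minimal sketch of the idea, assuming a custom formatter (illustrative, not this project's actual logging code):

```python
import json
import logging

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        })

def make_json_logger(name: str) -> logging.Logger:
    """Attach a JSON-formatting console handler to a logger."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

JSON lines are what log aggregators (e.g. ELK, Loki) expect, which is why it is on in production and off in development.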

Backends for Generation

Backend       Best For                      Features
Transformers  Research, wide compatibility  Quantization, GPU acceleration
llama.cpp     CPU inference, low memory     GGUF format, hybrid CPU/GPU
vLLM          Production, high throughput   PagedAttention, tensor parallel
Ollama        Easy setup, model management  Built-in downloads, streaming
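Backend switching is config-driven; a hypothetical snippet showing the shape of such a section (key names illustrative — check the actual config schema in `config.yaml`):

```yaml
model:
  backend: "llama_cpp"          # one of: transformers, llama_cpp, vllm, ollama
  path: "./models/gemma-3-270m"
  n_gpu_layers: 20              # llama.cpp hybrid CPU/GPU offload
```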

2. Generation Playground

Text Generation Interface

  • React UI (TypeScript, Vite, Tailwind CSS)
  • Text Generation (Text completion interface)
  • Continue and expand writing with AI assistance

API Examples

Simple Generation

# Generation (Content-Type header is required for JSON bodies)
curl -X POST http://localhost:8000/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Hello", "parameters": {"temperature": 0.7}}'

Legacy Endpoint

# Stateless chat
curl -X POST http://localhost:8000/chat \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Hi"}]}'

3. Assistant Playground

Chat Interface

  • Real-time conversations with message history
  • Session-based conversations (Persistent SQL database storage)
  • Conversation Management (List, view, and delete conversation sessions)

Session-Based Chat API

# Start conversation
SESSION_ID=$(curl -X POST http://localhost:8000/conversations/start \
  -H "Content-Type: application/json" \
  -d '{"system_prompt": "You are helpful"}' | jq -r '.session_id')

# Send messages
curl -X POST http://localhost:8000/conversations/$SESSION_ID/message \
  -H "Content-Type: application/json" \
  -d '{"message": "Hello!", "parameters": {"max_tokens": 100}}'

# Get history
curl http://localhost:8000/conversations/$SESSION_ID
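Behind these endpoints, each session holds an ordered list of messages. A simplified sketch of that data model (illustrative only, not the project's actual SQLAlchemy classes):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import uuid4

@dataclass
class Message:
    role: str            # "system", "user", or "assistant"
    content: str
    sequence_number: int
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))

@dataclass
class Conversation:
    session_id: str = field(default_factory=lambda: str(uuid4()))
    messages: list[Message] = field(default_factory=list)

    def add_message(self, role: str, content: str) -> Message:
        # Sequence numbers preserve ordering when history is reloaded from SQL.
        msg = Message(role, content, sequence_number=len(self.messages))
        self.messages.append(msg)
        return msg
```

The explicit `sequence_number` (also present in the database schema below) means ordering never depends on timestamp resolution or insertion order.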

Backend

  • Multiple Backends (Transformers, llama.cpp, vLLM, Ollama)
  • Dual Modes (text completion + assistant chat)
  • Logging (File rotation, JSON format, configurable levels)
  • Plug & Play (Config-driven model/backend switching)

4. Semantic Search Playground

Knowledge Base (RAG)

  • Retrieval-Augmented Generation (RAG) with semantic search via ChromaDB
  • Document Processing (PDF, Word, TXT file upload and chunking)
  • Semantic search over documents and conversations
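Chunking splits uploaded documents into embedding-sized pieces with some overlap so context is not lost at boundaries. A minimal fixed-size sketch of the idea (the project's actual chunker and parameters may differ):

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap  # step forward, keeping `overlap` chars shared
    return chunks
```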

RAG API Endpoints

# Upload a document
curl -X POST http://localhost:8000/rag/upload \
  -F "file=@document.pdf"

# Search documents and conversations
curl -X POST http://localhost:8000/rag/search \
  -H "Content-Type: application/json" \
  -d '{"query": "machine learning concepts", "limit": 10}'

# List uploaded documents
curl http://localhost:8000/rag/documents
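ChromaDB handles embedding storage and retrieval for `/rag/search`; the core ranking idea is cosine similarity between the query embedding and each chunk embedding, sketched here with toy vectors (illustrative only):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec: list[float], docs: dict[str, list[float]], limit: int = 10) -> list[str]:
    """Return document ids ranked by similarity to the query vector."""
    ranked = sorted(docs, key=lambda d: cosine_similarity(query_vec, docs[d]), reverse=True)
    return ranked[:limit]
```

In the real pipeline the vectors come from an embedding model and ChromaDB does this ranking internally; the `limit` parameter maps to the `"limit"` field in the search request above.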

5. Tuning Playground (TBD)

Current plan

Frontend:

  1. Dataset uploader
  2. Training config selector (LoRA vs QLoRA vs full)
  3. Progress dashboard

Backend:

  1. Job queue (FastAPI background tasks / Celery)
  2. Training pipeline (Hugging Face transformers, peft, accelerate)
  3. Model registry with versioning

Serving:

  1. Dynamically load fine-tuned models
  2. Let user switch between base and tuned versions

A dedicated tab where users can:

  1. Upload a few examples
  2. Choose a tuning method
  3. Preview model behavior before and after

The goal is to give users a sandbox for testing model behavior.
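The model registry with versioning (item 3 of the backend plan) could start as a mapping from model name to an ordered list of artifacts. A minimal in-memory sketch (names hypothetical; a real registry would persist to the database):

```python
from dataclasses import dataclass, field

@dataclass
class ModelRegistry:
    """Track fine-tuned model versions so serving can switch between them."""
    _versions: dict[str, list[str]] = field(default_factory=dict)

    def register(self, name: str, artifact_path: str) -> int:
        """Add a new version; returns its 1-based version number."""
        versions = self._versions.setdefault(name, [])
        versions.append(artifact_path)
        return len(versions)

    def latest(self, name: str) -> str:
        return self._versions[name][-1]

    def get(self, name: str, version: int) -> str:
        return self._versions[name][version - 1]
```

Keeping every version addressable is what lets the serving layer offer "base vs tuned" switching and before/after previews.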

6. Architecture

Backend Structure

├── llm_playground/
│   ├── api/           # FastAPI endpoints
│   ├── backends/      # Transformers, llama.cpp, vLLM, Ollama
│   ├── core/          # Services, SQL conversation management, RAG
│   ├── models/        # Pydantic schemas, SQLAlchemy database models
│   └── config/        # Configuration management
├── scripts/           # Download models, run modes
├── data/              # SQLite database, ChromaDB vector store
└── logs/              # Application & access logs

Frontend Structure

frontend/
├── src/
│   ├── components/    # React components
│   │   ├── chat/     # Chat interface components
│   │   ├── conversations/ # Conversation management
│   │   ├── generation/    # Text generation interface
│   │   ├── layout/        # Layout components (Sidebar)
│   │   ├── rag/          # RAG/Knowledge base components
│   │   ├── settings/      # Settings and system info
│   │   └── ui/           # Reusable UI components
│   ├── lib/          # API client and utilities
│   ├── types/        # TypeScript type definitions
│   └── App.tsx       # Main application component
├── public/           # Static assets
└── package.json      # Dependencies and scripts

Database Configuration

Development (SQLite - Default)

# config.yaml
database:
  url: "sqlite+aiosqlite:///./data/conversations.db"
  echo: false

Environment Variables

export DATABASE_URL="postgresql+asyncpg://user:password@host:5432/db"
python main.py

Production (PostgreSQL)

# config.production.yaml
database:
  url: "${DATABASE_URL}"
  pool_size: 20
  max_overflow: 30
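The `${DATABASE_URL}` placeholder implies environment-variable expansion at config load time. One common way to implement it (a sketch, not necessarily this project's mechanism):

```python
import os
import re

_PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")

def expand_env(value: str) -> str:
    """Replace ${VAR} placeholders in a config value with environment variables."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name not in os.environ:
            raise KeyError(f"environment variable {name} is not set")
        return os.environ[name]
    return _PLACEHOLDER.sub(repl, value)
```

Failing loudly on a missing variable is deliberate: a silently empty database URL is much harder to debug than a startup error.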

Database Storage

SQLite (Development)

  • Location: ./data/conversations.db (project directory)
  • Access: sqlite3 ./data/conversations.db
  • Benefits: Zero setup, portable, version-controllable

PostgreSQL (Production)

  • Location: External database server or Docker container
  • Access: Standard PostgreSQL tools (psql, pgAdmin)
  • Benefits: Concurrent users, ACID transactions, full-text search

Database Schema

-- Conversations table
CREATE TABLE conversations (
    session_id VARCHAR(36) PRIMARY KEY,
    created_at TIMESTAMP NOT NULL,
    last_updated TIMESTAMP NOT NULL,
    system_prompt TEXT,
    model_name VARCHAR(255),
    backend VARCHAR(50),
    last_parameters JSON,
    message_count INTEGER NOT NULL DEFAULT 0
);

-- Messages table
CREATE TABLE messages (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR(36) REFERENCES conversations(session_id),
    role VARCHAR(20) NOT NULL,
    content TEXT NOT NULL,
    timestamp TIMESTAMP NOT NULL,
    sequence_number INTEGER NOT NULL
);

-- RAG Documents table
CREATE TABLE documents (
    id SERIAL PRIMARY KEY,
    filename VARCHAR(255) NOT NULL,
    content_hash VARCHAR(64) NOT NULL UNIQUE,
    file_size INTEGER NOT NULL,
    document_metadata JSON,
    created_at TIMESTAMP NOT NULL,
    updated_at TIMESTAMP NOT NULL
);

-- RAG Document Chunks table
CREATE TABLE document_chunks (
    id SERIAL PRIMARY KEY,
    document_id INTEGER REFERENCES documents(id),
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    chunk_metadata JSON,
    created_at TIMESTAMP NOT NULL
);

-- RAG Conversation Chunks table (for searchable chat history)
CREATE TABLE conversation_chunks (
    id SERIAL PRIMARY KEY,
    session_id VARCHAR(36) REFERENCES conversations(session_id),
    chunk_index INTEGER NOT NULL,
    content TEXT NOT NULL,
    chunk_metadata JSON,
    created_at TIMESTAMP NOT NULL
);
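The DDL above uses PostgreSQL types (SERIAL, JSON). On the SQLite development database, equivalent tables look like this (a hand-adapted sketch of the conversations/messages pair; in the project itself SQLAlchemy generates the real schema):

```python
import sqlite3

def create_schema(conn: sqlite3.Connection) -> None:
    """SQLite equivalent of the conversations and messages tables."""
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS conversations (
            session_id TEXT PRIMARY KEY,
            created_at TEXT NOT NULL,
            last_updated TEXT NOT NULL,
            system_prompt TEXT,
            model_name TEXT,
            backend TEXT,
            last_parameters TEXT,  -- JSON stored as text
            message_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS messages (
            id INTEGER PRIMARY KEY AUTOINCREMENT,  -- SERIAL equivalent
            session_id TEXT REFERENCES conversations(session_id),
            role TEXT NOT NULL,
            content TEXT NOT NULL,
            timestamp TEXT NOT NULL,
            sequence_number INTEGER NOT NULL
        );
    """)
```

Using one schema definition (via SQLAlchemy models) for both databases is what makes the SQLite-to-PostgreSQL switch a pure configuration change.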

Deployment

Option 1: Docker with PostgreSQL

# Use docker-compose for full stack
docker-compose up -d

Option 2: Managed Database

# Use cloud PostgreSQL (AWS RDS, Google Cloud SQL, etc.)
export DATABASE_URL="postgresql+asyncpg://user:pass@db-host:5432/llm_playground"
python main.py --config config.production.yaml

Option 3: SQLite (Small Scale)

# Single-server deployments
python main.py --config config.yaml

About

Multi-backend LLM playground.
