
Sorai LLM Proxy Gateway


Sorai provides a unified HTTP API for accessing multiple AI model providers. Built in Rust, it is a lightweight, high-performance, open-source LLM proxy gateway. Acting as the central orchestrator between your clients and upstream LLMs, Sorai handles every request and response with precision. With uniform endpoints for text and chat completions, smart fallback logic, and full observability, Sorai turns client-to-LLM interactions into a seamless, well-managed experience.

Key Features

  • 🚀 High Performance: Leverages Rust's speed and memory safety for low-latency, high-throughput proxying.
  • 🔌 Multi-Provider Support: Seamlessly connects to OpenAI, Anthropic, Azure OpenAI, AWS Bedrock, Cohere, and Google Vertex AI.
  • ⚡ Flexible Integration: Minimal configuration required for various LLM backends.
  • 📊 Built-in Monitoring: Prometheus metrics and comprehensive observability.
  • 🛠️ Developer-Friendly: Simple setup, clear documentation, and extensible design.
  • 🔄 Fallback Support: Automatic failover between providers for reliability.
  • 🌐 CORS Support: Configurable Cross-Origin Resource Sharing.
  • 📝 Structured Logging: Configurable logging with rotation and timestamps.
  • 🐳 Docker Ready: Container support with multi-platform builds.
  • 📈 Scalable Architecture: Connection pooling and request timeout handling.
  • 📝 Open Source: Licensed under Apache 2.0.

Supported Providers

| Provider         | Key       | Configuration Section | Status |
| ---------------- | --------- | --------------------- | ------ |
| OpenAI           | openai    | [openai]              | ✅     |
| Anthropic        | anthropic | [anthropic]           | ⏳     |
| Azure OpenAI     | azure     | [azure_openai]        | ⏳     |
| AWS Bedrock      | bedrock   | [bedrock]             | ⏳     |
| Cohere           | cohere    | [cohere]              | ⏳     |
| Google Vertex AI | vertex    | [vertex]              | ⏳     |

Status legend: ✅ implemented, ⏳ in progress.

Getting Started

Prerequisites

  • Rust: Ensure you have Rust installed (version 1.87 or later). Install via rustup (see the commands below)
  • Git: Required to clone the repository
  • API Keys: Valid API keys for your chosen LLM providers
  • Optional: Docker for containerized deployment
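
If Rust is not yet installed, the standard rustup one-liner takes care of it; afterwards, confirm the toolchain meets the 1.87 MSRV:

# Install Rust via rustup (from https://rustup.rs)
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Confirm the toolchain version is 1.87 or later
rustc --version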

Installation

  1. Clone the repository:

git clone https://github.com/riipandi/sorai.git && cd sorai

  2. Build the project:

# Using cargo directly
cargo build --release

# Or using just (recommended)
just build

  3. Set up configuration:

# Copy example configuration
cp config.toml.example config.toml

# Edit with your API keys and settings
nano config.toml

Configuration

Create your config.toml file based on the provided config.toml.example.
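
As a rough sketch, a minimal config.toml might pair a server block with per-provider sections like [openai] from the table above. The key names here are assumptions for illustration; the authoritative names live in config.toml.example:

# Hypothetical sketch -- check config.toml.example for the real key names
[server]
host = "0.0.0.0"   # assumed key
port = 8000        # matches the default base URL used throughout this README

[openai]
api_key = "sk-..."  # assumed key name; use your real OpenAI key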

Running the Server

# Using cargo
cargo run -- serve

# Using just (with auto-reload for development)
just dev

# Using built binary
./target/release/sorai serve

# With custom config path
./target/release/sorai serve --config /path/to/config.toml

API Endpoints

Sorai provides OpenAI-compatible API endpoints:

  • POST /v1/chat/completions - Chat completions with conversation context
  • POST /v1/text/completions - Simple text completions
  • GET /metrics - Prometheus metrics for monitoring

Base URL

http://localhost:8000
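
Because the endpoints are OpenAI-compatible, a quick smoke test works with plain curl. The model name and Authorization header below are placeholders for whatever your configured provider expects:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'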

Performance Benchmarks

Here are example benchmarks using the oha HTTP load generator:

Health Check Endpoint Performance

Test Configuration:

  • Concurrent Users: 100
  • Total Requests: 2,500
  • Target: GET /healthz
  • Environment: local development server

oha -n 2500 -c 100 --latency-correction http://localhost:8000/healthz

Example Results:

Summary:
  Success rate: 100.00%
  Total:        0.0886 secs
  Slowest:      0.0078 secs
  Fastest:      0.0001 secs
  Average:      0.0034 secs
  Requests/sec: 28203.6793

  Total data:   183.11 KiB
  Size/request: 75 B
  Size/sec:     2.02 MiB

Response time histogram:
  0.000 [1]   |
  0.001 [103] |β– β– β– β– β– β– 
  0.002 [225] |β– β– β– β– β– β– β– β– β– β– β– β– β– 
  0.002 [308] |β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– 
  0.003 [453] |β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– 
  0.004 [539] |β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– 
  0.005 [442] |β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– β– 
  0.005 [244] |β– β– β– β– β– β– β– β– β– β– β– β– β– β– 
  0.006 [130] |β– β– β– β– β– β– β– 
  0.007 [45]  |β– β– 
  0.008 [10]  |

Load Testing Different Scenarios

# Light load - 50 concurrent users, 1000 requests
oha -n 1000 -c 50 http://localhost:8000/healthz

# Medium load - 100 concurrent users, 2500 requests with 10s duration
oha -z 10s -n 2500 -c 100 http://localhost:8000/healthz

# Heavy load - 500 concurrent users, 10000 requests with 10s duration
oha -z 10s -n 10000 -c 500 http://localhost:8000/healthz

# Sustained load test - 30 seconds duration
oha -c 100 -z 30s http://localhost:8000/healthz

API Endpoint Benchmarks

For testing actual LLM proxy endpoints:

# Test chat completions endpoint (requires valid API key)
oha -n 100 -c 10 -m POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}' \
  http://localhost:8000/v1/chat/completions

Performance Notes:

  • Performance of the actual LLM endpoints depends on upstream provider latency
  • Connection pooling and keep-alive significantly improve throughput (see the comparison below)
  • Memory usage remains stable under high concurrent load
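
To see the keep-alive effect from the notes above, run the same load twice; this assumes your oha build supports the --disable-keepalive flag:

# Baseline: connections are reused (oha's default)
oha -n 2500 -c 100 http://localhost:8000/healthz

# Comparison: open a new connection for every request
oha -n 2500 -c 100 --disable-keepalive http://localhost:8000/healthz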

Documentation

For detailed documentation, see the guides included in the repository.

Monitoring

Sorai provides comprehensive monitoring through Prometheus metrics at the /metrics endpoint (a quick curl check follows the list), including:

  • Request counts by provider, model, and status
  • Request latency histograms
  • Token usage statistics
  • Error rates and types
  • Connection pool statistics
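
A quick way to eyeball these series locally is to hit the endpoint directly; no metric names are assumed here, head just trims the output:

# Fetch the raw Prometheus exposition output
curl -s http://localhost:8000/metrics | head -n 20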

Docker Support

Sorai includes full Docker support with multi-platform builds:

# Build Docker image
just docker-build

# Run with Docker
just docker-run serve

# Using Docker Compose
just compose-up
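
The just recipes wrap standard Docker commands; if just is unavailable, a plain-Docker equivalent along these lines should work (the image tag, port, and config mount path are assumptions, so check the Justfile and Dockerfile for the real values):

# Build the image locally (tag is illustrative)
docker build -t sorai .

# Run it, exposing the default port and mounting a local config
docker run --rm -p 8000:8000 \
  -v "$(pwd)/config.toml:/app/config.toml" \
  sorai serve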

Contributing

We welcome contributions to make Sorai even better!

  • Read our Contributing Guidelines before opening a pull request
  • Fork the repository and create a feature branch
  • Submit a pull request with a clear title and description
  • Join the discussion on GitHub Issues

Join the flow. Amplify your AI-powered applications with Sorai! 🚀

Why "Sorai"?

Inspired by the Indonesian term for "joyous uproar", Sorai captures the essence of lively connection. More than just a proxy, it's a seamless bridge that elevates your AI workflows with speed and reliability.

License

Sorai is licensed under the Apache License 2.0. See the LICENSE file for more information.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you shall be licensed under the Apache License 2.0, without any additional terms or conditions.

Copyrights in this project are retained by their contributors.


🤫 Psst! If you like my work, you can support me via GitHub Sponsors.
