Sorai is a lightweight, high-performance, open-source LLM proxy gateway built in Rust. It provides a unified HTTP API for tapping into multiple AI model providers, acting as the central orchestrator for every request and response. With uniform endpoints for text and chat completions, smart fallback logic, and full observability, Sorai turns client-to-LLM interactions into a seamless, elegantly managed experience.
- High Performance: Leverages Rust's speed and memory safety for low-latency, high-throughput proxying.
- Multi-Provider Support: Seamlessly connects to OpenAI, Anthropic, AWS Bedrock, Cohere, and more.
- Flexible Integration: Minimal configuration required for various LLM backends.
- Built-in Monitoring: Prometheus metrics and comprehensive observability.
- Developer-Friendly: Simple setup, clear documentation, and extensible design.
- Fallback Support: Automatic failover between providers for reliability.
- CORS Support: Configurable Cross-Origin Resource Sharing.
- Structured Logging: Configurable logging with rotation and timestamps.
- Docker Ready: Container support with multi-platform builds.
- Scalable Architecture: Connection pooling and request timeout handling.
- Open Source: Licensed under Apache 2.0.
| Provider | Key | Configuration Section | Status |
|---|---|---|---|
| OpenAI | `openai` | `[openai]` | Supported |
| Anthropic | `anthropic` | `[anthropic]` | Planned |
| Azure OpenAI | `azure` | `[azure_openai]` | Planned |
| AWS Bedrock | `bedrock` | `[bedrock]` | Planned |
| Cohere | `cohere` | `[cohere]` | Planned |
| Google Vertex AI | `vertex` | `[vertex]` | Planned |
- Rust: Ensure you have Rust installed (version 1.87 or later). Install via rustup; you can verify your toolchain as shown after this list
- Git: Required to clone the repository
- API Keys: Valid API keys for your chosen LLM providers
- Optional: Docker for containerized deployment
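To confirm the toolchain meets the 1.87 minimum from the Rust item above:

```shell
# Both should report version 1.87 or later
rustc --version
cargo --version
```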
- Clone the repository:

```shell
git clone https://github.com/riipandi/sorai.git && cd sorai
```

- Build the project:

```shell
# Using cargo directly
cargo build --release

# Or using just (recommended)
just build
```
- Set up configuration:

```shell
# Copy example configuration
cp config.toml.example config.toml

# Edit with your API keys and settings
nano config.toml
```

Create your `config.toml` file based on the provided `config.toml.example`.
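As a rough illustration, enabling a provider means filling in its section from the provider table above. The sketch below is hypothetical; the field names are assumptions, so treat `config.toml.example` as the source of truth:

```toml
# Hypothetical sketch only; follow config.toml.example for the real field names.
[openai]
api_key = "sk-your-openai-key"   # credentials for the OpenAI provider
```

With the configuration in place, start the server using any of the commands below.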
```shell
# Using cargo
cargo run -- serve

# Using just (with auto-reload for development)
just dev

# Using built binary
./target/release/sorai serve

# With custom config path
./target/release/sorai serve -config /path/to/config.toml
```
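Once the server is up, you can smoke-test it against the health endpoint; the `/healthz` path and default port `8000` are the same ones used in the benchmarks below:

```shell
# A healthy local instance should answer this request successfully
curl -i http://localhost:8000/healthz
```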
Sorai provides OpenAI-compatible API endpoints:

- `POST /v1/chat/completions` - Chat completions with conversation context
- `POST /v1/text/completions` - Simple text completions
- `GET /metrics` - Prometheus metrics for monitoring

By default the server listens at `http://localhost:8000`.
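For example, a chat completion can be requested with curl. The bearer token and model name are the same placeholders used in the load-testing example further down; substitute values that match your configured provider:

```shell
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}'
```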
Here are example benchmarks using the oha HTTP load generator:

Test Configuration:

- Concurrent Users: 100
- Total Requests: 2,500
- Target: `GET /healthz`
- Environment: local development server

```shell
oha -n 2500 -c 100 --latency-correction http://localhost:8000/healthz
```
Example Results:

```
Summary:
  Success rate: 100.00%
  Total:        0.0886 secs
  Slowest:      0.0078 secs
  Fastest:      0.0001 secs
  Average:      0.0034 secs
  Requests/sec: 28203.6793

  Total data:   183.11 KiB
  Size/request: 75 B
  Size/sec:     2.02 MiB

Response time histogram:
  0.000 [1]   |
  0.001 [103] |■■■■■■
  0.002 [225] |■■■■■■■■■■■■■
  0.002 [308] |■■■■■■■■■■■■■■■■■■
  0.003 [453] |■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.004 [539] |■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.005 [442] |■■■■■■■■■■■■■■■■■■■■■■■■■■
  0.005 [244] |■■■■■■■■■■■■■■
  0.006 [130] |■■■■■■■
  0.007 [45]  |■■
  0.008 [10]  |
```
```shell
# Light load - 50 concurrent users, 1000 requests
oha -n 1000 -c 50 http://localhost:8000/healthz

# Medium load - 100 concurrent users, 2500 requests with 10s duration
oha -z 10s -n 2500 -c 100 http://localhost:8000/healthz

# Heavy load - 500 concurrent users, 10000 requests
oha -z 10s -n 10000 -c 500 http://localhost:8000/healthz

# Sustained load test - 30 seconds duration
oha -c 100 -z 30s http://localhost:8000/healthz
```
For testing actual LLM proxy endpoints:
```shell
# Test chat completions endpoint (requires valid API key)
oha -n 100 -c 10 -m POST \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer your-api-key" \
  -d '{"model":"gpt-4o-mini","messages":[{"role":"user","content":"Hello"}]}' \
  http://localhost:8000/v1/chat/completions
```
Performance Notes:

- Performance of the actual LLM endpoints depends on upstream provider latency
- Connection pooling and keep-alive significantly improve throughput
- Memory usage remains stable under high concurrent load
For detailed documentation, see:
- HTTP Transport Documentation - Complete API reference
- OpenAPI Specification - Machine-readable API spec
- Example Requests - Sample requests using httl
Sorai provides comprehensive monitoring through Prometheus metrics at the `/metrics` endpoint, including:
- Request counts by provider, model, and status
- Request latency histograms
- Token usage statistics
- Error rates and types
- Connection pool statistics
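To take a quick look at the raw Prometheus exposition output on a local instance:

```shell
# Dump the metrics currently exposed by Sorai
curl -s http://localhost:8000/metrics
```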
Sorai includes full Docker support with multi-platform builds:
```shell
# Build Docker image
just docker-build

# Run with Docker
just docker-run serve

# Using Docker Compose
just compose-up
```
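If you prefer invoking Docker directly instead of going through `just`, a rough equivalent might look like the sketch below; the image tag and the in-container config path are assumptions, so check the repository's Dockerfile and justfile for the exact values:

```shell
# Build a local image from the repository root (tag name is illustrative)
docker build -t sorai:local .

# Run it with your config mounted and the default port published
docker run --rm -p 8000:8000 \
  -v "$(pwd)/config.toml:/app/config.toml" \
  sorai:local serve
```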
We welcome contributions to make Sorai even better!
- Read our Contributing Guidelines before getting started
- Fork the repository and create a feature branch (a typical git flow is sketched after this list)
- Submit a pull request with a clear title and description
- Join the discussion on GitHub Issues
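The usual GitHub flow applies; the commands below are a generic sketch, with the fork URL and branch name as placeholders:

```shell
# Clone your fork and work on a dedicated branch (names are illustrative)
git clone https://github.com/<your-username>/sorai.git
cd sorai
git checkout -b my-feature

# ...make and commit your changes...

# Push the branch, then open a pull request on GitHub
git push -u origin my-feature
```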
Join the flow. Amplify your AI-powered applications with Sorai!
Inspired by the Indonesian term for "joyous uproar", Sorai captures the essence of lively connection. More than just a proxy, it's a seamless bridge that elevates your AI workflows with speed and reliability.
Sorai is licensed under the Apache License 2.0. See the LICENSE file for more information.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you shall be licensed under the Apache License 2.0, without any additional terms or conditions.
Copyrights in this project are retained by their contributors.
Psst! If you like my work, you can support me via GitHub Sponsors.