The fastest, lightest, and easiest-to-integrate AI Gateway on the market.
Built by the team at Helicone, open-sourced for the community.
🚀 Quick Start • 📖 Docs • 💬 Discord • 🌐 Website
Open-source, lightweight, and built on Rust.
Handle hundreds of models and millions of LLM requests with minimal latency and maximum reliability.
The NGINX of LLMs.
from openai import OpenAI
client = OpenAI(
api_key="YOUR_HELICONE_API_KEY",
base_url="https://ai-gateway.helicone.ai/ai",
)
completion = client.chat.completions.create(
model="openai/gpt-4o-mini", # or 100+ models
messages=[
{
"role": "user",
"content": "Hello, how are you?"
}
]
)
For custom config, check out our configuration guide and the providers we support.
Request any LLM provider using familiar OpenAI syntax. Stop rewriting integrations—use one API for OpenAI, Anthropic, Google, AWS Bedrock, and 20+ more providers.
Smart Routing to always hit the fastest, cheapest, or most reliable option, while staying aware of provider uptime and your rate limits. Built-in strategies include model-based latency routing (fastest model), provider latency-based P2C + PeakEWMA (fastest provider), weighted distribution (traffic split by configured weights), and cost optimization (cheapest option). See the configuration sketch after this feature list.
Rate limit to prevent runaway costs and usage abuse. Set limits per user, team, or globally with support for request counts, token usage, and dollar amounts.
Cache responses to reduce costs and latency by up to 95%. Supports Redis and S3 backends with intelligent cache invalidation.
Monitor performance and debug issues with built-in Helicone integration, plus OpenTelemetry support for logs, metrics, and traces.
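These features are all driven by the router's YAML configuration. As a quick preview, here is a minimal sketch that combines caching, load balancing, and rate limiting; the field names mirror the sample config.yaml shown in the self-hosting section below, and the configuration reference remains the source of truth for the full schema and defaults.

```yaml
# Minimal sketch of a router config combining the features above.
# Field names mirror the sample config.yaml later in this README;
# see the configuration reference for the full schema and defaults.
cache-store:
  type: in-memory        # Redis and S3 backends are also supported (their settings are not shown here)

global:
  cache:
    directive: "max-age=3600, max-stale=1800"

routers:
  your-router-name:
    load-balance:
      chat:
        strategy: model-latency      # route to the fastest model in the list
        models:
          - openai/gpt-4o-mini
          - anthropic/claude-3-7-sonnet
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m         # 1000 requests per minute
```

Save it as `config.yaml` and start the gateway with `npx @helicone/ai-gateway@latest --config config.yaml`, as shown in the self-hosting walkthrough below.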
Use our cloud-hosted AI Gateway, or deploy it to your own infrastructure in seconds with Docker or any of our deployment guides here.
| Metric | Helicone AI Gateway | Typical Setup |
|---|---|---|
| P95 Latency | <5ms | ~60-100ms |
| Memory Usage | ~64MB | ~512MB |
| Requests/sec | ~3,000 | ~500 |
| Binary Size | ~30MB | ~200MB |
| Cold Start | ~100ms | ~2s |
Note: See benchmarks/README.md for detailed benchmarking methodology and results.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Your App │───▶│ Helicone AI │───▶│ LLM Providers │
│ │ │ Gateway │ │ │
│ OpenAI SDK │ │ │ │ • OpenAI │
│ (any language) │ │ • Load Balance │ │ • Anthropic │
│ │ │ • Rate Limit │ │ • AWS Bedrock │
│ │ │ • Cache │ │ • Google Vertex │
│ │ │ • Trace │ │ • 20+ more │
│ │ │ • Fallbacks │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ Helicone │
│ Observability │
│ │
│ • Dashboard │
│ • Observability │
│ • Monitoring │
│ • Debugging │
└─────────────────┘
For the cloud-hosted router, we provide a configuration wizard in the UI to help you set up your router without any YAML engineering.
For complete reference of our configuration options, check out our configuration reference and the providers we support.
from openai import OpenAI
client = OpenAI(
- api_key=os.getenv("OPENAI_API_KEY")
+ api_key="placeholder-api-key", # Gateway handles API keys
+ base_url="http://localhost:8080/router/your-router-name"
)
response = client.chat.completions.create(
- model="gpt-4o-mini",
+ model="openai/gpt-4o-mini", # or 100+ models
messages=[{"role": "user", "content": "Hello!"}]
)
import { OpenAI } from "openai";
const client = new OpenAI({
- apiKey: process.env.OPENAI_API_KEY,
+ apiKey: "placeholder-api-key", // Gateway handles API keys
+ baseURL: "http://localhost:8080/router/your-router-name",
});
const response = await client.chat.completions.create({
- model: "gpt-4o",
+ model: "openai/gpt-4o",
messages: [{ role: "user", content: "Hello from Helicone AI Gateway!" }],
});
This option might be best for you if you are extremely latency-sensitive, or if you want to avoid a cloud offering and would prefer to self-host the gateway.
- Set up your `.env` file with your `PROVIDER_API_KEY`s:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
- Run locally in your terminal:
npx @helicone/ai-gateway@latest
- Make your requests using any OpenAI SDK:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/ai",
# Gateway handles API keys, so this only needs to be
# set to a valid Helicone API key if authentication is enabled.
api_key="placeholder-api-key"
)
# Route to any LLM provider through the same interface, we handle the rest.
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet", # Or other 100+ models..
messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
That's it. No new SDKs to learn, no integrations to maintain. Fully-featured and open-sourced.
For custom config, check out our configuration guide and the providers we support.
If you are self-hosting the gateway and would like to configure different routing strategies, follow the steps below:
Include your `PROVIDER_API_KEY`s in your `.env` file. If you would like to enable authentication, set the `HELICONE_CONTROL_PLANE_API_KEY` variable as well.
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
HELICONE_CONTROL_PLANE_API_KEY=sk-...
Note: This is a sample `config.yaml` file. Please refer to our configuration guide for the full list of options, examples, and defaults.
See our full provider list here.
helicone: # Include your HELICONE_API_KEY in your .env file
features: all
cache-store:
type: in-memory
global: # Global settings for all routers
cache:
directive: "max-age=3600, max-stale=1800"
routers:
your-router-name: # Single router configuration
load-balance:
chat:
strategy: model-latency
models:
- openai/gpt-4o-mini
- anthropic/claude-3-7-sonnet
rate-limit:
per-api-key:
capacity: 1000
refill-frequency: 1m # 1000 requests per minute
npx @helicone/ai-gateway@latest --config config.yaml
from openai import OpenAI
import os
helicone_api_key = os.getenv("HELICONE_API_KEY")
client = OpenAI(
base_url="http://localhost:8080/router/your-router-name",
api_key=helicone_api_key
)
# Route to any LLM provider through the same interface, we handle the rest.
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet", # Or other 100+ models..
messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
For a complete guide on self-hosting options, including Docker deployment, Kubernetes, and cloud platforms, see our deployment guides.
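For example, if you deploy with Docker Compose, a setup along these lines can serve as a starting point. This is a sketch only: the image name and tag, the in-container config path, and the command flags are assumptions, so verify them against the deployment guides before relying on it.

```yaml
# Illustrative Docker Compose sketch. The image name/tag, mount path, and
# command flags are assumptions; confirm them in the deployment guides.
services:
  ai-gateway:
    image: helicone/ai-gateway:latest   # assumed image name and tag
    ports:
      - "8080:8080"                     # matches the base_url used in the examples above
    env_file:
      - .env                            # PROVIDER_API_KEYs and optional HELICONE_CONTROL_PLANE_API_KEY
    volumes:
      - ./config.yaml:/app/config.yaml  # assumed in-container path
    command: ["--config", "/app/config.yaml"]
```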
- 📖 Full Documentation - Complete guides and API reference
- 🚀 Quickstart Guide - Get up and running in 1 minute
- 🔬 Advanced Configurations - Configuration reference & examples
- 💬 Discord Server - Our community of passionate AI engineers
- 🐙 GitHub Discussions - Q&A and feature requests
- 🐦 Twitter - Latest updates and announcements
- 📧 Newsletter - Tips and tricks to deploying AI applications
- 🎫 Report bugs: GitHub Issues
- 💼 Enterprise Support: Book a discovery call with our team
The Helicone AI Gateway is licensed under the Apache License - see the LICENSE file for details.
Made with ❤️ by Helicone.