The fastest, lightest, and easiest-to-integrate AI Gateway on the market.
Built by the team at Helicone, open-sourced for the community.
🚀 Quick Start • 📖 Docs • 💬 Discord • 🌐 Website
Open-source, lightweight, and built on Rust.
Handle hundreds of models and millions of LLM requests with minimal latency and maximum reliability.
The NGINX of LLMs.
from openai import OpenAI
client = OpenAI(
api_key="YOUR_HELICONE_API_KEY",
base_url="https://ai-gateway.helicone.ai/ai",
)
completion = client.chat.completions.create(
model="openai/gpt-4o-mini", # or 100+ models
messages=[
{
"role": "user",
"content": "Hello, how are you?"
}
]
)
For custom config, check out our configuration guide and the providers we support.
Request any LLM provider using familiar OpenAI syntax. Stop rewriting integrations—use one API for OpenAI, Anthropic, Google, AWS Bedrock, and 20+ more providers.
Smart Routing to always hit the fastest, cheapest, or most reliable option, while staying aware of provider uptime and your rate limits. Built-in strategies include model-based latency routing (fastest model), provider latency-based P2C + PeakEWMA (fastest provider), weighted distribution (traffic split by configured weights), and cost optimization (cheapest option). See the configuration sketch after this feature list.
Rate limit to prevent runaway costs and usage abuse. Set limits per user, team, or globally with support for request counts, token usage, and dollar amounts.
Cache responses to reduce costs and latency by up to 95%. Supports Redis and S3 backends with intelligent cache invalidation.
Monitor performance and debug issues with built-in Helicone integration, plus OpenTelemetry support for logs, metrics, and traces.
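These features are all driven by the router's YAML configuration. As a quick preview, here is a minimal sketch that combines caching, load balancing, and rate limiting; the field names mirror the sample config.yaml shown in the self-hosting section below, and the configuration reference remains the source of truth for the full schema and defaults.

```yaml
# Minimal sketch of a router config combining the features above.
# Field names mirror the sample config.yaml later in this README;
# see the configuration reference for the full schema and defaults.
cache-store:
  type: in-memory        # Redis and S3 backends are also supported (their settings are not shown here)

global:
  cache:
    directive: "max-age=3600, max-stale=1800"

routers:
  your-router-name:
    load-balance:
      chat:
        strategy: model-latency      # route to the fastest model in the list
        models:
          - openai/gpt-4o-mini
          - anthropic/claude-3-7-sonnet
    rate-limit:
      per-api-key:
        capacity: 1000
        refill-frequency: 1m         # 1000 requests per minute
```

Save it as `config.yaml` and start the gateway with `npx @helicone/ai-gateway@latest --config config.yaml`, as shown in the self-hosting walkthrough below.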
Use our cloud-hosted AI Gateway, or deploy it to your own infrastructure in seconds with Docker or any of our deployment guides here.
| Metric | Helicone AI Gateway | Typical Setup |
|---|---|---|
| P95 Latency | <5ms | ~60-100ms |
| Memory Usage | ~64MB | ~512MB |
| Requests/sec | ~3,000 | ~500 |
| Binary Size | ~30MB | ~200MB |
| Cold Start | ~100ms | ~2s |
Note: See benchmarks/README.md for detailed benchmarking methodology and results.
┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐
│ Your App │───▶│ Helicone AI │───▶│ LLM Providers │
│ │ │ Gateway │ │ │
│ OpenAI SDK │ │ │ │ • OpenAI │
│ (any language) │ │ • Load Balance │ │ • Anthropic │
│ │ │ • Rate Limit │ │ • AWS Bedrock │
│ │ │ • Cache │ │ • Google Vertex │
│ │ │ • Trace │ │ • 20+ more │
│ │ │ • Fallbacks │ │ │
└─────────────────┘ └─────────────────┘ └─────────────────┘
│
▼
┌─────────────────┐
│ Helicone │
│ Observability │
│ │
│ • Dashboard │
│ • Observability │
│ • Monitoring │
│ • Debugging │
└─────────────────┘
For the cloud-hosted router, we provide a configuration wizard in the UI to help you set up your router without any YAML engineering.
For complete reference of our configuration options, check out our configuration reference and the providers we support.
from openai import OpenAI
client = OpenAI(
- api_key=os.getenv("OPENAI_API_KEY")
+ api_key="placeholder-api-key", # Gateway handles API keys
+ base_url="http://localhost:8080/router/your-router-name"
)
response = client.chat.completions.create(
- model="gpt-4o-mini",
+ model="openai/gpt-4o-mini", # or 100+ models
messages=[{"role": "user", "content": "Hello!"}]
)
import { OpenAI } from "openai";
const client = new OpenAI({
- apiKey: process.env.OPENAI_API_KEY,
+ apiKey: "placeholder-api-key", // Gateway handles API keys
+ baseURL: "http://localhost:8080/router/your-router-name",
});
const response = await client.chat.completions.create({
- model: "gpt-4o",
+ model: "openai/gpt-4o",
messages: [{ role: "user", content: "Hello from Helicone AI Gateway!" }],
});
This option might be best for you if you are extremely latency-sensitive, or if you want to avoid a cloud offering and would prefer to self-host the gateway.
- Set up your `.env` file with your `PROVIDER_API_KEY`s:
OPENAI_API_KEY=your_openai_key
ANTHROPIC_API_KEY=your_anthropic_key
- Run locally in your terminal:
npx @helicone/ai-gateway@latest
- Make your requests using any OpenAI SDK:
from openai import OpenAI
client = OpenAI(
base_url="http://localhost:8080/ai",
# Gateway handles API keys, so this only needs to be
# set to a valid Helicone API key if authentication is enabled.
api_key="placeholder-api-key"
)
# Route to any LLM provider through the same interface, we handle the rest.
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet", # Or other 100+ models..
messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
That's it. No new SDKs to learn, no integrations to maintain. Fully-featured and open-sourced.
For custom config, check out our configuration guide and the providers we support.
If you are self-hosting the gateway and would like to configure different routing strategies, follow the steps below:
Include your `PROVIDER_API_KEY`s in your `.env` file. If you would like to enable authentication, set the `HELICONE_CONTROL_PLANE_API_KEY` variable as well.
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
HELICONE_CONTROL_PLANE_API_KEY=sk-...
Note: This is a sample `config.yaml` file. Please refer to our configuration guide for the full list of options, examples, and defaults.
See our full provider list here.
helicone: # Include your HELICONE_API_KEY in your .env file
features: all
cache-store:
type: in-memory
global: # Global settings for all routers
cache:
directive: "max-age=3600, max-stale=1800"
routers:
your-router-name: # Single router configuration
load-balance:
chat:
strategy: model-latency
models:
- openai/gpt-4o-mini
- anthropic/claude-3-7-sonnet
rate-limit:
per-api-key:
capacity: 1000
refill-frequency: 1m # 1000 requests per minute
npx @helicone/ai-gateway@latest --config config.yaml
from openai import OpenAI
import os
helicone_api_key = os.getenv("HELICONE_API_KEY")
client = OpenAI(
base_url="http://localhost:8080/router/your-router-name",
api_key=helicone_api_key
)
# Route to any LLM provider through the same interface, we handle the rest.
response = client.chat.completions.create(
model="anthropic/claude-3-5-sonnet", # Or other 100+ models..
messages=[{"role": "user", "content": "Hello from Helicone AI Gateway!"}]
)
For a complete guide on self-hosting options, including Docker deployment, Kubernetes, and cloud platforms, see our deployment guides.
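For example, if you deploy with Docker Compose, a setup along these lines can serve as a starting point. This is a sketch only: the image name and tag, the in-container config path, and the command flags are assumptions, so verify them against the deployment guides before relying on it.

```yaml
# Illustrative Docker Compose sketch. The image name/tag, mount path, and
# command flags are assumptions; confirm them in the deployment guides.
services:
  ai-gateway:
    image: helicone/ai-gateway:latest   # assumed image name and tag
    ports:
      - "8080:8080"                     # matches the base_url used in the examples above
    env_file:
      - .env                            # PROVIDER_API_KEYs and optional HELICONE_CONTROL_PLANE_API_KEY
    volumes:
      - ./config.yaml:/app/config.yaml  # assumed in-container path
    command: ["--config", "/app/config.yaml"]
```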
- 📖 Full Documentation - Complete guides and API reference
- 🚀 Quickstart Guide - Get up and running in 1 minute
- 🔬 Advanced Configurations - Configuration reference & examples
- 💬 Discord Server - Our community of passionate AI engineers
- 🐙 GitHub Discussions - Q&A and feature requests
- 🐦 Twitter - Latest updates and announcements
- 📧 Newsletter - Tips and tricks to deploying AI applications
- 🎫 Report bugs: GitHub Issues
- 💼 Enterprise Support: Book a discovery call with our team
The Helicone AI Gateway is licensed under the Apache License - see the LICENSE file for details.
Made with ❤️ by Helicone.