Documentation

Complete guide to using CacheGateway for unified AI model access.

Quick Start

1. Create an Account

Sign up for CacheGateway and get your API key instantly. The free tier includes 25,000 requests per month and requires no credit card.


2. Use Your Existing SDK

No new SDK needed! Just change your base URL. It works with the OpenAI, Anthropic, and Google AI SDKs you already use, with more providers coming soon.

# Just change the base URL - that's it!
export OPENAI_BASE_URL=https://openai.cachegateway.com
export OPENAI_API_KEY=your-openai-api-key

⚠️ Important: Only the hostname changes - no /v1 suffix needed. SDKs append paths automatically.

3. Make Your First Request

Your existing code works as-is. The only change is the base URL, set either through the OPENAI_BASE_URL environment variable or the base_url argument.

from openai import OpenAI

# Just change base_url - your code stays the same!
client = OpenAI(
    api_key="your-openai-api-key",
    base_url="https://openai.cachegateway.com"
)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}]
)

# Print the assistant's reply
print(response.choices[0].message.content)

⚠️ BYOK (Bring Your Own Keys): Use your actual provider API key. CacheGateway forwards it directly to the upstream provider with zero markup. Only a hash of the key is kept for lookup; the raw key is never stored.

4. Enjoy Automatic Benefits

That's it! You now get semantic caching, multi-region edge routing, and real-time cost tracking — all with zero configuration.


Core Concepts

Automatic Caching

CacheGateway caches deterministic requests (temperature=0) and semantically similar prompts (configurable threshold, default 0.85). Cache hits are served from the edge cache — typically much faster than a full provider round-trip.

Caching Benefits:

  • Workload-dependent cost reduction — high-frequency similar prompts benefit most; unique one-off queries see little benefit
  • Faster cache hits — edge KV vs full provider round-trip
  • Zero configuration — works automatically for temperature=0
  • Configurable TTL — default 24h, adjustable per Lane
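
To make the similarity threshold and TTL concrete, here is a minimal sketch of a semantic cache lookup. It is illustrative only: the SemanticCache class, the toy word-count embedding, and the in-memory list are assumptions made for this example, not CacheGateway's actual implementation.

```python
import math
import time
from collections import Counter


def embed(text):
    """Toy embedding: word counts. A real system would use an embedding model."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Hypothetical in-memory cache keyed by prompt similarity."""

    def __init__(self, threshold=0.85, ttl_seconds=24 * 3600):
        self.threshold = threshold      # minimum similarity for a hit
        self.ttl_seconds = ttl_seconds  # entries older than this are ignored
        self.entries = []               # list of (embedding, response, stored_at)

    def put(self, prompt, response, now=None):
        now = time.time() if now is None else now
        self.entries.append((embed(prompt), response, now))

    def get(self, prompt, now=None):
        now = time.time() if now is None else now
        query = embed(prompt)
        best_score, best_response = 0.0, None
        for vector, response, stored_at in self.entries:
            if now - stored_at > self.ttl_seconds:
                continue  # expired entry
            score = cosine(query, vector)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None
```

A prompt that is close enough to a stored one returns the cached response; anything below the threshold, or older than the TTL, falls through to the provider.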

Provider Routing via Subdomain

Route to different providers by changing the subdomain. Models use their native names.

Subdomain Routing:

  • https://openai.cachegateway.com → OpenAI (use gpt-4o, gpt-4o-mini, etc.)
  • https://anthropic.cachegateway.com → Anthropic (use claude-3-5-sonnet-20241022, etc.)
  • https://google.cachegateway.com → Google AI (use gemini-2.5-flash, gemini-2.5-pro)
  • Coming soon: Groq, Mistral, Together AI, Fireworks, Perplexity, DeepInfra, OpenRouter (same OpenAI-compatible adapter)
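
If your application talks to more than one provider, a small helper can keep the subdomain mapping in one place. This is a sketch: the base_url_for function and SUPPORTED_PROVIDERS set are hypothetical names for this example, and the set covers only the three subdomains listed above.

```python
# Providers with a live subdomain at the time of writing (from the list above).
SUPPORTED_PROVIDERS = {"openai", "anthropic", "google"}


def base_url_for(provider):
    """Return the CacheGateway base URL for a supported provider name."""
    name = provider.strip().lower()
    if name not in SUPPORTED_PROVIDERS:
        raise ValueError(f"unsupported provider: {provider!r}")
    return f"https://{name}.cachegateway.com"
```

For example, base_url_for("openai") gives the value to pass as base_url to the OpenAI SDK client.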

Lane Configuration (Dashboard)

Sign up for the dashboard at app.cachegateway.com to configure advanced features per Lane:

  • System Prompts - Auto-inject context into all requests
  • Guardrails - Content filtering, PII detection
  • Rate Limits - Daily/monthly cost and usage caps
  • Analytics - Real-time cost tracking and insights
  • Multi-Provider Keys - Manage all your provider keys in one place

Best Practices

Handle Provider Errors Gracefully

Wrap requests in retry logic with exponential backoff so transient provider errors don't take down your app. Automatic multi-provider failover is on our roadmap.
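
A minimal sketch of such a retry wrapper is shown below. The with_retries function and its parameters are hypothetical names for this example; tune the attempt count and delays for your workload, and narrow retry_on to the transient error types your SDK raises.

```python
import random
import time


def with_retries(call, max_attempts=5, base_delay=0.5, max_delay=8.0,
                 retry_on=(Exception,), sleep=time.sleep):
    """Call `call()` and retry on failure with exponential backoff plus jitter."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (0.5s, 1s, 2s, ...) up to max_delay.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Random jitter keeps many clients from retrying in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

With the OpenAI SDK you might call with_retries(lambda: client.chat.completions.create(...), retry_on=(openai.APIConnectionError, openai.RateLimitError, openai.InternalServerError)). Note that the OpenAI Python client also retries some errors on its own through its max_retries option, so you may want to set one or the other rather than stacking both.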

Monitor Your Usage

Use the dashboard analytics to track costs across providers. Identify opportunities to optimize by switching to more cost-effective models for certain use cases.

Cache Responses When Possible

For deterministic queries, set temperature=0 so that repeated requests can be served from the cache, reducing both cost and latency.

Use Streaming for Long Responses

Enable streaming for chat applications to provide a better user experience with immediate feedback as tokens are generated.
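
As a sketch of what consuming a stream can look like, the helper below accumulates text deltas from a chat-completions stream and optionally hands each token to a callback. The collect_stream name is hypothetical; it assumes chunks shaped like the OpenAI SDK's streaming chunks (chunk.choices[0].delta.content).

```python
def collect_stream(chunks, on_token=None):
    """Concatenate text deltas from a chat-completions stream."""
    parts = []
    for chunk in chunks:
        if not chunk.choices:
            continue  # some chunks carry no choices (e.g. usage-only chunks)
        token = chunk.choices[0].delta.content
        if token:  # content can be None on role-only or final chunks
            parts.append(token)
            if on_token is not None:
                on_token(token)
    return "".join(parts)


# Usage with the client from the Quick Start (requires network access):
# stream = client.chat.completions.create(
#     model="gpt-4o-mini",
#     messages=[{"role": "user", "content": "Hello!"}],
#     stream=True,
# )
# text = collect_stream(stream, on_token=lambda t: print(t, end="", flush=True))
```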

Security

API Key Security

Your API keys provide full access to CacheGateway. Keep them secure and never expose them in client-side code.

Security Best Practices:

  • Store API keys in environment variables
  • Never commit keys to version control
  • Rotate keys regularly
  • Use separate keys for dev/staging/production
  • Monitor key usage for anomalies
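
One way to follow the first two practices is to read keys from the environment and fail loudly when one is missing. This is a small sketch; require_env is a hypothetical helper name, not part of any SDK.

```python
import os


def require_env(name):
    """Return the value of an environment variable, or fail if it is unset or blank."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(f"{name} is not set; export it instead of hard-coding the key")
    return value
```

You could then build the client with api_key=require_env("OPENAI_API_KEY"), so the key never appears in source code or version control.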

Data Privacy

CacheGateway processes requests at the edge and does not store your prompts or responses. Provider API keys are stored only as SHA-256 hashes; the raw key is forwarded to the provider and never persisted. See the BYOK Security Disclosure for the full data-flow breakdown.
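
For readers unfamiliar with the idea, the sketch below shows what hashing a key with SHA-256 looks like using Python's standard library. It illustrates the general technique only; the function names are hypothetical, and details such as salting or key normalization on CacheGateway's side are not documented here.

```python
import hashlib
import hmac


def key_fingerprint(api_key):
    """Return the hex SHA-256 digest of an API key (64 hex characters)."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


def matches(api_key, stored_fingerprint):
    """Check a key against a stored fingerprint using a constant-time comparison."""
    return hmac.compare_digest(key_fingerprint(api_key), stored_fingerprint)
```

Because SHA-256 is a one-way function, the stored fingerprint can be used to look up a key's configuration without being able to recover the key itself.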

Need More Help?

Can't find what you're looking for? Our support team is here to help.