Frequently Asked Questions
Quick answers to common questions about CacheGateway
BYOK stands for "Bring Your Own Keys." You provide your own provider API keys (OpenAI, Anthropic, etc.) and CacheGateway proxies your requests to those providers. We charge a flat subscription for gateway services (caching, routing, analytics). We never mark up token costs.
You need provider API keys from the upstream services: OpenAI (platform.openai.com/api-keys), Anthropic (console.anthropic.com/keys), Google (ai.google.dev), Groq (console.groq.com/keys), Mistral (console.mistral.ai/api-keys). Once you have them, register them in your CacheGateway dashboard → Settings → API Keys.
Live today: OpenAI, Anthropic, and Google AI — each routable on its own subdomain (e.g., openai.cachegateway.com, anthropic.cachegateway.com). OpenAI-compatible adapters (Groq, Mistral, Together AI, Fireworks, Perplexity, DeepInfra, OpenRouter) are rolling out, and more providers (Cohere, AWS Bedrock, Replicate, etc.) are on the roadmap.
When you hit your monthly request quota, requests return HTTP 429 with details about the limit and reset time. Monthly quotas by tier: Free 25K, Starter 250K, Pro 1M, Scale 10M. Cache hits count as 0.1 requests. Free and Starter are hard-blocked at the limit; Pro and Scale allow overage at $0.0005 per request beyond quota. For per-minute provider rate limits, implement exponential backoff and retry after the specified time.
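The backoff advice above can be sketched in a few lines of Python. This is a minimal illustration, not an official client: the `send` callable, the `(status, headers, body)` tuple shape, and the use of a `Retry-After` header carrying seconds are all assumptions made for the example. Check the actual 429 response for where the reset time is reported.

```python
import random
import time
from typing import Callable, Mapping, Optional, Tuple

# Hypothetical response shape for this sketch: (status, headers, body)
Response = Tuple[int, Mapping[str, str], str]


def backoff_delay(attempt: int, retry_after: Optional[str] = None,
                  base: float = 0.5, cap: float = 30.0) -> float:
    """Seconds to wait before the next try; `attempt` is 0-based."""
    if retry_after is not None:
        try:
            # Assumption: the server reports the wait as a number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # not a plain number; fall back to the computed delay
    # Exponential growth capped at `cap`, with full jitter to spread retries.
    return random.uniform(0.0, min(cap, base * 2 ** attempt))


def call_with_backoff(send: Callable[[], Response], max_attempts: int = 5,
                      sleep: Callable[[float], None] = time.sleep) -> Response:
    """Call `send` until it returns something other than 429 or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, headers, body = send()
        if status != 429:
            return status, headers, body
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt, headers.get("Retry-After")))
    return status, headers, body  # still rate limited after the last attempt
```

Retrying only helps with per-minute rate limits. A 429 caused by a hard-blocked monthly quota on Free or Starter will keep failing until the quota resets, so it is worth inspecting the response details before retrying.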
CacheGateway charges a flat monthly subscription based on tier: Free $0 (25K req), Starter $20 (250K req), Pro $99 (1M req), Scale $499 (10M req). You pay providers directly for token costs — no markup from us. Example: 100K requests with GPT-4o-mini = $13.50 paid to OpenAI + $20 paid to CacheGateway (Starter) = $33.50 total. Cache hits are billed at 1/10 of a request and avoid provider token costs.
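To make the arithmetic above concrete, here is a small estimator using the tier prices, quotas, cache-hit weight, and overage rate quoted in this FAQ. The function and class names are hypothetical, and the assumption that overage is charged on the weighted request count (with hits at 0.1) is an inference from the text, not a documented billing rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    name: str
    monthly_fee: float    # USD per month
    quota: int            # requests included per month
    allows_overage: bool  # Free and Starter are hard-blocked at the limit


TIERS = {
    "free": Tier("Free", 0.0, 25_000, False),
    "starter": Tier("Starter", 20.0, 250_000, False),
    "pro": Tier("Pro", 99.0, 1_000_000, True),
    "scale": Tier("Scale", 499.0, 10_000_000, True),
}

OVERAGE_PER_REQUEST = 0.0005  # USD, Pro and Scale only
CACHE_HIT_WEIGHT = 0.1        # a cache hit counts as 0.1 requests


def billable_requests(cache_misses: int, cache_hits: int) -> float:
    """Weighted request count that is measured against the monthly quota."""
    return cache_misses + cache_hits * CACHE_HIT_WEIGHT


def gateway_cost(tier: Tier, cache_misses: int, cache_hits: int) -> float:
    """Monthly gateway bill in USD, excluding provider token costs."""
    used = billable_requests(cache_misses, cache_hits)
    over = max(0.0, used - tier.quota)
    if over > 0 and not tier.allows_overage:
        raise ValueError(f"{tier.name} is hard-blocked at {tier.quota} requests")
    return tier.monthly_fee + over * OVERAGE_PER_REQUEST


# The FAQ example: 100K uncached requests on Starter, $13.50 paid to OpenAI.
total = gateway_cost(TIERS["starter"], 100_000, 0) + 13.50
print(f"${total:.2f}")
```

Provider token costs are passed in as a plain number here because they depend on the model and prompt sizes, which this sketch does not try to model.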
CacheGateway offers semantic caching: similar prompts can return cached responses (configurable similarity threshold, default 0.95). Default TTL is 24 hours, configurable per Lane. Cache hits are typically 10-100x faster than provider calls. Actual savings depend heavily on your workload — high-frequency similar prompts benefit most, unique one-off queries see little benefit.
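The idea behind a similarity threshold and TTL can be illustrated with a toy in-memory cache. This is a conceptual sketch only: it assumes prompts have already been turned into embedding vectors by some model, uses cosine similarity as the comparison, and does a linear scan. How CacheGateway actually computes and indexes similarity is not described here, and the `SemanticCache` class is hypothetical.

```python
import math
import time
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    """Toy cache: returns a stored response when a prompt is similar enough."""

    def __init__(self, threshold: float = 0.95, ttl_seconds: float = 24 * 3600):
        self.threshold = threshold  # default from the FAQ: 0.95
        self.ttl = ttl_seconds      # default from the FAQ: 24 hours
        self._entries: List[Tuple[Sequence[float], str, float]] = []

    def put(self, embedding: Sequence[float], response: str,
            now: Optional[float] = None) -> None:
        stored_at = time.time() if now is None else now
        self._entries.append((embedding, response, stored_at))

    def get(self, embedding: Sequence[float],
            now: Optional[float] = None) -> Optional[str]:
        current = time.time() if now is None else now
        # Drop entries older than the TTL before searching.
        self._entries = [e for e in self._entries if current - e[2] < self.ttl]
        best, best_score = None, self.threshold
        for stored, response, _ in self._entries:
            score = cosine(embedding, stored)
            if score >= best_score:  # keep the closest match at or above threshold
                best, best_score = response, score
        return best
```

A lower threshold produces more hits but raises the chance of returning an answer to a subtly different question, which is why the threshold is worth tuning per workload.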
Auto failover (automatic fallback to a backup provider when the primary is degraded) is on the V2 roadmap and marked "Coming soon" on our pricing page. It is not yet available. Current behavior on provider errors: the HTTP error code passes through to your client, and your application code handles retry logic.
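Until auto failover ships, fallback has to live in your application. One possible shape for that logic is sketched below. The provider callables, the `(status, body)` result tuple, and the set of status codes treated as retryable are all assumptions for illustration, and nothing here is part of a CacheGateway SDK.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical result shape for this sketch: (http_status, body)
Result = Tuple[int, str]

# Assumption: these statuses suggest the provider is degraded or throttling,
# so trying a backup is reasonable. Other statuses are returned immediately.
RETRYABLE = frozenset({429, 500, 502, 503, 504})


def first_successful(
    providers: Sequence[Tuple[str, Callable[[], Result]]],
) -> Tuple[str, int, str]:
    """Try providers in order; return (name, status, body) of the first usable result."""
    if not providers:
        raise ValueError("at least one provider is required")
    last = None
    for name, send in providers:
        status, body = send()
        last = (name, status, body)
        if status not in RETRYABLE:
            return last  # success, or a client error that a backup will not fix
    return last  # every provider failed; surface the final error
```

Each `send` callable would wrap a request to one provider through its own subdomain. Because request and response formats can differ between providers, the fallback call may need its own payload rather than a copy of the primary one.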
All connections use TLS 1.3. By default, we do NOT store your provider API keys: we store only a SHA-256 hash for lookup, and your actual key passes through with each request to the upstream provider. Request and response bodies are logged only with your explicit opt-in for observability purposes. We do not train models on your data. Metadata (timestamps, token counts, costs) is retained for 30 days by default (7 days on the Free tier, 90 days on Scale).
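The hash-for-lookup pattern mentioned above looks roughly like the following in Python. It shows the general technique only: the `KeyRegistry` class and account IDs are hypothetical, and details such as salting or key normalization on the CacheGateway side are not specified in this FAQ.

```python
import hashlib
from typing import Dict, Optional


def key_fingerprint(api_key: str) -> str:
    """SHA-256 hex digest of the key; the key cannot be recovered from it."""
    return hashlib.sha256(api_key.encode("utf-8")).hexdigest()


class KeyRegistry:
    """Maps key fingerprints to accounts without ever storing the raw key."""

    def __init__(self) -> None:
        self._accounts: Dict[str, str] = {}

    def register(self, api_key: str, account_id: str) -> None:
        self._accounts[key_fingerprint(api_key)] = account_id

    def lookup(self, api_key: str) -> Optional[str]:
        # The raw key arrives with each request, is hashed for the lookup,
        # and is then forwarded upstream rather than persisted.
        return self._accounts.get(key_fingerprint(api_key))
```

Because the registry holds only digests, reading its contents does not reveal a usable provider key.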
Not yet. Enterprise features such as SLA guarantees, SSO/SAML, and audit log integrations are not yet part of the product. If you have an enterprise need, contact us at founder@cachegateway.com and we can discuss whether the Scale tier covers your use case.
No, CacheGateway is not open source. It is closed-source commercial software, operated as a managed SaaS.
Still have questions?
Can't find what you're looking for? Our support team is here to help.