
llmgate

  • :zap: One API, every provider

    Switch between OpenAI, Gemini, Anthropic, Groq, and five more providers by changing a single word.

  • :eyes: Vision & Multimodal

    Pass images alongside text — URL or base64 — with automatic per-provider serialization.

  • :arrows_counterclockwise: Fallback & Routing

    Pass a list of models — llmgate tries each in order and falls back automatically on rate limits or errors (sketched below).

  • :package: Zero boilerplate

    Consistent response shape, unified error types, and Pydantic v2 models throughout.

  • :arrows_counterclockwise: Async-first

    Every call has a sync and async variant. Batch completions with concurrency control built in.

from llmgate import completion

# OpenAI
resp = completion("gpt-4o-mini", [{"role": "user", "content": "Hello!"}])

# Switch to Gemini — one word changes
resp = completion("gemini-2.5-flash-lite", [{"role": "user", "content": "Hello!"}])

# Switch to Groq
resp = completion("groq/llama-3.3-70b-versatile", [{"role": "user", "content": "Hello!"}])

print(resp.text)  # always the same shape
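
Failover uses the same call: pass a list instead of a single model string. A minimal sketch based on the `model=[...]` form and `AllProvidersFailedError` described in the feature table below; the error's import path is an assumption:

from llmgate import completion, AllProvidersFailedError  # error assumed exported at top level

# Models are tried in order; a rate limit or error moves on to the next one
try:
    resp = completion(
        model=["gpt-4o-mini", "gemini-2.5-flash-lite", "groq/llama-3.3-70b-versatile"],
        messages=[{"role": "user", "content": "Hello!"}],
    )
    print(resp.text)
except AllProvidersFailedError:
    # Raised only once every model in the list has failed
    print("No provider could serve the request")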

Why llmgate?

Every LLM provider has a different SDK, different message formats, different error types, and different response shapes. Switching providers in production can mean touching dozens of files.

llmgate solves this with a thin, stable abstraction: one function, one response model, one error hierarchy — regardless of which provider is under the hood.

| What you get | Detail |
| --- | --- |
| 9 providers | OpenAI · Anthropic · Gemini · Groq · Mistral · Cohere · Azure · Bedrock · Ollama |
| Vision | URL + base64 images across 8 providers |
| Streaming | `stream=True` returns `Iterator[StreamChunk]` |
| Tools | Function calling with a unified `ToolCall` type |
| Structured outputs | Pass any Pydantic model, get back a validated instance |
| Embeddings | 7 providers, batched, async |
| Batch | Parallel completions with configurable concurrency |
| Fallback & Routing | `model=[...]` list — automatic multi-provider failover with `AllProvidersFailedError` |
| Middleware | Retry, cache, logging, rate-limit, fallback — composable |
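
Streaming keeps the same call shape. A minimal sketch, assuming each `StreamChunk` exposes its incremental text as a `.text` attribute (the attribute name is an assumption; check the API reference):

from llmgate import completion

# stream=True switches the return type to Iterator[StreamChunk]
for chunk in completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Write a haiku about APIs."}],
    stream=True,
):
    print(chunk.text, end="", flush=True)  # .text on chunks is assumed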

Install

pip install llmgate

Optional providers

Some providers require an extra package:

pip install llmgate[mistral]   # Mistral
pip install llmgate[cohere]    # Cohere
pip install llmgate[bedrock]   # AWS Bedrock
pip install llmgate[ollama]    # Ollama (local)
pip install llmgate[all]       # everything
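
Once an extra is installed, the provider works through the same completion call, using the provider-prefix convention from the Groq example above. A sketch for a local Ollama model (the model name is illustrative):

from llmgate import completion

# Requires `pip install llmgate[ollama]` and a running Ollama server
resp = completion(
    model="ollama/llama3.1",  # illustrative model name
    messages=[{"role": "user", "content": "Hello from localhost!"}],
)
print(resp.text)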


Quick Example

from llmgate import completion
import os

os.environ["OPENAI_API_KEY"] = "sk-..."

resp = completion(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user",   "content": "What is a large language model?"},
    ],
    max_tokens=100,
)

print(resp.text)
print(f"Tokens used: {resp.usage.total_tokens}")
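
Structured outputs (see the feature table) accept a Pydantic model and return a validated instance. A minimal sketch; the `response_model` parameter name here is hypothetical, so check the structured-outputs docs for the real signature:

from pydantic import BaseModel
from llmgate import completion

class City(BaseModel):
    name: str
    country: str
    population: int

city = completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Give me basic facts about Tokyo."}],
    response_model=City,  # hypothetical kwarg; see the structured-outputs docs
)
print(city.name, city.population)  # a validated City instance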

Get started → View on GitHub →