Vision & Images

llmgate v0.5.0 adds first-class multimodal support. Pass images alongside text using a unified content-parts API — the correct wire format for each provider is handled automatically.

Backward compatible

All existing text-only code continues to work unchanged. Vision is purely additive.
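For instance, a text-only call written against earlier versions runs unchanged (a minimal sketch):

from llmgate import completion

resp = completion(
    "gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.text)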


How it works

Message.content accepts either:

  • A plain str — the existing text-only interface
  • A list of TextPart and ImagePart objects — the new multimodal interface

All content-part types are importable from llmgate.types:

from llmgate.types import TextPart, ImagePart, ImageURL, ImageBytes, Message
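
As a quick sketch of the two forms (the typed image_url variant of ImagePart is an assumption here, mirroring the image_bytes variant shown later):

# Existing text-only interface (unchanged)
msg = Message(role="user", content="Hello!")

# New multimodal interface: a list of parts
msg = Message(
    role="user",
    content=[
        TextPart(text="What's in this image?"),
        # image_url variant assumed by analogy with the image_bytes variant below
        ImagePart(type="image_url", image_url=ImageURL(url="https://example.com/photo.jpg")),
    ],
)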

Image Types

ImageURL — URL-based image

ImageURL(
    url="https://example.com/photo.jpg",
    detail="auto",   # optional: "auto" | "low" | "high" (OpenAI/Azure only)
)

Accepts:

  • https:// public URLs
  • data:image/jpeg;base64,... data URIs
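
A minimal sketch of the data-URI form, inlining local bytes (the file name is illustrative):

import base64
from llmgate.types import ImageURL

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

img = ImageURL(url=f"data:image/jpeg;base64,{b64}")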

ImageBytes — Inline base64

ImageBytes(
    data="<base64-encoded-bytes>",   # no data-URI prefix
    mime_type="image/jpeg",          # "image/jpeg" | "image/png" | "image/webp" | "image/gif"
)

Usage

URL image

from llmgate import completion

resp = completion(
    "gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",      "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.text)

Dict format

You can pass content parts as plain dicts (OpenAI format) or as typed TextPart/ImagePart objects — both work.
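
For example, the same image part in both forms (the typed image_url variant is assumed to mirror the dict layout):

from llmgate.types import ImagePart, ImageURL

# Plain dict (OpenAI format)
part = {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}}

# Typed equivalent
part = ImagePart(type="image_url", image_url=ImageURL(url="https://example.com/photo.jpg"))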

Base64 image

import base64
from llmgate import completion
from llmgate.types import TextPart, ImagePart, ImageBytes, Message

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = completion(
    "claude-opus-4-7",
    messages=[Message(
        role="user",
        content=[
            ImagePart(type="image_bytes", image_bytes=ImageBytes(data=b64, mime_type="image/jpeg")),
            TextPart(text="Describe this image in detail."),
        ],
    )],
)
print(resp.text)

Multiple images

resp = completion(
    "gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",      "text": "What's the difference between these two images?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}},
        ],
    }],
)

With detail hint (OpenAI / Azure)

resp = completion(
    "gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",      "text": "Read the text in this image."},
            {"type": "image_url", "image_url": {"url": "https://...", "detail": "high"}},
        ],
    }],
)

The detail field is silently ignored by providers that don't support it.
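
For example, the same detail-bearing message can go to an Anthropic model unchanged; the hint is simply dropped (sketch):

resp = completion(
    "claude-opus-4-7",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",      "text": "Read the text in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/sign.jpg", "detail": "high"}},
        ],
    }],
)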


Provider Support Matrix

| Provider     | URL images | Base64 / bytes | Notes                                                |
| ------------ | ---------- | -------------- | ---------------------------------------------------- |
| OpenAI       | ✅         | ✅             | detail param supported                               |
| Azure OpenAI | ✅         | ✅             | Identical to OpenAI                                  |
| Anthropic    | ✅         | ✅             | Up to 100 images/request                             |
| Gemini       | ✅*        | ✅             | URLs fetched client-side → inline bytes              |
| Groq         | ✅         | ✅             | Vision model required (e.g. llama-4-scout-17b)       |
| Mistral      | ✅         | ✅             | Wire format difference handled automatically         |
| Bedrock      | ✅*        | ✅             | URLs fetched client-side; raw bytes to Converse API  |
| Ollama       | ⚠️*        | ✅             | URL images are fetched client-side automatically     |
| Cohere       | ❌         | ❌             | Raises VisionNotSupported                            |

* Accepted by llmgate; fetched client-side and sent inline.

Provider-Specific Notes

Gemini

Gemini does not accept image URLs natively. llmgate fetches the image bytes via httpx and sends them inline — this happens automatically and is transparent to you.

# Just works — llmgate fetches the URL for you
resp = completion(
    "gemini-2.0-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",      "text": "Describe this."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
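
If you'd rather control the fetch yourself (timeouts, auth headers), you can download the bytes up front and send them inline as ImageBytes instead. A sketch:

import base64
import httpx
from llmgate.types import ImagePart, ImageBytes

# Fetch the image yourself, then pass the bytes inline
raw = httpx.get("https://example.com/photo.jpg").content
part = ImagePart(
    type="image_bytes",
    image_bytes=ImageBytes(data=base64.b64encode(raw).decode(), mime_type="image/jpeg"),
)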

Ollama

Ollama uses a top-level images list on the message, not a content-parts array. llmgate handles this translation automatically.

resp = completion(
    "ollama/llava",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",      "text": "What's this?"},
            {"type": "image_url", "image_url": {"url": "https://..."}},
        ],
    }],
)
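
Under the hood, the translated request body looks roughly like this (a sketch of Ollama's chat message shape, not of llmgate internals):

# Roughly what llmgate sends to Ollama (sketch)
payload = {
    "model": "llava",
    "messages": [{
        "role": "user",
        "content": "What's this?",
        "images": ["<base64 of the fetched image>"],  # top-level list, per Ollama's format
    }],
}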

Cohere

Cohere's vision API is not yet stable. llmgate raises VisionNotSupported if you attempt to send image content to a Cohere model.

from llmgate import completion
from llmgate.exceptions import VisionNotSupported

messages_with_image = [{"role": "user", "content": [
    {"type": "text", "text": "Describe this."},
    {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
]}]

try:
    resp = completion("cohere/command-r-plus", messages=messages_with_image)
except VisionNotSupported:
    # Fall back to a different provider
    resp = completion("gpt-4o-mini", messages=messages_with_image)

Full Typed Example

import base64
from pathlib import Path
from llmgate import completion
from llmgate.types import TextPart, ImagePart, ImageURL, ImageBytes, Message

# Load a local image
image_data = base64.b64encode(Path("chart.png").read_bytes()).decode()

resp = completion(
    "gpt-4o",
    messages=[
        Message(role="system", content="You are a data analyst."),
        Message(
            role="user",
            content=[
                TextPart(text="Summarize the trend shown in this chart."),
                ImagePart(
                    type="image_bytes",
                    image_bytes=ImageBytes(data=image_data, mime_type="image/png"),
                ),
            ],
        ),
    ],
    max_tokens=300,
)

print(resp.text)