Vision & Images¶
llmgate v0.5.0 adds first-class multimodal support. Pass images alongside text using a unified content-parts API — the correct wire format for each provider is handled automatically.
Backward compatible
All existing text-only code continues to work unchanged. Vision is purely additive.
How it works¶
`Message.content` accepts either:

- A plain `str` — the existing text-only interface
- A `list` of `TextPart` and `ImagePart` objects — the new multimodal interface
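Concretely, the two shapes look like this in dict form (a minimal sketch; no model call is made):

```python
# Text-only: content is a plain str (the existing interface)
text_message = {"role": "user", "content": "Hello!"}

# Multimodal: content is a list of parts (the new interface)
multi_message = {
    "role": "user",
    "content": [
        {"type": "text", "text": "What's in this image?"},
        {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
    ],
}

print(isinstance(text_message["content"], str))    # True
print(isinstance(multi_message["content"], list))  # True
```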
Image Types¶
ImageURL — URL-based image¶
```python
ImageURL(
    url="https://example.com/photo.jpg",
    detail="auto",  # optional: "auto" | "low" | "high" (OpenAI/Azure only)
)
```
Accepts:

- `https://` public URLs
- `data:image/jpeg;base64,...` data URIs
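Because data URIs are accepted, a local file can be sent through `ImageURL` without inline bytes. A minimal sketch using only the standard library (the `to_data_uri` helper is illustrative, not part of llmgate):

```python
import base64

def to_data_uri(raw: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a data: URI accepted by ImageURL."""
    b64 = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{b64}"

# First bytes of a JPEG file, for illustration only
uri = to_data_uri(b"\xff\xd8\xff")
print(uri)  # data:image/jpeg;base64,/9j/
```

The resulting string can be passed directly as `ImageURL(url=uri)`.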
ImageBytes — Inline base64¶
```python
ImageBytes(
    data="<base64-encoded-bytes>",  # no data-URI prefix
    mime_type="image/jpeg",  # "image/jpeg" | "image/png" | "image/webp" | "image/gif"
)
```
Usage¶
URL image¶
```python
from llmgate import completion

resp = completion(
    "gpt-4o-mini",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
print(resp.text)
```
Dict format
You can pass content parts as plain dicts (OpenAI format) or as typed TextPart/ImagePart objects — both work.
Base64 image¶
```python
import base64
from llmgate import completion
from llmgate.types import TextPart, ImagePart, ImageBytes, Message

with open("photo.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = completion(
    "claude-opus-4-7",
    messages=[Message(
        role="user",
        content=[
            ImagePart(type="image_bytes", image_bytes=ImageBytes(data=b64, mime_type="image/jpeg")),
            TextPart(text="Describe this image in detail."),
        ],
    )],
)
print(resp.text)
```
Multiple images¶
```python
resp = completion(
    "gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's the difference between these two images?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/image1.jpg"}},
            {"type": "image_url", "image_url": {"url": "https://example.com/image2.jpg"}},
        ],
    }],
)
```
With detail hint (OpenAI / Azure)¶
```python
resp = completion(
    "gpt-4o",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Read the text in this image."},
            {"type": "image_url", "image_url": {"url": "https://...", "detail": "high"}},
        ],
    }],
)
```
The `detail` field is silently ignored by providers that don't support it.
Provider Support Matrix¶
| Provider | URL images | Base64 / bytes | Notes |
|---|---|---|---|
| OpenAI | ✅ | ✅ | detail param supported |
| Azure OpenAI | ✅ | ✅ | Identical to OpenAI |
| Anthropic | ✅ | ✅ | Up to 100 images/request |
| Gemini | ✅ | ✅ | URLs fetched client-side → inline bytes |
| Groq | ✅ | ✅ | Vision model required (e.g. llama-4-scout-17b) |
| Mistral | ✅ | ✅ | Wire format difference handled automatically |
| Bedrock | ✅ | ✅ | URLs fetched client-side; raw bytes to Converse API |
| Ollama | ⚠️ | ✅ | URL images are fetched client-side automatically |
| Cohere | ❌ | ❌ | Raises VisionNotSupported |
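If you route requests across providers, the matrix above can be mirrored as a pre-flight check. This is a sketch, not a llmgate API; the dict simply restates the table:

```python
# Mirrors the support matrix above; not part of llmgate itself.
VISION_SUPPORT = {
    "openai": True,
    "azure": True,
    "anthropic": True,
    "gemini": True,
    "groq": True,     # vision-capable model required
    "mistral": True,
    "bedrock": True,
    "ollama": True,   # URL images fetched client-side
    "cohere": False,  # raises VisionNotSupported
}

def supports_vision(provider: str) -> bool:
    """Return True if the provider accepts image content at all."""
    return VISION_SUPPORT.get(provider, False)

print(supports_vision("cohere"))  # False
print(supports_vision("openai"))  # True
```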
Provider-Specific Notes¶
Gemini¶
Gemini does not accept image URLs natively. llmgate fetches the image bytes via httpx and sends them inline — this happens automatically and is transparent to you.
```python
# Just works — llmgate fetches the URL for you
resp = completion(
    "gemini-2.0-flash",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this."},
            {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
        ],
    }],
)
```
Ollama¶
Ollama uses a top-level images list on the message, not a content-parts array. llmgate handles this translation automatically.
```python
resp = completion(
    "ollama/llava",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What's this?"},
            {"type": "image_url", "image_url": {"url": "https://..."}},
        ],
    }],
)
```
Cohere¶
Cohere's vision API is not yet stable. llmgate raises VisionNotSupported if you attempt to send image content to a Cohere model.
```python
from llmgate.exceptions import VisionNotSupported

try:
    resp = completion("cohere/command-r-plus", messages_with_image)
except VisionNotSupported:
    # Fall back to a different provider
    resp = completion("gpt-4o-mini", messages_with_image)
```
Full Typed Example¶
```python
import base64
from pathlib import Path
from llmgate import completion
from llmgate.types import TextPart, ImagePart, ImageBytes, Message

# Load a local image as base64
image_data = base64.b64encode(Path("chart.png").read_bytes()).decode()

resp = completion(
    "gpt-4o",
    messages=[
        Message(role="system", content="You are a data analyst."),
        Message(
            role="user",
            content=[
                TextPart(text="Summarize the trend shown in this chart."),
                ImagePart(
                    type="image_bytes",
                    image_bytes=ImageBytes(data=image_data, mime_type="image/png"),
                ),
            ],
        ),
    ],
    max_tokens=300,
)
print(resp.text)
```