Structured Outputs

Get typed, validated Pydantic instances directly from any LLM — no manual JSON parsing.


The parse() shorthand

from pydantic import BaseModel
from llmgate import parse

class Movie(BaseModel):
    title: str
    year: int
    director: str
    rating: float

movie = parse(
    "groq/llama-3.3-70b-versatile",
    [{"role": "user", "content": "Name a classic sci-fi film with details."}],
    response_format=Movie,
)

print(movie.title)    # "2001: A Space Odyssey"
print(movie.year)     # 1968
print(type(movie))    # <class 'Movie'>
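
parse() hands back an already-validated instance, so a schema mismatch surfaces as a validation failure rather than a silently wrong object. The snippet below illustrates that failure mode with plain Pydantic only — no llmgate call — and assumes llmgate surfaces Pydantic's ValidationError rather than a library-specific wrapper:

```python
from pydantic import BaseModel, ValidationError

class Movie(BaseModel):
    title: str
    year: int
    director: str
    rating: float

# Well-formed JSON that still violates the schema: year is not an int.
bad_json = (
    '{"title": "Solaris", "year": "nineteen seventy-two", '
    '"director": "Tarkovsky", "rating": 8.1}'
)

try:
    Movie.model_validate_json(bad_json)
except ValidationError as exc:
    # One error, for the uncoercible `year` field.
    print(f"{exc.error_count()} validation error(s)")
```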

Via completion() with response_format

from llmgate import completion

resp = completion(
    "gpt-4o-mini",
    messages,
    response_format=Movie,
)

movie: Movie = resp.parsed    # validated Pydantic instance
print(resp.text)              # also available as raw JSON string
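
Because resp.parsed is an ordinary Pydantic instance, all the usual Pydantic tooling applies downstream. A short sketch with plain Pydantic (no llmgate call; the hard-coded Movie stands in for resp.parsed):

```python
from pydantic import BaseModel

class Movie(BaseModel):
    title: str
    year: int
    director: str
    rating: float

# Stand-in for resp.parsed
movie = Movie(title="Alien", year=1979, director="Ridley Scott", rating=8.5)

print(movie.model_dump())       # plain dict, e.g. for logging or a DB row
print(movie.model_dump_json())  # compact JSON string, e.g. for re-serialization
```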

Async

from llmgate import aparse

movie = await aparse("gemini-2.5-flash-lite", messages, response_format=Movie)

Nested models

from pydantic import BaseModel

class Actor(BaseModel):
    name: str
    role: str

class Film(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    synopsis: str

film = parse("claude-3-5-sonnet-20241022", messages, response_format=Film)
for actor in film.cast:
    print(f"{actor.name} as {actor.role}")
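
Nested models validate recursively: every element of cast must itself satisfy the Actor schema. The standalone sketch below shows that recursive validation with plain Pydantic, using a hand-written JSON payload in place of model output:

```python
from pydantic import BaseModel

class Actor(BaseModel):
    name: str
    role: str

class Film(BaseModel):
    title: str
    year: int
    cast: list[Actor]
    synopsis: str

# Hand-written stand-in for the model's JSON output.
payload = """{
  "title": "Blade Runner",
  "year": 1982,
  "cast": [
    {"name": "Harrison Ford", "role": "Rick Deckard"},
    {"name": "Rutger Hauer", "role": "Roy Batty"}
  ],
  "synopsis": "A blade runner must pursue and terminate four replicants."
}"""

film = Film.model_validate_json(payload)
print(type(film.cast[0]))  # each cast entry is a full Actor instance
```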

Provider strategies

Each provider implements structured output differently. llmgate picks the best approach automatically:

| Provider | Strategy |
| --- | --- |
| OpenAI / Azure | Native json_schema with strict: true — schema-constrained decoding |
| Gemini | response_schema + application/json MIME — native structured generation |
| Groq / Mistral / Ollama | json_object mode + Pydantic model_validate_json() |
| Anthropic / Bedrock / Cohere | Schema injected into system prompt → JSON extracted and validated |
Incompatibility

response_format and stream=True cannot be used together. Structured outputs require the complete response for validation.

Complex schemas

For Anthropic and Bedrock, keep schemas straightforward — very deeply nested or recursive schemas can confuse the prompt-injection approach. For complex cases, prefer OpenAI or Gemini, which use native schema-constrained decoding.
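
To see the kind of schema that strains prompt injection, note that even modest nesting compiles to a JSON Schema with $defs/$ref indirection, which the model must follow correctly when the schema only exists as prompt text. The models below are hypothetical, purely to show the generated structure:

```python
from pydantic import BaseModel

class Scene(BaseModel):
    number: int
    summary: str

class Act(BaseModel):
    title: str
    scenes: list[Scene]

class Screenplay(BaseModel):
    title: str
    acts: list[Act]

schema = Screenplay.model_json_schema()
# Nested models land in $defs and are wired up via $ref pointers.
print(sorted(schema["$defs"]))  # ['Act', 'Scene']
```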