Changelog¶
All notable changes to llmgate are documented here.
v0.8.3 โ 2026-05-01¶
๐ Bug Fix โ Gemini Multi-Turn Conversations¶
- Fixed broken multi-turn conversations on Gemini โ Assistant turns were being serialised with an invalid
"assistant_parts"key instead of the correct"parts"key required by thegoogle-genaiSDK. This caused all prior model text and tool calls to be silently dropped from the conversation context, making every Gemini call effectively single-turn regardless of the history passed in.
v0.8.1 โ 2026-04-30¶
โจ Added โ Embedding Middleware¶
- Embedding Middleware Support โ Middleware chains (
RetryMiddleware,LoggingMiddleware, etc.) are now fully supported for embedding calls. BaseMiddlewareextended withembed_handleandaembed_handlemethods for synchronous and asynchronous embedding hooks.LLMGate.embedandLLMGate.aembednow properly pass requests through the middleware stack before hitting the providers.RetryMiddlewareupdated to natively catch transient errors on embeddings and retry with exponential backoff.
v0.8.0 โ 2026-04-30¶
โจ Added โ Streaming Fallback Resilience¶
stream=True+ model list โ Seamless mid-stream fallback recovery is now fully supported. You can pass a list of models alongsidestream=Trueand if one fails mid-stream, llmgate will automatically failover to the next model in the chain.stream_fallback_modeโ New parameter to configure mid-stream recovery strategy:"restart"(Default): Next model starts fresh. Safe and universally supported."prefill": Buffers partial text and injects it as anassistantprefill, allowing the fallback model to pick up exactly where the previous model left off. Supported natively by Gemini, Groq, Mistral, Cohere, and Ollama."user_turn": Wraps partial text in an assistant turn, followed by a user prompt to continue. Works universally.- Auto-Downgrade logic: Providers that do not support prefilling (OpenAI, Anthropic, Azure, Bedrock) are automatically detected and downgraded to
"user_turn"to avoid API schema rejections mid-stream. - Observability fields added to
StreamChunk: chunk.fallback_attempts: A list of models that were tried before the current chunk's model.chunk.resumed_from_partial: True if the stream resumed mid-way via prefill or user_turn.
v0.7.0 โ 2026-04-29¶
โก Performance โ Embedding Batching¶
- Gemini: fixed N-call loop โ single
embed_content(contents=[list])call (was making one API call per chunk) - Ollama: fixed N-call loop โ single
embed(input=[list])call - Bedrock: parallel
invoke_modelcalls viaThreadPoolExecutor(real-time API has no true batch endpoint); results returned in original order
โจ Added โ Provider-Specific Embedding Params¶
New first-class parameters on embed() / aembed() and EmbeddingRequest:
| Param | Providers | Description |
|---|---|---|
task_type |
Gemini | Optimisation hint: RETRIEVAL_DOCUMENT, RETRIEVAL_QUERY, SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, QUESTION_ANSWERING, FACT_VERIFICATION |
title |
Gemini | Document title โ improves quality when task_type="RETRIEVAL_DOCUMENT" |
input_type |
Cohere, Bedrock-Cohere | Purpose hint: search_document, search_query, classification, clustering |
truncate |
Cohere, Ollama | Overflow strategy โ Cohere: NONE/START/END; Ollama: true/false |
encoding_format |
OpenAI, Azure, Mistral | Output encoding: float or base64 |
user |
OpenAI, Azure | End-user identifier for abuse monitoring |
Additional provider improvements:
- Gemini: task_type + title + dimensions now pass through EmbedContentConfig; response parsing cleaned up
- Cohere: input_type was hardcoded to "search_document" โ now user-controlled; extra_kwargs.pop() mutation bug fixed
- Bedrock: normalize (default True) and dimensions forwarded into Titan V2 request body; Cohere-on-Bedrock gets input_type + truncate
- Mistral: fixed import path (mistralai not mistralai.client); dimensions maps to output_dimension for Matryoshka reduction; encoding_format forwarded
- OpenAI / Azure: encoding_format and user now explicit params; kwargs.pop() side-effect fixed โ kwargs.get()
๐งช Tests¶
- Embedding test suite expanded: 23 โ 53 tests (all mocked, no API keys needed in CI)
- Live-tested against Gemini (
gemini-embedding-001): batch of 5,task_type,title,dimensions=128
v0.6.0 โ 2026-04-25¶
โจ Added โ Fallback / Routing¶
completion(model=[...])/acompletion(model=[...])โ pass a list of model strings for automatic multi-provider fallback routing. First successful response wins.LLMGate(fallback_chain=[...], fallback_on=(...))โ app-level fallback config; all middleware (retry, logging, etc.) applies per-candidate before advancing to the next model.FallbackMiddlewareโ composable middleware for drop-in fallback on existing middleware stacks.AllProvidersFailedErrorโ raised when all models in the chain fail; carrieserrors: list[tuple[str, Exception]]for per-model diagnostics.CompletionResponse.fallback_attemptsโ newlist[str]field indicating which models were tried (and failed) before this response. Empty on first-try success.- Default
fallback_on:(RateLimitError, ProviderAPIError, AuthError)โ all three trigger fallback; configurable per-call or per-gate. stream=True+ model list raisesValueError(streaming fallback planned for v0.7).- 29 new mocked unit tests; live-tested against Groq, Anthropic, and Gemini.
v0.5.0 โ 2026-04-21¶
โจ Added โ Vision / Multimodal Support¶
- New types in
llmgate.types: ImageURLโ URL or data-URI image reference with optionaldetailhintImageBytesโ inline base64-encoded image with explicit MIME typeTextPartโ text segment within a multipart messageImagePartโ image segment (image_urlorimage_bytesvariant)Message.contentwidened fromstrtostr | list[TextPart | ImagePart]โ fully backward compatiblellmgate/vision.pyโ central normalizer module with per-provider serializers:to_openai_content()โ OpenAI / Azureto_groq_content()โ Groq (strips unsupporteddetailparam)to_mistral_content()โ Mistral (plain-stringimage_url)to_anthropic_content()โ Anthropic image source blocksto_gemini_parts()โ GeminiPartobjects (URLs fetched client-side)to_bedrock_content()โ Bedrock Converse image blocks (raw bytes)to_ollama_message()โ Ollama top-levelimageslist- New exception
VisionNotSupportedโ raised by Cohere - 53 new tests in
tests/test_vision.py - Package exports:
ImageURL,ImageBytes,TextPart,ImagePart,VisionNotSupported
v0.4.0 โ 2026-04-11¶
โจ Added โ Batch Completions¶
batch()/abatch()โ parallel completions with configurable concurrencyBatchResulttype with aggregate stats (successful,failed,total_tokens,success_rate)BatchErrortype with per-request failure detailsfail_fastmode to abort the batch on first errorBatchTimeoutErrorexception for per-request timeoutsgate.batch()/gate.abatch()onLLMGate
v0.3.0 โ 2026-03-31¶
โจ Added โ Structured Outputs & Embeddings¶
parse()/aparse()โ shorthand returning a typed Pydantic instanceresponse_formatparameter oncompletion()/acompletion()embed()/aembed()โ embeddings across 7 providersEmbeddingRequest/EmbeddingResponsetypesEmbeddingsNotSupportedexception- Per-provider embedding adapters: OpenAI, Gemini, Azure, Cohere, Mistral, Bedrock, Ollama
v0.2.0 โ 2026-03-25¶
โจ Added¶
- Streaming (
stream=True) returningIterator[StreamChunk]/AsyncIterator[StreamChunk] - Tool / function calling with
ToolDefinition,FunctionDefinition,ToolCall - Composable middleware:
RetryMiddleware,CacheMiddleware,LoggingMiddleware,RateLimitMiddleware LLMGateclient class for gate-level middleware configuration- 5 new optional providers: Mistral, Cohere, Azure OpenAI, AWS Bedrock, Ollama
v0.1.0 โ 2026-03-24¶
๐ Initial release¶
completion()/acompletion()supporting OpenAI, Anthropic, Gemini, Groq- Unified
CompletionResponseshape across all providers AuthError,RateLimitError,ProviderAPIError,ModelNotFoundErrorexceptions- Provider auto-detection from model string prefix
- Full test suite (all mocked โ no API keys needed in CI)
- GitHub Actions CI + PyPI Trusted Publishing