Changes for version 0.301 - 2026-02-27

  • Rate limit extraction from HTTP response headers: new Langertha::RateLimit data class with normalized requests_limit, requests_remaining, tokens_limit, tokens_remaining, and reset fields, plus the raw provider-specific headers. Supported providers: OpenAI, Groq, Cerebras, OpenRouter, Replicate, and HuggingFace (x-ratelimit-*), and Anthropic (anthropic-ratelimit-*). The engine stores the latest rate_limit; each Response carries a per-response rate_limit with requests_remaining and tokens_remaining convenience methods.
  • New engine: HuggingFace — HuggingFace Inference Providers (OpenAI-compatible, org/model format, chat + streaming + tool calling)
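
  A minimal sketch combining the two entries above: the new HuggingFace engine with the new rate-limit accessors. The rate_limit, requests_remaining, tokens_remaining, requests_limit, and reset accessors are the ones named in the entry; the constructor arguments, the chat call, and the model name are assumptions modeled on the other Langertha engines, not a verified API.

  ```perl
  use strict;
  use warnings;
  use Langertha::Engine::HuggingFace;   # new OpenAI-compatible engine

  # org/model format, as noted above (model name is just an example)
  my $hf = Langertha::Engine::HuggingFace->new(
    api_key => $ENV{HUGGINGFACE_API_KEY},
    model   => 'meta-llama/Llama-3.1-8B-Instruct',
  );

  # chat() returning a Response object is an assumption here
  my $response = $hf->chat('Say hello in one word.');

  # Per-response rate limit (normalized from the x-ratelimit-* headers)
  printf "requests remaining: %s\n", $response->requests_remaining;
  printf "tokens remaining:   %s\n", $response->tokens_remaining;

  # The engine also keeps the latest Langertha::RateLimit object
  my $rl = $hf->rate_limit;
  printf "limit: %s requests, resets: %s\n",
    $rl->requests_limit, $rl->reset;
  ```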

Documentation

Simple chat with Ollama
Simple chat with OpenAI
Simple script to check the model list on an OpenAI-compatible API
Simple transcription with a Whisper-compatible server or OpenAI

Modules

The clan of fierce vikings with 🪓 and 🛡️ to AId your rAId
Chat abstraction wrapping an engine with optional overrides
Embedding abstraction wrapping an engine with optional model override
AKI.IO native API
AKI.IO via OpenAI-compatible API
Cerebras Inference API
Google Gemini API
GroqCloud API
HuggingFace Inference Providers API
llama.cpp server
Nous Research Inference API
Ollama via OpenAI-compatible API
Base class for OpenAI-compatible engines
Perplexity Sonar API
Base class for all remote engines
Whisper-compatible transcription server
vLLM inference server
Image generation abstraction wrapping an engine with optional overrides
Base class for plugins
Langfuse observability plugin for any PluginHost
Autonomous agent with conversation history and MCP tools
Result object from a Raider raid
Rate limit information from API response headers
An HTTP request inside of Langertha
LLM response with metadata
Role for APIs with normal chat functionality
Role for an engine where you can specify the context size (in tokens)
Role for APIs with embedding functionality
Role for HTTP APIs
Role for engines that support image generation
Role for JSON
Role for engines that support keep-alive duration
Langfuse observability integration
Role for APIs with several models
Role for OpenAI-compatible API format
Role for APIs with OpenAPI definition
Role for objects that host plugins (Raider, Engine)
Role for an engine where you can specify structured output
Role for an engine where you can specify the response size (in tokens)
Role for an engine that can set a seed
Role for streaming support
Role for APIs with system prompt
Role for an engine that can have a temperature setting
Configurable think tag filtering for reasoning models
Role for MCP tool calling support
Role for APIs with transcription functionality
Pre-computed OpenAPI operations for Mistral
Pre-computed OpenAPI operations for Ollama
Pre-computed OpenAPI operations for OpenAI
Iterator for streaming responses
Represents a single chunk from a streaming response
Bring your own viking!