Revision history for Langertha
0.301 2026-02-27 01:57:13Z
- Rate limit extraction from HTTP response headers: new
Langertha::RateLimit data class with normalized requests_limit,
requests_remaining, tokens_limit, tokens_remaining, and reset
fields plus raw provider-specific headers. Supported providers:
OpenAI/Groq/Cerebras/OpenRouter/Replicate/HuggingFace
(x-ratelimit-*) and Anthropic (anthropic-ratelimit-*). The engine
stores the latest rate_limit; each Response carries a per-response
rate_limit with requests_remaining/tokens_remaining convenience
methods.
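A minimal usage sketch, built from the accessor names above (illustrative,
not canonical; fields are undef when a provider sends no rate limit headers):

```perl
use Langertha::Engine::OpenAI;

my $openai = Langertha::Engine::OpenAI->new(
  api_key => $ENV{OPENAI_API_KEY},
);

my $response = $openai->simple_chat('Say hello');

# Per-response convenience accessors
printf "requests remaining: %s\n", $response->requests_remaining // 'n/a';
printf "tokens remaining:   %s\n", $response->tokens_remaining   // 'n/a';

# The engine keeps the most recent Langertha::RateLimit object
if ( my $rl = $openai->rate_limit ) {
  printf "request limit: %s, resets: %s\n",
    $rl->requests_limit // 'n/a', $rl->reset // 'n/a';
}
```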
- New engine: HuggingFace — HuggingFace Inference Providers
(OpenAI-compatible, org/model format, chat + streaming + tool calling)
0.300 2026-02-26 21:03:33Z
- Plugin system: Langertha::Plugin base class with lifecycle hooks
(plugin_before_raid, plugin_build_conversation, plugin_before_llm_call,
plugin_after_llm_response, plugin_before_tool_call,
plugin_after_tool_call, plugin_after_raid) and self_tools support.
Plugins can be specified by short name (resolved to
Langertha::Plugin::* or LangerthaX::Plugin::*).
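A plugin subclass might look like this sketch; the hook names come from
the list above, but the exact arguments each hook receives are an
assumption, and the timing attribute is hypothetical:

```perl
package My::Plugin::Timer;
use Langertha qw( Plugin );   # subclass setup (see the class sugar entry)
use Time::HiRes ();

has started => ( is => 'rw' );   # hypothetical: raid start timestamp

sub plugin_before_raid {
  my ($self) = @_;
  $self->started( Time::HiRes::time() );
}

sub plugin_after_raid {
  my ($self) = @_;
  printf "raid took %.2fs\n", Time::HiRes::time() - $self->started;
}

1;
```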
- Langertha::Plugin::Langfuse: Langfuse observability as a plugin
(alternative to engine-level Role::Langfuse), with cascading traces,
generations, and tool call spans in the Raider loop.
- Role::PluginHost: shared plugin hosting for engines and Raider,
with plugin resolution, instantiation, and _plugin_instances caching.
- Wrapper classes: Langertha::Chat, Langertha::Embedder,
Langertha::ImageGen for wrapping engines with optional overrides
(model, system_prompt, temperature, etc.) and plugin lifecycle hooks.
- Class sugar: `use Langertha qw( Raider )` and
`use Langertha qw( Plugin )` for quick subclass setup with
auto-import of Moose and Future::AsyncAwait.
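For example, a Raider subclass (sketch; the attribute shown is
hypothetical):

```perl
package My::Raider;
use Langertha qw( Raider );   # also imports Moose and Future::AsyncAwait

# Moose sugar works without an explicit 'use Moose'
has workspace => ( is => 'ro', default => '/tmp' );   # hypothetical attribute

1;
```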
- Image generation: Role::ImageGeneration with image_model attribute,
OpenAICompatible image_request/image_response/simple_image methods,
OpenAI now composes ImageGeneration role (default: gpt-image-1).
- Role::KeepAlive: extracted keep_alive attribute from Ollama into
a reusable role with get_keep_alive accessor.
- Ollama: update to current API — use operationIds chat/embed/list/ps
(was generateChat/generateEmbeddings/getModels/getRunningModels),
embedding response uses embeddings[0] (was embedding).
- NousResearch: reasoning_prompt is now a configurable attribute
(was hardcoded string).
- Groq, Mistral, OpenAI: consolidate `with 'Langertha::Role::Tools'`
into the main role composition block.
- Log::Any debug/trace logging in Role::Chat, Role::Embedding,
Role::HTTP, Role::Tools, and Role::OpenAPI for request lifecycle
visibility.
- Add Log::Any to cpanfile runtime dependencies.
- Update OpenAPI specs: openai.yaml, mistral.yaml, ollama.yaml to
latest upstream versions.
- Pre-computed OpenAPI lookup tables: ship Langertha::Spec::OpenAI (148
ops), Langertha::Spec::Mistral (67 ops), and Langertha::Spec::Ollama
(12 ops) as static Perl data instead of parsing YAML + constructing
OpenAPI::Modern at runtime. Startup cost drops from ~16s to <1ms.
- New openapi_operations attribute in Role::OpenAPI with automatic
fallback: engines that override _build_openapi_operations get the
fast path; custom engines using openapi_file still work via the
slow YAML/OpenAPI::Modern path.
- Add maint/generate_spec_data.pl to regenerate Spec modules from
share/*.yaml when specs are updated.
- New tests: t/84_live_imagegen.t, t/87_raider_plugins.t,
t/89_langertha_sugar.t, t/91_plugin_config.t, t/92_embedder.t,
t/93_chat.t, t/94_plugin_langfuse.t, t/95_imagegen.t.
0.202 2026-02-25 03:50:44Z
- Engine base class hierarchy: introduce Engine::Remote (JSON + HTTP
+ url required) and Engine::OpenAIBase (+ OpenAICompatible, OpenAPI,
Models, Temperature, ResponseSize, SystemPrompt, Streaming, Chat).
All 15 engines now extend these base classes instead of repeating
10+ role composition statements. New engines need only 2-3 lines.
- Migrate non-OpenAI engines to extend Engine::Remote:
Anthropic, Gemini, Ollama, AKI
- Migrate OpenAI-compatible engines to extend Engine::OpenAIBase:
OpenAI, DeepSeek, Groq, Perplexity, Mistral, MiniMax, NousResearch,
AKIOpenAI, OllamaOpenAI, vLLM (Whisper inherits via OpenAI)
- New engine: Cerebras — high-speed inference platform (llama-3.3-70b)
- New engine: OpenRouter — unified gateway for 300+ models
- New engine: Replicate — thousands of open-source models
- New engine: LlamaCpp — llama.cpp server with embeddings
- OpenAICompatible: api_key is now optional (undef = no Authorization
header), enabling local engines (vLLM, llama.cpp) without dummy keys
- OpenAICompatible: model is now optional in requests, enabling
single-model servers (vLLM, llama.cpp) without explicit model names
- Add comprehensive engine hierarchy test (t/10_engine_hierarchy.t)
verifying inheritance, role composition, instantiation, and request
generation for all 19 engines
- Raider self-tools: raider_mcp => 1 enables LLM-controlled tools:
raider_ask_user, raider_pause, raider_abort, raider_wait,
raider_wait_for, raider_session_history, raider_manage_mcps,
raider_switch_engine
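Enabling the self-tools is a constructor flag; a sketch (engine
construction omitted, and the raid() call is inferred from the
terminology in this changelog):

```perl
use Langertha::Raider;

my $raider = Langertha::Raider->new(
  engine     => $engine,     # any Langertha engine
  mission    => 'Audit the repository layout',
  raider_mcp => 1,           # exposes raider_pause, raider_abort,
                             # raider_switch_engine, ... to the LLM
);

my $result = $raider->raid('Start with the top-level directories');
```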
- Raider engine_catalog: runtime engine switching via self-tool or API
- Raider mcp_catalog: dynamic MCP server activation/deactivation
- Raider inline tools: quick tool definitions without MCP server setup
- Raider::Result: typed result objects (final, question, pause, abort)
with backward-compatible stringification
- AKI: openai() no longer carries over the native model name (naming
differs between the native and /v1 APIs); it uses the default model
and warns
- Add live embedding test (t/82_live_embedding.t) with semantic
similarity verification via Math::Vector::Similarity for OpenAI,
Mistral, Ollama, OllamaOpenAI, and LlamaCpp
- Add live chat test (t/83_live_chat.t) for all 16 engines including
Cerebras, OpenRouter, Perplexity, MiniMax, and LlamaCpp
0.201 2026-02-23 03:50:17Z
- Add Response.thinking attribute for chain-of-thought reasoning:
- Native extraction: DeepSeek/OpenAI-compatible reasoning_content,
Anthropic thinking blocks, Gemini thought parts — automatically
populated on Response.thinking, no configuration needed
- Think tag filter: <think> tag stripping enabled by default on
all engines. Handles both closed (<think>...</think>) and
unclosed (<think>...) tags. Configurable tag name via
think_tag (default: 'think'). Disable with
think_tag_filter => 0. Filtering applied across all text
paths: simple_chat, streaming, tool calling, and Raider.
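A sketch of the filter in use (the Ollama URL and model name are
illustrative):

```perl
use Langertha::Engine::Ollama;

my $ollama = Langertha::Engine::Ollama->new(
  url   => 'http://localhost:11434',
  model => 'qwen3',          # illustrative: any model emitting <think> tags
  # think_tag_filter => 0,   # uncomment to keep <think> blocks inline
);

my $response = $ollama->simple_chat('Why is the sky blue?');
print $response;             # stripped text (stringification overload)
print $response->thinking;   # the extracted <think> content, if any
```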
- Add NousResearch reasoning attribute — enables chain-of-thought
reasoning for Hermes 4 and DeepHermes 3 models by prepending
the standard Nous reasoning system prompt
- Langfuse cascading traces — Raider now creates proper hierarchical
Trace → Span (iteration) → Generation (llm-call) / Span (tool)
structure instead of flat trace → generation. Iteration spans group
the LLM call and its tool calls. Tool spans capture per-tool timing,
input, and output. Trace is updated with final output at raid end.
- Langfuse: add langfuse_span() for creating span events
- Langfuse: add langfuse_update_trace(), langfuse_update_span(),
langfuse_update_generation() for updating observations after creation
- Langfuse: langfuse_trace() now supports tags, user_id, session_id,
release, version, public, and environment fields
- Langfuse: langfuse_generation() now supports parent_observation_id,
model_parameters, level, status_message, and version fields
- Langfuse: Raider generations now include token usage data and
model parameters (temperature, max_tokens) when available
- Raider: add langfuse_trace_name, langfuse_user_id, langfuse_session_id,
langfuse_tags, langfuse_release, langfuse_version, langfuse_metadata
attributes for customizing Langfuse trace creation
- Refactor all OpenAI-compatible engines to compose
Langertha::Role::OpenAICompatible directly instead of extending
Langertha::Engine::OpenAI. Each engine now only includes the roles
it actually supports (e.g. DeepSeek gets Chat but not Embedding).
Removes all "doesn't support X" croak overrides. Affected engines:
DeepSeek, Groq, Mistral, MiniMax, NousResearch, Perplexity, vLLM,
AKIOpenAI, OllamaOpenAI.
- Add Raider context compression — when prompt token usage exceeds
a configurable threshold (max_context_tokens * context_compress_threshold),
history is automatically summarized via LLM before the next raid.
Supports separate compression_engine for using cheaper models.
Manual compression via compress_history/compress_history_f.
- Add Raider session_history — full chronological archive of ALL
messages including tool calls and results, persisted across
clear_history and reset. Queryable by the LLM via MCP tool
registered with register_session_history_tool().
- Add MiniMax to live tool calling test (t/80_live_tool_calling.t)
and live raider test (t/82_live_raider.t)
- Add t/83_live_minimax.t: dedicated MiniMax live test covering
simple_chat, list_models, and Raider with Coding Plan web search
- Add Raider inject() method for mid-raid context injection —
queue messages from async callbacks, timers, or other tasks;
they are picked up naturally at the next iteration
- Add Raider on_iteration callback — called before each LLM call
(iterations 2+) with ($raider, $iteration), returns messages
to inject. Injected messages are persisted in history.
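A sketch of both injection paths (engine construction omitted; the
message hashref shape is an assumption):

```perl
my $raider = Langertha::Raider->new(
  engine       => $engine,
  mission      => 'Explore the data directory',
  on_iteration => sub {
    my ($raider, $iteration) = @_;
    # Messages returned here are injected before this LLM call
    # (iterations 2+) and persisted in history
    return $iteration >= 5
      ? ({ role => 'user', content => 'Please wrap up soon.' })
      : ();
  },
);

# Or queue context from an async callback or timer mid-raid;
# it is picked up at the next iteration
$raider->inject({ role => 'user', content => 'New input just arrived.' });
```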
- Add Langertha::Engine::MiniMax for MiniMax AI API
(chat, streaming, tool calling via OpenAI-compatible API)
- Rewrite all POD to inline style across all modules —
=attr directly after has, =method directly after sub.
Add POD to all previously undocumented modules.
- Improve =seealso cross-links: remove redundant main module
links, add meaningful related module references
0.200 2026-02-22 21:53:36Z
- Add Langertha::Response: metadata container wrapping LLM text content
with id, model, finish_reason, usage (token counts), timing, and created
fields. Uses overload stringification for backward compatibility —
existing code treating responses as strings continues to work.
- All chat_response methods now return Langertha::Response objects:
- Role::OpenAICompatible: extracts id, model, created, finish_reason, usage
- Engine::Anthropic: extracts id, model, stop_reason, input/output_tokens
- Engine::Gemini: extracts modelVersion, finishReason, usageMetadata
(normalized to prompt_tokens/completion_tokens/total_tokens)
- Engine::Ollama: extracts model, done_reason, eval counts, timing fields
- Engine::AKI: extracts model_name, total_duration
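In practice (a sketch; which fields are populated varies by provider,
and treating usage as a hashref with the normalized key names above is
an assumption):

```perl
use feature 'say';

my $response = $engine->simple_chat('Summarize the README.');

say "$response";                    # overloaded: stringifies to the text

say $response->model         // 'n/a';
say $response->finish_reason // 'n/a';

if ( my $usage = $response->usage ) {
  say "tokens: $usage->{total_tokens}";
}
```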
- Add Langertha::Raider: autonomous agent with conversation history and
MCP tool calling. Features mission (system prompt), persistent history
across raids, cumulative metrics (raids, iterations, tool_calls, time_ms),
clear_history and reset methods. Supports Hermes tool calling.
Auto-instruments raids with Langfuse traces and per-iteration
generation events when Langfuse is enabled on the engine.
- Add Langertha::Role::Langfuse: observability integration with Langfuse
REST API. Composed into Role::Chat — every engine has Langfuse support
built in. Auto-instruments simple_chat with trace and generation events.
Batched ingestion via POST /api/public/ingestion with Basic Auth.
Disabled by default — active when langfuse_public_key and
langfuse_secret_key are set (via constructor or LANGFUSE_PUBLIC_KEY /
LANGFUSE_SECRET_KEY / LANGFUSE_URL env vars).
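Enabling it needs only the documented env vars (a config sketch; the
key values and URL are placeholders):

```shell
export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_URL=https://langfuse.example.com
```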
- Add ex/response.pl: Response metadata showcase (tokens, model, timing)
- Add ex/raider.pl: autonomous file explorer agent example
- Add ex/langfuse.pl: Langfuse observability example
- Add ex/langfuse-k8s.yaml: Kubernetes manifest for self-hosted Langfuse
with pre-configured project and API keys (zero setup)
- Add t/70_response.t: Response unit tests across all engine formats
- Add t/72_langfuse.t: Langfuse integration tests with mock HTTP
- Add t/82_live_raider.t: live Raider integration test
- Add Langertha::Role::OpenAICompatible: extracted OpenAI API format
methods into a reusable role. Engines that use the OpenAI-compatible
API format now compose this role instead of duplicating methods.
Engine::OpenAI and all subclasses continue to work unchanged.
- Add Langertha::Engine::OllamaOpenAI: first-class engine for Ollama's
OpenAI-compatible /v1 endpoint. Ollama's openai() method now returns
this engine instead of a raw Engine::OpenAI instance.
- Add Langertha::Engine::AKI for AKI.IO native API
(chat completions with key-in-body auth, synchronous mode,
dynamic endpoint listing via list_models and endpoint_details)
- Add Langertha::Engine::AKIOpenAI for AKI.IO via OpenAI-compatible API
(chat, streaming, tool calling via Role::OpenAICompatible)
- Add Langertha::Engine::NousResearch for Nous Research Inference API
with Hermes-native tool calling via <tool_call> XML tags
- Add Langertha::Engine::Perplexity for Perplexity Sonar API
(chat and streaming only, no tool calling)
- Add hermes_tools feature flag to Langertha::Role::Tools for
Hermes-native tool calling via <tool_call>/<tool_response> XML tags;
enables MCP tool calling on any model that supports the Hermes
prompt format, even without API-level tool support
- Add hermes_call_tag, hermes_response_tag attributes for custom
XML tag names (default: tool_call, tool_response)
- Add hermes_tool_instructions attribute for customizing the
instruction text without changing the structural XML template
- Add hermes_tool_prompt attribute for full system prompt override
- Add hermes_extract_content() method for engines to override
response content extraction in Hermes mode
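Turning this on for, say, a local Ollama model is one flag (sketch; the
model name is illustrative):

```perl
use Langertha::Engine::Ollama;

my $ollama = Langertha::Engine::Ollama->new(
  url          => 'http://localhost:11434',
  model        => 'hermes3',   # illustrative: any Hermes-format model
  hermes_tools => 1,           # tool calls via <tool_call> XML tags
  # hermes_call_tag     => 'tool_call',     # defaults
  # hermes_response_tag => 'tool_response',
);
```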
- MCP tool calling now supported on ALL engines:
- OpenAI (inherited by Groq, vLLM, Mistral, DeepSeek)
- Anthropic (with Anthropic-native tool format)
- Gemini (with Gemini-native functionDeclarations format)
- Ollama (OpenAI-compatible tool format)
- NousResearch (Hermes-native via <tool_call> XML tags)
- Add extract_tool_call() to Role::Tools for engine-agnostic
tool call parsing across all provider formats
- Fix Gemini tool calling: pass through native message formats and
convert MCP tool results to Gemini's functionResponse object
- Fix Gemini chat_request to preserve native parts in messages
from tool result round-trips
- Remove hardcoded all_models() lists from all engines; model
discovery is now exclusively dynamic via list_models()
- Update default models:
- Anthropic: claude-sonnet-4-6 (short alias)
- Gemini: gemini-2.5-flash (2.0-flash deprecated for new users)
- Add Hermes tool calling unit test with mock round-trip
(t/66_tool_calling_hermes.t)
- Add vLLM tool calling unit test (t/65_tool_calling_vllm.t)
- Add live integration test for all engines including Ollama, vLLM,
and NousResearch (t/80_live_tool_calling.t) with multi-model support
- Add mock round-trip test for Ollama tool calling
(t/64_tool_calling_ollama_mock.t) using fixture data
- Add shared Test::MockAsyncHTTP test helper (t/lib/)
for mocking async HTTP in engine tests
- Normalize test API key env vars to TEST_LANGERTHA_*_API_KEY
prefix to prevent accidental use of production keys
- Add TEST_LANGERTHA_OLLAMA_URL and TEST_LANGERTHA_OLLAMA_MODELS
env vars for Ollama live testing
- Add TEST_LANGERTHA_VLLM_URL, TEST_LANGERTHA_VLLM_MODEL, and
TEST_LANGERTHA_VLLM_TOOL_CALL_PARSER env vars for vLLM live testing
- Add AKI.IO native API unit test (t/25_aki_requests.t) with mock
response parsing for chat, list_models, and endpoint_details
- Add AKI.IO live integration test (t/81_live_aki.t) for
list_models, endpoint_details, and simple_chat
- Add AKI.IO to live tool calling test (t/80_live_tool_calling.t)
via OpenAI-compatible API
- Add TEST_LANGERTHA_AKI_API_KEY and TEST_LANGERTHA_AKI_MODEL
env vars for AKI.IO live testing
- Use RFC 2606 test.invalid domain for dummy URLs in unit tests
- Add ex/hermes_tools.pl example for Hermes-native tool calling
- Rewrite all POD to inline style across all 37 modules —
=attr directly after has, =method directly after sub.
Add POD to 18 previously undocumented modules.
0.100 2026-02-20 05:33:44Z
- Add MCP (Model Context Protocol) tool calling support
- New Langertha::Role::Tools for engine-agnostic tool calling
- Anthropic engine: full tool calling support (format_tools,
response_tool_calls, format_tool_results, response_text_content)
- Async chat_with_tools_f() method for automatic multi-round
tool-calling loop with configurable max iterations
- Requires Net::Async::MCP for MCP server communication
- Add Future::AsyncAwait support for async/await syntax
- All _f methods (simple_chat_f, simple_chat_stream_f, etc.)
- Streaming with real-time async callbacks
- Add streaming support
- Synchronous callback, iterator, and Future-based APIs
- SSE parsing for OpenAI/Anthropic/Groq/Mistral/DeepSeek
- NDJSON parsing for Ollama
- Add Gemini engine (Google AI Studio)
- Add dynamic model listing via provider APIs with caching
- Add Anthropic extended parameters (effort, inference_geo)
- Improve POD documentation across all modules
0.008 2025-03-30 04:55:38Z
- Add Mistral engine integration
- Adapt Mistral OpenAPI spec for our parser
0.007 2025-01-25 19:29:51Z
- Add DeepSeek engine
0.006 2024-09-30 14:07:25Z
- Add Structured Output support
- Add Groq engine and Groq Whisper support
- Add TEST_WITHOUT_STRUCTURED_OUTPUT env variable
0.005 2024-08-22 13:43:31Z
- Fix data type on keep_alive and remove POSIX round usage
0.004 2024-08-13 23:10:57Z
- Fix interpretation of max_tokens on Anthropic (response size, not context)
0.003 2024-08-11 00:21:01Z
- Add context size and temperature controls
0.002 2024-08-10 02:22:12Z
- Add Whisper Transcription API
- Add more engines
- Fix encoding issues
0.001 2024-08-03 22:47:33Z
- Initial release
- Unified Perl interface for LLM APIs
- Engines: OpenAI, Anthropic, Ollama
- Role-based architecture (Chat, HTTP, Models, JSON, Embedding)
- OpenAPI spec-driven request generation
- Embedding support