Revision history for Langertha

0.301     2026-02-27 01:57:13Z
    - Rate limit extraction from HTTP response headers: new
      Langertha::RateLimit data class with normalized requests_limit,
      requests_remaining, tokens_limit, tokens_remaining, and reset
      fields plus raw provider-specific headers. Supported providers:
      OpenAI/Groq/Cerebras/OpenRouter/Replicate/HuggingFace
      (x-ratelimit-*) and Anthropic (anthropic-ratelimit-*). Engine
      stores the latest rate_limit; Response carries a per-response
      rate_limit with requests_remaining/tokens_remaining convenience
      methods.
    - New engine: HuggingFace — HuggingFace Inference Providers
      (OpenAI-compatible, org/model format, chat + streaming + tool calling)

0.300     2026-02-26 21:03:33Z
    - Plugin system: Langertha::Plugin base class with lifecycle hooks
      (plugin_before_raid, plugin_build_conversation, plugin_before_llm_call,
      plugin_after_llm_response, plugin_before_tool_call,
      plugin_after_tool_call, plugin_after_raid) and self_tools support.
      Plugins can be specified by short name (resolved to
      Langertha::Plugin::* or LangerthaX::Plugin::*).
    - Langertha::Plugin::Langfuse: Langfuse observability as a plugin
      (alternative to engine-level Role::Langfuse), with cascading traces,
      generations, and tool call spans in the Raider loop.
    - Role::PluginHost: shared plugin hosting for engines and Raider,
      with plugin resolution, instantiation, and _plugin_instances caching.
    - Wrapper classes: Langertha::Chat, Langertha::Embedder,
      Langertha::ImageGen for wrapping engines with optional overrides
      (model, system_prompt, temperature, etc.) and plugin lifecycle hooks.
    - Class sugar: `use Langertha qw( Raider )` and
      `use Langertha qw( Plugin )` for quick subclass setup with
      auto-import of Moose and Future::AsyncAwait.
    - Image generation: Role::ImageGeneration with image_model attribute,
      OpenAICompatible image_request/image_response/simple_image methods,
      OpenAI now composes ImageGeneration role (default: gpt-image-1).
    - Role::KeepAlive: extracted keep_alive attribute from Ollama into
      a reusable role with get_keep_alive accessor.
    - Ollama: update to current API — use operationIds chat/embed/list/ps
      (was generateChat/generateEmbeddings/getModels/getRunningModels),
      embedding response uses embeddings[0] (was embedding).
    - NousResearch: reasoning_prompt is now a configurable attribute
      (was hardcoded string).
    - Groq, Mistral, OpenAI: consolidate `with 'Langertha::Role::Tools'`
      into the main role composition block.
    - Log::Any debug/trace logging in Role::Chat, Role::Embedding,
      Role::HTTP, Role::Tools, and Role::OpenAPI for request lifecycle
      visibility.
    - Add Log::Any to cpanfile runtime dependencies.
    - Update OpenAPI specs: openai.yaml, mistral.yaml, ollama.yaml to
      latest upstream versions.
    - Pre-computed OpenAPI lookup tables: ship Langertha::Spec::OpenAI (148
      ops), Langertha::Spec::Mistral (67 ops), and Langertha::Spec::Ollama
      (12 ops) as static Perl data instead of parsing YAML + constructing
      OpenAPI::Modern at runtime. Startup cost drops from ~16s to <1ms.
    - New openapi_operations attribute in Role::OpenAPI with automatic
      fallback: engines that override _build_openapi_operations get the
      fast path; custom engines using openapi_file still work via the
      slow YAML/OpenAPI::Modern path.
    - Add maint/generate_spec_data.pl to regenerate Spec modules from
      share/*.yaml when specs are updated.
    - New tests: t/84_live_imagegen.t, t/87_raider_plugins.t,
      t/89_langertha_sugar.t, t/91_plugin_config.t, t/92_embedder.t,
      t/93_chat.t, t/94_plugin_langfuse.t, t/95_imagegen.t.

0.202     2026-02-25 03:50:44Z
    - Engine base class hierarchy: introduce Engine::Remote (JSON + HTTP
      + url required) and Engine::OpenAIBase (+ OpenAICompatible, OpenAPI,
      Models, Temperature, ResponseSize, SystemPrompt, Streaming, Chat).
      All 15 engines now extend these base classes instead of repeating
      10+ role composition statements. New engines need only 2-3 lines.
    - Migrate non-OpenAI engines to extend Engine::Remote:
      Anthropic, Gemini, Ollama, AKI
    - Migrate OpenAI-compatible engines to extend Engine::OpenAIBase:
      OpenAI, DeepSeek, Groq, Perplexity, Mistral, MiniMax, NousResearch,
      AKIOpenAI, OllamaOpenAI, vLLM (Whisper inherits via OpenAI)
    - New engine: Cerebras — fastest inference platform (llama-3.3-70b)
    - New engine: OpenRouter — unified gateway for 300+ models
    - New engine: Replicate — thousands of open-source models
    - New engine: LlamaCpp — llama.cpp server with embeddings
    - OpenAICompatible: api_key is now optional (undef = no Authorization
      header), enabling local engines (vLLM, llama.cpp) without dummy keys
    - OpenAICompatible: model is now optional in requests, enabling
      single-model servers (vLLM, llama.cpp) without explicit model names
    - Add comprehensive engine hierarchy test (t/10_engine_hierarchy.t)
      verifying inheritance, role composition, instantiation, and request
      generation for all 19 engines
    - Raider self-tools: raider_mcp => 1 enables LLM-controlled tools:
      raider_ask_user, raider_pause, raider_abort, raider_wait,
      raider_wait_for, raider_session_history, raider_manage_mcps,
      raider_switch_engine
    - Raider engine_catalog: runtime engine switching via self-tool or API
    - Raider mcp_catalog: dynamic MCP server activation/deactivation
    - Raider inline tools: quick tool definitions without MCP server setup
    - Raider::Result: typed result objects (final, question, pause, abort)
      with backward-compatible stringification
    - AKI: openai() no longer carries over the native model name (naming
      differs between the native and /v1 APIs); it now uses the default
      model and warns
    - Add live embedding test (t/82_live_embedding.t) with semantic
      similarity verification via Math::Vector::Similarity for OpenAI,
      Mistral, Ollama, OllamaOpenAI, and LlamaCpp
    - Add live chat test (t/83_live_chat.t) for all 16 engines including
      Cerebras, OpenRouter, Perplexity, MiniMax, and LlamaCpp

0.201     2026-02-23 03:50:17Z
    - Add Response.thinking attribute for chain-of-thought reasoning:
      - Native extraction: DeepSeek/OpenAI-compatible reasoning_content,
        Anthropic thinking blocks, Gemini thought parts — automatically
        populated on Response.thinking, no configuration needed
      - Think tag filter: <think> tag stripping enabled by default on
        all engines. Handles both closed (<think>...</think>) and
        unclosed (<think>...) tags. Configurable tag name via
        think_tag (default: 'think'). Disable with
        think_tag_filter => 0. Filtering applied across all text
        paths: simple_chat, streaming, tool calling, and Raider.
    - Add NousResearch reasoning attribute — enables chain-of-thought
      reasoning for Hermes 4 and DeepHermes 3 models by prepending
      the standard Nous reasoning system prompt
    - Langfuse cascading traces — Raider now creates proper hierarchical
      Trace → Span (iteration) → Generation (llm-call) / Span (tool)
      structure instead of flat trace → generation. Iteration spans group
      the LLM call and its tool calls. Tool spans capture per-tool timing,
      input, and output. Trace is updated with final output at raid end.
    - Langfuse: add langfuse_span() for creating span events
    - Langfuse: add langfuse_update_trace(), langfuse_update_span(),
      langfuse_update_generation() for updating observations after creation
    - Langfuse: langfuse_trace() now supports tags, user_id, session_id,
      release, version, public, and environment fields
    - Langfuse: langfuse_generation() now supports parent_observation_id,
      model_parameters, level, status_message, and version fields
    - Langfuse: Raider generations now include token usage data and
      model parameters (temperature, max_tokens) when available
    - Raider: add langfuse_trace_name, langfuse_user_id, langfuse_session_id,
      langfuse_tags, langfuse_release, langfuse_version, langfuse_metadata
      attributes for customizing Langfuse trace creation
    - Refactor all OpenAI-compatible engines to compose
      Langertha::Role::OpenAICompatible directly instead of extending
      Langertha::Engine::OpenAI. Each engine now only includes the roles
      it actually supports (e.g. DeepSeek gets Chat but not Embedding).
      Removes all "doesn't support X" croak overrides. Affected engines:
      DeepSeek, Groq, Mistral, MiniMax, NousResearch, Perplexity, vLLM,
      AKIOpenAI, OllamaOpenAI.
    - Add Raider context compression — when prompt token usage exceeds
      a configurable threshold (max_context_tokens * context_compress_threshold),
      history is automatically summarized via LLM before the next raid.
      Supports separate compression_engine for using cheaper models.
      Manual compression via compress_history/compress_history_f.
    - Add Raider session_history — full chronological archive of ALL
      messages including tool calls and results, persisted across
      clear_history and reset. Queryable by the LLM via MCP tool
      registered with register_session_history_tool().
    - Add MiniMax to live tool calling test (t/80_live_tool_calling.t)
      and live raider test (t/82_live_raider.t)
    - Add t/83_live_minimax.t: dedicated MiniMax live test covering
      simple_chat, list_models, and Raider with Coding Plan web search
    - Add Raider inject() method for mid-raid context injection —
      queue messages from async callbacks, timers, or other tasks;
      they are picked up at the next iteration
    - Add Raider on_iteration callback — called before each LLM call
      (iterations 2+) with ($raider, $iteration), returns messages
      to inject. Injected messages are persisted in history.
    - Add Langertha::Engine::MiniMax for MiniMax AI API
      (chat, streaming, tool calling via OpenAI-compatible API)
    - Rewrite all POD to inline style across all modules —
      =attr directly after has, =method directly after sub.
      Add POD to all previously undocumented modules.
    - Improve =seealso cross-links: remove redundant main module
      links, add meaningful related module references

0.200     2026-02-22 21:53:36Z
    - Add Langertha::Response: metadata container wrapping LLM text content
      with id, model, finish_reason, usage (token counts), timing, and created
      fields. Uses overload stringification for backward compatibility —
      existing code treating responses as strings continues to work.
    - All chat_response methods now return Langertha::Response objects:
      - Role::OpenAICompatible: extracts id, model, created, finish_reason, usage
      - Engine::Anthropic: extracts id, model, stop_reason, input/output_tokens
      - Engine::Gemini: extracts modelVersion, finishReason, usageMetadata
        (normalized to prompt_tokens/completion_tokens/total_tokens)
      - Engine::Ollama: extracts model, done_reason, eval counts, timing fields
      - Engine::AKI: extracts model_name, total_duration
    - Add Langertha::Raider: autonomous agent with conversation history and
      MCP tool calling. Features mission (system prompt), persistent history
      across raids, cumulative metrics (raids, iterations, tool_calls, time_ms),
      clear_history and reset methods. Supports Hermes tool calling.
      Auto-instruments raids with Langfuse traces and per-iteration
      generation events when Langfuse is enabled on the engine.
    - Add Langertha::Role::Langfuse: observability integration with Langfuse
      REST API. Composed into Role::Chat — every engine has Langfuse support
      built in. Auto-instruments simple_chat with trace and generation events.
      Batched ingestion via POST /api/public/ingestion with Basic Auth.
      Disabled by default — active when langfuse_public_key and
      langfuse_secret_key are set (via constructor or LANGFUSE_PUBLIC_KEY /
      LANGFUSE_SECRET_KEY / LANGFUSE_URL env vars).
    - Add ex/response.pl: Response metadata showcase (tokens, model, timing)
    - Add ex/raider.pl: autonomous file explorer agent example
    - Add ex/langfuse.pl: Langfuse observability example
    - Add ex/langfuse-k8s.yaml: Kubernetes manifest for self-hosted Langfuse
      with pre-configured project and API keys (zero setup)
    - Add t/70_response.t: Response unit tests across all engine formats
    - Add t/72_langfuse.t: Langfuse integration tests with mock HTTP
    - Add t/82_live_raider.t: live Raider integration test
    - Add Langertha::Role::OpenAICompatible: extracted OpenAI API format
      methods into a reusable role. Engines that use the OpenAI-compatible
      API format now compose this role instead of duplicating methods.
      Engine::OpenAI and all subclasses continue to work unchanged.
    - Add Langertha::Engine::OllamaOpenAI: first-class engine for Ollama's
      OpenAI-compatible /v1 endpoint. Ollama's openai() method now returns
      this engine instead of a raw Engine::OpenAI instance.
    - Add Langertha::Engine::AKI for AKI.IO native API
      (chat completions with key-in-body auth, synchronous mode,
      dynamic endpoint listing via list_models and endpoint_details)
    - Add Langertha::Engine::AKIOpenAI for AKI.IO via OpenAI-compatible API
      (chat, streaming, tool calling via Role::OpenAICompatible)
    - Add Langertha::Engine::NousResearch for Nous Research Inference API
      with Hermes-native tool calling via <tool_call> XML tags
    - Add Langertha::Engine::Perplexity for Perplexity Sonar API
      (chat and streaming only, no tool calling)
    - Add hermes_tools feature flag to Langertha::Role::Tools for
      Hermes-native tool calling via <tool_call>/<tool_response> XML tags;
      enables MCP tool calling on any model that supports the Hermes
      prompt format, even without API-level tool support
    - Add hermes_call_tag, hermes_response_tag attributes for custom
      XML tag names (default: tool_call, tool_response)
    - Add hermes_tool_instructions attribute for customizing the
      instruction text without changing the structural XML template
    - Add hermes_tool_prompt attribute for full system prompt override
    - Add hermes_extract_content() method for engines to override
      response content extraction in Hermes mode
    - MCP tool calling now supported on all tool-capable engines:
      - OpenAI (inherited by Groq, vLLM, Mistral, DeepSeek)
      - Anthropic (with Anthropic-native tool format)
      - Gemini (with Gemini-native functionDeclarations format)
      - Ollama (OpenAI-compatible tool format)
      - NousResearch (Hermes-native via <tool_call> XML tags)
    - Add extract_tool_call() to Role::Tools for engine-agnostic
      tool call parsing across all provider formats
    - Fix Gemini tool calling: pass-through native message formats,
      convert MCP tool results to Gemini's functionResponse object
    - Fix Gemini chat_request to preserve native parts in messages
      from tool result round-trips
    - Remove hardcoded all_models() lists from all engines; model
      discovery is now exclusively dynamic via list_models()
    - Update default models:
      - Anthropic: claude-sonnet-4-6 (short alias)
      - Gemini: gemini-2.5-flash (2.0-flash deprecated for new users)
    - Add Hermes tool calling unit test with mock round-trip
      (t/66_tool_calling_hermes.t)
    - Add vLLM tool calling unit test (t/65_tool_calling_vllm.t)
    - Add live integration test for all engines including Ollama, vLLM,
      and NousResearch (t/80_live_tool_calling.t) with multi-model support
    - Add mock round-trip test for Ollama tool calling
      (t/64_tool_calling_ollama_mock.t) using fixture data
    - Add shared Test::MockAsyncHTTP test helper (t/lib/)
      for mocking async HTTP in engine tests
    - Normalize test API key env vars to TEST_LANGERTHA_*_API_KEY
      prefix to prevent accidental use of production keys
    - Add TEST_LANGERTHA_OLLAMA_URL and TEST_LANGERTHA_OLLAMA_MODELS
      env vars for Ollama live testing
    - Add TEST_LANGERTHA_VLLM_URL, TEST_LANGERTHA_VLLM_MODEL, and
      TEST_LANGERTHA_VLLM_TOOL_CALL_PARSER env vars for vLLM live testing
    - Add AKI.IO native API unit test (t/25_aki_requests.t) with mock
      response parsing for chat, list_models, and endpoint_details
    - Add AKI.IO live integration test (t/81_live_aki.t) for
      list_models, endpoint_details, and simple_chat
    - Add AKI.IO to live tool calling test (t/80_live_tool_calling.t)
      via OpenAI-compatible API
    - Add TEST_LANGERTHA_AKI_API_KEY and TEST_LANGERTHA_AKI_MODEL
      env vars for AKI.IO live testing
    - Use RFC 2606 test.invalid domain for dummy URLs in unit tests
    - Add ex/hermes_tools.pl example for Hermes-native tool calling
    - Rewrite all POD to inline style across all 37 modules —
      =attr directly after has, =method directly after sub.
      Add POD to 18 previously undocumented modules.

0.100     2026-02-20 05:33:44Z
    - Add MCP (Model Context Protocol) tool calling support
      - New Langertha::Role::Tools for engine-agnostic tool calling
      - Anthropic engine: full tool calling support (format_tools,
        response_tool_calls, format_tool_results, response_text_content)
      - Async chat_with_tools_f() method for automatic multi-round
        tool-calling loop with configurable max iterations
      - Requires Net::Async::MCP for MCP server communication
    - Add Future::AsyncAwait support for async/await syntax
      - All _f methods (simple_chat_f, simple_chat_stream_f, etc.)
      - Streaming with real-time async callbacks
    - Add streaming support
      - Synchronous callback, iterator, and Future-based APIs
      - SSE parsing for OpenAI/Anthropic/Groq/Mistral/DeepSeek
      - NDJSON parsing for Ollama
    - Add Gemini engine (Google AI Studio)
    - Add dynamic model listing via provider APIs with caching
    - Add Anthropic extended parameters (effort, inference_geo)
    - Improve POD documentation across all modules

0.008     2025-03-30 04:55:38Z
    - Add Mistral engine integration
    - Adapt Mistral OpenAPI spec for our parser

0.007     2025-01-25 19:29:51Z
    - Add DeepSeek engine

0.006     2024-09-30 14:07:25Z
    - Add Structured Output support
    - Add Groq engine and Groq Whisper support
    - Add TEST_WITHOUT_STRUCTURED_OUTPUT env variable

0.005     2024-08-22 13:43:31Z
    - Fix data type on keep_alive and remove POSIX round usage

0.004     2024-08-13 23:10:57Z
    - Fix interpretation of max_tokens on Anthropic (response size, not context)

0.003     2024-08-11 00:21:01Z
    - Add context size and temperature controls

0.002     2024-08-10 02:22:12Z
    - Add Whisper Transcription API
    - Add more engines
    - Fix encoding issues

0.001     2024-08-03 22:47:33Z
    - Initial release
    - Unified Perl interface for LLM APIs
    - Engines: OpenAI, Anthropic, Ollama
    - Role-based architecture (Chat, HTTP, Models, JSON, Embedding)
    - OpenAPI spec-driven request generation
    - Embedding support