Changes for version 0.06 - 2026-01-18

  • Extended Model Support (Multi-Architecture) and Chat Template / Prompt Formatting
  • Dynamic architecture detection from GGUF general.architecture
  • Architecture-prefixed metadata keys (e.g., qwen2.context_length, phi3.context_length)
  • New Lugh::Prompt module for chat template formatting with 9 built-in formats: chatml, llama2, llama3, mistral, gemma, zephyr, alpaca, vicuna, raw
  • Automatic format detection from model architecture (17 architectures mapped)
  • New Lugh::Model API methods:
    • arch_type() - Get architecture type string
    • arch_has_qkv_combined() - Check for combined QKV tensors
    • arch_has_ffn_gate() - Check for gated FFN
    • arch_has_post_norm() - Check for post-normalization
    • arch_is_llama() / arch_is_qwen() / arch_is_phi() / etc.
  • New Lugh::Prompt methods:
    • apply() - Format a message array into a prompt string, with options:
      • add_generation_prompt - append the assistant header to cue generation
      • system_to_user - prepend the system message to the first user message
    • Utility methods:
      • format_name() - Get current format name
      • format_message($role, $content) - Format single message
      • available_formats() - List all format names
      • format_for_architecture($arch) - Auto-detect format
      • has_format($name) - Check format exists
      • get_format($name) - Get format details as hashref
  • New test files: t/10-multi-arch.t, t/11-prompt.t, t/12-prompt-integration.t
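
The architecture-prefixed metadata scheme above can be sketched in plain Perl. GGUF stores the architecture name under general.architecture and per-model fields under "<arch>.<field>"; the arch_key() helper below is illustrative only, not part of the Lugh API:

```perl
use strict;
use warnings;

# Build an architecture-prefixed GGUF metadata key, e.g. the
# context-length key for a Qwen2 model is "qwen2.context_length".
# arch_key() is a hypothetical helper for illustration.
sub arch_key {
    my ($arch, $field) = @_;
    return join '.', $arch, $field;
}

print arch_key('qwen2', 'context_length'), "\n";   # qwen2.context_length
print arch_key('phi3',  'context_length'), "\n";   # phi3.context_length
```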
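
As a rough illustration of what the chatml template produces, the standalone sketch below follows the ChatML convention (<|im_start|>role ... <|im_end|>); it is not the Lugh::Prompt implementation, and format_chatml() is a hypothetical name:

```perl
use strict;
use warnings;

# Standalone sketch of ChatML-style formatting: each message becomes
# "<|im_start|>ROLE\nCONTENT<|im_end|>\n", and add_generation_prompt
# appends a bare assistant header so the model continues from there.
sub format_chatml {
    my ($messages, %opts) = @_;
    my $out = '';
    $out .= "<|im_start|>$_->{role}\n$_->{content}<|im_end|>\n" for @$messages;
    $out .= "<|im_start|>assistant\n" if $opts{add_generation_prompt};
    return $out;
}

my $prompt = format_chatml(
    [
        { role => 'system', content => 'You are helpful.' },
        { role => 'user',   content => 'Hello!' },
    ],
    add_generation_prompt => 1,
);
print $prompt;
```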

Modules

Pure C LLM Inference Engine for Perl (built on ggml)
Memory Context for Tensor Allocation
Computation Graph for Tensor Operations
Transformer Forward Pass and Token Generation
KV Cache for Efficient Incremental Decoding
GGUF Model Loading and Tensor Access
Tensor Operations for Neural Network Computation
Chat Template Formatting for LLM Conversations
N-Dimensional Tensor with ggml Backend
BPE Tokenizer for Text Encoding and Decoding