Revision history for Lugh

0.09    2026-01-20
        - Context Extension (RoPE Scaling)
          - New Lugh::RoPE module for extending context beyond training length
            - Four scaling types: none, linear, yarn, longrope
            - Built-in presets: linear_2x, linear_4x, yarn_32k, yarn_64k, yarn_128k
            - YaRN parameters: ext_factor, attn_factor, beta_fast, beta_slow
            - Auto-detection from GGUF metadata (rope.scaling.* keys)
          - LughHyperparams extended with rope_scaling_type, n_ctx_orig, rope_ext_factor, rope_attn_factor, rope_beta_fast, rope_beta_slow
          - Integration with all forward methods via the rope => $rope parameter (see the sketch below)
          - New test files: t/26-rope.t, t/27-rope-integration.t
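
          A minimal usage sketch (the Lugh::RoPE constructor arguments are
          an assumption based on the preset names above; the rope => $rope
          forward parameter is from this entry):

              my $rope = Lugh::RoPE->new(preset => 'yarn_32k');  # assumed constructor form
              my $logits = $inference->forward(tokens => \@tokens, rope => $rope);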
        - Unified Forward Pass API Refactor
          - All forward variants are now wrappers around a single static C
            function (do_forward_unified)
          - forward() now accepts only named parameters (hash form):
            - forward(tokens => \@tokens, lora => $lora, rope => $rope)
          - Convenience wrappers:
            - forward_simple(\@tokens) - simple forward pass
            - forward_cache($cache, \@tokens, ...) - with KV cache
            - forward_pool($pool, \@tokens, ...) - with memory pool
            - forward_batch(\@sequences, ...) - batch processing
            - forward_cache_pool($cache, $pool, \@tokens, ...) - cache + pool
            - forward_batch_pool($pool, \@sequences, ...) - batch + pool
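
          A sketch of the refactored call forms (wrapper names and
          signatures as listed above; construction of $inference and
          $cache is assumed):

              my $logits  = $inference->forward(tokens => \@tokens);       # named-only core API
              my $same    = $inference->forward_simple(\@tokens);          # positional convenience
              my $cached  = $inference->forward_cache($cache, \@tokens);   # with KV cache
              my $batched = $inference->forward_batch([\@seq_a, \@seq_b]); # batch processing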

0.08    2026-01-20
        - New Lugh::Quant module
        - All 35+ GGML quantization types exposed as constants:
          - Float types: F32, F16, BF16, F64
          - Integer types: I8, I16, I32, I64
          - Basic quantization: Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1
          - K-quant types: Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q8_K
          - IQ types: IQ1_S, IQ1_M, IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S, IQ4_NL, IQ4_XS
          - Experimental: TQ1_0, TQ2_0, MXFP4
        - Type introspection functions:
          - type_name(), type_size(), blck_size(), type_sizef()
          - is_quantized(), requires_imatrix(), row_size()
          - type_count(), all_types(), all_quantized_types()
          - type_from_name(), type_info()
        - Tensor type methods: type(), type_name(), type_size(), blck_size(), 
          is_quantized(), nbytes()
        - OO quantize/dequantize on Lugh::Tensor:
          - $tensor->quantize($ctx, $type) - F32 to quantized
          - $tensor->dequantize($ctx) - quantized/F16/BF16 to F32
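
        A minimal sketch of the introspection and conversion calls (the
        function names are from this entry; fully qualified Lugh::Quant::*
        calling style, plus an existing F32 $tensor and context $ctx, are
        assumptions):

            use Lugh::Quant;

            my $type = Lugh::Quant::type_from_name('Q4_K');
            printf "%s: %d bytes per %d-element block\n",
                Lugh::Quant::type_name($type),
                Lugh::Quant::type_size($type),
                Lugh::Quant::blck_size($type);

            my $q   = $tensor->quantize($ctx, $type);  # F32 -> quantized
            my $f32 = $q->dequantize($ctx);            # quantized back to F32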

0.07    2026-01-19
        LoRA (Low-Rank Adaptation) Support
        - New Lugh::LoRA module for loading and applying LoRA adapters
        - Dual format support: GGUF (.gguf) and SafeTensors (.safetensors)
        - Dynamic scale adjustment with scale() method (0.0 to 2.0+)
        - Integration with all forward methods:
          - forward(tokens => \@tokens, lora => $lora)
          - forward_cache(..., lora => $lora)
          - forward_pool(..., lora => $lora)
          - forward_batch(..., lora => $lora)
        - LoRA API methods:
          - new(model => $model, file => $path) - Load adapter
          - scale($value) / scale() - Set/get LoRA influence
          - format() - Get format type ('gguf' or 'safetensors')
          - n_weights() - Get number of adapted weights
          - alpha() - Get LoRA alpha parameter
        - Validated against llama-cpp-python reference implementation
        - New test files: t/13-backend.t, t/14-memory-pool.t, t/15-batch.t,
          t/16-edge-cases.t, t/17-sample-topk.t, t/18-inference-methods.t,
          t/19-model-tensors.t, t/20-lora-interface.t, t/21-lora-forward.t,
          t/22-lora-cache.t, t/23-lora-pool.t, t/24-lora-batch.t
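
        A minimal usage sketch (method names and the forward integration
        are from this entry; model/inference setup and the adapter path
        are assumed):

            my $lora = Lugh::LoRA->new(model => $model, file => 'adapter.safetensors');
            printf "format=%s alpha=%s weights=%d\n",
                $lora->format, $lora->alpha, $lora->n_weights;

            $lora->scale(0.75);   # dial the adapter's influence up or down
            my $logits = $inference->forward(tokens => \@tokens, lora => $lora);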

0.06    2026-01-18
        Extended Model Support (Multi-Architecture) and Chat Template / Prompt Formatting
        - Dynamic architecture detection from GGUF general.architecture
        - Architecture-prefixed metadata keys (e.g., qwen2.context_length, phi3.context_length)
        - New Lugh::Prompt module for chat template formatting with 9 built-in formats: chatml, llama2, llama3, mistral, gemma, zephyr, alpaca, vicuna, raw
        - Automatic format detection from model architecture (17 architectures mapped)
        - New Lugh::Model API methods:
          - arch_type() - Get architecture type string
          - arch_has_qkv_combined() - Check for combined QKV tensors
          - arch_has_ffn_gate() - Check for gated FFN
          - arch_has_post_norm() - Check for post-normalization
          - arch_is_llama() / arch_is_qwen() / arch_is_phi() / etc.
        - Lugh::Prompt API methods:
          - apply() method for formatting message arrays with options:
            - add_generation_prompt - append assistant prompt for generation
            - system_to_user - prepend system message to first user message
          - Utility methods:
            - format_name() - Get current format name
            - format_message($role, $content) - Format single message
            - available_formats() - List all format names
            - format_for_architecture($arch) - Auto-detect format
            - has_format($name) - Check format exists
            - get_format($name) - Get format details as hashref
        - New test files: t/10-multi-arch.t, t/11-prompt.t, t/12-prompt-integration.t
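
        A minimal sketch of chat formatting (the Lugh::Prompt constructor
        arguments and the message hash keys are assumptions; apply() and
        its options are from this entry):

            my $prompt = Lugh::Prompt->new(format => 'chatml');  # assumed constructor form
            my $text = $prompt->apply(
                [
                    { role => 'system', content => 'You are a helpful assistant.' },
                    { role => 'user',   content => 'Hello!' },
                ],
                add_generation_prompt => 1,   # append the assistant prompt
            );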

0.05    2026-01-18
        Performance Optimizations
        - GPU backend activation
          - Multi-backend support: Metal, BLAS, CUDA, Vulkan, CPU
        - New backend discovery API:
          - Lugh::available_backends() - List all available backends
          - Lugh::backend_count() - Get backend count
          - Lugh::backend_info($name) - Get backend metadata
          - Lugh::backend_available($name) - Check availability
          - Lugh::best_backend() - Get recommended backend
          - Lugh::has_metal() / Lugh::metal_available() - Metal support
        - Backend selection parameter for Lugh::Inference->new(backend => $name)
        - Memory pools for efficient repeated inference:
          - create_memory_pool() - Create reusable compute resources
          - forward_with_pool($pool, \@tokens) - Forward pass with pooled resources
          - Lugh::MemoryPool class with reset() and backend() methods
        - Batch processing for multiple sequences:
          - forward_batch(\@sequences) - Process multiple token sequences
        - New performance test file t/09-performance.t
        - Full documentation for the new APIs in Lugh and Lugh::Inference
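
        A minimal sketch of backend discovery and pooled inference (the
        discovery functions, the backend parameter, forward_with_pool(),
        and reset() are from this entry; create_memory_pool() as an
        inference method and the model argument are assumptions):

            my @backends  = Lugh::available_backends();
            my $inference = Lugh::Inference->new(model   => $model,
                                                 backend => Lugh::best_backend());
            my $pool   = $inference->create_memory_pool();
            my $logits = $inference->forward_with_pool($pool, \@tokens);
            $pool->reset;   # reuse the pooled compute resources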

0.04    2026-01-18
        - Added KV cache support for efficient incremental decoding (Lugh::KVCache)
        - Lugh::Inference - New create_kvcache() and forward_with_cache() methods
        - New test file t/08-kvcache.t
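
        A minimal sketch of incremental decoding (the method names are
        from this entry; the forward_with_cache() argument order is an
        assumption):

            my $cache = $inference->create_kvcache();
            # First call processes the full prompt; subsequent calls feed
            # only the newly sampled token and reuse the cached keys/values.
            my $logits = $inference->forward_with_cache($cache, \@prompt_tokens);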

0.03    2026-01-18
        - Added generate() method for multi-token autoregressive generation
        - Added sample_top_k() method for top-k sampling
        - Generation supports: greedy, top_p, top_k, temperature, streaming callbacks
        - Added EOS token stopping and callback-based early stopping
        - New test suite t/07-generate.t with 22 tests including exact output validation
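
        A minimal sketch of generation (the supported options are from
        this entry; the parameter names and return value are assumptions):

            my $text = $inference->generate(
                prompt      => $prompt,              # assumed parameter names
                max_tokens  => 64,
                temperature => 0.8,
                top_k       => 40,
                callback    => sub { print $_[0] },  # streaming callback
            );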

0.02    2026-01-17
        - Added Flash Attention support via ggml_flash_attn_ext()
        - Added support for tied embeddings (output.weight = token_embd.weight)
        - Bundled TinyStories-656K test model (749KB) for self-contained tests

0.01    Date/time
        First version, released on an unsuspecting world.