Changes for version 0.09 - 2026-01-20

  • Context Extension (RoPE Scaling)
    • New Lugh::RoPE module for extending context beyond training length
      • Four scaling types: none, linear, yarn, longrope
      • Built-in presets: linear_2x, linear_4x, yarn_32k, yarn_64k, yarn_128k
      • YaRN parameters: ext_factor, attn_factor, beta_fast, beta_slow
      • Auto-detection from GGUF metadata (rope.scaling.* keys)
    • LughHyperparams extended with rope_scaling_type, n_ctx_orig, rope_ext_factor, rope_attn_factor, rope_beta_fast, rope_beta_slow
    • Integration with all forward methods via rope => $rope parameter
    • New test files: t/26-rope.t, t/27-rope-integration.t
  • Unified Forward Pass API Refactor
    • All forward variants are now thin wrappers around a single static C function (do_forward_unified)
    • forward() now accepts only named parameters (hash form); convenience wrappers cover the common call patterns:
      • forward(tokens => \@tokens, lora => $lora, rope => $rope)
      • forward_simple(\@tokens) - simple forward pass
      • forward_cache($cache, \@tokens, ...) - with KV cache
      • forward_pool($pool, \@tokens, ...) - with memory pool
      • forward_batch(\@sequences, ...) - batch processing
      • forward_cache_pool($cache, $pool, \@tokens, ...) - cache + pool
      • forward_batch_pool($pool, \@sequences, ...) - batch + pool

Modules

Pure C LLM Inference Engine for Perl (built on ggml)
Memory Context for Tensor Allocation
Computation Graph for Tensor Operations
Transformer Forward Pass and Token Generation
KV Cache for Efficient Incremental Decoding
Low-Rank Adaptation (LoRA) Adapter Support for Lugh
GGUF Model Loading and Tensor Access
Tensor Operations for Neural Network Computation
Chat Template Formatting for LLM Conversations
Quantization Utilities for Lugh Tensors
RoPE (Rotary Position Embedding) Scaling Configuration
N-Dimensional Tensor with ggml Backend
BPE Tokenizer for Text Encoding and Decoding