Changes for version 0.07 - 2026-01-19

  • LoRA (Low-Rank Adaptation) Support
  • New Lugh::LoRA module for loading and applying LoRA adapters
  • Dual format support: GGUF (.gguf) and SafeTensors (.safetensors)
  • Dynamic scale adjustment via the scale() method: 0.0 disables the adapter, 1.0 applies it at full strength, and values above 1.0 (e.g. 2.0+) amplify its effect
  • Integration with all forward methods (see the usage sketch after this list):
    • forward(\@tokens, lora => $lora)
    • forward_with_cache(..., lora => $lora)
    • forward_with_pool(..., lora => $lora)
    • forward_batch(..., lora => $lora)
  • LoRA API methods:
    • new(model => $model, file => $path) - Load adapter
    • scale($value) / scale() - Set/get LoRA influence
    • format() - Get format type ('gguf' or 'safetensors')
    • n_weights() - Get number of adapted weights
    • alpha() - Get LoRA alpha parameter
  • Validated against the llama-cpp-python reference implementation
  • New test files: t/13-backend.t, t/14-memory-pool.t, t/15-batch.t, t/16-edge-cases.t, t/17-sample-topk.t, t/18-inference-methods.t, t/19-model-tensors.t, t/20-lora-interface.t, t/21-lora-forward.t, t/22-lora-cache.t, t/23-lora-pool.t, t/24-lora-batch.t
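
  A minimal usage sketch tying the items above together. Only the
  Lugh::LoRA methods and the lora => $lora option come from this
  entry; the setup() helper, the adapter path, and the $inference
  object are hypothetical placeholders for however your code loads
  the model, tokenizes the prompt, and runs inference.

      use strict;
      use warnings;
      use Lugh::LoRA;

      # Hypothetical helper: assumed to yield a loaded Lugh model, an
      # object exposing the forward methods, and an encoded prompt.
      my ($model, $inference, @tokens) = setup();

      # Load an adapter against the base model (SafeTensors shown;
      # a .gguf adapter file works the same way).
      my $lora = Lugh::LoRA->new(
          model => $model,
          file  => 'adapter.safetensors',
      );

      printf "format=%s n_weights=%d alpha=%s\n",
          $lora->format, $lora->n_weights, $lora->alpha;

      $lora->scale(0.5);            # apply the adapter at half strength
      my $current = $lora->scale;   # read back the current scale

      # All four forward variants accept the same lora option; plain
      # forward() is shown here.
      my $logits = $inference->forward(\@tokens, lora => $lora);

  Because the adapter is passed per call rather than merged into the
  base weights, scale() adjustments presumably take effect on the next
  forward pass, and the unmodified base model stays usable alongside
  the adapter.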

Modules

Pure C LLM Inference Engine for Perl (built on ggml)
Memory Context for Tensor Allocation
Computation Graph for Tensor Operations
Transformer Forward Pass and Token Generation
KV Cache for Efficient Incremental Decoding
Low-Rank Adaptation (LoRA) Adapter Support for Lugh
GGUF Model Loading and Tensor Access
Tensor Operations for Neural Network Computation
Chat Template Formatting for LLM Conversations
N-Dimensional Tensor with ggml Backend
BPE Tokenizer for Text Encoding and Decoding