Revision history for Lugh
0.06 2026-01-18
Extended Model Support (Multi-Architecture) and Chat Template / Prompt Formatting
- Dynamic architecture detection from GGUF general.architecture
- Architecture-prefixed metadata keys (e.g., qwen2.context_length, phi3.context_length)
- New Lugh::Prompt module for chat template formatting with 9 built-in formats: chatml, llama2, llama3, mistral, gemma, zephyr, alpaca, vicuna, raw
- Automatic format detection from model architecture (17 architectures mapped)
- New Lugh::Model API methods:
  - arch_type() - Get architecture type string
  - arch_has_qkv_combined() - Check for combined QKV tensors
  - arch_has_ffn_gate() - Check for gated FFN
  - arch_has_post_norm() - Check for post-normalization
  - arch_is_llama() / arch_is_qwen() / arch_is_phi() and other per-architecture predicates
- New Lugh::Prompt methods:
  - apply() - Format a message array, with options:
    - add_generation_prompt - append the assistant prompt for generation
    - system_to_user - prepend the system message to the first user message
  - Utility methods:
    - format_name() - Get current format name
    - format_message($role, $content) - Format a single message
    - available_formats() - List all format names
    - format_for_architecture($arch) - Auto-detect format for an architecture
    - has_format($name) - Check whether a format exists
    - get_format($name) - Get format details as a hashref
- New test files: t/10-multi-arch.t, t/11-prompt.t, t/12-prompt-integration.t
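The prompt-formatting additions above can be exercised roughly as follows. This is a sketch, not documentation: the constructor arguments shown (format => 'chatml') and the exact calling conventions of apply() and available_formats() are assumptions, only the method names come from this log.

```perl
use strict;
use warnings;
use Lugh::Prompt;

# Assumed constructor: select one of the built-in formats by name.
my $prompt = Lugh::Prompt->new(format => 'chatml');

# apply() formats an array of role/content messages; add_generation_prompt
# appends the assistant prompt so the model continues as the assistant.
my $text = $prompt->apply(
    [
        { role => 'system', content => 'You are a helpful assistant.' },
        { role => 'user',   content => 'Hello!' },
    ],
    add_generation_prompt => 1,
);

# Utility methods listed above.
print $prompt->format_name(), "\n";
print join(', ', $prompt->available_formats()), "\n";
```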
0.05 2026-01-18
Performance Optimizations
- GPU backend activation with multi-backend support: Metal, BLAS, CUDA, Vulkan, CPU
- New backend discovery API:
  - Lugh::available_backends() - List all available backends
  - Lugh::backend_count() - Get backend count
  - Lugh::backend_info($name) - Get backend metadata
  - Lugh::backend_available($name) - Check availability
  - Lugh::best_backend() - Get recommended backend
  - Lugh::has_metal() / Lugh::metal_available() - Check Metal support
- Backend selection via Lugh::Inference->new(backend => $name)
- Memory pools for efficient repeated inference:
  - create_memory_pool() - Create reusable compute resources
  - forward_with_pool($pool, \@tokens) - Forward pass using pooled resources
  - Lugh::MemoryPool class with reset() and backend() methods
- Batch processing for multiple sequences:
  - forward_batch(\@sequences) - Process multiple token sequences in one call
- New performance test file: t/09-performance.t
- Full documentation for the new APIs in Lugh and Lugh::Inference
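Taken together, the backend and pooling additions in this release suggest usage along these lines. A sketch only: the model => 'model.gguf' constructor argument is a placeholder assumption, and return-value shapes are guesses; the function and method names are the ones listed above.

```perl
use strict;
use warnings;
use Lugh;
use Lugh::Inference;

# Discover what this build of Lugh can use.
my @backends = Lugh::available_backends();
printf "%d backend(s): %s\n", Lugh::backend_count(), join(', ', @backends);

# Pick the recommended backend and pass it to the inference engine
# (model path and constructor key are placeholders).
my $inference = Lugh::Inference->new(
    model   => 'model.gguf',
    backend => Lugh::best_backend(),
);

# Memory pool: allocate compute resources once, reuse across passes.
my $pool = $inference->create_memory_pool();
for my $tokens ([1, 2, 3], [4, 5, 6]) {
    my $logits = $inference->forward_with_pool($pool, $tokens);
    $pool->reset();    # reclaim pooled memory between passes
}

# Batch processing: run several token sequences in one call.
my $results = $inference->forward_batch([ [1, 2, 3], [4, 5, 6] ]);
```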
0.04 2026-01-18
- Added KV cache support for efficient incremental decoding via the new Lugh::KVCache class
- Lugh::Inference - New create_kvcache() and forward_with_cache() methods
- New test file t/08-kvcache.t
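The two new methods enable the usual incremental-decoding pattern: pay for the full prompt once, then feed only new tokens while the cache holds earlier keys and values. A sketch, with placeholder constructor arguments:

```perl
use strict;
use warnings;
use Lugh::Inference;

# Constructor arguments are placeholders, not from this log.
my $inference = Lugh::Inference->new(model => 'model.gguf');

# Create a KV cache, then decode incrementally: each call only processes
# the newly appended token(s), reusing cached keys/values for the rest.
my $cache  = $inference->create_kvcache();
my $logits = $inference->forward_with_cache($cache, [1, 2, 3]);  # prompt pass
$logits    = $inference->forward_with_cache($cache, [4]);        # one new token
```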
0.03 2026-01-18
- Added generate() method for multi-token autoregressive generation
- Added sample_top_k() method for top-k sampling
- Generation supports greedy decoding, top-p and top-k sampling, temperature scaling, and streaming callbacks
- Added EOS token stopping and callback-based early stopping
- New test suite t/07-generate.t with 22 tests including exact output validation
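A generation call combining the features above might look like this. The option names (max_tokens, temperature, top_k, top_p, callback) and the callback contract are assumptions for illustration; generate(), sample_top_k(), and the feature list are from this log.

```perl
use strict;
use warnings;
use Lugh::Inference;

my $inference = Lugh::Inference->new(model => 'model.gguf');  # placeholder args

# generate() drives multi-token autoregressive generation; sampling
# options mirror the feature list above (option spellings assumed).
my $output = $inference->generate(
    [1, 2, 3],               # prompt token IDs
    max_tokens  => 64,
    temperature => 0.8,
    top_k       => 40,       # cf. the new sample_top_k() method
    top_p       => 0.95,
    # Streaming callback, invoked per token; returning false would
    # trigger the callback-based early stopping mentioned above.
    callback    => sub {
        my ($token) = @_;
        return 1;
    },
);
```

Generation also stops automatically at the model's EOS token, per the entry above.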
0.02 2026-01-17
- Added Flash Attention support via ggml_flash_attn_ext()
- Added support for tied embeddings (output.weight = token_embd.weight)
- Bundled TinyStories-656K test model (749KB) for self-contained tests
0.01 Date/time
First version, released on an unsuspecting world.