Changes for version 0.02 - 2026-01-17

  • Added Flash Attention support via ggml_flash_attn_ext()
  • Added support for tied embeddings (output.weight = token_embd.weight)
  • Bundled TinyStories-656K test model (749 KB) for self-contained tests
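
The tied-embeddings entry above can be pictured as a name lookup with a fallback: if a model file has no output.weight tensor, the output projection reuses token_embd.weight. The sketch below is a minimal illustration of that idea only, not this distribution's code. The named_tensor struct and both helper functions are hypothetical; only the two tensor names come from the changelog.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a loaded tensor: a name plus a pointer to its data.
 * A real loader would carry shape and type information as well. */
typedef struct {
    const char  *name;
    const float *data;
} named_tensor;

/* Linear search by name; returns NULL when no tensor has that name. */
static const named_tensor *find_tensor(const named_tensor *tensors,
                                       size_t count, const char *name) {
    assert(name != NULL);
    for (size_t i = 0; i < count; i++) {
        if (strcmp(tensors[i].name, name) == 0) {
            return &tensors[i];
        }
    }
    return NULL;
}

/* Resolve the output projection weights.
 * Prefer a dedicated output.weight; if the model omits it (assumed here to
 * signal tied embeddings), fall back to token_embd.weight. */
static const named_tensor *resolve_output_weight(const named_tensor *tensors,
                                                 size_t count) {
    const named_tensor *out = find_tensor(tensors, count, "output.weight");
    if (out != NULL) {
        return out;
    }
    return find_tensor(tensors, count, "token_embd.weight");
}
```

With tied weights the same matrix serves both the input embedding lookup and the final logits projection, which is one reason a very small model such as the bundled test model can leave out a separate output tensor.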

Modules

Pure C LLM Inference Engine for Perl (built on ggml)
Memory Context for Tensor Allocation
Computation Graph for Tensor Operations
Transformer Forward Pass and Token Generation
GGUF Model Loading and Tensor Access
Tensor Operations for Neural Network Computation
N-Dimensional Tensor with ggml Backend
BPE Tokenizer for Text Encoding and Decoding