NAME
Lugh::Model - GGUF Model Loading and Tensor Access
VERSION
Version 0.01
SYNOPSIS
use Lugh;
# Load a GGUF model file
my $model = Lugh::Model->new(
model => '/path/to/model.gguf'
);
# Get model information
print "Architecture: ", $model->architecture, "\n";
print "Tensors: ", $model->n_tensors, "\n";
print "Metadata keys: ", $model->n_kv, "\n";
# Access model metadata
my $n_layers = $model->get_kv('llama.block_count');
my $n_embd = $model->get_kv('llama.embedding_length');
my $vocab_size = $model->get_kv('llama.vocab_size');
# List all tensors
my @names = $model->tensor_names;
# Get tensor information
my ($type, $n_dims, @shape) = $model->tensor_info('token_embd.weight');
# List all metadata keys
my @keys = $model->kv_keys;
DESCRIPTION
Lugh::Model provides an interface for loading and inspecting GGUF model files. GGUF (GPT-Generated Unified Format) is the standard format for storing large language models, used by llama.cpp and related projects.
The model object loads the entire model into memory, including all tensors with their weights. This allows direct access to model parameters for inference.
GGUF Format
GGUF files contain:
Header - Magic number, version, tensor count, metadata count
Metadata - Key-value pairs describing the model architecture, hyperparameters, tokenizer vocabulary, and other configuration
Tensor Info - Name, dimensions, type, and offset for each tensor
Tensor Data - The actual weight data, potentially quantized
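As an illustration of this layout, the fixed-size header can be read with core Perl alone. This is a minimal sketch independent of Lugh::Model (which parses the whole file for you); it assumes a little-endian GGUF file and a perl with 64-bit integer support for the Q< unpack template:
use strict;
use warnings;
my $path = '/path/to/model.gguf';
open my $fh, '<:raw', $path or die "open $path: $!";
# Header: magic (4 bytes), version (uint32 LE),
# tensor count (uint64 LE), metadata KV count (uint64 LE)
read($fh, my $buf, 24) == 24 or die "short read on GGUF header";
my ($magic, $version, $n_tensors, $n_kv) = unpack 'a4 V Q< Q<', $buf;
die "$path is not a GGUF file" unless $magic eq 'GGUF';
print "GGUF version: $version\n";
print "Tensors:      $n_tensors\n";
print "Metadata KVs: $n_kv\n";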
Supported Quantization Types
The model loader supports all ggml quantization types, including:
F32, F16, BF16 - Full/half precision floats
Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1 - Legacy 4-8 bit block quantization
Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q8_K - K-quants (the _S/_M/_L suffixes seen in file names, e.g. Q4_K_M, are llama.cpp quantization mixes built from these tensor types, not separate types)
IQ1_S, IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S, IQ4_NL, IQ4_XS - i-quants
CONSTRUCTOR
new
my $model = Lugh::Model->new(
model => '/path/to/model.gguf'
);
Creates a new Model object by loading a GGUF file.
Parameters:
model (required) - Path to the GGUF model file. Also accepts file or path as aliases.
Returns: A Lugh::Model object.
Throws: Dies if the file cannot be loaded or is not a valid GGUF file.
Example:
my $model = Lugh::Model->new(
model => '/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf'
);
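Since new dies on failure, load errors can be trapped with a block eval (or a module such as Try::Tiny):
my $model = eval { Lugh::Model->new(model => '/path/to/model.gguf') };
die "Could not load model: $@" unless defined $model;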
METHODS
filename
my $path = $model->filename;
Returns the path to the loaded GGUF file.
architecture
my $arch = $model->architecture;
Returns the model architecture string (e.g., "llama", "gpt2", "falcon"). Returns "unknown" if the architecture is not specified in the model.
n_tensors
my $count = $model->n_tensors;
Returns the number of tensors in the model.
n_kv
my $count = $model->n_kv;
Returns the number of metadata key-value pairs in the model.
tensor_names
my @names = $model->tensor_names;
Returns a list of all tensor names in the model.
Example:
my @names = $model->tensor_names;
# Returns: ('token_embd.weight', 'blk.0.attn_norm.weight', ...)
tensor_info
my ($type, $n_dims, $ne0, $ne1, $ne2, $ne3) = $model->tensor_info($name);
Returns information about a specific tensor.
Parameters:
$name- The tensor name
Returns: A list containing:
$type - The ggml type code (0=F32, 1=F16, etc.)
$n_dims - Number of dimensions (1-4)
$ne0, $ne1, $ne2, $ne3 - Size of each dimension
Returns an empty list if the tensor is not found.
Example:
my ($type, $dims, @shape) = $model->tensor_info('token_embd.weight');
# For TinyLlama Q4_K_M: (12, 2, 2048, 32000, 1, 1)
# Type 12 = Q4_K, 2D tensor, shape [2048, 32000]
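Combining tensor_names and tensor_info gives a quick structural dump of a model. The type-name map below is a sketch covering only the common ggml type codes (see the ggml headers for the full enum):
my %type_name = (
    0  => 'F32',  1  => 'F16',  2  => 'Q4_0', 3  => 'Q4_1',
    6  => 'Q5_0', 7  => 'Q5_1', 8  => 'Q8_0', 9  => 'Q8_1',
    10 => 'Q2_K', 11 => 'Q3_K', 12 => 'Q4_K',
    13 => 'Q5_K', 14 => 'Q6_K', 15 => 'Q8_K',
);
for my $name ($model->tensor_names) {
    my ($type, $n_dims, @ne) = $model->tensor_info($name);
    my $shape = join ' x ', @ne[0 .. $n_dims - 1];
    printf "%-40s %-6s [%s]\n",
        $name, $type_name{$type} // "type=$type", $shape;
}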
kv_keys
my @keys = $model->kv_keys;
Returns a list of all metadata keys in the model.
Example:
my @keys = $model->kv_keys;
# Returns: ('general.architecture', 'llama.block_count', ...)
get_kv
my $value = $model->get_kv($key);
Returns the value of a metadata key.
Parameters:
$key- The metadata key name
Returns: The value as a scalar (string, number, or boolean), or an array reference for array values. Returns undef if the key is not found.
Example:
my $n_layers = $model->get_kv('llama.block_count'); # 22 for TinyLlama
my $n_embd = $model->get_kv('llama.embedding_length'); # 2048
my $vocab = $model->get_kv('tokenizer.ggml.tokens'); # ['<unk>', '<s>', ...]
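Together with kv_keys, this makes it easy to dump all metadata. Array values (such as the tokenizer vocabulary) come back as array references, so summarize them rather than printing thousands of tokens:
for my $key (sort $model->kv_keys) {
    my $value = $model->get_kv($key);
    $value = '[' . scalar(@$value) . ' items]' if ref $value eq 'ARRAY';
    print "$key = ", $value // '(undef)', "\n";
}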
COMMON METADATA KEYS
General
general.architecture - Model architecture (e.g., "llama")
general.name - Model name
general.quantization_version - Quantization format version
Architecture-specific (llama)
llama.block_count - Number of transformer layers
llama.embedding_length - Hidden dimension (n_embd)
llama.attention.head_count - Number of attention heads
llama.attention.head_count_kv - Number of KV heads (for GQA)
llama.attention.layer_norm_rms_epsilon - RMSNorm epsilon
llama.context_length - Maximum context length
llama.feed_forward_length - FFN intermediate dimension
llama.vocab_size - Vocabulary size
llama.rope.dimension_count - RoPE rotation dimensions
llama.rope.freq_base - RoPE frequency base (10000 for llama)
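Architecture-specific keys are prefixed with the architecture name, so code that should work across model families can build keys from architecture instead of hard-coding "llama":
my $arch    = $model->architecture;
my $n_layer = $model->get_kv("$arch.block_count");
my $n_embd  = $model->get_kv("$arch.embedding_length");
my $n_head  = $model->get_kv("$arch.attention.head_count");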
Tokenizer
tokenizer.ggml.model - Tokenizer type (e.g., "llama", "gpt2")
tokenizer.ggml.tokens - Vocabulary tokens (array)
tokenizer.ggml.scores - Token scores (array)
tokenizer.ggml.token_type - Token types (array)
tokenizer.ggml.bos_token_id - Beginning of sequence token ID
tokenizer.ggml.eos_token_id - End of sequence token ID
tokenizer.ggml.unknown_token_id - Unknown token ID
tokenizer.ggml.padding_token_id - Padding token ID
TENSOR NAMING CONVENTION
Tensor names follow a standard convention:
token_embd.weight - Token embedding matrix [n_embd, n_vocab]
output.weight - Output projection [n_vocab, n_embd]
output_norm.weight - Final layer norm
blk.N.attn_norm.weight - Attention layer norm for layer N
blk.N.attn_q.weight - Query projection for layer N
blk.N.attn_k.weight - Key projection for layer N
blk.N.attn_v.weight - Value projection for layer N
blk.N.attn_output.weight - Attention output projection
blk.N.ffn_norm.weight - FFN layer norm
blk.N.ffn_gate.weight - FFN gate projection (SwiGLU)
blk.N.ffn_up.weight - FFN up projection
blk.N.ffn_down.weight - FFN down projection
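Per-layer names can be generated from the block count, which is handy for walking a model layer by layer. A small sketch that checks the attention projections exist in every layer:
my $n_layers = $model->get_kv($model->architecture . '.block_count');
for my $i (0 .. $n_layers - 1) {
    for my $part (qw(attn_q attn_k attn_v attn_output)) {
        my @info = $model->tensor_info("blk.$i.$part.weight");
        warn "layer $i: missing blk.$i.$part.weight\n" unless @info;
    }
}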
THREAD SAFETY
Lugh::Model objects are NOT thread-safe. Each Perl thread must create its own Model object. The XS layer guards its global model registry with a mutex, but individual model contexts must not be shared across threads.
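With Perl ithreads this means loading the model inside each thread rather than passing one object around. A minimal sketch (note that every thread pays the full memory cost of its own copy):
use threads;
my @workers = map {
    threads->create(sub {
        # Each thread loads its own model; never share a Model object.
        my $m = Lugh::Model->new(model => '/path/to/model.gguf');
        return $m->n_tensors;
    });
} 1 .. 2;
print $_->join, "\n" for @workers;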
MEMORY USAGE
Loading a model allocates memory for all tensors. Memory usage depends on the quantization:
Model Size     Q4_K_M    Q8_0      F16
1.1B params    0.6 GB    1.1 GB    2.2 GB
7B params      4.0 GB    7.0 GB    14 GB
13B params     7.4 GB    13 GB     26 GB
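These figures follow from bits-per-weight. A back-of-envelope estimator, using approximate bits-per-weight values (block scales push Q4_K_M to roughly 4.8 bpw and Q8_0 to 8.5 bpw; exact sizes depend on the tensor mix):
my %bpw = (Q4_K_M => 4.8, Q8_0 => 8.5, F16 => 16);
sub estimate_gb {
    my ($n_params, $quant) = @_;
    return $n_params * $bpw{$quant} / 8 / 1e9;
}
printf "7B at Q4_K_M: ~%.1f GB\n", estimate_gb(7e9, 'Q4_K_M');  # ~4.2 GB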
The memory is freed when the Model object goes out of scope.
SEE ALSO
Lugh, Lugh::Tokenizer, Lugh::Inference
https://github.com/ggerganov/ggml/blob/master/docs/gguf.md - GGUF specification
AUTHOR
lnation <email@lnation.org>
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.