NAME
Lugh::Model - GGUF Model Loading and Tensor Access
VERSION
Version 0.01
SYNOPSIS
use Lugh;
# Load a GGUF model file
my $model = Lugh::Model->new(
model => '/path/to/model.gguf'
);
# Get model information
print "Architecture: ", $model->architecture, "\n";
print "Tensors: ", $model->n_tensors, "\n";
print "Metadata keys: ", $model->n_kv, "\n";
# Access model metadata
my $n_layers = $model->get_kv('llama.block_count');
my $n_embd = $model->get_kv('llama.embedding_length');
my $vocab_size = $model->get_kv('llama.vocab_size');
# List all tensors
my @names = $model->tensor_names;
# Get tensor information
my ($type, $n_dims, @shape) = $model->tensor_info('token_embd.weight');
# List all metadata keys
my @keys = $model->kv_keys;
DESCRIPTION
Lugh::Model provides an interface for loading and inspecting GGUF model files. GGUF (GPT-Generated Unified Format) is the standard format for storing large language models, used by llama.cpp and related projects.
The model object loads the entire model into memory, including all tensors with their weights. This allows direct access to model parameters for inference.
GGUF Format
GGUF files contain:
Header - Magic number, version, tensor count, metadata count
Metadata - Key-value pairs describing the model architecture, hyperparameters, tokenizer vocabulary, and other configuration
Tensor Info - Name, dimensions, type, and offset for each tensor
Tensor Data - The actual weight data, potentially quantized
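As an illustration of this layout, the fixed-size header can be read with core Perl alone. This is a minimal sketch independent of Lugh::Model (which parses the whole file for you); it assumes a little-endian GGUF file and a perl with 64-bit integer support for the Q< unpack template:
use strict;
use warnings;
my $path = '/path/to/model.gguf';
open my $fh, '<:raw', $path or die "open $path: $!";
# Header: magic (4 bytes), version (uint32 LE),
# tensor count (uint64 LE), metadata KV count (uint64 LE)
read($fh, my $buf, 24) == 24 or die "short read on GGUF header";
my ($magic, $version, $n_tensors, $n_kv) = unpack 'a4 V Q< Q<', $buf;
die "$path is not a GGUF file" unless $magic eq 'GGUF';
print "GGUF version: $version\n";
print "Tensors:      $n_tensors\n";
print "Metadata KVs: $n_kv\n";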
Supported Quantization Types
The model loader supports all ggml quantization types, including:
F32, F16, BF16 - Full/half precision floats
Q4_0, Q4_1, Q5_0, Q5_1, Q8_0, Q8_1 - Legacy 4-8 bit block quantization
Q2_K, Q3_K, Q4_K, Q5_K, Q6_K, Q8_K - K-quants (the _S/_M/_L suffixes seen in file names, e.g. Q4_K_M, are llama.cpp quantization mixes built from these tensor types, not separate types)
IQ1_S, IQ2_XXS, IQ2_XS, IQ2_S, IQ3_XXS, IQ3_S, IQ4_NL, IQ4_XS - i-quants
CONSTRUCTOR
new
my $model = Lugh::Model->new(
model => '/path/to/model.gguf'
);
Creates a new Model object by loading a GGUF file.
Parameters:
model (required) - Path to the GGUF model file. Also accepts file or path as aliases.
Returns: A Lugh::Model object.
Throws: Dies if the file cannot be loaded or is not a valid GGUF file.
Example:
my $model = Lugh::Model->new(
model => '/models/tinyllama-1.1b-chat-v1.0.Q4_K_M.gguf'
);
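Since new dies on failure, load errors can be trapped with a block eval (or a module such as Try::Tiny):
my $model = eval { Lugh::Model->new(model => '/path/to/model.gguf') };
die "Could not load model: $@" unless defined $model;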
METHODS
filename
my $path = $model->filename;
Returns the path to the loaded GGUF file.
architecture
my $arch = $model->architecture;
Returns the model architecture string (e.g., "llama", "gpt2", "falcon"). Returns "unknown" if the architecture is not specified in the model.
n_tensors
my $count = $model->n_tensors;
Returns the number of tensors in the model.
n_kv
my $count = $model->n_kv;
Returns the number of metadata key-value pairs in the model.
tensor_names
my @names = $model->tensor_names;
Returns a list of all tensor names in the model.
Example:
my @names = $model->tensor_names;
# Returns: ('token_embd.weight', 'blk.0.attn_norm.weight', ...)
tensor_info
my ($type, $n_dims, $ne0, $ne1, $ne2, $ne3) = $model->tensor_info($name);
Returns information about a specific tensor.
Parameters:
$name- The tensor name
Returns: A list containing:
$type - The ggml type code (0=F32, 1=F16, etc.)
$n_dims - Number of dimensions (1-4)
$ne0, $ne1, $ne2, $ne3 - Size of each dimension
Returns an empty list if the tensor is not found.
Example:
my ($type, $dims, @shape) = $model->tensor_info('token_embd.weight');
# For TinyLlama Q4_K_M: (12, 2, 2048, 32000, 1, 1)
# Type 12 = Q4_K, 2D tensor, shape [2048, 32000]
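Combining tensor_names and tensor_info gives a quick structural dump of a model. The type-name map below is a sketch covering only the common ggml type codes (see the ggml headers for the full enum):
my %type_name = (
    0  => 'F32',  1  => 'F16',  2  => 'Q4_0', 3  => 'Q4_1',
    6  => 'Q5_0', 7  => 'Q5_1', 8  => 'Q8_0', 9  => 'Q8_1',
    10 => 'Q2_K', 11 => 'Q3_K', 12 => 'Q4_K',
    13 => 'Q5_K', 14 => 'Q6_K', 15 => 'Q8_K',
);
for my $name ($model->tensor_names) {
    my ($type, $n_dims, @ne) = $model->tensor_info($name);
    my $shape = join ' x ', @ne[0 .. $n_dims - 1];
    printf "%-40s %-6s [%s]\n",
        $name, $type_name{$type} // "type=$type", $shape;
}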
kv_keys
my @keys = $model->kv_keys;
Returns a list of all metadata keys in the model.
Example:
my @keys = $model->kv_keys;
# Returns: ('general.architecture', 'llama.block_count', ...)
get_kv
my $value = $model->get_kv($key);
Returns the value of a metadata key.
Parameters:
$key- The metadata key name
Returns: The value as a scalar (string, number, or boolean), or an array reference for array values. Returns undef if the key is not found.
Example:
my $n_layers = $model->get_kv('llama.block_count'); # 22 for TinyLlama
my $n_embd = $model->get_kv('llama.embedding_length'); # 2048
my $vocab = $model->get_kv('tokenizer.ggml.tokens'); # ['<unk>', '<s>', ...]
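Together with kv_keys, this makes it easy to dump all metadata. Array values (such as the tokenizer vocabulary) come back as array references, so summarize them rather than printing thousands of tokens:
for my $key (sort $model->kv_keys) {
    my $value = $model->get_kv($key);
    $value = '[' . scalar(@$value) . ' items]' if ref $value eq 'ARRAY';
    print "$key = ", $value // '(undef)', "\n";
}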
COMMON METADATA KEYS
General
general.architecture - Model architecture (e.g., "llama")
general.name - Model name
general.quantization_version - Quantization format version
Architecture-specific (llama)
llama.block_count - Number of transformer layers
llama.embedding_length - Hidden dimension (n_embd)
llama.attention.head_count - Number of attention heads
llama.attention.head_count_kv - Number of KV heads (for GQA)
llama.attention.layer_norm_rms_epsilon - RMSNorm epsilon
llama.context_length - Maximum context length
llama.feed_forward_length - FFN intermediate dimension
llama.vocab_size - Vocabulary size
llama.rope.dimension_count - RoPE rotation dimensions
llama.rope.freq_base - RoPE frequency base (10000 for llama)
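Architecture-specific keys are prefixed with the architecture name, so code that should work across model families can build keys from architecture instead of hard-coding "llama":
my $arch    = $model->architecture;
my $n_layer = $model->get_kv("$arch.block_count");
my $n_embd  = $model->get_kv("$arch.embedding_length");
my $n_head  = $model->get_kv("$arch.attention.head_count");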
Tokenizer
tokenizer.ggml.model - Tokenizer type (e.g., "llama", "gpt2")
tokenizer.ggml.tokens - Vocabulary tokens (array)
tokenizer.ggml.scores - Token scores (array)
tokenizer.ggml.token_type - Token types (array)
tokenizer.ggml.bos_token_id - Beginning of sequence token ID
tokenizer.ggml.eos_token_id - End of sequence token ID
tokenizer.ggml.unknown_token_id - Unknown token ID
tokenizer.ggml.padding_token_id - Padding token ID
TENSOR NAMING CONVENTION
Tensor names follow a standard convention:
token_embd.weight - Token embedding matrix [n_embd, n_vocab]
output.weight - Output projection [n_vocab, n_embd]
output_norm.weight - Final layer norm
blk.N.attn_norm.weight - Attention layer norm for layer N
blk.N.attn_q.weight - Query projection for layer N
blk.N.attn_k.weight - Key projection for layer N
blk.N.attn_v.weight - Value projection for layer N
blk.N.attn_output.weight - Attention output projection
blk.N.ffn_norm.weight - FFN layer norm
blk.N.ffn_gate.weight - FFN gate projection (SwiGLU)
blk.N.ffn_up.weight - FFN up projection
blk.N.ffn_down.weight - FFN down projection
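Per-layer names can be generated from the block count, which is handy for walking a model layer by layer. A small sketch that checks the attention projections exist in every layer:
my $n_layers = $model->get_kv($model->architecture . '.block_count');
for my $i (0 .. $n_layers - 1) {
    for my $part (qw(attn_q attn_k attn_v attn_output)) {
        my @info = $model->tensor_info("blk.$i.$part.weight");
        warn "layer $i: missing blk.$i.$part.weight\n" unless @info;
    }
}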
THREAD SAFETY
Lugh::Model objects are NOT thread-safe. Each Perl thread must create its own Model object. The XS layer guards its global model registry with a mutex, but individual model contexts must not be shared across threads.
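With Perl ithreads this means loading the model inside each thread rather than passing one object around. A minimal sketch (note that every thread pays the full memory cost of its own copy):
use threads;
my @workers = map {
    threads->create(sub {
        # Each thread loads its own model; never share a Model object.
        my $m = Lugh::Model->new(model => '/path/to/model.gguf');
        return $m->n_tensors;
    });
} 1 .. 2;
print $_->join, "\n" for @workers;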
MEMORY USAGE
Loading a model allocates memory for all tensors. Memory usage depends on the quantization:
Model Size     Q4_K_M    Q8_0      F16
1.1B params    0.6 GB    1.1 GB    2.2 GB
7B params      4.0 GB    7.0 GB    14 GB
13B params     7.4 GB    13 GB     26 GB
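These figures follow from bits-per-weight. A back-of-envelope estimator, using approximate bits-per-weight values (block scales push Q4_K_M to roughly 4.8 bpw and Q8_0 to 8.5 bpw; exact sizes depend on the tensor mix):
my %bpw = (Q4_K_M => 4.8, Q8_0 => 8.5, F16 => 16);
sub estimate_gb {
    my ($n_params, $quant) = @_;
    return $n_params * $bpw{$quant} / 8 / 1e9;
}
printf "7B at Q4_K_M: ~%.1f GB\n", estimate_gb(7e9, 'Q4_K_M');  # ~4.2 GB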
The memory is freed when the Model object goes out of scope.
SEE ALSO
Lugh, Lugh::Tokenizer, Lugh::Inference
https://github.com/ggerganov/ggml/blob/master/docs/gguf.md - GGUF specification
AUTHOR
lnation <email@lnation.org>
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.