NAME
Lugh::Quant - Quantization utilities for Lugh tensors
SYNOPSIS
use Lugh;
use Lugh::Quant;
# Type constants
my $type = Lugh::Quant::Q4_K; # 12
# Or import them
use Lugh::Quant qw(Q4_K Q8_0 type_info);
# Type information
say Lugh::Quant::type_name($type); # "q4_K"
say Lugh::Quant::type_size($type); # bytes per block
say Lugh::Quant::blck_size($type); # elements per block
say Lugh::Quant::is_quantized($type); # 1
# Get all available types
my @all_types = Lugh::Quant::all_types();
my @quant_types = Lugh::Quant::all_quantized_types();
# Detailed type info
my $info = Lugh::Quant::type_info(Lugh::Quant::Q4_K);
# { type => 12, name => "q4_K", size => 144, blck_size => 256, ... }
# Quantize/dequantize are OO methods on Lugh::Tensor
my $ctx = Lugh::Context->new(mem_size => 1024 * 1024);
my $f32_tensor = Lugh::Tensor->new_f32($ctx, 256);
# Quantize F32 to Q4_K
my $q4_tensor = $f32_tensor->quantize($ctx, Lugh::Quant::Q4_K);
# Dequantize back to F32
my $restored = $q4_tensor->dequantize($ctx);
DESCRIPTION
Lugh::Quant provides type constants and utilities for working with quantized tensors in ggml. Quantization reduces model memory usage and can improve inference speed while maintaining acceptable accuracy.
The actual quantize() and dequantize() operations are OO methods on Lugh::Tensor - this module provides the type constants and introspection functions.
What is Quantization?
Quantization converts floating-point weights (typically F32 or F16) to lower precision formats. Instead of using 4 bytes per weight (F32), quantized models might use 4 bits (0.5 bytes) or less per weight.
Key trade-offs:
Memory - Quantized models use 4-8x less memory
Speed - Can be faster due to reduced memory bandwidth
Accuracy - Small quality loss, usually imperceptible
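For example, the per-weight cost of each format can be computed directly from the type metadata. A minimal sketch using the introspection functions documented below (assuming the F32 constant is importable like Q4_K):
use Lugh::Quant qw(F32 Q4_K);
# Effective bytes per weight = block size in bytes / elements per block
my $f32_bpw = Lugh::Quant::type_sizef(F32);  # 4.0 bytes/weight
my $q4k_bpw = Lugh::Quant::type_sizef(Q4_K); # 0.5625 bytes/weight (144 / 256)
printf "Q4_K is %.1fx smaller than F32\n", $f32_bpw / $q4k_bpw; # ~7.1x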
TYPE CONSTANTS
All GGML data types are exposed as constants:
Float Types
F32 - 32-bit floating point (4 bytes)
F16 - 16-bit floating point (2 bytes)
BF16 - Brain float 16 (2 bytes)
F64 - 64-bit floating point (8 bytes)
Integer Types
I8 - 8-bit signed integer
I16 - 16-bit signed integer
I32 - 32-bit signed integer
I64 - 64-bit signed integer
Basic Quantization (Legacy)
Q4_0 - 4-bit quantization, simple
Q4_1 - 4-bit quantization with offset
Q5_0 - 5-bit quantization
Q5_1 - 5-bit quantization with offset
Q8_0 - 8-bit quantization
Q8_1 - 8-bit quantization with offset
K-Quant Types (Recommended)
K-quant types provide better quality than legacy types at similar sizes:
Q2_K - 2-bit K-quant (~2.5 bits/weight)
Q3_K - 3-bit K-quant (~3.4 bits/weight)
Q4_K - 4-bit K-quant (~4.5 bits/weight) (most popular)
Q5_K - 5-bit K-quant (~5.5 bits/weight)
Q6_K - 6-bit K-quant (~6.5 bits/weight)
Q8_K - 8-bit K-quant
IQ Types (Importance Matrix)
IQ types use importance matrices for optimal quantization at very low bitrates:
IQ1_S - 1-bit importance quant
IQ1_M - 1-bit importance quant (mixed)
IQ2_XXS - 2-bit importance quant (extra extra small)
IQ2_XS - 2-bit importance quant (extra small)
IQ2_S - 2-bit importance quant (small)
IQ3_XXS - 3-bit importance quant (extra extra small)
IQ3_S - 3-bit importance quant (small)
IQ4_NL - 4-bit importance quant (non-linear)
IQ4_XS - 4-bit importance quant (extra small)
Experimental Types
TQ1_0 - Ternary quantization (1.6 bits/weight)
TQ2_0 - Ternary quantization variant
MXFP4 - Microscaling FP4 format
FUNCTIONS
type_name
my $name = Lugh::Quant::type_name($type);
Returns the string name of a type (e.g., "q4_K", "f32").
type_size
my $bytes = Lugh::Quant::type_size($type);
Returns the size in bytes of one block of this type.
blck_size
my $elements = Lugh::Quant::blck_size($type);
Returns the number of elements in one block. For quantized types this is typically 32 or 256.
type_sizef
my $bytes_per_element = Lugh::Quant::type_sizef($type);
Returns the effective size per element as a float (type_size / blck_size), i.e., bytes per weight. Multiply by 8 to get bits per weight.
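For example:
# Bits per weight = bytes per element * 8
my $bpw = Lugh::Quant::type_sizef(Lugh::Quant::Q4_K) * 8; # 4.5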
is_quantized
my $bool = Lugh::Quant::is_quantized($type);
Returns true if the type is a quantized format (i.e., not F32/F16/I32, etc.).
requires_imatrix
my $bool = Lugh::Quant::requires_imatrix($type);
Returns true if the type requires an importance matrix for optimal quantization (IQ types).
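Combined with all_quantized_types() (below), this makes it easy to list every type that wants an imatrix:
# List all quantized types that benefit from an importance matrix
my @imatrix_types = grep { Lugh::Quant::requires_imatrix($_) }
    Lugh::Quant::all_quantized_types();
say Lugh::Quant::type_name($_) for @imatrix_types;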
row_size
my $bytes = Lugh::Quant::row_size($type, $n_elements);
Returns the number of bytes needed to store a row of $n_elements.
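For block-quantized types this works out to (n_elements / blck_size) * type_size, so $n_elements should be a multiple of the block size. A quick check for Q4_K:
use Lugh::Quant qw(Q4_K);
# (512 elements / 256 per block) * 144 bytes per block = 288 bytes
my $bytes = Lugh::Quant::row_size(Q4_K, 512);
printf "%d bytes\n", $bytes; # 288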
type_count
my $count = Lugh::Quant::type_count();
Returns the total number of defined types (including removed types).
all_types
my @types = Lugh::Quant::all_types();
Returns a list of all valid type IDs.
all_quantized_types
my @types = Lugh::Quant::all_quantized_types();
Returns a list of all quantized type IDs.
type_from_name
my $type = Lugh::Quant::type_from_name("q4_K");
Looks up a type by name. Returns -1 if not found.
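Since the return value is -1 for unknown names, check it before use:
my $type = Lugh::Quant::type_from_name("q5_K");
die "unknown quantization type\n" if $type < 0;
say Lugh::Quant::type_name($type); # "q5_K"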
type_info
my $info = Lugh::Quant::type_info($type);
# Returns hashref:
# {
#     type             => 12,
#     name             => "q4_K",
#     size             => 144,
#     blck_size        => 256,
#     sizef            => 0.5625,
#     is_quantized     => 1,
#     requires_imatrix => 0
# }
Returns comprehensive information about a type as a hashref.
TENSOR METHODS
These methods are available on Lugh::Tensor objects:
Type Inspection
my $tensor = ...; # from model or created
say $tensor->type(); # numeric type ID
say $tensor->type_name(); # "q4_K", "f32", etc.
say $tensor->type_size(); # bytes per block
say $tensor->blck_size(); # elements per block
say $tensor->is_quantized(); # 1 or 0
say $tensor->nbytes(); # total bytes
quantize
my $quantized = $f32_tensor->quantize($ctx, $dest_type);
Quantizes an F32 tensor to the specified quantized type. Returns a new tensor. The source tensor must be F32.
use Lugh::Quant qw(Q4_K);
my $q4 = $tensor->quantize($ctx, Q4_K);
Note: For best results with IQ types, an importance matrix should be used. This method uses NULL for the importance matrix, which works but may not give optimal quality for IQ types.
dequantize
my $f32_tensor = $quantized_tensor->dequantize($ctx);
Dequantizes a quantized tensor back to F32. Also works with F16 and BF16. Returns a new F32 tensor.
my $restored = $q4_tensor->dequantize($ctx);
QUANTIZATION GUIDE
Choosing a Quantization Type
For most users, we recommend:
Q4_K - Best balance of size and quality (default choice)
Q5_K - Slightly better quality, ~10% larger
Q8_0 - Best quality, ~2x the size of Q4_K
Q2_K or IQ2_XS - Extreme compression, with noticeable quality loss
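One way to apply these recommendations is to resolve a user-supplied name with type_from_name() and fall back to Q4_K. A minimal sketch (the helper and the LUGH_QUANT environment variable are illustrative, not part of the API):
use Lugh::Quant qw(Q4_K);
# Hypothetical helper: resolve a quantization type by name,
# defaulting to the recommended Q4_K when the name is missing or unknown.
sub pick_quant_type {
    my ($name) = @_;
    my $type = Lugh::Quant::type_from_name($name // '');
    return $type >= 0 ? $type : Q4_K;
}
my $type = pick_quant_type($ENV{LUGH_QUANT});
printf "Using %s\n", Lugh::Quant::type_name($type);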
Memory Comparison
For a 7B parameter model:
F32: 28.0 GB (unquantized)
F16: 14.0 GB
Q8_0: 7.0 GB
Q6_K: 5.5 GB
Q5_K: 4.8 GB
Q4_K: 4.0 GB
Q3_K: 3.4 GB
Q2_K: 2.7 GB
IQ2_XS: 2.1 GB
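These figures are roughly parameter count times bytes per weight; real GGUF files add metadata and typically keep some tensors (e.g., embeddings and output) at higher precision. A back-of-the-envelope check for Q4_K:
use Lugh::Quant qw(Q4_K);
# 7B parameters at 0.5625 bytes/weight (4.5 bits)
my $params = 7_000_000_000;
printf "~%.1f GB\n", $params * Lugh::Quant::type_sizef(Q4_K) / 1e9; # ~3.9 GB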
Quality vs Size Trade-off
Quality, from lowest to highest (by perplexity), with approximate size in bits/weight in parentheses:
IQ1 (~1.5) < Q2 (~2.5) < IQ2 (~2.3) < Q3 (~3.4) < Q4 (~4.5) < Q5 (~5.5) < Q6 (~6.5) < Q8 (~8.0) < F16 (16) < F32 (32)
Note that IQ2 beats Q2 on both axes: better quality at a smaller size, thanks to its importance matrix.
EXAMPLES
Inspect Model Quantization
use Lugh;
use Lugh::Quant;
my $model = Lugh::Model->new(model => 'model.gguf');
my $inf = Lugh::Inference->new(model => $model);
# Get tensor info
my @tensors = $model->tensor_names();
for my $name (@tensors) {
    my $tensor = $inf->get_tensor($name);
    printf "%s: %s (%d elements, %d bytes)\n",
        $name,
        $tensor->type_name(),
        $tensor->nelements(),
        $tensor->nbytes();
}
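A small extension tallies bytes by storage type, which shows how much of the model each format accounts for (assuming the same model and inference objects as above):
my %bytes_by_type;
for my $name ($model->tensor_names()) {
    my $t = $inf->get_tensor($name);
    $bytes_by_type{ $t->type_name() } += $t->nbytes();
}
printf "%-8s %12d bytes\n", $_, $bytes_by_type{$_}
    for sort keys %bytes_by_type;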
List All Quantized Types
use Lugh::Quant;
for my $type (Lugh::Quant::all_quantized_types()) {
    my $info = Lugh::Quant::type_info($type);
    printf "%-10s: %.2f bits/weight%s\n",
        $info->{name},
        $info->{sizef} * 8,
        $info->{requires_imatrix} ? " (needs imatrix)" : "";
}
Manual Quantization
use Lugh;
use Lugh::Quant qw(Q4_K);
my $ctx = Lugh::Context->new(mem_size => 10 * 1024 * 1024);
# Create F32 tensor with random data
my $f32 = Lugh::Tensor->new_f32($ctx, 256, 256); # 256x256 matrix
# ... fill with data ...
# Quantize to Q4_K (OO style)
my $q4k = $f32->quantize($ctx, Q4_K);
printf "F32: %d bytes\n", $f32->nbytes(); # 262144 bytes
printf "Q4_K: %d bytes\n", $q4k->nbytes(); # ~147456 bytes
# Dequantize back
my $restored = $q4k->dequantize($ctx);
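The compression ratio follows directly from the two nbytes() values:
printf "Compression: %.2fx\n", $f32->nbytes() / $q4k->nbytes(); # ~7.11x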
SEE ALSO
Lugh, Lugh::Model, Lugh::Tensor
AUTHOR
Robert Acock
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.