NAME

Lugh::Quant - Quantization utilities for Lugh tensors

SYNOPSIS

use Lugh;
use Lugh::Quant;

# Type constants
my $type = Lugh::Quant::Q4_K;  # 12

# Or import them
use Lugh::Quant qw(Q4_K Q8_0 type_info);

# Type information
say Lugh::Quant::type_name($type);     # "q4_K"
say Lugh::Quant::type_size($type);     # bytes per block
say Lugh::Quant::blck_size($type);     # elements per block
say Lugh::Quant::is_quantized($type);  # 1

# Get all available types
my @all_types = Lugh::Quant::all_types();
my @quant_types = Lugh::Quant::all_quantized_types();

# Detailed type info
my $info = Lugh::Quant::type_info(Lugh::Quant::Q4_K);
# { type => 12, name => "q4_K", size => 144, blck_size => 256, ... }

# Quantize/dequantize are OO methods on Lugh::Tensor
my $ctx = Lugh::Context->new(mem_size => 1024 * 1024);
my $f32_tensor = Lugh::Tensor->new_f32($ctx, 256);

# Quantize F32 to Q4_K
my $q4_tensor = $f32_tensor->quantize($ctx, Lugh::Quant::Q4_K);

# Dequantize back to F32
my $restored = $q4_tensor->dequantize($ctx);

DESCRIPTION

Lugh::Quant provides type constants and utilities for working with quantized tensors in ggml. Quantization reduces model memory usage and can improve inference speed while maintaining acceptable accuracy.

The actual quantize() and dequantize() operations are OO methods on Lugh::Tensor; this module provides the type constants and introspection functions.

What is Quantization?

Quantization converts floating-point weights (typically F32 or F16) to lower precision formats. Instead of using 4 bytes per weight (F32), quantized models might use 4 bits (0.5 bytes) or less per weight.

Key trade-offs:

  • Memory - Quantized models use 4-8x less memory

  • Speed - Can be faster due to reduced memory bandwidth

  • Accuracy - Small quality loss, usually imperceptible
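To see what these trade-offs mean in bytes, here is a stand-alone back-of-the-envelope calculation in plain Perl (no Lugh required). It uses the Q4_K block geometry documented later in this page (144-byte blocks of 256 elements); real model files mix types per tensor, so treat it as an estimate, not an exact file size.

```perl
use strict;
use warnings;

# One 4096x4096 weight matrix, stored as F32 vs Q4_K.
my $n_elements = 4096 * 4096;

my $f32_bytes = $n_elements * 4;          # 4 bytes per F32 element
my $q4k_bytes = $n_elements / 256 * 144;  # 144 bytes per 256-element Q4_K block

printf "F32:  %d bytes (%.0f MiB)\n", $f32_bytes, $f32_bytes / 2**20;
printf "Q4_K: %d bytes (%.0f MiB)\n", $q4k_bytes, $q4k_bytes / 2**20;
printf "Q4_K is %.1fx smaller\n", $f32_bytes / $q4k_bytes;
```

That is 64 MiB down to 9 MiB for a single matrix, roughly the 7x figure you see quoted for Q4_K models.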

TYPE CONSTANTS

All GGML data types are exposed as constants:

Float Types

  • F32 - 32-bit floating point (4 bytes)

  • F16 - 16-bit floating point (2 bytes)

  • BF16 - Brain float 16 (2 bytes)

  • F64 - 64-bit floating point (8 bytes)

Integer Types

  • I8 - 8-bit signed integer

  • I16 - 16-bit signed integer

  • I32 - 32-bit signed integer

  • I64 - 64-bit signed integer

Basic Quantization (Legacy)

  • Q4_0 - 4-bit quantization, simple

  • Q4_1 - 4-bit quantization with offset

  • Q5_0 - 5-bit quantization

  • Q5_1 - 5-bit quantization with offset

  • Q8_0 - 8-bit quantization

  • Q8_1 - 8-bit quantization with offset

K-Quant Types (Recommended)

K-quant types provide better quality than legacy types at similar sizes:

  • Q2_K - 2-bit K-quant (~2.5 bits/weight)

  • Q3_K - 3-bit K-quant (~3.4 bits/weight)

  • Q4_K - 4-bit K-quant (~4.5 bits/weight) (Most Popular)

  • Q5_K - 5-bit K-quant (~5.5 bits/weight)

  • Q6_K - 6-bit K-quant (~6.5 bits/weight)

  • Q8_K - 8-bit K-quant

IQ Types (Importance Matrix)

IQ types use importance matrices for optimal quantization at very low bitrates:

  • IQ1_S - 1-bit importance quant

  • IQ1_M - 1-bit importance quant (mixed)

  • IQ2_XXS - 2-bit importance quant (extra extra small)

  • IQ2_XS - 2-bit importance quant (extra small)

  • IQ2_S - 2-bit importance quant (small)

  • IQ3_XXS - 3-bit importance quant (extra extra small)

  • IQ3_S - 3-bit importance quant (small)

  • IQ4_NL - 4-bit importance quant (non-linear)

  • IQ4_XS - 4-bit importance quant (extra small)

Experimental Types

  • TQ1_0 - Ternary quantization (1.6 bits/weight)

  • TQ2_0 - Ternary quantization variant

  • MXFP4 - Microscaling FP4 format

FUNCTIONS

type_name

my $name = Lugh::Quant::type_name($type);

Returns the string name of a type (e.g., "q4_K", "f32").

type_size

my $bytes = Lugh::Quant::type_size($type);

Returns the size in bytes of one block of this type.

blck_size

my $elements = Lugh::Quant::blck_size($type);

Returns the number of elements in one block. For quantized types this is typically 32 or 256.

type_sizef

my $bytes_per_element = Lugh::Quant::type_sizef($type);

Returns the effective size per element, in bytes, as a float (type_size / blck_size). Multiply by 8 to get the effective bits per weight.
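For example, plugging in the Q4_K figures used elsewhere in this page (144-byte blocks of 256 elements) gives 0.5625 bytes per element, i.e. 4.5 bits per weight:

```perl
use strict;
use warnings;

# type_sizef is type_size / blck_size; Q4_K figures as documented: 144 / 256.
my $sizef = 144 / 256;   # 0.5625 bytes per element
my $bits  = $sizef * 8;  # 4.5 bits per weight

printf "%.4f bytes/element = %.1f bits/weight\n", $sizef, $bits;
```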

is_quantized

my $bool = Lugh::Quant::is_quantized($type);

Returns true if the type is a quantized format (not F32/F16/I32/etc).

requires_imatrix

my $bool = Lugh::Quant::requires_imatrix($type);

Returns true if the type requires an importance matrix for optimal quantization (IQ types).

row_size

my $bytes = Lugh::Quant::row_size($type, $n_elements);

Returns the number of bytes needed to store a row of $n_elements.
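For a type whose block size evenly divides $n_elements, this works out to ($n_elements / blck_size) * type_size. A sketch of that arithmetic for Q4_K, using the documented block constants; some types may add padding, so prefer calling row_size() rather than computing it by hand:

```perl
use strict;
use warnings;

# Row of 4096 elements stored as Q4_K: 144-byte blocks of 256 elements.
my ($type_size, $blck_size) = (144, 256);
my $n_elements = 4096;

my $n_blocks  = $n_elements / $blck_size;  # 16 blocks
my $row_bytes = $n_blocks * $type_size;    # 2304 bytes

printf "a %d-element Q4_K row takes %d bytes\n", $n_elements, $row_bytes;
```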

type_count

my $count = Lugh::Quant::type_count();

Returns the total number of defined types (including removed types).

all_types

my @types = Lugh::Quant::all_types();

Returns a list of all valid type IDs.

all_quantized_types

my @types = Lugh::Quant::all_quantized_types();

Returns a list of all quantized type IDs.

type_from_name

my $type = Lugh::Quant::type_from_name("q4_K");

Looks up a type by name. Returns -1 if not found.

type_info

my $info = Lugh::Quant::type_info($type);
# Returns hashref:
# {
#     type => 12,
#     name => "q4_K",
#     size => 144,
#     blck_size => 256,
#     sizef => 0.5625,
#     is_quantized => 1,
#     requires_imatrix => 0
# }

Returns comprehensive information about a type as a hashref.

TENSOR METHODS

These methods are available on Lugh::Tensor objects:

Type Inspection

my $tensor = ...;  # from model or created

say $tensor->type();         # numeric type ID
say $tensor->type_name();    # "q4_K", "f32", etc.
say $tensor->type_size();    # bytes per block
say $tensor->blck_size();    # elements per block
say $tensor->is_quantized(); # 1 or 0
say $tensor->nbytes();       # total bytes

quantize

my $quantized = $f32_tensor->quantize($ctx, $dest_type);

Quantizes an F32 tensor to the specified quantized type. Returns a new tensor. The source tensor must be F32.

use Lugh::Quant qw(Q4_K);
my $q4 = $tensor->quantize($ctx, Q4_K);

Note: For best results with IQ types, an importance matrix should be used. This method uses NULL for the importance matrix, which works but may not give optimal quality for IQ types.

dequantize

my $f32_tensor = $quantized_tensor->dequantize($ctx);

Dequantizes a quantized tensor back to F32. Also works with F16 and BF16. Returns a new F32 tensor.

my $restored = $q4_tensor->dequantize($ctx);

QUANTIZATION GUIDE

Choosing a Quantization Type

For most users, we recommend:

  • Q4_K - Best balance of size and quality (default choice)

  • Q5_K - Slightly better quality, ~10% larger

  • Q8_0 - Best quality, 2x size of Q4_K

  • Q2_K or IQ2_XS - Extreme compression, quality loss

Memory Comparison

For a 7B parameter model:

F32:   28.0 GB (unquantized)
F16:   14.0 GB
Q8_0:   7.0 GB
Q6_K:   5.5 GB
Q5_K:   4.8 GB
Q4_K:   4.0 GB
Q3_K:   3.4 GB
Q2_K:   2.7 GB
IQ2_XS: 2.1 GB
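Figures of this magnitude follow directly from bits-per-weight arithmetic: params * bits/weight / 8. A stand-alone sketch using the approximate bits/weight quoted in the type lists above; real GGUF files mix types per tensor (embeddings and output layers are often kept at higher precision), so actual file sizes differ slightly from these estimates:

```perl
use strict;
use warnings;

my $n_params = 7_000_000_000;  # 7B-parameter model

# Approximate effective bits per weight, per the type lists above.
for my $fmt ([F32 => 32], [F16 => 16], [Q8_0 => 8.5], [Q4_K => 4.5], [Q2_K => 2.6]) {
    my ($name, $bpw) = @$fmt;
    printf "%-5s ~%4.1f GB\n", $name, $n_params * $bpw / 8 / 1e9;
}
```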

Quality vs Size Trade-off

Quality (perplexity):  IQ1 < Q2  < IQ2 < Q3  < Q4  < Q5  < Q6  < Q8  < F16 < F32
Size (bits/weight):   ~1.5   2.5   2.3   3.4   4.5   5.5   6.5   8.0   16    32

EXAMPLES

Inspect Model Quantization

use Lugh;
use Lugh::Quant;

my $model = Lugh::Model->new(model => 'model.gguf');
my $inf = Lugh::Inference->new(model => $model);

# Get tensor info
my @tensors = $model->tensor_names();
for my $name (@tensors) {
    my $tensor = $inf->get_tensor($name);
    printf "%s: %s (%d elements, %d bytes)\n",
        $name,
        $tensor->type_name(),
        $tensor->nelements(),
        $tensor->nbytes();
}

List All Quantized Types

use Lugh::Quant;

for my $type (Lugh::Quant::all_quantized_types()) {
    my $info = Lugh::Quant::type_info($type);
    printf "%-10s: %.2f bits/weight%s\n",
        $info->{name},
        $info->{sizef} * 8,
        $info->{requires_imatrix} ? " (needs imatrix)" : "";
}

Manual Quantization

use Lugh;
use Lugh::Quant qw(Q4_K);

my $ctx = Lugh::Context->new(mem_size => 10 * 1024 * 1024);

# Create F32 tensor with random data
my $f32 = Lugh::Tensor->new_f32($ctx, 256, 256);  # 256x256 matrix
# ... fill with data ...

# Quantize to Q4_K (OO style)
my $q4k = $f32->quantize($ctx, Q4_K);

printf "F32:  %d bytes\n", $f32->nbytes();   # 262144 bytes
printf "Q4_K: %d bytes\n", $q4k->nbytes();   # ~147456 bytes

# Dequantize back
my $restored = $q4k->dequantize($ctx);

SEE ALSO

Lugh, Lugh::Model, Lugh::Tensor

AUTHOR

Robert Acock

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.