NAME

Lugh::Tensor - N-Dimensional Tensor with ggml Backend

VERSION

Version 0.11

SYNOPSIS

use Lugh;

# Create a context
my $ctx = Lugh::Context->new(mem_size => 1024 * 1024);

# Create tensors
my $vector = Lugh::Tensor->new_f32($ctx, 100);           # 1D
my $matrix = Lugh::Tensor->new_f32($ctx, 100, 200);      # 2D
my $tensor3d = Lugh::Tensor->new_f32($ctx, 10, 20, 30);  # 3D

# Set values
$vector->set_f32(1.0, 2.0, 3.0, ...);  # Must provide all elements

# Get values
my @values = $vector->get_f32();

# Get tensor properties
my $n = $tensor3d->nelements();    # Total element count
my $dims = $tensor3d->n_dims();    # Number of dimensions
my @shape = $tensor3d->shape();    # Size of each dimension

DESCRIPTION

Lugh::Tensor represents an N-dimensional array of numbers, implemented using ggml's tensor system. Tensors are the fundamental building blocks for neural network computations.

Tensor Properties

  • Data type - F32 (32-bit float), or quantized types for model weights

  • Dimensions - 1D to 4D arrays

  • Shape - Size of each dimension

  • Strides - Memory layout for traversal

Memory Layout

Tensors use ggml's memory layout: data is stored row by row, and the first dimension (ne0) is the row length. The index along dimension 0 therefore changes fastest in memory.

2D tensor with ne = [4, 3] (4 elements per row, 3 rows):

Memory: [a00, a01, a02, a03, a10, a11, a12, a13, a20, a21, a22, a23]

Logical:
    a00 a01 a02 a03
    a10 a11 a12 a13
    a20 a21 a22 a23

Note that this is the reverse of the C/NumPy shape convention, where the last dimension changes fastest.
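As a plain-Perl sketch of the addressing rule (not part of the Lugh API), the flat offset of element (row, col) is row * ne0 + col:

my $ne0 = 4;  # row length from the example above

sub flat_index {
    my ($row, $col) = @_;
    return $row * $ne0 + $col;  # index along dimension 0 varies fastest
}

print flat_index(1, 2);  # 6 -> a12 sits at offset 6 in the memory list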

CONSTRUCTOR

new_f32

my $tensor = Lugh::Tensor->new_f32($context, @dimensions);

Creates a new tensor with F32 (32-bit float) data type.

Parameters:

  • $context - A Lugh::Context object

  • @dimensions - 1 to 4 dimension sizes

Returns: A Lugh::Tensor object.

Throws: Dies if allocation fails or dimensions are invalid.

Examples:

# 1D vector with 100 elements
my $v = Lugh::Tensor->new_f32($ctx, 100);

# 2D matrix: ne0 = 100 (row length), ne1 = 200 (number of rows)
my $m = Lugh::Tensor->new_f32($ctx, 100, 200);

# 3D tensor
my $t = Lugh::Tensor->new_f32($ctx, 10, 20, 30);

# 4D tensor (max dimensions)
my $t4 = Lugh::Tensor->new_f32($ctx, 2, 3, 4, 5);
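Since the constructor dies on failure, wrap it in eval when dimensions or memory limits come from runtime input. This is standard Perl error handling, not a Lugh-specific API; $requested_size is a hypothetical user-supplied value:

# $requested_size: a user-supplied length (hypothetical)
my $t = eval { Lugh::Tensor->new_f32($ctx, $requested_size) };
warn "tensor allocation failed: $@" unless defined $t;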

METHODS

set_f32

$tensor->set_f32(@values);

Sets all tensor elements from a list of values.

Parameters:

  • @values - Exactly nelements() float values

Throws: Dies if wrong number of values provided.

Example:

my $t = Lugh::Tensor->new_f32($ctx, 3);
$t->set_f32(1.0, 2.0, 3.0);
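Because set_f32 requires exactly nelements() values, a whole-tensor fill can use Perl's list repetition operator, for example zero-initialization:

my $m = Lugh::Tensor->new_f32($ctx, 10, 20);
$m->set_f32((0.0) x $m->nelements());  # 200 zeros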

get_f32

my @values = $tensor->get_f32();

Returns all tensor elements as a list.

Returns: A list of nelements() float values.

Example:

use List::Util qw(sum);

my @data = $tensor->get_f32();
print "First element: $data[0]\n";
print "Sum: ", sum(@data), "\n";

nelements

my $n = $tensor->nelements();

Returns the total number of elements in the tensor.

Example:

my $t = Lugh::Tensor->new_f32($ctx, 10, 20, 30);
print $t->nelements();  # 6000

n_dims

my $dims = $tensor->n_dims();

Returns the number of dimensions (1-4).

Example:

my $t = Lugh::Tensor->new_f32($ctx, 10, 20);
print $t->n_dims();  # 2

shape

my @shape = $tensor->shape();

Returns the size of each dimension.

Example:

my $t = Lugh::Tensor->new_f32($ctx, 10, 20, 30);
my @shape = $t->shape();  # (10, 20, 30)

type

my $type_id = $tensor->type();

Returns the numeric type ID of the tensor (e.g., 0 for F32, 12 for Q4_K).

Example:

my $t = Lugh::Tensor->new_f32($ctx, 100);
print $t->type();  # 0 (F32)

type_name

my $name = $tensor->type_name();

Returns the string name of the tensor's type.

Example:

my $t = Lugh::Tensor->new_f32($ctx, 100);
print $t->type_name();  # "f32"

# From a quantized model tensor
print $weight_tensor->type_name();  # "q4_K"

type_size

my $bytes = $tensor->type_size();

Returns the size in bytes of one block of this type.
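For F32 the block is a single element, so type_size() is simply the bytes per element:

my $t = Lugh::Tensor->new_f32($ctx, 100);
print $t->type_size();  # 4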

blck_size

my $elements = $tensor->blck_size();

Returns the number of elements per block. For quantized types this is typically 32 or 256.
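Together, type_size() / blck_size() gives the effective bytes per element. A small check using the quantize method documented below (for Q4_K, ggml packs 256 elements into a 144-byte block):

use Lugh::Quant qw(Q4_K);

my $f32 = Lugh::Tensor->new_f32($ctx, 256);
$f32->set_f32((1.0) x 256);

my $q = $f32->quantize($ctx, Q4_K);
printf "%.4f bytes/element\n",
    $q->type_size() / $q->blck_size();  # 144 / 256 = 0.5625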

is_quantized

my $bool = $tensor->is_quantized();

Returns true if the tensor uses a quantized data type.

Example:

if ($tensor->is_quantized()) {
    print "Tensor uses ", $tensor->type_name(), " quantization\n";
}

nbytes

my $bytes = $tensor->nbytes();

Returns the total number of bytes used by the tensor's data.

Example:

my $t = Lugh::Tensor->new_f32($ctx, 1000);
print $t->nbytes();  # 4000 (1000 × 4 bytes)

quantize

my $quantized = $tensor->quantize($ctx, $dest_type);

Quantizes an F32 tensor to the specified quantized type. Returns a new tensor.

Parameters:

  • $ctx - A Lugh::Context with enough memory for the result

  • $dest_type - Target quantization type (from Lugh::Quant)

Returns: A new Lugh::Tensor with the quantized data.

Throws: Dies if source is not F32 or destination is not a quantized type.

Example:

use Lugh::Quant qw(Q4_K);

my @weights = map { sin($_ / 10) } 0 .. 255;  # 256 example values

my $f32 = Lugh::Tensor->new_f32($ctx, 256);
$f32->set_f32(@weights);

my $q4 = $f32->quantize($ctx, Q4_K);
printf "Compressed: %d -> %d bytes\n", $f32->nbytes, $q4->nbytes;

dequantize

my $f32 = $tensor->dequantize($ctx);

Dequantizes a quantized (or F16/BF16) tensor back to F32. Returns a new tensor.

Parameters:

  • $ctx - A Lugh::Context with enough memory for the result

Returns: A new F32 Lugh::Tensor.

Throws: Dies if tensor is already F32.

Example:

# Round-trip: F32 -> Q4_K -> F32
my @data = map { cos($_ / 8) } 0 .. 255;  # 256 example values

my $original = Lugh::Tensor->new_f32($ctx, 256);
$original->set_f32(@data);

my $quantized = $original->quantize($ctx, Lugh::Quant::Q4_K);
my $restored = $quantized->dequantize($ctx);

# Compare original vs restored to measure quantization loss
my @orig = $original->get_f32();
my @rest = $restored->get_f32();
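One way to quantify the round-trip loss is the root-mean-square error over the two lists (plain Perl, no extra modules):

my $n = @orig;
my $sum_sq = 0;
$sum_sq += ($orig[$_] - $rest[$_]) ** 2 for 0 .. $n - 1;
printf "RMSE after Q4_K round-trip: %.6f\n", sqrt($sum_sq / $n);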

TENSOR OPERATIONS

Tensors can be used with Lugh::Ops to build computation graphs:

my $a = Lugh::Tensor->new_f32($ctx, 100);
my $b = Lugh::Tensor->new_f32($ctx, 100);
$a->set_f32(@a_data);
$b->set_f32(@b_data);

# Create operation result tensors
my $c = Lugh::Ops::add($ctx, $a, $b);      # Element-wise add
my $d = Lugh::Ops::mul($ctx, $a, $b);      # Element-wise multiply
my $e = Lugh::Ops::soft_max($ctx, $a);     # Softmax

# Build and compute graph
my $graph = Lugh::Graph->new($ctx);
$graph->build_forward($c);
$graph->compute($ctx, 4);

# Get results
my @result = $c->get_f32();

DATA TYPES

ggml supports many tensor data types:

Float Types

  • GGML_TYPE_F32 (0) - 32-bit float (4 bytes per element)

  • GGML_TYPE_F16 (1) - 16-bit float (2 bytes per element)

  • GGML_TYPE_BF16 (30) - Brain float16 (2 bytes per element)

Quantized Types

Used for model weights to reduce memory:

  • Q4_0, Q4_1, Q4_K - 4-bit quantization (~0.5-0.6 bytes/element)

  • Q5_0, Q5_1, Q5_K - 5-bit quantization (~0.7 bytes/element)

  • Q6_K - 6-bit quantization (~0.8 bytes/element)

  • Q8_0, Q8_1, Q8_K - 8-bit quantization (~1 byte/element)

  • Q2_K, Q3_K - 2-3 bit quantization (~0.3-0.45 bytes/element)

The per-element figures sit slightly above the nominal bit width because each block carries scale metadata. (The _S and _M suffixes seen in model file names, e.g. Q4_K_M, are llama.cpp quantization mixes built from these tensor types; they are not tensor types themselves.)

Quantized tensors from model files can be used directly in operations - ggml handles dequantization automatically during computation.
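For example, a quantized weight matrix can feed mul_mat without explicit dequantization. In this sketch, $w is assumed to be a quantized 2D tensor with ne = [k, n], e.g. a weight loaded from a model file (see MATRIX MULTIPLICATION below for the shape convention):

my ($k) = $w->shape();                     # first dimension: row length
my $x = Lugh::Tensor->new_f32($ctx, $k);   # F32 activation vector
$x->set_f32((0.1) x $k);

my $y = Lugh::Ops::mul_mat($ctx, $w, $x);  # ggml dequantizes on the fly

my $graph = Lugh::Graph->new($ctx);
$graph->build_forward($y);
$graph->compute($ctx, 4);

my @out = $y->get_f32();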

BROADCASTING

Many operations support broadcasting (NumPy-style):

# Scalar broadcast: [1] op [n] -> [n]
my $scalar = Lugh::Tensor->new_f32($ctx, 1);
my $vector = Lugh::Tensor->new_f32($ctx, 100);
my $result = Lugh::Ops::mul($ctx, $scalar, $vector);

# Row broadcast: [1, n] op [m, n] -> [m, n]
# Column broadcast: [m, 1] op [m, n] -> [m, n]

A dimension of size 1 in one operand is logically repeated to match the other operand's extent along that dimension.
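Putting the scalar case together end to end (a minimal sketch, assuming mul's broadcast support works as shown above):

my $scalar = Lugh::Tensor->new_f32($ctx, 1);
my $vector = Lugh::Tensor->new_f32($ctx, 4);
$scalar->set_f32(2.0);
$vector->set_f32(1.0, 2.0, 3.0, 4.0);

my $result = Lugh::Ops::mul($ctx, $scalar, $vector);

my $graph = Lugh::Graph->new($ctx);
$graph->build_forward($result);
$graph->compute($ctx, 1);

print join(", ", $result->get_f32()), "\n";  # 2, 4, 6, 8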

MATRIX MULTIPLICATION

Matrix multiplication follows the pattern:

A [k, n] × B [k, m] → C [n, m]

Both operands share their first dimension k (the row length, since ne0 indexes within a row): each element of C is the dot product of one row of A with one row of B. This is why the shapes look transposed relative to the usual row-major convention.

Example:

my $a = Lugh::Tensor->new_f32($ctx, 4, 3);  # ne = [4, 3]: a 3×4 matrix
my $b = Lugh::Tensor->new_f32($ctx, 4, 2);  # ne = [4, 2]: a 2×4 matrix
my $c = Lugh::Ops::mul_mat($ctx, $a, $b);   # ne = [3, 2] result
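The result's shape metadata is set as soon as the node is created, before any graph compute, so the convention can be checked directly:

my @shape = $c->shape();  # (3, 2)
print $c->n_dims();       # 2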

COMMON TENSOR SHAPES

In transformer models:

  • Token embeddings - [n_embd, n_vocab]

  • Hidden state - [n_embd, n_tokens]

  • Attention Q/K/V - [head_dim, n_heads, n_tokens]

  • FFN weights - [n_embd, ffn_dim] or [ffn_dim, n_embd]

  • Logits - [n_vocab, n_tokens]

VIEWS AND RESHAPING

Tensors can be reshaped without copying data:

# Operations like reshape, permute, transpose
# create views of the same memory

my $flat = Lugh::Tensor->new_f32($ctx, 120);
# Internally, ggml can view this as [2,3,4,5] without copying

Note: View operations are internal to ggml. The Perl API currently focuses on creating new tensors and computing results.

THREAD SAFETY

Tensor objects themselves are not thread-safe. However, ggml's graph computation can use multiple CPU threads for parallel operations:

$graph->compute($ctx, $n_threads);

This uses pthreads internally, parallelizing matrix operations across the specified number of threads.

MEMORY

Tensors are allocated from their context's memory arena:

  • Metadata: ~256 bytes per tensor

  • Data: type-specific (4 bytes per element for F32)

Memory is freed when the context is destroyed, not when individual tensor objects go out of scope.
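A rough sizing heuristic follows from these numbers; the 512-byte per-tensor overhead below is a deliberately generous allowance for metadata and alignment, not an exact figure:

# Estimate mem_size for 16 F32 tensors of 4096 elements each
my ($n_tensors, $n_elems) = (16, 4096);
my $mem_size = $n_tensors * (512 + 4 * $n_elems);

my $ctx = Lugh::Context->new(mem_size => $mem_size);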

SEE ALSO

Lugh, Lugh::Context, Lugh::Ops, Lugh::Graph

https://github.com/ggerganov/ggml - ggml tensor library

AUTHOR

lnation <email@lnation.org>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.