NAME

Lugh::Graph - Computation Graph for Tensor Operations

VERSION

Version 0.01

SYNOPSIS

use Lugh;

# Create context and tensors
my $ctx = Lugh::Context->new(mem_size => 10 * 1024 * 1024);

my $a = Lugh::Tensor->new_f32($ctx, 1000);
my $b = Lugh::Tensor->new_f32($ctx, 1000);

my @a_data = map { rand() } 1 .. 1000;
my @b_data = map { rand() } 1 .. 1000;
$a->set_f32(@a_data);
$b->set_f32(@b_data);

# Build computation
my $c = Lugh::Ops::add($ctx, $a, $b);
my $d = Lugh::Ops::mul($ctx, $c, $c);
my $e = Lugh::Ops::soft_max($ctx, $d);

# Create graph and add operations
my $graph = Lugh::Graph->new($ctx);
$graph->build_forward($e);

# Execute computation
$graph->compute($ctx, 4);  # Use 4 threads

# Read results
my @result = $e->get_f32();

DESCRIPTION

Lugh::Graph represents a computation graph - a directed acyclic graph (DAG) of tensor operations. The graph enables:

  • Lazy evaluation - Operations are not computed until the graph is run (see the sketch after this list)

  • Optimization - ggml can fuse and optimize operations

  • Parallelization - Multiple threads for matrix operations

  • Memory planning - Efficient allocation of intermediate tensors
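
A minimal sketch of what "lazy" means in practice, reusing the tensors from the SYNOPSIS: the Lugh::Ops call only records the operation, and the actual work happens inside compute(). Values read from a result tensor before compute() are undefined.

# add() only records the operation; $c holds no meaningful values yet
my $c = Lugh::Ops::add($ctx, $a, $b);

# the addition actually runs here
my $graph = Lugh::Graph->new($ctx);
$graph->build_forward($c);
$graph->compute($ctx, 1);

my @sum = $c->get_f32();  # now populated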

Graph Structure

A computation graph consists of nodes (tensors) and edges (dependencies):

Input A    Input B
   │          │
   └────┬─────┘
        │
     Add(A,B) = C
        │
        ├─────────┐
        │         │
     Mul(C,C) = D │
        │         │
        └────┬────┘
             │
        SoftMax(D) = E
             │
          Output

The graph tracks dependencies so that operations execute in the correct order.

Build Phase vs Compute Phase

1. Build Phase - Create tensors and operations, recording the graph
2. Compute Phase - Execute all operations in dependency order

This separation allows the same graph to be executed multiple times with different input values (see "Reusing a Graph" below).

CONSTRUCTOR

new

my $graph = Lugh::Graph->new($context);

Creates a new empty computation graph.

Parameters:

  • $context - A Lugh::Context object for graph metadata

Returns: A Lugh::Graph object.

Example:

my $ctx = Lugh::Context->new(mem_size => 1024 * 1024);
my $graph = Lugh::Graph->new($ctx);

METHODS

build_forward

$graph->build_forward($output_tensor);

Adds an output tensor and all its dependencies to the graph.

Parameters:

  • $output_tensor - The tensor to compute (a Lugh::Tensor)

Details:

This method traverses backwards from the output tensor, adding all required operations to the graph. Multiple outputs can be added by calling build_forward multiple times.

Example:

my $loss = Lugh::Ops::...;
my $accuracy = Lugh::Ops::...;

my $graph = Lugh::Graph->new($ctx);
$graph->build_forward($loss);
$graph->build_forward($accuracy);

compute

$graph->compute($context, $n_threads);

Executes all operations in the graph.

Parameters:

  • $context - The context for computation

  • $n_threads - Number of CPU threads to use

Thread Usage:

  • 1 thread - Sequential execution, lowest overhead

  • N threads - Parallel matrix operations (recommended: CPU cores)

  • Too many threads - Diminishing returns, overhead increases

Example:

# Single-threaded
$graph->compute($ctx, 1);

# Use all CPU cores (example for 8-core machine)
$graph->compute($ctx, 8);

# Common recommendation
use Sys::Info;
my $info = Sys::Info->new;
my $cpu = $info->device('CPU');
$graph->compute($ctx, $cpu->count);
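
If Sys::Info is not installed, a rough fallback is to count logical CPUs from /proc/cpuinfo on Linux. This snippet is purely illustrative and not part of Lugh:

my $n_threads = 4;                          # sensible default
if (open my $fh, '<', '/proc/cpuinfo') {    # Linux only
    my $count = grep { /^processor\s*:/ } <$fh>;
    close $fh;
    $n_threads = $count if $count;
}
$graph->compute($ctx, $n_threads);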

GRAPH OPERATIONS

Multiple Outputs

my $graph = Lugh::Graph->new($ctx);
$graph->build_forward($output1);
$graph->build_forward($output2);
$graph->compute($ctx, 4);

# Both outputs are now computed
my @result1 = $output1->get_f32();
my @result2 = $output2->get_f32();

Reusing a Graph

# Build once
my $graph = Lugh::Graph->new($ctx);
$graph->build_forward($output);

# Run multiple times with different inputs
for my $input_data (@all_inputs) {
    $input->set_f32(@$input_data);
    $graph->compute($ctx, 4);
    my @result = $output->get_f32();
    push @all_results, \@result;
}

EXECUTION MODEL

Forward Execution

Operations are executed in topological order (dependencies first); a concrete trace of the example graph follows the list:

1. Input tensors (already have data)
2. First layer of operations
3. Second layer of operations
4. ... and so on to outputs
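
Applied to the example under "Graph Structure", the order is (this is a trace of what ggml does internally, not a separate API):

Leaves:  A, B               (inputs, data already set)
Step 1:  C = Add(A, B)      (depends only on leaves)
Step 2:  D = Mul(C, C)      (depends on C)
Step 3:  E = SoftMax(D)     (depends on D)

A node never runs before every tensor it reads from has been computed.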

Memory Allocation

ggml allocates memory for intermediate tensors during computation. The context must have enough memory for the following (a rough sizing sketch follows the list):

  • Input tensors

  • Output tensors

  • All intermediate tensors

  • Graph metadata
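
A back-of-the-envelope estimate for the SYNOPSIS graph, assuming 4 bytes per f32 element plus an assumed per-tensor allowance for ggml's tensor headers and alignment (the exact overhead is an implementation detail):

my $n_tensors  = 5;        # a, b, c, d, e
my $n_elements = 1000;
my $bytes_f32  = 4;
my $overhead   = 1024;     # assumed per-tensor slack for headers/alignment

my $mem_size = $n_tensors * ($n_elements * $bytes_f32 + $overhead)
             + 1024 * 1024;   # extra headroom for graph metadata

my $ctx = Lugh::Context->new(mem_size => $mem_size);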

Thread Pool

When using multiple threads, ggml creates a thread pool:

Main Thread
     │
┌────┴────┬─────────┬─────────┐
│         │         │         │
Worker 0  Worker 1  Worker 2  Worker 3
│         │         │         │
└────┬────┴─────────┴─────────┘
     │
Barrier Sync
     │
Next Operation

Matrix multiplications and other large operations are parallelized across workers.
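
One practical way to pick a thread count is to time the same graph at a few settings and keep the fastest. This sketch assumes the graph is large enough for the timings to be meaningful:

use Time::HiRes qw(time);

for my $n (1, 2, 4, 8) {
    my $start = time();
    $graph->compute($ctx, $n);
    printf "%d thread(s): %.4fs\n", $n, time() - $start;
}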

PERFORMANCE TIPS

Batch Operations

Instead of many small graph executions, batch inputs:

# Slower: many small graph executions
for my $input (@inputs) {
    $input_tensor->set_f32(@$input);
    $graph->compute($ctx, 4);
}

# Faster: One large computation
# (if using batched tensors)
$batched_graph->compute($ctx, 4);

Memory Reuse

The same context can be reused for multiple graph executions, avoiding repeated memory allocation.

Graph Caching

For inference, build the graph once and reuse:

# Build once at startup
my $inference_graph = build_inference_graph($model);

# Reuse for each query
sub infer {
    my ($tokens) = @_;
    $input_tensor->set_data(@$tokens);
    $inference_graph->compute($ctx, 4);
    return $output_tensor->get_f32();
}

BACKEND SELECTION

ggml automatically selects the best compute backend:

  • CPU - Always available, uses SIMD (SSE/AVX/NEON)

  • Metal - Apple Silicon and AMD GPUs on macOS

  • CUDA - NVIDIA GPUs

  • Vulkan - Cross-platform GPU

  • BLAS - Accelerate (macOS) or OpenBLAS for matrix ops

Which backends are compiled in is determined when ggml is built; the backend used for a given graph is then selected at runtime.

DEBUGGING

Graph Size

# After building
my $n_nodes = ...;  # (Not yet exposed, could add)
print "Graph has $n_nodes operations\n";

Operation Timing

For performance analysis, you can time the compute call:

use Time::HiRes qw(time);

my $start = time();
$graph->compute($ctx, 4);
my $elapsed = time() - $start;

print "Compute took ${elapsed}s\n";

ERROR HANDLING

Common Errors

  • Shape mismatch - Operations require compatible tensor shapes

  • Out of memory - Context too small for tensors

  • Null tensor - Operation returned NULL (allocation failure)

Error Recovery

Graph operations die on error. Use eval for error handling:

eval {
    $graph->compute($ctx, 4);
};
if ($@) {
    warn "Computation failed: $@";
    # Handle error...
}

THREAD SAFETY

Graph objects are NOT thread-safe; each Perl thread should create its own contexts and graphs. However, the internal threading that compute() uses is managed by ggml and is safe.
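
A minimal sketch of the per-thread pattern, assuming a Perl built with ithreads and that all Lugh objects are created inside each thread rather than shared or passed between threads (whether this works in practice also depends on Lugh's XS layer):

use threads;

my @threads = map {
    threads->create(sub {
        # each thread owns its own context, tensors, and graph
        my $ctx   = Lugh::Context->new(mem_size => 10 * 1024 * 1024);
        my $t     = Lugh::Tensor->new_f32($ctx, 1000);
        $t->set_f32((0.001) x 1000);
        my $out   = Lugh::Ops::soft_max($ctx, $t);
        my $graph = Lugh::Graph->new($ctx);
        $graph->build_forward($out);
        $graph->compute($ctx, 1);
        my @result = $out->get_f32();
        return scalar @result;    # return something simple to the parent
    });
} 1 .. 2;

$_->join for @threads;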

IMPLEMENTATION NOTES

Internally, Lugh::Graph wraps struct ggml_cgraph*:

struct ggml_cgraph {
    int n_nodes;
    int n_leafs;
    struct ggml_tensor ** nodes;  // Operations
    struct ggml_tensor ** grads;  // Gradients (for training)
    struct ggml_tensor ** leafs;  // Inputs
    ...
};

The graph is computed using ggml_graph_compute_with_ctx().

SEE ALSO

Lugh, Lugh::Context, Lugh::Tensor, Lugh::Ops

https://github.com/ggerganov/ggml - ggml library

AUTHOR

lnation <email@lnation.org>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
