NAME
Lugh::Autograd::Ops - Differentiable operations for automatic differentiation
SYNOPSIS
use Lugh;
use Lugh::Autograd;
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
# Create tensors with gradient tracking
my $a = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
my $b = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
$a->set_data(1.0, 2.0, 3.0, 4.0);
$b->set_data(2.0, 2.0, 2.0, 2.0);
# Element-wise operations
my $sum_result = Lugh::Autograd::Ops->add($ctx, $a, $b);
my $prod_result = Lugh::Autograd::Ops->mul($ctx, $a, $b);
# Reduction operations
my $total = Lugh::Autograd::Ops->sum($ctx, $prod_result);
# Build and compute the graph
my $graph = Lugh::Graph->new($ctx);
my $raw = Lugh::Tensor->from_ptr($total->_raw_tensor_ptr);
$graph->build_forward($raw);
$graph->compute($ctx, 1);
# Backward pass
$total->backward;
# Access gradients
my $grad_a = $a->grad; # Gradients w.r.t. $a
my $grad_b = $b->grad; # Gradients w.r.t. $b
DESCRIPTION
Lugh::Autograd::Ops provides differentiable tensor operations that automatically track gradients for backpropagation. Each operation records its inputs in the computation graph, enabling automatic gradient computation via the backward() method.
All operations return Lugh::Autograd::Tensor objects. If any input tensor has requires_grad set to true and gradient tracking is enabled globally, the output tensor will also track gradients.
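For example (a minimal sketch using only calls shown in the SYNOPSIS; the requires_grad => 0 option for the second tensor is assumed here to mark a non-tracking input):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $a = Lugh::Autograd::Tensor->new($ctx, 'f32', 2, { requires_grad => 1 });
my $b = Lugh::Autograd::Tensor->new($ctx, 'f32', 2, { requires_grad => 0 });
$a->set_data(1.0, 2.0);
$b->set_data(3.0, 4.0);
my $c = Lugh::Autograd::Ops->add($ctx, $a, $b);
# $c->requires_grad is true because $a requires gradients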
CLASS METHODS
add
my $c = Lugh::Autograd::Ops->add($ctx, $a, $b);
Performs element-wise addition of two tensors.
Parameters:
$ctx - A Lugh::Context object
$a - First Lugh::Autograd::Tensor operand
$b - Second Lugh::Autograd::Tensor operand
Returns: A new Lugh::Autograd::Tensor containing $a + $b
Gradient: For z = x + y, the gradients are:
dL/dx = dL/dz
dL/dy = dL/dz
Example:
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
my $y = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
$x->set_data(1.0, 2.0, 3.0);
$y->set_data(4.0, 5.0, 6.0);
my $z = Lugh::Autograd::Ops->add($ctx, $x, $y);
$ctx->compute;
# z contains [5.0, 7.0, 9.0]
my @z_data = $z->get_data;
# Backward pass
my $loss = Lugh::Autograd::Ops->sum($ctx, $z);
$ctx->compute;
$loss->backward;
# Both gradients are [1.0, 1.0, 1.0] (gradient of sum flows equally)
my $grad_x = $x->grad;
my $grad_y = $y->grad;
mul
my $c = Lugh::Autograd::Ops->mul($ctx, $a, $b);
Performs element-wise multiplication of two tensors.
Parameters:
$ctx - A Lugh::Context object
$a - First Lugh::Autograd::Tensor operand
$b - Second Lugh::Autograd::Tensor operand
Returns: A new Lugh::Autograd::Tensor containing $a * $b
Gradient: For z = x * y, the gradients are:
dL/dx = dL/dz * y
dL/dy = dL/dz * x
Example:
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
my $y = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
$x->set_data(2.0, 3.0, 4.0);
$y->set_data(5.0, 6.0, 7.0);
my $z = Lugh::Autograd::Ops->mul($ctx, $x, $y);
$ctx->compute;
# z contains [10.0, 18.0, 28.0]
my @z_data = $z->get_data;
# Backward pass
my $loss = Lugh::Autograd::Ops->sum($ctx, $z);
$ctx->compute;
$loss->backward;
# grad_x = y values = [5.0, 6.0, 7.0]
# grad_y = x values = [2.0, 3.0, 4.0]
my $grad_x = $x->grad;
my $grad_y = $y->grad;
sum
my $scalar = Lugh::Autograd::Ops->sum($ctx, $a);
Reduces a tensor to a scalar by summing all elements.
Parameters:
$ctx - A Lugh::Context object
$a - The Lugh::Autograd::Tensor to sum
Returns: A new Lugh::Autograd::Tensor containing a single scalar value
Gradient: For y = sum(x), the gradient is:
dL/dx_i = dL/dy (gradient broadcasts to all elements)
Example:
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
$x->set_data(1.0, 2.0, 3.0, 4.0);
my $total = Lugh::Autograd::Ops->sum($ctx, $x);
$ctx->compute;
# total contains [10.0] (scalar tensor)
my @total_data = $total->get_data;
# Backward pass
$total->backward;
# All gradients are 1.0 (sum distributes gradient equally)
my $grad = $x->grad; # [1.0, 1.0, 1.0, 1.0]
sub
my $c = Lugh::Autograd::Ops->sub($ctx, $a, $b);
Performs element-wise subtraction of two tensors.
Gradient: For z = x - y:
dL/dx = dL/dz
dL/dy = -dL/dz
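Example (a minimal sketch following the same workflow as add above; the values in the comments follow directly from the gradient formula):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
my $y = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
$x->set_data(5.0, 7.0, 9.0);
$y->set_data(1.0, 2.0, 3.0);
my $z = Lugh::Autograd::Ops->sub($ctx, $x, $y);
my $loss = Lugh::Autograd::Ops->sum($ctx, $z);
$ctx->compute;
# z contains [4.0, 5.0, 6.0]
$loss->backward;
my $grad_x = $x->grad; # [1.0, 1.0, 1.0]
my $grad_y = $y->grad; # [-1.0, -1.0, -1.0]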
div
my $c = Lugh::Autograd::Ops->div($ctx, $a, $b);
Performs element-wise division of two tensors.
Gradient: For z = x / y:
dL/dx = dL/dz / y
dL/dy = -dL/dz * x / y^2
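Example (a minimal sketch in the same style as the mul example; the expected values follow from the gradient formula):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 2, { requires_grad => 1 });
my $y = Lugh::Autograd::Tensor->new($ctx, 'f32', 2, { requires_grad => 1 });
$x->set_data(6.0, 8.0);
$y->set_data(2.0, 4.0);
my $z = Lugh::Autograd::Ops->div($ctx, $x, $y);
my $loss = Lugh::Autograd::Ops->sum($ctx, $z);
$ctx->compute;
# z contains [3.0, 2.0]
$loss->backward;
my $grad_x = $x->grad; # 1 / y = [0.5, 0.25]
my $grad_y = $y->grad; # -x / y^2 = [-1.5, -0.5]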
scale
my $c = Lugh::Autograd::Ops->scale($ctx, $a, $scalar);
Multiplies all elements of a tensor by a scalar value.
Parameters:
$ctx - A Lugh::Context object
$a - The Lugh::Autograd::Tensor to scale
$scalar - A numeric scalar value
Gradient: For y = s * x:
dL/dx = s * dL/dy
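Example (a minimal sketch in the same style as the earlier examples):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
$x->set_data(1.0, 2.0, 3.0);
my $y = Lugh::Autograd::Ops->scale($ctx, $x, 2.5);
my $loss = Lugh::Autograd::Ops->sum($ctx, $y);
$ctx->compute;
# y contains [2.5, 5.0, 7.5]
$loss->backward;
my $grad = $x->grad; # [2.5, 2.5, 2.5] (the scalar s)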
matmul
my $c = Lugh::Autograd::Ops->matmul($ctx, $a, $b);
Performs matrix multiplication of two tensors.
Gradient: For C = A @ B:
dL/dA = dL/dC @ B^T
dL/dB = A^T @ dL/dC
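Example (a sketch only; it assumes the Lugh::Autograd::Tensor constructor also accepts extra dimension arguments for 2-D tensors, which is not shown here; see Lugh::Autograd::Tensor for the exact constructor and data layout):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
# 2x2 matrices; the multi-dimensional constructor form is assumed
my $A = Lugh::Autograd::Tensor->new($ctx, 'f32', 2, 2, { requires_grad => 1 });
my $B = Lugh::Autograd::Tensor->new($ctx, 'f32', 2, 2, { requires_grad => 1 });
$A->set_data(1.0, 2.0, 3.0, 4.0);
$B->set_data(5.0, 6.0, 7.0, 8.0);
my $C = Lugh::Autograd::Ops->matmul($ctx, $A, $B);
my $loss = Lugh::Autograd::Ops->sum($ctx, $C);
$ctx->compute;
$loss->backward;
# With dL/dC all ones: dL/dA = dL/dC @ B^T, dL/dB = A^T @ dL/dC
my $grad_A = $A->grad;
my $grad_B = $B->grad;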
mean
my $scalar = Lugh::Autograd::Ops->mean($ctx, $a);
Reduces a tensor to a scalar by computing the mean of all elements.
Gradient: For y = mean(x):
dL/dx_i = dL/dy / n (where n is the number of elements)
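Example (a minimal sketch in the same style as the sum example):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
$x->set_data(2.0, 4.0, 6.0, 8.0);
my $avg = Lugh::Autograd::Ops->mean($ctx, $x);
$ctx->compute;
# avg contains [5.0] (scalar tensor)
$avg->backward;
my $grad = $x->grad; # [0.25, 0.25, 0.25, 0.25] (1/n with n = 4)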
relu
my $c = Lugh::Autograd::Ops->relu($ctx, $a);
Applies the Rectified Linear Unit activation function element-wise.
Formula: relu(x) = max(0, x)
Gradient:
dL/dx = dL/dy if x > 0, else 0
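Example (a minimal sketch in the same style as the earlier examples):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
$x->set_data(-2.0, -0.5, 0.5, 2.0);
my $y = Lugh::Autograd::Ops->relu($ctx, $x);
my $loss = Lugh::Autograd::Ops->sum($ctx, $y);
$ctx->compute;
# y contains [0.0, 0.0, 0.5, 2.0]
$loss->backward;
my $grad = $x->grad; # [0.0, 0.0, 1.0, 1.0]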
gelu
my $c = Lugh::Autograd::Ops->gelu($ctx, $a);
Applies the Gaussian Error Linear Unit activation function element-wise.
Formula: gelu(x) = 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
Used in transformer models like BERT and GPT.
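Example (a minimal sketch; the output values are approximate and assume the tanh formulation given above):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
$x->set_data(-1.0, 0.0, 1.0);
my $y = Lugh::Autograd::Ops->gelu($ctx, $x);
my $loss = Lugh::Autograd::Ops->sum($ctx, $y);
$ctx->compute;
# y is approximately [-0.159, 0.0, 0.841]
$loss->backward;
my $grad = $x->grad;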
silu
my $c = Lugh::Autograd::Ops->silu($ctx, $a);
Applies the Sigmoid Linear Unit (Swish) activation function element-wise.
Formula: silu(x) = x * sigmoid(x)
Used in models like LLaMA and other modern architectures.
Gradient:
dL/dx = sigmoid(x) * (1 + x * (1 - sigmoid(x))) * dL/dy
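Example (a minimal sketch; the output values are approximate):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
$x->set_data(-1.0, 0.0, 1.0);
my $y = Lugh::Autograd::Ops->silu($ctx, $x);
my $loss = Lugh::Autograd::Ops->sum($ctx, $y);
$ctx->compute;
# y is approximately [-0.269, 0.0, 0.731] (x * sigmoid(x))
$loss->backward;
my $grad = $x->grad;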
softmax
my $c = Lugh::Autograd::Ops->softmax($ctx, $a);
Applies the softmax function, converting logits to probabilities.
Formula: softmax(x)_i = exp(x_i) / sum(exp(x_j))
Output values are in range (0, 1) and sum to 1.
Gradient:
dL/dx_i = y_i * (dL/dy_i - sum_j(dL/dy_j * y_j))
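Example (a minimal forward-pass sketch; the probabilities shown are approximate):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $logits = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
$logits->set_data(1.0, 2.0, 3.0);
my $probs = Lugh::Autograd::Ops->softmax($ctx, $logits);
$ctx->compute;
# probs is approximately [0.090, 0.245, 0.665] and sums to 1
my @p = $probs->get_data;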
rms_norm
my $c = Lugh::Autograd::Ops->rms_norm($ctx, $a);
my $c = Lugh::Autograd::Ops->rms_norm($ctx, $a, $eps); # custom epsilon
Applies Root Mean Square Layer Normalization.
Formula: rms_norm(x) = x / sqrt(mean(x^2) + eps)
Parameters:
$eps - (Optional) Small constant for numerical stability, default 1e-5
Used in transformer models like LLaMA for efficient normalization.
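Example (a minimal forward-pass sketch; the output values are approximate):
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
$x->set_data(1.0, 2.0, 3.0, 4.0);
my $y = Lugh::Autograd::Ops->rms_norm($ctx, $x); # default eps = 1e-5
$ctx->compute;
# mean(x^2) = 7.5, so y is approximately x / sqrt(7.5) = [0.365, 0.730, 1.095, 1.461]
my @y_data = $y->get_data;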
GRADIENT TRACKING
Operations respect the global gradient tracking state controlled by Lugh::Autograd:
use Lugh::Autograd;
# Gradients tracked normally
my $c = Lugh::Autograd::Ops->add($ctx, $a, $b);
# Disable gradient tracking for efficiency
Lugh::Autograd::no_grad {
my $inference = Lugh::Autograd::Ops->add($ctx, $a, $b);
# $inference->requires_grad is false
};
When gradient tracking is disabled:
Output tensors have requires_grad = 0
No computation graph is built
Memory usage is reduced
COMPUTATION WORKFLOW
The typical workflow for using autograd operations is:
# 1. Create context and tensors
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 10, { requires_grad => 1 });
# 2. Set input data
$x->set_data(1.0, 2.0, 3.0, ...);
# 3. Build computation graph (forward pass)
my $y = Lugh::Autograd::Ops->mul($ctx, $x, $x); # x^2
my $loss = Lugh::Autograd::Ops->sum($ctx, $y);
# 4. Execute the computation
$ctx->compute;
# 5. Read forward pass results
my @loss_val = $loss->get_data;
# 6. Compute gradients (backward pass)
$loss->backward;
# 7. Read gradients
my $grad = $x->grad; # Contains 2*x for each element
CHAINING OPERATIONS
Operations can be chained to build complex computation graphs:
my $ctx = Lugh::Context->new(mem_size => 16 * 1024 * 1024);
my $x = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
my $w = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
my $b = Lugh::Autograd::Tensor->new($ctx, 'f32', 4, { requires_grad => 1 });
$x->set_data(1.0, 2.0, 3.0, 4.0);
$w->set_data(0.5, 0.5, 0.5, 0.5);
$b->set_data(0.1, 0.1, 0.1, 0.1);
# Linear layer: y = w * x + b
my $wx = Lugh::Autograd::Ops->mul($ctx, $w, $x);
my $y = Lugh::Autograd::Ops->add($ctx, $wx, $b);
my $loss = Lugh::Autograd::Ops->sum($ctx, $y);
$ctx->compute;
$loss->backward;
# All leaf tensors now have gradients computed
my $grad_x = $x->grad;
my $grad_w = $w->grad;
my $grad_b = $b->grad;
ERROR HANDLING
Operations will die with an error message if:
The context is invalid or has been freed
Input tensors are not valid Lugh::Autograd::Tensor objects
Input tensors have been freed
Tensor shapes are incompatible for the operation
eval {
my $result = Lugh::Autograd::Ops->add($ctx, $a, $b);
};
if ($@) {
warn "Operation failed: $@";
}
SEE ALSO
Lugh::Autograd - Main autograd module with gradient context management
Lugh::Autograd::Tensor - Tensor class with gradient support
Lugh::Context - Memory context for tensor operations
Lugh::Tensor - Base tensor class
AUTHOR
LNATION <email@lnation.org>
LICENSE
This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.