NAME

Lugh::Train - High-level training API for Lugh

SYNOPSIS

use Lugh;
use Lugh::Train;
use Lugh::Autograd;

# Create a training context
my $ctx = Lugh::Context->new(mem_size => 256 * 1024 * 1024);

# Compute cross-entropy loss
my $logits = Lugh::Autograd::Tensor->new($ctx, 'f32', 5, { requires_grad => 1 });
$logits->set_data(1.0, 2.0, 3.0, 4.0, 5.0);
my $loss = Lugh::Train->cross_entropy_loss($ctx, $logits, [4]);  # target class 4

# Backward pass
$loss->backward();

# Get gradients
my $grad = $logits->grad();  # Returns array reference

# MSE loss for regression
my $predictions = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 1 });
$predictions->set_data(1.0, 2.0, 3.0);
my $targets = Lugh::Autograd::Tensor->new($ctx, 'f32', 3, { requires_grad => 0 });
$targets->set_data(1.1, 1.9, 3.2);
my $mse = Lugh::Train->mse_loss($ctx, $predictions, $targets);
$mse->backward();

DESCRIPTION

Lugh::Train provides high-level training utilities for neural network training:

  • Loss functions (cross-entropy, MSE)

  • Training-aware forward pass with gradient support

  • Data loading and batching

  • Training loop helpers

All methods are implemented in XS for performance and are automatically loaded when use Lugh; is called.

FORWARD PASS

forward

my $logits = Lugh::Train->forward(
    inference  => $inference,
    context    => $ctx,
    tokens     => \@tokens,
    lora       => $lora,       # Optional: LoRA adapter
    train_lora => 1,           # Default 1: compute LoRA gradients
    train_full => 0,           # Enable full model gradient computation
);

Performs a training-aware forward pass that stores intermediate activations for gradient computation. Returns logits as a Lugh::Autograd::Tensor suitable for loss computation and backpropagation.

Arguments:

inference - Lugh::Inference object (or model as alias)
context - Lugh::Context object (or ctx as alias)
tokens - Array reference of input token IDs
lora - Optional Lugh::LoRA adapter for LoRA training
train_lora - Boolean, compute gradients for LoRA weights (default: 1)
train_full - Boolean, compute gradients for full model weights (default: 0)

Returns: Lugh::Autograd::Tensor containing logits of shape [vocab_size, n_tokens].

Example:

my $model = Lugh::Model->new(model => 'model.gguf');
my $inference = Lugh::Inference->new(model => $model);
my $ctx = Lugh::Context->new(mem_size => 32 * 1024 * 1024);

my @tokens = (1, 72, 101, 108, 108, 111);  # "Hello"
my @target = @tokens[1..$#tokens];         # Shifted targets
my @input = @tokens[0..$#tokens-1];

my $logits = Lugh::Train->forward(
    inference => $inference,
    context   => $ctx,
    tokens    => \@input,
);

my $loss = Lugh::Train->cross_entropy_loss($ctx, $logits, \@target);
$loss->backward();

register_weight_tensors

Lugh::Train->register_weight_tensors($logits, \@weights);

Registers weight tensors with the training cache for gradient computation. This connects trainable weights (from a model or LoRA adapter) to the forward pass output, enabling gradients to flow to these weights during backpropagation.

Arguments:

$logits - The logits tensor returned from forward()
\@weights - Array reference of Lugh::Autograd::Tensor weight tensors

Example:

# Get trainable weights from model
my $weights_hash = $model->get_trainable_weights($ctx);
delete $weights_hash->{_n_tensors};   # drop metadata keys, as in the full training example below
delete $weights_hash->{_model_id};
my @weights = values %$weights_hash;

my $logits = Lugh::Train->forward(
    inference  => $inference,
    context    => $ctx,
    tokens     => \@tokens,
    train_full => 1,
);

# Register weights for gradient computation
Lugh::Train->register_weight_tensors($logits, \@weights);

my $loss = Lugh::Train->cross_entropy_loss($ctx, $logits, \@targets);
$loss->backward();

# Weights now have gradients
for my $w (@weights) {
    my $grad = $w->grad();
    # ... use gradients for optimization
}

LOSS FUNCTIONS

cross_entropy_loss

my $loss = Lugh::Train->cross_entropy_loss($ctx, $logits, \@targets);

Computes cross-entropy loss between logits and target class indices.

Arguments:

$ctx - Lugh::Context object
$logits - Lugh::Autograd::Tensor of shape [vocab_size, batch_size] or [vocab_size]
\@targets - Array reference of target class indices (integers)

Returns: Scalar loss tensor; requires_grad is set if the logits tensor requires gradients.

The loss is computed as the negative log-likelihood of the target class after applying log-softmax to the logits. This is numerically stable and equivalent to:

loss = -log(softmax(logits)[target_class])
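
For example, with the five-class logits from the SYNOPSIS and target class 4, the value returned by cross_entropy_loss can be checked against the formula by hand. The manual log-softmax below is only an illustration of the formula, not part of the Lugh API:

use List::Util qw(max sum);

my $logits = Lugh::Autograd::Tensor->new($ctx, 'f32', 5, { requires_grad => 1 });
$logits->set_data(1.0, 2.0, 3.0, 4.0, 5.0);
my $loss = Lugh::Train->cross_entropy_loss($ctx, $logits, [4]);
my ($got) = $loss->get_data();

# Manual check: -log(softmax(x)[4]), computed with the usual max-subtraction trick
my @x   = (1.0, 2.0, 3.0, 4.0, 5.0);
my $m   = max(@x);
my $lse = $m + log(sum(map { exp($_ - $m) } @x));
my $expected = $lse - $x[4];   # log-sum-exp minus target logit, ~0.4519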

mse_loss

my $loss = Lugh::Train->mse_loss($ctx, $predictions, $targets);

Computes Mean Squared Error loss between predictions and targets.

Arguments:

$ctx - Lugh::Context object
$predictions - Lugh::Autograd::Tensor
$targets - Lugh::Autograd::Tensor (same shape as predictions)

Returns: Scalar loss tensor; requires_grad is set if the predictions tensor requires gradients.

The loss is computed as:

loss = mean((predictions - targets)^2)
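
With the SYNOPSIS values, predictions (1.0, 2.0, 3.0) and targets (1.1, 1.9, 3.2), the squared errors are 0.01, 0.01 and 0.04, giving a mean of 0.02. A quick manual check in plain Perl (illustrative only, assuming the $mse tensor from the SYNOPSIS):

# Manual check of mean((predictions - targets)^2)
my @p = (1.0, 2.0, 3.0);
my @t = (1.1, 1.9, 3.2);
my $sum = 0;
$sum += ($p[$_] - $t[$_]) ** 2 for 0 .. $#p;
my $expected = $sum / @p;        # 0.02

my ($got) = $mse->get_data();    # value from the SYNOPSIS tensor, expected ~0.02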

DATA UTILITIES

batch_data

my @batches = Lugh::Train->batch_data(\@data, batch_size => 32, shuffle => 1);

Splits data into batches for training.

Arguments:

\@data - Array reference of training examples
batch_size - Number of examples per batch (default: 32)
shuffle - Whether to shuffle before batching (default: 0)

Returns: List of array references, each containing batch_size examples.
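
A typical usage pattern is to re-batch with shuffling at the start of each epoch; the example data below is a placeholder:

my @examples = map { "training example $_" } 1 .. 100;

for my $epoch (1 .. 10) {
    my @batches = Lugh::Train->batch_data(\@examples, batch_size => 32, shuffle => 1);
    for my $batch (@batches) {
        # $batch is an array reference holding the examples for this batch
        for my $example (@$batch) {
            # ... tokenize and train on $example
        }
    }
}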

tokenize_batch

my ($input_ids, $targets) = Lugh::Train->tokenize_batch($tokenizer, \@texts, max_length => 512);

Tokenizes a batch of texts for language model training.

Arguments:

$tokenizer - Lugh::Tokenizer object
\@texts - Array reference of text strings
max_length - Maximum sequence length (default: 512)

Returns: Two array references: the input token IDs and the target token IDs (the inputs shifted by one position).
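
A sketch of wiring tokenize_batch into the training-aware forward pass, assuming $tokenizer, $inference and $ctx are already constructed and that the returned references hold one token-ID list per input text:

my @texts = ("Hello world", "Another example");
my ($input_ids, $targets) = Lugh::Train->tokenize_batch(
    $tokenizer, \@texts, max_length => 512,
);

# Assumes one token-ID list per text in each returned array reference
for my $i (0 .. $#$input_ids) {
    my $logits = Lugh::Train->forward(
        inference => $inference,
        context   => $ctx,
        tokens    => $input_ids->[$i],
    );
    my $loss = Lugh::Train->cross_entropy_loss($ctx, $logits, $targets->[$i]);
    $loss->backward();
}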

TRAINING HELPERS

training_step

my $loss = Lugh::Train->training_step($model, $optimizer, $inputs, $targets, %opts);

Performs a single training step: forward pass, loss computation, backward pass, and optimizer update.

Arguments:

$model - Model with forward() method
$optimizer - Optimizer (Lugh::Optimizer::SGD or Adam)
$inputs - Input tensor or batch
$targets - Target tensor or batch
%opts - Options including loss_fn (default: 'cross_entropy'), ctx (required)

Returns: Scalar loss value.
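
A sketch of a loop built around training_step; prepare_batch() is a hypothetical helper standing in for whatever turns a raw batch into inputs and targets:

my @batches = Lugh::Train->batch_data(\@examples, batch_size => 16, shuffle => 1);

for my $epoch (1 .. 10) {
    for my $batch (@batches) {
        # prepare_batch() is a hypothetical helper, not part of Lugh::Train
        my ($inputs, $targets) = prepare_batch($batch);

        my $loss = Lugh::Train->training_step(
            $model, $optimizer, $inputs, $targets,
            ctx     => $ctx,
            loss_fn => 'cross_entropy',
        );
        printf "epoch %d: loss=%.4f\n", $epoch, $loss;
    }
}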

zero_grad

Lugh::Train->zero_grad(@tensors);

Zeros out gradients for all given tensors. This is a convenience method that calls zero_grad() on each tensor that supports it.

Arguments:

@tensors - List of Lugh::Autograd::Tensor objects whose gradients should be zeroed

Example:

# Zero gradients before each training iteration
Lugh::Train->zero_grad($weight1, $weight2, $bias);

EXAMPLE: TRAINING LOOP

Here's a complete example of training a model from scratch to memorize simple patterns:

use Lugh;
use Lugh::Train;
use Lugh::Optimizer;

# Load or create model
my $model = Lugh::Model->new(model => 'model.gguf');
my $inference = Lugh::Inference->new(model => $model);

# Get trainable weights
my $weight_ctx = Lugh::Context->new(mem_size => 512 * 1024 * 1024);
my $weights_hash = $model->get_trainable_weights($weight_ctx);
delete $weights_hash->{_n_tensors};
delete $weights_hash->{_model_id};
my @weights = values %$weights_hash;

# Create optimizer
my $optimizer = Lugh::Optimizer::AdamW->new(lr => 0.01, weight_decay => 0.0);
$optimizer->add_param($_) for @weights;

# Training data
my @texts = ("Hello", "World", "Test");

# Tokenization helper
sub tokenize { return (1, map { ord($_) } split //, $_[0]) }

# Training loop
for my $epoch (1..1000) {
    for my $text (@texts) {
        my @tokens = tokenize($text);
        my @input = @tokens[0..$#tokens-1];
        my @target = @tokens[1..$#tokens];

        my $ctx = Lugh::Context->new(mem_size => 32 * 1024 * 1024);
        $optimizer->zero_grad();

        # Forward pass
        my $logits = Lugh::Train->forward(
            inference  => $inference,
            context    => $ctx,
            tokens     => \@input,
            train_lora => 0,
            train_full => 1,
        );

        # Register weights for gradient computation
        Lugh::Train->register_weight_tensors($logits, \@weights);

        # Compute loss
        my $loss = Lugh::Train->cross_entropy_loss($ctx, $logits, \@target);
        my ($loss_val) = $loss->get_data();

        # Backward pass and optimization
        $loss->backward();
        $optimizer->step();

        if ($epoch % 100 == 0) {
            printf "Epoch %d: loss=%.4f\n", $epoch, $loss_val;
        }
    }
}

EXAMPLE: LORA TRAINING

Training with LoRA (Low-Rank Adaptation) for efficient fine-tuning:

use Lugh;
use Lugh::Train;
use Lugh::Optimizer;

# Load base model
my $model = Lugh::Model->new(model => 'base-model.gguf');
my $inference = Lugh::Inference->new(model => $model);

# Create trainable LoRA adapter
my $lora = Lugh::LoRA->create(
    model   => $model,
    rank    => 8,
    alpha   => 16.0,
    targets => [qw(attn_q attn_v)],
);

# Get LoRA weight tensors
my @weight_names = $lora->weight_names;
my @weights;
for my $name (@weight_names) {
    push @weights, $lora->get_weight_tensor($name, 'a');
    push @weights, $lora->get_weight_tensor($name, 'b');
}

# Create optimizer
my $optimizer = Lugh::Optimizer::AdamW->new(lr => 0.001);
$optimizer->add_param($_) for @weights;

# Training data: input tokens and shifted next-token targets ("Hello", as in the earlier example)
my @tokens       = (1, 72, 101, 108, 108, 111);
my @input_tokens = @tokens[0 .. $#tokens - 1];
my @targets      = @tokens[1 .. $#tokens];

# Training loop
for my $epoch (1..100) {
    my $ctx = Lugh::Context->new(mem_size => 64 * 1024 * 1024);
    $optimizer->zero_grad();

    my $logits = Lugh::Train->forward(
        inference  => $inference,
        context    => $ctx,
        tokens     => \@input_tokens,
        lora       => $lora,
        train_lora => 1,
    );

    my $loss = Lugh::Train->cross_entropy_loss($ctx, $logits, \@targets);
    $loss->backward();
    $optimizer->step();
}

# Save trained adapter
$lora->save('my-trained-lora.gguf');

SEE ALSO

Lugh, Lugh::Autograd, Lugh::Optimizer, Lugh::LoRA, Lugh::Inference, Lugh::Tokenizer

AUTHOR

LNATION <email@lnation.org>

LICENSE

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.