NAME

Lugh::Optimizer::SGD - Stochastic Gradient Descent optimizer

SYNOPSIS

use Lugh;
use Lugh::Autograd;
use Lugh::Optimizer::SGD;

# Create context and parameter tensor
my $ctx = Lugh::Context->new(mem_size => 64 * 1024 * 1024);
my $weights = Lugh::Autograd::Tensor->new($ctx, 'f32', 10, {
    requires_grad => 1,
});
$weights->set_data((0.5) x 10);

# Create SGD optimizer
my $optimizer = Lugh::Optimizer::SGD->new(
    lr       => 0.01,
    momentum => 0.9,
);

# Register parameters
$optimizer->add_param($weights);

# Training loop
for my $epoch (1..100) {
    $optimizer->zero_grad();

    # Forward pass (compute loss)
    my $loss = compute_loss($weights, $data);

    # Backward pass
    $loss->backward();

    # Update parameters
    $optimizer->step();
}

DESCRIPTION

Lugh::Optimizer::SGD implements Stochastic Gradient Descent with optional momentum and Nesterov acceleration. It is one of the simplest and most widely used optimizers for training neural networks.

The update rule with momentum is:

v_t = momentum * v_{t-1} + gradient
param = param - lr * v_t

With Nesterov momentum:

v_t = momentum * v_{t-1} + gradient
param = param - lr * (momentum * v_t + gradient)
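
As an illustration only, here are the same rules written out in plain Perl for a single scalar parameter (a sketch; the real step() operates on whole tensors and keeps the velocity state internally):

sub sgd_update {
    my ($param, $grad, $v, $lr, $momentum, $nesterov) = @_;

    $v = $momentum * $v + $grad;         # v_t = momentum * v_{t-1} + gradient
    my $update = $nesterov
        ? $momentum * $v + $grad         # Nesterov: look-ahead combination
        : $v;                            # classical momentum

    $param -= $lr * $update;
    return ($param, $v);                 # caller carries $v to the next step
}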

CONSTRUCTOR

new

my $optimizer = Lugh::Optimizer::SGD->new(%options);

Creates a new SGD optimizer.

Options:

lr (default: 0.001)

Learning rate. Controls the step size for parameter updates.

momentum (default: 0)

Momentum factor. Typical values of 0.9 or 0.99 usually speed up convergence.

weight_decay (default: 0)

L2 regularization coefficient. Adds a penalty proportional to the squared magnitude of parameters.

nesterov (default: 0)

If true, use Nesterov momentum instead of classical momentum. Nesterov momentum often provides better convergence.
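
If weight_decay is non-zero, the conventional SGD formulation (assumed here, not verified against the implementation) adds the penalty's derivative to the gradient before the momentum update:

gradient = gradient + weight_decay * param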

Examples:

# Basic SGD
my $sgd = Lugh::Optimizer::SGD->new(lr => 0.01);

# SGD with momentum
my $sgd = Lugh::Optimizer::SGD->new(
    lr       => 0.01,
    momentum => 0.9,
);

# SGD with Nesterov momentum and weight decay
my $sgd = Lugh::Optimizer::SGD->new(
    lr           => 0.01,
    momentum     => 0.9,
    nesterov     => 1,
    weight_decay => 0.0001,
);

METHODS

add_param

$optimizer->add_param($tensor);

Registers a tensor as a parameter to be optimized. Only tensors with requires_grad => 1 should be added.

Parameters:

$tensor

A Lugh::Autograd::Tensor object with requires_grad enabled.

Example:

my $w1 = Lugh::Autograd::Tensor->new($ctx, 'f32', 10, 10, {
    requires_grad => 1,
});
my $w2 = Lugh::Autograd::Tensor->new($ctx, 'f32', 10, {
    requires_grad => 1,
});

$optimizer->add_param($w1);
$optimizer->add_param($w2);

zero_grad

$optimizer->zero_grad();

Zeros the gradients of all registered parameters. This should be called at the beginning of each training iteration to prevent gradient accumulation.

Example:

for my $batch (@batches) {
    $optimizer->zero_grad();  # Clear gradients

    my $loss = compute_loss($batch);
    $loss->backward();
    $optimizer->step();
}
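
Because gradients keep accumulating until zero_grad() is called, the same two methods can also be used to deliberately accumulate gradients over several mini-batches before a single update. A sketch, using only the methods documented here:

my $accum_steps = 4;    # illustrative accumulation factor
my $i = 0;

for my $batch (@batches) {
    my $loss = compute_loss($batch);
    $loss->backward();               # gradients add up across batches

    if (++$i % $accum_steps == 0) {
        $optimizer->step();          # one update per $accum_steps batches
        $optimizer->zero_grad();     # clear before the next group
    }
}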

step

$optimizer->step();

Performs a single optimization step, updating all registered parameters based on their gradients.

Note: Call this only after backward() has computed the gradients.

get_lr

my $current_lr = $optimizer->get_lr();

Returns the current learning rate.

set_lr

$optimizer->set_lr($new_lr);

Sets a new learning rate. Useful for implementing custom learning rate schedules or manual adjustment during training.

Example:

# Manual learning rate decay
if ($epoch % 30 == 0) {
    my $current_lr = $optimizer->get_lr();
    $optimizer->set_lr($current_lr * 0.1);
}
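
set_lr() is also sufficient for a hand-rolled warmup. A sketch (variable names are illustrative, not part of the Lugh API; see Lugh::Optimizer::LRScheduler for learning rate scheduling support):

my $base_lr      = 0.01;
my $warmup_steps = 500;

for my $step (1 .. $total_steps) {
    # Linear warmup over the first $warmup_steps updates
    $optimizer->set_lr($base_lr * $step / $warmup_steps)
        if $step <= $warmup_steps;

    # ... zero_grad(), forward, backward(), step() as usual ...
}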

HYPERPARAMETER GUIDELINES

Learning Rate

  • Start with 0.01 or 0.001

  • If loss oscillates, reduce by 10x

  • If loss decreases too slowly, increase by 2-10x

Momentum

  • Use 0.9 as a default for most cases

  • Try 0.99 for very smooth optimization landscapes

  • Set to 0 if momentum causes instability

Weight Decay

  • Use 1e-4 to 1e-5 for regularization

  • Higher values (1e-2) for strong regularization

  • Set to 0 if overfitting is not a concern
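
Putting the learning rate guidance above into practice, a common pattern is to cut the learning rate by 10x once the loss stops improving. A sketch built only on get_lr() and set_lr() (train_one_epoch() is an illustrative helper, not part of Lugh):

my $best_loss = 9**9**9;   # effectively +infinity
my $patience  = 5;         # epochs to wait before decaying
my $stall     = 0;

for my $epoch (1 .. 100) {
    my $loss = train_one_epoch($optimizer);

    if ($loss < $best_loss) {
        $best_loss = $loss;
        $stall     = 0;
    }
    elsif (++$stall >= $patience) {
        $optimizer->set_lr($optimizer->get_lr() * 0.1);   # reduce by 10x
        $stall = 0;
    }
}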

COMPARISON WITH ADAMW

SGD is simpler and has fewer hyperparameters than AdamW, but may require more tuning of the learning rate schedule. AdamW often works "out of the box" for transformer models, while SGD can achieve better generalization with proper tuning.

SEE ALSO

Lugh::Optimizer::AdamW - Adam optimizer with weight decay

Lugh::Optimizer::LRScheduler - Learning rate scheduling

Lugh::Optimizer - Gradient clipping utilities

Lugh::Autograd - Automatic differentiation

AUTHOR

LNATION <email@lnation.org>

LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.