NAME
Lugh::Optimizer::SGD - Stochastic Gradient Descent optimizer
SYNOPSIS
use Lugh;
use Lugh::Autograd;
# Create context and parameter tensor
my $ctx = Lugh::Context->new(mem_size => 64 * 1024 * 1024);
my $weights = Lugh::Autograd::Tensor->new($ctx, 'f32', 10, {
    requires_grad => 1,
});
$weights->set_data((0.5) x 10);
# Create SGD optimizer
my $optimizer = Lugh::Optimizer::SGD->new(
    lr       => 0.01,
    momentum => 0.9,
);
# Register parameters
$optimizer->add_param($weights);
# Training loop
for my $epoch (1..100) {
    $optimizer->zero_grad();
    # Forward pass (compute loss)
    my $loss = compute_loss($weights, $data);
    # Backward pass
    $loss->backward();
    # Update parameters
    $optimizer->step();
}
DESCRIPTION
Lugh::Optimizer::SGD implements Stochastic Gradient Descent with optional momentum and Nesterov acceleration. It is the simplest and one of the most widely used optimizers for training neural networks.
The update rule with momentum is:
v_t = momentum * v_{t-1} + gradient
param = param - lr * v_t
With Nesterov momentum:
v_t = momentum * v_{t-1} + gradient
param = param - lr * (momentum * v_t + gradient)
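As an illustration of the rules above (a plain-Perl sketch for a single scalar parameter, not Lugh's actual implementation, which applies the same update element-wise to whole tensors):
my ($param, $velocity) = (0.5, 0.0);
my ($lr, $momentum, $nesterov) = (0.01, 0.9, 0);
my $gradient = 0.2;    # in real training this comes from backward()

$velocity = $momentum * $velocity + $gradient;    # v_t = momentum * v_{t-1} + gradient
if ($nesterov) {
    $param -= $lr * ($momentum * $velocity + $gradient);    # Nesterov look-ahead step
}
else {
    $param -= $lr * $velocity;                              # param = param - lr * v_t
}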
CONSTRUCTOR
new
my $optimizer = Lugh::Optimizer::SGD->new(%options);
Creates a new SGD optimizer.
Options:
lr (default: 0.001)
Learning rate. Controls the step size for parameter updates.
momentum (default: 0)
Momentum factor. Set to 0.9 or 0.99 for faster convergence.
weight_decay (default: 0)
L2 regularization coefficient. Adds a penalty proportional to the squared magnitude of the parameters (see the note after this list).
nesterov (default: 0)
If true, use Nesterov momentum instead of classical momentum. Nesterov momentum often provides better convergence.
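With weight_decay, the standard L2 formulation adds the decay term to the gradient before the momentum update shown above; this is the assumed behaviour, written in the same notation as the update rules (the exact order inside Lugh's step() may differ):
gradient = gradient + weight_decay * param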
Examples:
# Basic SGD
my $sgd = Lugh::Optimizer::SGD->new(lr => 0.01);
# SGD with momentum
my $sgd = Lugh::Optimizer::SGD->new(
    lr       => 0.01,
    momentum => 0.9,
);
# SGD with Nesterov momentum and weight decay
my $sgd = Lugh::Optimizer::SGD->new(
    lr           => 0.01,
    momentum     => 0.9,
    nesterov     => 1,
    weight_decay => 0.0001,
);
METHODS
add_param
$optimizer->add_param($tensor);
Registers a tensor as a parameter to be optimized. Only tensors with requires_grad => 1 should be added.
Parameters:
$tensor
A Lugh::Autograd::Tensor object with requires_grad enabled.
Example:
my $w1 = Lugh::Autograd::Tensor->new($ctx, 'f32', 10, 10, {
    requires_grad => 1,
});
my $w2 = Lugh::Autograd::Tensor->new($ctx, 'f32', 10, {
    requires_grad => 1,
});
$optimizer->add_param($w1);
$optimizer->add_param($w2);
zero_grad
$optimizer->zero_grad();
Zeros the gradients of all registered parameters. This should be called at the beginning of each training iteration to prevent gradient accumulation.
Example:
for my $batch (@batches) {
    $optimizer->zero_grad();    # Clear gradients
    my $loss = compute_loss($batch);
    $loss->backward();
    $optimizer->step();
}
step
$optimizer->step();
Performs a single optimization step, updating all registered parameters based on their gradients.
Note: Call this only after backward() has been run, so that gradients are available.
get_lr
my $current_lr = $optimizer->get_lr();
Returns the current learning rate.
set_lr
$optimizer->set_lr($new_lr);
Sets a new learning rate. Useful for implementing custom learning rate schedules or manual adjustment during training.
Example:
# Manual learning rate decay
if ($epoch % 30 == 0) {
    my $current_lr = $optimizer->get_lr();
    $optimizer->set_lr($current_lr * 0.1);
}
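set_lr can also drive a hand-rolled schedule. A minimal sketch combining linear warmup with the step decay shown above ($base_lr and $warmup_epochs are illustrative values, not Lugh defaults):
my $base_lr       = 0.01;
my $warmup_epochs = 5;    # illustrative value, not a Lugh default

for my $epoch (1 .. 100) {
    if ($epoch <= $warmup_epochs) {
        # Linear warmup from base_lr/warmup_epochs up to base_lr
        $optimizer->set_lr($base_lr * $epoch / $warmup_epochs);
    }
    elsif ($epoch % 30 == 0) {
        # Step decay every 30 epochs thereafter
        $optimizer->set_lr($optimizer->get_lr() * 0.1);
    }
    # ... zero_grad(), forward pass, backward(), step() ...
}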
HYPERPARAMETER GUIDELINES
Learning Rate
Start with 0.01 or 0.001
If loss oscillates, reduce by 10x
If loss decreases too slowly, increase by 2-10x
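One common way to apply the 10x reduction during training is to cut the learning rate when a held-out loss stops improving. A rough sketch, where $val_loss, $best_loss, $stalled and $patience are bookkeeping variables for illustration only (not part of Lugh):
my ($best_loss, $stalled, $patience) = (9**9**9, 0, 3);    # 9**9**9 == +Inf

# $val_loss is assumed to be computed on held-out data each epoch
if ($val_loss < $best_loss) {
    ($best_loss, $stalled) = ($val_loss, 0);
}
elsif (++$stalled >= $patience) {
    $optimizer->set_lr($optimizer->get_lr() * 0.1);    # reduce by 10x
    $stalled = 0;
}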
Momentum
Use 0.9 as a default for most cases
Try 0.99 for very smooth optimization landscapes
Set to 0 if momentum causes instability
Weight Decay
Use 1e-4 to 1e-5 for regularization
Higher values (1e-2) for strong regularization
Set to 0 if overfitting is not a concern
COMPARISON WITH ADAMW
SGD is simpler and has fewer hyperparameters than AdamW, but may require more tuning of the learning rate schedule. AdamW often works "out of the box" for transformer models, while SGD can achieve better generalization with proper tuning.
SEE ALSO
Lugh::Optimizer::AdamW - Adam optimizer with weight decay
Lugh::Optimizer::LRScheduler - Learning rate scheduling
Lugh::Optimizer - Gradient clipping utilities
Lugh::Autograd - Automatic differentiation
AUTHOR
LNATION <email@lnation.org>
LICENSE
This is free software; you can redistribute it and/or modify it under the same terms as Perl itself.