Common optimization algorithms with regularization.


Create an optimizer with specified name.

name: str
    Name of required optimizer. Should be the name
    of a subclass of Optimizer. Case insensitive.

rescale_grad : float
    Rescaling factor on gradient. Normally should be 1/batch_size.

kwargs: dict
    Parameters for optimizer

opt : Optimizer
    The resulting optimizer.
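
The case-insensitive lookup described above can be sketched as a small class registry. This is only an illustration in Python: `_REGISTRY`, `register`, and `create_optimizer` are hypothetical names chosen for the example, not the module's actual interface.

```python
# Hypothetical registry of optimizer classes, keyed by lowercased class name.
_REGISTRY = {}


def register(cls):
    """Record an optimizer class under its lowercased name."""
    _REGISTRY[cls.__name__.lower()] = cls
    return cls


class Optimizer:
    def __init__(self, rescale_grad=1.0, **kwargs):
        self.rescale_grad = rescale_grad
        self.kwargs = kwargs  # remaining optimizer-specific parameters


@register
class SGD(Optimizer):
    pass


def create_optimizer(name, rescale_grad=1.0, **kwargs):
    """Look up `name` case-insensitively and construct the optimizer."""
    key = name.lower()
    if key not in _REGISTRY:
        raise ValueError("Cannot find optimizer %s" % name)
    return _REGISTRY[key](rescale_grad=rescale_grad, **kwargs)


batch_size = 32
opt = create_optimizer("sgd", rescale_grad=1.0 / batch_size, momentum=0.9)
```

Passing `"SGD"` or `"Sgd"` would resolve to the same class, since only the lowercased name is used as the key.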


Set individual learning rate multipliers for parameters.

args_lr_mult : dict of string/int to float
    Sets the lr multiplier for each name/index to the given float.
    Setting a multiplier by index is supported for backward compatibility,
    but we recommend using name and symbol.


Set individual weight decay multipliers for parameters.
By default, the wd multiplier is 0 for all parameters whose names do not
end with _weight, if param_idx2name is provided.

args_wd_mult : dict of string/int to float
    Sets the wd multiplier for each name/index to the given float.
    Setting a multiplier by index is supported for backward compatibility,
    but we recommend using name and symbol.
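
A minimal Python sketch of the default rule stated above, assuming `param_idx2name` maps integer indices to parameter names. The helper `default_wd_mult` and the parameter names are hypothetical, used only for illustration.

```python
def default_wd_mult(param_idx2name):
    """Give a wd multiplier of 0 to every name not ending in '_weight'."""
    wd_mult = {}
    for name in param_idx2name.values():
        if not name.endswith("_weight"):
            wd_mult[name] = 0.0
    return wd_mult


# Illustrative parameter names (assumed, not taken from the source).
idx2name = {0: "fc1_weight", 1: "fc1_bias", 2: "bn1_gamma"}

wd_mult = default_wd_mult(idx2name)
# User-supplied multipliers are applied on top of the defaults.
wd_mult.update({"fc1_weight": 0.5})
```

Names without an entry are assumed to keep a multiplier of 1, so only the exceptions need to be stored.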


Update num_update.

index : int
    The index of the parameter being updated.


Get the learning rate for an index.

index : int
    The index of the weight.

lr : float
    learning rate for this index
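
One plausible way to combine a base learning rate with the per-parameter multipliers is sketched below in Python. The function `get_lr` and its arguments are assumptions made for this example; the source does not spell out the lookup order, so index-first is a choice made here.

```python
def get_lr(index, base_lr, lr_mult, idx2name):
    """Effective learning rate for one parameter index (illustrative)."""
    if index in lr_mult:
        # Multiplier was set directly by index (backward-compatible path).
        return base_lr * lr_mult[index]
    name = idx2name.get(index)
    # Multiplier set by name, or 1.0 when nothing was configured.
    return base_lr * lr_mult.get(name, 1.0)


idx2name = {0: "fc1_weight", 1: "fc1_bias"}
lr_by_name = get_lr(1, 0.5, {"fc1_bias": 2.0}, idx2name)
lr_by_index = get_lr(0, 0.5, {0: 4.0}, idx2name)
lr_default = get_lr(0, 0.5, {}, idx2name)
```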


Get the weight decay for an index.
Returns 0 for non-weights if the names of the weights were provided to __init__.

index : int
    The index of the weight.

wd : float
    weight decay for this index


A very simple SGD optimizer with momentum and weight regularization.

learning_rate : float, optional
    Learning rate of SGD.

momentum : float, optional
    Momentum value.

wd : float, optional
    L2 regularization coefficient added to all the weights.

rescale_grad : float, optional
    Rescaling factor of the gradient. Normally should be 1/batch_size.

clip_gradient : float, optional
    Clip the gradient to the range [-clip_gradient, clip_gradient].

param_idx2name : dict of int to string, optional
    Maps parameter index to name. Used to give special weight decay
    treatment to parameters whose names end with bias, gamma, or beta.


Create additional optimizer state such as momentum.

weight : NDArray
    The weight data


Update the parameters.

index : int
    A unique integer key used to index the parameters

weight : NDArray
    weight ndarray

grad : NDArray
    grad ndarray

state : NDArray or other objects returned by init_state
    The auxiliary state used in optimization.
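
The parameters listed above fit together roughly as in the following Python sketch, which uses plain lists in place of NDArray. The source does not state the exact update formula, so this is one common momentum-SGD formulation under that assumption; `sgd_update` is a hypothetical helper.

```python
def sgd_update(weight, grad, mom, lr=0.01, momentum=0.0, wd=0.0,
               rescale_grad=1.0, clip_gradient=None):
    """One SGD step with momentum; returns (new_weight, new_momentum)."""
    new_weight, new_mom = [], []
    for w, g, m in zip(weight, grad, mom):
        g = g * rescale_grad                      # e.g. 1 / batch_size
        if clip_gradient is not None:
            g = max(-clip_gradient, min(clip_gradient, g))
        m = momentum * m - lr * (g + wd * w)      # wd acts as L2 regularization
        new_weight.append(w + m)
        new_mom.append(m)
    return new_weight, new_mom


# The momentum buffer plays the role of the state created by init_state.
w, m = [1.0], [0.0]
w, m = sgd_update(w, [4.0], m, lr=0.5, momentum=0.5,
                  rescale_grad=0.5, clip_gradient=1.0)
```

In this sketch the gradient is rescaled first and clipped second, so `clip_gradient` bounds the rescaled value.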




DCASGD optimizer with momentum and weight regularization.

Implements the paper "Asynchronous Stochastic Gradient Descent with
Delay Compensation for Distributed Deep Learning".

learning_rate : float, optional
    Learning rate of SGD.

momentum : float, optional
    Momentum value.

lamda : float, optional
    Scale of the delay compensation (DC) term.

wd : float, optional
    L2 regularization coefficient added to all the weights.

rescale_grad : float, optional
    Rescaling factor of the gradient. Normally should be 1/batch_size.

clip_gradient : float, optional
    Clip the gradient to the range [-clip_gradient, clip_gradient].

param_idx2name : hash ref of int to string, optional
    Maps parameter index to name. Used to give special weight decay
    treatment to parameters whose names end with bias, gamma, or beta.


Create additional optimizer state such as momentum.

weight : NDArray
    The weight data


Update the parameters.

index : int
    A unique integer key used to index the parameters

weight : NDArray
    weight ndarray

grad : NDArray
    grad ndarray

state : NDArray or other objects returned by init_state
    The auxiliary state used in optimization.
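
As a rough illustration of how `lamda` scales the delay compensation term, the Python sketch below adds a correction proportional to the squared gradient times the distance the weight has moved since the previous update. The exact formula is an assumption based on the delay compensation idea, not something stated in the source; `dcasgd_update` is a hypothetical helper and plain lists stand in for NDArray.

```python
def dcasgd_update(weight, grad, mom, prev_weight, lr=0.01, momentum=0.0,
                  lamda=0.04, wd=0.0, rescale_grad=1.0, clip_gradient=None):
    """One delay-compensated SGD step (illustrative formulation).

    Returns (new_weight, new_momentum, new_prev_weight).
    """
    new_w, new_m, new_prev = [], [], []
    for w, g, m, pw in zip(weight, grad, mom, prev_weight):
        g = g * rescale_grad
        if clip_gradient is not None:
            g = max(-clip_gradient, min(clip_gradient, g))
        dc = lamda * g * g * (w - pw)          # delay compensation term
        m = momentum * m - lr * (g + wd * w + dc)
        new_prev.append(w)                     # remember weight before the step
        new_w.append(w + m)
        new_m.append(m)
    return new_w, new_m, new_prev


# State here is assumed to hold both the momentum and the previous weight.
w, m, prev = dcasgd_update([1.0], [2.0], [0.0], [0.5], lr=0.5, lamda=0.5)
```

With `lamda` set to 0 the correction vanishes and the step reduces to ordinary momentum SGD.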
