NAME
AI::MXNet::Executor::Group - Manager for a group of executors working in different contexts.
DESCRIPTION
DataParallelExecutorGroup is a group of executors that lives on a group of devices.
This is a helper class used to implement data parallelization. Each mini-batch will
be split and run on the devices.
Parameters for constructor
----------
symbol : AI::MXNet::Symbol
The common symbolic computation graph for all executors.
contexts : ArrayRef[AI::MXNet::Context]
An array ref of contexts.
workload : ArrayRef[Num]
If not undef, can be an array ref of numbers that specify the workload to be assigned
to the different contexts. Larger numbers indicate heavier workloads.
data_shapes : ArrayRef[NameShape|AI::MXNet::DataDesc]
Should be an array ref of [name, shape] array refs giving the shapes of the data. Note that the order is
important and should be the same as the order in which the `DataIter` provides the data.
label_shapes : Maybe[ArrayRef[NameShape|AI::MXNet::DataDesc]]
Should be an array ref of [$name, $shape] array refs giving the shapes of the labels. Note that the order is
important and should be the same as the order in which the `DataIter` provides the labels.
param_names : ArrayRef[Str]
An array ref of strings indicating the names of the parameters (e.g. weights, filters, etc.)
in the computation graph.
for_training : Bool
Indicates whether the executors should be bound for training. When not training,
memory for gradients will not be allocated.
inputs_need_grad : Bool
Indicates whether gradients with respect to the input data should be computed. This is currently
not used; it will be useful for implementing composition of modules.
shared_group : AI::MXNet::DataParallelExecutorGroup
Default is undef. This is used in bucketing. When not undef, it should be an executor
group corresponding to a different bucket. In other words, it will correspond to a different
symbol with the same set of parameters (e.g. unrolled RNNs with different lengths).
In this case the memory regions of the parameters will be shared.
logger : Logger
Default is AI::MXNet::Logging->get_logger.
fixed_param_names : Maybe[ArrayRef[Str]]
Indicates parameters to be fixed during training. Parameters in this array ref
will not have space allocated for gradients, nor will gradients be calculated for them.
grad_req : ArrayRef[GradReq]|HashRef[GradReq]|GradReq
Requirement for gradient accumulation. Can be 'write', 'add', or 'null'
(defaults to 'write').
Can be specified globally (str) or for each argument (array ref, hash ref).
state_names : Maybe[ArrayRef[Str]]
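Taken together, these constructor arguments can be used as in the following sketch.
This is a minimal, hedged example: the two-layer network, the shapes, and the equal
CPU workload are illustrative assumptions, not prescribed by this module.

    use AI::MXNet qw(mx);

    # A small illustrative network (an assumption for this example)
    my $data    = mx->sym->Variable('data');
    my $fc1     = mx->sym->FullyConnected(data => $data, name => 'fc1', num_hidden => 64);
    my $act1    = mx->sym->Activation(data => $fc1, name => 'relu1', act_type => 'relu');
    my $fc2     = mx->sym->FullyConnected(data => $act1, name => 'fc2', num_hidden => 10);
    my $softmax = mx->sym->SoftmaxOutput(data => $fc2, name => 'softmax');

    # Parameter names are all symbol arguments except the data and label inputs
    my @param_names = grep { $_ ne 'data' and $_ ne 'softmax_label' }
                           @{ $softmax->list_arguments };

    my $exec_group = AI::MXNet::DataParallelExecutorGroup->new(
        symbol       => $softmax,
        contexts     => [mx->cpu(0), mx->cpu(1)],
        workload     => [1, 1],                          # equal split across contexts
        data_shapes  => [['data',          [128, 100]]],
        label_shapes => [['softmax_label', [128]]],
        param_names  => \@param_names,
        for_training => 1
    );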
decide_slices
Decide the slices for each context according to the workload.
Parameters
----------
$data_shapes : ArrayRef[AI::MXNet::DataDesc]
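The split is proportional to the workload. The stand-alone sketch below mirrors that
arithmetic; it is illustrative Perl only, not this module's actual code, and a real
implementation must also assign any rounding remainder.

    my @workload = (2, 1, 1);   # hypothetical per-context weights
    my $batch    = 128;         # total mini-batch size
    my $total    = 0;
    $total += $_ for @workload;
    my ($begin, @slices) = (0);
    for my $w (@workload)
    {
        my $end = $begin + int($batch * $w / $total);
        push @slices, [$begin, $end];
        $begin = $end;
    }
    # @slices is now ([0, 64], [64, 96], [96, 128])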
bind_exec
Bind executors on their respective devices.
Parameters
----------
$data_shapes : ArrayRef[AI::MXNet::DataDesc]
$label_shapes : Maybe[ArrayRef[AI::MXNet::DataDesc]]
$shared_group : Maybe[AI::MXNet::DataParallelExecutorGroup]
$reshape : Bool
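A hedged usage sketch, reusing the $exec_group from the constructor example and the
documented positional arguments; the DataDesc attribute names are assumed from
AI::MXNet::DataDesc:

    $exec_group->bind_exec(
        [AI::MXNet::DataDesc->new(name => 'data',          shape => [128, 100])],
        [AI::MXNet::DataDesc->new(name => 'softmax_label', shape => [128])]
    );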
reshape
Reshape the executors.
Parameters
----------
$data_shapes : ArrayRef[AI::MXNet::DataDesc]
$label_shapes : Maybe[ArrayRef[AI::MXNet::DataDesc]]
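For example, to adapt already-bound executors to a smaller final batch (a hedged
sketch in the same spirit as above):

    $exec_group->reshape(
        [AI::MXNet::DataDesc->new(name => 'data',          shape => [32, 100])],
        [AI::MXNet::DataDesc->new(name => 'softmax_label', shape => [32])]
    );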
set_params
Assign (i.e. copy) parameters to all the executors.
Parameters
----------
$arg_params : HashRef[AI::MXNet::NDArray]
A hash ref mapping names to AI::MXNet::NDArray parameters.
$aux_params : HashRef[AI::MXNet::NDArray]
A hash ref mapping names to AI::MXNet::NDArray auxiliary variables.
get_params
Copy data from each executor to arg_params and aux_params.
Parameters
----------
$arg_params : HashRef[AI::MXNet::NDArray]
Target parameter arrays.
$aux_params : HashRef[AI::MXNet::NDArray]
Target aux arrays.
Notes
-----
- This function updates the NDArrays in arg_params and aux_params in place.
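A hedged round-trip sketch: push host-side parameter hashes to every device with
set_params, then copy the (possibly updated) device values back with get_params.
The contents of %arg_params and %aux_params are assumed to come from elsewhere,
e.g. an initializer or a loaded checkpoint.

    $exec_group->set_params(\%arg_params, \%aux_params);
    # ... run some training ...
    $exec_group->get_params(\%arg_params, \%aux_params);  # hashes updated in place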
forward
Split the data_batch according to the workload and run forward on each device.
Parameters
----------
data_batch : AI::MXNet::DataBatch
Or any object implementing a similar interface.
is_train : Bool
A hint for the backend, indicating whether we are in the training phase.
Default is undef, in which case the value of $self->for_training is used.
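A hedged usage sketch with a hand-built batch whose shapes match the constructor
example; data and label are array refs of NDArrays, one per declared input:

    my $batch = AI::MXNet::DataBatch->new(
        data  => [mx->nd->ones([128, 100])],
        label => [mx->nd->zeros([128])]
    );
    $exec_group->forward($batch, 1);   # is_train = 1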
get_outputs
Get the outputs of the previous forward computation.
Parameters
----------
merge_multi_context : Bool
Default is 1. When data-parallelism is used, the outputs
will be collected from multiple devices. A value of 1 indicates that we
should merge the collected results so that they look like they came from a single
executor.
Returns
-------
If merge_multi_context is 1, it is [$out1, $out2]. Otherwise, it
is [[$out1_dev1, $out1_dev2], [$out2_dev1, $out2_dev2]]. All the output
elements are AI::MXNet::NDArray objects.
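For example (hedged; assumes the forward call shown earlier has run):

    my $outs = $exec_group->get_outputs(1);   # merged across devices
    print $outs->[0]->aspdl, "\n";            # first output as a PDL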
get_input_grads
Get the gradients with respect to the inputs of the module.
Parameters
----------
merge_multi_context : Bool
Default is 1. When data-parallelism is used, the gradients
will be collected from multiple devices. A value of 1 indicates that we
should merge the collected results so that they look like they came from a single
executor.
Returns
-------
If merge_multi_context is 1, it is [$grad1, $grad2]. Otherwise, it
is [[$grad1_dev1, $grad1_dev2], [$grad2_dev1, $grad2_dev2]]. All the output
elements are AI::MXNet::NDArray objects.
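A hedged sketch; this is only meaningful if the group was constructed with
inputs_need_grad => 1 and backward has already run:

    my $input_grads = $exec_group->get_input_grads(1);   # merged across devices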
backward
Run backward on all devices. backward should be called
after a call to the forward function. backward cannot be called
unless $self->for_training is 1.
Parameters
----------
out_grads : NDArray or array ref of NDArray, optional
Gradients on the outputs to be propagated back.
This parameter is only needed when bind was called
on outputs that are not a loss function.
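A hedged sketch of one training step; because SoftmaxOutput in the constructor
example already produces a loss, no out_grads are needed:

    $exec_group->forward($batch, 1);   # forward in training mode
    $exec_group->backward;             # gradients accumulate per grad_req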
update_metric
Accumulate the performance according to eval_metric on all devices.
Parameters
----------
eval_metric : AI::MXNet::EvalMetric
The metric used for evaluation.
labels : array ref of NDArray
Typically comes from the label of an AI::MXNet::DataBatch.
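For example (hedged; 'acc' is assumed to name the built-in accuracy metric):

    my $metric = mx->metric->create('acc');
    $exec_group->update_metric($metric, $batch->label);
    my ($name, $value) = $metric->get;
    print "$name: $value\n";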
_sliced_shape
Get the sliced shapes for the i-th executor.
Parameters
----------
shapes : array ref of (str, array ref)
The original (name, shape) pairs.
i : int
Which executor we are dealing with.
install_monitor
Install a monitor on all executors.
Parameters
----------
$mon : AI::MXNet::Monitor
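A hedged usage sketch, assuming AI::MXNet::Monitor's interval-based constructor:

    my $mon = AI::MXNet::Monitor->new(interval => 1);
    $exec_group->install_monitor($mon);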