NAME
AI::MXNet::Gluon::RNN::RNN
DESCRIPTION
Applies a multi-layer Elman RNN with `tanh` or `ReLU` non-linearity to an
input sequence. For each element in the input sequence, each layer computes
the following function:
.. math::
h_t = \tanh(w_{ih} * x_t + b_{ih} + w_{hh} * h_{(t-1)} + b_{hh})
where :math:`h_t` is the hidden state at time `t`, and :math:`x_t` is the
hidden state of the previous layer at time `t`, or :math:`input_t` for the
first layer. If `activation` is set to 'relu', `ReLU` is used instead of `tanh`.
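The recurrence above is just an affine map of the current input and the
previous hidden state followed by a pointwise non-linearity. A minimal NumPy
sketch of one step of one layer (sizes, weights, and the `rnn_step` helper
are illustrative only, not the module's internals):

    import numpy as np

    # Hypothetical sizes for illustration only.
    input_size, hidden_size = 10, 100
    rng = np.random.default_rng(0)

    w_ih = rng.standard_normal((hidden_size, input_size))
    w_hh = rng.standard_normal((hidden_size, hidden_size))
    b_ih = np.zeros(hidden_size)
    b_hh = np.zeros(hidden_size)

    def rnn_step(x_t, h_prev, activation=np.tanh):
        # h_t = activation(w_ih . x_t + b_ih + w_hh . h_{t-1} + b_hh)
        return activation(w_ih @ x_t + b_ih + w_hh @ h_prev + b_hh)

    x_t = rng.standard_normal(input_size)
    h1 = rnn_step(x_t, np.zeros(hidden_size))  # one time step of one layer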
Parameters
----------
hidden_size: int
    The number of features in the hidden state h.
num_layers: int, default 1
    Number of recurrent layers.
activation: {'relu' or 'tanh'}, default 'tanh'
    The activation function to use.
layout : str, default 'TNC'
    The format of input and output tensors. T, N and C stand for
    sequence length, batch size, and feature dimensions respectively.
dropout: float, default 0
    If non-zero, introduces a dropout layer on the outputs of each
    RNN layer except the last layer.
bidirectional: bool, default False
    If `True`, becomes a bidirectional RNN.
i2h_weight_initializer : str or Initializer
    Initializer for the input weights matrix, used for the linear
    transformation of the inputs.
h2h_weight_initializer : str or Initializer
    Initializer for the recurrent weights matrix, used for the linear
    transformation of the recurrent state.
i2h_bias_initializer : str or Initializer
    Initializer for the bias vector.
h2h_bias_initializer : str or Initializer
    Initializer for the bias vector.
input_size: int, default 0
    The number of expected features in the input x.
    If not specified, it will be inferred from input.
prefix : str or None
    Prefix of this `Block`.
params : ParameterDict or None
    Shared Parameters for this `Block`.
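For instance, a constructor call that sets several of the parameters above
might look like the following (a sketch in the same Python-style notation as
the Examples section below; only argument names documented above are used):

>>> layer = mx.gluon.rnn.RNN(100, num_layers=3, activation='relu',
...                          layout='TNC', dropout=0.2, bidirectional=True)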
Input shapes:
    The input shape depends on `layout`. For `layout='TNC'`, the input
    has shape `(sequence_length, batch_size, input_size)`.
Output shape:
    The output shape depends on `layout`. For `layout='TNC'`, the output
    has shape `(sequence_length, batch_size, num_hidden)`.
    If `bidirectional` is True, the output shape will instead be
    `(sequence_length, batch_size, 2*num_hidden)`.
Recurrent state:
    The recurrent state is an NDArray with shape
    `(num_layers, batch_size, num_hidden)`.
    If `bidirectional` is True, the recurrent state shape will instead be
    `(2*num_layers, batch_size, num_hidden)`.
    If the input recurrent state is None, zeros are used as the default
    begin states, and the output recurrent state is omitted.
Examples
--------
>>> layer = mx.gluon.rnn.RNN(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)
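As a further illustration of the shape rules above, a bidirectional layer
doubles the feature dimension of the output (a sketch in the same style as
the examples; the expected shape follows directly from the Output shape
description):

>>> blayer = mx.gluon.rnn.RNN(100, 3, bidirectional=True)
>>> blayer.initialize()
>>> boutput = blayer(input)
>>> # per the shape rules above, boutput.shape should be (5, 3, 200)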
NAME
AI::MXNet::Gluon::RNN::LSTM
DESCRIPTION
Applies a multi-layer long short-term memory (LSTM) RNN to an input sequence.
For each element in the input sequence, each layer computes the following
function:
.. math::
\begin{array}{ll}
i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
f_t = sigmoid(W_{if} x_t + b_{if} + W_{hf} h_{(t-1)} + b_{hf}) \\
g_t = \tanh(W_{ig} x_t + b_{ig} + W_{hg} h_{(t-1)} + b_{hg}) \\
o_t = sigmoid(W_{io} x_t + b_{io} + W_{ho} h_{(t-1)} + b_{ho}) \\
c_t = f_t * c_{(t-1)} + i_t * g_t \\
h_t = o_t * \tanh(c_t)
\end{array}
where :math:`h_t` is the hidden state at time `t`, :math:`c_t` is the
cell state at time `t`, :math:`x_t` is the hidden state of the previous
layer at time `t`, or :math:`input_t` for the first layer, and :math:`i_t`,
:math:`f_t`, :math:`g_t`, :math:`o_t` are the input, forget, cell, and
output gates, respectively.
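All four gates are computed from the same pair of inputs (the current input
x_t and the previous hidden state h_{t-1}); only the cell state update and
the output use the extra element-wise operations. A minimal NumPy sketch of
one LSTM step (sizes, weight names, and the `lstm_step` helper are
illustrative only, not the module's internals):

    import numpy as np

    # Hypothetical sizes for illustration only.
    input_size, hidden_size = 10, 100
    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One input-to-hidden and one hidden-to-hidden weight per gate (i, f, g, o).
    W_i = {g: rng.standard_normal((hidden_size, input_size)) for g in "ifgo"}
    W_h = {g: rng.standard_normal((hidden_size, hidden_size)) for g in "ifgo"}
    b_i = {g: np.zeros(hidden_size) for g in "ifgo"}
    b_h = {g: np.zeros(hidden_size) for g in "ifgo"}

    def lstm_step(x_t, h_prev, c_prev):
        i_t = sigmoid(W_i["i"] @ x_t + b_i["i"] + W_h["i"] @ h_prev + b_h["i"])
        f_t = sigmoid(W_i["f"] @ x_t + b_i["f"] + W_h["f"] @ h_prev + b_h["f"])
        g_t = np.tanh(W_i["g"] @ x_t + b_i["g"] + W_h["g"] @ h_prev + b_h["g"])
        o_t = sigmoid(W_i["o"] @ x_t + b_i["o"] + W_h["o"] @ h_prev + b_h["o"])
        c_t = f_t * c_prev + i_t * g_t     # new cell state
        h_t = o_t * np.tanh(c_t)           # new hidden state
        return h_t, c_t

    x_t = rng.standard_normal(input_size)
    h_t, c_t = lstm_step(x_t, np.zeros(hidden_size), np.zeros(hidden_size))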
Parameters
----------
hidden_size: int
    The number of features in the hidden state h.
num_layers: int, default 1
    Number of recurrent layers.
layout : str, default 'TNC'
    The format of input and output tensors. T, N and C stand for
    sequence length, batch size, and feature dimensions respectively.
dropout: float, default 0
    If non-zero, introduces a dropout layer on the outputs of each
    RNN layer except the last layer.
bidirectional: bool, default False
    If `True`, becomes a bidirectional RNN.
i2h_weight_initializer : str or Initializer
    Initializer for the input weights matrix, used for the linear
    transformation of the inputs.
h2h_weight_initializer : str or Initializer
    Initializer for the recurrent weights matrix, used for the linear
    transformation of the recurrent state.
i2h_bias_initializer : str or Initializer, default 'lstmbias'
    Initializer for the bias vector. By default, the bias for the forget
    gate is initialized to 1 while all other biases are initialized to zero.
h2h_bias_initializer : str or Initializer
    Initializer for the bias vector.
input_size: int, default 0
    The number of expected features in the input x.
    If not specified, it will be inferred from input.
prefix : str or None
    Prefix of this `Block`.
params : `ParameterDict` or `None`
    Shared Parameters for this `Block`.
Input shapes:
    The input shape depends on `layout`. For `layout='TNC'`, the input
    has shape `(sequence_length, batch_size, input_size)`.
Output shape:
    The output shape depends on `layout`. For `layout='TNC'`, the output
    has shape `(sequence_length, batch_size, num_hidden)`.
    If `bidirectional` is True, the output shape will instead be
    `(sequence_length, batch_size, 2*num_hidden)`.
Recurrent state:
    The recurrent state is a list of two NDArrays, both with shape
    `(num_layers, batch_size, num_hidden)`.
    If `bidirectional` is True, each recurrent state will instead have shape
    `(2*num_layers, batch_size, num_hidden)`.
    If the input recurrent state is None, zeros are used as the default
    begin states, and the output recurrent state is omitted.
Examples
--------
>>> layer = mx.gluon.rnn.LSTM(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> c0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, [h0, c0])
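Because the LSTM recurrent state is a list of two NDArrays (hidden state and
cell state), the `hn` returned above is likewise a pair; per the Recurrent
state description, each element should have shape (3, 3, 100) in this
example (a sketch in the same style as above):

>>> # hn == [final hidden state, final cell state]
>>> hn[0].shape  # expected (3, 3, 100)
>>> hn[1].shape  # expected (3, 3, 100)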
NAME
AI::MXNet::Gluon::RNN::GRU
DESCRIPTION
Applies a multi-layer gated recurrent unit (GRU) RNN to an input sequence.
For each element in the input sequence, each layer computes the following
function:
.. math::
\begin{array}{ll}
r_t = sigmoid(W_{ir} x_t + b_{ir} + W_{hr} h_{(t-1)} + b_{hr}) \\
i_t = sigmoid(W_{ii} x_t + b_{ii} + W_{hi} h_{(t-1)} + b_{hi}) \\
n_t = \tanh(W_{in} x_t + b_{in} + r_t * (W_{hn} h_{(t-1)}+ b_{hn})) \\
h_t = (1 - i_t) * n_t + i_t * h_{(t-1)} \\
\end{array}
where :math:`h_t` is the hidden state at time `t`, :math:`x_t` is the hidden
state of the previous layer at time `t`, or :math:`input_t` for the first
layer, and :math:`r_t`, :math:`i_t`, :math:`n_t` are the reset, input, and
new gates, respectively.
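The reset gate r_t scales the recurrent contribution inside the candidate
state n_t, and the input gate i_t interpolates between the candidate and the
previous hidden state. A minimal NumPy sketch of one GRU step (sizes, weight
names, and the `gru_step` helper are illustrative only, not the module's
internals):

    import numpy as np

    # Hypothetical sizes for illustration only.
    input_size, hidden_size = 10, 100
    rng = np.random.default_rng(0)

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One input-to-hidden and one hidden-to-hidden weight per gate (r, i, n).
    W_i = {g: rng.standard_normal((hidden_size, input_size)) for g in "rin"}
    W_h = {g: rng.standard_normal((hidden_size, hidden_size)) for g in "rin"}
    b_i = {g: np.zeros(hidden_size) for g in "rin"}
    b_h = {g: np.zeros(hidden_size) for g in "rin"}

    def gru_step(x_t, h_prev):
        r_t = sigmoid(W_i["r"] @ x_t + b_i["r"] + W_h["r"] @ h_prev + b_h["r"])
        i_t = sigmoid(W_i["i"] @ x_t + b_i["i"] + W_h["i"] @ h_prev + b_h["i"])
        n_t = np.tanh(W_i["n"] @ x_t + b_i["n"] + r_t * (W_h["n"] @ h_prev + b_h["n"]))
        return (1 - i_t) * n_t + i_t * h_prev   # new hidden state

    h_t = gru_step(rng.standard_normal(input_size), np.zeros(hidden_size))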
Parameters
----------
hidden_size: int
    The number of features in the hidden state h.
num_layers: int, default 1
    Number of recurrent layers.
layout : str, default 'TNC'
    The format of input and output tensors. T, N and C stand for
    sequence length, batch size, and feature dimensions respectively.
dropout: float, default 0
    If non-zero, introduces a dropout layer on the outputs of each
    RNN layer except the last layer.
bidirectional: bool, default False
    If True, becomes a bidirectional RNN.
i2h_weight_initializer : str or Initializer
    Initializer for the input weights matrix, used for the linear
    transformation of the inputs.
h2h_weight_initializer : str or Initializer
    Initializer for the recurrent weights matrix, used for the linear
    transformation of the recurrent state.
i2h_bias_initializer : str or Initializer
    Initializer for the bias vector.
h2h_bias_initializer : str or Initializer
    Initializer for the bias vector.
input_size: int, default 0
    The number of expected features in the input x.
    If not specified, it will be inferred from input.
prefix : str or None
    Prefix of this `Block`.
params : ParameterDict or None
    Shared Parameters for this `Block`.
Input shapes:
    The input shape depends on `layout`. For `layout='TNC'`, the input
    has shape `(sequence_length, batch_size, input_size)`.
Output shape:
    The output shape depends on `layout`. For `layout='TNC'`, the output
    has shape `(sequence_length, batch_size, num_hidden)`.
    If `bidirectional` is True, the output shape will instead be
    `(sequence_length, batch_size, 2*num_hidden)`.
Recurrent state:
    The recurrent state is an NDArray with shape
    `(num_layers, batch_size, num_hidden)`.
    If `bidirectional` is True, the recurrent state shape will instead be
    `(2*num_layers, batch_size, num_hidden)`.
    If the input recurrent state is None, zeros are used as the default
    begin states, and the output recurrent state is omitted.
Examples
--------
>>> layer = mx.gluon.rnn.GRU(100, 3)
>>> layer.initialize()
>>> input = mx.nd.random.uniform(shape=(5, 3, 10))
>>> # by default zeros are used as begin state
>>> output = layer(input)
>>> # manually specify begin state.
>>> h0 = mx.nd.random.uniform(shape=(3, 3, 100))
>>> output, hn = layer(input, h0)