Recurrent Units¶
These are the units that can be used in a returnn.tf.layers.rec.RecLayer
type of layer.
Common units are:
 BasicLSTM (the cell), via official TF, pure TF implementation
 LSTMBlock (the cell), via tf.contrib.rnn, only TF <=1
 LSTMBlockFused, via tf.contrib.rnn. Should be much faster than BasicLSTM. Only TF <=1.
 CudnnLSTM, via tf.contrib.cudnn_rnn. This is still experimental.
 NativeLSTM, our own native LSTM. Should be faster than LSTMBlockFused, and similar to or faster than CudnnLSTM
 NativeLstm2, our improved native LSTM. Should be the fastest and most powerful
Note that the native implementations cannot be used inside a recurrent subnetwork, as they process the whole sequence at once. A performance comparison of the different LSTM Layers is available here.
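As a rough illustration of how a unit is selected, a RETURNN network definition might look like the sketch below. The layer names, the sizes and the choice of nativelstm2 are assumptions made for this example; they are not taken from this page.

```python
# Sketch of a RETURNN network definition (a plain Python dict).
# Layer names and dimensions are illustrative assumptions.
network = {
    # Recurrent layer that uses the native LSTM implementation.
    "lstm_fwd": {"class": "rec", "unit": "nativelstm2", "n_out": 256, "from": "data"},
    # Same unit, processing the sequence backwards in time.
    "lstm_bwd": {"class": "rec", "unit": "nativelstm2", "n_out": 256,
                 "direction": -1, "from": "data"},
    "output": {"class": "softmax", "loss": "ce", "from": ["lstm_fwd", "lstm_bwd"]},
}
```

Because the native units process the whole sequence at once, they appear here as the unit of a top-level rec layer and not inside a recurrent subnetwork.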
BasicLSTMCell¶

class
tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.
BasicLSTMCell
(num_units, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None, dtype=None, **kwargs)[source]¶ DEPRECATED: Please use tf.compat.v1.nn.rnn_cell.LSTMCell instead.
Basic LSTM recurrent network cell.
The implementation is based on
We add forget_bias (default: 1) to the biases of the forget gate in order to reduce the scale of forgetting in the beginning of the training.
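The effect of forget_bias can be seen with a small calculation. This is only a sketch: the pre-activation value of 0.0 is an assumption standing in for a freshly initialized gate.

```python
import math


def sigmoid(x):
    """Logistic function used by the LSTM gates."""
    return 1.0 / (1.0 + math.exp(-x))


forget_bias = 1.0
pre_activation = 0.0  # assumed gate pre-activation right after initialization

# Fraction of the previous cell state that is kept.
gate_without_bias = sigmoid(pre_activation)
gate_with_bias = sigmoid(pre_activation + forget_bias)
print(round(gate_without_bias, 3), round(gate_with_bias, 3))  # prints: 0.5 0.731
```

With the bias, about 73% of the cell state is kept per step instead of 50%, so less is forgotten early in training.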
It does not allow cell clipping, a projection layer, and does not use peephole connections: it is the basic baseline.
For advanced models, please use the full tf.compat.v1.nn.rnn_cell.LSTMCell that follows.
Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU, or tf.contrib.rnn.LSTMBlockCell and tf.contrib.rnn.LSTMBlockFusedCell for better performance on CPU.
Initialize the basic LSTM cell.
 Args:
 num_units: int, The number of units in the LSTM cell.
 forget_bias: float, The bias added to forget gates (see above). Must set
 to 0.0 manually when restoring from CudnnLSTM-trained checkpoints.
 state_is_tuple: If True, accepted and returned states are 2-tuples of the
 c_state and m_state. If False, they are concatenated along the column axis. The latter behavior will soon be deprecated.
 activation: Activation function of the inner states. Default: tanh. It
 could also be string that is within Keras activation function names.
 reuse: (optional) Python boolean describing whether to reuse variables in
 an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
 name: String, the name of the layer. Layers with the same name will share
 weights, but to avoid mistakes we require reuse=True in such cases.
 dtype: Default dtype of the layer (default of None means use the type of
 the first input). Required when build is called before call.
 **kwargs: Dict, keyword named properties for common layer attributes, like
 trainable etc when constructing the cell from configs of get_config().
When restoring from CudnnLSTM-trained checkpoints, must use CudnnCompatibleLSTMCell instead.

state_size
[source]¶ size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

call
(inputs, state)[source]¶ Long short-term memory cell (LSTM).
 Args:
 inputs: 2-D tensor with shape [batch_size, input_size].
 state: An LSTMStateTuple of state tensors, each shaped [batch_size, num_units], if state_is_tuple has been set to True. Otherwise, a Tensor shaped [batch_size, 2 * num_units].
 Returns:
 A pair containing the new hidden state, and the new state (either a
 LSTMStateTuple or a concatenated state, depending on state_is_tuple).
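The shapes involved in one call can be sketched in NumPy as follows. This is not the TensorFlow implementation: the weight layout, the gate ordering (i, j, f, o) and all sizes are assumptions chosen for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def basic_lstm_step(inputs, state, kernel, bias, forget_bias=1.0):
    """One LSTM step.

    inputs: [batch_size, input_size]
    state: pair (c, h), each [batch_size, num_units]
    Returns the new hidden state and the new (c, h) pair.
    """
    c, h = state
    # A single matrix product yields the pre-activations of all four gates.
    gate_inputs = np.concatenate([inputs, h], axis=1) @ kernel + bias
    i, j, f, o = np.split(gate_inputs, 4, axis=1)  # assumed gate ordering
    new_c = c * sigmoid(f + forget_bias) + sigmoid(i) * np.tanh(j)
    new_h = np.tanh(new_c) * sigmoid(o)
    return new_h, (new_c, new_h)


batch_size, input_size, num_units = 2, 3, 4
rng = np.random.default_rng(0)
kernel = rng.normal(scale=0.1, size=(input_size + num_units, 4 * num_units))
bias = np.zeros(4 * num_units)
state = (np.zeros((batch_size, num_units)), np.zeros((batch_size, num_units)))
output, state = basic_lstm_step(
    rng.normal(size=(batch_size, input_size)), state, kernel, bias)
```

The returned output is the same array as the second element of the new state, which matches the "pair containing the new hidden state, and the new state" described above.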

get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
 Returns:
 Python dictionary.
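The advice to copy the returned dict can be illustrated with a toy example. ToyCell is a hypothetical stand-in written for this sketch; it is not a TensorFlow class.

```python
import copy


class ToyCell:
    """Hypothetical stand-in for a layer; not part of TensorFlow."""

    def __init__(self, num_units):
        self._config = {"num_units": num_units, "name": "toy_cell"}

    def get_config(self):
        # Returns the internal dict itself, i.e. not a fresh copy.
        return self._config


cell = ToyCell(num_units=128)

# Copy first, then modify; the cell's own config stays untouched.
config = copy.deepcopy(cell.get_config())
config["num_units"] = 256
```

Without the copy, the assignment would silently change the dict that the cell itself holds.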
BasicRNNCell¶

class
tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.
BasicRNNCell
(num_units, activation=None, reuse=None, name=None, dtype=None, **kwargs)[source]¶ The most basic RNN cell.
Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnRNNTanh for better performance on GPU.
 Args:
 num_units: int, The number of units in the RNN cell.
 activation: Nonlinearity to use. Default: tanh. It could also be string
 that is within Keras activation function names.
 reuse: (optional) Python boolean describing whether to reuse variables in an
 existing scope. If not True, and the existing scope already has the given variables, an error is raised.
 name: String, the name of the layer. Layers with the same name will share
 weights, but to avoid mistakes we require reuse=True in such cases.
 dtype: Default dtype of the layer (default of None means use the type of
 the first input). Required when build is called before call.
 **kwargs: Dict, keyword named properties for common layer attributes, like
 trainable etc when constructing the cell from configs of get_config().

state_size
[source]¶ size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
 Returns:
 Python dictionary.
BlocksparseLSTMCell¶

class
returnn.tf.layers.rec.
BlocksparseLSTMCell
(*args, **kwargs)[source]¶ Standard LSTM but uses OpenAI blocksparse kernels to support bigger matrices.
Refs:
It uses our own wrapper, see TFNativeOp.init_blocksparse().
BlocksparseMultiplicativeMultistepLSTMCell¶
GRUCell¶

class
tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.
GRUCell
(num_units, activation=None, reuse=None, kernel_initializer=None, bias_initializer=None, name=None, dtype=None, **kwargs)[source]¶ Gated Recurrent Unit cell.
Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnGRU for better performance on GPU, or tf.contrib.rnn.GRUBlockCellV2 for better performance on CPU.
 Args:
 num_units: int, The number of units in the GRU cell.
 activation: Nonlinearity to use. Default: tanh.
 reuse: (optional) Python boolean describing whether to reuse variables in an
 existing scope. If not True, and the existing scope already has the given variables, an error is raised.
 kernel_initializer: (optional) The initializer to use for the weight and
 projection matrices.
 bias_initializer: (optional) The initializer to use for the bias.
 name: String, the name of the layer. Layers with the same name will share
 weights, but to avoid mistakes we require reuse=True in such cases.
 dtype: Default dtype of the layer (default of None means use the type of
 the first input). Required when build is called before call.
 **kwargs: Dict, keyword named properties for common layer attributes, like
 trainable etc when constructing the cell from configs of get_config().
References:
 Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation:
 [Cho et al., 2014] (https://aclanthology.coli.unisaarland.de/papers/D141179/d141179) ([pdf](http://emnlp2014.org/papers/pdf/EMNLP2014179.pdf))

state_size
[source]¶ size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
 Returns:
 Python dictionary.
LayerNormVariantsLSTMCell¶

class
returnn.tf.layers.rec.
LayerNormVariantsLSTMCell
(num_units, norm_gain=1.0, norm_shift=0.0, forget_bias=0.0, activation=<function tanh>, is_training=None, dropout=0.0, dropout_h=0.0, dropout_seed=None, with_concat=False, global_norm=True, global_norm_joined=False, per_gate_norm=False, cell_norm=True, cell_norm_in_output=True, hidden_norm=False, variance_epsilon=1e-12)[source]¶ LSTM unit with layer normalization and recurrent dropout.
This LSTM cell can apply different variants of layer normalization:
1. Layer normalization as in the original paper. Ref: https://arxiv.org/abs/1607.06450
 This can be applied by having: all default params (global_norm=True, cell_norm=True, cell_norm_in_output=True)
2. Layer normalization for RNMT+. Ref: https://arxiv.org/abs/1804.09849
 This can be applied by having: all default params except global_norm = False, per_gate_norm = True, cell_norm_in_output = False
3. TF official LayerNormBasicLSTMCell. Ref: https://www.tensorflow.org/api_docs/python/tf/contrib/rnn/LayerNormBasicLSTMCell
 This can be reproduced by having: all default params except global_norm = False, per_gate_norm = True
4. Sockeye LSTM layer normalization implementations. Ref: https://github.com/awslabs/sockeye/blob/master/sockeye/rnn.py
 LayerNormLSTMCell can be reproduced by having: all default params except with_concat = False (just efficiency, no difference in the model)
 LayerNormPerGateLSTMCell can be reproduced by having: all default params except (with_concat = False), global_norm = False, per_gate_norm = True
 Recurrent dropout is based on:
 https://arxiv.org/abs/1603.05118
Prohibited LN combinations:  global_norm and global_norm_joined both enabled  per_gate_norm with global_norm or global_norm_joined
Parameters:  num_units (int) – number of lstm units
 norm_gain (float) – layer normalization gain value
 norm_shift (float) – layer normalization shift (bias) value
 forget_bias (float) – the bias added to forget gates
 activation – Activation function to be applied in the lstm cell
 is_training (bool) – if True then we are in the training phase
 dropout (float) – dropout rate, applied on cell-in (j)
 dropout_h (float) – dropout rate, applied on hidden state (h) when it enters the LSTM (variational dropout)
 dropout_seed (int) – used to create random seeds
 with_concat (bool) – if True then the input and prev hidden state is concatenated for the computation. this is just about computation performance.
 global_norm (bool) – if True then layer normalization is applied for the forward and recurrent outputs (separately).
 global_norm_joined (bool) – if True, then layer norm is applied on LSTM in (forward and recurrent output together)
 per_gate_norm (bool) – if True then layer normalization is applied per lstm gate
 cell_norm (bool) – if True then layer normalization is applied to the LSTM new cell output
 cell_norm_in_output (bool) – if True, the normalized cell is also used in the output
 hidden_norm (bool) – if True then layer normalization is applied to the LSTM new hidden state output
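The variants and the prohibited combinations described above can be written down as plain flag dictionaries. The helper names below (DEFAULTS, VARIANTS, variant_flags, is_allowed) are hypothetical and exist only in this sketch; the cell itself may validate its arguments differently.

```python
# Defaults of the normalization-related flags, taken from the signature above.
DEFAULTS = dict(
    with_concat=False, global_norm=True, global_norm_joined=False,
    per_gate_norm=False, cell_norm=True, cell_norm_in_output=True,
    hidden_norm=False,
)

# Overrides for the first three variants listed above (names are illustrative).
VARIANTS = {
    "original_paper": {},
    "rnmt_plus": dict(global_norm=False, per_gate_norm=True, cell_norm_in_output=False),
    "tf_layer_norm_basic_lstm": dict(global_norm=False, per_gate_norm=True),
}


def variant_flags(name):
    """Return the full flag set for a named variant."""
    return {**DEFAULTS, **VARIANTS[name]}


def is_allowed(flags):
    """Check the prohibited layer-norm combinations."""
    if flags["global_norm"] and flags["global_norm_joined"]:
        return False
    if flags["per_gate_norm"] and (flags["global_norm"] or flags["global_norm_joined"]):
        return False
    return True
```

For example, is_allowed(variant_flags("rnmt_plus")) is True, while enabling per_gate_norm on top of the defaults is rejected because global_norm is still on.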
LayerRNNCell¶

class
tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.
LayerRNNCell
(trainable=True, name=None, dtype=None, **kwargs)[source]¶ Subclass of RNNCells that act like proper tf.Layer objects.
For backwards compatibility purposes, most RNNCell instances allow their call methods to instantiate variables via tf.compat.v1.get_variable. The underlying variable scope thus keeps track of any variables and returns cached versions. This is atypical of tf.layer objects, which separate this part of layer building into a build method that is only called once.
Here we provide a subclass for RNNCell objects that act exactly as Layer objects do. They must provide a build method and their call methods do not access variables via tf.compat.v1.get_variable.
NativeLstmCell¶

class
returnn.tf.native_op.
NativeLstmCell
(forget_bias=0.0, **kwargs)[source]¶ Native LSTM.
Parameters: forget_bias (float) – 
classmethod
map_layer_inputs_to_op
(z, rec_weights, i, initial_state=None)[source]¶ Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().
Parameters:  z (tf.Tensor) – Z: inputs: shape (time,batch,n_hidden*4)
 rec_weights (tf.Tensor) – V_h / W_re: shape (n_hidden,n_hidden*4)
 i (tf.Tensor) – index: shape (time,batch)
 initial_state (tf.Tensor|None) – shape (batch,n_hidden)
Return type: (tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)
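The shapes above describe an op that consumes the whole sequence in one call. A NumPy sketch of such a computation is given below. It is not the native kernel: the ordering of the four blocks, the treatment of initial_state as the initial cell state, and the way masked frames are handled are all assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def lstm_over_sequence(z, rec_weights, i, initial_state=None):
    """z: (time,batch,n_hidden*4), rec_weights: (n_hidden,n_hidden*4), i: (time,batch)."""
    n_time, n_batch, n_hidden4 = z.shape
    n_hidden = n_hidden4 // 4
    # Assumption: initial_state is the initial cell state.
    c = np.zeros((n_batch, n_hidden)) if initial_state is None else initial_state
    h = np.zeros((n_batch, n_hidden))
    outputs = np.zeros((n_time, n_batch, n_hidden))
    for t in range(n_time):
        pre = z[t] + h @ rec_weights  # (batch, n_hidden*4)
        # Assumed block layout: cell input, input gate, forget gate, output gate.
        cell_in, in_gate, forget_gate, out_gate = np.split(pre, 4, axis=1)
        new_c = sigmoid(forget_gate) * c + sigmoid(in_gate) * np.tanh(cell_in)
        new_h = sigmoid(out_gate) * np.tanh(new_c)
        mask = i[t][:, None]  # 1 for valid frames, 0 for padded frames
        # Assumption: padded frames keep the previous state and emit zeros.
        c = mask * new_c + (1 - mask) * c
        h = mask * new_h + (1 - mask) * h
        outputs[t] = mask * new_h
    return outputs, c


rng = np.random.default_rng(0)
n_time, n_batch, n_hidden = 5, 2, 3
z = rng.normal(size=(n_time, n_batch, n_hidden * 4))
rec_weights = rng.normal(scale=0.1, size=(n_hidden, n_hidden * 4))
index = np.ones((n_time, n_batch))
index[3:, 1] = 0  # the second sequence has only 3 valid frames
outputs, final_c = lstm_over_sequence(z, rec_weights, index)
```

The loop over time lives inside the op, which is why such a unit cannot be stepped frame by frame from within a recurrent subnetwork.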

MultiRNNCell¶

class
tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.
MultiRNNCell
(cells, state_is_tuple=True)[source]¶ RNN cell composed sequentially of multiple simple cells.
Example:
```python
num_units = [128, 64]
cells = [BasicLSTMCell(num_units=n) for n in num_units]
stacked_rnn_cell = MultiRNNCell(cells)
```
Create a RNN cell composed sequentially of a number of RNNCells.
 Args:
 cells: list of RNNCells that will be composed in this order.
 state_is_tuple: If True, accepted and returned states are n-tuples, where
 n = len(cells). If False, the states are all concatenated along the column axis. This latter behavior will soon be deprecated.
 Raises:
 ValueError: if cells is empty (not allowed), or at least one of the cells
 returns a state tuple but the flag state_is_tuple is False.

state_size
[source]¶ size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

zero_state
(batch_size, dtype)[source]¶ Return zero-filled state tensor(s).
 Args:
 batch_size: int, float, or unit Tensor representing the batch size.
 dtype: the data type to use for the state.
 Returns:
If state_size is an int or TensorShape, then the return value is a N-D tensor of shape [batch_size, state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.
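The mapping from state_size to the structure of the zero state can be sketched in plain Python. zero_state_shapes is a hypothetical helper that only computes shapes; it does not create tensors, and the example state_size for two stacked LSTM cells is an assumption.

```python
def zero_state_shapes(state_size, batch_size):
    """Mirror the nesting of state_size, replacing each size s by [batch_size, s]."""
    if isinstance(state_size, (list, tuple)):
        return type(state_size)(zero_state_shapes(s, batch_size) for s in state_size)
    return [batch_size, state_size]


# Assumed state_size of two stacked LSTM cells with 128 and 64 units,
# each contributing a (c, h) pair.
stacked_state_size = ((128, 128), (64, 64))
shapes = zero_state_shapes(stacked_state_size, batch_size=32)
```

Here shapes has the same nesting as stacked_state_size, with every entry turned into a 2-D shape.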

trainable_weights
[source]¶ List of all trainable weights tracked by this layer.
Trainable weights are updated via gradient descent during training.
 Returns:
 A list of trainable variables.
NativeLstmLowMemCell¶

class
returnn.tf.native_op.
NativeLstmLowMemCell
(**kwargs)[source]¶ Native LSTM, low mem variant.

map_layer_inputs_to_op
(x, weights, b, i, initial_state=None)[source]¶ Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().
Parameters:  x (tf.Tensor) – inputs: shape (time,batch,n_input_dim)
 weights (tf.Tensor) – shape (n_input_dim+n_hidden,n_hidden*4)
 b (tf.Tensor) – shape (n_hidden*4,)
 i (tf.Tensor) – index: shape (time,batch)
 initial_state (tf.Tensor|None) – shape (batch,n_hidden)
Return type: tuple[tf.Tensor]

RHNCell¶

class
returnn.tf.layers.rec.
RHNCell
(num_units, is_training=None, depth=5, dropout=0.0, dropout_seed=None, transform_bias=None, batch_size=None)[source]¶ Recurrent Highway Layer. With optional dropout for recurrent state (fixed over all frames; some call this variational).
 References:
 https://github.com/julian121266/RecurrentHighwayNetworks/
 https://arxiv.org/abs/1607.03474
Parameters:  num_units (int) –
 is_training (bool|tf.Tensor|None) –
 depth (int) –
 dropout (float) –
 dropout_seed (int) –
 transform_bias (float|None) –
 batch_size (int|tf.Tensor|None) –
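Dropout that is "fixed over all frames" means that one mask is sampled per sequence and reused at every time step. The NumPy sketch below shows only this idea; the helper name, the inverted-dropout scaling and the sizes are assumptions and do not describe how RHNCell implements it.

```python
import numpy as np


def make_recurrent_dropout_mask(batch_size, num_units, dropout, rng):
    """Sample one dropout mask that is reused for every frame of a sequence."""
    keep_prob = 1.0 - dropout
    keep = rng.random((batch_size, num_units)) < keep_prob
    # Assumption: surviving units are rescaled by 1 / keep_prob.
    return keep / keep_prob


rng = np.random.default_rng(42)
mask = make_recurrent_dropout_mask(batch_size=2, num_units=4, dropout=0.25, rng=rng)

state = np.ones((2, 4))
# The same mask is applied at every time step.
dropped_states = [state * mask for _ in range(3)]
```

In contrast, ordinary dropout would draw a new mask for each frame.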
RNNCell¶

class
tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.
RNNCell
(trainable=True, name=None, dtype=None, **kwargs)[source]¶ Abstract object representing an RNN cell.
Every RNNCell must have the properties below and implement call with the signature (output, next_state) = call(input, state). The optional third input argument, scope, is allowed for backwards compatibility purposes; but should be left off for new subclasses.
This definition of cell differs from the definition used in the literature. In the literature, ‘cell’ refers to an object with a single scalar output. This definition refers to a horizontal array of such units.
An RNN cell, in the most abstract setting, is anything that has a state and performs some operation that takes a matrix of inputs. This operation results in an output matrix with self.output_size columns. If self.state_size is an integer, this operation also results in a new state matrix with self.state_size columns. If self.state_size is a (possibly nested tuple of) TensorShape object(s), then it should return a matching structure of Tensors having shape [batch_size].concatenate(s) for each s in self.state_size.
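The contract (output, next_state) = call(input, state) can be illustrated without TensorFlow. RunningSumCell below is a hypothetical toy class that works on nested Python lists; it is not an RNNCell subclass and only mimics the interface.

```python
class RunningSumCell:
    """Hypothetical cell whose state is the running sum of its inputs."""

    def __init__(self, num_units):
        self.state_size = num_units
        self.output_size = num_units

    def zero_state(self, batch_size):
        # One row of zeros per batch entry: shape [batch_size, state_size].
        return [[0.0] * self.state_size for _ in range(batch_size)]

    def call(self, inputs, state):
        # inputs and state both have shape [batch_size, num_units].
        next_state = [
            [s + x for s, x in zip(state_row, input_row)]
            for state_row, input_row in zip(state, inputs)
        ]
        output = next_state
        return output, next_state


cell = RunningSumCell(num_units=2)
state = cell.zero_state(batch_size=1)
for frame in ([[1.0, 2.0]], [[3.0, 4.0]]):
    output, state = cell.call(frame, state)
```

Each call consumes one frame for the whole batch and returns both the output and the state to feed into the next call.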

state_size
[source]¶ size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

build
(_)[source]¶ Creates the variables of the layer (optional, for subclass implementers).
This is a method that implementers of subclasses of Layer or Model can override if they need a state-creation step in-between layer instantiation and layer call.
This is typically used to create the weights of Layer subclasses.
 Args:
 input_shape: Instance of TensorShape, or list of instances of
 TensorShape if the layer expects a list of inputs (one instance per input).

zero_state
(batch_size, dtype)[source]¶ Return zero-filled state tensor(s).
 Args:
 batch_size: int, float, or unit Tensor representing the batch size.
 dtype: the data type to use for the state.
 Returns:
If state_size is an int or TensorShape, then the return value is a N-D tensor of shape [batch_size, state_size] filled with zeros.
If state_size is a nested list or tuple, then the return value is a nested list or tuple (of the same structure) of 2-D tensors with the shapes [batch_size, s] for each s in state_size.

get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
 Returns:
 Python dictionary.

LSTMCell¶

class
tensorflow.python.keras.layers.legacy_rnn.rnn_cell_impl.
LSTMCell
(num_units, use_peepholes=False, cell_clip=None, initializer=None, num_proj=None, proj_clip=None, num_unit_shards=None, num_proj_shards=None, forget_bias=1.0, state_is_tuple=True, activation=None, reuse=None, name=None, dtype=None, **kwargs)[source]¶ Long short-term memory unit (LSTM) recurrent network cell.
The default non-peephole implementation is based on (Gers et al., 1999). The peephole implementation is based on (Sak et al., 2014).
The class uses optional peephole connections, optional cell clipping, and an optional projection layer.
Note that this cell is not optimized for performance. Please use tf.contrib.cudnn_rnn.CudnnLSTM for better performance on GPU, or tf.contrib.rnn.LSTMBlockCell and tf.contrib.rnn.LSTMBlockFusedCell for better performance on CPU.
References:
 Long short-term memory recurrent neural network architectures for large scale acoustic modeling:
 Learning to forget:
 [Gers et al., 1999] (http://digitallibrary.theiet.org/content/conferences/10.1049/cp_19991218) ([pdf](https://arxiv.org/pdf/1409.2329.pdf))
 Long Short-Term Memory:
 [Hochreiter et al., 1997] (https://www.mitpressjournals.org/doi/abs/10.1162/neco.1997.9.8.1735) ([pdf](http://ml.jku.at/publications/older/3504.pdf))
Initialize the parameters for an LSTM cell.
 Args:
 num_units: int, The number of units in the LSTM cell.
 use_peepholes: bool, set True to enable diagonal/peephole connections.
 cell_clip: (optional) A float value, if provided the cell state is clipped
 by this value prior to the cell output activation.
 initializer: (optional) The initializer to use for the weight and
 projection matrices.
 num_proj: (optional) int, The output dimensionality for the projection
 matrices. If None, no projection is performed.
 proj_clip: (optional) A float value. If num_proj > 0 and proj_clip is
 provided, then the projected values are clipped elementwise to within [-proj_clip, proj_clip].
 num_unit_shards: Deprecated, will be removed by Jan. 2017. Use a
 variable_scope partitioner instead.
 num_proj_shards: Deprecated, will be removed by Jan. 2017. Use a
 variable_scope partitioner instead.
 forget_bias: Biases of the forget gate are initialized by default to 1 in
 order to reduce the scale of forgetting at the beginning of the training. Must set it manually to 0.0 when restoring from CudnnLSTM trained checkpoints.
 state_is_tuple: If True, accepted and returned states are 2-tuples of the
 c_state and m_state. If False, they are concatenated along the column axis. This latter behavior will soon be deprecated.
 activation: Activation function of the inner states. Default: tanh. It
 could also be string that is within Keras activation function names.
 reuse: (optional) Python boolean describing whether to reuse variables in
 an existing scope. If not True, and the existing scope already has the given variables, an error is raised.
 name: String, the name of the layer. Layers with the same name will share
 weights, but to avoid mistakes we require reuse=True in such cases.
 dtype: Default dtype of the layer (default of None means use the type of
 the first input). Required when build is called before call.
 **kwargs: Dict, keyword named properties for common layer attributes, like
 trainable etc when constructing the cell from configs of get_config().
When restoring from CudnnLSTM-trained checkpoints, use CudnnCompatibleLSTMCell instead.

state_size
[source]¶ size(s) of state(s) used by this cell.
It can be represented by an Integer, a TensorShape or a tuple of Integers or TensorShapes.

call
(inputs, state)[source]¶ Run one step of LSTM.
 Args:
 inputs: input Tensor, must be 2-D, [batch, input_size].
 state: if state_is_tuple is False, this must be a state Tensor, 2-D, [batch, state_size]. If state_is_tuple is True, this must be a tuple of state Tensors, both 2-D, with column sizes c_state and m_state.
 Returns:
 A tuple containing:
 A 2-D, [batch, output_dim], Tensor representing the output of the LSTM after reading inputs when previous state was state. Here output_dim is: num_proj if num_proj was set, num_units otherwise.
 Tensor(s) representing the new state of LSTM after reading inputs when the previous state was state. Same type and shape(s) as state.
 Raises:
 ValueError: If input size cannot be inferred from inputs via
 static shape inference.
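The rule for output_dim and the elementwise projection clipping can be summarized in a few lines of plain Python. Both helper functions are hypothetical and written only for this sketch; they are not part of the LSTMCell API.

```python
def output_dim(num_units, num_proj=None):
    """Size of the last axis of the cell output."""
    return num_proj if num_proj is not None else num_units


def clip_projection(values, proj_clip):
    """Clip projected values elementwise to [-proj_clip, proj_clip]."""
    return [max(-proj_clip, min(proj_clip, v)) for v in values]


plain_dim = output_dim(num_units=512)
projected_dim = output_dim(num_units=512, num_proj=128)
clipped = clip_projection([-3.0, 0.5, 2.0], proj_clip=1.0)
```

With a projection layer the output is smaller than the cell state, which is why the two column sizes c_state and m_state can differ.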

get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Network (one layer of abstraction above).
Note that get_config() does not guarantee to return a fresh copy of dict every time it is called. The callers should make a copy of the returned dict if they want to modify it.
 Returns:
 Python dictionary.
TwoDNativeLstmCell¶

class
returnn.tf.native_op.
TwoDNativeLstmCell
(pooling, **kwargs)[source]¶ Native 2D LSTM.

classmethod
map_layer_inputs_to_op
(X, V_h, V_v, W, i, previous_state=None, previous_output=None, iteration=None)[source]¶ Just like NativeOp.LstmGenericBase.map_layer_inputs_to_op().
Parameters:  X (tf.Tensor) – inputs: shape (timeT,timeS,batch,n_hidden*5)
 V_h (tf.Tensor) – W_re: shape (n_hidden,n_hidden*5)
 V_v (tf.Tensor) – W_re: shape (n_hidden,n_hidden*5)
 W (tf.Tensor) –
 i (tf.Tensor) – index: shape (time,batch)
 previous_state (tf.Tensor) –
 previous_output (tf.Tensor) –
 iteration (tf.Tensor) –
Return type: (tf.Tensor,tf.Tensor,tf.Tensor,tf.Tensor)

ZoneoutLSTMCell¶

class
returnn.tf.layers.rec.
ZoneoutLSTMCell
(num_units, zoneout_factor_cell=0.0, zoneout_factor_output=0.0)[source]¶ Wrapper for tf LSTM to create Zoneout LSTM Cell. This code is an adapted version of Rayhane Mamah's version of Tacotron-2.
Refs:
Initializer with possibility to set different zoneout values for cell/hidden states.
Parameters:  num_units (int) – number of hidden units
 zoneout_factor_cell (float) – cell zoneout factor
 zoneout_factor_output (float) – output zoneout factor
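The general idea of zoneout is that each unit keeps its previous value with some probability instead of taking the newly computed one. The NumPy sketch below shows this idea for a single state tensor. The function name, the use of the expected value at inference time and the example values are assumptions; the sketch does not claim to match the implementation of ZoneoutLSTMCell.

```python
import numpy as np


def zoneout(prev, new, factor, is_training, rng):
    """Mix previous and new state values with zoneout probability `factor`."""
    if is_training:
        # Each unit keeps its previous value with probability `factor`.
        keep_prev = rng.random(prev.shape) < factor
        return np.where(keep_prev, prev, new)
    # Assumption: at inference time the expected value is used.
    return factor * prev + (1.0 - factor) * new


rng = np.random.default_rng(0)
prev_c = np.zeros((2, 4))
new_c = np.ones((2, 4))

train_c = zoneout(prev_c, new_c, factor=0.1, is_training=True, rng=rng)
eval_c = zoneout(prev_c, new_c, factor=0.1, is_training=False, rng=rng)
```

The same function could be applied separately to the cell state and the output with zoneout_factor_cell and zoneout_factor_output, respectively.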