Backend Layers

Layers in this module generally just implement some simple operation from the Keras backend as a Layer. The reason we have these as Layers is largely so that we can properly handle masking.

AddMask

class deep_qa.layers.backend.add_mask.AddMask(mask_value: float = 0.0, **kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

This Layer adds a mask to a tensor. It is intended solely for testing, though if you have a use case for this outside of testing, feel free to use it. The call() method just returns the inputs, and the compute_mask method calls K.not_equal(inputs, mask_value), and that’s it. This is different from Keras’ Masking layer, which assumes higher-order input and does a K.any() call in compute_mask.

Input:
  • tensor: a tensor of arbitrary shape
Output:
  • the same tensor, now with a mask of the same shape attached
Parameters:

mask_value: float, optional (default=0.0)

This is the value that we will compare to in compute_mask.
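
Example

A minimal doctest-style sketch of the mask computation. Since compute_mask just calls K.not_equal(inputs, mask_value), we can illustrate it with the backend directly:

>>> import keras.backend as K
>>> tensor = K.variable([[1.0, 0.0, 2.0]])
>>> mask = K.not_equal(tensor, 0.0)  # what compute_mask returns with the default mask_value=0.0
>>> K.eval(mask).tolist()
[[True, False, True]]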

compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.
get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).

# Returns
Python dictionary.

BatchDot

class deep_qa.layers.backend.batch_dot.BatchDot(**kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

This Layer calls K.batch_dot() on two inputs tensor_a and tensor_b. This function will work for tensors of arbitrary size as long as abs(K.ndim(tensor_a) - K.ndim(tensor_b)) < 2, due to limitations in K.batch_dot(). When the input tensors have more than three dimensions, they must have the same shape, except for the last two dimensions. See the examples for more explanation of what this means.

We always assume the dimension to perform the dot is the last one, and that the masks have one fewer dimension than the tensors. Note that this layer does not return zeroes in places that are masked, but does pass a correct mask forward. If this then gets fed into masked_softmax, for instance, your tensor will be correctly normalized.

Inputs:
  • tensor_a: tensor with ndim >= 2.
  • tensor_b: tensor with ndim >= 2.
Output:
  • a_dot_b

Examples

The following examples will try to give some insight on how this layer works in relation to K.batch_dot(). Note that the Keras documentation (as of 2/13/17) on K.batch_dot is incorrect, and that this layer behaves differently from the documented behavior.

As a first example, let’s suppose that tensor_a and tensor_b have the same number of dimensions. Let the shape of tensor_a be (2, 3, 2), and let the shape of tensor_b be (2, 4, 2). The mask accompanying these inputs always has one fewer dimension, so tensor_a_mask has shape (2, 3) and tensor_b_mask has shape (2, 4). The shape of the batch_dot output would thus be (2, 3, 4). This is because we are taking the batch dot over the last dimension, so the output shape is (2, 3) (from tensor_a) with (4) (from tensor_b) appended, giving (2, 3, 4) in total. The output mask has the same shape as the output, and is thus (2, 3, 4) as well.

>>> import keras.backend as K
>>> tensor_a = K.ones(shape=(2, 3, 2))
>>> tensor_b = K.ones(shape=(2, 4, 2))
>>> K.eval(K.batch_dot(tensor_a, tensor_b, axes=(2,2))).shape
(2, 3, 4)

Next, let’s look at an example where tensor_a and tensor_b are “uneven” (different number of dimensions). Let the shape of tensor_a be (2, 4, 2), and let the shape of tensor_b be (2, 4, 3, 2). The mask accompanying these inputs always has one fewer dimension, so tensor_a_mask has shape (2, 4) and tensor_b_mask has shape (2, 4, 3). The shape of the batch_dot output would thus be (2, 4, 3). In the case of uneven tensors, we always expand the smaller tensor with a trailing dimension of length 1 to make them even. Thus in this case, we expand tensor_a to get a new shape of (2, 4, 2, 1). Now we are taking the batch_dot of tensors with shapes (2, 4, 2, 1) and (2, 4, 3, 2). Note that the first two dimensions of these tensors are the same, (2, 4); this is a requirement imposed by K.batch_dot. Following the methodology of calculating the output shape above, we get that the output is (2, 4, 1, 3), since we get (2, 4, 1) from tensor_a and (3) from tensor_b. We then squeeze the tensor to remove the 1-dimension, for a final shape of (2, 4, 3). Note that the mask has the same shape.

>>> import keras.backend as K
>>> tensor_a = K.ones(shape=(2, 4, 2))
>>> tensor_b = K.ones(shape=(2, 4, 3, 2))
>>> tensor_a_expanded = K.expand_dims(tensor_a, axis=-1)
>>> unsqueezed_bd = K.batch_dot(tensor_a_expanded, tensor_b, axes=(2,3))
>>> final_bd = K.squeeze(unsqueezed_bd, axis=K.ndim(tensor_a)-1)
>>> K.eval(final_bd).shape
(2, 4, 3)

Lastly, let’s look at the uneven case where tensor_a has more dimensions than tensor_b. Let the shape of tensor_a be (2, 3, 4, 2), and let the shape of tensor_b be (2, 3, 2). Since the mask accompanying these inputs always has one fewer dimension, tensor_a_mask has shape (2, 3, 4) and tensor_b_mask has shape (2, 3). The shape of the batch_dot output would thus be (2, 3, 4). Since these tensors are uneven, we expand the smaller tensor, tensor_b, with a trailing dimension of length 1, to get a new shape of (2, 3, 2, 1). Now we are taking the batch_dot of tensors with shapes (2, 3, 4, 2) and (2, 3, 2, 1). Note again that the first two dimensions of these tensors are the same, (2, 3). We can see that the output shape is (2, 3, 4, 1), since we get (2, 3, 4) from tensor_a and (1) from tensor_b. We then squeeze the tensor to remove the 1-dimension, for a final shape of (2, 3, 4). Note that the mask has the same shape.

>>> import keras.backend as K
>>> tensor_a = K.ones(shape=(2, 3, 4, 2))
>>> tensor_b = K.ones(shape=(2, 3, 2))
>>> tensor_b_expanded = K.expand_dims(tensor_b, axis=-1)
>>> unsqueezed_bd = K.batch_dot(tensor_a, tensor_b_expanded, axes=(3, 2))
>>> final_bd = K.squeeze(unsqueezed_bd, axis=K.ndim(tensor_a)-1)
>>> K.eval(final_bd).shape
(2, 3, 4)
compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.

CollapseToBatch

class deep_qa.layers.backend.collapse_to_batch.CollapseToBatch(num_to_collapse: int, **kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

Reshapes a higher-order tensor, taking the first num_to_collapse dimensions after the batch dimension and folding them into the batch dimension. For example, a tensor of shape (2, 4, 5, 3), collapsed with num_to_collapse = 2, would become a tensor of shape (40, 3). We perform the identical computation on the input mask, if there is one.

This is essentially what Keras’ TimeDistributed layer does (and then undoes) to apply a layer to a higher-order tensor, and that’s the intended use for this layer. However, TimeDistributed cannot handle distributing across dimensions with unknown lengths at graph compilation time. This layer works even in that case. So, if your actual tensor shape at graph compilation time looks like (None, None, None, 3), or (None, 4, None, 3), you can still use this layer (and ExpandFromBatch) to get the same result as TimeDistributed. If your shapes are fully known at graph compilation time, just use TimeDistributed, as it’s a nicer API for the same functionality.

Inputs:
  • tensor with ndim >= 3
Output:
  • tensor with ndim = input_ndim - num_to_collapse, with the removed dimensions folded into the first (batch-size) dimension
Parameters:

num_to_collapse: int

The number of dimensions to fold into the batch size.
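
Example

A sketch of the shape arithmetic using the backend directly; the layer performs an equivalent reshape (and the same reshape on the mask, if there is one):

>>> import keras.backend as K
>>> tensor = K.ones(shape=(2, 4, 5, 3))
>>> K.eval(K.reshape(tensor, (-1, 3))).shape  # 2 * 4 * 5 = 40, as with num_to_collapse = 2
(40, 3)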

compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.
get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).

# Returns
Python dictionary.

ExpandFromBatch

class deep_qa.layers.backend.expand_from_batch.ExpandFromBatch(num_to_expand: int, **kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

Reshapes a collapsed tensor, taking the batch size and separating it into num_to_expand dimensions, following the shape of a second input tensor. This is meant to be used in conjunction with CollapseToBatch, to achieve the same effect as Keras’ TimeDistributed layer, but for shapes that are not fully specified at graph compilation time.

For example, say you had an original tensor of shape (None (2), 4, None (5), 3), then collapsed it with CollapseToBatch(2)(tensor) to get a tensor with shape (None (40), 3) (here I’m using None (x) to denote a dimension with unknown length at graph compilation time, where x is the actual runtime length). You can then call ExpandFromBatch(2)(collapsed, tensor) with the result to expand the first two dimensions out of the batch again (presumably after you’ve done some computation when it was collapsed).

Inputs:
  • a tensor that has been collapsed with CollapseToBatch(num_to_expand).
  • the original tensor that was used as input to CollapseToBatch (or one with identical shape in the collapsed dimensions). We will use this input only to get its shape.
Output:
  • tensor with ndim = input_ndim + num_to_expand, with the additional dimensions coming immediately after the first (batch-size) dimension.
Parameters:

num_to_expand: int

The number of dimensions to expand from the batch size.
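
Example

A sketch of the inverse shape arithmetic, using static shapes for illustration; the layer itself reads the expanded dimensions from the shape of its second input at runtime, so it also works when those lengths are None at graph compilation time:

>>> import keras.backend as K
>>> original = K.ones(shape=(2, 4, 3))
>>> collapsed = K.reshape(original, (-1, 3))  # shape (8, 3), as CollapseToBatch(1) would produce
>>> K.eval(K.reshape(collapsed, (2, 4, 3))).shape  # as ExpandFromBatch(1)(collapsed, original) would
(2, 4, 3)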

compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.
get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).

# Returns
Python dictionary.

Envelope

class deep_qa.layers.backend.envelope.Envelope(**kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

Given a probability distribution over a begin index and an end index of some sequence, this Layer computes an envelope over the sequence: the probability that each element lies between “begin” and “end”.

Specifically, the computation done here is the following:

after_span_begin = K.cumsum(span_begin, axis=-1)
after_span_end = K.cumsum(span_end, axis=-1)
before_span_end = 1 - after_span_end
envelope = after_span_begin * before_span_end
Inputs:
  • span_begin: tensor with shape (batch_size, sequence_length), representing a probability distribution over a start index in the sequence
  • span_end: tensor with shape (batch_size, sequence_length), representing a probability distribution over an end index in the sequence
Outputs:
  • envelope: tensor with shape (batch_size, sequence_length), representing a probability for each index of the sequence belonging in the span

If there is a mask associated with either of the inputs, we ignore it, assuming that you used the mask correctly when you computed your probability distributions. But we support masking in this layer, so that you have an output mask if you really need it. We just return the first mask that is not None (or None, if both are None).
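
Example

A worked instance of the computation above, on a sequence of length 4 (the values in the comments are approximate):

>>> import keras.backend as K
>>> span_begin = K.variable([[0.1, 0.7, 0.2, 0.0]])
>>> span_end = K.variable([[0.0, 0.1, 0.3, 0.6]])
>>> after_span_begin = K.cumsum(span_begin, axis=-1)   # [[0.1, 0.8, 1.0, 1.0]]
>>> before_span_end = 1 - K.cumsum(span_end, axis=-1)  # [[1.0, 0.9, 0.6, 0.0]]
>>> envelope = after_span_begin * before_span_end      # [[0.1, 0.72, 0.6, 0.0]]
>>> K.eval(envelope).shape
(1, 4)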

compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.

Max

class deep_qa.layers.backend.max.Max(axis: int = -1, **kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

This Layer performs a max over some dimension. Keras has a similar layer called GlobalMaxPooling1D, but it is not as configurable as this one, and it does not support masking.

If the mask is not None, it must be the same shape as the input.

Input:
  • A tensor of arbitrary shape (having at least 3 dimensions).
Output:
  • A tensor with one fewer dimension, where we have taken a max over one of the dimensions.
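
Example

A sketch of the unmasked computation for the default axis=-1, using the backend directly:

>>> import keras.backend as K
>>> tensor = K.ones(shape=(2, 3, 4))
>>> K.eval(K.max(tensor, axis=-1)).shape
(2, 3)
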
compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.
get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).

# Returns
Python dictionary.

Permute

class deep_qa.layers.backend.permute.Permute(pattern: typing.Tuple[int], **kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

This Layer calls K.permute_dimensions on both the input and the mask.

If the mask is not None, it must have the same shape as the input.

Input:
  • A tensor of arbitrary shape.
Output:
  • A tensor with permuted dimensions.
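
Example

A sketch of the underlying backend call, with a hypothetical pattern (0, 2, 1) that swaps the last two dimensions:

>>> import keras.backend as K
>>> tensor = K.ones(shape=(2, 3, 4))
>>> K.eval(K.permute_dimensions(tensor, (0, 2, 1))).shape
(2, 4, 3)
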
compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.

Repeat

class deep_qa.layers.backend.repeat.Repeat(axis: int, repetitions: int, **kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

This Layer calls K.repeat_elements on both the input and the mask, after calling K.expand_dims.

If the mask is not None, we must be able to call K.expand_dims using the same axis parameter as we do for the input.

Input:
  • A tensor of arbitrary shape.
Output:
  • The input tensor repeated along one of the dimensions.
Parameters:

axis: int

We will add a dimension to the input tensor at this axis.

repetitions: int

The new dimension will have this size, with each slice being identical to the original input tensor.
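
Example

A sketch of the two backend calls this layer composes, with hypothetical values axis=1 and repetitions=4:

>>> import keras.backend as K
>>> tensor = K.ones(shape=(2, 3))
>>> expanded = K.expand_dims(tensor, axis=1)  # shape (2, 1, 3)
>>> K.eval(K.repeat_elements(expanded, 4, axis=1)).shape
(2, 4, 3)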

compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.
get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).

# Returns
Python dictionary.

RepeatLike

class deep_qa.layers.backend.repeat_like.RepeatLike(axis: int, copy_from_axis: int, **kwargs)[source]

Bases: deep_qa.layers.masked_layer.MaskedLayer

This Layer is like Repeat, but gets the number of repetitions to use from a second input tensor. This allows doing a number of repetitions that is unknown at graph compilation time, and is necessary when the repetitions argument to Repeat would be None.

If the mask is not None, we must be able to call K.expand_dims using the same axis parameter as we do for the input.

Input:
  • A tensor of arbitrary shape, which we will expand and tile.
  • A second tensor whose shape along one dimension we will copy.
Output:
  • The input tensor repeated along one of the dimensions.
Parameters:

axis: int

We will add a dimension to the input tensor at this axis.

copy_from_axis: int

We will copy the dimension from the second tensor at this axis.
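
Example

A sketch using static shapes for illustration; the layer itself reads the repetition count from the second tensor's shape at runtime, which is what lets it work when that length is None at graph compilation time:

>>> import keras.backend as K
>>> tensor = K.ones(shape=(2, 3))      # the tensor to expand and tile
>>> to_copy = K.ones(shape=(2, 4, 5))  # axis 1 has length 4, which we copy
>>> expanded = K.expand_dims(tensor, axis=1)
>>> K.eval(K.repeat_elements(expanded, 4, axis=1)).shape  # as RepeatLike(axis=1, copy_from_axis=1) would give
(2, 4, 3)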

compute_mask(inputs, mask=None)[source]

Computes an output mask tensor.

# Arguments
inputs: Tensor or list of tensors.
mask: Tensor or list of tensors.
# Returns
None or a tensor (or list of tensors, one per output tensor of the layer).
compute_output_shape(input_shape)[source]

Computes the output shape of the layer.

Assumes that the layer will be built to match the input shape provided.

# Arguments
input_shape: Shape tuple (tuple of integers) or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
# Returns
An output shape tuple.
get_config()[source]

Returns the config of the layer.

A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.

The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).

# Returns
Python dictionary.