Core Tensor Utils

backend

These are utility functions that are similar to calls to Keras’ backend. Some of these exist because the corresponding function in keras.backend is broken, and some are things that just haven’t been implemented there.

deep_qa.tensors.backend.apply_feed_forward(input_tensor, weights, activation)[source]

Takes an input tensor, a sequence of weights, and an activation, and builds an MLP. This can also be achieved by defining a sequence of Dense layers in Keras, but doing it this way can be desirable when the operation needs to happen inside the call method of a more complex layer. Note that we do not apply biases here. The input tensor can have any number of dimensions, but its last dimension and the sequence of weights are expected to be compatible.
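A minimal usage sketch (not part of the original docstring), assuming activation is a backend-level callable such as K.tanh and that each weight matrix’s first dimension matches the previous layer’s output dimension:

    import numpy
    from keras import backend as K
    from deep_qa.tensors.backend import apply_feed_forward

    batch_size, input_dim, hidden_dim, output_dim = 2, 4, 3, 5
    input_tensor = K.variable(numpy.random.rand(batch_size, input_dim))
    # Two weight matrices -> a two-layer MLP with no biases.
    weights = [K.variable(numpy.random.rand(input_dim, hidden_dim)),
               K.variable(numpy.random.rand(hidden_dim, output_dim))]
    output = apply_feed_forward(input_tensor, weights, K.tanh)
    print(K.eval(output).shape)  # expected: (2, 5)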

deep_qa.tensors.backend.hardmax(unnormalized_attention, knowledge_length)[source]

A similar operation to softmax, except all of the weight is placed on the mode of the distribution. So, e.g., this function transforms [.34, .2, -1.4] -> [1, 0, 0].

TODO(matt): we really should have this take an optional mask...
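An illustrative call (a sketch, not from the docstring), assuming unnormalized_attention has shape (batch_size, knowledge_length) and the result is a one-hot float tensor:

    import numpy
    from keras import backend as K
    from deep_qa.tensors.backend import hardmax

    attention = K.variable(numpy.asarray([[.34, .2, -1.4]]))  # (batch_size, knowledge_length)
    one_hot = hardmax(attention, 3)
    print(K.eval(one_hot))  # expected: [[1., 0., 0.]]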

deep_qa.tensors.backend.l1_normalize(tensor_to_normalize, mask=None)[source]

Normalize a tensor by its L1 norm. Takes an optional mask.

When the vector to be normalized is all 0’s we return the uniform distribution (taking masking into account, so masked values are still 0.0). When the vector to be normalized is completely masked, we return the uniform distribution over the max padding length of the tensor.

See the tests for concrete examples of the aforementioned behaviors.

Parameters:

tensor_to_normalize : Tensor

Tensor of shape (batch size, x) to be normalized, where x is arbitrary.

mask : Tensor, optional

Tensor of shape (batch size, x) indicating which elements of tensor_to_normalize are padding and should not be considered when normalizing.

Returns:

normalized_tensor : Tensor

Normalized tensor with shape (batch size, x).
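A usage sketch illustrating the behaviors described above (the exact fallback values are assumptions based on that description):

    import numpy
    from keras import backend as K
    from deep_qa.tensors.backend import l1_normalize

    tensor = K.variable(numpy.asarray([[1.0, 2.0, 1.0, 0.0],
                                       [0.0, 0.0, 0.0, 0.0]]))
    mask = K.variable(numpy.asarray([[1.0, 1.0, 1.0, 0.0],
                                     [1.0, 1.0, 0.0, 0.0]]))
    print(K.eval(l1_normalize(tensor, mask)))
    # Per the description above: first row -> [0.25, 0.5, 0.25, 0.0];
    # second row is all zeros, so it becomes uniform over the unmasked
    # entries -> [0.5, 0.5, 0.0, 0.0].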

deep_qa.tensors.backend.last_dim_flatten(input_tensor)[source]

Takes a tensor and returns a matrix, preserving only the last dimension of the input.
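A sketch of the shape transformation, assuming all leading dimensions are collapsed into the first dimension of the output:

    import numpy
    from keras import backend as K
    from deep_qa.tensors.backend import last_dim_flatten

    tensor = K.variable(numpy.random.rand(2, 3, 4, 5))
    matrix = last_dim_flatten(tensor)
    print(K.eval(matrix).shape)  # expected: (24, 5)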

deep_qa.tensors.backend.switch(cond, then_tensor, else_tensor)[source]

Keras’ implementation of K.switch currently uses tensorflow’s switch function, which only accepts scalar conditions rather than boolean tensors that are treated elementwise. This doesn’t match Theano’s implementation of switch, but using tensorflow’s where we can recover exactly this functionality.
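An elementwise usage sketch, assuming cond is a 0/1 tensor with the same shape as the two value tensors:

    import numpy
    from keras import backend as K
    from deep_qa.tensors.backend import switch

    cond = K.variable(numpy.asarray([[1.0, 0.0, 1.0]]))
    then_tensor = K.variable(numpy.asarray([[1.0, 2.0, 3.0]]))
    else_tensor = K.variable(numpy.asarray([[-1.0, -2.0, -3.0]]))
    print(K.eval(switch(cond, then_tensor, else_tensor)))  # expected: [[1., -2., 3.]]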

deep_qa.tensors.backend.tile_scalar(scalar, vector)[source]

NOTE: If your vector has known shape (i.e., the relevant dimension from K.int_shape(vector) is not None), you should just use K.repeat_elements(scalar) instead of this. This method works, however, when the number of entries in your vector is unknown at graph compilation time.

This method takes a (collection of) scalar(s) (shape: (batch_size, 1)), and tiles that scalar a number of times, giving a vector of shape (batch_size, tile_length). (I say “scalar” and “vector” here because I’m ignoring the batch_size.) We need the vector as input so we know what the tile_length is; the vector is otherwise ignored.

This is not done as a Keras Layer, however; if you want to use this function, you’ll need to do it _inside_ of a Layer somehow, either in a Lambda or in the call() method of a Layer you’re writing.

TODO(matt): we could probably make a more general tile_tensor method, which can do this for any dimensionality. There is another place in the code where we do this with a matrix and a tensor; all three of these can probably be one function.
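A usage sketch with placeholders, to mirror the “unknown at graph compilation time” case described above (the concrete values are only for illustration):

    import numpy
    from keras import backend as K
    from deep_qa.tensors.backend import tile_scalar

    scalar = K.placeholder(shape=(None, 1))      # (batch_size, 1)
    vector = K.placeholder(shape=(None, None))   # tile_length unknown until runtime
    tiled = tile_scalar(scalar, vector)
    tile_function = K.function([scalar, vector], [tiled])
    result = tile_function([numpy.asarray([[2.0], [3.0]]), numpy.zeros((2, 4))])[0]
    print(result)  # expected: [[2., 2., 2., 2.], [3., 3., 3., 3.]]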

deep_qa.tensors.backend.tile_vector(vector, matrix)[source]

NOTE: If your matrix has known shape (i.e., the relevant dimension from K.int_shape(matrix) is not None), you should just use K.repeat_elements(vector) instead of this. This method works, however, when the number of rows in your matrix is unknown at graph compilation time.

This method takes a (collection of) vector(s) (shape: (batch_size, vector_dim)), and tiles that vector a number of times, giving a matrix of shape (batch_size, tile_length, vector_dim). (I say “vector” and “matrix” here because I’m ignoring the batch_size). We need the matrix as input so we know what the tile_length is - the matrix is otherwise ignored.

This is necessary in a number of places in the code. For instance, if you want to do a dot product of a vector with all of the vectors in a matrix, the most efficient way to do that is to tile the vector first, then do an element-wise product with the matrix, then sum out the last dimension. So, we capture this functionality here.

This is not done as a Keras Layer, however; if you want to use this function, you’ll need to do it _inside_ of a Layer somehow, either in a Lambda or in the call() method of a Layer you’re writing.
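A sketch of the dot-product use case mentioned above (the concrete shapes are only for illustration):

    import numpy
    from keras import backend as K
    from deep_qa.tensors.backend import tile_vector

    vector = K.variable(numpy.random.rand(2, 3))     # (batch_size, vector_dim)
    matrix = K.variable(numpy.random.rand(2, 5, 3))  # (batch_size, tile_length, vector_dim)
    tiled = tile_vector(vector, matrix)              # (batch_size, tile_length, vector_dim)
    # Dot product of the vector with every row of the matrix:
    dot_products = K.sum(tiled * matrix, axis=-1)    # (batch_size, tile_length)
    print(K.eval(dot_products).shape)                # expected: (2, 5)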

deep_qa.tensors.backend.very_negative_like(tensor)[source]
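No description is given here; judging by the name, this presumably returns a tensor shaped like the input and filled with a very large negative constant, which is useful for masking scores before a max or softmax. A hedged sketch of that assumed pattern:

    import numpy
    from keras import backend as K
    from deep_qa.tensors.backend import switch, very_negative_like

    # Replace masked-out scores with very negative values so a subsequent
    # max effectively ignores them (assumed behavior of very_negative_like).
    scores = K.variable(numpy.asarray([[1.0, 5.0, 3.0]]))
    mask = K.variable(numpy.asarray([[1.0, 0.0, 1.0]]))
    masked_scores = switch(mask, scores, very_negative_like(scores))
    print(K.eval(K.max(masked_scores, axis=-1)))  # expected: [3.]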

masked_operations

deep_qa.tensors.masked_operations.masked_batch_dot(tensor_a, tensor_b, mask_a, mask_b)[source]

The simplest case where this function is applicable is the following:

tensor_a : (batch_size, a_length, embed_dim)
tensor_b : (batch_size, b_length, embed_dim)
mask_a : None or (batch_size, a_length)
mask_b : None or (batch_size, b_length)

Returns:

a_dot_b : (batch_size, a_length, b_length), with zeros for masked elements.

This function will also work for larger tensors, as long as abs(K.ndim(tensor_a) - K.ndim(tensor_b)) < 1 (this is due to the limitations of K.batch_dot). We always assume the dimension to perform the dot is the last one, and that the masks have one fewer dimension than the tensors.
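A sketch of the simplest case described above:

    import numpy
    from keras import backend as K
    from deep_qa.tensors.masked_operations import masked_batch_dot

    tensor_a = K.variable(numpy.random.rand(2, 3, 4))  # (batch_size, a_length, embed_dim)
    tensor_b = K.variable(numpy.random.rand(2, 5, 4))  # (batch_size, b_length, embed_dim)
    mask_a = K.variable(numpy.asarray([[1.0, 1.0, 0.0], [1.0, 1.0, 1.0]]))
    mask_b = K.variable(numpy.asarray([[1.0, 1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0, 1.0]]))
    a_dot_b = masked_batch_dot(tensor_a, tensor_b, mask_a, mask_b)
    print(K.eval(a_dot_b).shape)  # expected: (2, 3, 5), with zeros wherever either mask is 0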

deep_qa.tensors.masked_operations.masked_softmax(vector, mask)[source]

K.softmax(vector) does not work if some elements of vector should be masked. This performs a softmax on just the non-masked portions of vector (passing None in for the mask is also acceptable; you’ll just get a regular softmax).

We assume that both vector and mask (if given) have shape (batch_size, vector_dim).

In the case that the input vector is completely masked, this function returns an array of 0.0 values. This behavior may cause a NaN loss if this is used as the last layer of a model trained with categorical cross-entropy.
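A usage sketch (the output values are rounded and follow the behavior described above):

    import numpy
    from keras import backend as K
    from deep_qa.tensors.masked_operations import masked_softmax

    vector = K.variable(numpy.asarray([[1.0, 2.0, 3.0]]))
    mask = K.variable(numpy.asarray([[1.0, 1.0, 0.0]]))
    print(K.eval(masked_softmax(vector, mask)))
    # expected: softmax over the first two entries only, roughly [[0.27, 0.73, 0.0]]
    print(K.eval(masked_softmax(vector, None)))  # a regular softmax over all three entries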