Home¶
DeepQA is a library built on top of Keras to make NLP easier. There are four main benefits to this library:
- It is hard to get NLP right in Keras. There are a lot of issues around padding sequences and masking that are not handled well in the main Keras code, and we have well-tested code that does the right thing for, e.g., computing attentions over padded sequences, or distributing text encoders across several sentences or words.
- We have implemented a base class, TextTrainer, that provides a nice, consistent API around building NLP models in Keras. This API has functionality around processing data instances, embedding words and/or characters, easily getting various kinds of sentence encoders, and so on.
- We provide a nice interface to training, validating, and debugging Keras models. It is very easy to experiment with variants of a model family, just by changing some parameters in a JSON file. For example, you can go from using fixed GloVe vectors to represent words, to fine-tuning those embeddings, to using a concatenation of word vectors and a character-level CNN to represent words, just by changing parameters in a JSON experiment file (see the sketch after this list). If your model is built using the TextTrainer API, all of this works transparently to the model class - the model just knows that it's getting some kind of word vector.
- We have implemented a number of state-of-the-art models, particularly focused around question answering systems (though we've dabbled in models for other tasks as well). The actual model code for these systems is typically 50 lines or less.
This library has several main components:
- A training module, which has a bunch of helper code for training Keras models of various kinds.
- A models module, containing implementations of actual Keras models grouped around various prediction tasks.
- A layers module, which contains code for custom Keras Layers that we have written.
- A data module, containing code for reading in data from files and converting it into numpy arrays suitable for use with Keras.
- A common module, which has assorted utilities, such as code for reading parameters.
Running Models¶
- deep_qa.run.compute_accuracy(predictions: numpy.array, labels: numpy.array)[source]¶
Computes a simple categorical accuracy metric, useful if you used score_dataset to get predictions.
- deep_qa.run.evaluate_model(param_path: str, dataset_files: typing.List[str] = None, model_class=None)[source]¶
Loads a model and evaluates it on some test set.
Parameters: param_path: str, required
A json file specifying a DeepQaModel.
dataset_files: List[str], optional (default=None)
A list of dataset files to evaluate on. If this is None, we'll evaluate from the test_files parameter in the input files. If that's also None, we'll crash.
model_class: DeepQaModel, optional (default=None)
This option is useful if you have implemented a new model class which is not one of the ones implemented in this library.
Returns: Numpy arrays of model predictions in the format of model.outputs.
- deep_qa.run.load_model(param_path: str, model_class=None)[source]¶
Loads and returns a model.
Parameters: param_path: str, required
A json file specifying a DeepQaModel.
model_class: DeepQaModel, optional (default=None)
This option is useful if you have implemented a new model class which is not one of the ones implemented in this library.
Returns: A DeepQaModel instance.
- deep_qa.run.prepare_environment(params: typing.Union[deep_qa.common.params.Params, dict])[source]¶
Sets random seeds for reproducible experiments. This may not work as expected if you use this from within a python project in which you have already imported Keras. If you use the scripts/run_model.py entry point to train models with this library, your experiments should be reproducible. If you are using this from your own project, you will want to call this function before importing Keras.
Parameters: params: Params object or dict, required.
A Params object or dict holding the json parameters.
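A minimal sketch of the intended call order; the "random_seed" key here is an assumption about the seed parameter name, not something documented above:

```python
# Call prepare_environment before Keras is imported anywhere in your own
# code, so that the seeds it sets can take effect.
from deep_qa.run import prepare_environment

prepare_environment({"random_seed": 13370})  # hypothetical seed parameter key
import keras  # deliberately imported only after seeding
```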
- deep_qa.run.run_model(param_dict: typing.Dict[str, Any], model_class=None)[source]¶
This function is the normal entry point to DeepQA. Use this to run a DeepQA model in your project. Note that if you care about exactly reproducible experiments, you should avoid importing Keras before you import and use this function, as Keras relies on random seeds which can be set in this function via a JSON specification file.
Note that this function performs training and will also evaluate the trained model on development and test sets if provided in the parameter json.
Parameters: param_dict: Dict[str, Any], required.
A parameter file specifying a DeepQaModel.
model_class: DeepQaModel, optional (default=None).
This option is useful if you have implemented a new model class which is not one of the ones implemented in this library.
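As a rough usage sketch (the model name and file paths are hypothetical placeholders; the available parameter keys are documented under Trainer and TextTrainer below):

```python
from deep_qa.run import run_model

params = {
    "model_class": "simple_tagger",              # hypothetical model name
    "train_files": ["/path/to/train.tsv"],       # placeholder paths
    "validation_files": ["/path/to/dev.tsv"],
    "model_serialization_prefix": "/tmp/demo/tagger",  # required when save_models=True
    "num_epochs": 5,
    "patience": 2,
}
run_model(params)  # trains, then evaluates on dev/test if provided
```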
- deep_qa.run.run_model_from_file(param_path: str)[source]¶
A wrapper around the run_model function which loads json from a file.
Parameters: param_path: str, required.
A json parameter file specifying a DeepQA model.
- deep_qa.run.score_dataset(param_path: str, dataset_files: typing.List[str], model_class=None)[source]¶
Loads a model from a saved parameter path and scores a dataset with it, returning the predictions.
Parameters: param_path: str, required
A json file specifying a DeepQaModel.
dataset_files: List[str]
A list of dataset files to score, the same as you would have specified as train_files or test_files in your parameter file.
model_class: DeepQaModel, optional (default=None)
This option is useful if you have implemented a new model class which is not one of the ones implemented in this library.
Returns: predictions: numpy.array
Numpy array of model predictions in the format of model.outputs (typically one array, but could be List[numpy.array] if your model has multiple outputs).
labels: numpy.array
The labels on the dataset, as read by the model. We return this so you can compute whatever metrics you want, if the data was labeled.
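For example, a short sketch that scores a held-out file and feeds the result to compute_accuracy above (paths are placeholders):

```python
from deep_qa.run import compute_accuracy, score_dataset

# Score a labeled test file with a previously trained model.
predictions, labels = score_dataset("/tmp/demo/tagger_params.json",
                                    ["/path/to/test.tsv"])
print("test accuracy:", compute_accuracy(predictions, labels))
```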
- deep_qa.run.score_dataset_with_ensemble(param_paths: typing.List[str], dataset_files: typing.List[str], model_class=None) → typing.Tuple[numpy.array, numpy.array][source]¶
Loads all of the models specified in param_paths, uses each of them to score the dataset specified by dataset_files, and averages their scores, returning an array of ensembled model predictions.
Parameters: param_paths: List[str]
A list of parameter files that were used to train models. You must have already trained the corresponding model, as we'll load it and use it in an ensemble here.
dataset_files: List[str]
A list of dataset files to score, the same as you would have specified as test_files in any one of the model parameter files.
model_class: DeepQaModel, optional (default=None)
This option is useful if you have implemented a new model class which is not one of the ones implemented in this library.
Returns: predictions: numpy.array
Numpy array of model predictions in the format of model.outputs (typically one array, but could be List[numpy.array] if your model has multiple outputs).
labels: numpy.array
The labels on the dataset, as read by the first model. We return this so you can compute whatever metrics you want, if the data was labeled. Note that if your models all represent the data differently, this will only give the labels from the first one. Hopefully the representation of the labels is consistent across the models, though; if not, the whole idea of ensembling them this way is moot, anyway.
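And a matching sketch for ensembling two trained models (paths are placeholders):

```python
from deep_qa.run import compute_accuracy, score_dataset_with_ensemble

# Average the predictions of two separately trained models over one test set.
predictions, labels = score_dataset_with_ensemble(
    ["/tmp/model_a_params.json", "/tmp/model_b_params.json"],  # placeholder paths
    ["/path/to/test.tsv"])
print("ensemble accuracy:", compute_accuracy(predictions, labels))
```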
About Trainers¶
A Trainer is the core interface to the DeepQA code. Trainers specify data, a model, and a way to train the model with the data. This module groups all of the common code related to these things, making only minimal assumptions about what kind of data you're using or what the structure of your model is. Really, a Trainer is just a nicer interface to a Keras Model; we call it something else to avoid naming confusion, and because the Trainer class provides a lot of functionality around training the model that a Keras Model doesn't.
On top of Trainer, which is a nicer interface to a Keras Model, this module provides a TextTrainer, which adds a lot of functionality for building Keras Models that work with text. We provide APIs around word embeddings, sentence encoding, reading and padding datasets, and similar things. All of the concrete models that we have so far in DeepQA inherit from TextTrainer, so understanding how to use this class is pretty important to understanding DeepQA.
We also deal with the notion of pre-training in this module. A Pretrainer is a Trainer that depends on another Trainer, building its model using pieces of the enclosed Trainer, so that training the Pretrainer updates the weights in the enclosed Trainer object.
Trainer¶
- class deep_qa.training.trainer.Trainer(params: deep_qa.common.params.Params)[source]¶
A Trainer object specifies data, a model, and a way to train the model with the data. Here we group all of the common code related to these things, making only minimal assumptions about what kind of data you're using or what the structure of your model is.
The main benefits of this class are having a common place for setting parameters related to training, actually running the training with those parameters, and code for saving and loading models.
The intended use of this class is that you construct a subclass that defines a model, overriding the abstract methods and (optionally) some of the protected methods in this class. Thus there are four kinds of methods in this class: (1) public methods, that are typically only used by deep_qa/run.py (or some other driver that you create), (2) abstract methods (beginning with _), which must be overridden by any concrete subclass, (3) protected methods (beginning with _) that you are meant to override in concrete subclasses, and (4) private methods (beginning with __) that you should not need to mess with. We only include the first three in the public docs.
Parameters: train_files: List[str], optional (default=None)
The files containing the data that should be used for training. See load_dataset_from_files() for more information.
validation_files: List[str], optional (default=None)
The files containing the data that should be used for validation, if you do not want to use a split of the training data for validation. The default of None means to just use the validation_split parameter to split the training data for validation.
test_files: List[str], optional (default=None)
The files containing the data that should be used for evaluation. The default of None means to just not perform test set evaluation.
max_training_instances: int, optional (default=None)
Upper limit on the number of training instances. If this is set, and we get more than this, we will truncate the data. Mostly useful for testing things out on small datasets before running them on large datasets.
max_validation_instances: int, optional (default=None)
Upper limit on the number of validation instances, analogous to max_training_instances.
max_test_instances: int, optional (default=None)
Upper limit on the number of test instances, analogous to max_training_instances.
train_steps_per_epoch: int, optional (default=None)
If create_data_arrays() returns a generator instead of actual arrays, how many steps should we run from this generator before declaring an "epoch" finished? The default here is reasonable - if this is None, we will set it from the data.
validation_steps: int, optional (default=None)
Like train_steps_per_epoch, but for validation data.
test_steps: int, optional (default=None)
Like train_steps_per_epoch, but for test data.
save_models: bool, optional (default=True)
Should we save the models that we train? If this is True, you are required to also set the model_serialization_prefix parameter, or the code will crash.
model_serialization_prefix: str, optional (default=None)
Prefix for saving and loading model files. Must be set if save_models is True.
num_gpus: int, optional (default=1)
Number of GPUs to use. In DeepQA we use Data Parallelism, meaning that we create copies of the full model for each GPU, allowing the batch size of your model to be scaled depending on the number of GPUs. Note that using multiple GPUs effectively increases your batch size by the number of GPUs you have, meaning that other code which depends on the batch size will be affected - for example, if you are using dynamic padding, the batches will be larger and hence more padded, as the dataset is chunked into fewer overall batches.
batch_size: int, optional (default=32)
Batch size to use when training.
num_epochs: int, optional (default=20)
Number of training epochs.
validation_split: float, optional (default=0.1)
Amount of training data to use for validation. If validation_files is not set, we will split the training data into train/dev, using this proportion as dev. If validation_files is set, this parameter gets ignored.
optimizer: str or Dict[str, Any], optional (default='adam')
If this is a str, it must correspond to an optimizer available in Keras (see the list in deep_qa.training.optimizers). If it is a dictionary, it must contain a "type" key, with a value that is one of the optimizers in that list. The remaining parameters in the dict are passed as kwargs to the optimizer's constructor.
loss: str, optional (default='categorical_crossentropy')
The loss function to pass to model.fit(). This is currently limited to only loss functions that are available as strings in Keras. If you want to use a custom loss function, simply override self.loss in the constructor of your model, after the call to super().__init__.
metrics: List[str], optional (default=['accuracy'])
The metrics to evaluate and print after each epoch of training. This is currently limited to only metrics that are available as strings in Keras. If you want to use a custom metric, simply override self.metrics in the constructor of your model, after the call to super().__init__.
validation_metric: str, optional (default='val_acc')
Metric to monitor on the validation data for things like early stopping and saving the best model.
patience: int, optional (default=1)
Number of epochs to be patient before early stopping. I.e., if the validation_metric does not improve for this many epochs, we will stop training.
fit_kwargs: Dict[str, Any], optional (default={})
A dict of additional arguments to Keras' model.fit() method, in case you want to set something that we don't already have options for. These get added to the options already captured by other arguments.
tensorboard_log: str, optional (default=None)
If set, we will output tensorboard log information here.
tensorboard_histogram_freq: int, optional (default=0)
Tensorboard histogram frequency: note that activating the tensorboard histogram (frequency > 0) can drastically increase model training time. Please set the frequency with the desired runtime in mind.
debug: Dict[str, Any], optional (default={})
This should be a dict, containing the following keys:
- “layer_names”, which has as a value a list of names that must match layer names in the model built by this Trainer.
- “data”, which has as a value either “training”, “validation”, or a list of file names. If you give “training” or “validation”, we’ll use those datasets, otherwise we’ll load data from the provided files. Note that currently “validation” only works if you provide validation files, not if you’re just using Keras to split the training data.
- “masks”, an optional key that functions identically to “layer_names”, except we output the mask at each layer given here.
show_summary_with_masking_info: bool, optional (default=False)
This is a debugging setting, mostly - we have written a custom model.summary() method that supports showing masking info, to help understand what’s going on with the masks.
Public methods¶
- Trainer.load_data_arrays(data_files: typing.List[str], batch_size: int = None, max_instances: int = None) → typing.Tuple[deep_qa.data.datasets.dataset.Dataset, numpy.array, numpy.array][source]¶
Loads a Dataset from a list of files, then converts it into numpy arrays for both inputs and outputs, returning all three of these to you. This literally just calls self.load_dataset_from_files, then self.create_data_arrays; it's just a convenience method if you want to do both of these at the same time, and also lets you truncate the dataset if you want.
Note that if you have any kind of state in your model that depends on a training dataset (e.g., a vocabulary, or padding dimensions) those must be set prior to calling this method.
Parameters: data_files: List[str]
The files to load. These will get passed to self.load_dataset_from_files(), which subclasses must implement.
batch_size: int, optional (default = None)
Optionally pass a specific batch size to load the data arrays with. If this is not specified, we use the default self.batch_size attribute. This is a parameter so you can specify different batch sizes for training vs validation, for instance, which is useful if you are doing multi-gpu training.
max_instances: int, optional (default=None)
If not None, we will restrict the dataset to only this many instances. This is mostly useful for testing models out on subsets of your data.
Returns: dataset: Dataset
A Dataset object containing the instances read from the data files
input_arrays: numpy.array
An array or tuple of arrays suitable to be passed as inputs x to Keras' model.fit(x, y), model.evaluate(x, y) or model.predict(x) methods
label_arrays: numpy.array
An array or tuple of arrays suitable to be passed as outputs y to Keras' model.fit(x, y) or model.evaluate(x, y) methods
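A hedged sketch of the expected call order, where trainer stands for an instance of some hypothetical concrete Trainer subclass and the file lists are placeholders:

```python
# Dataset-dependent model state (e.g., a vocabulary) must be set from the
# training data before converting any files to arrays.
train_dataset = trainer.load_dataset_from_files(train_files)
trainer.set_model_state_from_dataset(train_dataset)  # e.g., fix the vocabulary

# Now it is safe to convert other files into arrays in one call:
dataset, inputs, labels = trainer.load_data_arrays(validation_files, batch_size=64)
```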
Abstract methods¶
If you're doing NLP, TextTrainer implements most of these, so you shouldn't have to worry about them. The only one it doesn't is _build_model (though it adds some other abstract methods that you might have to worry about).
- Trainer.create_data_arrays(dataset: deep_qa.data.datasets.dataset.IndexedDataset, batch_size: int = None) → typing.Tuple[numpy.array, numpy.array][source]¶
Takes a raw dataset and converts it into training inputs and labels that can be used to either train a model or make predictions. Depending on parameters passed to the constructor of this Trainer, this could either return two actual array objects, or a single generator that generates batches of two array objects.
Parameters: dataset: Dataset
A Dataset of the same format as read by load_dataset_from_files() (we will call this directly with the output from that method, in fact)
batch_size: int, optional (default = None)
The batch size with which the dataset should be created. If this is None, the default self.batch_size will be used.
Returns: input_arrays: numpy.array or Tuple[numpy.array]
label_arrays: numpy.array, Tuple[numpy.array], or None
generator: a Python generator returning Tuple[input_arrays, label_arrays]
If this is returned, it is the only return value. We either return a Tuple[input_arrays, label_arrays], or this generator.
- Trainer.load_dataset_from_files(files: typing.List[str]) → deep_qa.data.datasets.dataset.Dataset[source]¶
Given a list of file inputs, load a raw dataset from the files. This is a list because some datasets are specified in more than one file (e.g., a file containing the instances, and a file containing background information about those instances).
- Trainer.score_dataset(dataset: deep_qa.data.datasets.dataset.Dataset) → typing.Tuple[numpy.array, numpy.array][source]¶
Takes a Dataset, indexes it, and returns the output of evaluating the model on all instances, and labels for the instances from the data, if they were given. The specifics of the numpy array that are returned depend on the model and the instance type in the dataset.
Parameters: dataset: Dataset
A Dataset read by Trainer.load_dataset_from_files().
Returns: predictions: numpy.array
Predictions for each Instance in the Dataset. This could actually be a tuple/list of arrays, if your model has multiple outputs
labels: numpy.array
The labels for each Instance in the Dataset, if there were any (this will be None if there were no labels). We return this so you can easily compute metrics over these predictions if you wish. It's hard to get numpy arrays with the labels from a non-indexed-and-padded Dataset, so we return it here so you don't have to do any funny business to get the label array.
- Trainer.set_model_state_from_dataset(dataset: deep_qa.data.datasets.dataset.Dataset)[source]¶
Given a raw Dataset object, set whatever model state is necessary. The most obvious use case for this is for computing a vocabulary in TextTrainer. Note that this is not an IndexedDataset, and you should not make it one. Use set_model_state_from_indexed_dataset() for setting state that depends on the data having already been indexed; otherwise you'll duplicate the work of doing the indexing.
- Trainer.set_model_state_from_indexed_dataset(dataset: deep_qa.data.datasets.dataset.IndexedDataset)[source]¶
Given an IndexedDataset, set whatever model state is necessary. This is typically stuff around padding.
- Trainer._build_model() → deep_qa.training.models.DeepQaModel[source]¶
Constructs and returns a DeepQaModel (which is a wrapper around a Keras Model) that will take the output of self._get_training_data as input, and produce as output a true/false decision for each input. Note that in the multiple gpu case, this function will be called multiple times for the different GPUs. As such, you should be wary of this function having side effects unrelated to building a computation graph.
The returned model will be used to call model.fit(train_input, train_labels).
- Trainer._set_params_from_model()[source]¶
Called after a model is loaded, this lets you update member variables that contain model parameters, like max sentence length, that are not stored as weights in the model object. This is necessary if you want to process a new data instance to be compatible with the model for prediction, for instance.
- Trainer._dataset_indexing_kwargs() → typing.Dict[str, typing.Any][source]¶
In order to index a dataset, we may need some parameters (e.g., an object that stores the vocabulary of your model, in order to convert words into indices). You can pass those here, or return an empty dictionary if there's nothing. These will get passed to Dataset.to_indexed_dataset().
Protected methods¶
- Trainer._get_callbacks()[source]¶
Returns a set of Callbacks which are used to perform various functions within Keras' .fit method. Here, we use an early stopping callback to add patience with respect to the validation metric and a Lambda callback which performs the model specific callbacks which you might want to build into a model, such as re-encoding some background knowledge.
Additionally, there is also functionality to create Tensorboard log files. These can be visualised using 'tensorboard --logdir /path/to/log/files' after training.
- classmethod Trainer._get_custom_objects()[source]¶
If you've used any Layers that Keras doesn't know about, you need to specify them in this dictionary, so we can load them correctly.
- Trainer._instance_debug_output(instance: deep_qa.data.instances.instance.Instance, outputs: typing.Dict[str, numpy.array]) → str[source]¶
This method takes an Instance and all of the debug outputs for that Instance, puts them into some human-readable format, and returns that as a string. outputs will have one key corresponding to each item in the debug.layer_names parameter given to the constructor of this object.
The default here is pass instead of raise NotImplementedError, because you’re not required to implement debugging for your model.
- Trainer._load_auxiliary_files()[source]¶
Called during model loading. If you have some auxiliary pickled object, such as an object storing the vocabulary of your model, you can load it here.
- Trainer._output_debug_info(output_dict: typing.Dict[str, numpy.array], epoch: int)[source]¶
- Trainer._overall_debug_output(output_dict: typing.Dict[str, numpy.array]) → str[source]¶
- Trainer._post_epoch_hook(epoch: int)[source]¶
This method gets called directly after model.fit(), before making any early stopping decisions. If you want to modify anything after each iteration (e.g., computing a different kind of validation loss to use for early stopping, or just computing and printing accuracy on some other held out data), you can do that here. If you require extra parameters, use calls to local methods rather than passing new parameters, as this hook is run via a Keras Callback, which is fairly strict in its interface.
- Trainer._pre_epoch_hook(epoch: int)[source]¶
This method gets called before each epoch of training. If you want to do any kind of processing in between epochs (e.g., updating the training data for whatever reason), here is your chance to do so.
- Trainer._save_auxiliary_files()[source]¶
Called after training. If you have some auxiliary object, such as an object storing the vocabulary of your model, you can save it here. The model config is saved by default.
- Trainer._uses_data_generators()[source]¶
Training models with Keras requires a different API if you produce data in batches using a generator than if you just provide one big numpy array with all of your data, which Keras has to split into batches. This method tells us which Keras API we should use. If your model class produces data using a generator, return True here; otherwise, return False. The default implementation just returns False.
TextTrainer¶
- class deep_qa.training.text_trainer.TextTrainer(params: deep_qa.common.params.Params)[source]¶
This is a Trainer that deals with word sequences as its fundamental data type (any TextDataset or TextInstance subtype is fine). That means we have to deal with padding, with converting words (or characters) to indices, and encoding word sequences. This class adds methods on top of Trainer to deal with all of that stuff.
This class has five kinds of methods:
- protected methods that are overridden from Trainer, and which you shouldn't need to worry about
- utility methods for building models, intended for use by subclasses
- abstract methods that determine a few key points of behavior in concrete subclasses (e.g., what your input data type is)
- model-specific methods that you might have to override, depending on what your model looks like - similar to (3), but simple models don't need to override these
- private methods that you shouldn't need to worry about
There are two main ways you're intended to interact with this class, then: by calling the utility methods when building your model, and by customizing the behavior of your concrete model by using the parameters to this class.
Parameters: embeddings: Dict[str, Any], optional (default=50 dim word embeddings, 8 dim character embeddings, 0.5 dropout on both)
These parameters specify the kind of embeddings to use for words, characters, tags, or whatever you want to embed. This dictionary behaves similarly to the encoder and seq2seq_encoder parameter dictionaries. Valid keys are dimension, dropout, pretrained_file, fine_tune, and project. The value for dimension is an int specifying the dimensionality of the embedding (default 50 for words, 8 for characters); dropout is a float, specifying the amount of dropout to use on the embedding layer (default 0.5); pretrained_file is a (string) path to a glove-formatted file containing pre-trained embeddings; fine_tune is a boolean specifying whether the pretrained embeddings should be trainable (default False); and project is a boolean specifying whether to add a projection layer after the embedding layer (only really useful in conjunction with pre-trained embeddings, to get them into a lower-dimensional space; default False).
data_generator: Dict[str, Any], optional (default=None)
If not None, we will pass these parameters to a DataGenerator object to create data batches, instead of creating one big array for all of our training data. See DataGenerator for the available options here. Note that in order to take full advantage of the capabilities of a DataGenerator, you should make sure your model correctly implements _set_padding_lengths(), get_padding_lengths(), get_padding_memory_scaling(), and get_instance_sorting_keys(). Also note that some of the things DataGenerator does can change the behavior of your learning algorithm, so you should think carefully about how exactly you want batches to be structured before you choose these parameters.
num_sentence_words: int, optional (default=None)
Upper limit on length of word sequences in the training data. Ignored during testing (we use the value set at training time, either from this parameter or from a loaded model). If this is not set, we’ll calculate a max length from the data.
num_word_characters: int, optional (default=None)
Upper limit on length of words in the training data. Only applicable for “words and characters” text encoding.
tokenizer: Dict[str, Any], optional (default={})
Which tokenizer to use for TextInstances. See deep_qa.data.tokenizers.tokenizer for more information.
encoder: Dict[str, Dict[str, Any]], optional (default={'default': {}})
These parameters specify the kind of encoder used to encode any word sequence input. An encoder takes a sequence of vectors and returns a single vector.
If given, this must be a dict, where each key is a name that can be used for encoders in the model, and the value corresponding to the key is a set of parameters that will be passed on to the constructor of the encoder. We will use the "type" key in this dict (which must match one of the keys in encoders) to determine the type of the encoder, then pass the remaining args to the encoder constructor. A sketch of such a dictionary is shown after this parameter list.
Hint: Use "lstm" or "cnn" for sentences, "treelstm" for logical forms, and "bow" for either.
encoder_fallback_behavior: string, optional (default="crash")
Determines the behavior when an encoder is asked for by name, but you have not given parameters for an encoder with that name. See _get_encoder for more information.
seq2seq_encoder: Dict[str, Dict[str, Any]], optional (default={'default': {'encoder_params': {}, 'wrapper_params': {}}})
Like encoder, except seq2seq encoders return a sequence of vectors instead of a single vector (the difference between our "encoders" and "seq2seq encoders" is the difference in Keras between LSTM() and LSTM(return_sequences=True)).
seq2seq_encoder_fallback_behavior: string, optional (default="crash")
Determines the behavior when a seq2seq encoder is asked for by name, but you have not given parameters for an encoder with that name. See _get_seq2seq_encoder for more information.
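As promised above, here is a sketch of an encoder parameter dictionary. The encoder names and constructor parameters are illustrative assumptions, not the library's only options:

```python
# A hypothetical "encoder" parameter dictionary defining two named encoders.
# _get_encoder("default") and _get_encoder("question_encoder") would each
# build (and cache) the corresponding encoder from these parameters.
encoder = {
    "default": {"type": "lstm", "units": 100},
    "question_encoder": {"type": "cnn", "num_filters": 50},
}
```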
Utility methods¶
These methods are intended for use by subclasses, mostly in your _build_model implementation.
- TextTrainer._get_sentence_shape(sentence_length: int = None) → typing.Tuple[int][source]¶
Returns a tuple specifying the shape of a tensor representing a sentence. This is not necessarily just (self.num_sentence_words,), because different text_encodings lead to different tensor shapes. If you have an input that is a sequence of words, you need to call this to get the shape to pass to an Input layer. If you don't, your model won't work correctly for all tokenizers.
- TextTrainer._embed_input(input_layer: keras.engine.topology.Layer, embedding_suffix: str = '')[source]¶
This function embeds a word sequence input, using an embedding defined by embedding_suffix. You should call this function in your _build_model method any time you want to convert word indices into word embeddings. Note that if this is used in conjunction with _get_sentence_shape, we will do the correct thing for whatever Tokenizer you use. The actual input to this might be words and characters, and we might actually do a concatenation of a word embedding and a character-level encoder. All of this is handled transparently to your concrete model subclass, if you use the API correctly, calling _get_sentence_shape() to get the shape for your Input layer, and passing that input layer into this _embed_input() method.
We need to take the input Layer here, instead of just returning a Layer that you can use as you wish, because we might have to apply several layers to the input, depending on the parameters you specified for embedding things. So we return, essentially, embedding(input_layer).
The input layer can have arbitrary shape, as long as it ends with a word sequence. For example, you could pass in a single sentence, a set of sentences, or a set of sets of sentences, and we will handle them correctly.
Internally, we will create a dictionary mapping embedding names to embedding layers, so if you have several things you want to embed with the same embedding layer, be sure you use the same name each time (or just don't pass a name, which accomplishes the same thing). If for some reason you want to have different embeddings for different inputs, use a different name for the embedding.
In this function, we pass the work off to self.tokenizer, which might need to do some additional processing to actually give you a word embedding (e.g., if your text encoder uses both words and characters, we need to run the character encoder and concatenate the result with a word embedding).
Note that the embedding_suffix parameter is a suffix to whatever name the tokenizer will give to the embeddings it creates. Typically, the tokenizer will use the name words, though it could also use characters, or something else. So if you pass _A for embedding_suffix, you will end up with actual embedding names like words_A and characters_A. These are the keys you need to specify in your parameter file, for embedding sizes etc. When constructing actual Embedding layers, we will further append the string _embedding, so the layer would be named words_A_embedding.
- TextTrainer._get_encoder(name='default', fallback_behavior: str = None)[source]¶
This method is intended to be used in your _build_model implementation, any time you want to convert a sequence of vectors into a single vector. The encoder name corresponds to entries in the encoder parameter passed to the constructor of this object, allowing you to customize the kind and behavior of the encoder just through parameters.
A sentence encoder takes as input a sequence of word embeddings, and returns as output a single vector encoding the sentence. This is typically either a simple RNN or an LSTM, but could be more complex, if the "sentence" is actually a logical form.
Parameters: name: str, optional (default="default")
The name of the encoder. Multiple calls to _get_encoder using the same name will return the same encoder. To get parameters for creating the encoder, we look in self.encoder_params, which is specified by the encoder parameter in self.__init__. If name is not a key in self.encoder_params, the behavior is defined by the fallback_behavior parameter.
fallback_behavior: str, optional (default=None)
Determines what to do when name is not a key in self.encoder_params. If you pass None (the default), we will use self.encoder_fallback_behavior, specified by the encoder_fallback_behavior parameter to self.__init__. There are three options:
- "crash": raise an error. This is the default for self.encoder_fallback_behavior. The intention is to help you find bugs - if you specify a particular encoder name in self._build_model without giving a fallback behavior, you probably wanted to use a particular set of parameters, so we crash if they are not provided.
- "use default params": In this case, we return a new encoder created with self.encoder_params["default"].
- "use default encoder": In this case, we reuse the encoder created with self.encoder_params["default"]. This effectively changes the name parameter to "default" when the given name is not in self.encoder_params.
- TextTrainer._get_seq2seq_encoder(name='default', fallback_behavior: str = None)[source]¶
This method is intended to be used in your _build_model implementation, any time you want to convert a sequence of vectors into another sequence of vectors. The encoder name corresponds to entries in the seq2seq_encoder parameter passed to the constructor of this object, allowing you to customize the kind and behavior of the encoder just through parameters.
A seq2seq encoder takes as input a sequence of vectors, and returns as output a sequence of vectors. This method is essentially identical to _get_encoder, except that it gives an encoder that returns a sequence of vectors instead of a single vector.
Parameters: name: str, optional (default="default")
The name of the encoder. Multiple calls to _get_seq2seq_encoder using the same name will return the same encoder. To get parameters for creating the encoder, we look in self.seq2seq_encoder_params, which is specified by the seq2seq_encoder parameter in self.__init__. If name is not a key in self.seq2seq_encoder_params, the behavior is defined by the fallback_behavior parameter.
fallback_behavior: str, optional (default=None)
Determines what to do when name is not a key in self.seq2seq_encoder_params. If you pass None (the default), we will use self.seq2seq_encoder_fallback_behavior, specified by the seq2seq_encoder_fallback_behavior parameter to self.__init__. There are three options:
- "crash": raise an error. This is the default for self.seq2seq_encoder_fallback_behavior. The intention is to help you find bugs - if you specify a particular encoder name in self._build_model without giving a fallback behavior, you probably wanted to use a particular set of parameters, so we crash if they are not provided.
- "use default params": In this case, we return a new encoder created with self.seq2seq_encoder_params["default"].
- "use default encoder": In this case, we reuse the encoder created with self.seq2seq_encoder_params["default"]. This effectively changes the name parameter to "default" when the given name is not in self.seq2seq_encoder_params.
- TextTrainer._set_text_lengths_from_model_input(input_slice)[source]¶
Given an input slice (a tuple) from a model representing the max length of the sentences and the max length of each word, set the padding max lengths. This gets called when loading a model, and is necessary to get padding correct when using loaded models. Subclasses need to call this in their _set_padding_lengths_from_model method.
Parameters: input_slice: tuple
A slice from a concrete model class that represents an input word sequence. The tuple must be of length one or two, and the first dimension should correspond to the length of the sentences while the second dimension (if provided) should correspond to the max length of the words in each sentence.
Abstract methods¶
You must implement these methods in your model (along with _build_model()). The simplest concrete TextTrainer implementations only have four methods: __init__, _instance_type (typically one line), _set_padding_lengths_from_model (also typically one line, for simple models), and _build_model. See TrueFalseModel and SimpleTagger for examples, or the sketch below.
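Putting those pieces together, here is a hedged sketch of what such a subclass can look like. It is loosely modeled on TrueFalseModel; MyInstance is a hypothetical placeholder for whichever concrete TextInstance subclass matches your data format, and the exact DeepQaModel constructor keywords may differ by version.

```python
from keras.layers import Dense, Input

from deep_qa.training.models import DeepQaModel
from deep_qa.training.text_trainer import TextTrainer


class SimpleSentenceClassifier(TextTrainer):
    """Hypothetical two-class sentence classifier."""

    def _instance_type(self):
        return MyInstance  # hypothetical concrete Instance class

    def _build_model(self) -> DeepQaModel:
        # _get_sentence_shape and _embed_input cooperate so this works for
        # any tokenizer (words, characters, or both).
        sentence_input = Input(shape=self._get_sentence_shape(), dtype='int32')
        word_embeddings = self._embed_input(sentence_input)
        sentence_encoding = self._get_encoder()(word_embeddings)
        predictions = Dense(2, activation='softmax')(sentence_encoding)
        return DeepQaModel(inputs=sentence_input, outputs=predictions)

    def _set_padding_lengths_from_model(self):
        # The input shape's trailing dimensions are the sentence (and maybe
        # word) lengths the model was trained with.
        self._set_text_lengths_from_model_input(self.model.get_input_shape_at(0)[1:])
```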
- TextTrainer._instance_type() → deep_qa.data.instances.instance.Instance[source]¶
When reading datasets, what Instance type should we create? The Instance class contains code that creates actual numpy arrays, so this instance type determines the inputs that you will get to your model, and the outputs that are used for training.
- TextTrainer._set_padding_lengths_from_model()[source]¶
This gets called when loading a saved model. It is analogous to _set_padding_lengths, but needs to set all of the values set in that method just by inspecting the loaded model. If we didn't have this, we would not be able to correctly pad data after loading a model.
Semi-abstract methods¶
You’ll likely need to override these methods, if you have anything more complex than a single sentence as input.
- TextTrainer.get_padding_lengths() → typing.Dict[str, int][source]¶
This is about padding. Any solver will have some number of things that need padding in order to make consistently-sized data arrays, like the length of a sentence. This method returns a dictionary of all of those things, mapping a length key to an int.
If any of the entries in this dictionary is None, the padding code will calculate a padding length from the data itself. This could either be a good idea or a bad idea - if you have outliers in your data, you could be wasting a whole lot of memory and computation time if you pad the whole dataset to the size of the outlier. On the other hand, if you do batch-specific padding, this can save you a whole lot of time, if you group batches by similar lengths.
Here we return the lengths that are applicable to encoding words and sentences. If you have additional padding dimensions, call super().get_padding_lengths() and then update the dictionary.
- TextTrainer.get_instance_sorting_keys() → typing.List[str][source]¶
If we're using dynamic padding, we want to group the instances by padding length, so that we minimize the amount of padding necessary per batch. This variable sets what exactly gets sorted by. We'll call get_padding_lengths() on each instance, pull out these keys, and sort by them in the order specified. You'll want to override this in your model class if you have more complex models.
The default implementation is to sort first by num_sentence_words, then by num_word_characters (if applicable).
- TextTrainer.get_padding_memory_scaling(padding_lengths: typing.Dict[str, int]) → int[source]¶
This method is for computing adaptive batch sizes. We assume that memory usage is a function that looks like this: M = b * O(p) * c, where M is the memory usage, b is the batch size, c is some constant that depends on how much GPU memory you have and various model hyperparameters, and O(p) is a function outlining how memory usage asymptotically varies with the padding lengths. Our approach will be to let the user effectively set M/c using the adaptive_memory_usage_constant parameter in DataGenerator. The model (this method) specifies O(p), so we can solve for the batch size b. The more specific you get in specifying O(p) in this function, the better a job we can do in optimizing memory usage.
Parameters: padding_lengths: Dict[str, int]
Dictionary containing padding lengths, mapping keys like num_sentence_words to ints. This method computes a function of these ints.
Returns: O(p): int
The big-O complexity of the model, evaluated with the specific ints given in the padding_lengths dictionary.
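For instance, a hedged sketch of an override for a model whose memory usage grows linearly in both sentence length and word length (the padding keys match get_instance_sorting_keys above):

```python
from typing import Dict

# Inside your TextTrainer subclass: if memory usage grows linearly in both
# the sentence length and the word length, O(p) is just their product.
def get_padding_memory_scaling(self, padding_lengths: Dict[str, int]) -> int:
    return padding_lengths["num_sentence_words"] * padding_lengths["num_word_characters"]
```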
- TextTrainer._set_padding_lengths(dataset_padding_lengths: typing.Dict[str, int])[source]¶
This is about padding. Any model will have some number of things that need padding in order to make a consistent set of input arrays, like the length of a sentence. This method sets those variables given a dictionary of lengths from a dataset.
Note that you might choose not to update some of these lengths, either because you want to keep the model flexible to allow for dynamic (batch-specific) padding, or because you’ve set a hard limit in the class parameters and don’t want to change it.
Overridden Trainer methods¶
You probably don't need to override these, except possibly _get_custom_objects. The rest of them you shouldn't need to worry about at all (except to call them, if they are part of the external Trainer API), but we document them here for completeness.
- TextTrainer.create_data_arrays(dataset: deep_qa.data.datasets.dataset.IndexedDataset, batch_size: int = None)[source]¶
Takes a raw dataset and converts it into training inputs and labels that can be used to either train a model or make predictions. Depending on parameters passed to the constructor of this Trainer, this could either return two actual array objects, or a single generator that generates batches of two array objects.
Parameters: dataset: Dataset
A Dataset of the same format as read by load_dataset_from_files() (we will call this directly with the output from that method, in fact)
batch_size: int, optional (default = None)
The batch size with which the dataset should be created. If this is None, the default self.batch_size will be used.
Returns: input_arrays: numpy.array or Tuple[numpy.array]
label_arrays: numpy.array, Tuple[numpy.array], or None
generator: a Python generator returning Tuple[input_arrays, label_arrays]
If this is returned, it is the only return value. We either return a Tuple[input_arrays, label_arrays], or this generator.
- TextTrainer.load_dataset_from_files(files: typing.List[str])[source]¶
This method assumes you have a TextDataset that can be read from a single file. If you have something more complicated, you'll need to override this method (though, a solver that has background information could call this method, then do additional processing on the rest of the list, for instance).
- TextTrainer.score_dataset(dataset: deep_qa.data.datasets.dataset.TextDataset)[source]¶
See the superclass docs (Trainer.score_dataset()) for usage info. Just a note here that we do not use data generators for this method, even if you've said elsewhere that you want to use them, so that we can easily return the labels for the data. This means that we'll do whole-dataset padding, and this could be slow. We could probably fix this, but it's good enough for now.
- TextTrainer.set_model_state_from_dataset(dataset: deep_qa.data.datasets.dataset.TextDataset)[source]¶
Given a raw Dataset object, set whatever model state is necessary. The most obvious use case for this is for computing a vocabulary in TextTrainer. Note that this is not an IndexedDataset, and you should not make it one. Use set_model_state_from_indexed_dataset() for setting state that depends on the data having already been indexed; otherwise you'll duplicate the work of doing the indexing.
- TextTrainer.set_model_state_from_indexed_dataset(dataset: deep_qa.data.datasets.dataset.IndexedDataset)[source]¶
Given an IndexedDataset, set whatever model state is necessary. This is typically stuff around padding.
- TextTrainer._load_auxiliary_files()[source]¶
Called during model loading. If you have some auxiliary pickled object, such as an object storing the vocabulary of your model, you can load it here.
- TextTrainer._overall_debug_output(output_dict: typing.Dict[str, numpy.array]) → str[source]¶
We'll do something different here: if "embedding" is in output_dict, we'll output the embedding matrix at the top of the debug file. Note that this could be _huge_ - you should only do this for debugging on very simple datasets.
- TextTrainer._save_auxiliary_files()[source]¶
Called after training. If you have some auxiliary object, such as an object storing the vocabulary of your model, you can save it here. The model config is saved by default.
- TextTrainer._set_params_from_model()[source]¶
Called after a model is loaded, this lets you update member variables that contain model parameters, like max sentence length, that are not stored as weights in the model object. This is necessary if you want to process a new data instance to be compatible with the model for prediction, for instance.
- TextTrainer._uses_data_generators()[source]¶
Training models with Keras requires a different API if you produce data in batches using a generator than if you just provide one big numpy array with all of your data, which Keras has to split into batches. This method tells us which Keras API we should use. If your model class produces data using a generator, return True here; otherwise, return False. The default implementation just returns False.
Multi GPU Training¶
- deep_qa.training.multi_gpu.compile_parallel_model(model_builder: typing.Callable[[], deep_qa.training.models.DeepQaModel], compile_arguments: deep_qa.common.params.Params) → deep_qa.training.models.DeepQaModel[source]¶
This function compiles a multi-gpu version of your model. This is done using data parallelism, by making N copies of the model on the different GPUs, all of which share parameters. Gradients are updated synchronously, using the average gradient from all of the outputs of the various models. This effectively allows you to scale a model up to batch sizes which cannot fit on a single GPU.
This method returns a "primary" copy of the model, whose training function (the one run by Keras) has been overridden to be a training function which trains all of the towers of the model. The other towers never have their training functions initialised or used and are completely hidden from the user. The returned model can be serialised in the same way as any other model and has no dependency on multiple gpus being available when it is loaded.
Note that by calling this function, the model_builder function will be called multiple times for the different GPUs. As such, you should be wary of this function having side effects unrelated to building a computation graph.
Parameters: model_builder: Callable[[], DeepQaModel], required.
A function which returns an uncompiled DeepQaModel.
compile_arguments: Params, required
Model parameters which are passed to compile. These should be the same as if you were building a single GPU model, with the exception of the num_gpus field.
Returns: The "primary" copy of the DeepQaModel, which holds the training function that trains all of the copies of the model.
Misc¶
Models¶
- class deep_qa.training.models.DeepQaModel(*args, **kwargs)[source]¶
Bases: keras.engine.training.Model
This is a Model that adds functionality to Keras' Model class. In particular, we use tensorflow optimisers directly in order to make use of sparse gradient updates, which Keras does not handle. Additionally, we provide some nicer summary functions which include mask information. We are overriding key components of Keras here and you should probably have a pretty good grip on the internals of Keras before you change stuff below, as there could be unexpected consequences.
- _fit_loop(f: callable, ins: typing.List[numpy.array], out_labels: typing.List[str] = None, batch_size: int = 32, epochs: int = 100, verbose: int = 1, callbacks: typing.List[keras.callbacks.Callback] = None, val_f: callable = None, val_ins: typing.List[numpy.array] = None, shuffle: bool = True, callback_metrics: typing.List[str] = None, initial_epoch: int = 0)[source]¶
Abstract fit function which preprocesses and batches data before training a model. We override this keras backend function to support multi-gpu training via splitting a large batch size across multiple gpus. This function is broadly the same as the Keras backend version aside from this - changed elements have corresponding comments attached.
Note that this should not be called directly - it is used by calling model.fit().
Assume that step_function returns a list, labeled by out_labels.
Parameters: f: A callable Step or a Keras Function, required.
A DeepQA Step or Keras Function returning a list of tensors.
ins: List[numpy.array], required.
The list of tensors to be fed to step_function.
out_labels: List[str], optional (default = None).
The display names of the outputs of step_function.
batch_size: int, optional (default = 32).
The integer batch size.
epochs: int, optional (default = 100).
Number of times to iterate over the data.
verbose: int, optional, (default = 1)
Verbosity mode, 0, 1 or 2.
callbacks: List[Callback], optional (default = None).
A list of Keras callbacks to be called during training.
val_f: A callable Step or a Keras Function, optional (default = None).
The Keras function to call for validation.
val_ins: List[numpy.array], optional (default = None)
A list of tensors to be fed to val_f.
shuffle: bool, optional (default = True).
Whether to shuffle the data at the beginning of each epoch.
callback_metrics: List[str], optional, (default = None).
A list of strings, the display names of the validation metrics, passed to the callbacks. They should be the concatenation of the display names of the outputs of f and the display names of the outputs of val_f.
initial_epoch: int, optional (default = 0).
The epoch at which to start training (useful for resuming a previous training run).
Returns: A Keras History object.
- _make_train_function()[source]¶
We override this method so that we can use tensorflow optimisers directly. This is desirable as tensorflow handles gradients of sparse tensors efficiently.
- _prepare_callbacks(callbacks: typing.List[keras.callbacks.Callback], val_ins: typing.List[numpy.array], epochs: int, batch_size: int, num_train_samples: int, callback_metrics: typing.List[str], do_validation: bool, verbose: int)[source]¶
Sets up Keras callbacks to perform various monitoring functions during training.
- compile(params: deep_qa.common.params.Params)[source]¶
The only reason we are overriding this method is because keras automatically wraps our tensorflow optimiser in a keras wrapper, which we don't want. We override the only method in Model which uses this attribute, _make_train_function, which raises an error if compile is not called first. As we move towards using a Tensorflow first optimisation loop, more things will be added here which add functionality to the way Keras runs tensorflow Session calls.
- train_on_batch(x: typing.List[numpy.array], y: typing.List[numpy.array], sample_weight: typing.List[numpy.array] = None, class_weight: typing.Dict[int, numpy.array] = None)[source]¶
Runs a single gradient update on a single batch of data. We override this method in order to provide multi-gpu training capability.
Parameters: x: List[numpy.array], required
Numpy array of training data, or list of Numpy arrays if the model has multiple inputs. If all inputs in the model are named, you can also pass a dictionary mapping input names to Numpy arrays.
y: List[numpy.array], required
A Numpy array of labels, or list of Numpy arrays if the model has multiple outputs. If all outputs in the model are named, you can also pass a dictionary mapping output names to Numpy arrays.
sample_weight: List[numpy.array], optional (default = None)
Optional array of the same length as x, containing weights to apply to the model's loss for each sample. In the case of temporal data, you can pass a 2D array with shape (samples, sequence_length), to apply a different weight to every timestep of every sample. In this case you should make sure to specify sample_weight_mode="temporal" in compile().
class_weight: optional dictionary
Optional dictionary mapping class indices (integers) to a weight (float) to apply to the model's loss for the samples from this class during training. This can be useful to tell the model to "pay more attention" to samples from an under-represented class.
Returns: Scalar training loss (if the model has a single output and no metrics) or list of scalars (if the model has multiple outputs and/or metrics). The attribute model.metrics_names will give you the display labels for the scalar outputs.
Optimizers¶
It turns out that Keras’ design is somewhat crazy*, and there is no list of optimizers that you can just import from Keras. So, this module specifies a list, and a helper function or two for dealing with optimizer parameters. Unfortunately, this means that we have a list that must be kept in sync with Keras. Oh well.
* Have you seen their get_from_module() method? See here: https://github.com/fchollet/keras/blob/6e42b0e4a77fb171295b541a6ae9a3a4a79f9c87/keras/utils/generic_utils.py#L10. That method means I could pass in ‘clip_norm’ as an optimizer, and it would try to use that function as an optimizer. It also means there is no simple list of implemented optimizers I can grab.
* I should also note that Keras is an incredibly useful library that does a lot of things really well. It just has a few quirks...
- deep_qa.training.optimizers.optimizer_from_params(params: typing.Union[deep_qa.common.params.Params, str])[source]¶
This method converts from a parameter object like we use in our Trainer code into an optimizer object suitable for use with Keras. The simplest case for both of these is a string that shows up in optimizers above - if params is just one of those strings, we return it, and everyone is happy. If not, we assume params is a Dict[str, Any], with a "type" key, where the value for "type" must be one of those strings above. We take the rest of the parameters and pass them to the optimizer's constructor.
About Data¶
This module contains code for processing data. There’s a DataIndexer, whose job it is to convert from strings to word (or character) indices suitable for use with an embedding matrix. There’s code to load pre-trained embeddings from a file, to tokenize sentences, and, most importantly, to convert training and testing examples into numpy arrays that can be used with Keras.
The most important thing to understand about the data processing code is the Dataset object. A Dataset is a collection of Instances, which are the individual examples used for training and testing. Dataset has two subclasses: TextDataset, which contains Instances with raw strings and can be read directly from a file, and IndexedDataset, which contains Instances whose raw strings have been converted to word (or character) indices. The IndexedDataset has methods for padding sequences to a consistent length, so that models can be compiled, and for converting the Instances to numpy arrays. The file formats read by TextDataset, and the format of the numpy arrays produced by IndexedDataset, are determined by the underlying Instance type used by the Dataset. See the instances module for more detail on this.
Base Instances¶
An Instance is a single training or testing example for a Keras model. The base classes for working with Instances are found in instance.py. There are two subclasses: (1) TextInstance, which is a raw instance that contains actual strings, and can be used to determine a vocabulary for a model, or read directly from a file; and (2) IndexedInstance, which has had its raw strings converted to word (or character) indices, and can be padded to a consistent length and converted to numpy arrays for use with Keras.
Concrete Instance classes are organized in the code by the task they are designed for (e.g., text classification, reading comprehension, sequence tagging, etc.).
A lot of the magic of how the DeepQA library works happens here, in the concrete Instance classes in this module. Most of the code can be totally agnostic to how exactly the input is structured, because the conversion to numpy arrays happens here, not in the Trainer or TextTrainer classes, with only the specific _build_model() methods needing to know about the format of their input and output (and even some of the details there are transparent to the model class).
This module contains the base Instance classes that concrete classes inherit from. Specifically, there are three classes:
- Instance, which just exists as a base type with no functionality
- TextInstance, which adds a words() method and a method to convert strings to indices using a DataIndexer
- IndexedInstance, which is a TextInstance that has had all of its strings converted into indices. This class has methods to deal with padding (so that sequences all have the same length) and converting an Instance into a set of Numpy arrays suitable for use with Keras.
As this codebase is dealing mostly with textual question answering, pretty much all of the concrete Instance types will have both a TextInstance and a corresponding IndexedInstance, which you can see in the individual files for each Instance type.
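As a rough sketch of that lifecycle, using methods documented below (the DataIndexer here is assumed; normally you would fit it on a whole Dataset before indexing instances):

    from deep_qa.data.data_indexer import DataIndexer
    from deep_qa.data.instances.text_classification.text_classification_instance import (
        TextClassificationInstance)

    # Read a raw TextInstance from a tab-separated line ([sentence][tab][label]).
    instance = TextClassificationInstance.read_from_line("the movie was great\t1")

    # Convert strings to indices (a real pipeline fits the DataIndexer on a
    # Dataset first, so words are not all out-of-vocabulary).
    data_indexer = DataIndexer()
    indexed = instance.to_indexed_instance(data_indexer)

    # Pad to this instance's own lengths (a Dataset would use dataset-wide
    # maxima) and get numpy arrays suitable for Keras.
    indexed.pad(indexed.get_padding_lengths())
    inputs, label = indexed.as_training_data()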
-
class
deep_qa.data.instances.instance.
IndexedInstance
(label, index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.Instance
An indexed data instance has all word tokens replaced with word indices, along with some kind of label, suitable for input to a Keras model. An IndexedInstance is created from an Instance using a DataIndexer, and the indices here have no recoverable meaning without the DataIndexer.
For example, we might have the following Instance:
- TrueFalseInstance('Jamie is nice, Holly is mean', True, 25)
After being converted into an IndexedInstance, we might have the following:
- IndexedTrueFalseInstance([1, 6, 7, 1, 6, 8], True, 25)
This would mean that "Jamie" and "Holly" were OOV to the DataIndexer, and the other words were given indices.
-
static
_get_word_sequence_lengths
(word_indices: typing.List) → typing.Dict[str, int][source]¶ Because TextEncoders can return complex data structures, we might actually have several things to pad for a single word sequence. We check for that and handle it in a single spot here. We return a dictionary containing ‘num_sentence_words’, which is the number of words in word_indices. If the word representations also contain characters, the dictionary additionally contains a ‘num_word_characters’ key, with a value corresponding to the longest word in the sequence.
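For instance, the returned dictionary might look like this (a sketch; the second call assumes a words-plus-characters encoding in which each entry of word_indices is a list of character ids):

    from deep_qa.data.instances.instance import IndexedInstance

    # Word-only encoding: just the number of words.
    IndexedInstance._get_word_sequence_lengths([2, 5, 9])
    # -> {'num_sentence_words': 3}

    # Words-plus-characters: the longest word here has three character ids.
    IndexedInstance._get_word_sequence_lengths([[2, 10, 11], [5, 12]])
    # -> {'num_sentence_words': 2, 'num_word_characters': 3}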
-
as_training_data
()[source]¶ Converts this IndexedInstance to NumPy arrays suitable for use as training data to Keras models.
Returns: train_data : (inputs, label)
The IndexedInstance as NumPy arrays to be used in Keras. Note that inputs might itself be a complex tuple, depending on the Instance type.
-
classmethod
empty_instance
()[source]¶ Returns an empty, unpadded instance of this class. Necessary for option padding in multiple choice instances.
-
get_padding_lengths
() → typing.Dict[str, int][source]¶ Returns the length of this instance in all dimensions that require padding.
Different kinds of instances have different fields that are padded, such as sentence length, number of background sentences, number of options, etc.
Returns: padding_lengths: Dict[str, int]
A dictionary mapping padding keys (like “num_sentence_words”) to lengths.
-
pad
(padding_lengths: typing.Dict[str, int])[source]¶ Add zero-padding to make each data example of equal length for use in the neural network.
This modifies the current object.
Parameters: padding_lengths: Dict[str, int]
In this dictionary, each str refers to a type of token (e.g. num_sentence_words), and the corresponding int is the value. This dictionary must have the same keys as were returned by get_padding_lengths(). We will use these lengths to pad the instance in all of the necessary dimensions to the given lengths.
-
static
pad_sequence_to_length
(sequence: typing.List, desired_length: int, default_value: typing.Callable[[], typing.Any] = <function IndexedInstance.<lambda>>, truncate_from_right: bool = True) → typing.List[source]¶ Takes a list of indices and pads it to the desired length.
Parameters: sequence : List of int
A list of word indices.
desired_length : int
Maximum length of each sequence. Longer sequences are truncated to this length, and shorter ones are padded to it.
default_value: Callable, default=lambda: 0
Callable that outputs a default value (of any type) to use as padding values.
truncate_from_right : bool, default=True
If truncating the indices is necessary, this parameter dictates whether we do so on the left or right.
Returns: padded_word_sequence : List of int
A padded or truncated list of word indices.
Notes
The reason we truncate from the right by default is for cases like questions with long set-ups: we at least want to get the question encoded, which always comes at the end, even if we lose much of the set-up. If you want to truncate from the other direction, you can.
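A quick sketch of the behavior described above (exactly which side the padding values land on is an implementation detail; see the source):

    from deep_qa.data.instances.instance import IndexedInstance

    # Shorter sequences are padded up to desired_length with the default value, 0.
    IndexedInstance.pad_sequence_to_length([3, 7, 2], desired_length=5)

    # Longer sequences are truncated; with truncate_from_right=True (the
    # default), the end of the sequence - e.g. a question after a long
    # set-up - is what survives.
    IndexedInstance.pad_sequence_to_length([3, 7, 2], desired_length=2)

    # A different padding value can be supplied as a zero-argument callable.
    IndexedInstance.pad_sequence_to_length([3], 3, default_value=lambda: 1)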
-
static
pad_word_sequence
(word_sequence: typing.List[int], padding_lengths: typing.Dict[str, int], truncate_from_right: bool = True) → typing.List[source]¶ Takes a list of indices and pads it according to padding_lengths.
Parameters: word_sequence : List of int
A list of word indices.
padding_lengths : Dict[str, int]
In this dictionary, each str refers to a type of token (e.g. num_sentence_words), and the corresponding int is the value. This dictionary must have the same keys as were returned by get_padding_lengths(). We will use these lengths to pad the instance in all of the necessary dimensions to the given lengths.
truncate_from_right : bool, default=True
If truncating the indices is necessary, this parameter dictates whether we do so on the left or right.
Returns: padded_word_sequence : List of int
A padded list of word indices.
Notes
The reason we truncate from the right by default is for cases like questions with long set-ups: we at least want to get the question encoded, which always comes at the end, even if we lose much of the set-up. If you want to truncate from the other direction, you can.
TODO(matt): we should probably switch the default to truncate from the left, and clear up the naming here - it’s easy to get confused about what “truncate from right” means.
-
class
deep_qa.data.instances.instance.
Instance
(label, index: int = None)[source]¶ Bases:
object
A data instance, used either for training a neural network or for testing one.
Parameters: label : Any
Any kind of label that you might want to predict in a model. Could be a class label, a tag sequence, a character span in a passage, etc.
index : int, optional
Used for matching instances with other data, such as background sentences.
-
class
deep_qa.data.instances.instance.
TextInstance
(label, index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.Instance
An Instance that has some attached text, typically either a sentence or a logical form. This is called a TextInstance because the individual tokens here are encoded as strings, and we can get a list of strings out when we ask what words show up in the instance.
We use these kinds of instances to fit a DataIndexer (i.e., deciding which words should be mapped to an unknown token); to use them in training or testing, we need to first convert them into IndexedInstances.
In order to actually convert text into some kind of indexed sequence, we rely on a TextEncoder. There are several TextEncoder subclasses that will let you use word token sequences, character sequences, and other options. By default we use word tokens. You can override this by setting the encoder class variable.
-
_index_text
(text: str, data_indexer: deep_qa.data.data_indexer.DataIndexer) → typing.List[int][source]¶
-
classmethod
read_from_line
(line: str)[source]¶ Reads an instance of this type from a line.
Parameters: line : str
A line from a data file.
Returns: instance : TextInstance
An instance of this class, read from the given line.
Notes
We throw a RuntimeError here instead of a NotImplementedError, because it’s not expected that all subclasses will implement this.
-
to_indexed_instance
(data_indexer: deep_qa.data.data_indexer.DataIndexer) → deep_qa.data.instances.instance.IndexedInstance[source]¶ Converts the words in this Instance into indices using the DataIndexer.
Parameters: data_indexer : DataIndexer
DataIndexer to use in converting the Instance to an IndexedInstance.
Returns: indexed_instance : IndexedInstance
A TextInstance that has had all of its strings converted into indices.
-
tokenizer
= <deep_qa.data.tokenizers.word_tokenizer.WordTokenizer object>¶
-
words
() → typing.Dict[str, typing.List[str]][source]¶ Returns a list of all of the words in this instance, contained in a namespace dictionary.
This is mainly used for computing word counts when fitting a word vocabulary on a dataset. The namespace dictionary allows you to have several embedding matrices with different vocab sizes, e.g., for words and for characters (in fact, words and characters are the only use cases I can think of for now, but this allows you to do other more crazy things if you want). You can call the namespaces whatever you want, but if you want the DataIndexer to work correctly without namespace arguments, you should use the key ‘words’ to represent word tokens.
Returns: namespace : Dictionary of {str: List[str]}
The str key refers to vocabularies, and the List[str] should contain the tokens in that vocabulary. For example, you should use the key words to represent word tokens, and the corresponding value in the dictionary would be a list of all the words in the instance.
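For a single-namespace instance, the result might look like this (a sketch; the exact tokens depend on the configured Tokenizer and its word processing):

    from deep_qa.data.instances.text_classification.text_classification_instance import (
        TextClassificationInstance)

    instance = TextClassificationInstance("Jamie is nice", label=True)
    instance.words()
    # -> {'words': ['jamie', 'is', 'nice']}  (assuming the default tokenizer lower-cases)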
-
Entailment Instances¶
These Instances are designed for an entailment task, where the input is a pair of sentences (or larger text sequences) and the output is a classification decision.
SentencePairInstances¶
-
class
deep_qa.data.instances.entailment.sentence_pair_instance.
IndexedSentencePairInstance
(first_sentence_indices: typing.List[int], second_sentence_indices: typing.List[int], label: typing.List[int], index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.IndexedInstance
This is an indexed instance that is commonly used for labeled sentence pairs. Examples of this are SnliInstances where we have a labeled pair of text and hypothesis, and a sentence2vec instance where the objective is to train an encoder to predict whether the sentences are in context or not.
-
as_training_data
()[source]¶ Converts this IndexedInstance to NumPy arrays suitable for use as training data to Keras models.
Returns: train_data : (inputs, label)
The IndexedInstance as NumPy arrays to be used in Keras. Note that inputs might itself be a complex tuple, depending on the Instance type.
-
classmethod
empty_instance
()[source]¶ Returns an empty, unpadded instance of this class. Necessary for option padding in multiple choice instances.
-
get_padding_lengths
() → typing.Dict[str, int][source]¶ Returns the length of this instance in all dimensions that require padding.
Different kinds of instances have different fields that are padded, such as sentence length, number of background sentences, number of options, etc.
Returns: padding_lengths: Dict[str, int]
A dictionary mapping padding keys (like “num_sentence_words”) to lengths.
-
pad
(padding_lengths: typing.Dict[str, int])[source]¶ Add zero-padding to make each data example of equal length for use in the neural network.
This modifies the current object.
Parameters: padding_lengths: Dict[str, int]
In this dictionary, each str refers to a type of token (e.g. num_sentence_words), and the corresponding int is the value. This dictionary must have the same keys as were returned by get_padding_lengths(). We will use these lengths to pad the instance in all of the necessary dimensions to the given lengths.
-
-
class
deep_qa.data.instances.entailment.sentence_pair_instance.
SentencePairInstance
(first_sentence: str, second_sentence: str, label: typing.List[int], index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.TextInstance
A SentencePairInstance contains a pair of sentences accompanied by a binary label. You could have the label represent whatever you want, such as entailment, or occurring in the same context, or whatever.
-
classmethod
read_from_line
(line: str)[source]¶ Expected format: [sentence1][tab][sentence2][tab][label]
-
to_indexed_instance
(data_indexer: deep_qa.data.data_indexer.DataIndexer)[source]¶ Converts the words in this Instance into indices using the DataIndexer.
Parameters: data_indexer : DataIndexer
DataIndexer to use in converting the Instance to an IndexedInstance.
Returns: indexed_instance : IndexedInstance
A TextInstance that has had all of its strings converted into indices.
-
words
() → typing.Dict[str, typing.List[str]][source]¶ Returns a list of all of the words in this instance, contained in a namespace dictionary.
This is mainly used for computing word counts when fitting a word vocabulary on a dataset. The namespace dictionary allows you to have several embedding matrices with different vocab sizes, e.g., for words and for characters (in fact, words and characters are the only use cases I can think of for now, but this allows you to do other more crazy things if you want). You can call the namespaces whatever you want, but if you want the DataIndexer to work correctly without namespace arguments, you should use the key ‘words’ to represent word tokens.
Returns: namespace : Dictionary of {str: List[str]}
The str key refers to vocabularies, and the List[str] should contain the tokens in that vocabulary. For example, you should use the key words to represent word tokens, and the corresponding value in the dictionary would be a list of all the words in the instance.
-
SnliInstances¶
-
class
deep_qa.data.instances.entailment.snli_instance.
SnliInstance
(text: str, hypothesis: str, label: str, index: int = None)[source]¶ Bases:
deep_qa.data.instances.entailment.sentence_pair_instance.SentencePairInstance
An SnliInstance is a SentencePairInstance that represents a pair of (text, hypothesis) from the Stanford Natural Language Inference (SNLI) dataset, with an associated label. The main thing we need to add here is handling of the label, because there are a few different ways we can use this Instance.
The label can either be a three-way decision (one of either “entails”, “contradicts”, or “neutral”), or a binary decision (grouping either “entails” and “contradicts”, for relevance decisions, or “contradicts” and “neutral”, for entails/not entails decisions).
The input label must be one of the strings in the label_mapping field below. The difference between the *_softmax and *_sigmoid labels is just for implementation reasons. A softmax over two dimensions is exactly equivalent to a sigmoid, but to make our lives easier in building models, sometimes we use a sigmoid and sometimes we use a softmax over two dimensions. Having separate labels for these cases makes it easier to use this data in whatever kind of model you want.
It might make sense to push this difference more generally into some common place, so that we can separate the label itself from how it’s encoded for training. But that might also be complicated to implement, and it’s not needed right now. TODO(matt): if we find ourselves doing this kind of thing in several places, we should think about making that change.
-
label_mapping
= {'entails_softmax': [0, 1], 'not_entails_softmax': [1, 0], 'attention_false': [0], 'entails': [1, 0, 0], 'contradicts': [0, 1, 0], 'not_entails_sigmoid': [0], 'neutral': [0, 0, 1], 'attention_true': [1], 'entails_sigmoid': [1]}¶
-
classmethod
read_from_line
(line: str)[source]¶ Reads an SnliInstance object from a line. The format has one of two options:
- [example index][tab][text][tab][hypothesis][tab][label]
- [text][tab][hypothesis][tab][label]
[label] is assumed to be one of “entails”, “contradicts”, or “neutral”.
-
to_entails_instance
(activation: str)[source]¶ This returns a new SnliInstance with a different label. The new label will be binary (entails / not entails), but we need to distinguish between two different label types. Sometimes we need the label to be encoded in a single dimension (i.e., either 0 or 1), and sometimes we need it to be encoded in two dimensions (i.e., either [0, 1] or [1, 0]). This depends on the activation function of the final layer in our network - a sigmoid activation will need the former, while a softmax activation will need the latter. So, we encode these differently, as strings, which will be converted to the right array later, in IndexedSnliInstance.
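A sketch of reading an SNLI line and re-labeling it for a binary model (assuming the activation argument takes Keras activation names such as 'sigmoid' or 'softmax'):

    from deep_qa.data.instances.entailment.snli_instance import SnliInstance

    line = "a man is sleeping\ta man is awake\tcontradicts"
    instance = SnliInstance.read_from_line(line)

    # Re-label for a binary entails / not-entails decision; the label encoding
    # depends on the final-layer activation.
    binary_instance = instance.to_entails_instance("sigmoid")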
-
Reading Comprehension Instances¶
These Instances are designed for the set of tasks known today as “reading comprehension”, where the input is a natural language question, a passage, and (optionally) some number of answer options, and the output is either a (span begin index, span end index) decision over the passage, or a classification decision over the answer options (if provided).
QuestionPassageInstances¶
-
class
deep_qa.data.instances.reading_comprehension.question_passage_instance.
IndexedQuestionPassageInstance
(question_indices: typing.List[int], passage_indices: typing.List[int], label: typing.List[int], index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.IndexedInstance
This is an indexed instance that is used for (question, passage) pairs.
-
as_training_data
()[source]¶ Converts this IndexedInstance to NumPy arrays suitable for use as training data to Keras models.
Returns: train_data : (inputs, label)
The IndexedInstance as NumPy arrays to be used in Keras. Note that inputs might itself be a complex tuple, depending on the Instance type.
-
classmethod
empty_instance
()[source]¶ Returns an empty, unpadded instance of this class. Necessary for option padding in multiple choice instances.
-
-
class
deep_qa.data.instances.reading_comprehension.question_passage_instance.
QuestionPassageInstance
(question_text: str, passage_text: str, label: typing.Any, index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.TextInstance
A QuestionPassageInstance is a base class for datasets that consist primarily of a question text and a passage, where the passage contains the answer to the question. This class should not be used directly, because the _index_label function is left unimplemented; use a subclass instead.
-
_index_label
(label: typing.Any) → typing.List[int][source]¶ Index the labels. Since we don’t know what form the label takes, we leave it to subclasses to implement this method.
-
to_indexed_instance
(data_indexer: deep_qa.data.data_indexer.DataIndexer)[source]¶ Converts the words in this Instance into indices using the DataIndexer.
Parameters: data_indexer : DataIndexer
DataIndexer to use in converting the Instance to an IndexedInstance.
Returns: indexed_instance : IndexedInstance
A TextInstance that has had all of its strings converted into indices.
-
words
() → typing.Dict[str, typing.List[str]][source]¶ Returns a list of all of the words in this instance, contained in a namespace dictionary.
This is mainly used for computing word counts when fitting a word vocabulary on a dataset. The namespace dictionary allows you to have several embedding matrices with different vocab sizes, e.g., for words and for characters (in fact, words and characters are the only use cases I can think of for now, but this allows you to do other more crazy things if you want). You can call the namespaces whatever you want, but if you want the DataIndexer to work correctly without namespace arguments, you should use the key ‘words’ to represent word tokens.
Returns: namespace : Dictionary of {str: List[str]}
The str key refers to vocabularies, and the List[str] should contain the tokens in that vocabulary. For example, you should use the key words to represent word tokens, and the corresponding value in the dictionary would be a list of all the words in the instance.
-
McQuestionPassageInstances¶
-
class
deep_qa.data.instances.reading_comprehension.mc_question_passage_instance.
IndexedMcQuestionPassageInstance
(question_indices: typing.List[int], passage_indices: typing.List[int], option_indices: typing.List[typing.List[int]], label: typing.List[int], index: int = None)[source]¶
-
as_training_data
()[source]¶ Converts this IndexedInstance to NumPy arrays suitable for use as training data to Keras models.
Returns: train_data : (inputs, label)
The IndexedInstance as NumPy arrays to be used in Keras. Note that inputs might itself be a complex tuple, depending on the Instance type.
-
classmethod
empty_instance
()[source]¶ Returns an empty, unpadded instance of this class. Necessary for option padding in multiple choice instances.
-
get_padding_lengths
() → typing.Dict[str, int][source]¶ We need to pad the answer option length (in words), the number of answer options, the question length (in words), the passage length (in words), and the word length (in characters) among all the questions, passages, and answer options.
-
pad
(padding_lengths: typing.Dict[str, int])[source]¶ In this function, we pad the questions and passages (in terms of number of words in each), as well as the individual words in the questions and passages themselves. We also pad the number of answer options, the answer options (in terms of number of words in each), as well as the individual words in the answer options.
-
-
class
deep_qa.data.instances.reading_comprehension.mc_question_passage_instance.
McQuestionPassageInstance
(question: str, passage: str, answer_options: typing.List[str], label: int, index: int = None)[source]¶ Bases:
deep_qa.data.instances.reading_comprehension.question_passage_instance.QuestionPassageInstance
A McQuestionPassageInstance is a QuestionPassageInstance that represents a (question, passage, answer_options) tuple from a multiple choice reading comprehension dataset, with an associated label indicating the index of the correct answer choice.
-
_index_label
(label: typing.Tuple[int, int]) → typing.List[int][source]¶ Specify how to index self.label, which is needed to convert the McQuestionPassageInstance into an IndexedInstance (conversion handled in superclass).
-
classmethod
read_from_line
(line: str)[source]¶ Reads a McQuestionPassageInstance object from a line. The format has one of two options:
- [example index][tab][passage][tab][question][tab][options][tab][label]
- [passage][tab][question][tab][options][tab][label]
The answer_options column is assumed formatted as: [option]###[option]###[option]...
That is, we split on three hashes ("###").
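For example, a made-up line in the second format might look like this:

    from deep_qa.data.instances.reading_comprehension.mc_question_passage_instance import (
        McQuestionPassageInstance)

    # [passage][tab][question][tab][options][tab][label]; options split on "###".
    line = ("The sky appears blue because of Rayleigh scattering.\t"
            "Why is the sky blue?\t"
            "Rayleigh scattering###ocean reflection###magnetism\t"
            "0")
    instance = McQuestionPassageInstance.read_from_line(line)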
-
to_indexed_instance
(data_indexer: deep_qa.data.data_indexer.DataIndexer)[source]¶ Converts the words in this Instance into indices using the DataIndexer.
Parameters: data_indexer : DataIndexer
DataIndexer to use in converting the Instance to an IndexedInstance.
Returns: indexed_instance : IndexedInstance
A TextInstance that has had all of its strings converted into indices.
-
words
() → typing.Dict[str, typing.List[str]][source]¶ Returns a list of all of the words in this instance, contained in a namespace dictionary.
This is mainly used for computing word counts when fitting a word vocabulary on a dataset. The namespace dictionary allows you to have several embedding matrices with different vocab sizes, e.g., for words and for characters (in fact, words and characters are the only use cases I can think of for now, but this allows you to do other more crazy things if you want). You can call the namespaces whatever you want, but if you want the DataIndexer to work correctly without namespace arguments, you should use the key ‘words’ to represent word tokens.
Returns: namespace : Dictionary of {str: List[str]}
The str key refers to vocabularies, and the List[str] should contain the tokens in that vocabulary. For example, you should use the key words to represent word tokens, and the corresponding value in the dictionary would be a list of all the words in the instance.
-
CharacterSpanInstances¶
-
class
deep_qa.data.instances.reading_comprehension.character_span_instance.
CharacterSpanInstance
(question: str, passage: str, label: typing.Tuple[int, int], index: int = None)[source]¶ Bases:
deep_qa.data.instances.reading_comprehension.question_passage_instance.QuestionPassageInstance
A CharacterSpanInstance is a QuestionPassageInstance that represents a (question, passage) pair with an associated label, which is the data given for the span prediction task. The label is a span of characters in the passage that indicates where the answer to the question begins and where the answer to the question ends.
The main thing this class handles over QuestionPassageInstance is specifying the form of the label and how to index it: the label is given as a span of _characters_ in the passage, but the label we are going to use in the rest of the code is a span of _tokens_, so the mapping from character labels to token labels depends on the tokenization we did, and the logic to handle this is, unfortunately, a little complicated. The label conversion happens when converting a CharacterSpanInstance to an IndexedInstance (where character indices are generally lost, anyway).
This class should be used to represent training instances for the SQuAD (Stanford Question Answering) and NewsQA datasets, among others.
-
_index_label
(label: typing.Tuple[int, int]) → typing.List[int][source]¶ Specify how to index self.label, which is needed to convert the CharacterSpanInstance into an IndexedInstance (handled in superclass).
-
classmethod
read_from_line
(line: str)[source]¶ Reads a CharacterSpanInstance object from a line. The format has one of two options:
- [example index][tab][question][tab][passage][tab][label]
- [question][tab][passage][tab][label]
[label] is assumed to be a comma-separated pair of integers.
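For example (a made-up line in the second format; the label marks the character offsets of "blue" in the passage):

    from deep_qa.data.instances.reading_comprehension.character_span_instance import (
        CharacterSpanInstance)

    # [question][tab][passage][tab][label]
    line = "What color is the sky?\tThe blue sky stretched overhead.\t4,8"
    instance = CharacterSpanInstance.read_from_line(line)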
-
stop_token
= '@@STOP@@'¶
-
to_indexed_instance
(data_indexer: deep_qa.data.data_indexer.DataIndexer)[source]¶ Converts the words in this Instance into indices using the DataIndexer.
Parameters: data_indexer : DataIndexer
DataIndexer to use in converting the Instance to an IndexedInstance.
Returns: indexed_instance : IndexedInstance
A TextInstance that has had all of its strings converted into indices.
-
Sequence Tagging Instances¶
These Instances are designed for a sequence tagging task, where the input is a passage of natural language (e.g., a sentence), and the output is some classification decision for each token in that passage (e.g., part-of-speech tags, any kind of BIO tagging like NER or chunking, etc.).
TaggingInstances¶
-
class
deep_qa.data.instances.sequence_tagging.tagging_instance.
IndexedTaggingInstance
(text_indices: typing.List[int], label: typing.List[int], index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.IndexedInstance
-
as_training_data
()[source]¶ Converts this IndexedInstance to NumPy arrays suitable for use as training data to Keras models.
Returns: train_data : (inputs, label)
The IndexedInstance as NumPy arrays to be used in Keras. Note that inputs might itself be a complex tuple, depending on the Instance type.
-
classmethod
empty_instance
()[source]¶ Returns an empty, unpadded instance of this class. Necessary for option padding in multiple choice instances.
-
get_padding_lengths
() → typing.Dict[str, int][source]¶ Returns the length of this instance in all dimensions that require padding.
Different kinds of instances have different fields that are padded, such as sentence length, number of background sentences, number of options, etc.
Returns: padding_lengths: Dict[str, int]
A dictionary mapping padding keys (like “num_sentence_words”) to lengths.
-
pad
(padding_lengths: typing.Dict[str, int])[source]¶ Add zero-padding to make each data example of equal length for use in the neural network.
This modifies the current object.
Parameters: padding_lengths: Dict[str, int]
In this dictionary, each str refers to a type of token (e.g. num_sentence_words), and the corresponding int is the value. This dictionary must have the same keys as were returned by get_padding_lengths(). We will use these lengths to pad the instance in all of the necessary dimensions to the given lengths.
-
-
class
deep_qa.data.instances.sequence_tagging.tagging_instance.
TaggingInstance
(text: str, label: typing.Any, index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.TextInstance
A TaggingInstance represents a passage of text and a tag sequence over that text.
There are some sticky issues with tokenization and how exactly the label is specified. For example, if your label is a sequence of tags, that assumes a particular tokenization, which interacts in a funny way with our tokenization code. This is a general superclass containing common functionality for most simple sequence tagging tasks. The specifics of reading in data from a file and converting that data into properly-indexed tag sequences are left to subclasses.
-
_index_label
(label: typing.Any, data_indexer: deep_qa.data.data_indexer.DataIndexer) → typing.List[int][source]¶ Index the labels. Since we don’t know what form the label takes, we leave it to subclasses to implement this method. If you need to convert tag names into indices, use the namespace ‘tags’ in the DataIndexer.
Returns all of the tag words in this instance, so that we can convert them into indices. This is called in self.words(). Not necessary if you have some pre-indexed labeling scheme.
-
words
() → typing.Dict[str, typing.List[str]][source]¶ Returns a list of all of the words in this instance, contained in a namespace dictionary.
This is mainly used for computing word counts when fitting a word vocabulary on a dataset. The namespace dictionary allows you to have several embedding matrices with different vocab sizes, e.g., for words and for characters (in fact, words and characters are the only use cases I can think of for now, but this allows you to do other more crazy things if you want). You can call the namespaces whatever you want, but if you want the DataIndexer to work correctly without namespace arguments, you should use the key ‘words’ to represent word tokens.
Returns: namespace : Dictionary of {str: List[str]}
The str key refers to vocabularies, and the List[str] should contain the tokens in that vocabulary. For example, you should use the key words to represent word tokens, and the corresponding value in the dictionary would be a list of all the words in the instance.
-
PretokenizedTaggingInstances¶
-
class
deep_qa.data.instances.sequence_tagging.pretokenized_tagging_instance.
PreTokenizedTaggingInstance
(text: typing.List[str], label: typing.List[str], index: int = None)[source]¶ Bases:
deep_qa.data.instances.sequence_tagging.tagging_instance.TaggingInstance
This is a TaggingInstance where the text has been pre-tokenized. Thus the text member variable here is actually a List[str], instead of a str.
When using this Instance, you must use the NoOpWordSplitter as well, or things will break. You probably also do not want any kind of filtering (though stemming is ok), because only the words will get filtered, not the labels.
-
_index_label
(label: typing.List[str], data_indexer: deep_qa.data.data_indexer.DataIndexer) → typing.List[int][source]¶ Index the labels. Since we don’t know what form the label takes, we leave it to subclasses to implement this method. If you need to convert tag names into indices, use the namespace ‘tags’ in the DataIndexer.
-
classmethod
read_from_line
(line: str)[source]¶ Reads a PreTokenizedTaggingInstance from a line. The format has one of two options:
- [example index][tab][token1]###[tag1][tab][token2]###[tag2][tab]...
- [token1]###[tag1][tab][token2]###[tag2][tab]...
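For example, a POS-tagging line in the second format might look like this (made-up tokens and tags):

    from deep_qa.data.instances.sequence_tagging.pretokenized_tagging_instance import (
        PreTokenizedTaggingInstance)

    # [token1]###[tag1][tab][token2]###[tag2]...
    line = "The###DT\tdog###NN\tbarked###VBD"
    instance = PreTokenizedTaggingInstance.read_from_line(line)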
Returns all of the tag words in this instance, so that we can convert them into indices. This is called in self.words(). Not necessary if you have some pre-indexed labeling scheme.
-
Text Classification Instances¶
These Instances are designed for any classification task over a single passage of text. The input is the passage (e.g., a sentence, a document, etc.), and the output is a single label (e.g., positive / negative sentiment, spam / not spam, essay grade, etc.).
TextClassificationInstances¶
-
class
deep_qa.data.instances.text_classification.text_classification_instance.
IndexedTextClassificationInstance
(word_indices: typing.List[int], label, index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.IndexedInstance
-
as_training_data
()[source]¶ Converts this IndexedInstance to NumPy arrays suitable for use as training data to Keras models.
Returns: train_data : (inputs, label)
The IndexedInstance as NumPy arrays to be used in Keras. Note that inputs might itself be a complex tuple, depending on the Instance type.
-
classmethod
empty_instance
()[source]¶ Returns an empty, unpadded instance of this class. Necessary for option padding in multiple choice instances.
-
get_padding_lengths
() → typing.Dict[str, int][source]¶ Returns the length of this instance in all dimensions that require padding.
Different kinds of instances have different fields that are padded, such as sentence length, number of background sentences, number of options, etc.
Returns: padding_lengths: Dict[str, int]
A dictionary mapping padding keys (like “num_sentence_words”) to lengths.
-
pad
(padding_lengths: typing.Dict[str, int])[source]¶ Add zero-padding to make each data example of equal length for use in the neural network.
This modifies the current object.
Parameters: padding_lengths: Dict[str, int]
In this dictionary, each str refers to a type of token (e.g. num_sentence_words), and the corresponding int is the value. This dictionary must have the same keys as were returned by get_padding_lengths(). We will use these lengths to pad the instance in all of the necessary dimensions to the given lengths.
-
-
class
deep_qa.data.instances.text_classification.text_classification_instance.
TextClassificationInstance
(text: str, label: bool, index: int = None)[source]¶ Bases:
deep_qa.data.instances.instance.TextInstance
A TextClassificationInstance is a TextInstance that is a single passage of text, where that passage has some associated (categorical, or possibly real-valued) label.
-
classmethod
read_from_line
(line: str)[source]¶ Reads a TextClassificationInstance object from a line. The format has one of four options:
- [sentence]
- [sentence index][tab][sentence]
- [sentence][tab][label]
- [sentence index][tab][sentence][tab][label]
If no label is given, we use None as the label.
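For example, each of these made-up lines is valid:

    from deep_qa.data.instances.text_classification.text_classification_instance import (
        TextClassificationInstance)

    TextClassificationInstance.read_from_line("the acting was wooden")         # no label
    TextClassificationInstance.read_from_line("the acting was wooden\t0")      # sentence + label
    TextClassificationInstance.read_from_line("12\tthe acting was wooden\t0")  # index + sentence + label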
-
to_indexed_instance
(data_indexer: deep_qa.data.data_indexer.DataIndexer)[source]¶ Converts the words in this
Instance
into indices using theDataIndexer
.Parameters: data_indexer : DataIndexer
DataIndexer
to use in converting theInstance
to anIndexedInstance
.Returns: indexed_instance : IndexedInstance
A
TextInstance
that has had all of its strings converted into indices.
-
words
() → typing.Dict[str, typing.List[str]][source]¶ Returns a list of all of the words in this instance, contained in a namespace dictionary.
This is mainly used for computing word counts when fitting a word vocabulary on a dataset. The namespace dictionary allows you to have several embedding matrices with different vocab sizes, e.g., for words and for characters (in fact, words and characters are the only use cases I can think of for now, but this allows you to do other more crazy things if you want). You can call the namespaces whatever you want, but if you want the DataIndexer to work correctly without namespace arguments, you should use the key ‘words’ to represent word tokens.
Returns: namespace : Dictionary of {str: List[str]}
The str key refers to vocabularies, and the List[str] should contain the tokens in that vocabulary. For example, you should use the key words to represent word tokens, and the corresponding value in the dictionary would be a list of all the words in the instance.
-
Tokenizers¶
character_tokenizer¶
-
class
deep_qa.data.tokenizers.character_tokenizer.
CharacterTokenizer
(params: deep_qa.common.params.Params)[source]¶ Bases:
deep_qa.data.tokenizers.tokenizer.Tokenizer
A CharacterTokenizer splits strings into character tokens.
Notes
Note that in the code, we’re still using the “words” namespace, and the “num_sentence_words” padding key, instead of using a different “characters” namespace. This is so that the rest of the code doesn’t have to change as much to just use this different tokenizer. For example, this is an issue when adding start and stop tokens - how is an Instance class supposed to know if it should use the “words” or the “characters” namespace when getting a start token id? If we just always use the “words” namespace for the top-level token namespace, it’s not an issue.
But confusingly, we’ll still use the “characters” embedding key... At least the user-facing parts all use characters; it’s only in writing tokenizer code that you need to be careful about namespaces. TODO(matt): it probably makes sense to change the default namespace to “tokens”, and use that for both the words in WordTokenizer and the characters in CharacterTokenizer, so the naming isn’t so confusing.
-
embed_input
(input_layer: keras.engine.topology.Layer, embed_function: typing.Callable[[keras.engine.topology.Layer, str, str], keras.engine.topology.Layer], text_trainer, embedding_suffix: str = '')[source]¶ Applies embedding layers to the input_layer. See TextTrainer._embed_input for a more detailed comment on what this method does.
Parameters: input_layer: Keras ``Input()`` layer
The layer to embed.
embed_function: Callable[[‘Layer’, str, str], ‘Tensor’]
This should be the __get_embedded_input method from your instantiated TextTrainer. This function actually applies an Embedding layer (and maybe also a projection and dropout) to the input layer.
text_trainer: TextTrainer
Simple Tokenizers will just need to use the embed_function that gets passed as a parameter here, but complex Tokenizers might need more than just an embedding function. So that you can get an encoder or other things from the TextTrainer here if you need them, we take this object as a parameter.
embedding_suffix: str, optional (default=””)
A suffix to add to embedding keys that we use, so that, e.g., you could specify several different word embedding matrices, for whatever reason.
-
get_padding_lengths
(sentence_length: int, word_length: int) → typing.Dict[str, int][source]¶ When dealing with padding in TextTrainer, TextInstances need to know what to pad and how much. This function takes a potential max sentence length and word length, and returns a lengths dictionary containing keys for the padding that is applicable to this encoding.
-
get_sentence_shape
(sentence_length: int, word_length: int) → typing.Tuple[int][source]¶ If we have a text sequence of length sentence_length, what shape would that correspond to with this encoding? For words or characters only, this would just be (sentence_length,). For an encoding that contains both words and characters, it might be (sentence_length, word_length).
-
get_words_for_indexer
(text: str) → typing.Dict[str, typing.List[str]][source]¶ The DataIndexer needs to assign indices to whatever strings we see in the training data (possibly doing some frequency filtering and using an OOV token). This method takes some text and returns whatever the DataIndexer would be asked to index from that text. Note that this returns a dictionary of token lists keyed by namespace. Typically, the key would be either ‘words’ or ‘characters’. An example for indexing the string ‘the’ might be {‘words’: [‘the’], ‘characters’: [‘t’, ‘h’, ‘e’]}, if you are indexing both words and characters.
-
index_text
(text: str, data_indexer: deep_qa.data.data_indexer.DataIndexer) → typing.List[source]¶ This method actually converts some text into an indexed list. This could be a list of integers (for either word tokens or characters), or it could be a list of arrays (for word tokens combined with characters), or something else.
-
tokenizer¶
-
class
deep_qa.data.tokenizers.tokenizer.
Tokenizer
(params: deep_qa.common.params.Params)[source]¶ Bases:
object
A Tokenizer splits strings into sequences of tokens that can be used in a model. The “tokens” here could be words, characters, or words and characters. The Tokenizer object handles various things involved with this conversion, including getting a list of tokens for pre-computing a vocabulary, getting the shape of a word sequence in a model, etc. The Tokenizer needs to handle these things because the tokenization you do could affect the shape of word sequence tensors in the model (e.g., a sentence could have shape (num_words,), (num_characters,), or (num_words, num_characters)).
-
static
_spans_match
(sentence_tokens: typing.List[str], span_tokens: typing.List[str], index: int) → bool[source]¶
-
char_span_to_token_span
(sentence: str, span: typing.Tuple[int, int], slack: int = 3) → typing.Tuple[int, int][source]¶ Converts a character span from a sentence into the corresponding token span in the tokenized version of the sentence. If you pass in a character span that does not correspond to complete tokens in the tokenized version, we’ll do our best, but the behavior is officially undefined.
The basic outline of this method is to find the token that starts the same number of characters into the sentence as the given character span. We try to handle a bit of error in the tokenization by checking slack tokens in either direction from that initial estimate.
The returned (begin, end) indices are inclusive for begin, and exclusive for end. So, for example, (2, 2) is an empty span, (2, 3) is the one-word span beginning at token index 2, and so on.
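A worked sketch (assuming Params wraps a plain dictionary, and that the input character span is also begin-inclusive, end-exclusive):

    from deep_qa.common.params import Params
    from deep_qa.data.tokenizers.word_tokenizer import WordTokenizer

    tokenizer = WordTokenizer(Params({}))
    sentence = "the quick brown fox jumped"

    # Characters 4-9 cover "quick", which is token index 1, so we expect the
    # token span (1, 2).
    span = tokenizer.char_span_to_token_span(sentence, (4, 9))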
-
embed_input
(input_layer: keras.engine.topology.Layer, embed_function: typing.Callable[[keras.engine.topology.Layer, str, str], keras.engine.topology.Layer], text_trainer, embedding_suffix: str = '')[source]¶ Applies embedding layers to the input_layer. See TextTrainer._embed_input for a more detailed comment on what this method does.
Parameters: input_layer: Keras ``Input()`` layer
The layer to embed.
embed_function: Callable[[‘Layer’, str, str], ‘Tensor’]
This should be the __get_embedded_input method from your instantiated TextTrainer. This function actually applies an Embedding layer (and maybe also a projection and dropout) to the input layer.
text_trainer: TextTrainer
Simple Tokenizers will just need to use the embed_function that gets passed as a parameter here, but complex Tokenizers might need more than just an embedding function. So that you can get an encoder or other things from the TextTrainer here if you need them, we take this object as a parameter.
embedding_suffix: str, optional (default=””)
A suffix to add to embedding keys that we use, so that, e.g., you could specify several different word embedding matrices, for whatever reason.
-
get_custom_objects
() → typing.Dict[str, typing.Layer][source]¶ If you use any custom Layers in your embed_input method, you need to return them here, so that the TextTrainer can correctly load models.
-
get_padding_lengths
(sentence_length: int, word_length: int) → typing.Dict[str, int][source]¶ When dealing with padding in TextTrainer, TextInstances need to know what to pad and how much. This function takes a potential max sentence length and word length, and returns a lengths dictionary containing keys for the padding that is applicable to this encoding.
-
get_sentence_shape
(sentence_length: int, word_length: int) → typing.Tuple[int][source]¶ If we have a text sequence of length sentence_length, what shape would that correspond to with this encoding? For words or characters only, this would just be (sentence_length,). For an encoding that contains both words and characters, it might be (sentence_length, word_length).
-
get_words_for_indexer
(text: str) → typing.Dict[str, typing.List[str]][source]¶ The DataIndexer needs to assign indices to whatever strings we see in the training data (possibly doing some frequency filtering and using an OOV token). This method takes some text and returns whatever the DataIndexer would be asked to index from that text. Note that this returns a dictionary of token lists keyed by namespace. Typically, the key would be either ‘words’ or ‘characters’. An example for indexing the string ‘the’ might be {‘words’: [‘the’], ‘characters’: [‘t’, ‘h’, ‘e’]}, if you are indexing both words and characters.
-
index_text
(text: str, data_indexer: deep_qa.data.data_indexer.DataIndexer) → typing.List[source]¶ This method actually converts some text into an indexed list. This could be a list of integers (for either word tokens or characters), or it could be a list of arrays (for word tokens combined with characters), or something else.
-
word_and_character_tokenizer¶
-
class
deep_qa.data.tokenizers.word_and_character_tokenizer.
WordAndCharacterTokenizer
(params: deep_qa.common.params.Params)[source]¶ Bases:
deep_qa.data.tokenizers.tokenizer.Tokenizer
A WordAndCharacterTokenizer first splits strings into words, then splits those words into characters, and returns a representation that contains both a word index and a sequence of character indices for each word. See the documentation for WordTokenizer for a note about naming, and the typical notion of “tokenization” in NLP.
Notes
In embed_input, this Tokenizer uses an encoder to get a character-level word embedding, which then gets concatenated with a standard word embedding from an embedding matrix. To specify the encoder to use for this character-level word embedding, use the "word" key in the encoder parameter to your model (which should be a TextTrainer subclass - see the documentation there for some more info). If you do not give a "word" key in the encoder dict, we’ll create a new encoder using the "default" parameters.
-
embed_input
(input_layer: keras.engine.topology.Layer, embed_function: typing.Callable[[keras.engine.topology.Layer, str, str], keras.engine.topology.Layer], text_trainer, embedding_suffix: str = '')[source]¶ A combined word-and-characters representation requires some fancy footwork to do the embedding properly.
This method assumes the input shape is (..., sentence_length, word_length + 1), where the first integer for each word in the tensor is the word index, and the remaining word_length entries is the character sequence. We’ll first split this into two tensors, one of shape (..., sentence_length), and one of shape (..., sentence_length, word_length), where the first is the word sequence, and the second is the character sequence for each word. We’ll pass the word sequence through an embedding layer, as normal, and pass the character sequence through a _separate_ embedding layer, then an encoder, to get a word vector out. We’ll then concatenate the two word vectors, returning a tensor of shape (..., sentence_length, embedding_dim * 2).
-
get_custom_objects
() → typing.Dict[str, typing.Any][source]¶ If you use any custom Layers in your embed_input method, you need to return them here, so that the TextTrainer can correctly load models.
-
get_padding_lengths
(sentence_length: int, word_length: int) → typing.Dict[str, int][source]¶ When dealing with padding in TextTrainer, TextInstances need to know what to pad and how much. This function takes a potential max sentence length and word length, and returns a lengths dictionary containing keys for the padding that is applicable to this encoding.
-
get_sentence_shape
(sentence_length: int, word_length: int = None) → typing.Tuple[int][source]¶ If we have a text sequence of length sentence_length, what shape would that correspond to with this encoding? For words or characters only, this would just be (sentence_length,). For an encoding that contains both words and characters, it might be (sentence_length, word_length).
-
get_words_for_indexer
(text: str) → typing.Dict[str, typing.List[str]][source]¶ The DataIndexer needs to assign indices to whatever strings we see in the training data (possibly doing some frequency filtering and using an OOV token). This method takes some text and returns whatever the DataIndexer would be asked to index from that text. Note that this returns a dictionary of token lists keyed by namespace. Typically, the key would be either ‘words’ or ‘characters’. An example for indexing the string ‘the’ might be {‘words’: [‘the’], ‘characters’: [‘t’, ‘h’, ‘e’]}, if you are indexing both words and characters.
-
index_text
(text: str, data_indexer: deep_qa.data.data_indexer.DataIndexer) → typing.List[source]¶ This method actually converts some text into an indexed list. This could be a list of integers (for either word tokens or characters), or it could be a list of arrays (for word tokens combined with characters), or something else.
-
word_splitter¶
-
class
deep_qa.data.tokenizers.word_splitter.
NltkWordSplitter
[source]¶ Bases:
deep_qa.data.tokenizers.word_splitter.WordSplitter
A tokenizer that uses nltk’s word_tokenize method.
I found that nltk is very slow, so I switched to using my own simple one, which is a good deal faster. But I’m adding this one back so that there’s consistency with older versions of the code, if you really want it.
-
class
deep_qa.data.tokenizers.word_splitter.
NoOpWordSplitter
[source]¶ Bases:
deep_qa.data.tokenizers.word_splitter.WordSplitter
This is a word splitter that does nothing. We’re playing a little loose with python’s dynamic typing, breaking the typical WordSplitter API a bit and assuming that you’ve already split sentence into a list somehow, so you don’t need to do anything else here. For example, the PreTokenizedTaggingInstance requires this word splitter, because it reads in pre-tokenized data from a file.
-
class
deep_qa.data.tokenizers.word_splitter.
SimpleWordSplitter
[source]¶ Bases:
deep_qa.data.tokenizers.word_splitter.WordSplitter
Does really simple tokenization. NLTK was too slow, so we wrote our own simple tokenizer instead. This just does an initial split(), followed by some heuristic filtering of each whitespace-delimited token, separating contractions and punctuation. We assume lower-cased, reasonably well-formed English sentences as input.
-
split_words
(sentence: str) → typing.List[str][source]¶ Splits a sentence into word tokens. We handle four kinds of things: words with punctuation that should be ignored as a special case (Mr., Mrs., etc.), contractions/genitives (isn’t, don’t, Matt’s), and beginning and ending punctuation (“antennagate”, (parentheticals), and such.).
The basic outline is to split on whitespace, then check each of these cases. First, we strip off beginning punctuation, then strip off ending punctuation, then strip off contractions. When we strip something off the beginning of a word, we can add it to the list of tokens immediately. When we strip it off the end, we have to save it to be added to after the word itself has been added. Before stripping off any part of a token, we first check to be sure the token isn’t in our list of special cases.
-
-
class
deep_qa.data.tokenizers.word_splitter.
SpacyWordSplitter
[source]¶ Bases:
deep_qa.data.tokenizers.word_splitter.WordSplitter
A tokenizer that uses spaCy’s Tokenizer, which is much faster than the others.
word_tokenizer¶
-
class
deep_qa.data.tokenizers.word_tokenizer.
WordTokenizer
(params: deep_qa.common.params.Params)[source]¶ Bases:
deep_qa.data.tokenizers.tokenizer.Tokenizer
A WordTokenizer splits strings into word tokens.
There are several ways that you can split a string into words, so we rely on a WordProcessor to do that work for us. Note that we’re using the word “tokenizer” here for something different than is typical in NLP - we’re referring here to how strings are represented as numpy arrays, not the linguistic notion of splitting sentences into tokens. Those things are handled in the WordProcessor, which is a common dependency in several Tokenizers.
Parameters: processor: Dict[str, Any], default={}
Contains parameters for processing text strings into word tokens, including, e.g., splitting, stemming, and filtering words. See WordProcessor for a complete description of available parameters.
embed_input
(input_layer: keras.engine.topology.Layer, embed_function: typing.Callable[[keras.engine.topology.Layer, str, str], keras.engine.topology.Layer], text_trainer, embedding_suffix: str = '')[source]¶ Applies embedding layers to the input_layer. See TextTrainer._embed_input for a more detailed comment on what this method does.
Parameters: input_layer: Keras ``Input()`` layer
The layer to embed.
embed_function: Callable[[‘Layer’, str, str], ‘Tensor’]
This should be the __get_embedded_input method from your instantiated TextTrainer. This function actually applies an Embedding layer (and maybe also a projection and dropout) to the input layer.
text_trainer: TextTrainer
Simple Tokenizers will just need to use the embed_function that gets passed as a parameter here, but complex Tokenizers might need more than just an embedding function. So that you can get an encoder or other things from the TextTrainer here if you need them, we take this object as a parameter.
embedding_suffix: str, optional (default=””)
A suffix to add to embedding keys that we use, so that, e.g., you could specify several different word embedding matrices, for whatever reason.
-
get_padding_lengths
(sentence_length: int, word_length: int) → typing.Dict[str, int][source]¶ When dealing with padding in TextTrainer, TextInstances need to know what to pad and how much. This function takes a potential max sentence length and word length, and returns a lengths dictionary containing keys for the padding that is applicable to this encoding.
-
get_sentence_shape
(sentence_length: int, word_length: int) → typing.Tuple[int][source]¶ If we have a text sequence of length sentence_length, what shape would that correspond to with this encoding? For words or characters only, this would just be (sentence_length,). For an encoding that contains both words and characters, it might be (sentence_length, word_length).
-
get_words_for_indexer
(text: str) → typing.Dict[str, typing.List[str]][source]¶ The DataIndexer needs to assign indices to whatever strings we see in the training data (possibly doing some frequency filtering and using an OOV token). This method takes some text and returns whatever the DataIndexer would be asked to index from that text. Note that this returns a dictionary of token lists keyed by namespace. Typically, the key would be either ‘words’ or ‘characters’. An example for indexing the string ‘the’ might be {‘words’: [‘the’], ‘characters’: [‘t’, ‘h’, ‘e’]}, if you are indexing both words and characters.
-
index_text
(text: str, data_indexer: deep_qa.data.data_indexer.DataIndexer) → typing.List[source]¶ This method actually converts some text into an indexed list. This could be a list of integers (for either word tokens or characters), or it could be a list of arrays (for word tokens combined with characters), or something else.
-
Data Generators¶
-
class
deep_qa.data.data_generator.
DataGenerator
(text_trainer, params: deep_qa.common.params.Params)[source]¶ Bases:
object
A DataGenerator takes an IndexedDataset and converts it into a generator, yielding batches suitable for training. You might want to do this instead of just creating one large set of numpy arrays for a few reasons:
- Creating large arrays for your whole data could take a whole lot of memory, maybe more than is available on your machine.
- Creating one large array means padding all of your instances to the same length. This typically means you waste a whole lot of computation on padding tokens. Using a DataGenerator instead allows you to only pad each batch to the same length, instead of all of your instances across your whole dataset. We’ve typically seen a 4-5x speed up just from doing this (partially because Keras is pretty bad at doing variable-length computation; the speed-up isn’t quite as large with plain tensorflow, I think).
- If we’re varying the padding lengths in each batch, we can also vary the batch size, to optimize GPU memory usage. This means we’ll use smaller batch sizes for big instances, and larger batch sizes for small instances. We’ve seen speedups up to 10-12x (on top of the 4-5x speed up above) from doing this.
Parameters: text_trainer: TextTrainer
We need access to the
TextTrainer
object so we can call some methods on it, such asget_instance_sorting_keys()
.dynamic_padding: bool, optional (default=False)
If
True
, we will set padding lengths based on the data per batch, instead of on the whole dataset. This only works if your model is structured to allow variable-length sequences (typically usingNone
for specific dimensions when you build your model), and if you don’t set padding values in_set_padding_lengths()
. This flag specifically is read in_set_padding_lengths()
to know if we should set certain padding values or not. It’s handled correctly fornum_sentence_words
andnum_word_characters
inTextTrainer
, but you need to be sure to implement it correctly in subclasses for this to work.padding_noise: double, optional (default=.1)
When sorting by padding length, we add a bit of noise to the lengths, so that the sorting isn’t deterministic. This parameter determines how much noise we add, as a percentage of the actual padding value for each instance.
sort_every_epoch: bool, optional (default=True)
If
True
, we will re-sort the data after every epoch, then re-group the instances into batches. Ifpadding_noise
is zero, this does nothing, but if it’s non-zero, this will give you a slightly different ordering, so you don’t have exactly the same batches at every epoch. If you’re doing adaptive batch sizes, this will lead to re-computing the adaptive batches each epoch, which could give a different number of batches for the whole dataset, which means each “epoch” might no longer correspond to exactly one pass over the data. This is probably a pretty minor issue, though.adaptive_batch_sizes: bool, optional (default=False)
Only relevant if
dynamic_padding
isTrue
. Ifadaptive_batch_sizes
isTrue
, we will vary the batch size to try to optimize GPU memory usage. Because padding lengths are done dynamically, we can have larger batches when padding lengths are smaller, maximizing our usage of the GPU. In order for this to work, you need to do two things: (1) override_get_padding_memory_scaling()
to give a big-O bound on memory usage given padding lengths, and (2) tune the adaptive_memory_usage_constant parameter for your particular model and GPU. See the documentation for_get_padding_memory_scaling()
for more information.adaptive_memory_usage_constant: int, optional (default=None)
Only relevant if
adaptive_batch_sizes
isTrue
. This is a manually-tuned parameter, specific to a particular model architecture and amount of GPU memory (e.g., if you change the number of hidden layers in your model, this number will need to change). See_get_padding_memory_scaling()
for more detail. The recommended way to tune this parameter is to (1) use a fixed batch size, withbiggest_batch_first
set toTrue
, and find out the maximum batch size you can handle on your biggest instances without running out of memory. Then (2) turn onadaptive_batch_sizes
, and set this parameter so that you get the right batch size for your biggest instances. If you set the log level toDEBUG
inscripts/run_model.py
, you can see the batch sizes that are computed.maximum_batch_size: int, optional (default=1000000)
If we’re using adaptive batch sizes, you can use this to be sure you do not create batches larger than this, even if you have enough memory to handle it on your GPU. You might choose to do this to keep smaller batches because you like the noisier gradient estimates that come from smaller batches, for instance.
biggest_batch_first: bool, optional (default=False)
This is largely for testing, to see how large of a batch you can safely use with your GPU. It’s only meaningful if you’re using dynamic padding - this will let you try out the largest batch that you have in the data first, so that if you’re going to run out of memory, you know it early, instead of waiting through the whole batch to find out at the end that you’re going to crash.
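To make these options concrete, here is a minimal sketch of constructing a DataGenerator directly with the parameters documented above. The specific numbers, and the text_trainer variable, are hypothetical; in practice these keys usually come from the data generator section of your JSON experiment file.

    from deep_qa.common.params import Params
    from deep_qa.data.data_generator import DataGenerator

    # Hypothetical settings; tune adaptive_memory_usage_constant for your
    # particular model and GPU, as described above.
    generator_params = Params({
        "dynamic_padding": True,
        "adaptive_batch_sizes": True,
        "adaptive_memory_usage_constant": 400000,
        "maximum_batch_size": 512,
        "padding_noise": 0.1,
    })
    data_generator = DataGenerator(text_trainer, generator_params)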
-
create_generator
(dataset: deep_qa.data.datasets.dataset.IndexedDataset, batch_size: int = None)[source]¶ Main external API call: converts an
IndexedDataset
into a data generator suitable for use with Keras’fit_generator
and related methods.
-
last_num_batches
= None¶ This field can be read after calling
create_generator
to get the number of steps you should take per epoch inmodel.fit_generator
ormodel.evaluate_generator
for this data.
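Putting create_generator and last_num_batches together, a sketched training call looks like this; indexed_dataset and model stand in for an IndexedDataset and a compiled Keras model you already have:

    batch_generator = data_generator.create_generator(indexed_dataset)
    num_batches = data_generator.last_num_batches  # set by create_generator
    # The second argument's name differs across Keras versions
    # (samples_per_epoch vs. steps_per_epoch), so it is passed positionally.
    model.fit_generator(batch_generator, num_batches)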
Datasets¶
deep_qa.data.dataset¶
-
class
deep_qa.data.datasets.dataset.
Dataset
(instances: typing.List[deep_qa.data.instances.instance.Instance])[source]¶ Bases:
object
A collection of Instances.
This base class has general methods that apply to all collections of Instances. That basically amounts to methods that operate on sets, like merging and truncating.
-
merge
(other: deep_qa.data.datasets.dataset.Dataset) → deep_qa.data.datasets.dataset.Dataset[source]¶ Combine two datasets. If you try to merge two Datasets of the same subtype, you will end up with a Dataset of the same type (i.e., calling IndexedDataset.merge() with another IndexedDataset will return an IndexedDataset). If the types differ, this method currently raises an error, because the underlying Instance objects are not currently type compatible.
-
-
class
deep_qa.data.datasets.dataset.
IndexedDataset
(instances: typing.List[deep_qa.data.instances.instance.IndexedInstance])[source]¶ Bases:
deep_qa.data.datasets.dataset.Dataset
A Dataset of IndexedInstances, with some helper methods.
IndexedInstances have text sequences replaced with lists of word indices, and are thus able to be padded to consistent lengths and converted to training inputs.
-
as_training_data
()[source]¶ Takes each
IndexedInstance
and converts it into (inputs, labels), according to the Instance’s as_training_data() method. Both the inputs and the labels are numpy arrays. Note that if theInstances
return tuples for their inputs, we convert the list of tuples into a tuple of lists, before converting everything to numpy arrays.
-
pad_instances
(padding_lengths: typing.Dict[str, int] = None, verbose: bool = True)[source]¶ Makes all of the
IndexedInstances
in the dataset have the same length by padding them. ThisDataset
object doesn’t know what things there are in theInstance
to pad, but theInstances
do, and so does the model that called us, passing in apadding_lengths
dictionary. The keys in that dictionary must match the lengths that theInstance
knows about.Given that, this method does two things: (1) it asks each of the
Instances
what their padding lengths are, and takes a max (usingpadding_lengths()
). It then reconciles those values with thepadding_lengths
we were passed as an argument to this method, and pads the instances withIndexedInstance.pad()
. Ifpadding_lengths
has a particular key specified with a value, that value takes precedence over whatever we computed in our data. TODO(matt): with dynamic padding, we should probably have this be a max padding length, not a hard setting, but that requires some API changes.This method modifies the current object, it does not return a new
IndexedDataset
.Parameters: padding_lengths: Dict[str, int]
If a key is present in this dictionary with a non-None value, we will pad to that length instead of the length calculated from the data. This lets you, e.g., set a maximum value for sentence length, or word length, if you want to throw out long sequences.
verbose: bool, optional (default=True)
Should we output logging information when we’re doing this padding? If the dataset is large, this is nice to have, because padding a large dataset could take a long time. But if you’re doing this inside of a data generator, having all of this output per batch is a bit obnoxious.
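For example, here is a sketch of capping sentence length before converting a dataset to arrays. num_sentence_words is the padding key TextTrainer uses for sentence length; your instance type may use different keys.

    # Pad (or truncate) all instances to at most 100 words; keys not given
    # here are computed from the data itself.
    indexed_dataset.pad_instances(padding_lengths={'num_sentence_words': 100})
    inputs, labels = indexed_dataset.as_training_data()  # numpy arrays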
-
-
class
deep_qa.data.datasets.dataset.
TextDataset
(instances: typing.List[deep_qa.data.instances.instance.TextInstance], params: deep_qa.common.params.Params = None)[source]¶ Bases:
deep_qa.data.datasets.dataset.Dataset
A Dataset of TextInstances, with a few helper methods.
TextInstances aren’t useful for much with Keras until they’ve been indexed. So this class just has methods to read in data from a file and convert it into other kinds of Datasets.
-
static
read_from_file
(filename: str, instance_class, params: deep_qa.common.params.Params = None)[source]¶
-
General Data Utils¶
deep_qa.data.data_indexer¶
-
class
deep_qa.data.data_indexer.
DataIndexer
[source]¶ Bases:
object
A DataIndexer maps strings to integers, allowing for strings to be mapped to an out-of-vocabulary token.
DataIndexers are fit to a particular dataset, which we use to decide which words are in-vocabulary.
DataIndexers also allow for several different namespaces, so you can have separate word indices for ‘a’ as a word, and ‘a’ as a character, for instance. Most of the methods on this class allow you to pass in a namespace; by default we use the ‘words’ namespace, and you can omit the namespace argument everywhere and just use the default.
-
add_word_to_index
(word: str, namespace: str = 'words') → int[source]¶ Adds word to the index, if it is not already present. Either way, we return the index of the word.
-
fit_word_dictionary
(dataset, min_count: int = 1)[source]¶ Given a
Dataset
, this method decides which words are given an index, and which ones are mapped to an OOV token (in this case “UNK”). This method must be called before any dataset is indexed with thisDataIndexer
. If you don’t first fit the word dictionary, you’ll basically map every token onto “UNK”.We call
instance.words()
for each instance in the dataset, and then keep all words that appear at leastmin_count
times.Parameters: dataset: ``TextDataset``
The dataset to index.
min_count: int, optional (default=1)
The minimum number of occurrences a word must have in the dataset in order to be assigned an index.
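A minimal sketch of the fit-then-index workflow; text_dataset stands in for a TextDataset you have already read from a file:

    from deep_qa.data.data_indexer import DataIndexer

    data_indexer = DataIndexer()
    # Words appearing fewer than 2 times will be mapped to "UNK".
    data_indexer.fit_word_dictionary(text_dataset, min_count=2)
    word_index = data_indexer.add_word_to_index("entailment")  # 'words' namespace by default
    char_index = data_indexer.add_word_to_index("e", namespace="characters")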
-
deep_qa.data.embeddings¶
-
class
deep_qa.data.embeddings.
PretrainedEmbeddings
[source]¶ Bases:
object
-
static
get_embedding_layer
(embeddings_filename: str, data_indexer: deep_qa.data.data_indexer.DataIndexer, trainable=False, log_misses=False, name='pretrained_embedding')[source]¶ Reads a pre-trained embedding file and generates a Keras Embedding layer that has weights initialized to the pre-trained embeddings. The Embedding layer can either be trainable or not.
We use the DataIndexer to map from the word strings in the embeddings file to the indices that we need, and to know which words from the embeddings file we can safely ignore. If we come across a word in the DataIndexer that does not show up in the embeddings file, we give it a zero vector.
The embeddings file is assumed to be gzipped, formatted as [word] [dim 1] [dim 2] ...
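A sketch of typical usage, with a hypothetical path to a gzipped GloVe-style vector file:

    from deep_qa.data.embeddings import PretrainedEmbeddings

    embedding_layer = PretrainedEmbeddings.get_embedding_layer(
        "glove.6B.100d.txt.gz",  # hypothetical path; gzipped "[word] [dim 1] [dim 2] ..." format
        data_indexer,            # a DataIndexer already fit to your dataset
        trainable=False,         # keep the pre-trained vectors fixed
        log_misses=True)         # log vocabulary words missing from the file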
-
About Models¶
In this module we define a number of concrete models. The models are grouped by task, where each task has a roughly coherent input/output specification. See the README in each submodule for a description of the task that the models in that submodule are designed to solve.
You should think of these models as more of “model families” than actual models, though, as there are typically options left unspecified in the models themselves. For example, models in this module might have a layer that encodes word sequences into vectors; they just call a method on TextTrainer to get an encoder, and the decision for which actual encoder is used (an LSTM, a CNN, or something else) happens in the parameters passed to TextTrainer. If you really want to, you can hard-code specific decisions for these things, but most models we have here use the TextTrainer API to abstract away these decisions, giving implementations of a class of similar models, instead of a single model.
We also define a few general Pretrainers in a submodule here. The Pretrainers in this top-level submodule are suitable to pre-train a large class of models (e.g., any model that encodes sentences), while more task-specific Pretrainers are found in that task’s submodule.
Below, we describe a few popular models that we’ve implemented and include our output when training.
Attention Sum Reader¶
The Attention Sum Reader
Network is implemented in
attention_sum_reader
.
Train Logs:
Using Theano backend.
Using gpu device 0: Tesla K80 (CNMeM is disabled, cuDNN 5105)
/home/nelsonl/miniconda3/envs/deep_qa/lib/python3.5/site-packages/theano/sandbox/cuda/__init__.py:600: UserWarning: Your cuDNN version is more recent than the one Theano officially supports. If you see any problems, try updating Theano or downgrading cuDNN to version 5.
warnings.warn(warn)
2017-01-26 23:52:54,082 - INFO - deep_qa.common.checks - Keras version: 1.2.0
2017-01-26 23:52:54,082 - INFO - deep_qa.common.checks - Theano version: 0.8.2
2017-01-26 23:52:54,269 - INFO - __main__ - Training model
2017-01-26 23:52:54,270 - INFO - deep_qa.training.trainer - Running training (TextTrainer)
2017-01-26 23:52:54,270 - INFO - deep_qa.training.trainer - Getting training data
2017-01-26 23:52:58,914 - INFO - deep_qa.data.dataset - Finished reading dataset; label counts: [(0, 42399), (1, 44896), (2, 23832), (3, 11274), (4, 585)]
2017-01-26 23:58:07,539 - INFO - deep_qa.training.text_trainer - Indexing dataset
2017-01-27 00:03:28,722 - INFO - deep_qa.training.text_trainer - Padding dataset to lengths {'num_option_words': None, 'num_question_words': None, 'word_sequence_length': None, 'num_options': None, 'num_passage_words': None}
2017-01-27 00:03:28,722 - INFO - deep_qa.data.dataset - Getting max lengths from instances
2017-01-27 00:03:29,714 - INFO - deep_qa.data.dataset - Instance max lengths: {'num_option_words': 68, 'num_question_words': 121, 'num_options': 5, 'num_passage_words': 3090}
2017-01-27 00:03:29,714 - INFO - deep_qa.data.dataset - Now actually padding instances to length: {'num_option_words': 68, 'num_question_words': 121, 'num_options': 5, 'num_passage_words': 3090}
2017-01-27 00:05:40,054 - INFO - deep_qa.training.trainer - Getting validation data
2017-01-27 00:05:40,347 - INFO - deep_qa.data.dataset - Finished reading dataset; label counts: [(0, 3522), (1, 3429), (2, 1835), (3, 784), (4, 430)]
2017-01-27 00:05:40,348 - INFO - deep_qa.training.text_trainer - Indexing dataset
2017-01-27 00:06:02,773 - INFO - deep_qa.training.text_trainer - Padding dataset to lengths {'num_option_words': 68, 'num_question_words': 121, 'word_sequence_length': None, 'num_options': 5, 'num_passage_words': 3090}
2017-01-27 00:06:02,774 - INFO - deep_qa.data.dataset - Getting max lengths from instances
2017-01-27 00:06:02,851 - INFO - deep_qa.data.dataset - Instance max lengths: {'num_option_words': 8, 'num_question_words': 95, 'num_options': 5, 'num_passage_words': 2186}
2017-01-27 00:06:02,851 - INFO - deep_qa.data.dataset - Now actually padding instances to length: {'num_option_words': 68, 'num_question_words': 121, 'num_options': 5, 'num_passage_words': 3090}
2017-01-27 00:06:13,387 - INFO - deep_qa.training.trainer - Building the model
____________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
====================================================================================================
document_input (InputLayer) (None, 3090) 0
____________________________________________________________________________________________________
question_input (InputLayer) (None, 121) 0
____________________________________________________________________________________________________
word_embedding (TimeDistributedE multiple 80112384 question_input[0][0]
document_input[0][0]
____________________________________________________________________________________________________
bidirectional_1 (Bidirectional) (None, 768) 1476864 word_embedding[0][0]
____________________________________________________________________________________________________
bidirectional_2 (Bidirectional) (None, 3090, 768) 1476864 word_embedding[1][0]
____________________________________________________________________________________________________
question_document_softmax (Atten (None, 3090) 0 bidirectional_1[0][0]
bidirectional_2[0][0]
____________________________________________________________________________________________________
options_input (InputLayer) (None, 5, 68) 0
____________________________________________________________________________________________________
options_probability_sum (OptionA (None, 5) 0 document_input[0][0]
question_document_softmax[0][0]
options_input[0][0]
____________________________________________________________________________________________________
l1normalize_1 (L1Normalize) (None, 5) 0 options_probability_sum[0][0]
====================================================================================================
Total params: 83,066,112
Trainable params: 83,066,112
Non-trainable params: 0
____________________________________________________________________________________________________
Train on 127786 samples, validate on 10000 samples
Epoch 1/5
127786/127786 [==============================] - 34850s - loss: 1.0131 - acc: 0.5290 - val_loss: 0.9776 - val_acc: 0.5624
Epoch 2/5
127786/127786 [==============================] - 34828s - loss: 0.6713 - acc: 0.7267 - val_loss: 1.0838 - val_acc: 0.5514
Epoch 3/5
127786/127786 [==============================] - 34835s - loss: 0.2720 - acc: 0.8996 - val_loss: 1.4446 - val_acc: 0.5335
Entailment Models¶
Entailment models take two sequences of text as input and make a classification decision on the pair. Typically that decision represents whether one sentence entails the other, but we’ll use this family of models to represent any kind of classification decision over pairs of text.
Inputs: Two text sequences
Output: Some classification decision (typically “entails/not entails”, “entails/neutral/contradicts”, or similar)
DecomposableAttention¶
-
class
deep_qa.models.entailment.decomposable_attention.
DecomposableAttention
(params: deep_qa.common.params.Params)[source]¶ Bases:
deep_qa.training.text_trainer.TextTrainer
This
TextTrainer
implements the Decomposable Attention model described in “A Decomposable Attention Model for Natural Language Inference”, by Parikh et al., 2016, with some optional enhancements before the decomposable attention actually happens. Specifically, Parikh’s original model took plain word embeddings as input to the decomposable attention; we allow other operations that transform these word embeddings, such as running a biLSTM on them, before running the decomposable attention layer.Inputs:
- A “text” sentence, with shape (batch_size, sentence_length)
- A “hypothesis” sentence, with shape (batch_size, sentence_length)
Outputs:
- An entailment decision per input text/hypothesis pair, in {entails, contradicts, neutral}.
Parameters: num_seq2seq_layers : int, optional (default=0)
After getting a word embedding, how many stacked seq2seq encoders should we use before doing the decomposable attention? The default of 0 recreates the original decomposable attention model.
share_encoders : bool, optional (default=True)
Should we use the same seq2seq encoder for the text and hypothesis, or different ones?
decomposable_attention_params : Dict[str, Any], optional (default={})
These parameters get passed to the
DecomposableAttentionEntailment
layer object, and control things like the number of output labels, number of hidden layers in the entailment MLPs, etc. See that class for a complete description of options here.-
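As an illustration, the model-specific portion of an experiment configuration for this class might look like the following dict. Only the three documented keys are shown, and the num_hidden_layers sub-key is hypothetical; see DecomposableAttentionEntailment for the real options.

    decomposable_attention_params = {
        "num_seq2seq_layers": 1,  # 0 recreates Parikh et al.'s original model
        "share_encoders": True,
        "decomposable_attention_params": {"num_hidden_layers": 2},  # hypothetical sub-key
    }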
_build_model
()[source]¶ Constructs and returns a DeepQaModel (which is a wrapper around a Keras Model) that will take the output of self._get_training_data as input, and produce as output a true/false decision for each input. Note that in the multiple gpu case, this function will be called multiple times for the different GPUs. As such, you should be wary of this function having side effects unrelated to building a computation graph.
The returned model will be used to call model.fit(train_input, train_labels).
-
_instance_type
()[source]¶ When reading datasets, what
Instance
type should we create? TheInstance
class contains code that creates actual numpy arrays, so this instance type determines the inputs that you will get to your model, and the outputs that are used for training.
-
_set_padding_lengths_from_model
()[source]¶ This gets called when loading a saved model. It is analogous to
_set_padding_lengths
, but needs to set all of the values set in that method just by inspecting the loaded model. If we didn’t have this, we would not be able to correctly pad data after loading a model.
-
get_padding_memory_scaling
(padding_lengths: typing.Dict[str, int]) → int[source]¶ This method is for computing adaptive batch sizes. We assume that memory usage is a function that looks like this: M = b * O(p) * c, where M is the memory usage, b is the batch size, c is some constant that depends on how much GPU memory you have and various model hyperparameters, and O(p) is a function outlining how memory usage asymptotically varies with the padding lengths. Our approach will be to let the user effectively set M/c using the adaptive_memory_usage_constant parameter in DataGenerator. The model (this method) specifies O(p), so we can solve for the batch size b. The more specific you get in specifying O(p) in this function, the better a job we can do in optimizing memory usage.
Parameters: padding_lengths: Dict[str, int]
Dictionary containing padding lengths, mapping keys like
num_sentence_words
to ints. This method computes a function of these ints.Returns: O(p): int
The big-O complexity of the model, evaluated with the specific ints given in
padding_lengths
dictionary.
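As a sketch of what an implementation might look like (the key name and the quadratic form are assumptions about a model dominated by a sentence-by-sentence attention matrix, not necessarily this class's actual implementation):

    def _get_padding_memory_scaling(self, padding_lengths):
        # O(p): memory assumed to be dominated by the sentence x sentence
        # attention matrix computed between text and hypothesis.
        num_sentence_words = padding_lengths['num_sentence_words']
        return num_sentence_words * num_sentence_words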
Reading Comprehension¶
AttentionSumReader¶
-
class
deep_qa.models.reading_comprehension.attention_sum_reader.
AttentionSumReader
(params: deep_qa.common.params.Params)[source]¶ Bases:
deep_qa.training.text_trainer.TextTrainer
This TextTrainer implements the Attention Sum Reader model described by Kadlec et al. (2016). It takes a question and document as input, encodes the document and question words with two separate Bidirectional GRUs, and then takes the dot product of the question embedding with the document embedding of each word in the document. This creates an attention over words in the document, and it then selects the option with the highest summed or mean weight as the answer.
-
_build_model
()[source]¶ The basic outline here is that we’ll pass the questions and the document / passage (think of this as a collection of possible answer choices) into a word embedding layer.
Then, we run the word embeddings from the document (a sequence) through a bidirectional GRU and output a sequence that is the same length as the input sequence size. For each time step, the output item (“contextual embedding”) is the concatenation of the forward and backward hidden states in the bidirectional GRU encoder at that time step.
To get the encoded question, we pass the words of the question into another bidirectional GRU. This time, the output encoding is a vector containing the concatenation of the last hidden state in the forward network with the last hidden state of the backward network.
We then take the dot product of the question embedding with each of the contextual embeddings for the words in the document. We sum up all the occurrences of a word (“total attention”), and pick the word with the highest total attention in the document as the answer.
-
_set_padding_lengths
(padding_lengths: typing.Dict[str, int])[source]¶ Set the padding lengths of the model.
-
_set_padding_lengths_from_model
()[source]¶ This gets called when loading a saved model. It is analogous to
_set_padding_lengths
, but needs to set all of the values set in that method just by inspecting the loaded model. If we didn’t have this, we would not be able to correctly pad data after loading a model.
-
BidirectionalAttentionFlow¶
-
class
deep_qa.models.reading_comprehension.bidirectional_attention.
BidirectionalAttentionFlow
(params: deep_qa.common.params.Params)[source]¶ Bases:
deep_qa.training.text_trainer.TextTrainer
This class implements Minjoon Seo’s Bidirectional Attention Flow model for answering reading comprehension questions (ICLR 2017).
The basic layout is pretty simple: encode words as a combination of word embeddings and a character-level encoder, pass the word representations through a bi-LSTM/GRU, use a matrix of attentions to put question information into the passage word representations (this is the only part that is at all non-standard), pass this through another few layers of bi-LSTMs/GRUs, and do a softmax over span start and span end.
Parameters: num_hidden_seq2seq_layers : int, optional (default:
2
)At the end of the model, we add a few stacked biLSTMs (or similar), to give the model some depth. This parameter controls how many deep layers we should use.
num_passage_words : int, optional (default:
None
)If set, we will truncate (or pad) all passages to this length. If not set, we will pad all passages to be the same length as the longest passage in the data.
num_question_words : int, optional (default:
None
)Same as
num_passage_words
, but for the number of words in the question. (default:None
)num_highway_layers : int, optional (default:
2
)After constructing a word embedding, but before the first biLSTM layer, Min has some
Highway
layers operating on the word embedding layer. This parameter specifies how many of those to do. (default:2
)highway_activation : string, optional (default:
'relu'
)Specifies the activation function to use for the
Highway
layers mentioned above. Any Keras activation function is acceptable here.similarity_function : Dict[str, Any], optional (default:
{'type': 'linear', 'combination': 'x,y,x*y'}
)Specifies the similarity function to use when computing a similarity matrix between question words and passage words. By default we use the function Min used in his paper.
Notes
Min’s code uses tensors of shape
(batch_size, num_sentences, sentence_length)
to represent the passage, splitting it up into sentences, where here we just have one long passage sequence. I was originally afraid this might mean he applied the biLSTM on each sentence independently, but it looks like he flattens it to our shape before he does any actual operations on it. So, I think this is implementing pretty much exactly what he did, but I’m not totally certain.-
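For illustration, a sketched parameter dict that makes the documented defaults explicit:

    bidaf_params = {
        "num_hidden_seq2seq_layers": 2,
        "num_passage_words": None,   # None: pad to the longest passage in the data
        "num_question_words": None,
        "num_highway_layers": 2,
        "highway_activation": "relu",
        "similarity_function": {"type": "linear", "combination": "x,y,x*y"},
    }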
_build_model
()[source]¶ Constructs and returns a DeepQaModel (which is a wrapper around a Keras Model) that will take the output of self._get_training_data as input, and produce as output a true/false decision for each input. Note that in the multiple gpu case, this function will be called multiple times for the different GPUs. As such, you should be wary of this function having side effects unrelated to building a computation graph.
The returned model will be used to call model.fit(train_input, train_labels).
-
_instance_type
()[source]¶ When reading datasets, what
Instance
type should we create? TheInstance
class contains code that creates actual numpy arrays, so this instance type determines the inputs that you will get to your model, and the outputs that are used for training.
-
_set_padding_lengths
(padding_lengths: typing.Dict[str, int])[source]¶ This is about padding. Any model will have some number of things that need padding in order to make a consistent set of input arrays, like the length of a sentence. This method sets those variables given a dictionary of lengths from a dataset.
Note that you might choose not to update some of these lengths, either because you want to keep the model flexible to allow for dynamic (batch-specific) padding, or because you’ve set a hard limit in the class parameters and don’t want to change it.
-
_set_padding_lengths_from_model
()[source]¶ This gets called when loading a saved model. It is analogous to
_set_padding_lengths
, but needs to set all of the values set in that method just by inspecting the loaded model. If we didn’t have this, we would not be able to correctly pad data after loading a model.
-
get_instance_sorting_keys
() → typing.List[str][source]¶ If we’re using dynamic padding, we want to group the instances by padding length, so that we minimize the amount of padding necessary per batch. This variable sets what exactly gets sorted by. We’ll call
get_padding_lengths()
on each instance, pull out these keys, and sort by them in the order specified. You’ll want to override this in your model class if you have more complex models.The default implementation is to sort first by
num_sentence_words
, then bynum_word_characters
(if applicable).
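A sketch of what an override might look like for a passage/question model; the key names are taken from the padding logs earlier in this document, with the longest-varying dimension first:

    def get_instance_sorting_keys(self):
        # Sort by passage length first, since it dominates padding waste,
        # then by question length.
        return ['num_passage_words', 'num_question_words']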
-
get_padding_lengths
() → typing.Dict[str, int][source]¶ This is about padding. Any solver will have some number of things that need padding in order to make consistently-sized data arrays, like the length of a sentence. This method returns a dictionary of all of those things, mapping a length key to an int.
If any of the entries in this dictionary is
None
, the padding code will calculate a padding length from the data itself. This could either be a good idea or a bad idea - if you have outliers in your data, you could be wasting a whole lot of memory and computation time if you pad the whole dataset to the size of the outlier. On the other hand, if you do batch-specific padding, this can save you a whole lot of time, if you group batches by similar lengths.Here we return the lengths that are applicable to encoding words and sentences. If you have additional padding dimensions, call super().get_padding_lengths() and then update the dictionary.
-
get_padding_memory_scaling
(padding_lengths: typing.Dict[str, int]) → int[source]¶ This method is for computing adaptive batch sizes. We assume that memory usage is a function that looks like this: M = b * O(p) * c, where M is the memory usage, b is the batch size, c is some constant that depends on how much GPU memory you have and various model hyperparameters, and O(p) is a function outlining how memory usage asymptotically varies with the padding lengths. Our approach will be to let the user effectively set M/c using the adaptive_memory_usage_constant parameter in DataGenerator. The model (this method) specifies O(p), so we can solve for the batch size b. The more specific you get in specifying O(p) in this function, the better a job we can do in optimizing memory usage.
Parameters: padding_lengths: Dict[str, int]
Dictionary containing padding lengths, mapping keys like
num_sentence_words
to ints. This method computes a function of these ints.Returns: O(p): int
The big-O complexity of the model, evaluated with the specific ints given in
padding_lengths
dictionary.
-
GatedAttentionReader¶
-
class
deep_qa.models.reading_comprehension.gated_attention_reader.
GatedAttentionReader
(params: deep_qa.common.params.Params)[source]¶ Bases:
deep_qa.training.text_trainer.TextTrainer
This TextTrainer implements the Gated Attention Reader model described in “Gated-Attention Readers for Text Comprehension” by Dhingra et al. (2016). It encodes the document with a variable number of gated attention layers, and then encodes the query. It takes the dot product of these two final encodings to generate an attention over the words in the document, and it then selects the option with the highest summed or mean weight as the answer.
Parameters: multiword_option_mode: str, optional (default=”mean”)
Describes how to calculate the probability of options that contain multiple words. If “mean”, the probability of the option is taken to be the mean of the probabilities of its constituent words. If “sum”, the probability of the option is taken to be the sum of the probabilities of its constituent words.
num_gated_attention_layers: int, optional (default=3)
The number of gated attention layers to pass the document embedding through. Must be at least 1.
cloze_token: str, optional (default=None)
If not None, the string that represents the cloze token in a cloze question. Used to calculate the attention over the document, as the model does it differently for cloze vs non-cloze datasets.
gating_function: str, optional (default=”*”)
The gating function to use in the Gated Attention layer.
"*"
is for elementwise multiplication,"+"
is for elementwise addition, and"|"
is for concatenation.gated_attention_dropout: float, optional (default=0.3)
The proportion of units to drop out after each gated attention layer.
qd_common_feature: boolean, optional (default=True)
Whether to use the question-document common word feature. This feature simply indicates, for each word in the document, whether it appears in the query and has been shown to improve reading comprehension performance.
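A sketched parameter dict covering the options above; the cloze token shown is hypothetical and depends on your dataset:

    gated_attention_reader_params = {
        "multiword_option_mode": "mean",
        "num_gated_attention_layers": 3,  # must be at least 1
        "cloze_token": "@placeholder",    # hypothetical; None for non-cloze data
        "gating_function": "*",
        "gated_attention_dropout": 0.3,
        "qd_common_feature": True,
    }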
-
_build_model
()[source]¶ The basic outline here is that we’ll pass the questions and the document / passage (think of this as a collection of possible answer choices) into a word embedding layer.
-
_set_padding_lengths
(padding_lengths: typing.Dict[str, int])[source]¶ Set the padding lengths of the model.
-
_set_padding_lengths_from_model
()[source]¶ This gets called when loading a saved model. It is analogous to
_set_padding_lengths
, but needs to set all of the values set in that method just by inspecting the loaded model. If we didn’t have this, we would not be able to correctly pad data after loading a model.
-
Text Classification¶
Text classification models take a sequence of text as input and classify it into one of several classes.
Input: Text sequence
Output: Class label
ClassificationModel¶
-
class
deep_qa.models.text_classification.classification_model.
ClassificationModel
(params: deep_qa.common.params.Params)[source]¶ Bases:
deep_qa.training.text_trainer.TextTrainer
A TextTrainer that simply takes word sequences as input (could be either sentences or logical forms), encodes the sequence using a sentence encoder, then uses a few dense layers to decide on some classification label for the text sequence (currently hard-coded for a binary classification decision, but that’s easy to fix if we need to).
We don’t really expect this model to work for question answering - it’s just a sentence classification model. The best it can do is basically to learn word cooccurrence information, similar to how the Salience solver works, and I’m not at all confident that this does that job better than Salience. We’ve implemented this mostly as a simple baseline.
Note that this also can’t actually answer questions at this point. You have to do some post-processing to get from true/false decisions to question answers, and I removed that from TextTrainer to make the code simpler.
-
_build_model
()[source]¶ train_input: numpy array: int32 (samples, num_words). Left-padded arrays of word indices from sentences in training data.
-
_set_padding_lengths_from_model
()[source]¶ This gets called when loading a saved model. It is analogous to
_set_padding_lengths
, but needs to set all of the values set in that method just by inspecting the loaded model. If we didn’t have this, we would not be able to correctly pad data after loading a model.
-
About Layers¶
Custom layers that we have implemented belong here. These include things like knowledge encoders (which encode the memory component of a memory network), knowledge selectors (which perform an attention over the memory), and entailment models. There’s also an encoders submodule, containing sentence encoders that convert an embedded word (or character) sequence into a vector.
Core Layers¶
Additive¶
-
class
deep_qa.layers.additive.
Additive
(initializer='glorot_uniform', **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
adds a parameter value to each cell in the input tensor, similar to a bias vector in aDense
layer, but this only adds, with one value per cell. The value to add is learned.Parameters: initializer: str, optional (default=’glorot_uniform’)
Keras initializer for the additive weight.
-
build
(input_shape)[source]¶ Creates the layer weights.
Must be implemented on all layers that have weights.
- # Arguments
- input_shape: Keras tensor (future input to layer)
- or list/tuple of Keras tensors to reference for weight shape computations.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
-
BiGRUIndexSelector¶
-
class
deep_qa.layers.bigru_index_selector.
BiGRUIndexSelector
(target_index, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This Layer takes 3 inputs: a tensor of document indices, the seq2seq GRU output over the document feeding it in forward, and the seq2seq GRU output over the document feeding it in backwards. It also takes one parameter, the word index whose biGRU outputs we want to extract.
- Inputs:
- document indices: shape
(batch_size, document_length)
- forward GRU output: shape
(batch_size, document_length, GRU hidden dim)
- backward GRU output: shape
(batch_size, document_length, GRU hidden dim)
- document indices: shape
- Output:
- GRU outputs at index: shape
(batch_size, GRU hidden dim * 2)
- GRU outputs at index: shape
Parameters: target_index : int
The word index to extract the forward and backward GRU output from.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shapes)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An input shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
ComplexConcat¶
-
class
deep_qa.layers.complex_concat.
ComplexConcat
(combination: str, axis: int = -1, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
doesK.concatenate()
on a collection of tensors, but allows for more complex operations thanMerge(mode='concat')
. Specifically, you can perform an arbitrary number of elementwise linear combinations of the vectors, and concatenate all of the results. If you do not need to do this, you should use the regularMerge
layer instead of thisComplexConcat
.Because the inputs all have the same shape, we assume that the masks are also the same, and just return the first mask.
- Input:
- A list of tensors. The tensors that you combine must have the same shape, so that we can do elementwise operations on them, and all tensors must have the same number of dimensions, and match on all dimensions except the concatenation axis.
- Output:
- A tensor with some combination of the input tensors concatenated along a specific dimension.
Parameters: axis : int
The axis to use for
K.concatenate
.combination: List of str
A comma-separated list of combinations to perform on the input tensors. These are either tensor indices (1-indexed), or an arithmetic operation between two tensor indices (valid operations:
*
,+
,-
,/
). For example, these are all valid combination parameters:"1,2"
,"1,2*3"
,"1-2,2-1"
,"1,1*1"
, and"1,2,1*2"
.-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An input shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
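To make the combination syntax described above concrete, here is a sketch; tensor_1 and tensor_2 stand in for same-shape Keras tensors from your model:

    from deep_qa.layers.complex_concat import ComplexConcat

    # "1,2,1*2" concatenates tensor 1, tensor 2, and their elementwise
    # product along the last axis, tripling that dimension.
    combined = ComplexConcat(combination="1,2,1*2", axis=-1)([tensor_1, tensor_2])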
Highway¶
L1Normalize¶
-
class
deep_qa.layers.l1_normalize.
L1Normalize
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This Layer normalizes a tensor by its L1 norm. This could just be a
Lambda
layer that calls ourtensors.l1_normalize
function, except thatLambda
layers do not properly handle masked input.The expected input to this layer is a tensor of shape
(batch_size, x)
, with an optional mask of the same shape. We also accept as input a tensor of shape(batch_size, x, 1)
, which will be squeezed to shape(batch_size, x)
(though the mask must still be of shape(batch_size, x)
).We give no output mask, as we expect this to only be used at the end of the model, to get a final probability distribution over class labels. If you need this to propagate the mask for your model, it would be pretty easy to change it to optionally do so - submit a PR.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An input shape tuple.
-
NoisyOr¶
-
class
deep_qa.layers.noisy_or.
BetweenZeroAndOne
[source]¶ Bases:
keras.constraints.Constraint
Constrains the weights to be between zero and one
-
class
deep_qa.layers.noisy_or.
NoisyOr
(axis=-1, name='noisy_or', param_init='uniform', noise_param_constraint=None, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This layer takes as input a tensor of probabilities and calculates the noisy-or probability across a given axis based on the noisy-or equation: p(x) = 1 - \prod_{i=1}^{N} (1 - q * p(y_i)), where q is the noise parameter.
- Inputs:
- probabilities: shape
(batch, ..., N, ...)
Optionally takes a mask of the same shape, where N is the number of y’s in the above equation (i.e. the number of probabilities being combined in the product), in the dimension corresponding to the specified axis.
- probabilities: shape
- Output:
- X: shape
(batch, ..., ...)
The output has one less dimension than the input, and has an optional mask of the same shape. The lost dimension corresponds to the specified axis. The output mask is the result ofK.any()
on the input mask, along the specified axis.
- X: shape
Parameters: axis : int, default=-1
The axis over which to combine probabilities.
name : string, default=’noisy_or’
Name of the layer, used to debug both the layer and its parameter.
param_init : string, default=’uniform’
The initialization of the noise parameter.
noise_param_constraint : Keras Constraint, default=None
Optional, a constraint which would be applied to the noise parameter.
OptionAttentionSum¶
-
class
deep_qa.layers.option_attention_sum.
OptionAttentionSum
(multiword_option_mode='mean', **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This Layer takes three inputs: a tensor of document indices, a tensor of document probabilities, and a tensor of answer options. In addition, it takes a parameter: a string describing how to calculate the probability of options that consist of multiple words. We compute the probability of each of the answer options in the fashion described in the paper “Text Comprehension with the Attention Sum Reader Network” (Kadlec et al., 2016).
- Inputs:
- document indices: shape
(batch_size, document_length)
- document probabilities: shape
(batch_size, document_length)
- options: shape
(batch size, num_options, option_length)
- document indices: shape
- Output:
- option_probabilities
(batch_size, num_options)
- option_probabilities
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shapes)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An input shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
Overlap¶
-
class
deep_qa.layers.overlap.
Overlap
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This Layer takes 2 inputs: a
tensor_a
(e.g. a document) and atensor_b
(e.g. a question). It returns a collection of one-hot vectors suitable for feature representation, indicating at each index whether the element in tensor_a appears in tensor_b. Note that the output is not the same shape as tensor_a; it gains a final dimension of size 2 (see the output shape below)
.- Inputs:
- tensor_a: shape
(batch_size, length_a)
- tensor_b shape
(batch_size, length_b)
- tensor_a: shape
- Output:
- Collection of one-hot vectors indicating
overlap: shape
(batch_size, length_a, 2)
- Collection of one-hot vectors indicating
overlap: shape
Notes
This layer is used to implement the “Question Evidence Common Word Feature” discussed in section 3.2.4 of Dhingra et al. (2016).
-
compute_output_shape
(input_shapes)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An input shape tuple.
SubtractMinimum¶
-
class
deep_qa.layers.subtract_minimum.
SubtractMinimum
(axis: int, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This layer is used to normalize across a tensor axis. Normalization is done by finding the minimum value across the specified axis, and then subtracting that value from all values (again, across the specified axis). Note that this also works just fine if you want to find the minimum across more than one axis.
- Inputs:
- A tensor with arbitrary dimension, and a mask of the same shape (currently doesn’t support masks with other shapes).
- Output:
- The same tensor, with the minimum across one (or more) of the dimensions subtracted.
Parameters: axis: int
The axis (or axes) across which to find the minimum. Can be a single int, a list of ints, or None. We just call K.min with this parameter, so anything that’s valid there works here too.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An input shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
VectorMatrixMerge¶
-
class
deep_qa.layers.vector_matrix_merge.
VectorMatrixMerge
(concat_axis: int, mask_concat_axis: int = None, propagate_mask: bool = True, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
takes a tensor withK
modes and a collection of other tensors withK - 1
modes, and concatenates the lower-order tensors at the beginning of the higher-order tensor along a given mode. We call this a vector-matrix merge to evoke the notion of appending vectors onto a matrix, but this will also work with higher-order tensors.For example, if you have a memory tensor of shape
(batch_size, knowledge_length, encoding_dim)
, containingknowledge_length
encoded sentences, you could use this layer to concatenateN
individual encoded sentences with it, resulting in a tensor of shape(batch_size, N + knowledge_length, encoding_dim)
This layer supports masking - we will pass through whatever mask you have on the matrix, and concatenate ones to it, similar to how we concatenate the inputs. We need to know what axis to do that concatenation on, though - we’ll default to the input concatenation axis, but you can specify a different one if you need to. We just ignore masks on the vectors, because doing the right thing with masked vectors here is complicated. If you want to handle that later, submit a PR.
This
Layer
is essentially the opposite of aVectorMatrixSplit
.Parameters: concat_axis: int
The axis to concatenate the vectors and matrix on.
mask_concat_axis: int, optional (default=None)
The axis to concatenate the masks on (defaults to
concat_axis
ifNone
)-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shapes)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An input shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
-
VectorMatrixSplit¶
-
class
deep_qa.layers.vector_matrix_split.
VectorMatrixSplit
(split_axis: int, mask_split_axis: int = None, propagate_mask: bool = True, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This Layer takes a tensor with K modes and splits it into a tensor with K - 1 modes and a tensor with K modes, but one less row in one of the dimensions. We call this a vector-matrix split to evoke the notion of taking a row- (or column-) vector off of a matrix and returning both the vector and the remaining matrix, but this will also work with higher-order tensors.
For example, if you have a sentence that has a combined (word + characters) representation of the tokens in the sentence, you’d have a tensor of shape (batch_size, sentence_length, word_length + 1). You could split that using this Layer into a tensor of shape (batch_size, sentence_length) for the word tokens in the sentence, and a tensor of shape (batch_size, sentence_length, word_length) for the character for each word token.
This layer supports masking - we will split the mask the same way that we split the inputs.
This Layer is essentially the opposite of a VectorMatrixMerge.
-
compute_mask
(inputs, input_mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An input shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
-
Attention¶
Attention¶
-
class
deep_qa.layers.attention.attention.
Attention
(similarity_function: typing.Dict[str, typing.Any] = None, normalize: bool = True, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This Layer takes two inputs: a vector and a matrix. We compute the similarity between the vector and each row in the matrix, and then (optionally) perform a softmax over rows using those computed similarities. We handle masking properly for masked rows in the matrix, though we ignore any masking on the vector.
By default similarity is computed with a dot product, but you can alternatively use a parameterized similarity function if you wish.
Inputs:
- vector: shape
(batch_size, embedding_dim)
, mask is ignored if provided - matrix: shape
(batch_size, num_rows, embedding_dim)
, with mask(batch_size, num_rows)
Output:
- attention: shape
(batch_size, num_rows)
. Ifnormalize
isTrue
, we return no mask, as we’ve already applied it (masked input rows have value 0 in the output). Ifnormalize
isFalse
, we return the matrix mask, if there was one.
Parameters: similarity_function_params :
Dict[str, Any]
, optional (default:{}
)These parameters get passed to a similarity function (see
deep_qa.tensors.similarity_functions
for more info on what’s acceptable). The default similarity function with no parameters is a simple dot product.normalize :
bool
, optional (default:True
)If true, we normalize the computed similarities with a softmax, to return a probability distribution for your attention. If false, this is just computing a similarity score.
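A sketch of typical usage inside a model's _build_model; question_vector and passage_matrix stand in for already-embedded Keras tensors:

    from deep_qa.layers.attention.attention import Attention

    # question_vector: (batch_size, embedding_dim)
    # passage_matrix:  (batch_size, num_passage_words, embedding_dim)
    # With normalize=True (the default), the output is a probability
    # distribution over passage words: (batch_size, num_passage_words).
    passage_attention = Attention()([question_vector, passage_matrix])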
-
build
(input_shape)[source]¶ Creates the layer weights.
Must be implemented on all layers that have weights.
- # Arguments
- input_shape: Keras tensor (future input to layer)
- or list/tuple of Keras tensors to reference for weight shape computations.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shapes)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
GatedAttention¶
-
class
deep_qa.layers.attention.gated_attention.
GatedAttention
(gating_function='*', **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This layer implements the majority of the Gated Attention module described in “Gated-Attention Readers for Text Comprehension” by Dhingra et al., 2016.
The module is described in section 3.2.2. For each token d_i in the document D, the GA module forms a “token-specific representation” of the query, q_i, using soft attention, and then multiplies the query representation element-wise with the document token representation:

1. alpha_i = softmax(Q^T d_i)
2. q_i = Q alpha_i
3. x_i = d_i * q_i (where * denotes element-wise multiplication)

This layer implements equations 2 and 3 above, but in a batched manner, to get X, a tensor with all x_i. Thus the inputs to the layer are alpha (normalized_qd_attention), a tensor with all alpha_i, as well as Q (question_matrix) and D (document_matrix), a tensor with all d_i. The element-wise multiplication in equation 3 models the interactions between d_i and q_i; the paper also reports results using other such gating functions, like sum or concatenation.
- Inputs:
- document_matrix, a matrix of shape (batch, document length, biGRU hidden length). Represents the document as encoded by the biGRU.
- question_matrix, a matrix of shape (batch, question length, biGRU hidden length). Represents the question as encoded by the biGRU.
- normalized_qd_attention, the soft attention over the document and question. Matrix of shape (batch, document length, question length).
- Output:
- X, a tensor of shape (batch, document length, biGRU hidden length) if the gating function is * or +, or (batch, document length, biGRU hidden length * 2) if the gating function is ||. This serves as a representation of each token in the document.
Parameters: gating_function : string, default=”*”

The gating function to use for modeling the interactions between the document and query token. Supported gating functions are "*" for elementwise multiplication, "+" for elementwise addition, and "||" for concatenation.

Notes

To find out how we calculated equation 1, see the GatedAttentionReader model (roughly, a masked_batch_dot and a masked_softmax).
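As a rough sketch, equations 2 and 3 can be batched with backend operations like this (illustrative only, not the layer’s exact implementation; it ignores masking and shows just the "*" gating function):

import keras.backend as K

def gated_attention(document_matrix, question_matrix, normalized_qd_attention):
    # q_i = Q alpha_i for every document token, batched:
    # (batch, document length, biGRU hidden length)
    token_specific_queries = K.batch_dot(normalized_qd_attention, question_matrix)
    # x_i = d_i * q_i (element-wise)
    return document_matrix * token_specific_queries

-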
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shapes)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
MaskedSoftmax¶
-
class
deep_qa.layers.attention.masked_softmax.
MaskedSoftmax
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This Layer performs a masked softmax. This could just be a Lambda layer that calls our tensors.masked_softmax function, except that Lambda layers do not properly handle masked input.
The expected input to this layer is a tensor of shape (batch_size, num_options), with a mask of the same shape. We also accept an input tensor of shape (batch_size, num_options, 1), which we will squeeze to be (batch_size, num_options) (though the mask must still be (batch_size, num_options)).
While we give the expected input as having two modes, we also accept higher-order tensors. In those cases, we’ll first perform a last_dim_flatten on both the input and the mask, so that we always do the softmax over a single dimension (the last one).
We give no output mask, as we expect this to only be used at the end of the model, to get a final probability distribution over class labels (and it’s a softmax, so you’ll have zeros in the tensor itself; do you really still need a mask?). If you need this to propagate the mask for whatever reason, it would be pretty easy to change it to optionally do so - submit a PR.
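A minimal usage sketch (shapes are illustrative; the mask would normally be attached by upstream layers):

from keras.layers import Input
from deep_qa.layers.attention.masked_softmax import MaskedSoftmax

option_scores = Input(shape=(4,))  # (batch_size, num_options)
option_probabilities = MaskedSoftmax()(option_scores)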
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
MatrixAttention¶
-
class
deep_qa.layers.attention.matrix_attention.
MatrixAttention
(similarity_function: typing.Dict[str, typing.Any] = None, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
takes two matrices as input and returns a matrix of attentions. We compute the similarity between each row in each matrix and return unnormalized similarity scores. We don’t worry about zeroing out any masked values, because we propagate a correct mask.
By default similarity is computed with a dot product, but you can alternatively use a parameterized similarity function if you wish.
This is largely similar to using
TimeDistributed(Attention)
, except the result is unnormalized, and we return a mask, so you can do a masked normalization with the result. You should use this instead ofTimeDistributed(Attention)
if you want to compute multiple normalizations of the attention matrix.- Input:
- matrix_1:
(batch_size, num_rows_1, embedding_dim)
, with mask(batch_size, num_rows_1)
- matrix_2:
(batch_size, num_rows_2, embedding_dim)
, with mask(batch_size, num_rows_2)
- Output:
(batch_size, num_rows_1, num_rows_2)
, with mask of same shape
Parameters: similarity_function_params: Dict[str, Any], default={}
These parameters get passed to a similarity function (see
deep_qa.tensors.similarity_functions
for more info on what’s acceptable). The default similarity function with no parameters is a simple dot product.
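A minimal usage sketch with made-up shapes:

from keras.layers import Input
from deep_qa.layers.attention.matrix_attention import MatrixAttention

premise = Input(shape=(15, 100))     # (batch_size, num_rows_1, embedding_dim)
hypothesis = Input(shape=(10, 100))  # (batch_size, num_rows_2, embedding_dim)

# Unnormalized similarities of shape (batch_size, 15, 10), with a
# propagated mask of the same shape.
similarities = MatrixAttention()([premise, hypothesis])

-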
build
(input_shape)[source]¶ Creates the layer weights.
Must be implemented on all layers that have weights.
- # Arguments
- input_shape: Keras tensor (future input to layer)
- or list/tuple of Keras tensors to reference for weight shape computations.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
MaxSimilaritySoftmax¶
-
class
deep_qa.layers.attention.max_similarity_softmax.
MaxSimilaritySoftmax
(knowledge_axis, max_knowledge_length, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This layer takes encoded questions and knowledge in a multiple choice setting and computes the similarity between each of the question embeddings and the background knowledge, and returns a softmax over the options.
Inputs:
- encoded_questions (batch_size, num_options, encoding_dim)
- encoded_knowledge (batch_size, num_options, knowledge_length, encoding_dim)
Output:
- option_probabilities (batch_size, num_options)
This is a pretty niche layer that does a very specific computation. We only made it its own class instead of a
Lambda
layer so that we could handle masking correctly, whichLambda
does not.-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shapes)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
WeightedSum¶
-
class
deep_qa.layers.attention.weighted_sum.
WeightedSum
(use_masking: bool = True, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
takes a matrix of vectors and a vector of row weights, and returns a weighted sum of the vectors. You might use this to get some aggregate sentence representation after computing an attention over the sentence, for example.Inputs:
- matrix:
(batch_size, num_rows, embedding_dim)
, with mask(batch_size, num_rows)
- vector:
(batch_size, num_rows)
, mask is ignored
Outputs:
- A weighted sum of the rows in the matrix, with shape
(batch_size, embedding_dim)
, with mask=None.
Parameters: use_masking: bool, default=True
If true, we will apply the input mask to the matrix before doing the weighted sum. If you’ve computed your vector weights with masking, so that masked entries are 0, this is unnecessary, and you can set this parameter to False to avoid an expensive computation.
Notes
You probably should have used a mask when you computed your attention weights, so any row that’s masked in the matrix should already be 0 in the attention vector. But just in case you didn’t, we’ll handle a mask on the matrix here too. If you know that you did masking right on the attention, you can optionally remove the mask computation here, which will save you a bit of time and memory.
While the above spec shows inputs with 3 and 2 modes, we also allow inputs of any order; we always sum over the second-to-last dimension of the “matrix”, weighted by the last dimension of the “vector”. Higher-order tensors get complicated for matching things, though, so there is a hard constraint: all dimensions in the “matrix” before the final embedding must be matched in the “vector”.
For example, say I have a “matrix” with dimensions (batch_size, num_queries, num_words, embedding_dim), representing some kind of embedding or encoding of several multi-word queries. My attention “vector” must then have at least those dimensions, and could have more. So I could have an attention over words per query, with shape (batch_size, num_queries, num_words), or I could have an attention over query words for every document in some list, with shape (batch_size, num_documents, num_queries, num_words). Both of these cases are fine. In the first case, the returned tensor will have shape (batch_size, num_queries, embedding_dim), and in the second case, it will have shape (batch_size, num_documents, num_queries, embedding_dim). But you can’t have an attention “vector” that does not include all of the queries, so shape (batch_size, num_words) is not allowed - you haven’t specified how to handle that dimension in the “matrix”, so we can’t do anything with this input.
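A common pattern, sketched with made-up shapes, is to pair this layer with Attention to aggregate a sentence into a single vector:

from keras.layers import Input
from deep_qa.layers.attention.attention import Attention
from deep_qa.layers.attention.weighted_sum import WeightedSum

query = Input(shape=(100,))        # (batch_size, embedding_dim)
sentence = Input(shape=(30, 100))  # (batch_size, num_rows, embedding_dim)

attention = Attention()([query, sentence])         # (batch_size, num_rows)
aggregated = WeightedSum()([sentence, attention])  # (batch_size, embedding_dim)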
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shapes)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
Backend Layers¶
Layers in this module generally just implement some simple operation from the Keras backend as a Layer. The reason we have these as Layers is largely so that we can properly handle masking.
AddMask¶
-
class
deep_qa.layers.backend.add_mask.
AddMask
(mask_value: float = 0.0, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
adds a mask to a tensor. It is intended solely for testing, though if you have a use case for this outside of testing, feel free to use it. Thecall()
method just returns the inputs, and thecompute_mask
method callsK.not_equal(inputs, mask_value)
, and that’s it. This is different from Keras’Masking
layer, which assumes higher-order input and does aK.any()
call incompute_mask
.- Input:
- tensor: a tensor of arbitrary shape
- Output:
- the same tensor, now with a mask attached of the same shape
Parameters: mask_value: float, optional (default=0.0)

This is the value that we will compare to in compute_mask.
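For example (illustrative shapes; as noted above, this is mainly useful in tests):

from keras.layers import Input
from deep_qa.layers.backend.add_mask import AddMask

word_indices = Input(shape=(10,), dtype='int32')
# Attaches a mask computed as K.not_equal(word_indices, 0).
masked_indices = AddMask(mask_value=0)(word_indices)

-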
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
BatchDot¶
-
class
deep_qa.layers.backend.batch_dot.
BatchDot
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
callsK.batch_dot()
on two inputstensor_a
andtensor_b
. This function will work for tensors of arbitrary size as long asabs(K.ndim(tensor_a) - K.ndim(tensor_b)) < 1
, due to limitations inK.batch_dot()
. When the input tensors have more than three dimensions, they must have the same shape, except for the last two dimensions. See the examples for more explanation of what this means.We always assume the dimension to perform the dot is the last one, and that the masks have one fewer dimension that the tensors. Note that this layer does not return zeroes in places that are masked, but does pass a correct mask forward. If this then gets fed into
masked_softmax
, for instance, your tensor will be correctly normalized. We always assume the dimension to perform the dot is the last one, and that the masks have one fewer dimension than the tensors.- Inputs:
- tensor_a: tensor with
ndim >= 2
. - tensor_b: tensor with
ndim >= 2
.
- Output:
- a_dot_b
Examples
The following examples will try to give some insight on how this layer works in relation to
K.batch_dot()
. Note that the Keras documentation (as of 2/13/17) onK.batch_dot
is incorrect, and that this layer behaves differently from the documented behavior.As a first example, let’s suppose that
tensor_a
andtensor_b
have the same number of dimensions. Let the shape oftensor_a
be(2, 3, 2)
, and let the shape oftensor_b
be(2, 4, 2)
. The mask accompanying these inputs always has one less dimension, so thetensor_a_mask
has shape(2, 3)
andtensor_b_mask
has shape(2, 4)
. The shape of thebatch_dot
output would thus be(2, 3, 4)
. This is because we are taking the batch dot of the last dimension, so the output shape is(2, 3)
(from tensor_a) with(4)
(from tensor_b) appended on (to get(2, 3, 4)
in total). The output mask has the same shape as the output, and is thus(2, 3, 4)
as well.

>>> import keras.backend as K
>>> tensor_a = K.ones(shape=(2, 3, 2))
>>> tensor_b = K.ones(shape=(2, 4, 2))
>>> K.eval(K.batch_dot(tensor_a, tensor_b, axes=(2, 2))).shape
(2, 3, 4)
Next, let’s look at an example where
tensor_a
andtensor_b
are “uneven” (different number of dimensions). Let the shape oftensor_a
be(2, 4, 2)
, and let the shape oftensor_b
be(2, 4, 3, 2)
. The mask accompanying these inputs always has one less dimension, so thetensor_a_mask
has shape(2, 4)
andtensor_b_mask
has shape(2, 4, 3)
. The shape of thebatch_dot
output would thus be(2, 4, 3)
. In the case of uneven tensors, we always expand the last dimension of the smaller tensor to make them even. Thus in this case, we expandtensor_a
to get a new shape of(2, 4, 2, 1)
. Now we are taking thebatch_dot
of a tensor with shape(2, 4, 2, 1)
and(2, 4, 3, 2)
. Note that the first two dimensions of this tensor are the same(2, 4)
– this is a requirement imposed byK.batch_dot
. Following the methodology of calculating the output shape above, we get that the output is(2, 4, 1, 3)
since we get(2, 4, 1)
fromtensor_a
and(3)
fromtensor_b
. We then squeeze the tensor to remove the 1-dimension to get a final shape of(2, 4, 3)
. Note that the mask has the same shape.

>>> import keras.backend as K
>>> tensor_a = K.ones(shape=(2, 4, 2))
>>> tensor_b = K.ones(shape=(2, 4, 3, 2))
>>> tensor_a_expanded = K.expand_dims(tensor_a, axis=-1)
>>> unsqueezed_bd = K.batch_dot(tensor_a_expanded, tensor_b, axes=(2, 3))
>>> final_bd = K.squeeze(unsqueezed_bd, axis=K.ndim(tensor_a) - 1)
>>> K.eval(final_bd).shape
(2, 4, 3)
Lastly, let’s look at the uneven case where
tensor_a
has more dimensions thantensor_b
. Let the shape oftensor_a
be(2, 3, 4, 2)
, and let the shape oftensor_b
be(2, 3, 2)
. Since the mask accompanying these inputs always has one less dimension,tensor_a_mask
has shape(2, 3, 4)
andtensor_b_mask
has shape(2, 3)
. The shape of thebatch_dot
output would thus be(2, 3, 4)
. Since these tensors are uneven, expand the smaller tensor,tensor_b
, to get a new shape of(2, 3, 2, 1)
. Now we are taking thebatch_dot
of a tensor with shape(2, 3, 4, 2)
and(2, 3, 2, 1)
. Note again that the first two dimensions of this tensor are the same(2, 3)
. We can see that the output shape is(2, 3, 4, 1)
since we get(2, 3, 4)
fromtensor_a
and(1)
fromtensor_b
. We then squeeze the tensor to remove the 1-dimension to get a final shape of(2, 3, 4)
. Note that the mask has the same shape.

>>> import keras.backend as K
>>> tensor_a = K.ones(shape=(2, 3, 4, 2))
>>> tensor_b = K.ones(shape=(2, 3, 2))
>>> tensor_b_expanded = K.expand_dims(tensor_b, axis=-1)
>>> unsqueezed_bd = K.batch_dot(tensor_a, tensor_b_expanded, axes=(3, 2))
>>> final_bd = K.squeeze(unsqueezed_bd, axis=K.ndim(tensor_a) - 1)
>>> K.eval(final_bd).shape
(2, 3, 4)
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
CollapseToBatch¶
-
class
deep_qa.layers.backend.collapse_to_batch.
CollapseToBatch
(num_to_collapse: int, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
Reshapes a higher order tensor, taking the first
num_to_collapse
dimensions after the batch dimension and folding them into the batch dimension. For example, a tensor of shape (2, 4, 5, 3), collapsed withnum_to_collapse = 2
, would become a tensor of shape (40, 3). We perform identical computation on the input mask, if there is one.This is essentially what Keras’
TimeDistributed
layer does (and then undoes) to apply a layer to a higher-order tensor, and that’s the intended use for this layer. However,TimeDistributed
cannot handle distributing across dimensions with unknown lengths at graph compilation time. This layer works even in that case. So, if your actual tensor shape at graph compilation time looks like (None, None, None, 3), or (None, 4, None, 3), you can still use this layer (andExpandFromBatch
) to get the same result asTimeDistributed
. If your shapes are fully known at graph compilation time, just useTimeDistributed
, as it’s a nicer API for the same functionality.- Inputs:
- tensor with
ndim >= 3
- Output:
- tensor with
ndim = input_ndim - num_to_collapse
, with the removed dimensions folded into the first (batch-size) dimension
Parameters: num_to_collapse: int
The number of dimensions to fold into the batch size.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
ExpandFromBatch¶
-
class
deep_qa.layers.backend.expand_from_batch.
ExpandFromBatch
(num_to_expand: int, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
Reshapes a collapsed tensor, taking the batch size and separating it into
num_to_expand
dimensions, following the shape of a second input tensor. This is meant to be used in conjunction withCollapseToBatch
, to achieve the same effect as Keras’TimeDistributed
layer, but for shapes that are not fully specified at graph compilation time.For example, say you had an original tensor of shape
(None (2), 4, None (5), 3)
, then collapsed it withCollapseToBatch(2)(tensor)
to get a tensor with shape(None (40), 3)
(here I’m usingNone (x)
to denote a dimension with unknown length at graph compilation time, wherex
is the actual runtime length). You can then callExpandFromBatch(2)(collapsed, tensor)
with the result to expand the first two dimensions out of the batch again (presumably after you’ve done some computation when it was collapsed).- Inputs:
- a tensor that has been collapsed with
CollapseToBatch(num_to_expand)
. - the original tensor that was used as input to
CollapseToBatch
(or one with identical shape in the collapsed dimensions). We will use this input only to get its shape.
- Output:
- tensor with
ndim = input_ndim + num_to_expand
, with the additional dimensions coming immediately after the first (batch-size) dimension.
Parameters: num_to_expand: int
The number of dimensions to expand from the batch size.
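A sketch of the collapse-compute-expand pattern described above (Dense(10) stands in for whatever per-word computation you actually want to distribute):

from keras.layers import Dense, Input
from deep_qa.layers.backend.collapse_to_batch import CollapseToBatch
from deep_qa.layers.backend.expand_from_batch import ExpandFromBatch

# (batch_size, num_sentences, num_words, embedding_dim), lengths unknown
# at graph compilation time.
word_embeddings = Input(shape=(None, None, 50))

collapsed = CollapseToBatch(num_to_collapse=2)(word_embeddings)
projected = Dense(10)(collapsed)
# The original tensor is passed in only so we can recover its shape.
expanded = ExpandFromBatch(num_to_expand=2)([projected, word_embeddings])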
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
Envelope¶
-
class
deep_qa.layers.backend.envelope.
Envelope
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
Given a probability distribution over a begin index and an end index of some sequence, this
Layer
computes an envelope over the sequence, a probability that each element lies within “begin” and “end”.Specifically, the computation done here is the following:
after_span_begin = K.cumsum(span_begin, axis=-1)
after_span_end = K.cumsum(span_end, axis=-1)
before_span_end = 1 - after_span_end
envelope = after_span_begin * before_span_end
- Inputs:
- span_begin: tensor with shape
(batch_size, sequence_length)
, representing a probability distribution over a start index in the sequence - span_end: tensor with shape
(batch_size, sequence_length)
, representing a probability distribution over an end index in the sequence
- Outputs:
- envelope: tensor with shape
(batch_size, sequence_length)
, representing a probability for each index of the sequence belonging in the span
If there is a mask associated with either of the inputs, we ignore it, assuming that you used the mask correctly when you computed your probability distributions. But we support masking in this layer, so that you have an output mask if you really need it. We just return the first mask that is not
None
(orNone
, if both areNone
).-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
Max¶
-
class
deep_qa.layers.backend.max.
Max
(axis: int = -1, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
performs a max over some dimension. Keras has a similar layer calledGlobalMaxPooling1D
, but it is not as configurable as this one, and it does not support masking.If the mask is not
None
, it must be the same shape as the input.- Input:
- A tensor of arbitrary shape (having at least 3 dimensions).
- Output:
- A tensor with one less dimension, where we have taken a max over one of the dimensions.
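For example (illustrative shapes):

from keras.layers import Input
from deep_qa.layers.backend.max import Max

encoded_options = Input(shape=(4, 100))          # (batch_size, num_options, encoding_dim)
max_over_options = Max(axis=1)(encoded_options)  # (batch_size, encoding_dim)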
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
Permute¶
-
class
deep_qa.layers.backend.permute.
Permute
(pattern: typing.Tuple[int], **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
callsK.permute_dimensions
on both the input and the mask.If the mask is not
None
, it must have the same shape as the input.- Input:
- A tensor of arbitrary shape.
- Output:
- A tensor with permuted dimensions.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
Repeat¶
-
class
deep_qa.layers.backend.repeat.
Repeat
(axis: int, repetitions: int, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
callsK.repeat_elements
on both the input and the mask, after callingK.expand_dims
.If the mask is not
None
, we must be able to callK.expand_dims
using the same axis parameter as we do for the input.- Input:
- A tensor of arbitrary shape.
- Output:
- The input tensor repeated along one of the dimensions.
Parameters: axis: int
We will add a dimension to the input tensor at this axis.
repetitions: int
The new dimension will have this size, with each slice being identical to the original input tensor.
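For example (illustrative shapes):

from keras.layers import Input
from deep_qa.layers.backend.repeat import Repeat

sentence_encoding = Input(shape=(100,))  # (batch_size, embedding_dim)
# (batch_size, 4, embedding_dim), one identical copy of the encoding per slice.
repeated = Repeat(axis=1, repetitions=4)(sentence_encoding)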
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
RepeatLike¶
-
class
deep_qa.layers.backend.repeat_like.
RepeatLike
(axis: int, copy_from_axis: int, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This
Layer
is likeRepeat
, but gets the number of repetitions to use from a second input tensor. This allows doing a number of repetitions that is unknown at graph compilation time, and is necessary when therepetitions
argument toRepeat
would beNone
.If the mask is not
None
, we must be able to callK.expand_dims
using the same axis parameter as we do for the input.- Input:
- A tensor of arbitrary shape, which we will expand and tile.
- A second tensor whose shape along one dimension we will copy
- Output:
- The input tensor repeated along one of the dimensions.
Parameters: axis: int
We will add a dimension to the input tensor at this axis.
copy_from_axis: int
We will copy the dimension from the second tensor at this axis.
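For example (illustrative shapes; the document length is unknown at graph compilation time here, so Repeat would not work):

from keras.layers import Input
from deep_qa.layers.backend.repeat_like import RepeatLike

question_encoding = Input(shape=(100,))  # (batch_size, embedding_dim)
document = Input(shape=(None, 100))      # (batch_size, document_length, embedding_dim)
# (batch_size, document_length, embedding_dim)
tiled_question = RepeatLike(axis=1, copy_from_axis=1)([question_encoding, document])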
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
Encoders¶
BagOfWords¶
-
class
deep_qa.layers.encoders.bag_of_words.
BOWEncoder
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
Bag of Words Encoder takes a matrix of shape (num_words, word_dim) and returns a vector of size (word_dim), which is an average of the (unmasked) rows in the input matrix. This could have been done using a Lambda layer, except that Lambda layer does not support masking (as of Keras 1.0.7).
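For example (vocabulary size and shapes are illustrative):

from keras.layers import Embedding, Input
from deep_qa.layers.encoders.bag_of_words import BOWEncoder

word_indices = Input(shape=(20,), dtype='int32')
embedded = Embedding(input_dim=10000, output_dim=50, mask_zero=True)(word_indices)
# Average of the unmasked word vectors: (batch_size, 50).
sentence_vector = BOWEncoder()(embedded)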
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
ConvolutionalEncoder¶
-
class
deep_qa.layers.encoders.convolutional_encoder.
CNNEncoder
(units: int, num_filters: int, ngram_filter_sizes: typing.Tuple[int] = (2, 3, 4, 5), conv_layer_activation: str = 'relu', l1_regularization: float = None, l2_regularization: float = None, **kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
CNNEncoder is a combination of multiple convolution layers and max pooling layers. This is defined as a single layer to be consistent with the other encoders in terms of input and output specifications. The input to this “layer” is of shape (batch_size, num_words, embedding_dim) and the output is of size (batch_size, output_dim).
The CNN has one convolution layer for each ngram filter size. Each convolution operation produces a vector of size num_filters. The number of times a convolution layer will be applied depends on the ngram size: input_length - ngram_size + 1. The corresponding maxpooling layer aggregates all these outputs from the convolution layer and outputs the max.
This operation is repeated for every ngram size passed, and consequently the dimensionality of the output after maxpooling is len(ngram_filter_sizes) * num_filters.
We then use a fully connected layer to project it back to the desired output_dim. For more details, refer to “A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification”, Zhang and Wallace 2016, particularly Figure 1.
Parameters: units: int
After doing convolutions, we’ll project the collected features into a vector of this size. This used to be
output_dim
, but Keras changed it tounits
. I prefer the nameoutput_dim
, so we’ll leave the code usingoutput_dim
, and just use the nameunits
in the external API.num_filters: int
This is the output dim for each convolutional layer, which is the same as the number of “filters” learned by that layer.
ngram_filter_sizes: Tuple[int], optional (default=(2, 3, 4, 5))
This specifies both the number of convolutional layers we will create and their sizes. The default of (2, 3, 4, 5) will have four convolutional layers, corresponding to encoding ngrams of size 2 to 5 with some number of filters.
conv_layer_activation: str, optional (default=’relu’)
l1_regularization: float, optional (default=None)
l2_regularization: float, optional (default=None)
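A minimal usage sketch (hyper-parameters and shapes are illustrative):

from keras.layers import Embedding, Input
from deep_qa.layers.encoders.convolutional_encoder import CNNEncoder

word_indices = Input(shape=(20,), dtype='int32')
embedded = Embedding(input_dim=10000, output_dim=50)(word_indices)
# Three ngram sizes * 32 filters = 96 pooled features, projected to 100.
encoder = CNNEncoder(units=100, num_filters=32, ngram_filter_sizes=(2, 3, 4))
sentence_vector = encoder(embedded)  # (batch_size, 100)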
-
build
(input_shape)[source]¶ Creates the layer weights.
Must be implemented on all layers that have weights.
- # Arguments
- input_shape: Keras tensor (future input to layer)
- or list/tuple of Keras tensors to reference for weight shape computations.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
PositionalEncoder¶
-
class
deep_qa.layers.encoders.positional_encoder.
PositionalEncoder
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
A
PositionalEncoder
is very similar to a kind of weighted bag of words encoder, where the weighting is done by an index-dependent vector, not a scalar. If you think this is an odd thing to do, it is. The original authors provide no real reasoning behind the exact method other than it takes into account word order. This is here mainly to reproduce results for comparison.It takes a matrix of shape (num_words, word_dim) and returns a vector of size (word_dim), which implements the following linear combination of the rows:
representation = sum_(j=1)^(n) { l_j * w_j }
where w_j is the j-th word representation in the sentence and l_j is a vector defined as follows:
l_kj = (1 - j)/m - (k/d)((1-2j)/m)
- where:
- j is the word sentence index.
- m is the sentence length.
- k is the vector index (i.e., the k-th element of a vector).
- d is the dimension of the embedding.
- * represents element-wise multiplication.
This method was originally introduced in End-To-End Memory Networks (pp. 4-5): https://arxiv.org/pdf/1503.08895v5.pdf
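A numpy sketch of the weight vectors l_j, implementing the formula exactly as stated above (with 1-based j and k; the function name is just for illustration):

import numpy as np

def positional_weights(sentence_length, embedding_dim):
    # l_kj = (1 - j)/m - (k/d)((1 - 2j)/m)
    m, d = sentence_length, embedding_dim
    j = np.arange(1, m + 1).reshape(m, 1)  # word positions j = 1..m
    k = np.arange(1, d + 1).reshape(1, d)  # vector indices k = 1..d
    return (1 - j) / m - (k / d) * ((1 - 2 * j) / m)

# representation = (positional_weights(m, d) * word_matrix).sum(axis=0)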
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
AttentiveGRU¶
-
class
deep_qa.layers.encoders.attentive_gru.
AttentiveGru
(output_dim, input_length, **kwargs)[source]¶ Bases:
keras.layers.recurrent.GRU
GRUs typically operate over sequences of words. The motivation behind this encoding is that a weighted average loses ordering information over its inputs - for instance, this is important in the bAbI tasks.
See Dynamic Memory Networks for more information: https://arxiv.org/pdf/1603.01417v1.pdf. This class extends the Keras Gated Recurrent Unit by implementing a method which replaces the GRU update gate (normally a vector, z; it is noted below where it is normally computed) with a pre-computed scalar attention weight (one per input, such as from the output of a softmax over the input vectors). As mentioned above, instead of using word embedding sequences as input to the GRU, we are using sentence encoding sequences.
The implementation of this class is subtle - it is only very slightly different from a standard GRU. When it is initialised, the Keras backend will call the build method. It uses this to check that inputs being passed to this function are the correct size, so we allow this to be the actual input size as normal. However, for the internal implementation, everywhere where this global shape is used, we override it to be one less, as we are passing in a tensor of shape (batch, knowledge_length, 1 + encoding_dim) as we are including the attention mask. Therefore, we need all of the weights to have shape (, encoding_dim), NOT (, 1 + encoding_dim). All of the below methods which are overridden use some form of this dimension, so we correct them.
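A usage sketch (shapes are illustrative, and this assumes a Keras version with the Concatenate layer; note the pre-computed attention weight occupying the first slot of the last dimension):

from keras.layers import Concatenate, Input
from deep_qa.layers.encoders.attentive_gru import AttentiveGru

encodings = Input(shape=(8, 100))  # (batch, knowledge_length, encoding_dim)
attention = Input(shape=(8, 1))    # one pre-computed weight per input
gru_input = Concatenate(axis=-1)([attention, encodings])
aggregated = AttentiveGru(output_dim=100, input_length=8)(gru_input)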
-
build
(input_shape)[source]¶ This is used by Keras to verify things, but also to build the weights. The only differences from the Keras GRU (which we copied exactly other than the below) are: We generate weights with dimension input_dim[2] - 1, rather than dimension input_dim[2]. There are a few variables which are created in non-‘gpu’ modes which are not required. These are commented out but left in for clarity below.
-
preprocess_input
(inputs, training=None)[source]¶ We have to override this preprocessing step, because if we are using the cpu, we do the weight - input multiplications in the internals of the GRU as separate, smaller matrix multiplications and concatenate them after. Therefore, before this happens, we split off the attention and then add it back afterwards.
-
step
(inputs, states)[source]¶ The input to step is a tensor of shape (batch, 1 + encoding_dim), i.e. a timeslice of the input to this AttentiveGRU, where the time axis is the knowledge_length. Before we start, we strip off the attention from the beginning. Then we do the equations for a normal GRU, except we don’t calculate the output gate z, substituting the attention weight for it instead. Note that there is some redundancy here - for instance, in the GPU mode, we do a larger matrix multiplication than required, as we don’t use one part of it. However, for readability and similarity to the original GRU code in Keras, it has not been changed. In each section, there are commented out lines which contain code. If you were to uncomment these, remove the differences in the input size and replace the attention with the z gate at the output, you would have a standard GRU back again. We literally copied the Keras GRU code here, making some small modifications.
Entailment Model Layers¶
DecomposableAttention¶
-
class
deep_qa.layers.entailment_models.decomposable_attention.
DecomposableAttentionEntailment
(num_hidden_layers: int = 1, hidden_layer_width: int = 50, hidden_layer_activation: str = 'relu', final_activation: str = 'softmax', output_dim: int = 3, initializer: str = 'uniform', **kwargs)[source]¶ Bases:
deep_qa.layers.entailment_models.word_alignment.WordAlignmentEntailment
This layer is a reimplementation of the entailment algorithm described in “A Decomposable Attention Model for Natural Language Inference”, Parikh et al., 2016. The algorithm has three main steps:
- Attend: Compute dot products between all pairs of projections of words in the hypothesis and the premise, normalize those dot products to use them to align each word in premise to a phrase in the hypothesis and vice-versa. These alignments are then used to summarize the aligned phrase in the other sentence as a weighted sum. The initial word projections are computed using a feed forward NN, F.
- Compare: Pass a concatenation of each word in the premise and the summary of its aligned phrase in the hypothesis through a feed forward NN, G, to get a projected comparison. Do the same with the hypothesis and the aligned phrase from the premise.
- Aggregate: Sum over the comparisons to get a single vector each for premise-hypothesis comparison, and hypothesis-premise comparison. Pass them through a third feed forward NN (H), to get the entailment decision.
This layer can take either a tuple (premise, hypothesis) or a concatenation of them as input.
Input:
- Tuple input: a premise sentence and a hypothesis sentence, both with shape
(batch_size, sentence_length, embed_dim)
and masks of shape(batch_size, sentence_length)
- Single input: a single tensor of shape
(batch_size, sentence_length * 2, embed_dim)
, with a mask of shape(batch_size, sentence_length * 2)
, which we will split in half to get the premise and hypothesis sentences.
Output:
- Entailment decisions with the given
output_dim
.
Parameters: num_hidden_layers: int, optional (default=1)
Number of hidden layers in each of the feed forward neural nets described above.
hidden_layer_width: int, optional (default=50)
Width of each hidden layer in each of the feed forward neural nets described above.
hidden_layer_activation: str, optional (default=’relu’)
Activation for each hidden layer in each of the feed forward neural nets described above.
final_activation: str, optional (default=’softmax’)
Activation to use for the final output. Should almost certainly be ‘softmax’.
output_dim: int, optional (default=3)
Dimensionality of the final output. If this is the last layer in your model, this needs to be the same as the number of labels you have.
initializer: str, optional (default=’uniform’)
Will be passed to
self.add_weight()
for each of the weight matrices in the feed forward neural nets described above.Notes
premise_length = hypothesis_length = sentence_length below.
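A minimal usage sketch with made-up shapes, using the tuple-input form (assuming the two sentences are passed as a list, as is usual for multi-input Keras layers):

from keras.layers import Input
from deep_qa.layers.entailment_models.decomposable_attention import DecomposableAttentionEntailment

premise = Input(shape=(30, 100))  # (batch_size, sentence_length, embed_dim)
hypothesis = Input(shape=(30, 100))
# (batch_size, 3): one score per entailment label.
decision = DecomposableAttentionEntailment(output_dim=3)([premise, hypothesis])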
-
static
_attend
(target_embedding, s2t_alignment)[source]¶ Takes target embedding, and source-target alignment attention and produces a weighted average of the target embedding per each source word.
target_embedding: (batch_size, target_length, embed_dim)
s2t_alignment: (batch_size, source_length, target_length)
-
_compare
(source_embedding, s2t_attention)[source]¶ Takes word embeddings from a sentence, and aggregated representations of words aligned to each of those words from another sentence, and returns a projection of their concatenation.
source_embedding: (batch_size, source_length, embed_dim)
s2t_attention: (batch_size, source_length, embed_dim)
-
build
(input_shape)[source]¶ This model has three feed forward NNs (F, G and H in the paper). We assume that all three NNs have the same hyper-parameters: num_hidden_layers, hidden_layer_width and hidden_layer_activation. That is, F, G and H have the same structure and activations. Their actual weights are different, though. H has a separate softmax layer at the end.
-
compute_mask
(inputs, mask=None)[source]¶ Computes an output mask tensor.
- # Arguments
- inputs: Tensor or list of tensors. mask: Tensor or list of tensors.
- # Returns
- None or a tensor (or list of tensors,
- one per output tensor of the layer).
-
compute_output_shape
(input_shape)[source]¶ Computes the output shape of the layer.
Assumes that the layer will be built to match that input shape provided.
- # Arguments
- input_shape: Shape tuple (tuple of integers)
- or list of shape tuples (one per output tensor of the layer). Shape tuples can include None for free dimensions, instead of an integer.
- # Returns
- An output shape tuple.
-
get_config
()[source]¶ Returns the config of the layer.
A layer config is a Python dictionary (serializable) containing the configuration of a layer. The same layer can be reinstantiated later (without its trained weights) from this configuration.
The config of a layer does not include connectivity information, nor the layer class name. These are handled by Container (one layer of abstraction above).
- # Returns
- Python dictionary.
MultipleChoiceTupleEntailment¶
-
class
deep_qa.layers.entailment_models.multiple_choice_tuple_entailment.
MultipleChoiceTupleEntailment
(**kwargs)[source]¶ Bases:
deep_qa.layers.entailment_models.word_alignment.WordAlignmentEntailment
A kind of decomposable attention where the premise (or background) is in the form of SVO triples, and entailment is computed by finding the answer in a multiple choice setting that aligns best with the tuples that align with the question. This happens in two steps:
- We use the _align function from WordAlignmentEntailment to find the premise tuples whose SV, or VO pairs align best with the question.
- We then use the _align function again to find the answer that aligns best with the unaligned part of the tuples, weighed by how much they partially align with the question in step 1.
TODO(pradeep): Also match S with question, VO with answer, O with question and SV with answer.
WordAlignment¶
Word alignment entailment models operate on word level representations, and define alignment as a function of how well the words in the premise align with those in the hypothesis. These are different from the encoded sentence entailment models where both the premise and hypothesis are encoded as single vectors and entailment functions are defined on top of them.
At this point this doesn’t quite fit into the memory network setup because the model doesn’t operate on the encoded sentence representations, but instead consumes the word level representations. TODO(pradeep): Make this work with the memory network eventually.
-
class
deep_qa.layers.entailment_models.word_alignment.
WordAlignmentEntailment
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This is an abstract class for word alignment entailment. It defines an _align function.
-
static
_align
(source_embedding, target_embedding, source_mask, target_mask, normalize_alignment=True)[source]¶ Takes source and target sequence embeddings and returns source-to-target alignment weights. That is, for each word in the source sentence, returns a probability distribution over the target sequence that shows how well each target word aligns (i.e., is similar) to it.

source_embedding: (batch_size, source_length, embed_dim)
target_embedding: (batch_size, target_length, embed_dim)
source_mask: None or (batch_size, source_length, 1)
target_mask: None or (batch_size, target_length, 1)
normalize_alignment (bool): Will apply a (masked) softmax over alignments if True.
Returns: s2t_attention: (batch_size, source_length, target_length)
Wrappers¶
EncoderWrapper¶
-
class
deep_qa.layers.wrappers.encoder_wrapper.
EncoderWrapper
(layer, keep_dims=False, **kwargs)[source]¶ Bases:
deep_qa.layers.wrappers.time_distributed.TimeDistributed
This class TimeDistributes a sentence encoder, applying the encoder to several word sequences. The only difference between this and the regular TimeDistributed is in how we handle the mask. Typically, an encoder will handle masked embedded input, and return None as its mask, as it just returns a vector and no more masking is necessary. However, if the encoder is TimeDistributed, we might run into a situation where _all_ of the words in a given sequence are masked (because we padded the number of sentences, for instance). In this case, we just want to mask the entire sequence. EncoderWrapper returns a mask with the same dimension as the input sequences, where sequences are masked if _all_ of their words were masked.
Notes
For seq2seq encoders, one should use either
TimeDistributed
orTimeDistributedWithMask
sinceEncoderWrapper
reduces the dimensionality of the input mask.
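For example (illustrative shapes):

from keras.layers import Embedding, Input
from deep_qa.layers.encoders.bag_of_words import BOWEncoder
from deep_qa.layers.wrappers.encoder_wrapper import EncoderWrapper

# (batch_size, num_sentences, num_words) word indices.
word_indices = Input(shape=(5, 20), dtype='int32')
embedded = Embedding(input_dim=10000, output_dim=50, mask_zero=True)(word_indices)
# Encode each sentence separately: (batch_size, 5, 50), with a
# (batch_size, 5) mask marking sentences whose words were all masked.
sentence_vectors = EncoderWrapper(BOWEncoder())(embedded)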
OutputMask¶
-
class
deep_qa.layers.wrappers.output_mask.
OutputMask
(**kwargs)[source]¶ Bases:
deep_qa.layers.masked_layer.MaskedLayer
This Layer is purely for debugging. You can wrap this on a layer’s output to get the mask output by that layer as a model output, for easier visualization of what the model is actually doing.
Don’t try to use this in an actual model.
TimeDistributed¶
-
class
deep_qa.layers.wrappers.time_distributed.
TimeDistributed
(layer, keep_dims=False, **kwargs)[source]¶ Bases:
keras.layers.wrappers.TimeDistributed
This class fixes two bugs in Keras: (1) the input mask is not passed to the wrapped layer, and (2) Keras’ TimeDistributed currently only allows a single input, not a list. We currently don’t handle the case where the _output_ of the wrapped layer is a list, however. (Not that that’s particularly hard, we just haven’t needed it yet, so haven’t implemented it.)
Notes
If the output shape for TimeDistributed has a final dimension of 1, we essentially squeeze it, reshaping to have one fewer dimension. That change takes place in the actual
call
method as well as thecompute_output_shape
method.
Tensor Utils¶
Here are some general tensor manipulation utilities that we’ve written to help in other parts of the code base.
Core Tensor Utils¶
backend¶
These are utility functions that are similar to calls to Keras’ backend. Some of these are here because a current function in keras.backend is broken, some are things that just haven’t been implemented.
-
deep_qa.tensors.backend.
apply_feed_forward
(input_tensor, weights, activation)[source]¶ Takes an input tensor, a sequence of weights, and an activation, and builds an MLP. This can also be achieved by defining a sequence of Dense layers in Keras, but doing it this way can be desirable if the operation needs to happen within the call method of a more complex layer. Moreover, we do not apply biases here. The input tensor can have any number of dimensions, but its last dimension and the sequence of weights are expected to be compatible.
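A sketch of how this might be called (the weights here are created ad hoc for illustration; inside a real layer they would come from self.add_weight):

import keras.backend as K
from deep_qa.tensors.backend import apply_feed_forward

input_tensor = K.ones(shape=(2, 5))
weights = [K.random_uniform_variable((5, 8), low=0, high=1),
           K.random_uniform_variable((8, 3), low=0, high=1)]
output = apply_feed_forward(input_tensor, weights, K.relu)  # shape (2, 3)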
-
deep_qa.tensors.backend.
hardmax
(unnormalized_attention, knowledge_length)[source]¶ A similar operation to softmax, except all of the weight is placed on the mode of the distribution. So, e.g., this function transforms [.34, .2, -1.4] -> [1, 0, 0].
TODO(matt): we really should have this take an optional mask...
-
deep_qa.tensors.backend.
l1_normalize
(tensor_to_normalize, mask=None)[source]¶ Normalize a tensor by its L1 norm. Takes an optional mask.
When the vector to be normalized is all 0’s we return the uniform distribution (taking masking into account, so masked values are still 0.0). When the vector to be normalized is completely masked, we return the uniform distribution over the max padding length of the tensor.
See the tests for concrete examples of the aforementioned behaviors.
Parameters: tensor_to_normalize : Tensor
Tensor of shape (batch size, x) to be normalized, where x is arbitrary.
mask: Tensor, optional
Tensor of shape (batch size, x) indicating which elements of tensor_to_normalize are padding and should not be considered when normalizing.
Returns: normalized_tensor : Tensor
Normalized tensor with shape (batch size, x).
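A numpy sketch of the documented behavior, including both edge cases (illustrative only; it assumes non-negative entries, as with attention scores):

```python
import numpy as np

def l1_normalize_np(x, mask=None):
    if mask is None:
        mask = np.ones_like(x)
    masked = x * mask
    norms = masked.sum(axis=-1, keepdims=True)
    num_unmasked = mask.sum(axis=-1, keepdims=True)
    # All-zero but not fully masked: uniform over the unmasked positions.
    # Fully masked: uniform over the max padding length.
    uniform = np.where(num_unmasked > 0,
                       mask / np.maximum(num_unmasked, 1),
                       np.full_like(x, 1.0 / x.shape[-1]))
    return np.where(norms > 0,
                    masked / np.where(norms > 0, norms, 1),
                    uniform)

print(l1_normalize_np(np.array([[2.0, 2.0, 0.0]]), np.array([[1.0, 1.0, 0.0]])))
# [[0.5 0.5 0. ]]
```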
- deep_qa.tensors.backend.last_dim_flatten(input_tensor)[source]¶
Takes a tensor and returns a matrix, preserving only the last dimension of the input.
- deep_qa.tensors.backend.switch(cond, then_tensor, else_tensor)[source]¶
Keras' implementation of K.switch currently uses tensorflow's switch function, which only accepts scalar-valued conditions, rather than boolean tensors treated elementwise. This does not match Theano's implementation of switch, but using tensorflow's where we can recover exactly that functionality.
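The elementwise semantics are those of numpy's where (which is what the tensorflow-based implementation recovers):

```python
import numpy as np

cond = np.array([[1, 0, 1]])
then_tensor = np.array([[10, 20, 30]])
else_tensor = np.array([[1, 2, 3]])
print(np.where(cond, then_tensor, else_tensor))  # [[10  2 30]]
```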
- deep_qa.tensors.backend.tile_scalar(scalar, vector)[source]¶
NOTE: If your vector has known shape (i.e., the relevant dimension from K.int_shape(vector) is not None), you should just use K.repeat_elements(scalar) instead of this. This method works, however, when the number of entries in your vector is unknown at graph compilation time.
This method takes a (collection of) scalar(s) (shape: (batch_size, 1)), and tiles that scalar a number of times, giving a vector of shape (batch_size, tile_length). (I say "scalar" and "vector" here because I'm ignoring the batch_size.) We need the vector as input so we know what the tile_length is - the vector is otherwise ignored.
This is not done as a Keras Layer, however; if you want to use this function, you’ll need to do it _inside_ of a Layer somehow, either in a Lambda or in the call() method of a Layer you’re writing.
TODO(matt): we could probably make a more general tile_tensor method, which can do this for any dimensionality. There is another place in the code where we do this with a matrix and a tensor; all three of these can probably be one function.
- deep_qa.tensors.backend.tile_vector(vector, matrix)[source]¶
NOTE: If your matrix has known shape (i.e., the relevant dimension from K.int_shape(matrix) is not None), you should just use K.repeat_elements(vector) instead of this. This method works, however, when the number of rows in your matrix is unknown at graph compilation time.
This method takes a (collection of) vector(s) (shape: (batch_size, vector_dim)), and tiles that vector a number of times, giving a matrix of shape (batch_size, tile_length, vector_dim). (I say "vector" and "matrix" here because I'm ignoring the batch_size.) We need the matrix as input so we know what the tile_length is - the matrix is otherwise ignored.
This is necessary in a number of places in the code. For instance, if you want to do a dot product of a vector with all of the vectors in a matrix, the most efficient way to do that is to tile the vector first, then do an element-wise product with the matrix, then sum out the last mode. So, we capture this functionality here.
This is not done as a Keras Layer, however; if you want to use this function, you’ll need to do it _inside_ of a Layer somehow, either in a Lambda or in the call() method of a Layer you’re writing.
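The dot-product use case mentioned above looks like this in numpy (illustrative shapes; the library version exists because tile_length may be unknown at graph compilation time):

```python
import numpy as np

batch_size, tile_length, vector_dim = 2, 3, 4
vector = np.random.rand(batch_size, vector_dim)
matrix = np.random.rand(batch_size, tile_length, vector_dim)

# Tile the vector to the matrix's shape, multiply elementwise, then sum
# out the last mode: a batched dot product of the vector with every row.
tiled = np.tile(vector[:, np.newaxis, :], (1, tile_length, 1))
dots = (tiled * matrix).sum(axis=-1)
print(dots.shape)  # (batch_size, tile_length) == (2, 3)
```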
masked_operations¶
- deep_qa.tensors.masked_operations.masked_batch_dot(tensor_a, tensor_b, mask_a, mask_b)[source]¶
The simplest case where this function is applicable is the following:
tensor_a: (batch_size, a_length, embed_dim)
tensor_b: (batch_size, b_length, embed_dim)
mask_a: None or (batch_size, a_length)
mask_b: None or (batch_size, b_length)
Returns: a_dot_b: (batch_size, a_length, b_length), with zeros for masked elements.
This function will also work for larger tensors, as long as abs(K.ndim(tensor_a) - K.ndim(tensor_b)) < 1 (this is due to the limitations of K.batch_dot). We always assume the dimension to perform the dot is the last one, and that the masks have one fewer dimension than the tensors.
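A numpy sketch of the simplest case above (the real function operates on symbolic tensors via K.batch_dot):

```python
import numpy as np

batch_size, a_length, b_length, embed_dim = 1, 2, 3, 4
tensor_a = np.random.rand(batch_size, a_length, embed_dim)
tensor_b = np.random.rand(batch_size, b_length, embed_dim)
mask_a = np.array([[1, 0]])      # second row of tensor_a is padding
mask_b = np.array([[1, 1, 0]])   # third row of tensor_b is padding

# Batched dot product over the last dimension, then zero out masked cells.
a_dot_b = np.einsum('iae,ibe->iab', tensor_a, tensor_b)
a_dot_b = a_dot_b * mask_a[:, :, None] * mask_b[:, None, :]
print(a_dot_b.shape)  # (1, 2, 3), with zeros in masked rows and columns
```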
- deep_qa.tensors.masked_operations.masked_softmax(vector, mask)[source]¶
K.softmax(vector) does not work if some elements of vector should be masked. This performs a softmax on just the non-masked portions of vector (passing None in for the mask is also acceptable; you'll just get a regular softmax).
We assume that both vector and mask (if given) have shape (batch_size, vector_dim).
In the case that the input vector is completely masked, this function returns an array of 0.0. This behavior may cause NaN if this is used as the last layer of a model that uses categorical cross-entropy loss.
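In numpy terms (a sketch of the semantics; unlike this sketch, the library function returns all zeros, rather than NaN, for a fully-masked row):

```python
import numpy as np

def masked_softmax_np(vector, mask):
    # Exponentiate, zero out masked entries, then renormalize.
    exps = np.exp(vector) * (mask if mask is not None else 1.0)
    return exps / exps.sum(axis=-1, keepdims=True)

vector = np.array([[1.0, 2.0, 3.0]])
mask = np.array([[1.0, 1.0, 0.0]])
print(masked_softmax_np(vector, mask))
# [[0.26894142 0.73105858 0.        ]] -- all weight on the unmasked entries
```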
Similarity Functions¶
bilinear¶
- class deep_qa.tensors.similarity_functions.bilinear.Bilinear(**kwargs)[source]¶
Bases: deep_qa.tensors.similarity_functions.similarity_function.SimilarityFunction
This similarity function performs a bilinear transformation of the two input vectors. This function has a matrix of weights W and a bias b, and the similarity between two vectors x and y is computed as x^T W y + b.
- compute_similarity(tensor_1, tensor_2)[source]¶
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
- initialize_weights(tensor_1_dim: int, tensor_2_dim: int) → typing.List[typing.K.variable][source]¶
Called in a Layer.build() method that uses this SimilarityFunction, here we both initialize whatever weights are necessary for this similarity function, and return them so they can be included in Layer.trainable_weights.
Parameters: tensor_1_dim : int
The last dimension (typically embedding_dim) of the first input tensor. We need this so we can initialize weights appropriately.
tensor_2_dim : int
The last dimension (typically embedding_dim) of the second input tensor. We need this so we can initialize weights appropriately.
cosine_similarity¶
- class deep_qa.tensors.similarity_functions.cosine_similarity.CosineSimilarity(**kwargs)[source]¶
Bases: deep_qa.tensors.similarity_functions.similarity_function.SimilarityFunction
This similarity function simply computes the cosine similarity between each pair of vectors. It has no parameters.
- compute_similarity(tensor_1, tensor_2)[source]¶
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
- initialize_weights(tensor_1_dim: int, tensor_2_dim: int) → typing.List[typing.K.variable][source]¶
Called in a Layer.build() method that uses this SimilarityFunction, here we both initialize whatever weights are necessary for this similarity function, and return them so they can be included in Layer.trainable_weights.
Parameters: tensor_1_dim : int
The last dimension (typically embedding_dim) of the first input tensor. We need this so we can initialize weights appropriately.
tensor_2_dim : int
The last dimension (typically embedding_dim) of the second input tensor. We need this so we can initialize weights appropriately.
dot_product¶
- class deep_qa.tensors.similarity_functions.dot_product.DotProduct(**kwargs)[source]¶
Bases: deep_qa.tensors.similarity_functions.similarity_function.SimilarityFunction
This similarity function simply computes the dot product between each pair of vectors. It has no parameters.
- compute_similarity(tensor_1, tensor_2)[source]¶
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
- initialize_weights(tensor_1_dim: int, tensor_2_dim: int) → typing.List[typing.K.variable][source]¶
Called in a Layer.build() method that uses this SimilarityFunction, here we both initialize whatever weights are necessary for this similarity function, and return them so they can be included in Layer.trainable_weights.
Parameters: tensor_1_dim : int
The last dimension (typically embedding_dim) of the first input tensor. We need this so we can initialize weights appropriately.
tensor_2_dim : int
The last dimension (typically embedding_dim) of the second input tensor. We need this so we can initialize weights appropriately.
linear¶
- class deep_qa.tensors.similarity_functions.linear.Linear(combination: str = 'x, y', **kwargs)[source]¶
Bases: deep_qa.tensors.similarity_functions.similarity_function.SimilarityFunction
This similarity function performs a dot product between a vector of weights and some combination of the two input vectors. The combination used is configurable.
If the two vectors are x and y, we allow the following kinds of combinations: x, y, x*y, x+y, x-y, x/y, where each of those binary operations is performed elementwise. You can list as many combinations as you want, comma separated. For example, you might give “x,y,x*y” as the combination parameter to this class. The computed similarity function would then be w^T [x; y; x*y] + b, where w is a vector of weights, b is a bias parameter, and [;] is vector concatenation.
Note that if you want a bilinear similarity function with a diagonal weight matrix W, where the similarity function is computed as x * w * y + b (with w the diagonal of W), you can accomplish that with this class by using “x*y” for combination.
- compute_similarity(tensor_1, tensor_2)[source]¶
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
- initialize_weights(tensor_1_dim: int, tensor_2_dim: int) → typing.List[typing.K.variable][source]¶
Called in a Layer.build() method that uses this SimilarityFunction, here we both initialize whatever weights are necessary for this similarity function, and return them so they can be included in Layer.trainable_weights.
Parameters: tensor_1_dim : int
The last dimension (typically embedding_dim) of the first input tensor. We need this so we can initialize weights appropriately.
tensor_2_dim : int
The last dimension (typically embedding_dim) of the second input tensor. We need this so we can initialize weights appropriately.
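For example, with combination "x,y,x*y", the computed score is w^T [x; y; x*y] + b, which in numpy looks like this (weights and inputs invented for illustration):

```python
import numpy as np

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

combined = np.concatenate([x, y, x * y])  # [x; y; x*y], length 6
w = np.random.rand(6)                     # learned weight vector
b = 0.0                                   # learned bias
similarity = np.dot(w, combined) + b      # a single scalar score
```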
similarity_function¶
Similarity functions take a pair of tensors with the same shape, and compute a similarity function on the vectors in the last dimension. For example, the tensors might both have shape (batch_size, sentence_length, embedding_dim), and we will compute some function of the two vectors of length embedding_dim for each position (batch_size, sentence_length), returning a tensor of shape (batch_size, sentence_length).
The similarity function could be as simple as a dot product, or it could be a more complex, parameterized function. The SimilarityFunction class exposes an API for a Layer that wants to allow for multiple similarity functions, such as for initializing and returning weights.
If you want to compute a similarity between tensors of different sizes, you need to first tile them in the appropriate dimensions to make them the same before you can use these functions. The Attention and MatrixAttention layers do this.
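As a sketch of how this API fits together, a hypothetical parameter-free similarity function (negative L1 distance, not part of the library) would implement the two methods like this:

```python
from keras import backend as K

from deep_qa.tensors.similarity_functions.similarity_function import SimilarityFunction

class NegativeL1Distance(SimilarityFunction):
    """Hypothetical example: similarity as negative L1 distance."""
    def initialize_weights(self, tensor_1_dim: int, tensor_2_dim: int):
        # No parameters, so there is nothing to create or return.
        return []

    def compute_similarity(self, tensor_1, tensor_2):
        # Reduce the last (embedding) dimension, returning one fewer dimension.
        return -K.sum(K.abs(tensor_1 - tensor_2), axis=-1)
```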
- class deep_qa.tensors.similarity_functions.similarity_function.SimilarityFunction(name: str, initialization: str = 'glorot_uniform', activation: str = 'linear')[source]¶
Bases: object
- compute_similarity(tensor_1, tensor_2)[source]¶
Takes two tensors of the same shape, such as (batch_size, length_1, length_2, embedding_dim). Computes a (possibly parameterized) similarity on the final dimension and returns a tensor with one less dimension, such as (batch_size, length_1, length_2).
- initialize_weights(tensor_1_dim: int, tensor_2_dim: int) → typing.List[typing.K.variable][source]¶
Called in a Layer.build() method that uses this SimilarityFunction, here we both initialize whatever weights are necessary for this similarity function, and return them so they can be included in Layer.trainable_weights.
Parameters: tensor_1_dim : int
The last dimension (typically embedding_dim) of the first input tensor. We need this so we can initialize weights appropriately.
tensor_2_dim : int
The last dimension (typically embedding_dim) of the second input tensor. We need this so we can initialize weights appropriately.
Common Utils¶
Here are some general utilities that we’ve written to help in other parts of the code base.
Checks¶
Parameter Utils¶
- class deep_qa.common.params.Params(params: typing.Dict[str, typing.Any], history: str = '')[source]¶
Bases: collections.abc.MutableMapping
Represents a parameter dictionary with a history, and contains other functionality around parameter passing and validation for DeepQA.
There are currently two benefits of a Params object over a plain dictionary for parameter passing:
- We handle a few kinds of parameter validation, including making sure that parameters representing discrete choices actually have acceptable values, and making sure no extra parameters are passed.
- We log all parameter reads, including default values. This gives a more complete specification of the actual parameters used than is given in a JSON / HOCON file, because those may not specify what default values were used, whereas this will log them.
The convention for using a Params object in DeepQA is that you will consume the parameters as you read them, so that there are none left when you've read everything you expect. This lets us easily validate that you didn't pass in any extra parameters, just by making sure that the parameter dictionary is empty. You should do this when you're done handling parameters, by calling Params.assert_empty().
- DEFAULT = <object object>¶
- as_dict(quiet=False)[source]¶
Sometimes we need to just represent the parameters as a dict, for instance when we pass them to a Keras layer (so that they can be serialised).
Parameters: quiet: bool, optional (default = False)
Whether to log the parameters before returning them as a dict.
- assert_empty(class_name: str)[source]¶
Raises a ConfigurationError if self.params is not empty. We take class_name as an argument so that the error message gives some idea of where an error happened, if there was one. class_name should be the name of the calling class, the one that got extra parameters (if there are any).
- get(key: str, default: typing.Any = <object object>)[source]¶
Performs the functionality associated with dict.get(key) but also checks for returned dicts and returns a Params object in their place with an updated history.
- pop(key: str, default: typing.Any = <object object>)[source]¶
Performs the functionality associated with dict.pop(key), along with checking for returned dictionaries, replacing them with Params objects with an updated history.
If key is not present in the dictionary, and no default was specified, we raise a ConfigurationError, instead of the typical KeyError.
- pop_choice(key: str, choices: typing.List[typing.Any], default_to_first_choice: bool = False)[source]¶
Gets the value of key in the params dictionary, ensuring that the value is one of the given choices. Note that this pops the key from params, modifying the dictionary, consistent with how parameters are processed in this codebase.
Parameters: key: str
Key to get the value from in the param dictionary.
choices: List[Any]
A list of valid options for values corresponding to key. For example, if you're specifying the type of encoder to use for some part of your model, the choices might be the list of encoder classes we know about and can instantiate. If the value we find in the param dictionary is not in choices, we raise a ConfigurationError, because the user specified an invalid value in their parameter file.
default_to_first_choice: bool, optional (default=False)
If this is True, we allow the key to not be present in the parameter dictionary. If the key is not present, we return the first choice in the choices list as the value. If this is False, we raise a ConfigurationError, because specifying the key is required (e.g., you have to specify your model class when running an experiment, but you can feel free to use default settings for encoders if you want).
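Putting the convention together, a short sketch of typical consumption (the keys here are invented for illustration):

```python
from deep_qa.common.params import Params

params = Params({"encoder": "bow", "embedding_dim": 100})

encoder_type = params.pop_choice("encoder", ["bow", "lstm", "cnn"])
embedding_dim = params.pop("embedding_dim", 50)  # the read (and any default) is logged

# All keys have been consumed, so this validates that nothing extra was passed.
params.assert_empty("MyModel")
```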
- deep_qa.common.params.pop_choice(params: typing.Dict[str, typing.Any], key: str, choices: typing.List[typing.Any], default_to_first_choice: bool = False, history: str = '?.') → typing.Any[source]¶
Performs the same function as Params.pop_choice(), but is required in order to deal with places where the Params object is not welcome, such as inside Keras layers. See the docstring of that method for more detail on how this function works.
This method adds a history parameter, in the off-chance that you know it, so that we can reproduce Params.pop_choice() exactly. We default to using '?.' if you don't know the history, so you'll have to fix that in the log if you want to actually recover the logged parameters.
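A sketch of using this plain-dictionary variant, e.g. inside a Layer (the key and choices are illustrative):

```python
from deep_qa.common.params import pop_choice

layer_params = {"similarity_function": "dot_product"}
choice = pop_choice(layer_params, "similarity_function",
                    ["dot_product", "bilinear", "linear"],
                    history="my_layer.")
# layer_params is now empty, and the read was logged under "my_layer.".
```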