deeppavlov.models.multitask_bert¶
-
class
deeppavlov.dataset_readers.multitask_reader.MultiTaskReader[source]¶ Class to read several datasets simultaneously
-
class
deeppavlov.dataset_iterators.multitask_iterator.MultiTaskIterator(data: dict, tasks: dict)[source]¶ Merges data from several dataset iterators. During batch generation, batches from the merged dataset iterators are united into one batch. If the merged datasets differ in size, the smaller datasets are repeated until their size equals that of the largest dataset.
- Parameters
data – dictionary whose keys are task names and whose values are dictionaries with fields "train", "valid", "test".
tasks – dictionary whose keys are task names and whose values are init params of dataset iterators.
-
data¶ dictionary of data with fields “train”, “valid” and “test” (or some of them)
-
gen_batches(batch_size: int, data_type: str = 'train', shuffle: Optional[bool] = None) → Iterator[Tuple[tuple, tuple]][source]¶ Generate batches and expected output to train neural networks. Batches from task iterators are united into one batch. Every element of the largest dataset is used once whereas smaller datasets are repeated until their size is equal to the largest dataset.
- Parameters
batch_size – number of samples in batch
data_type – can be either ‘train’, ‘test’, or ‘valid’
shuffle – whether to shuffle dataset before batching
- Yields
a tuple of a batch of inputs and a batch of expected outputs. Inputs and outputs are tuples. Each element of inputs or outputs is a tuple whose elements are the values of the merged tasks, in the order in which the tasks appear in the tasks argument of the __init__ method.
-
get_instances(data_type: str = 'train')[source]¶ Returns a tuple of inputs and outputs from all datasets. Lengths of inputs and outputs are equal to the size of the largest dataset. Smaller datasets are repeated until their sizes are equal to the size of the largest dataset.
- Parameters
data_type – can be either ‘train’, ‘test’, or ‘valid’
- Returns
a tuple of all inputs for a data type and all expected outputs for a data type
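The repetition rule described for gen_batches and get_instances can be sketched as follows. This is a minimal illustration, not the library's implementation: the helper name repeat_to_largest is hypothetical, and it assumes that smaller datasets are repeated cyclically in their original order.

```python
from itertools import cycle, islice
from typing import Dict, List, Sequence


def repeat_to_largest(datasets: Dict[str, Sequence]) -> Dict[str, List]:
    """Repeat every dataset cyclically until it is as long as the largest one.

    Hypothetical helper: cyclic repetition in the original order is an
    assumption made for this sketch.
    """
    max_len = max(len(samples) for samples in datasets.values())
    return {
        name: list(islice(cycle(samples), max_len))
        for name, samples in datasets.items()
    }


# The largest dataset is used once; the smaller one wraps around.
repeated = repeat_to_largest({"ner": [1, 2, 3, 4, 5], "sentiment": ["a", "b"]})
print(repeated["sentiment"])
```

After this step every task contributes the same number of samples, so samples with the same index can be grouped into one merged batch.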
-
class
deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert(*args, **kwargs)[source]¶ The component for multi-task BERT. It builds the BERT body, launches building of BERT heads.
The component aggregates components implementing BERT heads. The head components are called tasks.
The __call__ and train_on_batch methods of MultiTaskBert are used for inference and training of BERT heads. BERT head components, which are derived from MTBertTask, can be used only inside this class.
One training iteration consists of one train_on_batch call for every task.
If inference_task_names is None, the component is created for training. Otherwise, the component is created for inference. If the component is created for inference, several tasks can be run simultaneously. For an explanation, see the description of the inference_task_names parameter.
- Parameters
tasks – a dictionary. Task names are dictionary keys and objects of MTBertTask subclasses are dictionary values. Task names are used as variable scopes in the computational graph, so it is important to use the same names in multi-task BERT train and inference configuration files.
bert_config_file – path to BERT configuration file
pretrained_bert – pre-trained BERT checkpoint
attention_probs_keep_prob – keep_prob for BERT self-attention layers
hidden_keep_prob – keep_prob for BERT hidden layers
body_learning_rate – learning rate of BERT body
min_body_learning_rate – min value of body learning rate if learning rate decay is used
learning_rate_drop_patience – how many validations with no improvements to wait
learning_rate_drop_div – the divider of the learning rate after learning_rate_drop_patience unsuccessful validations
load_before_drop – whether to load best model before dropping learning rate or not
clip_norm – clip gradients by norm
freeze_embeddings – set to False to train input embeddings
inference_task_names –
names of tasks on which inference is done. If this parameter is provided, the component is created for inference; otherwise the component is created for training.
If inference_task_names is a string, then it is the name of a task called separately from other tasks (in an individual tf.Session.run call).
If inference_task_names is a list, then elements of this list are either strings or lists of strings. You can combine these options. For example, ["task_name1", ["task_name2", "task_name3"], ["task_name4", "task_name5"]].
If an element of the inference_task_names list is a string, the element is the name of a task that is computed when the __call__ method is called.
If an element of the inference_task_names parameter is a list of strings ["task_name1", "task_name2", ...], then tasks "task_name1", "task_name2" and so on are run simultaneously in a tf.Session.run call. This option is available if tasks "task_name1", "task_name2" and so on have common inputs. Even though the tasks share inputs, if positional arguments are used in the __call__ and train_on_batch methods, all arguments are passed individually. For instance, if "task_name1", "task_name2", and "task_name3" all take an argument named x in the model pipe, then the __call__ method takes arguments (x, x, x).
in_distribution –
The distribution of variables listed in the "in" config parameter between tasks.
in_distribution can be None if only 1 task is called. In that case all variables listed in "in" are arguments of that task.
in_distribution can be a dictionary of int. In that case, keys of in_distribution are task names and values are the numbers of variables from the "in" parameter of the config that are inputs of the corresponding task. The variables in the "in" parameter have to be in the same order as the tasks are listed in in_distribution.
in_distribution can be a dictionary of lists of str. Strings are names of variables from the "in" configuration parameter. If the "in" parameter is a list, then in_distribution works the same way as when in_distribution is a dictionary of int: values of in_distribution, which are lists, are replaced by their lengths. If the "in" parameter in the component config is a dictionary, then the order of strings in in_distribution values has to match the order of arguments of the train_on_batch and get_sess_run_infer_args methods of task components.
in_y_distribution – the same as in_distribution for the "in_y" config parameter.
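To make the in_distribution rule concrete, the sketch below splits a flat list of "in" variables between tasks. It is an illustration only: split_inputs_by_task is a hypothetical helper, not a DeepPavlov function, and it covers just the case where "in" is a list, so list values are replaced by their lengths as described above.

```python
from typing import Any, Dict, List, Sequence, Union


def split_inputs_by_task(
    values: Sequence[Any],
    in_distribution: Dict[str, Union[int, List[str]]],
) -> Dict[str, List[Any]]:
    """Give each task its consecutive slice of the flat "in" variables.

    Hypothetical helper. Relies on dict insertion order (Python 3.7+),
    which stands in for the order in which tasks are listed.
    """
    per_task: Dict[str, List[Any]] = {}
    start = 0
    for task_name, spec in in_distribution.items():
        # A list of variable names is treated like its length.
        count = spec if isinstance(spec, int) else len(spec)
        per_task[task_name] = list(values[start:start + count])
        start += count
    if start != len(values):
        raise ValueError("in_distribution does not cover all input variables")
    return per_task


flat_inputs = ["input_ids", "input_masks", "y_masks", "features"]
print(split_inputs_by_task(flat_inputs, {"ner": 3, "sentiment": 1}))
```

The same call with {"ner": ["input_ids", "input_masks", "y_masks"], "sentiment": ["features"]} gives the same split, because only the lengths of the lists matter when "in" is a list.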
-
train_on_batch(*args, **kwargs) → Dict[str, Dict[str, float]][source]¶ Calls
train_on_batch methods for every task. This method takes args or kwargs but not both. The order of args is the same as the order of tasks in the component parameters:
args = [
    task1_in_x[0], task1_in_x[1], task1_in_x[2], ...
    task1_in_y[0], task1_in_y[1], ...
    task2_in_x[0], ...
]
If kwargs are used and the in_distribution and in_y_distribution attributes are dictionaries of lists of strings, then keys of kwargs have to be the same as the strings in in_distribution and in_y_distribution. If in_distribution and in_y_distribution are dictionaries of int, then kwargs values are treated the same way as args.
- Parameters
args – task inputs and expected outputs
kwargs – task inputs and expected outputs
- Returns
dictionary of dictionaries with task losses and learning rates.
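The positional layout expected by train_on_batch (for each task, its inputs followed by its expected outputs) can be sketched like this. The helper flatten_train_args and the placeholder batch names are hypothetical and serve only to show the ordering.

```python
from typing import Any, Dict, List, Sequence, Tuple


def flatten_train_args(
    task_batches: Dict[str, Tuple[Sequence[Any], Sequence[Any]]],
) -> List[Any]:
    """Build the flat positional argument list, task by task.

    Hypothetical helper: for every task (in dict insertion order) the
    inputs come first, followed by that task's expected outputs.
    """
    flat: List[Any] = []
    for _task_name, (in_x, in_y) in task_batches.items():
        flat.extend(in_x)
        flat.extend(in_y)
    return flat


args = flatten_train_args({
    "task1": (["task1_x0", "task1_x1"], ["task1_y0"]),
    "task2": (["task2_x0"], ["task2_y0"]),
})
print(args)
```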
-
__call__(*args, **kwargs)[source]¶ Calls one or several BERT heads depending on provided task names.
args and kwargs contain inputs of BERT tasks. args and kwargs cannot be used together. If args are used, the content of args has to be
args = [
    task1_in_x[0], task1_in_x[1], ...
    task2_in_x[0], task2_in_x[1], ...
]
If kwargs are used and in_distribution is a dictionary of int, then the order of kwargs has to be the same as the args order described in the previous paragraph. If in_distribution is a dictionary of lists of str, then all task names from in_distribution have to be present in kwargs keys.
- Returns
list of results of called tasks.
-
call(args: Tuple[Any], kwargs: Dict[str, Any], task_names: Optional[Union[List[str], str]], in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None)[source]¶ Calls one or several BERT heads depending on provided task names in
task_names parameter. args and kwargs contain inputs of BERT tasks. args and kwargs cannot be used simultaneously. If args are used, the content of args has to be
args = [
    task1_in_x[0], task1_in_x[1], ...
    task2_in_x[0], task2_in_x[1], ...
]
If kwargs is used, kwargs keys have to match the content of the in_names params of the called tasks.
- Parameters
args – generally, the args parameter of the __call__ method of this component or MTBertReUser. Inputs of one or several tasks. Has to be empty if the kwargs argument is used.
kwargs – generally, the kwargs parameter of the __call__ method of this component or MTBertReUser. Inputs of one or several tasks. Has to be empty if the args argument is used.
task_names – names of tasks that are called. If str, then 1 task is called. If a task name is an element of the task_names list, then this task is run independently. If an element of task_names is a list of strings, then the tasks in the inner list are run simultaneously.
in_distribution – a distribution of variables from "in" config parameters between tasks. For details see the __init__ method docstring.
- Returns
list of results of called tasks.
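The three accepted shapes of task_names (a string, a list of strings, or a list mixing strings and lists of strings) can be reduced to one uniform form, a list of groups where each group is run together. The function below is a hypothetical sketch of that reading of the text, not code from the library.

```python
from typing import List, Union

TaskNames = Union[str, List[Union[str, List[str]]]]


def normalize_task_names(task_names: TaskNames) -> List[List[str]]:
    """Turn a task_names value into a list of groups of task names.

    Hypothetical helper: each inner list stands for tasks that are run
    simultaneously; a single-element group is a task run on its own.
    """
    if isinstance(task_names, str):
        return [[task_names]]
    groups: List[List[str]] = []
    for element in task_names:
        if isinstance(element, str):
            groups.append([element])
        else:
            groups.append(list(element))
    return groups


spec = ["task_name1", ["task_name2", "task_name3"], ["task_name4", "task_name5"]]
print(normalize_task_names(spec))
```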
-
class
deeppavlov.models.multitask_bert.multitask_bert.MTBertTask(keep_prob: float = 1.0, return_probas: Optional[bool] = None, learning_rate: float = 0.001)[source]¶ Abstract class for multitask BERT tasks. Objects of its subclasses are linked with BERT body when
MultiTaskBert.build method is called. Training is performed when the MultiTaskBert.train_on_batch method is called. The objects of classes derived from MTBertTask don’t have a __call__ method. Instead they have get_sess_run_infer_args and post_process_preds methods, which are called from the call method of the MultiTaskBert class. The get_sess_run_infer_args method returns fetches and feed_dict for inference, and the post_process_preds method retrieves predictions from computed fetches. Classes derived from MTBertTask must implement the get_sess_run_train_args method, which returns fetches and feed_dict for training.
- Parameters
keep_prob – dropout keep_prob for non-BERT layers
return_probas – set this to True if you need the probabilities instead of raw answers
learning_rate – learning rate of BERT head
-
build(bert_body: bert_dp.modeling.BertModel, optimizer_params: Dict[str, Union[str, float]], shared_placeholders: Dict[str, tensorflow.placeholder], sess: tensorflow.Session, mode: str, get_train_op_func: Callable, freeze_embeddings: bool, bert_head_variable_scope: str) → None[source]¶ Initiates building of the BERT head and initializes optimizer parameters, placeholders that are common for all tasks.
- Parameters
bert_body – instance of BertModel.
optimizer_params – a dictionary with four fields: 'optimizer' (str) – a name of optimizer class, 'body_learning_rate' (float) – initial value of BERT body learning rate, 'min_body_learning_rate' (float) – min BERT body learning rate for learning rate decay, 'weight_decay_rate' (float) – L2 weight decay for AdamWeightDecayOptimizer
shared_placeholders – a dictionary with placeholders used in all tasks. The dictionary contains fields 'input_ids', 'input_masks', 'learning_rate', 'keep_prob', 'is_train', 'token_types'.
sess – current tf.Session instance
mode – 'train' or 'inference'
get_train_op_func – a function returning tf.Operation, with a signature similar to LRScheduledTFModel.get_train_op without the self argument. It returns the train operation for the specified loss and variable scopes.
freeze_embeddings – set to False to train input embeddings.
bert_head_variable_scope – variable scope for BERT head.
-
abstract
_init_graph() → None[source]¶ Build the BERT head, initialize task-specific placeholders, and create attributes containing output probabilities and model loss. The optimizer is initialized not in this method but in _init_optimizer.
-
get_train_op(loss: tensorflow.Tensor, body_learning_rate: Union[tensorflow.Tensor, float], **kwargs) → tensorflow.Operation[source]¶ Return operation for the task training. Head learning rate is calculated as a product of
body_learning_rate and the quotient of the initial head learning rate and the initial body learning rate.
- Parameters
loss – the task loss
body_learning_rate – the learning rate for the BERT body
- Returns
train operation for the task
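The head learning rate rule stated for get_train_op is plain arithmetic and can be written out as below. The function and argument names are hypothetical; only the formula comes from the description.

```python
def head_learning_rate(
    body_learning_rate: float,
    initial_head_learning_rate: float,
    initial_body_learning_rate: float,
) -> float:
    """Scale the current body learning rate by the initial head/body ratio.

    Hypothetical helper that restates the documented formula:
    head_lr = body_lr * (initial_head_lr / initial_body_lr).
    """
    ratio = initial_head_learning_rate / initial_body_learning_rate
    return body_learning_rate * ratio


# If the body rate has decayed from 2e-5 to 1e-5, a head that started
# at 1e-3 is scaled by the same factor of one half.
print(head_learning_rate(1e-5, 1e-3, 2e-5))
```

A consequence of this rule is that the head and body rates keep a constant ratio throughout learning rate decay.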
-
train_on_batch(*args, **kwargs) → Dict[str, float][source]¶ Trains the task on one batch. This method will work correctly if you override
get_sess_run_train_args for your task.
- Parameters
kwargs – the keys are body_learning_rate and the "in" and "in_y" params for the task.
- Returns
dictionary with calculated task loss and body and head learning rates.
-
abstract
get_sess_run_infer_args(*args) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]¶ Returns fetches and feed_dict for inference. Fetches are lists of tensors and feed_dict is dictionary with placeholder values required for fetches computation. The method is used inside
MultiTaskBert __call__ method.
If self.return_probas is True, fetches contains the probabilities tensor; otherwise it contains the predictions tensor.
Overriding methods take task inputs as positional arguments.
ATTENTION! Let the get_sess_run_infer_args method have n_x_args arguments. Then the order of the first n_x_args arguments of the get_sess_run_train_args method has to match the order of the get_sess_run_infer_args arguments.
- Parameters
args – task inputs.
- Returns
fetches and feed_dict
-
abstract
get_sess_run_train_args(*args) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]¶ Returns fetches and feed_dict for task
train_on_batch method.
Overriding methods take task inputs as positional arguments.
ATTENTION! Let the get_sess_run_infer_args method have n_x_args arguments. Then the order of the first n_x_args arguments of the get_sess_run_train_args method has to match the order of the get_sess_run_infer_args arguments.
- Parameters
args – task inputs followed by expected outputs.
- Returns
fetches and feed_dict
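The ordering constraint from the ATTENTION note can be checked mechanically. The sketch below uses plain functions in place of task methods and a hypothetical checker, train_args_extend_infer_args; the parameter names are borrowed from the MTBertSequenceTaggingTask signatures on this page.

```python
import inspect
from typing import Callable


def train_args_extend_infer_args(infer_fn: Callable, train_fn: Callable) -> bool:
    """Check that train_fn starts with the same parameters as infer_fn.

    Hypothetical checker: compares parameter names only, in order.
    """
    infer_params = list(inspect.signature(infer_fn).parameters)
    train_params = list(inspect.signature(train_fn).parameters)
    return train_params[:len(infer_params)] == infer_params


# Stand-ins for the two methods of a task (self omitted for brevity).
def infer_args(input_ids, input_masks, y_masks): ...
def train_args(input_ids, input_masks, y_masks, y, body_learning_rate): ...
def bad_train_args(input_masks, input_ids, y_masks, y, body_learning_rate): ...


print(train_args_extend_infer_args(infer_args, train_args))
print(train_args_extend_infer_args(infer_args, bad_train_args))
```

The first check passes because the training arguments begin with the inference arguments in the same order; the second fails because the first two are swapped.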
-
class
deeppavlov.models.multitask_bert.multitask_bert.MTBertSequenceTaggingTask(n_tags: Optional[int] = None, use_crf: Optional[bool] = None, use_birnn: bool = False, birnn_cell_type: str = 'lstm', birnn_hidden_size: int = 128, keep_prob: float = 1.0, encoder_dropout: float = 0.0, return_probas: Optional[bool] = None, encoder_layer_ids: Optional[List[int]] = None, learning_rate: float = 0.001)[source]¶ BERT head for text tagging. It predicts a label for every token (not subtoken) in the text. You can use it for sequence labelling tasks, such as morphological tagging or named entity recognition. Objects of this class should be passed to the constructor of
MultiTaskBert class in param tasks.
- Parameters
n_tags – number of distinct tags
use_crf – whether to use CRF on top or not
use_birnn – whether to use a bidirectional RNN after BERT layers. For NER and morphological tagging we usually set it to False, as otherwise the model overfits
birnn_cell_type – the type of bidirectional RNN. Either "lstm" or "gru"
birnn_hidden_size – number of hidden units in the BiRNN layer in each direction
keep_prob – dropout keep_prob for non-BERT layers
encoder_dropout – dropout probability of encoder output layer
return_probas – set this to True if you need the probabilities instead of raw answers
encoder_layer_ids – list of averaged layers from BERT encoder (layer ids)
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
weight_decay_rate – L2 weight decay for AdamWeightDecayOptimizer
learning_rate – learning rate of BERT head
-
get_sess_run_infer_args(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray]) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]¶ Returns fetches and feed_dict for model inference. The method is called from
MultiTaskBert.__call__.
- Parameters
input_ids – indices of the subwords in vocabulary
input_masks – mask that determines where to attend and where not to
y_masks – mask which determines the first subword units in the word
- Returns
list of fetches and feed_dict
-
get_sess_run_train_args(input_ids: Union[List[List[int]], numpy.ndarray], input_masks: Union[List[List[int]], numpy.ndarray], y_masks: Union[List[List[int]], numpy.ndarray], y: Union[List[List[int]], numpy.ndarray], body_learning_rate: float) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]¶ Returns fetches and feed_dict for model
train_on_batch method.
- Parameters
input_ids – indices of the subwords in vocabulary
input_masks – mask that determines where to attend and where not to
y_masks – mask which determines the first subword units in the word
y – indices of ground truth tags
body_learning_rate – learning rate for BERT body
- Returns
list of fetches and feed_dict
-
post_process_preds(sess_run_res: List[numpy.ndarray]) → Union[List[List[int]], List[numpy.ndarray]][source]¶ Decodes CRF if needed and returns predictions or probabilities.
- Parameters
sess_run_res – list of computed fetches gathered by
get_sess_run_infer_args
- Returns
predictions or probabilities depending on the return_probas attribute
-
class
deeppavlov.models.multitask_bert.multitask_bert.MTBertClassificationTask(n_classes: Optional[int] = None, return_probas: Optional[bool] = None, one_hot_labels: Optional[bool] = None, keep_prob: float = 1.0, multilabel: bool = False, learning_rate: float = 2e-05, optimizer: str = 'Adam')[source]¶ Task for text classification.
It uses output from [CLS] token and predicts labels using linear transformation.
- Parameters
n_classes – number of classes
return_probas – set to True if class probabilities are needed instead of the most probable label
one_hot_labels – set to True if one-hot encoding for labels is used
keep_prob – dropout keep_prob for non-BERT layers
multilabel – set to True if it is multi-label classification
learning_rate – learning rate of BERT head
optimizer – name of tf.train.* optimizer or None for AdamWeightDecayOptimizer
-
get_sess_run_infer_args(features: List[bert_dp.preprocessing.InputFeatures]) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]¶ Returns fetches and feed_dict for model inference. The method is called from
MultiTaskBert.__call__.
- Parameters
features – text features created by BERT preprocessor.
- Returns
list of fetches and feed_dict
-
get_sess_run_train_args(features: List[bert_dp.preprocessing.InputFeatures], y: Union[List[int], List[List[int]]], body_learning_rate: float) → Tuple[List[tensorflow.Tensor], Dict[tensorflow.placeholder, Any]][source]¶ Returns fetches and feed_dict for model
train_on_batch method.
- Parameters
features – text features created by BERT preprocessor.
y – batch of labels (class id or one-hot encoding)
body_learning_rate – learning rate for BERT body
- Returns
list of fetches and feed_dict
-
class
deeppavlov.models.multitask_bert.multitask_bert.MTBertReUser(mt_bert: deeppavlov.models.multitask_bert.multitask_bert.MultiTaskBert, task_names: Union[str, List[Union[List[str], str]]], in_distribution: Optional[Union[Dict[str, int], Dict[str, List[str]]]] = None, *args, **kwargs)[source]¶ Instances of this class are for multi-task BERT inference. In inference config
MultiTaskBert class may not perform inference of some tasks. For example, you may need to sequentially apply two models with BERT. In that case, mt_bert_reuser is created to call the remaining tasks.
- Parameters
mt_bert – an instance of MultiTaskBert
task_names – names of inferred tasks. If task_names is a str, then task_names is the name of the only inferred task. If task_names is a list, then its elements can be either strings or lists of strings. If an element of task_names is a string, then this element is the name of a task that is run independently. If an element of task_names is a list of strings, then the element is a list of names of tasks that have common inputs and run simultaneously. For detailed information, see the MultiTaskBert inference_task_names parameter.
-
__call__(*args, **kwargs) → List[Any][source]¶ Infer tasks listed in parameter
task_names. One of the parameters args and kwargs has to be empty.
- Parameters
args – inputs and labels of inferred tasks.
kwargs – inputs and labels of inferred tasks.
- Returns
list of results of inference of tasks listed in task_names
-
class
deeppavlov.models.multitask_bert.multitask_bert.InputSplitter(keys_to_extract: Union[List[str], Tuple[str, …]], **kwargs)[source]¶ An instance of this class in a pipe splits a batch of sequences of identical length, or of dictionaries with identical keys, into a tuple of batches.
- Parameters
keys_to_extract – a sequence of ints or strings that have to match keys of split dictionaries.
-
__call__(inp: Union[List[dict], List[List[int]], List[Tuple[int]]]) → List[list][source]¶ Returns batches of values from
inp. Every batch contains values that have the same key from the keys_to_extract attribute. The order of elements of keys_to_extract is preserved.
- Parameters
inp – A sequence of dictionaries with identical keys
- Returns
A list of lists of values of dictionaries from inp
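The splitting behaviour described for InputSplitter.__call__ can be approximated in a few lines. This is a sketch based only on the description above, not the component's source; split_batch is a hypothetical name.

```python
from typing import Any, List, Sequence, Union


def split_batch(
    inp: Sequence[Union[dict, Sequence[Any]]],
    keys_to_extract: Sequence[Union[str, int]],
) -> List[list]:
    """Return one batch of values per key, in the order of keys_to_extract.

    Hypothetical helper: works for dictionaries (string keys) and for
    sequences of identical length (integer positions used as keys).
    """
    return [[item[key] for item in inp] for key in keys_to_extract]


batch = [{"text": "a", "label": 0}, {"text": "b", "label": 1}]
print(split_batch(batch, ["label", "text"]))
```

With a batch of equal-length sequences the keys are positions, for example split_batch([[10, 20], [30, 40]], [0, 1]).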