deeppavlov.models.sklearn
-
class deeppavlov.models.sklearn.sklearn_component.SklearnComponent(model_class: str, save_path: Optional[Union[pathlib.Path, str]] = None, load_path: Optional[Union[pathlib.Path, str]] = None, infer_method: str = 'predict', ensure_list_output: bool = False, **kwargs)
Wrapper class for sklearn components used for feature extraction, feature selection, classification, regression, etc.
- Parameters
model_class – string with the full name of the sklearn model to use, e.g. sklearn.linear_model:LogisticRegression
save_path – path to save the model to: either a full name, e.g. model_path/model.pkl, or a prefix, e.g. model_path/model (the model will still be saved to model_path/model.pkl)
load_path – path to load the model from: either a full name, e.g. model_path/model.pkl, or a prefix, e.g. model_path/model (the model will still be loaded from model_path/model.pkl)
infer_method – string name of the class method to use for inference, e.g. predict, predict_proba, predict_log_proba, transform
ensure_list_output – whether to ensure that the output for each sample is iterable (but not a string)
kwargs – dictionary with parameters for the sklearn model
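The model_class string uses a module:ClassName form. As a minimal sketch of how such a string could be resolved to a class, the helper below uses importlib; resolve_class is a hypothetical name introduced for illustration and is not taken from the DeepPavlov source.

```python
import importlib


def resolve_class(full_name: str):
    """Resolve a 'package.module:ClassName' string to the class object.

    Hypothetical helper for illustration; not DeepPavlov's own code.
    """
    module_name, sep, class_name = full_name.partition(":")
    if not sep or not module_name or not class_name:
        raise ValueError(f"expected 'module:ClassName', got {full_name!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


# A standard-library class keeps the sketch runnable without scikit-learn;
# "sklearn.linear_model:LogisticRegression" would be resolved the same way.
ordered_dict_cls = resolve_class("collections:OrderedDict")
instance = ordered_dict_cls(a=1)  # keyword arguments play the role of **kwargs
```

With scikit-learn installed, passing "sklearn.linear_model:LogisticRegression" to the same helper would return the LogisticRegression class, which could then be instantiated with the model parameters.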
-
model – sklearn model instance
-
model_class – string with the full name of the sklearn model to use, e.g. sklearn.linear_model:LogisticRegression
-
model_params – dictionary with parameters for the sklearn model, excluding pipe parameters
-
pipe_params – dictionary with parameters for the pipe: in, out, fit_on, main, name
-
save_path – path to save the model to: either a full name, e.g. model_path/model.pkl, or a prefix, e.g. model_path/model (the model will still be saved to model_path/model.pkl)
-
load_path – path to load the model from: either a full name, e.g. model_path/model.pkl, or a prefix, e.g. model_path/model (the model will still be loaded from model_path/model.pkl)
-
infer_method – string name of the class method to use for inference, e.g. predict, predict_proba, predict_log_proba, transform
-
ensure_list_output – whether to ensure that the output for each sample is iterable (but not a string)
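The distinction between model_params and pipe_params can be illustrated with a small sketch. It assumes that the keyword arguments arrive as one dictionary mixing pipe keys with sklearn parameters; the config values and the PIPE_KEYS name are made up for illustration.

```python
# Keys that the pipeline itself consumes, as listed for pipe_params.
PIPE_KEYS = ("in", "out", "fit_on", "main", "name")

# Hypothetical keyword arguments; the values are made up for illustration.
component_kwargs = {
    "in": ["x_vec"],
    "out": ["y_pred"],
    "fit_on": ["x_vec", "y"],
    "main": True,
    "name": "classifier",
    "C": 10.0,
    "penalty": "l2",
}

# Pipe parameters describe how the component is wired into the pipeline.
pipe_params = {k: v for k, v in component_kwargs.items() if k in PIPE_KEYS}
# Everything else is passed on to the sklearn model.
model_params = {k: v for k, v in component_kwargs.items() if k not in PIPE_KEYS}
```

In this sketch only C and penalty would reach the sklearn model constructor.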
-
__call__(*args)
Run inference on the given data using the infer method specified in the config, e.g. "predict", "predict_proba", "transform".
- Parameters
*args – list of inputs
- Returns
predictions, e.g. a list of labels, an array of probability distributions, or a sparse array of vectorized samples
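Selecting the inference method by name can be sketched with getattr. The example below is an illustration under stated assumptions, not the component's actual code: ToyModel stands in for a fitted sklearn estimator, and infer is a hypothetical helper that also shows one plausible reading of ensure_list_output.

```python
class ToyModel:
    """Stand-in for a fitted sklearn estimator (hypothetical)."""

    def predict(self, rows):
        return [int(sum(row) > 1.0) for row in rows]

    def predict_proba(self, rows):
        return [[0.25, 0.75] if sum(row) > 1.0 else [0.75, 0.25] for row in rows]


def infer(model, infer_method, rows, ensure_list_output=False):
    """Call the method named by infer_method; optionally wrap scalar outputs."""
    method = getattr(model, infer_method)  # e.g. "predict" or "predict_proba"
    outputs = method(rows)
    if ensure_list_output:
        # Wrap each per-sample output that is not already iterable;
        # strings are treated as non-iterable, so they get wrapped too.
        outputs = [
            out if hasattr(out, "__iter__") and not isinstance(out, str) else [out]
            for out in outputs
        ]
    return outputs


rows = [[0.2, 0.3], [0.9, 0.8]]
labels = infer(ToyModel(), "predict", rows)
wrapped = infer(ToyModel(), "predict", rows, ensure_list_output=True)
probas = infer(ToyModel(), "predict_proba", rows)
```

Here labels holds one scalar per sample, wrapped holds a one-element list per sample, and probas already consists of lists, so wrapping would leave it unchanged.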
-
fit(*args) → None
Fit the model on the given data.
- Parameters
*args – list of x-inputs and, optionally, one y-input (the last one) to fit on. Possible inputs are (x0, …, xK, y) or (x0, …, xK), where K is the number of input data elements (the length of the list in from the config). In case of several inputs (K > 1), the input features will be stacked. For example, given x0: (n_samples, n_features0), …, xK: (n_samples, n_featuresK), the model will be trained on x: (n_samples, n_features0 + … + n_featuresK).
- Returns
None
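The argument layout accepted by fit can be sketched with NumPy. This is a minimal illustration that assumes dense inputs only; split_and_stack and n_inputs are hypothetical names, with n_inputs standing for the length of the in list from the config.

```python
import numpy as np


def split_and_stack(args, n_inputs):
    """Stack the first n_inputs arrays column-wise; treat a trailing extra as y.

    Hypothetical helper: n_inputs stands for the length of the `in` list.
    """
    x_parts = [np.asarray(x) for x in args[:n_inputs]]
    x = np.hstack(x_parts)  # (n_samples, n_features0 + ... + n_featuresK)
    y = np.asarray(args[n_inputs]) if len(args) > n_inputs else None
    return x, y


x0 = np.zeros((4, 3))       # (n_samples, n_features0)
x1 = np.ones((4, 2))        # (n_samples, n_features1)
y = np.array([0, 1, 0, 1])  # one label per sample

x, y_out = split_and_stack((x0, x1, y), n_inputs=2)
x_only, no_y = split_and_stack((x0, x1), n_inputs=2)
```

Both calls produce a feature matrix with 3 + 2 = 5 columns; the second call has no y-input, which corresponds to fitting an unsupervised component such as a vectorizer or transformer.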
-
init_from_scratch() → None
Initialize self.model as an sklearn model from scratch with the parameters given in self.model_params.
- Returns
None
-
load(fname: Optional[str] = None) → None
Initialize self.model as an sklearn model from a saved file, re-initializing the self.model_params parameters. If warm_start is set to True in the newly given parameters and the model accepts a warm_start parameter, the model will be initialized from the saved one with the option to continue fitting.
- Parameters
fname – path of the file to load the model from
- Returns
None
-
save(fname: Optional[str] = None) → None
Save self.model to the file given by fname or, if fname is not given, to self.save_path. If self.save_path does not have a .pkl extension, it will be replaced with str(Path(self.save_path).stem) + ".pkl".
- Parameters
fname – path of the file to save the model to
- Returns
None
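The extension handling described for save can be sketched with pathlib. The example rests on two assumptions that the text above does not confirm: that the model is serialized with pickle (suggested only by the .pkl extension), and that the parent directory of the path is kept. with_pkl_suffix is a hypothetical helper.

```python
import pickle
import tempfile
from pathlib import Path


def with_pkl_suffix(path):
    """Return path with a .pkl extension, keeping its directory (hypothetical)."""
    path = Path(path)
    return path if path.suffix == ".pkl" else path.with_suffix(".pkl")


with tempfile.TemporaryDirectory() as tmp_dir:
    # A prefix without extension, like model_path/model.
    target = with_pkl_suffix(Path(tmp_dir) / "model")
    model = {"coef": [0.5, -1.0]}  # stand-in for a fitted sklearn model
    with target.open("wb") as f:
        pickle.dump(model, f)      # assumed serialization format
    with target.open("rb") as f:
        restored = pickle.load(f)
    saved_name = target.name
```

Because both a full name and a prefix map to the same .pkl file, save_path and load_path can be given in either form.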
-
static compose_input_data(x: List[Union[Tuple[Union[numpy.ndarray, list, scipy.sparse.base.spmatrix, str]], List[Union[numpy.ndarray, list, scipy.sparse.base.spmatrix, str]], numpy.ndarray, scipy.sparse.base.spmatrix]]) → Union[scipy.sparse.base.spmatrix, numpy.ndarray]
Stack the given list of inputs of different types into one matrix. If any of the inputs is a sparse matrix, the output will also be a sparse matrix.
- Parameters
x – list of data elements
- Returns
sparse or dense array of stacked data
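The rule that one sparse input makes the output sparse can be sketched with NumPy and SciPy. This is an illustration, not the method's actual code: stack_inputs is a hypothetical helper and it does not cover the string inputs allowed by the signature above.

```python
import numpy as np
from scipy import sparse


def stack_inputs(parts):
    """Column-stack inputs; return a sparse matrix if any input is sparse."""
    if any(sparse.issparse(p) for p in parts):
        # Convert every part to CSR so dense arrays and lists can be combined
        # with sparse matrices.
        return sparse.hstack([sparse.csr_matrix(p) for p in parts], format="csr")
    return np.hstack([np.asarray(p) for p in parts])


dense = np.array([[1.0, 2.0], [3.0, 4.0]])
sp = sparse.csr_matrix(np.array([[0.0, 5.0], [6.0, 0.0]]))
as_list = [[7.0], [8.0]]

mixed = stack_inputs([dense, sp, as_list])  # sparse result
all_dense = stack_inputs([dense, as_list])  # dense result
```

Keeping the result sparse avoids materializing large vectorizer outputs, such as TF-IDF matrices, as dense arrays.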