scikit_ext package

Submodules

scikit_ext.estimators module

Various scikit-learn estimators and meta-estimators

class scikit_ext.estimators.IterRandomEstimator(estimator, target_score=None, max_iter=10, random_state=None, scoring=<function calinski_harabaz_score>, fit_params=None, verbose=0)

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

Meta-Estimator intended primarily for unsupervised estimators whose fitted model can be heavily dependent on an arbitrary random initialization state. It is best used for problems where a fit_predict method is intended, so the only data used for prediction will be the same data on which the model was fitted.

The fit method will fit multiple iterations of the same base estimator, varying the random_state argument for each iteration. The iterations will stop either when max_iter is reached, or when the target score is obtained.

The model does not use cross validation to find the best estimator. It simply fits and scores on the entire input data set. A hyperparameter is not being optimized here, only random initialization states. The idea is to find and keep the best fitted model itself, rather than to find the best hyperparameter set.
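The core loop can be sketched in plain Python (fit_best and its fit_and_score callback are illustrative names, not part of the scikit_ext API):

```python
def fit_best(fit_and_score, target_score=None, max_iter=10, random_state=0):
    """Refit a randomized estimator, keeping the best-scoring model.

    fit_and_score(random_state) must return a (model, score) pair;
    this mirrors the IterRandomEstimator idea, not its exact API.
    """
    best_model, best_score = None, float("-inf")
    for i in range(max_iter):
        model, score = fit_and_score(random_state=random_state + i)
        if score > best_score:
            best_model, best_score = model, score
        if target_score is not None and best_score >= target_score:
            break  # early stop once the target score is reached
    return best_model, best_score
```

In practice fit_and_score would, for example, fit a KMeans with the given random_state and score the result with calinski_harabaz_score.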

fit(X, y=None, **fit_params)

Run fit on the estimator attribute multiple times with various random_state arguments and choose the fitted estimator with the best score.

Uses calinski_harabaz_score if no scoring is provided.

Parameters

X : array-like, shape = [n_samples, n_features]
Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] or [n_samples, n_output], optional
Target relative to X for classification or regression; None for unsupervised learning.
**fit_params : dict of string -> object
Parameters passed to the fit method of the estimator.

class scikit_ext.estimators.OneVsRestAdjClassifier(estimator, norm=None, **kwargs)

Bases: sklearn.multiclass.OneVsRestClassifier

One-vs-the-rest (OvR) multiclass strategy

Also known as one-vs-all, this strategy consists of fitting one classifier per class. For each classifier, the class is fitted against all the other classes. In addition to its computational efficiency (only n_classes classifiers are needed), one advantage of this approach is its interpretability. Since each class is represented by one and only one classifier, it is possible to gain knowledge about the class by inspecting its corresponding classifier. This is the most commonly used strategy for multiclass classification and is a fair default choice.

The adjusted version is a custom extension which overwrites the inherited predict_proba() method with a more flexible method allowing custom normalization for the predicted probabilities. Any norm argument that can be passed directly to sklearn.preprocessing.normalize is allowed. Additionally, norm=None will skip the normalization step altogether. To mimic the inherited OneVsRestClassifier behavior, set norm='l2'. All other methods are inherited from OneVsRestClassifier.
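The effect of the norm argument can be illustrated with a small re-implementation of the normalization step (normalize_rows is a hypothetical helper; the class itself delegates to sklearn.preprocessing.normalize):

```python
def normalize_rows(probs, norm=None):
    """Normalize each row of class probabilities, per the norm option."""
    if norm is None:
        return [list(row) for row in probs]  # norm=None skips normalization
    out = []
    for row in probs:
        if norm == "l1":
            denom = sum(abs(p) for p in row)
        elif norm == "l2":
            denom = sum(p * p for p in row) ** 0.5
        elif norm == "max":
            denom = max(abs(p) for p in row)
        else:
            raise ValueError("unsupported norm: %r" % norm)
        out.append([p / denom for p in row] if denom else list(row))
    return out
```

With norm='l1' each row sums to 1, which is often the desired reading of "probabilities across classes"; norm='l2' reproduces the inherited OneVsRestClassifier behavior.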

Parameters

estimator : estimator object
An estimator object implementing fit and one of decision_function or predict_proba.
n_jobs : int, optional, default: 1
The number of jobs to use for the computation. If -1 all CPUs are used. If 1 is given, no parallel computing code is used at all, which is useful for debugging. For n_jobs below -1, (n_cpus + 1 + n_jobs) are used. Thus for n_jobs = -2, all CPUs but one are used.
norm : str, optional, default: None
Normalization method to be passed straight into sklearn.preprocessing.normalize as the norm input. A value of None (default) will skip the normalization step.

Attributes

estimators_ : list of n_classes estimators
Estimators used for predictions.
classes_ : array, shape = [n_classes]
Class labels.
label_binarizer_ : LabelBinarizer object
Object used to transform multiclass labels to binary labels and vice-versa.
multilabel_ : boolean
Whether a OneVsRestClassifier is a multilabel classifier.
predict_proba(X)

Probability estimates.

The returned estimates for all classes are ordered by label of classes.

Parameters

X : array-like, shape = [n_samples, n_features]

Returns

T : array-like, shape = [n_samples, n_classes]
Returns the probability of the sample for each class in the model, where classes are ordered as they are in self.classes_.

class scikit_ext.estimators.OptimizedEnsemble(estimator, n_estimators_init=5, threshold=0.01, max_iter=10, step_function=<function <lambda>>, **kwargs)

Bases: sklearn.model_selection._search.BaseSearchCV

An optimized ensemble class. Will find the optimal n_estimators parameter for the given ensemble estimator, according to the specified input parameters.

The fit method will iterate through n_estimators options, starting with n_estimators_init and applying the step_function recursively from there. Iteration stops at max_iter, or when the score gain between iterations is less than threshold.

The OptimizedEnsemble class can then itself be used as an Estimator, or the best_estimator_ attribute can be accessed directly, which is a fitted version of the input estimator with the optimal parameters.
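The search loop amounts to the following sketch (optimize_n_estimators and score_for are illustrative names, not the library API):

```python
def optimize_n_estimators(score_for, n_estimators_init=5, threshold=0.01,
                          max_iter=10, step_function=lambda n: n * 2):
    """Grow n_estimators via step_function until the score gain between
    successive iterations falls below threshold, or max_iter is reached.

    score_for(n) stands in for fitting the ensemble with n estimators
    and scoring the fitted model.
    """
    n = n_estimators_init
    best_n, best_score = n, score_for(n)
    for _ in range(max_iter - 1):
        n = step_function(n)
        score = score_for(n)
        if score - best_score < threshold:
            break  # diminishing returns: keep the previous best
        best_n, best_score = n, score
    return best_n, best_score
```

Here score_for would wrap an estimator such as RandomForestClassifier, setting n_estimators, fitting, and scoring on each call.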

fit(X, y, **fit_params)

Find the optimal n_estimators parameter using a custom optimization routine.

Parameters

X : array-like, shape = [n_samples, n_features]
Training vector, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] or [n_samples, n_output], optional
Target relative to X for classification or regression; None for unsupervised learning.
**fit_params : dict of string -> object
Parameters passed to the fit method of the estimator.
score(*args, **kwargs)

Call score on the estimator with the best found parameters. Only available if the underlying estimator supports score.

This uses the score defined by the best_estimator_.score method.

Parameters

X : array-like, shape = [n_samples, n_features]
Input data, where n_samples is the number of samples and n_features is the number of features.
y : array-like, shape = [n_samples] or [n_samples, n_output], optional
Target relative to X for classification or regression; None for unsupervised learning.

Returns

score : float

scikit_ext.scorers module

Various scikit-learn scorers and scoring functions

scikit_ext.scorers.cluster_distribution_score(X, labels)

Compute the Cluster Distribution score for a clustering, given the input data and the predicted cluster labels.

Parameters

X : array-like, shape (n_samples, n_features)
List of n_features-dimensional data points. Each row corresponds to a single data point.
labels : array-like, shape (n_samples,)
Predicted labels for each sample.

Returns

score : float
The resulting Cluster Distribution score.

Module contents