classification Package

AdvancedSampler Module

class WORC.classification.AdvancedSampler.AdvancedSampler(param_distributions, n_iter, random_state=None, method='Halton')[source]

Bases: object

Generator on parameters sampled from given distributions using numerical sequences. Based on the sklearn ParameterSampler.

Non-deterministic iterable over random candidate combinations for hyper-parameter search. If all parameters are presented as a list, sampling without replacement is performed. If at least one parameter is given as a distribution, sampling with replacement is used. It is highly recommended to use continuous distributions for continuous parameters.

Note that before SciPy 0.16, the scipy.stats.distributions do not accept a custom RNG instance and always use the singleton RNG from numpy.random. Hence setting random_state will not guarantee a deterministic iteration whenever scipy.stats distributions are used to define the parameter search space. Deterministic behavior is however guaranteed from SciPy 0.16 onwards.

Read more in the User Guide.

param_distributions : dict

Dictionary where the keys are parameters and values are distributions from which a parameter is to be sampled. Distributions either have to provide a rvs function to sample from them, or can be given as a list of values, where a uniform distribution is assumed.

n_iter : integer

Number of parameter settings that are produced.

random_state : int or RandomState

Pseudo random number generator state used for random uniform sampling from lists of possible values instead of scipy.stats distributions.

params : dict of string to any

Yields dictionaries mapping each estimator parameter to a sampled value.

>>> from WORC.classification.AdvancedSampler import HaltonSampler
>>> from scipy.stats.distributions import expon
>>> import numpy as np
>>> np.random.seed(0)
>>> param_grid = {'a':[1, 2], 'b': expon()}
>>> param_list = list(HaltonSampler(param_grid, n_iter=4))
>>> rounded_list = [dict((k, round(v, 6)) for (k, v) in d.items())
...                 for d in param_list]
>>> rounded_list == [{'b': 0.89856, 'a': 1},
...                  {'b': 0.923223, 'a': 1},
...                  {'b': 1.878964, 'a': 2},
...                  {'b': 1.038159, 'a': 2}]
True
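
The description above mentions sampling with numerical sequences, and the default method is 'Halton'. The following is a minimal sketch of how a standard Halton sequence can be mapped to parameter settings. It is an illustration only, not the WORC implementation: the helper names radical_inverse, halton_points and to_parameters, and the bounds used for 'b', are hypothetical.

```python
def radical_inverse(index, base):
    """Return the radical inverse of a positive integer in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n_points, bases=(2, 3)):
    """Return the first n_points of a Halton sequence, one coordinate per base."""
    return [tuple(radical_inverse(i, b) for b in bases)
            for i in range(1, n_points + 1)]


def to_parameters(point, choices, low, high):
    """Map a 2-D point in the unit square to a list choice and a continuous value."""
    u_choice, u_value = point
    # Assumption: list-valued parameters are picked by uniform binning.
    index = min(int(u_choice * len(choices)), len(choices) - 1)
    return {'a': choices[index], 'b': low + u_value * (high - low)}


points = halton_points(4)
settings = [to_parameters(p, choices=[1, 2], low=0.0, high=3.0)
            for p in points]
```

Unlike pseudo-random draws, consecutive Halton points fill the unit square evenly, which is why such sequences are often preferred for small numbers of iterations.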
__init__(param_distributions, n_iter, random_state=None, method='Halton')[source]

Initialize self. See help(type(self)) for accurate signature.

__iter__()[source]
__len__()[source]

Number of points that will be sampled.

__module__ = 'WORC.classification.AdvancedSampler'
__weakref__

list of weak references to the object (if defined)

class WORC.classification.AdvancedSampler.discrete_uniform(loc=-1, scale=0)[source]

Bases: object

__init__(loc=-1, scale=0)[source]

Initialize self. See help(type(self)) for accurate signature.

__module__ = 'WORC.classification.AdvancedSampler'
__weakref__

list of weak references to the object (if defined)

rvs(size=None, random_state=None)[source]
class WORC.classification.AdvancedSampler.exp_uniform(loc=-1, scale=0, base=2.718281828459045)[source]

Bases: object

__init__(loc=-1, scale=0, base=2.718281828459045)[source]

Initialize self. See help(type(self)) for accurate signature.

__module__ = 'WORC.classification.AdvancedSampler'
__weakref__

list of weak references to the object (if defined)

rvs(size=None, random_state=None)[source]
class WORC.classification.AdvancedSampler.log_uniform(loc=-1, scale=0, base=10)[source]

Bases: object

__init__(loc=-1, scale=0, base=10)[source]

Initialize self. See help(type(self)) for accurate signature.

__module__ = 'WORC.classification.AdvancedSampler'
__weakref__

list of weak references to the object (if defined)

rvs(size=None, random_state=None)[source]
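
The three distribution classes above carry no docstrings. Since param_distributions accepts any object that provides an rvs function, the sketch below shows what such an object could look like. It assumes that a log-uniform draw is base raised to a value sampled uniformly from [loc, loc + scale], and that random_state is an integer seed; both are assumptions, and LogUniformSketch is a hypothetical stand-in rather than the WORC class.

```python
import random


class LogUniformSketch:
    """Hypothetical distribution: base ** U with U uniform on [loc, loc + scale]."""

    def __init__(self, loc=-1, scale=0, base=10):
        self.loc = loc
        self.scale = scale
        self.base = base

    def rvs(self, size=None, random_state=None):
        # Assumption: random_state is an integer seed or None.
        rng = random.Random(random_state)

        def draw():
            exponent = rng.uniform(self.loc, self.loc + self.scale)
            return self.base ** exponent

        if size is None:
            return draw()
        return [draw() for _ in range(size)]


# Five draws between 10**-3 and 10**-1, reproducible through the seed.
dist = LogUniformSketch(loc=-3, scale=2, base=10)
samples = dist.rvs(size=5, random_state=0)
```

Sampling the exponent rather than the value itself spreads the draws evenly over orders of magnitude, which suits parameters such as a regularization strength.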

RankedSVM Module

WORC.classification.RankedSVM.RankSVM_test(test_data, num_class, Weights, Bias, SVs, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)[source]
WORC.classification.RankedSVM.RankSVM_test_original(test_data, test_target, Weights, Bias, SVs, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)[source]
WORC.classification.RankedSVM.RankSVM_train(train_data, train_target, cost=1, lambda_tol=1e-06, norm_tol=0.0001, max_iter=500, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)[source]
WORC.classification.RankedSVM.RankSVM_train_old(train_data, train_target, cost=1, lambda_tol=1e-06, norm_tol=0.0001, max_iter=500, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)[source]

Weights,Bias,SVs = RankSVM_train(train_data,train_target,cost,lambda_tol,norm_tol,max_iter,svm,gamma,coefficient,degree)

Description

RankSVM_train takes,

train_data - An MxN array, the ith training instance is stored in train_data[i,:]

train_target - A QxM array, if the ith training instance belongs to the jth class, then train_target[j,i] equals +1, otherwise train_target[j,i] equals -1

svm - The type of svm kernel used in training, which can take the value of ‘RBF’, ‘Poly’ or ‘Linear’; gamma, coefficient and degree give the corresponding kernel parameters:
  1. if svm is ‘RBF’, then gamma gives the value of gamma, where the kernel is exp(-gamma*|x[i]-x[j]|^2).

  2. if svm is ‘Poly’, then three values are used, gamma, coefficient, and degree respectively, where the kernel is (gamma*<x[i],x[j]>+coefficient)^degree.

  3. if svm is ‘Linear’, then no kernel parameters are used.

cost - The value of ‘C’ used in the SVM, default=1

lambda_tol - The tolerance value for lambda described in the appendix of [1]; default value is 1e-6

norm_tol - The tolerance value for the difference between alpha(p+1) and alpha(p) described in the appendix of [1]; default value is 1e-4

max_iter - The maximum number of iterations for RankSVM, default=500

and returns,

Weights - The value for beta[ki] as described in the appendix of [1] is stored in Weights[k,i]

Bias - The value for b[i] as described in the appendix of [1] is stored in Bias[1,i]

SVs - The ith support vector is stored in SVs[:,i]

For more details, please refer to [1] and [2].
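
As an illustration of the kernel definitions and the target layout described above, the following sketch evaluates the three kernels for a pair of instances and builds a QxM target array from per-instance label sets. The helper names kernel and encode_targets are hypothetical, and the ‘Linear’ kernel is assumed to be the plain inner product; this is not the code used by RankSVM_train.

```python
import math


def dot(x, y):
    """Inner product <x, y> of two equally long sequences."""
    return sum(a * b for a, b in zip(x, y))


def kernel(x_i, x_j, svm='Poly', gamma=0.05, coefficient=0.05, degree=3):
    """Evaluate the kernel between two instances for the given svm type."""
    if svm == 'RBF':
        # exp(-gamma * |x_i - x_j|^2)
        squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
        return math.exp(-gamma * squared_distance)
    if svm == 'Poly':
        # (gamma * <x_i, x_j> + coefficient) ^ degree
        return (gamma * dot(x_i, x_j) + coefficient) ** degree
    if svm == 'Linear':
        # Assumption: the linear kernel is the plain inner product.
        return dot(x_i, x_j)
    raise ValueError("svm must be 'RBF', 'Poly' or 'Linear'")


def encode_targets(labels, num_class):
    """Build a Q x M nested list: +1 if instance i has class j, otherwise -1."""
    return [[1 if j in instance_labels else -1 for instance_labels in labels]
            for j in range(num_class)]


# Two instances: the first has class 0, the second has classes 1 and 2.
train_target = encode_targets([{0}, {1, 2}], num_class=3)
poly_value = kernel([1, 2], [3, 4], svm='Poly')
```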

WORC.classification.RankedSVM.is_empty(any_structure)[source]
WORC.classification.RankedSVM.neg_dual_func(Lambda, Alpha_old, Alpha_new, c_value, kernel, num_training, num_class, Label, not_Label, Label_size, size_alpha)[source]

SearchCV Module

construct_classifier Module

WORC.classification.construct_classifier.construct_SVM(config, regression=False)[source]

Constructs an SVM classifier

Args:

config (dict): Dictionary of the required config settings

features (pandas dataframe): A pandas dataframe containing the features to be used for classification

Returns:

SVM/SVR classifier, parameter grid

WORC.classification.construct_classifier.construct_classifier(config)[source]

Interface to create a classifier

Different classifiers can be created using this common interface

config: dict, mandatory

Contains the required config settings. See the Github Wiki for all available fields.

Returns:

Constructed classifier

WORC.classification.construct_classifier.create_param_grid(config)[source]

Create a parameter grid for the WORC classifiers based on the provided configuration.

crossval Module

estimators Module

class WORC.classification.estimators.RankedSVM(cost=1, lambda_tol=1e-06, norm_tol=0.0001, max_iter=500, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)[source]

Bases: sklearn.base.BaseEstimator, sklearn.base.ClassifierMixin

An example classifier which implements a 1-NN algorithm.

demo_param : str, optional

A parameter used for demonstration of how to pass and store parameters.

X_ : array, shape = [n_samples, n_features]

The input passed during fit()

y_ : array, shape = [n_samples]

The labels passed during fit()

__init__(cost=1, lambda_tol=1e-06, norm_tol=0.0001, max_iter=500, svm='Poly', gamma=0.05, coefficient=0.05, degree=3)[source]

Initialize self. See help(type(self)) for accurate signature.

__module__ = 'WORC.classification.estimators'
fit(X, y)[source]

A reference implementation of a fitting function for a classifier.

X : array-like, shape = [n_samples, n_features]

The training input samples.

y : array-like, shape = [n_samples]

The target values. An array of int.

self : object

Returns self.

predict(X, y=None)[source]

A reference implementation of a prediction for a classifier.

X : array-like of shape = [n_samples, n_features]

The input samples.

y : array of int of shape = [n_samples]

The label for each sample is the label of the closest sample seen during fit.

predict_proba(X, y)[source]

A reference implementation of a prediction for a classifier.

X : array-like of shape = [n_samples, n_features]

The input samples.

y : array of int of shape = [n_samples]

The label for each sample is the label of the closest sample seen during fit.

fitandscore Module

metrics Module

WORC.classification.metrics.ICC(M, ICCtype='inter')[source]
Input:

M: matrix of observations. Rows: patients, columns: observers.

ICCtype: ICC type, currently “inter” or “intra”.

WORC.classification.metrics.ICC_anova(Y, ICCtype='inter', more=False)[source]

Adapted from Nipype with a slight alteration to distinguish inter and intra. The data Y are entered as a ‘table’, i.e. subjects are in rows and repeated measures in columns. One-sample repeated-measures ANOVA: Y = XB + E with X = [FaTor / Subjects]
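
To make the ANOVA-based computation more concrete, the sketch below derives one common ICC variant, the consistency form usually written ICC(3,1), from a subjects-by-measures table. Whether this corresponds to the “inter” or the “intra” type in WORC is not stated here, so treat the choice of formula as an assumption; icc_consistency is a hypothetical helper name.

```python
def icc_consistency(table):
    """ICC(3,1) from a table with subjects in rows and measures in columns."""
    n = len(table)        # number of subjects
    k = len(table[0])     # number of repeated measures (observers)
    grand_mean = sum(sum(row) for row in table) / (n * k)
    row_means = [sum(row) / k for row in table]
    col_means = [sum(row[j] for row in table) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_total = sum((x - grand_mean) ** 2 for row in table for x in row)
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    # Mean squares for subjects and for the residual.
    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# The second observer always scores one point higher than the first,
# so the ranking of subjects is fully consistent between observers.
icc_value = icc_consistency([[1, 2], [2, 3], [3, 4]])
```

A constant offset between observers does not lower this consistency form; an absolute-agreement variant would penalize it.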

WORC.classification.metrics.check_scoring(estimator, scoring=None, allow_none=False)[source]

Surrogate for sklearn’s check_scoring to enable use of some other scoring metrics.

WORC.classification.metrics.multi_class_auc(y_truth, y_score)[source]
WORC.classification.metrics.multi_class_auc_score(y_truth, y_score)[source]
WORC.classification.metrics.pairwise_auc(y_truth, y_score, class_i, class_j)[source]
WORC.classification.metrics.performance_multilabel(y_truth, y_prediction, y_score=None, beta=1)[source]

Multiclass performance metrics.

y_truth and y_prediction should both be lists with the multiclass label of each object, e.g.

y_truth = [0, 0, 0, 0, 0, 0, 2, 2, 1, 1, 2] ### Ground truth

y_prediction = [0, 0, 0, 0, 0, 0, 1, 2, 1, 2, 2] ### Predicted labels

Calculation of accuracy according to the formula suggested in the CAD Dementia Grand Challenge: http://caddementia.grand-challenge.org

Calculation of Multi Class AUC according to classpy: https://bitbucket.org/bigr_erasmusmc/classpy/src/master/classpy/multi_class_auc.py
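
As a worked example on the label lists given above, the sketch below builds a confusion matrix and derives the overall accuracy as the number of correctly classified objects divided by the total. It assumes that this is the accuracy definition meant here; confusion_matrix and accuracy are hypothetical helpers, not the WORC functions, and the AUC computation is not covered.

```python
def confusion_matrix(y_truth, y_prediction, num_class):
    """Count objects per (true class, predicted class) pair."""
    matrix = [[0] * num_class for _ in range(num_class)]
    for truth, prediction in zip(y_truth, y_prediction):
        matrix[truth][prediction] += 1
    return matrix


def accuracy(matrix):
    """Fraction of objects on the diagonal of the confusion matrix."""
    correct = sum(matrix[c][c] for c in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


y_truth = [0, 0, 0, 0, 0, 0, 2, 2, 1, 1, 2]
y_prediction = [0, 0, 0, 0, 0, 0, 1, 2, 1, 2, 2]

matrix = confusion_matrix(y_truth, y_prediction, num_class=3)
overall_accuracy = accuracy(matrix)  # 9 of the 11 objects are correct
```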

WORC.classification.metrics.performance_singlelabel(y_truth, y_prediction, y_score, regression=False)[source]

Singleclass performance metrics

parameter_optimization Module

trainclassifier Module