featureprocessing Package

Imputer Module

class WORC.featureprocessing.Imputer.Imputer(missing_values='nan', strategy='mean', n_neighbors=5)[source]

Bases: object

Module for feature imputation.

__init__(missing_values='nan', strategy='mean', n_neighbors=5)[source]

Imputation of feature values using either sklearn, missingpy or (WIP) fancyimpute approaches.

missing_values: number, string, np.nan (default) or None

The placeholder for the missing values. All occurrences of missing_values will be imputed.

strategy: string, optional (default=“mean”)

The imputation strategy.

Supported using sklearn:

  • If “mean”, then replace missing values using the mean along each column. Can only be used with numeric data.

  • If “median”, then replace missing values using the median along each column. Can only be used with numeric data.

  • If “most_frequent”, then replace missing values using the most frequent value along each column. Can be used with strings or numeric data.

  • If “constant”, then replace missing values with fill_value. Can be used with strings or numeric data.

Supported using missingpy:

  • If ‘knn’, then use a nearest neighbor search. Can be used with strings or numeric data.

WIP: More strategies using fancyimpute

n_neighbors: int, optional (default=5)

Number of neighboring samples to use for imputation if strategy is ‘knn’.

__module__ = 'WORC.featureprocessing.Imputer'
fit(X, y=None)[source]
transform(X)[source]
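
The column-wise “mean” and “median” strategies described above can be illustrated with a minimal pure-Python sketch. It is not the WORC, sklearn or missingpy implementation: the helper name `impute_columns` is hypothetical, missing values are assumed to be encoded as NaN, and each column is assumed to hold at least one observed value.

```python
import math
from statistics import mean, median


def impute_columns(rows, strategy="mean"):
    """Replace NaN entries column by column (illustrative helper, not WORC code)."""
    reducers = {"mean": mean, "median": median}
    reducer = reducers[strategy]  # raises KeyError for unsupported strategies

    # Compute one fill value per column from the observed (non-NaN) entries.
    fill = []
    for j in range(len(rows[0])):
        observed = [row[j] for row in rows if not math.isnan(row[j])]
        fill.append(reducer(observed))

    # Build a new table in which every NaN is replaced by its column's fill value.
    return [
        [fill[j] if math.isnan(value) else value for j, value in enumerate(row)]
        for row in rows
    ]


# Samples on the first axis, features on the second axis.
X = [
    [1.0, 10.0],
    [float("nan"), 20.0],
    [3.0, float("nan")],
    [8.0, 60.0],
]
X_mean = impute_columns(X, strategy="mean")
X_median = impute_columns(X, strategy="median")
```

The fill values are computed in a first pass and applied in a second. That two-pass shape mirrors a fit and transform split, though nothing here is stated about how the real `fit` and `transform` divide the work.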

Relief Module

class WORC.featureprocessing.Relief.SelectMulticlassRelief(n_neighbours=3, sample_size=1, distance_p=2, numf=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Object to fit feature selection based on a multiclass Relief ranking of the features.

__abstractmethods__ = frozenset({})
__init__(n_neighbours=3, sample_size=1, distance_p=2, numf=None)[source]
n_neighbours: integer

Number of nearest neighbours used.

sample_size: float

Fraction of samples used to calculate the score.

distance_p: integer

Parameter p of the Minkowski distance used for the nearest-neighbour calculation.

numf: integer, default None

Number of important features to be selected with respect to their ranking. If None, all are used.

__module__ = 'WORC.featureprocessing.Relief'
fit(X, y)[source]

Select only features specified by parameters per patient.

feature_values: numpy array, mandatory

Array containing feature values used for model_selection. Number of objects on first axis, features on second axis.

feature_labels: list, mandatory

Contains the labels of all features used. The index in this list will be used in the transform function to select features.

multi_class_relief(feature_set, label_set, nb=3, sample_size=1, distance_p=2, numf=None)[source]
single_class_relief(feature_set, label_set, nb=3, sample_size=1, distance_p=2, numf=None)[source]
transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
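
To make the roles of `distance_p`, `n_neighbours` and `numf` more concrete, here is a small sketch of the three building blocks they refer to: a Minkowski distance of order p, a nearest-neighbour lookup, and a top-`numf` cut on a feature ranking. All function names are hypothetical, and the Relief scoring itself is not reproduced; the scores passed to `top_features` are assumed to come from elsewhere.

```python
def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p between two equal-length feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def nearest_neighbours(samples, index, k=3, p=2):
    """Indices of the k samples closest to samples[index], excluding itself."""
    others = [i for i in range(len(samples)) if i != index]
    others.sort(key=lambda i: minkowski_distance(samples[index], samples[i], p))
    return others[:k]


def top_features(scores, numf=None):
    """Indices of the numf highest-scoring features; all of them if numf is None."""
    ranking = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranking if numf is None else ranking[:numf]


samples = [[0, 0], [1, 0], [5, 5], [0, 2]]
neighbours = nearest_neighbours(samples, index=0, k=2, p=2)

# Hypothetical per-feature scores, standing in for the output of a Relief pass.
selected = top_features([0.1, 0.9, 0.5], numf=2)
```

With p=2 the distance is Euclidean and with p=1 it is the Manhattan distance, which is why a single integer parameter is enough to switch between them.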

SelectGroups Module

class WORC.featureprocessing.SelectGroups.SelectGroups(parameters)[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.

__abstractmethods__ = frozenset({})
__init__(parameters)[source]
parameters: dict, mandatory

Contains the settings for the groups to be selected. Should contain the settings for the following groups:

  • histogram_features

  • shape_features

  • orientation_features

  • semantic_features

  • patient_features

  • coliage_features

  • phase_features

  • vessel_features

  • log_features

  • texture_Gabor_features

  • texture_GLCM_features

  • texture_GLCMMS_features

  • texture_GLRLM_features

  • texture_GLSZM_features

  • texture_NGTDM_features

  • texture_LBP_features

__module__ = 'WORC.featureprocessing.SelectGroups'
fit(feature_labels)[source]

Select only features specified by parameters per patient.

feature_labels: list, optional

Contains the labels of all features used. The index in this list will be used in the transform function to select features.

transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
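
The fit and transform contract described above (fit records which label indices to keep, transform applies those indices to any array) can be sketched as follows. This is an illustration under assumptions, not the WORC code: the class name `GroupSelectorSketch` is hypothetical, the group settings are assumed to be booleans, and the mapping from group name to label prefix is invented for the example.

```python
# Hypothetical mapping from group setting to label prefix (an assumption).
GROUP_PREFIXES = {
    "histogram_features": "hf_",
    "shape_features": "sf_",
    "orientation_features": "of_",
}


class GroupSelectorSketch:
    """Keep the features whose label belongs to an enabled group."""

    def __init__(self, parameters):
        self.parameters = parameters

    def fit(self, feature_labels):
        # Collect the prefixes of all groups that are switched on.
        enabled = tuple(
            prefix
            for group, prefix in GROUP_PREFIXES.items()
            if self.parameters.get(group, False)
        )
        # Remember the positions of matching labels for use in transform.
        self.selected_ = [
            i for i, label in enumerate(feature_labels) if label.startswith(enabled)
        ]
        return self

    def transform(self, inputarray):
        # Index-based selection, so the item type (float, string, ...) is irrelevant.
        return [[row[i] for i in self.selected_] for row in inputarray]


labels = ["hf_mean", "sf_compactness", "of_theta_x", "hf_std"]
settings = {
    "histogram_features": True,
    "shape_features": False,
    "orientation_features": True,
}
selector = GroupSelectorSketch(settings).fit(labels)
values = selector.transform([[1, 2, 3, 4], [5, 6, 7, 8]])
kept_labels = selector.transform([labels])[0]
```

Because the selection is stored as indices, the same fitted object can be applied to the feature values and to the label list itself, which keeps the two in sync.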

SelectIndividuals Module

class WORC.featureprocessing.SelectIndividuals.SelectIndividuals(parameters=['hf_mean', 'sf_compactness'])[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Object to fit feature selection based on the labels of individual features. The label for the feature is used for this procedure.

__abstractmethods__ = frozenset({})
__init__(parameters=['hf_mean', 'sf_compactness'])[source]
parameters: dict, mandatory

Contains the settings for the groups to be selected. Should contain the settings for the following groups:

  • histogram_features

  • shape_features

  • orientation_features

  • semantic_features

  • patient_features

  • coliage_features

  • phase_features

  • vessel_features

  • log_features

  • texture_features

__module__ = 'WORC.featureprocessing.SelectIndividuals'
fit(feature_labels)[source]

Select only features specified by parameters per patient.

feature_labels: list, optional

Contains the labels of all features used. The index in this list will be used in the transform function to select features.

transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
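
Given the default `parameters=['hf_mean', 'sf_compactness']` in the signature, one plausible reading is that individual features are picked by label. The sketch below takes that reading and assumes exact label matches; whether the real class matches whole labels or substrings is not stated here, and the class name `IndividualSelectorSketch` is hypothetical.

```python
class IndividualSelectorSketch:
    """Keep only the features whose label is listed in parameters."""

    def __init__(self, parameters=("hf_mean", "sf_compactness")):
        self.parameters = list(parameters)

    def fit(self, feature_labels):
        # Assumption: a feature is kept when its label equals a listed name.
        self.selected_ = [
            i for i, label in enumerate(feature_labels) if label in self.parameters
        ]
        return self

    def transform(self, inputarray):
        # Apply the stored indices to every row of the input.
        return [[row[i] for i in self.selected_] for row in inputarray]


labels = ["hf_mean", "hf_std", "sf_compactness", "sf_volume"]
selector = IndividualSelectorSketch().fit(labels)
values = selector.transform([[0.5, 0.1, 0.9, 120.0]])
```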

StatisticalTestFeatures Module

StatisticalTestThreshold Module

class WORC.featureprocessing.StatisticalTestThreshold.StatisticalTestThreshold(metric='ttest', threshold=0.05)[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Object to fit feature selection based on statistical tests.

__abstractmethods__ = frozenset({})
__init__(metric='ttest', threshold=0.05)[source]
metric: string, default ‘ttest’

Statistical test used for selection. Options are ttest, Welch, Wilcoxon, and MannWhitneyU.

threshold: float, default 0.05

Threshold for the p-value in order for a feature to be selected.

__module__ = 'WORC.featureprocessing.StatisticalTestThreshold'
fit(X_train, Y_train)[source]

Select only features specified by the metric and threshold per patient.

X_train: numpy array, mandatory

Array containing feature values used for model_selection. Number of objects on first axis, features on second axis.

Y_train: numpy array, mandatory

Array containing the binary labels for each object in X_train.

transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
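
The idea of thresholding on a per-feature p-value can be sketched with the standard library alone. The example uses a Mann-Whitney U test with a plain normal approximation (no tie or continuity correction), which is a simplification chosen to avoid external dependencies and not the routine WORC calls. It also assumes binary labels coded 0 and 1 and that a feature is kept when its p-value is below the threshold. Both function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def mann_whitney_p(a, b):
    """Two-sided Mann-Whitney U p-value using a plain normal approximation."""
    # U counts the pairs where a value from `a` exceeds one from `b` (ties count half).
    u = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    n1, n2 = len(a), len(b)
    mu = n1 * n2 / 2.0
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u - mu) / sigma
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def select_by_p_value(X_train, Y_train, threshold=0.05):
    """Indices of features whose p-value between the two classes is below threshold."""
    selected = []
    for j in range(len(X_train[0])):
        group0 = [row[j] for row, y in zip(X_train, Y_train) if y == 0]
        group1 = [row[j] for row, y in zip(X_train, Y_train) if y == 1]
        if mann_whitney_p(group0, group1) < threshold:
            selected.append(j)
    return selected


# Feature 0 separates the classes completely; feature 1 is identical in both.
X_train = [
    [1, 1], [2, 2], [3, 3], [4, 4], [5, 5],
    [11, 1], [12, 2], [13, 3], [14, 4], [15, 5],
]
Y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

selected = select_by_p_value(X_train, Y_train, threshold=0.05)
X_selected = [[row[j] for j in selected] for row in X_train]
```

For small samples an exact test is usually preferable to the normal approximation; the approximation is used here only to keep the sketch short.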

VarianceThreshold Module