featureprocessing Package

Decomposition Module

Imputer Module

class WORC.featureprocessing.Imputer.Imputer(missing_values='nan', strategy='mean', n_neighbors=5)[source]

Bases: object

Module for feature imputation.

__dict__ = mappingproxy({'__module__': 'WORC.featureprocessing.Imputer', '__doc__': 'Module for feature imputation.', '__init__': <function Imputer.__init__>, 'fit': <function Imputer.fit>, 'transform': <function Imputer.transform>, '__dict__': <attribute '__dict__' of 'Imputer' objects>, '__weakref__': <attribute '__weakref__' of 'Imputer' objects>})
__init__(missing_values='nan', strategy='mean', n_neighbors=5)[source]

Imputation of feature values using either sklearn, missingpy or (WIP) fancyimpute approaches.

missing_values: number, string, np.nan (default) or None

The placeholder for the missing values. All occurrences of missing_values will be imputed.

strategy: string, optional (default=“mean”)

The imputation strategy.

Supported using sklearn:

  • If “mean”, then replace missing values using the mean along each column. Can only be used with numeric data.

  • If “median”, then replace missing values using the median along each column. Can only be used with numeric data.

  • If “most_frequent”, then replace missing using the most frequent value along each column. Can be used with strings or numeric data.

  • If “constant”, then replace missing values with fill_value. Can be used with strings or numeric data.

Supported using missingpy:

  • If “knn”, then use a nearest neighbor search. Can be used with strings or numeric data.

WIP: More strategies using fancyimpute

n_neighbors: int, optional (default=5)

Number of neighboring samples to use for imputation if strategy is “knn”.

__module__ = 'WORC.featureprocessing.Imputer'
__weakref__

list of weak references to the object (if defined)

fit(X, y=None)[source]
transform(X)[source]
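
The column-wise strategies described above can be illustrated with a minimal sketch in plain Python. It is not the WORC implementation: the helper names fit_column_fills and apply_fills are hypothetical, missing values are assumed to be float NaN, and the “constant” and “knn” strategies are left out.

```python
import math
from statistics import mean, median, mode


def fit_column_fills(rows, strategy="mean"):
    """Compute one fill value per column from the observed (non-NaN) entries."""
    reducers = {"mean": mean, "median": median, "most_frequent": mode}
    reduce_column = reducers[strategy]
    fills = []
    for column in zip(*rows):
        observed = [value for value in column if not math.isnan(value)]
        fills.append(reduce_column(observed))
    return fills


def apply_fills(rows, fills):
    """Replace every NaN by the fill value of its column."""
    return [
        [fill if math.isnan(value) else value for value, fill in zip(row, fills)]
        for row in rows
    ]


# Three samples (rows), two features (columns), one gap in each column.
X = [
    [1.0, 10.0],
    [float("nan"), 20.0],
    [3.0, float("nan")],
]
fills = fit_column_fills(X, strategy="mean")
X_imputed = apply_fills(X, fills)
print(fills)      # [2.0, 15.0]
print(X_imputed)  # [[1.0, 10.0], [2.0, 20.0], [3.0, 15.0]]
```

Splitting the work into a fit step and an apply step mirrors the fit/transform pair listed above: fill values learned on training data can be reused on unseen samples.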

Relief Module

class WORC.featureprocessing.Relief.SelectMulticlassRelief(n_neighbours=3, sample_size=1, distance_p=2, numf=None)[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.

__abstractmethods__ = frozenset({})
__init__(n_neighbours=3, sample_size=1, distance_p=2, numf=None)[source]
n_neighbours: integer

Number of nearest neighbours used.

sample_size: float

Percentage of samples used to calculate score

distance_p: integer

Parameter p of the Minkowski distance used for the nearest neighbour calculation.

numf: integer, default None

Number of important features to be selected with respect to their ranking. If None, all are used.

__module__ = 'WORC.featureprocessing.Relief'
fit(X, y)[source]

Select only features specified by parameters per patient.

feature_values: numpy array, mandatory

Array containing feature values used for model_selection. Number of objects on first axis, features on second axis.

feature_labels: list, mandatory

Contains the labels of all features used. The index in this list will be used in the transform function to select features.

multi_class_relief(feature_set, label_set, nb=3, sample_size=1, distance_p=2, numf=None)[source]
single_class_relief(feature_set, label_set, nb=3, sample_size=1, distance_p=2, numf=None)[source]
transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
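
To make the roles of n_neighbours, sample_size, distance_p and numf concrete, here is a minimal sketch of the basic Relief idea: a feature scores high when it differs between a sample and its nearest neighbours of another class (misses) and agrees with its nearest neighbours of the same class (hits). This is a simplified illustration in plain Python, not the code behind single_class_relief or multi_class_relief. The helper names, the range scaling and the seed argument are assumptions.

```python
import random


def minkowski(a, b, p=2):
    """Minkowski distance of order p between two feature vectors."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def relief_weights(X, y, n_neighbours=3, sample_size=1.0, distance_p=2, seed=0):
    """Score features: high when near misses differ and near hits agree."""
    n_samples, n_features = len(X), len(X[0])
    # Scale differences by each feature's range so features are comparable.
    spans = [(max(col) - min(col)) or 1.0 for col in zip(*X)]
    n_draws = max(1, round(sample_size * n_samples))
    picks = random.Random(seed).sample(range(n_samples), n_draws)

    weights = [0.0] * n_features
    for i in picks:
        others = sorted(
            (j for j in range(n_samples) if j != i),
            key=lambda j: minkowski(X[i], X[j], distance_p),
        )
        hits = [j for j in others if y[j] == y[i]][:n_neighbours]
        misses = [j for j in others if y[j] != y[i]][:n_neighbours]
        if not hits or not misses:
            continue  # nothing to compare against for this sample
        for f in range(n_features):
            hit_diff = sum(abs(X[i][f] - X[j][f]) for j in hits) / len(hits)
            miss_diff = sum(abs(X[i][f] - X[j][f]) for j in misses) / len(misses)
            weights[f] += (miss_diff - hit_diff) / (spans[f] * n_draws)
    return weights


def top_features(weights, numf=None):
    """Indices of the numf best-scoring features (all of them if numf is None)."""
    ranking = sorted(range(len(weights)), key=lambda f: weights[f], reverse=True)
    return ranking if numf is None else ranking[:numf]


# Feature 0 separates the two classes, feature 1 does not.
X = [[0.0, 0.5], [0.1, 0.9], [0.2, 0.1], [1.0, 0.4], [0.9, 0.8], [1.1, 0.2]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y, n_neighbours=1)
print(top_features(weights, numf=1))  # [0]
```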

SelectGroups Module

class WORC.featureprocessing.SelectGroups.SelectGroups(parameters)[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.

__abstractmethods__ = frozenset({})
__init__(parameters)[source]
parameters: dict, mandatory

Contains the settings for the groups to be selected. Should contain the settings for the following groups:

  • histogram_features
  • shape_features
  • orientation_features
  • semantic_features
  • patient_features
  • coliage_features
  • phase_features
  • vessel_features
  • log_features
  • texture_Gabor_features
  • texture_GLCM_features
  • texture_GLCMMS_features
  • texture_GLRLM_features
  • texture_GLSZM_features
  • texture_NGTDM_features
  • texture_LBP_features

__module__ = 'WORC.featureprocessing.SelectGroups'
fit(feature_labels)[source]

Select only features specified by parameters per patient.

feature_labels: list, optional

Contains the labels of all features used. The index in this list will be used in the transform function to select features.

transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
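
The fit/transform contract described above (fit remembers indices derived from the feature labels, transform keeps only those columns) can be sketched as follows. The class PrefixGroupSelector and the prefix table are hypothetical: it is an assumption that a feature's group can be read from a label prefix such as hf_ or sf_, and that the settings are plain booleans.

```python
# Hypothetical mapping from group setting to label prefix (an assumption).
GROUP_PREFIXES = {
    "histogram_features": "hf_",
    "shape_features": "sf_",
    "orientation_features": "of_",
}


class PrefixGroupSelector:
    """Keep the columns whose label belongs to an enabled feature group."""

    def __init__(self, parameters):
        self.parameters = parameters
        self.selected_indices = []

    def fit(self, feature_labels):
        # Collect the prefixes of all groups that are switched on.
        wanted = tuple(
            prefix
            for group, prefix in GROUP_PREFIXES.items()
            if self.parameters.get(group, False)
        )
        self.selected_indices = [
            index
            for index, label in enumerate(feature_labels)
            if label.startswith(wanted)
        ]
        return self

    def transform(self, inputarray):
        # Works for any item type, since items are only picked by position.
        return [[row[i] for i in self.selected_indices] for row in inputarray]


labels = ["hf_mean", "sf_compactness", "of_theta", "hf_std"]
settings = {
    "histogram_features": True,
    "shape_features": False,
    "orientation_features": True,
}
selector = PrefixGroupSelector(settings).fit(labels)
print(selector.transform([[1.0, 2.0, 3.0, 4.0]]))  # [[1.0, 3.0, 4.0]]
print(selector.transform([labels]))  # [['hf_mean', 'of_theta', 'hf_std']]
```

Because transform only indexes by position, the same fitted object can reduce both the feature values and the list of labels, which keeps the two aligned.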

SelectIndividuals Module

class WORC.featureprocessing.SelectIndividuals.SelectIndividuals(parameters=['hf_mean', 'sf_compactness'])[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.

__abstractmethods__ = frozenset({})
__init__(parameters=['hf_mean', 'sf_compactness'])[source]
parameters: dict, mandatory

Contains the settings for the groups to be selected. Should contain the settings for the following groups:

  • histogram_features
  • shape_features
  • orientation_features
  • semantic_features
  • patient_features
  • coliage_features
  • phase_features
  • vessel_features
  • log_features
  • texture_features

__module__ = 'WORC.featureprocessing.SelectIndividuals'
fit(feature_labels)[source]

Select only features specified by parameters per patient.

feature_labels: list, optional

Contains the labels of all features used. The index in this list will be used in the transform function to select features.

transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
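
Selecting individual features by name, as the default parameters=['hf_mean', 'sf_compactness'] suggests, could look like the sketch below. It assumes exact label matching, which may differ from the matching rule WORC applies, and the function names are hypothetical.

```python
def fit_indices(feature_labels, parameters):
    """Positions of the labels that are explicitly requested."""
    wanted = set(parameters)
    return [i for i, label in enumerate(feature_labels) if label in wanted]


def select_columns(inputarray, indices):
    """Keep only the requested positions of every row."""
    return [[row[i] for i in indices] for row in inputarray]


feature_labels = ["hf_mean", "hf_std", "sf_compactness", "sf_volume"]
indices = fit_indices(feature_labels, ["hf_mean", "sf_compactness"])
values = [[0.1, 0.2, 0.3, 0.4], [1.1, 1.2, 1.3, 1.4]]
print(indices)                          # [0, 2]
print(select_columns(values, indices))  # [[0.1, 0.3], [1.1, 1.3]]
```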

StatisticalTestFeatures Module

WORC.featureprocessing.StatisticalTestFeatures.StatisticalTestFeatures(features, patientinfo, config, output=None, verbose=True, label_type=None)[source]

Perform several statistical tests on features, such as a Student's t-test. Usage is similar to trainclassifier.

features: string, mandatory

Contains the paths to all .hdf5 feature files used, in the format modalityname1=file1,file2,file3,… modalityname2=file1,… Thus, modality names are always between a space and an equal sign, and files are separated by commas. We assume that the lists of files for each modality have the same length. Files at the same position in each list should belong to the same patient.
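
As an illustration of this string format, the sketch below splits such an argument into one file list per modality and checks the equal-length assumption. It is not WORC's own parser: the function name and the file names are made up, and paths containing spaces are not handled.

```python
def parse_feature_argument(features):
    """Split 'name1=a,b name2=c,d' into {'name1': ['a', 'b'], 'name2': ['c', 'd']}."""
    modalities = {}
    for token in features.split():
        name, _, files = token.partition("=")
        modalities[name] = files.split(",")
    # Every modality must list the same number of files (one per patient).
    lengths = {len(files) for files in modalities.values()}
    if len(lengths) > 1:
        raise ValueError("All modalities must list the same number of files.")
    return modalities


argument = "CT=ct_p1.hdf5,ct_p2.hdf5 MR=mr_p1.hdf5,mr_p2.hdf5"
modalities = parse_feature_argument(argument)
# Files at the same position belong to the same patient.
per_patient = list(zip(*modalities.values()))
print(modalities)
print(per_patient)  # [('ct_p1.hdf5', 'mr_p1.hdf5'), ('ct_p2.hdf5', 'mr_p2.hdf5')]
```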

patientinfo: string, mandatory

Contains the path referring to a .txt file containing the patient label(s) and value(s) to be used for learning. See the Github Wiki for the format.

config: string, mandatory

Path referring to a .ini file containing the parameters used for feature extraction. See the Github Wiki for the possible fields and their description.

# TODO: outputs

verbose: boolean, default True

Print final feature values and labels to command line or not.

StatisticalTestThreshold Module

class WORC.featureprocessing.StatisticalTestThreshold.StatisticalTestThreshold(metric='ttest', threshold=0.05)[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Object to fit feature selection based on statistical tests.

__abstractmethods__ = frozenset({})
__init__(metric='ttest', threshold=0.05)[source]
metric: string, default ‘ttest’

Statistical test used for selection. Options are ttest, Welch, Wilcoxon and MannWhitneyU.

threshold: float, default 0.05

Threshold for p-value in order for feature to be selected

__module__ = 'WORC.featureprocessing.StatisticalTestThreshold'
fit(X_train, Y_train)[source]

Select only features specified by the metric and threshold per patient.

X_train: numpy array, mandatory

Array containing feature values used for model_selection. Number of objects on first axis, features on second axis.

Y_train: numpy array, mandatory

Array containing the binary labels for each object in X_train.

transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.
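
The selection rule (test each feature between the two label groups, keep it when the p-value falls below the threshold) can be sketched with one of the listed metrics. The example below uses a Mann-Whitney U test with a normal approximation, chosen only because it needs nothing beyond the standard library. It applies no tie or continuity correction, so its p-values will differ slightly from those of a statistics package, and none of the function names come from WORC.

```python
from statistics import NormalDist


def average_ranks(values):
    """1-based ranks of the values; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of ranks start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def mann_whitney_p(group_a, group_b):
    """Two-sided p-value of the U statistic under a normal approximation."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = (n_a * n_b * (n_a + n_b + 1) / 12) ** 0.5
    z = (u_a - mean_u) / sd_u
    return 2 * (1 - NormalDist().cdf(abs(z)))


def fit_selection(X_train, Y_train, threshold=0.05):
    """Indices of the features whose p-value is below the threshold."""
    selected = []
    for index, column in enumerate(zip(*X_train)):
        zeros = [v for v, label in zip(column, Y_train) if label == 0]
        ones = [v for v, label in zip(column, Y_train) if label == 1]
        if mann_whitney_p(zeros, ones) < threshold:
            selected.append(index)
    return selected


# Feature 0 differs clearly between the classes, feature 1 is interleaved.
X_train = [
    [1.0, 1.0], [2.0, 4.0], [3.0, 5.0], [4.0, 8.0], [5.0, 9.0],
    [11.0, 2.0], [12.0, 3.0], [13.0, 6.0], [14.0, 7.0], [15.0, 10.0],
]
Y_train = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
selected = fit_selection(X_train, Y_train, threshold=0.05)
print(selected)  # [0]
```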

VarianceThreshold Module

class WORC.featureprocessing.VarianceThreshold.VarianceThresholdMean(threshold)[source]

Bases: sklearn.base.BaseEstimator, sklearn.feature_selection.base.SelectorMixin

Select features based on variance among objects. Similar to VarianceThreshold from sklearn, but does take the mean of the feature into account.

__abstractmethods__ = frozenset({})
__init__(threshold)[source]


__module__ = 'WORC.featureprocessing.VarianceThreshold'
fit(image_features)[source]
transform(inputarray)[source]

Transform the inputarray to select only the features based on the result from the fit function.

inputarray: numpy array, mandatory

Array containing the items to use selection on. The type of item in this list does not matter, e.g. floats, strings etc.

WORC.featureprocessing.VarianceThreshold.selfeat_variance(image_features, labels=None, thresh=0.99, method='nomean')[source]

Select features using a variance threshold.

image_features: numpy array, mandatory

Array containing the feature values to apply the variance threshold selection on. The rows correspond to the patients, the columns to the features.

labels: numpy array, optional

Array containing the labels of the corresponding features. Array should therefore have the same shape as the image_features array.

thresh: float, default 0.99

Threshold to be used as lower boundary for feature variance among patients.

method: string, default ‘nomean’

Method to use for selection. Default: do not use the mean of the features. Other valid option is ‘mean’.

image_features: numpy array

Transformed features array.

labels: list or None

When labels are given, returns the transformed labels. That object contains a list of all label names kept.

sel: VarianceThreshold object

The fitted variance threshold object.
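
A plain variance threshold, corresponding in spirit to the ‘nomean’ method, can be sketched as below. The sketch makes several assumptions: population variance is used, a feature is kept when its variance is strictly above the threshold, and labels is treated as a flat list of feature names. The mean-aware rule of VarianceThresholdMean is not shown because its exact criterion is not stated here, and the function name is hypothetical.

```python
from statistics import pvariance


def select_by_variance(image_features, labels=None, thresh=0.99):
    """Drop features whose variance over the patients does not exceed thresh."""
    kept = [
        index
        for index, column in enumerate(zip(*image_features))
        if pvariance(column) > thresh
    ]
    reduced = [[row[i] for i in kept] for row in image_features]
    kept_labels = [labels[i] for i in kept] if labels is not None else None
    return reduced, kept_labels, kept


# Rows are patients, columns are features; the first feature is constant.
image_features = [
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 10.0],
    [1.0, 9.0, 20.0],
]
names = ["constant", "narrow", "wide"]
reduced, kept_labels, kept = select_by_variance(image_features, names, thresh=0.99)
print(kept_labels)  # ['narrow', 'wide']
print(reduced)      # [[5.0, 0.0], [7.0, 10.0], [9.0, 20.0]]
```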