featureprocessing Package¶
Imputer
Module¶
-
class
WORC.featureprocessing.Imputer.
Imputer
(missing_values='nan', strategy='mean', n_neighbors=5)[source]¶ Bases:
object
Module for feature imputation.
-
__dict__
= mappingproxy({'__module__': 'WORC.featureprocessing.Imputer', '__doc__': 'Module for feature imputation.', '__init__': <function Imputer.__init__>, 'fit': <function Imputer.fit>, 'transform': <function Imputer.transform>, '__dict__': <attribute '__dict__' of 'Imputer' objects>, '__weakref__': <attribute '__weakref__' of 'Imputer' objects>})¶
-
__init__
(missing_values='nan', strategy='mean', n_neighbors=5)[source]¶ Imputation of feature values using either sklearn, missingpy or (WIP) fancyimpute approaches.
- missing_values: number, string, np.nan (default) or None
The placeholder for the missing values. All occurrences of missing_values will be imputed.
- strategy: string, optional (default=“mean”)
The imputation strategy.
Supported using sklearn:
  - If “mean”, replace missing values using the mean along each column. Can only be used with numeric data.
  - If “median”, replace missing values using the median along each column. Can only be used with numeric data.
  - If “most_frequent”, replace missing values using the most frequent value along each column. Can be used with strings or numeric data.
  - If “constant”, replace missing values with fill_value. Can be used with strings or numeric data.
Supported using missingpy:
  - If “knn”, use a nearest neighbor search. Can be used with strings or numeric data.
WIP: more strategies using fancyimpute.
- n_neighbors: int, optional (default=5)
Number of neighboring samples to use for imputation if strategy is “knn”.
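The strategies listed above can be illustrated with a minimal sketch that calls scikit-learn directly. This is not the WORC Imputer class itself: the helper name impute is hypothetical, and scikit-learn's KNNImputer is swapped in for the missingpy backend named above, on the assumption that it behaves comparably for this toy case.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

# Toy feature matrix: objects on the first axis, features on the second.
# np.nan marks the missing values.
X = np.array([
    [1.0, 10.0],
    [2.0, np.nan],
    [np.nan, 30.0],
    [4.0, 40.0],
])


def impute(X, strategy="mean", n_neighbors=5):
    """Hypothetical helper: map a strategy name onto a scikit-learn imputer."""
    if strategy == "knn":
        # Assumption: KNNImputer stands in for the missingpy implementation.
        imputer = KNNImputer(missing_values=np.nan, n_neighbors=n_neighbors)
    else:
        # "mean", "median", "most_frequent" and "constant" are all
        # handled by SimpleImputer.
        imputer = SimpleImputer(missing_values=np.nan, strategy=strategy)
    return imputer.fit_transform(X)


X_mean = impute(X, strategy="mean")
X_median = impute(X, strategy="median")
X_knn = impute(X, strategy="knn", n_neighbors=2)
```

With the "mean" strategy each gap is filled with the mean of the observed values in its column, so the result depends only on the column, not on which objects are similar. The "knn" strategy instead averages over the nearest objects, which is where n_neighbors comes in.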
-
__module__
= 'WORC.featureprocessing.Imputer'¶
-
__weakref__
¶ list of weak references to the object (if defined)
-
Relief
Module¶
-
class
WORC.featureprocessing.Relief.
SelectMulticlassRelief
(n_neighbours=3, sample_size=1, distance_p=2, numf=None)[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.feature_selection.base.SelectorMixin
Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.
-
__abstractmethods__
= frozenset({})¶
-
__init__
(n_neighbours=3, sample_size=1, distance_p=2, numf=None)[source]¶ - n_neighbours: integer
Number of nearest neighbours used.
- sample_size: float
Percentage of samples used to calculate score
- distance_p: integer
Parameter p of the Minkowski distance used for the nearest neighbour calculation.
- numf: integer, default None
Number of important features to be selected with respect to their ranking. If None, all are used.
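To make distance_p and numf concrete, here is a small sketch of what they plausibly control. It is not the Relief implementation: the function names minkowski_distance and top_features and the example scores are hypothetical, and only the standard Minkowski formula and a plain descending ranking are assumed.

```python
import numpy as np


def minkowski_distance(a, b, p=2):
    """Minkowski distance of order p (p=1: Manhattan, p=2: Euclidean)."""
    return float(np.sum(np.abs(a - b) ** p) ** (1.0 / p))


def top_features(scores, numf=None):
    """Return feature indices ranked by descending score.

    If numf is None all indices are returned, otherwise only the first numf.
    """
    order = np.argsort(scores)[::-1]
    return order if numf is None else order[:numf]


a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])
d_euclid = minkowski_distance(a, b, p=2)
d_manhattan = minkowski_distance(a, b, p=1)

# Hypothetical per-feature importance scores.
scores = np.array([0.1, 0.9, 0.4])
selected = top_features(scores, numf=2)
```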
-
__module__
= 'WORC.featureprocessing.Relief'¶
-
fit
(X, y)[source]¶ Select only features specified by parameters per patient.
- feature_values: numpy array, mandatory
Array containing feature values used for model_selection. Number of objects on first axis, features on second axis.
- feature_labels: list, mandatory
Contains the labels of all features used. The index in this list will be used in the transform function to select features.
-
SelectGroups
Module¶
-
class
WORC.featureprocessing.SelectGroups.
SelectGroups
(parameters)[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.feature_selection.base.SelectorMixin
Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.
-
__abstractmethods__
= frozenset({})¶
-
__init__
(parameters)[source]¶ - parameters: dict, mandatory
Contains the settings for the groups to be selected. Should contain the settings for the following groups:
  - histogram_features
  - shape_features
  - orientation_features
  - semantic_features
  - patient_features
  - coliage_features
  - phase_features
  - vessel_features
  - log_features
  - texture_Gabor_features
  - texture_GLCM_features
  - texture_GLCMMS_features
  - texture_GLRLM_features
  - texture_GLSZM_features
  - texture_NGTDM_features
  - texture_LBP_features
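As a rough illustration of group-based selection, the sketch below keeps only the columns whose feature label belongs to an enabled group. Everything in it is an assumption made for illustration: the prefix table GROUP_PREFIXES, the boolean setting values, and the function name select_by_group do not come from the WORC source.

```python
import numpy as np

# Hypothetical mapping from group setting to feature-label prefix.
GROUP_PREFIXES = {
    "histogram_features": "hf_",
    "shape_features": "sf_",
    "orientation_features": "of_",
}


def select_by_group(feature_values, feature_labels, parameters):
    """Keep the columns whose label starts with the prefix of an enabled group."""
    enabled = tuple(
        prefix
        for group, prefix in GROUP_PREFIXES.items()
        if parameters.get(group, False)
    )
    # str.startswith accepts a tuple; an empty tuple matches nothing.
    keep = [i for i, label in enumerate(feature_labels) if label.startswith(enabled)]
    return feature_values[:, keep], [feature_labels[i] for i in keep]


# Two objects, four features.
feature_values = np.arange(8.0).reshape(2, 4)
feature_labels = ["hf_mean", "hf_std", "sf_compactness", "of_theta_x"]
parameters = {
    "histogram_features": True,
    "shape_features": False,
    "orientation_features": True,
}

selected_values, selected_labels = select_by_group(
    feature_values, feature_labels, parameters
)
```

Here the shape group is disabled, so the sf_compactness column is dropped and the other three are kept.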
-
__module__
= 'WORC.featureprocessing.SelectGroups'¶
-
SelectIndividuals
Module¶
-
class
WORC.featureprocessing.SelectIndividuals.
SelectIndividuals
(parameters=['hf_mean', 'sf_compactness'])[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.feature_selection.base.SelectorMixin
Object to fit feature selection based on the type group the feature belongs to. The label for the feature is used for this procedure.
-
__abstractmethods__
= frozenset({})¶
-
__init__
(parameters=['hf_mean', 'sf_compactness'])[source]¶ - parameters: dict, mandatory
Contains the settings for the groups to be selected. Should contain the settings for the following groups:
  - histogram_features
  - shape_features
  - orientation_features
  - semantic_features
  - patient_features
  - coliage_features
  - phase_features
  - vessel_features
  - log_features
  - texture_features
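Going by the default in the signature, parameters can also be read as a list of individual feature labels. Under that reading, which is an assumption rather than something the description above confirms, selection could look like the following sketch. The function name select_individuals is hypothetical.

```python
import numpy as np


def select_individuals(feature_values, feature_labels, parameters):
    """Keep only the columns whose label is listed in parameters."""
    wanted = set(parameters)
    keep = [i for i, label in enumerate(feature_labels) if label in wanted]
    return feature_values[:, keep], [feature_labels[i] for i in keep]


# Two objects, three features.
feature_values = np.array([
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
])
feature_labels = ["hf_mean", "hf_std", "sf_compactness"]

kept_values, kept_labels = select_individuals(
    feature_values, feature_labels, parameters=["hf_mean", "sf_compactness"]
)
```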
-
__module__
= 'WORC.featureprocessing.SelectIndividuals'¶
-
StatisticalTestFeatures
Module¶
StatisticalTestThreshold
Module¶
-
class
WORC.featureprocessing.StatisticalTestThreshold.
StatisticalTestThreshold
(metric='ttest', threshold=0.05)[source]¶ Bases:
sklearn.base.BaseEstimator
,sklearn.feature_selection.base.SelectorMixin
Object to fit feature selection based on statistical tests.
-
__abstractmethods__
= frozenset({})¶
-
__init__
(metric='ttest', threshold=0.05)[source]¶ - metric: string, default ‘ttest’
Statistical test used for selection. Options are ttest, Welch, Wilcoxon and MannWhitneyU.
- threshold: float, default 0.05
Threshold on the p-value that a feature must meet in order to be selected.
-
__module__
= 'WORC.featureprocessing.StatisticalTestThreshold'¶
-
fit
(X_train, Y_train)[source]¶ Select only features specified by the metric and threshold per patient.
- X_train: numpy array, mandatory
Array containing feature values used for model_selection. Number of objects on first axis, features on second axis.
- Y_train: numpy array, mandatory
Array containing the binary labels for each object in X_train.
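The idea behind this kind of selection can be sketched with SciPy: run a two-sample test per feature column between the two label groups and keep the features whose p-value falls below the threshold. This is a sketch under stated assumptions, not the WORC code: the function names p_value and select_features are hypothetical, the comparison direction (p below threshold) is assumed, and the Wilcoxon option is left out because the text does not say which variant is meant.

```python
import numpy as np
from scipy import stats


def p_value(class0, class1, metric="ttest"):
    """Two-sample p-value for one feature; the metric names mirror the options above."""
    if metric == "ttest":
        return stats.ttest_ind(class0, class1).pvalue
    if metric == "Welch":
        # Welch's t-test: do not assume equal variances.
        return stats.ttest_ind(class0, class1, equal_var=False).pvalue
    if metric == "MannWhitneyU":
        return stats.mannwhitneyu(class0, class1, alternative="two-sided").pvalue
    raise ValueError("Unsupported metric in this sketch: %s" % metric)


def select_features(X_train, Y_train, metric="ttest", threshold=0.05):
    """Return indices of features whose p-value is below threshold (assumed direction)."""
    Y_train = np.asarray(Y_train)
    selected = []
    for j in range(X_train.shape[1]):
        p = p_value(X_train[Y_train == 0, j], X_train[Y_train == 1, j], metric)
        if p < threshold:
            selected.append(j)
    return selected


# Feature 0 separates the two classes; feature 1 is identical in both.
X_train = np.array([
    [0.0, 1.0], [0.1, 2.0], [0.2, 3.0], [0.3, 4.0], [0.4, 5.0],
    [5.0, 1.0], [5.1, 2.0], [5.2, 3.0], [5.3, 4.0], [5.4, 5.0],
])
Y_train = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])

selected_ttest = select_features(X_train, Y_train, metric="ttest")
selected_welch = select_features(X_train, Y_train, metric="Welch")
selected_mwu = select_features(X_train, Y_train, metric="MannWhitneyU")
```

In this toy data only the first feature differs between the classes, so it is the only one that passes any of the three tests at the default threshold.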
-