frlearn.neighbours.preprocessors
.FRPS¶
-
class
frlearn.neighbours.preprocessors.
FRPS
(owa_weights: frlearn.utils.owa_operators.OWAOperator = <frlearn.utils.owa_operators.invadd object>, quality_measure: str = 'lower', aggr_R=<function mean>, nn_search: frlearn.neighbours.neighbour_search.NNSearch = <frlearn.neighbours.neighbour_search.KDTree object>)[source]¶ Implementation of the Fuzzy Rough Prototype Selection (FRPS) preprocessor.
Calculates quality measure for each training instance, the values of which serve as potential thresholds for selecting instances. Each potential threshold and corresponding candidate instance set is evaluated by comparing the decision class of each instance with the decision class of its nearest neighbour within the candidate instance set (excluding itself). The candidate instance set with the highest number of matches is selected.
- Parameters
- quality_measurestr {‘upper’, ‘lower’, ‘both’, }, default=’lower’
Quality measure to use for calculating thresholds. Either the upper approximation of the decision class of each attribute, the lower approximation, or the mean value of both. [2] recommends the lower approximation.
- aggr_R(ndarray, int, ) -> ndarray, default=np.mean
Function that takes an ndarray and a keyword argument axis, and returns an ndarray with the corresponding axis removed. Used to define the similarity relation R from the per-attribute similarities. [1] uses the Łukasiewicz t-norm, while [2] offers a choice between the Łukasiewicz t-norm, the mean and the minimum, and recommends the mean.
- owa_weights: OWAOperator, default=invadd()
OWA weights to use for calculation of soft maximum and/or minimum in quality measure. [1] uses additive weights, while [2] uses inverse additive weights.
- nn_searchNNSearch, default=KDTree()
Nearest neighbour search algorithm to use.
Notes
There are a number of implementation differences between [1] and [2], in each case the present implementation follows [2]:
[1] calculates upper and lower approximations using all instances, while [2] only calculates upper approximation membership over instances of the decision class, and lower approximation membership over instances of other decision classes. This affects over what length the weight vector is ‘stretched’.
In addition, [2] excludes each instance from the calculation of its own upper approximation membership.
[1] uses additive weights, while [2] uses inverse additive weights.
[1] uses the Łukasiewicz t-norm to aggregate per-attribute similarities, while [2] recommends using the mean.
In addition, there are implementation issues not addressed in [1] or [2]:
It is unclear what metric the nearest neighbour search should use. It seems reasonable that it should either correspond with the similarity relation R (and therefore incorporate the same aggregation strategy from per-attribute similarities), or that it should match whatever metric is used by nearest neighbour classifition subsequent to FRPS. By default, the present implementation uses manhattan distance on the scaled attribute values.
When the largest quality measure value corresponds to a singleton candidate instance set, it cannot be evaluated (because the single instance in that set has no nearest neighbour). Since this is an edge case that would not score highly anyway, it is simply excluded from consideration.
References
- R99f51b447213-1
- R99f51b447213-2
-
__init__
(self, owa_weights: 'OWAOperator' = <frlearn.utils.owa_operators.invadd object at 0x7f8b19dae990>, quality_measure: 'str' = 'lower', aggr_R=<function mean at 0x7f8b1e88f950>, nn_search: 'NNSearch' = <frlearn.neighbours.neighbour_search.KDTree object at 0x7f8b19dae850>)[source]¶ Initialize self. See help(type(self)) for accurate signature.