Version 0.2
Changelog
Bug fixes
- Fixed a bug in under_sampling.NearMiss, which was not picking the right samples during under-sampling for method 3. By Guillaume Lemaitre.
- Fixed a bug in ensemble.EasyEnsemble: corrected the random_state generation. By Guillaume Lemaitre and Christos Aridas.
- Fixed a bug in under_sampling.RepeatedEditedNearestNeighbours: added a stopping criterion to prevent the minority class from becoming a majority class and to prevent a class from disappearing. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.AllKNN: added stopping criteria to prevent the minority class from becoming a majority class and to prevent a class from disappearing. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.CondensedNearestNeighbour: corrected the list of indices returned. By Guillaume Lemaitre.
- Fixed a bug in ensemble.BalanceCascade: a single array can now be obtained if desired. By Guillaume Lemaitre.
- Fixed a bug in pipeline.Pipeline: a Pipeline can now be embedded in another Pipeline. #231 by Christos Aridas.
- Fixed a bug in pipeline.Pipeline: two samplers can now be put in the same Pipeline. #188 by Christos Aridas.
- Fixed a bug in under_sampling.CondensedNearestNeighbour: corrected the shape of sel_x when only one sample is selected. By Aliaksei Halachkin.
- Fixed a bug in under_sampling.NeighbourhoodCleaningRule, selecting neighbours instead of minority class misclassified samples. #230 by Aleksandr Loskutov.
- Fixed a bug in over_sampling.ADASYN: corrected the creation of a new sample so that it lies between the minority sample and the nearest neighbour. #235 by Rafael Wampfler.
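The ADASYN fix listed above is about interpolation: a synthetic sample should fall on the segment joining a minority sample and its nearest neighbour. The following is a minimal plain-Python sketch of that idea, not the library's code; the function name `interpolate_sample` and the sample values are hypothetical.

```python
import random


def interpolate_sample(x, neighbour, rng):
    """Return a point on the segment between x and neighbour."""
    # gap is drawn uniformly from [0, 1): 0 returns x itself,
    # values close to 1 approach the neighbour.
    gap = rng.random()
    return [xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)]


rng = random.Random(0)
minority_sample = [1.0, 2.0]      # hypothetical minority sample
nearest_neighbour = [3.0, 6.0]    # hypothetical nearest neighbour
new_sample = interpolate_sample(minority_sample, nearest_neighbour, rng)
```

Because the same `gap` is applied to every feature, the new point stays on the straight line between the two samples rather than anywhere in their bounding box.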
New features
- Added AllKNN under sampling technique. By Dayvid Oliveira.
- Added a metrics module implementing scoring functions specific to the problem of balancing. #204 by Guillaume Lemaitre and Christos Aridas.
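As an illustration of the kind of score that is useful on imbalanced data, the sketch below computes the geometric mean of sensitivity and specificity for a binary problem in plain Python. It is an assumption-laden example rather than the module's implementation: the function name `geometric_mean` and the labels are hypothetical.

```python
import math


def geometric_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity (binary case)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # Guard against empty classes to avoid dividing by zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


y_true = [0, 0, 0, 0, 1, 1]   # hypothetical imbalanced labels
y_pred = [0, 0, 0, 1, 1, 0]   # hypothetical predictions
score = geometric_mean(y_true, y_pred)

# A classifier that always predicts the majority class scores 0,
# even though its plain accuracy would be 4/6.
majority_only = geometric_mean(y_true, [0] * len(y_true))
```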
Enhancement
- Added support for bumpversion. By Guillaume Lemaitre.
- Validate the type of target in binary samplers. A warning is raised for the moment. By Guillaume Lemaitre and Christos Aridas.
- Changed from the cross_validation module to the model_selection module, following the sklearn deprecation cycle. By Dayvid Oliveira and Christos Aridas.
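The target validation mentioned in this list warns rather than fails when a binary sampler receives a non-binary target. A minimal sketch of such a check, using only the standard library; `check_binary_target` is a hypothetical helper and not the library's function:

```python
import warnings


def check_binary_target(y):
    """Warn (instead of raising) when y does not hold exactly two classes."""
    n_classes = len(set(y))
    if n_classes != 2:
        warnings.warn(
            "Expected a binary target, got %d classes." % n_classes,
            UserWarning,
        )
    return n_classes == 2


# Capture the warning so the behaviour can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    is_binary = check_binary_target([0, 1, 2, 1, 0])  # three classes
```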
API changes summary
- size_ngh has been deprecated in combine.SMOTEENN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.EditedNearestNeighbors. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.CondensedNearestNeighbour. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.OneSidedSelection. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.NeighbourhoodCleaningRule. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.RepeatedEditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.AllKNN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- Two base classes, BaseBinaryclassSampler and BaseMulticlassSampler, have been created to handle the target type and raise a warning in case of abnormality. By Guillaume Lemaitre and Christos Aridas.
- Moved random_state to be assigned in the SamplerMixin initialization. By Guillaume Lemaitre.
- combine.SMOTEENN and combine.SMOTETomek now take estimators instead of parameters. The corresponding list of parameters has therefore been deprecated. By Guillaume Lemaitre and Christos Aridas.
- k has been deprecated in over_sampling.ADASYN. Use n_neighbors instead. #183 by Guillaume Lemaitre.
- k and m have been deprecated in over_sampling.SMOTE. Use k_neighbors and m_neighbors instead. #182 by Guillaume Lemaitre.
- n_neighbors accepts a KNeighborsMixin-based object for under_sampling.EditedNearestNeighbors, under_sampling.CondensedNearestNeighbour, under_sampling.NeighbourhoodCleaningRule, under_sampling.RepeatedEditedNearestNeighbours, and under_sampling.AllKNN. #109 by Guillaume Lemaitre.
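The renamings above follow a common deprecation pattern: the old keyword still works for now but emits a warning and is mapped onto the new one. The sketch below shows one way this can look; `ExampleSampler` is a hypothetical class written for illustration, and the library's samplers may handle the deprecation differently.

```python
import warnings


class ExampleSampler:
    """Hypothetical sampler used only to illustrate the renaming."""

    def __init__(self, n_neighbors=3, size_ngh=None):
        if size_ngh is not None:
            # Old keyword: warn, then forward its value to the new one.
            warnings.warn(
                "size_ngh is deprecated; use n_neighbors instead.",
                DeprecationWarning,
            )
            n_neighbors = size_ngh
        self.n_neighbors = n_neighbors


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    old_style = ExampleSampler(size_ngh=5)      # deprecated spelling
    new_style = ExampleSampler(n_neighbors=5)   # preferred spelling
```

Both calls end up with the same setting, so existing code keeps working while the warning points users to the new keyword.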
Documentation changes
- Replaced some remaining UnbalancedDataset occurrences. By Francois Magimel.
- Added doctest in the documentation. By Guillaume Lemaitre.