Release history¶
Version 0.4.2¶
Changelog¶
Bug fixes¶
- Fix a bug in imblearn.over_sampling.SMOTENC in which the median of the standard deviation was used instead of half of the median of the standard deviation. By Guillaume Lemaitre in #491.
- Raise an error when passing a target which is not supported, i.e. regression or multilabel targets; imbalanced-learn does not support these cases. By Guillaume Lemaitre in #490.
- Fix a bug in imblearn.over_sampling.SMOTENC in which sparse matrices were densified during inverse_transform. By Guillaume Lemaitre in #495.
- Fix a bug in imblearn.over_sampling.SMOTE_NC in which the tie breaking was wrongly performed when sampling. By Guillaume Lemaitre in #497.
Version 0.4¶
October, 2018
Warning
Version 0.4 is the last version of imbalanced-learn to support Python 2.7 and Python 3.4. Imbalanced-learn 0.5 will require Python 3.5 or higher.
Highlights¶
This release brings its set of new features as well as some API changes to strengthen the foundation of imbalanced-learn.
As a new feature, two new modules, imblearn.keras and imblearn.tensorflow, have been added in which imbalanced-learn samplers can be used to generate balanced mini-batches.
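For example, a Keras model can be fitted on mini-batches balanced by a sampler. The following is only a minimal sketch, assuming keras is installed, with a toy dataset and a purely illustrative network:
```python
from sklearn.datasets import make_classification
from keras.models import Sequential
from keras.layers import Dense
from imblearn.keras import BalancedBatchGenerator
from imblearn.under_sampling import RandomUnderSampler

# Toy imbalanced dataset (roughly 90% / 10%).
X, y = make_classification(n_samples=1000, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Illustrative network; any Keras model would do.
model = Sequential()
model.add(Dense(32, input_dim=10, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy')

# Each mini-batch is balanced by the sampler before being fed to the network.
training_generator = BalancedBatchGenerator(
    X, y, sampler=RandomUnderSampler(), batch_size=64, random_state=0)
model.fit_generator(generator=training_generator, epochs=5, verbose=0)
```
The imblearn.tensorflow module plays the same role for plain TensorFlow training loops.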
The module imblearn.ensemble has been consolidated with new classifiers: imblearn.ensemble.BalancedRandomForestClassifier, imblearn.ensemble.EasyEnsembleClassifier, and imblearn.ensemble.RUSBoostClassifier.
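These classifiers follow the usual scikit-learn estimator API. A minimal sketch with imblearn.ensemble.BalancedRandomForestClassifier (toy data, illustrative parameters):
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from imblearn.ensemble import BalancedRandomForestClassifier

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is grown on a bootstrap sample balanced by random under-sampling.
brf = BalancedRandomForestClassifier(n_estimators=100, random_state=0)
brf.fit(X_train, y_train)
print(brf.score(X_test, y_test))
```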
Support for string data has been added in imblearn.over_sampling.RandomOverSampler and imblearn.under_sampling.RandomUnderSampler. In addition, a new class, imblearn.over_sampling.SMOTENC, allows generating samples with data sets containing both continuous and categorical features.
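A short sketch of SMOTENC on a made-up mixed-type dataset (the column layout and values below are for illustration only):
```python
import numpy as np
from imblearn.over_sampling import SMOTENC

rng = np.random.RandomState(0)
n_samples = 60
X = np.empty((n_samples, 3), dtype=object)
X[:, 0] = rng.choice(['red', 'green', 'blue'], size=n_samples)  # categorical
X[:, 1] = rng.randn(n_samples)                                  # continuous
X[:, 2] = rng.randint(4, size=n_samples)                        # categorical
y = np.array([0] * 50 + [1] * 10)

# `categorical_features` gives the indices of the categorical columns.
smote_nc = SMOTENC(categorical_features=[0, 2], random_state=0)
X_resampled, y_resampled = smote_nc.fit_resample(X, y)
```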
The imblearn.over_sampling.SMOTE class has been simplified and broken down into two additional classes: imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE.
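A minimal usage sketch of the two new classes (toy data for illustration):
```python
from sklearn.datasets import make_classification
from imblearn.over_sampling import BorderlineSMOTE, SVMSMOTE

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Each SMOTE variant is now a dedicated class with its own parameters.
X_bl, y_bl = BorderlineSMOTE(random_state=0).fit_resample(X, y)
X_svm, y_svm = SVMSMOTE(random_state=0).fit_resample(X, y)
```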
There are also some changes regarding the API: the parameter sampling_strategy has been introduced to replace the ratio parameter. In addition, the return_indices argument has been deprecated and all samplers will expose a sample_indices_ attribute whenever this is possible.
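As a small sketch of the new API on a toy dataset: with binary classification, sampling_strategy can be a float giving the desired minority-to-majority ratio after resampling, and, per the deprecation note above, the fitted sampler exposes the indices it selected:
```python
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.under_sampling import RandomUnderSampler

X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)

# Under-sample the majority class so that minority/majority is 0.5.
rus = RandomUnderSampler(sampling_strategy=0.5, random_state=0)
X_res, y_res = rus.fit_resample(X, y)
print(Counter(y_res))
print(rus.sample_indices_[:5])  # indices of the kept samples in the original X
```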
Changelog¶
API¶
- Replace the parameter ratio with sampling_strategy. #411 by Guillaume Lemaitre.
- Allow the use of a float with binary classification for sampling_strategy. #411 by Guillaume Lemaitre.
- Allow the use of a list for the cleaning methods to specify the classes to sample. #411 by Guillaume Lemaitre.
- Replace fit_sample with fit_resample. An alias is still available for backward compatibility. In addition, sample has been removed to avoid resampling on a different set of data. #462 by Guillaume Lemaitre.
New features¶
- Add keras and tensorflow modules to create balanced mini-batch generators. #409 by Guillaume Lemaitre.
- Add imblearn.ensemble.EasyEnsembleClassifier, which creates a bag of AdaBoost classifiers trained on balanced bootstrap samples. #455 by Guillaume Lemaitre.
- Add imblearn.ensemble.BalancedRandomForestClassifier, which balances each bootstrap sample provided to each tree of the forest. #459 by Guillaume Lemaitre.
- Add imblearn.ensemble.RUSBoostClassifier, which applies a random under-sampling stage before each boosting iteration of AdaBoost. #469 by Guillaume Lemaitre.
- Add imblearn.over_sampling.SMOTENC, which generates synthetic samples on data sets with heterogeneous data types (continuous and categorical features). #412 by Denis Dudnik and Guillaume Lemaitre.
Enhancement¶
- Add a documentation note on creating a balanced random forest from a balanced bagging classifier. #372 by Guillaume Lemaitre.
- Document the metrics used to evaluate models on imbalanced datasets. #367 by Guillaume Lemaitre.
- Add support for one-vs-all encoded targets to support keras. #409 by Guillaume Lemaitre.
- Add specific classes for borderline and SVM SMOTE: BorderlineSMOTE and SVMSMOTE. #440 by Guillaume Lemaitre.
- Allow imblearn.over_sampling.RandomOverSampler to return indices using the return_indices attribute. #439 by Hugo Gascon and Guillaume Lemaitre.
- Allow imblearn.under_sampling.RandomUnderSampler and imblearn.over_sampling.RandomOverSampler to sample object arrays containing strings. #451 by Guillaume Lemaitre.
Bug fixes¶
- Fix a bug in metrics.classification_report_imbalanced for which y_pred and y_true were inverted. #394 by Ole Silvig.
- Fix a bug in ADASYN to consider only samples from the current class when generating new samples. #354 by Guillaume Lemaitre.
- Fix a bug to ensure sorted behavior of the sampling_strategy dictionary and thus obtain deterministic results when using the same random state. #447 by Guillaume Lemaitre.
- Force the cloning of scikit-learn estimators passed as attributes to samplers. #446 by Guillaume Lemaitre.
- Fix a bug which was not preserving the dtype of X and y when generating samples. #450 by Guillaume Lemaitre.
- Add the option to pass a Memory object to make_pipeline, as in the pipeline.Pipeline class. #458 by Christos Aridas.
Maintenance¶
- Remove parameters deprecated in 0.2. #331 by Guillaume Lemaitre.
- Make some modules private. #452 by Guillaume Lemaitre.
- Upgrade requirements to scikit-learn 0.20. #379 by Guillaume Lemaitre.
- Catch deprecation warning in testing. #441 by Guillaume Lemaitre.
- Refactor and impose pytest style tests. #470 by Guillaume Lemaitre.
Documentation¶
- Remove some docstrings which are not necessary. #454 by Guillaume Lemaitre.
- Fix the documentation of the sampling_strategy parameter when used as a float. #480 by Guillaume Lemaitre.
Deprecation¶
- Deprecate ratio in favor of sampling_strategy. #411 by Guillaume Lemaitre.
- Deprecate the use of a dict for the cleaning methods; a list should be used instead. #411 by Guillaume Lemaitre.
- Deprecate random_state in imblearn.under_sampling.NearMiss, imblearn.under_sampling.EditedNearestNeighbors, imblearn.under_sampling.RepeatedEditedNearestNeighbors, imblearn.under_sampling.AllKNN, imblearn.under_sampling.NeighbourhoodCleaningRule, imblearn.under_sampling.InstanceHardnessThreshold, and imblearn.under_sampling.CondensedNearestNeighbours.
- Deprecate kind, out_step, svm_estimator, and m_neighbors in imblearn.over_sampling.SMOTE. Users should use imblearn.over_sampling.SVMSMOTE and imblearn.over_sampling.BorderlineSMOTE instead. #440 by Guillaume Lemaitre.
- Deprecate imblearn.ensemble.EasyEnsemble in favor of the meta-estimator imblearn.ensemble.EasyEnsembleClassifier, which follows the exact algorithm described in the literature. #455 by Guillaume Lemaitre.
- Deprecate imblearn.ensemble.BalanceCascade. #472 by Guillaume Lemaitre.
- Deprecate return_indices in all samplers. Instead, an attribute sample_indices_ is created whenever the sampler selects a subset of the original samples. #474 by Guillaume Lemaitre.
Version 0.3¶
Changelog¶
- Pytest is used instead of nosetests. #321 by Joan Massich.
- Added a User Guide and extended some examples. #295 by Guillaume Lemaitre.
- Fixed a bug in utils.check_ratio such that an error is raised when the number of samples required is negative. #312 by Guillaume Lemaitre.
- Fixed a bug in under_sampling.NearMiss version 3: the indices returned were wrong. #312 by Guillaume Lemaitre.
- Fixed a bug for ensemble.BalanceCascade, combine.SMOTEENN, and SMOTETomek. #295 by Guillaume Lemaitre.
- Fixed a bug in check_ratio so that arguments can be passed when ratio is a callable. #307 by Guillaume Lemaitre.
- Turn off steps in pipeline.Pipeline using the None object. By Christos Aridas.
- Add a fetching function datasets.fetch_datasets in order to get some imbalanced datasets useful for benchmarking. #249 by Guillaume Lemaitre.
- All samplers accept sparse matrices, defaulting to the CSR type. #316 by Guillaume Lemaitre.
- datasets.make_imbalance takes a ratio similarly to other samplers. It supports multiclass. #312 by Guillaume Lemaitre.
- All the unit tests have been factorized and a utils.check_estimators has been derived from scikit-learn. By Guillaume Lemaitre.
- Script for automatic build of conda packages and uploading. #242 by Guillaume Lemaitre.
- Remove seaborn dependence and improve the examples. #264 by Guillaume Lemaitre.
- Adapt all classes to multi-class resampling. #290 by Guillaume Lemaitre.
- __init__ has been removed from base.SamplerMixin to create a real mixin class. #242 by Guillaume Lemaitre.
- Creation of a module exceptions to handle consistent raising of errors. #242 by Guillaume Lemaitre.
- Creation of a module utils.validation to make checking of recurrent patterns. #242 by Guillaume Lemaitre.
- Move the under-sampling methods into the prototype_selection and prototype_generation submodules to make a clearer distinction. #277 by Guillaume Lemaitre.
- Change ratio such that it can adapt to multi-class problems. #290 by Guillaume Lemaitre.
- Deprecation of the use of min_c_ in datasets.make_imbalance. #312 by Guillaume Lemaitre.
- Deprecation of the use of float in datasets.make_imbalance for the ratio parameter. #290 by Guillaume Lemaitre.
- Deprecate the use of float as ratio in favor of dictionary, string, or callable. #290 by Guillaume Lemaitre.
Version 0.2¶
Changelog¶
- Fixed a bug in under_sampling.NearMiss which was not picking the right samples during under-sampling for method 3. By Guillaume Lemaitre.
- Fixed a bug in ensemble.EasyEnsemble: correction of the random_state generation. By Guillaume Lemaitre and Christos Aridas.
- Fixed a bug in under_sampling.RepeatedEditedNearestNeighbours: added an additional stopping criterion to avoid that the minority class becomes a majority class or that a class disappears. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.AllKNN: added stopping criteria to avoid that the minority class becomes a majority class or that a class disappears. By Guillaume Lemaitre.
- Fixed a bug in under_sampling.CondensedNeareastNeigbour: correction of the list of indices returned. By Guillaume Lemaitre.
- Fixed a bug in ensemble.BalanceCascade: solved the issue of obtaining a single array if desired. By Guillaume Lemaitre.
- Fixed a bug in pipeline.Pipeline: solved embedding a Pipeline in another Pipeline. #231 by Christos Aridas.
- Fixed a bug in pipeline.Pipeline: solved the issue of putting two samplers in the same Pipeline. #188 by Christos Aridas.
- Fixed a bug in under_sampling.CondensedNeareastNeigbour: correction of the shape of sel_x when only one sample is selected. By Aliaksei Halachkin.
- Fixed a bug in under_sampling.NeighbourhoodCleaningRule: selecting neighbours instead of the minority class misclassified samples. #230 by Aleksandr Loskutov.
- Fixed a bug in over_sampling.ADASYN: correction of the creation of a new sample so that the new sample lies between the minority sample and the nearest neighbour. #235 by Rafael Wampfler.
- Added the AllKNN under-sampling technique. By Dayvid Oliveira.
- Added a metrics module implementing some specific scoring functions for the problem of balancing. #204 by Guillaume Lemaitre and Christos Aridas.
- Added support for bumpversion. By Guillaume Lemaitre.
- Validate the type of target in binary samplers. A warning is raised for the moment. By Guillaume Lemaitre and Christos Aridas.
- Change from cross_validation module to model_selection module for sklearn deprecation cycle. By Dayvid Oliveira and Christos Aridas.
- size_ngh has been deprecated in combine.SMOTEENN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.EditedNearestNeighbors. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.CondensedNeareastNeigbour. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.OneSidedSelection. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.NeighbourhoodCleaningRule. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.RepeatedEditedNearestNeighbours. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- size_ngh has been deprecated in under_sampling.AllKNN. Use n_neighbors instead. By Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- Two base classes, BaseBinaryclassSampler and BaseMulticlassSampler, have been created to handle the target type and raise a warning in case of abnormality. By Guillaume Lemaitre and Christos Aridas.
- Move random_state to be assigned in the SamplerMixin initialization. By Guillaume Lemaitre.
- Provide estimators instead of parameters in combine.SMOTEENN and combine.SMOTETomek. Therefore, the list of parameters has been deprecated. By Guillaume Lemaitre and Christos Aridas.
- k has been deprecated in over_sampling.ADASYN. Use n_neighbors instead. #183 by Guillaume Lemaitre.
- k and m have been deprecated in over_sampling.SMOTE. Use k_neighbors and m_neighbors instead. #182 by Guillaume Lemaitre.
- n_neighbors accepts KNeighborsMixin-based objects for under_sampling.EditedNearestNeighbors, under_sampling.CondensedNeareastNeigbour, under_sampling.NeighbourhoodCleaningRule, under_sampling.RepeatedEditedNearestNeighbours, and under_sampling.AllKNN. #109 by Guillaume Lemaitre.
- Replace some remaining UnbalancedDataset occurrences. By Francois Magimel.
- Added doctest in the documentation. By Guillaume Lemaitre.
Version 0.1¶
Changelog¶
- First release of the stable API. By Fernando Nogueira, Guillaume Lemaitre, Christos Aridas, and Dayvid Oliveira.
- Under-sampling
- Random majority under-sampling with replacement
- Extraction of majority-minority Tomek links
- Under-sampling with Cluster Centroids
- NearMiss-(1 & 2 & 3)
- Condensed Nearest Neighbour
- One-Sided Selection
- Neighbourhood Cleaning Rule
- Edited Nearest Neighbours
- Instance Hardness Threshold
- Repeated Edited Nearest Neighbours
- Over-sampling
- Random minority over-sampling with replacement
- SMOTE - Synthetic Minority Over-sampling Technique
- bSMOTE(1 & 2) - Borderline SMOTE of types 1 and 2
- SVM SMOTE - Support Vectors SMOTE
- ADASYN - Adaptive synthetic sampling approach for imbalanced learning
- Over-sampling followed by under-sampling
- SMOTE + Tomek links
- SMOTE + ENN
- Ensemble sampling
- EasyEnsemble
- BalanceCascade