6. Miscellaneous samplers
6.1. Custom samplers
A fully customized sampler, FunctionSampler, is available in imbalanced-learn so that you can quickly prototype your own sampler by defining a single function. Additional parameters can be passed to this function through the kw_args parameter, which accepts a dictionary. The following example illustrates how to retain the first 10 elements of the arrays X and y:
>>> import numpy as np
>>> from imblearn import FunctionSampler
>>> from sklearn.datasets import make_classification
>>> X, y = make_classification(n_samples=5000, n_features=2, n_informative=2,
... n_redundant=0, n_repeated=0, n_classes=3,
... n_clusters_per_class=1,
... weights=[0.01, 0.05, 0.94],
... class_sep=0.8, random_state=0)
>>> def func(X, y):
...     return X[:10], y[:10]
>>> sampler = FunctionSampler(func=func)
>>> X_res, y_res = sampler.fit_resample(X, y)
>>> np.all(X_res == X[:10])
True
>>> np.all(y_res == y[:10])
True
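Note that kw_args is not exercised in this example. As a minimal sketch, the same function can be parametrized so that the number of samples to retain is passed through kw_args (the parameter name n below is purely illustrative):
>>> def func(X, y, n=10):
...     return X[:n], y[:n]
>>> sampler = FunctionSampler(func=func, kw_args={'n': 10})
>>> X_res, y_res = sampler.fit_resample(X, y)
>>> np.all(y_res == y[:10])
True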
We illustrate the use of such a sampler to implement an outlier rejection estimator which can be used easily within an imblearn.pipeline.Pipeline; see the example Customized sampler to implement an outlier rejection estimator.
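As a minimal sketch of the idea developed in that example, a FunctionSampler can wrap an outlier detector such as sklearn.ensemble.IsolationForest so that samples flagged as outliers are dropped before the final classifier is fitted; the resampling step only takes place when the pipeline is fitted:
>>> from sklearn.ensemble import IsolationForest
>>> from sklearn.linear_model import LogisticRegression
>>> from imblearn.pipeline import make_pipeline
>>> def outlier_rejection(X, y):
...     # keep only the samples predicted as inliers (+1)
...     model = IsolationForest(random_state=0)
...     y_pred = model.fit_predict(X)
...     return X[y_pred == 1], y[y_pred == 1]
>>> pipe = make_pipeline(FunctionSampler(func=outlier_rejection),
...                      LogisticRegression(random_state=0))
>>> _ = pipe.fit(X, y)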
6.2. Custom generators
Imbalanced-learn provides specific generators for TensorFlow and Keras which will generate balanced mini-batches.
6.2.1. TensorFlow generator
The imblearn.tensorflow.balanced_batch_generator function allows generating balanced mini-batches using an imbalanced-learn sampler which returns indices:
>>> X = X.astype(np.float32)
>>> from imblearn.under_sampling import RandomUnderSampler
>>> from imblearn.tensorflow import balanced_batch_generator
>>> training_generator, steps_per_epoch = balanced_batch_generator(
... X, y, sample_weight=None, sampler=RandomUnderSampler(),
... batch_size=10, random_state=42)
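Before wiring the generator into a model, one can check that it yields mini-batches of the expected shape; this sanity check is purely illustrative and not part of the original example:
>>> X_batch, y_batch = next(training_generator)
>>> X_batch.shape
(10, 2)
>>> y_batch.shape
(10,)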
The generator and steps_per_epoch are used during the training of the TensorFlow model. We will illustrate how to use this generator. First, we can define a logistic regression model which will be optimized by gradient descent:
>>> learning_rate, epochs = 0.01, 10
>>> input_size, output_size = X.shape[1], 3
>>> import tensorflow as tf
>>> def init_weights(shape):
...     return tf.Variable(tf.random_normal(shape, stddev=0.01))
>>> def accuracy(y_true, y_pred):
...     return np.mean(np.argmax(y_pred, axis=1) == y_true)
>>> # input and output
>>> data = tf.placeholder("float32", shape=[None, input_size])
>>> targets = tf.placeholder("int32", shape=[None])
>>> # build the model and weights
>>> W = init_weights([input_size, output_size])
>>> b = init_weights([output_size])
>>> out_act = tf.nn.sigmoid(tf.matmul(data, W) + b)
>>> # build the loss, predict, and train operator
>>> cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
... logits=out_act, labels=targets)
>>> loss = tf.reduce_sum(cross_entropy)
>>> optimizer = tf.train.GradientDescentOptimizer(learning_rate)
>>> train_op = optimizer.minimize(loss)
>>> predict = tf.nn.softmax(out_act)
>>> # Initialization of all variables in the graph
>>> init = tf.global_variables_initializer()
Once initialized, the model is trained by iterating on balanced mini-batches of data and minimizing the loss previously defined:
>>> with tf.Session() as sess:
...     print('Starting training')
...     sess.run(init)
...     for e in range(epochs):
...         for i in range(steps_per_epoch):
...             X_batch, y_batch = next(training_generator)
...             sess.run([train_op, loss],
...                      feed_dict={data: X_batch, targets: y_batch})
...         # at the end of each epoch, compute the accuracy on the training set
...         predicts_train = sess.run(predict, feed_dict={data: X})
...         print("epoch: {} train accuracy: {:.3f}"
...               .format(e, accuracy(y, predicts_train)))
Starting training
[...]
6.2.2. Keras generator
Keras provides a higher-level API in which a model can be defined and trained by calling the fit_generator method. To illustrate, we will define a logistic regression model:
>>> import keras
>>> y = keras.utils.to_categorical(y, 3)
>>> model = keras.Sequential()
>>> model.add(keras.layers.Dense(y.shape[1], input_dim=X.shape[1],
... activation='softmax'))
>>> model.compile(optimizer='sgd', loss='categorical_crossentropy',
... metrics=['accuracy'])
imblearn.keras.balanced_batch_generator creates a generator of balanced mini-batches, together with the number of mini-batches that will be generated per epoch:
>>> from imblearn.keras import balanced_batch_generator
>>> training_generator, steps_per_epoch = balanced_batch_generator(
... X, y, sampler=RandomUnderSampler(), batch_size=10, random_state=42)
Then, fit_generator can be called, passing the generator and the number of steps per epoch:
>>> callback_history = model.fit_generator(generator=training_generator,
... steps_per_epoch=steps_per_epoch,
... epochs=10, verbose=0)
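The trained model can then be evaluated as any other Keras model; for instance, the following usage sketch computes the loss and accuracy on the (one-hot encoded) training data:
>>> loss, acc = model.evaluate(X, y, verbose=0)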
The second possibility is to use imblearn.keras.BalancedBatchGenerator. In this case, only an instance of this class needs to be passed to fit_generator:
>>> from imblearn.keras import BalancedBatchGenerator
>>> training_generator = BalancedBatchGenerator(
... X, y, sampler=RandomUnderSampler(), batch_size=10, random_state=42)
>>> callback_history = model.fit_generator(generator=training_generator,
... epochs=10, verbose=0)
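Since BalancedBatchGenerator is a Keras Sequence, its length corresponds to the number of balanced mini-batches generated per epoch, and individual batches can be indexed directly; again, this check is purely illustrative:
>>> len(training_generator) == steps_per_epoch
True
>>> X_batch, y_batch = training_generator[0]
>>> X_batch.shape
(10, 2)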