quantization_config Module

ThresholdSelectionMethod

Enum to select a method for quantization threshold selection:

class model_compression_toolkit.ThresholdSelectionMethod(value)

Method for quantization threshold selection:

NOCLIPPING - Use the observed min/max values as thresholds.

MSE - Select the threshold that minimizes the mean squared quantization error.

MAE - Select the threshold that minimizes the mean absolute quantization error.

KL - Use KL-divergence to keep the quantized signal's distribution as similar as possible to the original.

Lp - Select the threshold that minimizes the Lp-norm of the quantization error.
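To build intuition for the difference between NOCLIPPING and an error-minimizing method such as MSE, the sketch below implements both on a toy tensor. This is an illustrative NumPy sketch, not the library's implementation: the helper names (`noclipping_threshold`, `mse_threshold`, `quantize_symmetric_uniform`) and the simple grid search are assumptions made here for demonstration.

```python
import numpy as np

def quantize_symmetric_uniform(x, threshold, n_bits=8):
    # Symmetric uniform quantization of x onto a grid spanning [-threshold, threshold).
    delta = threshold / (2 ** (n_bits - 1))
    q = np.clip(np.round(x / delta), -2 ** (n_bits - 1), 2 ** (n_bits - 1) - 1)
    return q * delta

def noclipping_threshold(x):
    # NOCLIPPING: the threshold covers the full observed range, so no value is clipped.
    return np.max(np.abs(x))

def mse_threshold(x, n_bits=8, n_steps=50):
    # MSE (toy version): grid-search thresholds up to the max and keep the one
    # that minimizes the mean squared quantization error. Clipping a few
    # outliers can buy a finer grid for the bulk of the distribution.
    max_abs = np.max(np.abs(x))
    best_t, best_err = max_abs, np.inf
    for t in np.linspace(max_abs / n_steps, max_abs, n_steps):
        err = np.mean((x - quantize_symmetric_uniform(x, t, n_bits)) ** 2)
        if err < best_err:
            best_t, best_err = t, err
    return best_t

rng = np.random.default_rng(0)
x = rng.normal(size=10_000)   # bell-shaped tensor: most mass near 0, rare outliers
t_nc = noclipping_threshold(x)
t_mse = mse_threshold(x)
```

On such a distribution the MSE search typically selects a tighter threshold than NOCLIPPING, trading a little clipping error for lower rounding error overall.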


QuantizationConfig

Class to configure the quantization process of the model:

class model_compression_toolkit.QuantizationConfig(activation_threshold_method=ThresholdSelectionMethod.MSE, weights_threshold_method=ThresholdSelectionMethod.MSE, activation_quantization_method=QuantizationMethod.SYMMETRIC_UNIFORM, weights_quantization_method=QuantizationMethod.SYMMETRIC_UNIFORM, activation_n_bits=8, weights_n_bits=8, relu_unbound_correction=False, weights_bias_correction=True, weights_per_channel_threshold=True, input_scaling=False, enable_weights_quantization=True, enable_activation_quantization=True, shift_negative_activation_correction=False, activation_channel_equalization=False, z_threshold=math.inf, min_threshold=MIN_THRESHOLD, l_p_value=2, shift_negative_ratio=0.25, shift_negative_threshold_recalculation=False)

Class that wraps the parameters according to which the library quantizes the input model.

Parameters
  • activation_threshold_method (ThresholdSelectionMethod) – Which method to use from ThresholdSelectionMethod for activation quantization threshold selection.

  • weights_threshold_method (ThresholdSelectionMethod) – Which method to use from ThresholdSelectionMethod for weights quantization threshold selection.

  • activation_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for activation quantization.

  • weights_quantization_method (QuantizationMethod) – Which method to use from QuantizationMethod for weights quantization.

  • activation_n_bits (int) – Number of bits to quantize the activations.

  • weights_n_bits (int) – Number of bits to quantize the coefficients.

  • relu_unbound_correction (bool) – Whether to use relu unbound scaling correction or not.

  • weights_bias_correction (bool) – Whether to use weights bias correction or not.

  • weights_per_channel_threshold (bool) – Whether to quantize the weights per-channel or not (per-tensor).

  • input_scaling (bool) – Whether to use input scaling or not.

  • enable_weights_quantization (bool) – Whether to quantize the model weights or not.

  • enable_activation_quantization (bool) – Whether to quantize the model activations or not.

  • shift_negative_activation_correction (bool) – Whether to use shifting negative activation correction or not.

  • activation_channel_equalization (bool) – Whether to use activation channel equalization correction or not.

  • z_threshold (float) – Value of z score for outliers removal.

  • min_threshold (float) – Minimum threshold to use during thresholds selection.

  • l_p_value (int) – The p value of L_p norm threshold selection.

  • shift_negative_ratio (float) – Ratio between the minimal negative value of a non-linearity's output and its activation threshold; above this ratio, shifting of negative activations is applied (if enabled).

  • shift_negative_threshold_recalculation (bool) – Whether or not to recompute the threshold after shifting negative activation.
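The effect of weights_per_channel_threshold can be illustrated with a small sketch. This is an assumption-laden toy example, not the library's code: the kernel shape, the channel axis, and the NOCLIPPING-style max-based thresholds are all chosen here for demonstration (MCT's actual channel-axis handling depends on the framework and layer type).

```python
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical conv-kernel weights shaped (out_channels, in_channels, kh, kw),
# where output channels have very different magnitudes.
scales = np.array([0.1, 1.0, 5.0, 0.5]).reshape(4, 1, 1, 1)
w = rng.normal(size=(4, 3, 3, 3)) * scales

# Per-tensor: a single threshold for the whole kernel. Small-magnitude
# channels are forced onto the coarse grid dictated by the largest channel.
t_per_tensor = np.max(np.abs(w))

# Per-channel: one threshold per output channel, so each channel's grid
# matches its own dynamic range.
t_per_channel = np.max(np.abs(w), axis=(1, 2, 3))
```

Per-channel thresholds usually reduce weight quantization error at negligible cost, which is why the option defaults to True.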

Examples

One may create a quantization configuration to quantize a model according to. For example, to quantize a model using 6 bits for activations and 7 bits for weights, with symmetric uniform quantization for both weights and activations, MSE for weights threshold selection, NOCLIPPING for activation threshold selection, relu_unbound_correction and weights_bias_correction enabled, and per-channel weights quantization, one can instantiate a quantization configuration:

>>> qc = QuantizationConfig(activation_n_bits=6,
...                         weights_n_bits=7,
...                         activation_quantization_method=QuantizationMethod.SYMMETRIC_UNIFORM,
...                         weights_quantization_method=QuantizationMethod.SYMMETRIC_UNIFORM,
...                         weights_threshold_method=ThresholdSelectionMethod.MSE,
...                         activation_threshold_method=ThresholdSelectionMethod.NOCLIPPING,
...                         relu_unbound_correction=True,
...                         weights_bias_correction=True,
...                         weights_per_channel_threshold=True)

The QuantizationConfig instance can then be passed to keras_post_training_quantization().