baselines package

Submodules

baselines.basemethod module

class baselines.basemethod.BaseMethod(*args, **kwargs)[source]

Bases: ABC

Abstract base class for learning-to-defer methods

abstract fit(*args, **kwargs)[source]

Fit the model; after fitting, the model should be ready for evaluation

fit_hyperparam(*args, **kwargs)[source]

Optional method that fits the model and tunes its hyperparameters on a validation set

abstract test(dataloader)[source]

Return a dict with the following keys:
  • ‘defers’: binary deferral decisions
  • ‘preds’: classifier predictions
  • ‘labels’: ground-truth labels
  • ‘hum_preds’: human predictions
  • ‘rej_score’: real-valued rejector score; the higher, the more likely the point is to be rejected
  • ‘class_probs’: classifier probability for each class (can be scores as well)
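As an illustration, the returned dict might look like the following (illustrative numpy arrays; the actual implementation may return torch tensors or lists):

```python
import numpy as np

# Hypothetical output of test() for a 3-class problem with 4 examples
results = {
    "defers": np.array([0, 1, 0, 1]),               # 1 = defer to human
    "preds": np.array([2, 0, 1, 2]),                # classifier predictions
    "labels": np.array([2, 0, 1, 0]),               # ground-truth labels
    "hum_preds": np.array([2, 0, 2, 0]),            # human predictions
    "rej_score": np.array([-0.3, 0.8, -1.2, 0.5]),  # higher => more likely deferred
    "class_probs": np.array([[0.1, 0.2, 0.7],
                             [0.6, 0.3, 0.1],
                             [0.2, 0.5, 0.3],
                             [0.4, 0.4, 0.2]]),
}

# System prediction: human where deferred, classifier otherwise
system_preds = np.where(results["defers"] == 1,
                        results["hum_preds"], results["preds"])
system_acc = (system_preds == results["labels"]).mean()
```

Downstream metrics (e.g. system accuracy, as above) can be computed directly from these keys.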

class baselines.basemethod.BaseSurrogateMethod(alpha, plotting_interval, model, device)[source]

Bases: BaseMethod

Abstract base class for learning-to-defer methods based on a surrogate loss

fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, scheduler=None, verbose=True, test_interval=5)[source]

Fit the model; after fitting, the model should be ready for evaluation

fit_epoch(dataloader, optimizer, verbose=False, epoch=1)[source]

Fit the model for one epoch.
  • model: model to be trained
  • dataloader: dataloader
  • optimizer: optimizer
  • verbose: whether to print the loss
  • epoch: epoch number

abstract surrogate_loss_function(outputs, hum_preds, data_y)[source]

Surrogate loss function, to be implemented by subclasses

test(dataloader)[source]

Test the model on the given dataloader.

baselines.compare_confidence module

class baselines.compare_confidence.CompareConfidence(model_class, model_expert, device, plotting_interval=100)[source]

Bases: BaseMethod

Trains the classifier independently with cross-entropy loss, and an expert model on whether the human prediction equals the ground truth. At each test point, the confidence of the classifier is compared with that of the expert model to decide whether to defer.
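A minimal numpy sketch of this comparison rule (hypothetical helper, not the package API; it assumes the expert model is a binary classifier whose class 1 means "human is correct", while the actual implementation is PyTorch-based):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def compare_confidence_defer(class_logits, expert_logits):
    """Defer when the expert model's estimated probability that the human
    is correct exceeds the classifier's top-class confidence."""
    class_conf = softmax(class_logits).max(axis=-1)
    human_correct_prob = softmax(expert_logits)[:, 1]  # class 1 = "human correct"
    return (human_correct_prob > class_conf).astype(int)

class_logits = np.array([[2.0, 0.1, 0.1],   # confident classifier
                         [0.3, 0.2, 0.1]])  # unsure classifier
expert_logits = np.array([[1.0, 0.0],       # human unlikely to be correct
                          [0.0, 2.0]])      # human likely to be correct
defers = compare_confidence_defer(class_logits, expert_logits)
```

Here the confident classifier keeps the first point, while the second point, where the expert model is more confident, is deferred.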

fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, scheduler=None, verbose=True, test_interval=5)[source]

fits classifier and expert model

Parameters
  • dataloader_train (_type_) – train dataloader

  • dataloader_val (_type_) – val dataloader

  • dataloader_test (_type_) – test dataloader

  • epochs (_type_) – training epochs

  • optimizer (_type_) – optimizer function

  • lr (_type_) – learning rate

  • scheduler (_type_, optional) – scheduler function. Defaults to None.

  • verbose (bool, optional) – whether to print progress. Defaults to True.

  • test_interval (int, optional) – epochs between evaluations on the test set. Defaults to 5.

Returns

metrics on the test set

Return type

dict

fit_epoch_class(dataloader, optimizer, verbose=True, epoch=1)[source]

Train the classifier for a single epoch.

Parameters
  • dataloader (dataloader) – dataloader
  • optimizer (optimizer) – optimizer
  • verbose (bool, optional) – whether to print the loss. Defaults to True.
  • epoch (int, optional) – epoch number. Defaults to 1.

fit_epoch_expert(dataloader, optimizer, verbose=True, epoch=1)[source]

train expert model for single epoch

Parameters
  • dataloader (_type_) – dataloader

  • optimizer (_type_) – optimizer

  • verbose (bool, optional) – whether to print the loss. Defaults to True.

  • epoch (int, optional) – epoch number. Defaults to 1.

test(dataloader)[source]

Return a dict with the following keys:
  • ‘defers’: binary deferral decisions
  • ‘preds’: classifier predictions
  • ‘labels’: ground-truth labels
  • ‘hum_preds’: human predictions
  • ‘rej_score’: real-valued rejector score; the higher, the more likely the point is to be rejected
  • ‘class_probs’: classifier probability for each class (can be scores as well)

baselines.differentiable_triage module

class baselines.differentiable_triage.DifferentiableTriage(model_class, model_rejector, device, weight_low=0.0, strategy='human_error', plotting_interval=100)[source]

Bases: BaseMethod

find_machine_samples(model_outputs, data_y, hum_preds)[source]
Parameters
  • model_outputs (_type_) – model outputs on the batch

  • data_y (_type_) – ground-truth labels

  • hum_preds (_type_) – human predictions

Returns

binary array of size equal to the input indicating whether to train or not on each point

Return type

array
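A heavily hedged sketch of what the 'human_error' strategy might compute, based only on the class signature (weight_low, strategy='human_error'); the actual rule in the package may differ:

```python
import numpy as np

def find_machine_samples(model_outputs, data_y, hum_preds, weight_low=0.0):
    """Hypothetical sketch: give full training weight to points where the
    classifier is correct or the human is wrong, and weight_low elsewhere."""
    machine_correct = model_outputs.argmax(axis=1) == data_y
    human_correct = hum_preds == data_y
    keep = machine_correct | ~human_correct
    return np.where(keep, 1.0, weight_low)

outputs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
data_y = np.array([0, 0, 1])
hum_preds = np.array([0, 0, 1])
weights = find_machine_samples(outputs, data_y, hum_preds, weight_low=0.0)
```

Only the first point, where the classifier is correct, keeps full weight; on the other two the human is correct and the classifier is not.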

fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]

Fit the model; after fitting, the model should be ready for evaluation

fit_epoch_class(dataloader, optimizer, verbose=True, epoch=1)[source]

Train the classifier for a single epoch.

Parameters
  • dataloader (dataloader) – dataloader
  • optimizer (optimizer) – optimizer
  • verbose (bool, optional) – whether to print the loss. Defaults to True.
  • epoch (int, optional) – epoch number. Defaults to 1.

fit_epoch_class_triage(dataloader, optimizer, verbose=True, epoch=1)[source]

Fit the model for classifier for one epoch

fit_epoch_rejector(dataloader, optimizer, verbose=True, epoch=1)[source]

Fit the rejector for one epoch

fit_hyperparam(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]

Optional method that fits the model and tunes its hyperparameters on a validation set

test(dataloader)[source]

Return a dict with the following keys:
  • ‘defers’: binary deferral decisions
  • ‘preds’: classifier predictions
  • ‘labels’: ground-truth labels
  • ‘hum_preds’: human predictions
  • ‘rej_score’: real-valued rejector score; the higher, the more likely the point is to be rejected
  • ‘class_probs’: classifier probability for each class (can be scores as well)

baselines.differentiable_triage.weighted_cross_entropy_loss(outputs, labels, weights)[source]

Weighted cross-entropy loss.
  • outputs: network outputs (after softmax)
  • labels: target labels
  • weights: per-example weights

Returns: weighted cross-entropy loss as a scalar
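A minimal numpy sketch of this loss, assuming outputs are already softmax probabilities:

```python
import numpy as np

def weighted_cross_entropy_loss(outputs, labels, weights):
    """Per-example weighted cross-entropy (illustrative sketch).
    outputs: softmax probabilities, shape (N, C)
    labels: integer targets, shape (N,)
    weights: per-example weights, shape (N,)"""
    eps = 1e-12
    # probability assigned to the true class of each example
    picked = outputs[np.arange(len(labels)), labels]
    return float(np.mean(-weights * np.log(picked + eps)))

probs = np.array([[0.9, 0.1], [0.2, 0.8]])
labels = np.array([0, 1])
loss = weighted_cross_entropy_loss(probs, labels, np.array([1.0, 0.5]))
```

In DifferentiableTriage the weights would come from find_machine_samples, down-weighting points routed to the human.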

baselines.lce_surrogate module

class baselines.lce_surrogate.LceSurrogate(alpha, plotting_interval, model, device)[source]

Bases: BaseSurrogateMethod

fit_hyperparam(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, scheduler=None, verbose=True, test_interval=5)[source]

Optional method that fits the model and tunes its hyperparameters on a validation set

surrogate_loss_function(outputs, hum_preds, data_y)[source]

Implementation of the L_CE^α surrogate loss
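L_CE^α is the surrogate of Mozannar & Sontag (2020). A rough numpy sketch, under the assumptions that the model produces K+1 logits with the last index reserved for deferral and that α down-weights the classifier term when the human is correct; check the source for the exact form:

```python
import numpy as np

def lce_alpha(outputs, hum_preds, data_y, alpha=1.0):
    """Hypothetical sketch of the L_CE^alpha surrogate.
    outputs: logits of shape (N, K+1); the last column is the defer option."""
    e = np.exp(outputs - outputs.max(axis=1, keepdims=True))
    probs = e / e.sum(axis=1, keepdims=True)
    n = len(data_y)
    human_correct = (hum_preds == data_y).astype(float)
    # classifier term, down-weighted by alpha when the human is correct
    cls_term = -(alpha ** human_correct) * np.log(probs[np.arange(n), data_y] + 1e-12)
    # deferral term, active only when the human is correct
    defer_term = -human_correct * np.log(probs[:, -1] + 1e-12)
    return float(np.mean(cls_term + defer_term))

outputs = np.array([[1.0, 0.0, 0.0]])  # K=2 classes + defer logit
y, hp = np.array([0]), np.array([0])   # human correct on this point
loss_a05 = lce_alpha(outputs, hp, y, alpha=0.5)
loss_a1 = lce_alpha(outputs, hp, y, alpha=1.0)
```

With α < 1, the classifier term shrinks on points the human gets right, so the model is pushed toward deferring there.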

baselines.mix_of_exps module

class baselines.mix_of_exps.MixtureOfExperts(model, device, plotting_interval=100)[source]

Bases: BaseMethod

Implementation of the mixture-of-experts method of Madras et al., 2018

fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]

Fit the model; after fitting, the model should be ready for evaluation

fit_epoch(dataloader, optimizer, verbose=True, epoch=1)[source]

Fit the model for one epoch

mixtures_of_experts_loss(outputs, human_is_correct, labels)[source]

Implementation of the Mixtures of Experts loss from Madras et al., 2018
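A rough sketch of a mixture-of-experts objective in this spirit: a gating probability of deferring mixes the classifier's cross-entropy with the human's 0-1 loss. The exact form in Madras et al. (2018) and in the code may differ:

```python
import numpy as np

def mixtures_of_experts_loss(class_probs, defer_probs, human_is_correct, labels):
    """Illustrative sketch: expected system loss under a gating probability
    of deferring. class_probs: (N, C) softmax outputs; defer_probs: (N,)."""
    n = len(labels)
    eps = 1e-12
    cls_loss = -np.log(class_probs[np.arange(n), labels] + eps)  # classifier CE
    hum_loss = 1.0 - human_is_correct.astype(float)              # human 0-1 loss
    return float(np.mean((1.0 - defer_probs) * cls_loss + defer_probs * hum_loss))

probs = np.array([[0.9, 0.1], [0.5, 0.5]])
labels = np.array([0, 0])
human_ok = np.array([False, True])
defer_p = np.array([0.0, 1.0])  # keep the first point, defer the second
loss = mixtures_of_experts_loss(probs, defer_p, human_ok, labels)
```

Minimizing over both the classifier and the gate trades off the two losses per point.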

test(dataloader)[source]

Return a dict with the following keys:
  • ‘defers’: binary deferral decisions
  • ‘preds’: classifier predictions
  • ‘labels’: ground-truth labels
  • ‘hum_preds’: human predictions
  • ‘rej_score’: real-valued rejector score; the higher, the more likely the point is to be rejected
  • ‘class_probs’: classifier probability for each class (can be scores as well)

baselines.one_v_all module

class baselines.one_v_all.OVASurrogate(alpha, plotting_interval, model, device)[source]

Bases: BaseMethod

One-vs-All (OvA) surrogate method from “Calibrated Learning to Defer with One-vs-All Classifiers”: https://proceedings.mlr.press/v162/verma22c/verma22c.pdf

LogisticLossOVA(outputs, y)[source]

fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]

Fit the model; after fitting, the model should be ready for evaluation

fit_epoch(dataloader, optimizer, verbose=True, epoch=1)[source]

Fit the model for one epoch.
  • model: model to be trained
  • dataloader: dataloader
  • optimizer: optimizer
  • verbose: whether to print the loss
  • epoch: epoch number

ova_loss(outputs, m, labels)[source]

  • outputs: network outputs
  • m: cost of deferring to the expert vs. the classifier predicting (hum_preds == target)
  • labels: target labels
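A hedged numpy sketch of an OvA-style surrogate consistent with this description: a binary logistic loss over K+1 outputs, where the true class is the positive for the first K and the defer output is positive exactly when the human is correct. The paper's exact weighting may differ:

```python
import numpy as np

def logistic_loss(z):
    # numerically stable log(1 + exp(-z))
    return np.log1p(np.exp(-np.abs(z))) + np.maximum(-z, 0.0)

def ova_loss(outputs, m, labels):
    """Illustrative OvA surrogate over K+1 outputs (last = defer)."""
    n, kp1 = outputs.shape
    targets = -np.ones((n, kp1))
    targets[np.arange(n), labels] = 1.0             # true class is positive
    targets[:, -1] = np.where(m == 1, 1.0, -1.0)    # defer positive iff human correct
    return float(np.mean(logistic_loss(targets * outputs).sum(axis=1)))

outputs = np.array([[3.0, -3.0, -3.0]])  # K=2 classes + defer output
loss_hum_wrong = ova_loss(outputs, np.array([0]), np.array([0]))
loss_hum_right = ova_loss(outputs, np.array([1]), np.array([0]))
```

With a low defer logit, the loss is larger when the human is correct, pushing the defer output up on exactly those points.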

test(dataloader)[source]

Return a dict with the following keys:
  • ‘defers’: binary deferral decisions
  • ‘preds’: classifier predictions
  • ‘labels’: ground-truth labels
  • ‘hum_preds’: human predictions
  • ‘rej_score’: real-valued rejector score; the higher, the more likely the point is to be rejected
  • ‘class_probs’: classifier probability for each class (can be scores as well)

baselines.selective_prediction module

class baselines.selective_prediction.SelectivePrediction(model_class, device, plotting_interval=100)[source]

Bases: BaseMethod

Selective prediction method: trains a classifier on all the data, and defers by thresholding the classifier’s confidence (maximum class probability)

fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]

Fit the model; after fitting, the model should be ready for evaluation

fit_epoch_class(dataloader, optimizer, verbose=True, epoch=1)[source]

set_optimal_threshold(dataloader)[source]

Set the deferral threshold that maximizes system accuracy on the validation set

Parameters

dataloader (_type_) – validation-set dataloader
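A numpy sketch of one way such a threshold search can work: sweep candidate thresholds over the validation confidences and keep the one with the best system accuracy (hypothetical helper, not the package API):

```python
import numpy as np

def optimal_threshold(class_probs, preds, labels, hum_preds):
    """Sweep thresholds on classifier confidence; below the threshold the
    human prediction is used, above it the classifier's."""
    conf = class_probs.max(axis=1)
    best_t, best_acc = 0.0, -1.0
    for t in np.unique(np.concatenate([[0.0], conf, [1.0]])):
        system = np.where(conf >= t, preds, hum_preds)
        acc = (system == labels).mean()
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

class_probs = np.array([[0.9, 0.1], [0.55, 0.45], [0.4, 0.6]])
preds = np.array([0, 0, 1])
labels = np.array([0, 1, 1])
hum_preds = np.array([0, 1, 0])
best_t, best_acc = optimal_threshold(class_probs, preds, labels, hum_preds)
```

Only thresholds equal to observed confidences (plus the endpoints) need to be checked, since system accuracy is constant between them.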

test(dataloader)[source]

Return a dict with the following keys:
  • ‘defers’: binary deferral decisions
  • ‘preds’: classifier predictions
  • ‘labels’: ground-truth labels
  • ‘hum_preds’: human predictions
  • ‘rej_score’: real-valued rejector score; the higher, the more likely the point is to be rejected
  • ‘class_probs’: classifier probability for each class (can be scores as well)

Module contents