baselines package
Submodules
baselines.basemethod module
- class baselines.basemethod.BaseMethod(*args, **kwargs)[source]
Bases:
ABC
Abstract base class for learning-to-defer methods
- abstract fit(*args, **kwargs)[source]
This function should fit the model; after it returns, the model should be ready for evaluation.
- fit_hyperparam(*args, **kwargs)[source]
This is an optional method that fits and optimizes hyperparameters over a validation set
- abstract test(dataloader)[source]
This function should return a dict with the following keys:
‘defers’ – deferred binary predictions
‘preds’ – classifier predictions
‘labels’ – labels
‘hum_preds’ – human predictions
‘rej_score’ – a real-valued score for the rejector; the higher the score, the more likely the point is to be rejected
‘class_probs’ – classifier probability for each class (can be scores as well)
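As an illustration of how the dict returned by test() might be consumed, here is a minimal sketch that computes system accuracy and coverage. The helper name `system_metrics`, the plain-list inputs, and the convention that a truthy ‘defers’ entry means "defer to the human" are assumptions made for this example, not part of the package.

```python
def system_metrics(results):
    """Summarize a result dict that uses the keys documented for test().

    Assumption: a truthy entry in 'defers' means the point is handed to the human.
    """
    defers = results["defers"]
    preds = results["preds"]
    labels = results["labels"]
    hum_preds = results["hum_preds"]
    n = len(labels)

    # Final prediction: the human's when deferred, the classifier's otherwise.
    final = [h if d else p for d, p, h in zip(defers, preds, hum_preds)]

    # Fraction of points where the combined human-classifier system is correct.
    system_acc = sum(f == y for f, y in zip(final, labels)) / n
    # Fraction of points the classifier handles itself.
    coverage = sum(1 for d in defers if not d) / n
    return {"system_acc": system_acc, "coverage": coverage}


# Toy data, invented for illustration only.
example = {
    "defers": [0, 1, 0, 1],
    "preds": [2, 0, 1, 1],
    "labels": [2, 1, 0, 1],
    "hum_preds": [0, 1, 0, 1],
}
metrics = system_metrics(example)
```

With real outputs the values are likely to be arrays or tensors rather than lists, so the arithmetic would be vectorized instead.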
- class baselines.basemethod.BaseSurrogateMethod(alpha, plotting_interval, model, device)[source]
Bases:
BaseMethod
Abstract base class for learning-to-defer methods based on a surrogate model
- fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, scheduler=None, verbose=True, test_interval=5)[source]
This function should fit the model; after it returns, the model should be ready for evaluation.
baselines.compare_confidence module
- class baselines.compare_confidence.CompareConfidence(model_class, model_expert, device, plotting_interval=100)[source]
Bases:
BaseMethod
Trains the classifier independently with cross-entropy loss, and trains the expert model to predict whether the human prediction equals the ground truth. Then, at each test point, the confidence of the classifier is compared with the confidence of the expert model.
- fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, scheduler=None, verbose=True, test_interval=5)[source]
Fits the classifier and the expert model.
- Parameters
dataloader_train (_type_) – train dataloader
dataloader_val (_type_) – val dataloader
dataloader_test (_type_) – test dataloader
epochs (_type_) – training epochs
optimizer (_type_) – optimizer function
lr (_type_) – learning rate
scheduler (_type_, optional) – scheduler function. Defaults to None.
verbose (bool, optional) – _description_. Defaults to True.
test_interval (int, optional) – _description_. Defaults to 5.
- Returns
metrics on the test set
- Return type
dict
- fit_epoch_class(dataloader, optimizer, verbose=True, epoch=1)[source]
Train the classifier for a single epoch.
- Parameters
dataloader (dataloader) – _description_
optimizer (optimizer) – _description_
verbose (bool, optional) – whether to print the loss. Defaults to True.
epoch (int, optional) – _description_. Defaults to 1.
- fit_epoch_expert(dataloader, optimizer, verbose=True, epoch=1)[source]
Train the expert model for a single epoch.
- Parameters
dataloader (_type_) – _description_
optimizer (_type_) – _description_
verbose (bool, optional) – _description_. Defaults to True.
epoch (int, optional) – _description_. Defaults to 1.
- test(dataloader)[source]
This function should return a dict with the following keys:
‘defers’ – deferred binary predictions
‘preds’ – classifier predictions
‘labels’ – labels
‘hum_preds’ – human predictions
‘rej_score’ – a real-valued score for the rejector; the higher the score, the more likely the point is to be rejected
‘class_probs’ – classifier probability for each class (can be scores as well)
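The comparison step described for CompareConfidence can be sketched as follows. This is only an illustration under stated assumptions: the function name `compare_confidence` is hypothetical, the classifier confidence is taken to be its maximum class probability, the expert model is assumed to output a probability that the human is correct, and deferral is assumed to happen when the latter is strictly larger.

```python
def compare_confidence(class_probs, expert_correct_probs):
    """Decide per point whether to defer, by comparing two confidences.

    class_probs: list of per-class probability lists from the classifier.
    expert_correct_probs: list of estimated probabilities that the human is right.
    Both the input layout and the strict '>' rule are assumptions of this sketch.
    """
    results = {"defers": [], "preds": [], "rej_score": []}
    for probs, p_human in zip(class_probs, expert_correct_probs):
        class_conf = max(probs)                      # classifier confidence
        results["preds"].append(probs.index(class_conf))
        score = p_human - class_conf                 # higher means more likely to defer
        results["rej_score"].append(score)
        results["defers"].append(int(score > 0))
    return results


# Toy inputs, invented for illustration only.
out = compare_confidence([[0.9, 0.1], [0.4, 0.6]], [0.5, 0.8])
```

Using the difference of the two confidences as ‘rej_score’ keeps the score monotone in the deferral decision, which matches the "higher means more likely to be rejected" convention of the result dict.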
baselines.differentiable_triage module
- class baselines.differentiable_triage.DifferentiableTriage(model_class, model_rejector, device, weight_low=0.0, strategy='human_error', plotting_interval=100)[source]
Bases:
BaseMethod
- find_machine_samples(model_outputs, data_y, hum_preds)[source]
- Parameters
model_outputs (_type_) – _description_
data_y (_type_) – _description_
hum_preds (_type_) – _description_
- Returns
binary array of size equal to the input indicating whether to train or not on each point
- Return type
array
- fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]
This function should fit the model; after it returns, the model should be ready for evaluation.
- fit_epoch_class(dataloader, optimizer, verbose=True, epoch=1)[source]
Train the classifier for a single epoch.
- Parameters
dataloader (dataloader) – _description_
optimizer (optimizer) – _description_
verbose (bool, optional) – whether to print the loss. Defaults to True.
epoch (int, optional) – _description_. Defaults to 1.
- fit_epoch_class_triage(dataloader, optimizer, verbose=True, epoch=1)[source]
Fit the classifier model for one epoch
- fit_epoch_rejector(dataloader, optimizer, verbose=True, epoch=1)[source]
Fit the rejector for one epoch
- fit_hyperparam(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]
This is an optional method that fits and optimizes hyperparameters over a validation set
- test(dataloader)[source]
This function should return a dict with the following keys:
‘defers’ – deferred binary predictions
‘preds’ – classifier predictions
‘labels’ – labels
‘hum_preds’ – human predictions
‘rej_score’ – a real-valued score for the rejector; the higher the score, the more likely the point is to be rejected
‘class_probs’ – classifier probability for each class (can be scores as well)
baselines.lce_surrogate module
- class baselines.lce_surrogate.LceSurrogate(alpha, plotting_interval, model, device)[source]
Bases:
BaseSurrogateMethod
baselines.mix_of_exps module
- class baselines.mix_of_exps.MixtureOfExperts(model, device, plotting_interval=100)[source]
Bases:
BaseMethod
Implementation of Madras et al., 2018
- fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]
This function should fit the model; after it returns, the model should be ready for evaluation.
- mixtures_of_experts_loss(outputs, human_is_correct, labels)[source]
Implementation of the Mixture of Experts loss from Madras et al., 2018
- test(dataloader)[source]
This function should return a dict with the following keys:
‘defers’ – deferred binary predictions
‘preds’ – classifier predictions
‘labels’ – labels
‘hum_preds’ – human predictions
‘rej_score’ – a real-valued score for the rejector; the higher the score, the more likely the point is to be rejected
‘class_probs’ – classifier probability for each class (can be scores as well)
baselines.one_v_all module
- class baselines.one_v_all.OVASurrogate(alpha, plotting_interval, model, device)[source]
Bases:
BaseMethod
OvA surrogate method from “Calibrated Learning to Defer with One-vs-All Classifiers”: https://proceedings.mlr.press/v162/verma22c/verma22c.pdf
- fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]
This function should fit the model; after it returns, the model should be ready for evaluation.
- fit_epoch(dataloader, optimizer, verbose=True, epoch=1)[source]
Fit the model for one epoch.
- Parameters
dataloader – dataloader
optimizer – optimizer
verbose – print loss
epoch – epoch number
- ova_loss(outputs, m, labels)[source]
- Parameters
outputs – network outputs
m – cost of deferring to expert, cost of classifier predicting (hum_preds == target)
labels – target
- test(dataloader)[source]
This function should return a dict with the following keys:
‘defers’ – deferred binary predictions
‘preds’ – classifier predictions
‘labels’ – labels
‘hum_preds’ – human predictions
‘rej_score’ – a real-valued score for the rejector; the higher the score, the more likely the point is to be rejected
‘class_probs’ – classifier probability for each class (can be scores as well)
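For readers unfamiliar with the one-vs-all surrogate, the following per-example sketch shows the general shape of such a loss with the logistic loss as the binary surrogate. It is not the package's `ova_loss`, which takes batched network outputs: the function names, the output layout (K class logits followed by one deferral logit), and the encoding of `m` as 1 when the human is correct and 0 otherwise are assumptions of this example.

```python
import math


def logistic(z):
    """Logistic loss log(1 + exp(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def ova_loss_single(outputs, m, label):
    """One-vs-all surrogate for a single example (illustrative sketch).

    outputs: K class logits followed by one deferral logit (assumed layout).
    m: 1 if the human prediction equals the label, else 0 (assumed encoding).
    label: index of the true class.
    """
    *class_logits, defer_logit = outputs
    # Push the true class logit up and every other class logit down.
    loss = logistic(class_logits[label])
    loss += sum(logistic(-g) for k, g in enumerate(class_logits) if k != label)
    # Push the deferral logit down by default ...
    loss += logistic(-defer_logit)
    # ... but up instead when the human is correct (m == 1).
    loss += m * (logistic(defer_logit) - logistic(-defer_logit))
    return loss


# Toy logits, invented for illustration only.
loss_neutral = ova_loss_single([0.0, 0.0, 0.0], m=1, label=0)
loss_defer_good = ova_loss_single([0.0, 0.0, 5.0], m=1, label=0)
loss_defer_bad = ova_loss_single([0.0, 0.0, 5.0], m=0, label=0)
```

A large deferral logit is rewarded when the human is correct and penalized when the human is wrong, which is the behavior the deferral output is meant to learn.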
baselines.selective_prediction module
- class baselines.selective_prediction.SelectivePrediction(model_class, device, plotting_interval=100)[source]
Bases:
BaseMethod
Selective prediction method: trains the classifier on all data and defers by thresholding the classifier confidence (maximum class probability).
- fit(dataloader_train, dataloader_val, dataloader_test, epochs, optimizer, lr, verbose=True, test_interval=5, scheduler=None)[source]
This function should fit the model; after it returns, the model should be ready for evaluation.
- set_optimal_threshold(dataloader)[source]
Set the threshold to maximize system accuracy on the validation set.
- Parameters
dataloader (_type_) – dataloader for the validation set
- test(dataloader)[source]
This function should return a dict with the following keys:
‘defers’ – deferred binary predictions
‘preds’ – classifier predictions
‘labels’ – labels
‘hum_preds’ – human predictions
‘rej_score’ – a real-valued score for the rejector; the higher the score, the more likely the point is to be rejected
‘class_probs’ – classifier probability for each class (can be scores as well)
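The threshold selection described for set_optimal_threshold can be sketched as a simple search over candidate thresholds. This is an assumption-laden illustration rather than the package's code: the function name `best_threshold`, the plain-list inputs, the rule "defer when the maximum class probability is below the threshold", and the choice of candidate thresholds are all hypothetical.

```python
def best_threshold(max_probs, preds, labels, hum_preds):
    """Pick the confidence threshold that maximizes system accuracy.

    A point is deferred to the human when its classifier confidence is
    strictly below the threshold (assumed rule for this sketch).
    """
    n = len(labels)

    def system_accuracy(threshold):
        correct = 0
        for conf, p, y, h in zip(max_probs, preds, labels, hum_preds):
            final = h if conf < threshold else p
            correct += int(final == y)
        return correct / n

    # Candidates: every observed confidence, plus +inf to allow deferring everything.
    # The smallest observed confidence corresponds to never deferring.
    candidates = sorted(set(max_probs)) + [float("inf")]
    # max() keeps the first maximum, so ties go to the lowest threshold.
    threshold = max(candidates, key=system_accuracy)
    return threshold, system_accuracy(threshold)


# Toy validation data, invented for illustration only.
threshold, acc = best_threshold(
    max_probs=[0.95, 0.55, 0.6, 0.8],
    preds=[1, 0, 1, 0],
    labels=[1, 1, 0, 0],
    hum_preds=[0, 1, 0, 1],
)
```

Searching only over observed confidences is sufficient here because system accuracy can change only when the threshold crosses one of them.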