datasetsdefer package

Submodules

datasetsdefer.basedataset module

class datasetsdefer.basedataset.BaseDataset(*args, **kwargs)[source]

Bases: ABC

Abstract base class for learning-to-defer datasets

abstract generate_data()[source]

generates the data loaders, called on init

it must set the following attributes:

self.data_train_loader, self.data_val_loader, self.data_test_loader, self.d (input dimension), self.n_dataset (number of classes in the target)
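
A minimal sketch of a custom subclass, assuming in-memory PyTorch tensors; the class name, split sizes, and tensors below are hypothetical, and only the required attribute names come from the contract above:

    import torch
    from torch.utils.data import DataLoader, TensorDataset

    from datasetsdefer.basedataset import BaseDataset


    class MyTabularDataset(BaseDataset):
        """Hypothetical subclass wrapping in-memory tensors x of shape (n, d) and y of shape (n,)."""

        def __init__(self, x, y, batch_size=128):
            self.x, self.y, self.batch_size = x, y, batch_size
            self.generate_data()

        def generate_data(self):
            n = len(self.x)
            n_train, n_val = int(0.7 * n), int(0.1 * n)
            train = TensorDataset(self.x[:n_train], self.y[:n_train])
            val = TensorDataset(self.x[n_train:n_train + n_val], self.y[n_train:n_train + n_val])
            test = TensorDataset(self.x[n_train + n_val:], self.y[n_train + n_val:])
            # attributes required by the BaseDataset contract
            self.data_train_loader = DataLoader(train, batch_size=self.batch_size, shuffle=True)
            self.data_val_loader = DataLoader(val, batch_size=self.batch_size)
            self.data_test_loader = DataLoader(test, batch_size=self.batch_size)
            self.d = self.x.shape[1]                       # input dimension
            self.n_dataset = int(self.y.max().item()) + 1  # number of target classes


    dataset = MyTabularDataset(torch.randn(1000, 10), torch.randint(0, 2, (1000,)))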

datasetsdefer.broward module

class datasetsdefer.broward.BrowardDataset(data_dir, test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]

Bases: BaseDataset

COMPAS dataset with human judgements for 1000 points

generate_data()[source]

generate data for training, validation and test sets
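
A hedged usage sketch; "./data" is a placeholder directory and the contents of each batch are not documented here:

    from datasetsdefer.broward import BrowardDataset

    # keyword arguments mirror the documented signature; "./data" is a placeholder
    dataset = BrowardDataset(data_dir="./data", test_split=0.2, val_split=0.1, batch_size=100)

    # loaders come from the BaseDataset contract
    train_loader = dataset.data_train_loader
    val_loader = dataset.data_val_loader
    test_loader = dataset.data_test_loader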

datasetsdefer.chestxray module

class datasetsdefer.chestxray.ChestXrayDataset(non_deferral_dataset, use_data_aug, data_dir, label_chosen, test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]

Bases: BaseDataset

Chest X-ray dataset from NIH, with multiple radiologist annotations per point from Google Research

generate_data()[source]

generate data for training, validation and test sets
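
A construction sketch with placeholder arguments matching the signature above; label_chosen=0, the flags, and the directory are illustrative values, not recommended settings:

    from datasetsdefer.chestxray import ChestXrayDataset

    dataset = ChestXrayDataset(
        non_deferral_dataset=False,   # illustrative flag value
        use_data_aug=True,
        data_dir="./data/chestxray",  # placeholder: where the NIH images live
        label_chosen=0,               # which finding/label to predict (illustrative)
        test_split=0.2,
        val_split=0.1,
        batch_size=64,
    )
    train_loader = dataset.data_train_loader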

datasetsdefer.cifar_h module

class datasetsdefer.cifar_h.Cifar10h(use_data_aug, data_dir, test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]

Bases: BaseDataset

CIFAR-10H dataset with separate human annotations on the test set of CIFAR-10

generate_data()[source]

generate data for training, validation and test sets. Class labels: “airplane”: 0, “automobile”: 1, “bird”: 2, “cat”: 3, “deer”: 4, “dog”: 5, “frog”: 6, “horse”: 7, “ship”: 8, “truck”: 9

metrics_cifar10h(exp_preds, labels)[source]
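
A hedged construction sketch; "./data" is a placeholder directory. metrics_cifar10h (listed above) takes aligned expert predictions and labels, but its exact location and return value are not documented here:

    from datasetsdefer.cifar_h import Cifar10h

    # CIFAR-10H provides human annotations for the CIFAR-10 test set (per the description above)
    dataset = Cifar10h(use_data_aug=False, data_dir="./data", batch_size=512)
    test_loader = dataset.data_test_loader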

datasetsdefer.cifar_synth module

class datasetsdefer.cifar_synth.CifarSynthDataset(expert_k, use_data_aug, test_split=0.2, val_split=0.1, batch_size=1000, n_dataset=10, transforms=None)[source]

Bases: BaseDataset

This is the CIFAR-K synthetic expert on top of CIFAR-10 from Consistent Estimators for Learning to Defer (https://arxiv.org/abs/2006.01862)

generate_data()[source]

generate data for training, validation and test sets
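
A usage sketch; expert_k=5 is an illustrative choice of how many CIFAR-10 classes the synthetic expert covers (see CifarSynthExpert below):

    from datasetsdefer.cifar_synth import CifarSynthDataset

    # note: the documented signature has no data_dir argument
    dataset = CifarSynthDataset(expert_k=5, use_data_aug=True, batch_size=128)
    train_loader = dataset.data_train_loader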

class datasetsdefer.cifar_synth.CifarSynthExpert(k, n_classes)[source]

Bases: object

simple class describing the synthetic expert on CIFAR-10. k: number of classes the expert can predict; n_classes: number of classes (10 for CIFAR-10)

predict(labels)[source]
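
A hedged sketch of querying the expert directly; the input is assumed to be a batch of ground-truth class indices, and in the cited paper's setup the expert is accurate only on a subset of k classes (that behaviour is not restated here):

    from datasetsdefer.cifar_synth import CifarSynthExpert

    expert = CifarSynthExpert(k=5, n_classes=10)

    # predict() takes the true labels of a batch and returns the expert's
    # predictions for them (input/output types assumed to be sequences of ints)
    labels = [0, 3, 7, 9]
    expert_preds = expert.predict(labels)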

datasetsdefer.generic_dataset module

class datasetsdefer.generic_dataset.GenericDatasetDeferral(data_train, data_test=None, test_split=0.2, val_split=0.1, batch_size=100, transforms=None)[source]

Bases: BaseDataset

generate_data()[source]

generates the data loaders, called on init

it must set the following attributes:

self.data_train_loader, self.data_val_loader, self.data_test_loader, self.d (input dimension), self.n_dataset (number of classes in the target)

class datasetsdefer.generic_dataset.GenericImageExpertDataset(images, targets, expert_preds, transforms_fn, to_open=False)[source]

Bases: Dataset
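
A hedged sketch of wrapping image paths, targets, and expert predictions into this Dataset; the torchvision transform and the reading of to_open=True as "load each image from its file path" are assumptions based on the parameter names:

    import torchvision.transforms as T

    from datasetsdefer.generic_dataset import GenericImageExpertDataset

    image_paths = ["img_0.png", "img_1.png"]  # placeholder file paths
    targets = [0, 1]                          # ground-truth labels
    expert_preds = [0, 0]                     # expert prediction per image

    transform = T.Compose([T.Resize((224, 224)), T.ToTensor()])

    # to_open=True is assumed to mean "open each image from its path on access"
    dataset = GenericImageExpertDataset(
        image_paths, targets, expert_preds, transforms_fn=transform, to_open=True
    )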

datasetsdefer.hatespeech module

class datasetsdefer.hatespeech.HateSpeech(data_dir, embed_texts, include_demographics, expert_type, device, synth_exp_param=[0.7, 0.7], test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]

Bases: BaseDataset

Hate speech dataset from Davidson et al. 2017

generate_data()[source]

generate data for training, validation and test sets

model_setting(model_nn)[source]

class datasetsdefer.hatespeech.ModelPredictAAE(modelfile, vocabfile)[source]

Bases: object

infer_cvb0(invocab_tokens, alpha, numpasses)[source]

load_model()[source]

predict_lang(tokens, alpha=1, numpasses=5, thresh1=1, thresh2=0.2)[source]
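
A construction sketch for the HateSpeech dataset documented above; expert_type="synthetic" and the other values are placeholders chosen to match the signature, and the valid options are defined by the library itself:

    import torch

    from datasetsdefer.hatespeech import HateSpeech

    device = "cuda" if torch.cuda.is_available() else "cpu"

    dataset = HateSpeech(
        data_dir="./data/hatespeech",  # placeholder directory
        embed_texts=True,
        include_demographics=False,
        expert_type="synthetic",       # placeholder; check the library for valid values
        device=device,
        synth_exp_param=[0.7, 0.7],    # default from the signature above
    )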

datasetsdefer.imagenet_16h module

class datasetsdefer.imagenet_16h.ImageNet16h(use_data_aug, data_dir, noise_version, test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]

Bases: BaseDataset

generate_data()[source]

generate data for training, validation and test sets
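
A hedged construction sketch; noise_version="110" is only an illustrative value, and the valid noise levels are defined by the ImageNet-16H release:

    from datasetsdefer.imagenet_16h import ImageNet16h

    dataset = ImageNet16h(
        use_data_aug=False,
        data_dir="./data/imagenet16h",  # placeholder directory
        noise_version="110",            # illustrative; see the data release for valid values
        batch_size=32,
    )
    test_loader = dataset.data_test_loader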

datasetsdefer.synthetic_data module

class datasetsdefer.synthetic_data.SyntheticData(train_samples=1000, test_samples=1000, data_distribution='mix_of_guassians', d=10, mean_scale=1, expert_deferred_error=0, expert_nondeferred_error=0.5, machine_nondeferred_error=0, num_of_guassians=10, val_split=0.1, batch_size=1000, transforms=None)[source]

Bases: BaseDataset

Synthetic dataset introduced in our work

generate_data()[source]

generates the data loaders, called on init

it must set the following attributes:

self.data_train_loader, self.data_val_loader, self.data_test_loader, self.d (input dimension), self.n_dataset (number of classes in the target)
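
A usage sketch with the defaults from the signature above; reading expert_deferred_error as the expert's error rate on the region meant for deferral (and similarly for the other two rates) is an interpretation of the parameter names, not a documented guarantee:

    from datasetsdefer.synthetic_data import SyntheticData

    dataset = SyntheticData(
        train_samples=5000,
        test_samples=1000,
        data_distribution="mix_of_guassians",  # spelling as in the signature above
        d=10,
        expert_deferred_error=0.0,       # expert error where deferral is intended (assumed reading)
        expert_nondeferred_error=0.5,
        machine_nondeferred_error=0.0,
        num_of_guassians=10,
    )
    train_loader = dataset.data_train_loader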

Module contents