datasetsdefer package
Submodules
datasetsdefer.basedataset module
datasetsdefer.broward module
- class datasetsdefer.broward.BrowardDataset(data_dir, test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]
Bases: BaseDataset
COMPAS (Broward County) recidivism dataset with human judgements for 1000 points
datasetsdefer.chestxray module
- class datasetsdefer.chestxray.ChestXrayDataset(non_deferral_dataset, use_data_aug, data_dir, label_chosen, test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]
Bases: BaseDataset
Chest X-ray dataset from the NIH, with multiple radiologist annotations per point from Google Research
datasetsdefer.cifar_h module
- class datasetsdefer.cifar_h.Cifar10h(use_data_aug, data_dir, test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]
Bases: BaseDataset
CIFAR-10H dataset with separate human annotations on the test set of CIFAR-10
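CIFAR-10H provides many human annotations per test image rather than a single label. One common way to turn those counts into a single human prediction is to sample a label in proportion to how often annotators chose it. The sketch below illustrates that idea; the function name and count format are hypothetical, not this library's API.

```python
import random

def sample_human_label(annotation_counts, rng=None):
    """Sample one human label in proportion to per-image annotator counts.

    annotation_counts: one count per class (10 classes for CIFAR-10H).
    Illustrative only -- not the datasetsdefer API.
    """
    rng = rng or random.Random(0)
    # expand counts into one entry per annotator vote, then draw uniformly
    labels = [cls for cls, n in enumerate(annotation_counts) for _ in range(n)]
    return rng.choice(labels)

# 47 of 50 annotators voted class 3; a few disagreed
counts = [0, 0, 1, 47, 0, 0, 2, 0, 0, 0]
print(sample_human_label(counts))  # one draw from the annotator distribution
```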
datasetsdefer.cifar_synth module
- class datasetsdefer.cifar_synth.CifarSynthDataset(expert_k, use_data_aug, test_split=0.2, val_split=0.1, batch_size=1000, n_dataset=10, transforms=None)[source]
Bases: BaseDataset
The CIFAR-K synthetic expert on top of CIFAR-10, from Consistent Estimators for Learning to Defer (https://arxiv.org/abs/2006.01862)
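In the CIFAR-K construction from the cited paper, the synthetic expert is accurate on a subset of `expert_k` classes and guesses at random elsewhere. A minimal sketch of that behavior, assuming the expert is correct exactly on the first `expert_k` classes (the function itself is illustrative, not the class's code):

```python
import random

def cifar_k_expert(true_label, expert_k, n_classes=10, rng=None):
    """Synthetic expert: returns the true label when it falls in the
    first expert_k classes, otherwise a uniformly random guess.
    Sketch of the idea behind CifarSynthDataset's expert_k parameter."""
    rng = rng or random.Random(0)
    if true_label < expert_k:
        return true_label
    return rng.randrange(n_classes)

# expert with k=5: perfect on classes 0-4, random on 5-9
print(cifar_k_expert(3, expert_k=5))  # -> 3 (within the expert's competence)
print(cifar_k_expert(7, expert_k=5))  # a random guess in 0..9
```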
datasetsdefer.generic_dataset module
- class datasetsdefer.generic_dataset.GenericDatasetDeferral(data_train, data_test=None, test_split=0.2, val_split=0.1, batch_size=100, transforms=None)[source]
Bases: BaseDataset
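The constructor accepts training data plus optional test data, with `test_split=0.2` and `val_split=0.1` defaults. A typical way such splits are carved out of a dataset is shuffling indices and slicing by fraction; the helper below sketches that convention under the same defaults (it is not the class's actual implementation):

```python
import random

def split_indices(n, test_split=0.2, val_split=0.1, seed=0):
    """Shuffle n indices and carve out test and validation fractions,
    mirroring the test_split/val_split constructor defaults. A sketch
    of common splitting behavior, not GenericDatasetDeferral's code."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_test = int(n * test_split)
    n_val = int(n * val_split)
    test = idx[:n_test]
    val = idx[n_test:n_test + n_val]
    train = idx[n_test + n_val:]
    return train, val, test

train, val, test = split_indices(1000)
print(len(train), len(val), len(test))  # -> 700 100 200
```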
datasetsdefer.hatespeech module
- class datasetsdefer.hatespeech.HateSpeech(data_dir, embed_texts, include_demographics, expert_type, device, synth_exp_param=[0.7, 0.7], test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]
Bases: BaseDataset
Hate speech dataset from Davidson et al. (2017)
datasetsdefer.imagenet_16h module
- class datasetsdefer.imagenet_16h.ImageNet16h(use_data_aug, data_dir, noise_version, test_split=0.2, val_split=0.1, batch_size=1000, transforms=None)[source]
Bases: BaseDataset
datasetsdefer.synthetic_data module
- class datasetsdefer.synthetic_data.SyntheticData(train_samples=1000, test_samples=1000, data_distribution='mix_of_guassians', d=10, mean_scale=1, expert_deferred_error=0, expert_nondeferred_error=0.5, machine_nondeferred_error=0, num_of_guassians=10, val_split=0.1, batch_size=1000, transforms=None)[source]
Bases: BaseDataset
Synthetic dataset introduced in our work
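The constructor defaults (`data_distribution='mix_of_guassians'`, `d=10`, `mean_scale=1`, `num_of_guassians=10`) suggest features drawn from a mixture of isotropic Gaussians in d dimensions. The sketch below shows one plausible reading of that setting; it is an illustration of the distribution, not the class's exact sampling code, and ignores the expert/machine error parameters:

```python
import random

def sample_mix_of_gaussians(n, d=10, num_gaussians=10, mean_scale=1.0, seed=0):
    """Draw n points from a mixture of num_gaussians unit-variance
    Gaussians in d dimensions, with component means drawn at scale
    mean_scale. Sketch of the 'mix_of_guassians' setting implied by
    the constructor defaults, not SyntheticData's actual code."""
    rng = random.Random(seed)
    # one random mean vector per mixture component
    means = [[rng.gauss(0, mean_scale) for _ in range(d)]
             for _ in range(num_gaussians)]
    xs, comps = [], []
    for _ in range(n):
        k = rng.randrange(num_gaussians)       # pick a component uniformly
        xs.append([rng.gauss(m, 1.0) for m in means[k]])
        comps.append(k)
    return xs, comps

xs, comps = sample_mix_of_gaussians(1000)
print(len(xs), len(xs[0]))  # -> 1000 10
```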