Datasets

load_mimic3

bcselector.datasets.load_mimic3(as_frame=True, discretize_data=True, **kwargs)

Load and return the mimic3 dataset. The mimic3 dataset is a medical dataset with multiple target variables. The dataset is available from PhysioBank [1]. Feature costs were taken from the article [2].

Samples total: 6591
Dimensionality: 306
Target variables: 10

Parameters
  • as_frame (bool, default=True) – If True, the data is a pandas DataFrame including columns with appropriate names. The target is a pandas DataFrame with multiple target variables.

  • discretize_data (bool, default=True) – If True, the returned data is discretized with sklearn.preprocessing.KBinsDiscretizer.

  • **kwargs – Arguments passed to sklearn.preprocessing.KBinsDiscretizer constructor.

Returns

  • data ({np.ndarray, pd.DataFrame} of shape (6591, 306)) – The data matrix. If as_frame=True, data will be a pd.DataFrame.

  • target ({np.ndarray, pd.DataFrame} of shape (6591, 10)) – The binary classification target variables, one column per target. If as_frame=True, target will be a pd.DataFrame.

  • costs ({dict, list}) – Cost of every feature in data. If as_frame=True, costs will be a dict.

References

[1]

MIMIC-III, a freely accessible critical care database. Johnson AEW, Pollard TJ, Shen L, Lehman LH, Feng M, Ghassemi M, Moody B, Szolovits P, Celi LA, and Mark RG. Scientific Data (2016). DOI: 10.1038/sdata.2016.35. Available at: http://www.nature.com/articles/sdata201635.

[2]

Paweł Teisseyre, Damien Zufferey, and Marta Słomka. Cost-sensitive classifier chains: Selecting low-cost features in multi-label classification. Pattern Recognition, 86, 09 2018.

Examples

>>> from bcselector.datasets import load_mimic3
>>> data, target, costs = load_mimic3()
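
The discretize_data and **kwargs parameters refer to sklearn.preprocessing.KBinsDiscretizer. The sketch below runs that class on synthetic data (not MIMIC-III) to illustrate what such a discretization produces. n_bins, encode and strategy are real KBinsDiscretizer options, but the values chosen here are assumptions; the defaults that load_mimic3 applies are not stated in this section.

```python
import numpy as np
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic stand-in for a continuous feature matrix (not MIMIC-III data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))

# Hypothetical keyword arguments of the kind that could be passed as **kwargs.
kbins_kwargs = {"n_bins": 4, "encode": "ordinal", "strategy": "uniform"}

discretizer = KBinsDiscretizer(**kbins_kwargs)
X_binned = discretizer.fit_transform(X)

# With encode="ordinal", each value is replaced by the index of its bin,
# so every column only contains values between 0 and n_bins - 1.
print(X_binned.shape)
print(np.unique(X_binned))
```

If load_mimic3 forwards **kwargs unchanged, as the parameter description suggests, the same options would be given directly in the call to the loader.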

load_hepatitis

bcselector.datasets.load_hepatitis(as_frame=True, discretize_data=True, **kwargs)

Load and return the hepatitis dataset. The hepatitis dataset is a small medical dataset with a single target variable. The dataset comes from the UCI repository [3].

Samples total: 155
Dimensionality: 19
Target variables: 1

Parameters
  • as_frame (bool, default=True) – If True, the data is a pandas DataFrame including columns with appropriate names. The target is a pandas Series holding the single target variable.

  • discretize_data (bool, default=True) – If True, the returned data is discretized with sklearn.preprocessing.KBinsDiscretizer.

  • **kwargs – Arguments passed to sklearn.preprocessing.KBinsDiscretizer constructor.

Returns

  • data ({np.ndarray, pd.DataFrame} of shape (155, 19)) – The data matrix. If as_frame=True, data will be a pd.DataFrame.

  • target ({np.ndarray, pd.Series} of shape (155,)) – The binary classification target variable. If as_frame=True, target will be a pd.Series.

  • costs ({dict, list}) – Cost of every feature in data. If as_frame=True, costs will be a dict.

References

[3]

Dua, D. and Graff, C. (2019). UCI Machine Learning Repository [http://archive.ics.uci.edu/ml]. Irvine, CA: University of California, School of Information and Computer Science.

Examples

>>> from bcselector.datasets import load_hepatitis
>>> data, target, costs = load_hepatitis()
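
The costs return value is described as either a dict or a list. The sketch below shows how the two forms can relate. It assumes that the dict is keyed by column name and that the list holds one cost per column in column order; neither detail is confirmed here. The feature names and cost values are made up for illustration and are not the real hepatitis costs.

```python
# Hypothetical feature names and costs; not the real hepatitis values.
feature_names = ["age", "sex", "bilirubin"]
costs_dict = {"age": 1.0, "sex": 1.0, "bilirubin": 7.27}

# List form: one cost per column, in column order
# (the layout assumed here for as_frame=False).
costs_list = [costs_dict[name] for name in feature_names]

# The cost of the column at a given index is the same in both forms.
index = 2
assert costs_list[index] == costs_dict[feature_names[index]]

print(costs_list)
```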