syntheticdata: Synthetic Clinical Data Generation and Privacy-Preserving Validation

Generates synthetic clinical datasets that preserve the statistical properties of the source data while reducing re-identification risk, for privacy-aware data sharing in multi-site clinical research. Implements three generation methods: Gaussian copula simulation, bootstrap with noise injection, and Laplace noise perturbation. Built-in validation covers both utility and privacy: distributional similarity (Kolmogorov-Smirnov), discriminative accuracy (real-vs-synthetic classifier), and nearest-neighbor privacy ratio. Methods are described in Jordon et al. (2022) <doi:10.48550/arXiv.2205.03257> and Snoke et al. (2018) <doi:10.1111/rssa.12358>.
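
The description above names the techniques but not how they fit together, so here is a minimal base-R sketch of one generation method (Gaussian copula simulation) followed by the three validation ideas. It does not use or reproduce the package's API: the helper copula_sketch(), the toy variables age and sbp, and the exact form of the nearest-neighbor ratio are all assumptions made for illustration, and the package's own definitions may differ.

```r
# Illustrative sketch in base R (stats only). None of the names below come
# from the syntheticdata package; they are hypothetical.
set.seed(42)

# Toy "real" data: two correlated numeric variables (made-up values).
n <- 500
age <- rnorm(n, mean = 60, sd = 10)
sbp <- 90 + 0.6 * age + rnorm(n, sd = 8)
real <- data.frame(age = age, sbp = sbp)

# Hypothetical helper: Gaussian copula simulation for numeric columns.
copula_sketch <- function(data, n_out = nrow(data)) {
  m <- nrow(data)
  # 1. Map each column to normal scores via its ranks.
  z <- sapply(data, function(x) qnorm(rank(x) / (m + 1)))
  # 2. Estimate the dependence structure on the normal scale.
  rho <- cor(z)
  # 3. Draw correlated standard normals using the Cholesky factor of rho.
  z_new <- matrix(rnorm(n_out * ncol(data)), nrow = n_out) %*% chol(rho)
  # 4. Map back through each column's empirical quantiles.
  u_new <- pnorm(z_new)
  out <- lapply(seq_along(data), function(j) {
    as.numeric(quantile(data[[j]], probs = u_new[, j], names = FALSE))
  })
  names(out) <- names(data)
  as.data.frame(out)
}

synth <- copula_sketch(real)

# Utility: two-sample Kolmogorov-Smirnov statistic per column
# (smaller means the marginal distributions are closer).
ks_stat <- sapply(names(real), function(v) {
  unname(ks.test(real[[v]], synth[[v]])$statistic)
})

# Utility: in-sample accuracy of a logistic real-vs-synthetic classifier.
# Values near 0.5 mean the two sets are hard to tell apart.
both <- rbind(cbind(real, is_real = 1), cbind(synth, is_real = 0))
fit <- glm(is_real ~ ., data = both, family = binomial)
disc_acc <- mean((fitted(fit) > 0.5) == (both$is_real == 1))

# Privacy: one possible nearest-neighbor ratio (an assumed definition).
# Numerator: mean distance from each synthetic row to its nearest real row.
# Denominator: mean distance from each real row to its nearest other real row.
ctr <- colMeans(real)
scl <- sapply(real, sd)
r <- scale(as.matrix(real), center = ctr, scale = scl)
s <- scale(as.matrix(synth), center = ctr, scale = scl)
d <- as.matrix(dist(rbind(r, s)))
idx_r <- seq_len(nrow(r))
idx_s <- nrow(r) + seq_len(nrow(s))
d_rr <- d[idx_r, idx_r]
diag(d_rr) <- Inf                      # ignore self-matches
d_sr <- d[idx_s, idx_r]
nn_ratio <- mean(apply(d_sr, 1, min)) / mean(apply(d_rr, 1, min))

print(ks_stat)
print(disc_acc)
print(nn_ratio)
```

Under this assumed definition, a ratio well below 1 would suggest that synthetic records sit unusually close to real ones. The classifier accuracy here is computed in-sample for brevity; a held-out split would give a less optimistic estimate of how distinguishable the two datasets are.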

Version: 0.1.0
Depends: R (≥ 4.1.0)
Imports: cli (≥ 3.4.0), dplyr (≥ 1.1.0), stats, tibble (≥ 3.1.0)
Suggests: knitr, rmarkdown, testthat (≥ 3.0.0)
Published: 2026-04-02
DOI: 10.32614/CRAN.package.syntheticdata (may not be active yet)
Author: Cuiwei Gao [aut, cre, cph]
Maintainer: Cuiwei Gao <48gaocuiwei at gmail.com>
BugReports: https://github.com/CuiweiG/syntheticdata/issues
License: MIT + file LICENSE
URL: https://github.com/CuiweiG/syntheticdata
NeedsCompilation: no
Language: en-US
Citation: syntheticdata citation info
Materials: README, NEWS
CRAN checks: syntheticdata results

Documentation:

Reference manual: syntheticdata.html, syntheticdata.pdf
Vignettes: Generating and validating synthetic clinical data (source, R code)

Downloads:

Package source: syntheticdata_0.1.0.tar.gz
Windows binaries: r-devel: syntheticdata_0.1.0.zip, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): syntheticdata_0.1.0.tgz, r-oldrel (arm64): not available, r-release (x86_64): not available, r-oldrel (x86_64): not available

Linking:

Please use the canonical form https://CRAN.R-project.org/package=syntheticdata to link to this page.