This article shows how to use
ild_simulate() and
ild_power(..., return_sims = TRUE) to
assess parameter recovery for a fixed effect under the
package’s bundled Gaussian mixed-model simulation setup, and how that
connects to empirical power. It is an
illustrative benchmark for that DGP—not a proof that
every tidyILD method is validated for all data.
Power answers: “How often do we reject the null when
the effect is present?” Recovery answers: “How close
are point estimates (and intervals) to the known coefficient across
repeated simulations?” Both use the same replication loop inside
ild_power().
Recovery summaries (bias, RMSE,
coverage) are Monte Carlo estimates:
their sampling variability shrinks as
n_sim increases (roughly \(\propto 1/\sqrt{n_\text{sim}}\) for means).
The worked example below uses n_sim = 200
as a balance between stability and build time on CRAN;
for publication-style tables, consider larger
n_sim and reporting uncertainty
(e.g. bootstrap across replications or analytical approximations where
available). Chunks that re-run ild_power()
are marked with optional cache = TRUE so
local rebuilds can skip re-simulation when the code is unchanged (not
used on CRAN by default).
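To see the Monte Carlo scaling and the bootstrap idea concretely, here is a self-contained base-R sketch using synthetic per-replication estimates (not tidyILD output; the numbers are illustrative assumptions):

```r
set.seed(1)
truth <- 0.35
# Stand-in for a vector of per-replication fixed-effect estimates
est <- rnorm(200, mean = truth, sd = 0.04)

# Monte Carlo standard error of the bias shrinks as 1/sqrt(n_sim)
mc_se_bias <- sd(est) / sqrt(length(est))

# Bootstrap across replications for uncertainty in non-mean summaries (RMSE)
rmse <- function(e) sqrt(mean((e - truth)^2))
boot_rmse <- replicate(1000, rmse(sample(est, replace = TRUE)))

c(bias       = mean(est) - truth,
  mc_se_bias = mc_se_bias,
  rmse       = rmse(est),
  rmse_ci    = quantile(boot_rmse, c(0.025, 0.975)))
```

Doubling n_sim roughly divides mc_se_bias by sqrt(2), which is the practical reason n_sim = 200 is a compromise rather than a recommendation.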
ild_simulate() encodes the data-generating process. The base outcome y (before
ild_power() adds a predictor) is built as follows:

- a per-person random intercept, indexed by id (scale bp_effect);
- within-person noise: AR(1) if ar1 is set (numeric \(\phi\)); otherwise
  i.i.d. Gaussian noise (scale wp_effect);
- time as POSIXct from a regular grid (with optional jitter if
  irregular = TRUE).

The formula you fit should match this structure (e.g. random
intercept for id when using
ild_lme(..., ar1 = FALSE)). Recovery in
this vignette focuses on a single fixed effect added by
ild_power() on top of that DGP (see
below).
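To make that structure concrete without relying on package internals, here is a package-free base-R sketch of the same kind of DGP: a random intercept per person plus i.i.d. within-person Gaussian noise. The names bp_sd and wp_sd are illustrative stand-ins for the scales that bp_effect and wp_effect control, not the package's actual internals:

```r
set.seed(42)
n_id  <- 25L   # number of persons
n_obs <- 12L   # observations per person
bp_sd <- 1.0   # between-person (random-intercept) SD -- stand-in for bp_effect
wp_sd <- 0.7   # within-person (residual) SD          -- stand-in for wp_effect

id <- rep(seq_len(n_id), each = n_obs)
u  <- rnorm(n_id, sd = bp_sd)                     # person-level intercepts
y  <- u[id] + rnorm(n_id * n_obs, sd = wp_sd)     # i.i.d. noise (ar1 unset)

d <- data.frame(id = id,
                time = rep(seq_len(n_obs), times = n_id),
                y = y)
head(d, 3)
```

A formula with a random intercept for id (as in y ~ x + (1 | id)) matches this generative structure exactly, which is what makes recovery checks interpretable.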
ild_power(..., return_sims = TRUE)

For each replication, ild_power():

- simulates a base dataset with ild_simulate();
- draws a predictor x and
  sets y <- y + effect_size * x, so the
  true coefficient of x is
  effect_size (call it β);
- runs ild_prepare() and
  ild_lme() with your formula.

With return_sims = TRUE, you get a
tibble of per-replication estimate,
std_error,
p_value, and
converged. Use
ild_recovery_metrics() for
bias, RMSE, and nominal Wald
interval coverage (normal approximation \(\hat\beta \pm z_{1-\alpha/2} \cdot
\mathrm{SE}\)), consistent with the manual calculation shown in
earlier package versions.
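Those summaries can be reproduced by hand from any table of per-replication estimates and standard errors. A self-contained base-R sketch with synthetic values (the column names estimate and std_error mirror the per-replication tibble described above, but the numbers here are made up):

```r
set.seed(7)
truth <- 0.35
n_sim <- 200L
estimate  <- rnorm(n_sim, mean = truth, sd = 0.04)  # synthetic estimates
std_error <- rep(0.04, n_sim)                       # synthetic SEs

# Nominal Wald interval: estimate +/- z_{1 - alpha/2} * SE
z     <- qnorm(0.975)
lower <- estimate - z * std_error
upper <- estimate + z * std_error

metrics <- c(bias     = mean(estimate) - truth,
             rmse     = sqrt(mean((estimate - truth)^2)),
             coverage = mean(lower <= truth & truth <= upper))
metrics
```

Because the synthetic SEs equal the true sampling SD here, coverage lands near the nominal 0.95 up to binomial Monte Carlo noise; miscalibrated SEs would show up as systematic over- or under-coverage.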
library(tidyILD)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
truth <- 0.35
n_sim <- 200L
set.seed(2026)
res <- ild_power(
formula = y ~ x + (1 | id),
n_sim = n_sim,
n_id = 25L,
n_obs_per = 12L,
effect_size = truth,
seed = 2026L,
return_sims = TRUE,
verbose = FALSE
)
sim <- res$sim_results |> dplyr::filter(converged)

ild_recovery_metrics(res$sim_results, truth = truth, level = 0.95)
#> # A tibble: 1 × 8
#> truth n n_total mean_estimate bias rmse coverage level
#> <dbl> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0.35 200 200 0.346 -0.00404 0.0368 0.945 0.95

Across replications, the mean estimate should be near
β, bias near 0, RMSE positive but modest if the model
is well specified, and coverage near the nominal level
(e.g. 0.95) for large n_sim—subject to
Monte Carlo noise when n_sim is
moderate.
ggplot2::ggplot(sim, ggplot2::aes(x = estimate)) +
ggplot2::geom_histogram(fill = "steelblue", color = "white", bins = 20) +
ggplot2::geom_vline(xintercept = truth, linetype = 2, linewidth = 0.8) +
ggplot2::labs(
x = "Estimated coefficient for x",
y = "Count",
title = "Sampling distribution of fixed-effect estimates",
subtitle = sprintf("True beta = %s (vertical line)", truth)
)

The same object res already contains
empirical power: the fraction of converged runs with
p_value < alpha.
tibble::tibble(
power = res$power,
n_reject = res$n_reject,
n_converged = res$n_converged,
alpha = res$alpha
)
#> # A tibble: 1 × 4
#> power n_reject n_converged alpha
#> <dbl> <int> <int> <dbl>
#> 1 1 200 200 0.05

Interpretation: Recovery summarizes
the sampling distribution of \(\hat\beta\). Power
summarizes how often that distribution leads to rejection at
alpha. A large effect with low variance
yields good recovery and high power; a small effect may show
acceptable recovery (unbiased estimates) but low power at a fixed sample
size.
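The mechanical link between the two summaries can be shown in a few lines of base R: from the same per-replication estimates and standard errors, recovery looks at the estimates directly while power counts two-sided Wald rejections. Synthetic values again, not tidyILD output:

```r
set.seed(11)
truth <- 0.35
estimate  <- rnorm(200, mean = truth, sd = 0.07)  # synthetic estimates
std_error <- rep(0.07, 200)                       # synthetic SEs

# Two-sided Wald z-test per replication
p_value <- 2 * pnorm(-abs(estimate / std_error))
alpha   <- 0.05

c(mean_estimate = mean(estimate),        # recovery: near truth
  power         = mean(p_value < alpha)) # power: rejection rate
```

Shrinking truth toward 0 while holding std_error fixed leaves the bias near zero (recovery stays fine) but drives the rejection rate down, which is exactly the "acceptable recovery, low power" case described above.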
ild_power() is built around a
fixed-effect estimand. Recovery of variance
components (random-intercept SD, residual variance) is a
separate question. As a single-run sanity check, you
can inspect VarCorr() on one simulated
dataset (lme4 or
nlme output depending on
ar1):
set.seed(1)
d_one <- ild_simulate(n_id = 30L, n_obs_per = 15L, seed = 99)
d_one$x <- rnorm(nrow(d_one))
d_one$y <- d_one$y + 0.3 * d_one$x
prep_one <- ild_prepare(d_one, id = "id", time = "time")
fit_one <- ild_lme(y ~ x + (1 | id), prep_one, ar1 = FALSE, warn_no_ar1 = FALSE)
#> Warning: Predictor(s) 'x' vary both within and between persons. Consider using
#> ild_center(x) to separate within- and between-person effects and avoid
#> conflation bias.
lme4::VarCorr(fit_one)
#> Groups Name Std.Dev.
#> id (Intercept) 0.97185
#> Residual 0.66690

This is not a simulation-based assessment of variance recovery—only a template
for where to look when extending benchmarks.
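One way to extend benchmarks toward variance components without any model fitting is a method-of-moments check on simulated data: with random-intercept SD \(\tau\) and residual SD \(\sigma\), the average within-person sample variance estimates \(\sigma^2\), and the variance of person means estimates \(\tau^2 + \sigma^2 / n_\text{obs}\). A package-free base-R sketch (synthetic data, not ild_simulate() output):

```r
set.seed(3)
tau   <- 1.0    # true random-intercept SD
sigma <- 0.7    # true residual SD
n_id  <- 200L
n_obs <- 15L

id <- rep(seq_len(n_id), each = n_obs)
y  <- rnorm(n_id, sd = tau)[id] + rnorm(n_id * n_obs, sd = sigma)

person_mean <- tapply(y, id, mean)
within_var  <- mean(tapply(y, id, var))           # estimates sigma^2
between_var <- var(person_mean) - within_var / n_obs  # estimates tau^2

vc <- c(sigma_hat = sqrt(within_var),
        tau_hat   = sqrt(pmax(between_var, 0)))
vc
```

Agreement between these moment estimates and the VarCorr() output on the same data is a cheap cross-check before investing in a full simulation study of variance-component recovery.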
- ild_simulate(..., ar1 = phi) (numeric \(\phi\)) injects AR(1)
  within-person correlation in the outcome
  before ild_power() adds x.
- ild_lme(..., ar1 = TRUE) fits a
  residual AR1/CAR1 structure in
  nlme (a different object from the DGP's \(\phi\)); when
  ar1 = TRUE, the formula must be
  fixed-effects-only with
  random = ~ 1 | id (see ?ild_lme).
- ild_power() exposes
  ar1 as a logical flag for
  the fit, not the DGP's \(\phi\). To study AR(1) DGPs with
  ild_power(), you would need a
  custom loop calling
  ild_simulate(..., ar1 = ...) yourself, or keep the two aligned by
  matching the correlation structure in both steps.

Minimal illustration of DGP AR + nlme residual correlation on
one draw:
set.seed(2)
d_ar <- ild_simulate(n_id = 20L, n_obs_per = 14L, ar1 = 0.35, wp_effect = 0.5, seed = 2)
d_ar$x <- rnorm(nrow(d_ar))
d_ar$y <- d_ar$y + 0.25 * d_ar$x
prep_ar <- ild_prepare(d_ar, id = "id", time = "time")
fit_ar <- tryCatch(
ild_lme(y ~ x, prep_ar, ar1 = TRUE, random = ~ 1 | id, warn_no_ar1 = FALSE),
error = function(e) NULL
)
#> Warning: Predictor(s) 'x' vary both within and between persons. Consider using
#> ild_center(x) to separate within- and between-person effects and avoid
#> conflation bias.
if (!is.null(fit_ar)) summary(fit_ar) else "nlme fit failed on this platform (skip)"
#> Linear mixed-effects model fit by REML
#> Data: data
#> AIC BIC logLik
#> 573.1519 591.29 -281.5759
#>
#> Random effects:
#> Formula: ~1 | id
#> (Intercept) Residual
#> StdDev: 1.024086 0.8321674
#>
#> Correlation Structure: AR(1)
#> Formula: ~.ild_seq | id
#> Parameter estimate(s):
#> Phi
#> 0.6910938
#> Fixed effects: list(formula)
#> Value Std.Error DF t-value p-value
#> (Intercept) 0.9146186 0.25045053 259 3.651893 3e-04
#> x 0.2603221 0.03073403 259 8.470158 0e+00
#> Correlation:
#> (Intr)
#> x -0.021
#>
#> Standardized Within-Group Residuals:
#> Min Q1 Med Q3 Max
#> -2.17356216 -0.70843954 -0.07166726 0.61178709 2.15484654
#>
#> Number of Observations: 280
#> Number of Groups: 20

ild_power() currently drives
ild_lme() only. For brms
/ Stan, use ild_simulate() +
ild_fit(..., backend = "brms") (or
ild_brms()) in your own replication loop,
then summarize posterior intervals against
truth with the same
coverage idea (credible vs Wald nominal coverage is
not interchangeable—document which you use). For state-space modeling, see
vignette("kfas-state-space-modeling", package = "tidyILD")
and ?ild_kfas. Cross-checking mixed-model
simulation benchmarks against KFAS is out of scope here but natural for
method comparison papers.

For repeatable CI-style checks across several
engines (lme4, optional nlme /
brms / KFAS / ctsem), the
package ships a non-exported harness documented in
inst/dev/BACKEND_VALIDATION_BENCHMARK_CONTRACT.md.
From a source tree, run
scripts/run-backend-validation-benchmarks.R with a
--tier (smoke,
nightly, full) and compare outputs to JSON
thresholds via
scripts/check-backend-validation-thresholds.R. This
complements the ild_power() /
ild_recovery_metrics() workflow above by
standardizing raw and summary tables for trend
comparison; it is not a substitute for study-specific
simulation design.
- ild_power(): this path
  targets a single fixed effect in a linear mixed model
  fit with ild_lme(), matching the current
  ild_power() implementation.
- For lme4 fits, ild_power() uses a Wald
  z-approximation when p-values are unavailable
  from the engine; that is appropriate for this illustration but not
  identical to likelihood-ratio tests.

See also:

- vignette("tidyILD-workflow", package = "tidyILD")
  — end-to-end pipeline.
- ?ild_power,
  ?ild_recovery_metrics — simulation-based
  power and recovery summaries.
- ?ild_simulate — base data generator.

#> R version 4.5.3 (2026-03-11)
#> Platform: aarch64-apple-darwin20
#> Running under: macOS Sequoia 15.6
#>
#> Matrix products: default
#> BLAS: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.1
#>
#> locale:
#> [1] C/en_US/en_US/C/en_US/en_US
#>
#> time zone: America/New_York
#> tzcode source: internal
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] ggplot2_4.0.0 dplyr_1.1.4 tidyILD_0.4.0
#>
#> loaded via a namespace (and not attached):
#> [1] Matrix_1.7-4 gtable_0.3.6 jsonlite_2.0.0 compiler_4.5.3
#> [5] Rcpp_1.1.0 tidyselect_1.2.1 jquerylib_0.1.4 splines_4.5.3
#> [9] scales_1.4.0 boot_1.3-32 yaml_2.3.10 fastmap_1.2.0
#> [13] lattice_0.22-9 R6_2.6.1 labeling_0.4.3 generics_0.1.4
#> [17] knitr_1.50 rbibutils_2.3 MASS_7.3-65 tibble_3.3.0
#> [21] nloptr_2.2.1 lubridate_1.9.4 tsibble_1.2.0 minqa_1.2.8
#> [25] bslib_0.9.0 pillar_1.11.1 RColorBrewer_1.1-3 rlang_1.1.6
#> [29] cachem_1.1.0 xfun_0.53 sass_0.4.10 S7_0.2.0
#> [33] timechange_0.3.0 cli_3.6.5 withr_3.0.2 magrittr_2.0.4
#> [37] Rdpack_2.6.4 digest_0.6.37 grid_4.5.3 lme4_1.1-37
#> [41] anytime_0.3.12 lifecycle_1.0.4 nlme_3.1-168 reformulas_0.4.1
#> [45] vctrs_0.6.5 evaluate_1.0.5 glue_1.8.0 farver_2.1.2
#> [49] rmarkdown_2.29 tools_4.5.3 pkgconfig_2.0.3 htmltools_0.5.8.1