In EMA and diary studies, missing responses are often non-ignorable in substance even when analysts assume missing at random (MAR) for estimation: burden, symptom severity, context, or device issues can co-determine both whether a prompt is answered and the outcome. tidyILD does not replace dedicated missing-data software; it gives structured diagnostics, person-level adherence views, time-oriented summaries, and hooks to IPW-based sensitivity workflows already in the package.
MNAR (missing not at random) means missingness depends on unobserved values or latent states. No routine plot proves MAR vs MNAR. Use multiple sensitivity routes and transparent reporting.
The ordinal occasion index .ild_seq (from ild_prepare()) is the default backbone for “wave” summaries; it is not the same as equal calendar spacing. See vignette("ild-decomposition-and-spacing", package = "tidyILD") when timing is irregular.
ild_missing_pattern() and heatmaps

ild_missing_pattern() tabulates NA rates by variable and
by person, and builds a person × occasion heatmap
(sequence index on the x-axis). Pass outcome to
enrich by_id with compliance metrics from
ild_missing_compliance() (see below).
library(tidyILD)
set.seed(11)
d <- ild_simulate(n_id = 25, n_obs_per = 12, seed = 11)
d$stress <- rnorm(nrow(d))
d$mood <- d$y
miss_i <- sample(nrow(d), 45)
d$mood[miss_i] <- NA
x <- ild_prepare(d, id = "id", time = "time")
mp <- ild_missing_pattern(x, vars = c("mood", "stress"), outcome = "mood")
mp$summary
#> # A tibble: 2 × 4
#> var n_obs n_na pct_na
#> <chr> <int> <int> <dbl>
#> 1 mood 300 45 15
#> 2 stress 300 0 0
head(mp$by_id, 3)
#> # A tibble: 3 × 10
#> .ild_id mood_n_obs mood_n_na stress_n_obs stress_n_na n_rows n_obs_outcome
#> <int> <int> <int> <int> <int> <int> <int>
#> 1 1 10 2 12 0 12 10
#> 2 2 9 3 12 0 12 9
#> 3 3 11 1 12 0 12 11
#> # ℹ 3 more variables: pct_nonmissing_outcome <dbl>, longest_run_observed <int>,
#> # monotone_missing <lgl>
Plot the same view with
ild_plot(x, type = "missingness", var = "mood") (see
?ild_plot).
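The by_id columns longest_run_observed and monotone_missing are run-based summaries of one person's outcome series in time order. The following is a minimal base-R sketch of how such quantities can be computed for a single person, assuming the outcome vector is already sorted by occasion. The helper names and toy vectors are hypothetical illustrations, not tidyILD internals.

```r
# Hypothetical helpers (base R only); not the tidyILD implementation.

# Longest streak of consecutive observed (non-NA) values.
longest_run_observed <- function(y) {
  r <- rle(!is.na(y))
  if (!any(r$values)) return(0L)   # nothing observed at all
  max(r$lengths[r$values])
}

# TRUE if every value after the first missing one is also missing;
# NA if the person has no missing values.
monotone_missing <- function(y) {
  miss <- is.na(y)
  if (!any(miss)) return(NA)
  first_miss <- which(miss)[1]
  all(miss[first_miss:length(y)])
}

# Toy series, made up for illustration.
y_intermittent <- c(1.2, NA, 0.4, 0.9, 1.1, NA)
y_dropout      <- c(0.3, 0.8, NA, NA, NA, NA)

longest_run_observed(y_intermittent)  # 3
monotone_missing(y_intermittent)      # FALSE
monotone_missing(y_dropout)           # TRUE
```

Intermittent skipping and dropout can give similar overall NA rates, which is why a run-based view is worth inspecting alongside pct_nonmissing_outcome.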
ild_missing_compliance()

tidyILD::ild_missing_compliance() returns, per .ild_id:
- pct_nonmissing_outcome,
- longest_run_observed (longest streak of observed values in time order),
- monotone_missing: TRUE if, after the first missing outcome, all later values are missing (NA if there is no missingness for that person),
- expected_occasions for rough adherence vs planned N (pct_of_expected, meets_expected_rows).

ild_missing_model() and ild_missing_bias()

- ild_missing_model() fits a logistic model for is.na(outcome) ~ predictors (a pooled glm, or glmer with random = TRUE). Use it as a diagnostic for whether observed covariates predict missingness, not as proof of MAR.
- ild_missing_bias() is a shortcut for one numeric predictor vs missingness (teaching / quick screening).

If predictors are associated with missingness, complete-case summaries of the outcome can be biased even when a mixed model uses all rows, because the composition of who contributes at each occasion may shift. Compare descriptive means by missingness pattern only as an exploratory check, not as causal evidence.
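To make the diagnostic concrete, here is a hedged base-R sketch of the pooled version of such a model: a logistic regression of is.na(outcome) on an observed covariate. The data are simulated for illustration, with missingness that depends on stress by construction. The sketch calls stats::glm() directly rather than ild_missing_model(), whose full signature is not shown here.

```r
# Simulated illustration only; variable names mirror the vignette's example.
set.seed(1)
n      <- 400
stress <- rnorm(n)
mood   <- 0.5 * stress + rnorm(n)

# Missingness depends on observed stress only (MAR-type by construction).
p_miss <- plogis(-1.5 + 1.0 * stress)
mood[runif(n) < p_miss] <- NA

dat <- data.frame(mood = mood, stress = stress)

# Pooled logistic missingness model: does stress predict a missing outcome?
fit_miss <- glm(is.na(mood) ~ stress, family = binomial(), data = dat)
coef(summary(fit_miss))["stress", ]
```

A clearly non-zero slope says only that an observed covariate predicts missingness. It cannot rule out additional dependence on unobserved values, and a pooled glm ignores clustering within person.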
A linear mixed model fitted to all available rows uses the likelihood contribution from observed outcomes conditional on random effects. Under MAR and correct mean and covariance specification, inference for the outcome model can be appropriate while ignoring the missingness mechanism (likelihood-based inference). That statement has scope limits:
tidyILD encourages comparing descriptives and fits on full vs complete-case data as a coarse sensitivity check, not a formal test.
- ild_missing_cohort(): fraction of non-missing outcomes at each .ild_seq plus an optional line plot.
- ild_missing_hazard_first(): discrete hazard of being missing on the current row among rows at risk (previous occasion observed, or first occasion). Under intermittent missingness this is a rough first-event summary; under monotone dropout it aligns better with a discrete-time dropout hazard.

coh <- ild_missing_cohort(x, outcome = "mood", plot = FALSE)
head(coh$by_occasion)
#> # A tibble: 6 × 4
#> .ild_seq n_rows n_obs pct_observed
#> <int> <int> <int> <dbl>
#> 1 1 25 23 92
#> 2 2 25 21 84
#> 3 3 25 21 84
#> 4 4 25 24 96
#> 5 5 25 19 76
#> 6 6 25 21 84
head(ild_missing_hazard_first(x, outcome = "mood"))
#> # A tibble: 6 × 4
#> .ild_seq n_at_risk n_missing hazard
#> <int> <int> <int> <dbl>
#> 1 1 25 2 0.08
#> 2 2 23 3 0.130
#> 3 3 21 2 0.0952
#> 4 4 21 0 0
#> 5 5 24 5 0.208
#> 6 6 19 3 0.158

ild_missingness_report()

ild_missingness_report() bundles compliance, ild_missing_pattern() (with outcome enrichment), cohort and hazard tables, optional ild_missing_model(), the same late-dropout heuristic used in guardrails (GR_DROPOUT_LATE_CONCENTRATION), and short snippets for methods text.
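Two of the bundled tables, the cohort fractions and the hazard, reduce to simple counts. The base-R sketch below follows the definitions stated earlier in this vignette (at risk means first occasion, or previous occasion observed). The toy person × occasion matrix miss is hypothetical, and the code is an illustration rather than tidyILD's internal computation.

```r
# Rows = persons, columns = occasions; TRUE marks a missing outcome.
# Toy data, made up for illustration.
miss <- rbind(
  c(FALSE, FALSE, TRUE,  TRUE),
  c(FALSE, TRUE,  FALSE, FALSE),
  c(TRUE,  TRUE,  FALSE, TRUE),
  c(FALSE, FALSE, FALSE, FALSE)
)

# Cohort view: percent of persons with an observed outcome at each occasion.
pct_observed <- 100 * colMeans(!miss)

# At risk at occasion k: k is the first occasion, or occasion k - 1 was observed.
at_risk   <- cbind(TRUE, !miss[, -ncol(miss), drop = FALSE])
n_at_risk <- colSums(at_risk)
n_missing <- colSums(miss & at_risk)
hazard    <- n_missing / n_at_risk

data.frame(seq = seq_len(ncol(miss)), pct_observed, n_at_risk, n_missing, hazard)
# pct_observed: 75, 50, 75, 50
# n_at_risk:    4, 3, 2, 3
# hazard:       0.25, 0.333, 0.5, 0.333
```

In this toy example the third person is missing at occasions 1 and 2, so the row at occasion 2 is not at risk and does not enter the hazard numerator. This is why the hazard and the cohort fraction can diverge under intermittent missingness.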
rpt <- ild_missingness_report(
x,
outcome = "mood",
predictors = "stress",
fit_missing_model = TRUE,
random = FALSE,
cohort_plot = FALSE
)
names(rpt)
#> [1] "compliance" "pattern" "cohort" "hazard"
#> [5] "flags" "missing_model" "snippets"
rpt$snippets["overview"]
#> overview
#> "Outcome mood was summarized with tidyILD person-level compliance, cohort observed fractions by occasion (.ild_seq), and a discrete-time hazard of first missing row (ordinal schedule; see ?ild_missing_hazard_first)."
tidyILD does not fit selection models, pattern-mixture models, or joint models for MNAR. Consider external packages and pre-specified sensitivity analyses. The snippets in ild_missingness_report() remind readers that logistic missingness models are diagnostic / sensitivity tools, not proof of MAR.
If you fit ild_missing_model(), you can feed predicted
probabilities into ild_ipw_weights() and
ild_ipw_refit() for inverse-probability
weighting (see ?ild_ipw_weights and causal vignettes). This
addresses observed confounding of missingness under a
MAR-like weighting story; it is not a
blanket MNAR solution.
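The weighting idea can be sketched in base R without the package helpers. The example below is an assumption-laden illustration on simulated data, where observation depends only on an observed covariate. It does not reproduce the signatures of ild_ipw_weights() or ild_ipw_refit(), which are documented in their help pages.

```r
# Simulated illustration only: the full-data mean of mood is 0 by construction.
set.seed(2)
n         <- 2000
stress    <- rnorm(n)
mood_full <- 0.8 * stress + rnorm(n)

# Higher stress -> lower chance the prompt is answered (depends on observed stress only).
observed <- runif(n) < plogis(0.5 - 1.2 * stress)
mood     <- ifelse(observed, mood_full, NA)

# Step 1: model the probability of being observed from observed covariates.
fit_obs <- glm(observed ~ stress, family = binomial())
p_obs   <- fitted(fit_obs)

# Step 2: weight each observed row by the inverse of that probability.
w <- 1 / p_obs[observed]

# Step 3: compare the unweighted complete-case mean with the weighted mean.
cc_mean  <- mean(mood, na.rm = TRUE)
ipw_mean <- weighted.mean(mood[observed], w)
c(complete_case = cc_mean, ipw = ipw_mean)
```

The weighted mean moves back toward the full-data mean only because the observation model here contains the variable that drives missingness. If missingness also depended on the unobserved mood value itself, the same weights would not remove the bias.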
Compare complete-case vs full mixed model (same formula):
x_cc <- dplyr::filter(x, !is.na(mood))
fit_full <- ild_lme(mood ~ stress + (1 | id), data = x, warn_uncentered = FALSE)
fit_cc <- ild_lme(mood ~ stress + (1 | id), data = x_cc, warn_uncentered = FALSE)
Run multiple imputation outside tidyILD, then call ild_prepare() on each imputed dataset and pool with mice / mitools / brms; keep the imputation model and substantive model aligned with your estimand.
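Pooling across imputed datasets usually follows Rubin's rules, which the packages named above apply for you. As a hedged base-R sketch of the arithmetic for a single coefficient (the helper name pool_rubin and the input numbers are hypothetical and purely illustrative, not results from any fit):

```r
# Hypothetical helper: Rubin's rules for one scalar estimate.
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)               # pooled point estimate
  u_bar <- mean(se^2)              # average within-imputation variance
  b     <- var(est)                # between-imputation variance
  t_var <- u_bar + (1 + 1 / m) * b # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

# Made-up per-imputation estimates and standard errors (m = 5).
est <- c(0.50, 0.54, 0.46, 0.52, 0.48)
se  <- rep(0.10, 5)

pool_rubin(est, se)
# estimate = 0.5, se = sqrt(0.0112), about 0.106
```

The pooled standard error exceeds each per-imputation standard error because it adds the between-imputation spread, which reflects uncertainty due to the missing values.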
vignette("tidyILD-workflow", package = "tidyILD"),
vignette("msm-identification-and-recovery", package = "tidyILD"),
?ild_diagnose, ?ild_missing_pattern,
?ild_missingness_report.