Missingness in ILD: diagnostics and sensitivity routes

Why missingness matters in intensive longitudinal data

In EMA and diary studies, missing responses are often non-ignorable in substance even when analysts assume missing at random (MAR) for estimation: burden, symptom severity, context, or device issues can co-determine both whether a prompt is answered and the outcome. tidyILD does not replace dedicated missing-data software; it gives structured diagnostics, person-level adherence views, time-oriented summaries, and hooks to IPW-based sensitivity workflows already in the package.

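MNAR (missing not at random) means missingness depends on unobserved values or latent states, even after conditioning on what was observed. The observed data alone cannot distinguish MAR from MNAR, so no routine plot or test settles the question. Use several sensitivity routes and report them transparently.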

Types of missingness (useful labels)

The ordinal occasion index .ild_seq (from ild_prepare()) is the default backbone for “wave” summaries. It orders occasions but does not imply equal calendar spacing; see vignette("ild-decomposition-and-spacing", package = "tidyILD") when timing is irregular.

Descriptive profiling: ild_missing_pattern() and heatmaps

ild_missing_pattern() tabulates NA rates by variable and by person, and builds a person × occasion heatmap (sequence index on the x-axis). Pass outcome to enrich by_id with compliance metrics from ild_missing_compliance() (see below).

library(tidyILD)
set.seed(11)
d <- ild_simulate(n_id = 25, n_obs_per = 12, seed = 11)
d$stress <- rnorm(nrow(d))
d$mood <- d$y
miss_i <- sample(nrow(d), 45)
d$mood[miss_i] <- NA
x <- ild_prepare(d, id = "id", time = "time")
mp <- ild_missing_pattern(x, vars = c("mood", "stress"), outcome = "mood")
mp$summary
#> # A tibble: 2 × 4
#>   var    n_obs  n_na pct_na
#>   <chr>  <int> <int>  <dbl>
#> 1 mood     300    45     15
#> 2 stress   300     0      0
head(mp$by_id, 3)
#> # A tibble: 3 × 10
#>   .ild_id mood_n_obs mood_n_na stress_n_obs stress_n_na n_rows n_obs_outcome
#>     <int>      <int>     <int>        <int>       <int>  <int>         <int>
#> 1       1         10         2           12           0     12            10
#> 2       2          9         3           12           0     12             9
#> 3       3         11         1           12           0     12            11
#> # ℹ 3 more variables: pct_nonmissing_outcome <dbl>, longest_run_observed <int>,
#> #   monotone_missing <lgl>

Plot the same view with ild_plot(x, type = "missingness", var = "mood") (see ?ild_plot).
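The variable-level table is simple arithmetic. The minimal base R sketch below shows what it measures. The helper na_summary() is hypothetical and not part of tidyILD. It reports total rows as n_rows, and tidyILD's own column conventions may differ.

```r
# Hypothetical helper (not a tidyILD function): NA counts and rates per variable.
na_summary <- function(data, vars) {
  n_na <- vapply(vars, function(v) sum(is.na(data[[v]])), integer(1))
  data.frame(
    var    = vars,
    n_rows = nrow(data),                       # total rows, missing or not
    n_na   = unname(n_na),                     # rows where the variable is NA
    pct_na = 100 * unname(n_na) / nrow(data)   # percentage of rows missing
  )
}

toy <- data.frame(mood = c(1, NA, 3, NA), stress = c(0.2, 0.1, -0.4, 0.3))
na_summary(toy, c("mood", "stress"))
```

On the prepared data above, na_summary(x, c("mood", "stress")) should give the same NA counts as mp$summary.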

Person-level compliance: ild_missing_compliance()

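tidyILD::ild_missing_compliance() returns one row per .ild_id with outcome compliance metrics. These are the columns that ild_missing_pattern() appends to by_id when outcome is supplied, for example n_obs_outcome, pct_nonmissing_outcome, longest_run_observed, and monotone_missing.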

cm <- ild_missing_compliance(x, outcome = "mood", expected_occasions = 12L)
summary(cm$pct_nonmissing_outcome)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>   58.33   83.33   83.33   85.00   91.67  100.00
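The person-level metrics can be sketched in base R. This helps clarify what a "longest observed run" or "monotone" pattern means. The definitions below are assumptions and may not match tidyILD exactly. Here "monotone" means every row after the first missing row is also missing, and a person with no missing rows is coded FALSE. The helper person_compliance() is hypothetical, and it assumes rows are sorted by occasion within person.

```r
# Hypothetical helper (not a tidyILD function): per-person compliance summary.
# Assumes rows are already sorted by occasion within each person.
person_compliance <- function(y, id) {
  groups <- split(y, id)
  rows <- lapply(names(groups), function(g) {
    obs  <- !is.na(groups[[g]])
    runs <- rle(obs)  # runs of consecutive observed / missing rows
    longest <- if (any(runs$values)) max(runs$lengths[runs$values]) else 0L
    first_na <- match(FALSE, obs)  # NA if the person has no missing rows
    # Assumed definition: monotone = no observed row after the first NA
    monotone <- !is.na(first_na) && !any(obs[first_na:length(obs)])
    data.frame(id = g, n_rows = length(obs), n_obs = sum(obs),
               pct_nonmissing = 100 * mean(obs),
               longest_run_observed = longest,
               monotone_missing = monotone)
  })
  do.call(rbind, rows)
}

pc <- person_compliance(
  y  = c(1, NA, NA, 2, 3, 4, NA, 5, NA),
  id = rep(1:3, each = 3)
)
pc
```

Person 1 drops out (monotone), person 2 is complete, and person 3 has intermittent missingness. With the vignette data, the call would be person_compliance(x$mood, x$.ild_id).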

When to use ild_missing_model() and ild_missing_bias()

If predictors are associated with missingness, raw summaries of the observed outcome (for example, occasion-wise means) can be biased even when a mixed model uses all available rows, because the composition of who contributes at each occasion may shift. Treat comparisons of descriptive means across missingness patterns as exploratory, not causal.
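The core idea of a missingness model is a logistic regression of a "prompt not answered" indicator on observed predictors. The base R sketch below shows that idea only. The helper fit_missingness_glm() is hypothetical and is not how ild_missing_model() is implemented. It also ignores clustering within persons.

```r
# Hypothetical helper: logistic model for P(outcome missing | predictors).
# No random effects; a descriptive screen, not a test of MAR.
fit_missingness_glm <- function(data, outcome, predictors) {
  d <- data
  d$.missing <- as.integer(is.na(d[[outcome]]))  # 1 = prompt not answered
  glm(reformulate(predictors, response = ".missing"),
      family = binomial(), data = d)
}

toy <- data.frame(
  stress = c(-1.2, -0.5, 0.1, 0.4, 0.9, 1.3, 1.8, -0.8),
  mood   = c(2.1, 1.7, NA, 1.2, NA, 0.8, NA, 1.9)
)
m <- fit_missingness_glm(toy, "mood", "stress")
exp(coef(m))  # odds of a missing prompt per unit of stress
```

A person-level random intercept (which the random argument of ild_missing_model() suggests is available there) would require a mixed logistic model, for example lme4::glmer().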

Complete-case vs mixed models (careful wording)

A linear mixed model fitted to all available rows maximizes the likelihood of the observed outcomes, with the random effects integrated out. Under MAR, with correctly specified mean and covariance structures, inference for the outcome model can then be appropriate while ignoring the missingness mechanism (likelihood-based inference). That statement has scope limits:

tidyILD encourages comparing descriptives and fits on full vs complete-case data as a coarse sensitivity check, not a formal test.
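The likelihood argument can be written as a short derivation. The notation is ours, not tidyILD's, and covariates are suppressed. Let $Y_i = (Y_{i,\text{obs}}, Y_{i,\text{mis}})$ be person $i$'s outcome vector and $R_i$ the response indicators. Let $b_i$ be the random effects, $\theta$ the outcome-model parameters, and $\psi$ the missingness-model parameters.

```latex
\begin{aligned}
f(y_{i,\text{obs}}, r_i \mid \theta, \psi)
  &= \int f(y_{i,\text{obs}}, y_{i,\text{mis}} \mid \theta)\,
      f(r_i \mid y_{i,\text{obs}}, y_{i,\text{mis}}, \psi)\, dy_{i,\text{mis}} \\
  &= f(r_i \mid y_{i,\text{obs}}, \psi)\, f(y_{i,\text{obs}} \mid \theta)
      \qquad \text{(MAR: } f(r_i \mid y_{i,\text{obs}}, y_{i,\text{mis}}, \psi) = f(r_i \mid y_{i,\text{obs}}, \psi)\text{)}, \\
f(y_{i,\text{obs}} \mid \theta)
  &= \int f(y_{i,\text{obs}} \mid b_i, \theta)\, f(b_i \mid \theta)\, db_i .
\end{aligned}
```

If $\theta$ and $\psi$ are distinct, the first factor carries no information about $\theta$, so maximizing the observed-data mixed-model likelihood is enough. Under MNAR, the missingness factor still depends on $y_{i,\text{mis}}$ and cannot be dropped.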

Cohort-level and hazard summaries

coh <- ild_missing_cohort(x, outcome = "mood", plot = FALSE)
head(coh$by_occasion)
#> # A tibble: 6 × 4
#>   .ild_seq n_rows n_obs pct_observed
#>      <int>  <int> <int>        <dbl>
#> 1        1     25    23           92
#> 2        2     25    21           84
#> 3        3     25    21           84
#> 4        4     25    24           96
#> 5        5     25    19           76
#> 6        6     25    21           84
head(ild_missing_hazard_first(x, outcome = "mood"))
#> # A tibble: 6 × 4
#>   .ild_seq n_at_risk n_missing hazard
#>      <int>     <int>     <int>  <dbl>
#> 1        1        25         2 0.08  
#> 2        2        23         3 0.130 
#> 3        3        21         2 0.0952
#> 4        4        21         0 0     
#> 5        5        24         5 0.208 
#> 6        6        19         3 0.158
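The by-occasion cohort table can be reproduced by hand, which is useful when checking a custom schedule. Below is a minimal base R sketch, assuming one row per person-occasion. The helper cohort_by_occasion() is hypothetical.

```r
# Hypothetical helper: observed fraction of the outcome at each occasion index.
cohort_by_occasion <- function(y, occasion) {
  n_rows <- tapply(y, occasion, length)       # rows at each occasion (NA included)
  n_obs  <- tapply(!is.na(y), occasion, sum)  # rows with a non-missing outcome
  data.frame(
    occasion     = as.integer(names(n_rows)),
    n_rows       = as.integer(n_rows),
    n_obs        = as.integer(n_obs),
    pct_observed = 100 * as.numeric(n_obs) / as.numeric(n_rows)
  )
}

co <- cohort_by_occasion(y = c(1, NA, 2, 3, NA, NA), occasion = c(1, 1, 2, 2, 3, 3))
co
```

With the vignette data, cohort_by_occasion(x$mood, x$.ild_seq) should match the n_rows, n_obs, and pct_observed columns of coh$by_occasion.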

One entry point: ild_missingness_report()

ild_missingness_report() bundles compliance, ild_missing_pattern() (with outcome enrichment), cohort and hazard tables, optional ild_missing_model(), the same late-dropout heuristic used in guardrails (GR_DROPOUT_LATE_CONCENTRATION), and short snippets for methods text.

rpt <- ild_missingness_report(
  x,
  outcome = "mood",
  predictors = "stress",
  fit_missing_model = TRUE,
  random = FALSE,
  cohort_plot = FALSE
)
names(rpt)
#> [1] "compliance"    "pattern"       "cohort"        "hazard"       
#> [5] "flags"         "missing_model" "snippets"
rpt$snippets[["overview"]]
#> [1] "Outcome mood was summarized with tidyILD person-level compliance, cohort observed fractions by occasion (.ild_seq), and a discrete-time hazard of first missing row (ordinal schedule; see ?ild_missing_hazard_first)."

MNAR as sensitivity (no single fix)

tidyILD does not fit selection models, pattern-mixture models, or joint models for MNAR. For those, consider external packages and pre-specified sensitivity analyses. The snippets from ild_missingness_report() remind readers that logistic missingness models are diagnostic and sensitivity tools, not proof of MAR.
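One common pre-specified MNAR sensitivity analysis is delta adjustment, a tipping-point analysis in the pattern-mixture spirit. You impute missing outcomes under MAR, shift the imputations by an offset δ, and see how far δ must move before conclusions change. The sketch below is a crude single-imputation version for intuition only, and it is not a tidyILD feature. The helper delta_adjusted_mean() and its reliance on columns named mood and stress are assumptions. It also ignores person-level clustering and imputation uncertainty, which a real analysis (for example, multiple imputation with a multilevel imputation model) would handle.

```r
# Hypothetical helper: delta-adjusted outcome mean (crude single imputation).
# Assumes columns `mood` (outcome, may be NA) and `stress` (fully observed).
delta_adjusted_mean <- function(data, deltas) {
  fit  <- lm(mood ~ stress, data = data)     # MAR imputation model on observed rows
  miss <- is.na(data$mood)
  pred <- predict(fit, newdata = data[miss, , drop = FALSE])
  sapply(deltas, function(d) {
    y <- data$mood
    y[miss] <- pred + d                      # shift imputations by the MNAR offset
    mean(y)
  })
}

toy <- data.frame(stress = c(0, 1, 2, 3), mood = c(0, 1, 2, NA))
delta_adjusted_mean(toy, deltas = c(0, -2))
```

Negative δ encodes the assumption that unanswered prompts had lower mood than a MAR model predicts. Reporting the estimate across a grid of δ values shows how sensitive the conclusion is to that assumption.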

IPW and causal tools as one sensitivity route

If you fit ild_missing_model(), you can pass the fitted model to ild_ipw_weights() to turn its predicted probabilities into inverse-probability weights, and then refit with ild_ipw_refit() (see ?ild_ipw_weights and the causal vignettes). This can adjust for missingness that depends on observed variables, which is a MAR-type assumption. It is not a blanket MNAR solution.

mm <- ild_missing_model(x, outcome = "mood", predictors = c("stress"), random = TRUE)
x_w <- ild_ipw_weights(x, mm, stabilize = TRUE)
fit_w <- ild_ipw_refit(mood ~ stress + (1 | id), data = x_w, weights = ".ipw")
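The weighting logic can be made concrete with a minimal base R sketch of stabilized inverse-probability-of-response weights. First, fit P(observed | predictors) by logistic regression. Then weight each observed row by the marginal response rate divided by its predicted response probability. This illustrates the general technique only, and it is not how ild_ipw_weights() is implemented. The helper is hypothetical, it ignores clustering, and it treats occasions independently rather than accumulating dropout probabilities over time.

```r
# Hypothetical helper: stabilized inverse-probability-of-response weights.
# Assumes no missing values in the predictors, so fitted() aligns with rows.
ipw_response_weights <- function(data, outcome, predictors) {
  d <- data
  d$.observed <- as.integer(!is.na(d[[outcome]]))  # 1 = prompt answered
  fit <- glm(reformulate(predictors, response = ".observed"),
             family = binomial(), data = d)
  # Stabilized weight: marginal P(observed) / P(observed | predictors)
  w <- unname(mean(d$.observed) / fitted(fit))
  w[d$.observed == 0] <- NA  # weights apply only to observed rows
  w
}

toy <- data.frame(stress = c(0, 0, 1, 1, 1), mood = c(1, NA, 2, 3, NA))
w <- ipw_response_weights(toy, "mood", "stress")
w
```

Rows from groups with lower predicted response get larger weights, so they stand in for similar rows that went unanswered.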

Other templates (not evaluated here)

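Compare a completers-only fit with the full mixed model (same formula). A mixed model that drops rows with a missing outcome already uses the same rows as filtering on !is.na(mood), so restricting to persons with no missing outcome gives a more informative contrast:

# Persons with at least one missing outcome row
ids_incomplete <- unique(x$id[is.na(x$mood)])
# Completers: persons observed at every occasion
x_cc <- dplyr::filter(x, !id %in% ids_incomplete)
fit_full <- ild_lme(mood ~ stress + (1 | id), data = x, warn_uncentered = FALSE)
fit_cc <- ild_lme(mood ~ stress + (1 | id), data = x_cc, warn_uncentered = FALSE)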

For multiple imputation, impute outside tidyILD, run ild_prepare() on each imputed dataset, and pool the fits with mice, mitools, or brms. Keep the imputation model and the substantive model aligned with your estimand.
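mice::pool() and mitools::MIcombine() apply Rubin's rules for you. The short sketch below shows the core arithmetic for a single coefficient, so the pooled standard error is not a black box. The helper pool_rubin() is hypothetical and omits the degrees-of-freedom adjustment.

```r
# Hypothetical helper: Rubin's rules for one coefficient across m imputations.
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)             # pooled point estimate
  w     <- mean(se^2)            # within-imputation variance
  b     <- var(est)              # between-imputation variance
  t_var <- w + (1 + 1 / m) * b   # total variance
  c(estimate = q_bar, se = sqrt(t_var))
}

r <- pool_rubin(est = c(1, 2, 3), se = c(1, 1, 1))
r
```

The between-imputation term grows when imputations disagree, which is how imputation uncertainty enters the pooled standard error.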

What tidyILD does not do (and where to look)

See also

vignette("tidyILD-workflow", package = "tidyILD"), vignette("msm-identification-and-recovery", package = "tidyILD"), ?ild_diagnose, ?ild_missing_pattern, ?ild_missingness_report.