mfrmr 0.2.0

Documentation accuracy pass plus research-grounded visualization and GPCM bias-inference refinements. Documentation, citations, and band attributions are corrected against primary sources, with mathematical screening-SE corrections, Snijders-corrected person-fit reporting where the assumptions are met, and clearer plot data for review.

This release keeps the 0.1.6 defaults, but it is not only an infrastructure polish release. Public review helpers have been consolidated on the *_review* names documented below, and the former *_audit* public spellings, S3 compatibility classes, and duplicate top-level fields have been removed as a deliberate breaking cleanup.

Release overview

For most users, the main changes in 0.2.0 are:

Breaking changes in 0.2.0 are intentional and concentrated around public naming clarity: former exported *_audit* helper names and their compatibility S3 classes were removed in favour of the canonical *_review* surface. Model defaults from 0.1.6 are retained.

The detailed notes below are organized as follows:

User pathways, output contracts, and terminology

Mathematical and inferential corrections

Research-grounded visualization refinements

Recovery simulation workflow

Citation and attribution corrections

Documentation refinements

Default changes

No defaults change between 0.1.6 and 0.2.0. The 0.1.6 defaults (quad_points = 31, diagnostic_mode = "both", plot.mfrm_fit(type = "wright"), keep_original = FALSE) are retained.

Note for users upgrading directly from CRAN 0.1.5 to 0.2.0 (skipping intermediate 0.1.6 builds): three defaults were flipped in 0.1.6 and remain on those values in 0.2.0 – diagnose_mfrm(diagnostic_mode) went from "legacy" to "both", plot(fit) returns the Wright map alone instead of a three-plot overview (the overview is still available via plot(fit, type = "bundle")), and fit_mfrm(quad_points) went from 15 to 31. See the “mfrmr 0.1.6” section below for the full description and revert paths.

New features

Continuous integration

New GitHub Actions workflows added alongside the existing pkgdown.yaml: R-CMD-check.yaml runs the matrix on Ubuntu (release / devel / oldrel-1) plus macos-latest and windows-latest (release), and test-coverage.yaml runs covr with artifact upload (no external service contacted).

Differential-functioning display controls

plot_dif_heatmap() gains display controls for cell labels (show_values, value_digits), absolute flag thresholds (flag_threshold, flag_color), and shared symmetric color limits (scale_limit) so several heatmaps can be drawn on a comparable scale.

plot_dif_summary() gains optional normal-approximation confidence intervals, effect-threshold guide lines, method-aware axis labels, and interpretation-guide data that downstream code can render alongside the figure.
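As a rough illustration of what a normal-approximation interval involves, the sketch below computes 95% limits from an estimate and its standard error in base R. The contrasts, standard errors, and column names are hypothetical and are not the columns that plot_dif_summary() returns.

```r
# Hypothetical DIF contrasts (logits) and their standard errors.
contrast <- c(ItemA = 0.62, ItemB = -0.18, ItemC = 0.05)
se <- c(0.21, 0.19, 0.25)

# Two-sided 95% normal quantile.
z <- qnorm(0.975)

ci <- data.frame(
  Level = names(contrast),
  Estimate = unname(contrast),
  Lower = unname(contrast - z * se),
  Upper = unname(contrast + z * se)
)

# An interval that excludes zero is the usual screening signal.
ci$ExcludesZero <- ci$Lower > 0 | ci$Upper < 0
ci
```

These are screening intervals under a normality assumption, so they are best read alongside the effect-threshold guide lines rather than as formal tests.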

Plot data printing

print.mfrm_plot_data() is now defined, so the headline draw = FALSE return value renders as a compact summary (name, title, reusable data shapes, legend / reference-line counts) instead of a raw list dump.

Bounded GPCM fair-average and bias unblock (slope-aware)

fair_average_table() and estimate_bias() no longer hard-stop on GPCM fits. Both helpers now use the slope-aware element-conditional GPCM construction:

Both helpers gain method = "GPCM-slope-aware" and a caveat field that names the slope convention. For fair averages, the original SE columns remain measure-level SEs, while fair_se = TRUE adds structural delta-method fair-average SEs for non-person rows when the MML Hessian is available. For bias values, the SE / t / Prob columns retain their conditional screening interpretation. See ?fair_average_table, ?estimate_bias, and gpcm_capability_matrix() for the full support contract.

build_apa_outputs(), facets_output_contract_review(), and facets_output_file_bundle(include = "score") remain blocked under GPCM in 0.2.0; they require the same SE infrastructure to ship as publication-quality outputs.

Bug fixes

Documentation

Build hygiene

.Rbuildignore now tightens the inst/references/ source-package boundary. The two runtime / user-facing files in that directory – facets_column_contract.csv (read at runtime by facets_output_contract_review()) and FACETS_manual_mapping.md (the FACETS Table to mfrmr helper mapping cited in the README) – are preserved.

Performance note

The cpp11 MML backend (src/mml_backend.cpp, RSM and PCM only) is opt-in via options(mfrmr.use_cpp11_backend = TRUE) for this release. It is validated against the pure-R reference at tolerance = 1e-12 on a fixed regression fixture. The default flip to ON is planned for a follow-up release after a cycle of community testing.
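Because the switch is an ordinary R option, it can be set at the top of a script and restored afterwards. In the sketch below only the option name comes from these notes; the save-and-restore pattern is a generic base-R idiom.

```r
# Opt in to the cpp11 MML backend for this session (RSM and PCM fits only).
old <- options(mfrmr.use_cpp11_backend = TRUE)

# Confirm the option is active before fitting.
isTRUE(getOption("mfrmr.use_cpp11_backend"))

# ... run fits here ...

# Restore the previous setting when done.
options(old)
```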

Deferred to a follow-up release

Considered for 0.2.0 but not shipped; carried over to a later release:

These are scheduled for a follow-up release.

mfrmr 0.1.6

This release adds empirical-Bayes shrinkage for small-N facets, a hierarchical-structure and sample-adequacy review layer, integrated missing-code pre-processing, APA output adapters for Word / HTML, model-estimated two-way non-person facet interactions, confidence-interval propagation through the plot surface and the ICC reporting family, and expanded reproducibility manifests. Six bug fixes close issues that affected bias statistics, ZSTD sign, input validation, and graphical state hygiene.

Default changes (three breaking flips)

Three default values change in this release. Scripts that explicitly pass the old value are unaffected; scripts that rely on the default should be reviewed.

New features

Model-estimated facet interactions

fit_mfrm() gains facet_interactions for confirmatory two-way interactions between non-person facets in RSM and PCM fits, for example facet_interactions = "Rater:Criterion". These terms are estimated simultaneously with the main MFRM parameters as fixed effects under zero marginal-sum constraints, contributing (A - 1) * (B - 1) free parameters for an A x B interaction block.
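The (A - 1) * (B - 1) count follows directly from the zero marginal-sum constraints. The base-R sketch below uses a hypothetical 4 x 3 block of made-up values; it illustrates the constraint arithmetic only and is not the package's estimation code.

```r
# Illustrative only: a hypothetical 4 x 3 Rater-by-Criterion interaction block.
A <- 4L
B <- 3L
set.seed(1)
raw <- matrix(rnorm(A * B), nrow = A, ncol = B)

# Double-centering imposes zero sums across every row and every column.
centered <- raw - rowMeans(raw) - rep(colMeans(raw), each = A) + mean(raw)
max(abs(rowSums(centered)))  # numerically zero
max(abs(colSums(centered)))  # numerically zero

# Only the leading (A - 1) x (B - 1) cells are free: the last column and
# the last row are determined by the zero-sum constraints.
free <- centered[-A, -B, drop = FALSE]
rebuilt <- rbind(cbind(free, -rowSums(free)), NA)
rebuilt[A, ] <- -colSums(rebuilt[-A, , drop = FALSE])

n_free <- (A - 1L) * (B - 1L)
isTRUE(all.equal(rebuilt, centered, check.attributes = FALSE))
```

Rebuilding the full block from the free cells shows why adding a Rater:Criterion term to a 4-rater, 3-criterion design costs six parameters rather than twelve.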

New supporting pieces:

The feature is intentionally narrow for the initial CRAN-facing release: person-involving interactions, higher-order interactions, GPCM interactions, and random-effect facet interactions are deferred. Residual bias screening via estimate_bias() and estimate_all_bias() remains separate from these model-estimated fixed effects.

Empirical-Bayes facet shrinkage

fit_mfrm(..., facet_shrinkage = "empirical_bayes") applies James-Stein / empirical-Bayes shrinkage to each non-person facet’s fixed-effect estimates. fit$facets$others gains ShrunkEstimate, ShrunkSE, and ShrinkageFactor columns, and fit$shrinkage_report summarises the per-facet prior variance, mean shrinkage, and effective degrees of freedom.

The estimator is the classical method-of-moments form (Efron & Morris, 1973):

Two post-hoc helpers make shrinkage available to existing fits:

The "laplace" alias currently routes to the empirical-Bayes path and is reserved for a future penalised-MML implementation.
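For orientation, a generic textbook method-of-moments shrinkage calculation can be sketched in base R as follows. The estimates, standard errors, and variable names are hypothetical, and the formulas are the common normal-normal form; they are an assumption for illustration and may differ in detail from what fit_mfrm() computes.

```r
# Hypothetical facet-level estimates (logits) and their standard errors.
est <- c(-0.80, -0.25, 0.10, 0.35, 0.60)
se  <- c( 0.40,  0.15, 0.20, 0.35, 0.25)

grand_mean <- mean(est)

# Method-of-moments prior variance: observed spread minus the average
# sampling variance, truncated at zero.
tau2_hat <- max(0, var(est) - mean(se^2))

# Per-level shrinkage factor: the share of variance attributed to signal.
shrink <- tau2_hat / (tau2_hat + se^2)

# Noisier levels are pulled harder toward the grand mean.
shrunk_est <- grand_mean + shrink * (est - grand_mean)

# Posterior SD under a normal prior, treating tau2_hat as known
# (this ignores the uncertainty in tau2_hat itself).
shrunk_se <- sqrt(shrink) * se

data.frame(est, se, shrink, shrunk_est, shrunk_se)
```

The level with the largest standard error receives the smallest shrinkage factor, which is the behaviour that matters for small-N facet levels.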

Integration: summary(fit) exposes FacetShrinkage and FacetShrinkageTau2Mean; build_apa_outputs() adds a Method-section sentence naming the mode, mean tau_hat^2, and mean shrinkage with an Efron & Morris (1973) citation; build_mfrm_manifest() gains a shrinkage_audit table; reporting_checklist() gains an “Empirical-Bayes shrinkage” item.

Hierarchical structure and sample-adequacy review

Five new exported functions describe the observed design, flag small-N facet levels, and quantify ICC / design effect. Estimation remains fixed-effects MFRM; these helpers are purely descriptive and do not alter the fit.

Fit- and reporting-stack integration:

Optional dependencies igraph and lme4 move to Suggests; when either is absent the relevant report is omitted with a clear message().

Missing-code pre-processing in the fit call

fit_mfrm() now accepts missing_codes = NULL | TRUE | "default" | <character vector>, forwarded to prepare_mfrm_data(), review_mfrm_anchors(), and describe_mfrm_data(). When active, the standard FACETS / SPSS / SAS sentinels ("99", "999", "-1", "N", "NA", "n/a", ".", "" by default, or any caller-supplied set) are converted to NA on the person, facets, and score columns before any downstream processing. Replacement counts are recorded in fit$prep$missing_recoding and surfaced through build_mfrm_manifest()$missing_recoding. The default (missing_codes = NULL) is strictly backward-compatible.

A standalone recode_missing_codes() helper is also exported for users who prefer to recode before calling fit_mfrm().
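The recoding step itself is simple to picture. The base-R sketch below uses a hypothetical helper, recode_sentinels(), with the default sentinel set listed above; it is not the package's recode_missing_codes(), whose signature and matching rules (for example case or whitespace handling) are not reproduced here.

```r
# Hypothetical helper, not the package's recode_missing_codes().
recode_sentinels <- function(data, columns,
                             codes = c("99", "999", "-1", "N", "NA",
                                       "n/a", ".", "")) {
  counts <- integer(length(columns))
  names(counts) <- columns
  for (col in columns) {
    values <- as.character(data[[col]])
    hit <- !is.na(values) & values %in% codes   # exact matches only
    counts[[col]] <- sum(hit)                   # record replacements
    values[hit] <- NA_character_
    data[[col]] <- values
  }
  list(data = data, replaced = counts)
}

ratings <- data.frame(
  Person = c("P1", "P2", "N", "P4"),
  Rater  = c("R1", "", "R2", "R1"),
  Score  = c("3", "99", "2", "."),
  stringsAsFactors = FALSE
)

out <- recode_sentinels(ratings, c("Person", "Rater", "Score"))
out$replaced

# Scores can be converted once the sentinels are gone.
out$data$Score <- as.integer(out$data$Score)
out$data
```

Keeping the per-column replacement counts is what makes the recoding auditable later, which is the role the manifest entry plays in the package.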

APA output adapters

kableExtra and flextable join Suggests.

Shrinkage and review visualisations

All three methods follow the existing preset = c("standard", "publication", "compact") convention and use base-R graphics.

Confidence intervals across the plot surface

Additional visualisations

Fourteen additions across the plot surface, all base-R / additive (default behaviours unchanged):

igraph is already in Suggests; the equating-graph view falls back to the bar chart when igraph is not installed.

Expanded test coverage

Direct regression tests for the 0.1.6 additions:

Internal architecture

row_max_fast() and the three category_prob_* polytomous-response kernels are now in R/core-category-probabilities.R instead of inline in R/mfrm_core.R. Pure file-level reorganization; no behaviour change. The remaining structural split of mfrm_core.R (likelihood / optimizer / EM / gradients / prep / report tables) is scheduled for a future release.

Package-level MnSq misfit threshold

mfrm_misfit_thresholds() returns the lower / upper Linacre acceptance band that mfrmr screens use when flagging element-level Infit / Outfit MnSq misfit. Defaults are c(lower = 0.5, upper = 1.5) and can be overridden globally via R options:

Helpers that consume the band include summary(diagnose_mfrm(...)) (misfit_flagged block + key_warnings auto-flag), build_misfit_casebook() (the new element_fit source family), the bias / misfit narrative inside build_apa_outputs(), and facet_quality_dashboard() when misfit_warn = NULL. Setting the options once at the top of an analysis script therefore changes every downstream screen at once.
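To make the band concrete, here is a base-R sketch of a screen that flags MnSq values outside a lower / upper pair. The function flag_mnsq(), the labels, and the values are hypothetical; only the default band c(lower = 0.5, upper = 1.5) comes from these notes.

```r
# Hypothetical screen; the default band mirrors the documented defaults.
flag_mnsq <- function(mnsq, band = c(lower = 0.5, upper = 1.5)) {
  ifelse(is.na(mnsq), NA_character_,
         ifelse(mnsq < band[["lower"]], "overfit",
                ifelse(mnsq > band[["upper"]], "underfit", "ok")))
}

# Made-up Infit MnSq values for four raters.
infit <- c(R1 = 0.42, R2 = 0.97, R3 = 1.50, R4 = 1.83)

flag_mnsq(infit)
flag_mnsq(infit, band = c(lower = 0.7, upper = 1.3))
```

A value exactly on the boundary (R3 at 1.50) passes the default band but is flagged under the narrower one, which shows how much a global band change can move the screening results.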

Additional secondary plots

Four new public helpers extend the diagnostic plot family:

plot_bubble() gains a view = c("measure", "infit_outfit") argument. The default "measure" keeps the historical Measure (logit) x MnSq bubble layout; view = "infit_outfit" switches to the Winsteps Table 30 layout (Infit MnSq on x, Outfit MnSq on y, bubble size defaults to N). Both views return the same mfrm_plot_data contract.

plot_dif_heatmap(draw = FALSE) now returns an mfrm_plot_data object whose data$matrix is the metric matrix (was previously the bare matrix only).

plot_information(..., draw = FALSE) outputs now include a series field listing which curves the legend describes ("Information", "SE", or both for type = "both"), so downstream ggplot2 re-renderers can map the right column without inspecting type manually.

Reporting surface enrichments

Internal architecture: file split

To improve navigability of the core estimation engine, four self-contained sections moved out of R/mfrm_core.R into focused files. All functions remain internal and the public API is unchanged.

R/api-simulation.R similarly grew an R/api-simulation-future-branch.R companion file holding the future-branch design-schema layer. Public simulation entry points (simulate_mfrm_data, evaluate_mfrm_design, evaluate_mfrm_diagnostic_screening, evaluate_mfrm_signal_detection) remain in R/api-simulation.R.

R/api-plotting-extras2.R was renamed to R/api-plotting-screening.R to drop the numerical suffix in favour of a functional name; tests follow the same rename.

A new tests/testthat/helper-fixtures.R exposes make_toy_fit() / make_toy_diagnostics() / local_toy_fit() helpers so future tests can reuse the standard example_core fit without retyping the load_mfrmr_data() + fit_mfrm() + diagnose_mfrm() chain.

Replay-script overhaul

export_mfrm_bundle() and build_mfrm_replay_script() now write a self-contained replay package:

Performance: diagnose_mfrm() on large designs

calc_interrater_agreement() (the inter-rater agreement helper that diagnose_mfrm() calls when Person is part of facet_cols) previously used a list() for the per-context probability lookup and c(exp_vals, ...) accumulation inside a per-row loop. This gave near-quadratic scaling: 6,400 observations took ~2 s, but 72,000 observations took ~141 s. The lookup is now an environment (hash-backed for character keys) and exp_vals is preallocated and filled by index, so the helper now scales linearly in the number of observations. On the 72,000-observation benchmark in the review, diagnose_mfrm() drops from ~141 s to ~15 s.
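The two patterns can be compared in isolation. The sketch below is a generic base-R illustration with hypothetical keys and a stand-in computation; it is not the code of calc_interrater_agreement().

```r
# Illustrative only: hypothetical context keys and a stand-in computation.
set.seed(42)
n <- 5000L
keys <- sprintf("ctx_%04d", sample.int(500L, n, replace = TRUE))
compute_value <- function(k) as.integer(substring(k, 5L)) / 500

# Slow pattern: named-list lookup plus vector growth via c().
slow <- function(keys) {
  cache <- list()
  out <- numeric(0)
  for (k in keys) {
    if (is.null(cache[[k]])) cache[[k]] <- compute_value(k)
    out <- c(out, cache[[k]])          # copies the whole vector each time
  }
  out
}

# Faster pattern: hashed environment plus a preallocated result vector.
fast <- function(keys) {
  cache <- new.env(hash = TRUE, parent = emptyenv())
  out <- numeric(length(keys))
  for (i in seq_along(keys)) {
    k <- keys[[i]]
    val <- cache[[k]]
    if (is.null(val)) {
      val <- compute_value(k)
      cache[[k]] <- val
    }
    out[i] <- val                      # fill by index, no reallocation
  }
  out
}

identical(slow(keys), fast(keys))
```

Both functions return the same values; the difference is that the second avoids the repeated copying and linear name search that dominate run time as the number of rows grows.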

The make_union_find() helper used by the connectivity audit was also rewritten with an iterative find_root (with path compression) instead of the previous recursive form. Designs whose union chain depth exceeded options(expressions) (default 5,000) no longer error out with “evaluation is too deeply nested”.
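An iterative find with path compression can be sketched in a few lines of base R. The helper make_uf() below is hypothetical and is not the package's make_union_find(); it only shows why a loop-based root search is not limited by the expression-nesting depth.

```r
# Hypothetical helper, not the package's make_union_find().
make_uf <- function(n) {
  parent <- seq_len(n)

  find_root <- function(x) {
    # First pass: walk up to the root with a loop, not recursion.
    root <- x
    while (parent[root] != root) root <- parent[root]
    # Second pass: path compression, pointing visited nodes at the root.
    while (parent[x] != root) {
      nxt <- parent[x]
      parent[x] <<- root
      x <- nxt
    }
    root
  }

  unite <- function(a, b) {
    ra <- find_root(a)
    rb <- find_root(b)
    if (ra != rb) parent[ra] <<- rb
    invisible(NULL)
  }

  list(find = find_root, unite = unite)
}

# Build a chain deeper than the default options(expressions) limit.
n <- 10000L
uf <- make_uf(n)
for (i in seq_len(n - 1L)) uf$unite(i, i + 1L)

uf$find(1L) == uf$find(n)
```

The first uf$find(1L) call walks the whole chain once and then flattens it, so later lookups on the same nodes reach the root in a single step.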

Input validation: degenerate inputs surface earlier

prepare_mfrm_data() now:

fit_mfrm() now treats NaN / Inf for maxit, reltol, and quad_points as invalid input with a localised English error, instead of falling through to R’s locale-dependent “missing value where TRUE/FALSE needed” message.
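The failure mode and the guard can be reproduced in base R. The validator check_positive_number() and its message are hypothetical; they illustrate the idea of checking is.finite() before any comparison and are not the package's internal check or wording.

```r
# Hypothetical validator, not the package's internal check.
check_positive_number <- function(x, name) {
  if (!is.numeric(x) || length(x) != 1L || !is.finite(x) || x <= 0) {
    stop(sprintf("`%s` must be a single finite positive number.", name),
         call. = FALSE)
  }
  invisible(x)
}

# Without a guard, the comparison itself fails with R's generic message,
# whose text depends on the session locale.
unguarded <- tryCatch(if (NaN > 0) "ok",
                      error = function(e) conditionMessage(e))

# With the guard, NaN and Inf are rejected before any comparison runs.
guarded_nan <- tryCatch(check_positive_number(NaN, "maxit"),
                        error = function(e) conditionMessage(e))
guarded_inf <- tryCatch(check_positive_number(Inf, "quad_points"),
                        error = function(e) conditionMessage(e))

unguarded
guarded_nan
guarded_inf
```

Because `||` short-circuits, the `x <= 0` comparison is never evaluated for NaN, which is what keeps the opaque message from appearing.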

Pre-rendered cheatsheet PDF

The two-page landscape cheatsheet now ships in pre-rendered form at system.file("cheatsheet", "mfrmr-cheatsheet.pdf", package = "mfrmr") alongside the existing .Rmd source. Users without a working LaTeX toolchain can open the PDF directly; users who want to customize it can still knit the .Rmd with rmarkdown::render(). The README and ?mfrmr package help now point at both files.

Help-page examples: “what to look for” annotations

The most-visited help pages now embed concrete interpretation comments inside their @examples blocks. Each shipped example shows what value ranges or patterns indicate “good”, what threshold or rule of thumb applies, and what follow-up to run if the value is off. Coverage in 0.1.6 includes:

Help-page examples: lighter-weight \donttest{}

Several main entry points now expose a small fast-path block (a JML fit on example_core plus a single diagnostic / plot call) before the heavier \donttest{} block. The fast path is below R CMD check’s example-time budget and provides a regression net that runs every check, while the full \donttest{} block continues to showcase the larger MML / publication-route examples. A final CRAN-timing pass keeps the active examples at lightweight maxit = 30 smoke fits so the printed examples demonstrate converged objects without returning to the original long-running example surface. Affected pages: ?fit_mfrm, ?diagnose_mfrm, ?plot_qc_dashboard, ?reporting_checklist, ?build_apa_outputs.

Documentation

Yen Q3 local-dependence statistic

q3_statistic(fit, diagnostics) returns the Yen (1984) Q3 index between every facet-level pair, with three published reporting thresholds (Yen 0.20, Marais 0.30, Christensen et al. relative 0.20) and a textual Interpretation column that names which flag(s) each pair triggered. The helper reuses the standardized-residual pivot that plot_local_dependence_heatmap() already draws, so the table and the heatmap stay numerically consistent.
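Conceptually, Q3 is the correlation between standardized residuals for a pair of levels. The base-R sketch below simulates hypothetical residuals and applies the three thresholds named above. The column names, the use of absolute values for the first two flags, and the definition of the relative flag as Q3 minus the mean Q3 are assumptions for illustration, not the documented behaviour of q3_statistic().

```r
# Hypothetical standardized residuals: rows are persons, columns are
# facet levels (for example raters). R3 and R4 share a common component.
set.seed(7)
n <- 200L
shared <- rnorm(n)
resid <- cbind(
  R1 = rnorm(n),
  R2 = rnorm(n),
  R3 = 0.7 * shared + rnorm(n, sd = 0.7),
  R4 = 0.7 * shared + rnorm(n, sd = 0.7)
)

# Q3 for every pair is the residual correlation.
q3 <- cor(resid, use = "pairwise.complete.obs")
pairs <- which(upper.tri(q3), arr.ind = TRUE)
q3_table <- data.frame(
  Level1 = colnames(q3)[pairs[, "row"]],
  Level2 = colnames(q3)[pairs[, "col"]],
  Q3 = q3[upper.tri(q3)]
)

# Apply the three reporting thresholds (assumed flag definitions).
q3_mean <- mean(q3_table$Q3)
q3_table$FlagYen      <- abs(q3_table$Q3) > 0.20
q3_table$FlagMarais   <- abs(q3_table$Q3) > 0.30
q3_table$FlagRelative <- (q3_table$Q3 - q3_mean) > 0.20
q3_table
```

In this simulated example the R3 / R4 pair is built to be locally dependent, so it should stand out from the other five pairs.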

Extended person-fit indices

compute_person_fit_indices(diagnostics, fit) adds person-level fit detail on top of the Infit / Outfit / ZSTD columns that diagnose_mfrm() already exposes:

The reported lz statistic is asymptotically standard normal under the conditional-independence assumption; |lz| > 1.96 / 2.58 are the 5% / 1% reporting flags.
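For readers who want the arithmetic, the uncorrected polytomous lz (standardized log-likelihood) can be sketched in base R as below. The probabilities and responses are hypothetical, the function lz_stat() is not part of the package, and the sketch omits any Snijders-type correction for estimated person measures.

```r
# Hypothetical model-implied category probabilities for one person:
# rows are observations, columns are score categories 0, 1, 2.
probs <- rbind(
  c(0.10, 0.30, 0.60),
  c(0.20, 0.50, 0.30),
  c(0.60, 0.30, 0.10),
  c(0.15, 0.25, 0.60),
  c(0.50, 0.40, 0.10),
  c(0.05, 0.25, 0.70)
)

lz_stat <- function(probs, observed) {
  log_p <- log(probs)
  # Observed log-likelihood: log probability of each observed category.
  idx <- cbind(seq_len(nrow(probs)), observed + 1L)
  l0 <- sum(log_p[idx])
  # Expectation and variance of the log-likelihood under the model.
  expected <- rowSums(probs * log_p)
  variance <- rowSums(probs * log_p^2) - expected^2
  (l0 - sum(expected)) / sqrt(sum(variance))
}

typical  <- c(2L, 1L, 0L, 2L, 0L, 2L)  # most likely category each time
aberrant <- c(0L, 0L, 2L, 0L, 2L, 0L)  # least likely category each time

lz_typical  <- lz_stat(probs, typical)
lz_aberrant <- lz_stat(probs, aberrant)
c(typical = lz_typical, aberrant = lz_aberrant)
abs(lz_aberrant) > 1.96
```

Large negative values indicate response patterns that are less likely than the model expects, which is the usual direction of interest for person misfit.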

Generalizability-theory adapter

mfrm_generalizability(fit) re-fits the rating data as a crossed random-effects model Score ~ 1 + (1 | Person) + (1 | Facet1) + ... via lme4::lmer and returns the canonical G / Phi coefficients plus per-source variance components. Useful when a reviewer asks for a generalizability-theory complement to the Rasch-style separation / reliability statistics that diagnose_mfrm() already emits.
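Once variance components are available, the two coefficients are simple ratios. The sketch below assumes the simplest case, persons crossed with a single rater facet and one score per cell, and uses made-up variance components; it does not call lme4 and is not the output layout of mfrm_generalizability().

```r
# Hypothetical variance components from a Person x Rater design.
vc <- c(Person = 0.60, Rater = 0.10, Residual = 0.30)
n_raters <- 4L

# Relative error: only sources that change the ordering of persons.
rel_error <- vc[["Residual"]] / n_raters
# Absolute error: adds the rater main effect (severity differences).
abs_error <- (vc[["Rater"]] + vc[["Residual"]]) / n_raters

g_coef   <- vc[["Person"]] / (vc[["Person"]] + rel_error)
phi_coef <- vc[["Person"]] / (vc[["Person"]] + abs_error)

round(c(G = g_coef, Phi = phi_coef), 3)
```

Phi is never larger than G because absolute decisions also count rater severity as error; designs with more facets add the corresponding terms to each error variance.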

Import adapters: mirt / TAM / eRm

Three thin importers expose external fit objects via the same mfrm_fit interface that the mfrmr plot and table helpers consume:

The imported objects carry the mfrm_imported_fit class and populate measurement-side slots (facets$person, facets$others, steps, summary) only. Bias / DIF / anchor / replay slots are explicitly not populated; full bundle import is planned for a future release.

Parallel parametric-bootstrap ICC

compute_facet_icc(boot = "boot") gains ci_boot_parallel ("no" / "multicore" / "snow") and ci_boot_ncpus arguments that are forwarded to lme4::bootMer(). The per-replicate cli progress bar is suppressed under parallel execution because worker processes hold their own copy of the progress state.

Parallel evaluate_mfrm_design (scaffold)

evaluate_mfrm_design() accepts a parallel = c("no", "future") argument. When "future" is requested and the future.apply Suggests package is installed, the rep loop within each design row honours whatever future::plan() is currently active; cross-design-row parallelism is planned for a future release. Without future.apply the call falls back to serial execution with an explicit message.

Resumable MML EM fits

fit_mfrm() accepts a checkpoint = list(file = ..., every_iter = ...) argument. When supplied to an mml_engine = "em" (or hybrid) fit, the EM scaffolding writes its state to file every every_iter outer iterations using saveRDS(). If the file exists when a subsequent call starts, the engine resumes from the recorded iteration. The direct optim() engine ignores the checkpoint; non-EM fits run unaffected.
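The save-and-resume pattern can be illustrated with a toy loop in base R. The function run_with_checkpoint() and the state layout are hypothetical stand-ins; the real EM state that fit_mfrm() writes is not described in these notes.

```r
# Hypothetical iteration loop; the state layout is an assumption.
run_with_checkpoint <- function(n_iter, file, every_iter = 5L) {
  # Resume from the file if it exists, otherwise start fresh.
  state <- if (file.exists(file)) readRDS(file) else list(iter = 0L, value = 0)
  while (state$iter < n_iter) {
    state$iter <- state$iter + 1L
    state$value <- state$value + 1 / state$iter   # stand-in for an EM update
    if (state$iter %% every_iter == 0L) saveRDS(state, file)
  }
  state
}

ckpt <- tempfile(fileext = ".rds")
partial <- run_with_checkpoint(10L, ckpt)   # writes state at iterations 5 and 10
resumed <- run_with_checkpoint(20L, ckpt)   # picks up from iteration 10
fresh   <- run_with_checkpoint(20L, tempfile(fileext = ".rds"))

# A resumed run should land on the same result as an uninterrupted one.
all.equal(resumed$value, fresh$value)
unlink(ckpt)
```

One consequence of this design is that a stale checkpoint file from an unrelated run would also be picked up, so it is worth using a distinct file per analysis.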

GPCM verification tests

A new tests/testthat/test-gpcm-verification.R exercises every "supported" and "supported_with_caveat" row of gpcm_capability_matrix() on a toy dataset and asserts the documented helper returns the expected shape. "blocked" and "deferred" rows have negative tests that confirm the helper either refuses to run or returns an explicit caveat. These tests make the GPCM scope a contract that future commits cannot silently shrink.

Optional FACETS Table 7 style fit output on fit$facets$others

fit_mfrm(attach_diagnostics = TRUE) runs diagnose_mfrm() once after the fit and merges the per-level SE, Infit, Outfit, and PtMeaCorr columns onto fit$facets$others. This makes the facet table look like a FACETS Table 7 summary without a separate call. The default FALSE preserves the minimal Facet / Level / Estimate layout from 0.1.5.

Reproducibility

build_mfrm_manifest() gains several new tables so replay bundles carry everything a deterministic re-run needs:

digest is added to Suggests.

Bug fixes

Messaging improvements

Documentation and citations

Reference citations corrected:

Plot polish

Other additions

Test suite

6,380+ tests pass (up from 6,343 in 0.1.5), with 0 failures and 0 errors. New test files:

Pre-existing test-harness errors unrelated to 0.1.5 behaviour have also been cleaned up (S3 dispatch, GPCM scope wording, internal-helper prefixing with mfrmr:::).

mfrmr 0.1.5

Maintenance release

First-use workflow

Estimation and scoring

Diagnostics, reporting, and visualization

External-software scope

mfrmr 0.1.4

CRAN resubmission

mfrmr 0.1.3

CRAN resubmission

mfrmr 0.1.2

CRAN resubmission

mfrmr 0.1.1

CRAN resubmission

mfrmr 0.1.0

Initial release

Package operations and publication readiness