ivcheck

Lifecycle: stable | License: MIT

Introduction

ivcheck is an R package that tests the identifying assumptions behind instrumental variable (IV) estimation. It provides three published falsification tests as named R functions, with S3 methods for fitted fixest and ivreg models plus a one-shot wrapper that runs every applicable test in a single call.

Every applied IV paper rests on two assumptions about the instrument Z: the exclusion restriction (Z affects the outcome Y only through the endogenous treatment D) and monotonicity (no defiers). Under these assumptions plus independence, the IV estimand identifies the local average treatment effect (LATE) for compliers (Imbens and Angrist 1994). Neither assumption can be verified directly, but the methodological literature has derived testable implications for the joint distribution of (Y, D, Z): Kitagawa (2015), Mourifie-Wan (2017), Frandsen-Lefgren-Leslie (2023). Rejection by one of these tests is evidence that at least one of the maintained assumptions (exclusion, monotonicity, or independence) has failed. Non-rejection means only that no violation was detected at the chosen level; it does not establish that the assumptions hold.
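For a binary treatment and a binary instrument, with Z labelled so that Z = 1 weakly increases take-up, the testable implication used by Kitagawa (2015) can be written as a pair of inequalities that must hold for every Borel set B of outcome values. The notation below is a summary for orientation, not a restatement of any one paper's exact display:

$$
\begin{aligned}
P(Y \in B,\, D = 1 \mid Z = 1) &\ge P(Y \in B,\, D = 1 \mid Z = 0), \\
P(Y \in B,\, D = 0 \mid Z = 0) &\ge P(Y \in B,\, D = 0 \mid Z = 1).
\end{aligned}
$$

A violation of either inequality on some set B is what the tests look for.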

Applied IV research has not adopted these tests widely. Most empirical IV papers still argue identification by narrative (“my instrument is random-looking because X”), and referees are increasingly frustrated with this. The limiting factor has been tooling rather than conviction: Kitagawa’s test ships as supplementary Matlab code, Mourifie-Wan relies on the Stata clrtest module, and Frandsen-Lefgren-Leslie ships a Stata SSC module called testjfe. None is in R. ivcheck closes that gap: two added lines to a fixest::feols call and you have a published falsification test ready for your paper’s appendix.

The current landscape

The R ecosystem for IV estimation is mature. fixest is the dominant package for fast fixed-effects IV estimation via feols(y ~ x | d ~ z). ivreg provides classical 2SLS with Wu-Hausman, Sargan, and weak-IV F tests. ivmodel covers k-class estimators and weak-IV-robust confidence intervals. ivDiag (Lal, Lockhart, Xu, and Zu 2024, Political Analysis) implements effective-F and Anderson-Rubin diagnostics, valid-t and local-to-zero tests, plus sensitivity analysis.

None of these packages implements the LATE-validity family of falsification tests. Applied researchers who want their IV design formally tested have had to choose between writing a one-off replication script from the original paper’s methodology section, switching to Stata for the test and back to R for the rest of the analysis, or not running the test at all. The third option has dominated.

ivcheck is the first R-native implementation of the LATE-validity family. The implementations are faithful to the published statistics: Kitagawa’s variance-weighted interval-sup Kolmogorov-Smirnov form (equation 2.1 of the paper), the full Chernozhukov-Lee-Rosen intersection-bounds inference with Andrews-Soares adaptive moment selection for Mourifie-Wan with covariates, and the asymptotic chi-squared form of Frandsen-Lefgren-Leslie with multivalued-treatment support via section 4 of the paper. All three are designed to slot into existing fixest and ivreg workflows without friction.

Installation

# Once accepted by CRAN
install.packages("ivcheck")

# Development version from GitHub
# install.packages("devtools")
devtools::install_github("charlescoverdale/ivcheck")

Quick start

library(fixest)
library(ivcheck)

data(card1995)
m <- feols(lwage ~ 1 | college ~ near_college, data = card1995)
iv_check(m, n_boot = 500)
#> IV validity diagnostic
#>   Kitagawa (2015):     stat = 5.25, p = 0.00, reject
#>   Mourifie-Wan (2017): stat = 5.25, p = 0.00, reject
#> Overall: at least one test rejects IV validity at 0.05.

Two added lines give you a falsification test that a referee may well ask about, with citation-ready output. The unconditional rejection above is the correct reading: Card’s IV is plausible only conditional on demographic controls. Add a control and the conditional Mourifie-Wan test passes (see the end-to-end example below).

Walkthrough

Output lines prefixed with #> show what the console prints.

A single test on raw vectors

library(ivcheck)

set.seed(1)
n <- 500
z <- sample(0:1, n, replace = TRUE)
d <- rbinom(n, 1, 0.3 + 0.4 * z)
y <- rnorm(n, mean = d)

k <- iv_kitagawa(y, d, z, n_boot = 500)
print(k)
#>
#> -- Kitagawa (2015) -----------------------------------------------------------
#> Sample size: 500
#> Statistic: 0.04, p-value: 0.91
#> Verdict: cannot reject IV validity at 0.05

The bootstrap p-value comes from the multiplier resampling procedure of Kitagawa (2015) section 3.2. With parallel = TRUE (the default), replications run across cores on POSIX systems.
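To make the quantity being tested concrete, the sketch below computes a naive, unweighted version of the violation on a coarse grid of outcome intervals, using base R only. It is an illustration rather than the package's algorithm: iv_kitagawa() uses the variance-weighted statistic and a bootstrap p-value, and the helper sub_prob() and the decile grid are assumptions made for this example.

```r
# Illustrative only: unweighted interval check, not the iv_kitagawa() statistic.
set.seed(1)
n <- 500
z <- sample(0:1, n, replace = TRUE)
d <- rbinom(n, 1, 0.3 + 0.4 * z)
y <- rnorm(n, mean = d)

# Empirical P(lo <= Y <= hi, D = dval | Z = zval) (hypothetical helper)
sub_prob <- function(lo, hi, dval, zval) {
  keep <- z == zval
  mean(y[keep] >= lo & y[keep] <= hi & d[keep] == dval)
}

# Candidate intervals [lo, hi] built from the deciles of y
cuts <- unname(quantile(y, probs = seq(0, 1, by = 0.1)))
grid <- expand.grid(lo = cuts, hi = cuts)
grid <- grid[grid$lo < grid$hi, ]

# Under LATE validity both differences should be non-positive in population
viol_d1 <- mapply(function(lo, hi) sub_prob(lo, hi, 1, 0) - sub_prob(lo, hi, 1, 1),
                  grid$lo, grid$hi)
viol_d0 <- mapply(function(lo, hi) sub_prob(lo, hi, 0, 1) - sub_prob(lo, hi, 0, 0),
                  grid$lo, grid$hi)

# Largest positive-part violation over the grid (0 if no interval violates)
max_violation <- max(0, viol_d1, viol_d0)
```

A raw maximum like this has no sampling distribution attached; the variance weighting and the bootstrap in iv_kitagawa() are what turn it into a test.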

With covariates (Mourifie-Wan)

x <- rnorm(n)
mw <- iv_mw(y, d, z, x = x, n_boot = 500)
print(mw)

iv_mw() with covariates estimates F(y, d | X = x, Z = z) by cubic-polynomial series regression, computes heteroscedasticity-robust standard errors, and takes the sup of the studentised positive-part violation over a grid of (y, x) points. Critical values use adaptive moment selection with Andrews-Soares kappa_n = sqrt(log(log(n))). Without covariates it reduces exactly to the variance-weighted Kitagawa test.
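The covariate path can be pictured at a single (y, x) evaluation point. The base-R sketch below fits a raw cubic series regression of the indicator 1{Y <= y0, D = 1} on x within each instrument arm, attaches an HC0 sandwich variance, and forms one studentised positive-part violation. It is a simplified sketch under stated assumptions: the helper series_fit(), the HC0 variant, and the evaluation point (y0 = 1, x0 = 0) are choices made here, and the package's grid search and adaptive critical values are not reproduced.

```r
# Illustrative only: one evaluation point, not the iv_mw() implementation.
set.seed(1)
n <- 500
z <- sample(0:1, n, replace = TRUE)
x <- rnorm(n)
d <- rbinom(n, 1, 0.3 + 0.4 * z)
y <- rnorm(n, mean = d)

# Series estimate of P(Y <= y0, D = 1 | X = x0, Z = zval) with its HC0 variance
series_fit <- function(y0, x0, zval) {
  keep <- z == zval
  ind  <- as.numeric(y[keep] <= y0 & d[keep] == 1)
  X    <- cbind(1, x[keep], x[keep]^2, x[keep]^3)  # raw cubic basis
  XtXi <- solve(crossprod(X))
  beta <- XtXi %*% crossprod(X, ind)
  e    <- as.numeric(ind - X %*% beta)
  V    <- XtXi %*% crossprod(X * e) %*% XtXi       # X' diag(e^2) X sandwich
  b0   <- c(1, x0, x0^2, x0^3)
  c(est = sum(b0 * beta), var = as.numeric(t(b0) %*% V %*% b0))
}

f1 <- series_fit(y0 = 1, x0 = 0, zval = 1)
f0 <- series_fit(y0 = 1, x0 = 0, zval = 0)

# Positive part of the studentised difference; the two arms are independent samples
t_stat <- max(0, (f0[["est"]] - f1[["est"]]) / sqrt(f0[["var"]] + f1[["var"]]))
```

The test statistic described above is the supremum of quantities like t_stat over a grid of (y, x) points and over both treatment arms.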

Judge designs (Frandsen-Lefgren-Leslie)

set.seed(1)
n <- 2000
judge <- sample.int(20, n, replace = TRUE)
d <- rbinom(n, 1, 0.3 + 0.02 * judge)
y <- rnorm(n, mean = d)

jfe <- iv_testjfe(y, d, judge, n_boot = 500)

Designs where the instrument is a set of mutually exclusive dummies (judge, caseworker, examiner) need a purpose-built test. iv_testjfe() fits a weighted-LS regression of per-judge mu_j on per-judge p_j and tests the implied linearity via chi-squared with K - 2 degrees of freedom (default) or multiplier bootstrap (method = "bootstrap"). Multivalued treatment is supported via Frandsen-Lefgren-Leslie (2023) section 4.
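The linearity idea can be sketched in base R. The example below computes per-judge means, fits a weighted least-squares line of mu_j on p_j, and compares the weighted residual sum of squares with a chi-squared distribution on K - 2 degrees of freedom. This is a simplified sketch, not the iv_testjfe() implementation: it treats p_j as known and weights by the inverse sampling variance of mu_j only, both of which are assumptions made for illustration.

```r
# Illustrative only: ignores estimation error in p_j.
set.seed(1)
n <- 2000
judge <- sample.int(20, n, replace = TRUE)
d <- rbinom(n, 1, 0.3 + 0.02 * judge)
y <- rnorm(n, mean = d)

p_j  <- as.vector(tapply(d, judge, mean))    # per-judge treatment rate
mu_j <- as.vector(tapply(y, judge, mean))    # per-judge mean outcome
n_j  <- as.vector(tapply(y, judge, length))  # cases per judge
v_j  <- as.vector(tapply(y, judge, var)) / n_j  # sampling variance of mu_j

# Weighted least squares of mu_j on p_j
fit <- lm(mu_j ~ p_j, weights = 1 / v_j)

# Weighted residual sum of squares against chi-squared with K - 2 df
K <- length(p_j)
stat <- sum(residuals(fit)^2 / v_j)
p_value <- pchisq(stat, df = K - 2, lower.tail = FALSE)
```

Under a valid design the per-judge points (p_j, mu_j) should lie on a line up to sampling noise, so a large stat signals a departure from linearity.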

One-shot diagnostic on a fitted model

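library(fixest)

# Rebuild the binary-instrument sample: the judge example overwrote n, d, and y,
# so they no longer match the length-500 z and x.
set.seed(1)
n <- 500
z <- sample(0:1, n, replace = TRUE)
d <- rbinom(n, 1, 0.3 + 0.4 * z)
df <- data.frame(z = z, d = d, y = rnorm(n, mean = d), x = rnorm(n))
m  <- feols(y ~ x | d ~ z, data = df)

iv_check(m, n_boot = 500)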

iv_check() detects which tests are applicable from the model structure (binary versus multivalued D, discrete versus judge-style Z, presence and dimensionality of exogenous controls) and runs only the applicable ones. iv_kitagawa() is the unconditional test, so it is skipped when the model carries any exogenous control; iv_mw() is the conditional test and runs with up to one covariate via the Chernozhukov-Lee-Rosen series-regression path (multivariate planned for v0.2.0). iv_check() works identically on ivreg::ivreg() objects.

Power planning

# Use the columns of df so that y, d, and z are guaranteed to have matching lengths.
pw <- iv_power(df$y, df$d, df$z, method = "kitagawa", n_sims = 200)

iv_power() simulates data under a parametric exclusion violation and reports the rejection probability over a grid of deviation sizes. This is useful when choosing between candidate tests on the same design, or when planning a minimum sample size for a study.
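The Monte Carlo logic can be illustrated without the package. The base-R sketch below injects a direct effect of z on y of size delta (an exclusion violation), applies a crude one-sided z-test to a single inequality, and records the rejection rate at each delta. Everything here is an assumption made for illustration: the helpers one_interval_test() and simulate_once(), the cutoff at 0, and the violation model are not taken from iv_power().

```r
# Illustrative only: a single-inequality test, not the iv_power() internals.
set.seed(1)

# One-sided z-test of H0: P(Y <= cut, D = 1 | Z = 0) <= P(Y <= cut, D = 1 | Z = 1)
one_interval_test <- function(y, d, z, cut = 0, alpha = 0.05) {
  a0 <- as.numeric(y[z == 0] <= cut & d[z == 0] == 1)
  a1 <- as.numeric(y[z == 1] <= cut & d[z == 1] == 1)
  se <- sqrt(var(a0) / length(a0) + var(a1) / length(a1))
  if (se == 0) return(FALSE)
  (mean(a0) - mean(a1)) / se > qnorm(1 - alpha)
}

# One simulated sample; delta is the direct effect of z on y
simulate_once <- function(n, delta) {
  z <- sample(0:1, n, replace = TRUE)
  d <- rbinom(n, 1, 0.3 + 0.4 * z)
  y <- rnorm(n, mean = d + delta * z)
  one_interval_test(y, d, z)
}

# Rejection rate at each deviation size, 200 replications of n = 500
deltas <- c(0, 0.5, 1, 1.5)
power <- sapply(deltas, function(delta) {
  mean(replicate(200, simulate_once(n = 500, delta = delta)))
})
power_curve <- data.frame(delta = deltas, rejection_rate = power)
```

Reading power_curve row by row shows how large a violation must be before this crude test detects it at the given sample size; the same reasoning applies to the curves iv_power() reports for the published tests.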

Example: end-to-end with Card (1995)

The unconditional test rejects; the conditional one does not. That contrast is the right reading of Card’s design.

library(ivcheck)
library(fixest)

data(card1995)   # bundled

# Unconditional: Kitagawa and Mourifie-Wan both reject.
m_uncond <- feols(lwage ~ 1 | college ~ near_college, data = card1995)
iv_check(m_uncond, n_boot = 500)
#> IV validity diagnostic
#>   Kitagawa (2015):     stat = 5.25, p = 0.00, reject
#>   Mourifie-Wan (2017): stat = 5.25, p = 0.00, reject
#> Overall: at least one test rejects IV validity at 0.05.

# Conditional on age: the conditional Mourifie-Wan test does not reject.
m_cond <- feols(lwage ~ age | college ~ near_college, data = card1995)
iv_check(m_cond, n_boot = 200)
#> i Kitagawa test skipped: fitted model has exogenous controls and
#>   iv_kitagawa() is unconditional.
#> i The conditional Mourifie-Wan test is the right object here.
#>
#> IV validity diagnostic
#>   Mourifie-Wan (2017): stat = 79.5, p = 0.71, pass
#> Overall: cannot reject IV validity at 0.05.

Card’s identification argument is that proximity to college is a plausible instrument only conditional on demographic controls. The unconditional test picks this up and rejects. Once a single demographic control (age) is included, the conditional Mourifie-Wan test reads the design as compatible with LATE-validity at the 5% level. Multivariate controls (Card’s canonical specification uses age plus race and region) are planned for v0.2.0 via a tensor-product series basis; in v0.1.2 the workaround is to reduce additional controls to a single propensity index.

This is a test of the binary college = (educ >= 16) discretisation, not Card’s original continuous-schooling IV. Inspect result$binding to see which outcome interval carries the violation when a rejection occurs.
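One way to read the single-index workaround is to collapse several controls into the fitted probability of the instrument given those controls. The base-R sketch below does this on simulated data. It is an interpretation, not documented package behaviour: the control names (age, south, urban) are hypothetical stand-ins rather than columns of card1995, and treating "propensity index" as P(Z = 1 | controls) from a logit model is an assumption.

```r
# Illustrative only: hypothetical controls, simulated data.
set.seed(1)
n <- 1000
controls <- data.frame(
  age   = rnorm(n, mean = 28, sd = 3),
  south = rbinom(n, 1, 0.4),
  urban = rbinom(n, 1, 0.6)
)

# Instrument whose assignment probability depends on the controls
z <- rbinom(n, 1, plogis(-0.5 + 0.8 * controls$south + 0.5 * controls$urban))

# Logit model for P(Z = 1 | controls); the fitted probability is the single index
index_fit <- glm(z ~ age + south + urban,
                 data = cbind(controls, z = z),
                 family = binomial())
x_index <- fitted(index_fit)

# x_index could then serve as the one covariate, for example:
# iv_mw(y, d, z, x = x_index, n_boot = 500)
```

Conditioning on an estimated index is weaker than conditioning on the full set of controls, so a pass under this workaround deserves a more cautious reading than a pass with the controls entered directly.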

Functions

| Function | Purpose |
|---|---|
| iv_kitagawa() | Kitagawa (2015) variance-weighted KS test. Extends to multivalued D via Sun (2023). |
| iv_mw() | Mourifie-Wan (2017) conditional-inequality test. Full CLR intersection-bounds with adaptive moment selection under covariates. |
| iv_testjfe() | Frandsen-Lefgren-Leslie (2023) test for judge / group IV designs. Supports multivalued treatment. |
| iv_check() | Wrapper that auto-detects applicable tests and runs them on a fitted IV model. |
| iv_power() | Monte Carlo power curve for sample-size planning. |

Limitations

Read before using in published work.

Scope

Notes on fidelity to the published tests

Interpretation

Why trust this implementation

Planned for future versions

| Package | Description |
|---|---|
| predictset | Conformal prediction intervals (uncertainty around treatment effects) |
| nowcast | Economic nowcasting |
| mpshock | Monetary policy shock series (commonly used as instruments) |
| inequality | Inequality measurement (distributional treatment effects) |
| fixest | Fast IV estimation via feols(y ~ x \| d ~ z) (upstream from ivcheck) |
| ivreg | 2SLS with Wu-Hausman, Sargan, weak-IV F (upstream from ivcheck) |
| ivmodel | k-class estimators, weak-IV robust CIs, sensitivity analysis |
| ivDiag | Effective F, Anderson-Rubin, valid-t, local-to-zero tests |

ivcheck complements rather than competes with these. fixest or ivreg does the estimation, ivDiag does weak-IV post-estimation diagnostics, and ivcheck does LATE-assumption falsification.

Issues and requests

Report bugs or request additional tests at GitHub Issues. Pull requests implementing additional IV-validity tests from the literature are welcome; please include a reference to the original paper and a reproduction test against its empirical example.

References

Cite both the package and the underlying paper(s) for the test you use. Package citation:

citation("ivcheck")

Test-specific references (DOIs verified via crossref.org)

| Function | Reference | DOI |
|---|---|---|
| iv_kitagawa() | Kitagawa, T. (2015). A Test for Instrument Validity. Econometrica 83(5): 2043-2063. | 10.3982/ECTA11974 |
| iv_kitagawa() (multivalued D) | Sun, Z. (2023). Instrument validity for heterogeneous causal effects. Journal of Econometrics 237(2): 105523. | 10.1016/j.jeconom.2023.105523 |
| iv_mw() | Mourifie, I. and Wan, Y. (2017). Testing Local Average Treatment Effect Assumptions. Review of Economics and Statistics 99(2): 305-313. | 10.1162/REST_a_00622 |
| iv_testjfe() | Frandsen, B. R., Lefgren, L. J., and Leslie, E. C. (2023). Judging Judge Fixed Effects. American Economic Review 113(1): 253-277. | 10.1257/aer.20201860 |

Foundational and methodological references

Package comparison

Keywords

instrumental variables, LATE, causal inference, exclusion restriction, monotonicity, specification testing, falsification, judge IV, Kitagawa test, Mourifie-Wan test, FLL test, econometrics.