Canonical disaggregation and the Leave-Cluster-Out test

library(convergenceDFM)
#> convergenceDFM 0.3.2 - Dynamic Factor Models for Economic Convergence
#> Type vignette('convergence-analysis') for an introduction
#> Stan backend: CmdStan available for OU estimation.

This vignette documents two design decisions of version 0.3.0:

  1. the single, canonical disaggregation engine (imported from BayesianDisaggregation, replacing a local duplicate), and
  2. the Leave-Cluster-Out test, which generalizes the delete-one-sector jackknife to dropping an entire group of sectors at once.

Both follow the project’s standing criteria: maximum multidimensional robustness; keeping the algebraic, statistical and numerical layers separate; no claims of uniqueness; and a deliberately plain reading of “surviving a leave-out” as predictive robustness under dependence, not a topological invariant.

1. One disaggregation engine, not two

Earlier versions of convergenceDFM carried their own run_disaggregation_custom_prior(): a deterministic convex blend of a prior weight matrix with a singular-vector “likelihood”. That blend never conditioned on the observed aggregate index (the Consumer Price Index, CPI). It was a weighting heuristic dressed in Bayesian vocabulary, and it duplicated the purpose of the dedicated disaggregation package.

Version 0.3.0 removes that duplicate. The canonical disaggregation now lives in one place, BayesianDisaggregation, and convergenceDFM imports it. What is reused is the engine, BayesianDisaggregation::disaggregate_conjugate(): an exact, closed-form linear-Gaussian state-space posterior (a Kalman filter followed by a Rauch-Tung-Striebel smoother) for the sectoral price levels, given the aggregate index and the value-added weights. It genuinely conditions on the CPI and, being pure R with no Markov chain Monte Carlo (MCMC), it is fast enough to run inside a resampling loop.

set.seed(1)
Tn <- 20; K <- 4
cpi <- 100 * cumprod(1 + rnorm(Tn, 0.02, 0.01)) + 50   # a positive aggregate index
W   <- matrix(runif(Tn * K), Tn, K); W <- W / rowSums(W)

fit <- BayesianDisaggregation::disaggregate_conjugate(cpi, W)
dim(fit$phi_summary$median)     # [T x K] smoothed sectoral levels
#> [1] 20  4
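To make "Kalman filter with a Rauch-Tung-Striebel smoother" concrete, here is a minimal, generic sketch of that technique for one aggregate observed per period. It is not the package's code: the random-walk state, the variances q and r, the initial mean m0 and covariance P0, and the helper name kalman_rts() are all illustrative assumptions, and disaggregate_conjugate() may parameterize its priors quite differently.

```r
# Generic Kalman filter + Rauch-Tung-Striebel smoother (hypothetical sketch).
#   state:       x_t = x_{t-1} + eta_t,   eta_t ~ N(0, q * I_K)   (random-walk sectoral levels)
#   observation: y_t = w_t' x_t + eps_t,  eps_t ~ N(0, r)          (one aggregate per period)
# q, r, m0 and P0 are assumptions chosen for illustration only.
kalman_rts <- function(y, W, q = 1, r = 0.01,
                       m0 = rep(y[1], ncol(W)), P0 = diag(100, ncol(W))) {
  Tn <- length(y); K <- ncol(W); Q <- diag(q, K)
  m_pred <- m_filt <- matrix(0, Tn, K)
  P_pred <- P_filt <- vector("list", Tn)
  m <- m0; P <- P0
  for (t in seq_len(Tn)) {
    mp <- m; Pp <- P + Q                     # predict: mean unchanged, variance grows
    w  <- W[t, ]
    Pw <- drop(Pp %*% w)
    S  <- sum(w * Pw) + r                    # innovation variance of the aggregate
    g  <- Pw / S                             # Kalman gain (length K)
    m  <- mp + g * (y[t] - sum(w * mp))      # update with the observed aggregate
    P  <- Pp - tcrossprod(g) * S             # equals Pp - Pp w w' Pp / S
    m_pred[t, ] <- mp; P_pred[[t]] <- Pp
    m_filt[t, ] <- m;  P_filt[[t]] <- P
  }
  # Backward RTS pass (identity transition matrix)
  m_s <- m_filt; P_s <- P_filt
  for (t in rev(seq_len(Tn - 1))) {
    J <- P_filt[[t]] %*% solve(P_pred[[t + 1]])   # smoother gain
    m_s[t, ] <- m_filt[t, ] + drop(J %*% (m_s[t + 1, ] - m_pred[t + 1, ]))
    P_s[[t]] <- P_filt[[t]] + J %*% (P_s[[t + 1]] - P_pred[[t + 1]]) %*% t(J)
  }
  list(mean = m_s, cov = P_s)
}

# Same simulated inputs as the example above
set.seed(1)
Tn <- 20; K <- 4
cpi <- 100 * cumprod(1 + rnorm(Tn, 0.02, 0.01)) + 50
W   <- matrix(runif(Tn * K), Tn, K); W <- W / rowSums(W)
sm  <- kalman_rts(cpi, W)
dim(sm$mean)                              # T x K smoothed sectoral levels
max(abs(rowSums(W * sm$mean) - cpi))      # small: the implied aggregate tracks the CPI
```

Because every period contributes a single scalar observation, the update step needs no matrix inversion; only the backward pass inverts the K x K predicted covariance.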

Identification is unchanged from the disaggregation package, and it should be read honestly: the aggregate is strongly identified, while the sectoral split is weakly identified by construction. Each period pins down only one linear combination of the K sectoral levels; the remaining K - 1 directions are governed by the prior and by temporal smoothness. That is why a point estimate is only a summary, and why the full posterior draws, not a single estimate, feed the downstream nested Ornstein-Uhlenbeck model through multiple imputation.
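The dimension count behind "one linear combination is pinned per period" can be checked directly in base R. In this sketch (arbitrary simulated weights and levels for a single period) any shift along the orthogonal complement of the weight vector produces a very different sectoral split with exactly the same aggregate.

```r
# Weak identification of the sectoral split within one period (illustrative values).
set.seed(2)
K <- 4
w <- runif(K); w <- w / sum(w)            # value-added weights for one period
x <- 100 + rnorm(K, 0, 5)                 # one candidate vector of sectoral levels

# Orthonormal basis of the (K - 1)-dimensional complement of w, via a complete QR
N <- qr.Q(qr(matrix(w, ncol = 1)), complete = TRUE)[, -1, drop = FALSE]
ncol(N)                                   # K - 1 = 3 unidentified directions

x_alt <- x + drop(N %*% c(10, -7, 4))     # a very different split...
c(sum(w * x), sum(w * x_alt))             # ...with the same aggregate
```

Only the prior and the smoothness across periods can choose among such splits, which is why the posterior spread in these directions matters downstream.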

Where the engine is used here

test_reweighting_robustness() perturbs the sectoral weighting scheme and asks whether the estimated coupling survives. Each perturbed scheme is a constant-in-time prior vector, replicated across periods to form the weight matrix W; the sectoral levels are then the posterior median of the conjugate engine, now genuinely conditioned on the CPI:

# `path_cpi` and `path_weights` are Excel files; `X_matrix` is the production-side
# panel. The function reads the CPI, aligns it to the weight years, and for each
# alternative prior calls disaggregate_conjugate() internally.
rw <- test_reweighting_robustness(path_cpi, path_weights, X_matrix,
                                  max_comp = 3, seed = 11)
rw$cv_coupling   # coefficient of variation of the coupling across schemes
rw$robust        # TRUE if CV < 0.30
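The two mechanical steps around test_reweighting_robustness() can be sketched as follows: replicating a constant-in-time prior vector across periods to form W, and summarizing couplings across schemes by their coefficient of variation. The helper names, the multiplicative log-normal jitter used to perturb the prior, and the definition CV = sd / |mean| are assumptions for illustration; the package's own perturbation scheme and CV computation may differ.

```r
# Hypothetical helpers mirroring the text; not the package's internals.
replicate_prior <- function(w, Tn) {
  w <- w / sum(w)                                   # make the prior a share vector
  matrix(w, nrow = Tn, ncol = length(w), byrow = TRUE)  # same row in every period
}

perturb_prior <- function(w, jitter_sd = 0.2) {     # assumed multiplicative jitter
  v <- w * exp(rnorm(length(w), 0, jitter_sd))
  v / sum(v)
}

cv_verdict <- function(couplings, threshold = 0.30) {
  cv <- sd(couplings) / abs(mean(couplings))        # assumes a non-zero mean coupling
  list(cv_coupling = cv, robust = cv < threshold)
}

set.seed(11)
Tn <- 20
base <- c(0.4, 0.3, 0.2, 0.1)
schemes <- lapply(1:5, function(i) replicate_prior(perturb_prior(base), Tn))
dim(schemes[[1]])                                   # 20 x 4 weight matrix W
rowSums(schemes[[1]])[1:3]                          # each period's weights sum to one

cv_verdict(c(1.00, 1.10, 0.90))                     # illustrative couplings: CV of about 0.1
```

Each matrix in schemes has the shape that disaggregate_conjugate(cpi, W) takes in the earlier example.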

The whole routine is reproducible: the seed now governs not only the alternative priors but also the data diagnosis and the cross-validated component selection, so the couplings no longer depend on call order.
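The reproducibility property can be illustrated with a hypothetical routine whose every random draw follows a single set.seed() call at entry. The stage names mirror the text (alternative priors, diagnosis, cross-validated selection), but their bodies are placeholders, not the package's code.

```r
# Hypothetical pipeline: one seed governs every stochastic stage.
seeded_pipeline <- function(x, seed) {
  set.seed(seed)                                        # reset the RNG before any draw
  priors   <- replicate(3, { v <- runif(4); v / sum(v) }, simplify = FALSE)  # alternative priors
  diag_idx <- sample(length(x), 5)                      # e.g. a random subsample for diagnosis
  folds    <- sample(rep(1:5, length.out = length(x)))  # cross-validation folds
  list(priors = priors, diag_idx = diag_idx, folds = folds)
}

set.seed(1); invisible(runif(10))          # disturb the global RNG state first
a <- seeded_pipeline(1:30, seed = 11)
b <- seeded_pipeline(1:30, seed = 11)      # different preceding state, same seed
identical(a, b)                            # TRUE: the result does not depend on call order
```

This sketch overwrites the global RNG state as a side effect; withr::with_seed() is one way to scope the seed to a single call instead.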

2. Leave-Cluster-Out

Why a cluster, not a single sector

test_jackknife_sectors() drops one sector (one column) at a time. Under cross-sectional dependence of the input-output kind, where sectors are linked by intermediate demand (the relationships catalogued in a Leontief input-output table, the “MIP”), dropping a single sector is optimistic: the excluded sector's information leaks back in through its near-collinear neighbours in the same value chain, and the coupling looks more stable than it is.
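The leakage argument can be seen in a small simulation. Here regression R² stands in for "predictive content" (it is not the package's coupling statistic), and the two-chain structure with one common driver per chain is an assumed toy setup: dropping one sector of chain A barely matters because its neighbour carries the same signal, while dropping the whole chain removes it.

```r
# Toy illustration of within-chain leakage (simulated, hypothetical setup).
set.seed(3)
Tn <- 200
fA <- rnorm(Tn); fB <- rnorm(Tn)                       # one common driver per value chain
X <- cbind(s1 = fA + rnorm(Tn, 0, 0.2), s2 = fA + rnorm(Tn, 0, 0.2),   # chain A
           s3 = fB + rnorm(Tn, 0, 0.2), s4 = fB + rnorm(Tn, 0, 0.2))   # chain B
y <- fA + rnorm(Tn, 0, 0.3)                            # target driven by chain A only

r2 <- function(cols) summary(lm(y ~ X[, cols, drop = FALSE]))$r.squared
r2_all   <- r2(c("s1", "s2", "s3", "s4"))              # all sectors
r2_drop1 <- r2(c("s2", "s3", "s4"))                    # drop s1: neighbour s2 leaks it back
r2_chain <- r2(c("s3", "s4"))                          # drop chain A: the signal is gone
round(c(all = r2_all, drop_s1 = r2_drop1, drop_chainA = r2_chain), 2)
```

In this setup the single-sector deletion leaves R² close to its full-panel value, while the chain deletion drives it towards zero.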

test_leave_cluster_out() removes an entire value chain at once. With the whole chain gone, the prediction can no longer lean on the removed sectors' neighbours; it must rely on the general gravitation. This is the cross-sectional companion of the temporal nulls already in the package (the circular time-shift and moving-block bootstrap nulls in rotation_null_test() and test_permutation_robustness(), which break dependence along time). It reuses the jackknife's coupling pipeline rather than reimplementing it.
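The resampling loop itself is short. In this hypothetical sketch, stat_fn stands in for the package's coupling pipeline (toy_stat below is an arbitrary placeholder, not the coupling), and normalize_cluster_map() shows one way to accept cluster_map either as a per-sector label vector or as a named list; both helper names are illustrative.

```r
# Hypothetical leave-cluster-out loop with a pluggable statistic.
normalize_cluster_map <- function(cluster_map, sector_names) {
  if (is.list(cluster_map)) return(cluster_map)            # already cluster -> sectors
  if (is.null(names(cluster_map))) names(cluster_map) <- sector_names  # assume column order
  split(names(cluster_map), cluster_map)                   # label vector -> named list
}

leave_cluster_out <- function(Phi, phi, cluster_map, stat_fn) {
  clusters <- normalize_cluster_map(cluster_map, colnames(Phi))
  baseline <- stat_fn(Phi, phi)
  est <- vapply(clusters, function(members) {
    keep <- setdiff(colnames(Phi), members)                # remove the whole chain at once
    stat_fn(Phi[, keep, drop = FALSE], phi[, keep, drop = FALSE])
  }, numeric(1))
  list(baseline = baseline, cluster_estimates = est)
}

# Usage on data simulated as in the example that follows in this section
set.seed(123)
Tn <- 30; K <- 6
f   <- cumsum(rnorm(Tn))
Phi <- sapply(1:K, function(k) 100 + 5 * f + rnorm(Tn, 0, 1))
phi <- sapply(1:K, function(k) Phi[, k] + rnorm(Tn, 0, 0.5))
colnames(Phi) <- colnames(phi) <- paste0("sector_", 1:K)

toy_stat <- function(P, p) cor(rowMeans(P), rowMeans(p))   # placeholder statistic
chains <- list(chainA = c("sector_1", "sector_2"),
               chainB = c("sector_3", "sector_4"),
               chainC = c("sector_5", "sector_6"))
labels <- c("chainA", "chainA", "chainB", "chainB", "chainC", "chainC")  # same partition
res_list  <- leave_cluster_out(Phi, phi, chains, toy_stat)
res_label <- leave_cluster_out(Phi, phi, labels, toy_stat)
all.equal(res_list, res_label)                             # TRUE: both forms agree
```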

The cluster map is pluggable

The genuine clusters are value chains defined by inter-industry linkages, and the partition is supplied by the user as cluster_map (a per-sector label vector, or a named list mapping each cluster to its sector names):

set.seed(123)
Tn <- 30; K <- 6
f   <- cumsum(rnorm(Tn))
Phi <- sapply(1:K, function(k) 100 + 5 * f + rnorm(Tn, 0, 1))   # production side
phi <- sapply(1:K, function(k) Phi[, k] + rnorm(Tn, 0, 0.5))    # market side
colnames(Phi) <- colnames(phi) <- paste0("sector_", 1:K)

chains <- list(chainA = c("sector_1", "sector_2"),
               chainB = c("sector_3", "sector_4"),
               chainC = c("sector_5", "sector_6"))
lco <- test_leave_cluster_out(Phi, phi, cluster_map = chains, seed = 7,
                              verbose = FALSE)
lco$baseline
#> [1] 3.139337
lco$cluster_estimates     # coupling with each chain removed
#>   chainA   chainB   chainC 
#> 2.966818 3.766083 0.954983
lco$robust                # TRUE if no chain changes the coupling by > 50%
#> [1] FALSE
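The FALSE verdict can be traced back to the printed estimates. This sketch assumes that "changes the coupling by > 50%" means a relative absolute change |estimate - baseline| / |baseline| above 0.5; the package's exact rule may differ in detail.

```r
# Per-chain influence recomputed from the printed output above.
baseline <- 3.139337
est <- c(chainA = 2.966818, chainB = 3.766083, chainC = 0.954983)
rel_change <- abs(est - baseline) / abs(baseline)   # assumed definition of "change"
round(rel_change, 3)                                # about 0.055, 0.200 and 0.696
robust <- all(rel_change <= 0.5)
robust                                              # FALSE: removing chainC moves it by ~70%
```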

A documented fallback until the MIP arrives

When no cluster_map is supplied, a fallback partition is built with build_cluster_map() and a message flags its use. The fallback is an explicit stopgap proxy, not a demand-linkage partition:

build_cluster_map(phi, n_clusters = 3, method = "correlation")
#> sector_1 sector_2 sector_3 sector_4 sector_5 sector_6 
#>        1        2        2        3        1        2
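One common way to build a correlation proxy is average-linkage hierarchical clustering on the distance 1 - cor between sector series. The sketch below uses only base R (stats::hclust and stats::cutree); it is an assumption about how such a proxy could be built, not a description of build_cluster_map(), and its labels need not match the output above. In this simulated panel all sectors share one factor, so any split mostly reflects noise, which is exactly why the proxy is a stopgap.

```r
# Hypothetical correlation-based fallback partition.
correlation_clusters <- function(x, n_clusters = 3) {
  d  <- as.dist(1 - cor(x))               # highly correlated sectors are close
  hc <- hclust(d, method = "average")     # average-linkage agglomeration
  cutree(hc, k = n_clusters)              # named integer label per sector
}

set.seed(123)
Tn <- 30; K <- 6
f   <- cumsum(rnorm(Tn))
Phi <- sapply(1:K, function(k) 100 + 5 * f + rnorm(Tn, 0, 1))
phi <- sapply(1:K, function(k) Phi[, k] + rnorm(Tn, 0, 0.5))
colnames(Phi) <- colnames(phi) <- paste0("sector_", 1:K)

cl <- correlation_clusters(phi, n_clusters = 3)
cl
split(names(cl), cl)                      # the same partition as a named list
```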

Neither the correlation proxy nor the organic-composition proxy reproduces input-output linkages; both are one-dimensional proxies. Supply the real partition through cluster_map once the Leontief table is at hand.

Reading the statistical layer honestly

bias and se are the delete-a-group (block) jackknife estimates computed over the cluster-deletion replicates. They are well calibrated for roughly balanced clusters; with strongly unequal clusters they are an approximate, conservative summary. The primary outputs are the per-cluster influence/retention and the robust verdict; the verdict is a robustness diagnostic, not a point estimate of the coupling. It means exactly “no single value chain moves the coupling by more than half”: a statement about predictive stability under cross-sectional dependence, with no topological content.
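For reference, the textbook delete-a-group jackknife with G equal-sized groups computes the bias as (G - 1) times the gap between the mean replicate and the full-sample estimate, and the standard error from the spread of the replicates. The sketch applies those equal-group formulas to the three chain estimates printed earlier; whether lco$bias and lco$se use exactly these formulas (or a weighted variant for unequal clusters) is an assumption, and with only three groups the resulting se is necessarily crude.

```r
# Equal-group delete-a-group jackknife (textbook formulas; hypothetical helper name).
#   theta_bar = mean(theta_g)
#   bias      = (G - 1) * (theta_bar - theta_hat)
#   se        = sqrt((G - 1) / G * sum((theta_g - theta_bar)^2))
group_jackknife <- function(theta_hat, theta_g) {
  G <- length(theta_g)
  theta_bar <- mean(theta_g)
  c(bias = (G - 1) * (theta_bar - theta_hat),
    se   = sqrt((G - 1) / G * sum((theta_g - theta_bar)^2)))
}

jk <- group_jackknife(3.139337, c(2.966818, 3.766083, 0.954983))
round(jk, 3)   # roughly bias = -1.153, se = 1.673 for the printed chain estimates
```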

The Leave-Cluster-Out is strictly more demanding than the single-sector jackknife: dropping a whole chain removes more shared variation, so a coupling that is robust to one-sector deletion can still be sensitive to chain deletion. That gap is the point of the test.