---
title: "Canonical disaggregation and the Leave-Cluster-Out test"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Canonical disaggregation and the Leave-Cluster-Out test}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(collapse = TRUE, comment = "#>")
```

```{r setup}
library(convergenceDFM)
```

This vignette documents two design decisions of version 0.3.0:

1. the **single, canonical disaggregation engine** (imported from
   **BayesianDisaggregation**, replacing a local duplicate), and
2. the **Leave-Cluster-Out** test, which generalizes the delete-one-sector
   jackknife to dropping an entire group of sectors at once.

Both follow the project's standing criteria: maximum multidimensional robustness;
keeping the algebraic, statistical and numerical layers separate; no claims of
uniqueness; and a deliberately plain reading of "surviving a leave-out" as
**predictive robustness under dependence**, not a topological invariant.

## 1. One disaggregation engine, not two

Earlier versions of `convergenceDFM` carried their own
`run_disaggregation_custom_prior()`: a deterministic convex blend of a prior
weight matrix with a singular-vector "likelihood". That blend never conditioned
on the observed aggregate index (the Consumer Price Index, CPI) -- it was a
weighting heuristic dressed in Bayesian vocabulary -- and it duplicated the
purpose of the dedicated disaggregation package.

Version 0.3.0 removes that duplicate. The canonical disaggregation now lives in
one place, **BayesianDisaggregation**, and `convergenceDFM` imports it. The
asset reused is the *engine*, `BayesianDisaggregation::disaggregate_conjugate()`:
an exact, closed-form linear-Gaussian state-space posterior (a Kalman filter
with a Rauch-Tung-Striebel smoother) for the sectoral price levels given the
aggregate index and the value-added weights. It conditions genuinely on the CPI,
and -- being pure R, with no Markov chain Monte Carlo -- it is fast enough to use
inside a resampling loop.

```{r conjugate}
set.seed(1)
Tn <- 20; K <- 4
cpi <- 100 * cumprod(1 + rnorm(Tn, 0.02, 0.01)) + 50   # a positive aggregate index
W   <- matrix(runif(Tn * K), Tn, K); W <- W / rowSums(W)

fit <- BayesianDisaggregation::disaggregate_conjugate(cpi, W)
dim(fit$phi_summary$median)     # [T x K] smoothed sectoral levels
```

The identification caveat is unchanged from the disaggregation package: the
*aggregate* is strongly identified, while the *sectoral* split is weakly
identified by construction (one linear combination is pinned per period; the
remaining directions are governed by the prior and by temporal smoothness). That
is why a point estimate is only a summary, and why the full posterior draws are
what feed the downstream nested Ornstein-Uhlenbeck model by multiple imputation.
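
The "one linear combination is pinned per period" statement can be seen in a
one-period toy calculation. The sketch below is not the package's code: it is
plain Gaussian conditioning on a noise-free aggregate, and the weights, prior
mean, prior covariance and observed value (`w_demo`, `m_prior`, `S_prior`,
`y_obs`) are all hypothetical. After conditioning, the variance along the weight
vector is zero, while a direction orthogonal to it keeps its prior variance.

```{r ident-sketch}
# One-period illustration only (hypothetical numbers, not the package's engine):
# Gaussian prior on four sectoral levels, conditioned on y = sum(w * phi).
w_demo  <- c(0.4, 0.3, 0.2, 0.1)      # illustrative weights, summing to 1
m_prior <- rep(100, 4)                # hypothetical prior mean
S_prior <- diag(25, 4)                # hypothetical prior covariance
y_obs   <- 104                        # hypothetical observed aggregate

Sw     <- S_prior %*% w_demo                          # 4 x 1
gain   <- Sw / as.numeric(t(w_demo) %*% Sw)           # conditioning gain
m_post <- m_prior + as.numeric(gain) * (y_obs - sum(w_demo * m_prior))
S_post <- S_prior - gain %*% t(Sw)

agg_post  <- sum(w_demo * m_post)                     # reproduces the aggregate
var_along <- as.numeric(t(w_demo) %*% S_post %*% w_demo)   # pinned direction

v_orth    <- c(1, -4/3, 0, 0)                         # orthogonal to w_demo
var_ratio <- as.numeric(t(v_orth) %*% S_post %*% v_orth) /
             as.numeric(t(v_orth) %*% S_prior %*% v_orth)  # posterior / prior

c(aggregate = agg_post, var_along_w = var_along, ratio_orthogonal = var_ratio)
```

With more periods, temporal smoothness ties the free directions together across
time, but the observation still removes only one direction per period.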

### Where the engine is used here

`test_reweighting_robustness()` perturbs the sectoral weighting scheme and asks
whether the estimated coupling survives. Each perturbed scheme is a
constant-in-time prior vector, replicated across periods to form the weight
matrix `W`; the sectoral levels are then the posterior median of the conjugate
engine, *now genuinely conditioned on the CPI*:

```{r reweight, eval=FALSE}
# `path_cpi` and `path_weights` are Excel files; `X_matrix` is the production-side
# panel. The function reads the CPI, aligns it to the weight years, and for each
# alternative prior calls disaggregate_conjugate() internally.
rw <- test_reweighting_robustness(path_cpi, path_weights, X_matrix,
                                  max_comp = 3, seed = 11)
rw$cv_coupling   # coefficient of variation of the coupling across schemes
rw$robust        # TRUE if CV < 0.30
```

The whole routine is reproducible: the seed now governs not only the alternative
priors but also the data diagnosis and the cross-validated component selection,
so the couplings no longer depend on call order.
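
The two mechanical steps described above, replicating a constant prior vector
into `W` and summarising the couplings by a coefficient of variation, can be
sketched in base R. This is an illustration under assumptions: the prior shares
and the couplings are made up, and `sd / |mean|` is one common definition of the
coefficient of variation, not necessarily the exact one used inside
`test_reweighting_robustness()`.

```{r reweight-sketch}
# Illustrative only: hypothetical prior shares and made-up couplings.
n_periods <- 5
prior_vec <- c(0.4, 0.3, 0.2, 0.1)                  # hypothetical prior shares

# Constant-in-time prior: the same row repeated for every period
W_const <- matrix(prior_vec, nrow = n_periods, ncol = length(prior_vec),
                  byrow = TRUE)
stopifnot(all(abs(rowSums(W_const) - 1) < 1e-12))   # each row is a weight vector

couplings <- c(0.62, 0.58, 0.66, 0.60)              # made-up, one per scheme
cv_demo   <- sd(couplings) / abs(mean(couplings))   # assumed CV definition
cv_demo < 0.30                                      # the 0.30 threshold from above
```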

## 2. Leave-Cluster-Out

### Why a cluster, not a single sector

`test_jackknife_sectors()` drops one sector (one column) at a time. Under
cross-sectional dependence of the input-output kind -- where sectors are linked
by intermediate demand, the relationships catalogued in a Leontief input-output
table (abbreviated "MIP" below) -- dropping a single sector is optimistic: the
information of the excluded sector leaks back in through its near-collinear
neighbours in the same value chain. The coupling then looks more stable than it
is.

`test_leave_cluster_out()` removes an **entire value chain** at once. With a whole
chain gone, the prediction can no longer lean on a removed sector's neighbours;
it must rely on the general gravitation. This is the cross-sectional companion of
the temporal nulls already in the package (the circular time-shift / moving-block
bootstrap in `rotation_null_test()` and `test_permutation_robustness()`, which
break dependence along time). It reuses the same coupling pipeline as the
jackknife -- it does not reimplement it.

### The cluster map is pluggable

The genuine clusters are value chains defined by inter-industry linkages, and the
partition is supplied by the user as `cluster_map` (a per-sector label vector, or
a named list mapping each cluster to its sector names):

```{r lco-data}
set.seed(123)
Tn <- 30; K <- 6
f   <- cumsum(rnorm(Tn))
Phi <- sapply(1:K, function(k) 100 + 5 * f + rnorm(Tn, 0, 1))   # production side
phi <- sapply(1:K, function(k) Phi[, k] + rnorm(Tn, 0, 0.5))    # market side
colnames(Phi) <- colnames(phi) <- paste0("sector_", 1:K)

chains <- list(chainA = c("sector_1", "sector_2"),
               chainB = c("sector_3", "sector_4"),
               chainC = c("sector_5", "sector_6"))
lco <- test_leave_cluster_out(Phi, phi, cluster_map = chains, seed = 7,
                              verbose = FALSE)
lco$baseline
lco$cluster_estimates     # coupling with each chain removed
lco$robust                # TRUE if no chain changes the coupling by > 50%
```
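
To make the mechanics concrete, the loop can be sketched in base R. The sketch
is a stand-in, not the package's implementation: `toy_coupling()` is a
hypothetical statistic (the correlation of the two cross-sectional means) used
only so that the example runs, and the data are simulated. It also shows how the
named-list form of a cluster map converts to the per-sector label form.

```{r lco-sketch}
# `toy_coupling()` is a hypothetical stand-in, NOT the package's estimator.
toy_coupling <- function(X, Y) cor(rowMeans(X), rowMeans(Y))

set.seed(42)
n_t <- 30; n_s <- 6
common <- cumsum(rnorm(n_t))
X_toy  <- sapply(seq_len(n_s), function(k) 100 + 5 * common + rnorm(n_t))
Y_toy  <- X_toy + matrix(rnorm(n_t * n_s, 0, 0.5), n_t, n_s)
colnames(X_toy) <- colnames(Y_toy) <- paste0("sector_", seq_len(n_s))

map_list <- list(chainA = c("sector_1", "sector_2"),
                 chainB = c("sector_3", "sector_4"),
                 chainC = c("sector_5", "sector_6"))

# Named list -> per-sector label vector (the two accepted forms of a map)
sector_labels <- setNames(rep(names(map_list), lengths(map_list)),
                          unlist(map_list, use.names = FALSE))

base_toy <- toy_coupling(X_toy, Y_toy)
drop_est <- sapply(names(map_list), function(g) {
  keep <- setdiff(colnames(X_toy), map_list[[g]])    # remove the whole chain
  toy_coupling(X_toy[, keep, drop = FALSE], Y_toy[, keep, drop = FALSE])
})

rel_change <- abs(drop_est - base_toy) / abs(base_toy)
all(rel_change <= 0.5)                               # the 50% rule from above
```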

### A documented fallback until the MIP arrives

When no `cluster_map` is supplied, a **fallback** partition is built with
`build_cluster_map()` and a message flags its use. The fallback is an explicit
*stopgap proxy*, not a demand-linkage partition:

- `"correlation"` groups sectors by average-linkage hierarchical clustering on
  the correlation distance `1 - rho` between the sectoral series (co-movement);
- `"com"` bins a per-sector organic-composition vector into quantile groups
  (sectors of similar organic composition share a profit-rate neighbourhood).

```{r fallback}
build_cluster_map(phi, n_clusters = 3, method = "correlation")
```

Neither correlation nor organic composition reproduces input-output linkages;
they are one-dimensional proxies. Supply the real partition through `cluster_map`
once the Leontief table is at hand.
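
For readers who want to see what the two proxies amount to, here is a base-R
sketch using `stats::hclust()`, `cutree()`, `quantile()` and `cut()`. It is not
the internals of `build_cluster_map()`: the simulated series `S_toy` and the
organic-composition vector `oc` are hypothetical, and the package may differ in
tie handling and labelling.

```{r fallback-sketch}
# Base-R sketch of the two proxies; hypothetical data throughout.
set.seed(99)
n_obs <- 40
grid  <- seq(0, 4 * pi, length.out = n_obs)
S_toy <- cbind(sin(grid) + rnorm(n_obs, 0, 0.1), sin(grid) + rnorm(n_obs, 0, 0.1),
               cos(grid) + rnorm(n_obs, 0, 0.1), cos(grid) + rnorm(n_obs, 0, 0.1))
colnames(S_toy) <- paste0("sector_", 1:4)

# "correlation": average linkage on the distance 1 - rho
d_cor  <- as.dist(1 - cor(S_toy))
cl_cor <- cutree(hclust(d_cor, method = "average"), k = 2)

# "com": quantile bins of a per-sector organic-composition vector
oc     <- c(sector_1 = 1.2, sector_2 = 3.4, sector_3 = 0.8, sector_4 = 2.1)
breaks <- quantile(oc, probs = seq(0, 1, length.out = 3))   # two groups
cl_com <- cut(oc, breaks = breaks, include.lowest = TRUE, labels = FALSE)
names(cl_com) <- names(oc)

rbind(correlation = cl_cor, com = cl_com)
```

The two proxies need not agree: here the co-moving pairs and the
organic-composition bins cut the same four sectors differently, which is one
more reason to treat either as a stopgap.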

### Reading the statistical layer honestly

`bias` and `se` are the delete-a-group (block) jackknife estimates over the
cluster-deletion replicates. They are well calibrated for roughly balanced
clusters; with strongly unequal clusters they are an approximate, conservative
summary. The primary outputs are the per-cluster `influence`/`retention` and the
`robust` verdict, which is a robustness diagnostic, not a coupling point
estimate. The verdict means exactly "no single value chain moves the coupling by
more than half" -- a statement about predictive stability under cross-sectional
dependence, with no topological content.
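
The quantities named above can be written out with the textbook delete-a-group
jackknife formulas. The sketch below is an assumption about form, not a
transcription of the package: the couplings are made up, and the definitions of
`influence_toy` (signed relative change) and `retention_toy` (ratio to the
baseline) are plausible readings of `influence`/`retention`, not confirmed ones.

```{r jack-sketch}
# Textbook delete-a-group jackknife on made-up numbers (illustration only).
theta_hat <- 0.60                                             # baseline coupling
theta_del <- c(chainA = 0.55, chainB = 0.63, chainC = 0.59)   # chain removed
G <- length(theta_del)

jk_bias <- (G - 1) * (mean(theta_del) - theta_hat)
jk_se   <- sqrt((G - 1) / G * sum((theta_del - mean(theta_del))^2))

# Assumed definitions, for illustration:
influence_toy <- (theta_del - theta_hat) / abs(theta_hat)     # signed relative change
retention_toy <- theta_del / theta_hat                        # share of coupling kept
robust_toy    <- all(abs(influence_toy) <= 0.5)               # "not more than half"

list(bias = jk_bias, se = jk_se, influence = influence_toy, robust = robust_toy)
```

With only a handful of clusters the standard error rests on very few
replicates, which is why the per-cluster values are the more informative output.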

The Leave-Cluster-Out test is more demanding than the single-sector jackknife
whenever a cluster holds more than one sector: dropping a whole chain removes
more shared variation, so a coupling that is robust to one-sector deletion can
still be sensitive to chain deletion. That gap is the point of the test.