Using Custom Datasets

Avishek Bhandari

2026-05-05

Overview

The G20 panel that ships with contagionchannels is one possible application; the same machinery applies to any directed network of return series and any set of channel proxies a user can construct. This vignette shows how to plug in custom data, explains the contracts each function expects, and ends with a worked example using a synthetic five-market panel seeded for reproducibility.

library(contagionchannels)
library(xts)
#> Loading required package: zoo
#> 
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#> 
#>     as.Date, as.Date.numeric
library(dplyr)
#> 
#> ######################### Warning from 'xts' package ##########################
#> #                                                                             #
#> # The dplyr lag() function breaks how base R's lag() function is supposed to  #
#> # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
#> # source() into this session won't work correctly.                            #
#> #                                                                             #
#> # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
#> # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
#> # dplyr from breaking base R's lag() function.                                #
#> #                                                                             #
#> # Code in packages is not affected. It's protected by R's namespace mechanism #
#> # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
#> #                                                                             #
#> ###############################################################################
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:xts':
#> 
#>     first, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

1. Required data structure

Two objects drive the entire pipeline.

Returns. An xts object with one column per market. Rows are trading days (the index must be Date or POSIXct); columns are demeaned daily log returns. Missing values are allowed but should be sparse; na.locf or listwise deletion are both acceptable, but the user is responsible for the choice.

Channel proxies. A data.frame with a Date column matching the returns index plus one column per raw component of each channel. The helper build_channel_composites() does the standardisation and PCA reduction; the user need only ensure that the components are aligned on date and roughly stationary (returns or first-differenced spreads, never raw levels).

str(my_returns_xts)
# An 'xts' object on 2010-01-04/2024-12-31 of 18 columns

str(my_channels_df)
# 'data.frame' with Date + raw component columns

2. Custom market list and crisis periods

The package does not hard-code 18 markets; the WQTE estimator accepts any \(N \ge 3\) markets. To pass your own crisis periods, build a named list of Date-vector pairs:

my_periods <- list(
  Pre_Pandemic = as.Date(c("2018-01-02", "2020-01-31")),
  Pandemic     = as.Date(c("2020-02-01", "2021-12-31")),
  Recovery     = as.Date(c("2022-01-01", "2024-12-31"))
)

Periods need not be contiguous and may overlap, though overlapping windows complicate cross-period comparisons; the paper follows the conventional non-overlap rule.
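A quick sanity check that user-supplied periods follow the non-overlap rule (a sketch; it assumes each list element is a start/end Date pair as in my_periods above):

```r
# TRUE iff, after sorting by start date, every period ends before the next begins
check_no_overlap <- function(periods) {
  starts <- sapply(periods, `[`, 1)   # Dates coerce to numeric; order is preserved
  stops  <- sapply(periods, `[`, 2)
  o <- order(starts)
  all(head(stops[o], -1) < tail(starts[o], -1))
}

stopifnot(check_no_overlap(my_periods))
```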

3. Custom channel composites

build_channel_composites() exposes a mapping argument that lets the user override the default component-to-channel assignment. Each channel must have at least two components for the PCA reduction to be defined.

my_mapping <- list(
  Trade           = c("BDI_chg", "TradeWeightedFX_chg", "ContainerRate_chg"),
  Financial       = c("FRAOIS", "TEDspread", "CDS_lvl"),
  Geopolitical    = c("GPR_daily", "GPR_actions"),
  Behavioral      = c("VIX_innov", "VVIX_innov", "PutCallRatio"),
  Monetary_Policy = c("ShadowRate_surp", "FF_futures_surp")
)

my_channels <- build_channel_composites(
  proxy_grid = my_proxies_df,
  mapping    = my_mapping,
  standardise = "rolling_252"
)

The standardise argument selects between "global" (one z-score over the whole sample), "rolling_252" (one-year rolling), or "period" (within each crisis period). The paper uses "rolling_252" to match the daily update cadence of the underlying components.
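Conceptually, "rolling_252" centres and scales each component against a trailing one-year window. A minimal zoo-based sketch of that transform (illustrative only, not the package's internal code):

```r
library(zoo)

# Rolling 252-day z-score of a single component series x:
# each observation is standardised by the trailing window's mean and sd
rolling_z <- function(x, width = 252) {
  mu  <- rollapplyr(x, width, mean, fill = NA)
  sig <- rollapplyr(x, width, sd,   fill = NA)
  (x - mu) / sig
}
```

The first width - 1 observations are NA by construction, which is why short samples pair better with standardise = "global".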

4. Calling the pipeline

Once returns and composites are in hand, run_contagion_pipeline() consumes both. The full call mirrors the replication vignette but with user-supplied objects:

results <- run_contagion_pipeline(
  returns       = my_returns_xts,
  channels      = my_channels,
  periods       = my_periods,
  scale         = 5,
  tau           = 0.50,
  abs_threshold = NULL,    # NULL => derive from first period Q75
  methods       = c("iv2sls", "lp"),
  bootstrap_B   = 499,
  n_cores       = 4
)

Setting abs_threshold = NULL instructs the pipeline to derive the absolute threshold from the 75th percentile of the first listed period’s WQTE distribution. Alternatively, pass a numeric value to fix the threshold across applications, which is useful for cross-paper comparability.

5. Adapting visualisations for custom data

The plotting helpers accept any tidy data.frame with Period and Channel columns; they do not assume the eight G20 sub-periods. To customise:

plot_attribution_stack(
  shares_long = results$shares_long,
  period_order = names(my_periods),
  palette      = c(Trade = "#1f77b4", Financial = "#d62728",
                   Geopolitical = "#9467bd", Behavioral = "#2ca02c",
                   Monetary_Policy = "#ff7f0e")
)

plot_qte_intensity(
  F_matrix  = results$F_matrices$Pandemic,
  threshold = results$abs_threshold,
  market_order = c("US", "EU", "JP", "EM_Asia", "EM_LatAm")
)

Both helpers return a ggplot object that can be modified with the standard ggplot2 grammar.
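Because the helpers return ggplot objects, standard layers compose directly. For instance (illustrative; the title text is arbitrary):

```r
library(ggplot2)

p <- plot_attribution_stack(
  shares_long  = results$shares_long,
  period_order = names(my_periods)
)

# Restyle with ordinary ggplot2 grammar
p + theme_minimal() +
  labs(title = "Channel attribution, custom periods") +
  theme(legend.position = "bottom")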

6. Worked example: synthetic five-market panel

A small, fully reproducible example is the most reliable way to confirm the contracts. Below we simulate five correlated equity-like return series together with ten channel components, build the composites, and run a trimmed Stage 1 + Stage 2 pipeline.

set.seed(20260429)

n_obs   <- 1500
markets <- c("US", "EU", "JP", "EM_Asia", "EM_LatAm")
dates   <- seq.Date(from = as.Date("2018-01-02"), by = "day",
                    length.out = n_obs)

# Common factor + idiosyncratic shocks
Fcom <- rnorm(n_obs, sd = 0.012)
ret_mat <- sapply(markets, function(m) {
  loading <- runif(1, 0.4, 0.9)
  loading * Fcom + rnorm(n_obs, sd = 0.010)
})
my_returns <- xts(ret_mat, order.by = dates)

# Channel proxy raw components
my_proxies <- data.frame(
  Date              = dates,
  BDI_chg           = rnorm(n_obs, sd = 0.5),
  TradeFX_chg       = rnorm(n_obs, sd = 0.4),
  FRAOIS            = arima.sim(list(ar = 0.95), n_obs) * 0.01,
  TEDspread         = arima.sim(list(ar = 0.93), n_obs) * 0.01,
  GPR_daily         = exp(rnorm(n_obs, sd = 0.2)),
  GPR_actions       = exp(rnorm(n_obs, sd = 0.3)),
  VIX_innov         = rnorm(n_obs, sd = 1.5),
  VVIX_innov        = rnorm(n_obs, sd = 1.0),
  ShadowRate_surp   = rnorm(n_obs, sd = 0.05),
  FF_futures_surp   = rnorm(n_obs, sd = 0.04)
)

my_mapping <- list(
  Trade           = c("BDI_chg", "TradeFX_chg"),
  Financial       = c("FRAOIS", "TEDspread"),
  Geopolitical    = c("GPR_daily", "GPR_actions"),
  Behavioral      = c("VIX_innov", "VVIX_innov"),
  Monetary_Policy = c("ShadowRate_surp", "FF_futures_surp")
)

my_periods <- list(
  Calm   = as.Date(c("2018-01-02", "2019-12-31")),
  Stress = as.Date(c("2020-01-01", "2021-12-31"))
)
my_channels <- build_channel_composites(
  proxy_grid  = my_proxies,
  mapping     = my_mapping,
  standardise = "rolling_252"
)

head(my_channels, 3)

A trimmed Stage 1 estimate on the Calm sub-period:

calm_dates    <- my_periods$Calm
returns_calm  <- my_returns[paste0(calm_dates[1], "/", calm_dates[2])]

F_calm <- compute_wqte_matrix(
  returns = returns_calm,
  scale   = 5,
  tau     = 0.50,
  n_cores = 1
)

abs_thr_calm <- quantile(
  F_calm[upper.tri(F_calm) | lower.tri(F_calm)],
  probs = 0.75, na.rm = TRUE
)

links_calm <- which(F_calm >= abs_thr_calm, arr.ind = TRUE)
nrow(links_calm)
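Since which(..., arr.ind = TRUE) returns bare row/column indices, it can help to translate them into market names (this sketch assumes rows index the source market and columns the target; confirm the direction convention against the package documentation):

```r
# Named edge list from the index matrix; which() labels its columns "row"/"col"
edge_list <- data.frame(
  from = markets[links_calm[, "row"]],
  to   = markets[links_calm[, "col"]]
)
head(edge_list)
```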

A Stage 2 IV/2SLS attribution on the same window:

channels_calm <- my_channels[
  my_channels$Date >= calm_dates[1] & my_channels$Date <= calm_dates[2], ]

iv_calm <- iv_2sls_attribute(
  returns_period  = returns_calm,
  channels_period = channels_calm,
  links           = links_calm,
  cluster_se      = TRUE
)

iv_calm$shares

End-to-end pipeline call on the synthetic data:

synth_results <- run_contagion_pipeline(
  returns       = my_returns,
  channels      = my_channels,
  periods       = my_periods,
  scale         = 5,
  tau           = 0.50,
  abs_threshold = abs_thr_calm,
  methods       = c("iv2sls"),
  bootstrap_B   = 199,
  n_cores       = 1
)

synth_results$summary_table

The synthetic panel is too small for inference to be meaningful, but running the chain end-to-end is the cleanest way to verify that the data contracts are satisfied before committing compute to a real estimation.

Common pitfalls

A few user errors recur in support requests:

- Passing raw levels (prices, spreads, index values) instead of demeaned log returns or first-differenced series; the composites assume roughly stationary inputs.
- A Date column in the proxy data.frame that does not match the returns index, leaving the composites misaligned or riddled with NAs.
- A mapping with a single component for some channel, which leaves the PCA reduction undefined; each channel needs at least two components.
- Fewer than three markets in the returns panel, which the WQTE estimator rejects.

These pitfalls aside, the package’s contracts are deliberately minimal: an xts of returns, a data.frame of proxies, and a list of period endpoints. Anything that satisfies those three constraints can be analysed with the same machinery that produces the headline results in the paper.

Session info

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 22.04.5 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.10.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.10.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_IN       LC_NUMERIC=C         LC_TIME=en_IN       
#>  [4] LC_COLLATE=C         LC_MONETARY=en_IN    LC_MESSAGES=en_IN   
#>  [7] LC_PAPER=en_IN       LC_NAME=C            LC_ADDRESS=C        
#> [10] LC_TELEPHONE=C       LC_MEASUREMENT=en_IN LC_IDENTIFICATION=C 
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.1.4             xts_0.14.1              zoo_1.8-14             
#> [4] contagionchannels_0.1.3
#> 
#> loaded via a namespace (and not attached):
#>  [1] bslib_0.9.0        compiler_4.1.2     pillar_1.11.1      jquerylib_0.1.4   
#>  [5] tools_4.1.2        digest_0.6.37      tibble_3.3.0       jsonlite_2.0.0    
#>  [9] evaluate_1.0.5     lifecycle_1.0.5    lattice_0.20-45    pkgconfig_2.0.3   
#> [13] rlang_1.2.0        Matrix_1.5-4.1     igraph_2.2.0       cli_3.6.6         
#> [17] yaml_2.3.10        parallel_4.1.2     SparseM_1.84-2     xfun_0.53         
#> [21] fastmap_1.2.0      multitaper_1.0-17  knitr_1.50         generics_0.1.4    
#> [25] sass_0.4.10        vctrs_0.6.5        MatrixModels_0.5-1 tidyselect_1.2.1  
#> [29] grid_4.1.2         glue_1.8.0         R6_2.6.1           survival_3.2-13   
#> [33] waveslim_1.8.5     rmarkdown_2.30     magrittr_2.0.4     htmltools_0.5.8.1 
#> [37] MASS_7.3-55        splines_4.1.2      quantreg_6.1       cachem_1.1.0