seroreconstruct is an R package for Bayesian inference
of influenza virus infection status from serial antibody measurements.
It relaxes the traditional 4-fold rise rule by jointly modeling antibody
boosting after infection, waning over time, and measurement error —
estimating individual-level infection probabilities from
hemagglutination inhibition (HAI) titer data.
The statistical framework is described in Tsang et al. (2022) Nature Communications 13:1557.
The package requires two data frames:
inputdata — one row per individual
with columns:
age_group: integer (0 = children, 1 = adults, 2 = older
adults)start_time, end_time: follow-up period
(integer days)time1, time2, time3: serum
collection dates (integer days)HAI_titer_1, HAI_titer_2,
HAI_titer3: HAI titers at each collectioninputILI — daily influenza-like
illness activity, with rows corresponding to consecutive days. The row
indices must cover the range of start_time to
end_time in inputdata.library(seroreconstruct)
data("inputdata")
data("flu_activity")
head(inputdata)
#> age_group start_time end_time time1 time2 time3 HAI_titer_1 HAI_titer_2
#> 1 0 837 1192 837 1032 1192 0 0
#> 2 2 837 1192 837 1032 1192 0 0
#> 3 2 837 1192 837 1032 1192 0 0
#> 4 0 837 1192 837 1032 1192 4 4
#> 5 0 845 1191 845 1034 1191 0 0
#> 6 2 845 1191 845 1034 1191 0 0
#> HAI_titer3
#> 1 0
#> 2 0
#> 3 0
#> 4 4
#> 5 0
#> 6 0
dim(inputdata)
#> [1] 1753 9The main function sero_reconstruct() runs a Bayesian
MCMC to estimate infection probabilities and antibody dynamics
parameters. For real analyses, use at least 100,000 iterations with
appropriate burn-in and thinning.
fit <- sero_reconstruct(inputdata, flu_activity,
n_iteration = 200000, burnin = 100000, thinning = 10)For this vignette, we use a short run for illustration:
fit <- sero_reconstruct(inputdata, flu_activity,
n_iteration = 2000, burnin = 1000, thinning = 1)
#> 1000
#> MCMC complete in 23 seconds. Use summary() to view estimates.
fit
#> seroreconstruct fit
#> Individuals: 1753
#> Age groups: 3
#> Posterior samples: 1000
#> Runtime: 23 seconds
#>
#> Use summary() to extract model estimates.The summary() method extracts key estimates with 95%
credible intervals:
summary(fit)
#> Variable
#> Random error (%)
#> Two-fold error (%)
#> Fold-increase after infection for children (Boosting)
#> Fold-decrease after 1 year for children (Waning)
#> Fold-increase after 1 year for adults (Boosting)
#> Fold-decrease after 1 year for adults (Waning)
#> Infection probability for children
#> Infection probability for adults
#> Infection probability for older adults
#> Infection probability for children with pre-epidemic HAI titer < 10
#> Infection probability for adults with pre-epidemic HAI titer < 10
#> Infection probability for older adults with pre-epidemic HAI titer < 10
#> Relative risk for children (Ref: Adults)
#> Relative risk for older adults (Ref: Adults)
#> Relative risk for children with pre-epidemic HAI titer < 10 (Ref: Adults with pre-epidemic HAI titer < 10)
#> Relative risk for older adults with pre-epidemic HAI titer < 10 (Ref: Adults with pre-epidemic HAI titer < 10)
#> Point estimate Lower bound Upper bound
#> 2.45 1.46 3.87
#> 3.35 4.09 2.70
#> 8.16 4.41 12.57
#> 1.06 1.02 1.11
#> 6.20 4.46 8.03
#> 1.12 1.08 1.19
#> 0.17 0.15 0.20
#> 0.21 0.17 0.25
#> 0.14 0.09 0.18
#> 0.39 0.29 0.49
#> 0.30 0.24 0.36
#> 0.20 0.14 0.27
#> 0.84 0.66 1.07
#> 1.32 1.01 1.69
#> 0.66 0.41 0.95
#> 0.67 0.43 0.95The summary table includes:
Check convergence with trace plots and posterior density plots:
Visualize posterior trajectories for individual participants. Red lines show trajectories where infection occurred; blue lines show trajectories without infection.
Extract parameter estimates and individual-level results as data frames:
# Model parameter estimates
table_parameters(fit)
#> Parameter Mean Median Lower Upper
#> 1 random_error 0.002449224 0.002360517 0.001458128 0.003871297
#> 2 twofold_error 2.008823851 2.002093441 1.811109977 2.226452899
#> 3 boosting_children 3.028080030 3.065484041 2.142161893 3.652285927
#> 4 waning_children 0.078596897 0.077203527 0.027590924 0.147103681
#> 5 boosting_adults 2.631771904 2.629185001 2.156949228 3.004526332
#> 6 waning_adults 0.162706514 0.158313846 0.113593483 0.255005105
#> 7 inf_prob_children 0.533519831 0.535213356 0.372523502 0.721366919
#> 8 inf_prob_adults 0.379860281 0.381669558 0.293712880 0.475050602
#> 9 inf_prob_older_adults 0.236305160 0.232561598 0.158564652 0.334360678
#> 10 hai_coef -0.429737528 -0.450430439 -0.601178971 -0.253807632
# Per-individual infection probabilities
head(table_infections(fit))
#> Individual Infection_prob Infection_time_mean Baseline_titer_mean
#> 1 1 0 NA 0.52
#> 2 2 0 NA 0.46
#> 3 3 0 NA 0.49
#> 4 4 0 NA 4.50
#> 5 5 0 NA 0.50
#> 6 6 0 NA 0.50Compare infection rates across groups by fitting independent MCMCs:
Generate synthetic data for power analysis or validation:
data("para1")
data("para2")
sim <- simulate_data(inputdata, flu_activity, para1, para2)
names(sim)
#> [1] "age_group" "start_time" "end_time" "time1" "time2"
#> [6] "time3" "HAI_titer_1" "HAI_titer_2" "HAI_titer3"The simulated data can be passed directly to
sero_reconstruct() for simulation-recovery studies.
To cite seroreconstruct in publications:
Tsang TK, Perera RAPM, Fang VJ, Wong JY, Shiu EY, So HC, Ip DKM, Malik Peiris JS, Leung GM, Cowling BJ, Cauchemez S. (2022). Reconstructing antibody dynamics to estimate the risk of influenza virus infection. Nature Communications 13:1557. doi: 10.1038/s41467-022-29310-8