Getting Started with wnpmle

Overview

The wnpmle package implements the weighted nonparametric maximum likelihood estimator (wNPMLE) for the marginal mean of recurrent events in the presence of a competing terminal event.

Competing events are commonly modeled as independent right-censorings, which leads to overestimation of the marginal mean of a recurrent event. wnpmle provides valid estimation and prediction via a weighted nonparametric maximum likelihood estimation for two broad classes of semiparametric transformation models.

Models Link function G(x) Special case at param = 1
Box-Cox transformation models (model = "boxcox") \(((1 + x)^\rho - 1)/\rho\) Ghosh–Lin (\(\rho = 1\))
Logarithmic transformation models (model = "log") \(\log(1 + r x)/r\) Proportional odds (\(r = 1\))

Both are estimated via automatic differentiation using TMB, which provides exact gradients and fast convergence.


Installation

install.packages("wnpmle")

Or from GitHub:

# install.packages("remotes")
remotes::install_github("abellach/wnpmle")

Quick start: bladder cancer data

The package ships with a pre-processed version of the bladder dataset from the survival package, accessible via bladder_prep().

library(wnpmle)

bdata       <- bladder_prep()
bdata_clean <- bdata[, c("id", "time", "status", "treat", "num", "size")]
head(bdata_clean)
#>   id time status treat num size
#> 1  1    0      2     0   1    1
#> 2  2    1      2     0   1    3
#> 3  3    4      0     0   2    1
#> 4  4    7      0     0   1    1
#> 5  5   10      2     0   5    1
#> 6  6    6      1     0   4    1

Fitting the models

Ghosh-Lin model (Box-Cox, rho = 1)

fit_bc <- wnpmle_fit(
  Surv(time, status) ~ treat + num + size,
  data  = bdata_clean,
  id    = "id",
  model = "boxcox",
  rho   = 1,
  tau   = 59,
  se    = "sandwich_adj"
)
#> Note: Using Makevars in C:\Users\abell\AppData\Local\Temp\Rtmp6Z8mOF\file61c45414c7f
#> using C++ compiler: 'G__~1.EXE (GCC) 13.3.0'
summary(fit_bc)
#> 
#> Weighted NPMLE - Recurrent Events with Competing Terminal Event
#> Type       : recurrent 
#> Model      : BOXCOX transformation ( rho = 1 )
#> Subjects   : 86 
#> Events     : recurrent = 132   terminal = 22   censored = 64 
#> Log-lik    : -683.5759 
#> Convergence: relative convergence (4) 
#> 
#> Coefficients:
#>       Estimate     SE z value Pr(>|z|)
#> treat  -0.5363 0.2679  -2.002 0.045280
#> num     0.1727 0.0588   2.939 0.003298
#> size   -0.0051 0.0681  -0.074 0.940900
#> 
#> Cumulative baseline mean at time grid:
#>                 Lambda     SE
#> A(tau/4) = 14.8 0.5111 0.1446
#> A(tau/2) = 29.5 1.1076 0.2903
#> A(tau)   = 59   1.5882 0.4246
bl_bc <- baseline(fit_bc)
plot(bl_bc$time, bl_bc$Lambda, type = "s", lwd = 2,
     xlab = "Time (months)", ylab = expression(hat(Lambda)(t)),
     main = "Cumulative baseline mean (Ghosh-Lin)")
lines(bl_bc$time, bl_bc$lower, lty = 2, col = "grey50")
lines(bl_bc$time, bl_bc$upper, lty = 2, col = "grey50")

AIC(fit_bc)
#> [1] 1373.152
BIC(fit_bc)
#> [1] 1380.515

Proportional odds model (logarithmic, r = 1)

fit_log <- wnpmle_fit(
  Surv(time, status) ~ treat + num + size,
  data  = bdata_clean,
  id    = "id",
  model = "log",
  rho   = 1,
  tau   = 59,
  se    = "sandwich_adj"
)
#> Note: Using Makevars in C:\Users\abell\AppData\Local\Temp\Rtmp6Z8mOF\file61c45414c7f
#> using C++ compiler: 'G__~1.EXE (GCC) 13.3.0'
summary(fit_log)
#> 
#> Weighted NPMLE - Recurrent Events with Competing Terminal Event
#> Type       : recurrent 
#> Model      : LOG transformation ( r = 1 )
#> Subjects   : 86 
#> Events     : recurrent = 132   terminal = 22   censored = 64 
#> Log-lik    : -685.6581 
#> Convergence: relative convergence (4) 
#> 
#> Coefficients:
#>       Estimate     SE z value Pr(>|z|)
#> treat  -0.9916 0.4593  -2.159 0.030840
#> num     0.3540 0.1335   2.651 0.008017
#> size    0.0140 0.1424   0.098 0.921600
#> 
#> Cumulative baseline mean at time grid:
#>                 Lambda     SE
#> A(tau/4) = 14.8 0.5328 0.2972
#> A(tau/2) = 29.5 1.8252 1.0661
#> A(tau)   = 59   3.8781 2.4199
bl_log <- baseline(fit_log)
plot(bl_log$time, bl_log$Lambda, type = "s", lwd = 2,
     xlab = "Time (months)", ylab = expression(hat(Lambda)(t)),
     main = "Cumulative baseline mean (proportional odds)")
lines(bl_log$time, bl_log$lower, lty = 2, col = "grey50")
lines(bl_log$time, bl_log$upper, lty = 2, col = "grey50")

AIC(fit_log)
#> [1] 1377.316
BIC(fit_log)
#> [1] 1384.679

Prediction

predict() evaluates the estimated marginal mean at new covariate values. Here we compare the mean number of recurrences over time for a treated vs. placebo patient, holding the number of initial tumours and tumour size fixed at 1.

newdat <- data.frame(treat = c(0, 1), num = c(1, 1), size = c(1, 1))

pred <- predict(fit_bc, newdata = newdat,
                times = seq(1, 50, by = 1))

plot(pred$time, pred$mu_1, type = "s", lwd = 2,
     xlab = "Time (months)", ylab = "Marginal mean number of recurrences",
     ylim = range(pred[, -1]))
lines(pred$time, pred$mu_2, lwd = 2, lty = 2, col = "firebrick")
legend("topleft", legend = c("Placebo", "Thiotepa"),
       lty = c(1, 2), col = c("black", "firebrick"), bty = "n")


Choosing the transformation parameter

plot_loglik() sweeps over a grid of parameter values and plots the profile log-likelihood for both models on a single axis. The logarithmic parameter r is shown on the left (negative axis) and the Box-Cox parameter rho on the right (positive axis), meeting at zero where the two models coincide.

The grids and the follow-up horizon tau are user-controlled. The function fits length(rho_grid) + length(r_grid) models in total, so computation time scales with grid size.

plot_loglik(
  Surv(time, status) ~ treat + num + size,
  data     = bdata_clean,
  id       = "id",
  tau      = 59,
  rho_grid = seq(0.01, 1.2, by = 0.01),
  r_grid   = seq(0.01, 1.2, by = 0.01)
)

The filled circle marks rho = 1 (Ghosh-Lin) and the open circle marks r = 1 (proportional odds). The optimal parameter values are printed to the console.


Standard errors

The adjusted sandwich variance estimator accounts for the estimation of the inverse probability of censoring weights. Set se = "none" to skip SE computation, which is useful when profiling over a grid of transformation parameters.

Value Description
"sandwich_adj" Sandwich variance estimator (default)
"sandwich" Sandwich variance estimator without adjustment (approximation)
"fisher" Inverse fisher information
"none" No standard errors, faster, useful for profiling

S3 methods

Method Description
print(fit) Compact coefficient table with z-values and p-values
summary(fit) Adds cumulative baseline at tau/4, tau/2, tau
coef(fit) Named coefficient vector
vcov(fit) Full variance-covariance matrix for (beta, Lambda)
logLik(fit) Log-likelihood (for use with AIC / BIC)
AIC(fit) Akaike information criterion
BIC(fit) Bayesian information criterion
baseline(fit) Cumulative baseline mean data frame with CIs
predict(fit, newdata) Marginal mean at new covariate values

References

Bellach, A. and Kosorok, M.R. (2026). Weighted NPMLE for the marginal mean of recurrent events with a competing terminal event. arXiv preprint arXiv:2605.25934.

Bellach, A., Kosorok, M.R., Rüschendorf, L. and Fine, J.P. (2019). Weighted NPMLE for the subdistribution of a competing risk. Journal of the American Statistical Association, 114(525), 259–270. doi:10.1080/01621459.2017.1401540.

Ghosh, D. and Lin, D.Y. (2002). Marginal regression models for recurrent and terminal events. Statistica Sinica, 12, 663–688. doi:10.17615/pt0g-y207.

Zeng, D. and Lin, D.Y. (2006). Semiparametric transformation models with random effects for recurrent events. Biometrika, 93(3), 627–640. doi:10.1093/biomet/93.3.627.