GammaFrailty: Gamma Frailty Regression Models with Multiple Baseline Distributions

Shikhar Tyagi

2026-06-11

1 Introduction

The GammaFrailty package implements univariate gamma frailty regression models for survival data under six different baseline distributions:

Model Parameter(s) Reference
Arvind alpha Pandey et al. (2024)
Lindley lambda Lindley (1958)
Linear Failure Rate (LFR) a, b Bain (1974)
Power Xgamma (PXG) alpha, beta Tyagi et al. (2022)
Modified Topp-Leone (MTL) alpha Singh et al. (2025)
Power Failure Rate (PFR) a, k Mugdadi (2005)

1.1 The Gamma Frailty Framework

Let \(T_j\) be the survival time for the \(j\)-th subject and \(Z\) be an unobservable frailty variable drawn from a Gamma distribution with mean 1 and variance \(\theta\). Conditional on \(Z = z\), the hazard is

\[h(t \mid z) = z \, h_0(t) \, e^{\mathbf{X}_j \boldsymbol{\beta}}.\]

After marginalising over \(Z\), the unconditional survival function becomes

\[S(t_j) = \left[1 + \theta \, \eta_j \, H_0(t_j)\right]^{-1/\theta},\]

where \(\eta_j = e^{\mathbf{X}_j \boldsymbol{\beta}}\) and \(H_0(t)\) is the baseline cumulative hazard. When no covariates are present, \(\eta_j = 1\).


2 Installation

# Install from local source
install.packages(
  "path/to/GammaFrailty",
  repos = NULL, type = "source"
)

3 Random Number Generation

Each baseline distribution has its own r_* function, and the full frailty model generator is r_gamma_frailty().

library(GammaFrailty)

set.seed(42)

# Arvind (alpha = 0.5)
x_arvind <- r_arvind(200, alpha = 0.5)

# Lindley (lambda = 1.5)
x_lindley <- r_lindley(200, lambda = 1.5)

# LFR (a = 0.5, b = 0.2)
x_lfr <- r_lfr(200, a = 0.5, b = 0.2)

# Power Xgamma (alpha = 1, beta = 0.8)
x_pxg <- r_pxg(100, alpha = 1.0, beta = 0.8)

# Modified Topp-Leone (alpha = 2)
x_mtl <- r_mtl(100, alpha = 2.0)

# Power Failure Rate (a = 0.5, k = 1)
x_pfr <- r_pfr(200, a = 0.5, k = 1.0)

summary(x_arvind)
#>      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
#> 0.0004777 0.4437149 0.9271125 0.9702745 1.3520352 2.7002231

3.0.1 Plot baseline distributions

plot_baseline("arvind", par = 0.5, t_range = c(0.01, 3))


4 Simulating Frailty Survival Data

r_gamma_frailty() supports eight censoring mechanisms.

set.seed(1)

# Right-censored data with one covariate
dat_right <- r_gamma_frailty(
  n        = 150,
  baseline = "arvind",
  par      = 0.5,
  x        = matrix(rnorm(150), ncol = 1),
  beta     = 0.5,
  theta    = 0.3,
  cen_type = "right",
  cen_rate = 0.25
)
colnames(dat_right)[4] <- "X1"
cat("Censoring rate:", mean(dat_right$status == 0), "\n")
#> Censoring rate: 0.2666667
head(dat_right)
#>        time time2 status         X1
#> 1 1.0266130    NA      0 -0.6264538
#> 2 0.1640531    NA      1  0.1836433
#> 3 1.1684013    NA      1 -0.8356286
#> 4 0.1800794    NA      0  1.5952808
#> 5 0.5857507    NA      1  0.3295078
#> 6 2.9474645    NA      0 -0.8204684
set.seed(2)
# Interval-censored, no covariates
dat_int <- r_gamma_frailty(
  n        = 100,
  baseline = "lfr",
  par      = c(0.5, 0.2),
  theta    = 0.4,
  cen_type = "interval",
  int_width = 0.4
)
table(dat_int$status)
#> 
#>  0  1  3 
#>  5 65 30

5 Fitting Models

5.1 No-covariate model (complete data)

set.seed(3)
dat_complete <- r_gamma_frailty(120, "arvind", par = 0.5, theta = 0.3,
                                 cen_type = "none")

fit0 <- fit_gamma_frailty(dat_complete$time, dat_complete$status,
                           baseline = "arvind")
summary(fit0)
#> =========================================================
#>  Gamma Frailty Regression Model
#> =========================================================
#>  Baseline Distribution : Arvind 
#>  Sample size           : 120 
#>  No. of covariates     : 0 
#>  No. of baseline pars  : 1 
#> 
#> ----- Parameter Estimates --------------------------------
#>       Estimate  StdErr t_stat   p_value CI_lower CI_upper Signif
#> alpha   0.5239 0.08116  6.456 2.494e-09  0.38673   0.7098    ***
#> theta   0.1869 0.11192  1.670    0.0976  0.05779   0.6044      .
#> 
#> ----- Frailty Variance -----------------------------------
#>   theta (frailty var) = 0.1869   SE = 0.1119
#> 
#> ----- Model Fit ------------------------------------------
#>   Log-Likelihood : -114.7298
#>   AIC            : 233.4596
#>   BIC            : 239.0346

5.2 Model with covariates (right-censored)

fit1 <- fit_gamma_frailty(dat_right$time, dat_right$status,
                           x = dat_right[, "X1", drop = FALSE],
                           baseline = "arvind")
summary(fit1)
#> =========================================================
#>  Gamma Frailty Regression Model
#> =========================================================
#>  Baseline Distribution : Arvind 
#>  Sample size           : 150 
#>  No. of covariates     : 1 
#>  No. of baseline pars  : 1 
#> 
#> ----- Parameter Estimates --------------------------------
#>       Estimate  StdErr t_stat   p_value CI_lower CI_upper Signif
#> alpha   0.4355 0.06959  6.259 4.024e-09  0.31843   0.5957    ***
#> X1      0.3633 0.13627  2.666  0.008536  0.09620   0.6304     **
#> theta   0.1996 0.15315  1.303  0.194491  0.04437   0.8980       
#> 
#> ----- Frailty Variance -----------------------------------
#>   theta (frailty var) = 0.1996   SE = 0.1531
#> 
#> ----- Model Fit ------------------------------------------
#>   Log-Likelihood : -121.3515
#>   AIC            : 248.7030
#>   BIC            : 257.7349
#> 
#> ----- Overall F-test (covariates) ------------------------
#>   F-stat = 7.1073,  p-value = 0.008536

5.3 Formula interface

library(survival)
fit_formula <- gamma_frailty(
  Surv(time, status) ~ X1,
  data     = dat_right,
  baseline = "arvind"
)
summary(fit_formula)
#> Call:
#> gamma_frailty(formula = Surv(time, status) ~ X1, data = dat_right, 
#>     baseline = "arvind")
#> 
#> =========================================================
#>  Gamma Frailty Regression Model
#> =========================================================
#>  Baseline Distribution : Arvind 
#>  Sample size           : 150 
#>  No. of covariates     : 1 
#>  No. of baseline pars  : 1 
#> 
#> ----- Parameter Estimates --------------------------------
#>       Estimate  StdErr t_stat   p_value CI_lower CI_upper Signif
#> alpha   0.4355 0.06959  6.259 4.024e-09  0.31843   0.5957    ***
#> X1      0.3633 0.13627  2.666  0.008536  0.09620   0.6304     **
#> theta   0.1996 0.15315  1.303  0.194491  0.04437   0.8980       
#> 
#> ----- Frailty Variance -----------------------------------
#>   theta (frailty var) = 0.1996   SE = 0.1531
#> 
#> ----- Model Fit ------------------------------------------
#>   Log-Likelihood : -121.3515
#>   AIC            : 248.7030
#>   BIC            : 257.7349
#> 
#> ----- Overall F-test (covariates) ------------------------
#>   F-stat = 7.1073,  p-value = 0.008536

6 Inference and Model Metrics

# Coefficients with CI
coef(fit1)
#>     alpha        X1     theta 
#> 0.4355310 0.3632854 0.1996087
confint(fit1, level = 0.95)
#>            2.5 %    97.5 %
#> alpha 0.31843347 0.5956888
#> X1    0.09620445 0.6303664
#> theta 0.04437056 0.8979744

# Log-likelihood and information criteria
logLik(fit1)
#> 'log Lik.' -121.3515 (df=3)
AIC(fit1)
#> [1] 248.703
cat("BIC:", fit1$BIC, "\n")
#> BIC: 257.7349
cat("Frailty variance (theta):", fit1$theta,
    "  SE:", fit1$theta_se, "\n")
#> Frailty variance (theta): 0.1996087   SE: 0.1531498

6.0.1 VIF and Tolerance

# Generate data with 2 covariates to illustrate VIF
set.seed(4)
dat2 <- r_gamma_frailty(150, "arvind", par = 0.5,
                         x = matrix(rnorm(300), ncol = 2),
                         beta = c(0.4, -0.3), theta = 0.3,
                         cen_type = "right")
fit2 <- fit_gamma_frailty(dat2$time, dat2$status,
                           x = dat2[, 4:5], baseline = "arvind")
cat("VIF:\n"); print(fit2$VIF)
#> VIF:
#>       X1       X2 
#> 1.014884 1.014884
cat("Tolerance:\n"); print(fit2$Tolerance)
#> Tolerance:
#>        X1        X2 
#> 0.9853344 0.9853344

7 Residuals and Diagnostics

res <- residuals_frailty(fit1)
cat("MSE:", round(res$MSE, 4),
    " RMSE:", round(res$RMSE, 4),
    " MAE:", round(res$MAE, 4), "\n")
#> MSE: 0.0144  RMSE: 0.1199  MAE: 0.0953
cat("R-squared:", round(res$R_square, 4),
    " Adj R-sq:", round(res$Adj_R_square, 4), "\n")
#> R-squared: 0.9238  Adj R-sq: 0.9233
cat("KS test p-value:", round(res$KS_pvalue, 4), "\n")
#> KS test p-value: 5e-04
dt <- diagnostics_table(fit1)
head(dt[, c("time","status","deviance","leverage","cooks_dist","DFFITS")])
#>        time status   deviance     leverage   cooks_dist      DFFITS
#> 1 1.0266130      0 -1.1135094 0.0032198000 0.0040180734 -0.06338827
#> 2 0.1640531      1  1.7572670 0.0002766949 0.0008549032  0.02923873
#> 3 1.1684013      1  0.3463552 0.0057289812 0.0006952026  0.02636669
#> 4 0.1800794      0 -0.5612617 0.0208797391 0.0068609440 -0.08283082
#> 5 0.5857507      1  0.7812591 0.0008908039 0.0005446863  0.02333851
#> 6 2.9474645      0 -2.2837808 0.0055229930 0.0291268709 -0.17066596

8 Diagnostic Plots

8.1 Core residual plots

oldpar <- par(mfrow = c(2, 2))
plot_residuals_fitted(fit1)
plot_qq_residuals(fit1)
plot_scale_location(fit1)
plot_residuals_leverage(fit1)

par(oldpar)

8.2 Survival and influence plots

plot_survival_km(fit1)

plot_coef_forest(fit1)

plot_leverage(fit1)

plot_dfbetas(fit1)


9 Prediction

# Survival at specific time points
survival_at(fit1, times = c(0.5, 1.0, 2.0)) |> head(6)
#>   subject time  survival
#> 1       1  0.5 0.7882833
#> 2       2  0.5 0.7284945
#> 3       3  0.5 0.8018229
#> 4       4  0.5 0.5955107
#> 5       5  0.5 0.7164460
#> 6       6  0.5 0.8008667
# Median survival times
med <- predict_frailty(fit1, type = "median")
summary(med)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.5060  0.8209  0.9654  0.9678  1.1008  1.6110
# Risk scores (eta = exp(X * beta))
risk <- predict_frailty(fit1, type = "risk")
summary(risk)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.4473  0.8085  0.9822  1.0640  1.2408  2.3928
# Population-averaged (marginal) survival
t_grid <- seq(0.1, 3, by = 0.2)
ms <- predict_frailty(fit1, newtime = t_grid, type = "marginal")
plot(t_grid, ms, type = "l", lwd = 2, col = "steelblue",
     xlab = "Time", ylab = "Marginal S(t)", main = "Marginal Survival")

# Expected survival drop in [0.5, 1.5]
es <- predict_frailty(fit1, type = "expected", window = c(0.5, 1.5))
summary(es)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.3373  0.4439  0.4678  0.4574  0.4814  0.4869
# Forecast beyond training range
fc <- forecast_frailty(fit1, horizon = 6, n_grid = 100)
plot(fc$time, fc$survival[1, ], type = "l", lwd = 2, col = "firebrick",
     xlab = "Time", ylab = "S(t)", main = "Forecast: Subject 1")


10 Model Comparison

set.seed(5)
dat_cmp <- r_gamma_frailty(120, "arvind", par = 0.5, theta = 0.3,
                             cen_type = "right", cen_rate = 0.2)

fits <- list(
  Arvind  = fit_gamma_frailty(dat_cmp$time, dat_cmp$status, baseline = "arvind"),
  Lindley = fit_gamma_frailty(dat_cmp$time, dat_cmp$status, baseline = "lindley"),
  LFR     = fit_gamma_frailty(dat_cmp$time, dat_cmp$status, baseline = "lfr"),
  PFR     = fit_gamma_frailty(dat_cmp$time, dat_cmp$status, baseline = "pfr")
)

compare_models(fits)
#>     Model            Baseline   n K    logLik      AIC      BIC  theta
#> 1  Arvind              Arvind 120 2 -110.6038 225.2076 230.7825 0.3111
#> 2 Lindley             Lindley 120 2 -111.6868 227.3736 232.9485 0.0000
#> 3     LFR Linear Failure Rate 120 3 -110.4537 226.9073 235.2698 0.4076
#> 4     PFR  Power Failure Rate 120 3 -109.4395 224.8790 233.2415 0.0265
#>   delta_AIC AIC_weight
#> 1    0.3286     0.3396
#> 2    2.4946     0.1150
#> 3    2.0283     0.1452
#> 4    0.0000     0.4002

11 Cross-Validation

cv_res <- cv_frailty(dat_cmp$time, dat_cmp$status,
                      baseline = "arvind", k = 5)
cat("Mean OOS log-lik:", round(cv_res$mean_oos_loglik, 3), "\n")
cat("Mean OOS RMSE:   ", round(cv_res$mean_oos_rmse,   3), "\n")

12 Censored Data - All Types

12.1 Left censoring

set.seed(6)
dat_left <- r_gamma_frailty(100, "pfr", par = c(0.5, 1), theta = 0.35,
                             cen_type = "left", left_threshold = 0.3)
table(dat_left$status)
#> 
#>   1 
#> 100
fit_left <- fit_gamma_frailty(dat_left$time, dat_left$status,
                               baseline = "pfr")
summary(fit_left)
#> =========================================================
#>  Gamma Frailty Regression Model
#> =========================================================
#>  Baseline Distribution : Power Failure Rate 
#>  Sample size           : 100 
#>  No. of covariates     : 0 
#>  No. of baseline pars  : 2 
#> 
#> ----- Parameter Estimates --------------------------------
#>       Estimate StdErr t_stat  p_value CI_lower CI_upper Signif
#> a       0.4691 0.1088  4.309 3.93e-05   0.2976   0.7392    ***
#> k       1.1820 0.4107  2.878 0.004926   0.5982   2.3355     **
#> theta   0.6224 0.4167  1.494 0.138509   0.1676   2.3117       
#> 
#> ----- Frailty Variance -----------------------------------
#>   theta (frailty var) = 0.6224   SE = 0.4167
#> 
#> ----- Model Fit ------------------------------------------
#>   Log-Likelihood : -166.1406
#>   AIC            : 338.2813
#>   BIC            : 346.0968

12.2 Interval censoring

fit_int <- fit_gamma_frailty(dat_int$time, dat_int$status,
                              time2 = dat_int$time2,
                              baseline = "lfr")
summary(fit_int)
#> =========================================================
#>  Gamma Frailty Regression Model
#> =========================================================
#>  Baseline Distribution : Linear Failure Rate 
#>  Sample size           : 100 
#>  No. of covariates     : 0 
#>  No. of baseline pars  : 2 
#> 
#> ----- Parameter Estimates --------------------------------
#>       Estimate  StdErr  t_stat   p_value  CI_lower  CI_upper Signif
#> a     0.571170 0.09355 6.10562 2.114e-08 4.143e-01 7.874e-01    ***
#> b     0.004588 0.09047 0.05071    0.9597 7.512e-20 2.802e+14       
#> theta 0.043597 0.33317 0.13085    0.8962 1.363e-08 1.395e+05       
#> 
#> ----- Frailty Variance -----------------------------------
#>   theta (frailty var) = 0.0436   SE = 0.3332
#> 
#> ----- Model Fit ------------------------------------------
#>   Log-Likelihood : -178.4396
#>   AIC            : 362.8793
#>   BIC            : 370.6948

12.3 Type-I censoring (fixed time)

set.seed(8)
dat_t1 <- r_gamma_frailty(100, "arvind", par = 0.5, theta = 0.3,
                           cen_type = "type1", cen_time = 2.0)
table(dat_t1$status)   # 0 = censored at time 2.0
#> 
#>  0  1 
#> 16 84
fit_t1 <- fit_gamma_frailty(dat_t1$time, dat_t1$status, baseline = "arvind")
summary(fit_t1)
#> =========================================================
#>  Gamma Frailty Regression Model
#> =========================================================
#>  Baseline Distribution : Arvind 
#>  Sample size           : 100 
#>  No. of covariates     : 0 
#>  No. of baseline pars  : 1 
#> 
#> ----- Parameter Estimates --------------------------------
#>       Estimate StdErr t_stat  p_value CI_lower CI_upper Signif
#> alpha   0.5358 0.1065  5.030 2.22e-06   0.3629   0.7911    ***
#> theta   0.4131 0.2229  1.853  0.06685   0.1435   1.1894      .
#> 
#> ----- Frailty Variance -----------------------------------
#>   theta (frailty var) = 0.4131   SE = 0.2229
#> 
#> ----- Model Fit ------------------------------------------
#>   Log-Likelihood : -93.9247
#>   AIC            : 191.8494
#>   BIC            : 197.0598

12.4 Type-II censoring (fixed failure count)

set.seed(9)
dat_t2 <- r_gamma_frailty(100, "pfr", par = c(0.5, 1), theta = 0.3,
                           cen_type = "type2", r_failures = 70L)
table(dat_t2$status)   # exactly 70 status=1 events
#> 
#>  0  1 
#> 30 70
fit_t2 <- fit_gamma_frailty(dat_t2$time, dat_t2$status, baseline = "pfr")
summary(fit_t2)
#> =========================================================
#>  Gamma Frailty Regression Model
#> =========================================================
#>  Baseline Distribution : Power Failure Rate 
#>  Sample size           : 100 
#>  No. of covariates     : 0 
#>  No. of baseline pars  : 2 
#> 
#> ----- Parameter Estimates --------------------------------
#>       Estimate StdErr t_stat  p_value CI_lower CI_upper Signif
#> a       0.6541 0.2292 2.8535 0.005287  0.32910    1.300     **
#> k       1.2254 0.4607 2.6596 0.009153  0.58645    2.561     **
#> theta   0.6738 0.7601 0.8864 0.377581  0.07383    6.149       
#> 
#> ----- Frailty Variance -----------------------------------
#>   theta (frailty var) = 0.6738   SE = 0.7601
#> 
#> ----- Model Fit ------------------------------------------
#>   Log-Likelihood : -112.5174
#>   AIC            : 231.0349
#>   BIC            : 238.8504

12.5 Progressive Type-I censoring (withdrawals at fixed times)

set.seed(10)
dat_pt1 <- r_gamma_frailty(120, "lindley", par = 1.5, theta = 0.4,
                            cen_type = "progressive_type1",
                            prog_times  = c(0.5, 1.0, 1.5),
                            prog_scheme = c(5L, 5L, 5L))
table(dat_pt1$status)   # 0 = withdrawn at tau_j, 1 = exact failure
#> 
#>   0   1 
#>  15 105
fit_pt1 <- fit_gamma_frailty(dat_pt1$time, dat_pt1$status,
                              baseline = "lindley")
summary(fit_pt1)
#> =========================================================
#>  Gamma Frailty Regression Model
#> =========================================================
#>  Baseline Distribution : Lindley 
#>  Sample size           : 120 
#>  No. of covariates     : 0 
#>  No. of baseline pars  : 1 
#> 
#> ----- Parameter Estimates --------------------------------
#>        Estimate StdErr t_stat p_value CI_lower CI_upper Signif
#> lambda   1.5467 0.1589  9.735  <2e-16  1.26467   1.8917    ***
#> theta    0.1154 0.1109  1.040  0.3003  0.01753   0.7592       
#> 
#> ----- Frailty Variance -----------------------------------
#>   theta (frailty var) = 0.1154   SE = 0.1109
#> 
#> ----- Model Fit ------------------------------------------
#>   Log-Likelihood : -104.5526
#>   AIC            : 213.1053
#>   BIC            : 218.6803

13 Simulation Study

set.seed(99)
sim_res <- simulation_study(
  n_sim    = 200,
  n_vec    = c(50, 100, 200),
  baseline = "arvind",
  par      = 0.5,
  beta     = 0.4,
  theta    = 0.3,
  cen_type = "right",
  cen_rate = 0.2,
  verbose  = TRUE
)

14 References

  1. Lindley, D. V. (1958). Fiducial distributions and Bayes’ theorem. Journal of the Royal Statistical Society. Series B (Methodological), 102-107.

  2. Mugdadi, A. R. (2005). The least squares type estimation of the parameters in the power hazard function. Applied mathematics and computation, 169(2), 737-748.

  3. Bain, L. J. (1974). Analysis for the linear failure-rate life-testing distribution. Technometrics, 16(4), 551-559.

  4. Singh, B., Tyagi, S., Singh, R. P., & Tyagi, A. (2025). Modified Topp-Leone distribution: properties, classical and Bayesian estimation with application to COVID-19 and reliability data. Thailand Statistician, 23(1), 72-96.

  5. Pandey, A., Singh, R. P., Tyagi, S., & Tyagi, A. (2024). Modelling climate, COVID-19, and reliability data: A new continuous lifetime model under different methods of estimation. Stat. Appl., 22(2).

  6. Tyagi, S., Kumar, S., Pandey, A., Saha, M., & Bagariya, H. Power xgamma distribution: Properties and its applications to cancer data. Int J Stat Reliab Eng. 2022; 9(1): 51-60.