---
title: "Getting started with the `aftPenCDA` package"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Getting started with the `aftPenCDA` package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  out.width = "100%",
  fig.width = 7,
  fig.height = 4,
  fig.align = "center",
  dpi = 150
)
```

## Overview

`aftPenCDA` is an R package for fitting penalized accelerated failure time (AFT) models using induced smoothing. The package supports variable selection for both right-censored and clustered partly interval-censored survival data.

Several penalty functions are implemented, including broken adaptive ridge (BAR), LASSO, adaptive LASSO (ALASSO), and SCAD. For variance estimation, the package provides both a closed-form estimator and a perturbation-based estimator.

Core computational routines are implemented in C++ via `Rcpp` with an `RcppArmadillo` backend, which keeps fitting fast in high-dimensional settings.

## Methodological background

Rank-based estimation of the accelerated failure time (AFT) model relies on estimating equations that are step functions of the regression coefficients. The corresponding objective function is therefore nonsmooth, which makes numerical optimization difficult.

Induced smoothing replaces the nonsmooth estimating equations with smooth approximations, so gradient-based methods can be used instead of optimizing the nonsmooth rank-based equations directly. This reduces the computational cost.
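As a sketch for the right-censored case, the textbook Gehan-type estimating function and its induced-smoothed counterpart can be written as follows. The notation is an assumption of this sketch: $Y_i$ is the observed time, $\Delta_i$ the event indicator, $x_i$ the covariate vector, $e_i(\beta) = \log Y_i - x_i^\top \beta$ the residual, and $\Phi$ the standard normal distribution function. The weights and scaling used inside the package, in particular for clustered partly interval-censored data, may differ.

$$
U_n(\beta) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_i \, (x_i - x_j) \, I\{ e_i(\beta) \le e_j(\beta) \}
$$

$$
\tilde U_n(\beta) = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \Delta_i \, (x_i - x_j) \, \Phi\!\left( \frac{e_j(\beta) - e_i(\beta)}{r_{ij}} \right),
\qquad
r_{ij}^2 = \frac{1}{n} (x_i - x_j)^\top (x_i - x_j)
$$

Because $\tilde U_n(\beta)$ is differentiable in $\beta$, Newton-type and other gradient-based updates become available.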

The smoothed estimating function leads to a quadratic approximation of the objective function. Applying a Cholesky decomposition to the quadratic form recasts the problem as a least-squares-type problem, which permits efficient coordinate descent updates for penalized estimation, even when the number of covariates is large relative to the sample size.
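The Cholesky step can be illustrated with a minimal base R sketch. It is not the package's C++ implementation: `soft_threshold()` and `cd_lasso_quadratic()` are hypothetical helper names, the LASSO penalty is chosen only because its coordinate update is the simplest, and `A` and `b0` stand in for the curvature matrix and the unpenalized estimate of a generic quadratic approximation.

```{r, eval=FALSE}
# Soft-thresholding operator used in the LASSO coordinate update.
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Minimise 0.5 * t(b - b0) %*% A %*% (b - b0) + lambda * sum(abs(b))
# for a symmetric positive definite matrix A (illustrative helper).
cd_lasso_quadratic <- function(A, b0, lambda, max_iter = 1000, tol = 1e-10) {
  X <- chol(A)            # upper triangular, t(X) %*% X equals A
  y <- drop(X %*% b0)     # pseudo-response, so the quadratic is 0.5 * ||y - X b||^2
  col_ss <- colSums(X^2)  # squared column norms, equal to diag(A)
  b <- b0
  for (iter in seq_len(max_iter)) {
    b_old <- b
    for (j in seq_along(b)) {
      # Partial residual that leaves out coordinate j.
      r_j <- y - drop(X[, -j, drop = FALSE] %*% b[-j])
      z_j <- sum(X[, j] * r_j)
      b[j] <- soft_threshold(z_j, lambda) / col_ss[j]
    }
    if (max(abs(b - b_old)) < tol) break
  }
  b
}

# Toy inputs (made up for illustration).
A  <- matrix(c(2, 0.5, 0.5, 1), 2, 2)
b0 <- c(1.5, 0.1)
cd_lasso_quadratic(A, b0, lambda = 0.3)
```

With `lambda = 0` the function returns `b0` unchanged, which is a quick check that the least-squares reformulation reproduces the original quadratic problem.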

## Installation

You can install the development version of `aftPenCDA` from GitHub:

```{r, eval=FALSE}
# install.packages("devtools")  # run once if devtools is not installed
devtools::install_github("seonsy/aftPenCDA")
```

## Main functions

The main functions in `aftPenCDA` are:

- `aftpen()`: penalized AFT model for right-censored data
- `aftpen_pic()`: penalized AFT model for clustered partly interval-censored data

Both functions support the following penalty types:

- `"BAR"`: Broken Adaptive Ridge
- `"LASSO"`: LASSO penalty
- `"ALASSO"`: Adaptive LASSO penalty
- `"SCAD"`: Smoothly Clipped Absolute Deviation penalty
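For reference, the textbook forms of these penalties are sketched below. The notation is an assumption of this sketch and not necessarily the package's internal parameterization: $\lambda > 0$ is the tuning parameter, $\tilde\beta_j$ is an initial estimate, $\hat\beta_j^{(k-1)}$ is the estimate from the previous BAR iteration, and $a > 2$ is the SCAD shape constant (Fan and Li suggest $a = 3.7$).

$$
\begin{aligned}
\text{LASSO:} \quad & \lambda \sum_{j} |\beta_j| \\
\text{ALASSO:} \quad & \lambda \sum_{j} \frac{|\beta_j|}{|\tilde\beta_j|} \\
\text{BAR (iteration } k\text{):} \quad & \lambda \sum_{j} \frac{\beta_j^2}{\big(\hat\beta_j^{(k-1)}\big)^2} \\
\text{SCAD:} \quad & \sum_{j} p_\lambda(|\beta_j|), \qquad
p_\lambda(t) =
\begin{cases}
\lambda t, & t \le \lambda, \\
\dfrac{2 a \lambda t - t^2 - \lambda^2}{2 (a - 1)}, & \lambda < t \le a \lambda, \\
\dfrac{\lambda^2 (a + 1)}{2}, & t > a \lambda
\end{cases}
\end{aligned}
$$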

```{r, eval=FALSE}
library(aftPenCDA)
```

## Example 1: Right-censored data

We use the right-censored example dataset `simdat_rc`, which is included in the package.

```{r, eval=FALSE}
data("simdat_rc")
```

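We fit the model with the BAR penalty and extract the estimated coefficients.

```{r, eval=FALSE}
# BAR penalty with tuning parameter lambda and closed-form ("CF") standard errors
fit_bar <- aftpen(simdat_rc, lambda = 0.3, se = "CF", type = "BAR")
fit_bar$beta  # estimated regression coefficients
```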

The other penalties are selected through the same `type` argument.

```{r, eval=FALSE}
fit_lasso  <- aftpen(simdat_rc, lambda = 0.1, se = "CF", type = "LASSO")
fit_alasso <- aftpen(simdat_rc, lambda = 0.1, se = "CF", type = "ALASSO")
fit_scad   <- aftpen(simdat_rc, lambda = 0.1, se = "CF", type = "SCAD")
```
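To compare which covariates each penalty retains, the coefficient vectors can be tabulated side by side. The sketch below uses a hypothetical helper, `selection_table()`, and made-up coefficient vectors so that it runs without the package. It assumes that the `beta` component of each fit is a numeric vector of the same length.

```{r, eval=FALSE}
# Hypothetical helper: flag coefficients whose absolute value exceeds tol.
selection_table <- function(beta_list, tol = 1e-8) {
  sapply(beta_list, function(b) abs(b) > tol)
}

# Made-up coefficient vectors for illustration (not package output).
betas <- list(
  BAR   = c(1.2, 0, 0, -0.8),
  LASSO = c(0.9, 0.1, 0, -0.6)
)

# Logical matrix: one row per covariate, one column per penalty.
selection_table(betas)

# With fitted models, under the assumption stated above:
# selection_table(list(BAR = fit_bar$beta, LASSO = fit_lasso$beta))
```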

## Example 2: Clustered partly interval-censored data

We use the clustered partly interval-censored example dataset `simdat_pic`, which is included in the package, and fit the model with `aftpen_pic()`.

```{r, eval=FALSE}
data("simdat_pic")
```

We fit the model with the BAR penalty and extract the estimated coefficients.

```{r, eval=FALSE}
fit_pic <- aftpen_pic(simdat_pic, lambda = 0.0005, se = "CF", type = "BAR")
fit_pic$beta
```

The other penalties are available for partly interval-censored data through the same `type` argument.

```{r, eval=FALSE}
fit_pic_lasso  <- aftpen_pic(simdat_pic, lambda = 0.001, se = "CF", type = "LASSO")
fit_pic_alasso <- aftpen_pic(simdat_pic, lambda = 0.001, se = "CF", type = "ALASSO")
fit_pic_scad   <- aftpen_pic(simdat_pic, lambda = 0.001, se = "CF", type = "SCAD")
```

## Variance estimation

The argument `se` specifies the variance estimation method.

- `"CF"`: closed-form estimator
- `"ZL"`: perturbation-based estimator

For example:

```{r, eval=FALSE}
fit_zl <- aftpen(simdat_rc, lambda = 0.1, se = "ZL", type = "BAR")
```
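Standard errors are typically used to form Wald-type confidence intervals. The sketch below shows the arithmetic with a hypothetical helper, `wald_ci()`, and made-up numbers. This vignette does not show which component of the fitted object holds the standard errors, so inspect the object (for example with `str(fit_zl)`) before passing real values. Intervals for coefficients that were shrunk exactly to zero should be read with caution.

```{r, eval=FALSE}
# Hypothetical helper: normal-approximation confidence intervals.
wald_ci <- function(est, se, level = 0.95) {
  z <- qnorm(1 - (1 - level) / 2)  # two-sided normal quantile
  cbind(estimate = est, lower = est - z * se, upper = est + z * se)
}

# Made-up estimates and standard errors for illustration.
est <- c(1.2, -0.8)
se  <- c(0.20, 0.25)
wald_ci(est, se)
```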

## References

Wang, You-Gan, and Yudong Zhao (2008). “Weighted Rank Regression for Clustered Data Analysis.” *Biometrics* **64**(1), 39--45.

Dai, L., K. Chen, Z. Sun, Z. Liu, and G. Li (2018). “Broken Adaptive Ridge Regression and Its Asymptotic Properties.” *Journal of Multivariate Analysis* **168**, 334--351.

Zeng, Donglin, and D. Y. Lin (2008). “Efficient Resampling Methods for Nonsmooth Estimating Functions.” *Biostatistics* **9**(2), 355--363.

Tibshirani, Robert (1996). “Regression Shrinkage and Selection via the Lasso.” *Journal of the Royal Statistical Society: Series B* **58**(1), 267--288.

Fan, Jianqing, and Runze Li (2001). “Variable Selection via Nonconcave Penalized Likelihood and Its Oracle Properties.” *Journal of the American Statistical Association* **96**(456), 1348--1360.

Zou, Hui (2006). “The Adaptive Lasso and Its Oracle Properties.” *Journal of the American Statistical Association* **101**(476), 1418--1429.
