Help for package ssmodels

Title:

Sample Selection Models

Version:

2.0.2

Language:

en-US

Author:

Fernando de Souza Bastos [aut, cre], Wagner Barreto de Souza [aut]

Maintainer:

Fernando de Souza Bastos <fernando.bastos@ufv.br>

Depends:

R (≥ 3.6.0)

Imports:

MASS, sn (≥ 2.1.0), miscTools (≥ 0.6-26), Rdpack (≥ 2.4)

Suggests:

knitr (≥ 1.24), testthat (≥ 3.0.0), numDeriv (≥ 2016.8-1.1), maxLik (≥ 1.3-6), mvtnorm (≥ 1.0-11), sampleSelection (≥ 1.2-6), kableExtra (≥ 1.1.0), kfigr (≥ 1.2), ggplot2 (≥ 3.2.1), gridExtra (≥ 2.3)

Description:

To facilitate fitting the sample selection models available in the literature, we created the 'ssmodels' package. It fits the classic Heckman model (Heckman (1976), Heckman (1979) <doi:10.2307/1912352>) and estimates its parameters by either maximum likelihood or the two-step method. It also fits the Heckman-t model introduced by Marchenko and Genton (2012) <doi:10.1080/01621459.2012.656011> and the Heckman-Skew model introduced by Ogundimu and Hutton (2016) <doi:10.1111/sjos.12171>. We also implemented a function to fit the generalized Heckman model introduced by Bastos, Barreto-Souza, and Genton (2021) <doi:10.5705/ss.202021.0068>, which allows covariates in the dispersion and correlation parameters. A further function fits the Heckman-BS model introduced by Bastos and Barreto-Souza (2020) <doi:10.1080/02664763.2020.1780570>, which uses the Birnbaum-Saunders distribution as the joint distribution of the selection and primary regression variables. This package extends and complements existing R packages such as 'sampleSelection' (Toomet and Henningsen, 2008) and 'ssmrob' (Zhelonkin et al., 2016), providing additional robust and flexible sample selection models.

License:

GPL-2 | GPL-3 [expanded from: GPL (≥ 2)]

Encoding:

UTF-8

LazyData:

true

RdMacros:

Rdpack

BugReports:

https://github.com/fsbmat-ufv/ssmodels/issues

Config/testthat/edition:

URL:

https://fsbmat-ufv.github.io/ssmodels/

NeedsCompilation:

Config/roxygen2/version:

8.0.0

Packaged:

2026-05-30 04:06:43 UTC; Fernando

Repository:

CRAN

Date/Publication:

2026-05-30 04:20:02 UTC

A package that provides functions to fit data affected by sample selection bias.

Description

Provides models for fitting data affected by sample selection bias. It includes:

HeckmanCL(selectEq, outcomeEq, data = data, start): Fit function for the classic Heckman model. In practice, sample selection usually arises because the outcome of interest is only partially observed. In the presence of sample selection, the observed data are not a random sample of the population, even after controlling for explanatory variables; that is, the data are not missing completely at random. Thus, standard analyses that use only complete cases lead to biased results. Heckman introduced a sample selection model for such data and proposed a full maximum likelihood estimation method under the assumption of normality. This model is known as the Heckman model or the Tobit type 2 model.
HeckmantS(selectEq, outcomeEq, data = data, df, start): Fit function for the Heckman-t model. The Heckman-t model keeps the parametric structure of the classic Heckman model. However, it assumes a bivariate Student's t distribution as the joint distribution of the selection and primary regression variables, and it estimates the parameters by maximum likelihood.
HeckmanSK(selectEq, outcomeEq, data = data, lambda, start): Fit function for the Heckman-SK model. The Heckman-SK model keeps the parametric structure of the classic Heckman model. However, it assumes a bivariate skew-normal distribution as the joint distribution of the selection and primary regression variables, and it estimates the parameters by maximum likelihood.
HeckmanBS(selectEq, outcomeBS, data = data, start): Fit function for the Heckman-BS model. The Heckman-BS model keeps the parametric structure of the classic Heckman model. However, it assumes a bivariate Birnbaum-Saunders distribution as the joint distribution of the selection and primary regression variables, and it estimates the parameters by maximum likelihood.
HeckmanGe(selectEq, outcomeEq, outcomeS, outcomeC, data = data): Fit function for the generalized Heckman model. This model generalizes the classic Heckman model by including covariates in the dispersion and correlation structures. This lets users identify the variables responsible for selection bias and heteroscedasticity.
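The likelihood behind HeckmanCL can be made concrete. Below is a minimal base-R sketch of the classic Heckman (Tobit type 2) log-likelihood under bivariate normality. Unselected cases contribute Phi(-w'gamma). Selected cases contribute the normal outcome density times the conditional probability of being selected. The function heckman_loglik and its argument layout are hypothetical illustrations, not part of the ssmodels API.

```r
# Hypothetical helper (not part of ssmodels): classic Heckman log-likelihood.
# Xs, Xo: design matrices of the selection and outcome equations (n rows each)
# s: 0/1 selection indicator; y: outcome (only used where s == 1)
heckman_loglik <- function(gamma, beta, sigma, rho, Xs, Xo, s, y) {
  sel <- s == 1
  eta_s <- drop(Xs %*% gamma)
  # Unselected cases: P(latent selection variable <= 0) = Phi(-eta_s)
  ll0 <- pnorm(-eta_s[!sel], log.p = TRUE)
  # Selected cases: normal outcome density times conditional selection probability
  eta_o <- drop(Xo[sel, , drop = FALSE] %*% beta)
  z <- (y[sel] - eta_o) / sigma
  ll1 <- dnorm(z, log = TRUE) - log(sigma) +
    pnorm((eta_s[sel] + rho * z) / sqrt(1 - rho^2), log.p = TRUE)
  sum(ll0) + sum(ll1)
}

# Usage with simulated design matrices (values chosen only for illustration)
set.seed(123)
n <- 200
Xs <- cbind(1, rnorm(n), rnorm(n))
Xo <- cbind(1, Xs[, 2])
s <- rbinom(n, 1, 0.6)
y <- ifelse(s == 1, rnorm(n, 1, 2), NA)
heckman_loglik(c(0.2, 1, 1), c(1, 0.5), sigma = 2, rho = 0.3, Xs, Xo, s, y)
```

Setting rho = 0 splits the log-likelihood into a probit part and a normal regression part, which is a convenient sanity check. A nonzero rho couples the two equations, and this coupling is what corrects the selection bias.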
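In the Heckman-t model, the probit selection probability is replaced by a heavier-tailed one. As a simplification for illustration, assume that the selection error has a standard univariate Student's t marginal with df degrees of freedom. Then the probability of being selected is the t CDF of the selection index w'gamma. The helper selection_prob is hypothetical. It relies only on base R's pt(), which accepts df = Inf and then agrees with pnorm().

```r
# Hypothetical helper: P(selected | eta), where eta = w'gamma is the selection index.
# df = Inf gives the probit (classic Heckman); finite df gives the assumed Heckman-t case.
selection_prob <- function(eta, df = Inf) {
  pt(eta, df = df)
}

eta <- c(-2, -1, 0, 1, 2)
rbind(normal = selection_prob(eta),
      t4     = selection_prob(eta, df = 4))
```

With df = 4, very negative indices still give noticeably larger selection probabilities than the probit does, which reflects the heavier tails.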
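HeckmanGe lets sigma and rho vary across observations through the covariates in outcomeS and outcomeC. A common way to keep both parameters in their admissible ranges is to use a log link for sigma and a hyperbolic-tangent link for rho. This is assumed here for illustration and is not taken from the package source. The function ge_params and the coefficient names delta and kappa are hypothetical.

```r
# Hypothetical illustration of observation-specific dispersion and correlation:
# sigma_i = exp(xS_i' delta) > 0 and rho_i = tanh(xC_i' kappa) in (-1, 1)
ge_params <- function(XS, XC, delta, kappa) {
  list(sigma = exp(drop(XS %*% delta)),
       rho   = tanh(drop(XC %*% kappa)))
}

set.seed(42)
n <- 5
XS <- cbind(1, rnorm(n))  # covariates for the dispersion (playing the role of outcomeS)
XC <- cbind(1, rnorm(n))  # covariates for the correlation (playing the role of outcomeC)
p <- ge_params(XS, XC, delta = c(0.1, 0.3), kappa = c(0.2, -0.5))
p
```

Setting every slope to zero gives constant sigma and rho. This recovers the homoscedastic, constant-correlation structure of the classic model.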

Arguments

selection

Selection equation.

outcome

Primary regression equation for the observed response.

outcomeS

Matrix of covariates for modeling the dispersion parameter (sigma).

outcomeC

Matrix of covariates for modeling the correlation parameter (rho).

df

Initial value for the degrees of freedom of the Heckman-t model.

lambda

Initial value for the asymmetry (skewness) parameter of the Heckman-SK model.

start

Initial values for the model parameters.

data

A data frame containing the variables.

Value

A list containing the estimated parameters, Hessian matrix, number of observations, and additional diagnostic information. If initial values are not provided, they are automatically estimated using the Heckman two-step method.

Author(s)

Fernando de Souza Bastos, Wagner Barreto de Souza

Two-Step Method for Parameter Estimation of the Heckman Model

Description

Computes starting values for the classic Heckman model using Heckman's two-step method.

Usage

HCinitial(selection, outcome, data = sys.frame(sys.parent()))

Arguments

selection

A formula for the selection equation.

outcome

A formula for the outcome equation.

data

A data frame containing the variables.

Value

A numeric vector containing selection coefficients, outcome coefficients, sigma and rho.
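A few lines of base R can reproduce the kind of output HCinitial returns (selection coefficients, outcome coefficients, sigma and rho). The sketch below implements the textbook two-step procedure. First, it fits a probit model to the selection equation. Second, it runs least squares on the selected cases, augmented with the inverse Mills ratio. Sigma and rho are then recovered from the residuals and from the coefficient of that ratio. The helper heckman_two_step is hypothetical, not HCinitial's source, and it assumes a numeric 0/1 selection response.

```r
# Hypothetical sketch of Heckman's two-step estimator (not the HCinitial source).
heckman_two_step <- function(selection, outcome, data) {
  # Step 1: probit fit of the selection equation
  probit <- glm(selection, family = binomial(link = "probit"), data = data)
  gamma <- coef(probit)
  eta <- drop(model.matrix(selection, data) %*% gamma)
  s <- model.response(model.frame(selection, data))
  imr <- dnorm(eta) / pnorm(eta)  # inverse Mills ratio
  # Step 2: OLS on the selected cases, augmented with the inverse Mills ratio
  sel <- data[s == 1, , drop = FALSE]
  sel$imr <- imr[s == 1]
  ols <- lm(update(outcome, . ~ . + imr), data = sel)
  b <- coef(ols)
  beta_lambda <- b[["imr"]]  # estimates rho * sigma
  delta <- sel$imr * (sel$imr + eta[s == 1])
  sigma <- sqrt(mean(residuals(ols)^2) + beta_lambda^2 * mean(delta))
  c(sel = gamma, out = b[names(b) != "imr"],
    sigma = sigma, rho = beta_lambda / sigma)
}

# Simulated example: true outcome slope 1, sigma = 1, rho = 0.5
set.seed(2024)
n <- 5000
x1 <- rnorm(n); z <- rnorm(n); u <- rnorm(n)
e <- 0.5 * u + sqrt(1 - 0.5^2) * rnorm(n)  # corr(u, e) = 0.5, sd(e) = 1
s <- as.numeric(0.5 + x1 + z + u > 0)
y <- ifelse(s == 1, 1 + x1 + e, NA)
d <- data.frame(s, y, x1, z)
heckman_two_step(s ~ x1 + z, y ~ x1, data = d)
```

The coefficient on the inverse Mills ratio estimates rho * sigma, which is why rho is obtained by dividing it by sigma. Unlike a full maximum likelihood fit, nothing here forces the estimated rho to lie in (-1, 1).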

Heckman-BS Model Fit Function

Description

Estimates the parameters of the Heckman-BS model by maximum likelihood.

Usage

HeckmanBS(selection, outcome, data = sys.frame(sys.parent()), start = NULL)

Arguments

selection

Selection equation.

outcome

Primary regression equation.

data

A data frame containing the variables.

start

Initial values for the model parameters.

Details

The HeckmanBS() function fits the sample selection model based on the bivariate Birnbaum-Saunders distribution. It has the same number of parameters as the classic Heckman model. For more information, see Bastos and Barreto-Souza (2020) doi:10.1080/02664763.2020.1780570.
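For reference, the univariate Birnbaum-Saunders distribution with shape alpha and scale beta has CDF F(t) = Phi((sqrt(t/beta) - sqrt(beta/t)) / alpha) for t > 0, so beta is its median. The sketch below codes this CDF and the matching density in base R. The names pbs and dbs are hypothetical, and the code covers only the univariate building block, not the bivariate likelihood used by HeckmanBS.

```r
# Hypothetical helpers: Birnbaum-Saunders CDF and density (shape alpha, scale beta), t > 0
pbs <- function(t, alpha, beta) {
  pnorm((sqrt(t / beta) - sqrt(beta / t)) / alpha)
}
dbs <- function(t, alpha, beta) {
  a <- (sqrt(t / beta) - sqrt(beta / t)) / alpha
  # derivative of a(t) with respect to t
  da <- (sqrt(beta / t) + (beta / t)^(3 / 2)) / (2 * alpha * beta)
  ifelse(t > 0, dnorm(a) * da, 0)  # density is zero outside t > 0
}

pbs(2, alpha = 0.5, beta = 2)  # 0.5: beta is the median
integrate(dbs, 0, Inf, alpha = 0.5, beta = 2)$value
```

The density is obtained by differentiating the CDF through the chain rule, so the integral of dbs over (0, Inf) should be numerically close to 1.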

Value

Returns a list with the following components.

Coefficients: Numeric vector of the estimated model parameters.

Value: Value of the objective function at the optimum (the value component returned by optim).

loglik: Maximized value of the log-likelihood function calculated from the estimated coefficients.

counts: Component returned by optim. A two-element integer vector giving the number of calls to fn and gr, respectively. This excludes the calls needed to compute the Hessian, if requested, and any calls to fn used to compute a finite-difference approximation to the gradient.

hessian: Component returned by optim, computed with hessian = TRUE. A symmetric matrix giving an estimate of the Hessian at the solution found. Note that this is the Hessian of the unconstrained problem even if the box constraints are active.

fisher_infoBS: Fisher information matrix

prop_sigmaBS: Square root of the Fisher information matrix diagonal

coefficients_link: Estimates on the optimization scale. The last component is rho_star, where rho = 2 / (1 + exp(-rho_star)) - 1.

gradient_link: Analytical score vector evaluated at coefficients_link.

level: Selection variable levels

nObs: Numeric value giving the number of observations in the data.

nParam: Numerical value representing the number of model parameters

N0: Numerical value representing the number of unobserved entries

N1: Numerical value representing the number of complete entries

NXS: Numerical value representing the number of parameters of the selection model

NXO: Numerical value representing the number of parameters of the regression model

df: Numerical value that represents the difference between the size of the response vector of the selection equation and the number of model parameters

aic: Numerical value representing Akaike's information criterion.

bic: Numerical value representing Schwarz's Bayesian criterion.

initial.value: Numerical vector that represents the input values (Initial Values) used in the parameter estimation.
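Several of the components above follow from simple formulas. The first function below maps rho_star back to rho exactly as stated for coefficients_link; this map equals tanh(rho_star / 2), and its inverse is also given. The remaining function computes aic, bic and df. It assumes the usual definitions AIC = -2 loglik + 2 nParam and BIC = -2 loglik + log(nObs) nParam, which is an assumption about how the package computes them. All function names are hypothetical.

```r
# rho_star -> rho, as documented for coefficients_link
rho_from_link <- function(rho_star) 2 / (1 + exp(-rho_star)) - 1
# rho -> rho_star (inverse map), equal to 2 * atanh(rho)
rho_to_link <- function(rho) log((1 + rho) / (1 - rho))

# Information criteria and df = nObs - nParam (assumed standard definitions)
info_criteria <- function(loglik, nParam, nObs) {
  c(aic = -2 * loglik + 2 * nParam,
    bic = -2 * loglik + log(nObs) * nParam,
    df  = nObs - nParam)
}

rho_from_link(rho_to_link(0.3))  # recovers 0.3
info_criteria(loglik = -100, nParam = 5, nObs = 100)
```

The logistic-type link keeps rho inside (-1, 1) for any real rho_star, so the optimizer can work without box constraints on the correlation.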

References

Bastos, F. S. and Barreto-Souza, W. (2020). Birnbaum-Saunders sample selection model. Journal of Applied Statistics. doi:10.1080/02664763.2020.1780570.

Examples

library(ssmodels)
data(MEPS2001)
# Selection equation (binary response dambexp)
selectEq <- dambexp ~ age + female + educ + blhisp + totchr + ins + income
# Outcome equation for the expenditure ambexp
outcomeBS <- ambexp ~ age + female + educ + blhisp + totchr + ins
fit <- HeckmanBS(selectEq, outcomeBS, data = MEPS2001)
fit

Classic Heckman Model Fit Function

Description

Estimates the parameters of the classic Heckman model by maximum likelihood. If start is not supplied, the initial values are obtained via the two-step method.
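The sketch below shows, under stated assumptions, what maximum likelihood estimation of the classic model involves. The Heckman log-likelihood is maximized with optim() on an unconstrained scale, using log(sigma) and rho = tanh(rho_star). That link for rho is a choice made for this sketch only, and the package's own parametrization may differ. For brevity, the starting values come from a probit fit and complete-case least squares rather than from the two-step method. The function heckman_ml is hypothetical.

```r
# Hypothetical sketch: ML for the classic Heckman model with optim().
# theta = (gamma, beta, log(sigma), rho_star), with rho = tanh(rho_star) in this sketch.
heckman_ml <- function(Xs, Xo, s, y, start) {
  ks <- ncol(Xs); ko <- ncol(Xo); k <- ks + ko; sel <- s == 1
  negll <- function(theta) {
    gamma <- theta[1:ks]
    beta <- theta[ks + 1:ko]
    sigma <- exp(theta[k + 1])
    rho <- tanh(theta[k + 2])
    eta_s <- drop(Xs %*% gamma)
    z <- (y[sel] - drop(Xo[sel, , drop = FALSE] %*% beta)) / sigma
    ll <- sum(pnorm(-eta_s[!sel], log.p = TRUE)) +
      sum(dnorm(z, log = TRUE) - log(sigma) +
            pnorm((eta_s[sel] + rho * z) / sqrt(1 - rho^2), log.p = TRUE))
    -ll  # optim() minimizes, so return the negative log-likelihood
  }
  fit <- optim(start, negll, method = "BFGS", hessian = TRUE,
               control = list(maxit = 500))
  list(gamma = fit$par[1:ks], beta = fit$par[ks + 1:ko],
       sigma = exp(fit$par[k + 1]), rho = tanh(fit$par[k + 2]),
       loglik = -fit$value, convergence = fit$convergence)
}

# Simulated data: true outcome slope 1, sigma = 1, rho = 0.5
set.seed(7)
n <- 3000
x1 <- rnorm(n); z <- rnorm(n); u <- rnorm(n)
e <- 0.5 * u + sqrt(0.75) * rnorm(n)
s <- as.numeric(0.5 + x1 + z + u > 0)
y <- ifelse(s == 1, 1 + x1 + e, NA)
Xs <- cbind(1, x1, z); Xo <- cbind(1, x1)
# Naive starting values: probit, complete-case OLS, log(sd), rho_star = 0
g0 <- coef(glm(s ~ x1 + z, family = binomial(link = "probit")))
b0 <- coef(lm(y ~ x1, subset = s == 1))
start <- c(g0, b0, log(sd(y[s == 1])), 0)
fit <- heckman_ml(Xs, Xo, s, y, start)
fit[c("beta", "sigma", "rho", "convergence")]
```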

Usage

HeckmanCL(selection, outcome, data = sys.frame(sys.parent()), start = NULL)

Arguments

selection

Selection equation.

outcome

Primary regression equation.

data

A data frame containing the variables.

start

Initial values for the model parameters.

Value

Returns a list with the following components.

Coefficients: Numeric vector of the estimated model parameters.

Value: Value of the objective function at the optimum (the value component returned by optim).

loglik: Maximized value of the log-likelihood function calculated from the estimated coefficients.

fisher_infoHC: Fisher information matrix

prop_sigmaHC: Square root of the Fisher information matrix diagonal
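The components fisher_infoHC and prop_sigmaHC are presumably related to the usual way of getting standard errors from an optim fit. As a generic sketch, not the package's exact code: invert the Hessian of the negative log-likelihood (the observed information) to get an estimated covariance matrix, then take the square roots of its diagonal. A normal mean with known standard deviation is used here because the answer is known: the standard error should be close to 1 / sqrt(n).

```r
set.seed(10)
x <- rnorm(400, mean = 3, sd = 1)
# Negative log-likelihood of a normal mean with known sd = 1
negll <- function(mu) -sum(dnorm(x, mean = mu, sd = 1, log = TRUE))
fit <- optim(0, negll, method = "BFGS", hessian = TRUE)
vcov_hat <- solve(fit$hessian)  # inverse of the observed information
se <- sqrt(diag(vcov_hat))
c(estimate = fit$par, se = se)  # se is close to 1 / sqrt(400) = 0.05
```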