library(metafrontier)

This vignette provides a detailed exposition of the metafrontier methods implemented in the package, linking the econometric theory to the computational approach used at each step.
Consider \(J\) groups of firms, where group \(j\) contains \(n_j\) firms. For group \(j\), the stochastic frontier model is:
\[\ln y_{ij} = x_{ij}'\beta_j + v_{ij} - u_{ij}, \quad i = 1, \ldots, n_j\]
where \(y_{ij}\) is the output of firm \(i\) in group \(j\), \(x_{ij}\) is a vector of (logged) inputs including a constant, \(\beta_j\) is the group-specific parameter vector, \(v_{ij} \sim N(0, \sigma_{v,j}^2)\) is symmetric noise, and \(u_{ij} \ge 0\) is one-sided inefficiency.
The group-specific technical efficiency is:
\[TE_{ij} = \exp(-u_{ij}) \in (0, 1]\]
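It is estimated by replacing \(u_{ij}\) with its conditional mean given the composed residual \(\varepsilon_{ij} = v_{ij} - u_{ij}\), that is \(E[u_{ij} \mid \varepsilon_{ij}]\), following Jondrow et al. (1982).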
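To make the estimator concrete, the sketch below computes \(E[u \mid \varepsilon]\) for \(\varepsilon = v - u\) under an assumed normal/half-normal model with known \(\sigma_u\) and \(\sigma_v\). The half-normal assumption, the helper `jlms_u()` and the simulated values are illustrative only and are not taken from the package; in an actual fit, \(\varepsilon\) and the variance parameters come from the Stage 1 estimates.

```r
# Illustrative JLMS estimator under an ASSUMED normal/half-normal model:
#   v ~ N(0, sigma_v^2), u ~ |N(0, sigma_u^2)|, eps = v - u (production frontier)
jlms_u <- function(eps, sigma_u, sigma_v) {
  sigma2     <- sigma_u^2 + sigma_v^2
  mu_star    <- -eps * sigma_u^2 / sigma2             # location of u given eps
  sigma_star <- sqrt(sigma_u^2 * sigma_v^2 / sigma2)  # scale of u given eps
  z <- mu_star / sigma_star
  # Mean of N(mu_star, sigma_star^2) truncated to [0, Inf)
  mu_star + sigma_star * dnorm(z) / pnorm(z)
}

# Simulated composed errors with known inefficiency (hypothetical values)
set.seed(1)
n <- 1000
sigma_u <- 0.3
sigma_v <- 0.15
u <- abs(rnorm(n, 0, sigma_u))
v <- rnorm(n, 0, sigma_v)
eps <- v - u

u_hat  <- jlms_u(eps, sigma_u, sigma_v)
te_hat <- exp(-u_hat)  # TE estimate exp(-E[u | eps]), lies in (0, 1)
summary(te_hat)
cor(u_hat, u)          # predictions should track the true inefficiency
```

Because \(E[u \mid \varepsilon]\) is decreasing in \(\varepsilon\), firms with larger residuals receive higher efficiency scores, but the predictions are shrunk toward the mean and never reach exactly zero.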
The metafrontier is defined as a function \(f^*(x) = \exp(x'\beta^*)\) such that:
\[x'\beta^* \ge x'\beta_j \quad \text{for all } x \text{ and all } j\]
That is, the metafrontier weakly dominates all group frontiers. It represents the production technology available to firms with unrestricted access to all technologies.
For each firm, efficiency relative to the metafrontier decomposes as:
\[TE^*_{ij} = TE_{ij} \times TGR_{ij}\]
where the technology gap ratio is:
\[TGR_{ij} = \frac{\exp(x_{ij}'\beta_j)}{\exp(x_{ij}'\beta^*)} = \exp\left(x_{ij}'(\beta_j - \beta^*)\right) \in (0, 1]\]
A \(TGR\) of 1 means the group frontier coincides with the metafrontier at that input mix; values below 1 indicate a technology gap.
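A quick numerical check of the decomposition: the sketch below evaluates the TGR formula for a hypothetical G2 firm, using the deterministic group and metafrontier coefficients reported later in this vignette. The input levels (3 and 2) and the group efficiency of 0.75 are made-up values chosen for illustration.

```r
# Coefficients as reported for the deterministic fit (intercept, log_x1, log_x2)
beta_star <- c(1.0416500, 0.4950084, 0.1926936)  # metafrontier
beta_g2   <- c(0.5814439, 0.5287600, 0.2015517)  # group G2 frontier

# Hypothetical G2 firm: constant plus logged inputs (assumed values)
x  <- c(1, log(3), log(2))
te <- 0.75  # hypothetical group-specific efficiency exp(-u)

# TGR = exp(x'(beta_j - beta*)), the exponential form of the formula above
tgr <- exp(sum(x * (beta_g2 - beta_star)))
# Equivalent ratio form: group frontier output over metafrontier output
tgr_ratio <- exp(sum(x * beta_g2)) / exp(sum(x * beta_star))

# Metafrontier efficiency via the decomposition TE* = TE x TGR
te_star <- te * tgr
c(TGR = tgr, TE = te, TE_star = te_star)
```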
After obtaining group estimates \(\hat\beta_j\) in Stage 1, the metafrontier parameters \(\hat\beta^*\) are estimated by solving:
\[\min_{\beta^*} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left(x_{ij}'\beta^* - x_{ij}'\hat\beta_j\right)^2\] \[\text{subject to: } x_{ij}'\beta^* \ge x_{ij}'\hat\beta_j \quad \forall\, i, j\]
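This is a convex quadratic program. Note that the envelopment condition is imposed only at the observed input vectors \(x_{ij}\), not for all \(x\) as in the definition above. The metafrontier package solves the program using constrOptim() from the stats package that ships with R, which implements an adaptive barrier algorithm for linearly constrained optimisation.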
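A minimal sketch of this program with constrOptim(), using made-up Stage 1 coefficients for two groups. It is not the package's internal code; the objects `X`, `fhat`, `beta_hat` and `qp` are hypothetical. constrOptim() expects constraints in the form `ui %*% theta - ci >= 0` and a starting value strictly inside the feasible region. Because \(x_{ij}\) contains a constant, raising one group's intercept above the largest constraint violation supplies such a start.

```r
# Toy Stage 2 QP solved with constrOptim() (illustrative data, not package code)
set.seed(42)
n <- 50
X <- cbind(1, runif(2 * n, 0, 2), runif(2 * n, 0, 2))  # constant + two logged inputs
group <- rep(c("G1", "G2"), each = n)

# Hypothetical Stage 1 coefficient estimates for each group
beta_hat <- list(G1 = c(1.0, 0.50, 0.20), G2 = c(0.6, 0.55, 0.20))

# Fitted group frontier x_ij' beta_hat_j for each firm, using its own group's estimates
fhat <- numeric(nrow(X))
for (g in names(beta_hat)) {
  idx <- group == g
  fhat[idx] <- as.vector(X[idx, ] %*% beta_hat[[g]])
}

# Objective: sum of squared gaps between metafrontier and group frontiers
obj  <- function(b) sum((X %*% b - fhat)^2)
grad <- function(b) as.vector(2 * crossprod(X, X %*% b - fhat))

# Strictly feasible start: raise G1's intercept above the largest violation
start <- beta_hat$G1
start[1] <- start[1] + max(fhat - X %*% start) + 0.1

# Constraints as ui %*% b - ci >= 0, i.e. X b >= fhat at every observed firm
qp <- constrOptim(theta = start, f = obj, grad = grad,
                  ui = X, ci = fhat, method = "BFGS")
beta_star <- qp$par
round(beta_star, 3)
```

With these toy values, G1 lies above G2 at every simulated input mix, so the solution should lie close to the G1 coefficients, mirroring the deterministic example below.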
sim <- simulate_metafrontier(
n_groups = 2, n_per_group = 300,
tech_gap = c(0, 0.4),
sigma_u = c(0.2, 0.35),
seed = 123
)
fit_det <- metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data,
group = "group",
meta_type = "deterministic"
)
# Metafrontier coefficients (no standard errors)
coef(fit_det, which = "meta")
#> (Intercept) log_x1 log_x2
#> 1.0416500 0.4950084 0.1926936
# Group coefficients for comparison
coef(fit_det, which = "group")
#> $G1
#> (Intercept) log_x1 log_x2
#> 1.0416500 0.4950084 0.1926936
#>
#> $G2
#> (Intercept) log_x1 log_x2
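#> 0.5814439 0.5287600 0.2015517

Here the deterministic metafrontier coincides with the G1 frontier, so its intercept is at least as large as both group intercepts. Because the envelopment constraints are imposed only at observed input vectors, this ordering of intercepts is not guaranteed in every application: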
meta_b0 <- coef(fit_det, which = "meta")[1]
group_b0 <- sapply(coef(fit_det, which = "group"), `[`, 1)
meta_b0 >= group_b0
#> G1.(Intercept) G2.(Intercept)
#> TRUE TRUE

Huang, Huang, and Liu (2014) propose treating the technology gap as a stochastic variable. In Stage 2, the fitted group frontier values become the dependent variable in a second stochastic frontier analysis (SFA):
\[\ln \hat{f}(x_{ij}; \hat\beta_j) = x_{ij}'\beta^* + v^*_{ij} - u^*_{ij}\]
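where \(u^*_{ij} \ge 0\) captures the technology gap and \(v^*_{ij}\) is a noise term. The model is estimated by maximum likelihood, which, unlike the deterministic metafrontier, yields standard errors and a log-likelihood for the metafrontier parameters.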
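Stage 2 is an ordinary stochastic frontier regression fitted by maximum likelihood. The sketch below shows what such an estimator maximises: the standard normal/half-normal log-likelihood, fitted with optim() on simulated data. The half-normal assumption, the helper `negll_hn()` and the simulated design are illustrative assumptions (this vignette does not state which inefficiency distribution the package uses). In the metafrontier setting the response would be the Stage 1 fitted values \(\ln \hat{f}(x_{ij}; \hat\beta_j)\).

```r
# Negative log-likelihood of a normal/half-normal production frontier (ASSUMED model):
#   y = X b + v - u,  v ~ N(0, sv^2),  u ~ |N(0, su^2)|
negll_hn <- function(par, y, X) {
  k  <- ncol(X)
  b  <- par[1:k]
  sv <- exp(par[k + 1])  # log-parameterised so that sv > 0
  su <- exp(par[k + 2])  # log-parameterised so that su > 0
  eps    <- as.vector(y - X %*% b)
  sigma  <- sqrt(su^2 + sv^2)
  lambda <- su / sv
  ll <- log(2) - log(sigma) + dnorm(eps / sigma, log = TRUE) +
    pnorm(-eps * lambda / sigma, log.p = TRUE)
  -sum(ll)
}

# Simulated frontier data with known parameters (illustrative)
set.seed(2)
n_obs  <- 500
X_sim  <- cbind(1, runif(n_obs, 0, 2), runif(n_obs, 0, 2))
b_true <- c(1.0, 0.5, 0.2)
y_sim  <- as.vector(X_sim %*% b_true) + rnorm(n_obs, 0, 0.15) - abs(rnorm(n_obs, 0, 0.3))

# OLS coefficients as starting values; both log standard deviations start at log(0.2)
start_ml <- c(lm.fit(X_sim, y_sim)$coefficients, log(0.2), log(0.2))
fit_ml <- optim(start_ml, negll_hn, y = y_sim, X = X_sim, method = "BFGS",
                control = list(maxit = 500), hessian = TRUE)

est <- unname(c(fit_ml$par[1:3], exp(fit_ml$par[4:5])))  # b0, b1, b2, sigma_v, sigma_u
se_opt <- sqrt(diag(solve(fit_ml$hessian)))  # Hessian-based SEs on the optimisation scale
```

Standard errors obtained by inverting `fit_ml$hessian` treat the response as observed data; when the response is itself a Stage 1 estimate, that is exactly the limitation described next.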
The stochastic metafrontier is a two-stage estimator. In Stage 2, the dependent variable \(\ln \hat{f}(x_{ij}; \hat\beta_j)\) is itself an estimate from Stage 1: it is a generated variable, which raises the standard two-step inference problem analysed by Murphy and Topel (1985). The standard errors reported by the package are derived from the Stage 2 Hessian alone and do not account for the sampling uncertainty in the Stage 1 group frontier estimates.

As a result:

- Standard errors may be understated, so confidence intervals (from confint()) can be narrower than their nominal coverage warrants and hypothesis tests can reject too often.
- A Murphy-Topel correction is available via vcov(fit, correction = "murphy-topel") and confint(fit, correction = "murphy-topel"). This adjusts the Stage 2 variance-covariance matrix to account for Stage 1 estimation uncertainty.

fit_sto <- metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data,
group = "group",
meta_type = "stochastic"
)
summary(fit_sto)
#>
#> Metafrontier Model Summary
#> ==========================
#>
#> Call:
#> metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data,
#> group = "group", meta_type = "stochastic")
#>
#> Method: sfa
#> Metafrontier: stochastic
#>
#> --- Group: G1 (n = 300) ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 1.041650 0.054009 19.29 <2e-16 ***
#> log_x1 0.495008 0.009173 53.97 <2e-16 ***
#> log_x2 0.192694 0.008907 21.64 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 22.685
#>
#> --- Group: G2 (n = 300) ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.58144 0.05155 11.28 <2e-16 ***
#> log_x1 0.52876 0.01019 51.88 <2e-16 ***
#> log_x2 0.20155 0.01037 19.44 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -36.696
#>
#> --- Metafrontier ---
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) 0.813734 0.208469 3.903 9.49e-05 ***
#> log_x1 0.513997 0.005155 99.716 < 2e-16 ***
#> log_x2 0.197514 0.004989 39.588 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 181.35
#>
#> --- Efficiency Decomposition ---
#> Group Mean_TE Mean_TGR Mean_TE_star
#> G1 0.8433 1.1839 0.9984
#> G2 0.7328 0.8306 0.6087
#>
#> --- Technology Gap Ratio Summary ---
#> Group N Mean SD Min Q1 Median Q3 Max
#> G1 300 1.1839 0.0320 1.1222 1.1576 1.1861 1.2087 1.2464
#> G2 300 0.8306 0.0181 0.7949 0.8148 0.8300 0.8445 0.8693

Unlike the deterministic metafrontier, the stochastic metafrontier provides a variance-covariance matrix and a log-likelihood:
# Variance-covariance matrix
vcov(fit_sto)
#> (Intercept) log_x1 log_x2
#> (Intercept) 4.345943e-02 -8.269764e-05 -6.986322e-05
#> log_x1 -8.269764e-05 2.657002e-05 1.867196e-06
#> log_x2 -6.986322e-05 1.867196e-06 2.489219e-05
# Log-likelihood of the metafrontier model
logLik(fit_sto)
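#> 'log Lik.' 181.3504 (df=3)

Under the stochastic metafrontier, TGR values are not constrained to be \(\le 1\) in finite samples, because the estimated metafrontier is not required to envelop the group frontiers. Values above 1 are consistent with the stochastic framework; in this example every G1 firm has a TGR above 1 (mean 1.18), meaning the estimated metafrontier lies below the G1 frontier at the observed input mixes.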
tgr_vals <- efficiencies(fit_sto, type = "tgr")
summary(tgr_vals)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
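#> 0.7949 0.8300 0.9958 1.0072 1.1861 1.2464

For a nonparametric metafrontier, set method = "dea". Following O’Donnell, Rao and Battese (2008), firm efficiency is measured by data envelopment analysis (DEA) relative to each group's frontier and relative to the frontier of the pooled sample, and the TGR is the ratio of the two scores, \(TGR_{ij} = TE^*_{ij} / TE_{ij}\).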
The package solves the DEA linear programs using lpSolveAPI.
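With a single input and a single output under constant returns to scale, the DEA score has a closed form: a firm's output/input ratio divided by the highest ratio on the reference frontier. That allows the group/meta decomposition to be illustrated without an LP solver. The sketch below uses made-up data and this closed form; it is not the package's lpSolveAPI code, and with several inputs (as in the examples here) a linear program is required.

```r
# Hypothetical single-input, single-output data for two groups (levels, not logs)
set.seed(3)
dat <- data.frame(
  group = rep(c("G1", "G2"), each = 20),
  x = runif(40, 1, 10)
)
# G2 is given a lower productivity ceiling to create a technology gap
dat$y <- dat$x * ifelse(dat$group == "G1", 1.0, 0.7) * runif(40, 0.5, 1)

# CRS efficiency with one input and one output: productivity relative to the best
prod     <- dat$y / dat$x
te_group <- prod / ave(prod, dat$group, FUN = max)  # against own-group frontier
te_meta  <- prod / max(prod)                        # against pooled metafrontier
tgr      <- te_meta / te_group                      # TGR = TE* / TE

tapply(tgr, dat$group, mean)
```

In this single-input CRS case the TGR equals the group's best productivity divided by the pooled best, so it is constant within each group and equals 1 for the group that defines the metafrontier. With multiple inputs or VRS, it varies with each firm's input mix.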
The rts argument controls the technology assumption:

- "crs" (constant returns to scale): the standard CCR model
- "vrs" (variable returns to scale): the BCC model
- "drs" / "irs" (decreasing / increasing returns to scale)

# CRS metafrontier
fit_crs <- metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data,
group = "group",
method = "dea",
rts = "crs"
)
# VRS metafrontier
fit_vrs <- metafrontier(
log_y ~ log_x1 + log_x2,
data = sim$data,
group = "group",
method = "dea",
rts = "vrs"
)
# Compare mean TGR
cbind(
CRS = tapply(fit_crs$tgr, fit_crs$group_vec, mean),
VRS = tapply(fit_vrs$tgr, fit_vrs$group_vec, mean)
)
#> CRS VRS
#> G1 1.0000000 1.0000000
#> G2 0.6014273 0.8180669

The choice between deterministic, stochastic, and DEA metafrontiers involves trade-offs:
| Feature | Deterministic SFA | Stochastic SFA | DEA |
|---|---|---|---|
| Functional form | Parametric | Parametric | Nonparametric |
| Noise handling | Stage 1 only | Both stages | None |
| Inference on TGR | No | Yes | No |
| TGR \(\le\) 1 guaranteed | Yes | No | Yes |
| Small sample performance | Moderate | Moderate | Poor |
| References | Battese, Rao and O’Donnell (2004) (BRO) | Huang, Huang and Liu (2014) (HHL) | O’Donnell, Rao and Battese (2008) (ORB) |
# Compare TGR estimates across methods
tgr_det <- tapply(fit_det$tgr, fit_det$group_vec, mean)
tgr_sto <- tapply(fit_sto$tgr, fit_sto$group_vec, mean)
tgr_dea <- tapply(fit_crs$tgr, fit_crs$group_vec, mean)
true_tgr <- tapply(sim$data$true_tgr, sim$data$group, mean)
comparison <- data.frame(
True = true_tgr,
Deterministic = tgr_det,
Stochastic = tgr_sto,
DEA_CRS = tgr_dea
)
round(comparison, 4)
#> True Deterministic Stochastic DEA_CRS
#> G1 1.0000 1.000 1.1839 1.0000
#> G2 0.6703 0.702 0.8306 0.6014

Selecting between deterministic SFA, stochastic SFA, and DEA metafrontiers depends on the research question, data characteristics, and inferential requirements.
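Use the deterministic SFA metafrontier (BRO 2004) when:

- a parametric frontier is appropriate and noise needs to be modelled only in the group frontiers (Stage 1);
- the metafrontier must envelop the group frontiers at the observed data, so that \(TGR \le 1\);
- standard errors for the metafrontier parameters are not required.

Use the stochastic SFA metafrontier (HHL 2014) when:

- noise should be allowed for in both stages;
- standard errors and hypothesis tests for the metafrontier are needed;
- TGR values above 1 for some firms are acceptable.

Use the DEA metafrontier when:

- no functional form should be imposed on the technology;
- noise is negligible and the sample is reasonably large;
- the returns-to-scale assumption (set via rts) is well-justified by the application context.

In many applied studies, it is informative to estimate multiple methods and compare TGR estimates for robustness, as in the method comparison above.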
Before estimating a metafrontier, it is useful to test whether separate group frontiers are actually needed. The poolability test uses a likelihood ratio statistic:
\[LR = -2\left[LL_{pooled} - \sum_{j=1}^{J} LL_j\right] \sim \chi^2_{df}\]
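where \(LL_{pooled}\) is the log-likelihood of a single frontier estimated on the pooled sample and \(LL_j\) are the group-specific log-likelihoods. Under the null hypothesis that all groups share one frontier, \(LR\) is asymptotically \(\chi^2\) with degrees of freedom equal to the number of restrictions imposed by pooling, \(df = (J - 1)K\), where \(K\) is the number of parameters in each group model.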
poolability_test(fit_det)
#>
#> Likelihood Ratio Test for Poolability of Group Frontiers
#>
#> data: metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data, group = "group", meta_type = "deterministic")
#> LR = 442.35, df = 5, p-value < 2.2e-16

A significant result (p < 0.05) rejects the null hypothesis that all groups share a single frontier. Here p < 2.2e-16, indicating that the groups operate under different technologies and that the metafrontier decomposition is warranted.
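The statistic and p-value can be reproduced from log-likelihoods. The sketch below uses the group log-likelihoods printed in the stochastic-model summary (22.685 and -36.696; Stage 1 is identical for both fits). The pooled log-likelihood is not printed in this vignette, so the value used here is backed out from the reported LR = 442.35 purely for illustration, and \(K = 5\) is inferred from the reported df = 5 with \(J = 2\). The helper `lr_poolability()` is hypothetical, not a package function.

```r
# Hypothetical helper: LR test of one pooled frontier against J group frontiers
lr_poolability <- function(ll_pooled, ll_groups, k_per_group) {
  J  <- length(ll_groups)
  lr <- -2 * (ll_pooled - sum(ll_groups))
  df <- (J - 1) * k_per_group  # restrictions imposed by pooling
  list(statistic = lr, df = df,
       p.value = pchisq(lr, df = df, lower.tail = FALSE))
}

ll_groups <- c(G1 = 22.685, G2 = -36.696)   # group log-likelihoods from the summary
ll_pooled <- sum(ll_groups) - 442.35 / 2     # backed out from the reported LR (illustrative)
res <- lr_poolability(ll_pooled, ll_groups, k_per_group = 5)
res$statistic
res$p.value
```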
The simulate_metafrontier() function generates data from a known data-generating process (DGP), enabling parameter recovery studies:
# Monte Carlo: check parameter recovery over 100 replications
set.seed(1)
n_rep <- 100
# True metafrontier coefficients, passed to the simulator so that the bias
# below is measured against the values actually used to generate the data
true_beta <- c(1.0, 0.5, 0.3)
beta_hat <- matrix(NA_real_, n_rep, 3)
for (r in seq_len(n_rep)) {
  sim_r <- simulate_metafrontier(
    n_groups = 2, n_per_group = 200,
    beta_meta = true_beta,
    tech_gap = c(0, 0.3),
    sigma_u = c(0.2, 0.3),
    sigma_v = 0.15
  )
  fit_r <- metafrontier(
    log_y ~ log_x1 + log_x2,
    data = sim_r$data,
    group = "group",
    meta_type = "deterministic"
  )
  beta_hat[r, ] <- coef(fit_r, which = "meta")
}
# Bias of the estimated metafrontier coefficients
colMeans(beta_hat) - true_beta

The simulate_metafrontier() function supports:
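- the number of groups (n_groups)
- group sizes, which may differ across groups (n_per_group as a vector)
- the metafrontier coefficients (beta_meta)
- group-specific technology gaps (tech_gap)
- group-specific inefficiency scales (sigma_u)
- a random seed for reproducibility (seed)

Battese, G.E., Rao, D.S.P. and O’Donnell, C.J. (2004). A metafrontier production function for estimation of technical efficiencies and technology gaps for firms operating under different technologies. Journal of Productivity Analysis, 21(1), 91–103.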
Huang, C.J., Huang, T.-H. and Liu, N.-H. (2014). A new approach to estimating the metafrontier production function based on a stochastic frontier framework. Journal of Productivity Analysis, 42(3), 241–254.
Jondrow, J., Lovell, C.A.K., Materov, I.S. and Schmidt, P. (1982). On the estimation of technical inefficiency in the stochastic frontier production function model. Journal of Econometrics, 19(2–3), 233–238.
O’Donnell, C.J., Rao, D.S.P. and Battese, G.E. (2008). Metafrontier frameworks for the study of firm-level efficiencies and technology ratios. Empirical Economics, 34(2), 231–255.