Metafrontier Methods: Theory and Computation

library(metafrontier)

This vignette provides a detailed exposition of the metafrontier methods implemented in the package, linking the econometric theory to the computational approach used at each step.

1. The metafrontier framework

1.1 Group-specific stochastic frontiers

Consider \(J\) groups of firms, where group \(j\) contains \(n_j\) firms. For group \(j\), the stochastic frontier model is:

\[\ln y_{ij} = x_{ij}'\beta_j + v_{ij} - u_{ij}, \quad i = 1, \ldots, n_j\]

where \(y_{ij}\) is the output of firm \(i\) in group \(j\), \(x_{ij}\) is a vector of (logged) inputs including a constant, \(\beta_j\) is the group-specific parameter vector, \(v_{ij} \sim N(0, \sigma_{v,j}^2)\) is symmetric noise, and \(u_{ij} \ge 0\) is one-sided inefficiency.

The group-specific technical efficiency is:

\[TE_{ij} = \exp(-u_{ij}) \in (0, 1]\]

estimated via the Jondrow et al. (1982) conditional mean estimator.
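
As a minimal sketch of the conditional mean estimator, assume a normal/half-normal specification, \(u_{ij} \sim N^+(0, \sigma_{u,j}^2)\). The distribution of \(u_{ij}\) is an assumption made here for illustration, and jlms_half_normal() is a hypothetical helper, not a package function. It computes \(E[u \mid \varepsilon]\) for the composed residual \(\varepsilon = v - u\) and plugs the result into \(\exp(-u)\):

```r
# Hypothetical helper (not a package function): JLMS estimator of E[u | eps]
# under an assumed normal/half-normal model with eps = v - u.
jlms_half_normal <- function(eps, sigma_u, sigma_v) {
  sigma2     <- sigma_u^2 + sigma_v^2
  mu_star    <- -eps * sigma_u^2 / sigma2          # conditional location
  sigma_star <- sigma_u * sigma_v / sqrt(sigma2)   # conditional scale
  z <- mu_star / sigma_star
  mu_star + sigma_star * dnorm(z) / pnorm(z)       # E[u | eps]
}

# Illustrative composed residuals and variance parameters
eps   <- c(-0.40, -0.10, 0.00, 0.15)
u_hat <- jlms_half_normal(eps, sigma_u = 0.3, sigma_v = 0.2)
te    <- exp(-u_hat)   # plug-in technical efficiency, always in (0, 1]
round(cbind(eps, u_hat, te), 4)
```

More negative residuals map to larger estimated inefficiency and therefore lower efficiency scores.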

1.2 The metafrontier

The metafrontier is defined as a function \(f^*(x) = \exp(x'\beta^*)\) such that:

\[x'\beta^* \ge x'\beta_j \quad \text{for all } x \text{ and all } j\]

That is, the metafrontier weakly dominates all group frontiers. It represents the production technology available to firms with unrestricted access to all technologies.

1.3 The efficiency decomposition

For each firm, efficiency relative to the metafrontier decomposes as:

\[TE^*_{ij} = TE_{ij} \times TGR_{ij}\]

where the technology gap ratio is:

\[TGR_{ij} = \frac{\exp(x_{ij}'\beta_j)}{\exp(x_{ij}'\beta^*)} = \exp\left(x_{ij}'(\beta_j - \beta^*)\right) \in (0, 1]\]

A \(TGR\) of 1 means the group frontier coincides with the metafrontier at that input mix; values below 1 indicate a technology gap.
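
The decomposition follows from the definitions in Sections 1.1 and 1.2. A short derivation, assuming (as in Battese, Rao, and O'Donnell, 2004) that efficiency relative to the metafrontier is measured against the metafrontier adjusted for the same noise term \(v_{ij}\):

\[TE^*_{ij} = \frac{y_{ij}}{\exp(x_{ij}'\beta^* + v_{ij})} = \frac{\exp(x_{ij}'\beta_j + v_{ij} - u_{ij})}{\exp(x_{ij}'\beta^* + v_{ij})} = \exp(-u_{ij}) \times \exp\left(x_{ij}'(\beta_j - \beta^*)\right) = TE_{ij} \times TGR_{ij}\]

With purely illustrative numbers, a firm with \(TE_{ij} = 0.80\) in a group whose \(TGR_{ij}\) is 0.70 at its input mix has \(TE^*_{ij} = 0.80 \times 0.70 = 0.56\).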

2. Deterministic metafrontier (Battese, Rao, and O’Donnell, 2004)

2.1 Estimation

After obtaining group estimates \(\hat\beta_j\) in Stage 1, the metafrontier parameters \(\hat\beta^*\) are estimated by solving:

\[\min_{\beta^*} \sum_{j=1}^{J} \sum_{i=1}^{n_j} \left(x_{ij}'\beta^* - x_{ij}'\hat\beta_j\right)^2\]

\[\text{subject to: } x_{ij}'\beta^* \ge x_{ij}'\hat\beta_j \quad \forall\, i, j\]

This is a convex quadratic program. The metafrontier package solves it using constrOptim() from base R, which implements an adaptive barrier algorithm for linearly constrained optimisation.
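
The following is a self-contained sketch of this quadratic program using constrOptim() on made-up inputs. It is not the package's internal code: the group coefficients in B, the simulated design matrix X, and the starting value b0 are all assumptions chosen for illustration. constrOptim() enforces constraints of the form ui %*% b - ci >= 0 and needs a strictly feasible starting value.

```r
# Illustrative Stage 1 output: two groups with known frontier coefficients
set.seed(1)
n <- 100
X <- cbind(1, log_x1 = runif(2 * n, 0, 3), log_x2 = runif(2 * n, 0, 3))
group <- rep(1:2, each = n)
B <- rbind(c(1.0, 0.50, 0.20),     # group 1 frontier (assumed)
           c(0.6, 0.53, 0.20))     # group 2 frontier (assumed)
target <- rowSums(X * B[group, ])  # x_ij' beta_j for every firm

# Objective and gradient of the least-squares criterion
obj  <- function(b) sum((X %*% b - target)^2)
grad <- function(b) as.vector(2 * crossprod(X, X %*% b - target))

# Strictly feasible start: a flat frontier lying above every target value
b0  <- c(max(target) + 1, 0, 0)
fit <- constrOptim(theta = b0, f = obj, grad = grad, ui = X, ci = target)

fit$par                      # estimated metafrontier coefficients
min(X %*% fit$par - target)  # smallest slack; non-negative if the envelope holds
```

Because the barrier method only evaluates feasible points, the returned coefficients satisfy the envelope constraint at every observation.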

2.2 Properties

2.3 Example

sim <- simulate_metafrontier(
  n_groups = 2, n_per_group = 300,
  tech_gap = c(0, 0.4),
  sigma_u = c(0.2, 0.35),
  seed = 123
)

fit_det <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  meta_type = "deterministic"
)

# Metafrontier coefficients (no standard errors)
coef(fit_det, which = "meta")
#> (Intercept)      log_x1      log_x2 
#>   1.0416500   0.4950084   0.1926936

# Group coefficients for comparison
coef(fit_det, which = "group")
#> $G1
#> (Intercept)      log_x1      log_x2 
#>   1.0416500   0.4950084   0.1926936 
#> 
#> $G2
#> (Intercept)      log_x1      log_x2 
#>   0.5814439   0.5287600   0.2015517

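In this example the metafrontier intercept is at least as large as every group intercept. Note that the envelope constraint itself applies to the fitted frontier values at the observed inputs, not to individual coefficients, so this is only a quick sanity check: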

meta_b0 <- coef(fit_det, which = "meta")[1]
group_b0 <- sapply(coef(fit_det, which = "group"), `[`, 1)
meta_b0 >= group_b0
#> G1.(Intercept) G2.(Intercept) 
#>           TRUE           TRUE
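
A check that is closer to the actual constraint compares fitted frontier values rather than intercepts. The sketch below reuses the coefficients printed in this section and evaluates the slack \(x'\hat\beta^* - x'\hat\beta_{G2}\) on a hypothetical grid of logged inputs. The grid is an assumption made for illustration and is not the simulated data.

```r
# Coefficients copied from the output above
beta_meta <- c(1.0416500, 0.4950084, 0.1926936)
beta_g2   <- c(0.5814439, 0.5287600, 0.2015517)

# Hypothetical grid of logged inputs (not the simulated data)
grid <- expand.grid(log_x1 = seq(0, 5, by = 0.5),
                    log_x2 = seq(0, 5, by = 0.5))
X <- cbind(1, grid$log_x1, grid$log_x2)

# Slack of the envelope condition at every grid point
slack <- as.vector(X %*% (beta_meta - beta_g2))
range(slack)
all(slack >= 0)
```

Since the G2 slope estimates are slightly larger than the metafrontier slopes, the slack shrinks as inputs grow, which is why the intercept comparison alone does not establish the envelope property.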

3. Stochastic metafrontier (Huang, Huang, and Liu, 2014)

3.1 Estimation

Huang, Huang, and Liu (2014) propose treating the technology gap as a stochastic variable. In Stage 2, the fitted group frontier values become the dependent variable in a second SFA:

\[\ln \hat{f}(x_{ij}; \hat\beta_j) = x_{ij}'\beta^* + v^*_{ij} - u^*_{ij}\]

where \(u^*_{ij} \ge 0\) captures the technology gap and \(v^*_{ij}\) is a noise term. This is estimated via MLE, yielding:

3.2 Advantages over the deterministic approach

  1. Inference: Standard errors, confidence intervals, and hypothesis tests on metafrontier parameters are available.
  2. Robustness: The noise term \(v^*_{ij}\) absorbs sampling variation from Stage 1, preventing overfitting.
  3. Consistency: The metafrontier need not strictly envelop all group frontiers in finite samples, which can be more realistic.
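
To make the Stage 2 estimation and its Hessian-based inference concrete, here is a base R sketch of a normal/half-normal stochastic frontier fitted by maximum likelihood. The half-normal assumption, the simulated data, and the helper sfa_negll() are illustrative assumptions; this is not the package's implementation.

```r
# Hypothetical helper: negative log-likelihood of a normal/half-normal SFA
sfa_negll <- function(par, y, X) {
  k       <- ncol(X)
  beta    <- par[seq_len(k)]
  sigma_v <- exp(par[k + 1])   # log-parameterised to keep scales positive
  sigma_u <- exp(par[k + 2])
  sigma   <- sqrt(sigma_u^2 + sigma_v^2)
  lambda  <- sigma_u / sigma_v
  eps     <- as.vector(y - X %*% beta)   # composed error v - u
  -sum(log(2) - log(sigma) + dnorm(eps / sigma, log = TRUE) +
         pnorm(-eps * lambda / sigma, log.p = TRUE))
}

# Simulated stand-in for the Stage 2 dependent variable
set.seed(2)
n <- 1000
X <- cbind(1, runif(n, 0, 3))
y <- as.vector(X %*% c(1, 0.5)) + rnorm(n, sd = 0.1) - abs(rnorm(n, sd = 0.3))

# Start from OLS, maximise the likelihood, and keep the Hessian for inference
ols   <- lm.fit(X, y)
start <- c(ols$coefficients, rep(log(sd(ols$residuals)), 2))
opt   <- optim(start, sfa_negll, y = y, X = X, method = "BFGS", hessian = TRUE)

est <- unname(opt$par[1:2])
se  <- unname(sqrt(diag(solve(opt$hessian)))[1:2])  # Hessian-based standard errors
cbind(Estimate = est, Std.Error = se)
```

The inverse Hessian of the negative log-likelihood is what makes standard errors and Wald tests available for the metafrontier coefficients.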

3.3 Caveat: the generated-regressor problem

The stochastic metafrontier is a two-stage estimator. In Stage 2, the dependent variable \(\ln \hat{f}(x_{ij}; \hat\beta_j)\) is itself an estimate from Stage 1 – it is a generated regressor (Murphy and Topel, 1985). The standard errors reported by the package are derived from the Stage 2 Hessian alone and do not account for the sampling uncertainty in the Stage 1 group frontier estimates.

As a result:

3.4 Example

fit_sto <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  meta_type = "stochastic"
)

summary(fit_sto)
#> 
#> Metafrontier Model Summary
#> ==========================
#> 
#> Call:
#> metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data, 
#>     group = "group", meta_type = "stochastic")
#> 
#> Method:        sfa 
#> Metafrontier:  stochastic 
#> 
#> --- Group: G1 (n = 300) ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 1.041650   0.054009   19.29   <2e-16 ***
#> log_x1      0.495008   0.009173   53.97   <2e-16 ***
#> log_x2      0.192694   0.008907   21.64   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 22.685 
#> 
#> --- Group: G2 (n = 300) ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.58144    0.05155   11.28   <2e-16 ***
#> log_x1       0.52876    0.01019   51.88   <2e-16 ***
#> log_x2       0.20155    0.01037   19.44   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -36.696 
#> 
#> --- Metafrontier ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 0.813734   0.208469   3.903 9.49e-05 ***
#> log_x1      0.513997   0.005155  99.716  < 2e-16 ***
#> log_x2      0.197514   0.004989  39.588  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 181.35 
#> 
#> --- Efficiency Decomposition ---
#>  Group Mean_TE Mean_TGR Mean_TE_star
#>     G1  0.8433   1.1839       0.9984
#>     G2  0.7328   0.8306       0.6087
#> 
#> --- Technology Gap Ratio Summary ---
#>  Group   N   Mean     SD    Min     Q1 Median     Q3    Max
#>     G1 300 1.1839 0.0320 1.1222 1.1576 1.1861 1.2087 1.2464
#>     G2 300 0.8306 0.0181 0.7949 0.8148 0.8300 0.8445 0.8693

The stochastic metafrontier provides a variance-covariance matrix for the metafrontier coefficients, from which standard errors follow:

# Variance-covariance matrix
vcov(fit_sto)
#>               (Intercept)        log_x1        log_x2
#> (Intercept)  4.345943e-02 -8.269764e-05 -6.986322e-05
#> log_x1      -8.269764e-05  2.657002e-05  1.867196e-06
#> log_x2      -6.986322e-05  1.867196e-06  2.489219e-05

# Log-likelihood of the metafrontier model
logLik(fit_sto)
#> 'log Lik.' 181.3504 (df=3)
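
Standard errors are the square roots of the diagonal of this matrix. The sketch below rebuilds the matrix from the printed values so that it runs on its own, and adds 95% Wald intervals. With a fitted object, sqrt(diag(vcov(fit_sto))) should give the same standard errors. As discussed in Section 3.3, these intervals reflect Stage 2 uncertainty only.

```r
# Stage 2 covariance matrix and estimates copied from the output above
nm <- c("(Intercept)", "log_x1", "log_x2")
V <- matrix(c( 4.345943e-02, -8.269764e-05, -6.986322e-05,
              -8.269764e-05,  2.657002e-05,  1.867196e-06,
              -6.986322e-05,  1.867196e-06,  2.489219e-05),
            nrow = 3, byrow = TRUE, dimnames = list(nm, nm))
est <- setNames(c(0.813734, 0.513997, 0.197514), nm)

se <- sqrt(diag(V))                        # standard errors
z  <- qnorm(0.975)
ci <- cbind(lower = est - z * se,          # 95% Wald intervals
            upper = est + z * se)
round(cbind(est, se, ci), 4)
```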

3.5 A note on TGR values

Under the stochastic metafrontier, TGR values are not constrained to be \(\le 1\) in finite samples, since the metafrontier need not envelop all group frontiers. Values above 1 can occur and are consistent with the stochastic framework; in the example above, every G1 firm has a TGR above 1 (group mean 1.18).

tgr_vals <- efficiencies(fit_sto, type = "tgr")
summary(tgr_vals)
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.7949  0.8300  0.9958  1.0072  1.1861  1.2464

4. DEA-based metafrontier

4.1 Approach

For a nonparametric metafrontier:

  1. Compute group-specific DEA efficiencies \(\hat\theta_{ij}^{group}\) using only observations from group \(j\).
  2. Compute pooled DEA efficiencies \(\hat\theta_{ij}^{pool}\) using all observations.
  3. The TGR is: \(TGR_{ij} = \hat\theta_{ij}^{pool} / \hat\theta_{ij}^{group}\).

The package solves the DEA linear programs using lpSolveAPI.
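
The three steps can be illustrated without a linear programming solver in one special case: with a single input, a single output, and constant returns to scale, the DEA efficiency of a firm reduces to its productivity \(y/x\) divided by the best productivity in the reference set. The sketch below uses that shortcut on made-up data. It is an assumption-laden illustration and not the general LP formulation solved by the package.

```r
# Made-up single-input, single-output data for two groups
set.seed(42)
n   <- 50
dat <- data.frame(group = rep(c("G1", "G2"), each = n),
                  x     = runif(2 * n, 1, 10))
tech  <- ifelse(dat$group == "G1", 1.0, 0.7)   # G2 has the weaker technology
dat$y <- tech * dat$x * runif(2 * n, 0.6, 1)   # output with random inefficiency

prod <- dat$y / dat$x                          # productivity y / x

# Step 1: group-specific CRS efficiency (best practice within each group)
theta_group <- prod / ave(prod, dat$group, FUN = max)
# Step 2: pooled CRS efficiency (best practice across all firms)
theta_pool  <- prod / max(prod)
# Step 3: technology gap ratio
tgr <- theta_pool / theta_group

tapply(tgr, dat$group, mean)
```

In this special case the TGR is constant within a group and equals the ratio of the group's best productivity to the pooled best, so it can never exceed 1.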

4.2 Returns to scale

The rts argument controls the technology assumption:

# CRS metafrontier
fit_crs <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  method = "dea",
  rts = "crs"
)

# VRS metafrontier
fit_vrs <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  method = "dea",
  rts = "vrs"
)

# Compare mean TGR
cbind(
  CRS = tapply(fit_crs$tgr, fit_crs$group_vec, mean),
  VRS = tapply(fit_vrs$tgr, fit_vrs$group_vec, mean)
)
#>          CRS       VRS
#> G1 1.0000000 1.0000000
#> G2 0.6014273 0.8180669

5. Comparing methods

The choice between deterministic, stochastic, and DEA metafrontiers involves trade-offs:

Feature                    Deterministic SFA   Stochastic SFA   DEA
Functional form            Parametric          Parametric       Nonparametric
Noise handling             Stage 1 only        Both stages      None
Inference on TGR           No                  Yes              No
TGR \(\le\) 1 guaranteed   Yes                 No               Yes
Small sample performance   Moderate            Moderate         Poor
References                 BRO (2004)          HHL (2014)       ORB (2008)

# Compare TGR estimates across methods
tgr_det <- tapply(fit_det$tgr, fit_det$group_vec, mean)
tgr_sto <- tapply(fit_sto$tgr, fit_sto$group_vec, mean)
tgr_dea <- tapply(fit_crs$tgr, fit_crs$group_vec, mean)
true_tgr <- tapply(sim$data$true_tgr, sim$data$group, mean)

comparison <- data.frame(
  True = true_tgr,
  Deterministic = tgr_det,
  Stochastic = tgr_sto,
  DEA_CRS = tgr_dea
)
round(comparison, 4)
#>      True Deterministic Stochastic DEA_CRS
#> G1 1.0000         1.000     1.1839  1.0000
#> G2 0.6703         0.702     0.8306  0.6014
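
To read the comparison more easily, the absolute deviation of each method from the true mean TGR can be tabulated. The sketch below copies the printed values so that it runs on its own. The final line rests on an assumption about the DGP, namely that tech_gap enters as \(\exp(-\text{tech\_gap})\), which is consistent with the true G2 value of 0.6703 for tech_gap = 0.4.

```r
# Mean TGR by method, copied from the comparison output above
comparison <- data.frame(
  True          = c(1.0000, 0.6703),
  Deterministic = c(1.000, 0.702),
  Stochastic    = c(1.1839, 0.8306),
  DEA_CRS       = c(1.0000, 0.6014),
  row.names     = c("G1", "G2")
)

# Absolute deviation of each method from the true mean TGR
abs_err <- abs(comparison[, -1] - comparison$True)
round(abs_err, 4)

# Assumption: the DGP maps tech_gap to a true TGR of exp(-tech_gap)
exp(-c(0, 0.4))
```

In this single simulated sample the deterministic and DEA estimates for G2 lie within 0.07 of the truth, while the stochastic estimate overstates the mean TGR in both groups.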

6. Choosing a method: practical guidance

Selecting between deterministic SFA, stochastic SFA, and DEA metafrontiers depends on the research question, data characteristics, and inferential requirements.

Use the deterministic SFA metafrontier (BRO 2004) when:

Use the stochastic SFA metafrontier (HHL 2014) when:

Use the DEA metafrontier when:

In many applied studies, it is informative to estimate multiple methods and compare TGR estimates for robustness (as shown in Section 5).

7. Testing for technology heterogeneity

Before estimating a metafrontier, it is useful to test whether separate group frontiers are actually needed. The poolability test uses a likelihood ratio statistic:

\[LR = -2\left[LL_{pooled} - \sum_{j=1}^{J} LL_j\right] \sim \chi^2_{df}\]

where \(LL_{pooled}\) is the log-likelihood of a single frontier estimated on the pooled sample and \(LL_j\) are the group-specific log-likelihoods.

poolability_test(fit_det)
#> 
#>  Likelihood Ratio Test for Poolability of Group Frontiers
#> 
#> data:  metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data,     group = "group", meta_type = "deterministic")
#> LR = 442.35, df = 5, p-value < 2.2e-16

A significant test (p < 0.05) rejects the null hypothesis of a common frontier, indicating that the groups operate under different technologies and that the metafrontier decomposition is warranted.
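
The statistic can be reproduced by hand from the log-likelihoods. In the sketch below, lr_poolability() is a hypothetical helper, the group log-likelihoods are taken from the summary output in Section 3.4, and the pooled log-likelihood is not printed in this vignette: the value used is backed out from the reported LR and should be read as an assumption. The degrees of freedom assume five parameters per frontier (three coefficients plus two variance parameters), which matches the reported df = 5 for two groups.

```r
# Hypothetical helper: LR poolability statistic from log-likelihoods
lr_poolability <- function(ll_pooled, ll_groups, n_par) {
  stat <- -2 * (ll_pooled - sum(ll_groups))
  dof  <- (length(ll_groups) - 1) * n_par     # extra parameters in the group models
  p    <- pchisq(stat, df = dof, lower.tail = FALSE)
  c(LR = stat, df = dof, p_value = p)
}

res <- lr_poolability(
  ll_pooled = -235.186,            # assumption: backed out from LR = 442.35
  ll_groups = c(22.685, -36.696),  # group log-likelihoods from Section 3.4
  n_par     = 5                    # assumption: 3 coefficients + 2 variances
)
res
```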

8. Simulation for Monte Carlo studies

The simulate_metafrontier() function generates data from a known DGP, enabling parameter recovery studies:

# Monte Carlo: check parameter recovery over 100 replications
set.seed(1)
n_rep <- 100
beta_hat <- matrix(NA, n_rep, 3)

for (r in seq_len(n_rep)) {
  sim_r <- simulate_metafrontier(
    n_groups = 2, n_per_group = 200,
    tech_gap = c(0, 0.3),
    sigma_u = c(0.2, 0.3),
    sigma_v = 0.15
  )
  fit_r <- metafrontier(
    log_y ~ log_x1 + log_x2,
    data = sim_r$data,
    group = "group",
    meta_type = "deterministic"
  )
  beta_hat[r, ] <- coef(fit_r, which = "meta")
}

# Bias
true_beta <- c(1.0, 0.5, 0.3)
colMeans(beta_hat) - true_beta
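
Beyond bias, it is common to report the root mean squared error and the Monte Carlo standard error of the mean estimate. The sketch below uses placeholder draws in place of the beta_hat matrix from the loop above so that it runs without the package. The placeholder values are made up and carry no information about the estimator.

```r
# Placeholder estimates standing in for beta_hat from the loop above
set.seed(1)
true_beta <- c(1.0, 0.5, 0.3)
beta_hat  <- sapply(true_beta, function(b) rnorm(100, mean = b, sd = 0.05))

dev  <- sweep(beta_hat, 2, true_beta)   # estimate minus truth, per replication
bias <- colMeans(dev)
rmse <- sqrt(colMeans(dev^2))
mcse <- apply(beta_hat, 2, sd) / sqrt(nrow(beta_hat))  # Monte Carlo SE of the mean
round(rbind(bias, rmse, mcse), 4)
```

A bias that is small relative to its Monte Carlo standard error cannot be distinguished from zero at the chosen number of replications.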

The simulate_metafrontier() function supports:

References