Introduction to metafrontier

What is a metafrontier?

In efficiency analysis, we often study firms that operate under fundamentally different technologies. Steel producers using electric arc furnaces (EAF) face a different production possibility set than those using the blast furnace-basic oxygen furnace (BF-BOF) route. Hospitals in rural areas face different constraints than urban ones. Banks in developing economies operate under different regulatory and technological environments than those in advanced economies.

Standard stochastic frontier analysis (SFA) or data envelopment analysis (DEA) applied to the pooled sample implicitly assumes all firms share the same technology – an assumption that may be unrealistic. Estimating separate frontiers for each group solves this problem but makes efficiency scores incomparable across groups: a firm that is 90% efficient relative to a less advanced group frontier may actually be less productive than a firm that is 70% efficient relative to a more advanced frontier.

The metafrontier framework, introduced by Battese, Rao, and O’Donnell (2004) and extended by Huang, Huang, and Liu (2014) and O’Donnell, Rao, and Battese (2008), resolves this by:

  1. Estimating group-specific frontiers for each technology group
  2. Estimating a metafrontier that envelops all group frontiers
  3. Decomposing efficiency into two components:

\[TE^*_i = TE_i \times TGR_i\]

where:

The metafrontier package provides a unified interface for estimating metafrontier models using both SFA and DEA approaches.

Quick start

library(metafrontier)

Simulate data

The package includes simulate_metafrontier() for generating data from a known data-generating process. This is useful for Monte Carlo studies and for learning the package.

sim <- simulate_metafrontier(
  n_groups = 3,
  n_per_group = 200,
  beta_meta = c(1.0, 0.5, 0.3),  # intercept, elasticity_1, elasticity_2
  tech_gap = c(0, 0.25, 0.5),    # intercept shifts (0 = best technology)
  sigma_u = c(0.2, 0.3, 0.4),    # inefficiency SD per group
  sigma_v = 0.15,                 # noise SD
  seed = 42
)

str(sim$data[, c("log_y", "log_x1", "log_x2", "group")])
#> 'data.frame':    600 obs. of  4 variables:
#>  $ log_y : num  4.05 3.99 3.16 4.04 2.52 ...
#>  $ log_x1: num  4.57 4.69 1.43 4.15 3.21 ...
#>  $ log_x2: num  4.426 2.586 4.26 2.214 0.789 ...
#>  $ group : Factor w/ 3 levels "G1","G2","G3": 1 1 1 1 1 1 1 1 1 1 ...
table(sim$data$group)
#> 
#>  G1  G2  G3 
#> 200 200 200

The simulation generates a Cobb-Douglas frontier: \[\ln y_i = \beta_0^{(j)} + \beta_1 \ln x_{1i} + \beta_2 \ln x_{2i} + v_i - u_i\] where the intercept \(\beta_0^{(j)} = \beta_0^* - \delta_j\) is shifted down from the metafrontier by the technology gap \(\delta_j\) for group \(j\).

Estimate the metafrontier

fit <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  method = "sfa",
  meta_type = "deterministic"
)

fit
#> 
#> Metafrontier Model
#> ------------------
#> Method:           sfa 
#> Metafrontier:     deterministic 
#> Groups:           G1, G2, G3 
#> Total obs:        600 
#>   G1: 200 obs
#>   G2: 200 obs
#>   G3: 200 obs
#> 
#> Group log-likelihoods:
#>   G1: 35.49
#>   G2: 0.84652
#>   G3: -25.399
#> 
#> Mean TGR by group:
#>   G1: 1
#>   G2: 0.7593
#>   G3: 0.6047

Deterministic SFA metafrontier (Battese, Rao, and O’Donnell, 2004)

The deterministic metafrontier is estimated in two stages:

  1. Stage 1: Fit separate SFA models for each group via maximum likelihood.
  2. Stage 2: Find metafrontier coefficients \(\hat\beta^*\) by minimising \[\sum_i \left[\ln f(x_i; \hat\beta^*) - \ln f(x_i; \hat\beta_j)\right]^2\] subject to the constraint that the metafrontier envelops all group frontiers: \(\ln f(x_i; \hat\beta^*) \ge \ln f(x_i; \hat\beta_j)\) for all \(i\) and \(j\).

This is the default method:

fit_det <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  meta_type = "deterministic"
)

summary(fit_det)
#> 
#> Metafrontier Model Summary
#> ==========================
#> 
#> Call:
#> metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data, 
#>     group = "group", meta_type = "deterministic")
#> 
#> Method:        sfa 
#> Metafrontier:  deterministic 
#> 
#> --- Group: G1 (n = 200) ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 1.025588   0.060881   16.85   <2e-16 ***
#> log_x1      0.493508   0.009658   51.10   <2e-16 ***
#> log_x2      0.294569   0.009805   30.04   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 35.49 
#> 
#> --- Group: G2 (n = 200) ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.76703    0.06478   11.84   <2e-16 ***
#> log_x1       0.48428    0.01239   39.10   <2e-16 ***
#> log_x2       0.29703    0.01210   24.55   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 0.84652 
#> 
#> --- Group: G3 (n = 200) ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.53665    0.05513   9.734   <2e-16 ***
#> log_x1       0.49950    0.01238  40.352   <2e-16 ***
#> log_x2       0.28205    0.01186  23.782   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -25.399 
#> 
#> --- Metafrontier ---
#>             Estimate
#> (Intercept)   1.0256
#> log_x1        0.4935
#> log_x2        0.2946
#> 
#> --- Efficiency Decomposition ---
#>  Group Mean_TE Mean_TGR Mean_TE_star
#>     G1  0.8571   1.0000       0.8571
#>     G2  0.8249   0.7593       0.6263
#>     G3  0.7401   0.6047       0.4475
#> 
#> --- Technology Gap Ratio Summary ---
#>  Group   N   Mean     SD    Min     Q1 Median     Q3    Max
#>     G1 200 1.0000 0.0000 1.0000 1.0000 1.0000 1.0000 1.0000
#>     G2 200 0.7593 0.0099 0.7386 0.7518 0.7592 0.7679 0.7799
#>     G3 200 0.6047 0.0125 0.5769 0.5947 0.6056 0.6143 0.6289

Stochastic metafrontier (Huang, Huang, and Liu, 2014)

The stochastic metafrontier replaces the LP in Stage 2 with a second-stage SFA, using the fitted group frontier values as the dependent variable:

\[\ln \hat{f}(x_i; \hat\beta_j) = x_i'\beta^* + v^*_i - u^*_i\]

where \(u^*_i \ge 0\) captures the technology gap stochastically. This provides a distributional framework for the TGR, enabling standard errors and hypothesis testing.

fit_sto <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  meta_type = "stochastic"
)

summary(fit_sto)
#> 
#> Metafrontier Model Summary
#> ==========================
#> 
#> Call:
#> metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data, 
#>     group = "group", meta_type = "stochastic")
#> 
#> Method:        sfa 
#> Metafrontier:  stochastic 
#> 
#> --- Group: G1 (n = 200) ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 1.025588   0.060881   16.85   <2e-16 ***
#> log_x1      0.493508   0.009658   51.10   <2e-16 ***
#> log_x2      0.294569   0.009805   30.04   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 35.49 
#> 
#> --- Group: G2 (n = 200) ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.76703    0.06478   11.84   <2e-16 ***
#> log_x1       0.48428    0.01239   39.10   <2e-16 ***
#> log_x2       0.29703    0.01210   24.55   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 0.84652 
#> 
#> --- Group: G3 (n = 200) ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept)  0.53665    0.05513   9.734   <2e-16 ***
#> log_x1       0.49950    0.01238  40.352   <2e-16 ***
#> log_x2       0.28205    0.01186  23.782   <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: -25.399 
#> 
#> --- Metafrontier ---
#>             Estimate Std. Error z value Pr(>|z|)    
#> (Intercept) 0.784379   0.186836   4.198 2.69e-05 ***
#> log_x1      0.493270   0.005918  83.349  < 2e-16 ***
#> log_x2      0.289361   0.005832  49.613  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> Log-likelihood: 96.292 
#> 
#> --- Efficiency Decomposition ---
#>  Group Mean_TE Mean_TGR Mean_TE_star
#>     G1  0.8571   1.2894       1.1051
#>     G2  0.8249   0.9794       0.8078
#>     G3  0.7401   0.7797       0.5771
#> 
#> --- Technology Gap Ratio Summary ---
#>  Group   N   Mean     SD    Min     Q1 Median     Q3    Max
#>     G1 200 1.2894 0.0099 1.2735 1.2808 1.2887 1.2973 1.3073
#>     G2 200 0.9794 0.0157 0.9428 0.9687 0.9796 0.9906 1.0185
#>     G3 200 0.7797 0.0111 0.7537 0.7717 0.7798 0.7876 0.8027

The stochastic metafrontier provides a variance-covariance matrix:

vcov(fit_sto)
#>               (Intercept)        log_x1        log_x2
#> (Intercept)  3.490785e-02 -8.889894e-05 -7.974475e-05
#> log_x1      -8.889894e-05  3.502404e-05 -4.372711e-07
#> log_x2      -7.974475e-05 -4.372711e-07  3.401693e-05

DEA-based metafrontier

For a nonparametric approach, set method = "dea":

fit_dea <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data,
  group = "group",
  method = "dea",
  rts = "vrs"
)
#> Warning: LP infeasible for a DMU.
#> Warning: LP infeasible for a DMU.
#> Warning: LP infeasible for a DMU.
#> Warning: LP infeasible for a DMU.

fit_dea
#> 
#> Metafrontier Model
#> ------------------
#> Method:           dea 
#> Metafrontier:     deterministic 
#> Groups:           G1, G2, G3 
#> Total obs:        600 
#>   G1: 200 obs
#>   G2: 200 obs
#>   G3: 200 obs
#> 
#> Mean TGR by group:
#>   G1: 0.9959
#>   G2: 0.9221
#>   G3: NA

The DEA metafrontier computes:

  1. Group-specific DEA efficiencies under variable returns to scale (VRS), constant returns to scale (CRS), or other assumptions.
  2. A pooled DEA across all observations to form the metafrontier.
  3. \(TGR_i = TE^{pool}_i / TE^{group}_i\).

Extracting results

Efficiency scores

Use efficiencies() to extract the three components of the decomposition:

te <- efficiencies(fit_det, type = "group")
tgr <- efficiencies(fit_det, type = "tgr")
te_star <- efficiencies(fit_det, type = "meta")

# Verify the fundamental identity: TE* = TE x TGR
all.equal(te_star, te * tgr)
#> [1] TRUE

Technology gap ratio

The technology_gap_ratio() function returns TGR values grouped by technology:

tgr_by_group <- technology_gap_ratio(fit_det)
lapply(tgr_by_group, summary)
#> $G1
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>       1       1       1       1       1       1 
#> 
#> $G2
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.7386  0.7518  0.7592  0.7593  0.7679  0.7799 
#> 
#> $G3
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#>  0.5769  0.5947  0.6056  0.6047  0.6143  0.6289

For a formatted summary table:

tgr_summary(fit_det)
#>   Group   N      Mean          SD       Min        Q1    Median        Q3
#> 1    G1 200 1.0000000 0.000000000 1.0000000 1.0000000 1.0000000 1.0000000
#> 2    G2 200 0.7592884 0.009947071 0.7385586 0.7517742 0.7592422 0.7679385
#> 3    G3 200 0.6047017 0.012456784 0.5769063 0.5947118 0.6056158 0.6142562
#>         Max
#> 1 1.0000000
#> 2 0.7798918
#> 3 0.6289392

Coefficients

# Metafrontier coefficients
coef(fit_det, which = "meta")
#> (Intercept)      log_x1      log_x2 
#>   1.0255883   0.4935078   0.2945688

# Group-specific coefficients
coef(fit_det, which = "group")
#> $G1
#> (Intercept)      log_x1      log_x2 
#>   1.0255883   0.4935078   0.2945688 
#> 
#> $G2
#> (Intercept)      log_x1      log_x2 
#>   0.7670252   0.4842818   0.2970328 
#> 
#> $G3
#> (Intercept)      log_x1      log_x2 
#>   0.5366478   0.4994989   0.2820525

Model information

# Log-likelihood (sum of group log-likelihoods for deterministic)
logLik(fit_det)
#> 'log Lik.' 10.9376 (df=3)

# Number of observations
nobs(fit_det)
#> [1] 600

# AIC and BIC (available automatically via logLik method)
AIC(fit_det)
#> [1] -15.87521

Visualisation

The package provides four built-in plot types:

TGR distributions

plot(fit_det, which = "tgr")

Efficiency scatter

plot(fit_det, which = "efficiency")

Points below the 45-degree line indicate a technology gap (TE* < TE). The vertical distance from the line reflects the TGR.

Efficiency decomposition

plot(fit_det, which = "decomposition")

Side-by-side boxplots of TE, TGR, and TE* by group.

Hypothesis testing

Poolability test

The poolability test evaluates whether group-specific frontiers are statistically different from a single pooled frontier:

poolability_test(fit_det)
#> 
#>  Likelihood Ratio Test for Poolability of Group Frontiers
#> 
#> data:  metafrontier(formula = log_y ~ log_x1 + log_x2, data = sim$data,     group = "group", meta_type = "deterministic")
#> LR = 504.71, df = 10, p-value < 2.2e-16

A significant result (small p-value) indicates that the technology groups have genuinely different production technologies, justifying the metafrontier approach.

Inefficiency distributions

The package supports three distributional assumptions for the one-sided inefficiency term \(u_i\) in SFA:

# Half-normal (default): u ~ |N(0, sigma_u^2)|
fit_hn <- metafrontier(log_y ~ log_x1 + log_x2,
                       data = sim$data, group = "group",
                       dist = "hnormal")

# Truncated normal: u ~ N+(mu, sigma_u^2)
fit_tn <- metafrontier(log_y ~ log_x1 + log_x2,
                       data = sim$data, group = "group",
                       dist = "tnormal")

# Exponential: u ~ Exp(1/sigma_u)
fit_exp <- metafrontier(log_y ~ log_x1 + log_x2,
                        data = sim$data, group = "group",
                        dist = "exponential")

Comparing true and estimated values

Since we used simulated data, we can compare estimated values against the truth:

# True vs estimated metafrontier coefficients
cbind(
  True = sim$params$beta_meta,
  Estimated = coef(fit_det, which = "meta")
)
#>             True Estimated
#> (Intercept)  1.0 1.0255883
#> log_x1       0.5 0.4935078
#> log_x2       0.3 0.2945688

# True vs estimated mean TGR by group
true_tgr <- tapply(sim$data$true_tgr, sim$data$group, mean)
est_tgr <- tapply(fit_det$tgr, fit_det$group_vec, mean)
cbind(True = true_tgr, Estimated = est_tgr)
#>         True Estimated
#> G1 1.0000000 1.0000000
#> G2 0.7788008 0.7592884
#> G3 0.6065307 0.6047017

# Correlation between true and estimated efficiency
cor(sim$data$true_te, fit_det$te_group)
#> [1] 0.80659
cor(sim$data$true_te_star, fit_det$te_meta)
#> [1] 0.9400828

Panel SFA Metafrontier

The package supports panel data via the Battese-Coelli (1992) and (1995) models. Use the panel argument:

# Simulate panel data
panel_sim <- simulate_panel_metafrontier(
  n_groups = 2, n_firms_per_group = 20, n_periods = 5, seed = 42
)

# BC92: time-varying inefficiency u_it = u_i * exp(-eta*(t-T))
fit_panel <- metafrontier(
  log_y ~ log_x1 + log_x2,
  data = panel_sim$data,
  group = "group",
  panel = list(id = "firm", time = "year"),
  panel_dist = "bc92"
)
summary(fit_panel)

# The eta parameter captures time-varying inefficiency
# eta > 0: inefficiency decreasing over time
# eta < 0: inefficiency increasing over time

Bootstrap Confidence Intervals for TGR

The boot_tgr() function provides parametric and nonparametric bootstrap confidence intervals for the technology gap ratio:

sim <- simulate_metafrontier(n_groups = 2, n_per_group = 100, seed = 42)
fit <- metafrontier(log_y ~ log_x1 + log_x2, data = sim$data,
                    group = "group", meta_type = "stochastic")

# Nonparametric bootstrap (case resampling within groups)
boot <- boot_tgr(fit, R = 499, type = "nonparametric", seed = 1)
print(boot)

# Observation-level CIs
ci <- confint(boot)
head(ci)

# Group-level mean TGR CIs
boot$ci_group

# Parametric bootstrap (resample from estimated error distributions)
boot_par <- boot_tgr(fit, R = 499, type = "parametric", seed = 1)

Murphy-Topel Variance Correction

The stochastic metafrontier is a two-stage estimator where Stage 2 uses fitted values from Stage 1 as regressors. This “generated regressor” problem means naive standard errors understate uncertainty. The Murphy-Topel (1985) correction adjusts for this:

fit <- metafrontier(log_y ~ log_x1 + log_x2, data = sim$data,
                    group = "group", meta_type = "stochastic")

# Naive (uncorrected) standard errors
vcov(fit)

# Murphy-Topel corrected standard errors
vcov(fit, correction = "murphy-topel")

# Corrected confidence intervals
confint(fit, correction = "murphy-topel")

Latent Class Metafrontier

When group membership is unobserved, use latent_class_metafrontier():

sim <- simulate_metafrontier(n_groups = 2, n_per_group = 100, seed = 42)

# Fit with 2 latent classes
lc <- latent_class_metafrontier(
  log_y ~ log_x1 + log_x2,
  data = sim$data, n_classes = 2,
  n_starts = 5, seed = 123
)
print(lc)
summary(lc)

# Select optimal number of classes via BIC
bic_table <- select_n_classes(
  log_y ~ log_x1 + log_x2, data = sim$data,
  n_classes_range = 2:4, n_starts = 3, seed = 42
)
print(bic_table)  # choose n_classes with lowest BIC

Directional Distance Functions (DDF)

For additive efficiency decomposition, use DDF-based metafrontier:

sim <- simulate_metafrontier(n_groups = 2, n_per_group = 50, seed = 42)
# Use raw (non-log) data for DEA
sim$data$y <- exp(sim$data$log_y)
sim$data$x1 <- exp(sim$data$log_x1)
sim$data$x2 <- exp(sim$data$log_x2)

fit_ddf <- metafrontier(
  y ~ x1 + x2, data = sim$data, group = "group",
  method = "dea", type = "directional", direction = "output"
)
summary(fit_ddf)

# Additive decomposition: beta_meta = beta_group + ddf_tgr
head(data.frame(
  beta_meta = fit_ddf$beta_meta,
  beta_group = fit_ddf$beta_group,
  ddf_tgr = fit_ddf$ddf_tgr
))

References