This vignette demonstrates advanced analytical workflows using
unicefData, aligned with the examples in the package
documentation paper (Azevedo, 2026). All examples use the same
indicators, countries, and parameters as the paper and the Stata help
file, enabling cross-language reproducibility.
The examples in this vignette demonstrate the principle of treating data acquisition as code. Notice how each example explicitly specifies its inputs, starting with the indicator code (e.g., CME_MRY0T4 for under-5 mortality).
This approach contrasts with workflows where researchers:
1. Manually download data from a web portal
2. Apply undocumented filters in Excel or R
3. Manually clean and reshape the data
With unicefData, all these decisions are explicit and version-controlled in your script, which makes your analysis transparent and verifiable.
This is especially important in research assisted by AI tools, where automated analysis must rest on transparent and verifiable data foundations.
Reproduce the paper’s South Asia mortality trend analysis (paper Example 5+6):
# Packages used by the examples in this vignette
library(unicefData)
library(dplyr)
# Fetch under-5 mortality for South Asian countries
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("AFG", "BGD", "BTN", "IND", "MDV", "NPL", "PAK", "LKA")
)
# Filter to total (both sexes)
df_total <- df %>% filter(sex == "_T" | is.na(sex))
# Plot trends
plot(
  value ~ period,
  data = df_total[df_total$iso3 == "AFG", ],
  type = "l", col = "red", lwd = 2,
  ylim = range(df_total$value, na.rm = TRUE),
  xlab = "Year", ylab = "Under-5 mortality rate (per 1,000)",
  main = "U5MR Trends in South Asia"
)
lines(value ~ period, data = df_total[df_total$iso3 == "BGD", ], col = "blue", lwd = 2)
lines(value ~ period, data = df_total[df_total$iso3 == "IND", ], col = "green", lwd = 2)
lines(value ~ period, data = df_total[df_total$iso3 == "PAK", ], col = "orange", lwd = 2)
legend("topright",
  legend = c("Afghanistan", "Bangladesh", "India", "Pakistan"),
  col = c("red", "blue", "green", "orange"), lwd = 2
)
The equivalent Stata code from the paper:
. unicefdata, indicator(CME_MRY0T4) countries(AFG BGD BTN IND MDV NPL PAK LKA) clear
. keep if sex == "_T"
. graph twoway ///
(connected value period if iso3 == "AFG", lcolor(red)) ///
(connected value period if iso3 == "BGD", lcolor(blue)) ///
(connected value period if iso3 == "IND", lcolor(green)) ///
(connected value period if iso3 == "PAK", lcolor(orange)), ///
legend(order(1 "Afghanistan" 2 "Bangladesh" 3 "India" 4 "Pakistan"))
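The R filter above keeps rows where sex is "_T" or missing. The following base R sketch uses a made-up data frame (the values are illustrative, not UNICEF estimates) to show why the is.na() clause matters, under the assumption that some series are returned without a sex disaggregation:

```r
# Toy data: the values are made up for illustration, not UNICEF estimates.
toy <- data.frame(
  iso3  = c("AAA", "AAA", "AAA", "BBB"),
  sex   = c("_T", "M", "F", NA),
  value = c(50, 55, 45, 30),
  stringsAsFactors = FALSE
)

# Comparing NA with a string yields NA, so this condition is NA for "BBB".
cond_strict <- toy$sex == "_T"

# which() drops NA, mirroring how dplyr::filter() discards rows whose
# condition evaluates to NA.
strict <- toy[which(cond_strict), ]

# Adding is.na(sex) keeps rows that carry no sex disaggregation at all.
lenient <- toy[which(toy$sex == "_T" | is.na(toy$sex)), ]

print(strict$iso3)
print(lenient$iso3)
```

With the strict condition alone, the row for "BBB" would be dropped silently, which is easy to miss when comparing country counts across indicators.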
Equity analysis using wealth disaggregation (paper Example 8):
# Fetch stunting data with all wealth quintiles
df <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  sex = "ALL",
  wealth = "ALL",
  latest = TRUE
)
# Filter to wealth quintiles only
df_wealth <- df %>%
  filter(wealth_quintile %in% c("Q1", "Q2", "Q3", "Q4", "Q5"))
# Average stunting by wealth quintile (global)
summary_wealth <- df_wealth %>%
  group_by(wealth_quintile) %>%
  summarise(mean_stunting = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(wealth_quintile)
print(summary_wealth)
# Visualize the wealth gradient
barplot(
  summary_wealth$mean_stunting,
  names.arg = summary_wealth$wealth_quintile,
  ylab = "Stunting prevalence (%)",
  main = "Child Stunting by Wealth Quintile",
  col = c("#d73027", "#fc8d59", "#fee090", "#91bfdb", "#4575b4")
)
Quantify the equity gap between poorest and richest quintiles:
# Fetch stunting for specific countries with Q1 and Q5
df <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  countries = c("IND", "PAK", "BGD", "ETH"),
  wealth = "ALL",
  latest = TRUE
)
# Compute wealth gap (Q1 - Q5 = poorest minus richest)
df_gap <- df %>%
  filter(wealth_quintile %in% c("Q1", "Q5")) %>%
  tidyr::pivot_wider(
    id_cols = c(iso3, country),
    names_from = wealth_quintile,
    values_from = value
  ) %>%
  mutate(wealth_gap = Q1 - Q5) %>%
  arrange(desc(wealth_gap))
print(df_gap)
Compare neonatal and under-5 mortality across countries (paper Example 10):
# Fetch multiple mortality indicators
df <- unicefData(
  indicator = c("CME_MRM0", "CME_MRY0T4"),
  countries = c("BRA", "MEX", "ARG", "COL", "PER", "CHL"),
  year = "2020:2023"
)
# Keep latest year per country-indicator
df_latest <- df %>%
  filter(sex == "_T" | is.na(sex)) %>%
  group_by(iso3, indicator) %>%
  slice_max(period, n = 1) %>%
  ungroup()
# Reshape wide for comparison
df_wide <- df_latest %>%
  select(iso3, country, indicator, value) %>%
  tidyr::pivot_wider(names_from = indicator, values_from = value)
print(df_wide)
Track DTP3 and MCV1 coverage over time (paper immunization example):
# Fetch immunization indicators
df <- unicefData(
  indicator = c("IM_DTP3", "IM_MCV1"),
  year = "2000:2023"
)
# Simple (unweighted) average of the returned values by year and indicator
trends <- df %>%
  group_by(period, indicator) %>%
  summarise(coverage = mean(value, na.rm = TRUE), .groups = "drop")
# Plot
dtp3 <- trends[trends$indicator == "IM_DTP3", ]
mcv1 <- trends[trends$indicator == "IM_MCV1", ]
plot(coverage ~ period, data = dtp3, type = "l", col = "blue", lwd = 2,
     ylim = c(60, 95), xlab = "Year", ylab = "Coverage (%)",
     main = "Global Immunization Coverage Trends")
lines(coverage ~ period, data = mcv1, col = "red", lwd = 2)
legend("bottomright", legend = c("DTP3", "MCV1"),
       col = c("blue", "red"), lwd = 2)
Analyze U5MR by UNICEF region using metadata enrichment (paper Example 12):
# Fetch with regional classifications
df <- unicefData(
  indicator = "CME_MRY0T4",
  add_metadata = c("region", "income_group"),
  latest = TRUE
)
# Filter to countries only (exclude regional aggregates)
df_countries <- df %>%
  filter(geo_type == 0, sex == "_T" | is.na(sex))
# Average U5MR by region
by_region <- df_countries %>%
  group_by(region) %>%
  summarise(avg_u5mr = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_u5mr))
print(by_region)
# Average U5MR by income group
by_income <- df_countries %>%
  group_by(income_group) %>%
  summarise(avg_u5mr = mean(value, na.rm = TRUE), .groups = "drop") %>%
  arrange(desc(avg_u5mr))
print(by_income)
Create panel datasets for econometric analysis (paper Example 9):
Compare indicators from different domains side-by-side:
# One column per indicator
df_cross <- unicefData(
  indicator = c("CME_MRY0T4", "CME_MRY0", "IM_DTP3", "IM_MCV1"),
  countries = c("AFG", "ETH", "PAK", "NGA"),
  latest = TRUE,
  format = "wide_indicators"
)
print(df_cross)
# Correlation between mortality and immunization
if (all(c("CME_MRY0T4", "IM_DTP3") %in% names(df_cross))) {
  cor_val <- cor(df_cross$CME_MRY0T4, df_cross$IM_DTP3, use = "complete.obs")
  message("Correlation between U5MR and DTP3: ", round(cor_val, 3))
}
Examine male-female mortality gaps (paper disaggregation example):
# Fetch all sex categories
df <- unicefData(
  indicator = "CME_MRY0T4",
  countries = c("IND", "PAK", "BGD"),
  year = 2020,
  sex = "ALL"
)
# Compute male-female gap (biological pattern: male > female)
df_gap <- df %>%
  filter(sex %in% c("M", "F")) %>%
  tidyr::pivot_wider(
    id_cols = c(iso3, country, period),
    names_from = sex,
    values_from = value
  ) %>%
  mutate(mf_gap = M - F)
print(df_gap)
# Stunting prevalence
df_stunting <- unicefData(indicator = "NT_ANT_HAZ_NE2", latest = TRUE)
# Stunting by wealth (poorest quintile only)
df_q1 <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  wealth = "Q1",
  latest = TRUE
)
# Stunting by residence (rural only)
df_rural <- unicefData(
  indicator = "NT_ANT_HAZ_NE2",
  residence = "R",
  latest = TRUE
)
Handle errors gracefully when processing multiple indicators:
# Process multiple indicators, some of which may not exist
indicators <- c("CME_MRY0T4", "IM_DTP3", "INVALID_CODE_XYZ")
results <- list()
for (ind in indicators) {
  tryCatch({
    results[[ind]] <- unicefData(indicator = ind, countries = "BRA", latest = TRUE)
    message("OK: ", ind, " (", nrow(results[[ind]]), " rows)")
  }, error = function(e) {
    message("FAIL: ", ind, " - ", conditionMessage(e))
  })
}
The Stata equivalent uses capture noisily (paper Example 16):
. foreach ind in CME_MRY0T4 IM_DTP3 INVALID {
.     capture noisily unicefdata, indicator(`ind') clear
.     if _rc == 0 {
.         summarize value
.     }
. }
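The R loop above reports failures as it goes. A variation that collects successes and failures for inspection afterwards can be sketched as follows. Here fetch_stub() and safe_fetch() are hypothetical names introduced for illustration: fetch_stub() stands in for a real request so the pattern can run offline, and neither function is part of unicefData.

```r
# fetch_stub() is a hypothetical stand-in for a data request; it is not part
# of unicefData. It fails for any code outside a small known set.
fetch_stub <- function(indicator) {
  known <- c("CME_MRY0T4", "IM_DTP3")
  if (!indicator %in% known) {
    stop("unknown indicator: ", indicator)
  }
  data.frame(indicator = indicator, value = NA_real_)
}

# Wrap one request so it returns either the data or the error message.
safe_fetch <- function(indicator) {
  tryCatch(
    list(data = fetch_stub(indicator), error = NULL),
    error = function(e) list(data = NULL, error = conditionMessage(e))
  )
}

indicators <- c("CME_MRY0T4", "IM_DTP3", "INVALID_CODE_XYZ")
outcomes <- lapply(indicators, safe_fetch)
names(outcomes) <- indicators

# Split successes from failures for reporting after the loop.
failed <- Filter(function(x) !is.null(x$error), outcomes)
succeeded <- Filter(function(x) is.null(x$error), outcomes)

print(names(succeeded))
print(names(failed))
```

Keeping the error messages alongside the results makes it possible to log which indicator codes failed, and why, without interrupting a long batch run.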
Keep local metadata up to date with the UNICEF Data Warehouse:
- vignette("unicefData-introduction"): Getting started guide
- ?unicefData: Main function documentation
- ?search_indicators: Indicator discovery
- ?filter_unicef_data: Post-processing filters
This package was developed at the UNICEF Data and Analytics Section. The author gratefully acknowledges the collaboration of Lucas Rodrigues, Yang Liu, and Karen Avanesian, whose technical contributions and feedback were instrumental in the development of this R package.
Special thanks to Yves Jaques, Alberto Sibileau, and Daniele Olivotti for designing and maintaining the UNICEF SDMX data warehouse infrastructure that makes this package possible.
The author also acknowledges the UNICEF database managers and technical teams who ensure data quality, as well as the country office staff and National Statistical Offices whose data collection efforts make this work possible.
Development of this package was supported by UNICEF institutional funding for data infrastructure and statistical capacity building. The author also acknowledges UNICEF colleagues who provided testing and feedback during development, as well as the broader open-source R community.
Development was assisted by AI coding tools (GitHub Copilot, Claude). All code has been reviewed, tested, and validated by the package maintainers.
This package is provided for research and analytical purposes.
The unicefData package provides programmatic access to
UNICEF’s public data warehouse. While the author is affiliated with
UNICEF, this package is not an official UNICEF product and the
statements in this documentation are the views of the author and do not
necessarily reflect the policies or views of UNICEF.
Data accessed through this package comes from the UNICEF Data Warehouse. Users should verify critical data points against official UNICEF publications at data.unicef.org.
This software is provided “as is”, without warranty of any kind, express or implied, including but not limited to the warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authors or UNICEF be liable for any claim, damages or other liability arising from the use of this software.
The designations employed and the presentation of material in this package do not imply the expression of any opinion whatsoever on the part of UNICEF concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries.
Important Note on Data Vintages
Official statistics are subject to revisions as new information becomes available and estimation methodologies improve. UNICEF indicators are regularly updated based on new surveys, censuses, and improved modeling techniques. Historical values may be revised retroactively to reflect better information or methodological improvements.
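For reproducible research and proper data attribution, users should document the details that identify the data vintage, such as the indicator code (e.g., CME_MRY0T4), the package version, and the access date.
Example citation for data used in research: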
Under-5 mortality data (indicator: CME_MRY0T4) accessed from UNICEF Data Warehouse via unicefData R package (v2.1.0) on 2026-02-09. Data available at: https://sdmx.data.unicef.org/
This practice ensures that others can verify your results and understand any differences that may arise from data updates. For official UNICEF statistics in publications, always cross-reference with the current version at data.unicef.org.
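One way to keep such a citation in step with the analysis is to generate it in the script. The sketch below is only an illustration: build_data_citation() is a hypothetical helper, not a unicefData function, and its argument names are assumptions. It reproduces the wording of the example citation above from the indicator code, package version, and access date.

```r
# build_data_citation() is a hypothetical helper, not a unicefData function.
build_data_citation <- function(indicator, label, pkg_version,
                                accessed = Sys.Date()) {
  sprintf(
    paste0(
      "%s data (indicator: %s) accessed from UNICEF Data Warehouse via ",
      "unicefData R package (v%s) on %s. ",
      "Data available at: https://sdmx.data.unicef.org/"
    ),
    label, indicator, pkg_version, format(as.Date(accessed), "%Y-%m-%d")
  )
}

# Values taken from the example citation in this vignette.
citation_text <- build_data_citation(
  indicator   = "CME_MRY0T4",
  label       = "Under-5 mortality",
  pkg_version = "2.1.0",
  accessed    = "2026-02-09"
)
cat(citation_text, "\n")
```

In a real script, the version string could be taken from as.character(utils::packageVersion("unicefData")) and the access date left at its Sys.Date() default, so that both are recorded at the time the data are fetched.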
If you use this package in your research, please cite:
Azevedo, J.P. (2026). unicefData: Trilingual R, Python, and Stata Interface
to UNICEF SDMX Data Warehouse. R package version 2.1.0.
https://github.com/unicef-drp/unicefData
For data citations, please refer to the specific UNICEF datasets accessed through the warehouse and cite them according to UNICEF’s data citation guidelines.
This package is released under the MIT License. See the LICENSE file for full details.