Creating Customized Indicators for surveyPrev

Qianyu Dong and Zehang Richard Li

2026-06-03

In this vignette, we provide an overview of the list of DHS indicators currently implemented in surveyPrev and the process to add new indicators or create customized indicators.

First, we load the surveyPrev package and any packages used later in the customized indicator processing function. In our example, dplyr and labelled are used.

library(surveyPrev)
library(dplyr)
library(labelled)
library(kableExtra)

In order to use getDHSdata() to download the relevant DHS data directly from the DHS website, we need to

  1. register with DHS to gain data access, and
  2. set up DHS account details in R using the rdhs package, i.e.,
rdhs::set_rdhs_config(email = "your_email", project = "your_registered_DHS_project_title")

1 Built-in indicators

Currently, the surveyPrev package supports 182 indicators listed in Table 1. The full list of indicators and their IDs can be found in the indicatorList dataset. In previous versions of surveyPrev, a small set of indicators could also be referred to by shorter alternative IDs; these alternative IDs are no longer used, and the standard DHS indicator IDs should be used instead.

data(indicatorList)
head(indicatorList)
Table 1: List of built-in indicators in the surveyPrev package.
ID | Description | Topic
CH_SZWT_C_L25 | Birth weight: Less than 2.5 kg | Chapter 10 - Child Health
CH_DIAT_C_ORT | Diarrhea treatment (Children under five with diarrhea treated with either ORS or RHF) | Chapter 10 - Child Health
CH_DIAT_C_ABI | Treatment of diarrhea: Antibiotics | Chapter 10 - Child Health
CH_DIAT_C_ADV | Treatment of diarrhea: Advice or treatment was sought | Chapter 10 - Child Health
CH_DIAT_C_AMO | Treatment of diarrhea: Antimotility drugs | Chapter 10 - Child Health
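
Because the indicator list is a data frame, it can be searched by keyword to locate an ID. The sketch below is a minimal illustration on a hand-built copy of the five rows shown in Table 1. It assumes the columns are named ID, Description, and Topic as in the table header, which may differ from the column names in the installed package, and toy_list is a made-up name.

```r
# Toy copy of the rows shown in Table 1 (column names are assumed to match
# the table header; check names(indicatorList) in your installed version).
toy_list <- data.frame(
  ID = c("CH_SZWT_C_L25", "CH_DIAT_C_ORT", "CH_DIAT_C_ABI",
         "CH_DIAT_C_ADV", "CH_DIAT_C_AMO"),
  Description = c(
    "Birth weight: Less than 2.5 kg",
    "Diarrhea treatment (Children under five with diarrhea treated with either ORS or RHF)",
    "Treatment of diarrhea: Antibiotics",
    "Treatment of diarrhea: Advice or treatment was sought",
    "Treatment of diarrhea: Antimotility drugs"
  ),
  Topic = "Chapter 10 - Child Health",
  stringsAsFactors = FALSE
)

# Case-insensitive keyword search on the description column.
hits <- toy_list[grepl("diarrhea", toy_list$Description, ignore.case = TRUE), ]
hits$ID
```

The same subsetting pattern should carry over to the real indicatorList once its column names are confirmed.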

The built-in indicators in the indicatorList dataset can be processed directly within surveyPrev using the getDHSdata() and getDHSindicator() functions. The indicator ID in the indicatorList dataset is used to retrieve the indicator. getDHSindicator() processes the raw survey data into a data.frame, where the column titled value is the indicator of interest. The data.frame also contains the cluster ID, household ID, survey weight, and strata information. This data format allows a svydesign object to be defined in the survey package. For example,

indicator <- "RH_ANCN_W_N4P"
year <- 2018
country <- "Zambia"
dhsData1 <- getDHSdata(country = country, indicator = indicator, year = year)
data1 <- getDHSindicator(dhsData1, indicator = indicator)
head(data1)
##   cluster householdID            v022            v023    v024  weight strata
## 1       1           1 eastern - rural eastern - rural eastern 1892890  rural
## 2       1           2 eastern - rural eastern - rural eastern 1892890  rural
## 3       1           3 eastern - rural eastern - rural eastern 1892890  rural
## 4       1           4 eastern - rural eastern - rural eastern 1892890  rural
## 5       1           9 eastern - rural eastern - rural eastern 1892890  rural
## 6       1           9 eastern - rural eastern - rural eastern 1892890  rural
##   value
## 1     1
## 2     1
## 3     1
## 4     0
## 5     0
## 6     0
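
As a rough illustration of how a data.frame in this format can feed into the survey package, the sketch below builds a survey design from made-up rows with the same column names. It assumes that clusters are the primary sampling units and that v023 holds the sampling strata; both choices depend on the survey at hand and should be checked against its documentation. The toy values are not real DHS data.

```r
library(survey)

# Toy data mimicking the columns returned by getDHSindicator().
# Two strata with two clusters each, so variances can be estimated.
toy <- data.frame(
  cluster = rep(1:4, each = 3),
  householdID = rep(1:3, times = 4),
  v023 = rep(c("eastern - rural", "eastern - urban"), each = 6),
  weight = 1892890,
  value = c(1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 0)
)

# Assumption: clusters are the primary sampling units and v023 holds the
# sampling strata; adjust to the design of the survey at hand.
design <- svydesign(ids = ~cluster, strata = ~v023, weights = ~weight,
                    data = toy, nest = TRUE)

# Design-based estimate of the indicator prevalence and its standard error.
est <- svymean(~value, design, na.rm = TRUE)
est
```

Because all toy weights are equal, the weighted prevalence here coincides with the simple mean of value; with real survey weights the two generally differ.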

If the DHS download using the API fails, you may also manually download the file from the DHS website and read it into R. The getDHSdata() function returns a message specifying which file is used (e.g., the Individual Recode file for the ANC visit example).

2 New indicators

Details on how standard DHS indicators are defined can be found in the Guide to DHS Statistics: https://dhsprogram.com/data/Guide-to-DHS-Statistics/ by searching for an indicator. Code for creating most standard DHS indicators can be found on the DHS GitHub site: https://github.com/DHSProgram/DHS-Indicators-R. The indicators are organized by chapters in the Guide to DHS Statistics.

To use surveyPrev to create a new indicator not already built into the package, we need to specify

  1. which DHS data file to download,
  2. a set of rules to create the indicator from the DHS standard recode, or a customized function to process the indicator from the raw DHS survey data.

We will discuss and give examples for both ways to create the new indicator below.

2.1 DHS dataset types

The table below lists the different types of DHS data and their naming conventions in surveyPrev. You can find more details on this website.

Table 2: DHS data types
Name | Recode
MRdata | Men’s Recode
PRdata | Household Member Recode
KRdata | Children’s Recode
BRdata | Births Recode
CRdata | Couples’ Recode
HRdata | Household Recode
IRdata | Individual Recode

2.2 Option 1: Specifying data processing rules

For many standard indicators, the specification can be summarized into the following three steps:

  1. Filter the data for relevant individuals based on certain selection criteria.
  2. Specify a yes response, i.e., outcome value = 1, if certain conditions hold.
  3. Specify a no response, i.e., outcome value = 0, if certain conditions hold.

If an indicator can be specified in this way, with the conditions depending on variables in the DHS data, it can be created using the filter, yesCondition, and noCondition arguments of the getDHSindicator() function.

As an example, we create a dataset below that records neonatal deaths among births in the 10 years prior to the survey. The dataset we need is the births recode. After obtaining the raw data using the getDHSdata() function, we specify the following conditions:

  1. Filter the dataset with v008 - b3 < 120. v008 and b3 are column names in the DHS births recode. v008 is the date of interview, and b3 is the child's date of birth. Both dates are in Century Month Code (CMC) format, so the difference is in months. This filter keeps only births in the 120 months prior to the survey.
  2. Specify a yes response, i.e., a neonatal death, by b7 == 0. b7 is the column giving the age at death in months. An age at death of 0 means the death happened during the first month of life.
  3. Specify a no response, i.e., not a neonatal death, by b7 > 0 | is.na(b7). That is, deaths after the first month and surviving children (for whom b7 is missing) are not counted as neonatal deaths.
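
The three rules can be checked by hand on a few made-up records. The sketch below uses the standard CMC definition, (year - 1900) * 12 + month, and applies the rules in base R. The helper cmc() and the toy values are purely illustrative and are not part of surveyPrev.

```r
# Century Month Code: months elapsed since the start of 1900.
cmc <- function(year, month) (year - 1900) * 12 + month

toy_births <- data.frame(
  v008 = cmc(2018, 9),                       # interview in September 2018
  b3   = c(cmc(2017, 3), cmc(2012, 1), cmc(2005, 6), cmc(2016, 11)),
  b7   = c(NA, 0, 0, 14)                     # age at death in months; NA = alive
)

# Rule 1: keep births in the 120 months before the interview.
recent <- toy_births[toy_births$v008 - toy_births$b3 < 120, ]

# Rules 2 and 3: 1 for a neonatal death, 0 for later deaths or survivors.
recent$value <- NA
recent$value[which(recent$b7 == 0)] <- 1                      # yes condition
recent$value[which(recent$b7 > 0 | is.na(recent$b7))] <- 0    # no condition
recent
```

The birth from June 2005 is dropped by the filter because it falls 159 months before the interview, so its neonatal death does not enter the indicator.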

Putting everything together, the dataset can be created with the following code.

Recode <- "Births Recode"
dhsData <- getDHSdata(country = "Zambia", indicator = NULL, 
                        Recode = Recode, year = "2018")
data <- getDHSindicator(Rdata = dhsData,
                        indicator = NULL, 
                        filter = "v008 - b3 < 120",
                        yesCondition = "b7 == 0", 
                        noCondition = "b7 > 0 | is.na(b7)")

Notice that all the conditions are specified by strings of expressions that R can use to filter the data. For the filter argument, more than one filter can be applied. For example, the following two calls to getDHSindicator() are equivalent and add an additional filter for births by mothers over 30 years old at time of birth.

data_over30_a <- getDHSindicator(Rdata = dhsData,
                        indicator = NULL, 
                        filter = c("v008 - b3 < 120", "b3 - v011 > 30 * 12"),
                        yesCondition = "b7 == 0", 
                        noCondition = "b7 > 0 | is.na(b7)")
data_over30_b <- getDHSindicator(Rdata = dhsData,
                        indicator = NULL, 
                        filter = "v008 - b3 < 120 & b3 - v011 > 30 * 12",
                        yesCondition = "b7 == 0", 
                        noCondition = "b7 > 0 | is.na(b7)")
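
To see why the two forms select the same rows, the sketch below evaluates string conditions against a toy data frame using base R. It only illustrates the idea: apply_filters() is a hypothetical helper, and surveyPrev may implement the filter argument differently.

```r
# Hypothetical helper: apply each filter string in turn to a data frame.
apply_filters <- function(df, filters) {
  for (f in filters) {
    keep <- eval(parse(text = f), envir = df)
    df <- df[which(keep), , drop = FALSE]
  }
  df
}

toy <- data.frame(
  v008 = 1425,                       # interview date (CMC)
  b3   = c(1407, 1345, 1266, 1403),  # child's date of birth (CMC)
  v011 = c(1000, 1100, 900, 1000)    # mother's date of birth (CMC)
)

# Two separate filter strings versus one string joined with &.
a <- apply_filters(toy, c("v008 - b3 < 120", "b3 - v011 > 30 * 12"))
b <- apply_filters(toy, "v008 - b3 < 120 & b3 - v011 > 30 * 12")
all.equal(a, b)
```

Applying the filters one after another keeps a row only if every condition holds, which is exactly what joining the conditions with & expresses.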

2.3 Option 2: Specifying function to process the indicator

When the indicator specification needs more steps than the first option can handle, users can import the full data processing functions by following the steps below.

Let’s take Current use of any modern method of contraception (all women) as an example. For users familiar with the standard indicators defined by the DHS Data Indicator API, the indicator ID is “FP_CUSA_W_MOD”. We will go through the steps to create the customized function below.

Step 1: Search indicator ID or key words in the Guide to DHS Statistics, and then identify which chapter it is in.

For our example, we can search for either “FP_CUSA_W_MOD” or “contraceptive”, and find that the indicator is in Chapter 7: Family Planning.

Figure 1: Screenshot of Step 1(a): Searching for indicator ID or key words.

Figure 2: Screenshot of Step 1(b): Identifying which chapter the indicator is from.

Step 2: We can download IndicatorList.xlsx from the DHS GitHub site, and search for the keyword again in the corresponding chapter to find out

  1. which DHS data recode is used to create this indicator,
  2. which file contains the code to create this indicator in the DHS GitHub repository, and
  3. what the corresponding variable name is in the DHS GitHub repository.

For our example, since we are looking up the indicator of all women currently using any modern method of contraception, we identify the cell “currently use any modern method”, and find out that the code to process this indicator is in the FP_USE.do file and we need IRdata (Individual Recode). We also identify that the variable name used in the R code on the GitHub repository is “fp_cruse_mod”. We will use all three pieces of information in the next step to find the code that processes the indicator.

Figure 3: Screenshot of Step 2: Finding file name and recode name.

Step 3: In the GitHub repository, we find the folder for Chapter 7, open the file FP_USE.R, and search for “fp_cruse_mod” to find the following chunk of code.

Figure 4: Screenshot of Step 3: Finding code.

We extract the following chunk of code for this indicator, which takes the IR data and performs a few steps of data cleaning.

# Currently use modern method
IRdata <- IRdata %>%
    mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
    set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
    set_variable_labels(fp_cruse_mod = "Currently used any modern method")

We can use this code chunk to define a new function to pass to the getDHSindicator() function. The user-defined function should:

  • Take the Recode data as input and return the same data.frame with the indicator added.
  • Rename the indicator variable to "value" at the end.

The example below creates a fp_cruse_mod function for “Current use of any modern method of contraception (all women)”, which can be recognized by the getDHSindicator function later.

fp_cruse_mod <- function(RData) {
    IRdata <- RData %>%
        mutate(fp_cruse_mod = ifelse(v313 == 3, 1, 0)) %>%
        set_value_labels(fp_cruse_mod = c(yes = 1, no = 0)) %>%
        set_variable_labels(fp_cruse_mod = "Currently used any modern method")
    colnames(IRdata)[colnames(IRdata) == "fp_cruse_mod"] <- "value"
    return(IRdata)
}
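
Before passing a custom function to getDHSindicator(), it can help to run it on a few made-up rows and confirm that it keeps the input rows and returns a column named "value". The sketch below does this for a base R variant of the same indicator. The function name fp_cruse_mod_base and the toy data are illustrative assumptions, with v313 == 3 taken to mean a modern method, as in the DHS code above.

```r
# Base R variant of the custom function (no dplyr or labelled needed).
fp_cruse_mod_base <- function(RData) {
  RData$value <- ifelse(RData$v313 == 3, 1, 0)
  RData
}

# Made-up records: v313 is the current method type, 3 = modern method.
toy_ir <- data.frame(caseid = 1:5, v313 = c(0, 3, 1, 3, 2))

out <- fp_cruse_mod_base(toy_ir)

# The function should keep every input row and add a 0/1 column named "value".
nrow(out) == nrow(toy_ir)
"value" %in% colnames(out)
table(out$value)
```

A check like this catches a misnamed output column or accidentally dropped rows before any DHS data are downloaded.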

Finally, after defining the function fp_cruse_mod, we create the indicator in two steps:

  1. Use the getDHSdata() function to download the relevant DHS datasets using the DHS data type identified in Step 2. In this example, we specify Recode = "Individual Recode" and indicator = NULL. The recode must be one of the recode names in Table 2. We can also set Recode = NULL, in which case all available DHS data types will be downloaded.
  2. Pass FUN = fp_cruse_mod in the call to getDHSindicator() to process the indicator with the customized function.

Altogether, the following code creates the dataset of the processed indicator.

year <- 2018
country <- "Zambia"
Recode <- "Individual Recode"
dhsData <- getDHSdata(country = country, indicator = NULL, Recode = Recode, year = year)
data <- getDHSindicator(dhsData, indicator = NULL, FUN = fp_cruse_mod)
head(data)
##   cluster householdID            v022            v023    v024  weight strata
## 1       1           1 eastern - rural eastern - rural eastern 1892890  rural
## 2       1           2 eastern - rural eastern - rural eastern 1892890  rural
## 3       1           3 eastern - rural eastern - rural eastern 1892890  rural
## 4       1           4 eastern - rural eastern - rural eastern 1892890  rural
## 5       1           7 eastern - rural eastern - rural eastern 1892890  rural
## 6       1           9 eastern - rural eastern - rural eastern 1892890  rural
##   value
## 1     0
## 2     0
## 3     0
## 4     1
## 5     1
## 6     1

3 Multiple datasets

Some indicators, such as HIV prevalence, require additional data files. Take HIV prevalence among the general population as an example. This is a built-in indicator with the ID “HA_HIVP_B_HIV”. It needs three Recode files (Individual, Men’s, and HIV Test Results), so the output of getDHSdata(), and hence the input to getDHSindicator(), is a list of three data files.

HIVdhsData <- getDHSdata(country = country, indicator = NULL, Recode = c("Individual Recode",
    "Men's Recode", "HIV Test Results Recode"), year = year)