Measuring gaps in access

Understanding CART

The ESCAP LNOB trees use the Classification and Regression Tree (CART) methodology to determine the best split at each node. For each node, an algorithm evaluates all possible splits to determine which circumstance provides the most information about an indicator. Each node is split in this way until further splits cease to yield additional information or the sample size becomes too small.

Figure 1: A simple example of a Classification and Regression Tree (CART)


To understand this process, imagine you have a basket filled with apples, oranges, and limes, and you want to count how many fruits there are of each type (Figure 1). To do this, you could sort them according to different characteristics, such as shape, size or color. Sorting by shape isn’t very useful, because apples, oranges, and limes are all round. Size is more useful, but it still doesn’t provide enough information to distinguish between apples and oranges. Sorting the fruits by color would be the most useful. So, color would become the first split in the tree.

CART In Practice

Let’s look at how the CART methodology works in practice with LNOB trees (Figure 2). In the example below, our indicator of interest is “Access to Electricity”. The CART algorithm looks for the circumstance that shows the greatest disparity between groups. The algorithm starts by searching for the first split (or “partition”) of the tree, using a splitting criterion. The splitting criterion can be defined in several ways, but the one used here is analysis of variance (“ANOVA”), which favors the split that most reduces the variation of the indicator within the resulting groups.

Let’s assume there are two possible splits for our indicator: a split on “education” (less vs. more), or a split on “residence” (urban vs. rural). The “residence” split may result in groups that are more homogeneous and therefore better satisfy the splitting criterion. In that case, “residence” would become the circumstance leading to the first split in our LNOB tree.
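The comparison between the two candidate splits can be sketched in a few lines of Python. This is a minimal illustration, not the ESCAP implementation: the eight survey records are invented, the function names are hypothetical, and the ANOVA criterion is assumed to be the usual reduction in the sum of squared deviations from the group mean.

```python
# Hypothetical survey records: 1 = has access to electricity, 0 = no access.
records = [
    {"residence": "urban", "education": "more", "electricity": 1},
    {"residence": "urban", "education": "more", "electricity": 1},
    {"residence": "urban", "education": "less", "electricity": 1},
    {"residence": "urban", "education": "less", "electricity": 0},
    {"residence": "rural", "education": "more", "electricity": 0},
    {"residence": "rural", "education": "less", "electricity": 1},
    {"residence": "rural", "education": "less", "electricity": 0},
    {"residence": "rural", "education": "less", "electricity": 0},
]


def sum_of_squares(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def anova_gain(rows, circumstance, indicator="electricity"):
    """Reduction in the sum of squares obtained by splitting on a circumstance."""
    groups = {}
    for row in rows:
        groups.setdefault(row[circumstance], []).append(row[indicator])
    parent = sum_of_squares([row[indicator] for row in rows])
    within = sum(sum_of_squares(group) for group in groups.values())
    return parent - within


# Evaluate every candidate circumstance and keep the one with the largest gain.
gains = {c: anova_gain(records, c) for c in ("education", "residence")}
best_split = max(gains, key=gains.get)
print(gains, best_split)
```

With these invented records, the split on “residence” reduces the sum of squares by 0.5, against roughly 0.13 for “education”, so “residence” would be chosen as the first split.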

Building Trees with CART

CART analysis doesn’t stop at the first split. It analyzes subsequent nodes to see if further splits can be made, using the splitting criterion. In the example below, the “urban residence” node is further split by “education” (more vs. less). We can see that people who reside in urban areas with more education are the group furthest ahead, with 100% access to electricity.

The CART algorithm did not split the node for “rural residence.” The splitting criterion was not met, or the sample size was too small to be split further. This means that people living in rural areas are the group furthest behind, with only 25% access to electricity.
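The way a tree keeps growing until a stopping rule is met can be sketched as a short recursive function. This is an illustrative sketch under stated assumptions: the records are invented, `grow` and `min_leaf` are hypothetical names, each circumstance has only two levels (so grouping by level gives a binary split), and the only stopping rules are "no gain" and "a resulting group would be smaller than `min_leaf`".

```python
# Hypothetical survey records: 1 = has access to electricity, 0 = no access.
records = [
    {"residence": "urban", "education": "more", "electricity": 1},
    {"residence": "urban", "education": "more", "electricity": 1},
    {"residence": "urban", "education": "less", "electricity": 1},
    {"residence": "urban", "education": "less", "electricity": 0},
    {"residence": "rural", "education": "more", "electricity": 0},
    {"residence": "rural", "education": "less", "electricity": 1},
    {"residence": "rural", "education": "less", "electricity": 0},
    {"residence": "rural", "education": "less", "electricity": 0},
]


def sum_sq(values):
    """Sum of squared deviations from the mean of a non-empty list."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_by(rows, circumstance):
    """Group records by the level of one circumstance."""
    groups = {}
    for row in rows:
        groups.setdefault(row[circumstance], []).append(row)
    return groups


def grow(rows, circumstances, indicator, min_leaf, path=()):
    """Return the leaves of the tree as (path, access rate, sample size)."""
    values = [row[indicator] for row in rows]
    best = None
    for c in circumstances:
        groups = split_by(rows, c)
        # Skip splits that do not divide the node or leave a group too small.
        if len(groups) < 2 or min(len(g) for g in groups.values()) < min_leaf:
            continue
        within = sum(sum_sq([r[indicator] for r in g]) for g in groups.values())
        gain = sum_sq(values) - within
        if gain > 0 and (best is None or gain > best[0]):
            best = (gain, c, groups)
    if best is None:
        # No admissible split: this node becomes a leaf (a final group).
        return [(path, sum(values) / len(values), len(values))]
    _, chosen, groups = best
    leaves = []
    for level, group in groups.items():
        leaves += grow(group, circumstances, indicator, min_leaf,
                       path + ((chosen, level),))
    return leaves


leaves = grow(records, ("residence", "education"), "electricity", min_leaf=2)
for path, rate, n in leaves:
    print(path, f"{rate:.0%}", n)
```

With these invented records the urban node is split by education, while the rural node is left whole because one of the resulting groups would contain a single record. The rural leaf ends up with 25% access and the urban, more-educated leaf with 100%, mirroring the simplified example in the text.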

Figure 2: Simplified example of using CART for LNOB

Measuring Inequality

D-Index Explained

The D-Index is a way to measure inequality of access across all groups in a sample. Let’s look at the countries in the example below: the average rate of access to electricity in each country is 65%, but different social groups have different rates of access. D-Index values range from 0 to 1: a higher number indicates more inequality, while a lower number indicates less inequality. The D-Index is comparable to the Gini coefficient, which is a different aggregate measure of economic inequality.

While LNOB trees are useful for comparing rates of access across groups, D-Index provides a single number that can summarize inequality of access for all groups in a sample. D-Index facilitates easy comparisons between samples, such as between provinces, countries, or groups of countries.
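To make the idea of a single summary number concrete, here is a minimal sketch. The text above does not state the formula, so this assumes the D-Index takes the standard dissimilarity-index form: half the population-weighted average absolute gap between each group's access rate and the overall average, divided by that average. The two countries, their group shares and their access rates are invented, and `d_index` is a hypothetical name.

```python
def d_index(groups):
    """Dissimilarity index for a list of (population share, access rate) pairs.

    Assumed form: D = sum(share * |rate - average|) / (2 * average),
    where average is the population-weighted mean access rate.
    """
    average = sum(share * rate for share, rate in groups)
    gap = sum(share * abs(rate - average) for share, rate in groups)
    return gap / (2 * average)


# Two hypothetical countries, both with 65% average access to electricity.
country_a = [(0.5, 0.65), (0.5, 0.65)]  # every group has the same access rate
country_b = [(0.5, 0.90), (0.5, 0.40)]  # one group is far ahead of the other

d_a = d_index(country_a)
d_b = d_index(country_b)
print(round(d_a, 3), round(d_b, 3))
```

Under this assumed formula, the first country scores 0 because every group has the same rate, while the second scores about 0.19 even though the two national averages are identical.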

The D-Index has been adapted so that its value for a barrier (e.g. childhood malnutrition) has the same interpretation as for an opportunity: the lower the D-Index, the lower the inequality. In general, the D-Index measures the distribution of a positive outcome. Malnutrition is not a positive outcome, but rather a barrier to a child’s development prospects. To calculate the D-Index for such a barrier while keeping the same interpretation as for positively defined indicators (opportunities), the absence of the barrier (for example, the absence of stunting) is calculated first. The remaining calculations follow the same formula as for standard positively defined indicators.
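The adjustment for barriers can be sketched as follows. As before, this is only an illustration: the formula is assumed to be the standard dissimilarity index, the stunting figures are invented, and `d_index` is a hypothetical name.

```python
def d_index(groups):
    """Assumed dissimilarity index for (population share, rate) pairs."""
    average = sum(share * rate for share, rate in groups)
    gap = sum(share * abs(rate - average) for share, rate in groups)
    return gap / (2 * average)


# Hypothetical groups: (population share, prevalence of stunting).
stunting = [(0.5, 0.20), (0.5, 0.40)]

# Step 1: turn the barrier into a positive outcome (absence of stunting).
not_stunted = [(share, 1 - rate) for share, rate in stunting]

# Step 2: apply the same formula as for any positively defined indicator.
d_barrier = d_index(not_stunted)

# For comparison only: applying the formula to the barrier itself gives a
# different number, which is why the complement is taken first.
d_unadjusted = d_index(stunting)
print(round(d_barrier, 3), round(d_unadjusted, 3))
```

With these invented figures, the adjusted value is about 0.07, while applying the formula directly to the prevalence of stunting would give about 0.17.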



LNOB Trees can…

  • Identify the groups that are the furthest behind using shared circumstances
  • Reveal which circumstances are associated with the biggest gaps in access to basic opportunities
  • Help policymakers understand who to prioritize for initiatives to reduce inequality

LNOB Trees can’t…

  • Capture population groups that represent less than 5% of the reference population
  • Explain causal relationships between circumstances and outcomes
  • Predict who will be the furthest behind in the future or in a different sample