The interplay between volunteers and firm’s employees in distributed innovation: emergent architectures and stigmergy in open source software

Table 1.

Situation when π = 1 and γ_v and λ_v become irrelevant (graph is a plane), all quadrants

	Average (AVG)			Standard dev. (SD)			SD/AVG
Parameters	Gini	Beta	Perf	Gini	Beta	Perf	Gini	Beta	Perf
γ_e = 0; λ_e = 0	0.173	0.014	662.665	0.005	0.015	22.528	0.031	1.046	0.034
γ_e = 0; λ_e = 4	0.388	0.093	985.428	0.006	0.014	11.099	0.014	0.149	0.011
γ_e = 1; λ_e = 0	0.271	0.018	599.142	0.007	0.019	20.701	0.026	1.046	0.035
γ_e = 1; λ_e = 4	0.760	0.021	541.691	0.010	0.010	25.497	0.013	0.472	0.047
γ_e = 0; λ_e = 20	0.011	0.000	221.471	0.022	0.000	0.847	1.961	2.243	0.004
γ_e = 10; λ_e = 0	0.052	−0.005	209.822	0.046	0.004	9.356	0.877	−0.877	0.045
γ_e = 10; λ_e = 20	0.000	0.000	221.100	0.000	0.000	0.000	.	.	0.000

Notice: γ_e, λ_e and γ_v, λ_v have the same roles in the equations. Thus, we obtain the same surfaces both varying γ_e, λ_e to their full extent with employees only and varying γ_v, λ_v to their full extent with volunteers only. Within this surface, reaching γ_e = 10 or λ_e = 20 produces non-interesting extreme solutions, while maintaining γ_e between 0 and 1 and λ_e between 0 and 4 results in interesting dynamics. On this basis, we defined firms’ strategies, restricting our analysis to that portion of the plane.

Notice that letting the firm choose λ_e and γ_e fully respects the distributed nature of the innovation process of open and collaborative communities (Baldwin and von Hippel, 2011): employees still choose which module they want to contribute to, even if the context of their choices is shaped by the firm. Because of the stochastic element in our model, even when the firm fixes the parameters defining the attractiveness of contributing to each module, two employees of the same firm can still choose to contribute to different modules. Because the development of our model is path dependent, this effect may be magnified as the architecture evolves, leading to very different results. This intervention of the firm on the environment in which employees act is consistent with the literature, ranging from Anderson (1999) to Gulati et al. (2012), that we recalled in the paper and that calls for more indirect managerial control over self-organizing groups of innovators.

We also distinguish between employees and volunteers by setting α, the number of lines of code contributed by the developer, differently for the two groups. David et al. (2003) gathered data on the hours per week developers spend working on their current project. On that basis, we can posit that the number of lines of code produced by a developer approximately follows an exponential distribution. By applying the classical inverse transformation method to the cumulative distribution (e.g., Ross, 2003), we can employ the following exponential random number generator to model the number of lines of code produced by a developer working freely on the code (a volunteer):

$$\alpha = - \delta \ln \left( {1 - p} \right)$$

(1)

where |$p \in \left[ {0;1} \right]$| is uniformly distributed and δ is the mean of the distribution, which for simplicity we set equal to 1. In line with the idea that firms pay for the time spent by each employee on the project and thus exert a certain control over the amount of code produced, we can use the same distribution (they are all developers, in the end) but with zero variance, leading to a non-stochastic α = δ = 1 for employees.
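As a minimal sketch of equation (1), assuming δ = 1 as in the text, the inverse transformation could be coded as follows. The function names `volunteer_alpha` and `employee_alpha` are illustrative and do not come from the original simulation code.

```python
import math
import random


def volunteer_alpha(delta=1.0, rng=random):
    """Draw alpha for a volunteer via equation (1): alpha = -delta * ln(1 - p)."""
    p = rng.random()  # uniform in [0, 1), so 1 - p is in (0, 1] and the log is defined
    return -delta * math.log(1.0 - p)


def employee_alpha(delta=1.0):
    """Employees contribute the mean amount with zero variance (assumed equal to delta)."""
    return delta


# Illustration: the sample mean of many volunteer draws should be close to delta.
rng = random.Random(42)
draws = [volunteer_alpha(1.0, rng) for _ in range(100_000)]
sample_mean = sum(draws) / len(draws)
print(f"sample mean of volunteer alpha: {sample_mean:.3f}")
```

With δ = 1, volunteers contribute on average the same amount as employees, but with the dispersion of an exponential distribution.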

Finally, another lever available to the firm is controlling the access of volunteers to its projects. This control is total as long as the firm decides to keep the project in-house, where volunteers are not allowed. However, when it “goes open,” the firm’s control over the number of volunteers willing to join is limited. It can still control the number of allocated employees, though. In our model, we thus focus on the proportion π of employees and vary it from 0 to 100%, so that we are able to see what happens in all the possible cases. This is also in line with the managerial literature explored above, where Anderson (1999) explicitly says that “Managers can indirectly influence the emergence of adaptive behavior by … changing the demography of an organization[. This] will alter the pattern of behavior that emerges from it…. much more research will be required to help strategists think about how to guide the strategic evolution of an enterprise by making specific types of demographic changes” (Anderson, 1999: 229).

3.4 Outcome variables: speed, performance, and division of labor

We want to describe the evolution of the architecture dynamically, following the pattern of expansion of the code over its development; structurally, investigating how the division of labor allocates developers’ effort to different modules in ways that affect the overall shape of the architecture; and qualitatively, assessing the performance of the development process in terms of code evolution. We are interested in how, for different configurations of the parameters (γ_e; λ_e) and (γ_v; λ_v), these dimensions evolve together as code is being produced. We aim at identifying trade-offs, independencies, and coevolution of the three outcome variables to single out the strategies that may help firms strike a positive balance between them.

First, the dynamics of collaborative innovation can proceed at different speeds (Zhong and Ozdemir, 2010). We thus need an outcome variable measuring the speed of architecture emergence. Empirical observation of the pace at which the number of files grew in the period following the initial release of a software product, summarized in Lehman’s (1980) “Fourth Law” as revised by Turski (1996), held that the pace at which files were added was close to linear but tended to slow down in absolute terms as well as in proportion to the existing code base (Robles et al., 2005a), producing a curvilinear relationship between the cumulated code base and the marginal addition of new code. Among the explanations offered in support of this assertion is the argument that the number of possible interconnections among n files increases approximately as the square of their number. As the software becomes increasingly complex, a rapidly rising amount of effort is dedicated to understanding the previous code and debugging it, thereby slowing the pace at which further files can be added (Feller and Fitzgerald, 2002; Midha et al., 2010).

This means that we can judge architecture emergence in dynamic terms by considering how the number of modules evolves over the development of the architecture. In evaluating the dynamics of architecture emergence, a positive scenario is represented by super-linear growth, where new software modules are created at an increasing speed (e.g., Scacchi, 2006), speeding up architecture growth. Sublinear growth, instead, manifests when new modules are created at a slowing pace, resulting in a less favorable scenario in which the evolution of the architecture tends to slow down. In our model, we can proxy the pace of architecture evolution by using the number of “cycles,” i.e., the iterations our model goes through while running the simulations. A simple way to capture speed is thus to estimate a linear fit for the scatter plot of the average number of modules per cycle so far against the number of cycles and then record its slope. The higher the slope (Beta), the more new modules are created later in the process, increasing the average number of modules per cycle as cycles proceed. A smaller Beta means that new modules are created less frequently as cycles proceed, depressing the average number of modules per cycle.⁴ Thus, the higher the Beta, the better our evaluation of the dynamic evolution of the architecture.
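The slope described above can be sketched with an ordinary least-squares fit. The sketch assumes that the “average number of modules per cycle so far” at cycle t is the number of modules existing after cycle t divided by t; this reading, and the function name `beta_slope`, are assumptions made for illustration.

```python
def beta_slope(module_counts):
    """OLS slope of (modules so far / cycle) against the cycle index.

    module_counts[t - 1] is the number of modules existing after cycle t.
    """
    n = len(module_counts)
    if n < 2:
        raise ValueError("need at least two cycles to fit a slope")
    cycles = list(range(1, n + 1))
    avg_per_cycle = [m / t for m, t in zip(module_counts, cycles)]
    mean_x = sum(cycles) / n
    mean_y = sum(avg_per_cycle) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, avg_per_cycle))
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    return sxy / sxx


# Illustrative trajectories over 200 cycles (synthetic, not simulation output).
linear = [t for t in range(1, 201)]          # one new module per cycle
accelerating = [t * t for t in range(1, 201)]  # modules added ever faster
stalled = [5] * 200                          # no new modules after the start
print(beta_slope(linear), beta_slope(accelerating), beta_slope(stalled))
```

Under this reading, strictly linear growth gives a Beta of zero, accelerating growth a positive Beta, and stalled growth a negative one, which matches the interpretation of higher Beta as faster architecture emergence.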

Second, we want to establish a variable able to parsimoniously capture and judge the structure of the emergent architecture. This is not an easy task, as the structure of the architecture is a very complex object. As a preliminary observation consider that the organization of a distributed innovation project is usually an “onion model” (Crowston and Howison, 2006), where a periphery, usually several orders of magnitude larger than the core (David and Rullani, 2008), contributes to the development of the code providing inputs very different from those assured by the leading project team (Rullani and Haefliger, 2013). The distribution of the effort is indeed very skewed (Krishnamurthy, 2002), with the core members of the projects developing most of the code and usually focusing on the modules that represent the backbone of the project, while peripheral members provide much smaller contributions in terms of code but act on a wider set of modules, guided by their specific needs even when fulfilled by modules very far from the root (Dalle and Jullien, 2003). This is useful in terms of code development, as the periphery has specific functions that the core is unable to perform (Rullani and Haefliger, 2013). For example, “peripheral developers make significant contributions to product quality and diffusion, especially on projects that are in the more mature stages of product development” (Setia et al., 2012: 144). As the organizational structure of the project and the code architecture tend to mirror each other (Colfer and Baldwin, 2016), the organization described above maps into a mixed architecture, with coexisting small and large modules (e.g., see Koch and Schneider, 2002 for the GNOME case). 
We believe that a mixed architecture is desirable precisely because it is a feature expressing the division of labor between the periphery and the core, a division of labor that is fundamental for the OSS model to work (O’Mahony, 2003; Giuri et al., 2010; Rullani and Haefliger, 2013).

This is also in line with another consideration. Baldwin and Clark (2006) consider modules as a developmental option, whose function is to provide the architecture with incipits for future development directions. That option can be taken or not, but the presence of the module itself opens the possibility for the architecture to evolve, and thus, it has value for the developers. In that sense, new modules represent the exploration of new possible areas of development. New modules, which are smaller by definition, may thus represent the exploratory side of the architecture. An example is the driver for an unknown printer developed by a peripheral member of the project, as recalled by Dalle and Jullien (2003). Larger modules, on the contrary, are modules whose code has been cumulated with the aim of creating effective code. They are the exploitation side of the architecture. An example of this is the “vertical bus” modules, which are crucial modules of Linux and are at the core of its architecture, with links to many other peripheral modules (MacCormack et al., 2006). We could not retrieve their size, but we can assume their code needs to be large enough to respond to the many different calls made by other modules. Thus, a good code structure is diversified, because this means relying on large modules, assuring the stability of the main direction of development (exploitation), and at the same time on a number of smaller modules assuring the “option value” of the modular structure (exploration). Indeed, there are some hints that OSS projects like Linux may exhibit such a structure. Ghosh and David (2003), in their study of the development of the Linux kernel, show that the distribution of package sizes is very skewed, with 10% of the packages accounting for more than half of the project’s total code.
This relatively striking feature means that there are a limited number of packages receiving large contributions and a large number of packages with only a limited number of contributions. Moreover, the diversity in terms of package sizes rises over time. In the model, we measure the diversity of the emergent architecture using the Gini coefficient of module sizes, measured in lines of code. According to the previous discussion, we place architectures with higher Gini coefficients (Gini) in a better position in terms of a desirable core–periphery structure.
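A Gini coefficient over module sizes can be computed as in the sketch below. The text does not state which estimator is used, so the standard rank-weighted formula is an assumption, as is the function name `gini`.

```python
def gini(sizes):
    """Gini coefficient of non-negative module sizes (lines of code).

    Uses the rank-weighted form G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with sizes sorted in ascending order and ranks i starting at 1.
    """
    n = len(sizes)
    total = sum(sizes)
    if n == 0 or total == 0:
        return 0.0  # convention chosen here for an empty or all-zero architecture
    ordered = sorted(sizes)
    weighted = sum(rank * x for rank, x in enumerate(ordered, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Illustrative architectures (synthetic sizes, not simulation output).
equal_modules = [10, 10, 10, 10]   # all modules the same size
one_dominant = [0, 0, 0, 100]      # all code concentrated in one module
print(gini(equal_modules), gini(one_dominant))
```

With this estimator, identical module sizes give 0, and full concentration in one of n modules gives (n − 1)/n, so a mixed architecture with a few large and many small modules scores high.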

Finally, we want to capture the performance of the development process in terms of how evolved the code is. The easiest way to keep track of code evolution is to measure its versions. Projects whose development process performs well are able to produce architectures whose modules have higher version numbers, and vice versa. Assuming that the version numbers of modules increase proportionally to the amount of work spent on them, we can derive the version number simply by accounting for each developer’s intervention on that module. We can then combine all modules’ versions into a single index by using the modules’ position in the architecture as weights, as modules closer to the root are more difficult to evolve. Additionally, as modularity has been shown to be crucial for software systems (Baldwin and Clark, 2006; MacCormack et al., 2006), we expect that well-performing code development processes produce architectures that are more modular. To account for that, we add a multiplier proportional to the number of modules the architecture is made of. All this is captured in the following equation:

$$Perf = \left( {1 + M/10} \right) \cdot \sum\limits_{m = 1}^M {\left( {{v_m}/{d_m}} \right)}$$

where M is the number of modules the architecture is made of (scaled by a factor of 1/10), the term v_m captures the version number of module m, and d_m is its distance from the root. Operationally, we consider development processes with higher values of Perf to be qualitatively better performing.
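The Perf index can be sketched directly from its equation. The sketch assumes that every module, including the root, has a distance d_m of at least 1, so that the division is always defined; this convention and the function name `perf` are assumptions, not details given in the text.

```python
def perf(modules):
    """Perf = (1 + M / 10) * sum(v_m / d_m) over all modules.

    modules is a list of (version, distance) pairs; distance is assumed >= 1.
    """
    m_count = len(modules)
    weighted_versions = sum(version / distance for version, distance in modules)
    return (1 + m_count / 10) * weighted_versions


# Illustrative architecture: a root at distance 1 and two modules further away.
example = [(4, 1), (2, 2), (3, 3)]
print(perf(example))  # (1 + 3/10) * (4/1 + 2/2 + 3/3), i.e. about 7.8
```

The multiplier rewards architectures with more modules, while dividing by d_m gives more weight to versions accumulated close to the root.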

4. The model

4.1 Decision rules

We are now ready to see how all these building blocks enter the decision rule that governs which module each developer contributes to.

We start by setting the attractiveness |${r_m}$| of contributing to module m as the combination of three main components (equation 2). The first component is |${x_m}$|, the number of lines of code module m is made of (whose characteristic exponent is γ). The second component is |${d_m}$|, the distance in the code tree of module m from the first “root” module (whose weight in the function is controlled by the characteristic exponent λ). We use the inverse of |${d_m}$| to account for the fact that when exponent λ is higher than 1, contributing to modules far from the root is deemed less attractive. The third component accounts for the impact the developer’s contribution can have on the advancement of the module. We assume that contributing to a certain module will be more attractive if the developer knows that her α will generate a larger impact. This is captured by the term |$\Delta {v_m}$|, which measures the improvement in the module’s version generated by the developer’s contribution α. |$\Delta {v_m}$| is then used to weight the ratio between the two previous components. Notice that we take the version to be a function of |${x_m}$| and |${d_m}$| as well, as it depends on the lines of code the module is already made of and on its distance from the root (which also determines the ease of development) (equation 3).

In equation (2), |$\lambda $| is the characteristic exponent of |${d_m}$|, determining how attractive it is to contribute to modules positioned closer to the root (i.e., with low |${d_m}$|).⁵ When |$\lambda = 0$|, all modules are equally attractive, whatever their height |${d_m}$|, whereas, as |$\lambda $| increases, the attractiveness of contributing to each module changes according to its position. For example, for projects written in C, the most important pieces of code are usually closer to the root and thus contributing to them may be more attractive.

Consider now the term |${x_m}$| in equation (2). Being the number of lines of code, it clearly captures the size of module m. When |$\gamma = 0$|, all modules are equally attractive regardless of their size. When |$\gamma $| is high, contributing to larger modules becomes more attractive. This can be the case with Linux, where the high diversity in module size allows developers to join the many developers working on larger modules and learn from them.

A developer compares the attractiveness of contributing to each module and then chooses how to allocate her effort, and her code α, on this basis. To model this choice, we apply discrete choice theory (Anderson et al., 1992). For every cycle of developer’s choice, the simulation calculates the attractiveness of contributing to all the existing and new (potential) modules. The higher the attractiveness, the higher the probability that the developer will contribute to that module. Rather than comparing the levels of attractiveness directly, we introduce a stochastic element, using the attractiveness of each module to define the probability of “drawing it” from a uniform distribution. To see how this works in practice, consider a simplified scenario with only two modules, where module A has an attractiveness of 0.2, while module B has an attractiveness of 0.6. Imagine then placing the attractiveness of the two modules along the segment [0;1]. Module A would correspond to the portion of the segment spanning [0;0.2], while B would correspond to [0.2 + ε;0.8]. A random draw from a uniform distribution over [0;1] could thus fall in the part of the segment corresponding to module A (i.e., [0;0.2]), in the part corresponding to module B (i.e., [0.2 + ε;0.8]), or above 0.8, in which case the developer discards all the existing modules and creates a new one. New modules, here called potential modules, are modeled as “spin-offs” of the existing modules and technically depend on them. This creates a representation of the code that has the operational advantage of clearly distinguishing modules that are considered core and closer to the original module (root) from those that are more ancillary and more distant from it.
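The two-module example can be reproduced with a short sketch. It assumes that the attractiveness values are already expressed as shares of the segment [0;1] and that any draw beyond the last existing module means creating a new one; the function name `choose_module` is illustrative.

```python
import random


def choose_module(attractiveness, u):
    """Map a uniform draw u in [0, 1) to a module name.

    attractiveness maps module names to their share of the segment [0, 1].
    Returns None when u falls beyond all existing modules, i.e., a new
    (potential) module is created.
    """
    upper = 0.0
    for name, share in attractiveness.items():
        upper += share
        if u < upper:
            return name
    return None


shares = {"A": 0.2, "B": 0.6}
print(choose_module(shares, 0.1))  # falls in [0; 0.2]
print(choose_module(shares, 0.5))  # falls in the portion assigned to B
print(choose_module(shares, 0.9))  # above 0.8: a new module is created

# Over many draws, the choice frequencies approach the shares 0.2, 0.6, and 0.2.
rng = random.Random(0)
counts = {"A": 0, "B": 0, None: 0}
for _ in range(100_000):
    counts[choose_module(shares, rng.random())] += 1
print({k: v / 100_000 for k, v in counts.items()})
```

The draw is stochastic, so a more attractive module is chosen more often but not always, which is what allows two developers facing the same parameters to contribute to different modules.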

We thus can express the probability that the developer chooses module i over other existing and potential modules as a function⁶:

$$P\left( i \right) = f\left( {\frac{{{\rho _i}\left( \alpha \right)}}{{\sum\nolimits_m {{\rho _m}\left( \alpha \right)} + \sum\nolimits_{m^{\prime}} {{\rho _{m^{\prime}}}\left( \alpha \right)} }}} \right)$$

(4)

where |${\rho _i}\left( \alpha \right)$| represents the probability of editing module i, while |${\rho _m}\left( \alpha \right)$| and |${\rho _{m^{\prime}}}\left( \alpha \right)$| are the probabilities of editing an existing module m and a potential module m′, respectively. As is easy to see, the former is positively related to the probability of choosing i, while an increase in the sum of the probabilities of editing existing and potential modules decreases the probability of choosing i (Anderson et al., 1992).

4.2 Simulation dynamics

The simulations we run on the building blocks explained above are such that at each discrete time step, a new contribution is added to the existing system, i.e., either an existing module is improved or a new one is created. The specific steps followed in each cycle are these (see Figure 1):

Figure 1.

The flow of the simulation

A type of developer (employee or volunteer) is chosen randomly on the basis of the proportion of employees π set by the firm.
For each and every module, including potential ones, the developer calculates how attractive contributing to it is to her, using equations (2) and (3). This computation is based on the module’s position with respect to the root (weighted by the developer’s λ) and the module’s size in terms of lines of code (weighted by her γ). If the developer is an employee, her λ and γ are set by the firm as [γ_e; λ_e]. If she is a volunteer, her parameters [γ_v; λ_v] are explored over a larger set of possible combinations.
Once a choice is made, the developer contributes α lines of code to the chosen module, with α set by the firm or randomly determined by equation (1), in this way also affecting the relative size and position of all the other modules (i.e., changing the architecture).
The values of the system are modified accordingly, and the cycle is then repeated.

We iterate this cycle 200 times.

Notice that the stigmergic dynamics captured by the cycles in our model are meant to closely map the description that Anderson (1999) offers of the best management attitude toward evolving systems such as distributed innovation projects: “When agents are added to, deleted from, or recombined within a network, a coevolutionary cascade results; in dynamic equilibrium, some of these cascades will result in large-scale adaptation, allowing a continuous series of small changes to generate evolution in a punctuated equilibrium …” (Anderson, 1999: 229).

Operationally, we first fix the firm’s strategy at the beginning by setting π, i.e., the proportion of employees, and the pair (λ_e; γ_e), and then investigate the whole plane of volunteers’ parameters (λ_v, γ_v), simulating for each pair of parameters the formation of 10 code architectures. For each code structure, we compute the three outcome variables Gini, Beta, and Perf; average their values over the 10 simulations (also keeping track of their standard deviations); and attach the resulting array to the specific pair (λ_v, γ_v) that produced them. The results are then captured by producing the figures and tables shown in the next section of the paper.
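The averaging procedure just described can be sketched as a loop over the volunteers’ parameter plane. The `simulate` argument stands for one run of the model and is hypothetical; `toy_simulate` below is only a deterministic placeholder used to make the sketch runnable and does not reproduce the model or its results. The use of the sample standard deviation is also an assumption.

```python
import statistics


def sweep(simulate, lambdas, gammas, runs=10):
    """Average Gini, Beta, and Perf over `runs` simulations per (lambda_v, gamma_v).

    simulate(lambda_v, gamma_v, seed) must return a (gini, beta, perf) tuple.
    Returns {(lambda_v, gamma_v): {"Gini": (mean, sd), "Beta": ..., "Perf": ...}}.
    """
    results = {}
    for lam in lambdas:
        for gam in gammas:
            outcomes = [simulate(lam, gam, seed) for seed in range(runs)]
            summary = {}
            for idx, name in enumerate(("Gini", "Beta", "Perf")):
                values = [outcome[idx] for outcome in outcomes]
                summary[name] = (statistics.mean(values), statistics.stdev(values))
            results[(lam, gam)] = summary
    return results


def toy_simulate(lam, gam, seed):
    """Placeholder standing in for one model run; NOT the paper's simulation."""
    return (0.1 * seed, float(lam), float(gam))


grid = sweep(toy_simulate, lambdas=[0, 4], gammas=[0, 1])
for pair, summary in grid.items():
    print(pair, summary)
```

Keeping the standard deviation next to each mean makes it possible to report the SD/AVG ratio for every parameter pair, as done in Table 1.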

4.3 Results

We start from the results obtained when there are no volunteers involved, only employees. In this case, volunteers’ parameters γ_v and λ_v have no influence, and the resulting surface is a plane. Table 1 gives a sense of this case.

The table shows that the indicators for the three outcome variables cannot be maximized at the same time. Indeed, there is a clear trade-off between Gini on one side and Beta and Perf on the other. All three outcomes are maximized with a certain precision (small ratio between standard deviation and mean) along the line λ_e = 4, but Beta and Perf require γ_e = 0, reaching their peaks at 0.093 and 985, respectively, while Gini requires γ_e = 1 to reach 0.76. This means that firms should push employees to work on modules closer to the root in all cases, but whether employees should also be pushed toward large modules depends on which dimension the firm wants to maximize among development speed, performance, and core–periphery division of labor: it cannot maximize all outcomes at once.

How can managers overcome these trade-offs? Allowing volunteers into the project seems to be a viable solution to be tested. When moving to the other boundary condition and exploring what happens when only volunteers are admitted into the project, the result resembles the three graphs in Figure 2.

Figure 2.

Situation when π = 0 and γ_e and λ_e become irrelevant (same graph for all scenarios)

As there are no employees, γ_e and λ_e become irrelevant, and the graphs are the same for any combination of the two.⁷ Figure 2 shows that there is a smaller triangle, namely (γ_v; λ_v) = (2;0)–(0;12)–(2;12), in which all indexes peak, i.e., the trade-off is less stringent. In this space, if volunteers have γ_v = 0 and λ_v varies only between 4 and 8, then the three indexes reach values that are very close to, or even higher than, those seen for the case where there are no volunteers. For λ_v = 4, Beta is the highest (0.098), above the best in Table 1 (0.093), Perf remains very high (943), only 4% lower than that in Table 1 (985), and Gini is 0.54, almost 30% higher than what we can get with the same combination of γ_e and λ_e when no volunteers are allowed (0.388). Moreover, in case Gini is what matters for the firm, λ_v = 8 assures a value (0.70) that is just 8% lower than its maximum in Table 1 (0.76), while keeping Beta (0.081) and Perf (706) relatively high. The best compromise is λ_v = 6, where the three indexes have values of 0.61, 0.089, and 852, respectively. This is not the only choice for firms. Another interesting combination is λ_v = 2 and γ_v = 1. In this case, Beta is penalized (0.050), but Gini and Perf remain high (0.60 and 829, respectively). In conclusion, having only volunteers can solve the trade-off observed with only employees and lead to better results, but it requires the ability to select volunteers so that their λ_v and γ_v fall within a precise, small range, something that is very difficult for firms to achieve.

Having no volunteers or only volunteers are clearly boundary conditions. Mixtures of volunteers and employees can indeed produce even better results or results that may be easier to develop and maintain. To explore these situations, we structured our simulations in four scenarios, each defined on the basis of the incentive schemes applied by the firm among the combinations [λ_e; γ_e] = ([0;0], [0;1], [4;0], [4;1]). The four scenarios thus represent all the firms’ strategies we consider in the model. For each scenario, we vary stepwise the proportion of employees π from 1 to 0 and see for each of these levels how different combinations of volunteers’ λ_v and γ_v impact the emerging code architecture in terms of the three outcome variables defined above: Gini, Beta, and Perf. We order the description of the scenarios from the most interesting to the least interesting.

The most interesting scenario is scenario (A) (γ_e = 0 and λ_e = 4). Provided that volunteers attracted to the project strongly prefer contributing to modules close to the root, or compensate for the lack of such interest with at least a mild preference for large modules, having 20% of the project team made of volunteers assures that Gini rises to an average of 0.49 and that Perf reaches an average of 1018, while Beta decreases only slightly to an average of 0.075. However, as Beta always has a large standard deviation, the firm may also end up in a positive tail of the distribution and reach values as high as 0.10. As the proportion of volunteers grows to 40%, average Gini increases to 0.61, while the averages of the two other indexes decrease only slightly. Beta has a mean of 0.055, and the firm can still hope to reach 0.081 when lucky enough to end up in a tail of the distribution. Perf is still very high: the mean is 953. As the proportion of volunteers increases further, the trade-off manifests again, and while average Gini grows high (up to 0.80, then dropping drastically with only volunteers, see Figure 3), the averages of both Beta and Perf diminish. Thus, this is certainly a very interesting scenario when volunteers are few, especially if Perf is deemed important.

Figure 3.

Passage for Gini from π = 0.2 to π = 0 when γ_e = 0 and λ_e = 4

These effects can be seen, however, only if the firm is able to select volunteers who are inclined toward at least one type of module (either large or close to the root, or both). In the model, this holds when the linear combination of their parameters is above the segment [γ_v; λ_v] = [(0;∼8)–(∼3;0)]. Below that segment, however, the situation is less positive. For 20% of volunteers, while Beta reproduces the same values as in the rest of the plane, Perf shows the valley represented in Figure 4, and Gini moves downward toward the origin. In this case, the combination of λ_v and γ_v must be such that the system remains as close to the edge of the segment as possible. Increasing the percentage of volunteers makes the trade-off between Perf and Beta, on one side, and Gini, on the other, emerge again. Indeed, while Gini still moves downward toward the origin, both Perf and Beta radically change their shape and now increase the closer they get to the origin. The trade-off is mildest when 60% of the team is made of volunteers, γ_v is 0, and λ_v is 6. In this case, Perf is 929, Beta even reaches a large (but very unstable) 0.10, and Gini is 0.52. This combination is not very far from what we can observe with the same parameters when volunteers are 80% and 100%. However, this is just one specific combination, very difficult to strike with such precision.

Figure 4.

Perf for γ_e = 0; λ_e = 4; π = 0.8

Overall, firms thus need to be selective in the volunteers they attract. If the number of volunteers does not exceed that of employees, volunteers should be inclined toward either large modules or modules close to the root (especially the latter), or both. If volunteers outnumber employees, the firm must radically change its approach and attract volunteers not interested in module size and only moderately interested in modules close to the root.

In scenario (B) (γ_e = 1 and λ_e = 0), high-enough values of λ_v (equal to or larger than 4) assure high values of Perf for all proportions of volunteers from 20% to 60% (from 842 to 924, on average). In the same area, average Gini also performs well, steadily growing from 0.40 when volunteers are 20% to almost 0.70 when they are 60%. In this area of the plane, Beta is still very small (from 0.005 to 0.015 on average) in almost all combinations of our parameters. Closer to the origin, however, one can strike better balances between the indexes if volunteers are the majority. In this case, λ_v equal to 4 or 6 and γ_v equal to 0 assure that Beta remains high, between 0.053 and 0.097, while Gini remains between 0.46 and 0.57, and Perf between 890 and 934. Overall, there are thus only two combinations in the whole scenario that ease the trade-off. This scenario is thus clearly worse than scenario (A).

In scenario (C) (γ_e = 0; λ_e = 0), when λ_v is high (larger than 10) or compensated by γ_v (3 or more), Perf and Gini can reach interesting values: Perf has its highest average peak (1060) when 40% volunteers are introduced, and these levels can be maintained while Gini can still be around 0.50. Development speed is, however, a serious problem: average Beta starts small (with an average of 0.014) and worsens further with more volunteers. Many combinations of λ_v and γ_v even result in negative values of Beta. Development speed is also problematic close to the origin: as long as there is at least one employee in the project, only one combination of parameters produces a Beta larger than 0.070, and only two produce a Beta above 0.056. Fortunately, the combination that scores the best Beta also eases the trade-off between the indexes: with 80% of volunteers, λ_v = 6 and γ_v = 0, Beta, Perf, and Gini become equal to 0.086, 949, and 0.55, respectively. However, no other combination in this area has comparable results. In the whole scenario, thus, there is only one combination that performs well. Overall, this scenario seems worse than the previous two.

In scenario (D) (γ_e = 1; λ_e = 4), attracting a share of volunteers that ranges from 40% to 80%, with λ_v = 4 or 6 and γ_v = 0, or with λ_v = 2 and γ_v = 1, yields Beta values that are most of the time between 0.060 and 0.080. This region still assigns high values to Perf, which peaks at the origin but remains high here as well (750–960). Gini always remains relatively high (between 0.57 and 0.71), especially when volunteers are few, and drops when volunteers become the majority. Gini is shaped the opposite of Perf and Beta, dropping toward the origin, as shown in Figure 5. It is an area where striking the balance between the different outcome indexes is hard, as the firm has to select quite sharply the volunteers with the best combination of γ_v and λ_v. However, in this scenario, it is the only strategy available to the firm, as the plane resulting from high λ_v and γ_v, which could give better results in scenario (A), seriously underperforms here, with low and decreasing Perf and Beta throughout all the combinations of volunteers and employees (see Figure 5). In general, thus, this scenario can be an alternative to scenarios (C) and (B) but is dominated by scenario (A).

Figure 5.

The three indexes when γ_e = 1; λ_e = 4 and π = 0.4

In conclusion, the four scenarios depict a situation in which a firm developing OSS only with employed developers faces a trade-off between a productive and fast project development and an ideal periphery-core structure or, in terms of our measures, between Perf and Beta on one side and Gini on the other. Giving employees the sole indication to develop modules close to the root, an indication as strong as γ_e = 0; λ_e = 4, allows high levels of Perf and Beta but depresses Gini. The opposite holds if the firm also requires the chosen modules to be large (γ_e = 1). The intuition behind these results is that in our model the firm proves unable to generate enough diversity among developers to make them attend to different tasks, so that they can contribute more widely to each outcome. Management can escape this situation by opening the project to volunteers. This strategy can go as far as having no employees, and this already softens the trade-off. In this case, however, the firm must attract volunteers with specific preferences toward module size and position in the structure, meaning precise (and small) values of λ_v and γ_v. When only one type of developer is allowed, the best results correspond to a very small region of the plane, asking the firm to exert more control than it is likely to have when operating in a distributed innovation context. Allowing for a mixture of employed developers and volunteers creates more room for maneuvering, making it possible to reach competitive results without such precise indications on the type of volunteers needed. The best strategy here is a mix of prescriptions, as in scenario (A): pushing employees toward modules close to the root, attracting volunteers but not as many as employees, and trying to select volunteers with a high interest in working on the modules close to the root (high λ_v) or, failing that, compensating with an interest also in large modules (large γ_v).
Choosing other scenarios or attracting many volunteers is dominated by the previous strategy and can only lead to worse results in one or more outcome indexes and/or to the need to be even more accurate in the selection of volunteers, who must have a very precise profile (meaning precise values for λ_v and γ_v). The intuition behind this result is that a balanced mixture of volunteers and employees allows for more diversity, “spreading” developers’ effort toward modules that contribute to all the outcomes at the same time. It also allows a mixture of volunteer-led exploration and autonomous growth, and firm-led directed growth, merging the benefits of distributed open innovation offered by the former with the coordination toward aggregate outcomes offered by the latter. Table 2 summarizes the possible strategies firms could follow on the basis of their objectives.

Table 2.

Main results for each scenario

γ_e	λ_e	π (%)	γ_v	λ_v	Gini	Beta	Perf	Trade-off	Region of parameters
0	4	100	–	–	0.388	0.093	985.428	strong	vast
1	4	100	–	–	0.760	0.021	541.691	strong	vast
–	–	0	0	4	0.54	0.098	943	medium	small
–	–	0	0	6	0.60	0.050	829	eased	small
–	–	0	0	8	0.70	0.081	706	medium	small
0	4	80	(γ_v;λ_v) > [(0;∼8);(∼3;0)]		0.49	0.075	1018	medium	vast
0	4	60	(γ_v;λ_v) > [(0;∼8);(∼3;0)]		0.61	0.055	953	eased	vast
0	4	60↓	(γ_v;λ_v) > [(0;∼8);(∼3;0)]		↑	↓	↓	strong	vast
0	4	40–20	0	6	0.52	0.10	929	strong	small
1	0	40–80	–	>4	0.40–0.70	0.005–0.015	842–924	strong	vast
1	0	20–40	0	4–6	0.46–0.57	0.053–0.097	890–934	eased	small
0	0	40	(γ_v;λ_v) > [(0;∼10);(∼3;0)]		0.50	0.014	1060	strong	vast
0	0	20	0	6	0.55	0.086	949	eased	small
1	4	20–60	0	4–6	0.57–0.71	0.060–0.080	750–960	eased	small
1	4	20–60	1	2	0.57–0.71	0.060–0.080	750–960	eased	small
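To use Table 2 as a menu of strategies, one could transcribe its rows and filter them by the qualitative columns. The sketch below is only an illustration: the tuple layout, the `select` helper, and the textual labels for the volunteer profiles are our own naming, and π is read as the percentage of employees in the team (consistent with π = 1 denoting the employees-only case).

```python
# Rows transcribed from Table 2; ranges and arrows are kept as plain strings.
# Fields: gamma_e, lambda_e, pi (% of employees, our reading), volunteer profile,
#         Gini, Beta, Perf, trade-off, region of parameters.
ABOVE_8 = "above segment (0;~8)-(~3;0)"
ABOVE_10 = "above segment (0;~10)-(~3;0)"

TABLE_2 = [
    ("0", "4", "100", "no volunteers", "0.388", "0.093", "985.428", "strong", "vast"),
    ("1", "4", "100", "no volunteers", "0.760", "0.021", "541.691", "strong", "vast"),
    ("-", "-", "0", "gamma_v=0, lambda_v=4", "0.54", "0.098", "943", "medium", "small"),
    ("-", "-", "0", "gamma_v=0, lambda_v=6", "0.60", "0.050", "829", "eased", "small"),
    ("-", "-", "0", "gamma_v=0, lambda_v=8", "0.70", "0.081", "706", "medium", "small"),
    ("0", "4", "80", ABOVE_8, "0.49", "0.075", "1018", "medium", "vast"),
    ("0", "4", "60", ABOVE_8, "0.61", "0.055", "953", "eased", "vast"),
    ("0", "4", "60↓", ABOVE_8, "↑", "↓", "↓", "strong", "vast"),
    ("0", "4", "40–20", "gamma_v=0, lambda_v=6", "0.52", "0.10", "929", "strong", "small"),
    ("1", "0", "40–80", "lambda_v>4", "0.40–0.70", "0.005–0.015", "842–924", "strong", "vast"),
    ("1", "0", "20–40", "gamma_v=0, lambda_v=4–6", "0.46–0.57", "0.053–0.097", "890–934", "eased", "small"),
    ("0", "0", "40", ABOVE_10, "0.50", "0.014", "1060", "strong", "vast"),
    ("0", "0", "20", "gamma_v=0, lambda_v=6", "0.55", "0.086", "949", "eased", "small"),
    ("1", "4", "20–60", "gamma_v=0, lambda_v=4–6", "0.57–0.71", "0.060–0.080", "750–960", "eased", "small"),
    ("1", "4", "20–60", "gamma_v=1, lambda_v=2", "0.57–0.71", "0.060–0.080", "750–960", "eased", "small"),
]


def select(rows, trade_off=None, region=None):
    """Keep the rows whose trade-off and region labels match the given filters."""
    return [
        row for row in rows
        if (trade_off is None or row[7] == trade_off)
        and (region is None or row[8] == region)
    ]


# Strategies that ease the trade-off without requiring a narrowly defined volunteer profile.
for row in select(TABLE_2, trade_off="eased", region="vast"):
    print(row)
```

Filtering on an eased trade-off and a vast region of parameters leaves a single row, the mixed team of scenario (A) with employees in the majority, which matches the strategy recommended in the text.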

5. Discussion of the results

Our formal model captures how different code architectures emerge from distributed innovation projects coordinated via stigmergy. In particular, we explore how code evolves when firms run a closed project, and when they decide to open it to collaborative innovation communities, or more bluntly “go open source” (MacCormack et al., 2006). In all cases, they need to choose carefully among the different strategies they can use to shape the collaboration between their employees and the volunteers that may collaborate with them.

Thanks to our model, we are able to identify the trade-offs managers face between maximizing development speed, improving the performance of the development process, and moving toward a desirable team organizational structure, and to show how these trade-offs can be alleviated.

We first find that firms may not need “to go open source” to ensure high speed or performance of the development process. What really matters is allowing a mixture of self-organization and direction, giving employees autonomy to choose the task to perform in terms of module size while mildly directing them toward modules close to the root. The downside of this case is that, as one may expect, no division of labor between a core and a periphery is created.

As the literature has shown, this division of labor is crucial for distributed innovation processes such as those leveraged by open and collaborative innovation communities: while the core is dedicated, and able, to perform certain tasks, such as code development of main modules, the periphery is able to bring into the picture much more diversity and much more exploration (Rullani and Haefliger, 2013). Indeed, the same setting generating fast growth and quality leads also to quite homogenous architectures in terms of module size, something that is likely reflected in no or low stratification of developers into a core and a periphery. The lack of division of labor implies losing part of the advantages distributed innovation generates.

Firms can thus consider opening the project to volunteers’ participation precisely to increase diversity in architecture, with the corresponding construction of a core–periphery structure. Should the firm decide to launch the project without employees and rely only on volunteers, the previous trade-off would indeed be solved, favoring the construction of a core of larger modules and the formation of a wider periphery of smaller modules while keeping the speed and performance of the development process high. However, this would be possible only under certain conditions: only attracting a very specific typology of volunteers, those indifferent to module size and having a mild preference for modules close to the root, leads to such a positive result. If it fails to attract almost exclusively such developers, the firm will be unable to strike the balance it is looking for.

If, instead of relying on only one type of developer, the firm decides to mix employees and volunteers, it may capture the best of both worlds. In that case, the best results are obtained when employees, who should remain the majority, are pushed toward modules close to the root, while volunteers just need to be attracted to root-related modules and/or large modules. The selection of volunteers is thus much softer than in the volunteers-only case, and can be promoted via a set of tools that are easy to implement (e.g., by giving more visibility on the development platform to large and root-related modules). The presence of volunteers generates the highest diversity in the distribution of module size, signaling that a core can now be clearly identified and detached from a wide periphery contributing to smaller modules, while keeping the pace of development and the performance of the code development process high. The mix of conditions determined above is very important, as the firm will otherwise be unable to reach the optimal mixture of speed, performance, and diversity.

These results have important implications not only for theory but also for managers.

As for theory: first, we contribute to the distributed innovation literature by showing that self-organization is a crucial organizational methodology that firms can implement in distributed innovation projects run internally, without opening to collaborative innovation communities, but that it needs to be coupled with some direction from the firm itself. While some degree of self-organization is expected from any innovation process aiming at having a distributed nature, even when internal to firms, our results give more nuance to this perception by pointing to the fact that in such projects the control of firms should be reduced in certain dimensions (e.g., avoiding setting guidelines for employees relative to module size) but exerted in others (e.g., setting guidelines to direct employees toward root-related modules). As information is “sticky” (Von Hippel, 1994), distributed innovators have the best knowledge about what they know, the problems they can solve, and how to reach a solution with their own tools (Lakhani and Von Hippel, 2003). A centralized authority will inevitably lack this knowledge, and may easily take sub-optimal decisions on how to allocate tasks. When tasks are self-selected, however, certain tasks that are necessary but less in line with individuals’ motivations risk receiving less attention than they deserve. Firms worried about the performance and speed of the development process thus need to support such work by directing their employees to such tasks. Leaving employees free to choose their tasks while at the same time providing some direction about relevant issues that may be overlooked results in high development speed and performance.

The problem that emerges in this case is that running the project internally, involving employees only, implies less heterogeneity in the project team, and generates a homogenous architecture that is at odds with an extensive division of innovative labor (Arora and Gambardella, 1994). However, the firm can regain some room for maneuvering by letting volunteers join the project. With the injection of volunteers, more options are possible, and the firm can push heterogeneity in the architecture mirroring the core–periphery structure of the project team (Colfer and Baldwin, 2016). Paradoxically, letting independent and uncontrollable developers into the team allows the firm to gain more strategic options in its effort to direct the project development toward a desirable outcome.

Second, we contribute to the broader literature on the organizational role of artifacts (e.g., Orlikowski, 2007). The concept of “boundary object” (Carlile, 2002) has been applied to describe how artifacts act as coordination devices mediating (Bechky, 2003) (or failing to mediate) the relationship between different communities (D’Adderio, 2003), or as devices carrying organizational memory (Pondy and Mitroff, 1979; Cacciatori, 2008). In this stream of literature, objects emerge as “tools” that exert influence on how human actors frame their problems, share knowledge, and coordinate tasks (Becker et al., 2021), to the point of representing artifacts as the pivots of actual tacit coordination mechanisms (Srikanth and Puranam, 2011; Srikanth and Puranam, 2014). While this literature is rich and broad, it is very often disconnected from the processes that generate the artifacts themselves (Star and Griesemer, 1989; Boland and Tenkasi, 1995; Carlile, 2002; Leonardi, 2011; Turner and Rindova, 2012). The stigmergic approach we apply allows us to simultaneously take into consideration artifacts’ characteristics and the processes through which those objects are developed and improved. Stigmergy is focused on the whole set of relationships between the artifact functionalities, the organizational design, and the norm of use (Majchrzak, 2009: 19). Thus, coordination through stigmergy is not embedded only in the artifact architecture, but also emerges as a result of the collaborative production process. This multifaceted analytical lens allows us to build a model of distributed innovation in open and collaborative innovation communities that considers the object’s characteristics, the interactions each contributor has with those characteristics, and how they play together to both generate the artifact and coordinate contributors’ work at the same time.

Following this line of thought, we contribute to this literature in another way as well.

In open source, Baldwin and Clark (2006) and MacCormack et al. (2006) argue that code is the product as well as the means through which incentives are aligned and coordination is realized. Code exhibiting high modularity reduces free-riding, enhances participation (Baldwin and Clark, 2006), and creates room for the collaboration of many independent individuals at the same time. It reduces the costs of undertaking one specific task and those associated with the effect of the local changes on other parts of the code (MacCormack et al., 2006). Ghosh and David (2008) provide empirical evidence that supports a consistent view. Using some Linux kernel versions, the authors study developers’ relations and module dependencies and find a significant level of correspondence between the two. Cataldo et al.’s (2009) Socio-Technical Congruence captures this consistency between social networks of developers and dependencies of the modules of the artifact they develop.

This is precisely the argument we capture by placing the artifact (the code) at the center of the tacit coordination mechanism (Srikanth and Puranam, 2011, 2014; Becker et al., 2021) at work during the open source development process. The artifact coevolves together with the work of its distributed producers, and its structure (architecture) is what agents both affect and use as a coordination device. Indeed, in our stigmergic process, coordination is not an exogenous scheme of behavior (possibly embedded into an artifact) imposed on the agents. Rather, it is the endogenous result of the process of joint development. We thus put forward stigmergy as an endogenous tacit coordination mechanism that may be very relevant for contexts such as distributed innovation, where artifacts are at the same time coordination devices and the results of the joint production process.

Third, we showed that there is a clear trade-off between certain properties of the architectures emerging from distributed innovation processes. Fast growth and high performance of the development process are not easily coupled with high diversity in module size, something we related to the core–periphery structure that has been proved to be a key feature of distributed innovation (Dahlander and Frederiksen, 2012; Rullani and Haefliger, 2013). This result is interesting per se. Usually, division of innovative labor and core–periphery structures have been thought of as the causes of high speed and performance of code development. We find that this is true—or at least compatible with our results—only when a certain proportion of volunteers—and volunteers of a certain kind—participate in the open and collaborative innovation community. In all the other cases, a marked division of innovative labor is very difficult to achieve while keeping the speed and performance of the development process high. In this sense, division of innovative labor between a core and a periphery seems much less fundamental than expected (Dahlander and Frederiksen, 2012; Rullani and Haefliger, 2013) to assure speed and quality, which seem instead to be related much more to the self-organizing nature of the distributed innovation phenomenon.

As for managerial implications, our findings show managers which levers they can use to reach a series of goals when dealing internally with distributed innovation processes and when considering opening to a broader collaborative innovation community of volunteers, whose selection can be controlled only indirectly (e.g., via interface design). We showed that firms can build strategies by adjusting the incentives that push their employees toward developing certain parts of the architecture rather than others, by controlling the proportion of employees to volunteers, and possibly by attracting volunteers with certain contribution preferences rather than others.

None of these strategies is easy to implement, as they are all based on an indirect influence on the project development. However, there are tools to realize such influence. The literature on online communities has shown that the way the digital interaction environment is shaped has an enormous influence on the capability to mobilize community members in one direction or another. For example, Foss et al. (2021) demonstrate that an active community environment (lively forum discussions, provision of bug reports and patches, and so on) activates community members. A firm could allocate part of its employees’ time to openly discussing issues and set up the collaboration space in such a way that these discussions are easily available to anyone. Moreover, Rullani and Haefliger (2013) discuss how such provision of artifacts from the key developers of the projects can activate a larger periphery of members. Finally, both Foss et al. (2016) and Foss et al. (2021) show that the way the interaction space is designed (whether mailing lists are grouped or not, how discussions are threaded in forums, whether the main interaction channel is code or forum messages, and so on) greatly influences what kind of contribution is triggered. For example, Foss et al. (2016) show that favoring interaction via messages fosters developers’ creation of new projects, which in our context may be paralleled by the creation of new modules to tackle new issues, while interaction via pieces of code pushes exploitation, i.e., developers’ adherence to existing projects/modules. Thus, firms can design the interaction space to activate developers in the periphery of their projects (thus changing the balance between employees and volunteers) and to favor contributions to certain kinds of modules rather than others (thus promoting self-selection of contributors with specific preferences). These tools must enter the toolbox of any manager designing strategies for distributed innovation.

Notice that this argument is in line with, and indeed gives more substance to, the call we initially mentioned (from Anderson (1999) to Grant (2008)) for studies on how managers can indirectly affect the innovation spaces where their internal resources are engaged in a distributed innovation process with external noncontrollable resources. In this respect, we also offer managers a map to navigate different strategies in line with the firm’s main objectives. Indeed, a firm may lean toward the maximization of one or two of the indexes we identified or have constraints (e.g., a limit in the capability to attract volunteers, Giordani et al., 2018) and aims (e.g., diffusing the product among users by attracting many volunteers) that are outside the scope of the model. Table 2 presents some estimates of the relative gain and loss for each strategy the firm can apply, thus offering a menu managers can choose from to guide the firm toward the desired outcome.

6. Conclusion, current limitations, and possible future lines of research

In this paper, we have investigated how firms can improve the pace and performance of project development and the division of innovative labor (Arora and Gambardella, 1994) between the core and periphery of the project team (Dahlander and Frederiksen, 2012; Rullani and Haefliger, 2013) when engaged in distributed innovation projects related to open and collaborative innovation communities (Baldwin and von Hippel, 2011). We claim firms need to give up most of the control they can have over this process but should still push it toward better combinations of speed, performance, and diversity using indirect levers: influencing rather than fully determining their employees’ task choices, allocating the right number of employees to the project, and attracting volunteers with certain preferences in terms of tasks to be performed. We develop a formal model to investigate the tension between the variables sketched above in the setting of OSS. The model describes how employees interact with volunteers in the creation of open source code and uses the emergent architectures to evaluate the speed and performance of development and the size distribution of the modules (mirroring the presence of a periphery developing smaller modules and a core acting on the largest ones, Colfer and Baldwin, 2016). We find that a firm could develop distributed innovation projects in-house, ensuring speed and performance but trading those off against the distribution of innovative labor within the project team. Letting volunteers in to form mixed teams allows the firm to ease this trade-off and, under certain conditions, to strike a good balance between speed, performance, and team core–periphery organization.

Like every paper, this work suffers from several limitations. First, we derived the building blocks of our model from a discussion of open and collaborative innovation communities, distributed innovation and their properties, studied their dynamic connections via a simulation exercise, and retrieved some results. However, we did not run any empirical test to verify our conclusions. This is certainly a promising avenue for further research. It is interesting to notice that both a qualitative approach (e.g., providing case studies that could be compared to the different scenarios we produced) and quantitative investigations (e.g., measuring Beta, Gini, and Perf for different projects and correlating them with project success) could be undertaken along this line.
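As a starting point for such a quantitative investigation, the Gini index could be computed on the observed module sizes of a real project. The sketch below uses the standard textbook definition of the Gini coefficient on a list of non-negative sizes; it is an assumption that this coincides with the operationalization used in the simulations, and the module sizes in the example are hypothetical.

```python
def gini(sizes):
    """Standard Gini coefficient of a list of non-negative module sizes.

    Returns 0.0 for an empty list or when all sizes are zero.
    Uses the rank-based formula on the sorted values:
        G = 2 * sum(rank_i * x_i) / (n * sum(x)) - (n + 1) / n
    """
    xs = sorted(sizes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(rank * x for rank, x in enumerate(xs, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


# Hypothetical module sizes (e.g., lines of code per module).
homogeneous = [500, 500, 500, 500]      # every module has the same size
core_periphery = [50, 60, 80, 90, 4000]  # one large core module, small periphery

print(gini(homogeneous))
print(gini(core_periphery))
```

A homogeneous architecture yields a Gini of zero, while an architecture dominated by a few large modules pushes the index toward one, which is the sense in which Gini is used above as a proxy for a core–periphery structure.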

Second, we underplay the role of direct communication. Even if some open source projects heavily use tools built around the code that foster artifact-centered interaction (such as versioning systems), many other open source projects and other distributed innovation instances also rely heavily on direct communication, for example via mailing lists. Including this type of communication in the model would certainly represent an interesting expansion of our research.

Another avenue for further research is related to the possibility of adapting the model to instances of distributed innovation other than OSS. Despite similarities, for example in the role of large modules (Aaltonen and Seiler, 2016), Wikipedia has a radically different architecture that could be captured by fine-tuning the way the components of the model interact. The limitation due to the use of OSS as the setting of our analysis could be overcome by allowing the model to acquire the characteristics of other instances of open and collaborative innovation communities, and then comparing them, also with respect to the present paper.

Finally, the strategic behavior of the agents can also be modeled in a more nuanced way. For example, firms may be allowed to change their strategy over time, adapting it to the different proportion of developers working on the project; volunteers may be attracted not only by the relative characteristics of each module but also by the relative presence within the overall architecture of modules they are mostly attracted to (capturing the degree of homophily with the type of developers attracted so far). Such improvements in the definition of actors’ behavioral rules may certainly create different dynamics and can lead to interesting comparative exercises when contrasted with our results, allowing the identification of the specific role of the newly added component.

Footnotes

1

As an example, see the project SQuirreL SQL Client, presented on the SourceForge home page as one of the most popular projects in August 2021: https://sourceforge.net/p/squirrel-sql/git/ci/ObjectTreeFindProgress/tree/.

2

Operationally, we adapt a tool built in an earlier line of work (Dalle and David, 2005; 2005), in which, however, neither firms nor employees were present, lines of code were not used to measure module size, and open source developers’ motivations were the main point of discussion.

3

We tested all combinations of λ and γ within the set [λ; γ] = [0–20;0–10]. Moreover, in all these simulations, developers’ effort α was considered stochastic, with a distribution following equation (1) in the main text of the paper. Please see the next paragraph for more details on the meaning of α.

4

Notice that our analysis also accounts for the path-dependent properties of code development. The presence (or lack) of newly created modules in the earliest cycles changes the average number of modules per cycle more than the creation of later modules does. Early creation of modules (or the lack of it) is thus more important in determining Beta than later developments, reflecting the inherent dependence of code evolution on initial conditions.

5

In particular, for the root we have: $r_0(x_{\mathrm{root}}) = v_0(x_{\mathrm{root}})\left[(1 + x_{\mathrm{root}})^{\gamma}\right]$ and $\forall m \ne \mathrm{root}: r_m(x_m) \to 0$ as $\lambda \to +\infty$.

6

For m’, the potential module associated with m, we have by construction: $\forall m': x_{m'} = 0 = r_{m'}(x_{m'})$ and, $\forall m'$ the potential module associated with module $m$: $\rho_{m'}(\alpha) = r_{m'}(\alpha)$. Given this, since the distance from the root of a potential module is the distance of its parent module plus one (by construction), the previous equations imply because $\forall m: \left[(1 + x_{m'})^{\gamma}\right] = 1$ since $\forall m: x_{m'} = 0$.

7

We use those obtained for (γ_e = 0; λ_e = 0) simply as a reference point.

REFERENCES

Aaltonen A. and S. Seiler (2016), ‘Cumulative growth in user-generated content production,’ Management Science, 62(7), 2054–2069.

Afuah A. and C. L. Tucci (2012), ‘Crowdsourcing as a solution to distant search,’ Academy of Management Review, 37(3), 355–375.

Ahuja G. (2000), ‘Collaboration networks, structural holes, and innovation: a longitudinal study,’ Administrative Science Quarterly, 45(3), 425–455.

Aksoy-Yurdagul D., F. Rullani and C. Rossi-Lamastra (2021), ‘Designing shared spaces for firm-community collaborations for innovation: formal policies and coordination in open source projects,’ Creativity and Innovation Management, 30(1), 164–181.

Alexy O., J. Henkel and M. W. Wallin (2013), ‘From closed to open: job role changes, individual predispositions, and the adoption of commercial open source software development,’ Research Policy, 42(8), 1325–1340.

Alexy O., J. West, H. Klapper and M. Reitzig (2018), ‘Surrendering control to gain advantage: reconciling openness and the resource-based view of the firm,’ Strategic Management Journal, 39(6), 1704–1727.

Anderson P. (1999), ‘Complexity theory and organization science,’ Organization Science, 10(3), 216–232.

Anderson S. P., A. de Palma and J.-F. Thisse (1992), Discrete Choice Theory of Product Differentiation. MIT Press: Cambridge, MA.

Arora A. and A. Gambardella (1994), ‘The changing technology of technological change: general and abstract knowledge and the division of innovative labour,’ Research Policy, 23(5), 523–532.

Bagozzi R. P. and U. M. Dholakia (2006), ‘Open source software user communities: a study of participation in Linux user groups,’ Management Science, 52(7), 1099–1115.

Baldwin C., C. Hienerth and E. von Hippel (2006), ‘How user innovations become commercial products: a theoretical investigation and case study,’ Research Policy, 35(9), 1291–1313.

Baldwin C. and E. von Hippel (2011), ‘Modeling a paradigm shift: from producer innovation to user and open collaborative innovation,’ Organization Science, 22(6), 1399–1417.

Baldwin C. Y. and K. B. Clark (2006), ‘The architecture of participation: does code architecture mitigate free riding in the open source development model?’ Management Science, 52(7), 1116–1127.

Bechky B. (2003), ‘Sharing meaning across occupational communities: the transformation of understanding on the production floor,’ Organization Science, 14(3), 312–330.

Becker M. A., F. Rullani and F. Zirpoli (2021), ‘The role of digital artefacts in early stages of distributed innovation processes,’ Research Policy, 50(10), 104349.

Belenzon S. and M. Schankerman (2015), ‘Motivation and sorting of human capital in open innovation,’ Strategic Management Journal, 36(6), 795–820.

Bogers

M.

and

J.

West

(

2012

), ‘

Managing distributed innovation: strategic utilization of open and user innovation

,’

Creativity and Innovation Management

,

21

(

1

),

61

–

75

.

Boland

R. J.

and

R. V.

Tenkasi

(

1995

), ‘

Perspective making and perspective taking in communities of knowing

,’

Organization Science

,

6

(

4

),

350

–

372

.

Bolici

F.

,

J.

Howison

and

K.

Crowston

(

2009

), ‘

Coordination without discussion? Socio-technical congruence and stigmergy in free and open source software projects

,’

Workshop on Socio-Technical Congruence

.

Vancouver, BC

,

May

.

Bolici

F.

,

J.

Howison

and

K.

Crowston

(

2016

), ‘

Stigmergic coordination in FLOSS development teams: integrating explicit and implicit mechanisms

,’

Cognitive Systems Research

,

38

,

14

–

22

.

Bonabeau

E.

,

M.

Dorigo

and

G.

Theraulaz

(

2000

), ‘

Inspiration for optimization from social insert behavior

,’

Nature

,

406

(

6791

),

39

–

42

.

Bonaccorsi

A.

,

S.

Giannangeli

and

C.

Rossi

(

2006

), ‘

Entry strategies under competing standards: hybrid business models in the open source software industry

,’

Management Science

,

52

(

7

),

1085

–

1098

.

Boudreau

K. J.

(

2010

), ‘

Open platform strategies and innovation: granting access vs. devolving control

,’

Management Science

,

56

(

10

),

1849

–

1872

.

Boudreau

K. J.

,

L. B.

Jeppesen

,

T.

Reichstein

and

F.

Rullani

(

2021

), ‘

Crowdfunding as donations to entrepreneurial firms

,’

Research Policy

,

50

(

7

), 104264.

Breukner

S. A.

and

H. V. D.

Parunak

(

2002

), ‘

Swarming agents for distributed pattern detection and classification

,’

Proc. Workshop on Ubiquitous Computing. First Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 02)

.

Bologna, Italy

.

August

.

Cacciatori

E.

(

2008

), ‘

Memory objects in project environments: storing, retrieving and adapting learning in project-based firms

,’

Research Policy

,

37

(

9

),

1591

–

1601

.

Carlile

P. R.

(

2002

), ‘

A pragmatic view of knowledge and boundaries: boundary objects in new product development

,’

Organization Science

,

13

(

4

),

442

–

455

.

Cataldo

M.

,

A.

Mockus

,

J. A.

Roberts

and

J. D.

Herbsleb

(

2009

), ‘

Software dependencies, work dependencies, and their impact on failures

,’

IEEE Transactions on Software Engineering

,

35

(

6

),

864

–

878

.

Chen

J.

,

Y.

Ren

and

J.

Riedl

(

2010

), ‘

The effects of diversity on group productivity and member withdrawal in online volunteer groups

,’

Proc. CHI2010

.

ACM Press

, Atlanta, Georgia, pp.

821

–

830

.

Colfer

L. J.

and

C. Y.

Baldwin

(

2016

), ‘

The mirroring hypothesis: theory, evidence, and exceptions

,’

Industrial and Corporate Change

,

25

(

5

),

709

–

738

.

Corbet

J.

,

G.

Kroah-Hartman

and

A.

McPherson

(

2012

), ‘

Linux Kernel Development. How Fast it is Going, Who is Doing It, What They are Doing, and Who is Sponsoring It

,’

The Linux Foundation

, March 2012, https://flosshub.org/content/linux-kernel-development-how-fast-it-going-who-doing-it-what-they-are-doing-and-who-sponsori.

Crowston

K.

and

J.

Howison

(

2006

), ‘

Hierarchy and centralization in free and open source software team communications

,’

Knowledge, Technology & Policy

,

18

(

4

),

65

–

85

.

Crowston

K.

,

C.

Østerlund

,

J.

Howison

and

F.

Bolici

(

2017

), ‘

Work features to support stigmergic coordination in distributed teams

,’

Academy of Management Proceedings

,

2017

(

1

), 14409.

Crowston

K.

and

A.

Rezgui

(

2020

), ‘

Effects of stigmergic and explicit coordination on Wikipedia article quality

,’

Proceedings of HICSS (Hawaii International Conference on System Science)

.

Wailea, HI

.

Crowston

K.

,

J. S.

Saltz

,

A.

Rezgui

,

Y.

Hegde

and

S.

You

(

2019

), ‘

Socio-technical affordances for stigmergic coordination implemented in MIDST, a tool for data-science teams

,’

Proceedings of the ACM on Human-Computer Interaction, 3(CSCW)

, p. 117. 10.1145/3359219.

D’Adderio

L.

(

2003

), ‘

Configuring software, reconfiguring memories: the influence of integrated systems on the reproduction of knowledge and routines

,’

Industrial and Corporate Change

,

12

(

2

),

321

–

350

.

Dahlander

L.

and

L.

Frederiksen

(

2012

), ‘

The core and cosmopolitans: a relational view of innovation in user communities

,’

Organization Science

,

23

(

4

),

988

–

1007

.

Dahlander

L.

,

L.

Frederiksen

and

F.

Rullani

(

2008

), ‘

Online communities and open innovation: governance and symbolic value creation

,’

Industry and Innovation

,

15

(

2

),

115

–

123

.

Dahlander

L.

and

M. G.

Magnusson

(

2005

), ‘

Relationships between open source software companies and communities: observations from Nordic firms

,’

Research Policy

,

34

(

4

),

481

–

493

.

Dahlander

L.

and

M. W.

Wallin

(

2006

), ‘

A man on the inside: unlocking communities as complementary assets

,’

Research Policy

,

35

(

8

),

1243

–

1259

.

Dalle

J.-M.

P. A.

David

(

2005

), ‘The allocation of software development resources in ‘open source’ production mode,’ in

J.

Feller

,

B.

Fitzgerald

,

S. A.

Hissam

and

K. R.

Lakhani

(eds),

Perspectives on Free and Open Source Software

.

MIT Press

:

Cambridge, MA

, pp.

297

–

328

.

Dalle

J.-M.

P. A.

David

(

2008

), ‘Simulating code growth in Libre (open source) mode,’ in

E.

Brousseau

and

N.

Curien

(eds),

Internet and Digital Economics: Principles, Methods and Applications

.

Cambridge University Press

:

New York

, pp.

391

–

421

.

Dalle

J.-M.

P. A.

David

R. A.

Ghosh

W. E.

Steinmueller

(

2005

), ‘Advancing economic research on the free and open source software mode of production,’ in

M.

Wynants

and

J.

Cornelis

(eds),

How Open Will the Future Be? Social and Cultural Scenarios Based on Open Standards and Open-Source Software

.

VUB Press

:

Brussels

, 395–426.

Dalle

J.-M.

and

N.

Jullien

(

2003

), ‘

‘Libre’ software: turning fads into institutions?

’

Research Policy

,

32

(

1

),

1

–

11

.

Daniel

S. L.

,

L. M.

Maruping

,

M.

Cataldo

and

J.

Herbsleb

(

2018

), ‘

The impact of ideology misfit on open source software communities and companies

,’

MIS Quarterly

,

42

(

4

),

1069

–

1096

.

David

P. A.

(

1985

), ‘

Clio and the economics of QWERTY

,’

The American Economic Review

,

75

(

2

),

332

–

337

.

David

P. A.

and

F.

Rullani

(

2008

), ‘

Dynamics of innovation in an open source collaboration environment: lurking, laboring and launching FLOSS projects on SourceForge

,’

Industrial and Corporate Change

,

17

(

4

),

647

–

710

.

David

P. A.

and

J. S.

Shapiro

(

2008

), ‘

Community-based production of open source software: what do we know about the developers who participate?

’

Information Economics and Policy

,

20

(

4

),

364

–

398

.

David

P. A.

,

A. H.

Waterman

and

S.

Arora

(

2003

), ‘

FLOSS-US: The Free/Libre Open Source Software Survey for 2003 Policy paper, Stanford Institute for Economic Policy Research

,’

Stanford, CA

. (

September

) http://www.stanford.edu/group/floss-us/report/FLOSS-US-Report.pdf (Accessed

28 April, 2010

).

Dell’Era

C.

,

A.

Di Minin

,

G.

Ferrigno

,

F.

Frattini

,

P.

Landoni

and

R.

Verganti

(

2020

), ‘

Value capture in open innovation processes with radical circles: a qualitative analysis of firms’ collaborations with slow food, memphis, and free software foundation

,’

Technological Forecasting and Social Change

,

158

, 120–128.

den Besten

M. L.

,

J.-M.

Dalle

and

F.

Galia

(

2008

), ‘

The allocation of collaborative efforts in open-source software

,’

Information Economics and Policy

,

20

(

4

),

316

–

322

.

Elliott

M.

(

2006

), ‘

Stigmergic collaboration: the evolution of group work

,’

M/C Journal

,

9

(

2

). 10.5204/mcj.2599.

Faraj

S.

,

S. L.

Jarvenpaa

and

A.

Majchrzak

(

2011

), ‘

Knowledge collaboration in online communities

,’

Organization Science

,

22

(

5

),

1224

–

1239

.

Feller

J.

and

B.

Fitzgerald

(

2002

),

Understanding Open Source Software Development

,

Addison-Wesley

:

London, UK

.

Fitzgerald

B.

(

2005

), ‘

The transformation of open source software

,’

MIS Quarterly

,

30

(

3

),

587

–

598

.

Fosfuri

A.

,

M. S.

Giarratana

and

A.

Luzzi

(

2008

), ‘

The penguin has entered the building: the commercialization of open source software products

,’

Organization Science

,

19

(

2

),

292

–

305

.

Fosfuri

A.

,

M. S.

Giarratana

and

E.

Roca

(

2011

), ‘

Community-focused strategies

,’

Strategic Organization

,

9

(

3

),

222

–

239

.

Foss

N.

,

L.

Frederiksen

and

F.

Rullani

(

2016

), ‘

Problem-formulation and problem-solving in self-organized communities: how modes of communication shape project behaviors in the free open source software community

,’

Strategic Management Journal

,

37

(

13

),

2589

–

2610

.

Foss

N.

,

L. B.

Jeppesen

and

F.

Rullani

(

2021

), ‘

How context and attention shape behaviors in online communities: a modified garbage can model

,’

Industrial and Corporate Change

,

30

(

1

),

1

–

18

.

Garud

R.

,

S.

Jain

and

A.

Kumaraswamy

(

2002

), ‘

Institutional entrepreneurship in the sponsorship of common technological standards: the case of Sun Microsystems and Java

,’

Academy of Management Journal

,

45

,

196

–

214

.

Ghosh

R.

,

K.

Haaland

and

B. H.

Hall

(

2008

), ‘

Which firms participate in open source software development? A study using data from Debian

,’

Presented at the conference: DIME - DRUID Fundamental on Open and Proprietary Innovation Regimes

.

Copenhagen

,

June

pp.

17

–

20

.

Ghosh

R. A.

and

P. A.

David

(

2003

), ‘

The nature and composition of the Linux kernel developer community: a dynamic analysis

,’

Stanford Institute for Economic Policy Research, Project NOSTRA Working paper

.

Stanford, CA

.

21

February

.

Ghosh

R. A.

and

P. A.

David

(

2008

), ‘

Relating social structure to technical structure: Findings from the Linux kernel, SIEPR-NOSTRA Working Paper, Stanford University (May)

,’

Presented at the DIME - DRUID Fundamental on Open and Proprietary Innovation Regimes: Opportunities and limitations of the open source models of innovation and the role of intellectual property rights

.

Copenhagen Business School

,

Copenhagen, Denmark

,

17

June

p. 17.

Ghosh

R. A.

,

R.

Glott

,

B.

Kreiger

and

G.

Robles

(

2002

), ‘

The free/libre and open source software developers survey and study

,’

Final Report

,

International Institute of Infonomics

,

June

.

Giordani

P. E.

,

F.

Rullani

and

L.

Zirulia

(

2018

), ‘

Endogenous growth of open collaborative innovation communities: a supply-side perspective

,’

Industrial and Corporate Change

,

27

(

4

),

745

–

762

.

Giuri

P.

,

M.

Ploner

,

F.

Rullani

and

S.

Torrisi

(

2010

), ‘

Skills, division of labor and performance in collective inventions: evidence from open source software

,’

International Journal of Industrial Organization

,

28

(

1

),

54

–

68

.

Grant

R. M.

(

2008

), ‘

The future of management: where is Gary Hamel leading us?

’

Long Range Planning

,

41

(

5

),

469

–

482

.

Grassé

P.-P.

(

1959

), ‘

La reconstruction du nid et les coordinations inter-individuelles chez Bellicositermes natalensis et Cubitermes sp. La théorie de la stigmergie: Essai d’interprétation du comportement des termites constructeurs

,’

Insectes Sociaux

,

6

(

1

),

41

–

81

.

Gruber

M.

and

J.

Henkel

(

2006

), ‘

New ventures based on open innovation—an empirical analysis of start-up firms in embedded Linux

,’

International Journal of Technology Management

,

33

(

4

),

356

–

372

.

Gulati

R.

,

P.

Puranam

and

M.

Tushman

(

2012

), ‘

Meta-organization design: rethinking design in interorganizational and community contexts

,’

Strategic Management Journal

,

33

(

6

),

571

–

586

.

Haefliger

S.

,

E.

Monteiro

,

D.

Foray

and

G.

von Krogh

(

2011

), ‘

Social software and strategy

,’

Long Range Planning

,

44

(

297

), 316.

Haefliger

S.

,

G.

von Krogh

and

S.

Spaeth

(

2008

), ‘

Code reuse in open source software

,’

Management Science

,

54

(

1

),

180

–

193

.

Harison

E.

and

H.

Koski

(

2010

), ‘

Applying open innovation in business strategies: evidence from Finnish software firms

,’

Research Policy

,

39

(

3

),

351

–

359

.

Henkel

J.

(

2006

), ‘

Selective revealing in open innovation processes: the case of embedded Linux

,’

Research Policy

,

35

(

7

),

953

–

969

.

Henkel

J.

,

S.

Schöberl

and

O.

Alexy

(

2013

), ‘

The emergence of openness: how and why firms adopt selective revealing in open innovation

,’

Research Policy

, forthcoming,

43

,

879

–

890

.

Herraiz

I.

,

J. M.

Gonzalez-Barahona

and

G.

Robles

(

2007

), ‘

Towards a theoretical model for software growth

,’

Proceedings of the Fourth International Workshop on Mining Software Repositories

.

IEEE Computer Society

, Minneapolis, MN, p. 21.

Herraiz

I.

,

G.

Robles

,

J. J.

Amor

,

T.

Romera

and

J. M. G.

Barahona

(

2006

), ‘

The processes of joining in global distributed software projects

,’

Proceedings of the 2006 international workshop on Global software development for the practitioner

, Shanghai, China, pp.

27

–

33

.

Heylighen

F.

(

2007

), ‘Why is open access development so successful? Stigmergic organization and the economics of information,’

B.

Lutterbeck

,

M.

Bärwolff

and

R. A.

Gehring

,

Open Source Jahrbuch 2007

.

Lehmanns Media

:

Berlin, Germany

, pp.

165

–

180

.

Howison

J.

and

K.

Crowston

(

2014

), ‘

Collaboration through open superposition: a theory of the open source way

,’

MIS Quarterly

,

38

(

1

),

29

–

A9

.

Jeppesen

L. B.

and

L.

Frederiksen

(

2006

), ‘

Why firm-established user communities work for innovation? The personal attributes of innovative users in the case of computer-controlled music instruments

,’

Organization Science

,

17

(

1

),

45

–

64

.

Jeppesen

L. B.

and

K. R.

Lakhani

(

2010

), ‘

Marginality and problem solving effectiveness in broadcast search

,’

Organization Science

,

21

(

5

),

1016

–

1033

.

Kittur

A.

,

B.

Suh

,

B. A.

Pendleton

and

E. H.

Chi

(

2007

), ‘

He says, she says: conflict and coordination in Wikipedia

,’

Proc. of CSCW2007

.

ACM Press

, Melbourne, VIC, Australia, pp.

453

–

462

.

Koch

S.

and

G.

Schneider

(

2002

), ‘

Effort, cooperation and coordination in an open source software project: GNOME

,’

Information Systems Journal

,

12

(

1

),

27

–

42

.

Krishnamurthy

S.

(

2002

), ‘

Cave or community? An empirical examination of 100 mature open source projects

,’

First Monday

,

7

(

6

).

Lakhani

K. R.

and

E.

Von Hippel

(

2003

), ‘

How open source software works: “free” user-to-user assistance

,’

Research Policy

,

32

(

6

),

923

–

943

.

Lakhani

K. R.

R. G.

Wolf

(

2005

), ‘Why hackers do what they do: understanding motivations and effort in free/ open source software projects,’ in

J.

Feller

,

B.

Fitzgerald

,

S. A.

Hissam

and

K. R.

Lakhani

,

Perspectives on Free and Open Source Software

.

MIT Press

:

Cambridge, MA

, pp.

3

–

21

.

Langlois

R.

and

G.

Garzarelli

(

2008

), ‘

Of hackers and hairdressers: modularity and the organizational economics of open-source collaboration

,’

Industry and Innovation

,

15

(

2

),

125

–

143

.

Lehman

M. M.

(

1980

), ‘

Programs, life cycles and laws of software evolution

,’

Proceedings of the IEEE

,

68

(

9

),

1060

–

1078

.

Leonardi

P. M.

(

2011

), ‘

Innovation blindness: culture, frames, and cross-boundary problem construction in the development of new technology concepts

,’

Organization Science

,

22

(

2

),

347

–

369

.

Lerner

J.

and

J.

Tirole

(

2002

), ‘

Some simple economics of open source

,’

The Journal of Industrial Economics

,

50

(

2

),

197

–

234

.

Levine

S. S.

and

M. J.

Prietula

(

2014

), ‘

Open collaboration for innovation: principles and performance

,’

Organization Science

,

25

(

5

),

1414

–

1433

.

Levinthal

D. A.

(

1997

), ‘

Adaptation on rugged landscapes

,’

Management Science

,

43

(

7

),

934

–

950

.

MacCormack

A.

,

J.

Rusnak

and

C. Y.

Baldwin

(

2006

), ‘

Exploring the structure of complex software designs: an empirical study of open source and proprietary code

,’

Management Science

,

52

(

7

),

1015

–

1030

.

Madey

G. R.

,

V. W.

Freeh

and

R. O.

Tynan

(

2002

), ‘

Agent-based modeling of open source using swarm

,’

Proceedings of the Americas Conference on Information Systems

.

AMCIS 2002

:

Dallas, TX

,

August

.

Majchrzak

A.

(

2009

), ‘

Comment: where is the theory in wikis?

’

MIS Quarterly

,

33

(

1

),

18

–

20

.

Majchrzak

A.

and

A.

Malhotra

(

2019

),

Unleashing the Crowd Collaborative Solutions to Wicked Business and Societal Problems

,

Palgrave Macmillan, London, United Kingdom

.

Marengo

L.

,

G.

Dosi

,

P.

Legrenzi

and

C.

Pasquali

(

2000

), ‘

The structure of problem-solving knowledge and the structure of organizations

,’

Industrial and Corporate Change

,

9

(

4

),

757

–

788

.

Mehra

A.

and

V.

Mookerjee

(

2012

), ‘

Human capital development for programmers using open source software

,’

MIS Quarterly

,

36

(

1

),

107

–

122

.

Midha

V.

,

P.

Palvia

,

R.

Singh

and

N.

Kshetri

(

2010

), ‘

Improving open source software maintenance

,’

Journal of Computer Information Systems

,

50

(

3

),

81

–

90

.

Mihm

J.

,

C. H.

Loch

,

D.

Wilkinson

and

B. A.

Huberman

(

2010

), ‘

Hierarchical structure and search in complex organizations

,’

Management Science

,

56

(

5

),

831

–

848

.

Narduzzo

A.

and

A.

Rossi

(

2005

), ‘

The Role of Modularity in Free/Open Source Software Development, in S. Koch (ed)

,’

Free/Open Software Development

Idea Group, p.

84

–

102

.

Narduzzo

A.

and

A.

Rossi

(

2008

),

Modularity in Action: GNU/Linux and Free/open Source Software Development Model Unleashed

,

Department of Computer and Management Sciences

:

University of Trento, Italy

.

Neary

D.

and

V.

David

(

2010

), ‘

The GNOME census: who writes GNOME?

’

Neary Consulting

.

O’Mahony

S.

(

2003

), ‘

Guarding the commons

,’

Research Policy

,

32

(

7

),

1179

–

1198

.

O’Mahony

S.

and

B. A.

Bechky

(

2008

), ‘

Boundary organizations: enabling collaboration among unexpected allies

,’

Administrative Science Quarterly

,

53

(

3

),

422

–

459

.

Orlikowski

W.

(

2007

), ‘

Sociomaterial practices: exploring technology at work

,’

Organization Studies

,

28

(

9

),

1435

–

1448

.

Parmentier

G.

and

V.

Mangematin

(

2020

), ‘

Orchestrating innovation with user communities in the creative industries

,’

Technological Forecasting and Social Change

,

83

(

2014

),

40

–

53

.

Pondy

L. R.

I. I.

Mitroff

(

1979

), ‘Beyond open system models of organization,’ in

B. M.

Staw

,

Research in Organizational Behavior

.

Greenwich, Conn.; London: JAI Press

, pp.

3

–

39

.

Raymond

E. S.

(

1998

),

The Cathedral & the Bazaar: Musings on Linux and Open Source by an Accidental Revolutionary

,

O’Reilly Associates

:

Sebastopol, CA

.

Robles

G.

,

J. J.

Amor

,

J. M.

Gonzalez-Barahona

and

I.

Herraiz

(

2005a

), ‘

Evolution and growth in large libre software projects

,’

Proceedings of the 8th International Workshop on Principles of Software Evolution

.

IEEE Computer Society

:

LosAlmitos, CA

, pp.

165

–

174

, Lisbon, Portugal, 5–6

September

.

Robles

G.

,

J. M.

Gonzalez-Barahona

,

M.

Michlmayr

and

J. J.

Amor

(

2006

), ‘

Mining large software compilations over time: another perspective of software evolution

,’

Proceedings of the 2006 international workshop on Mining software repositories

.

ACM

, Shanghai, China, pp.

3

–

9

.

Robles

G.

,

J. J.

Merelo

and

J. M.

Gonzalez-Barahona

(

2005b

), ‘

Self-organized development in libre software: a model based on the stigmergy concept

,’ Proceedings of the 6^th International Conference on Software Modeling and Simulation.

St. Louis, MO

,

May

.

Ross

S. M.

(

2003

),

Introduction to Probability Models

, 8th edn.

Academic Press

:

New York

.

Rullani

F.

and

S.

Haefliger

(

2013

), ‘

The periphery on stage: the intra-organizational dynamics in online communities of creation

,’

Research Policy

,

42

(

4

),

941

–

953

.

Scacchi

W.

(

2006

), ‘Understanding open source software evolution,’ in

N. H.

Madhavji

,

M. M.

Lehman

,

J. F.

Ramil

and

D.

Perry

(eds),

Software Evolution and Feedback

.

John Wiley and Sons Inc

:

New York

, 181–206.

Setia

P.

,

B.

Rajagopalan

,

V.

Sambamurthy

and

R.

Calantone

(

2012

), ‘

How peripheral developers contribute to open-source software development

,’

Information Systems Research

,

23

(

1

),

144

–

163

.

Shah

S. K.

(

2006

), ‘

Motivation, governance, and the viability of hybrid forms in open source software development

,’

Management Science

,

52

(

7

),

1000

–

1014

.

Smith

N.

,

A.

Capiluppi

and

J.

Fernandez-Ramil

(

2006

), ‘

Agent-based simulation of open source evolution

,’

Software Process: Improvement and Practice

,

11

(

4

),

423

–

434

.

Sojer

M.

and

J.

Henkel

(

2010

), ‘

Code reuse in open source software development: quantitative evidence, drivers, and impediments

,’

Journal of the Association for Information Systems

,

11

(

12

),

868

–

901

.

Spaeth

S.

,

M.

Stuermer

and

G.

Von Krogh

(

2010

), ‘

Enabling knowledge creation through outsiders: towards a push model of open innovation

,’

International Journal of Technology Management

,

52

(

3

),

411

–

431

.

Srikanth

K.

and

P.

Puranam

(

2011

), ‘

Integrating distributed work: comparing task design, communication, and tacit coordination mechanisms

,’

Strategic Management Journal

,

32

(

8

),

849

–

875

.

Srikanth

K.

and

P.

Puranam

(

2014

), ‘

The firm as a coordination system: evidence from software services offshoring

,’

Organization Science

,

25

(

4

),

1253

–

1271

.

Stam

W.

(

2009

), ‘

When does community participation enhance the performance of open source software companies?

’

Research Policy

,

38

(

8

),

1288

–

1299

.

Star

S. L.

and

J. R.

Griesemer

(

1989

), ‘

Institutional ecology, ‘translations’ and boundary objects: amateurs and professionals in Berkeley’s museum of vertebrate zoology, 1907–39

,’

Social Studies of Science

,

19

(

3

),

387

–

420

.

Tajedin

H.

,

A.

Madhok

and

M.

Keyhani

(

2019

), ‘

A theory of digital firm-designed markets: defying knowledge constraints with crowds and marketplaces

,’

Strategy Science

,

4

(

4

),

323

–

342

.

Turner

S. F.

and

V.

Rindova

(

2012

), ‘

A balancing act: how organizations pursue consistency in routine functioning in the face of ongoing change

,’

Organization Science

,

23

(

1

),

24

–

46

.

Turski

W. M.

(

1996

), ‘

Reference model for smooth growth of software systems

,’

IEEE Transactions on Software Engineering

,

22

(

8

),

599

–

600

.

Von Hippel

E.

(

1994

), ‘

“Sticky information” and the locus of problem solving: implications for innovation

,’

Management Science

,

40

(

4

),

429

–

439

.

Von Hippel

E.

(

2005

), ‘Open source software projects as user innovation networks,’ in

J.

Feller

,

B.

Fitzgerald

,

S.

Hissam

and

K.

Lakhani

,

Perspectives on Open Source Software

.

MIT Press

:

Cambridge, MA

, pp.

267

–

278

.

von Hippel

E.

and

G.

von Krogh

(

2003

), ‘

Open source software and the ‘private-collective’ innovation model: issues for organization science

,’

Organization Science

,

14

(

2

),

209

–

223

.

von Krogh

G.

,

S.

Haefliger

,

S.

Spaeth

and

M. W.

Wallin

(

2012

), ‘

Carrots and rainbows: motivation and social practice in open source software development

,’

MIS Quarterly

,

36

(

2

),

649

–

676

.

von Krogh

G.

and

E.

von Hippel

(

2006

), ‘

The promise of research on open source software

,’

Management Science

,

52

(

7

),

975

–

983

.

West

J.

and

S.

O’Mahony

(

2008

), ‘

The role of participation architecture in growing sponsored open source communities

,’

Industry and Innovation

,

15

(

2

),

145

–

168

.

You

S.

,

K.

Crowston

and

Y.

Hegde

(

2019

), ‘

Coordination in OSS 2.0: ANT approach

,’

Proceedings of the 52nd Hawaii International Conference on System Sciences

, Grand Wailea, Maui, Hawaii, USA.

Zhong

X.

and

S. Z.

Ozdemir

(

2010

), ‘

Structure, learning, and the speed of innovating: a two-phase model of collective innovation using agent based modeling

,’

Industrial and Corporate Change

,

19

(

5

),

1459

–

1492

.

Zuchowski

O.

,

O.

Posegga

,

D.

Schlagwein

and

K.

Fischbach

(

2016

), ‘

Internal crowdsourcing: conceptual framework, structured review, and research agenda

,’

Journal of Information Technology

,

31

(

2

),

166

–

184

.