A joint normal-ordinal (probit) model for ordinal and continuous longitudinal data

Table 1

Number of measurements of the Mini Mental State Exam (MMSE) and Activities of Daily Living (ADL) at each time point.

Response	Day 1	Day 3	Day 5	Day 8	Day 12
MMSE	59	58	60	52	38
ADL	60	0	60	0	40

Neurocognitive status was assessed using the Mini-Mental State Exam (MMSE), which includes subscales for memory, linguistic ability, concentration, and psychomotor executive skills. Cognitive status was classified as no impairment (MMSE $\geq 24$), moderate impairment ($18 \leq$ MMSE $\leq 23$), or severe impairment (MMSE $\leq 17$) (Milisen et al., 1998; Tombaugh and McIntyre, 1992). In addition, functional status was measured using an adapted version of the Katz ADL scale (ADL), which is treated as continuous. The mean ADL scores and individual profiles are presented in Fig. 1. A higher ADL value indicates more dependence on caretakers for activities of daily living, whereas a higher category of MMSE indicates a lower level of impairment. For exploratory purposes, the point-biserial correlations between the observed responses at several time points have been calculated (see Appendix H). They suggest a moderately strong relation between ADL and both the event of having severe impairment and the event of having impairment. This correlation seems to increase slightly over time. However, these correlations are not corrected for covariates and are only valid if the data are missing completely at random, which is a very strict assumption.
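As an illustration of the exploratory step described above, the following minimal sketch computes a point-biserial correlation, which is simply the Pearson correlation between a continuous score and a 0/1 event indicator. The ADL and MMSE values are hypothetical and are not taken from the study; only the MMSE cut-offs (17 and 23) come from the text.

```python
import math


def point_biserial(scores, events):
    """Pearson correlation between a continuous score and a 0/1 indicator."""
    n = len(scores)
    if n != len(events) or n < 2:
        raise ValueError("need two equally long samples with at least 2 values")
    mean_x = sum(scores) / n
    mean_e = sum(events) / n
    cov = sum((x - mean_x) * (e - mean_e) for x, e in zip(scores, events))
    var_x = sum((x - mean_x) ** 2 for x in scores)
    var_e = sum((e - mean_e) ** 2 for e in events)
    return cov / math.sqrt(var_x * var_e)


# Hypothetical observations for six subjects at one time point.
adl = [12, 15, 18, 20, 22, 9]
mmse = [27, 25, 20, 16, 14, 29]

# Events defined with the cut-offs from the text.
severe = [1 if m <= 17 else 0 for m in mmse]    # severe impairment
impaired = [1 if m <= 23 else 0 for m in mmse]  # moderate or severe impairment

r_severe = point_biserial(adl, severe)
r_impaired = point_biserial(adl, impaired)
```

Such raw correlations use complete pairs only, which is why they are valid only under the missing-completely-at-random assumption mentioned above.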

Fig. 1

Observed average (with 95% confidence interval) of the activities of daily living scores on day 1, 5 and 12 (solid) and individual profiles of the 60 subjects (dashed).

3 Methodology

3.1 Model for a single longitudinal continuous response

One of the most popular models for longitudinal continuous variables is the linear mixed model. Suppose we have $N$ subjects and the $j$th measurement for subject $i$ is denoted by $Y_{ij}$. The vector $(Y_{i1}, \dots, Y_{in_{i}})$ of all $n_{i}$ measurements of subject $i$ is denoted by $Y_{i}$. With this notation, we can write the model as

$$\begin{matrix} Y_{i} \mid b_{i} \sim N(X_{i}\beta + Z_{i}b_{i}, \Sigma_{i}), \\ b_{i} \sim N(0, D), \end{matrix} \tag{3.1}$$

where $X_{i}$ and $Z_{i}$ are, respectively, ($n_{i} \times k$) and ($n_{i} \times q$) dimensional matrices of known covariates of, respectively, the fixed effects $\beta$ and the random effects $b_{i}$. $\Sigma_{i}$ denotes the ($n_{i} \times n_{i}$) dimensional residual covariance matrix. Notably, the subscript $i$ does not mean that the variance estimates depend on the subject. It indicates that the dimensions of the residual covariance matrix can depend on the subject (Verbeke and Molenberghs, 2000). We can simplify $\Sigma_{i}$ to $\sigma^{2}I_{i}$ under the assumption that the random effects fully capture the correlation between the measurements within subjects. This conditional independence assumption is, however, not necessary. It can be relaxed by including, for example, serial correlation.

A property of the linear mixed model is that the fixed-effects parameters of the conditional model and the marginal model are exactly equal. This holds since $E[Y_{ij}] = E[E(Y_{ij} \mid b_{i})] = x'_{ij}\beta$. The marginal model is then defined as

$$Y_{i} = X_{i}\beta + \epsilon_{i}^{*}. \tag{3.2}$$

The residuals $\epsilon_{i}^{*}$ are here by definition correlated and are normally distributed around $0$ with covariance matrix $V_{i}^{*}$. As a consequence, the distribution of the response is

$$Y_{i} \sim N(X_{i}\beta, V_{i}^{*}), \tag{3.3}$$

with $V_{i}^{*} = Z_{i}DZ'_{i} + \Sigma_{i}$. More information about linear mixed models can be found in Verbeke and Molenberghs (2000).
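To make the marginal covariance $Z_{i}DZ'_{i} + \Sigma_{i}$ concrete, the sketch below evaluates it for one subject with a random intercept and a random slope, under the conditional independence simplification in which the residual covariance is $\sigma^{2}$ times the identity. The measurement times, `D` and `sigma2` are hypothetical values chosen for illustration only.

```python
def marginal_covariance(Z, D, sigma2):
    """Return V = Z D Z' + sigma2 * I for one subject, using nested lists."""
    n, q = len(Z), len(D)
    V = [[0.0] * n for _ in range(n)]
    for r in range(n):
        for c in range(n):
            # (Z D Z')[r][c] = sum over a, b of Z[r][a] * D[a][b] * Z[c][b]
            V[r][c] = sum(Z[r][a] * D[a][b] * Z[c][b]
                          for a in range(q) for b in range(q))
            if r == c:
                V[r][c] += sigma2  # residual variance on the diagonal
    return V


# Hypothetical design: random intercept and random slope in time.
times = [1.0, 5.0, 12.0]
Z = [[1.0, t] for t in times]
D = [[4.0, 0.1],
     [0.1, 0.05]]
sigma2 = 3.0

V = marginal_covariance(Z, D, sigma2)
```

The off-diagonal entries of `V` are nonzero even though the residuals are independent, which is how the random effects induce correlation between the repeated measurements of a subject.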

3.2 Model for a single longitudinal ordinal response

A random-effects ordinal regression model can be used for clustered or repeated measures of an ordinal response. A threshold concept is applied, which assumes that the observed ordered response categories are determined by the value of an underlying continuous response. A series of threshold values $\gamma_{1}, \gamma_{2}, \dots, \gamma_{d-1}$ is assumed for the $d$ categories. A response is classified into category $c$ if the latent response $Y_{ik}^{*}$ surpasses the threshold value $\gamma_{c-1}$, but not $\gamma_{c}$. For the measurement at time $k$ of this latent response of subject $i$, $k = 1, \dots, p_{i}$,

$$Y_{ik}^{*} = x'_{ik}\beta + z'_{ik}b_{i} + \epsilon_{ik},$$

is the hierarchical linear mixed model, where $x_{ik}$ is the $r \times 1$ vector that contains values for the covariates of the $r$-dimensional fixed effects vector $\beta$. Next, $z_{ik}$ is the $q \times 1$ design vector for the $q$ random effects $b_{i}$. The random effects $b_{i}$ are assumed to follow a normal distribution around 0 with covariance matrix $D$. The residuals $\epsilon_{ik}$ are assumed to be independently normally distributed with mean 0 and variance $\sigma^{2}$.

From the latter model for $Y_{ik}^{*}$, the probabilities of the response categories can be derived. The probability that a response at time $k$ for subject $i$ falls into category $c$ equals

$$P(Y_{ik} = c) = \Phi\left(\frac{\gamma_{c} - \zeta_{ik}}{\sigma}\right) - \Phi\left(\frac{\gamma_{c-1} - \zeta_{ik}}{\sigma}\right),$$

where $\zeta_{ik} = x'_{ik}\beta + z'_{ik}b_{i}$ and $\Phi(\cdot)$ denotes the cumulative distribution function of the standard normal distribution. Similarly, the probability that response $k$ of subject $i$ is less than or equal to category $c$ equals

$$P(Y_{ik} \leq c) = \Phi\left(\frac{\gamma_{c} - \zeta_{ik}}{\sigma}\right).$$

The choice of the unit and the origin of $\zeta$ is arbitrary (Hedeker and Gibbons, 1994). Alternatively, the logit link function can be applied (Ivanova et al., 2016), but this leads to more cumbersome calculations, and fewer closed-form expressions can be derived than with the probit link.
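The category probabilities above can be sketched in a few lines. The function below takes the linear predictor $\zeta_{ik}$ (so it is conditional on the random effects) and the ordered thresholds, and returns the probabilities of all $d$ categories; the first and last categories use the implicit limits $-\infty$ and $+\infty$. The numerical values are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(zeta, thresholds, sigma=1.0):
    """P(Y = c | b) for c = 1, ..., d given zeta and d - 1 ordered thresholds."""
    # Cumulative probabilities P(Y <= c); the last category closes at 1.
    cumulative = [std_normal_cdf((g - zeta) / sigma) for g in thresholds] + [1.0]
    probabilities = []
    previous = 0.0
    for p in cumulative:
        probabilities.append(p - previous)
        previous = p
    return probabilities


# Hypothetical three-category example (two thresholds).
probs = category_probabilities(zeta=0.3, thresholds=[-0.5, 1.0])
```

Because the thresholds are ordered, the differences of successive cumulative probabilities are nonnegative and sum to one.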

In Supplementary Appendix A, the marginal random-effects ordinal regression model is derived. In the latter model, the interpretation is no longer conditional on the random effects. Let $Z_{i}$ denote the $n_{i} \times q$ dimensional design matrix of the random effects and $X_{i}$ denote the $n_{i} \times r$ dimensional design matrix of the fixed effects. The marginal model is the following:

$$P(y_{i} \leq c) = \Phi(\gamma_{c} - X_{i}\beta; L_{i}^{-1}), \tag{3.4}$$

where

$$L_{i} = I - Z_{i}(D^{-1} + Z'_{i}Z_{i})^{-1}Z'_{i}.$$
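For intuition, the marginal model can be checked numerically in the simplest setting: a single measurement, a random intercept with variance $d$, and unit residual variance. The design matrix is then the scalar 1, the matrix $L_{i}$ reduces to $1/(1 + d)$, and the marginal probability becomes a normal CDF with variance $1 + d$. The sketch below compares this closed form with a direct numerical integration over the random intercept. It assumes that the second argument of $\Phi$ in (3.4) is a (co)variance; the parameter values are hypothetical.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def marginal_prob_closed_form(gamma_c, xb, d):
    """Scalar version of (3.4) with Z_i = 1 and D = d (d > 0)."""
    L = 1.0 - 1.0 / (1.0 / d + 1.0)  # equals 1 / (1 + d)
    # A normal CDF with variance 1 / L evaluated at u is Phi(u * sqrt(L)).
    return std_normal_cdf((gamma_c - xb) * math.sqrt(L))


def marginal_prob_numeric(gamma_c, xb, d, n_grid=4001, width=8.0):
    """Trapezoidal integration of Phi(gamma_c - xb - b) over b ~ N(0, d)."""
    sd = math.sqrt(d)
    step = 2.0 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = -width * sd + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += weight * std_normal_cdf(gamma_c - xb - b) * std_normal_pdf(b / sd) / sd
    return total * step


closed = marginal_prob_closed_form(gamma_c=0.5, xb=-0.2, d=2.0)
numeric = marginal_prob_numeric(gamma_c=0.5, xb=-0.2, d=2.0)
```

The two values agree up to integration error, which illustrates why the probit link yields closed-form marginal probabilities.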

3.3 Joint model

The joint mixed model employs a $q$-dimensional random effects vector $\xi_{i}$ to encompass random effects linked to the ordinal response as well as random effects linked to the continuous response. This vector follows a multivariate normal distribution with a mean of zero and a covariance matrix $D$. This matrix $D$ accounts for the correlation between repeated measurements of the same response, as well as the correlations between (the vectors of) measurements for different responses. We assume that the responses are independent given the random effects, meaning that the random effects fully capture the correlation between the responses. Consequently, the joint density of the responses, given the random effects, is equivalent to the product of the conditional densities of the individual responses.

The joint marginal density can be obtained by integrating the random effects out of the joint density; these calculations can be found in Supplementary Appendix B. Note that the primary purpose of this joint marginal density is to provide an intermediate result for the calculations that follow. The joint marginal density is as follows:

$$f(y_{1i}, y_{2i} \leq c) = \phi(X_{1i}\beta; V_{i})\,\Phi(\gamma_{c} - X_{2i}\beta - \alpha_{i}; B_{i}), \tag{3.5}$$

where

$$\begin{matrix} V_{i} = Z_{1i}DZ'_{1i} + \Sigma_{i}, \\ \alpha_{i} = H_{i}(y_{1i} - X_{1i}\beta), \\ H_{i} = B_{i}Z_{2i}K_{i}Z'_{1i}\Sigma_{i}^{-1}, \\ K_{i}^{-1} = D^{-1} + Z'_{1i}\Sigma_{i}^{-1}Z_{1i} + Z'_{2i}Z_{2i}, \\ B_{i}^{-1} = I - Z_{2i}K_{i}Z'_{2i}. \end{matrix}$$

It is possible to extend (3.5) to the high-dimensional case, with multiple ordinal and/or continuous responses. Let $Y_{ci}$ represent a vector containing all the measurements of the $a$ continuous responses: $Y_{ci} = (Y_{1i1}^{c}, \dots, Y_{1in_{1i}}^{c}, Y_{2i1}^{c}, \dots, Y_{2in_{2i}}^{c}, \dots, Y_{ai1}^{c}, \dots, Y_{ain_{ai}}^{c})$. Notably, the number of measurements for each response does not have to be the same. Similarly, let $Y_{bi}$ denote a vector containing all the measurements of the $o$ ordinal responses: $Y_{bi} = (Y_{1i1}^{b}, \dots, Y_{1ip_{1i}}^{b}, Y_{2i1}^{b}, \dots, Y_{2ip_{2i}}^{b}, \dots, Y_{oi1}^{b}, \dots, Y_{oip_{oi}}^{b})$.

Additionally, the matrices $Z_{ci}$ and $Z_{bi}$ consist of concatenated matrices of covariates for the random effects of continuous and ordinal responses, respectively. Specifically, $Z_{ci}$ is formed by combining the matrices of covariates for the separate continuous responses: $Z_{ci} = [Z_{1i}^{c}, Z_{2i}^{c}, \dots, Z_{ai}^{c}]'$. Similarly, $Z_{bi}$ is formed by concatenating the matrices of covariates for the separate ordinal responses: $Z_{bi} = [Z_{1i}^{b}, Z_{2i}^{b}, \dots, Z_{oi}^{b}]'$. The matrices of covariates for the fixed effects are defined in a similar manner: $X_{ci}$ contains the concatenated matrices of covariates for the continuous responses, $[X_{1i}^{c}, X_{2i}^{c}, \dots, X_{ai}^{c}]'$, while $X_{bi}$ contains the concatenated matrices of covariates for the ordinal responses, $[X_{1i}^{b}, X_{2i}^{b}, \dots, X_{oi}^{b}]'$. Further, $\Sigma_{i}$ is a block diagonal matrix whose blocks are the variance-covariance matrices of the continuous responses:

$$\Sigma_{i} = \begin{bmatrix} \Sigma_{1i} & 0 & \dots & 0 \\ 0 & \Sigma_{2i} & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \Sigma_{ai} \end{bmatrix}.$$

It is easy to see that the marginal hierarchical model is now

$$f(y_{ci}, y_{bi} \leq c) = \phi(X_{ci}\beta; V_{i})\,\Phi(\gamma_{c} - X_{bi}\beta - \alpha_{i}; B_{i}). \tag{3.6}$$

To be as general as possible, we will use the above expressions for the remainder of the paper.

Due to potential computational difficulties associated with high-dimensional models, Fieuws and Verbeke (2006) introduced a pseudo-likelihood method to simplify the model fitting. This involves fitting a bivariate model for every pair of responses and then combining the results. Kundu (2011) offers a convenient guide to implementing this method in SAS NLMIXED.

3.4 Conditional models

Conditional models offer a practical approach for making predictions of one subset of measurements, conditional on another subset. This methodology proves particularly valuable in circumventing challenges related to time-dependent covariates. In analogy to Section 3.3, we define $Y_{ci}$ as a vector composed of all the measurements of the $a$ continuous responses and $Y_{bi}$ as a vector composed of all the measurements of the $o$ ordinal responses. Next, let $\tilde{Y}_{ci}$ denote a $\tilde{n}_{i}$-dimensional subset of the continuous response vector $Y_{ci}$, while $\tilde{Y}_{bi}$ represents a $\tilde{p}_{i}$-dimensional subset of the ordinal response vector $Y_{bi}$. Notably, $\tilde{Y}_{ci}$ and $\tilde{Y}_{bi}$ can contain measurements of different continuous and ordinal responses, respectively. By analogy, $\tilde{X}_{ci}$ and $\tilde{X}_{bi}$ represent the $\tilde{n}_{i} \times q$ and $\tilde{p}_{i} \times q$ submatrices of $X_{ci}$ and $X_{bi}$.

Leveraging the ratios of the marginal distributions, it becomes feasible to derive expected values and their corresponding prediction intervals. A first conditional expected value is that of a subset of the continuous responses, given a subset of both continuous and ordinal responses. This specific $n_{a}$-dimensional subvector of predicted continuous response(s) is denoted as $\tilde{Y}_{ci}^{a}$. This prediction is conditional on, on the one hand, $\tilde{Y}_{ci}^{b}$, the subvector of length $n_{b}$ of values of the continuous response vector and, on the other hand, $\tilde{Y}_{bi}$, the subvector of ordinal responses. The notation is as follows: the superscript specifies the submatrices or subvectors; superscripts $a$ and $b$ denote, respectively, the rows $a_{1}$ to $a_{n_{a}}$ and $b_{1}$ to $b_{n_{b}}$. In addition, the superscript $bb$ specifies the rows $b_{1}$ to $b_{n_{b}}$ and columns $b_{1}$ to $b_{n_{b}}$. The superscript $ab$ indicates rows $a_{1}$ to $a_{n_{a}}$ and columns $b_{1}$ to $b_{n_{b}}$. The conditional expected value is as follows:

$$\begin{matrix} E[\tilde{Y}_{ci}^{a} \mid \tilde{Y}_{ci}^{b} = \tilde{y}_{ci}^{b}, \tilde{y}_{bi} \leq c] = \left((E_{i}V_{i}^{-1}\tilde{X}_{ci}\beta)^{a} + E_{i}^{ab}(E_{i}^{bb})^{-1}\left(\tilde{y}_{ci}^{b} - (E_{i}V_{i}^{-1}\tilde{X}_{ci}\beta)^{b}\right)\right) \\ + \left((E_{i}H'_{i}B_{i}^{-1})^{a} - E_{i}^{ab}(E_{i}^{bb})^{-1}(E_{i}H'_{i}B_{i}^{-1})^{b}\right)\kappa, \end{matrix} \tag{3.7}$$

where $\kappa$ equals the expected value of the truncated normal distribution with variance $T_{i}$, mean $F_{i}$ and limits $]-\infty; d]$. This expected value is implemented in standard statistical software, such as the R package tmvtnorm. The analytical expression can be found in Manjunath and Wilhelm (2021). In addition,

$$\begin{matrix} E_{i}^{-1} = H'_{i}B_{i}^{-1}H_{i} + V_{i}^{-1}, \\ T_{i}^{-1} = (E_{i}H'_{i}B_{i}^{-1})^{b'}(E_{i}^{bb})^{-1}(E_{i}H'_{i}B_{i}^{-1})^{b} + B_{i}^{-1} - (H'_{i}B_{i}^{-1})'E_{i}(H'_{i}B_{i}^{-1}), \\ F_{i} = T_{i}\left((E_{i}H'_{i}B_{i}^{-1})^{b'}(E_{i}^{bb})^{-1}\left(\tilde{y}_{ci}^{b} - (E_{i}V_{i}^{-1}\tilde{X}_{ci}\beta)^{b}\right) + (H'_{i}B_{i}^{-1})'E_{i}(V_{i}^{-1}\tilde{X}_{ci}\beta)\right), \\ d = \gamma_{c} - \tilde{X}_{bi}\beta + H_{i}\tilde{X}_{ci}\beta. \end{matrix}$$
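In the univariate case, the truncated normal mean $\kappa$ has a simple closed form, which the following sketch evaluates. This is only the one-dimensional special case (a single ordinal measurement), not the multivariate computation that a package such as tmvtnorm provides; the arguments `mean`, `variance` and `upper` play the roles of $F_{i}$, $T_{i}$ and $d$, and the numerical values are hypothetical.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def truncated_normal_mean(mean, variance, upper):
    """E[X | X <= upper] for a univariate X ~ N(mean, variance)."""
    sd = math.sqrt(variance)
    a = (upper - mean) / sd  # standardized upper limit
    return mean - sd * std_normal_pdf(a) / std_normal_cdf(a)


# Standard normal truncated to ]-inf; 0]: the mean is -sqrt(2 / pi).
kappa = truncated_normal_mean(mean=0.0, variance=1.0, upper=0.0)
```

When the upper limit lies far above the mean, the truncation has no practical effect and the function returns a value close to the untruncated mean.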

The corresponding prediction interval and the derivations can be found in Supplementary Appendix C.

A special case of (3.7) arises when the continuous response is modeled conditional solely on the ordinal response. In this case, the expression for the expected value simplifies to

$$E[\tilde{Y}_{ci} \mid \tilde{y}_{bi} \leq c] = E_{i}(V_{i}^{-1}\tilde{X}_{ci}\beta + H'_{i}B_{i}^{-1}\kappa), \tag{3.8}$$

where $\kappa$ is again the expected value of the truncated normal distribution with variance $T_{i}^{*}$, mean $F_{i}^{*}$ and limits $]-\infty; d]$. Further,

$$\begin{matrix} (T_{i}^{*})^{-1} = B_{i}^{-1} - (H'_{i}B_{i}^{-1})'E_{i}(H'_{i}B_{i}^{-1}), \\ F_{i}^{*} = T_{i}^{*}(H'_{i}B_{i}^{-1})'E_{i}(V_{i}^{-1}\tilde{X}_{ci}\beta). \end{matrix}$$

The expressions for the prediction interval and the details of the calculations can be found in Supplementary Appendix D.

To make predictions for the ordinal response, we can derive conditional probabilities for a subset of the ordinal response conditional on another subset of the ordinal response and the continuous response. We denote the subset for which we calculate the probability of being in category $c$ or lower as $\tilde{Y}_{bi}^{a}$, and we calculate it conditional on a subset of the ordinal response, $\tilde{Y}_{bi}^{b}$, and a subset of the continuous response, $\tilde{Y}_{ci}$. The use of the superscripts $b$, $ab$ and $bb$ is analogous to (3.7). The conditional probability can be expressed as follows:

$$f(\tilde{y}_{bi}^{a} \leq c \mid \tilde{y}_{bi}^{b} \leq c, \tilde{y}_{ci}) = \frac{\Phi(\gamma_{c} - \tilde{X}_{bi}\beta - H_{i}(\tilde{y}_{ci} - \tilde{X}_{ci}\beta); B_{i})}{\Phi(\gamma_{c}^{b} - \tilde{X}_{bi}^{b}\beta - H_{i}^{b}(\tilde{y}_{ci} - \tilde{X}_{ci}\beta); B_{i}^{bb})}. \tag{3.9}$$

After applying the logit transformation to ensure that the boundaries are constrained to the unit interval, the corresponding confidence interval can be calculated using the delta method. The gradients of the parameters can be found in Supplementary Appendix E.
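The logit-scale construction of the interval can be sketched as follows. The function assumes that an estimated probability `p` and its standard error `se_p` on the probability scale are already available (in the paper these would follow from the gradients in Supplementary Appendix E); both inputs and the function name are illustrative assumptions.

```python
import math


def logit_delta_ci(p, se_p, z_crit=1.959963984540054):
    """Confidence interval for a probability, built on the logit scale."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    eta = math.log(p / (1.0 - p))
    # Delta method: d logit(p) / dp = 1 / (p (1 - p)).
    se_eta = se_p / (p * (1.0 - p))
    lower = 1.0 / (1.0 + math.exp(-(eta - z_crit * se_eta)))
    upper = 1.0 / (1.0 + math.exp(-(eta + z_crit * se_eta)))
    return lower, upper


# Hypothetical estimate close to the boundary with a large standard error.
low, high = logit_delta_ci(p=0.03, se_p=0.10)
```

Unlike a symmetric interval on the probability scale, the back-transformed limits always stay inside the unit interval, at the price of being asymmetric around the estimate.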

A special case is the conditional density of the ordinal response(s) conditional on solely a subvector of the continuous response vector. The expected probability is then simplified to

$$f(\tilde{y}_{bi} \leq c \mid \tilde{y}_{ci}) = \Phi(\gamma_{c} - \tilde{X}_{bi}\beta - H_{i}(\tilde{y}_{ci} - \tilde{X}_{ci}\beta); B_{i}). \tag{3.10}$$
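To see what $H_{i}$ and $B_{i}$ represent, it helps to evaluate (3.10) in the smallest possible setting: one continuous and one ordinal measurement that share a single random intercept with variance $d$, a residual variance $\sigma^{2}$ for the continuous response, and unit residual variance on the latent probit scale. This shared-intercept setup is a simplifying assumption made for illustration, and all numerical values are hypothetical. In that case $H_{i}$ reduces to the regression slope $d / (d + \sigma^{2})$ of the latent response on the continuous response, and $B_{i}$ to the conditional variance of the latent response.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def conditional_prob_formula(y1, mu1, mu2, gamma_c, d, sigma2):
    """Scalar version of (3.10) with both design 'matrices' equal to 1."""
    K = 1.0 / (1.0 / d + 1.0 / sigma2 + 1.0)
    B = 1.0 / (1.0 - K)
    H = B * K / sigma2
    return std_normal_cdf((gamma_c - mu2 - H * (y1 - mu1)) / math.sqrt(B))


def conditional_prob_direct(y1, mu1, mu2, gamma_c, d, sigma2):
    """Same probability from bivariate normal conditioning of the latent response."""
    slope = d / (d + sigma2)                 # Cov(latent, y1) / Var(y1)
    cond_var = 1.0 + d * sigma2 / (d + sigma2)
    return std_normal_cdf((gamma_c - mu2 - slope * (y1 - mu1)) / math.sqrt(cond_var))


args = dict(y1=16.0, mu1=15.0, mu2=0.2, gamma_c=0.5, d=2.0, sigma2=3.0)
p_formula = conditional_prob_formula(**args)
p_direct = conditional_prob_direct(**args)
```

Both routes give the same probability, so in this special case the conditional model is ordinary normal conditioning followed by a probit transformation.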

Supplementary Appendix F contains the formulas related to the standard errors.

3.5 Correlation function

By using the property that the responses are independent conditional on the random effects, it is feasible to deduce a correlation function from the hierarchical joint model. This function captures the manifest correlation, denoted as $\rho_{Y_{1ij}, Y_{2ik} \leq c}$. It quantifies the relationship between the continuous response $Y_{1i}$ at time $j$ and the event of an ordinal response $Y_{2i}$ in category $c$ or lower at time $k$. This model-based manifest correlation represents the correlation between the scores on the original scale, whereas the latent correlation quantifies the correlation between the underlying random effects. Although calculating the latent correlation is simpler, the scientific interest often focuses on the manifest correlation rather than the latent correlation. The formula for the manifest correlation function is as follows:

$$\rho_{Y_{1ij}, Y_{2ik} \leq c} = \frac{-\frac{1}{L_{i}} z'_{1ij}M_{i}^{-1}z_{2ik}\,\phi(\gamma_{c} - x'_{2ik}\beta; L_{i}^{-1})}{\sqrt{(z'_{1ij}D^{*}z_{1ij} + \Sigma_{1ij})\,\Phi(\gamma_{c} - x'_{2ik}\beta; L_{i}^{-1})\,(1 - \Phi(\gamma_{c} - x'_{2ik}\beta; L_{i}^{-1}))}}, \tag{3.11}$$

where $D^{*}$ denotes the submatrix of $D$ relating to the variances and covariances of the random effects of the responses $Y_{1i}$ and $Y_{2i}$. In addition, $M_{i} = (D^{*})^{-1} + z'_{2ik}z_{2ik}$.

The details of the derivations and the formulas regarding the standard errors can be found in Supplementary Appendix G.
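The structure of the manifest correlation (a normal density in the numerator, and the standard deviation of a Bernoulli event in the denominator) can be illustrated in a stripped-down setting. The sketch below is not equation (3.11) itself: it assumes a standardized continuous variable $U$ and a standardized latent variable $W$ with latent correlation `rho`, and computes the correlation between $U$ and the event $W \leq h$, once in closed form and once by simulation. All names and values are illustrative.

```python
import math
import random


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def std_normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def manifest_corr_closed_form(rho, h):
    """corr(U, 1{W <= h}) for a standard bivariate normal (U, W)."""
    p = std_normal_cdf(h)
    # Cov(U, 1{W <= h}) = rho * E[W 1{W <= h}] = -rho * phi(h).
    return -rho * std_normal_pdf(h) / math.sqrt(p * (1.0 - p))


def manifest_corr_simulated(rho, h, n_draws=100_000, seed=1):
    """Monte Carlo estimate of the same correlation."""
    rng = random.Random(seed)
    s_u = s_e = s_uu = s_ue = 0.0
    for _ in range(n_draws):
        u = rng.gauss(0.0, 1.0)
        w = rho * u + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        e = 1.0 if w <= h else 0.0
        s_u += u
        s_e += e
        s_uu += u * u
        s_ue += u * e
    mean_u, mean_e = s_u / n_draws, s_e / n_draws
    cov = s_ue / n_draws - mean_u * mean_e
    var_u = s_uu / n_draws - mean_u ** 2
    var_e = mean_e * (1.0 - mean_e)
    return cov / math.sqrt(var_u * var_e)


exact = manifest_corr_closed_form(rho=-0.7, h=0.5)
approx = manifest_corr_simulated(rho=-0.7, h=0.5)
```

The manifest correlation is smaller in absolute value than the latent correlation, because dichotomizing the latent variable discards information.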

4 Parameter estimation

The parameters in the joint random-effects model are estimated via maximum likelihood. The likelihood function of the joint random-effects model is constructed under the assumption that the responses are independent given the random effects. As a result, the likelihood function for a joint model of the responses $Y_{ci}$ and $Y_{bi}$ equals

$$L(\theta) = \prod_{i=1}^{N} \int f_{1i}(y_{ci} \mid b_{i})\, f_{2i}(y_{bi} \leq c \mid b_{i})\, f(b_{i} \mid D)\, db_{i}, \tag{4.12}$$

in which the vector $\theta$ contains all parameters of the conditional distributions and the distribution of the random effects $b_{i}$. In most cases, numerical approximations are needed for the integral in (4.12). In this paper, adaptive Gaussian quadrature is used for the estimation, which is implemented in the SAS procedure NLMIXED (Pinheiro and Bates, 1995; Molenberghs and Verbeke, 2005). The code for fitting the joint model can be found in Appendix H or via GitHub.
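To illustrate what the integral in the likelihood looks like for a single subject, the sketch below approximates it on a fixed grid with the trapezoidal rule and compares the result with the scalar version of the joint marginal density (3.5). This is a deliberately simple stand-in for adaptive Gaussian quadrature, not the NLMIXED implementation. It assumes one continuous and one ordinal measurement sharing a single random intercept, with unit residual variance on the latent probit scale; all values are hypothetical.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def contribution_numeric(y1, mu1, mu2, gamma_c, d, sigma2, n_grid=4001, width=8.0):
    """Trapezoidal approximation of the integral over the random intercept b."""
    sd = math.sqrt(d)
    step = 2.0 * width * sd / (n_grid - 1)
    total = 0.0
    for k in range(n_grid):
        b = -width * sd + k * step
        weight = 0.5 if k in (0, n_grid - 1) else 1.0
        total += (weight
                  * normal_pdf(y1, mu1 + b, sigma2)    # continuous response given b
                  * std_normal_cdf(gamma_c - mu2 - b)  # ordinal event given b
                  * normal_pdf(b, 0.0, d))             # random-effect density
    return total * step


def contribution_closed_form(y1, mu1, mu2, gamma_c, d, sigma2):
    """Scalar version of (3.5) with both design 'matrices' equal to 1."""
    V = d + sigma2
    K = 1.0 / (1.0 / d + 1.0 / sigma2 + 1.0)
    B = 1.0 / (1.0 - K)
    H = B * K / sigma2
    alpha = H * (y1 - mu1)
    return normal_pdf(y1, mu1, V) * std_normal_cdf((gamma_c - mu2 - alpha) / math.sqrt(B))


args = dict(y1=16.0, mu1=15.0, mu2=0.2, gamma_c=0.5, d=2.0, sigma2=3.0)
numeric_value = contribution_numeric(**args)
closed_value = contribution_closed_form(**args)
```

The full likelihood is the product of such contributions over all subjects. With several random effects the grid becomes multidimensional, which is why a more efficient scheme such as adaptive Gaussian quadrature is used in practice.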

5 Data analysis

In this section, the relationship between the continuous functioning score (ADL) and the ordinal level of impairment (MMSE) is examined. First, a joint model is implemented, as discussed in Section 3.3. Here, we fit a linear mixed model for ADL (as discussed in Section 3.1) and a generalized linear mixed model with a probit link for MMSE (as discussed in Section 3.2). Next, we allow the random effects of the responses to correlate to create the joint model. Since Fig. 1 clearly indicates that the evolution of ADL is not linear, we will include time as a categorical covariate in the linear mixed model. In contrast, time since the operation is included as a continuous covariate in the generalized linear mixed model for impairment. The model can be written as

$$\begin{matrix} Y_{1ij} & = & \beta_{1,0} + \beta_{1,1} I(\mathrm{Time}_{ij} = 5) + \beta_{1,2} I(\mathrm{Time}_{ij} = 12) + \beta_{1,3} I(\mathrm{Sex}_{i} = F) \\ & & + \beta_{1,4} \mathrm{Age}_{ij} + b_{10i} + b_{11i} \frac{\mathrm{Time}_{ij}}{100} + \epsilon_{1ij}, \\ \Phi^{-1}(P(Y_{2ij} \leq c)) & = & \gamma_{c} - \left(\beta_{2,1} \mathrm{Time}_{ij} + \beta_{2,2} I(\mathrm{Sex}_{i} = F) + \beta_{2,3} \mathrm{Age}_{ij} + b_{20i} + b_{21i} \frac{\mathrm{Time}_{ij}}{100}\right). \end{matrix}$$

The hierarchical models include several random effects: $b_{10i}$ and $b_{20i}$ are the random intercepts for ADL and MMSE, respectively. Next, $b_{11i}$ and $b_{21i}$ are the random slopes for ADL and MMSE, respectively. To account for the correlation between the responses, different assumptions can be made regarding the joint distribution of the random effects, for example setting the correlations to 1 (i.e., a shared random-effects model). However, we have chosen to make the distribution as flexible as possible, i.e., not to impose any restriction on the covariance matrix. We assumed that $[b_{10i}, b_{11i}, b_{20i}, b_{21i}] \sim MVN(0, D)$ and $\epsilon_{1i} \sim MVN(0, \sigma^{2}I_{i})$. The full SAS code can be found in Supplementary Appendix H and on GitHub. Convergence was reached within 10 hours and 19 minutes on a regular laptop (CPU: Intel(R) Core(TM) i7-8850H @ 2.60 GHz, 2592 MHz, 6 cores, 12 logical processors; RAM: 24 GB) and resulted in the parameter estimates shown in Table 2.

Table 2

Parameter estimates (standard errors) of ADLTOT and MMSE.

Effect	ADLTOT	MMSE
Intercept	3.42 (4.50)	–
γ₁	–	-19.17 (5.70)
γ₂	–	-16.61 (5.50)
Time	–	0.04 (0.04)
Time 5	-2.68 (0.35)	–
Time 12	-3.62 (0.57)	–
Sex: Female	-1.58 (1.02)	-0.37 (1.11)
Age	0.20 (0.05)	-0.22 (0.07)
$\sigma^{2}$	3.02 (0.60)	–
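As a reading aid for Table 2, the sketch below plugs the reported point estimates into the two model equations for a subject whose random effects are all zero. These are therefore subject-specific (conditional) quantities, not the marginal or conditional predictions of Sections 3.4 and 3.5, which also require the estimated covariance matrix of the random effects. Only cumulative probabilities are returned, indexed by threshold; the function names are illustrative.

```python
import math


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


# Point estimates copied from Table 2.
ADL_COEF = {"intercept": 3.42, "time5": -2.68, "time12": -3.62,
            "female": -1.58, "age": 0.20}
MMSE_COEF = {"gamma": (-19.17, -16.61), "time": 0.04,
             "female": -0.37, "age": -0.22}


def adl_mean(day, female, age):
    """Fixed-effects mean of ADL on day 1, 5 or 12 (random effects set to 0)."""
    value = ADL_COEF["intercept"] + ADL_COEF["age"] * age
    if female:
        value += ADL_COEF["female"]
    if day == 5:
        value += ADL_COEF["time5"]
    elif day == 12:
        value += ADL_COEF["time12"]
    elif day != 1:
        raise ValueError("time is categorical in the ADL model: use day 1, 5 or 12")
    return value


def mmse_cumulative_probs(day, female, age):
    """P(Y <= c | random effects = 0) for c = 1, 2 under the probit model."""
    eta = MMSE_COEF["time"] * day + MMSE_COEF["age"] * age
    if female:
        eta += MMSE_COEF["female"]
    return [std_normal_cdf(g - eta) for g in MMSE_COEF["gamma"]]


adl_day5 = adl_mean(day=5, female=True, age=78)
cum_day1 = mmse_cumulative_probs(day=1, female=True, age=78)
```

The large negative thresholds in Table 2 are offset by the age effect, so the linear predictor and the thresholds end up on a comparable scale for subjects of around 78 years.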

An association between the responses was indicated by a Wald test, which showed that the covariances among the random effects of the different responses differ significantly from zero ($H_{0}: d_{13} = d_{14} = d_{23} = d_{24} = 0$, $\chi_{df=4}^{2} = 12.11$, $p = 0.02$). The latent correlations between the random effects offer a first glimpse into the relationship between the two responses (Table 3). These values can be interpreted in terms of the latent random effects. For instance, the correlation between the random intercepts ($r = -.69$) shows that immediately after the operation, a lower starting value of ADL (better functioning) than expected based on the covariates is related to a lower probability of having a more severe level of impairment than expected based on the covariates. Still, these are the correlations between the underlying (latent) random effects on the probit scale. It can be of interest to also examine the manifest correlations, as discussed in Section 3.5, which are the model-based correlations between the responses on their original scale.

Table 3

Latent correlations [CI] between the random effects of MMSE and ADL.

	$b_{10i}$	$b_{11i}$	$b_{20i}$	$b_{21i}$
$b_{10i}$	1
$b_{11i}$	.12 [-.44; .61]	1
$b_{20i}$	-.70 [-.89; -.31]	-.38 [-.77; .21]	1
$b_{21i}$	.38 [-.80; .95]	-.07 [-.98; .98]	-.72 [-1; .95]	1

Note that to ensure that the correlation is bounded between -1 and 1, the Fisher Z-transformation is applied to compute the confidence intervals, after which the values are transformed back to the original scale.
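The Fisher Z construction mentioned in the note can be sketched as follows. The sketch assumes that a standard error `se_r` of the correlation on its original scale is available and carries it to the Z scale with the delta method; how the standard errors were obtained in the paper is not stated here, so this is one plausible route rather than the authors' exact computation. The inputs are hypothetical.

```python
import math


def fisher_z_ci(r, se_r, z_crit=1.959963984540054):
    """Confidence interval for a correlation via the Fisher Z-transformation."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)
    # Delta method: d atanh(r) / dr = 1 / (1 - r^2).
    se_z = se_r / (1.0 - r * r)
    return math.tanh(z - z_crit * se_z), math.tanh(z + z_crit * se_z)


# Hypothetical correlation with a large standard error.
lower, upper = fisher_z_ci(r=-0.72, se_r=0.50)
```

With a large standard error the back-transformed limits approach, but never cross, the bounds -1 and 1, which matches the very wide intervals seen for the random slopes in Table 3.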

Using (3.11), the correlations between the responses on their original scale can be computed. Since these model-based correlations depend on the covariate values chosen, we computed them for a man of mean age (78). They are displayed in Table 4. Functional impairment is at each time point significantly correlated with the event of a severe cognitive impairment and the event of impairment. The correlation is quite constant over time, but better cognitive functioning is consistently related to a higher probability of having no impairment and a lower probability of having severe impairment.

Table 4

Correlations between ADL (higher: lower functioning) and MMSE (cognitive impairment) for a 78-year-old man.

Panel A: Manifest correlations between ADL and the event of having severe impairment.
	Time (Impairment)
Time (ADL)	1	3	5	8	12
1	.44 [.28; .58]	.44 [.29; .57]	.44 [.29; .57]	.43 [.28; .57]	.42 [.26; .57]
5	.48 [.34; .60]	.48 [.34; .60]	.48 [.34; .60]	.48 [.34; .61]	.48 [.31; .61]
12	.47 [.26; .65]	.48 [.28; .64]	.48 [.29; .64]	.49 [.29; .64]	.49 [.28; .65]

Panel B: Manifest correlations between ADL and the event of having impairment.
	Time (Impairment)
Time (ADL)	1	3	5	8	12
1	.47 [.31; .60]	.47 [.32; .60]	.47 [.33; .60]	.48 [.34; .60]	.48 [.32; .61]
5	.51 [.38; .62]	.52 [.39; .62]	.52 [.41; .62]	.53 [.41; .63]	.54 [.41; .65]
12	.50 [.30; .66]	.51 [.32; .66]	.52 [.34; .66]	.54 [.37; .67]	.55 [.37; .70]

Table 5 presents the predicted MMSE statuses on days 8 and 12, conditional on ADL scores recorded on days 1 and 5. The latter ADL scores were set to one standard deviation below, at, or one standard deviation above the respective day's mean. Additionally, sex was set to female, and age was set to 78 years. The predictions are derived from the parameter estimates obtained from the joint model, which were then substituted into the conditional model (3.10). The results in Table 5 show that a history of strong reliance on a caregiver corresponds to a high probability of cognitive impairment both in the present and in the immediate future. Furthermore, the confidence intervals emphasize that a low ADL score, indicative of low caregiver dependence, holds limited predictive value for MMSE status. This is in contrast to moderate or high ADL scores, which demonstrate a stronger association with MMSE outcomes.

Table 5

Prediction of cognitive impairment based on the history of ADL on days 1 and 5 for a 78-year-old woman.

Timepoint prediction	History ADL (day 1–day 5)	P(Impairment=Severe)	P(Impairment)
8	14.57–10.87	0.03 [0.00; 0.94]	0.22 [0.12; 0.35]
8	18.10–15.42	0.25 [0.17; 0.36]	0.68 [0.52; 0.80]
8	21.63–19.97	0.72 [0.53; 0.86]	0.96 [0.75; 0.99]
12	14.57–10.87	0.02 [0.00; 1.00]	0.19 [0.47; 0.84]
12	18.10–15.42	0.21 [0.12; 0.35]	0.66 [0.48; 0.80]
12	21.63–19.97	0.69 [0.47; 0.84]	0.95 [0.73; 0.99]

6 Concluding remarks

In this research, our primary focus has been on introducing two new methodologies that are built upon the foundation of joint models: The first methodology involves closed-form expressions to obtain the manifest correlations from the model between the responses on their original scale, providing an alternative to investigating latent correlations between the underlying random effects on the probit-scale. Our second methodology entails employing conditional joint models in lieu of time-dependent covariates to analyze the effect of a longitudinal predictor on the longitudinal response. This shift effectively sidesteps several complications associated with time-dependent covariates. Firstly, the need for specifying lags is obviated with the joint modeling approach. With manifest correlations, we can assess the effect of a predictor on the response at each time point. In addition, our conditional model allows for straightforward adjustments of lags without refitting the joint model. Secondly, due to the symmetric nature of the relationship in a joint model, challenges posed by endogeneity or intermediary variables are mitigated. Thirdly, the presence of missing data necessitates no additional steps, thanks to the principle of ignorability (Rubin, 1976). Consequently, our methodology becomes highly suitable for unbalanced data, as it operates without the requirement of lags or additional methods for handling missing data.

Moreover, our paper extends the conditional model to scenarios involving multiple longitudinal predictors. By stacking all predictors into a single long vector and adapting the design matrices accordingly, the methodology carries over directly.
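The stacking step can be sketched as follows. This is only one plausible reading of "adapting the design matrices": it assumes that each predictor has its own fixed effects, so that the combined design matrix is block-diagonal. The helper `block_diag` and the data values are hypothetical.

```python
def block_diag(*blocks):
    """Place matrices (lists of rows) on the diagonal of a larger zero matrix."""
    total_cols = sum(len(b[0]) for b in blocks)
    out, offset = [], 0
    for b in blocks:
        width = len(b[0])
        for row in b:
            left = [0.0] * offset
            right = [0.0] * (total_cols - offset - width)
            out.append(left + list(row) + right)
        offset += width
    return out


# Hypothetical data: two longitudinal predictors with unequal numbers of visits.
y_a = [14.6, 12.1, 10.9]          # predictor A on days 1, 5, 12
X_a = [[1, 1], [1, 5], [1, 12]]   # intercept and time
y_b = [3.2, 2.9]                  # predictor B on days 1 and 5
X_b = [[1, 1], [1, 5]]

y_stacked = y_a + y_b               # one long response vector
X_stacked = block_diag(X_a, X_b)    # 5 rows, 4 columns
print(y_stacked)
print(X_stacked)
```

Because the predictors need not share measurement times, the stacked representation accommodates unbalanced data without further changes.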

To illustrate the practical utility of our methodology, we included a case study investigating the association between two longitudinal responses: a continuous physical functioning score and an ordinal mental functioning score. We show that predictions of the ordinal response can be derived from the history of the continuous response. The missingness is assumed to be at random, so that, by ignorability (Rubin, 1976), no additional steps are needed in the data analysis. The joint model can be fitted with the NLMIXED procedure in SAS. Code is provided in Supplementary Appendix H, both for transforming the data into the correct format and for fitting the joint model.

A limitation of our methodology is its reliance on the proportional odds assumption inherent in the ordinal regression model. Extension to non-proportionality is possible by allowing (certain) covariate effects to be category-dependent, but care must then be taken to ensure that the resulting category probabilities are non-negative. A second drawback is the computational complexity of a joint model, especially when more than two responses are included. Various methodologies can be used to reduce this burden, such as the pairwise fitting approach for high-dimensional data (Fieuws and Verbeke, 2006), the split-sample approach for large datasets (Molenberghs et al., 2011), or a combination of both (Ivanova et al., 2017). A third limitation arises from the dependence of the correlations and confidence intervals on the selected random-effects structure. It is therefore recommended to model the random effects with a high degree of flexibility, potentially incorporating splines. Importantly, our methodology remains applicable in that scenario.
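The caveat about non-negative probabilities can be made concrete with a small sketch. It assumes a cumulative probit parameterization P(Y ≤ k | x) = Φ(θ_k − β_k x), which may differ in sign convention from the model fitted in the paper; the function `category_probs` and all parameter values are hypothetical. With a common slope the cumulative curves never cross, whereas category-dependent slopes let them cross at some covariate value, beyond which a category "probability" becomes negative.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function


def category_probs(x, thresholds, slopes):
    """Category probabilities from cumulative probits Phi(theta_k - beta_k * x)."""
    cum = [0.0] + [PHI(t - b * x) for t, b in zip(thresholds, slopes)] + [1.0]
    return [cum[k + 1] - cum[k] for k in range(len(cum) - 1)]


# Common slope: cumulative curves are parallel on the probit scale.
parallel = category_probs(3.0, [-0.5, 0.7], [0.4, 0.4])

# Category-dependent slopes: the two linear predictors cross at x = 1.5,
# so at x = 3.0 the second cumulative probability lies below the first.
crossing = category_probs(3.0, [-0.5, 0.7], [0.1, 0.9])

print(parallel)   # all entries non-negative
print(crossing)   # the middle entry is negative
```

In practice this means that a non-proportional extension is only valid over the covariate range in which the cumulative curves keep their order, which should be checked for the fitted parameter values.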

Further research can be conducted on the bounds of the manifest correlation function. Research in the context of surrogate markers (Alonso and Molenberghs, 2007) and the Bahadur model (Molenberghs and Verbeke, 2005) has shown that the bounds of, respectively, the R² and the correlation between dichotomous responses can generally be smaller than one. In addition, it can be of interest to implement the methodology in a SAS macro to improve usability. Finally, other approaches can be explored, such as multiple imputation models, where the value of one longitudinal variable is imputed via a random-effects model and then included as a time-dependent covariate in the other longitudinal model. Still, some issues of time-dependent covariates would persist, such as endogeneity, the possibility of intermediate variables, and the definition of lags.
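For the dichotomous case mentioned above, the restriction follows from the Fréchet bounds on the joint success probability of two binary variables. The sketch below computes the attainable correlation range for given marginal probabilities; it covers only two binary responses, not the manifest correlation between a continuous and an ordinal response studied in this paper, and the function name `binary_correlation_bounds` is hypothetical.

```python
from math import sqrt


def binary_correlation_bounds(p, q):
    """Attainable (lower, upper) correlation for two binary variables
    with success probabilities p and q, both strictly between 0 and 1."""
    scale = sqrt(p * (1.0 - p) * q * (1.0 - q))
    # Frechet bounds on the joint probability P(A = 1, B = 1).
    joint_max = min(p, q)
    joint_min = max(0.0, p + q - 1.0)
    return (joint_min - p * q) / scale, (joint_max - p * q) / scale


lower, upper = binary_correlation_bounds(0.2, 0.6)
print(lower, upper)  # both bounds lie strictly inside (-1, 1)
```

The upper bound reaches one only when the two marginal probabilities are equal, which is why a correlation well below one between dichotomous responses may already be close to its maximum.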

Supplementary material

Supplementary material is available at Biostatistics Journal online.

Funding

None declared.

Conflict of interest statement. None declared.

References

Alonso A, Molenberghs G. Surrogate marker evaluation from an information theory perspective. Biometrics. 2007:63(1):180–186.

Breslow NE, Clayton DG. Approximate inference in generalized linear mixed models. J Am Stat Assoc. 1993:88(421):9–25.

Cavender JB, Rogers WJ, Fisher LD, Gersh BJ, Coggin CJ, Myers WO. Effect of smoking on survival and morbidity in patients randomized to medical or surgical therapy in the coronary artery surgery study (CASS): 10-Year follow-up. J Am College Cardiol. 1992:20(2):287–294.

Chakraborty H, Helms RW, Sen PK, Cohen MS. Estimating correlation by using a general linear mixed model: evaluation of the relationship between the concentration of HIV-1 RNA in blood and semen. Stat Med. 2003:22(9):1457–1464.

Delporte M, Fieuws S, Molenberghs G, Verbeke G, Situma Wanyama S, Hatziagorou E, De Boeck C. A joint normal–binary (probit) model. Int Stat Rev. 2022:90:S37–S51.

Diggle P. Analysis of longitudinal data. New York: Oxford Statistical Science Series; 2002. p. 245–281.

Engel B, Keen A. A simple approach for the analysis of generalized linear mixed models. Stat Neerlandica. 1994:48(1):1–22.

Faes C, Geys H, Aerts M, Molenberghs G, Catalano PJ. Modeling combined continuous and ordinal outcomes in a clustered setting. J Agric Biol Environ Stat. 2004:9(4):515–530.

Fieuws S, Verbeke G. Pairwise fitting of mixed models for the joint modeling of multivariate longitudinal profiles. Biometrics. 2006:62(2):424–431.

Fieuws S, Verbeke G, Boen F, Delecluse C. High dimensional multivariate mixed models for binary questionnaire data. J R Stat Soc Ser C (Appl Stat). 2006:55(4):449–460.

Hedeker D, Gibbons RD. A random-effects ordinal regression model for multilevel analysis. Biometrics. 1994:50(4):933–944.

Ivanova A, Molenberghs G, Verbeke G. Mixed models approaches for joint modeling of different types of responses. J Biopharm Stat. 2016:26(4):601–618.

Ivanova A, Molenberghs G, Verbeke G. Fast and highly efficient pseudo-likelihood methodology for large and complex ordinal data. Stat Methods Med Res. 2017:26(6):2758–2779.

Kundu M. Implementation of pairwise fitting technique for analyzing multivariate longitudinal data in SAS. Nashville, TN, United States: PharmaSUG; 2011, p. 1–12.

Laird NM, Ware JH. Random-effects models for longitudinal data. Biometrics. 1982:38(4):963–974.

Manjunath BG, Wilhelm S. Moments calculation for the doubly truncated multivariate normal density. J Behav Data Sci. 2021:1(1):13–33.

Milanzi E, Molenberghs G, Alonso A, Verbeke G, De Boeck P. Reliability measures in item response theory: manifest versus latent correlation functions. Br J Math Stat Psychol. 2015:68(1):43–64.

Milisen K, Abraham IL, Broos PL. Postoperative variation in neurocognitive and functional status in elderly hip fracture patients. J Adv Nursing. 1998:27(1):59–67.

Molenberghs G, Verbeke G. Models for discrete longitudinal data. New York, NY: Springer; 2005.

Molenberghs G, Verbeke G, Iddi S. Pseudo-likelihood methodology for partitioned large and complex samples. Stat Probab Lett. 2011:81(7):892–901.

Pinheiro JC, Bates DM. Approximations to the log-likelihood function in the nonlinear mixed-effects model. J Comput Graphical Stat. 1995:4(1):12–35.

Poddar A. Analysis of dependent discrete choices using Gaussian Copula. PhD thesis, Old Dominion University, 2016.

Qian T, Klasnja P, Murphy SA. Linear mixed models with endogenous covariates: modeling sequential treatment effects with application to a mobile health study. Stat Sci. 2020:35(3):375–390.

Rizopoulos D. Joint models for longitudinal and time-to-event data: With applications in R. New York: Chapman and Hall/CRC; 2012, p. 100.

Rubin DB. Inference and missing data. Biometrika. 1976:63(3):581–592.

Tombaugh T, McIntyre N. The mini-mental state examination: a comprehensive review. J Am Geriatrics Soc. 1992:40(9):922–935.

Verbeke G, Molenberghs G. Linear mixed models for longitudinal data. New York: Springer Series in Statistics; 2000; p. 24.

Wolfinger R, O’Connell M. Generalized linear mixed models: a pseudo-likelihood approach. J Stat Comput Simul. 1993:48:233–243.