Abstract

Trials in cardiac surgery are often hampered at the design level by small sample sizes and ethical considerations. The conventional analytical approach, combining frequentist statistics with null hypothesis significance testing, has known limitations, and its associated P-values are often misinterpreted, leading to dichotomous interpretations of trial results. The Bayesian statistical framework may overcome these limitations through probabilistic reasoning and is therefore introduced in this Primer. The Bayesian framework combines prior beliefs with the currently obtained data (the likelihood), resulting in updated beliefs, also known as posterior distributions. These distributions, in turn, facilitate probabilistic interpretations. Several previous cardiac surgery trials have been performed under a Bayesian framework, and this Primer enhances the understanding of their basic concepts by linking results to graphical presentations. Furthermore, contemporary trials that were initially analysed under a frequentist framework are re-analysed within a Bayesian framework to demonstrate several interpretative advantages.

INTRODUCTION

For statistical inference, cardiovascular researchers often rely on observed outcome frequencies in sample data and calculated P-values [1–4]. This is called the frequentist-null hypothesis significance testing (NHST) approach. Nevertheless, the frequentist-NHST paradigm is associated with several well-known limitations. First, it does not take into account any existing previous evidence. Furthermore, P-values depend not only on the magnitude of the treatment effect but also on the precision with which the effect is measured; consequently, they are strongly influenced by sample size. Moreover, by combining the P-value with a uniform significance level (α), the resulting dichotomization into statistical significance or non-significance further complicates a valid interpretation of treatment effects [5, 6]. This is a particular challenge in cardiac surgery trials, which are often constrained by smaller sample sizes, operational challenges and ethical considerations, making it difficult to adequately power such studies.

In contrast to the deductive reasoning represented by the frequentist-NHST paradigm, Bayesianism is an example of inductive reasoning and is reflective of clinical reasoning [1]. The aim of this Primer is to familiarize the interested surgical researcher, who is acquainted with the frequentist-NHST paradigm, with Bayesian statistics in cardiac surgery trials.

FREQUENTISM, NULL HYPOTHESIS SIGNIFICANCE TESTING AND P-VALUE MISCONCEPTIONS

Conceptually, frequentist statistical tests are performed under the assumption that the null hypothesis (H0, typically presuming that there is no treatment effect at all) is true. Therefore, the probability that H0 is true cannot be quantified (the same holds for the probability of the alternative hypothesis, H1). Instead, the probability of the observed frequencies (i.e. the trial results) given the null hypothesis is calculated (mathematical denotation: P (data | hypothesis)). Consequently, the P-value represents the probability of observing these (or more extreme) frequencies under the assumption that H0 is true. Although often considered an improvement, the frequentist-NHST 95% confidence interval (CI) is hampered by similar interpretative difficulties, as it is the interval constructed such that, under repeated sampling, 95% of such intervals would contain the true parameter value.
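
As a small illustration of this repeated-sampling interpretation, the following simulation sketch (not part of the original article; the event risks and sample size are arbitrary assumptions) generates many identical two-arm trials and counts how often the conventional 95% CI for the risk difference captures the true value.

```python
# Minimal sketch: the repeated-sampling meaning of a 95% confidence interval.
# We simulate many identical two-arm trials with a known true risk difference and
# count how often the Wald 95% CI for the risk difference contains that true value.
import numpy as np

rng = np.random.default_rng(1)
p_control, p_treat, n = 0.10, 0.07, 500   # hypothetical true event risks and arm size
true_rd = p_treat - p_control

covered = 0
n_sim = 10_000
for _ in range(n_sim):
    x_t = rng.binomial(n, p_treat)
    x_c = rng.binomial(n, p_control)
    pt, pc = x_t / n, x_c / n
    se = np.sqrt(pt * (1 - pt) / n + pc * (1 - pc) / n)
    lo, hi = (pt - pc) - 1.96 * se, (pt - pc) + 1.96 * se
    covered += (lo <= true_rd <= hi)

print(f"Proportion of 95% CIs containing the true risk difference: {covered / n_sim:.3f}")  # ~0.95
```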

Table 1 summarizes the main features of the frequentist-NHST approach. Table 2 elaborates on the most common misconceptions about its associated P-values [3–6]. Importantly, many characteristics that are often ascribed to the P-value (as presented in Table 2) are actually features of the Bayesian posterior distribution. This highlights an apparent need for a statistical approach capable of providing such answers.

Table 1: Features and differences in frequentism and Bayesianism

Mathematical denotation
  Frequentism-NHST: P (data | hypothesis)
  Bayesianism: P (hypothesis | data)

Conceptual explanation
  Frequentism-NHST: Estimates the probability of observing the data (or more extreme data in repeated similar experiments), under the assumption that H0 is true
  Bayesianism: Uses a posterior distribution, derived from a weighted combination of a prior belief and the current data, to provide probability estimates of a hypothesis given the observed data

Introduction of prior data
  Frequentism-NHST: Impossible
  Bayesianism: Central aspect, required to obtain the posterior distribution

Use of current data (likelihood)
  Frequentism-NHST: Central aspect, used to assess the consistency of the data with a specific parameter value or hypothesis
  Bayesianism: Central aspect, quantifying the support of the observed data for all possible parameter values and serving as a relative measure of evidence

Adherence to the likelihood principle
  Frequentism-NHST: Violates the likelihood principle
  Bayesianism: Fully respects the likelihood principle

Inference basis
  Frequentism-NHST: Incorporates P-values, significance levels and the sampling distribution
  Bayesianism: Based entirely on the posterior distribution

Inferential interval
  Frequentism-NHST: The confidence interval (usually 95%), representing the interval that would contain the parameter of interest in 95% of an infinite series of similar future experiments
  Bayesianism: The credible interval (usually 95%, or the highest posterior density interval [1]), representing the interval for which there is 95% (un)certainty that it contains the parameter of interest

Probabilistic quantification of a specific hypothesis
  Frequentism-NHST: Impossible
  Bayesianism: Directly available from the area under the curve of the posterior distribution

Partly based on Heuts et al. [1].

H0: null hypothesis; NHST: null hypothesis significance testing.


Table 2: Common P-value misconceptions

Misconception: The P-value represents the probability of H0 being true.
Adequate interpretation: As the P-value is calculated under the assumption that H0 is true, it cannot represent the probability of a hypothesis. Instead, it represents the probability of finding similar (or more extreme) data in future similar experiments, under the assumption that H0 is true.

Misconception: The P-value represents 1 minus the probability of H1 being true.
Adequate interpretation: Similarly, as the P-value is calculated under the assumption that H0 is true, it cannot simultaneously represent the probability that H1 is true.

Misconception: A P-value >0.05 implies that H0 is true.
Adequate interpretation: As the P-value is calculated under the assumption that H0 is true, a P-value above 0.05 merely implies that similar or more extreme data would be observed in >5% of similar future experiments.

Misconception: A P-value <0.05 implies that H1 is true.
Adequate interpretation: Similarly to the previous two misconceptions, a P-value below 0.05 only represents the <5% probability that similar or more extreme data would be observed in such future experiments.

Misconception: A P-value <0.05 implies that there is an important difference between groups.
Adequate interpretation: A small P-value merely indicates that the observed data are unlikely under the null hypothesis, which can occur in very large trials even when the treatment effect is very small.

Misconception: A P-value >0.05 implies that there is no difference between groups.
Adequate interpretation: A large P-value simply indicates that the observed data are not that unusual under the null hypothesis, which may particularly occur in smaller trials.

Partly based on Goodman [5] and Greenland et al. [6].

H0: null hypothesis; H1: alternative hypothesis.


BAYESIANISM AND ITS PRINCIPLES

In contrast to the frequentist-NHST framework, the Bayesian framework actually estimates the probability of a hypothesis, in the light of the current data. This can also be denoted as P (hypothesis | data) (Table 1). Bayesian methodology is based on the combination of a previous belief (the ‘prior’) with the currently obtained data (the likelihood), leading to a posterior belief (the ‘posterior’), as denoted by Bayes’ Theorem: P(A|B) = P(B|A) × P(A) / P(B). The terminology that is used throughout this Primer is explained in Supplementary Material S1. Supplementary Material S2 explains the features of Bayes’ Theorem in more detail, elucidating the abovementioned formula. Figure 1 visually presents the interplay between prior, likelihood and posterior, using Bayes’ Theorem in two scenarios.

Figure 1: Principles of Bayesian statistics. Visual presentation of Bayes’ Theorem (Supplementary Material S2), presenting the interplay between prior, likelihood and posterior. (A) Example of a hypothetical trial analysed under a minimally informative prior (assuming no difference, with a wide distribution of clinically plausible effect sizes), in which the posterior is virtually identical to the likelihood, as the prior exerts only a negligible effect. (B) Example of an analysis of a hypothetical trial under an informed prior, in which the posterior results from both the prior and the likelihood and consequently has a mean effect with a narrower distribution, reflecting an increased level of certainty.
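
As a minimal numerical illustration of Bayes’ Theorem, the sketch below applies the formula to a single hypothesis with entirely hypothetical probabilities; it is intended only to show how a prior belief and the support of the data combine into a posterior probability.

```python
# Minimal sketch of Bayes' Theorem with hypothetical numbers (not from the Primer):
# P(hypothesis | data) = P(data | hypothesis) * P(hypothesis) / P(data).
p_h = 0.30            # prior probability that the treatment is effective (assumption)
p_data_h = 0.80       # P(observing these trial data | treatment effective) (assumption)
p_data_not_h = 0.20   # P(observing these trial data | treatment not effective) (assumption)

# P(data) by the law of total probability
p_data = p_data_h * p_h + p_data_not_h * (1 - p_h)

posterior = p_data_h * p_h / p_data
print(f"Posterior probability that the treatment is effective: {posterior:.2f}")  # ~0.63
```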

Notably, since the posterior is a probability distribution, it allows for estimating the probability of various hypotheses. This includes the presence of ‘any’ difference between groups [(absolute) risk difference >0% or risk ratio >1.0], as well as clinically relevant treatment effects, such as a 1.0% mortality increase. Such clinically relevant treatment effects are often referred to as the ‘minimal clinically important difference’.
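
For illustration, the sketch below (using an assumed, purely hypothetical normal posterior for the risk difference) shows how such probabilities are read off as areas under the posterior distribution.

```python
# Minimal sketch (assumed numbers): once a posterior for the risk difference is available,
# the probability of any hypothesis is the corresponding area under the posterior.
from scipy import stats

# Assume the posterior for the absolute risk difference (in %) is approximately normal
# with mean +2.0% and SD 1.5% (illustrative values only).
posterior = stats.norm(loc=2.0, scale=1.5)

p_any = posterior.sf(0.0)    # P(RD > 0%): any difference in this direction
p_mcid = posterior.sf(1.0)   # P(RD > 1%): a difference exceeding an assumed 1% MCID
print(f"P(any difference, RD > 0%)      = {p_any:.2f}")
print(f"P(difference exceeding the MCID) = {p_mcid:.2f}")
```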

The prior

The elicited prior reflects the previous belief regarding a treatment effect. This belief can be based on clinical experience, an attitude towards a treatment effect or, preferably, previous (trial) data [1, 2]. The prior is arguably the most debated aspect of the Bayesian approach, as it may introduce subjectivity. To mitigate this subjectivity, a minimally informative prior can serve as a valid starting point for the analysis of randomized controlled trials (Fig. 1A) [2]. Such a prior centres around no effect [i.e. an (A)RD of 0% or a risk ratio of 1.0] with a wide distribution capturing clinically plausible effect sizes, so that it exerts only a negligible influence on the posterior.

Trial data can also be analysed under informed priors, which can be derived from previous trial data and subsequent meta-analyses (Fig. 1B) [1]. However, it is important that such a prior is realistic, which implies that it is derived from a similar study design (i.e. previous randomized data when analysing a randomized controlled trial) and a similar study population (i.e. in terms of risk, intervention and control). Of note, unrealistic priors can have a marked, though inappropriate, effect on the posterior.

Lastly, reference priors can be applied to represent enthusiastic, pessimistic or sceptical prior views, and they may consequently serve as sensitivity analyses [1, 2, 7]. If the posterior is relatively insensitive to these views, one can consider the trial results robust.
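
The sketch below illustrates such a sensitivity analysis under assumed numbers: the same hypothetical trial likelihood is combined with minimally informative, sceptical and enthusiastic priors, using a simple normal-normal approximation on the log odds ratio scale.

```python
# Minimal sketch (assumed numbers): the same likelihood combined with minimally informative,
# sceptical and enthusiastic priors, as a prior sensitivity analysis. All quantities are on
# the log odds ratio scale; the normal-normal conjugate update is a convenient approximation.
import numpy as np

def posterior_normal(prior_mean, prior_sd, lik_mean, lik_sd):
    """Conjugate normal-normal update: precision-weighted mean and combined SD."""
    w_prior, w_lik = 1 / prior_sd**2, 1 / lik_sd**2
    post_var = 1 / (w_prior + w_lik)
    post_mean = post_var * (w_prior * prior_mean + w_lik * lik_mean)
    return post_mean, np.sqrt(post_var)

lik_mean, lik_sd = -0.30, 0.15   # hypothetical trial estimate of the log odds ratio

priors = {
    "minimally informative": (0.0, 2.0),
    "sceptical":             (0.0, 0.1),   # concentrated around no effect
    "enthusiastic":          (-0.3, 0.2),  # centred on a benefit
}
for name, (m, s) in priors.items():
    pm, ps = posterior_normal(m, s, lik_mean, lik_sd)
    print(f"{name:22s}: posterior log OR {pm:+.2f} (SD {ps:.2f})")
```

If the three posteriors lead to similar conclusions, the trial results can be considered robust to the choice of prior.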

Ideally, all priors (minimally informative, informed and reference priors) are predefined in a trial protocol and statistical analysis plan to guarantee objective conduct [7].

The likelihood

The likelihood in Bayesianism may seem similar to the likelihood in frequentism, although there are important conceptual differences. In Bayesianism, the likelihood quantifies how well the data support all different parameter values, and is used to update beliefs about the parameter via Bayes’ Theorem. In contrast, in frequentism-NHST, the likelihood represents the probability of the observed data conditional on a specific hypothesis, as parameters in this paradigm are treated as single, fixed (but unknown) entities. It is then used to assess the consistency of the data with a fixed parameter value.
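
As an illustration of the likelihood as a function of the parameter, the sketch below evaluates a binomial likelihood over a grid of candidate event risks for hypothetical data (12 events in 100 patients).

```python
# Minimal sketch: the Bayesian likelihood evaluated over all candidate parameter values.
# Hypothetical data: 12 events among 100 treated patients.
import numpy as np
from scipy import stats

events, n = 12, 100
risk_grid = np.linspace(0.01, 0.40, 400)             # candidate event risks
likelihood = stats.binom.pmf(events, n, risk_grid)   # support of the data for each candidate value

best = risk_grid[np.argmax(likelihood)]
print(f"Event risk most consistent with the data (maximum likelihood): {best:.2f}")  # ~0.12
```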

Combining prior and likelihood: the posterior

Through Bayes’ Theorem, the product of the prior and the likelihood yields the posterior distribution. Visually, the prior plot (Fig. 1, particularly B) is a probability density function (PDF) over the range of plausible parameter values, where its shape and location indicate how strongly certain parameter values are believed before the data are observed. The likelihood plot is also a curve (though not a PDF), with its peak corresponding to the parameter value most consistent with the data. The posterior plot is proportional to the product of the heights of the prior and likelihood curves, and is normalized to produce a PDF. For more in-depth computational aspects of the calculation of the posterior distribution, we refer to previous guiding articles [1, 8].
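
A minimal grid-approximation sketch of this product rule is given below, using hypothetical data and an assumed Beta prior; it is a didactic simplification rather than the computational approach used in the referenced articles.

```python
# Minimal sketch of the product rule: posterior is proportional to prior x likelihood.
# Hypothetical data: 12/100 events; assumed prior belief: Beta(2, 20) (roughly 9% expected risk).
import numpy as np
from scipy import stats

grid = np.linspace(0.001, 0.5, 1000)
prior = stats.beta.pdf(grid, 2, 20)            # prior density over the event risk
likelihood = stats.binom.pmf(12, 100, grid)    # likelihood of the observed data at each grid value
posterior = prior * likelihood
posterior /= posterior.sum()                   # normalize to a discrete probability mass over the grid

post_mean = (grid * posterior).sum()
print(f"Posterior mean event risk: {post_mean:.3f}")
```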

The posterior is summarized by a mean effect size and a 95% credible interval (CrI, or highest posterior density interval). Interestingly, this 95% CrI is a probability interval and therefore reflective of the interval for which there is 95% (un)certainty that it contains the true effect (in contrast to the frequentist 95% CI, Table 1).
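
For completeness, the sketch below shows how an equal-tailed 95% CrI and a 95% highest posterior density interval can be obtained from posterior draws; the draws themselves come from an assumed normal posterior and are purely illustrative.

```python
# Minimal sketch: equal-tailed 95% credible interval and 95% highest density interval (HDI)
# from posterior draws (here, draws from an assumed normal posterior, in % risk difference).
import numpy as np

rng = np.random.default_rng(7)
draws = rng.normal(loc=-2.0, scale=1.5, size=100_000)   # hypothetical posterior draws

# Equal-tailed 95% credible interval
cri = np.percentile(draws, [2.5, 97.5])

# 95% HDI: the shortest interval containing 95% of the draws
sorted_draws = np.sort(draws)
n_in = int(np.floor(0.95 * len(sorted_draws)))
widths = sorted_draws[n_in:] - sorted_draws[:-n_in]
i = np.argmin(widths)
hdi = (sorted_draws[i], sorted_draws[i + n_in])

print(f"95% CrI: {cri[0]:.2f} to {cri[1]:.2f}")
print(f"95% HDI: {hdi[0]:.2f} to {hdi[1]:.2f}")
```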

Practical applications of Bayesian thinking

Bayesian inference mirrors clinical reasoning through sequential updating. One prominent historical example of Bayesian thinking originates from the seminal 1979 publication by Diamond and Forrester [9]. In their calculations, characteristics such as age, sex and symptomatology determined the prior, which was then updated by the use of functional tests (electrocardiographic exercise testing), yielding a posterior probability of obstructive coronary artery disease (CAD). The posterior probability of CAD can therefore differ between a 70-year-old male with typical chest pain and a 20-year-old female with atypical symptoms, despite both having a similarly ‘positive’ stress test [9].
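
The sketch below reproduces this style of reasoning with assumed test characteristics (not the original publication’s values), showing how the same ‘positive’ stress test yields very different post-test probabilities depending on the pre-test probability.

```python
# Minimal sketch of sequential Bayesian updating in the style of Diamond and Forrester.
# Sensitivity, specificity and pre-test probabilities are illustrative assumptions only.
def post_test_probability(pre_test, sensitivity, specificity):
    """Posterior probability of disease after a positive test, via Bayes' Theorem."""
    p_positive = sensitivity * pre_test + (1 - specificity) * (1 - pre_test)
    return sensitivity * pre_test / p_positive

sens, spec = 0.68, 0.77   # assumed exercise-ECG characteristics

# Same 'positive' stress test, very different priors:
high_pretest = post_test_probability(pre_test=0.90, sensitivity=sens, specificity=spec)
low_pretest = post_test_probability(pre_test=0.05, sensitivity=sens, specificity=spec)

print(f"Post-test probability of CAD, typical angina in a 70-year-old male:      {high_pretest:.2f}")
print(f"Post-test probability of CAD, atypical symptoms in a 20-year-old female: {low_pretest:.2f}")
```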

THE APPLICABILITY OF BAYESIAN STATISTICAL INFERENCE IN CARDIAC SURGERY TRIALS

Cardiac surgery trials can especially benefit from the Bayesian approach for three important reasons:

  • Sample sizes are relatively limited in cardiac surgery trials, potentially leading to the erroneous conclusion that an intervention is ineffective, although a clinically relevant treatment effect may be present.

  • A surgical treatment effect should be clinically meaningful (i.e. rather large), regardless of statistical significance. The Bayesian approach facilitates the estimation of the probability of such treatment effects.

  • For many interventions, meaningful prior evidence exists, which can be incorporated into the analysis of the current data.

MATERIALS AND METHODS

The presented re-analyses constitute original work by the authors and were performed with dedicated Bayesian statistical software programs (JASP, JASP team, 2024, version 0.19.0 for Mac, Amsterdam, the Netherlands).

Data availability

All data that were used for the re-analyses and settings in the statistical program JASP will be made openly available upon publication through https://github.com/samuelheuts/Bayes_in_Cardiac_Surgery.

PREVIOUS APPLICATIONS OF BAYESIAN STATISTICS IN CARDIAC SURGERY TRIALS

Below, we outline several trials that incorporated a primary Bayesian statistical analysis plan. These are clarified by making use of visual representations of their posterior distributions.

SURTAVI and EVOLUT-LR

Most of the trials comparing transcatheter aortic valve implantation to surgical aortic valve replacement applied a non-inferiority design. In the series of trials evaluating self-expandable valves, the Bayesian framework was used (SURTAVI [10] and EVOLUT-LR [11]).

In SURTAVI [10], the probability of meeting the non-inferiority margin (a 7% absolute risk increase) was estimated from the posterior distribution of the risk difference (transcatheter aortic valve implantation minus surgical aortic valve replacement), and non-inferiority would be declared if the posterior probability that the treatment effect was less than this margin exceeded 97.1% (a threshold chosen to control the type I error rate, based on extensive simulations) [12]. Figure 2A presents the reconstructed posterior of SURTAVI’s primary outcome under a minimally informative prior, and illustrates how this posterior distribution can be used to estimate the probability of both superiority and non-inferiority in the same analysis.

Figure 2: Visual presentation of the original Bayesian analyses of the SURTAVI [10] and EVOLUT-LR [11] trials. The numbers between brackets denote the 95% credible interval. SAVR: surgical aortic valve replacement; TAVI: transcatheter aortic valve implantation.

Similarly, EVOLUT-LR estimated the probability of non-inferiority (6% absolute risk increase), which would be declared if the posterior probability of the risk difference being less than this margin exceeded 97.2% (again, the threshold for this probability was predetermined to control the type I error rate, based on simulations) [11, 12]. Interestingly, EVOLUT-LR also prespecified a superiority analysis (as can be performed based on the same posterior distribution, without additional tests). Superiority would be declared if this probability exceeded 98.4%. Figure 2B presents these analyses and their interpretation for EVOLUT-LR. As can be appreciated, non-inferiority was met, while superiority could not be declared (i.e. a posterior probability of 77.9% instead of >98.4%).
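
To illustrate how both questions are answered from a single posterior, the sketch below uses an assumed normal posterior for the risk difference (the numbers are illustrative and are not the trial’s actual posterior) and simply evaluates two different tail areas against the prespecified decision thresholds.

```python
# Minimal sketch (illustrative numbers): non-inferiority and superiority probabilities
# are obtained from the same posterior of the risk difference (TAVI - SAVR, in %),
# merely using different cut-offs.
from scipy import stats

posterior = stats.norm(loc=-1.4, scale=1.6)   # assumed posterior for the risk difference (%)

margin = 6.0                                  # non-inferiority margin (absolute risk increase, %)
p_noninferior = posterior.cdf(margin)         # P(RD < +6%)
p_superior = posterior.cdf(0.0)               # P(RD < 0%), i.e. TAVI better than SAVR

print(f"P(non-inferiority, RD < +{margin:.0f}%): {p_noninferior:.3f}  (declare if > 0.972)")
print(f"P(superiority, RD < 0%):       {p_superior:.3f}  (declare if > 0.984)")
```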

More examples of primary Bayesian analysis of cardiac surgery trials are presented in Supplementary Materials S3–S5.

USING BAYES TO FACILITATE MORE INTUITIVE INTERPRETATION OF EXISTING DATA

Here, we aim to demonstrate how a Bayesian re-analysis can complement the frequentist-NHST results of pivotal trials in cardiac surgery.

When absence of evidence does not equate to evidence of absence: FAME-3

The FAME-3 trial assessed the effectiveness of fractional flow reserve (FFR)-guided percutaneous coronary intervention (PCI) versus coronary artery bypass grafting (CABG) in patients with multivessel stable coronary artery disease [13]. At the 3-year analysis, the composite end-point had occurred in 12.0% of FFR-guided PCI patients and 9.2% of CABG patients [frequentist hazard ratio (HR) 1.3, 95% CI 0.98–1.83, P = 0.07]. FAME-3 therefore concluded that ‘there was no significant difference in the incidence of the composite outcome between FFR-guided PCI and CABG’. Such dichotomous conclusions, implying ‘absence of an effect’, can be unsettling to clinicians, especially when survival curves suggest clear differences in outcomes. Here, a Bayesian re-analysis can offer a more intuitive interpretation of the probabilities associated with these outcomes.

We employed a minimally informative prior (i.e. a mean of 0 and an SD of 2 on the log odds ratio scale). As can be appreciated in Fig. 3A, this resulted in a mean risk difference of −2.9% (95% CrI −6.0 to 0.3%). These findings correspond to a posterior probability of any difference in favour of CABG (i.e. an absolute risk difference exceeding 0%) of 96.7%, while the probability of an absolute risk difference exceeding 1% in favour of CABG for the primary end-point was 88.5%.
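
The sketch below outlines how such a re-analysis can be approximated. The event counts are rough reconstructions for illustration only, and the normal approximation of the likelihood is a simplification of the actual analysis, which was performed in JASP.

```python
# Minimal sketch of a re-analysis of this kind under a minimally informative prior
# (normal with mean 0 and SD 2 on the log odds ratio scale). Event counts are approximate
# reconstructions for illustration only.
import numpy as np

rng = np.random.default_rng(3)

# Approximate 3-year event counts (illustrative): ~12.0% of 757 FFR-PCI vs ~9.2% of 743 CABG
e_pci, n_pci = 91, 757
e_cabg, n_cabg = 68, 743

# Normal approximation to the likelihood of the log odds ratio (FFR-PCI vs CABG)
log_or_hat = np.log((e_pci / (n_pci - e_pci)) / (e_cabg / (n_cabg - e_cabg)))
se_log_or = np.sqrt(1/e_pci + 1/(n_pci - e_pci) + 1/e_cabg + 1/(n_cabg - e_cabg))

# Conjugate normal-normal update with the minimally informative prior N(0, 2)
prior_mean, prior_sd = 0.0, 2.0
post_var = 1 / (1/prior_sd**2 + 1/se_log_or**2)
post_mean = post_var * (prior_mean/prior_sd**2 + log_or_hat/se_log_or**2)

# Posterior draws, converted to an absolute risk difference using the observed CABG risk.
# Sign convention here: a positive RD means more events with FFR-PCI, i.e. a difference in favour of CABG.
or_draws = np.exp(rng.normal(post_mean, np.sqrt(post_var), 100_000))
p_cabg = e_cabg / n_cabg
odds_pci = or_draws * p_cabg / (1 - p_cabg)
rd_draws = 100 * (odds_pci / (1 + odds_pci) - p_cabg)

print(f"Mean risk difference (FFR-PCI minus CABG): {rd_draws.mean():+.1f}%")
print(f"P(any difference in favour of CABG, RD > 0%): {(rd_draws > 0).mean():.3f}")
print(f"P(difference exceeding 1% in favour of CABG): {(rd_draws > 1).mean():.3f}")
```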

Figure 3: Results of a Bayesian re-interpretation of the primary end-point (A) and all-cause mortality end-points (B and C) of the FAME-3 trial [13]. The numbers between brackets denote the 95% credible interval. The RCT-informed prior results from a Bayesian meta-analysis by Kawczynski et al. [14]. (A)RD: (absolute) risk difference; CABG: coronary artery bypass grafting; PCI: percutaneous coronary intervention; RCT: randomized controlled trial.

Nevertheless, all-cause mortality is arguably the most clinically relevant end-point. In FAME-3, at 3 years, the all-cause mortality rates were 4.1% and 3.9% in the FFR-guided PCI and CABG groups, respectively (frequentist HR 1.0, 95% CI 0.6–1.7). When re-analysed under a Bayesian framework, these findings yielded a risk difference of −0.2% (95% CrI −2.2 to 1.8%), with a posterior probability of a mortality benefit of CABG of only 57.8% (Fig. 3B).

One could also consider introducing previous randomized evidence into the analysis of the all-cause mortality rate of the FAME-3 trial. Recently, Kawczynski and colleagues performed a Bayesian meta-analysis of PCI versus CABG trials [14]. Their posterior could be used as a prior for the current Bayesian re-analysis, emphasizing the Bayesian principle that today’s posterior is tomorrow’s prior [1]. Under this prior, the analysis results in a mean mortality difference of −0.8% in favour of CABG (95% CrI −1.6 to 0%). In turn, the posterior probabilities of any mortality benefit of CABG and of a benefit exceeding 1% are 98.2% and 30.1%, respectively (Fig. 3C).
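
The sketch below illustrates this principle with a simple normal-normal update on the risk difference scale; the prior and likelihood summaries are assumed values chosen for illustration, not the exact distributions used in the re-analysis.

```python
# Minimal sketch of 'today's posterior is tomorrow's prior': an informed prior for the
# mortality risk difference (an assumed normal summary of a previous meta-analytic posterior)
# is combined with an assumed normal likelihood summary from the new trial.
import numpy as np

# All values in percentage points of absolute risk difference (negative favours CABG);
# both summaries are illustrative assumptions.
prior_mean, prior_sd = -0.9, 0.45    # informed prior from earlier randomized evidence
lik_mean, lik_sd = -0.2, 1.0         # likelihood summary from the new trial

w_prior, w_lik = 1 / prior_sd**2, 1 / lik_sd**2
post_var = 1 / (w_prior + w_lik)
post_mean = post_var * (w_prior * prior_mean + w_lik * lik_mean)
post_sd = np.sqrt(post_var)

lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
print(f"Posterior risk difference: {post_mean:+.2f}% (95% CrI {lo:+.2f} to {hi:+.2f}%)")
```

Because the informed prior is more precise than the new trial’s likelihood, the posterior is pulled towards the prior and becomes narrower, mirroring the behaviour described above.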

Defining evidence of absence: ISCHEMIA

Under the frequentist-NHST framework, it may be difficult to demonstrate the absence of a clinically relevant treatment effect. In contrast, Bayesian inference allows for the estimation of both clinically relevant and clinically irrelevant treatment effects. The concept of the region of practical equivalence facilitates this distinction (Supplementary Material S1). The region of practical equivalence can be defined as the region between the negative and positive minimal clinically important differences. If the posterior probability of this region is considerably large, one can safely assume that a treatment is neither clinically beneficial nor harmful. In other words, one can demonstrate evidence of absence.

The ISCHEMIA trial published its 7-year follow-up in 2023 [15]. This trial evaluated the clinical effectiveness of an invasive approach versus a conservative approach in patients with stable coronary artery disease. In ISCHEMIA, the difference in all-cause mortality was 0.09% in favour of the invasive approach (95% CrI −1.85 to 1.99%). Commendably, Hochman and colleagues performed these analyses through Bayesian statistical methodologies and found a posterior probability of any invasive treatment effect of 53.8%, and a 46.2% probability of a treatment effect in favour of conservative treatment. Moreover, they demonstrated a posterior probability of 17% of a > 1% all-cause mortality difference in favour of the invasive arm, and a 13% posterior probability of such an effect in favour of the conservative arm [15].

Based on these findings, one can estimate the probability of the absence of a treatment effect (i.e. the region between +1% and -1% ARD for all-cause mortality), which is 70%, as can be appreciated in Fig. 4.
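
This calculation can be approximated directly from the reported posterior summary, as sketched below under the assumption that the posterior is approximately normal.

```python
# Minimal sketch: probability mass of the posterior inside a region of practical
# equivalence (ROPE) of -1% to +1% absolute risk difference. The posterior is a normal
# approximation reconstructed from the reported mean and 95% credible interval.
from scipy import stats

mean_rd = 0.09                           # reported mean risk difference (%)
sd_rd = (1.99 - (-1.85)) / (2 * 1.96)    # SD implied by the reported 95% CrI

posterior = stats.norm(loc=mean_rd, scale=sd_rd)
p_rope = posterior.cdf(1.0) - posterior.cdf(-1.0)
print(f"P(-1% < RD < +1%) = {p_rope:.2f}")   # ~0.70, i.e. 'evidence of absence'
```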

Figure 4: Result of a Bayesian re-interpretation of the ISCHEMIA trial [15] to demonstrate ‘evidence of absence’. The dotted line presents the mean risk difference between groups in ISCHEMIA (0.09%), while the dark-shaded area under the curve represents the probability of the region of practical equivalence (ROPE, 70%). ROPE: region of practical equivalence.

CONCLUSION

Bayesian statistical methodology offers distinct advantages in the context of cardiac surgery trials, by incorporating prior evidence and generating posterior distributions. This statistical primer aims to introduce the foundational concepts of Bayesian statistics, equipping the surgical reader with a practical understanding of its relevance and application.

SUPPLEMENTARY MATERIAL

Supplementary material is available at EJCTS online.

FUNDING

Samuel Heuts is supported by the Dekker program of the Dutch Heart Foundation.

Conflict of interest: Graeme L. Hickey is an employee and shareholder of Medtronic, unrelated to the current work.

DATA AVAILABILITY

All data will be made openly available upon publication through https://github.com/samuelheuts/Bayes_in_Cardiac_Surgery.

Author contributions

Samuel Heuts: Conceptualization, Methodology, Formal analysis, Data curation, Writing—original draft, Visualization. Michal J. Kawczynski: Conceptualization, Data curation, Writing—review and editing, Visualization. Bart J.J. Velders: Conceptualization, Data curation, Writing—review and editing, Visualization. James M. Brophy: Conceptualization, Validation, Supervision, Writing—review and editing. Graeme L. Hickey: Conceptualization, Validation, Supervision, Writing—review and editing. Mariusz Kowalewski: Conceptualization, Validation, Supervision, Writing—original draft.

REFERENCES

1. Heuts S, Kawczynski MJ, Sayed A, et al. Bayesian analytical methods in cardiovascular clinical trials: why, when, and how. Can J Cardiol 2025;41:30–44.

2. Yarnell CJ, Abrams D, Baldwin MR, et al. Clinical trials in critical care: can a Bayesian approach enhance clinical and scientific decision-making? Lancet Respir Med 2021;9:207–16.

3. Wasserstein RL, Lazar NA. The ASA statement on p-values: context, process, and purpose. Am Stat 2016;70:129–33.

4. Wasserstein RL, Schirm AL, Lazar NA. Moving to a world beyond “p < 0.05”. Am Stat 2019;73:1–19.

5. Goodman S. A dirty dozen: twelve p-value misconceptions. Semin Hematol 2008;45:135–40.

6. Greenland S, Senn SJ, Rothman KJ, et al. Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Eur J Epidemiol 2016;31:337–50.

7. Heuts S, de Heer P, Gabrio A, et al.; PRECISe investigators. The impact of high versus standard enteral protein provision on functional recovery following intensive care admission: protocol for a pre-planned secondary Bayesian analysis of the PRECISe trial. Clin Nutr ESPEN 2024;59:162–70.

8. van de Schoot R, Depaoli S, King R, et al. Bayesian statistics and modelling. Nat Rev Methods Primers 2021;1.

9. Diamond GA, Forrester JS. Analysis of probability as an aid in the clinical diagnosis of coronary-artery disease. N Engl J Med 1979;300:1350–8.

10. Reardon MJ, Van Mieghem NM, Popma JJ, et al.; SURTAVI Investigators. Surgical or transcatheter aortic-valve replacement in intermediate-risk patients. N Engl J Med 2017;376:1321–31.

11. Popma JJ, Deeb GM, Yakubov SJ, et al.; Evolut Low Risk Trial Investigators. Transcatheter aortic-valve replacement with a self-expanding valve in low-risk patients. N Engl J Med 2019;380:1706–15.

13. Zimmermann FM, Ding VY, Pijls NHJ, et al.; FAME 3 Investigators. Fractional flow reserve-guided PCI or coronary bypass surgery for 3-vessel coronary artery disease: 3-year follow-up of the FAME 3 trial. Circulation 2023;148:950–8.

14. Kawczynski MJ, Gabrio A, Maessen JG, et al. Percutaneous coronary intervention with drug-eluting stents versus coronary bypass surgery for coronary artery disease: a Bayesian perspective. J Thorac Cardiovasc Surg 2024.

15. Hochman JS, Anthopolos R, Reynolds HR, et al.; ISCHEMIA-EXTEND Research Group. Survival after invasive or conservative management of stable coronary disease. Circulation 2023;147:8–19.

ABBREVIATIONS

  • (A)RD: (Absolute) risk difference

  • CI: Confidence interval

  • CrI: Credible interval

  • HR: Hazard ratio

  • NHST: Null hypothesis significance testing

  • OR: Odds ratio

  • PDF: Probability density function
