Communicating and understanding statistical measures when quantifying the between-group difference in competing risks

Summary of the distinction between CSH-based, SDH-based and RMTL-based methods under competing risks

	CSH-based	SDH-based	RMTL-based
Treatment effect	The direct (aetiological) effect on a specific event	The actual (total) effect on a specific event	The actual (total) effect on a specific event
Statistical inference
Estimation in each group	In general, the CSH in each group may not be provided	In general, the SDH in each group may not be provided	The RMTL in each group can be obtained
Effect size between groups	Relative measure: cause-specific hazard ratio (cHR)	Relative measure: subdistribution hazard ratio (sHR)	Absolute measure: the difference in RMTLs between groups (RMTLd)
Hypothesis test	Log-rank test	Gray’s test	RMTLd test
Proportional hazards model assumption	Yes, when the proportional CSH assumption is violated, the clinical interpretation of a single cHR may be misleading	Yes, when the proportional SDH assumption is violated, the clinical interpretation of a single sHR may be misleading	No
Study period	From t = 0 to the longest specific event time between groups	From t = 0 to the longest specific event time between groups	From t = 0 to the restricted time point $τ$
Clinical interpretation	Based on the scale of ‘hazard’, cHR needs to be interpreted as an increase (or decrease) of CSH between groups	Based on the scale of ‘hazard’, sHR needs to be interpreted as an increase (or decrease) of SDH between groups	Based on the scale of ‘time’, RMTLd can be interpreted as the average loss (or gain) of time/life due to a specific event
The selection of restricted time point	Corresponds to the longest specific event time between groups	Corresponds to the longest specific event time between groups	A specific time point can be considered with clinical consideration; if not specified, the minimum of the longest observed time between groups can be selected

	CSH-based	SDH-based	RMTL-based
Treatment effect	The direct (aetiological) effect on a specific event	The actual (total) effect on a specific event	The actual (total) effect on a specific event
Statistical inference
Estimation in each group	In general, the CSH in each group may not be provided	In general, the SDH in each group may not be provided	The RMTL in each group can be obtained
Effect size between groups	Relative measure: cause-specific hazard ratio (cHR)	Relative measure: subdistribution hazard ratio (sHR)	Absolute measure: the difference in RMTLs between groups (RMTLd)
Hypothesis test	Log-rank test	Gray’s test	RMTLd test
Proportional hazards model assumption	Yes, when the proportional CSH assumption is violated, the clinical interpretation of a single cHR may be misleading	Yes, when the proportional SDH assumption is violated, the clinical interpretation of a single sHR may be misleading	No
Study period	From t = 0 to the longest specific event time between groups	From t = 0 to the longest specific event time between groups	From t = 0 to the restricted time point $τ$
Clinical interpretation	Based on the scale of ‘hazard’, cHR needs to be interpreted as an increase (or decrease) of CSH between groups	Based on the scale of ‘hazard’, sHR needs to be interpreted as an increase (or decrease) of SDH between groups	Based on the scale of ‘time’, RMTLd can be interpreted as the average loss (or gain) of time/life due to a specific event
The selection of restricted time point	Corresponds to the longest specific event time between groups	Corresponds to the longest specific event time between groups	A specific time point can be considered with clinical consideration; if not specified, the minimum of the longest observed time between groups can be selected

CSH, cause-specific hazard; RMTL, restricted mean time lost; SDH, subdistribution hazard.

Table 1.

Summary of the distinction between CSH-based, SDH-based and RMTL-based methods under competing risks

	CSH-based	SDH-based	RMTL-based
Treatment effect	The direct (aetiological) effect on a specific event	The actual (total) effect on a specific event	The actual (total) effect on a specific event
Statistical inference
Estimation in each group	In general, the CSH in each group may not be provided	In general, the SDH in each group may not be provided	The RMTL in each group can be obtained
Effect size between groups	Relative measure: cause-specific hazard ratio (cHR)	Relative measure: subdistribution hazard ratio (sHR)	Absolute measure: the difference in RMTLs between groups (RMTLd)
Hypothesis test	Log-rank test	Gray’s test	RMTLd test
Proportional hazards model assumption	Yes, when the proportional CSH assumption is violated, the clinical interpretation of a single cHR may be misleading	Yes, when the proportional SDH assumption is violated, the clinical interpretation of a single sHR may be misleading	No
Study period	From t = 0 to the longest specific event time between groups	From t = 0 to the longest specific event time between groups	From t = 0 to the restricted time point $τ$
Clinical interpretation	Based on the scale of ‘hazard’, cHR needs to be interpreted as an increase (or decrease) of CSH between groups	Based on the scale of ‘hazard’, sHR needs to be interpreted as an increase (or decrease) of SDH between groups	Based on the scale of ‘time’, RMTLd can be interpreted as the average loss (or gain) of time/life due to a specific event
The selection of restricted time point	Corresponds to the longest specific event time between groups	Corresponds to the longest specific event time between groups	A specific time point can be considered with clinical consideration; if not specified, the minimum of the longest observed time between groups can be selected

	CSH-based	SDH-based	RMTL-based
Treatment effect	The direct (aetiological) effect on a specific event	The actual (total) effect on a specific event	The actual (total) effect on a specific event
Statistical inference
Estimation in each group	In general, the CSH in each group may not be provided	In general, the SDH in each group may not be provided	The RMTL in each group can be obtained
Effect size between groups	Relative measure: cause-specific hazard ratio (cHR)	Relative measure: subdistribution hazard ratio (sHR)	Absolute measure: the difference in RMTLs between groups (RMTLd)
Hypothesis test	Log-rank test	Gray’s test	RMTLd test
Proportional hazards model assumption	Yes, when the proportional CSH assumption is violated, the clinical interpretation of a single cHR may be misleading	Yes, when the proportional SDH assumption is violated, the clinical interpretation of a single sHR may be misleading	No
Study period	From t = 0 to the longest specific event time between groups	From t = 0 to the longest specific event time between groups	From t = 0 to the restricted time point $τ$
Clinical interpretation	Based on the scale of ‘hazard’, cHR needs to be interpreted as an increase (or decrease) of CSH between groups	Based on the scale of ‘hazard’, sHR needs to be interpreted as an increase (or decrease) of SDH between groups	Based on the scale of ‘time’, RMTLd can be interpreted as the average loss (or gain) of time/life due to a specific event
The selection of restricted time point	Corresponds to the longest specific event time between groups	Corresponds to the longest specific event time between groups	A specific time point can be considered with clinical consideration; if not specified, the minimum of the longest observed time between groups can be selected

CSH, cause-specific hazard; RMTL, restricted mean time lost; SDH, subdistribution hazard.

Table 2.

The statistical outcomes of the NEfERT-T trial and EBMT example^a

	No. (%)^b	CSH-based	SDH-based	RMTL-based^c
Example 1: NEfERT-T trial
CNS lesions
Trastuzumab–paclitaxel (TP)	41 (17.3%)	—	—	8.96 (6.51, 11.41)
Neratinib–paclitaxel (NP)	21 (8.7%)	—	—	4.56 (2.72, 6.40)
NP vs TP		cHR = 0.49 (0.29, 0.83); Log-rank test P = 0.006	sHR = 0.47 (0.28, 0.79); Gray’s test P = 0.004	RMTLd = –4.40 (–7.46, –1.33); RMTLd test P = 0.005
Test for PH assumption		P = 0.848	P = 0.290	—
Death without CNS lesions
Trastuzumab–paclitaxel (TP)	115 (48.5%)	—	—	25.07 (22.05, 28.08)
Neratinib–paclitaxel (NP)	146 (60.3%)	—	—	29.81 (27.08, 32.54)
NP vs TP		cHR = 1.21 (0.95, 1.55); Log-rank test P = 0.119	sHR = 1.34 (1.05, 1.71); Gray’s test P = 0.019	RMTLd = 4.74 (0.68, 8.81); RMTLd test P = 0.022
Test for PH assumption		P = 0.625	P = 0.279	—
Example 2: EBMT example
Death without relapse
Unmatched	145 (26.6%)	—	—	4.66 (4.00, 5.32)
Matched	388 (22.4%)	—	—	3.64 (3.32, 3.96)
Matched vs unmatched		cHR = 0.83 (0.68, 1.00); Log-rank test P = 0.051	sHR = 0.83 (0.69, 1.01); Gray’s test P = 0.064	RMTLd = –1.02 (–1.76, –0.29); RMTLd test P = 0.006
Test for PH assumption		P = 0.001	P = 0.002	—
Relapse
Unmatched	90 (16.5%)	—	—	2.66 (2.16, 3.16)
Matched	280 (16.1%)	—	—	2.61 (2.33, 2.90)
Matched vs unmatched		cHR = 0.96 (0.75, 1.21); Log-rank test P = 0.714	sHR = 0.97 (0.76, 1.22); Gray’s test P = 0.773	RMTLd = –0.05 (–0.62, 0.53); RMTLd test P = 0.877
Test for PH assumption		P = 0.219	P = 0.282	—

	No. (%)^b	CSH-based	SDH-based	RMTL-based^c
Example 1: NEfERT-T trial
CNS lesions
Trastuzumab–paclitaxel (TP)	41 (17.3%)	—	—	8.96 (6.51, 11.41)
Neratinib–paclitaxel (NP)	21 (8.7%)	—	—	4.56 (2.72, 6.40)
NP vs TP		cHR = 0.49 (0.29, 0.83); Log-rank test P = 0.006	sHR = 0.47 (0.28, 0.79); Gray’s test P = 0.004	RMTLd = –4.40 (–7.46, –1.33); RMTLd test P = 0.005
Test for PH assumption		P = 0.848	P = 0.290	—
Death without CNS lesions
Trastuzumab–paclitaxel (TP)	115 (48.5%)	—	—	25.07 (22.05, 28.08)
Neratinib–paclitaxel (NP)	146 (60.3%)	—	—	29.81 (27.08, 32.54)
NP vs TP		cHR = 1.21 (0.95, 1.55); Log-rank test P = 0.119	sHR = 1.34 (1.05, 1.71); Gray’s test P = 0.019	RMTLd = 4.74 (0.68, 8.81); RMTLd test P = 0.022
Test for PH assumption		P = 0.625	P = 0.279	—
Example 2: EBMT example
Death without relapse
Unmatched	145 (26.6%)	—	—	4.66 (4.00, 5.32)
Matched	388 (22.4%)	—	—	3.64 (3.32, 3.96)
Matched vs unmatched		cHR = 0.83 (0.68, 1.00); Log-rank test P = 0.051	sHR = 0.83 (0.69, 1.01); Gray’s test P = 0.064	RMTLd = –1.02 (–1.76, –0.29); RMTLd test P = 0.006
Test for PH assumption		P = 0.001	P = 0.002	—
Relapse
Unmatched	90 (16.5%)	—	—	2.66 (2.16, 3.16)
Matched	280 (16.1%)	—	—	2.61 (2.33, 2.90)
Matched vs unmatched		cHR = 0.96 (0.75, 1.21); Log-rank test P = 0.714	sHR = 0.97 (0.76, 1.22); Gray’s test P = 0.773	RMTLd = –0.05 (–0.62, 0.53); RMTLd test P = 0.877
Test for PH assumption		P = 0.219	P = 0.282	—

cHR, cause-specific hazard ratio; CNS, central nervous system; CSH, cause-specific hazard; PH, proportional hazards; RMTL, restricted mean time lost; RMTLd, difference in RMTLs between groups; SDH, subdistribution hazard; sHR, subdistribution hazard ratio.

a

The corresponding confidence interval (CI) in this table is the 95% CI.

b

The number of events and corresponding proportions in each group (%).

Table 2.

Open in new tab Download slide

The statistical outcomes of the NEfERT-T trial and EBMT example^a

	No. (%)^b	CSH-based	SDH-based	RMTL-based^c
Example 1: NEfERT-T trial
CNS lesions
Trastuzumab–paclitaxel (TP)	41 (17.3%)	—	—	8.96 (6.51, 11.41)
Neratinib–paclitaxel (NP)	21 (8.7%)	—	—	4.56 (2.72, 6.40)
NP vs TP		cHR = 0.49 (0.29, 0.83); Log-rank test P = 0.006	sHR = 0.47 (0.28, 0.79); Gray’s test P = 0.004	RMTLd = –4.40 (–7.46, –1.33); RMTLd test P = 0.005
Test for PH assumption		P = 0.848	P = 0.290	—
Death without CNS lesions
Trastuzumab–paclitaxel (TP)	115 (48.5%)	—	—	25.07 (22.05, 28.08)
Neratinib–paclitaxel (NP)	146 (60.3%)	—	—	29.81 (27.08, 32.54)
NP vs TP		cHR = 1.21 (0.95, 1.55); Log-rank test P = 0.119	sHR = 1.34 (1.05, 1.71); Gray’s test P = 0.019	RMTLd = 4.74 (0.68, 8.81); RMTLd test P = 0.022
Test for PH assumption		P = 0.625	P = 0.279	—
Example 2: EBMT example
Death without relapse
Unmatched	145 (26.6%)	—	—	4.66 (4.00, 5.32)
Matched	388 (22.4%)	—	—	3.64 (3.32, 3.96)
Matched vs unmatched		cHR = 0.83 (0.68, 1.00); Log-rank test P = 0.051	sHR = 0.83 (0.69, 1.01); Gray’s test P = 0.064	RMTLd = –1.02 (–1.76, –0.29); RMTLd test P = 0.006
Test for PH assumption		P = 0.001	P = 0.002	—
Relapse
Unmatched	90 (16.5%)	—	—	2.66 (2.16, 3.16)
Matched	280 (16.1%)	—	—	2.61 (2.33, 2.90)
Matched vs unmatched		cHR = 0.96 (0.75, 1.21); Log-rank test P = 0.714	sHR = 0.97 (0.76, 1.22); Gray’s test P = 0.773	RMTLd = –0.05 (–0.62, 0.53); RMTLd test P = 0.877
Test for PH assumption		P = 0.219	P = 0.282	—

	No. (%)^b	CSH-based	SDH-based	RMTL-based^c
Example 1: NEfERT-T trial
CNS lesions
Trastuzumab–paclitaxel (TP)	41 (17.3%)	—	—	8.96 (6.51, 11.41)
Neratinib–paclitaxel (NP)	21 (8.7%)	—	—	4.56 (2.72, 6.40)
NP vs TP		cHR = 0.49 (0.29, 0.83); Log-rank test P = 0.006	sHR = 0.47 (0.28, 0.79); Gray’s test P = 0.004	RMTLd = –4.40 (–7.46, –1.33); RMTLd test P = 0.005
Test for PH assumption		P = 0.848	P = 0.290	—
Death without CNS lesions
Trastuzumab–paclitaxel (TP)	115 (48.5%)	—	—	25.07 (22.05, 28.08)
Neratinib–paclitaxel (NP)	146 (60.3%)	—	—	29.81 (27.08, 32.54)
NP vs TP		cHR = 1.21 (0.95, 1.55); Log-rank test P = 0.119	sHR = 1.34 (1.05, 1.71); Gray’s test P = 0.019	RMTLd = 4.74 (0.68, 8.81); RMTLd test P = 0.022
Test for PH assumption		P = 0.625	P = 0.279	—
Example 2: EBMT example
Death without relapse
Unmatched	145 (26.6%)	—	—	4.66 (4.00, 5.32)
Matched	388 (22.4%)	—	—	3.64 (3.32, 3.96)
Matched vs unmatched		cHR = 0.83 (0.68, 1.00); Log-rank test P = 0.051	sHR = 0.83 (0.69, 1.01); Gray’s test P = 0.064	RMTLd = –1.02 (–1.76, –0.29); RMTLd test P = 0.006
Test for PH assumption		P = 0.001	P = 0.002	—
Relapse
Unmatched	90 (16.5%)	—	—	2.66 (2.16, 3.16)
Matched	280 (16.1%)	—	—	2.61 (2.33, 2.90)
Matched vs unmatched		cHR = 0.96 (0.75, 1.21); Log-rank test P = 0.714	sHR = 0.97 (0.76, 1.22); Gray’s test P = 0.773	RMTLd = –0.05 (–0.62, 0.53); RMTLd test P = 0.877
Test for PH assumption		P = 0.219	P = 0.282	—

cHR, cause-specific hazard ratio; CNS, central nervous system; CSH, cause-specific hazard; PH, proportional hazards; RMTL, restricted mean time lost; RMTLd, difference in RMTLs between groups; SDH, subdistribution hazard; sHR, subdistribution hazard ratio.

a

The corresponding confidence interval (CI) in this table is the 95% CI.

b

The number of events and corresponding proportions in each group (%).

Practical interpretation

From Table 2, the results of CSH-based and SDH-based methods in the NEfERT-T example both suggest statistically significant differences in CNS lesions between groups with log-rank P = 0.006 and Gray’s P = 0.004, respectively. However, in addition to a single P-value from the hypothesis testing procedure, a practical interpretation of the effect size between groups can be supplemented to provide more medical information. However, the estimated cHR (NP over TP group) of 0.49 (95% CI 0.29 to 0.83) (Table 2) specifically indicates that TP treatment achieved an approximately 51% instantaneous cause-specific hazard rate reduction on CNS lesions, but not that TP treatment decreased nearly 51% of the risk of experiencing CNS lesions. For the sHR, one-to-one correspondence between SDH and CIF can guarantee that the direction of the sHR can be interpreted as the direction of the CIFs between groups; that is, the sHR of 0.47 (95% CI 0.28 to 0.79) indicates that the CIF of CNS lesions of the NP group may be lower than that of the TP group (Figure 1A); however, it cannot be interpreted as the risk of experiencing CNS lesions being 0.47 times lower among the NP group than among individuals in the TP group. Because the HR (both cHR and sHR), the ratio of hazard functions, should be correctly interpreted as the ‘relative rate’ (hazard) but not ‘relative risk’ (probability),¹³ this may cause an non-intuitive interpretation for researchers. Moreover, as a ‘relative’ quantity by itself, the estimated HR (NP over TP group) may fail to convey appreciation of the magnitude of the treatment effect without reference to the hazard rate in the TP group.¹¹^,³²

In regard to the RMTL-based method, the RMTLs of the NP and TP groups on CNS lesions were estimated to be 4.56 (95% CI 2.72 to 6.40) and 8.96 (95% CI 6.51 to 11.41) months, respectively, by setting $τ =$ 54.09 months, which can be interpreted as follows. Individuals in the NP group lost an average of 4.56 months due to CNS lesions during 54.09 months of follow-up, whereas the TP group lost 8.96 months on average. When evaluating the between-group difference, the estimated RMTLd of -4.40 (NP minus TP group, 95% CI -7.46 to -1.33, P = 0.005) indicated that subjects in the TP group lost an additional 4.40 months due to CNS lesions within 54.09 months. Thus, the RMTL-based method can provide a more intuitive and acceptable interpretation from the scale of ‘time’ rather than ‘hazard’ to describe the treatment effect (more detail about clinical interpretation shown in Supplementary material Section 3, available as Supplementary data at IJE online).

Proportional hazards model assumption

We also tested the proportional hazards assumption of CSH and SDH in the NEfERT-T example. The lack-of-fit tests for non-proportional hazards of CSH³³ and SDH³⁴ on CNS lesions were P = 0.848 and P = 0.290, respectively, indicating that the proportional cause-specific hazard (PCH) and subdistribution hazard (PSH) assumptions were satisfied with a P-value larger than 0.050. Note that the proportional hazards indicate that the hazard ratio remains approximately constant during the entire follow-up period.¹⁵^,³⁵ However, if the hazards assumption is violated with a P-value lower than 0.050, the hazard ratio may change over time, and thus a single cHR or sHR may not be sufficient to quantify between-group differences.

For example, in the EBMT example, the CIF curves for the unmatched group were slightly higher than those of the matched group in the early period of follow-up, but after approximately 10 years, the magnitude of the difference became larger visually and caused a pattern with a late difference in survival (Figure 2A). This is supported by the results of PCH and PSH assumption tests, which are both violated with P = 0.001 and P = 0.002, respectively. As a result, both a single cHR (= 0.83, 95% CI 0.68 to 1.00, log-rank P = 0.051) and sHR (= 0.83, 95% CI 0.69 to 1.01, Gray’s test P = 0.064) estimated through the cause-specific Cox model and Fine–Gray model, lacked statistical power in the non-proportional hazards scenarios (Supplementary material Section 4, available as Supplementary data at IJE online).

However, the results of the RMTLd test suggested that there was a positive association between gender and death, with a P-value of 0.006, and the estimated RMTLd of −1.02 (95% CI −1.76 to −0.29, $τ =$ 16.24 years) can be interpreted as that, without gender matching, subjects lost an additional 1.02 years on average within 16.24 years. Actually, without proportional model assumption constraints, the RMTL-based method can be applied easily, can provide a clinically meaningful interpretation in the analytical procedure and at the same time, also has adequate ability to detect the between-group difference under the PSH or some non-PSH situations.²⁰

The selection of restricted time points

When performing the RMTL-based method, the selection of a restricted time point $τ$ in practice will certainly have a substantial impact on both the inference procedure (including estimation and testing between groups) and the clinical interpretation (Supplementary material Section 5, available as Supplementary data at IJE online). To illustrate the structure of the time points, we show the largest event time or censored time of the two groups in the EBMT example (Figure 3). For example, Patient 5 experienced the largest time until death without relapse in the unmatched group (t = 13.14 years), and Patient 1 experienced the largest censored time in the matched group (t = 17.26 years).

Figure 3.

Patterns for the largest event time or censored time of matched and unmatched groups in Example 2

There are two main approaches to determine the choices of $τ$ when assessing between-group differences. One is to take the minimum of the largest observed time between groups. Here, the largest observed time means the largest event time or largest censored time in one specific group. For instance in the unmatched group, Patient 4 had the largest observed time of 16.24 years, whereas the largest time was 17.26 years in the matched group (Patient 1). Thus, the minimum of the largest observed time between unmatched and matched groups is 16.24 years, and then we set $τ$ = 16.24 years in the EBMT example (the choice of $τ$ = 54.09 months in Example 1 is also similar). This kind of selection can incorporate all sufficient data to assess the treatment effect over the entire follow-up period.^36–38 In the second approach, $τ$ can be selected at fixed time points, such as $τ$ = 10, 12 and 14 years, for specific considerations, such as obtaining better clinical meaning or comparing outcomes from different studies.^39–41 For example, for clinical considerations, 21 or 28 days can be set as $τ$ in COVID-19 trials.⁴²^,⁴³

When estimating the hazard rate in one group, hazard-based (both CSH and SDH) methods also correspond to an underlying restricted time point at which at least one subject has experienced a specific event,⁴⁴ that is the largest event time. When quantifying between-group differences, the final event of the pooled sample in the entire study will be considered, which is the maximum of the largest event time between groups. In the EBMT example, the longest times to death in the unmatched and matched groups are 13.14 (Patient 5) and 12.49 years (Patient 2), respectively, and thus the maximum of the longest death time between groups is 13.14 years. Thus, the selection of a restricted time point of hazard-based methods depends on when the final specific event occurs (i.e. it is event-driven) and thus seems to be ‘passive’, whereas the $τ$ in the RMTL-based method is relatively ‘subjective’. However, the flexibility of the restricted time points in the RMTL-based approach does not mean arbitrariness. We recommend formulating a reasonable description of the time point selection before obtaining results, either objectively choosing the minimum of the longest observed time or considering a specific time point based on the purpose of the research.

R package ‘crRMTL’

The ‘crRMTL’ is available on GitHub [https://github.com/chenzgz/crRMTL] for assessing between-group differences based on CSH-, SDH- and RMTL-based methods, and both of the two illustrative examples also be included in this package. In addition, a step-by-step guide to the estimation procedures in this article is provided in Supplementary material Section 6 (available as Supplementary data at IJE online).

Summary and discussion

When quantifying between-group differences for time-to-event data in competing risks, two hazard-based analytical procedures, CSH-based and SDH-based methods, can be used to capture different aspects of competing risks data. The former focuses on the direct (or aetiological) effect and the latter is used to evaluate the actual effect of a specific event.7–8^,10,45 However, traditional hazard-based methods may have limitations on proportional hazards assumptions and clinical interpretation.¹²^,¹⁵^,⁴⁶ Therefore, the RMTL-based approach has attracted increasing attention due to its easy-to-understand clinical interpretation and model-free characteristics. As the area under the cumulative incidence curves, the RMTL-based method provides a perspective of ‘time’ rather than ‘hazard’ to describe the treatment effect. The RMTL-based approach has robust performance in detecting between-group differences both in the PSH scenario and some non-PSH situations.²⁰ However, the RMTL-based method may not be sensitive to crossing CIF curves because RMTLd may be equal to 0 in this scenario⁴⁷; but even though the RMTLd test in the crossing CIF curves may be insufficient in hypothesis testing, the estimated RMTL in each group is of crucial importance in clinical interpretation.

In addition, several limitations were not discussed in this article. First, we mainly focused on the univariable analysis for quantifying the between-group differences, but failed to extend the discussion to the multivariable analysis. Actually, there are several RMTL-based models that can adjust for confounding in competing risks.¹⁹^,^48–49 However, our discussions in distinctions of methodological concepts are still relevant in biomedical and epidemiological applications and helpful for clinicians and epidemiologists. Finally, we used the statement ‘treatment effect’ but did not attempt to provide more discussion on the topic of causal inference in competing risks.

Note that this article does not prove that the RMTL-based method can capture the entire treatment effect. No single measure serves all research purposes or can capture the entire profile of between-group differences. Hence, we hope that this discussion on the distinction between RMTL-based and hazard-based approaches will be helpful in applying and interpreting the above time-to-event data under competing risks.

Conclusion: recommendations

When PCH and PSH assumptions are met, we recommend that CSH-based and SDH-based methods be applied to assess between-group differences aetiologically and prognostically, respectively. In addition, the RMTL-based method can be supplemented to provide an intuitive interpretation of the treatment effect. However, when the PSH assumption is violated, we recommend the RMTL-based method as the primary outcome to assess the actual effect on specific events.

When applying the RMTL-based method, we recommend formulating a reasonable description of the time point selection when choosing a specific time point based on practical consideration; if not specified, the minimum of the longest observed time of research can be set as an alternative selection before obtaining results.

Ethics approval

Not applicable: all the work was developed using published or reconstructed data to better explain the difference between methods.

Data availability

The data in Example 1 were reconstructed using the progression-free survival and overall survival curves from the NEfERT-T trial. The data in Example 2 were derived from sources in the R package ‘mstate’ [https://www.r-project.org]. Both sets of data can be obtained in our package ‘crRMTL’.

Supplementary data

Supplementary data are available at IJE online.

Author contributions

Z.C and H.W. proposed the concept and design of the study. H.W. and Z.C. wrote the first draft of the manuscript. Y.H. and C.Z. reviewed the manuscript. H.W. and Z.C. completed the R package. All authors contributed to subsequent revisions of the manuscript and approved the final version. Z.C. is the guarantor. The corresponding author attests that all listed authors meet authorship criteria and that no others meeting the criteria have been omitted.

Funding

This work was supported by the National Natural Science Foundation of China (grant numbers 82173622, 82073675, 81903411, 81673268) and the Guangdong Basic and Applied Basic Research Foundation (grant number 2022A115011525).

Acknowledgements

The authors thank the editors and reviewers for their careful review and insightful comments.

Conflict of interest

None declared.

References

1

Awada

A

,

Colomer

R

,

Inoue

K

et al.

Neratinib plus paclitaxel vs trastuzumab plus paclitaxel in previously untreated metastatic ERBB2-positive breast cancer: the NEfERT-T randomized clinical trial

.

JAMA Oncol

2016

;

2

:

1557

–

64

.

2

Putter

H

,

Fiocco

M

,

Geskus

RB.

Tutorial in biostatistics: competing risks and multi-state models

.

Stat Med

2007

;

26

:

2389

–

430

.

3

Schumacher

M

,

Ohneberg

K

,

Beyersmann

J.

Competing risk bias was common in a prominent medical journal

.

J Clin Epidemiol

2016

;

80

:

135

–

36

.

4

van Walraven

C

,

McAlister

FA.

Competing risk bias was common in Kaplan-Meier risk estimates published in prominent medical journals

.

J Clin Epidemiol

2016

;

69

:

170

–

73.e8

.

5

Koller

MT

,

Raatz

H

,

Steyerberg

EW

et al.

Competing risks and the clinical community: irrelevance or ignorance?

Stat Med

2012

;

31

:

1089

–

97

.

6

Austin

PC

,

Fine

JP.

Accounting for competing risks in randomized controlled trials: a review and recommendations for improvement

.

Stat Med

2017

;

36

:

1203

–

209

.

7

Lau

B

,

Cole

SR

,

Gange

SJ.

Competing risk regression models for epidemiologic data

.

Am J Epidemiol

2009

;

170

:

244

–

56

.

8

Andersen

PK

,

Geskus

RB

,

de Witte

T

,

Putter

H.

Competing risks in epidemiology: possibilities and pitfalls

.

Int J Epidemiol

2012

;

41

:

861

–

70

.

9

Latouche

A

,

Allignol

A

,

Beyersmann

J

,

Labopin

M

,

Fine

JP.

A competing risks analysis should report results on all cause-specific hazards and cumulative incidence functions

.

J Clin Epidemiol

2013

;

66

:

648

–

53

.

10

Tullio

A

,

Magli

A

,

Moretti

E

et al.

Why we should take care of the competing risk bias in survival analysis: a phase II trial on the toxicity profile of radiotherapy for prostate cancer

.

Rep Pract Oncol Radiother

2019

;

24

:

511

–

19

.

11

Blagoev

KB

,

Wilkerson

J

,

Fojo

T.

Hazard ratios in cancer clinical trials–a primer

.

Nat Rev Clin Oncol

2012

;

9

:

178

–

83

.

12

Alexander

BM

,

Schoenfeld

JD

,

Trippa

L.

Hazards of hazard ratios - deviations from model assumptions in immunotherapy

.

N Engl J Med

2018

;

378

:

1158

–

59

.

13

Sutradhar

R

,

Austin

PC.

Relative rates not relative risks: addressing a widespread misinterpretation of hazard ratios

.

Ann Epidemiol

2018

;

28

:

54

–

57

.

14

Stensrud

MJ

,

Aalen

JM

,

Aalen

OO

et al.

Limitations of hazard ratios in clinical trials

.

Eur Heart J

2019

;

40

:

1378

–

83

.

15

Stensrud

MJ

,

Hernán

MA.

Why test for proportional hazards?

JAMA

2020

;

323

:

1401

–

402

.

16

Andersen

PK.

Decomposition of number of life years lost according to causes of death

.

Stat Med

2013

;

32

:

5278

–

85

.

17

Zhao

L

,

Tian

L

,

Claggett

B

et al.

Estimating treatment effect with clinical interpretation from a comparative clinical trial with an end point subject to competing risks

.

JAMA Cardiol

2018

;

3

:

357

–

58

.

18

Lyu

J

,

Hou

Y

,

Chen

Z.

The use of restricted mean time lost under competing risks data

.

BMC Med Res Methodol

2020

;

20

:

197

.

19

Conner

SC

,

Trinquart

L.

Estimation and modeling of the restricted mean time lost in the presence of competing risks

.

Stat Med

2021

;

40

:

2177

–

96

.

20

Wu

H

,

Yuan

H

,

Yang

Z

,

Hou

Y

,

Chen

Z.

Implementation of an alternative method for assessing competing risks: restricted mean time lost

.

Am J Epidemiol

2022

;

191

:

163

–

72

.

21

Dignam

JJ

,

Kocherginsky

MN.

Choice and interpretation of statistical tests used when competing risks are present

.

J Clin Oncol

2008

;

26

:

4027

–

34

.

22

Bakoyannis

G

,

Touloumi

G.

Practical methods for competing risks data: a review

.

Stat Methods Med Res

2012

;

21

:

257

–

72

.

23

Wolkewitz

M

,

Cooper

BS

,

Bonten

MJ

,

Barnett

AG

,

Schumacher

M.

Interpreting and comparing risks in the presence of competing events

.

BMJ

2014

;

349

:

g5060

.

24

Poythress

JC

,

Lee

MY

,

Young

J.

Planning and analyzing clinical trials with competing risks: recommendations for choosing appropriate statistical methodology

.

Pharm Stat

2020

;

19

:

4

–

21

.

25

Guyot

P

,

Ades

AE

,

Ouwens

MJ

,

Welton

NJ.

Enhanced secondary analysis of survival data: reconstructing the data from published Kaplan-Meier survival curves

.

BMC Med Res Methodol

2012

;

12

:

9

.

26

de Wreede

LC

,

Fiocco

M

,

Putter

H.

mstate: an R package for the analysis of competing risks and multi-state models

.

J Stat Soft

2011

;

38

:

1

–

30

.

27

Schmoor

C

,

Schumacher

M

,

Finke

J

,

Beyersmann

J.

Competing risks and multistate models

.

Clin Cancer Res

2013

;

19

:

12

–

21

.

28

Gray

RJ.

A class of K-sample tests for comparing the cumulative incidence of a competing risk

.

Ann Stat

1988

;

16

:

1141

–

54

.

29

Fine

J

,

Gray

RJ.

A proportional hazards model for the subdistribution of a competing risk

.

J Am Stat Assoc

1999

;

94

:

496

–

509

.

30

Dharmarajan

K

,

Hsieh

AF

,

Kulkarni

VT

et al.

Trajectories of risk after hospitalization for heart failure, acute myocardial infarction, or pneumonia: retrospective cohort study

.

BMJ

2015

;

350

:

h411

.

31

Fasslrinner

F

,

Schetelig

J

,

Burchert

A

et al.

Long-term efficacy of reduced-intensity versus myeloablative conditioning before allogeneic haemopoietic cell transplantation in patients with acute myeloid leukaemia in first complete remission: retrospective follow-up of an open-label, randomised phase 3 trial

.

Lancet Haematol

2018

;

5

:

e161

–

69

.

32

Kim

DH

,

Uno

H

,

Wei

LJ.

Restricted mean survival time as a measure to interpret clinical trial results

.

JAMA Cardiol

2017

;

2

:

1179

–

80

.

33

Schoenfeld

D.

Chi-squared goodness-of-fit tests for the proportional hazards regression model

.

Biometrika

1980

;

67

:

145

–

53

.

34

Li

J

,

Scheike

TH

,

Zhang

MJ.

Checking Fine and Gray subdistribution hazards model with cumulative sums of residuals

.

Lifetime Data Anal

2015

;

21

:

197

–

217

.

35

Uno

H

,

Claggett

B

,

Tian

L

et al.

Moving beyond the hazard ratio in quantifying the between-group difference in survival analysis

.

J Clin Oncol

2014

;

32

:

2380

–

85

.

36

Tian

L

,

Jin

H

,

Uno

H

et al.

On the empirical choice of the time window for restricted mean survival time

.

Biometrics

2020

;

76

:

1157

–

66

.

37

Perego

C

,

Sbolli

M

,

Specchia

C

et al.

Utility of restricted mean survival time analysis for heart failure clinical trial evaluation and interpretation

.

JACC Heart Fail

2020

;

8

:

973

–

83

.

38

Rahmadian

AP

,

Delos Santos

S

,

Parshad

S

,

Everest

L

et al.

Quantifying the survival benefits of oncology drugs with a focus on immunotherapy using restricted mean survival time

.

J Natl Compr Canc Netw

2020

;Mar

18

:

278

–

85

.

39

Horiguchi

M

,

Tian

L

,

Uno

H

et al.

Quantification of long-term survival benefit in a comparative oncology clinical study

.

JAMA Oncol

2018

;

4

:

881

–

82

.

40

Luo

X

,

Huang

B

,

Quan

H.

Design and monitoring of survival trials based on restricted mean survival times

.

Clin Trials

2019

;

16

:

616

–

25

.

41

Pauker

M

,

Chappell

R.

Window mean survival time

.

Stat Med

2021

;

40

:

5521

–

33

.

42

Li

X

,

Raventós

B

,

Roel

E

et al.

Association between covid-19 vaccination, SARS-CoV-2 infection, and risk of immune mediated neurological events: population based cohort and self-controlled case series analysis

.

BMJ

2022

;

376

:

e068373

.

PubMed

OpenURL Placeholder Text

43

Fabiani

M

,

Puopolo

M

,

Morciano

C

,

Italian Integrated Surveillance of Covid-19 Study Group and Italian Covid-19 Vaccines Registry Group

et al.

Effectiveness of mRNA vaccines and waning of protection against SARS-CoV-2 infection and severe covid-19 during predominant circulation of the delta variant in Italy: retrospective cohort study

.

BMJ

2022

;

376

:

e069052

.

PubMed

OpenURL Placeholder Text