Pooling controls from nested case–control studies with the proportional risks model

Table 1:

Scenario 1 simulation result based on full cohort data.

			Model (1)				Model (2)
$p_{2}$	Parameter	True	Mean	SE	MSE	Coverage	Mean	SE	MSE	Coverage	$RE$
0.02	$β_{11}$	0.693	0.694	0.029	0.001	0.94	0.694	0.029	0.001	0.94	1.00
	$β_{12}$	0.916	0.918	0.029	0.001	0.96	0.918	0.029	0.001	0.96	1.00
	$β_{21}$	0.405	0.409	0.066	0.004	0.95	0.409	0.066	0.004	0.95	1.01
	$β_{22}$	1.099	1.103	0.070	0.005	0.94	1.102	0.069	0.005	0.94	1.02
	$γ_{2} - γ_{1}$	–1.652					–1.658	0.096	0.009	0.95
0.1	$β_{11}$	0.693	0.695	0.030	0.001	0.95	0.695	0.030	0.001	0.95	1.01
	$β_{12}$	0.916	0.917	0.033	0.001	0.95	0.917	0.033	0.001	0.95	1.01
	$β_{21}$	0.405	0.406	0.029	0.001	0.97	0.406	0.029	0.001	0.97	1.00
	$β_{22}$	1.099	1.098	0.032	0.001	0.96	1.098	0.031	0.001	0.96	1.01
	$γ_{2} - γ_{1}$	0					0.001	0.056	0.003	0.96
0.2	$β_{11}$	0.693	0.698	0.032	0.001	0.96	0.698	0.032	0.001	0.96	1.01
	$β_{12}$	0.916	0.913	0.035	0.001	0.95	0.913	0.035	0.001	0.94	1.01
	$β_{21}$	0.405	0.404	0.022	<0.001	0.95	0.404	0.022	<0.001	0.95	1.01
	$β_{22}$	1.099	1.100	0.025	0.001	0.96	1.100	0.024	0.001	0.95	1.02
	$γ_{2} - γ_{1}$	0.750					0.754	0.051	0.003	0.96

Table 2:

Scenario 1 simulation result based on NCC data.

			Model (1)				Model (2)
$p_{2}$	Parameter	True	Mean	SE	MSE	Coverage	Mean	SE	MSE	Coverage	RE
0.02	$β_{11}$	0.693	0.695	0.055	0.003	0.95	0.694	0.051	0.003	0.95	1.17
	$β_{12}$	0.916	0.919	0.061	0.004	0.95	0.918	0.058	0.003	0.95	1.13
	$β_{21}$	0.405	0.414	0.126	0.016	0.95	0.408	0.080	0.006	0.95	2.47
	$β_{22}$	1.099	1.119	0.155	0.024	0.94	1.103	0.087	0.008	0.93	3.24
	$γ_{2} - γ_{1}$	–1.652					–1.658	0.097	0.009	0.95
0.1	$β_{11}$	0.693	0.698	0.060	0.004	0.94	0.696	0.046	0.002	0.94	1.68
	$β_{12}$	0.916	0.920	0.066	0.004	0.94	0.916	0.053	0.003	0.94	1.56
	$β_{21}$	0.405	0.407	0.057	0.003	0.95	0.407	0.045	0.002	0.95	1.56
	$β_{22}$	1.099	1.098	0.067	0.004	0.95	1.098	0.050	0.002	0.96	1.80
	$γ_{2} - γ_{1}$	0					0.001	0.056	0.003	0.96
0.2	$β_{11}$	0.693	0.700	0.066	0.004	0.93	0.700	0.045	0.002	0.94	2.17
	$β_{12}$	0.916	0.915	0.066	0.004	0.96	0.914	0.046	0.002	0.96	2.03
	$β_{21}$	0.405	0.406	0.040	0.002	0.95	0.406	0.037	0.001	0.94	1.20
	$β_{22}$	1.099	1.102	0.048	0.002	0.95	1.101	0.043	0.002	0.94	1.28
	$γ_{2} - γ_{1}$	0.750					0.754	0.051	0.003	0.96

3.3.2 Scenario 2

When baselines are not proportional, model (2.2) leads to biased estimates for $β$ and the proportionality factor $γ$, while model (2.1) still yields unbiased $β$ estimates in both the full cohort (Supplementary Table S4 of the Supplementary Materials) and NCC (Table 3) analyses. In the full cohort analysis, the bias from misspecification is accompanied by only a slight increase or decrease in SEs, so the REs of model (2.2) relative to model (2.1) are generally smaller than 1. In the NCC analysis, however, model (2.2) estimates can have considerably smaller SEs from pooling controls alongside some bias from model misspecification; when the SE reduction outweighs the bias, the MSEs are smaller and the efficiency is better than that of model (2.1) estimates.
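
The trade-off between bias and SE can be checked against the rounded entries of Table 3. The sketch below is illustrative only. It assumes that the SE column approximates the empirical standard deviation of the estimates, that MSE is squared bias plus variance, and that RE is the ratio of the model (2.1) MSE to the model (2.2) MSE; the function names are hypothetical.

```python
# Rows taken from Table 3 (NCC data): true value, then (mean, SE) per model.
# Assumptions (not stated in this section): SE approximates the empirical SD
# of the estimates, and RE = MSE(model 2.1) / MSE(model 2.2).

def mse(true_value, mean_estimate, se):
    """Approximate MSE as squared bias plus variance."""
    bias = mean_estimate - true_value
    return bias ** 2 + se ** 2


def relative_efficiency(true_value, fit_separate, fit_pooled):
    """RE of the pooled fit relative to the separate fit (ratio of MSEs)."""
    return mse(true_value, *fit_separate) / mse(true_value, *fit_pooled)


# beta_11 with alpha_2 = 0.5: pooling lowers the SE and adds only a little bias.
re_beta11 = relative_efficiency(0.693, (0.696, 0.058), (0.673, 0.048))

# beta_22 with alpha_2 = 5: the misspecification bias outweighs the SE gain.
re_beta22 = relative_efficiency(1.099, (1.110, 0.120), (0.966, 0.067))

print(round(re_beta11, 2), round(re_beta22, 2))
```

With the rounded table entries, the two ratios come out at about 1.25 and 0.65, close to the reported REs of 1.25 and 0.66; the small gap for $β_{22}$ is consistent with rounding in the table.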

Table 3:

Scenario 2 simulation result based on NCC data

			Model (1)				Model (2)^a
$α_{2}$	Parameter	True	Mean	SE	MSE	Coverage	Mean	SE	MSE	Coverage	$RE$
0.5	$β_{11}$	0.693	0.696	0.058	0.003	0.96	0.673	0.048	0.003	0.93	1.25
	$β_{12}$	0.916	0.924	0.065	0.004	0.93	0.888	0.056	0.004	0.90	1.11
	$β_{21}$	0.405	0.414	0.073	0.005	0.95	0.448	0.057	0.005	0.88	1.08
	$β_{22}$	1.099	1.115	0.086	0.008	0.95	1.178	0.064	0.010	0.77	0.74
5	$β_{11}$	0.693	0.693	0.053	0.003	0.96	0.723	0.050	0.003	0.92	0.83
	$β_{12}$	0.916	0.917	0.059	0.003	0.95	0.954	0.055	0.004	0.90	0.77
	$β_{21}$	0.405	0.410	0.102	0.010	0.96	0.322	0.064	0.011	0.76	0.94
	$β_{22}$	1.099	1.110	0.120	0.015	0.95	0.966	0.067	0.022	0.51	0.66

			Model (2) + piecewise constant^b					Model $(2) +$ linear function of time^c
$α_{2}$	Parameter	True	Mean	SE	MSE	Coverage	$RE$	Mean	SE	MSE	Coverage	$RE$
0.5	$β_{11}$	0.693	0.690	0.049	0.002	0.94	1.41	0.694	0.049	0.002	0.95	1.41
	$β_{12}$	0.916	0.912	0.057	0.003	0.93	1.34	0.918	0.057	0.003	0.94	1.34
	$β_{21}$	0.405	0.421	0.056	0.003	0.94	1.59	0.415	0.056	0.003	0.95	1.69
	$β_{22}$	1.099	1.132	0.063	0.005	0.92	1.52	1.120	0.063	0.004	0.94	1.76
5	$β_{11}$	0.693	0.708	0.050	0.003	0.95	1.04	0.695	0.049	0.002	0.96	1.15
	$β_{12}$	0.916	0.933	0.056	0.003	0.94	1.03	0.916	0.055	0.003	0.95	1.13
	$β_{21}$	0.405	0.362	0.069	0.007	0.90	1.57	0.402	0.077	0.005	0.95	1.93
	$β_{22}$	1.099	1.037	0.071	0.009	0.88	1.66	1.104	0.077	0.006	0.96	2.47

^a The estimates for the proportionality factors are –0.651 for $α_{2} = 0.5$ and –1.169 for $α_{2} = 5$.

^b For $α_{2} = 0.5$, the estimates for the piecewise constants are –0.015 in $(0, 2.5]$ and –1.107 in $(2.5, 10]$. For $α_{2} = 5$, the estimates for the piecewise constants are –1.969 in $(0, 7.5]$ and 0.258 in $(7.5, 10]$.

^c The estimated proportionality factors are $0.198 - 0.237 t$ for $α_{2} = 0.5$ and $-5.540 + 0.694 t$ for $α_{2} = 5$.

Allowing time-dependence of the proportionality factor greatly reduces the bias and MSEs of $β$ estimates based on model (2.2). As the true $γ (t)$ is nearly linear in time under $α_{2} = 0.5$ and 5 for the majority of follow-up, the linear form yields less biased and more efficient estimates than the piecewise-constant form. In the full cohort analysis, the RE of model (2.2) to model (2.1) increases from (0.16-0.66) to (0.57-0.98) with the piecewise-constant proportionality factor and (0.87-1.00) with the linear proportionality factor. In the NCC analysis, the RE of model (2.2) to model (2.1) increases from (0.66-1.25) to (1.03-1.66) with the piecewise-constant proportionality factor and (1.13-2.47) with the linear proportionality factor.
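
For reference, the three specifications of the proportionality factor compared in Table 3 can be written side by side. The notation below is assumed for illustration and may differ from that of Section 2: $λ_{10}$ and $λ_{20}$ denote the two cause-specific baselines, $c$ is the cut point of the piecewise form, and $γ_{0}$, $γ_{a}$, $γ_{b}$, $a$, and $b$ are the coefficients.

```latex
\lambda_{20}(t) = \lambda_{10}(t) \exp\{\gamma(t)\}, \qquad
\gamma(t) =
\begin{cases}
\gamma_{0} & \text{constant} \\
\gamma_{a} I(t \le c) + \gamma_{b} I(t > c) & \text{piecewise constant} \\
a + b t & \text{linear in time}
\end{cases}
```

As a worked instance using the footnotes of Table 3 for $α_{2} = 0.5$: at $t = 5$ the linear form gives $0.198 - 0.237 \times 5 = -0.987$, the piecewise-constant form gives –1.107 on $(2.5, 10]$, and the constant form gives –0.651.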

3.3.3 Scenario 3

When age affects the two failure types differently and is matched on, model (2.1) gives unbiased $β$ estimates. In contrast, not including age in model (2.2) can bias the estimates for $β$ and the proportionality factor (Table 4). The bias is especially substantial for the proportionality factor, as it absorbs the effects of age. Including age as described in Section 2.2.3 makes the model (2.2) estimates unbiased, at the cost of a slight increase in standard errors. In this case, it is reasonable to retain age in model (2.2), especially when the proportionality factor is of interest.

Table 4:

Scenario 3 simulation result.

		Model (1)				Model (2)					Model (2) + age
Parameter	True	Mean	SE	MSE	Coverage	Mean	SE	MSE	Coverage	$RE$	Mean	SE	MSE	Coverage	$RE$
$β_{11}$	0.693	0.693	0.057	0.003	0.93	0.687	0.050	0.002	0.93	1.29	0.692	0.050	0.003	0.94	1.28
$β_{12}$	0.916	0.915	0.061	0.004	0.94	0.909	0.052	0.003	0.95	1.37	0.915	0.052	0.003	0.95	1.38
$β_{21}$	0.405	0.407	0.076	0.006	0.96	0.415	0.055	0.003	0.95	1.88	0.405	0.054	0.003	0.96	1.95
$β_{22}$	1.099	1.104	0.092	0.008	0.95	1.112	0.065	0.004	0.94	1.92	1.097	0.065	0.004	0.94	2.00
$γ_{2} - γ_{1}$	–0.720					–0.815	0.071	0.014	0.70		–0.722	0.073	0.005	0.95
$β_{23} - β_{13}$	–0.375										–0.372	0.054	0.003	0.93

3.3.4 Scenario 4

${\tilde{L}}_{F U} (β)$ yields unbiased and more efficient estimates for $β$ than ${\tilde{L}}_{1} (β)$ (Supplementary Table S5 of the Supplementary Materials).

3.3.5 Scenario 5

With or without the offset, ${\tilde{L}}_{I C} (β)$ gives unbiased and more efficient estimates for $β$ than ${\tilde{L}}_{1} (β)$ (Supplementary Table S6 of the Supplementary Materials). Inclusion of the offset corrects the bias of the proportionality factor estimate and restores the coverage. Supplementary Table S6 also provides the simulation result for $\int_{0}^{5} λ_{110} (t) d t$ and $\int_{0}^{5} λ_{120} (t) d t$, the 5-yr cumulative baselines within the group of subjects eligible for both studies $(Q_{1})$. These estimates show that the denominator and numerator weights provided in Section 2.2.6 are appropriate.

4 PLCO vitamin D NCC data

The PLCO trial enrolled around 155,000 participants aged 55 to 74 yr from 1993 to 2001. Participants were randomized 1:1 to the control arm or the screening arm, in which cancer screening exams were given to assess whether they reduce mortality from specific cancers. To demonstrate our methods, we pool and reanalyze data from two separate NCC studies within the screening arm of the PLCO trial, which studied the association of serum vitamin D concentration with prostate cancer (Ahn et al. 2008) and colorectal cancer (Weinstein et al. 2015), respectively. Serum vitamin D concentration collected at screening was found to be positively associated with prostate cancer risk and negatively associated with colorectal cancer risk in the respective study populations. It is unclear whether these associations would persist when competing risks are considered, and it is of interest to compare the effects of serum vitamin D on the two cancer types in men. As discussed previously, the two studies had different endpoints, follow-up intervals, inclusion criteria, and matching variables.

We use the pooled data to study the association between serum vitamin D concentration and the two cancers in a competing risks framework. Two endpoints are considered: time to the earlier of colorectal or prostate cancer diagnosis where diagnoses of other cancers are (a) not considered, or (b) treated as censoring events. For (a), five prostate cancer cases are excluded as their colorectal cancer diagnosis predated prostate cancer diagnosis. For (b), we exclude two sampled risk sets from the colorectal cancer study where the cases had other cancers diagnosed before colorectal cancer, and 23 prostate cancer cases whose first diagnosed cancer was not prostate cancer. We exclude 25 sampled risk sets from the colorectal study and 20 subjects from the prostate cancer study with incomplete covariates (baseline vitamin D concentration, BMI, and diabetes status).

Because the colorectal cancer study was individually matched but the prostate cancer study was frequency-matched (Appendix 4 of the Supplementary Materials), we rematch prostate cancer cases 1:1 individually using the frequency sample for endpoints (a) and (b) and the same matching variables as the colorectal cancer study (age at initial serum draw date $\pm 1$ yr and initial serum draw date $\pm 30$ d, or $\pm 60$ d when needed). We exclude 12 sampled risk sets whose members do not have the same eligibility for the two studies for both endpoints. We also exclude 11 and 14 prostate cancer cases with no available controls for endpoints (a) and (b), respectively. The colorectal cancer study contributes 439 and 437 cases and equal numbers of controls to the combined analysis for endpoints (a) and (b), respectively. The prostate cancer study contributes 719 and 698 cases and equal numbers of controls to the combined analysis for endpoints (a) and (b), respectively.

Serum vitamin D concentrations were categorized using season-specific quintiles within each study to account for seasonal fluctuations of vitamin D, which is consistent with the original studies (Ahn et al. 2008; Weinstein et al. 2015). For the colorectal cancer study, the cutoffs are $(33.6, 44.6, 55.8, 68.1)$ nmol/L for December to May and $(44.2, 57.4, 66.5, 78.3)$ nmol/L for June to November. For the prostate cancer study, the cutoffs are $(37.3, 45.8, 54.3, 65.8)$ nmol/L for December to May and $(47.3, 55.8, 65.5, 77.3)$ nmol/L for June to November. We combine data from the two studies to construct ${\tilde{L}}_{1} (β)$ with model (2.1) and ${\tilde{L}}_{F U, I C} (β)$ with both models (2.1) and (2.2). For the latter, model (2.2) is fitted to sampled risk sets (collectively $Q_{1}$) eligible to both NCC studies and having a case time larger than 365 d, the overlap of the follow-up of the two studies. Model (2.1) is fitted to two groups: one group $(Q_{2})$ contains sampled risk sets from the colorectal cancer study not eligible to the prostate cancer study or having a case time within 365 d; the other group $(Q_{3})$ contains sampled risk sets from the prostate cancer study not eligible to the colorectal cancer study. In particular, ${\tilde{L}}_{F U, I C} (β)$ is

$$\begin{matrix}
\prod_{j : \tilde{R} (τ_{j}) \subseteq Q_{1}} \frac{\exp \{I (ϵ_{j} = 2) γ (t) + Z_{j}^{T} β_{ϵ_{j}}\}}{\sum_{l \in \tilde{R} (τ_{j})} \sum_{k = 1}^{2} \exp \{I (k = 2) γ (t) + Z_{l}^{T} β_{k}\}} \\
\times \prod_{j : \tilde{R} (τ_{j}) \subseteq Q_{2}} \frac{\exp (Z_{j}^{T} β_{1})}{\sum_{l \in \tilde{R} (τ_{j})} \exp (Z_{l}^{T} β_{1})} \times \prod_{j : \tilde{R} (τ_{j}) \subseteq Q_{3}} \frac{\exp (Z_{j}^{T} β_{2})}{\sum_{l \in \tilde{R} (τ_{j})} \exp (Z_{l}^{T} β_{2})},
\end{matrix}$$

where $β_{1}$ and $β_{2}$ are the effects of covariates on risks of colorectal cancer and prostate cancer, respectively, and $γ (t)$ is a time-dependent proportionality factor that includes an offset $c_{2} - c_{1}$ (see Appendix 2 of the Supplementary Materials). The $β$ coefficients are shared across groups because there is no strong clinical basis to believe they would differ by eligibility to the two studies or interval of follow-up. $γ (t)$ is chosen to have a quadratic form $a + b t + c t^{2}$ given the statistical significance of likelihood ratio tests (LRT) for $c = 0$ (0.034 for endpoint (a) and 0.035 for endpoint (b)) and $b = c = 0$ (P-value $< 0.001$ for both endpoints). The matching variables are omitted from the models for the following reasons: (i) gender and race are not identifiable as subjects eligible to both studies are all non-Hispanic White males; (ii) age at serum collection has a negligible effect; and (iii) the potential effect of serum collection date is already accounted for by using season-specific vitamin D quintiles. The results are summarized in Table 5. Supplementary Figure S2 in the Supplementary Materials demonstrates the use of smoothed baseline hazards and that the estimated cause-specific baselines from ${\tilde{L}}_{F U, I C} (β)$ approximate those from ${\tilde{L}}_{1} (β)$ when the proportionality factor is modeled properly.
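
To make the structure of ${\tilde{L}}_{F U, I C} (β)$ concrete, the following minimal sketch evaluates the log of a single factor for a sampled risk set in $Q_{1}$ and for one in $Q_{2}$ or $Q_{3}$. It follows the displayed expression only and omits any weights or offset handling described elsewhere in the paper. The function names, the toy matched set, and the coefficient values in `beta` are hypothetical; only the quadratic coefficients passed to `gamma_quadratic` are taken from Table 5 (endpoint (a)).

```python
import math


def gamma_quadratic(t, a, b, c):
    """Proportionality factor gamma(t) = a + b*t + c*t^2 at case time t."""
    return a + b * t + c * t ** 2


def dot(z, beta):
    """Inner product Z^T beta for plain Python sequences."""
    return sum(z_i * b_i for z_i, b_i in zip(z, beta))


def q1_log_term(case_z, case_type, risk_set, beta, gamma_t):
    """Log of one Q1 factor: both failure types share the sampled risk set.

    risk_set lists the covariate vectors of all sampled members, case included;
    beta maps failure type (1 or 2) to its coefficient vector.
    """
    def score(z, k):
        # I(k = 2) * gamma(t) + Z^T beta_k
        return (gamma_t if k == 2 else 0.0) + dot(z, beta[k])

    denominator = sum(math.exp(score(z, k)) for z in risk_set for k in (1, 2))
    return score(case_z, case_type) - math.log(denominator)


def single_type_log_term(case_z, risk_set, beta_k):
    """Log of one Q2 or Q3 factor: a matched-set term for a single type."""
    denominator = sum(math.exp(dot(z, beta_k)) for z in risk_set)
    return dot(case_z, beta_k) - math.log(denominator)


# Hypothetical 1:1 matched set with a single binary covariate.
beta = {1: [0.5], 2: [-0.2]}       # illustrative values, not estimates
risk_set = [[1.0], [0.0]]          # case first, then its matched control
# t = 2.0 corresponds to a case time of 2000 d (t is days divided by 1000).
gamma_t = gamma_quadratic(2.0, 3.438, -0.519, -0.307)

log_q1 = q1_log_term(risk_set[0], 2, risk_set, beta, gamma_t)
log_q2 = single_type_log_term(risk_set[0], risk_set, beta[1])
print(log_q1, log_q2)
```

Summing such log terms over all sampled risk sets in $Q_{1}$, $Q_{2}$, and $Q_{3}$ gives the log of the displayed product, which could then be passed to a general-purpose optimizer. The shared `beta` dictionary reflects the assumption that the $β$ coefficients are common across groups.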

Table 5:

Result of the combined analysis of two case-control studies nested in the PLCO cohort.

Endpoint (a)
		${\tilde{L}}_{1} (β)$			${\tilde{L}}_{F U, I C} (β)$
	Parameter^a	Estimate	SE	95% CI	Estimate	SE	95% CI	RE^c
Colorectal	Vitamin D quintile
	2	0.153	0.211	(–0.262, 0.567)	0.297	0.192	(–0.079, 0.672)	1.22
	3	–0.085	0.217	(–0.510, 0.341)	0.150	0.196	(–0.234, 0.534)	1.23
	4	–0.223	0.225	(–0.663, 0.217)	0.124	0.205	(–0.278, 0.526)	1.20
	5	–0.394	0.241	(–0.868, 0.079)	–0.095	0.218	(–0.523, 0.333)	1.22
	BMI $>$ 25	0.007	0.161	(–0.308, 0.321)	0.085	0.150	(–0.210, 0.380)	1.14
	Diabetes	0.470	0.260	(–0.039, 0.979)	0.602	0.237	(0.137, 1.067)	1.20
Prostate	Vitamin D quintile
	2	–0.006	0.188	(–0.373, 0.362)	–0.091	0.181	(–0.446, 0.264)	1.07
	3	0.521	0.173	(0.182, 0.859)	0.385	0.166	(0.060, 0.710)	1.08
	4	0.571	0.188	(0.203, 0.939)	0.347	0.177	(–0.001, 0.693)	1.13
	5	0.262	0.180	(–0.091, 0.615)	0.096	0.173	(–0.244, 0.435)	1.08
	BMI $>$ 25	–0.050	0.126	(–0.297, 0.196)	–0.094	0.120	(–0.328, 0.141)	1.10
	Diabetes	–0.066	0.231	(–0.519, 0.387)	–0.162	0.216	(–0.584, 0.261)	1.15
Proportionality^b	Intercept				3.438	0.551	(2.357, 4.518)
	t				–0.519	0.588	(–1.672, 0.634)
	$t^{2}$				–0.307	0.157	(–0.616, 0.001)

Endpoint (b)
		${\tilde{L}}_{1} (β)$			${\tilde{L}}_{F U, I C} (β)$
	Parameter^a	Estimate	SE	95% CI	Estimate	SE	95% CI	RE^c
Colorectal	Vitamin D quintile
	2	0.124	0.213	(–0.293, 0.541)	0.281	0.193	(–0.097, 0.658)	1.22
	3	–0.097	0.218	(–0.523, 0.330)	0.095	0.195	(–0.287, 0.478)	1.24
	4	–0.234	0.225	(–0.675, 0.207)	0.102	0.205	(–0.299, 0.504)	1.20
	5	–0.397	0.242	(–0.871, 0.078)	–0.096	0.219	(–0.526, 0.334)	1.22
	BMI $>$ 25	–0.001	0.161	(–0.316, 0.314)	0.089	0.151	(–0.207, 0.384)	1.13
	Diabetes	0.468	0.260	(–0.040, 0.977)	0.633	0.239	(0.165, 1.102)	1.18
Prostate	Vitamin D quintile
	2	0.003	0.186	(–0.362, 0.367)	–0.084	0.182	(–0.441, 0.272)	1.05
	3	0.434	0.170	(0.100, 0.768)	0.344	0.166	(0.019, 0.669)	1.06
	4	0.482	0.179	(0.131, 0.833)	0.289	0.171	(–0.046, 0.624)	1.10
	5	0.302	0.176	(–0.043, 0.647)	0.144	0.170	(–0.189, 0.477)	1.07
	BMI $>$ 25	–0.018	0.129	(–0.270, 0.234)	–0.078	0.122	(–0.318, 0.161)	1.11
	Diabetes	0.043	0.237	(–0.422, 0.508)	–0.091	0.221	(–0.525, 0.342)	1.15
Proportionality^b	Intercept				3.384	0.555	(2.296, 4.472)
	t				–0.510	0.592	(–1.670, 0.651)
	$t^{2}$				–0.309	0.159	(–0.620, 0.002)

^a The reference levels of the covariates are the first vitamin D quintile, BMI $\leq 25$ and no diabetes.

^b The proportionality factor is modeled as a quadratic function of t, where t is the case time in days divided by 1000.

^c Relative efficiency is the ratio of the variance of an estimate from ${\tilde{L}}_{1} (β)$ to the variance of the corresponding estimate from ${\tilde{L}}_{F U, I C} (β)$.

Table 5: Result of the combined analysis of two case–control studies nested in the PLCO cohort.

Endpoint (a)
		${\tilde{L}}_{1} (β)$			${\tilde{L}}_{F U, I C} (β)$
	Parameter^a	Estimate	SE	95% CI	Estimate	SE	95% CI	RE^c
Colorectal	Vitamin D quintile
	2	0.153	0.211	(–0.262, 0.567)	0.297	0.192	(–0.079, 0.672)	1.22
	3	–0.085	0.217	(–0.510, 0.341)	0.150	0.196	(–0.234, 0.534)	1.23
	4	–0.223	0.225	(–0.663, 0.217)	0.124	0.205	(–0.278, 0.526)	1.20
	5	–0.394	0.241	(–0.868, 0.079)	–0.095	0.218	(–0.523, 0.333)	1.22
	BMI $>$ 25	0.007	0.161	(–0.308, 0.321)	0.085	0.150	(–0.210, 0.380)	1.14
	Diabetes	0.470	0.260	(–0.039, 0.979)	0.602	0.237	(0.137, 1.067)	1.20
Prostate	Vitamin D quintile
	2	–0.006	0.188	(–0.373, 0.362)	–0.091	0.181	(–0.446, 0.264)	1.07
	3	0.521	0.173	(0.182, 0.859)	0.385	0.166	(0.060, 0.710)	1.08
	4	0.571	0.188	(0.203, 0.939)	0.347	0.177	(–0.001, 0.693)	1.13
	5	0.262	0.180	(–0.091, 0.615)	0.096	0.173	(–0.244, 0.435)	1.08
	BMI $>$ 25	–0.050	0.126	(–0.297, 0.196)	–0.094	0.120	(–0.328, 0.141)	1.10
	Diabetes	–0.066	0.231	(–0.519, 0.387)	–0.162	0.216	(–0.584, 0.261)	1.15
Proportionality^b	Intercept				3.438	0.551	(2.357, 4.518)
	t				–0.519	0.588	(–1.672, 0.634)
	$t^{2}$				–0.307	0.157	(–0.616, 0.001)

Endpoint (b)
		${\tilde{L}}_{1} (β)$			${\tilde{L}}_{F U, I C} (β)$
	Parameter^a	Estimate	SE	95% CI	Estimate	SE	95% CI	RE^c
Colorectal	Vitamin D quintile
	2	0.124	0.213	(–0.293, 0.541)	0.281	0.193	(–0.097, 0.658)	1.22
	3	–0.097	0.218	(–0.523, 0.330)	0.095	0.195	(–0.287, 0.478)	1.24
	4	–0.234	0.225	(–0.675, 0.207)	0.102	0.205	(–0.299, 0.504)	1.20
	5	–0.397	0.242	(–0.871, 0.078)	–0.096	0.219	(–0.526, 0.334)	1.22
	BMI $>$ 25	–0.001	0.161	(–0.316, 0.314)	0.089	0.151	(–0.207, 0.384)	1.13
	Diabetes	0.468	0.260	(–0.040, 0.977)	0.633	0.239	(0.165, 1.102)	1.18
Prostate	Vitamin D quintile
	2	0.003	0.186	(–0.362, 0.367)	–0.084	0.182	(–0.441, 0.272)	1.05
	3	0.434	0.170	(0.100, 0.768)	0.344	0.166	(0.019, 0.669)	1.06
	4	0.482	0.179	(0.131, 0.833)	0.289	0.171	(–0.046, 0.624)	1.10
	5	0.302	0.176	(–0.043, 0.647)	0.144	0.170	(–0.189, 0.477)	1.07
	BMI $>$ 25	–0.018	0.129	(–0.270, 0.234)	–0.078	0.122	(–0.318, 0.161)	1.11
	Diabetes	0.043	0.237	(–0.422, 0.508)	–0.091	0.221	(–0.525, 0.342)	1.15
Proportionality^b	Intercept				3.384	0.555	(2.296, 4.472)
	t				–0.510	0.592	(–1.670, 0.651)
	$t^{2}$				–0.309	0.159	(–0.620, 0.002)

^a The reference levels of the covariates are the first vitamin D quintile, BMI $\leq 25$, and no diabetes.

^b The proportionality factor is modeled as a quadratic function of t, where t is the case time in days divided by 1000.

^c Relative efficiency is the ratio of the variance of an estimate from ${\tilde{L}}_{1} (β)$ to the variance of the corresponding estimate from ${\tilde{L}}_{F U, I C} (β)$.

${\tilde{L}}_{F U, I C} (β)$ gives estimates with smaller variance than ${\tilde{L}}_{1} (β)$, and the efficiency gain is greater for coefficients associated with colorectal cancer, the less common outcome. For the effect of vitamin D on cancer risks, the two conditional likelihoods give slightly different estimates. However, the two sets of estimates follow very similar patterns across increasing vitamin D quintiles, and the conclusions based on 95% CIs are the same except for the fourth quintile for prostate cancer. ${\tilde{L}}_{F U, I C} (β)$ is more sensitive than ${\tilde{L}}_{1} (β)$ in detecting the effect of diabetes on colorectal cancer.
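The comparison above can be checked from the tabulated estimates and standard errors. The following minimal sketch assumes that the tabulated intervals are Wald intervals (estimate ± 1.96 SE); this is consistent with the printed numbers but is not stated explicitly in this section. The function names are illustrative, and the inputs are the rounded endpoint (a) values for the diabetes coefficient on colorectal cancer, so results agree with Table 5 only up to rounding.

```python
def wald_ci(estimate, se, z=1.96):
    """Wald confidence interval: estimate +/- z * SE (assumed interval type)."""
    return (estimate - z * se, estimate + z * se)


def relative_efficiency(se_ref, se_new):
    """Variance ratio Var(ref) / Var(new), computed from standard errors."""
    return (se_ref / se_new) ** 2


# Diabetes coefficient for colorectal cancer, endpoint (a), rounded values from Table 5.
l1 = {"estimate": 0.470, "se": 0.260}      # estimate based on L~_1
lfuic = {"estimate": 0.602, "se": 0.237}   # estimate based on L~_{FU,IC}

ci_l1 = wald_ci(**l1)
ci_fuic = wald_ci(**lfuic)
re = relative_efficiency(l1["se"], lfuic["se"])

print("L1 95% CI:", tuple(round(b, 3) for b in ci_l1))
print("FU,IC 95% CI:", tuple(round(b, 3) for b in ci_fuic))
print("Relative efficiency:", round(re, 2))
```

Under these assumptions, the interval based on ${\tilde{L}}_{1} (β)$ covers zero while the interval based on ${\tilde{L}}_{F U, I C} (β)$ does not, which is the sense in which the latter is more sensitive here.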

Likelihood ratio tests are used to examine whether vitamin D quintiles have any effect on the risks of the two cancers. Each test has 4 degrees of freedom. The test statistics (P-value) for no effect of vitamin D quintiles on colorectal cancer are 6.7 (0.15) for endpoint (a) and 6.1 (0.18) for endpoint (b) based on ${\tilde{L}}_{1} (β)$, and 4.4 (0.36) for endpoint (a) and 3.8 (0.44) for endpoint (b) based on ${\tilde{L}}_{F U, I C} (β)$. All test results are consistent with the null hypothesis of no effect on colorectal cancer. This contrasts with the significant protective effects reported in the original colorectal cancer study (Weinstein et al. 2015), although differences in study population and endpoint limit direct comparison. On the other hand, the test statistics (P-value) for no effect of vitamin D quintiles on prostate cancer are 18.0 (0.001) for endpoint (a) and 12.6 (0.01) for endpoint (b) based on ${\tilde{L}}_{1} (β)$, and 12.3 (0.02) for endpoint (a) and 8.7 (0.07) for endpoint (b) based on ${\tilde{L}}_{F U, I C} (β)$. The two conditional likelihoods thus lead to different conclusions at the 0.05 level regarding the effect of serum vitamin D on prostate cancer for endpoint (b).

${\tilde{L}}_{F U, I C} (β)$ also allows formal tests of whether covariate effects are equal across the two cancer types. For endpoint (a), the LRT statistics (df, P-value) for equality of effects on the two cancer types are 9.3 (4, 0.054) for vitamin D quintiles, 1.0 (1, 0.33) for BMI, and 6.7 (1, 0.01) for diabetes. For endpoint (b), the corresponding test statistics (df, P-value) are 8.5 (4, 0.08), 0.8 (1, 0.36), and 5.9 (1, 0.02). These tests can be interpreted only within the group of subjects eligible for both studies: non-Hispanic White males with no history of cancer or colon diseases at baseline, and with at least one prostate cancer screen in the PLCO before October 2003. Within this particular population, the effects of diabetes status on prostate and colorectal cancers are significantly different, and the effects of serum vitamin D concentration on the two cancer types appear to differ, although the difference is only marginally significant. Although this result appears different from the strong and opposite effects seen separately in the two original studies (Ahn et al. 2008; Weinstein et al. 2015), we caution against direct comparisons because of the differences in study population and type of endpoint ((b) in the colorectal cancer study and (a) in the prostate cancer study) between the prior studies and our reanalysis.
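The P-values quoted for the likelihood ratio tests follow from referring each statistic to a chi-square distribution with the stated degrees of freedom. The sketch below is one way to reproduce them using only the Python standard library; the function name `chi2_sf` is illustrative. Because the statistics are reported to one decimal place, a recomputed P-value can differ from the printed one in the last digit.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        q, k = math.exp(-half), 2                 # closed form for df = 2
    else:
        q, k = math.erfc(math.sqrt(half)), 1      # closed form for df = 1
    # Recurrence: Q(x; k + 2) = Q(x; k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1)
    while k < df:
        q += half ** (k / 2.0) * math.exp(-half) / math.gamma(k / 2.0 + 1.0)
        k += 2
    return q


# Vitamin D quintiles and prostate cancer, endpoint (a), based on L~_1 (4 df).
p_vitd_prostate = chi2_sf(18.0, 4)
# Equality of the diabetes effect across the two cancers, endpoint (a) (1 df).
p_diabetes_equal = chi2_sf(6.7, 1)

print(round(p_vitd_prostate, 3), round(p_diabetes_equal, 2))
```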

5 Discussion

The data augmentation method proposed by Lunn and McNeil (1995) has been a popular approach to analyzing competing risks data because of its conceptual and computational simplicity. It is often used in addition to the standard model, for example in secondary analyses to investigate effects of exposure across cancer subtypes (Song et al. 2016), or to compare vaccine efficacy on different viral genotypes in vaccine trials (rgp120 HIV Vaccine Study Group 2005). Although popular in practice, it applies only to full cohort data, whereas in biomedical studies covariates can often be ascertained for only a subset of the cohort. As evidenced in our method development, there are unique challenges in extending the Lunn and McNeil (1995) approach to nested case–control studies.

Our paper explores the feasibility of extending the Lunn and McNeil (1995) approach from full cohorts to nested case–control studies. Under proportional risks, we find that the efficiency gain is minimal for full cohort analyses but substantial for NCC analyses. The efficiency advantage of model (2.2) persists even when more controls are matched to each case (Supplementary Table S7 of the Supplementary Materials). When the proportionality assumption does not hold, model (2.2) may lead to substantially biased estimates. In that case, we recommend modeling the proportionality factors as time-dependent to approximate the true model and reduce bias. Belot et al. (2010) report similar findings regarding bias and efficiency in full cohort analyses, and they recommend using cubic splines when the proportional hazards assumptions do not hold, including those for the proportionality factors. Alternatively, one may split the NCC data into subsets such that baselines are proportional within each subset, or such that each subset includes only a single outcome. Although theoretically plausible, this may be difficult to implement unless clinical evidence suggests such proportional subsets. For categorical outcomes typically modeled using polytomous logistic regression, model (2.2) provides an alternative analytical tool (Xue et al. 2013), in which time-dependent proportionality factors may be incorporated to reduce bias under non-proportionality.
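As an illustration of a time-dependent proportionality factor, the sketch below evaluates the polynomial in t fitted for endpoint (a) in Table 5 (intercept 3.438, coefficient of t –0.519, coefficient of $t^{2}$ –0.307, with t the case time in days divided by 1000). It assumes that these coefficients act on the log scale, so that the ratio of the two baseline hazards is the exponential of the polynomial; that parameterization, the function names, and the evaluation grid are assumptions made for illustration, and which cause is in the numerator is not stated in this section.

```python
import math

# Endpoint (a) point estimates from Table 5; t = case time in days / 1000.
INTERCEPT, B_T, B_T2 = 3.438, -0.519, -0.307


def log_proportionality(t):
    """Assumed log ratio of the two baseline hazards at scaled time t."""
    return INTERCEPT + B_T * t + B_T2 * t ** 2


def proportionality_factor(t):
    """Assumed ratio of the two baseline hazards at scaled time t."""
    return math.exp(log_proportionality(t))


for days in (0, 1000, 2000, 3000):
    t = days / 1000.0
    print(days, round(proportionality_factor(t), 2))
```

Under these assumptions, the fitted factor decreases over follow-up, which is the kind of departure from a constant proportionality factor that a time-dependent specification is meant to capture. The point estimates carry wide confidence intervals, so the curve is indicative only.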

When different NCC studies from the same cohort are combined, model (2.2) requires modifications, which we discuss in detail and support with theory and simulations. We show how to use graphics and formal tests to assess the proportionality assumption for both full cohort and NCC analyses. The PLCO example demonstrates how the proposed methods can be applied flexibly to real-world problems. Our methods also provide a way to reuse existing NCC samples for competing risks analyses, as an alternative to approaches based on maximum likelihood estimation or weighted partial likelihood (WPL) (Saarela et al. 2008; Støer and Samuelsen 2013). The maximum likelihood approach requires additional unverifiable assumptions, and the computation can be challenging. The WPL approach potentially allows combining studies on different time axes (e.g. time on study and age) and is thus more flexible than the conditional likelihood approach. However, asymptotic theory is generally not established except for Kaplan-Meier (KM) type weights. When there is additional matching, WPL with KM-weights may be computationally intensive (Støer and Samuelsen 2013, 2016). Batch effects, close matching, and small cohort sizes may also bias WPL estimates.

As with Lunn and McNeil (1995), our method assumes pooled failure times are untied. With a small number of ties, it is reasonable to break the ties by adding small randomly generated numbers to the tied times. Heavy ties may be handled with exact conditional likelihood or approximations provided by standard software.
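The random tie-breaking described above can be sketched as follows. This is a minimal illustration rather than the procedure used in our analyses: the function name `break_ties`, the `frac` parameter, and the toy failure times are hypothetical. The offsets are kept well below the smallest gap between distinct times, so the ordering of tied times relative to all other times is preserved.

```python
import random
from collections import Counter


def break_ties(times, frac=0.01, seed=0):
    """Add small random offsets to tied failure times so that all times are distinct.

    Offsets are at most frac * (smallest gap between distinct times), so the
    ordering relative to other, non-tied times is unchanged.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    distinct = sorted(set(times))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    scale = frac * (min(gaps) if gaps else 1.0)
    counts = Counter(times)
    # Only tied times are perturbed; untied times are returned unchanged.
    return [t + rng.uniform(0.0, scale) if counts[t] > 1 else t for t in times]


times = [5.0, 3.0, 5.0, 7.0, 3.0, 3.0]  # toy failure times with ties at 3 and 5
untied = break_ties(times)
print(untied)
```

Because the offsets are random, it is prudent to repeat the analysis with a few different seeds and confirm that the estimates are stable.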

We observe that when the numbers of cases are small, the parameter estimates from both models (2.1) and (2.2) can be biased, and the magnitude of bias increases with increasing effect sizes (data not shown). These findings are consistent with what Bertke et al. (2013) report in a simulation study on nested case–control studies with a limited number of cases and a single outcome. It is worth noting that with small numbers of cases, model (2.2) estimates are on average less biased than those of model (2.1) when model (2.2) is appropriate. That is, when the proportionality assumption holds, model (2.2) seems to require fewer cases than model (2.1) for its large-sample properties (consistency and asymptotic normality) to take hold.

This paper focuses on modeling cause-specific hazards. Another popular model for competing risks data is the Fine-Gray model for sub-distribution hazards (Fine and Gray 1999). It is possible to specify proportional risks models for sub-distribution hazards. However, as the conditioning is different, it is unclear to us how to fit the model using the partial-likelihood approach of Lunn and McNeil (1995). This is the case for full cohort data as well as for data from the nested case–control design of Wolkewitz et al. (2014) for sub-distribution hazards.

There are, however, some limitations to our methods. First, combining NCC studies and using model (2.2) require all studies to be on the same time axis and to collect the same covariates. Second, the weighted baseline approach requires knowing the number of cohort subjects at risk at each failure time. Note that the WPL approach also shares this limitation and the covariate part of the first limitation. Third, if complicated functions of time are needed to model the proportionality factors, there may be efficiency loss for model (2.2). Fourth, no off-the-shelf software is available for combining two studies with different matching variables. Lastly, if the eligibility criteria differ for $K \geq 3$ studies, evaluation of the proportionality assumption may be a tedious task. Future work includes developing software to automate evaluation of the proportionality assumption, extending the proportional risks model to weighted partial likelihood analyses, and combining NCC studies from different cohorts.

Acknowledgments

Cancer incidence data have been provided by the Alabama Statewide Cancer Registry, Arizona Cancer Registry, Colorado Central Cancer Registry, District of Columbia Cancer Registry, Georgia Cancer Registry, Hawaii Cancer Registry, Cancer Data Registry of Idaho, Maryland Cancer Registry, Michigan Cancer Surveillance Program, Minnesota Cancer Surveillance System, Missouri Cancer Registry, Nevada Central Cancer Registry, Ohio Cancer Incidence Surveillance System, Pennsylvania Cancer Registry, Texas Cancer Registry, Utah Cancer Registry, Virginia Cancer Registry, and Wisconsin Cancer Reporting System. All are supported in part by funds from the Centers for Disease Control and Prevention, National Program of Cancer Registries, local states or by the National Cancer Institute, Surveillance, Epidemiology, and End Results program. The results reported here and the conclusions derived are the sole responsibility of the authors.

Supplementary material

Supplementary material is available at Biostatistics Journal online.

Funding

Y.E.S.'s work was supported by the New Faculty Startup Fund from Seoul National University, the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00211561), and the LAMP Program of the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (RS-2023-00301976).

Conflict of interest statement

None declared.

Data availability

The R code for implementing the proposed methods, simulations, and data examples is available at https://github.com/yench/PRMinNCC.

References

Ahn J, Peters U, Albanes D, Purdue MP, Abnet CC, Chatterjee N, Horst RL, Hollis BW, Huang W-Y, Shikany JM, et al. Serum vitamin D concentration and prostate cancer risk: a nested case–control study. J Natl Cancer Inst. 2008:100(11):796–804.

Belot A, Abrahamowicz M, Remontet L, Giorgi R. Flexible modeling of competing risks in survival analysis. Stat Med. 2010:29(23):2453–2468.

Bertke S, Hein M, Schubauer-Berigan M, Deddens J. A simulation study of relative efficiency and bias in the nested case–control study design. Epidemiol Methods. 2013:2(1):85–93.

Borgan Ø. Estimation of covariate-dependent Markov transition probabilities from nested case-control data. Stat Methods Med Res. 2002:11(2):183–202.

Breslow NE. Contribution to discussion of paper by Dr Cox. J R Stat Soc Ser B. 1972:34:216–217.

Cox DR. Regression models and life-tables. J R Stat Soc Ser B (Methodol). 1972:34(2):187–202.

Fine JP, Gray RJ. A proportional hazards model for the subdistribution of a competing risk. J Am Stat Assoc. 1999:94(446):496–509.

Gray RJ. Some diagnostic methods for Cox regression models through hazard smoothing. Biometrics. 1990:46:93–102.

Holt J. Competing risk analyses with special reference to matched pair experiments. Biometrika. 1978:65(1):159–165.

Kalbfleisch JD, Prentice RL. The statistical analysis of failure time data. Hoboken, New Jersey: John Wiley & Sons; 2002.

Langholz B, Borgan Ø. Estimation of absolute risk from nested case-control data. Biometrics. 1997:53(2):767–774.

Langholz B, Thomas DC. Nested case-control and case-cohort methods of sampling from a cohort: a critical comparison. Am J Epidemiol. 1990:131(1):169–176.

Lubin JH. Case-control methods in the presence of multiple failure times and competing risks. Biometrics. 1985:41(1):49–54.

Lunn M, McNeil D. Applying Cox regression to competing risks. Biometrics. 1995:51(2):524–532.

Mondul AM, Weinstein SJ, Horst RL, Purdue M, Albanes D. Serum vitamin D and risk of bladder cancer in the Prostate, Lung, Colorectal, and Ovarian (PLCO) Cancer Screening Trial. Cancer Epidemiol Biomark Prevent. 2012:21(7):1222–1225.

Muller DC, Hodge AM, Fanidi A, Albanes D, Mai X-M, Shu XO, Weinstein SJ, Larose TL, Zhang X, Han J, et al. No association between circulating concentrations of vitamin D and risk of lung cancer: an analysis in 20 prospective studies in the Lung Cancer Cohort Consortium (LC3). Ann Oncol. 2018:29(6):1468–1475.

Peters U, Hayes RB, Chatterjee N, Shao W, Schoen RE, Pinsky P, Hollis BW, McGlynn KA, Prostate, Lung, Colorectal and Ovarian Cancer Screening Project Team. Circulating vitamin D metabolites, polymorphism in vitamin D receptor, and colorectal adenoma risk. Cancer Epidemiol Biomark Prevent. 2004:13(4):546–552.

rgp120 HIV Vaccine Study Group. Placebo-controlled phase 3 trial of a recombinant glycoprotein 120 vaccine to prevent HIV-1 infection. J Infect Dis. 2005:191(5):654–665.

Saarela O, Kulathinal S, Arjas E, Läärä E. Nested case–control data utilized for multiple outcomes: a likelihood approach and alternatives. Stat Med. 2008:27(28):5991–6008.

Salim A, Yang Q, Reilly M. The value of reusing prior nested case–control data in new studies with different outcome. Stat Med. 2012:31(11–12):1291–1302.

Song M, Nishihara R, Wang M, Chan AT, Qian ZR, Inamura K, Zhang X, Ng K, Kim SA, Mima K, et al. Plasma 25-hydroxyvitamin D and colorectal cancer risk according to tumour immunity status. Gut. 2016:65(2):296–304.

Støer NC, Samuelsen SO. Comparison of estimators in nested case–control studies with multiple outcomes. Lifetime Data Anal. 2012:18:261–283.

Støer NC, Samuelsen SO. Inverse probability weighting in nested case-control studies with additional matching—a simulation study. Stat Med. 2013:32(30):5328–5339.

Støer NC, Samuelsen SO. multipleNCC: inverse probability weighting of nested case-control data. R J. 2016:8(2):5.

Tai B-C, Machin D, White I, Gebski V. Competing risks analysis of patients with osteosarcoma: a comparison of four different approaches. Stat Med. 2001:20(5):661–684.

Weinstein SJ, Purdue MP, Smith-Warner SA, Mondul AM, Black A, Ahn J, Huang W-Y, Horst RL, Kopp W, Rager H, et al. Serum 25-hydroxyvitamin D, vitamin D binding protein and risk of colorectal cancer in the Prostate, Lung, Colorectal and Ovarian Cancer Screening Trial. Int J Cancer. 2015:136(6):E654–E664.

Wolkewitz M, Cooper BS, Palomar-Martinez M, Olaechea-Astigarraga P, Alvarez-Lerma F, Schumacher M. Nested case–control studies in cohorts with competing events. Epidemiology. 2014:25(1):122–125.

Xue X, Kim MY, Gaudet MM, Park Y, Heo M, Hollenbeck AR, Strickler HD, Gunter MJ. A comparison of the polytomous logistic regression and joint Cox proportional hazards models for evaluating multiple disease subtypes in prospective cohort studies. Cancer Epidemiol Biomark Prevent. 2013:22(2):275–285.