Discrete-time competing-risks regression with or without penalization

ABSTRACT

Many studies analyze time-to-event data with competing risks and right censoring. Most methods and software packages are geared towards data from a continuous failure-time distribution. However, failure-time data may sometimes be discrete, either because time is inherently discrete or due to imprecise measurement. This paper introduces a new estimation procedure for discrete-time survival analysis with competing events. The proposed approach offers a key advantage over existing procedures: it allows for straightforward integration and application of widely used regularized regression and feature-screening methods. We illustrate the benefits of our proposed approach through a comprehensive simulation study. Additionally, we showcase the utility of the proposed procedure by estimating a survival model for the length of stay of patients hospitalized in the intensive care unit, considering 3 competing events: discharge to home, transfer to another medical facility, and in-hospital death. A Python package, PyDTS, is available for applying the proposed method with additional features.

competing events, penalized regression, regularized regression, sure independent screening, survival analysis

1 INTRODUCTION

Most methods and software for survival analysis are tailored to data with continuous failure-time distributions. However, there are situations where failure times are discrete, either because the time unit is inherently discrete or because of imprecise measurement. An example of the former is time to pregnancy, where the observation time is defined by the number of menstrual cycles. In other cases, events can happen at any point in time, but only the time interval in which each event occurred is recorded in the available data. For instance, death from cancer may be recorded in months since diagnosis (Lee et al., 2018).

Competing events occur when individuals are susceptible to several types of events but can only experience at most one event at a time. If multiple events can happen simultaneously, they can be treated as a separate event type (Kalbfleisch and Prentice, 2011). For instance, competing risks in a study of hospital length of stay could be discharge and in-hospital death, where the occurrence of one of these events prevents observation of the other event for the same patient. Another classic example of competing risks is cause-specific mortality, such as death from heart disease, cancer, or other causes. It is acknowledged that using standard continuous-time models on discrete-time data with competing events without proper adjustments can lead to a systematic bias (Lee et al., 2018; Wu et al., 2022). For example, Lee et al. (2018) noted the bias in the cumulative incidence function estimator, resulting in poor coverage rates, such as an empirical coverage of 0.66 versus a nominal level of 0.95.

The motivation for this project is to analyze data of length of stay (LOS) of patients in healthcare facilities. LOS typically refers to the number of days a patient stays in the hospital during a single admission (Awad et al., 2017; Lequertier et al., 2021). Accurate prediction of LOS is crucial for hospital management and planning of bed capacity, as it affects healthcare delivery access, quality, and efficiency (Lequertier et al., 2021). In particular, hospitalizations in intensive care units (ICU) consume a significant amount of hospital resources per patient (Adhikari et al., 2010). In this study, we use the publicly available Medical Information Mart for Intensive Care (MIMIC) - IV (version 2.0) data (Goldberger et al., 2000; Johnson et al., 2022) to develop a model for predicting LOS in ICU based on patients’ characteristics upon arrival in ICU. The study involves 25,170 ICU admissions from 2014 to 2020 with only 28 unique times, resulting in many tied events at each time point. The 3 competing events analyzed were discharge to home (69.0%), transfer to another medical facility (21.4%), and in-hospital death (6.1%). Patients who left the ICU against medical advice (1.0%) were considered censored, and administrative censoring was imposed for patients hospitalized for more than 28 days (2.5%).

Regression analysis of continuous-time survival data with competing risks can be performed using standard non-competing events tools because the likelihood function for the continuous-time setting can be factored into likelihoods for each cause-specific hazard function (Kalbfleisch and Prentice, 2011). However, this is not the case for some regression models of discrete-time survival data with competing risks (Allison, 1982). The literature on discrete-time survival data with competing risks can be categorized into 2 primary groups. The first group involves cause-specific hazard functions that serve as a natural and direct analogy to those found in the continuous survival time context. In this case, the cause-specific hazard function is solely dependent on the parameters of the specific competing event (Allison, 1982; Lee et al., 2018; Wu et al., 2022). As this formulation results in a likelihood that cannot be decomposed into distinct components for each type of event, Allison (1982) explored an alternative, more manageable formulation. In particular, he introduced a cause-specific hazard function that depends not only on the parameters associated with the specific competing event but also on the parameters related to all other event types. This cause-specific hazard formulation was later adopted and further developed by others (Möst et al., 2016; Schmid and Berger, 2021; Tutz et al., 2016). As noted by Allison (1982), the advantage of the second approach is a significant simplification of the estimation procedure, albeit at the cost of interpretability. For additional technical details comparing these 2 approaches, please refer to Section 5 below. In this work, we choose to align with the first suggestion of Allison (1982) and the approach also adopted by Lee et al. (2018), and focus on the natural and direct analogy to the cause-specific hazard function in the context of continuous survival time, as this formulation of cause-specific hazard models provides a clearer interpretation.

Lee et al. (2018) showed that if one naively treats competing events as censoring in the discrete-time likelihood, separate estimation of cause-specific hazard models for different event types may be accomplished using a collapsed likelihood, which is equivalent to fitting a generalized linear model to repeated binary outcomes. Moreover, the maximum collapsed-likelihood estimators are consistent and asymptotically normal under standard regularity conditions, which gives rise to Wald confidence intervals and likelihood-ratio tests for the effects of covariates. Wu et al. (2022) focused on 2 competing events and used a different approach from that of Lee et al. (2018). However, they noted that it leads to the same estimators. The contribution of Wu et al. (2022) is mainly by allowing an additional fixed effect of medical center in the model.

Consider a setting with M competing events. Each cause-specific hazard model of Lee et al. (2018) includes |$d+p$| parameters, where d parameters can be viewed as the cause-specific baseline-hazard parameters and p are the unknown cause-specific regression coefficients. As will be shown in Section 2, the standard maximum likelihood approach requires estimating |$M(d+p)$| parameters simultaneously. Lee et al. (2018) substantially simplified this by estimating |$p+d$| parameters for each competing event separately.

In this work we focus primarily on the popular logit-link function and introduce a new estimation technique that further simplifies the estimation procedure. Our new estimator separates the estimation procedure of the d cause-specific baseline-hazard parameters and the p cause-specific regression coefficients, within each event type. It will be demonstrated that this separation is highly useful for incorporating common penalized methods like lasso and elastic net, among others (Hastie et al., 2009) and enables easy implementation of screening methods for high-dimensional data, such as sure independent screening (Fan et al., 2010; Fan and Lv, 2008; Saldana and Feng, 2018; Zhao and Li, 2012). Our Python software, PyDTS (Meir et al., 2022), implements both our method and the one from Lee et al. (2018) and other tools for discrete-time survival analysis.

It might seem that the advantages of using a proper discrete-time competing-events regression model over a continuous-time model would diminish when d is large. However, Web Appendix A provides simulation results challenging this assumption. We compared our discrete analysis to a naive analysis using the standard partial-likelihood approach, like that in the R function coxph. With 2000 observations, 9 time points, and 2 competing events, the naive approach’s baseline hazard estimators—regardless of using Breslow, Efron, or Exact tie corrections—showed substantial biases. In contrast, our method produced almost unbiased results. This finding holds similarly with 5000 observations and 50 time points. The bias in the naive approach arises from the inappropriate use of the Breslow estimator for baseline hazards, which theoretically is only justifiable when the likelihood can be decomposed into distinct components for each event type, a criterion our discrete-time model does not meet, as will be shown in the next section.

2 METHODS

2.1 Models and likelihood function

Consider T as a discrete event time taking values in |$\lbrace 1,2,\ldots ,d\rbrace$|, and J as the type of event, where |$J \in \lbrace 1,\ldots ,M\rbrace$|. Let |${\bf Z}$| be a |$p \times 1$| vector of time-independent covariates. The setting of time-dependent covariates will be discussed later. The discrete cause-specific hazard function is defined as |$\lambda _j(t|{\bf Z}) = \Pr (T=t,J=j|T\ge t, {\bf Z})$| for |$t = 1,2,\ldots ,d$| and |$j=1,\ldots ,M$|. Following the framework of Allison (1982), the semi-parametric hazard functions are expressed through a regression transformation model as

$$\begin{eqnarray} h(\lambda _{j}(t|{\bf Z})) = \alpha _{jt} +{\bf Z}^T {\boldsymbol {\beta }}_j \quad t = 1,2,\ldots ,d, \quad j=1,\ldots ,M, \end{eqnarray}$$

where h is a known function. The model’s complexity is emphasized by its semi-parametric nature, handling |$M(d+p)$| unknown parameters. The shared covariates |${\bf Z}$| among the M models do not require that every model use all the covariates. The regression coefficient vectors, |${\boldsymbol {\beta }}_j$|⁠, are specific to different event types, allowing for flexibility in model specification. By setting any coefficient to zero, its corresponding covariate can be excluded from that particular model. We adopt the popular logit transformation |$h(a)=\log \lbrace a/(1-a) \rbrace$|⁠, leading to the following cause-specific hazard function

$$\begin{eqnarray} \lambda _j(t|{\bf Z})=\frac{\exp (\alpha _{jt}+{\bf Z}^T{\boldsymbol {\beta }}_j)} {1+\exp (\alpha _{jt}+{\bf Z}^T{\boldsymbol {\beta }}_j)} \, . \end{eqnarray}$$

(1)

This approach, which leaves |$\alpha _{jt}$| unspecified, parallels the method of an unspecified baseline hazard in the Cox model (Cox, 1972), affirming the semi-parametric nature of our discrete-time model.

Define |$S(t|{\bf Z}) = \Pr (T>t|{\bf Z})$| as the overall survival given |${\bf Z}$|⁠. Then, the probability that an event of type j occurs at time t, |$t=1,\ldots ,d$|⁠, |$j=1,\ldots ,M$|⁠, is given by

$$\begin{eqnarray} \Pr (T=t,J=j|{\bf Z})&=&\lambda _j(t|{\bf Z})S(t-1|{\bf Z})\\ &=&\lambda _j(t|{\bf Z}) \prod _{k=1}^{t-1} \left\lbrace 1- \sum _{j^{\prime }=1}^M\lambda _{j^{\prime }}(k|{\bf Z}) \right\rbrace \, . \end{eqnarray}$$

The probability of event type j by time t given |${\bf Z}$|, also known as the cumulative incidence function (CIF), is |$F_j(t|{\bf Z}) = \sum _{k=1}^{t}\lambda _j(k|{\bf Z}) \prod _{l=1}^{k-1} \left\lbrace 1-\sum _{j^{\prime }=1}^M\lambda _{j^{\prime }}(l|{\bf Z}) \right\rbrace $|, and the marginal probability of event type j equals |$\Pr (J=j|{\bf Z}) = \sum _{t=1}^{d} \lambda _j(t|{\bf Z}) \prod _{k=1}^{t-1} \left\lbrace 1-\sum _{j^{\prime }=1}^M\lambda _{j^{\prime }}(k|{\bf Z}) \right\rbrace $|. Our goal is to estimate the parameters |$\Omega = (\alpha _{11},\ldots ,\alpha _{1d},{\boldsymbol {\beta }}_1^T, \ldots , \alpha _{M1},\ldots ,\alpha _{Md},{\boldsymbol {\beta }}_M^T)$|.
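As a minimal sketch of the quantities defined above, the following Python snippet evaluates the logit hazards of Eq. (1), the overall survival, and the CIF for a single covariate vector. The function names `hazards` and `cif` are hypothetical (they are not part of PyDTS), and the values of `beta` and `z` are arbitrary illustrations; only the form of `alpha` is borrowed from Settings 1-2 of Section 3.

```python
import math


def expit(x):
    """Logistic function, the inverse of the logit link h."""
    return 1.0 / (1.0 + math.exp(-x))


def hazards(alpha, beta, z):
    """Return lam[j][t-1] = lambda_j(t | z) under the logit model, Eq. (1).

    alpha: list of M lists of length d (the alpha_{jt}).
    beta:  list of M coefficient lists of length p.
    z:     covariate list of length p.
    """
    lam = []
    for a_j, b_j in zip(alpha, beta):
        lin = sum(zk * bk for zk, bk in zip(z, b_j))
        lam.append([expit(a_jt + lin) for a_jt in a_j])
    return lam


def cif(alpha, beta, z):
    """Return (F, S) with F[j][t-1] = F_j(t | z) and S[t-1] = S(t | z)."""
    lam = hazards(alpha, beta, z)
    M, d = len(lam), len(lam[0])
    F = [[0.0] * d for _ in range(M)]
    S = []
    surv = 1.0  # S(0 | z) = 1
    for t in range(d):
        for j in range(M):
            prev = F[j][t - 1] if t > 0 else 0.0
            # Add Pr(T = t, J = j | z) = lambda_j(t | z) * S(t - 1 | z).
            F[j][t] = prev + lam[j][t] * surv
        surv *= 1.0 - sum(lam[j][t] for j in range(M))
        S.append(surv)
    return F, S


# Illustrative inputs only; beta and z are not estimates from the paper.
alpha = [[-1.4 + 0.4 * math.log(t) for t in range(1, 8)],
         [-1.3 + 0.4 * math.log(t) for t in range(1, 8)]]
beta = [[0.2, -0.5], [0.0, -0.3]]
F, S = cif(alpha, beta, z=[0.5, 0.5])
print(F[0][-1], F[1][-1], S[-1])  # the three values sum to one
```

The final line reflects the identity |$\sum _{j=1}^M F_j(d|{\bf Z}) + S(d|{\bf Z}) = 1$|, which offers a quick numerical check of any implementation.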

For simplicity, we temporarily assume 2 competing events, ie, |$M=2$|⁠. The data consist of n independent observations, each with |$(X_i,\delta _i,J_i,{\bf Z}_i)$| where |$X_i=\min (C_i,T_i)$|⁠, |$C_i$| is a discrete right-censoring time, |$\delta _i=I(T_i \le C_i)$| is the event indicator and |$J_i\in \lbrace 0,1,2\rbrace$|⁠, where |$J_i=0$| if and only if |$\delta _i=0$|⁠, |$i=1,\ldots ,n$|⁠. It is assumed that given the covariates, the censoring and failure times are independent and non-informative in the sense of Section 3.2 of Kalbfleisch and Prentice (2011). In the case of grouped continuous-time data, it is assumed that events always occur before censoring within the same interval. Then, the likelihood function is proportional to

$$\begin{eqnarray} L &=& \prod _{i=1}^n \left\lbrace \frac{\lambda _1(X_i|{\bf Z}_i)}{1-\lambda _1(X_i|{\bf Z}_i)-\lambda _2(X_i|{\bf Z}_i)}\right\rbrace ^{I(J_i=1)}\\ &&\times \, \left\lbrace \frac{\lambda _2(X_i|{\bf Z}_i)}{1-\lambda _1(X_i|{\bf Z}_i)-\lambda _2(X_i|{\bf Z}_i)}\right\rbrace ^{I(J_{i}=2)} \\ && \times \prod _{t=1}^{X_i}\lbrace 1-\lambda _1(t|{\bf Z}_i)-\lambda _2(t|{\bf Z}_i)\rbrace \, . \end{eqnarray}$$

Equivalently,

$$\begin{eqnarray} L &=& \prod _{i=1}^n \left[\prod _{j=1}^2 \prod _{t=1}^{X_i} \left\lbrace \frac{\lambda _j(t|{\bf Z}_i)}{1-\lambda _1(t|{\bf Z}_i)-\lambda _2(t|{\bf Z}_i)}\right\rbrace ^{\delta _{jit}}\right] \\ &&\times \,\prod _{t=1}^{X_i}\lbrace 1-\lambda _1(t|{\bf Z}_i)-\lambda _2(t|{\bf Z}_i)\rbrace \end{eqnarray}$$

and the log-likelihood (up to a constant) becomes

$$\begin{eqnarray} \log L &=& \sum _{i=1}^n \sum _{t=1}^{X_i} \left[\sum _{j=1}^2\delta _{jit} \log \lambda _j(t|{\bf Z}_i) + \lbrace 1-\delta _{1it}-\delta _{2it}\rbrace\right.\\ &&\left.\times \,\vphantom{\sum _{j=1}^2}% \log \lbrace 1-\lambda _1(t|{\bf Z}_i)-\lambda _2(t|{\bf Z}_i)\rbrace \right] \end{eqnarray}$$

(2)

where |$\delta _{jit}$| is set to one if subject i experiences event of type j at time t, and 0 otherwise. Evidently, in contrast to the continuous-time setting with competing events, L cannot be decomposed into separate likelihoods for each cause-specific hazard function |$\lambda _j$|⁠. To estimate |$\Omega$|⁠, which encompasses |$M(d+p)$| parameters, maximizing |$\log L$| becomes time-intensive. Lee et al. (2018) suggested estimating each set of |$d+p$| parameters of each cause independently. We enhance this approach by separately estimating |$(\alpha _{j1},\ldots ,\alpha _{jd})$| and |${\boldsymbol {\beta }}_j$| for each cause.

2.2 The collapsed log-likelihood approach of Lee et al. (2018)

The estimation method of Lee et al. (2018) uses a collapsed log-likelihood approach, simplifying the analysis by expanding the dataset. Each subject i is represented by multiple dummy observations up to time |$X_i$|⁠. For each time |$t \le X_i$|⁠, indicators |$\delta _{jit}=I(T_i=t, J_i=j)$| are defined for whether event type j occurs at time t; see Table S1 of the Supplementary Material (SM). This setup allows for a conditional multinomial distribution of events. With |$M=2$|⁠, we get |$\lbrace \delta _{1it},\delta _{2it},1-\delta _{1it}-\delta _{2it}\rbrace$| and the estimation of |$(\alpha _{11},\ldots ,\alpha _{1d},{\boldsymbol {\beta }}_1^T)$| utilizes a collapsed log-likelihood where |$\delta _{2it}$| and |$1-\delta _{1it}-\delta _{2it}$| are combined. This collapsed log-likelihood is tailored for analyzing cause |$j=1$| using a binary regression model, with |$\delta _{1it}$| serving as the outcome variable, and is given by

$$\begin{eqnarray} &&\log L_1(\alpha _{11},\ldots ,\alpha _{1d},{\boldsymbol {\beta }}_1)\\ &&\quad = \sum _{i=1}^n \sum _{t=1}^{X_i}[\delta _{1it} \log \lambda _1(t|{\bf Z}_i)+(1-\delta _{1it})\\ &&\qquad\times \,\log \lbrace 1-\lambda _1(t|{\bf Z}_i)\rbrace ] \, . \end{eqnarray}$$

Similarly, the collapsed log-likelihood for cause |$j=2$| with |$\delta _{2it}$| as the outcome becomes

$$\begin{eqnarray} && \log L_2(\alpha _{21},\ldots ,\alpha _{2d},{\boldsymbol {\beta }}_2)\\ &&\quad = \sum _{i=1}^n \sum _{t=1}^{X_i}[\delta _{2it} \log \lambda _2(t|{\bf Z}_i)+(1-\delta _{2it})\\ &&\qquad\times \, \log \lbrace 1-\lambda _2(t|{\bf Z}_i)\rbrace] \, , \end{eqnarray}$$

and one can fit the 2 models, separately. In general, for M competing events, the estimators of |$(\alpha _{j1},\ldots ,\alpha _{jd},{\boldsymbol {\beta }}_j^T)$|⁠, are the respective values that maximize

$$\begin{eqnarray} && \log L_j(\alpha _{j1},\ldots ,\alpha _{jd},{\boldsymbol {\beta }}_j)\\ &&\quad = \sum _{i=1}^n \sum _{t=1}^{X_i}[\delta _{jit} \log \lambda _j(t|{\bf Z}_i)+(1-\delta _{jit})\\ &&\qquad\times \, \log \lbrace 1-\lambda _j(t|{\bf Z}_i)\rbrace] \end{eqnarray}$$

(3)

with |$j=1,\ldots ,M$|⁠. Namely, each maximization for event j involves |$d + p$| parameters. Lee et al. (2018) showed that the estimators are asymptotically multivariate normally distributed and the covariance matrix can be consistently estimated. Since L does not separate into distinct components for each event type, optimizing each collapsed likelihood |$L_j$| separately does not produce the same results as maximizing the entire likelihood across all parameters. This introduces a trade-off between computational simplicity and the potential loss of estimation efficiency. The authors also pointed out that standard generalized linear models (GLM) could be used for each |$\log L_j$|⁠, and due to the Markov property ensuring conditional independence, the basic variance estimator from the GLM, which presumes independence, remains valid.
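The expansion and the collapsed log-likelihood of Eq. (3) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the implementation of Lee et al. (2018) or PyDTS: the helper names `expand` and `collapsed_loglik` and the three toy records are hypothetical, and the function only evaluates |$\log L_j$|; maximizing it would be delegated to any logistic-regression routine.

```python
import math


def expand(records, cause):
    """Expand (X_i, J_i, Z_i) records into person-period rows for one cause.

    Each subject contributes one row per time t = 1, ..., X_i with the binary
    outcome delta_{jit} = I(X_i = t, J_i = cause). Competing events and
    censoring (J_i = 0) both give outcome 0, as in the collapsed likelihood.
    """
    rows = []
    for x_i, j_i, z_i in records:
        for t in range(1, x_i + 1):
            delta = 1 if (t == x_i and j_i == cause) else 0
            rows.append((t, delta, z_i))
    return rows


def collapsed_loglik(rows, alpha_j, beta_j):
    """Evaluate log L_j of Eq. (3) on the expanded rows under the logit link."""
    total = 0.0
    for t, delta, z in rows:
        eta = alpha_j[t - 1] + sum(zk * bk for zk, bk in zip(z, beta_j))
        # delta*log(lambda) + (1-delta)*log(1-lambda) = delta*eta - log(1+e^eta)
        total += delta * eta - math.log1p(math.exp(eta))
    return total


# Toy data (hypothetical): (X_i, J_i, Z_i), with J_i = 0 meaning censored.
records = [(3, 1, [0.2]), (2, 2, [0.9]), (3, 0, [0.5])]
rows = expand(records, cause=1)
print(len(rows), sum(delta for _, delta, _ in rows))
print(collapsed_loglik(rows, alpha_j=[-1.0, -1.0, -1.0], beta_j=[0.5]))
```

Calling `expand(records, cause=2)` produces the same rows with the outcome column switched to |$\delta _{2it}$|, which is why the M models can be fitted separately.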

2.3 The proposed approach

When applying penalized regression or screening analysis (ie, performing separate regression for each covariate) with the above collapsed log-likelihoods, it is necessary to estimate both |$(\alpha _{j1},\ldots ,\alpha _{jd})$| and |${\boldsymbol {\beta }}_j$| for each cause j, rather than only |${\boldsymbol {\beta }}_j$|⁠. Our proposed procedure separates the estimation of |$(\alpha _{j1},\ldots ,\alpha _{jd})$| and |${\boldsymbol {\beta }}_j$| within each cause. This separation allows for focusing solely on estimating |${\boldsymbol {\beta }}_j$| during the penalized regression or screening processes. Subsequently, |$(\alpha _{j1},\ldots ,\alpha _{jd})$| is consistently estimated using new estimating equations.

To separate the estimation of |$(\alpha _{j1},\ldots ,\alpha _{jd})$| and |${\boldsymbol {\beta }}_j$| within each cause, we adopt the conditional-logistic regression approach (Cox, 2018; Gail et al., 1981). This involves analyzing the expanded dataset. Let |$\mathcal {N}_t$| be the set of all dummy observations with |$\widetilde{X}$| equal to t (see Table S1 of SM). Equation (3) is replaced by a likelihood based on conditional-logistic regression, which stratifies the expanded dataset by |$\widetilde{X}$| and conditions on the number of observed events within each stratum, |$\sum _{i \in \mathcal {N}_t} \delta _{jit}$|. Specifically, define |${\bf d}_{jt}$| as a vector of 0s and 1s with a length equal to the cardinality of |$\mathcal {N}_t$|, where |$d_{jit}$| denotes its components. Also, let |$\mathcal {S}_{jt}$| be the set of all possible vectors |${\bf d}_{jt}$| such that |$\sum _{i \in \mathcal {N}_t }d_{jit}=\sum _{i \in \mathcal {N}_t}\delta _{jit}$|. Then, the conditional likelihoods of the expanded data, stratified by |$\widetilde{X}$| and given |$\sum _{i \in \mathcal {N}_t} \delta _{jit}$|, |$t=1,\ldots ,d$|, are given by

$$\begin{eqnarray} L_j^{\mathcal {C}}({\boldsymbol {\beta }}_j) &=& \prod _{t=1}^{d} \frac{\exp (\sum _{i \in \mathcal {N}_t} \delta _{jit} {\bf Z}_i^T {\boldsymbol {\beta }}_j)}{\sum _{{\bf d}_{jt} \in \mathcal {S}_{jt}} \exp (\sum _{i \in \mathcal {N}_t} d_{jit} {\bf Z}_i^T {\boldsymbol {\beta }}_j)} \\ && j=1,\ldots ,M \, . \end{eqnarray}$$

(4)

The estimators |$\widehat{{\boldsymbol {\beta }}}_j$| are the values of |${{\boldsymbol {\beta }}}_j$| that maximize the conditional likelihoods. Clearly, |$\exp (\alpha _{jt})$| in the numerator and denominator, within each j and t, is canceled out.
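The cancellation of |$\exp (\alpha _{jt})$| can be checked numerically on a small stratum. The sketch below is illustrative only: the function names and toy inputs are hypothetical, and the denominator enumerates |$\mathcal {S}_{jt}$| by brute force, which is feasible only for tiny strata. The first function evaluates one factor of Eq. (4); the second computes the same conditional probability from the full logistic model with an intercept.

```python
import math
from itertools import combinations


def stratum_conditional_prob(z_list, delta, beta):
    """One factor of Eq. (4): exact conditional likelihood of one stratum.

    z_list: covariate vectors of the rows in N_t.
    delta:  observed 0/1 outcomes delta_{jit} of those rows.
    beta:   coefficient vector beta_j.
    """
    score = [sum(zk * bk for zk, bk in zip(z, beta)) for z in z_list]
    n_events = sum(delta)
    num = math.exp(sum(s for s, dl in zip(score, delta) if dl == 1))
    # Sum over all 0/1 vectors with the same number of events (the set S_jt).
    den = sum(math.exp(sum(score[i] for i in idx))
              for idx in combinations(range(len(score)), n_events))
    return num / den


def stratum_prob_with_alpha(z_list, delta, beta, alpha_t):
    """Same conditional probability, computed with the intercept alpha_t.

    Divides Pr(observed outcome vector) by Pr(observed number of events)
    under independent logistic outcomes with linear predictor alpha_t + z'beta.
    """
    eta = [alpha_t + sum(zk * bk for zk, bk in zip(z, beta)) for z in z_list]
    n, m = len(eta), sum(delta)
    norm = math.prod(1.0 + math.exp(e) for e in eta)

    def joint(events):
        # prod_i lambda_i^{d_i} (1 - lambda_i)^{1 - d_i} for a given 0/1 vector
        return math.exp(sum(eta[i] for i in events)) / norm

    observed = joint([i for i in range(n) if delta[i] == 1])
    total = sum(joint(idx) for idx in combinations(range(n), m))
    return observed / total


# Toy stratum (hypothetical values).
z_list = [[0.2], [0.9], [0.5], [0.1]]
delta = [0, 1, 0, 1]
beta = [-0.7]
print(stratum_conditional_prob(z_list, delta, beta))
print(stratum_prob_with_alpha(z_list, delta, beta, alpha_t=-2.0))
print(stratum_prob_with_alpha(z_list, delta, beta, alpha_t=1.5))
```

The three printed values agree up to floating-point error, whatever value is passed for `alpha_t`.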

Equation (4) resembles the partial likelihood from a Cox regression model when ties are present (see, eg, Eq. (8.4.3) of Klein and Moeschberger (2003)), enabling the use of standard Cox-model routines for estimating |${\boldsymbol {\beta }}_j$|, |$j=1,\ldots ,M$|. In R, the clogit function employs this strategy by creating the necessary dummy variables and strata, then calling coxph. The function supports the exact conditional likelihood as well as common tie approximations such as Breslow and Efron. Using an available Cox-model routine to maximize Eq. (4) is only a computational device; the assumed model remains Eq. (1).

Leveraging the estimators |$\widehat{{\boldsymbol {\beta }}}_j$|⁠, |$j=1,\ldots ,M$|⁠, we propose estimating |$\alpha _{jt}$|⁠, |$j=1,\ldots ,M$|⁠, |$t=1,\ldots ,d$|⁠, through a series of |$Md$| single-dimensional optimization algorithms applied to the original (ie, non-expanded) dataset such that for each |$(j,t)$|⁠,

$$\begin{eqnarray} \widehat{\alpha }_{jt} = \mbox{argmin}_{a} \left\lbrace \frac{1}{Y.(t)} \sum _{i=1}^n I(X_i \ge t)\frac{\exp (a+{\bf Z}_i^T\widehat{{\boldsymbol {\beta }}}_j)}{1+\exp (a+{\bf Z}_i^T\widehat{{\boldsymbol {\beta }}}_j)} - \frac{N_j(t)}{Y.(t)}\right\rbrace ^2 \, , \\ \end{eqnarray}$$

(5)

where |$Y.(t)=\sum _{i=1}^n I(X_i \ge t)$| and |$N_j(t)=\sum _{i=1}^n I(X_i = t, J_i=j)$|. Equation (5) minimizes the squared difference between the observed proportion of failures of type j at time t, ie, |$N_j(t)/Y.(t)$|, and the expected proportion of failures, as determined by Model (1) and |$\widehat{{\boldsymbol {\beta }}}_j$|. Since each |$\alpha _{jt}$| is estimated separately, standard optimization routines such as nlminb in R or scipy.optimize.minimize in Python are suitable.
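A dependency-free sketch of this one-dimensional step is given below. It is not the PyDTS implementation: instead of a general-purpose optimizer, it uses bisection, relying on the fact that the expected proportion in Eq. (5) is strictly increasing in a, so the squared difference is minimized (at zero) where the expected and observed proportions coincide. The function name `estimate_alpha`, the search bracket, and the toy inputs are assumptions made for illustration.

```python
import math


def expit(x):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-x))


def estimate_alpha(z_at_risk, beta_hat, n_events, lo=-30.0, hi=30.0, tol=1e-10):
    """Solve Eq. (5) for a single (j, t) pair by bisection.

    z_at_risk: covariate vectors of the subjects with X_i >= t.
    beta_hat:  estimated coefficients of cause j.
    n_events:  N_j(t), the number of type-j events observed at time t.
    lo, hi:    assumed bracket for the solution (hypothetical defaults).
    """
    y = len(z_at_risk)  # Y.(t)
    target = n_events / y
    if not 0.0 < target < 1.0:
        raise ValueError("no finite minimizer when N_j(t)/Y.(t) is 0 or 1")
    lin = [sum(zk * bk for zk, bk in zip(z, beta_hat)) for z in z_at_risk]

    def expected(a):
        # Model-based expected proportion of type-j failures at time t.
        return sum(expit(a + l) for l in lin) / y

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if expected(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Toy risk set (hypothetical): four subjects at risk, one type-j event.
z_at_risk = [[0.1], [0.5], [0.9], [0.3]]
alpha_hat = estimate_alpha(z_at_risk, beta_hat=[-0.7], n_events=1)
print(alpha_hat)
```

With all coefficients equal to zero, the solution reduces to the logit of the observed proportion, |$\log [N_j(t)/\lbrace Y.(t)-N_j(t)\rbrace ]$|, which gives a simple sanity check.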

In summary, the proposed estimation procedure consists of the following 2 steps:

1. Using the expanded dataset, estimate each |${\boldsymbol {\beta }}_j$| individually, by maximizing Eq. (4) using a stratified Cox routine, such as the clogit function in the survival R package, and get |$\widehat{{\boldsymbol {\beta }}}_j$|, |$j = 1,\ldots , M$|.
2. Using |$\widehat{{\boldsymbol {\beta }}}_j$|, |$j = 1,\ldots , M$|, and the non-expanded dataset, estimate each |$\alpha _{jt}$|, |$j = 1,\ldots ,M$|, |$t=1,\ldots ,d$|, separately, by Eq. (5).

The simulation results in Section 3 show that the above two-step procedure performs well in terms of bias and provides similar standard errors to those of Lee et al. (2018).

The consistency and asymptotic normality of each |$\widehat{{\boldsymbol {\beta }}}_j$|, |$j=1,\ldots ,M$|, follow from an argument similar to that of Lee et al. (2018). Namely, due to the Markov property, which implies conditional independence of the binary variables, the properties of the estimators and of the naive variance estimators from the conditional-logistic regression approach above, which assumes independence, remain valid as |$n \rightarrow \infty$| and under finite fixed values of d and M. The consistency and asymptotic normality of |$\widehat{\alpha }_{jt}$| are derived in Web Appendix B.

The proposed two-step estimation procedure can easily handle covariates or coefficients that change over time, |${\bf Z}(t)$| and |${\boldsymbol {\beta }}_j(t)$|, respectively. Similarly to continuous survival time, time-dependent covariates are coded by breaking the individual’s time into multiple time intervals, with one row of data for each interval. Hence, combining this data expansion step with the expansion described in Table S1 is straightforward. For time-dependent coefficients, |${\boldsymbol {\beta }}_j(t)$|, Eq. (4) is replaced by |$L_j^{\mathcal {C}}({\boldsymbol {\beta }}_j(t)) = \frac{\exp \lbrace \sum _{i \in \mathcal {N}_t} \delta _{jit} {\bf Z}_i^T {\boldsymbol {\beta }}_j(t)\rbrace }{\sum _{{\bf d}_{jt} \in \mathcal {S}_{jt}} \exp \lbrace \sum _{i \in \mathcal {N}_t} d_{jit} {\bf Z}_i^T {\boldsymbol {\beta }}_j(t)\rbrace }$| with |$j=1,\ldots ,M \, , \, t=1,\ldots ,d$|. Clearly, one can easily combine time-dependent covariates with time-dependent coefficients. Estimating |$\alpha _{jt}$| with time-dependent covariates or regression coefficients involves using |${\bf Z}(t)$| and |$\widehat{{\boldsymbol {\beta }}}_j(t)$| in the modified version of Eq. (5).

2.4 The utility of the proposed approach

Advancements in data collection technologies have greatly increased the number of potential predictors. Our method of separating the estimation of |${\boldsymbol {\beta }}_j$| from |$(\alpha _{j1},\ldots ,\alpha _{jd})$| is particularly useful in dimension reduction and model selection. Below are 2 examples demonstrating the effectiveness of our two-step estimation procedure.

Example 1: Regularized regression. Penalized regression (Hastie et al., 2009) methods place a constraint on the size of the regression coefficients. We propose to apply penalized regression methods in Lagrangian form based on Eq. (4) by minimizing

$$\begin{eqnarray} -\log L_j^{\mathcal {C}}({\boldsymbol {\beta }}_j) + \eta _j P({\boldsymbol {\beta }}_j) \quad j=1,\ldots ,M \, , \end{eqnarray}$$

(6)

where P is a penalty function and |$\eta _j>0$| is a shrinkage tuning parameter. For instance, in the |$l_1$| penalty employed by lasso, |$P({\boldsymbol {\beta }}_j)=\sum _{k=1}^p|\beta _{jk}|$|. In the case of |$l_2$| regularization for ridge regression, |$P({\boldsymbol {\beta }}_j)=\sum _{k=1}^p\beta ^2_{jk}$|. Elastic net, on the other hand, involves an additional set of tuning parameters to balance between lasso and ridge regression (see Hastie et al. (2009) for additional penalty functions). Based on the proposed approach, any routine for regularized Cox regression can be used for estimating |${\boldsymbol {\beta }}_j$|, |$j=1,\ldots ,M$|, based on (6) (eg, glmnet in R or CoxPHFitter of the lifelines package in Python). Finally, |$\alpha _{j1},\ldots ,\alpha _{jd}$| are estimated only once the regularization step is completed and models are selected. In contrast, penalized regression using the collapsed log-likelihood approach of Lee et al. (2018) requires minimizing |$-\log L_j(\alpha _{j1},\ldots ,\alpha _{jd},{\boldsymbol {\beta }}_j) + \eta _j P({\boldsymbol {\beta }}_j)$|, which necessitates estimating |$\alpha _{j1},\ldots ,\alpha _{jd}$| at every value of the tuning parameter.
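The structure of the objective in (6) can be made concrete with a short Python sketch. The function names are hypothetical, the quadratic `toy_neg_log_lik` is a stand-in for |$-\log L_j^{\mathcal {C}}$| used only to keep the example self-contained, and the elastic-net mixing convention shown here is one assumed parameterization; packages such as glmnet scale the ridge term differently.

```python
def lasso_penalty(beta):
    """l1 penalty: sum of absolute coefficients."""
    return sum(abs(b) for b in beta)


def ridge_penalty(beta):
    """l2 penalty: sum of squared coefficients."""
    return sum(b * b for b in beta)


def elastic_net_penalty(beta, l1_ratio):
    """Convex combination of lasso and ridge (assumed convention)."""
    return l1_ratio * lasso_penalty(beta) + (1.0 - l1_ratio) * ridge_penalty(beta)


def penalized_objective(neg_log_lik, beta, eta, penalty):
    """Objective (6): negative log-likelihood plus eta times the penalty."""
    return neg_log_lik(beta) + eta * penalty(beta)


def toy_neg_log_lik(beta):
    """Placeholder loss (NOT the conditional likelihood of Eq. (4))."""
    target = [0.5, -1.0, 0.0]
    return sum((b - t) ** 2 for b, t in zip(beta, target))


beta = [1.0, -2.0, 0.0]
print(lasso_penalty(beta), ridge_penalty(beta))
print(penalized_objective(toy_neg_log_lik, beta, eta=0.1, penalty=lasso_penalty))
```

Note that only |${\boldsymbol {\beta }}_j$| enters the objective; no baseline parameters appear, which is the practical benefit of basing (6) on the conditional likelihood.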

The tuning parameters |$\eta _j$|⁠, |$j=1,\ldots ,M$|⁠, control the amount of regularization and their values play a crucial role. In our Python package, PyDTS, the values of |$\eta _j$| are selected by K-fold cross validation while the criterion is to maximize the out-of-sample global area under the receiver operating characteristics curve (AUC). Web Appendix I provides the definitions and estimators of the area under the receiver operating characteristics curve and Brier score for discrete-survival data with competing risks and right censoring. This includes the cause-specific AUC and Brier score at each time t, |$\mbox{AUC}_j(t)$| and |$\mbox{BS}_{j}(t)$|⁠; integrated cause-specific AUC and Brier score, |$\mbox{AUC}_j$| and |$\mbox{BS}_{j}$|⁠; and global AUC and Brier score, |$\mbox{AUC}$| and |$\mbox{BS}$|⁠.

Example 2: Sure independent screening. Under ultra-high dimension settings, most of the regularized methods suffer from the curse of dimensionality, high variance, and overfitting (Fan et al., 2012; Hastie et al., 2009). To overcome these issues, the marginal screening technique, sure independent screening (SIS) has been shown to filter out many uninformative variables under an ordinary linear model with normal errors (Fan and Lv, 2008). Subsequently, penalized variable selection methods are often applied to the remaining variables. The key idea of the SIS procedure is to rank all predictors by using a utility measure between the response and each predictor and then to retain the top variables. The SIS procedure has been extended to various models and data types such as generalized linear models (Fan and Song, 2010), additive models (Fan et al., 2011), and Cox regression models (Fan et al., 2010; Saldana and Feng, 2018; Zhao and Li, 2012). We focus on SIS and SIS followed by lasso (SIS-L) (Fan et al., 2010; Saldana and Feng, 2018) within the proposed two-step procedure.

SIS involves fitting a marginal regression for each covariate by maximizing

$$\begin{eqnarray} L_j^{\mathcal {C}}(\beta _{jr}) \quad j=1,\ldots ,M, \quad r=1,\ldots ,p \end{eqnarray}$$

(7)

where |${\boldsymbol {\beta }}_j=(\beta _{j1},\ldots ,\beta _{jp})^T$|. The SIS procedure subsequently assesses the importance of features by ranking them according to the magnitude of their marginal regression coefficients. Then, the selected sets of variables are given by |$\widehat{\mathcal {M}}_{j,w_n} = \lbrace 1 \le k \le p \, : \, |\widehat{\beta }_{jk}| \ge w_n \rbrace$|, |$j=1,\ldots ,M$|, where |$w_n$| is a threshold value. We adopt the data-driven threshold of Saldana and Feng (2018). Given data of the form |$\lbrace X_i,\delta _i,J_i,{\bf Z}_i \, ; \, i=1,\dots ,n\rbrace$|, a random permutation |$\pi$| of |$\lbrace 1,\ldots ,n\rbrace$| is used to decouple |${\bf Z}_i$| and |$(X_i,\delta _i,J_i)$| so that the resulting data |$\lbrace X_i,\delta _i,J_i,{\bf Z}_{\pi (i)} \, ; \, i=1,\dots ,n \rbrace$| follow a model in which the covariates have no predictive power over the survival time of any event type. For the permuted data, we re-estimate the individual regression coefficients and get |$\widehat{\beta }^{*}_{jr}$|. The data-driven threshold is defined by |$w_n = \max _{1\le j \le M, 1\le k \le p}|\widehat{\beta }^{*}_{jk}|$|. For the SIS-L procedure, the lasso regularization is then added in the first step of our procedure, applied to the set of covariates selected by SIS. In contrast to (7), applying SIS or SIS-L with the collapsed log-likelihood approach requires maximizing |$L_j(\alpha _{j1},\ldots ,\alpha _{jd},\beta _{jr})$|, |$j=1,\ldots ,M$|, |$r=1,\ldots ,p$|, which involves estimating |$\alpha _{j1},\ldots ,\alpha _{jd}$|.
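The permutation-based threshold can be sketched as follows. This is an illustration under loud assumptions, not the PyDTS implementation: `sis_select` takes the marginal estimator as a user-supplied callable, and the `toy_marginal` used in the demonstration is a hypothetical difference-in-means stand-in, NOT a maximizer of Eq. (7). In practice the callable would fit the one-covariate conditional likelihood.

```python
import random


def sis_select(outcomes, covariates, fit_marginal, n_causes, seed=0):
    """Data-driven SIS with a permutation threshold.

    outcomes:     list of (X_i, delta_i, J_i) tuples.
    covariates:   list of covariate vectors Z_i, in the same order.
    fit_marginal: callable(outcomes, x_column, cause) -> marginal coefficient.
    Returns (selected, w_n); selected[j-1] holds the 0-based covariate
    indices retained for cause j.
    """
    n, p = len(covariates), len(covariates[0])

    def all_coefs(cov):
        return [[fit_marginal(outcomes, [row[r] for row in cov], j)
                 for r in range(p)]
                for j in range(1, n_causes + 1)]

    coefs = all_coefs(covariates)
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    permuted = [covariates[i] for i in perm]  # Z_{pi(i)} paired with outcome i
    null_coefs = all_coefs(permuted)
    w_n = max(abs(b) for row in null_coefs for b in row)
    selected = [[r for r in range(p) if abs(coefs[j][r]) >= w_n]
                for j in range(n_causes)]
    return selected, w_n


def toy_marginal(outcomes, x, cause):
    """Hypothetical stand-in: mean of x among type-`cause` events minus the rest."""
    ev = [xi for xi, (_, _, j) in zip(x, outcomes) if j == cause]
    rest = [xi for xi, (_, _, j) in zip(x, outcomes) if j != cause]
    return sum(ev) / len(ev) - sum(rest) / len(rest)


# Toy data (hypothetical): covariate 0 determines the event type, covariate 1 is noise.
outcomes = [(1 + i % 5, 1, 1 if i < 20 else 2) for i in range(40)]
covariates = [[1.0 if i < 20 else 0.0, (i * 7 % 10) / 10.0] for i in range(40)]
selected, w_n = sis_select(outcomes, covariates, toy_marginal, n_causes=2)
print(selected, w_n)
```

Only the covariates are permuted, so the outcome triples stay intact and the permuted fits mimic a setting with no covariate effect.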

3 SIMULATION STUDY

We evaluated our approach using a simulation study across 19 settings, detailed in Table S2 of the SM, and compared the results with Lee et al. (2018). The sampling process starts by selecting a vector of covariates |${\bf Z}$| for each individual. Based on the model, Eq. (1), the event type is sampled according to the true probabilities |$\Pr (J=j|{\bf Z})$|⁠. The event time is then sampled from |$\Pr (T=t|J=j,{\bf Z})=\Pr (T=t,J=j|{\bf Z})/\Pr (J=j|{\bf Z})$|⁠, detailed in Section 2.1. For simulation settings 1-10, covariates were drawn from a standard uniform distribution. Parameters for Settings 1-2 include |$\alpha _{1t} = -1.4 + 0.4 \log t$| and |$\alpha _{2t} = -1.3 + 0.4\log t$| for |$t=1,\ldots ,7$|⁠, with |${\boldsymbol {\beta }}_1 = -0.7 (\log 0.8, \log 3, \log 3, \log 2.5, \log 2)$|⁠, and |${\boldsymbol {\beta }}_{2} = -0.6 (\log 1, \log 3, \log 4, \log 3, \log 2)$|⁠. Censoring times followed a discrete uniform distribution with a probability of 0.02 for each |$t=1,\ldots ,7$|⁠. For Settings 3-4, parameters were set to |$\alpha _{1t} = -2.0 - 0.2 \log t$| and |$\alpha _{2t} = -2.2 - 0.2\log t$|⁠, |$t=1,\ldots ,30$|⁠, with |${\boldsymbol {\beta }}$| values the same as in Settings 1-2. Censoring times were sampled with a probability of 0.01 for each t.
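The sampling scheme for Settings 1-2 can be sketched in plain Python as follows. The parameter values are those stated above; everything else is an assumption made for illustration: the function names are hypothetical, a subject with no event by time d (probability |$S(d|{\bf Z})$|) is treated as administratively censored at d, and the censoring mass not assigned to |$t=1,\ldots ,7$| is interpreted as no censoring during follow-up.

```python
import math
import random

D = 7
ALPHA = [[-1.4 + 0.4 * math.log(t) for t in range(1, D + 1)],
         [-1.3 + 0.4 * math.log(t) for t in range(1, D + 1)]]
BETA = [[-0.7 * math.log(v) for v in (0.8, 3, 3, 2.5, 2)],
        [-0.6 * math.log(v) for v in (1, 3, 4, 3, 2)]]


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def sample_event(z, rng):
    """Sample (T, J): event type from Pr(J=j|z), then time from Pr(T=t|J=j,z)."""
    M = len(ALPHA)
    lin = [sum(zk * bk for zk, bk in zip(z, b)) for b in BETA]
    joint = [[0.0] * D for _ in range(M)]  # joint[j][t-1] = Pr(T=t, J=j+1 | z)
    surv = 1.0
    for t in range(D):
        lam = [expit(ALPHA[j][t] + lin[j]) for j in range(M)]
        for j in range(M):
            joint[j][t] = lam[j] * surv
        surv *= 1.0 - sum(lam)
    marginal = [sum(row) for row in joint]  # Pr(J = j | z)
    # Assumption: the leftover mass S(d | z) means no event within d periods.
    j = rng.choices(range(M + 1), weights=marginal + [surv])[0]
    if j == M:
        return D + 1, 0
    # Weights proportional to Pr(T = t | J = j, z).
    t = rng.choices(range(1, D + 1), weights=joint[j])[0]
    return t, j + 1


def sample_censoring(rng, prob=0.02):
    """Censoring time; D + 1 stands for 'not censored during follow-up'."""
    return rng.choices(range(1, D + 2), weights=[prob] * D + [1.0 - D * prob])[0]


def simulate_subject(rng):
    """Return the observed pair (X, J), with J = 0 for censored subjects."""
    z = [rng.random() for _ in range(5)]  # standard uniform covariates
    t, j = sample_event(z, rng)
    c = sample_censoring(rng)
    if j != 0 and t <= c:
        return t, j
    return min(c, D), 0


rng = random.Random(1)
sample = [simulate_subject(rng) for _ in range(200)]
print(sample[:5])
```

Sampling the event type first and the time second is equivalent to drawing directly from the joint distribution |$\Pr (T=t,J=j|{\bf Z})$|; the two-stage form simply mirrors the description above.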

Table 1 and Fig. 1 summarize the results of |${\boldsymbol {\beta }}_{j}$| and |$\alpha _{jt}$|⁠, respectively, for 2 competing risks. Results with other sample sizes and 3 competing risks are provided in Web Appendix D and Web Appendix E. Evidently, the method of Lee et al. (2018) and the proposed method perform similarly in terms of bias and standard errors. In addition, the empirical coverage rates of 95% Wald-type confidence intervals for each regression coefficient, based on the proposed approach, are reasonably close to 95%.

FIGURE 1

Simulation results of two competing events. Results of |${\alpha }_{jt}$|. Each panel is based on a different sample size (250, 500, 5000, and 20 000). Numbers of observed events are shown in red and brown bars for event types |$j=1$| and |$j=2$|, respectively. True values and means of estimates are in blue and green for |$j=1$| and |$j=2$|, respectively. True values are shown as dashed lines; means of estimates based on Lee et al. (2018) and on the proposed two-step approach are denoted by circles and diamonds, respectively.

TABLE 1

Simulation results of two competing events. Results of Lee et al. (2018) include mean and estimated standard error (Est SE).

		True	Lee et al. (2018)		Two-step
n	\|$\beta _{jk}$\|	Value	Mean	Est SE	Mean	Est SE	Emp SE	CR
250	\|$\beta _{11}$\|	0.156	0.138	0.390	0.137	0.389	0.375	0.965
	\|$\beta _{12}$\|	−0.769	−0.751	0.395	−0.745	0.393	0.399	0.945
	\|$\beta _{13}$\|	−0.769	−0.817	0.395	−0.811	0.393	0.378	0.965
	\|$\beta _{14}$\|	−0.641	−0.642	0.395	−0.637	0.393	0.409	0.950
	\|$\beta _{15}$\|	−0.485	−0.496	0.393	−0.492	0.391	0.425	0.925
	\|$\beta _{21}$\|	0.000	−0.002	0.380	−0.002	0.378	0.357	0.960
	\|$\beta _{22}$\|	−0.659	−0.704	0.384	−0.698	0.383	0.394	0.950
	\|$\beta _{23}$\|	−0.832	−0.849	0.385	−0.842	0.383	0.378	0.955
	\|$\beta _{24}$\|	−0.659	−0.675	0.384	−0.669	0.382	0.406	0.945
	\|$\beta _{25}$\|	−0.416	−0.451	0.382	−0.447	0.381	0.402	0.940
500	\|$\beta _{11}$\|	0.156	0.133	0.273	0.132	0.273	0.270	0.925
	\|$\beta _{12}$\|	−0.769	−0.795	0.276	−0.791	0.276	0.295	0.945
	\|$\beta _{13}$\|	−0.769	−0.815	0.278	−0.812	0.277	0.294	0.945
	\|$\beta _{14}$\|	−0.641	−0.642	0.275	−0.640	0.275	0.260	0.965
	\|$\beta _{15}$\|	−0.485	−0.472	0.274	−0.470	0.273	0.258	0.975
	\|$\beta _{21}$\|	0.000	0.005	0.265	0.005	0.265	0.254	0.955
	\|$\beta _{22}$\|	−0.659	−0.681	0.268	−0.678	0.267	0.277	0.925
	\|$\beta _{23}$\|	−0.832	−0.855	0.269	−0.852	0.269	0.268	0.950
	\|$\beta _{24}$\|	−0.659	−0.634	0.267	−0.631	0.267	0.274	0.940
	\|$\beta _{25}$\|	−0.416	−0.415	0.266	−0.414	0.265	0.272	0.940
5000	\|$\beta _{11}$\|	0.223	0.227	0.094	0.225	0.093	0.104	0.940
	\|$\beta _{12}$\|	−1.099	−1.093	0.096	−1.082	0.095	0.104	0.920
	\|$\beta _{13}$\|	−1.099	−1.102	0.096	−1.090	0.095	0.105	0.935
	\|$\beta _{14}$\|	−0.916	−0.914	0.095	−0.904	0.094	0.092	0.955
	\|$\beta _{15}$\|	−0.693	−0.701	0.095	−0.694	0.094	0.099	0.940
	\|$\beta _{21}$\|	−0.000	0.004	0.121	0.004	0.120	0.119	0.945
	\|$\beta _{22}$\|	−1.099	−1.091	0.124	−1.083	0.123	0.129	0.925
	\|$\beta _{23}$\|	−1.386	−1.402	0.125	−1.393	0.125	0.137	0.920
	\|$\beta _{24}$\|	−1.099	−1.109	0.124	−1.101	0.123	0.135	0.925
	\|$\beta _{25}$\|	−0.693	−0.704	0.122	−0.698	0.121	0.120	0.945
20 000	\|$\beta _{11}$\|	0.223	0.220	0.047	0.217	0.047	0.046	0.935
	\|$\beta _{12}$\|	−1.099	−1.099	0.048	−1.088	0.048	0.044	0.965
	\|$\beta _{13}$\|	−1.099	−1.098	0.048	−1.087	0.048	0.046	0.940
	\|$\beta _{14}$\|	−0.916	−0.920	0.048	−0.910	0.047	0.041	0.980
	\|$\beta _{15}$\|	−0.693	−0.690	0.047	−0.682	0.047	0.046	0.945
	\|$\beta _{21}$\|	−0.000	0.003	0.060	0.003	0.060	0.065	0.930
	\|$\beta _{22}$\|	−1.099	−1.095	0.062	−1.088	0.061	0.066	0.940
	\|$\beta _{23}$\|	−1.386	−1.394	0.063	−1.385	0.062	0.057	0.980
	\|$\beta _{24}$\|	−1.099	−1.096	0.062	−1.089	0.061	0.061	0.950
	\|$\beta _{25}$\|	−0.693	−0.700	0.061	−0.695	0.061	0.056	0.970

Results of the proposed two-step approach include mean, estimated SE, empirical SE (Emp SE), and empirical coverage rate (CR) of 95% Wald-type confidence interval.

The aim of Settings 11-16 is to showcase how lasso regularization is integrated into our two-step procedure for feature selection. In Settings 11-13, |$p=100$| covariates were considered, of which only 5 had non-zero coefficients. Two settings of zero-mean normally distributed covariates were considered: (i) independent covariates, each with variance 0.4; (ii) as in (i), except that the following covariances were set to non-zero values: |$Cov(Z_1, Z_9) = 0.1$|, |$Cov(Z_2, Z_{10}) = 0.3$|, |$Cov(Z_4, Z_8) = -0.3$|, and |$Cov(Z_5, Z_{12}) = -0.1$|. To obtain appropriate survival probabilities based on Eq. (1), covariates were truncated to lie within |$[-1.5,1.5]$|. The model parameters were set to |$\alpha _{1t} = -3.4 - 0.1 \log t$|, |$\alpha _{2t} = -3.4 - 0.2\log t$|, |$t = 1, \ldots , 15$|. The first 5 components of |${\boldsymbol {\beta }}_1$| and |${\boldsymbol {\beta }}_2$| were set to |$(1.2,1.5,-1,-0.3,-1.2)$| and |$(-1.2,1,1,-1,1.4)$|, respectively, and the remaining coefficients were set to zero.
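The covariate design described above can be sketched as follows. The sketch interprets "truncated" as clipping values to the interval, which is an assumption (resampling out-of-range draws would be another reading), and the function name and argument names are introduced here for illustration.

```python
import numpy as np

def simulate_covariates(n, p=100, var=0.4, correlated=False, bound=1.5, rng=None):
    """Zero-mean normal covariates with variance `var`, optionally with the
    four non-zero covariances of setting (ii), clipped to [-bound, bound]."""
    rng = np.random.default_rng() if rng is None else rng
    cov = np.eye(p) * var
    if correlated:
        # 1-based covariate indices, as in the text.
        pairs = {(1, 9): 0.1, (2, 10): 0.3, (4, 8): -0.3, (5, 12): -0.1}
        for (a, b), c in pairs.items():
            cov[a - 1, b - 1] = cov[b - 1, a - 1] = c
    Z = rng.multivariate_normal(np.zeros(p), cov, size=n)
    return np.clip(Z, -bound, bound), cov   # clipping is an assumed reading

# Sparse true coefficient vectors: 5 non-zero entries followed by zeros.
p = 100
beta1 = np.zeros(p)
beta1[:5] = [1.2, 1.5, -1.0, -0.3, -1.2]
beta2 = np.zeros(p)
beta2[:5] = [-1.2, 1.0, 1.0, -1.0, 1.4]

rng = np.random.default_rng(0)
Z, cov = simulate_covariates(500, p=p, correlated=True, rng=rng)
```

Each modified covariance involves a distinct pair of covariates and is smaller in magnitude than the common variance of 0.4, so the covariance matrix remains positive definite.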

Based on one simulated dataset of Setting 11 (see Figure S5 of the SM) and the selected values of |$\eta _j$|⁠, the means and standard deviations (SD) based on the 5-fold integrated cause-specific |$\widehat{\mbox{AUC}}_j$| were |$\widehat{\mbox{AUC}}_1=0.796$| (SD = 0.007) and |$\widehat{\mbox{AUC}}_2=0.803$| (SD = 0.007), with a mean global |$\widehat{\mbox{AUC}}=0.8$| (SD = 0.003). The mean global AUC of the non-regularized procedure was |$\widetilde{\mbox{AUC}}=0.795$| (SD = 0.002). Looking at this specific example, we observe a substantial reduction in the number of covariates selected by the lasso penalty, without a significant change in the discrimination performance as measured by the AUC. The mean integrated cause-specific Brier Scores were |$\widehat{\mbox{BS}}_1=0.045$| (SD = 0.002) and |$\widehat{\mbox{BS}}_2=0.044$| (SD = 0.003), with a mean global Brier Score |$\widehat{\mbox{BS}}=0.044$| (SD = 0.002). Similar results were observed for the one simulated dataset of Setting 12 (see Web Appendix F).

Setting 13 is similar to Setting 12, but with 100 repetitions. The mean numbers of true- and false-positive discoveries for each event type, |$\mbox{TP}_j$| and |$\mbox{FP}_j$|, |$j=1,2$|, under the selected values of |$\eta _j$| were |$\mbox{TP}_1=4.99$|, |$\mbox{FP}_1=0.01$|, |$\mbox{TP}_2=5$|, and |$\mbox{FP}_2=0$|. That is, the correct model was selected in all 100 repetitions, with a single exception for |$j=1$|. Similar results were observed with a smaller sample size of |$n=500$| (see Web Appendix F, Settings 14-16). Web Appendix C provides a detailed description of Settings 17-19, which demonstrate the excellent performance of integrating screening methods into the two-step procedure.
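True- and false-positive discoveries can be counted from an estimated coefficient vector as in the sketch below. The function name and the toy estimate are illustrative assumptions. The final lines are pure arithmetic showing that means of 4.99 and 0.01 are consistent with a single repetition, out of 100, in which one true covariate was missed and one null covariate was selected.

```python
import numpy as np

def tp_fp(beta_hat, beta_true, tol=0.0):
    """Count selected covariates that are truly active (TP) or truly null (FP)."""
    selected = np.abs(np.asarray(beta_hat)) > tol
    active = np.asarray(beta_true) != 0
    return int(np.sum(selected & active)), int(np.sum(selected & ~active))

p = 100
beta_true = np.zeros(p)
beta_true[:5] = [1.2, 1.5, -1.0, -0.3, -1.2]

# Toy lasso-type estimate: misses the weakest signal and picks up one null.
beta_hat = np.zeros(p)
beta_hat[:5] = [1.1, 1.4, -0.9, 0.0, -1.1]
beta_hat[42] = 0.02
tp, fp = tp_fp(beta_hat, beta_true)

# Arithmetic check: 99 perfect repetitions plus one with (TP, FP) = (4, 1).
mean_tp = float(np.mean([5] * 99 + [4]))
mean_fp = float(np.mean([0] * 99 + [1]))
```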

4 MIMIC DATA ANALYSIS - LENGTH OF HOSPITAL STAY IN ICU

Although the MIMIC dataset records admission and discharge times to the minute, it is advisable to use daily units for survival analysis because times within a day are more influenced by hospital procedures than by patients’ health status. The analysis includes 25 170 ICU admissions with 3 competing events: discharge to home (⁠|$J=1$|⁠, 69.0%), transfer to another medical facility (⁠|$J=2$|⁠, 21.4%), and in-hospital death (⁠|$J=3$|⁠, 6.1%). The analysis is restricted to admissions classified as “emergency,” with a distinction between direct emergency and emergency ward (EW). Emergency admission history is included by two covariates: the number of previous emergency admissions (admissions number), and a dummy variable indicating whether the previous admission ended within 30 days prior to the last one (recent admission). Additional covariates included in the analysis are: year of admission (available in resolution of 3 years); standardized age at admission; a binary variable indicating night admission (between 20:00 and 8:00); ethnicity (Asian, Black, Hispanic, White, Other); and lab test results (normal or abnormal) performed upon arrival and with results within the first 24 hours of admission. Note that it is common to include initial laboratory test results when predicting hospital length of stay (Almeida et al., 2024). The analysis includes 36 covariates in total. Web Appendix G summarizes the covariates’ distribution.

Three methods were considered: Lee et al. (2018), the proposed two-step approach, and the proposed two-step approach with lasso. For the latter, the values of |$\eta _j$|, |$j=1,2,3$|, were selected by 4-fold cross-validation, maximizing the out-of-sample global AUC. The value of |$\log \eta _j$| was allowed to vary from −12 to −1, in steps of 1. The selected values of |$\log \eta _j$|, |$j=1,2,3$|, were −5, −9, and −11, respectively. The results of the 3 procedures are presented in Tables 2–4 and Figure 2. As expected, the parameter estimates were similar between the approach of Lee et al. (2018) and the two-step procedure without regularization. Computation times were also similar, with estimation times of 29.5 seconds and 22.1 seconds, respectively.
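The tuning scheme (a grid of penalty values on the log scale, scored by K-fold cross-validation) can be sketched generically as below. The sketch tunes a single penalty, and `fit_and_score` is a hypothetical user-supplied callable that fits the penalized model on the training indices and returns an out-of-sample score such as an AUC. Neither the function names nor the toy scoring rule come from the paper or from any package.

```python
import numpy as np

def kfold_indices(n, k, rng):
    """Split a random permutation of 0..n-1 into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)

def select_log_eta(n, fit_and_score, k=4, grid=range(-12, 0), rng=None):
    """Return the grid value with the highest mean out-of-sample score."""
    rng = np.random.default_rng() if rng is None else rng
    folds = kfold_indices(n, k, rng)
    mean_scores = {}
    for log_eta in grid:                       # -12, -11, ..., -1
        scores = []
        for i in range(k):
            test_idx = folds[i]
            train_idx = np.concatenate([folds[m] for m in range(k) if m != i])
            scores.append(fit_and_score(train_idx, test_idx, log_eta))
        mean_scores[log_eta] = float(np.mean(scores))
    best = max(mean_scores, key=mean_scores.get)
    return best, mean_scores

# Toy score (illustrative only): peaks at log_eta = -5 regardless of the split.
def toy_score(train_idx, test_idx, log_eta):
    return 0.8 - 0.001 * (log_eta + 5) ** 2

best, mean_scores = select_log_eta(1000, toy_score, k=4,
                                   rng=np.random.default_rng(0))
```

Reusing the same folds for every grid value keeps the comparison between penalty values paired, so differences in mean score are not driven by differences in the splits.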

FIGURE 2

MIMIC dataset: LOS analysis. Regularized regression with 4-fold CV. The selected values of |$\eta _j$| are shown as dashed-dotted lines in panels (A-F). (A-C) Number of non-zero coefficients for |$j=1,2,3$|. (D-F) The estimated coefficients as a function of |$\eta _j$|, |$j=1,2,3$|. (G-I) Mean (and SD bars) of the 4-fold |$\widehat{\mbox{AUC}}_j(t)$|, |$j=1,2,3$|, for the selected values |$\log \eta _1=-5$|, |$\log \eta _2=-9$|, and |$\log \eta _3=-11$|. The number of observed events of each type is shown by bars. (J) Estimates of |$\alpha _{jt}$| by the method of Lee et al. (2018) (circles), the proposed two-step approach with no regularization (stars), and the proposed approach with lasso (left-pointing triangles). Numbers of observed events are shown in blue bars for home discharge (|$j=1$|), in green bars for further treatment (|$j=2$|), and in red bars for in-hospital death (|$j=3$|). Lasso estimates are based on |$\log \eta _1=-5$|, |$\log \eta _2=-9$|, and |$\log \eta _3=-11$|.

TABLE 2

MIMIC dataset—LOS analysis: estimated regression coefficients of event type discharge to home, |$J=1$|⁠.

		Lee et al. (2018)	Two-step	Two-step and lasso
		Estimate (SE)	Estimate (SE)	Estimate (SE)
Admissions number	2	0.000 (0.024)	0.003 (0.022)	0.000 (0.000)
	3+	−0.032 (0.023)	−0.027 (0.022)	0.000 (0.000)
Anion gap	Abnormal	−0.137 (0.032)	−0.128 (0.030)	0.000 (0.000)
Bicarbonate	Abnormal	−0.208 (0.021)	−0.194 (0.020)	−0.119 (0.019)
Calcium total	Abnormal	−0.291 (0.020)	−0.270 (0.019)	−0.190 (0.018)
Chloride	Abnormal	−0.148 (0.024)	−0.137 (0.023)	−0.071 (0.021)
Creatinine	Abnormal	−0.103 (0.024)	−0.098 (0.023)	−0.072 (0.021)
Direct emergency	Yes	−0.011 (0.026)	−0.014 (0.024)	0.000 (0.000)
Ethnicity	Black	0.006 (0.046)	0.009 (0.042)	0.000 (0.000)
	Hispanic	0.132 (0.053)	0.120 (0.048)	0.000 (0.000)
	Other	−0.162 (0.051)	−0.146 (0.047)	0.000 (0.000)
	White	−0.031 (0.041)	−0.026 (0.038)	0.000 (0.000)
Glucose	Abnormal	−0.215 (0.018)	−0.192 (0.016)	−0.088 (0.016)
Hematocrit	Abnormal	−0.042 (0.032)	−0.037 (0.029)	−0.042 (0.029)
Hemoglobin	Abnormal	−0.080 (0.033)	−0.071 (0.030)	−0.081 (0.030)
Insurance	Medicare	0.138 (0.039)	0.125 (0.036)	0.000 (0.000)
	Other	0.219 (0.036)	0.200 (0.033)	0.030 (0.016)
MCH	Abnormal	−0.002 (0.023)	−0.002 (0.022)	0.000 (0.000)
MCHC	Abnormal	−0.128 (0.019)	−0.116 (0.018)	−0.003 (0.017)
MCV	Abnormal	−0.048 (0.026)	−0.045 (0.024)	0.000 (0.000)
Magnesium	Abnormal	−0.080 (0.030)	−0.074 (0.028)	0.000 (0.000)
Marital status	Married	0.224 (0.032)	0.205 (0.030)	0.093 (0.016)
	Single	−0.087 (0.033)	−0.079 (0.031)	0.000 (0.000)
	Widowed	0.026 (0.040)	0.020 (0.037)	0.000 (0.000)
Night admission	Yes	0.081 (0.017)	0.075 (0.016)	0.000 (0.000)
Phosphate	Abnormal	−0.052 (0.019)	−0.048 (0.018)	0.000 (0.000)
Platelet count	Abnormal	−0.068 (0.019)	−0.062 (0.018)	0.000 (0.000)
Potassium	Abnormal	−0.103 (0.032)	−0.095 (0.030)	0.000 (0.000)
RDW	Abnormal	−0.327 (0.021)	−0.308 (0.020)	−0.271 (0.019)
Recent admission	Yes	−0.262 (0.035)	−0.247 (0.033)	−0.001 (0.027)
Red blood cells	Abnormal	−0.089 (0.027)	−0.078 (0.024)	−0.024 (0.025)
Sex	Female	−0.007 (0.018)	−0.006 (0.016)	0.000 (0.000)
Sodium	Abnormal	−0.312 (0.030)	−0.297 (0.029)	−0.142 (0.026)
Standardized age		−0.260 (0.011)	−0.234 (0.010)	−0.162 (0.009)
Urea nitrogen	Abnormal	−0.148 (0.022)	−0.139 (0.020)	−0.136 (0.020)
White blood cells	Abnormal	−0.276 (0.018)	−0.252 (0.016)	−0.159 (0.016)

TABLE 3

MIMIC dataset—LOS analysis: estimated regression coefficients of event type discharged to another facility, |$J=2$|⁠.

		Lee et al. (2018)	Two-step	Two-step and lasso
		Estimate (SE)	Estimate (SE)	Estimate (SE)
Admissions number	2	0.108 (0.041)	0.107 (0.040)	0.087 (0.038)
	3+	0.194 (0.037)	0.190 (0.036)	0.169 (0.034)
Anion gap	Abnormal	−0.006 (0.048)	−0.006 (0.047)	0.000 (0.002)
Bicarbonate	Abnormal	−0.121 (0.033)	−0.117 (0.032)	−0.110 (0.032)
Calcium total	Abnormal	−0.098 (0.031)	−0.094 (0.031)	−0.088 (0.030)
Chloride	Abnormal	0.016 (0.036)	0.015 (0.035)	0.000 (0.002)
Creatinine	Abnormal	−0.199 (0.036)	−0.191 (0.035)	−0.173 (0.035)
Direct emergency	Yes	−0.373 (0.052)	−0.363 (0.050)	−0.345 (0.050)
Ethnicity	Black	0.084 (0.090)	0.079 (0.088)	0.028 (0.086)
	Hispanic	−0.068 (0.111)	−0.070 (0.108)	−0.088 (0.106)
	Other	0.026 (0.099)	0.022 (0.097)	−0.006 (0.095)
	White	0.144 (0.082)	0.138 (0.081)	0.094 (0.079)
Glucose	Abnormal	−0.138 (0.031)	−0.132 (0.030)	−0.126 (0.030)
Hematocrit	Abnormal	0.038 (0.057)	0.039 (0.055)	0.032 (0.055)
Hemoglobin	Abnormal	0.018 (0.062)	0.015 (0.060)	0.005 (0.059)
Insurance	Medicare	0.237 (0.075)	0.230 (0.074)	0.238 (0.073)
	Other	−0.094 (0.074)	−0.091 (0.072)	−0.081 (0.072)
MCH	Abnormal	0.042 (0.038)	0.040 (0.037)	0.019 (0.031)
MCHC	Abnormal	−0.010 (0.031)	−0.011 (0.030)	0.000 (0.003)
MCV	Abnormal	−0.020 (0.041)	−0.019 (0.039)	0.000 (0.003)
Magnesium	Abnormal	−0.039 (0.048)	−0.038 (0.047)	−0.025 (0.046)
Marital Status	Married	−0.254 (0.054)	−0.249 (0.053)	−0.262 (0.052)
	Single	0.209 (0.054)	0.200 (0.053)	0.176 (0.052)
	Widowed	0.175 (0.058)	0.163 (0.056)	0.149 (0.056)
Night admission	Yes	0.056 (0.029)	0.054 (0.028)	0.047 (0.028)
Phosphate	Abnormal	−0.042 (0.033)	−0.040 (0.032)	−0.034 (0.031)
Platelet count	Abnormal	−0.130 (0.032)	−0.125 (0.031)	−0.118 (0.031)
Potassium	Abnormal	0.042 (0.048)	0.042 (0.047)	0.023 (0.047)
RDW	Abnormal	−0.107 (0.033)	−0.104 (0.032)	−0.093 (0.031)
Recent admission	Yes	−0.021 (0.051)	−0.023 (0.049)	0.000 (0.004)
Red blood cells	Abnormal	0.083 (0.052)	0.079 (0.050)	0.073 (0.050)
Sex	Female	0.090 (0.031)	0.088 (0.030)	0.078 (0.030)
Sodium	Abnormal	−0.056 (0.042)	−0.056 (0.041)	−0.039 (0.038)
Standardized age		0.536 (0.021)	0.525 (0.021)	0.519 (0.021)
Urea nitrogen	Abnormal	0.100 (0.035)	0.095 (0.034)	0.077 (0.034)
White blood cells	Abnormal	−0.107 (0.029)	−0.103 (0.028)	−0.099 (0.028)

TABLE 4

MIMIC dataset, LOS analysis: estimated regression coefficients for event type in-hospital death, |$J=3$|.

		Lee et al. (2018)	Two-step	Two-step and lasso
		Estimate (SE)	Estimate (SE)	Estimate (SE)
Admissions number	2	0.147 (0.074)	0.147 (0.073)	0.140 (0.074)
	3+	0.142 (0.069)	0.140 (0.068)	0.134 (0.068)
Anion gap	Abnormal	0.582 (0.064)	0.573 (0.064)	0.571 (0.064)
Bicarbonate	Abnormal	0.543 (0.056)	0.537 (0.056)	0.535 (0.056)
Calcium total	Abnormal	0.204 (0.054)	0.204 (0.054)	0.203 (0.054)
Chloride	Abnormal	0.147 (0.059)	0.143 (0.058)	0.142 (0.058)
Creatinine	Abnormal	0.273 (0.067)	0.271 (0.067)	0.271 (0.067)
Direct emergency	Yes	−0.318 (0.096)	−0.311 (0.095)	−0.302 (0.095)
Ethnicity	Black	−0.236 (0.140)	−0.235 (0.139)	−0.203 (0.140)
	Hispanic	−0.395 (0.183)	−0.393 (0.181)	−0.351 (0.181)
	Other	0.145 (0.147)	0.133 (0.145)	0.155 (0.146)
	White	−0.156 (0.123)	−0.157 (0.122)	−0.130 (0.123)
Glucose	Abnormal	0.215 (0.064)	0.212 (0.063)	0.208 (0.063)
Hematocrit	Abnormal	−0.198 (0.108)	−0.194 (0.107)	−0.165 (0.108)
Hemoglobin	Abnormal	0.024 (0.122)	0.023 (0.121)	0.003 (0.121)
Insurance	Medicare	−0.224 (0.136)	−0.225 (0.135)	−0.171 (0.138)
	Other	−0.242 (0.133)	−0.240 (0.132)	−0.188 (0.135)
MCH	Abnormal	−0.066 (0.070)	−0.066 (0.069)	−0.057 (0.069)
MCHC	Abnormal	0.027 (0.056)	0.029 (0.055)	0.027 (0.055)
MCV	Abnormal	0.060 (0.072)	0.061 (0.071)	0.055 (0.071)
Magnesium	Abnormal	0.329 (0.073)	0.324 (0.072)	0.320 (0.072)
Marital status	Married	0.156 (0.102)	0.154 (0.101)	0.127 (0.061)
	Single	0.026 (0.107)	0.027 (0.106)	0.000 (0.008)
	Widowed	0.047 (0.115)	0.048 (0.114)	0.020 (0.084)
Night admission	Yes	−0.096 (0.053)	−0.093 (0.052)	−0.089 (0.052)
Phosphate	Abnormal	0.178 (0.056)	0.176 (0.055)	0.174 (0.055)
Platelet count	Abnormal	0.235 (0.054)	0.232 (0.054)	0.229 (0.054)
Potassium	Abnormal	0.227 (0.072)	0.221 (0.071)	0.221 (0.071)
RDW	Abnormal	0.492 (0.058)	0.486 (0.058)	0.483 (0.058)
Recent admission	Yes	0.250 (0.083)	0.242 (0.082)	0.242 (0.082)
Red blood cells	Abnormal	0.142 (0.105)	0.140 (0.104)	0.130 (0.104)
Sex	Female	−0.011 (0.057)	−0.008 (0.057)	−0.005 (0.057)
Sodium	Abnormal	0.276 (0.064)	0.270 (0.063)	0.268 (0.063)
Standardized age		0.580 (0.041)	0.574 (0.040)	0.568 (0.040)
Urea nitrogen	Abnormal	0.141 (0.070)	0.141 (0.070)	0.141 (0.070)
White blood cells	Abnormal	0.579 (0.056)	0.571 (0.056)	0.568 (0.055)

The global AUCs of the proposed approach without and with the lasso penalty were highly similar: |$\widehat{\mbox{AUC}}=0.649$| (SD = 0.003) and |$\widehat{\mbox{AUC}}=0.651$| (SD = 0.003), respectively. Adding lasso regularization reduced the number of predictors for each event type (see the last column of Tables 2–4), but the corresponding estimators of |$\alpha _{jt}$| remained highly similar.

The estimates for |$\mbox{AUC}_j(t)$| typically range from 0.5 to 0.8 for discharges to home or further treatment, and are higher for death within the first 3 days of hospitalization. The integrated cause-specific AUCs were |$\widehat{\mbox{AUC}}_1=0.642$| (SD = 0.002), |$\widehat{\mbox{AUC}}_2=0.655$| (SD = 0.012), and |$\widehat{\mbox{AUC}}_3=0.740$| (SD = 0.006), with a global |$\widehat{\mbox{AUC}}=0.651$| (SD = 0.003). The integrated cause-specific Brier Scores were |$\widehat{\mbox{BS}}_1=0.105$| (SD = 0.002), |$\widehat{\mbox{BS}}_2=0.042$| (SD = 0.001), and |$\widehat{\mbox{BS}}_3=0.010$| (SD = 0.001), with a global Brier Score of |$\widehat{\mbox{BS}}=0.085$| (SD = 0.001). Additional discussion of the results is provided in Web Appendix G.

5 DISCUSSION

This work provides a new estimation procedure for a semi-parametric logit-link survival model for discrete time with competing events. Our departure from Lee et al. (2018) is a simplification: the estimation procedures for |$\alpha _{jt}$| and |${\boldsymbol {\beta }}_j$| are separated. Our approach is valid under both the logit- and log-link functions; however, it does not hold under the complementary log-log model. Our current software uses the logit link.
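As a rough illustration of what separating the two estimation steps can look like, the sketch below assumes that an estimate of |${\boldsymbol {\beta }}_j$| is already available from a first step, and then solves a one-dimensional equation for a single |$\alpha _{jt}$|: the average fitted logit hazard among subjects still at risk at time |$t$| is matched to the observed proportion of type-|$j$| events at |$t$|. The matching criterion, the function names, the bisection solver, and the toy numbers are all assumptions made for illustration; this is not the PyDTS implementation.

```python
import math


def expit(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def solve_alpha(linear_predictors, n_events, lo=-30.0, hi=30.0, tol=1e-10):
    """Solve mean_i expit(alpha + lp_i) = n_events / n_at_risk for alpha.

    linear_predictors: hypothetical values of Z_i^T beta_hat_j for the
        subjects still at risk at time t.
    n_events: observed number of type-j events at time t.
    """
    n_at_risk = len(linear_predictors)
    target = n_events / n_at_risk
    if not 0.0 < target < 1.0:
        raise ValueError("need 0 < n_events < number at risk")

    def gap(alpha):
        fitted = sum(expit(alpha + lp) for lp in linear_predictors)
        return fitted / n_at_risk - target

    # gap() is strictly increasing in alpha, so plain bisection converges.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if gap(mid) > 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# Toy example: 8 subjects at risk at time t, 2 of whom have a type-j event.
lp_at_risk = [-0.4, 0.1, 0.3, -0.2, 0.8, 0.0, -0.6, 0.5]
alpha_hat = solve_alpha(lp_at_risk, n_events=2)
fitted_mean = sum(expit(alpha_hat + lp) for lp in lp_at_risk) / len(lp_at_risk)
print(round(fitted_mean, 6))  # 0.25, the observed event proportion 2/8
```

Because each |$\alpha _{jt}$| is obtained from its own scalar equation, the step can be repeated independently over all |$(j,t)$| pairs once |${\boldsymbol {\beta }}_j$| has been estimated.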

The hazard models considered in Tutz et al. (2016), Möst et al. (2016) and Schmid and Berger (2021) are of the form |$\lambda ^{*}_{j}(t|{\bf Z}) = \frac{\exp (\alpha ^{*}_{jt}+{\bf Z}^T {\boldsymbol {\beta }}^{*}_j)}{1+\sum _{j^{\prime }=1}^M\exp (\alpha ^{*}_{j^{\prime }t}+{\bf Z}^T {\boldsymbol {\beta }}^{*}_{j^{\prime }})}, \quad j=1,\ldots ,M.$| Namely, the hazard model |$\lambda ^{*}_{j}$| is a function not only of the parameters associated with the |$j$|th competing event but also of the parameters related to all other event types. In contrast, the hazard function |$\lambda _j$|, adopted by Allison (1982), Lee et al. (2018), Wu et al. (2022) and in this work, is a function only of the parameters of the |$j$|th competing event. Both models, |$\lambda _j$| and |$\lambda _j^{*}$|, are valid and were presented by Allison (1982). However, as discussed by Allison (1982), models in the spirit of |$\lambda _j$| provide a natural and direct analogy to the cause-specific hazard function in the context of continuous survival time. Because the discrete-time likelihood cannot be factored into separate components for each of the |$M$| types of events, Allison (1982) considered a more tractable formulation. In particular, he explored the generalization of the logistic model |$\lambda ^{*}_{j}$|, which was later adopted by Tutz et al. (2016), Möst et al. (2016) and Schmid and Berger (2021).
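To make the contrast between the two hazard forms concrete, the sketch below evaluates both at a single time point for |$M=2$| event types. It assumes the logit-link form |$\lambda _j(t|{\bf Z}) = \exp (\alpha _{jt}+{\bf Z}^T {\boldsymbol {\beta }}_j)/\lbrace 1+\exp (\alpha _{jt}+{\bf Z}^T {\boldsymbol {\beta }}_j)\rbrace$| for the cause-specific model. The function names and all parameter values are hypothetical and chosen only for illustration.

```python
import math


def linear_predictors(alphas, betas, z):
    """eta_j = alpha_jt + Z^T beta_j for each event type j at a fixed time t."""
    return [a + sum(b_k * z_k for b_k, z_k in zip(b, z))
            for a, b in zip(alphas, betas)]


def cause_specific_hazards(alphas, betas, z):
    """lambda_j: each hazard depends only on its own (alpha_jt, beta_j)."""
    return [1.0 / (1.0 + math.exp(-eta))
            for eta in linear_predictors(alphas, betas, z)]


def multinomial_hazards(alphas, betas, z):
    """lambda*_j: the shared denominator involves every event type."""
    exp_etas = [math.exp(eta) for eta in linear_predictors(alphas, betas, z)]
    denom = 1.0 + sum(exp_etas)
    return [e / denom for e in exp_etas]


# Hypothetical covariates and parameters for M = 2 event types.
z = [1.0, 2.0]
alphas = [-2.0, -3.0]
betas = [[0.5, -0.2], [0.1, 0.4]]
# Same as above, except that one coefficient of event type 2 is changed.
betas_shifted = [[0.5, -0.2], [0.1, 1.0]]

cs_base = cause_specific_hazards(alphas, betas, z)
cs_shift = cause_specific_hazards(alphas, betas_shifted, z)
mn_base = multinomial_hazards(alphas, betas, z)
mn_shift = multinomial_hazards(alphas, betas_shifted, z)

print(cs_base[0] == cs_shift[0])  # True: lambda_1 ignores type-2 parameters
print(mn_base[0] > mn_shift[0])   # True: lambda*_1 shrinks as type 2 grows
```

Changing a coefficient of event type 2 leaves |$\lambda _1$| untouched but alters |$\lambda ^{*}_1$| through the shared denominator, which is the distinction drawn in the paragraph above.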

In Web Appendix H, we show that although computation times for the two methods are comparable at lower values of |$d$|, our proposed method becomes more efficient as |$d$| increases. Furthermore, in tests on a system with 16 GB of RAM, the method of Lee et al. (2018) ran into memory errors at relatively low values of |$d$|, while our two-step procedure ran without any issues.

FUNDING

T.M. is supported by the Israeli Council for Higher Education (Vatat) fellowship in data science via the Technion; M.G.'s work was supported by the ISF 767/21 grant and the Malag competitive grant in data science (DS).

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

The estimation procedures and simulation study were implemented in Python using the PyDTS package (Meir et al., 2022). An example of our approach implemented in R is also available. The code is available at https://github.com/tomer1812/pydts/ and https://github.com/tomer1812/DiscreteTimeSurvivalPenalization. The MIMIC dataset is accessible at https://physionet.org/content/mimiciv/2.0/ and access is subject to credentialing.

REFERENCES

Adhikari, Fowler, Bhagwanjee and Rubenfeld (2010). Critical care and the global burden of critical illness in adults. The Lancet, 376, 1339–1346.
