A semiparametric quantile regression rank score test for zero-inflated data

TABLE 1

Type I error results at the significance threshold |$\alpha =0.05$|.

|$n=500$|	Predictor	ZIQ-SIR	ZIQRank	Quantile Single-index	ZINB	ZIP
Setting 1	|$x_1$|	0.056	0.074	0.090	0.880	0.904
	|$x_2$|	0.042	0.066	0.078	0.822	0.820
	|$x_3$|	0.054	0.064	0.090	0.902	0.888
	|$x_4$|	0.052	0.052	0.092	0.888	0.904
	|$x_5$|	0.046	0.062	0.082	0.906	0.888
Setting 2	|$x_2$|	0.044	0.082	0.088	0.850	0.834
	|$x_3$|	0.040	0.072	0.096	0.890	0.880
	|$x_4$|	0.044	0.066	0.314	0.908	0.896
	|$x_5$|	0.034	0.072	0.330	0.870	0.878
	|$x_2$|, |$x_3$|	0.052	0.064	0.094	0.970	0.946
	|$x_4$|, |$x_5$|	0.042	0.058	0.068	0.978	0.996

|$n=2000$|	Predictor	ZIQ-SIR	ZIQRank	Quantile Single-index	ZINB	ZIP
Setting 1	|$x_1$|	0.046	0.056	0.048	0.902	0.948
	|$x_2$|	0.046	0.062	0.062	0.856	0.986
	|$x_3$|	0.044	0.050	0.072	0.906	0.932
	|$x_4$|	0.050	0.054	0.042	0.866	0.954
	|$x_5$|	0.050	0.058	0.068	0.932	0.940
Setting 2	|$x_2$|	0.058	0.058	0.066	0.798	0.988
	|$x_3$|	0.046	0.058	0.110	0.896	0.916
	|$x_4$|	0.056	0.062	0.326	0.912	0.974
	|$x_5$|	0.056	0.040	0.248	0.876	0.984
	|$x_2$|, |$x_3$|	0.056	0.066	0.076	0.965	0.998
	|$x_4$|, |$x_5$|	0.056	0.054	0.062	0.894	0.918


When the sample size increases to |$n=2000$|, we use the asymptotic distribution of |$\mathcal {T}_\tau$| to obtain its |$p$|-value. The type I error of ZIQ-SIR remains controlled (Table 1), supporting the validity of the asymptotic distribution. Quantile Single-index and ZIQRank, by contrast, show type I error inflation at both sample sizes because of the limited flexibility of their models. With the larger sample size, the type I error of ZIQRank moves closer to the nominal level, as does that of Quantile Single-index under Setting 1. Under Setting 2, however, both methods remain inflated, most markedly Quantile Single-index when testing |$x_4$| and |$x_5$|, owing to the correlation among covariates and their model misspecification.
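Each entry in Table 1 is an empirical rejection rate: the fraction of simulation replicates whose |$p$|-value falls below |$\alpha$|. The same quantity computed under an alternative is the power reported in Table 2. The minimal Python sketch below computes that rate and, as a rough yardstick for whether a rate is "under control", a normal-approximation band around |$\alpha$|. The replicate count `n_rep = 500` is a hypothetical choice for illustration; the number of simulation replicates is not stated in this section.

```python
import math
import random


def rejection_rate(p_values, alpha=0.05):
    """Fraction of replicates rejecting H0 at level alpha.

    Under H0 this is the empirical type I error; under H1 it is the empirical power.
    """
    if not p_values:
        raise ValueError("need at least one p-value")
    return sum(p < alpha for p in p_values) / len(p_values)


def monte_carlo_band(alpha, n_rep, z=1.96):
    """Approximate 95% band for the rejection rate of an exact level-alpha test.

    Uses the normal approximation to Binomial(n_rep, alpha).
    """
    half_width = z * math.sqrt(alpha * (1 - alpha) / n_rep)
    return alpha - half_width, alpha + half_width


if __name__ == "__main__":
    random.seed(1)
    n_rep = 500  # hypothetical number of simulation replicates
    # Under H0, a valid test gives (approximately) Uniform(0, 1) p-values.
    null_p = [random.random() for _ in range(n_rep)]
    rate = rejection_rate(null_p)
    lo, hi = monte_carlo_band(0.05, n_rep)
    print(f"empirical type I error = {rate:.3f}; band = ({lo:.3f}, {hi:.3f})")
```

With 500 replicates the band is roughly (0.031, 0.069). If that were the replicate count, a rate such as 0.326 would lie far outside it, while rates close to 0.05 would be indistinguishable from the nominal level.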

Results at the significance level |$\alpha = 0.01$|, provided in Supplement B.2 (Tables S.2 and S.4), are consistent with those above. We also conducted additional simulations with hurdle Poisson and hurdle negative binomial models (Mullahy, 1986); like ZIP and ZINB, the hurdle models exhibit type I error inflation because of their restrictive parametric assumptions (Supplement B.3).

3.3 Power results

We also evaluate power under Setting 1 and Setting 2 with sample sizes |$n\in \lbrace 500,2000\rbrace$|. We omit the power of ZINB and ZIP because Table 1 shows their severe type I error inflation. The power of ZIQ-SIR, ZIQRank, and Quantile Single-index is reported in Table 2.

TABLE 2

Power results (ZIP and ZINB omitted) at the significance threshold |$\alpha =0.05$|.

|$n=500$|	Predictor	ZIQ-SIR	ZIQRank	Quantile Single-index
Setting 1	|$x_1$|	0.484	0.474	0.124
	|$x_2$|	0.080	0.082	0.080
	|$x_3$|	0.594	0.608	0.120
	|$x_4$|	0.276	0.278	0.098
	|$x_5$|	0.274	0.260	0.112
Setting 2	|$x_2$|	0.066	0.068	0.090
	|$x_3$|	0.330	0.286	0.126
	|$x_4$|	0.066	0.068	0.366
	|$x_5$|	0.068	0.082	0.384
	|$x_2$|, |$x_3$|	0.292	0.270	0.116
	|$x_4$|, |$x_5$|	0.082	0.074	0.110

|$n=2000$|	Predictor	ZIQ-SIR	ZIQRank	Quantile Single-index
Setting 1	|$x_1$|	0.998	0.992	0.386
	|$x_2$|	0.122	0.156	0.078
	|$x_3$|	1.000	0.998	0.384
	|$x_4$|	0.916	0.910	0.210
	|$x_5$|	0.854	0.824	0.298
Setting 2	|$x_2$|	0.154	0.162	0.084
	|$x_3$|	0.934	0.906	0.180
	|$x_4$|	0.266	0.262	0.430
	|$x_5$|	0.194	0.190	0.426
	|$x_2$|, |$x_3$|	0.914	0.886	0.084
	|$x_4$|, |$x_5$|	0.218	0.198	0.096


Under Setting 1, at both sample sizes |$n\in \lbrace 500,2000\rbrace$|, our method has power comparable to that of ZIQRank, whereas the Quantile Single-index method has substantially lower power. This suggests that ZIQ-SIR and ZIQRank handle the zero inflation in the simulated data better and therefore achieve higher power. Under Setting 2, the power of ZIQ-SIR is generally higher than that of ZIQRank, most clearly when testing |$x_3$| and the pair |$x_2$|, |$x_3$|, suggesting that ZIQ-SIR better accommodates the correlation among covariates. The Quantile Single-index method attains the highest power for |$x_4$| and |$x_5$| under Setting 2, but this reflects its inflated type I error for these covariates (Table 1) rather than a genuine gain in sensitivity.

To better reflect real microbial taxonomic count data, we conduct additional simulations with count responses and present the results in Supplement B.4. These results, which account for the discrete nature of the data, remain consistent with our original findings.

4 APPLICATION TO THE COLOMBIAN GUT DATA

We demonstrate the performance of our ZIQ-SIR method using data from the Colombian Gut study (De la Cuesta-Zuluaga et al., 2018; Gonzalez et al., 2018). We present hypothesis testing results for taxa associated with health-related biological and dietary features using both ZIQ-SIR and ZIQRank. The results for ZIP, ZINB, and the Quantile Single-index method are not included because of their substantially inflated type I error rates.

4.1 Data description

The dataset consists of genus-level microbiome counts for 425 taxa from 441 adults, together with covariates on diet (eg, fiber, various fats, protein), anthropometric measures [age, body mass index (BMI)], lipid profile [high-density lipoprotein (HDL), low-density lipoprotein (LDL)], glucose metabolism (glucose, insulin), blood pressure (diastolic and systolic), city, and medication usage. Categorical variables such as sex, medication, and city were encoded as dummy variables. We excluded 3 subjects with missing values and 2 others with triglyceride levels exceeding 800 mg/dL, resulting in a final sample size of 436.
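The exclusion and encoding steps above can be sketched as follows, assuming each subject is stored as a dictionary with `None` marking a missing value. The field names, the city labels in the usage example, and the choice of the alphabetically first level as the dummy-coding reference are illustrative assumptions, not details taken from the study.

```python
from typing import Dict, List, Sequence

Record = Dict[str, object]


def preprocess_subjects(subjects: List[Record], categorical: Sequence[str],
                        tg_key: str = "triglycerides",
                        tg_max: float = 800.0) -> List[Record]:
    """Drop incomplete records and extreme triglycerides, then dummy-encode categoricals."""
    # 1) Exclude subjects with any missing (None) value.
    complete = [s for s in subjects if all(v is not None for v in s.values())]
    # 2) Exclude subjects whose triglycerides exceed tg_max (mg/dL).
    kept = [s for s in complete if s[tg_key] <= tg_max]
    # 3) Dummy-encode each categorical variable. The first level in sorted order is
    #    the reference and gets no column (an assumption of this sketch).
    levels = {c: sorted({s[c] for s in kept}) for c in categorical}
    encoded = []
    for s in kept:
        row = {k: v for k, v in s.items() if k not in categorical}
        for c in categorical:
            for level in levels[c][1:]:
                row[f"{c}_{level}"] = int(s[c] == level)
        encoded.append(row)
    return encoded


if __name__ == "__main__":
    # Hypothetical toy records, not study data.
    subjects = [
        {"age": 30, "sex": "F", "city": "Bogota", "triglycerides": 150.0},
        {"age": 41, "sex": "M", "city": "Cali", "triglycerides": 950.0},
        {"age": 55, "sex": "M", "city": "Medellin", "triglycerides": None},
        {"age": 62, "sex": "M", "city": "Cali", "triglycerides": 300.0},
    ]
    for row in preprocess_subjects(subjects, ["sex", "city"]):
        print(row)
```

In the toy example the second subject is dropped for triglycerides above 800 mg/dL and the third for a missing value; levels are computed after exclusion, so no dummy column is created for a level that no retained subject has.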

We analyzed the 109 taxa with observed zero proportions below 0.8, as higher proportions of zeros can lead to unreliable results (Zhang and Yi, 2020). Consistent with standard practice in microbiome studies (Xia et al., 2018), we adjusted for library size (ie, the total count of the 109 taxa per person) by including it as a covariate in the model. Other normalization methods, such as rarefaction (Willis, 2019), relative abundance (Gloor et al., 2016), and cumulative sum scaling (CSS; McKnight et al., 2019), can also be applied. Additionally, because the quantile-based models are designed for continuous outcomes, we jittered the nonzero microbiome counts by adding noise drawn uniformly from the interval between 0 and 1.
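A minimal NumPy sketch of the taxon filter, library-size covariate, and jittering described above, assuming a subjects-by-taxa count matrix. The function name and the use of `numpy.random.default_rng` are choices made for this illustration. The library size is computed from the raw (unjittered) counts of the retained taxa, matching "the total count of the 109 taxa per person".

```python
import numpy as np


def prepare_counts(counts, max_zero_prop=0.8, rng=None):
    """Filter sparse taxa, compute library size, and jitter nonzero counts.

    counts: (n_subjects, n_taxa) array of raw taxon counts.
    Returns (jittered responses, library size per subject, boolean mask of kept taxa).
    """
    rng = np.random.default_rng() if rng is None else rng
    counts = np.asarray(counts, dtype=float)
    # Keep taxa whose observed proportion of zeros is below the cutoff.
    keep = (counts == 0).mean(axis=0) < max_zero_prop
    kept = counts[:, keep]
    # Library size: total raw count of the retained taxa per subject (a model covariate).
    library_size = kept.sum(axis=1)
    # Jitter only the nonzero counts with Uniform[0, 1) noise; zeros stay exactly zero.
    jittered = kept.copy()
    nonzero = jittered > 0
    jittered[nonzero] += rng.uniform(0.0, 1.0, size=int(nonzero.sum()))
    return jittered, library_size, keep


if __name__ == "__main__":
    toy = np.array([[0, 5, 0], [0, 3, 1], [0, 0, 2], [0, 7, 0], [1, 2, 0]])
    y, lib, keep = prepare_counts(toy, rng=np.random.default_rng(0))
    print(keep, lib)
    print(np.round(y, 3))
```

In the toy matrix the first taxon has a zero proportion of exactly 0.8 and is therefore dropped, since the rule keeps only taxa strictly below the cutoff.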

The 24 covariates were divided into three groups: biological features, dietary features, and other covariates. The other-covariates group comprises 4 covariates: age, sex, city, and medication usage. The biological-feature group comprises 12 covariates: adiponectin, BMI, cholesterol, diastolic blood pressure, systolic blood pressure, glucose, glycosylated hemoglobin, HDL, LDL, insulin, triglycerides, and waist circumference. These covariates are closely linked to overall health (Ma and Shieh, 2006; Stefan et al., 2003). The dietary-feature group includes 8 covariates related to macronutrient consumption: fiber, percentage of animal protein, carbohydrates, monounsaturated fat, polyunsaturated fat, saturated fat, total fat, and protein. We tested each of the 109 taxa against the biological-feature group and the dietary-feature group using the proposed ZIQ-SIR method with a fast permutation approach (Supplement A.2), and compared the results with those from ZIQRank.
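For readers unfamiliar with permutation testing, the sketch below shows a generic Monte Carlo permutation |$p$|-value for a group of covariates, with the rows of the covariate matrix permuted jointly. It is not the fast permutation approach of Supplement A.2 and does not use the ZIQ-SIR statistic: `sum_sq_corr` is a hypothetical stand-in statistic, and this naive version ignores the other covariates (age, sex, city, medication usage, library size), which a test in this design needs to account for.

```python
import numpy as np


def sum_sq_corr(y, X):
    """Hypothetical stand-in statistic: sum of squared Pearson correlations of y with each column of X."""
    r = np.corrcoef(np.column_stack([y, X]), rowvar=False)[0, 1:]
    return float(np.sum(r ** 2))


def permutation_pvalue(stat_fn, y, X, n_perm=999, rng=None):
    """Monte Carlo permutation p-value for H0: y is unrelated to the tested covariates X.

    Larger values of stat_fn(y, X) indicate stronger evidence against H0.
    Rows of X are permuted jointly, so correlations among the tested covariates are kept.
    """
    rng = np.random.default_rng() if rng is None else rng
    X = np.asarray(X, dtype=float)
    t_obs = stat_fn(y, X)
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(X.shape[0])
        if stat_fn(y, X[perm]) >= t_obs:
            exceed += 1
    # Counting the observed statistic as one of the permutations keeps p > 0.
    return (1 + exceed) / (1 + n_perm)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = 2.0 * X[:, 0] + rng.normal(size=200)
    print(permutation_pvalue(sum_sq_corr, y, X, n_perm=199, rng=np.random.default_rng(1)))
```

The smallest attainable |$p$|-value is |$1/(B+1)$| for |$B$| permutations, so resolving small |$p$|-values requires many permutations; reducing that cost is what a fast permutation scheme is for.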

4.2 Permutation test for type I error evaluation

Because the excess zeros make the effective sample size relatively small, before analyzing the data with the two methods we first evaluate the type I errors of ZIQ-SIR and ZIQRank on the real microbiome data when testing the biological features and the dietary features. We create 50 null datasets by permuting the covariates jointly across subjects, so that each subject's full covariate vector is reassigned as a unit. This permutation preserves the associations among the covariates but removes any association between the covariates and the response, microbial abundance. Hence, none of the covariates should affect microbial abundance in the null datasets, and taxa with small |$p$|-values are false positives. We test the biological and dietary features separately on the null datasets and estimate the type I error as the proportion of taxa with nominal |$p$|-values below 0.05 in each null dataset. This evaluation procedure is widely adopted in real data analysis (Ling et al., 2021b; Soneson and Robinson, 2018). In both analyses, ZIQ-SIR controls the type I error, with around 5% of taxa having |$p$|-values below 0.05, whereas ZIQRank shows some inflation (Figure 3).
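The null-dataset evaluation can be sketched as below, assuming a subjects-by-taxa response matrix and a per-taxon testing function. `toy_pvalue` is a hypothetical stand-in (a two-sided Fisher-z correlation test on the first covariate) used only so the example runs; in the analysis above the per-taxon test would be ZIQ-SIR or ZIQRank. Permuting whole rows leaves the correlation matrix of the covariates unchanged, which is what preserves the associations among covariates.

```python
import math

import numpy as np


def toy_pvalue(y, X):
    """Hypothetical stand-in test: two-sided Fisher-z test of corr(y, first column of X)."""
    r = np.corrcoef(y, X[:, 0])[0, 1]
    z = math.atanh(r) * math.sqrt(len(y) - 3)
    return math.erfc(abs(z) / math.sqrt(2.0))


def null_type1_fractions(Y, X, pvalue_fn, n_null=50, alpha=0.05, rng=None):
    """Fraction of taxa with p < alpha in each of n_null permuted (null) datasets.

    Y: (n_subjects, n_taxa) responses; X: (n_subjects, p) covariates.
    Each subject's covariate vector moves as a unit (rows of X are permuted),
    which keeps covariate-covariate associations but breaks any link to Y.
    """
    rng = np.random.default_rng() if rng is None else rng
    n, n_taxa = Y.shape
    fractions = np.empty(n_null)
    for b in range(n_null):
        X_null = X[rng.permutation(n)]
        pvals = np.array([pvalue_fn(Y[:, j], X_null) for j in range(n_taxa)])
        fractions[b] = np.mean(pvals < alpha)
    return fractions


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(150, 3))
    Y = rng.normal(size=(150, 100))  # toy responses, not microbiome data
    fr = null_type1_fractions(Y, X, toy_pvalue, n_null=20, rng=np.random.default_rng(1))
    print(fr.mean())
```

A boxplot of the returned fractions, one per null dataset, is the kind of summary shown in Figure 3; a well-calibrated test should give fractions scattered around |$\alpha$|.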


FIGURE 3

Boxplot of fraction of taxa with |$p<0.05$| based on 50 null datasets.

4.3 Hypothesis testing results

We tested the association between microbial abundance and the covariates of interest using the proposed ZIQ-SIR method and ZIQRank. The |$p$|-values were adjusted for multiple testing by controlling the false discovery rate (FDR) with the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995), and taxa with FDR-adjusted |$p$|-values below 0.05 were considered significantly associated with the biological or dietary features. The detailed |$p$|-values are given in Table S.9, Supplement C.1.
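The Benjamini-Hochberg adjustment used above can be written in a few lines of NumPy. This is a minimal sketch of the standard step-up procedure, not the authors' code. If statsmodels is available, `statsmodels.stats.multitest.multipletests(p, method="fdr_bh")` should return the same adjusted values.

```python
import numpy as np


def bh_adjust(pvalues):
    """Benjamini-Hochberg (1995) FDR-adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Raw step-up values m * p_(i) / i for the sorted p-values.
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity: adjusted p_(i) = min over j >= i of m * p_(j) / j.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted


if __name__ == "__main__":
    adj = bh_adjust([0.01, 0.04, 0.03, 0.20])
    print(adj, adj < 0.05)
```

For example, `bh_adjust([0.01, 0.04, 0.03, 0.20])` gives approximately [0.04, 0.0533, 0.0533, 0.20], so only the first taxon passes a 5% FDR threshold even though three raw |$p$|-values are below 0.05.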

Using ZIQ-SIR, we identified 3 taxa associated with the biological features at a 5% FDR threshold, 1 of which was also identified by ZIQRank (Table S.9). Peptococcaceae-unspecified and Rhizobiales-unspecified-unspecified were discovered only by our method. The literature suggests that Peptococcaceae-unspecified is closely related to LDL levels (Zhu et al., 2021) and to lower triglyceride levels (Ejtahed et al., 2020), findings consistent with our analysis. Similarly, Rhizobiales-unspecified-unspecified has been linked to glucose levels (Asensio et al., 2022). Both methods identified Actinomycetales-unspecified-unspecified, which has been reported to increase significantly in coronary artery disease (CAD) patients with higher BMI, lower cholesterol, and higher glucose levels (Sawicka-Smiarowska et al., 2021).

For the dietary features, the ZIQ-SIR method uniquely identified the taxon Peptostreptococcus and found Peptococcaceae-unspecified in common with the ZIQRank method (Table S.9). Previous studies have indicated that low-fiber, high-protein diets influence the abundance of Peptostreptococcus (Martínez-López et al., 2021), while experimental evidence shows that Peptococcaceae-unspecified levels are significantly higher in individuals consuming high-fat diets (Wang et al., 2020). These findings are consistent with our analysis.

5 DISCUSSION

We proposed ZIQ-SIR to test associations between zero-inflated outcomes and covariates of interest, accommodating their potential nonlinear relationships. Our method introduces a new single-index model for zero-inflated data and provides detailed procedures for both large and small sample sizes. Numerical experiments show that ZIQ-SIR maintains well-controlled type I errors while achieving high power, whereas existing methods like ZINB, ZIP, and ZIQRank often exhibit inflated type I errors when their assumptions are violated.

While the single-index model in ZIQ-SIR provides greater flexibility for the positive component, the logistic regression component retains a linear structure, as real-world applications have not shown compelling evidence for a more complex formulation. Future work could explore more generalized logistic regression models (Stukel, 1988). Another promising future direction is adapting our method to test associations between covariates and groups of zero-inflated outcomes (eg, multiple taxa in microbiome studies), requiring methods for multivariate responses. This extension could reveal broader patterns and improve inference. Incorporating structural information, such as hierarchical or clustered relationships among outcomes, may further enhance model fit and statistical power (Washburne et al., 2018).

ACKNOWLEDGMENTS

We thank the editor, associate editor, and two referees for their valuable comments and constructive suggestions.

FUNDING

This work was supported, in part, by a grant from the National Heart, Lung, and Blood Institute [R01 HL155417] and two grants from the National Institute of General Medical Sciences [R01GM151301, R01GM155734].

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

The dataset that supports the findings in this paper is available at https://qiita.ucsd.edu/ with Study ID 11993 (De la Cuesta-Zuluaga et al., 2018; Gonzalez et al., 2018).

REFERENCES

Agarwal, D. K., Gelfand, A. E. and Citron-Pousty (2002). Zero-inflated models with application to spatial count data. Environmental and Ecological Statistics, 341–355.

Asensio, E. M., Ortega-Azorín, Barragán, Alvarez-Sala, Sorlí, J. V. et al. (2022). Association between microbiome-related human genetic variants and fasting plasma glucose in a high-cardiovascular-risk mediterranean population. Medicina, 1238.

Benjamini and Hochberg (1995). Controlling the false discovery rate: a practical and powerful approach to multiple testing. Journal of the Royal Statistical Society. Series B (Methodological), 289–300.

De Boor (2001). A practical guide to splines, Revised edition. Applied Mathematical Sciences. New York: Springer New York.

De la Cuesta-Zuluaga, Corrales-Agudelo, Velásquez-Mejía, E. P., Carmona, J. A., Abad, J. M. and Escobar, J. S. (2018). Gut microbiota is associated with obesity and cardiometabolic disease in a population in the midst of Westernization. Scientific Reports.

Edgington, E. S. (1972). An additive method for combining probability values from independent experiments. The Journal of Psychology, 351–363.

Efron and Tibshirani, R. J. (1994). An introduction to the bootstrap. New York: Chapman and Hall/CRC.

Eilers, P. H. and Marx, B. D. (1996). Flexible smoothing with B-splines and penalties. Statistical Science, –121.

Ejtahed, H.-S., Angoorani, Soroush, A.-R., Hasani-Ranjbar, Siadat, S.-D. and Larijani (2020). Gut microbiota-derived metabolites in obesity: a systematic review. Bioscience of Microbiota, Food and Health.

Frank, D. N., St. Amand, A. L., Feldman, R. A., Boedeker, E. C., Harpaz and Pace, N. R. (2007). Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proceedings of the National Academy of Sciences, 104, 13780–13785.

Gloor, G. B., Pawlowsky-Glahn and Egozcue, J. J. (2016). It's all relative: analyzing microbiome data as compositions. Annals of Epidemiology, 322–329.

Gonzalez, Navas-Molina, J. A., Kosciolek, McDonald, Vázquez-Baeza, Ackermann et al. (2018). Qiita: rapid, web-enabled microbiome meta-analysis. Nature Methods, 796–798.

Gutenbrunner, Jurečková, Koenker and Portnoy (1993). Tests of linear hypotheses based on regression rank scores. Journal of Nonparametric Statistics, 307–331.

Hawinkel, Mattiello, Bijnens and Thas (2019). A broken promise: microbiome differential abundance methods do not control the false discovery rate. Briefings in Bioinformatics, 210–221.

Horton, N. J., Kim and Saitz (2007). A cautionary note regarding count models of alcohol consumption in randomized controlled trials. BMC Medical Research Methodology.

Jiang, Ling, Zhang, Mao, Yin et al. (2015). Altered fecal microbiota composition in patients with major depressive disorder. Brain, Behavior, and Immunity, 186–194.

Kinder-Haake (2012). A framework for human microbiome research. Nature, 486, 215–221.

Koenker and Bassett Jr (1978). Regression quantiles. Econometrica: Journal of the Econometric Society.

Koenker and Portnoy (1994). Quantile smoothing splines. Biometrika, 673–680.

Kostic, A. D., Chun, Robertson, Glickman, J. N., Gallini, C. A., Michaud et al. (2013). Fusobacterium nucleatum potentiates intestinal tumorigenesis and modulates the tumor-immune microenvironment. Cell Host and Microbe, 207–215.

Lewsey, J. D. and Thomson, W. M. (2004). The utility of the zero-inflated Poisson and zero-inflated negative binomial models: a case study of cross-sectional and longitudinal DMF data examining the effect of socio-economic status. Community Dentistry and Oral Epidemiology, 183–189.

Liang, Liu and Tsai, C.-L. (2010). Estimation and testing for partially linear single-index models. Annals of Statistics, 3811.

Ling, Zhang, Cheng and Wei (2021a). Zero-inflated quantile rank-score based test (ZIQRank) with application to scRNA-seq differential gene expression analysis. The Annals of Applied Statistics, 1673.

Ling, Zhao, Plantinga, A. M., Launer, L. J., Fodor, A. A., Meyer, K. A. et al. (2021b). Powerful and robust non-parametric association testing for microbiome data via a zero-inflated quantile approach (ZINQ). Microbiome.

Liu, Song, Zhou, Xia, Dong et al. (2018). Gut microbiome associates with lipid-lowering effect of rosuvastatin in vivo. Frontiers in Microbiology, 530.

Liu and Xie (2020). Cauchy combination test: a powerful test with analytic p-value calculation under arbitrary dependency structures. Journal of the American Statistical Association, 115, 393–402.

Ma and Shieh, K.-J. (2006). Cholesterol and human health. The Journal of American Science.

(2016). Inference for single-index quantile regression models with profile optimization. The Annals of Statistics, 1234–1268.

Martin, B. D., Witten and Willis, A. D. (2020). Modeling microbial abundances and dysbiosis with beta-binomial regression. The Annals of Applied Statistics.

Martínez-López, L. M., Pepper, Pilla, Woodward, A. P., Suchodolski, J. S. and Mansfield (2021). Effect of sequentially fed high protein, hydrolyzed protein, and high fiber diets on the fecal microbiota of healthy dogs: a cross-over study. Animal Microbiome.

McKnight, D. T., Huerlimann, Bower, D. S., Schwarzkopf, Alford, R. A. and Zenger, K. R. (2019). Methods for normalizing microbiome data: an ecological perspective. Methods in Ecology and Evolution, 389–400.

Mullahy (1986). Specification and testing of some modified count data models. Journal of Econometrics, 341–365.

Nam, Henderson, N. C., Rohan, Woo, E. J. and Russek-Cohen (2017). Logistic regression likelihood ratio test analysis for detecting signals of adverse events in post-market safety surveillance. Journal of Biopharmaceutical Statistics, 990–1008.

Neykov, Liu, J. S. and Cai (2016). L1-regularized least squares for support recovery of high dimensional single index models with Gaussian designs. The Journal of Machine Learning Research, 2976–3012.

Ridout, Hinde and Demétrio, C. G. (2001). A score test for testing a zero-inflated Poisson regression model against zero-inflated negative binomial alternatives. Biometrics, 219–223.

Sawicka-Smiarowska, Bondarczuk, Bauer, Niemira, Szalkowska, Raczkowska et al. (2021). Gut microbiome in chronic coronary syndrome patients. Journal of Clinical Medicine, 5074.

Schwarzer, Antes and Schumacher (2002). Inflation of type I error rate in two statistical tests for the detection of publication bias in meta-analyses with binary outcomes. Statistics in Medicine, 2465–2477.

Soneson and Robinson, M. D. (2018). Bias, robustness and scalability in single-cell differential expression analysis. Nature Methods, 255.

Stefan, Stumvoll, Vozarova, Weyer, Funahashi, Matsuzawa et al. (2003). Plasma adiponectin and endogenous glucose production in humans. Diabetes Care, 3315–3319.

Stukel, T. A. (1988). Generalized logistic models. Journal of the American Statistical Association, 426–431.

Tippett, L. H. C. (1931). The Methods of Statistics. London: Williams & Norgate Ltd.

Wang, Kong, Zhao, Zhang, Chen et al. (2020). A high-fat diet increases gut microbiota biodiversity and energy expenditure due to nutrient difference. Nutrients, 3197.

Wang, C.-Y. (2015). Variable selection for zero-inflated and overdispersed data with application to health care demand in Germany. Biometrical Journal, 867–884.

Washburne, A. D., Morton, J. T., Sanders, McDonald, Zhu, Oliverio, A. M. et al. (2018). Methods for phylogenetic analysis of microbiome data. Nature Microbiology, 652–661.

Willis, A. D. (2019). Rarefaction, alpha diversity, and statistics. Frontiers in Microbiology, 2407.

Xia, Sun and Chen, D.-G. (2018). Modeling zero-inflated microbiome data. In: Statistical Analysis of Microbiome Data with R, 453–496. Singapore: Springer.

Yau, K. K., Wang and Lee, A. H. (2003). Zero-inflated negative binomial mixed regression modeling of over-dispersed count data with extra zeros. Biometrical Journal: Journal of Mathematical Methods in Biosciences, 437–452.

Zaykin, D. V., Zhivotovsky, L. A., Westfall, P. H. and Weir, B. S. (2002). Truncated product method for combining P-values. Genetic Epidemiology: The Official Publication of the International Genetic Epidemiology Society, 170–185.