Abstract

Data increasingly take the form of a multi-way array, or tensor, in several biomedical domains. Such tensors are often incompletely observed. For example, we are motivated by longitudinal microbiome studies in which several timepoints are missing for several subjects. There is a growing literature on missing data imputation for tensors. However, existing methods give a point estimate for missing values without capturing uncertainty. We propose a multiple imputation approach for tensors in a flexible Bayesian framework that yields realistic simulated values for missing entries and can propagate uncertainty through subsequent analyses. Our model uses efficient and widely applicable conjugate priors for a CANDECOMP/PARAFAC (CP) factorization, with a separable residual covariance structure. This approach is shown to perform well with respect to both imputation accuracy and uncertainty calibration, in scenarios in which either single entries or entire fibers of the tensor are missing. For two microbiome applications, it is shown to accurately capture uncertainty in the full microbiome profile at missing timepoints, and it is used to infer trends in species diversity for the population. Documented R code to perform our multiple imputation approach is available at https://github.com/lockEF/MultiwayImputation.

1 Introduction

Multiway data, also known as a multidimensional array or tensor, have become more prevalent in biomedical signal processing and other diverse fields. Tensors can effectively represent real-world datasets that have multiple dimensions or aspects. However, a common challenge associated with multi-way data is missing data, which can occur either on an entry-wise basis (i.e. certain entries are missing) or a fiber-wise basis (i.e. entire fibers are missing). For example, in this manuscript we consider microbiome studies in which the abundances of microbial taxa are collected longitudinally over several timepoints, for several subjects. The resulting data can be represented as a 3-way tensor: subjects × taxa × time points; however, the microbiome profiles (abundance across taxa) are entirely missing at several time points for several of the subjects in both studies, resulting in fiber-wise missingness. Our goal is to enable analysis of the full tensor by imputing missing data, while also accurately capturing uncertainty in the imputed fibers. Accounting for this uncertainty is particularly critical for the validity of subsequent inferences, such as for population trends in species diversity over time.

The Tucker decomposition (Tucker 1966) and the CANDECOMP/PARAFAC (CP) decomposition (Carroll and Chang 1970) are two classic formulations for decomposing tensor data, and both can be viewed as extensions of the singular value decomposition for a matrix. The Tucker decomposition factorizes a given tensor into a core tensor multiplied by matrices corresponding to each mode. The CP decomposition, which can be viewed as a constrained case of the Tucker decomposition, decomposes the tensor into a sum of rank-1 tensors. Kolda and Bader (2009) give a comprehensive review of tensor decompositions and their applications in various contexts.

Tensor decomposition methods are useful to uncover the underlying structure in a tensor, and have been used to impute missing elements. Acar et al. (2011) describe imputation approaches based on the CP decomposition, and Chen et al. (2013) describe imputation approaches based on a Tucker decomposition. Several related extensions incorporate penalization or smoothing into estimation of the tensor decomposition for missing data imputation (Tan et al. 2013; Yokota et al. 2016; Wu et al. 2018). An alternative approach, first proposed by Liu et al. (2012), builds upon ideas in low-rank matrix completion (Mazumder et al. 2010) by minimizing the tensor trace norm to transform the problem into a convex optimization problem.

The aforementioned imputation approaches are all deterministic, in that they only produce a single point estimate with no uncertainty for the imputed values. A Bayesian framework is attractive in this context, because it can accommodate the collective uncertainty of the underlying factors that are combined in a low-rank decomposition. There is an extensive literature on low-rank Bayesian modeling for tensors, particularly in the regression context (Hoff 2015; Guhaniyogi et al. 2017; Wang and Xu 2024). However, there is little work on Bayesian approaches to multiple imputation for tensors. Chen et al. (2019) proposed a Bayesian tensor decomposition approach that generalizes the Bayesian matrix factorization model of Salakhutdinov and Mnih (2008), incorporating independent and normally distributed errors. However, their implementation only provides a point estimate with no uncertainty and does not account for correlation in the residual error structure.

To our knowledge, no existing tensor imputation methods perform multiple imputation, in which multiple values are simulated for the missing entries to reflect their uncertainty. However, this is often critical in applications, to accurately propagate uncertainty through subsequent analyses after imputation. As shown in our examples, imputing missing values with a point estimate and subsequently treating the values as fixed can drastically underestimate uncertainty and lead to inaccurate inference. Thus, we introduce Bayesian Multiple Imputation for Tensor Arrays (BAMITA), a flexible Bayesian model based on the CP factorization with an efficient MCMC sampling algorithm for multi-way data with missing values. Our approach enables valid inference by simulating from the posterior predictive distribution for missing values, accounting for the uncertainty of the imputed elements. Further, our approach incorporates a separable covariance structure on the error terms to account for any correlation structure that may exist on specific tensor modes. We illustrate the advantages of this approach with respect to both imputation accuracy and uncertainty calibration via simulations, and on data from two longitudinal microbiome studies.

2 Preliminaries

2.1 Notation and background

Denote |$\boldsymbol{\mathscr{X}}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$| as an N-dimensional tensor with the length of each dimension being |$I_1, I_2,\ldots, I_N$|, respectively. Let |$\mathbf{a}\circ\mathbf{b}$| denote the outer product of two vectors a and b. Denote the Kronecker product of matrices A and B as |$\mathbf{A}\otimes\mathbf{B}$| and the Khatri-Rao product (the columnwise Kronecker product) as |$\mathbf{A}\odot\mathbf{B}$|. Let |$\mathbf{X}_{(n)}\in\mathbb{R}^{I_n\times (I_1 I_2 \cdots I_{n-1} I_{n+1}\cdots I_N)}$| denote the mode-n matricization of a tensor |$\boldsymbol{\mathscr{X}}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$|, yielding a matrix with |$I_n$| rows and |$\prod_{i\neq n}I_i$| columns. Denote the Moore-Penrose pseudoinverse of matrix A as |$\mathbf{A}^{\dagger}$|.
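To make this notation concrete, below is a minimal R sketch of the mode-n matricization and the Khatri-Rao product. The helper names (unfold, khatri_rao) are our own for illustration, not functions from the MultiwayImputation package; later sketches in this article reuse them.

```r
# Mode-n matricization: move mode n to the front, then flatten the
# remaining modes column-major, giving an I_n x prod(I_i, i != n) matrix.
unfold <- function(X, n) {
  d <- dim(X)
  matrix(aperm(X, c(n, seq_along(d)[-n])), nrow = d[n])
}

# Khatri-Rao product: column-wise Kronecker product of two matrices
# with the same number of columns.
khatri_rao <- function(A, B) {
  stopifnot(ncol(A) == ncol(B))
  sapply(seq_len(ncol(A)), function(r) kronecker(A[, r], B[, r]))
}

X <- array(rnorm(3 * 4 * 5), dim = c(3, 4, 5))
dim(unfold(X, 2))  # 4 x 15
```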

For our context, elements of the full tensor |$\boldsymbol{\mathscr{X}}$| may be missing. Let |$\mathcal{M}=\{(i_1,\ldots,i_N):\mathscr{X}_{i_1,\ldots,i_N}\ \text{is missing}\}$| index missing entries, |$\boldsymbol{\mathscr{X}}_{\text{missing}}$| give the latent missing values in the tensor |$\{\boldsymbol{\mathscr{X}}_{i_1,\ldots,i_N}:(i_1,\ldots,i_N)\in\mathcal{M}\}$|, and |$\boldsymbol{\mathscr{X}}_{\text{observed}}$| give the observed values in the tensor. Our goal is to infer |$\boldsymbol{\mathscr{X}}_{\text{missing}}$| given |$\boldsymbol{\mathscr{X}}_{\text{observed}}$|.

2.2 Tensor CP decomposition

For a tensor |$\boldsymbol{\mathscr{X}}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$|, the rank R CP decomposition (Carroll and Chang 1970) factorizes it into a sum of rank-one tensors that are the outer product of vectors in each dimension,

$$\boldsymbol{\mathscr{X}}\approx\sum_{r=1}^{R}\mathbf{u}_r^{1}\circ\mathbf{u}_r^{2}\circ\cdots\circ\mathbf{u}_r^{N}, \qquad(1)$$

where |$\mathbf{u}_r^{1},\mathbf{u}_r^2,\ldots,\mathbf{u}_r^N$| are vectors of length |$I_1,\ldots, I_N$|, respectively. For |$i=1,\ldots,N$|, define the matrix

$$\mathbf{U}^{i}=[\mathbf{u}_1^{i}\ \ \mathbf{u}_2^{i}\ \ \cdots\ \ \mathbf{u}_R^{i}]\in\mathbb{R}^{I_i\times R}. \qquad(2)$$

Then, we can express the CP decomposition in the following concise way:
$$\boldsymbol{\mathscr{X}}\approx[[\mathbf{U}^{1},\mathbf{U}^{2},\ldots,\mathbf{U}^{N}]].$$
By definition, the norm of each component is not identifiable; i.e.
$$[[\mathbf{U}^{1},\mathbf{U}^{2},\ldots,\mathbf{U}^{N}]]=[[\mathbf{U}^{1}\Lambda,\mathbf{U}^{2}\Lambda^{-1},\mathbf{U}^{3},\ldots,\mathbf{U}^{N}]]$$
for any invertible |$\Lambda=$| diag|$(\lambda_1,\ldots,\lambda_R)$|. Thus, define the following normalized format, where the columns of |$\mathbf{U^1},\mathbf{U^2},\ldots,\mathbf{U^N}$| have norm 1 and |$\boldsymbol{\lambda}=(\lambda_1,\ldots,\lambda_R)$| gives the scale factors:

$$\boldsymbol{\mathscr{X}}\approx[[\boldsymbol{\lambda};\mathbf{U}^{1},\ldots,\mathbf{U}^{N}]]=\sum_{r=1}^{R}\lambda_r\,\mathbf{u}_r^{1}\circ\cdots\circ\mathbf{u}_r^{N}. \qquad(3)$$

We can also express the factorization in the matricized form

$$\mathbf{X}_{(n)}\approx\mathbf{U}^{n}\Lambda\left(\mathbf{U}^{N}\odot\cdots\odot\mathbf{U}^{n+1}\odot\mathbf{U}^{n-1}\odot\cdots\odot\mathbf{U}^{1}\right)^{T}, \qquad(4)$$

which will be used in our following derivations.

2.3 ALS and EM imputation

The alternating least squares (ALS) method computes a CP decomposition with R components |$\hat{\boldsymbol{\mathscr{X}}}=\sum_{r=1}^{R}\mathbf{u}_r^{1}\circ\mathbf{u}_r^2\circ\cdots\circ\mathbf{u}_r^N$| that best approximates the target tensor |$\boldsymbol{\mathscr{X}}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$| by minimizing the sum of squared residuals:
$$\min_{\mathbf{U}^{1},\ldots,\mathbf{U}^{N}}\ \left\|\boldsymbol{\mathscr{X}}-\hat{\boldsymbol{\mathscr{X}}}\right\|_F^2.$$
The ALS algorithm iteratively achieves this by solving the least squares problem for one component (|$\hat{\mathbf{U}}^1$|, for example) with the other components fixed, which is
$$\hat{\mathbf{U}}^{1}=\underset{\mathbf{U}\in\mathbb{R}^{I_1\times R}}{\arg\min}\ \left\|\mathbf{X}_{(1)}-\mathbf{U}\left(\mathbf{U}^{N}\odot\cdots\odot\mathbf{U}^{2}\right)^{T}\right\|_F^2.$$
Solving the least squares equation, we have
$$\hat{\mathbf{U}}^{1}=\mathbf{X}_{(1)}\left(\mathbf{U}^{N}\odot\cdots\odot\mathbf{U}^{2}\right)\left({\mathbf{U}^{N}}^{T}\mathbf{U}^{N}*\cdots*{\mathbf{U}^{2}}^{T}\mathbf{U}^{2}\right)^{\dagger},$$
where * is the Hadamard product, the elementwise product of two matrices with the same dimensions. We can normalize each column of |$\hat{\mathbf{U}}^1=[\hat{\mathbf{u}}^1_{1}\ \ \hat{\mathbf{u}}^1_{2}\ \ \cdots\ \ \hat{\mathbf{u}}^1_{R}]$| to obtain |$\mathbf{U}^1$| and Λ, where |$\lambda_r=\|\hat{\mathbf{u}}^1_{r}\|$| and |$\mathbf{u}^1_r=\hat{\mathbf{u}}^1_r/\lambda_r$|. Then, we fix |$\mathbf{U}^1,\mathbf{U}^3,\ldots,\mathbf{U}^N$| and compute |$\mathbf{U}^2$| in a similar way. In this manner, |$\mathbf{U}^1,\ldots,\mathbf{U}^N$| are iteratively updated until the algorithm converges (i.e. the updated factor matrices are very close to the previous ones) or it attains the maximum number of iterations.
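As an illustration, the following is a minimal R sketch of a single ALS update of |$\mathbf{U}^1$| for a 3-way tensor, using the closed form above and the unfold/khatri_rao helpers sketched in Section 2.1 (a toy version, not the packaged implementation):

```r
# One ALS update for U^1 in a 3-way CP fit: solve the least squares
# problem with U^2 and U^3 fixed, then normalize the columns.
als_update_U1 <- function(X, U2, U3) {
  A <- khatri_rao(U3, U2)                # (I2 * I3) x R design matrix
  V <- (t(U3) %*% U3) * (t(U2) %*% U2)   # Hadamard product, R x R
  U1_hat <- unfold(X, 1) %*% A %*% MASS::ginv(V)
  lambda <- sqrt(colSums(U1_hat^2))      # column norms (scale factors)
  list(U1 = sweep(U1_hat, 2, lambda, "/"), lambda = lambda)
}
```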

A common frequentist approach to impute missing data for tensors is an expectation-maximization (EM) algorithm, in which missing entries are iteratively updated via ALS or a similar procedure (Acar et al. 2011). Missing data are first initialized, e.g. as |$\boldsymbol{\mathscr{X}}_{i_1,\ldots,i_N} =0$| for |$(i_1,\ldots,i_N)\in\mathcal{M}$| if the tensor is centered. Then, the ALS solution minimizing the least squares objective above is determined for the full tensor |$\boldsymbol{\mathscr{X}}$|, the missing entries are updated as their corresponding elements in |$\boldsymbol{\hat{\mathscr{X}}}$|, and the process is repeated until convergence.
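A minimal R sketch of this EM-style loop for a 3-way tensor is below, again reusing the helpers from Section 2.1; it is a toy version under the assumption of a centered tensor, not the packaged code:

```r
# EM-style imputation: alternate ALS updates of the factors with
# refilling the missing cells by their current low-rank fit.
em_impute <- function(X, R, n_iter = 200, tol = 1e-8) {
  d <- dim(X); miss <- is.na(X)
  X[miss] <- 0                                   # initialize (centered data)
  U2 <- matrix(rnorm(d[2] * R), d[2], R)
  U3 <- matrix(rnorm(d[3] * R), d[3], R)
  for (iter in seq_len(n_iter)) {
    U1 <- unfold(X, 1) %*% khatri_rao(U3, U2) %*%
      MASS::ginv((t(U3) %*% U3) * (t(U2) %*% U2))
    U2 <- unfold(X, 2) %*% khatri_rao(U3, U1) %*%
      MASS::ginv((t(U3) %*% U3) * (t(U1) %*% U1))
    U3 <- unfold(X, 3) %*% khatri_rao(U2, U1) %*%
      MASS::ginv((t(U2) %*% U2) * (t(U1) %*% U1))
    X_hat <- array(U1 %*% t(khatri_rao(U3, U2)), dim = d)
    delta <- mean((X[miss] - X_hat[miss])^2)     # change in imputed values
    X[miss] <- X_hat[miss]                       # refill missing entries
    if (delta < tol) break
  }
  X
}
```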

3 Bayesian tensor imputation algorithm

In this section, we describe BAMITA, a Bayesian approach for the CP decomposition under different assumptions on the variance structure. We first derive the Bayesian MCMC sampling procedure under the assumption that the error (noise) term is independent and normally distributed. We then introduce another model where the error term is assumed to be normally distributed with a separable covariance structure.

3.1 Independent error

3.1.1 The model

First, following the notation in Section 2, assume the following model based on the CP decomposition with rank R:

$$\boldsymbol{\mathscr{X}}=[[\mathbf{U}^{1},\mathbf{U}^{2},\ldots,\mathbf{U}^{N}]]+\boldsymbol{\mathscr{E}}, \qquad(5)$$

where |$\boldsymbol{\mathscr{X}}\in\mathbb{R}^{I_1\times I_2\times\cdots\times I_N}$|; |$\mathbf{U}^{1},\ldots,\mathbf{U}^{N}$| are defined in (2) with R components (i.e. rank R); and |$\boldsymbol{\mathscr{E}}$| is a tensor of error terms with the same dimensions. In the rest of this section, we treat R as fixed. In practice, the number of components R can be selected through cross-validation. Here, we assume the error terms are independent and identically distributed from a normal distribution with mean 0 and variance |$\sigma^2$|. As our goal is to have a non-informative prior that can be used in a variety of data situations, we consider an (improper) conjugate Jeffreys prior for |${\mathbf{U}^{1}},\mathbf{U}^{2},\ldots,\mathbf{U}^{N}$| and |$\sigma^2$|, with
$$p(\mathbf{U}^{n})\propto 1$$
for |$n=1,\ldots,N$| and
$$p(\sigma^2)\propto\frac{1}{\sigma^2}.$$

Although the prior we impose is improper, the corresponding posterior will be proper as we will demonstrate that all the conditional distributions for the posterior are proper. Moreover, the model is invariant to the relative scale of the |${\mathbf{U}^{i}}$|⁠; one can scale the individual factors as in (3), but this will not affect posterior inference for the low-rank structure and missing data, which is our primary focus.

3.1.2 Gibbs sampler with full data

Here we give the full conditionals for the Gibbs sampling procedure for fully observed data. Note that, given |$\mathbf{U}^{2},\ldots,\mathbf{U}^{N}$| and |$\sigma^2$|, the model can be expressed as

$$\mathbf{X}_{(1)}=\mathbf{U}^{1}{\mathbf{A}_{(1)}}^{T}+\mathbf{E}_{(1)}, \qquad(6)$$

where |$\mathbf{X}_{(1)}\in\mathbb{R}^{I_1\times (I_2\cdots I_N)}$| is the mode-1 matricization of the tensor |$\boldsymbol{\mathscr{X}}$|, |$\mathbf{A}_{(1)}=(\mathbf{U}^{N}\odot\mathbf{U}^{(N-1)}\odot\cdots\odot\mathbf{U}^{2})$| is the Khatri-Rao product omitting |$\mathbf{U}^1$| (as indicated by the subscript 1), and |$\mathbf{E}_{(1)}$| is the mode-1 matricization of the error tensor |$\boldsymbol{\mathscr{E}}$|, with the same dimensions as |$\mathbf{X}_{(1)}$|. We partition |$\mathbf{X}_{(1)}$|, |$\mathbf{U}^{1}$|, and |$\mathbf{E}_{(1)}$| along their rows, letting the column vectors |$\mathbf{x}^1_{i\cdot}$|, |$\mathbf{u}^1_{i\cdot}$|, and |$\boldsymbol{\epsilon}^1_{i\cdot}$| denote the i-th rows of |$\mathbf{X}_{(1)}$|, |$\mathbf{U}^{1}$|, and |$\mathbf{E}_{(1)}$|, respectively. Since each element of the error term is assumed to be independent and normally distributed, we have the following Bayesian linear regression model:

$$\mathbf{x}^{1}_{i\cdot}=\mathbf{A}_{(1)}\mathbf{u}^{1}_{i\cdot}+\boldsymbol{\epsilon}^{1}_{i\cdot} \qquad(7)$$

for |$i = 1,\ldots, I_1$|, where the entries of |$\boldsymbol{\epsilon}^1_{i\cdot}$| are i.i.d. normal with mean 0 and variance |$\sigma^2$|. Then, with the Jeffreys priors |$p(\mathbf{u}^1_{i\cdot})\propto1$| and |$p(\sigma^2)\propto\frac{1}{\sigma^2}$|, the posterior distribution for |$\mathbf{u}^1_{i\cdot}$| is normal with mean |$({\mathbf{A}_{(1)}}^T\mathbf{A}_{(1)})^{-1}{\mathbf{A}_{(1)}}^T\mathbf{x}^1_{i\cdot}$| and covariance matrix |$\sigma^2({\mathbf{A}_{(1)}}^T\mathbf{A}_{(1)})^{-1}$|. Therefore, we have the following Gibbs sampling algorithm for fully observed data:
  • Given |$\mathbf{U}^{2},\ldots,\mathbf{U}^{N}$| and |$\sigma^2$|, draw
$$\mathbf{u}^{1}_{i\cdot}\sim N\left(({\mathbf{A}_{(1)}}^{T}\mathbf{A}_{(1)})^{-1}{\mathbf{A}_{(1)}}^{T}\mathbf{x}^{1}_{i\cdot},\ \sigma^{2}({\mathbf{A}_{(1)}}^{T}\mathbf{A}_{(1)})^{-1}\right)$$
    for |$i = 1,\ldots, I_1$|, where |$\mathbf{A}_{(1)}=(\mathbf{U}^{N}\odot\mathbf{U}^{(N-1)}\odot\cdots\odot\mathbf{U}^{2})$|.

  • Draw |$\mathbf{U}^{2},\ldots,\mathbf{U}^{N}$| similarly:
$$\mathbf{u}^{n}_{i\cdot}\sim N\left(({\mathbf{A}_{(n)}}^{T}\mathbf{A}_{(n)})^{-1}{\mathbf{A}_{(n)}}^{T}\mathbf{x}^{n}_{i\cdot},\ \sigma^{2}({\mathbf{A}_{(n)}}^{T}\mathbf{A}_{(n)})^{-1}\right)$$
    for |$i = 1,\ldots, I_n$|, where |$\mathbf{A}_{(n)}=(\mathbf{U}^{N}\odot\cdots\odot\mathbf{U}^{(n+1)}\odot\mathbf{U}^{(n-1)}\odot\cdots\odot\mathbf{U}^{1})$|.

  • Given |$\mathbf{U}^{1},\ldots,\mathbf{U}^{N}$|, draw |$\sigma^2$| from its inverse-gamma full conditional IG|$(\alpha=(I_1\cdots I_N)/2,\ \beta=\|\boldsymbol{\mathscr{X}}-\hat{\boldsymbol{\mathscr{X}}}\|_{F}^2/2)$|, where |$\hat{\boldsymbol{\mathscr{X}}}$| is calculated using the simulated |$\mathbf{U}^{1},\ldots,\mathbf{U}^{N}$| from the previous steps and |$\|\cdot\|_F$| is the Frobenius norm.
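For concreteness, a minimal R sketch of one pass of this Gibbs sampler for a 3-way tensor follows (toy code using the helpers from Section 2.1, not the packaged sampler); all rows of a factor matrix are drawn at once, since they share the same conditional covariance:

```r
# One Gibbs pass for the independent-error model with flat priors.
gibbs_step <- function(X, U1, U2, U3, sigma2) {
  draw_factor <- function(Xn, A) {
    AtA_inv <- solve(t(A) %*% A)
    M <- Xn %*% A %*% AtA_inv            # row-wise posterior means
    # row i ~ N(M[i, ], sigma2 * AtA_inv); chol gives R with R'R = cov
    M + matrix(rnorm(nrow(Xn) * ncol(A)), nrow(Xn)) %*%
      chol(sigma2 * AtA_inv)
  }
  U1 <- draw_factor(unfold(X, 1), khatri_rao(U3, U2))
  U2 <- draw_factor(unfold(X, 2), khatri_rao(U3, U1))
  U3 <- draw_factor(unfold(X, 3), khatri_rao(U2, U1))
  resid <- X - array(U1 %*% t(khatri_rao(U3, U2)), dim = dim(X))
  # inverse-gamma full conditional for the error variance
  sigma2 <- 1 / rgamma(1, shape = length(X) / 2, rate = sum(resid^2) / 2)
  list(U1 = U1, U2 = U2, U3 = U3, sigma2 = sigma2)
}
```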

3.1.3 Missing data imputation

We propose to impute the missing entries |$\boldsymbol{\mathscr{X}}_{i_1,\ldots,i_N},\ (i_1,\ldots,i_N)\in\mathcal{M}$|, with simulated values from their posterior predictive distribution |$p(\text{Vec}(\boldsymbol{\mathscr{X}}_{\text{missing}}) \mid\text{Vec}(\boldsymbol{\mathscr{X}}_{\text{observed}}))$| at each MCMC iteration. Note that, if we assume the error terms are independently distributed, this conditional distribution is determined by the posterior distribution of the error variance and the low-rank structure at the missing entries, so the missing entries are imputed directly from their posterior draws. Given a partially observed tensor |$\boldsymbol{\mathscr{X}}$| and rank R, our Bayesian multiple imputation algorithm for normal i.i.d. error is given in Algorithm 1.

Algorithm 1

Bayesian multiple imputation with normal i.i.d. error

1: Impute the missing elements as |$\boldsymbol{\mathscr{X}}_{i_1,\ldots,i_N}=0, (i_1,\ldots,i_N)\in\mathcal{M}$|⁠.

2: Set the initial values of |$(\hat{\mathbf{U}}^{1})^0,\ldots, (\hat{\mathbf{U}}^{N})^0$| either as randomly generated values or as the result of the frequentist EM algorithm. Draw the initial value |$(\hat{\sigma}^2)^{0}\sim$| IG|$(\alpha=(I_1\cdots I_N)/2,\ \beta=\|\boldsymbol{\mathscr{X}}-\hat{\boldsymbol{\mathscr{X}}}\|_{F}^2/2)$|, where |$\hat{\boldsymbol{\mathscr{X}}}$| is calculated using the initial values of |$(\hat{\mathbf{U}}^{1})^0,\ldots,(\hat{\mathbf{U}}^{N})^0$|.

3: for |$r=1,\ldots,B$| MCMC iterations do

4:   Sample |$(\hat{\mathbf{U}}^{1})^r,\ldots, (\hat{\mathbf{U}}^{N})^r$| and |$(\hat{\sigma}^2)^{r}$| from the full conditional distributions given above.

5:   Calculate the underlying structure of |$\boldsymbol{\mathscr{X}}$| as |$\tilde{\boldsymbol{\mathscr{X}}}^r = [[{(\mathbf{U}^{1})^r},\ldots, (\mathbf{U}^{N})^r]]$|

6:   Calculate |$\hat{\boldsymbol{\mathscr{X}}}^r$|, where the missing elements |$\boldsymbol{\mathscr{X}}_{i_1,\ldots,i_N},\ (i_1,\ldots,i_N)\in\mathcal{M}$|, are imputed with draws from |$N(\tilde{\boldsymbol{\mathscr{X}}}_{i_1,\ldots,i_N}^r, (\hat{\sigma}^2)^{r})$|.

7: end for

8: Impute the missing elements |$\boldsymbol{\mathscr{X}}_{i_1,\ldots,i_N},\ (i_1,\ldots,i_N)\in\mathcal{M}$|, as the mean of |$\{\tilde{\boldsymbol{\mathscr{X}}}_{i_1,\ldots,i_N}^b,\ldots,\tilde{\boldsymbol{\mathscr{X}}}_{i_1,\ldots,i_N}^B\}$|, where b is the number of burn-in iterations of the Gibbs sampler.

In practice, the rank R can be determined by cross-validation. In Algorithm 2, we describe a general cross-validation procedure for choosing R using the mean squared error (MSE) of the imputed held-out elements.

Algorithm 2

Selecting the number of components with cross-validation

1: Randomly divide the observed elements into K equal folds with indices |$\mathcal{F}_1,\ldots,\mathcal{F}_K$|.

2: for candidate ranks |$r=1,\ldots,R$| do

3:   for folds |$k=1,\ldots,K$| do

4:    Hold out the elements in the k-th fold, |$(i_1,\ldots,i_N)\in\mathcal{F}_k$|, as missing.

5:    Run the corresponding Bayesian imputation algorithm (Algorithm 1 for independent error) with rank r, and get the imputed values |$\hat{\boldsymbol{\mathscr{X}}}_{i_1,\ldots,i_N},\ (i_1,\ldots,i_N)\in\mathcal{F}_k$|, for the held-out data.

6:    Calculate the mean squared error (MSE) for the imputation over the held-out data: |$\delta_k=\frac{1}{|\mathcal{F}_k|}\sum_{(i_1,\ldots,i_N)\in\mathcal{F}_k} (\hat{\boldsymbol{\mathscr{X}}}_{i_1,\ldots,i_N}-\boldsymbol{\mathscr{X}}_{i_1,\ldots,i_N})^2$|.

7:   end for

8:   Calculate the average MSE for rank r as |$\bar{\delta}^r=\frac{1}{K}\sum_{k=1}^K\delta_k$|.

9: end for

10: Select the number of components with the smallest average MSE.
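A minimal R sketch of this cross-validation loop is below, wrapping an imputation routine such as the EM sketch from Section 2.3 (the Bayesian samplers slot in the same way); fold construction here is entry-wise, and would instead be applied fiber-wise under fiber-wise missingness:

```r
# K-fold cross-validation over candidate ranks: hold out observed
# entries, impute them, and score by held-out MSE.
select_rank <- function(X, ranks = 1:5, K = 5) {
  obs   <- which(!is.na(X))
  folds <- split(sample(obs), rep_len(seq_len(K), length(obs)))
  avg_mse <- sapply(ranks, function(R) {
    mean(sapply(folds, function(idx) {
      X_cv <- X; X_cv[idx] <- NA          # hold out one fold
      X_imp <- em_impute(X_cv, R)
      mean((X_imp[idx] - X[idx])^2)
    }))
  })
  ranks[which.min(avg_mse)]
}
```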

3.2 Correlated error

3.2.1 The model

For many tensor applications, data have a residual correlation along some of the modes that is not efficiently captured by the low-rank decomposition. For example, in our longitudinal infant gut microbiome data, observations are assumed to be correlated across time intervals for the same participants, and abundances are correlated across taxa. Therefore, instead of assuming the errors are independent, here we assume the errors are normally distributed but correlated. For simplicity, we assume the error tensor in (5) has the separable covariance structure described by Hoff (2011):

$$\boldsymbol{\mathscr{E}}\sim N_{I_1\times I_2\times\cdots\times I_N}(\mathbf{0},\Sigma_1,\ldots,\Sigma_N), \qquad(8)$$

where the tensor normal distribution |$N_{I_1\times I_2\times\cdots\times I_N}(\mathbf{0},\Sigma_1,\ldots,\Sigma_N)$| is defined according to its n-th mode unfolding |$\boldsymbol{\mathscr{E}}_{(n)}$|, which follows a matrix normal distribution:
$$\boldsymbol{\mathscr{E}}_{(n)}\sim MN_{I_n\times I_{-n}}(\mathbf{0},\ \Sigma_n,\ \Sigma_{-n}),$$
where |$I_{-n}=\prod_{i=1,\ldots,N;\,i\neq n} I_i$| and |$\Sigma_{-n}=\Sigma_1\otimes\Sigma_2\otimes\cdots\otimes\Sigma_{n-1}\otimes\Sigma_{n+1}\otimes\cdots\otimes\Sigma_N$|. The vectorization of the tensor then follows a multivariate normal distribution:
$$\text{Vec}(\boldsymbol{\mathscr{E}})\sim N(\mathbf{0},\ \Sigma_N\otimes\Sigma_{N-1}\otimes\cdots\otimes\Sigma_1).$$

Compared with estimating the entire unrestricted covariance matrix |$\text{Cov}(\text{Vec}(\boldsymbol{\mathscr{E}}))$|, which can be unrealistic due to its potentially high dimension, using a separable covariance structure provides a more stable and parsimonious way of modeling the correlation along different modes. Moreover, the mode-specific covariance matrices Σi are directly interpretable, e.g. a covariance across time points and a covariance across taxa.
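To illustrate the parsimony of this structure, the following minimal R sketch draws one error tensor with separable covariance for the 3-way case, applying a Cholesky factor along each mode so that the full |$(I_1 I_2 I_3)$|-dimensional covariance matrix is never formed (toy code reusing unfold from Section 2.1):

```r
# Draw E with Vec(E) ~ N(0, Sigma3 %x% Sigma2 %x% Sigma1) by
# multiplying a standard normal tensor by each mode's Cholesky factor.
rtensor_norm <- function(Sigma1, Sigma2, Sigma3) {
  d <- c(nrow(Sigma1), nrow(Sigma2), nrow(Sigma3))
  Z  <- array(rnorm(prod(d)), dim = d)
  L1 <- t(chol(Sigma1)); L2 <- t(chol(Sigma2)); L3 <- t(chol(Sigma3))
  E <- array(L1 %*% unfold(Z, 1), dim = d)
  E <- aperm(array(L2 %*% unfold(E, 2), dim = d[c(2, 1, 3)]), c(2, 1, 3))
  E <- aperm(array(L3 %*% unfold(E, 3), dim = d[c(3, 1, 2)]), c(2, 3, 1))
  E
}
```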

Similar to the independent-error case, we use non-informative priors to make our algorithm broadly accommodating, with a uniform prior on each of |$\mathbf{U}^{1},\ldots,\mathbf{U}^{N}$|,
$$p(\mathbf{U}^{n})\propto 1 \quad\text{for } n=1,\ldots,N,$$
and inverse-Wishart priors, with identity scale matrices, on |$\Sigma_1,\ldots,\Sigma_N$|.

In practice, the set of inverse-Wishart priors for each dimension depends on the specific scientific context of the application. In our implementation, the first dimension is considered the sample dimension and samples are independent. Thus, we set |$\Sigma_1$| to the identity and use an inverse-Wishart prior on the other dimensions.

3.2.2 Bayesian Gibbs sampler for full data

Similar to the independent case, our model facilitates a conjugate MCMC sampling procedure for fully observed data:

  • Given |$\mathbf{U}^{2},\ldots,\mathbf{U}^{N}$| and |$\Sigma_2,\ldots,\Sigma_N$|, sample |$\Sigma_1$| from its inverse-Wishart full conditional with scale matrix

$$S_1=\mathbf{I}_{I_1}+\left(\tilde{\mathbf{X}}_{(1)}-\tilde{\mathbf{A}}(\tilde{\mathbf{A}}^T\tilde{\mathbf{A}})^{-1}\tilde{\mathbf{A}}^T\tilde{\mathbf{X}}_{(1)}\right)^T\left(\tilde{\mathbf{X}}_{(1)}-\tilde{\mathbf{A}}(\tilde{\mathbf{A}}^T\tilde{\mathbf{A}})^{-1}\tilde{\mathbf{A}}^T\tilde{\mathbf{X}}_{(1)}\right),$$

    where |$\tilde{\mathbf{A}}=\Sigma_{-1}^{-1/2}\mathbf{A}_{(1)}$| and |$\tilde{\mathbf{X}}_{(1)}=\Sigma_{-1}^{-1/2}{\mathbf{X}_{(1)}}^T$| are the error-whitened versions of the design matrix and the (transposed) mode-1 matricization.

  • Given |$\Sigma_1$|, |$\mathbf{U}^{2},\ldots,\mathbf{U}^{N}$|, and |$\Sigma_2,\ldots,\Sigma_N$|, sample |$\mathbf{U}^{1}$| from its matrix normal full conditional
$$\mathbf{U}^{1}\sim MN_{I_1\times R}\left({\tilde{\mathbf{X}}_{(1)}}^{T}\tilde{\mathbf{A}}(\tilde{\mathbf{A}}^{T}\tilde{\mathbf{A}})^{-1},\ \Sigma_1,\ (\tilde{\mathbf{A}}^{T}\tilde{\mathbf{A}})^{-1}\right).$$

  • Sample |$\mathbf{U}^{2},\Sigma_2;\ \ldots;\ \mathbf{U}^{N},\Sigma_N$| in a similar way.

3.2.3 Missing data imputation

The multiple imputation algorithm for |$\boldsymbol{\mathscr{X}}_{\text{missing}}$| given |$\boldsymbol{\mathscr{X}}_{\text{observed}}$| proceeds somewhat differently under a separable covariance structure than under independence in Algorithm 1, because the observed values provide additional information about the missing elements beyond the low-rank signal. Let |$\boldsymbol{\mu_m}$| and |$\boldsymbol{\mu_o}$| be the vectorized means of the missing and observed entries, respectively, given by the low-rank structure in (5). Under the multivariate normal error term with separable covariance structure, the conditional posterior predictive distribution of the missing entries is
$$\text{Vec}(\boldsymbol{\mathscr{X}}_{\text{missing}})\mid\boldsymbol{\mathscr{X}}_{\text{observed}}\sim N(\tilde{\boldsymbol{\mu}},\tilde{\Sigma}), \qquad(9)$$
where
$$\tilde{\boldsymbol{\mu}}=\boldsymbol{\mu_m}+\Sigma_{12}\Sigma_{22}^{-1}\left(\text{Vec}(\boldsymbol{\mathscr{X}}_{\text{observed}})-\boldsymbol{\mu_o}\right)$$
and
$$\tilde{\Sigma}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}.$$

Here, |$\Sigma_{11},\Sigma_{12},\Sigma_{21},\Sigma_{22}$| partition the full covariance matrix of the vectorized tensor according to the missing indices (i.e. |$\Sigma_{11}$| is the covariance of the missing entries, |$\Sigma_{21}$| the cross-covariance of the observed and missing entries, etc.).
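A minimal R sketch of this conditional draw for one MCMC iteration is below (toy code; a practical implementation would exploit the Kronecker structure of the covariance rather than forming and inverting dense blocks). The inputs are the vectorized low-rank mean, the full separable covariance of the vectorized tensor, the observed values, and the missing indices:

```r
# Draw Vec(X_missing) | X_observed from the conditional normal in (9).
draw_missing <- function(mu, Sigma, x_obs, miss_idx) {
  obs_idx <- setdiff(seq_along(mu), miss_idx)
  S12 <- Sigma[miss_idx, obs_idx, drop = FALSE]   # Cov(missing, observed)
  S22_inv <- solve(Sigma[obs_idx, obs_idx])
  cond_mean <- as.vector(mu[miss_idx] + S12 %*% S22_inv %*% (x_obs - mu[obs_idx]))
  cond_cov  <- Sigma[miss_idx, miss_idx, drop = FALSE] - S12 %*% S22_inv %*% t(S12)
  as.vector(MASS::mvrnorm(1, cond_mean, cond_cov))
}
```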

Algorithm 3

Bayesian multiple imputation with correlated error

1: Impute the missing elements in |$\boldsymbol{\mathscr{X}}$| as 0.

2: Set the initial values of |$(\hat{\mathbf{U}}^{1})^0,\ldots,(\hat{\mathbf{U}}^{N})^0$| either as randomly generated values or as the result of the frequentist EM algorithm. Set the initial value of |$(\hat{\Sigma}_n)^0$| to the identity matrix of dimension |$I_n$| for |$n=1,\ldots,N$|.

3: for |$r=1,\ldots,B$| MCMC iterations do

4:   Sample |$(\hat{\mathbf{U}}^{1})^r,(\hat{\Sigma}_1)^r;\ \ldots;\ (\hat{\mathbf{U}}^{N})^r,(\hat{\Sigma}_N)^r$| from the full conditional distributions given above.

5:   Calculate the underlying structure of |$\boldsymbol{\mathscr{X}}$| as |$\tilde{\boldsymbol{\mathscr{X}}}^r = [[{(\mathbf{U}^{1})^r},\ldots, (\mathbf{U}^{N})^r]]$|

6:   Partition the mean Vec|$(\tilde{\boldsymbol{\mathscr{X}}}^r)$| into Vec|$(\tilde{\boldsymbol{\mathscr{X}}}^r)_o$| and Vec|$(\tilde{\boldsymbol{\mathscr{X}}}^r)_m$|. Partition |$\tilde{\Sigma}^r=\hat{\Sigma}_N^r\otimes\hat{\Sigma}_{N-1}^r\otimes\cdots\otimes\hat{\Sigma}_1^r$| into |$\Sigma_{11}^r,\ \Sigma_{12}^r,\ \Sigma_{21}^r$|, and |$\Sigma_{22}^r$| according to the missingness of the corresponding elements.

7:   Simulate |$\hat{\boldsymbol{\mathscr{X}}}^r_{\text{missing}}$| according to the posterior predictive distribution in (9), and get |$\hat{\boldsymbol{\mathscr{X}}}^r$|, where the missing entries are imputed with |$\hat{\boldsymbol{\mathscr{X}}}^r_{\text{missing}}$|.

8: end for

9: Impute the missing entries |$\boldsymbol{\mathscr{X}}_{i_1,\ldots,i_N},\ (i_1,\ldots,i_N)\in\mathcal{M}$|, as the mean of |$\{\tilde{\boldsymbol{\mathscr{X}}}_{i_1,\ldots,i_N}^b,\ldots,\tilde{\boldsymbol{\mathscr{X}}}_{i_1,\ldots,i_N}^B\}$|, where b is the number of burn-in iterations of the Gibbs sampler.

Given an observed tensor |$\boldsymbol{\mathscr{X}}_{\text{observed}}$| and number of modes N, Algorithm 3 describes our Bayesian multiple imputation procedure for an error term with separable covariance structure, simulating from the posterior predictive distribution |$p(\boldsymbol{\mathscr{X}}_{\text{missing}}\mid\boldsymbol{\mathscr{X}}_{\text{observed}})$| for the missing entries.

Note that, in our implementation, to be consistent with our data application, the first dimension is considered the sample dimension and samples are independent. Thus, we set |$\Sigma_1$| to the identity and use an inverse-Wishart prior on |$\Sigma_2,\ldots,\Sigma_N$|. In practice, whether or not the full covariance is modeled for a given dimension can depend on the context of the application.

4 Simulation

To evaluate the performance of our proposed Bayesian tensor imputation methods, we conducted a series of simulation experiments. Since we propose two Bayesian imputation methods for multiway data, we aim to use the simulation experiments to evaluate:

  1. The performance of the Bayesian independent imputation algorithm in terms of rank selection under the cross-validation.

  2. The performance of the Bayesian multiple imputation algorithms with independent or correlated error in terms of the imputed data entries.

  3. The performance of the Bayesian multiple imputation algorithms with independent or correlated error in terms of fiber-wise imputation and inferring uncertainty for functions of the fibers.

For each simulation condition, we run 100 experiments with two independent MCMC chains. All the results are calculated as the median over the converged experiments of the Bayesian algorithm, where convergence is evaluated using a composite version of the scale reduction factor, defined as

$$\hat{R}=\sqrt{\frac{\widehat{\text{Var}}\left(\{X_{1}, X_{2},\ldots, X_{n_1+n_2}\}\right)}{\left[\widehat{\text{Var}}\left(\{X_{11},\ldots, X_{1n_1}\}\right)+\widehat{\text{Var}}\left(\{X_{21},\ldots, X_{2n_2}\}\right)\right]/2}},$$

where |$\{X_{11}, X_{12},\ldots, X_{1n_1}\}$| and |$\{X_{21}, X_{22},\ldots, X_{2n_2}\}$| are the posterior samples of the two separate chains and |$\{X_{1}, X_{2},\ldots, X_{n_1+n_2}\}$| is the combination of those chains.
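A minimal R sketch of this convergence check, assuming the pooled-versus-within form written above and applied to one scalar quantity at a time, is:

```r
# Composite scale reduction factor for two chains: pooled sample
# variance over the average within-chain variance, square-rooted.
# Values near 1 indicate the two chains agree (approximate convergence).
composite_rhat <- function(chain1, chain2) {
  pooled <- c(chain1, chain2)
  sqrt(var(pooled) / mean(c(var(chain1), var(chain2))))
}
```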

4.1 Simulation study 1: rank selection and independent error

For our first study, simulated data |$\boldsymbol{\mathscr{X}}$| of dimension |$I_1\times I_2\times I_3$| are generated following (5), where |$[[{\mathbf{U}^{(1)}},\mathbf{U}^{(2)},\mathbf{U}^{(3)}]]$| has a rank-3 underlying structure and |$\boldsymbol{\mathscr{E}}$| has independent normal errors with mean 0 and variance 1. Each |${\mathbf{U}^{(i)}}$|, i = 1, 2, 3, is a matrix of dimension |$I_i\times 3$| whose elements are drawn from a standard normal distribution. To demonstrate the performance of our algorithm across different scales of dimension, we considered three scenarios: |$(10\times 10\times 10)$|, |$(20\times 20\times 20)$|, and a higher-dimensional imbalanced scenario |$(10\times 100\times 1000)$|. Different missing patterns and proportions of the tensor elements are also considered. For an entry-wise missing scenario, each element of the tensor is randomly set to be missing with probability 0.2, 0.5, or 0.7 (yielding missing proportions of 20%, 50%, or 70%, respectively). For a fiber-wise missing scenario, entire fibers of the third mode are randomly set to be missing with the corresponding probability.

We run our algorithms with the assigned number of components (rank) equal to |$1,2,3,4,$| or 5 (the true rank is 3) and select the rank according to cross-validation. The validation set is composed of elements randomly drawn from the observed elements (i.e. not the elements which are set to be missing) of the simulated tensor data. The rank is selected based on the mean squared error (MSE) over the validation set (25% of all the observed elements). For the fiber-wise missing condition, the validation set is also held out as entire fibers, for consistency. We evaluate the performance of our Bayesian independent imputation algorithm based on the MSE and the coverage rate of the 95% credible interval with the true rank and with the selected rank. The upper and lower bounds of the 95% credible interval for each element are calculated using the 0.025 and 0.975 quantiles of the MCMC samples. The MSE and coverage are calculated as median values over all the missing elements.

We compare the performance of our Bayesian multiple imputation with independent error against the frequentist EM algorithm with the true rank described in Section 2.3. Results are presented in Table 1. The MSE for the true rank and the selected rank are very close across conditions, which indicates the reasonable performance of rank selection via cross-validation. In fact, the cross-validation selects the true rank in most of the experiments except for the scenarios of |$10\times 10\times 10$| dimension and 70% missingness.

Table 1

Simulation results for study 1: rank selection and independent error.

| Missing pattern | Tensor dimension | Missing proportion | BAMITA indep., true rank: MSE | Coverage (%) | BAMITA indep., selected rank: MSE | Coverage (%) | EM algorithm, true rank: MSE | Converged, true rank (%) | Converged, selected rank (%) | Converged, overall (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Entry missing | 10 × 10 × 10 | 20% | 0.399 | 94.8 | **0.327** | 94.7 | 0.363 | 99 | 100 | 99 |
| | | 50% | 0.566 | 95.2 | **0.508** | 92.6 | 0.718 | 96 | 100 | 96 |
| | | 70% | 1.323 | 93.0 | **0.869** | 86.0 | 1.136 | 90 | 91 | 85 |
| | 20 × 20 × 20 | 20% | 0.269 | 94.4 | 0.267 | 94.5 | **0.266** | 99 | 100 | 99 |
| | | 50% | 0.306 | 94.5 | **0.302** | 94.7 | 0.326 | 94 | 99 | 94 |
| | | 70% | 0.357 | 94.0 | **0.335** | 95.0 | 0.522 | 76 | 100 | 76 |
| | 10 × 100 × 1000 | 20% | **0.257** | 94.4 | **0.257** | 94.4 | 0.259 | 97 | 97 | 97 |
| | | 50% | **0.271** | 94.4 | **0.271** | 94.4 | 0.272 | 98 | 98 | 98 |
| | | 70% | **0.256** | 94.3 | **0.256** | 94.3 | 0.287 | 92 | 92 | 91 |
| Fiber missing | 10 × 10 × 10 | 20% | **0.375** | 95.3 | **0.375** | 95.3 | 0.438 | 100 | 100 | 100 |
| | | 50% | 0.523 | 93.1 | **0.478** | 92.8 | 0.876 | 88 | 89 | 85 |
| | | 70% | 0.771 | 87.3 | **0.641** | 83.8 | 1.128 | 34 | 43 | 23 |
| | 20 × 20 × 20 | 20% | 0.285 | 94.8 | 0.285 | 94.8 | **0.279** | 100 | 100 | 100 |
| | | 50% | 0.300 | 94.5 | **0.298** | 94.8 | 0.371 | 94 | 100 | 94 |
| | | 70% | **0.335** | 90.9 | **0.335** | 93.1 | 0.720 | 65 | 81 | 65 |
| | 10 × 100 × 1000 | 20% | **0.251** | 94.4 | **0.251** | 94.4 | 0.265 | 100 | 100 | 100 |
| | | 50% | **0.254** | 93.2 | **0.254** | 93.2 | 0.346 | 98 | 98 | 97 |
| | | 70% | 0.260 | 87.6 | **0.259** | 87.6 | 0.570 | 52 | 48 | 47 |

Results are presented as median values over all the missing elements. The best MSE in each setting is marked in boldface.

Coverage rates for credible intervals are all approximately 95%, so uncertainty is correctly inferred. Moreover, our Bayesian independent imputation method performs better than the EM algorithm in terms of MSE in most of the scenarios. When the dimension of the tensor is |$(10\times 100\times 1000)$|, the two algorithms have similar performance. This matches our expectation, since when the total sample size is large enough (for element-wise missingness), the solution from the frequentist EM algorithm is generally similar to the posterior mean of the Bayesian model with flat priors and independent error. MCMC convergence within the fixed number of iterations is generally achieved, but less so for scenarios with 70% missing fibers.

4.2 Simulation study 2: tensor imputation and correlated error

The second simulation study evaluates the performance of our Bayesian imputation algorithm with correlated error, in terms of the imputed entries or fibers of the tensor. Similar to the first simulation, we generate |$\boldsymbol{\mathscr{X}}$| according to (5) with |$[[{\mathbf{U}^{(1)}},\mathbf{U}^{(2)},\mathbf{U}^{(3)}]]$| as a rank-3 underlying structure. To better mimic the conditions of our application data, we simulate |$\boldsymbol{\mathscr{X}}$| with dimensions |$10\times 10\times 10$|, |$20\times 20\times 20$|, and |$65\times 168\times 6$|. The error terms |$\boldsymbol{\mathscr{E}}$| are generated with a separable covariance structure:
$$\boldsymbol{\mathscr{E}}\sim N_{I_1\times I_2\times I_3}(\mathbf{0},\Sigma_1,\Sigma_2,\Sigma_3),$$
where |$\Sigma_i,\ i=2,3$|, are the |$I_2\times I_2$| and |$I_3\times I_3$| covariance matrices with diagonal elements 0.9 and off-diagonal elements randomly set to either 0.3 or –0.3 with equal probability. Following our applications, |$\Sigma_1$| is set to be the |$I_1\times I_1$| diagonal matrix with diagonal elements 0.5 (i.e. there is no correlation structure along the first dimension). The missing patterns and proportions of the tensor elements are the same as in the first simulation study. In this simulation study the rank is set to be 3, and the performance of each algorithm is directly evaluated over the missing elements.

Missing values are imputed using the posterior mean, and we calculate the mean squared error (MSE) and the coverage rate of the 95% credible interval over the missing elements for the two Bayesian methods. For the frequentist EM method, we calculate the MSE for the missing elements as a comparison. We also compare with the missForest method (Stekhoven and Bühlmann 2012) applied to the matricized data along the first mode, as a general approach that does not assume low-rank tensor structure.

The results can be seen in Table 2. For the 70% fiber-wise missing condition with dimension |$(10\times 10\times 10)$|, we do not include the results, as only 4% of the total experiments converged. The Bayesian correlated imputation algorithm outperforms the Bayesian independent imputation algorithm and the frequentist EM algorithm in terms of MSE in most of the scenarios, except for the 70% fiber-wise missing case with dimension |$(65\times168\times6)$|. The coverage of the correlated algorithm is also comparable to that of the independent algorithm, and both have coverage close to the nominal rate of 95%.

Table 2

Simulation results for study 2: tensor imputation with correlated error.

| Missing pattern | Tensor dimension | Missing proportion | BAMITA correlated: MSE (coverage %) | BAMITA correlated, low-rank: MSE (coverage %) | BAMITA independent: MSE (coverage %) | missForest: MSE | EM: MSE | Converged, corr. (%) | Converged, indep. (%) | Converged, overall (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Entry missing | 10 × 10 × 10 | 20% | 0.009 (97.4) | 0.001 (98.2) | 0.067 (94.4) | 0.617 | 0.076 | 100 | 99 | 99 |
| | | 50% | 0.033 (95.8) | 0.007 (96.3) | 0.064 (94.8) | 0.914 | 0.110 | 100 | 92 | 92 |
| | | 70% | 0.102 (93.6) | 0.044 (93.7) | 0.113 (94.6) | | 0.415 | 99 | 85 | 84 |
| | 20 × 20 × 20 | 20% | 0.007 (97.2) | 0.001 (98.3) | 0.105 (94.6) | 0.371 | 0.105 | 100 | 100 | 100 |
| | | 50% | 0.033 (95.9) | 0.002 (97.3) | 0.107 (94.8) | 0.561 | 0.107 | 100 | 100 | 100 |
| | | 70% | 0.083 (94.4) | 0.009 (95.2) | 0.117 (94.7) | 0.886 | 0.151 | 100 | 82 | 82 |
| | 65 × 168 × 6 | 20% | 0.050 (95.8) | 0.001 (97.3) | 0.273 (94.3) | 0.364 | 0.287 | 95 | 98 | 93 |
| | | 50% | 0.138 (95.2) | 0.005 (96.5) | 0.275 (94.3) | 0.397 | 0.277 | 90 | 93 | 84 |
| | | 70% | 0.232 (94.6) | 0.013 (95.9) | 0.293 (94.3) | 0.494 | 0.300 | 77 | 94 | 72 |
| Fiber missing | 10 × 10 × 10 | 20% | 0.025 (94.5) | 0.002 (98.4) | 0.065 (94.7) | 0.655 | 0.098 | 98 | 97 | 95 |
| | | 50% | 0.085 (92.7) | 0.017 (95.0) | 0.083 (94.8) | 0.988 | 0.339 | 75 | 64 | 54 |
| | | 70% | | | | | | 9 | 16 | 4 |
| | 20 × 20 × 20 | 20% | 0.027 (93.2) | 0.001 (98.3) | 0.108 (94.8) | 0.387 | 0.108 | 100 | 100 | 100 |
| | | 50% | 0.073 (91.4) | 0.003 (96.9) | 0.112 (94.8) | 0.595 | 0.145 | 96 | 88 | 85 |
| | | 70% | 0.130 (90.2) | 0.020 (92.4) | 0.145 (94.4) | 0.967 | 0.507 | 80 | 36 | 32 |
| | 65 × 168 × 6 | 20% | 0.109 (92.3) | 0.002 (98.1) | 0.286 (94.7) | 0.421 | 0.283 | 82 | 99 | 81 |
| | | 50% | 0.316 (91.9) | 0.082 (96.0) | 0.309 (94.4) | 0.515 | 0.437 | 82 | 87 | 71 |
| | | 70% | 0.798 (90.0) | 0.538 (90.0) | 0.555 (93.1) | 0.904 | 0.761 | 59 | 49 | 33 |

Results are presented as median values over all the missing elements; coverage rates for 95% credible intervals are shown in parentheses. MSE results for 70% fiber-wise missingness at dimension 10 × 10 × 10 are omitted because only 4% of those experiments converged.

4.3 Simulation study 3: imputation for function of a fiber

In the third simulation study, we examine whether an arbitrary function of an entire fiber can be reasonably captured when the data are generated with a correlation structure. This is motivated by our data applications where, instead of focusing on the imputed data entries, we are interested in the alpha diversity calculated as a function of the entire mode-2 fiber for each subject and each time point (see Section 5 for more detail). We argue that although the MSE of each imputed element is most often adopted as the metric for evaluating imputation performance, the "structure" of the imputed data slice (i.e. fiber) is also of interest for downstream analyses of the tensor.

The data generating mechanism is similar to that of the second simulation study, except that the covariance matrices |$\Sigma_1$| and |$\Sigma_3$| are now identity matrices with dimensions |$I_1$| and |$I_3$|, respectively, and |$\Sigma_2$| is a matrix with diagonal elements 1 and off-diagonal elements 0.15. Under this data-generating mechanism, we do not expect the Bayesian correlated algorithm to outperform the independent algorithm in terms of point-wise imputation error, since the other modes (modes 1 and 3) are uncorrelated. However, by additionally modeling the covariance structure for mode 2, our correlated algorithm may better capture the variation of a function of the imputed mode-2 fiber. To evaluate this, for each imputed fiber |$\hat{\boldsymbol{\mathscr{X}}}[i,\cdot,k]$|, we calculate the linear predictor |$\beta_b\hat{\boldsymbol{\mathscr{X}}}[i,\cdot,k]$| with randomly generated coefficients βb, |$b=1,\ldots,100$|. Then we evaluate the mean MSE (for fiber-wise missingness) and the coverage for the linear predictors over the 100 randomly generated coefficients βb. The results are displayed as "MSE (Fiber)" in Table 3.

Table 3

Simulation results for study 3: imputation for a function of a fiber.

| Missing pattern | Tensor dimension | Missing proportion | BAMITA corr.: MSE imputation (cov. %) | BAMITA corr.: MSE fiber (cov. %) | BAMITA indep.: MSE imputation (cov. %) | BAMITA indep.: MSE fiber (cov. %) | EM, true rank: MSE imputation | Converged, corr. (%) | Converged, indep. (%) | Converged, overall (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Entry missing | 10 × 10 × 10 | 20% | 0.409 (88.0) | (78.0) | **0.314** (95.0) | (53.0) | 0.322 | 100 | 100 | 100 |
| | | 50% | 0.396 (91.3) | (90.0) | **0.327** (95.0) | (72.0) | 0.423 | 100 | 97 | 97 |
| | | 70% | 0.596 (90.7) | (89.0) | **0.466** (94.9) | (83.5) | 0.776 | 98 | 99 | 97 |
| | 20 × 20 × 20 | 20% | 0.266 (92.2) | (91.2) | **0.260** (94.8) | (36.2) | 0.265 | 100 | 100 | 100 |
| | | 50% | 0.284 (92.0) | (91.2) | **0.274** (94.9) | (53.2) | 0.282 | 100 | 100 | 100 |
| | | 70% | 0.278 (93.0) | (91.0) | **0.272** (94.9) | (62.3) | 0.326 | 100 | 86 | 86 |
| | 65 × 168 × 6 | 20% | 0.287 (91.9) | (91.8) | **0.260** (94.8) | (13.8) | 0.270 | 100 | 99 | 99 |
| | | 50% | **0.268** (92.9) | (92.6) | 0.271 (94.8) | (22.3) | 0.299 | 100 | 98 | 98 |
| | | 70% | **0.250** (93.6) | (92.1) | 0.278 (94.7) | (26.0) | 0.305 | 100 | 96 | 96 |
| Fiber missing | 10 × 10 × 10 | 20% | 0.404 (90.0) | 0.767 (90.2) | **0.311** (94.8) | **0.617** (83.0) | 0.332 | 99 | 100 | 99 |
| | | 50% | 0.875 (89.5) | 1.122 (87.1) | **0.514** (94.6) | **0.915** (82.8) | 0.612 | 78 | 92 | 73 |
| | | 70% | 2.609 (86.6) | 5.153 (83.8) | 4.041 (93.2) | **3.510** (84.2) | **0.964** | 9 | 39 | 7 |
| | 20 × 20 × 20 | 20% | 0.285 (93.0) | 0.668 (90.3) | **0.266** (94.9) | **0.616** (67.9) | 0.273 | 100 | 100 | 100 |
| | | 50% | 0.320 (92.3) | 0.766 (87.4) | **0.285** (94.8) | **0.701** (68.4) | 0.367 | 100 | 91 | 91 |
| | | 70% | 0.430 (91.8) | 0.837 (86.1) | **0.369** (94.6) | **0.769** (69.6) | 0.645 | 93 | 61 | 59 |
| | 65 × 168 × 6 | 20% | 0.261 (94.6) | **0.940** (80.0) | **0.259** (94.8) | 0.947 (29.4) | 0.288 | 100 | 99 | 99 |
| | | 50% | 0.465 (92.9) | 0.990 (76.3) | **0.337** (94.6) | **0.958** (31.0) | 0.431 | 89 | 85 | 77 |
| | | 70% | 0.869 (90.7) | 1.005 (71.8) | **0.540** (93.2) | **0.979** (32.0) | 0.686 | 32 | 43 | 15 |

Results are presented as median values over all the missing elements; coverage rates for 95% credible intervals are shown in parentheses. For entry-wise missingness, the fiber-level MSE is not reported and only the coverage for the fiber-level linear predictors is shown. The best performance in each setting is marked in boldface.

The results are shown in Table 3. Although the Bayesian independent algorithm has a relatively good coverage rate for each imputed element, its mean coverage for a random linear combination of each imputed fiber is not ideal, especially in the high-dimensional cases. The Bayesian correlated algorithm, on the other hand, has substantially better coverage performance for the imputed fibers.

5 Infant gut microbiome application

The gut houses a rich and dynamic ecosystem of microbial organisms, which holds potential as a significant indicator of both digestive and broader health conditions. We consider a longitudinal study of the gut microbiome of 52 infants in the neonatal intensive care unit (NICU), in which stool samples were collected over the first 3 months of life (Cong et al. 2017). Using 16S rRNA sequencing, we obtained microbiome data, which we aggregated to the genus level, yielding 152 distinct genera. Employing standard preprocessing techniques, we addressed zero values by introducing pseudo counts, transformed the data into compositional profiles, and applied the centered log-ratio (clr) transformation. Data were aggregated over every 5 consecutive days, yielding 30 time intervals. Consequently, we obtained a tensor data array with dimensions |$52\times 152\times 30$|. However, because samples are unavailable for every infant at every time point, the tensor array exhibits a fiber-wise missing structure, with approximately 71% of the samples absent. Our objective is to employ BAMITA to address missing values and assess the dynamic diversity of the microbiome over this population. Before applying our algorithm to this dataset, we first check the suitability of the normality assumption and the separable covariance assumption; results are presented in Section S2 and suggest that the two assumptions are reasonable for these data.

Microbiome diversity is often measured using alpha diversity, which refers to diversity on a local scale (Thukral 2017). Here, we compute diversity using the Shannon-Wiener index (Shannon 1948). For a given fiber of the clr-transformed data, the Shannon alpha diversity is calculated as |$-\sum_{j=1}^{152} p_j\log(p_j)$|, where |$p_j=\frac{\exp(\text{ClrX}_{i,j,k})}{\sum_{j'=1}^{152}\exp(\text{ClrX}_{i,j',k})}$| is the proportion of genus j in the sample.
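For reference, a minimal R sketch of this diversity calculation for a single clr-transformed fiber (one subject at one time interval) is:

```r
# Shannon alpha diversity of a clr-transformed taxa profile:
# invert the clr via a softmax to proportions, then apply -sum(p*log(p)).
shannon_from_clr <- function(clr_fiber) {
  p <- exp(clr_fiber) / sum(exp(clr_fiber))
  -sum(p * log(p))
}
```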

We impute the missing fibers of the microbiome tensor array with the two proposed Bayesian algorithms and the EM algorithm. For the Bayesian multiple imputation algorithm with separable covariance structure, we assume independence across the first dimension (the 52 infants) but account for correlation across time and genera. The performance of the different algorithms is evaluated through cross-validation. For each of 200 simulation experiments, we randomly hold out an additional 25% of the observed fibers and calculate the mean squared error between the imputed values and the true observed values. Shannon diversity is also computed for the missing fibers at each MCMC iteration and compared to the observed diversity measures for the validation set. For the Bayesian imputation with covariance structure (BAMITA Correlated), we also evaluate the MSE for the estimated low-rank structure, i.e. |$[[{(\mathbf{U}^{1})},\ldots, (\mathbf{U}^{N})]]$|, which represents imputation without adjustment for the covariance structure.

The results are summarized in Table 4. The low-rank Bayesian imputation approach with correlation structure performs substantially better than the other approaches with respect to MSE, for both the imputed values in the fiber and Shannon diversity. This suggests that the data have a strong low-rank structure with substantial correlation in the residual covariance. Moreover, while coverage rates are appropriate for the fiber-wise entries under both models, coverage for Shannon diversity is much higher for the correlated model. This illustrates the notion that accurately inferring uncertainty in the marginal distribution of the entries of an array does not imply that uncertainty will be accurately inferred for multivariate functions used in downstream analysis. The MSE for the low-rank structure alone in the correlated algorithm is similar to the MSE of the independent algorithm, indicating that conditioning on the residual covariance structure is what drives the improved imputation performance.

Table 4

Fiber-wise imputation results under cross-validation for the neonatal microbiome (clr-transformed) application.

| Number of components | BAMITA corr.: Imputation MSE (coverage) | BAMITA corr.: Low-rank MSE | BAMITA corr.: Shannon MSE (coverage) | BAMITA indep.: Imputation MSE (coverage) | BAMITA indep.: Shannon MSE (coverage) | EM algorithm: Imputation MSE |
|---|---|---|---|---|---|---|
| 1 | 0.340 (95.4) | 0.583 | 0.744 (84.3) | 0.560 (94.1) | 1.834 (55.3) | 0.569 |
| 2 | 0.345 (95.3) | 0.570 | 0.698 (84.4) | 0.533 (94.3) | 1.471 (55.0) | 0.557 |
| 3 | 0.353 (95.1) | 0.561 | 0.665 (84.2) | 0.518 (94.5) | 1.035 (63.0) | 0.554 |
| 4 | 0.365 (95.0) | 0.547 | 0.635 (81.9) | 0.505 (94.5) | 0.924 (63.3) | 0.543 |
| 5 | 0.384 (94.8) | 0.552 | 0.611 (82.0) | 0.505 (94.3) | 0.903 (61.7) | 0.555 |
| 6 | 0.391 (94.6) | 0.547 | 0.608 (81.8) | 0.497 (94.2) | 0.874 (60.0) | 0.554 |
| 7 | 0.423 (94.3) | 0.576 | 0.608 (80.9) | 0.496 (94.1) | 0.827 (59.7) | 0.555 |
| 8 | 0.439 (94.2) | 0.578 | 0.605 (80.8) | 0.499 (94.1) | 0.794 (59.1) | 0.554 |

'Imputation MSE' gives relative MSE for held-out values in the tensor, 'Low-rank MSE' gives relative MSE when imputing via the low-rank term only in the correlated model, and 'Shannon MSE' gives MSE for imputed Shannon entropy for held-out fibers. Coverage rates for 95% credible intervals are shown in parentheses.

We apply the rank-1 correlated model to the full data, with no validation set, to infer trends in microbiome diversity over time. We consider three approaches to generate uncertainty bounds for the mean diversity at each time point. For approach 1, we impute the missing data at their posterior mean, treat the imputed values as fixed, and create 95% confidence intervals using the classical frequentist approach via a t-distribution. For approach 2, we use only the observed data at each timepoint and create a 95% interval via a t-distribution. Note that the t-interval for a mean is equivalent to a Bayesian credible interval with a uniform prior on the mean and a log-uniform prior on the variance. Thus, for approach 3 we use the posterior samples to propagate uncertainty from the imputation step and generate credible intervals under the full model. That is, let $\text{mean}(\alpha_{k,t})$ and $\text{sd}(\alpha_{k,t})$ be the sample mean and standard deviation of diversity at timepoint k and MCMC iteration t, computed over the observed values and the imputed values simulated from the posterior. Then, we simulate a value for the population mean $\bar{\alpha}_{k,t}$ via
$$\bar{\alpha}_{k,t} = \text{mean}(\alpha_{k,t}) + T_{51}\,\frac{\text{sd}(\alpha_{k,t})}{\sqrt{52}},$$
where $T_{51}$ is a t-distributed random variable with 51 degrees of freedom. Intervals obtained via the quantiles of the $\bar{\alpha}_{k,t}$ then properly account for both variability in the imputed values and sampling variability.
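As an illustration, here is a minimal R sketch of approach 3 at a single timepoint, assuming `alpha_draws` is a hypothetical (MCMC iterations × subjects) matrix of Shannon diversity values that mixes observed entries (constant across iterations) with posterior-simulated imputations; the toy dimensions below give the 51 degrees of freedom used above:

```r
set.seed(1)
alpha_draws <- matrix(rnorm(1000 * 52, mean = 2, sd = 0.5), nrow = 1000)  # toy stand-in

n_subj  <- ncol(alpha_draws)                  # 52 subjects -> 51 degrees of freedom
mean_kt <- rowMeans(alpha_draws)              # mean(alpha_{k,t}) per MCMC iteration
sd_kt   <- apply(alpha_draws, 1, sd)          # sd(alpha_{k,t}) per MCMC iteration
alpha_bar <- mean_kt + rt(nrow(alpha_draws), df = n_subj - 1) * sd_kt / sqrt(n_subj)
quantile(alpha_bar, c(0.025, 0.975))          # 95% bounds for the population mean
```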

The resulting trends for Shannon diversity, with 95% confidence/credible bounds, are shown in Fig. 1. The confidence bounds generated using the point-imputed data are substantially narrower than the bounds generated using multiple imputation in some instances, illustrating the danger of underestimating uncertainty when imputed data are treated as fixed and known. In contrast, the bounds generated using only the observed data are erratic and very wide in some cases, illustrating the disadvantages of ignoring missing data entirely. The multiple imputation approach is a principled compromise between these two extremes, and shows a pattern in which diversity decreases over the first few days of life in the NICU and then stays relatively constant.

Fig. 1. Estimates for Shannon diversity over time for the neonatal infant ClrX data. The left panel gives the mean under imputation and credible intervals generated using either point or multiple imputation; the right panel gives the mean and credible interval for observed data only.

Section S1 presents another application to longitudinal microbiome data for an experiment in mice. This application similarly illustrates the advantages of the correlated BAMITA method for imputation accuracy and uncertainty propagation.

6 Discussion

Our results demonstrate the advantages of accounting for residual covariance and uncertainty when imputing missing values in tensor data. While the motivating application for the development of BAMITA was longitudinal microbiome data, the model is broadly applicable to other scenarios. Aspects of the model and sampling algorithm may be modified or extended, e.g. to capture spatiotemporal structure in relevant modes (Yokota et al. 2016; Guan 2024) rather than a general covariance. Moreover, the assumption of normality for the residual error may be relaxed. For example, a tensor modeling approach is often effective for multi-condition RNA-Seq gene expression data (Hore et al. 2016), which may have a Poisson or negative binomial distribution.

In our data applications we selected the rank using cross-validation (Algorithm 2), so subsequent analyses with the selected rank on the same data are subject to post-selection inference. However, as only one parameter (the rank) is estimated empirically, overfitting is not a major concern; for example, in our simulations, coverage rates are appropriate when the rank is selected through cross-validation. Nevertheless, a possible extension is to infer the number of components R (ie the rank) as a parameter within the Bayesian model to account for its uncertainty, using reversible jump MCMC (Frühwirth-Schnatter et al. 2024) or other approaches, although this would increase the computational complexity of the approach. A sketch of the cross-validation workflow is given below.
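As a sketch of this workflow under stated assumptions, the loop below holds out a fraction of observed entries, fits the model at each candidate rank, and keeps the rank minimizing held-out MSE; `bamita_fit` is a placeholder for the actual fitting routine, not a function from our package:

```r
## X: data array with NAs for missing entries; bamita_fit() is a hypothetical
## fitting call returning an object with the imputed array in $imputed.
select_rank <- function(X, ranks = 1:8, prop_holdout = 0.1) {
  obs  <- which(!is.na(X))
  held <- sample(obs, floor(prop_holdout * length(obs)))
  X_train <- X
  X_train[held] <- NA                      # mask the validation entries
  mse <- sapply(ranks, function(r) {
    fit <- bamita_fit(X_train, rank = r)   # placeholder fitting call
    mean((fit$imputed[held] - X[held])^2)
  })
  ranks[which.min(mse)]                    # rank with smallest held-out MSE
}
```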

To make our algorithm applicable across a range of data scenarios, we adopted a flat prior for the underlying parameters $\boldsymbol{U}^{(1)}, \boldsymbol{U}^{(2)}, \ldots, \boldsymbol{U}^{(N)}$, which is independent of the data scale. However, for specific applications, informative Gaussian priors can also be incorporated into our algorithm as conjugate priors; extending the algorithm with informative Gaussian priors on the elements of the underlying structure is a topic for future work.
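For reference, the conjugacy such an extension would exploit is the standard Gaussian linear-model update, written here generically (for a vectorized factor-matrix update with design matrix $\boldsymbol{X}$) rather than as the exact full conditional of our sampler:
$$\boldsymbol{u} \sim \mathrm{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0), \quad \boldsymbol{y} \mid \boldsymbol{u} \sim \mathrm{N}(\boldsymbol{X}\boldsymbol{u}, \sigma^2\boldsymbol{I}) \;\Rightarrow\; \boldsymbol{u} \mid \boldsymbol{y} \sim \mathrm{N}\big(\boldsymbol{\Sigma}_\ast(\boldsymbol{\Sigma}_0^{-1}\boldsymbol{\mu}_0 + \sigma^{-2}\boldsymbol{X}^\top\boldsymbol{y}),\ \boldsymbol{\Sigma}_\ast\big), \quad \boldsymbol{\Sigma}_\ast = (\boldsymbol{\Sigma}_0^{-1} + \sigma^{-2}\boldsymbol{X}^\top\boldsymbol{X})^{-1},$$
which recovers the flat-prior case as $\boldsymbol{\Sigma}_0^{-1} \to \boldsymbol{0}$.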

In our real data applications, we use cross-validation to decide whether to adopt the Bayesian multiple imputation algorithm with the separable covariance structure or the independent structure, using mean squared error (MSE) of the imputed elements. However, alternative Bayesian model selection criteria, such as the deviance information criterion (DIC) (Spiegelhalter et al. 2002), could also be used to select the most appropriate model.

Computing time is often a bottleneck for fully Bayesian inference on high-dimensional data. We have carefully specified our models to facilitate efficient Gibbs sampling in high dimensions. However, the model with independent error allows for a much more efficient algorithm: our largest dataset, described in Section 5, took several hours to run under the correlated model and under 10 min with independent error. This is the trade-off of modeling the residual covariance in higher dimensions.

Supplementary material

Supplementary material is available at Biostatistics Journal online.

Funding

This work was supported in part by National Institutes of Health grants R01-HG010731 and R01-GM130622.

Conflict of interest

None declared.

Acknowledgments

The authors thank the anonymous reviewers for their valuable suggestions.

References

Acar E, Dunlavy DM, Kolda TG, Morup M. 2011. Scalable tensor factorizations for incomplete data. Chemom Intell Lab Syst. 106:41–56.

Carroll JD, Chang J-J. 1970. Analysis of individual differences in multidimensional scaling via an n-way generalization of "Eckart-Young" decomposition. Psychometrika. 35:283–319.

Chen X, He Z, Sun L. 2019. A Bayesian tensor decomposition approach for spatiotemporal traffic data imputation. Transport Res C Emerging Technol. 98:73–84.

Chen Y-L, Hsu C-T, Liao H-YM. 2013. Simultaneous tensor decomposition and completion using factor priors. IEEE Trans Pattern Anal Mach Intell. 36:577–591.

Cong X, Judge M, Xu W, Diallo A, Janton S, Brownell EA, Maas K, Graf J. 2017. Influence of feeding type on gut microbiome development in hospitalized preterm infants. Nursing Res. 66:123–133.

Frühwirth-Schnatter S, Hosszejni D, Lopes HF. 2024. Sparse Bayesian factor analysis when the number of factors is unknown. Bayesian Anal. 1:1–31.

Guan L. 2024. Smooth and probabilistic PARAFAC model with auxiliary covariates. J Comput Graph Stat. 33:538–550.

Guhaniyogi R, Qamar S, Dunson DB. 2017. Bayesian tensor regression. J Mach Learn Res. 18:1–31.

Hoff PD. 2011. Separable covariance arrays via the Tucker product, with applications to multivariate relational data. Bayesian Anal. 6:179–196.

Hoff PD. 2015. Multilinear tensor regression for longitudinal relational data. Ann Appl Stat. 9:1169.

Hore V, Vinuela A, Buil A, Knight J, McCarthy MI, Small K, Marchini J. 2016. Tensor decomposition for multiple-tissue gene expression experiments. Nat Genet. 48:1094–1100.

Kolda TG, Bader BW. 2009. Tensor decompositions and applications. SIAM Rev. 51:455–500.

Liu J, Musialski P, Wonka P, Ye J. 2012. Tensor completion for estimating missing values in visual data. IEEE Trans Pattern Anal Mach Intell. 35:208–220.

Mazumder R, Hastie T, Tibshirani R. 2010. Spectral regularization algorithms for learning large incomplete matrices. J Mach Learn Res. 11:2287–2322.

Salakhutdinov R, Mnih A. 2008. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In: Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland. New York, NY, USA: Association for Computing Machinery. p. 880–887.

Shannon CE. 1948. A mathematical theory of communication. Bell Syst Tech J. 27:379–423.

Spiegelhalter DJ, Best NG, Carlin BP, Van Der Linde A. 2002. Bayesian measures of model complexity and fit. J R Stat Soc Ser B (Stat Methodol). 64:583–639.

Stekhoven DJ, Bühlmann P. 2012. MissForest—non-parametric missing value imputation for mixed-type data. Bioinformatics. 28:112–118.

Tan H, Feng G, Feng J, Wang W, Zhang Y-J, Li F. 2013. A tensor-based method for missing traffic data completion. Transp Res C Emerg Technol. 28:15–27.

Thukral AK. 2017. A review on measurement of alpha diversity in biology. Agric Res J. 54:1–10.

Tucker LR. 1966. Some mathematical notes on three-mode factor analysis. Psychometrika. 31:279–311.

Wang K, Xu Y. 2024. Bayesian tensor-on-tensor regression with efficient computation. Stat Interface. 17:199.

Wu Y, Tan H, Li Y, Zhang J, Chen X. 2018. A fused CP factorization method for incomplete tensors. IEEE Trans Neural Netw Learn Syst. 30:751–764.

Yokota T, Zhao Q, Cichocki A. 2016. Smooth PARAFAC decomposition for tensor completion. IEEE Trans Signal Process. 64:5423–5436.
