Double robust variance estimation with parametric working models

TABLE 1

Estimated effect of maternal anemia on birth weight (g) by model specification and estimator.

Estimator	|$\widehat{DR}$|	ES-SE	ES-95% CI	IF-SE	IF-95% CI	NB-SE	NB-95% CI
Full covariate set
Classic AIPW	−37	56	(−147, 73)	58	(−151, 76)	63	(−161, 86)
Wtd regression AIPW	−41	56	(−151, 69)	56	(−151, 68)	61	(−160, 78)
TMLE	−37	56	(−147, 73)	58	(−151, 76)	63	(−160, 85)
Naive outcome
Classic AIPW	−36	57	(−148, 76)	61	(−156, 84)	64	(−162, 90)
Wtd regression AIPW	−36	57	(−148, 77)	61	(−156, 84)	64	(−161, 90)
TMLE	−36	57	(−148, 77)	61	(−156, 84)	63	(−160, 89)
Naive propensity
Classic AIPW	−41	58	(−154, 73)	56	(−150, 69)	58	(−155, 74)
Wtd regression AIPW	−47	57	(−159, 65)	54	(−153, 59)	59	(−162, 68)
TMLE	−41	58	(−154, 73)	56	(−150, 69)	59	(−155, 74)

Abbreviations: AIPW = augmented inverse probability weighting, CI = confidence interval, ES = empirical sandwich, IF = influence function, NB = nonparametric bootstrap, SE = standard error, TMLE = targeted maximum likelihood estimation, Wtd = weighted. The naive outcome model included only the exposure, and the naive propensity model included only an intercept. The nonparametric bootstrap was based on 5000 resamples.

The doubly robust methods provided similar point estimates for all model specifications, approximately −40 g, although all 95% CIs included zero. Standard errors from the empirical sandwich variance estimator and the influence function based variance estimator were similar when the full covariate set was included in both working models. Under the naive outcome model, standard errors from the influence function based variance estimator were larger than those from the empirical sandwich variance estimator; under the naive propensity model, they were smaller. There was also more fluctuation in 95% CI half-widths across model specifications for the influence function based variance estimator than for the empirical sandwich variance estimator (Figure 1). Bootstrap standard errors were at least as large as, and usually larger than, those of the empirical sandwich and influence function based variance estimators, possibly because of extreme birth weights in the data (Web Figure 1).
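The CIs in Table 1 appear consistent with Wald-type intervals, |$\widehat{DR} \pm 1.96 \times SE$|, so each 95% CI half-width is about 1.96 times the corresponding standard error. The minimal sketch below assumes Wald intervals (an inference from the reported numbers, not a statement about the authors' code) and reproduces the empirical sandwich interval for the classic AIPW estimator with the full covariate set. Other rows can differ by 1 g because the reported estimates and standard errors are rounded.

```python
def wald_ci(estimate, se, z=1.959964):
    """Wald-type interval: estimate +/- z * se (z is the 0.975 normal quantile)."""
    half_width = z * se
    return estimate - half_width, estimate + half_width


# Classic AIPW, full covariate set, empirical sandwich SE (Table 1)
lower, upper = wald_ci(-37, 56)
print(round(lower), round(upper))  # prints: -147 73
```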

There are notable limitations to this analysis. First, the average causal effect may not be an informative estimand for the anemia exposure. The average causal effect contrasts average birth weight under a setting in which all women have anemia with one in which no women have anemia. It is challenging to envision a setting in which all women would have anemia, so the estimand targeted by estimators (1), (2), and (3) may lack a meaningful causal interpretation. Other estimands could be considered, including a comparison of |$E(Y^0)$| with the natural course, |$E(Y)$| (Hubbard and van der Laan, 2008), or contrasts of stochastic intervention distributions (Kennedy, 2019). Additionally, as with all observational studies, there may be uncontrolled confounding not captured by the observed covariates. This concern may be heightened in settings where causal consistency is also questionable (Hernán and VanderWeele, 2011), that is, where there may be multiple interventions that modify anemia status. Additional support is needed to justify the causal identification assumptions before determining whether the estimates in Table 1 are substantively relevant. Finally, the analysis included birth weights from 14 participants who experienced stillbirth; future work could reexamine this problem using alternative approaches for competing events.

4 SIMULATION STUDY

A simulation study was conducted to compare the empirical properties of the influence function based variance estimator, the empirical sandwich variance estimator, and the nonparametric bootstrap in conjunction with each of the 3 doubly robust methods discussed in Section 2. Simulations were conducted with |$n=800$|, similar to the sample size in the example in Section 3.

4.1 Simulation setup

Three covariates (|$Z_1$|, |$Z_2$|, |$Z_3$|), the exposure (|$X$|), and potential outcomes (|$Y^0$| and |$Y^1$|) were simulated. The covariate |$Z_1$| was normally distributed with mean 155 and standard deviation 7.6. Two binary covariates, |$Z_2$| and |$Z_3$|, were simulated from Bernoulli distributions with means 0.25 and 0.75, respectively. The exposure |$X$| was simulated from a Bernoulli distribution with mean |$\mbox{expit}(15-0.1 Z_1 + 2.5 Z_2 - Z_3 - 0.02 Z_1 Z_2 + 0.005 Z_1 Z_3)$|. Potential outcomes |$Y^1$| and |$Y^0$| under exposure and no exposure, respectively, were simulated from normal distributions with mean |$E(Y^x \mid Z_1, Z_2)=1000+ 11.5 Z_1 + 100 Z_2 -15 Z_1 Z_2 +25x-5.5 x Z_1 -30 x Z_2 +5 x Z_1 Z_2$| and standard deviation |$\sigma =400$| for |$x \in \lbrace 0,1\rbrace$|. Under this data generating mechanism, the |$ACE$| was approximately |$-60$|. The marginal distributions of the exposure and outcome were modeled after the example in Section 3. To examine the performance of the estimators under the null, simulations were also conducted in which |$E(Y^1 \mid Z_1, Z_2)$| was set equal to |$E(Y^0 \mid Z_1, Z_2)$| as defined above, such that |$ACE=0$|.
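The covariate and exposure steps of this data generating mechanism can be sketched in Python with NumPy as follows. This is an illustrative sketch, not the authors' simulation code: the function names are hypothetical, and the potential outcome step is omitted.

```python
import numpy as np


def expit(t):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-t))


def simulate_covariates_and_exposure(n, rng):
    """Draw (Z1, Z2, Z3, X) following the formulas in Section 4.1."""
    z1 = rng.normal(loc=155.0, scale=7.6, size=n)   # continuous covariate
    z2 = rng.binomial(1, 0.25, size=n)              # Bernoulli(0.25)
    z3 = rng.binomial(1, 0.75, size=n)              # Bernoulli(0.75)
    lin_pred = (15 - 0.1 * z1 + 2.5 * z2 - z3
                - 0.02 * z1 * z2 + 0.005 * z1 * z3)
    x = rng.binomial(1, expit(lin_pred))            # P(X = 1 | Z) = expit(lin_pred)
    return z1, z2, z3, x


rng = np.random.default_rng(2025)
z1, z2, z3, x = simulate_covariates_and_exposure(800, rng)
print(x.mean())  # sample proportion exposed in one simulated data set
```

By a back-of-envelope calculation over the covariate distribution (not a figure reported in the text), roughly 30% of participants are exposed on average under these coefficients.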

The estimators |$\widehat{DR}_{C}$|, |$\widehat{DR}_{WR}$|, and |$\widehat{DR}_{TMLE}$| were applied to 5000 simulated samples. Each estimator was computed under 4 specifications of the working models: (1) propensity and outcome models both correctly specified, (2) propensity model correctly specified but outcome model misspecified, (3) outcome model correctly specified but propensity model misspecified, and (4) both models misspecified. Misspecified propensity models included only an intercept and a linear term for |$(Z_1-155)^2$|, and misspecified outcome models included only an intercept and linear terms for |$X$| and |$(Z_1-155)^2$|.
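To make the four specifications concrete, the sketch below builds design matrices for the working models. It assumes that a correctly specified model contains exactly the terms of the data generating mechanism in Section 4.1 (an assumption; the text defines only the misspecified models explicitly), and the helper names are hypothetical. Outcome-model predictions under |$x=1$| and |$x=0$| use the same function with the exposure column set to all ones or all zeros.

```python
import numpy as np


def propensity_design(z1, z2, z3, correct=True):
    """Columns of the logistic propensity model for P(X = 1 | Z)."""
    one = np.ones_like(z1, dtype=float)
    if correct:
        # terms of the simulated exposure model
        return np.column_stack([one, z1, z2, z3, z1 * z2, z1 * z3])
    # misspecified: intercept and (Z1 - 155)^2 only
    return np.column_stack([one, (z1 - 155.0) ** 2])


def outcome_design(x, z1, z2, correct=True):
    """Columns of the linear outcome model for E(Y | X, Z)."""
    x = np.asarray(x, dtype=float)
    one = np.ones_like(z1, dtype=float)
    if correct:
        # terms of the simulated potential outcome mean
        return np.column_stack([one, z1, z2, z1 * z2,
                                x, x * z1, x * z2, x * z1 * z2])
    # misspecified: intercept, X, and (Z1 - 155)^2 only
    return np.column_stack([one, x, (z1 - 155.0) ** 2])


# counterfactual design matrices used for predictions under x = 1 and x = 0
z1 = np.array([150.0, 160.0])
z2 = np.array([0, 1])
z3 = np.array([1, 1])
V1 = outcome_design(np.ones(2), z1, z2)
V0 = outcome_design(np.zeros(2), z1, z2)
```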

Each of the 3 variance estimators described in Section 2.3 was applied. For the nonparametric bootstrap, 1000 resamples were drawn from each simulated data set to estimate the bootstrap standard error, excluding any resamples in which the working models failed to converge. Simulation results were summarized by empirical bias, average standard error (ASE), empirical standard error (ESE), standard error ratio (SER = ASE/ESE), and empirical 95% CI coverage. The ratio of each simulation's estimated standard error to the ESE, termed the variance ratio, was also summarized. That is, |$VR_s=se_s / ESE$|, where |$se_s=\sqrt{\hat{V}(\widehat{DR})}$| is the estimated standard error for a given doubly robust estimator, model specification, and variance estimator in simulation |$s$|.
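The summary measures can be computed from the per-simulation point estimates and standard errors as in the following sketch. It assumes that the ESE is the sample standard deviation (divisor |$S-1$|) of the point estimates across |$S$| simulations and that CIs are Wald intervals; the function names are hypothetical. Because every simulation's variance ratio shares the same denominator, the mean variance ratio equals the SER. The last helper shows that a binomial Monte Carlo standard error for coverage of 95% with 5000 simulations is about 0.3%, matching the note to Table 2.

```python
import numpy as np


def summarize_simulations(estimates, ses, truth, z=1.959964):
    """Bias, ESE, ASE, SER, coverage, and per-simulation variance ratios."""
    est = np.asarray(estimates, dtype=float)
    se = np.asarray(ses, dtype=float)
    ese = est.std(ddof=1)                      # empirical SE across simulations
    ase = se.mean()                            # average estimated SE
    covered = np.abs(est - truth) <= z * se    # Wald CI contains the truth
    return {
        "bias": est.mean() - truth,
        "ese": ese,
        "ase": ase,
        "ser": ase / ese,
        "coverage": covered.mean(),
        "vr": se / ese,                        # mean(vr) equals ser
    }


def coverage_mcse(coverage, n_sims):
    """Binomial Monte Carlo standard error of an estimated coverage probability."""
    return np.sqrt(coverage * (1.0 - coverage) / n_sims)


print(round(100 * coverage_mcse(0.95, 5000), 1))  # prints 0.3
```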

4.2 Simulation results

Both point and variance estimators performed as expected in simulations. The results of the simulation study are presented in Table 2 and Figures 2 and 3. When at least 1 model was correctly specified, the classic AIPW, weighted regression AIPW, and TMLE displayed minimal bias, but all were substantially biased when both working models were misspecified. Under correct specification of both working models, all 3 variance estimators tracked closely with the ESE, resulting in SERs close to 1 and CIs attaining the nominal level of coverage. When both models were misspecified, all 3 variance estimators tracked with the ESE, but bias was substantial, resulting in CIs with below nominal coverage.

FIGURE 2

Ratio between each simulation’s estimated standard error and the empirical standard error by estimator and model specification, continuous outcome, n = 800, |$\sigma =400$|, 5000 simulations. |$ACE$| was approximately −60. Black squares denote the mean variance ratio (=SER). Results exclude 1 simulation where models failed to converge. The 0.33% of correct model specification simulations, 4.02% of misspecified outcome model simulations, and 0.004% of misspecified propensity model simulations where the ratio was above 1.2 or below 0.8 are not displayed.

FIGURE 3

Ratio between each simulation’s estimated standard error and the empirical standard error by estimator and model specification, continuous outcome, n = 800, |$\sigma =400$|, 5000 simulations under the null. Black squares denote the mean variance ratio (=SER). The 0.01% of correct model specification simulations and 1.72% of misspecified outcome model simulations where the ratio was above 2.75 or below 0.5 are not displayed.

TABLE 2

Simulation summary results, continuous outcome, |$n=800$|, |$\sigma =400$|, 5000 simulations.

Scenario	Estimator	Bias	ESE	SER, ES	Cov, ES (%)	SER, NB	Cov, NB (%)	SER, IF	Cov, IF (%)
|$ACE \approx -60$|
CS	Classic	0.4	58.4	0.99	95	1.00	95	1.00	95
	WR	0.4	58.3	0.99	95	1.00	95	0.99	95
	TMLE	0.4	58.3	0.99	95	1.00	95	1.00	95
MO	Classic	−0.4	60.0	0.99	95	1.02	96	1.07	97
	WR	−1.6	59.2	0.99	95	1.00	95	1.05	96
	TMLE	−0.5	59.7	0.99	95	1.01	95	1.06	96
MP	Classic	0.3	57.8	1.00	95	1.00	95	0.97	94
	WR	0.3	57.8	1.00	95	1.00	95	0.97	94
	TMLE	0.3	57.8	1.00	95	1.00	95	0.97	94
MB	Classic	−23.8	57.0	1.00	92	1.00	92	1.00	92
	WR	−23.8	57.0	1.00	92	1.00	92	1.00	92
	TMLE	−23.8	57.0	1.00	92	1.00	92	1.00	92
|$ACE = 0$|
CS	Classic	0.1	35.0	0.96	94	1.00	95	0.97	94
	WR	0.1	34.8	0.96	94	0.98	94	0.95	94
	TMLE	0.1	35.0	0.96	94	0.99	95	0.97	94
MO	Classic	1.4	48.0	0.93	94	1.07	96	2.10	100
	WR	2.5	44.8	0.94	93	1.02	95	2.18	100
	TMLE	1.6	46.7	0.93	94	1.04	96	2.12	100
MP	Classic	0.0	33.7	0.98	95	0.99	95	0.89	92
	WR	−0.1	33.7	0.98	95	0.99	95	0.89	92
	TMLE	0.0	33.7	0.98	95	0.99	95	0.89	92
MB	Classic	158.8	75.9	0.99	44	0.99	44	0.99	44
	WR	158.8	75.9	0.99	44	0.99	44	0.99	44
	TMLE	158.8	75.9	0.99	44	0.99	44	0.99	44

Abbreviations: Classic = classic AIPW; Cov = 95% confidence interval coverage, that is, the proportion of simulated samples for which the 95% CI included the true ACE; CS = correct specification of both models; ES = empirical sandwich variance estimator; ESE = empirical standard error; IF = influence function based variance estimator; MB = both models misspecified; MO = misspecified outcome model; MP = misspecified propensity model; NB = nonparametric bootstrap variance estimator; SER = standard error ratio (ASE/ESE), where ASE = average estimated standard error; TMLE = targeted maximum likelihood estimation; WR = weighted regression AIPW. Monte Carlo standard error for 95% CI coverage was 0.3% when coverage was 95%. Results exclude 1 simulation in which models did not converge. Bias, ESE, SER, and 95% CI coverage were calculated for the ACE.

Differences between variance estimators were apparent in scenarios where only 1 working model was correctly specified. When either the outcome model or the propensity model was misspecified, CIs based on the empirical sandwich variance estimator and the nonparametric bootstrap attained approximately the nominal level of coverage, and estimated standard errors generally tracked closely with the ESE. In Figures 2 and 3, variance ratios for the empirical sandwich estimator and the nonparametric bootstrap are clustered around 1 in these scenarios. In contrast, the influence function based variance estimator was empirically biased when either working model was misspecified. As expected, it tended to overestimate the variance when the outcome model was misspecified, leading to SERs above 1 and conservative CI coverage. When the propensity model was misspecified but the outcome model was correctly specified, it underestimated the variance, resulting in SERs below 1. Bias in the influence function based variance estimator was more pronounced under the null: under outcome model misspecification, SERs exceeded 2.0, resulting in approximately 100% CI coverage, and under propensity model misspecification, SERs were 0.89, resulting in below nominal coverage (92%).
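A rough normal approximation helps connect the SERs in Table 2 to the reported coverage. Suppose the point estimator is approximately normal with bias |$b$| and standard deviation ESE, and every simulation's standard error is exactly |$SER \times ESE$| (an idealization that ignores variability in the estimated standard errors). Then Wald CI coverage is |$\Phi(1.96\,SER - b/ESE) - \Phi(-1.96\,SER - b/ESE)$|, where |$\Phi$| is the standard normal distribution function. The sketch below evaluates this expression with the Python standard library; it is an approximation for intuition, not the method used to produce Table 2.

```python
from math import erf, sqrt


def norm_cdf(t):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))


def approx_coverage(ser, bias_over_ese=0.0, z=1.959964):
    """Approximate Wald CI coverage given SER and standardized bias b / ESE."""
    half = z * ser
    return norm_cdf(half - bias_over_ese) - norm_cdf(-half - bias_over_ese)


# Rows of Table 2 under the null (ACE = 0)
print(round(approx_coverage(2.10), 3))                # IF, MO classic: prints 1.0
print(round(approx_coverage(0.89), 2))                # IF, MP: prints 0.92
print(round(approx_coverage(0.99, 158.8 / 75.9), 2))  # MB: prints 0.44
```

These approximate values (about 100%, 92%, and 44%) are close to the corresponding coverages in Table 2, which suggests that SER and bias largely account for the coverage patterns described above.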

The empirical sandwich variance and nonparametric bootstrap estimators both demonstrated the expected doubly robust variance property, but they differed in one notable respect: the nonparametric bootstrap standard errors generally varied more across simulations than the empirical sandwich standard errors, as evidenced by the greater spread and more extreme outliers in Figures 2 and 3.
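The nonparametric bootstrap standard error used in the simulations, including the exclusion of resamples in which a working model fails to converge, can be sketched as below. This is a hypothetical helper, not the authors' code: it assumes the estimator signals a convergence failure by raising an exception (here `RuntimeError` or `numpy.linalg.LinAlgError`), and it returns the number of retained resamples so that exclusions can be tracked.

```python
import numpy as np


def bootstrap_se(data, estimator, n_boot=1000, rng=None):
    """Nonparametric bootstrap SE; resamples whose fit fails are skipped."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data)
    n = data.shape[0]
    replicates = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)       # sample rows with replacement
        try:
            replicates.append(estimator(data[idx]))
        except (RuntimeError, np.linalg.LinAlgError):
            continue                           # e.g., a working model did not converge
    replicates = np.asarray(replicates, dtype=float)
    if replicates.size < 2:
        return float("nan"), int(replicates.size)
    return float(replicates.std(ddof=1)), int(replicates.size)


rng = np.random.default_rng(7)
sample = rng.normal(loc=0.0, scale=2.0, size=400)
se, kept = bootstrap_se(sample, np.mean, n_boot=2000, rng=rng)
print(kept, round(se, 3))  # SE should be near 2 / sqrt(400) = 0.1
```

Passing a function that fits both working models and returns |$\widehat{DR}$| in place of `np.mean` gives a bootstrap standard error for a doubly robust estimator.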

Additional simulations were conducted with |$\sigma \in \lbrace 200,600\rbrace$|, with a binary outcome, and with |$n=2000$|. Twelve scenarios were considered. Details and results of the additional scenarios are provided in Web Appendix C.

In summary, the simulation study demonstrated the theoretical properties explained in Section 2. That is, the empirical sandwich variance estimator and the nonparametric bootstrap were empirically unbiased when at least 1 of the 2 working models was correctly specified. The influence function based variance estimator is not consistent under misspecification of either working model. The influence function based variance estimator was conservative when the outcome model was misspecified and was generally anti-conservative when the propensity model was misspecified. The magnitude and direction of bias under misspecified working models varied across simulation scenarios.

5 DISCUSSION

Doubly robust estimators have gained popularity because they provide consistent point estimates when either an outcome model or a propensity model is correctly specified. Here, the commonly used influence function based variance estimator is compared with the empirical sandwich variance estimator and the nonparametric bootstrap in conjunction with 3 doubly robust estimators: the classic AIPW estimator, the weighted regression AIPW estimator, and TMLE. For estimation of the average causal effect with observational data, the influence function based variance estimator is consistent only when both the outcome and propensity models are correctly specified. In contrast, both the empirical sandwich variance estimator and the nonparametric bootstrap are doubly robust variance estimators. As such, CIs constructed from these estimators are expected to attain nominal coverage when at least one working model is correctly specified.
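The contrast between the two analytic variance estimators can be seen in a short sketch for the classic AIPW estimator with a logistic propensity model and a linear outcome model. The sketch stacks the working-model estimating equations with the AIPW equations for |$E(Y^1)$|, |$E(Y^0)$|, and the |$ACE$|, and forms the empirical sandwich |$A^{-1} B A^{-\top}/n$| using a central-difference numerical derivative for |$A$|. The influence function based standard error instead plugs the fitted working models into the AIPW influence function and ignores the uncertainty from estimating them. It is written in Python with NumPy as an illustration under these assumptions, not as the authors' implementation, and all function names are hypothetical.

```python
import numpy as np


def expit(t):
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(W, x, max_iter=100, tol=1e-10):
    """Newton-Raphson fit of the logistic propensity model."""
    a = np.zeros(W.shape[1])
    for _ in range(max_iter):
        p = expit(W @ a)
        hess = W.T @ (W * (p * (1 - p))[:, None])
        step = np.linalg.solve(hess, W.T @ (x - p))
        a += step
        if np.max(np.abs(step)) < tol:
            return a
    raise RuntimeError("propensity model did not converge")


def estimating_functions(theta, W, V, V1, V0, x, y):
    """Per-unit stacked estimating functions, shape (n, len(theta))."""
    k, m = W.shape[1], V.shape[1]
    a, b = theta[:k], theta[k:k + m]
    mu1, mu0, psi = theta[k + m:]
    pi = expit(W @ a)
    phi1 = x * y / pi - (x - pi) / pi * (V1 @ b)
    phi0 = (1 - x) * y / (1 - pi) + (x - pi) / (1 - pi) * (V0 @ b)
    return np.column_stack([
        W * (x - pi)[:, None],              # logistic score equations
        V * (y - V @ b)[:, None],           # least-squares normal equations
        phi1 - mu1,                         # classic AIPW for E(Y^1)
        phi0 - mu0,                         # classic AIPW for E(Y^0)
        np.full(len(y), mu1 - mu0 - psi),   # ACE = E(Y^1) - E(Y^0)
    ])


def classic_aipw(W, V, V1, V0, x, y, h=1e-5):
    """Return (ACE estimate, influence function based SE, empirical sandwich SE)."""
    n = len(y)
    a = fit_logistic(W, x)
    b = np.linalg.lstsq(V, y, rcond=None)[0]
    pi = expit(W @ a)
    phi1 = x * y / pi - (x - pi) / pi * (V1 @ b)
    phi0 = (1 - x) * y / (1 - pi) + (x - pi) / (1 - pi) * (V0 @ b)
    mu1, mu0 = phi1.mean(), phi0.mean()
    psi = mu1 - mu0
    # IF-based SE: fitted working models are plugged in and treated as fixed
    se_if = np.sqrt(np.mean((phi1 - phi0 - psi) ** 2) / n)
    # empirical sandwich SE from the stacked equations, all solved at theta
    theta = np.concatenate([a, b, [mu1, mu0, psi]])
    args = (W, V, V1, V0, x, y)
    U = estimating_functions(theta, *args)
    B = U.T @ U / n
    A = np.empty((theta.size, theta.size))
    for j in range(theta.size):
        e = np.zeros(theta.size)
        e[j] = h
        A[:, j] = -(estimating_functions(theta + e, *args).mean(axis=0)
                    - estimating_functions(theta - e, *args).mean(axis=0)) / (2 * h)
    A_inv = np.linalg.inv(A)
    cov = A_inv @ B @ A_inv.T / n
    return psi, se_if, np.sqrt(cov[-1, -1])


# Usage: both working models correctly specified, true ACE = 2
rng = np.random.default_rng(2024)
n = 5000
z = rng.normal(size=n)
x = rng.binomial(1, expit(0.5 * z))
y = 1 + 2 * x + z + rng.normal(size=n)
one, zero = np.ones(n), np.zeros(n)
W = np.column_stack([one, z])
V = np.column_stack([one, x, z])
psi, se_if, se_es = classic_aipw(W, V, np.column_stack([one, one, z]),
                                 np.column_stack([one, zero, z]), x, y)
print(round(psi, 3), round(se_if, 4), round(se_es, 4))
```

With both working models correctly specified, as in the usage example, the two standard errors should be close; the theory summarized above predicts that they diverge when one working model is misspecified, with only the sandwich remaining consistent.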

This paper considers only variance estimation of the average causal effect with observational data. The influence function based variance estimator can be consistent under the misspecification of 1 working model in some settings. For example, this variance estimator is consistent under parametric outcome model misspecification for marginally randomized trials (assuming that the exposure is included in the outcome model) (Wang et al., 2019; Chang et al., 2023) and for the estimation of some nondegenerate estimands (Haneuse and Rotnitzky, 2013).

While the consideration here was limited to finite-dimensional parametric modeling approaches, machine learning approaches are commonly applied with doubly robust estimators. These approaches allow for more complex functional forms for continuous covariates than fully parametric approaches and can include higher order interaction terms that may be omitted from investigator-specified parametric models, often leading to estimators that are more robust to model misspecification (Zivich et al., 2022a). However, machine learning estimators of the working models typically converge at slower rates than parametric models, and consistent variance estimation is more challenging than with parametric modeling approaches. Machine learning methods are typically not compatible with the estimating equations approach to fitting working models discussed in this paper, though alternative methods have been developed for doubly robust variance estimation in this context (Benkeser et al., 2017; Avagyan and Vansteelandt, 2022).

Use of the empirical sandwich variance estimator in conjunction with doubly robust estimators is not new, though its doubly robust property has not been emphasized. Lunceford and Davidian (2004) discuss the use of the empirical sandwich variance estimator for estimating the variance of the classic AIPW estimator and note that it tends to be more stable than the influence function based variance estimator. However, based on the systematic review discussed in the introduction (Smith et al., 2023), the empirical sandwich variance estimator does not appear to be widely used with TMLE. While some existing software packages compute empirical sandwich variance estimators for doubly robust estimators (eg, the “causaltrt” procedure in SAS and the “dr” procedure in Stata), other popular software packages compute the influence function based variance estimator but not the empirical sandwich variance estimator (eg, the AIPW and tmle packages in R and the zEpid package in Python). We hope that this work will allow for easier implementation of doubly robust point and variance estimation.

ACKNOWLEDGMENTS

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors thank the co-editor, associate editor, and two reviewers for suggestions that strengthened this work. Thanks also to Dr. Bradley Saul and Dr. Michael Hudgens at the University of North Carolina for coding support and helpful suggestions, respectively.

FUNDING

This work was supported by the National Institutes of Health under award numbers R01 AI157758, K01 AI182506, R01 AI085073, P30 AI050410, and K01 AI177102.

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

Deidentified individual participant data will be made available beginning 3 months and ending 5 years following article publication for researchers who provide a methodologically sound proposal to achieve aims in the approved protocol. Proposals should be directed to the corresponding author, and data requestors will need to sign a data access agreement.

REFERENCES

Avagyan, Vansteelandt (2022). High-dimensional inference for the average treatment effect under model misspecification using penalized bias-reduced double-robust estimation. Biostatistics and Epidemiology, 221–238.

Azizah F. K., Dewi Y. L. R., Murti (2022). The effect of maternal anemia on low birth weight: a systematic review and meta analysis. Journal of Maternal and Child Health.

Benkeser, Carone, van der Laan M. J., Gilbert P. B. (2017). Doubly robust nonparametric inference on the average treatment effect. Biometrika, 104, 863–880.

Chang C.-R., Song, Wang (2023). Covariate adjustment in randomized clinical trials with missing covariate and outcome data. Statistics in Medicine, 3919–3935.

Chernozhukov, Chetverikov, Demirer, Duflo, Hansen, Newey, et al. (2018). Double/debiased machine learning for treatment and structural parameters. The Econometrics Journal, –C68.

Daniel R. M. (2014). Double robustness. In: Wiley StatsRef: Statistics Reference Online (eds Balakrishnan, Colton, Everitt, Piegorsch, Ruggeri, Teugels). Wiley. doi:10.1002/9781118445112.stat08068.

Davison (1997). Bootstrap Methods and Their Application (Chapter 1). New York, NY: Cambridge University Press.

Funk M. J., Westreich, Wiesen, Stürmer, Brookhart M. A., Davidian (2011). Doubly robust estimation of causal effects. American Journal of Epidemiology, 173, 761–767.

Gabriel E. E., Sachs M. C., Martinussen, Waernbaum, Goetghebeur, Vansteelandt, et al. (2024). Inverse probability of treatment weighting with generalized linear outcome models for doubly robust estimation. Statistics in Medicine, 534–547.

Gruber, van der Laan (2012). tmle: an R package for targeted maximum likelihood estimation. Journal of Statistical Software.

Gruber, van der Laan M. J. (2010). A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome. The International Journal of Biostatistics, Article 26.

Haneuse, Rotnitzky (2013). Estimation of the effect of interventions that modify the received treatment. Statistics in Medicine, 5260–5277.

Hernán M. A., VanderWeele T. J. (2011). Compound treatments and transportability of causal inference. Epidemiology, 368–377.

Hubbard A. E., van der Laan M. J. (2008). Population intervention models in causal inference. Biometrika.

Kang J. D., Schafer J. L. (2007). Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data. Statistical Science, 523–539.

Kennedy E. H. (2019). Nonparametric causal effects based on incremental propensity score interventions. Journal of the American Statistical Association, 114, 645–656.

Levine A. M., Berhane, Masri-Lavine, Sanchez M. L., Young, Augenbraun, et al. (2001). Prevalence and correlates of anemia in a large cohort of HIV-infected women: Women’s Interagency HIV Study. Journal of Acquired Immune Deficiency Syndromes.

Lunceford J. K., Davidian (2004). Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study. Statistics in Medicine, 2937–2960.

Luque-Fernandez M. A., Schomaker, Rachet, Schnitzer M. E. (2018). Targeted maximum likelihood estimation for a binary treatment: a tutorial. Statistics in Medicine, 2530–2546.

Muñoz I. D., van der Laan (2012). Population intervention causal effects based on stochastic interventions. Biometrics, 541–549.

Naimi A. I., Mishler A. E., Kennedy E. H. (2023). Challenges in obtaining valid causal effect estimates with machine learning algorithms. American Journal of Epidemiology, 192, 1536–1544.

Price J. T., Vwalika, Freeman B. L., Cole S. R., Saha P. T., Mbewe F. M., et al. (2021). Weekly 17 alpha-hydroxyprogesterone caproate to prevent preterm birth among women living with HIV: a randomised, double-blind, placebo-controlled trial. The Lancet HIV, e605–e613.

Robins, Sued, Lei-Gomez, Rotnitzky (2007). Comment: performance of double-robust estimators when “inverse probability” weights are highly variable. Statistical Science, 544–559.

Robins J. M., Rotnitzky, Zhao (1994). Estimation of regression coefficients when some regressors are not always observed. Journal of the American Statistical Association, 846–866.

Saul B. C., Hudgens M. G. (2020). The calculus of M-estimation in R with geex. Journal of Statistical Software.

Shook-Sa B. E., Hudgens M. G., Knittel A. K., Edmonds, Ramirez, Cole S. R., et al. (2024). Exposure effects on count outcomes with observational data, with application to incarcerated women. The Annals of Applied Statistics, 2147–2165.

Smith M. J., Phillips R. V., Luque-Fernandez M. A., Maringe (2023). Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review. Annals of Epidemiology.

Stefanski L. A., Boos D. D. (2002). The calculus of M-estimation. The American Statistician.

Tran, Yiannoutsos, Wools-Kaloustian, Siika, van der Laan, Petersen (2019). Double robust efficient estimators of longitudinal treatment effects: comparative performance in simulations and a case study. The International Journal of Biostatistics. /j/ijb.2019.15.issue-2/ijb-2017-0054/ijb-2017-0054.xml

Tsiatis A. A. (2006). Semiparametric Theory and Missing Data (Section 3.2). New York, NY: Springer.

Sekhon J. S., Gruber, Porter K. E., van der Laan M. J. (2011). Propensity-score-based estimators and C-TMLE. In: Targeted Learning: Causal Inference for Observational and Experimental Data (eds van der Laan M. J., Rose), 343–364. New York, NY: Springer.

van der Laan M. J., Rubin (2006). Targeted maximum likelihood learning. The International Journal of Biostatistics.