Abstract

M-estimation is a statistical procedure that is particularly advantageous for some common epidemiological analyses, including approaches to estimate an adjusted marginal risk contrast (i.e. inverse probability weighting and g-computation) and data fusion. In such settings, maximum likelihood variance estimates are not consistent. Thus, epidemiologists often resort to the bootstrap to estimate the variance. In contrast, M-estimation allows for consistent variance estimation in these settings without the computational burden of the bootstrap. In this paper, we introduce M-estimation and provide four illustrative examples of implementation along with software code in multiple languages. M-estimation is a flexible and computationally efficient estimation procedure that is a powerful addition to the epidemiologist’s toolbox.

Key Messages
  • M-estimation is a generalization of maximum likelihood estimation and allows for multiple estimators to be stacked together.

  • Compared with the nonparametric bootstrap, M-estimation is more computationally efficient for estimation of the variance.

  • M-estimation allows direct estimation of the variance for parameters that are functions of other parameters, such as the risk difference and log-transformed risk ratio.

  • M-estimation is easily implemented in statistical analysis software.

Introduction

Epidemiologists commonly use maximum likelihood estimation (MLE) to compute point estimates and confidence intervals (CIs).1 M-estimation is a generalized version of the typical MLE approach.2 Unlike MLE, M-estimation naturally accommodates estimating multiple sets of equations or parameters simultaneously. This is useful for many common epidemiological analyses, e.g. modern standardization approaches [i.e. inverse probability weighting (IPW) and g-computation] and the use of multiple, distinct data sources to address a question (e.g. using external validation data to address measurement error).3–5 In such settings, epidemiologists often resort to the bootstrap because the standard MLE variance estimates are not appropriate (e.g. there are multiple estimation steps or the data are not identically distributed).6 In contrast, M-estimation variance estimates are valid in these settings, and the approach has further beneficial properties, such as faster computation, highlighted below.

In this paper, we provide a brief introduction to M-estimation and illustrate implementation with four applied examples. We provide code in R, SAS and Python [https://github.com/rachael-k-ross/Mestimation-worked-example].

M-estimation

This is a brief introduction to M-estimation (see Supplementary Material, available as Supplementary data at IJE online, for a more technical introduction). Say we want to estimate the mean of $Y$, denoted by $\mu$, given a sample of $n$ units. We could use the equation $\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Instead of using this equation, we can rearrange the terms into $\sum_{i=1}^{n}(Y_i - \hat{\mu}) = 0$. This second equation is called an estimating equation, where the left-hand side consists of a summation of a function of the observed data and the parameter (termed an estimating function) and the right-hand side is zero.7 When $\hat{\mu}$ is a finite-dimensional parameter, as is the case with the mean, we call $\hat{\mu}$ an M-estimator. To estimate $\mu$ with the estimating equation, we can iteratively plug values for $\hat{\mu}$ into the estimating equation until we find the value of $\hat{\mu}$ where the summation is (or is close to) zero. This procedure is called root-finding, as the ‘root’ of a function is where it equals zero. Standard statistical software offers a variety of root-finding algorithms.8–10
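As a concrete sketch of this root-finding idea, the estimating equation for the mean can be solved numerically. This is a minimal illustration using SciPy's general-purpose root finder; the data values are invented:

```python
# Solve the estimating equation sum_i (Y_i - mu) = 0 by root-finding,
# rather than computing the sample mean directly.
import numpy as np
from scipy.optimize import root

y = np.array([2.0, 4.0, 6.0, 8.0])  # toy data

def estimating_equation(mu):
    # Left-hand side of the estimating equation; its root is the estimate
    return np.sum(y - mu)

mu_hat = root(estimating_equation, x0=1.0).x[0]  # start the search at mu = 1
# mu_hat agrees with y.mean()
```

The starting value matters little here because the estimating function is linear in $\hat{\mu}$; in more complex stacks, reasonable starting values help the root finder converge.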

Typically, we would estimate the variance (and consequently the standard error) for the mean using the standard equation for the variance, $\frac{1}{n}\sum_{i=1}^{n}(Y_i - \hat{\mu})^2$.11 This is the MLE variance estimator. It is also possible to estimate the variance using a different estimator, the empirical sandwich variance estimator. The sandwich variance estimator leverages the estimating equation and combines two different ways of estimating the variance: the inverse of the negative derivative of the estimating equations (the ‘bread’) and the outer product of the estimating equations (the ‘meat’) (see Supplementary Material, available as Supplementary data at IJE online, for more details).2 Both of these pieces can be computed with standard statistical software.

Implementing an M-estimator consists of three steps: (i) determining the estimating equations; (ii) using root-finding to find the point estimates; and (iii) using the sandwich variance estimator to estimate the standard error. As we can rely on software for implementing (ii) and (iii),7,12,13 we do not need to manually solve the corresponding equations.
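The three steps can be collected into a short generic routine. The sketch below uses plain numpy/scipy; the helper name `m_estimate` and the finite-difference step size are our own choices, not from a packaged library. It root-finds the summed estimating functions and then forms the sandwich variance from a numerically approximated ‘bread’ and an outer-product ‘meat’, demonstrated on the mean:

```python
# Generic M-estimation sketch: (i) the user supplies the estimating
# functions, (ii) root-finding gives the point estimates, (iii) the
# empirical sandwich variance estimator gives the variance.
import numpy as np
from scipy.optimize import root

def m_estimate(ef, init):
    """ef(theta) must return an (n, p) array: one row of estimating-function
    values per unit. Returns (theta_hat, variance_matrix)."""
    theta = root(lambda t: ef(t).sum(axis=0), x0=np.asarray(init, float)).x
    p = theta.size
    # 'Bread': negative derivative of the summed estimating functions,
    # approximated with central finite differences
    eps = 1e-6
    bread = np.zeros((p, p))
    for j in range(p):
        d = np.zeros(p)
        d[j] = eps
        bread[:, j] = -(ef(theta + d).sum(axis=0)
                        - ef(theta - d).sum(axis=0)) / (2 * eps)
    # 'Meat': outer product of the estimating functions
    meat = ef(theta).T @ ef(theta)
    bread_inv = np.linalg.inv(bread)
    return theta, bread_inv @ meat @ bread_inv.T  # the sandwich

# Demonstration on the mean: the sandwich SE equals the usual SE of the mean
y = np.array([1.0, 3.0, 5.0, 7.0, 9.0])
theta, vcov = m_estimate(lambda t: (y - t[0]).reshape(-1, 1), [0.0])
se = np.sqrt(vcov[0, 0])
```

For the mean, the bread is simply $n$ and the meat is $\sum_i (Y_i - \hat{\mu})^2$, so the sandwich reduces to the usual variance of the mean; the payoff of the general machinery comes when equations are stacked.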

Although applying the M-estimator for the mean is more complicated than using pre-packaged MLE functions, the M-estimation framework has several advantages due to its flexibility. With M-estimation one can ‘stack’ estimating equations for different parameters together and solve this stack simultaneously. This feature is useful, as epidemiological analyses often involve estimating multiple parameters, including parameters of interest (e.g. a risk difference) and parameters that are not of interest but that are required for estimating the parameter of interest (e.g. the parameters of a propensity score model). By stacking estimating equations together, the sandwich variance estimator allows the variance of one parameter to depend on other parameters. This makes standard error estimation simpler when the parameter of interest is a function of multiple parameters. For example, obtaining the standard error of a risk difference or log-transformed risk ratio involves combining the standard errors of two risks via the delta method. The sandwich variance estimator automates the delta method. Although the delta method can be easily implemented by hand for the risk difference or log-transformed risk ratio, appropriately accounting for the uncertainty in estimating, for example, the propensity score when implementing a weighted estimator is more complicated. It is in such a setting that the sandwich variance estimator is particularly valuable. To showcase these advantages for epidemiological analyses, we illustrate how to obtain estimating equations and the application of M-estimators in four applied examples in the next section.

Applications of M-estimation

Here we demonstrate how M-estimation can be used to estimate point estimates and standard errors for:

  • the parameters of a multivariable logistic regression;

  • a marginal risk difference by IPW;

  • a marginal risk difference by g-computation;

  • a prevalence corrected for misclassification.

Application 1 illustrates the basic tools of M-estimation and shows how M-estimation aligns with MLE. The subsequent applications illustrate how M-estimation simplifies estimation of multiple parameters in different models and, unlike standard MLE, directly provides appropriate standard errors. Often, the standardization methods used in Applications 2 and 3 (IPW and g-computation) target causal effect contrasts; here, for simplicity, we assume standard causal identification criteria are fulfilled.4,5 Code to implement these examples is available on GitHub. R, SAS and Python have procedures for implementing the necessary steps (e.g. root-finding and taking the derivative); R and Python have specific packages for M-estimation (geex12 and delicatessen13).

Logistic regression

We use data from the Zambia Preterm Birth Prevention Study.14 We analyse n = 826 pregnant participants with complete data. Let the observed data for individual $i$ be $(Y_i, X_i, W_i)$, where $Y_i$ is preterm birth, $X_i$ is early-pregnancy anaemia and $W_i$ is early-pregnancy elevated blood pressure. We want to estimate the parameters of the logistic regression model:

$$\Pr(Y_i = 1 \mid X_i, W_i) = \operatorname{expit}(\beta_0 + \beta_1 X_i + \beta_2 W_i),$$

where $\operatorname{expit}(\cdot) = 1/\{1 + \exp(-\cdot)\}$. Let $\beta = (\beta_0, \beta_1, \beta_2)$; $(1, X_i, W_i)$ is the design matrix.
Typically, we would estimate $\hat{\beta}$ by maximizing the log-likelihood of the data (i.e. MLE). Under certain conditions, maximizing the log-likelihood is equivalent to finding where the partial derivatives of the log-likelihood with respect to each parameter (i.e. the score functions) equal zero. Therefore, the estimating equations for a logistic model (or any generalized linear model) are the partial derivatives of the corresponding log-likelihood. The estimating equations here are:

$$\sum_{i=1}^{n} \{Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)\} \begin{pmatrix} 1 \\ X_i \\ W_i \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}$$

See Supplementary Material (available as Supplementary data at IJE online) for how these estimating equations were derived. Because $\beta$ consists of three parameters, there are three estimating equations. In each, the predicted probability of the outcome [i.e. $\operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)$] is subtracted from the observed outcome; this is the residual. This residual is then multiplied by the appropriate variable for that parameter (for the intercept, it is 1 for everyone). The estimating equations for most regression models take this same form: i.e. the residual times each column of the design matrix.

Now that we have our estimating equations, we rely on software for root-finding to estimate $\hat{\beta}$ and to implement the sandwich variance estimator to estimate the standard error and then construct Wald-type CIs. Table 1 includes the estimates and 95% CIs by M-estimation alongside those from MLE. The odds of preterm birth for individuals with anaemia were 1.12 times (95% CI 0.65, 1.93) the odds of preterm birth for individuals without anaemia. The odds for individuals with elevated blood pressure were 1.43 times (95% CI 0.90, 2.27) the odds for individuals without elevated blood pressure. When the assumed parametric family is correctly specified (e.g. a logistic model for a binary outcome), the M-estimation and MLE approaches are equal as n increases to infinity (i.e. are asymptotically equivalent).15 When the assumed parametric model is incorrect (e.g. a Poisson model for binary outcomes), they are no longer asymptotically equivalent. In such cases, the sandwich variance estimator is recommended as it is robust to this misspecification.16 This feature of the sandwich variance estimator has been used to correctly estimate the variance when using Poisson regression to estimate risk ratios (i.e. ‘modified’ Poisson).17,18
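For illustration, the full recipe for this application can be sketched end-to-end in Python with numpy/scipy. The data below are simulated with invented coefficients, since the study data are not reproduced here; the structure mirrors the (1, X, W) design matrix described above:

```python
# Application 1 sketch: logistic regression fitted by solving the score
# (estimating) equations by root-finding, with the empirical sandwich
# variance; data are simulated with invented coefficients.
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(0)
n = 826
x = rng.binomial(1, 0.15, n)   # anaemia (illustrative prevalence)
w = rng.binomial(1, 0.15, n)   # elevated blood pressure
y = rng.binomial(1, 1 / (1 + np.exp(1.9 - 0.1 * x - 0.35 * w)), n)  # preterm birth

design = np.column_stack([np.ones(n), x, w])   # design matrix (1, X, W)

def score(beta):
    # Estimating functions: residual times each column of the design matrix
    resid = y - 1 / (1 + np.exp(-design @ beta))
    return design * resid[:, None]             # (n, 3)

beta = root(lambda b: score(b).sum(axis=0), x0=np.zeros(3)).x

# Sandwich variance: bread by central differences, meat by outer product
eps = 1e-6
bread = np.zeros((3, 3))
for j in range(3):
    d = np.zeros(3)
    d[j] = eps
    bread[:, j] = -(score(beta + d).sum(axis=0)
                    - score(beta - d).sum(axis=0)) / (2 * eps)
meat = score(beta).T @ score(beta)
vcov = np.linalg.inv(bread) @ meat @ np.linalg.inv(bread).T
se = np.sqrt(np.diag(vcov))
ci = np.column_stack([beta - 1.96 * se, beta + 1.96 * se])  # Wald-type 95% CIs
```

With a correctly specified logistic model, these estimates and sandwich standard errors should closely match standard MLE output, consistent with the asymptotic equivalence noted above.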

Table 1. Point estimates with 95% confidence intervals from maximum likelihood and M-estimation in the first three applied examples, from R

| Parameter | MLE: logistic regression | M-estimation: logistic regression | M-estimation: IPW^a | M-estimation: g-computation^a |
|---|---|---|---|---|
| Outcome model | | | | |
| β̂0, intercept | –1.89 (–2.13, –1.66) | –1.89 (–2.13, –1.66) | | –1.89 (–2.13, –1.66) |
| β̂1, coefficient for anaemia | 0.12 (–0.43, 0.66) | 0.12 (–0.43, 0.66) | | 0.12 (–0.43, 0.66) |
| β̂2, coefficient for BP | 0.36 (–0.11, 0.82) | 0.36 (–0.11, 0.82) | | 0.36 (–0.11, 0.82) |
| Propensity score model | | | | |
| α̂0, intercept | | | –1.74 (–1.95, –1.53) | |
| α̂1, coefficient for BP | | | –0.30 (–0.83, 0.24) | |
| Parameters of interest | | | | |
| μ̂0, risk under no anaemia | | | 0.14 (0.11, 0.17) | 0.14 (0.11, 0.17) |
| μ̂1, risk under anaemia | | | 0.15 (0.09, 0.22) | 0.16 (0.09, 0.22) |
| δ̂ = μ̂1 – μ̂0, risk difference | | | 0.01 (–0.06, 0.08) | 0.02 (–0.06, 0.09) |

BP, blood pressure; IPW, inverse probability weighting; MLE, maximum likelihood estimation.

^a Differences in μ̂ and δ̂ are due to the different modelling approaches of g-computation and IPW.


Estimating the marginal risk difference

Using the same data as the prior example, we now want to estimate the marginal risk difference of anaemia on preterm birth, adjusted for (standardized by) elevated blood pressure. We illustrate two standardization approaches, IPW and g-computation. Parameter estimates are included in Table 1.

Inverse probability weighting

The top of Table 2 describes the four steps to estimate the marginal risk difference by IPW: (i) estimating $\hat{\alpha}$ in the propensity score logistic regression model; (ii) estimating the marginal risk under no anaemia, $\hat{\mu}_0$; (iii) estimating the marginal risk under anaemia, $\hat{\mu}_1$; and (iv) estimating the risk difference, $\hat{\delta}$. To use M-estimation, we translate each step into estimating equations as shown in Table 2. Step 1 is estimating a logistic regression. We derived the estimating equations for logistic regression in the prior example. Steps 2 and 3 are estimating sample means. The equation for $\delta$, in the fourth step, can be rearranged into the form of an estimating equation. The Supplementary Material (available as Supplementary data at IJE online) shows the derivation of each estimating equation. Finally, we stack these estimating equations together:

$$\sum_{i=1}^{n} \begin{pmatrix} X_i - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i) \\ \{X_i - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)\} W_i \\ \dfrac{(1 - X_i) Y_i}{1 - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)} - \hat{\mu}_0 \\ \dfrac{X_i Y_i}{\operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)} - \hat{\mu}_1 \\ \hat{\mu}_1 - \hat{\mu}_0 - \hat{\delta} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$
Table 2. Standardization by inverse probability weighting and by g-computation

Inverse probability weighting: estimation steps

1. Estimate $\hat{\alpha}$ in the propensity score logistic regression model.
   Equation: $\hat{\Pr}(X_i = 1) = \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)$
   Estimating equation: $\sum_{i=1}^{n} \{X_i - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)\} (1, W_i)^T = (0, 0)^T$

2. Estimate the standardized risk under no anaemia by taking the mean of $Y$ among individuals with $X = 0$, weighted by the inverse of $\hat{\Pr}(X_i = 0 \mid W_i)$.
   Equation: $\hat{\mu}_0 = \frac{1}{n}\sum_{i=1}^{n} \frac{(1 - X_i) Y_i}{1 - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)}$
   Estimating equation: $\sum_{i=1}^{n} \left\{ \frac{(1 - X_i) Y_i}{1 - \operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)} - \hat{\mu}_0 \right\} = 0$

3. Estimate the standardized risk under anaemia by taking the mean of $Y$ among individuals with $X = 1$, weighted by the inverse of $\hat{\Pr}(X_i = 1 \mid W_i)$.
   Equation: $\hat{\mu}_1 = \frac{1}{n}\sum_{i=1}^{n} \frac{X_i Y_i}{\operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)}$
   Estimating equation: $\sum_{i=1}^{n} \left\{ \frac{X_i Y_i}{\operatorname{expit}(\hat{\alpha}_0 + \hat{\alpha}_1 W_i)} - \hat{\mu}_1 \right\} = 0$

4. Estimate the risk difference.
   Equation: $\hat{\delta} = \hat{\mu}_1 - \hat{\mu}_0$
   Estimating equation: $\sum_{i=1}^{n} (\hat{\mu}_1 - \hat{\mu}_0 - \hat{\delta}) = 0$

G-computation: estimation steps

1. Estimate $\hat{\beta}$ in the outcome logistic regression model.
   Equation: $\hat{\Pr}(Y_i = 1) = \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)$
   Estimating equation: $\sum_{i=1}^{n} \{Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)\} (1, X_i, W_i)^T = (0, 0, 0)^T$

2. Estimate the standardized risk under no anaemia by taking the mean of the predicted probabilities of $Y$ using $\hat{\beta}$ and setting $X = 0$.
   Equation: $\hat{\mu}_0 = \frac{1}{n}\sum_{i=1}^{n} \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 0 + \hat{\beta}_2 W_i)$
   Estimating equation: $\sum_{i=1}^{n} \{\operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 0 + \hat{\beta}_2 W_i) - \hat{\mu}_0\} = 0$

3. Estimate the standardized risk under anaemia by taking the mean of the predicted probabilities of $Y$ using $\hat{\beta}$ and setting $X = 1$.
   Equation: $\hat{\mu}_1 = \frac{1}{n}\sum_{i=1}^{n} \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 1 + \hat{\beta}_2 W_i)$
   Estimating equation: $\sum_{i=1}^{n} \{\operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 1 + \hat{\beta}_2 W_i) - \hat{\mu}_1\} = 0$

4. Estimate the risk difference.
   Equation: $\hat{\delta} = \hat{\mu}_1 - \hat{\mu}_0$
   Estimating equation: $\sum_{i=1}^{n} (\hat{\mu}_1 - \hat{\mu}_0 - \hat{\delta}) = 0$

Here, M-estimation produces point estimates and standard errors (and thus CIs) for all the parameters $\hat{\theta} = (\hat{\alpha}_0, \hat{\alpha}_1, \hat{\mu}_0, \hat{\mu}_1, \hat{\delta})$. Using the sandwich variance estimator, the estimated standard errors for $\hat{\mu}_0$ and $\hat{\mu}_1$ appropriately account for the uncertainty in $\hat{\alpha}$. For $\hat{\delta}$, the sandwich variance estimator is an implementation of the delta method (because this estimating equation is only a function of other parameters and not directly a function of the data), incorporating uncertainty in $\hat{\mu}_0$ and $\hat{\mu}_1$.7

In practice, it is common to estimate so-called ‘robust’ variances when implementing IPW.19 Such variances are estimated using the sandwich variance estimator where the stack of estimating equations does not include the estimating equations for the propensity score model parameters, thus treating the propensity score parameters as known. With IPW for the risk difference, these robust variances are conservative (i.e. larger than the empirical variance).20 In contrast, M-estimation provides a consistent variance estimator when including the propensity score model estimating equations. When using odds weights (e.g. to standardize to individuals with anaemia), the robust standard errors may be anticonservative, whereas M-estimation still provides a consistent variance estimator.20
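The full IPW stack in Table 2 can be sketched on simulated data with invented parameter values (plain numpy/scipy rather than a dedicated M-estimation package). Because the propensity score rows are part of the stack, the sandwich variance for the risk difference automatically propagates the uncertainty in the propensity score parameters:

```python
# IPW as stacked estimating equations (Table 2, top): propensity model,
# weighted risks, and risk difference solved jointly; data are simulated.
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(1)
n = 2000
w = rng.binomial(1, 0.2, n)                                         # confounder
x = rng.binomial(1, 1 / (1 + np.exp(1.5 + 0.3 * w)), n)             # exposure
y = rng.binomial(1, 1 / (1 + np.exp(1.9 - 0.1 * x - 0.35 * w)), n)  # outcome

def expit(z):
    return 1 / (1 + np.exp(-z))

def stacked_ef(theta):
    a0, a1, mu0, mu1, delta = theta
    ps = expit(a0 + a1 * w)                       # propensity score
    ef = np.empty((n, 5))
    ef[:, 0] = x - ps                             # PS model score: intercept
    ef[:, 1] = (x - ps) * w                       # PS model score: W
    ef[:, 2] = (1 - x) * y / (1 - ps) - mu0       # weighted risk, X = 0
    ef[:, 3] = x * y / ps - mu1                   # weighted risk, X = 1
    ef[:, 4] = mu1 - mu0 - delta                  # risk difference
    return ef

init = np.array([0.0, 0.0, 0.1, 0.1, 0.0])
theta = root(lambda t: stacked_ef(t).sum(axis=0), x0=init).x

# Sandwich variance over the whole stack: the SE for the risk difference
# accounts for the estimation of the propensity score parameters
eps = 1e-6
bread = np.zeros((5, 5))
for j in range(5):
    d = np.zeros(5)
    d[j] = eps
    bread[:, j] = -(stacked_ef(theta + d).sum(axis=0)
                    - stacked_ef(theta - d).sum(axis=0)) / (2 * eps)
meat = stacked_ef(theta).T @ stacked_ef(theta)
vcov = np.linalg.inv(bread) @ meat @ np.linalg.inv(bread).T
se_delta = np.sqrt(vcov[4, 4])
```

Dropping the first two rows of the stack (treating the propensity score as known) would reproduce the conservative ‘robust’ variance described above.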

G-computation

The bottom of Table 2 describes the four steps to estimate the marginal risk difference by g-computation21 and the estimating equations corresponding to each step. The translation of each step follows from the previous examples and is shown in detail in the Supplementary Material (available as Supplementary data at IJE online). Finally, we stack these estimating equations on top of each other:

$$\sum_{i=1}^{n} \begin{pmatrix} Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i) \\ \{Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)\} X_i \\ \{Y_i - \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 X_i + \hat{\beta}_2 W_i)\} W_i \\ \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 0 + \hat{\beta}_2 W_i) - \hat{\mu}_0 \\ \operatorname{expit}(\hat{\beta}_0 + \hat{\beta}_1 \times 1 + \hat{\beta}_2 W_i) - \hat{\mu}_1 \\ \hat{\mu}_1 - \hat{\mu}_0 - \hat{\delta} \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}$$

Here, M-estimation produces point estimates and standard errors for all the parameters $\hat{\theta} = (\hat{\beta}_0, \hat{\beta}_1, \hat{\beta}_2, \hat{\mu}_0, \hat{\mu}_1, \hat{\delta})$. If we had alternatively used MLE to estimate the outcome model in Step 1 (Table 2), we would typically rely on the bootstrap to estimate standard errors for the risks and the risk difference. Although generally easy to implement, the bootstrap can be computationally inefficient, particularly with large sample sizes, as it requires resampling the data and recomputing the estimator in each resample. Instead, M-estimation only requires numerically approximating the derivatives and the matrix algebra operations for the sandwich variance estimator, making it more computationally efficient than the bootstrap.
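The g-computation stack can be sketched the same way on simulated data with invented values (the sandwich step, omitted here for brevity, is identical in form to the bread/meat computation described earlier):

```python
# G-computation as stacked estimating equations (Table 2, bottom),
# solved jointly by root-finding; data are simulated for illustration.
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(3)
n = 2000
w = rng.binomial(1, 0.2, n)                                         # confounder
x = rng.binomial(1, 1 / (1 + np.exp(1.5 + 0.3 * w)), n)             # exposure
y = rng.binomial(1, 1 / (1 + np.exp(1.9 - 0.1 * x - 0.35 * w)), n)  # outcome

def expit(z):
    return 1 / (1 + np.exp(-z))

def stacked_ef(theta):
    b0, b1, b2, mu0, mu1, delta = theta
    pred = expit(b0 + b1 * x + b2 * w)
    ef = np.empty((n, 6))
    ef[:, 0] = y - pred                       # outcome model score: intercept
    ef[:, 1] = (y - pred) * x                 # outcome model score: X
    ef[:, 2] = (y - pred) * w                 # outcome model score: W
    ef[:, 3] = expit(b0 + b2 * w) - mu0       # predicted risk with X set to 0
    ef[:, 4] = expit(b0 + b1 + b2 * w) - mu1  # predicted risk with X set to 1
    ef[:, 5] = mu1 - mu0 - delta              # risk difference
    return ef

init = np.array([0.0, 0.0, 0.0, 0.1, 0.1, 0.0])
theta = root(lambda t: stacked_ef(t).sum(axis=0), x0=init).x
mu0_hat, mu1_hat, delta_hat = theta[3], theta[4], theta[5]
```

Note that rows 4 and 5 average the model's predicted risks with the exposure set to 0 and 1, exactly mirroring steps 2 and 3 of Table 2.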

Addressing outcome misclassification

Here, we leverage external validation data to correct for outcome misclassification when estimating the prevalence of the outcome. We include only a summary of the approach in the main text, with technical details in the Supplementary Material (available as Supplementary data at IJE online).

Specifically, we want to estimate the point prevalence of HIV treatment for 950 HIV-positive adults in the Multicenter AIDS Cohort Study/Women’s Interagency HIV Study (MACS/WIHS) combined cohort study in 1995.5,22 However, only self-reported treatment is available, which is a potentially misclassified measurement of our outcome of interest. In an external sample (two studies pooled: one in the MACS and one in the University of North Carolina Center for AIDS Research HIV cohort study), we have data on both the gold standard measure of HIV treatment (based on medical and pharmacy records) and self-reported treatment for 331 individuals. This is an example of data fusion, where multiple data sources (here, the study sample and external validation data) are combined to estimate the parameter of interest.

To correct for the measurement error, we use the Rogan–Gladen equation, which corrects the probability of the mismeasured outcome using the sensitivity and specificity of the measurement. In the Supplementary Material (available as Supplementary data at IJE online), we provide the Rogan–Gladen equation, illustrate derivation of the estimating equations and provide the full stack of equations. The stack includes equations for four parameters: (i) the mismeasured outcome prevalence in the MACS/WIHS; (ii) the sensitivity from the external sample; (iii) the specificity from the external sample; and (iv) the corrected outcome prevalence in the MACS/WIHS, our parameter of interest.

Using M-estimation (and the sandwich variance estimator), the estimated variance for the corrected prevalence appropriately incorporates the uncertainty in the estimates of sensitivity and specificity from the external sample. An alternative for variance estimation would be to use the bootstrap. Because we have two samples (the MACS/WIHS sample and the external sample), the bootstrap would entail resampling separately from each sample. This example highlights that M-estimation can be used when data are not independent and identically distributed (i.e. IID), which is generally the case in data fusion analyses.

Using M-estimation, the naïve prevalence of treatment based on self-report (i.e. subject to measurement error) is 0.72 (95% CI 0.69, 0.74). The estimated sensitivity and specificity are 0.84 (95% CI 0.80, 0.89) and 0.80 (95% CI 0.71, 0.88), respectively. The measurement-error-corrected prevalence of treatment is 0.80 (95% CI 0.72, 0.88).
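The fusion estimator can be sketched with the same machinery. Below, simulated stand-ins for the two samples (values invented to echo the estimates above) are stacked with a sample indicator R, and the sandwich variance for the corrected prevalence carries the uncertainty in the estimated sensitivity and specificity:

```python
# Rogan-Gladen correction as a stacked M-estimator over two fused samples;
# all data are simulated for illustration (r = 1 marks the main sample).
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(4)
n_main, n_val = 950, 331
true_sens, true_spec, true_prev = 0.84, 0.80, 0.80

def misclassify(y_true, n):
    # Self-report is correct with prob = sensitivity (y = 1) or specificity (y = 0)
    return np.where(y_true == 1,
                    rng.binomial(1, true_sens, n),
                    rng.binomial(1, 1 - true_spec, n))

y_main = rng.binomial(1, true_prev, n_main)    # gold standard, unobserved in main
y_val = rng.binomial(1, true_prev, n_val)      # gold standard, observed in validation
r = np.concatenate([np.ones(n_main), np.zeros(n_val)])
y = np.concatenate([np.zeros(n_main), y_val])  # gold standard; placeholder 0 where unobserved
ystar = np.concatenate([misclassify(y_main, n_main), misclassify(y_val, n_val)])

def stacked_ef(theta):
    p_star, sens, spec, prev = theta
    ef = np.empty((r.size, 4))
    ef[:, 0] = r * (ystar - p_star)                            # naive prevalence (main)
    ef[:, 1] = (1 - r) * y * (ystar - sens)                    # sensitivity (validation)
    ef[:, 2] = (1 - r) * (1 - y) * ((1 - ystar) - spec)        # specificity (validation)
    ef[:, 3] = (p_star + spec - 1) / (sens + spec - 1) - prev  # Rogan-Gladen correction
    return ef

theta = root(lambda t: stacked_ef(t).sum(axis=0), x0=np.array([0.5, 0.8, 0.8, 0.5])).x

# Sandwich variance: the SE of the corrected prevalence (theta[3]) reflects
# the uncertainty in sensitivity and specificity from the validation sample
eps = 1e-6
bread = np.zeros((4, 4))
for j in range(4):
    d = np.zeros(4)
    d[j] = eps
    bread[:, j] = -(stacked_ef(theta + d).sum(axis=0)
                    - stacked_ef(theta - d).sum(axis=0)) / (2 * eps)
meat = stacked_ef(theta).T @ stacked_ef(theta)
vcov = np.linalg.inv(bread) @ meat @ np.linalg.inv(bread).T
se_prev = np.sqrt(vcov[3, 3])
```

The indicator r restricts each estimating function to the sample that informs it, which is how the stack handles the non-IID, two-sample structure without any resampling.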

Discussion

Here, we introduced M-estimation and provided four examples illustrating its application to address common problems in epidemiology. Further examples and technical details are available.2,5,7,13 MLE is a sensible default and is easy to implement for many analytical approaches used in epidemiology. However, there are settings where the flexibility of M-estimation may be preferred, including estimating multiple parameters simultaneously by stacking equations together, automating the delta method, or handling data that are not independent and identically distributed. The computational efficiency of M-estimation over the bootstrap may be meaningful when paired with multiple imputation for missing data or when conducting sensitivity analyses across a range of inputs.

M-estimation has limitations. First, as shown, there is an estimating equation for each parameter, meaning that theoretically M-estimation only applies to a finite number of parameters. Therefore, standard M-estimation is not justified for some nonparametric applications, like the Kaplan–Meier estimator. Second, the sandwich variance estimator involves taking the derivative of the estimating equation, so the estimating equation must be differentiable. The estimating equation for the median, for example, is not differentiable.2 Third, M-estimation cannot be used when the estimating equation depends on i. Such is the case with the Cox proportional hazards model.23 Finally, M-estimation is based on large-sample theory and may not be unbiased in small samples.

Despite these restrictions, M-estimation is a flexible and computationally efficient estimation procedure that is a powerful addition to the epidemiologist’s toolbox.

Data availability

The data underlying this article are available in GitHub, at [https://github.com/rachael-k-ross/Mestimation-worked-example].

Supplementary data

Supplementary data are available at IJE online

Author contributions

All authors designed the tutorial. P.Z. and R.R. created the code. R.R. drafted the paper. All authors critically revised the paper.

Funding

R.R. is supported in part by a grant from the National Institute of Drug Abuse (R01DA056407). P.Z. is supported by a training grant from the National Institute of Allergy and Infectious Diseases (T32AI007001) and in part by a grant from the National Institute of Allergy and Infectious Diseases (K01AI177102). S.C. is supported in part by a grant from the National Institute of Allergy and Infectious Diseases (R01AI157758). J.S.A.S. is supported by grants from the National Institutes of Health (5P30AI050410) and the Gates Foundation (INV016221).

Conflict of interest

None declared.

References

1. Cole SR, Chu H, Greenland S. Maximum likelihood, profile likelihood, and penalized likelihood: a primer. Am J Epidemiol 2014;179:252–60.

2. Stefanski LA, Boos DD. The calculus of M-estimation. Am Stat 2002;56:29–38.

3. Naimi AI, Cole SR, Kennedy EH. An introduction to g methods. Int J Epidemiol 2017;46:756–62.

4. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton: Chapman & Hall/CRC, 2020.

5. Cole SR, Edwards JK, Breskin A et al. Illustration of two fusion designs and estimators. Am J Epidemiol 2022;192:467–74.

6. Kulesa A, Krzywinski M, Blainey P, Altman N. Sampling distributions and the bootstrap. Nat Methods 2015;12:477–78.

7. Boos DD, Stefanski LA. Chapter 7. M-estimation (estimating equations). In: Essential Statistical Inference: Theory and Methods. New York: Springer, 2013, pp. 297–337.

8. Press WH, Teukolsky SA, Vetterling WT, Flannery BP. Root finding and nonlinear sets of equations. In: Numerical Recipes: The Art of Scientific Computing. 3rd edn. Cambridge: Cambridge University Press, 2007, pp. 442–486.

9. Virtanen P, Gommers R, Oliphant TE et al.; SciPy 1.0 Contributors. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods 2020;17:261–72.

10. Soetaert K, Hindmarsh AC, Eisenstat SC, Moler C, Dongarra J, Saad Y. rootSolve: Nonlinear Root Finding, Equilibrium and Steady-State Analysis of Ordinary Differential Equations. 2021. https://cran.r-project.org/package=rootSolve (6 February 2024, date last accessed).

11. Rothman KJ, Greenland S, Lash TL. Modern Epidemiology. Philadelphia, PA: Lippincott Williams & Wilkins, 2008.

12. Saul BC, Hudgens MG. The calculus of M-estimation in R with geex. J Stat Softw 2020;92:1–15.

13. Zivich PN, Klose M, Cole SR, Edwards JK, Shook-Sa BE. Delicatessen: M-estimation in Python. arXiv, doi:arXiv:2203.11300, 10 October 2022, preprint: not peer reviewed. http://arxiv.org/abs/2203.11300 (6 February 2024, date last accessed).

14. Castillo MC, Fuseini NM, Rittenhouse KJ et al. Zambian Preterm Birth Prevention Study (ZAPPS): cohort characteristics at enrollment. Gates Open Res 2018;2:25.

15. Mansournia MA, Nazemipour M, Naimi AI, Collins GS, Campbell MJ. Reflection on modern methods: demystifying robust standard errors for epidemiologists. Int J Epidemiol 2021;50:346–51.

16. Royall RM. Model robust confidence intervals using maximum likelihood estimators. Int Stat Rev 1986;54:221.

17. Zou G. A modified Poisson regression approach to prospective studies with binary data. Am J Epidemiol 2004;159:702–706.

18. McNutt L-A, Wu C, Xue X, Hafner JP. Estimating the relative risk in cohort studies and clinical trials of common outcomes. Am J Epidemiol 2003;157:940–43.

19. Robins JM, Hernán MA, Brumback B. Marginal structural models and causal inference in epidemiology. Epidemiology 2000;11:550–60.

20. Reifeis SA, Hudgens MG. On variance of the treatment effect in the treated when estimated by inverse probability weighting. Am J Epidemiol 2022;191:1–6.

21. Snowden JM, Rose S, Mortimer KM. Implementation of G-computation on a simulated data set: demonstration of a causal inference technique. Am J Epidemiol 2011;173:731–38.

22. Cole SR, Jacobson LP, Tien PC, Kingsley L, Chmiel JS, Anastos K. Using marginal structural measurement-error models to estimate the long-term effect of antiretroviral therapy on incident AIDS or death. Am J Epidemiol 2010;171:113–22.

23. Lin DY, Wei LJ. The robust inference for the Cox proportional hazards model. J Am Stat Assoc 1989;84:1074–78.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic.oup.com/pages/standard-publication-reuse-rights)
