Conditional Inference in Cis -Mendelian Randomization Using Weak Genetic Factors

Est. gives the point estimate of the method if applicable.

					F-AR	F-LM	F-CLR
Est.	0.131	0.132	0.133	0.129	-	-	-
CI-L	0.052	0.052	0.051	0.045	-0.050	0.052	0.051
CI-U	0.210	0.211	0.215	0.220	0.334	0.213	0.215
Q-stat.	0.997	0.997	0.964	0.945	0.998*	0.999*	0.999*

					F-AR	F-LM	F-CLR
Est.	0.131	0.132	0.133	0.129	-	-	-
CI-L	0.052	0.052	0.051	0.045	-0.050	0.052	0.051
CI-U	0.210	0.211	0.215	0.220	0.334	0.213	0.215
Q-stat.	0.997	0.997	0.964	0.945	0.998*	0.999*	0.999*

Note: CI-L and CI-U are the lower and upper estimated 95% confidence intervals. The brackets after S-LIML indicate the threshold level υ taken. formula gives the p-value associated with testing the null of no heterogeneity in instrument–LDL-C and instrument–CHD associations using the Sargan–Hansen test; see the discussion in Section 5.

TABLE 1

Est. gives the point estimate of the method if applicable.

					F-AR	F-LM	F-CLR
Est.	0.131	0.132	0.133	0.129	-	-	-
CI-L	0.052	0.052	0.051	0.045	-0.050	0.052	0.051
CI-U	0.210	0.211	0.215	0.220	0.334	0.213	0.215
Q-stat.	0.997	0.997	0.964	0.945	0.998*	0.999*	0.999*

					F-AR	F-LM	F-CLR
Est.	0.131	0.132	0.133	0.129	-	-	-
CI-L	0.052	0.052	0.051	0.045	-0.050	0.052	0.051
CI-U	0.210	0.211	0.215	0.220	0.334	0.213	0.215
Q-stat.	0.997	0.997	0.964	0.945	0.998*	0.999*	0.999*

In Table 1, the confidence intervals for S-LIML and identification-robust methods are obtained by test inversion. The 95% asymptotic confidence intervals for the F-CLR and F-LM tests are similar to the S-LIML intervals, while the F-AR approach is much less precise and is unable to reject the null hypothesis of no causal association (F-AR p-value: 0.455).

The results of alternative summary data methods are presented in Table 2. We find that DecorrIVW and CLR with correlated variants are quite sensitive to the pruning threshold chosen, with the 95% asymptotic confidence interval of DecorrIVW-1 not overlapping with the DecorrIVW-2 interval.

Our simulation study illustrated how our factor-based approaches were relatively robust to biases from direct variant effects on the outcome. The heterogeneity plots in Figure 3 can provide insight on the coherency of evidence across multiple instruments. A more formal method to test for excessive heterogeneity uses the Sargan–Hansen (1982) test. Table 1 shows that the F-LIML and S-LIML approaches provide strong evidence of no heterogeneity when using estimated factors as instruments. Since identification-robust methods do not provide point estimates, for the starred entries in the last rows of Tables 1 and 2, we evaluated the Sargan–Hansen test statistic (1982) at the mid-point of the relevant confidence interval. There was no ‘degrees of freedom’ correction for this substitution which should lead to more conservative p-values (i.e., we are less likely to reject the null of no heterogeneity). Despite this, more liberal pruning-based approaches show evidence of greater heterogeneity when considering individual variants as instruments.

TABLE 2

Est. gives the point estimate of the method if applicable.

	CLR-0	CLR-1	CLR-2	CLR-4	RAPS-0	DIVW-0	DecorrIVW-1	DecorrIVW-2
Est.	–	–	–	–	0.122	0.120	0.068	0.117
CI-L	0.056	0.021	0.088	0.107	0.081	0.080	0.043	0.101
CI-U	0.218	0.120	0.153	0.147	0.163	0.160	0.092	0.133
Q-stat.	0.282*	0.610*	0.391*	0.012*	0.252	0.252	0.413	0.000

	CLR-0	CLR-1	CLR-2	CLR-4	RAPS-0	DIVW-0	DecorrIVW-1	DecorrIVW-2
Est.	–	–	–	–	0.122	0.120	0.068	0.117
CI-L	0.056	0.021	0.088	0.107	0.081	0.080	0.043	0.101
CI-U	0.218	0.120	0.153	0.147	0.163	0.160	0.092	0.133
Q-stat.	0.282*	0.610*	0.391*	0.012*	0.252	0.252	0.413	0.000

Note: CI-L and CI-U are the lower and upper estimated 95% confidence intervals. formula gives the p-value associated with testing the null of no heterogeneity in instrument–LDL-C and instrument–CHD associations using the Sargan–Hansen test; see the discussion in Section 5.

TABLE 2

Est. gives the point estimate of the method if applicable.

	CLR-0	CLR-1	CLR-2	CLR-4	RAPS-0	DIVW-0	DecorrIVW-1	DecorrIVW-2
Est.	–	–	–	–	0.122	0.120	0.068	0.117
CI-L	0.056	0.021	0.088	0.107	0.081	0.080	0.043	0.101
CI-U	0.218	0.120	0.153	0.147	0.163	0.160	0.092	0.133
Q-stat.	0.282*	0.610*	0.391*	0.012*	0.252	0.252	0.413	0.000

	CLR-0	CLR-1	CLR-2	CLR-4	RAPS-0	DIVW-0	DecorrIVW-1	DecorrIVW-2
Est.	–	–	–	–	0.122	0.120	0.068	0.117
CI-L	0.056	0.021	0.088	0.107	0.081	0.080	0.043	0.101
CI-U	0.218	0.120	0.153	0.147	0.163	0.160	0.092	0.133
Q-stat.	0.282*	0.610*	0.391*	0.012*	0.252	0.252	0.413	0.000

Overall, our findings provide robust evidence that genetically-predicted lower LDL-C levels using variants in the CETP gene region are associated with a lower risk of CHD.

6. Conclusion

There is an increasing focus on using cis-MR analyses to guide drug development; genetic evidence may be crucial to support novel targets, precision medicine subgroups, and the design of expensive clinical trials (Gill et al., 2021). The use of a few uncorrelated variants as instruments may lead to inferences which are vulnerable to direct variant effects on the outcome. On the other hand, using correlated variants as instruments may result in unstable inferences which are particularly sensitive to common problems of misspecified variant correlations. We believe that our factor-based approach provides a robust and practical way to assess the general weight of evidence for a causal association from the gene region of interest. Given its reliable performance in simulation, we recommend the use of the F-CLR test alongside existing robust methods in cis-MR investigations, especially in settings where there are multiple genetic signals for the exposure in the gene region.

A limitation of our approach is that we require the knowledge of a large genetic correlation matrix of the gene region, and systematic misspecification of these correlations (e.g., due to structural differences in the two sampled populations) may lead to biased inferences. At the same time, in settings where all variants are valid instruments, methods which use individual variants as instruments, rather than genetic factors, tend to provide a more powerful analysis, especially in cases where there are only a few strong genetic signals for the exposure. Furthermore, when the quality of available genetic correlation estimates is in doubt or there is only one strong genetic signal, it may be sensible to compare results from methods using correlated variants to those of the Wald ratio estimator based on the variant most strongly associated with the exposure. Finally, our approach assumes that the true number of factors is known. We leave for future the work the problem of formalizing a potential bias-variance trade off associated with the use of additional factors as instruments, and developing a procedure that determines an optimal selection.

Data Availability Statement

The data that support the findings in this paper are openly available. Summary data on coronary artery disease from CARDIoGRAMplusC4D investigators were accessed at http://www.CARDIOGRAMPLUSC4D.ORG. Summary data on low-density lipoprotein cholesterol concentrations from Neale Lab's analysis of UK Biobank data were accessed at http://www.nealelab.is/uk-biobank/. Data on genetic variant correlations from The 1000 Genomes Project Consortium (1000 Genomes Project, Auton et al., 2015) were accessed using the twosampleMR R software package (Hemani et al., 2018).

Acknowledgments

Ashish Patel and Paul Newcombe were funded by the UK Medical Research Council (programme number MC-UU-00002-9). Paul Newcombe also acknowledges support from the NIHR Cambridge BRC. Stephen Burgess was supported by Sir Henry Dale Fellowship jointly funded by the Wellcome Trust and the Royal Society (grant number 204623-Z-16-Z). Dipender Gill was funded by the Wellcome 4i Program at Imperial College London (award number 203928-Z-16-Z), the British Heart Foundation Centre of Research Excellence (RE-18-4-34215) at Imperial College London, and a National Institute for Health Research Clinical Lectureship at St George' s, University of London (CL-2020-16-001). We thank Jack Bowden, Francis DiTraglia, Apostolos Gkatzionis, Alexei Onatski, Richard Smith, and Chris Wallace for helpful discussions. We thank two anonymous referees and the associate editor for detailed comments.

References

Anderson

T.W.

Rubin

(

1949

)

Estimation of the parameters of a single equation in a complete system of stochastic equations

The Annals of Mathematical Statistics

–

Andrews

D.W.

Moreira

M.J.

Stock

J.H.

(

2006

)

Optimal two-sided invariant similar tests for instrumental variables regression

Econometrica

715

–

752

Andrews

D.W.

Moreira

M.J.

Stock

J.H.

(

2007

)

Performance of conditional Wald tests in IV regression with weak instruments

Journal of Econometrics

139

116

–

132

Andrews

Stock

J.H.

Sun

(

2019

)

Weak instruments in instrumental variables regression: theory and practice

Annual Review of Economics

727

–

753

Auton

Abecasis

G.R.

Altshuler

D.M.

Durbin

R.M.

Bentley

D.R.

McEwen

J.E.

(

2015

)

A global reference for human genetic variation

Nature

526

–

Bai

(

2003

)

Inferential theory for factor models of large dimensions

Econometrica

135

–

171

Bai

(

2002

)

Determining the number of factors in approximate factor models

Econometrica

191

–

221

Bai

(

2010

)

Instrumental variable estimation in a data rich environment

Econometric Theory

1577

–

1606

Bi, N., Kang, H. & Taylor, J. (2020) Inferring treatment effects after testing instrument strength in linear models. arXiv:2003.06723 (pre-print),

–

Bowden

Del Greco M

Minelli

Zhao

Lawlor

D.A.

Sheehan

N.A.

(

2019

)

Improving the accuracy of two-sample summary-data Mendelian randomization: moving beyond the NOME assumption

International Journal of Epidemiology

728

–

742

Bowman

Hopewell

J.C.

Chen

Wallendszus

Stevens

Collins

R.T.

HPS3 and TIMI55 REVEAL Collaborative Group

. (

2017

)

Effects of anacetrapib in patients with atherosclerotic vascular disease

New England Journal of Medicine

377

1217

–

1227

Burgess

Butterworth

Thompson

S.G.

(

2013

)

Mendelian randomization analysis with multiple genetic variants using summarized data

Genetic Epidemiology

658

–

665

Burgess

Mason

A.M.

Grant

Slob

E. A.W.

Gkatzionis

Gill

(

2023

)

Using genetic association data to guide drug discovery and development: review of methods and applications

American Journal of Human Genetics

110

195

–

214

Chao

J.C.

Swanson

N.R.

(

2005

)

Consistent estimation with a large number of weak instruments

Econometrica

1673

–

1692

Fithian, W., Sun, D.L. & Taylor, J. (2017) Optimal inference after model selection. arXiv:1410.2597 (pre-print),

–

Gill

Georgakis

M.K.

Walker

V.M.

Schmidt

A.F.

Gkatzionis

Davies

N.M.

(

2021

)

Mendelian randomization for studying the effects of perturbing drug targets

Wellcome Open Research

–

Goering

H.H.

Terwilliger

J.D.

Blangero

(

2002

)

Large upward bias in estimation of locus-specific effects from genome-wide scans

The American Journal of Human Genetics

1357

–

1369

Guggenberger

Kumar

(

2012

)

On the size distortion of tests after an overidentifying restrictions pretest

Journal of Applied Econometrics

1138

–

1160

Hansen

L.P.

(

1982

)

Large sample properties of generalized method of moments estimators

Econometrica

1029

–

1054

Hemani

Bowden

Davey Smith

(

2018

)

Evaluating the potential role of pleiotropy in Mendelian randomization studies

Human Molecular Genetics

R195

–

R208

Hemani

, Zheng. J.,

Elsworth

Wade

K.H.

Haberland

Baird

Laurin

Burgess

Bowden

Langdon

Tan

V.Y.

Yarmolinsky

Shihab

H.A.

Timpson

N.J.

Evans

D.M.

Relton

Martin

R.M.

Davey Smith

Gaunt

T.R.

Haycock

P.C.

(

2018

)

The MR-Base platform supports systematic causal inference across the human phenome

eLife

–

Kleibergen

(

2005

)

Testing parameters in GMM without assuming that they are identified

Econometrica

1103

–

1123

Moreira

M.J.

(

2003

)

A conditional likelihood ratio test for structural models

Econometrica

1027

–

1048

Mounier

Kutalik

(

2023

)

Bias correction for inverse variance weighting Mendelian randomization

Genetic Epidemiology

–

Newey

W.K.

Windmeijer

(

2009

)

Generalized method of moments with many weak moment conditions

Econometrica

687

–

719

Nikpay et al., M.

(

2015

)

A comprehensive 1000 Genomes-based genome-wide association meta-analysis of coronary artery disease

Nature Genetics

1121

–

Onatski

(

2010

)

Determining the number of factors from empirical distribution of eigenvalues

Review of Economics and Statistics

1004

–

1016

Sampson

A.R.

Sill

M.W.

(

2005

)

Drop-the-losers design: normal case

Biometrical Journal

257

–

268

Schmidt

A.F.

Hunt

N.B.

Gordillo-Maranon

Charoen

Drenos

Finan

(

2021

)

Cholesteryl Ester Transfer Protein (CETP) as a drug target for cardiovascular disease

Nature Communications

–

Stelzer

Rosen

Plaschkes

Zimmerman

Twik

Fishilevich

Stein

T.I.

Nudel

Lieder

Mazor

Kaplan

Dahary

Warshawsky

Guan-Golan

Kohn

Rappaport

Safran

Lancet

(

2016

)

The GeneCards suite: From gene data mining to disease genome sequence analyses

Current Protocols in Bioinformatics

–

Stock

J.H.

Wright

J.H.

Yogo

(

2002

)

A survey of weak instruments and weak identification in generalized method of moments

Journal of Business and Economic Statistics

518

–

529

Sudlow

Gallacher

Allen

Beral

Burton

Danesh

(

2015

)

UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age

PLoS Medicine

–

Swerdlow

D.I.

Kuchenbaecker

K.B.

Shah

Sofat

Holmes

M.V.

Hingorani

A.D.

(

2016

)

Selecting instruments for Mendelian randomization in the wake of genome-wide association studies

International Journal of Epidemiology

1600

–

1616

Walker

V.M.

Smith

G.D.

Davies

N.M.

Martin

R.M.

(

2017

)

Mendelian randomization: a novel approach for the prediction of adverse drug events and drug repurposing opportunities

International Journal of Epidemiology

2078

–

2089

Wang

Kang

(

2022

)

Weak-instrument robust tests in two-sample summary-data Mendelian randomization

Biometrics

1699

–

1713

Yang

Ferreira

Morris

A.P.

Medland

S.E.

, Genetic Investigation of ANthropometric Traits (GIANT)

Consortium

DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) Consortium

Madden

P.A.

Heath

A.C.

Martin

N.G.

Montgomery

G.W.

Weedon

M.N.

Loos

R.J.

Frayling

T.M.

McCarthy

M.I.

Hirschhorn

J.N.

Goddard

M.E.

Visscher

P.M.

(

2012

)

Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits

Nature Genetics

369

–

375

B.T.

Shao

Kang

(

2021

)

Debiased inverse-variance weighted estimator in two-sample summary-data Mendelian randomization

Annals of Statistics

2079

–

2100

Zhao

Wang

Hemani

Bowden

Small

D.S.

(

2020

)

Statistical inference in two-sample summary-data Mendelian randomization using robust adjusted profile score

Annals of Statistics

1742

–

1769