Gemma Derrick

Research into REF evaluator behaviour offers three key pieces of advice for REF2021


People always ask me: what is impact? Indeed, this question seems to be the key concern of many researchers and research administrators globally, and even more so for UK researchers in the lead-up to REF2021.

I do not want to disappoint those who seek answers within this humble blog, but I am not going to answer that question here. Indeed, I cannot answer it, and neither can anyone else, despite what they may tell you.

That is because, for the purposes of research evaluation, there is a key component missing from the question of what is impact: impact is not impact until someone says that it is.

What is missing is an appreciation of how impact appeals to those who not only recognise it as impact, but decide whether or not to value it as such. I am talking here about those who officially evaluate impact in processes such as the UK’s Research Excellence Framework (REF).

[Image: Flickr/AJ Cann, CC BY-SA]

In REF2014, impact was valued by a group of evaluators using peer review, a process that is understood by all researchers. Theoretically, peer review guarantees both a fair (I know, I did say “theoretically”) and legitimate outcome, but is it the right process for assessing impact, given that so few of us understand what impact is in the first place?

This is the question I address in my new book, “The Evaluators’ Eye: Impact Assessment and Academic Peer Review”, published by Palgrave Macmillan. Using data from the REF2014 evaluation, I argue that the meaning of impact and the system used to value it are two very different things. When impact is evaluated using peer review, the outcome depends on who the evaluators were, as well as on how the group assessed impact together. In short, deciding what is impact is not, and cannot be, divorced from the process used to assess it and the behaviour of the evaluators as they negotiate evaluation outcomes.

Unlike traditional outcomes of research, impact as a concept is ambiguous. Give a peer reviewer, or a research “expert”, any article, grant or promotion application and ask them to assess it on the basis of its research excellence, and they will do so happily with little supervision. Ask the same for impact and the process becomes more complex. Indeed, with little personal or professional understanding of the criteria (these are the researchers still asking, “what is impact?”), evaluators have little or no basis for their assessment. For many this is because they have never evaluated impact before, while others have only limited experience of evaluating some aspects of it.

This challenge is not new. Lamont’s (2009) and Luukkonen’s (2012) studies of the evaluation of “interdisciplinarity” in research proposals showed how evaluators tend to deliver conservative assessments of criteria they find ambiguous. The explanation is that evaluators (peers and experts alike) do not feel sufficiently socialised in the concepts behind the criterion to deviate far from the assessment guidelines, and so they adopt pragmatic behaviours that favour these conservative outcomes. It also means, however, that evaluators are more likely to base their assessment strategies on other proxies for excellence, such as an assessment of scientific excellence, as well as on biases, interpretations or overly pragmatic shortcuts that mimic “groupthink”. This is neither right nor wrong, since a decision made by peers and experts through peer review can never, by definition, be considered wrong. What it means is that evaluators require other tools, and perhaps even a different assessment process, on which to base their decisions about impact. That is, at least, until impact becomes a more formalised, socialised and recognised part of what we mean when we refer to “research excellence”.

Indeed, I do not blame the REF2014 evaluators for adopting conservative tools and pragmatic behaviours to navigate the evaluation of impact. These evaluators had a tremendous task before them, and they completed it with little experience or understanding and in the face of some passionate anti-impact advocates. Within these pragmatic behaviours, however, are insights that can help us prepare for the next REF in 2021. Based on my findings in The Evaluators’ Eye, I offer three pieces of advice to researchers and research managers.

1. Think about the evaluators, not just about the impact

Do not think of the evaluators as individuals. Instead, think of each Unit of Assessment (UoA) as a group or collective. In my book, I described how, through deliberation, panels came to a consensus in order to develop a dominant framework (the “eye” in “The Evaluators’ Eye”) that determined how they approached the assessment of each submission. Evaluators described this framework as the “mood music” that guided how they navigated the conceptualisation of impact-related criteria, and weighed up the strengths and weaknesses of each case study.

Identifying this “eye” prior to the REF2021 evaluation is virtually impossible, as it is wholly dependent on what happens in the room, on the day. With a different mix of evaluators and experts within the group, a different dominant framework and strategy may, in theory, be adopted.

The REF2021 UoA panels are likely to include a combination of new, novice evaluators and experienced evaluators who participated in the REF2014 impact assessment process. This change in membership will influence the development of a new dominant framework for the evaluation of impact in 2021. Although conventional wisdom would assume that novice evaluators will follow the lead of more experienced evaluators, resulting in similar strategies being adopted in REF2021 as in REF2014, this is not necessarily the case. It can safely be assumed that more experienced evaluators will lead the deliberations, but their behaviour around impact will not necessarily mirror what was done for REF2014. Indeed, with experience comes the confidence to mould and develop their conceptualisation of the impact evaluation criteria, which could produce an assessment strategy distinct from that of REF2014. In this regard, we may see deviations from the guidance provided to panels for assessing impact case studies. More experienced evaluators are likely to be more confident in treating the criteria flexibly, moulding them to their requirements, much as evaluators do when assessing more traditional notions of research excellence. As a result, it is likely that we will see groups adopting a less conservative approach to the assessment of impact, and one that therefore produces different outcomes.

This means that, despite recent assertions by HEFCE at consultation events, the value of impact may be different in REF2021, and what was four-star last time will not necessarily be four-star next time. There will be similarities that reflect the similar criteria, but attempts to re-create the evaluation process using REF2014 guidelines, procedures and even evaluators have limited value in predicting outcomes. Instead of attempting to game an impossible system, concentrate on ensuring that your case for impact is strong (see below), because at the end of the day, excellent impact always wins.

2. Leave no room for doubt

At the heart of pragmatic approaches to evaluation lies the reassurance that the simplest answer is usually the best. Likewise with impact: it is easy for evaluators to identify and reward case studies that at first glance are clearly worth the top grade of 4 stars. The aim for submissions, therefore, is to be in this category (where the case for impact is clear), untroubled by causal gaps in the impact pathway or doubts about the contribution of the research to the final impact. Your aim is to give evaluators no reason to doubt your impact claim.

In my book, I identified a number of strategies that REF2014 evaluators used to navigate impact. Two of these, centricity and linearity, stressed how important it was to show how the research contributed to the impact (linearity), and how essential or central this contribution was to the impact (centricity, i.e. could the impact have happened if this research had not existed?). In addition, in contrast to the way evaluators assess research excellence, when assessing impact evaluators tended to start by assuming all case studies were 4-star and then deduct points during the evaluation, rather than accruing points through a convincing case study.

As such, when writing your case studies, do not be too quick to dismiss the value of small steps on the pathway to larger, quantifiable impacts that are more easily recognised as 4-star. Every step counts: even small-scale meetings with stakeholders, or fleeting comments that your research had been useful, could make the difference in whether an evaluator instantly identifies the case study as worth 4 stars. To this end, case studies should include all attributable impact steps, filling in any gaps in the pathway between your research and its impact. In short, if you cannot tell me, clearly and simply, why your research matters, then you are unlikely to convince a panel of evaluators who are looking for more than box-ticking.

3. Follow the rules

REF2014 might have been the first formal assessment of the impact of research beyond academia, but this does not mean that all impact happens on REF terms. The REF2014 definition of impact, “an effect on, change or benefit to the economy, society, culture, public policy or services, health, the environment or quality of life, beyond academia”, is not real impact. Instead, it is a construction specific to the UK REF exercise. As we start to see various societal impact criteria manifest themselves in audit systems across the world, so too do we see deviations in the definition of what is considered impact and what is not. In Australia, for example, “industry and public engagement” is considered evidence of impact; in Italy the reference to “third stream activities” has a specific industry connotation; whereas in REF2014, notions of “public engagement” were specifically omitted from the definition.

REF impact, therefore, is not real, “organic” impact, and so there may at times be things that seem to be impact but do not fall under the umbrella of REF impact. It is tempting to submit these impacts, but it is not the best strategy.

In my book, I explore the concept of ‘evaluation mechanics’: the tools given to the evaluation panel to help it navigate the assessment process. These mechanics can act either to facilitate (preferred) or to infiltrate (not ideal) the assessment process, with both intended and unintended consequences. What distinguishes infiltrative from facilitative mechanics is whether we can see ‘artefacts’ in the evaluation outcomes; that is, whether the outcome perfectly reflects these mechanics. Such artefacts indicate that the panel had little confidence or ability to mould these mechanics into a dominant framework or definition adapted to the context of that panel; instead, deliberation was guided primarily by the mechanics. This, in effect, encourages conservative behaviours and, as a result, conservative outcomes.

A clear example is how public engagement was treated as impact in REF2014. Many REF panellists went into the process with a broader personal conceptualisation of impact. These evaluators then needed to reconcile their personal opinions about what counts as impact, even when those opinions were shared by a critical mass of other evaluators, with what the evaluation mechanics specified as impact. The mechanics provided to the panel specifically omitted public engagement as a component of impact. As a result, despite valuing public engagement personally, evaluators accepted the narrower definition given for the assessment and instead valued only the benefits arising from that engagement.

This behaviour, I argue, risks groupthink, in which the act of deliberation is bypassed in favour of accepting official guidance on definitions. This is different from evaluating scientific excellence, where evaluators feel sufficiently confident to experiment and push the boundaries of the evaluation through expert deliberation, rather than blindly accepting the guidance provided by the mechanics.

Of course, when faced with difficulties, it is natural that evaluators adopt pragmatic approaches, and these approaches are driven primarily by the mechanics provided to the evaluators before the evaluation. Thankfully, submitting institutions are given these guidelines before they prepare their submissions, precisely so that they can guide those preparations. So, my advice is to use them.

Gemma Derrick is the co-Director of the Centre for Higher Education Research and Evaluation at Lancaster University. Her book, The Evaluators’ Eye: Impact Assessment and Academic Peer Review, is published by Palgrave Macmillan and is now available through the publisher or on Amazon.

This research was funded by the UK Economic and Social Research Council (ESRC) Future Research Leaders Programme (ES/K008897/2).


