The use of EDI data in REF submissions has been on many minds.
Working with EDI data is, broadly speaking, a good idea. But we have every chance of getting it wrong, from missing opportunities to causing serious harm, in REF and beyond.
Given my work in a sector that is world-leading on EDI data – the UK film and TV industry – I have had questions galore land in my inbox. So I wanted to share some lessons from film and TV that I hope will be useful for higher education.
D-Data
We typically use the term EDI data to mean quantitative statistics about identity characteristics protected under the Equality Act 2010, especially gender/sex, disability and race. That is not a bad start. But not all protected characteristics are (equally) relevant to getting high-quality talent, creativity, skill and curiosity into our sector.
For instance, I’ve yet to see anyone claim that marital status (a protected characteristic) matters, but there is plenty of evidence that caring responsibilities (not a protected characteristic) do. If we look only at protected characteristics, which is typically what higher education institutions have in their HR databases, we likely miss a big part of the stories we could tell.
Statistics show how an identity characteristic is distributed in a group or cohort. Knowing how identities vary in a group is useful. But identity statistics alone do not tell us anything about equality, equity or inclusion. They don’t tell us about people’s experiences or about who does or doesn’t get opportunity and resources. In other words, identity statistics can only ever be indicators of the “D” in EDI.
I will use the term “diversity data” for quantitative information about individual characteristics that are relevant to inclusion, opportunity, advancement and outcomes.
What do the numbers mean?
Since the Equality Act 2010 it has become standard practice in the UK to collect and monitor diversity data. We routinely look at identity statistics for staff, students, audiences, etc. In both higher education and film and TV, funding is often tied to the submission of diversity data. Admittedly, some diversity targets in UK film and TV make the NIHR’s previous Athena SWAN requirement look like a hurdle next to a pole vault bar. But certainly, in the TV workforce, participation is slowly becoming more equitable.
It’s no mean feat to compile diversity data. But using it is more complex than we admit. The underlying idea of diversity statistics is “representation”: we want to know whether individuals with a characteristic are appropriately represented. On their own, identity statistics do not answer such questions. To assess whether a group is appropriately represented, we need a benchmark and a purpose.
Benchmarking
Choosing meaningful benchmarks is key. The most widely used benchmarks are Office for National Statistics data on the UK population or UK labour force. The underlying rationale goes like this: If the diversity of your staff differs from the demographic profile of the labour force, individuals from certain groups might face systematic challenges and unfair barriers to getting hired by you.
However, workforce profiles vary considerably across the UK’s nations and regions. For instance, a share of 30 per cent Black and minoritised ethnic research and research-enabling staff might roughly mirror the regional labour force in London or the Midlands, but not in the North of England, Wales or Scotland. But is the correct benchmark really a higher education institution’s regional labour market anyway? Or should it be the national or UK labour market? Or even an international one? Arguments for meaningful benchmarks need to be made on a case-by-case basis.
Importantly, our benchmarks need to fit the purpose of our data collection. If, for instance, that purpose were to “ensure our staff profile mirrors that of the regional workforce, so that, as an institution with civic responsibilities, we can support regional talent and skill to thrive”, regional statistics would be a meaningful benchmark. But if we wanted to increase the proportion of women professors in an engineering faculty, tracking gender data for that faculty over time, or comparing it to gender data from other institutions’ engineering faculties or from large industry employers, could be more meaningful.
Without knowing the purpose of a diversity data collection, it is impossible to assess whether meaningful measures and benchmarks have been chosen and whether progress has been made. Percentage figures on their own carry no meaning and cannot be assessed.
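To make that concrete, here is a minimal sketch (in Python, with invented numbers) of what “percentage plus benchmark plus purpose” can look like in practice. The function name, the benchmark figure and the cohort sizes are all illustrative assumptions, not real data or anyone’s prescribed method; the point is that the comparison, not the raw percentage, carries the meaning.

```python
# A minimal sketch of benchmarking a unit's staff profile against a chosen
# reference population. All numbers are invented; the benchmark proportion
# would come from whichever source (ONS regional or national figures, sector
# comparators) fits the stated purpose of the data collection.
import math

def benchmark_gap(count: int, total: int, benchmark: float) -> dict:
    """Compare an observed share against a benchmark proportion.

    Returns the observed share, the gap, and a one-proportion z-score,
    a rough indicator of whether the gap could be noise at this cohort
    size, not a verdict on fairness.
    """
    share = count / total
    # standard error of the share if the benchmark proportion held here
    se = math.sqrt(benchmark * (1 - benchmark) / total)
    return {
        "observed_share": round(share, 3),
        "benchmark": benchmark,
        "gap": round(share - benchmark, 3),
        "z_score": round((share - benchmark) / se, 2),
    }

# e.g. 18 of 120 research staff from a given group, against a 22% regional figure
print(benchmark_gap(count=18, total=120, benchmark=0.22))
```

Even then, the z-score only indicates whether a gap might be noise at this cohort size; whether the benchmark itself is the right one remains a case-by-case argument, as above.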
Handle with care
Collecting, processing, managing and reporting diversity data is subject to legal protection under the General Data Protection Regulation. In addition, diversity data should always be based on information provided by individuals about themselves. It is not acceptable to assess, for instance, the gender or race composition of a research unit on the basis of perceived gender or race. To cut a long Critical Data Studies lecture short: we cannot just look around the room, count who we think is Black, disabled or gay, and then put that information into a REF environment statement. Just… don’t.
We often overlook that diversity data makes vulnerable not only those staff who provide information about their identity characteristics, but also those who handle it. Information about our colleagues’ identity characteristics stays with us beyond REF and may make us think something like: “I won’t suggest he apply for this role; it’s too demanding for someone who is disabled.” Which is not considerate; it’s discrimination. And if we are found to have excluded, or discriminated against, another person, our own health and well-being, career, reputation or material circumstances will likely be negatively affected.
REF should never expose colleagues who do not otherwise handle non-anonymised diversity data to information about their colleagues’ identities. No one preparing a UoA submission who doesn’t routinely have access to non-anonymised diversity data should be put at risk of seeing anything other than properly anonymised summary statistics, with small numbers redacted, for instance following HESA guidelines.
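For illustration, here is a minimal sketch of what small-number protection can look like before any figures reach a submission team. The specific rules below (rounding counts to the nearest five, suppressing shares for small cohorts, the threshold of 25) are simplified assumptions in the spirit of HESA’s rounding and suppression methodology, not a reproduction of it; check the current HESA guidance before publishing anything.

```python
# A minimal sketch of small-number protection applied before figures leave
# the system that holds the raw data. The rules are illustrative assumptions,
# not HESA's actual methodology.

SUPPRESS = "[suppressed]"

def safe_count(n: int) -> str:
    """Round a headcount to the nearest multiple of 5 (illustrative rule)."""
    return str(5 * round(n / 5))

def safe_share(count: int, total: int, min_total: int = 25) -> str:
    """Report a percentage only when the cohort is large enough."""
    if total < min_total:  # threshold is an assumption, for illustration only
        return SUPPRESS
    return f"{100 * count / total:.0f}%"

print(safe_count(13))       # -> 15
print(safe_share(3, 12))    # -> [suppressed]
print(safe_share(30, 120))  # -> 25%
```

The design point is that redaction happens inside the system that holds the raw data, so that whatever reaches a UoA team is already safe to circulate.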
The D, the E and the I
One more lesson to share before I get constructive. This one is about the link between the D in EDI and the E and the I. Diversity data can be hugely valuable in our quest for healthy research environments and for recruiting and retaining creative, innovative talent. It can usefully draw attention to processes or practices that might be problematic. If, for instance, the share of disabled staff at professorial/leadership level is considerably lower than at more junior grades, our promotions process might make it disproportionately harder for disabled people to advance into leadership roles. And that would certainly be worth investigating.
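As a rough sketch of that kind of check, with invented headcounts: compute the share of disabled staff at each grade and flag a professorial share far below the others. The grades, numbers and threshold are all hypothetical, and a flag is a prompt to investigate the promotions process, not evidence of discrimination in itself.

```python
# Hypothetical headcounts per grade: (disabled staff, total staff).
grades = {
    "lecturer": (14, 90),
    "senior lecturer": (9, 70),
    "reader": (5, 45),
    "professor": (2, 60),
}

# Share of disabled staff at each grade.
shares = {grade: d / t for grade, (d, t) in grades.items()}
other = [s for grade, s in shares.items() if grade != "professor"]
avg_other = sum(other) / len(other)

for grade, share in shares.items():
    print(f"{grade:<16} {share:.1%}")

# Arbitrary illustrative threshold: professorial share under half the
# average of the other grades triggers a closer look, nothing more.
if shares["professor"] < 0.5 * avg_other:
    print("Disabled-staff share at professor level is well below other grades; investigate.")
```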
However, “good” diversity data is not a robust indicator of good and inclusive practice. The identity stats for a research unit can “look ok” but staff may still not be appropriately included in decisions on a research project, or may be discriminated against in the author sequence for a publication. Too strong a focus on diversity statistics can itself cause problems. Aiming to meet a percentage target for individuals from particular groups can lead to “diversity hires” and to the stigmatisation of staff perceived to have been recruited for their identity characteristics rather than their skills, expertise and experience. Which would be anything but a healthy research environment.
Diversity (the stats part) and inclusion (the lived reality part) do go together like a horse and carriage in the sense that they often show up next to each other. But unlike what Frank Sinatra claims about love and marriage, you can have one without the other. Which is especially problematic if you have “good” diversity data without inclusive behaviour.
Next steps
So yes, it’s complicated. But it’s also not that difficult. The most important EDI data lesson from UK film and TV is second nature to everyone involved in research: define your purpose and question, and then figure out which data relates to it. You might even find that a simpler, tightly focused data set is more robust, appropriate and convincing than throwing a lot of quantitative evidence onto the table (pun intended).
Here are a few tips to take away and use in your own work:
- Provide context, show your workings: Don’t just include identity stats; explain what you take them to mean, how you use them and what you compare them to.
- Think beyond diversity data: Rather than using information about staff’s identities, is there other data you can use – project evaluations, research culture surveys, staff consultations? Data that speaks to the I, or even the E, in EDI?
- Protect everyone involved: Diversity data is for life (and about lives), not just for REF.
It is not yet clear what, if any, (diversity) data will be required or optional for REF submissions. But we always have more choice than we think – about the questions we ask, and the data we choose to answer them with. REF is one of the many moments we get to make those choices. We should welcome them, consciously and constructively, and not default to merely reactive counting exercises.

