Cloaks swish. Cameras flash. It’s graduation day, the culmination of years of effort. It celebrates learning journeys that have nurtured talents as varied as our students themselves.
It is a triumphant moment. It is also the moment in which the sector reveals the outcome of its own Magic Sorting Hat, whose sorcery is to collapse all this richness into a singular measure. As students move across the stage to grasp the sweaty palm of the VC or a visiting dignitary, they are anointed.
You are a First. You are a Third. You are a 2:1.
There is something absurd about this, that such diverse, hard-won successes can be reduced to so little. That absurdity invites a bit of playfulness. So, indulge me in a couple of thought experiments. They are fun, but I hope they reveal something more serious about the way we think about standards, and how often that crowds out a conversation about value.
Thought experiment one: What if classifications are more noise than signal?
Let us begin with something obvious. Like any set of grades, classifications exist to signal a hierarchy. They are supposed to say something trustworthy about the distribution of talent – with a First signalling the pinnacle of academic mastery. What “mastery” is – and how relevant that signal is beyond the academy – is, I think, far more ambiguous than we tend to admit.
“Mastery” isn’t the upper tier of talent. Our quality frameworks do not, by principle, norm reference, and for good reasons that are well-worn in assessment debates: shaving off a top slice of talent would exclude cohorts of students who might, in a less competitive year, have made the cut. So, then, we criterion reference; we classify against the extent to which programme outcomes have been met to a high standard. On that logic, we ought to be delighted when more and more students meet those standards. Yet when they do, we shift uneasily and brace for an assaultive chorus of “dumbing down.”
The truth of the First feels even less solid when set against the range of disciplinary and transdisciplinary capabilities we try to pack into that single measure, and the range of contexts that consume it at face value. Those audiences use it to rank and sort for their own purposes – to make initial cuts of cohorts of prospective employees so that shortlists stay manageable, for instance – with troublingly sweeping assumptions. The classification is thus paradoxically a very thin measure, and one overloaded with meaning.
It is worth asking how we ended up trusting so much to a device designed for a quite different era. The honours classification system has nineteenth-century roots, but the four-band structure that still dominates UK higher education really bedded in over the last century. The version we live with now is an artefact of an industrial-era university system, built in a world that imagined talent as a fixed trait and universities as institutions that sorted a small elite into neat categories for professional roles. It made sense for a smaller, more homogeneous system, but sits awkwardly against the complex and interdisciplinary world students now graduate into.
Today it remains a system that works a bit like a child’s play dough machine. Feed in anything you like – bright colours, different shapes and unique textures – and the mechanism will always force them into the same homogeneous brown sausage. In the same way, the classification system takes something rich and individual and compresses it into something narrow and uniform. That compression has consequences.
The first consequence is that the system compresses in all sorts of social advantages that have little to do with academic mastery. Access to cultural capital, confidence shaped by schooling, freedom from financial precarity, familiarity with the tacit games of assessment. These things make it easier for some students to convert their social position into academic performance. Despite the sector’s valiant reach for equity, the boundary between a 2:1 and a 2:2 can still reflect background as much as brilliance, yet the classification treats this blend of advantage as evidence of individual superiority.
The second consequence is that the system squeezes out gains that really matter, but that are not formally sanctioned within our quality frameworks. There is value in what students learn in that space a university punctuates, well beyond curriculum learning outcomes. They navigate difficult group dynamics. They lead societies, manage budgets and broker solutions under pressure. They balance study with work or caring responsibilities and develop resilience, judgement, confidence, and perspicacity in ways that marking criteria cannot capture. For many students, these experiences are the heart of their learning gains. Yet once the classification is issued, that can disappear.
It is easy to be blithe about these kinds of gains, to treat them as nice but incidental and not the serious business of rigorous academic pursuit. Yet we know this extra-curricular experience can have a significant impact on student success and graduate futures, and it is relevant to those who consume the classification. For many employers, the distinctive value that graduates offer over non-graduates is rarely discipline specific, and a substantial proportion of graduates progress into careers only tangentially aligned to their subjects. We still sell the Broader Benefits of Higher Education™, but our endpoint signalling system is blind to all of this.
The moral panic about grade inflation then catches us in a trap. It draws us into a game of proving the hierarchy is intact and dependable, sapping the energy to attend to whether we are actually evidencing the value of what has been learned.
Thought experiment two: What if we gave everyone a First?
Critics love to accuse universities of handing out Firsts to everyone. So, what if we did? Some commentators would probably implode in an apoplectic frenzy, and that would be fun to watch. But the demand for a signal would not disappear. Employers and postgraduate providers would still want some way to differentiate outcomes. They would resent losing a simple shorthand, even though they have spent years complaining about its reliability. Deprived of the simplicity of the hierarchy, we would all be forced into a more mature conversation about what students can do.
We could meet that conversation with confidence. We could embrace and celebrate the complexity of learning gain. We could shift to focus on surfacing capability rather than distilling it. Doing so would mean thinking carefully about how to make complexity navigable for external audiences, without relying on a single ranking. If learning gains were visible and tied directly to achievement, rather than filtered through an abstract grading function, the signal becomes more varied, more human, and more honest.
Such an approach would illuminate the nuance and complexity of talent. It would connect achievement to the equally complex needs of a modern world far better than a classification ever could. It would also change how students relate to their studies. It would free them from the gravitational pull of a grade boundary and the reductive brutality that compresses all their value to a normative measure. They could invest their attention in expansive and divergent growth, in developing their own distinctive combinations of talents. It would position us, as educators, more clearly in the enabling-facilitator space and less in the adversarial-arbiter space. That would bring us closer to the kind of relationship with learners most of us thought we were signing up for. And it would just be… nicer.
Without classifications the proxy is gone, and universities then hold a responsibility to ensure that students can show their learning gains directly, in ways that are clear, meaningful, and relevant.
A future beyond classifications
The sector is capable of imagination on this question – and in the mid-2000s it really did. The Burgess Review was our last serious attempt to rethink classifications. It was also the moment at which our courage failed to keep pace with our imagination.
The Burgess conclusion was blunt. The classification system was not fit for purpose. The proposed alternative was the Higher Education Achievement Report (HEAR), designed to give a much fuller account of a student’s learning. HEAR was meant to capture not only modules and marks, but the gains in skills, knowledge, competence and confidence that arise from a wider range of catalysts: taught courses, voluntary work, caring responsibilities, leadership in clubs and societies, placements, projects and other contributions across university life. It would show the texture of what students had done and the value they could offer, rather than a single number on a certificate.
Across Europe, colleagues were (and are) pursuing similar ambitions. Across Bologna-aligned countries, universities have been developing transcript systems that are richer, more contextual and more personalised. They have experimented with digital supplements, unified competence frameworks, micro-credentials and detailed records of project work. The mission is less about ranking learners and more about describing learning. At times, their models make our narrow transcript look a little embarrassing.
HEAR sat in the same family of ideas, but the bridge it offered was never fully crossed. The sector stepped back: HEAR survived as an improved transcript, but the ambition behind it did not. And fundamentally, the classification remained at the centre as the core value signal, overshadowing everything else.
Since then, the sector has spent roughly two decades tightening algorithms, strengthening externality and refining calibration. Important work, but all aimed at stabilising the classification system rather than asking what it is for – or if something else could do the job better.
In parallel, we have been playing a kind of defensive tennis, batting back an onslaught of accusations of grade inflation from newspapers and commentators that bleed into popular culture and a particular flavour of politics. Those anxieties now echo in the regulatory system, most recently in the Office for Students’ focus on variation in the way institutions calculate degrees. Each time we rush to prove that the machinery is sound – to defend the system rather than question it – we bolster something fundamentally flawed.
Rather than obsessing over how finely we can calibrate a hierarchy, a more productive question is what kind of signal a mass, diverse system really needs, and what kinds of value we want to evidence. Two growing pressures make that question harder to duck.
One is the changing conversation about the so-called graduate premium. For years, policymakers and prospectuses have leaned on an article of faith: do a degree, secure a better job.
Putting aside the problems with “better,” and the variations across the sector, this has broadly held true. A degree has long been a free pass through the first gates of a wide range of professions. But the earnings gap between graduates and non-graduates has narrowed, and employers are more openly questioning whether the lack of a university degree should preclude candidates from their roles. In this context, we need to get better at demonstrating graduate value, not just presuming it.
The other pressure is technological. In a near future where AI tools are routine in almost every form of knowledge work, outputs on their own will tell us less about who can do what. The central question will not be whether students have avoided AI, but whether they can use it in the service of their own judgement, originality and values. When almost anyone can generate tidy text or polished slides with the same tools, the difference that graduates make lies in qualities that are harder to see in a single grade.
If the old proxy is wobbling from both sides, we need a different way of showing value in practice. That work has at least three parts: how we assess, what students leave with, and how we help them make sense of it.
How we assess
Authentic assessment offers one answer: assessment that exercises capability against contexts and performances that translate beyond the academy. But the sector rarely unlocks its full potential. Too often, the medium changes while the logic remains the same. An essay becomes a presentation, a report becomes a podcast, but the grade still does the heavy lifting. Underneath, the dominant logic tends to be one of correspondence. Students are rewarded for replicating a sanctioned knowledge system, rather than for evidencing the distinctive value they can create.
The problem is not that colleagues have failed to read the definitions. Most versions of authentic assessment already talk about real-world tasks, audiences and stakes. The difficulty is that, when we try to put those ideas into practice, we often pull our punches. Tasks may begin with live problems, external partners or community briefs, but as they move through programme boards and benchmarking they get domesticated into safer, tidier versions that are easier to mark against familiar criteria. We worry about consistency, comparability, grade distributions. Anxieties about loosening our grip on standards quietly win out over the opportunity to evidence value.
When we resist that domestication, authentic tasks can generate artefacts that stand as evidence of what students can actually do. We don’t need the proxy of a grade to evidence value; the work stands for itself. Crucially, the value these artefacts surface is always contextual. It is less about ticking off a fixed list of behaviours against a normative framework, and more about how students make their knowledge, talents and capacities useful in defined and variable settings. The interesting work happens at the interface between learner and context, not in the delivery of a perfectly standardised product. Grades don’t make sense here. Even rubrics don’t.
What students leave with
If we chose to take evidencing learning gains seriously, we could design a system in which students leave with a collection of artefacts that capture their talents in authentic and varied ways, and that show how those talents play out in different contexts. These artefacts can show depth, judgement and collaboration, as well as growth over time. What is lost is the “rigour” and sanction of an expert judgement to confirm those capacities. But perhaps here, too, we could be more creative.
One way I can imagine this is through an institutional micro-credential architecture that articulates competences, rather than locking them inside individual modules. Students would draw on whatever learning they have done, in the curriculum, around it and beyond the university, to make a claim against a specific micro-credential built around a small number of competency statements. The assessment then focuses on whether the evidence they offer really demonstrates those competencies.
Used well, that kind of system could pull together disciplinary work, placements and roles beyond the curriculum into a coherent profile. For those of us who have dabbled in the degree apprenticeship space, it’s like the ultimate end-point assessment, with each student forging a completely individualised profile that draws in disciplinary capabilities alongside adjunct and transdisciplinary assets.
For that to be more than an internal hobby, it needs to rest on a shared language. The development of national skills classification frameworks in the UK might be providing that for us. It is intended to give us a common, granular vocabulary that spans sectors and occupations, and that universities could use as a reference point when they describe what their graduates can do.
The trouble, I suspect, is that this kind of skills-map-as-transcript can never really flourish while it must sit in the shadow of a single classification. That was part of HEAR’s problem: it survived as a supplement while the degree class kept doing the signalling. If we are serious about value, we may eventually need to let go of the single upper-case proxy altogether. Every student would leave not with a solitary number, but with a skills profile that is recognisably linked to their discipline and shaped by everything else they have learned and contributed in the years they spent with us.
How students make sense of it
Without support to make sense of their evidence, richness risks becoming noise of a different kind. This is one reason classifications remain attractive. They collapse complexity into simplicity. They offer a single judgement, even if that judgement obscures more than it reveals.
Students need help to unify their evidence into a coherent narrative. It is tempting to see that as the business of careers and employability services alone, but that would be a mistake. This is a whole-institution task, embedded in curriculum, co-curriculum and the wider student experience.
From conversations within courses to structured opportunities for reflection and synthesis, students need the means to articulate their value in ways that match their aspirations. They need to design imagined future versions of their stories, develop assets to make them real, test them, succeed and fail, and find direction in serendipity. This project of self, and arriving at that story – a grounded account of who they are now, what they can do and where they might go next – is arguably the apex output of a higher education. It is the point at which years of dispersed learning start to cohere into a sense of direction. And it feels like a very modern version of the old ideal of universities as a place to find oneself.
Perhaps the sector is now better placed, culturally and technologically, to build that kind of recognition model rather than another supplement. Or at the very least, perhaps the combined pressure of AI and a more sceptical conversation about the graduate premium offers enough of a burning platform to make another serious attempt unavoidable.
A reborn signal
I am being playful. I do not expect anyone to actually give every student a First. Classifications have long endured, and they will not disappear any time soon. Any institution that chose to step away from them would be engaging in a genuine act of brinkmanship. But when confronted with accusations of grade inflation, universities defend their practices with care and detail. What they defend far less often is their students, whose talents and achievements are flattened by the very system we insist on maintaining. We treat accusations of inflation as threats to standards, rather than prompts to talk about value.
The purpose of these thought experiments is to renew curiosity about what a better signal might look like. One that does justice to the richness of learners’ journeys and speaks more honestly about the value higher education adds. One that helps employers, communities and students themselves to see capability in a world where tools like AI are part of the furniture, and where value is found in how learning connects with real contexts.
At heart, this is about what and whom we choose to value, and how we show it. Perhaps it is time to return to the thread Burgess began and to pick it up properly this time, with the courage that moment represented and the bravery our students deserve.
Join Mark and Team Wonkhe at The Secret Life of Students on Tuesday 17 March at the Shaw Theatre in London to keep the conversation going about what it means to learn as a human in the age of AI.