Now the struggle is no longer real, are students becoming stupid?

I used to take copious notes.

In meetings, encounters, on Zoom calls, even when travelling – including, in a former life, when behind the wheel – I took notes on everything, and I have pages of the stuff in boxes up in the attic. In fact, not having a uniball pen and an A5-sized ringbound pad to hand would often cause me significant anxiety.

Over time, I’d rationalised why. Writing notes, they said, requires selective attention. You can’t transcribe everything in real time, so I was forced to decide what mattered, paraphrase it, and organise it.

That process increases semantic processing, which strengthens memory traces even if the notes are never revisited. So the benefit came from the cognitive work of filtering, compressing, and structuring information, not from the artefact produced.

I’d doodle too – something something undiagnosed ADHD something something. But externalising fragments of information, I’d tell myself, whether as words, symbols, arrows, or shapes, reduced my cognitive load.

It freed my capacity to process relationships, implications, and meaning while the discussion continued. Even when my notes were messy or incomplete, the act of offloading stabilised my attention and comprehension in the moment.

But then the other day, when someone in the team explained how they used their electronic device to take notes – but often never looked at them again – it dawned on me that I don’t, any more.

I tried. I own any number of e-pens and tablets and gadgets that allow me to. But I never clicked with any of them, and many now join the ringbound scrawls that I somehow can’t let go of in the box in the corner.

From time to time, I’ll audio record the encounters I’m in and use a large language model (LLM) to summarise actions, or recall detail. Sometimes, I’ll flit between noting things I need to do in a task manager app, in a Google Doc, or even on a real-life Post-it note attached to the monitor.

But I no longer take notes. I’m not that person anymore. Am I becoming stupid?

Guinea pigs in Budapest

There’s a cracking story from last autumn involving a group of students and researchers at Corvinus University in Budapest. In the early days of generative AI, the concerns were mainly about the way in which the tools could be used to produce things – and in a culture where continuous assessment relies upon asynchronously grading a digital asset as a symbol of a student’s learning, the instant and obvious problem was whether “they produced it”.

But in Hungary, academics had noticed that polling and focus groups had started to surface a deeper reliance on AI – not just to write up the report or complete the essay, but for all the other bits too – the research, the reading, the synthesis, the exploration that the write-up was designed, on one proxy level or another, to demonstrate.

They wondered whether what was starting to look like reliance was affecting students’ motivation and their genuine understanding of the material – and the extent to which it was substituting for the process of knowledge acquisition itself. To interrogate the impact on learning outcomes, an experiment was created. In an operations research module, students were randomly placed into two groups – one permitted to use AI tools during both teaching episodes and examinations, the other not.

Anticipating objections, and to make it fair, they’d even ensured that a compensation mechanism would kick in – students in the lower-performing group would receive grading adjustments until average performance across the two groups was equalised. But despite the academics’ best efforts to explain the design and create a level-ish playing field for all participants, students – many of whom had an eye on the relationship between exam results and access to scholarships – were furious. One told news portal Telex:

I really don’t think it’s fair, it’s quite absurd that some people can use AI in the exam and others can’t, and the results are on the same scale. This way, they don’t measure knowledge, but who is in which group – and I think this is fucking unfair.

And even though the experiment had been approved by every relevant bit of the university’s governance – the ethics board, the head of department, the programme director, and the Student Council – they were able to get their concerns first into the media, then to the Office of the Commissioner for Fundamental Rights, and eventually to the minister for Culture and Innovation.

If anything, the student reaction was more revealing than the data – AI tools have already become so embedded in how students work that removing them felt not like a fair test, but like a punishment. The experiment was duly halted – much to the frustration of associate professor Balázs Sziklai:

…it would be important for the decision-making and legislative bodies to take an encouraging approach to research into the role of AI in education. Student experiments are essential for us to understand exactly what effects are taking place.

Not all was lost. Even though the control group also ended up being permitted to use AI – removing the basis for a clean comparison – according to Sziklai:

…it can be stated with complete certainty that the students did not master any part of the curriculum.

In his view, the knowledge that they were working with a safety net had killed all motivation, self-confidence, and curiosity. That’s partly because students hadn’t just coasted through preparation – they had stopped paying any attention to the answers they were giving in the exam itself:

They uncritically copied the AI’s answers, even if they were clearly stupid. If the language model suggested two separate solutions, then they would definitely copy both, saying one would definitely be good.

In the first twenty minutes of the exam – set up as easy true-false questions, answered offline, with no devices – the average score was 53 per cent. In the second part, where students tackled tasks similar to those practised in class but with AI tools permitted, the average jumped to 75 per cent.

Sziklai and his team are continuing the research – running focus groups to understand how students experienced the experiment and what they think about AI’s role in their education. And Corvinus issued the sort of statement that universities issue – committed to examining challenges and opportunities, supporting research that provides equal opportunities, shaping examination conditions in a modern and ethical way, and so on.

But was the conclusion that Sziklai drew the slam dunk that he thought it was? Were his students becoming stupid?

The right kind of hard

Over in the US, researchers working with kids in schools had been asking similar questions. In one study, high school maths students were randomly split into groups – one given access to ChatGPT during practice sessions, one given a version of ChatGPT that had been prompted to act as a tutor and refuse to give direct answers, and a control group with no AI access at all.

The pattern in the first group would have been familiar to Sziklai. Students used the tool as a crutch – performance improved in the short term, but when it was taken away, they performed worse than those who’d never had access at all. They had worse long-term retention, and worse independent problem-solving. The tool had helped them get through the work without them having to do the work.

But in the second group – the one given the tutor version, the one pushed to recall and problem-solve rather than handed answers – something else happened. That group saw nearly double the short-term gains, without the long-term drop-off. The researchers had only tested students after a single practice session with a fairly simple tutoring prompt – raising the question of what sustained exposure to a better-designed tool might achieve.
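It is worth being concrete about how small that design difference can be. Below is a minimal sketch of what a “tutor mode” wrapper could look like – assuming an OpenAI-style chat completions API, a placeholder model name, and a system prompt of my own invention, since the study’s actual prompt isn’t reproduced here – so treat it as an illustration of the design choice, not the researchers’ implementation.

```python
# A minimal sketch of a "tutor mode" wrapper: the system prompt tells the model
# to withhold final answers and push the student towards recall and reasoning.
# Assumptions: the OpenAI Python SDK (v1+), an API key in the environment, and
# a placeholder model name - the prompt wording is illustrative, not the study's.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

TUTOR_SYSTEM_PROMPT = """You are a maths tutor. Never state the final answer.
Ask what the student has tried, point to the single next step, and pose a
question that makes them recall the relevant method. If their attempt is
wrong, say which step is wrong and why, then let them redo it themselves."""

def tutor_reply(conversation: list[dict]) -> str:
    """Return the tutor's next message, given the chat history so far."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder - any capable chat model would do
        messages=[{"role": "system", "content": TUTOR_SYSTEM_PROMPT}, *conversation],
    )
    return response.choices[0].message.content

# Example turn: the student asks for the answer outright.
history = [{"role": "user", "content": "Just tell me x if 3x + 7 = 22."}]
print(tutor_reply(history))  # intended behaviour: a nudge to isolate x, not "x = 5"
```

The point is not that this particular prompt is the right one – it’s that the difference between a crutch and a tutor can live almost entirely in the instructions wrapped around the same underlying model, and that is a decision someone has to make.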

A report published by US education nonprofit Bellwether in June 2025 uses the study – and a growing body of others like it – to argue that the question “is AI good or bad for learning?” is the wrong one. The right question, they suggest, is this: when does ease enable deeper learning, and when is ease a shortcut with a hidden cost?

Their answer draws on decades of cognitive science around what researchers variously call “productive struggle,” “desirable difficulty,” or the “zone of proximal development.” The core idea is that effort only enhances learning when it sits in the right zone – hard enough to require genuine cognitive work, but not so hard that the student disengages.

Get the calibration right and you trigger a virtuous cycle – memory encoding, sustained attention, intrinsic motivation, and the metacognitive skills that allow students to monitor their own understanding. Get it wrong – in either direction – and the learning doesn’t happen.

The framework casts the Corvinus results in a different light. The problem wasn’t just that students had access to AI – it was that the conditions under which they were using it gave them no reason to struggle at all. Why wrestle with a problem when the tool will hand you an answer and the grading system will catch you if you’re wrong? The safety net didn’t just reduce difficulty – it removed the relationship between effort and outcome entirely.

In other words, what Sziklai observed may have been less about AI destroying the capacity to learn than about it destroying the motivation to bother – which is a different, if related, problem. And the authors argue that this distinction runs through everything – not a technology problem, but a design problem.

The same tool that turns one student into a passive copier can turn another into a more curious, more persistent thinker – depending on how it’s built, how it’s introduced, and what the student is being asked to do with it. And if students are responding rationally to a badly designed assessment, then the question isn’t whether AI makes people stupid. It’s whether it renders their teachers stupid.

Better products, worse people

When students offload not just the task, but the thinking about the task – the planning, the monitoring, the self-correction, the internal process of figuring out what you know, what you don’t, and what to do about the gap – that’s what everyone’s starting to call “metacognitive laziness”.

In one study, undergraduates who used AI to research a topic experienced lower cognitive load than those who used a traditional search engine. For some, that sounds like a good thing – but the quality of their arguments was worse. The researchers suspected that because the process felt easier, students simply hadn’t engaged in the deeper processing that the harder route had forced upon them. The effort wasn’t a bug in the old method. It was the mechanism.

Another study on how university students interacted with an AI tool found that almost half of conversations were “direct” – students were looking for answers with minimal engagement, no wrestling, no iteration, no back-and-forth, just “give me the thing”.

When second-language learners given ChatGPT support for a writing task were assessed, the ChatGPT group produced better essays – but showed no significant difference in actual knowledge gain or transfer. The tool had improved the product without improving the person.

When students in a brainstorming study were given AI support, they rated the task as requiring less effort. They also rated it as significantly less enjoyable and less valuable – even when independent raters judged the outputs to be better. In another study, stories written with AI-generated ideas were rated as more creative, better written, and more enjoyable to read – but were also more similar to each other. Individual quality went up – but collective novelty went down.

The work got easier, and the product got better. But something – the satisfaction, the meaning, the sense of ownership – got worse. If the process is where the learning lives, and if the process is also where the meaning lives, then optimising for output is the problem. It might be hollowing out the thing that made the activity worthwhile in the first place.

Of course, much of what education systems measure, and much of what employers reward, is output. The essay, the exam score, the report, the “deliverable”. The Bellwether authors acknowledge the tension – they argue that educational goals need to shift toward meaning-making, critical discernment, and the ability to sustain effort amid complexity, and that “as AI takes on more of the routine, the passable bar for what humans contribute will rise.”

But they stop short of the harder claim. If the tool is always going to be there – if it’s as permanent and ambient as the search engine already is – then the expectation that students will carry large volumes of subject knowledge around in their heads starts to look less like a standard and more like an anachronism.

The 53 per cent on the Corvinus paper test is a scandal if you think people should be able to answer those questions unaided. It’s an inevitability if you think they’ll never have to. The skill that matters in that world isn’t knowing the answer – it’s being able to tell when the tool is giving you something stupid.

But is higher education really ready to rebuild, and work, around the admission that knowing things might matter less than knowing what to do when you don’t?

Cognitive debt

Over at MIT, researchers had been asking a version of the same question – but with electrodes. They recruited 54 participants, split them into three groups, and asked them to write essays. One group used an LLM, one used a search engine, and one used nothing at all. Each completed three sessions under the same condition – and then, in a fourth session, some were switched. LLM users were told to write unaided, and unaided writers were given the tool.

The results were astonishing. Brain-only writers showed the strongest and most distributed “neural connectivity” – the broadest, most active networks. Search engine users showed moderate engagement. But LLM users showed the weakest connectivity of all. Cognitive activity didn’t just correlate with tool use – it scaled down in direct proportion to it. The more the tool did, the less the brain did.

But it was the switchover that should give us pause. When LLM users were asked to write without the tool, they didn’t just find it harder. Their brains showed reduced connectivity in the regions associated with sustained attention and memory – not struggling to do the work, but under-engaged, as if the neural architecture for doing it had somehow downgraded. Four months of LLM-assisted writing hadn’t just let them avoid the cognitive effort – it appeared to have changed what their brains were ready to do.

The researchers call it “cognitive debt” – a cost that accumulates invisibly and comes due later. And there was one more finding that connects back to the brainstorming studies. LLM users reported the lowest sense of ownership over their essays, and when asked to recall what they’d written, struggled to accurately quote their own work. They hadn’t just outsourced the effort – they’d outsourced the experience of having done it. The full study is available on arXiv.

Working with wizards

Ethan Mollick, the University of Pennsylvania professor, has been tracking the shift with increasing unease. In a September 2025 essay, he argued that the relationship between humans and AI is moving from what he called “co-intelligence” – where you collaborate with the tool, check its work, guide it – to something more like working with a wizard. You make a vague request, something impressive comes back, but you have no idea how it was made, and limited ability to verify whether it’s right.

His question for educators was blunt:

How do you train someone to verify work in fields they haven’t mastered, when the AI itself prevents them from developing mastery?

It’s an almost perfect paradox. The skill you most need – judgment about whether the output is any good – is the skill that requires the domain knowledge that the tool has just made it unnecessary to acquire. Mollick’s answer, such as it is, was that we need to become “connoisseurs of output rather than process” – developing instincts, through extensive use, for when the tool succeeds and when it fails. It may be true, but it isn’t yet a curriculum.

Meanwhile, the bleaker readings of what’s left when you strip out everything that AI can do are piling up. Tyler Cowen’s provocative suggestion – that the university will persist mainly as “a dating service, a way of leaving the house, and a chance to party and go see some football games” – is usually quoted as a punchline.

But it deserves more serious attention than it gets. If the knowledge-transmission function is dead and the credentialling function is weakening, then what universities actually provide is structure, socialisation, proximity to peers, and a reason to leave the house for three years during a critical developmental window. That might sound like a downgrade. It might also be an honest description of something genuinely important – the relational and developmental architecture that no tool, however capable, can replicate. The problem is that “we’re a really expensive way of helping young people grow up” is a difficult line to put in a prospectus.

A piece in Frontiers in Education tried to frame it more ambitiously – arguing that the enduring value of higher education lies in “epistemic judgment, belonging, and wonder.” That’s a lovely sentence. It’s also aspirational rather than operational. Nobody has a validated pedagogy for “wonder”, and there is no module description for belonging. They are the words universities reach for when they sense that the old justifications are collapsing, but haven’t yet built the new ones.

Howard Gardner – the multiple intelligences theorist – went further at a Harvard forum last autumn, suggesting that by 2050, most cognitive aspects of mind will be done so well by machines that “whether we do them as humans will be optional.” What survives, in his view, is the respectful mind and the ethical mind – how we treat other people, and how we handle difficult questions as citizens and professionals.

His model is radical – a few years of basics, then teacher-coaches who guide students toward activities that challenge their thinking and expose them to ideas. It’s compelling, but it’s also extremely difficult to fund, politically almost impossible to sell, and structurally incompatible with everything from student loans to league tables to quality assurance frameworks.

What should they become?

What almost nobody is doing is connecting the productive struggle research – which is increasingly robust – to the purpose question. The cognitive science tells us, with growing confidence, that effort in the right zone builds the architecture for memory, attention, motivation, and metacognition. But architecture for what? If it’s architecture for carrying knowledge around in your head, and carrying knowledge around in your head is becoming a commodity, then we’re building capacity for something whose value is declining. The literature knows this is a problem – but doesn’t solve it.

The conservative position – articulated by people like Robert Pondiscio at the American Enterprise Institute – is to hold the line. “Developing judgment is the entire point of education,” he argues, and AI “takes judgment out of the loop.” Where previous technologies automated low-level skills, AI automates higher-order thinking – “the very mental operations that define a well-educated person.”

You can’t just move up Bloom’s hierarchy when the tool is already sitting at the top of it. His solution, basically, is to resist. Don’t let the tool replace the struggle – education is transformation through effort, full stop. It’s coherent and comforting, but it’s also a defence of a model that is already losing – because the tool is here, and it isn’t going away, and telling students not to use it has roughly the same success rate as telling them not to use their phones.

Others acknowledge the tension without resolving it. Jason Gulya, writing in the Chronicle of Higher Education’s AI forum, argues that “we’ll need to chip away at the transactional model of education and put learning – with all of the productive struggle and inefficiency it often involves – at the centre.” Which is fine, as far as it goes. But “put learning at the centre” is an easy sentence to write in an education policy document. It has not, historically, been enough.

So I just ask again – what does productive struggle look like when the purpose of higher education is no longer knowledge transmission? Not struggle in the service of memorising content that a machine can retrieve instantly, and not struggle as a proxy for discipline or grit or moral seriousness. But struggle as the deliberate, designed, scaffolded process by which people learn to notice what they don’t know, to interrogate what they’re told, to hold complexity without reaching for a shortcut, and to recognise when a confident, fluent answer is wrong – especially when it’s being delivered by something that never hesitates and never says “I’m not sure.”

If we started there – if that were the organising question, rather than “how do we stop them using ChatGPT on the essay” – we might get somewhere. The maths tutor study already showed it’s possible. The tool that refused to give the answer and forced the student to think produced nearly double the learning gains without the long-term drop-off. The productive struggle didn’t have to disappear – but it did have to be deliberately reintroduced – by the tool itself, which feels like a strange inversion. The thing that makes you lazy can also be the thing that refuses to let you be lazy, if someone decides to build it that way.

But that requires knowing what you’re building it for. It requires a much deeper focus on teaching – and an acceptance that doing it will involve teachers building or adapting the tools they fear.

We’re doing our bit to try to find out. In all that we’ve read, few seem to be asking students what they think they learned from a given AI interaction and taking the answers seriously as data about a potentially new form of learning, rather than as evidence of success or failure against pre-AI benchmarks. In the run-up to this year’s Secret Life of Students, we’re looking at that – through a survey and focus groups – and if you’re willing to put the survey out to your students, or can nominate course reps to get involved, we’d love to hear from you.

More broadly, it all means answering a question that higher education has been avoiding since long before AI arrived – what is this actually supposed to do to people? Not what they should know, but what they should become.

And whether, on balance, it’s OK to throw that box of old notepads out now. It’ll make me a better person.
