If you’ve served on a faculty senate in the past two years, you’ve probably sat through some version of the same meeting: Take-home exams are unusable. Discussion posts show eerie sameness. Someone proposes mandatory in-person midterms. The instructional design team explains what the LMS can and cannot detect. A frustrated student representative asks, “Are we trying to stop this or adapt to it?”
You leave those meetings with the same uneasy feeling: We are treating generative AI as an academic integrity problem, when it’s actually something more destabilizing.
It is a measurement problem.
And it’s breaking the measurement system higher education has relied on for a century, a system built on the assumption that a student’s submitted work—an essay, a problem set, a lab report, a take-home exam—can serve as a credible proxy for what the student can actually do.
We’re already in the first phase of adaptation—restrictions, detection, redesigns that quietly “return to the room.” But a second phase is also emerging, unevenly and often quietly: faculty experimenting with assessments that look less like snapshots and more like evidence streams—what a sports scout would call game tape.
Those early experiments point toward a promising path out of the current muddle, one that would enhance our ability to measure outcomes and guide students. But first we need to name the thing that’s breaking.
The Artifact Economy Collapses
Higher education runs on artifacts. We assess students by what they hand in because artifacts are manageable. They fit into learning management systems, rubrics, grade books and accreditation reports. They can be stored, sampled, audited and compared across cohorts. They are, in the bureaucratic sense, legible.
For decades, this artifact economy worked reasonably well—because artifacts were costly to produce. Writing a coherent essay required effort and attention. Solving a problem set required struggle, even if it was struggle assisted by office hours or peers. You couldn’t reliably produce the right-looking thing without touching the underlying skill. That cost structure made artifacts a decent proxy for competence. Not perfect. But serviceable.
Generative AI collapses that cost structure.
When a student can produce fluent prose, plausible reasoning and tidy structure in minutes, the artifact stops carrying the information we thought it carried. The “beautifully written paper” no longer reliably signals careful reading, deep comprehension or original synthesis. It may signal those things. Or it may signal tool proficiency plus a willingness to outsource large parts of thinking to a machine. Often, it’s some mix, and the mix is the problem: The artifact no longer tells you what you need to know.
This is why cheating is too small a frame. Cheating is a subset of the problem. The deeper issue is decoupling: AI separates output from competence in a way higher education has not had to confront at scale.
Education Has 2 Products
Higher education produces two things at once: the learning product (the transformation of the student—knowledge, skill, judgment), and the credential product (the public signal—grades, degrees, transcripts).
These have never been perfectly aligned, but society has treated them as aligned enough. The credential has functioned as evidence of learning.
Generative AI attacks this coupling. It doesn’t necessarily prevent learning—in many contexts, it can improve learning. But it makes it significantly harder to infer learning from submitted work. And because higher education’s legitimacy depends disproportionately on credentials, the credential breaks first.
Once that happens, institutions face a choice: rebuild credibility by changing what we measure, or defend credibility by tightening control over how artifacts are produced.
We are already doing both. The question is which response we scale.
The Snapback—and the Inequality Loop
When signals degrade, institutions rarely respond by becoming more imaginative. They become more conservative.
In higher education, that conservatism shows up as a snapback in two directions: control and prestige. Both widen inequality.
- Control
We are already seeing the control snapback: timed in-class writing, closed-book exams, oral defenses, expanded proctoring, handwritten work—and, where resources permit, labor-intensive assessment that keeps the student’s process visible.
But notice: where resources permit.
Control is not just pedagogy. It is institutional capability. Small seminars and well-resourced campuses can add live assessments, schedule oral defenses and absorb the administrative friction.
The places that educate most students often cannot.
Large-enrollment gateway courses. Commuter campuses. Community colleges. Hybrid programs. Online programs serving adults with jobs and caregiving responsibilities. For those students, “return to the room” can be the difference between staying enrolled and dropping out. For institutions that serve them, replacing artifacts with supervised performance is limited by staffing, facilities and budgets—not virtue or willpower.
This is the first way AI risks widening inequality: The control response is easier to implement in the privileged parts of the sector.
- Prestige
The second snapback is toward prestige. When the meaning of coursework becomes uncertain, external audiences—employers, graduate programs, even families—lean more heavily on institutional brand as a substitute for measurement. If you can’t trust the artifact, you trust the institution that claims to have filtered and shaped the student.
Here, too, AI risks widening inequality. Institutions with the resources to shift assessment toward supervised performance can make a credible claim that they have “seen” the student in contexts where AI cannot fully substitute for understanding. That credibility strengthens the brand. The brand then becomes a proxy for credibility. And the cycle tightens.
This is the inequality loop: Capability enables control; control sustains credibility; credibility reinforces prestige; prestige attracts resources; resources expand capability.
Meanwhile, institutions that serve working adults and nontraditional students—those who most need flexibility—remain more dependent on asynchronous artifacts, which are precisely what AI destabilizes.
AI could have been an equalizing force, especially for students who haven’t been trained in elite academic English. Instead, the early adaptation pattern threatens to split the sector not just by selectivity, but by verifiability.
From Snapshots to Tape—at Scale
If the artifact economy is breaking, what replaces it?
Think about how we evaluate competence where performance is visible. An athlete. A pianist. A nurse. We watch performance over time, under constraint, with feedback and revision. We observe not only output, but also judgment, reflection, adaptation.
We want something like game tape.
Game tape was always better than snapshots—even before AI entered the picture. A single artifact freezes performance in one moment, often under artificial conditions, and gives you almost no visibility into how it came to be. It tells you what someone produced, not how they reasoned. It can’t show whether they learn from feedback, adjust when the context shifts or recognize the limits of their own understanding.
Game tape can. It captures the arc: missteps, revisions and improvement. It makes growth legible, not just achievement. And it demonstrates capability across varied conditions—not just polish in a single high-stakes moment.
So why did we build an entire assessment system around snapshots? Not because they were more valid. Because they were cheaper. And, crucially, because they scaled.
AI is breaking the artifact economy. But that disruption is also a kind of forced reckoning: It creates pressure to move toward an approach to assessment that was always more defensible; we just couldn’t afford to do it at scale. Until now.
Here’s the counterintuitive part: The same technology that destabilizes artifacts can also lower the cost of capturing and curating evidence streams—if we design for it.
So think of AI less as the ghostwriter and more as the tape recorder. Here are some examples.
- Capture process automatically. Students already work in digital environments—Google Docs, Jupyter notebooks and other tools with built-in version control. AI can analyze the revision history and generate a timeline showing when major changes occurred: “First draft focused on historical context. Second draft added three statistical arguments. Third draft reorganized to lead with counterargument.” An instructor can scan this digest in 30 seconds and immediately see whether the student engaged substantively with feedback or just polished surface features. No reflective essay required from the student; the system extracts the evidence trail automatically. (A minimal sketch of such a digest appears after this list.)
- Prompt targeted checks. Instead of scheduling 30 individual oral exams, AI can generate three follow-up questions tailored to each student’s submission: “You claim X led to Y, but the data shows Z—can you reconcile this?” or “Walk me through your choice of method here.” The student is given these questions on a video call and asked to respond right then, in five minutes. AI transcribes it, time-stamps key moments and flags unclear reasoning. The instructor reviews flagged sections and makes the judgment call. What would have taken six hours of oral examination becomes 90 minutes of focused evaluation. The instructor’s role shifts from administering the test to interpreting the evidence. (The orchestration behind this is sketched after the list.)
- Make low-stakes evidence cheap. A single high-stakes exam creates enormous pressure and limited information. Ten low-stakes checks across a semester reveal patterns: Does the student improve with feedback? Can they transfer concepts to new contexts? Do they recognize their own weak reasoning? But creating 10 assessments manually is prohibitive. AI can generate variations of case studies, produce “what’s wrong with this analysis” prompts, and sort student responses into “demonstrates understanding/partially demonstrates/does not demonstrate” buckets for rapid instructor review. This doesn’t automate judgment—it makes the judgment workload manageable.
- Move feedback upstream. Most instructor feedback arrives when it’s too late, scrawled on a final submission the student will never revise. AI can intervene earlier: “Your argument in paragraph three assumes causation, but you’ve only shown correlation. Consider whether reverse causality is possible here.” Or, “You cite three sources, but two are from the same advocacy organization. How might this limit your perspective?” This isn’t grading. It’s formative prodding that helps students catch problems while there’s still time to fix them. The instructor’s summative feedback load decreases because the work arriving for final evaluation is stronger.
- Normalize transparent tool use. The worst equilibrium is covert AI use combined with faculty suspicion. Break that cycle by designing assignments where AI assistance is expected and logged: “Use AI to generate three counterarguments to your thesis. Pick the strongest one and explain why it’s stronger than the others. Then show how you’d respond to it.” Or, “Have AI critique your statistical approach. Document what it flagged and what you changed as a result.” When students document their tool use—and when that documentation is partially automated (a log of queries, a diff of AI-suggested versus final text)—transparency stops being a burden. And instructors can evaluate the more important skill: the student’s judgment about what to accept, reject or refine. (A sketch of such a log closes out the examples below.)
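None of this requires exotic infrastructure. As a concrete illustration of the revision digest described above, here is a minimal sketch in Python using only the standard library. It assumes drafts can be exported as timestamped plain-text snapshots; the natural-language summaries an LLM would add are omitted, and every name here is illustrative rather than any real product.

```python
# Minimal sketch of a revision digest: given timestamped plain-text
# snapshots of a student's drafts, report how much changed between
# successive versions. The similarity ratio is a rough proxy; a real
# pipeline would add model-written summaries of *what* changed.
from dataclasses import dataclass
from datetime import datetime
from difflib import SequenceMatcher


@dataclass
class Draft:
    saved_at: datetime
    text: str


def revision_digest(drafts: list[Draft]) -> list[str]:
    digest = []
    for prev, curr in zip(drafts, drafts[1:]):
        # A ratio near 1.0 suggests surface polish; a low ratio suggests
        # substantive restructuring or significant new material.
        similarity = SequenceMatcher(None, prev.text, curr.text).ratio()
        kind = "major revision" if similarity < 0.7 else "light edit"
        digest.append(
            f"{curr.saved_at:%b %d %H:%M} - {kind} "
            f"(~{(1 - similarity) * 100:.0f}% of text changed)"
        )
    return digest


drafts = [
    Draft(datetime(2025, 3, 1, 9, 0), "Thesis and historical context..."),
    Draft(datetime(2025, 3, 4, 16, 30),
          "Thesis and historical context... plus three statistical "
          "arguments and a counterargument moved to the opening."),
]
for line in revision_digest(drafts):
    print(line)
```

A scan of that output answers the instructor’s real question in seconds: did the later drafts restructure the argument, or merely polish it?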
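The targeted checks—and, by the same pattern, the low-stakes variations and upstream feedback described above—are mostly orchestration around a model call. A hedged sketch follows, in which `call_llm` is a deliberate stub for whatever model API an institution has actually vetted; the prompt wording and function names are illustrative, not a recommendation.

```python
# Sketch of targeted follow-up generation. `call_llm` is a stub for an
# institutionally approved model API; nothing here names a real service.
PROMPT_TEMPLATE = """You are helping an instructor probe understanding.
Read the student submission below and write exactly three short
follow-up questions that test whether the student can defend the
specific claims and choices in their own text.

SUBMISSION:
{submission}
"""


def call_llm(prompt: str) -> str:
    raise NotImplementedError("wire this to your campus-approved model")


def follow_up_questions(submission: str) -> list[str]:
    raw = call_llm(PROMPT_TEMPLATE.format(submission=submission))
    # Keep only lines that read as questions; the instructor reviews,
    # rewrites or discards them before the live five-minute check.
    return [ln.strip() for ln in raw.splitlines() if ln.strip().endswith("?")]
```

The division of labor matters: the model drafts candidate questions, but the instructor remains the examiner who decides what gets asked and what the answers mean.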
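Finally, the transparency log is the simplest piece of all: a timestamped record of each query plus a diff of what the tool suggested against what the student kept. Another minimal sketch, again standard library only, with field names that are illustrative rather than any standard:

```python
# Sketch of an AI-use log: one JSON line per exchange, recording the
# query and a diff of AI-suggested text versus what the student kept.
import difflib
import json
from datetime import datetime, timezone


def log_ai_exchange(logfile: str, query: str, suggested: str, final: str) -> None:
    diff = "\n".join(difflib.unified_diff(
        suggested.splitlines(), final.splitlines(),
        fromfile="ai_suggested", tofile="student_final", lineterm=""))
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "kept_verbatim": suggested.strip() == final.strip(),
        "diff": diff,  # an empty diff means the suggestion was kept as-is
    }
    with open(logfile, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry) + "\n")
```

An instructor skimming such a log sees judgment, not just output: what the student asked for, and what they chose to keep, change or reject.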
In other words: AI can help us build the tape that AI makes necessary.
This matters most where high-control assessment is least feasible. If game tape becomes a luxury practice reserved for privileged campuses, it will become yet another mechanism of stratification. The tape has to be possible in high-enrollment and flexible settings, or it won’t solve the credibility problem—it will merely relocate it.
The Fork and the Prediction
We are already in the first phase: tighter rules, constrained assessments, more detection and enforcement.
And we can see the beginnings of the second phase: faculty redesigning courses around process evidence, live reasoning, iterative work and transparent tool use.
The institutions that emerge strongest will not be the ones that solve cheating. They will be the ones that develop credible ways to answer a simpler question:
What can this student actually do—when it matters, under constraint, and with all the resources available in today’s world, including AI?
And here is the hopeful part: If we make that shift, we will end up with better assessment than we had before AI arrived. Not just good enough despite AI, but actually better—more valid, more informative, more aligned with what we claim to value. The artifact economy was always a compromise, a proxy we tolerated because the real thing seemed too expensive. AI forces our hand. But the hand we’re forced to play is the stronger one.
Higher education has always claimed that learning is about the process, not just the product.
AI is forcing us to prove it. And if we do, we’ll be better for it.

