AI Detectors Aren’t Catching Cheaters. They’re Punishing Honest Students — And Nobody Wants to Talk About It.
A student I know wrote every word of her 3,000-word thesis herself. No AI. No shortcuts. She'd spent three weeks on it: primary sources, original argument, the whole thing. Her professor ran it through an AI detector, and it came back flagged as 87% AI-generated.
She got an F.
The appeal took six weeks. She passed, eventually. But she lost a semester of momentum, her confidence in her own writing, and any remaining trust she had in the institution that was supposed to be educating her.
This is not an edge case. It's happening in classrooms around the world, at every level of education, right now. And the response from most institutions has been to double down on the detectors: buy more subscriptions, write stricter policies, and treat the symptom while ignoring the disease entirely.
The disease is not AI. The disease is that we built an education system that cannot tell the difference between learning and output — and now that the output is easy to fake, we have no idea what we’re actually measuring anymore.

Why AI Detectors Are Functionally Useless — The Data Schools Aren’t Sharing
Let’s be precise about what AI detectors actually do, because most administrators deploying them don’t fully understand the technology they’re trusting with students’ academic futures.
AI detectors work by analyzing perplexity (how unpredictable the word choices are) and burstiness (how much sentence length varies). Human writing tends to be less predictable and more variable. AI writing tends toward consistency. The detector assigns a probability score based on these patterns.
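Here's a toy version of both signals in Python, so you can see what the machine is actually looking at. To be clear: this is a sketch, not any vendor's real algorithm. Commercial detectors score every token against a large language model; the "pseudo-perplexity" below is a crude word-entropy stand-in, and real burstiness measures are fancier. But the toy captures the shape of the thing.

```python
import math
import re
from collections import Counter

def burstiness(text: str) -> float:
    """Standard deviation of sentence lengths, in words.
    Low variance reads as 'AI-like'; high variance reads as 'human'."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    return math.sqrt(sum((n - mean) ** 2 for n in lengths) / len(lengths))

def pseudo_perplexity(text: str) -> float:
    """Crude stand-in for perplexity: entropy of the word distribution.
    Real detectors score each token against a language model; this just
    asks how predictable the vocabulary is overall."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in Counter(words).values())
    return 2 ** entropy

polished = ("The committee reviewed the evidence. The committee weighed "
            "the evidence. The committee reached a decision. The decision was final.")
messy = ("Honestly? I wasn't sure what to think at first, but then the whole "
         "thing kind of unraveled. Weird. So we argued for an hour and got nowhere.")

for label, text in (("polished", polished), ("messy", messy)):
    print(f"{label}: burstiness={burstiness(text):.1f}, "
          f"pseudo-perplexity={pseudo_perplexity(text):.1f}")
```

The polished passage scores low on both measures. The rambling one scores high. Hold that thought.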
The problem: these are stylistic signals, not authorship signals. A student who writes clearly, precisely, and without filler — the kind of writing good teachers spend years trying to teach — will consistently score as “likely AI.” A student who writes with run-on sentences, tonal inconsistency, and structural chaos will score as “likely human.”
The detectors are, in the most literal sense, penalizing good writing.
The research backs this up. A 2023 Stanford study found that AI detectors flagged non-native English speakers as AI-generated at dramatically higher rates than native speakers — because non-native speakers often write in more structured, deliberate patterns that resemble AI output. A 2024 follow-up study by researchers at the University of Maryland found false positive rates as high as 17% on fully human-written text when submitted to leading commercial detectors.
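And the 17% figure understates the damage, because base rates do the rest. Here's the arithmetic. The false positive rate is the Maryland number; the other two inputs (how many students actually use AI, and how often the detector catches them) are assumptions I'm making purely for illustration.

```python
# Base-rate arithmetic. The false positive rate is the Maryland figure
# cited above; the other two numbers are illustrative assumptions.
false_positive_rate = 0.17  # human-written essay wrongly flagged (Maryland study)
true_positive_rate = 0.90   # ASSUMED: AI-written essay correctly flagged
ai_prevalence = 0.20        # ASSUMED: share of submissions actually AI-written

flagged_ai = ai_prevalence * true_positive_rate            # 0.18
flagged_human = (1 - ai_prevalence) * false_positive_rate  # 0.136

wrongly_accused = flagged_human / (flagged_ai + flagged_human)
print(f"{wrongly_accused:.0%} of flagged essays were written by a human")
# -> 43% of flagged essays were written by a human
```

Under those assumptions, more than two in five flagged essays were written by students who did nothing wrong. Change the inputs and the exact share moves, but the structure doesn't: when most students aren't cheating, even a modest false positive rate fills the accused pile with the innocent.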
Turnitin, one of the most widely deployed detectors in academic institutions, has publicly acknowledged that its tool should not be used as sole evidence in academic misconduct cases. Schools are using it as sole evidence anyway.
AI detector companies are selling certainty to institutions that are desperate for it. The product isn’t accuracy — it’s the appearance of a solution. Schools get to point at a percentage and feel like they’ve handled the problem. The percentage is often wrong. Nobody gets fired for buying the software.

The Real Problem: We’re Grading the Wrong Thing
Here’s the question nobody in education administration seems willing to ask: what were we actually measuring before AI existed?
A take-home essay measures a student's ability to produce a document under unsupervised, untimed conditions over several days. That is an incredibly specific skill. It is not the same as understanding. It is not the same as critical thinking. It is not the same as the ability to apply knowledge in novel situations.
We treated it as a proxy for all of those things because it was the most scalable measurement tool available. You can grade 200 essays. You cannot have 200 Socratic dialogues.
AI didn’t break that proxy. AI just made it obvious that the proxy was already broken. Students have been outsourcing essays to tutors, essay mills, and each other for decades. The difference now is that the tool is free, accessible, and impossible to reliably detect — so suddenly the institutions that looked the other way for years are in crisis mode.
The crisis isn’t new. The visibility is.

What Educators Are Actually Doing That Works
There are teachers and institutions getting this right. They’re not the majority, but they exist, and their approaches are worth documenting.
Process-based assessment. Instead of grading only the final essay, grade the drafts, the research notes, the revision history. AI can generate a final product. It cannot convincingly fake three weeks of messy, evolving thinking. Several universities now require students to submit their full document history via Google Docs, where every edit is timestamped and traceable.
Oral defense of written work. If a student wrote it, they can talk about it. A 10-minute conversation about their argument, their sources, and the choices they made in the writing process is more revealing than any detector score. This scales poorly for large classes, but for high-stakes assessments, it’s the most reliable tool available.
AI-integrated assignments. Some educators have stopped fighting AI and started designing around it. “Use Claude to generate a first draft of this argument, then write a 500-word critique of what it got wrong.” The assignment tests the skill — critical analysis — not the ability to produce text without assistance.
In-class writing components. Hybrid assessments that combine a take-home research phase with an in-class synthesis component mean that even if AI assisted the research, the student has to demonstrate understanding in a controlled environment. It’s not a perfect solution, but it closes the gap significantly.

What Needs to Happen — And Why It Probably Won’t
The honest answer is that fixing this requires institutions to do something they are structurally terrible at: admitting that a current practice is wrong and replacing it with something harder to scale.
AI detectors need to be formally prohibited as sole evidence in academic misconduct cases. Several faculty unions have already called for this. Most institutions haven’t moved.
Assessment design needs to shift from output-focused to process-focused across the board. This requires retraining teachers, redesigning rubrics, and accepting that some measurements will be less clean and comparable than a percentage score. Institutions don’t love ambiguity.
And students who were wrongly penalized by detector false positives need formal, accessible appeals processes — not six-week ordeals that punish them for the institution’s error.
None of this is technically difficult. All of it is politically difficult. Schools will keep buying detector subscriptions because it feels like doing something. Students will keep adapting. And somewhere in the middle, a kid who actually wrote every word will get an F that follows her for years.
We didn’t have an academic integrity crisis when AI arrived. We had one already — and AI just removed the fig leaf that was covering it.
The detectors aren’t the problem. They’re a symptom of institutions that would rather buy a subscription than ask a harder question: what are we actually trying to teach, and how do we know if it’s working?
Stop punishing students for the system’s failure to evolve. That’s not education. That’s just liability management with a rubric.
– Alex

📩 This is part of an ongoing series on AI and education at The Edge.
Next up: how three universities are redesigning assessment from scratch — and what’s actually working. Subscribe so you don’t miss it.
→ Subscribe to The Edge