<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[ScriPal Blog]]></title><description><![CDATA[ScriPal Blog]]></description><link>https://blog.scripal.ai</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1751674691511/a0fa9a9c-a322-45dd-a5cf-b8489979c9a2.png</url><title>ScriPal Blog</title><link>https://blog.scripal.ai</link></image><generator>RSS for Node</generator><lastBuildDate>Fri, 17 Apr 2026 14:25:40 GMT</lastBuildDate><atom:link href="https://blog.scripal.ai/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Why Building an AI Detector Is a Losing Battle]]></title><description><![CDATA[The rise of large language models (LLMs) like GPT-4, Claude, and Gemini has sparked a wave of tools promising to detect AI-generated text. Schools, publishers, and employers are eager to adopt them due to their concern about plagiarism, misinformatio...]]></description><link>https://blog.scripal.ai/why-building-an-ai-detector-is-a-losing-battle-the-inescapable-arms-race-of-ai-text-detection</link><guid isPermaLink="true">https://blog.scripal.ai/why-building-an-ai-detector-is-a-losing-battle-the-inescapable-arms-race-of-ai-text-detection</guid><category><![CDATA[AI]]></category><category><![CDATA[#ai-tools]]></category><category><![CDATA[education]]></category><dc:creator><![CDATA[Mustafa Kamal]]></dc:creator><pubDate>Sat, 05 Jul 2025 00:23:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751676329834/e88deb85-45e7-4c4b-af7d-3875fe0c78da.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The rise of large language models (LLMs) like GPT-4, Claude, and Gemini has sparked a wave of tools promising to detect AI-generated text. 
Schools, publishers, and employers are eager to adopt them out of concern over plagiarism, misinformation, and loss of authenticity. But here’s the uncomfortable truth: developing a reliable LLM detector is a losing battle.</p>
<h3 id="heading-1-the-problem-with-false-positives-and-negatives">1. The Problem With False Positives and Negatives</h3>
<p>No matter how sophisticated the detector, it must answer a binary question: <em>Was this written by a human or an AI?</em> But LLMs are trained on human writing, and their outputs are often indistinguishable from ours, sometimes even more “polished” than human text.</p>
<p>This leads to two fundamental failure modes:</p>
<ul>
<li><p><strong>False positives</strong>: Human-written text flagged as AI-generated. This happens with non-native English speakers, students with rigid or overly formal writing, and even professional authors.</p>
</li>
<li><p><strong>False negatives</strong>: AI-generated content that passes as human-written, especially when lightly edited or prompted skillfully.</p>
</li>
</ul>
<p>In high-stakes situations such as grading, hiring, and publishing, either type of error is damaging. The cost of getting it wrong is often greater than the value of getting it right.</p>
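<p>The tradeoff above can be made concrete with a toy sketch. The scores below are synthetic, not from any real detector: the point is only that wherever you place the decision threshold on a "probability of AI" score, you trade false positives against false negatives.</p>

```python
# Toy illustration (synthetic scores, not a real detector): any threshold
# on a "probability of AI" score trades false positives against false negatives.
human_scores = [0.10, 0.25, 0.40, 0.55, 0.70]  # detector scores for human-written text
ai_scores    = [0.35, 0.50, 0.65, 0.80, 0.90]  # detector scores for AI-generated text

def error_rates(threshold):
    # False positive: human text scored at or above the threshold.
    fp = sum(s >= threshold for s in human_scores) / len(human_scores)
    # False negative: AI text scored below the threshold.
    fn = sum(s < threshold for s in ai_scores) / len(ai_scores)
    return fp, fn

for t in (0.3, 0.5, 0.7):
    fp, fn = error_rates(t)
    print(f"threshold={t:.1f}  false-positive rate={fp:.0%}  false-negative rate={fn:.0%}")
```

<p>Raising the threshold to avoid accusing humans lets more AI text through; lowering it catches more AI text but flags more humans. Because the two score distributions overlap heavily in practice, no threshold makes both error rates acceptably low.</p>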
<h3 id="heading-2-llms-are-improving-faster-than-detectors">2. LLMs Are Improving Faster Than Detectors</h3>
<p>Every time a detection method is released, LLM developers adapt. Prompt engineering alone can dramatically lower detection accuracy. For instance:</p>
<ul>
<li><p>Asking an LLM to mimic a specific human writer</p>
</li>
<li><p>Using chain-of-thought reasoning to inject more variation</p>
</li>
<li><p>Post-editing with another model or a human</p>
</li>
</ul>
<p>Meanwhile, LLMs are trained on increasingly vast and diverse datasets, closing the stylistic gap between AI and humans. Detectors, on the other hand, are trying to infer authorship from surface-level clues — essentially guessing from shadows.</p>
<p>This creates a treadmill where detectors fall behind with every model release. GPT-2 detectors were decent for GPT-2. They failed against GPT-3. They’re hopeless against GPT-4 or Claude 3.</p>
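<p>To see how flimsy surface-level clues are, here is a deliberately naive heuristic of the kind early detectors leaned on: flagging text whose sentence lengths are unusually uniform (sometimes described as low "burstiness"). This is an illustrative toy, not any real product's method, and light post-editing defeats it, which is exactly the point.</p>

```python
# Toy "surface clue" detector (illustrative only): flags text whose sentence
# lengths are suspiciously uniform. Trivially fooled by mixing sentence lengths.
import statistics

def sentence_lengths(text):
    # Crude sentence split on periods; good enough for a toy example.
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    return [len(s.split()) for s in sentences]

def looks_uniform(text, threshold=2.0):
    # Flag text whose sentence lengths vary less than the threshold (in words).
    lengths = sentence_lengths(text)
    if len(lengths) < 2:
        return False
    return statistics.stdev(lengths) < threshold

uniform = "One two three four five. One two three four five. One two three four five six."
varied  = "Short one. This sentence runs quite a bit longer than the first one did. Tiny."
```

<p>A writer, or another model, only has to vary sentence length to pass. Every shallow statistical cue works the same way: cheap to measure, cheap to erase.</p>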
<h3 id="heading-3-watermarking-and-cryptographic-proofs-still-theoretical">3. Watermarking and Cryptographic Proofs? Still Theoretical</h3>
<p>Some suggest cryptographic watermarking, embedding invisible statistical signals in AI-generated text, as a solution. But watermarking comes with limitations:</p>
<ul>
<li><p>It’s easy to bypass with paraphrasing</p>
</li>
<li><p>It can’t be applied retroactively</p>
</li>
<li><p>It would require coordination across all LLM providers</p>
</li>
</ul>
<p>Until these approaches are universally adopted, they remain theoretical. And even if adopted, malicious actors or cheaters will find ways around them.</p>
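<p>To make the idea and its fragility concrete, here is a minimal sketch in the spirit of published "green list" watermarking proposals (not any vendor's actual scheme): the generator prefers tokens from a pseudorandom half of the vocabulary seeded by the previous token, and the detector counts how many tokens land in that half.</p>

```python
# Minimal "green list" watermark sketch (illustrative, not a real scheme):
# generation biases toward a pseudorandom half of the vocabulary keyed on the
# previous token; detection recomputes the lists and counts the hits.
import random

VOCAB = [f"tok{i}" for i in range(100)]

def green_list(prev_token, vocab=VOCAB):
    # Pseudorandomly pick half the vocabulary, seeded by the previous token.
    rng = random.Random(hash(prev_token) % (2**32))
    return set(rng.sample(vocab, len(vocab) // 2))

def generate(length, seed=0):
    # Watermarked "generation": always choose a green token (a real scheme
    # would only bias the model's probabilities, not hard-restrict them).
    rng = random.Random(seed)
    tokens = ["tok0"]
    for _ in range(length):
        tokens.append(rng.choice(sorted(green_list(tokens[-1]))))
    return tokens[1:]

def green_fraction(tokens, first_prev="tok0"):
    # Detection: unwatermarked text lands on a green token about half the time.
    prev, hits = first_prev, 0
    for tok in tokens:
        hits += tok in green_list(prev)
        prev = tok
    return hits / len(tokens)
```

<p>The fragility is visible in the design itself: replace even a few tokens (as any paraphraser does) and the downstream green lists reshuffle, diluting the signal, and a model that never applies the bias produces no signal at all. Hence the need for universal adoption, and the ease of bypass.</p>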
<h3 id="heading-4-the-adversarial-nature-of-detection-is-the-problem">4. The Adversarial Nature of Detection Is the Problem</h3>
<p>The core issue is adversarial dynamics. Every time a detector learns a trick to spot AI, LLM users find a way to undo it. This is the same cat-and-mouse game we see in spam detection, ad fraud, or online cheating, but this time with much blurrier lines and much smarter systems.</p>
<p>An AI detector can’t see intention. It doesn’t know whether a paragraph was written to cheat, assist, or inspire. And in an age of collaborative writing between humans and AI, the lines are getting even harder to draw.</p>
<h3 id="heading-5-what-should-we-do-instead">5. What Should We Do Instead?</h3>
<p>Rather than chasing the mirage of perfect detection, we should shift focus:</p>
<ul>
<li><p><strong>Redesign assignments and assessments</strong>: Ask questions that require personal reflection, real-world data, or oral follow-ups. These are much harder to fake convincingly.</p>
</li>
<li><p><strong>Teach critical thinking and AI literacy</strong>: Students and professionals will use AI. Help them use it well and ethically.</p>
</li>
<li><p><strong>Use AI as a teaching tool, not a threat detector</strong>: AI can give feedback, explain mistakes, and guide revision better than many traditional tools.</p>
</li>
</ul>
<p>We’re entering a world where human-AI collaboration will be the norm, not the exception. The goal shouldn’t be to tell them apart; it should be to elevate both.</p>
]]></content:encoded></item></channel></rss>