AI has revolutionized countless industries, from healthcare to finance to education and publishing. Among the latest applications is AI text detection software, which aims to determine whether a given piece of text was written by a human or generated by an AI chatbot such as ChatGPT or GPT-4. These detectors are becoming increasingly prevalent in academic institutions, publishing, and content moderation as more and more authors and reviewers turn to AI writing tools to ease their writing burden. However, despite their growing use, AI text detectors are far from perfect. The situation is a bit of an ouroboros: the detectors not only suffer from accuracy limitations but also exhibit significant biases, especially against non-native English speakers and certain writing styles, raising ethical, social, and technological concerns.

AI text detection refers to the use of machine learning algorithms to assess whether a piece of writing was produced by a human or by an AI model. These tools analyze patterns in syntax, vocabulary richness, sentence structure, and other linguistic features. One of the most widely known detectors is OpenAI’s own classifier, which was released in early 2023; we previously wrote about the software and how it was later discontinued due to low accuracy. Other tools, like GPTZero, Originality.ai, Turnitin’s AI detection, and Writer.com’s AI Content Detector, have also been developed with varying levels of success and transparency.
A major challenge with AI text detectors is their inconsistency. These tools often yield false positives—where human-written text is misclassified as AI-generated—and false negatives—where AI-generated text is misclassified as human. According to a recent study, OpenAI’s detector correctly identified AI-generated text only 26% of the time and misclassified 9% of human-written text as AI-generated. These figures reveal the inherent unreliability of current detection methods.
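To put those figures in perspective, here is a rough back-of-the-envelope sketch of how the reported rates would play out across a batch of essays. The batch sizes are invented purely for illustration; only the rates come from the study cited above.

```python
# Illustrative only: how the reported rates might play out at scale.
# The batch sizes below are hypothetical; the rates are those cited above.
human_essays = 500
ai_essays = 500
true_positive_rate = 0.26   # share of AI-generated text correctly flagged
false_positive_rate = 0.09  # share of human-written text wrongly flagged

caught = ai_essays * true_positive_rate               # AI essays correctly flagged
missed = ai_essays - caught                           # AI essays that slip through
falsely_accused = human_essays * false_positive_rate  # human essays wrongly flagged

print(f"AI essays caught: {caught:.0f} of {ai_essays}")
print(f"AI essays missed: {missed:.0f} of {ai_essays}")
print(f"Humans wrongly flagged: {falsely_accused:.0f} of {human_essays}")
```

At those rates, most AI-generated essays go undetected while dozens of human writers are flagged anyway, which is the worst of both worlds.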
Furthermore, these tools struggle with texts that fall into a gray area, as many authors use AI tools to generate the bulk of the writing but then edit the content before submitting it to a publication. As AI writing models improve, their output becomes increasingly difficult to distinguish from human writing, further complicating detection efforts.
Perhaps the most troubling aspect of AI text detection software is its potential for bias, especially against non-native English speakers. A study by Stanford researchers found that AI detectors consistently flagged human-written essays by non-native English writers as AI-generated, while similar texts by native speakers were not misclassified. This likely occurs because the language models underlying these detectors are trained on style guides and writing samples dominated by native English syntax, vocabulary, and idioms. As a result, writing that deviates from these norms is judged to be more “machine-like.”
This presents a significant equity issue in academic and professional settings. For example, a student from a non-English-speaking background could be unfairly accused of academic dishonesty simply because their writing style differs from that of native speakers. In academic publishing, a non-native English speaker could have their manuscript flagged as AI-generated and rejected, suffering reputational damage because of a false positive.
At a technical level, AI detectors use stylometry and statistical patterns to make judgments; yet these judgments rely on assumptions about what constitutes “typical” human writing versus machine writing. For instance, AI-generated text tends to be more repetitive, lacks deep semantic variation, and often has more predictable sentence structures. Detectors are trained to flag such features.
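As a rough illustration of what flagging such features can look like, the sketch below computes two crude surface proxies, sentence-length variance and vocabulary repetition, over a passage of text. These particular features and thresholds are our own simplified assumptions; real detectors rely on trained statistical models rather than hand-written rules like these.

```python
# A minimal sketch of stylometric scoring: two crude proxies for
# "predictability" that a detector might weigh. Real systems use
# trained statistical models, not hand-written rules like these.
import re
import statistics

def surface_features(text: str) -> dict:
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[a-zA-Z']+", text.lower())
    sentence_lengths = [len(s.split()) for s in sentences]

    return {
        # Low variance in sentence length reads as "uniform" or machine-like
        "sentence_length_stdev": statistics.pstdev(sentence_lengths) if sentence_lengths else 0.0,
        # Type-token ratio: lower values mean more repeated vocabulary
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
    }

sample = ("The study was conducted in 2023. The results were clear. "
          "The findings showed a strong effect. The conclusion was simple.")
print(surface_features(sample))
```

Note that a careful human writer working in plain, uniform sentences would score much like the sample above, which is exactly the problem described next.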
However, this approach is inherently limited. Writing styles vary significantly among humans: some authors prefer concise, formulaic prose, while others write in more complex ways. Moreover, people using English as a second language may produce text that appears more “predictable” or less varied due to linguistic constraints. Detectors are not nuanced enough to account for these subtleties.
Using flawed AI detection software raises serious ethical and legal concerns. Relying on tools that produce high rates of false positives risks undermining trust and harming individuals’ reputations. This is especially problematic in high-stakes environments like education, hiring, and publishing.
From a legal standpoint, there’s a growing conversation about whether using AI detection in educational settings constitutes a violation of students’ rights. If a student is penalized based on an automated tool without transparent methodology, there may be due process concerns, particularly at public institutions governed by constitutional protections. Moreover, most tools do not provide detailed justifications for their decisions, making it difficult for users to challenge or understand a classification.
Another problem is the overreliance on AI detectors as definitive arbiters of truth. Because these tools are marketed as reliable, educators and administrators may take their output at face value, creating a dangerous false sense of certainty. In reality, detecting AI-generated text remains an evolving challenge, and no tool currently comes close to 100% accuracy or fairness.
Some developers openly acknowledge these shortcomings. OpenAI discontinued its classifier due to “low accuracy” and the potential for “misuse.” Similarly, Turnitin has received criticism for its AI detection capabilities, with users reporting significant false positives and lack of transparency in how the system reaches conclusions.
Instead of relying solely on detection software, institutions should consider more nuanced approaches to handling AI-generated content. These may include:
- Contextual evaluation: Assessing whether the content aligns with an author’s previous work or known capabilities.
- Transparent policies: Establishing clear guidelines for AI usage and detection, with input from authors, editors, and publishers.
- Human-in-the-loop systems: Incorporating expert review alongside automated tools to avoid reliance on flawed software.
- Bias audits and inclusive training data: Regularly testing AI detectors to identify and correct for bias, especially against marginalized groups (a simple version of such an audit is sketched below).
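To make the bias-audit recommendation concrete, here is a hypothetical sketch of one such check: comparing false-positive rates across writer groups on a set of essays known to be human-written. The group labels and flag outcomes are invented for illustration; an actual audit would use a detector’s real output on a properly sampled corpus.

```python
# Hypothetical bias audit: compare false-positive rates across writer groups
# on essays known to be human-written. All data here is invented purely to
# illustrate the check, not drawn from any real detector.
from collections import defaultdict

# (writer_group, detector_flagged_as_ai) for essays known to be human-written
audit_sample = [
    ("native_speaker", False), ("native_speaker", False), ("native_speaker", True),
    ("native_speaker", False), ("non_native_speaker", True), ("non_native_speaker", True),
    ("non_native_speaker", False), ("non_native_speaker", True),
]

counts = defaultdict(lambda: {"total": 0, "flagged": 0})
for group, flagged in audit_sample:
    counts[group]["total"] += 1
    counts[group]["flagged"] += int(flagged)

for group, c in counts.items():
    rate = c["flagged"] / c["total"]
    print(f"{group}: false-positive rate {rate:.0%} ({c['flagged']}/{c['total']})")
```

A large gap between groups, as in this toy data, would be exactly the kind of disparity the Stanford study describes and should trigger a review of the detector before it is used in any high-stakes decision.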
Ultimately, as AI-generated content becomes more prevalent, the solution lies not in perfect detection, but in building systems that are transparent, fair, and centered around human judgment.
AI text detection software promises to distinguish between human and machine-generated content, but these tools come with serious limitations and biases. They are not only prone to error but also disproportionately disadvantage non-native English writers and those with atypical writing styles. Without scrutiny and ethical implementation, AI detection software can do more harm than good. The future of responsible AI use will depend not on surveillance or policing, but on transparency and a watchful human eye.
By Chris Moffitt
Chris is a Managing Editor at Technica Editorial




