For this year’s Peer Review Week, the theme selected was “Innovation and Technology in Peer Review.” We’ve spoken at length in recent posts about how AI and technology have changed the nature of peer review. We’ve specifically discussed ChatGPT’s growing role in peer review and how non-peer-reviewed publications are increasingly using AI to write most, if not all, of their online content. It’s a problem that isn’t going away: ChatGPT and similar tools only keep getting more popular. Because of this, publishers and institutions are now trying to build a better mousetrap to catch the proverbial AI mouse.
Many publishers and universities have turned to AI to catch AI-generated content before it goes through the motions of peer review (an irony so tasty that Alanis Morissette is licking her lips thinking about it). OpenAI, the organization responsible for birthing ChatGPT, launched a countermeasure against its own creation in the form of its AI classifier (picture Dr. Frankenstein trying to control his monster and keep it from wreaking havoc). According to the company’s website, the developers have “trained a classifier to distinguish between text written by a human and text written by AIs from a variety of providers.”
The problem is that the classifier isn’t very reliable. The website states that in test groups, the classifier correctly identified AI-written text only 26% of the time while incorrectly flagging human-written text as AI-generated 9% of the time (false positives). That might not seem so bad, giving the tool roughly a one-in-four chance of catching AI writing, but in practice the detection rate seemed to drop even lower while the share of false positives ran much higher. False positives and missed AI samples became so frequent that OpenAI shut down the classifier with the following note on its website:
“As of July 20, 2023, the AI classifier is no longer available due to its low rate of accuracy. We are working to incorporate feedback and are currently researching more effective provenance techniques for text and have made a commitment to develop and deploy mechanisms that enable users to understand if audio or visual content is AI-generated.”
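To see why those numbers are so troubling, here’s a quick back-of-the-envelope sketch in Python. Only the 26% detection rate and 9% false-positive rate come from OpenAI’s published figures; the batch sizes are invented purely for illustration and don’t describe any real submission pool.

```python
# A rough, back-of-the-envelope sketch (not OpenAI's code) of what a 26%
# detection rate and a 9% false-positive rate mean for a screening workflow.
# The batch sizes below are hypothetical, invented purely for illustration.

ai_written = 100       # hypothetical AI-generated submissions in a batch
human_written = 900    # hypothetical human-written submissions in the same batch

true_positive_rate = 0.26   # share of AI-written text the classifier catches
false_positive_rate = 0.09  # share of human-written text it wrongly flags

caught = ai_written * true_positive_rate               # 26 flagged correctly
missed = ai_written - caught                           # 74 slip through unnoticed
wrongly_flagged = human_written * false_positive_rate  # 81 humans falsely accused

precision = caught / (caught + wrongly_flagged)        # chance a flag is genuine
print(f"AI submissions caught:      {caught:.0f} of {ai_written}")
print(f"AI submissions missed:      {missed:.0f} of {ai_written}")
print(f"Humans wrongly flagged:     {wrongly_flagged:.0f} of {human_written}")
print(f"Odds a flag is actually AI: {precision:.0%}")  # roughly 24%
```

Under those hypothetical assumptions, roughly three out of four AI-written submissions slip through, and when the classifier does raise a flag, it is more likely to be pointing at a human author than at an actual AI one. It’s not hard to see why OpenAI pulled the plug.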
OpenAI isn’t the only company trying to corner the market on AI-detection technology. Turnitin, a leading maker of academic-integrity and originality-detection software, including iThenticate, launched its own AI detector in 2023. A review of the software prior to its public release produced similarly mixed results. Washington Post reporters tested the tool on 16 writing samples that were AI-generated text, human-written text, or a mix of the two. It correctly identified only 6 of the 16 samples. Three of the samples were completely misidentified. On the remaining seven, the tool got things only partially right, flagging parts of the text as ChatGPT-written when they weren’t and failing to catch some of the ChatGPT text in the mixed samples. Despite this less-than-50% accuracy, Turnitin continues to advertise the tool on its website.
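For a concrete sense of that figure, here is a tiny, purely illustrative tally of the test described above (the sample labels are placeholders, not the Post’s actual data): 6 fully correct calls, 3 complete misses, and 7 partial calls out of 16.

```python
# Illustrative tally of the Washington Post test described above:
# 16 samples, 6 judged fully correct, 3 fully misidentified, 7 only partial.
results = ["correct"] * 6 + ["wrong"] * 3 + ["partial"] * 7

fully_correct = results.count("correct")
accuracy = fully_correct / len(results)
print(f"Fully correct calls: {fully_correct} of {len(results)} ({accuracy:.1%})")  # 37.5%
```

Counting only the fully correct calls, the detector lands at 37.5%, well below the coin-flip mark.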
The examples of false positives created by AI-detection software are notable and numerous. One study found misidentifications in 9% of the results produced by these tools. Another study humorously ran the U.S. Constitution through an AI detector, which flagged it as AI-generated (maybe James Madison was a Terminator?). Reports of false positives from Turnitin have been so frequent that the company released a YouTube video in which one of its machine learning scientists explains the nature of the false results and essentially reminds educators, institutions, and publishers to take the results with a “grain of salt.” The argument is that humans need to be the final judge of the results, determining whether the text is actually AI-generated or a false report.
Of course, it has to be noted that it’s still early in the game for these AI tools. As we know from other innovations, practice makes perfect (“Genius is one percent inspiration, ninety-nine percent perspiration” – Thomas Edison). As its website statement notes, OpenAI plans to retool the classifier, with the goal of relaunching it once the technology has improved.

And improving that technology might be even more difficult than first imagined. New online tools such as UndetectableAI, HideMyAI, and QuillBot seem to be popping up on a daily basis with the goal of fooling new detection software. This isn’t too surprising considering that ChatGPT is growing into a big business. A 2023 report found that “43% of college students have used ChatGPT or a similar AI application” in writing their papers. Some of these students will be the next generation of scientists submitting research for publication, which only highlights the growing need for AI detection.

Innovation in the area of research integrity is important and necessary for peer review to continue to evolve in the face of the growing AI revolution. ChatGPT and other AI software are clearly here to stay, and their popularity only keeps growing. While it would be nice to have a tool that reliably catches any and all instances of AI-generated text, that seems to be a pipe dream at this point. Over time, that dream might become a reality, but in the current academic publishing climate, a combination of AI tools and human intuition seems to be the best bet for combating the growing amount of AI-generated material submitted to institutions and publishers.
By: Chris Moffitt
Chris is a Managing Editor at Technica Editorial