Recent advances in artificial intelligence readily go viral, but even among them, ChatGPT has few equals. This versatile chatbot has shaken the academic world: a large proportion of students use ChatGPT to write essays, even in fields as complicated as philosophy, and the AI drastically outscores humans at this task. And it copes well with more than essays: in August 2023, a large research team reported in the American Journal of Obstetrics and Gynecology that ChatGPT outscores human candidates in objective structured clinical examinations, which many countries use as licensing and certification exams for graduate medical doctors. Its ability to write almost any text the user wants has triggered a surge of student cheating; some students even test the education system by defending AI-written graduation theses, and these attempts tend to succeed, particularly when the student edits the generated text.
The problem of AI-powered academic cheating has already gone beyond students: even experienced faculty members fall under suspicion of using ChatGPT to write scientific papers. For now, such cases usually come to light accidentally, through indirect indicators, for example, an author listing multiple affiliations whose combined workload would be far more than excessive for one human.
These cases show that both universities and editorial offices are in dire need of tools to detect ChatGPT usage. This is a real challenge: catching a neural network that outperforms humans at producing text seems difficult. But the problem is solvable, thanks to multiple intrinsic flaws of ChatGPT.
ChatGPT may seem a menacing sentient creature heralding the rise of the machines, but in fact it is just a powerful chatbot with a next-generation engine. Its by-design goal is to confabulate plausible text by processing natural language. It holds no mathematical, biological or any other domain knowledge; it cannot even process that kind of information, which leads to multiple mistakes whenever ChatGPT is given mathematical tasks. Moreover, it uses the same confabulation approach to produce any information, including literature references and scientific facts, which makes it extremely prone to “hallucinate”: it invents references to non-existent papers and non-existent facts about programs and services. ChatGPT is excellent at processing texts, but it fails completely at processing science, and this flaw provides the most evident way to catch it red-handed. These signatures, along with other flaws such as problems with context or vocabulary, can be checked manually by a professor or an editor, but they can also slip past a reviewer’s attention. AI cannot think, but it can lie low.
Fortunately, the emergence of ChatGPT-driven cheating has led to a burst of computational methods for distinguishing between human- and AI-written texts. This is fighting fire with fire in the most direct sense. Contemporary computers cannot predict what a human-written text should look like, simply because they have not yet achieved human-like consciousness. But they can easily predict what a computer-written text should look like.
When an AI is trained to generate text, it memorizes typical word sequences that occur frequently in everyday language. For example, it learns that “home” is an appropriate word after “John goes”, and “walk” is an appropriate word after “John goes for a”. This leads to an overrepresentation of predictable stock phrases in AI-generated texts. When a professor checks such a text with an AI detector, the program uses its own AI to predict which word should come next. If these predictions match the words actually present across a fragment of text, the detector labels that fragment as potentially AI-generated. This is AI detection by predictability.
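To make the principle concrete, here is a minimal sketch of predictability scoring in Python. It is not Advacheck’s metric (which is not public); it simply counts how often an open-source language model, GPT-2 from the Hugging Face transformers library, would have guessed the author’s next word. The function name and the model choice are assumptions made for illustration only.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Illustrative setup: GPT-2 stands in for whatever model a real detector uses.
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def predictability(text: str) -> float:
    """Fraction of tokens matching the model's own top-1 next-token guess.

    Human prose tends to surprise the model more often than machine prose,
    so higher values hint (but never prove) that a text is AI-generated.
    """
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits
    predicted_next = logits[0, :-1].argmax(dim=-1)  # guess for each position
    actual_next = ids[0, 1:]                        # what the author wrote
    return (predicted_next == actual_next).float().mean().item()
```

Texts whose words consistently match the model’s top guesses score close to 1, while typical human prose scores much lower; any concrete threshold is a calibration decision, not a universal constant.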
Regardless of the specific metric used, AI detection is most helpful when integrated into plagiarism detection software: unlike standalone AI detection tools, plagiarism checks have become a routine procedure, so running AI detection simultaneously could prevent academic cheating on a wide scale. This is why we implemented AI detection in Advacheck. Our product warns the user when it suspects AI usage and highlights the suspicious fragments. This is the only AI detection output so far, but the function is in active development, and more detailed outputs are coming soon.
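Fragment-level highlighting can be imitated on top of the predictability score sketched above. The helper below is purely hypothetical: the window size and the threshold are invented for illustration and have nothing to do with Advacheck’s internal parameters.

```python
def flag_fragments(text: str, window: int = 100, threshold: float = 0.5) -> list[str]:
    """Split a text into fixed-size word windows and return the suspicious ones.

    Both the window size and the threshold are placeholders; a real detector
    would calibrate them on labeled corpora of human and AI writing.
    """
    words = text.split()
    fragments = [" ".join(words[i:i + window]) for i in range(0, len(words), window)]
    return [f for f in fragments if predictability(f) > threshold]
```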
While using our product to spot potentially AI-generated texts is simple, deciding on further actions is much more difficult. When you use Advacheck for its “hardcore” task, plagiarism detection, it lists the sources from which text fragments are likely to have been borrowed. This makes it possible to confirm plagiarism from a specific source and to make an immediate administrative or editorial decision based on the program’s output.
This is not the case with AI detection, regardless of the system you use. Any AI detection metric enables the software only to suspect cheating, not to confirm it.
So what should you do if Advacheck flags a potentially AI-generated text? First of all, we strongly recommend manually rechecking and reevaluating the paper, particularly its “suspicious” fragments. Crude mistakes, senseless sentences, and non-existent data or references indicate a low-quality paper, even if these signs are consequences of the author’s negligence rather than traces of ChatGPT.
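Some of these spot checks can be partially automated. As one hedged example, a reference that carries a DOI can be looked up in the public Crossref registry; the helper below is an illustration written for this article, not an Advacheck feature.

```python
import urllib.error
import urllib.request

def doi_exists(doi: str) -> bool:
    """Return True if the DOI is registered in Crossref, False on a 404.

    Only a spot check: a hallucinated reference may still reuse a real DOI,
    so a positive result does not prove the citation is accurate.
    """
    url = f"https://api.crossref.org/works/{doi}"
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other HTTP errors signal network trouble, not a missing DOI
```

A reference whose DOI fails such a check deserves close manual scrutiny before any accusation is made.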
Comparing the suspicious fragments against each other and against the rest of the text may reveal the Dory Effect: contradictions between segments, as if the author had forgotten what they wrote minutes earlier (like Dory the fish from the movie “Finding Nemo”). This effect is one more sign of potential cheating.
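Spotting such contradictions is primarily a job for the human reader, but it can be roughly approximated in code. The sketch below is an assumption-laden illustration, not anything Advacheck exposes: it runs an off-the-shelf natural language inference model (roberta-large-mnli through the transformers pipeline) over every pair of sentences and reports the pairs it labels as contradictory.

```python
from itertools import combinations
from transformers import pipeline

# Off-the-shelf NLI model; an arbitrary choice made for this illustration.
nli = pipeline("text-classification", model="roberta-large-mnli")

def dory_effect_pairs(sentences: list[str], min_score: float = 0.9) -> list[tuple]:
    """Return sentence pairs the NLI model confidently labels as contradictory."""
    hits = []
    for premise, hypothesis in combinations(sentences, 2):
        result = nli([{"text": premise, "text_pair": hypothesis}])[0]
        if result["label"] == "CONTRADICTION" and result["score"] >= min_score:
            hits.append((premise, hypothesis, result["score"]))
    return hits
```

Pairwise checking scales quadratically with the number of sentences, so in practice one would restrict it to the fragments the detector has already flagged.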
And finally, we encourage professors and editors to discuss a suspicious text with its author to find out how familiar the author is with their own writing. Remember that only humans have the full ability, and the full responsibility, to analyze scientific texts. In the end, you will make the call with your own mind; Advacheck will point out where to look.