Reference
- Desaire, H.; Chua, A.E.; Isom, M.; Jarosova, R.; Hua, D. (2023). Distinguishing academic science writing from humans or ChatGPT with over 99% accuracy using off-the-shelf machine learning tools. Cell Reports Physical Science. https://doi.org/10.1016/j.xcrp.2023.101426
Comment
As we are witnessing across the landscape of artificial intelligence, language models are transforming our relationship with information: from the ability to search the web in real time to the unsettling capacity to deceive their operators. But perhaps none of these dimensions affects the core of academic activity as directly as the one we address today: the infiltration of AI-generated texts into the scientific literature. A recent study led by Professor Heather Desaire of the University of Kansas has demonstrated that scientific texts written by ChatGPT in the chemical sciences can be detected with nearly 100% accuracy, opening a promising pathway for preserving the integrity of academic communication. This finding has transcended the academic sphere and captured our attention.
Researchers have reportedly developed an artificial intelligence (AI) text detector for scientific writing that can distinguish between human-written and machine-generated content with nearly 100% accuracy. The study, published in Cell Reports Physical Science, explains that while general-purpose AI text detectors already exist, none performs particularly well on scientific documents.
What makes this research significant is not only its high accuracy but also the methodology employed and its implications for the publishing ecosystem. The Desaire team trained their detector exclusively on journals of the American Chemical Society, a highly specialized corpus. In contrast to general-purpose detectors such as ZeroGPT or OpenAI’s own tools, which showed mediocre performance on chemistry texts, this domain-specific approach correctly identified 100% of human-written introductions and 98% of those generated by ChatGPT from abstracts.
The team’s detection software was trained on journals published by the American Chemical Society. They collected 100 introductions written by researchers and then prompted ChatGPT to generate its own introductions from either the articles’ abstracts or their titles alone. When the detector scanned the three categories of text, it correctly identified the human-written sections 100% of the time, and likewise those generated from titles alone; for introductions generated from abstracts, it achieved a 98% correct identification rate.
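For readers curious about what such an "off-the-shelf" detector can look like in practice, the sketch below illustrates the general idea: compute a handful of hand-crafted stylometric features over introductions from a single discipline and feed them to a standard classifier. The specific features, the GradientBoostingClassifier, and the human_intros / chatgpt_intros inputs are assumptions made for illustration; they are not the authors’ actual feature set or code.

```python
# A minimal sketch of domain-specific, feature-based AI-text detection in the
# spirit of the study: a few hand-crafted stylometric features fed to an
# off-the-shelf classifier. Features, classifier choice, and inputs are
# illustrative assumptions, not the authors' actual pipeline.
import re

import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score


def stylometric_features(paragraph: str) -> list:
    """Turn one introduction paragraph into a small numeric feature vector."""
    sentences = [s for s in re.split(r"[.!?]+\s+", paragraph) if s.strip()]
    words = [w.strip(".,;:()").lower() for w in paragraph.split()]
    sent_lengths = [len(s.split()) for s in sentences] or [0]
    return [
        len(sentences),                               # sentences per paragraph
        float(np.mean(sent_lengths)),                 # mean sentence length (words)
        float(np.std(sent_lengths)),                  # variability of sentence length
        paragraph.count(";") + paragraph.count(":"),  # "complex" punctuation
        paragraph.count("("),                         # parentheses (asides, citations)
        sum(ch.isdigit() for ch in paragraph),        # digits in the text
        sum(w in {"however", "but", "although"} for w in words),  # contrastive connectives
    ]


def train_detector(human_intros, chatgpt_intros):
    """Fit a classifier on labeled introductions from a single discipline."""
    X = np.array([stylometric_features(p) for p in human_intros + chatgpt_intros])
    y = np.array([0] * len(human_intros) + [1] * len(chatgpt_intros))  # 0 = human, 1 = ChatGPT
    clf = GradientBoostingClassifier(random_state=0)
    accuracy = cross_val_score(clf, X, y, cv=5).mean()  # rough accuracy estimate
    clf.fit(X, y)
    return clf, accuracy
```

The point of the sketch is the design choice the study highlights: no deep learning is required. A small, interpretable feature set computed over a narrow disciplinary corpus, combined with readily available machine learning tools, can already be highly discriminative within that domain.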
From the perspective of Documentation Sciences, this study illustrates a fundamental principle: information retrieval and authenticity analysis cannot be separated from the disciplinary context. A model trained on chemistry literature detects stylistic, terminological, and rhetorical patterns specific to that field that generic detectors overlook. This suggests that, in the future, editorial integrity management will require verification systems tailored to each domain of knowledge, not universal solutions.
Other automatic classification and machine learning systems, such as ZeroGPT and OpenAI’s own text classifier, did not perform as well on chemistry-related texts. This has important implications for scientific journals seeking to prevent the infiltration of AI-generated content and its attendant problems, such as the fabrication of false data.
Excessive use of AI could flood journals with papers of marginal value and cause genuinely novel work to be overlooked. Moreover, there is concern that these AI tools tend to invent facts and assertions that are untrue when their knowledge base is insufficient for the task their users set them.
Here lies one of the most critical points we previously highlighted when discussing GPT-4’s ability to deceive in financial contexts. The factual hallucination of language models is no minor issue when transferred to the scientific domain. An article introducing nonexistent laboratory data or fabricated bibliographic references not only erodes the credibility of journals but can also divert research lines into dead ends. Early detection of such content thus becomes a safeguarding function for the scientific method itself.
Therefore, it is becoming crucial to identify and mitigate the influence of AI on scientific journals. Desaire emphasizes that journals must take the lead in detecting "AI contamination" and ensure that their policies on AI-assisted writing are actually enforced. Although some argue that the spread of AI-generated content is inevitable and that resisting it is futile, the professor believes that developing tools like this one allows researchers to keep pace in detecting and mitigating these problems.
The work of Desaire and her team represents an encouraging step in this direction. But it also reminds us that the battle for scientific integrity will be a continuous race: each advance in detectors will likely be followed by improvements in the models’ ability to mimic human style, in a cycle of action and reaction we have already observed in other areas of digital security. What is at stake, however, is far more than the authenticity of individual articles: it is trust in the system of scientific communication as a pillar of collective knowledge. Where is the boundary? Will AI be capable of creating true Science? Will AI-generated creations become indistinguishable from our own?