AuroraGPT: The AI Breakthrough for Scientific Research

A major project for scientific research has been unveiled. AuroraGPT is an effort, now underway, to develop an artificial intelligence of unprecedented scale trained on the body of humanity's scientific knowledge. The initiative aims to create one of the most advanced and thoroughly trained AIs in order to support and advance Science. Its function will be to accelerate scientific research significantly by integrating information from all branches of knowledge into a single system. This should make it possible to explore interconnected knowledge and obtain results and insights that researchers have so far overlooked or left unaddressed. Scientific output is expected to multiply, triggering an international race to secure the best technical infrastructure for AI and so lead advances in research and patent development.

Technically, AuroraGPT is presumed to rely on massive parallelization and advanced methods for distributing tasks among processing units in order to handle the big data generated by science: scientific articles, datasets, reports, patents, and any online resources and databases deemed relevant. It likely employs deep neural network architectures with many layers, built on variants of the "Transformer" architecture, which is characterized by its ability to process long and complex data sequences.
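To give a sense of what a "Transformer" does with a sequence, the following is a minimal sketch of scaled dot-product self-attention, the core operation of that family of networks. It is purely illustrative: nothing here comes from AuroraGPT, whose internals have not been published, and the function names and the tiny three-token input are assumptions made for the example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention for a single head, without masking.

    Each argument is a list of equal-length vectors (lists of floats),
    one vector per position in the sequence. Illustrative only: this is
    not AuroraGPT's implementation.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this position's query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy input: three token vectors of dimension 2.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens, tokens, tokens)
```

Every position attends to every other position in one step, which is why these networks cope well with long-range relationships in a sequence. It is also why the cost grows quickly with sequence length and why large-scale parallelization becomes necessary.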

The scale of the project is remarkable. According to information provided by Intel during the ISC23 conference, AuroraGPT will reach one trillion parameters, a figure that contrasts sharply with the 175 billion parameters of GPT-3, the model family underlying ChatGPT. This quantitative leap is not merely incremental: it responds to the need to process the entirety of scientific knowledge generated over the past decades, a corpus encompassing peer-reviewed articles, experimental datasets, patents, technical reports, and software code.
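The difference in scale can be made concrete with some back-of-the-envelope arithmetic. The sketch below takes the one trillion parameters reported for AuroraGPT and GPT-3's published size of 175 billion parameters, and estimates the memory needed just to store the weights. The storage formats (4 bytes per parameter for 32-bit, 2 bytes for 16-bit) are assumptions for illustration, not details confirmed for AuroraGPT.

```python
AURORAGPT_PARAMS = 1_000_000_000_000  # one trillion, as reported at ISC23
GPT3_PARAMS = 175_000_000_000         # 175 billion (published GPT-3 size)

# Assumed storage formats; the real numeric precision is not public.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2}

def weight_memory_tb(num_params, bytes_per_param):
    """Memory for the weights alone, in terabytes (1 TB = 1e12 bytes)."""
    return num_params * bytes_per_param / 1e12

ratio = AURORAGPT_PARAMS / GPT3_PARAMS

for name, size in BYTES_PER_PARAM.items():
    print(f"{name}: AuroraGPT {weight_memory_tb(AURORAGPT_PARAMS, size):.2f} TB, "
          f"GPT-3 {weight_memory_tb(GPT3_PARAMS, size):.2f} TB")
print(f"Parameter ratio: {ratio:.1f}x")
```

Under these assumptions the model is about 5.7 times larger than GPT-3, and its weights alone would occupy roughly 2 TB in 16-bit precision. That is far more than a single accelerator can hold, which is consistent with the emphasis on distributing work across many processing units.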

It is speculated that the model could use a hierarchical structure of interconnected neural layers, enabling it to assimilate and analyze vast amounts of scientific data. Advanced machine learning techniques, such as reinforcement learning or unsupervised learning, may be incorporated to enhance its processing and analytical capabilities. Although the exact technical details remain unknown, AuroraGPT is expected to employ state-of-the-art algorithms to process, understand, and relate data from diverse scientific sources, improving the scientific community's ability to extract valuable insights more rapidly and accurately. It is also anticipated that AuroraGPT will tackle the problem of "data invention" (commonly called hallucination), in which a model fabricates content when gaps exist in its knowledge base, and that it will properly cite and reference the sources on which its analyses and responses are based. This advancement would represent a milestone in the history of Science, with the potential to accelerate scientific progress drastically and possibly lead to unprecedented interdisciplinary discoveries.

What is Expected from AuroraGPT

Accelerate research: AuroraGPT aims primarily to reduce the time required to conduct scientific research by providing rapid and accurate answers to complex questions.
Identify hidden connections: The AI is expected to uncover non-obvious relationships and patterns across different scientific disciplines, helping to generate novel approaches to solving problems.
Facilitate access to knowledge: Because the model draws on a vast body of information, scientists will be able to run detailed queries from a single point of access.
Stimulate scientific collaboration: The ability to integrate data and concepts from diverse fields could foster collaboration among scientists from different disciplines, potentially generating unprecedented interdisciplinary innovations.
Improve citation and referencing: It is expected that all assertions, texts, and generative outputs of the AI will be properly grounded in bibliographic and documentary sources, resolving typical citation and referencing issues. This will enable us to understand how the AI studies the interrelationships among scientific knowledge, opening up a novel and emerging specialized field within Scientometrics: "AImetry".

The development of AuroraGPT aligns with a trend we have observed in previous articles: the specialization of artificial intelligence models by knowledge domains. In contrast to generalist models such as ChatGPT or GPT-4 Turbo, AuroraGPT prioritizes training focused on scientific corpora, with the ambition to become a research assistant capable of operating with the terminology, formats, and conventions specific to each discipline. This approach shares similarities with Vincent AI in the legal domain, albeit on a much larger scale and with far greater interdisciplinary ambition.

The initiative, led by a U.S. public laboratory in collaboration with industry, also raises questions about the geopolitics of scientific AI. The race to possess the most powerful models for knowledge generation and patent development is emerging as a new front of international competition, comparable to what the sequencing of the human genome or the construction of large particle accelerators once represented. The decision to make the model publicly available upon completion—as its developers have indicated—will be a decisive factor in its global adoption and in the balance of access to scientific knowledge in the coming years.

From the perspective of Documentation, AuroraGPT represents both a challenge and an opportunity. The promise of proper citation and referencing of sources addresses one of the most criticized shortcomings of current generative models, drawing closer to the ideal of documentary traceability that we advocate in our discipline. The notion of “AImetry” hinted at in the text—a specialization of scientometrics dedicated to studying how artificial intelligence processes, relates, and produces knowledge—could become an emerging field of research, with implications for the evaluation of scientific output and the understanding of the very dynamics of knowledge advancement.