AI in Documentation: ConocimIA Seminar Insights

Reference

Blázquez-Ochando, M.; Lázaro-Rodríguez, P. (2024). Debates, challenges, and opportunities of artificial intelligence in Documentation: The ConocimIA seminar. Métodos de información, 15(28), 52-83. https://doi.org/10.5557/IIMEI15-N28-052083

Comment

The emergence of large language models (LLMs) and generative artificial intelligence, exemplified by ChatGPT, has brought about a radical change in multiple domains, including Documentation Sciences. As Sardana, Fagan, and Wright (2023) note, ChatGPT is a disruptive innovation because it alters academic and social norms regarding original work, research development, and scientific publication. Similarly, González-Alcaide (2024) argues that generative AI constitutes a disruption in access to information due to its interactive, contextual, and generative nature, with significant impact on educational and academic contexts.

In this context, the scientific community has begun to explore the implications of these technologies. Torres-Salinas and Arroyo-Machado (2023) published a manual with examples of ChatGPT applications in research and university education, covering topics from scientific writing to programming in various languages. More specifically, Torres-Salinas, Thelwall, and Arroyo-Machado (2024) have developed a corpus of ChatGPT applications focused on bibliometrics.

In the library field, authors such as Adetayo (2023) have compared searches conducted with ChatGPT to traditional queries directed at library staff, concluding that students will continue to demand human reference services. Yang (2024) agrees that, although ChatGPT excels in information retrieval in certain areas, its performance is not comparable to that of library staff in others. Beyond these comparisons, authors such as Franganillo (2023) view LLMs as an opportunity for the library profession. In this vein, Chen (2023) and Cox and Tzoc (2023) analyze the implications of AI for the profession, while Lappalainen and Narayanan (2023) and Torres (2024) present chatbots for user interaction. Brzustowicz (2023) focuses on the potential of AI in cataloging, although Yang and Mason (2024) acknowledge its limitations compared to human expertise in tasks such as answering questions or MARC21 cataloging.

In archival studies, works such as those by González-Gallardo et al. (2023) explore entity recognition in historical documents, and Spina (2023) concludes that, despite detected inaccuracies, digitization and AI can significantly enhance archival research. In multimedia documentation, large vision models (Maaz et al., 2024) and tools such as DALL-E and Sora (Liu et al., 2024) are transforming the generation of images and videos from text.

In bibliometrics, Bornmann and Lepori (2024) analyze whether ChatGPT can be used to detect comparable institutions in benchmarking processes, concluding that expert intervention is necessary to review the results. De Winter (2024) explores ChatGPT’s potential for predicting citations, Mendeley readers, and social media engagement, while Sandnes (2024) examines whether it is possible to identify highly cited academics, with limited success for ChatGPT 3.5.

The debate on the use of AI in peer review and scientific writing is particularly contentious. Lopezosa (2023b) examines its application in editorial processes, while Carabantes and González-Geraldo (2023) highlight significant limitations of ChatGPT in peer review. Kousha and Thelwall (2024) and Thelwall (2024) reach similar conclusions. Mollaki (2024) asks whether the use of ChatGPT signifies the "death of the reviewer" or the death of peer review integrity, pointing to the absence of guidelines and policies on this matter. Flanagin and Bibbins-Domingo (2023) mention the case of JAMA, which prohibits reviewers from inputting parts of manuscripts into chatbots because doing so violates confidentiality agreements.

Regarding authorship of scientific works, Tang (2023) identifies two types of editorial policies: those that completely prohibit the use of ChatGPT and those that permit it under certain conditions. Kendall and Teixeira-da-Silva (2024) focus on the risks of abusive use of LLMs in scientific publication: authorship, predatory publishing, and paper mills. Other authors reflect on the ethical use of ChatGPT in scientific writing (Lund et al., 2023; Schlagwein & Willcocks, 2023; Cotton, Cotton & Shipway, 2024), while Thorp (2023) argues that ChatGPT cannot be an author. Alkaissi et al. (2023) invoke the concept of "hallucinations" to challenge the integrity of what ChatGPT provides.

In the field of programming, Torres-Salinas and Arroyo-Machado (2023) highlight the potential of ChatGPT across various languages. Hajj and Sah (2023), as well as Tóth, Bisztray, and Erdodi (2024), analyze the impact of ChatGPT on PHP programming, while Wuisang et al. (2023) and Diehl et al. (2024) evaluate code generated in Python and R.

ConocimIA: Origin, Methodology, and Objectives

The ConocimIA seminar emerges in this context of disruption with the aim of creating a space for dissemination, learning, and reflection for the academic community around the debate generated by AI in the Documentation sector. The initiative is structured around seven main goals, each with its corresponding objectives:

Analysis of Impact: examine how AI is transforming the field of Documentation Sciences.
Exploration of Uses and Applications: identify the most relevant tools and techniques.
Assessment of Advantages and Challenges: evaluate opportunities and threats in professional, scientific, academic, and productive domains.
Practical Activities: conduct workshops and sessions focused on teaching specific AI tools for Documentation.
Training and Updates: promote the integration of teacher training programs that incorporate AI.
Innovation and Development: identify documentary tasks that can be optimized through AI and collaborate in the development of new services.
Collaboration and International Networks: establish links with national and international institutions for knowledge exchange.

For its dissemination, the website http://www.conocimia.digital was created, structured into three fundamental sections: Home (latest updates), Events (scheduled sessions), and Resources (materials from past activities, including videos).

Activities Developed

The article reviews the main activities of the seminar to date:

Opening Conference: "The Disruptive Emergence of AI" (November 17, 2023). This session provided a comprehensive introduction to the concept of AI in Documentation, exploring its evolution from the perspective of Information Retrieval and its influence on the development of algorithms for neural networks and machine learning. Observable effects on user behavior were discussed, namely the decline in the use of search engines in favor of AI systems and the shift from seeking sources to seeking direct answers, as well as issues related to dependence and the delegation of thinking. Privacy and control concerns associated with commercial AI systems were addressed, alongside the role of free software as a counterbalance, and the risks to the development of critical judgment.
ChatGPT Workshop: Data Mining of PARES and Public Libraries (December 15, 2023). This hands-on workshop demonstrated how ChatGPT can assist in creating web-scraping programs to extract data from bibliographic catalogs and archives. The interaction with the model to generate Python and PHP code was demonstrated, along with advanced programming concepts such as loops, conditional structures, and arrays.
Lecture: "Understanding How ChatGPT Works" (December 15, 2023). The internal functioning of the model was explained in an accessible manner: the Transformer architecture, the attention mechanism, tokenization, vectorization and embeddings, and the probabilistic method of response generation. It was also compared with traditional Information Retrieval systems, highlighting their differences and complementary nature.
Conference: "Artificial Intelligence in Multimedia Documentation" (February 23, 2024). With the participation of Dr. Alfonso López Yepes, the evolution of automation in Multimedia Documentation and the impact of AI on areas such as time-stamping, video generation, automatic transcription and translation, image restoration and enhancement, voice and music synthesis, object and person recognition, and augmented and virtual reality were addressed. Risks such as bias, automated censorship in speech-to-text processes, and hyper-personalization of content were analyzed.
Conference: "What ChatGPT Does Not Do" (February 23, 2024). This session explored the limits of AI in Documentation: the problem of reliable citation and referencing of sources, the use of proprietary formats, the invention of data (hallucinations) when asked to complete metadata structures, and difficulties with markup formats lacking closing tags. Possible solutions were critically analyzed, and specialized GPTs available on OpenAI’s GPT Store were reviewed.
Lecture: "The Problem of AI and Sources" (April 26, 2024). Continuing from the previous session, three generative AIs (Perplexity, Phind, and ChatGPT) were analyzed for their ability to provide sources, citations, and bibliographic references in Harvard and APA styles. Variable levels of reference accuracy and transparency were observed, with Phind emerging as the most effective in addressing this issue.
Conference: "The First Documentation AI" (April 26, 2024). "Mayordomo" was presented, the first open-source-based AI model in Documentation, built on PrivateGPT, Llama, Qdrant, and Mistral, installed on local servers and trained on a specialized collection of documents and articles from the field. This system addresses privacy and control issues associated with commercial AI solutions by providing a private, localized alternative. It includes an intermediate software layer that manages user requests, logs them, processes them in a queue, and offers a customized interface for library services.
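
As a rough illustration of the intermediate layer described for "Mayordomo" (receive user requests, log them, process them in a queue), the following Python sketch shows one way such a layer could be organized. It is not the project's code: the names UserRequest, RequestLayer, and fake_local_model are hypothetical, and the model call is a stand-in for whatever locally hosted model the real system queries.

```python
import logging
import queue
from dataclasses import dataclass, field
from datetime import datetime, timezone

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("request_layer")


@dataclass
class UserRequest:
    """One user query, stamped with the time it was received."""
    user: str
    prompt: str
    received_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def fake_local_model(prompt: str) -> str:
    """Hypothetical placeholder for the call to a locally hosted model."""
    return f"[model answer to: {prompt}]"


class RequestLayer:
    """Assumed design: log every request, then answer them in arrival order."""

    def __init__(self, model=fake_local_model):
        self._pending = queue.Queue()  # first in, first out
        self._model = model
        self.history = []              # in-memory log of all requests

    def submit(self, user: str, prompt: str) -> UserRequest:
        request = UserRequest(user, prompt)
        log.info("request from %s at %s", request.user, request.received_at)
        self.history.append(request)
        self._pending.put(request)
        return request

    def process_all(self):
        """Drain the queue and return (user, answer) pairs in order."""
        answers = []
        while not self._pending.empty():
            request = self._pending.get()
            answers.append((request.user, self._model(request.prompt)))
            self._pending.task_done()
        return answers


layer = RequestLayer()
layer.submit("ana", "Summarize MARC21 field 245")
layer.submit("luis", "Suggest subject headings for a thesis on archives")
for user, answer in layer.process_all():
    print(user, answer)
```

Keeping the log and the queue in a layer of its own, separate from the model, means the model behind it could be swapped without changing how requests are recorded or ordered; a production system would presumably persist the log rather than keep it in memory.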

Future Research Directions

The article also outlines the research directions ConocimIA intends to pursue in the future:

Ethics and Plagiarism. Generative AI has overwhelmed traditional plagiarism detection tools, enabling students to produce final projects and solve exercises with minimal effort. In this regard, the project led by Dr. Michela Montesi, "Development and Validation of Teaching Activities with AI in the Field of Information and Documentation," represents a significant contribution to educating, training, and instructing the university community on the appropriate use of AI.
Curricular Transversality. AI has multiple applications across all areas of Library and Information Science. It is necessary to organize specific events for courses such as Information Retrieval, Documentary Languages, Metadata, Cataloging, Advanced Information Extraction and Processing, or Digital Publishing, with the active participation of the faculty members involved.
Librarian Training. The rapid integration of AI in the automation of bibliographic and documentary tasks demands intensive training for university and academic library staff, enabling them to harness the potential of AI and reinvent existing services and products.
Intellectual Property and Copyright. Questions such as "Who holds the rights to a document generated by AI?" and "What threshold determines when an AI-generated work can be appropriated by a prompt author?" require profound reflection by the scientific community.
AI-Assisted Software Development. The web-scraping workshops have been the starting point of a movement that must continue with the development of systems for managing archives, libraries, museums, metadata, cataloging, retrieval, and big data, as well as workshops on automated metadata generation, content management program creation, sentiment analysis system design, and more.
Semantic Web. AI has the capacity to generate linked data structures according to relational models provided in prompts. Given the slow pace of change in university curricula, it is essential to transmit this knowledge to professionals, students, and educators.

Collaborations

The article identifies several lines of collaboration that the seminar aims to develop:

Specialized AI Projects: collaboration in designing criteria for document selection in AI documentary corpora, helping to reduce the risk of bias.
"Mayordomo" Project: the first AI in Documentation is open to new sponsors, researchers, and collaborators who can foster its development, data feeding, and training.
Teaching Innovation: development of training programs that integrate AI into Documentation Sciences courses.
Reinvention of the Profession: collaboration in designing new degree programs, such as a Master's in Big Data and AI in Documentation.
Identification and Optimization of Tasks: research to determine which documentary tasks can be improved or replaced by AI.
Development of New Services: collaboration between universities and companies to create innovative AI-based services.
Evolution of the Professional Profile: research on how AI is transforming the competencies required of documentalists.
International Collaboration: establishment of partnerships with Hispanic-Mexican, Hispanic-Brazilian, and European institutions for knowledge exchange.
User Studies: design and execution of studies that help better understand users’ needs and behaviors in the use of AI applications and services.

Conclusions

AI has emerged as a disruptive tool in Documentation Sciences, transforming the way various academic and scientific tasks are carried out. It facilitates the automation of complex processes, improves efficiency in information retrieval and organization, and opens new opportunities for research and service development. However, it also presents significant challenges: the need to update curricula to incorporate AI competencies, the ethical management of information, and the adaptation of professionals to an ever-evolving technological environment.

ConocimIA has developed an innovative methodology that combines a digital platform with bimonthly in-person seminars. The platform serves as a news and resources observatory, providing access to educational materials and recorded lectures. The seminars enable direct and practical interaction, where current topics are discussed, case studies are presented, and hands-on activities are conducted to facilitate the understanding and application of AI tools in real-world contexts.

One of the pillars of ConocimIA is the organization of practical workshops that enable participants to experience AI technologies firsthand. These workshops not only provide technical skills but also foster a critical understanding of the capabilities and limitations of AI in professional practice.

Among the innovative projects presented, the development of the first AI model in Documentation, "Mayordomo," based on open-source technologies, stands out. This system not only demonstrates the potential of AI to transform document management but also underscores the importance of developing tools tailored to the specific needs of the field. Looking ahead, ConocimIA plans to expand its activities by integrating AI more deeply into academic curricula, collaborating more closely with the private sector and the Spanish Society for Scientific Information and Documentation (SEDIC), and exploring new ways to use AI to develop innovative services and enhance the efficiency of existing ones.