We are pleased to announce our upcoming event on "AI-Assisted Document Analysis," a key gathering to explore how AI is transforming the way we analyze and organize documentary information. This time, we will be joined by Professor Blázquez, who will share his experiences in developing AI-driven analysis tools and methods specifically designed within the context of Documentation Sciences.
- Date and Time: November 29, 2024 / 4:00-6:00 PM
- Location: Conference Room, Faculty of Documentation Sciences, UCM
- Admission: Free, subject to room capacity
Context and Foundations of Document Analysis
The conference begins with a review of the fundamental concepts underpinning documentary processing in information centers. Document analysis—the set of operations and techniques applied to documents to facilitate their organization, preservation, retrieval, and use—has traditionally been the cornerstone of efficient information management in libraries, archives, and documentation centers.
Document Analysis as a Foundation.
The process of extracting and representing the information contained in documents is divided into two complementary aspects: formal analysis, which focuses on the physical characteristics of documents, and content analysis, which centers on their subject matter and meaning. Their ultimate purpose is to facilitate the identification, retrieval, and dissemination of information through tools such as abstracts, keywords, and descriptors.
Access Points.
Identifying the elements that enable the location and retrieval of a document within an information system (such as author, title, subject, date, and place) is one of the most critical aspects of documentary processing. These access points serve as bridges between users’ information needs and the documents to be retrieved; standardizing them according to shared criteria is therefore essential to ensure consistency.
Systematic organization.
Cataloging, classification using documentary languages, authority control, indexing, and the description of physical and content characteristics constitute the network of operations that enable the systematic organization of documents according to their subject matter or features.
Information retrieval.
Within the context of documentary processing, information retrieval is defined as the process of identifying and locating documents relevant to a specific information need. The systems that enable it, including databases, online public access catalogs (OPACs), and specialized search engines, have evolved by incorporating Boolean operators, field-based searching, advanced filters, facets, and controlled vocabularies. Their effectiveness is evaluated in terms of relevance, pertinence, recall (comprehensiveness), and precision.
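As a minimal sketch of how two of these evaluation criteria are usually computed, the example below calculates precision and comprehensiveness (normally called recall in the retrieval literature) for a single query. The document identifiers and the function name precision_recall are hypothetical and were chosen only for illustration; they do not come from the conference materials.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for the results of one query."""
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    # Precision: share of the retrieved documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of the relevant documents that were retrieved.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four documents returned, three known to be relevant.
retrieved = {"doc1", "doc2", "doc3", "doc4"}
relevant = {"doc2", "doc4", "doc7"}

precision, recall = precision_recall(retrieved, relevant)
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=0.50 recall=0.67
```

Two of the four retrieved documents are relevant (precision 0.50), and two of the three relevant documents were found (recall 0.67).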
Dissemination and reuse.
The documentary cycle concludes with the distribution, dissemination, and redistribution of content through various channels (websites, social networks, email, selective dissemination services, mobile applications, podcasts, and video platforms) and its storage in document management systems, digital repositories, databases, or cloud servers. Reuse, understood as the adaptation and repackaging of information for different uses and users, is complemented by measurement and analysis through metrics, traffic evaluation, and user experience analysis.
Evolution of the information professional.
Before the advent of artificial intelligence, the information professional had already undergone a significant transformation: from custodian to facilitator, with a shift in focus toward user service. The profession has been shaped by automation through computer systems, the acquisition of new technological and project management competencies, specialization in profiles such as audiovisual documentalists or digital content managers, and adaptation to new environments such as Web 2.0, 3.0, and 4.0, open access, Big Data, and data mining. It remains in constant evolution, confronting challenges such as information overload, the need for continuous training, and the social recognition of its function.
Applications of AI in the Documentary Analysis of Photography
The second part of the lecture focuses on one of the most innovative areas: document analysis of photographs through artificial intelligence. Current AI models offer capabilities that radically transform how visual collections can be processed.
Image Recognition and Automatic Tagging.
AI enables the detection of objects and scenes, generation of tags and metadata, normalization of descriptors, identification of objects within areas and frames, as well as detailed description of visible elements: primary objects, people, living beings, actions, activities, environment, setting, dominant colors, perceived emotions or atmosphere, relevant details in foreground and background, visible text, artistic style, and historical period or context. Tools such as GroundingDINO (IDEA-Research) exemplify these capabilities, enabling precise identification of visual elements that previously required hours of manual labor.
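The normalization of descriptors mentioned above can be sketched as a lookup against a controlled vocabulary. The example below is a simplified illustration and not the method shown at the conference: the vocabulary entries, the function name normalize_tags, and the sample tags are all hypothetical, and a real workflow would draw on an established thesaurus or authority file.

```python
# Hypothetical mapping from free-text tags to preferred descriptors.
CONTROLLED_VOCABULARY = {
    "cat": "Cats",
    "cats": "Cats",
    "kitten": "Cats",
    "watermelon": "Watermelons",
    "street": "Streets",
}


def normalize_tags(raw_tags: list[str]) -> tuple[list[str], list[str]]:
    """Map raw tags to preferred descriptors; return (descriptors, unmatched)."""
    descriptors: list[str] = []
    unmatched: list[str] = []
    for tag in raw_tags:
        key = tag.strip().lower()  # ignore case and surrounding whitespace
        if not key:
            continue
        preferred = CONTROLLED_VOCABULARY.get(key)
        if preferred is None:
            # Keep unknown tags so a cataloger can review them later.
            if key not in unmatched:
                unmatched.append(key)
        elif preferred not in descriptors:
            descriptors.append(preferred)  # avoid duplicate descriptors
    return descriptors, unmatched


descriptors, unmatched = normalize_tags([" Cat", "kitten", "Watermelon", "sofa"])
print(descriptors)  # ['Cats', 'Watermelons']
print(unmatched)    # ['sofa']
```

Keeping the unmatched tags separate, rather than discarding them, leaves room for human review of terms the vocabulary does not yet cover.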
Facial and Emotion Recognition.
AI can identify individuals and analyze facial expressions, detecting primary emotions—joy, sadness, anger, surprise, fear, disgust, neutral—and estimating their intensity on a scale from 1 to 10. The analysis incorporates facial context, elements such as eyes, eyebrows, mouth, forehead, and cheeks, as well as factors that may influence interpretation, such as lighting, face angle, or objects adjacent to the face (glasses, hats, wigs).
The demonstration includes practical cases of emotion analysis on individual and group faces, with confidence assessments that account for factors such as the presence of makeup that amplifies expressions or the absence of situational context, which may limit analytical depth. A particularly revealing example is the application of these systems in educational settings, such as schools in China that use real-time facial recognition to monitor students’ levels of attention during classes.
Image Restoration and Enhancement.
AI also provides capabilities for image restoration and enhancement: noise removal and damage repair, resolution upscaling, colorization of black-and-white images, improvements in focus, lighting, contrast, color saturation, white balance, and removal of chromatic aberrations. Tools such as Bringing Old Photos Back to Life (Microsoft) and GFPGAN (Practical Algorithms for Real-world Face Restoration) demonstrate the potential of these technologies to recover and preserve visual heritage.
Visual indexing and search.
Content-based visual search enables grouping by similarity, detection of similar or duplicate images, source identification of images, translation of text present in photographs, text indexing, and rights management. Google Lens is a paradigmatic example of these capabilities applied at scale.
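One simple way to illustrate the detection of similar or duplicate images is an average hash compared by Hamming distance. This is a deliberately basic technique chosen for illustration; it is not a description of how Google Lens or any tool shown at the conference works. The sketch assumes each image has already been converted to a small grayscale grid of equal size (here 4x4), and the pixel values are invented.

```python
def average_hash(pixels: list[list[int]]) -> int:
    """Build a bit fingerprint: 1 where a pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(p1: list[list[int]], p2: list[list[int]], max_distance: int = 1) -> bool:
    """Treat two equally sized grids as near duplicates if few bits differ."""
    return hamming_distance(average_hash(p1), average_hash(p2)) <= max_distance


# Invented 4x4 grayscale grids standing in for downscaled photographs.
original = [[10, 10, 200, 200], [10, 10, 200, 200], [200, 200, 10, 10], [200, 200, 10, 10]]
brighter = [[v + 20 for v in row] for row in original]  # same picture, brighter exposure
different = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]  # unrelated layout

print(is_near_duplicate(original, brighter))   # True
print(is_near_duplicate(original, different))  # False
```

Because each pixel is compared with the image's own mean, a uniform change in brightness leaves the fingerprint unchanged, which is why the brighter copy is still matched.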
Automatic generation of descriptions and summaries.
AI systems can generate objective and detailed descriptions of the main visible elements in an image, interpret the scene’s context (location, era, atmosphere), identify objects and subjects, analyze composition (arrangement of elements, color, lighting), synthesize the essence into concise summaries, and suggest possible interpretations or meanings that the image might convey. Concrete examples analyzed during the conference include:
- A photograph of a cat with a slice of watermelon on its head, featuring a description of the scene, the animal’s expression, and an analysis of the composition.
- The winning image from the 2023 World Press Photo competition depicting workers constructing Egypt’s new financial capital, with contextual analysis positioning the scene within a developing country and interpreting it as a symbol of progress and human effort behind large-scale urban projects.
- A photograph of a Peruvian woman with a baby alpaca in the Andes, analyzed as a representation of the relationship between people and nature in rural communities, conveying a message about the preservation of ancestral traditions.
Thematic and chronological classification.
AI can organize photographs by themes or historical events, estimate their geolocation (country, city, region, coordinates, landmarks), identify the main subject and subtopics, perform chronological analysis (historical period, era, elements that help date the image), and contextualize the image socially, politically, and culturally.
An example presented during the conference is the analysis of a photograph of Times Square, where AI identifies elements that help date the image between 2009 and the early 2010s: advertisements for musicals such as Billy Elliot (award-winning in 2009) and Mamma Mia, alongside the presence of brands such as Bank of America, McDonald's, and Kodak. The contextualization addresses the cultural significance of Times Square as "the crossroads of the world," its role as a center of consumerism, and the global influence of American culture.
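The dating logic in this example can be thought of as intersecting the periods during which each detected element could have been visible. The sketch below illustrates that reasoning only; it is not the procedure the AI system follows internally. The clue names and year ranges are placeholders, not verified historical dates.

```python
from typing import Optional


def estimate_date_range(clues: dict[str, tuple[int, int]]) -> Optional[tuple[int, int]]:
    """Intersect the (first_year, last_year) window of every clue.

    Returns None when there are no clues or the windows do not overlap.
    """
    if not clues:
        return None
    earliest = max(start for start, _ in clues.values())
    latest = min(end for _, end in clues.values())
    return (earliest, latest) if earliest <= latest else None


# Placeholder windows for illustration only (not verified dates).
clues = {
    "billboard_a": (2008, 2012),
    "billboard_b": (2001, 2015),
    "brand_signage": (1995, 2011),
}

print(estimate_date_range(clues))  # (2008, 2011)
```

Each additional clue can only narrow the window, so a single mistaken identification can produce an empty or misleading range; this is one reason the estimate still needs a cataloger's review.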
Live Demonstration of Image and Photograph Analysis
The third part of the conference consists of a live practical demonstration showing how the AI system can assist in the detailed analysis of images and photographs. The demonstration is structured into several segments:
From Image to Database.
This segment demonstrates how AI can identify plant species from photographs, including Gazania, Mirabilis jalapa (four o'clock flower), Parthenocissus quinquefolia (Virginia creeper), Hydrangea macrophylla (hydrangea), and Iresine herbstii (blood of Christ), and automatically generate the metadata needed for insertion into databases, including the corresponding SQL code.
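The step from identified species to database rows might look like the following minimal sketch. It uses Python's built-in sqlite3 module with an in-memory database. The table name plant_photo, its columns, and the file names are hypothetical; the schema and SQL generated in the live demonstration are not reproduced here.

```python
import sqlite3

# Hypothetical records, as an AI identification step might return them.
identified_plants = [
    {"scientific_name": "Mirabilis jalapa", "common_name": "four o'clock flower", "source_file": "photo_001.jpg"},
    {"scientific_name": "Hydrangea macrophylla", "common_name": "hydrangea", "source_file": "photo_002.jpg"},
]

connection = sqlite3.connect(":memory:")  # throwaway database for the example
connection.execute(
    """
    CREATE TABLE plant_photo (
        id INTEGER PRIMARY KEY,
        scientific_name TEXT NOT NULL,
        common_name TEXT,
        source_file TEXT NOT NULL
    )
    """
)

# Named placeholders let sqlite3 handle quoting, e.g. the apostrophe in "four o'clock".
connection.executemany(
    "INSERT INTO plant_photo (scientific_name, common_name, source_file) "
    "VALUES (:scientific_name, :common_name, :source_file)",
    identified_plants,
)
connection.commit()

rows = connection.execute(
    "SELECT scientific_name, common_name FROM plant_photo ORDER BY scientific_name"
).fetchall()
for row in rows:
    print(row)
```

Passing values through placeholders, rather than pasting AI-generated text directly into an SQL string, avoids quoting errors and SQL injection when the metadata contains unexpected characters.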
From Film to Catalog.
The demonstration includes the analysis of frames from classic and contemporary films, extracting complete cataloging information:
- Meet John Doe (Frank Capra, 1941): identification of the bar scene, characters (John Doe, bartender, patrons), actors (Gary Cooper), location, actions, themes (radio speech, social impact, political critique), and genre.
- Colt .45 (Edwin L. Marin, 1950): analysis of a scene in the sheriff’s office of an Old West town, featuring Steve Farrell (Randolph Scott) and the local sheriff, with themes of law and order and justice.
- The General (Buster Keaton, 1926): identification of the locomotive, the character Johnnie Gray, and his expression of determination, with themes of courage, ingenuity, and personal perseverance.
- Metropolis (Fritz Lang, 1927): analysis of the scene in Rotwang’s laboratory, featuring the characters Rotwang, Joh Fredersen, and the robot Maria, with themes of industrialization, class struggle, and ethics in technology.
- A Place in the Sun (George Stevens, 1951): analysis of the couple Angela Vickers (Elizabeth Taylor) and George Eastman (Montgomery Clift) in a car, with themes of love, ambition, and internal conflict.
- Mr. Smith Goes to Washington (Frank Capra, 1939): identification of Jefferson Smith’s (James Stewart) filibuster in the Senate, with themes of democracy, idealism, and political corruption.
- The Tenant (Roman Polanski, 1976): analysis of the unsettling atmosphere in the hospital room, with themes of alienation, paranoia, and identity.
- El Vampiro (Fernando Méndez, 1957): identification of Count Lavud (Germán Robles) and Marta (Ariadna Welter) at the Mexican hacienda, with themes of vampirism, mystery, and the struggle between good and evil.
- The Outlaws (Kang Yoon-sung, 2017): analysis of the scene featuring Detective Ma Seok-do (Ma Dong-seok) in the bustling Seoul market, with themes of justice, crime, and urban tension.
Additional recognition tests.
The demonstration includes cases of identification of scientific and contextual images:
- Photograph of North Korea following the death of Kim Jong Il (correct identification).
- Rouleaux formation of red blood cells (not correctly identified at first; after the query was reformulated, it was fully identified within the medical-hematological context).
- Cellular tissue of a leaf (correct identification with context, classification, and characterization).
- Synthetic DNA origami (complete identification of the object, details, technique, and characteristics).
The results are presented in tables showing the initial accuracy percentage, detected errors, improvement after query reformulation, and useful details identified in each case.
Ethical Challenges and Future Perspectives
The conference concludes with a reflection on the ethical challenges posed by the application of AI in the documentary analysis of photographs:
- Accuracy and reliability. Errors in the recognition of people, environments, contexts, and events depend on the level of training of the AI. The quality of the results is directly related to the comprehensiveness and representativeness of the datasets used to train the models.
- Privacy and ethics. Facial recognition of individuals raises serious privacy concerns. Its use for population monitoring may violate fundamental rights. The existence of technologies such as ultra-realistic masks designed to deceive facial recognition systems—such as those created by hackers to bypass controls at borders and airports—demonstrates the tension between technological capabilities and civil rights. Surveillance systems developed by companies such as Urme Surveillance illustrate the scope of these technologies and their potential use in contexts of social control.
- Biases. The training of AI depends on the photographic knowledge base used for its development. Issues such as image rights, comprehensiveness in training across diverse content, algorithmic transparency, and the handling of ideological trends or approaches are fundamental concerns that must be addressed to prevent the perpetuation of biases.
- Automation and Opportunities. The sophistication of these technologies poses risks to employment within the sector, particularly in the context of the Fourth Industrial Revolution and the evolving role of the photographic archivist. However, they also open new opportunities: the development of proprietary technology, the creation of new products and services, enhanced analytical capabilities, and the scientific and professional adaptation to new paradigms.
This conference is part of the activities of the ConocimIA Seminar, a space dedicated to monitoring and analyzing artificial intelligence in the field of Documentation Sciences.
Conference Materials
The materials used in this session are available for download in PDF format. The presentation captures the ideas, references, and practical examples developed throughout the conference and can serve as a starting point for further exploration of the topics covered or for use in educational contexts, provided proper attribution is given.
- Blázquez-Ochando, M. (2024). FotonimIA: Documental Analysis of Photography in the Age of Artificial Intelligence. conocimia_mblazquez_2024-11-29_fotonimia-imagenes.pptx