SEMTEST v2: pedagogical Semantic Web demonstrator

Reference

Blázquez-Ochando, M., & Ovalle-Perandones, M. A. (2026). SEMTEST: Un demostrador pedagógico de enriquecimiento semántico multicapa basado en Linked Open Data para la docencia en Información y Documentación. Infonomy, 4(4). https://doi.org/10.3145/infonomy.26.022

Comment

Teaching semantic technologies in Library and Information Science has always faced a significant hurdle: the gap between theoretical concepts (URI, RDF, SPARQL, OWL) and their actual implementation. Students are familiar with thesauri, subject headings, and authority records, but when they encounter DBpedia or Wikidata, they find interfaces that return results without revealing how those results were obtained. This opacity breaks the natural bridge that should exist between traditional knowledge organization and the Web of Data.

SEMTEST (Semantic Enrichment Test) was created to build that bridge. Version 2, which we have just published, represents a qualitative leap: it moves from being an information retrieval tool to being a pedagogical demonstrator that makes every step of semantic query enrichment visible. Its guiding principle is radical transparency. It shows not only the results but also how they are obtained, including the HTTP requests, the headers, the JSON responses, and the code that processes them. The student sees the machinery from the inside, turning abstraction into experience.

What's new in this version?

The most profound change is technical, but it carries enormous pedagogical implications. The original version relied on HTML scraping of Wikipedia, a fragile and opaque method. Now everything is based on official JSON APIs: Wikipedia, Wikidata, DBpedia, and Open Library. In addition, a direct SPARQL query layer has been added against the DBpedia endpoint, sent with the specific HTTP Accept header that requests SPARQL results in JSON. Small details like this turn theory into tangible practice: the student sees that application/json is not enough and that application/sparql-results+json is required.
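The content-negotiation detail described above can be sketched in a few lines of Python. This is a minimal illustration, not SEMTEST's own code: the function names build_sparql_request and run_sparql, the example query, and the use of the public endpoint https://dbpedia.org/sparql are assumptions made for the example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the public DBpedia endpoint; SEMTEST's configuration may differ.
DBPEDIA_ENDPOINT = "https://dbpedia.org/sparql"
# Media type registered for the SPARQL 1.1 Query Results JSON Format.
SPARQL_JSON = "application/sparql-results+json"


def build_sparql_request(query: str, endpoint: str = DBPEDIA_ENDPOINT) -> Request:
    """Build a GET request that asks for SPARQL results serialized as JSON."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": SPARQL_JSON})


def run_sparql(query: str, timeout: float = 10.0) -> dict:
    """Send the query and decode the JSON result set (needs network access)."""
    with urlopen(build_sparql_request(query), timeout=timeout) as response:
        return json.load(response)


# Illustrative query: the birth date asserted for one resource.
EXAMPLE_QUERY = """
SELECT ?birthDate WHERE {
  <http://dbpedia.org/resource/Pablo_Picasso>
      <http://dbpedia.org/ontology/birthDate> ?birthDate
} LIMIT 1
"""

request = build_sparql_request(EXAMPLE_QUERY)
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Separating the construction of the request from its execution lets a student inspect the URL and headers before anything is sent, which is the same idea the demonstrator applies on screen.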

The demonstrator organizes the process into nine phases that traverse the layers of the Semantic Web architecture: from URI identification, through Wikipedia content extraction, backlinks, Commons image retrieval, canonical URI resolution in DBpedia, RDF property classification, Wikidata querying with semantic grouping of statements, SPARQL execution, bibliographic search in Open Library, and finally the construction of an interactive multi-layer graph.
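To give a feel for what several of these phases look like on the wire, the sketch below builds the kind of request URLs such a pipeline might use. The article does not list the exact endpoints or parameters SEMTEST calls, so every function name here is hypothetical and the parameter choices are assumptions; the services themselves (the MediaWiki Action API, the Wikidata wbsearchentities action, and the Open Library search.json endpoint) are real.

```python
from urllib.parse import quote, urlencode


def wikipedia_extract_url(title: str, lang: str = "en") -> str:
    """MediaWiki Action API call for the plain-text introduction of a page."""
    params = {
        "action": "query",
        "prop": "extracts",
        "exintro": 1,
        "explaintext": 1,
        "titles": title,
        "format": "json",
    }
    return f"https://{lang}.wikipedia.org/w/api.php?{urlencode(params)}"


def wikidata_search_url(term: str, lang: str = "en") -> str:
    """Wikidata entity search by label."""
    params = {
        "action": "wbsearchentities",
        "search": term,
        "language": lang,
        "format": "json",
    }
    return "https://www.wikidata.org/w/api.php?" + urlencode(params)


def dbpedia_resource_uri(title: str) -> str:
    """Naive mapping from a page title to a DBpedia resource URI.

    Assumption: spaces become underscores; redirects are not resolved here.
    """
    return "http://dbpedia.org/resource/" + quote(title.replace(" ", "_"))


def open_library_search_url(term: str) -> str:
    """Open Library full-text search endpoint."""
    return "https://openlibrary.org/search.json?" + urlencode({"q": term})


for build in (wikipedia_extract_url, wikidata_search_url,
              dbpedia_resource_uri, open_library_search_url):
    print(build("Pablo Picasso"))
```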

The classification of RDF properties is one of the most interesting novelties. SEMTEST visually distinguishes between datatype properties (whose values are typed literals, such as dates), object properties (whose values are other URIs), and hierarchical relationships (rdf:type, rdfs:subClassOf). The student can see, for example, that the value of dbo:birthDate is a literal of type xsd:date, while the value of dbo:birthPlace is a link to another resource. This distinction, which in textbooks remains abstract, becomes intuitive when observed in a color-coded graph.
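The three-way classification can be approximated with a very small rule set. The sketch below is an assumption about how such a classifier might work, not the authors' implementation: it takes a predicate URI and an object term in the shape used by SPARQL JSON results (a dict with "type", "value", and optionally "datatype") and returns one of three labels. The function name classify_statement and the label strings are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_SUBCLASSOF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
HIERARCHICAL_PREDICATES = {RDF_TYPE, RDFS_SUBCLASSOF}


def classify_statement(predicate: str, obj: dict) -> str:
    """Label a statement as 'hierarchical', 'datatype', or 'object'."""
    if predicate in HIERARCHICAL_PREDICATES:
        return "hierarchical"
    # Some endpoints report "typed-literal" instead of "literal".
    if obj["type"] in ("literal", "typed-literal"):
        return "datatype"
    # Remaining terms are URIs (or blank nodes): links to other resources.
    return "object"


birth_date = {
    "type": "literal",
    "value": "1881-10-25",
    "datatype": "http://www.w3.org/2001/XMLSchema#date",
}
birth_place = {"type": "uri", "value": "http://dbpedia.org/resource/Málaga"}
person_class = {"type": "uri", "value": "http://dbpedia.org/ontology/Person"}

print(classify_statement("http://dbpedia.org/ontology/birthDate", birth_date))
print(classify_statement("http://dbpedia.org/ontology/birthPlace", birth_place))
print(classify_statement(RDF_TYPE, person_class))
```

Classifying by the shape of the value rather than by a lookup in the ontology keeps the sketch self-contained; a fuller version could consult owl:DatatypeProperty and owl:ObjectProperty declarations instead.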

A graph that speaks for itself

The final visual representation is a knowledge graph with five semantic layers differentiated by color and position: the central entity, the RDFS class hierarchy, SKOS categories, related instances, and Open Library bibliographic resources. Inferred relationships (for example, those derived through rdfs:subClassOf) are drawn with dashed lines, showing the student the difference between what is explicitly asserted and what can be deduced. This distinction is the foundation of ontological reasoning, and seeing it in action on their own query is far more effective than reading about it.
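The difference between asserted and inferred links can be made concrete with the transitivity of rdfs:subClassOf. The following sketch is only an illustration under stated assumptions: the class names use a made-up ex: prefix, and the function names infer_subclass_links and edge_styles are hypothetical. It computes the links that follow from the asserted ones and marks them as dashed, mirroring the visual convention described above.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (subclass, superclass)


def infer_subclass_links(asserted: Set[Edge]) -> Set[Edge]:
    """Return subClassOf links implied by transitivity but not asserted."""
    closure = set(asserted)
    changed = True
    while changed:
        changed = False
        for sub, mid in list(closure):
            for mid2, sup in list(closure):
                if mid == mid2 and sub != sup and (sub, sup) not in closure:
                    closure.add((sub, sup))
                    changed = True
    return closure - asserted


def edge_styles(asserted: Set[Edge]) -> Dict[Edge, str]:
    """Map every edge to 'solid' (asserted) or 'dashed' (inferred)."""
    styles = {edge: "solid" for edge in asserted}
    styles.update({edge: "dashed" for edge in infer_subclass_links(asserted)})
    return styles


# Hypothetical hierarchy, for illustration only.
asserted_links = {
    ("ex:Painter", "ex:Artist"),
    ("ex:Artist", "ex:Person"),
    ("ex:Person", "ex:Agent"),
}

for (sub, sup), style in sorted(edge_styles(asserted_links).items()):
    print(f"{sub} rdfs:subClassOf {sup}  [{style}]")
```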

Moreover, the graph can be exported in Turtle, a standard RDF serialization format, and loaded into tools such as Protégé or Apache Jena. The cycle is complete: the student moves from a natural language query to an RDF dataset that they can explore, query with SPARQL, and visualize in an ontology editor. This is pure experiential learning.
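To show what such an export contains, here is a deliberately small, hand-rolled Turtle writer. It is a sketch under assumptions, not the exporter SEMTEST uses: the helper names typed_literal and to_turtle are hypothetical, subjects, predicates, and objects are expected to be already written as prefixed names, and a real export would normally rely on an RDF library to handle escaping and validation.

```python
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]


def typed_literal(value: str, datatype: str) -> str:
    """Format a typed literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"^^{datatype}'


def to_turtle(triples: Iterable[Triple], prefixes: Dict[str, str]) -> str:
    """Serialize triples as Turtle: prefix declarations, then one triple per line."""
    lines = [f"@prefix {name}: <{iri}> ." for name, iri in prefixes.items()]
    lines.append("")
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines) + "\n"


PREFIXES = {
    "dbo": "http://dbpedia.org/ontology/",
    "dbr": "http://dbpedia.org/resource/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}

TRIPLES = [
    ("dbr:Pablo_Picasso", "dbo:birthDate", typed_literal("1881-10-25", "xsd:date")),
    ("dbr:Pablo_Picasso", "dbo:birthPlace", "dbr:Málaga"),
]

turtle_text = to_turtle(TRIPLES, PREFIXES)
print(turtle_text)
```

The first triple carries a typed literal and the second a link to another resource, so the exported file preserves the same datatype/object distinction that the graph shows with color.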

What does it bring to the classroom?

SEMTEST addresses three pedagogical problems at their root. First, abstraction: terms like "URI" or "object property" become concrete when seen in operation on familiar entities (Picasso, Madrid, Artificial Intelligence). Second, the black box: by showing each request and each response, the student learns to critically evaluate LOD services, understanding their limitations and dynamics. Third, disciplinary disconnection: the tool makes explicit that the broader/narrower terms of a thesaurus correspond to rdfs:subClassOf links in a graph, that authority control materializes in persistent identifiers such as those of VIAF or ISNI, and that subject analysis corresponds to the SKOS layer. This conceptual continuity is its greatest pedagogical value.

Limitations and future directions

The tool depends on the availability of external APIs (DBpedia, Wikidata, Open Library), which can occasionally time out or change the structure of their responses. But the authors turn this limitation into a resource: the student learns that the Semantic Web is a living ecosystem, not a controlled laboratory. As future directions, they plan to add a client-side RDFS reasoning module (using rdflib.js) and an alignment with the UNESCO Thesaurus, to contrast automatic categories with institutional vocabularies.

SEMTEST v2 is not just a tool; it is a teaching philosophy. It makes the invisible visible, connects the traditional with the innovative, and places the student at the center of the process. If you work with the Semantic Web, knowledge organization, or LOD technologies in the classroom, I invite you to try it. The best way to understand it is to type in a term and watch how the journey from the Document Web to the Web of Data unfolds.