Reference
- Blázquez Ochando, M. (2014). Proposal for the development of a web metadata system for public administration. Paper presented at the Latin American Symposium on Access to Government Information, Mexico City, Mexico. http://eprints.rclis.org/22968/
Comment
In the current context of e-government and the growing demand for governmental transparency, metadata systems have become fundamental tools for document management, institutional interoperability, and public information access. The research presents a critical analysis of major existing metadata schemas—AGLS (Australian Government Locator Service), e-EMGDE (Metadata Schema for Electronic Document Management), and Dublin Core Qualified—to propose, based on their strengths and weaknesses, a new model called GPAM (Government Public Administration Metadata).
Critical Analysis of Existing Metadata Schemas
1. AGLS (Australian Government Locator Service)
The AGLS schema was designed to establish a standardized method for describing Australian public administration documentation. Its key features include:
- Semantic foundation: Implements aspects of the Semantic Web through the use of RDF triples, theoretically enabling semantic inferences.
- Predefined structure: Uses denotative prefixes that facilitate identification of the application domain (AGLSTERMS, AVAILTERMS, ADMINTERMS, AGENTTERMS).
- Relationship with Dublin Core: Incorporates 46 of the 55 Dublin Core Qualified metadata elements, which account for roughly half of the AGLS element set (46 of 100).
From an archival perspective, AGLS offers significant advantages in terms of the capacity for interrelation among resources, with 34 metadata elements specialized in associated documentation. However, its practical application reveals limitations in distinguishing between types of entities (documents, agents, activities) and in manual encoding, despite the clarity of its prefixes.
2. e-EMGDE (Metadata Schema for Electronic Document Management)
Developed as a standardized instrument for the Spanish public administration, e-EMGDE is characterized by:
- Archival perspective: It provides a view of the electronic document's life cycle, establishing similarities with the description areas of the ISAD(G) standard.
- Hierarchical structure: It organizes its elements and subelements into a structure that allows detailed description of the properties and characteristics of documentary objects.
- Specialized coverage: It excels in the description of access and usage conditions, with 29 metadata elements dedicated to this area, as well as in the traceability section, with 14 metadata elements addressing initiating regulations, jurisdiction, and document transfers.
The comparative analysis reveals that e-EMGDE surpasses AGLS in total number of metadata elements (117 versus 100), but presents significant practical disadvantages. The length of its terms—some of which exceed 50 characters—hinders manual coding, and its exclusive use of Spanish limits its international interoperability.
3. Common Limitations
Both schemas exhibit significant shortcomings from an archival and documentary perspective:
| Dimension | Identified limitation |
|---|---|
| Multilevel analysis | Neither schema specifies levels of comprehensiveness in its usage guidelines |
| Editors | Absence of public, official metadata editors to facilitate implementation |
| Retrieval systems | Lack of standardized systems for creating automated search engines and directories |
| Semantic differentiation | Ambiguity in the distinction between the described document and the agents involved in its management |
| Standardization | Absence of predefined templates to guide the selection and combination of metadata according to document typologies |
Theoretical Foundations of the GPAM Proposal
The GPAM proposal emerges as a response to the identified limitations, proposing a metadata model that addresses four fundamental issues:
- Which elements, objects, and agents must be described
- What level of comprehensiveness the description must have
- How to design editors and assisted encoding systems
- How to implement information retrieval systems
1. Identification of entities using prefixes
One of the most significant contributions of GPAM is the introduction of a prefix system that enables unambiguous identification of the nature of the described element:
| Prefix | Designation | Value types |
|---|---|---|
| DOCGROUP | Documentary group | Fonds, subfonds, series, subseries, section, simple file, composite file, box, bundle |
| AGENT | Agent | Public or private institution, natural or legal person, body, department, information system |
| ACTIVITY | Activity | Goal, function, activity, action, process |
| FRAMEWORK | Regulatory framework | Constitution, laws (organic, ordinary, regional), statutes, international agreements and treaties |
| TRACE | Traceability | Initiation, editing, transfer, purging, copying, signing, verification, supervision, sealing |
This prefix-based structuring resolves one of the fundamental problems identified in existing schemas: the ambiguity between the described document and the associated agents or activities. By incorporating the prefix into the encoding, the system enables clear discrimination of the type of entity being described.
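The prefix table can be pictured as a small lookup structure. The following Python sketch is only an illustration and not part of the proposal: the `EntityPrefix` enum and the `entity_of` helper are hypothetical names, and the lower-case dotted spelling is assumed from the trace.identifier form that appears later in this comment.

```python
from enum import Enum


class EntityPrefix(Enum):
    """The five GPAM entity prefixes and their designations (from the table above)."""
    DOCGROUP = "Documentary group"
    AGENT = "Agent"
    ACTIVITY = "Activity"
    FRAMEWORK = "Regulatory framework"
    TRACE = "Traceability"


def entity_of(element_name: str) -> EntityPrefix:
    """Return the entity type encoded in a name such as 'agent.fullname'.

    Assumes names are written as '<prefix>.<metadata>' (an assumption of this
    sketch); raises ValueError for anything else.
    """
    prefix, sep, metadata = element_name.partition(".")
    if not sep or not metadata:
        raise ValueError(f"not a prefixed element: {element_name!r}")
    try:
        return EntityPrefix[prefix.upper()]
    except KeyError:
        raise ValueError(f"unknown prefix: {prefix!r}") from None


print(entity_of("agent.fullname").value)  # Agent
```

Because the entity type is read from the name itself, a consumer does not need to inspect the value to know whether it describes a document, an agent, or an activity.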
2. Syntactic Combination Model
The GPAM schema proposes a syntactic method that joins an identifier prefix to a specific metadata element, in the dotted form seen in trace.identifier. An administrative file, for instance, would be described with elements carrying the DOCGROUP prefix.
This syntactic model generates 240 possible combinations (5 prefixes × 48 metadata elements), more than double the 100 elements of AGLS or the 117 of e-EMGDE, and lets the cataloger select the most appropriate elements for each documentary context.
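The arithmetic behind the 240 figure can be checked with a short sketch. It assumes the `<prefix>.<element>` spelling suggested by trace.identifier; the `combinations` function is a hypothetical helper, and only four of the 48 elements are listed to keep the example short.

```python
from itertools import product

PREFIXES = ["docgroup", "agent", "activity", "framework", "trace"]

# Excerpt only: four of the 48 metadata elements, chosen for brevity.
SAMPLE_ELEMENTS = ["identifier", "title", "datecreated", "notes"]

TOTAL_ELEMENTS = 48  # figure stated in the proposal


def combinations(prefixes, elements):
    """Yield every '<prefix>.<element>' name (assumed dotted syntax)."""
    for prefix, element in product(prefixes, elements):
        yield f"{prefix}.{element}"


sample = list(combinations(PREFIXES, SAMPLE_ELEMENTS))
print(sample[:2])                      # ['docgroup.identifier', 'docgroup.title']
print(len(sample))                     # 20 names from the 4-element excerpt
print(len(PREFIXES) * TOTAL_ELEMENTS)  # 240, the total quoted above
```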
3. Metadata Structure
The GPAM proposal organizes its 48 metadata elements into description areas that, while inspired by the ISAD(G) standard, incorporate specific categories for public administration:
| Area | Metadata | Archival application |
|---|---|---|
| Identification | identifier, type, localcontrol, intercontrol | Standardization of national and international codes |
| Title and responsibility | title, othertitle, fullname, relcreator, relcontributor, relauthority | Distinction between author, contributor, and cited authority |
| Dates | datecreated, datefinished, dateupdated, dateapply, daterights | Chronological control of the document lifecycle |
| Content and structure | summary, keywords, description | Description of scope and content |
| Context | conhistoric, constatus, consocial, conspatial, contemporal | Historical, legal, social, and spatial contextualization |
| Physical description | phisholder, phisextent, phisdetails, phisdimensions, phisenclosed | Characterization of medium and format |
| Jurisdiction and valuation | jurisdiction, valuables | Scope of application and documentary values |
| Access conditions | accessconditions, rights, language, signature, security, classaccess | Access and rights control |
| Relationships | relversionprev, relversionnext, relhierasc, relhierdesc, reldocument, relcopy, relsource | Linking between versions, hierarchies, and related documents |
| Control and notes | notes | Additional information |
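One way to put the area grouping to work, for instance to arrange fields in an editor, is an inverted lookup from element to area. The sketch below copies three areas from the table as an excerpt; the `AREAS` dictionary and `area_of` function are hypothetical names, not part of GPAM.

```python
# Excerpt of the description areas, copied from the table above.
AREAS = {
    "Identification": ["identifier", "type", "localcontrol", "intercontrol"],
    "Dates": ["datecreated", "datefinished", "dateupdated", "dateapply", "daterights"],
    "Relationships": [
        "relversionprev", "relversionnext", "relhierasc", "relhierdesc",
        "reldocument", "relcopy", "relsource",
    ],
}

# Inverted index: element name -> area name.
AREA_OF = {element: area for area, elements in AREAS.items() for element in elements}


def area_of(name):
    """Return the area of a bare or prefixed element name, or None if not in the excerpt."""
    element = name.rpartition(".")[2]  # drops an optional '<prefix>.' part
    return AREA_OF.get(element)


print(area_of("docgroup.datecreated"))  # Dates
print(area_of("relhierasc"))            # Relationships
```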
4. Flexibility and Levels of Detail
The GPAM proposal introduces levels of exhaustiveness that allow the description to be adapted to the needs of each institution. Only the basic identification metadata, that is, a prefix combined with the identifier element, are mandatory, since they guarantee the unique identification of the resource. The remaining metadata are applied according to institutional policies and the characteristics of the documentary fonds.
This flexibility is particularly relevant in the archival domain, where centers with varying technical capabilities and resources coexist, ranging from office archives to historical archives, as well as central archives and administrative libraries.
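The mandatory-identifier rule lends itself to a simple automated check. The sketch below rests on one reading of that rule, namely that every prefix used in a record needs its own `<prefix>.identifier` entry; the record layout, the placeholder values, and the `missing_identifiers` function are assumptions made for illustration.

```python
GPAM_PREFIXES = {"docgroup", "agent", "activity", "framework", "trace"}


def missing_identifiers(record):
    """Return the set of prefixes used in `record` that lack an identifier.

    `record` maps prefixed names (e.g. 'docgroup.title') to a list of values.
    """
    used = {name.partition(".")[0] for name in record} & GPAM_PREFIXES
    return {prefix for prefix in used if not record.get(f"{prefix}.identifier")}


record = {
    "docgroup.identifier": ["EXAMPLE-001"],   # placeholder value
    "docgroup.title": ["Sample administrative file"],
    "agent.fullname": ["Sample department"],  # no agent.identifier supplied
}
print(missing_identifiers(record))  # {'agent'}
```

All other elements stay optional in this check, which mirrors the idea that each institution decides how exhaustive its descriptions should be.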
Contributions from the Archival Perspective
1. Management of the Document Lifecycle
GPAM incorporates specific elements for controlling documentary traceability through the TRACE prefix and metadata that enable recording each intervention on a document throughout its lifecycle. The ability to repeat the trace.identifier metadata as many times as procedures occur within a file—recording the activity, the responsible agent, and the date—constitutes a fundamental tool for the archival management of digital documents.
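A repeated trace entry can be modeled as a list of small records, one per procedure. The sketch below is a hypothetical in-memory representation, not GPAM's encoding: the `TraceEvent` class, its field names, and the sample values are all illustrative.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TraceEvent:
    """One repetition of trace.identifier with its accompanying details."""
    identifier: str  # value of trace.identifier
    activity: str    # e.g. "signing", one of the TRACE values listed earlier
    agent: str       # responsible agent
    when: date       # date of the intervention


def chronology(events):
    """Return the events ordered by date, oldest first."""
    return sorted(events, key=lambda event: event.when)


events = [
    TraceEvent("T-2", "signing", "Sample department", date(2014, 3, 10)),
    TraceEvent("T-1", "initiation", "Sample registry", date(2014, 3, 1)),
]
print([event.activity for event in chronology(events)])  # ['initiation', 'signing']
```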
2. Control of Hierarchies and Organizational-Functional Relationships
The metadata relhierasc and relhierdesc enable the establishment of hierarchical relationships from three perspectives:
- Functional: linkage between processes, activities, and functions
- Organizational: dependencies between administrative units
- Documentary: relationships between documentary groupings (fonds, series, file)
This capability is essential for constructing organic and functional classification schemes, fundamental elements in archival theory for organizing documentary holdings.
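To illustrate how ascending links could yield a classification path, the sketch below walks relhierasc references upward from a file to its fonds. It assumes that relhierasc holds the identifier of a single parent, which the source does not state; the `ancestors` function and the sample identifiers are hypothetical.

```python
def ancestors(records, identifier):
    """Follow relhierasc links upward and return the chain of parent identifiers.

    `records` maps an identifier to a dict that may contain a 'relhierasc' key.
    A `seen` set guards against accidental cycles in the data.
    """
    chain = []
    seen = {identifier}
    current = records[identifier].get("relhierasc")
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = records.get(current, {}).get("relhierasc")
    return chain


records = {
    "fonds-1": {},
    "series-1": {"relhierasc": "fonds-1"},
    "file-1": {"relhierasc": "series-1"},
}
print(ancestors(records, "file-1"))  # ['series-1', 'fonds-1']
```

The same traversal would apply to organizational or functional hierarchies, since only the identifiers change.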
3. Records Appraisal
The metadata element valuables includes attributes to specify the document's values—administrative, legal-judicial, evidential, scientific-technological, historical-testimonial, and informational—providing a standardized basis for appraisal, selection, and purging processes.
4. Semantic Interoperability
The GPAM proposal incorporates elements for working with linked data through attributes such as code and value, which contain URIs and standardized codes. This enables interconnection among authority records, regulatory frameworks, and activities, overcoming the interoperability limitations observed in e-EMGDE.
Technical Implementation
1. Metadata Editor
The GPAM proposal includes the development of an editor that enables assisted metadata encoding. The editor supports multiple description schemas according to:
- Cataloging Center: office archive, central archive, historical archive, library, documentation center
- Described Entity: administrative unit, fonds, series, file, individual/legal person, family, legal framework, activity
This adaptability is crucial for the effective implementation of the system, as it recognizes that descriptive needs vary according to institutional context and material typology.
2. Collector and Retrieval System
GPAM incorporates a web crawler-based retrieval system that operates in three phases:
- Source Registry: a list of URLs where content with GPAM metadata is published
- XPath-Based Extraction: filtering of metadata using XPath queries that identify tags and their attributes
- Database Storage: indexing and retrieval using systems such as MySQL, PostgreSQL, or Oracle
The proposed extraction code relies on XPath queries that select GPAM tags by name and attribute.
This method enables the retrieval of all 240 possible metadata combinations, ensuring complete indexing.
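The three phases can be sketched with the Python standard library alone. This is not the author's extraction code: it assumes, without confirmation from the source, that GPAM metadata is published as `<meta name="..." content="...">` tags in well-formed markup; it uses the XPath subset supported by `xml.etree.ElementTree`; and it substitutes an in-memory SQLite database for MySQL, PostgreSQL, or Oracle. The sample page, URL, and function names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

PREFIXES = ("docgroup", "agent", "activity", "framework", "trace")

# Hypothetical, well-formed sample page; real pages would come from the source registry.
SAMPLE_PAGE = """<html><head>
<meta name="docgroup.identifier" content="EXAMPLE-001"/>
<meta name="docgroup.title" content="Sample file"/>
<meta name="trace.identifier" content="EXAMPLE-T1"/>
<meta name="viewport" content="width=device-width"/>
</head><body/></html>"""


def extract(page_xml):
    """Return (name, content) pairs for meta tags whose name carries a GPAM prefix."""
    root = ET.fromstring(page_xml)
    pairs = []
    # Selects every <meta> element that has both a name and a content attribute.
    for meta in root.findall(".//meta[@name][@content]"):
        name = meta.get("name")
        prefix, sep, _ = name.partition(".")
        if sep and prefix in PREFIXES:
            pairs.append((name, meta.get("content")))
    return pairs


def store(conn, url, pairs):
    """Insert the extracted pairs into a simple three-column table."""
    conn.execute("CREATE TABLE IF NOT EXISTS gpam (url TEXT, name TEXT, content TEXT)")
    conn.executemany("INSERT INTO gpam VALUES (?, ?, ?)", [(url, n, c) for n, c in pairs])
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a production database
store(conn, "https://example.org/page", extract(SAMPLE_PAGE))
print(conn.execute("SELECT COUNT(*) FROM gpam").fetchone()[0])  # 3
```

Filtering on the prefix rather than on a fixed list of names means any prefix and element pairing is collected, provided the page uses the assumed encoding.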
Final Considerations
The GPAM proposal represents a significant contribution to the development of metadata systems for public administration from an integrative perspective that combines:
- Theoretical Rigor: grounded in the critical analysis of existing schemas and the principles of archival and documentary science
- Practical Applicability: through the development of editors, retrieval systems, and adaptable levels of exhaustiveness
- Interoperability: through the use of standardized prefixes, linked-data attributes, and structures that facilitate international translation and adaptation
- Sustainability: through the simplification of encoding and the automation of description and retrieval processes
From the perspective of Documentation Sciences, GPAM provides an integrated solution that addresses not only documentary description but also authority management, administrative traceability, control of functional and organizational hierarchies, and documentary appraisal.
For professionals in archives and documentation centers, the system provides concrete tools to enhance the informational transparency of public administrations, facilitating access to government information and ensuring the preservation of electronic documents throughout their life cycle.