Reference

  1. Blázquez Ochando, M. (2014). Proposal for the development of a web metadata system for public administration. Paper presented at the Latin American Symposium on Access to Government Information, Mexico City, Mexico. http://eprints.rclis.org/22968/

Comment

In the current context of e-government and the growing demand for governmental transparency, metadata systems have become fundamental tools for document management, institutional interoperability, and public information access. The paper critically analyzes three existing metadata schemas, AGLS (Australian Government Locator Service), e-EMGDE (Metadata Schema for Electronic Document Management), and Dublin Core Qualified, and uses their strengths and weaknesses as the basis for a new model called GPAM (Government Public Administration Metadata).

Critical Analysis of Existing Metadata Schemas

1. AGLS (Australian Government Locator Service)

The AGLS schema was designed to establish a standardized method for describing Australian public administration documentation. Its key features include:

  1. Semantic foundation: Implements aspects of the Semantic Web through the use of RDF triples, theoretically enabling semantic inferences.
  2. Predefined structure: Uses denotative prefixes that facilitate identification of the application domain (AGLSTERMS, AVAILTERMS, ADMINTERMS, AGENTTERMS).
  3. Relationship with Dublin Core: Incorporates 46 of the 55 Dublin Core Qualified metadata elements, which account for close to half of the 100 elements in AGLS.

From an archival perspective, AGLS offers significant advantages in terms of the capacity for interrelation among resources, with 34 metadata elements specialized in associated documentation. However, its practical application reveals limitations in distinguishing between types of entities (documents, agents, activities) and in manual encoding, despite the clarity of its prefixes.

2. e-EMGDE (Metadata Schema for Electronic Document Management)

Developed as a standardized instrument for the Spanish public administration, e-EMGDE is characterized by:

  1. Archival perspective: It provides a view of the electronic document's life cycle, establishing similarities with the description areas of the ISAD(G) standard.
  2. Hierarchical structure: It organizes its elements and subelements into a structure that allows detailed description of the properties and characteristics of documentary objects.
  3. Specialized coverage: It excels in the description of access and usage conditions, with 29 metadata elements dedicated to this area, as well as in the traceability section, with 14 metadata elements addressing initiating regulations, jurisdiction, and document transfers.

The comparative analysis shows that e-EMGDE surpasses AGLS in total number of metadata elements (117 versus 100) but has significant practical disadvantages. The length of its terms, some of which exceed 50 characters, hinders manual encoding, and its exclusive use of Spanish limits international interoperability.

3. Common Limitations

Both schemas exhibit significant shortcomings from an archival and documentary perspective:


| Dimension | Identified limitation |
| --- | --- |
| Multilevel analysis | Neither schema specifies levels of comprehensiveness in its usage guidelines |
| Editors | Absence of public, official editors to facilitate implementation |
| Retrieval systems | Lack of standardized systems for creating automated search engines and directories |
| Semantic differentiation | Ambiguity in the distinction between the described document and the agents involved in its management |
| Standardization | Absence of predefined templates to guide the selection and combination of metadata according to document typologies |

Theoretical Foundations of the GPAM Proposal

GPAM responds to these limitations with a metadata model that addresses four fundamental questions:

  1. Which elements, objects, and agents must be described
  2. What level of comprehensiveness the description must have
  3. How to design editors and assisted encoding systems
  4. How to implement information retrieval systems

1. Identification of Entities Using Prefixes

One of the most significant contributions of GPAM is the introduction of a prefix system that enables unambiguous identification of the nature of the described element:


| Prefix | Designation | Types of values |
| --- | --- | --- |
| DOCGROUP | Documentary group | Fonds, subfonds, series, subseries, section, simple file, composite file, box, bundle |
| AGENT | Agent | Public or private institution, natural or legal person, body, department, information system |
| ACTIVITY | Activity | Goal, function, activity, action, process |
| FRAMEWORK | Regulatory framework | Constitution, laws (organic, ordinary, regional), statutes, international agreements and treaties |
| TRACE | Traceability | Initiation, editing, transfer, purging, copying, signing, verification, supervision, sealing |

This prefix-based structuring resolves one of the fundamental problems identified in existing schemas: the ambiguity between the described document and the associated agents or activities. By incorporating the prefix into the encoding, the system enables clear discrimination of the type of entity being described.
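The discrimination by prefix described above can be sketched in a few lines of Python. This is only an illustration, not part of the GPAM proposal: the names `GPAM_PREFIXES`, `MetaName`, and `parse_meta_name` are hypothetical, and the sketch assumes that a metadata name always has the form PREFIX.element.

```python
from typing import NamedTuple

# The five GPAM prefixes and their designations, as listed in the prefix table.
GPAM_PREFIXES = {
    "DOCGROUP": "Documentary group",
    "AGENT": "Agent",
    "ACTIVITY": "Activity",
    "FRAMEWORK": "Regulatory framework",
    "TRACE": "Traceability",
}


class MetaName(NamedTuple):
    prefix: str
    element: str


def parse_meta_name(name: str) -> MetaName:
    """Split a name such as 'DOCGROUP.identifier' into prefix and element.

    Hypothetical helper: raises ValueError for an unknown prefix
    or for a name that lacks the PREFIX.element form.
    """
    prefix, sep, element = name.partition(".")
    if not sep or not element:
        raise ValueError(f"expected PREFIX.element, got {name!r}")
    if prefix not in GPAM_PREFIXES:
        raise ValueError(f"unknown GPAM prefix: {prefix!r}")
    return MetaName(prefix, element)


parsed = parse_meta_name("AGENT.fullname")
print(parsed.prefix, "->", GPAM_PREFIXES[parsed.prefix])
```

Because the prefix is part of the name itself, the type of entity can be decided from the tag alone, without inspecting the rest of the record.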

2. Syntactic Combination Model

The GPAM schema proposes a syntactic method that combines identifier prefixes with specific metadata. The basic structure is as follows:

<meta name="PREFIX.metadata" attribute1="value" attribute2="value" />

For example, to describe an administrative file:

<meta name="DOCGROUP.identifier" type="Simple File" code="EXP-2024-001" title="Contracting Services File" value="http://www.administracion.es/expedientes/001" />
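As a rough sketch of how such tags might be generated programmatically, the following Python function assembles a tag from a prefix, a metadata element, and a set of attributes. The function name `gpam_meta` is hypothetical and the code is an illustration of the syntax, not the editor described in the paper.

```python
from html import escape


def gpam_meta(prefix: str, element: str, **attributes: str) -> str:
    """Render one GPAM <meta> tag; names and attribute values are HTML-escaped."""
    parts = [f'name="{escape(prefix)}.{escape(element)}"']
    parts += [f'{key}="{escape(val)}"' for key, val in attributes.items()]
    return "<meta " + " ".join(parts) + " />"


# Reproduces the administrative file example, attribute by attribute.
tag = gpam_meta(
    "DOCGROUP",
    "identifier",
    type="Simple File",
    code="EXP-2024-001",
    title="Contracting Services File",
    value="http://www.administracion.es/expedientes/001",
)
print(tag)
```

Escaping the values keeps the tag well-formed when a title or code contains quotation marks or ampersands.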

This syntactic model yields 240 possible combinations (5 prefixes × 48 metadata elements), more than double the 100 elements of AGLS and the 117 of e-EMGDE, and lets the cataloger select the most appropriate elements for each documentary context.
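The arithmetic behind this comparison can be checked with a short calculation. It uses only the counts reported in this commentary (5 prefixes, 48 GPAM elements, 100 AGLS elements, 117 e-EMGDE elements); the variable names are illustrative.

```python
PREFIXES = 5         # DOCGROUP, AGENT, ACTIVITY, FRAMEWORK, TRACE
GPAM_ELEMENTS = 48   # metadata elements reported for GPAM
combinations = PREFIXES * GPAM_ELEMENTS

# Compare against the element counts reported for the two existing schemas.
for schema, size in {"AGLS": 100, "e-EMGDE": 117}.items():
    print(f"{schema}: {combinations} / {size} = {combinations / size:.2f}x")
```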

3. Metadata Structure

The GPAM proposal organizes its 48 metadata elements into description areas that, while inspired by the ISAD(G) standard, incorporate specific categories for public administration:


| Area | Metadata | Archival application |
| --- | --- | --- |
| Identification | identifier, type, localcontrol, intercontrol | Standardization of national and international codes |
| Title and responsibility | title, othertitle, fullname, relcreator, relcontributor, relauthority | Distinction between author, contributor, and cited authority |
| Dates | datecreated, datefinished, dateupdated, dateapply, daterights | Chronological control of the document lifecycle |
| Content and structure | summary, keywords, description | Description of scope and content |
| Context | conhistoric, constatus, consocial, conspatial, contemporal | Historical, legal, social, and spatial contextualization |
| Physical description | phisholder, phisextent, phisdetails, phisdimensions, phisenclosed | Characterization of medium and format |
| Jurisdiction and valuation | jurisdiction, valuables | Scope of application and documentary values |
| Access conditions | accessconditions, rights, language, signature, security, classaccess | Access and rights control |
| Relationships | relversionprev, relversionnext, relhierasc, relhierdesc, reldocument, relcopy, relsource | Linking between versions, hierarchies, and related documents |
| Control and notes | notes | Additional information |

4. Flexibility and Levels of Detail

The GPAM proposal introduces levels of exhaustiveness that allow the description to be adapted to the needs of each institution. The basic identification metadata, meaning each prefix combined with the identifier element (for example, DOCGROUP.identifier), are mandatory so that every resource is uniquely identified. The remaining metadata are applied according to institutional policies and the characteristics of the documentary fonds.
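A minimal sketch of how the mandatory identifier rule might be checked is given below. It assumes a record is held as a mapping from GPAM names to values, which is a hypothetical representation chosen for illustration; the function name `missing_identifiers` and the sample values are likewise invented.

```python
def missing_identifiers(record: dict[str, str]) -> set[str]:
    """Return the prefixes used in a record that lack a PREFIX.identifier entry.

    `record` maps GPAM names such as 'AGENT.fullname' to values
    (a hypothetical shape, not defined by the proposal).
    """
    prefixes_used = {name.split(".", 1)[0] for name in record}
    return {p for p in prefixes_used if f"{p}.identifier" not in record}


record = {
    "DOCGROUP.identifier": "EXP-2024-001",
    "DOCGROUP.title": "Contracting Services File",
    "AGENT.fullname": "Department of Procurement",  # no AGENT.identifier given
}
print(missing_identifiers(record))
```

All other elements are left unchecked, which mirrors the idea that everything beyond the identifier is optional and governed by institutional policy.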

This flexibility is particularly relevant in the archival domain, where centers with varying technical capabilities and resources coexist, ranging from office archives to historical archives, as well as central archives and administrative libraries.

Contributions from the Archival Perspective

1. Management of the Document Lifecycle

GPAM controls documentary traceability through the TRACE prefix, whose metadata record each intervention on a document throughout its lifecycle. The TRACE.identifier metadata can be repeated once for every procedure carried out within a file, recording the activity, the responsible agent, and the date, which makes it a fundamental tool for the archival management of digital documents.
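The repetition of traceability tags can be illustrated with a short sketch that emits one tag per procedure, using the TRACE prefix in uppercase as in the prefix table. The attribute names `activity`, `agent`, and `date` and the sample values are assumptions made for this example; the commentary does not give the exact attribute names used by the proposal.

```python
from html import escape

# Hypothetical procedures within one file; attribute names are illustrative only.
steps = [
    {"activity": "initiation", "agent": "Registry Office", "date": "2024-01-10"},
    {"activity": "signing", "agent": "Head of Service", "date": "2024-01-15"},
    {"activity": "transfer", "agent": "Central Archive", "date": "2024-03-01"},
]

# One TRACE.identifier tag per procedure, in chronological order.
trace_tags = [
    '<meta name="TRACE.identifier" '
    + " ".join(f'{key}="{escape(val)}"' for key, val in step.items())
    + " />"
    for step in steps
]
print("\n".join(trace_tags))
```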

2. Control of Hierarchies and Organizational-Functional Relationships

The metadata relhierasc and relhierdesc enable the establishment of hierarchical relationships from three perspectives:

  1. Functional: linkage between processes, activities, and functions
  2. Organizational: dependencies between administrative units
  3. Documentary: relationships between documentary groupings (fonds, series, file)

This capability is essential for constructing organic and functional classification schemes, fundamental elements in archival theory for organizing documentary holdings.
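To make the hierarchical use of relhierasc and relhierdesc more concrete, the sketch below models a small documentary hierarchy. The identifiers are invented, the mapping is a hypothetical in-memory representation, and the links are assumed to contain no cycles.

```python
from collections import defaultdict

# Hypothetical records: each identifier points to its parent via relhierasc.
relhierasc = {
    "SERIES-01": "FONDS-A",
    "SERIES-02": "FONDS-A",
    "FILE-001": "SERIES-01",
}

# Derive the descending view (relhierdesc) by inverting the ascending links.
relhierdesc = defaultdict(list)
for child, parent in relhierasc.items():
    relhierdesc[parent].append(child)


def path_to_root(identifier: str) -> list[str]:
    """Follow relhierasc links upward, e.g. file -> series -> fonds."""
    path = [identifier]
    while path[-1] in relhierasc:
        path.append(relhierasc[path[-1]])
    return path


print(path_to_root("FILE-001"))
print(dict(relhierdesc))
```

The same structure would apply to functional or organizational hierarchies by substituting activities or administrative units for the documentary groupings.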

3. Records Appraisal

The metadata element valuables includes attributes to specify the document's values—administrative, legal-judicial, evidential, scientific-technological, historical-testimonial, and informational—providing a standardized basis for appraisal, selection, and purging processes.

4. Semantic Interoperability

The GPAM proposal incorporates elements for working with linked data through attributes such as code and value, which contain URIs and standardized codes. This enables interconnection among authority records, regulatory frameworks, and activities, overcoming the interoperability limitations observed in e-EMGDE.

Technical Implementation

1. Metadata Editor

The GPAM proposal includes the development of an editor that enables assisted metadata encoding. The editor supports multiple description schemas according to:

  1. Cataloging Center: office archive, central archive, historical archive, library, documentation center
  2. Described Entity: administrative unit, fonds, series, file, individual/legal person, family, legal framework, activity

This adaptability is crucial for the effective implementation of the system, as it recognizes that descriptive needs vary according to institutional context and material typology.

2. Collector and Retrieval System

GPAM incorporates a web crawler-based retrieval system that operates in three phases:

  1. Source Registry: a list of URLs where content with GPAM metadata is published
  2. XPath-Based Extraction: filtering of metadata using XPath queries that identify tags and their attributes
  3. Database Storage: indexing and retrieval using systems such as MySQL, PostgreSQL, or Oracle

The proposed extraction code utilizes queries such as:

$metadata01 = $xpath1->query("//meta[@name='DOCGROUP.identifier']");

Varying the name in the query allows the same method to retrieve any of the 240 possible prefix and metadata combinations, so every GPAM tag published on a page can be indexed.
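The three phases can be sketched end to end in Python. This is not the paper's implementation: it swaps XPath for the standard library's `html.parser`, uses an in-memory SQLite table as a stand-in for MySQL, PostgreSQL, or Oracle, and reads a page from a hard-coded string instead of fetching it. The class name `GpamMetaParser`, the table layout, and the sample page are assumptions.

```python
import sqlite3
from html.parser import HTMLParser

GPAM_PREFIXES = ("DOCGROUP.", "AGENT.", "ACTIVITY.", "FRAMEWORK.", "TRACE.")


class GpamMetaParser(HTMLParser):
    """Collect <meta> tags whose name starts with a GPAM prefix."""

    def __init__(self) -> None:
        super().__init__()
        self.found: list[dict[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attributes = {key: value or "" for key, value in attrs}
        if tag == "meta" and attributes.get("name", "").startswith(GPAM_PREFIXES):
            self.found.append(attributes)


# Phase 1: source registry. A URL mapped to already-fetched HTML (hypothetical page),
# so the sketch runs without network access.
sources = {
    "http://www.administracion.es/expedientes/001": """
    <html><head>
      <meta charset="utf-8" />
      <meta name="DOCGROUP.identifier" type="Simple File" code="EXP-2024-001"
            value="http://www.administracion.es/expedientes/001" />
    </head><body></body></html>
    """,
}

# Phase 3 target: an in-memory SQLite table standing in for a production database.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE gpam (url TEXT, name TEXT, attribute TEXT, value TEXT)")

for url, page in sources.items():
    parser = GpamMetaParser()  # Phase 2: extraction of GPAM tags and attributes
    parser.feed(page)
    for meta in parser.found:
        name = meta.pop("name")
        db.executemany(
            "INSERT INTO gpam VALUES (?, ?, ?, ?)",
            [(url, name, attr, val) for attr, val in meta.items()],
        )

rows = db.execute(
    "SELECT name, attribute, value FROM gpam ORDER BY attribute"
).fetchall()
print(rows)
```

Storing one row per attribute keeps the table schema independent of which attributes a given tag carries, at the cost of reassembling a tag with a query over its name and source URL.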

Final Considerations

The GPAM proposal represents a significant contribution to the development of metadata systems for public administration from an integrative perspective that combines:

  1. Theoretical Rigor: grounded in the critical analysis of existing schemas and the principles of archival and documentary science
  2. Practical Applicability: through the development of editors, retrieval systems, and adaptable levels of exhaustiveness
  3. Interoperability: through the use of standardized prefixes, linked-data attributes, and structures that facilitate international translation and adaptation
  4. Sustainability: through the simplification of encoding and the automation of description and retrieval processes

From the perspective of Documentation Sciences, GPAM provides an integrated solution that addresses not only documentary description but also authority management, administrative traceability, control of functional and organizational hierarchies, and documentary appraisal.

For professionals in archives and documentation centers, the system provides concrete tools to enhance the informational transparency of public administrations, facilitating access to government information and ensuring the preservation of electronic documents throughout their life cycle.