Last Wednesday, April 24, the first evaluation tests of the automated content classification system ReSync were presented at the 10th Hispano-Mexican Seminar on Library and Information Science. ReSync had been described in earlier research, but its classification algorithms had not yet been evaluated. In this test, the algorithm was tasked with classifying 16,000 news articles drawn from a corpus of 400,000 retrieved items. Among the conclusions, precision exceeds 71% using the Eurovoc thesaurus, which proved highly useful as a classification vocabulary for highly heterogeneous collections. The evaluators who assessed the classifications in 2012 and 2013 also largely agreed on which categories were classified better or worse, since the trend lines of their relevance percentages are closely aligned; this suggests that the evaluation is reliable. Finally, analogous international studies have consistently achieved precision rates of 73%, which places this work very close to those results and suggests that the technology is on par with that developed by laboratories at other universities and research centers.
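One way to put a number on how closely two such trend lines align is a Pearson correlation over the per-category relevance percentages. The sketch below is only an illustration: the function name `pearson` and the sample percentages are hypothetical, and the paper itself may compare the 2012 and 2013 evaluations differently.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equally long sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    dev_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    dev_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)

# Hypothetical relevance percentages per category, one list per evaluation year.
relevance_2012 = [82.0, 75.0, 64.0, 51.0]
relevance_2013 = [80.0, 77.0, 60.0, 55.0]

agreement = pearson(relevance_2012, relevance_2013)
```

A value close to 1 would indicate that both groups of evaluators rank the categories in much the same way.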

Reference

  1. Blázquez-Ochando, M. 2013. [Paper]. Evaluation of the ReSync automated content classification system in Spanish and Mexican media. In: 10th Hispano-Mexican Seminar on Library and Information Science. (Madrid, April 22–24). https://files01.core.ac.uk/download/pdf/11890755.pdf

Abstract

The objective of this research is to evaluate the automated content classification algorithms originally designed for the thematic categorization of content and news collected via the ReSync platform. The evaluation relies on questionnaires designed to capture how relevant the evaluators judge each classified item to be. One of the algorithms is found to achieve a precision of 71%. The best-classified themes are those related to finance, law, and politics.
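As a rough illustration of how questionnaire judgments could be turned into a precision figure, the sketch below counts the share of classified items that evaluators marked as relevant, overall and per category. The function names, the category labels and the sample judgments are hypothetical; the paper's actual questionnaire format and computation may differ.

```python
from collections import defaultdict

def precision_by_category(judgments):
    """Return {category: share of items judged relevant}.

    judgments: iterable of (category, is_relevant) pairs, one per evaluated item.
    """
    totals = defaultdict(int)
    relevant = defaultdict(int)
    for category, is_relevant in judgments:
        totals[category] += 1
        if is_relevant:
            relevant[category] += 1
    return {category: relevant[category] / totals[category] for category in totals}

def overall_precision(judgments):
    """Share of all evaluated items that were judged relevant."""
    judgments = list(judgments)
    if not judgments:
        return 0.0
    return sum(1 for _, is_relevant in judgments if is_relevant) / len(judgments)

# Hypothetical evaluator answers: (assigned category, judged relevant?).
sample = [
    ("finance", True), ("finance", True), ("finance", False),
    ("law", True), ("law", True),
    ("culture", False), ("culture", True),
]

per_category = precision_by_category(sample)
overall = overall_precision(sample)
```

Sorting `per_category` by value would show which themes are classified best and worst.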

Keywords

Automatic classification systems, classification algorithms, algorithm evaluation, information retrieval

Download

  1. Paper. paper_10o-evaluacion-sistema-clasificacion-automatica-contenidos-resync-mexico.pdf
  2. Presentation. paper_10o-evaluacion-sistema-clasificacion-automatica-contenidos-resync-mexico.pptx