Last Wednesday, April 24, at the 10th Hispanic-Mexican Seminar on Library and Information Science, the first evaluation tests of the automated content classification system Resync were presented. The system had been described in earlier research, but its classification algorithms had not yet been evaluated. The algorithm under study classifies 16,000 news articles drawn from a corpus of 400,000 retrieved items. Among the conclusions, the precision achieved exceeds 71% using the Eurovoc thesaurus, which has proven highly useful as a classification vocabulary for highly heterogeneous collections. In addition, the evaluators who assessed classifications in 2012 and 2013 largely agreed on which categories were classified better or worse: the trend lines of their relevance percentages are closely aligned, which suggests that the evaluation is reliable. Finally, similar international studies have consistently reported precision rates of 73%, which places this work very close to those results and suggests that the technology employed is on par with that used by laboratories at other universities and research centers.
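The idea that closely aligned trend lines indicate agreement among evaluators can be made concrete with a correlation coefficient. The following is only a minimal sketch: the choice of Pearson correlation is an assumption (the source does not say how alignment was measured), and the names `evaluator_a`, `evaluator_b` and all percentages are hypothetical values invented for illustration, not data from the paper.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical relevance percentages per category (NOT taken from the paper).
evaluator_a = {"finance": 85.0, "law": 80.0, "politics": 78.0, "culture": 55.0}
evaluator_b = {"finance": 83.0, "law": 82.0, "politics": 75.0, "culture": 58.0}

# Align both series on the same category order before comparing them.
categories = sorted(evaluator_a)
r = pearson(
    [evaluator_a[c] for c in categories],
    [evaluator_b[c] for c in categories],
)
print(f"Pearson r between the two evaluators: {r:.3f}")
```

A value of `r` close to 1 would correspond to trend lines that rise and fall together across categories. Other agreement measures could be used instead.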

Reference

  • BLÁZQUEZ OCHANDO, M. 2013. [Paper]. Evaluation of the automated content classification system Resync in Spanish and Mexican media. In: 10th Hispanic-Mexican Seminar on Library and Information Science. (Madrid, April 22–24).

Abstract

The objective of the research is to evaluate algorithms for automatic content classification, originally designed for the thematic categorization of content and news collected via the Resync platform. The evaluation is carried out using forms specifically designed so that evaluators can judge the relevance of the classified content. The study finds that one of the algorithms achieves a precision of 71%, and that the best-classified themes are those related to finance, law, and politics.
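As an illustration of how relevance judgments collected on evaluation forms can be turned into a precision figure, here is a minimal sketch. It assumes each form entry reduces to a pair of the assigned category and a yes/no relevance judgment. That data layout, the function names, and the sample values are hypothetical and do not come from the paper.

```python
from collections import defaultdict


def precision_by_category(judgments):
    """Share of items judged relevant within each assigned category.

    judgments: iterable of (category, is_relevant) pairs.
    """
    totals = defaultdict(int)
    relevant = defaultdict(int)
    for category, is_relevant in judgments:
        totals[category] += 1
        if is_relevant:
            relevant[category] += 1
    return {category: relevant[category] / totals[category] for category in totals}


def overall_precision(judgments):
    """Share of all classified items that evaluators judged relevant."""
    judgments = list(judgments)
    if not judgments:
        raise ValueError("no judgments to evaluate")
    return sum(1 for _, is_relevant in judgments if is_relevant) / len(judgments)


# Hypothetical form entries (NOT data from the paper).
sample = [
    ("finance", True), ("finance", True),
    ("law", True), ("law", False),
    ("politics", True),
    ("culture", False), ("culture", True),
]

print(f"overall precision: {overall_precision(sample):.1%}")
for category, value in sorted(precision_by_category(sample).items()):
    print(f"{category}: {value:.1%}")
```

Breaking precision down by category is what makes it possible to say which themes are classified better or worse than others.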

Keywords

Automatic classification systems, classification algorithms, algorithm evaluation, information retrieval

Download

Paper. 10o-seminario-hispanomexicano_manuel-blazquez-ochando

Presentation. 10o-seminario-hispanomexicano_manuel-blazquez-ochando