Educational Research and Reviews

  • Abbreviation: Educ. Res. Rev.
  • Language: English
  • ISSN: 1990-3839
  • DOI: 10.5897/ERR
  • Start Year: 2006

Full Length Research Paper

Identifying patterns for unsupervised learning of multiword terms

José Luis Ochoa, Ángela Almela and Rafael Valencia-García*
Facultad de Informática. Universidad de Murcia 30071, Espinardo Murcia, Spain.
Email: [email protected]

  • Accepted: 20 July 2011
  • Published: 30 September 2011

Abstract

 

The identification of valid terms in a domain is fundamental to its computerization. In this paper we present a method for automatically obtaining morphosyntactic patterns. These patterns help researchers extract valid terms in order to build quality ontologies for translation from one language to another, or to find important concepts in short sentences, which can be used as parameters in question-answering systems. To this end, we first apply statistical methods that yield candidate patterns, collected in a pattern vector. A heuristic process then refines this vector based on two main parameters: the statistical results previously obtained and the length of the pattern analyzed. The result is a collection of the best patterns for detecting real multiword terms.
 
Key words: Morphosyntactic patterns, multiword terms, incremental learning.
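
The two-stage procedure outlined in the abstract (statistical collection of candidate patterns into a pattern vector, then heuristic refinement by statistical score and pattern length) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the toy tagged corpus, the tag names, the function names pattern_vector and refine, the use of raw frequency as the statistic, and the score "count times length" are all assumptions chosen only to make the idea concrete.

```python
from collections import Counter

# Toy POS-tagged corpus. Tokens and tags are illustrative assumptions,
# not data from the paper.
TAGGED = [
    [("the", "DET"), ("neural", "ADJ"), ("network", "NOUN"), ("learns", "VERB")],
    [("a", "DET"), ("neural", "ADJ"), ("network", "NOUN"), ("model", "NOUN")],
    [("the", "DET"), ("language", "NOUN"), ("model", "NOUN"), ("works", "VERB")],
]


def pattern_vector(sentences, min_len=2, max_len=3):
    """Count every POS-tag n-gram whose length is in min_len..max_len.

    Raw frequency stands in here for the statistical measures mentioned
    in the abstract (an assumption).
    """
    counts = Counter()
    for sentence in sentences:
        tags = [tag for _, tag in sentence]
        for n in range(min_len, max_len + 1):
            for i in range(len(tags) - n + 1):
                counts[tuple(tags[i:i + n])] += 1
    return counts


def refine(counts, min_count=2):
    """Hypothetical refinement heuristic using the two parameters named
    in the abstract: the statistical result (count) and the pattern length.

    Patterns seen fewer than min_count times are dropped; the rest are
    ranked by count * length, with ties broken alphabetically.
    """
    scored = {p: c * len(p) for p, c in counts.items() if c >= min_count}
    return sorted(scored.items(), key=lambda item: (-item[1], item[0]))


vector = pattern_vector(TAGGED)
best = refine(vector)

for pattern, score in best:
    print(" ".join(pattern), score)
```

In this sketch, longer patterns are favored when their frequency matches that of shorter ones. A real system would replace the raw count with an association measure and tune the length weighting against a list of known terms.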