Natural language processing tasks include summarization, machine translation, question understanding, and part-of-speech tagging. To accomplish these tasks, a proper language representation must be defined. Roots and stems serve as representations in some of these systems, so a word must be processed to extract its root or stem. This paper presents a new technique that extracts word weights by stripping prefixes and suffixes from a given word. The technique is based on a Hidden Markov Model (HMM). A path from the start state to the end state represents a word, and each state along the path corresponds to letters of that word. The states are prefixes, weights, and suffixes. The path selected as best is the one that gives the word the highest likelihood. The approach yields a promising performance of 95%.
Key words: Natural language processing, morphology, hidden Markov model, stem.
Copyright © 2021 Author(s) retain the copyright of this article.
This article is published under the terms of the Creative Commons Attribution License 4.0
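
The idea summarized in the abstract, choosing the most likely path through prefix, weight, and suffix states, can be illustrated with a minimal Viterbi sketch. This is not the authors' trained model: the state names follow the abstract, but every probability below, the Latin-alphabet toy vocabulary, and the helper names `viterbi` and `segment` are assumptions made only for illustration.

```python
import math

# States named after the abstract: a word is read as prefix* weight+ suffix*.
STATES = ("prefix", "weight", "suffix")

# All probabilities below are hypothetical toy values, not trained parameters.
START = {"prefix": 0.4, "weight": 0.6, "suffix": 0.0}
TRANS = {
    "prefix": {"prefix": 0.3, "weight": 0.7, "suffix": 0.0},
    "weight": {"prefix": 0.0, "weight": 0.7, "suffix": 0.3},
    "suffix": {"prefix": 0.0, "weight": 0.0, "suffix": 1.0},
}
FINAL = ("weight", "suffix")  # a word may not end inside a prefix
FLOOR = 1e-6                  # emission probability for unseen letters
LETTERS = "abcdefghijklmnopqrstuvwxyz"
EMIT = {
    "prefix": {"u": 0.5, "n": 0.5},
    "weight": {c: 1 / 26 for c in LETTERS},
    "suffix": {"e": 0.5, "d": 0.5},
}


def log(p):
    """Logarithm that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def emit(state, letter):
    """Log-probability that `state` emits `letter`."""
    return log(EMIT[state].get(letter, FLOOR))


def viterbi(word):
    """Return the most likely state sequence, one state per letter."""
    if not word:
        raise ValueError("word must be non-empty")
    # score[s] = best log-likelihood of any path that ends in state s
    score = {s: log(START[s]) + emit(s, word[0]) for s in STATES}
    back = []
    for letter in word[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            best = max(STATES, key=lambda r: prev[r] + log(TRANS[r][s]))
            score[s] = prev[best] + log(TRANS[best][s]) + emit(s, letter)
            pointers[s] = best
        back.append(pointers)
    # Pick the best allowed final state, then follow the back-pointers.
    path = [max(FINAL, key=lambda s: score[s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


def segment(word):
    """Split a word into (prefix, weight, suffix) along the best path."""
    parts = {s: "" for s in STATES}
    for letter, state in zip(word, viterbi(word)):
        parts[state] += letter
    return parts["prefix"], parts["weight"], parts["suffix"]


print(segment("unlocked"))  # ('un', 'lock', 'ed')
```

Working in log space avoids numerical underflow on long words, and the zero entries in the transition table are what force every path into the prefix, weight, suffix order. In a real system the transition and emission tables would be estimated from annotated data rather than written by hand.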