International Journal of Physical Sciences

  • Abbreviation: Int. J. Phys. Sci.
  • Language: English
  • ISSN: 1992-1950
  • DOI: 10.5897/IJPS
  • Start Year: 2006
  • Published Articles: 2557

Full Length Research Paper

Text clustering on latent semantic indexing with particle swarm optimization (PSO) algorithm

Eisa Hasanzadeh1, Morteza Poyan rad2* and Hamid Alinejad Rokny3
1Electrical and Computer Engineering Faculty, Qazvin Islamic Azad University, Qazvin, Iran.
2Electrical and Computer Engineering Faculty, Qazvin Islamic Azad University, Member of Young Research Club, Qazvin, Iran.
3Department of Computer Engineering, Science and Research Branch, Islamic Azad University, Tehran, Iran.
Email: [email protected] or [email protected]

  •  Accepted: 16 November 2011
  •  Published: 02 January 2012


Most web users rely on search engines to find specific information. A key factor in the success of a web search engine is its ability to rapidly return good-quality results for queries based on specific terms. This paper aims to retrieve more relevant documents from a huge corpus for a given information need. We propose a particle swarm optimization algorithm based on latent semantic indexing (PSO+LSI) for text clustering. The PSO family of bio-inspired algorithms has recently been applied successfully to a number of real-world clustering problems. We use an adaptive inertia weight (AIW) that balances exploration and exploitation in the search space. PSO can be combined with LSI to achieve the best clustering accuracy and efficiency. This framework provides more relevant documents to the user and reduces the number of irrelevant ones. The results show that, for all numbers of dimensions, PSO+LSI is faster than the PSO+K-means algorithm using the vector space model (VSM). With 1000 terms, the PSO+LSI method takes 22.3 s to reach its best performance, at 150 dimensions.
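
The pipeline summarized above (project documents into a low-dimensional LSI space, then search for cluster centroids with PSO) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the fitness measure (mean distance to the nearest centroid), the parameter values, and in particular the linearly decreasing inertia weight are all assumptions made for illustration, since the abstract does not give the AIW update rule.

```python
import numpy as np

def lsi_project(X, k):
    """Project a documents-by-terms matrix onto its top-k singular directions."""
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :k] * s[:k]

def _distances(docs, centroids):
    # Euclidean distance from every document to every centroid.
    return np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)

def fitness(docs, centroids):
    """Mean distance of each document to its nearest centroid (lower is better)."""
    return _distances(docs, centroids).min(axis=1).mean()

def pso_cluster(docs, n_clusters, n_particles=20, n_iter=100,
                w_max=0.9, w_min=0.4, c1=1.5, c2=1.5, seed=0):
    """Search for centroids with PSO; each particle encodes a full centroid set."""
    rng = np.random.default_rng(seed)
    n_docs = docs.shape[0]
    # Initialise every particle's centroids from randomly chosen documents.
    pos = np.stack([docs[rng.choice(n_docs, n_clusters, replace=False)]
                    for _ in range(n_particles)])
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_fit = np.array([fitness(docs, p) for p in pos])
    g = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    for t in range(n_iter):
        # ASSUMPTION: a linearly decreasing inertia weight stands in for the
        # paper's adaptive inertia weight, whose formula is not in the abstract.
        w = w_max - (w_max - w_min) * t / max(n_iter - 1, 1)
        r1 = rng.random(pos.shape)
        r2 = rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel

        fit = np.array([fitness(docs, p) for p in pos])
        improved = fit < pbest_fit
        pbest[improved] = pos[improved]
        pbest_fit[improved] = fit[improved]
        g = pbest_fit.argmin()
        if pbest_fit[g] < gbest_fit:
            gbest, gbest_fit = pbest[g].copy(), pbest_fit[g]

    labels = _distances(docs, gbest).argmin(axis=1)
    return gbest, labels, gbest_fit
```

A call such as `pso_cluster(lsi_project(X, 150), n_clusters)` would cluster a documents-by-terms matrix `X` in a 150-dimensional LSI space; a large inertia weight early on favours exploration, and a small one later favours exploitation around the best centroids found so far.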


Key words: Vector space model, particle swarm optimization (PSO) algorithm, latent semantic indexing, text clustering, adaptive inertia weight.