Journal of Computational Biology and Bioinformatics Research

  • Abbreviation: J. Comput. Biol. Bioinform. Res
  • Language: English
  • ISSN: 2141-2227
  • DOI: 10.5897/JCBBR
  • Start Year: 2009
  • Published Articles: 39

Full Length Research Paper

Prediction of eukaryotic protein subcellular multi-localisation with a combined KNN-SVM ensemble classifier

Liqi Li1, Hong Kuang2, Yuan Zhang1*, Yue Zhou1, Kaifa Wang3 and Ying Wan4
1Department of Orthopaedics, Xinqiao Hospital, Third Military Medical University, Chongqing, China. 2Central Laboratory, 452nd Hospital of Chinese PLA, Chengdu, Sichuan, China. 3Department of Mathematics, Third Military Medical University, Chongqing, China. 4Department of Immunology, Third Military Medical University, Chongqing, China.

  •  Accepted: 15 December 2010
  •  Published: 28 February 2011


Proteins may reside in, or shuttle among, two or more subcellular locations, and this behaviour is closely related to their biological function. Routine methods for predicting eukaryotic protein subcellular localisation handle multiple locations poorly; a reliable, automatic classifier is therefore needed. We propose a new ensemble classifier that combines the KNN (K-nearest neighbour) and SVM (support vector machine) algorithms to predict the subcellular localisation of eukaryotic proteins from their GO (gene ontology) annotations. The method fuses basic individual classifiers through a voting system. The overall prediction accuracies obtained for eukaryotic proteins with the jackknife test and the resubstitution test were 70.5% and 77.6%, respectively. These are significantly higher than those of methods presented in previous studies, indicating that our strategy better predicts eukaryotic protein subcellular localisation.
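The voting fusion and the two evaluation protocols named above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy GO feature vectors, the Hamming distance, the choice of K values and all function names are assumptions made for illustration. To stay dependency-free, only KNN base classifiers are fused; SVM members would cast votes in the same way. The sketch also assigns a single location per protein, so it does not cover the multi-location case.

```python
from collections import Counter

# Toy data (hypothetical): each row marks presence/absence of four GO terms.
TOY_X = [
    [1, 1, 0, 0], [1, 1, 1, 0], [1, 0, 0, 0],   # labelled "nucleus"
    [0, 0, 1, 1], [0, 1, 1, 1], [0, 0, 0, 1],   # labelled "cytoplasm"
]
TOY_Y = ["nucleus"] * 3 + ["cytoplasm"] * 3


def hamming(a, b):
    """Count the positions where two binary GO vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k):
    """Base classifier: majority label among the k nearest training vectors."""
    ranked = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], query))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def ensemble_predict(train_X, train_y, query, ks=(1, 3, 5)):
    """Voting fusion: each base classifier (one per K) casts one vote."""
    votes = Counter(knn_predict(train_X, train_y, query, k) for k in ks)
    return votes.most_common(1)[0][0]


def resubstitution_accuracy(X, y, ks=(1, 3, 5)):
    """Train and test on the same samples (an optimistic estimate)."""
    hits = sum(ensemble_predict(X, y, X[i], ks) == y[i] for i in range(len(X)))
    return hits / len(X)


def jackknife_accuracy(X, y, ks=(1, 3, 5)):
    """Leave each sample out in turn and predict it from the remaining ones."""
    hits = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        hits += ensemble_predict(rest_X, rest_y, X[i], ks) == y[i]
    return hits / len(X)


if __name__ == "__main__":
    print("prediction:", ensemble_predict(TOY_X, TOY_Y, [1, 0, 1, 0]))
    print("resubstitution:", resubstitution_accuracy(TOY_X, TOY_Y))
    print("jackknife:", jackknife_accuracy(TOY_X, TOY_Y))
```

Because the resubstitution test scores each protein with a model that has already seen it, it typically gives the higher of the two figures, which is consistent with the 77.6% versus 70.5% reported above.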


Key words: Gene ontology, multiple subcellular localisation, K-nearest neighbour, support vector machine, ensemble classifier.


Abbreviations: GO, gene ontology; KNN, K-nearest neighbour; SSL, subset subcellular location; SVM, support vector machine.