10th Egerton University International Conference and Research Week

Prediction modelling of academic performance with logistic regression: A case of rural primary school students in Kenya

Mvurya Mgala
  • Institute of Computing and Informatics, Technical University of Mombasa, Kenya
Audrey Mbogho
  • Institute of Computing and Informatics, Technical University of Mombasa, Kenya

  • Article Number - A82C94


Every year, when the Kenya Certificate of Primary Education (KCPE) examination results are released, the same story of mass failure in rural schools is repeated. Academic performance prediction modelling could allow learners' likely outcomes to be known early, before they sit for final examinations. This would be particularly useful for education stakeholders, who could initiate intervention measures to help students who require high intervention to pass final examinations. This study proposed that an academic performance prediction model could be built using Logistic Regression to classify students into two categories: those who will pass and those who will need intervention to pass. A six-step Cross-Industry Standard Process for Data Mining (CRISP-DM) theoretical framework was used to support the modelling process. Modelling was conducted using two datasets collected in Kwale County and Mombasa County. The first dataset had 2426 records with 22 features, collected from 54 rural primary schools. The second dataset had 1105 records with 19 features, collected from 11 peri-urban primary schools. Evaluation was conducted to investigate (i) the prediction performance of Logistic Regression on the two datasets with all the features; and (ii) the prediction performance with an optimal subset of features. Two common performance measures (ROC area and F-Measure) were adopted. The model achieved an ROC area of 88.7% with all features and 88.5% with the optimal feature subset. Similarly, the F-Measure was 89.7% with all features and 89.6% with the optimal feature subset. Further, a mobile application was implemented to facilitate use of the model in rural areas where desktop computers cannot be used. Teachers in 15 schools used the model for two weeks to classify their Class Six and Class Seven students. Results show that nearly 80% of the students requiring high intervention could be identified.
This high prediction performance means that students who need high intervention could be identified early, well before the final examination. Further, this level of prediction accuracy is good enough to motivate stakeholders to initiate strategic intervention measures.
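
To make the evaluation procedure concrete, the following is a minimal sketch of a two-class Logistic Regression model scored with ROC area and F-Measure. It is not the study's implementation: the data are synthetic, the two features (a prior test score and an attendance rate, both scaled to [0, 1]) and the labelling rule are hypothetical, and the F-Measure is computed for the "needs intervention" class only, which may differ from the averaging used in the study. All function names are illustrative.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(w, x):
    # w[0] is the bias; w[1:] are the feature weights.
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.5, epochs=500):
    # Batch gradient descent on the mean log-loss.
    n, d = len(X), len(X[0])
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xij in enumerate(xi):
                grad[j + 1] += err * xij
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def roc_area(y_true, scores):
    # Probability that a random positive outranks a random negative
    # (ties count as half), which equals the area under the ROC curve.
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def f_measure(y_true, y_pred):
    # Harmonic mean of precision and recall for the positive class.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def make_synthetic(n, rng):
    # Hypothetical data: label 1 means "needs intervention to pass".
    X, y = [], []
    for _ in range(n):
        score, attendance = rng.random(), rng.random()
        noisy_total = score + attendance + rng.gauss(0, 0.15)
        X.append([score, attendance])
        y.append(1 if noisy_total < 1.0 else 0)
    return X, y


rng = random.Random(0)
X_train, y_train = make_synthetic(300, rng)
X_test, y_test = make_synthetic(200, rng)

weights = fit_logistic(X_train, y_train)
scores = [predict_proba(weights, x) for x in X_test]
labels = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_area(y_test, scores)
f1 = f_measure(y_test, labels)
print(f"ROC area: {auc:.3f}  F-measure: {f1:.3f}")
```

The numbers this prints describe the synthetic data only and say nothing about the KCPE datasets. Comparing a full feature set against a reduced one, as in the study, would amount to running the same fit and scoring steps twice on different column subsets.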


Key words: Prediction modelling, academic performance, rural schools, prediction performance.