The adaptation of Mycobacterium tuberculosis (Mtb) to varied environmental stresses is a fundamental aspect of its pathogenesis and survival. Universal stress proteins (USPs), whose expression is triggered by an array of stressors, have emerged as pivotal players in this adaptive response: they underpin the bacterium's ability to withstand diverse stress conditions and to establish persistent infections in the host. Unraveling the functions and regulatory mechanisms of USPs is therefore of paramount importance for understanding Mtb's pathogenicity and for devising targeted interventions against tuberculosis. This calls for a data-driven, scalable approach that predicts USPs from patterns and features learned from protein sequences and their properties. In this study, we propose a novel approach to predicting Mtb USPs with a Support Vector Machine (SVM) model, aiming to deepen our understanding of the bacterium's stress adaptation mechanisms. Protein sequences annotated as USPs or non-USPs, totalling 5,900 amino acids, were retrieved from the UniProt and NCBI Protein databases over a ten-year period: 3,082 of the amino acids came from 600 non-USPs and 2,818 from 58 USPs. Both classes of data were preprocessed, and feature extraction techniques were applied to transform the raw protein sequences into numerical representations suitable for SVM training. Model performance was evaluated with precision, recall, F1-score and accuracy, and 10-fold cross-validation was used for robust performance assessment.
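The abstract does not specify which sequence features, SVM kernel or software were used. As a minimal sketch of the workflow described above, assuming amino-acid composition (AAC) features and a linear SVM, the standard-library Python code below turns each sequence into a 20-dimensional composition vector, trains a linear SVM with Pegasos-style subgradient descent (a stand-in solver, not necessarily the authors' method), runs stratified k-fold cross-validation (k = 10 by default), and reports precision, recall, F1-score and accuracy with USP as the positive class. All function names (`aac_features`, `train_linear_svm`, `stratified_folds`, `scores`, `cross_validate`) and defaults (`lam=0.01`, `epochs=50`) are hypothetical.

```python
import random

# Hypothetical sketch: AAC features + linear SVM + stratified k-fold CV.
# This is an illustration under stated assumptions, not the authors' pipeline.

# The 20 standard amino acids; other letters (e.g. X, B, Z) are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac_features(seq):
    """Amino-acid composition: the fraction of each standard residue in seq.
    The fractions sum to 1, so a bias b can be folded into the weights
    (w + b) and no separate bias column is needed."""
    seq = seq.upper()
    n = sum(seq.count(a) for a in AMINO_ACIDS) or 1  # avoid division by zero
    return [seq.count(a) / n for a in AMINO_ACIDS]


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Linear SVM via Pegasos-style subgradient descent on the
    L2-regularized hinge loss. Labels must be +1 (USP) or -1 (non-USP)."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(X)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # regularization shrink
            if margin < 1:  # hinge loss active: step towards y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) >= 0 else -1


def stratified_folds(y, k=10, seed=0):
    """Split indices into k folds, dealing each class round-robin so every
    fold keeps roughly the overall USP : non-USP ratio."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in sorted(set(y)):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def scores(y_true, y_pred, positive=1):
    """Precision, recall and F1 for the positive (USP) class, plus accuracy."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


def cross_validate(seqs, labels, k=10, **svm_kwargs):
    """k-fold CV: train on k-1 folds, predict the held-out fold, then score
    the pooled out-of-fold predictions once."""
    X = [aac_features(s) for s in seqs]
    y_pred = [0] * len(labels)
    for test in stratified_folds(labels, k):
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        w = train_linear_svm([X[i] for i in train], [labels[i] for i in train], **svm_kwargs)
        for i in test:
            y_pred[i] = predict(w, X[i])
    return scores(labels, y_pred)
```

Stratified folds keep the roughly 58 : 600 USP to non-USP ratio in every fold, and with that imbalance the USP-class precision, recall and F1-score are more informative than accuracy on its own. Scoring the pooled out-of-fold predictions, rather than averaging per-fold metrics, avoids unstable per-fold precision when a fold yields few predicted USPs.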
The model predicted universal stress protein sequences from Mycobacterium tuberculosis with 83% accuracy, demonstrating the practical value of the SVM model for USP prediction and its potential to support further research on Mtb's stress response mechanisms.
Keywords: Universal Stress Proteins, SVM, Accuracy, Precision, Recall, Cross-validation