PATTERN IDENTIFICATION OF DRUG RESISTANCE FOR TUBERCULOSIS IN PAKISTAN USING MACHINE LEARNING TECHNIQUES
Keywords:
Tuberculosis, Drug Resistance, MDR-TB, Machine Learning, Naïve Bayes, Ensemble Methods, ADASYN, Clinical Data, Pattern Identification
Abstract
Tuberculosis (TB) remains a global health challenge due to the rise of drug-resistant strains, particularly multidrug-resistant TB (MDR-TB). This study applies machine learning to predict drug resistance patterns in TB patients using clinical data from Pakistan. The dataset comprises 400 pre-processed samples with 12 key features, including demographic and drug response data, collected from multiple regions in Pakistan. After preprocessing, we evaluated nine supervised learning algorithms: Multi-Layer Perceptron, Decision Tree (DT), Random Forest (RF), Naïve Bayes (NB), Support Vector Machine, Gradient Boosting, Extreme Gradient Boosting, Logistic Regression, and an ensemble model. Each algorithm was evaluated under three techniques that differ in how class imbalance is handled with the Adaptive Synthetic Sampling (ADASYN) technique: Whole Dataset Imbalanced (Technique 1), Training Dataset Balanced with ADASYN (Technique 2), and Whole Dataset Balanced with ADASYN (Technique 3). NB achieved the highest realistic accuracy, 96.55%, under Technique 2, followed by DT, RF, and the ensemble model at 94.83% each. Under Technique 3, NB reached a peak accuracy of 99.61%, exceeding benchmarks reported in prior literature. These findings highlight the competitive performance of machine learning in the early detection of TB drug resistance, offering a pathway to improve treatment outcomes in resource-limited settings.
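The difference between Technique 2 and Technique 3 is where oversampling happens relative to the train/test split. The sketch below is a minimal illustration under stated assumptions, not the study's implementation: the toy two-feature data, the 80/20 split, the neighbour count `k=5`, and the helper names `adasyn`, `split`, `technique_2`, and `technique_3` are all hypothetical, and the oversampler is a simplified standard-library version of the ADASYN idea rather than a library routine.

```python
import math
import random


def adasyn(X, y, minority=1, k=5, seed=0):
    """Append ADASYN-style synthetic minority rows until classes are roughly balanced."""
    rng = random.Random(seed)
    min_idx = [i for i, label in enumerate(y) if label == minority]
    n_needed = (len(y) - len(min_idx)) - len(min_idx)
    if n_needed <= 0 or len(min_idx) < 2:
        return list(X), list(y)

    def nearest(i, candidates):
        # k closest rows to X[i] among `candidates`, excluding i itself
        others = sorted((j for j in candidates if j != i),
                        key=lambda j: math.dist(X[i], X[j]))
        return others[:k]

    # Difficulty of each minority row: share of majority rows among its neighbours
    ratios = []
    for i in min_idx:
        nbrs = nearest(i, range(len(y)))
        ratios.append(sum(y[j] != minority for j in nbrs) / len(nbrs))
    total = sum(ratios)
    if total > 0:
        weights = [r / total for r in ratios]
    else:
        weights = [1 / len(min_idx)] * len(min_idx)

    X_new, y_new = list(X), list(y)
    for i, w in zip(min_idx, weights):
        min_nbrs = nearest(i, min_idx)
        # Harder rows (higher weight) receive more synthetic neighbours
        for _ in range(round(w * n_needed)):
            j = rng.choice(min_nbrs)
            lam = rng.random()
            X_new.append([a + lam * (b - a) for a, b in zip(X[i], X[j])])
            y_new.append(minority)
    return X_new, y_new


def split(X, y, test_fraction=0.2, seed=0):
    """Shuffle and split into (X_train, y_train, X_test, y_test)."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = len(idx) - int(len(idx) * test_fraction)
    tr, te = idx[:cut], idx[cut:]
    return ([X[i] for i in tr], [y[i] for i in tr],
            [X[i] for i in te], [y[i] for i in te])


def technique_2(X, y):
    # Split first, then balance the training portion only
    X_tr, y_tr, X_te, y_te = split(X, y)
    X_tr, y_tr = adasyn(X_tr, y_tr)
    return X_tr, y_tr, X_te, y_te


def technique_3(X, y):
    # Balance the whole dataset, then split
    X_bal, y_bal = adasyn(X, y)
    return split(X_bal, y_bal)


# Toy imbalanced data (assumption): 80 majority rows, 20 minority rows
_rng = random.Random(42)
X = ([[_rng.gauss(0, 1), _rng.gauss(0, 1)] for _ in range(80)]
     + [[_rng.gauss(2, 1), _rng.gauss(2, 1)] for _ in range(20)])
y = [0] * 80 + [1] * 20

t2 = technique_2(X, y)
t3 = technique_3(X, y)
```

In this sketch, the Technique 2 test set contains only original rows, whereas the Technique 3 test set can include synthetic rows interpolated from samples that also shaped the training data. That is one plausible reason the Technique 3 accuracies run higher and the Technique 2 figures are described as realistic. Technique 1 corresponds to calling `split` alone, with no oversampling.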