IMPACT OF MACHINE LEARNING ALGORITHM CHOICE AND DATA QUALITY ON MODEL ACCURACY
Keywords:
Machine Learning, Algorithm Choice, Data Quality, Model Accuracy, Ensemble Methods, Random Forest, Gradient Boosting, Support Vector Machine (SVM), Decision Tree, Neural Network, Data Preprocessing, Statistical Analysis, ANOVA, Model Optimization
Abstract
This study investigates how the choice of machine learning (ML) algorithm and the quality of the data affect model accuracy. With ML adoption growing across industries such as healthcare, finance, and environmental science, understanding how different algorithms perform under varied data conditions is essential for optimizing model performance. Five widely used ML algorithms are examined: Decision Tree, Random Forest, Support Vector Machine (SVM), Neural Network, and Gradient Boosting. Each is evaluated on five publicly available datasets manipulated to simulate high- and low-quality data conditions. Statistical analyses, including one-way ANOVA, an independent-samples t-test, and two-way ANOVA, show that both algorithm choice and data quality significantly influence model accuracy. The results indicate that ensemble methods such as Random Forest and Gradient Boosting are more robust to poor-quality data than simpler models such as SVM and Decision Trees. The study emphasizes the need for careful algorithm selection and data quality improvement when optimizing ML models, and highlights the critical role of data preprocessing.
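As an illustration of the one-way ANOVA named above, the following is a minimal sketch of how an F statistic could be computed when accuracy scores are grouped by algorithm. The function name `one_way_anova_f` and the input values are hypothetical: the numbers are arbitrary toy values chosen for easy arithmetic, not results from this study, and the sketch is not necessarily the procedure the study used.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups.

    Hypothetical helper for illustration only. Each inner list holds the
    scores observed for one group (for example, one algorithm).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need at least two groups and more observations than groups")

    grand_mean = sum(sum(g) for g in groups) / n_total

    # Between-group sum of squares: spread of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of each score around its own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Arbitrary toy scores for three hypothetical groups (not data from the study).
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")  # prints "F(2, 6) = 13.00"
```

A large F value means the group means differ by more than the within-group scatter would suggest. Turning F into a p-value requires the F distribution, which a statistics package would normally supply.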