AI-POWERED ANOMALY DETECTION IN SOFTWARE LOGS: A MACHINE LEARNING APPROACH FOR PROACTIVE FAULT DIAGNOSIS AND SELF-HEALING SYSTEMS

Taib Ali; Rizwan Iqbal; Nadia Mustaqim Ansari; Talha Tariq; Adnan Ahmed Rafique

Authors

Taib Ali
Rizwan Iqbal
Nadia Mustaqim Ansari
Talha Tariq
Adnan Ahmed Rafique

Keywords:

AI-powered anomaly detection, software logs, machine learning, deep learning, self-healing systems, log analysis, LSTM, Transformer models, contrastive learning, proactive fault diagnosis, reinforcement learning, explainable AI, system resilience, automated fault recovery, real-time anomaly detection

Abstract

Due to the complexity of modern software systems the amount of logs generated to assist with monitoring and fault diagnosing has become way too large for manual processing. This paper aims at developing the architecture for identifying anomalous patterns in the software log files through the application of advanced machine learning and deep learning algorithms towards fault diagnosis for self- healing systems. Traditional rule based approaches cannot fit the modern complex scenarios as well as the large amounts of data that are produced in the form of logs. Machine learning approaches, including deep learning structures like LSTMs and Transforms, are more effective at detecting anomalies due to their ability to capture contextual dependencies inherent in log sequences. It also offers resolutions to the problem of a scarcity of labeled data in the utilization of self-supervised learning approaches, along with contrastive learning. Additionally, self-C damaged control mechanisms based on reinforcement learning as well as a rule and based automation recurrently correct faults decreasing the non-availability of the system. Several models are assessed on log datasets with different evaluation metrics such as precision, recall, F1-score, and AUC-ROC. The results when testing suggest that Transformer-based models yield the best performance as compared to other conventional machine learning methods while at the same time requiring more computational resources. Self-healing systems cut down on downtime by as much as 68.2 percent; such characteristics make AI promising for strengthening system performance. That is why some issues, like model interpretability, high computation costs, and real-time processing, are still present. Mitigating these challenges by employing lightweight deep learning models, explainable AI methods, and the ability to deploy these algorithms at scale will be instrumental in advancing the use of AI-based anomaly detection and self-healing systems in safety- critical software applications. This work presents a state-of-the-art review of AI- based log anomaly detection methods and discusses potential research directions for improving scalability, interpretability, and practicality in real-world applications.