AI-Driven System for Automated Anomaly Detection in Cloud Through Continuous Monitoring of Logs, Metrics, and Performance Data

Authors

  • Meenakshi Bansal, Assistant Professor, CSE, Yadavindra Department of Engineering, Talwandi Sabo, India

DOI:

https://doi.org/10.48047/k6j4v358

Keywords:

Anomaly Detection, Cloud Operations, Cloud Environments, Performance Metrics

Abstract

The increasing complexity of cloud computing environments calls for robust, automated monitoring systems that ensure high availability and operational efficiency. Traditional manual anomaly detection methods are no longer sufficient because of their limited scalability, high error rates, and delayed response times. This research proposes a machine learning-based anomaly detection system that proactively monitors cloud operations by analyzing real-time streams of logs, system metrics, and performance indicators. The system ingests diverse data sources, including timestamped logs, CPU and memory utilization, network traffic, response times, and error rates, each tagged with a unique resource identifier. It uses both labeled and unlabeled datasets for model training and evaluation. A hybrid approach is adopted using four supervised algorithms: Support Vector Machine (SVM), Random Forest, Deep Neural Network (DNN), and Extreme Gradient Boosting (XGBoost). SVM achieves 97% accuracy on labeled historical data, DNN reaches 99.2% by modeling complex nonlinear patterns, and XGBoost achieves 98.8% through gradient boosting on decision trees. Random Forest attains 100% accuracy in both the labeled and unlabeled scenarios, which the author attributes to the generalization ability of ensemble learning. Detected anomalies are classified by severity and type, and each is assigned a confidence score to guide a timely response, either automated or manual. The framework improves anomaly detection accuracy, reduces false positives, and supports resilient cloud operations. Its modular, scalable architecture is designed for compatibility with existing cloud infrastructures. Future work will focus on real-time adaptability, cross-cloud deployment, and integration with predictive analytics and self-healing protocols to enable autonomous cloud management.
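The abstract states that each detected anomaly is classified by severity and type and given a confidence score, but it does not say how. The sketch below shows one possible triage step, assuming a trained classifier (for example, the Random Forest) already outputs an anomaly probability for each metric record. The record fields, the severity cut-offs (0.9, 0.7, 0.5), and the z-score rule used to pick the anomaly type are hypothetical illustrations, not the paper's method.

```python
from dataclasses import dataclass
from statistics import mean, stdev
from typing import List, Optional

# Hypothetical metric fields, mirroring the data sources listed in the abstract.
FEATURES = ("cpu_pct", "memory_pct", "network_mbps", "response_ms", "error_rate")

# Hypothetical severity cut-offs applied to the classifier's confidence score,
# checked from highest to lowest.
SEVERITY_THRESHOLDS = [(0.9, "critical"), (0.7, "major"), (0.5, "minor")]


@dataclass
class MetricRecord:
    resource_id: str  # unique resource identifier
    timestamp: str
    cpu_pct: float
    memory_pct: float
    network_mbps: float
    response_ms: float
    error_rate: float


@dataclass
class Anomaly:
    resource_id: str
    timestamp: str
    anomaly_type: str  # here: the metric that deviates most from its history
    severity: str
    confidence: float


def dominant_feature(record: MetricRecord, history: List[MetricRecord]) -> str:
    """Return the metric with the largest absolute z-score against this resource's history."""
    best, best_z = "unknown", 0.0
    for name in FEATURES:
        values = [getattr(h, name) for h in history]
        if len(values) < 2:
            continue  # stdev needs at least two samples
        sd = stdev(values)
        if sd == 0:
            continue  # a flat baseline gives no usable z-score
        z = abs(getattr(record, name) - mean(values)) / sd
        if z > best_z:
            best, best_z = name, z
    return best


def triage(record: MetricRecord, history: List[MetricRecord],
           confidence: float) -> Optional[Anomaly]:
    """Map a classifier's anomaly probability to a typed, severity-ranked alert (or None)."""
    for cutoff, severity in SEVERITY_THRESHOLDS:
        if confidence >= cutoff:
            return Anomaly(record.resource_id, record.timestamp,
                           dominant_feature(record, history), severity,
                           round(confidence, 3))
    return None  # below the lowest cut-off: not reported


# Usage with made-up values: a response-time spike on one VM.
history = [
    MetricRecord("vm-42", "t0", 40, 60, 100, 120, 0.010),
    MetricRecord("vm-42", "t1", 42, 61, 102, 125, 0.012),
    MetricRecord("vm-42", "t2", 41, 59, 98, 118, 0.009),
    MetricRecord("vm-42", "t3", 43, 60, 100, 122, 0.011),
]
latest = MetricRecord("vm-42", "t4", 44, 61, 101, 900, 0.012)
# 0.93 stands in for a trained model's predicted anomaly probability.
alert = triage(latest, history, confidence=0.93)
print(alert.severity, alert.anomaly_type)  # critical response_ms
```

In a full pipeline the confidence value would come from the trained model itself, for instance from scikit-learn's `predict_proba` on a Random Forest classifier. Naming the anomaly type after the most-deviating metric is only one simple option; a log-derived or model-derived type label would fit the same structure.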



Published

2024-01-20

How to Cite

Bansal, M. (2024). AI-Driven System for Automated Anomaly Detection in Cloud Through Continuous Monitoring of Logs, Metrics, and Performance Data. Cuestiones De Fisioterapia, 53(1), 333-349. https://doi.org/10.48047/k6j4v358