AI-Driven System for Automated Anomaly Detection in Cloud Through Continuous Monitoring of Logs, Metrics, and Performance Data
DOI:
https://doi.org/10.48047/k6j4v358
Keywords:
Anomaly Detection, Cloud Operations, Cloud Environments, Performance Metrics
Abstract
The increasing complexity of cloud computing environments calls for robust, automated monitoring to maintain high availability and operational efficiency. Manual anomaly detection no longer suffices: it scales poorly, is error-prone, and responds slowly. This research proposes a machine learning-based anomaly detection system that proactively monitors cloud operations by analyzing real-time streams of logs, system metrics, and performance indicators. The system ingests diverse data sources, including timestamped logs, CPU and memory utilization, network traffic, response times, and error rates, each tagged with a unique resource identifier. Both labeled and unlabeled datasets are used for model training and evaluation. A hybrid approach combines four supervised algorithms: Support Vector Machine (SVM), Random Forest, Deep Neural Network (DNN), and Extreme Gradient Boosting (XGBoost). SVM achieves 97% accuracy on labeled historical data, DNN reaches 99.2% by modeling complex nonlinear patterns, and XGBoost achieves 98.8% through gradient boosting on decision trees. Random Forest attains 100% accuracy in both the labeled and unlabeled scenarios, which the authors attribute to the generalization of ensemble learning. Detected anomalies are classified by severity and type, and each is assigned a confidence score that guides a timely response, whether automated or manual. The framework substantially improves anomaly detection accuracy, reduces false positives, and supports resilient cloud operations. Its modular, scalable architecture is designed for compatibility with existing cloud infrastructures. Future work will focus on real-time adaptability, cross-cloud deployment, and integration with predictive analytics and self-healing protocols to enable autonomous cloud management.
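The last step described in the abstract, turning a detection with a confidence score into an automated or manual response, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the field names, the severity rules, the two confidence cut-offs, and the `log_only` fallback are all hypothetical, and the classifier that produces the confidence score is assumed to exist upstream and is not shown.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Observation:
    """One monitoring sample. Field names are illustrative, not from the paper."""
    timestamp: datetime
    resource_id: str
    cpu_util: float      # fraction in [0, 1]
    mem_util: float      # fraction in [0, 1]
    response_ms: float   # request latency in milliseconds
    error_rate: float    # fraction of failed requests in [0, 1]


@dataclass(frozen=True)
class Detection:
    """Output of an upstream classifier (assumed, not shown here)."""
    observation: Observation
    anomaly_type: str    # hypothetical label such as "latency"
    confidence: float    # classifier probability in [0, 1]


# Hypothetical cut-offs; the abstract does not state any thresholds.
HIGH_CONFIDENCE = 0.9
LOW_CONFIDENCE = 0.6


def severity(det: Detection) -> str:
    """Assign a severity bucket from the raw metrics (assumed rules)."""
    obs = det.observation
    if obs.error_rate >= 0.10 or obs.response_ms >= 2000:
        return "critical"
    if obs.cpu_util >= 0.90 or obs.mem_util >= 0.90:
        return "major"
    return "minor"


def route(det: Detection) -> str:
    """Choose a response channel from the confidence score."""
    if not 0.0 <= det.confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    if det.confidence >= HIGH_CONFIDENCE:
        return "automated"   # confident enough to act without a human
    if det.confidence >= LOW_CONFIDENCE:
        return "manual"      # plausible anomaly, hand to an operator
    return "log_only"        # too uncertain to act on, keep for later review


if __name__ == "__main__":
    sample = Observation(
        timestamp=datetime(2024, 1, 1, tzinfo=timezone.utc),
        resource_id="vm-001",
        cpu_util=0.95,
        mem_util=0.40,
        response_ms=350.0,
        error_rate=0.01,
    )
    det = Detection(sample, anomaly_type="cpu_saturation", confidence=0.93)
    print(det.observation.resource_id, severity(det), route(det))
```

Keeping severity (derived from the metrics) separate from routing (derived from the confidence score) lets each threshold be tuned independently. A real deployment would likely calibrate both against historical incidents.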
License

This work is licensed under a Creative Commons Attribution 4.0 International License.