AI DRIVEN CLINICAL DECISION SUPPORT SYSTEMS: A RETRIEVAL AUGMENTED GENERATION APPROACH FOR HEALTHCARE DELIVERY AND EFFICIENCY

Authors

  • Abdur Rahman Lindon, Department of Information Technology, Washington University of Science and Technology, 2900 Eisenhower Ave, Alexandria, VA 22314, USA
  • Hafiz Aziz Khan, Department of Information Technology, Washington University of Science and Technology, 2900 Eisenhower Ave, Alexandria, VA 22314, USA
  • Nusrat Yasmin Nadia, Department of Information Technology, Washington University of Science and Technology, 2900 Eisenhower Ave, Alexandria, VA 22314, USA
  • Habibor Rahman Rabby, Department of Computer Science, Campbellsville University, 2300 Greene Way #100, Louisville, KY 40220, USA
  • Md Habibul Arif, Department of Information Technology, Washington University of Science and Technology, 2900 Eisenhower Ave, Alexandria, VA 22314, USA

DOI:

https://doi.org/10.48047/v90mq567

Abstract

Clinical decision support systems have become increasingly important in modern healthcare, yet many language model-based approaches remain limited by unsupported responses, insufficient contextual grounding, and inadequate reliability for routine clinical use. To address these limitations, this study proposes a guideline-grounded retrieval-augmented generation (RAG) framework that combines dense semantic retrieval with large language model-based answer generation for healthcare question answering. The framework was developed using the epfl_llm/guidelines dataset, from which a large-scale retrieval corpus of 970,584 text chunks was constructed through systematic preprocessing, recursive text chunking, metadata preservation, embedding generation, and vector indexing in ChromaDB. Three embedding models (all-MiniLM-L6-v2, E5-base-v2, and BGE-base-v1.5) were evaluated alongside three language model configurations (Phi-3 Mini, LLaMA 7B, and GPT-4o-mini) to assess retrieval effectiveness, answer relevance, contextual alignment, and inference efficiency on a manually curated set of 56 clinical questions. The results demonstrate that retrieval quality strongly influences final response quality, with BGE-base-v1.5 achieving the highest retrieval performance across all ranking metrics. Furthermore, the RAG-based framework consistently outperformed direct language model generation across all lexical and semantic evaluation metrics, confirming the benefit of grounding generated responses in retrieved clinical evidence. A practical trade-off between response quality and inference latency was also observed across model configurations. These findings suggest that guideline-grounded retrieval-augmented generation is a promising, practically viable approach for developing more trustworthy, context-aware, and evidence-based clinical decision support systems.
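
The pipeline summarized in the abstract (recursive chunking, embedding, indexing, retrieval, grounded prompting) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. All function names are hypothetical, the bag-of-words vector is only a stand-in for the dense embedding models named above, a plain Python list stands in for the ChromaDB index, the sample text is illustrative rather than drawn from the guidelines dataset, and no language model is called.

```python
import math
import re
from collections import Counter

# Illustrative placeholder text, NOT taken from the epfl_llm/guidelines dataset.
GUIDELINE = (
    "Hand hygiene is recommended before and after patient contact.\n\n"
    "Adults are advised to have blood pressure measured at routine visits.\n\n"
    "Annual influenza vaccination is recommended for most adults."
)


def recursive_chunk(text, max_len=200, separators=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator present, recursing to finer ones until
    every chunk has at most max_len characters. Simplified: small pieces are
    not merged back together, and the separator itself is dropped."""
    text = text.strip()
    if len(text) <= max_len:
        return [text] if text else []
    for i, sep in enumerate(separators):
        if sep in text:
            chunks = []
            for part in text.split(sep):
                chunks.extend(recursive_chunk(part, max_len, separators[i + 1:]))
            return chunks
    # No usable separator left: fall back to fixed-width slices.
    return [text[j:j + max_len] for j in range(0, len(text), max_len)]


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(question, contexts):
    """Assemble a grounded prompt; the wording is a hypothetical template."""
    context_block = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using only the guideline excerpts below.\n"
        f"{context_block}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    chunks = recursive_chunk(GUIDELINE, max_len=80)
    question = "When should blood pressure be measured?"
    print(build_prompt(question, retrieve(question, chunks, k=1)))
```

In a setup closer to the one described, `embed` would be replaced by one of the evaluated embedding models and the list of chunks by a persistent vector index, while the retrieve-then-prompt structure would stay the same.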

Published

2024-12-30

How to Cite

Lindon, A. R., Khan, H. A., Nadia, N. Y., Rabby, H. R., & Arif, M. H. (2024). AI DRIVEN CLINICAL DECISION SUPPORT SYSTEMS: A RETRIEVAL AUGMENTED GENERATION APPROACH FOR HEALTHCARE DELIVERY AND EFFICIENCY. Cuestiones De Fisioterapia, 53(03), 5172-5189. https://doi.org/10.48047/v90mq567