2025, 12(12): 2371-2398.
doi: 10.1109/JAS.2025.125540
Abstract:
As the costs of global healthcare systems continue to rise, large language models (LLMs) have emerged as a promising technology with wide-ranging applications in the medical field. We provide a detailed overview of the lifecycle of medical LLMs, which comprises three key stages (get, refine, and use), to help healthcare practitioners and patients use these models more effectively. We also summarize the most widely used medical evaluation benchmarks and analyze their advantages and limitations. Furthermore, we conduct a comparative analysis of specialized medical LLMs on benchmarks such as MedQA, PubMedQA, MMLU-MED, and MedMCQA, revealing that methods like retrieval-augmented generation (RAG) enable smaller models to outperform larger ones by effectively integrating external medical knowledge. This review provides a reference for medical professionals evaluating LLM capabilities and aims to inspire the development of more effective benchmarking methods. Additionally, we showcase practical applications of medical LLMs in clinical, research, and educational settings, providing healthcare workers with valuable resources. Finally, we identify the challenges currently facing medical LLMs and outline directions for future technological advances, aiming to inspire users to explore new ways to address existing issues. This review serves as an entry point for interested clinicians, helping them determine whether and how to integrate LLM technology into healthcare for the benefit of patients and practitioners.
Z. Hu, Z. Peng, Z. Bi, Q. Shen, Z. Liu, J. Lou, and X. Luo, “Advancing healthcare with large language models: Techniques and application,” IEEE/CAA J. Autom. Sinica, vol. 12, no. 12, pp. 2371–2398, Dec. 2025. doi: 10.1109/JAS.2025.125540.