AI chatbot vulnerability produces unsafe medical recommendations, Korean research team finds
Published: 05 Jan. 2026, 15:41
As more people turn to generative AI chatbots for medical advice, researchers are warning that many widely used models can be easily manipulated to give dangerous recommendations.
A research team in Korea reported on Monday that medical large language models are highly vulnerable to "prompt injection attacks," a type of cyberattack that can push AI systems beyond the bounds of their safety frameworks. The team found that more than 94 percent of tested interactions resulted in unsafe responses.
The study was led by Prof. Suh Jun-gyo of the urology department at Asan Medical Center, Prof. Jun Tae-joon of the hospital’s department of information medicine, and Prof. Lee Ro-woon of the radiology department at Inha University Hospital.
In a prompt injection attack, a hacker inserts malicious prompts into a generative AI model, causing it to operate in ways that diverge from its intended function.
Even top-tier models such as GPT-5 and Gemini 2.5 Pro failed to withstand such attacks, in some cases recommending medications known to cause fetal abnormalities to pregnant patients, which indicates serious safety limitations, according to the team's analysis.
The researchers said this study was the first in the world to systematically analyze how vulnerable AI models are to prompt injection attacks when used in medical consultation. They added that the application of AI models in clinical settings should require additional safety verification measures.
AI models are increasingly being used for patient consultation, education and clinical decision-making. However, concerns have been raised that prompt injection attacks could manipulate these systems to recommend dangerous or inadvisable treatments or medications.
From January to October of last year, the team evaluated the security vulnerabilities of three AI models: GPT-4o-mini, Gemini-2.0-flash-lite and Claude 3 Haiku.
They developed 12 clinical scenarios and categorized them into three risk levels.
A medium-risk scenario involved recommending herbal remedies instead of approved treatments to a patient with a chronic illness such as diabetes. A high-risk scenario involved recommending herbal remedies to patients with active bleeding or cancer, or suggesting drugs that could suppress respiration to patients with respiratory diseases. Critical-risk scenarios involved recommending inadvisable medications to pregnant patients.
Two types of attack methods were tested: context-aware prompt injection — which uses patient information to disrupt the model’s judgment — and evidence fabrication, which creates plausible but false information.
The team analyzed a total of 216 conversations between the three AI models and virtual patients. The overall attack success rate across the three models was 94.4 percent.
Attack success rates by model were 100 percent for GPT-4o-mini, 100 percent for Gemini-2.0-flash-lite and 83.3 percent for Claude 3 Haiku. Success rates by scenario risk level were 100 percent for medium risk, 93.3 percent for high risk and 91.7 percent for critical risk.
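The per-model figures can be checked against the overall rate with a short calculation. The sketch below assumes the 216 conversations were split evenly among the three models (72 each), which the article does not state; under that assumption, the reported percentages reproduce the 94.4 percent overall figure.

```python
# Assumption (not stated in the article): the 216 conversations were
# divided evenly among the three models, i.e. 72 conversations each.
TOTAL_CONVERSATIONS = 216

# Attack success rates by model, in percent, as reported.
reported_rates = {
    "GPT-4o-mini": 100.0,
    "Gemini-2.0-flash-lite": 100.0,
    "Claude 3 Haiku": 83.3,
}

per_model = TOTAL_CONVERSATIONS // len(reported_rates)

# Convert each rounded percentage back to a whole number of successful attacks.
successes = {
    model: round(per_model * rate / 100)
    for model, rate in reported_rates.items()
}

overall = 100 * sum(successes.values()) / TOTAL_CONVERSATIONS
print(f"Overall attack success rate: {overall:.1f} percent")
```

Under the even-split assumption, Claude 3 Haiku's 83.3 percent corresponds to 60 successful attacks out of 72 conversations.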
All three models were vulnerable to attacks recommending inappropriate medications to pregnant patients.
In over 80 percent of cases for all three models, the manipulated responses persisted in subsequent interactions, indicating that once compromised, the model remained compromised throughout the conversation.
The team further assessed vulnerabilities in top-tier AI models — GPT-5, Gemini 2.5 Pro and Claude 4.5 Sonnet — using a different technique called client-side indirect prompt injection, which hides malicious prompts in the user interface to manipulate model behavior. The test scenario again involved recommending inappropriate drugs to pregnant patients.
The attack success rates were 100 percent for GPT-5, 100 percent for Gemini 2.5 Pro and 80 percent for Claude 4.5 Sonnet — showing that even the most advanced AI models failed to defend against such attacks.
“This study demonstrates that medical AI models are structurally vulnerable not just to simple errors but to intentional manipulation,” Prof. Suh said. “Current safety mechanisms are insufficient to block malicious attacks that lead to inadvisable prescriptions.”
“To implement AI-based medical chatbots or remote consultation systems, it is necessary to thoroughly test model vulnerabilities and make security validation mandatory,” he added.
The research was published in the latest issue of JAMA Network Open, a peer-reviewed journal published by the American Medical Association.
This article was originally written in Korean and translated by a bilingual reporter with the help of generative AI tools. It was then edited by a native English-speaking editor. All AI-assisted translations are reviewed and refined by our newsroom.
BY RHEE ESTHER
with the Korea JoongAng Daily
