<p>“Are juice cleanses good for you?”</p><p>“My headache has been persistent for days, what could it be?”</p><p>“How can I manage anxiety?”</p><p>People turn to artificial intelligence (AI)-powered chatbots for even the smallest inconveniences. A recent <a href="https://www.eurekalert.org/news-releases/1129964">study</a> by Penn State has found that AI’s responses to everyday health queries are accurate about 76 per cent of the time. The finding raises concerns over growing dependence on AI and the extent to which it should be trusted.</p><p>The researchers wanted to understand how the average person uses AI for health-related concerns and how accurately AI responds to everyday medical queries.</p><p>To gauge how accurate or harmful such responses could be for an average internet user, the researchers held an AI competition called a Diagnose-a-thon at Penn State. A total of 34 participants, comprising faculty, staff, and undergraduate and graduate students, submitted 212 prompts and AI-generated responses to real and imaginary health concerns, written from both patient and doctor perspectives. Participants could choose one of four platforms for the contest: ChatGPT-4o, ChatGPT-3.5, Gemini-1.5 Pro and Llama3-8b.</p><p>Overall, 76.2 per cent of AI-generated responses provided accurate information. AI performed best in specialties such as obstetrics and gynecology and otolaryngology (the treatment of disorders that affect the ear, nose and throat), with high validity scores and low harm scores.</p><p>Internal medicine, neurology and dermatology saw the worst AI performance, with low validity scores and higher harm scores, according to the researchers. 
They added that very specific prompts, and prompts between 60 and 250 characters long, resulted in more accurate outputs.</p><p>“Our work focuses explicitly on healthcare scenarios that the average internet user might ask AI, which is a perspective that prior research into large language models (LLMs) and healthcare has not covered,” said study co-author Amulya Yadav, associate professor of informatics and intelligent systems in Penn State’s College of Information Sciences and Technology (IST).</p><p>“We wanted to understand that if people are using platforms like ChatGPT as a symptom health checker, like historically we’ve used Google, how accurate is the LLM in answering those queries, and how harmful could those responses be?”</p><p>Bonam Mingole, lead author of the study, said, “This type of research is important for understanding how the public uses AI in their daily life.” The study sought to replicate how participants would use AI platforms on a normal day.</p><p>The researchers then asked nine board-certified physicians to rate, on a six-point scale ranging from very low to very high, how accurate the AI-generated responses were and how harmful they might be. A competition committee awarded prizes to the top eight submissions that generated the most medically accurate information and one prize to the submission that generated the response most likely to cause harm.</p><p>“We are entering a new age of healthcare, and AI is a significant part of it,” said study co-author Jennifer Kraschnewski, director of the Penn State Clinical and Translational Science Institute and professor in internal medicine at the Penn State College of Medicine. 
“There is a real opportunity for healthcare to transform, to integrate these new tools so that clinicians like myself can use them to improve patient care.”</p><p>The researchers warned that AI error rates exceeded 20 per cent, roughly double the error rate of humans, and that those errors could harm patients. “I do not think AI will replace human physicians, but I do think there is a huge opportunity for us to help upskill today’s physicians in a way that’s never been done before,” said Kraschnewski.</p><p>People will continue to use AI to diagnose their health problems. Understanding how they use it can help educate them to use it appropriately and to seek expert advice when necessary.</p>