Microsoft’s Bing and Google’s Gemini artificial intelligence (AI) chatbots often give misleading and dangerous advice, and you may need a degree in a life science to understand them.
Patients often check details of drugs on the Internet after they have been prescribed by a GP, but using search engines rather than going directly to a trustworthy source can lead to misinformation.
In February 2023, search engines began to be fitted with AI-powered chatbots, promising enhanced search results, comprehensive answers, and a new type of interactive experience.
Wahram Andrikyan, a PhD student at the Institute of Experimental and Clinical Pharmacology and Toxicology in Germany, and colleagues used Bing copilot to ask 10 typical patient questions about the 50 most prescribed drugs in the US, generating 500 answers in total. The questions covered what each drug was used for, how it worked, instructions for use, common side effects, and contraindications.
The chatbot answers were assessed for readability by calculating the Flesch Reading Ease Score, which predicts the educational level required to understand a particular text.
The overall average score was just over 37, which indicates that degree-level education is needed to understand the answers. Even the most readable chatbot answers still required completion of high school.
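For readers curious how such a score is produced, the sketch below is a rough, illustrative calculation of the Flesch Reading Ease Score, not the study's own tooling; the syllable counter is deliberately simplistic and the sample sentence is invented for demonstration. Scores between 30 and 50 are conventionally read as college-level text, the band into which the study's average of about 37 falls.

```python
# Illustrative Flesch Reading Ease calculation (not the study's own tooling).
# FRES = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
# Higher scores mean easier text; 30-50 is conventionally "difficult: college level".
import re

def count_syllables(word: str) -> int:
    """Very crude syllable estimate: count groups of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

if __name__ == "__main__":
    # Invented example of dense, jargon-heavy drug prose; its many long words
    # drive the score far below the 30-50 "college level" band.
    sample = ("Concomitant administration of anticoagulants may potentiate "
              "haemorrhagic complications and necessitates dose adjustment.")
    print(round(flesch_reading_ease(sample), 1))
```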
To assess the completeness and accuracy of chatbot answers, responses were compared with the drug information provided on the website drugs.com, which contains up-to-date drug information peer-reviewed by healthcare professionals.
Agreement with current scientific consensus, and the likelihood and extent of possible harm if a patient followed the chatbot’s recommendations, were assessed by seven experts in medication safety.
The experts rated the likelihood of possible harm using scales from the US Agency for Healthcare Research and Quality (AHRQ).
On average, chatbot answers were only 77 per cent complete. Only five of the 10 questions were answered in full, while question three (what do I have to consider when taking the drug?) had the lowest average completeness, at just 23 per cent.
Chatbot statements did not match the drug reference data in 26 per cent of answers, and were fully inconsistent with it in three per cent, as assessed independently by a panel of pharmacological experts.
Over 40 per cent of chatbot answers were considered likely to lead to moderate harm, and 22 per cent to severe harm or even death, if the patient followed the chatbot’s advice. Only a third were judged unlikely to result in any harm.
“In this cross-sectional study, we found that search engines with an AI-powered chatbot produced overall complete and accurate answers to some patient questions,” the authors wrote.
“However, chatbot answers were difficult to read and answers often lacked information or showed inaccuracies, possibly threatening patient and medication safety.”
A major drawback was the chatbot’s inability to discriminate between reliable and unreliable sources on the Internet or to understand the underlying intent of a patient who posed a question.
“Despite their potential, it is still crucial for patients to consult their healthcare professionals, as chatbots may not always generate error-free information.
“Caution is advised in recommending AI-powered search engines until citation engines with higher accuracy rates are available,” they concluded.
The article was published last month in BMJ Quality &amp; Safety.
Image: Mongta Studio/Shutterstock.com