The saga of Utah’s Rx refill robot: A bold bet on AI and the researchers who cried foul – MedCity News


As AI models touch more and more aspects of clinical care, the medical community, government and public continue to wonder how to manage such transformative change. Many questions remain unanswered regarding reliability, transparency, safety, security and ethics.

This conundrum is currently playing out in Utah. In January, the state became the first in the country to allow an AI system to autonomously manage routine prescription refills for patients with chronic conditions. The pilot aims to reduce delays and friction in the prescription refill process, which can be a major barrier to medication adherence. Earlier this month, however, researchers said they discovered flaws in a chatbot created by Doctronic, a New York-based startup and the same company Utah is partnering with for its pilot project.

Doctronic operates a telehealth clinic in all 50 states, offering insurance-covered care provided by its in-house physicians, who are W-2 employees. It also builds AI systems designed to help clinicians manage routine prescription refills, developed using guidelines written by its own doctors.

The report criticizing Doctronic’s AI was published by Mindgard, a London-based cybersecurity and research company spun out of Lancaster University. It sells AI vulnerability tools and specializes in stress-testing AI systems to detect safety and security vulnerabilities.

In the report, Mindgard explained how it tricked the system into producing dangerous medical advice and changing prescribed doses. However, Doctronic and the Utah Office of AI Policy say the vulnerabilities discovered by Mindgard do not reflect the AI system that currently manages patient prescriptions in the state, noting that the AI bot involved in the pilot operates under strict safeguards.

Nonetheless, the research highlights the challenges regulators and AI developers face in ensuring that these models behave reliably in real-world settings.

Utah is trying something new

Research shows that nearly half of people with heart disease or diabetes do not adhere to their medication regimens, leading to preventable complications and more costly care down the road. By automating this routine task, Utah hopes to relieve exhausted clinicians while ensuring patients receive their medications more quickly.

The state said the primary goal is to improve compliance and collect real-world data on the safety and effectiveness of AI-assisted medication delivery.

As part of the pilot, Doctronic’s system only manages refills for patients already under the care of a clinician, with monitoring built into the process to ensure prescribing decisions remain overseen by doctors and other healthcare professionals.

Mindgard conducted its research in January, shortly after the pilot program launched.

In its report, Mindgard showed that Doctronic’s AI could be jailbroken by exploiting flaws in its system prompts – the hidden instructions that govern its behavior. By tricking the AI bot into reciting and then rewriting these instructions, the researchers were able to make it generate dangerous clinical advice, including wildly incorrect medication doses and instructions for illegal drugs.

For example, when researchers cited a fabricated regulator and a fake press release, the AI model agreed to triple the standard prescribed dose of OxyContin.

Source: Mindgard AI

Peter Garraghan, founder and chief scientific officer of Mindgard, emphasized that the research aims to highlight systemic safety and security risks in healthcare AI applications in general, not just in Doctronic’s algorithms in particular.

He explained that researchers are usually able to extract system prompts from a chatbot simply by chatting with it. In other words, by using carefully designed questions, researchers can typically manipulate the AI model to reveal its underlying instructions.

After Mindgard researchers extracted parts of these instructions from Doctronic’s AI model, they learned details about the model’s safeguards and knowledge cutoff. The bot told them its knowledge base is limited to data published before June 2024.

They then manipulated the system further, providing it with “new guidelines” that a made-up medical authority had supposedly issued after the knowledge cutoff.

Because large language models are designed to be useful and cannot truly verify information, the system accepted the false instructions and began generating dangerous output, Garraghan said.
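The failure Garraghan describes can be illustrated with a minimal sketch: a bare LLM treats any text that claims to come from a regulator the same as text that actually does, whereas conventional software can check a claimed source against an allowlist before accepting new instructions. All names here (`TRUSTED_SOURCES`, `accept_guideline`) are illustrative, not Doctronic’s actual code.

```python
# Hypothetical mitigation sketch: validate the claimed issuer of a "guideline
# update" against an allowlist before accepting it. A bare LLM has no
# equivalent step, which is why fabricated authorities can slip through.

TRUSTED_SOURCES = {"fda.gov", "cdc.gov"}  # illustrative allowlist of issuers

def accept_guideline(claimed_source: str, guideline_text: str) -> bool:
    """Accept a guideline update only if its claimed source is trusted."""
    return claimed_source in TRUSTED_SOURCES

# A fabricated regulator, like the one Mindgard invented, is rejected here,
# while an LLM prompted with the same claim may simply comply.
print(accept_guideline("made-up-regulator.example", "triple the dose"))  # False
print(accept_guideline("fda.gov", "updated dosing table"))               # True
```

The point of the sketch is the asymmetry: deterministic code can refuse unverifiable input outright, while a language model weighs it as just more context.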

He pointed out that the AI model’s vulnerability stems from fundamental flaws in large language models, which cannot inherently distinguish between data and control instructions, making them susceptible to social engineering and manipulation.

“At a high level, I’m not particularly surprised, but it’s more of an indictment of the entire industry, as opposed to Doctronic itself. The difference is that Doctronic’s domain is very large. It’s one thing to have an AI chatbot that has a database of music records, for example, that doesn’t contain anything sensitive, versus people using it for medical advice and perhaps prescriptions. That concern is a lot more serious,” he remarked.

Separating fear from reality

Doctronic co-CEOs – Matt Pavelle and Dr. Adam Oskowitz – said Mindgard had uncovered no new risks, noting that the types of prompt manipulation vulnerabilities demonstrated in the report were already well understood in the AI community.

Like Garraghan, they argued that these problems are a general feature of LLMs and not unique to Doctronic’s systems. They also pointed out that Mindgard wasn’t even testing the specific AI model deployed in the Utah pilot.

“The Utah model is structurally different from the one that was tested. Medications are extracted from the patient’s medical record. The AI can only refill what has already been prescribed. Dosages and other checks are performed against external clinical databases. Abnormal behavior is automatically escalated to a human doctor,” Pavelle said.

So if Mindgard had attempted similar prompts on the actual model it claimed to be testing, the prompts would have been rejected, he said. Mindgard’s Garraghan responded by saying his organization “would not be able to prove or disprove the existence of another instance of the chatbot.”

Pavelle emphasized that Mindgard’s findings reflected the limitations of a single-session experiment rather than any real risks in the model deployed in Utah.

“[Mindgard] demonstrated that a chatbot can be asked to generate dangerous text. Importantly, this took place over the course of a single session, which is a known property of how large language models operate under adversarial prompting. But this text does not authorize a prescription. This text has not changed how the system actually works for other users,” Pavelle said.

He also noted that Utah’s pilot prohibits the bot from authorizing any new prescriptions, renewing prescriptions for controlled substances or making changes to treatment plans.

If Pavelle is to be believed, this means that one of the most controversial and concerning findings in Mindgard’s report – the fact that Doctronic’s AI bot said it would inappropriately increase a dose of OxyContin under adversarial prompting – represents little practical concern. Increasing a drug dose would never be allowed under the safety framework Doctronic has in place with the state of Utah, Pavelle noted.

The pilot also uses a strict formulary – a predefined list of 190 drugs that Doctronic’s AI is allowed to manage – which prevents the system from refilling drugs outside of that list or changing dosages, he pointed out.

“It’s absolutely impossible for the chatbot to change the rest of the code to change a prescription or prescribe a medication that isn’t in our formulary. A researcher could convince the chatbot to say it will do that, because I can convince a chatbot to say red is green, but it doesn’t actually do that,” Pavelle said. “I guess you never know, when it comes to people trying to get [improper doses of drugs on the formulary], but I don’t know if there is a big black market for statins.”

Utah’s prescription refill bot also doesn’t take a patient’s word that they were prescribed a medication, he added. Instead, it checks the state’s prescription database to confirm previous prescriptions before authorizing a refill. According to Pavelle, the bot’s protections go beyond what most human doctors do, including real-time checking for drug interactions through the First Databank database.
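The layered safeguards Pavelle describes – a fixed formulary, verification against the prescription record, and escalation to a human when anything is off – can be sketched as ordinary control flow that sits outside the language model. Function and variable names (`FORMULARY`, `authorize_refill`) are illustrative assumptions, not Doctronic’s implementation.

```python
# Hypothetical sketch of layered refill checks: each gate is deterministic
# code the chatbot cannot talk its way past, and any failure routes the
# request to a human clinician instead of refusing silently.

FORMULARY = {"atorvastatin", "metformin", "lisinopril"}  # illustrative drug list

def authorize_refill(drug: str, requested_dose: str,
                     prior_prescriptions: dict) -> str:
    # 1. The drug must be on the predefined formulary the bot may manage.
    if drug not in FORMULARY:
        return "escalate_to_physician"
    # 2. The refill must exactly match a prescription already on record
    #    (the pilot checks the state database, not the patient's claim).
    if prior_prescriptions.get(drug) != requested_dose:
        return "escalate_to_physician"
    # 3. Only an exact repeat of an existing prescription is auto-approved;
    #    any dose change already failed step 2.
    return "approve_refill"

record = {"metformin": "500mg"}
print(authorize_refill("metformin", "500mg", record))   # approve_refill
print(authorize_refill("metformin", "1500mg", record))  # escalate_to_physician
print(authorize_refill("oxycontin", "10mg", record))    # escalate_to_physician
```

Under this design, the OxyContin scenario from Mindgard’s report fails at the first gate regardless of what the chatbot says in conversation.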

AI with monitoring

Dr. Oskowitz emphasized that while he and Pavelle view Mindgard’s report as posing no real risk to patients, Doctronic continues to take this type of research seriously. With autonomous AI being an innovative addition to clinical care, he believes startups need to work hard to ensure patients are more comfortable using these systems.

He pointed to Doctronic’s “guardian” system, an additional AI layer that monitors conversations in real time to detect risky behavior or medical emergencies and can intervene if something seems dangerous.
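A monitoring layer of this kind can be sketched as a second checker that screens every bot response before delivery. The keyword screen below is a stand-in for whatever model or classifier the real “guardian” uses; all names (`RISK_MARKERS`, `guardian_check`) are illustrative assumptions.

```python
# Hypothetical sketch of a "guardian" layer: a separate check that scans each
# outgoing bot response and intervenes on risky content rather than letting
# the primary model police itself.

RISK_MARKERS = ("increase your dose", "chest pain", "overdose")

def guardian_check(bot_response: str) -> str:
    lowered = bot_response.lower()
    if any(marker in lowered for marker in RISK_MARKERS):
        # Block the message and route the conversation to a human.
        return "intervene"
    return "deliver"

print(guardian_check("Your refill of metformin 500mg has been approved."))  # deliver
print(guardian_check("You can safely increase your dose to 3 tablets."))    # intervene
```

The design choice worth noting is separation of duties: even if the primary model is manipulated mid-conversation, the monitor evaluates its output independently.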

Additionally, Doctronic’s AI is limited to medical advice based on evidence-based guidelines, which limits the risk of misinformation for normal users who are not deliberately trying to mislead the system, Dr. Oskowitz added. He said these guidelines were written by Doctronic doctors specifically for use by the company’s AI models.

He also stressed that safety measures must be weighed against the real risk of patients missing essential medicines.

“There are real problems. People die every year because they can’t get their medications,” Dr. Oskowitz noted.

There are approximately 125,000 preventable deaths in the United States each year due to medication non-adherence. Much of this has to do with the unaffordability of medications, but much of it is simply due to too much friction in the system – a problem the Utah pilot seeks to solve, Dr. Oskowitz explained.

The Utah Office of AI Policy shares Doctronic’s take on the situation.

“We understand why reports like this raise questions, and we take them seriously. An independent red team can surface cases that are not encountered in normal use, and this type of stress testing is valuable as these systems evolve,” read a statement emailed to MedCity News.

The office also said it was aware of these types of risks before the pilot began, which is why it structured the program with tiered safeguards, escalation pathways, reporting requirements, monitoring phases and physician review. It is important to note that these doctors are employees of Doctronic.

Balance between innovation and prudence

One of those full-time employees is Dr. Thomas Savage, an internal medicine physician who has been with the company for seven months. He said he and other doctors at Doctronic closely examine the results of each patient interaction to ensure the system is working as intended. He added that his team of doctors is working “closely with Utah.”

Doctronic and Utah are continuing to collect data before determining whether the pilot can be considered a success, but Dr. Savage nonetheless said he believes refill bots and similar automation tools could help address real clinical challenges when deployed safely.

“There are a lot of tasks that physicians, or health care providers in general, do, where we just need to find the contained box that is appropriate to use these technologies to assist in clinical care. And that’s part of what we’re doing with Utah,” he noted.

For clinicians, many tasks are very simple but very tedious and repetitive, such as refilling prescriptions, reviewing lab results, responding to patient portal messages, and completing prior authorization paperwork. As more tools are introduced to handle these tasks independently, the goal is not to replace doctors, but to automate narrowly defined administrative tasks that follow clear rules.

For Doctronic and Utah, refilling prescriptions for stable patients seemed like a good place to start. It’s a task that often creates delays for patients, but requires little clinical judgment when strict guidelines are in place, Dr. Savage explained.

All things considered, Mindgard’s report appears to raise a relevant policy question. It is not a question of whether edge cases exist (they do in all large language models) but whether technology developers, vendors, and regulators are exercising due diligence when venturing into uncharted territory: prescription refills without a human in the loop.

Doctronic and the Utah Office of AI Policy say that for their refill bot pilot, the answer is yes. They believe they strike the right balance between innovation and safety through strict protocols, medical oversight and ongoing monitoring.

Both organizations maintain that the use of this bot does not endanger patients. And until concrete evidence demonstrates otherwise, they see no reason to slow the rollout.

Photo: Irina_Strelnikova, Getty Images
