
Nature's point of view: The testing of artificial intelligence in medicine is in chaos. What should be done?


Editor | ScienceAI

Based on limited clinical data, hundreds of medical algorithms have been approved. Scientists are debating who should test the tools and how best to do so.

  1. Devin Singh watched a pediatric patient go into cardiac arrest in the emergency room after a long wait for treatment, an experience that prompted him to explore using AI to reduce wait times.
  2. Using triage data from the SickKids emergency room, Singh and colleagues built a series of AI models that suggest potential diagnoses and recommend tests (a simplified, hypothetical sketch of this kind of model appears after the figure below).
  3. One study suggests these models could speed up emergency visits by 22.3%, cutting nearly 3 hours from results processing for each patient who requires a medical test.
  4. However, the success of AI algorithms in research is only the first step in validating whether such interventions will help people in real life.


    A method for autonomously ordering tests in the emergency department (ED) using machine-learning-based medical directives (MLMD). (Source: jamanetwork.com)
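To make that idea concrete, here is a minimal sketch of how a triage-to-test model of this kind might be trained on tabular triage data. The file name, column names and outcome label below are hypothetical; this is not the SickKids model, only an illustration of the general approach.

```python
# Minimal, illustrative sketch of a triage-to-test-recommendation model.
# File name, column names and the outcome label are hypothetical; this is
# not the SickKids production system.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("triage_visits.csv")  # one row per emergency-department visit (assumed)
features = ["age_months", "heart_rate", "resp_rate", "temperature",
            "triage_acuity", "chief_complaint_code"]
target = "abdominal_ultrasound_ordered"  # 1 if the test was eventually ordered

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df[target], test_size=0.2, stratify=df[target], random_state=0)

model = GradientBoostingClassifier().fit(X_train, y_train)

# Retrospective discrimination is only the first step: a real deployment would
# also need calibration, subgroup (bias) analysis and prospective clinical trials.
print("held-out AUROC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```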

Who is testing medical AI systems?

AI-based medical applications, such as the one Singh is developing, are typically regulated as medical devices by drug regulators, including the US FDA and the UK Medicines and Healthcare products Regulatory Agency. As a result, the standards for reviewing and authorizing their use are generally less stringent than those for pharmaceuticals. Only a small subset of devices, those that could pose a high risk to patients, requires clinical-trial data for approval.

Many people think the threshold is too low. When Gary Weissman, a critical care physician at the University of Pennsylvania in Philadelphia, reviewed FDA-approved AI devices in his field, he found that of the ten devices he identified, only three cited published data in their authorization. Only four mentioned a safety assessment, and none included a bias assessment, which analyzes whether the tool's results are fair to different patient groups. "The concern is that these devices can and do impact bedside care," he said. "Patient lives may depend on these decisions."

A lack of data puts hospitals and health systems in a difficult position when deciding whether to use these technologies. In some cases, financial incentives come into play. In the United States, for example, health-insurance plans already reimburse hospitals for the use of certain medical AI devices, making them financially attractive. Institutions may also be inclined to adopt AI tools that promise cost savings, even if they do not necessarily improve patient care.

Ouyang said these incentives may prevent AI companies from investing in clinical trials. “For a lot of commercial businesses, you can imagine they’re going to work harder to make sure their AI tools are reimbursable,” he said.

The situation may vary across markets. In the United Kingdom, for example, the government-funded national health service may set higher evidence thresholds before medical centers can purchase a product, said Xiaoxuan Liu, a clinical researcher at the University of Birmingham who studies responsible innovation in artificial intelligence. In that setting, "there is an incentive to conduct clinical trials," she said.

Once a hospital purchases an AI product, it does not need to conduct further testing and can put the tool to use immediately, as it would any other software. However, some institutions recognize that regulatory approval is no guarantee that a device is actually beneficial, so they choose to test it themselves. Currently, many of these efforts are conducted and funded by academic medical centers, Ouyang said.

Alexander Vlaar, medical director of intensive care at the University Medical Center Amsterdam, and Denise Veelo, an anesthesiologist at the same institution, began such an endeavor in 2017. Their goal was to test an algorithm designed to predict the occurrence of hypotension during surgery. This condition, known as intraoperative hypotension, can lead to life-threatening complications such as heart muscle damage, heart attack and acute kidney failure, and even death.

The algorithm, developed by California-based Edwards Lifesciences, uses arterial waveform data — the red lines with peaks and troughs displayed on monitors in emergency departments or intensive care units. The method can predict hypotension minutes before it occurs, allowing for early intervention.
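As a rough illustration of the idea (not the Edwards Lifesciences algorithm, whose features and model are proprietary), a waveform-based early-warning system can be sketched as: summarize a sliding window of arterial-pressure samples into a few features, score them with a risk model, and raise an alert when the score crosses a threshold. Every constant and coefficient below is made up for illustration.

```python
# Illustrative sketch only: derive simple features from an arterial-pressure
# waveform and raise an early-warning alert from a toy risk score. This is
# NOT the commercial hypotension-prediction algorithm described in the text.
import numpy as np

SAMPLE_RATE_HZ = 100   # assumed monitor sampling rate
WINDOW_SECONDS = 20

def waveform_features(window: np.ndarray) -> dict:
    """Crude summaries of one window of arterial pressure (mmHg)."""
    return {
        "map": float(np.mean(window)),  # mean arterial pressure
        "pulse_pressure": float(np.percentile(window, 95) - np.percentile(window, 5)),
        "map_slope": float(np.polyfit(np.arange(len(window)), window, 1)[0]),
    }

def hypotension_risk(feats: dict) -> float:
    """Toy logistic risk score with made-up coefficients, for illustration only."""
    z = (8.0 - 0.12 * feats["map"] - 0.02 * feats["pulse_pressure"]
         + 30.0 * min(feats["map_slope"], 0.0))
    return 1.0 / (1.0 + np.exp(-z))

def monitor(stream):
    """Yield an alert whenever the predicted risk crosses a threshold.

    `stream` is any iterable of arterial-pressure samples in mmHg.
    """
    window_len = SAMPLE_RATE_HZ * WINDOW_SECONDS
    buffer = []
    for i, sample in enumerate(stream):
        buffer.append(sample)
        buffer = buffer[-window_len:]
        # Score once per second rather than on every sample.
        if len(buffer) == window_len and i % SAMPLE_RATE_HZ == 0:
            risk = hypotension_risk(waveform_features(np.array(buffer)))
            if risk > 0.8:
                yield f"ALERT: predicted hypotension risk {risk:.2f}"
```

As the trials described next show, what clinicians do when such an alert fires matters as much as the alert itself.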

Participant flow in the Hypotension Prediction (HYPE) trial. (Source: jamanetwork.com)

Vlaar, Veelo and colleagues conducted a randomized clinical trial to test the tool on 60 patients undergoing noncardiac surgery. Patients monitored with the device experienced an average of eight minutes of hypotension during surgery, compared with nearly 33 minutes for patients in the control group.

The team then ran a second clinical trial, which confirmed that the device, combined with a clear treatment protocol, is also effective in more complex settings, including cardiac surgery and the intensive care unit. Those results have not yet been published.

The success is not due to the accuracy of the algorithm alone; the anesthetist's response to the alarm also matters. So the researchers made sure clinicians were well prepared. "We have a diagnostic flowchart that outlines the steps to take when an alert is received," Veelo said. The same algorithm failed to show a benefit in a clinical trial conducted at another institution. In that case, "when the alarm went off, the bedside physicians did not follow the instructions and act on it," Vlaar said.

Humans Involved

A perfectly good algorithm can still fail because of human behavior, whether that of healthcare professionals or of the people receiving treatment.

The Mayo Clinic in Rochester, Minnesota, is testing an algorithm developed in-house to detect heart disease with low ejection fraction. Barbara Barry, a human-computer interaction researcher at the center, is responsible for bridging the technology gap between the developers and the primary care providers who will use the tool.

The tool is designed to flag people who may be at high risk of the condition, which can be a sign of heart failure and is treatable but often goes undiagnosed. A clinical trial showed that the algorithm did increase diagnosis rates. In conversations with providers, however, Barry found that they wanted more guidance on how to discuss the algorithm's results with their patients. This led to the suggestion that the app, if widely implemented, should include bullet points of the key information to communicate, so that providers do not have to work out how to have that conversation each time. "This is an example of us moving from pragmatic experimentation to an implementation strategy," Barry said.

Another issue that can limit the success of medical AI devices is "alert fatigue": clinicians who are exposed to large numbers of AI-generated warnings can become desensitized to them. That should be taken into account during testing, said David Rushlow, chief of the division of family medicine at the Mayo Clinic.

"We already receive alerts many times a day about conditions that patients may be at risk for. For busy frontline clinicians, this is actually a very difficult task," he said. "I think many of these tools could help us. But if they are not introduced accurately, the default will be to keep doing things the same way, because we don't have enough bandwidth to learn something new."

Consider Bias

Another challenge in testing medical AI is that clinical-trial results are difficult to generalize across populations. "It is well known that artificial intelligence algorithms can be very fragile when they are used on data that differ from the training data," Liu said. She noted that results can only be safely extrapolated if the trial participants are representative of the population in which the tool will be used.

Similarly, algorithms trained on data collected in resource-rich hospitals may not perform well in resource-poor settings. For example, the Google Health team developed an algorithm for detecting diabetic retinopathy, a condition that causes vision loss in people with diabetes, that was highly accurate in principle. But when the tool was used in clinics in Thailand, its performance dropped significantly: an observational study found that lighting conditions in the Thai clinics produced poor-quality eye images, reducing the tool's effectiveness.
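One practical response is to validate a model locally before relying on it. The sketch below, with hypothetical file and model names, compares a pre-trained classifier's discrimination on data from its development site with data from a new site and flags a large drop as a sign of distribution shift.

```python
# Sketch of a simple local-validation check before deployment: compare model
# performance on data from the development site versus a new site.
# File names and the saved model are hypothetical.
import joblib
import pandas as pd
from sklearn.metrics import roc_auc_score

model = joblib.load("retinopathy_model.joblib")  # assumed pre-trained classifier

def site_auroc(path: str) -> float:
    """AUROC of the frozen model on one site's labeled data."""
    df = pd.read_csv(path)
    X, y = df.drop(columns=["label"]), df["label"]
    return roc_auc_score(y, model.predict_proba(X)[:, 1])

dev_auroc = site_auroc("development_site.csv")
new_auroc = site_auroc("new_site.csv")
print(f"Development site AUROC: {dev_auroc:.3f}")
print(f"New site AUROC:         {new_auroc:.3f}")

# A large drop is a red flag for distribution shift (different devices,
# lighting or patient mix) and argues for local recalibration or retraining.
if dev_auroc - new_auroc > 0.05:
    print("Warning: performance degrades at the new site; do not deploy as-is.")
```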


Patient Consent

Currently, most medical AI tools help healthcare professionals with screening, diagnosis, or treatment planning. Patients may not be aware that these technologies are being tested or used routinely in their care, and no country currently requires healthcare providers to disclose this.

The debate continues over what patients should be told about artificial intelligence technology. Some of these apps have pushed the issue of patient consent into the spotlight for developers. That's the case with an artificial intelligence device that Singh's team is developing to streamline care for children in the SickKids emergency room.

What’s strikingly different about this technology is that it removes the clinician from the entire process, allowing the child (or their parent or guardian) to be the end user.

“What this tool does is take the emergency triage data, make a prediction, and ask the parent directly, yes or no, whether their child can be tested,” Singh said. That reduces the burden on clinicians and speeds up the whole process. But it also raises unprecedented questions. Who is responsible if something goes wrong for the patient? Who pays if unnecessary tests are performed?

“We need to obtain informed consent from families in an automated way,” Singh said, and that consent must be reliable and genuine. “This can’t be like when you sign up for social media and there are 20 pages of fine print and you just click accept.”
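A minimal sketch of what such a consent-gated flow could look like is shown below. Every name here (the record fields, the callbacks, the yes/no prompt) is hypothetical; the point is only that the order is placed after an explicit, logged decision by the family, never before.

```python
# Hypothetical sketch of a consent-gated, auto-ordering flow of the kind
# described above. All names are illustrative, not an actual clinical system.
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    patient_id: str
    proposed_test: str
    explanation_shown: str   # plain-language rationale presented to the family
    accepted: bool
    timestamp: str

def propose_and_order(patient_id: str, proposed_test: str, explanation: str,
                      ask_family, order_test, audit_log: list) -> bool:
    """Order the proposed test only after explicit yes/no consent; log every decision."""
    accepted = bool(ask_family(explanation))  # e.g. a yes/no prompt shown in the app
    audit_log.append(ConsentRecord(
        patient_id, proposed_test, explanation, accepted,
        datetime.now(timezone.utc).isoformat()))
    if accepted:
        order_test(patient_id, proposed_test)
    return accepted
```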

While Singh and his colleagues wait for funding to begin trials with patients, the team is working with legal experts and engaging the country's regulatory agency, Health Canada, to review its proposals and consider the regulatory implications. Currently, "the regulatory landscape is a bit like the Wild West," said Anna Goldenberg, a computer scientist and co-chair of the SickKids Children's Medical Artificial Intelligence Initiative.


Finding solutions

Medical institutions are adopting AI tools cautiously and conducting their own testing.
Cost pressures are prompting researchers and medical institutions to explore alternatives.
Large medical centers can shoulder this work more easily; smaller institutions face greater challenges.
The Mayo Clinic is testing AI tools for use in community healthcare settings.
The Health AI Alliance has established assurance laboratories to evaluate models.
Duke University proposes building internal testing capabilities to validate AI models locally.
Radiologist Nina Kottler emphasizes the importance of local validation.
Human factors must be addressed to ensure that AI tools work accurately for their end users.

Reference: https://www.nature.com/articles/d41586-024-02675-0


Source: jiqizhixin.com