Editor’s note: This is the first story in a two-part series on the use of foundation models in the medtech industry. The second story will be published on Tuesday.
More medical device companies are promoting the use of “foundation models,” a type of artificial intelligence that can be adapted to a wide variety of tasks. However, questions remain about how the technology is used in the medtech industry.
In the past year, GE Healthcare promoted an MRI research foundation model, Philips announced plans to work with Nvidia to build a foundation model for MRIs, and several abstracts at last year's Radiological Society of North America meeting focused on how to assess and improve foundation models. The Food and Drug Administration also updated its database of AI-enabled medical devices to note that it is exploring ways to identify and tag devices that incorporate foundation models.
However, the definition of what counts as a foundation model is unclear, experts said, and it’s difficult to know if the tools available today are helping radiologists and patients.
What are foundation models?
Magdalini Paschali, a postdoctoral scholar at Stanford’s Department of Radiology, said foundation models have a few key characteristics: They’re trained on large datasets, which mostly consist of unlabeled data. They can process multiple types of data, such as images, text, medical history and genomics. And they can tackle a wide variety of tasks, including detecting a disease the model hadn’t seen during training.

Paschali published a paper earlier this year in RSNA’s Radiology journal looking to define the technology more clearly.
In practice, “everything can really be defined as a foundation model,” said Akshay Chaudhari, an assistant professor of radiology and biomedical data science at Stanford.
Chaudhari said the term “foundation model” was first coined at Stanford in 2021. In healthcare, one of the first examples was Google’s Med-PaLM, a large language model designed to answer medical questions that debuted in late 2022.
Foundation models started becoming more prominent at RSNA in 2023, Chaudhari said.
Traditional deep learning models used in radiology, such as those for detecting pneumonia, are focused on a specific health condition and rely on labeled data. For example, radiologists would go through images, circling instances of pneumonia, or highlighting it in the text report, said Nina Kottler, associate chief medical officer for clinical artificial intelligence at Radiology Partners.
For foundation models, which are trained on millions of images rather than thousands, requiring labeled data simply isn’t practical, Chaudhari said.
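As a rough illustration of that difference, the minimal sketch below (in PyTorch, with random tensors standing in for X-rays; it is not any vendor’s actual training code) shows how a narrow model needs a label for every image, while a foundation-style model can learn from unlabeled images by, for example, reconstructing masked-out regions:

```python
import torch
import torch.nn as nn

# Shared image encoder used in both setups below.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())

# 1) Traditional narrow model: every training image needs a radiologist-provided label.
labeled_images = torch.randn(8, 1, 64, 64)             # tiny batch standing in for X-rays
pneumonia_labels = torch.randint(0, 2, (8,)).float()   # 1 = pneumonia marked by a radiologist
classifier = nn.Sequential(encoder, nn.Linear(128, 1))
supervised_loss = nn.functional.binary_cross_entropy_with_logits(
    classifier(labeled_images).squeeze(1), pneumonia_labels)

# 2) Foundation-style pretraining: no labels; the model learns by reconstructing
#    masked-out parts of each image (one common self-supervised objective).
unlabeled_images = torch.randn(8, 1, 64, 64)
mask = (torch.rand_like(unlabeled_images) > 0.5).float()
decoder = nn.Linear(128, 64 * 64)
reconstruction = decoder(encoder(unlabeled_images * mask)).view(8, 1, 64, 64)
self_supervised_loss = nn.functional.mse_loss(reconstruction, unlabeled_images)

print(f"supervised loss: {supervised_loss.item():.3f}, "
      f"self-supervised loss: {self_supervised_loss.item():.3f}")
```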
Are foundation models more accurate?
Some medical device developers claim foundation models are more accurate than narrow AI models. For example, Aidoc, which makes triage software for radiology, says foundation models allow for faster development of more accurate AI tools.
Experts said the devices’ accuracy depends on how they’re built. Stanford’s Paschali said a foundation model used straight “out of the box,” without any specialization or additional training, may perform worse than a more specific AI tool, but it could work better once it has seen some examples and context.
Because foundation models are trained on such a vast amount of data, they can perform better at finding rare events, such as brain aneurysms, Kottler said.
“When you only have a small number of people that have something, finding that thing is like a needle in a haystack,” Kottler said. “You need a very accurate model to be able to do that.”

Another area where foundation models can shine is allowing for faster development of other AI models. Building a traditional narrow AI model might take six months of cleaning data, labeling it and training, Kottler said. Different iterations of a foundation model can be built in a matter of weeks.
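The reason iterations can be quicker is that the expensive pretraining is reused. In this hypothetical sketch (the encoder, shapes and task are invented for illustration), an already pretrained foundation encoder is kept frozen and only a small task-specific head is trained on a modest labeled set:

```python
import torch
import torch.nn as nn

# Stand-in for an already pretrained foundation encoder (randomly initialized here).
pretrained_encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 128), nn.ReLU())
for param in pretrained_encoder.parameters():
    param.requires_grad = False          # reuse the encoder as-is; don't retrain it

new_task_head = nn.Linear(128, 1)        # e.g., a hypothetical rib-fracture triage head
optimizer = torch.optim.Adam(new_task_head.parameters(), lr=1e-3)

images = torch.randn(16, 1, 64, 64)      # small task-specific labeled set
labels = torch.randint(0, 2, (16,)).float()

for _ in range(5):                        # only the tiny head is updated
    logits = new_task_head(pretrained_encoder(images)).squeeze(1)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
print(f"final loss: {loss.item():.3f}")
```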
In practice, however, it’s difficult to know if those advantages have translated to real benefits for patients or their care teams. For foundation models that have gone through the FDA, there’s little public information backing companies’ claims, Stanford’s Chaudhari said.
“The best we can do is look at the summary statements that come from the FDA clearance documents,” he said, “and at least on the market, we haven't really seen the benefits of these foundation models yet.”
Evaluating foundation models
Currently, the foundation models that have been authorized by the FDA are designed to solve a specific task, such as Aidoc’s rib fracture triage tool, which is built on the company’s foundation model. Aidoc received 510(k) clearance for its model using an older version of the rib fracture triage tool as the predicate. The 510(k) process requires that manufacturers demonstrate substantial equivalence between their device and a predicate device that is already legally marketed in the U.S.
For broader models that incorporate language and images or video, there are no guidelines yet, Chaudhari said.
Some hospitals have come up with systems for evaluating AI models, but they’re not perfect. For example, hospitals will identify a need, such as a model that can detect pneumonia in X-ray images or one that drafts X-ray reports.
“Then they'll compile 1,000 images from their site where they know the labels, and then they'll basically hold a competition to see which vendor maximizes the performance on that dataset,” Chaudhari said. “It’s rudimentary, but that's really the best of what we have, because that's the only way we can assess that local performance will actually be ideal for a given task.”
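A hypothetical version of that bake-off might look like the sketch below, where each vendor’s outputs are scored against the site’s own labels on the same held-out studies. The vendor names, predictions and the choice of AUC as the metric are invented here; sites may score models differently:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
local_labels = rng.integers(0, 2, size=1000)   # the site's own ground-truth labels

# Simulated vendor outputs: scores loosely correlated with the labels.
vendor_predictions = {
    "vendor_a": np.clip(local_labels + rng.normal(0, 0.6, 1000), 0, 1),
    "vendor_b": np.clip(local_labels + rng.normal(0, 0.9, 1000), 0, 1),
}

for vendor, scores in vendor_predictions.items():
    auc = roc_auc_score(local_labels, scores)
    print(f"{vendor}: AUC on local dataset = {auc:.3f}")
```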

That kind of local benchmarking might be doable for larger academic medical centers that have a data science team, but “a lot of hospitals will just deploy models and get questionable quality data out of it,” Chaudhari added.
To test a foundation model for accuracy, it’s important to first define metrics and tasks for the model based on claims about what it can do, Paschali said. Hospitals should also test how the model performs across different subgroups of patients and different scanner types. Finally, hospitals should “stress test” a model to identify potential issues, such as performance on very rare diseases.
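A simplified sketch of those subgroup checks could look like the following, with hypothetical scanner types and results; the same idea extends to patient subgroups and rare-disease stress tests:

```python
import pandas as pd
from sklearn.metrics import recall_score

# Hypothetical per-study results collected during local testing.
results = pd.DataFrame({
    "scanner":   ["ge", "ge", "siemens", "siemens", "philips", "philips"] * 50,
    "label":     [1, 0, 1, 0, 1, 0] * 50,
    "predicted": [1, 0, 1, 1, 0, 0] * 50,
})

# Recompute the same metrics within each scanner-type subgroup.
for scanner, group in results.groupby("scanner"):
    sensitivity = recall_score(group["label"], group["predicted"])                 # true positive rate
    specificity = recall_score(group["label"], group["predicted"], pos_label=0)    # true negative rate
    print(f"{scanner}: sensitivity={sensitivity:.2f}, specificity={specificity:.2f}")
```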
“That's why it's very important to work closely with radiologists, because they can help us design the stress testing in a very thorough manner,” Paschali said. “Because they have seen so many cases, and they know when even tasks are difficult for them.”
The hope is that foundation models, being trained on much larger datasets spanning different states, types of hospitals and imaging machines, could require less evaluation before deployment.
“I don't think we've gotten there, at least there is no evidence out in the world,” Chaudhari said. “But that's the allure of what these foundation models can provide.”
Another goal is to free up radiologists’ time, amid an ongoing shortage of radiologists in the U.S. and a growing number of images.
“If we're asking them to verify the output, to doublecheck everything,” Chaudhari said, “is that actually fulfilling the promise that these models have?”