Stanford's MUSK Model Brings Multimodal AI to Cancer Prognosis

A new artificial intelligence tool developed at Stanford Medicine combines data from medical images with text to predict cancer prognoses and treatment responses with significantly higher accuracy than standard clinical methods.

The model, named MUSK (multimodal transformer with unified mask modeling), represents a marked departure from how artificial intelligence is currently used in clinical care. While AI tools have been primarily deployed for diagnostics -- identifying whether a microscope slide or scan shows signs of cancer -- MUSK moves into the realm of prognosis: predicting likely patient outcomes and determining which therapies are likely to work best for individual patients.

"We designed MUSK because, in clinical practice, physicians never rely on just one type of data to make clinical decisions," said Ruijiang Li, MD, associate professor of radiation oncology at Stanford and senior author of the study. "We wanted to leverage multiple types of data to gain more insight and get more precise predictions about patient outcomes."

How MUSK Works

MUSK is what's called a foundation model: a model pretrained on vast amounts of data that can then be customized with additional training to perform specific tasks. The key innovation is its ability to use unpaired multimodal data -- medical images and text that don't need to be explicitly linked during training -- which expands the pool of usable training data by several orders of magnitude.
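
To make the idea of masked training on unpaired data concrete, here is a minimal Python sketch. It is not MUSK's actual code: the `mask_tokens` helper, the mask ratios, the patch names and the two toy sentences are all invented for illustration. It shows only the general principle that parts of an input are hidden and kept as reconstruction targets, and that images and text can be sampled from separate pools with no link between them.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, mask_ratio, rng):
    """Hide a random subset of tokens; return (corrupted input, targets).

    targets maps each masked position to the original token that a model
    would be trained to reconstruct. (Hypothetical helper, not from MUSK.)
    """
    n_mask = max(1, round(len(tokens) * mask_ratio))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    corrupted = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = corrupted[pos]
        corrupted[pos] = MASK
    return corrupted, targets

rng = random.Random(0)

# Two independent pools: no image is linked to any particular text.
# Both pools are invented placeholders, not real pathology data.
image_pool = [[f"patch_{i}_{j}" for j in range(8)] for i in range(3)]
text_pool = [
    "tumor cells show marked nuclear atypia".split(),
    "no evidence of lymphovascular invasion".split(),
]

# Each training step can draw from either pool on its own.
image_input, image_targets = mask_tokens(rng.choice(image_pool), 0.5, rng)
text_input, text_targets = mask_tokens(rng.choice(text_pool), 0.3, rng)

print(image_input, image_targets)
print(text_input, text_targets)
```

Because neither pool refers to the other, any slide image or any pathology text can contribute to training on its own, which is why dropping the pairing requirement enlarges the usable data so much.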

The model was trained on 50 million medical images of standard pathology slides and more than 1 billion pathology-related texts. When applied to predict outcomes across 16 major cancer types, the results were striking:

Metric	MUSK	Comparison
Disease-specific survival accuracy	75%	64%
Immunotherapy response (NSCLC)	77%	61% (PD-L1 expression)
Melanoma recurrence prediction	83%	~71% (other foundation models)
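
For readers who want to compare the rows directly, the short Python sketch below works out the gap in each row of the table above, both in percentage points and as a relative improvement over the comparison figure. The numbers are taken as reported; the melanoma comparison figure is approximate, and the `gaps` helper is an illustrative name, not something from the study.

```python
# Figures as reported in the table: (MUSK %, comparison %).
# The melanoma comparison value (~71%) is approximate.
reported = {
    "Disease-specific survival": (75, 64),
    "Immunotherapy response (NSCLC)": (77, 61),
    "Melanoma recurrence": (83, 71),
}

def gaps(musk, baseline):
    """Return (percentage-point gap, relative improvement in percent)."""
    points = musk - baseline
    relative = 100 * points / baseline
    return points, round(relative, 1)

for task, (musk, baseline) in reported.items():
    points, relative = gaps(musk, baseline)
    print(f"{task}: +{points} points ({relative}% relative)")
```

The absolute gaps come out to 11, 16 and 12 percentage points. The relative figures are larger because each gap is measured against a lower starting accuracy.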

Treatment Response Prediction

Perhaps more clinically significant is MUSK's ability to predict treatment response. For non-small cell lung cancer, the model correctly identified patients who benefited from immunotherapy treatment about 77% of the time. The standard method of predicting immunotherapy response based on PD-L1 protein expression was correct only about 61% of the time.

Similarly, for melanoma patients, MUSK predicted which individuals were most likely to experience a recurrence within five years after initial treatment with approximately 83% accuracy -- about 12 percentage points higher than other foundation models.

"The biggest unmet clinical need is for models that physicians can use to guide patient treatment," Li explained. "Does this patient need this drug? Or should we instead focus on another type of therapy? Currently, physicians use information like disease staging and specific genes or proteins to make these decisions, but that's not always accurate."

Clinical Implications

The researchers see MUSK as an "off-the-shelf" tool that doctors can fine-tune to help answer specific clinical questions without requiring massive labeled datasets for each use case. This approach could make multimodal AI more accessible to hospitals and clinics that lack the resources to train models from scratch.
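
One lightweight way to adapt a pretrained model with little labeled data is to leave the model untouched and train only a small classifier on the features it produces. The Python sketch below illustrates that general pattern under stated assumptions; it is not necessarily how the Stanford team fine-tuned MUSK. The `frozen_encoder` function is a hypothetical stand-in for a real model, and the eight labeled cases are invented toy values.

```python
import math

def frozen_encoder(case):
    """Hypothetical stand-in for a pretrained model's embedding step.

    A real foundation model would map a slide and its report to a feature
    vector; here each toy 'case' is already a list of floats.
    """
    return list(case)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of the positive class for feature vector x."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def train_head(features, labels, lr=0.5, epochs=200):
    """Fit a logistic-regression head; the encoder is never updated."""
    w = [0.0] * len(features[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            err = predict(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

# Invented toy data: eight labeled cases (1 = responder, 0 = non-responder).
cases = [
    [1.2, 0.8], [0.9, 1.1], [1.0, 1.3], [0.7, 0.9],
    [-1.1, -0.9], [-0.8, -1.2], [-1.3, -1.0], [-0.9, -0.7],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

features = [frozen_encoder(c) for c in cases]
w, b = train_head(features, labels)

print(predict(w, b, [1.0, 1.0]))    # above 0.5: classified as responder
print(predict(w, b, [-1.0, -1.0]))  # below 0.5: classified as non-responder
```

Only the few parameters in `w` and `b` are learned here, which is why this kind of adaptation can get by with a handful of labeled examples rather than a massive dataset.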

The study was published in Nature and also involved researchers from Harvard Medical School. Funding came from the National Institutes of Health and the Stanford Institute for Human-Centered Artificial Intelligence (HAI).
