Editor | Cabbage Leaf
Large-scale pre-trained foundation models have achieved great success in non-medical fields. However, training these models usually requires large, comprehensive datasets, whereas the datasets common in biomedical imaging are smaller and more specialized.
Researchers at the Fraunhofer Institute for Digital Medicine MEVIS in Germany proposed a multi-task learning strategy that decouples the number of training tasks from memory requirements.
They trained a universal biomedical pre-trained model (UMedPT) on a multi-task database that includes tomography, microscopy and X-ray images, using several label types such as classification, segmentation and object detection. The UMedPT foundation model outperforms ImageNet pre-training and previous state-of-the-art (SOTA) models.
In external independent validation, imaging features extracted with UMedPT set a new standard for cross-center transferability.
The study was titled "Overcoming data scarcity in biomedical imaging with a foundational multi-task model" and was published in "Nature Computational Science" on July 19, 2024.
Deep learning is gradually revolutionizing biomedical image analysis due to its ability to learn and extract useful image representations.
The usual approach is to pre-train a model on a large-scale natural image dataset (such as ImageNet or LAION) and then either fine-tune it for a specific task or use the pre-trained features directly. Fine-tuning, however, requires more computing resources.
At the same time, effective deep learning pre-training in biomedical imaging requires a large amount of annotated data, and such data is often scarce.
Multi-task learning (MTL) offers a solution to data scarcity by training one model to solve multiple tasks simultaneously. It draws on the many small and medium-sized datasets in biomedical imaging to pre-train image representations that work across all tasks, which makes it well suited to data-scarce domains.
MTL has been applied to biomedical image analysis in a variety of ways, including training from multiple small and medium-sized datasets for different tasks, and using multiple label types on a single image, demonstrating that shared features can improve task performance.
In the latest research, to combine multiple datasets with different label types for large-scale pre-training, researchers from Fraunhofer MEVIS introduced a multi-task training strategy and a corresponding model architecture. The approach addresses data scarcity in biomedical imaging by learning versatile representations across different modalities, diseases and label types.
To cope with the memory constraints encountered in large-scale multi-task learning, the method adopts a training loop based on gradient accumulation, which scales almost without restriction as the number of training tasks grows.
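The idea behind a gradient-accumulation loop can be sketched in a few lines of plain Python. This is a minimal toy under stated assumptions, not the UMedPT implementation: the "model" is a single shared weight, every task is a small regression problem, and the names `task_gradient`, `train_step` and `make_batch` are hypothetical. The sketch only illustrates that tasks are visited one at a time and that just a running gradient sum is kept between them, so working memory does not grow with the number of tasks.

```python
import random


def task_gradient(w, batch):
    """Gradient of the mean squared error for a toy model y = w * x."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_step(w, task_batches, lr=0.1):
    """One optimizer step over all tasks using gradient accumulation.

    Each task's batch is processed on its own; only the running sum
    `accumulated` survives from one task to the next.
    """
    accumulated = 0.0
    for batch in task_batches:
        accumulated += task_gradient(w, batch)
    # A single update of the shared weight after all tasks were visited.
    return w - lr * accumulated / len(task_batches)


def make_batch(rng, slope, size=8):
    """Hypothetical data source: noiseless samples of y = slope * x."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(size)]
    return [(x, slope * x) for x in xs]


rng = random.Random(0)
w = 0.0
for _ in range(300):
    # Three toy tasks that all depend on the same shared weight.
    batches = [make_batch(rng, 3.0) for _ in range(3)]
    w = train_step(w, batches)
```

Adding more tasks to `batches` lengthens the inner loop but does not change what has to be held in memory at any one moment, which is the property the article attributes to the training strategy.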
On this basis, the researchers trained a fully supervised biomedical imaging foundation model called UMedPT using 17 tasks and their original annotations.
The image below shows the architecture of the team's neural network, which consists of shared blocks (an encoder, a segmentation decoder and a localization decoder) as well as task-specific heads. The shared blocks are trained to be applicable to all pre-training tasks and help extract common features, while the task-specific heads handle label-specific loss calculations and predictions.
The pre-training tasks cover three supervised label types: object detection, segmentation and classification. For example, classification tasks can model binary biomarkers, segmentation tasks can extract spatial information, and object detection tasks can be used to train biomarkers based on cell counts.
Illustration: UMedPT’s architecture. (Source: Paper)
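The split between shared blocks and task-specific heads can be illustrated with a toy sketch in plain Python. Everything here is an assumption made for illustration: the functions are trivial stand-ins for neural networks, the task names (`tumor_present`, `tissue_mask`, `cell_count`) are hypothetical, and routing classification tasks straight from the encoder to their head is a simplification rather than a detail taken from the article.

```python
from typing import Callable, Dict, List, Tuple

Image = List[float]  # toy "image": a flat list of pixel intensities


def encoder(image: Image) -> List[float]:
    """Shared by every task: here it only centers the pixel values."""
    mean = sum(image) / len(image)
    return [pixel - mean for pixel in image]


def segmentation_decoder(features: List[float]) -> List[float]:
    """Shared by all segmentation tasks: keeps one value per pixel."""
    return list(features)


def localization_decoder(features: List[float]) -> List[int]:
    """Shared by all detection tasks: positions of positive local peaks."""
    peaks = []
    for i, value in enumerate(features):
        left = features[i - 1] if i > 0 else float("-inf")
        right = features[i + 1] if i < len(features) - 1 else float("-inf")
        if value > 0 and value > left and value > right:
            peaks.append(i)
    return peaks


# One shared block per label type (classification uses encoder features as-is).
SHARED_DECODERS: Dict[str, Callable] = {
    "classification": lambda features: features,
    "segmentation": segmentation_decoder,
    "detection": localization_decoder,
}


class MultiTaskModel:
    """Routes each task through the shared blocks to its own head."""

    def __init__(self) -> None:
        self.heads: Dict[str, Tuple[str, Callable]] = {}

    def add_task(self, name: str, label_type: str, head: Callable) -> None:
        if label_type not in SHARED_DECODERS:
            raise ValueError(f"unknown label type: {label_type}")
        self.heads[name] = (label_type, head)

    def predict(self, task: str, image: Image):
        label_type, head = self.heads[task]
        features = encoder(image)                        # shared
        decoded = SHARED_DECODERS[label_type](features)  # shared per label type
        return head(decoded)                             # task-specific


model = MultiTaskModel()
model.add_task("tumor_present", "classification", lambda f: max(f) > 0.5)
model.add_task("tissue_mask", "segmentation", lambda f: [1 if v > 0 else 0 for v in f])
model.add_task("cell_count", "detection", lambda peaks: len(peaks))

image = [0.0, 0.9, 0.1, 0.0, 0.8, 0.0]
```

Registering another task only adds one entry to `heads`; the encoder and decoders are reused unchanged, which mirrors how the shared blocks are meant to serve every pre-training task.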
UMedPT consistently matches or outperforms ImageNet pre-trained networks on both in-domain and out-of-domain tasks. It also maintains strong performance with less training data, both when the image representations are used directly (frozen encoder) and when the model is fine-tuned.
Illustration: Results for in-domain tasks. (Source: Paper)
For classification tasks related to the pre-training database, UMedPT matches the best performance of the ImageNet baseline in all configurations using only 1% of the original training data. In this setting, the model performs better with a frozen encoder than with fine-tuning.
Illustration: Results for out-of-domain tasks. (Source: Paper)
For out-of-domain tasks, UMedPT matches the performance of the ImageNet baseline using 50% or less of the training data, even when fine-tuning is applied.
Additionally, the researchers compared the performance of UMedPT with results reported in the literature. With the frozen encoder configuration, UMedPT exceeded the external reference results in most tasks. In this setting, it also exceeded the average area under the curve (AUC) reported for the MedMNIST database.
It is worth noting that the tasks for which the frozen application of UMedPT did not outperform the reference results were out-of-domain tasks (BC-Bach-WSI for breast cancer classification and CNS-MRI for CNS tumor diagnosis). With fine-tuning, pre-training with UMedPT exceeded the external reference results in all tasks.
Illustration: The amount of data required by UMedPT to achieve state-of-the-art performance on tasks in different imaging domains. (Source: Paper)
As a foundation for future developments in data-scarce fields, UMedPT opens up the prospect of deep learning applications in medical fields where collecting large amounts of data is particularly challenging, such as rare diseases and pediatric imaging.
Paper link: https://www.nature.com/articles/s43588-024-00662-z
Related content: https://www.nature.com/articles/s43588-024-00658-9