Recently, the research group of Professor Hong Liang from the Institute of Natural Sciences/School of Physics and Astronomy/Zhangjiang Institute of Advanced Research/School of Pharmacy of Shanghai Jiao Tong University, together with young researchers from the Shanghai Artificial Intelligence Laboratory, made an important breakthrough in protein mutation-property prediction.
This work adopts a new training strategy that greatly improves the performance of traditional protein pre-trained large models on mutation-property prediction while using very little wet-lab experimental data.
The research results, titled "Enhancing the efficiency of protein language models with minimal wet-lab data through few-shot learning", were published in Nature Communications on July 2, 2024.
Paper link:

Research background
Enzyme engineering requires mutating and screening proteins to obtain better protein products. Traditional wet-lab methods rely on repeated experimental iterations, which is time-consuming and labor-intensive.
Deep learning methods can accelerate protein mutation engineering, but they require large amounts of protein mutation data to train a model, and obtaining high-quality mutation data through traditional wet experiments is costly and slow.
There is therefore an urgent need for a method that can accurately predict protein mutation-function relationships without large amounts of wet-lab data.
Research Method
This study proposes FSFP, a method that combines meta-learning, learning to rank, and parameter-efficient fine-tuning to train a protein pre-trained model with only dozens of wet-lab data points, greatly improving its mutation-property predictions.
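To make the learning-to-rank component concrete, the sketch below shows one common ranking objective, a pairwise hinge loss: for every pair of mutants, the one with higher measured fitness should receive a higher model score. This is an illustrative assumption about what "ranking learning" can look like, not the paper's actual loss; the function name, data, and `margin` value are all hypothetical.

```python
# Hypothetical sketch of a learning-to-rank objective for few-shot
# mutation-fitness training. Names, toy data, and the margin are
# illustrative assumptions, not taken from the FSFP paper.

def pairwise_ranking_loss(scores, fitness, margin=0.1):
    """Hinge loss over all mutant pairs: a mutant with higher measured
    fitness should score higher than a lower-fitness one by `margin`."""
    loss, pairs = 0.0, 0
    for i in range(len(scores)):
        for j in range(len(scores)):
            if fitness[i] > fitness[j]:  # mutant i should outrank mutant j
                loss += max(0.0, margin - (scores[i] - scores[j]))
                pairs += 1
    return loss / max(pairs, 1)

# Toy example: model scores for 4 mutants vs. their wet-lab fitness labels.
scores = [0.5, 0.9, 0.4, 0.1]
fitness = [1.0, 3.0, 2.0, 0.5]
print(round(pairwise_ranking_loss(scores, fitness), 4))  # → 0.0333
```

Only one of the six ordered pairs is mis-ranked here, so the averaged loss is small; minimizing such a loss pushes the model's scores toward the wet-lab ranking, which matters more for mutant screening than predicting absolute fitness values.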
Test results show that even when the original model's prediction correlation is below 0.1, FSFP can raise the correlation above 0.5 after training the model with only 20 wet-lab data points.
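The correlation referred to here is typically Spearman's rank correlation, the standard metric for scoring mutation-effect predictions against experimental fitness. A minimal computation (assuming no tied values, which the classic formula does not handle) looks like this:

```python
# Minimal Spearman rank correlation, assuming no tied values.
# This is a generic illustration of the metric, not code from the paper.

def spearman(pred, true):
    def ranks(xs):
        order = sorted(range(len(xs)), key=lambda i: xs[i])
        r = [0] * len(xs)
        for rank, i in enumerate(order):
            r[i] = rank
        return r
    n = len(pred)
    rp, rt = ranks(pred), ranks(true)
    d2 = sum((a - b) ** 2 for a, b in zip(rp, rt))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Perfectly consistent ordering of predictions and measurements → 1.0
print(spearman([0.1, 0.4, 0.2, 0.9], [1.0, 2.0, 1.5, 3.0]))  # → 1.0
```

A value near 0 (as reported for some models before FSFP training) means the model's ranking of mutants is barely better than random, while 0.5 indicates a substantially useful ordering for prioritizing candidates.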
Illustration: FSFP overview. (Source: paper)

Research results
To further examine the effectiveness of FSFP, the authors conducted wet experiments on a concrete case: engineering the protein Phi29. Using only 20 wet-lab data points for training, FSFP raised the positive rate of the top-20 single-point mutations predicted by the pre-trained model ESM-1v by 25%, and identified nearly 10 new positive single-point mutations.
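The "top-20 positive rate" evaluation can be sketched as follows: rank all candidate single-point mutations by model score, take the top K, and count how many turn out to be experimentally beneficial. The function name, mutation labels, and toy data below are illustrative assumptions, not the paper's actual pipeline.

```python
# Toy sketch of a top-K positive-rate evaluation for mutant screening.
# Mutation names, scores, and labels are invented for illustration.

def top_k_positive_rate(scored_mutants, is_positive, k=20):
    """Fraction of the K highest-scoring mutants that are experimentally
    positive (i.e., beneficial in the wet lab)."""
    top = sorted(scored_mutants, key=scored_mutants.get, reverse=True)[:k]
    return sum(is_positive[m] for m in top) / k

scores = {"A10G": 0.9, "L5V": 0.7, "K8R": 0.4, "D2N": 0.2}
labels = {"A10G": True, "L5V": False, "K8R": True, "D2N": False}
print(round(top_k_positive_rate(scores, labels, k=3), 4))  # → 0.6667
```

Raising this rate means fewer wasted wet-lab experiments per useful mutant found, which is exactly the efficiency gain the Phi29 case study reports.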
Summary
In this work, the authors propose FSFP, a new fine-tuning method built on protein pre-trained models.
FSFP combines meta-learning, learning to rank, and parameter-efficient fine-tuning to train a protein pre-trained model efficiently with only 20 randomly selected wet-lab data points, greatly improving the model's positive rate for single-point mutation prediction.
These results show that FSFP is of great significance for shortening experimental cycles and reducing experimental costs in current protein engineering.
Author information
Professor Hong Liang from the Institute of Natural Sciences/School of Physics and Astronomy/Zhangjiang Institute of Advanced Research of Shanghai Jiao Tong University, and Tan Peng, a young researcher at the Shanghai Artificial Intelligence Laboratory, are the corresponding authors.
Postdoctoral fellow Zhou Ziyi from the School of Physics and Astronomy of Shanghai Jiao Tong University, master's student Zhang Liang, doctoral student Yu Yuanxi, and doctoral student Wu Banghao from the School of Life Science and Technology are the co-first authors.