Transfer learning is a method of reusing models trained on existing machine learning tasks to solve new tasks. By transferring the knowledge of existing models to a new task, it can reduce the amount of training data the new task requires. In recent years, transfer learning has been widely used in fields such as natural language processing and image recognition. This article introduces the concepts and principles of transfer learning in detail.
Different transfer learning strategies and techniques apply depending on the domain of the task and the availability of data.
1. Inductive transfer learning
Inductive transfer learning requires that the source domain and target domain are the same, although the specific tasks handled by the model are different. These algorithms attempt to exploit the knowledge of the source model and apply it to improve the target task. A pre-trained model already has expertise in the domain's features, which gives it a better starting point than a model trained from scratch.
Inductive transfer learning is further divided into two subcategories based on whether the source domain contains labeled data: multi-task learning when it does, and self-taught learning when it does not.
2. Transductive transfer learning
Transductive transfer learning can be used in scenarios where the domains of the source task and the target task are not exactly the same but are related to each other. One can draw similarities between the source and target tasks. These scenarios usually have a large amount of labeled data in the source domain and only unlabeled data in the target domain.
3. Unsupervised transfer learning
Unsupervised transfer learning is similar to inductive transfer learning. The only difference is that the algorithm focuses on unsupervised tasks and involves unlabeled datasets in both source and target tasks.
4. Strategies based on domain similarity, independent of the type of training data samples
Homogeneous transfer learning methods were developed to handle situations where the domains share the same feature space. In homogeneous transfer learning, the domains differ only slightly in their marginal distributions. These methods adapt the domains by correcting for sample selection bias or covariate shift.
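One common way to correct for covariate shift is importance weighting, where each source sample is weighted by the ratio of the target density to the source density at that point. The sketch below is only an illustration under strong assumptions: both densities are taken to be known one-dimensional Gaussians, and the function names, parameters, and loss values are hypothetical. In practice the density ratio has to be estimated from data.

```python
import math

def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def importance_weights(xs, source, target):
    """w(x) = p_target(x) / p_source(x) for each source sample x."""
    return [gaussian_pdf(x, *target) / gaussian_pdf(x, *source) for x in xs]

def weighted_mean_loss(losses, weights):
    """Average of per-sample losses, reweighted toward the target domain."""
    return sum(l * w for l, w in zip(losses, weights)) / sum(weights)

# Assumed (mean, std) of the input distribution in each domain.
source_params = (0.0, 1.0)
target_params = (1.0, 1.0)

xs = [-1.0, 0.0, 1.0, 2.0]        # source-domain inputs
losses = [0.1, 0.2, 0.4, 0.8]     # hypothetical per-sample losses
weights = importance_weights(xs, source_params, target_params)
reweighted = weighted_mean_loss(losses, weights)
print(weights, reweighted)
```

Samples that are more typical of the target domain (here, larger x) receive larger weights, so the reweighted loss reflects how the model would be expected to perform on target-like inputs.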
Heterogeneous transfer learning methods address the case where the source and target domains have different feature spaces, along with other issues such as different data distributions and label spaces. Heterogeneous transfer learning is applied to cross-domain tasks such as cross-language text classification and text-to-image classification.
1. Obtain the pre-trained model
The first step is to select, based on the task, the pre-trained model to use as the basis for training. For the two to be compatible, transfer learning requires a strong correlation between the knowledge in the pre-trained source model and the target task domain.
2. Create a base model
The base model is the architecture selected in the first step as being closely related to the task. The base model may have more neurons in its final output layer than the use case requires. In this case, the final output layer needs to be removed and replaced accordingly.
3. Freeze the starting layers
Freezing the starting layers of the pre-trained model is crucial to avoid making the model relearn basic features. If the initial layers are not frozen, what they have already learned can be lost. That is little different from training the model from scratch, and it wastes time and resources.
4. Add a new trainable layer
The only knowledge reused from the base model is its feature extraction layers. Additional layers need to be added on top of them to make predictions for the model's specific task. These are usually the final output layers.
5. Train the new layers
The final output of the pre-trained model is likely to differ from the output we want. In this case, the model must be trained with the new output layer.
6. Fine-tune the model
Fine-tuning is one way to improve the performance of the model. It involves unfreezing parts of the base model and training the entire model again on the entire dataset at a very low learning rate. A low learning rate improves the model's performance on the new dataset while helping to prevent overfitting.
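The six steps above can be sketched end to end on a toy problem. This is a minimal illustration in plain Python, not a definitive implementation: the "pre-trained" base is an assumed two-unit ReLU layer with hand-picked weights, the target task is a made-up regression, and every name, learning rate, and epoch count is hypothetical.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def forward(x, base, head):
    """base: list of (w, b) hidden units; head: (list of v, bias c)."""
    h = [relu(w * x + b) for w, b in base]
    v, c = head
    return h, sum(vi * hi for vi, hi in zip(v, h)) + c

def mse(data, base, head):
    return sum((forward(x, base, head)[1] - y) ** 2 for x, y in data) / len(data)

def train(data, base, head, lr, epochs, train_base):
    """Full-batch gradient descent; the base is updated only if train_base."""
    base = [tuple(unit) for unit in base]
    v, c = list(head[0]), head[1]
    n = len(data)
    for _ in range(epochs):
        g_v = [0.0] * len(v)
        g_c = 0.0
        g_base = [[0.0, 0.0] for _ in base]
        for x, y in data:
            h, y_hat = forward(x, base, (v, c))
            err = 2.0 * (y_hat - y) / n
            g_c += err
            for i, hi in enumerate(h):
                g_v[i] += err * hi
                if train_base and hi > 0.0:  # ReLU passes gradient only when active
                    g_base[i][0] += err * v[i] * x
                    g_base[i][1] += err * v[i]
        if train_base:
            base = [(w - lr * gw, b - lr * gb)
                    for (w, b), (gw, gb) in zip(base, g_base)]
        v = [vi - lr * g for vi, g in zip(v, g_v)]
        c -= lr * g_c
    return base, (v, c)

# Steps 1-2: an assumed "pre-trained" feature layer used as the base model.
pretrained_base = [(1.0, 0.0), (-1.0, 0.0)]
# Toy target task: y = 2 * |x - 0.3| + 1 on a small grid of inputs.
data = [(k / 2.0, 2.0 * abs(k / 2.0 - 0.3) + 1.0) for k in range(-4, 5)]
# Step 4: a new, untrained output layer on top of the base.
new_head = ([0.0, 0.0], 0.0)
loss_start = mse(data, pretrained_base, new_head)

# Steps 3 and 5: keep the base frozen and train only the new layer.
frozen_base, head = train(data, pretrained_base, new_head,
                          lr=0.05, epochs=500, train_base=False)
loss_head = mse(data, frozen_base, head)

# Step 6: unfreeze the base and fine-tune everything at a much lower rate.
tuned_base, tuned_head = train(data, frozen_base, head,
                               lr=0.005, epochs=500, train_base=True)
loss_tuned = mse(data, tuned_base, tuned_head)
print(loss_start, loss_head, loss_tuned)
```

The fine-tuning stage uses a learning rate ten times smaller than the head-training stage so that the pre-trained weights are only nudged, not overwritten.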
1. Traditional machine learning models need to be trained from scratch, which requires a large amount of computation and a large amount of data to achieve high performance. Transfer learning, on the other hand, is computationally efficient and helps achieve better results with small datasets.
2. Traditional machine learning uses an isolated training method, and each model is independently trained for a specific purpose and does not rely on past knowledge. In contrast, transfer learning uses the knowledge gained from a pre-trained model to handle the task.
3. Transfer learning models reach optimal performance faster than traditional ML models. This is because a model that leverages knowledge (features, weights, etc.) from a previously trained model already understands those features, which makes it faster than training a neural network from scratch.
Many pre-trained neural networks and models form the basis of transfer learning in the context of deep learning, which is called deep transfer learning.
To understand how deep transfer learning works, it is necessary to understand the components of deep learning models. Deep learning systems are layered architectures that learn different features at different layers. The initial layers capture low-level, generic features, and the features become more specific to the task as we go deeper into the network.
These layers are finally connected to a last layer that produces the final output. This opens up the possibility of using popular pre-trained networks, without their last layer, as fixed feature extractors for other tasks. The key idea is to use the weighted layers of a pre-trained model to extract features, without updating the model's weights while training on new data for the new task.
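As a rough sketch of the fixed feature extractor idea, the example below passes inputs through a frozen layer and fits a new classifier on the resulting features only. The frozen weights, the toy data, and the nearest-centroid classifier are all assumptions chosen for brevity; they stand in for a real pre-trained network and a real output layer.

```python
def extract_features(x):
    """Frozen stand-in for pre-trained layers with the last layer removed."""
    frozen_weights = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # never updated
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in frozen_weights]

def fit_centroids(samples, labels):
    """New classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, label in zip(samples, labels):
        f = extract_features(x)
        acc = sums.setdefault(label, [0.0] * len(f))
        for i, fi in enumerate(f):
            acc[i] += fi
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc]
            for label, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the class whose centroid is closest in feature space."""
    f = extract_features(x)
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(f, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

# Toy target-task data (hypothetical).
samples = [[1.0, 0.0], [2.0, 0.1], [0.0, 1.0], [0.1, 2.0]]
labels = [0, 0, 1, 1]
centroids = fit_centroids(samples, labels)
print(predict(centroids, [1.5, 0.0]), predict(centroids, [0.0, 1.5]))
```

Only the centroids are learned from the new data; the extractor's weights are the same before and after fitting.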
Deep neural networks are layered structures with many adjustable hyperparameters. The role of the initial layers is to capture generic features, while later layers focus more on the specific task at hand. It makes sense to fine-tune the higher-order feature representations in the base model to make them more relevant to the specific task. We can retrain certain layers of the model while keeping others frozen during training.
One way to further improve model performance is to retrain, or fine-tune, the weights of the top layers of the pre-trained model while training the classifier. This forces the weights to be updated starting from the generic feature maps the model learned on its source task. Fine-tuning allows the model to apply past knowledge and relearn something in the target domain.
Also, one should try to fine-tune a few top layers rather than the entire model. The first few layers learn basic, general features that generalize to almost all types of data, while the top layers learn more specialized ones. The purpose of fine-tuning is to adapt these specialized features to the new dataset, rather than overwriting the general learning.
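Choosing which layers to fine-tune can be sketched as flipping a per-layer flag. The `Layer` class and layer names below are hypothetical; the `trainable` flag is only loosely modeled on what deep learning frameworks expose (for example, the `trainable` attribute of Keras layers or `requires_grad` on PyTorch parameters).

```python
from dataclasses import dataclass

@dataclass
class Layer:
    name: str
    trainable: bool = True

def unfreeze_top(layers, k):
    """Freeze every layer, then mark only the last k (nearest the output) trainable."""
    k = max(0, min(k, len(layers)))
    for layer in layers:
        layer.trainable = False
    for layer in layers[len(layers) - k:]:
        layer.trainable = True
    return [layer.name for layer in layers if layer.trainable]

# Hypothetical model, ordered from input to output.
model = [Layer("conv1"), Layer("conv2"), Layer("conv3"),
         Layer("conv4"), Layer("classifier")]
trainable = unfreeze_top(model, 2)
print(trainable)
```

With `k=2`, only the last convolutional block and the classifier would be updated during fine-tuning, leaving the general early features untouched.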