What is the difference between fit() and fit_transform()? These two methods often appear during data preprocessing. Let's take a closer look at their differences and illustrate them with examples.
Data standardization is an important preprocessing step that requires computing statistics of the data, such as the mean, minimum, maximum, or variance, depending on the scaler. fit() only computes these parameters and stores them on the scaler object; it does not change the data. transform() applies parameters that have already been computed. fit_transform() does both in a single call: it computes the parameters and applies them to the same data set.
Suppose we have a small data array:
data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
Use fit() and transform() separately:
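<code class="language-python">from sklearn.preprocessing import StandardScaler

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Step 1: create the scaler
scaler = StandardScaler()

# Step 2: compute the per-column mean and standard deviation; no scaling happens here
scaler.fit(data)

# Step 3: apply the learned parameters
scaled_data = scaler.transform(data)
# scaled_data now contains the standardized data</code>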
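To make concrete what fit() computes, here is a minimal sketch that inspects the fitted scaler. It assumes StandardScaler's default settings, under which the learned statistics are exposed as the mean_ and scale_ attributes; the variable names learned_mean and learned_scale are only illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

scaler = StandardScaler()
scaler.fit(data)  # learns the statistics only; data itself is left unchanged

learned_mean = scaler.mean_    # per-column mean
learned_scale = scaler.scale_  # per-column standard deviation (population, ddof=0)

print(learned_mean)
print(learned_scale)
```

For this array each column has a mean of 4, 5 and 6 respectively, and every column has the same standard deviation, the square root of 6.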
Use fit_transform():
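<code class="language-python">from sklearn.preprocessing import StandardScaler

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Step 1: create the scaler
scaler = StandardScaler()

# Step 2: compute the parameters and apply them in one call
scaled_data = scaler.fit_transform(data)
# scaled_data contains the standardized data</code>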
We can see that fit_transform() merges the fit and transform steps into a single call, eliminating an extra step while producing the same result.
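The equivalence of the two approaches can be checked directly. The following is a small sketch, assuming the same data array as above; the names two_step, one_step and same are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# fit() returns the scaler itself, so the two calls can be chained
two_step = StandardScaler().fit(data).transform(data)

# fit_transform() does both steps in one call
one_step = StandardScaler().fit_transform(data)

# Compare the two results element by element, within floating-point tolerance
same = np.allclose(two_step, one_step)
print(same)
```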
Which one to choose depends on your use case. If you need to compute the parameters once and then apply the same transformation to several data sets (such as a training set and a test set), call fit() on the training data and then transform() on each set, so that the test set is scaled with the training statistics. If you only need to transform a single data set, fit_transform() keeps the preprocessing code shorter.
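The training and test set scenario might look like the sketch below. The train and test arrays are made-up values for illustration only; the point is that the scaler is fitted on the training data alone and then reused for the test data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: three training rows and one unseen test row
train = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]])
test = np.array([[10.0, 11.0, 12.0]])

scaler = StandardScaler()

# Learn mean and standard deviation from the training set and scale it
train_scaled = scaler.fit_transform(train)

# Reuse the training statistics for the test set; do not call fit() again
test_scaled = scaler.transform(test)

print(train_scaled)
print(test_scaled)
```

Calling fit_transform() on the test set instead would recompute the statistics from the test data, so the two sets would no longer be on the same scale.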