After the previous two sections, we can start to talk about the source of power for machine learning: gradient descent.
Gradient descent is not a very complicated mathematical tool, and it has a history of more than 200 years. However, people may not have expected that such a relatively simple tool would become the basis of many machine learning algorithms and, together with neural networks, ignite the deep learning revolution.
To obtain the gradient of a multivariate function, take the partial derivative with respect to each of its variables, and then write the resulting partial derivatives together in the form of a vector. That vector is the gradient.
Specifically, suppose a function f(x1, x2) has two independent variables, corresponding to two features in a machine learning data set. Taking the partial derivatives with respect to x1 and x2 gives the gradient vector (∂f/∂x1, ∂f/∂x2)^T, written mathematically as ∇f(x1, x2). So what is the point of calculating the gradient vector? Its geometric meaning is the direction in which the function changes fastest. For a function f at a point x0, the gradient vector ∇f(x0) points in the direction in which the function value increases fastest, so moving along the gradient leads toward a maximum of the function. Conversely, along the opposite direction, -∇f(x0), the function value decreases fastest, so moving that way leads toward a minimum. If the gradient vector at a certain point is 0, every partial derivative there is 0 and the point is a stationary point, such as the lowest point (or a local lowest point) of the function.
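To make the idea of a gradient vector concrete, here is a minimal Python sketch. The function f(x1, x2) = x1^2 + 3*x2^2 is a hypothetical example chosen only for illustration, and the helper names (grad_f, numerical_grad) are likewise assumptions. The sketch compares the hand-derived gradient with a finite-difference estimate, then checks that a small step along the gradient raises the function value while a step against it lowers the value.

```python
def f(x1, x2):
    # Hypothetical example function, chosen only for illustration.
    return x1 ** 2 + 3 * x2 ** 2


def grad_f(x1, x2):
    # Analytic gradient of the example: (df/dx1, df/dx2) = (2*x1, 6*x2).
    return (2 * x1, 6 * x2)


def numerical_grad(func, x1, x2, h=1e-6):
    # Central-difference approximation of each partial derivative.
    d1 = (func(x1 + h, x2) - func(x1 - h, x2)) / (2 * h)
    d2 = (func(x1, x2 + h) - func(x1, x2 - h)) / (2 * h)
    return (d1, d2)


x1, x2 = 1.0, 2.0
g = grad_f(x1, x2)                 # (2.0, 12.0) at the point (1, 2)
g_num = numerical_grad(f, x1, x2)  # should agree closely with g

# Take one small step along the gradient and one against it.
step = 0.01
uphill = f(x1 + step * g[0], x2 + step * g[1])
downhill = f(x1 - step * g[0], x2 - step * g[1])

print("analytic gradient: ", g)
print("numerical gradient:", g_num)
print("f here, uphill, downhill:", f(x1, x2), uphill, downhill)
```

The finite-difference check is a common way to catch mistakes in a hand-derived gradient before using it in an optimizer.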
Walking downhill is a very common metaphor for gradient descent in machine learning. Imagine that you are standing somewhere on a large mountain, looking out over endless terrain, and all you know is that some place in the distance is much lower than where you are. You want to get down the mountain, but you can only do so one step at a time: each time you reach a position, you find the gradient at that position. Then you take a step in the negative direction of the gradient, that is, down the steepest slope, compute the gradient at the new position, and again take a step down the steepest slope from there. You keep walking step by step in this way until you reach the bottom of the mountain, as shown in the picture below.
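The step-by-step walk described above can be sketched in a few lines of Python. This is only an illustrative sketch: the bowl-shaped function f(x1, x2) = x1^2 + 3*x2^2, the starting point, the learning rate (the step size), and the stopping tolerance are all assumptions chosen for the example, not values from the text.

```python
import math


def grad_f(x1, x2):
    # Gradient of the hypothetical bowl f(x1, x2) = x1**2 + 3 * x2**2.
    return (2 * x1, 6 * x2)


def gradient_descent(grad, start, learning_rate=0.1, max_steps=1000, tol=1e-8):
    """Repeatedly step against the gradient until the slope is nearly flat."""
    x1, x2 = start
    for step in range(max_steps):
        g1, g2 = grad(x1, x2)
        # Stop once the gradient is (almost) zero: we are at a flat spot.
        if math.hypot(g1, g2) < tol:
            return (x1, x2), step
        # One step "downhill": move in the negative gradient direction.
        x1 -= learning_rate * g1
        x2 -= learning_rate * g2
    return (x1, x2), max_steps


bottom, steps_taken = gradient_descent(grad_f, start=(4.0, -3.0))
print("reached:", bottom, "after", steps_taken, "steps")
```

The learning rate controls the length of each step. In this sketch a value that is too large would overshoot the valley and diverge, while a value that is too small would need many more steps.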
From the above explanation, it is not difficult to understand why we just mentioned the concavity and convexity of functions. With a non-convex function, gradient descent may not reach the bottom of the mountain but may instead stop in some valley along the way. In other words, gradient descent on a non-convex function does not always find the global optimal solution and may only obtain a local optimal solution. If the function is convex, however, gradient descent can in theory obtain the global optimal solution.
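The following sketch illustrates how the starting point decides which valley gradient descent ends up in. The one-variable function f(x) = x^4 - 3x^2 + x is a hypothetical non-convex example with two valleys, and the learning rate and step count are assumed values picked so the iteration stays stable.

```python
def f(x):
    # Hypothetical non-convex function with two valleys.
    return x ** 4 - 3 * x ** 2 + x


def df(x):
    # Derivative of f.
    return 4 * x ** 3 - 6 * x + 1


def descend(x, learning_rate=0.01, steps=2000):
    # Plain gradient descent in one dimension.
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x


right = descend(2.0)   # starts on the right slope
left = descend(-2.0)   # starts on the left slope

print("from x = 2.0:  x =", right, " f(x) =", f(right))
print("from x = -2.0: x =", left, " f(x) =", f(left))
```

Starting from x = 2.0, the walk settles in the shallower valley near x = 1.13, which is only a local minimum. Starting from x = -2.0, it reaches the deeper valley near x = -1.30, which is the global minimum of this example.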
Gradient descent is very useful in machine learning. Simply put, keep the following points in mind.
The essence of machine learning is to find the optimal function.
How do we measure whether a function is optimal? The method is to minimize the error between the predicted value and the true value (in machine learning, this error is also called the loss).
You can establish a function between the error and model parameters (preferably a convex function).
Gradient descent can guide us to the global lowest point of that convex function, that is, to the parameters with the smallest error.
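The points above can be tied together in one small sketch: a model with parameters, an error function of those parameters, and gradient descent to minimize it. Everything specific here is an assumption made for illustration: the toy data generated from y = 2x + 1, the linear model y = w*x + b, the mean squared error as the loss (which is convex in w and b), and the learning rate and step count.

```python
# Toy data generated from the hypothetical rule y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]


def loss(w, b):
    # Mean squared error between predictions w*x + b and true values y.
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def loss_grad(w, b):
    # Partial derivatives of the loss with respect to w and b.
    n = len(xs)
    dw = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    db = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return dw, db


w, b = 0.0, 0.0          # initial guess for the parameters
learning_rate = 0.05     # assumed step size

for _ in range(2000):
    dw, db = loss_grad(w, b)
    # Move the parameters against the gradient of the loss.
    w -= learning_rate * dw
    b -= learning_rate * db

print("w =", w, "b =", b, "loss =", loss(w, b))
```

Because this loss is convex in w and b, the parameters approach the values used to generate the data (w close to 2, b close to 1) regardless of the starting guess, provided the learning rate is small enough.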