The k-nearest neighbor (KNN) algorithm is an instance-based, or memory-based, machine learning algorithm for classification and regression. Its principle is to classify a given query point by finding the training points nearest to it. Because the algorithm relies entirely on the stored training data, it is a non-parametric learning method.
The k-nearest neighbor algorithm can handle both classification and regression problems: classification predicts discrete labels, while regression predicts continuous values. Before classifying, a distance measure must be defined, and there are several common choices.
Euclidean distance is the most commonly used measure and is suited to real-valued vectors. It measures the straight-line distance between a query point and another point:

Euclidean distance formula: d(x, y) = √( Σᵢ (xᵢ - yᵢ)² )
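As a minimal sketch, the Euclidean distance can be computed directly from the formula above (the function name here is illustrative, not from any particular library):

```python
import math

def euclidean_distance(x, y):
    # Straight-line distance: square root of the sum of squared
    # coordinate differences between the two vectors.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

print(euclidean_distance([0, 0], [3, 4]))  # the classic 3-4-5 triangle
```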
Manhattan distance is also a popular measure; it sums the absolute differences between the coordinates of the two points:

Manhattan distance formula: d(x, y) = Σᵢ |xᵢ - yᵢ|
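The same idea in code, as a small sketch (again with an illustrative function name):

```python
def manhattan_distance(x, y):
    # Sum of absolute coordinate differences ("city block" distance).
    return sum(abs(xi - yi) for xi, yi in zip(x, y))

print(manhattan_distance([1, 2], [4, 6]))  # |1-4| + |2-6| = 7
```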
Minkowski distance is a generalized form of the Euclidean and Manhattan distances, controlled by a parameter p (p = 2 gives the Euclidean distance, p = 1 the Manhattan distance):

Minkowski distance formula: d(x, y) = ( Σᵢ |xᵢ - yᵢ|ᵖ )^(1/p)
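A short sketch showing the generalization: with p = 1 the function reproduces the Manhattan distance, and with p = 2 the Euclidean distance.

```python
def minkowski_distance(x, y, p=2):
    # p-th root of the sum of p-th powers of absolute differences.
    return sum(abs(xi - yi) ** p for xi, yi in zip(x, y)) ** (1 / p)

print(minkowski_distance([0, 0], [3, 4], p=2))  # Euclidean case
print(minkowski_distance([0, 0], [3, 4], p=1))  # Manhattan case
```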
Hamming distance is often used with Boolean or string vectors; it counts the positions at which the two vectors do not match, and is therefore also called the overlap metric:

Hamming distance formula: d(x, y) = number of indices i where xᵢ ≠ yᵢ
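A sketch of the Hamming distance for two equal-length sequences, counting mismatched positions:

```python
def hamming_distance(x, y):
    # Count positions where the two equal-length sequences differ.
    return sum(1 for xi, yi in zip(x, y) if xi != yi)

print(hamming_distance("karolin", "kathrin"))  # differs at 3 positions
```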
To determine which data points are closest to a given query point, the distance between the query point and every stored data point must be calculated. These distance measures shape the decision boundaries that divide the feature space into regions, and the query point is classified by the region it falls into.
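Putting the pieces together, here is a minimal sketch of KNN classification: compute the distance from the query to every stored training point, take the k nearest, and vote on their labels. The function and variable names are illustrative; `math.dist` (Python 3.8+) computes the Euclidean distance, but any of the measures above could be substituted.

```python
import math
from collections import Counter

def knn_classify(query, points, labels, k=3):
    # Distance from the query to every stored training point.
    dists = [(math.dist(query, p), label) for p, label in zip(points, labels)]
    # Sort by distance and keep the k nearest neighbors.
    dists.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in dists[:k]]
    # Majority vote among the k nearest labels.
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["a", "a", "a", "b", "b", "b"]
print(knn_classify((0.5, 0.5), points, labels, k=3))  # near the "a" cluster
print(knn_classify((5.5, 5.5), points, labels, k=3))  # near the "b" cluster
```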