In machine learning applications, a similarity measure is an indicator of how alike two sample objects are. Similarity is usually expressed through a distance measure, and choosing an effective distance measure can improve the performance of machine learning models.
Numerically, however, similarity measures and distance measures run in opposite directions.
A similarity measure is usually expressed as a numerical value: the higher the value, the more similar the data samples are. It is generally scaled to a number between 0 and 1, where 0 indicates low similarity (the data objects are not similar) and 1 indicates high similarity (the data objects are very similar).
A distance measure works the other way around: the larger the distance value, the less similar the data objects are.
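To make the opposite directions concrete, here is a minimal sketch of one common way to turn a distance into a similarity score between 0 and 1. The formula 1 / (1 + d) and the function name distance_to_similarity are illustrative assumptions, not something the article prescribes; other conversions are possible.

```python
def distance_to_similarity(distance):
    """Map a non-negative distance to a similarity in (0, 1].

    Assumption: uses 1 / (1 + d), one of several possible conversions.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + distance)


# A distance of 0 gives the highest similarity; larger distances give lower scores.
print(distance_to_similarity(0))    # 1.0
print(distance_to_similarity(1))    # 0.5
print(distance_to_similarity(100))  # close to 0
```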
Euclidean Distance
Euclidean distance, also known as the Euclidean metric, is the straight-line distance between two points, that is, the shortest distance between them. Many machine learning algorithms use it to measure the similarity of observations.
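As an illustration, Euclidean distance can be sketched in a few lines of plain Python. The function name euclidean_distance and the sample points are chosen here for the example only.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points of equal dimension."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    # Square root of the sum of squared coordinate differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0, 0], [3, 4]))  # 5.0
```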
Manhattan Distance
Manhattan distance is the sum of the absolute differences between two points across all dimensions. The name comes from city travel: it is almost impossible to move in a straight line in a city, because buildings are arranged in a grid that blocks straight paths, so the distance between two places is measured along the city blocks.
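A minimal sketch of Manhattan distance in plain Python follows. The function name manhattan_distance and the sample points are illustrative assumptions.

```python
def manhattan_distance(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return sum(abs(x - y) for x, y in zip(a, b))


# Walking 3 blocks east and 4 blocks north covers 7 blocks in total.
print(manhattan_distance([0, 0], [3, 4]))  # 7
```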
Minkowski Distance
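Minkowski distance is the generalized form of Euclidean distance and Manhattan distance and defines the distance between two observations. Its order parameter p selects the special case: p = 1 gives Manhattan distance and p = 2 gives Euclidean distance.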
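The generalization can be sketched as follows: the Minkowski distance of order p raises each absolute coordinate difference to the power p, sums them, and takes the p-th root. The function name minkowski_distance and the parameter name p are illustrative choices for this example.

```python
def minkowski_distance(a, b, p):
    """Minkowski distance of order p (p >= 1) between two points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    if p < 1:
        raise ValueError("p must be at least 1")
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


print(minkowski_distance([0, 0], [3, 4], p=1))  # 7.0, same as Manhattan
print(minkowski_distance([0, 0], [3, 4], p=2))  # 5.0, same as Euclidean
```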
Hamming Distance
The Hamming distance measures the similarity of two strings of the same length: it is the number of positions at which the corresponding characters differ.
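A short sketch of Hamming distance for equal-length strings is given below. The function name hamming_distance and the sample strings are chosen for illustration.

```python
def hamming_distance(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have the same length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))


# The strings differ at the third, fourth, and fifth characters.
print(hamming_distance("karolin", "kathrin"))  # 3
```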
Cosine Distance (Cosine Similarity)
This indicator is widely used in text mining, natural language processing, and information retrieval systems to measure the similarity between two given documents.
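As a rough sketch, cosine similarity is the dot product of two vectors divided by the product of their lengths, and cosine distance is commonly taken as 1 minus that value. The function names and the small term-count vectors below are hypothetical and only stand in for real document vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    """One common convention: distance = 1 - similarity."""
    return 1.0 - cosine_similarity(a, b)


# Hypothetical term counts for two documents over the same vocabulary.
doc_a = [2, 1, 0, 3]
doc_b = [4, 2, 0, 6]  # same direction as doc_a, so similarity is about 1
print(cosine_similarity(doc_a, doc_b))
print(cosine_distance([1, 0], [0, 1]))  # 1.0, orthogonal vectors
```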
Chebyshev Distance
The Chebyshev distance between two n-dimensional observations or vectors is equal to the maximum absolute difference between their coordinates. In two dimensions, the Chebyshev distance between two data points is the larger of the absolute differences of their two coordinates.
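A minimal sketch of Chebyshev distance follows. Note that it takes the maximum of the absolute coordinate differences, not their sum. The function name chebyshev_distance and the sample points are illustrative.

```python
def chebyshev_distance(a, b):
    """Largest absolute coordinate difference between two points."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of dimensions")
    return max(abs(x - y) for x, y in zip(a, b))


# The differences are 3 and 4, so the distance is the larger one.
print(chebyshev_distance([0, 0], [3, 4]))  # 4
```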
Mahalanobis Distance
Mahalanobis distance is mainly used in multivariate statistical testing to measure the distance between a data point and a distribution.
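The following sketch computes the Mahalanobis distance between a point and a distribution described by its mean vector and inverse covariance matrix. To stay dependency-free it assumes the caller already has the inverse covariance matrix as a list of rows; the function and parameter names are hypothetical.

```python
import math


def mahalanobis_distance(x, mean, inv_cov):
    """sqrt((x - mean)^T * inv_cov * (x - mean)).

    Assumption: inv_cov is the already-inverted covariance matrix,
    given as a square list of rows matching the dimension of x.
    """
    diff = [xi - mi for xi, mi in zip(x, mean)]
    # Matrix-vector product: inv_cov times diff.
    weighted = [sum(row[j] * diff[j] for j in range(len(diff))) for row in inv_cov]
    return math.sqrt(sum(d * w for d, w in zip(diff, weighted)))


# With an identity covariance the result equals the Euclidean distance.
identity = [[1.0, 0.0], [0.0, 1.0]]
print(mahalanobis_distance([3, 4], [0, 0], identity))  # 5.0

# A variance of 4 along the first axis (inverse 0.25) shrinks distances there.
inv_cov = [[0.25, 0.0], [0.0, 1.0]]
print(mahalanobis_distance([2, 0], [0, 0], inv_cov))  # 1.0
```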
Chi-square Distance
Chi-square distance is often used in computer vision for texture analysis, where it measures the similarity between normalized histograms, a task known as "histogram matching".
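Below is a sketch of one common form of the chi-square distance between two histograms with the same number of bins. Definitions vary (some omit the factor 1/2), so the exact formula here is an assumption, as are the function name and the sample histograms.

```python
def chi_square_distance(h1, h2):
    """0.5 * sum((a - b)^2 / (a + b)) over bins, skipping empty bin pairs.

    Assumption: both histograms are non-negative and ideally normalized.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # a bin that is empty in both histograms contributes nothing
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


# Identical histograms have distance 0; different ones have a positive distance.
print(chi_square_distance([0.2, 0.3, 0.5], [0.2, 0.3, 0.5]))  # 0.0
print(chi_square_distance([0.5, 0.5], [1.0, 0.0]))
```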
Pearson Correlation
The Pearson correlation coefficient quantifies the strength of the linear relationship between two attributes, that is, how closely the two data sets fall on a straight line.
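The Pearson coefficient can be sketched as the covariance of the two variables divided by the product of their standard deviations. The function name pearson_correlation and the sample data are illustrative.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of products of deviations; the 1/n factors cancel in the ratio.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Points on a rising line give a value near 1, on a falling line near -1.
print(pearson_correlation([1, 2, 3], [2, 4, 6]))
print(pearson_correlation([1, 2, 3], [6, 4, 2]))
```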
Spearman Correlation Coefficient
The Spearman correlation coefficient is a non-parametric indicator that measures the dependence between two variables. It evaluates how well the relationship between two statistical variables can be described by a monotonic function. The Spearman correlation coefficient is often used in hypothesis testing.
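One way to sketch the Spearman coefficient is to replace each value by its rank (averaging the ranks of tied values) and then compute the Pearson correlation of the ranks. The helper names average_ranks and spearman_correlation are hypothetical, and the sample data is chosen to show a monotonic but non-linear relationship.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_correlation(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# y = x squared is not linear, but it is monotonic, so the ranks agree fully.
print(spearman_correlation([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))
```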