How to Calculate Cosine Similarity Between Sentences in Python Without External Libraries?

DDD
Release: 2024-10-30 07:48:28

Calculating Cosine Similarity Between Sentence Strings

Given two sentences as strings, we want to compute their cosine similarity using only the Python standard library. The implementation below does this in three steps.

Cosine similarity is the cosine of the angle between two vectors, which here represent sentences as word counts. Because word counts are never negative, the value ranges from 0 (no words in common) to 1 (the same words in the same proportions), so a higher value indicates more similar sentences.

Step 1: Tokenization and Vectorization

To calculate cosine similarity, we must first convert each sentence into a vector. We use a simple word-based tokenizer that splits a sentence into words and counts how often each word occurs. Matching is case-sensitive, so "This" and "this" are counted as different words:

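<code class="python">import re
from collections import Counter

# Matches runs of letters, digits, and underscores; punctuation is dropped.
WORD = re.compile(r"\w+")

def text_to_vector(text):
    """Return a Counter mapping each word in text to its number of occurrences."""
    words = WORD.findall(text)
    return Counter(words)</code>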
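
To make the vectorization step concrete, here is a small sketch of what the tokenizer produces for one sentence. The sample sentence and the `text_to_vector_ci` helper are illustrative assumptions, not part of the original code; the helper shows one possible way to ignore case, if that is what you want.

```python
import re
from collections import Counter

WORD = re.compile(r"\w+")

def text_to_vector(text):
    # Same tokenizer as above, repeated so this snippet runs on its own.
    return Counter(WORD.findall(text))

def text_to_vector_ci(text):
    # Hypothetical case-insensitive variant: lowercase before tokenizing.
    return Counter(WORD.findall(text.lower()))

sample = "The cat saw the other cat."

vec = text_to_vector(sample)
print(vec)      # counts: The=1, cat=2, saw=1, the=1, other=1

vec_ci = text_to_vector_ci(sample)
print(vec_ci)   # counts: the=2, cat=2, saw=1, other=1
```

Whether to fold case depends on the data; for short sentences it usually makes the comparison less sensitive to capitalization at the start of a sentence.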

Step 2: Calculating Cosine Similarity

The cosine similarity formula is:

cosine = (Numerator) / (Denominator)

where:

  • Numerator is the dot product of the two vectors.
  • Denominator is the product of the magnitudes of the two vectors.
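
Written out in full, the formula can be sketched as follows. The symbols are assumptions introduced here for illustration: a and b are the two word-count vectors, and a_w and b_w are the counts of word w in each sentence (zero if the word is absent).

```latex
\[
\cos\theta
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{w} a_w\, b_w}{\sqrt{\sum_{w} a_w^{2}}\;\sqrt{\sum_{w} b_w^{2}}}
\]
```

Only words that appear in both sentences contribute to the sum in the numerator, which is why the code needs to look only at the intersection of the two vocabularies.
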
<code class="python">import math

def get_cosine(vec1, vec2):
    # Only words present in both vectors contribute to the dot product.
    intersection = set(vec1.keys()) & set(vec2.keys())
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Squared magnitude of each vector.
    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # An empty vector has zero magnitude; return 0.0 instead of dividing by zero.
    if not denominator:
        return 0.0
    return numerator / denominator</code>
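
A quick way to check a function like this is to try the boundary cases. The sketch below uses a compact restatement of `get_cosine` so that it runs on its own; the test vectors `same`, `other`, and `empty` are made up for illustration.

```python
import math
from collections import Counter

def get_cosine(vec1, vec2):
    # Compact restatement of the function above, for a self-contained snippet.
    shared = vec1.keys() & vec2.keys()
    numerator = sum(vec1[w] * vec2[w] for w in shared)
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    denominator = norm1 * norm2
    return numerator / denominator if denominator else 0.0

same = Counter("a b b".split())   # a=1, b=2
other = Counter("x y".split())    # no words shared with `same`
empty = Counter()                 # vector for an empty string

# Identical vectors: 1.0 up to floating-point rounding, so compare with isclose.
print(math.isclose(get_cosine(same, same), 1.0))  # True
# No shared words: the dot product is zero.
print(get_cosine(same, other))                    # 0.0
# Empty input: the zero-denominator guard returns 0.0.
print(get_cosine(same, empty))                    # 0.0
```

Comparing against 1.0 with `math.isclose` rather than `==` is deliberate, since multiplying two square roots can be off by one unit in the last place.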

Step 3: Example Usage

Using the above functions, we can calculate the cosine similarity between two sentences:

<code class="python">text1 = "This is a foo bar sentence."
text2 = "This sentence is similar to a foo bar sentence."

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)

cosine = get_cosine(vector1, vector2)

print("Cosine:", cosine)</code>

The printed value is approximately 0.86, which is high because the second sentence contains every word of the first.
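
To see where the result comes from, the arithmetic for these two sentences can be worked through by hand. In the sketch below, the word counts are written out explicitly rather than produced by the tokenizer; the variable names are illustrative.

```python
import math
from collections import Counter

# Word counts for the two example sentences, written out by hand.
v1 = Counter({"This": 1, "is": 1, "a": 1, "foo": 1, "bar": 1, "sentence": 1})
v2 = Counter({"This": 1, "sentence": 2, "is": 1, "similar": 1,
              "to": 1, "a": 1, "foo": 1, "bar": 1})

# Dot product over shared words: five words contribute 1*1, "sentence" contributes 1*2.
dot = sum(v1[w] * v2[w] for w in v1.keys() & v2.keys())   # 7

# Squared magnitudes: six ones, and seven ones plus 2**2.
sq1 = sum(c * c for c in v1.values())                     # 6
sq2 = sum(c * c for c in v2.values())                     # 11

cosine = dot / (math.sqrt(sq1) * math.sqrt(sq2))          # 7 / sqrt(66)
print(round(cosine, 4))                                   # 0.8616
```

The words "similar" and "to" appear only in the second sentence, so they enlarge its magnitude without adding to the dot product, which is what keeps the score below 1.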
