How to Calculate Cosine Similarity of Two Text Strings in Pure Python?

Susan Sarandon
Release: 2024-10-30 08:05:02

How to Calculate Cosine Similarity of Two Text Strings without External Libraries

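In text analysis, cosine similarity measures how similar two texts are by treating each text as a vector of word counts and taking the cosine of the angle between the two vectors, that is, their dot product divided by the product of their lengths. Because word counts are never negative, the result lies between 0 (no shared words) and 1 (identical relative word frequencies). External libraries can calculate this measure, but a simple pure-Python implementation is also possible: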

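<code class="python">import math
import re
from collections import Counter

# Matches runs of word characters (letters, digits, underscore).
WORD = re.compile(r"\w+")

def get_cosine(vec1, vec2):
    """Return the cosine similarity of two word-count mappings."""
    # Only words present in both texts contribute to the dot product.
    intersection = set(vec1) & set(vec2)
    numerator = sum(vec1[x] * vec2[x] for x in intersection)

    # Sums of squared counts; their square roots are the vector lengths.
    sum1 = sum(count ** 2 for count in vec1.values())
    sum2 = sum(count ** 2 for count in vec2.values())
    denominator = math.sqrt(sum1) * math.sqrt(sum2)

    # A text with no words has length zero; return 0.0 instead of dividing by zero.
    if not denominator:
        return 0.0
    return numerator / denominator

def text_to_vector(text):
    """Count how often each word occurs in text (case-sensitive)."""
    words = WORD.findall(text)
    return Counter(words)</code>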

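The get_cosine function takes two word-count vectors vec1 and vec2 (here, Counter objects) and returns their cosine similarity. The text_to_vector function builds such a vector from a string by counting its words; matching is case-sensitive, and the \w+ pattern discards punctuation. Here's how to use them to compare two text strings text1 and text2: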

<code class="python">text1 = "This is a foo bar sentence ."
text2 = "This sentence is similar to a foo bar sentence ."

vector1 = text_to_vector(text1)
vector2 = text_to_vector(text2)

cosine = get_cosine(vector1, vector2)

print("Cosine:", cosine)</code>

Output (shown to 12 decimal places; Python 3 prints the full float representation):

Cosine: 0.861640436855

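On a scale from 0 (no words in common) to 1 (identical relative word frequencies), a value of about 0.86 indicates that the two text strings share most of their vocabulary and are highly similar.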
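
To see where that number comes from, the following sketch recomputes the intermediate values for the same two strings and then checks a few boundary cases. It is only an illustration: cosine_breakdown is a hypothetical helper introduced here, not part of the original code, and it repeats the word-counting logic so that the block runs on its own.

```python
import math
import re
from collections import Counter

WORD = re.compile(r"\w+")

def text_to_vector(text):
    # Same word counting as in the article: case-sensitive, punctuation dropped.
    return Counter(WORD.findall(text))

def cosine_breakdown(text1, text2):
    """Hypothetical helper: return (dot product, squared length 1,
    squared length 2, cosine similarity) for two strings."""
    vec1, vec2 = text_to_vector(text1), text_to_vector(text2)
    dot = sum(vec1[w] * vec2[w] for w in set(vec1) & set(vec2))
    sq1 = sum(c * c for c in vec1.values())
    sq2 = sum(c * c for c in vec2.values())
    denominator = math.sqrt(sq1) * math.sqrt(sq2)
    cosine = dot / denominator if denominator else 0.0
    return dot, sq1, sq2, cosine

dot, sq1, sq2, cosine = cosine_breakdown(
    "This is a foo bar sentence .",
    "This sentence is similar to a foo bar sentence .",
)
print(dot, sq1, sq2)     # 7 6 11, so the cosine is 7 / sqrt(6 * 11)
print(round(cosine, 4))  # 0.8616

# Boundary cases: identical texts, no shared words, and an empty text.
print(round(cosine_breakdown("foo bar", "foo bar")[3], 4))  # 1.0
print(cosine_breakdown("foo bar", "baz qux")[3])            # 0.0
print(cosine_breakdown("", "foo")[3])                       # 0.0
```

The word "sentence" appears twice in the second string, which is why the dot product is 7 rather than 6 and why the second squared length is 11. The identical-text result is rounded before printing because floating-point square roots can leave it a hair below 1.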

source:php.cn