Home php教程 php手册 【代码】PHP 分析函数similar

【代码】PHP 分析函数similar

Jun 06, 2016 pm 07:44 PM
php code function analyze calculate

PHP 有个计算两个字符串相度的函数similar_text(),可以得出一个百分比来表示两个字符串的相程度。效果如下: 1similar_text('aaaa', 'aaaa', $percent);2var_dump($percent);3//float(100)4similar_text('aaaa', 'aaaabbbb', $percent);5var_dump($percent)

PHP有个计算两个字符串相似度的函数similar_text(),可以得出一个百分比来表示两个字符串的相似程度。效果如下:


1
similar_text('aaaa', 'aaaa', $percent);
2
var_dump($percent);
3
//float(100)
4
similar_text('aaaa', 'aaaabbbb', $percent);
5
var_dump($percent);
6
//float(66.666666666667)
7
similar_text('abcdef', 'aabcdefg', $percent);
8
var_dump($percent);
9
//float(85.714285714286)
Copy after login


利用这个函数,可以用来做模糊搜索的功能,或者其他需要模糊匹配的功能。最近我在验证码识别研究中的特征匹配一步上涉及到了这个函数。


但这个函数具体使用了怎样的算法呢?我研究了他的底层实现,总结为三步:


(1)找出两个字符串中相同部分最长的一段;
(2)再用同样的方法在剩下的两段中分别找出相同部分最长的一段,以此类推,直到没有任何相同部分;
(3)相似度 = 所有相同部分的长度之和 * 2 / 两个字符串的长度之和;


我研究的源代码版本是PHP 5.4.6,相关的代码位于文件php-5.4.6/ext/standard/string.c的第2951~3031行。以下是我加过注释后源代码。

01
//找出两个字符串中相同部分最长的一段
02
static void php_similar_str(const char *txt1, int len1, const char *txt2, int len2, int *pos1, int *pos2, int *max)
03
{
04
    char *p, *q;
05
    char *end1 = (char *) txt1 + len1;
06
    char *end2 = (char *) txt2 + len2;
07
    int l;
08
 
09
    *max = 0;
10
    //以第一个字符串为基准开始遍历
11
    for (p = (char *) txt1; p  *max) {
18
                *max = l;
19
                *pos1 = p - txt1;
20
                *pos2 = q - txt2;
21
            }
22
        }
23
    }
24
}
25
 
26
//计算两个字符串的相同部分的总长度
27
static int php_similar_char(const char *txt1, int len1, const char *txt2, int len2)
28
{
29
    int sum;
30
    int pos1, pos2, max;
31
 
32
    //找出两个字符串相同部分最长的一段
33
    php_similar_str(txt1, len1, txt2, len2, &pos1, &pos2, &max);
34
    //这里是对sum的初始赋值,也是对max值的判断
35
    //如果max为零,表示两个字符串没有任何相同的字符,也就会跳出if
36
    if ((sum = max)) {
37
        //对前半段递归,相同段长度累加
38
        if (pos1 && pos2) {
39
            sum += php_similar_char(txt1, pos1,
40
                                    txt2, pos2);
41
        }
42
        //对后半段递归,相同段长度累加
43
        if ((pos1 + max  2) {
68
        convert_to_double_ex(percent);
69
    }
70
 
71
    //如果两个字符串长度都为0,返回0
72
    if (t1_len + t2_len == 0) {
73
        if (ac > 2) {
74
            Z_DVAL_PP(percent) = 0;
75
        }
76
 
77
        RETURN_LONG(0);
78
    }
79
 
80
    //调用上面的函数,计算两个字符串的相似度
81
    sim = php_similar_char(t1, t1_len, t2, t2_len);
82
 
83
    //可以看到percent的计算公式
84
    if (ac > 2) {
85
        Z_DVAL_PP(percent) = sim * 200.0 / (t1_len + t2_len);
86
    }
87
 
88
    RETURN_LONG(sim);
89
}
Copy after login


另外,PHP还提供了另外一个计算字符串相似度的函数levenshtein(),通过计算两个字符串的编辑距离来表示字符串相似度,这也是一种很常见的算法。levenshtein()的性能相比similar_text()要好一些,因为通过前面的代码分析可以看到,similar_text()的复杂度是O(n^3),n表示最长字符串的长度,而levenshtein()的复杂度为O(m*n),m与n分别为两个字符串的长度。


以上是本文关于PHP 分析函数similar_text()的原理,希望本文对广大php开发者有所帮助,感谢阅读本文。更多有关php技术问题欢迎加群探讨:304224365 ,验证码:csl,不写验证不予通过。

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. How to Fix Audio if You Can't Hear Anyone
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Chat Commands and How to Use Them
1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian PHP 8.4 Installation and Upgrade guide for Ubuntu and Debian Dec 24, 2024 pm 04:42 PM

PHP 8.4 brings several new features, security improvements, and performance improvements with healthy amounts of feature deprecations and removals. This guide explains how to install PHP 8.4 or upgrade to PHP 8.4 on Ubuntu, Debian, or their derivati

How To Set Up Visual Studio Code (VS Code) for PHP Development How To Set Up Visual Studio Code (VS Code) for PHP Development Dec 20, 2024 am 11:31 AM

Visual Studio Code, also known as VS Code, is a free source code editor — or integrated development environment (IDE) — available for all major operating systems. With a large collection of extensions for many programming languages, VS Code can be c

7 PHP Functions I Regret I Didn't Know Before 7 PHP Functions I Regret I Didn't Know Before Nov 13, 2024 am 09:42 AM

If you are an experienced PHP developer, you might have the feeling that you’ve been there and done that already.You have developed a significant number of applications, debugged millions of lines of code, and tweaked a bunch of scripts to achieve op

How do you parse and process HTML/XML in PHP? How do you parse and process HTML/XML in PHP? Feb 07, 2025 am 11:57 AM

This tutorial demonstrates how to efficiently process XML documents using PHP. XML (eXtensible Markup Language) is a versatile text-based markup language designed for both human readability and machine parsing. It's commonly used for data storage an

Explain JSON Web Tokens (JWT) and their use case in PHP APIs. Explain JSON Web Tokens (JWT) and their use case in PHP APIs. Apr 05, 2025 am 12:04 AM

JWT is an open standard based on JSON, used to securely transmit information between parties, mainly for identity authentication and information exchange. 1. JWT consists of three parts: Header, Payload and Signature. 2. The working principle of JWT includes three steps: generating JWT, verifying JWT and parsing Payload. 3. When using JWT for authentication in PHP, JWT can be generated and verified, and user role and permission information can be included in advanced usage. 4. Common errors include signature verification failure, token expiration, and payload oversized. Debugging skills include using debugging tools and logging. 5. Performance optimization and best practices include using appropriate signature algorithms, setting validity periods reasonably,

PHP Program to Count Vowels in a String PHP Program to Count Vowels in a String Feb 07, 2025 pm 12:12 PM

A string is a sequence of characters, including letters, numbers, and symbols. This tutorial will learn how to calculate the number of vowels in a given string in PHP using different methods. The vowels in English are a, e, i, o, u, and they can be uppercase or lowercase. What is a vowel? Vowels are alphabetic characters that represent a specific pronunciation. There are five vowels in English, including uppercase and lowercase: a, e, i, o, u Example 1 Input: String = "Tutorialspoint" Output: 6 explain The vowels in the string "Tutorialspoint" are u, o, i, a, o, i. There are 6 yuan in total

Explain late static binding in PHP (static::). Explain late static binding in PHP (static::). Apr 03, 2025 am 12:04 AM

Static binding (static::) implements late static binding (LSB) in PHP, allowing calling classes to be referenced in static contexts rather than defining classes. 1) The parsing process is performed at runtime, 2) Look up the call class in the inheritance relationship, 3) It may bring performance overhead.

What are PHP magic methods (__construct, __destruct, __call, __get, __set, etc.) and provide use cases? What are PHP magic methods (__construct, __destruct, __call, __get, __set, etc.) and provide use cases? Apr 03, 2025 am 12:03 AM

What are the magic methods of PHP? PHP's magic methods include: 1.\_\_construct, used to initialize objects; 2.\_\_destruct, used to clean up resources; 3.\_\_call, handle non-existent method calls; 4.\_\_get, implement dynamic attribute access; 5.\_\_set, implement dynamic attribute settings. These methods are automatically called in certain situations, improving code flexibility and efficiency.

See all articles