목차
PHP改进计算字符串相似度的函数similar_text()、levenshtein(),levenshtein
(100分)[php]写几个你熟悉的字符串处理函数!
对于php的levenshtein函数可以否给个通俗易懂的解释,手册看不懂
php教程 php手册 PHP改进计算字符串相似度的函数similar_text()、levenshtein(),levenshtein

PHP改进计算字符串相似度的函数similar_text()、levenshtein(),levenshtein

Jun 13, 2016 am 09:22 AM

PHP改进计算字符串相似度的函数similar_text()、levenshtein(),levenshtein

similar_text()中文汉字版

复制代码 代码如下:


          //拆分字符串 
     function split_str($str) { 
       preg_match_all("/./u", $str, $arr); 
       return $arr[0]; 
     } 
      
     //相似度检测 
     function similar_text_cn($str1, $str2) { 
       $arr_1 = array_unique(split_str($str1)); 
       $arr_2 = array_unique(split_str($str2)); 
       $similarity = count($arr_2) - count(array_diff($arr_2, $arr_1)); 
        
       return $similarity; 
     }  

levenshtein()中文汉字版
 

复制代码 代码如下:


          //拆分字符串 
     function mbStringToArray($string, $encoding = 'UTF-8') { 
         $arrayResult = array(); 
         while ($iLen = mb_strlen($string, $encoding)) { 
             array_push($arrayResult, mb_substr($string, 0, 1, $encoding)); 
             $string = mb_substr($string, 1, $iLen, $encoding); 
         } 
         return $arrayResult; 
     } 
     //编辑距离 
     function levenshtein_cn($str1, $str2, $costReplace = 1, $encoding = 'UTF-8') { 
         $count_same_letter = 0; 
         $d = array(); 
         $mb_len1 = mb_strlen($str1, $encoding); 
         $mb_len2 = mb_strlen($str2, $encoding); 
         $mb_str1 = mbStringToArray($str1, $encoding); 
         $mb_str2 = mbStringToArray($str2, $encoding); 
         for ($i1 = 0; $i1              $d[$i1] = array(); 
             $d[$i1][0] = $i1; 
         } 
         for ($i2 = 0; $i2              $d[0][$i2] = $i2; 
         } 
         for ($i1 = 1; $i1              for ($i2 = 1; $i2                  // $cost = ($str1[$i1 - 1] == $str2[$i2 - 1]) ? 0 : 1; 
                 if ($mb_str1[$i1 - 1] === $mb_str2[$i2 - 1]) { 
                     $cost = 0; 
                     $count_same_letter++; 
                 } else { 
                     $cost = $costReplace; //替换 
                 } 
                 $d[$i1][$i2] = min($d[$i1 - 1][$i2] + 1, //插入 
                 $d[$i1][$i2 - 1] + 1, //删除 
                 $d[$i1 - 1][$i2 - 1] + $cost); 
             } 
         } 
         return $d[$mb_len1][$mb_len2]; 
         //return array('distance' => $d[$mb_len1][$mb_len2], 'count_same_letter' => $count_same_letter); 
     }  


 
最长公共子序列LCS()

 

复制代码 代码如下:


                  //最长公共子序列英文版 
         function LCS_en($str_1, $str_2) { 
           $len_1 = strlen($str_1); 
           $len_2 = strlen($str_2); 
           $len = $len_1 > $len_2 ? $len_1 : $len_2; 
           $dp = array(); 
           for ($i = 0; $i              $dp[$i] = array(); 
             $dp[$i][0] = 0; 
             $dp[0][$i] = 0; 
           } 
           for ($i = 1; $i              for ($j = 1; $j                if ($str_1[$i - 1] == $str_2[$j - 1]) { 
                 $dp[$i][$j] = $dp[$i - 1][$j - 1] + 1; 
               } else { 
                 $dp[$i][$j] = $dp[$i - 1][$j] > $dp[$i][$j - 1] ? $dp[$i - 1][$j] : $dp[$i][$j - 1]; 
               } 
             } 
           } 
           return $dp[$len_1][$len_2]; 
         } 
         //拆分字符串 
         function mbStringToArray($string, $encoding = 'UTF-8') { 
           $arrayResult = array(); 
           while ($iLen = mb_strlen($string, $encoding)) { 
             array_push($arrayResult, mb_substr($string, 0, 1, $encoding)); 
             $string = mb_substr($string, 1, $iLen, $encoding); 
           } 
           return $arrayResult; 
         } 
         //最长公共子序列中文版 
         function LCS_cn($str1, $str2, $encoding = 'UTF-8') { 
           $mb_len1 = mb_strlen($str1, $encoding); 
           $mb_len2 = mb_strlen($str2, $encoding); 
           $mb_str1 = mbStringToArray($str1, $encoding); 
           $mb_str2 = mbStringToArray($str2, $encoding); 
           $len = $mb_len1 > $mb_len2 ? $mb_len1 : $mb_len2; 
           $dp = array(); 
           for ($i = 0; $i              $dp[$i] = array(); 
             $dp[$i][0] = 0; 
             $dp[0][$i] = 0; 
           } 
           for ($i = 1; $i              for ($j = 1; $j                if ($mb_str1[$i - 1] == $mb_str2[$j - 1]) { 
                 $dp[$i][$j] = $dp[$i - 1][$j - 1] + 1; 
               } else { 
                 $dp[$i][$j] = $dp[$i - 1][$j] > $dp[$i][$j - 1] ? $dp[$i - 1][$j] : $dp[$i][$j - 1]; 
               } 
             } 
           } 
           return $dp[$mb_len1][$mb_len2]; 
         }

(100分)[php]写几个你熟悉的字符串处理函数!

addcslashes addslashes bin2hex chop chr chunk_split convert_cyr_string cyrillic
convert_uudecode convert_uuencode count_chars crc32 crc32 crypt echo explode

fprintf get_html_translation_table hebrev

hebrevc
hex2bin — Decodes a hexadecimally encoded binary string
html_entity_decode — Convert all HTML entities to their applicable characters
htmlentities — Convert all applicable characters to HTML entities
htmlspecialchars_decode — Convert special HTML entities back to characters
htmlspecialchars — Convert special characters to HTML entities
implode — Join array elements with a string
join

lcfirst — Make a string's first character lowercase
levenshtein — Calculate Levenshtein distance between two strings
localeconv — Get numeric formatting information
ltrim — Strip whitespace (or other characters) from the beginning of a string
md5_file
metaphone — Calculate the metaphone key of a string
money_format — Formats a number as a currency string
nl_langinfo — Query language and locale information
nl2br

number_format — Format a number with grouped thousands
ord

parse_str

print

printf

quoted_printable_decode — Convert a quoted-printable string to an 8 bit string
quoted_printable_encode — Convert a 8 bit string to a quoted-printable string
quotemeta — Quote meta characters
rtrim
setlocale — Set locale information
sha1_file

sha1

soundex — Calculate the soundex key of a string
sprintf — Return a formatted string
sscanf — Parses input from a string according to a ......余下全文>>
 

对于php的levenshtein函数可以否给个通俗易懂的解释,手册看不懂

W3School的解释:
levenshtein() 函数返回两个字符串之间的 Levenshtein 距离。
Levenshtein 距离,又称编辑距离,指的是两个字符串之间,由一个转换成另一个所需的最少编辑操作次数。许可的编辑操作包括将一个字符替换成另一个字符,插入一个字符,删除一个字符。
例如把 kitten 转换为 sitting:
sitten (k→s)
sittin (e→i)
sitting (→g)
levenshtein() 函数给每个操作(替换、插入和删除)相同的权重。不过,您可以通过设置可选的 insert、replace、delete 参数,来定义每个操作的代价。

说明:“代价”就是权重。楼主的例子中,Hello World→ello World,需要“删除”“H”,也就是采用了第五个参数,对应的权重是30,所以返回30.
 

본 웹사이트의 성명
본 글의 내용은 네티즌들의 자발적인 기여로 작성되었으며, 저작권은 원저작자에게 있습니다. 본 사이트는 이에 상응하는 법적 책임을 지지 않습니다. 표절이나 침해가 의심되는 콘텐츠를 발견한 경우 admin@php.cn으로 문의하세요.

핫 AI 도구

Undresser.AI Undress

Undresser.AI Undress

사실적인 누드 사진을 만들기 위한 AI 기반 앱

AI Clothes Remover

AI Clothes Remover

사진에서 옷을 제거하는 온라인 AI 도구입니다.

Undress AI Tool

Undress AI Tool

무료로 이미지를 벗다

Clothoff.io

Clothoff.io

AI 옷 제거제

AI Hentai Generator

AI Hentai Generator

AI Hentai를 무료로 생성하십시오.

뜨거운 도구

메모장++7.3.1

메모장++7.3.1

사용하기 쉬운 무료 코드 편집기

SublimeText3 중국어 버전

SublimeText3 중국어 버전

중국어 버전, 사용하기 매우 쉽습니다.

스튜디오 13.0.1 보내기

스튜디오 13.0.1 보내기

강력한 PHP 통합 개발 환경

드림위버 CS6

드림위버 CS6

시각적 웹 개발 도구

SublimeText3 Mac 버전

SublimeText3 Mac 버전

신 수준의 코드 편집 소프트웨어(SublimeText3)