


PHP similar_text(), levenshtein() and LCS() versions that support Chinese characters
PHP's native similar_text() and levenshtein() functions compare strings byte by byte, so they do not handle multibyte Chinese characters well. I wrote my own versions.
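A quick way to see the problem is to compare byte counts with character counts. This small check assumes the file is saved as UTF-8 and the mbstring extension is loaded; in UTF-8 each of these Chinese characters takes 3 bytes, so changing one character looks like three byte edits to the native functions.

```php
<?php
$a = '中国';
$b = '中文';

echo strlen($a), PHP_EOL;               // 6: length in bytes
echo mb_strlen($a, 'UTF-8'), PHP_EOL;   // 2: length in characters
echo levenshtein($a, $b), PHP_EOL;      // 3: one changed character = three changed bytes
echo similar_text($a, $b), PHP_EOL;     // 3: counts the three shared bytes of "中"
```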
similar_text() Chinese character version
```php
<?php
// Split a string into an array of UTF-8 characters
function split_str($str) {
    // "u": treat pattern and subject as UTF-8; "s": let "." match newlines too
    preg_match_all("/./us", $str, $arr);
    return $arr[0];
}

// Similarity check: number of distinct characters of $str2 that also occur in $str1
function similar_text_cn($str1, $str2) {
    $arr_1 = array_unique(split_str($str1));
    $arr_2 = array_unique(split_str($str2));
    $similarity = count($arr_2) - count(array_diff($arr_2, $arr_1));

    return $similarity;
}
```

Note that this is not the same measure as the native similar_text(): it ignores character order and repetition, and only counts the distinct characters of $str2 that also appear somewhere in $str1.
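If you want a multibyte function that measures the same thing as the native similar_text(), one option is to run the same kind of algorithm on character arrays instead of bytes: take the longest common run of characters, then recurse on the parts to its left and to its right, and add up the lengths. The sketch below assumes PHP 7.1+ and valid UTF-8 input. The names utf8_chars(), similar_chars() and similar_text_mb() are hypothetical, and since this is my reading of how the native function works, tie-breaking between equally long runs may not match it in every case.

```php
<?php
// Hypothetical helper: split a UTF-8 string into an array of characters.
function utf8_chars(string $s): array {
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Hypothetical: count shared characters by taking the longest common run,
// then recursing on the pieces to its left and right.
function similar_chars(array $a, array $b): int {
    $max = 0; $posA = 0; $posB = 0;
    $la = count($a); $lb = count($b);
    for ($i = 0; $i < $la; $i++) {
        for ($j = 0; $j < $lb; $j++) {
            $k = 0;
            while ($i + $k < $la && $j + $k < $lb && $a[$i + $k] === $b[$j + $k]) {
                $k++;
            }
            if ($k > $max) { $max = $k; $posA = $i; $posB = $j; }
        }
    }
    if ($max === 0) {
        return 0;
    }
    return $max
        + similar_chars(array_slice($a, 0, $posA), array_slice($b, 0, $posB))
        + similar_chars(array_slice($a, $posA + $max), array_slice($b, $posB + $max));
}

// Hypothetical wrapper with an optional percentage, computed as
// matched * 2 * 100 / (length1 + length2), in characters.
function similar_text_mb(string $s1, string $s2, ?float &$percent = null): int {
    $a = utf8_chars($s1);
    $b = utf8_chars($s2);
    $sim = similar_chars($a, $b);
    $total = count($a) + count($b);
    $percent = $total > 0 ? (float) ($sim * 2 * 100 / $total) : 0.0;
    return $sim;
}
```

For example, similar_text_mb('我爱北京', '我爱上海', $p) returns 2 and sets $p to 50.0, because only the two characters of "我爱" are shared.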
levenshtein() Chinese character version
```php
<?php
// Split a string into an array of characters
function mbStringToArray($string, $encoding = 'UTF-8') {
    $arrayResult = array();

    while ($iLen = mb_strlen($string, $encoding)) {
        array_push($arrayResult, mb_substr($string, 0, 1, $encoding));
        $string = mb_substr($string, 1, $iLen, $encoding);
    }

    return $arrayResult;
}

// Edit distance, counted in characters rather than bytes
function levenshtein_cn($str1, $str2, $costReplace = 1, $encoding = 'UTF-8') {
    $count_same_letter = 0;
    $d = array();

    $mb_len1 = mb_strlen($str1, $encoding);
    $mb_len2 = mb_strlen($str2, $encoding);

    $mb_str1 = mbStringToArray($str1, $encoding);
    $mb_str2 = mbStringToArray($str2, $encoding);

    // First column: turning the first $i1 characters of $str1 into "" takes $i1 deletions
    for ($i1 = 0; $i1 <= $mb_len1; $i1++) {
        $d[$i1] = array();
        $d[$i1][0] = $i1;
    }

    // First row: turning "" into the first $i2 characters of $str2 takes $i2 insertions
    for ($i2 = 0; $i2 <= $mb_len2; $i2++) {
        $d[0][$i2] = $i2;
    }

    for ($i1 = 1; $i1 <= $mb_len1; $i1++) {
        for ($i2 = 1; $i2 <= $mb_len2; $i2++) {
            // Byte-based version: $cost = ($str1[$i1 - 1] == $str2[$i2 - 1]) ? 0 : 1;
            if ($mb_str1[$i1 - 1] === $mb_str2[$i2 - 1]) {
                $cost = 0;
                $count_same_letter++;
            } else {
                $cost = $costReplace; // replacement
            }

            $d[$i1][$i2] = min($d[$i1 - 1][$i2] + 1,          // deletion
                               $d[$i1][$i2 - 1] + 1,          // insertion
                               $d[$i1 - 1][$i2 - 1] + $cost); // match or replacement
        }
    }

    return $d[$mb_len1][$mb_len2];
    // return array('distance' => $d[$mb_len1][$mb_len2], 'count_same_letter' => $count_same_letter);
}
```
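levenshtein_cn() keeps the whole (m+1) x (n+1) table in memory, although each row only depends on the previous one. A possible alternative, sketched below under the assumption of PHP 7+ and valid UTF-8 input, keeps just two rows and takes separate insertion, replacement and deletion costs in the same order as the native levenshtein(). The name levenshtein_mb() is hypothetical. It also splits the strings with preg_split('//u', ...), which avoids the repeated mb_substr() calls of mbStringToArray().

```php
<?php
// Hypothetical two-row edit distance on UTF-8 characters.
// Costs follow the argument order of the native levenshtein(): insert, replace, delete.
function levenshtein_mb(string $s1, string $s2, int $costIns = 1, int $costRep = 1, int $costDel = 1): int {
    $a = preg_split('//u', $s1, -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('//u', $s2, -1, PREG_SPLIT_NO_EMPTY);
    $m = count($a);
    $n = count($b);

    // $prev[$j]: cost of turning the first $i - 1 characters of $a into the first $j of $b
    $prev = [];
    for ($j = 0; $j <= $n; $j++) {
        $prev[$j] = $j * $costIns;
    }

    for ($i = 1; $i <= $m; $i++) {
        $cur = [$i * $costDel]; // turn the first $i characters of $a into ""
        for ($j = 1; $j <= $n; $j++) {
            $rep = $prev[$j - 1] + ($a[$i - 1] === $b[$j - 1] ? 0 : $costRep);
            $ins = $cur[$j - 1] + $costIns;
            $del = $prev[$j] + $costDel;
            $cur[$j] = min($rep, $ins, $del);
        }
        $prev = $cur;
    }

    return $prev[$n];
}
```

With this version, levenshtein_mb('中国', '中文') returns 1, while the native levenshtein() returns 3 for the same pair.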
Longest common subsequence LCS()
```php
<?php
// Longest common subsequence, English (byte-based) version
function LCS_en($str_1, $str_2) {
    $len_1 = strlen($str_1);
    $len_2 = strlen($str_2);
    $len = $len_1 > $len_2 ? $len_1 : $len_2;

    // Row 0 and column 0 are all zeros
    $dp = array();
    for ($i = 0; $i <= $len; $i++) {
        $dp[$i] = array();
        $dp[$i][0] = 0;
        $dp[0][$i] = 0;
    }

    for ($i = 1; $i <= $len_1; $i++) {
        for ($j = 1; $j <= $len_2; $j++) {
            if ($str_1[$i - 1] === $str_2[$j - 1]) {
                $dp[$i][$j] = $dp[$i - 1][$j - 1] + 1;
            } else {
                $dp[$i][$j] = $dp[$i - 1][$j] > $dp[$i][$j - 1] ? $dp[$i - 1][$j] : $dp[$i][$j - 1];
            }
        }
    }

    return $dp[$len_1][$len_2];
}

// Split a string into an array of characters
function mbStringToArray($string, $encoding = 'UTF-8') {
    $arrayResult = array();

    while ($iLen = mb_strlen($string, $encoding)) {
        array_push($arrayResult, mb_substr($string, 0, 1, $encoding));
        $string = mb_substr($string, 1, $iLen, $encoding);
    }

    return $arrayResult;
}

// Longest common subsequence, Chinese (multibyte) version
function LCS_cn($str1, $str2, $encoding = 'UTF-8') {
    $mb_len1 = mb_strlen($str1, $encoding);
    $mb_len2 = mb_strlen($str2, $encoding);

    $mb_str1 = mbStringToArray($str1, $encoding);
    $mb_str2 = mbStringToArray($str2, $encoding);

    $len = $mb_len1 > $mb_len2 ? $mb_len1 : $mb_len2;

    // Row 0 and column 0 are all zeros
    $dp = array();
    for ($i = 0; $i <= $len; $i++) {
        $dp[$i] = array();
        $dp[$i][0] = 0;
        $dp[0][$i] = 0;
    }

    for ($i = 1; $i <= $mb_len1; $i++) {
        for ($j = 1; $j <= $mb_len2; $j++) {
            if ($mb_str1[$i - 1] === $mb_str2[$j - 1]) {
                $dp[$i][$j] = $dp[$i - 1][$j - 1] + 1;
            } else {
                $dp[$i][$j] = $dp[$i - 1][$j] > $dp[$i][$j - 1] ? $dp[$i - 1][$j] : $dp[$i][$j - 1];
            }
        }
    }

    return $dp[$mb_len1][$mb_len2];
}
```

mbStringToArray() here is the same helper used in the levenshtein() section. If you put both snippets in one file, define it only once; declaring it twice makes PHP stop with a "Cannot redeclare" fatal error.
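LCS_cn() returns only the length of the longest common subsequence. If you also need the subsequence itself, one way is to fill the same table and then walk back from the bottom-right cell, collecting a character whenever both strings match there. The function name lcs_string_mb() is hypothetical, the sketch assumes PHP 7+ and valid UTF-8 input, and when several subsequences share the maximum length it returns just one of them.

```php
<?php
// Hypothetical: return one longest common subsequence of two UTF-8 strings.
function lcs_string_mb(string $s1, string $s2): string {
    $a = preg_split('//u', $s1, -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('//u', $s2, -1, PREG_SPLIT_NO_EMPTY);
    $m = count($a);
    $n = count($b);

    // $dp[$i][$j]: LCS length of the first $i characters of $a and the first $j of $b
    $dp = array_fill(0, $m + 1, array_fill(0, $n + 1, 0));
    for ($i = 1; $i <= $m; $i++) {
        for ($j = 1; $j <= $n; $j++) {
            $dp[$i][$j] = $a[$i - 1] === $b[$j - 1]
                ? $dp[$i - 1][$j - 1] + 1
                : max($dp[$i - 1][$j], $dp[$i][$j - 1]);
        }
    }

    // Walk back from $dp[$m][$n], collecting matched characters in reverse order
    $out = [];
    for ($i = $m, $j = $n; $i > 0 && $j > 0; ) {
        if ($a[$i - 1] === $b[$j - 1]) {
            $out[] = $a[$i - 1];
            $i--;
            $j--;
        } elseif ($dp[$i - 1][$j] >= $dp[$i][$j - 1]) {
            $i--;
        } else {
            $j--;
        }
    }

    return implode('', array_reverse($out));
}
```

For example, lcs_string_mb('我爱北京天安门', '我在北京看天安门') returns '我北京天安门'.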

