Table of Contents
手把手教你做关键词匹配项目(搜索引擎)---- 第二十天,教你做第二十天
Home php教程 php手册 手把手教你做关键词匹配项目(搜索引擎)---- 第二十天,教你做第二十天

手把手教你做关键词匹配项目(搜索引擎)---- 第二十天,教你做第二十天

Jun 13, 2016 am 09:25 AM
- Do Key words match Hand in hand search engine teach you project

手把手教你做关键词匹配项目(搜索引擎)---- 第二十天,教你做第二十天

客串:屌丝的坑人表单神器、数据库那点事儿

面向对象升华:面向对象的认识----新生的初识、面向对象的番外----思想的梦游篇(1)、面向对象的认识---如何找出类

负载均衡:负载均衡----概念认识篇、负载均衡----实现配置篇(Nginx)

吐槽:有人反馈了这样的一个信息,说该文章越到最后越难看懂,跟不上节奏,也有的人说小帅帅的能力怎么飙的那么快,是不是我比较蠢。也有的直接看文字,不看代码,代码太难懂了。

其实我这几天也一直在思考这个问题,所以没办法就去开展了一些面向对象的课程,希望对那些跟不上的有些帮助。其实说真的,读者不反馈的话,我只好按照我认为的小帅帅去开展课程了。

 

第二十天

起点:手把手教你做关键词匹配项目(搜索引擎)---- 第一天

回顾:手把手教你做关键词匹配项目(搜索引擎)---- 第十九天

话说小帅帅为了解决那个分词算法写出了初版,他拿给于老大看的时候,被要求重写了。

原因有以下几点:

    1. 如何测试,测试数据呢?

    2. Splitter是不是做了太多事情?

    3. 连衣裙xxl裙连衣裙这种 有重复词组怎么办?

小帅帅拿着这些问题,开始重构。

首先他发现了这点,中文、英文和中英文的判断,以及长度的计算,他把这个写成了类:

<?<span>php

</span><span>class</span><span> UTF8 {

    </span><span>/*</span><span>*
     * 检测是否utf8
     * @param $char
     * @return bool
     </span><span>*/</span>
    <span>public</span> <span>static</span> <span>function</span> is(<span>$char</span><span>){
        </span><span>return</span> (<span>preg_match</span>("/^([".<span>chr</span>(228)."-".<span>chr</span>(233)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}){1}/",<span>$char</span>) ||
            <span>preg_match</span>("/([".<span>chr</span>(228)."-".<span>chr</span>(233)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}){1}$/",<span>$char</span>) ||
            <span>preg_match</span>("/([".<span>chr</span>(228)."-".<span>chr</span>(233)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}[".<span>chr</span>(128)."-".<span>chr</span>(191)."]{1}){2,}/",<span>$char</span><span>));
    }

    </span><span>/*</span><span>*
     * 计算utf8字的个数
     * @param $char
     * @return float|int
     </span><span>*/</span>
    <span>public</span> <span>static</span> <span>function</span> length(<span>$char</span><span>) {

        </span><span>if</span>(self::is(<span>$char</span><span>))
            </span><span>return</span> <span>ceil</span>(<span>strlen</span>(<span>$char</span>)/3<span>);
        </span><span>return</span> <span>strlen</span>(<span>$char</span><span>);
    }

    </span><span>/*</span><span>*
     * 检测是否为词组
     * @param $word
     * @return bool
     </span><span>*/</span>
    <span>public</span> <span>static</span> <span>function</span> isPhrase(<span>$word</span><span>){

        </span><span>if</span>(self::length(<span>$word</span>)<=1<span>)
            </span><span>return</span> <span>false</span><span>;
        </span><span>return</span> <span>true</span><span>;
    }

}</span>
Copy after login

小帅帅又考虑到词典的来源有可能来自多个地方,比如我给的测试数据,这样不就是可以解决于老大说到无法测试的问题了,小帅帅把词典的来源抽成了个类,类如下:

<?<span>php

</span><span>class</span><span> DBSegmentation {

    </span><span>public</span> <span>$cid</span><span>;

    </span><span>/*</span><span>*
     * 获取类目下分词的词组数据
     * @return array
     </span><span>*/</span>
    <span>public</span> <span>function</span><span> transferDictionary(){
        </span><span>$ret</span> = <span>array</span><span>();
        </span><span>$sql</span> = "select word from category_linklist where cid='<span>$this</span>->cid'"<span>;
        </span><span>$words</span> = DB::makeArray(<span>$sql</span><span>);
        </span><span>foreach</span>(<span>$words</span> <span>as</span> <span>$strWords</span><span>){
            </span><span>$words</span> = <span>explode</span>(",",<span>$strWords</span><span>);

            </span><span>foreach</span>(<span>$words</span> <span>as</span> <span>$word</span><span>){
                </span><span>if</span>(UTF8::isPhrase(<span>$word</span><span>)){
                    </span><span>$ret</span>[] = <span>$word</span><span>;
                }
            }
        }
        </span><span>return</span> <span>$ret</span><span>;
    }
} 

</span><span>class</span><span> TestSegmentation {
    
    </span><span>public</span> <span>function</span><span> transferDictionary(){
        </span><span>$words</span> = <span>array</span><span>(
            </span>"连衣裙,连衣",
            "XXL,xxl,加大,加大码",
            "X码,中码",
            "外套,衣,衣服,外衣,上衣",
            "女款,女士,女生,女性"<span>
        );

        </span><span>$ret</span> = <span>array</span><span>();
        </span><span>foreach</span>(<span>$words</span> <span>as</span> <span>$strWords</span><span>){
            </span><span>$words</span> = <span>explode</span>(",",<span>$strWords</span><span>);

            </span><span>foreach</span>(<span>$words</span> <span>as</span> <span>$word</span><span>){
                </span><span>if</span>(UTF8::isPhrase(<span>$word</span><span>)){
                    </span><span>$ret</span>[] = <span>$word</span><span>;
                }
            }
        }
        </span><span>return</span> <span>$ret</span><span>;

    }
}</span>
Copy after login

那么Splitter 就专心分词把,代码如下:

<span>class</span><span> Splitter {

    </span><span>public</span> <span>$keyword</span><span>;
    </span><span>private</span> <span>$dictionary</span> = <span>array</span><span>();

    </span><span>public</span> <span>function</span> setDictionary(<span>$dictionary</span> = <span>array</span><span>()){

        </span><span>usort</span>(<span>$dictionary</span>,<span>function</span>(<span>$a</span>,<span>$b</span><span>){
            </span><span>return</span> (UTF8::length(<span>$a</span>)>UTF8::length(<span>$b</span>))?1:-1<span>;
        });

        </span><span>$this</span>->dictionary = <span>$dictionary</span><span>;
    }

    </span><span>public</span> <span>function</span><span> getDictionary(){
        </span><span>return</span> <span>$this</span>-><span>dictionary;
    }

    </span><span>/*</span><span>*
     * 把关键词拆分成词组或者单词
     * @return KeywordEntity $keywordEntity
     </span><span>*/</span>
    <span>public</span> <span>function</span> <span>split</span><span>(){

        </span><span>$remainKeyword</span> = <span>$this</span>-><span>keyword;

        </span><span>$keywordEntity</span> = <span>new</span> KeywordEntity(<span>$this</span>-><span>keyword);

        </span><span>foreach</span>(<span>$this</span>->dictionary <span>as</span> <span>$phrase</span><span>){

            </span><span>$matchTimes</span> = <span>preg_match_all</span>("/<span>$phrase</span>/",<span>$remainKeyword</span>,<span>$matches</span><span>);
            </span><span>if</span>(<span>$matchTimes</span>>0<span>){
                </span><span>$keywordEntity</span>->addElement(<span>$phrase</span>,<span>$matchTimes</span><span>);

                </span><span>$remainKeyword</span> = <span>str_replace</span>(<span>$phrase</span>,"::",<span>$remainKeyword</span><span>);
            }
        }

        </span><span>$remainKeywords</span> = <span>explode</span>("::",<span>$remainKeyword</span><span>);
        </span><span>foreach</span>(<span>$remainKeywords</span> <span>as</span> <span>$splitWord</span><span>){

            </span><span>if</span>(!<span>empty</span>(<span>$splitWord</span><span>)){
                </span><span>$keywordEntity</span>->addElement(<span>$splitWord</span><span>);
            }
        }

        </span><span>return</span> <span>$keywordEntity</span><span>;

    }

}


</span><span>class</span><span> KeywordEntity {

    </span><span>public</span> <span>$keyword</span><span>;
    </span><span>public</span> <span>$elements</span> = <span>array</span><span>();

    </span><span>public</span> <span>function</span> __construct(<span>$keyword</span><span>){
        </span><span>$this</span>->keyword = <span>$keyword</span><span>;
    }

    </span><span>public</span> <span>function</span> addElement(<span>$word</span>,<span>$times</span>=1<span>){

        </span><span>if</span>(<span>isset</span>(<span>$this</span>->elements[<span>$word</span><span>])){
            </span><span>$this</span>->elements[<span>$word</span>]->times += <span>$times</span><span>;
        }</span><span>else</span>
            <span>$this</span>->elements[] = <span>new</span> KeywordElement(<span>$word</span>,<span>$times</span><span>);
    }

    </span><span>/*</span><span>*
     * @desc 计算UTF8字符串权重
     * @param string $word
     * @return float
     </span><span>*/</span>
    <span>public</span> <span>function</span> calculateWeight(<span>$word</span><span>)
    {
        </span><span>$element</span> = <span>$this</span>->elements[<span>$word</span><span>];
        </span><span>return</span> <span>ROUND</span>(<span>strlen</span>(<span>$element</span>->word)*<span>$element</span>->times / <span>strlen</span>(<span>$this</span>->keyword), 3<span>);
    }
}


</span><span>class</span><span> KeywordElement {
    </span><span>public</span> <span>$word</span><span>;
    </span><span>public</span> <span>$times</span><span>;

    </span><span>public</span> <span>function</span> __construct(<span>$word</span>,<span>$times</span><span>){
        </span><span>$this</span>->word = <span>$word</span><span>;
        </span><span>$this</span>->times = <span>$times</span><span>;
    }
}</span>
Copy after login

他把算权重的也丢给了一个类专门去处理。

小帅帅写完之后,也顺手写了测试实例:

<?<span>php

</span><span>$segmentation</span> = <span>new</span><span> TestSegmentation();

</span><span>$splitter</span> = <span>new</span><span> Splitter();
</span><span>$splitter</span>->setDictionary(<span>$segmentation</span>-><span>transferDictionary());
</span><span>$splitter</span>->keyword = "连衣裙xxl裙连衣裙"<span>;
</span><span>$keywordEntity</span> = <span>$splitter</span>-><span>split</span><span>();

</span><span>var_dump</span>(<span>$keywordEntity</span>);
Copy after login

 

这样就算你的算法怎么改,它也能从容面对了。

 

小帅帅理解了这个,当你觉得类做的事情太多的时候,可以考虑下单一职责原则。

 

单一职责原则:一个类,只有一个引起它变化的原因。应该只有一个职责。每一个职责都是变化的一个轴线,如果一个类有一个以上的职责,这些职责就耦合在了一起。这会导致脆弱的设计。当一个职责发生变化时,可能会影响其它的职责。另外,多个职责耦合在一起,会影响复用性。例如:要实现逻辑和界面的分离。【来自百度百科】

 

当于老大提到是不是有其他分词算法的时候,我们能不能拿来用,小帅帅很高兴,因为现在它的代码是多么美好。

小帅帅如何玩转第三方分词扩展,请继续关注下回分解:手把手教你做关键词匹配项目(搜索引擎)---- 第二十一天

 

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Repo: How To Revive Teammates
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
4 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

How to adjust aperture on Xiaomi Mi 14 Ultra? How to adjust aperture on Xiaomi Mi 14 Ultra? Mar 19, 2024 am 09:01 AM

Adjusting the aperture size has a crucial impact on the photo effect. Xiaomi Mi 14 Ultra provides unprecedented flexibility in camera aperture adjustment. In order to allow everyone to adjust the aperture smoothly and realize the free adjustment of the aperture size, the editor here brings you a detailed tutorial on how to set the aperture on Xiaomi Mi 14Ultra. How to adjust the aperture on Xiaomi Mi 14Ultra? Start the camera, switch to &quot;Professional Mode&quot;, and select the main camera - W lens. Click on the aperture, open the aperture dial, A is automatic, select f/1.9 or f/4.0 as needed.

Can AI conquer Fermat's last theorem? Mathematician gave up 5 years of his career to turn 100 pages of proof into code Can AI conquer Fermat's last theorem? Mathematician gave up 5 years of his career to turn 100 pages of proof into code Apr 09, 2024 pm 03:20 PM

Fermat's last theorem, about to be conquered by AI? And the most meaningful part of the whole thing is that Fermat’s Last Theorem, which AI is about to solve, is precisely to prove that AI is useless. Once upon a time, mathematics belonged to the realm of pure human intelligence; now, this territory is being deciphered and trampled by advanced algorithms. Image Fermat's Last Theorem is a "notorious" puzzle that has puzzled mathematicians for centuries. It was proven in 1993, and now mathematicians have a big plan: to recreate the proof using computers. They hope that any logical errors in this version of the proof can be checked by a computer. Project address: https://github.com/riccardobrasca/flt

How to set Chinese in Cheat Engine? How to set Chinese in ce modifier How to set Chinese in Cheat Engine? How to set Chinese in ce modifier Mar 18, 2024 pm 01:20 PM

Ce Modifier (CheatEngine) is a game modification tool dedicated to modifying and editing game memory. So how to set Chinese in CheatEngine? Next, the editor will tell you how to set Chinese in Ce Modifier. I hope it can Help friends in need. In the new software we download, it can be confusing to find that the interface is not in Chinese. Even though this software was not developed in China, there are ways to convert it to the Chinese version. This problem can be solved by simply applying the Chinese patch. After downloading and installing the CheatEngine (ce modifier) ​​software, open the installation location and find the folder named languages, as shown in the figure below

How to update Honor MagicOS 8.0 on Honor 90 GT? How to update Honor MagicOS 8.0 on Honor 90 GT? Mar 18, 2024 pm 06:46 PM

Honor 90GT is a cost-effective smartphone with excellent performance and excellent user experience. However, sometimes we may encounter some problems, such as how to update Honor MagicOS8.0 on Honor 90GT? This step may be different for different mobile phones and different models. So, let us discuss how to upgrade the system correctly. How to update Honor MagicOS 8.0 on Honor 90GT? According to news on February 28, Honor today pushed the MagicOS8.0 public beta update for its three mobile phones 90GT/100/100Pro. The package version number is 8.0.0.106 (C00E106R3P1) 1. Ensure your Honor The battery of the 90GT is fully charged;

DaVinci Resolve Studio now supports AV1 hardware encoding for AMD graphics cards DaVinci Resolve Studio now supports AV1 hardware encoding for AMD graphics cards Mar 06, 2024 pm 10:04 PM

Recent news, lackMagic has launched the 18.5PublicBeta2 public beta update of the DaVinci Resolve Studio video editing software, bringing AV1 encoding support to AMD Radeon graphics cards. After updating to the latest version, AMD graphics card users will be able to take advantage of hardware acceleration for AV1 encoding in DaVinci Resolve Studio. Although the official does not specify the supported architectures or models, it is expected that all AMD graphics card users can try this feature. In 2018, AOMedia released a new video coding standard AV1 (AOMediaVideoCodec1.0). AV1 is produced by a number of

Teach you how to use the new advanced features of iOS 17.4 'Stolen Device Protection' Teach you how to use the new advanced features of iOS 17.4 'Stolen Device Protection' Mar 10, 2024 pm 04:34 PM

Apple rolled out the iOS 17.4 update on Tuesday, bringing a slew of new features and fixes to iPhones. The update includes new emojis, and EU users will also be able to download them from other app stores. In addition, the update also strengthens the control of iPhone security and introduces more "Stolen Device Protection" setting options to provide users with more choices and protection. "iOS17.3 introduces the "Stolen Device Protection" function for the first time, adding extra security to users' sensitive information. When the user is away from home and other familiar places, this function requires the user to enter biometric information for the first time, and after one hour You must enter information again to access and change certain data, such as changing your Apple ID password or turning off stolen device protection.

Planet Mojo: Building a Web3 game metaverse from the auto-chess game Mojo Melee Planet Mojo: Building a Web3 game metaverse from the auto-chess game Mojo Melee Mar 14, 2024 pm 05:55 PM

Popular Metaverse game projects founded in the last crypto cycle are accelerating their expansion. On March 4, PlanetMojo, the Web3 game metaverse platform, announced a number of important developments in its game ecology, including the announcement of the upcoming parkour game GoGoMojo, the launch of the new season "Way of War" in the flagship auto-chess game MojoMelee, and the celebration of the new The first ETH series "WarBannerNFT" launched this season in cooperation with MagicEden. In addition, PlanetMojo also revealed that they plan to launch Android and iOS mobile versions of MojoMelee later this year. This project will be launched at the end of 2021. After nearly two years of hard work in the bear market, it will soon be completed.

Simplify file upload processing with Golang functions Simplify file upload processing with Golang functions May 02, 2024 pm 06:45 PM

Answer: Yes, Golang provides functions that simplify file upload processing. Details: The MultipartFile type provides access to file metadata and content. The FormFile function gets a specific file from the form request. The ParseForm and ParseMultipartForm functions are used to parse form data and multipart form data. Using these functions simplifies the file processing process and allows developers to focus on business logic.

See all articles