PHP simple Chinese word segmentation system (1/2)

WBOY
Release: 2016-07-20 11:08:30

A simple Chinese word segmentation system in PHP. Structure: a first-character hash table whose entries are trie index tree nodes. Advantage: during segmentation there is no need to know the length of the word being looked up in advance; the text is matched character by character along the tree chain. Disadvantages: the tree is complicated to build and maintain, words branch heavily, and a certain amount of space is wasted.
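To make the structure concrete, here is a minimal sketch of how a two-word dictionary might be laid out as nested arrays, using the same `'children'` and `'isword'` keys as the listing in this article. The toy dictionary (`ab`, `ac`) and the helper name `longest_prefix_word` are illustrative assumptions, not part of the original code. The walk shows why the word length never has to be known up front: the loop simply follows the chain until no child matches.

```php
<?php
// Hypothetical toy dictionary holding "ab" and "ac" (single-byte keys for readability).
// The outer 'children' array plays the role of the first-character hash table.
$trie = array(
    'isword'   => false,
    'children' => array(
        'a' => array(
            'isword'   => false,
            'children' => array(
                'b' => array('isword' => true, 'children' => array()),
                'c' => array('isword' => true, 'children' => array()),
            ),
        ),
    ),
);

// Illustrative helper (not in the original): return the longest dictionary
// word that is a prefix of $text, or '' when there is none.
function longest_prefix_word(array $trie, string $text): string
{
    $node = $trie;
    $current = '';
    $best = '';
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $ch = $text[$i];
        if (!isset($node['children'][$ch])) {
            break; // the chain ends here, so stop walking
        }
        $node = $node['children'][$ch];
        $current .= $ch;
        if ($node['isword']) {
            $best = $current; // remember the longest complete word so far
        }
    }
    return $best;
}

var_dump(longest_prefix_word($trie, 'abc')); // string(2) "ab"
var_dump(longest_prefix_word($trie, 'ad'));  // string(0) ""
```

The space cost mentioned above is visible here too: every node carries its own `'children'` array, even when it is empty.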

<?php
/**
 * Structure: first-character hash table, trie index tree nodes
 * Advantages: during segmentation there is no need to know the length of the query word in advance; it is matched character by character along the tree chain.
 * Disadvantages: construction and maintenance are complicated, words branch heavily, and a certain amount of space is wasted
 * @version 0.1
 * @todo build a general dictionary algorithm and write a simple word segmenter
 * @author shjuto@gmail.com
 * trie dictionary tree
 *
 */

class trie
{
        private $trie;

        function __construct()
        {
                // root node of the dictionary tree
                $this->trie = array('children' => array(),'isword'=>false);
        }

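        /**
         * Add a word to the dictionary
         *
         * @param string $word
         */
        function setword($word = '')
        {
                $trienode = &$this->trie;
                for($i = 0;$i < strlen($word);$i++)
                {
                        $character = $word[$i];
                        // create the child node the first time this character is seen
                        if(!isset($trienode['children'][$character]))
                        {
                                $trienode['children'][$character] = array('children' => array(),'isword' => false);
                        }
                        // mark the node that ends the word, keeping any children it already has
                        if($i == strlen($word)-1)
                        {
                                $trienode['children'][$character]['isword'] = true;
                        }
                        $trienode = &$trienode['children'][$character];
                }
        }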

        /**
         * Determine whether it is a dictionary word
         *
         * @param string $word
         * @return bool true/false
         */
        function isword($word)
        {
                $trienode = &$this->trie;
                for($i = 0;$i < strlen($word);$i++)
                {
                        $character = $word[$i];
                        if(!isset($trienode['children'][$character]))
                        {
                                return false;
                        }
                        else
                        {
                                // check whether the word ends at this character
                                if($i == (strlen($word)-1) && $trienode['children'][$character]['isword'] == true)
                                {
                                        return true;
                                }
                                elseif($i == (strlen($word)-1) && $trienode['children'][$character]['isword'] == false)
                                {
                                        return false;
                                }
                                $trienode = &$trienode['children'][$character];       
                        }
                }
                // an empty $word never enters the loop
                return false;
        }


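        /**
         * Find dictionary words in a text
         *
         * NOTE: the method header is missing from the source; the name search()
         * and the default value of $text are assumptions.
         *
         * @param string $text
         * @return array list of array('position' => $position, 'word' => $word)
         */
        function search($text = "")
        {
                $textlen = strlen($text);
                $trienode = $tree = $this->trie;
                $find = array();
                $wordrootposition = 0; // position where the current candidate word starts
                $prenode = false; // backtracking flag: with "ab" in the dictionary and the text "aab", $i has to step back once
                $word = '';
                for($i = 0;$i < $textlen;$i++)
                {
                        if(isset($trienode['children'][$text[$i]]))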
                        {
                                $word = $word .$text[$i];
                                $trienode = $trienode['children'][$text[$i]];
                                if($prenode == false)
                                {
                                        $wordrootposition = $i;
                                }
                                $prenode = true;
                                if($trienode['isword'])
                                {
                                        $find[] = array('position'=>$wordrootposition,'word' =>$word);
                                }
                        }
                        else
                        {
                                $trienode = $tree;
                                $word = '';
                                if($prenode)
                                {
                                        $i = $i -1;
                                        $prenode = false;
                                }
                        }
                }
                return $find;
        }
}
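The listing indexes strings with `strlen()` and `$word[$i]`, which work on bytes, so for UTF-8 Chinese text the tree is keyed by single bytes and the reported positions are byte offsets. Its one-step backtracking also appears to miss a word that starts inside a failed partial match (for example, dictionary `aab` against text `aaab`). The following is a minimal sketch of an alternative, not the author's method: it keys the trie by whole characters and restarts the walk at every character offset. The function names `utf8_chars`, `trie_insert` and `trie_find_all` are hypothetical, and the sketch assumes valid UTF-8 input and PHP 7.1 or later.

```php
<?php
// Split a UTF-8 string into characters rather than bytes.
// Assumes valid UTF-8; preg_split() returns false on invalid input.
function utf8_chars(string $s): array
{
    return preg_split('//u', $s, -1, PREG_SPLIT_NO_EMPTY);
}

// Insert one word into a nested-array trie keyed by whole characters.
function trie_insert(array &$trie, string $word): void
{
    if ($word === '') {
        return; // never mark the root as a word
    }
    $node = &$trie;
    foreach (utf8_chars($word) as $ch) {
        if (!isset($node['children'][$ch])) {
            $node['children'][$ch] = array('children' => array(), 'isword' => false);
        }
        $node = &$node['children'][$ch];
    }
    $node['isword'] = true; // flag the final node without touching its children
}

// Report every dictionary word in $text by restarting the walk at each
// character offset. Positions are character offsets, not byte offsets.
function trie_find_all(array $trie, string $text): array
{
    $chars = utf8_chars($text);
    $count = count($chars);
    $found = array();
    for ($start = 0; $start < $count; $start++) {
        $node = $trie;
        $word = '';
        for ($i = $start; $i < $count; $i++) {
            if (!isset($node['children'][$chars[$i]])) {
                break; // no longer a prefix of any dictionary word
            }
            $node = $node['children'][$chars[$i]];
            $word .= $chars[$i];
            if ($node['isword']) {
                $found[] = array('position' => $start, 'word' => $word);
            }
        }
    }
    return $found;
}

$trie = array('children' => array(), 'isword' => false);
foreach (array('中文', '分词', '中文分词') as $w) {
    trie_insert($trie, $w);
}
$result = trie_find_all($trie, '简单中文分词系统');
// $result holds 中文 and 中文分词 at position 2, and 分词 at position 4.
```

Restarting at every offset costs more comparisons than a single pass, but it reports overlapping words; a real segmenter would still need a rule (such as longest match) to choose among them.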

source:php.cn