SCWS stands for Simple Chinese Word Segmentation, a simple Chinese word segmentation system.
1. Download the PHP class officially provided by SCWS (the fourth version of pscws, PSCWS4, is used here):
http://www.xunsearch.com/scws/down/pscws4-20081221.tar.bz2
Download the XDB dictionary file (the UTF-8 Simplified Chinese dictionary package is used here):
http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
2. Unzip the SCWS class files and place them in the ThinkPHP/Library/Org/Util directory. I renamed pscws4.class.php to Pscws.class.php and xdb_r.class.php to the uppercase XDB_R.class.php.
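3. Then modify Pscws.class.php.
Add the namespace:
namespace Org\Util;
Change the class name to Pscws.
Delete the following line, since ThinkPHP loads XDB_R through the namespace and the explicit include is no longer needed:
require_once (dirname(__FILE__) . '/XDB_R.class.php');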
Modify XDB_R.class.php
Add the namespace
namespace Org\Util;
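The renames and the namespace matter because ThinkPHP 3.2 locates library classes by turning the namespaced class name into a file path. The sketch below only illustrates that naming convention; class_to_path is a hypothetical helper written for this example, not ThinkPHP's actual loader, and the default library directory is an assumption taken from step 2.

```php
<?php
// Hypothetical helper: mirrors the naming convention only,
// it is not ThinkPHP's real autoloader.
function class_to_path($class, $libDir = 'ThinkPHP/Library/')
{
    // Namespace separators become directory separators,
    // and the ".class.php" suffix is appended.
    return $libDir . str_replace('\\', '/', ltrim($class, '\\')) . '.class.php';
}

echo class_to_path('Org\Util\Pscws'), "\n";  // ThinkPHP/Library/Org/Util/Pscws.class.php
echo class_to_path('\Org\Util\XDB_R'), "\n"; // ThinkPHP/Library/Org/Util/XDB_R.class.php
```

On a case-sensitive file system the file names must match the class names exactly, which is why both files were renamed.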
4. Unzip the XDB dictionary file
Create a new dict folder in the Public/admin directory, then extract dict.utf8.xdb from the XDB dictionary package into it. Also copy rules.utf8.ini, found under etc in the SCWS class package, into the same dict directory.
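5. Add a constant definition to the entry file. It defines the path to the dictionary file and the rule file. Note that ThinkPHP 3.2 itself uses a constant named CONF_PATH for the application configuration directory, so if the framework can no longer find its config files after this change, pick a different constant name here and in the code that uses it.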
define("CONF_PATH", dirname(__FILE__)."/Public/admin/dict/");
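A wrong dictionary path is an easy mistake to make here, so it can help to check the two files before wiring them into the controller. The following is a minimal sketch; missing_dict_files is a hypothetical helper written for this example and is not part of SCWS or ThinkPHP. The directory passed in the usage lines assumes the layout from step 4.

```php
<?php
// Hypothetical helper: returns the paths of required files that are
// missing or unreadable in the given directory.
function missing_dict_files($dir, array $names = array('dict.utf8.xdb', 'rules.utf8.ini'))
{
    $missing = array();
    foreach ($names as $name) {
        // Normalise the trailing slash so both "dict" and "dict/" work.
        $path = rtrim($dir, '/\\') . '/' . $name;
        if (!is_file($path) || !is_readable($path)) {
            $missing[] = $path;
        }
    }
    return $missing;
}

// Usage: check the directory the entry-file constant points to.
$missing = missing_dict_files(__DIR__ . '/Public/admin/dict/');
if ($missing) {
    echo "Missing or unreadable: " . implode(', ', $missing) . "\n";
}
```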
6. Create a private method in the IndexController.class.php controller for other methods to call:
/**
 * Chinese word segmentation
 * @param string $title text to segment
 * @param int    $num   number of words to return; can be omitted
 */
private function get_tags($title, $num = null)
{
    $pscws = new \Org\Util\Pscws('utf8');
    // Load the dictionary and the rule file from the directory defined in the entry file.
    $pscws->set_dict(CONF_PATH . 'dict.utf8.xdb');
    $pscws->set_rule(CONF_PATH . 'rules.utf8.ini');
    $pscws->set_ignore(true);
    $pscws->send_text($title);
    $words = $pscws->get_tops($num);
    $pscws->close();

    // Keep only the word of each result entry.
    $tags = array();
    foreach ($words as $val) {
        $tags[] = $val['word'];
    }
    return implode(',', $tags);
}

/**
 * Product search results page
 */
public function search()
{
    $rzt = $this->get_tags("新款 牛漆皮小尖头直跟高跟单鞋910033 灰羊猄(7.31发货) 39");
    print_r($rzt);
}
The displayed result is:
漆皮,单鞋,尖头,高跟,新款,发货,910033,7.31,39
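The result above still contains purely numeric tokens (910033, 7.31, 39), which are rarely useful as search keywords. If you want to drop them, one possible post-processing step is sketched below. drop_numeric_tags is a hypothetical helper for this example, not part of SCWS, and whether numbers should be removed depends on your data.

```php
<?php
// Hypothetical helper: removes purely numeric tokens from a
// comma-separated tag string such as the one get_tags() returns.
function drop_numeric_tags($tags)
{
    $kept = array();
    foreach (explode(',', $tags) as $tag) {
        // Skip empty pieces and anything PHP treats as a number.
        if ($tag !== '' && !is_numeric($tag)) {
            $kept[] = $tag;
        }
    }
    return implode(',', $kept);
}

echo drop_numeric_tags('漆皮,单鞋,尖头,高跟,新款,发货,910033,7.31,39'), "\n";
// prints: 漆皮,单鞋,尖头,高跟,新款,发货
```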
That covers how to use SCWS Chinese word segmentation in ThinkPHP 3.2 to extract keywords. I hope it is helpful.