PHP cannot load SCWS when SCWS or its PHP extension was not installed successfully. The solution is: 1. Download "scws-1.2.1.tar.bz2"; 2. Build and install it with "make install"; 3. Install the SCWS PHP extension; 4. Install the dictionary.
Operating environment for this article: Windows 7, PHP 5.4, Dell G3 computer.
What should I do if PHP cannot load SCWS? Installation and usage examples for SCWS, the open source Chinese word segmentation system for PHP
1. Introduction to SCWS
SCWS is the acronym for Simple Chinese Word Segmentation (i.e., a simple Chinese word segmentation system).
It is a mechanical Chinese word segmentation engine based on a word frequency dictionary, and it can split a whole passage of Chinese text into words with reasonable accuracy. The word is the smallest morpheme unit in Chinese, but in writing, words are not separated by spaces as they are in English. Segmenting words accurately and quickly has therefore always been a difficult problem in Chinese text processing.
SCWS is written in pure C and does not depend on any external libraries. It can be embedded in applications directly as a dynamic link library. Supported Chinese encodings include GBK and UTF-8. A PHP extension module is also provided so that word segmentation can be used quickly and easily from PHP.
The segmentation algorithm itself contains little that is new. It uses a word frequency dictionary collected by the author, supplemented by rule-based recognition of proper nouns, personal names, place names, numbers and dates, to achieve basic word segmentation. In small-scale tests the accuracy is between 90% and 95%, which is generally enough for small search engines, keyword extraction and similar uses. The first prototype version was released in late 2005.
SCWS was developed by hightman and released as open source under the BSD license. The source code is hosted on GitHub.
2. SCWS installation
The code is as follows:
# wget -c http://www.xunsearch.com/scws/down/scws-1.2.1.tar.bz2
# tar jxvf scws-1.2.1.tar.bz2
# cd scws-1.2.1
# ./configure --prefix=/usr/local/scws
# make && make install
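Since the root cause described in this article is an unsuccessful install, it can help to confirm that the build above actually produced its files before moving on. The following is a minimal sketch, assuming the /usr/local/scws prefix used above. The function name check_scws_core is made up for illustration, and the file names it looks for (bin/scws and lib/libscws.*) are assumptions about what a default build installs, so adjust them if your layout differs.

```shell
#!/bin/sh
# check_scws_core PREFIX
# Returns 0 if PREFIX looks like a completed SCWS install, 1 otherwise.
# Hypothetical helper: the file names checked are assumptions about a default build.
check_scws_core() {
    prefix="${1:-/usr/local/scws}"
    status=0

    # The command-line tool is expected at PREFIX/bin/scws.
    if [ ! -x "$prefix/bin/scws" ]; then
        echo "missing or not executable: $prefix/bin/scws"
        status=1
    fi

    # The library is expected as PREFIX/lib/libscws.* (.so, .la, ...).
    found_lib=0
    for f in "$prefix"/lib/libscws.*; do
        if [ -e "$f" ]; then
            found_lib=1
        fi
    done
    if [ "$found_lib" -eq 0 ]; then
        echo "no libscws.* found in $prefix/lib"
        status=1
    fi

    return "$status"
}

# Usage: check_scws_core /usr/local/scws && echo "SCWS core looks installed"
```

If the function reports a missing file, rerun ./configure and make && make install and read their output for errors before installing the PHP extension.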
3. SCWS PHP extension installation
The code is as follows:
# cd ./phpext
# phpize
# ./configure --with-php-config=/usr/local/php5410/bin/php-config
# make && make install
# echo "[scws]" >> /usr/local/php5410/etc/php.ini
# echo "extension = scws.so" >> /usr/local/php5410/etc/php.ini
# echo "scws.default.charset = utf-8" >> /usr/local/php5410/etc/php.ini
# echo "scws.default.fpath = /usr/local/scws/etc/" >> /usr/local/php5410/etc/php.ini
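If PHP still does not load scws after these steps, two common suspects are that the extension line was never written, or that it went into a php.ini other than the one PHP actually reads. The sketch below is a rough check, assuming the php.ini path used above. ini_enables_scws is a hypothetical helper name, and its pattern only covers the plain "extension = scws.so" form written by the echo commands.

```shell
#!/bin/sh
# ini_enables_scws INI_FILE
# Returns 0 if INI_FILE contains an uncommented "extension = scws.so" line.
# Hypothetical helper; the default path is the one assumed in this article.
ini_enables_scws() {
    ini="${1:-/usr/local/php5410/etc/php.ini}"
    [ -r "$ini" ] || { echo "cannot read $ini"; return 1; }
    # Lines starting with ';' are comments in php.ini and are not matched.
    grep -Eq '^[[:space:]]*extension[[:space:]]*=[[:space:]]*"?scws\.so"?[[:space:]]*$' "$ini"
}

# Usage (the php binary location is assumed to sit next to php-config):
#   ini_enables_scws /usr/local/php5410/etc/php.ini && echo "extension line present"
#   /usr/local/php5410/bin/php --ini               # shows which php.ini is loaded
#   /usr/local/php5410/bin/php -m | grep -i scws   # prints scws if it loaded
```

If the line is present but php -m does not list scws, compare the php.ini reported by php --ini with the file you edited, and remember to restart the web server or PHP-FPM so the change takes effect.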
4. Dictionary installation
The code is as follows:
# wget http://www.xunsearch.com/scws/down/scws-dict-chs-utf8.tar.bz2
# tar jxvf scws-dict-chs-utf8.tar.bz2 -C /usr/local/scws/etc/
# chown www:www /usr/local/scws/etc/dict.utf8.xdb
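A dictionary that is missing, empty, or unreadable by the account PHP runs under will keep segmentation from working as intended even when the extension itself loads. As a minimal sketch, assuming the dictionary path used above (check_scws_dict is a hypothetical helper name), the file can be checked like this:

```shell
#!/bin/sh
# check_scws_dict DICT_FILE
# Returns 0 if DICT_FILE exists, is non-empty, and is readable by the current user.
# Hypothetical helper; the default path is the one assumed in this article.
check_scws_dict() {
    dict="${1:-/usr/local/scws/etc/dict.utf8.xdb}"
    if [ ! -s "$dict" ]; then
        echo "missing or empty: $dict"
        return 1
    fi
    if [ ! -r "$dict" ]; then
        echo "not readable by $(id -un): $dict"
        return 1
    fi
    echo "ok: $dict"
}

# Usage: check_scws_dict /usr/local/scws/etc/dict.utf8.xdb
```

Readability depends on who runs the check, so run it as the account PHP uses (www in this article) rather than as root.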
5. PHP example code. For details, see the official SCWS API documentation.
The code is as follows:
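<?php
// Instantiate the core segmenter class
$so = scws_new();
// Set the character encoding used for segmentation
$so->set_charset('utf-8');
// Set the dictionary to use (the UTF-8 dictionary here)
$so->set_dict('/usr/local/scws/etc/dict.utf8.xdb');
// Set the rule file to use
$so->set_rule('/usr/local/scws/etc/rules.utf8.ini');
// Strip punctuation before segmenting
$so->set_ignore(true);
// Enable compound splitting, e.g. "中国人" returns three words: "中国", "人" and "中国人"
$so->set_multi(true);
// Automatically combine leftover single characters into two-character words
$so->set_duality(true);
// The text to segment
$so->send_text("欢迎来到火星时代IT开发");
// Fetch the results; to extract high-frequency words, use get_tops() instead
while ($tmp = $so->get_result()) {
    print_r($tmp);
}
$so->close();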
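Description of the returned array. Each entry contains the following fields:
word (string): the word itself
idf (float): inverse document frequency of the word
off (int): position (offset) of the word in the original text
attr (string): part of speech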