
Use PHP extension trie_filter to filter Chinese sensitive words

黄舟
Release: 2017-03-22 14:29:13

1. Install libiconv, which is a dependency of libdatrie

wget http://ftp.gnu.org/pub/gnu/libiconv/libiconv-1.14.tar.gz 
tar zxvf libiconv-1.14.tar.gz 
cd libiconv-1.14
./configure 
make 
make install

2. Install libdatrie (download it from http://linux.thai.net/~thep/datrie/datrie.html#Download)

tar zxf libdatrie-0.2.4.tar.gz   
cd libdatrie-0.2.4  
./configure --prefix=/usr/local   
make   
make install

If compilation fails with the error trietool.c:125: undefined reference to `libiconv'

the solution is to re-run configure so that it links against libiconv: ./configure LDFLAGS=-L/usr/local/lib LIBS=-liconv

3. Install the trie_filter extension

The official trie_filter extension does not handle Chinese well, so I used a version from GitHub that is rewritten from the official extension. It has been tested without any problems.

Download the source package from https://github.com/wulijun/php-ext-trie-filter, then run the following in the source directory:

phpize
./configure --with-php-config=/usr/local/bin/php-config 
make
make install
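
Before editing php.ini it can help to confirm where make install placed the module. The following is a minimal sketch, not part of the original steps: ext_path is a hypothetical helper name, and the default path /usr/local/bin/php-config is taken from the configure line above. It relies only on the php-config --extension-dir option.

```shell
# ext_path is a hypothetical helper, not part of PHP or trie_filter.
# It prints the path where `make install` is expected to place the module,
# based on the extension directory reported by php-config.
ext_path() {
    php_config="${1:-/usr/local/bin/php-config}"
    echo "$("$php_config" --extension-dir)/trie_filter.so"
}

# Only run the check if php-config exists at the path used above.
if [ -x /usr/local/bin/php-config ]; then
    so=$(ext_path)
    if [ -f "$so" ]; then
        echo "found $so"
    else
        echo "missing $so"
    fi
fi
```

If the file is reported as missing, the extension=trie_filter.so line in php.ini will probably not load, because PHP looks for modules in that same directory unless extension_dir is overridden.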

4. Edit php.ini and add the line extension=trie_filter.so to enable the trie_filter extension, then restart PHP.

Check the phpinfo output to confirm that the trie_filter extension is now listed.
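
As a command-line alternative to phpinfo, the loaded modules can be listed with php -m. The sketch below assumes the module is listed under the name trie_filter, and check_ext is a hypothetical helper name. Note that the CLI may read a different php.ini than the web server or PHP-FPM; php --ini shows which file the CLI uses.

```shell
# check_ext is a hypothetical helper, not part of PHP.
# It reports whether the named extension appears in the `php -m` module list.
check_ext() {
    ext="$1"
    if ! command -v php >/dev/null 2>&1; then
        echo "php not found on PATH"
        return 2
    fi
    if php -m | grep -qi "$ext"; then
        echo "$ext is loaded"
    else
        echo "$ext is NOT loaded; check the file reported by: php --ini"
        return 1
    fi
}

# `|| true` keeps a script running under `set -e` when the check fails.
check_ext trie_filter || true
```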

5. Generate the dictionary used for word detection. The command that builds the dictionary is not included in the source package downloaded above. It ships with the official source package, so you need to download that as well:

(https://code.google.com/p/as3chat/downloads/detail?name=trie_filter-2011-03-21.tar.gz)

tar zxf trie_filter-2011.03.21.tar.gz   
cd trie_filter-2011.03.21    
gcc -o dpp dpp.c -ldatrie    # build the dpp command, which compiles dictionaries
./dpp words.txt words.dic    # compile words.txt into the dictionary used by trie_filter; words.txt holds one word per line
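
Because words.txt must hold one word per line, a small cleanup step before running dpp can avoid surprises. This is only a sketch under assumptions the article does not state: that the word list and the text to be checked are both UTF-8, and that blank lines or Windows line endings are unwanted in the dictionary. raw_words.txt is a hypothetical sample input.

```shell
# Hypothetical sample input: CRLF line endings, a blank line, and a duplicate.
printf '%s\r\n' '敏感词' '' 'badword' 'badword' > raw_words.txt

# Strip carriage returns, drop blank lines, and remove duplicates,
# leaving exactly one word per line.
tr -d '\r' < raw_words.txt | sed '/^[[:space:]]*$/d' | LC_ALL=C sort -u > words.txt

# iconv exits non-zero on invalid input, so a UTF-8 to UTF-8 conversion
# doubles as an encoding check.
if iconv -f UTF-8 -t UTF-8 words.txt >/dev/null 2>&1; then
    echo "words.txt is valid UTF-8"
else
    echo "words.txt is not valid UTF-8 (or iconv is unavailable)"
fi

# Show how many words will go into the dictionary.
wc -l < words.txt
```

The cleaned words.txt can then be passed to ./dpp as shown above.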

If generating the dictionary fails with the error ./dpp: error while loading shared libraries: libdatrie.so.1: cannot open shared object file: No such file or directory

the solution is to run

ldconfig

and then run dpp again:

./dpp words.txt words.dic
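
If the error persists after ldconfig, it may be worth checking whether the runtime linker cache actually lists libdatrie. The sketch below uses ldconfig -p, which prints the cache on glibc systems. lib_in_cache is a hypothetical helper name, and the suggested fix assumes the library sits in /usr/local/lib (from --prefix=/usr/local) on a distribution that reads /etc/ld.so.conf.d.

```shell
# lib_in_cache is a hypothetical helper. It succeeds if the runtime linker
# cache (as printed by `ldconfig -p`) lists a library whose name contains $1.
lib_in_cache() {
    ldconfig -p 2>/dev/null | grep -q "$1"
}

if lib_in_cache libdatrie; then
    echo "libdatrie is in the linker cache"
else
    echo "libdatrie is not in the linker cache"
    # Assumption: the library was installed under /usr/local/lib.
    # If that directory is not searched yet, register it as root and refresh:
    #   echo /usr/local/lib > /etc/ld.so.conf.d/usr-local.conf && ldconfig
fi
```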

6. Test:

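<?php
/**
 * trie_filter sensitive-word filtering example
 */

// Load the dictionary. Returns a Trie_Filter resource handle on success, NULL on failure.
$file = trie_filter_load('./words.dic');
var_dump($file);
if (!$file) {
    // Stop here: the search functions need a valid handle.
    exit('Failed to load ./words.dic');
}
// The sample strings stay in Chinese because the dictionary entry is the
// Chinese word '敏感词' ("sensitive word").
$str1 = '今天利用trie_filter做敏感词过滤示例'; // contains '敏感词'
$str2 = '今天利用trie_filter做过滤示例';       // does not contain it
// Check whether the text contains sensitive words defined in the dictionary
// (assuming the sensitive word is set to '敏感词').
$res1 = trie_filter_search_all($file, $str1); // detects all sensitive words in one call
$res2 = trie_filter_search($file, $str2);     // detects only one sensitive word per call
var_dump($res1);
echo "<br/>";
var_dump($res2);
trie_filter_free($file); // finally, don't forget to call free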

PHP 5.3.3 or later is recommended. I used 5.3.3.
