PHP 实现敏感词 / 停止词过滤（附敏感词库） - php 常见问题集锦

Home > List of blog posts > PHP 实现敏感词 / 停止词过滤（附敏感词库）

Blogger Information

Blog 91

fans 0

comment 0

visits 203457

Special Recommendation

More>

Related recommendations

Related Tutorials

Popular Recommendations

Latest courses

The latest ThinkPHP 5.1 world premiere video tutorial (60 days to become a PHP expert online training course)

1421606 times of learning
Collection
PHP introductory tutorial one: Learn PHP in one week

4265857 times of learning
Collection
JAVA Beginner's Video Tutorial

2518144 times of learning
Collection

Latest Downloads

More>

Web Effects

Website Source Code

Website Materials

Front End Template

PHP 实现敏感词 / 停止词过滤（附敏感词库）

何澤小生的博客

Original

3071 people have browsed it

敏感词、文字过滤是一个网站必不可少的功能，如何设计一个好的、高效的过滤算法是非常有必要的。在实现敏感词过滤的算法中，我们必须要减少运算，而 DFA 在 DFA 算法中几乎没有什么计算，有的只是状态的转换。所以想更高效的进行敏感词的过滤，需要使用 DFA 算法。

整理过滤函数代码如下：

/**
 * Notes: [DoFilterWords 过滤字符中敏感词]
 * Author HeZe
 * Date 2021/1/6 14:48
 * @param $list     过滤词一维数组  ['小明', '小红', '大白', '小白', '小黑', 'me', 'you'];
 * @param $string   输入文字       likeyou小白喜欢小黑爱着的大黄
 * @return string   过滤后文字     like**喜欢*爱着的大黄
 */
function DoFilterWords($list, $string, $symbol = '*')
{
    $count = 0;             // 违规词的个数
    $sensitiveWord = '';    // 违规词
    $stringAfter = $string;      // 替换后的内容
    $pattern = "/".implode("|",$list)."/i"; // 定义正则表达式

    if(preg_match_all($pattern, $string, $matches)) { // 匹配到了结果
        $patternList = $matches[0];     // 匹配到的数组
        $count = count($patternList);
        $sensitiveWord = implode(',', $patternList);    // 敏感词数组转字符串
        //把匹配到的数组进行合并，替换使用
        $replaceArray = array_combine($patternList,array_fill(0, count($patternList), $symbol));
        $stringAfter = strtr($string, $replaceArray); //结果替换
    }

    $log = "原句为 [ {$string} ]<br/>";

    if($count==0) {
        $log .= "暂未匹配到敏感词！";
    } else {
        $log .= "匹配到 [ {$count} ]个敏感词：[ {$sensitiveWord} ]<br/>"."替换后为：[ {$stringAfter} ]";
    }

    return $log;
}

使用方法

// 过滤词库
$list = ['小明', '小红', '大白', '小白', '小黑', 'me', 'you'];

// 输入文字
$string = "likeyou小白喜欢小黑爱着的大黄";

// 调用函数
$res = DoFilterWords($list , $string , '*');
echo $res;

// 输出结果
原句为 [ likeyou小白喜欢小黑爱着的大黄 ]
匹配到 [ 3 ]个敏感词：[ you,小白,小黑 ]
替换后为：[ like**喜欢*爱着的大黄 ]

最后附上敏感词、停止词词库：https://gitee.com/zehe/stopwords

转载请注明出处~~~~

Statement of this Website

The copyright of this blog article belongs to the blogger. Please specify the address when reprinting! If there is any infringement or violation of the law, please contact admin@php.cn Report processing!

All comments Speak rationally on civilized internet, please comply with News Comment Service Agreement

0 comments

Author's latest blog post

常用的php开发工具有哪些？

2018-01-21 19:55:29