Replacing Chinese Text in PHP
李子的博客

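<?php

// Declare the response encoding
header('Content-Type: text/html; charset=utf-8');

// Banned words, and a sample sentence that contains one of them (我)
$words = array('我', '你', '他');
$content = "测一测我是不是违禁词";

// Build one regular expression from the banned-word list
$banned = generateRegularExpression($words);

// Check the content for banned words
$res_banned = check_words($banned, $content);

write_html($content, $res_banned);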

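/**
 * Build a regular expression from an array of words.
 *
 * @param array $words
 * @return string
 */
function generateRegularExpression($words)
{
    // Quote each word, escaping the "/" delimiter as well
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $words);
    $regular = implode('|', $quoted);

    // "i" ignores case; "u" treats pattern and subject as UTF-8
    return "/$regular/iu";
}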

/**
 * Escape a single string for use inside a regular expression.
 *
 * @param string $string
 * @return string
 */
function generateRegularExpressionString($string)
{
    return preg_quote($string, '/');
}

/**
 * Check a string for banned words.
 *
 * @param string $banned regular expression built from the banned-word list
 * @param string $string text to check
 * @return array distinct banned words found, or an empty array
 */
function check_words($banned, $string)
{
    $matches = array();

    // Nothing to do for an empty pattern, a failed match, or no match at all
    if (empty($banned) || !preg_match_all($banned, $string, $matches)) {
        return array();
    }

    // Keep each banned word once, comparing case-insensitively
    $found = array();
    foreach ($matches[0] as $word) {
        if ($word === '') {
            // A pattern with no words ("//i") matches empty strings; skip them
            continue;
        }
        $key = mb_strtolower($word, 'UTF-8');
        if (!isset($found[$key])) {
            $found[$key] = $word;
        }
    }

    return array_values($found);
}

     

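/**
 * Print the content and any banned words found to the page.
 *
 * @param string $content
 * @param array  $res_banned
 */
function write_html($content, $res_banned)
{
    // Escape user-supplied text before writing it into HTML
    echo htmlspecialchars($content, ENT_QUOTES, 'UTF-8');
    if ($res_banned) {
        $list = htmlspecialchars(implode('|', $res_banned), ENT_QUOTES, 'UTF-8');
        echo "  <span style='color:red'>Banned words (" . count($res_banned) . "):</span>" . $list;
    }
    echo "<br>";
}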
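
The script above only reports which banned words occur. As a minimal sketch of the replacement step itself, the hypothetical helper below (mask_banned_words is not part of the original script) masks each banned word with one asterisk per character. It assumes UTF-8 input and the mbstring extension.

```php
<?php
// Hypothetical helper for illustration; not part of the original script.
function mask_banned_words(array $words, string $content): string
{
    if ($words === []) {
        return $content;
    }
    // Quote every word, including the "/" delimiter, and join into one pattern
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/' . implode('|', $quoted) . '/iu';

    $masked = preg_replace_callback($pattern, function ($m) {
        // One asterisk per character, not per byte
        return str_repeat('*', mb_strlen($m[0], 'UTF-8'));
    }, $content);

    // preg_replace_callback returns null on error; fall back to the input
    return $masked === null ? $content : $masked;
}

echo mask_banned_words(array('我', '你', '他'), '测一测我是不是违禁词');
// prints: 测一测*是不是违禁词
```

Counting characters with mb_strlen rather than strlen keeps the mask the same visible length as the word it hides.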

1. Matching Chinese characters

$str = "中文";
preg_match_all('/[\x{4e00}-\x{9fa5}]+/u', $str, $match);
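
As a small illustration of what this pattern captures, the sketch below runs it on a made-up mixed string (the sample text is an assumption, not from the original). It also shows the Unicode script property \p{Han} as an alternative that PCRE accepts under the u modifier.

```php
<?php
// Sample string invented for this example
$str = "PHP中文替换 test 示例";

// Each run of consecutive characters in U+4E00..U+9FA5 becomes one match
preg_match_all('/[\x{4e00}-\x{9fa5}]+/u', $str, $match);
print_r($match[0]); // the two runs: 中文替换 and 示例

// Alternative: match by Unicode script instead of a fixed code point range
preg_match_all('/\p{Han}+/u', $str, $han);
print_r($han[0]);   // same two runs for this input
```

The fixed range covers the common CJK Unified Ideographs block, while \p{Han} also includes ideographs outside that block.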

 

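2. Replacing Chinese characters

Add the following to the PHP file that performs the replacement:

mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

This sets up the mbstring functions for multibyte (UTF-8) pattern matching. Details: http://blog.chinaunix.net/uid-20279807-id-1711213.html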
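
A minimal sketch of these two settings in use, assuming the mbstring extension is built with multibyte regex support. The sample string and pattern are made up for illustration.

```php
<?php
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");

// Sample data invented for this example
$str = "模块一 模块二";

// mb_ereg_replace takes the pattern without delimiters
$result = mb_ereg_replace("模块[一二]", "module", $str);

echo $result; // module module
```

With the regex encoding set to UTF-8, the character class [一二] is read as two characters rather than six bytes.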

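3. PHP offers four replacement functions: str_replace, preg_replace, mb_ereg_replace, and ereg_replace (deprecated in PHP 5.3 and removed in PHP 7.0).

   For replacing Chinese text, preg_replace turned out to be the best fit.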

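   str_replace does not support regular expressions, so it cannot restrict a match to a whole token and ends up replacing parts of longer strings. For example, with $str = "模块一 模块一断电"; the call $str = str_replace("模块一", "module1", $str); also turns "模块一断电" into "module1断电".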
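
The difference can be sketched as follows. The whole-token pattern assumes that tokens are separated by whitespace, which holds for this example string but may not hold for other data.

```php
<?php
$str = "模块一 模块一断电";

// str_replace replaces every occurrence, including the prefix of 模块一断电
$naive = str_replace("模块一", "module1", $str);

// Lookarounds require whitespace or a string boundary on both sides
$whole = preg_replace('/(?<!\S)模块一(?!\S)/u', 'module1', $str);

echo $naive, "\n"; // module1 module1断电
echo $whole, "\n"; // module1 模块一断电
```

Chinese text usually has no spaces between words, so a real word list may need a different boundary rule than the whitespace check used here.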

mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] ) accepts arrays for $pattern and $replacement, but when the arrays are large, the search and matching is CPU-intensive.
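
With array arguments, preg_replace applies each pattern to the subject in turn. One possible way to avoid the repeated passes, sketched here under the assumption that all search terms are literal words, is to join them into a single alternation and look up the replacement in a callback. The helper name replace_with_map and the sample map are hypothetical, and whether this is actually faster depends on the data.

```php
<?php
// Hypothetical helper for illustration: one pass over the subject
// instead of one pass per search word.
function replace_with_map(array $map, string $subject): string
{
    if ($map === []) {
        return $subject;
    }
    $words = array_map('strval', array_keys($map));

    // Longest words first, so 模块一断电 wins over its prefix 模块一
    usort($words, function ($a, $b) {
        return strlen($b) - strlen($a);
    });

    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $pattern = '/' . implode('|', $quoted) . '/u';

    $result = preg_replace_callback($pattern, function ($m) use ($map) {
        return $map[$m[0]];
    }, $subject);

    // preg_replace_callback returns null on error; fall back to the input
    return $result === null ? $subject : $result;
}

$map = array('模块一' => 'module1', '模块一断电' => 'module1_off');
echo replace_with_map($map, "模块一 模块一断电"); // module1 module1_off
```

Sorting by length matters because a regex alternation takes the first alternative that matches, not the longest one.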

 mb_ereg_replace supports regular expressions and takes the pattern without the // delimiters, but in practice it failed to match some Chinese text. The cause is not yet known.

