We previously introduced a PHP program that filters certain special characters. This article upgrades that sensitive-word filter so that it still works when users insert spaces or punctuation marks in the middle of a sensitive word.
Wherever users can post content, advertisements and other sensitive words may appear, so a sensitive-word filter is needed to maintain the "purity" of the site.
Filtering mechanism: PHP keyword matching with regular expressions
// $str is the user-submitted data
function wordFilter($str)
{
    /*
    Sensitive word storage options:
    1: store in a TXT file (common approach)
    2: store in a cache (better approach)
    */
    $words = getSensitiveWords();
    foreach ($words as $word)
    {
        $quoted = preg_quote($word, '/'); // escape regex metacharacters in the word
        $preg_letter = '/^[A-Za-z]+$/';
        if (preg_match($preg_letter, $word))
        { // English word: match case-insensitively, and only as a whole word
            // The word must be bounded by a non-letter or by the start/end of the string
            $pattern = '/(^|[^A-Za-z])' . $quoted . '([^A-Za-z]|$)/i';
        }
        else
        { // Chinese word: match anywhere, with optional surrounding whitespace
            $pattern = '/\s*' . $quoted . '\s*/';
        }
        // Assumption: the original listing is cut off here; returning the first matched word is a guess
        if (preg_match($pattern, $str))
        {
            return $word;
        }
    }
    return ''; // no sensitive word found
}
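The function above calls getSensitiveWords(), which the article does not show. As a rough sketch of the TXT-file option, the helper below assumes a plain-text file with one word per line; the default file name and the per-request static cache are illustrative assumptions, not the author's implementation.

```php
<?php
// Hypothetical helper: the article does not show getSensitiveWords().
// Assumes a plain-text file with one sensitive word per line.
function getSensitiveWords(string $path = 'sensitive_words.txt'): array
{
    static $cache = []; // per-request cache, keyed by file path

    if (!isset($cache[$path])) {
        // file() returns false on failure, so fall back to an empty list
        $lines = is_readable($path)
            ? file($path, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES)
            : [];
        // Trim each entry, then drop blank and duplicate words
        $words = array_filter(array_map('trim', $lines ?: []), 'strlen');
        $cache[$path] = array_values(array_unique($words));
    }

    return $cache[$path];
}
```

For the cache option mentioned in the comment, the same function could read from a shared cache first and fall back to the file only on a miss.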
Existing problems:
Plain keyword matching is easy to evade. Users have many ways to get around the filter, such as inserting spaces or punctuation marks in the middle of a sensitive word.
Example:
Sensitive word: bucklebuckle
After the user disguises it:
buckle buckle
buckle, buckle
buckle @ buckle
buckle 1 buckle
In these cases, the regular expression in the code above may fail to match.
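To make the problem concrete, the sketch below runs a simplified whole-word check against a few disguised inputs. The function naiveMatch() and the word "spam" are hypothetical stand-ins chosen for illustration; they are not taken from the article's code.

```php
<?php
// Simplified stand-in for a keyword check: whole-word, case-insensitive.
function naiveMatch(string $word, string $text): bool
{
    $pattern = '/(^|[^A-Za-z])' . preg_quote($word, '/') . '([^A-Za-z]|$)/i';
    return preg_match($pattern, $text) === 1;
}

$word = 'spam'; // hypothetical sensitive word
$inputs = [
    'buy spam now',   // untouched word
    'buy s pam now',  // space inserted
    'buy s,pam now',  // comma inserted
    'buy s@pam now',  // symbol inserted
    'buy s1pam now',  // digit inserted
];

foreach ($inputs as $input) {
    echo $input, ' => ', naiveMatch($word, $input) ? 'blocked' : 'missed', PHP_EOL;
}
```

Only the first input is blocked. Every variant with an inserted character slips past the pattern.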
Solution:
First remove all punctuation marks and certain special characters from the user data, and then check the result for sensitive words.
Code:
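// Strip full-width punctuation first (ASCII punctuation is handled by [[:punct:]]), then decode entities, strip tags, and remove remaining punctuation and whitespace
$flag_arr = array('？', '！', '¥', '（', '）', '：', '‘', '’', '“', '”', '《', '》', '，', '…', '。', '、', 'nbsp', '】', '【', '～');
$content_filter = preg_replace('/\s/', '', preg_replace("/[[:punct:]]/", '', strip_tags(html_entity_decode(str_replace($flag_arr, '', $content), ENT_QUOTES, 'UTF-8'))));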
$content_filter is the cleaned user data. Pass it to wordFilter($content_filter) to run the sensitive-word check.
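Putting the two steps together might look like the sketch below. The names normalizeContent() and containsSensitiveWord() are hypothetical, the punctuation list is an illustrative subset, and the word list is passed in as a parameter. Because the cleaning step removes all whitespace, this sketch uses a plain case-insensitive substring search instead of whole-word boundaries. That is a choice made for this example, not something the article specifies.

```php
<?php
// Hypothetical wrapper around the cleaning step described above.
function normalizeContent(string $content): string
{
    // Full-width punctuation to strip (illustrative subset); ASCII is handled by [[:punct:]]
    $flags = ['？', '！', '（', '）', '：', '，', '。', '、', '《', '》'];
    $text = str_replace($flags, '', $content);
    $text = strip_tags(html_entity_decode($text, ENT_QUOTES, 'UTF-8'));
    $text = preg_replace('/[[:punct:]]/', '', $text);
    return preg_replace('/\s/', '', $text);
}

// Returns the first listed word found in the cleaned text, or '' if none is found.
function containsSensitiveWord(string $content, array $words): string
{
    $clean = normalizeContent($content);
    foreach ($words as $word) {
        // stripos() is a case-insensitive (ASCII) substring search
        if ($word !== '' && stripos($clean, $word) !== false) {
            return $word;
        }
    }
    return '';
}

echo containsSensitiveWord('s p,a@m!', ['spam']), PHP_EOL; // prints "spam"
```

Two limitations are worth noting. Digits are not removed by the cleaning step, so an input such as "s1pam" still gets through. A substring search can also flag harmless words that happen to contain a listed word.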