There are two general approaches to removing invalid tags: 1. Remove them directly in the editor with JavaScript. 2. After the content has been submitted to the back end, strip the invalid tags with server-side code. Below I share one way of doing this in PHP. It may not succeed in every case. I found this function on the official PHP website, so I am posting it here.
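As a baseline for the second approach, here is a minimal sketch of what PHP's built-in strip_tags does on its own. The sample markup is hypothetical, written to resemble content pasted from a word processor. It shows that attributes on allowed tags are left in place, which is why the longer function goes further.

```php
<?php
// Hypothetical snippet resembling markup pasted from a word processor.
$html = '<p class="MsoNormal"><span style="font-size:12pt">Hello <b>world</b></span></p>';

// Keep only <p> and <b>; every other tag is dropped, but its text is kept.
$clean = strip_tags($html, '<p><b>');

// Attributes on the allowed tags survive, so class="MsoNormal" is still there.
echo $clean, PHP_EOL;
```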
The code is as follows:
function ClearHtml($content, $allowtags = '') {
    // Replace MS special characters first, whether they arrive as entities or as literal characters
    $search = array('/&lsquo;|\x{2018}/u', '/&rsquo;|\x{2019}/u', '/&ldquo;|\x{201C}/u', '/&rdquo;|\x{201D}/u', '/&mdash;|\x{2014}/u');
    $replace = array("'", "'", '"', '"', '-');
    $content = preg_replace($search, $replace, $content);
    // Make sure _all_ HTML entities are converted to their plain equivalents: it appears that
    // in some MS headers, some HTML entities are encoded and some aren't
    $content = html_entity_decode($content, ENT_QUOTES, 'UTF-8');
    // Try to strip out any C-style comments first, since these, embedded in HTML comments, seem to
    // prevent strip_tags from removing HTML comments (a combination introduced by MS Word).
    // preg_replace is used because mb_eregi_replace does not accept pattern delimiters.
    if (strpos($content, '/*') !== false) {
        $content = preg_replace('#/\*.*?\*/#s', '', $content);
    }
    // Introduce a space into any arithmetic expressions that could be caught by strip_tags so that
    // they won't be: '<1' becomes '< 1' (note: somewhat application specific)
    $content = preg_replace('/<([0-9]+)/', '< $1', $content);
    $content = strip_tags($content, $allowtags);
    // Eliminate extraneous whitespace from the start and end, and collapse any run of two or more
    // whitespace characters into a single space
    $content = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $content);
    // Strip out inline CSS and simplify style tags
    // (\b keeps "<b" from matching <br> and "<i" from matching <img>)
    $search = array('#<(strong|b)\b[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)\b[^>]*>(.*?)</(em|i)>#isu', '#<u\b[^>]*>(.*?)</u>#isu');
    $replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
    $content = preg_replace($search, $replace, $content);
    // On some of the newer MS Word exports, where you get conditionals of the form 'if gte mso 9', etc.,
    // it appears that whatever is in one of the HTML comments prevents strip_tags from eradicating the
    // HTML comment that contains some MS style definitions. This last bit gets rid of any leftover
    // comments; the lazy .*? keeps text that sits between two separate comments.
    if (preg_match('/<!--/', $content)) {
        $content = preg_replace('/<!--.*?-->/su', '', $content);
    }
    return $content;
}
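The "strip out inline css and simplify style tags" step can be looked at in isolation. The following is a minimal sketch, assuming only bold and italic tags are kept; simplifyInlineTags and the sample string are hypothetical names chosen for illustration and are not part of the function from the PHP website.

```php
<?php
// Hypothetical helper for illustration: keeps a few formatting tags
// and drops whatever attributes they carried.
function simplifyInlineTags(string $html): string
{
    // strip_tags() keeps the allowed tags together with all their attributes.
    $html = strip_tags($html, '<b><strong><i><em>');

    // \b stops "<b" from also matching tags such as <br> or <blockquote>.
    $search = [
        '#<(?:strong|b)\b[^>]*>(.*?)</(?:strong|b)>#isu',
        '#<(?:em|i)\b[^>]*>(.*?)</(?:em|i)>#isu',
    ];
    $replace = ['<b>$1</b>', '<i>$1</i>'];

    return preg_replace($search, $replace, $html);
}

$sample = '<p class="MsoNormal"><strong style="font-family:Arial">Bold</strong> and <em class="x">italic</em></p>';
echo simplifyInlineTags($sample), PHP_EOL;
// prints: <b>Bold</b> and <i>italic</i>
```

Normalising strong to b and em to i is a choice made by the regex step, not by strip_tags; if the semantic tags matter for your output, the replacement strings would need to change accordingly.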
Test results:
The code is as follows:
<?php
$content = '
"Youban Outdoor Travel" - Make traveling a habit!
As you become increasingly busy, do you want to take a vacation for yourself? If you are focused on work, do you still remember the last time you exercised? Youban Outdoor Travel gives you a different travel experience: give your heart freedom and you will see scenery everywhere! ';
echo ClearHtml($content,'
');
/*
Result obtained:
"Youban Outdoor Travel"--make traveling a habit!
As you become increasingly busy, do you want to take a vacation for yourself? If you are focused on work, do you still remember the last time you exercised? Youban Outdoor Travel gives you a different travel experience: give your heart freedom and you will see scenery everywhere!
*/
?>
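The final step of the function, removing leftover HTML comments such as Word's 'if gte mso 9' conditionals, can also be checked on its own. This is a small sketch under the assumption that a lazy match is wanted; stripHtmlComments and the sample input are hypothetical and only serve to show the difference between a lazy and a greedy pattern.

```php
<?php
// Hypothetical helper illustrating the comment-removal step.
function stripHtmlComments(string $html): string
{
    // Lazy .*? stops at the first closing "-->", so text between two
    // separate comments is preserved; the s flag lets . span newlines.
    return preg_replace('/<!--.*?-->/s', '', $html);
}

$sample = "Intro<!--[if gte mso 9]><xml>styles</xml><![endif]-->\nBody<!-- note -->End";

echo stripHtmlComments($sample), PHP_EOL;

// For comparison, a greedy pattern runs from the first "<!--" to the last "-->"
// and therefore also removes "Body", which sits between the two comments.
$greedy = preg_replace('/<!--.*-->/s', '', $sample);
echo $greedy, PHP_EOL;
```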