PHP function to clean html generated by word

WBOY
Release: 2016-07-25 08:57:42
Original
1259 people have browsed it
在使用FCKedior或ueditor时,从word中粘贴过来的内容,会产生很多额外的标签,本文分享一个清理这种多出的html代码的函数,有需要的朋友参考下。

说明: 使用FCKedior或ueditor时,从word中粘贴过来的,会产生好多额外的标签。 本节分享一例代码,用于清除word产生的html代码。 例子:

<?php
/**
* 清理word产生的html
* by bbs.it-home.org
*/
function strip_word_html($text, $allowed_tags = '<b><i><sup><sub><em><strong><u><br>')
{
mb_regex_encoding('UTF-8');
$search = array('/&lsquo;/u', '/&rsquo;/u', '/&ldquo;/u', '/&rdquo;/u', '/&mdash;/u');
$replace = array('\'', '\'', '"', '"', '-');
$text = preg_replace($search, $replace, $text);
$text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');
if(mb_stripos($text, '/*') !== FALSE){
$text = mb_eregi_replace('#/\*.*?\*/#s', '', $text, 'm');
}
$text = preg_replace(array('/<([0-9]+)/'), array('< $1'), $text);
$text = strip_tags($text, $allowed_tags);
$text = preg_replace(array('/^\s\s+/', '/\s\s+$/', '/\s\s+/u'), array('', '', ' '), $text);
$search = array('#<(strong|b)[^>]*>(.*?)</(strong|b)>#isu', '#<(em|i)[^>]*>(.*?)</(em|i)>#isu', '#<u[^>]*>(.*?)</u>#isu');
$replace = array('<b>$2</b>', '<i>$2</i>', '<u>$1</u>');
$text = preg_replace($search, $replace, $text);
$num_matches = preg_match_all("/\<!--/u", $text, $matches);
if($num_matches){
$text = preg_replace('/\<!--(.)*--\>/isu', '', $text);
}
return $text;
}
Copy after login


source:php.cn
Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn
Popular Tutorials
More>
Latest Downloads
More>
Web Effects
Website Source Code
Website Materials
Front End Template