Generate summary
Recently I had to implement a new requirement: the send_article interface needs to extract the Chinese text from an article's HTML and turn it into a summary. I tried several methods, starting with:
<code>// Match Chinese characters in UTF-8 text
function utf8_summary($article) {
    $match = '/^[\x{4e00}-\x{9fa5}]+$/u'; // regular expression matching Chinese characters
    preg_match_all($match, $article, $temp);
    $summary = "";
    foreach ($temp as $key => $value) {
        $sum = implode('', $value);
        $summary = $summary . $sum; // concatenate the Chinese text
    }
    return $summary;
}</code>
The problems were:
1. When consecutive Chinese characters appeared, they could not be extracted.
2. The method only worked when Chinese characters were mixed with other characters.
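One thing worth checking here, whatever the encoding turns out to be, is the `^` and `$` anchors: with both in place, the pattern only matches when the entire input consists of characters in that range, which is rarely true for HTML. The following is a minimal sketch, assuming the input is valid UTF-8; `chinese_runs` is a hypothetical helper name, not something from the original code.

```php
<?php
// Hypothetical helper (not from the original post): collect every run of
// Chinese characters found anywhere in the input.
function chinese_runs(string $article): string
{
    // Same character range as utf8_summary, but without the ^ and $ anchors,
    // so matches may start and end anywhere in the string.
    $pattern = '/[\x{4e00}-\x{9fa5}]+/u';

    // preg_match_all returns 0 when nothing matches and false on error
    // (for example, when the input is not valid UTF-8).
    if (!preg_match_all($pattern, $article, $matches)) {
        return '';
    }

    // $matches[0] holds the full text of each match, in order.
    return implode('', $matches[0]);
}

echo chinese_runs('<p>你好,world。这是摘要</p>'), "\n"; // prints 你好这是摘要
```

Note that the range does not cover full-width punctuation such as `,` and `。`, so those are dropped along with the tags.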
Reason:
My guess is that when the text is pure Chinese it arrives in some other encoding, so the regular expression cannot match, whereas a mix of Chinese and other characters arrives as UTF-8 and does match. The client could in fact wrap the Chinese text in a tag and add a header that specifies the encoding with setchars=utf8, but the client's entity class was already written and changing it would be too much trouble. So I had to find a way on the back end, and I tried a second method:
<code>function url_summary($article) {
    $article = urlencode($article); // percent-encode the article
    $match = "/%[a-zA-Z0-9]{2}/"; // match every %XX sequence
    preg_match_all($match, $article, $temp);
    $summary = "";
    foreach ($temp as $key => $value) {
        $sum = implode('', $value);
        $summary = $summary . $sum;
    }
    $summary = urldecode($summary); // decode the collected %XX sequences
    return $summary;
}</code>
The idea behind this method comes from an observation: after URL encoding, characters other than letters and digits turn into sequences like %E7. So I extract those sequences, splice them together, and decode the result to get the Chinese characters back.
Later I found out that there is already a function that converts between encodings:
<code>iconv("gbk", "utf-8", "php中文转码"); // convert Chinese text from GBK to UTF-8
iconv("utf-8", "gbk", "php中文转码"); // convert Chinese text from UTF-8 to GBK</code>
But to use this function you need to open the php.ini file, enable extension=php_iconv.dll, and install the iconv library, which is kind of a hassle.
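If the goal is simply to normalize incoming text to UTF-8, a guarded conversion is one option. This is a rough sketch, assuming the mbstring extension is loaded and that any non-UTF-8 input is GBK; `to_utf8` is a hypothetical helper name, and the validity check is only a heuristic, not real encoding detection.

```php
<?php
// Hypothetical helper (not from the original post): return $text as UTF-8,
// assuming that anything that is not already valid UTF-8 is encoded as $from.
function to_utf8(string $text, string $from = 'GBK'): string
{
    // Heuristic: input that is already valid UTF-8 is returned unchanged.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Use iconv when the extension is available.
    if (function_exists('iconv')) {
        $converted = iconv($from, 'UTF-8//IGNORE', $text);
        if ($converted !== false) {
            return $converted;
        }
    }

    // Otherwise fall back to mbstring.
    return mb_convert_encoding($text, 'UTF-8', $from);
}

// Round trip: build a GBK byte string with mbstring, then convert it back.
$gbk = mb_convert_encoding('php中文转码', 'GBK', 'UTF-8');
echo to_utf8($gbk), "\n";
```

The `function_exists` check means the code still runs on installations where iconv has not been enabled.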
Finally, I found that the strip_tags() function solves the problem.
This function removes the HTML tags, after which you can cut out a section:
$summary = mb_substr($summary, 0, 50); // take the first 50 characters
You also need to remove escaped characters left in the text, such as HTML entities:
$summary = str_replace('&nbsp;', '', $summary); // remove escaped characters
That is enough to generate a summary, and more features can be added later, such as sentence segmentation and line wrapping.
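Put together, the steps might look like the sketch below. It assumes UTF-8 input and the mbstring extension; `make_summary` is a hypothetical name. It also swaps the single str_replace call for html_entity_decode, so entities are decoded rather than deleted, and it cleans the text before truncating so that the length limit applies to the final text.

```php
<?php
// Hypothetical helper (not from the original post): build a plain-text
// summary from an HTML fragment.
function make_summary(string $html, int $length = 50): string
{
    // 1. Drop the HTML tags, keeping only their text content.
    $text = strip_tags($html);

    // 2. Decode entities such as &nbsp; and &amp; into real characters.
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // 3. Collapse runs of whitespace, including non-breaking spaces
    //    (U+00A0), into a single space.
    $text = preg_replace('/[\s\x{00A0}]+/u', ' ', $text);

    // 4. Cut by characters, not bytes, so multi-byte text is not split.
    return mb_substr(trim($text), 0, $length, 'UTF-8');
}

echo make_summary('<p>你好,&nbsp;world</p><div>这是<b>摘要</b></div>', 3), "\n"; // prints 你好,
```

Using mb_substr rather than substr matters here: a byte-based cut could end in the middle of a multi-byte Chinese character.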