Supporting several languages side by side means dealing with multi-byte characters. PHP's built-in strlen function does not count Chinese strings correctly: it returns the number of bytes the string occupies, not the number of characters. For GB2312-encoded Chinese, strlen returns twice the number of Chinese characters. In UTF-8, where a character takes 1 to 4 bytes, a Chinese character usually takes 3, so the result is typically three times the character count.
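A minimal sketch of the gap between bytes and characters. It assumes the script file is saved as UTF-8 and that the mbstring extension is available for the GB2312 conversion; the variable names are illustrative only.

```php
<?php
// Assumes this source file is saved as UTF-8.
$utf8 = '中文';                       // 2 Chinese characters

// Re-encode the same text as GB2312 (needs the mbstring extension).
$gb2312 = mb_convert_encoding($utf8, 'GB2312', 'UTF-8');

$utf8Bytes   = strlen($utf8);         // bytes in the UTF-8 form
$gb2312Bytes = strlen($gb2312);       // bytes in the GB2312 form

echo "UTF-8 bytes:  $utf8Bytes\n";    // 6: three bytes per character
echo "GB2312 bytes: $gb2312Bytes\n";  // 4: two bytes per character
```

In both cases strlen reports a byte count, so the same two characters give different results depending on the encoding.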
The mbstring extension handles this better. mb_strlen is used like strlen, except that it takes an optional second parameter specifying the character encoding. For example, mb_strlen($str, 'UTF-8') returns the length in characters of the UTF-8 string $str. If the second parameter is omitted, PHP's internal encoding is used. The current internal encoding can be read with mb_internal_encoding(), and there are two ways to set it:
1. Set mbstring.internal_encoding = UTF-8 in php.ini (deprecated since PHP 5.6, where default_charset takes its place)
2. Call mb_internal_encoding("GBK")
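As a rough illustration of mb_strlen and the internal encoding, assuming a UTF-8 source file and a loaded mbstring extension (the sample string and variable names are made up for this sketch):

```php
<?php
// Assumes this source file is saved as UTF-8.
$str = '中文abc';                          // 2 Chinese characters + 3 ASCII letters

$bytes = strlen($str);                     // counts bytes: 2*3 + 3 = 9
$chars = mb_strlen($str, 'UTF-8');         // counts characters: 5

// With the encoding argument omitted, the internal encoding is used.
mb_internal_encoding('UTF-8');             // set it at runtime
$current      = mb_internal_encoding();    // read it back: "UTF-8"
$charsDefault = mb_strlen($str);           // also 5

echo "$bytes bytes, $chars characters, internal encoding $current\n";
```

Passing the encoding explicitly keeps the result independent of whatever the server configuration happens to be.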
mbstring also provides several functions for cutting strings. mb_substr cuts by characters and mb_strcut cuts by bytes, but neither of them will produce half a character. The length argument therefore means different things in the two functions. For mb_strcut it is a byte limit: the result is at most that many bytes long, because a trailing partial character is dropped. For mb_substr it is a character count: the result contains exactly that many characters, as long as the string is long enough. See the example below.
```php
<?php
$str = '我是一串比较长的中文-www.jefflei.com';
// First 6 characters
echo "mb_substr:" . mb_substr($str, 0, 6, 'utf-8');
echo "\n";
// At most 6 bytes, cut on a character boundary
echo "mb_strcut:" . mb_strcut($str, 0, 6, 'utf-8');
?>
```
The output is as follows:
mb_substr:我是一串比较
mb_strcut:我是
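To see how mb_strcut rounds down to a character boundary, here is a small sketch that tries several byte limits. It assumes a UTF-8 source file, so each Chinese character is 3 bytes; the loop bounds and variable names are chosen only for illustration.

```php
<?php
// Assumes this source file is saved as UTF-8 (3 bytes per Chinese character).
$str = '我是一串比较长的中文';

$cutLengths = [];
for ($limit = 1; $limit <= 7; $limit++) {
    $piece = mb_strcut($str, 0, $limit, 'UTF-8');   // at most $limit bytes
    $cutLengths[$limit] = strlen($piece);
    echo "limit $limit -> " . strlen($piece) . " bytes: '$piece'\n";
}

$sub = mb_substr($str, 0, 2, 'UTF-8');              // exactly 2 characters
echo "mb_substr 2 -> " . strlen($sub) . " bytes: '$sub'\n";
```

The byte length returned by mb_strcut only grows when the limit reaches a multiple of 3, which is why it suits fixed-size byte fields, while mb_substr suits limits expressed in visible characters.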
Note that mbstring is not part of the PHP core. It is an extension, so before using it you need to make sure mbstring support was enabled when PHP was compiled:
(1) Compile PHP with --enable-mbstring
(2) Edit /usr/local/lib/php.ini:
default_charset = "zh-cn"
mbstring.language = zh-cn
mbstring.internal_encoding = zh-cn
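One way to confirm at runtime that the extension is present is sketched below. It assumes the script runs from the command line (the STDERR constant is only defined there), and the error message wording is just an example.

```php
<?php
// Fail early with a readable message if mbstring was not compiled in or enabled.
$hasMbstring = extension_loaded('mbstring');

if (!$hasMbstring) {
    fwrite(STDERR, "mbstring is not available; rebuild PHP with --enable-mbstring\n");
    exit(1);
}

// Show the settings mbstring is actually running with.
echo 'internal encoding: ' . mb_internal_encoding() . "\n";
echo 'language:          ' . mb_language() . "\n";
```

Printing the effective settings is a quick check that the php.ini changes were picked up.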
The mbstring extension offers much more than this, including mail functions such as mb_send_mail.