Question
mb_substr
Solution
$string = "中华人民共和国";
$mystring=mb_substr($string,0,6,'UTF-8');
echo $mystring;
I read in a book that under UTF-8 encoding one Chinese character occupies 3 bytes, and under GB2312/GBK encoding one Chinese character occupies 2 bytes.
So shouldn't the above code output "中华"?
Reference answer
It outputs "中华人民共和".
Reference answer
That 6 is the number of characters, not the number of bytes.
Please read the manual carefully-.-
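To make the character/byte distinction concrete, here is a minimal sketch comparing the byte-based functions with their mb_ counterparts. It assumes the file is saved as UTF-8, that the mbstring extension is loaded, and that the sample string is the seven-character Chinese name from the question; the variable names are only for illustration.

```php
<?php
// Assumes this file is saved as UTF-8 and the mbstring extension is loaded.
$string = "中华人民共和国";                         // 7 characters

$byteLength = strlen($string);                      // counts bytes
$charLength = mb_strlen($string, 'UTF-8');          // counts characters

$firstSixChars = mb_substr($string, 0, 6, 'UTF-8'); // length is in characters
$firstSixBytes = substr($string, 0, 6);             // length is in bytes

echo $byteLength, "\n";     // 21 (7 characters x 3 bytes)
echo $charLength, "\n";     // 7
echo $firstSixChars, "\n";  // 中华人民共和
echo $firstSixBytes, "\n";  // 中华 (6 bytes = 2 characters in UTF-8)
```

So the "3 bytes per character" rule from the book only matters for the byte-based substr(); mb_substr() already counts in characters.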
Reference answer
Remove the 'UTF-8' argument and see what happens...
Reference answer
Originally posted by An on 2008-11-7 13:28:
"Remove the 'UTF-8' argument and see what happens..."
If the encoding parameter is not passed in, I remember that the mbstring.internal_encoding setting from php.ini is used by default.
If mbstring.internal_encoding is not set either, Latin-1 (ISO-8859-1) should be used.
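Rather than relying on memory, the fallback can be checked at runtime. The sketch below assumes a UTF-8 source file and the mbstring extension; which value the first line prints depends on the PHP version and php.ini (in PHP 5.6 and later it normally follows default_charset, which defaults to UTF-8), so no output is claimed for it.

```php
<?php
// Assumes the mbstring extension is loaded and this file is saved as UTF-8.
$string = "中华人民共和国";

// Report which encoding the mb_* functions use when none is passed.
echo mb_internal_encoding(), "\n";

// Set the fallback explicitly so the result does not depend on php.ini.
mb_internal_encoding('UTF-8');
$withDefault  = mb_substr($string, 0, 2);          // uses the internal encoding
$withExplicit = mb_substr($string, 0, 2, 'UTF-8'); // names the encoding directly

// Treating the same bytes as a single-byte encoding makes the length count bytes.
$asLatin1 = mb_substr($string, 0, 3, 'ISO-8859-1');

var_dump($withDefault === $withExplicit); // bool(true)
var_dump($asLatin1 === "中");             // bool(true): 3 bytes = 1 UTF-8 character
```

Passing the encoding on every call, as the original code does, avoids depending on the server configuration at all.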
Reference answer
Originally posted by An on 2008-11-7 13:28:
"Remove the 'UTF-8' argument and see what happens..."
In that case, a Chinese character takes up two bytes.
For example:
echo $mystring=mb_substr($string,0,4);//Result: 中华
With a length of 3 or 5 in place of 4, the result I see is also 中华.
Reference answer
Originally posted by An on 2008-11-7 13:28:
"Remove the 'UTF-8' argument and see what happens..."
Sorry, my mistake: my page is GB2312 - -
After I change the page to UTF-8 and remove the 'UTF-8' argument, each Chinese character takes three bytes. Thank you for your help.
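The two-versus-three-byte difference comes from the encoding of the bytes themselves, which can be shown without changing any page. This is a small sketch assuming mbstring is loaded with GB2312 (EUC-CN) support; the two-character sample is written with \u{...} escapes (PHP 7+) so the file encoding does not affect it.

```php
<?php
// Assumes the mbstring extension is loaded with GB2312 (EUC-CN) support.
$utf8   = "\u{4E2D}\u{534E}";                            // 中华 as UTF-8 bytes
$gb2312 = mb_convert_encoding($utf8, 'GB2312', 'UTF-8'); // same text as GB2312 bytes

echo strlen($utf8), "\n";    // 6: three bytes per character
echo strlen($gb2312), "\n";  // 4: two bytes per character

// The character count is the same once the right encoding is named.
echo mb_strlen($utf8, 'UTF-8'), "\n";     // 2
echo mb_strlen($gb2312, 'GB2312'), "\n";  // 2
```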
Reference answer
Originally posted by a man on 2008-11-7 16:31:
"Sorry, my mistake: my page is GB2312 - -
After I change the page to UTF-8 and remove the 'UTF-8' argument, each Chinese character takes three bytes. Thank you for your help."
A character encoding can contain characters of different byte lengths.
For example, GBK contains both double-byte and single-byte characters.
UTF-8 characters take 1, 2, 3, or 4 bytes (the original design allowed up to 6; the current standard, RFC 3629, limits it to 4).
So you cannot know in advance whether a given string contains multi-byte characters, single-byte characters, or a mix of both.
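As a quick illustration of variable-width characters, the sketch below measures the UTF-8 byte length of four sample characters. The samples are my own picks, not from the thread, and are written with \u{...} escapes (PHP 7+) so they are UTF-8 regardless of how the file is saved.

```php
<?php
// Four single characters of increasing UTF-8 width (samples are illustrative).
$samples = [
    'a',          // U+0061, Latin letter
    "\u{E9}",     // U+00E9, e with acute accent
    "\u{4E2D}",   // U+4E2D, 中
    "\u{1F600}",  // U+1F600, an emoji
];

$widths = [];
foreach ($samples as $char) {
    // strlen() counts bytes; each sample is exactly one character.
    $widths[] = strlen($char);
    echo $char, ' => ', strlen($char), " byte(s)\n";
}
// $widths is now [1, 2, 3, 4]
```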
In a GBK environment:
$string = "Test ab test cd word 0 string";
$mystring=mb_substr($string,0,3,'GBK');
echo $mystring;
The 3 here is the number of characters to return. The argument sets a character count, not a byte count.
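Here is a hedged sketch of the same idea with a string that mixes double-byte and single-byte characters. The sample text (two Chinese characters followed by "ab") is hypothetical, and the code assumes mbstring accepts 'GBK' as an encoding name, as the call above does.

```php
<?php
// Assumes mbstring is loaded with GBK support; the sample string is hypothetical.
$utf8 = "\u{6D4B}\u{8BD5}ab";                        // 测试ab
$gbk  = mb_convert_encoding($utf8, 'GBK', 'UTF-8');  // the same text as GBK bytes

echo strlen($gbk), "\n";            // 6 bytes: 2 + 2 + 1 + 1
echo mb_strlen($gbk, 'GBK'), "\n";  // 4 characters

// Take three characters, whatever their byte widths are.
$firstThree = mb_substr($gbk, 0, 3, 'GBK');

// Convert back to UTF-8 only so the result can be displayed and compared.
$backToUtf8 = mb_convert_encoding($firstThree, 'UTF-8', 'GBK');
echo $backToUtf8, "\n";             // 测试a
```

The three characters returned here occupy 5 bytes in GBK, which is why a byte count cannot stand in for the length argument.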
Sigh~, what can I say? I see you have posted so many threads about multi-byte character encoding,
and in the end you still don't understand the relationship between characters, bytes, and character encoding -.-
Reference answer
Crash -. -
Reference answer
Thank you, I feel enlightened...