The principle is simple. In GB2312/GBK a Chinese character occupies two bytes, and each of those bytes falls within a known value range. In UTF-8 a Chinese character normally occupies three bytes, and each of those bytes also falls within a known range. In either encoding, ASCII characters such as English letters have values below 128 and occupy a single byte (full-width forms excepted).
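As a small illustration of these byte lengths, the sketch below lists the byte values of the character "中" (U+4E2D) in each encoding. The helper name byteValues is made up for this example, and the bytes are written as escapes so the encoding of the source file does not matter.

```php
<?php
// Hypothetical helper: return the byte values (0-255) of a string as a list.
function byteValues(string $bytes): array
{
    // unpack() returns a 1-indexed array; array_values() makes it 0-indexed.
    return array_values(unpack('C*', $bytes));
}

$utf8   = "\xE4\xB8\xAD"; // "中" in UTF-8: three bytes, all >= 128
$gb2312 = "\xD6\xD0";     // "中" in GB2312: two bytes, both >= 128
$ascii  = "A";            // ASCII: one byte, < 128 in both encodings

echo implode(' ', byteValues($utf8)), "\n";   // 228 184 173
echo implode(' ', byteValues($gb2312)), "\n"; // 214 208
echo implode(' ', byteValues($ascii)), "\n";  // 65
```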
When PHP processes pages, we use functions such as iconv or mb_convert_encoding to convert between character sets. This has a precondition, however: we must know the input and output encodings in advance to perform the correct conversion.
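For reference, a conversion with a known source encoding might look like the sketch below. The wrapper name convertKnown is hypothetical; it only adds a failure check around iconv, which returns false when it cannot convert.

```php
<?php
// Hypothetical wrapper: convert text whose source encoding is already known.
function convertKnown(string $text, string $from, string $to): string
{
    $converted = iconv($from, $to, $text);
    if ($converted === false) {
        throw new RuntimeException("Cannot convert from $from to $to");
    }
    return $converted;
}

$gb2312 = "\xD6\xD0\xCE\xC4";                     // "中文" as GB2312 bytes
$utf8 = convertKnown($gb2312, 'GB2312', 'UTF-8'); // the same text as UTF-8 bytes
echo bin2hex($utf8), "\n";                        // prints e4b8ade69687
```

If the source encoding passed in is wrong, the result is either garbage or a failed conversion, which is why the encoding has to be known or detected first.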
The following function guesses the encoding of the source string from its byte patterns and converts it, so the caller does not need to know the encoding. It only distinguishes UTF-8 from GB2312, but that is enough for most websites in China.
The code is as follows:
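function safeEncoding($string, $outEncoding = 'UTF-8')
{
    $encoding = "UTF-8";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($string[$i]);
        if ($byte < 128) {
            continue; // ASCII byte, identical in both encodings
        }
        // The next two byte values, or 0 if the string ends first
        $second = $i + 1 < $length ? ord($string[$i + 1]) : 0;
        $third = $i + 2 < $length ? ord($string[$i + 2]) : 0;
        // UTF-8: lead byte 1110xxxx followed by two continuation bytes 10xxxxxx
        if (($byte & 240) == 224 && ($second & 192) == 128 && ($third & 192) == 128) {
            $encoding = "UTF-8";
            break;
        }
        // GB2312: first byte passes the 11xxxxxx test, second byte has the high bit set
        if (($byte & 192) == 192 && ($second & 128) == 128) {
            $encoding = "GB2312";
            break;
        }
    }
    if (strtoupper($encoding) == strtoupper($outEncoding)) {
        return $string;
    }
    return iconv($encoding, $outEncoding, $string);
}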
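The byte scan in safeEncoding is a heuristic, and some GB2312 byte pairs can also look like UTF-8. A different technique, shown here only as a sketch, is to validate the string as UTF-8 and fall back to GB2312 when validation fails. The function name toUtf8OrAssumeGb2312 is hypothetical, and the fallback rests on the assumption that any input that is not valid UTF-8 is GB2312.

```php
<?php
// Hypothetical alternative: validate as UTF-8 instead of scanning bytes by hand.
function toUtf8OrAssumeGb2312(string $text): string
{
    // With the /u modifier, preg_match() returns false if the subject is not valid UTF-8.
    if (preg_match('//u', $text) === 1) {
        return $text; // already valid UTF-8 (pure ASCII also ends up here)
    }
    // Assumption: anything that is not valid UTF-8 is GB2312.
    $converted = iconv('GB2312', 'UTF-8', $text);
    // Keep the original bytes if the conversion fails.
    return $converted === false ? $text : $converted;
}

echo bin2hex(toUtf8OrAssumeGb2312("\xD6\xD0\xCE\xC4")), "\n"; // prints e4b8ade69687
```

Short GB2312 strings that happen to form valid UTF-8 sequences would still be misclassified, so this narrows the ambiguity without removing it.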
Example 2
The code is as follows:
// Identify the Chinese character encoding. YBlog uses UTF-8, so a citation
// notification (trackback) sent in GB2312 must be detected and converted.
function safeEncoding($string, $outEncoding = 'UTF-8')
{
    $encoding = "UTF-8";
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($string[$i]);
        if ($byte < 128) {
            continue; // ASCII byte, identical in both encodings
        }
        // The next two byte values, or 0 if the string ends first
        $second = $i + 1 < $length ? ord($string[$i + 1]) : 0;
        $third = $i + 2 < $length ? ord($string[$i + 2]) : 0;
        // UTF-8: lead byte 1110xxxx followed by two continuation bytes 10xxxxxx
        if (($byte & 240) == 224 && ($second & 192) == 128 && ($third & 192) == 128) {
            $encoding = "UTF-8";
            break;
        }
        // GB2312: first byte passes the 11xxxxxx test, second byte has the high bit set
        if (($byte & 192) == 192 && ($second & 128) == 128) {
            $encoding = "GB2312";
            break;
        }
    }
    if (strtoupper($encoding) == strtoupper($outEncoding)) {
        return $string;
    }
    return iconv($encoding, $outEncoding, $string);
}
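To make the bit masks in safeEncoding easier to read, the sketch below names the pattern each mask tests for. The helper describeByte and its labels are invented for this illustration and are not part of the original code.

```php
<?php
// Hypothetical helper: name the bit pattern of a single byte value (0-255).
function describeByte(int $byte): string
{
    if ($byte < 128) {
        return 'ascii';        // 0xxxxxxx
    }
    if (($byte & 192) == 128) {
        return 'continuation'; // 10xxxxxx
    }
    if (($byte & 224) == 192) {
        return 'lead-2';       // 110xxxxx
    }
    if (($byte & 240) == 224) {
        return 'lead-3';       // 1110xxxx
    }
    return 'other';            // 1111xxxx
}

echo describeByte(0x41), "\n"; // ascii
echo describeByte(0xE4), "\n"; // lead-3 (first byte of "中" in UTF-8)
echo describeByte(0xB8), "\n"; // continuation (second byte of "中" in UTF-8)
echo describeByte(0xD6), "\n"; // lead-2 (first byte of "中" in GB2312)
```

The last line shows why the detection is only a guess: a GB2312 lead byte such as 0xD6 has the same bit pattern as a UTF-8 two-byte lead.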