PHP: automatically detecting the character set and transcoding
The character encoding I use is generally UTF-8, but if the other party's blog uses gb2312, the data they POST to me arrives garbled (unless they convert the encoding before POSTing). Since you cannot guarantee that the other party uses UTF-8, you need to check and convert the encoding yourself.
I wrote a function to do this. The principle is simple: in gb2312/gbk a Chinese character takes two bytes, and each byte falls within a known value range. In UTF-8 a Chinese character takes three bytes, and each byte likewise falls within a known range. In every one of these encodings, English (ASCII) characters are below 128 and occupy a single byte (full-width characters excepted).
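The value ranges mentioned above can be written out as explicit checks. This is an illustrative sketch only: the function names are hypothetical, the ranges used are the standard ones (a UTF-8 three-byte character has a lead byte in 0xE0-0xEF followed by two continuation bytes in 0x80-0xBF; each GB2312 byte lies in 0xA1-0xFE), and the checks are stricter than the bit masks used in `safeEncoding`.

```php
<?php
// Hypothetical helper: true if $s is exactly one 3-byte UTF-8 sequence,
// i.e. lead byte 1110xxxx followed by two continuation bytes 10xxxxxx.
function isUtf8ThreeByte(string $s): bool {
    return strlen($s) === 3
        && (ord($s[0]) & 0xF0) === 0xE0
        && (ord($s[1]) & 0xC0) === 0x80
        && (ord($s[2]) & 0xC0) === 0x80;
}

// Hypothetical helper: true if $s is two bytes, each in the GB2312 range 0xA1-0xFE.
function isGb2312Pair(string $s): bool {
    if (strlen($s) !== 2) {
        return false;
    }
    foreach ([ord($s[0]), ord($s[1])] as $b) {
        if ($b < 0xA1 || $b > 0xFE) {
            return false;
        }
    }
    return true;
}

// The character 中 (U+4E2D) in each encoding, written as escaped bytes
$utf8   = "\xE4\xB8\xAD"; // UTF-8: three bytes
$gb2312 = "\xD6\xD0";     // GB2312: two bytes

var_dump(isUtf8ThreeByte($utf8));  // bool(true)
var_dump(isGb2312Pair($gb2312));   // bool(true)
```

The ranges overlap: `isGb2312Pair("\xE4\xB8")` also returns true, even though those are the first two bytes of 中 in UTF-8. This is why a detector has to test for the longer UTF-8 pattern first.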
If you are checking a file, you can also check directly for the UTF-8 BOM. For more on this, take a look at the encoding conversion function in the TP toolbox; I wrote more detailed comments in the AppCodingSwitch class.
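A BOM check on file contents can be sketched as follows. The UTF-8 BOM is the byte sequence EF BB BF; the helper names are hypothetical and not taken from the TP toolbox. Note that a missing BOM does not prove a file is not UTF-8, since many UTF-8 files are saved without one.

```php
<?php
// The UTF-8 byte order mark (BOM): the three bytes EF BB BF
const UTF8_BOM = "\xEF\xBB\xBF";

// Hypothetical helper: true if the content starts with a UTF-8 BOM
function hasUtf8Bom(string $content): bool {
    return strncmp($content, UTF8_BOM, 3) === 0;
}

// Hypothetical helper: remove a leading UTF-8 BOM, if present
function stripUtf8Bom(string $content): string {
    return hasUtf8Bom($content) ? substr($content, 3) : $content;
}

// Usage with a file (the path is a placeholder):
// $content = file_get_contents('example.txt');
// if (hasUtf8Bom($content)) { /* treat as UTF-8 */ }
```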
Without further ado, here is the function. It checks and transcodes strings rather than files.
[php]
function safeEncoding($string, $outEncoding = 'UTF-8') {
    $encoding = "UTF-8";
    $len = strlen($string);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($string[$i]);
        // ASCII (< 128) is one byte in every encoding: skip it
        if ($byte < 128)
            continue;
        // Top three bits set (111xxxxx): possible lead byte of a 3-byte UTF-8 character
        if (($byte & 224) == 224) {
            // The first byte passed; the next two must have their high bit set
            if ($i + 2 < $len
                && (ord($string[$i + 1]) & 128) == 128
                && (ord($string[$i + 2]) & 128) == 128) {
                $encoding = "UTF-8";
                break;
            }
        }
        // Top two bits set (11xxxxxx) followed by a high byte: treat as a GB2312 pair
        if (($byte & 192) == 192) {
            // The first byte passed; the second must have its high bit set
            if ($i + 1 < $len && (ord($string[$i + 1]) & 128) == 128) {
                $encoding = "GB2312";
                break;
            }
        }
    }
    if (strtoupper($encoding) == strtoupper($outEncoding))
        return $string;
    else
        return iconv($encoding, $outEncoding, $string);
}
[/php]
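A different approach, offered here as an alternative rather than the author's method, leans on PCRE instead of hand-written bit masks: validate the string as UTF-8 and, only if that fails, assume it is GBK (a superset of GB2312) and convert it. The function names are hypothetical, and `toUtf8` assumes the iconv extension is available. It is still a heuristic, because a short GBK string can occasionally also be well-formed UTF-8.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// With the /u modifier, preg_match() returns false for malformed UTF-8 input.
function isValidUtf8(string $s): bool {
    return preg_match('//u', $s) === 1;
}

// Hypothetical helper: return UTF-8, assuming any non-UTF-8 input is GBK.
// Falls back to the original string if iconv fails.
function toUtf8(string $s): string {
    if (isValidUtf8($s)) {
        return $s;
    }
    $converted = iconv('GBK', 'UTF-8', $s);
    return $converted === false ? $s : $converted;
}
```

Validating first means already-correct UTF-8 input is returned untouched and never passes through iconv.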
http://www.bkjia.com/PHPjc/477773.html