PHP自动识别字符集编码并完成转码
May 25, 2016 pm 04:44 PM原理很简单,因为gb2312/gbk是中文两字节,这两个字节是有取值范围的,而utf-8中汉字是三字节,同样每个字节也有取值范围,而英文不管在何种编码情况下,都是小于128,只占用一个字节,全角除外.
在PHP处理页面的时候,我们对于字符集的转换都是采用了iconv或者mb_convert等函数,但这其实是有一个前提的,即我们事先得知道in和out是什么样的编码,我们才能进行正确的转换.
下面这个函数,就可以在不知道源字符串编码的情况下,自动判断其编码并进行转换,虽然只支持UTF8编码和GB2312编码,但对于国内绝大多数网站来说,已经够用了,代码如下:
<?php function safeEncoding($string, $outEncoding = 'UTF-8') { $encoding = "UTF-8"; for ($i = 0; $i < 128) continue; if ((ord($string{$i}) & 224) == 224) { //第一个字节判断通过 $char = $string{++$i}; if ((ord($char) & 128) == 128) { //第二个字节判断通过 $char = $string{++$i}; if ((ord($char) & 128) == 128) { $encoding = "UTF-8"; break; } } } if ((ord($string{$i}) & 192) == 192) { //第一个字节判断通过 $char = $string{++$i}; if ((ord($char) & 128) == 128) { //第二个字节判断通过 $encoding = "GB2312"; break; } } } if (strtoupper($encoding) == strtoupper($outEncoding)) return $string; else return iconv($encoding, $outEncoding, $string); } ?>
识别汉字编码,因为YBlog用的是utf-8,如果引用通告发过来的是gb2312的编码的话,需要可以识别并完成编码转换,代码如下:
<?php function safeEncoding($string, $outEncoding = 'UTF-8') { $encoding = "UTF-8"; for ($i = 0; $i < strlen($string); $i++) { if (ord($string{$i}) < 128) continue; if ((ord($string{$i}) & 224) == 224) { //第一个字节判断通过 $char = $string{++$i}; if ((ord($char) & 128) == 128) { //第二个字节判断通过 $char = $string{++$i}; if ((ord($char) & 128) == 128) { $encoding = "UTF-8"; break; } } } if ((ord($string{$i}) & 192) == 192) { //第一个字节判断通过 $char = $string{++$i}; if ((ord($char) & 128) == 128) { //第二个字节判断通过 $encoding = "GB2312"; break; } } } if (strtoupper($encoding) == strtoupper($outEncoding)) return $string; else return iconv($encoding, $outEncoding, $string); } ?>

Hot Article

Hot tools Tags

Hot Article

Hot Article Tags

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics

11 common classification feature encoding techniques

How many bytes do utf8 encoded Chinese characters occupy?

Knowledge graph: the ideal partner for large models

How to solve the problem of encoding of php database query results

AI software can automatically identify ancient cuneiform tablets, and researchers have made a breakthrough

Learn how to improve coding performance based on GenAI in one article

PHP coding tips: How to generate a QR code with anti-counterfeiting verification function?
