- // Convert a string, or every key and value of an array, from one character set to another
- function auto_charset($fContents, $from = 'gbk', $to = 'utf-8') {
-     // Normalize the common "utf8" spelling to "utf-8"
-     $from = strtoupper($from) == 'UTF8' ? 'utf-8' : $from;
-     $to = strtoupper($to) == 'UTF8' ? 'utf-8' : $to;
-     if (strtoupper($from) === strtoupper($to) || empty($fContents) || (is_scalar($fContents) && !is_string($fContents))) {
-         // No conversion if the encodings match, the value is empty, or it is a non-string scalar
-         return $fContents;
-     }
-     if (is_string($fContents)) {
-         if (function_exists('mb_convert_encoding')) {
-             return mb_convert_encoding($fContents, $to, $from);
-         } elseif (function_exists('iconv')) {
-             return iconv($from, $to, $fContents);
-         } else {
-             // Neither extension is available, so return the input unchanged
-             return $fContents;
-         }
-     } elseif (is_array($fContents)) {
-         foreach ($fContents as $key => $val) {
-             $_key = auto_charset($key, $from, $to);
-             $fContents[$_key] = auto_charset($val, $from, $to);
-             // Drop the old entry when conversion changed the key
-             if ($key !== $_key) {
-                 unset($fContents[$key]);
-             }
-         }
-         return $fContents;
-     } else {
-         return $fContents;
-     }
- }
At this point, you might think of calling iconv directly to transcode. But iconv requires both the input encoding and the output encoding, and we do not know which encoding the received string uses. It would help if we could determine the encoding of the received string.
There are two options for solving this problem.
Option 1
Have the client state the encoding when it submits data. This requires an additional request variable that specifies the encoding.
$string = $_GET['charset'] === 'gbk' ? iconv('gbk','utf-8',$_GET['str']) : $_GET['str'];
If there is no such agreement, or we cannot control the client, this option is not very practical.
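A slightly more defensive version of Option 1 can be sketched as follows. The helper name convert_declared and the list of accepted charsets are assumptions made for illustration, not part of the original code. The sketch accepts only a declared charset from a small whitelist and keeps the original value if iconv fails.

```php
<?php
// Hypothetical helper (not from the original article): convert $str to UTF-8
// based on a charset name declared by the client.
function convert_declared($str, $declared)
{
    // Assumed whitelist: only these declared charsets trigger a conversion.
    $allowed = array('gbk' => 'GBK', 'gb2312' => 'GB2312');
    $key = strtolower(trim((string) $declared));
    if (!isset($allowed[$key])) {
        // Unknown or missing declaration: treat the data as already UTF-8.
        return $str;
    }
    // iconv() returns false (and raises a notice) on invalid input,
    // so suppress the notice and fall back to the original value.
    $converted = @iconv($allowed[$key], 'UTF-8', $str);
    return $converted === false ? $str : $converted;
}

// Usage with request data; missing parameters default to empty strings.
$charset = isset($_GET['charset']) ? $_GET['charset'] : '';
$str = isset($_GET['str']) ? $_GET['str'] : '';
$string = convert_declared($str, $charset);
```

Restricting the declared charset to a whitelist avoids passing arbitrary client input to iconv as an encoding name.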
Option 2
The server detects the encoding of the received data directly.
This is of course the ideal solution. The question is how to detect the encoding of a string. In PHP, mb_check_encoding from the mbstring extension provides what we need: it reports whether a string is valid in a given encoding.
$str = mb_check_encoding($_GET['str'],'gbk') ? iconv('gbk','utf-8',$_GET['str']) : $_GET['str'];
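One caveat is that a UTF-8 byte sequence can also happen to be valid GBK, so checking for GBK alone may convert data that was already UTF-8. A minimal sketch that checks UTF-8 first is shown here. The helper name to_utf8_mb is hypothetical, and the sketch assumes the input is either UTF-8 or GBK.

```php
<?php
// Hypothetical helper (not from the original article): return $str as UTF-8,
// assuming the input is either UTF-8 or GBK. Requires the mbstring extension.
function to_utf8_mb($str)
{
    // Well-formed UTF-8 (which includes plain ASCII) is returned unchanged.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Not UTF-8: convert only if the bytes are valid GBK.
    if (mb_check_encoding($str, 'GBK')) {
        return mb_convert_encoding($str, 'UTF-8', 'GBK');
    }
    // Neither encoding matched, so leave the data untouched.
    return $str;
}

$str = to_utf8_mb(isset($_GET['str']) ? $_GET['str'] : '');
```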
But this requires the mbstring extension to be enabled, and it may not be enabled on the production server. In that case, you can use the following functions to determine the encoding.
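- // Heuristic: returns false if a UTF-8 style 3-byte sequence is found, true otherwise
- function isGb2312($string) {
-     for ($i = 0; $i < strlen($string); $i++) {
-         $v = ord($string[$i]);
-         if ($v > 127) {
-             // 228-233 (0xE4-0xE9) are the lead bytes of common 3-byte UTF-8 CJK characters
-             if (($v >= 228) && ($v <= 233)) {
-                 // Too few bytes remain to test the sequence, so assume GB2312
-                 if (($i + 2) >= (strlen($string) - 1)) return true;
-                 $v1 = ord($string[$i + 1]);
-                 $v2 = ord($string[$i + 2]);
-                 // Two continuation bytes (128-191) indicate UTF-8
-                 if (($v1 >= 128) && ($v1 <= 191) && ($v2 >= 128) && ($v2 <= 191))
-                     return false;
-                 else
-                     return true;
-             }
-         }
-     }
-     return true;
- }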
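- // Returns 1 if the whole string is well-formed UTF-8, 0 otherwise
- function isUtf8($string) {
-     return preg_match('%^(?:
-           [\x09\x0A\x0D\x20-\x7E]            # ASCII
-         | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
-         | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
-         | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
-         | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
-         | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
-         | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
-         | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
-     )*$%xs', $string);
- }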
Either of the functions above can then be used to detect the encoding and convert the string to the target encoding.
$str = isGb2312($_GET['str']) ? iconv('gbk','utf-8',$_GET['str']) : $_GET['str'];
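As an alternative that also avoids mbstring, the UTF-8 check can be delegated to PCRE. This is only a sketch under stated assumptions: the helper name to_utf8_fallback is hypothetical, PCRE is assumed to be built with UTF-8 support, and the input is assumed to be either UTF-8 or GBK.

```php
<?php
// Hypothetical helper (not from the original article): return $str as UTF-8
// without mbstring, assuming PCRE (with UTF-8 support) and iconv are available.
function to_utf8_fallback($str)
{
    // With the /u modifier, preg_match() returns false for a subject that is
    // not well-formed UTF-8, so a result of 1 means $str is already UTF-8.
    if ($str === '' || @preg_match('//u', $str) === 1) {
        return $str;
    }
    // Otherwise assume GBK; keep the original value if iconv() fails.
    $converted = @iconv('GBK', 'UTF-8', $str);
    return $converted === false ? $str : $converted;
}

$str = to_utf8_fallback(isset($_GET['str']) ? $_GET['str'] : '');
```

Checking for valid UTF-8 first means data that is already UTF-8 is never passed through the GBK conversion.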