PHP encoding conversion function (automatic character set conversion with array support)

WBOY
Release: 2016-07-25 09:10:20
    // Automatically convert the character set; arrays are converted recursively (keys and values)
    function auto_charset($fContents, $from = 'gbk', $to = 'utf-8') {
        // Normalise the alias "UTF8" to "utf-8"
        $from = strtoupper($from) == 'UTF8' ? 'utf-8' : $from;
        $to = strtoupper($to) == 'UTF8' ? 'utf-8' : $to;
        if (strtoupper($from) === strtoupper($to) || empty($fContents) || (is_scalar($fContents) && !is_string($fContents))) {
            // No conversion if the encodings are the same, the input is empty, or it is a non-string scalar
            return $fContents;
        }
        if (is_string($fContents)) {
            if (function_exists('mb_convert_encoding')) {
                return mb_convert_encoding($fContents, $to, $from);
            } elseif (function_exists('iconv')) {
                return iconv($from, $to, $fContents);
            } else {
                // Neither mbstring nor iconv is available: return the input unchanged
                return $fContents;
            }
        } elseif (is_array($fContents)) {
            foreach ($fContents as $key => $val) {
                $_key = auto_charset($key, $from, $to);
                $fContents[$_key] = auto_charset($val, $from, $to);
                // If the key itself was re-encoded, remove the entry stored under the old key
                if ($key !== $_key) {
                    unset($fContents[$key]);
                }
            }
            return $fContents;
        } else {
            // Objects and other non-scalar values are returned unchanged
            return $fContents;
        }
    }

When the encoding of an incoming string is unknown, you might think of calling iconv directly to transcode it. However, iconv takes two encoding parameters, the input encoding and the output encoding, and in this situation we do not know which encoding the received string uses. Ideally, we would determine the encoding of the received string first. There are two options for this.
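To see why the input encoding matters, the sketch below converts the same two characters, 中文, from two different byte representations. Byte escapes are used so the result does not depend on how the PHP file itself is saved. It assumes the iconv extension is loaded with GBK support, which is common but not guaranteed.

```php
<?php
// "中文" in two encodings, written as byte escapes.
$gbk  = "\xD6\xD0\xCE\xC4";         // GBK (GB2312) bytes
$utf8 = "\xE4\xB8\xAD\xE6\x96\x87"; // UTF-8 bytes

// Correct input encoding: the GBK bytes become UTF-8 "中文".
$right = iconv('GBK', 'UTF-8', $gbk);

// Wrong input encoding: reading the UTF-8 bytes as GBK never gives back "中文".
// Depending on the bytes, iconv() either returns false or produces other characters;
// @ suppresses the notice it raises on an invalid sequence.
$wrong = @iconv('GBK', 'UTF-8', $utf8);

var_dump($right === $utf8); // bool(true)
var_dump($wrong === $utf8); // bool(false)
```

Nothing in the bytes themselves says which encoding they are in, so the caller has to supply or detect it.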

Option 1: Have the client specify the encoding when it submits data. This requires an additional variable that names the encoding:

    $string = $_GET['charset'] === 'gbk' ? iconv('gbk', 'utf-8', $_GET['str']) : $_GET['str'];

This only works when there is an agreement with the client. If no such agreement exists, or we do not control the client, this option is of limited use.
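A minimal sketch of Option 1 as a reusable function, assuming the parameter names `str` and `charset` from the one-liner above and that only `gbk` needs converting. `read_str_param()` is a hypothetical name, and `$query` stands in for `$_GET` so the function can be tested without a web request.

```php
<?php
// Hypothetical helper for Option 1: the client names its encoding in a `charset` parameter.
// $query stands in for $_GET; 'str' and 'charset' follow the article's example.
function read_str_param(array $query): string
{
    $str = isset($query['str']) && is_string($query['str']) ? $query['str'] : '';
    $charset = isset($query['charset']) && is_string($query['charset'])
        ? strtolower($query['charset'])
        : '';

    // Only the agreed value 'gbk' triggers a conversion; anything else is assumed to be UTF-8.
    // The encoding name passed to iconv() is a constant, never client input.
    if ($charset !== 'gbk') {
        return $str;
    }
    $converted = iconv('GBK', 'UTF-8', $str);
    // iconv() returns false on bytes that are not valid GBK; keep the raw input in that case.
    return $converted === false ? $str : $converted;
}

// A GBK-encoded "中文" sent with charset=gbk comes back as UTF-8.
var_dump(read_str_param(['str' => "\xD6\xD0\xCE\xC4", 'charset' => 'gbk']) === "\xE4\xB8\xAD\xE6\x96\x87"); // bool(true)
```

Unlike the one-liner, this version does not raise an undefined-index notice when `charset` or `str` is missing.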

Option 2: Detect the encoding of the received data on the server. This is the ideal solution, which leaves the question of how to detect the encoding of a string. In PHP, mb_check_encoding() from the mbstring extension provides what we need: it reports whether a string is valid in a given encoding.

    $str = mb_check_encoding($_GET['str'], 'gbk') ? iconv('gbk', 'utf-8', $_GET['str']) : $_GET['str'];

However, this requires the mbstring extension, which is not always enabled on production servers. In that case, the following functions can be used to determine the encoding instead.
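Because GBK accepts a wide range of byte pairs, UTF-8-encoded Chinese text can often also pass `mb_check_encoding($str, 'gbk')`, in which case the one-liner above would convert text that was already UTF-8. The sketch below tests for UTF-8 first instead. It assumes, as the article does, that clients send only UTF-8 or GBK, and that mbstring is loaded; `to_utf8_mb()` is a hypothetical name.

```php
<?php
// Normalise an incoming string to UTF-8, assuming the input is either UTF-8 or GBK.
// Requires the mbstring extension.
function to_utf8_mb(string $str): string
{
    // Test for UTF-8 first: its byte structure is strict, so GBK-encoded
    // Chinese text usually fails this check.
    if (mb_check_encoding($str, 'UTF-8')) {
        return $str;
    }
    // Otherwise treat the bytes as GBK and convert them.
    return mb_convert_encoding($str, 'UTF-8', 'GBK');
}

$fromGbk  = to_utf8_mb("\xD6\xD0\xCE\xC4");         // "中文" sent as GBK
$fromUtf8 = to_utf8_mb("\xE4\xB8\xAD\xE6\x96\x87"); // "中文" sent as UTF-8
var_dump($fromGbk === $fromUtf8); // bool(true)
```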

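    // Heuristic: returns false if the string looks like UTF-8-encoded Chinese, true otherwise
    function isGb2312($string) {
        for ($i = 0; $i < strlen($string); $i++) {
            $v = ord($string[$i]);
            if ($v > 127) {
                // 228-233 (0xE4-0xE9) are the UTF-8 lead bytes for U+4000-U+9FFF,
                // which includes the CJK Unified Ideographs block
                if (($v >= 228) && ($v <= 233)) {
                    // Not enough bytes left for a 3-byte UTF-8 sequence
                    if (($i + 2) >= strlen($string)) return true;
                    $v1 = ord($string[$i + 1]);
                    $v2 = ord($string[$i + 2]);
                    // Two continuation bytes (128-191) follow: treat the string as UTF-8
                    if (($v1 >= 128) && ($v1 <= 191) && ($v2 >= 128) && ($v2 <= 191))
                        return false;
                    else
                        return true;
                }
            }
        }
        return true;
    }

    // Returns 1 if the whole string is well-formed UTF-8 (printable ASCII plus tab, LF, CR),
    // 0 if it is not, and false on a PCRE error
    function isUtf8($string) {
        return preg_match('%^(?:
              [\x09\x0A\x0D\x20-\x7E]            # ASCII
            | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
            | \xE0[\xA0-\xBF][\x80-\xBF]         # excluding overlongs
            | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
            | \xED[\x80-\x9F][\x80-\xBF]         # excluding surrogates
            | \xF0[\x90-\xBF][\x80-\xBF]{2}      # planes 1-3
            | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
            | \xF4[\x80-\x8F][\x80-\xBF]{2}      # plane 16
        )*$%xs', $string);
    }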
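
Either function can be used to detect the encoding before converting to the target encoding. With isGb2312(), which takes a single argument:

    $str = isGb2312($_GET['str']) ? iconv('gbk', 'utf-8', $_GET['str']) : $_GET['str'];

With isUtf8() the test is reversed, because it returns a truthy value when the input is already UTF-8:

    $str = isUtf8($_GET['str']) ? $_GET['str'] : iconv('gbk', 'utf-8', $_GET['str']);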
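As a lighter alternative to the regex-based isUtf8() when mbstring is unavailable, PHP's PCRE functions validate their subject when the `u` modifier is set: `preg_match()` returns false for input that is not valid UTF-8, so the empty pattern `'//u'` works as a UTF-8 test. The sketch below assumes, as before, that the input is either UTF-8 or GBK and that the iconv extension is loaded; `to_utf8_no_mb()` is a hypothetical name.

```php
<?php
// Normalise a string to UTF-8 without mbstring, assuming the input is either UTF-8 or GBK.
function to_utf8_no_mb(string $str): string
{
    // preg_match() with /u returns 1 for valid UTF-8 (the empty pattern always matches)
    // and false when the subject is not valid UTF-8.
    if (preg_match('//u', $str) === 1) {
        return $str;
    }
    $converted = iconv('GBK', 'UTF-8', $str);
    // iconv() returns false if the bytes are not valid GBK either;
    // fall back to the original string rather than losing the data.
    return $converted === false ? $str : $converted;
}

var_dump(to_utf8_no_mb("\xD6\xD0\xCE\xC4") === "\xE4\xB8\xAD\xE6\x96\x87"); // bool(true)
```

Note that `'//u'` accepts every valid UTF-8 string, including control characters that the isUtf8() pattern rejects.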

Source: php.cn