GBK and UTF-8 encoding handling in PHP

WBOY
Release: 2016-07-13 17:38:08

1. Encoding ranges
1. GBK (GB2312/GB18030)
\x00-\xff  GBK double-byte encoding range
\x20-\x7f  ASCII
\xa1-\xff  Chinese (GB2312)
\x80-\xff  Chinese (GBK)

2. UTF-8 (Unicode)
\u4e00-\u9fa5 (Chinese)
\u3130-\u318F (Korean)
\uAC00-\uD7A3 (Korean)
\u0800-\u4e00 (Japanese)
Note: these are Unicode code points, not UTF-8 byte values. The Korean Hangul syllables (\uAC00-\uD7A3) lie above \u9fa5.

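Regex examples:
preg_replace('/[\x80-\xff]/', '', $str);            // GBK: strip every byte in 0x80-0xff
preg_replace('/[\x{4e00}-\x{9fa5}]/u', '', $str);   // UTF-8: strip Chinese characters (PCRE needs \x{...} with the /u modifier, not \u)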
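The two patterns above can be tried on raw byte strings. The following is a minimal sketch, assuming PCRE was built with UTF-8 support (the usual case for stock PHP builds). The helper names strip_high_bytes and contains_chinese_utf8 are illustrative and not part of the original article.

```php
<?php
// Hypothetical helper: remove every byte in 0x80-0xff,
// i.e. both halves of each GBK double-byte character.
function strip_high_bytes($str) {
    return preg_replace('/[\x80-\xff]/', '', $str);
}

// Hypothetical helper: true if a UTF-8 string contains a code point in U+4E00-U+9FA5.
function contains_chinese_utf8($str) {
    return preg_match('/[\x{4e00}-\x{9fa5}]/u', $str) === 1;
}

$gbk  = "ab\xd6\xd0\xce\xc4";          // "ab" followed by two Chinese characters encoded as GBK
$utf8 = "ab\xe4\xb8\xad\xe6\x96\x87";  // the same text encoded as UTF-8

var_dump(strip_high_bytes($gbk));        // string(2) "ab"
var_dump(contains_chinese_utf8($utf8));  // bool(true)
var_dump(contains_chinese_utf8('abc'));  // bool(false)
```

The patterns are written in single quotes so that PHP passes the backslash escapes through to PCRE unchanged.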

2. Code examples

// Check whether the content contains Chinese - GBK (PHP)
function check_is_chinese($s) {
    // A byte in 0x80-0xff followed by any byte is treated as one double-byte character
    return preg_match('/[\x80-\xff]./', $s);
}
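// Get the string length in characters - GBK (PHP)
function gb_strlen($str) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $s = substr($str, $i, 1);
        // A byte in 0x80-0xff starts a double-byte character: skip its second byte
        if (preg_match('/[\x80-\xff]/', $s)) ++$i;
        ++$count;
    }
    return $count;
}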
// Take the first $len characters of a string - GBK (PHP)
function gb_substr($str, $len) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        if ($count == $len) break;
        // Skip the second byte of a double-byte character
        if (preg_match('/[\x80-\xff]/', substr($str, $i, 1))) ++$i;
        ++$count;
    }
    return substr($str, 0, $i);
}
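The GBK functions above treat any byte in 0x80-0xff as the first half of a double-byte character. GBK itself is narrower: lead bytes run from 0x81 to 0xfe and trail bytes from 0x40 to 0xfe, excluding 0x7f. The following is a sketch of a stricter counter under that assumption. The name gbk_strlen_strict is hypothetical, and returning false on malformed input is a design choice of this sketch, not something the original article specifies.

```php
<?php
// Hypothetical helper: count GBK characters, or return false on malformed input.
function gbk_strlen_strict($str) {
    $count = 0;
    $len = strlen($str);
    for ($i = 0; $i < $len; $i++) {
        $lead = ord($str[$i]);
        if ($lead >= 0x81 && $lead <= 0xfe) {
            // Double-byte character: a valid trail byte must follow.
            if ($i + 1 >= $len) return false;
            $trail = ord($str[$i + 1]);
            if ($trail < 0x40 || $trail > 0xfe || $trail === 0x7f) return false;
            $i++; // consume the trail byte
        } elseif ($lead > 0x7f) {
            return false; // 0x80 and 0xff are treated as invalid lead bytes here
        }
        $count++;
    }
    return $count;
}

var_dump(gbk_strlen_strict("ab\xd6\xd0\xce\xc4")); // int(4)
var_dump(gbk_strlen_strict("ab\xd6"));             // bool(false), truncated character
```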
// Count string length - UTF-8 (PHP); ASCII counts as 1, each multi-byte character counts as 2
function utf8_strlen($str) {
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        $value = ord($str[$i]);
        if ($value > 127) {
            $count++; // extra unit for a multi-byte character
            // Skip the continuation bytes according to the lead byte
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return $count;
}
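Note that utf8_strlen above is width-oriented: every multi-byte character adds 2 to the result. When a plain character count is wanted instead, one option is to let PCRE walk the string in UTF-8 mode. The following is a sketch, with utf8_char_count as an illustrative name that does not appear in the original article.

```php
<?php
// Hypothetical helper: number of Unicode code points in a UTF-8 string,
// or false if the input is not valid UTF-8.
function utf8_char_count($str) {
    // With /u, "." matches one whole code point; /s lets it match newlines too.
    return preg_match_all('/./su', $str);
}

$utf8 = "ab\xe4\xb8\xad\xe6\x96\x87"; // "ab" followed by two Chinese characters

var_dump(utf8_char_count($utf8)); // int(4)
var_dump(utf8_char_count(''));    // int(0)
```

For the same input, utf8_strlen returns 6, because each of the two Chinese characters is counted twice.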

// Extract a substring - UTF-8 (PHP); $position and $length use the same units as utf8_strlen
// (ASCII counts as 1, each multi-byte character counts as 2)
function utf8_substr($str, $position, $length) {
    $start_position = strlen($str);
    $start_byte = 0;
    $end_position = strlen($str);
    $count = 0;
    for ($i = 0; $i < strlen($str); $i++) {
        // Record the byte offset where the requested start position is reached
        if ($count >= $position && $start_position > $i) {
            $start_position = $i;
            $start_byte = $count;
        }
        // Stop once the requested length has been covered
        if (($count - $start_byte) >= $length) {
            $end_position = $i;
            break;
        }
        $value = ord($str[$i]);
        if ($value > 127) {
            $count++;
            if ($value >= 192 && $value <= 223) $i++;
            elseif ($value >= 224 && $value <= 239) $i = $i + 2;
            elseif ($value >= 240 && $value <= 247) $i = $i + 3;
            else die('Not a UTF-8 compatible string');
        }
        $count++;
    }
    return substr($str, $start_position, $end_position - $start_position);
}
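A code-point-based counterpart to utf8_substr, where the position and length are counted in characters rather than in display width, can be sketched with preg_split in UTF-8 mode. The name utf8_substr_chars is hypothetical. Where the mbstring extension is available, mb_substr($str, $position, $length, 'UTF-8') covers the same case for non-negative arguments.

```php
<?php
// Hypothetical helper: substring by character position and character count.
function utf8_substr_chars($str, $position, $length) {
    // Split between code points; preg_split returns false if $str is not valid UTF-8.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) return false;
    return implode('', array_slice($chars, $position, $length));
}

$utf8 = "ab\xe4\xb8\xad\xe6\x96\x87"; // "ab" followed by two Chinese characters

// Characters 1 and 2: "b" plus the first Chinese character
var_dump(utf8_substr_chars($utf8, 1, 2) === "b\xe4\xb8\xad"); // bool(true)
```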

# String length - UTF-8 [3 bytes for Chinese and Korean, 2 bytes for Russian, 1 byte for letters];
# each multi-byte character counts as 2 (Ruby)
require 'cgi'

def utf8_string_length(str)
  temp = CGI.unescape(str)
  i = 0  # accumulated length
  j = 0  # non-ASCII bytes seen since the last counted multi-byte character
  temp.bytes.each do |b|
    if b < 128
      i += 1
    elsif b >= 128 && b < 224
      j += 1
      if j % 2 == 0
        i += 2
        j = 0
      end
    else
      j += 1
      if j % 3 == 0
        i += 2
        j = 0
      end
    end
  end
  i
end

// Check whether the string contains Korean - UTF-8 (JavaScript)
function checkKoreaChar(str) {
    for (var i = 0; i < str.length; i++) {
        var code = str.charCodeAt(i);
        if ((code > 0x3130 && code < 0x318F) || (code >= 0xAC00 && code <= 0xD7A3)) {
            return true;
        }
    }
    return false;
}

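// Check whether the string contains Chinese characters - GBK (JavaScript)
function check_chinese_char(s) {
    // Any character outside \x00-\xff is replaced by two characters, so the length changes
    return (s.length != s.replace(/[^\x00-\xff]/g, "**").length);
}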

source:php.cn