用php实现gb2312和unicode间的编码转换_php基础
gb2312 和 unicode 间的编码转换
下面的例子是将 gb2312 转换为 "全"这种形式
php4.3.1以后的iconv函数很好用的,只是需要自己写一个uft8到unicode的转换函数
查表(gb2312.txt)也行
$text = "脚本之家";
preg_match_all("/[\x80-\xff]?./",$text,$ar);
foreach($ar[0] as $v)
echo "".utf8_unicode(iconv("GB2312","UTF-8",$v)).";";
?>
// utf8 -> unicode
function utf8_unicode($c) {
switch(strlen($c)) {
case 1:
return ord($c);
case 2:
$n = (ord($c[0]) & 0x3f) $n += ord($c[1]) & 0x3f;
return $n;
case 3:
$n = (ord($c[0]) & 0x1f) $n += (ord($c[1]) & 0x3f) $n += ord($c[2]) & 0x3f;
return $n;
case 4:
$n = (ord($c[0]) & 0x0f) $n += (ord($c[1]) & 0x3f) $n += (ord($c[2]) & 0x3f) $n += ord($c[3]) & 0x3f;
return $n;
}
}
?>
下面的例子是利用php将"全"这中编码转换为gb2312.
$str = "TTL全天候自动聚焦";
$str = preg_replace("|([0-9]{1,5});|", "\".u2utf82gb(\\1).\"", $str);
$str = "\$str=\"$str\";";
eval($str);
echo $str;
function u2utf82gb($c){
$str="";
if ($c $str.=$c;
} else if ($c $str.=chr(0xC0 | $c>>6);
$str.=chr(0x80 | $c & 0x3F);
} else if ($c $str.=chr(0xE0 | $c>>12);
$str.=chr(0x80 | $c>>6 & 0x3F);
$str.=chr(0x80 | $c & 0x3F);
} else if ($c $str.=chr(0xF0 | $c>>18);
$str.=chr(0x80 | $c>>12 & 0x3F);
$str.=chr(0x80 | $c>>6 & 0x3F);
$str.=chr(0x80 | $c & 0x3F);
}
return iconv('UTF-8', 'GB2312', $str);
}
?>
或者是
function unescape($str) {
$str = rawurldecode($str);
preg_match_all("/(?:%u.{4})|.{4};|\d+;|.+/U",$str,$r);
$ar = $r[0];
print_r($ar);
foreach($ar as $k=>$v) {
if(substr($v,0,2) == "%u")
$ar[$k] = iconv("UCS-2","GB2312",pack("H4",substr($v,-4)));
elseif(substr($v,0,3) == "")
$ar[$k] = iconv("UCS-2","GB2312",pack("H4",substr($v,3,-1)));
elseif(substr($v,0,2) == "") {
echo substr($v,2,-1)."
";
$ar[$k] = iconv("UCS-2","GB2312",pack("n",substr($v,2,-1)));
}
}
return join("",$ar);
}
$str = "TTL全天候自动聚焦";
echo unescape($str); //out TTL全天候自动聚焦
利用javascript来转换
下面是一个显示所有全角半角的字体的查看例子
<script> <BR>function showUni(min,max){ <BR>show.document.open(); <BR>show.document.writeln("<style>body{font-size:9pt;word-break:break-all;}"); <BR>show.document.writeln(min + " - " + max + "<br><br>"); <BR>var i=0; <BR>for(i=min;i<=max;i++){ <BR>show.document.write("&#" + i + ";"); <BR>} <BR>show.document.close(); <BR>} <BR></script>
自定义: -
下面是一个查表(gb2312),转换gb2312到utf8的例子, 现在有iconv函数,这个已经没有太大的意义了,
function gb2utf8($gb){
if(!trim($gb)) return $gb;
$filename="gb2312.txt";
$tmp=file($filename);
$codetable=array();
while(list($key,$value)=each($tmp))
$codetable[hexdec(substr($value,0,6))]=substr($value,7,6);
$utf8="";
while($gb) {
if (ord(substr($gb,0,1))>127) {
$this=substr($gb,0,2);
$gb=substr($gb,2,strlen($gb)-2);
$utf8.=u2utf8(hexdec($codetable[hexdec(bin2hex($this))-0x8080]));
}else{
$this=substr($gb,0,1);
$gb=substr($gb,1,strlen($gb)-1);
$utf8.=u2utf8($this);
}
}
return $utf8;
}
function u2utf8($c){
$str="";
if ($c $str.=$c;
} else if ($c $str.=chr(0xC0 | $c>>6);
$str.=chr(0x80 | $c & 0x3F);
} else if ($c $str.=chr(0xE0 | $c>>12);
$str.=chr(0x80 | $c>>6 & 0x3F);
$str.=chr(0x80 | $c & 0x3F);
} else if ($c $str.=chr(0xF0 | $c>>18);
$str.=chr(0x80 | $c>>12 & 0x3F);
$str.=chr(0x80 | $c>>6 & 0x3F);
$str.=chr(0x80 | $c & 0x3F);
}
return $str;
}
?>

Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

Notepad++7.3.1
Easy-to-use and free code editor

SublimeText3 Chinese version
Chinese version, very easy to use

Zend Studio 13.0.1
Powerful PHP integrated development environment

Dreamweaver CS6
Visual web development tools

SublimeText3 Mac version
God-level code editing software (SublimeText3)

Hot Topics



Unicode is a character encoding standard used to represent various languages and symbols. To convert Unicode encoding to Chinese characters, you can use Python's built-in functions chr() and ord().

In-depth understanding of PHP: Implementation method of converting JSONUnicode to Chinese During development, we often encounter situations where we need to process JSON data, and Unicode encoding in JSON will cause us some problems in some scenarios, especially when Unicode needs to be converted When encoding is converted to Chinese characters. In PHP, there are some methods that can help us achieve this conversion process. A common method will be introduced below and specific code examples will be provided. First, let us first understand the Un in JSON

Are you troubled by Chinese garbled characters in Eclipse? To try these solutions, you need specific code examples 1. Background introduction With the continuous development of computer technology, Chinese plays an increasingly important role in software development. However, many developers encounter garbled code problems when using Eclipse for Chinese development, which affects work efficiency. Then, this article will introduce some common garbled code problems and give corresponding solutions and code examples to help readers solve the Chinese garbled code problem in Eclipse. 2. Common garbled code problems and solution files

JSON (JavaScriptObjectNotation) is a lightweight data exchange format commonly used for data exchange between web applications. When processing JSON data, we often encounter Unicode-encoded Chinese characters (such as "u4e2du6587") and need to convert them into readable Chinese characters. In PHP, we can achieve this conversion through some simple methods. Next, we will detail how to convert JSONUnico

With the development of technologies such as big data and cloud computing, databases have become one of the important cornerstones of enterprise informatization. In applications developed in Java, connecting to MySQL database has become the norm. However, in this process, we often encounter a thorny problem - inconsistent Unicode character set encoding. This will not only affect our development efficiency, but also affect the performance and stability of the application. This article will introduce how to solve this problem and make Java connect to the MySQL database more smoothly. 1. Unicode

The differences between unicode and ascii include different encoding ranges, different storage spaces, and different compatibility. Detailed introduction: 1. The encoding range is different. The encoding range of ASCII is 0-127, which is mainly used to represent English letters. The encoding range of Unicode is much wider and can represent almost all language characters; 2. The storage space is different. ASCII usually Use 1 byte to store a character, while unicode may use 2 or more bytes to store a character; 3. Different compatibility, etc.

Sequential access Sequential access is a basic operation for processing strings in the Java language. Under this approach, each character in the input string is accessed sequentially from beginning to end, or sometimes from end to beginning. This section discusses seven technical examples of creating a 32-bit code point array from a string using sequential access methods and estimates their processing time. Example 1-1: Benchmark (no support for surrogate pairs) Listing 1 assigns a 16-bit char type value directly to a 32-bit code point value, without taking the surrogate pair into account at all: Listing 1. No support for surrogate pairs int[]toCodePointArray(Stringstr) {//Example1-1intlen=str.length();//t

How to convert json unicode to Chinese in PHP: 1. Use the "json_encode($log['result_data'],JSON_UNESCAPED_UNICODE);" method to convert; 2. Use the "function unicodeDecode($unicode_str){...}" method to convert That’s it.
