Detailed explanation of ord($str) > 0x80 in PHP

WBOY
Release: 2016-07-21 15:16:14

GBK, the simplified Chinese character set, encodes each character in either 1 byte or 2 bytes. A first byte in the range 0x00~0x7f is a complete 1-byte character; a first byte of 0x80 or above begins a 2-byte character.
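As a rough illustration of the 1-byte/2-byte rule, the sketch below walks a GBK byte string and counts its characters. The function name gbkCharCount is hypothetical, and the code assumes the input is well-formed GBK in which every byte of 0x80 or above starts a 2-byte character.

```php
<?php
// Hypothetical helper: counts characters in a GBK-encoded byte string,
// assuming any first byte >= 0x80 starts a 2-byte character.
function gbkCharCount(string $bytes): int
{
    $count = 0;
    $len = strlen($bytes);
    for ($i = 0; $i < $len; $count++) {
        // A first byte with the highest bit set is followed by one more byte.
        $i += (ord($bytes[$i]) & 0x80) ? 2 : 1;
    }
    return $count;
}

// "\xD6\xD0" is the GBK encoding of the character 中.
echo gbkCharCount("a\xD6\xD0b"), "\n"; // 3
```

strlen() on the same string returns 4, because it counts bytes rather than characters.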

Note: the values in parentheses are binary.

When a byte is greater than 0x7f, it must be part of a Chinese character (paired with the byte that follows it). How do we check that a byte is definitely greater than 0x7f?
The number after 0x7f (1111111) is 0x80 (10000000), so for a byte to be greater than 0x7f its highest bit must be 1. We only need to check whether the highest bit is 1.

How to check:

Bitwise AND (a result bit is 1 only when the corresponding bits of both operands are 1, otherwise it is 0):
For example: to check whether the third bit of a number is 1, AND it with 4 (100). To check whether the second bit is 1, AND it with 2 (10).
Similarly, to check whether the eighth bit is 1, AND it with (10000000), which is 0x80.
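The masks above can be generated instead of written out by hand. This is a minimal sketch; isBitSet is a hypothetical helper, and it assumes bit positions are counted from 1 starting at the lowest (rightmost) bit, as in the text.

```php
<?php
// Hypothetical helper: $position counts from 1 at the lowest (rightmost) bit.
function isBitSet(int $value, int $position): bool
{
    $mask = 1 << ($position - 1); // 2 -> 2 (10), 3 -> 4 (100), 8 -> 0x80 (10000000)
    return ($value & $mask) !== 0;
}

var_dump(isBitSet(5, 3));    // 5 is 101, third bit is 1: bool(true)
var_dump(isBitSet(5, 2));    // second bit is 0: bool(false)
var_dump(isBitSet(0x7f, 8)); // bool(false)
var_dump(isBitSet(0xD6, 8)); // bool(true)
```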

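Why not simply test > 0x7f here? That works in PHP, where ord() returns a value from 0 to 255. In some strongly typed languages, however, the highest bit of a 1-byte value is the sign bit, so such a byte is read as a negative number, and a negative number can never be greater than 0x7f (the largest value a signed byte can hold).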
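PHP has no signed byte type, but the effect can be imitated with unpack() and its 'c' (signed char) format. The sketch below is only an illustration of the point above; the variable names are arbitrary, and 0xD6 is simply an example byte with the highest bit set.

```php
<?php
$byte = "\xD6"; // example byte with the highest bit set

$unsigned = ord($byte);          // 214
$signed = unpack('c', $byte)[1]; // -42: the same bits read as a signed char

var_dump($unsigned > 0x7f);       // bool(true)
var_dump($signed > 0x7f);         // bool(false): the comparison misses the byte
var_dump(($signed & 0x80) !== 0); // bool(true): the bit test still works
```

The bit test gives the same answer for both readings because it looks only at the eighth bit and ignores how the value as a whole is interpreted.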

Another example:
The ASCII code of a is 97 (1100001)
The ASCII code of A is 65 (1000001)

The ASCII code of b is 98 (1100010)
The ASCII code of B is 66 (1000010)
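The codes listed above can be reproduced with ord() and printf(). This is just a quick check; the choice of the four letters follows the text.

```php
<?php
foreach (['a', 'A', 'b', 'B'] as $letter) {
    $code = ord($letter);
    // %07b prints the code as 7 binary digits.
    printf("%s  %3d  %07b\n", $letter, $code, $code);
}

// XOR keeps only the bits that differ between the two letters.
var_dump((ord('a') ^ ord('A')) === 0x20); // bool(true)
```

The XOR line shows that a and A differ in exactly one bit, 0x20 (100000).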

A rule emerges: for any letter from a to z, the sixth bit is 1 when the letter is lowercase and 0 when it is uppercase. We can use this to determine the case:
Just AND the letter's code with 0x20 (100000) and check the result:

The code is as follows:

if (ord($a) & 0x20) {
    // Lowercase (assuming $a is a letter)
}
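The bit test on its own only makes sense for letters: digits and many punctuation marks also have the sixth bit set (for example, "1" is 49, or 110001). A hedged sketch that adds a letter check first is shown below; isLowerAscii is a hypothetical name, and it assumes a single ASCII character as input.

```php
<?php
// Hypothetical helper: true only for the ASCII lowercase letters a-z.
function isLowerAscii(string $char): bool
{
    $code = ord($char);
    // Forcing the sixth bit on maps A-Z onto a-z, so this checks "is a letter".
    $folded = $code | 0x20;
    $isLetter = $folded >= ord('a') && $folded <= ord('z');
    // For a letter, the sixth bit alone decides the case.
    return $isLetter && ($code & 0x20) !== 0;
}

var_dump(isLowerAscii('a')); // bool(true)
var_dump(isLowerAscii('A')); // bool(false)
var_dump(isLowerAscii('1')); // bool(false), although ord('1') & 0x20 is non-zero
```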

How do we change a letter to uppercase? Just change the 1 in the sixth bit to 0:
The code is as follows:

$a = 'a';
$a = chr(ord($a) & (~0x20)); // clear the sixth bit: 97 becomes 65
echo $a; // A
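To apply the same trick to a whole string, each byte has to be checked first, because clearing the sixth bit of a non-letter produces an unrelated character. The following is a minimal sketch under that assumption; asciiUpper is a hypothetical name, and PHP's built-in strtoupper() is the usual choice in real code.

```php
<?php
// Hypothetical helper: uppercases ASCII a-z only and leaves other bytes alone.
function asciiUpper(string $text): string
{
    $len = strlen($text);
    for ($i = 0; $i < $len; $i++) {
        $code = ord($text[$i]);
        if ($code >= ord('a') && $code <= ord('z')) {
            $text[$i] = chr($code & ~0x20); // clear the sixth bit
        }
    }
    return $text;
}

echo asciiUpper('php 7, gbk!'), "\n"; // PHP 7, GBK!
```

This sketch treats the input as single-byte ASCII. In GBK text the second byte of a 2-byte character can fall in the a-z range, so 2-byte characters would need to be skipped (using the 0x80 test on the first byte) before changing any letters.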

source:php.cn