
How many bytes does the internal code of a Chinese character require?

青灯夜游
Release: 2023-02-08 13:50:33

The internal code of a Chinese character requires 2 bytes to store. In the Chinese character systems commonly used in China, the internal code of one Chinese character occupies 2 bytes. Because a Chinese character processing system must remain compatible with Western text, ambiguity would arise if ASCII codes and Chinese national standard codes coexisted unchanged in the same system. For this reason, the national standard code is transformed to obtain the internal code, typically by setting the highest bit of each byte to 1.


The operating environment of this article: Windows 10, ThinkPad T480.

How many bytes are needed to store the internal code of a Chinese character?

The internal code of a Chinese character requires 2 bytes to store.
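As a rough illustration of the 2-byte claim, the sketch below converts a national standard code into an internal code. It assumes the common convention in which the internal code is the national standard code with the highest bit of each byte set to 1; the helper name `national_to_internal` is hypothetical, and Python's built-in `gb2312` codec is used only to cross-check the result.

```python
def national_to_internal(gb_code: int) -> bytes:
    """Assumed convention: set the highest bit of both bytes of a
    16-bit national standard code to obtain the internal code."""
    high = (gb_code >> 8) | 0x80    # first byte with bit 7 set
    low = (gb_code & 0xFF) | 0x80   # second byte with bit 7 set
    return bytes([high, low])


# The character 啊 has the national standard code 0x3021 in GB2312.
internal = national_to_internal(0x3021)

print(internal.hex())               # b0a1
print(len(internal))                # 2
print(internal.decode("gb2312"))    # 啊

# An ASCII character stays a single byte with the high bit clear,
# so the two kinds of code can be told apart byte by byte.
print("A".encode("gb2312"))         # b'A'
```

Because both bytes of the internal code have their highest bit set, neither can be mistaken for a 7-bit ASCII character, which is the compatibility concern this article describes.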

China's National Bureau of Standards promulgated the "Chinese Coded Character Set for Information Exchange, Basic Set", designated GB2312-80, in May 1981. It encodes a total of 6763 Chinese characters and 682 graphic characters, and its encoding principle is that each Chinese character is represented by two bytes.

In principle, two bytes can represent 256×256=65536 different symbols, which is more than enough as a basis for encoding Chinese characters. However, to remain compatible with other widely used international encodings such as ASCII, China's National Bureau of Standards adopted a modified two-byte scheme that uses only the lower 7 bits of each of the two bytes.

This scheme can accommodate 128×128=16384 different codes. However, to stay compatible with standard ASCII, the 32 control codes (values 0 to 31), the space (value 32), and the delete code (value 127) cannot be used in either byte, so each byte has only 94 usable values. The number of characters that two 7-bit bytes can actually represent is therefore 94×94=8836.
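The arithmetic above can be checked with a short sketch. The variable names are illustrative only; the excluded values follow the standard ASCII layout (control codes 0 to 31, space 32, delete 127).

```python
CONTROL_CODES = set(range(0, 32))  # the 32 ASCII control codes
SPACE = 32                         # ASCII space
DELETE = 127                       # ASCII delete

# Values a single 7-bit byte may take once the reserved codes are removed.
usable = [v for v in range(128)
          if v not in CONTROL_CODES and v not in (SPACE, DELETE)]

print(128 * 128)                   # 16384 codes with two unrestricted 7-bit bytes
print(len(usable))                 # 94 usable values per byte
print(usable[0], usable[-1])       # 33 126
print(len(usable) ** 2)            # 8836 usable two-byte codes

# The 6763 Chinese characters and 682 graphic characters fit in this space.
print(6763 + 682 <= len(usable) ** 2)  # True
```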

source:php.cn