GB 2312 Chinese Ideograms Coded Character Set
GB 2312 is an information interchange code for Chinese ideograms under the National Standard of the People's Republic of China. Its full name is "Chinese Ideograms Coded Character Set for Information Interchange - Basic Set", and the standard number is GB 2312-80 (GB abbreviates the Pinyin "GuoBiao", meaning National Standard). It was issued by the State Standard Administration of the People's Republic of China and took effect on May 1, 1981.

Commonly called GuoBiao Code, GB Code or Quwei (Regional Position) Code, GB 2312 is a Simplified Chinese code prevailing in China's mainland, Singapore and some other areas.

GB 2312-80 incorporates simplified Hanzi and some common symbols, serial numbers, numerals, Latin letters, Japanese kana, Greek letters, Russian letters, Pinyin symbols, and zhuyin symbols. There are altogether 7445 graphical characters, of which 682 are non-Hanzi graphical characters and 6763 are Hanzi.

As prescribed in GB 2312-80, "every graphical character should be represented by two bytes, and each byte represented by the 7-bit code in GB 1988-80 and GB 2311-80. Of the two bytes, the one in the front is the first byte, and the following one is the second byte." The first byte is often called the high byte and the second the low byte.

GB 2312-80 divides the code table into 94 Sections, which correspond to the first byte; each Section has 94 Positions, which correspond to the second byte. The value of the first byte is the Section number plus 32 (20H), and the value of the second byte is the Position number plus 32 (20H).
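The Section/Position arithmetic above can be sketched in a few lines of Python. The function names `quwei_to_bytes` and `bytes_to_quwei` are illustrative only and do not come from the standard; the sketch assumes nothing beyond the "plus 32 (20H)" rule and the 1 to 94 range stated above.

```python
OFFSET = 0x20  # 32, added to both the Section and the Position number


def quwei_to_bytes(section, position):
    """Map a Section/Position pair (each 1..94) to the two 7-bit byte values."""
    if not (1 <= section <= 94 and 1 <= position <= 94):
        raise ValueError("Section and Position must each be in 1..94")
    return section + OFFSET, position + OFFSET


def bytes_to_quwei(first, second):
    """Inverse mapping: two 7-bit byte values back to Section/Position."""
    section, position = first - OFFSET, second - OFFSET
    if not (1 <= section <= 94 and 1 <= position <= 94):
        raise ValueError("byte values must each be in 21H..7EH")
    return section, position


first, second = quwei_to_bytes(16, 1)            # Section 16, Position 01
print(format(first, "07b"), format(second, "07b"))  # 0110000 0100001
```

Both bytes therefore always fall in 21H~7EH, the printable range of a 7-bit code.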

According to GB 2312-80, Sections 01~09 (originally defined as Sections 1~9, and now written as Sections 01~09 for easier representation of the Quwei code) are used for symbols and numerals; Sections 16~87 are the Hanzi sections, while Sections 10~15 and Sections 88~94 are vacant positions reserved for further standardization. However, the standard recommends that Section 10 hold the same 94 graphical characters as Section 3 (i.e., the 94 graphical characters of GB 1988-80), displayed at half the width of those in Section 3.

GB 2312-80 divides the Hanzi it contains into two levels. Level 1 includes 3755 frequently used Hanzi, placed in Sections 16~55 and ordered by pinyin and then by stroke form; Level 2 contains 3008 less frequently used Hanzi, placed in Sections 56~87 and ordered by radical and then by stroke. The standard pronunciation is based on the "Initial Draft of the Summary Table of Mandarin Words with Variant Pronunciation Resulting from the Third Pronunciation Review" (published in 1963), issued by the Mandarin Pronunciation Review Committee; the standard font is based on the "Table of Universal Chinese Ideograms for Printing" (published in 1964) by the Ministry of Culture of the PRC and the Chinese Literal Reform Committee.
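As a rough illustration of the layout described above, the following Python sketch classifies a Section number into the ranges named in the text and checks the character counts against the capacity of each range. The function name `classify_section` and the label strings are hypothetical, chosen only for this example.

```python
POSITIONS_PER_SECTION = 94


def classify_section(section):
    """Return a label for the range a Section number (1..94) falls in."""
    if not 1 <= section <= 94:
        raise ValueError("Section must be in 1..94")
    if section <= 9:
        return "symbols and numerals"
    if section <= 15:
        return "vacant"
    if section <= 55:
        return "Hanzi level 1"
    if section <= 87:
        return "Hanzi level 2"
    return "vacant"


# Capacity of each Hanzi range, computed from the Section boundaries.
level1_capacity = (55 - 16 + 1) * POSITIONS_PER_SECTION  # 40 Sections
level2_capacity = (87 - 56 + 1) * POSITIONS_PER_SECTION  # 32 Sections

print(classify_section(16))      # Hanzi level 1
print(level1_capacity - 3755)    # 5
print(level2_capacity - 3008)    # 0
```

The arithmetic shows that the 3008 Level 2 Hanzi fill Sections 56~87 exactly, while the 3755 Level 1 Hanzi leave five Positions of Sections 16~55 unused.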

For example, for the Hanzi "啊", the first byte is 0110000 (30H) and the second byte is 0100001 (21H), i.e., Position 01 of Section 16, which is written as 16-01.

1. Some external Chinese platforms cannot correctly display the vacant positions of Sections 01~15.

2. Positions 0201~0210 in Section 02, 0664~0685 in Section 06 and 0827~0832 in Section 08 were originally defined in GB 2312-80 as vacant positions reserved for further standardization. However, if your operating system is the Simplified Chinese version of Windows 95 (or later), or another system that supports GBK Code or CJK Code, you may in some cases see 10 lowercase Roman numerals, 19 Chinese vertical-text symbols and 6 pinyin symbols at these positions. These are additional symbols from GB 5007.1, GB/T 12345-90 and GBK; they are displayed because the Simplified Chinese version of Windows 95 uses GBK fonts. The same applies to any Hanzi shown in Sections 10, 11 and 12.

3. The code range of GB Code is 2121H~777EH, which overlaps with ASCII Code. To tell the two apart, the highest bit of both bytes of a GB Code is usually set to 1 (MSB=1). The GB Code actually used is therefore this variant with the high-order bit of each byte set to 1, and its code range is A1A1H~F7FEH. E.g., for the Hanzi "啊", the first byte is 10110000 (B0H) and the second byte is 10100001 (A1H).
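The MSB=1 variant described in note 3 can be sketched in Python as follows. The helper names `quwei_to_msb_bytes` and `msb_bytes_to_quwei` are illustrative, not part of any standard or library. The sketch assumes that Python's built-in `gb2312` codec uses this same two-byte MSB=1 form (commonly known as EUC-CN), which makes it convenient for a cross-check.

```python
def quwei_to_msb_bytes(section, position):
    """Encode a Section/Position pair as two bytes with the high bit set.

    Adding 20H gives the 7-bit code; setting the MSB adds a further 80H,
    so the combined offset per byte is A0H.
    """
    if not (1 <= section <= 94 and 1 <= position <= 94):
        raise ValueError("Section and Position must each be in 1..94")
    return bytes([(section + 0x20) | 0x80, (position + 0x20) | 0x80])


def msb_bytes_to_quwei(data):
    """Decode two MSB=1 bytes back to a Section/Position pair."""
    if len(data) != 2 or not all(0xA1 <= b <= 0xFE for b in data):
        raise ValueError("expected two bytes in A1H..FEH")
    return data[0] - 0xA0, data[1] - 0xA0


encoded = quwei_to_msb_bytes(16, 1)
print(encoded.hex().upper())                    # B0A1
print(encoded.decode("gb2312"))                 # 啊
print(msb_bytes_to_quwei("啊".encode("gb2312")))  # (16, 1)
```

Because every byte of this form is A1H or above, it can never be mistaken for a 7-bit ASCII byte in mixed text.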


