CJK Unified Ideographs
From Wikipedia, the free encyclopedia
- For unified CJK characters, see Han Unification.
CJK Unified Ideographs is a range of Unicode code points assigned to the ideographs used in Chinese, Japanese and Korean (CJK) writing. Since their introduction in Unicode 1.0, CJK ideographs have been extended across multiple blocks.
Unicode ranges
These ideographic characters appear in the following blocks:
- CJK Unified Ideographs (4E00–9FFF) (chart)
- CJK Unified Ideographs Extension A (3400–4DBF) (chart)
- CJK Unified Ideographs Extension B (20000–2A6DF)
- Enclosed CJK Letters and Months (3200–32FF) (chart)
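As a rough illustration, the ranges listed above can be turned into a simple lookup. This is a minimal sketch: the table `IDEOGRAPH_BLOCKS` and the helper `block_of` are hypothetical names introduced here, and the ranges are copied from the list above rather than read from the Unicode Character Database.

```python
# Block ranges copied from the list above (inclusive bounds).
# IDEOGRAPH_BLOCKS and block_of are illustrative names, not a standard API.
IDEOGRAPH_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("Enclosed CJK Letters and Months", 0x3200, 0x32FF),
]


def block_of(char):
    """Return the name of the listed block containing char, or None."""
    code_point = ord(char)
    for name, first, last in IDEOGRAPH_BLOCKS:
        if first <= code_point <= last:
            return name
    return None


print(block_of("中"))          # U+4E2D, in the main block
print(block_of("\u3400"))      # first code point of Extension A
print(block_of("\U00020000"))  # first code point of Extension B
print(block_of("A"))           # outside every listed block, so None
```

A linear scan is enough for four ranges; a longer block table would more likely use a sorted list with binary search.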
Unicode also supports CJKV radicals, strokes, punctuation, marks and symbols. Although some of these characters have (decomposable) counterparts in other blocks, their usage can differ:
- Kangxi Radicals (2F00–2FDF)
- CJK Radicals Supplement (2E80–2EFF)
- CJK Symbols and Punctuation (3000–303F) (chart)
- CJK Strokes (31C0–31EF)
- Ideographic Description Characters (2FF0–2FFF)
Additional compatibility (discouraged use) characters appear in these blocks:
- CJK Compatibility (3300–33FF) (chart)
- CJK Compatibility Ideographs (F900–FAFF) (chart)
- CJK Compatibility Ideographs Supplement (2F800–2FA1F) (chart)
- CJK Compatibility Forms (FE30–FE4F) (chart)
These compatibility characters are included for compatibility with legacy text-handling systems and other legacy character sets. They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means.
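One common way to map compatibility characters to their preferred equivalents is Unicode normalization. The sketch below uses Python's standard `unicodedata` module; it assumes normalization is the desired handling, which the text above does not state, and the two sample code points are chosen here only for illustration.

```python
import unicodedata

# U+F900 lies in the CJK Compatibility Ideographs block (F900–FAFF).
# It has a canonical decomposition, so NFC already replaces it.
compat = "\uF900"
unified = unicodedata.normalize("NFC", compat)
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")  # U+F900 -> U+8C48

# U+3300 lies in the CJK Compatibility block (3300–33FF).
# It only has a compatibility decomposition, so NFKC is needed to expand it.
square = "\u3300"
expanded = unicodedata.normalize("NFKC", square)
print(repr(square), "->", repr(expanded))
```

Normalization discards the distinction between the compatibility character and its equivalent, so it is unsuitable where round-trip conversion to a legacy character set must be preserved.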
CJK Compatibility Ideographs
Usually, compatibility characters are those that would not have been encoded except for compatibility and round-trip convertibility with other standards. However, the number of CJK ideographs in non-Unicode standards is too large to fit into Unicode's CJK Compatibility Ideographs blocks. Instead, code points are assigned when the affected characters have been approved by the Unicode Consortium but have not yet been assigned code points within the CJK Unified Ideographs blocks.
Version history
| Unicode version | Addition | Plane | Characters | Total Characters |
|---|---|---|---|---|
| 1.0 | CJK Unified Ideographs | Basic Multilingual Plane (BMP) | 20,902 | 20,914 |
| 1.0 | CJK Compatibility Ideographs | BMP | 12 | 20,914 |
| 3.0 | CJK Unified Ideographs Extension A | BMP | 6,582 | 27,496 |
| 3.1 | CJK Unified Ideographs Extension B | Supplementary Ideographic Plane (SIP) | 42,711 | 70,207 |
| 4.1 | CJK Unified Ideographs: Ideographs from HKSCS-2004 and GB 18030-2000 not in ISO 10646 | BMP | 22 | 70,229 |
| Post 5.1 | CJK Unified Ideographs Extension C | SIP | 4,251 | 74,480 |
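The running totals in the table can be checked with a short cumulative sum. This is a minimal sketch: the counts are copied from the table above, and the shortened row labels are introduced here for readability.

```python
from itertools import accumulate

# (Unicode version, addition, characters added), copied from the table above.
additions = [
    ("1.0", "CJK Unified Ideographs", 20902),
    ("1.0", "CJK Compatibility Ideographs", 12),
    ("3.0", "Extension A", 6582),
    ("3.1", "Extension B", 42711),
    ("4.1", "HKSCS-2004 and GB 18030-2000 additions", 22),
    ("Post 5.1", "Extension C", 4251),
]

# Cumulative character count after each addition.
running_totals = list(accumulate(count for _, _, count in additions))

for (version, name, count), total in zip(additions, running_totals):
    print(f"{version:>8}  {name:<40} {count:>7,} {total:>7,}")
```

The final value, 74,480, matches the last row of the table.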
Sources
CJK Unified Ideographs
The code points in this region are assigned under the Source Separation Rule. The characters come from the following sources:
PRC
| Code | Standard | Character count | Note |
|---|---|---|---|
| G0 | GB 2312-80 | 6763 | |
| G1 | GB 12345-90 | 2352 | |
| G3 | GB 7589-87 traditional Chinese | 7237 | |
| G5 | GB 7590-87 traditional Chinese | 7039 | |
| G7 | Modern Chinese general character chart | 642 | |
| G8 | GB 8565-89 | 290 | |
Taiwan
| Code | Standard | Character count | note |
|---|---|---|---|
| T1 | CNS 11643-1986 plane 1 | 5401+9 | |
| T2 | CNS 11643-1986 plane 2 | 7650 | |
| TE | CNS 11643-1986 plane 14 | 6319+239+10 | 239 from CCCII, 10 from XCCS |
Japan
| Code | Standard | Character count | note |
|---|---|---|---|
| J0 | JIS X 0208-90 | 6335+1 | |
| J1 | JIS X 0212-90 | 5801 | |
South Korea
| Code | Standard | Character count | note |
|---|---|---|---|
| K0 | KS C 5601-87 | 4888 | includes 268 duplicates |
| K1 | KS C 5657-91 | 2856 | |
Others
- ANSI Z39.64-1989
- Big5
- CCCII plane 1
- GB 12052-89
- JEF
- Chinese telegraph code
- Taiwan telegraph code
- Xerox Chinese
In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to the code points U+9FA6 through U+9FBB.
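The size of this range can be checked against the two character counts with simple arithmetic. The sketch below assumes the range is inclusive at both ends; the variable names are introduced here for illustration.

```python
# Inclusive range of code points assigned in Unicode 4.1, per the text above.
first, last = 0x9FA6, 0x9FBB
range_size = last - first + 1

# Character counts per source, per the text above.
hkscs_count, gb18030_count = 14, 8

print(range_size)                                  # 22
print(range_size == hkscs_count + gb18030_count)   # True
```

The result agrees with the 22 characters recorded for Unicode 4.1 in the version history table.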
CJK Unified Ideographs Extension A
PRC
| Code | Standard |
|---|---|
| GE | GB 16500-95 |
| GS | Singapore CJK ideographs |
Taiwan
| Code | Standard | note |
|---|---|---|
| T3 | CNS 11643-1992 plane 3 | |
| T4 | CNS 11643-1992 plane 4 | |
| T5 | CNS 11643-1992 plane 5 | |
| T6 | CNS 11643-1992 plane 6 | |
| T7 | CNS 11643-1992 plane 7 | |
| TF | CNS 11643-1992 plane 15 | |
Japan
| Code | Standard | note |
|---|---|---|
| JA | Unified Japanese IT Vendors Contemporary Ideographs, 1993 | |
South Korea
| Code | Standard | note |
|---|---|---|
| K2 | PKS C 5700-1:1994 | |
| K3 | PKS C 5700-2:1994 | |
Vietnam
| Code | Standard | note |
|---|---|---|
| V0 | TCVN 5773:1993 | |
| V1 | TCVN 6056:1995 | |
CJK Unified Ideographs Extension B
- Kangxi dictionary
- Hanyu character dictionary
- Ciyuan
- Cihai
- Hanyu word dictionary
- Encyclopedia of China
- Beijing University Founder DTP
- Siku Quanshu
- HKSCS
- JIS X 0213 planes 3 and 4
- PKS 5700-3:1998
- KPS 9566-97, KPS 10721-2000
- CNS 11643 planes 4-7, 15
- TCVN, VHN 01:1998, VHN 02:1998
CJK Unified Ideographs Extension C
As of January 2007, Extension C is under ballot within the International Organization for Standardization (ISO) and will be included in some version of Unicode after 5.1. The current allocation is the code points U+2A6E0 to U+2B77A. The characters are derived from the following sources:
PRC
- Encyclopedia of China
- Beijing University Founder DTP
- Hanyu character dictionary
- Hanyu word dictionary
- Old hanyu word dictionary
- Commercial Press Ideographs
- Xiandaihanyu Cidian
- Cihai
- Kangxi dictionary
- Chinese Academy of Surveying & Mapping
- Modern Chinese Dialect Encyclopedia
- Yin Zhou jinwen jicheng yinde (殷周金文集成引得)
Macau
Japan
- Japanese KOKUJI Collection
South Korea
- Korean IRG Hanja Character Set 5th Edition: 2001
North Korea
- KPS 10721:2003
Vietnam
- Từ điển chữ Nôm (喃字典), Nguyễn Quang Hồng, 2006
- Từ điển chữ Nôm Tày, Hoàng Triều Ân, 2003
- Bảng tra chữ Nôm miền Nam, Vũ Văn Kính, 1994
UTC
- ABC Chinese-English Dictionary, John DeFrancis (德范克), et al., eds., 2nd edition (1998), Honolulu: University of Hawaii Press
- The Church of Jesus Christ of Latter-Day Saints Hong Kong division
- Mathews' Chinese-English Dictionary, Robert H. Mathews (1975), Cambridge: Harvard University Press
- Guangyun
- Chinese bird system index (中国鸟类系统检索), Zheng Zuoxin (郑作新), et al. (2000), Beijing: Science Press (科学出版社, www.sciencep.com)
- Annotated Shuowen Jiezi, Duan Yucai
CJK Unified Ideographs Extension D
According to the CJK editorial group report ISO/IEC JTC1/SC2/WG2/IRG N1266, Extension D contains at least the characters from the following sources:
Taiwan
- TD-454E
- TC-5036
- TD-624C
- TD-5352
- TC-4139
- TC-4A76
- TD-5C26
Korea
- K5H00535
- K5H00222
- K5H00297
- KP1-73E1
- KP1-712E
- KP1-70BE
- KP1-6752
- KP1-672B
- KP1-6651
- KP1-4B50
- KP1-487E
- KP1-4731
Vietnam
- V04-5073
Unicode
- UTC00103
Others
- CJK Unified Ideographs Extension C Remainder list
- Macao SAR (IRGN1249 with minor adjustment)
- Unicode (IRGN1256 and IRGN1257, 472 characters)
- China (IRGN1264, 57 characters)

