CJK Unified Ideographs

From Wikipedia, the free encyclopedia

For unified CJK characters, see Han Unification.

CJK Unified Ideographs is a range of Unicode code points assigned to ideographs, that is, Chinese characters as used in Chinese, Japanese and Korean. Since their introduction in Unicode 1.0, the CJK ideographs have been extended across multiple blocks.

Unicode ranges

These ideographic characters appear in the following blocks:

  • CJK Unified Ideographs (4E00–9FFF) (chart)
  • CJK Unified Ideographs Extension A (3400–4DBF) (chart)
  • CJK Unified Ideographs Extension B (20000–2A6DF)
  • Enclosed CJK Letters and Months (3200–32FF) (chart)
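
As a rough illustration of how the ranges above can be used, the following Python sketch maps a character to the listed block that contains it. The names `CJK_BLOCKS` and `cjk_block_of` are hypothetical helpers chosen for this example, and the table covers only the four blocks listed here, not every CJK-related block in Unicode.

```python
from typing import Optional

# Block ranges copied from the list above (inclusive code point bounds).
# This is an illustrative subset, not a complete list of CJK blocks.
CJK_BLOCKS = [
    ("CJK Unified Ideographs", 0x4E00, 0x9FFF),
    ("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF),
    ("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF),
    ("Enclosed CJK Letters and Months", 0x3200, 0x32FF),
]


def cjk_block_of(ch: str) -> Optional[str]:
    """Return the name of the listed block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in CJK_BLOCKS:
        if first <= cp <= last:
            return name
    return None


print(cjk_block_of("\u4e2d"))  # U+4E2D lies in 4E00–9FFF
print(cjk_block_of("A"))       # not in any listed block
```

A linear scan is enough for four ranges; a longer table could be sorted and searched with `bisect` instead.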

Unicode also supports CJKV radicals, strokes, punctuation, marks and symbols. Although some of these characters have (decomposable) counterparts in other blocks, their usage can differ.

Additional compatibility (discouraged use) characters appear in these blocks:

  • CJK Compatibility (3300–33FF) (chart)
  • CJK Compatibility Ideographs (F900–FAFF) (chart)
  • CJK Compatibility Ideographs Supplement (2F800–2FA1F) (chart)
  • CJK Compatibility Forms (FE30–FE4F) (chart)

These compatibility characters are included for compatibility with legacy text handling systems and other legacy character sets. They include forms of characters for vertical text layout, and rich text characters that Unicode recommends handling through other means.
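
One practical consequence of the "discouraged use" status can be seen with Python's standard `unicodedata` module. The sketch below uses U+F900 as an example (a choice made for this illustration, not taken from the text above): it has a canonical decomposition, so normalization replaces it with a code point in the main CJK Unified Ideographs block.

```python
import unicodedata

# U+F900 is a code point in CJK Compatibility Ideographs (F900–FAFF),
# picked here purely as an example.
compat = "\uF900"

# The character database records a canonical decomposition for it.
print(unicodedata.decomposition(compat))  # 8C48

# Normalization (NFC here) substitutes the unified ideograph.
unified = unicodedata.normalize("NFC", compat)
print(f"U+{ord(compat):04X} -> U+{ord(unified):04X}")  # U+F900 -> U+8C48
```

Text that must round-trip to a legacy character set therefore should not be normalized blindly, since the distinction carried by the compatibility code point is lost.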

CJK Compatibility Ideographs

Usually, compatibility characters are those that would not have been encoded except for compatibility and round-trip convertibility with other standards. However, the number of CJK ideographs in any non-Unicode standard is too large to fit into Unicode's CJK Compatibility Ideographs blocks. Instead, code points in these blocks are assigned to characters that the Unicode Consortium has approved but that have not yet been assigned code points within the CJK Unified Ideographs blocks.

Version history

Unicode version | Addition | Plane | Characters | Total characters
1.0 | CJK Unified Ideographs | Basic Multilingual Plane (BMP) | 20,902 | 20,914
1.0 | CJK Compatibility Ideographs | BMP | 12 | 20,914
3.0 | CJK Unified Ideographs Extension A | BMP | 6,582 | 27,496
3.1 | CJK Unified Ideographs Extension B | Supplementary Ideographic Plane (SIP) | 42,711 | 70,207
4.1 | CJK Unified Ideographs: Ideographs from HKSCS-2004 and GB 18030-2000 not in ISO 10646 | BMP | 22 | 70,229
Post 5.1 | CJK Unified Ideographs Extension C | SIP | 4,251 | 74,480
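
The total column in the table above is a running sum of the per-addition counts. A minimal Python sketch that reproduces it is shown below; the list name `ADDITIONS` and the shortened addition labels are choices made for this example.

```python
from itertools import accumulate

# (Unicode version, addition, characters added), copied from the table above.
ADDITIONS = [
    ("1.0", "CJK Unified Ideographs", 20_902),
    ("1.0", "CJK Compatibility Ideographs", 12),
    ("3.0", "Extension A", 6_582),
    ("3.1", "Extension B", 42_711),
    ("4.1", "HKSCS-2004 and GB 18030-2000 additions", 22),
    ("Post 5.1", "Extension C", 4_251),
]

# Cumulative character count after each addition.
running_totals = list(accumulate(count for _, _, count in ADDITIONS))

for (version, addition, count), total in zip(ADDITIONS, running_totals):
    print(f"{version:>8}  {addition:<40} +{count:>6,}  total {total:>6,}")
```

The two Unicode 1.0 rows together give the 20,914 total listed for that version.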

Sources

CJK Unified Ideographs

The code points in this region are assigned under the source separation rule. These characters came from the following sources:

PRC

Code | Standard | Character count | Note
G0 | GB 2312-80 | 6763 |
G1 | GB 12345-90 | 2352 |
G3 | GB 7589-87 traditional Chinese | 7237 |
G5 | GB 7590-87 traditional Chinese | 7039 |
G7 | Modern Chinese general character chart | 642 |
G8 | GB 8565-89 | 290 |

Taiwan

Code | Standard | Character count | Note
T1 | CNS 11643-1986 plane 1 | 5401+9 |
T2 | CNS 11643-1986 plane 2 | 7650 |
TE | CNS 11643-1986 plane 14 | 6319+239+10 | 239 from CCIII, 10 from XCCS
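
Some character counts in these source tables are written as sums such as 6319+239+10. Assuming the "+" simply separates a base count from additional characters (as the note for TE suggests), a small Python helper can compute the total per source. The function name `total_count` is hypothetical.

```python
def total_count(entry: str) -> int:
    """Sum a count written as 'a+b+c', e.g. '6319+239+10'.

    Assumes '+' separates a base count from additional characters.
    """
    return sum(int(part) for part in entry.split("+"))


print(total_count("5401+9"))       # 5410
print(total_count("7650"))         # 7650
print(total_count("6319+239+10"))  # 6568
```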

Japan

Code | Standard | Character count | Note
J0 | JIS X 0208-90 | 6335+1 |
J1 | JIS X 0212-90 | 5801 |

South Korea

Code | Standard | Character count | Note
K0 | KS C 5601-87 | 4888 | includes 268 duplicates
K1 | KS C 5657-91 | 2856 |

Others

In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to code points U+9FA6 through U+9FBB.

CJK Unified Ideographs Extension A

PRC

Code | Standard
GE | GB 16500-95
GS | Singapore CJK ideographs

Taiwan

Code | Standard | Note
T3 | CNS 11643-1992 plane 3 |
T4 | CNS 11643-1992 plane 4 |
T5 | CNS 11643-1992 plane 5 |
T6 | CNS 11643-1992 plane 6 |
T7 | CNS 11643-1992 plane 7 |
TF | CNS 11643-1992 plane 15 |

Japan

Code | Standard | Note
JA | Unified Japanese IT Vendors Contemporary Ideographs, 1993 |

South Korea

Code | Standard | Note
K2 | PKS C 5700-1:1994 |
K3 | PKS C 5700-2:1994 |

Vietnam

Code | Standard | Note
V0 | TCVN 5773:1993 |
V1 | TCVN 6056:1995 |

CJK Unified Ideographs Extension B

CJK Unified Ideographs Extension C

Extension C is currently (January 2007) under ballot within the International Organization for Standardization (ISO) and will be included in some version of Unicode after 5.1. The current allocation is code points U+2A6E0 to U+2B77A. The characters are derived from the following sources:
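
The size of an inclusive code point range is its last value minus its first, plus one. The Python sketch below (with a hypothetical helper `span_size`) applies this to the Extension C allocation stated above and to the U+9FA6 to U+9FBB range used for the Unicode 4.1 additions, and compares the results with the character counts given in the version history.

```python
def span_size(first: int, last: int) -> int:
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


extension_c = span_size(0x2A6E0, 0x2B77A)
unicode_4_1_additions = span_size(0x9FA6, 0x9FBB)

print(extension_c)            # 4251, matching the 4,251 characters listed
print(unicode_4_1_additions)  # 22, i.e. 14 HKSCS-2004 + 8 GB 18030
```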

PRC

Macau

Japan

  • Japanese KOKUJI Collection

South Korea

  • Korean IRG Hanja Character Set 5th Edition: 2001

North Korea

  • KPS 10721:2003

Vietnam

  • Từ điển chữ Nôm (喃字典), Nguyễn Quang Hồng, 2006
  • Từ điển chữ Nôm Tày, Hoàng Triều Ân, 2003
  • Bảng tra chữ Nôm miền Nam, Vũ Văn Kính, 1994

UTC

  • ABC Chinese-English Dictionary, John DeFrancis (德范克), et al., eds., 2nd edition (1998), Honolulu: University of Hawaii Press
  • The Church of Jesus Christ of Latter-Day Saints Hong Kong division
  • Mathews' Chinese-English Dictionary, Robert H. Mathews (1975), Cambridge: Harvard University Press
  • Guangyun
  • Chinese bird system index (中国鸟类系统检索), Zheng Zhuoxin (郑作新), et al. (2000), Beijing: Science Press (科学出版社, www.sciencep.com)
  • Annotated Shuowen Jiezi, Duan Yucai

CJK Unified Ideographs Extension D

According to the CJK editorial group report ISO/IEC JTC1/SC2/WG2/IRG N1266, the extension includes at least characters from the following sources:

Taiwan

  • TD-454E
  • TC-5036
  • TD-624C
  • TD-5352
  • TC-4139
  • TC-4A76
  • TD-5C26

Korea

  • K5H00535
  • K5H00222
  • K5H00297
  • KP1-73E1
  • KP1-712E
  • KP1-70BE
  • KP1-6752
  • KP1-672B
  • KP1-6651
  • KP1-4B50
  • KP1-487E
  • KP1-4731

Vietnam

  • V04-5073

Unicode

  • UTC00103

Others

  • CJK Unified Ideographs Extension C Remainder list
  • Macao SAR (IRGN1249 with minor adjustment)
  • Unicode (IRGN1256 and IRGN1257, 472 characters)
  • China (IRGN1264, 57 characters)

CJK Compatibility Ideographs

See also