ISO/IEC 6937


ISO/IEC 6937 is a multibyte extension of ASCII, or more precisely of ISO/IEC 646-IRV, developed jointly with ITU-T (then CCITT) for telematic services, where it is known as T.51. Certain byte values serve as lead bytes for letters with diacritics (accents). The lead byte usually indicates which diacritic the letter carries, and the follow byte holds the ASCII value of the base letter. Only certain combinations of lead byte and follow byte are allowed, and for some follow bytes the lead byte is interpreted differently. ISO/IEC 6937 encodes no combining characters, but some free-standing diacritics can be represented, often by using the ASCII space code as the follow byte.

ISO/IEC 6937's architects were Hugh McGregor Ross, Peter Fenwick, Bernard Marti and Luek Zeckendorf.

ISO6937/2 defines 327 characters found in modern European languages that use the Latin alphabet. Non-Latin European scripts, such as Cyrillic and Greek, are not included in the standard. Some diacritics used with the Latin alphabet, such as the Romanian comma, are also missing.

The ISO/IEC 2022 escape sequence to specify the right-hand side of the ISO/IEC 6937 character set is ESC - R (hex 1B 2D 52).[1]
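
As a small illustration, the three bytes of this escape sequence can be written out and searched for in a byte stream. The helper name `designates_iso6937` is hypothetical, and a real ISO/IEC 2022 parser would track far more state than this sketch does.

```python
# ESC - R: the ISO/IEC 2022 designation quoted above (hex 1B 2D 52).
ESC_ISO6937_RIGHT = bytes([0x1B, 0x2D, 0x52])


def designates_iso6937(stream: bytes) -> bool:
    """Return True if the stream contains the ESC - R sequence.

    Hypothetical helper: it only searches for the three bytes and does
    not interpret any other ISO/IEC 2022 escape sequences.
    """
    return ESC_ISO6937_RIGHT in stream
```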


Single byte characters

The primary set of ISO6937/2 is based on ISO646 (characters 0x00..0x7f), except that character 0x24 ($) is designated a "general currency sign" (¤):

        !"#¤%&'()*+,-./0123456789:;<=>?@
        ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`
        abcdefghijklmnopqrstuvwxyz{|}

The supplementary set (characters 0x80..0xff) contains a selection of spacing and non-spacing graphic characters, additional symbols and some locations reserved for future standardisation.

Two byte characters

Characters that are not in the primary set are encoded in two bytes. The first byte, the "non spacing diacritical mark", is followed by a letter from the base set, e.g.:

small e with acute accent (é) = [Acute]+e

In total, 13 diacritical marks can each be followed by selected characters from the primary set:

Accent              Code  Second character          Result
Grave               0xC1  AEIOUaeiou                ÀÈÌÒÙàèìòù
Acute               0xC2  ACEILNORSUYZaceilnorsuyz  ÁĆÉÍĹŃÓŔŚÚÝŹáćéíĺńóŕśúýź
Circumflex          0xC3  ACEGHIJOSUWYaceghijosuwy  ÂĈÊĜĤÎĴÔŜÛŴŶâĉêĝĥîĵôŝûŵŷ
Tilde               0xC4  AINOUainou                ÃĨÑÕŨãĩñõũ
Macron              0xC5  AEIOUaeiou                ĀĒĪŌŪāēīōū
Breve               0xC6  AGUagu                    ĂĞŬăğŭ
Dot                 0xC7  CEGIZcegz                 ĊĖĠİŻċėġż
Umlaut or diæresis  0xC8  AEIOUYaeiouy              ÄËÏÖÜŸäëïöüÿ
Ring                0xCA  AUau                      ÅŮåů
Cedilla             0xCB  CGKLNRSTcgklnrst          ÇĢĶĻŅŖŞŢçģķļņŗşţ
Double acute        0xCD  OUou                      ŐŰőű
Ogonek              0xCE  AEIUaeiu                  ĄĘĮŲąęįų
Caron               0xCF  CDELNRSTZcdelnrstz        ČĎĚĽŇŘŠŤŽčďěľňřšťž
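
The table above can be read as a lookup from (lead byte, second byte) to a single character. The following is a minimal sketch of that idea, assuming only three of the thirteen rows are copied in; the names `ROWS`, `TWO_BYTE` and `decode_pair` are illustrative and not part of the standard.

```python
import unicodedata

# Subset of the table above: lead byte -> (allowed second characters, results).
# Only three rows are copied here; the remaining rows follow the same pattern.
ROWS = {
    0xC1: ("AEIOUaeiou", "ÀÈÌÒÙàèìòù"),  # grave
    0xCA: ("AUau", "ÅŮåů"),              # ring
    0xCD: ("OUou", "ŐŰőű"),              # double acute
}

# Flatten into a (lead byte, second byte) -> character lookup.
# NFC normalisation guarantees one code point per result, so zip() lines up.
TWO_BYTE = {
    (lead, ord(base)): result
    for lead, (bases, results) in ROWS.items()
    for base, result in zip(bases, unicodedata.normalize("NFC", results))
}


def decode_pair(lead: int, second: int) -> str:
    """Decode one two-byte sequence; reject combinations not in the table."""
    try:
        return TWO_BYTE[(lead, second)]
    except KeyError:
        raise ValueError(f"invalid sequence 0x{lead:02X} 0x{second:02X}") from None
```

For example, `decode_pair(0xC1, ord("e"))` returns `"è"`, while `decode_pair(0xCA, ord("e"))` raises `ValueError`, because the ring row lists only A and U as second characters.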

Codepage layout

ISO/IEC 6937:1982[1]
    —0   —1   —2   —3   —4   —5   —6   —7   —8   —9   —A   —B   —C   —D   —E   —F

0−  NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
    0000 0001 0002 0003 0004 0005 0006 0007 0008 0009 000A 000B 000C 000D 000E 000F
    0    1    2    3    4    5    6    7    8    9    10   11   12   13   14   15

1−  DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
    0010 0011 0012 0013 0014 0015 0016 0017 0018 0019 001A 001B 001C 001D 001E 001F
    16   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31

2−  SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
    0020 0021 0022 0023 0024 0025 0026 0027 0028 0029 002A 002B 002C 002D 002E 002F
    32   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47

3−  0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
    0030 0031 0032 0033 0034 0035 0036 0037 0038 0039 003A 003B 003C 003D 003E 003F
    48   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63

4−  @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
    0040 0041 0042 0043 0044 0045 0046 0047 0048 0049 004A 004B 004C 004D 004E 004F
    64   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79

5−  P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
    0050 0051 0052 0053 0054 0055 0056 0057 0058 0059 005A 005B 005C 005D 005E 005F
    80   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95

6−  `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
    0060 0061 0062 0063 0064 0065 0066 0067 0068 0069 006A 006B 006C 006D 006E 006F
    96   97   98   99   100  101  102  103  104  105  106  107  108  109  110  111

7−  p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL
    0070 0071 0072 0073 0074 0075 0076 0077 0078 0079 007A 007B 007C 007D 007E 007F
    112  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127

8−  --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
    --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
    128  129  130  131  132  133  134  135  136  137  138  139  140  141  142  143

9−  --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
    --   --   --   --   --   --   --   --   --   --   --   --   --   --   --   --
    144  145  146  147  148  149  150  151  152  153  154  155  156  157  158  159

A−  NBSP ¡    ¢    £    --   ¥    --   §    ¤    --   --   «    --   --   --   --
    00A0 00A1 00A2 00A3 --   00A5 --   00A7 00A4 --   --   00AB --   --   --   --
    160  161  162  163  164  165  166  167  168  169  170  171  172  173  174  175

B−  °    ±    ²    ³    ×    µ    --   ·    ÷    --   --   »    ¼    ½    ¾    ¿
    00B0 00B1 00B2 00B3 00D7 00B5 --   00B7 00F7 --   --   00BB 00BC 00BD 00BE 00BF
    176  177  178  179  180  181  182  183  184  185  186  187  188  189  190  191

C−  --   ◌̀    ◌́    ◌̂    ◌̃    ◌̄    ◌̆    ◌̇    ◌̈    --   ◌̊    ◌̧    --   ◌̋    ◌̨    ◌̌
    --   0300 0301 0302 0303 0304 0306 0307 0308 --   030A 0327 --   030B 0328 030C
    192  193  194  195  196  197  198  199  200  201  202  203  204  205  206  207

D−  ―    ¹    ®    ©    --   ♪    ¬    ¦    --   --   --   --   ⅛    ⅜    ⅝    ⅞
    2015 00B9 00AE 00A9 --   266A 00AC 00A6 --   --   --   --   215B 215C 215D 215E
    208  209  210  211  212  213  214  215  216  217  218  219  220  221  222  223

E−  Ω    Æ    Đ    ª    Ħ    --   Ĳ    Ŀ    Ł    Ø    Œ    º    Þ    Ŧ    Ŋ    ŉ
    2126 00C6 0110 00AA 0126 --   0132 013F 0141 00D8 0152 00BA 00DE 0166 014A 0149
    224  225  226  227  228  229  230  231  232  233  234  235  236  237  238  239

F−  ĸ    æ    đ    ð    ħ    ı    ĳ    ŀ    ł    ø    œ    ß    þ    ŧ    ŋ    SHY
    0138 00E6 0111 00F0 0127 0131 0133 0140 0142 00F8 0153 00DF 00FE 0167 014B 00AD
    240  241  242  243  244  245  246  247  248  249  250  251  252  253  254  255

Each cell gives the character, its Unicode code point (hexadecimal) and its decimal code; "--" marks a value that is not given in this table.

Character code 0xA0 is a non-breaking space (U+00A0), while code 0xFF is a soft hyphen (U+00AD). The above table uses the Unicode combining accents (U+0300–U+036F) for the accents from codes 0xC1–0xCF, but the translation is not exact; Unicode combining accents suffix the accented character, while ISO/IEC 6937's non-spacing accents prefix the characters they modify.
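
The prefix-versus-suffix difference described above can be sketched in a few lines: the decoder reads the accent byte first, then emits the base letter followed by the Unicode combining mark, and lets NFC normalisation produce the precomposed character. This is only an illustrative sketch under stated assumptions. The names `COMBINING` and `decode_accented` are hypothetical, the mapping is taken from row C of the table above, and the code does not check which lead/follow combinations the standard allows, does not handle the exceptions to the lead byte interpretation or the other supplementary characters, and treats bytes below 0x80 as plain ASCII.

```python
import unicodedata

# Lead byte -> Unicode combining mark, as listed in row C of the table above.
COMBINING = {
    0xC1: "\u0300", 0xC2: "\u0301", 0xC3: "\u0302", 0xC4: "\u0303",
    0xC5: "\u0304", 0xC6: "\u0306", 0xC7: "\u0307", 0xC8: "\u0308",
    0xCA: "\u030A", 0xCB: "\u0327", 0xCD: "\u030B", 0xCE: "\u0328",
    0xCF: "\u030C",
}


def decode_accented(data: bytes) -> str:
    """Decode ASCII bytes and accent-prefixed letters only (a sketch)."""
    out = []
    i = 0
    while i < len(data):
        byte = data[i]
        if byte in COMBINING:
            if i + 1 >= len(data) or data[i + 1] >= 0x80:
                raise ValueError("lead byte not followed by a 7-bit character")
            base = chr(data[i + 1])
            # ISO/IEC 6937 order: accent, then letter.
            # Unicode order: letter, then combining accent.
            out.append(base + COMBINING[byte])
            i += 2
        elif byte < 0x80:
            out.append(chr(byte))
            i += 1
        else:
            raise ValueError(f"byte 0x{byte:02X} is not handled by this sketch")
    # NFC composes letter + combining mark into one code point where one exists.
    return unicodedata.normalize("NFC", "".join(out))
```

For example, `decode_accented(b"caf\xc2e")` returns `"café"`: the acute lead byte 0xC2 precedes the `e` in the input, while the combining acute accent follows it before normalisation.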

References

  1. Supplementary Set of ISO/IEC 6937:1992. The high-ASCII half of the character set. (The left-hand side is U.S. ASCII.)