GNU Unifont

GNU Unifont, created by Roman Czyborra, is a free bitmap font intended to cover the Unicode Basic Multilingual Plane (BMP). Its glyphs are maintained in an intermediate bitmapped font format. It is distributed with most free operating systems and windowing systems, such as Linux, XFree86, and the X.Org Server. The font is released under the GNU General Public License.

History

In 1998, Roman Czyborra observed that no font covered the entire Unicode Basic Multilingual Plane (BMP). The BMP comprises the first 65,536 code points of Unicode and includes most modern scripts. He began work on a free bitmapped font, open to contributions from others, with the aim of producing a complete BMP font. His goal was for a display device to be able to show some meaningful representation of every character in the BMP.

To this end, he developed a simple intermediate hexadecimal bitmap format, along with tools to convert it to and from an ASCII representation of the bitmap. He also developed a utility to convert files in this hexadecimal format to Adobe Glyph Bitmap Distribution Format (BDF) files for use with the X Window System on Unix systems.

Status

The Unicode Basic Multilingual Plane covers 2^16 = 65,536 code points. Of this number, 4,096 are reserved for special use as surrogate pairs and 6,400 are reserved for private use. This leaves approximately 55,000 code points to which glyphs can be assigned. Some of these code points are special values that do not have an assigned glyph, but most do have assigned glyphs.

Of the approximately 55,000 code points that can have assigned glyphs, the GNU Unifont contains about 41,000 glyphs as of December 2007. This leaves 14,000 glyphs to be added for complete BMP coverage. The existing glyphs cover all code points for ASCII, Latin-1, Latin A and B extensions, Greek, Cyrillic, Armenian, and Hebrew. Most Arabic glyphs are defined. Most Chinese-Japanese-Korean (CJK) glyphs are defined, including Hangul.

The following table shows font coverage as a percentage of each script that is complete as of the end of 2007. Scripts that are less than 100% complete can be augmented by any contributor.

Covered Range Script
100.0% U+0000..U+007F C0 Controls and Basic Latin
100.0% U+0080..U+00FF C1 Controls and Latin-1 Supplement
100.0% U+0100..U+017F Latin Extended-A
100.0% U+0180..U+024F Latin Extended-B
100.0% U+0250..U+02AF IPA Extensions
100.0% U+02B0..U+02FF Spacing Modifier Letters
100.0% U+0300..U+036F Combining Diacritical Marks
100.0% U+0370..U+03FF Greek and Coptic
100.0% U+0400..U+04FF Cyrillic
100.0% U+0500..U+052F Cyrillic Supplement
100.0% U+0530..U+058F Armenian
100.0% U+0590..U+05FF Hebrew
79.3% U+0600..U+06FF Arabic
3.8% U+0700..U+074F Syriac
37.5% U+0750..U+077F Arabic Supplement
21.9% U+0780..U+07BF Thaana
7.8% U+07C0..U+07FF N'Ko
100.0% U+0800..U+08FF Unassigned
95.3% U+0900..U+097F Devanagari
96.9% U+0980..U+09FF Bengali
39.1% U+0A00..U+0A7F Gurmukhi
35.2% U+0A80..U+0AFF Gujarati
36.7% U+0B00..U+0B7F Oriya
44.5% U+0B80..U+0BFF Tamil
37.5% U+0C00..U+0C7F Telugu
32.8% U+0C80..U+0CFF Kannada
39.1% U+0D00..U+0D7F Malayalam
37.5% U+0D80..U+0DFF Sinhala
100.0% U+0E00..U+0E7F Thai
100.0% U+0E80..U+0EFF Lao
82.0% U+0F00..U+0FFF Tibetan
51.2% U+1000..U+109F Myanmar
55.2% U+10A0..U+10FF Georgian
32.4% U+1100..U+11FF Hangul Jamo
97.9% U+1200..U+137F Ethiopic
18.8% U+1380..U+139F Ethiopic Supplement
100.0% U+13A0..U+13FF Cherokee
75.5% U+1400..U+167F Unified Canadian Aboriginal Syllabics
100.0% U+1680..U+169F Ogham
100.0% U+16A0..U+16FF Runic
37.5% U+1700..U+171F Tagalog
28.1% U+1720..U+173F Hanunoo
37.5% U+1740..U+175F Buhid
43.8% U+1760..U+177F Tagbanwa
10.9% U+1780..U+17FF Khmer
11.9% U+1800..U+18AF Mongolian
100.0% U+18B0..U+18FF Unassigned
17.5% U+1900..U+194F Limbu
27.1% U+1950..U+197F Tai Le
16.7% U+1980..U+19DF New Tai Lue
0.0% U+19E0..U+19FF Khmer Symbols
6.2% U+1A00..U+1A1F Buginese
0.0% U+1A20..U+1AFF Unassigned
5.5% U+1B00..U+1B7F Balinese
100.0% U+1B80..U+1CFF Unassigned
0.0% U+1D00..U+1D7F Phonetic Extensions
0.0% U+1D80..U+1DBF Phonetic Extensions Supplement
79.7% U+1DC0..U+1DFF Combining Diacritical Marks Supplement
100.0% U+1E00..U+1EFF Latin Extended Additional
100.0% U+1F00..U+1FFF Greek Extended
80.4% U+2000..U+206F General Punctuation
87.5% U+2070..U+209F Superscripts and Subscripts
87.5% U+20A0..U+20CF Currency Symbols
75.0% U+20D0..U+20FF Combining Diacritical Marks for Symbols
75.0% U+2100..U+214F Letterlike Symbols
98.4% U+2150..U+218F Number Forms
89.3% U+2190..U+21FF Arrows
94.5% U+2200..U+22FF Mathematical Operators
57.4% U+2300..U+23FF Miscellaneous Technical
100.0% U+2400..U+243F Control Pictures
100.0% U+2440..U+245F Optical Character Recognition
86.9% U+2460..U+24FF Enclosed Alphanumerics
100.0% U+2500..U+257F Box Drawing
68.8% U+2580..U+259F Block Elements
91.7% U+25A0..U+25FF Geometric Shapes
73.4% U+2600..U+26FF Miscellaneous Symbols
92.7% U+2700..U+27BF Dingbats
18.8% U+27C0..U+27EF Miscellaneous Mathematical Symbols-A
0.0% U+27F0..U+27FF Supplemental Arrows-A
100.0% U+2800..U+28FF Braille Patterns
0.0% U+2900..U+297F Supplemental Arrows-B
0.0% U+2980..U+29FF Miscellaneous Mathematical Symbols-B
0.0% U+2A00..U+2AFF Supplemental Mathematical Operators
87.9% U+2B00..U+2BFF Miscellaneous Symbols and Arrows
2.1% U+2C00..U+2C5F Glagolitic
46.9% U+2C60..U+2C7F Latin Extended-C
10.9% U+2C80..U+2CFF Coptic
20.8% U+2D00..U+2D2F Georgian Supplement
31.2% U+2D30..U+2D7F Tifinagh
17.7% U+2D80..U+2DDF Ethiopic Extended
100.0% U+2DE0..U+2DFF Unassigned
79.7% U+2E00..U+2E7F Supplemental Punctuation
10.2% U+2E80..U+2EFF CJK Radicals Supplement
4.5% U+2F00..U+2FDF Kangxi Radicals
100.0% U+2FE0..U+2FEF Unassigned
100.0% U+2FF0..U+2FFF Ideographic Description Characters
62.5% U+3000..U+303F CJK Symbols and Punctuation
94.8% U+3040..U+309F Hiragana
97.9% U+30A0..U+30FF Katakana
93.8% U+3100..U+312F Bopomofo
100.0% U+3130..U+318F Hangul Compatibility Jamo
0.0% U+3190..U+319F Kanbun
25.0% U+31A0..U+31BF Bopomofo Extended
66.7% U+31C0..U+31EF CJK Strokes
0.0% U+31F0..U+31FF Katakana Phonetic Extensions
32.4% U+3200..U+32FF Enclosed CJK Letters and Months
39.1% U+3300..U+33FF CJK Compatibility
0.2% U+3400..U+4DBF CJK Unified Ideographs Extension A
0.0% U+4DC0..U+4DFF Yijing Hexagram Symbols
86.9% U+4E00..U+9FBF CJK Unified Ideographs
100.0% U+9FC0..U+9FFF Unassigned
0.3% U+A000..U+A48F Yi Syllables
14.1% U+A490..U+A4CF Yi Radicals
100.0% U+A4D0..U+A6FF Unassigned
15.6% U+A700..U+A71F Modifier Tone Letters
99.1% U+A720..U+A7FF Latin Extended-D
8.3% U+A800..U+A82F Syloti Nagri
100.0% U+A830..U+A83F Unassigned
12.5% U+A840..U+A87F Phags-pa
100.0% U+A880..U+ABFF Unassigned
100.0% U+AC00..U+D7AF Hangul Syllables
0.0% U+D7B0..U+D7FF Unassigned
0.0% U+D800..U+DFFF Surrogate Pairs - Not Used
100.0% U+E000..U+F8FF Private Use Area
61.5% U+F900..U+FAFF CJK Compatibility Ideographs
98.8% U+FB00..U+FB4F Alphabetic Presentation Forms
36.8% U+FB50..U+FDFF Arabic Presentation Forms-A
0.0% U+FE00..U+FE0F Variation Selectors
37.5% U+FE10..U+FE1F Vertical Forms
100.0% U+FE20..U+FE2F Combining Half Marks
87.5% U+FE30..U+FE4F CJK Compatibility Forms
100.0% U+FE50..U+FE6F Small Form Variants
99.3% U+FE70..U+FEFF Arabic Presentation Forms-B
77.1% U+FF00..U+FFEF Halfwidth and Fullwidth Forms
68.8% U+FFF0..U+FFFF Specials


unifont.hex Font Format

The GNU Unifont .hex format defines its glyphs as either 8 or 16 pixels in width by 16 pixels in height. Most Western script glyphs can be defined as 8 pixels wide, while other glyphs (notably the Chinese-Japanese-Korean, or CJK set) are typically defined as 16 pixels wide.

The unifont.hex file contains one line for each glyph. Each line consists of a four-digit hexadecimal Unicode code point, a colon, and the bitmap string. The bitmap string is 32 hexadecimal digits long for an 8-pixel-wide glyph or 64 hexadecimal digits long for a 16-pixel-wide glyph.
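
The line layout described above can be sketched in a few lines of Python. This is only an illustration, not one of the Unifont utilities; the function name `parse_hex_line` is hypothetical.

```python
def parse_hex_line(line):
    """Split one .hex line into (code_point, width, rows).

    Hypothetical helper for illustration; not part of the Unifont tools.
    """
    code_text, bitmap = line.strip().split(":")
    if len(bitmap) not in (32, 64):
        raise ValueError("bitmap must be 32 or 64 hexadecimal digits")
    width = 8 if len(bitmap) == 32 else 16
    digits_per_row = width // 4  # 2 digits for 8 pixels, 4 digits for 16
    rows = [int(bitmap[i:i + digits_per_row], 16)
            for i in range(0, len(bitmap), digits_per_row)]
    return int(code_text, 16), width, rows


code_point, width, rows = parse_hex_line("0041:0000000018242442427E424242420000")
print(hex(code_point), width, len(rows))  # prints: 0x41 8 16
```

Either glyph width yields 16 rows, because the height is fixed at 16 pixels and only the number of digits per row changes.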

A '1' bit in the bit string corresponds to an 'on' pixel. Pixel bits are stored row by row from top to bottom, and from left to right within each row.
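
As a minimal sketch of this bit ordering, the following Python function looks up a single pixel in a bitmap string. It assumes that the most significant bit of each row is the leftmost pixel, which is consistent with the capital 'A' example in this article. The name `pixel_is_on` is hypothetical.

```python
def pixel_is_on(bitmap_hex, x, y):
    """Return True if the pixel at column x, row y is on.

    Coordinates are 0-based, counted from the top-left corner.
    Hypothetical helper for illustration only.
    """
    width = 8 if len(bitmap_hex) == 32 else 16
    digits_per_row = width // 4
    start = y * digits_per_row
    row = int(bitmap_hex[start:start + digits_per_row], 16)
    # Assumption: the most significant bit of a row is its leftmost pixel.
    return bool((row >> (width - 1 - x)) & 1)


A = "0000000018242442427E424242420000"
print(pixel_is_on(A, 3, 4), pixel_is_on(A, 0, 4))  # prints: True False
```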

The font is then converted into a BDF file for use with the X Window System.
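
To give a rough idea of what such a conversion involves, the sketch below turns one .hex line into a single BDF character record. It is not the actual conversion utility. The function name, the `SWIDTH` values, and the 2-row descent are assumptions made for illustration, and a complete BDF file also needs a font header and trailer that are omitted here.

```python
def hex_line_to_bdf_char(line, descent=2):
    """Build one BDF character record from a .hex line.

    Hypothetical sketch: SWIDTH values and the descent are assumed,
    and the surrounding BDF font header is not generated.
    """
    code_text, bitmap = line.strip().split(":")
    width = 8 if len(bitmap) == 32 else 16
    digits_per_row = width // 4
    rows = [bitmap[i:i + digits_per_row]
            for i in range(0, len(bitmap), digits_per_row)]
    record = [
        "STARTCHAR U+" + code_text,
        "ENCODING {}".format(int(code_text, 16)),
        "SWIDTH {} 0".format(500 if width == 8 else 1000),  # assumed values
        "DWIDTH {} 0".format(width),
        "BBX {} 16 0 -{}".format(width, descent),  # 16 rows tall
        "BITMAP",
    ]
    record.extend(rows)  # one line of hexadecimal digits per pixel row
    record.append("ENDCHAR")
    return "\n".join(record)


print(hex_line_to_bdf_char("0041:0000000018242442427E424242420000"))
```

Because a BDF bitmap is also written as one line of hexadecimal digits per row, the bitmap string only needs to be cut into row-sized pieces.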

Example

This is an example font containing one glyph, for ASCII capital 'A'.

0041:0000000018242442427E424242420000

The first number is the hexadecimal Unicode code point, with range 0000 through FFFF. Hexadecimal 0041 is decimal 65, the code point for the letter 'A'. The colon separates the code point from the bitmap. In this example, the glyph is 8 pixels wide, so the bit string is 32 hexadecimal digits long.

The bit string begins with 8 zeroes, so the top 4 rows are empty (2 hexadecimal digits per 8-bit byte, and one byte per row for an 8-pixel-wide glyph). The bit string also ends with 4 zeroes, so the bottom 2 rows are empty. This implies that the default font descender extends 2 rows below the baseline and the capital height is 10 rows above the baseline. This is the case for Latin glyphs in the GNU Unifont.
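
The row counts in this paragraph can be checked with a short Python sketch. The function name `blank_margins` is hypothetical, and the code only counts empty rows; it does not know where the baseline is.

```python
def blank_margins(bitmap_hex, width=8):
    """Return (empty rows at the top, empty rows at the bottom).

    Hypothetical helper for illustration only.
    """
    digits_per_row = width // 4
    rows = [int(bitmap_hex[i:i + digits_per_row], 16)
            for i in range(0, len(bitmap_hex), digits_per_row)]
    inked = [i for i, row in enumerate(rows) if row != 0]
    if not inked:  # completely blank glyph
        return len(rows), 0
    return inked[0], len(rows) - 1 - inked[-1]


top, bottom = blank_margins("0000000018242442427E424242420000")
print(top, bottom, 16 - top - bottom)  # prints: 4 2 10
```

The third value is the number of rows the letter itself occupies, which matches the capital height of 10 rows.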

The hexdraw Perl script produces the following output from the one-line glyph definition above:

0041:   --------
        --------
        --------
        --------
        ---##---
        --#--#--
        --#--#--
        -#----#-
        -#----#-
        -######-
        -#----#-
        -#----#-
        -#----#-
        -#----#-
        --------
        --------
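
A drawing in this style can be approximated in Python as follows. This is a sketch of the idea rather than a port of hexdraw; the function name `draw_glyph` and the fixed 8-character left margin are assumptions based on the output shown above.

```python
def draw_glyph(line):
    """Render one .hex line as rows of '-' (off) and '#' (on) pixels.

    Hypothetical approximation of the layout shown in this article.
    """
    code_text, bitmap = line.strip().split(":")
    width = 8 if len(bitmap) == 32 else 16
    digits_per_row = width // 4
    out = []
    for i in range(0, len(bitmap), digits_per_row):
        row = int(bitmap[i:i + digits_per_row], 16)
        bits = format(row, "0{}b".format(width))  # leftmost pixel first
        pixels = bits.replace("0", "-").replace("1", "#")
        label = code_text + ":" if i == 0 else ""  # code point on first row only
        out.append(label.ljust(8) + pixels)
    return "\n".join(out)


print(draw_glyph("0041:0000000018242442427E424242420000"))
```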

This can be edited in a text editor, then converted back into a hex string with the same utility. The goal was to create an intermediate format that would facilitate adding new glyphs.
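
The reverse direction, from an edited drawing back to a hex string, might look like the following Python sketch. It assumes the drawing uses '-' and '#' exactly as shown in the example and that the code point appears before a colon on the first row. The name `drawing_to_hex` is hypothetical.

```python
def drawing_to_hex(drawing):
    """Convert a '-'/'#' drawing back into one .hex line.

    Hypothetical helper; assumes the layout shown in this article.
    """
    lines = drawing.splitlines()
    code_text = lines[0].split(":")[0].strip()
    digits = []
    for line in lines:
        pixels = line.split(":")[-1].strip()  # drop the label and margin
        if not pixels:
            continue
        bits = pixels.replace("-", "0").replace("#", "1")
        # 4 pixels per hexadecimal digit, zero-padded to the row width
        digits.append(format(int(bits, 2), "0{}X".format(len(bits) // 4)))
    return code_text + ":" + "".join(digits)


rows = ["--------", "--------", "--------", "--------",
        "---##---", "--#--#--", "--#--#--", "-#----#-",
        "-#----#-", "-######-", "-#----#-", "-#----#-",
        "-#----#-", "-#----#-", "--------", "--------"]
drawing = "\n".join(["0041:   " + rows[0]] + ["        " + r for r in rows[1:]])
print(drawing_to_hex(drawing))  # prints: 0041:0000000018242442427E424242420000
```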
