TECHNICAL INFORMATION ABOUT UNICODE ON THE WEB

If you are new to Unicode on the web, viewing the source of a Unicode page may be an odd experience.  For example, the source of the first hypothesis of “On the Sizes and Distances of the Sun and Moon” by Aristarchus looks like this:

αʹ. 
Τὴν σελήνην παρὰ τοῦ ἡλίου τὸ φῶς λαμβάνειν.

The browser turns it into this:

αʹ.  Τὴν σελήνην παρὰ τοῦ ἡλίου τὸ φῶς λαμβάνειν.

On this site, each Unicode character is represented by a numeric code reference, which consists of the characters &#, the decimal Unicode value of the character, and a semicolon.  For example, small alpha is assigned the value 945, so α becomes &#945;.

(To make things more complicated, the hexadecimal value can also be used; the format is &#x3B1;, but support in older browsers is not as good.  The JavaScript format is \u03B1.)
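
As a rough illustration of the two formats above, here is a minimal sketch that turns text into numeric code references.  The function name toNumericReferences is made up for this example and is not one of the site's utilities; it also assumes a JavaScript engine that supports for...of and codePointAt (ES2015 or later), and it leaves plain ASCII characters, including &, untouched.

```javascript
// Hypothetical helper: replace every non-ASCII character with a numeric
// code reference, decimal by default or hexadecimal if useHex is true.
function toNumericReferences(text, useHex) {
  let out = "";
  // for...of walks the string one whole character at a time, so values
  // above 0xFFFF are not split into two halves.
  for (const ch of text) {
    const value = ch.codePointAt(0);
    if (value < 128) {
      out += ch; // plain ASCII passes through unchanged
    } else if (useHex) {
      out += "&#x" + value.toString(16).toUpperCase() + ";";
    } else {
      out += "&#" + value + ";";
    }
  }
  return out;
}

console.log(toNumericReferences("\u03B1"));       // &#945;
console.log(toNumericReferences("\u03B1", true)); // &#x3B1;
```

The string "\u03B1" is small alpha written in the JavaScript format, so the example does not depend on how the script file itself is encoded.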

JavaScript

This site began as an exercise in representing some original text of the Greek astronomer Aristarchus on the web.  The font utilities came about as a means of generating Unicode text as above.  If you would like to try your hand at it, the relevant JavaScript methods are charCodeAt(index), which returns the numeric value of the character at a given position in a string; and String.fromCharCode(num1, ..., numN), which turns numbers into a string of characters.  Unfortunately, these methods only work with values up to 0xFFFF, but this accounts for most existing Unicode font ranges.  See the surrogate pair calculator on the utilities page for information about representing values above 0xFFFF.
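
A short sketch of the two methods, plus the standard UTF-16 arithmetic for splitting a value above 0xFFFF into a surrogate pair.  The function name toSurrogatePair is invented for this example; it shows the general calculation and is not the code behind the site's surrogate pair calculator.

```javascript
// Character to number, and number back to character.
const alphaCode = "\u03B1".charCodeAt(0); // 945
const alpha = String.fromCharCode(945);   // small alpha

// Hypothetical helper: split a value above 0xFFFF into the two 16-bit
// values (high and low surrogate) that represent it in a JavaScript string.
function toSurrogatePair(value) {
  const offset = value - 0x10000;
  const high = 0xD800 + (offset >> 10);  // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

// U+1D400 (mathematical bold capital A) lies above 0xFFFF.
const pair = toSurrogatePair(0x1D400);                // [0xD835, 0xDC00]
const boldA = String.fromCharCode(pair[0], pair[1]);  // one visible character
console.log(alphaCode, alpha, pair, boldA.length);    // length is 2, not 1
```

Engines that support ES2015 also provide codePointAt and String.fromCodePoint, which accept values above 0xFFFF directly.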

Hexadecimal notation

Hexadecimal refers to base 16 numbers, as opposed to the familiar base 10 (decimal).  The digits 0-9 and letters A-F are used to represent values 0-15; 10 in hexadecimal is the decimal value 16.  In JavaScript, to indicate a hexadecimal number, precede it with zero x (0x), as in 0x1FA0 for 8096 (the value for ᾠ).  (Do not precede a decimal number with zero, because JavaScript may read a leading zero as octal, or base 8.)  In the JavaScript utilities, either base may be used for character values, as long as the appropriate format is used.  The Unicode range viewer was designed to display both values for each character, as most published maps show only the hexadecimal.
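
The conversions described above can be tried directly.  This is only a sketch using built-in JavaScript features; parseCharValue is a hypothetical name for a helper that accepts either format as typed text, and it is not taken from the site's utilities.

```javascript
// A hexadecimal literal and its decimal value are the same number.
const fromLiteral = 0x1FA0;                        // 8096
const fromText = parseInt("1FA0", 16);             // 8096
const hexText = (8096).toString(16).toUpperCase(); // "1FA0"

// Base 8 for comparison: octal 10 is decimal 8.
const octalTen = parseInt("10", 8);                // 8

// Hypothetical helper: read a character value typed as "8096" or "0x1FA0".
// Number() understands the 0x prefix and does not treat a leading zero
// in a string as octal.
function parseCharValue(text) {
  return Number(text);
}

console.log(fromLiteral, fromText, hexText, octalTen);
console.log(parseCharValue("0x1FA0") === parseCharValue("8096")); // true
```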

Choosing the right charset to make browsers display Unicode fonts

Browsers should always be told what character set is being used on a web page.  This is done via a tag in the head of the page, which for this page consists of

<meta http-equiv="content-type" content="text/html; charset=windows-1252">

If you view the encoding of this page, it should be Western European (Windows) or Western (Windows-1252) if specified.

Windows-1252 is a limited character set which is sufficient for most plain text.  Using this character set, numeric code references must be used to represent Unicode characters as described above.  To include actual Unicode characters themselves in an HTML document, a different character set must be used, and the browser must be “Unicode-enabled.”  Much has been written about this subject elsewhere; in short, the charset value that is usually the most relevant is UTF-8.  However, older browsers were not able to represent Unicode characters in this manner.  I used windows-1252 and numeric code references on these pages because when they were first uploaded, non-Unicode-enabled browsers were more common than they are today.  Fonts are specified with stylesheets because not all users have a default Unicode font specified.

See the UTF-8 version of “On the Sizes and Distances of the Sun and Moon” for an example of a UTF-8-encoded page.  The encoding, if specified, will be listed as UTF-8.  If you view the page source, you will see the Unicode characters, not numeric code references; and if you save the page using Notepad, the encoding will be UTF-8 instead of the default ANSI.  See the surrogate pair calculator on the utilities page for more information.  Feedback is welcome.
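
To see what actually differs between the two kinds of page, this sketch compares the bytes stored for small alpha in each case.  It assumes an environment that provides TextEncoder (current browsers and recent versions of Node.js), which always produces UTF-8; the variable names are only for illustration.

```javascript
const encoder = new TextEncoder(); // always encodes as UTF-8

// In a UTF-8 page, small alpha is stored as two bytes.
const utf8Bytes = encoder.encode("\u03B1");
const utf8Hex = Array.from(utf8Bytes, (b) => b.toString(16).toUpperCase()).join(" ");

// As a numeric code reference, it is six plain ASCII characters,
// which are the same bytes in windows-1252 and in UTF-8.
const referenceBytes = encoder.encode("&#945;");

console.log(utf8Hex);               // CE B1
console.log(referenceBytes.length); // 6
```

The numeric code reference is longer, but it survives in any ASCII-compatible character set, which is why it works on a windows-1252 page.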

Font software

Finally, here are links to some font software if you decide to pursue fontmaking yourself.  (But I warn you, it is more addicting than most illegal drugs.)

Many people's “starter” program is Softy by Dave Emmett.  But you may quickly outgrow it.

The program I originally used to make the glyphs in Aristarcoj was TypeTool from FontLab.  The editing is more advanced than any other program in its price range.

To create the font file itself, I used Font Creator by High Logic.  I now use it for the entire fontmaking process as the editing has become more advanced in later versions.  The only basic function that it lacks is hinting, which can be done with TypeTool if necessary.

Cross Font can be used to convert a .ttf file to a Mac binary which can be opened on a Macintosh.  It is said to work well; but Unicode fonts require Mac OS X, which uses .ttf fonts in their native form, so this utility is not necessary for Unicode fonts.

You may already have some sort of font viewer program, but it is not likely that it will display extended Unicode ranges.  The Unicode range viewer may thus come in handy.

One more must-have utility:  the Microsoft Font properties extension.

Feedback is welcome about any of the information above.