Working with fonts
We have made a long journey writing our PostScript interpreter, and we are finally ready to approach text. Text is what PostScript documents render most: almost every layout contains some.
Handling text in PostScript is both simple and complex. Drawing text is nothing more than defining paths and filling them with black. Text uses the same bezier and scanFill algorithms as any other shape.
But defining the shape of every letter in every document would be tedious. That's where fonts come in. A font is a collection of paths, one for each character. These paths are called glyphs.
Where do we get the fonts from? Fonts are files provided in different formats. At the introduction of PostScript, most fonts were Type 1 fonts, a format promoted by Adobe and adopted by most foundries because it was encrypted. This will not work for us. We could use TrueType fonts, but their format is very complex (hello Microsoft!). OpenType is just Type 1 and TrueType in an envelope. We could use SVG fonts, where the paths are provided in XML format, but for a reason I cannot understand, probably also to protect foundries, they are hard to come by.
But there are also Type 3 fonts. At the introduction of PostScript, Adobe offered this lower-end format to all foundries that did not want to license Type 1. Type 3 fonts are just PostScript code.
For our interpreter, we will use the open source font family Computer Modern. It was created by Donald E. Knuth, and it has an amazing history. Knuth is a pioneer of computer programming and the author of the book series The Art of Computer Programming, which describes virtually all fundamental algorithms that have been created. The series is supposed to have 7 volumes. The first volume was published in 1968, volume 4A in 2011, and volume 5 is expected for 2030. If you are interested in programming, you should read them. I have never seen explanations as good.
When Knuth published volume 1, the book was printed in lead type. Typesetting mathematical formulas was complicated manual work with Monotype machines. And forget about a typewriter font for code. All lead fonts were proportional.
The book production was too expensive for the publisher. Given that the book sold fewer copies than the spy novels by Ian Fleming, the publisher wanted a cheaper production for the subsequent volumes. He proposed that Knuth type the formulas on a typewriter, photograph the paper and use a cliché (a printing plate) made from the photograph for the book. The graphical quality of the formulas was much lower.
Knuth could not accept such a compromise, and he heard about recent progress in phototypesetting. This method had existed since the forties and made it possible to create type optically. In the sixties, with the progress of television, experiments were made with displaying the type on a cathode-ray tube and photographing it. This turns type into an electronic signal, and from there into a bitmap. Once the resolution was good enough (1000 lines per inch, much more than our interpreter), the quality of the type was the same as that of lead type.
So a computer provided a digital image based on a bitmap, which itself is just a sequence of 0 for black and 1 for white. With this, typesetting became an algorithmic problem. This is something Knuth could handle.
Knuth spent much of his time not only writing a typesetting program (TeX) but also creating a font generator (Metafont), which led to the Computer Modern font family. Knuth studied the geometry of the Renaissance fonts, which were derived from mathematical principles. But he also had to learn the hard way that the masters followed these principles only loosely, because purely mathematical fonts weren't so pretty. To design beautiful curves, he created an algorithm based on cubic splines. For the first font versions, he didn't know about Bézier.
A TrueType version of the font is available at https://www.fontsquirrel.com/fonts/computer-modern
We download the fonts and convert them to Type 3 fonts with FontForge (ignore the warnings about Em-Size). Note that the Metafont fonts have a separate version for each point size. We use the 12 point size.
Let's take a closer look at CMUSerif-Roman.ps.
The font file starts with some comments and then creates a nested dictionary. We will comment on the entries that are of interest to us.
- /FontType (Type 3)
- /FontMatrix [0.000488281 0 0 0.000488281 0 0 ] is the transformation matrix for point size 1. All path coordinates are scaled down by a factor of 2048 (0.000488281 is 1/2048).
- /FontName /CMUSerif-Roman
- /FontBBox {-2324 -723 2982 1915 }
- /PaintType 0 def
- /FontInfo
- /version (0.7.0)
- /Notice (Converted by Andrey V. Panov from TeX fonts. Some glyphs are copied from Blue Sky fonts released by AMS.)
- /FullName (CMU Serif Roman)
- /FamilyName (CMU Serif)
- /Weight (Medium)
- /FSType 4
- /ItalicAngle 0
- /isFixedPitch
- /UnderlinePosition -307
- /UnderlineThickness 102
- /Encoding has 256 entries
- 13 /uni000
- 32 /space
- 33 /exclam ...
- 255 /ydieresis
- /BuildChar { 1 index /Encoding get exch get 1 index /BuildGlyph get exec }
- /BuildGlyph { 2 copy exch /CharProcs get exch 2 copy known not { pop /.notdef} if get exch pop 0 exch exec pop pop fill}
- /CharProcs are the actual paths
- /.notdef { 1556 0 20 20 1536 1567 setcachedevice 20 20 moveto 20 1567 lineto ... closepath }
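Note that each path starts with a setcachedevice operator. This operator declares the metrics of the glyph with 6 values: wx wy llx lly urx ury. wx and wy are the advance of the glyph, that is, how far the current point moves after the character has been drawn (wy is 0 for horizontal text). llx, lly is the lower left corner and urx, ury the upper right corner of the glyph's bounding box.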
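To make the numbers concrete, here is a minimal sketch of how the six setcachedevice values of /.notdef could be combined with the FontMatrix to get sizes in points. The helper names (glyphMetrics, transform, dtransform, scaleMetrics) are made up for this illustration and are not part of the interpreter. The sketch assumes the advance (wx, wy) is a displacement, so the translation part of the matrix is not applied to it.

```javascript
// Values copied from the CMUSerif-Roman listing above.
const fontMatrix = [0.000488281, 0, 0, 0.000488281, 0, 0];

// Hypothetical helper: give names to the six setcachedevice operands.
function glyphMetrics(wx, wy, llx, lly, urx, ury) {
  return { wx, wy, llx, lly, urx, ury };
}

// Apply a PostScript matrix [a b c d tx ty] to a point.
function transform(m, x, y) {
  return [m[0] * x + m[2] * y + m[4], m[1] * x + m[3] * y + m[5]];
}

// Apply the matrix to a displacement: same as above, without tx and ty.
function dtransform(m, dx, dy) {
  return [m[0] * dx + m[2] * dy, m[1] * dx + m[3] * dy];
}

// Convert glyph units to points for a given font size.
function scaleMetrics(metrics, matrix, size) {
  const [ax, ay] = dtransform(matrix, metrics.wx, metrics.wy);
  const [x0, y0] = transform(matrix, metrics.llx, metrics.lly);
  const [x1, y1] = transform(matrix, metrics.urx, metrics.ury);
  return {
    advance: [ax * size, ay * size],
    bbox: [x0 * size, y0 * size, x1 * size, y1 * size],
  };
}

const notdef = glyphMetrics(1556, 0, 20, 20, 1536, 1567);
const scaled = scaleMetrics(notdef, fontMatrix, 12);
console.log(scaled.advance[0].toFixed(2)); // horizontal advance in points at 12 pt
console.log(scaled.bbox.map((v) => v.toFixed(2)));
```

At 12 points, the 1556 units of advance come out at a little over 9 points, which is what we would expect from 1556 / 2048 * 12.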
Note also that Type 3 fonts do not allow for kerning.
At the end, definefont binds the dictionary to the font name CMUSerif-Roman.
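The BuildChar and BuildGlyph procedures in the listing above amount to a two-step lookup: character code to glyph name through Encoding, then glyph name to procedure through CharProcs, with /.notdef as the fallback. The following is a rough JavaScript sketch of that logic only. The toy font object, its placeholder strings and the function names buildChar and buildGlyph are assumptions for illustration; in the real font the procedure found is executed and the resulting path is filled.

```javascript
// A toy font with the same keys as the listing.
// The glyph "procedures" are placeholder strings, not real outlines.
const font = {
  Encoding: new Array(256).fill(".notdef"), // assumption: unused codes map to .notdef
  CharProcs: {
    ".notdef": "notdef outline",
    space: "space outline",
    exclam: "exclam outline",
  },
};
font.Encoding[32] = "space";
font.Encoding[33] = "exclam";
font.Encoding[255] = "ydieresis"; // encoded, but missing from this toy CharProcs

// BuildGlyph: fetch the procedure for a glyph name,
// falling back to .notdef when the name is not known.
function buildGlyph(font, name) {
  const known = Object.prototype.hasOwnProperty.call(font.CharProcs, name);
  return font.CharProcs[known ? name : ".notdef"];
}

// BuildChar: translate the character code to a glyph name
// through Encoding, then delegate to BuildGlyph.
function buildChar(font, code) {
  return buildGlyph(font, font.Encoding[code]);
}

console.log(buildChar(font, 33));  // "exclam outline"
console.log(buildChar(font, 255)); // "notdef outline"
```

The second call shows the fallback: the code 255 is encoded as ydieresis, but since the toy CharProcs has no such entry, the .notdef glyph is used instead.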
To use the font, we embed it in a JavaScript file and define the variable rpnFiles["CMUSerif-Roman"]. The operator run will then execute the PostScript code stored under that name.
Let's start implementing the operators we need: array, bind, definefont, for, index, readonly, run, setcachedevice. We also need to add fontdict to the context and an rpnDictionary class.
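As a rough sketch of the idea, the embedding could look like the snippet below. Only the name rpnFiles and the key "CMUSerif-Roman" come from the text. The placeholder source, the signature of run and the stand-in interpreter callback are assumptions made for this illustration, and the real operator may well be structured differently.

```javascript
// Registry of embedded files: file name -> PostScript source text.
const rpnFiles = {};

// In the real file this string would hold the whole converted font.
// Here it is only a short placeholder.
rpnFiles["CMUSerif-Roman"] =
  "% placeholder for the converted font source\n/dummy 1 def";

// Hypothetical run: look up the embedded source by name and hand it
// to an interpreter callback.
function run(name, interpret) {
  if (!Object.prototype.hasOwnProperty.call(rpnFiles, name)) {
    throw new Error("undefinedfilename: " + name);
  }
  return interpret(rpnFiles[name]);
}

// Stand-in for the interpreter: drop comment lines and split into tokens.
const tokens = run("CMUSerif-Roman", (source) =>
  source
    .split("\n")
    .filter((line) => !line.startsWith("%"))
    .join(" ")
    .trim()
    .split(/\s+/)
);
console.log(tokens); // [ '/dummy', '1', 'def' ]
```

Keeping the sources in a plain object means the interpreter does not need file access: loading a font is just a lookup by name.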