Perfecting TrueType parsing

Let's review what happened in the last chapter when we tried to read the TrueType file.

First, a TrueType font is a binary file, so we needed a class rpnBinary to read all the data types that binary files contain: bytes (8 bit), words (16 bit), long words (32 bit), rational numbers (actually 32-bit integers divided by 256*256), strings and dates.
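
As a reminder of what such a reader looks like, here is a minimal sketch built on the standard DataView. The class name BinaryReader and its method names are made up for this illustration and are not the rpnBinary class itself; dates are left out. TrueType stores everything big-endian, which is also the DataView default.

```javascript
// Hypothetical minimal reader, NOT the rpnBinary class from the last chapter.
class BinaryReader {
  constructor(arrayBuffer) {
    this.view = new DataView(arrayBuffer);
    this.pos = 0; // current read position in bytes
  }
  seek(offset) { this.pos = offset; }
  // 8-bit unsigned byte
  readByte() { return this.view.getUint8(this.pos++); }
  // 16-bit unsigned word (big-endian)
  readWord() { const v = this.view.getUint16(this.pos); this.pos += 2; return v; }
  // 16-bit signed word, used for coordinates
  readShort() { const v = this.view.getInt16(this.pos); this.pos += 2; return v; }
  // 32-bit unsigned long word
  readLong() { const v = this.view.getUint32(this.pos); this.pos += 4; return v; }
  // 16.16 fixed point: a signed 32-bit integer divided by 256*256
  readFixed() { const v = this.view.getInt32(this.pos); this.pos += 4; return v / 65536; }
  // fixed-length string of single-byte characters, e.g. a table tag
  readString(length) {
    let s = "";
    for (let i = 0; i < length; i++) s += String.fromCharCode(this.readByte());
    return s;
  }
}

// Usage: a fixed-point 1.5, the tag "glyf" and the signed word -2.
const demoBytes = new Uint8Array([0x00, 0x01, 0x80, 0x00, 0x67, 0x6c, 0x79, 0x66, 0xff, 0xfe]);
const demoReader = new BinaryReader(demoBytes.buffer);
const demoValues = [demoReader.readFixed(), demoReader.readString(4), demoReader.readShort()];
```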

Then, a TrueType file is just a collection of tables, so we first read the header, which gives us at offset 4 a word with the number of tables. The header is 12 bytes long, so starting at offset 12 we read the table index into a dictionary, where each entry is 16 bytes: a type (4 letters), checksum, offset and length. For each table, we can then jump to its offset and start reading. Each type of table has its own structure.
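
The table index can be read in a few lines. The sketch below is an illustration, not the code from the last chapter: the function name readTableDirectory is hypothetical, and it assumes the standard layout where the header takes 12 bytes (a 4-byte version, the number of tables and three search fields), so the first 16-byte entry begins right after it.

```javascript
// Hypothetical helper: reads the table index of a TrueType file
// into a dictionary keyed by the 4-letter table type.
function readTableDirectory(arrayBuffer) {
  const view = new DataView(arrayBuffer); // big-endian by default
  const numTables = view.getUint16(4);    // word at offset 4
  const tables = {};
  for (let i = 0; i < numTables; i++) {
    const p = 12 + i * 16; // entries follow the 12-byte header
    const tag = String.fromCharCode(
      view.getUint8(p), view.getUint8(p + 1),
      view.getUint8(p + 2), view.getUint8(p + 3));
    tables[tag] = {
      checksum: view.getUint32(p + 4),
      offset: view.getUint32(p + 8),
      length: view.getUint32(p + 12),
    };
  }
  return tables;
}
```

With a real font loaded into an ArrayBuffer, readTableDirectory(buffer).head.offset would then be the position to jump to for the head table.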

The tables are deeply interconnected. You need to jump between them to get all the information.

  • Table head: the only values we are interested in are unitsPerEm, which is the scaling factor, and indexToLocFormat, which tells us whether the offsets to the glyphs are 16 bit or 32 bit (for big font files).
  • Table maxp: this gives us the number of glyphs numGlyphs.
  • Table cmap translates character values into positions in the glyph list. This table is quite complicated because there are multiple methods for the translation, depending on platform and encoding. We assume there is at least one method that supports Unicode, on either the Unicode or the Windows platform. This is the famous format 4, which apparently most TrueType fonts support. The table is then a list of segments in which all characters share the same offset. For example, 10-19 is mapped to 0-9, 30-90 to 10-70, 100-127 to 71-98, and 65535-65535 is always mapped to 0.
  • Table hhea: this gives us typographic values like lineGap, ascent and descent, which we do not use yet, but PS Type 3 fonts had them in their dictionary too. It also gives us the number of widths numOfLongHorMetrics. This can be smaller than numGlyphs: when the last glyphs all share the same width, as in a monospaced font, that width is stored only once.
  • Table hmtx gives the widths advanceWidth for each glyph (actually numOfLongHorMetrics times).
  • Table loca gives for each of the numGlyphs glyphs the offset in the glyf table.
  • Table glyf finally gives us the paths for each glyph in a highly compressed format. Every byte is optimised.
    • First, we get the number of paths numberOfContours. A letter like F has only one path, O and R have 2, % has 5.
    • For each path, we then get the index of the final point that closes the path, endPtsOfContours.
    • Then come the instructions, but they are apparently of no use to us, so I ignore them.
    • Then we have the points, and these are highly compressed. Any information that can be deduced is omitted.
    • We first read a one-byte flag for each point and prefill the positions x and y with 0. The flag tells us whether the point is on the curve and whether x or y has the same value as in the previous point. In that case, we have to read only one coordinate and not two.
    • We then read the x and y values with a function readCoords.
    • But actually, we do not get all the points. Normally, a quadratic Bezier curve should always have on-curve points with an off-curve control point between them. However, when you inspect the values, there are many more control points than curve points. This is an extra compression: if two control points follow each other, there is an implied on-curve point exactly halfway between them. This happens quite often because symmetrical control points make well-balanced paths.
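
To make the flag and coordinate bullets above more concrete, here is a sketch of how the point data can be decoded. It follows the flag bits of the glyf table as I understand them (bit 0 on-curve, bits 1 and 2 for one-byte x and y values, bit 3 for repeated flags, bits 4 and 5 for "same as previous" or the sign of a one-byte value). The function names and return shapes are assumptions for this illustration; the readCoords in our code base may be organised differently.

```javascript
// Flag bits of a simple glyph (names chosen for this sketch).
const ON_CURVE = 0x01, X_SHORT = 0x02, Y_SHORT = 0x04, REPEAT = 0x08;
const X_SAME_OR_POSITIVE = 0x10, Y_SAME_OR_POSITIVE = 0x20;

// Reads one flag per point; a flag with REPEAT set is followed by a repeat count.
function readFlags(bytes, pos, numPoints) {
  const flags = [];
  while (flags.length < numPoints) {
    const flag = bytes[pos++];
    flags.push(flag);
    if (flag & REPEAT) {
      let count = bytes[pos++];
      while (count-- > 0 && flags.length < numPoints) flags.push(flag);
    }
  }
  return { flags, pos };
}

// Reads one axis. Values are stored as deltas relative to the previous point.
function readCoords(bytes, pos, flags, shortBit, sameBit) {
  const coords = [];
  let value = 0; // positions start at 0
  for (const flag of flags) {
    if (flag & shortBit) {
      // one unsigned byte; sameBit carries the sign (set means positive)
      const delta = bytes[pos++];
      value += (flag & sameBit) ? delta : -delta;
    } else if (!(flag & sameBit)) {
      // signed 16-bit big-endian delta
      let delta = (bytes[pos] << 8) | bytes[pos + 1];
      if (delta & 0x8000) delta -= 0x10000;
      pos += 2;
      value += delta;
    }
    // otherwise: same value as the previous point, nothing is stored
    coords.push(value);
  }
  return { coords, pos };
}
```

All x values come first, then all y values, so readCoords is called twice: once with X_SHORT and X_SAME_OR_POSITIVE, and once with Y_SHORT and Y_SAME_OR_POSITIVE, continuing at the position where the x values ended.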

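Going back to the cmap bullet in the list above, the segment lookup of format 4 can be sketched as follows. This is a simplification under stated assumptions: it only handles segments that map through a constant offset (the idDelta of the format) and ignores the second mechanism with per-character glyph arrays. The segment values are illustrative, in the spirit of the example in the list, with offsets chosen so that the glyph ranges do not overlap. The function name glyphIndexFor is made up.

```javascript
// Illustrative segments: every character in a segment shares one offset (delta).
const exampleSegments = [
  { start: 10, end: 19, delta: -10 },      // 10-19   -> 0-9
  { start: 30, end: 90, delta: -20 },      // 30-90   -> 10-70
  { start: 100, end: 127, delta: -29 },    // 100-127 -> 71-98
  { start: 0xffff, end: 0xffff, delta: 1 } // closing segment -> 0
];

// Finds the first segment whose end is >= charCode (segments are sorted by end).
function glyphIndexFor(charCode, segments) {
  for (const seg of segments) {
    if (charCode <= seg.end) {
      if (charCode < seg.start) return 0;      // falls into a gap: missing glyph
      return (charCode + seg.delta) & 0xffff;  // arithmetic is modulo 65536
    }
  }
  return 0; // beyond the last segment
}
```

Glyph 0 is the "missing character" glyph, so every code that falls between two segments ends up there.
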
Now, what does our test operator showttf do? For testing purposes, we just defined a global font variable that holds the current font. We will do this properly with findfont, scalefont and definefont when we include it in the workflow. For now, we just want to see the font rendered.

For each character, it finds the glyph and constructs PostScript code to render it.

  • It loops over the point list.
  • It adds a moveto for the first point.
  • If the next point is on-curve, it adds a lineto.
  • If it is off-curve, it keeps the control point in a bezier variable.
  • Now, if the next point is on-curve, it uses the bezier and adds a qcurveto (which we will define below).
  • If it is off-curve, then we calculate the middle point, add a qcurveto to it and keep the new control point as the bezier.
  • In each round, we must check whether we have reached the end of a contour; if so, we close the path with closepath and start a new one with moveto. (Frankly, we could also have split the points into contours before the loop.)
  • At the end, we fill.
  • The entire code must be enclosed in a graphics context, where we translate to the current point and scale. At the end, we need to advance the current point by the width.
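
The loop described above can be sketched like this. It is an illustration under assumptions, not the showttf code itself: the function name glyphToPostScript and the point shape {x, y, onCurve} are made up, the operand order "cx cy x y qcurveto" is assumed, every contour is assumed to start with an on-curve point, and the contours are split before the loop. The graphics context with translate and scale is left out.

```javascript
// points: [{x, y, onCurve}], endPts: index of the last point of each contour.
function glyphToPostScript(points, endPts) {
  const ps = [];
  let first = 0;
  for (const last of endPts) {
    const contour = points.slice(first, last + 1);
    first = last + 1;
    if (contour.length === 0) continue;
    const start = contour[0]; // assumed to be on-curve
    ps.push(`${start.x} ${start.y} moveto`);
    let bezier = null; // pending off-curve control point
    for (const pt of contour.slice(1)) {
      if (pt.onCurve) {
        if (bezier) {
          ps.push(`${bezier.x} ${bezier.y} ${pt.x} ${pt.y} qcurveto`);
          bezier = null;
        } else {
          ps.push(`${pt.x} ${pt.y} lineto`);
        }
      } else {
        if (bezier) {
          // two control points in a row: implied on-curve point halfway between
          const mx = (bezier.x + pt.x) / 2;
          const my = (bezier.y + pt.y) / 2;
          ps.push(`${bezier.x} ${bezier.y} ${mx} ${my} qcurveto`);
        }
        bezier = pt;
      }
    }
    // a pending control point curves back to the start of the contour
    if (bezier) ps.push(`${bezier.x} ${bezier.y} ${start.x} ${start.y} qcurveto`);
    ps.push("closepath");
  }
  ps.push("fill");
  return ps.join("\n");
}
```

Splitting by endPts first keeps the inner loop free of the end-of-contour check, at the cost of one slice per contour.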

The qcurveto is the quadratic variant of the curveto operator. As a matter of fact, every quadratic Bezier curve can be represented exactly by a cubic one. Given the start point, the end point and the quadratic control point, the two cubic control points lie 2/3 of the way from each endpoint towards the quadratic control point.
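
The 2/3 rule is easy to check numerically. The sketch below is only an illustration in JavaScript, not the definition of the qcurveto operator; the names quadToCubic, quadAt and cubicAt are made up.

```javascript
// Converts a quadratic Bezier (p0, q, p2) into the control points of the
// equivalent cubic Bezier (p0, c1, c2, p2).
function quadToCubic(p0, q, p2) {
  return {
    c1: { x: p0.x + (2 / 3) * (q.x - p0.x), y: p0.y + (2 / 3) * (q.y - p0.y) },
    c2: { x: p2.x + (2 / 3) * (q.x - p2.x), y: p2.y + (2 / 3) * (q.y - p2.y) },
  };
}

// Point on the quadratic curve at parameter t (0..1).
function quadAt(p0, q, p2, t) {
  const u = 1 - t;
  return {
    x: u * u * p0.x + 2 * u * t * q.x + t * t * p2.x,
    y: u * u * p0.y + 2 * u * t * q.y + t * t * p2.y,
  };
}

// Point on the cubic curve at parameter t (0..1).
function cubicAt(p0, c1, c2, p3, t) {
  const u = 1 - t;
  return {
    x: u * u * u * p0.x + 3 * u * u * t * c1.x + 3 * u * t * t * c2.x + t * t * t * p3.x,
    y: u * u * u * p0.y + 3 * u * u * t * c1.y + 3 * u * t * t * c2.y + t * t * t * p3.y,
  };
}
```

For the curve from (0, 0) to (6, 0) with control point (3, 3), quadToCubic gives the cubic control points (2, 2) and (4, 2), and quadAt and cubicAt agree for every t up to rounding.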

This seems to render quite well.

How is our rainbow?

You need only four segments for a decent circle made of cubic Bezier curves.
With quadratic Bezier curves, this is clumsy. You need 8 points. I found the values by trial and error.
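
The control points can also be calculated instead of guessed. The sketch below uses the usual tangent construction, which is not necessarily what trial and error arrives at: each control point sits where the tangents of the two neighbouring on-curve points cross, that is, at distance r / cos(half the segment angle) from the centre. The function name quadraticCircle and its return shape are made up for this illustration.

```javascript
// Approximates a circle with quadratic Bezier segments.
// Returns the start point and, per segment, a control point and an end point.
function quadraticCircle(cx, cy, r, segments = 8) {
  const step = (2 * Math.PI) / segments;
  // tangents of neighbouring on-curve points cross at this distance
  const controlRadius = r / Math.cos(step / 2);
  const arcs = [];
  for (let i = 0; i < segments; i++) {
    const midAngle = (i + 0.5) * step;
    const endAngle = (i + 1) * step;
    arcs.push({
      control: {
        x: cx + controlRadius * Math.cos(midAngle),
        y: cy + controlRadius * Math.sin(midAngle),
      },
      end: {
        x: cx + r * Math.cos(endAngle),
        y: cy + r * Math.sin(endAngle),
      },
    });
  }
  return { start: { x: cx + r, y: cy }, arcs };
}
```

With 8 segments, the control points lie at about 1.0824 times the radius, and the middle of each segment bulges out by less than half a percent of the radius.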

We are now ready to replace the PostScript Type 3 fonts with TrueType fonts. Actually, we can still decide to convert them to PostScript Type 3 fonts, since the TrueType code we wrote here produces PostScript code.

We will wait until the next chapter to include this in the code base.

My Journey to PostScript