Render PostScript to PDF
We are about to arrive at the pièce de résistance. Adobe started developing PostScript in 1982 for typesetting, and Apple made it popular by using it as the printer language for the LaserWriter in 1985. PostScript was the Rosetta Stone of desktop publishing, and the question arose whether it could serve more widely as a graphics language for documents in general and for display.
But PostScript was, in a sense, too powerful. Because it is a Turing-complete language, a computer rendering the graphics can never know in advance when the page will be ready.
Adobe therefore came up with the PDF format in 1991. PDF is based on PostScript but has a reduced instruction set, mainly the path operators, with the general programming language cut away. This, together with the design choice to embed everything in one document, was a wise decision. PDF is now one of the longest-living document formats, and its public specification makes it suitable for long-term archiving. It is also readable on virtually any device, except that it is not a standard image format in browsers. Actually, it is the only archiving format for documents I would recommend.
Rendering to PDF is, however, quite a challenge. Although PostScript is a text-based format and most of a PDF is text-based too, PDF is technically a binary format. PDF was designed for random access, so that the reader application does not have to read the entire file. This random access needs an index with the byte offset of each object in the file. We will have to handle these offsets. Current PDF also uses compression heavily, but we are not obliged to use it. The PDF 1.0 specification allows all elements to be written as clear text.
The code that follows is the result of a longer iteration of trial and error. I did read the book (Portable Document Format Reference Manual), but I also opened simple PDF files in a text editor, reverse engineered the code, exported files, and checked them first against Preview and later against Acrobat Reader.
The PDF file looks like this:
%PDF-1.1
%•±Î rpn
1 0 obj
<< /Type /Catalog /Pages 2 0 R >>
endobj
2 0 obj
<< /Type /Pages /Kids [ 3 0 R ] /Count 1 /MediaBox [0 0 590 330] >>
endobj
3 0 obj
<< /Type /Page /Parent 2 0 R /Resources << /Font << >> >> /Contents 4 0 R >>
endobj
4 0 obj << /Length 229322 >>
stream
10.5468696 100 m 10.5468696 109.2285109 l...
75.45897133999999 l 354.9705725800001 75.45897133999999 l 354.9705725800001 75.45897133999999 l h
0.100 0.100 0.100 rg
F
endstream
endobj
xref
0 5
0000000000 65535 f
0000000018 00000 n
0000000069 00000 n
0000000154 00000 n
0000000251 00000 n
trailer << /Root 1 0 R /Size 5 >>
startxref
229629
%%EOF
The PDF file is a list of objects. Each object starts with "id generation obj" and ends with "endobj".
The id number is ordinal. The generation number would allow creating multiple versions of the same object without overwriting; we will not use it. Values are separated by spaces and newlines. Each object is a dictionary, written with double angle brackets around a simple list of name keys and values.
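The object and dictionary notation can be sketched as a small serializer. This is a minimal sketch, not the code from ps20240921.js: the helper names pdfValue and pdfObject, and the { name: ... } and { ref: ... } wrappers used to mark names and object references, are assumptions made for illustration.

```javascript
// Hypothetical helpers for illustration only.
// { name: "Catalog" } is written as /Catalog, { ref: 2 } as "2 0 R".
// Any other plain object is written as a dictionary, so this sketch
// cannot represent dictionaries that have a key called "name" or "ref".
function pdfValue(v) {
  if (typeof v === "number") return String(v);
  if (Array.isArray(v)) return "[ " + v.map(pdfValue).join(" ") + " ]";
  if (v && typeof v === "object") {
    if ("name" in v) return "/" + v.name;
    if ("ref" in v) return v.ref + " 0 R"; // generation is always 0 here
    const parts = Object.entries(v).map(
      ([key, value]) => "/" + key + " " + pdfValue(value)
    );
    return parts.length ? "<< " + parts.join(" ") + " >>" : "<< >>";
  }
  throw new Error("unsupported value: " + v);
}

// Wrap a dictionary in "id generation obj ... endobj".
function pdfObject(id, dict) {
  return id + " 0 obj\n" + pdfValue(dict) + "\nendobj\n";
}

const catalog = pdfObject(1, { Type: { name: "Catalog" }, Pages: { ref: 2 } });
console.log(catalog);
```

With these helpers, the catalog of the listing above comes out as the three lines starting with "1 0 obj".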
Each object has a type:
- Object 1 Catalog is the main object that just shows us that the pages list is object 2.
- Object 2 Pages is a list of pages. We have one page which is object 3 and the MediaBox indicates the size of the page.
- Object 3 Page may have resources, here an empty list of fonts, and it names the content of the page, which is object 4.
- Object 4 is the content stream. Its dictionary only holds the length; the stream itself is no dictionary, just reduced PostScript code. The operators are abbreviated:
- m move
- l lineto
- c curveto
- h closepath
- rg setfillcolor
- RG setstrokecolor
- w setlinewidth
- F fill
- S stroke
- Finally, there is a cross reference table. It has 5 entries starting at 0. Each entry holds the byte offset, the generation and the status of an object. Object 0 has the maximum generation (65535) and is free (f); the others have generation 0 and are in use (n). At the very end of the file, startxref gives the byte offset of the xref table.
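The cross reference table can be generated while the objects are written, by keeping a running byte count. The following is a minimal sketch under assumptions: buildPdf is a hypothetical helper (not the function in ps20240921.js), the file is assumed to be saved as UTF-8 so that TextEncoder gives the right byte counts, and the binary comment line of the header is left out. Each xref entry is padded to exactly 20 bytes, as the PDF specification requires.

```javascript
// Hypothetical sketch: number the objects, record their byte offsets,
// then append xref, trailer and startxref.
const encoder = new TextEncoder(); // counts UTF-8 bytes, not characters

function buildPdf(bodies) {
  // bodies[i] is the text between "obj" and "endobj" of object i + 1
  let out = "%PDF-1.1\n";
  let offset = encoder.encode(out).length;
  const offsets = [];
  bodies.forEach((body, i) => {
    const chunk = (i + 1) + " 0 obj\n" + body + "\nendobj\n";
    offsets.push(offset);                    // where this object starts
    out += chunk;
    offset += encoder.encode(chunk).length;  // advance by bytes written
  });
  const size = bodies.length + 1;            // objects plus the free entry 0
  let xref = "xref\n0 " + size + "\n";
  xref += "0000000000 65535 f \n";           // 20 bytes including " \n"
  for (const o of offsets) {
    xref += String(o).padStart(10, "0") + " 00000 n \n";
  }
  out += xref;
  out += "trailer << /Root 1 0 R /Size " + size + " >>\n";
  out += "startxref\n" + offset + "\n%%EOF\n"; // offset now points at "xref"
  return out;
}

const demo = buildPdf([
  "<< /Type /Catalog /Pages 2 0 R >>",
  "<< /Type /Pages /Kids [ ] /Count 0 >>",
]);
console.log(demo);
```

Counting bytes rather than string length matters as soon as a non-ASCII character appears anywhere in the file, because a JavaScript string index and a byte offset then no longer agree.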
We first build an rpnPDFDevice that just creates an empty page, to check whether it can be opened by Preview, and add a link to postScriptEditor. Then we add fill and stroke.
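How fill and stroke map onto the abbreviated operators can be sketched as follows. This is a sketch under assumptions, not the rpnPDFDevice code: the path format (an array of [operator, ...numbers] entries) and the names num, pathOps, fillStream, strokeStream and streamObject are hypothetical. The sketch sets color and line width before the path is built.

```javascript
// Hypothetical path format: [["m", x, y], ["l", x, y],
//                            ["c", x1, y1, x2, y2, x3, y3], ["h"]]

// PDF numbers must not use exponent notation; keep at most 4 decimals.
function num(n) {
  return String(Number(n.toFixed(4)));
}

// Operands come first, the operator last, as in PostScript.
function pathOps(path) {
  return path
    .map(([op, ...args]) => args.map(num).concat(op).join(" "))
    .join(" ");
}

// Fill: set the fill color (rg), build the path, paint with F.
function fillStream(path, rgb) {
  return rgb.map(num).join(" ") + " rg\n" + pathOps(path) + "\nF\n";
}

// Stroke: set stroke color (RG) and line width (w), build the path, paint with S.
function strokeStream(path, rgb, width) {
  return rgb.map(num).join(" ") + " RG\n" + num(width) + " w\n" +
    pathOps(path) + "\nS\n";
}

// Wrap the content in a stream object body; /Length is counted in bytes.
function streamObject(content) {
  const length = new TextEncoder().encode(content).length;
  return "<< /Length " + length + " >>\nstream\n" + content + "endstream";
}

const triangle = [["m", 10, 10], ["l", 100, 10], ["l", 55, 80], ["h"]];
const content = fillStream(triangle, [0.1, 0.1, 0.1]);
console.log(streamObject(content));
```

The result of streamObject is the text that goes between "4 0 obj" and "endobj" in the listing above.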
And the justified text
It should be possible to show the PDF directly with the object or embed tag, but I couldn't get the node to refresh.
We can now create valid SVG and PDF files, but all text is converted to paths. It would be better to handle the text as text, so it can be selected and modified. The files would also be smaller if the path for each character were defined only once. To be able to do that, we will have to add the font to the SVG and to the PDF.
We now have 2631 lines of JavaScript code (88 KB) with no dependencies, capable of interpreting PostScript and rendering gray-level images to canvas, SVG and PDF: ps20240921.js, and a reference installation, minimal5.html.
Bonus: InDesign product manager David Evans on why Adobe created PDF:
https://www.asc.ohio-state.edu/schumacher.60/imageinfo/pdf_ps_eps.html