A quick guide to the boxtracking API in Prince 15

Think inside the boxes

Prince is an HTML-to-PDF-via-CSS converter that offers advanced layout capabilities. To achieve perfection, it is sometimes necessary to analyze the output of the formatting process. You can do so through a JavaScript API, and this guide shows you how. Some knowledge of HTML, CSS and JavaScript is required to follow this tutorial. For a more formal description of the boxtracking API, see the Prince documentation.

Here's a minimal, but complete, document which says hello to the boxtracking API:

<!DOCTYPE html><html><meta charset=utf-8>
<script>
Prince.trackBoxes = true;
Prince.registerPostLayoutFunc(countpages);

function countpages() {
  var pages = PDF.pages;
  console.log("Number of pages: "+pages.length);
}
</script>
<body>
<p>Hello world!
</html>

To format the document yourself, download and install Prince, then run this in a console:

prince -j https://css4.pub/2023/boxtracking/sample-1.html -o foo.pdf

When Prince runs, you will see this in your console:

Number of pages: 1

You can process all examples in this guide in the same manner.

Looking inside

In the first example we just counted the number of elements in the PDF.pages array; this number corresponds to the number of pages in the formatted document. If we look inside the PDF.pages array, we can find all boxes created by Prince in the formatting process:

Prince.trackBoxes = true;
Prince.registerPostLayoutFunc(analyze);

function lookwithin(str,box) {
  console.log(str+box.type); 

  for (var i=0; i<box.children.length; i++) {
    lookwithin(str+"  ",box.children[i]);
  }
}

function analyze() {
  var pages = PDF.pages;

  for (var i = 0; i<pages.length; ++i) {
     console.log("Boxes on page "+(i+1)+":");
     lookwithin("  ", pages[i]);
  }
}

When processed, Prince logs:

Boxes on page 1:
  BODY
    BOX
      BOX
        BOX
          LINE
            TEXT
          LINE
            TEXT

Prince generates different kinds of boxes: BODY, BOX, LINE, TEXT. Boxes are organized in a tree structure, similar to how HTML elements are organized. The two structures will often resemble each other, but one element can generate many boxes due to line breaks and page breaks.

The boxtracking API allows you to analyze this tree of boxes. In the example above, the analyze function is registered with Prince.registerPostLayoutFunc, so Prince calls it after the formatting process has completed.
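
As a variation on the tree walk above, here is a minimal sketch that tallies the boxes on each page by type instead of printing them one by one. The function names countTypes and report are made up for this illustration; the guard around the Prince calls is only there so the helper can also be tried outside Prince.

```javascript
// Tally the boxes in a subtree by their type (BODY, BOX, LINE, TEXT, ...).
// countTypes is a hypothetical helper, not part of the Prince API.
function countTypes(box, counts) {
  counts = counts || {};
  counts[box.type] = (counts[box.type] || 0) + 1;
  var children = box.children || [];
  for (var i = 0; i < children.length; i++) {
    countTypes(children[i], counts);
  }
  return counts;
}

// Log one line per box type for every page.
function report() {
  for (var i = 0; i < PDF.pages.length; i++) {
    var counts = countTypes(PDF.pages[i]);
    for (var type in counts) {
      console.log("Page " + (i + 1) + ": " + counts[type] + " x " + type);
    }
  }
}

// Only hook into Prince when the script actually runs inside Prince.
if (typeof Prince !== "undefined") {
  Prince.trackBoxes = true;
  Prince.registerPostLayoutFunc(report);
}
```

For the box tree logged above, the tally would contain one BODY, three BOX, two LINE and two TEXT boxes.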

Finding tag names and text

By digging a little deeper, we can also find the elements that generated the boxes, and their content:

function lookwithin(str,box) {
  if (box.type=="BOX") { 
     console.log(str+"BOX created by "+box.element.tagName+" element"); 
  } else if (box.type=="TEXT") { 
     console.log(str+"TEXT with content: "+box.text); 
  } else {
     console.log(str+box.type); 
  }

  for (var i=0; i<box.children.length; i++) {
    lookwithin(str+"  ",box.children[i]);
  }
}

function analyze() {
  var pages = PDF.pages;

  for (var i = 0; i<pages.length; ++i) {
     console.log("Boxes on page "+(i+1)+":");
     lookwithin("  ", pages[i]);
  }
}

When processed, Prince logs:

Boxes on page 1:
  BODY
    BOX created by HTML element
      BOX created by BODY element
        BOX created by P element
          LINE
            TEXT with content: Hello
          LINE
            TEXT with content: world!

As you can see, it is quite simple to extract information from the tree of boxes.
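
Building on the same properties (box.type, box.element.tagName and box.text), the following sketch finds the boxes generated by a given tag and reassembles their text. The helpers findBoxesByTag and collectText are hypothetical names for this example, and joining the pieces with a single space is an assumption about how the text was split.

```javascript
// Collect every BOX in the subtree that was generated by an element
// with the given tag name. Hypothetical helper, not part of the Prince API.
function findBoxesByTag(box, tagName, found) {
  found = found || [];
  if (box.type == "BOX" && box.element && box.element.tagName == tagName) {
    found.push(box);
  }
  var children = box.children || [];
  for (var i = 0; i < children.length; i++) {
    findBoxesByTag(children[i], tagName, found);
  }
  return found;
}

// Gather the content of all TEXT boxes below a box, in tree order.
function collectText(box, parts) {
  parts = parts || [];
  if (box.type == "TEXT") {
    parts.push(box.text);
  }
  var children = box.children || [];
  for (var i = 0; i < children.length; i++) {
    collectText(children[i], parts);
  }
  return parts;
}

// Only hook into Prince when the script actually runs inside Prince.
if (typeof Prince !== "undefined") {
  Prince.trackBoxes = true;
  Prince.registerPostLayoutFunc(function () {
    for (var i = 0; i < PDF.pages.length; i++) {
      var boxes = findBoxesByTag(PDF.pages[i], "P");
      for (var j = 0; j < boxes.length; j++) {
        console.log("Page " + (i + 1) + ": " + collectText(boxes[j]).join(" "));
      }
    }
  });
}
```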

End of the line

Consecutive lines ending with the same word may be sub-optimal. Here's a simple script that detects such lines:

var previousLastWord = null;   // declared once, outside lookwithin

  // inside lookwithin(str,box):
  if (box.type == "TEXT") {
     var n = box.text.split(" ");
     var lastWord = n[n.length - 1];

     if (lastWord == previousLastWord) {
        console.log("LINE ENDING ALERT ON PAGE "+box.pageNum+": "+lastWord);
     }
     previousLastWord = lastWord;
  }

When processed, Prince logs:

LINE ENDING ALERT ON PAGE 1: world.

(It should be noted that this example does not work so well when the text is justified. We'll get back to this.)
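
One possible refinement, sketched here under the assumption that a single LINE box may contain more than one TEXT box, is to compare the last word of each LINE rather than of each TEXT box. The names lastWordOfLine and findRepeatedEndings are invented for this example, and the page number is passed in by the caller instead of being read from the box.

```javascript
// Return the last word found in the TEXT boxes below a LINE box, or null.
// Hypothetical helper, not part of the Prince API.
function lastWordOfLine(line) {
  var words = [];
  function gather(box) {
    if (box.type == "TEXT" && box.text) {
      var parts = box.text.split(/\s+/);
      for (var i = 0; i < parts.length; i++) {
        if (parts[i]) words.push(parts[i]);
      }
    }
    var children = box.children || [];
    for (var j = 0; j < children.length; j++) {
      gather(children[j]);
    }
  }
  gather(line);
  return words.length ? words[words.length - 1] : null;
}

// Walk a page and record every line whose last word repeats the
// last word of the previous line. state.previous carries over between calls.
function findRepeatedEndings(box, pageNo, state, alerts) {
  if (box.type == "LINE") {
    var word = lastWordOfLine(box);
    if (word !== null) {
      if (word === state.previous) {
        alerts.push({ page: pageNo, word: word });
      }
      state.previous = word;
    }
    return alerts;
  }
  var children = box.children || [];
  for (var i = 0; i < children.length; i++) {
    findRepeatedEndings(children[i], pageNo, state, alerts);
  }
  return alerts;
}

// Only hook into Prince when the script actually runs inside Prince.
if (typeof Prince !== "undefined") {
  Prince.trackBoxes = true;
  Prince.registerPostLayoutFunc(function () {
    var state = { previous: null };
    for (var i = 0; i < PDF.pages.length; i++) {
      var alerts = findRepeatedEndings(PDF.pages[i], i + 1, state, []);
      for (var j = 0; j < alerts.length; j++) {
        console.log("LINE ENDING ALERT ON PAGE " + alerts[j].page + ": " + alerts[j].word);
      }
    }
  });
}
```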

Where are my elements?

A common task for the boxtracking API is to track boxes generated by certain elements. In this example, a CSS selector (p.foo, p.bar) indicates which elements to track, and the API provides the coordinates of the corresponding boxes.

<p class=foo>Hello world!
<p>Hello world!
<p class=bar>Hello world!
<p>Hello world!

The script extracts the x/y/width/height for all boxes belonging to matching elements:

tagname:  P  classname:  foo
  x:  15.874015748031498
  y:  210.8976377952756
  width:  195.02362204724412
  height:  27.77952755905511
tagname:  P  classname:  bar
  x:  15.874015748031498
  y:  155.3385826771654
  width:  195.02362204724412
  height:  27.77952755905511
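
The script itself is not reproduced in this guide. A minimal sketch of how such a report could be produced follows; it assumes that boxes expose their geometry as x, y, w and h (check the Prince documentation for the exact property names), and describeBoxes is a hypothetical helper name.

```javascript
// Build report lines for the boxes of each element in a list.
// Assumption: box geometry is available as box.x, box.y, box.w and box.h.
function describeBoxes(elements) {
  var lines = [];
  for (var i = 0; i < elements.length; i++) {
    var el = elements[i];
    lines.push("tagname: " + el.tagName + " classname: " + el.className);
    var boxes = el.getPrinceBoxes();
    for (var j = 0; j < boxes.length; j++) {
      lines.push("  x: " + boxes[j].x);
      lines.push("  y: " + boxes[j].y);
      lines.push("  width: " + boxes[j].w);
      lines.push("  height: " + boxes[j].h);
    }
  }
  return lines;
}

// Only hook into Prince when the script actually runs inside Prince.
if (typeof Prince !== "undefined") {
  Prince.trackBoxes = true;
  Prince.registerPostLayoutFunc(function () {
    var lines = describeBoxes(document.querySelectorAll("p.foo, p.bar"));
    for (var i = 0; i < lines.length; i++) {
      console.log(lines[i]);
    }
  });
}
```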

Color me red (but only on page 2, please)

In the previous examples, we have analyzed boxes without making any changes. In this example we will change the styling of certain elements.

We will start with finding the elements, and from there move over to the generated boxes. The script below starts by looking for all p elements in the document. It then uses the Prince-specific getPrinceBoxes function to find the boxes that belong to the elements. If a box appears on page 2, the color of the corresponding p element will be set to red:

function setColor() {
  var elements = document.querySelectorAll("p");
  for(var i=0; i<elements.length; i++) { 
      var boxes = elements[i].getPrinceBoxes();
      for(var j=0; j<boxes.length; j++) { 
         if (boxes[j].pageNum == 2) {
            boxes[j].element.style.color="red";
         }
      }
   }
}

When the script changes the document (in this case by setting the color of an element to red), Prince automatically reruns the formatting process, and we therefore see red text on the second page.

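One common question about the boxtracking API is: how can I change the color (or other properties) of a box? This is not possible. Boxes are the result of formatting and cannot be styled directly; instead, change the element that generated the box, as the script above does through boxes[j].element.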

Elements vs boxes

This example uses the same script as above; it looks for all p elements and the corresponding boxes. However, in this case there is only one p element, which is split into several boxes, each on a different page. Therefore, when the color of the element is changed, all boxes generated by the p element turn red, not only the box on page 2.

function setColor() {
  var elements = document.querySelectorAll("p");
  for(var i=0; i<elements.length; i++) { 
      var boxes = elements[i].getPrinceBoxes();
      for(var j=0; j<boxes.length; j++) { 
         if (boxes[j].pageNum == 2) {
            boxes[j].element.style.color="red";
         }
      }
  }
}
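
To see the difference between elements and boxes for yourself, it can help to list the pages that a single element touches. The sketch below uses only getPrinceBoxes and pageNum from the script above; pagesOf is a hypothetical helper name.

```javascript
// Return the distinct page numbers on which an element has boxes,
// in the order in which they are first encountered.
function pagesOf(element) {
  var boxes = element.getPrinceBoxes();
  var pages = [];
  for (var i = 0; i < boxes.length; i++) {
    var n = boxes[i].pageNum;
    if (pages.indexOf(n) === -1) {
      pages.push(n);
    }
  }
  return pages;
}

// Only hook into Prince when the script actually runs inside Prince.
if (typeof Prince !== "undefined") {
  Prince.trackBoxes = true;
  Prince.registerPostLayoutFunc(function () {
    var elements = document.querySelectorAll("p");
    for (var i = 0; i < elements.length; i++) {
      console.log("p element " + (i + 1) + " appears on pages: " + pagesOf(elements[i]).join(", "));
    }
  });
}
```

An element that reports more than one page here is one whose style change will be visible on all of those pages.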

Dynamic table headers

HTML has a thead element for setting the header of a table. This works well when the header is the same on all pages. However, it is quite common to have a changing header, based on the current section of the table. This is similar to having running headers for chapters in a book.

This somewhat complex example creates dynamic table headers by using the boxtracking API. The strategy is to hide headers, rather than adding them. That is, the original document has duplicate headers on every other row. The script in the document below analyzes the page layout and hides the duplicate headers that do not appear at the top of a page. The elements must be hidden one at a time, and the script rerun after each change. Thankfully, Prince automatically handles such reruns.

<table>
<tr><th colspan=2>Apples
<tr><td>foo<td>bar
<tr class=duplicate><th colspan=2>Apples (cont.)
<tr><td>foo<td>bar
<tr class=duplicate><th colspan=2>Apples (cont.)
<tr><td>foo<td>bar
<tr class=duplicate><th colspan=2>Apples (cont.)
<tr><td>foo<td>bar
<tr><th colspan=2>Pears
<tr><td>foo<td>bar
<tr class=duplicate><th colspan=2>Pears (cont.)
<tr><td>foo<td>bar
<tr class=duplicate><th colspan=2>Pears (cont.)
<tr><td>foo<td>bar
...
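
The script for this example is not shown here. The following is one way the hide-one-at-a-time strategy could be sketched, not the guide's actual script. It assumes that a hidden row produces no boxes, that a duplicate header is "at the top of a page" when the nearest preceding visible row ends on an earlier page, and that Prince runs the function again after each relayout as described above. The names pageRange and hideOneDuplicate are hypothetical.

```javascript
// Lowest and highest page number among an element's boxes,
// or null when the element has no boxes (assumed for hidden rows).
function pageRange(el) {
  var boxes = el.getPrinceBoxes();
  if (boxes.length === 0) return null;
  var first = boxes[0].pageNum;
  var last = boxes[0].pageNum;
  for (var i = 1; i < boxes.length; i++) {
    first = Math.min(first, boxes[i].pageNum);
    last = Math.max(last, boxes[i].pageNum);
  }
  return { first: first, last: last };
}

// Hide the first duplicate header that does not start a page.
// Returns true if a row was hidden, false if nothing needed hiding.
function hideOneDuplicate(rows) {
  var previous = null; // page range of the nearest preceding visible row
  for (var i = 0; i < rows.length; i++) {
    var range = pageRange(rows[i]);
    if (range === null) continue; // already hidden
    if (rows[i].className == "duplicate" &&
        previous !== null && previous.last === range.first) {
      rows[i].style.display = "none";
      return true; // one change per pass, then let layout run again
    }
    previous = range;
  }
  return false;
}

// Only hook into Prince when the script actually runs inside Prince.
// Assumption: Prince calls this again after relayout; if it does not,
// the function would have to be registered again after each change.
if (typeof Prince !== "undefined") {
  Prince.trackBoxes = true;
  Prince.registerPostLayoutFunc(function () {
    hideOneDuplicate(document.querySelectorAll("tr"));
  });
}
```

Hiding only one row per pass matters because every hidden row shifts the rows after it, so page numbers measured before the change are no longer reliable.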

Subtotals

It is common to calculate subtotals on invoices on a per-page basis. Here's a solution which shows subtotals in deferred elements:

<table>
<tr><td>Apples<td class=add>1
<tr><td>Oranges<td class=add>2
<tr><td>Pears<td class=add>1
...
</table>
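
The calculation behind per-page subtotals can be sketched as follows. This only computes the sums; placing them into the deferred elements is left out. It assumes that each td.add cell fits on one page, so the page of its first box is used, and that the cell text is available as textContent. subtotalsByPage is a hypothetical helper name.

```javascript
// Sum the numeric content of the given cells per page.
// Returns an object that maps page number to subtotal.
function subtotalsByPage(cells) {
  var totals = {};
  for (var i = 0; i < cells.length; i++) {
    var boxes = cells[i].getPrinceBoxes();
    if (boxes.length === 0) continue;          // cell produced no boxes
    var value = parseFloat(cells[i].textContent);
    if (isNaN(value)) continue;                // skip non-numeric cells
    var page = boxes[0].pageNum;               // assumption: one page per cell
    totals[page] = (totals[page] || 0) + value;
  }
  return totals;
}

// Only hook into Prince when the script actually runs inside Prince.
if (typeof Prince !== "undefined") {
  Prince.trackBoxes = true;
  Prince.registerPostLayoutFunc(function () {
    var totals = subtotalsByPage(document.querySelectorAll("td.add"));
    for (var page in totals) {
      console.log("Subtotal on page " + page + ": " + totals[page]);
    }
  });
}
```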

2023-08-15 Håkon Wium Lie