How to Debug Weird PDFs

The PDF specification is very complex. The PDF Reference has 1,310 pages, and we regularly need to refer to the chapters on Graphics, Text, and Interactive Features.

We've been working on DocSpring's PDF filling engine for quite a while, and we've built up a very comprehensive set of test cases. We've processed millions of documents and have come across a lot of seriously strange PDFs. Every time we come across a malformed PDF or some edge case that crashes our server, we add a new test case and figure out how to fix the problem. Our servers don't crash so often these days, but PDF tools and editors are always finding new ways to produce some truly bizarre files. Here are some of the tools we use to track down PDF bugs and edge cases.

Conditional Numeric Editable Checkboxes in a Radio Group

To figure out why a bug is happening, a good first step is to create a minimal, reproducible example. Here's an example of a bug we encountered:

  • The template contains two fields with the same name
  • The fields have the "Number" data type
  • The display type is set to "checkbox"
  • The fields have a numeric equality condition
  • The PDF is rendered with editable form fields

If the number field was set to either 1 or 2, the rendered PDF contained two empty checkboxes, when one of them should have been checked.

This worked correctly when the field type was "String", with "1" and "2" as options.

To figure out what was going on, I generated two PDFs: one with the fields set to the Number type, and one with the String type. Then I opened one of my favorite PDF debugging tools, the iText RUPS PDF Inspector.

I opened the PDF with string fields, clicked File => "Compare with", and then chose the PDF with number fields.
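Conceptually, a comparison like this boils down to walking two object trees and reporting the entries that differ. Here is a minimal sketch of that idea, with plain Python dicts standing in for PDF dictionaries. The `diff_dicts` helper and every value in the two example widgets are made up for illustration; this is not how RUPS is implemented, and these are not the actual contents of our files.

```python
MISSING = "<missing>"  # placeholder for a key that exists on only one side


def diff_dicts(left, right, path=""):
    """Return (path, left_value, right_value) for every entry that differs."""
    differences = []
    for key in sorted(set(left) | set(right)):
        here = f"{path}{key}"  # keys already start with "/", e.g. "/AP/N"
        a = left.get(key, MISSING)
        b = right.get(key, MISSING)
        if isinstance(a, dict) and isinstance(b, dict):
            # Recurse into nested dictionaries such as /AP -> /N
            differences.extend(diff_dicts(a, b, here))
        elif a != b:
            differences.append((here, a, b))
    return differences


# Hypothetical widget annotations (all values invented for this example)
string_widget = {
    "/FT": "/Btn",
    "/T": "choice",
    "/V": "/1",
    "/AS": "/1",
    "/AP": {"/N": {"/1": "5 0 R", "/Off": "6 0 R"}},
}
number_widget = {
    "/FT": "/Btn",
    "/T": "choice",
    "/V": "1",
    "/AS": "/Off",
    "/AP": {"/N": {"/1": "5 0 R", "/Off": "6 0 R"}},
}

for entry in diff_dicts(string_widget, number_widget):
    print(entry)
```

A real PDF needs a proper parser to get to this point, since objects can be compressed and refer to each other indirectly. That is exactly the part a tool like RUPS takes care of.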

RUPS shows that the only two differences are the appearance state (/AS) entry for the first checkbox annotation and the field value (/V). We were able to use this information to figure out how to get the checkboxes to render correctly for number fields.
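Some background on why these two entries matter: for a checkbox widget, /AS selects one of the appearance streams listed under /AP /N, and /V holds the name of the "on" state (or /Off). If the value doesn't match any of those names, the viewer has no "checked" appearance to show. The sketch below illustrates one plausible way a numeric value could be mapped to a state name. The `choose_appearance_state` function is hypothetical, and this is an assumption about the general shape of such a fix, not the actual change we made.

```python
def choose_appearance_state(value, normal_appearances, off_state="Off"):
    """Pick the /AS name for a checkbox widget from the field value.

    `normal_appearances` is the set of state names found under /AP /N.
    """
    # Assumption: 1, 1.0 and "1" should all select the state named "1"
    if isinstance(value, float) and value.is_integer():
        value = int(value)
    name = str(value)
    # Fall back to the off state when no appearance stream matches
    return name if name in normal_appearances else off_state


states = {"1", "Off"}
print(choose_appearance_state("1", states))
print(choose_appearance_state(1, states))
print(choose_appearance_state(2, states))
```

With the example `states` above, the string `"1"` and the number `1` both select the `1` state, while `2` falls back to `Off` because this widget has no appearance stream by that name.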

This article was originally a lot longer and went into a lot more detail, but I thought it would be best to end it here for now (for your own safety). The PDF specification is dark and full of terrors.

Another PDF debugging technique is to create a new PDF form in Adobe Acrobat and set up some fields. You can then open up the PDF in RUPS and see how the internal structure of the PDF is supposed to look. Acrobat can generally be considered the gold standard when it comes to producing valid PDF files.

One final quick tip: if you're on a Mac and you're reading through the PDF Reference in Preview, click "View" => "Table of Contents".

This is a lot more useful than the default "Thumbnails" view, which just shows all the pages, because it lets you jump straight to the right section.

Thanks for reading, and come back soon for more PDF debugging and engineering tips!


Resources

Tools

PDF Viewers

Here are some of the main PDF viewers that I use for testing PDFs: