Visually compare PDFs

Sometimes a problem seems difficult, but the right insight can make it easy. If you were asked to write a program to compare two PDF files and show the differences, do you think it would be hard? If you're [serhack], you'll make it much easier than you might guess.

Of course, making something simple sometimes depends on making simplifying assumptions. If you're expecting a "diff-like" utility that shows insertions and deletions, that's not what's happening here. Instead, you'll see an image of the PDF with the changes highlighted by red boxes. It's easy because the program uses available utilities to render the PDF files as images, then simply compares the pixels in the resulting images, drawing red frames over the parts that don't match.
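To make the pixel-comparison idea concrete, here is a minimal sketch in Python. It is not [serhack]'s code: it assumes each page has already been rendered to a grid of RGB pixels, and the names `diff_bbox` and `draw_frame` are hypothetical helpers invented for illustration. It also draws a single frame around all the changes, which is a simplification.

```python
from typing import List, Optional, Tuple

Pixel = Tuple[int, int, int]          # (R, G, B)
Image = List[List[Pixel]]             # rows of pixels; assumed rendering output
Box = Tuple[int, int, int, int]       # (left, top, right, bottom), inclusive
RED: Pixel = (255, 0, 0)


def diff_bbox(a: Image, b: Image) -> Optional[Box]:
    """Return the box enclosing every differing pixel, or None if identical."""
    if len(a) != len(b) or any(len(ra) != len(rb) for ra, rb in zip(a, b)):
        raise ValueError("pages must be rendered at the same size")
    xs, ys = [], []
    for y, (row_a, row_b) in enumerate(zip(a, b)):
        for x, (pa, pb) in enumerate(zip(row_a, row_b)):
            if pa != pb:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    return min(xs), min(ys), max(xs), max(ys)


def draw_frame(img: Image, box: Box) -> Image:
    """Return a copy of img with a one-pixel red frame drawn on the box edge."""
    left, top, right, bottom = box
    out = [row[:] for row in img]     # copy so the input page is untouched
    for x in range(left, right + 1):
        out[top][x] = RED
        out[bottom][x] = RED
    for y in range(top, bottom + 1):
        out[y][left] = RED
        out[y][right] = RED
    return out
```

Calling `diff_bbox(old_page, new_page)` and passing the result to `draw_frame` gives the highlighted page. Because the comparison is purely positional, anything that shifts the content moves every pixel after it, and the box grows to cover most of the page.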

Obviously, this works best for PDFs that have only a few edits. Inserting a paragraph, for example, makes the output pretty useless. For that case, you might consider extracting the text from the PDFs using something like pdftotext (which uses the same underlying library used to generate the images) and comparing the text instead.
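The text-based alternative might look something like the sketch below. It assumes Poppler's `pdftotext` command-line tool is installed and on the PATH; `extract_text` and `text_diff` are hypothetical helper names, and the diff itself comes from Python's standard `difflib` module.

```python
import difflib
import subprocess
from typing import List


def extract_text(pdf_path: str) -> str:
    """Extract text with pdftotext; "-" sends the text to stdout (assumes Poppler)."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def text_diff(old_text: str, new_text: str) -> List[str]:
    """Return unified-diff lines showing insertions (+) and deletions (-)."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="old", tofile="new", lineterm="",
    ))
```

With this approach, an inserted paragraph shows up as a handful of `+` lines rather than a page-sized red box, at the cost of ignoring layout and image changes.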

The program prints a lot of messages about missing files but still seems to do the job. Here is the result of comparing two versions of the Hackaday homepage captured in PDF format within minutes of each other:

You can see, however, that if a new article had been published and everything had shifted down a notch, you would have nothing but a giant red block.

Still, it's a clever idea. There are surprisingly few tools for this, although we found a few more. There are, of course, many Linux tools for manipulating PDFs. Many of them, like this one, are mashups of other tools.
