I was browsing social media this morning, and I saw a claim I’ve seen go past a few times now – that there’s a maximum size for a PDF document:

Some version of this has been floating around the Internet since 2007, probably earlier. This tweet is pretty emblematic of posts about this claim: it’s stated as pure fact, with no supporting evidence or explanation. We’re meant to just accept that a single PDF can only cover about half the area of Germany, and we’re not given any reason why 381 kilometres is the magic limit.

I started wondering: has anybody made a PDF this big? How hard would it be? Can you make a PDF that’s even bigger?

A few years ago I did some silly noodling into PostScript, the precursor to PDF, and it was a lot of fun. I’ve never actually dived into the internals of PDF, and this seems like a good opportunity.

Let’s dig in.

Where does the claim come from?

These posts are often accompanied by a “well, actually” where people in the replies explain this is a limitation of a particular PDF reader app, not a limitation of PDF itself. They usually link to something like the Wikipedia article for PDF, which explains:

Page dimensions are not limited by the format itself. However, Adobe Acrobat imposes a limit of 15 million by 15 million inches, or 225 trillion in² (145,161 km²).[2]

If you follow the reference link, you find the specification for PDF 1.7, where an appendix item explains in more detail (emphasis mine):

In PDF versions earlier than PDF 1.6, the size of the default user space unit is fixed at 1/72 inch. In Acrobat viewers earlier than version 4.0, the minimum allowed page size is 72 by 72 units in default user space (1 by 1 inch); the maximum is 3240 by 3240 units (45 by 45 inches). In Acrobat versions 5.0 and later, the minimum allowed page size is 3 by 3 units (approximately 0.04 by 0.04 inch); the maximum is 14,400 by 14,400 units (200 by 200 inches).

Beginning with PDF 1.6, the size of the default user space unit may be set with the UserUnit entry of the page dictionary. Acrobat 7.0 supports a maximum UserUnit value of 75,000, which gives a maximum page dimension of 15,000,000 inches (14,400 * 75,000 * 1 ⁄ 72). The minimum UserUnit value is 1.0 (the default).

15 million inches is exactly 381 kilometres, matching the number in the original tweet. And although the UserUnit entry first appeared in PDF 1.6, the 15-million-inch cap is a limit of version 7 of Adobe Acrobat, not of the file format. This is probably where the original claim comes from.
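As a quick sanity check on that arithmetic, here is a small Python sketch. The constant names are my own; the numbers are the ones quoted from the spec and from Wikipedia above. It uses `Fraction` so the inch-to-metre conversion stays exact.

```python
from fractions import Fraction

MAX_UNITS = 14_400        # largest page dimension, in default user space units
MAX_USER_UNIT = 75_000    # largest UserUnit value Acrobat 7.0 supports
UNITS_PER_INCH = 72       # one default user space unit is 1/72 inch
METRES_PER_INCH = Fraction(254, 10_000)  # exact: 1 inch = 25.4 mm

# Maximum side length of a page, in inches and in kilometres.
max_inches = Fraction(MAX_UNITS * MAX_USER_UNIT, UNITS_PER_INCH)
max_km = max_inches * METRES_PER_INCH / 1000

# Area of a square page with that side length.
area_km2 = max_km ** 2

print(max_inches)  # 15000000
print(max_km)      # 381
print(area_km2)    # 145161
```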

What if we make a PDF that exceeds these “maximum” values?

The inner structure of PDFs

I’ve never dived into the internals of a PDF document – I’ve occasionally glimpsed some bits in a hex editor, but I’ve never really understood how they work. If I’m going to be futzing around for fun, this is a good opportunity to learn how to edit the PDF directly, rather than going through a library.

I found a good article which explains the internal structure of a PDF, and combined with asking ChatGPT a few questions, I was able to get enough to write some simple files by hand.

I know that PDFs support a huge number of features, so this is probably a gross oversimplification, but this is the mental picture I created:

%PDF-1.6
objects
    object 1
    object 2
    ...
    object N
xref
trailer
startxref
%%EOF

The start and end of a PDF file are always the same: a version number (%PDF-1.6) and an end-of-file marker (%%EOF).

After the version number comes a long list of objects. There are lots of types of objects, for all the various things you can find in a PDF, including the pages, the text, and the graphics.

After that list comes the xref or cross-reference table, which is a lookup table for the objects. It points to all the objects in the file: it tells you that object 1 is 10 bytes after the start, object 2 is after 20 bytes, object 3 is after 30 bytes, and so on. By looking at this table, a PDF reading app knows how many objects there are in the file, and where to find them.
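To make the byte-offset idea concrete, here is a minimal Python sketch that writes a few objects one after another, records where each one starts, and formats those offsets as a classic xref table. The function names `write_objects` and `xref_table` are hypothetical helpers, and the two object bodies are placeholders rather than a complete document. Each entry is a 10-digit offset, a 5-digit generation number and a one-letter flag, padded to exactly 20 bytes.

```python
def write_objects(bodies: list[bytes]) -> tuple[bytes, list[int]]:
    """Concatenate numbered objects after the header, recording each start offset."""
    out = bytearray(b"%PDF-1.6\n")
    offsets = []
    for number, body in enumerate(bodies, start=1):
        offsets.append(len(out))  # byte offset of "<number> 0 obj"
        out += b"%d 0 obj\n%b\nendobj\n" % (number, body)
    return bytes(out), offsets


def xref_table(offsets: list[int]) -> bytes:
    """Format a single-section xref table for objects 1..N at the given offsets."""
    rows = [
        b"xref\n",
        b"0 %d\n" % (len(offsets) + 1),   # first object number, number of entries
        b"0000000000 65535 f \n",         # entry 0: a free entry, not a real object
    ]
    rows += [b"%010d 00000 n \n" % offset for offset in offsets]
    return b"".join(rows)


# Placeholder bodies, just to have something to measure.
data, offsets = write_objects([
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [] /Count 0 >>",
])
table = xref_table(offsets)
print(table.decode("ascii"))
```

Letting the code count the bytes avoids the main hazard of writing a PDF by hand, which is that every edit to an object shifts the offsets of everything after it.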

The trailer contains some metadata about the overall document, like the number of entries in the xref table, which object is the root of the document, and whether it’s encrypted.

Finally, the startxref value is a pointer to the start of the xref table. This is where a PDF reading app starts: it works from the end of the file until it finds the startxref value, then it can go and read the xref table and learn about all the objects.
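The reading process described above can be sketched in Python too. This is a simplified illustration, not how any particular PDF app works: `find_xref_offset` and `read_xref` are hypothetical names, the sample file is a tiny stand-in built inline, and the parser only handles a single classic xref table (real files may use several sections or compressed xref streams).

```python
import re


def find_xref_offset(data: bytes) -> int:
    """Return the byte offset written after the final `startxref` keyword."""
    tail = data[-1024:]  # the pointer sits just before %%EOF at the end of the file
    match = re.search(rb"startxref\s+(\d+)\s+%%EOF\s*$", tail)
    if match is None:
        raise ValueError("no startxref found")
    return int(match.group(1))


def read_xref(data: bytes) -> dict[int, int]:
    """Parse a single-section xref table into {object number: byte offset}."""
    lines = data[find_xref_offset(data):].splitlines()
    if lines[0].strip() != b"xref":
        raise ValueError("startxref does not point at an xref table")
    first, count = map(int, lines[1].split())
    table = {}
    for i, line in enumerate(lines[2:2 + count]):
        offset, _generation, kind = line.split()
        if kind == b"n":  # "n" marks an object in use, "f" a free entry
            table[first + i] = int(offset)
    return table


# A tiny stand-in file: one object, then xref, trailer and startxref.
body = b"%PDF-1.6\n1 0 obj\n<< /Type /Catalog >>\nendobj\n"
sample = body + (
    b"xref\n0 2\n"
    b"0000000000 65535 f \n"
    b"0000000009 00000 n \n"
    b"trailer\n<< /Size 2 /Root 1 0 R >>\n"
    b"startxref\n%d\n%%%%EOF\n" % len(body)
)

objects = read_xref(sample)
print(objects)
```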

With this knowledge, I was able to write my first PDF by hand. If you save this code into a file named myexample.pdf, it should open and show a page with a red square in a PDF reading app:

%PDF-1.6

% The first object. The start of every object is marked by:
%
%     <object number> <generation number> obj
%
% (The generation number is used for versioning, and is usually 0.)
%
% This is object 1, so it starts as `1 0 obj`. The second object will
% start with `2 0 obj`, then `3 0 obj`, and so on. The end of each object
% is marked by `endobj`.
%
% This is a “stream” object that draws a shape. First I specify the
% length of the stream (54 bytes). Then I select a colour as an
% RGB value (`1 0 0 RG` = red), then I set a line width (`5 w`) and
% finally I give it a series of coordinates for drawing the square:
%
%     (100, 100) ----> (200, 100)
%                          |
%     [s = start]          |
%         ^                |
%         |                |
%         |                v
%     (100, 200) <---- (200, 200)
1 0 obj
<<
  /Length 54
>>
stream
1 0 0 RG
5 w
100 100 m
200 100 l
200 200 l
100 200 l
s
endstream
endobj

% The second object.
%
% This is a “Page” object that defines a single page. It contains a
% single object: object 1, the red square. This is the line `1 0 R`.
%
% The “R” means “Reference”, and `1 0 R` is saying “look at object number 1
% with generation number 0” — and object 1 is the red square.
%
% It also points to a “Pages” object that contains the information about
% all the pages in the PDF — this is the reference `3 0 R`.
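%
% The /MediaBox entry sets the page size; 300 x 300 units is an assumed
% value, chosen to be large enough to contain the square.
2 0 obj
<<
  /Type /Page
  /Parent 3 0 R
  /MediaBox [0 0 300 300]
  /Contents 1 0 R
>>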
endobj

% The third object.
%
% This is a “Pages” object that contains information about the different
% pages. The `2 0 R` is a reference to the “Page” object, defined above.
3 0 obj
<<
  /Type /Pages
  /Kids [2 0 R]
  /Count 1
>>
endobj

% The fourth object.
%
% This is a “Catalog” object that provides the main structure of the PDF.
% It points to a “Pages” object that contains information about the
% different pages — this is the reference `3 0 R`.
4 0 obj
<<
  /Type /Catalog
  /Pages 3 0 R
>>
endobj

% The xref table. This is a lookup table for all the objects.
%
% I’m not entirely sure what the first entry is for, but it seems to be
% important. The remaining entries correspond to the objects I created.
xref
0 5
0000000000 65535 f
0000000851 00000 n
0000001396 00000 n
0000001655 00000 n
0000001934 00000 n

% The trailer. This contains some metadata about the PDF. Here there
% are two entries, which tell us that:
%
% – There are 5 entries in the `xref` table (the first entry, plus one
%   for each of the 4 objects).
% – The root of the document is object 4 (the “Catalog” object)
%
trailer
<<
  /Size 5
  /Root 4 0 R
>>

% The startxref marker tells us that we can find the xref table 2196 bytes
% after the start of the file.
startxref
2196

% The end-of-file marker.
%%EOF