One PDF to rule them all: Creating self-sufficient documents using LaTeX
Imagine you’re reading an academic thesis, and you are interested in the raw data behind a graph or the source code of an experiment. It would be nice to have all this bundled neatly into the PDF file, reducing the need to keep external sources working. Including files in PDFs achieves precisely that, giving you:
- Self-contained reports: Attach supplementary materials like datasets, scripts, or related PDFs within your main document.
- Interactive manuals: Training materials or guides can embed relevant templates, scripts, or samples for immediate use.
- Long-term archiving: Store all essential elements in a single document.
Now, let’s dive into the technical details. We’ll first discuss the core concepts and then see how we can implement this practically using LaTeX.
The secret sauce: PDF annotations and embedding
One main benefit of the PDF format is its consistency, which makes it a favorite of many universities for academic submissions. PDF documents have a lot more to offer than just text and images: they support multiple types of annotations, such as text comments, geometric shapes, links, and multimedia. While text annotations are quite common for remarks, one of the lesser-known annotation types is the file attachment annotation, which allows attaching entire files directly to the document.
The PDF standard supports two methods for including files:
- Attaching files using annotations and
- Embedding files in collections.
Attaching files
Since the introduction of PDF 1.3 in the year 2000, the PDF specification has supported a feature called Embedded File Streams (Section 3.10.3 on p. 112 of the specification). This allows arbitrary files to be embedded as part of the PDF structure. The mechanism for embedding files via annotations is known as File Attachment Annotations. These annotations associate a file with an icon, which users can interact with to extract or open the attached file.
From a technical standpoint, these attachments are represented as objects in the PDF’s internal structure. For example, in the PDF 1.3 reference documentation (p. 417), you’ll find details about how Embedded File Streams are organized and linked to annotations. Each annotation is an object in the PDF with specific attributes, such as an icon appearance and an embedded stream that represents the attached file.
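To make this object structure concrete, here is a minimal sketch that builds such an annotation by hand with pdfTeX's low-level primitives. It assumes the document is compiled with pdfLaTeX (the primitives differ under LuaLaTeX and XeLaTeX); the file name, its contents, the icon name, and the macro \streamobj are placeholders chosen for illustration. It is not how any particular package is implemented, only an illustration of the two objects involved: the embedded file stream and the annotation that points to it.

```latex
% Create a small sample file so the example is self-contained.
\begin{filecontents*}{example.txt}
Hello from an attached file.
\end{filecontents*}
\documentclass{article}
\begin{document}
% 1. Write the file into the PDF as an embedded file stream object.
\immediate\pdfobj stream attr {/Type /EmbeddedFile} file {example.txt}
\edef\streamobj{\the\pdflastobj}% remember the object number of the stream
% 2. Create a file attachment annotation whose file specification (/FS)
%    references that stream through its /EF entry.
Attached file:
\pdfannot width 12pt height 12pt depth 0pt {%
  /Subtype /FileAttachment
  /Name /Paperclip
  /Contents (Sample attachment)
  /FS << /Type /Filespec /F (example.txt)
         /EF << /F \streamobj\space 0 R >> >>
}%
\hspace{12pt}% reserve room for the icon; the annotation itself takes no space
\end{document}
```

In practice you would rarely write this by hand, since viewers differ in how they render annotations without an explicit appearance stream, but it shows which pieces a file attachment annotation consists of.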
PDF clients that support this feature can offer options to save or open the attached file. Most clients, including Adobe Acrobat and the reader in Firefox, support this, allowing users to right-click on an icon in the PDF and choose “Save Embedded File to Disk…” or “Extract file…” to access the embedded content.
Embedding files
As the PDF specification evolved, PDF 1.7 introduced Collections in 2006. Collections allow multiple embedded files to be organized in a structured manner, making it easier for readers to find and access related resources. Collections enable the creation of a file hierarchy within the PDF, grouping related files into a navigable list.
In a collection, each file is treated as an item, with metadata like file descriptions, types, and ordering information. This method is useful when dealing with multiple supplementary files that don’t need a visual marker on the page.
PDF/A standard for long-term archiving
The PDF/A standard was established as a subset of the full PDF specification to support long-term document preservation. PDF/A eliminates features that may compromise a document’s future readability. For example, PDF/A does not allow for font linking, but instead mandates font embedding to ensure that the document can be viewed in the future without relying on external resources.
There are different PDF/A variants that support included files:
- PDF/A-2 (2011): Allows embedding other PDF/A files within a main PDF/A document. Useful for preserving the integrity of supplementary or referenced documents.
- PDF/A-3 (2012): Expands the capability to allow the embedding of any arbitrary file type. This is particularly helpful for including non-PDF files like CSVs, spreadsheets, or code files within an archivable document.
LaTeX
Now that we understand the technical specifications, let’s dive into the implementation using LaTeX. LaTeX provides several packages that offer control over file embedding and attachment in PDF documents. These packages rely on the underlying PDF features discussed above.
Attaching files
The attachfile package and its enhanced version, attachfile2, allow you to attach arbitrary files to the PDF generated from a LaTeX document.
These packages work by leveraging the annotation-based file attachment features discussed earlier.
\documentclass{article}
\usepackage{attachfile2}
\begin{document}
Please find the attached file below:
\attachfile{example.txt}
\end{document}
The attached file is linked to a PDF annotation that stores metadata and references the file stream. You can read the package documentation for options, including the customization of the icon and metadata.
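As a sketch of what such customization might look like, the following example sets document-wide defaults with \attachfilesetup and passes per-file options to \attachfile. The author name, description, and file contents are placeholders, and the option names used here (author, icon, description, mimetype) should be checked against the package documentation for your installed version.

```latex
% Create a small sample file so the example is self-contained.
\begin{filecontents*}{example.txt}
Hello from an attached file.
\end{filecontents*}
\documentclass{article}
\usepackage{attachfile2}
% Defaults applied to every attachment in this document.
\attachfilesetup{
  author={Jane Doe}, % placeholder author name
  icon=Paperclip,    % icon drawn on the page for the annotation
}
\begin{document}
% Icon attachment with a description and an explicit MIME type.
The raw data is attached here:
\attachfile[description={Sample data}, mimetype=text/plain]{example.txt}

% Attach the same file to a piece of text instead of an icon.
Alternatively, open \textattachfile{example.txt}{the data file} directly.
\end{document}
```

Setting the MIME type explicitly can help viewers pick a suitable application when the reader opens the attachment.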
Embedding files
The author of the attachfile2 package also created the embedfile package, which embeds files in the PDF without placing an annotation on the page:
\documentclass{article}
\usepackage{embedfile}
\begin{document}
Here is another example document. The file below is embedded without a visible marker on the page.
\embedfile{example.txt}
\end{document}
In this case, a file named example.txt is added to the PDF without any marker on the page.
Where the file shows up depends on the PDF viewer; many viewers list embedded files in a separate attachments panel.
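Because there is no icon on the page, the metadata stored with the file is what a viewer has to show. A minimal sketch, assuming pdfLaTeX and the desc, mimetype, and filespec options of embedfile; the CSV file name and its contents are invented for illustration:

```latex
% Create a small sample data file so the example is self-contained.
\begin{filecontents*}{results.csv}
run,value
1,0.42
2,0.57
\end{filecontents*}
\documentclass{article}
\usepackage{embedfile}
\begin{document}
The measurements behind this report are embedded in the PDF.
\embedfile[
  desc={Sample measurements}, % description shown by viewers that support it
  mimetype=text/csv,          % MIME type stored with the embedded stream
  filespec=measurements.csv   % name used inside the PDF instead of results.csv
]{results.csv}
\end{document}
```

Choosing a descriptive filespec is useful when the file on disk has a generic or generated name, since readers only ever see the name stored in the PDF.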
PDF/A compliant files
To ensure your document meets PDF/A standards, LaTeX provides a variety of tools.
The pdfx package is commonly used to create PDF/A documents.
This package takes care of various requirements like font embedding, metadata inclusion, and structural compliance.
\documentclass[a4paper]{article}
\usepackage[a-3b]{pdfx} % pdfx expects a full conformance level, e.g. a-3b for PDF/A-3b
\usepackage{attachfile2}
\begin{document}
This is a PDF/A-compliant document with an attached file.
\attachfile{example.txt}
\end{document}
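PDF/A also requires document metadata, which pdfx reads from a file named after the job with the extension .xmpdata. The sketch below writes that file from within the document, following the pattern shown in the pdfx documentation. All metadata values and the attached file are placeholders, and the a-3b option selects the PDF/A-3b conformance level.

```latex
% Metadata that pdfx reads from \jobname.xmpdata; all values are placeholders.
\begin{filecontents*}{\jobname.xmpdata}
\Title{Self-sufficient report}
\Author{Jane Doe}
\Language{en-US}
\Keywords{attachments}
\end{filecontents*}
% Sample file to attach, so the example is self-contained.
\begin{filecontents*}{example.txt}
Hello from an attached file.
\end{filecontents*}
\documentclass[a4paper]{article}
\usepackage[a-3b]{pdfx}
\usepackage{attachfile2}
\begin{document}
This document targets PDF/A-3b and carries an attached file.
\attachfile[description={Sample data}, mimetype=text/plain]{example.txt}
\end{document}
```

Loading these packages does not by itself guarantee compliance, so it is worth checking the resulting PDF with a PDF/A validator such as veraPDF before archiving it.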
Conclusion
The ability to embed files in PDF documents enriches the document’s usability. By understanding the PDF standards, you can create documents that are not only more informative but also more future-proof.
Leveraging LaTeX packages like attachfile2 and embedfile allows for precise control over embedding and attaching files within PDFs.
Whether you’re creating technical documentation, academic reports, or interactive manuals, embedding files ensures that all supplementary materials are preserved within a single, cohesive unit.