Today, PDFs are a widespread form of document sharing. Invented in 1993 by Adobe, PDFs helped spur a revolution toward paperless offices by allowing graphics-heavy files to retain their original look when shared.
But are PDFs accessible to everyone, including people with disabilities? In 2001, Adobe added the ability to tag PDF contents for navigation with screen readers and other assistive technologies. Both Adobe and the W3C provide detailed instructions on PDF accessibility. Unfortunately, making a PDF accessible is more complicated than it is for many other formats, such as HTML. In practice, many PDFs have not been created correctly.
Types of PDFs
A PDF is either a scanned image of a document or a combination of text and images, which may be tagged or untagged. Of these three types, only tagged PDFs are accessible.
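The three types can be summarized as a small decision rule. The Python sketch below is illustrative only: the names `PdfType`, `classify_pdf`, and `may_be_accessible` are hypothetical, and the two boolean inputs are assumed to come from some inspection of the file that is not shown here.

```python
from enum import Enum


class PdfType(Enum):
    SCANNED = "scanned"    # page images only, no real text
    UNTAGGED = "untagged"  # real text and images, no structure tags
    TAGGED = "tagged"      # real text and images with structure tags


def classify_pdf(has_text_layer: bool, has_tags: bool) -> PdfType:
    """Map two observations about a file onto the three types above.

    Both inputs are assumptions: how they are detected is outside this sketch.
    """
    if not has_text_layer:
        return PdfType.SCANNED
    return PdfType.TAGGED if has_tags else PdfType.UNTAGGED


def may_be_accessible(pdf_type: PdfType) -> bool:
    """Only tagged PDFs can be accessible, and only if tagged correctly."""
    return pdf_type is PdfType.TAGGED


if __name__ == "__main__":
    for text, tags in [(False, False), (True, False), (True, True)]:
        kind = classify_pdf(text, tags)
        print(kind.value, may_be_accessible(kind))
```

Note that the last check is a necessary condition rather than a guarantee: a tagged PDF still has to be tagged correctly.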
Scanned PDFs
A scanned PDF is simply an image, which is inaccessible to all assistive technologies. With the use of Optical Character Recognition (OCR), it may be possible to access some text, but not reliably, as this commenter explains:
I recently attended a training course, and we were required to read a number of articles prior to the commencement of the course. The articles were all sent as PDFs. I opened the first one, hoping that it had been formatted correctly so that it would be accessible to read. My screen reader announced: “alert: empty document”. I sighed in frustration, realizing the document was a scanned image of the text. I tried to use OCR software to recognize the text. Certain parts were readable, while other paragraphs contained text interspersed with symbols and numbers, making it impossible to read.
Untagged PDFs
PDFs that display text and images are generally untagged by default, though some features can be tagged automatically when converting from programs like Word. While assistive technology users can technically access the content in untagged PDFs, these files are often very difficult to interpret and use, to the point of inaccessibility:
I’d not been to this particular venue before, so I found a PDF version of the menu on their website. The screen reader presented the food items first, followed by a string of numbers, indicating prices for each item. This made it difficult to read the menu properly, and I had to rely on my friends to read it for me. People are usually happy to help, whether it be friends or restaurant staff, but when information isn’t accessible, it takes away a person’s choice and right to be independent.
There is nothing as frustrating and time consuming as PDFs with tables which are not tagged. Imagine a programming tutorial in which first you read (hear) ten commands and only then you read their explanation starting from the first one. It happens when instead of a correct table one puts text in two columns. For a sighted person it looks fine, but for a screen reader user it is illegible.
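The reading-order problem these users describe can be made concrete with a minimal Python sketch. It is an illustration only: the menu data and both function names are hypothetical, and real screen readers and PDF viewers behave in more varied ways. It contrasts how content laid out as two visual columns may be read out when there are no tags with how the same content reads when it is tagged as a table with headers.

```python
# Hypothetical menu data: (item, price) pairs as a sighted reader sees them.
MENU = [
    ("Soup of the day", "4.50"),
    ("Veggie burger", "9.00"),
    ("Apple pie", "5.25"),
]


def read_untagged_columns(rows):
    """Simulate reading two side-by-side text columns with no tags.

    Without structure, the whole left column may be read first,
    followed by the whole right column.
    """
    items = [item for item, _ in rows]
    prices = [price for _, price in rows]
    return items + prices


def read_tagged_table(rows, headers=("Item", "Price")):
    """Simulate reading a table whose cells are tied to their headers."""
    return [
        f"{headers[0]}: {item}, {headers[1]}: {price}"
        for item, price in rows
    ]


if __name__ == "__main__":
    print("Untagged:")
    for line in read_untagged_columns(MENU):
        print("  " + line)
    print("Tagged:")
    for line in read_tagged_table(MENU):
        print("  " + line)
```

In the untagged case every item is read before any price, which matches the menu experience described above; in the tagged case each price stays attached to its item.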
Tagged PDFs
A correctly tagged PDF is (mostly) accessible. Tags such as headings, alternative text for images, and table headers let the user read and understand the information in the intended order.
However, two major accessibility issues still complicate PDFs:
- Tagging is essential for accessibility, but it is a time-consuming and often buggy process compared to formats like HTML. Even a skilled PDF tagger needs training and time to do the job properly.
- PDFs can be problematic for mobile users even when correctly tagged. Unlike HTML, they often don’t resize or display properly, and assistive technologies don’t always interpret tags appropriately on mobile devices.
Why do people use PDFs?
In my line of work, I hear many excuses for only providing information in PDF format—and none of them are great arguments.
Reason 1: It’s easy
PDFs are easy because the information is created in another application first. No one creates their PDF in Adobe Acrobat. Designers create the content in another format first, such as InDesign or Word—which makes providing another accessible format to your customers alongside a PDF extremely easy.
Reason 2: Security
People often create a PDF to lock information from editing or alteration. However, it’s possible to break into a PDF and easy to recreate content from it. The best way to protect your content is also an accessibility best practice: in addition to the PDF, ensure that you publish that content to your website in HTML. Someone would have to hack your server to change that!
Reason 3: Brand
The ability to display graphics-heavy content in items such as annual reports is hard to resist from a branding perspective. But in today’s mobile-centric world, will people want to take the time to download a document in order to access the information? If you must provide a PDF, ensure that everyone has access by also providing the same information in HTML.
Reason 4: Print versions
Print versions are indeed popular when received in person. But today, a majority of people use smartphones at least some of the time and are more likely to prioritize receiving content in HTML that they can download quickly and easily. Providing these items as PDFs also cuts the cost of printing—and did you know that ink costs more than crude oil?
So, what’s the conclusion?
Adobe is working to improve its accessibility features, but until PDFs are easier to tag and use on mobile devices, you should publish content in an alternative format (such as HTML or Word) alongside any PDF. And if you must provide PDFs without alternatives, be sure to train your staff to tag them properly with accessibility features. Providing professionally prepared, accessible content will make a big difference to your employees and potential hires, as this user notes:
Whenever I come across a correctly tagged PDF document, I know that the company which released it hires really professional editors who do know their job.
For more information on techniques for creating accessible PDFs, check out the factsheet PDF Accessibility Principles, as well as these additional resources: