Orange Noise Production Company, LLC recently completed a contract to convert approximately 7,000 pages of Word documents into accessible PDFs. The following describes some of what we learned in the process.
Several standards address PDF accessibility, among them the Web Content Accessibility Guidelines (WCAG) and PDF/UA (ISO 14289). The popular tools for checking PDF accessibility measure documents against these standards. Adequacy, not perfection, is the goal, and the final test is whether people who rely on screen readers can successfully navigate and understand a document.
Three common checking tools are Microsoft Word's built-in Accessibility Checker (which we did not rely on), Adobe Acrobat's accessibility checker, and PAC 3 (PDF Accessibility Checker, version 3). PAC 3 is by far the most rigorous of the three.
What one discovers is that authoring in Word cannot address every accessibility requirement. In Adobe Acrobat, one can make many "fixes," but some are tedious and difficult to achieve. Most demonstrations of how to make accessible PDFs, on YouTube and elsewhere, use very short documents as examples. Those examples are of little help when one is faced with poorly formatted Word documents ranging from 50 to 500 pages in length. Only by scripting some tasks in Word's Visual Basic for Applications (VBA) and others in Python were we able to achieve satisfactory results.
Two things we could not accomplish in Word were (1) adjusting tables so that they passed both Acrobat's checker and PAC 3, and (2) giving figures correct bounding boxes, a common point of failure in accessibility checks.
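To make "passing the table checks" concrete: Acrobat's Full Check includes table rules such as "Headers" (tables should have headers) and "Regularity" (every row must contain the same number of columns). The Python sketch below is hypothetical and is not our production script. It assumes a table has already been read out of the PDF structure tree into rows of `TH`/`TD` cells, and for simplicity it ignores row spans and looks for column headers only in the first row. The names `Cell`, `column_counts`, and `table_problems` are illustrative.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cell:
    tag: str          # structure tag: "TH" (header) or "TD" (data)
    colspan: int = 1  # value of the ColSpan attribute; RowSpan is ignored here


def column_counts(rows: List[List[Cell]]) -> List[int]:
    """Effective number of columns in each row (sum of ColSpan values)."""
    return [sum(cell.colspan for cell in row) for row in rows]


def table_problems(rows: List[List[Cell]]) -> List[str]:
    """Return problems resembling Acrobat's Headers and Regularity rules."""
    if not rows:
        return ["table has no rows"]
    problems = []
    # Simplified "Headers" rule: expect column headers in the first row.
    if not any(cell.tag == "TH" for cell in rows[0]):
        problems.append("first row has no TH header cells")
    # "Regularity" rule: every row must span the same number of columns.
    counts = column_counts(rows)
    if len(set(counts)) > 1:
        problems.append(f"irregular column counts per row: {counts}")
    return problems


good = [[Cell("TH"), Cell("TH")], [Cell("TD"), Cell("TD")]]
bad = [[Cell("TD"), Cell("TD", colspan=2)], [Cell("TD"), Cell("TD")]]
print(table_problems(good))  # []
print(table_problems(bad))   # no TH in first row; column counts [3, 2]
```

A Word table with merged cells and no repeated header row typically produces exactly the second kind of result, which is one reason such tables were hard to fix at the Word stage.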
Acrobat's accessibility checker is very good at fixing certain points of failure. Perhaps the best example is "tab order": once the checker detects pages with an undefined tab order, a single mouse click fixes the entire document.
In addition to the accessibility checker, Acrobat provides several Preflight tools that can address other outstanding problems. One of these is the PAC 3 error "PDF/UA identifier is missing": the missing metadata can be added with a single Preflight fixup. Similarly, hyperlinks need alternate text. While there may be a way to add it manually in Word, Acrobat's Preflight fixes every link in the document with a single click.
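For readers who want to script the same identifier fix rather than use Preflight: PDF/UA-1 conformance is declared in a document's XMP metadata by the `pdfuaid:part` property with the value `1`, in the namespace `http://www.aiim.org/pdfua/ns/id/`. The standard-library Python sketch below adds that property to an XMP packet string. It assumes the packet has already been read from the PDF's `/Metadata` stream and will be written back with a PDF library (pikepdf, for example); neither step is shown. It is an illustration, not how Acrobat's fixup is implemented, and `add_pdfua_identifier` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

X_NS = "adobe:ns:meta/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFUAID_NS = "http://www.aiim.org/pdfua/ns/id/"  # PDF/UA identification schema

# Keep the conventional prefixes when the packet is serialized again.
ET.register_namespace("x", X_NS)
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("pdfuaid", PDFUAID_NS)

PART = f"{{{PDFUAID_NS}}}part"


def add_pdfua_identifier(xmp: str, part: int = 1) -> str:
    """Return the XMP packet with pdfuaid:part set to `part` (1 = PDF/UA-1).

    Assumes a bare <x:xmpmeta> packet; an <?xpacket?> wrapper is not kept.
    """
    root = ET.fromstring(xmp)
    rdf = root.find(f"{{{RDF_NS}}}RDF")
    if rdf is None:
        raise ValueError("XMP packet has no rdf:RDF element")
    for desc in rdf.iter(f"{{{RDF_NS}}}Description"):
        if PART in desc.attrib:          # attribute shorthand: pdfuaid:part="1"
            desc.set(PART, str(part))
            return ET.tostring(root, encoding="unicode")
    existing = rdf.find(f".//{PART}")
    if existing is not None:             # element form: <pdfuaid:part>1</...>
        existing.text = str(part)
    else:                                # identifier missing: add a Description
        desc = ET.SubElement(rdf, f"{{{RDF_NS}}}Description",
                             {f"{{{RDF_NS}}}about": ""})
        ET.SubElement(desc, PART).text = str(part)
    return ET.tostring(root, encoding="unicode")


packet = ('<x:xmpmeta xmlns:x="adobe:ns:meta/">'
          '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
          '<rdf:Description rdf:about=""/></rdf:RDF></x:xmpmeta>')
print(add_pdfua_identifier(packet))
```

The function is idempotent: running it on a packet that already carries the identifier updates the existing value instead of adding a second one.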
As mentioned above, the documents for which we were responsible were long and poorly formatted. Most of the formatting work required manually applying the correct styles to headings, and occasionally a Word macro sped up that work. One thing we discovered was that editable figures (charts with labels) failed the validation tests. For these, we had to create bitmaps of the figures, temporarily insert them into the Word document, generate the PDF, and then put the editable versions back.
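As an illustration of the kind of rule a heading macro can apply, here is a hypothetical sketch. It is not the macro we used. It assumes headings are numbered like "1.", "2.3", or "2.3.1." and maps the numbering depth to Word's built-in "Heading 1" through "Heading 3" styles. It is written in Python so the rule can be tested outside Word; in practice the same logic would live in a VBA loop over the document's paragraphs. The pattern and the `heading_style` function are illustrative, and the results still need a human review.

```python
import re
from typing import Optional

# Hypothetical rule: a short paragraph that begins with "1", "1.2" or "1.2.3"
# (optionally followed by a period) and then text is a numbered heading.
NUMBERED_HEADING = re.compile(r"^(\d{1,2}(?:\.\d{1,2}){0,2})\.?\s+\S")


def heading_style(text: str, max_len: int = 120) -> Optional[str]:
    """Return a Word built-in style name ("Heading 1".."Heading 3") or None."""
    stripped = text.strip()
    # Long paragraphs and text ending in punctuation are treated as body text.
    if len(stripped) > max_len or stripped.endswith((".", ",", ";", ":")):
        return None
    match = NUMBERED_HEADING.match(stripped)
    if match is None:
        return None
    depth = match.group(1).count(".") + 1  # "2.3" -> level 2
    return f"Heading {depth}"


for para in ["1. Introduction", "2.3 Methods", "2.3.1. Sampling frame",
             "3.5 million people were counted.", "2020 Annual Report"]:
    print(repr(para), "->", heading_style(para))
```

Limiting each number to one or two digits keeps years such as "2020" from being mistaken for headings, and the trailing-punctuation test screens out most numbered sentences.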
Even at that point, the PDFs still failed the "bounding box" check. For this, our developer used Python to script changes directly in the PDFs, changes we simply could never have made by hand. The same approach fixed the tables: as noted above, we could not adjust them sufficiently in Word, and adjusting them in Acrobat would have taken far too long.
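For context on the bounding-box failure: in a tagged PDF, a `Figure` structure element's bounding box is stored as a `BBox` entry in a Layout attribute object (the `/O /Layout` dictionary in the element's `/A` entry). We do not reproduce our developer's script here. The following is a minimal, hypothetical sketch of the geometry only. It assumes the figure is a single image XObject, that the current transformation matrix (CTM) in effect when the image was painted has already been extracted from the page content stream with a PDF library, and that nested form XObjects can be ignored. The names `image_bbox` and `layout_attribute` are illustrative.

```python
from typing import List, Sequence, Tuple

# PDF matrix [a b c d e f]: x' = a*x + c*y + e, y' = b*x + d*y + f
Matrix = Sequence[float]


def transform(ctm: Matrix, x: float, y: float) -> Tuple[float, float]:
    """Map a point through a PDF transformation matrix."""
    a, b, c, d, e, f = ctm
    return a * x + c * y + e, b * x + d * y + f


def image_bbox(ctm: Matrix) -> List[float]:
    """Page-space bounding box [llx, lly, urx, ury] of an image drawn with ctm.

    PDF images are painted into the unit square, so the box is the extent
    of the four transformed corners (this also covers rotated images).
    """
    corners = [transform(ctm, x, y) for x, y in ((0, 0), (1, 0), (0, 1), (1, 1))]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return [min(xs), min(ys), max(xs), max(ys)]


def layout_attribute(bbox: Sequence[float]) -> str:
    """Serialize a Layout attribute dictionary carrying BBox, in PDF syntax."""
    nums = " ".join(f"{v:g}" for v in bbox)
    return f"<< /O /Layout /BBox [{nums}] >>"


# An image scaled to 100 x 50 points and placed at (72, 700):
box = image_bbox((100, 0, 0, 50, 72, 700))
print(box)                    # [72, 700, 172, 750]
print(layout_attribute(box))  # << /O /Layout /BBox [72 700 172 750] >>
```

Computing the box from all four corners, rather than from the matrix's translation and scale alone, keeps the result correct for rotated or flipped images.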
In our experience, very few commercial or non-profit enterprises have needed to convert extremely long and sometimes clumsy Word documents into accessible PDFs that pass both the Acrobat check and the PAC 3 validation. What one can achieve with a short brochure is quite different from what one can achieve with a lengthy document. We achieved both validations and can assure others that the task is possible.