Tag Archives: accessibility

Closed captioning formats

An online discussion led to my learning about Udemy’s support for closed captioning and the formats available for it. Since I hadn’t heard about these formats before, I’m guessing a lot of other people haven’t either. They can be useful not only for accessibility but for preservation, since they provide a textual version of the spoken words in a video. These are just some notes on what I’ve found in a cursory investigation. In general, sites that support closed captioning expect a text file in one of several formats, which must contain at least the text of each caption, its starting time, and its duration or ending time.
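To make the "text, start time, end time" requirement concrete, here is a minimal sketch in Python that writes caption cues in the SubRip (SRT) layout. SRT is my choice of example and is not named above; the `Cue` class and the function names are made up for illustration, and a site may expect a different format.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption: start and end times in milliseconds, plus its text."""
    start_ms: int
    end_ms: int
    text: str


def format_timestamp(ms: int) -> str:
    """Render milliseconds as the HH:MM:SS,mmm form used by SRT."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(cues) -> str:
    """Number the cues from 1 and separate them with blank lines."""
    blocks = []
    for number, cue in enumerate(cues, start=1):
        timing = f"{format_timestamp(cue.start_ms)} --> {format_timestamp(cue.end_ms)}"
        blocks.append(f"{number}\n{timing}\n{cue.text}\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([Cue(0, 1500, "Hello."), Cue(2000, 4250, "Welcome to the course.")]))
```

A format that stores a duration instead of an ending time would only need `end_ms - start_ms` in place of the second timestamp.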

PDF and accessibility

PDF is both better and worse than its reputation for accessibility. That is, it’s worse than most people realize when it’s used with text-to-speech readers, but potentially much better than many visually impaired people suppose from their own experience. The reason for this paradox is that PDF was designed to present appearance rather than content, but modern versions have features which largely make up for this.

The worst case, of course, is the scanned document. A scanned page is only an image, so machine reading depends on OCR, and the document isn’t searchable. Scanning is a cheap solution when working from hardcopy originals, but should be avoided if possible.

Normal PDF has a number of problems. There’s no necessary relationship between the order of elements in a file and the expected reading order. If an article is in multiple columns, the text ordering in the document might go back and forth between columns. If an article is “continued on page 46,” it can be hard to find the continuation.
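The column problem can be illustrated with a small Python sketch. Everything in it is assumed for the sake of the example: the fragment coordinates, the column boundary at x = 300, and the function names. Real PDF text extraction is much messier than this.

```python
# Each fragment is (x, y, text). As in PDF page coordinates, y grows upward,
# so a larger y means higher on the page. The list order stands in for the
# order in which the fragments happen to appear in the file.
fragments = [
    (72, 700, "Column one, line one."),
    (320, 700, "Column two, line one."),
    (72, 680, "Column one, line two."),
    (320, 680, "Column two, line two."),
]


def file_order_text(frags):
    """Join fragments in the order they are stored, as a naive reader might."""
    return " ".join(text for _, _, text in frags)


def column_order_text(frags, column_split_x=300):
    """Guess the reading order: left column first, then top to bottom."""
    def key(frag):
        x, y, _ = frag
        column = 0 if x < column_split_x else 1
        return (column, -y)  # negate y so higher lines sort first

    return " ".join(text for _, _, text in sorted(frags, key=key))


if __name__ == "__main__":
    print(file_order_text(fragments))    # jumps back and forth between columns
    print(column_order_text(fragments))  # reads each column in turn
```

The second function only works because the column boundary is known in advance. A reader working from an untagged file has to guess it, and sidebars or footnotes make the guess harder.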

Character encoding is based on the font, so there’s no general way to tell what character a numeric value represents. The same character may have different encodings within the same document. This means that reader software doesn’t know what to do with non-ASCII characters (and even ASCII isn’t guaranteed).
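A toy Python model shows why font-based encoding is a problem. The two font tables below are invented for illustration and don’t come from any real font. The point is only that the same numeric code can mean different characters, and the same character can have different codes, depending on the font in use.

```python
# Hypothetical per-font code tables: code value -> Unicode character.
FONT_TABLES = {
    "F1": {0x01: "c", 0x02: "a", 0x03: "f", 0x04: "é"},
    "F2": {0x01: "é", 0x02: "t", 0x03: "a"},
}


def decode_run(font, codes):
    """Decode a run of codes using the table for the given font.

    Codes with no known mapping become U+FFFD, the replacement character,
    which is roughly what a reader is left with when it cannot tell
    which character a code stands for.
    """
    table = FONT_TABLES.get(font, {})
    return "".join(table.get(code, "\ufffd") for code in codes)


if __name__ == "__main__":
    print(decode_run("F1", [0x01, 0x02, 0x03, 0x04]))  # café
    print(decode_run("F2", [0x01, 0x02, 0x03]))        # éta
    print(decode_run("F3", [0x01]))                    # no table for F3
```

Here "é" is code 0x04 in one font and 0x01 in the other, so a reader without a reliable table for each font cannot recover the text.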

Adobe provided a fix to this problem with a new feature in PDF 1.4, known as Tagged PDF. All except seriously outdated PDF software supports at least 1.4. This doesn’t mean using it is easy, though. Some software, such as Adobe’s InDesign, supports creation of Tagged PDF files, but you have to remember to turn on the feature, and you may need to edit automatically created tags to reflect your document structure accurately. For some things, it can be a pain. I tried fixing up a songbook in InDesign with PDF tags, and realized I’d need to do a lot of work to get it right.

Tagging defines contiguous groups of text and ordering, offering a fix for the problem of multiple columns, sidebars, and footnotes. It allows language identification, so if you have a paragraph of German in the middle of an English text, the reader can switch languages if it supports them. Character codes in Tagged PDF are required to have an unambiguous mapping to Unicode.

These features of Tagged PDF are obviously valuable to preservation as well as to visual access. PDF/A requires Tagged PDF at its stricter "a" conformance level (for example, PDF/A-1a), though not at the basic "b" level.

It shouldn’t be assumed that because a document is in PDF, all problems with visual access are solved. But solutions are possible, with some extra effort.

Some useful links: