In MS Word, the bullet bites back

There’s nothing new about Microsoft’s ignoring standards and ruining compatibility, but knowing the details is useful. One case I just learned about, from Mark Mandel, is the way Word handles bullet lists. This applies to the old Word DOC format on Mac OS X.

A 2008 OpenOffice Forum discussion explains the problem. If you create a bullet list in Word and import it into OpenOffice, the bullets are turned into something odd-looking. The file doesn’t use Unicode bullets, but instead uses the Microsoft Symbol font, which has its own nonstandard encoding. This applies only to bullets generated by list styles, not to ones you type in. On Windows, OpenOffice will display the files correctly, since it has access to the needed fonts and mapping.
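To make the mismatch concrete, here is a minimal Python sketch of the kind of cleanup an importer would have to do. It rests on assumptions that aren't in the forum thread: that the Symbol-font bullet shows up in the extracted text as a private-use code point in the range U+F000 to U+F0FF, with the low byte holding the Symbol font code, and that code 0xB7 is the bullet. The function name `fix_symbol_bullets` and the mapping table are made up for illustration and are not part of Word or OpenOffice.

```python
# Assumption: characters from Symbol-font list styles arrive as private-use
# code points U+F000..U+F0FF, where the low byte is the Symbol font code.
# Only the bullet is mapped here; a real table would need many more entries.
SYMBOL_TO_UNICODE = {
    0xB7: "\u2022",  # Symbol font bullet -> Unicode BULLET
}


def fix_symbol_bullets(text: str) -> str:
    """Replace known Symbol-font private-use characters with Unicode ones."""
    out = []
    for ch in text:
        cp = ord(ch)
        if 0xF000 <= cp <= 0xF0FF:
            # Look up the low byte; leave unknown symbols untouched.
            out.append(SYMBOL_TO_UNICODE.get(cp & 0xFF, ch))
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    sample = "\uf0b7 First item"
    print(fix_symbol_bullets(sample))
```

Anything the table doesn’t know about is passed through unchanged, so the sketch never silently drops a character it can’t identify.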

Apparently the issue can also show up when you create a DOC file with OpenOffice and import it into Word, though I’m not clear on how that happens.

The problem is that Word 97/2000/2002 isn’t fully Unicode-compatible: it maps Unicode characters to the 8-bit encodings that its fonts need. This has presumably been fixed in the more recent versions that use DOCX (Office Open XML), but DOC is still widely used as an interchange format, so it’s an important issue. It’s also an illustration of the risks of using undocumented interchange formats.

One response to “In MS Word, the bullet bites back”

  1. The whole handling of lists is ill-thought-out in Word — both bullet lists and numbered lists. When you choose “Format -> List” you get a dialog with six “slots” that you can assign to various list formats. This is an error right there, because it assumes you will have no more than 6 “kinds” of bullet lists and 6 “kinds” of numbered/lettered lists. And, in most cases, six of each would be enough. Except that…

    Word does not separate the counting/bulleting from the indentation and other formatting, even though those properly belong with the _paragraph_ format. If you change the indentation on a list, its bulleting/numbering is likely to change too, because the list gets captured by some other list format that sits at the same indentation.

    And if you would like to create two separate list formats with the same indentation, good luck. I can’t believe the number of times I’ve tried to create both a lettered and numbered list at the same indentation and had them interfere with each other.

    FrameMaker gets it right, or at least the last version of Frame that I used did. For each type of list, you can specify a letter, the equivalent of a math “variable”, that will be associated with the numbering of that list. You can specify which list variables are reset when this list’s numbering is advanced. And then the whole thing can be associated with a paragraph type, so that you simply choose a style name and get the list formatting, numbering, paragraph and font styles, etc. that are appropriate at this point.

    It’s one thing to get it wrong when you’re starting from scratch. It’s something quite different to persistently get it wrong when there are examples out there of how to do it right.