Tag Archives: OpenDocument

In MS Word, the bullet bites back

There’s nothing new about Microsoft’s ignoring standards and ruining compatibility, but knowing the details is useful. One case I just learned about, from Mark Mandel, is the way it does bullet lists. This applies to the old Word DOC format on Mac OS X.

A 2008 OpenOffice Forum discussion explains the problem. If you create a bullet list in Word and import it into OpenOffice, the bullets are turned into something odd-looking. The file doesn’t use Unicode bullets, but instead uses the Microsoft Symbol font, which has its own nonstandard encoding. This applies only to bullets generated by list styles, not to ones you type in. On Windows, OpenOffice will display the files correctly, since it has access to the needed fonts and mapping.

Apparently the issue can also be manifested when creating a DOC file with OpenOffice and importing into Word, though I’m not clear on how that happens.

The problem is that Word 97/2000/2002 isn’t fully Unicode-compatible, mapping Unicode characters to the 8-bit encodings that its fonts need. This has presumably been fixed in the more recent versions that use DOCX (Office Open XML), but DOC is still widely used as an interchange format, so it’s an important issue. It’s also an illustration of the risks of using undocumented interchange formats.

A preservation hazard in OpenOffice

While playing with OpenOffice in my research for Files that Last, I came across a preservation risk. I copied an image from a website and pasted it into a text document, then looked at the resulting XML. The image data wasn’t anywhere in content.xml or anywhere else in the overall ZIP document. Instead, I found this:


<draw:image
xlink:href="http://plan-b-for-openoffice.org/resources/images/x180x60_3_get.png.pagespeed.ic.fjV0teeVb_.png"
xlink:type="simple"
xlink:show="embed"
xlink:actuate="onLoad"/>

The source for the image is on the Web. This means that if the URL stops working, the document loses the image. That’s a poor plan for long-term storage.

The way to avoid this is to use Edit > Paste special and paste the image as a bitmap. It can be a pain to remember to do this. You may be able to catch images that are pasted by reference, since there can be a brief delay while just a box with the URL is displayed before the image comes up.

Sneaky little preservation hazards like this (and the earlier one mentioned with Adobe Illustrator files) are the kind of thing you’ll find when Files that Last comes out.