Tag Archives: Microsoft

The steep road to supporting the PDF format

A lot of applications claim they can display PDF files, but not all of them fully support the format. They won’t necessarily display all valid files correctly. The PDF Association has an article discussing this problem, with the main focus on the Microsoft Edge browser.

Edge offers only partial support for the JBIG2Decode and JPXDecode filters, which means some objects might not display. It doesn’t support certain types of shadings, so other objects could render incorrectly.

The strength of PDF is supposed to be that it will render the same way everywhere. You can blame Microsoft for not putting enough work into it, or Adobe for making the format too complex. I have enough experience with it to know it’s a seriously difficult format just to analyze, to say nothing of rendering. Is a format which presents such difficulties really the ideal for a universal document rendering format that people will count on far into the future?

Update: It gets worse. Take a look at this discussion of what’s in PDF.

OOXML: The good and the bad

An article by Markus Feilner presents a very critical view of Microsoft’s Open Office XML as it currently stands. There are three versions of OOXML — ECMA, Transitional, and Strict. All of them use the same extensions, and there’s no easy way for the casual user to tell which variant a document is. If a Word document is created on one computer in the Strict format, then edited on another machine with an older version of Word, it may be silently downgraded to Transitional, with resulting loss of metadata or other features.

On the positive side, Microsoft has released the Open XML SDK as open source on Github. This is at least a partial answer to Feilner’s complaint that “there are no free and open source solutions that fully support OOXML.”

Incidentally, I continue to hate Microsoft’s use of the deliberately confusing term “Open XML” for OOXML.

Thanks to @willpdp for tweeting the links referenced here.

In MS Word, the bullet bites back

There’s nothing new about Microsoft’s ignoring standards and ruining compatibility, but knowing the details is useful. One case I just learned about, from Mark Mandel, is the way it does bullet lists. This applies to the old Word DOC format on Mac OS X.

A 2008 OpenOffice Forum discussion explains the problem. If you create a bullet list in Word and import it into OpenOffice, the bullets are turned into something odd-looking. The file doesn’t use Unicode bullets, but instead uses the Microsoft Symbol font, which has its own nonstandard encoding. This applies only to bullets generated by list styles, not to ones you type in. On Windows, OpenOffice will display the files correctly, since it has access to the needed fonts and mapping.

Apparently the issue can also be manifested when creating a DOC file with OpenOffice and importing into Word, though I’m not clear on how that happens.

The problem is that Word 97/2000/2002 isn’t fully Unicode-compatible, mapping Unicode characters to the 8-bit encodings that its fonts need. This has presumably been fixed in the more recent versions that use DOCX (Office Open XML), but DOC is still widely used as an interchange format, so it’s an important issue. It’s also an illustration of the risks of using undocumented interchange formats.

Charles Stross on Microsoft Word

Not many people are brilliant writers and also have the technical knowledge to comment on file formats intelligently. When it does happen, it’s worth reading. So I recommend to you Why Microsoft Word Must Die by Charles Stross.

I’ve been on a digital preservation panel with Stross, and he can talk as expertly on the subject as I can. When it comes to Word, he knows a lot more about the format than I do, and he can demolish it more eloquently than I could even if I had the same level of knowledge.

The disappearing format blues

Old formats sometimes fade into obscurity and can no longer be supported, even if they come from a big company like Microsoft. Chris Rusbridge has noted that Microsoft’s Open Specifications page only goes as far back as Office 97, and that PowerPoint 4.0 files can’t be opened with today’s Microsoft Office. Tony Hey at Microsoft has replied. (Hey is vice president of Microsoft Research Connections). The response was encouraging, particularly in suggesting that Microsoft might “participate in a ‘crowd source’ project working with archivists to create a public spec of these old file formats.”

There’s usually some kind of software around that can read old formats. A search turns doesn’t turn up a lot; there’s something called PowerPressed, which will wrap old PowerPoint files in a .exe application. It looks as if it should run on current Windows systems, but all I know is what that page says.

The situation shows the risk of using a format that isn’t publicly documented. Today this is less of a problem. I think it’s been shown that publishing format specs doesn’t lead to cannibalization of sales by competing software; the company that created the spec is in a position to produce the best implementation. The description of PDF is fully public, and Adobe still dominates the market for PostScript software. Publishing the spec has just made the pie bigger. There’s still quite a lot of software that uses unpublished proprietary specs, though, and it’s risky to rely on the long-term reliability of the files they produce.

HTML5 and video

There’s an entry on the W3C blog about the state of HTML5 video. The most significant point is that “we still don’t have a baseline video codec for HTML5.” Without that, it’s silly to talk about HTML5 as an alternative to Flash or any other kind of video presentation. Microsoft is pushing H.264, and IE9 will support only H.264 under HTML5. Mozilla is going with Ogg Theora. Both codecs have patent issues, limiting the opportunities for third parties to fill in the gap. Both have enthusiastic advocates.

The Browser Wars are back.

Microsoft to open up Outlook format

A report on CNET says that Microsoft will be publicly documenting the formats of .pst files used by Outlook. Microsoft’s Paul Lorimer is quoted as saying the format specification will be available “under our Open Specification Promise, which will allow anyone to implement the .pst file format on any platform and in any tool, without concerns about patents, and without the need to contact Microsoft in any way.” No timetable is given.