When you create a Microsoft Word file, you may think that the only information you’re handing over is what you typed and kept in the final version. If you’re seriously concerned about confidentiality, you can’t count on that. A file’s metadata can include information about its source and history that you never realized was there, and redaction may not remove everything it’s supposed to chop out.
When you or somebody else installed Word on your computer, you were asked to enter information about yourself, such as your name. Word puts it into every file you create. If multiple people edit a document, information about all of them can end up in the metadata. Most people don’t mind, but in some cases it could reveal too much. If you entered gibberish or silly comments, they go into your documents too.
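If you want to see what names a file is carrying, here is a minimal sketch in Python. It assumes the file is a modern `.docx` (a zip package that normally keeps author details in `docProps/core.xml`); older binary `.doc` files store this differently and won't work here. The function name `read_author_metadata` is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by docProps/core.xml in an Office Open XML package.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_author_metadata(docx):
    """Return the creator and last-modified-by names stored in a .docx.

    `docx` may be a file path or a binary file-like object.
    Hypothetical helper: it only looks at the core properties part.
    """
    with zipfile.ZipFile(docx) as package:
        if "docProps/core.xml" not in package.namelist():
            return {"creator": None, "last_modified_by": None}
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "creator": root.findtext("dc:creator", default=None, namespaces=NS),
        "last_modified_by": root.findtext(
            "cp:lastModifiedBy", default=None, namespaces=NS
        ),
    }
```

Calling `read_author_metadata("proposal.docx")` (the file name is just an example) returns a small dictionary you can eyeball before sending the file anywhere. It doesn't cover every place a name can hide, so treat a clean result as a first check rather than proof.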
A file contains information on its history, sometimes including previous versions. Turning off change tracking or hiding the markup doesn’t delete what has already been recorded; it just stops showing it. The file can contain a record not only of the changes but also of the comments made during the process. Some of these could be the kind of comments you’d never want to see the light of day.
Let’s say the document is a proposal to another company. There was probably a lot of work fine-tuning it, deciding on time estimates and describing the approach to each part of the project. You don’t really want the recipient to see your earlier guesses, abandoned approaches, and the occasional “What is this moron talking about?” comment.
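To check whether a file still carries that kind of baggage, a rough sketch along these lines can help. It assumes a `.docx` file, where tracked insertions and deletions normally appear as `w:ins` and `w:del` elements in `word/document.xml` and comments live in `word/comments.xml`. The function names are hypothetical, and the sketch looks only at the main body, not headers, footers, or footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def _text(element, tag):
    """Join the text of every <w:tag> element found under `element`."""
    return "".join(node.text or "" for node in element.iter(f"{{{W}}}{tag}"))


def find_hidden_history(docx):
    """List tracked insertions, tracked deletions, and comments in a .docx.

    `docx` may be a file path or a binary file-like object.
    Hypothetical helper: main document body only.
    """
    with zipfile.ZipFile(docx) as package:
        body = ET.fromstring(package.read("word/document.xml"))
        comments_xml = None
        if "word/comments.xml" in package.namelist():
            comments_xml = package.read("word/comments.xml")

    # Inserted text sits in <w:t>; deleted text sits in <w:delText>.
    inserted = [_text(el, "t") for el in body.iter(f"{{{W}}}ins")]
    deleted = [_text(el, "delText") for el in body.iter(f"{{{W}}}del")]

    comments = []
    if comments_xml is not None:
        for c in ET.fromstring(comments_xml).iter(f"{{{W}}}comment"):
            comments.append((c.get(f"{{{W}}}author"), _text(c, "t")))

    return {
        # Drop empty entries, e.g. formatting-only revision marks.
        "inserted": [s for s in inserted if s],
        "deleted": [s for s in deleted if s],
        "comments": comments,
    }
```

If any of the three lists comes back non-empty, the recipient could see that material too. An empty result is encouraging but not a guarantee, since this only inspects one part of the package.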
Microsoft provides details of what could be hidden and how to remove it. How you discover the hidden information depends on the version of Word you’re using, but all modern versions have an “Inspect Document” command available. The Document Inspector lets you remove metadata as well. How much it removes depends on which version of Word and which inspector modules you have.
The same cautions apply if you’re using OpenOffice or LibreOffice. They imitate Word whether there’s any good reason to or not, including its careless treatment of metadata.
In situations where avoiding leaked information is critical, the safest approach is to create a completely new document just before publication. Select the text in the working document, copy it, and paste it into the new one. Make sure any personal metadata in the final document is what you want the people viewing it to see.
If you’re really worried, paste the content into a text-only editor such as Emacs first, then copy it from there into the new document. This will, of course, wipe out all formatting. Not many organizations will want to go to such lengths, but the option is there.
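The text-only route can also be approximated with a script. The sketch below assumes a `.docx` file and pulls just the visible run text (`w:t` elements) out of `word/document.xml`, one line per paragraph. It is a simplification: it ignores tabs, line breaks, tables layout, headers, and footers, and the function name `docx_to_plain_text` is invented for this example.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_to_plain_text(docx):
    """Return the body text of a .docx as plain text, one paragraph per line.

    `docx` may be a file path or a binary file-like object.
    Text marked as a tracked deletion is stored in <w:delText>, not <w:t>,
    so it is left out; comments live in a separate part and are left out too.
    """
    with zipfile.ZipFile(docx) as package:
        body = ET.fromstring(package.read("word/document.xml"))

    lines = []
    for paragraph in body.iter(f"{{{W}}}p"):
        runs = (node.text or "" for node in paragraph.iter(f"{{{W}}}t"))
        lines.append("".join(runs))
    return "\n".join(lines)
```

You would then paste or write the returned string into a fresh document. As with the editor approach, all formatting is lost, and it is still worth reading the result to confirm nothing unwanted came along.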
Thanks to Ellen Kranzer for calling my attention to these issues.