The Joy Reid case and the fragility of archives

The exposure of old, embarrassing posts by MSNBC columnist Joy Ann Reid has provoked a lot of heated discussion. It’s also revealed the difficulty of retaining reliable information about old material on the Web.

When these old posts came to public attention through Twitter, she asserted that there had been one or more unauthorized break-ins altering her articles to add offensive content.

In December I learned that an unknown, external party accessed and manipulated material from my now-defunct blog, The Reid Report, to include offensive and hateful references that are fabricated and run counter to my personal beliefs and ideology.

I began working with a cyber-security expert who first identified the unauthorized activity, and we notified federal law enforcement officials of the breach. The manipulated material seems to be part of an effort to taint my character with false information by distorting a blog that ended a decade ago.

Now that the site has been compromised I can state unequivocally that it does not represent the original entries.

The “altered” material, however, was also found on the Internet Archive’s Wayback Machine with the same content. If Reid’s statement is true, the alterations must have taken place shortly after the posts were published and yet gone unnoticed, or else the Internet Archive must also have been compromised.

There are two other possibilities worth considering. One is that Reid lied. Another, which I haven’t seen discussed, is false memory. Those columns could be so abhorrent to her present self that she just can’t believe she wrote them. However, she stated that an expert had “identified” unauthorized activity, so she claims to have objective evidence of alteration. This still doesn’t entirely rule out false memory; an unscrupulous “expert” might have manipulated her, collecting a big consulting fee in the process. But either there was a break-in or someone lied.

Whether the Internet Archive was also targeted or not, its archives of her writing are certainly an important part of the evidence. If they’re valid, they help to pinpoint when the alleged alteration took place. If they were altered, they need to be studied to find out how.

Making the record vanish

So what did NBC do? It willfully wiped the evidence. Mediaite noted that her articles’ “Wayback Machine links mysteriously disappeared in December.” The mystery has since been solved. The Internet Archive says it has found no evidence of tampering with its archives. But what’s really interesting is this:

At some point after our correspondence [with Reid’s lawyers], a robots.txt exclusion request specific to the Wayback Machine was placed on the live blog. That request was automatically recognized and processed by the Wayback Machine and the blog archives were excluded, unbeknownst to us (the process is fully automated). The robots.txt exclusion from the web archive remains automatically in effect due to the presence of the request on the live blog. Also, the blog URL which previously pointed to an msnbc.com page now points to a generic parked page.

“Was placed.” This means (unless there was yet another break-in, this time by a Reid supporter) that someone at NBC decided to suppress the evidence. That’s a serious matter when there’s an ongoing investigation.

It’s strange how lightly NBC has gotten off for this. A lot of reports have mentioned the placement of the file, but only in passing. In the long run, what Reid said isn’t nearly as important as a major media corporation’s attempt to cover up relevant facts. The FBI is (according to Reid) investigating, which makes removing evidence from view all the more troubling.

Compliance with robots.txt is, of course, voluntary, but non-compliance would probably land the IA in hot water. The person who authorized changing the file presumably knew or expected that the change would remove evidence from public view.
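To make the mechanism concrete, here is a minimal sketch of how a crawler that chooses to comply would read such a file. It uses Python’s standard robots.txt parser. The file contents, the example URL, and the helper name is_allowed are all hypothetical; the actual file placed on Reid’s blog isn’t reproduced here. I’m also assuming that ia_archiver is the user-agent token the Wayback Machine honored, which was the commonly documented one at the time.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that excludes only the Internet Archive's crawler.
# "ia_archiver" is assumed to be the token the Wayback Machine honored.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""

# Placeholder URL, not the real blog address.
URL = "http://blog.example.com/2005/some-post.html"


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a compliant crawler with this user agent may fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The archive's crawler is shut out; a crawler with no matching rule is not.
print(is_allowed(ROBOTS_TXT, "ia_archiver", URL))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", URL))
```

Nothing in the file enforces anything. It only states a request, and the exclusion happens because the crawler’s own software checks the file and acts on it.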

Would changing the file back result in the reappearance of the articles? Are they purged from the archive or just removed from public view? I don’t know the answers to these questions.

The fragile Internet past

The events remind us of the usefulness and the fragility of the Internet Archive. Material constantly disappears from the Web or moves to a different URL. Establishing what people said on the Internet decades ago is harder than establishing what they said in print.

Internet debates are generally more concerned with emotions than facts, so it isn’t really surprising that this issue has been neglected. But which is really more important: that a columnist wrote nasty things in 2005, or that NBC is obstructing the determination of facts that could clear her name or prove her a liar?

If information can just go down the memory hole, then people can claim anything.

Update: Vice has run a detailed article focusing on the archiving issues. If you cared enough about the issue to read to the end of this article, definitely read that one.
