Recently there was a discussion on the Library of Congress’s MODS mailing list, pointing out that the MODS Schema uses non-canonical URIs for the xml.xsd and xlink.xsd schemas. The URI for xml.xsd simply points to a copy of the standard schema, but the URI for the xlink schema points at a modified version.
A person at LoC explained that the change to the XML URI was needed because the W3C server was being hammered by requests triggered by the MODS schema. Every time a MODS document was validated, unless the validating application used a local or cached copy, there would be an access to the W3C server. We’re told that “W3C was complaining (loudly) about excessive accesses and threatening to block certain clients.” The XLink issue is more complicated and not fully explained in the list discussion, but server load was one part of that problem as well.
The identification of XML namespaces with URIs creates what amounts to a denial-of-service attack against servers that host popular schemas, as an unintended consequence of the design. Since you can’t always know which schemas will become popular, this can create a huge burden on servers that aren’t prepared for it. Nor can the host escape the load by relocating the schema, since the URI can never move without breaking the namespace for existing documents. I’ve written here before about this problem but hadn’t known it was so severe that it was forcing important schemas to clone namespaces. Cloning causes obvious conflicts when a MODS element is embedded within a document that uses the standard XML namespaces.
The only solution available is for applications either to keep a permanent local copy of heavily used schemas or to cache them. Unfortunately, not all applications are going to be fixed, and not all users will upgrade to the fixed versions. So we’ll continue to see cases where schema hosts are hammered with requests and performance somewhere else suffers for reasons the users can’t guess.
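To make the caching idea concrete, here is a minimal sketch in Python of how an application might keep a local copy of each schema it needs, so that only the first validation touches the schema host. The class name SchemaCache, the cache directory layout, and the injectable fetch function are all assumptions of mine for illustration, not part of MODS or of any particular validator, and the sketch never refreshes a cached copy.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable


def _download(url: str) -> bytes:
    """Fetch a schema over the network (the step we want to do only once)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


class SchemaCache:
    """Hypothetical on-disk cache: one local file per schema URI."""

    def __init__(self, cache_dir: str, fetch: Callable[[str], bytes] = _download):
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.fetch = fetch  # injectable so the cache can be tested offline

    def _path_for(self, url: str) -> Path:
        # Hash the URI so any URL maps to a safe, unique file name.
        digest = hashlib.sha256(url.encode("utf-8")).hexdigest()
        return self.cache_dir / (digest + ".xsd")

    def get(self, url: str) -> bytes:
        """Return the schema bytes, hitting the network only on a cache miss."""
        path = self._path_for(url)
        if path.exists():
            return path.read_bytes()
        data = self.fetch(url)
        path.write_bytes(data)
        return data
```

An application would call something like SchemaCache("schemas").get("http://www.w3.org/2001/xml.xsd") and hand the returned bytes to its validator. Every later call for the same URI is served from disk, which is exactly the behavior the schema hosts are asking for.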