Tag Archives: music

The digital preservation song challenge!

Should there be songs about digital preservation? This is just a special case of the question, “Should there be songs about X?” For nearly all X, the answer is “Yes, and there probably are!” (Even — perhaps especially — if there shouldn’t be, there are.)

Someone in the Australiasian preservation community asked if AusPreserves needed a theme song. The first responses were existing popular songs, but then people started getting more creative. This led to the Digital Preservation Song Challenge!

One response was the Beyonce parody, “All the Corrupt Files” (“Put a checksum on it”). I think it’s the first song ever to mention JHOVE!

Naturally, I already have my own song on digital preservation, called Files that Last. I wrote it to promote my book of the same title, but it stands (or falls) by itself.

If it’s worth doing, it’s worth singing about, and that certainly applies to digital preservation!

Pono’s file format

I’ve been seeing weirdly intense hostility to the Pono music player and service. A Business Insider article implies that it’s a scheme by Apple to make you buy your music all over again at higher prices. Another article complains that it will hold “only” 1,872 tracks and protests that “the Average person” (their capitalization) doesn’t hear any improvement. I wonder if some of these people are outraged because they’re confusing Pono with Bono and thinking this is the new copy-proof file format which he said Apple is working on.

In fact, Pono isn’t using any new format and isn’t introducing DRM. Its files are in the well-known FLAC format. FLAC stands for “Free Lossless Audio Codec.” The term technically refers only to the codec, not the container, but it’s usually delivered in a “Native FLAC” container. It can also be delivered in an Ogg container, providing better metadata support and slightly larger files.

The “lossless” part of the name refers to FLAC’s compression. MP3 uses lossy compression, which removes some information, sacrificing a little audio quality to make the file smaller. FLAC delivers larger files, giving better quality and a larger file size for the same sampling rate and bit resolution. According to CNET, “Pono’s recordings will range from CD-quality 16-bit/44.1kHz to 24-bit/192kHz “ultra-high resolution.” 96 kilohertz (dividing 192 by 2 per the Nyquist theorem) is way beyond the threshold of human hearing, so it’s understandable that people are skeptical about whether it offers any benefit over a lower sampling rate. Frequencies that high are normally filtered out.

FLAC is non-proprietary and DRM-free, and it has an open source reference implementation. Someone could put FLAC into a DRM container, but then why not use a proprietary encoding? Using FLAC is a step forward from the patent-encumbered MP3, with license requirements that effectively lock out free software.

iTunes doesn’t support FLAC files, so the Business Insider claim that Pono is Apple’s way of making you buy music over again is idiotic. It’s like saying Windows 8 is an Apple scheme to make you buy new software.

As the number of gigabytes you can stick in your pocket keeps growing, the need for compression decreases. For many people, amount of music storage takes priority over improved sound quality, but some will pay for a high-end player that gives them the best sound possible. I don’t get why this infuriates so many critics. At any rate, the file format shouldn’t scare anyone.

For more discussion of FLAC as it relates to Pono, see “What is FLAC? The high-def MP3 explained” on CNET’s site; the headline is totally wrong, but the article itself is good.

Song identification on GitHub

The code for my song identification “nichesourcing” web application is now available on GitHub. It’s currently aimed at one project, as I’d mentioned in my earlier post, but has potential for wide use. It allows the following:

  • Users can register as editors or contributors. Only registered users have access.
  • Editors can post recording clips with short descriptions.
  • Contributors can view the list of clips and enter reports on them.
  • Reports specify type of sound, participants, song titles, and instruments. Contributors can enter as much or as little information as they’re able to.
  • Editors can modify clip metadata, delete clips, and delete reports.
  • Contributors and editors can view reports.
  • More features are planned, including an administrator role.

This is my first PHP coding project of any substance, so I’m willing to accept comments about my overall coding approach. It’s inevitable that, to some degree, I’m writing PHP as if it’s Java. If there are any standard practices or patterns I’m overlooking, let me know.

Crowdsourcing song identification

Some friends of mine are pulling together a project for crowdsourcing identification of a large collection of music clips. At least a couple of us are professional software developers, but I’m the one with the most free time right now, and it fits with my library background, so I’ve become lead developer. In talking about it, we’ve realized it can be useful to librarians, archivists, and researchers, so we’re looking into making it a crowdfunded open source project.

A little background: “Filk music” is songs created and sung by science fiction and fantasy fans, mostly at conventions and in homes. I’ve offered a definition of filk on my website. There are some shoestring filk publishers; technically they’re in business, but it’s a labor of love rather than a source of income. Some of them have a large backlog of recordings from past conventions. Just identifying the songs and who’s singing them is a big task.

This project is, initially, for one of these filk publishers, who has the biggest backlog of anyone. The approach we’re looking at is making short clips available to registered crowdsource contributors, and letting them identify as much as they can of the song, the author, the performer(s), the original tune (many of these songs are parodies), etc. Reports would be delivered to editors for evaluation. There could be multiple reports on the same clip; editors would use their judgment on how to combine them. I’ve started on a prototype, using PHP and MySQL.

There’s a huge amount of enthusiasm among the people already involved, which makes me confident that at least the niche project will happen. The question is whether there may be broader interest. I can see this as a very useful tool for professionals dealing with archives of unidentified recordings: folk music, old jazz, transcribed wax cylinder collections, whatever. There’s very little in the current design that’s specific to one corner of the musical world.

The first question: Has anyone already done it? Please let me know if something like this already exists.

If not, how interesting does it sound? Would you like it to happen? What features would you like to see in it?

Update: On the Code4lib mailing list, Jodi Schneider pointed out that nichesourcing is a more precise word for what this project is about.

New audio format from Apple?

The Guardian reports that Apple is developing a new audio file format.

Apple is working on a new audio file format that will offer “adaptive streaming” to provide high- or low-quality files to users of its iCloud service.

The new format could mean that users can get “high-definition” audio by downloading to an iPhone, iPad or iPod Touch. Alternatively, it could offer a streaming service – like that of Lala.com, the music streaming and online storage company, which Apple acquired late in 2009.

No technical details are available yet as far as I can tell. This part is weird:

“All of a sudden, all your audio from iTunes is in HD rather than AAC. Users wouldn’t have to touch a thing – their library will improve in an instant,” said the source, who requested to remain anonymous.

This presumably refers to your music files on iCloud, not the ones you’ve downloaded. It seems a bit disturbing to me that Apple would just replace all the music you’ve paid for with a new format, but maybe I just don’t understand iCloud.

Undocumented “open” formats

Recently I learned that I can’t upgrade to a current version of Finale Allegro, a music entry program, except by getting the very expensive full version or taking a step downward to PrintMusic. Since I don’t want to lose all my files when some “upgrade” makes Allegro stop working, I’ve been looking for alternatives. MuseScore has its attractions; it’s open source, powerful, and generally well regarded. But I ran across this discussion on the MuseScore forum, which has me just a bit worried. According to “Thomas,” whose user ID is 1 and so probably speaks with authority, “As the MuseScore format is still being shaped on a daily basis, we haven’t put any effort yet to create a schema.”

This doesn’t encourage me to use MuseScore. Even though it’s an “open” application, its format isn’t open in any meaningful sense. You can download the code and reverse-engineer it, of course, but it’s going to change in the next version. While I’m sure the developers will try not to break files created with earlier versions, there’s no guarantee they’ll succeed, and they’re likely to be especially careless about compatibility with files that are more than a few versions old.

You can export files to MusicXML, which is standardized, but in trying this out I came upon a disturbing bug. If I edit the file and save the changes, they’re saved not to the .xml file but to a .mcsz file, MuseScore’s native format. If there’s already an older file with that name, it gets overwritten without warning.

The dichotomy between “open” and “proprietary” formats is the wrong one. There are many formats which are trademarked by a business and their documentation copyrighted, but if the documentation is public and the format not encumbered by patents, anyone can use it. Formats which are created by open-source code but are undocumented and subject to change might are effectively closed formats.

This post grew, in part, from my thoughts on avoiding data loss due to format obsolescence, which is this topic of this week’s post on Files That Last.