The Bitcoin blockchain format

The Bitcoin cryptocurrency depends on security and confidence. If a flaw in the design broke its trust or usability, the whole system would collapse.

It’s strange, then, that Bitcoin doesn’t have a specification. This is considered a feature, not a bug:

The Developer Documentation describes how Bitcoin works to help educate new Bitcoin developers, but it is not a specification—and it never will be.

Bitcoin security depends on consensus. … The only correct specification of consensus behavior is the actual behavior of programs on the network which maintain consensus. As that behavior is subject to arbitrary inputs in a large variety of unique environments, it cannot ever be fully documented here or anywhere else.

That’s one way to guarantee bug-free code. If the actual behavior is the specification, it can’t be wrong. But is that a meaningful spec?

Bitcoin is based on blockchain technology. A blockchain is a distributed, digitally signed ledger. Each transaction is signed by the party spending the coins, and transactions are collected into blocks. Each block links back to the block before it, so the chain just keeps growing over time.
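To make the linking idea concrete, here is a minimal sketch of a hash chain. It is a toy, not Bitcoin's real format: the JSON encoding, the field names `prev_hash` and `transactions`, and the helper functions are all my own inventions for illustration. Each block here holds a short list of transactions plus the hash of the previous block.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Hash a block's contents deterministically (toy encoding, not Bitcoin's)."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, transactions: list) -> None:
    """Add a block that commits to the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "transactions": transactions})


def is_consistent(chain: list) -> bool:
    """Check that every block still points at the real hash of its predecessor."""
    return all(
        chain[i]["prev_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


chain = []
append_block(chain, ["alice pays bob 1"])
append_block(chain, ["bob pays carol 1", "dave pays erin 2"])
append_block(chain, ["carol pays alice 1"])
print(is_consistent(chain))  # True

# Rewriting history in an early block breaks the link stored in the next one.
chain[0]["transactions"][0] = "alice pays mallory 1"
print(is_consistent(chain))  # False
```

The point of the design is that changing any old block changes its hash, so every later block would have to be rebuilt to hide the change.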

Bitcoin is supposed to be decentralized, but decentralization in file formats goes with open specifications. Anyone can create an OpenDocument file, because the spec is public and anyone can write code to check files against it. Proprietary, unpublished formats use the code as the spec, and everyone else has to reverse engineer them.

The developer reference actually provides a lot of detail about the blockchain format. However, it contains a strong disclaimer that “this documentation has not been extensively reviewed by Bitcoin experts and so likely contains numerous errors.” In a conflict between the code and the documentation, the code is the final authority.
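As an illustration of the level of detail available, here is a sketch of the 80-byte block header as I understand it from the developer reference: six little-endian fields, hashed twice with SHA-256. The function names are mine, and the field values are the widely published parameters of Bitcoin's first ("genesis") block. In keeping with the disclaimer, treat this as a reading of the documentation and not as an authority; the running code decides.

```python
import hashlib
import struct


def pack_header(version, prev_hash_hex, merkle_root_hex, timestamp, bits, nonce):
    """Serialize the six header fields into 80 bytes."""
    return struct.pack(
        "<I32s32sIII",
        version,
        bytes.fromhex(prev_hash_hex)[::-1],    # hashes are stored byte-reversed
        bytes.fromhex(merkle_root_hex)[::-1],
        timestamp,
        bits,
        nonce,
    )


def header_hash(header: bytes) -> str:
    """Double SHA-256 of the header, shown in the usual byte-reversed hex."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return digest[::-1].hex()


def target_from_bits(bits: int) -> int:
    """Expand the compact 'bits' field into the full proof-of-work target."""
    exponent = bits >> 24
    mantissa = bits & 0x007FFFFF
    if exponent <= 3:
        return mantissa >> (8 * (3 - exponent))
    return mantissa << (8 * (exponent - 3))


genesis = pack_header(
    version=1,
    prev_hash_hex="00" * 32,  # the first block has no predecessor
    merkle_root_hex="4a5e1e4baab89f3a32518a88c31bc87f618f76673e2cc77ab2127b7afdeda33b",
    timestamp=1231006505,
    bits=0x1D00FFFF,
    nonce=2083236893,
)

print(len(genesis))          # 80
print(header_hash(genesis))
# A header is valid proof of work when its hash, read as a number, is at or below the target.
print(int(header_hash(genesis), 16) <= target_from_bits(0x1D00FFFF))
```

The `prev_hash_hex` field is what links each block to the one before it.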

It’s possible to store arbitrary data on the Bitcoin blockchain. This doesn’t interfere with anyone’s financial transactions, but it means someone could permanently store potentially explosive content on every copy of the blockchain. What if someone inserted data that made it a crime for anyone to possess a copy?
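One mechanism I know of for this is an output script that begins with the `OP_RETURN` opcode, which marks the output as unspendable and lets a short payload ride along. The sketch below builds such a script by hand. The function name is hypothetical, and the 75-byte cap is a simplification of my own so that a single length byte is enough; the size that nodes will actually relay is a policy setting, so check it before relying on this.

```python
OP_RETURN = 0x6A  # opcode that makes an output provably unspendable


def data_carrier_script(payload: bytes) -> bytes:
    """Build OP_RETURN <payload> using a direct push (length byte 0..75)."""
    if len(payload) > 75:
        # Longer payloads need a different push opcode; this sketch stops here.
        raise ValueError("payload too long for a single-byte push")
    return bytes([OP_RETURN, len(payload)]) + payload


script = data_carrier_script(b"hello, blockchain")
print(script.hex())
```

Once a transaction carrying such an output is mined, the payload is part of every full copy of the chain.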

Blockchain splits concern some people. Suppose, for instance, that China cut itself off from the Internet for a year, Chinese users kept using Bitcoin, and then the Great Firewall came down. There would be two established, divergent blockchains that would have to compete under the consensus procedure. In the meantime, people with access to both networks could spend the same money twice, once on each chain. The resolution might be a permanent fork, with two separately operating Bitcoins.
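Here is a toy model of how that competition could play out, assuming the rule that nodes prefer the branch with the most accumulated proof of work. The `Block` class, the `work` numbers, and the coin and merchant names are all made up for illustration; real nodes compare full chains of validated blocks, not small integers.

```python
from dataclasses import dataclass, field


@dataclass
class Block:
    work: int                                   # stand-in for a block's proof of work
    spends: dict = field(default_factory=dict)  # coin id -> recipient


def total_work(chain):
    """Sum the work of every block on a branch."""
    return sum(block.work for block in chain)


def choose_chain(a, b):
    """Prefer the branch with more accumulated work (a tie keeps the first)."""
    return a if total_work(a) >= total_work(b) else b


def spent_coins(chain):
    """Collect which recipient each coin ended up with on this branch."""
    result = {}
    for block in chain:
        result.update(block.spends)
    return result


shared = [Block(10), Block(10)]  # history both sides agree on before the split
inside = shared + [Block(10, {"coin-1": "merchant-inside"}), Block(10)]
outside = shared + [Block(10, {"coin-1": "merchant-outside"}), Block(10), Block(10)]

winner = choose_chain(inside, outside)
loser = inside if winner is outside else outside
lost = {
    coin: recipient
    for coin, recipient in spent_coins(loser).items()
    if spent_coins(winner).get(coin) != recipient
}
print(spent_coins(winner))  # {'coin-1': 'merchant-outside'}
print(lost)                 # {'coin-1': 'merchant-inside'}
```

In this model the same coin was spent once on each side, and the merchant on the losing branch finds the payment gone after the networks rejoin, unless the two sides agree to stay forked.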

There’s currently a heated debate over increasing the block size in Bitcoin. The currency is far more popular than its mysterious creator expected, and the existing block size limit caps how many transactions fit into each block, so confirmations slow down as usage increases.
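A rough back-of-the-envelope calculation shows why the limit matters. The figures are assumptions on my part: a 1,000,000-byte block limit, one block about every ten minutes, and average transaction sizes of 250 and 500 bytes picked only to bracket a plausible range.

```python
def max_tx_per_second(block_bytes: int, avg_tx_bytes: int, block_interval_s: int) -> float:
    """Upper bound on throughput if every block is completely full."""
    tx_per_block = block_bytes // avg_tx_bytes
    return tx_per_block / block_interval_s


for avg_tx_bytes in (250, 500):
    rate = max_tx_per_second(1_000_000, avg_tx_bytes, 600)
    print(avg_tx_bytes, round(rate, 1))
# 250 6.7
# 500 3.3
```

Under these assumptions the whole network tops out at a handful of transactions per second, no matter how many people want to use it.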

Bitcoin looks like a classic case of a format that’s good enough and is so popular that replacing it with something better is hard.

Is Bitcoin truly decentralized? It depends on one blockchain database, however many copies it has, and on one set of software, even if it’s open source.

The ability to do a hard fork — to split Bitcoin into two or more permanently separate blockchains — may be its salvation. A world with multiple blockchains and technical variations might be more robust than one with a single cryptocurrency.

I don’t claim to be an expert on this technology; I’m just trying to look at it from the standpoint of file format technology and raise some questions. If you think I’ve made errors or want to expand on any relevant points, please comment.
