The CAP Theorem
If you can figure out P vs. NP, you’ll get a million dollars from the Clay Institute. If you can prove or disprove the Beal Conjecture, you’ll get a million-dollar prize held in trust by the American Mathematics Society. If you can beat the CAP theorem… I’ll give you a million dollars!
Unlike P vs. NP or the Beal Conjecture, the CAP theorem is not an open problem in mathematics. It is a proof that a distributed system must sacrifice consistency, availability, or partition tolerance. If you need an introduction or reminder on the CAP theorem, I like A plain english introduction to CAP Theorem by Kaushik Sathupadi.
So, which one does Bitcoin sacrifice? Availability, consistency, or partition tolerance?
- If Bitcoin sacrifices partition tolerance, this means that Bitcoin can only be run on networks that reliably deliver messages.
- If Bitcoin sacrifices availability, this means that the Bitcoin network pauses when transactions can’t be committed. So, if there’s a connectivity problem, you can’t read or write Bitcoin data at all.
- If Bitcoin sacrifices consistency, this means that new Bitcoin transactions aren’t guaranteed to be committed. So, you send a transaction and it may or may not be received by other parties.
Wait a minute… all of those sound pretty bad! So which one is it?
As you may know, “the network is reliable” is a classic distributed systems fallacy, and there’s not much that software engineers can do about it. As of 2016, most new distributed systems recognize that sacrificing partition tolerance is not an option; you must sacrifice availability or consistency.
Detecting a Partition
In order to enforce consistency, the system must be able to detect a partition. Most distributed systems are able to do this pretty easily, but… most systems are not Bitcoin!
Probably the simplest way to detect partitions is to choose a master node and declare that the canonical copy of the data set lives on the master. A transaction is considered confirmed when the master node confirms it. Bitcoin can’t use this, because then it would be easy to shut down or manipulate the whole system by getting at the master node.
In order to detect partitions without relying on a single master, a distributed system needs some kind of consensus algorithm, i.e. a way for the participants to vote on the true state of the data. Paxos and Raft are algorithms for doing this. In these systems, a write is considered to be committed if a majority of the nodes in the network accept it.
The word “majority” is the problem here: since anyone can join or leave the Bitcoin network at any time, it’s tough to calculate what counts as a majority. Imagine that I am a Bitcoin participant living on the fictional island of Blefuscu. Consider two different situations:
- The rest of the world is destroyed in an unfortunate nuclear holocaust, leaving only Blefuscu intact. In this case, there’s no need to pause Bitcoin transactions: all Bitcoin traffic can continue as usual, because everyone who uses Bitcoin is still right here on Blefuscu.
- A giant tramples the only communication cable with the outside world. In this case, we should not process any transactions until the connection is restored. Otherwise, the Blefuscu network will get out of sync with the rest of the world’s network.
However, to me on Blefuscu, those two situations look exactly the same! When the size of a network can change, there’s no surefire way to determine whether a majority of participants is present. This is why Bitcoin cannot enforce consistency.
See this StackExchange question for more about Bitcoin catastrophes on hypothetical islands.
So, what’s the verdict? Well, Bitcoin is “eventually consistent,” which is a polite way to say that it is not consistent. That is, if you send a Bitcoin transaction, there’s no guarantee that it will be received. Sounds bad? Maybe.
First, remember that the CAP theorem applies to all financial systems just as it applies to Bitcoin. A credit card is a good example of an “eventually consistent” system: typically, you pay a bill only once a month. This makes it easy to violate the constraints of the system by spending more money than you have. Of course, this risk is (theoretically) mitigated because you know that the credit card companies will ruin your credit score if you do it.
Eric Brewer, the originator of the CAP theorem, pointed out in 2012 that the CAP theorem only prohibits a tiny fraction of the design space of distributed systems; it’s still possible to create many useful and robust distributed systems. Although Bitcoin doesn’t provide firm guarantees that your transactions have been committed, in practice it does a pretty good job. The longer you wait, the more confident you can be that a transaction has gone through successfully:
- In a matter of seconds, the transaction will be relayed around the world. At this point, it will be difficult but possible to reverse (see BitUndo).
- After an average of five minutes, the transaction will be inserted into a block. About 999 in 1000 blocks make it into the blockchain, so you can feel pretty good at this point. 1 in 1000 blocks is an orphan block; if the transaction ends up in an orphan block, you just ignore it and wait until the transaction goes into a non-orphan block.
- The longer you wait, the more blocks will pile up on top of the block containing your transaction, and the more certain you can be that your transaction won’t be rolled back. In practice, the longest ever fork in the blockchain was 24 blocks in length, and even this was a one-time event caused by a software bug. To date, no transaction more than five hours old has ever been rolled back.
The rules of thumb above only apply when the Internet is functioning more or less normally.
If there were a partition where a small area were cut off from the rest of the world, the larger area would continue to work more or less as usual. However, in the smaller area, new blocks would only be created in proportion to the amount of mining power available in that partition. Let’s say that Blefuscu, with 0.1% of the world’s hashing power, is suddenly cut off from the rest of the world. In this case, new blocks are produced about once per week, instead of the usual, once every ten minutes. This more or less freezes the Bitcoin network in Blefuscu.
If there were a worldwide network partition where the Bitcoin network was divided more or less in half, things would get pretty ugly; Bitcoin would fork off into two separate streams of data, one for each region. If they later regained connectivity, the longer stream would “win,” erasing all the data in the shorter stream.
In conclusion, let’s hope that there isn’t a worldwide network partition.
Correction (May 31, 2016): Matt Branton points out that the longest fork was 24 blocks long, not four blocks long.