Papers on solutions to this problem were dated as far as few decades ago, but recently popularized as solutions to blockchain consensus. In a blockchain, any transaction sent to a ledger (messages) must be verified, there are millions of nodes (generals) sending transactions to a network, some may contain malicious intentions, so how do we trust these messages?
Public Blockchain Solution
POW (Proof of Work), as name suggests, is a validation of work that happened and proving it correct. The idea is simple, prover presents the result of a computation which is difficult to compute but easy to verify, and by verifying the solution anyone can be sure that prover performed a certain amount of computational power to generate the result.
Bitcoin was the first to apply POW to a blockchain, the core of its application is to ensure that all nodes cannot create blocks at the same time, a node in the chain will continually assign arbitrary value and only when a target data condition is met, a block will be generated.
In this model, consensus abandons finality of traditional BFT design in exchange for permission-less (open admission) security model. This however has a vulnerability when malicious actor controls more than 50% of the hashpower known as 51% attack. Also the ever-expanding racks of GPUs, consuming ridiculous amounts of energy, performing complex calculation that has pretty much no value.
POS (Proof of Stake) was later proposed as an alternative to POW, with same intentions to provide consensus and prevent doublespend. In contrast to POW’s complicated cryptographical puzzles to validate and create new blocks, POS requires user to show ownership of X amount of tokens.
The creator of a new block is chosen randomly, depending on a validator’s respective age of the stake, blocks are then validated and minted. In typical POS networks, tokens are pre-allocated and non-minable, validators receive transaction fees as rewards. While the staking concept may be similar, internal operations are very different depending on the algorithms applied.
(This is a super overly simplified explanation of POS, the complexity is way beyond the scope of this article so I’ll stop here)
Dated but still interesting article by Vitalik, What Proof of Stake is and why it matters
These algorithms are commonly found in public blockchains, they prioritize on validity (in POW: Nonce, in POS: amount and age of stake). Let’s further investigate into BFT algorithms, a more practical approach to private chains.
Private Blockchain Solution
PBFT (Practical Byzantine Fault Tolerance)
Practical algorithm for state machine replication that tolerates Byzantine faults. The algorithm offers both liveness and safety provided at most (n-1)/3 out of a total of n replicas are simultaneously faulty. This means that clients eventually receive replies to their requests and those replies are correct according to linearizability. The algorithm works in asynchronous systems like the Internet and it incorporates important optimizations that enable it to perform efficiently.
So in simpler words, PBFT is a consensus algorithm to ensure all nodes participating in a distributed system can successfully negotiate when Byzantine nodes exist. The algorithm is a form of state machine replication, let’s first outline some fundamental properties
Transactions within the network are asynchronous, steps occur at different times, it is impossible to achieve consensus in an async network with one bad node.
The results of any execution is the same executed in some sequential order
Sequential consistency + respecting real-time order of events
In a BFT algorithm, we assume there will be Byzantine nodes, any “bad things” can happen, be it drop, delay, duplicate or re-order messages.
A linearizable replicated state machine (replication for fault prevention, not scalability)
The basic idea here is to get same messages from enough nodes to know that non-faulty nodes are in same state
*Assume 3f+1 nodes, with at most f faults
*Assume signed messages
*Assume deterministic behavior
*Assume no systematic failures
f+1 nodes, one node must be non-faulty, ensuring correct answer
2f+1 nodes, a majority of nodes must be non-faulty, providing ‘quorum’
3f+1 nodes, a quorum must be available, even if f nodes are faulty
- A client sends a request to invoke a service operation to the primary
- The primary multicasts the request to the backups
- Replicas execute the request and send a reply to the client
- The client waits for f+1 replies from different replicas with the same result; this is the result of the operation.
In a normal case, the leader node run a 3-phase protocol to coordinate w replicas