A New Approach To High Availability: Validator Pairs

Where We Left Off

Validator Pairs

I think I have a solution that’s even simpler or better than running Raft between the validators. With Raft, you have high communication overhead inside your system before a node can sign every single block. You also need at least three nodes in order to run Raft.

So here’s my solution. Let’s say I had two validator nodes: a primary and a backup. I want the primary to basically always be signing, and if it fails, I want the backup to take its position. This would be simple to do if you had a perfectly synchronous communication link between your two validators. The problem is that we don’t, because what if something happens to the wire that connects your two validators?

Here’s the thing: we actually do have a perfectly synchronous communication link between the two nodes, and it’s the blockchain itself. So you can make a simple rule. In Tendermint and in the Cosmos SDK staking module we kind of say you can miss hundreds of blocks without getting in trouble, right? So the rule is: the primary is always signing blocks, and the backup is watching the blockchain. If the backup ever sees 10 Tendermint blocks in a row that the primary’s signature is not on, it will start signing. And the primary says: if I ever see 10 Tendermint blocks in a row in which my signature is not there, I will shut off and never turn on again. I will not sign after that. This guarantees that there is no block that both the primary and the backup try to sign.
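To make the rule concrete, here is a minimal sketch of the two roles as a small state machine. It is only an illustration of the rule described above, not an actual Tendermint or Cosmos SDK implementation: the names `PairMember`, `observe_block`, and `MISSED_THRESHOLD` are hypothetical, and it assumes each node can tell, for every committed block, whether the pair’s signature is present.

```python
MISSED_THRESHOLD = 10  # from the talk: 10 blocks in a row without our signature


class PairMember:
    """One half of a validator pair, driven only by what it sees on chain.

    Hypothetical illustration; not part of any real validator software.
    """

    def __init__(self, is_primary: bool) -> None:
        self.is_primary = is_primary
        self.signing = is_primary  # the primary starts out signing, the backup idle
        self.missed_in_a_row = 0

    def observe_block(self, signature_present: bool) -> None:
        """Update state after observing one committed block."""
        if signature_present:
            self.missed_in_a_row = 0
            return
        self.missed_in_a_row += 1
        if self.missed_in_a_row >= MISSED_THRESHOLD:
            # Primary: shut off and never sign again. Backup: take over.
            # Nothing ever flips this back, so the switch is one-way.
            self.signing = not self.is_primary


if __name__ == "__main__":
    primary, backup = PairMember(True), PairMember(False)
    for _ in range(MISSED_THRESHOLD):
        primary.observe_block(False)
        backup.observe_block(False)
    print(primary.signing, backup.signing)
```

Both nodes apply the same threshold to the same chain, so the only coordination between them is the block history itself. A short gap of fewer than 10 missed blocks resets the counter as soon as a signed block appears and changes nothing.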

How Does It Work?

Healthy pair of validators
Unhealthy pair of validators
State swap



