Identifying the issues and pitfalls of upgrading RChain nodes
Purpose
The initial purpose of this document was to propose a set of processes and tools for upgrading RChain nodes. Through several discussions, it became evident that before settling on a method for upgrading nodes, it would be worth capturing the challenges associated with the upgrade process. This document serves as the home for those issues. Please add or update sections as needed.
Scope
The scope of the update discussion is limited to the software responsible for Casper consensus. It should focus on the node software and on the system-wide on-chain contracts that are critical to how the RChain blockchain functions (such as the REV Vault contract).
For Casper, this means any software that changes the Casper BlockMessage protobuf:
- Rholang interpreter / tuplespace
  - A change here alters the state hash produced by the tuplespace trie
- RSpace event log
- Every other field in the BlockMessage protobuf
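To make the state hash point concrete, here is a minimal sketch of how two interpreter versions can execute the same deploy and end up with different state hashes. Everything in it is an assumption made for illustration: `state_hash` is a stand-in that hashes sorted entries with SHA-256 (it is not RChain's trie hashing), and `run_deploy_v1` / `run_deploy_v2` are hypothetical interpreter versions that differ only in how they store a value.

```python
import hashlib


def state_hash(tuplespace):
    """Hypothetical stand-in for the tuplespace trie root hash."""
    h = hashlib.sha256()
    for channel in sorted(tuplespace):
        h.update(channel.encode())
        h.update(b"\x00")
        h.update(tuplespace[channel].encode())
        h.update(b"\x00")
    return h.hexdigest()


def run_deploy_v1(space, channel, value):
    """Old (hypothetical) interpreter: stores the value unchanged."""
    new_space = dict(space)
    new_space[channel] = value
    return new_space


def run_deploy_v2(space, channel, value):
    """Upgraded (hypothetical) interpreter: normalizes the value before storing."""
    new_space = dict(space)
    new_space[channel] = value.strip()
    return new_space


pre_state = {"@vault": "100"}
post_v1 = run_deploy_v1(pre_state, "@greeting", " hello ")
post_v2 = run_deploy_v2(pre_state, "@greeting", " hello ")

# Same pre-state and same deploy, but the post-state hashes disagree,
# so validators on different versions would reject each other's blocks.
hashes_match = state_hash(post_v1) == state_hash(post_v2)
print(hashes_match)
```

Even a change that looks cosmetic inside the interpreter surfaces in the block, because the post-state hash is part of what validators compare.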
Questions:
- Who decides when a protocol-breaking change needs to be released?
- What is the mechanism for coordinating the release?
Rholang is fundamentally a "code is data and data is code" model, which makes upgrades tricky.
Tuplespace
There is one global tuplespace that represents the state of the Rholang virtual machine. Blocks that validators propose to other validators contain signed deploys. These deploys, starting with those in the genesis block, contain Rholang that updates the tuplespace when executed. After months or years of operation, the amount of data stored in the tuplespace can become quite large. This must be taken into consideration when we propose things such as a hard fork. For example, if a decision is made to hard-fork such that a new genesis block needs to capture all the state of the tuplespace, the resulting block would be too large to practically propagate via the block proposal mechanism.
Blessed Contracts
There are two contracts that are consulted by the node software that performs Casper-specific block processing: PoS.rho and RevVault.rho. Any updates to these contracts need to be coordinated across validators in the same way as any Casper-specific node software update. These two contracts are initially deployed in the genesis block. A mechanism that all validators agree on will be needed to update them. There is a proposal on how to update these two contracts here: Blessed Contracts - A better user experience
Rholang Casper Params
A decision was made to keep all of the Casper parameters, including all of the bonded validators, on-chain in Rholang, and thus in the tuplespace. Currently, this is confined to the Casper PoS contract. Updating these contracts purely via Rholang is tricky because the deploy must extract the state of these contracts as of the previous block, then deploy an updated contract with this state. In the case of the REV Vault, constructing a deploy that contains this state is probably not practical once there are millions of REV Addresses.
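The extract-then-redeploy flow described above can be sketched as follows. This is only an illustration under stated assumptions: `extract_state`, `build_migration_deploy`, the JSON encoding, and the address format are all hypothetical, and a real deploy would carry Rholang terms rather than JSON. The point is that the deploy size grows with the number of entries in the contract state.

```python
import json


def extract_state(tuplespace, contract):
    """Read a contract's state as of the previous block (hypothetical layout)."""
    return dict(tuplespace[contract])


def build_migration_deploy(contract, new_code, state):
    """Bundle the updated contract with the state it must be re-initialized with."""
    payload = {"contract": contract, "code": new_code, "state": state}
    return json.dumps(payload, sort_keys=True)


def deploy_size_bytes(n_addresses):
    """Size of a migration deploy for a vault holding n_addresses balances."""
    # Hypothetical vault state: one balance per made-up address.
    vault_state = {f"addr{i:08d}": i for i in range(n_addresses)}
    tuplespace = {"RevVault": vault_state}
    state = extract_state(tuplespace, "RevVault")
    deploy = build_migration_deploy("RevVault", "<new contract source>", state)
    return len(deploy.encode())


small = deploy_size_bytes(10)
large = deploy_size_bytes(10_000)
print(small, large)
```

Because every address has to be embedded in the deploy, the payload scales roughly linearly with the number of REV Addresses, which is why this approach stops being practical at millions of entries.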
Changes to Casper consensus
Changes to Casper consensus need to be coordinated so that every validator applies the appropriate rules while processing blocks. We don't want one validator to slash another for proposing an "invalid block" merely because the two are running different versions of the Casper processing logic. If we don't want downtime on the blockchain, we will need to update the validator software before the protocol change goes into effect, then coordinate the switch using something like a block number.
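One way to picture block-number coordination is a validator that ships both rule sets and picks one per block. The sketch below assumes a hypothetical `ACTIVATION_BLOCK`, a simplified `Block` type, and two made-up validity rules; none of these names or rules come from the RChain codebase.

```python
from dataclasses import dataclass

ACTIVATION_BLOCK = 1_000  # hypothetical block number agreed on by validators


@dataclass(frozen=True)
class Block:
    number: int
    deploy_count: int


def valid_v1(block):
    """Old rule set (hypothetical): any non-negative deploy count is accepted."""
    return block.deploy_count >= 0


def valid_v2(block):
    """New rule set (hypothetical): a block must contain at least one deploy."""
    return block.deploy_count >= 1


def is_valid(block, activation_block=ACTIVATION_BLOCK):
    """Apply the rule set that is in force at the block's number."""
    rules = valid_v2 if block.number >= activation_block else valid_v1
    return rules(block)


print(is_valid(Block(number=999, deploy_count=0)))
print(is_valid(Block(number=1_000, deploy_count=0)))
```

Since the choice depends only on the block number, every upgraded validator judges a given block by the same rules, regardless of when its operator installed the new software. Validators that have not upgraded before the activation block would still disagree, so the rollout has to finish ahead of it.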
Example upgrade scenarios
To help think through pitfalls of upgrading, here are possible upgrade scenarios. (Please update this list to build out an exhaustive list to test.)
Impacts | Description |
---|---|
Casper | Add a new slashing condition to Casper |
Casper | Upgrade a Rholang contract that is (transitively) part of our consensus protocol |
Casper | Upgrade Rholang syntax to allow new sugar/semantics |
Casper | Upgrade RevVault.rho |
Casper | Update a consensus parameter (such as the number of blocks in an epoch) |
Node | Remove no-longer used structures from blockstore |
Node | Change the way tuplespace stores its data |
Proposals
Here are some proposals that were just discussed.