Identifying the issues and pitfalls of upgrading RChain nodes

Purpose

The initial purpose of this document was to propose a set of processes and tools for upgrading RChain nodes. Through several discussions, it became evident that prior to diving in on a method for upgrading nodes, it would be worth trying to capture some of the challenges associated with this upgrade process. The document will serve as the home for capturing these issues. Please add or update sections as needed.

Scope

The scope of the update discussion is limited to the software responsible for Casper consensus: the node software and the system-wide on-chain contracts that are critical to how the RChain blockchain functions (such as the REV Vault contract).

For Casper, this means any software that changes the Casper BlockMessage protobuf:

  • Rholang interpreter / tuplespace
    • Because the state hash produced by the tuplespace trie would change
  • RSpace event log
  • Every other field in the BlockMessage protobuf


Questions:

  • Who decides when a protocol-breaking change needs to be released
  • What is the mechanism for coordinating 


Considerations

Rholang is fundamentally a "code is data, data is code" model, which makes upgrades tricky.

Tuplespace

There is one global tuplespace that represents the state of the Rholang virtual machine. Blocks that validators propose to other validators contain signed deploys. These deploys, starting with those in the genesis block, contain Rholang that updates the tuplespace when executed. After months or years of operation, the amount of data stored in the tuplespace can become quite large. This must be taken into consideration when we propose things such as a hard fork. For example, if a decision is made to hard fork such that a new genesis block needs to capture all the state of the tuplespace, the resulting genesis block would be too large to practically propagate via the block proposal mechanism.

Blessed Contracts

There are two contracts that are consulted by the node software that performs Casper-specific block processing: PoS.rho and RevVault.rho. Any updates to these contracts need to be coordinated across validators in the same way as any Casper-specific node software update. These two contracts are initially deployed in the genesis block. A mechanism that all validators agree on will be needed to update them. There is a proposal on how to update these two contracts here: Blessed Contracts - A better user experience

Rholang Casper Params

A decision was made to keep all of the Casper parameters, including all of the bonded validators, on-chain in Rholang, and thus in the tuplespace. Currently, this is confined to the Casper PoS contract. Updating these contracts purely via Rholang is tricky because the deploy must extract the state of these contracts as of the previous block, then deploy an updated contract with this state. In the case of the REV Vault, constructing a deploy that contains this state is probably not practical once there are millions of REV Addresses.
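To get a feel for why a state-carrying deploy becomes impractical, here is a rough back-of-the-envelope sketch. The per-entry size of 64 bytes is a purely hypothetical figure chosen for illustration, not a measured value from RChain, and the function name is invented for this example.

```python
# Hypothetical average size of one serialized vault entry (address plus balance).
# This is an illustrative assumption, not a measured RChain figure.
BYTES_PER_ENTRY = 64


def migration_payload_bytes(num_addresses: int,
                            bytes_per_entry: int = BYTES_PER_ENTRY) -> int:
    """Estimate the size of a deploy that embeds one entry per REV address."""
    return num_addresses * bytes_per_entry


# Print the estimate for a few address counts.
for n in (1_000, 1_000_000, 10_000_000):
    megabytes = migration_payload_bytes(n) / 1_000_000
    print(f"{n:>12,} addresses -> {megabytes:,.1f} MB")
```

Whatever the real per-entry size turns out to be, the payload grows linearly with the number of addresses, which is the core of the concern.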

Changes to Casper consensus

Changes to Casper consensus need to be coordinated so that every validator applies the appropriate rules while processing blocks. We don't want one validator to slash another for proposing an "invalid block" merely because they are running different versions of the Casper processing logic. This means that if we don't want downtime on the blockchain, we will need to update the validators' software before the protocol change goes into effect, then coordinate the switch-over using something like a block number.
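One way to picture block-number coordination is a rule set selected by block height, so that upgraded nodes keep applying the old rules until the agreed height. This is a minimal sketch under stated assumptions: `UPGRADE_HEIGHT`, the `protocol_version` field, and the version numbers are all hypothetical and do not come from the RChain codebase.

```python
from dataclasses import dataclass

# Hypothetical activation height that validators would agree on ahead of time.
UPGRADE_HEIGHT = 1_000


@dataclass(frozen=True)
class Block:
    height: int
    protocol_version: int  # hypothetical field, for illustration only


def expected_version(height: int, upgrade_height: int = UPGRADE_HEIGHT) -> int:
    """Return the rules version in force for a block at the given height."""
    return 2 if height >= upgrade_height else 1


def follows_active_rules(block: Block,
                         upgrade_height: int = UPGRADE_HEIGHT) -> bool:
    """Accept a block only if it follows the rules in force at its height."""
    return block.protocol_version == expected_version(block.height, upgrade_height)
```

Because the rule set depends only on the block height, two validators running the same upgraded software reach the same verdict on every block, before and after the switch-over.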

Example upgrade scenarios

To help think through pitfalls of upgrading, here are possible upgrade scenarios. (Please update this list to build out an exhaustive list to test.)

Impacts   Description
Casper    Add a new slashing condition to Casper
Casper    Upgrade a Rholang contract that's (transitively) a part of our consensus protocol
Casper    Upgrade Rholang syntax to allow new sugar/semantics
Casper    Upgrading REV Vault.rho
Casper    Updating a consensus parameter (such as the # of blocks in an epoch)
Node      Remove no-longer used structures from blockstore
Node      Change the way tuplespace stores its data

Proposals

Here are some proposals that were just discussed.

Update the Trie Directly

Essentially this involves manipulating bits in the trie. At an agreed-upon block height H, all nodes would run a script that updates LMDB so that the behavior of the blessed contract being upgraded changes. As a result, block H+1 would have a pre-state hash that doesn't match the post-state hash of block H. To avoid being slashed, the node software would include a special case (an if/else statement) that skips the pre-state/post-state matching validation for block H+1. Validators would have to upgrade their nodes to a version of the software that contains these behavioral changes before the block at height H.

Autoupdate solution

We have a private key that is allowed to override existing blessed contracts, based on the existing insertSigned system process. No node upgrade will be needed. For changes to the structure of mutable data structures (channels that hold immutable data structures), the executor of the upgrade will have to run a script that fetches the existing data structure and interpolates its values into the new data structure.
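For the direct trie update described above, the one-off validation exemption at block H+1 might look roughly like the following. This is only a sketch: the `Block` fields, the function name, and the value of `UPGRADE_HEIGHT` are hypothetical and are not taken from the node software.

```python
from dataclasses import dataclass

# Hypothetical agreed-upon height H at which every node patches its trie.
UPGRADE_HEIGHT = 500


@dataclass(frozen=True)
class Block:
    height: int
    pre_state_hash: str
    post_state_hash: str


def pre_state_matches_parent(parent: Block, child: Block,
                             upgrade_height: int = UPGRADE_HEIGHT) -> bool:
    """Check that the child's pre-state hash equals the parent's post-state hash.

    The check is skipped for the block at height H+1, because the trie was
    modified out of band after block H and the hashes cannot match.
    """
    if child.height == upgrade_height + 1:
        return True  # one-off exemption for the upgrade boundary
    return child.pre_state_hash == parent.post_state_hash
```

Every validator must ship the same exemption before height H; a node without it would treat block H+1 as invalid and could slash its proposer.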

Anti-autoupdate solution

We have a system process, updateBlessed, that takes as a parameter the code that will replace the existing blessed contract behavior. The blessed contract is replaced with the new code only if the current block is at the agreed-upon height H. When a validator is about to create the block at height H, it produces an agreed-upon deploy that upgrades a blessed contract through the updateBlessed system process. Validators that receive this block at height H have to validate that it contains the agreed-upon deploy. Note that if the deploy involves migrating mutable data structures, it will have to be dynamically generated based on the state as of the block at height H-1.
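The receiving validator's check can be sketched as follows. The `Block` shape and function names are hypothetical, deploys are modelled as raw bytes, and SHA-256 is used only as a stand-in for whatever hash the node actually uses to identify deploys.

```python
import hashlib
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Block:
    height: int
    deploys: Tuple[bytes, ...]  # serialized deploys, modelled here as raw bytes


def deploy_hash(deploy: bytes) -> str:
    """Identify a deploy by its digest (SHA-256 is a stand-in for illustration)."""
    return hashlib.sha256(deploy).hexdigest()


def contains_agreed_upgrade(block: Block, upgrade_height: int,
                            agreed_hash: str) -> bool:
    """Require the agreed-upon upgrade deploy in the block at the upgrade height.

    Blocks at any other height pass this particular check unconditionally.
    """
    if block.height != upgrade_height:
        return True
    return any(deploy_hash(d) == agreed_hash for d in block.deploys)
```

If the deploy has to be generated from the state at height H-1, each validator would presumably need to derive the same bytes deterministically before comparing hashes.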

The Circle of Life

Every upgrade will just be a new genesis block until we figure out a better solution.