RNode version 0.9.24 is the current release on observer nodes. It covers housekeeping changes and meets the exchanges' requirements for reporting state and transaction history. See the RNode-0.9.24 release plan.
RNode 0.9.24 changes affect only observer (read-only) nodes. We have set up an 'exchanges only' read-only node for exclusive use by the exchanges.
The dev team is working on a feature branch that improves Last Finalized State, the block store, and the DAG store, and that now also begins to store Casper state in LMDB. 0.9.25 will be released when we complete this feature branch and are able to merge it into the main dev branch. It has substantial improvements and some bug fixes for both observer (read-only) nodes and validator nodes.
Testnet is running rchain/rnode-staging:v0.9.23-8-g0eb25aa24, which has a couple of patches beyond 0.9.23.
The update sequence is testnet first, then mainnet observers, and then mainnet validators if applicable. The current philosophy is to minimize updates and disruptions to validator nodes while enabling improved observer node functionality.
The focus is to make sure that the network can handle the anticipated volume from the exchanges and that the exchanges have responsive monitoring and customer service.
Sprint 53 in progress
Main focus: work on Last Finalized State, harden the mainnet, improve performance, make usability improvements (including configuration), and make API improvements (including functionality needed by exchanges).
Getting ready for the 0.9.25 release, possibly next week.
New optimization: set fault tolerance to 1 for VALID old blocks.
Slashing investigation: one validator node was slashed due to a tuple space mismatch. The first part of debugging revealed that the problem manifests when the trie is recalculated after inserting or deleting nodes that share a common prefix.
RCHAIN-4102: We are fixing this issue, but while the fix should certainly help reduce errors, it is not clear that this is the ONLY source of the problem. At this time this is a non-deterministic and rare error. It took 3 months to manifest, and only in one of the ten mainnet nodes. We will continue to watch, analyze, and debug it. We have to put in place a strategy to handle such errors. This is a future to-do.
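For illustration only, the property that has to hold when the trie is recalculated can be sketched as below. This is not RNode's history trie code. It is a minimal, uncompressed Merkle-style trie in Python, and every name in it (Node, insert, delete, root_hash, build) is hypothetical. It shows the invariant that a state hash must satisfy: the root hash depends only on the current set of keys and values, not on the order of inserts or on keys that were inserted and later deleted under a shared prefix.

```python
import hashlib
from itertools import permutations


class Node:
    """One trie node: an optional value plus children keyed by the next character."""

    def __init__(self):
        self.value = None
        self.children = {}


def insert(node, key, value):
    """Insert key -> value, creating intermediate nodes along any shared prefix."""
    for ch in key:
        node = node.children.setdefault(ch, Node())
    node.value = value


def delete(node, key):
    """Delete key and prune nodes left empty. Returns True if `node` is now empty."""
    if key:
        child = node.children.get(key[0])
        if child is not None and delete(child, key[1:]):
            del node.children[key[0]]
    else:
        node.value = None
    return node.value is None and not node.children


def root_hash(node):
    """Hash the value and the children in sorted order, so history cannot leak in."""
    h = hashlib.sha256()
    h.update(repr(node.value).encode())
    for ch in sorted(node.children):
        h.update(ch.encode())
        h.update(root_hash(node.children[ch]))
    return h.digest()


def build(pairs):
    """Build a fresh trie from (key, value) pairs in the given order."""
    root = Node()
    for key, value in pairs:
        insert(root, key, value)
    return root


if __name__ == "__main__":
    base = [("abc", "1"), ("abd", "2"), ("x", "3")]
    expected = root_hash(build(base))

    # Insertion order must not change the root hash.
    print(all(root_hash(build(p)) == expected for p in permutations(base)))

    # Inserting and then deleting a key under the shared prefix "ab"
    # must restore the original root hash.
    trie = build(base)
    insert(trie, "abe", "4")
    delete(trie, "abe")
    print(root_hash(trie) == expected)
```

A check of this kind, run over randomized insert and delete sequences, is one possible way to hunt for rare, history-dependent mismatches. It is an assumption on our part that such a test would expose the issue tracked in RCHAIN-4102.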
BlockMerge design discussions are complete. The spec is being written up.
Completed: improvements to the CI environment. GitHub was earlier giving an error on some tests without a detailed explanation. Through https://github.com/rchain/rchain/pull/2936 we adjusted several CI environment parameters so that the integration tests complete successfully. One of the changes was to decrease the JVM memory footprint.
Continuing work on Last Finalized State. See https://github.com/rchain/rchain/pull/2935 (DAG store implementation in LMDB) and https://github.com/rchain/rchain/pull/2934 (an abstraction over LMDB: a key-value store with a manager that RNode can use). We completed the DAG storage changes and the migration logic from the current file storage. This is currently being tested but will not be released in 0.9.25. The key-value store that Tomislav demoed earlier is now being used for DAG storage, and we plan to also use it for caching transactions and state changes.
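As a rough sketch of the idea (not the code in the PRs above, which is part of RNode itself), a key-value store abstraction with a manager could look like the following in Python. All names here (KeyValueStore, InMemoryStore, StoreManager, DagStore, the "dag-blocks" store name) are hypothetical, and an in-memory dictionary stands in for LMDB so the example runs without extra dependencies.

```python
import json
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Optional


class KeyValueStore(ABC):
    """Minimal byte-oriented interface that a backend such as LMDB could implement."""

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def delete(self, key: bytes) -> None: ...


class InMemoryStore(KeyValueStore):
    """Stand-in backend (assumption: a real deployment would use LMDB here)."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def delete(self, key: bytes) -> None:
        self._data.pop(key, None)


class StoreManager:
    """Hands out named stores, so each component gets its own keyspace."""

    def __init__(self, factory: Callable[[], KeyValueStore] = InMemoryStore) -> None:
        self._factory = factory
        self._stores: Dict[str, KeyValueStore] = {}

    def store(self, name: str) -> KeyValueStore:
        # Create the store on first use and return the same instance afterwards.
        if name not in self._stores:
            self._stores[name] = self._factory()
        return self._stores[name]


class DagStore:
    """Hypothetical consumer: maps a block hash to the hashes of its parents."""

    def __init__(self, manager: StoreManager) -> None:
        self._blocks = manager.store("dag-blocks")

    def add_block(self, block_hash: str, parents: List[str]) -> None:
        self._blocks.put(block_hash.encode(), json.dumps(parents).encode())

    def parents(self, block_hash: str) -> Optional[List[str]]:
        raw = self._blocks.get(block_hash.encode())
        return None if raw is None else json.loads(raw)


if __name__ == "__main__":
    dag = DagStore(StoreManager())
    dag.add_block("b1", [])
    dag.add_block("b2", ["b1"])
    print(dag.parents("b2"))
```

The point of the design is that consumers such as DAG storage, or a future transaction and state-change cache, depend only on the small interface, so the backend can be swapped or migrated without touching them.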
Being tested on testnet: fixing an issue with gRPC delays and connection timeouts by enabling a gRPC proxy on each node. This will also give us more information and logs about the traffic seen by each node, which helps with debugging issues.
Addressing discovered bugs: investigating the 'tuple space error' that we occasionally see on the mainnet.
Ongoing: improvements to Last Finalized State. We issued https://github.com/rchain/rchain/pull/2913 and https://github.com/rchain/rchain/pull/2926, but quite a bit of work is still involved. There is significant progress, some of which will be released in 0.9.25. The PR and the branch are structured so that multiple people can work on different parts of the feature at the same time. This work enables (a) faster catch-up by new nodes, which can start from the last finalized state (this is a differentiator for RChain); (b) offloading older data, with differentiated storage and retrieval strategies for it; and (c) a leaner, less bloated node. Tomislav is continuing to work on and test this. Nutzipper and Will are helping to accelerate delivery. We are having to pick between refactoring and workarounds in various parts. This change touches most parts of the code base. We are trying to get a more modular and future-beneficial approach.
Ongoing SRE: adding diversity of cloud providers. Gurinder is working on moving the testnet and some mainnet observer nodes to IBM. We are exploring Oracle as another cloud provider (some credits and a promise of low prices for 2 years). Mainnet validator nodes are still on Google Cloud. We are also exploring methods of enabling easy scaling and decentralization of nodes. A couple of options are (a) one-click install buttons for various clouds and (b) running your own 'shard in a box' or 'node in a box' using low-cost devices like Raspberry Pi based Antsle etc.
CI/CD: We are slowly moving the development team from Jira to GitHub and have started working on release notes in GitHub. For a while, we will maintain both Jira and GitHub. The recent issue of tests failing without errors on GitHub is mostly resolved.