All nodes on the main net are currently running release 0.9.25. The observed time between two consecutive blocks is about 51 seconds, compared with 3.5 minutes (210 seconds) earlier, roughly 4x faster. However, we have not yet run extensive performance tests to confirm that this rate is consistent and sustainable. We are focused on determining whether we are getting close to full thread utilization.
0.9.25 is a bug-fix release that also includes significant configuration changes, preparatory constructs for last finalized state, and many other enhancements (25 PRs in total). Among them: improvements to RNode storage (a key-value store in LMDB, to be used in future releases by the DAG store and block store, as well as to cache transactions and state changes); an HTTP Admin API endpoint and a bytesToHex method, both by Arthur Greef; the ability to visualize the DAG from any point; improved logging and error handling in the Web API; configuration for the API server; and an isFinalized call in the HTTP API.
Updates are rolled out first to the testnet, then to main net observers, and then to main net validators if applicable. Our current philosophy is to minimize updates and disruptions to validator nodes while enabling improved observer node functionality.
Sprint 56 in progress
Main focus: work toward completing Last Finalized State, identify and address as many slashing issues/bugs as possible, harden the main net, and improve performance. Current PR list at https://github.com/rchain/rchain/pulls
Current Work In Progress
Ongoing - Investigate and fix slashing errors: One validator node was slashed due to a tuple space mismatch. The first round of debugging revealed that the problem manifests when the trie is recalculated after inserting or deleting nodes that share a common prefix.
RCHAIN-4102: We are fixing this issue. While the fix should reduce errors, it is not clear that this is the ONLY source of the problem. At this time it is a non-deterministic and rare error: it took 3 months to manifest, and only on one of the ten main net nodes. We will continue to monitor, analyze, and debug it. We also need to put in place a strategy for handling such errors; this is a future to-do.
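To illustrate the invariant at stake (this does not reproduce the RCHAIN-4102 fix): a tuple-space hash presumably has to depend only on the stored contents, not on the order of inserts and deletes, or nodes holding the same data can compute different hashes. The minimal Python sketch below assumes a simplified, hypothetical character trie; the names `Node`, `insert`, `delete`, and `delete_no_prune` are illustrative and are not taken from the RChain code base. It contrasts a delete that prunes emptied branches with a faulty one that leaves an empty node behind under a shared prefix.

```python
import hashlib
from typing import Dict, Optional


class Node:
    """Node of a hypothetical character trie (illustration only, not RChain's trie)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Node"] = {}
        self.value: Optional[bytes] = None

    def digest(self) -> bytes:
        """Hash the value and the children in sorted key order, so the result
        depends on the current structure, not on the order of operations."""
        h = hashlib.sha256()
        if self.value is None:
            h.update(b"\x00")
        else:
            # Length prefix keeps value bytes from blending into child data.
            h.update(b"\x01" + len(self.value).to_bytes(8, "big") + self.value)
        for ch in sorted(self.children):
            h.update(ch.encode("utf-8"))
            h.update(self.children[ch].digest())
        return h.digest()


def insert(root: Node, key: str, value: bytes) -> None:
    """Walk (and create) one node per character, then store the value."""
    node = root
    for ch in key:
        node = node.children.setdefault(ch, Node())
    node.value = value


def delete(node: Node, key: str) -> bool:
    """Remove key and prune nodes left empty; return True if `node` is now empty."""
    if key:
        child = node.children.get(key[0])
        if child is not None and delete(child, key[1:]):
            del node.children[key[0]]  # drop the emptied branch
    else:
        node.value = None
    return node.value is None and not node.children


def delete_no_prune(root: Node, key: str) -> None:
    """Deliberately faulty delete: clears the value but keeps empty nodes."""
    node = root
    for ch in key:
        child = node.children.get(ch)
        if child is None:
            return
        node = child
    node.value = None


if __name__ == "__main__":
    a, b = Node(), Node()
    for k in ["abc", "abd", "ab"]:
        insert(a, k, k.encode())
    for k in ["ab", "abd", "abc"]:
        insert(b, k, k.encode())
    print(a.digest() == b.digest())  # True: insertion order does not matter
```

Pruning emptied branches keeps the trie canonical, so one set of contents maps to exactly one hash; with `delete_no_prune`, inserting and then deleting a key under the shared prefix "ab" leaves a different hash. The actual fix may well take a different approach.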
Fix rewards/cost accounting bug: This bug surfaced while developing a dashboard to display validator rewards and costs, and it seems to occur after slashing. Tomislav noticed that the order and timing in which the various charges and rewards are applied may be incorrect. A PR to fix this has been submitted; we are developing unit tests and testing on the sandbox.
0.9.26 is targeted to be the release that completes Last Finalized State. 0.9.25.1 is expected to be released in the next week or two if no substantial issues are discovered during testing.
Addressing discovered bugs: Investigating the 'tuple space error' that we occasionally see on the main net.
Ongoing - Last finalized state: improvements have been issued, but quite a bit of work remains. Significant progress has been made, some of which was released in 0.9.25. The PR and the branch are structured so that multiple people can collaborate on different parts of the feature at the same time. The scope of this work enables (a) faster catch-up by new nodes, which can start from the last finalized state (a differentiator for RChain); (b) offloading older data, with differentiated storage and retrieval strategies for it; and (c) a leaner, less bloated node. Tomislav is continuing to work on and test this, and Nutzipper and Will are helping to accelerate delivery. This change touches most parts of the code base, so we are having to choose between refactoring and workarounds in various places; we are aiming for a more modular approach that will benefit future work.
Ongoing SRE - We received the increased IBM resources grant and are moving more servers to IBM.
Ongoing - CI/CD:
We are gradually moving the development team from Jira to GitHub, and we started publishing release notes on GitHub with release 0.9.25. New issues are entered only in GitHub. For a while, we will maintain both Jira and GitHub.
Thursday 10 AM Eastern. In the last session we had a substantial discussion of sharding and interaction among shards, specifically tree vs. mesh behavior and what cross-shard token support is needed to enable interoperation.
Dan Connolly demonstrated catching and chaining Zulip chat messages using PostgreSQL LISTEN. The Zulip meeting is on Wednesdays at 3 PM Eastern / 12 Noon Pacific (after the debrief); there is no meeting today. There is also no meeting this coming Saturday due to the July 4th holiday.
Dan is setting up one of the coop servers as a dev server for the rchat effort.
Current Backlog (partial)
Improve merging in system deploys
Improve Casper by enabling more tests and resolving identified code issues
Improve BlockMerge including refactoring RunTimeManager
Improve multi-parent Casper enablement
Implement sharding capabilities
Improve logging to show which API calls are being used, so that they can be related to resource use, performance, etc.
Rholang 1.1 to improve syntax and user experience / learning curve