Post mortem for Oct. 23 test net launch

Timeline

  • 2018-10-23 13:00 UTC SRE team instantiated the test net bootstrap node to support the genesis ceremony
  • 2018-10-23 14:00 UTC Test net launchers received instructions on how to participate in the ceremony, both in "Information for testnet launchers and validators" and in the community testing session
  • 2018-10-23 16:00 UTC Starting at this time and over the next 16 hours, 15 validators were able to process and approve the genesis block
  • 2018-10-24 13:21 UTC The genesis ceremony timer ran out, and the bootstrap node transitioned and showed block 0, the approved genesis block. Meanwhile, over the next 8 hours, the 15 validating nodes started processing the genesis block. Three crashed with out-of-memory errors. Ten were taken down due to unhealthy memory and utilization levels.
  • 2018-10-24 21:00 UTC Three node operators attempted to join the network with the goal of receiving the signed genesis block from the test net bootstrap node. None of them was able to receive the block because it exceeded the maximum message size. Upon hitting the error, each node requested the genesis block from other peers. Each node operator received a different genesis block.
  • 2018-10-24 23:00 UTC Meeting with core developers to discuss. Made the following decisions:
    • Kent to ticket and investigate issues related to ```none.get``` and why the --required-sigs flag in the run command did not prevent a node from accepting a different genesis block.
    • Pawel to resolve the message size issue in CORE-1392.
    • Tickets created to improve the processing of the genesis block: RHOL-778 and CORE-1387
    • Create a ticket to develop an integration test that exercises processing of the genesis block.
    • Relaunch the test net with a smaller genesis block on Oct. 25.

Analysis of what went wrong and recommendations

  • We caught a big bug. Being able to process, approve, and receive a genesis block containing the information required for REV issuance is a requirement of Mercury.

Related tickets
