2019-05-07 Meeting notes: hardening requirements

Date

May 7, 2019

Participants

  • @Kelly Foster

  • @Pawel Szulc

  • @Lucius Meredith

  • @Łukasz Gołębiewski

  • @Ovidiu Deac

  • @Former user (Deleted)

  • @Timm Schäuble

  • @Artur Gajowy

  • @Adam Szkoda

  • @Tomáš Virtus

  • @Dominik Zajkowski

  • @Sebastian Bach

  • @Chris Boscolo

  • @Kayvan Kazeminejad

Goals

We have a document with the requirements for hardening the platform.

Discussion topics

Resources

Questions

  • What are the hardening requirements for components of the platform and the platform as a whole? OR How do we challenge duration, load, and stability?

  • What are the ideas and recommendations for the hardening effort?

What will we do as part of the hardening effort?

  • Generative testing

    • Consensus

      • @Pawel Szulc is taking lead

      • Generate a random DAG. Run an oracle to find the finalized block. Add more blocks to reach the next finalization. Validate the finalization change.

    • Rholang

      • @Timm Schäuble is taking lead

      • Generate random AST. Validate correctness.

      • Run various property tests for the interpreter.

      • Greg’s suggestion

        • This is a framework that could be used as a “QuickCheck for Rholang”

        • Based on John Holland’s idea of genetic algorithms

  • Consider running games described for Ceres Games

  • Design and run a game to take down the network

    • Malicious validators with a small amount of weight

    • Goal to increase the weight required

  • Code audit

    • Cross-team internal code audit

    • External

  • HackerOne engagement

  • Hardening infrastructure

    • Geographic distribution of nodes using GCP options

    • Wide

      • How many networks can we afford to operate? (pubnet, devnet, Whiteblock)

        • Could we have multiple instances in Whiteblock?

    • Deep

      • How many tests can we run end-to-end in a 24-hour period?

  • Fight regression

    • Make better use of testnet to run dev

  • Mechanism to generate, store, and deploy contracts

    • TICKET tool to ask node what was deployed

    • Goal to ensure we’re using a wide variety of contracts in our tests

  • A/B testing with devnet and pubnet

    • Mirror every deploy made to pubnet to devnet

  • Performance

    • TICKET alerting for the perf harness

  • Chaos monkey

    • Hardware instability

    • Random killing of node processes

    • Packet dropping

    • Do at both OS and GCP levels
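A minimal sketch of how a chaos monkey run could be planned before anything is executed, using the fault kinds and the two levels listed above. It only builds a reproducible schedule (dry run); the node names, the FaultEvent fields, and the chaos_plan function are hypothetical and not part of any existing tool.

```python
import random
from dataclasses import dataclass

# Fault kinds and levels taken from the notes above.
FAULTS = ("hardware_instability", "kill_node_process", "drop_packets")
LEVELS = ("os", "gcp")


@dataclass(frozen=True)
class FaultEvent:
    at_second: int  # offset from the start of the test run
    node: str       # hypothetical node name
    fault: str      # one of FAULTS
    level: str      # one of LEVELS


def chaos_plan(nodes, duration_s, events, seed):
    """Build a time-ordered fault schedule. Nothing is executed here."""
    rng = random.Random(seed)  # seeded so a failing run can be replayed
    plan = [
        FaultEvent(
            at_second=rng.randrange(duration_s),
            node=rng.choice(nodes),
            fault=rng.choice(FAULTS),
            level=rng.choice(LEVELS),
        )
        for _ in range(events)
    ]
    return sorted(plan, key=lambda event: event.at_second)
```

Calling chaos_plan(["node0", "node1", "node2"], 3600, 5, seed=7) would give five events spread over one hour. Keeping the seed with the test results is one way to reproduce a run that exposed a problem.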

Discussion about tests

Things we need to improve testing

  • Need a better tuplespace metric

    • @Dominik Zajkowski reports work in progress

  • Confirm or add metrics for RNode

    • How long is the deploy queue for validators?

Misc.

  • Can we run 5-6k tests per week?

  • We can accelerate by reusing the DAG as it gets longer

    • Ex.: run tests, pull the DAG, add it to another test, and keep building on it

  • Discussion about a hard fork after mainnet launch

    • We expect there will be one or more hard forks in the first 1-2 years.
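A rough sketch of the consensus generative test from the hardening list, combined with the idea of continuing to build on the same DAG. It is a toy under stated assumptions: toy_oracle is NOT Casper finality, only a stand-in that picks the highest block that is an ancestor of every validator's latest block, and all function names are hypothetical. The check after each round is that the previously finalized block is still an ancestor of every latest block and that the finalized height never decreases. It needs at least two validators.

```python
import random


def ancestors(parents, block):
    """Return `block` plus every block reachable through parent links."""
    seen, stack = set(), [block]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents[current])
    return seen


def toy_oracle(parents, height, tips):
    """Toy stand-in for a finality oracle (an assumption, not Casper):
    the highest common ancestor of all latest blocks, ties broken by id."""
    common = set.intersection(*(ancestors(parents, t) for t in tips.values()))
    return max(common, key=lambda b: (height[b], b))


def add_random_block(rng, parents, height, tips):
    """A random validator builds on its own latest block and some other tips."""
    creator = rng.choice(sorted(tips))
    others = [tip for v, tip in sorted(tips.items()) if v != creator]
    chosen = {tips[creator], *rng.sample(others, rng.randint(1, len(others)))}
    block = len(parents)  # next unused block id
    parents[block] = sorted(chosen)
    height[block] = 1 + max(height[p] for p in chosen)
    tips[creator] = block


def run_finalization_property(seed, validators=4, rounds=5, max_blocks=50):
    """Grow one DAG across several rounds and validate each finalization change."""
    rng = random.Random(seed)
    parents, height = {0: []}, {0: 0}  # block 0 is genesis
    tips = {v: 0 for v in range(validators)}
    finalized = toy_oracle(parents, height, tips)
    history = [finalized]
    for _ in range(rounds):
        previous = finalized
        # Add blocks until the oracle reports a new finalized block (bounded).
        for _ in range(max_blocks):
            add_random_block(rng, parents, height, tips)
            finalized = toy_oracle(parents, height, tips)
            if finalized != previous:
                break
        # Validate the change: earlier finality must not be undone.
        for tip in tips.values():
            assert previous in ancestors(parents, tip)
        assert height[finalized] >= height[previous]
        history.append(finalized)
    return [height[b] for b in history]
```

Each round reuses the DAG left by the previous round instead of starting from genesis. A real test would replace toy_oracle with the actual finalization logic and keep the same generate, extend, validate loop.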

Discussion about Mercury release criteria

Which contracts to stress the network?

  • Recommendation to use wallet transfer

  • Recommendation to use RSong contracts in a black box testing scenario

Discussion about grace period

  • What else, other than a specified grace period, could we use to understand when Casper is complete?

Action items

@Kelly Foster find documents created for Ceres Games
@Kelly Foster ticket alerting for perf harness
@Kelly Foster ticket incorporate RCat into our test suite
@Kayvan Kazeminejad bring RCAT yaml files into rchain/rchain
@Kelly Foster find time to walk through RNode upgrade on a live network
@Kelly Foster ticket review of gRPC calls to see if we can close off anything else (ex. propose)
@Kelly Foster ticket budgeting for hardening infrastructure
@Kelly Foster ask Whiteblock about how many instances we can run in our current contract
@Kelly Foster ticket tooling in RNode to ask what contract was deployed
@Kelly Foster ticket specify chaos monkey test plan
@Kelly Foster ticket improve metrics reporting on the tuplespace
@Kelly Foster ticket provide metric for showing how many deploys are in the queue
@Kelly Foster take a first pass at updating https://rchain.atlassian.net/wiki/spaces/CORE/pages/564199720
@Adam Szkoda document

Decisions