Date

Attendees

Goals

  • Following our conversation with Nash earlier this week, we will use this time to discuss next steps for bringing production engineering to the RChain project.

Discussion items

Item

Notes

Resources

Backward compatibility

  • Establishing a mindset for backward compatibility

  • Plan to start testing protobuf schema after Mercury

Plan for upgrading testnet

  • Deliverable from the SRE team

  • Test the plan

  • Gather metrics on the health of the cluster

Predicting failure

  • Prerequisite

    • Metrics for what good node performance looks like

      • Health of clique

Baselines

  • Memory consumption

    • How do we establish the baselines?

    • What processes do we put in place to monitor change in baselines?

  • Resilience to errors on the network
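
As an illustration of the memory-consumption questions above, here is a minimal sketch of one way a baseline could be established and checked. It assumes memory samples (in MB) are collected from a known-good run; the function names, the sample values, and the three-standard-deviation threshold are all hypothetical and not something decided in this meeting.

```python
from statistics import mean, stdev

def build_baseline(samples_mb):
    """Summarize memory samples (MB) from a known-good run."""
    return {"mean": mean(samples_mb), "stdev": stdev(samples_mb)}

def deviates(baseline, observed_mb, k=3.0):
    """Return True if a reading is more than k standard deviations from the baseline mean.

    k=3.0 is an assumed threshold, chosen only for illustration.
    """
    return abs(observed_mb - baseline["mean"]) > k * baseline["stdev"]

# Hypothetical samples; real values would come from node measurements.
baseline = build_baseline([510, 495, 505, 500, 490])
print(deviates(baseline, 502))  # reading close to the baseline
print(deviates(baseline, 900))  # reading far above the baseline
```

A process for monitoring change in baselines could rerun `build_baseline` per release and compare the summaries, though the right comparison window is an open question.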

How do we monitor the network?

  • Discussion about how we monitor

    • Do we offer a centralized system?

      • Becomes difficult as the network grows

    • Do we offer a monitoring module with node software?

      • Achieving buy-in from node operators may be difficult or at best inconsistent

    • Do we use the Casper protocol to help bubble up metrics on the network?

  • Do we need to support monitoring beyond the clique?

    • Measure things on a single node

    • Aggregate metrics for nodes on the clique
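
To make the two levels above concrete (single-node measurements, then aggregation across the clique), here is a rough sketch. It assumes each node reports one numeric reading keyed by a node identifier; the node names, the values, and the choice of min/median/max as the summary are hypothetical.

```python
from statistics import median

def aggregate_clique(node_metrics):
    """Combine per-node readings {node_id: value} into a clique-level summary."""
    values = list(node_metrics.values())
    return {
        "nodes": len(values),
        "min": min(values),
        "median": median(values),
        "max": max(values),
    }

# Hypothetical per-node memory readings in MB.
summary = aggregate_clique({"node-a": 480, "node-b": 520, "node-c": 910})
print(summary)
```

The median is used here rather than the mean so that one unhealthy node does not hide the typical state of the clique; whether that fits the project's needs is a design choice still to be discussed.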

SLO Speed

  • What does it mean that the network supports 40K transactions/second?

    • The business requirement is to measure 40K COMM events/second

    • This means COMM events

      • There is a misunderstanding in the community about the difference between COMM events and transactions

      • COMM events are a join

      • Deploys aren’t a good metric because the platform is computational rather than based on token transfers

    • What tools do we need to monitor a network to understand its transaction speed?

  • What does it mean to be secure?

    • What tools do we need to monitor a network to understand its security?

  • What does it mean to be scalable?

    • What

SLO Security

  • DoS attacks are the priority for Mercury

    • Ability to detect the number of deploys a node receives

    • Ability to detect the number of proposes a node makes

  • Trust infrastructure

    • Proof of stake

    • What is the trust between two nodes? Does proof of stake
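
For the DoS item above, one possible way to detect the number of deploys a node receives is a sliding-window counter. This is a sketch only: the class name, the window length, and the threshold are hypothetical, and nothing here reflects an existing RNode API. The same pattern could be applied to counting proposes.

```python
from collections import deque

class DeployRateMonitor:
    """Counts deploys received within a sliding time window (hypothetical helper)."""

    def __init__(self, window_seconds, max_deploys):
        self.window_seconds = window_seconds
        self.max_deploys = max_deploys
        self._timestamps = deque()

    def record(self, timestamp):
        """Record one deploy; return True if the windowed count exceeds the threshold."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop deploys that fall outside the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps) > self.max_deploys

# Assumed limits: more than 3 deploys within 10 seconds is flagged.
monitor = DeployRateMonitor(window_seconds=10, max_deploys=3)
flags = [monitor.record(t) for t in [0, 1, 2, 3, 20]]
print(flags)
```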

SLO Reliability

  • What does it mean to be reliable?

  • Idea to test with a testnet crash

Action items

Decisions
