Block merging - trie merge

WIP

Motivation

Disclaimer

This document discusses trie merge. It does not touch on creation of a merged block or the replay of such a block.

Why is it needed?

Currently, the only method of merging two blocks is to rerun the deploys those blocks contain. This means:

  • having a block A (created by validator A)
  • and a list of blocks B (created by validators B1-Bn) that need to be merged with A

1) Pick A as base block.

2) Pick the blocks from B whose deploys do not conflict with A (call this subset B').

3) Run deploys from B' on top of A.

This results in a merged block C.
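The procedure above can be sketched in a toy model, assuming a deploy is identified only by the set of channels it touches and that two blocks conflict when their deploys touch a common channel. The names (Block, Deploy, mergeByRerun) are illustrative, not the actual node API:

```scala
object RerunMerge {
  final case class Deploy(id: String, channels: Set[String])
  final case class Block(id: String, deploys: List[Deploy])

  // two blocks conflict when any of their deploys touch a common channel
  def conflicts(a: Block, b: Block): Boolean = {
    val touchedByA = a.deploys.flatMap(_.channels).toSet
    b.deploys.exists(_.channels.exists(touchedByA.contains))
  }

  // steps 1-3: keep A as the base, filter B down to B',
  // then rerun the B' deploys on top of A to obtain C
  def mergeByRerun(a: Block, bs: List[Block]): Block = {
    val bPrime = bs.filterNot(conflicts(a, _))
    Block("C", a.deploys ++ bPrime.flatMap(_.deploys))
  }
}
```

Note that the filtering in step 2 is cheap; it is the rerun in step 3 (modeled here as a plain list concatenation) that is expensive in the real node, since every deploy is executed again.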

This approach is sound but inefficient.

First - the blocks were already verified (via replay), so rerunning their deploys does not add any value.

Second - the resulting values are already known. As long as there is a method of resolving conflicts, the data can be used as-is; only the indexing structure needs to be rebuilt.

TL;DR It's a performance improvement.

Where will it be used?

In the consensus algorithm, during block merging. When two blocks are considered to be at the same height, they need to be merged to allow the perceived DAG to grow.

What does it do?

High level overview:

  • identify the underlying structure that indexes the data in blocks A and B
  • identify the differences between A and B by comparing those underlying structures
    • take the non-conflicting pieces as-is
    • resolve conflicts on conflicting paths (via some conflict resolution/merging strategy)
  • recalculate the paths that changed as a result of combining the non-conflicting and merged paths

This will produce a new underlying indexing structure which can be used as the basis for a block.
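The steps above can be sketched with tries modeled as flat path-to-value maps diffed against a common parent. This is a simplifying assumption - the real structure is a history trie addressed by Blake2b256 hashes - and structuralMerge is an illustrative name:

```scala
object TrieMergeSketch {
  type Path  = String
  type Value = String
  type Trie  = Map[Path, Value] // stand-in for the real indexing structure

  // diff each branch against the common parent, take non-conflicting
  // changes as-is, and resolve paths that changed on both sides
  def structuralMerge(parent: Trie, a: Trie, b: Trie)(
      resolve: (Value, Value) => Value
  ): Trie = {
    val changedA  = a.filter { case (p, v) => !parent.get(p).contains(v) }
    val changedB  = b.filter { case (p, v) => !parent.get(p).contains(v) }
    val conflicts = changedA.keySet intersect changedB.keySet
    val resolved  = conflicts.map(p => p -> resolve(changedA(p), changedB(p)))
    // resolved entries are added last, overriding the conflicting paths
    parent ++ changedA ++ changedB ++ resolved
  }
}
```

In the real implementation the final step would also recompute the hashes along every changed path up to a new root, which is exactly the "recalculate the paths that changed" step above.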

Implementation

Where will it be implemented?

The logical place seems to be trait ISpace. It has access to HistoryRepository and is able to commit new tries.

How will it be implemented?

def merge(conflictResolver: ConflictResolver[C])(parentHash: Blake2b256Hash, hashA: Blake2b256Hash, hashB: Blake2b256Hash): Option[Blake2b256Hash]
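A hedged sketch of what the conflictResolver parameter might look like. The ConflictResolver trait name comes from the signature above, but its resolve method and the AgreeOnly example are assumptions about the eventual interface; the idea is that a None from the resolver propagates into the Option[Blake2b256Hash] result of merge:

```scala
trait ConflictResolver[C] {
  // None signals an unresolvable conflict; merge then returns None as well
  def resolve(left: C, right: C): Option[C]
}

// example resolver: only succeeds when both branches wrote the same value
object AgreeOnly extends ConflictResolver[String] {
  def resolve(left: String, right: String): Option[String] =
    if (left == right) Some(left) else None
}
```

Returning Option rather than throwing keeps the caller in control: an unmergeable pair of tries simply yields no merged root, and the consensus layer can fall back to another strategy.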


Conflict Resolution

Block merging - conflict resolution - peek

The current implementation maps ancestor events down to a flat set of channel hashes:

  private def buildBlockAncestorChannels[F[_]: Monad: BlockStore](
      blockAncestorsMeta: List[BlockMetadata]
  ): F[Set[ByteString]] =
    for {
      maybeAncestors <- blockAncestorsMeta.traverse(
                         blockAncestorMeta => BlockStore[F].get(blockAncestorMeta.blockHash)
                       )
      ancestors      = maybeAncestors.flatten
      ancestorEvents = ancestors.flatMap(_.getBody.deploys.flatMap(_.deployLog))
      ancestorChannels = ancestorEvents.flatMap {
        case Event(Produce(produce: ProduceEvent)) =>
          Set(produce.channelsHash)
        case Event(Consume(consume: ConsumeEvent)) =>
          consume.channelsHashes.toSet
        case Event(Comm(CommEvent(Some(consume: ConsumeEvent), produces))) =>
          consume.channelsHashes.toSet ++ produces.map(_.channelsHash).toSet
        case _ => throw new RuntimeException("incorrect ancestor events")
      }.toSet
    } yield ancestorChannels

This throws away all of the additional information carried in Event.

The plan for enabling a denser perceived DAG is to introduce more introspection and a broader set of conflict resolution rules.

Two rules that will be implemented soon are:

  • peeking values
  • invoking persistent continuations

These do not require additional context and can be resolved at the channel level.

In other words, the planned changes to conflict resolution should not block the future work on trie merging.

It would make sense to move the mechanism itself into a dedicated interface that will be used in merge.
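One possible shape for such a dedicated interface, assuming the two channel-level rules above: a peek does not remove data, and a persistent continuation is not exhausted by a comm, so such events need not conflict with others on the same channel. All names here (ChannelEvent, EventConflictResolver, the two resolvers) are illustrative, not existing code:

```scala
object ChannelConflicts {
  sealed trait Op
  case object Produce                                    extends Op
  final case class Consume(peek: Boolean, persistent: Boolean) extends Op

  final case class ChannelEvent(channel: String, op: Op)

  // the dedicated interface: decide whether two events conflict
  trait EventConflictResolver {
    def conflicts(a: ChannelEvent, b: ChannelEvent): Boolean
  }

  // current behaviour: any two events on the same channel conflict
  object ChannelOnly extends EventConflictResolver {
    def conflicts(a: ChannelEvent, b: ChannelEvent): Boolean =
      a.channel == b.channel
  }

  // planned rules: a peeking or persistent consume leaves the channel
  // state usable by the other branch, so it is treated as non-conflicting
  object PeekAndPersistAware extends EventConflictResolver {
    private def benign(op: Op): Boolean = op match {
      case Consume(peek, persistent) => peek || persistent
      case _                         => false
    }
    def conflicts(a: ChannelEvent, b: ChannelEvent): Boolean =
      a.channel == b.channel && !(benign(a.op) || benign(b.op))
  }
}
```

Swapping one resolver for another then changes how dense the perceived DAG can get without touching the trie-merge mechanics at all.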

Open topics

  • is it possible to use this mechanism to merge multiple tries at once?