RNode feature gaps

WIP

Persistence optimization

Cold Store

https://github.com/rchain/rchain/blob/dev/rspace/src/main/scala/coop/rchain/rspace/history/ColdStore.scala

RSpace in its current form does not handle singular values under a given channel. It deals with lists.

rho: x!("42")
cold store: hash(List("42")) -> List("42")
rho: x!("43")
cold store: hash(List("42", "43")) -> List("42", "43")
rho: x!("44")
cold store: hash(List("42", "43", "44")) -> List("42", "43", "44")

This has three consequences that need to be considered:

  1. lookup
  2. size on disk
  3. LMDB limit on data per hash

ad 1.

When RSpace needs to deal with channel "x" it has to fetch the whole list and deserialize its contents.

ad 2.

There is no data sharing: in the above example, "42" will be stored 3 times.

ad 3.

There is a limit to how much data can fit under one name in LMDB. The current implementation assumes that it will never be reached.
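
The growth pattern above can be reproduced with a small simulation. This is only a sketch: the object name ColdStoreGrowth, the use of SHA-256 and the comma-joined "serialization" are assumptions made for illustration and are not how ColdStore.scala hashes or encodes data.

```scala
import java.security.MessageDigest

// Illustrative stand-in for the Cold Store; not the real RSpace code.
object ColdStoreGrowth {
  // Assumed hashing scheme: SHA-256 over the comma-joined list contents.
  def hash(values: List[String]): String =
    MessageDigest
      .getInstance("SHA-256")
      .digest(values.mkString(",").getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString

  // Every produce appends to the channel's list and stores the WHOLE list
  // under the hash of that list, mirroring the x!("42"), x!("43"), ... example.
  def produceAll(values: List[String]): Map[String, List[String]] =
    values
      .scanLeft(List.empty[String])((acc, v) => acc :+ v)
      .tail
      .map(list => hash(list) -> list)
      .toMap

  // Number of stored copies of `value` across all entries.
  def copiesOf(store: Map[String, List[String]], value: String): Int =
    store.values.map(_.count(_ == value)).sum
}
```

Calling ColdStoreGrowth.produceAll(List("42", "43", "44")) yields three entries, and copiesOf reports that "42" is held three times, which is the lack of sharing described in "ad 2".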


Random part

RSpace.createWithReplay[
      F,
      Par,
      BindPattern,
      ListParWithRandom,
      TaggedContinuation
    ]


case class ListParWithRandom(
    pars: _root_.scala.collection.Seq[coop.rchain.models.Par] = _root_.scala.collection.Seq.empty,
    randomState: coop.rchain.crypto.hash.Blake2b512Random = coop.rchain.models.ListParWithRandom._typemapper_randomState.toCustom(_root_.com.google.protobuf.ByteString.EMPTY)
    )

TaggedContinuation contains a ParBody:

case class ParBody(value: coop.rchain.models.ParWithRandom)


RSpace is unaware of what data it contains.

trait ISpace[F[_], C, P, A, K]

class RSpace[F[_], C, P, A, K](
    historyRepository: HistoryRepository[F, C, P, A, K],
    storeAtom: AtomicAny[HotStore[F, C, P, A, K]],
    branch: Branch
)(
    implicit
    serializeC: Serialize[C],
    serializeP: Serialize[P],
    serializeA: Serialize[A],
    serializeK: Serialize[K],

Rholang discovers change (a notion of time, evolution, progress, etc.) through an element that is able to divide and merge. In the current implementation this is the random part: a datum carries its randomState in ListParWithRandom, and a continuation carries its own in ParBody.

This creates an awkward situation for RSpace - it cannot share data as every piece is unique (the random part guarantees this).


Proposed solution

Cold Store is the "binary dump" of rholang. Whatever makes it through checkpointing will be persisted in this structure.

There are 3 types of data written to the Cold Store:

final case class JoinsLeaf(bytes: ByteVector)         extends PersistedData
final case class DataLeaf(bytes: ByteVector)          extends PersistedData
final case class ContinuationsLeaf(bytes: ByteVector) extends PersistedData

Joins don't contain random parts, so it is possible to apply the following proposal 'as is'.

Data and Continuations do, so they require more context to be able to share data.


When RSpace is fetching data for matching it knows what channels are being addressed.

Cold Store could store lists of pointers with an optional random part. This way the lookup algorithm would be able to fetch a channel (list of pointers), choose one element, deserialize it, match it and, hopefully, cache it in the Hot Store.

This way the cost of deserialization is reduced significantly: instead of deserializing all of the data under a given channel up front, deserialization can be deferred until an element is actually chosen.

This approach consists of two main changes:

  • RSpace would have to be aware of the split (at least the data structures need to allow lazy fetching)
  • the Cold Store needs to know how to extract the RandomParts from Data and Continuations

It should not affect other pieces of RSpace.
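
A minimal sketch of the lazy lookup, under stated assumptions: Pointer, Entry, Store and findMatch are hypothetical names that do not exist in RSpace, data is modelled as plain strings, and the deserialization counter exists only to make the laziness observable.

```scala
// Hypothetical model of "list of pointers with an optional random part".
object LazyChannel {
  // Assumed: a pointer is the hash under which the shared bytes are stored.
  final case class Pointer(hash: String)

  // One element of a channel: pointer to shareable bytes plus its own random part.
  final case class Entry(pointer: Pointer, randomPart: Option[Vector[Byte]])

  // Stand-in for the Cold Store; counts how many elements were loaded.
  final class Store(blobs: Map[Pointer, String]) {
    var deserialized: Int = 0
    def load(p: Pointer): Option[String] = {
      deserialized += 1
      blobs.get(p)
    }
  }

  // Walks the pointer list lazily and stops at the first datum that matches,
  // so elements after the match are never loaded or deserialized.
  def findMatch(entries: List[Entry], store: Store)(
      matches: String => Boolean
  ): Option[(Entry, String)] =
    entries.iterator
      .flatMap(e => store.load(e.pointer).map(d => (e, d)))
      .find { case (_, d) => matches(d) }
}
```

Because the random part lives in the Entry rather than in the stored bytes, two entries may point at the same Pointer, which is what would allow identical data to be stored once.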


When will this become relevant?

As soon as it is observed that users use channels to persist more than a couple of values. 

See RCHAIN-940.

It was also observed during the initial vault demo when one channel was used as a logging channel.

Bind Patterns for continuations

This reorganization also relates to https://rchain.atlassian.net/projects/RCHAIN/issues/RCHAIN-1851

In short: returning a continuation differs from returning a datum in that the continuation needs to match on a "bind pattern" in addition to the channel.

Assuming that a given channel contains more than one continuation, it would make sense to fetch only the patterns and then return only the matching continuation that was chosen.

This fits pretty well with the notion of storing the contents of a channel as a list of pointers to bytes plus metadata (random part, bind pattern).
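
The idea can be sketched as follows. Everything here is an assumption made for illustration: ContEntry and select are hypothetical names, the bind pattern is reduced to a predicate on a string datum, and the continuation body is a plain string rather than a TaggedContinuation.

```scala
// Hypothetical model: continuation bodies stay in the Cold Store,
// only the metadata (pointer, bind pattern) is read while matching.
object ContinuationLookup {
  // Assumed metadata stored next to each pointer; the real BindPattern is
  // a richer structure than this predicate.
  final case class ContEntry(pointer: String, bindPattern: String => Boolean)

  // Picks the first entry whose pattern accepts the datum and fetches only
  // that continuation body; all other bodies are left untouched.
  def select(entries: List[ContEntry], datum: String)(
      fetch: String => Option[String]
  ): Option[String] =
    entries.find(_.bindPattern(datum)).flatMap(e => fetch(e.pointer))
}
```

With two entries whose patterns accept different data, select invokes fetch once, for the chosen pointer only.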

Hot Store Changes

During checkpoint, RSpace requests a list of changes from the HotStore.

The preferred way of interacting with the Hot Store is to have consumes and produces cancel out as much as possible. This means that if a channel is populated in a given deploy, it should ideally be consumed in the same deploy. This way no data will be persisted. This, of course, is not possible in real-life cases, but it should be a rule of thumb when writing Rholang, to keep the Cold Store as lean as possible.

Currently, Hot Store is unable to distinguish between a value that was created and removed and a value that was fetched from the Cold Store and removed.

One optimization of how the changes list is created would be to track values that were both created and removed within the Hot Store and never pass them to the Cold Store. This way, fetching the internal representation of the state to check whether a key actually exists can be omitted.
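
A sketch of such tracking, assuming a hypothetical Origin tag on every value held by the Hot Store. None of these names (Origin, Tracked, Change, changes) come from the RSpace code base, and keys are simplified to strings.

```scala
// Hypothetical bookkeeping that lets the Hot Store tell apart
// "created and removed" from "fetched from the Cold Store and removed".
object HotStoreChanges {
  sealed trait Origin
  case object CreatedInDeploy extends Origin // never seen by the Cold Store
  case object FetchedFromCold extends Origin // already persisted

  final case class Tracked(origin: Origin, removed: Boolean)

  sealed trait Change
  final case class Insert(key: String) extends Change
  final case class Delete(key: String) extends Change

  // Builds the changes list handed over at checkpoint time.
  // Values created and removed in the Hot Store, and fetched values that
  // were left untouched, produce no change at all.
  def changes(state: Map[String, Tracked]): List[Change] =
    state.toList.sortBy(_._1).collect {
      case (k, Tracked(CreatedInDeploy, false)) => Insert(k)
      case (k, Tracked(FetchedFromCold, true))  => Delete(k)
    }
}
```

Only the two remaining cases reach the Cold Store, so no existence check is needed for keys that were never persisted.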

Finalization

The process of finalization is not automatic and no mechanisms around requesting and discovering finalization were introduced.

Essentially this means that finalization is not part of the system. It is possible to trigger finalization manually and get the current value of it but:

  • manual finalization means that if the chain is big enough, requesting finalization will stall a node (an attack vector)
  • finalized blocks will not be automatically discovered, therefore they will not speed up the system
  • the threshold is node-specific, which means this can be abused (e.g. threshold of -1.0)

RholangCLI

It would be beneficial to allow the RholangCLI to set its "tip" to an existing block hash.

This can be achieved similarly to how ReportingCasper manipulates the state of RSpace and the HotStore.

The gain would be that a user could run rholang against some relevant state.