This is a Living Document

Introduction

Purpose

This document is intended to specify the initial release of the RChain Storage Layer, known as Storage.01.

References

RChain Architecture Docs: Storage and Query

The version of the storage layer envisioned for the Storage.01 release is a cut-down version of the layer described in this document.

It lacks the following features described in this document:

  • Distributed operation: Storage.01 will not operate in a distributed setting. Storage.01 is designed to work in the context of a single Rholang interpreter running alone on a single node which does not communicate with other nodes.  Later releases of the RChain Storage Layer will work with multiple instances of the Rholang interpreter or RhoVM running on a single node which will be networked with other such nodes.

There are also some important differences from the storage layer described in this document:

  • The underlying key-value store: Instead of MongoDB, the current design of the storage layer uses LMDB.  See Software Dependencies for more information.
  • The query mechanism: Originally, SpecialK used a Prolog/Datalog-based query mechanism, which was included in the original design for the RChain Storage Layer. However, this feature will not be available in the Storage.01 release.  It will be replaced with a different approach to matching on keys.


Storage Specification & Design - Deprecated

This document describes an earlier, abandoned attempt at Storage.01. 


LMDB and lmdbjava Usage Recommendations

This document contains relevant information about LMDB.


Storage Issues 

An older, somewhat unusual solution to the problem of each key having multiple pieces of data

Definitions

JAR: Java ARchive

"A package file format typically used to aggregate many Java class files and associated metadata and resources (text, images, etc.) into one file for distribution." (Wikipedia)


COMM Event

From RChain Architecture Docs: Execution Model

RhoVM operates against a key-value data store. A state change of RhoVM is realized by an operation that modifies which key maps to what value. Since, like Rholang, RhoVM is derived from the rho-calculus model of computation, that operation is the low-level rho-calculus reduction rule. Effectively, the reduction rule, known as the “COMM” rule, is a substitution that defines a computation P to be performed if a new value is observed at a key. A key is analogous to a name in that it references a location in memory which holds the value being substituted.


BlockDAG

A variant on a blockchain, where each block is a node in a Directed Acyclic Graph (DAG).  In order to accommodate the CASPER consensus model, RChain's blockchain is a BlockDAG.  In future versions of the storage layer, a history of states and a means for rolling back to them will be informed by this design.


Type class

From the Cats Documentation

Type classes are a powerful tool used in functional programming to enable ad-hoc polymorphism, more commonly known as overloading. Where many object-oriented languages leverage subtyping for polymorphic code, functional programming tends towards a combination of parametric polymorphism (think type parameters, like Java generics) and ad-hoc polymorphism.

Original paper demonstrating how to encode type classes in Scala: https://ropas.snu.ac.kr/~bruno/papers/TypeClasses.pdf
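To make the pattern concrete, here is a minimal sketch of a type class in Scala.  The Show type class and its Int instance are hypothetical illustrations of the encoding, not part of the storage layer:

```scala
// A type class: behavior defined separately from the types that have it.
trait Show[A] {
  def show(a: A): String
}

object Show {
  // An instance for Int, supplied ad hoc rather than via subtyping.
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = s"Int($a)"
  }

  // A generic function usable with any A that has a Show instance in scope.
  def describe[A](a: A)(implicit s: Show[A]): String = s.show(a)
}
```

The Serialize and Match type classes described later in this document follow exactly this shape: the library declares the trait, and the user supplies instances for their own types.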

Scope

As stated above, this document is only intended to describe the minimal initial version of the storage layer, known as Storage.01.

Storage.01 will implement the basic tuplespace functionality described below for use in the Rholang Interpreter.

Storage.01

  • will not have history, so will not be a suitable candidate upon which to build a BlockDAG.
  • will not operate in a distributed mode
  • will not implement the "system processes" feature, which allows a client to run arbitrary Scala code keyed to a particular channel.




Author: Henry Till



Use Cases

The Storage Layer will be used as library code in an existing Scala codebase.

The Storage.01 release of the Storage Layer is envisioned to cover the following use cases:

  1. It should allow the Rholang interpreter to persist and retrieve objects related to its execution state.  It WILL NOT execute those objects.
  2. It should be general enough for use as a storage layer outside of the context of the Rholang interpreter.  This will be delivered as a self-contained Scala library, targeting Scala 2.11.x, 2.12.x, and eventually 2.13.x.


Traceability Matrix

Design Considerations

Here are a couple of issues which need to be addressed or resolved before attempting to devise a complete design:

How do we support keys longer than the MDB_MAXKEYSIZE?  

By default, LMDB is built with a maximum key size of 511 bytes.  

We should generate the cryptographic hash (with a suitably fast hashing algorithm like BLAKE2b) of the serialized bytes of a Normalized Rholang Term.  This hash is stored as the key in two (or more) key-value pairs in separate LMDB sub-databases. 

One pair is (key: hash, value: actual key). 

The other pair is (key: hash, value: actual value). 

An important thing to note is that this solution relies on the assumption that the serialization process is deterministic.  That is, it always returns the same bytes for a given term, regardless of the version of the serialization library.

Early (somewhat outdated) sketch contained here: https://github.com/henrytill/rchain/commit/d662006cae5ce458cffd0be622a28971aec51366


This commit sketches out an approach to handling keys that are longer
than MDB_MAXKEYSIZE (511 bytes).

It creates two separate sub-databases, one for keys and one for values.
When a key-value pair is to be added, we generate a cryptographic hash
of the key using the BLAKE2b algorithm.  The hash becomes the pair's key
in both the keys database and the values database.  We store the
original key as the value in the keys database.  We store the original
value as the value in the values database.
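The scheme can be sketched with in-memory maps standing in for the two LMDB sub-databases.  Note that this sketch uses SHA-256 from the JDK instead of BLAKE2b (which is not in the standard library), purely to keep it self-contained:

```scala
import java.security.MessageDigest
import scala.collection.mutable

// Sketch of the key-hashing scheme: the two maps stand in for the "keys"
// and "values" LMDB sub-databases, and SHA-256 stands in for BLAKE2b.
object HashedKeyStore {
  private val keys   = mutable.Map.empty[Seq[Byte], Array[Byte]] // hash -> original key
  private val values = mutable.Map.empty[Seq[Byte], Array[Byte]] // hash -> value

  // A fixed-size digest, always shorter than MDB_MAXKEYSIZE (511 bytes).
  private def hash(bytes: Array[Byte]): Seq[Byte] =
    MessageDigest.getInstance("SHA-256").digest(bytes).toSeq

  def put(key: Array[Byte], value: Array[Byte]): Unit = {
    val h = hash(key)
    keys(h)   = key   // preserve the original (possibly oversized) key
    values(h) = value
  }

  def get(key: Array[Byte]): Option[Array[Byte]] =
    values.get(hash(key))
}
```

Because lookups go through the hash, keys of arbitrary length can be supported while the LMDB keys themselves stay fixed-size.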

Matching algorithm

In discussions about the functionality of produce and consume, an important piece is the matching algorithm used to check if a channel matches against a certain pattern. 

How is this check done?

Former user (Deleted) said "Run it through a matching algorithm - this needs to be written explicitly."

Kyle Butt said the following:

We can gradually improve the matching algorithm. It's the same algorithm used for match in the interpreter.

Let's start with lists, ground types, and variables. That's what the RBL backend currently supports, and it's easy to do that much.

The matching algorithm is very similar to the spatial type checking algorithm found here. The basic idea is that true in the model checker corresponds to a free-variable in the pattern.

It should be easier to do some of the matching because we have things binned. Not a lot easier, but let's take what we can get.

Matching may be expensive. We should account for it in the cost modeling.

Serialization

Michael Stay (Unlicensed) has suggested that we use Protocol Buffers to define the data types common to Rholang and the Storage Layer.  Protocol Buffers offer a straightforward means of binary serialization.  The serialized Rholang objects will be stored in LMDB.

See the following tickets for a description of the planned work:

See Software Dependencies for more information about Protocol Buffers.

Interfaces  

Software Interface

Storage.01 will be programmed in Scala as a Scala library. 

For the Rholang Interpreter, the class files will be linked transparently as part of the sbt build process, and included in a packaged Node.

For users outside of the RChain project, this will be delivered as a self-contained JAR, targeting Scala 2.11.x, 2.12.x, and eventually 2.13.x

User Interface

(From the standpoint of a user writing Scala code)

After importing the coop.rchain.storage package, a user will instantiate a Storage object with the desired configuration parameters, such as the location of the database file and the size of the underlying key-value map.

Generic/Extensible API

Storage.01 is designed to be agnostic to the underlying data types that it persists.  It does this by providing a generic, type class-based API.  For more information about type classes, see above.

In order to use the library, a user will need to provide instances for two type classes: Serialize and Match.

Serialize

Instances of the Serialize type class are used to serialize and deserialize the data types that the user wishes to persist in the storage layer.

Essentially, this means providing definitions for two functions: 

  1. encode, which converts a given object to Array[Byte]
  2. decode, which converts a given Array[Byte] back to that object

Match

Instances of the Match type class are used for matching a user-determined pattern type against the keys in the storage layer.

Essentially this means providing a definition for one function:

  1. isMatch, which takes a value of some type A representing patterns and a value of some type B representing data, and determines whether they match

Rholang

In the case of the Rholang interpreter, these are the types that form Rholang's core Abstract Data Types (ADTs).  We will provide instances of Serialize and Match for these data types.

Core Actions

The user will then use two functions to interact with the Storage object, produce and consume.  

These functions are presented below in the (deprecated) RSpace 0.1 Specification#API Section.

Finishing up

When a user has completed their interactions with the Storage object, they will call a close function to clean up its resources.


System Overview/Architecture

Prototype

A Scala prototype based on Former user (Deleted)'s Python version is here:

https://github.com/henrytill/mirror-world

API (Sketch)

// https://github.com/rchain/rchain/blob/2d65ab3e281fd88598159a9ed4e6208c51de16f1/rholang/src/main/scala/coop/rchain/rholang/interpreter/rholangADT.scala

sealed trait Channel
case class Quote(p: Par) extends Channel
case class ChanVar(cvar: Var) extends Channel

case class Send(chan: Channel, data: List[Par])

type Pattern = List[Channel]
type Source  = Channel

case class Receive(binds: List[(Pattern, Source)])

// and one new type:
case class Continuation(par: Par, env: Map[Int, Par])


/**
  * Interface for the main storage object
  *
  * @tparam C The type of channels
  * @tparam P The type of patterns
  * @tparam A The type of data (aka "waiting writes")
  * @tparam K The type of objects (aka "continuations") 
  */
trait IStore[C, P, A, K]


/**
  * Actions on a `Storage`
  *
  * In Rholang:
  *   C = Channel
  *   P = Pattern (where Pattern = List[Channel])
  *   A = List[Quote]
  *   K = Continuation (where Continuation = ?)
  *
  * A storage layer for the Rholang interpreter will provide `produce` and `consume` functions specialized to:
  *
  *  consume(storage: IStore[Channel, List[Channel], List[Quote], Continuation],
  *          key: List[Channel],
  *          patterns: List[List[Channel]],
  *          k: Continuation): Option[(Continuation, List[List[Quote]])]
  *
  *  produce(storage: IStore[Channel, List[Channel], List[Quote], Continuation],
  *          key: Channel,
  *          data: List[Quote]): Option[(Continuation, List[List[Quote]])]
  *
  */
trait StorageActions {

  /** Searches the store for data matching the given patterns at the given
    * channels.  If a match is found, the continuation and the matched data
    * are returned; otherwise, the continuation and patterns are stored for
    * future matching.
    *
    * @param store the storage object to act on
    * @param channels the channels on which to search for matching data
    * @param patterns the patterns to match against the data
    * @param continuation the continuation to invoke if a match is found
    * @tparam C a type representing a channel
    * @tparam P a type representing a pattern
    * @tparam A a type representing a piece of data
    * @tparam K a type representing a continuation
    */
  def consume[C, P, A, K](store: IStore[C, P, A, K],
                          channels: List[C],
                          patterns: List[P],
                          continuation: K)(implicit m: Match[P, A]): Option[(K, List[A])]

  /** Searches the store for a continuation whose patterns match the given
    * data on the given channel.  If a match is found, the continuation and
    * the matched data are returned; otherwise, the data is stored as a
    * "waiting write".
    *
    * @param store the storage object to act on
    * @param channel the channel on which to put the data
    * @param data the data to put
    * @tparam C a type representing a channel
    * @tparam P a type representing a pattern
    * @tparam A a type representing a piece of data
    * @tparam K a type representing a continuation
    */
  def produce[C, P, A, K](store: IStore[C, P, A, K],
                          channel: C,
                          data: A)(implicit m: Match[P, A]): Option[(K, List[A])]
}

Serialize Type Class

trait Serialize[A] {

  def encode(a: A): Array[Byte]

  def decode(bytes: Array[Byte]): Either[SerializeError, A]
}
Example Instance

Let's say an external user wants to persist Strings as data (in other words, in the above code, String is substituted for type A).

They would write an instance of Serialize for String like this:

import java.nio.charset.StandardCharsets

import cats.syntax.either._

implicit object stringInstance extends Serialize[String] {

  def encode(s: String): Array[Byte] =
    s.getBytes(StandardCharsets.UTF_8)

  def decode(bytes: Array[Byte]): Either[SerializeError, String] =
    Either
      .catchNonFatal(new String(bytes, StandardCharsets.UTF_8))
      .leftMap(SerializeError.apply)
}

They would then bring this into the scope where they were calling produce and consume.

Match Type Class

/**
  * Type class for matching patterns with data.
  *
  * In Rholang, `P` is `List[Channel]`, and `A` is `List[Quote]`.
  *
  * @tparam P The type of Patterns
  * @tparam A The type of Data
  */
trait Match[P, A] {

  def isMatch(p: P, a: A): Option[List[A]]
}
Example Instance

Let's say an external user wants to use String as the type of patterns (in other words, in the above code, String is substituted for type P), and String as the type of data (in the above code String is substituted for type A).  They have also decided that they only want to use a trivial implementation of match that tests String equality.

They would write an instance of Match like this:

implicit object stringMatcher extends Match[String, String] {

  def isMatch(p: String, a: String): Option[List[String]] =
    if (p == a) Some(List(a)) else None
}

They would then bring this into the scope where they were calling produce and consume.

More Notes from Kyle Butt

(This is essentially the Rholang conceptual model for using the tuplespace.  In a generalized implementation, the details will vary).

3 parts:

  1. The list of waiting continuations, along with their List[(Channel, Pattern)]. Stored at the key List[Channel].
  2. A mapping from Channel → List[Channel].
  3. Data of type Seq[Par], stored at the key Channel.  

Consume

A consume checks #3, to see if a COMM Event can be produced.

If not, it stores the relevant info in #1 and #2.

Produce

produce uses #2 to get List[Channel].

It checks them against #3 and #1 to see if a COMM Event can be created.

If a COMM Event can't be created, the produce is stored in #3.

It needs to hold the locks for each Channel for the duration of a single check.

So storage doesn't just transition on COMM Events; it transitions on every send and receive.

  • Kyle Butt will have to think about the ships in the night problem and locking.

Discussion

From the perspective of a generalized storage layer, the produce and consume functions return some object of type K (see below).  When the storage layer is used in the Rholang interpreter, this K will be a Continuation, which the interpreter can then call, producing the aforementioned COMM Event.

Regarding #3, what is the type signature of a "waiting write"?

  1. Former user (Deleted) says, "If I understand this correctly it is just arbitrary data at a channel."
  2. Kyle Butt clarifies: For rholang, it is a name. (Which may be a quoted process). The name gets substituted upon send, so we either explicitly do the substitution, or pass an environment.

Consume, in Pseudocode

Look up key in (2), returning a list of Channels
Check if data matching each pattern is available on each channel
yes:
  Remove it from table and pass it to continuation
no:
  Produce joined channel
  Store continuation and patterns at joined channel
  add joined channel -> [[channel pattern]] mapping
  (Not necessary if we store the joined channel separated below)
  add channel -> joined channel mapping

Discussion

What is the "joined channel" that we are supposed to produce?

Former user (Deleted) says "Just a tuple of channels." Kyle Butt agrees


Could we clarify the last two lines? It's not clear to me what the "joined channel separated below" is.

Former user (Deleted) says "Because we store the joined channel as a tuple, we can just look it up as a tuple. In Rosette you can't store tuples as keys to table/dictionary and hence the hack.

See https://gist.github.com/KentShikama/8b1c83a88e0a87785c3c70fb3605a3fb for an implementation without the last two lines necessary." 

Produce, in Pseudocode

for each joined channel in (2)
  for each channel in joined channel
    if channel == send.chan and pattern matches data
      save send.data
    elif channel has data that matches the pattern
      save send.data
    else
     this joined channel isn't viable
if any joined channel is viable
  pick 1 and eat the data
  send the data to the picked continuation
else
  store the send.data

;;; Kent provided a small correction to this:
;;;
;;;   The produce side makes a quick check to see if the current data matches
;;;   the pattern at the bat-channel of the joined channel. Otherwise, we'll
;;;   be looping through every continuation of every joined channel. Also note
;;;   that each (joined-)channel can store multiple continuations (and
;;;   products/data).

Discussion

Is "eat" the same as "consume"?

Former user (Deleted) says "Yes".  

Kyle Butt says "No. Eating just meant removing it from the store so no other continuation gets it. It should also say 'send the data to the continuation'".

Former user (Deleted) says "If by consume it was originally meant that included also sending the data to the picked continuation, then I agree with Kyle".  


Is "store" the same as "save"?

Former user (Deleted) and Kyle Butt say yes.
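The produce/consume flow above can be sketched as a toy, single-node tuplespace.  This is a deliberate simplification under several assumptions: channels, patterns, and data are all Strings; a pattern matches a datum only when they are equal; and joins across multiple channels and the locking discussed above are omitted:

```scala
import scala.collection.mutable

// Toy tuplespace sketching the consume/produce logic described above.
object ToySpace {
  // #3: waiting data ("waiting writes"), stored at the channel.
  private val data = mutable.Map.empty[String, mutable.Buffer[String]]
  // #1: waiting continuations with their patterns, stored at the channel.
  private val continuations =
    mutable.Map.empty[String, mutable.Buffer[(String, String => Unit)]]

  def consume(channel: String, pattern: String)(k: String => Unit): Unit =
    data.get(channel).flatMap { ds =>
      ds.indexWhere(_ == pattern) match {
        case -1 => None
        case i  => Some(ds.remove(i)) // "eat" the datum so no one else gets it
      }
    } match {
      case Some(d) => k(d) // COMM event: matching data was already waiting
      case None    =>      // no match: store the continuation and its pattern
        continuations.getOrElseUpdate(channel, mutable.Buffer.empty) += ((pattern, k))
    }

  def produce(channel: String, datum: String): Unit =
    continuations.get(channel).flatMap { ks =>
      ks.indexWhere(_._1 == datum) match {
        case -1 => None
        case i  => Some(ks.remove(i)) // remove the matched continuation
      }
    } match {
      case Some((_, k)) => k(datum) // COMM event: a continuation was waiting
      case None         =>          // no match: store the datum
        data.getOrElseUpdate(channel, mutable.Buffer.empty) += datum
    }
}
```

Both orders of arrival produce the same COMM event: a produce followed by a matching consume fires the continuation immediately, and a consume followed by a matching produce fires it when the data arrives.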

Limitations

List any limitations or system constraints that have an impact on the design of the software.  These can be any of the following:

Hardware/software environment, End User Environment, Resource availability, Standards Compliance, Interop, Interface/Protocol requirements, Data requirements, Security, Performance, Networking, Verification & validation, means to address quality goals.

Assumptions and Dependencies

Software Dependencies

Lightning Memory-mapped Database

"An ultra-fast, ultra-compact, crash-proof key-value embedded data store."

General Information: 

https://symas.com/lmdb

http://www.lmdb.tech/doc/index.html (Doxygen documentation for C version) 

JNR-FFI Wrapper Library for use from Scala

https://github.com/lmdbjava/lmdbjava

http://www.javadoc.io/doc/org.lmdbjava/lmdbjava/0.6.0 (javadoc)

Protocol Buffers (serialization format)

"...are a language-neutral, platform-neutral extensible mechanism for serializing structured data."

Both Rholang objects and Blockchain blocks will be serialized to and deserialized from binary using the Protocol Buffer serialization format. 

The serialized data will be stored in LMDB as the value portion of the key-value pair.

https://developers.google.com/protocol-buffers/

ScalaPB

"...is a protocol buffer compiler plugin for Scala. It will generate Scala case classes, parsers and serializers for your protocol buffers."

https://scalapb.github.io/

Architectural Strategies

Describe any design decisions and/or strategies that affect the overall organization of the system and its higher-level structures. These strategies should provide insight into the key abstractions and mechanisms used in the system architecture. Describe the reasoning employed for each decision and/or strategy (possibly referring to previously stated design goals and principles) and how any design goals or priorities were balanced or traded-off. Such decisions might concern (but are not limited to) things like the following:

  • Use of a particular type of product (programming language, database, library, etc. ...)
  • Reuse of existing software components to implement various parts/features of the system
  • Future plans for extending or enhancing the software
  • User interface paradigms (or system input and output models)
  • Hardware and/or software interface paradigms
  • Error detection and recovery
  • Memory management policies
  • External databases and/or data storage management and persistence
  • Distributed data or control over a network
  • Generalized approaches to control
  • Concurrency and synchronization
  • Communication mechanisms
  • Management of other resources


System Architecture

This section should provide a high-level overview of how the functionality and responsibilities of the system were partitioned and then assigned to subsystems or components.  The main purpose here is to gain a general understanding of how and why the system was decomposed, and how the individual parts work together to provide the desired functionality.  If there are diagrams, models, flowcharts, or documented scenarios, include them here.  This applies only to the scope of the document, not the entire system.