The Storage Layer will be used as library code in an existing Scala codebase.
The Storage.01 release of the Storage Layer is envisioned to cover the following use cases:
Here are a couple of issues which need to be addressed or resolved before attempting to devise a complete design:
MDB_MAXKEYSIZE
By default, LMDB is built with a maximum key size of 511 bytes.
We should generate the cryptographic hash (with a suitably fast hashing algorithm like BLAKE2b) of the serialized bytes of a Normalized Rholang Term. This hash is stored as the key in two (or more) key-value pairs in separate LMDB sub-databases.
One pair is (key: hash, value: actual key).
The other pair is (key: hash, value: actual value).
An important thing to note is that this solution relies on the assumption that the serialization process is deterministic. That is, it always returns the same bytes, regardless of the version of the serialization library.
An early (somewhat outdated) sketch is contained here: https://github.com/henrytill/rchain/commit/d662006cae5ce458cffd0be622a28971aec51366
This commit sketches out an approach to handling keys that are longer than MDB_MAXKEYSIZE (511 bytes). It creates two separate sub-databases, one for keys and one for values. When a key-value pair is to be added, we generate a cryptographic hash of the key using the BLAKE2b algorithm. The hash becomes the pair's key in both the keys database and the values database. We store the original key as the value in the keys database. We store the original value as the value in the values database.
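The two-sub-database scheme can be sketched as follows. This is a minimal, illustrative model: in-memory maps stand in for the LMDB "keys" and "values" sub-databases, and since BLAKE2b is not available in the JDK, SHA-256 is used here purely as a stand-in hash function. All names are hypothetical, not the actual Storage Layer API.

```scala
import java.security.MessageDigest

// Sketch of the two-sub-database workaround for MDB_MAXKEYSIZE.
// Mutable maps stand in for the LMDB sub-databases; SHA-256 stands in
// for BLAKE2b (which requires a third-party library on the JVM).
object HashedKeyStore {
  private val keysDb   = scala.collection.mutable.Map.empty[Seq[Byte], Array[Byte]]
  private val valuesDb = scala.collection.mutable.Map.empty[Seq[Byte], Array[Byte]]

  private def hash(bytes: Array[Byte]): Seq[Byte] =
    MessageDigest.getInstance("SHA-256").digest(bytes).toSeq

  def put(key: Array[Byte], value: Array[Byte]): Unit = {
    val h = hash(key) // fixed-size digest, always well under 511 bytes
    keysDb(h)   = key   // one pair: (key: hash, value: actual key)
    valuesDb(h) = value // the other pair: (key: hash, value: actual value)
  }

  def get(key: Array[Byte]): Option[Array[Byte]] =
    valuesDb.get(hash(key))
}
```

Because lookups re-hash the original key, keys of arbitrary length can be stored and retrieved while every LMDB key remains a fixed-size digest.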
In discussions about the functionality of produce and consume, an important piece is the matching algorithm used to check if a channel matches against a certain pattern.
How is this check done?
Former user (Deleted) said "Run it through a matching algorithm - this needs to be written explicitly."
Kyle Butt said the following:
We can gradually improve the matching algorithm. It's the same algorithm used for match in the interpreter. Let's start with lists, ground types, and variables. That's what the RBL backend currently supports, and it's easy to do that much.
The matching algorithm is very similar to the spatial type checking algorithm found here. The basic idea is that true in the model checker corresponds to a free variable in the pattern. It should be easier to do some of the matching because we have things binned. Not a lot easier, but let's take what we can get.
Matching may be expensive. We should account for it in the cost modeling.
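The "lists, ground types, and variables" starting point described above can be sketched as a toy matcher. This is not the interpreter's match algorithm, just an illustration of the idea: a ground pattern must match its datum exactly, while a free variable matches anything and produces a binding. All names are illustrative.

```scala
// Toy matcher over lists, ground values, and free variables.
sealed trait Pattern
final case class Ground(value: Int)    extends Pattern // must equal the datum
final case class FreeVar(name: String) extends Pattern // matches any datum

object Matcher {
  /** Matches a list of patterns against a list of data of the same length,
    * returning the free-variable bindings on success. */
  def matchList(patterns: List[Pattern], data: List[Int]): Option[Map[String, Int]] =
    if (patterns.length != data.length) None
    else
      patterns.zip(data).foldLeft(Option(Map.empty[String, Int])) {
        case (acc, (pattern, datum)) =>
          acc.flatMap { bindings =>
            pattern match {
              case Ground(v) if v == datum => Some(bindings)          // exact match
              case Ground(_)               => None                    // mismatch
              case FreeVar(x)              => Some(bindings + (x -> datum))
            }
          }
      }
}
```

For example, matching the patterns `List(Ground(1), FreeVar("x"))` against the data `List(1, 2)` succeeds and binds `x` to `2`.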
Michael Stay (Unlicensed) has suggested that we use Protocol Buffers to define the data types common to Rholang and the Storage Layer. Protocol Buffers offer a straightforward means of binary serialization. The serialized Rholang objects will be stored in LMDB.
See the following tickets for a description of the planned work:
See System Dependencies for more information about Protocol Buffers.
Storage.01 will be programmed in Scala as a Scala library.
For the Rholang Interpreter, the class files will be linked transparently as part of the sbt build process, and included in a packaged Node.
For users outside of the RChain project, this will be delivered as a self-contained JAR, targeting Scala 2.11.x, 2.12.x, and eventually 2.13.x.
(From the standpoint of a user writing Scala code)
After importing the coop.rchain.storage package, a user will instantiate a Storage object with the desired configuration parameters, such as the location of the database file and the size of the underlying key-value map.
Storage.01 is designed to be agnostic to the underlying data types that it persists. It does this by providing a generic, type class-based API. For more information about type classes, see above.
In order to use the library, a user will need to provide instances for two type classes: Serialize and Match.
Instances of the Serialize type class are used to serialize and deserialize the data types that the user wishes to persist in the storage layer. Essentially, this means providing definitions for two functions: encode, which converts a given object to an Array[Byte], and decode, which converts a given Array[Byte] back to that object.
Instances of the Match
type class are used for matching a user-determined pattern type against the keys in the storage layer.
Essentially, this means providing a definition for one function: isMatch, which takes a value of some type P representing patterns and a value of some type A representing data, determines whether they match, and returns the matched data wrapped in an Option.
In the case of the Rholang interpreter, these are the types that form Rholang's core Abstract Data Types (ADTs). We will provide instances of Serialize
and Match
for these data types.
The user will then use two functions to interact with the Storage object: produce and consume.
These functions are presented below in the (deprecated) RSpace 0.1 Specification#API Section.
When a user has completed their interactions with the Storage object, they will call a close function to clean up its resources.
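The lifecycle described above (configure, produce, consume, close) might look like the following sketch. The names Storage.create and Config, and the trivial in-memory body, are assumptions made purely so the example is self-contained and runnable; only produce, consume, and close come from the API described in this document.

```scala
// Hypothetical lifecycle sketch with an in-memory stand-in for LMDB.
final case class Config(dbPath: String, mapSizeBytes: Long) // assumed config shape

final class Storage(config: Config) {
  private var closed = false
  private val data   = scala.collection.mutable.Map.empty[String, List[String]]

  /** Store a piece of data at a channel. */
  def produce(channel: String, datum: String): Unit = {
    require(!closed, "storage is closed")
    data(channel) = datum :: data.getOrElse(channel, Nil)
  }

  /** Remove and return a datum at `channel` matching `pattern`
    * (trivial equality match, for illustration only). */
  def consume(channel: String, pattern: String): Option[String] = {
    require(!closed, "storage is closed")
    val existing = data.getOrElse(channel, Nil)
    existing.find(_ == pattern).map { found =>
      data(channel) = existing.diff(List(found))
      found
    }
  }

  /** Clean up resources (a real implementation would close the LMDB environment). */
  def close(): Unit = closed = true
}

object Storage {
  def create(config: Config): Storage = new Storage(config)
}
```

A user would then write something like: create a Storage with the database location and map size, call produce and consume as needed, and finally call close.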
A Scala prototype based on Former user (Deleted)'s Python version is here:
https://github.com/henrytill/mirror-world
// https://github.com/rchain/rchain/blob/2d65ab3e281fd88598159a9ed4e6208c51de16f1/rholang/src/main/scala/coop/rchain/rholang/interpreter/rholangADT.scala

sealed trait Channel
case class Quote(p: Par) extends Channel
case class ChanVar(cvar: Var) extends Channel

case class Send(chan: Channel, data: List[Par])

type Pattern = List[Channel]
type Source  = Channel
case class Receive(binds: List[(Pattern, Source)])

// and one new type:
case class Continuation(par: Par, env: Map[Int, Par])
/**
  * Interface for the main storage object
  *
  * @tparam C The type of channels
  * @tparam P The type of patterns
  * @tparam A The type of data (aka "waiting writes")
  * @tparam K The type of objects (aka "continuations")
  */
trait IStore[C, P, A, K]
/**
  * Actions on a `Storage`
  *
  * In Rholang:
  *   C = Channel
  *   P = Pattern (where Pattern = List[Channel])
  *   A = List[Quote]
  *   K = Continuation (where Continuation = ?)
  *
  * A storage layer for the Rholang interpreter will provide `produce` and
  * `consume` functions specialized to:
  *
  *   consume(storage: IStore[Channel, List[Channel], List[Quote], Continuation],
  *           key: List[Channel],
  *           patterns: List[List[Channel]],
  *           k: Continuation): Option[(K, List[List[Quote]])]
  *
  *   produce(storage: IStore[Channel, List[Channel], List[Quote], Continuation],
  *           key: Channel,
  *           data: List[Quote]): Option[(K, List[List[Quote]])]
  */
trait StorageActions {

  /** `consume` does the "consume" thing
    *
    * @param store
    * @param channels
    * @param patterns
    * @param continuation
    * @tparam C a type representing a channel
    * @tparam P a type representing a pattern
    * @tparam A a type representing a piece of data
    * @tparam K a type representing a continuation
    */
  def consume[C, P, A, K](store: IStore[C, P, A, K],
                          channels: List[C],
                          patterns: List[P],
                          continuation: K)(implicit m: Match[P, A]): Option[(K, List[A])]

  /** `produce` does the "produce" thing
    *
    * @param store
    * @param channel
    * @param data
    * @tparam C a type representing a channel
    * @tparam P a type representing a pattern
    * @tparam A a type representing a piece of data
    * @tparam K a type representing a continuation
    */
  def produce[C, P, A, K](store: IStore[C, P, A, K],
                          channel: C,
                          data: A)(implicit m: Match[P, A]): Option[(K, List[A])]
}
trait Serialize[A] {
  def encode(a: A): Array[Byte]
  def decode(bytes: Array[Byte]): Either[SerializeError, A]
}
Let's say an external user wants to persist Strings as data (in other words, in the above code, String is substituted for type A). They would write an instance of Serialize for String like this:
implicit object stringInstance extends Serialize[String] {

  def encode(s: String): Array[Byte] =
    s.getBytes(StandardCharsets.UTF_8)

  def decode(bytes: Array[Byte]): Either[SerializeError, String] =
    Either
      .catchNonFatal(new String(bytes, StandardCharsets.UTF_8))
      .leftMap(SerializeError.apply)
}
They would then bring this instance into the scope where they call produce and consume.
/**
  * Type class for matching patterns with data.
  *
  * In Rholang, `P` is `List[Channel]`, and `A` is `List[Quote]`.
  *
  * @tparam P The type of Patterns
  * @tparam A The type of Data
  */
trait Match[P, A] {
  def isMatch(p: P, a: A): Option[List[A]]
}
Let's say an external user wants to use String as the type of patterns (in other words, in the above code, String is substituted for type P) and String as the type of data (String is substituted for type A). They have also decided that they only want a trivial implementation of matching that tests String equality. They would write an instance of Match like this:
implicit object stringMatcher extends Match[String, String] {
  def isMatch(p: String, a: String): Option[List[String]] =
    if (p == a) Some(List(a)) else None
}
They would then bring this instance into the scope where they call produce and consume.
(This is essentially the Rholang conceptual model for using the tuplespace. In a generalized implementation, the details will vary).
3 parts:
1. List[(Channel, Pattern)], stored at the key List[Channel].
2. A mapping Channel → List[Channel].
3. Seq[Par], stored at the key Channel.
A consume checks #3 to see if a COMM Event can be produced. If not, it stores the relevant info in #1 and #2.
A produce uses #2 to get List[Channel]. It checks them against #3 and #1 to see if a COMM Event can be created. If a COMM Event can't be created, the produce is stored in #3.
It needs to hold the locks for each Channel for the duration of a single check. So storage doesn't just transition on COMM Events; it transitions on every send and receive.
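The three parts and the produce/consume data flow described above can be sketched as a runnable toy. This is an illustration under heavy assumptions: in-memory maps stand in for LMDB, channels/patterns/data are Strings, matching is trivial equality, a continuation is a plain String label rather than executable code, and locking is omitted. None of these names are the actual API.

```scala
// Toy model of the 3-part store: #1 continuations+patterns, #2 join index, #3 data.
object TupleSpace {
  type Channel      = String
  type Pattern      = String // matched against data by simple equality
  type Datum        = String
  type Continuation = String // stand-in label for a real continuation K

  // Part 1: continuations with their patterns, stored at the key List[Channel].
  private val conts =
    scala.collection.mutable.Map.empty[List[Channel], List[(List[Pattern], Continuation)]]
  // Part 2: a mapping from each Channel to the joined keys it appears in.
  private val joins =
    scala.collection.mutable.Map.empty[Channel, List[List[Channel]]]
  // Part 3: waiting data, stored at the key Channel.
  private val data =
    scala.collection.mutable.Map.empty[Channel, List[Datum]]

  private def isMatch(p: Pattern, d: Datum): Boolean = p == d

  // For each (channel, pattern) pair, look for a matching datum in part 3.
  private def findData(pairs: List[(Channel, Pattern)]): Option[List[Datum]] = {
    val found = pairs.map { case (c, p) => data.getOrElse(c, Nil).find(isMatch(p, _)) }
    if (found.forall(_.isDefined)) Some(found.map(_.get)) else None
  }

  private def removeData(pairs: List[(Channel, Datum)]): Unit =
    pairs.foreach { case (c, d) => data(c) = data.getOrElse(c, Nil).diff(List(d)) }

  /** A consume checks #3; if no COMM Event can fire, it is stored in #1 and #2. */
  def consume(channels: List[Channel],
              patterns: List[Pattern],
              k: Continuation): Option[(Continuation, List[Datum])] =
    findData(channels.zip(patterns)) match {
      case Some(found) =>
        removeData(channels.zip(found))
        Some((k, found))
      case None =>
        conts(channels) = (patterns, k) :: conts.getOrElse(channels, Nil)
        channels.foreach(c => joins(c) = channels :: joins.getOrElse(c, Nil))
        None
    }

  /** A produce uses #2 to find joins, then checks #1 and #3;
    * if no COMM Event can fire, the datum stays in #3. */
  def produce(channel: Channel, datum: Datum): Option[(Continuation, List[Datum])] = {
    data(channel) = datum :: data.getOrElse(channel, Nil) // tentatively store
    val candidates = for {
      join          <- joins.getOrElse(channel, Nil)
      (patterns, k) <- conts.getOrElse(join, Nil)
      found         <- findData(join.zip(patterns))
    } yield (join, patterns, k, found)
    candidates.headOption.map { case (join, patterns, k, found) =>
      removeData(join.zip(found))
      conts(join) = conts.getOrElse(join, Nil).diff(List((patterns, k)))
      (k, found)
    }
  }
}
```

For example, a consume on the joined channels ("a", "b") returns None and is stored; a later produce on "a" alone still returns None because the join is not yet satisfiable; a produce on "b" then completes the join and returns the stored continuation with both matched data.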
From the perspective of a generalized storage layer, the produce and consume functions return some object of type K (see below). When the storage layer is used in the Rholang interpreter, this K will be a Continuation, which the interpreter can then call, producing the aforementioned COMM Event.
Regarding #3, what is the type signature of a "waiting write"?
Look up key in (2), returning a list of Channels
Check if data matching each pattern is available on each channel
  yes: Remove it from table and pass it to continuation
  no:  Produce joined channel
       Store continuation and patterns at joined channel
       add joined channel -> [[channel pattern]] mapping
         (Not necessary if we store the joined channel separated below)
       add channel -> joined channel mapping
What is the "joined channel" that we are supposed to produce?
Former user (Deleted) says "Just a tuple of channels." Kyle Butt agrees.
Could we clarify the last two lines? It's not clear to me what the "joined channel separated below" is.
Former user (Deleted) says "Because we store the joined channel as a tuple, we can just look it up as a tuple. In Rosette you can't store tuples as keys to table/dictionary and hence the hack.
See https://gist.github.com/KentShikama/8b1c83a88e0a87785c3c70fb3605a3fb for an implementation without the last two lines necessary."
for each joined channel in (2)
  for each channel in joined channel
    if channel == send.chan and pattern matches data
      save send.data
    elif channel has data that matches the pattern
      save send.data
    else
      this joined channel isn't viable
if any joined channel is viable
  pick 1 and eat the data
  send the data to the picked continuation
else
  store the send.data

;;; Kent provided a small correction to this:
;;;
;;; The produce side makes a quick check to see if the current data matches
;;; the pattern at the bat-channel of the joined channel. Otherwise, we'll
;;; be looping through every continuation of every joined channel. Also note
;;; that each (joined-)channel can store multiple continuations (and
;;; products/data).
Is "eat" the same as "consume"?
Former user (Deleted) says "Yes".
Kyle Butt says "No. Eating just meant removing it from the store so no other continuation gets it. It should also say 'send the data to the continuation'".
Former user (Deleted) says "If by consume it was originally meant that included also sending the data to the picked continuation, then I agree with Kyle".
Is "store" the same as "save"?
Former user (Deleted) and Kyle Butt say yes.
List any limitations or system constraints that have an impact on the design of the software. These can be any of the following:
Hardware/software environment, End User Environment, Resource availability, Standards Compliance, Interop, Interface/Protocol requirements, Data requirements, Security, Performance, Networking, Verification & validation, means to address quality goals.
"An ultra-fast, ultra-compact, crash-proof key-value embedded data store."
http://www.lmdb.tech/doc/index.html (Doxygen documentation for C version)
https://github.com/lmdbjava/lmdbjava
http://www.javadoc.io/doc/org.lmdbjava/lmdbjava/0.6.0 (javadoc)
"...are a language-neutral, platform-neutral extensible mechanism for serializing structured data."
Both Rholang objects and Blockchain blocks will be serialized to and deserialized from binary using the Protocol Buffer serialization format.
The serialized data will be stored in LMDB as the value portion of the key-value pair.
https://developers.google.com/protocol-buffers/
"...is a protocol buffer compiler plugin for Scala. It will generate Scala case classes, parsers and serializers for your protocol buffers."
Describe any design decisions and/or strategies that affect the overall organization of the system and its higher-level structures. These strategies should provide insight into the key abstractions and mechanisms used in the system architecture. Describe the reasoning employed for each decision and/or strategy (possibly referring to previously stated design goals and principles) and how any design goals or priorities were balanced or traded-off. Such decisions might concern (but are not limited to) things like the following:
This section should provide a high level overview of how the functionality and responsibilities of the system were partitioned and then assigned to subsystems or components. The main purpose here is to gain a general understanding of how and why the system was decomposed, and how the individual parts work together to provide the desired functionality. Include any diagrams, models, flowcharts, or documented scenarios. This applies only to the scope of the document, not the entire system.