(deprecated) RSpace 0.1 Specification
Use Cases
The Storage Layer will be used as library code in an existing Scala codebase.
The Storage.01 release of the Storage Layer is envisioned to cover the following use cases:
- It should allow the Rholang interpreter to persist and retrieve objects related to its execution state. It WILL NOT execute those objects.
- It should be general enough for use as a storage layer outside of the context of the Rholang interpreter. It will be delivered as a self-contained Scala library, targeting Scala 2.11.x, 2.12.x, and eventually 2.13.x.
Traceability Matrix
Design Considerations
Here are a couple of issues which need to be addressed or resolved before attempting to devise a complete design:
How do we support keys longer than MDB_MAXKEYSIZE?
By default, LMDB is built with a maximum key size of 511 bytes.
We should generate the cryptographic hash (with a suitably fast hashing algorithm like BLAKE2b) of the serialized bytes of a Normalized Rholang Term. This hash is stored as the key in two (or more) key-value pairs in separate LMDB sub-databases.
One pair is (key: hash, value: actual key).
The other pair is (key: hash, value: actual value).
An important thing to note is that this solution relies on the assumption that the serialization process is deterministic. That is, it always returns the same bytes, regardless of the version of the serialization library.
Early (somewhat outdated) sketch contained here: https://github.com/henrytill/rchain/commit/d662006cae5ce458cffd0be622a28971aec51366
This commit sketches out an approach to handling keys that are longer than MDB_MAXKEYSIZE (511 bytes). It creates two separate sub-databases, one for keys and one for values. When a key-value pair is to be added, we generate a cryptographic hash of the key using the BLAKE2b algorithm. The hash becomes the pair's key in both the keys database and the values database. We store the original key as the value in the keys database. We store the original value as the value in the values database.
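A minimal sketch of this scheme, with SHA-256 from the JDK standing in for BLAKE2b (which is not in the Java standard library) and two in-memory maps standing in for the two LMDB sub-databases:

```scala
import java.security.MessageDigest
import scala.collection.mutable

// NOTE: a sketch only. SHA-256 stands in for BLAKE2b, and two mutable
// maps stand in for the two LMDB sub-databases. Byte arrays are wrapped
// in Seq[Byte] so that map lookups use value equality.
object HashedKeyStore {
  private val keysDb   = mutable.Map.empty[Seq[Byte], Array[Byte]]
  private val valuesDb = mutable.Map.empty[Seq[Byte], Array[Byte]]

  private def hash(bytes: Array[Byte]): Seq[Byte] =
    MessageDigest.getInstance("SHA-256").digest(bytes).toSeq

  def put(key: Array[Byte], value: Array[Byte]): Unit = {
    val h = hash(key)
    keysDb(h) = key     // one pair: (key: hash, value: actual key)
    valuesDb(h) = value // the other: (key: hash, value: actual value)
  }

  def get(key: Array[Byte]): Option[Array[Byte]] =
    valuesDb.get(hash(key))
}
```

Since lookups always go through the fixed-size hash, the original key can be arbitrarily long while every key actually stored stays well under the 511-byte limit.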
Matching algorithm
In discussions about the functionality of produce and consume, an important piece is the matching algorithm used to check if a channel matches against a certain pattern.
How is this check done?
Former user (Deleted) said "Run it through a matching algorithm - this needs to be written explicitly."
Kyle Butt said the following:
We can gradually improve the matching algorithm. It's the same algorithm used for match in the interpreter. Let's start with lists, ground types, and variables. That's what the RBL backend currently supports, and it's easy to do that much.
The matching algorithm is very similar to the spatial type checking algorithm found here. The basic idea is that true in the model checker corresponds to a free variable in the pattern. It should be easier to do some of the matching because we have things binned. Not a lot easier, but let's take what we can get.
Matching may be expensive. We should account for it in the cost modeling.
Serialization
Michael Stay (Unlicensed) has suggested that we use Protocol Buffers to define the data types common to Rholang and the Storage Layer. Protocol Buffers offer a straightforward means of binary serialization. The serialized Rholang objects will be stored in LMDB.
See the following tickets for a description of the planned work:
- CORE-174
- CORE-176
See System Dependencies for more information about Protocol Buffers.
Interfaces
Software Interface
Storage.01 will be programmed in Scala as a Scala library.
For the Rholang Interpreter, the class files will be linked transparently as part of the sbt build process, and included in a packaged Node.
For users outside of the RChain project, this will be delivered as a self-contained JAR, targeting Scala 2.11.x, 2.12.x, and eventually 2.13.x.
User Interface
(From the standpoint of a user writing Scala code)
After importing the coop.rchain.storage package, a user will instantiate a Storage object with the desired configuration parameters, such as the location of the database file and the size of the underlying key-value map.
Generic/Extensible API
Storage.01 is designed to be agnostic to the underlying data types that it persists. It does this by providing a generic, type class-based API. For more information about type classes, see above.
In order to use the library, a user will need to provide instances for two type classes: Serialize and Match.
Serialize
Instances of the Serialize type class are used to serialize and deserialize the data types that the user wishes to persist in the storage layer.
Essentially, this means providing definitions for two functions:
- encode, which converts a given object to Array[Byte]
- decode, which converts a given Array[Byte] to that object
Match
Instances of the Match type class are used for matching a user-determined pattern type against the keys in the storage layer.
Essentially, this means providing a definition for one function:
- isMatch, which takes a value of some type P representing patterns and a value of some type A representing data, and determines whether they match.
Rholang
In the case of the Rholang interpreter, these are the types that form Rholang's core Abstract Data Types (ADTs). We will provide instances of Serialize and Match for these data types.
Core Actions
The user will then use two functions to interact with the Storage object: produce and consume.
These functions are presented in the API (Sketch) section below.
Finishing up
When a user has completed their interactions with the Storage object, they will call a close function to clean up its resources.
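As an illustration of this lifecycle, here is a toy, purely in-memory stand-in; the constructor parameters and method names are assumptions for the sake of example, not the actual Storage.01 API:

```scala
import scala.collection.mutable

// Toy stand-in for the planned Storage object. The configuration
// parameters (database path, map size) mirror the prose above, but all
// names here are illustrative assumptions only.
class ToyStorage(dbPath: String, mapSizeBytes: Long) {
  private val table = mutable.Map.empty[String, String]
  private var open  = true

  def put(key: String, value: String): Unit = {
    require(open, "storage is closed")
    table(key) = value
  }

  def get(key: String): Option[String] = {
    require(open, "storage is closed")
    table.get(key)
  }

  // Mirrors the close function described above: releases resources.
  def close(): Unit = {
    open = false
    table.clear()
  }
}
```

A user would instantiate it with the desired configuration, interact with it, and finally call close.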
System Overview/Architecture
Prototype
A Scala prototype based on Former user (Deleted)'s Python version is here:
https://github.com/henrytill/mirror-world
API (Sketch)
// https://github.com/rchain/rchain/blob/2d65ab3e281fd88598159a9ed4e6208c51de16f1/rholang/src/main/scala/coop/rchain/rholang/interpreter/rholangADT.scala

sealed trait Channel
case class Quote(p: Par) extends Channel
case class ChanVar(cvar: Var) extends Channel

case class Send(chan: Channel, data: List[Par])

type Pattern = List[Channel]
type Source  = Channel
case class Receive(binds: List[(Pattern, Source)])

// and one new type:
case class Continuation(par: Par, env: Map[Int, Par])
/**
  * Interface for the main storage object
  *
  * @tparam C The type of channels
  * @tparam P The type of patterns
  * @tparam A The type of data (aka "waiting writes")
  * @tparam K The type of objects (aka "continuations")
  */
trait IStore[C, P, A, K]
/**
  * Actions on a `Storage`
  *
  * In Rholang:
  *   C = Channel
  *   P = Pattern (where Pattern = List[Channel])
  *   A = List[Quote]
  *   K = Continuation (where Continuation = ?)
  *
  * A storage layer for the Rholang interpreter will provide `produce` and
  * `consume` functions specialized to:
  *
  *   consume(storage: IStore[Channel, List[Channel], List[Quote], Continuation],
  *           key: List[Channel],
  *           patterns: List[List[Channel]],
  *           k: Continuation): Option[(K, List[List[Quote]])]
  *
  *   produce(storage: IStore[Channel, List[Channel], List[Quote], Continuation],
  *           key: Channel,
  *           data: List[Quote]): Option[(K, List[List[Quote]])]
  */
trait StorageActions {

  /** `consume` does the "consume" thing
    *
    * @param store
    * @param channels
    * @param patterns
    * @param continuation
    * @tparam C a type representing a channel
    * @tparam P a type representing a pattern
    * @tparam A a type representing a piece of data
    * @tparam K a type representing a continuation
    */
  def consume[C, P, A, K](store: IStore[C, P, A, K],
                          channels: List[C],
                          patterns: List[P],
                          continuation: K)(implicit m: Match[P, A]): Option[(K, List[A])]

  /** `produce` does the "produce" thing
    *
    * @param store
    * @param channel
    * @param data
    * @tparam C a type representing a channel
    * @tparam P a type representing a pattern
    * @tparam A a type representing a piece of data
    * @tparam K a type representing a continuation
    */
  def produce[C, P, A, K](store: IStore[C, P, A, K],
                          channel: C,
                          data: A)(implicit m: Match[P, A]): Option[(K, List[A])]
}
Serialize Type Class
trait Serialize[A] {
  def encode(a: A): Array[Byte]
  def decode(bytes: Array[Byte]): Either[SerializeError, A]
}
Example Instance
Let's say an external user wants to persist Strings as data (in other words, in the above code, String is substituted for type A).
They would write an instance of Serialize for String like this:
import java.nio.charset.StandardCharsets

import cats.syntax.either._ // for Either.catchNonFatal and leftMap

implicit object stringInstance extends Serialize[String] {

  def encode(s: String): Array[Byte] =
    s.getBytes(StandardCharsets.UTF_8)

  def decode(bytes: Array[Byte]): Either[SerializeError, String] =
    Either
      .catchNonFatal(new String(bytes, StandardCharsets.UTF_8))
      .leftMap(SerializeError.apply)
}
They would then bring this instance into the scope where they are calling produce and consume.
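A self-contained, runnable sketch of how instance resolution works at a call site. Plain Either is used here instead of cats so the snippet stands alone, and roundTrip is a hypothetical helper, not part of the library:

```scala
import java.nio.charset.StandardCharsets

// The Serialize trait and error type from the API sketch above,
// repeated so this snippet stands alone.
case class SerializeError(cause: Throwable)

trait Serialize[A] {
  def encode(a: A): Array[Byte]
  def decode(bytes: Array[Byte]): Either[SerializeError, A]
}

// Instance for String, using plain Either instead of cats syntax so no
// external dependency is needed.
implicit object stringInstance extends Serialize[String] {
  def encode(s: String): Array[Byte] = s.getBytes(StandardCharsets.UTF_8)
  def decode(bytes: Array[Byte]): Either[SerializeError, String] =
    try Right(new String(bytes, StandardCharsets.UTF_8))
    catch { case e: Exception => Left(SerializeError(e)) }
}

// A hypothetical helper: any function with an implicit Serialize
// parameter (as produce and consume would have) picks the instance up
// from implicit scope automatically.
def roundTrip[A](a: A)(implicit s: Serialize[A]): Either[SerializeError, A] =
  s.decode(s.encode(a))
```

With stringInstance in scope, roundTrip("...") resolves the instance without the caller passing it explicitly.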
Match Type Class
/**
  * Type class for matching patterns with data.
  *
  * In Rholang, `P` is `List[Channel]`, and `A` is `List[Quote]`.
  *
  * @tparam P The type of Patterns
  * @tparam A The type of Data
  */
trait Match[P, A] {
  def isMatch(p: P, a: A): Option[List[A]]
}
Example Instance
Let's say an external user wants to use String as the type of patterns (in other words, in the above code, String is substituted for type P), and String as the type of data (String is substituted for type A). They have also decided that they only want a trivial implementation of matching that tests String equality.
They would write an instance of Match like this:
implicit object stringMatcher extends Match[String, String] {
  def isMatch(p: String, a: String): Option[List[String]] =
    if (p == a) Some(List(a)) else None
}
They would then bring this instance into the scope where they are calling produce and consume.
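To show how such an instance is consumed, here is a self-contained sketch (repeating the trait so it runs standalone) of a function that, like consume in the API sketch, takes the matcher implicitly; tryMatch is a hypothetical helper, not part of the library:

```scala
// The Match trait from above, repeated so this snippet is self-contained.
trait Match[P, A] {
  def isMatch(p: P, a: A): Option[List[A]]
}

// Equality-based instance for String patterns and String data: a
// successful match returns the matched data, as the trait requires.
implicit object stringMatcher extends Match[String, String] {
  def isMatch(p: String, a: String): Option[List[String]] =
    if (p == a) Some(List(a)) else None
}

// Like `consume`, this function requires a Match instance implicitly.
def tryMatch[P, A](p: P, a: A)(implicit m: Match[P, A]): Option[List[A]] =
  m.isMatch(p, a)
```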
More Notes from Kyle Butt
(This is essentially the Rholang conceptual model for using the tuplespace. In a generalized implementation, the details will vary).
3 parts:
1. The list of waiting continuations, along with their List[(Channel, Pattern)]. Stored at the key List[Channel].
2. A mapping from Channel → List[Channel].
3. Data of type Seq[Par], stored at the key Channel.
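These three parts could be sketched as three maps. The element types here are simplified to String stand-ins for illustration; the real types are the ones given in the list above:

```scala
import scala.collection.mutable

object ToyTables {
  // Illustrative stand-ins for the real Rholang types.
  type Channel      = String
  type Pattern      = String
  type Continuation = String
  type Datum        = String

  // 1. Waiting continuations with their (Channel, Pattern) pairs,
  //    stored at the key List[Channel].
  val continuations =
    mutable.Map.empty[List[Channel], List[(List[(Channel, Pattern)], Continuation)]]

  // 2. A mapping from a Channel to the joined channel(s) it participates
  //    in (the produce pseudocode below iterates over several joins).
  val joins = mutable.Map.empty[Channel, List[List[Channel]]]

  // 3. Data stored at the key Channel.
  val data = mutable.Map.empty[Channel, List[Datum]]
}
```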
Consume
A consume checks #3 to see if a COMM Event can be produced.
If not, it stores the relevant info in #1 and #2.
Produce
produce uses #2 to get a List[Channel].
It checks them against #3 and #1 to see if a COMM Event can be created.
If a COMM Event can't be created, the produce is stored in #3.
It needs to hold the locks for each Channel for the duration of a single check*.
So storage doesn't just transition on COMM Events; it transitions on every send and receive.
- Kyle Butt will have to think about the ships in the night problem and locking.
Discussion
From the perspective of a generalized storage layer, the produce or consume functions return some object of type K (see below). When the storage layer is used in the Rholang interpreter, this K will be a Continuation, which the interpreter can then call, producing the aforementioned COMM Event.
Regarding #3, what is the type signature of a "waiting write"?
- Former user (Deleted) says, "If I understand this correctly it is just arbitrary data at a channel."
- Kyle Butt clarifies: For rholang, it is a name. (Which may be a quoted process). The name gets substituted upon send, so we either explicitly do the substitution, or pass an environment.
Consume, in Pseudocode
Look up key in (2), returning a list of Channels
Check if data matching each pattern is available on each channel
  yes: Remove it from table and pass it to continuation
  no:  Produce joined channel
       Store continuation and patterns at joined channel
       add joined channel -> [[channel pattern]] mapping
         (Not necessary if we store the joined channel separated below)
       add channel -> joined channel mapping
Discussion
What is the "joined channel" that we are supposed to produce?
Former user (Deleted) says "Just a tuple of channels." Kyle Butt agrees.
Could we clarify the last two lines? It's not clear to me what the "joined channel separated below" is.
Former user (Deleted) says "Because we store the joined channel as a tuple, we can just look it up as a tuple. In Rosette you can't store tuples as keys to table/dictionary and hence the hack.
See https://gist.github.com/KentShikama/8b1c83a88e0a87785c3c70fb3605a3fb for an implementation without the last two lines necessary."
Produce, in Pseudocode
for each joined channel in (2)
  for each channel in joined channel
    if channel == send.chan and pattern matches data
      save send.data
    elif channel has data that matches the pattern
      save send.data
    else
      this joined channel isn't viable
if any joined channel is viable
  pick 1 and eat the data
  send the data to the picked continuation
else
  store the send.data

;;; Kent provided a small correction to this:
;;;
;;; The produce side makes a quick check to see if the current data matches
;;; the pattern at the bat-channel of the joined channel. Otherwise, we'll
;;; be looping through every continuation of every joined channel. Also note
;;; that each (joined-)channel can store multiple continuations (and
;;; products/data).
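The two routines above can be sketched as a deliberately simplified, runnable toy: single channels only (no joins), equality matching, and a continuation modeled as a plain function. All names are illustrative, not the Storage.01 API:

```scala
import scala.collection.mutable

// Simplified to single channels and equality patterns; a continuation is
// modeled as a function from the matched datum to a result String.
object ToySpace {
  type Channel = String
  type Pattern = String
  type Datum   = String
  type K       = Datum => String

  private val data          = mutable.Map.empty[Channel, List[Datum]].withDefaultValue(Nil)
  private val continuations = mutable.Map.empty[Channel, List[(Pattern, K)]].withDefaultValue(Nil)

  private def isMatch(p: Pattern, d: Datum): Boolean = p == d

  /** Consume: look for matching data; if none, store the continuation. */
  def consume(chan: Channel, pattern: Pattern, k: K): Option[String] =
    data(chan).find(isMatch(pattern, _)) match {
      case Some(d) =>
        data(chan) = data(chan).diff(List(d)) // "eat" the datum
        Some(k(d))                            // fire the continuation
      case None =>
        continuations(chan) = continuations(chan) :+ ((pattern, k))
        None
    }

  /** Produce: look for a waiting continuation whose pattern matches the
    * datum; if none, store the datum. */
  def produce(chan: Channel, d: Datum): Option[String] =
    continuations(chan).find { case (p, _) => isMatch(p, d) } match {
      case Some((p, k)) =>
        continuations(chan) = continuations(chan).diff(List((p, k)))
        Some(k(d))
      case None =>
        data(chan) = data(chan) :+ d
        None
    }
}
```

This captures the core symmetry: each side first tries to complete a COMM-like event against the other side's stored state, and stores its own state only when no match is found.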
Discussion
Is "eat" the same as "consume"?
Former user (Deleted) says "Yes".
Kyle Butt says "No. eating just meant removing it from the store so no other continuation gets it It should also say 'send the data to the continuation'".
Former user (Deleted) says "If by consume it was originally meant that included also sending the data to the picked continuation, then I agree with Kyle".
Is "store" the same as "save"?
Former user (Deleted) and Kyle Butt say yes.
Limitations
Assumptions and Dependencies
Software Dependencies
Lightning Memory-mapped Database
"An ultra-fast, ultra-compact, crash-proof key-value embedded data store."
General Information:
http://www.lmdb.tech/doc/index.html (Doxygen documentation for C version)
JNR-FFI Wrapper Library for use from Scala
https://github.com/lmdbjava/lmdbjava
http://www.javadoc.io/doc/org.lmdbjava/lmdbjava/0.6.0 (javadoc)
Protocol Buffers (serialization format)
"...are a language-neutral, platform-neutral extensible mechanism for serializing structured data."
Both Rholang objects and Blockchain blocks will be serialized to and deserialized from binary using the Protocol Buffer serialization format.
The serialized data will be stored in LMDB as the value portion of the key-value pair.
https://developers.google.com/protocol-buffers/
ScalaPB
"...is a protocol buffer compiler plugin for Scala. It will generate Scala case classes, parsers and serializers for your protocol buffers."
Architectural Strategies
Describe any design decisions and/or strategies that affect the overall organization of the system and its higher-level structures. These strategies should provide insight into the key abstractions and mechanisms used in the system architecture. Describe the reasoning employed for each decision and/or strategy (possibly referring to previously stated design goals and principles) and how any design goals or priorities were balanced or traded-off. Such decisions might concern (but are not limited to) things like the following:
- Use of a particular type of product (programming language, database, library, etc. ...)
- Reuse of existing software components to implement various parts/features of the system
- Future plans for extending or enhancing the software
- User interface paradigms (or system input and output models)
- Hardware and/or software interface paradigms
- Error detection and recovery
- Memory management policies
- External databases and/or data storage management and persistence
- Distributed data or control over a network
- Generalized approaches to control
- Concurrency and synchronization
- Communication mechanisms
- Management of other resources