(deprecated) RSpace 0.1 Specification
This is a Living Document
Introduction
Purpose
This document is intended to specify the initial release of the RChain Storage Layer, known as Storage.01.
References
RChain Architecture Docs: Storage and Query
The version of the storage layer envisioned for the Storage.01 release is a cut-down version of the layer described in this document.
It lacks the following features described in this document:
Distributed operation: Storage.01 will not operate in a distributed setting. Storage.01 is designed to work in the context of a single Rholang interpreter running alone on a single node which does not communicate with other nodes. Later releases of the RChain Storage Layer will work with multiple instances of the Rholang interpreter or RhoVM running on a single node which will be networked with other such nodes.
There are also some important differences from the storage layer described in this document:
The underlying key-value store: Instead of MongoDB, the current design of the storage layer uses LMDB. See Software Dependencies for more information.
The query mechanism: Originally, SpecialK used a Prolog/Datalog-based query mechanism, which was included in the original design for the RChain Storage Layer. However, this feature will not be available in the Storage.01 release. It will be replaced with a different approach to matching on keys.
Storage Specification & Design - Deprecated
This document describes an earlier, abandoned attempt at Storage.01.
Lmdb and Lmdbjava Usage Recommendations
This document contains relevant information about LMDB.
Old, somewhat weird solution to each key having multiple pieces of data
Definitions
JAR: Java ARchive
"A package file format typically used to aggregate many Java class files and associated metadata and resources (text, images, etc.) into one file for distribution." (Wikipedia)
COMM Event
From RChain Architecture Docs: Execution Model
RhoVM operates against a key-value data store. A state change of RhoVM is realized by an operation that modifies which key maps to what value. Since, like Rholang, RhoVM is derived from the rho-calculus model of computation, that operation is the low-level rho-calculus reduction rule. Effectively, the reduction rule, known as the “COMM” rule, is a substitution that defines a computation P to be performed if a new value is observed at a key. A key is analogous to a name in that it references a location in memory which holds the value being substituted.
BlockDAG
A variant on a blockchain, where each block is a node in a Directed Acyclic Graph (DAG). In order to accommodate the CASPER consensus model, RChain's blockchain is a BlockDAG. In future versions of the storage layer, a history of states and means for rolling back to them will be informed by this design.
Type class
From the Cats Documentation
Type classes are a powerful tool used in functional programming to enable ad-hoc polymorphism, more commonly known as overloading. Where many object-oriented languages leverage subtyping for polymorphic code, functional programming tends towards a combination of parametric polymorphism (think type parameters, like Java generics) and ad-hoc polymorphism.
Original paper demonstrating how to encode type classes in Scala: https://ropas.snu.ac.kr/~bruno/papers/TypeClasses.pdf
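As a minimal illustration of the encoding described above (all names here are invented for this example and are not part of the RChain codebase), a type class in Scala is a trait parameterized over a type, with instances supplied implicitly:

```scala
// A minimal type class example: Show converts values to a String.
// All names here are illustrative, not part of the RChain codebase.
trait Show[A] {
  def show(a: A): String
}

object Show {
  // Instance for Int
  implicit val intShow: Show[Int] = new Show[Int] {
    def show(a: Int): String = a.toString
  }
  // Instance for Boolean
  implicit val boolShow: Show[Boolean] = new Show[Boolean] {
    def show(a: Boolean): String = if (a) "yes" else "no"
  }
  // A function polymorphic over any type that has a Show instance:
  // this is the ad-hoc polymorphism described above.
  def describe[A](a: A)(implicit ev: Show[A]): String = ev.show(a)
}
```

Calling Show.describe(42) or Show.describe(true) resolves the appropriate instance at compile time, without subtyping.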
Scope
As stated above, this document is only intended to describe the minimal initial version of the storage layer, known as Storage.01.
Storage.01 will implement the basic tuplespace functionality described below for use in the Rholang Interpreter.
Storage.01
will not have history, so will not be a suitable candidate upon which to build a BlockDAG.
will not operate in a distributed mode
will not implement the "system processes" feature, which allows a client to run arbitrary Scala code keyed to a particular channel.
- 1 Introduction
- 1.1 Purpose
- 1.2 References
- 1.3 Definitions
- 2 Scope
- 3 Use Cases
- 4 Design Considerations
- 5 Interfaces
- 5.1 Software Interface
- 5.2 User Interface
- 5.2.1 Generic/Extensible API
- 5.2.2 Core Actions
- 5.2.3 Finishing up
- 6 System Overview/Architecture
- 6.1 Prototype
- 6.2 API (Sketch)
- 6.2.2 Serialize Type Class
- 6.2.2.1 Example Instance
- 6.2.3 Match Type Class
- 6.2.3.1 Match Type Class
- 6.2.3.2 Example Instance
- 6.3 More Notes from @Kyle Butt
- 6.3.1 Consume
- 6.3.2 Produce
- 6.3.3 Discussion
- 6.3.4 Consume, in Pseudocode
- 6.3.4.1 Consume
- 6.3.5 Discussion
- 6.3.6 Produce, in Pseudocode
- 6.3.6.1 Produce
- 6.3.7 Discussion
- 7 Limitations
- 8 Assumptions and Dependencies
- 8.1 Software Dependencies
- 8.1.1 Lightning Memory-mapped Database
- 8.1.1.1 General Information:
- 8.1.1.2 JNR-FFI Wrapper Library for use from Scala
- 8.1.2 Protocol Buffers (serialization format)
- 8.1.2.1 ScalaPB
- 9 Architectural Strategies
- 10 System Architecture
Use Cases
The Storage Layer will be used as library code in an existing Scala codebase.
The Storage.01 release of the Storage Layer is envisioned to cover the following use cases:
It should allow the Rholang interpreter to persist and retrieve objects related to its execution state. It WILL NOT execute those objects.
It should be general enough for use as a storage layer outside of the context of the Rholang interpreter. This will be delivered as a self-contained Scala library, targeting Scala 2.11.x, 2.12.x, and eventually 2.13.x.
Traceability Matrix
Design Considerations
Here are a couple of issues which need to be addressed or resolved before attempting to devise a complete design:
How do we support keys longer than the MDB_MAXKEYSIZE?
By default, LMDB is built with a maximum key size of 511 bytes.
We should generate the cryptographic hash (with a suitably fast hashing algorithm like BLAKE2b) of the serialized bytes of a Normalized Rholang Term. This hash is stored as the key in two (or more) key-value pairs in separate LMDB sub-databases.
One pair is (key: hash, value: actual key).
The other pair is (key: hash, value: actual value).
An important thing to note is that this solution relies on the assumption that the serialization process is deterministic. That is, it always returns the same bytes, regardless of the version of the serialization library.
Early (somewhat outdated) sketch contained here: https://github.com/henrytill/rchain/commit/d662006cae5ce458cffd0be622a28971aec51366
This commit sketches out an approach to handling keys that are longer than MDB_MAXKEYSIZE (511 bytes). It creates two separate sub-databases, one for keys and one for values. When a key-value pair is to be added, we generate a cryptographic hash of the key using the BLAKE2b algorithm. The hash becomes the pair's key in both the keys database and the values database. We store the original key as the value in the keys database. We store the original value as the value in the values database.
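The commit's approach can be sketched in Scala as below. This is a simplified model: two in-memory maps stand in for the LMDB sub-databases, and SHA-256 (available in the JDK) stands in for BLAKE2b, which would require an external library. All names are illustrative.

```scala
import java.security.MessageDigest

// Two in-memory maps stand in for the LMDB "keys" and "values" sub-databases.
// Byte arrays use reference equality in Scala, so the maps are keyed by an
// immutable Seq[Byte], which compares by value.
object HashedStore {
  private val keysDb   = scala.collection.mutable.Map.empty[Seq[Byte], Array[Byte]]
  private val valuesDb = scala.collection.mutable.Map.empty[Seq[Byte], Array[Byte]]

  // SHA-256 used here as a stand-in for BLAKE2b.
  private def hash(bytes: Array[Byte]): Seq[Byte] =
    MessageDigest.getInstance("SHA-256").digest(bytes).toSeq

  def put(key: Array[Byte], value: Array[Byte]): Unit = {
    val h = hash(key)    // fixed-size digest, always under MDB_MAXKEYSIZE
    keysDb(h) = key      // one pair: (hash -> original key)
    valuesDb(h) = value  // other pair: (hash -> original value)
  }

  def get(key: Array[Byte]): Option[Array[Byte]] =
    valuesDb.get(hash(key))
}
```

Because the digest has a fixed size, arbitrarily long keys never exceed the 511-byte limit, and the original key remains recoverable from the keys sub-database.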
Matching algorithm
In discussions about the functionality of produce and consume, an important piece is the matching algorithm used to check if a channel matches against a certain pattern.
How is this check done?
@Former user (Deleted) said "Run it through a matching algorithm - this needs to be written explicitly."
@Kyle Butt said the following:
We can gradually improve the matching algorithm. It's the same algorithm used for matching in the interpreter. Let's start with lists, ground types, and variables. That's what the RBL backend currently supports, and it's easy to do that much.
The matching algorithm is very similar to the spatial type checking algorithm found here. The basic idea is that true in the model checker corresponds to a free variable in the pattern. It should be easier to do some of the matching because we have things binned. Not a lot easier, but let's take what we can get.
Matching may be expensive. We should account for it in the cost modeling.
Serialization
@Michael Stay (Unlicensed) has suggested that we use Protocol Buffers to define the data types common to Rholang and the Storage Layer. Protocol Buffers offer a straightforward means of binary serialization. The serialized Rholang objects will be stored in LMDB.
See the following tickets for a description of the planned work:
https://rchain.atlassian.net/browse/CORE-174
https://rchain.atlassian.net/browse/CORE-176
See System Dependencies for more information about Protocol Buffers.
Interfaces
Software Interface
Storage.01 will be programmed in Scala as a Scala library.
For the Rholang Interpreter, the class files will be linked transparently as part of the sbt build process, and included in a packaged Node.
For users outside of the RChain project, this will be delivered as a self-contained JAR, targeting Scala 2.11.x, 2.12.x, and eventually 2.13.x
User Interface
(From the standpoint of a user writing Scala code)
After importing the coop.rchain.storage package, a user will instantiate a Storage object with the desired configuration parameters, such as the location of the database file and the size of the underlying key-value map.
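The instantiation step might look roughly like the following. The configuration case class, constructor, and parameter names here are assumptions for illustration; the actual coop.rchain.storage API may differ:

```scala
// Hypothetical configuration and handle; names are illustrative only,
// not the actual coop.rchain.storage API.
final case class StorageConfig(dbPath: String, mapSizeBytes: Long)

class Storage(val config: StorageConfig) {
  // In the real library this would open an LMDB environment at config.dbPath
  // with a memory map of config.mapSizeBytes bytes.
  def close(): Unit = ()
}
```

A user would then construct the object once and reuse it for all subsequent produce and consume calls.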
Generic/Extensible API
Storage.01 is designed to be agnostic to the underlying data types that it persists. It does this by providing a generic, type class-based API. For more information about type classes, see above.
In order to use the library, a user will need to provide instances for two type classes: Serialize and Match.
Serialize
Instances of the Serialize type class are used to serialize and deserialize the data types that the user wishes to persist in the storage layer.
Essentially, this means providing definitions for two functions:
encode, which converts a given object to an Array[Byte]
decode, which converts a given Array[Byte] back to that object
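A sketch of what such a type class might look like, with a String instance for illustration (the actual signatures in the library may differ, e.g. in how decoding failure is reported):

```scala
// Sketch of a Serialize type class; actual library signatures may differ.
trait Serialize[A] {
  def encode(a: A): Array[Byte]
  def decode(bytes: Array[Byte]): Either[Throwable, A]
}

object Serialize {
  // Example instance for String, using UTF-8 as the byte encoding
  implicit val stringSerialize: Serialize[String] = new Serialize[String] {
    def encode(a: String): Array[Byte] = a.getBytes("UTF-8")
    def decode(bytes: Array[Byte]): Either[Throwable, String] =
      Right(new String(bytes, "UTF-8"))
  }
}
```

Any type with such an instance in implicit scope can then be persisted without the storage layer knowing anything about its structure.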
Match
Instances of the Match type class are used for matching a user-determined pattern type against the keys in the storage layer.
Essentially this means providing a definition for one function:
isMatch, which takes a value of some type A representing patterns and a value of some type B representing data and determines whether they match, returning a Boolean value.
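A sketch of what such a type class might look like. The Pattern type here (a wildcard or an exact value) is invented for this example; the actual library's pattern types and signatures may differ:

```scala
// Sketch of a Match type class; actual library signatures may differ.
trait Match[P, A] {
  def isMatch(pattern: P, data: A): Boolean
}

// Illustrative pattern type: either a wildcard (matches anything)
// or an exact String to compare against.
sealed trait Pattern
case object Wildcard extends Pattern
final case class Exact(s: String) extends Pattern

object Pattern {
  implicit val stringMatch: Match[Pattern, String] = new Match[Pattern, String] {
    def isMatch(pattern: Pattern, data: String): Boolean = pattern match {
      case Wildcard => true   // a wildcard matches any datum
      case Exact(s) => s == data
    }
  }
}
```

Richer pattern types (e.g. covering Rholang's ADTs) would follow the same shape, with the matching logic living entirely in the instance.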
Rholang
In the case of the Rholang interpreter, these are the types that form Rholang's core Abstract Data Types (ADTs). We will provide instances of Serialize and Match for these data types.
Core Actions
The user will then use two functions to interact with the Storage object, produce and consume.
These functions are presented below in the API Section.
Finishing up
When a user has completed their interactions with the Storage object, they will call a close function to clean up its resources.
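Putting the pieces together, a user's session might look like the sketch below. The store here is a trivial in-memory stand-in, and all signatures (produce, consume, close) are assumptions for illustration, since the final API is not fixed by this document:

```scala
// Toy in-memory tuplespace illustrating the produce/consume/close lifecycle.
// Signatures are illustrative; the real Storage.01 API may differ.
class ToyStorage {
  private val data = scala.collection.mutable.Map.empty[String, List[String]]

  // produce: place a piece of data on a channel
  def produce(channel: String, datum: String): Unit =
    data(channel) = datum :: data.getOrElse(channel, Nil)

  // consume: remove and return a datum on a matching channel.
  // Here the "pattern" is simply an exact channel name.
  def consume(channel: String): Option[String] =
    data.get(channel).flatMap {
      case d :: rest => data(channel) = rest; Some(d)
      case Nil       => None
    }

  // close: release the store's resources when interaction is complete
  def close(): Unit = data.clear()
}
```

The real implementation would be backed by LMDB and would match against patterns via the Match type class rather than exact channel names, but the lifecycle (construct, produce/consume, close) is the same.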