Latest Message Storage Specification (Draft)
1. Introduction
1.1 Purpose of the system
The latest messages data structure represents a mapping between validator public keys and the hash of the latest block the respective validator has seen. It is used in various places by Casper protocol such as equivocation detection and score map building, but, most importantly, it is used by safety oracle to count agreement graph's edges as it is computation heavy task.
In order to count agreement graph's edges, safety oracle, provided by a list of validator candidates, checks for every pair of validators if they see an agreement with each other. This process involves looking up the latest message for the first validator and checking its justifications to find one sent by the second validator and compatible with the estimate (block to detect safety on). The same process is then done for the second validator with respect to the first validator. As you can see, the process of counting agreement graph's edges involves looking up latest messages for n^2 number of validator pairs, where n is the number of validators which can reach a few thousands.
The purpose of this document is to provide a specification of latest messages data structure storage on disk that will be efficient enough to allow safety oracle computation to do constant-time lookups in the storage.
1.2 Design goals
The design goals of the latest message storage include:
- Do constant-time lookups which happen n^2 times (n is the number of validators) every time a block is received by a node
- Do updates which happen once every time a block is received by a node
- Do inserts which happen rarely as validators pool will be fairly stable in the long run
- Use as less I/O operators as we can to support the previous three types of operations
- An ability to restore the latest messages on node startup in case of file corruption after a system crash
1.3 Definitions, acronyms, and abbreviations
1.4 References
2. Current software architecture
As for now the latest messages data structure is stored in-memory only and represented as Map[Validator, BlockMessage]
.
3. Proposed software architecture
3.1 Overview
The API for storing latest messages is represented by LatestMessagesStorage
.
trait LatestMessagesStorage[F[_]] { def access[A](fa: F[Map[Validator, BlockHash] => A]): F[A] def insert(validator: Validator, blockHash: BlockHash): F[Unit] }
The proposed implementation LatestMessagesFileStorage
of this API is designed as a combination of in-memory representation and synchronized on-disk representation implemented as an arbitrarily ordered sequence of publickey-blockhash pairs with in-memory index.
final class LatestMessagesFileStorage[F[_]: Monad: Concurrent: Sync: Log] private ( lock: MVar[F, Unit], latestMessagesRef: Ref[F, Map[Validator, BlockHash]], indexRef: Ref[F, Map[Validator, Long]], dataFileResource: Resource[F, RandomAccessFile], crcPath: Path ) extends LatestMessagesStorage[F] { def access[A](fa: F[Map[Validator, BlockHash] => A]): F[A] = for { _ <- lock.take latestMessages <- latestMessagesRef.get result <- fa.map(_(latestMessages)) _ <- lock.put(()) } yield result def insert(validator: Validator, blockHash: BlockHash): F[Unit] = for { _ <- lock.take _ <- updateFile(validator, blockHash) result <- latestMessagesRef.update(_ + (validator -> blockHash)) _ <- lock.put(()) } yield result private def updateFile(validator: Validator, blockHash: BlockHash): F[Unit] } object LatestMessagesFileStorage { final case class Config( dataPath: Path, crcPath: Path ) def apply[F[_]: Monad: Concurrent: Sync: Log](config: Config): F[LatestMessagesFileStorage[F]] }
Every instance of LatestMessagesFileStorage
can only be created by using LatestMessagesFileStorage.apply
provided by the LatestMessagesFileStorage.Config
configuration containing the path to the on-disk storage file. On apply
invocation, this file is read into memory as a List[(Long, Validator, BlockHash)]
. The first element of the tuple stands for the offset in the file, the second and third elements stand for the elements of the publickey-blockhash pair. After that, two maps are built out of the list: Map[Validator, BlockHash]
and Map[Validator, Long]
. The first map has the same semantics as the legacy in-memory representation of latest messages. The second map provided by validator public key gives back the offset of the corresponding pair in the file. We will call the first map as in-memory representation and the second map as in-memory index.
The in-memory representation and in-memory index are wrapped into cats.effect.concurrent.Ref
to represent mutable atomic references and passed as the second and third arguments to the LatestMessagesFileStorage
constructor respectively. The first argument is a cats.effect.concurrent.MVar
used as a mutex for synchronizing the storage and the fourth argument is a random access file wrapped into cats.effect.Resource
to capture the effectful allocation of the file.
Required operations are handled in the following fashion:
- Lookups can be done by acquiring
lock
and running the user provided function of typeMap[Validator, BlockHash]
on the in-memory representation. The function can do lookups in the provided map in effectively constant time.lock
is released after the function is executed. - Updates can be done by acquiring
lock
and looking up the offset in the in-memory index in effectively constant time, seeking to the corresponding part of the file and rewriting the blockhash value. In-memory representation is updated with the new blockhash value.lock
is released afterwards. - Inserts can be done by acquiring
lock
and appending the pair to the end of the file as the order of the pairs is completely arbitrary. In-memory representation is updated with the new validator-blockhash pair and in-memory index is updated with the new validator-offset pair.lock
is released afterwards.
crcPath
is used to store the CRC of the on-disk storage file. As CRC is a linear function, it is possible to recompute CRC after changing a part of the file by xoring the old value of CRC with the old part of the file and the new part of the file (i.e. newCrc = oldCrc ⊕ crc(oldPart) ⊕ crc(newPart)
). The file can be updated atomically by writing the new CRC value to a temporary file and replacing the old CRC file with a new one since renaming is an atomic operation. CRC file is used to detect corrupted data after a system crash.
3.2 Software/hardware mapping
3.2.1 Memory allocation
Both validator public key size and block has size are 52 bytes each (8 bytes for pointer + 12 bytes for array header + 32 bytes for the content), so a single pair requires 112 bytes of memory (52 * 2 + 8 as pair itself also requires a pointer). Hence, the in-memory representation requires about 33 KiB of RAM for test net bond file with 300 validators and only 1 MiB of RAM in case the amount of validators bloats up to 10000. The in-memory index will at most double the memory consumption. Hence, we do not expect that holding in-memory representation will make a significant effect on memory consumption.
3.2.2 Disk Memory
In case of on-disk representation, the actual size of a pair requires only 64 bytes as headers and pointers are not required for storing data on the disk. Hence, only 19 KiB is required for storing a file with 300 publickey-blockhash pairs and 625 KiB is required for the severe case of 10000 publickey-blockhash pairs.
Note: I think that using memory-mapped is unnecessary as we can easily fit the whole data structure into the memory. Lookups do not involve any I/O, inserts require a simple append of the pair to the disk and by using the in-memory index we can rewrite the necessary place in the file using a single seek operation and a single write operation (i.e. without actually scanning the whole file or using binary search on it).
3.3 Persistent data management
As both publickey and blockhash are invariable in size, the proposed method of storing latest messages data structure on disk is to consequently write out the publickey-blockhash pairs byte-by-byte (i.e. 32 bytes of publickey followed by 32 bytes of blockhash).
This data will be stored in the file specified by LatestMessagesFileStorage.Config.dataPath
and the CRC value will be stored in the file specified by LatestMessagesFileStorage.Config.crcPath.
The configuration for latest messages storage will be added to the existing node configuration coop.rchain.node.configuration.Configuration:
final class Configuration( val command: Command, val server: Server, val grpcServer: GrpcServer, val tls: Tls, val casper: CasperConf, val blockstorage: LMDBBlockStore.Config, val latestMessagesStorage: LatestMessagesFileStorage.Config, private val options: commandline.Options )
The default value for latestMessagesStorage
is defined by using dataDir
(which defaults to $HOME/.rnode
) as follows:
val latestMessagesStorage = LatestMessagesFileStorage.Config( dataDir.resolve("casper-latest-messages-file-storage"), dataDir.resolve("casper-latest-messages-crc") )
The default values can be redefined by adding the following command-line options to coop.rchain.node.configuration.commandline.Options
:
val casperLatestMessagesData = opt[Path](required = false, descr = "Path to latest messages data file") // --casper-latest-messages-data val casperLatestMessagesCrc = opt[Path](required = false, descr = "Path to latest messages crc file") // --casper-latest-messages-crc
LatestMessagesFileStorage
will be initialized in NodeRuntime.main
using the provided configuration. As SafetyOracle
depends on latest messages data structure, it will have to be initialized after LatestMessagesFileStorage
.
3.5 Operational aspects
3.5.1 Metrics
LatestMessagesFileStorage
will export the following metrics by reusing the existing class coop.rchain.metrics.Metrics
:
- Accessing the in-memory representation map by invoking
access
method will be reported with name"latest-messages-file-storage-access"
. - Appending a new key-value pair to the data file by invoking
insert
method will be reported with name"latest-messages-file-storage-insert"
. - Overwriting the value of an existing key-value pair in the data file by invoking
insert
method will be reported with name"latest-messages-file-storage-update"
.
By using this metrics data one could determine how often lookups/updates/inserts are performed by node.
3.5.2 Logs
LatestMessagesFileStorage
will not log anything.
3.6 Global software control
Both lookups and inserts/updates require acquiring the lock and are hence mutually exclusive. On the other hand, doing a bulk of lookups still requires taking only a single lock by using the access
method.
3.7 Boundary conditions
- On node startup on-disk storage file is read into memory. In-memory representation and in-memory index are built from the read data.
- On node shutdown no additional actions are required as the system is always fully synchronized
- In case of node error, on startup the storage file is checked for corruptions by comparing the CRC to the actual check value of the storage file. In case of a corruption, the file is rebuilt from scratch using the information from persisted DAG.