Reduce bloat - RSpace (current) analysis

Note: This page is WIP.

Data gathered

Analysis of data on transfer 2

https://docs.google.com/spreadsheets/d/1W2UgFUmEbegR7VMsnUi3rrQDAeBYyenILFHH_0YpRdQ/edit?usp=sharing

Analysis of data on transfer 3

https://docs.google.com/spreadsheets/d/1Ks75ExHoLx6cxam9DRbztREzG-FpQ6f7RkS6otcYjpI/edit?usp=sharing

Analysis of data on transfer 4

https://docs.google.com/spreadsheets/d/1BEQ4IBtftvssiLcdCh7npVuuUGW1DlL_y6031D2lMOw/edit?usp=sharing

Analysis

All the info was gathered as seen by rnode.

The code used is available here:

https://github.com/dzajkowski/rchain/tree/wip-bloat-analysis

All the gathered data is doubled since the deploying node has to run replay on the same data.

GNAT

Based on "Analysis of data on transfer 2"

A GNAT is held by Leaf (vide: trie analysis, lower)

A GNAT holds:

channels - the name used in to send or receive
data - produce, send
continuations - consume, receive

size of leaf + node + skip := (1738 + 90014 + 19134) / 2 /* - replay */ => 55443

Data and continuations size

Data cost

Transfer from Bob to Alice

A Datum holds:

source - checksums for event tracking
a - produce data

size of data := 7142 / 2 /* (- replay) */ => 3571 (100%)
size of a := 6372 / 2 => 3181 (89%)
size of source := 690 / 2 => 345 (10%)

Deeper dive into the AST

based on "analysis of data on transfer 3"

size of A in Data := 3185
size of pars := 2689 (84%)
size of top level random state := 494 (16%)

Continuations cost

Transfer from Bob to Alice

A WaitingContinuation holds:

continuation - consume data
source - checksums for event tracking
patterns - used for spatial matching

Size of continuations being shuffled during a transfer:

size of wks := 10372 / 2 /*(- replay)*/ => 5186 (100%)
size of continuations in wks := 7692 / 2 => 3946 (76%)
size of source := 1606 / 2 => 803 (15%)
size of patterns := 1106 / 2 => 503 (10%)

Deeper dive into the AST

based on "analysis of data on transfer 3"

size of wks := 5 186
size of continuation par := 2 536 (50%)
size of continuation top level random  := 1 088 (20%)
size of patterns := 553 (10%)
size of source := 804 (16%)

Trie size

Leaf cost

Leafs are cheap themselves, the data usage comes from the size of data stored.

Skip cost

size of skip := 1738 / 2 /* -replay */ => 869

Node cost

size of node:pointerblocks := 89984 / 2 /* (- replay) */ => 44992

Size on disk

LMDB

observed size on disk := 102400 / 2 /* (-replay) */ => 51200

Observations

	% of total
leaf	17%
leaf:data	6%
leaf:continuation	9%
node	81%
skip	2%

Takeaways/next steps

Work through checkpoint creation (1 step instead of 13 steps) which would reduce the amount of stored tries (less data) RCHAIN-3415 - Getting issue details... STATUS
1. estimated gain:
  1. store of top level pointerblock (4416 + 4512 + 128) instead of 44992 (80%)
  2. note: this calculation is very imprecise as the change will happen (is possible) in next rspace
2. Address EmptyPointer stored in Node pointer block. RCHAIN-3226 - Getting issue details... STATUS
  1. encode pointer type in 2 bits, not in 8 bits
  2. explore gain from storing sparse vectors
Continue analysis of rholang AST
1. we can address the source size (which is constant but it constitutes a 10% or 15% of the data stored)
2. there is no way of addressing the random state