Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

https://github.com/dzajkowski/rchain/tree/wip-bloat-analysis


All the gathered data is doubled since the deploying node has to run replay on the same data.

GNAT

Based on "Analysis of data on transfer 2"

A GNAT is held by Leaf (vide: trie analysis, lower)

A GNAT holds:

  • channels - the name used in to send or receive
  • data - produce, send
  • continuations - consume, receive

...


% of total
leaf17%
leaf:data6%

leaf:continuation

9%
node81%
skip2%


Thoughts

Thought experiment 1

Let's assume that:

  1. the system is operating at 1500 COMM/s
  2. it takes 50 COMM events to transfer X from A to B
  3. the random state is the only thing we track in history

one random state takes up 100 bytes

we are able to process 30 rev txns/s (1500 / 50)

This results in 10MB of random state per hour, 1.7GB in a week.

Thought experiment 2

Let's assume that:

  1. there are 20000 active rev vaults being used
  2. each rev vault lives on a dedicated channel
  3. the keys are evenly distributed

each key weighs 32 bytes

each node can hold up to 256 keys

To be able to track 20k unique channels it takes a trie that:

  • has a filled 0 level node (256 * 32 bytes)
  • the 1 level is able to index 65 536 keys
  • we use 20k of those keys
  • the rest of the structure needs a leaf and a skip per key - 64 bytes each

The structure that tracks 20k active accounts weighs:

  1. level 0: 8 192 
  2. level 1: 640 000
  3. skip + leaf: 1 280 000

So to track a block that has 20k rev transactions we need a total of 1 928 192 bytes on disk.

Thought experiment 3 (1 + 2)

Assumptions 1 & 2 hold.

we can have 20k rev transactions in 667 seconds, 12 hours.

Let's assume that we only create a block when we have 20k rev txns accumulated.

The lower limit of the cost on disk is: 12*10MB + 2MB = 122MB per block.

The cost of the above will go up with the frequency of block creation (only whole nodes can be shared).

Takeaways/next steps

  1. Work through checkpoint creation (1 step instead of 13 steps) which would reduce the amount of stored tries (less data) 
    Jira Legacy
    serverSystem JIRA
    serverId50130123-f232-3df4-bccb-c16e7d83cd3e
    keyRCHAIN-3415
    1. estimated gain:
      1. store of top level pointerblock (4416 + 4512 + 128) instead of 44992 (80%)
      2. note: this calculation is very imprecise as the change will happen (is possible) in next rspace
    2. Address EmptyPointer stored in Node pointer block. 
      Jira Legacy
      serverSystem JIRA
      serverId50130123-f232-3df4-bccb-c16e7d83cd3e
      keyRCHAIN-3226
      1. encode pointer type in 2 bits, not in 8 bits
      2. explore gain from storing sparse vectorsvectors https://rchain.atlassian.net/projects/RCHAIN/issues/RCHAIN-3227
  2. Continue analysis of rholang AST
    1. we can address the source size (which is constant but it constitutes a 10% or 15% of the data stored)
    2. there is no way of addressing the random state

Trie Merging in RSpace Next

Transfer from A → B, existing wallets.

The experiment was performed on a clean rnode instance with genesis and 

  • vault_demo/0.create_genesis_vault.rho
  • vault_demo/2.check_balance.rho
  • vault_demo/3.transfer_funds.rho

run to prime the data.

What is gathered in the sheet is the result of a second vault_demo/3.transfer_funds.rho run.

https://docs.google.com/spreadsheets/d/1v5MJhKyzu1WgzAQo51rHt0TNXeBOffbQ9bUgldxHeUw/edit?usp=sharing

The trie structure weighs 7020 bytes.

It's easy to observe that there is one central PointerBlock which weighs 4929 bytes.

This cost seems to be very close to the lower bound for any change to the trie. This would dictate that any actions on a block should focus on performing as much actions in memory and committing (checkpointing) as seldom as possible.

This also points to work done in 

Jira Legacy
serverSystem JIRA
serverId50130123-f232-3df4-bccb-c16e7d83cd3e
keyRCHAIN-3453

On a side note: RSpace Next does not hold a separate data bag for replay, so the data is not duplicated.

Thought

One could make a case that the highest level PointerBlocks (let's say 3 levels) will change constantly and will be heavily populated (256 * 32 bytes).

Maybe it would make sense to differentiate them from the pointer blocks that will live lower in the trie and invest in some sort of structure sharing.

But at the time of writing no such structure (that would save space) comes to mind.

At the same time if the deployments passed to trie creation are diverse enough this will become a mute point as the structure will be shared in a limited fashion.

This needs further investigation to merit any further code changes.

Data usage in RSpace Next

ticket: 

Jira Legacy
serverSystem JIRA
serverId50130123-f232-3df4-bccb-c16e7d83cd3e
keyRCHAIN-3484

data referenced:

https://docs.google.com/spreadsheets/d/1MHEWeha6zOhEV39fye-3trZlg50R13jRAjbjvgE2OSw/edit?usp=sharing

typesize
insert-continuation

5043

insert-data

3634
insert-join280
sum8957

Comparing the RSpace Next to original RSpace we can observe: 

  • the data is not doubled as replay runs of the same store
  • there is a new type of data "joins"
  • the original structure took up around 9 565 bytes (GNAT in Leaf) (times two for replay)

Diving deeper into what the data itself is we observe

Datums:

Code Block
collapsetrue
20190606-15:50:17.840;642122de191479453604d09e92f8864d6527b526861493b50126f8356adf8a49;insert-data;1167;1;[{"11112ChXfi25ByoKouxaiUKRSrVXjdQzTUoJBtkw71JVvYNL2NgMDi" : bundle+ {   Unforgeable(0x4d17a5e7e7d1098774b9b9460392d25cd9127a64ecbaec87b99cf4ef79eb5d51) }, "11112SGU8f1D7DfkS98MzadvCPzHKzuiUjukHH8c7RqnffPhVP3Ws9" : bundle+ {   Unforgeable(0x96019b21498dcd36a18e720cc8efd5847edee6ba28ff5c853e836dd95f6f2464) }, "11112pbz6HqE2Lc2T7ojH116U7LkepSewDemoPnf56voiRMpDzVyhe" : bundle+ {   Unforgeable(0xa39baf11a39a13cee1e90f6ccdbaa783b134625d96c9576ced0e74e422ca585a) }, "11112pwq4ZQ634rvTSJqZ4SmWYBWyHv98U46jF9pgmB5LHozWLzJXu" : bundle+ {   Unforgeable(0x73d27b713c2c8466686b9f285b18c2b92329851fbe0a5729b7720b58c3186269) }, "1111GcGEJX6me9vVN5cPGbFHSXepo8BvQ9r9CFcNmYSemHDvdpPZB" : bundle+ {   Unforgeable(0xf25f98897615ae7602bdecb3163bda2808200039bdf1ce1934dc852cf0393c14) }, "1111Ggp3miC6ULewtjtoCw4mB1WywmoU35QNw7z6JUfVqZdxRFxBn" : bundle+ {   Unforgeable(0xbe9f997b0d2aa987cac1b43f21d04c6928cde3c9e669b27f72d2f4ab8f1ba5ff) }, "1111hcuCGNhYcjZ2PHbyRLiZTGY9M1CwgXZDxrABADiLmoXyW2RZT" : bundle+ {   Unforgeable(0xba0e1ec9d909842334650361ec2e7b3da1f14e0058e902998bffec4ddfd8cd15) }, "1111iZYyhMm2tZtAEUevwFTyw9SgwnSCf9Vc3Qk39Gio1k9sHxWUK" : bundle+ {   Unforgeable(0xca5c1739042042c36cf1d422dc815e920810aa7a3cd62115753941357fceb67f) }, "1111sc15PERLugDRdfPHkSfYPpar8i1NcwM5hKsE64HVUG2zCF5sy" : bundle+ {   Unforgeable(0x31caa7b4f5cd32794fc65d12b9bd0a79d033a5754ececc6bf50068077a1161a2) }}]
20190606-15:50:17.840;dbbfa78ff7d7bdfc9f674a5bc78a71350f5140007630846e3e4a3c08e44e55a2;insert-data;198;1;[8600]
20190606-15:50:17.840;999f2cebea94b041378fc9838a7784d8e9b48b3280e4b62cee4769e13f935ef2;insert-data;1876;1;[{54 : (0, 36ed509b74f58de9801175ae055de06f8a371a73ea6bc3a464b7d225dca4e1, (9223372036854775807, bundle+ {   Unforgeable(0xeb58132e9b06ccc10a037ec292c37b76c39841f8b9a25dcc0ad76c592fa78b7f) })), 5a : (0, 5a7c333756a4d02bdac7b44f69b0ee2ec668ccc69fbb3b3da8b3aa92cf70d2, (9223372036854775807, bundle+ {   Unforgeable(0x0483e8263c258cb1daabdd8b5ee19ea4f34f61573ca72b1c9c5f0ce277d9e3f7) })), 5e : (0, 5eaef0e4de87174696209eebf10790716a201ef0af4f2f79c8803d6db10400, (9223372036854775807, bundle+ {   Unforgeable(0xa32c1649f1700469f125ea85f9c2a220072e593ae617213782d738e389639d3b) })), 71 : (0, 39e598adddfd02199c35ae078a7016057ef299c2cc4691d1656b8f62d18fbc, (9223372036854775807, bundle+ {   Unforgeable(0xcaad4e983dedd6c5ae42384b6d87ba6f77997cdf6bb75b8e5f3ffa2133e6b45d) })), dd : (0, ac4303f2a5943d78c94c3cec2dfc31cf19140b0ed0a35caab1808984cec8a6, (9223372036854775807, bundle+ {   Unforgeable(0x4ca8ca1cff2656b3a1f7765bf3819416ce6f9f9e44102229a2622c98cd65de22) })), f5 : (0, d8570a82c39946a013059f222f4710c9eb1ac109933a175baec50266b0e8c9, (9223372036854775807, bundle+ {   Unforgeable(0x263f45f2ca50669ef4011a57d3a10270c1250d52ed99f212a6332d8ffe57d3be) })), c5 : (0, 816250a6b94e12f6cf556146e8398b2c0d14f0586b9a7fc0b41fa9e5c358a5, (9223372036854775807, bundle+ {   Unforgeable(0x10d4824180800daef356857cffca4f51471960b92711df0fd3c3425401149e11) })), ae : (0, a4389cc059a43ce09d1d629a3d7113a33a799cb8b8f6573effa2528083cb5a, (9223372036854775807, bundle+ {   Unforgeable(0x6cd75a683533af49eb9d8cd1d71fc24577cfa6099502eef95668d7a0a9887686) })), e3 : (0, c181f245281a4c6ba407d6af04e495739c1254fd13645f4ae151fe4ee42b01, (9223372036854775807, bundle+ {   Unforgeable(0x61657f51076320deb7358dfcfc1f703be818ee08876c8b8efbfdf6e9d3020bcd) })), f1 : (0, 6f02556e119081a1aa256e6ab976f5d31efae9432aa8c648a950e98350b5f3, (9223372036854775807, bundle+ {   Unforgeable(0x95fcdae5d4db7cf5aa0022bf6dba3f7c21b506fed8f3f4698a570d048ea758bd) })), a6 : (0, ee541718b13689de727ef72a5026144f093359d6c9961994c48d301488cf8b, (9223372036854775807, bundle+ {   Unforgeable(0x61a47068f8b14054beb31fd7b7f580623ff90104c04f7e4dac105ec52f379688) })), 93 : (0, a9b4716e93f5f1a01a460649734262f3e9533bcfb16e43a3264f50420aa32c, (9223372036854775807, bundle+ {   Unforgeable(0x741f6ee13b54ca29cd6e3340e24a781e8cc2739bc4ab5fd1650de54e11040955) })), ca : (0, fd5a3c1d53b541f29ae597b71b837b1439be2515cc93fae092f91ce3a3b01e, (9223372036854775807, bundle+ {   Unforgeable(0xc274979cec5d4e0f9f6ef7b7c1236254b1eeb8869f0dad6df87649a9660c58f9) }))}]
20190606-15:50:17.841;4e30f330755d6b6dc8f54ca16be004db87d88893f5f49d7b3165fa259e248240;insert-data;197;1;[400]
20190606-15:50:17.841;25cbf27ecfc0f25a1d6c81c1be4616a00bb45a0d3509467008d880e82b302a6c;insert-data;196;1;[0]

Continuations:

Code Block
linenumberstrue
collapsetrue
20190606-15:50:17.836;3c1316e73f94b3044f97954d190292e11b5cf88bd871adef29dcde0fc96b2556;insert-continuation;1760;3;match (x-2 >= 0) {
  true => {
    for( @{x0} <- @{Unforgeable(0x09e1534416a0a03cc0680a50a7a8116b674005316b3d325f3cbafd9bb33d123c)} ) {
      match ((x0 + x-2) >= x0) {
        true => {
          @{x-1}!(true) |
          @{Unforgeable(0x09e1534416a0a03cc0680a50a7a8116b674005316b3d325f3cbafd9bb33d123c)}!((x0 + x-2))
        }
        false => {
          @{x-1}!(false) |
          @{Unforgeable(0x09e1534416a0a03cc0680a50a7a8116b674005316b3d325f3cbafd9bb33d123c)}!(x0)
        }
      }
    }
  }
  false => {
    @{x-1}!(false)
  }
};match (x-2 >= 0) {
  true => {
    for( @{x0} <- @{Unforgeable(0x09e1534416a0a03cc0680a50a7a8116b674005316b3d325f3cbafd9bb33d123c)} ) {
      match (x-2 <= x0) {
        true => {
          @{x-1}!(true) |
          @{Unforgeable(0x09e1534416a0a03cc0680a50a7a8116b674005316b3d325f3cbafd9bb33d123c)}!((x0 - x-2))
        }
        false => {
          @{x-1}!(false) |
          @{Unforgeable(0x09e1534416a0a03cc0680a50a7a8116b674005316b3d325f3cbafd9bb33d123c)}!(x0)
        }
      }
    }
  }
  false => {
    @{x-1}!(false)
  }
};for( @{x0} <- @{Unforgeable(0x09e1534416a0a03cc0680a50a7a8116b674005316b3d325f3cbafd9bb33d123c)} ) {
  @{x-1}!(x0) |
  @{Unforgeable(0x09e1534416a0a03cc0680a50a7a8116b674005316b3d325f3cbafd9bb33d123c)}!(x0)
}
20190606-15:50:17.838;10aab7ee3a588895868250d442f29f60a160944c0bb009298150644a101d4224;insert-continuation;353;1;@{Unforgeable(0xe29fc7b0349439bb273c36a0f719607aa53858a246b788e2d0f248ce5a23c0ca)}!("1111sc15PERLugDRdfPHkSfYPpar8i1NcwM5hKsE64HVUG2zCF5sy", 0, x-1)
20190606-15:50:17.839;002aaf52b691c44c91af2b9d068020ecac6100b20e480a9d660e0099ef00bc60;insert-continuation;427;1;@{x-1}!(bundle0 {   (Unforgeable(0x7e9d36337ccbbab727a21f6c2ed273b0f1142e0eb3427eab5cf50333f0071322), (Unforgeable(0xf43babf465fe04a0be8d1ef3b543f2203fd287a1d6a9fb290c2ca13127210219), "1111sc15PERLugDRdfPHkSfYPpar8i1NcwM5hKsE64HVUG2zCF5sy")) })
20190606-15:50:17.839;1002f77a40953a11781cc0b0bf714ecb8f363a5f46e533e09fbf8f8f9fa6a6a9;insert-continuation;2503;5;new x0 in {
  @{bundle+ {   Unforgeable(0xd8b16a8c4e343834d6491668458ac6ee8f4f85e70e604ce370c6621d275a93a5) }}!("sub", x-2, *x0) |
  for( @{x1} <- x0 ) {
    @{x-1}!(x1, Unforgeable(0xa114647fa0587e970884cafc259ffc7f088cb3e6a51427a5751b742639887c47))
  }
};new x0, x1 in {
  @{Unforgeable(0xd4363a6448e79ca3043294d86906546eb0c3cb642c0beea40c9fea64a6e4fc57)}!("sprout", *x0) |
  for( @{x2} <- x0 ) {
    @{x2}!("deposit", x-2, Unforgeable(0xd4363a6448e79ca3043294d86906546eb0c3cb642c0beea40c9fea64a6e4fc57), *x1) |
    for( @{x3} <- x1 ) {
      match x3 {
        true => {
          @{x-1}!([x2])
        }
        false => {
          @{x-1}!([])
        }
      }
    }
  }
};new x0 in {
  @{x-2}!(bundle0 {   bundle0 {     x-2 } |
  Unforgeable(0xa114647fa0587e970884cafc259ffc7f088cb3e6a51427a5751b742639887c47) }, x-3, *x0) |
  for( @{x1}, @{x2} <- x0 ) {
    match (x1 && (x2 == Unforgeable(0xa114647fa0587e970884cafc259ffc7f088cb3e6a51427a5751b742639887c47))) {
      true => {
        @{bundle+ {   Unforgeable(0xd8b16a8c4e343834d6491668458ac6ee8f4f85e70e604ce370c6621d275a93a5) }}!("add", x-3, x-1)
      }
      false => {
        @{x-1}!(false)
      }
    }
  }
};@{Unforgeable(0xd93eb64c221e94ed10b3c97291af187fe189f851125affe75a63a096cf9a5558)}!("makePurse", 0, x-1);@{bundle+ {   Unforgeable(0xd8b16a8c4e343834d6491668458ac6ee8f4f85e70e604ce370c6621d275a93a5) }}!("value", x-1)

Findings:

  • it's easy to observe that the Datums are divided into two types
    • the interesting part "balances", they take up 196 + 197 + 198 bytes
    • the boring part of maps of revvaults 1167 + 1876
      • Jira Legacy
        serverSystem JIRA
        serverId50130123-f232-3df4-bccb-c16e7d83cd3e
        keyRCHAIN-3535
      • Jira Legacy
        serverSystem JIRA
        serverId50130123-f232-3df4-bccb-c16e7d83cd3e
        keyRCHAIN-3534
    • in this case (rev transfer) the map does not change, so storing it again is just a consequence of not having enough introspection into the data in RSpace 
      Jira Legacy
      serverSystem JIRA
      serverId50130123-f232-3df4-bccb-c16e7d83cd3e
      keyRCHAIN-3528
    • the potential saving by having Datums share data should be of the size of the registry (2900 bytes in this case)
    • in theory this can be achieved by providing peek semantics
  • Continuations consist of the purse contract being created (persisted)
    • it seems that all these pieces should not make it to the tuplespace at all since they will go out of scope post the purse being empty
      • a proposal of making this go away is to introduce a one-time-use purse that will be consumed when money is drained 
        Jira Legacy
        serverSystem JIRA
        serverId50130123-f232-3df4-bccb-c16e7d83cd3e
        keyRCHAIN-3536
      • Jira Legacy
        serverSystem JIRA
        serverId50130123-f232-3df4-bccb-c16e7d83cd3e
        keyRCHAIN-3539
    • it seems that there is a possibility to not persist the continuations over and over but simply point to them with data 
      Jira Legacy
      serverSystem JIRA
      serverId50130123-f232-3df4-bccb-c16e7d83cd3e
      keyRCHAIN-3529
      • this would boil down to a one time cost of the contract in a schematic form ~ 2500 bytes + 1760
      • and a reference with indexes and values stored on each transfer with 7 unforgeable names
      • the gain here is not obvious and needs further inspection

Trie vs data in RSpace Next

In the above example it looks like the data and the trie are taking up roughly the same amount of data but that is misleading. There is an initial cost to making a new trie but its growth will not be linear to the amount of data put into the tuplespace (TODO this should be shown).

It looks like the next steps is to make the data take up as little space as possible.