Arbitrary block size
Pawel Szulc October 11, 2018
1 Requirements
One of the node requirements is the ability to deal with arbitrary block sizes. That means that if a node receives a block that can physically fit into its memory, the block should be handled; if it does not fit into memory, it is not handled, but the receiving node will not crash and will continue operating.
2 Problems (and [possible] solutions)
2.1 DONE Limitation on network message size
grpc-java uses io.netty, which has limitations on the size of messages sent over the network. As it turns out, we can make that limit arbitrarily big at runtime. This approach was however discouraged, since (as pointed out by Sebastian) the HTTP/2 RFC 7540 limits the size of a message to no more than 16 MB. And even though io.netty seems to do just fine breaking that rule, we would like to keep the node compliant with the RFC. We've solved this by streaming big messages over the network (see the TransportLayer.stream method).
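For illustration only, the chunking idea behind that streaming approach can be sketched as follows (the names below are hypothetical; the actual logic lives in TransportLayer.stream and its message types): a serialized block is split into slices that stay well below the 16 MB ceiling and are sent one by one over a client stream.

  // Hypothetical sketch of splitting a serialized block into stream chunks;
  // not the actual TransportLayer.stream implementation.
  object Chunking {
    // keep each chunk comfortably below the 16 MB HTTP/2 ceiling
    val ChunkSize: Int = 256 * 1024

    // slices to be sent one by one over a gRPC client stream
    def chunk(blockBytes: Array[Byte]): Iterator[Array[Byte]] =
      blockBytes.grouped(ChunkSize)
  }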
2.2 TODO "Can fit physically into memory"
When launched, a Java process reserves an initial amount of memory and can grow up to a statically defined limit (provided by the -Xmx flag). Most of the reserved memory is available for the heap. The heap is where all created objects are stored (including blocks). When we say "the node can physically fit a block into memory", we must mean the amount of memory given to the heap minus some amount of memory reserved for objects other than blocks. The challenge here is how to evaluate that value. Since the maximum amount of memory used by a Java process is limited at the start of the process and we cannot really know how much space other objects will take, we are left with only one option: speculation. We could potentially run a node and observe how much memory it consumes for work other than handling blocks, but that will always be just a rough estimate.
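As a rough sketch of that speculation (the reserve value below is an assumption, not a measured number), the node could estimate the heap still available for blocks from the standard JVM runtime counters:

  // Sketch only: estimate heap left for blocks, keeping a speculative
  // reserve (NonBlockReserveBytes is a made-up figure) for everything else.
  object HeapBudget {
    val NonBlockReserveBytes: Long = 512L * 1024 * 1024

    def availableForBlocks(): Long = {
      val rt       = Runtime.getRuntime
      val maxHeap  = rt.maxMemory()                     // roughly the -Xmx limit
      val usedHeap = rt.totalMemory() - rt.freeMemory() // heap currently occupied
      math.max(0L, maxHeap - usedHeap - NonBlockReserveBytes)
    }
  }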
2.3 TODO Concurrent message handling
If blocks were received one after the other, in a sequential manner, then handling arbitrary block sizes would not be that big of a deal. The node would know how big a block it can take (see the point above) and receive it in a streaming manner (see two points above). If (while receiving a block) it realized that the block was reaching that limit, it would abort the stream. The problem is that blocks are received in parallel (we have a separate queue per stream). It means, e.g., that for a node with a 2 GB heap, a block of size 500 MB might fit without issues, but it will cause an OutOfMemoryError (from which we cannot recover) if that message is broadcast by 6 different nodes (500 MB x 6 = 3 GB). A potential solution is to restrict the ratio between the maximum message size and the level of parallelism (the number of blocks that can be received per node), but it is not an ideal solution - it only takes into account the process of receiving a block, not the process of "handling" it, during which the block is still not ready to be garbage collected by the JVM.
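One way to think about that ratio (a sketch under assumptions, not an agreed design; BlockMemoryBudget is a hypothetical name) is to treat the heap budget for in-flight blocks as a pool of megabyte-sized permits: a stream must acquire permits for the announced block size before the bytes are read, and releases them only after the block has been fully handled, so that, say, six concurrent 500 MB streams cannot overrun a 2 GB heap.

  import java.util.concurrent.Semaphore

  // Hypothetical sketch: a permit pool sized in megabytes that bounds the
  // total size of blocks being received *and* handled at any moment.
  final class BlockMemoryBudget(budgetMb: Int) {
    private val permits = new Semaphore(budgetMb, /* fair = */ true)

    def receive[A](blockSizeMb: Int)(handle: => A): Option[A] =
      if (permits.tryAcquire(blockSizeMb)) {
        try Some(handle)          // permits held until handling is done
        finally permits.release(blockSizeMb)
      } else None                 // abort the stream instead of risking OOM
  }

Releasing the permits only after handling completes is what would distinguish such a limit from one on receiving alone, which is exactly the gap pointed out above.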