Road to stability of communications module
Communication module is facing some challenges, some of them were known in the past, majority of them were recently discovered. Issues can not be simply put into single category, they relate to infrastructure challenges (going through NAT), coupling issues that trigger conceptual/business issues, implementation errors & lack of fear by having untested / not documented code. They also come with different level of priority. The idea behind this document is to gather all issues that comm is facing, building up a roadmap to bring stability to the module.
Key | Summary | T | Due | Assignee | P | Status | Description |
---|---|---|---|---|---|---|---|
RCHAIN-2077 | Investigate and test - check how Kademlia reponds when given two nodes with same ID but different IP addresses |
|
Pawel Szulc |
|
Done |
Done when:
|
|
RCHAIN-2076 | Investiage and test how kademlia behaves when node is not responding to heart beats. |
|
Pawel Szulc |
|
Done | ||
RCHAIN-1959 | Write initial set of tests for kademlia |
|
Pawel Szulc |
|
Done | ||
RCHAIN-1952 | Simulate transport layer: Kademlia test infrastructure |
|
Pawel Szulc |
|
Done |
Create the test infrastructure for Kademlia.
|
|
RCHAIN-1906 | Remove Disconnet from NodeDiscovery.Protocol |
|
Pawel Szulc |
|
Done |
Node Discovery should not rely on external sources to learn that given node disconnected from the network. Disconnect messages seems as an artifical abstraction that does not bring any value, neither were it is defined (NodeDiscovery) nor where it is being used (RChain protocol) |
|
RCHAIN-1905 | Any upstream should be removed from NodeDiscovery.Protocol |
|
Pawel Szulc |
|
Done |
TBD |
|
RCHAIN-1891 | Remove Messages Hello and Disconnect are not used in the RChain protocol |
|
Pawel Szulc |
|
Done | ||
RCHAIN-1883 | Remove method disconnect from TransportLayer |
|
Pawel Szulc |
|
Done |
Method disconnect should never be part of TransportLayer abstraction. It is TcpTransportLayer implementation detail. It must be removed from the abstraction. It should be called externally when TcpTransportLayer realises that given tcp connection is no longer valid. TcpTransportLayer should not rely on external algorithms to inform it, that given tcp connection should be removed. It should be able to handled that on its own. |
|
RCHAIN-1882 | Type of messages received and send by TransportLayer should be generic |
|
Pawel Szulc |
|
Done |
More details to be provided! Signature change from ``` to ``` |
|
RCHAIN-1815 | Remove mutable List from Kademlia |
|
Pawel Szulc |
|
Done | ||
RCHAIN-1807 | Introduce ApplicativeAsk for NodeConfig |
|
Pawel Szulc |
|
Done | ||
RCHAIN-1804 | Introduce NodeConfig and provide it via ApplicativeAsk |
|
Pawel Szulc |
|
Done | ||
RCHAIN-1794 | Move whoami from conf to node or comm |
|
Sebastian Bach |
|
Done |
This improves the readability of the code. Low priority. |
|
RCHAIN-1737 | Integration tests for TcpTransportLayer |
|
Sebastian Bach |
|
Done |
Minimum set integration tests needed for TcpTransportLayer: ***** roundTrip is send and response is received if message is handled |
|
RCHAIN-1706 | Investiage / modify NodeDiscovery to forget nodes that don't respond |
|
Sebastian Bach |
|
Done |
What needs to be done:
|
|
RCHAIN-863 | Reading and writing files takes ages, not deterministic behaviour, seems to disappear after reboot |
|
Sebastian Bach |
|
Done | ||
RCHAIN-398 | Redefine boundary between TransportLayer, NodeDiscovery and RChain protocol |
|
Pawel Szulc |
|
Done |
This ticket is based on some discussion made in https://rchain.atlassian.net/wiki/spaces/CORE/pages/500367432/Communications+Module+-+current+state+and+challenges - please read it through before reading rest of this ticket description. Currently NodeDiscovery is toupled with Rchain protocol which. RChain adds elements to table defined in NodeDiscovery (via addNode) and NodeDiscovery adds and removes those elements according to its own internal algorithms. This leads to following errors: The main root of evil here is the fact that NodeDiscovery *shares* same table of peers with RChain Protocol. Right now when RChain Protocol: 1. can be called from the out side world via `connectToBootstrap` (goes through `connect` procedure for given bootstrap The way `connect` works is described in `ConnectSpec.scala`. In essence it sends message ProtocolHandshake and once received ProtocolhandshakeResponse it adds the given node to kademlia peer table. On the other side, upon receiving ProtocolHandshake message, ProtocolHandshakeResponse is being send back, and incoming node is added to Kademlia peer table. Node Discovery (kademlia) abstracts concept of node attaching itself to existing P2P network and continuously learning about new nodes. The way that Kademlia works right now is following: it waits for outside calls like `addNode` and `findMorePeers` and triggers communication with the outside world if needed (sending Loopkup and Ping messages) Node Discovery should be totally independent protocol, with only single method AddMe available as an entry point (sent once to connect given node to P2P network). Once that method is sent, NodeDiscovery should continuously explor the P2P trying to learn about neighbour peers and at the same time exposing single method listPeers. That method should allow listing all nodes that Node Discovery learned while runnings. Node Discovery should be totally abstracted away from anything build on top of it (like RChain protocol). What should be done described in subtasks of this issue. |