CORE-711 Postmortem


stay - Today at 11:01 AM
No, all I know is that when they went to deploy the node software, the node service failed to start.


How did the problem manifest itself?

When rnode was run as a systemd service, the service failed to start. The cause was the recent addition of subcommands (CORE-496).

Subcommands were added in CORE-496 as a bounty feature. Docker and the documentation were updated; however, the systemd configuration was not updated.

As a result, when the node was run as a systemd service, incorrect arguments were passed to the start command.
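
The failure mode can be illustrated with a small sketch. This is not rnode's actual argument parser: the program name `node-cli`, the `run` subcommand, and the `--port` flag are hypothetical stand-ins. It only shows why a command line written before subcommands existed is rejected once a subcommand becomes mandatory.

```python
import argparse


def build_parser():
    # Hypothetical CLI: a subcommand is mandatory, flags belong to the subcommand.
    parser = argparse.ArgumentParser(prog="node-cli")
    sub = parser.add_subparsers(dest="command", required=True)
    run = sub.add_parser("run")
    run.add_argument("--port", type=int)
    return parser


def try_parse(argv):
    """Return the parsed namespace, or None if the arguments are rejected."""
    try:
        return build_parser().parse_args(argv)
    except SystemExit:
        # argparse reports the error on stderr and exits; treat that as a rejection.
        return None


# Old-style invocation (no subcommand), as a stale service definition would pass it.
old_style = try_parse(["--port", "40400"])
# New-style invocation with the subcommand first.
new_style = try_parse(["run", "--port", "40400"])
```

Here `old_style` is `None` because the parser refuses the arguments, while `new_style` parses normally. A service definition that still passes the old-style arguments fails in the same way at start-up.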

Why did the problem occur?

Docker is used in the integration tests, so the Docker setup most probably broke as soon as the feature was implemented and was then updated by the developer. systemd is not covered by the integration tests, which is probably why it was overlooked.

Also, I was recently told that we no longer support the Debian and Fedora package installers (only Docker and tarball), and yet the Fedora installer was somehow used yesterday during testing. This ambiguity about which installation artefacts we support needs to be clarified.

Lastly, although this was not the cause of the issue, it is worth pointing out that systemd passes a fixed set of arguments when starting rnode, and the user has no way to modify them or add new ones when starting rnode through systemd.
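
One possible way around the fixed argument list is a small launcher that appends user-supplied arguments from an environment variable. This is a sketch only, not an existing rnode feature: the variable name `RNODE_EXTRA_ARGS`, the fixed `run` argument, and the binary path are assumptions made for illustration. systemd can set such a variable through its `Environment=` or `EnvironmentFile=` directives.

```python
import os
import shlex

# Hypothetical fixed arguments that the service always passes.
FIXED_ARGS = ["run"]


def build_command(binary, env=None):
    """Build the full command line: binary, fixed arguments, then user extras."""
    env = os.environ if env is None else env
    # RNODE_EXTRA_ARGS is an assumed variable name; it is split like a shell would split it.
    extra = shlex.split(env.get("RNODE_EXTRA_ARGS", ""))
    return [binary, *FIXED_ARGS, *extra]


# With no variable set, only the fixed arguments are used.
default_cmd = build_command("/usr/bin/rnode", {})
# With the variable set, the extra arguments are appended.
custom_cmd = build_command("/usr/bin/rnode", {"RNODE_EXTRA_ARGS": "--port 40400"})
```

The design choice here is to keep the fixed arguments in one place and let operators extend them without editing the unit file itself.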

What can be done to avoid this issue in the future?

  1. Cover systemd with integration tests; it is currently not tested (CORE-716)
  2. We need to clarify once and for all which builds we support and which we do not (CORE-718)
  3. We need to clarify in our documentation how we support systemd (CORE-717)


Other known issues:

  1. systemd does not allow passing additional arguments (CORE-719)