Managing the Leader Node

This section collects how-tos for managing the Carbonio Mesh leader node.

Find the Leader Node

To find which Carbonio Mesh node is currently the leader node, first get the Carbonio Mesh token.

The token is stored encrypted in the file /etc/zextras/service-discover/cluster-credentials.tar.gpg and can be retrieved with the following command, which prints it to the terminal:

# gpg -qdo - /etc/zextras/service-discover/cluster-credentials.tar.gpg | tar xOf - consul-acl-secret.json | jq .SecretID -r

For simplicity, you can export the token as an environment variable:

# export CONSUL_HTTP_TOKEN=$(gpg -qdo - /etc/zextras/service-discover/cluster-credentials.tar.gpg | tar xOf - consul-acl-secret.json | jq .SecretID -r)

You can then check the token with the command:

# echo $CONSUL_HTTP_TOKEN

The token remains in the environment until you exit the CLI session, but you can explicitly delete it with the command:

# unset CONSUL_HTTP_TOKEN

Query the Carbonio Mesh service to retrieve the state of all its nodes. The leader node has the State attribute set to leader.

# consul operator raft list-peers

The output of the command will be similar to the following. In this case, the leader node is srv2-example-com:

Node              ID                                    Address              State     Voter  RaftProtocol
srv1-example-com  10092f88-53cc-6938-08d3-48d112b5b25e  10.174.166.116:8300  follower  true   3
srv2-example-com  04033e5a-5597-20ca-81ef-5cdad4f24581  10.174.166.117:8300  leader    true   3
srv3-example-com  0d325666-f792-2258-a351-f74c01249fb3  10.174.166.118:8300  follower  true   3
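When the check has to be scripted, the leader's name can be extracted from that table. The following is a minimal sketch, assuming the column layout shown above (State is the fourth column); the function name leader_from_peers is only illustrative and not part of Carbonio or Consul.

```shell
# Print the name of the node whose State column is "leader".
# Reads the table printed by `consul operator raft list-peers` on stdin.
# Assumes the column order Node, ID, Address, State, Voter, RaftProtocol.
leader_from_peers() {
    awk 'NR > 1 && $4 == "leader" { print $1 }'
}

# Typical use on a Carbonio Mesh node (CONSUL_HTTP_TOKEN must be set):
#   consul operator raft list-peers | leader_from_peers
```

If the command prints nothing, no node reported itself as leader at the time of the query.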

Missing Leader Node

When a Carbonio Mesh cluster fails and the election quorum is not met, you may end up in a situation where no leader node exists and the following error appears in the system log:

No cluster leader

In a case like this, you can forcefully elect a node as the new leader and restore the cluster’s functionality with the following procedure.

First, choose one of the Carbonio Mesh cluster’s nodes that you want to be the new leader. We call this node newleader in the remainder of this procedure.

On all Carbonio Mesh nodes, except for newleader, stop the service-discover service:

# systemctl stop service-discover.service
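On clusters with several nodes, it can help to generate the list of stop commands first and review it before running anything. The sketch below only prints the commands; it assumes root SSH access to each node, which is an assumption of this example and not a Carbonio requirement. The function name stop_commands and the host names in the usage line are hypothetical.

```shell
# Print (but do not run) the stop command for every node except the new leader.
# Usage: stop_commands NEWLEADER NODE...
# Assumption: nodes are reachable over SSH as root.
stop_commands() {
    newleader=$1
    shift
    for node in "$@"; do
        # Skip the node chosen as the new leader.
        if [ "$node" = "$newleader" ]; then
            continue
        fi
        printf 'ssh root@%s systemctl stop service-discover.service\n' "$node"
    done
}

# Example with placeholder host names:
#   stop_commands srv2.example.com srv1.example.com srv2.example.com srv3.example.com
```

Printing the commands instead of executing them keeps the step explicit, which is preferable during a recovery where stopping the wrong node would make things worse.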

On newleader, make a backup of peers.json file:

# cp /var/lib/service-discover/data/raft/peers.json /root/peers.json.bak

Then, retrieve the ID of the Consul server on newleader:

# cat /var/lib/service-discover/data/node-id

The output will be a string like:

61f22310-97de-0965-4958-321840df66b6

Use this string to create a new /var/lib/service-discover/data/raft/peers.json with the following content:

[
  {
    "id": "<consul_server_node_id>",
    "address": "<mesh_newleader_IP>:8300",
    "non_voter": false
  }
]

Note

It is important that the non_voter attribute be set to false. The file must contain a JSON array, even when it lists a single server.

The new file will therefore be similar to this:

[
  {
    "id": "61f22310-97de-0965-4958-321840df66b6",
    "address": "10.22.247.11:8300",
    "non_voter": false
  }
]

Hint

You can find further information about the format of file peers.json inside file /var/lib/service-discover/data/raft/peers.info.
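To avoid typing mistakes, the file content can be generated from the node ID and the IP address. This is a minimal sketch: the function name make_peers_json is hypothetical, port 8300 is taken from the example above, and the output is a single-element JSON array, which is the layout Consul documents for peers.json with Raft protocol version 3.

```shell
# Print a single-server peers.json to stdout.
# Usage: make_peers_json NODE_ID IP_ADDRESS
# Port 8300 is hard-coded to match the example in this section.
make_peers_json() {
    printf '[\n  {\n    "id": "%s",\n    "address": "%s:8300",\n    "non_voter": false\n  }\n]\n' "$1" "$2"
}

# Example, to be run on newleader; review the output before saving it as
# /var/lib/service-discover/data/raft/peers.json:
#   make_peers_json "$(cat /var/lib/service-discover/data/node-id)" 10.22.247.11
```

The function writes to standard output only, so nothing is overwritten until you redirect the result to the file yourself.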

Ensure the file has proper ownership:

# chown service-discover:service-discover /var/lib/service-discover/data/raft/peers.json

Now, on newleader, start the service-discover service, then run the command:

# consul members

The output will include the FQDN of newleader as the leader and only member of the cluster. The same result can be seen from the Carbonio Mesh Administration Interface. This is correct, since the service-discover service has been stopped on all Carbonio Mesh nodes at the beginning of the procedure.

To complete the procedure and bring the cluster back to full operation, start the service-discover service on the other cluster nodes.

Once done, you can check on each Carbonio Mesh node that all cluster nodes are alive.
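The final check can also be scripted by filtering the output of consul members. The sketch below assumes the usual layout of that output, in which the third column is Status; the function name members_not_alive is only illustrative.

```shell
# Print the names of members whose Status column is not "alive".
# Reads the table printed by `consul members` on stdin.
# Assumes the column order Node, Address, Status, ...
members_not_alive() {
    awk 'NR > 1 && NF >= 3 && $3 != "alive" { print $1 }'
}

# Typical use on a Carbonio Mesh node:
#   consul members | members_not_alive
```

An empty result means every listed member reports the alive status.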