Managing the Leader Node#
This section collects how-tos for managing the Leader Node.
Find the Leader Node#
To find which Carbonio Mesh node is currently the leader node, first get the Carbonio Mesh token.
The token is encrypted and stored in file /etc/zextras/service-discover/cluster-credentials.tar.gpg and can be retrieved with the following command, which prints the token to the terminal:
# gpg -qdo - /etc/zextras/service-discover/cluster-credentials.tar.gpg | tar xOf - consul-acl-secret.json | jq .SecretID -r
For convenience, you can export the token as an environment variable, which the consul command reads automatically:
# export CONSUL_HTTP_TOKEN=$(gpg -qdo - /etc/zextras/service-discover/cluster-credentials.tar.gpg | tar xOf - consul-acl-secret.json | jq .SecretID -r)
You can then check the token with the command
# echo $CONSUL_HTTP_TOKEN
The token remains in the environment until you exit the shell session, but you can explicitly remove it with the command
# unset CONSUL_HTTP_TOKEN
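If the decryption fails, for example because of a wrong passphrase, the variable is exported but empty, and later consul commands fail with less obvious errors. The following is a minimal sketch of a guard to run before them; the function name require_consul_token is hypothetical and not part of Carbonio.

```shell
# Hypothetical helper: succeed only if CONSUL_HTTP_TOKEN is set and non-empty.
require_consul_token() {
    if [ -z "${CONSUL_HTTP_TOKEN:-}" ]; then
        echo "CONSUL_HTTP_TOKEN is empty: decrypt the credentials file first" >&2
        return 1
    fi
}
```

It can be chained in front of any query, for example `require_consul_token && consul operator raft list-peers`.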
Query the Carbonio Mesh service to retrieve the state of all its Nodes. The leader node has the attribute State set to leader.
# consul operator raft list-peers
The output of the command will be similar to the following. In this case, the leader node is srv2-example-com:
Node              ID                                    Address              State     Voter  RaftProtocol
srv1-example-com  10092f88-53cc-6938-08d3-48d112b5b25e  10.174.166.116:8300  follower  true   3
srv2-example-com  04033e5a-5597-20ca-81ef-5cdad4f24581  10.174.166.117:8300  leader    true   3
srv3-example-com  0d325666-f792-2258-a351-f74c01249fb3  10.174.166.118:8300  follower  true   3
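When only the leader's name is needed, for example in a script, the table can be filtered instead of read by eye. This is a minimal sketch that assumes the column order shown above (Node first, State fourth); the function name leader_from_peers is hypothetical.

```shell
# Hypothetical helper: read `consul operator raft list-peers` output on
# standard input and print the Node name of the current leader.
leader_from_peers() {
    # Skip the header row, then print column 1 (Node) of every row whose
    # column 4 (State) is exactly "leader".
    awk 'NR > 1 && $4 == "leader" { print $1 }'
}
```

A typical invocation would be `consul operator raft list-peers | leader_from_peers`. If no leader exists, the function prints nothing.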
Missing Leader Node#
When a Carbonio Mesh cluster fails and the election quorum is not met, you may find a situation where no leader node exists and the following error appears in the syslog log file:
No cluster leader
In this case, you can force the election of a node as the new leader and restore the cluster’s functionality with the following procedure.
First, choose one of the Carbonio Mesh cluster’s nodes that you want to be the new leader. We call this node newleader in the remainder of this procedure.
On all Carbonio Mesh nodes, except for newleader, stop the service-discover service:
# systemctl stop service-discover.service
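On a cluster with many nodes it is easy to stop the wrong one by mistake. The sketch below only prints the commands to run, one per node, leaving out newleader. It assumes root SSH access to each node, which may not match your setup; the function name and the host names in the usage example are hypothetical.

```shell
# Hypothetical helper: print one stop command per node, skipping newleader.
# Usage: print_stop_commands NEWLEADER NODE...
print_stop_commands() {
    newleader=$1
    shift
    for node in "$@"; do
        # newleader keeps running; every other node gets a stop command
        if [ "$node" != "$newleader" ]; then
            printf 'ssh root@%s systemctl stop service-discover.service\n' "$node"
        fi
    done
}
```

For example, `print_stop_commands srv2.example.com srv1.example.com srv2.example.com srv3.example.com` prints the commands for srv1 and srv3 only. Review the list before running anything.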
On newleader, make a backup of the peers.json file:
# cp /var/lib/service-discover/data/raft/peers.json /root/peers.json.bak
Then, retrieve the ID of the Consul server:
# cat /var/lib/service-discover/data/node-id
The output will be a string like:
61f22310-97de-0965-4958-321840df66b6
Use this string to create a new /var/lib/service-discover/data/raft/peers.json with the following content, a JSON array containing a single entry:
[
  {
    "id": "<consul_server_node_id>",
    "address": "<mesh_newleader_IP>:8300",
    "non_voter": false
  }
]
Note
It is important that the non_voter attribute be set to false.
The new file will therefore be similar to this:
[
  {
    "id": "61f22310-97de-0965-4958-321840df66b6",
    "address": "10.22.247.11:8300",
    "non_voter": false
  }
]
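Typing the file by hand invites mistakes such as a missing quote or bracket. The following is a minimal sketch that generates the content from the node ID and the IP address. The function name make_peers_json is hypothetical, and the sketch wraps the entry in a JSON array, which is the layout Consul expects for Raft protocol version 3; check it against peers.info on your system before relying on it.

```shell
# Hypothetical helper: print a peers.json document for one voting server.
# Usage: make_peers_json NODE_ID IP_ADDRESS
make_peers_json() {
    # 8300 is the server port used in the addresses throughout this procedure
    printf '[\n  {\n    "id": "%s",\n    "address": "%s:8300",\n    "non_voter": false\n  }\n]\n' "$1" "$2"
}
```

It could be used as `make_peers_json "$(cat /var/lib/service-discover/data/node-id)" 10.22.247.11 > /var/lib/service-discover/data/raft/peers.json`, replacing the IP address with that of newleader.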
Hint
You can find further information about the format of file peers.json inside file /var/lib/service-discover/data/raft/peers.info.
Ensure the file has proper ownership:
# chown service-discover:service-discover /var/lib/service-discover/data/raft/peers.json
Now, on newleader, start the service-discover service, then run the command
# consul members
The output will include the FQDN of newleader as the leader and only member of the cluster. The same result can be seen from the Carbonio Mesh Administration Interface. This is expected, since the service-discover service was stopped on all other Carbonio Mesh nodes at the beginning of the procedure.
To complete the procedure and restore the cluster to full operation, start the service-discover service on the other cluster nodes.
Once done, you can check on each Carbonio Mesh node that all cluster nodes are alive.
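One way to make this check is to filter the output of consul members for nodes that are not alive. This is a minimal sketch that assumes the usual column order of that command (Node first, Status third); the function name report_unhealthy_members is hypothetical.

```shell
# Hypothetical helper: read `consul members` output on standard input and
# print "<node> <status>" for every node whose Status is not "alive".
# The exit status is 0 only when every listed node is alive.
report_unhealthy_members() {
    awk 'NR > 1 && $3 != "alive" { print $1 " " $3; bad = 1 }
         END { exit bad + 0 }'
}
```

A typical invocation would be `consul members | report_unhealthy_members && echo "all nodes alive"`.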