Raft telemetry
Raft telemetry provides information on OpenBao integrated storage.
Default metrics
vault.raft.apply
Metric type | Value | Description |
--- | --- | --- |
counter | number | Number of transactions in the configured interval |
The vault.raft.apply metric is generally a good indicator of the write load on your Raft integrated storage.
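As a rough illustration, the sketch below turns one interval's vault.raft.apply value into an average write rate. It assumes you have already read the counter's value for a single reporting interval from your metrics sink and know that interval's length; the helper name apply_rate is hypothetical.

```python
def apply_rate(apply_count: float, interval_seconds: float) -> float:
    """Average Raft applies (writes) per second over one reporting interval.

    Hypothetical helper: apply_count is the vault.raft.apply counter value
    reported for a single interval; interval_seconds is that interval's length.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return apply_count / interval_seconds


# Example: 600 applies reported for a 10-second interval.
print(apply_rate(600, 10))  # 60.0
```

Normalizing by interval length makes values comparable even if the reporting interval changes.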
vault.raft.barrier
Metric type | Value | Description |
--- | --- | --- |
counter | number | Number of times the node started the barrier |
A node starts the barrier by issuing a blocking call when it wants to ensure
that all pending operations that need to be applied to the finite state machine
are properly queued.
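The toy model below illustrates the barrier idea; it is not OpenBao's or the underlying Raft library's code. A single worker applies queued entries in order, and barrier() enqueues a marker and blocks until the worker reaches it, so every entry queued before the call has been processed when it returns. The class ToyFSM and its methods are hypothetical.

```python
import queue
import threading


class ToyFSM:
    """Hypothetical stand-in for a finite state machine fed by one in-order worker."""

    def __init__(self) -> None:
        self.applied = []
        self._queue = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            item = self._queue.get()
            if isinstance(item, threading.Event):
                # Barrier marker: the FIFO queue guarantees that every entry
                # queued before this marker has already been applied.
                item.set()
            else:
                self.applied.append(item)

    def enqueue(self, entry) -> None:
        self._queue.put(entry)

    def barrier(self, timeout=None) -> bool:
        """Block until all previously queued entries are applied (or timeout)."""
        done = threading.Event()
        self._queue.put(done)
        return done.wait(timeout)


fsm = ToyFSM()
for i in range(5):
    fsm.enqueue(f"put key{i}")
print(fsm.barrier(timeout=1.0))  # True
print(len(fsm.applied))          # 5
```

The barrier is a blocking call, which is why a node issues it only when it needs that ordering guarantee.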
vault.raft.candidate.electSelf
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required for a node to send a vote request to a peer |
vault.raft.commitNumLogs
Metric type | Value | Description |
--- | --- | --- |
gauge | number | Number of logs processed for application to the finite state machine in a single batch |
vault.raft.commitTime
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required to commit a new entry to the raft log on the leader node |
vault.raft.compactLogs
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required to trim unnecessary logs |
vault.raft.fsm.apply
Metric type | Value | Description |
--- | --- | --- |
summary | number | Number of logs committed by the finite state machine since the last interval |
vault.raft.fsm.applyBatch
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required by the finite state machine to apply the most recent batch of logs |
vault.raft.fsm.applyBatchNum
Metric type | Value | Description |
--- | --- | --- |
counter | number | Number of logs applied in the most recent batch |
vault.raft.fsm.enqueue
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required to queue up a batch of logs for the finite state machine to apply |
vault.raft.fsm.restore
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required by the finite state machine to complete a restore operation from a snapshot |
vault.raft.fsm.snapshot
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required by the finite state machine to record state information for the current snapshot |
vault.raft.fsm.store_config
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required to store the most recent raft configuration |
vault.raft.get
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required to retrieve an entry from underlying storage |
vault.raft.list
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required to retrieve a list of keys from underlying storage |
vault.raft.peers
Metric type | Value | Description |
--- | --- | --- |
gauge | number | Number of peers in the raft cluster configuration |
vault.raft.restore
Metric type | Value | Description |
--- | --- | --- |
counter | number | Number of times that the node performed a restore operation |
In the context of raft storage, a restore operation refers to the process where
raft consumes an external snapshot to restore its state.
vault.raft.restoreUserSnapshot
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to restore the finite state machine from a user snapshot |
vault.raft.rpc.appendEntries
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to process a remote appendEntries call from a node |
vault.raft.rpc.appendEntries.processLogs
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to completely process the outstanding logs for the given node |
vault.raft.rpc.appendEntries.storeLogs
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to record any outstanding logs since the last request to append entries for the given node |
vault.raft.rpc.installSnapshot
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to process an installSnapshot RPC call |
Only nodes currently in the follower state report vault.raft.rpc.installSnapshot metrics.
vault.raft.rpc.processHeartbeat
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to process a heartbeat request |
vault.raft.rpc.requestVote
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time required to complete a requestVote call |
vault.raft.snapshot.create
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to capture a new snapshot |
vault.raft.snapshot.persist
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to record snapshot meta information to disk while taking snapshots |
vault.raft.snapshot.takeSnapshot
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Total time required to create and persist the current snapshot |
In most cases, vault.raft.snapshot.takeSnapshot is approximately equal to vault.raft.snapshot.create + vault.raft.snapshot.persist.
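A minimal sketch of a sanity check based on that relationship follows. It assumes you compare the three timings for the same snapshot (or the same aggregate, such as the mean, over the same interval); the 10% tolerance, the function names, and the example timings are all made-up assumptions.

```python
import math


def snapshot_overhead_ms(take_ms: float, create_ms: float, persist_ms: float) -> float:
    """Time in takeSnapshot not accounted for by create + persist."""
    return take_ms - (create_ms + persist_ms)


def snapshot_timings_consistent(
    take_ms: float, create_ms: float, persist_ms: float, rel_tol: float = 0.1
) -> bool:
    """True if takeSnapshot is within rel_tol (hypothetical 10%) of create + persist."""
    return math.isclose(take_ms, create_ms + persist_ms, rel_tol=rel_tol)


# Made-up timings for a single snapshot, in milliseconds.
print(snapshot_overhead_ms(105.0, 60.0, 40.0))         # 5.0
print(snapshot_timings_consistent(105.0, 60.0, 40.0))  # True
print(snapshot_timings_consistent(200.0, 60.0, 40.0))  # False
```

A large, persistent gap suggests time is being spent outside the create and persist phases and may be worth investigating.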
vault.raft.state.candidate
Metric type | Value | Description |
--- | --- | --- |
counter | number | Number of times the raft server initiated an election |
vault.raft.state.follower
Metric type | Value | Description |
--- | --- | --- |
summary | number | Number of times in the configured interval that the raft server became a follower |
Nodes transition to follower state under the following conditions:
- when the node joins the cluster
- when a leader is elected and the node is not the new leader
vault.raft.state.leader
Metric type | Value | Description |
--- | --- | --- |
counter | number | Number of times the raft server became a leader |
vault.raft.transition.heartbeat_timeout
Metric type | Value | Description |
--- | --- | --- |
summary | number | Number of times that the node transitioned to candidate state after not receiving a heartbeat message from the last known leader |
vault.raft.transition.leader_lease_timeout
Metric type | Value | Description |
--- | --- | --- |
counter | number | The number of times the leader could not contact a quorum of nodes and therefore stepped down |
vault.raft.verify_leader
Metric type | Value | Description |
--- | --- | --- |
counter | number | Number of times in the configured interval that the node confirmed it is still the leader |
Autopilot metrics
<Note heading="Metrics only apply to the active node">
Autopilot only runs on the active node, so autopilot metrics are only
captured for the current active node.
</Note>
vault.autopilot.failure_tolerance
Metric type | Value | Description |
--- | --- | --- |
gauge | nodes | The number of healthy nodes in excess of quorum |
The failure tolerance indicates how many currently healthy nodes can fail without losing quorum.
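As a worked example, assume a majority quorum of floor(n/2) + 1 voting nodes. Failure tolerance is then the number of healthy voters above that quorum. This sketch shows only the arithmetic and is not Autopilot's code. The function names are hypothetical, and flooring the result at zero for a cluster that has already lost quorum is an assumption.

```python
def quorum_size(voters: int) -> int:
    """Majority quorum for a Raft cluster with the given number of voters."""
    return voters // 2 + 1


def failure_tolerance(voters: int, healthy_voters: int) -> int:
    """Healthy voters in excess of quorum, floored at zero (assumption)."""
    return max(0, healthy_voters - quorum_size(voters))


print(failure_tolerance(5, 5))  # 2: quorum is 3, so two healthy nodes can fail
print(failure_tolerance(5, 4))  # 1
print(failure_tolerance(3, 2))  # 0: losing one more node loses quorum
```

Under this arithmetic, a fully healthy 4-node cluster tolerates one failure, the same as a fully healthy 3-node cluster.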
vault.autopilot.healthy
Metric type | Value | Description |
--- | --- | --- |
gauge | boolean | Indicates whether all nodes are healthy |
- A value of 1 on the gauge means that Autopilot deems all nodes healthy.
- A value of 0 on the gauge means that Autopilot deems at least one node unhealthy.
vault.autopilot.node.healthy
Metric type | Value | Description |
--- | --- | --- |
gauge | boolean | Indicates whether the node identified by the node_id label is healthy |
- A value of 1 on the gauge means that Autopilot deems the node indicated by node_id healthy.
- A value of 0 on the gauge means that Autopilot cannot communicate with the node indicated by node_id, or deems that node unhealthy.
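To find which node is unhealthy, you can filter the per-node gauge by its node_id label. The sketch below assumes you have already collected the latest vault.autopilot.node.healthy sample for each node_id into a dictionary. The helper names and node IDs are hypothetical. all_nodes_healthy follows the rule described for vault.autopilot.healthy (1 only when every node is healthy), but it is a local approximation, not the value Autopilot reports.

```python
def unhealthy_nodes(node_health: dict[str, float]) -> list[str]:
    """Return sorted node IDs whose vault.autopilot.node.healthy sample is not 1."""
    return sorted(node_id for node_id, value in node_health.items() if value != 1)


def all_nodes_healthy(node_health: dict[str, float]) -> int:
    """1 if at least one node reported and every node reported 1, else 0."""
    return 1 if node_health and not unhealthy_nodes(node_health) else 0


# Hypothetical samples keyed by the node_id label.
samples = {"node-a": 1, "node-b": 0, "node-c": 1}
print(unhealthy_nodes(samples))    # ['node-b']
print(all_nodes_healthy(samples))  # 0
```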
Leadership change metrics
Leadership change metrics indicate the overall performance of the integrated
storage on raft servers and the network connection between raft nodes.
vault.raft.leader.dispatchLog
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required for the leader node to write a log entry to disk |
vault.raft.leader.dispatchNumLogs
Metric type | Value | Description |
--- | --- | --- |
gauge | number | Number of logs committed to disk in the most recent batch |
Metric type | Value | Description |
--- | --- | --- |
summary | ms | Time since the leader was last able to contact the follower nodes when checking its leader lease |
Raft replication metrics
vault.raft.replication.appendEntries.log
Metric type | Value | Description |
--- | --- | --- |
summary | number | Number of logs replicated to a node to establish parity with leader logs |
vault.raft.replication.appendEntries.rpc
Metric type | Value | Description |
--- | --- | --- |
timer | ms | Time required to replicate leader node log entries to all follower nodes with appendEntries |
vault.raft.replication.heartbeat