Raft telemetry

Raft telemetry provides information on OpenBao Integrated Storage.

Default metrics

vault.raft.apply

Metric type	Value	Description
counter	number	Number of transactions in the configured interval

The vault.raft.apply metric is generally a good indicator of the write load on your raft internal storage.
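As a rough illustration of reading vault.raft.apply as write load, the sketch below converts per-interval transaction counts into a per-second rate. The 10-second interval and the sample counts are assumptions made for the example, not values reported by OpenBao. Substitute the interval configured for your telemetry sink.

```python
def apply_rate_per_second(apply_count: int, interval_seconds: float) -> float:
    """Convert a vault.raft.apply count for one interval into a per-second rate."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return apply_count / interval_seconds


# Hypothetical samples: transactions counted in three consecutive 10-second intervals.
samples = [120, 450, 90]
rates = [apply_rate_per_second(count, 10.0) for count in samples]
print(rates)  # [12.0, 45.0, 9.0]
```

A sustained rise in this rate suggests growing write pressure on integrated storage. A short spike on its own is usually less informative.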

vault.raft.barrier

Metric type	Value	Description
counter	number	Number of times the node started the barrier

A node starts the barrier by issuing a blocking call when it wants to ensure that all pending operations that need to be applied to the finite state machine are properly queued.

vault.raft.candidate.electSelf

Metric type	Value	Description
summary	ms	Time required for a node to send a vote request to a peer

vault.raft.commitNumLogs

Metric type	Value	Description
gauge	number	Number of logs processed for application to the finite state machine in a single batch

vault.raft.commitTime

Metric type	Value	Description
summary	ms	Time required to commit a new entry to the raft log on the leader node

vault.raft.compactLogs

Metric type	Value	Description
summary	ms	Time required to trim unnecessary logs

vault.raft.fsm.apply

Metric type	Value	Description
summary	number	Number of logs committed by the finite state machine since the last interval

vault.raft.fsm.applyBatch

Metric type	Value	Description
summary	ms	Time required by the finite state machine to apply the most recent batch of logs

vault.raft.fsm.applyBatchNum

Metric type	Value	Description
counter	number	Number of logs applied in the most recent batch

vault.raft.fsm.enqueue

Metric type	Value	Description
summary	ms	Time required to queue up a batch of logs for the finite state machine to apply

vault.raft.fsm.restore

Metric type	Value	Description
summary	ms	Time required by the finite state machine to complete a restore operation from a snapshot

vault.raft.fsm.snapshot

Metric type	Value	Description
summary	ms	Time required by the finite state machine to record state information for the current snapshot

vault.raft.fsm.store_config

Metric type	Value	Description
summary	ms	Time required to store the most recent raft configuration

vault.raft.get

Metric type	Value	Description
summary	ms	Time required to retrieve an entry from underlying storage

vault.raft.list

Metric type	Value	Description
summary	ms	Time required to retrieve a list of keys from underlying storage

vault.raft.peers

Metric type	Value	Description
gauge	number	The number of peers in the raft cluster configuration

vault.raft.restore

Metric type	Value	Description
counter	number	Number of times that the node performed a restore operation

In the context of raft storage, a restore operation refers to the process where raft consumes an external snapshot to restore its state.

vault.raft.restoreUserSnapshot

Metric type	Value	Description
timer	ms	Time required to restore the finite state machine from a user snapshot

vault.raft.rpc.appendEntries

Metric type	Value	Description
timer	ms	Time required to process a remote `appendEntries` call from a node

vault.raft.rpc.appendEntries.processLogs

Metric type	Value	Description
timer	ms	Time required to completely process the outstanding logs for the given node

vault.raft.rpc.appendEntries.storeLogs

Metric type	Value	Description
timer	ms	Time required to record any outstanding logs since the last request to append entries for the given node

vault.raft.rpc.installSnapshot

Metric type	Value	Description
timer	ms	Time required to process an `installSnapshot` RPC call

Only nodes currently in the follower state report vault.raft.rpc.installSnapshot metrics.

vault.raft.rpc.processHeartbeat

Metric type	Value	Description
timer	ms	Time required to process a heartbeat request

vault.raft.rpc.requestVote

Metric type	Value	Description
summary	ms	Time required to complete a `requestVote` call

vault.raft.snapshot.create

Metric type	Value	Description
timer	ms	Time required to capture a new snapshot

vault.raft.snapshot.persist

Metric type	Value	Description
timer	ms	Time required to record snapshot meta information to disk while taking snapshots

vault.raft.snapshot.takeSnapshot

Metric type	Value	Description
timer	ms	Total time required to create and persist the current snapshot

In most cases, vault.raft.snapshot.takeSnapshot is approximately equal to vault.raft.snapshot.create + vault.raft.snapshot.persist.
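The relationship above can be checked with a small sketch like the following. The function name, the 20% tolerance, and the sample timings are hypothetical. The check only reflects the approximate equality stated above, not an exact guarantee.

```python
import math


def snapshot_timing_consistent(take_ms: float, create_ms: float,
                               persist_ms: float, rel_tol: float = 0.2) -> bool:
    """Return True when takeSnapshot is within rel_tol of create + persist."""
    return math.isclose(take_ms, create_ms + persist_ms, rel_tol=rel_tol)


# Hypothetical timings in milliseconds.
print(snapshot_timing_consistent(105.0, 80.0, 20.0))  # True
print(snapshot_timing_consistent(300.0, 80.0, 20.0))  # False
```

A large gap between the total and the sum of its parts may point to time spent outside the create and persist steps, which would be worth investigating separately.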

vault.raft.state.candidate

Metric type	Value	Description
counter	number	Number of times the raft server initiated an election

vault.raft.state.follower

Metric type	Value	Description
summary	number	Number of times in the configured interval that the raft server became a follower

Nodes transition to follower state under the following conditions:

when the node joins the cluster
when a leader is elected, but the node was not elected leader

vault.raft.state.leader

Metric type	Value	Description
counter	number	Number of times the raft server became a leader

vault.raft.transition.heartbeat_timeout

Metric type	Value	Description
summary	number	Number of times that the node transitioned to `candidate` state after not receiving a heartbeat message from the last known leader

vault.raft.transition.leader_lease_timeout

Metric type	Value	Description
counter	number	The number of times the leader could not contact a quorum of nodes and therefore stepped down

vault.raft.verify_leader

Metric type	Value	Description
counter	number	Number of times in the configured interval that the node confirmed it is still the leader

Autopilot metrics

<Note heading="Metrics only apply to the active node"> Autopilot only runs on the active node, so autopilot metrics are only captured for the current active node. </Note>

vault.autopilot.failure_tolerance

Metric type	Value	Description
gauge	nodes	The number of healthy nodes in excess of quorum

The failure tolerance indicates how many currently healthy nodes can fail without losing quorum.
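The arithmetic behind failure tolerance can be sketched as follows. This assumes every node is a voter and that quorum is a simple majority of the configured nodes. Autopilot's own calculation may account for details this sketch ignores, so treat it as an illustration only.

```python
def quorum_size(total_nodes: int) -> int:
    """Smallest majority of the configured nodes (assumes all nodes are voters)."""
    if total_nodes < 1:
        raise ValueError("total_nodes must be at least 1")
    return total_nodes // 2 + 1


def failure_tolerance(total_nodes: int, healthy_nodes: int) -> int:
    """Healthy nodes in excess of quorum, clamped at zero (the clamp is an assumption)."""
    return max(0, healthy_nodes - quorum_size(total_nodes))


print(failure_tolerance(5, 5))  # 2
print(failure_tolerance(5, 4))  # 1
print(failure_tolerance(3, 2))  # 0
```

Under these assumptions, a five-node cluster with all nodes healthy can lose two nodes and keep quorum, while a three-node cluster with one node already down cannot lose any more.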

vault.autopilot.healthy

Metric type	Value	Description
gauge	boolean	Indicates whether all nodes are healthy

A value of 1 on the gauge means that Autopilot deems all nodes healthy.
A value of 0 on the gauge means that Autopilot deems at least one node unhealthy.

vault.autopilot.node.healthy

Metric type	Value	Description
gauge	boolean	Indicates whether the node indicated by node_id is healthy

A value of 1 on the gauge means that Autopilot deems the node indicated by node_id healthy.
A value of 0 on the gauge means that Autopilot cannot communicate with the node indicated by node_id, or deems the node unhealthy.
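One way to read the per-node gauges together with the cluster-wide gauge is sketched below. The dictionary of node_id to gauge value is a hypothetical shape chosen for the example. How you collect the values depends on your telemetry sink.

```python
from typing import Dict, List


def unhealthy_nodes(node_gauges: Dict[str, int]) -> List[str]:
    """Return the node_ids whose vault.autopilot.node.healthy gauge is 0."""
    return sorted(node_id for node_id, value in node_gauges.items() if value == 0)


def cluster_healthy(node_gauges: Dict[str, int]) -> int:
    """Mirror vault.autopilot.healthy: 1 only when every node gauge is 1."""
    return 1 if all(value == 1 for value in node_gauges.values()) else 0


# Hypothetical gauge readings keyed by node_id.
gauges = {"node-a": 1, "node-b": 0, "node-c": 1}
print(cluster_healthy(gauges))  # 0
print(unhealthy_nodes(gauges))  # ['node-b']
```

When the cluster-wide gauge reads 0, the per-node gauges identify which node Autopilot cannot reach or considers unhealthy.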

Leadership change metrics

Leadership change metrics indicate the overall performance of the integrated storage on raft servers and the network connection between raft nodes.

vault.raft.leader.dispatchLog

Metric type	Value	Description
timer	ms	Time required for the leader node to write a log entry to disk

vault.raft.leader.dispatchNumLogs

Metric type	Value	Description
gauge	number	Number of logs committed to disk in the most recent batch

vault.raft.leader.lastContact

Metric type	Value	Description
summary	ms	Time since the leader was last able to contact the follower nodes when checking its leader lease

Raft replication metrics

vault.raft.replication.appendEntries.log

Metric type	Value	Description
summary	number	Number of logs replicated to a node to establish parity with leader logs

vault.raft.replication.appendEntries.rpc

Metric type	Value	Description
timer	ms	Time required to replicate leader node log entries to all follower nodes with `appendEntries`

vault.raft.replication.heartbeat

Metric type	Value	Description
timer	ms	Time required to invoke `appendEntries` on a peer so the peer does not time out

vault.raft.replication.installSnapshot

Metric type	Value	Description
timer	ms	Time required to process an `installSnapshot` RPC call

Only nodes currently in the follower state report vault.raft.replication.installSnapshot metrics.

Storage metrics

vault.raft_storage.bolt.cursor.count

Metric type	Value	Description
gauge	number	Number of cursors created in the Bolt database

vault.raft_storage.bolt.freelist.allocated_bytes

Metric type	Value	Description
gauge	bytes	Total space allocated for the freelist for the Bolt database

vault.raft_storage.bolt.freelist.free_pages

Metric type	Value	Description
gauge	number	Number of free pages in the freelist for the Bolt database

vault.raft_storage.bolt.freelist.pending_pages

Metric type	Value	Description
gauge	number	Number of pending pages in the freelist for the Bolt database

vault.raft_storage.bolt.freelist.used_bytes

Metric type	Value	Description
gauge	bytes	Total space used by the freelist for the Bolt database

vault.raft_storage.bolt.node.count

Metric type	Value	Description
gauge	number	Number of node allocations for the Bolt database

vault.raft_storage.bolt.node.dereferences

Metric type	Value	Description
gauge	number	Total number of node dereferences by the Bolt database

vault.raft_storage.bolt.page.bytes_allocated

Metric type	Value	Description
gauge	bytes	Total space allocated to the Bolt database

vault.raft_storage.bolt.page.count

Metric type	Value	Description
gauge	number	Number of page allocations in the Bolt database

vault.raft_storage.bolt.rebalance.count

Metric type	Value	Description
gauge	number	Number of node rebalances performed by the Bolt database

vault.raft_storage.bolt.rebalance.time

Metric type	Value	Description
summary	ms	Time required by the Bolt database to rebalance nodes

vault.raft_storage.bolt.spill.count

Metric type	Value	Description
gauge	number	Number of nodes spilled by the Bolt database

vault.raft_storage.bolt.spill.time

Metric type	Value	Description
summary	ms	Total time spent spilling by the Bolt database

vault.raft_storage.bolt.split.count

Metric type	Value	Description
gauge	number	Number of nodes split by the Bolt database

vault.raft_storage.bolt.transaction.currently_open_read_transactions

Metric type	Value	Description
gauge	number	Number of in-process read transactions for the Bolt database

vault.raft_storage.bolt.transaction.started_read_transactions

Metric type	Value	Description
gauge	number	Number of read transactions started by the Bolt database

vault.raft_storage.bolt.write.count

Metric type	Value	Description
gauge	number	Number of writes performed by the Bolt database

vault.raft_storage.bolt.write.time

Metric type	Value	Description
counter	ms	Total cumulative time the Bolt database has spent writing to disk

vault.raft_storage.follower.applied_index_delta

Metric type	Value	Description
gauge	number	The difference between the index applied by the leader and the index applied by the follower as reported by echoes
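The delta described above is a simple subtraction, which the following sketch uses to flag followers that have fallen behind. The follower names, index values, and the threshold of 100 entries are hypothetical. Pick a threshold that matches the write load of your cluster.

```python
from typing import Dict, List


def applied_index_delta(leader_index: int, follower_index: int) -> int:
    """Difference between the leader's and a follower's applied index."""
    return leader_index - follower_index


def lagging_followers(leader_index: int, follower_indexes: Dict[str, int],
                      max_delta: int) -> List[str]:
    """Return followers whose delta exceeds max_delta (a hypothetical threshold)."""
    return sorted(
        name for name, index in follower_indexes.items()
        if applied_index_delta(leader_index, index) > max_delta
    )


# Hypothetical applied indexes for a leader and two followers.
followers = {"follower-1": 10480, "follower-2": 9200}
print(lagging_followers(10500, followers, max_delta=100))  # ['follower-2']
```

A delta that keeps growing suggests that the follower cannot apply logs as fast as the leader produces them.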

vault.raft_storage.follower.last_heartbeat_ms

Metric type	Value	Description
gauge	ms	Time since the follower last received a heartbeat request

vault.raft_storage.stats.applied_index

Metric type	Value	Description
gauge	number	Highest index of raft log last applied to the finite state machine or added to the `fsm_pending` queue

vault.raft_storage.stats.commit_index

Metric type	Value	Description
gauge	number	Index of the last raft log committed to disk on the node

vault.raft_storage.stats.fsm_pending

Metric type	Value	Description
gauge	number	Number of raft logs queued by the node for the finite state machine to apply

vault.raft-storage.delete

Metric type	Value	Description
timer	ms	Time required to insert a log entry that deletes the given path

vault.raft-storage.entry_size

Metric type	Value	Description
summary	bytes	The total size of a raft entry during log application

vault.raft-storage.get

Metric type	Value	Description
timer	ms	Time required to retrieve a value for the given path from the finite state machine

vault.raft-storage.list

Metric type	Value	Description
timer	ms	Time required to list all entries under the prefix from the finite state machine

vault.raft-storage.put

Metric type	Value	Description
timer	ms	Time required to insert a log entry that persists the given path

vault.raft-storage.transaction

Metric type	Value	Description
timer	ms	Time required to insert operations into a single log
