The tools and procedures described here are intended to maintain node stability and to diagnose issues that arise in production environments. For validator-specific operations, refer to Validator Operations. For initial network deployment and configuration, refer to Network Deployment.
Metrics Collection
StableNet nodes expose a wide range of internal metrics to observe performance and operational status. The metrics system is implemented using the go-metrics library and provides real-time indicators for consensus processing, transaction handling, network status, and database behavior.
In production environments, metric collection enables operators to monitor:
- Block production and consensus latency
- Transaction pool congestion
- Peer connection stability
- Database and state processing performance
- Resource bottlenecks (CPU, memory, disk I/O)
Enabling Metrics
Metrics collection is controlled via command-line flags.

| Flag | Description | Default |
|---|---|---|
| `--metrics` | Enable metric collection and exposure | Disabled |
| `--metrics.expensive` | Enable expensive metrics (not recommended in production) | Disabled |
| `--metrics.addr` | Metrics HTTP server bind address | None |
| `--metrics.port` | Metrics HTTP server port | 6060 |
Once enabled, the metrics endpoint is served at:
`http://<addr>:<port>/debug/metrics`
In production environments, do not expose the metrics server directly to the public network. Instead, restrict access to an internal network or place it behind a proxy.
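As a sketch of the flags above, assuming the node binary is invoked as `stablenet` and the data directory is `/data/stablenet` (both are illustrative placeholders; substitute your actual binary and path), metrics could be enabled and inspected like this:

```shell
# Start the node with metrics enabled, bound to localhost only
# (binary name and data directory are assumptions for illustration)
stablenet --datadir /data/stablenet \
  --metrics \
  --metrics.addr 127.0.0.1 \
  --metrics.port 6060

# From the same host, fetch the current metric snapshot
curl -s http://127.0.0.1:6060/debug/metrics
```

Binding to `127.0.0.1` keeps the endpoint off the public network, in line with the recommendation above.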
Exporting to InfluxDB
Metrics can be exported to InfluxDB for long-term storage and time-series analysis. StableNet supports both InfluxDB v1 and v2.
InfluxDB v1 Configuration
| Flag | Description |
|---|---|
| `--metrics.influxdb` | Enable InfluxDB v1 export |
| `--metrics.influxdb.endpoint` | InfluxDB API endpoint |
| `--metrics.influxdb.database` | Database name |
| `--metrics.influxdb.username` | Authentication username |
| `--metrics.influxdb.password` | Authentication password |
| `--metrics.influxdb.tags` | Comma-separated key/value tags |
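As an example of how the v1 flags combine, assuming a `stablenet` binary name and a local InfluxDB instance (endpoint, database name, and credentials are all illustrative):

```shell
stablenet --metrics \
  --metrics.influxdb \
  --metrics.influxdb.endpoint "http://127.0.0.1:8086" \
  --metrics.influxdb.database "stablenet" \
  --metrics.influxdb.username "metrics" \
  --metrics.influxdb.password "s3cret" \
  --metrics.influxdb.tags "host=node-01,region=eu"
```

The tags are attached to every exported data point, which is useful for distinguishing nodes in a shared database.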
InfluxDB v2 Configuration
| Flag | Description |
|---|---|
| `--metrics.influxdbv2` | Enable InfluxDB v2 export |
| `--metrics.influxdb.token` | Authentication token |
| `--metrics.influxdb.bucket` | Bucket name |
| `--metrics.influxdb.organization` | Organization name |
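A corresponding v2 sketch, again with an assumed `stablenet` binary name and illustrative endpoint, bucket, and organization values:

```shell
stablenet --metrics \
  --metrics.influxdbv2 \
  --metrics.influxdb.endpoint "http://127.0.0.1:8086" \
  --metrics.influxdb.token "<api-token>" \
  --metrics.influxdb.bucket "stablenet" \
  --metrics.influxdb.organization "ops"
```

Note that v2 reuses the v1 `--metrics.influxdb.endpoint` flag; only authentication and destination flags differ.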
Available Metrics
Key Metrics by Category
| Category | Metric Name | Type | Description | Source |
|---|---|---|---|---|
| WBFT Consensus | `consensus/wbft/core/commitwork` | Timer | Block commit processing time | `miner/worker.go` |
| Worker | `miner.newTxs` | Counter | Number of incoming transactions | `miner/worker.go` |
| Worker | `miner.running` | Bool | Whether the block production worker is active | `miner/worker.go` |
| Worker | `miner.syncing` | Bool | Whether the node is syncing | `miner/worker.go` |
| Chain | Block insertion rate | Meter | Block insertion throughput | `core/blockchain.go` |
| Chain | Reorg depth | Counter | Chain reorganization depth | `core/blockchain.go` |
| TxPool | Pending transactions | Gauge | Executable transactions | `core/txpool/legacypool` |
| TxPool | Queued transactions | Gauge | Queued (non-executable) transactions | `core/txpool/legacypool` |
| P2P | Peer count | Gauge | Number of connected peers | `p2p/server.go` |
| P2P | Ingress / Egress | Meter | Network traffic | `p2p/server.go` |
| State | Trie cache hits | Counter | State cache hit count | `core/state/statedb.go` |
| State | Commit time | Timer | State commit duration | `core/state/statedb.go` |
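A quick way to spot-check an individual metric from the table above is to filter the metrics endpoint output, e.g. for the WBFT commit timer (the endpoint address assumes the flags from Enabling Metrics; the exact JSON key layout may differ between releases):

```shell
# Grep the metric snapshot for the commit timer
curl -s http://127.0.0.1:6060/debug/metrics | grep -i "commitwork"
```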
StableNet / Anzeon-Specific Metrics
In Anzeon (WBFT)-based networks, additional consensus-specific metrics are available:
- Gas tip change events via governance
- Epoch-based validator set changes
- BLS signature verification latency
- Round changes and timeout occurrences
Worker State Monitoring
The `worker` structure exposes several atomic variables that reflect block production and consensus processing state. These are critical for diagnosing why block production may have stalled on validator nodes.
Logging
StableNet uses a structured logging system with configurable verbosity levels to control output detail.
Log Levels
| Level | Value | Description |
|---|---|---|
| Critical | 1 | Fatal errors requiring immediate action |
| Error | 2 | Errors that may cause functional failure |
| Warn | 3 | Warning conditions |
| Info | 4 | General operational information (default) |
| Debug | 5 | Detailed debugging logs |
| Trace | 6 | Very detailed trace logs |
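Assuming the node inherits the conventional `--verbosity` flag (an assumption; confirm against `--help` output for your build, and note the `stablenet` binary name is also a placeholder), a level from the table is selected by its numeric value:

```shell
# Run with Debug-level logging (value 5 per the table above)
stablenet --verbosity 5
```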
Log Output Examples
Node startup logs:
Disk Space Management
StableNet includes automatic disk space monitoring to prevent database corruption caused by insufficient disk space.
Disk Space Thresholds
The effective threshold is determined by one of the following:
- Default: `2 * TrieDirtyCache`
- Cache-based: `2 * --cache * --cache.gc / 100`
- Explicit configuration: `--datadir.minfreedisk`
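A worked example of the cache-based formula, using the hypothetical values `--cache 4096` (MiB) and `--cache.gc 25` (percent):

```shell
CACHE=4096      # --cache, in MiB (hypothetical value)
CACHE_GC=25     # --cache.gc, in percent (hypothetical value)
# Cache-based threshold: 2 * cache * cache.gc / 100
THRESHOLD=$((2 * CACHE * CACHE_GC / 100))
echo "minimum free disk: ${THRESHOLD} MiB"   # prints: minimum free disk: 2048 MiB
```

With these values the node would start warning when free space drops below twice this figure (4096 MiB) and shut down below 2048 MiB.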
Behavior
- Free space ≥ 2× threshold: normal operation
- Between 1× and 2×: periodic warning logs
- < 1×: node shutdown is triggered
Platform-Specific Implementations
| Platform | Implementation | System Call |
|---|---|---|
| Linux / Unix | `cmd/utils/diskusage.go` | `syscall.Statfs()` |
| Windows | `cmd/utils/diskusage_windows.go` | `GetDiskFreeSpaceEx()` |
| OpenBSD | `cmd/utils/diskusage_openbsd.go` | `syscall.Statfs()` |
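The same check can be approximated from a shell with `df`, independent of the node's internal syscall path (the data directory path here is a placeholder):

```shell
# Report free space (KiB) on the volume holding the data directory,
# using POSIX-format df output; column 4 is "Available"
DATADIR="${DATADIR:-.}"
FREE_KIB=$(df -Pk "$DATADIR" | awk 'NR==2 {print $4}')
echo "free space: ${FREE_KIB} KiB"
```

Comparing this figure against the thresholds above is a reasonable external cross-check when the node's own warnings are in doubt.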
Database Maintenance
Database Backends
StableNet supports the following database backends:
- LevelDB
- Pebble
Compaction
- Automatic compaction runs in the background during normal operation.
- Completion is logged.
State Pruning
- Offline pruning is supported only for the hash-based state schema
- Requires the node to be stopped
- If interrupted, recovery is attempted on the next startup
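If the build exposes geth-style database subcommands (an assumption worth verifying against your release; the `stablenet` binary name and data directory are likewise placeholders), manual compaction and offline pruning would look like:

```shell
# The node must be stopped before either operation
stablenet db compact --datadir /data/stablenet
stablenet snapshot prune-state --datadir /data/stablenet
```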
Ancient Data (Freezer)
- Data older than a certain block height is moved to ancient storage
- Read-only with high compression efficiency
- Grows continuously as the chain advances
Node Health Monitoring
StableNet tracks abnormal shutdowns.
- A clean marker is written on normal shutdown
- A warning log is emitted on restart after an abnormal shutdown
Monitoring Checklist
Daily
- Synchronization completed
- Peer count ≥ 10
- Available disk space check
- Database size growth trend
- Occurrence of abnormal shutdowns
- Review log retention policies
- Manage ancient data growth
- Evaluate version upgrades
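The sync and peer-count items of the checklist can be scripted against the standard Ethereum JSON-RPC interface, assuming it is enabled on the default port 8545 (`net_peerCount` and `eth_syncing` are standard methods; the port is an assumption):

```shell
RPC=http://127.0.0.1:8545

# Peer count, returned as a hex quantity (e.g. "0xa" = 10 peers)
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' "$RPC"

# Sync status: returns false once fully synchronized
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":2}' "$RPC"
```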
Troubleshooting
Node Synchronization Failure
- Verify peer connectivity
- Check firewall configuration
- Confirm network ID and bootnode settings
High Memory Usage
- Review cache configuration
- Check whether archive mode is enabled
- Inspect system OOM logs
Slow Block Processing
- Check disk I/O bottlenecks
- Inspect the `commitwork` metric
- Review cache settings and CPU core availability
Database Corruption
- Stop the node immediately
- Restore from backup or resynchronize
- Consider switching to the Pebble backend for long-term stability

