The tools and procedures described here are intended to maintain node stability and to diagnose issues that arise in production environments. For validator-specific operations, refer to Validator Operations. For initial network deployment and configuration, refer to Network Deployment.
Metrics Collection
StableNet nodes expose a wide range of internal metrics to observe performance and operational status. The metrics system is implemented using the go-metrics library and provides real-time indicators for consensus processing, transaction handling, network status, and database behavior.
In production environments, metric collection enables operators to monitor:
- Block production and consensus latency
- Transaction pool congestion
- Peer connection stability
- Database and state processing performance
- Resource bottlenecks (CPU, memory, disk I/O)
Enabling Metrics
Metrics collection is controlled via command-line flags.

| Flag | Description | Default |
|---|---|---|
| `--metrics` | Enable metric collection and exposure | Disabled |
| `--metrics.expensive` | Enable expensive metrics (not recommended in production) | Disabled |
| `--metrics.addr` | Metrics HTTP server bind address | None |
| `--metrics.port` | Metrics HTTP server port | 6060 |
Once enabled, the metrics endpoint is served at:
`http://<addr>:<port>/debug/metrics`
In production environments, do not expose the metrics server directly to the public network. Instead, restrict access to an internal network or place it behind a proxy.
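As a sketch of the flags above, assuming the node binary is invoked as `stablenet` and the data directory is `/data/stablenet` (both are illustrative placeholders; substitute your actual binary and path), metrics could be enabled and inspected like this:

```shell
# Start the node with metrics enabled, bound to localhost only
# (binary name and data directory are assumptions for illustration)
stablenet --datadir /data/stablenet \
  --metrics \
  --metrics.addr 127.0.0.1 \
  --metrics.port 6060

# From the same host, fetch the current metric snapshot
curl -s http://127.0.0.1:6060/debug/metrics
```

Binding to `127.0.0.1` keeps the endpoint off the public network, in line with the recommendation above.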
Exporting to InfluxDB
Metrics can be exported to InfluxDB for long-term storage and time-series analysis. StableNet supports both InfluxDB v1 and v2.
InfluxDB v1 Configuration
| Flag | Description |
|---|---|
| `--metrics.influxdb` | Enable InfluxDB v1 export |
| `--metrics.influxdb.endpoint` | InfluxDB API endpoint |
| `--metrics.influxdb.database` | Database name |
| `--metrics.influxdb.username` | Authentication username |
| `--metrics.influxdb.password` | Authentication password |
| `--metrics.influxdb.tags` | Comma-separated key/value tags |
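As an example of how the v1 flags combine, assuming a `stablenet` binary name and a local InfluxDB instance (endpoint, database name, and credentials are all illustrative):

```shell
stablenet --metrics \
  --metrics.influxdb \
  --metrics.influxdb.endpoint "http://127.0.0.1:8086" \
  --metrics.influxdb.database "stablenet" \
  --metrics.influxdb.username "metrics" \
  --metrics.influxdb.password "s3cret" \
  --metrics.influxdb.tags "host=node-01,region=eu"
```

The tags are attached to every exported data point, which is useful for distinguishing nodes in a shared database.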
InfluxDB v2 Configuration
| Flag | Description |
|---|---|
| `--metrics.influxdbv2` | Enable InfluxDB v2 export |
| `--metrics.influxdb.token` | Authentication token |
| `--metrics.influxdb.bucket` | Bucket name |
| `--metrics.influxdb.organization` | Organization name |
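A corresponding v2 sketch, again with an assumed `stablenet` binary name and illustrative endpoint, bucket, and organization values:

```shell
stablenet --metrics \
  --metrics.influxdbv2 \
  --metrics.influxdb.endpoint "http://127.0.0.1:8086" \
  --metrics.influxdb.token "<api-token>" \
  --metrics.influxdb.bucket "stablenet" \
  --metrics.influxdb.organization "ops"
```

Note that v2 reuses the v1 `--metrics.influxdb.endpoint` flag; only authentication and destination flags differ.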
Available Metrics
Key Metrics by Category
| Category | Metric Name | Type | Description | Source |
|---|---|---|---|---|
| WBFT Consensus | `consensus/wbft/core/commitwork` | Timer | Block commit processing time | `miner/worker.go` |
| Worker | `miner.newTxs` | Counter | Number of incoming transactions | `miner/worker.go` |
| Worker | `miner.running` | Bool | Whether the block production worker is active | `miner/worker.go` |
| Worker | `miner.syncing` | Bool | Whether the node is syncing | `miner/worker.go` |
| Chain | Block insertion rate | Meter | Block insertion throughput | `core/blockchain.go` |
| Chain | Reorg depth | Counter | Chain reorganization depth | `core/blockchain.go` |
| TxPool | Pending transactions | Gauge | Executable transactions | `core/txpool/legacypool` |
| TxPool | Queued transactions | Gauge | Queued (non-executable) transactions | `core/txpool/legacypool` |
| P2P | Peer count | Gauge | Number of connected peers | `p2p/server.go` |
| P2P | Ingress / Egress | Meter | Network traffic | `p2p/server.go` |
| State | Trie cache hits | Counter | State cache hit count | `core/state/statedb.go` |
| State | Commit time | Timer | State commit duration | `core/state/statedb.go` |
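A quick way to spot-check an individual metric from the table above is to filter the metrics endpoint output, e.g. for the WBFT commit timer (the endpoint address assumes the flags from Enabling Metrics; the exact JSON key layout may differ between releases):

```shell
# Grep the metric snapshot for the commit timer
curl -s http://127.0.0.1:6060/debug/metrics | grep -i "commitwork"
```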
StableNet / Anzeon-Specific Metrics
In Anzeon (WBFT)-based networks, additional consensus-specific metrics are available:
- Gas tip change events via governance
- Epoch-based validator set changes
- BLS signature verification latency
- Round changes and timeout occurrences
Worker State Monitoring
The `worker` structure exposes several atomic variables that reflect block production and consensus processing state. These are critical for diagnosing why block production may have stalled on validator nodes.
Logging
StableNet uses a structured logging system with configurable verbosity levels to control output detail.
Log Levels
| Level | Value | Description |
|---|---|---|
| Critical | 1 | Fatal errors requiring immediate action |
| Error | 2 | Errors that may cause functional failure |
| Warn | 3 | Warning conditions |
| Info | 4 | General operational information (default) |
| Debug | 5 | Detailed debugging logs |
| Trace | 6 | Very detailed trace logs |
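Assuming the node inherits the conventional `--verbosity` flag (an assumption; confirm against `--help` output for your build, and note the `stablenet` binary name is also a placeholder), a level from the table is selected by its numeric value:

```shell
# Run with Debug-level logging (value 5 per the table above)
stablenet --verbosity 5
```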
Log Output Examples
Node startup logs:
Disk Space Management
StableNet includes automatic disk space monitoring to prevent database corruption caused by insufficient disk space.
Disk Space Thresholds
The effective threshold is determined by one of the following:
- Default: `2 * TrieDirtyCache`
- Cache-based: `2 * --cache * --cache.gc / 100`
- Explicit configuration: `--datadir.minfreedisk`
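A worked example of the cache-based formula, using the hypothetical values `--cache 4096` (MiB) and `--cache.gc 25` (percent):

```shell
CACHE=4096      # --cache, in MiB (hypothetical value)
CACHE_GC=25     # --cache.gc, in percent (hypothetical value)
# Cache-based threshold: 2 * cache * cache.gc / 100
THRESHOLD=$((2 * CACHE * CACHE_GC / 100))
echo "minimum free disk: ${THRESHOLD} MiB"   # prints: minimum free disk: 2048 MiB
```

With these values the node would start warning when free space drops below twice this figure (4096 MiB) and shut down below 2048 MiB.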
Behavior
- Free space ≥ 2× threshold: normal operation
- Between 1× and 2×: periodic warning logs
- < 1×: node shutdown is triggered
Platform-Specific Implementations
| Platform | Implementation | System Call |
|---|---|---|
| Linux / Unix | `cmd/utils/diskusage.go` | `syscall.Statfs()` |
| Windows | `cmd/utils/diskusage_windows.go` | `GetDiskFreeSpaceEx()` |
| OpenBSD | `cmd/utils/diskusage_openbsd.go` | `syscall.Statfs()` |
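The same check can be approximated from a shell with `df`, independent of the node's internal syscall path (the data directory path here is a placeholder):

```shell
# Report free space (KiB) on the volume holding the data directory,
# using POSIX-format df output; column 4 is "Available"
DATADIR="${DATADIR:-.}"
FREE_KIB=$(df -Pk "$DATADIR" | awk 'NR==2 {print $4}')
echo "free space: ${FREE_KIB} KiB"
```

Comparing this figure against the thresholds above is a reasonable external cross-check when the node's own warnings are in doubt.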
Database Maintenance
Database Backends
StableNet supports the following database backends:
- LevelDB
- Pebble
Compaction
- Automatic compaction runs in the background during normal operation.
- Completion is logged.
State Pruning
- Offline pruning is supported only for the hash-based state schema
- Requires the node to be stopped
- If interrupted, recovery is attempted on the next startup
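If the build exposes geth-style database subcommands (an assumption worth verifying against your release; the `stablenet` binary name and data directory are likewise placeholders), manual compaction and offline pruning would look like:

```shell
# The node must be stopped before either operation
stablenet db compact --datadir /data/stablenet
stablenet snapshot prune-state --datadir /data/stablenet
```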
Ancient Data (Freezer)
- Data older than a certain block height is moved to ancient storage
- Read-only with high compression efficiency
- Grows continuously as the chain advances
Node Health Monitoring
StableNet tracks abnormal shutdowns.
- A clean marker is written on normal shutdown
- A warning log is emitted on restart after an abnormal shutdown
Monitoring Checklist
Daily
- Synchronization completed
- Peer count ≥ 10
- Available disk space check
- Database size growth trend
- Occurrence of abnormal shutdowns
- Review log retention policies
- Manage ancient data growth
- Evaluate version upgrades
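The sync and peer-count items of the checklist can be scripted against the standard Ethereum JSON-RPC interface, assuming it is enabled on the default port 8545 (`net_peerCount` and `eth_syncing` are standard methods; the port is an assumption):

```shell
RPC=http://127.0.0.1:8545

# Peer count, returned as a hex quantity (e.g. "0xa" = 10 peers)
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' "$RPC"

# Sync status: returns false once fully synchronized
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":2}' "$RPC"
```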
Troubleshooting
Node Synchronization Failure
- Verify peer connectivity
- Check firewall configuration
- Confirm network ID and bootnode settings
High Memory Usage
- Review cache configuration
- Check whether archive mode is enabled
- Inspect system OOM logs
Slow Block Processing
- Check disk I/O bottlenecks
- Inspect the `commitwork` metric
- Review cache settings and CPU core availability
Database Corruption
- Stop the node immediately
- Restore from backup or resynchronize
- Consider switching to the Pebble backend for long-term stability

