
Monitoring & Alerting

What to Monitor

| Metric | Why | Alert threshold |
|---|---|---|
| Block height | Detect sync stalls | No increase in 30 minutes |
| Peer count | Network connectivity | Below 3 peers |
| Disk usage | Prevent full disk | Above 85% capacity |
| RAM usage | Detect memory leaks | Above 90% capacity |
| Process status | Detect crashes | Process not running |
| BPoS status | Detect penalties | Status changed from "Active" |
| Mempool size | Detect spam/congestion | Above 10,000 transactions |
| Block timestamp | Detect clock drift | More than 5 minutes behind |
| Chain data growth rate | Capacity planning | N/A (informational) |

Built-In Status Commands

# Full status of all chains
node.sh status

# Per-chain status (shows 18 metrics for ELA)
node.sh ela status

# Output includes:
# Version, disk usage, address, public key, balance
# PID, RAM, uptime, file descriptors, TCP ports/connections
# Peers, height
# BPoS state (name, status, staked, votes, unclaimed rewards)
# Elastos Council state (name, status)

RPC Health Checks

ELA main chain:

# Current block height
node.sh ela jsonrpc getcurrentheight

# Best block hash
node.sh ela jsonrpc getbestblockhash

# Connection count
node.sh ela jsonrpc getconnectioncount

# Node state (comprehensive)
node.sh ela jsonrpc nodestate

# Transactions currently in the memory pool
node.sh ela jsonrpc getrawmempool
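The peer-count and mempool thresholds from the table can be checked with these same RPC methods. The sketch below assumes, as the block height monitor further down does, that `node.sh ela jsonrpc <method>` prints JSON with a `.result` field, and additionally that `getrawmempool` returns a JSON array; the function names are hypothetical and `jq` is assumed to be installed.

```shell
#!/bin/bash
# Sketch: compare ELA peer count and mempool size with the table's thresholds.
# Assumption: `node.sh ela jsonrpc <method>` prints JSON with a `.result` field,
# and getrawmempool's result is a JSON array of transactions.
MIN_PEERS=3
MAX_MEMPOOL=10000

# Pure threshold logic: prints one line per breached threshold, nothing if healthy.
# $1 = peer count, $2 = number of mempool transactions
evaluate_ela_health() {
  local peers=$1 mempool=$2
  if [ "$peers" -lt "$MIN_PEERS" ]; then
    echo "peer count $peers is below $MIN_PEERS"
  fi
  if [ "$mempool" -gt "$MAX_MEMPOOL" ]; then
    echo "mempool holds $mempool transactions (limit $MAX_MEMPOOL)"
  fi
}

main() {
  local peers mempool problems
  peers=$(node.sh ela jsonrpc getconnectioncount 2>/dev/null | jq -r '.result')
  mempool=$(node.sh ela jsonrpc getrawmempool 2>/dev/null | jq -r '.result | length')
  # Treat a missing or non-numeric answer as an RPC failure
  if ! [[ "$peers" =~ ^[0-9]+$ && "$mempool" =~ ^[0-9]+$ ]]; then
    echo "ELA RPC query failed" >&2
    return 1
  fi
  problems=$(evaluate_ela_health "$peers" "$mempool")
  if [ -n "$problems" ]; then
    echo "$problems"
  fi
}

main "$@"
```

Any printed lines can be piped to `mail`, as the alerting scripts below do.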

ESC sidechain:

# Block number
curl -s -X POST http://127.0.0.1:20636 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}'

# Peer count
curl -s -X POST http://127.0.0.1:20636 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}'

# Syncing status
curl -s -X POST http://127.0.0.1:20636 \
-H "Content-Type: application/json" \
-d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
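ESC returns quantities such as block numbers, peer counts, and timestamps as 0x-prefixed hex strings, so they need converting before any threshold comparison. The sketch below does that and applies the table's five-minute block timestamp threshold using the standard `eth_getBlockByNumber` method; the helper names are hypothetical and `curl` and `jq` are assumed to be installed.

```shell
#!/bin/bash
# Sketch: check how far the latest ESC block lags behind the local clock.
# ESC (geth-style) RPC answers with hex quantities such as {"result":"0x1b4"}.
ESC_RPC="http://127.0.0.1:20636"
MAX_LAG_SECONDS=300   # "more than 5 minutes behind" from the table

# Convert a 0x-prefixed hex quantity to decimal (bash printf accepts 0x input).
hex_to_dec() {
  printf '%d\n' "$1"
}

# $1 = latest block timestamp (Unix seconds), $2 = current time.
# Prints the lag in seconds; returns 1 if it exceeds MAX_LAG_SECONDS.
check_block_lag() {
  local lag=$(( $2 - $1 ))
  echo "$lag"
  [ "$lag" -le "$MAX_LAG_SECONDS" ]
}

# $1 = JSON-RPC method, $2 = JSON params array
esc_call() {
  curl -s -X POST "$ESC_RPC" \
    -H "Content-Type: application/json" \
    -d "{\"jsonrpc\":\"2.0\",\"method\":\"$1\",\"params\":$2,\"id\":1}"
}

main() {
  local ts_hex ts lag
  ts_hex=$(esc_call eth_getBlockByNumber '["latest", false]' 2>/dev/null | jq -r '.result.timestamp')
  if [[ ! "$ts_hex" =~ ^0x[0-9a-fA-F]+$ ]]; then
    echo "ESC RPC query failed" >&2
    return 1
  fi
  ts=$(hex_to_dec "$ts_hex")
  if ! lag=$(check_block_lag "$ts" "$(date +%s)"); then
    echo "ESC latest block is ${lag}s old; check sync status and system clock"
  fi
}

main "$@"
```

The same `hex_to_dec` helper turns a `net_peerCount` result into a number that can be compared with the three-peer minimum.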

Alerting with Elastos.ELA.Monitor

The official monitoring tool, github.com/elastos/Elastos.ELA.Monitor, is a Python script designed to run periodically from cron.

Checks performed:

  • Block height stalls (no new blocks in N minutes)
  • Producer status changes
  • Mempool size thresholds
  • Peer count minimums

Alerts: Sent by email through a configured SMTP server.

Setup:

git clone https://github.com/elastos/Elastos.ELA.Monitor.git
cd Elastos.ELA.Monitor

# Configure monitoring targets and SMTP settings
cp config.example.json config.json
# Edit config.json with your ELA RPC details and email settings

# Install as cron job (run every 5 minutes)
crontab -e
# Add: */5 * * * * cd /path/to/Elastos.ELA.Monitor && python3 monitor.py

Custom Monitoring Scripts

Block height monitor (bash):

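#!/bin/bash
PREV_HEIGHT_FILE="/tmp/ela_height"
ALERT_EMAIL="admin@example.com"   # placeholder: replace with your address

CURRENT_HEIGHT=$(node.sh ela jsonrpc getcurrentheight 2>/dev/null | jq -r '.result')

# Alert and stop if the RPC call failed, rather than recording an empty height
if [ -z "$CURRENT_HEIGHT" ] || [ "$CURRENT_HEIGHT" = "null" ]; then
  echo "Could not read ELA block height on $(hostname)" | \
    mail -s "ALERT: ELA RPC unreachable" "$ALERT_EMAIL"
  exit 1
fi

if [ -f "$PREV_HEIGHT_FILE" ]; then
  PREV_HEIGHT=$(cat "$PREV_HEIGHT_FILE")
  # Same height as on the previous run means no new blocks in one cron interval
  if [ "$CURRENT_HEIGHT" = "$PREV_HEIGHT" ]; then
    echo "ELA block height stalled at $CURRENT_HEIGHT" | \
      mail -s "ALERT: ELA Sync Stalled" "$ALERT_EMAIL"
  fi
fi

echo "$CURRENT_HEIGHT" > "$PREV_HEIGHT_FILE"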
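The height monitor above alerts as soon as the height is unchanged between two consecutive runs, which with a 5-minute cron interval is stricter than the 30-minute threshold in the table and more sensitive to an occasional slow block. A minimal sketch that alerts only once the height has been flat for 30 minutes, keeping the height and the time it last changed in a state file (the file name, its format, and the function names are assumptions of this sketch):

```shell
#!/bin/bash
# Sketch: alert only when the ELA height has not changed for STALL_SECONDS.
# Hypothetical state file format: "<height> <unix time of last change>"
STATE_FILE="/tmp/ela_height_state"
STALL_SECONDS=1800   # 30 minutes, as in the table

# $1 = previous height, $2 = time it last changed, $3 = current height, $4 = now
# Prints the new state line; returns 1 if the chain is considered stalled.
update_stall_state() {
  local prev_height=$1 changed_at=$2 cur_height=$3 now=$4
  if [ "$cur_height" != "$prev_height" ]; then
    echo "$cur_height $now"
    return 0
  fi
  echo "$prev_height $changed_at"
  [ $(( now - changed_at )) -lt "$STALL_SECONDS" ]
}

main() {
  local cur prev changed state now stalled=0
  cur=$(node.sh ela jsonrpc getcurrentheight 2>/dev/null | jq -r '.result')
  if ! [[ "$cur" =~ ^[0-9]+$ ]]; then
    echo "ELA RPC query failed" >&2
    return 1
  fi
  now=$(date +%s)
  if [ -f "$STATE_FILE" ]; then
    read -r prev changed < "$STATE_FILE"
  fi
  # Fall back to "now" if the state file is missing or incomplete
  changed=${changed:-$now}
  state=$(update_stall_state "$prev" "$changed" "$cur" "$now") || stalled=1
  echo "$state" > "$STATE_FILE"
  if [ "$stalled" -eq 1 ]; then
    echo "ELA height stuck at $cur for over $(( STALL_SECONDS / 60 )) minutes"
  fi
}

main "$@"
```

It prints the alert instead of mailing it; pipe the output to `mail` as above, or let cron mail the job's output to `MAILTO`. While the chain stays stalled, the alert repeats on every run.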

Process watchdog (bash):

#!/bin/bash
COMPONENTS=("ela" "esc" "eid" "arbiter")
ALERT_EMAIL="admin@example.com"   # placeholder: replace with your address

for comp in "${COMPONENTS[@]}"; do
  # -x matches the exact process name
  if ! pgrep -x "$comp" > /dev/null 2>&1; then
    echo "$comp process is not running on $(hostname)" | \
      mail -s "ALERT: $comp DOWN on $(hostname)" "$ALERT_EMAIL"

    # Attempt auto-restart
    node.sh "$comp" start
  fi
done

Disk usage monitor (bash):

#!/bin/bash
THRESHOLD=85
ALERT_EMAIL="admin@example.com"   # placeholder: replace with your address

# Used-space percentage of the filesystem holding ~/node (GNU df)
USAGE=$(df ~/node --output=pcent | tail -1 | tr -d ' %')

if [ "$USAGE" -gt "$THRESHOLD" ]; then
  echo "Disk usage at ${USAGE}% on $(hostname). Node directory: ~/node" | \
    mail -s "ALERT: Disk ${USAGE}% on $(hostname)" "$ALERT_EMAIL"
fi

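Save the three scripts (the paths below assume /home/elastos/scripts/), make them executable with chmod +x, and install them as cron jobs. Cron runs jobs with a minimal PATH, so make sure node.sh, jq, and mail are reachable from it, for example by setting PATH at the top of the crontab: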

crontab -e
# Add:
*/5 * * * * /home/elastos/scripts/monitor_height.sh
*/2 * * * * /home/elastos/scripts/monitor_process.sh
*/30 * * * * /home/elastos/scripts/monitor_disk.sh
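The table's RAM threshold (above 90%) is not covered by the scripts above. A Linux-only sketch that computes used memory as (MemTotal - MemAvailable) / MemTotal from /proc/meminfo; MemAvailable requires kernel 3.14 or later, and the function name is hypothetical:

```shell
#!/bin/bash
# Sketch: RAM usage check for the 90% threshold in the table (Linux only).
THRESHOLD=90

# $1 = path to a meminfo-format file; prints used memory as an integer percent.
# Used = MemTotal - MemAvailable; awk's %d truncates the fractional part.
mem_used_percent() {
  awk '/^MemTotal:/ {t = $2} /^MemAvailable:/ {a = $2}
       END { if (t > 0) printf "%d\n", (t - a) * 100 / t }' "$1"
}

main() {
  local usage
  usage=$(mem_used_percent /proc/meminfo)
  if [ -z "$usage" ]; then
    echo "could not read /proc/meminfo" >&2
    return 1
  fi
  if [ "$usage" -gt "$THRESHOLD" ]; then
    echo "RAM usage at ${usage}% on $(hostname)"
  fi
}

main "$@"
```

Like the others, it can be added to the crontab and its output piped to `mail`.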

Prometheus & Grafana Integration

For production environments, export metrics to Prometheus. ESC and EID are forks of go-ethereum (geth) and inherit its built-in metrics support:

# Start ESC with metrics enabled
./esc --datadir data --metrics --metrics.addr 127.0.0.1 --metrics.port 6060 ...

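For the ELA main chain, create a custom exporter that polls the node's RPC interface and exposes the results as Prometheus metrics. The scrape configuration below assumes such an exporter listening on localhost:9101: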

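# prometheus.yml scrape config
scrape_configs:
  - job_name: 'elastos-ela'
    static_configs:
      - targets: ['localhost:9101']
    scrape_interval: 30s

  - job_name: 'elastos-esc'
    # geth serves Prometheus-format metrics on this path rather than /metrics;
    # verify it against your ESC build
    metrics_path: /debug/metrics/prometheus
    static_configs:
      - targets: ['localhost:6060']
    scrape_interval: 15s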
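One low-effort alternative to writing a standalone ELA exporter is node_exporter's textfile collector: a cron job writes a `.prom` file in the Prometheus text format, and node_exporter serves it alongside its own metrics. This is a swapped-in technique, not the dedicated exporter described above; the directory, metric names, and helper names below are hypothetical, and `node.sh ela jsonrpc` is again assumed to print JSON with a `.result` field.

```shell
#!/bin/bash
# Sketch: publish ELA metrics via node_exporter's textfile collector.
# Hypothetical directory; must match node_exporter's --collector.textfile.directory.
TEXTFILE_DIR="/var/lib/node_exporter/textfile"
OUT="$TEXTFILE_DIR/elastos_ela.prom"

# Render one gauge in Prometheus text exposition format.
# $1 = metric name, $2 = help text, $3 = value
render_gauge() {
  printf '# HELP %s %s\n# TYPE %s gauge\n%s %s\n' "$1" "$2" "$1" "$1" "$3"
}

# $1 = ELA JSON-RPC method; prints its `.result` field
ela_result() {
  node.sh ela jsonrpc "$1" 2>/dev/null | jq -r '.result'
}

main() {
  local height peers tmp
  height=$(ela_result getcurrentheight)
  peers=$(ela_result getconnectioncount)
  if ! [[ "$height" =~ ^[0-9]+$ && "$peers" =~ ^[0-9]+$ ]]; then
    echo "ELA RPC query failed" >&2
    return 1
  fi
  # Write to a temp file in the same directory, then rename it, so
  # node_exporter never reads a half-written .prom file.
  tmp=$(mktemp "$TEXTFILE_DIR/.elastos_ela.XXXXXX") || return 1
  chmod 0644 "$tmp"   # mktemp creates 0600; node_exporter may run as another user
  {
    render_gauge elastos_ela_block_height "Current ELA block height." "$height"
    render_gauge elastos_ela_peer_count "Connected ELA peers." "$peers"
  } > "$tmp"
  mv "$tmp" "$OUT"
}

main "$@"
```

With this approach, start node_exporter with `--collector.textfile.directory` pointing at the same directory, run the script from cron, and have Prometheus scrape node_exporter (port 9100 by default) instead of the `localhost:9101` target in the config above.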