Node Monitoring

Overview

Learn how to monitor your Circle Layer node with Geth-based tooling: RPC health checks, system resource checks, Prometheus metrics, Grafana dashboards, and alerting.

Monitoring Setup

1. Basic Node Monitoring

# Check if node process is running (the [g] keeps grep from matching itself)
ps aux | grep '[g]eth'

# Check service status (if using systemd)
sudo systemctl status circlelayer.service

# View real-time logs
tail -f /data/circlelayer/logs/systemd_chain_console.out

# Check sync status via RPC
curl -H "Content-Type: application/json" \
  -X POST --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://localhost:8545

# Check current block number
curl -H "Content-Type: application/json" \
  -X POST --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545

# Check peer count
curl -H "Content-Type: application/json" \
  -X POST --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
  http://localhost:8545
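
The RPC calls above return hex-encoded quantities (for example "0x1a"), and eth_syncing returns either false or an object describing sync progress. A minimal sketch of two helpers for reading these results follows; the function names hex_to_dec and sync_state are illustrative and not part of Geth or Circle Layer tooling.

```shell
# Convert a hex quantity such as "0x1a" to decimal.
# Returns 1 without printing anything if the input is not 0x-prefixed hex.
hex_to_dec() {
    case "$1" in
        0x*|0X*) ;;
        *) return 1 ;;
    esac
    hex_digits=${1#0[xX]}
    case "$hex_digits" in
        ''|*[!0-9a-fA-F]*) return 1 ;;
    esac
    printf '%d\n' "$1"
}

# Classify an eth_syncing result: the literal "false" means the node
# reports no sync in progress; any other value is a progress object.
sync_state() {
    if [ "$1" = "false" ]; then
        echo "not-syncing"
    else
        echo "syncing"
    fi
}
```

To use it, pass the ".result" field extracted with jq from any of the calls above, for example the eth_blockNumber response, as the first argument to hex_to_dec.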

2. System Resource Monitoring

# Check memory usage
free -h

# Check disk space (important for SSD requirements)
df -h /data/circlelayer

# Monitor CPU usage
top -p "$(pgrep -d, geth)"

# Check that the P2P port (32668) is listening (sudo is needed to show the owning process)
sudo netstat -tlnp | grep 32668

# Monitor disk I/O (critical for SSD performance)
sudo iotop -a

3. Prometheus Integration

# prometheus.yml - Updated for Geth metrics
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'circlelayer-geth'
    static_configs:
      - targets: ['localhost:6060']  # Geth metrics endpoint
    metrics_path: '/debug/metrics/prometheus'
    
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']  # System metrics
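
Before pointing Prometheus at the node, it can help to confirm that the metrics endpoint actually serves data. The scrape config above assumes metrics are exposed on port 6060; in upstream Geth this is typically enabled with --metrics together with --metrics.addr and --metrics.port, and whether the Circle Layer start script sets these is an assumption to verify. The sketch below reads Prometheus text-format output and prints one metric's value; metric_value is a hypothetical helper name.

```shell
# Print the value of one metric from Prometheus text-format input on stdin.
# Exits non-zero if the metric is absent. Only matches metrics written
# without labels (a line of the form "name value").
metric_value() {
    awk -v name="$1" '
        $1 == name { print $2; found = 1; exit }
        END { exit !found }
    '
}
```

For example, piping the output of curl -s http://localhost:6060/debug/metrics/prometheus into metric_value geth_p2p_peers should print the current peer count, assuming the metric name matches the one used in the dashboard on this page.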

4. Grafana Dashboard Configuration

{
  "dashboard": {
    "title": "Circle Layer Node Dashboard",
    "panels": [
      {
        "title": "Block Height",
        "type": "stat",
        "targets": [
          {
            "expr": "geth_chain_head_block",
            "legendFormat": "Current Block"
          }
        ]
      },
      {
        "title": "Peer Count",
        "type": "stat",
        "targets": [
          {
            "expr": "geth_p2p_peers",
            "legendFormat": "Connected Peers"
          }
        ]
      },
      {
        "title": "Transaction Pool",
        "type": "graph",
        "targets": [
          {
            "expr": "geth_txpool_pending",
            "legendFormat": "Pending Transactions"
          },
          {
            "expr": "geth_txpool_queued", 
            "legendFormat": "Queued Transactions"
          }
        ]
      },
      {
        "title": "CPU Usage",
        "type": "graph",
        "targets": [
          {
            "expr": "100 - (avg by (instance) (rate(node_cpu_seconds_total{mode=\"idle\"}[5m])) * 100)",
            "legendFormat": "CPU Usage %"
          }
        ]
      },
      {
        "title": "Memory Usage",
        "type": "graph", 
        "targets": [
          {
            "expr": "geth_system_memory_allocs",
            "legendFormat": "Memory Allocations"
          }
        ]
      },
      {
        "title": "Disk I/O",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(node_disk_io_time_seconds_total[5m])",
            "legendFormat": "Disk I/O Time"
          }
        ]
      }
    ]
  }
}

Key Metrics

1. Node Health Indicators

Sync Status

# Check if node is syncing
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://localhost:8545

Block Progress

# Monitor block progression
watch -n 3 'curl -s -X POST -H "Content-Type: application/json" \
  --data "{\"jsonrpc\":\"2.0\",\"method\":\"eth_blockNumber\",\"params\":[],\"id\":1}" \
  http://localhost:8545 | jq -r ".result" | xargs printf "%d\n"'
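
Watching the block number by eye works for spot checks; for scripted checks, two samples taken some time apart can be compared directly. This is a minimal sketch with illustrative function names; the sampling interval should be comfortably longer than the network's block time, which is not stated on this page.

```shell
# Number of blocks gained between two eth_blockNumber results.
# Both arguments are hex strings such as "0x1a".
block_delta() {
    echo $(( $2 - $1 ))
}

# Succeeds (exit 0) when the second sample shows no progress over the first.
is_stalled() {
    [ "$(block_delta "$1" "$2")" -le 0 ]
}
```

A script would store the first eth_blockNumber result, sleep, fetch the second, and log a warning when is_stalled succeeds.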

Peer Connectivity

# Monitor peer connections
curl -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
  http://localhost:8545

2. Performance Metrics

Transaction Throughput

Pending transactions in mempool
Transaction processing rate
Gas usage patterns
Block utilization
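
Block utilization can be estimated as gasUsed divided by gasLimit for a given block. Both fields are returned as hex quantities by the standard eth_getBlockByNumber RPC method, which this sketch assumes the node exposes; block_utilization is a hypothetical helper name.

```shell
# Print block utilization as a percentage with two decimals.
# Arguments: gasUsed and gasLimit as hex strings from eth_getBlockByNumber.
block_utilization() {
    gas_used=$(($1))
    gas_limit=$(($2))
    # Guard against division by zero on malformed input
    [ "$gas_limit" -gt 0 ] || return 1
    awk -v u="$gas_used" -v l="$gas_limit" 'BEGIN { printf "%.2f\n", u / l * 100 }'
}
```

For example, block_utilization 0xe4e1c0 0x1c9c380 prints 50.00 (15,000,000 gas used out of a 30,000,000 limit).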

Resource Usage

CPU: Average usage should stay below 80%
Memory: Monitor Geth memory allocation
Disk: SSD performance is critical (IOPS > 5,000)
Network: Monitor P2P bandwidth on port 32668
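
The thresholds above can be turned into simple pass/fail checks. This is a sketch with illustrative helper names (check_max, check_min); the values passed in would come from whatever tool measures CPU or IOPS on your host.

```shell
# Print OK and return 0 when value <= limit; print WARN and return 1 otherwise.
# Arguments: label, measured value, upper limit (decimals allowed).
check_max() {
    awk -v n="$1" -v v="$2" -v lim="$3" 'BEGIN {
        if (v + 0 > lim + 0) { printf "WARN %s=%s above %s\n", n, v, lim; exit 1 }
        printf "OK %s=%s\n", n, v
    }'
}

# Print OK and return 0 when value >= limit; print WARN and return 1 otherwise.
check_min() {
    awk -v n="$1" -v v="$2" -v lim="$3" 'BEGIN {
        if (v + 0 < lim + 0) { printf "WARN %s=%s below %s\n", n, v, lim; exit 1 }
        printf "OK %s=%s\n", n, v
    }'
}
```

For example, check_max cpu 42.5 80 prints "OK cpu=42.5", while check_min iops 3100 5000 prints a WARN line and returns a non-zero status.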

Network Performance

Block propagation time
Peer discovery efficiency
Network latency to validators
Connection stability

3. Security Metrics

Port Security

# Verify only required ports are open
sudo nmap -sT -p 32668,8545,8546 localhost

# Check firewall status
sudo ufw status verbose

Process Security

# Verify geth process ownership
ps aux | grep geth | grep -v grep

# Check file permissions
ls -la /data/circlelayer/config.toml

Alerting Configuration

1. Critical Alerts

# alert-rules.yml - Updated for Circle Layer
groups:
  - name: circlelayer_critical
    rules:
      - alert: NodeDown
        expr: up{job="circlelayer-geth"} == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Circle Layer node is down"
          description: "Node has been down for more than 2 minutes"
      
      - alert: SyncFalling
        expr: increase(geth_chain_head_block[5m]) < 50
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node falling behind in sync"
          description: "Block progression is too slow"
          
      - alert: LowPeerCount
        expr: geth_p2p_peers < 3
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "Low peer count"
          description: "Node has fewer than 3 peers"

      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage"
          description: "CPU usage above 85% for 5 minutes"

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes{mountpoint="/data/circlelayer"} / node_filesystem_size_bytes{mountpoint="/data/circlelayer"}) * 100 < 20
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Low disk space"
          description: "Less than 20% disk space remaining"

      - alert: HighDiskIO
        expr: rate(node_disk_io_time_seconds_total[5m]) > 0.8
        for: 3m
        labels:
          severity: warning
        annotations:
          summary: "High disk I/O"
          description: "Disk I/O time high - SSD performance may be degraded"

2. Notification Channels

Telegram Bot Setup

# Create notification script
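sudo tee /usr/local/bin/alert-notify.sh > /dev/null << 'EOF'
#!/bin/bash
# Send the first argument as a Telegram message
BOT_TOKEN="YOUR_BOT_TOKEN"
CHAT_ID="YOUR_CHAT_ID"
MESSAGE="$1"

# --data-urlencode keeps spaces and special characters in the message intact
curl -s -X POST "https://api.telegram.org/bot${BOT_TOKEN}/sendMessage" \
  -d chat_id="${CHAT_ID}" \
  --data-urlencode "text=${MESSAGE}"
EOF

sudo chmod +x /usr/local/bin/alert-notify.sh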
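
A notification is easier to act on when it carries a severity and the host name. The sketch below builds such a message before it is handed to the notification script; format_alert and its argument order are illustrative choices, not part of any Circle Layer tooling.

```shell
# Build a one-line alert message.
# Arguments: severity, host name, summary text.
format_alert() {
    severity=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    printf '[%s] %s: %s\n' "$severity" "$2" "$3"
}
```

For example, the output of format_alert critical "$(hostname)" "Circle Layer node is down" can be passed as the single argument to /usr/local/bin/alert-notify.sh.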

Email Alerts

# alertmanager.yml
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'web.hook'

receivers:
  - name: 'web.hook'
    email_configs:
      - to: 'admin@circlelayer.com'
        from: 'admin@circlelayer.com'
        # An SMTP server must also be configured, either here (smarthost)
        # or in the global section (smtp_smarthost)
        headers:
          Subject: 'Circle Layer Node Alert'
        text: |
          {{ range .Alerts }}
          Alert: {{ .Annotations.summary }}
          Description: {{ .Annotations.description }}
          {{ end }}

Advanced Monitoring

1. Custom Monitoring Scripts

Node Health Check Script

#!/bin/bash
# /usr/local/bin/node-health-check.sh

LOG_FILE="/data/circlelayer/logs/health-check.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

# Function to log with timestamp
log() {
    echo "[$TIMESTAMP] $1" >> $LOG_FILE
}

# Check if geth process is running
if ! pgrep -f geth > /dev/null; then
    log "ERROR: Geth process not running"
    exit 1
fi

# Check RPC connectivity
if ! curl -f -s http://localhost:8545 > /dev/null; then
    log "ERROR: RPC endpoint not responding"
    exit 1
fi

# Check sync status
SYNC_STATUS=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://localhost:8545 | jq -r '.result')

if [ "$SYNC_STATUS" != "false" ]; then
    log "WARNING: Node is still syncing"
fi

# Check peer count
PEER_COUNT=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
  http://localhost:8545 | jq -r '.result')

PEER_COUNT_DEC=$((PEER_COUNT))
if [ $PEER_COUNT_DEC -lt 3 ]; then
    log "WARNING: Low peer count: $PEER_COUNT_DEC"
fi

log "Health check completed successfully - Peers: $PEER_COUNT_DEC"
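
The health check exits with status 1 on hard failures, so a small wrapper can forward those failures to a notifier. This is a sketch under the assumption that the check and the notifier are both executable commands; run_with_alert is a hypothetical name.

```shell
# Run a check command; if it fails, call the notifier with a short message.
# Arguments: path to the check command, path to the notifier command.
# Returns the check's exit status.
run_with_alert() {
    "$1" && check_status=0 || check_status=$?
    if [ "$check_status" -ne 0 ]; then
        "$2" "Health check failed with exit code $check_status"
    fi
    return "$check_status"
}
```

With the scripts from this page, the call would be run_with_alert /usr/local/bin/node-health-check.sh /usr/local/bin/alert-notify.sh, scheduled from cron or a systemd timer at whatever interval suits your setup.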

Performance Monitoring Script

#!/bin/bash
# /usr/local/bin/performance-monitor.sh

METRICS_FILE="/data/circlelayer/logs/performance-metrics.log"
TIMESTAMP=$(date '+%Y-%m-%d %H:%M:%S')

# Collect system metrics
CPU_USAGE=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
MEMORY_USAGE=$(free | grep Mem | awk '{printf "%.2f", ($3/$2) * 100.0}')
DISK_USAGE=$(df /data/circlelayer | tail -1 | awk '{print $5}' | sed 's/%//')

# Collect geth metrics via RPC
BLOCK_NUMBER=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' \
  http://localhost:8545 | jq -r '.result')
  
PEER_COUNT=$(curl -s -X POST -H "Content-Type: application/json" \
  --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' \
  http://localhost:8545 | jq -r '.result')

# Log metrics (block number and peer count are converted from hex to decimal)
echo "[$TIMESTAMP] CPU:${CPU_USAGE}% MEM:${MEMORY_USAGE}% DISK:${DISK_USAGE}% BLOCK:$((BLOCK_NUMBER)) PEERS:$((PEER_COUNT))" >> "$METRICS_FILE"
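
Each line in the metrics log packs several NAME:value pairs together. To chart or threshold a single field later, it can be pulled back out with sed; log_field is an illustrative helper that assumes the line layout written by the script above.

```shell
# Extract one field (for example CPU, MEM, DISK, BLOCK or PEERS) from
# metrics log lines read on stdin. A trailing "%" is dropped.
log_field() {
    sed -n "s/.* $1:\([^ %]*\).*/\1/p"
}
```

For example, piping the last line of /data/circlelayer/logs/performance-metrics.log into log_field DISK prints the most recent disk usage figure.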

2. Log Analysis

Log Parsing and Analysis

# Monitor error patterns
grep -i "error\|panic\|fatal" /data/circlelayer/logs/systemd_chain_console.out | tail -20

# Monitor connection issues
grep -i "dial\|connection\|timeout" /data/circlelayer/logs/systemd_chain_console.out | tail -10

# Monitor block production
grep -i "imported\|mined" /data/circlelayer/logs/systemd_chain_console.out | tail -10

# Performance patterns
grep -i "slow\|timeout\|delay" /data/circlelayer/logs/systemd_chain_console.out | tail -10
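
The grep commands above show matching lines; for trend tracking, a count per pattern is often more useful. A minimal sketch, with count_matches as a hypothetical helper name:

```shell
# Count lines on stdin that match an extended regular expression,
# ignoring case. Prints 0 (and still succeeds) when nothing matches.
count_matches() {
    grep -Eci "$1" || true
}
```

For example, feeding the last 1000 lines of /data/circlelayer/logs/systemd_chain_console.out into count_matches 'error|panic|fatal' gives a number that can be logged or compared against a threshold.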

Log Rotation Configuration

# Configure logrotate for Circle Layer logs
sudo tee /etc/logrotate.d/circlelayer > /dev/null << 'EOF'
/data/circlelayer/logs/*.log {
    daily
    rotate 30
    compress
    delaycompress
    missingok
    notifempty
    create 644 circlelayer circlelayer
    postrotate
        systemctl reload circlelayer.service
    endscript
}
EOF

Maintenance Procedures

1. Regular Health Checks

Daily Checks

#!/bin/bash
# daily-check.sh - daily monitoring summary

echo "=== Circle Layer Node Daily Check ===" 
echo "Date: $(date)"
echo "Node Status: $(systemctl is-active circlelayer.service)"
echo "Disk Usage: $(df -h /data/circlelayer | tail -1 | awk '{print $5}')"
echo "Memory Usage: $(free -h | grep Mem | awk '{print $3 "/" $2}')"
echo "Log Size: $(du -sh /data/circlelayer/logs)"
echo "Last Block: $(( $(curl -s -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"eth_blockNumber","params":[],"id":1}' http://localhost:8545 | jq -r '.result') ))"
echo "Peer Count: $(( $(curl -s -X POST -H "Content-Type: application/json" --data '{"jsonrpc":"2.0","method":"net_peerCount","params":[],"id":1}' http://localhost:8545 | jq -r '.result') ))"
echo "====================================="

Weekly Maintenance

Review log files for unusual patterns
Check system resource trends
Verify backup procedures
Update monitoring configurations
Test alert systems

2. Performance Optimization

Resource Tuning

# Optimize system for Circle Layer
echo 'vm.swappiness=10' | sudo tee -a /etc/sysctl.conf
echo 'fs.file-max=2097152' | sudo tee -a /etc/sysctl.conf
echo 'net.core.rmem_max=134217728' | sudo tee -a /etc/sysctl.conf
echo 'net.core.wmem_max=134217728' | sudo tee -a /etc/sysctl.conf

# Apply settings
sudo sysctl -p

Geth Performance Tuning

Monitor and adjust cache settings
Optimize database configuration
Tune network buffer sizes
Monitor and manage log levels

3. Backup and Recovery

Configuration Backup

# Backup critical files
tar -czf circlelayer-backup-$(date +%Y%m%d).tar.gz \
    /data/circlelayer/config.toml \
    /data/circlelayer/run.sh \
    /etc/systemd/system/circlelayer.service \
    /etc/logrotate.d/circlelayer
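
A backup is only useful if the archive can be read back. The sketch below checks that an archive exists, is non-empty, and can be listed by tar; verify_backup is an illustrative name and does not check that every expected file is present.

```shell
# Succeed only if the archive exists, is non-empty, and tar can list it.
# Argument: path to a .tar.gz archive.
verify_backup() {
    [ -s "$1" ] || return 1
    tar -tzf "$1" > /dev/null 2>&1
}
```

For example, run verify_backup on the archive created above right after the tar command, and treat a non-zero exit status as a failed backup.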

Data Recovery Procedures

Document chaindata restoration process
Test backup and recovery procedures
Maintain emergency contact information
Document rollback procedures

Troubleshooting

Common Issues and Solutions

Node Won't Start

# Check service logs
sudo journalctl -u circlelayer.service -f

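# Check configuration (dumpconfig parses the file and prints the effective settings;
# upstream Geth does not provide a --check-config flag)
geth --datadir /data/circlelayer/data --config /data/circlelayer/config.toml dumpconfig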

# Check permissions
ls -la /data/circlelayer/

Sync Issues

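# Force resync (use with caution: this deletes the local chain database,
# and the node must download the chain again from its peers)
sudo systemctl stop circlelayer.service
sudo rm -rf /data/circlelayer/data/geth/chaindata
sudo systemctl start circlelayer.service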

Performance Issues

# Check disk performance (replace /dev/sda with the device that backs /data/circlelayer)
sudo hdparm -Tt /dev/sda

# Monitor I/O
sudo iotop -aqqqd 1

# Check network issues
ping -c 4 8.8.8.8

For additional support, refer to:
