A Bash script for managing multiple Nexus network nodes, each with its own node ID.
- Start, stop, and monitor multiple Nexus nodes
- View real-time logs and status
- Auto-restart nodes on failure
- Time-based success/error rate calculation (5-minute window)
- Process hang detection with automatic recovery
- Log rotation to prevent disk space issues
- Detailed monitoring with configurable parameters
- Automatic dependency check for Nexus Network CLI
- Optimized performance for large log files
graph TD
subgraph "Nexus Node Manager"
A[nexus_node_start.sh] --> B[Start Nodes]
A --> C[Stop Nodes]
A --> D[Check Status]
A --> E[View Logs]
A --> F[Monitor System]
end
subgraph "Nodes"
N1[Node 1]
N2[Node 2]
N3[Node 3]
end
subgraph "Monitoring System"
M1[Monitor Daemon]
M2[Success Rate Analysis]
M3[Auto-Restart]
M4[Hang Detection]
end
subgraph "Log Files"
L1[Node Logs]
L2[Monitor Log]
L3[Restart Log]
end
B --> N1
B --> N2
B --> N3
F --> M1
M1 --> M2
M1 --> M4
M2 --> M3
M4 --> M3
N1 --> L1
N2 --> L1
N3 --> L1
M1 --> L2
M3 --> L3
M3 -.Restarts.-> N1
M3 -.Restarts.-> N2
M3 -.Restarts.-> N3
- Clone this repository:
git clone https://github.com/anzai3458/nexus-scripts.git
- Copy the config template:
cp nexus_config.conf.template nexus_config.conf
- Edit the config file and update your node IDs:
nano nexus_config.conf
- Install Nexus Network CLI:
curl https://cli.nexus.xyz/ | sh
- Start your nodes:
./nexus_node_start.sh start
- Check status:
./nexus_node_start.sh status
- Start monitoring:
./nexus_node_start.sh monitor start
- Download the script:
git clone https://github.com/anzai3458/nexus-scripts.git
cd nexus-scripts
chmod +x nexus_node_start.sh
- Install the Nexus Network CLI:
curl https://cli.nexus.xyz/ | sh
After installation, restart or refresh your terminal:
# For Bash
source ~/.bashrc
# For Zsh
source ~/.zshrc
# Or simply open a new terminal window
Note: The script will automatically check if the CLI is installed and prompt you to install it if it's not.
- Create your configuration file from the template:
# Copy the template
cp nexus_config.conf.template nexus_config.conf
# Edit the configuration file and update your node IDs
nano nexus_config.conf
Make sure to update the NODE_IDS array with your actual node IDs:
# List of node IDs to manage - REPLACE THESE with your actual node IDs
NODE_IDS=(6515746 6515747 6515748)
- Make sure you have the nexus-network command in your PATH
Edit the nexus_config.conf file to customize your settings:
| Parameter | Description | Default |
|---|---|---|
| NODE_IDS | Array of node IDs to manage | (required) |
| MAX_LOG_SIZE_MB | Maximum log file size (MB) before rotation | 25 |
| MAX_LOG_FILES | Number of log backups to keep | 2 |
| MONITOR_ENABLED | Enable/disable monitoring | false |
| MONITOR_INTERVAL | Seconds between monitoring checks | 30 |
| SUCCESS_RATE_THRESHOLD | Minimum success rate percentage | 60 |
| MIN_LOG_ENTRIES | Minimum log entries for rate calculation | 20 |
| RESTART_COOLDOWN | Seconds to wait before restarting the same node | 300 |
| RATE_CALCULATION_MINUTES | Time window in minutes for rate calculation | 5 |
| INACTIVITY_THRESHOLD | Seconds of inactivity before a node is considered hanging | 300 |
| ENABLE_NOTIFICATIONS | Show notifications when the monitor takes action | true |
| LOG_RESTART_ACTIONS | Log all restart actions | true |
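As a rough illustration, a complete nexus_config.conf using the defaults from the table might look like the sketch below. It assumes the file uses Bash syntax (as the NODE_IDS array suggests) and that booleans are written as bare true/false; check nexus_config.conf.template for the actual layout and comments.

```shell
# nexus_config.conf (illustrative sketch, not the shipped template)
# Assumption: the file is sourced by Bash, so plain VAR=value lines apply.

# List of node IDs to manage - REPLACE THESE with your actual node IDs
NODE_IDS=(6515746 6515747 6515748)

# Log rotation
MAX_LOG_SIZE_MB=25
MAX_LOG_FILES=2

# Monitoring
MONITOR_ENABLED=false
MONITOR_INTERVAL=30
SUCCESS_RATE_THRESHOLD=60
MIN_LOG_ENTRIES=20
RESTART_COOLDOWN=300
RATE_CALCULATION_MINUTES=5
INACTIVITY_THRESHOLD=300

# Reporting
ENABLE_NOTIFICATIONS=true
LOG_RESTART_ACTIONS=true
```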
Here's a typical workflow for managing your Nexus nodes:
# Create required directories
mkdir -p logs run
# Start all nodes
./nexus_node_start.sh start
# Output will show the PIDs of the started nodes
# Check status to ensure nodes are running
./nexus_node_start.sh status
# Start monitoring (only running nodes will be monitored)
./nexus_node_start.sh monitor start
# Or start monitoring and auto-start all nodes even if not running
./nexus_node_start.sh monitor start --force
# Check monitor status to see which nodes are being monitored
./nexus_node_start.sh monitor status
# Check node status each morning
./nexus_node_start.sh status
# Check error/success rates
./nexus_node_start.sh rates
# Check if any nodes were restarted overnight
cat run/nexus_restart.log
# View recent monitor events
./nexus_node_start.sh monitor status
# To restart a specific node for maintenance
./nexus_node_start.sh restart 6515746
# To temporarily remove a node from monitoring
./nexus_node_start.sh monitor remove 6515746
# Perform maintenance, then re-add to monitoring
./nexus_node_start.sh monitor add 6515746
# To restart the entire cluster
./nexus_node_start.sh stop
./nexus_node_start.sh start
# Start all nodes
./nexus_node_start.sh start
# Start a specific node
./nexus_node_start.sh start 6515746
# Check status of all nodes
./nexus_node_start.sh status
# Check status of a specific node
./nexus_node_start.sh status 6515746
# Stop all nodes
./nexus_node_start.sh stop
# Stop a specific node
./nexus_node_start.sh stop 6515746
# View logs (tail -f) for a specific node
./nexus_node_start.sh log 6515746
# View error/success rates for all nodes
./nexus_node_start.sh rates
# Restart all nodes
./nexus_node_start.sh restart
# Start monitoring daemon
./nexus_node_start.sh monitor start
# Force start monitoring even if no nodes are running
# This will also auto-start all configured nodes
./nexus_node_start.sh monitor start --force
# Check monitoring status
./nexus_node_start.sh monitor status
# View monitoring logs
./nexus_node_start.sh monitor log
# Stop monitoring
./nexus_node_start.sh monitor stop
# Add a node to monitoring
./nexus_node_start.sh monitor add 6515746
# Remove a node from monitoring
./nexus_node_start.sh monitor remove 6515746
The monitoring system tracks node health and can automatically restart nodes if:
- A node process stops running
- A node's success rate falls below the configured threshold
- A node appears to be hanging (no log output for INACTIVITY_THRESHOLD seconds, 5 minutes by default)
- Only monitors nodes that are running when the monitor starts (or manually added)
- Calculates success/error rates from log entries within the last RATE_CALCULATION_MINUTES minutes (5 by default)
- Detects hanging processes by monitoring the timestamp of the last log entry
- Restarts nodes that fail, hang, or have low success rates
- Applies a cooldown period to prevent excessive restarts
- Maintains separate logs for monitor events and restart actions
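The cooldown rule in the list above can be sketched as a small Bash helper. This is an illustration, not the script's actual code: `can_restart` and its arguments are hypothetical names, and timestamps are passed in as epoch seconds so the check stays easy to test.

```shell
# Hypothetical helper: decide whether a node may be restarted again.
# Usage: can_restart LAST_RESTART_EPOCH NOW_EPOCH COOLDOWN_SECONDS
# Returns 0 (allowed) once at least COOLDOWN_SECONDS have passed since the last restart.
can_restart() {
    local last_restart="$1" now="$2" cooldown="$3"
    [ $(( now - last_restart )) -ge "$cooldown" ]
}
```

With RESTART_COOLDOWN=300, a node restarted at 12:00:00 would not be restarted again before 12:05:00, even if it fails repeatedly in between.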
The success/error rates are calculated based on:
- Log entries from the past 5 minutes only (time-based window)
- Entries with timestamps outside this window are ignored
- Each entry is categorized as Success, Error, or Refresh
- Success rate = (Success entries / Total entries) × 100%
- Optimized for large log files with efficient pre-filtering and in-memory timestamp parsing
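A minimal sketch of the windowed calculation described above, assuming a simplified log format that is NOT the real Nexus log format: each line starts with an epoch timestamp followed by a category word (Success, Error, or Refresh). The function name `success_rate` is hypothetical.

```shell
# Hypothetical sketch of a time-windowed success rate.
# Assumed line format (for illustration only): "<epoch_seconds> <Success|Error|Refresh> ..."
# Usage: success_rate NOW_EPOCH WINDOW_SECONDS < logfile
# Prints the integer success percentage, or "N/A" when no entries fall in the window.
success_rate() {
    local now="$1" window="$2"
    awk -v now="$now" -v window="$window" '
        # Keep only entries whose timestamp lies inside the window
        $1 >= now - window && $1 <= now {
            total++
            if ($2 == "Success") ok++
        }
        END {
            if (total == 0) print "N/A"
            else printf "%d\n", (ok * 100) / total
        }'
}
```

Entries older than the window are skipped entirely, and Refresh entries count toward the total but not toward successes, matching the formula above. A real implementation would also need to parse the actual timestamp format and honor MIN_LOG_ENTRIES before acting on the result.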
A node is considered "hanging" when:
- No new log entries have been generated within INACTIVITY_THRESHOLD seconds (5 minutes by default)
- The node process is still running but not producing output
- The monitor will automatically restart hanging nodes to recover them
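The hang condition above combines two checks: the process is alive, and its last log entry is too old. A hedged sketch follows; `is_hanging` is a hypothetical name, and the last-entry timestamp is assumed to be already extracted from the log as epoch seconds.

```shell
# Hypothetical sketch of hang detection.
# Usage: is_hanging PID LAST_ENTRY_EPOCH NOW_EPOCH THRESHOLD_SECONDS
# Returns 0 when the process is alive but silent for longer than the threshold.
is_hanging() {
    local pid="$1" last_entry="$2" now="$3" threshold="$4"
    # A process that is not running has crashed; that is a failure, not a hang
    kill -0 "$pid" 2>/dev/null || return 1
    [ $(( now - last_entry )) -gt "$threshold" ]
}
```

`kill -0` sends no signal; it only tests whether the PID exists and can be signaled, which makes it a cheap liveness probe.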
- Node logs:
logs/nexus_node_<node_id>.log
- Monitor log:
run/nexus_monitor.log
- Restart log:
run/nexus_restart.log
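Node logs are rotated according to MAX_LOG_SIZE_MB and MAX_LOG_FILES. The sketch below shows one common way to do size-based rotation; it is an assumption about the approach, not the script's actual code, and `rotate_log` and the numbered `.1`, `.2` suffixes are hypothetical.

```shell
# Hypothetical sketch of size-based log rotation.
# Usage: rotate_log FILE MAX_BYTES MAX_FILES
# Keeps up to MAX_FILES backups named FILE.1 (newest) through FILE.MAX_FILES (oldest).
rotate_log() {
    local file="$1" max_bytes="$2" max_files="$3" size i
    [ -f "$file" ] || return 0
    size=$(( $(wc -c < "$file") ))
    [ "$size" -gt "$max_bytes" ] || return 0
    # Shift existing backups up by one; the oldest is overwritten
    i=$max_files
    while [ "$i" -gt 1 ]; do
        if [ -f "$file.$(( i - 1 ))" ]; then
            mv -f "$file.$(( i - 1 ))" "$file.$i"
        fi
        i=$(( i - 1 ))
    done
    # Copy then truncate, so a process holding the file open keeps writing to the same path
    cp "$file" "$file.1"
    : > "$file"
}
```

With the defaults, a call such as `rotate_log logs/nexus_node_6515746.log $(( 25 * 1024 * 1024 )) 2` would rotate once the log exceeds 25 MB and keep two backups. Copy-then-truncate can lose lines written between the two steps; it is chosen here only because it does not require the node process to reopen its log.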
Check the node's log file for errors:
cat logs/nexus_node_<node_id>.log
- Check if the node is in the monitored nodes list:
./nexus_node_start.sh monitor status
- Check the monitor log for events:
./nexus_node_start.sh monitor log
- Verify monitor is running:
./nexus_node_start.sh monitor status
If the script reports a node is running but it's not:
# Stop the node to clean up the PID file
./nexus_node_start.sh stop <node_id>
# Then start it again
./nexus_node_start.sh start <node_id>
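A false "running" report usually means a PID file was left behind by a process that has since exited. To confirm this before restarting, a check along these lines can help. It is a sketch under assumptions: `pid_file_is_stale` is a hypothetical helper, and the PID file location is not documented here, so the path is passed in as an argument.

```shell
# Hypothetical helper: detect a stale PID file.
# Usage: pid_file_is_stale PID_FILE
# Returns 0 when the file exists but the PID inside it is not a running process.
pid_file_is_stale() {
    local pid_file="$1" pid
    [ -f "$pid_file" ] || return 1
    pid=$(cat "$pid_file")
    # kill -0 only probes for existence; negate it to report "stale"
    ! kill -0 "$pid" 2>/dev/null
}
```

If the check reports a stale file, the stop/start sequence above cleans it up and starts the node fresh.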
MIT License
The Nexus Network Node Manager is designed to simplify the operation and maintenance of multiple Nexus network nodes. With its automated monitoring and restart capabilities, you can ensure maximum uptime for your nodes while minimizing manual intervention.
Key benefits:
- Simplifies managing multiple nodes from a single interface
- Provides detailed error and success rate analytics
- Automatically recovers from failures and hanging processes
- Maintains comprehensive logs for troubleshooting
- Flexible configuration to meet your specific needs
- Optimized performance even with large log files
For questions, issues, or contributions, please open a GitHub issue or submit a pull request. Happy node running!