A high-performance, multi-threaded disk health monitoring tool for Linux systems. Disk Patrol continuously reads from block devices to detect and report I/O errors, helping identify failing sectors before they cause data loss.
- Continuous Monitoring: Patrols entire disk surfaces over configurable time periods
- Multi-Device Support: Monitor multiple disks concurrently with independent patrol cycles
- Smart Scheduling: Adaptive sleep intervals with optional jitter to spread I/O load
- Error Detection: Tracks and reports read errors with sector-level precision
- Email Alerts: Configurable SMTP notifications when error thresholds are exceeded
- Progress Tracking: Real-time progress bars showing patrol status and error counts
- State Persistence: Resumes patrol from last position after restarts
- O_DIRECT I/O: Bypasses page cache for accurate hardware error detection
- Flexible Logging: Support for file logging and syslog integration
# Install Rust if not already installed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
# Clone and build
git clone https://github.com/permezel/disk-patrol.git
cd disk-patrol
cargo build --release
# Install binary
sudo cp target/release/disk_patrol /usr/local/bin/
# Create directories
sudo mkdir -p /etc/disk_patrol /var/lib/disk_patrol /var/log
- Rust 1.70 or later
- Linux kernel with O_DIRECT support
- Block device access (typically requires root)
- Generate a configuration file:
sudo disk_patrol --generate-config /etc/disk_patrol/config.toml
- Edit the configuration to specify your devices:
sudo vi /etc/disk_patrol/config.toml
- Start patrolling:
sudo disk_patrol --config /etc/disk_patrol/config.toml
# Devices to patrol
devices = ["/dev/sda", "/dev/nvme0n1"]
# Complete patrol cycle duration (days)
patrol_period_days = 30
# Read size per operation (supports KB/MB suffixes)
read_size = "8MB"
# State persistence
state_file = "/var/lib/disk_patrol/state.json"
# Logging
log_file = "/var/log/disk_patrol.log"
verbose = false
use_syslog = true
syslog_facility = "daemon"
# Email alerts
email_alerts = true
smtp_server = "smtp.example.com"
smtp_port = 587
smtp_use_starttls = true
smtp_username = "disk_patrol@example.com"
smtp_password = "your_password_here"
alert_from = "disk_patrol@example.com"
alert_to = ["admin@example.com", "ops@example.com"]
error_threshold = 5
# Performance tuning
show_progress = true
enable_jitter = true
max_jitter_percent = 25
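The read_size setting accepts strings such as "8MB". A minimal sketch of how such a value could be turned into a byte count is shown here; parse_read_size is a hypothetical helper, not a function taken from Disk Patrol, and it assumes that KB and MB are binary multiples (1024 and 1024 * 1024 bytes), which the configuration comments do not specify.

```rust
/// Parse a size string such as "8MB", "512KB", or "4096" into bytes.
/// Assumption: KB and MB are binary multiples (1024 and 1024 * 1024);
/// Disk Patrol's actual parser may use different rules.
fn parse_read_size(input: &str) -> Result<u64, String> {
    let s = input.trim().to_ascii_uppercase();
    // Split the string into its numeric part and a byte multiplier.
    let (digits, multiplier) = if let Some(n) = s.strip_suffix("KB") {
        (n, 1024u64)
    } else if let Some(n) = s.strip_suffix("MB") {
        (n, 1024 * 1024)
    } else {
        (s.as_str(), 1)
    };
    let value: u64 = digits
        .trim()
        .parse()
        .map_err(|e| format!("invalid size {input:?}: {e}"))?;
    // Reject values that would not fit in a u64 after scaling.
    value
        .checked_mul(multiplier)
        .ok_or_else(|| format!("size {input:?} overflows u64"))
}
```

Under these assumptions, "8MB" yields 8388608 bytes, and a bare number is taken as a byte count.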
disk_patrol [OPTIONS] [DEVICES]...
OPTIONS:
-c, --config <FILE> Configuration file path (TOML format) [default: /etc/disk_patrol/config.toml]
--generate-config [<FILE>] Generate example configuration file (defaults to /etc/disk_patrol/config.toml)
--merge-config Merge configuration file with command line and exit
--reset Reset device state
--reset-all Reset state for all devices
-p, --period <DAYS> Patrol period in days [default: 30]
-r, --read-size <BYTES> Read size per operation [default: 8MB]
--seek <PERCENT> Percentage to offset starting position [default: 0]
-s, --state-file <FILE> State file path [default: /var/lib/disk_patrol/state.json]
-v, --verbose Verbose output
--no-verbose Disable verbose output
-d, --debug Debug output
--no-debug Disable debug output
--status Show status and exit
--error-threshold <COUNT> Number of errors before sending alert [default: 5]
--progress Show progress bars for each device
--no-progress Disable progress bars
--min-sleep <SECONDS> Minimum sleep interval between reads [default: 5]
--max-sleep <SECONDS> Maximum sleep interval between reads [default: 300]
--enable-jitter Enable random timing jitter to spread I/O operations
--max-jitter <PERCENT> Maximum jitter as percentage of sleep interval (0-100) [default: 25]
-h, --help Print help
-V, --version Print version
LOGGING:
--syslog Enable syslog logging
--no-syslog Disable syslog logging
--syslog-facility <FACILITY> Syslog facility (daemon, user, local0-local7) [default: daemon]
EMAIL ALERTS:
--email-alerts Enable email alerts
--no-email-alerts Disable email alerts
--smtp-server <SERVER> SMTP server hostname
--smtp-port <PORT> SMTP server port [default: 587]
--smtp-starttls Use STARTTLS for SMTP connection
--smtp-username <USERNAME> SMTP username
--smtp-password <PASSWORD> SMTP password
--alert-from <EMAIL> Alert sender email address
--alert-to <EMAIL> Alert recipient email address
--test-email Test email
# Patrol a single disk
sudo disk_patrol /dev/sda
# Patrol multiple disks
sudo disk_patrol /dev/sda /dev/sdb /dev/nvme0n1
# Use configuration file
sudo disk_patrol --config /etc/disk_patrol/config.toml
# Check patrol status
sudo disk_patrol --config /etc/disk_patrol/config.toml --status
# Reset patrol state for specific device
sudo disk_patrol --config /etc/disk_patrol/config.toml --reset /dev/sda
# Reset all device states
sudo disk_patrol --config /etc/disk_patrol/config.toml --reset-all
# Test email configuration
sudo disk_patrol --config /etc/disk_patrol/config.toml --test-email
# Run with verbose output
sudo disk_patrol --config /etc/disk_patrol/config.toml --verbose --progress
- Initialization: Reads device sizes and loads previous state
- Scheduling: Calculates optimal read intervals based on device size and patrol period
- Reading: Performs O_DIRECT reads at calculated positions
- Error Tracking: Records any I/O errors with timestamp and sector information
- Progress: Updates position and saves state periodically
- Alerting: Sends email notifications when error thresholds are exceeded
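The steps above can be sketched as a small per-device cursor. This is an illustration only: PatrolCursor and its fields are hypothetical names, the real state layout in state.json is not documented here, and the sketch assumes that an alert fires once the error count reaches the threshold.

```rust
/// Hypothetical per-device patrol state; names are illustrative only.
/// Invariant: device_size > 0 and position < device_size.
#[derive(Debug, Clone, PartialEq)]
struct PatrolCursor {
    device_size: u64,
    read_size: u64,
    position: u64,
    error_count: u32,
}

impl PatrolCursor {
    fn new(device_size: u64, read_size: u64) -> Self {
        PatrolCursor { device_size, read_size, position: 0, error_count: 0 }
    }

    /// Length of the next read, shortened at the end of the device.
    fn next_len(&self) -> u64 {
        self.read_size.min(self.device_size - self.position)
    }

    /// Record the outcome of one read and move to the next position,
    /// wrapping to the start once the whole device has been covered.
    fn advance(&mut self, read_ok: bool) {
        if !read_ok {
            self.error_count += 1;
        }
        self.position += self.next_len();
        if self.position >= self.device_size {
            self.position = 0;
        }
    }

    /// Share of the current patrol cycle already covered.
    fn progress_percent(&self) -> f64 {
        100.0 * self.position as f64 / self.device_size as f64
    }

    /// Assumption: reaching the threshold is enough to trigger an alert.
    fn should_alert(&self, error_threshold: u32) -> bool {
        self.error_count >= error_threshold
    }
}
```

Persisting position and error_count after each advance is what would let a restarted process resume where it left off.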
To prevent synchronized I/O spikes when monitoring multiple devices, Disk Patrol supports timing jitter:
- Base Interval: patrol_period / (device_size / read_size)
- Jitter: Random variation up to max_jitter_percent of the base interval
- Distribution: Jitter is randomly split between pre-read and post-read delays
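The arithmetic behind the base interval and the jitter split might look like the sketch below. The function names are hypothetical, and two things are assumptions rather than documented behavior: that the base interval is clamped to the --min-sleep and --max-sleep range, and that the two random draws are uniform values in [0, 1) supplied by the caller.

```rust
const SECS_PER_DAY: f64 = 86_400.0;

/// Base interval = patrol_period / (device_size / read_size), in seconds.
/// Assumption: the result is clamped to [min_sleep, max_sleep].
fn base_interval_secs(
    period_days: f64,
    device_size: u64,
    read_size: u64,
    min_sleep: f64,
    max_sleep: f64,
) -> f64 {
    // Number of reads needed to cover the device once (at least one).
    let reads_per_cycle = (device_size as f64 / read_size as f64).ceil().max(1.0);
    (period_days * SECS_PER_DAY / reads_per_cycle).clamp(min_sleep, max_sleep)
}

/// Split a random jitter into (pre_read, post_read) delays in seconds.
/// `amount` and `split` are random draws in [0, 1) supplied by the caller:
/// `amount` scales the jitter up to max_jitter_percent of the base interval,
/// `split` decides how much of it is spent before the read.
fn jitter_delays(base: f64, max_jitter_percent: f64, amount: f64, split: f64) -> (f64, f64) {
    let jitter = base * (max_jitter_percent / 100.0) * amount;
    let pre = jitter * split;
    (pre, jitter - pre)
}
```

For example, a 1 TiB device patrolled over 30 days with 8 MiB reads needs 131072 reads per cycle, which gives a base interval of about 19.8 seconds. With max_jitter_percent = 25, the total jitter for that device stays below roughly 4.9 seconds.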
Disk Patrol uses a single shared buffer for all devices, minimizing memory usage while maintaining high performance. The buffer is aligned for O_DIRECT operations and sized according to the configured read size.
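One way to obtain a buffer suitable for O_DIRECT using only the Rust standard library is sketched here. AlignedBuf is a hypothetical type, not Disk Patrol's own, and the 4096-byte alignment is an assumption that covers common logical block sizes; a device may require a different value.

```rust
use std::alloc::{alloc_zeroed, dealloc, Layout};
use std::slice;

/// Assumed alignment for O_DIRECT reads; not taken from Disk Patrol.
const ALIGN: usize = 4096;

/// Heap buffer whose start address and length are multiples of ALIGN.
struct AlignedBuf {
    ptr: *mut u8,
    layout: Layout,
}

impl AlignedBuf {
    /// Allocate `size` zeroed bytes. Returns None unless `size` is a
    /// non-zero multiple of ALIGN and the allocation succeeds.
    fn new(size: usize) -> Option<Self> {
        if size == 0 || size % ALIGN != 0 {
            return None;
        }
        let layout = Layout::from_size_align(size, ALIGN).ok()?;
        // SAFETY: the layout has a non-zero size.
        let ptr = unsafe { alloc_zeroed(layout) };
        if ptr.is_null() {
            None
        } else {
            Some(AlignedBuf { ptr, layout })
        }
    }

    /// Mutable view of the whole buffer, e.g. as the target of a read.
    fn as_mut_slice(&mut self) -> &mut [u8] {
        // SAFETY: ptr is valid for layout.size() zero-initialized bytes
        // and is uniquely owned by this struct.
        unsafe { slice::from_raw_parts_mut(self.ptr, self.layout.size()) }
    }
}

impl Drop for AlignedBuf {
    fn drop(&mut self) {
        // SAFETY: ptr was allocated with exactly this layout.
        unsafe { dealloc(self.ptr, self.layout) }
    }
}
```

A buffer created with AlignedBuf::new(8 * 1024 * 1024) matches the default 8MB read size and can be reused for every read.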
Create /etc/systemd/system/disk-patrol.service:
[Unit]
Description=Disk Patrol - Continuous disk health monitoring
After=network.target
[Service]
Type=simple
ExecStart=/usr/local/bin/disk_patrol --config /etc/disk_patrol/config.toml
Restart=on-failure
RestartSec=30
User=root
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
Enable and start:
sudo systemctl enable disk-patrol
sudo systemctl start disk-patrol
Disk Patrol can expose metrics for Prometheus monitoring:
- disk_patrol_errors_total: Total errors per device
- disk_patrol_progress_percent: Patrol progress percentage
- disk_patrol_last_error_time: Timestamp of last error
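For illustration, the three metrics could be rendered in the Prometheus text exposition format roughly as follows. This is a sketch, not Disk Patrol's exporter: DeviceMetrics and render_metrics are hypothetical names, the device label is an assumption, and the timestamp is assumed to be Unix seconds.

```rust
/// Hypothetical snapshot of one device's counters.
struct DeviceMetrics {
    device: String,
    errors_total: u64,
    progress_percent: f64,
    /// Assumed to be Unix seconds; None if no error has been seen.
    last_error_time: Option<u64>,
}

/// Render metrics as `name{device="..."} value` lines.
/// Note: label values are not escaped here, which is fine for typical
/// device paths but not for values containing quotes or backslashes.
fn render_metrics(devices: &[DeviceMetrics]) -> String {
    let mut out = String::new();
    for d in devices {
        out.push_str(&format!(
            "disk_patrol_errors_total{{device=\"{}\"}} {}\n",
            d.device, d.errors_total
        ));
        out.push_str(&format!(
            "disk_patrol_progress_percent{{device=\"{}\"}} {:.2}\n",
            d.device, d.progress_percent
        ));
        // Omit the timestamp line entirely when there is no error yet.
        if let Some(ts) = d.last_error_time {
            out.push_str(&format!(
                "disk_patrol_last_error_time{{device=\"{}\"}} {}\n",
                d.device, ts
            ));
        }
    }
    out
}
```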
Disk Patrol logs can be parsed for monitoring:
# Count errors per device
grep "I/O Error" /var/log/disk_patrol.log | awk '{print $4}' | sort | uniq -c
# Recent errors
journalctl -u disk-patrol --since "1 hour ago" | grep ERROR
- Read Size: Larger reads (8-16MB) are more efficient but may impact system responsiveness
- Patrol Period: Longer periods reduce I/O load but increase time to detect errors
- Jitter: Helps prevent I/O storms when multiple devices complete reads simultaneously
- O_DIRECT: Bypasses cache but requires aligned buffers and may not work on all filesystems
- Permission Denied: Run with sudo or as root
- Device Not Found: Check device path exists and is a block device
- Email Not Sending: Verify SMTP settings and test with --test-email
- High I/O Impact: Increase patrol period or reduce read size
# Check device accessibility
ls -la /dev/sda
lsblk
# Verify state file
jq . /var/lib/disk_patrol/state.json
# Test specific device
sudo disk_patrol /dev/sda --verbose --period 1
This project is licensed under the MIT License - see the LICENSE file for details.
- Inspired by the need for proactive disk failure detection
- Built with Rust for performance and reliability
- Uses tokio for async I/O and concurrency
- Developed in conjunction with Claude 4.0
- SMART data integration
- Prometheus metrics exporter