Disk Patrol

A high-performance, multi-threaded disk health monitoring tool for Linux systems. Disk Patrol continuously reads from block devices to detect and report I/O errors, helping identify failing sectors before they cause data loss.

Features

Continuous Monitoring: Patrols entire disk surfaces over configurable time periods
Multi-Device Support: Monitor multiple disks concurrently with independent patrol cycles
Smart Scheduling: Adaptive sleep intervals with optional jitter to spread I/O load
Error Detection: Tracks and reports read errors with sector-level precision
Email Alerts: Configurable SMTP notifications when error thresholds are exceeded
Progress Tracking: Real-time progress bars showing patrol status and error counts
State Persistence: Resumes patrol from last position after restarts
O_DIRECT I/O: Bypasses page cache for accurate hardware error detection
Flexible Logging: Support for file logging and syslog integration

Installation

From Source

# Install Rust if not already installed
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# Clone and build
git clone https://github.com/permezel/disk-patrol.git
cd disk-patrol
cargo build --release

# Install binary
sudo cp target/release/disk_patrol /usr/local/bin/

# Create directories
sudo mkdir -p /etc/disk_patrol /var/lib/disk_patrol /var/log

Dependencies

Rust 1.70 or later
Linux kernel with O_DIRECT support
Block device access (typically requires root)

Quick Start

Generate a configuration file:

disk_patrol --generate-config /etc/disk_patrol/config.toml

Edit the configuration to specify your devices:

sudo vi /etc/disk_patrol/config.toml

Start patrolling:

sudo disk_patrol --config /etc/disk_patrol/config.toml

Configuration

Configuration File (TOML)

# Devices to patrol
devices = ["/dev/sda", "/dev/nvme0n1"]

# Complete patrol cycle duration (days)
patrol_period_days = 30

# Read size per operation (supports KB/MB suffixes)
read_size = "8MB"

# State persistence
state_file = "/var/lib/disk_patrol/state.json"

# Logging
log_file = "/var/log/disk_patrol.log"
verbose = false
use_syslog = true
syslog_facility = "daemon"

# Email alerts
email_alerts = true
smtp_server = "smtp.example.com"
smtp_port = 587
smtp_use_starttls = true
smtp_username = "disk_patrol@example.com"
smtp_password = "your_password_here"
alert_from = "disk_patrol@example.com"
alert_to = ["admin@example.com", "ops@example.com"]
error_threshold = 5

# Performance tuning
show_progress = true
enable_jitter = true
max_jitter_percent = 25

Command Line Options

disk_patrol [OPTIONS] [DEVICES]...

OPTIONS:
  -c, --config <FILE>               Configuration file path (TOML format) [default: /etc/disk_patrol/config.toml]
      --generate-config [<FILE>]    Generate example configuration file (defaults to /etc/disk_patrol/config.toml)
      --merge-config                Merge configuration file with command line and exit
      --reset                       Reset device state
      --reset-all                   Reset state for all devices
  -p, --period <DAYS>               Patrol period in days [default: 30]
  -r, --read-size <BYTES>           Read size per operation [default: 8MB]
      --seek <PERCENT>              Percentage to offset starting position [default: 0]
  -s, --state-file <FILE>           State file path [default: /var/lib/disk_patrol/state.json]
  -v, --verbose                     Verbose output
      --no-verbose                  Disable verbose output
  -d, --debug                       Debug output
      --no-debug                    Disable debug output
      --status                      Show status and exit
      --error-threshold <COUNT>     Number of errors before sending alert [default: 5]
      --progress                    Show progress bars for each device
      --no-progress                 Disable progress bars
      --min-sleep <SECONDS>         Minimum sleep interval between reads [default: 5]
      --max-sleep <SECONDS>         Maximum sleep interval between reads [default: 300]
      --enable-jitter               Enable random timing jitter to spread I/O operations
      --max-jitter <PERCENT>        Maximum jitter as percentage of sleep interval (0-100) [default: 25]
  -h, --help                        Print help
  -V, --version                     Print version

LOGGING:
      --syslog                      Enable syslog logging
      --no-syslog                   Disable syslog logging
      --syslog-facility <FACILITY>  Syslog facility (daemon, user, local0-local7) [default: daemon]

EMAIL ALERTS:
      --email-alerts                Enable email alerts
      --no-email-alerts             Disable email alerts
      --smtp-server <SERVER>        SMTP server hostname
      --smtp-port <PORT>            SMTP server port [default: 587]
      --smtp-starttls               Use STARTTLS for SMTP connection
      --smtp-username <USERNAME>    SMTP username
      --smtp-password <PASSWORD>    SMTP password
      --alert-from <EMAIL>          Alert sender email address
      --alert-to <EMAIL>            Alert recipient email address
      --test-email                  Test email

Usage Examples

Basic Usage

# Patrol a single disk
sudo disk_patrol /dev/sda

# Patrol multiple disks
sudo disk_patrol /dev/sda /dev/sdb /dev/nvme0n1

# Use configuration file
sudo disk_patrol --config /etc/disk_patrol/config.toml

Status and Management

# Check patrol status
sudo disk_patrol --config /etc/disk_patrol/config.toml --status

# Reset patrol state for specific device
sudo disk_patrol --config /etc/disk_patrol/config.toml --reset /dev/sda

# Reset all device states
sudo disk_patrol --config /etc/disk_patrol/config.toml --reset-all

Testing and Debugging

# Test email configuration
sudo disk_patrol --config /etc/disk_patrol/config.toml --test-email

# Run with verbose output
sudo disk_patrol --config /etc/disk_patrol/config.toml --verbose --progress

How It Works

Patrol Algorithm

Initialization: Reads device sizes and loads previous state
Scheduling: Calculates optimal read intervals based on device size and patrol period
Reading: Performs O_DIRECT reads at calculated positions
Error Tracking: Records any I/O errors with timestamp and sector information
Progress: Updates position and saves state periodically
Alerting: Sends email notifications when error thresholds are exceeded

Timing and Jitter

To prevent synchronized I/O spikes when monitoring multiple devices, Disk Patrol supports timing jitter:

Base Interval: patrol_period / (device_size / read_size)
Jitter: Random variation up to max_jitter_percent of base interval
Distribution: Jitter is randomly split between pre-read and post-read delays

Memory Efficiency

Disk Patrol uses a single shared buffer for all devices, minimizing memory usage while maintaining high performance. The buffer is aligned for O_DIRECT operations and sized according to the configured read size.

Systemd Service

Create /etc/systemd/system/disk-patrol.service:

[Unit]
Description=Disk Patrol - Continuous disk health monitoring
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/disk_patrol --config /etc/disk_patrol/config.toml
Restart=on-failure
RestartSec=30
User=root
StandardOutput=journal
StandardError=journal

[Install]
WantedBy=multi-user.target

Enable and start:

sudo systemctl enable disk-patrol
sudo systemctl start disk-patrol

Monitoring Integration

Prometheus Metrics (Future)

Disk Patrol can expose metrics for Prometheus monitoring:

disk_patrol_errors_total: Total errors per device
disk_patrol_progress_percent: Patrol progress percentage
disk_patrol_last_error_time: Timestamp of last error

Log Analysis

Disk Patrol logs can be parsed for monitoring:

# Count errors per device
grep "I/O Error" /var/log/disk_patrol.log | awk '{print $4}' | sort | uniq -c

# Recent errors
journalctl -u disk-patrol --since "1 hour ago" | grep ERROR

Performance Considerations

Read Size: Larger reads (8-16MB) are more efficient but may impact system responsiveness
Patrol Period: Longer periods reduce I/O load but increase time to detect errors
Jitter: Helps prevent I/O storms when multiple devices complete reads simultaneously
O_DIRECT: Bypasses cache but requires aligned buffers and may not work on all filesystems

Troubleshooting

Common Issues

Permission Denied: Run with sudo or as root
Device Not Found: Check device path exists and is a block device
Email Not Sending: Verify SMTP settings and test with --test-email
High I/O Impact: Increase patrol period or reduce read size

Debug Commands

# Check device accessibility
ls -la /dev/sda
lsblk

# Verify state file
cat /var/lib/disk_patrol/state.json | jq .

# Test specific device
sudo disk_patrol /dev/sda --verbose --period 1

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Inspired by the need for proactive disk failure detection
Built with Rust for performance and reliability
Uses tokio for async I/O and concurrency
Developed in conjunction with Claude 4.0

Roadmap

SMART data integration
Prometheus metrics exporter

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
.cargo		.cargo
src		src
.gitignore		.gitignore
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
build-x86.sh		build-x86.sh
build.sh		build.sh
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Disk Patrol

Features

Installation

From Source

Dependencies

Quick Start

Configuration

Configuration File (TOML)

Command Line Options

Usage Examples

Basic Usage

Status and Management

Testing and Debugging

How It Works

Patrol Algorithm

Timing and Jitter

Memory Efficiency

Systemd Service

Monitoring Integration

Prometheus Metrics (Future)

Log Analysis

Performance Considerations

Troubleshooting

Common Issues

Debug Commands

License

Acknowledgments

Roadmap

About

Uh oh!

Releases

Packages

Languages

License

permezel/disk_patrol

Folders and files

Latest commit

History

Repository files navigation

Disk Patrol

Features

Installation

From Source

Dependencies

Quick Start

Configuration

Configuration File (TOML)

Command Line Options

Usage Examples

Basic Usage

Status and Management

Testing and Debugging

How It Works

Patrol Algorithm

Timing and Jitter

Memory Efficiency

Systemd Service

Monitoring Integration

Prometheus Metrics (Future)

Log Analysis

Performance Considerations

Troubleshooting

Common Issues

Debug Commands

License

Acknowledgments

Roadmap

About

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages