High-performance file downloader for archive.org
Made with 💝 by 🤖
Simply pass the URL of an archive.org details page you want to download and ia-get will automatically fetch the JSON metadata and download all files with blazing speed.
ia-get https://archive.org/details/<identifier>
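To make the "fetch the JSON metadata" step concrete, here is a minimal sketch of how a details URL can be mapped to the Internet Archive Metadata API URL (`https://archive.org/metadata/<identifier>`). The function names are hypothetical and are not taken from ia-get's source; only the URL mapping is illustrated, with no network access.

```rust
/// Extract the item identifier from an archive.org details URL.
/// Returns `None` when the URL is not a details page.
/// (Hypothetical helper, not ia-get's actual API.)
fn identifier_from_details_url(url: &str) -> Option<&str> {
    let rest = url
        .strip_prefix("https://archive.org/details/")
        .or_else(|| url.strip_prefix("http://archive.org/details/"))?;
    // Keep only the first path segment; drop any sub-path, query, or fragment.
    let id = rest
        .split(|c: char| c == '/' || c == '?' || c == '#')
        .next()?;
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Build the Metadata API URL for an identifier.
fn metadata_url(identifier: &str) -> String {
    format!("https://archive.org/metadata/{}", identifier)
}
```

Calling `identifier_from_details_url` on one of the test URLs used during development and passing the result to `metadata_url` gives the address whose JSON response lists every file in the item.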
# Concurrent downloads with compression
ia-get --compress --decompress https://archive.org/details/your_archive
# Filter by file types
ia-get --include-ext pdf,epub https://archive.org/details/books_archive
# Limit file sizes
ia-get --max-file-size 100MB https://archive.org/details/data_archive
# Specify output directory
ia-get --output ./downloads https://archive.org/details/software_archive
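The extension and size filters above can be pictured roughly as follows. This is a sketch under stated assumptions, not ia-get's implementation: the helper names are hypothetical, and the size parser assumes decimal units (1 MB = 1,000,000 bytes), which may differ from what `--max-file-size` actually does.

```rust
/// Parse a human-readable size such as "100MB" into bytes.
/// Assumption: decimal units; ia-get's own parser may use other rules.
fn parse_size(text: &str) -> Option<u64> {
    let text = text.trim();
    // Split at the first character that is not an ASCII digit.
    let split = text
        .find(|c: char| !c.is_ascii_digit())
        .unwrap_or(text.len());
    let (digits, unit) = text.split_at(split);
    let value: u64 = digits.parse().ok()?;
    let multiplier: u64 = match unit.trim().to_ascii_uppercase().as_str() {
        "" | "B" => 1,
        "KB" => 1_000,
        "MB" => 1_000_000,
        "GB" => 1_000_000_000,
        _ => return None,
    };
    value.checked_mul(multiplier)
}

/// Decide whether a file passes the extension and size filters.
/// An empty `include_ext` list means "accept every extension".
fn keep_file(name: &str, size: u64, include_ext: &[&str], max_size: Option<u64>) -> bool {
    let ext = name.rsplit_once('.').map(|(_, e)| e.to_ascii_lowercase());
    let ext_ok = include_ext.is_empty()
        || ext.map_or(false, |e| {
            include_ext.iter().any(|want| want.eq_ignore_ascii_case(&e))
        });
    let size_ok = max_size.map_or(true, |limit| size <= limit);
    ext_ok && size_ok
}
```

In this sketch the extension comparison is case-insensitive, so `scan.PDF` matches a `pdf` filter; whether ia-get behaves the same way is not stated here.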
I wanted to download high-quality scans of ZZap!64 magazine and some read-only memory from archive.org. Archives of this type often include many large files. Torrents are not always provided, and when they are available they do not index all the files in the archive.
Archive.org provides a comprehensive JSON API for every collection, which indexes every available file.
So I co-authored ia-get to automate the download process with maximum efficiency and reliability.
- 🔽 Fast Concurrent Downloads: Parallel downloading with configurable concurrency limits
- 🌳 Directory Structure: Preserves the original archive directory structure
- 🔄 Smart Resume: Automatically resumes partial or failed downloads
- 🔏 Integrity Verification: MD5 hash checks to confirm file integrity
- 🌱 Incremental Updates: Can be run multiple times to update existing downloads
- 📊 Complete Metadata: Fetches all metadata for the archive using JSON API
- 🗜️ Compression Support: HTTP compression + automatic decompression of archives
- 🎯 Smart Filtering: Filter by file extension, size, or custom patterns
- 📈 Progress Tracking: Real-time progress bars and download statistics
- Session Management: Persistent download sessions for large archives
- 🛡️ Error Recovery: Robust retry logic for network failures
- 📦 Cross-Platform: Available for Linux 🐧, macOS 🍏, and Windows 🪟
- ⚡ JSON-First: Uses Internet Archive's modern JSON API (no legacy XML)
- 🧹 Clean Codebase: Comprehensive refactoring with extensive documentation
- 🧪 Well-Tested: Full test suite with 27+ unit tests
- 📚 Rich Documentation: In-depth API documentation and examples
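One common way to implement the resume behaviour listed above is to compare the size of the local file with the size reported in the metadata and, for a partial file, request only the missing bytes with an HTTP `Range` header. The sketch below shows that decision only. It is an assumption about how resuming could work, not a description of ia-get's internals, and the type and function names are hypothetical.

```rust
/// What a resuming downloader could do for a single file.
/// (Hypothetical type, used only for illustration.)
#[derive(Debug, PartialEq)]
enum ResumePlan {
    /// Nothing usable on disk: fetch the whole file.
    Full,
    /// Partial file on disk: request the rest with this `Range` header value.
    Range(String),
    /// Sizes match: skip the download and verify the hash instead.
    Complete,
}

/// Choose a plan from the local file size and the size listed in the metadata.
fn plan_resume(local_bytes: u64, expected_bytes: u64) -> ResumePlan {
    if local_bytes == 0 {
        ResumePlan::Full
    } else if local_bytes < expected_bytes {
        // Open-ended byte range: everything from `local_bytes` to the end.
        ResumePlan::Range(format!("bytes={}-", local_bytes))
    } else if local_bytes == expected_bytes {
        ResumePlan::Complete
    } else {
        // Local file is larger than expected, so treat it as suspect and start over.
        ResumePlan::Full
    }
}
```

A file that reaches `Complete` would still need the MD5 check mentioned in the feature list, since matching sizes alone do not prove the content is intact.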
ia-get includes comprehensive compression features powered by modern JSON APIs:
- HTTP Compression: Automatically reduces bandwidth usage during downloads
- Auto-Detection: Detects compressed files from Internet Archive metadata
- Multiple Formats: Support for gzip, bzip2, xz, tar, and combined formats (tar.gz, tar.bz2, tar.xz)
- Transparent Decompression: Automatically decompresses files after download
- Configurable: Choose which formats to decompress automatically
# Enable compression and auto-decompression
ia-get --compress --decompress https://archive.org/details/your_archive
# Decompress specific formats only
ia-get --decompress --decompress-formats gzip,bzip2 https://archive.org/details/your_archive
See docs/COMPRESSION.md for detailed compression documentation.
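As a rough illustration of format detection for the formats listed above, the sketch below classifies a file by its name suffix. This is an assumption made for the example: ia-get is described as detecting compressed files from Internet Archive metadata, and the enum and function names here are hypothetical.

```rust
/// Compression and archive formats named in the feature list.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Format {
    Gzip,
    Bzip2,
    Xz,
    Tar,
    TarGz,
    TarBz2,
    TarXz,
}

/// Guess the format from the file name, or `None` if it is not recognised.
fn detect_format(name: &str) -> Option<Format> {
    let lower = name.to_ascii_lowercase();
    // Combined suffixes come first so "a.tar.gz" is not reported as plain gzip.
    let table = [
        (".tar.gz", Format::TarGz),
        (".tar.bz2", Format::TarBz2),
        (".tar.xz", Format::TarXz),
        (".tar", Format::Tar),
        (".gz", Format::Gzip),
        (".bz2", Format::Bzip2),
        (".xz", Format::Xz),
    ];
    table
        .iter()
        .find(|(suffix, _)| lower.ends_with(*suffix))
        .map(|(_, format)| *format)
}
```

The ordering of the table matters: checking `.gz` before `.tar.gz` would unpack only the outer gzip layer and leave a tar file behind.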
The codebase has been completely refactored for performance and maintainability:
- JSON-First Design: Uses Internet Archive's modern JSON API exclusively
- Concurrent Engine: Enhanced parallel downloader with session tracking
- Session Management: Persistent download state for large archives
- Error Recovery: Comprehensive retry logic for network failures
- Clean Abstractions: Well-documented modules with clear responsibilities
- metadata: JSON metadata fetching with retry logic
- enhanced_downloader: Main download engine with session support
- concurrent_simple: Enhanced concurrent downloader
- compression: Automatic decompression utilities
- filters: File filtering and formatting utilities
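Retry logic of the kind mentioned above is often built on exponential backoff. The following is a minimal sketch of that technique, assuming a 500 ms base delay and a 30 s cap; those values, and the function names, are illustrative and not taken from ia-get. The sleep function is passed in so the loop can be exercised without actually waiting.

```rust
use std::time::Duration;

/// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at `cap`.
fn backoff_delay(attempt: u32, base: Duration, cap: Duration) -> Duration {
    let factor = 2u32.saturating_pow(attempt);
    // On overflow, fall back to the cap.
    base.checked_mul(factor).map_or(cap, |d| d.min(cap))
}

/// Call `op` until it succeeds or `max_attempts` calls have been made.
/// `op` receives the 0-based attempt number; `sleep` is called between attempts.
fn retry<T, E>(
    max_attempts: u32,
    mut op: impl FnMut(u32) -> Result<T, E>,
    mut sleep: impl FnMut(Duration),
) -> Result<T, E> {
    let mut attempt = 0;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error.
            Err(e) if attempt + 1 >= max_attempts => return Err(e),
            Err(_) => {
                sleep(backoff_delay(
                    attempt,
                    Duration::from_millis(500),
                    Duration::from_secs(30),
                ));
                attempt += 1;
            }
        }
    }
}
```

In real use the `sleep` argument would be `std::thread::sleep` (or an async timer), and `op` would wrap the metadata request or file download.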
You can use ia-get to download files from archive.org, including all the metadata and the .torrent file, if there is one.
You can then start seeding the torrent using a pristine copy of the archive and a complete file set.
ia-get is built with modern Rust practices and comprehensive testing.
# Build the project
cargo build
# Build optimized release
cargo build --release
# Run a development (debug) build against an archive
cargo run -- https://archive.org/details/your_archive
The codebase maintains high quality standards:
# Run all tests (27+ unit tests)
cargo test
# Check code formatting
cargo fmt --check
# Run linting
cargo clippy
# Check compilation
cargo check
The codebase has undergone major improvements:
- ✅ Removed Legacy XML Support: Migrated entirely to JSON API
- ✅ Enhanced Concurrent Downloader: Added session tracking and progress reporting
- ✅ Comprehensive Documentation: Added extensive inline documentation and examples
- ✅ Code Cleanup: Removed orphaned files and improved error handling
- ✅ Test Coverage: Updated tests to work with new JSON-only architecture
I used these commands to test ia-get during development.
ia-get https://archive.org/details/deftributetozzap64
ia-get https://archive.org/details/zzapp_64_issue_001_600dpi
This program is an ongoing experiment 🧪 in AI-assisted development. The project has evolved through multiple phases:
Phase 1 (Late 2023): Initially co-authored using Chatty Jeeps (ChatGPT-4). I had no Rust experience and wanted to see if AI could help me build a program in an unfamiliar language.
Phase 2 (Oct-Dec 2023): Used Unfold.ai to add features and improve the code. All AI-assisted commits from this period have full details in the commit messages.
Phase 3 (Jan 2024): Community input from Linux Matters listener Daniel Dewberry who submitted a comprehensive peer review.
Phase 4 (May 2025): Major refactoring and modernization using GitHub Copilot with Claude Sonnet 3.5, including:
- Complete migration from XML to JSON APIs
- Enhanced concurrent downloading with session management
- Comprehensive code cleanup and documentation
- Modern Rust practices and error handling
As featured on the Linux Matters podcast! 🎙️ The initial version was discussed in Episode 16 - Blogging to the Fediverse, covering the AI development process, successes, and challenges.
Through this journey, I've learned Rust fundamentals and modern development practices, with each phase building on the previous work to create a robust, well-documented tool.