Skip to content

Gameaday/ia-get-cli

 
 

Repository files navigation

ia-get
ia-get

High-performance file downloader for archive.org

GitHub all releases

Made with 💝 by 🤖

Usage 📖

Simply pass the URL of an archive.org details page you want to download and ia-get will automatically fetch the JSON metadata and download all files with blazing speed.

ia-get https://archive.org/details/<identifier>

Advanced Usage 🚀

# Concurrent downloads with compression
ia-get --compress --decompress https://archive.org/details/your_archive

# Filter by file types
ia-get --include-ext pdf,epub https://archive.org/details/books_archive

# Limit file sizes  
ia-get --max-file-size 100MB https://archive.org/details/data_archive

# Specify output directory
ia-get --output ./downloads https://archive.org/details/software_archive

Why? 🤔💭

I wanted to download high-quality scans of ZZap!64 magazine and some read-only memory from archive.org. Archives of this type often include many large files, torrents are not always provided and when they are available they do not index all the available files in the archive.

Archive.org provides comprehensive JSON APIs for every collection that indexes every file available. So I co-authored ia-get to automate the download process with maximum efficiency and reliability.

✨ Features

Core Functionality

  • 🔽 Fast Concurrent Downloads: Parallel downloading with configurable concurrency limits
  • 🌳 Directory Structure: Preserves the original archive directory structure
  • 🔄 Smart Resume: Automatically resumes partial or failed downloads
  • 🔏 Integrity Verification: MD5 hash checks to confirm file integrity
  • 🌱 Incremental Updates: Can be run multiple times to update existing downloads
  • 📊 Complete Metadata: Fetches all metadata for the archive using JSON API

Advanced Features

  • 🗜️ Compression Support: HTTP compression + automatic decompression of archives
  • 🎯 Smart Filtering: Filter by file extension, size, or custom patterns
  • 📈 Progress Tracking: Real-time progress bars and download statistics
  • Session Management: Persistent download sessions for large archives
  • 🛡️ Error Recovery: Robust retry logic for network failures
  • �📦 Cross-Platform: Available for Linux 🐧 macOS 🍏 and Windows 🪟

Technical Improvements

  • JSON-First: Uses Internet Archive's modern JSON API (no legacy XML)
  • 🧹 Clean Codebase: Comprehensive refactoring with extensive documentation
  • 🧪 Well-Tested: Full test suite with 27+ unit tests
  • 📚 Rich Documentation: In-depth API documentation and examples

🗜️ Compression Support

ia-get includes comprehensive compression features powered by modern JSON APIs:

  • HTTP Compression: Automatically reduces bandwidth usage during downloads
  • Auto-Detection: Detects compressed files from Internet Archive metadata
  • Multiple Formats: Support for gzip, bzip2, xz, tar, and combined formats (tar.gz, tar.bz2, tar.xz)
  • Transparent Decompression: Automatically decompresses files after download
  • Configurable: Choose which formats to decompress automatically
# Enable compression and auto-decompression
ia-get --compress --decompress https://archive.org/details/your_archive

# Decompress specific formats only
ia-get --decompress --decompress-formats gzip,bzip2 https://archive.org/details/your_archive

See docs/COMPRESSION.md for detailed compression documentation.

🏗️ Architecture

The codebase has been completely refactored for performance and maintainability:

  • JSON-First Design: Uses Internet Archive's modern JSON API exclusively
  • Concurrent Engine: Enhanced parallel downloader with session tracking
  • Session Management: Persistent download state for large archives
  • Error Recovery: Comprehensive retry logic for network failures
  • Clean Abstractions: Well-documented modules with clear responsibilities

Key Modules

  • metadata: JSON metadata fetching with retry logic
  • enhanced_downloader: Main download engine with session support
  • concurrent_simple: Enhanced concurrent downloader
  • compression: Automatic decompression utilities
  • filters: File filtering and formatting utilities

Sharing is caring 🤝

You can use ia-get to download files from archive.org, including all the metadata and the .torrent file, if there is one. You can the start seeding the torrent using a pristine copy of the archive, and a complete file set.

Demo 🧑‍💻

ia-get demo

Development 🏗️

ia-get is built with modern Rust practices and comprehensive testing.

# Build the project
cargo build

# Build optimized release
cargo build --release

# Run with development features
cargo run -- https://archive.org/details/your_archive

Code Quality 🧪

The codebase maintains high quality standards:

# Run all tests (27+ unit tests)
cargo test

# Check code formatting
cargo fmt --check

# Run linting
cargo clippy

# Check compilation
cargo check

Recent Improvements 🚀

The codebase has undergone major improvements:

  • Removed Legacy XML Support: Migrated entirely to JSON API
  • Enhanced Concurrent Downloader: Added session tracking and progress reporting
  • Comprehensive Documentation: Added extensive inline documentation and examples
  • Code Cleanup: Removed orphaned files and improved error handling
  • Test Coverage: Updated tests to work with new JSON-only architecture

Manual Tests 🤞

I used these commands to test ia-get during development.

ia-get https://archive.org/details/deftributetozzap64
ia-get https://archive.org/details/zzapp_64_issue_001_600dpi

A.I. Driven Development 🤖

This program is an ongoing experiment 🧪 in AI-assisted development. The project has evolved through multiple phases:

Development History

Phase 1 (Late 2023): Initially co-authored using Chatty Jeeps (ChatGPT-4). I had no Rust experience and wanted to see if AI could help me build a program in an unfamiliar language.

Phase 2 (Oct-Dec 2023): Used Unfold.ai to add features and improve the code. All AI-assisted commits from this period have full details in the commit messages.

Phase 3 (Jan 2024): Community input from Linux Matters listener Daniel Dewberry who submitted a comprehensive peer review.

Phase 4 (May 2025): Major refactoring and modernization using GitHub Copilot with Claude Sonnet 3.5, including:

  • Complete migration from XML to JSON APIs
  • Enhanced concurrent downloading with session management
  • Comprehensive code cleanup and documentation
  • Modern Rust practices and error handling

Featured Coverage

As featured on Linux Matters podcast! 🎙️ The initial version was discussed in Episode 16 - Blogging to the Fediverse, covering the AI development process, successes, and challenges.

Linux Matters Podcast
Linux Matters Podcast

Through this journey, I've learned Rust fundamentals and modern development practices, with each phase building on the previous work to create a robust, well-documented tool.

About

File downloader for archive.org ⬇️

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Rust 98.9%
  • Nix 1.1%