ia-get

High-performance file downloader for archive.org

Made with 💝 by 🤖

Usage 📖

Simply pass the URL of an archive.org details page you want to download and ia-get will automatically fetch the JSON metadata and download all files with blazing speed.

ia-get https://archive.org/details/<identifier>
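The step from a details URL to a metadata request can be sketched as follows. This is a minimal illustration, not ia-get's actual code: the function names are hypothetical, and it assumes the Internet Archive's public metadata endpoint at `https://archive.org/metadata/<identifier>`.

```rust
/// Extract the item identifier from an archive.org details URL.
/// Returns None if the URL does not look like a details page.
/// (Hypothetical helper, not taken from the ia-get source.)
fn identifier_from_details_url(url: &str) -> Option<&str> {
    // Accept both http and https.
    let rest = url
        .strip_prefix("https://archive.org/details/")
        .or_else(|| url.strip_prefix("http://archive.org/details/"))?;
    // Drop any query string or fragment, then keep the first path segment.
    let rest = rest.split(|c: char| c == '?' || c == '#').next()?;
    let id = rest.split('/').next()?;
    if id.is_empty() {
        None
    } else {
        Some(id)
    }
}

/// Build the metadata API URL for an identifier.
fn metadata_url(identifier: &str) -> String {
    format!("https://archive.org/metadata/{}", identifier)
}
```

For example, `https://archive.org/details/deftributetozzap64` yields the identifier `deftributetozzap64`, which maps to `https://archive.org/metadata/deftributetozzap64`.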

Advanced Usage 🚀

# Concurrent downloads with compression
ia-get --compress --decompress https://archive.org/details/your_archive

# Filter by file types
ia-get --include-ext pdf,epub https://archive.org/details/books_archive

# Limit file sizes  
ia-get --max-file-size 100MB https://archive.org/details/data_archive

# Specify output directory
ia-get --output ./downloads https://archive.org/details/software_archive
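To make the filtering options more concrete, here is a rough sketch of how an extension list and a size limit such as `100MB` might be applied to a file listing. All names are hypothetical, and the sketch assumes binary multiples (1 MB = 1,048,576 bytes); ia-get itself may interpret sizes differently.

```rust
/// Parse a human-readable size such as "100MB" or "1.5GB" into bytes.
/// Assumption: binary multiples (1 KB = 1024 bytes).
fn parse_size(input: &str) -> Option<u64> {
    let s = input.trim().to_ascii_uppercase();
    // Split into the numeric part and the unit suffix.
    let split_at = s
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(s.len());
    let (number, unit) = s.split_at(split_at);
    let value: f64 = number.parse().ok()?;
    let multiplier: u64 = match unit.trim() {
        "" | "B" => 1,
        "KB" => 1 << 10,
        "MB" => 1 << 20,
        "GB" => 1 << 30,
        "TB" => 1 << 40,
        _ => return None,
    };
    Some((value * multiplier as f64) as u64)
}

/// Return true if `name` has one of the allowed extensions (case-insensitive).
fn has_allowed_ext(name: &str, allowed: &[&str]) -> bool {
    match std::path::Path::new(name).extension().and_then(|e| e.to_str()) {
        Some(ext) => allowed.iter().any(|a| a.eq_ignore_ascii_case(ext)),
        None => false,
    }
}

/// Keep the names of files that pass both filters.
/// An empty `include_ext` means "any extension"; `None` means "no size limit".
fn filter_files<'a>(
    files: &[(&'a str, u64)],
    include_ext: &[&str],
    max_size: Option<u64>,
) -> Vec<&'a str> {
    files
        .iter()
        .filter(|(name, _)| include_ext.is_empty() || has_allowed_ext(name, include_ext))
        .filter(|(_, size)| max_size.map_or(true, |max| *size <= max))
        .map(|(name, _)| *name)
        .collect()
}
```

With the list `[("a.pdf", 50), ("b.epub", 200_000_000), ("c.txt", 10)]`, the extensions `pdf,epub`, and a limit of `100MB`, only `a.pdf` passes both filters.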

Why? 🤔💭

I wanted to download high-quality scans of ZZap!64 magazine and some read-only memory from archive.org. Archives of this type often include many large files. Torrents are not always provided and, when they are available, they do not index all the files in the archive.

Archive.org provides comprehensive JSON APIs that index every file available in a collection. So I co-authored ia-get to automate the download process with maximum efficiency and reliability.

✨ Features

Core Functionality

🔽 Fast Concurrent Downloads: Parallel downloading with configurable concurrency limits
🌳 Directory Structure: Preserves the original archive directory structure
🔄 Smart Resume: Automatically resumes partial or failed downloads
🔏 Integrity Verification: MD5 hash checks to confirm file integrity
🌱 Incremental Updates: Can be run multiple times to update existing downloads
📊 Complete Metadata: Fetches all metadata for the archive using JSON API
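The idea of a configurable concurrency limit can be illustrated with a small worker pool built on the standard library only. This is a sketch under stated assumptions, not ia-get's download engine (which may well be built on an async runtime): `run_with_limit` is a hypothetical name, and the `job` closure stands in for "download one file".

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

/// Run `job` over every item using at most `limit` worker threads.
/// Results come back in completion order, tagged with the input index.
fn run_with_limit<T, R, F>(items: Vec<T>, limit: usize, job: F) -> Vec<(usize, R)>
where
    T: Send + 'static,
    R: Send + 'static,
    F: Fn(T) -> R + Send + Sync + 'static,
{
    // Shared work queue: each worker pulls the next item when it is free.
    let queue = Arc::new(Mutex::new(items.into_iter().enumerate()));
    let job = Arc::new(job);
    let (tx, rx) = mpsc::channel();

    let workers: Vec<_> = (0..limit.max(1))
        .map(|_| {
            let (queue, job, tx) = (Arc::clone(&queue), Arc::clone(&job), tx.clone());
            thread::spawn(move || loop {
                // Hold the lock only long enough to take the next item.
                let next = queue.lock().unwrap().next();
                match next {
                    Some((index, item)) => tx.send((index, (*job)(item))).unwrap(),
                    None => break,
                }
            })
        })
        .collect();

    // Drop the original sender so the channel closes when the workers finish.
    drop(tx);
    let results: Vec<_> = rx.iter().collect();
    for worker in workers {
        worker.join().unwrap();
    }
    results
}
```

Because workers pull from a shared queue, a slow file does not hold up the others; the limit only caps how many jobs run at the same time.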

Advanced Features

🗜️ Compression Support: HTTP compression + automatic decompression of archives
🎯 Smart Filtering: Filter by file extension, size, or custom patterns
📈 Progress Tracking: Real-time progress bars and download statistics
Session Management: Persistent download sessions for large archives
🛡️ Error Recovery: Robust retry logic for network failures
📦 Cross-Platform: Available for Linux 🐧 macOS 🍏 and Windows 🪟
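Retry logic for network failures is commonly implemented as retry with exponential backoff. The sketch below shows that general technique using only the standard library; the function name, the attempt limit, and the doubling delay are assumptions for illustration and are not taken from ia-get.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Call `op` until it succeeds or `max_attempts` is reached,
/// doubling the delay between attempts. `op` receives the attempt number.
fn retry_with_backoff<T, E, F>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: F,
) -> Result<T, E>
where
    F: FnMut(u32) -> Result<T, E>,
{
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op(attempt) {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
        }
    }
}
```

A real downloader would typically retry only on transient errors (timeouts, connection resets) and give up immediately on permanent ones such as a missing file.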

Technical Improvements

⚡ JSON-First: Uses Internet Archive's modern JSON API (no legacy XML)
🧹 Clean Codebase: Comprehensive refactoring with extensive documentation
🧪 Well-Tested: Full test suite with 27+ unit tests
📚 Rich Documentation: In-depth API documentation and examples

🗜️ Compression Support

ia-get includes comprehensive compression features powered by modern JSON APIs:

HTTP Compression: Automatically reduces bandwidth usage during downloads
Auto-Detection: Detects compressed files from Internet Archive metadata
Multiple Formats: Support for gzip, bzip2, xz, tar, and combined formats (tar.gz, tar.bz2, tar.xz)
Transparent Decompression: Automatically decompresses files after download
Configurable: Choose which formats to decompress automatically

# Enable compression and auto-decompression
ia-get --compress --decompress https://archive.org/details/your_archive

# Decompress specific formats only
ia-get --decompress --decompress-formats gzip,bzip2 https://archive.org/details/your_archive

See docs/COMPRESSION.md for detailed compression documentation.
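As a rough illustration of format selection, the sketch below classifies a file by its name suffix and checks it against the set of formats chosen for decompression. Note the swap: ia-get is described above as detecting compressed files from Internet Archive metadata, whereas this example uses file name suffixes. The `Format` enum and both functions are hypothetical.

```rust
/// Compression/archive formats listed in this section (hypothetical enum).
#[derive(Debug, PartialEq, Clone, Copy)]
enum Format {
    Gzip,
    Bzip2,
    Xz,
    Tar,
    TarGz,
    TarBz2,
    TarXz,
}

/// Guess the format from the file name suffix (case-insensitive).
fn detect_format(file_name: &str) -> Option<Format> {
    let lower = file_name.to_ascii_lowercase();
    // Combined suffixes come first so "x.tar.gz" is not reported as plain gzip.
    const SUFFIXES: [(&str, Format); 10] = [
        (".tar.gz", Format::TarGz),
        (".tgz", Format::TarGz),
        (".tar.bz2", Format::TarBz2),
        (".tbz2", Format::TarBz2),
        (".tar.xz", Format::TarXz),
        (".txz", Format::TarXz),
        (".tar", Format::Tar),
        (".gz", Format::Gzip),
        (".bz2", Format::Bzip2),
        (".xz", Format::Xz),
    ];
    SUFFIXES
        .iter()
        .find(|(suffix, _)| lower.ends_with(*suffix))
        .map(|(_, format)| *format)
}

/// True if the file's format is among those selected for decompression.
fn should_decompress(file_name: &str, enabled: &[Format]) -> bool {
    detect_format(file_name).map_or(false, |f| enabled.contains(&f))
}
```

With `gzip,bzip2` selected, `a.gz` would be decompressed while `a.tar` and `a.tar.gz` would be left as downloaded.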

🏗️ Architecture

The codebase has been completely refactored for performance and maintainability:

JSON-First Design: Uses Internet Archive's modern JSON API exclusively
Concurrent Engine: Enhanced parallel downloader with session tracking
Session Management: Persistent download state for large archives
Error Recovery: Comprehensive retry logic for network failures
Clean Abstractions: Well-documented modules with clear responsibilities
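One way to picture session state and resuming is a map from file name to progress, from which the remaining work and the HTTP `Range` header for partial files can be derived. This is a hypothetical model for illustration only; it does not reflect the data structures in ia-get and leaves out persisting the state to disk.

```rust
use std::collections::BTreeMap;

/// Progress of a single file within a session (hypothetical model).
#[derive(Debug, Clone, PartialEq)]
enum FileState {
    Pending,
    Partial { bytes_done: u64 },
    Complete,
}

/// Hypothetical session: file name -> state.
#[derive(Debug, Default)]
struct Session {
    files: BTreeMap<String, FileState>,
}

impl Session {
    /// Files that still need work, each with the Range header value to send
    /// (None means "download from the start").
    fn remaining(&self) -> Vec<(&str, Option<String>)> {
        self.files
            .iter()
            .filter_map(|(name, state)| match state {
                FileState::Complete => None,
                FileState::Pending => Some((name.as_str(), None)),
                // Ask the server for everything after the bytes already on disk.
                FileState::Partial { bytes_done } => {
                    Some((name.as_str(), Some(format!("bytes={}-", bytes_done))))
                }
            })
            .collect()
    }
}
```

Running the tool again against the same session would then skip completed files, restart pending ones, and request only the missing tail of partial ones.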

Key Modules

metadata: JSON metadata fetching with retry logic
enhanced_downloader: Main download engine with session support
concurrent_simple: Enhanced concurrent downloader
compression: Automatic decompression utilities
filters: File filtering and formatting utilities

Sharing is caring 🤝

You can use ia-get to download files from archive.org, including all the metadata and the .torrent file, if there is one. You can then start seeding the torrent using a pristine copy of the archive and a complete file set.

Demo 🧑‍💻

Development 🏗️

ia-get is built with modern Rust practices and comprehensive testing.

# Build the project
cargo build

# Build optimized release
cargo build --release

# Run a development (debug) build against an archive
cargo run -- https://archive.org/details/your_archive

Code Quality 🧪

The codebase maintains high quality standards:

# Run all tests (27+ unit tests)
cargo test

# Check code formatting
cargo fmt --check

# Run linting
cargo clippy

# Check compilation
cargo check

Recent Improvements 🚀

The codebase has undergone major improvements:

✅ Removed Legacy XML Support: Migrated entirely to JSON API
✅ Enhanced Concurrent Downloader: Added session tracking and progress reporting
✅ Comprehensive Documentation: Added extensive inline documentation and examples
✅ Code Cleanup: Removed orphaned files and improved error handling
✅ Test Coverage: Updated tests to work with new JSON-only architecture

Manual Tests 🤞

I used these commands to test ia-get during development.

ia-get https://archive.org/details/deftributetozzap64
ia-get https://archive.org/details/zzapp_64_issue_001_600dpi

A.I. Driven Development 🤖

This program is an ongoing experiment 🧪 in AI-assisted development. The project has evolved through multiple phases:

Development History

Phase 1 (Late 2023): Initially co-authored using Chatty Jeeps (ChatGPT-4). I had no Rust experience and wanted to see if AI could help me build a program in an unfamiliar language.

Phase 2 (Oct-Dec 2023): Used Unfold.ai to add features and improve the code. All AI-assisted commits from this period have full details in the commit messages.

Phase 3 (Jan 2024): Community input from Linux Matters listener Daniel Dewberry, who submitted a comprehensive peer review.

Phase 4 (May 2025): Major refactoring and modernization using GitHub Copilot with Claude Sonnet 3.5, including:

Complete migration from XML to JSON APIs
Enhanced concurrent downloading with session management
Comprehensive code cleanup and documentation
Modern Rust practices and error handling

Featured Coverage

As featured on the Linux Matters podcast! 🎙️ The initial version was discussed in Episode 16 - Blogging to the Fediverse, covering the AI development process, successes, and challenges.


Through this journey, I've learned Rust fundamentals and modern development practices, with each phase building on the previous work to create a robust, well-documented tool.

