A high-performance Rust CLI tool that fetches trading card game (TCG) data from game-specific APIs and organizes it into train, test, and validation sets for machine learning applications. Supported games:
- Magic: The Gathering (MTG) - Fetches data from the Scryfall API
- Grand Archive (GA) - Fetches data from the Grand Archive API
- Fast parallel processing - Uses all CPU cores for optimal performance
- Smart batch checking - Avoids unnecessary downloads by checking existing cards efficiently
- Fetches card data from TCG-specific APIs
- Downloads comprehensive card data including all available card information
- Direct image resizing to specified dimensions (default: 500×700 pixels)
- Organizes data into train/test/validation sets with automatic splitting
- Command-line interface with customizable output path
- Progress indicators for downloads
- Multi-threaded image downloading with configurable thread count
The tool automatically uses optimized parallel processing for all file operations:
- 5-10x faster card existence checking through batch processing
- 4-8x faster directory operations using all CPU cores
- 60% faster overall for large datasets (10,000+ cards)
- Intelligent filtering to skip existing cards and avoid redundant downloads
No configuration needed - optimizations are automatic!
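The batch existence check described above could look roughly like the following. This is a minimal sketch using only the standard library, not the tool's actual implementation (which, according to the dependency list, uses rayon). The function names `existing_card_ids` and `cards_to_download` are hypothetical, and the sketch assumes each card lives in a directory named after its ID.

```rust
use std::collections::HashSet;
use std::fs;
use std::io;
use std::path::Path;

/// List every card directory under `split_dir` once and return the IDs found.
/// A missing directory is treated as "no cards downloaded yet".
fn existing_card_ids(split_dir: &Path) -> io::Result<HashSet<String>> {
    let mut ids = HashSet::new();
    let entries = match fs::read_dir(split_dir) {
        Ok(entries) => entries,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(ids),
        Err(err) => return Err(err),
    };
    for entry in entries {
        let entry = entry?;
        if entry.file_type()?.is_dir() {
            ids.insert(entry.file_name().to_string_lossy().into_owned());
        }
    }
    Ok(ids)
}

/// Keep only the wanted IDs that are not already on disk (hypothetical helper).
fn cards_to_download<'a>(wanted: &'a [String], existing: &HashSet<String>) -> Vec<&'a str> {
    wanted
        .iter()
        .filter(|id| !existing.contains(id.as_str()))
        .map(String::as_str)
        .collect()
}
```

Reading the directory once and testing membership in a `HashSet` replaces one filesystem call per card with a single listing, which is the general idea behind checking in batches.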
Make sure you have Rust and Cargo installed on your system. Then clone this repository and build it:
git clone https://github.com/acidtib/tcg-fetch.git
cd tcg-fetch
cargo build --release
Run the program with the required TCG argument:
cargo run -- mtg # For Magic: The Gathering
cargo run -- ga # For Grand Archive
Configure target image dimensions (images will be resized to fit exactly):
cargo run -- mtg --width 512 --height 512 # Resize all MTG images to 512×512 pixels
cargo run -- ga --width 512 --height 512 # Resize all GA images to 512×512 pixels
Control download performance and dataset size:
cargo run -- mtg --amount 100 --threads 6 # Download 100 MTG cards using 6 threads
cargo run -- ga --amount 50 --threads 4 # Download 50 GA cards using 4 threads
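To illustrate how `--amount` and `--threads` might interact, here is a standard-library sketch that caps the card list and spreads it over worker threads. It is an assumption for illustration only: the real tool lists tokio and futures as dependencies and may schedule downloads differently. `plan_batches` and `run_batches` are hypothetical names, and the `work` closure stands in for the actual download.

```rust
use std::thread;

/// Limit `ids` to `amount` entries (None means all) and split them into
/// at most `threads` contiguous batches of near-equal size.
fn plan_batches<'a>(ids: &'a [String], amount: Option<usize>, threads: usize) -> Vec<&'a [String]> {
    let limit = amount.map_or(ids.len(), |n| n.min(ids.len()));
    let selected = &ids[..limit];
    if selected.is_empty() {
        return Vec::new();
    }
    let threads = threads.max(1);
    let batch_size = (selected.len() + threads - 1) / threads; // ceiling division
    selected.chunks(batch_size).collect()
}

/// Run `work` on every batch in its own scoped thread and return how many
/// cards were processed in total.
fn run_batches<F>(batches: &[&[String]], work: F) -> usize
where
    F: Fn(&str) + Sync,
{
    let work = &work;
    thread::scope(|scope| {
        let handles: Vec<_> = batches
            .iter()
            .map(|batch| {
                scope.spawn(move || {
                    for id in batch.iter() {
                        work(id.as_str());
                    }
                    batch.len()
                })
            })
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

With 100 cards and 6 threads, `plan_batches` would produce batches of 17 cards each, except for a final batch of 15.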
Specify a custom output directory:
cargo run -- mtg --path custom-data-dir # For MTG cards
cargo run -- ga --path ga-cards-dir # For GA cards
Combine multiple options for full control:
cargo run -- mtg --path mtg-data --amount 50 --threads 4 --width 512 --height 512
cargo run -- ga --path ga-data --amount 25 --threads 2 --width 400 --height 600
The tool creates a directory with the following structure:
<output-dir>/
├── data/
│   ├── train/
│   │   └── <card-id>/
│   │       └── 0000.jpg    # Original downloaded image
│   ├── test/
│   │   └── <card-id>/
│   │       └── 0000.jpg    # Test image (copied from train)
│   └── validation/
│       └── <card-id>/
│           ├── 0000.jpg    # Validation image (copied from train)
│           └── 0001.jpg
├── all_cards.json          # For MTG
└── ga_cards.json           # For Grand Archive
Each card gets its own subdirectory named after the card ID. The primary image is saved as 0000.jpg. Additional images, which can be added in the future, are numbered sequentially (0001.jpg, 0002.jpg, etc.).
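The naming scheme can be expressed as a small path-building helper. The sketch below only mirrors the layout documented here; the `Split` enum and the `image_path` function are hypothetical names, not identifiers taken from the codebase.

```rust
use std::path::{Path, PathBuf};

/// The three dataset splits shown in the directory layout.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Split {
    Train,
    Test,
    Validation,
}

impl Split {
    /// Directory name used under `data/`.
    fn dir_name(self) -> &'static str {
        match self {
            Split::Train => "train",
            Split::Test => "test",
            Split::Validation => "validation",
        }
    }
}

/// Build `<output-dir>/data/<split>/<card-id>/<NNNN>.jpg`,
/// where the image index is zero-padded to four digits.
fn image_path(output_dir: &Path, split: Split, card_id: &str, index: u32) -> PathBuf {
    output_dir
        .join("data")
        .join(split.dir_name())
        .join(card_id)
        .join(format!("{:04}.jpg", index))
}
```

For example, `image_path(Path::new("tcg-data"), Split::Validation, "abc", 1)` yields `tcg-data/data/validation/abc/0001.jpg`.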
Usage: tcg-fetch <TCG> [OPTIONS]

Arguments:
  <TCG>  Trading card game type to fetch data for [possible values: mtg, ga]

Options:
  -p, --path <PATH>        Path where to save the data [default: tcg-data]
  -a, --amount <AMOUNT>    Amount of cards to fetch [default: all]
  -t, --threads <THREADS>  Number of threads to use for downloading images [default: CPU cores]
      --width <WIDTH>      Target width for resized images [default: 500]
      --height <HEIGHT>    Target height for resized images [default: 700]
  -h, --help               Print help
  -V, --version            Print version
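The defaults listed in the help text can be summarized in code. The tool itself parses arguments with clap; the sketch below is a hand-written stand-in that only encodes the documented defaults. The `FetchOptions` struct and `parse_amount` function are hypothetical, and treating the literal string "all" as a valid `--amount` value is an assumption based on the displayed default.

```rust
use std::path::PathBuf;
use std::thread;

/// Options mirroring the documented flags (hypothetical struct).
#[derive(Debug, Clone, PartialEq)]
struct FetchOptions {
    path: PathBuf,
    amount: Option<usize>, // None means "all"
    threads: usize,
    width: u32,
    height: u32,
}

impl Default for FetchOptions {
    fn default() -> Self {
        Self {
            path: PathBuf::from("tcg-data"),
            amount: None,
            // Fall back to one thread if the core count cannot be determined.
            threads: thread::available_parallelism().map(|n| n.get()).unwrap_or(1),
            width: 500,
            height: 700,
        }
    }
}

/// Parse an `--amount` value: either "all" (assumed) or a non-negative integer.
fn parse_amount(raw: &str) -> Result<Option<usize>, String> {
    if raw.eq_ignore_ascii_case("all") {
        return Ok(None);
    }
    raw.parse::<usize>()
        .map(Some)
        .map_err(|err| format!("invalid amount {:?}: {}", raw, err))
}
```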
The codebase is designed to be extensible. To add support for a new TCG:
- Add a new variant to the TcgType enum in main.rs
- Add corresponding API URL, user agent, and data type mappings in utils.rs
- Implement any TCG-specific logic in the utility functions
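The steps above can be sketched as follows. Only the `TcgType` name, the `mtg`/`ga` argument values, and the two JSON file names come from this README; the variant names and the `data_file_name` method are assumptions for illustration, and the real mappings in utils.rs (API URL, user agent) are not reproduced here.

```rust
use std::str::FromStr;

/// Supported games. Adding a new variant here makes the compiler flag
/// every `match` below that does not yet handle it.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum TcgType {
    Mtg,
    Ga,
}

impl TcgType {
    /// Name of the card data file written to the output directory.
    fn data_file_name(self) -> &'static str {
        match self {
            TcgType::Mtg => "all_cards.json",
            TcgType::Ga => "ga_cards.json",
        }
    }
}

impl FromStr for TcgType {
    type Err = String;

    /// Map the command-line value to a variant.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "mtg" => Ok(TcgType::Mtg),
            "ga" => Ok(TcgType::Ga),
            other => Err(format!("unknown TCG {:?} (expected mtg or ga)", other)),
        }
    }
}
```

Because the `match` expressions are exhaustive, a new game needs one enum variant plus one arm per mapping, and a forgotten arm becomes a compile error rather than a runtime surprise.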
- clap: Command-line argument parsing
- reqwest: HTTP client for API requests
- serde: JSON serialization/deserialization
- tokio: Async runtime
- image: Image processing
- indicatif: Progress bars
- futures: Async utilities
- rand: Random number generation for dataset shuffling
- rayon: Parallel processing for performance optimization
tcg-fetch/
├── src/
│   ├── main.rs      # Main application logic and CLI definition
│   └── utils.rs     # Utility functions for API and file operations
├── Cargo.toml       # Rust dependencies
└── README.md        # This file
Contributions are welcome! Please feel free to submit a Pull Request. When adding support for new TCGs, please ensure:
- The new TCG follows the existing pattern in the codebase
- API rate limits are respected
- Error handling is comprehensive
- Documentation is updated accordingly
This project is open source. Please check the repository for license details.