A web scraper for downloading high-resolution artwork images from the Fabienne Vincent gallery website using Playwright.
- 🎨 Gallery Crawling - Automatically discovers and downloads from all gallery categories
- 🖼️ High-Resolution Images - Gets full-size images from lightbox popups
- 📁 Smart Organization - Organizes images into category folders
- 🏷️ Auto-Renaming - Renames files using artwork titles for better organization
- 📊 Metadata Collection - Saves detailed information about each artwork
- 🔄 Duplicate Prevention - Avoids re-downloading existing images
Run the complete workflow automatically - crawl, download, rename, and organize:
python gallery_tool.py
This will:
- Crawl the gallery and download images
- Automatically rename files using artwork titles
- Organize into cleaned category folders
- Save metadata for all images
- Python 3.11+
- Install dependencies:
pip install -r requirements.txt
- Discovers Categories - Finds all gallery categories automatically
- Crawls Systematically - Downloads from each category completely before moving to the next
- High-Quality Images - Clicks each image to get full-resolution versions
- Smart Naming - Renames files like
tigre_du_bengale_419.jpg
instead of739467410189419.jpg
- Organized Storage - Creates folders like
fauves/
,animaux-d-afrique/
, etc.
assets/
└── crawl_runs/
└── 20241208_143022/ # Timestamped run
├── images/ # Downloaded images
│ ├── fauves/ # Category folders
│ ├── animaux-d-afrique/
│ └── ...
└── metadata.json # Image details and metadata
Edit config.py
to change:
- Target website URL
- Download delays and timeouts
- File organization preferences
- Each crawl session creates a timestamped folder
- Respects the website with proper delays between requests
- Handles pagination automatically within each category
- Skips duplicate downloads across multiple runs