imgscraper for artists, by an artist: a parser for collecting references - description, features and dependencies
🎨 Description:
The program automatically collects reference images, primarily for artists: it searches for, downloads, sorts and filters images from:
- Pinterest (with support for full-size images)
- Google Images
- Bing Images
💡 Does not require third-party APIs; images are collected locally through Selenium, webdriver-manager and icrawler.
🚀 Main features:
• Search for images by keywords, categories, poses and additional (separate) requests
• Automatic filtering:
- Check for a person in the frame (YOLO; the trigger parameter is adjustable)
- Detect a full-length person in the image (the trigger parameter is adjustable)
- Filter out low-quality files (<10 KB)
- Filter for black-and-white images (can be combined: Poses + Additional requests = "black and white" + the black-and-white filter)
- Check for and remove duplicates via pHash (the trigger parameter is adjustable)
• History: skip images that were already downloaded (seen_urls.json)
• Save full-size URLs in `full_urls.log`
• Download full-resolution images
• Parse Pinterest in windowless (headless) mode
• Index folders for faster duplicate checking
• Real-time work log on the "Debug" tab
• Save program settings between launches
• Set the minimum image size (width/height) when parsing
• Choose a preset for parsing poses (foreshortening / dynamic action / perspective pack)
• Choose a parsing mode for sketches/lines only
• Full randomization of the search when parsing (helpful when looking for new material through search engines)
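A few of the checks in the list above (download history, the <10 KB cutoff and the duplicate threshold) can be sketched in plain Python. This is only an illustration, not the code imgscraper uses: the function names are hypothetical, `seen_urls.json` is assumed to hold a flat JSON list of URLs, perceptual hashes are assumed to be 64-bit integers, and the default `max_distance` of 5 is an arbitrary example value.

```python
import json
from pathlib import Path

MIN_BYTES = 10 * 1024  # the "<10 KB" low-quality cutoff from the feature list


def load_seen(path: Path) -> set:
    """Return the URLs stored in the history file (empty set if it is absent)."""
    if not path.exists():
        return set()
    # Assumption: the file is a flat JSON list of URL strings.
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_seen(path: Path, seen: set) -> None:
    """Write the history back as a sorted JSON list."""
    path.write_text(json.dumps(sorted(seen), indent=2), encoding="utf-8")


def is_too_small(file_path: Path, min_bytes: int = MIN_BYTES) -> bool:
    """True if the downloaded file is below the size cutoff."""
    return file_path.stat().st_size < min_bytes


def is_near_duplicate(hash_a: int, hash_b: int, max_distance: int = 5) -> bool:
    """True if two perceptual hashes differ in at most max_distance bits."""
    hamming = bin(hash_a ^ hash_b).count("1")
    return hamming <= max_distance
```

With the imagehash library, subtracting two hashes returned by `imagehash.phash` gives the same Hamming distance, so the adjustable trigger parameter for duplicates would play the role of `max_distance` here. A lower value keeps more near-identical images, a higher value removes more.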
📝 Notes:
- The first launch may take a while if some dependencies (such as YOLO or webdriver-manager) are missing on the device
⚡ Future improvements:
- New classes to search for (animals and items)
- Filter by precise poses and by the number of people in the image
- Pose estimation (keypoint detection) to check that the frame contains visible limbs/a pose
- Automatic model selection: the standard yolov8n.pt for speed on general tasks, or yolov8m.pt for quality
- Visualization and filter preview: a window with examples of recognized frames, highlighted boxes and statistics (how many images were filtered out and how many remain)
- Asynchronous loading + processing
- Advanced content analysis (YOLO + Pose Estimation + OCR + scene recognition: selection by facial expression/body position)
- Watermark removal, filters by main (dominant) color and composition
- Automatic sorting by categories/tags
- Built-in ban protection: User-Agent rotation, random delays and random browser emulation
- Export results to ZIP/JSON
- GUI and UX: a Figma/Notion-style interface
👨‍💻 Authors:
- Logic developer and tester: Hara
- Development and code: ChatGPT
📦 Dependencies:
- Python 3.x
- PyQt5
- Selenium
- webdriver-manager
- icrawler
- pillow
- requests
- beautifulsoup4
- imagehash
- ultralytics
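To see which of the packages above are already available before the first launch, a small check like the following may help. It is a sketch, not part of imgscraper: the helper name `missing_modules` is hypothetical, and the list uses import names, which differ from the pip names for some packages (pillow is imported as `PIL`, beautifulsoup4 as `bs4`, webdriver-manager as `webdriver_manager`).

```python
import importlib.util

# Import names for the dependencies listed above (not the pip package names).
REQUIRED_MODULES = [
    "PyQt5", "selenium", "webdriver_manager", "icrawler",
    "PIL", "requests", "bs4", "imagehash", "ultralytics",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the modules from names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```

Anything reported as missing can be installed with pip before running the program.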




⚙️ Installation and launch:
1. Clone the repository:
git clone https://github.com/rambaeharambae/img_drawing_reference_scraper.git
2. Install the dependencies:
pip install -r requirements.txt
3. Run with Python:
python imgscraper.py
4. Build an .exe executable file:
pyinstaller --onefile --noconsole imgscraper.py
Alternatively, go to the Releases page and download the .exe.
If too many results are not what you wanted to parse (for example, heads instead of a full-body person), change the full-body ratio threshold, use additional requests, or do both.
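As a rough illustration of what a full-body ratio threshold can mean, the sketch below treats the ratio as the height of the detected person box divided by the image height. This definition is an assumption, as are the function name `is_full_body`, the `(x1, y1, x2, y2)` box format and the default of 0.6; the actual check in imgscraper may differ.

```python
def is_full_body(box, image_height, min_ratio=0.6):
    """Return True if the person box spans at least min_ratio of the image height.

    box is (x1, y1, x2, y2) in pixels (hypothetical format for this sketch).
    """
    if image_height <= 0:
        raise ValueError("image_height must be positive")
    _, y1, _, y2 = box
    ratio = (y2 - y1) / image_height
    return ratio >= min_ratio
```

Under this reading, raising the threshold rejects more head and bust crops, while lowering it lets more partial figures through.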