Scraper/parser for artists' needs with sources such as Pinterest (without any API), Google Images and Bing Images. Supports automatic sorting and filtering by pose, color, etc.

rambaeharambae/img_drawing_reference_scraper

imgscraper for artists from the artist: parser - collecting references - description, features and dependencies

      🎨 Description:
        The program automatically parses reference images, primarily for artists, then downloads, sorts and filters them from:
        - Pinterest (with support for full-size images)
        - Google Images
        - Bing Images

        💡 Does not require third-party APIs; collection runs locally through Selenium, webdriver-manager and icrawler.

       🚀 Main features:
      • Search images by keywords, categories, poses and additional (separate) requests
      • Automatic filtering:
      
      - Check for a person in the frame (YOLO; the trigger threshold is adjustable)
      - Detect a full-length person in the image (the trigger threshold is adjustable)
      - Filter out low-quality files (<10 KB)
      - Filter for black and white images (you can combine Poses + Additional requests = "black and white" + the black and white filter)
      - Check for and remove duplicates via pHash (the trigger threshold is adjustable)
      
      • Work with history: exclude already loaded images (seen_urls.json)
      • Save full-size URLs in `full_urls.log`
      • Load full-res images
      • Parse Pinterest in windowless (headless) mode
      • Index folders for faster duplicate checking
      • Real-time work log on the "Debug" tab
      • Save program settings between launches
      • Select the minimum image size (width/height) when parsing
      • Select a preset for parsing poses (foreshortening/dynamic action/perspective pack)
      • Select a parsing mode for sketches/lines only
      • Fully randomize the search when parsing (helpful when you look for new material through search engines)
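
A few of the checks listed above can be sketched with the standard library alone. This is a minimal illustration, not the program's actual code: the function names, the assumption that `seen_urls.json` holds a flat JSON list of URL strings, and the default distance of 5 bits are all hypothetical. Perceptual hashes are assumed to be available as hex strings (the `imagehash` library prints its hashes that way and also exposes the same bit distance through subtraction of two hash objects).

```python
import json
from pathlib import Path

MIN_FILE_BYTES = 10 * 1024  # the "<10 KB" low-quality cutoff from the feature list


def load_seen_urls(path):
    """Return previously recorded URLs, or an empty set if no history exists yet.

    Assumes the history file is a JSON list of strings (hypothetical format).
    """
    path = Path(path)
    if not path.exists():
        return set()
    return set(json.loads(path.read_text(encoding="utf-8")))


def save_seen_urls(path, urls):
    """Write the URL history back as a sorted JSON list."""
    Path(path).write_text(json.dumps(sorted(urls), indent=2), encoding="utf-8")


def is_too_small(file_path, min_bytes=MIN_FILE_BYTES):
    """True when the file on disk is below the size cutoff."""
    return Path(file_path).stat().st_size < min_bytes


def hamming_distance(hash_a, hash_b):
    """Count differing bits between two equal-length hex hash strings."""
    return bin(int(hash_a, 16) ^ int(hash_b, 16)).count("1")


def is_duplicate(hash_a, hash_b, max_distance=5):
    """Treat two images as duplicates when their hashes differ by few bits.

    max_distance plays the role of the adjustable trigger threshold;
    5 is an illustrative value, not the program's default.
    """
    return hamming_distance(hash_a, hash_b) <= max_distance
```

Lowering `max_distance` keeps more near-identical images; raising it removes more aggressively but risks discarding distinct references with similar composition.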

       📝 Notes:
        - The first launch may be slow if some dependencies (such as YOLO or webdriver-manager) are missing from the device

       ⚡ Future improvements:
        - New classes to search (animals and items)
        - Filter by precise poses and number of people in image
        - Pose estimation (Keypoint detection), checks that the frame contains visible limbs/pose
        - Automatic model selection: if the task is general, take the standard yolov8n.pt for speed, or yolov8m.pt for quality
        - Visualization and filter preview - open a window with examples of recognized frames, highlighted boxes and statistics (how many were filtered out, how many were left)
        - Asynchronous loading + processing
        - Advanced content analysis (YOLO + Pose Estimation + OCR + scene recognition: selection by facial expression/body position)
        - Watermark removal, filters by main (dominant) color and composition
        - Automatic sorting by categories/tags
        - Built-in ban protection: User-Agent rotation, random delays and random browser emulation
        - Export results to ZIP/JSON
        - GUI and UX: a redesign in a Figma/Notion-like style

        👨‍💻 Authors:
        - Logic developer and tester: Hara
        - Development and code: ChatGPT 
      
       📦 Dependencies:
        - Python 3.x
        - PyQt5
        - Selenium
        - webdriver-manager
        - icrawler
        - pillow
        - requests
        - beautifulsoup4
        - imagehash
        - ultralytics
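
Since a slow first launch is attributed to missing libraries, it can help to check beforehand which of the packages above are importable. The sketch below is an optional helper and not part of the repository; the function name `missing_packages` is hypothetical. Note that several pip package names differ from their import names (`pillow` imports as `PIL`, `beautifulsoup4` as `bs4`, `webdriver-manager` as `webdriver_manager`).

```python
from importlib.util import find_spec

# pip package name -> top-level module name used in "import ..."
DEPENDENCIES = {
    "PyQt5": "PyQt5",
    "selenium": "selenium",
    "webdriver-manager": "webdriver_manager",
    "icrawler": "icrawler",
    "pillow": "PIL",
    "requests": "requests",
    "beautifulsoup4": "bs4",
    "imagehash": "imagehash",
    "ultralytics": "ultralytics",
}


def missing_packages(deps=DEPENDENCIES):
    """Return the pip names of packages whose module cannot be found."""
    return [pkg for pkg, module in deps.items() if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All dependencies are importable.")
```

Anything reported as missing can then be installed with `pip install` before the first run.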

# imgscraper for artists: parser - collecting references - setup

      1. clone the repository
         git clone https://github.com/rambaeharambae/img_drawing_reference_scraper.git
      2. install dependencies
         pip install -r requirements.txt
      3. run via python
         python imgscraper.py
      4. build an .exe executable file
         pyinstaller --onefile --noconsole imgscraper.py

# imgscraper for artists: parser - collecting references - download

      Go to Releases and download the .exe file.

# imgscraper for artists: parser - collecting references - known bugs

      If the results contain too many unwanted images (for example, heads instead of full-body figures), change the full-body ratio threshold, use additional requests, or do both.
      ---
      The first launch could be long if you're missing some dependencies.
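
To make the full-body ratio threshold mentioned above more concrete, here is one plausible reading of it, sketched under stated assumptions: the detector returns a person bounding box as `(x1, y1, x2, y2)` in pixels, and the "ratio" is the box height divided by the image height. The README does not say how the program computes it, so the function names and the 0.6 default are hypothetical.

```python
def box_height_ratio(box, image_height):
    """Fraction of the image height covered by a (x1, y1, x2, y2) box."""
    if image_height <= 0:
        raise ValueError("image_height must be positive")
    _, y1, _, y2 = box
    return (y2 - y1) / image_height


def looks_full_body(box, image_height, min_ratio=0.6):
    """Accept the detection when the person spans enough of the frame.

    min_ratio is an illustrative stand-in for the adjustable threshold.
    """
    return box_height_ratio(box, image_height) >= min_ratio


# Example: a box from y=50 to y=950 in a 1000 px tall image covers 0.9 of it.
print(looks_full_body((10, 50, 200, 950), 1000))
```

A tight close-up can also fill most of the frame, so a height check like this is only a rough heuristic, which is consistent with the advice to combine the threshold with additional requests.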
