
Automate scraping videos (with metadata!) from 9vids, familyporn.tv, familypornhd, incestflix, lonefun, luxure, motherless, pornhub, spankbang, tnaflix, txxx, xhamster, xnxx, xrares, xvideos—and more!


io-flux/smutscrape


   ▒█▀▀▀█ █▀▄▀█ █░░█ ▀▀█▀▀ █▀▀ █▀▀ █▀▀█ █▀▀█ █▀▀█ █▀▀ 
   ░▀▀▀▄▄ █░▀░█ █░░█ ░░█░░ ▀▀█ █░░ █▄▄▀ █▄▄█ █░░█ █▀▀ 
   ▒█▄▄▄█ ▀░░░▀ ░▀▀▀ ░░▀░░ ▀▀▀ ▀▀▀ ▀░▀▀ ▀░░▀ █▀▀▀ ▀▀▀ 

Securing smut to salty pervs over CLI 🍆💦

A Python-based tool to scrape and download adult content from various websites straight to your preferred data store, alongside .nfo files that preserve the title, tags, actors, studios, and other metadata for a richer immediate watching experience in Plex, Jellyfin, or Stash.


Requirements 🧰

All Python dependencies are in requirements.txt.


Installation 🛠️

  1. Clone the Repo 📂

    git clone https://github.com/io-flux/smutscrape.git
    cd smutscrape
  2. Install Dependencies 🚀

    # With Conda (Recommended):
    conda create -n smutscrape python=3.10.13
    conda activate smutscrape
    pip install -r requirements.txt
    
    # With pip:
    pip3 install -r requirements.txt

    Install additional tools:

    # On Ubuntu/Debian
    sudo apt-get install yt-dlp wget curl ffmpeg chromium
    # On macOS with Homebrew
    brew install yt-dlp wget curl ffmpeg google-chrome

    For Selenium (not required for all sites):

    # webdriver-manager is the best solution for most people:
    pip install webdriver-manager
    # ... but a manual chromedriver installation may be necessary for some setups:
    brew install chromedriver

    For Cloudflare evasion:

    pip install --upgrade --force-reinstall --no-cache yt-dlp curl_cffi
  3. Configure config.yaml ⚙️

    cp example-config.yaml config.yaml
    nano config.yaml

    Set up download_destinations, ignored terms, selenium paths, and optional vpn integration for secure, anonymous scraping.

  4. Make Executable ⚡️

    chmod +x scrape.py
    # Optional: add a symlink for easy use from anywhere
    sudo ln -s $(realpath ./scrape.py) /usr/local/bin/scrape

Usage 🚀

Run python scrape.py (or scrape if symlinked) to download adult content and save metadata in .nfo files. With no arguments, you'll get a detailed, aesthetic readout of all supported site modes on your system, dynamically generated from the ./sites/ configurations. Alternatively, running scrape {code} (e.g., scrape ml) provides detailed info about that site: curated notes, tips, caveats, available metadata, special requirements, and usage examples.


To start scraping, build commands following this basic syntax:

      scrape {code} {mode} {query}

Supported sites and modes:

Refer to this table of supported sites with available modes and metadata, or see the current configuration with latest updates by simply running scrape without arguments.

code site modes metadata
11v 11Vids video · search · tag ✦ · category actors · categories · description · studios · tags
9v 9Vids video · search · tag description · tags
bsip BrotherSisterIncestPorn video · all ✦ None
fdpis FatherDaughterPornIncestSex video · all ✦ None
fphd FamilyPornHD video · tag ✦ · model ✦ · search ✦ · studio ✦ · rss actors · description · studios · tags
fptv FamilyPorn video · model · tag · search · studio actors · description · studios · tags
fs Family Sex video · tag ✦ · search · model ✦ actors · description · studios · tags
fsv FamilySexVideos video · search None
fsx ForcedSex video · all ✦ None
if IncestFlix video · tag ✦‡ actors · studios · tags
ig IncestGuru video · tag ✦ actors · studios · tags
lf LoneFun video · search ✦ description · tags
lux Luxure video · search ✦ · channel description · tags
lv LeakVids video · search · tag · category actors · description · studios · tags
ml Motherless video · search ✦ · tag ✦ · user ✦ · group ✦ · group_code ✦ tags
msip MomSonIncestPorn video · all ✦ None
ph PornHub video · model ✦ · category ✦ · tag ✦ · studio ✦ · search ✦ · pornstar ✦ actors · code · date · studios · tags
rip RapeIncestPornXXXSex video · all ✦ None
sb SpankBang video · model ✦ · search ✦ · tag ✦ actors · description · tags
tna TNAflix video · search ✦ actors · date · description · studios · tags
tr TopRealIncestVideos video · search None
tt TabooTube video · search actors · date · description · studios · tags
tx TXXX video · search actors · description · studios · tags
xh xHamster video · model ✦ · studio ✦ · search ✦ · tag ✦ actors · studios · tags
xn XNXX video · search ✦ · model ✦ · tag ✦ · studio ✦ actors · date · description · studios · tags
xr Xrares video · search ✦ description · tags
xv XVideos video · search ✦ · studio ✦ · model ✦ · tag ✦ · playlist · profile actors · studios · tags

✦ Supports pagination; see optional arguments below.

† Selenium required.

‡ Combine terms with "&".


Command-Line Arguments [ > ]

CLI Mode (default)

scrape [args] [optional arguments]
argument summary
-p {p}.{video} start scraping at a given page and video (e.g., -p 12.9 to start at video 9 on page 12).
-o, --overwrite download all videos, ignoring .state and overwriting existing media when filenames collide. ⚠
-n, --re_nfo refresh metadata and write new .nfo files, irrespective of whether --overwrite is set. ⚠
-a, --applystate retroactively add URL to .state without re-downloading if local file matches (-o has priority).
-t, --table {site} output site table in Markdown format and exit (specify site code or leave empty for all sites).
-d, --debug enable detailed debug logging.
-h, --help show the help submenu.

Server Mode

scrape --server [server options]
argument summary
--server run as FastAPI server instead of CLI mode.
--host {host} host to bind the API server to (overrides config.yaml).
--port {port} port to bind the API server to (overrides config.yaml).
-d, --debug enable detailed debug logging.
-h, --help show server-specific help menu.

⚠ Caution: Using --overwrite or --re_nfo risks overwriting different videos or .nfo files with identical names—a growing concern as your collection expands and generic titles (e.g., "Hot Scene") collide. Mitigate this by adding name_suffix: "{unique site identifier}" in a site's YAML config (e.g., name_suffix: " - Motherless.com" for Motherless, where duplicate titles are rampant).
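As a sketch, the relevant line in a site's YAML config (the ./sites/ml.yaml filename is an assumption based on the site code; other keys omitted):

```yaml
# sites/ml.yaml (other keys omitted)
# Appended to every filename from this site, so a generic
# "Hot Scene" becomes "Hot Scene - Motherless.com"
name_suffix: " - Motherless.com"
```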


Usage Examples 🙋

  1. All videos on Massy Sweet's 'pornstar' page on PornHub that aren't saved locally, refreshing metadata for already saved videos we encounter again:

    scrape ph pornstar "Massy Sweet" -n
  2. All videos produced by MissaX from FamilyPornHD, overwriting existing copies:

    scrape fphd studio "MissaX" -o
  3. Chloe Temple's videos involving brother-sister (BS) relations not yet saved locally, starting with the 6th video on page 4 of results, and recording matched URLs to .state so future runs skip them faster:

    scrape if tag "Chloe Temple & BS" -a -p 4.6
  4. Down and dirty in debug logs for scraping that "real" incest stuff on Lonefun:

    scrape lf tag "real incest" -d
  5. One particular vintage mother/daughter/son video on Motherless:

    scrape https://motherless.com/2ABC9F3
  6. All videos from Halle Von's pornstar page on XNXX:

    scrape https://www.xnxx.com/pornstar/halle-von

API Server Mode 🌐

Smutscrape can now run as a FastAPI server, allowing you to execute scraping commands via HTTP requests. This is useful for integrating smutscrape into other applications or creating web interfaces.

# Start the API server (uses config.yaml settings or defaults to 127.0.0.1:6999)
python scrape.py --server

# Override with command-line arguments
python scrape.py --server --host 0.0.0.0 --port 8080

Configure default server settings in config.yaml:

api_server:
  host: "127.0.0.1"
  port: 6999

Available endpoints:

  • GET / - API information
  • GET /sites - List all supported sites
  • GET /sites/{code} - Get site details
  • POST /scrape - Execute a scrape command

Example API usage:

# Execute a scrape command via API
curl -X POST http://localhost:6999/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "command": "xh search \"Vintage taboo\"",
    "re_nfo": true
  }'

See API.md for complete API documentation.
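The same call can be made from Python with only the standard library. A minimal sketch (the endpoint and payload fields are as documented above; the default host and port are assumed):

```python
import json
import urllib.request

def build_scrape_request(command, re_nfo=False, host="127.0.0.1", port=6999):
    """Build a POST request for the Smutscrape API's /scrape endpoint."""
    payload = json.dumps({"command": command, "re_nfo": re_nfo}).encode()
    return urllib.request.Request(
        f"http://{host}:{port}/scrape",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires a running server:
# with urllib.request.urlopen(build_scrape_request('xh search "Vintage taboo"', re_nfo=True)) as resp:
#     print(json.loads(resp.read()))
```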


Advanced Configuration ⚙️

Download Destinations 📁

Define destinations in config.yaml. The first is primary, any others are fallbacks.

download_destinations:
  - type: smb
    server: "192.168.69.69"
    share: "media"
    path: "xxx"
    username: "ioflux"
    password: "th3P3rv3rtsGu1d3"
    permissions:
      uid: 1000
      gid: 3003
      mode: "750"
    temporary_storage: "/Users/ioflux/.private/incomplete"
  - type: local
    path: "/Users/ioflux/.private/xxx"

Smutscrape was built with SMB in mind, and it's the recommended mode when it fits.
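The primary/fallback ordering can be sketched as follows (illustrative only, not Smutscrape's actual implementation; a real reachability check for SMB shares is elided here):

```python
import os

def pick_destination(destinations):
    """Return the first usable destination, in config.yaml order:
    the first entry is primary, the rest are fallbacks.
    Illustrative logic: 'local' paths are checked for writability;
    network types (e.g. smb) are assumed reachable for simplicity."""
    for dest in destinations:
        if dest.get("type") == "local":
            if os.access(dest["path"], os.W_OK):
                return dest
        else:
            return dest
    raise RuntimeError("no usable download destination")
```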

Filtering Content 🚫

Add any content you want Smutscrape to avoid altogether to the ignored terms list in your config.yaml:

ignored:
  - "JOI"
  - "Age Play"
  - "Psycho Thrillers"
  - "Virtual Sex"

All metadata fields are checked against the ignored list, so you can include specific genres, sex acts, performers, studios, etc. whose content you want to avoid.
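To illustrate the behavior (a sketch, not Smutscrape's actual implementation), such a filter might look like:

```python
def is_ignored(metadata, ignored):
    """Return True if any metadata field contains an ignored term.
    Matching is case-insensitive and spans all field values,
    whether scalar (title, description) or list (tags, actors)."""
    terms = [t.lower() for t in ignored]
    for value in metadata.values():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if any(t in str(item).lower() for t in terms):
                return True
    return False
```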

Selenium & Chromedriver 🕵️‍♂️

For JavaScript-heavy sites (marked in the table with †), Selenium with chromedriver is required. By default, the script uses webdriver-manager for seamless setup, but some setups, macOS especially, require a manual installation. This worked for me:

  1. Install Chrome Binary:
wget https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/mac-arm64/chrome-mac-arm64.zip
unzip chrome-mac-arm64.zip
chmod +x "chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
sudo mv "chrome-mac-arm64/Google Chrome for Testing.app" /Applications/
  2. Install Chromedriver:
wget https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/mac-arm64/chromedriver-mac-arm64.zip
unzip chromedriver-mac-arm64.zip
chmod +x chromedriver-mac-arm64/chromedriver
sudo mv chromedriver-mac-arm64/chromedriver /usr/local/bin/chromedriver
  3. Update config.yaml:
selenium:
  mode: "local"
  chromedriver_path: "/usr/local/bin/chromedriver"
  chrome_binary: "/Applications/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"

VPN Support 🔒

Smutscrape can automatically rotate VPN exit nodes using most VPN apps that provide a CLI tool. In config.yaml, enable and configure:

vpn:
  enabled: true
  vpn_bin: "/usr/bin/protonvpn"
  start_cmd: "{vpn_bin} connect -f"
  new_node_cmd: "{vpn_bin} connect -r"
  new_node_time: 1200  # Refresh IP every 20 minutes
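Conceptually, the rotation expands {vpn_bin} into the configured command and re-runs it once new_node_time seconds have elapsed. A simplified sketch (the function name and structure are illustrative, not Smutscrape's actual code):

```python
import shlex
import subprocess
import time

def rotate_vpn(cfg, last_rotation):
    """Re-run the VPN reconnect command if new_node_time has elapsed.
    Returns the timestamp of the most recent rotation."""
    if time.monotonic() - last_rotation < cfg["new_node_time"]:
        return last_rotation  # still within the refresh window
    cmd = cfg["new_node_cmd"].format(vpn_bin=cfg["vpn_bin"])
    subprocess.run(shlex.split(cmd), check=True)  # e.g. /usr/bin/protonvpn connect -r
    return time.monotonic()
```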

Contributing 🤝

Smutscrape welcomes contributions! The application features a modular, PyPI-ready package structure that makes collaboration straightforward. Adding site configurations—YAML files with URL schemes and CSS selectors—is a simple, valuable contribution.

Smutscrape's YAML configs are inspired by Stash's CommunityScrapers and adapt their structure. We use CSS selectors instead of XPath (though conversion is straightforward), and metadata fields port easily. The hard part is video downloading, since some sites use iframes or countermeasures, but the yt-dlp fallback often handles this. Adapting a CommunityScrapers site for Smutscrape is a great way to contribute: pick a site, tweak the config, and submit a pull request!


Scrape responsibly! You're on your own. 🧠💭
