
Automate scraping videos (with metadata!) from 9vids, familyporn.tv, familypornhd, incestflix, lonefun, luxure, motherless, pornhub, spankbang, tnaflix, txxx, xhamster, xnxx, xrares, xvideos—and more!


io-flux/smutscrape


   ▒█▀▀▀█ █▀▄▀█ █░░█ ▀▀█▀▀ █▀▀ █▀▀ █▀▀█ █▀▀█ █▀▀█ █▀▀ 
   ░▀▀▀▄▄ █░▀░█ █░░█ ░░█░░ ▀▀█ █░░ █▄▄▀ █▄▄█ █░░█ █▀▀ 
   ▒█▄▄▄█ ▀░░░▀ ░▀▀▀ ░░▀░░ ▀▀▀ ▀▀▀ ▀░▀▀ ▀░░▀ █▀▀▀ ▀▀▀ 

Securing smut to salty pervs over CLI 🍆💦

A Python-based tool to scrape and download adult content from various websites straight to your preferred data store, alongside .nfo files that preserve the title, tags, actors, studios, and other metadata for a richer immediate watching experience in Plex, Jellyfin, or Stash.


Requirements 🧰

All Python dependencies are in requirements.txt.


Installation 🛠️

  1. Clone the Repo 📂

    git clone https://github.com/io-flux/smutscrape.git
    cd smutscrape
  2. Install Dependencies 🚀

    # With Conda (Recommended):
    conda create -n smutscrape python=3.10.13
    conda activate smutscrape
    pip install -r requirements.txt
    
    # With pip:
    pip3 install -r requirements.txt

    Install additional tools:

    # On Ubuntu/Debian
    sudo apt-get install yt-dlp wget curl ffmpeg chromium
    # On macOS with Homebrew
    brew install yt-dlp wget curl ffmpeg google-chrome

    For Selenium (not required for all sites):

    # webdriver-manager is the best solution for most people:
    pip install webdriver-manager
    # ... but a manual chromedriver installation may be necessary for some setups:
    brew install chromedriver

    For Cloudflare evasion:

    pip install --upgrade --force-reinstall --no-cache yt-dlp curl_cffi
  3. Configure config.yaml ⚙️

    cp example-config.yaml config.yaml
    nano config.yaml

    Set up download_destinations, ignored terms, selenium paths, and optional vpn integration for secure, anonymous scraping.

  4. Make Executable ⚡️

    chmod +x scrape.py
    # Optional: add a symlink for easy use from anywhere
    sudo ln -s $(realpath ./scrape.py) /usr/local/bin/scrape

Usage 🚀

Run python scrape.py (or scrape if symlinked) to download adult content and save metadata in .nfo files. With no arguments, you'll get a detailed, aesthetic readout of all supported site modes on your system, dynamically generated from the ./sites/ configurations. Alternatively, running scrape {code} (e.g., scrape ml) provides detailed info about that site: curated notes, tips, caveats, available metadata, special requirements, and usage examples.


To start scraping, build commands following this basic syntax:

      scrape {code} {mode} {query}

Supported sites and modes:

Refer to this table of supported sites with available modes and metadata, or see the current configuration with latest updates by simply running scrape without arguments.

code site modes metadata
11v 11Vids video · search · tag ✦ · category actors · categories · description · studios · tags
9v 9Vids video · search · tag description · tags
bsip BrotherSisterIncestPorn video · all ✦ None
fdpis FatherDaughterPornIncestSex video · all ✦ None
fphd FamilyPornHD video · tag ✦ · model ✦ · search ✦ · studio ✦ · rss actors · description · studios · tags
fptv FamilyPorn video · model · tag · search · studio actors · description · studios · tags
fs Family Sex video · tag ✦ · search · model ✦ actors · description · studios · tags
fsv FamilySexVideos video · search None
fsx ForcedSex video · all ✦ None
if IncestFlix video · tag ✦‡ actors · studios · tags
ig IncestGuru video · tag ✦ actors · studios · tags
lf LoneFun video · search ✦ description · tags
lux Luxure video · search ✦ · channel description · tags
lv LeakVids video · search · tag · category actors · description · studios · tags
ml Motherless video · search ✦ · tag ✦ · user ✦ · group ✦ · group_code ✦ tags
msip MomSonIncestPorn video · all ✦ None
ph PornHub video · model ✦ · category ✦ · tag ✦ · studio ✦ · search ✦ · pornstar ✦ actors · code · date · studios · tags
rip RapeIncestPornXXXSex video · all ✦ None
sb SpankBang video · model ✦ · search ✦ · tag ✦ actors · description · tags
tna TNAflix video · search ✦ actors · date · description · studios · tags
tr TopRealIncestVideos video · search None
tt TabooTube video · search actors · date · description · studios · tags
tx TXXX video · search actors · description · studios · tags
xh xHamster video · model ✦ · studio ✦ · search ✦ · tag ✦ actors · studios · tags
xn XNXX video · search ✦ · model ✦ · tag ✦ · studio ✦ actors · date · description · studios · tags
xr Xrares video · search ✦ description · tags
xv XVideos video · search ✦ · studio ✦ · model ✦ · tag ✦ · playlist · profile actors · studios · tags

✦ Supports pagination; see optional arguments below.

† Selenium required.

‡ Combine terms with "&".


Command-Line Arguments [ > ]

CLI Mode (default)

scrape [args] [optional arguments]
argument summary
-p {p}.{video} start scraping at a given page and video (e.g., -p 12.9 to start at video 9 on page 12).
-o, --overwrite download all videos, ignoring .state and overwriting existing media when filenames collide. ⚠
-n, --re_nfo refresh metadata and write new .nfo files, irrespective of whether --overwrite is set. ⚠
-a, --applystate retroactively add URL to .state without re-downloading if local file matches (-o has priority).
-t, --table {site} output site table in Markdown format and exit (specify site code or leave empty for all sites).
-d, --debug enable detailed debug logging.
-h, --help show the help submenu.

Server Mode

scrape --server [server options]
argument summary
--server run as FastAPI server instead of CLI mode.
--host {host} host to bind the API server to (overrides config.yaml).
--port {port} port to bind the API server to (overrides config.yaml).
-d, --debug enable detailed debug logging.
-h, --help show server-specific help menu.

⚠ Caution: Using --overwrite or --re_nfo risks overwriting different videos or .nfo files with identical names—a growing concern as your collection expands and generic titles (e.g., "Hot Scene") collide. Mitigate this by adding name_suffix: "{unique site identifier}" in a site's YAML config (e.g., name_suffix: " - Motherless.com" for Motherless, where duplicate titles are rampant).
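As a sketch, the relevant line in a site's YAML config (the ./sites/ml.yaml filename is an assumption based on the site code; other keys omitted):

```yaml
# sites/ml.yaml (other keys omitted)
# Appended to every filename from this site, so a generic
# "Hot Scene" becomes "Hot Scene - Motherless.com"
name_suffix: " - Motherless.com"
```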


Usage Examples 🙋

  1. All videos on Massy Sweet's 'pornstar' page on PornHub that aren't saved locally, refreshing metadata for already saved videos we encounter again:

    scrape ph pornstar "Massy Sweet" -n
  2. All videos produced by MissaX from FamilyPornHD, overwriting existing copies:

    scrape fphd studio "MissaX" -o
  3. Chloe Temple's videos involving brother-sister (BS) relations not yet saved locally, starting with the 6th video on page 4 of results, and recording matched URLs to .state so future runs skip them faster:

    scrape if tag "Chloe Temple & BS" -a -p 4.6
  4. Down and dirty in debug logs for scraping that "real" incest stuff on Lonefun:

    scrape lf tag "real incest" -d
  5. One particular vintage mother/daughter/son video on Motherless:

    scrape https://motherless.com/2ABC9F3
  6. All videos from Halle Von's pornstar page on XNXX:

    scrape https://www.xnxx.com/pornstar/halle-von

API Server Mode 🌐

Smutscrape can now run as a FastAPI server, allowing you to execute scraping commands via HTTP requests. This is useful for integrating smutscrape into other applications or creating web interfaces.

# Start the API server (uses config.yaml settings or defaults to 127.0.0.1:6999)
python scrape.py --server

# Override with command-line arguments
python scrape.py --server --host 0.0.0.0 --port 8080

Configure default server settings in config.yaml:

api_server:
  host: "127.0.0.1"
  port: 6999

Available endpoints:

  • GET / - API information
  • GET /sites - List all supported sites
  • GET /sites/{code} - Get site details
  • POST /scrape - Execute a scrape command

Example API usage:

# Execute a scrape command via API
curl -X POST http://localhost:6999/scrape \
  -H "Content-Type: application/json" \
  -d '{
    "command": "xh search \"Vintage taboo\"",
    "re_nfo": true
  }'

See API.md for complete API documentation.
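The same call can be made from Python with only the standard library. A minimal sketch (the endpoint and payload fields are as documented above; the default host and port are assumed):

```python
import json
import urllib.request

def build_scrape_request(command, re_nfo=False, host="127.0.0.1", port=6999):
    """Build a POST request for the Smutscrape API's /scrape endpoint."""
    payload = json.dumps({"command": command, "re_nfo": re_nfo}).encode()
    return urllib.request.Request(
        f"http://{host}:{port}/scrape",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Sending it requires a running server:
# with urllib.request.urlopen(build_scrape_request('xh search "Vintage taboo"', re_nfo=True)) as resp:
#     print(json.loads(resp.read()))
```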


Advanced Configuration ⚙️

Download Destinations 📁

Define destinations in config.yaml. The first is primary, any others are fallbacks.

download_destinations:
  - type: smb
    server: "192.168.69.69"
    share: "media"
    path: "xxx"
    username: "ioflux"
    password: "th3P3rv3rtsGu1d3"
    permissions:
      uid: 1000
      gid: 3003
      mode: "750"
    temporary_storage: "/Users/ioflux/.private/incomplete"
  - type: local
    path: "/Users/ioflux/.private/xxx"

Smutscrape was built with SMB in mind, and it's the recommended mode when it fits.
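The primary/fallback ordering can be sketched as follows (illustrative only, not Smutscrape's actual implementation; a real reachability check for SMB shares is elided here):

```python
import os

def pick_destination(destinations):
    """Return the first usable destination, in config.yaml order:
    the first entry is primary, the rest are fallbacks.
    Illustrative logic: 'local' paths are checked for writability;
    network types (e.g. smb) are assumed reachable for simplicity."""
    for dest in destinations:
        if dest.get("type") == "local":
            if os.access(dest["path"], os.W_OK):
                return dest
        else:
            return dest
    raise RuntimeError("no usable download destination")
```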

Filtering Content 🚫

Add any content you want Smutscrape to avoid altogether to the ignored terms list in your config.yaml:

ignored:
  - "JOI"
  - "Age Play"
  - "Psycho Thrillers"
  - "Virtual Sex"

All metadata fields are checked against the ignored list, so you can include specific genres, sex acts, performers, studios, etc. whose content you want to avoid.
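To illustrate the behavior (a sketch, not Smutscrape's actual implementation), such a filter might look like:

```python
def is_ignored(metadata, ignored):
    """Return True if any metadata field contains an ignored term.
    Matching is case-insensitive and spans all field values,
    whether scalar (title, description) or list (tags, actors)."""
    terms = [t.lower() for t in ignored]
    for value in metadata.values():
        items = value if isinstance(value, list) else [value]
        for item in items:
            if any(t in str(item).lower() for t in terms):
                return True
    return False
```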

Selenium & Chromedriver 🕵️‍♂️

For JavaScript-heavy sites (marked in the table with †), Selenium with chromedriver is required. By default, the script uses webdriver-manager for seamless setup, but some setups, macOS especially, require a manual installation. This worked for me:

  1. Install Chrome Binary:
wget https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/mac-arm64/chrome-mac-arm64.zip
unzip chrome-mac-arm64.zip
chmod +x "chrome-mac-arm64/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"
sudo mv "chrome-mac-arm64/Google Chrome for Testing.app" /Applications/
  2. Install Chromedriver:
wget https://storage.googleapis.com/chrome-for-testing-public/134.0.6998.88/mac-arm64/chromedriver-mac-arm64.zip
unzip chromedriver-mac-arm64.zip
chmod +x chromedriver-mac-arm64/chromedriver
sudo mv chromedriver-mac-arm64/chromedriver /usr/local/bin/chromedriver
  3. Update config.yaml:
selenium:
  mode: "local"
  chromedriver_path: "/usr/local/bin/chromedriver"
  chrome_binary: "/Applications/Google Chrome for Testing.app/Contents/MacOS/Google Chrome for Testing"

VPN Support 🔒

Smutscrape can automatically rotate VPN exit nodes using most VPN apps that provide a CLI tool. In config.yaml, enable and configure:

vpn:
  enabled: true
  vpn_bin: "/usr/bin/protonvpn"
  start_cmd: "{vpn_bin} connect -f"
  new_node_cmd: "{vpn_bin} connect -r"
  new_node_time: 1200  # Refresh IP every 20 minutes
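Conceptually, the rotation expands {vpn_bin} into the configured command and re-runs it once new_node_time seconds have elapsed. A simplified sketch (the function name and structure are illustrative, not Smutscrape's actual code):

```python
import shlex
import subprocess
import time

def rotate_vpn(cfg, last_rotation):
    """Re-run the VPN reconnect command if new_node_time has elapsed.
    Returns the timestamp of the most recent rotation."""
    if time.monotonic() - last_rotation < cfg["new_node_time"]:
        return last_rotation  # still within the refresh window
    cmd = cfg["new_node_cmd"].format(vpn_bin=cfg["vpn_bin"])
    subprocess.run(shlex.split(cmd), check=True)  # e.g. /usr/bin/protonvpn connect -r
    return time.monotonic()
```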

Contributing 🤝

Smutscrape welcomes contributions! The application features a modular, PyPI-ready package structure that makes collaboration straightforward. Adding site configurations—YAML files with URL schemes and CSS selectors—is a simple, valuable contribution.

Smutscrape's YAML configs are inspired by Stash's CommunityScrapers and adapt their structure. We use CSS selectors instead of XPath (though conversion is straightforward), and metadata fields port easily. The hard part is video downloading, since some sites use iframes or countermeasures, but the yt-dlp fallback often handles this. Adapting a CommunityScrapers site for Smutscrape is a great way to contribute: pick a site, tweak the config, and submit a pull request!


Scrape responsibly! You're on your own. 🧠💭
