Skip to content

Tor-powered OSINT tool for ethical dark web mapping, extracting onion URLs, titles, descriptions, internal/external links, and metadata. Multi-depth crawling (5+ levels) and 7-layer classification uncover forums and markets for #OpChildSafety. A public JSON scan demo showcases its massive data collection. Restricted to authorized users

Notifications You must be signed in to change notification settings

NoTrac3/DarkScraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

6 Commits
 
 
 
 

Repository files navigation

DarkScraper v6.0 - Advanced Dark Web Resource Discovery Engine

================================================================

DarkScraper is a cutting-edge, Tor-powered reconnaissance tool designed for systematic discovery and cataloging of hidden services on the dark web. Unlike basic scrapers, DarkScraper employs multi-layered discovery techniques, adaptive resource management, and intelligent classification to uncover valuable resources while maintaining operational security.

Powered by a sophisticated architecture, DarkScraper combines:

  • Tor-based anonymous crawling with automatic circuit rebuilding
  • Machine-learning inspired resource classification
  • Multi-threaded exploration with depth-controlled crawling
  • Real-time resource monitoring and adaptive throttling
  • Comprehensive metadata extraction and relationship mapping

Feature Highlights: ───────────────────

  1. Autonomous Discovery System:

    • Multi-source seed generation from 5+ dark web search engines
    • Priority URL handling with specialized crawling rules
    • Automatic search engine discovery and integration
  2. Intelligent Resource Classification:

    • 7-layer detection algorithm combining:
      • Content pattern recognition
      • Structural analysis (forms, tables, listings)
      • URL path heuristics
      • Semantic keyword scoring
    • Real-time classification of forums, markets, leak databases
    • Dynamic resource typing with confidence scoring
  3. Adaptive Operation System:

    • Real-time CPU/RAM monitoring with auto-throttling
    • Low-resource mode activation (75% threshold default)
    • Progressive backoff during service degradation
    • Automatic garbage collection and memory optimization
  4. Advanced Anonymity Features:

    • Tor identity rotation every 25 requests (configurable)
    • Randomized request timing with human-like patterns
    • Dynamic user-agent rotation from 100+ profiles
    • Referer spoofing from legitimate dark web sources
  5. Comprehensive Data Collection:

    • Full metadata extraction (headers, forms, links)
    • Resource relationship mapping
    • Size-based content filtering (10MB max)
    • Content validation (500-character minimum)
  6. Resilient Operation:

    • State persistence with auto-resume capability
    • Error assessment and intelligent recovery
    • Multi-phase interrupt handling
    • Deep exploration mode on demand
  7. Enterprise-grade Reporting:

    • Statistical analysis of operations
    • Resource type distribution
    • Size-based categorization
    • Error diagnostics with remediation guidance

Command Reference with Complete Examples: ─────────────────────────────────────────

  1. Standard Discovery Cycle: python darkscraper.py

    • Basic keyword discovery (25 built-in terms)
    • Depth-limited crawling (3 levels)
    • Output to default resources/ directory
  2. Extended Discovery with Custom Parameters: python darkscraper.py
    --keywords "vpn,encryption,whistleblower"
    --max-depth 5
    --max-concurrent 8
    --delay-min 2.5
    --delay-max 5.0
    --resource-dir sensitive_data/

    • Custom keyword set
    • Deeper crawling (5 levels)
    • Increased concurrency (8 threads)
    • Longer delays between requests
    • Custom output directory
  3. Endless Operation with Deep Exploration: python darkscraper.py
    -e -ed
    --auto-continue
    --merge-keywords
    --low-threshold 85

    • Continuous discovery cycles
    • Deeper exploration on interrupt
    • Auto-continue without prompts
    • Merged keyword processing
    • Higher resource threshold (85%)
  4. Clearnet Research Mode: python darkscraper.py
    --no-tor
    --tor-port 9150
    --control-port 9151
    --keywords "privacy,tools,security"
    --resource-dir clearnet_research/

    • Disables Tor routing
    • Uses custom ports
    • Focused clearnet research
    • Dedicated output directory
  5. Resource-Constrained Operation: python darkscraper.py
    --min-concurrent 1
    --max-concurrent 3
    --low-threshold 60
    --timeout 120

    • Conservative threading (1-3 concurrent)
    • Early low-resource activation (60% CPU/MEM)
    • Extended timeout (120 seconds)
  6. Targeted Dark Web Investigation: python darkscraper.py
    --keywords "marketplace,vendor,cvv"
    --max-depth 4
    --delay-min 3.0
    --delay-max 7.0
    --merge-keywords

    • Focused high-risk keyword set
    • Extended crawl depth
    • Increased operational delays
    • Merged keyword analysis
  7. Resume Interrupted Operation:

    After Ctrl+C interruption

    python darkscraper.py --auto-continue

    • Automatically resumes from last state
    • Preserves visited URLs and resources
    • Continues discovery cycle

Advanced Operational Guide: ──────────────────────────

Discovery Workflow:

  1. Seed Generation:

    • Query multiple dark web search engines
    • Extract 150-300 seed URLs per keyword
    • Prioritize .onion domains and known resources
  2. Depth-based Crawling:

    • Level 0: Seed URLs (direct search results)
    • Level 1: Discovered links from seed pages
    • Level 2: Links from promising resources
    • Level 3+: Specialized exploration of high-value targets
  3. Resource Processing:

    • Content type detection
    • Metadata extraction
    • Relationship mapping
    • Classification and scoring
  4. Continuous Enrichment:

    • New search engines added automatically
    • Priority URLs receive deeper analysis
    • Resource cross-linking between keywords

Performance Tuning:

  • Increase --max-concurrent for high-bandwidth connections
  • Adjust --delay-min/--delay-max for target responsiveness
  • Lower --low-threshold for resource-constrained systems
  • Use --merge-keywords for broad-scope investigations

Security Protocols:

  • Tor circuit renewal every 25 requests (default)
  • Strict .onion domain validation
  • No JavaScript execution
  • Automatic cleartext avoidance
  • Zero persistent cookies

Output Analysis Tools:

  1. Resource Explorer: jq '. | select(.type == "forum")' resources/*.json

    • Filter resources by type
  2. Relationship Mapper: jq '.metadata.internal_links' resources/*.json | sort | uniq -c | sort -nr

    • Identify frequently linked resources
  3. Timeline Analysis: jq '.discovered' resources/*.json | sort

    • Track discovery chronology
  4. Content Analysis: jq -r 'select(.content_length > 10000) | .url' resources/*.json

    • Find large resources

Ethical Guidelines:

  1. Legal Compliance:

    • Research only legal content
    • Check local jurisdiction laws
    • Never access prohibited materials
  2. Operational Security:

    • Use on isolated systems
    • Enable full-disk encryption
    • Route through VPN+Tor (double anonymization)
  3. Responsible Disclosure:

    • Report vulnerabilities to service owners
    • Never exploit discovered resources
    • Anonymize all research data

Warning: This tool provides access to unregulated networks. You may encounter:

  • Illegal content and services
  • Malicious actors and honeypots
  • Disturbing and harmful materials
  • Advanced monitoring systems

Use only with proper authorization, legal counsel, and psychological preparation. Maintain strict operational security protocols at all times.

About

Tor-powered OSINT tool for ethical dark web mapping, extracting onion URLs, titles, descriptions, internal/external links, and metadata. Multi-depth crawling (5+ levels) and 7-layer classification uncover forums and markets for #OpChildSafety. A public JSON scan demo showcases its massive data collection. Restricted to authorized users

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published