An open-source tool for ethical hackers and security researchers that scrapes malware, vulnerability, and phishing data from sources including Reddit, BleepingComputer, X (Twitter), CISA, Pastebin, and PhishTank, and stores it in SQLite.
Note: This tool is intended for ethical hacking and security research only. Please respect the terms of service of the data sources and privacy laws.
- Multi-source Scraping: Collects data from Reddit, BleepingComputer, X, CISA, Pastebin, and PhishTank
- Automated Data Collection: Scheduled scraping to keep the database up-to-date
- Command-line Interface: Easy access to the collected data
- Web Dashboard: Visual representation of the collected data
- API: Programmatic access to the collected data
- Community Features: Validation of collected data
- IP Addresses: Potentially malicious IP addresses
- Hashes: MD5, SHA256, and other hashes of malware samples
- CVEs: Common Vulnerabilities and Exposures identifiers
- URLs: Malicious and phishing URLs
- TTPs: Tactics, Techniques, and Procedures used by threat actors
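Most of these indicator types (TTPs aside) can be pulled out of free text with regular expressions. The sketch below is a hypothetical extractor, not the project's actual code: the `IOC_PATTERNS` table and the `extract_iocs` helper are assumptions, and it does not handle defanged indicators such as `hxxp://` or `1.2.3[.]4`.

```python
import re

# Hypothetical patterns for illustration; the project's real extractor may differ.
# Defanged indicators (hxxp://, 1.2.3[.]4) and TTPs are not handled here.
IOC_PATTERNS = {
    "ip": re.compile(r"\b(?:(?:25[0-5]|2[0-4]\d|1?\d?\d)\.){3}(?:25[0-5]|2[0-4]\d|1?\d?\d)\b"),
    "md5": re.compile(r"\b[a-fA-F0-9]{32}\b"),
    "sha256": re.compile(r"\b[a-fA-F0-9]{64}\b"),
    "cve": re.compile(r"\bCVE-\d{4}-\d{4,}\b", re.IGNORECASE),
    "url": re.compile(r"\bhttps?://[^\s\"'<>]+", re.IGNORECASE),
}


def extract_iocs(text: str) -> dict[str, list[str]]:
    """Return de-duplicated matches per IOC type, in first-seen order."""
    results: dict[str, list[str]] = {}
    for ioc_type, pattern in IOC_PATTERNS.items():
        seen: dict[str, None] = {}  # a dict keeps insertion order, so it acts as an ordered set
        for match in pattern.findall(text):
            if ioc_type == "url":
                match = match.rstrip(".,;)")  # drop sentence punctuation caught at the end
            elif ioc_type == "cve":
                match = match.upper()  # normalise cve-2021-... to CVE-2021-...
            seen.setdefault(match, None)
        results[ioc_type] = list(seen)
    return results
```

The `\b` word boundaries keep the 32-character MD5 pattern from matching inside a 64-character SHA256 hash.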
For a quick demonstration of all features:
python demo.py
This will:
- Initialize the database with sample data
- Demonstrate the CLI features
- Start the web dashboard and API
- Open the web dashboard and API documentation in your browser
- Demonstrate the validation feature
- Run a sample spider
- Python 3.9+
- Git
- Chrome/Chromium (required by the X spider, which uses Selenium)
- Clone the repository:
git clone https://github.com/yourusername/malware-scraper.git
cd malware-scraper
- Create a virtual environment:
python -m venv venv
- Activate the virtual environment:
  - Windows:
venv\Scripts\activate
  - Linux/macOS:
source venv/bin/activate
- Run the installation script:
python install.py
This will install all required dependencies and download the spaCy model.
- Set up Reddit API credentials:
  - Go to https://www.reddit.com/prefs/apps and create a new app
  - Select "script" as the app type
  - Create a .env file with the following content:
REDDIT_CLIENT_ID=your_client_id
REDDIT_CLIENT_SECRET=your_client_secret
REDDIT_USER_AGENT=malware-scraper/1.0
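Whether the project reads `.env` through a library such as python-dotenv is not stated here. As a dependency-free sketch of what loading these credentials involves, the hypothetical helpers below parse `KEY=VALUE` lines, let real environment variables override the file, and fail early if a key is missing. Only the three key names come from the example above.

```python
import os
from pathlib import Path

# Key names taken from the .env example above.
REQUIRED_KEYS = ("REDDIT_CLIENT_ID", "REDDIT_CLIENT_SECRET", "REDDIT_USER_AGENT")


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip optional surrounding quotes, e.g. KEY="value".
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_reddit_credentials(env_path: str = ".env") -> dict[str, str]:
    """Read the .env file if present, let real environment variables override it,
    and raise a clear error if any required key is missing or empty."""
    path = Path(env_path)
    values = parse_env(path.read_text(encoding="utf-8")) if path.exists() else {}
    values.update({k: os.environ[k] for k in REQUIRED_KEYS if k in os.environ})
    missing = [k for k in REQUIRED_KEYS if not values.get(k)]
    if missing:
        raise RuntimeError("Missing Reddit credentials: " + ", ".join(missing))
    return {k: values[k] for k in REQUIRED_KEYS}
```

Failing at startup with the names of the missing keys is friendlier than an authentication error from the Reddit API halfway through a crawl.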
The project includes several demo scripts to help you get started:
- demo.py: Demonstrates all features
- demo_cli.py: Demonstrates the CLI
- demo_dashboard.py: Demonstrates the web dashboard
- demo_api.py: Demonstrates the API
- demo_validate.py: Demonstrates the validation feature
- demo_scheduler.py: Demonstrates the scheduler
You can run individual spiders:
cd scraper
scrapy crawl reddit_spider
scrapy crawl bleeping_spider
scrapy crawl x_spider
scrapy crawl cisa_spider
scrapy crawl pastebin_spider
scrapy crawl phishtank_spider
Or run all spiders at once:
python run_spiders.py
Or run all spiders automatically using the scheduler:
python scheduler.py
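To show how a scheduler could drive the spiders, here is a minimal standard-library sketch. It is not the project's scheduler.py. The helpers `crawl_command`, `run_all`, `subprocess_runner`, and `schedule_runs` are hypothetical, and the sketch assumes spiders are launched with `scrapy crawl` from the scraper directory, as shown above.

```python
import sched
import subprocess
import time
from typing import Callable, Dict, List, Sequence

# Spider names as listed in this README; the real scheduler.py may differ.
SPIDERS = ("reddit_spider", "bleeping_spider", "x_spider",
           "cisa_spider", "pastebin_spider", "phishtank_spider")

Runner = Callable[[List[str]], int]  # takes a command, returns its exit code


def crawl_command(spider: str) -> List[str]:
    """The same command shown above for running one spider by hand."""
    return ["scrapy", "crawl", spider]


def run_all(run: Runner, spiders: Sequence[str] = SPIDERS) -> Dict[str, int]:
    """Run each spider in turn; a failing spider is recorded, not fatal."""
    return {name: run(crawl_command(name)) for name in spiders}


def subprocess_runner(cwd: str = "scraper") -> Runner:
    """Hypothetical runner that executes each command from the scraper/ directory."""
    def run(cmd: List[str]) -> int:
        return subprocess.run(cmd, cwd=cwd, check=False).returncode
    return run


def schedule_runs(run: Runner, interval_s: float, iterations: int) -> List[Dict[str, int]]:
    """Run all spiders `iterations` times, `interval_s` seconds apart, with the stdlib sched module."""
    scheduler = sched.scheduler(time.monotonic, time.sleep)
    history: List[Dict[str, int]] = []
    for i in range(iterations):
        # Each cycle is queued at a fixed offset from now; sched runs them one after another.
        scheduler.enter(i * interval_s, 1, lambda: history.append(run_all(run)))
    scheduler.run()
    return history
```

For example, `schedule_runs(subprocess_runner(), interval_s=3600, iterations=24)` would crawl every source once an hour for a day. Passing the runner in as a parameter keeps the scheduling logic testable without Scrapy installed. Because `sched` executes events sequentially, a cycle that overruns the interval delays the next one rather than overlapping it.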
List all IOCs in the database:
python -m cli.app list_iocs
Search for IOCs by name:
python -m cli.app search --name Emotet
Export all IOCs to a CSV file:
python -m cli.app export_csv --output data/my_iocs.csv
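Under the hood, commands like these amount to simple SQLite queries. The sketch below assumes a hypothetical `iocs` table (the real schema is not documented here) and hypothetical `search_by_name` and `export_csv` helpers. It illustrates a case-insensitive name search and a CSV export with a header row.

```python
import csv
import sqlite3
from pathlib import Path

# Hypothetical schema for illustration; the project's real table layout may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS iocs (
    id INTEGER PRIMARY KEY,
    name TEXT,
    ioc_type TEXT NOT NULL,
    value TEXT NOT NULL,
    source TEXT NOT NULL,
    source_url TEXT,
    validated INTEGER
)
"""


def search_by_name(conn: sqlite3.Connection, name: str) -> list[dict]:
    """Substring match on name; SQLite's LIKE is case-insensitive for ASCII letters."""
    cursor = conn.execute(
        "SELECT * FROM iocs WHERE name LIKE ? ORDER BY id", (f"%{name}%",)
    )
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


def export_csv(conn: sqlite3.Connection, output: str) -> int:
    """Write every IOC to a CSV file with a header row; return the number of data rows."""
    cursor = conn.execute("SELECT * FROM iocs ORDER BY id")
    headers = [col[0] for col in cursor.description]
    path = Path(output)
    path.parent.mkdir(parents=True, exist_ok=True)  # e.g. create data/ if it is missing
    count = 0
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(headers)
        for row in cursor:
            writer.writerow(row)  # NULL values are written as empty fields
            count += 1
    return count
```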
Start the web dashboard:
python app.py
Then open http://localhost:5000 in your browser.
Features:
- View all IOCs in a table
- Filter by source and IOC type
- Search by name
- Validate or invalidate IOCs
- View source URLs
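Filtering by source and IOC type maps naturally onto a parameterized SQL query in which only the filters the user actually selected become `WHERE` clauses. The sketch assumes a hypothetical `iocs` table with `source`, `ioc_type`, and `name` columns, and the helper names are illustrative. Binding values with `?` placeholders rather than string formatting keeps user input out of the SQL text.

```python
import sqlite3
from typing import Optional


def build_filter_query(source: Optional[str] = None,
                       ioc_type: Optional[str] = None,
                       name: Optional[str] = None) -> tuple[str, list[str]]:
    """Compose a parameterized SELECT; only the filters that were supplied become WHERE clauses."""
    clauses: list[str] = []
    params: list[str] = []
    if source:
        clauses.append("source = ?")
        params.append(source)
    if ioc_type:
        clauses.append("ioc_type = ?")
        params.append(ioc_type)
    if name:
        clauses.append("name LIKE ?")
        params.append(f"%{name}%")
    sql = "SELECT id, name, ioc_type, value, source FROM iocs"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql + " ORDER BY id", params


def filter_iocs(conn: sqlite3.Connection, **filters: Optional[str]) -> list[tuple]:
    """Run the composed query; SQLite binds the values, they are never pasted into the SQL."""
    sql, params = build_filter_query(**filters)
    return conn.execute(sql, params).fetchall()
```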
Start the API server:
uvicorn api:app --reload
API endpoints:
- GET /iocs: List all IOCs (with optional filtering)
- GET /iocs/{ioc_id}: Get a specific IOC by ID
- GET /search?name=...: Search IOCs by name
Interactive API documentation is available at http://localhost:8000/docs.
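To call the API from a script rather than a browser, the sketch below builds URLs for the three endpoints listed above and fetches JSON with the standard library. It assumes the server is on uvicorn's default port 8000, matching the docs URL above. The helper names are hypothetical. The README does not name the optional filter parameters for GET /iocs, so any keyword passed to `iocs_url` (such as `source`) is an assumption to check against /docs.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # uvicorn's default port


def iocs_url(base: str = BASE_URL, **filters: str) -> str:
    """URL for GET /iocs. Filter names (e.g. source) are assumptions; check /docs for the real ones."""
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    return f"{base}/iocs" + (f"?{query}" if query else "")


def ioc_url(ioc_id: int, base: str = BASE_URL) -> str:
    """URL for GET /iocs/{ioc_id}."""
    return f"{base}/iocs/{int(ioc_id)}"


def search_url(name: str, base: str = BASE_URL) -> str:
    """URL for GET /search?name=...; urlencode escapes spaces and special characters."""
    return f"{base}/search?{urlencode({'name': name})}"


def get_json(url: str, timeout: float = 10.0):
    """Fetch a URL and decode the JSON body. Requires the API server to be running."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

With the server running, `get_json(search_url("Emotet"))` would return the decoded search results.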
You can validate or invalidate IOCs using the validation script:
python validate.py --id 1 --valid True
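A validation script like this usually just updates a flag on one row. The sketch below is hypothetical: it assumes an `iocs` table with a `validated` integer column, and the helper names are illustrative. It does highlight one real pitfall with `--valid True`-style flags: with argparse's `type=bool`, any non-empty string, including "False", becomes True, so the value has to be parsed explicitly.

```python
import argparse
import sqlite3
from typing import Optional, Sequence


def str_to_bool(text: str) -> bool:
    """Parse True/False explicitly: argparse's type=bool would turn the string "False" into True."""
    lowered = text.strip().lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected True or False, got {text!r}")


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Mark an IOC as valid or invalid")
    parser.add_argument("--id", type=int, required=True, help="ID of the IOC to update")
    parser.add_argument("--valid", type=str_to_bool, required=True, help="True or False")
    return parser.parse_args(argv)


def set_validation(conn: sqlite3.Connection, ioc_id: int, valid: bool) -> bool:
    """Store the verdict as 1/0 in a hypothetical `validated` column; return False if no row matched."""
    cursor = conn.execute("UPDATE iocs SET validated = ? WHERE id = ?", (int(valid), ioc_id))
    conn.commit()
    return cursor.rowcount == 1
```

Returning whether a row matched lets the caller report an unknown ID instead of silently succeeding.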
To run the entire system (dashboard and API):
python run.py
This will start the web dashboard and API, and open them in your browser.
For personal use or testing:
# Run the dashboard and API
python run.py
- Launch an EC2 instance (t2.micro for free tier)
- SSH into the instance:
ssh -i key.pem ubuntu@ec2-ip
- Install dependencies:
sudo apt update && sudo apt install python3 python3-pip git
git clone https://github.com/yourusername/malware-scraper.git
cd malware-scraper
python3 -m pip install -r requirements.txt
- Run in the background:
nohup python3 run.py &
- Build the Docker image:
docker build -t malware-scraper .
- Run the container:
docker run -p 5000:5000 -p 8000:8000 malware-scraper
See DEPLOYMENT.md for detailed deployment instructions.
- Reddit: r/Malware, r/netsec, r/cybersecurity, r/hacking
- BleepingComputer: Security news articles
- X (Twitter): #malware hashtag
- CISA: Known Exploited Vulnerabilities Catalog
- Pastebin: Recent public pastes
- PhishTank: Recent phishing URLs
This tool is intended for ethical hacking and security research only. Please respect the terms of service of the data sources and privacy laws. Do not use the collected data for malicious purposes.
- Comply with the terms of service of each data source
- Respect rate limits and robots.txt
- Comply with data protection regulations (GDPR, CCPA, etc.)
- Use the data for defensive security purposes only
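For the Scrapy spiders, the idiomatic way to honour robots.txt and rate limits is through Scrapy's own settings such as `ROBOTSTXT_OBEY`, `DOWNLOAD_DELAY`, and `AUTOTHROTTLE_ENABLED`. Whether this project sets them is not stated here. For code that fetches pages outside Scrapy, such as the Selenium-based X spider, a standard-library sketch might look like the following. `make_robot_parser` and `PoliteFetcher` are hypothetical helpers that respect `Disallow` rules and any `Crawl-delay` the site declares.

```python
import time
import urllib.robotparser
from typing import Callable, Optional


def make_robot_parser(robots_txt: str) -> urllib.robotparser.RobotFileParser:
    """Build a parser from robots.txt content that has already been downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class PoliteFetcher:
    """Hypothetical helper: checks robots.txt rules and spaces out requests."""

    def __init__(self, parser: urllib.robotparser.RobotFileParser, user_agent: str,
                 default_delay: float = 1.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.parser = parser
        self.user_agent = user_agent
        # Use the site's Crawl-delay if it asks for more than our default.
        self.delay = max(default_delay, float(parser.crawl_delay(user_agent) or 0))
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def allowed(self, url: str) -> bool:
        """True if robots.txt permits this user agent to fetch the URL."""
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self) -> float:
        """Block until `delay` seconds have passed since the previous request; return seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.delay - (self.clock() - self._last)
            if remaining > 0:
                self.sleep(remaining)
                slept = remaining
        self._last = self.clock()
        return slept
```

The clock and sleep functions are injectable so the pacing logic can be tested without actually waiting.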
License: MIT
Please see CONTRIBUTING.md for details on how to contribute to this project.