This tool is designed to extract email addresses from a list of websites. It uses a two-step approach:
- First, it tries a fast method using `requests` and `BeautifulSoup` (sketched below)
- If that fails, it falls back to a more robust method using `Selenium` with Chrome WebDriver
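For illustration, the fast path can be as small as the sketch below. The function name `fast_extract` and the regex shown are illustrative assumptions, not necessarily the script's actual identifiers:

```python
import re

import requests
from bs4 import BeautifulSoup

# Illustrative email pattern; the script's own EMAIL_REGEX may differ.
EMAIL_REGEX = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

def fast_extract(url):
    """Fetch the page with requests and pull email-looking strings from it."""
    response = requests.get(url, timeout=10)
    soup = BeautifulSoup(response.text, "html.parser")
    # Scan both the visible text and the raw HTML (catches mailto: links).
    matches = re.findall(EMAIL_REGEX, soup.get_text()) + re.findall(EMAIL_REGEX, response.text)
    return set(matches)
```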
Key features:
- Two-step scraping approach for maximum effectiveness
- Automatically checks contact pages for additional emails
- Filters out false positives such as image filenames that contain an @ symbol (see the sketch after this list)
- Creates example URLs file if none exists
- Saves results to CSV for easy analysis
- Detailed console output with progress information
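The false-positive filter can be illustrated with a small helper like the one below. The name `looks_like_real_email` and the extension list are hypothetical; the script's actual check may differ:

```python
# Extensions that commonly produce false positives such as "logo@2x.png".
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg", ".gif", ".svg", ".webp")

def looks_like_real_email(candidate):
    """Return False for regex matches that are actually image filenames."""
    return not candidate.lower().endswith(IMAGE_EXTENSIONS)

# Example usage: emails = {e for e in emails if looks_like_real_email(e)}
```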
Requirements:
- Python 3.6 or higher
- Chrome browser installed (for Selenium fallback method)
- Required Python packages (see Installation)
Installation:
- Make sure you have Python installed (with the "Add to PATH" option checked)
- Install the required packages (a sample `requirements.txt` is shown below): `pip install -r requirements.txt`
- Or install them individually: `pip install selenium pandas beautifulsoup4 requests webdriver-manager`
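If a `requirements.txt` is not already present in the repository, one matching the packages above would simply list them, one per line:

```
selenium
pandas
beautifulsoup4
requests
webdriver-manager
```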
Usage:
- Create a file named `urls.txt` with one URL per line, for example `https://example.com` and `https://example.org`
- Run the script: `python local_scraper.py`
- The script will create a file named `extracted_emails.csv` with the results
How it works:
- For each URL in your list, the scraper first tries the fast method using `requests`
- If no emails are found, it automatically switches to the more powerful `Selenium` method (see the sketch after this list)
- Both methods also check for contact pages and scan them for additional emails
- All unique emails are saved to a CSV file with their source URLs
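Under the hood, the Selenium fallback likely looks roughly like the sketch below. The function name `selenium_extract`, the headless flag, and the 3-second wait are assumptions based on the notes at the end of this README, not the script's exact code:

```python
import re
import time

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

EMAIL_REGEX = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"

def selenium_extract(url):
    """Render the page in headless Chrome so JavaScript-loaded emails are visible."""
    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()), options=options)
    try:
        driver.get(url)
        time.sleep(3)  # matches the 3-second JavaScript delay noted below
        return set(re.findall(EMAIL_REGEX, driver.page_source))
    finally:
        driver.quit()
```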
You can modify the following variables at the top of the script (see the sketch after this list):
- `URLS_FILE`: Change the input file name (default: 'urls.txt')
- `OUTPUT_CSV`: Change the output file name (default: 'extracted_emails.csv')
- `EMAIL_REGEX`: Modify the regular expression used to find emails
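For reference, those settings might look like the following near the top of `local_scraper.py`. The defaults match what this README describes, but the regex shown is a common email pattern and an assumption, not necessarily the one shipped with the script:

```python
# Settings near the top of local_scraper.py (file names are the documented defaults)
URLS_FILE = "urls.txt"                # input: one URL per line
OUTPUT_CSV = "extracted_emails.csv"   # output: emails with their source URLs
EMAIL_REGEX = r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}"  # assumed pattern
```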
If you encounter issues with Selenium:
- Make sure Chrome is installed on your system
- Try updating Chrome to the latest version
- If you're on Linux, you may need to install additional system libraries required by Chrome
Notes:
- The script includes a 3-second delay when using Selenium to allow JavaScript to load
- A 1-second delay is added between URLs to avoid overloading servers