# Python Web Scraper
A command-line web scraper written in Python that extracts website content and exports it to CSV or JSON. Built by Shubhadeep Naskar.
## Features
- Extract website content (titles, headers, links, paragraphs)
- Export data to CSV or JSON formats
- User-friendly command-line interface
- Error handling and validation
- Cross-platform compatibility
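As a rough illustration of the extraction feature, the sketch below pulls the title, headers, links, and paragraphs out of an HTML string with BeautifulSoup4. The function name `extract_content`, the returned dictionary layout, and the sample HTML are assumptions made for this example; the project's actual scraper modules may be organized differently.

```python
from bs4 import BeautifulSoup

HEADER_TAGS = ["h1", "h2", "h3", "h4", "h5", "h6"]


def _clean(element) -> str:
    """Return the element's text with runs of whitespace collapsed."""
    return " ".join(element.get_text().split())


def extract_content(html: str) -> dict:
    """Hypothetical helper: collect title, headers, links, and paragraphs."""
    soup = BeautifulSoup(html, "html.parser")
    return {
        "title": _clean(soup.title) if soup.title else "",
        "headers": [_clean(h) for h in soup.find_all(HEADER_TAGS)],
        # Only anchors that actually carry an href attribute are kept.
        "links": [a["href"] for a in soup.find_all("a", href=True)],
        "paragraphs": [_clean(p) for p in soup.find_all("p")],
    }


# Sample document used only for this illustration.
SAMPLE_HTML = """
<html><head><title>Example Page</title></head>
<body>
  <h1>Welcome</h1>
  <p>First paragraph with a <a href="https://example.com/about">link</a>.</p>
  <h2>Details</h2>
  <p>Second paragraph.</p>
</body></html>
"""

content = extract_content(SAMPLE_HTML)
print(content)
```

In a live run the HTML would typically come from `requests.get(url).text` rather than a hard-coded string.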
## Technologies Used
- Python 3.6+
- BeautifulSoup4
- Pandas
- Requests
- PyInstaller
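One way Pandas could serve the CSV/JSON export is sketched below. This is a minimal example under stated assumptions: the `export` function, the `type`/`value` column names, and the sample rows are hypothetical and not taken from the project's code.

```python
import pandas as pd

# Hypothetical scraped rows; the real application may use a different layout.
rows = [
    {"type": "header", "value": "Welcome"},
    {"type": "link", "value": "https://example.com/about"},
    {"type": "paragraph", "value": "Second paragraph."},
]
frame = pd.DataFrame(rows)


def export(data: pd.DataFrame, fmt: str) -> str:
    """Serialize the frame to CSV or JSON text (hypothetical helper)."""
    fmt = fmt.lower()
    if fmt == "csv":
        # index=False leaves the DataFrame's row index out of the output.
        return data.to_csv(index=False)
    if fmt == "json":
        # orient="records" emits a list with one object per row.
        return data.to_json(orient="records")
    raise ValueError(f"Unsupported export format: {fmt!r}")


csv_text = export(frame, "csv")
json_text = export(frame, "json")
print(csv_text)
print(json_text)
```

Both `to_csv` and `to_json` also accept a file path as their first argument, so writing straight to disk would only need a path in place of the returned string.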
## Quick Start
```bash
# Clone repository
git clone https://github.com/yourusername/webscraper.git
cd webscraper
# Set up virtual environment
python -m venv venv
# Activate it (Windows)
venv\Scripts\activate
# Activate it (macOS/Linux): source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Run application
python main.py
```
## Directory Structure
```
webscraper/
├── src/webscraper/
│   ├── config/      # Configuration settings
│   ├── scrapers/    # Scraping modules
│   └── utils/       # Helper functions
├── tests/           # Unit tests
└── main.py          # Entry point
```
## Build Process
```bash
# Install development requirements
pip install -r requirements-dev.txt
# Build executable
pyinstaller webscraper.spec
```
## Usage
- Run the application
- Enter target website URL
- Review scraped data summary
- Choose export format (JSON/CSV)
## Development
```bash
# Set up development environment
pip install -r requirements-dev.txt
pip install -e .
# Run tests
python -m pytest tests/
```
## License
This project is licensed under the Apache License 2.0. See the LICENSE file for details.
## Author
Shubhadeep Naskar
## Contributing
- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request