Web Scraping User Data Analysis Tool

A comprehensive Python tool for scraping HTML tables, analyzing user data, and generating detailed reports with visualizations.

Installation

Clone the Repository

git clone https://github.com/pkscripts/web_scraper_bs4
cd web_scraper_bs4

Install Dependencies
```
pip install -r requirements.txt
```

Project Structure

web_scraper_bs4/
│
├── web_scraper_project.py    # Main script
├── index.html               # Input HTML file
├── requirements.txt         # Project dependencies
├── README.md               # Documentation
│
└── reports/                # Generated reports directory
    ├── user_data_20240312_143022.csv
    ├── report_20240312_143022.json
    └── country_distribution_20240312_143022.png
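
The file names under reports/ carry a timestamp suffix that looks like YYYYMMDD_HHMMSS. A minimal sketch of how such names could be built is shown here; the helper name `report_paths` is hypothetical and not taken from the script.

```python
from datetime import datetime
from pathlib import Path


def report_paths(now, reports_dir="reports"):
    """Build the three report paths for one run (hypothetical helper)."""
    stamp = now.strftime("%Y%m%d_%H%M%S")  # e.g. 20240312_143022
    base = Path(reports_dir)
    return {
        "csv": base / f"user_data_{stamp}.csv",
        "json": base / f"report_{stamp}.json",
        "chart": base / f"country_distribution_{stamp}.png",
    }


paths = report_paths(datetime(2024, 3, 12, 14, 30, 22))
print(paths["csv"].name)  # user_data_20240312_143022.csv
```

Using one timestamp for all three files keeps the outputs of a single run grouped together when the directory is sorted by name.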

Features

Data Extraction

- User ID and basic information
- Contact details (email)
- Geographic data (country)
- Account status and join dates
- Profile URLs and avatar images

Report Generation

- CSV exports of raw data
- JSON formatted analysis
- Country-wise distribution charts
- Statistical summaries

Visualizations

- Bar charts for user distribution
- Country-wise analysis graphs
- Automated chart generation
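
The data extraction step can be sketched with BeautifulSoup roughly as follows. The markup is an assumption: this example supposes one `<tr class="user-row">` per user with cells in the same order as the CSV columns, which may not match the real index.html. The names `SAMPLE_HTML`, `FIELDS`, and `extract_users` are illustrative only.

```python
from bs4 import BeautifulSoup

# Assumed markup: one <tr class="user-row"> per user, cells in CSV column order.
SAMPLE_HTML = """
<table>
  <tr class="user-row">
    <td>001</td><td>John Doe</td><td>john@example.com</td>
    <td>United States</td><td>2024-01-15</td><td>active</td>
  </tr>
  <tr class="user-row">
    <td>002</td><td>Jane Smith</td><td>jane@example.com</td>
    <td>Canada</td><td>2024-01-16</td><td>active</td>
  </tr>
</table>
"""

FIELDS = ["id", "name", "email", "country", "join_date", "status"]


def extract_users(html):
    """Return one dict per user row (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    users = []
    for row in soup.find_all("tr", class_="user-row"):
        cells = [td.get_text(strip=True) for td in row.find_all("td")]
        if len(cells) != len(FIELDS):
            continue  # skip rows that do not match the assumed layout
        users.append(dict(zip(FIELDS, cells)))
    return users


users = extract_users(SAMPLE_HTML)
print(users[0]["name"], "/", users[1]["country"])
```

Skipping rows with an unexpected cell count is one possible policy; a stricter script might log or raise instead.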

Usage

Prepare Input File

- Place your HTML file, named index.html, in the project directory
- Ensure it contains the user data in table format, using class="user-row"

Run the Script
```
python web_scraper_project.py
```

Example Output

Console Output

Country-wise User Report:
+---------------+------------------+-------------------------+
| Country       | Number of Users  | Users                   |
+---------------+------------------+-------------------------+
| United States | 25               | John Doe, Jane Smith... |
| Canada        | 15               | Mike Ross, Rachel Z...  |
| UK            | 10               | James Bond, Emma W...   |
+---------------+------------------+-------------------------+

Detailed Statistics:
Total number of countries: 12
Total number of users: 150
Active users: 142
Average users per country: 12.50

Top 5 countries by number of users:
United States: 25 users
Canada: 15 users
UK: 10 users
Germany: 8 users
France: 7 users

Reports have been generated and saved in the 'reports' directory.
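
The figures under "Detailed Statistics" follow from simple counts; for instance, 150 users across 12 countries gives 150 / 12 = 12.50 users per country. A plain-Python sketch of the same calculations on a tiny made-up sample is given here. The script itself may compute them differently (for example with pandas), and the sample rows, including the "inactive" status value, are assumptions.

```python
from collections import Counter

# Made-up sample rows; only the keys used below are included.
users = [
    {"name": "John Doe", "country": "United States", "status": "active"},
    {"name": "Jane Smith", "country": "Canada", "status": "active"},
    {"name": "Mike Ross", "country": "Canada", "status": "inactive"},
]

# Number of users per country.
per_country = Counter(u["country"] for u in users)

stats = {
    "total_countries": len(per_country),
    "total_users": len(users),
    "active_users": sum(u["status"] == "active" for u in users),
    "average_users_per_country": round(len(users) / len(per_country), 2),
}
print(stats)

# Top 5 countries by number of users.
for country, count in per_country.most_common(5):
    print(f"{country}: {count} users")
```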

Generated Files

a. CSV Report (user_data_[timestamp].csv):

id,name,email,country,join_date,status
001,John Doe,john@example.com,United States,2024-01-15,active
002,Jane Smith,jane@example.com,Canada,2024-01-16,active
...
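
A CSV with this header can be produced with the standard library's `csv.DictWriter`, as in the sketch below. This is an illustration, not necessarily how the script writes the file; it writes to an in-memory buffer so that it runs without touching disk.

```python
import csv
import io

FIELDS = ["id", "name", "email", "country", "join_date", "status"]

rows = [
    {"id": "001", "name": "John Doe", "email": "john@example.com",
     "country": "United States", "join_date": "2024-01-15", "status": "active"},
    {"id": "002", "name": "Jane Smith", "email": "jane@example.com",
     "country": "Canada", "join_date": "2024-01-16", "status": "active"},
]

buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
writer.writeheader()    # id,name,email,country,join_date,status
writer.writerows(rows)  # ids stay strings, so leading zeros survive

csv_text = buffer.getvalue()
print(csv_text)
```

Note that IDs such as 001 keep their leading zeros only while they are treated as text. When reading the file back with pandas, passing `dtype={"id": str}` to `read_csv` avoids them being parsed as integers.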

b. JSON Report (report_[timestamp].json):

{
  "country_wise": [
    {
      "Country": "United States",
      "Number of Users": 25,
      "Users": "John Doe, Jane Smith..."
    }
  ],
  "statistics": {
    "total_countries": 12,
    "total_users": 150,
    "active_users": 142,
    "average_users_per_country": 12.5
  }
}
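
The "country_wise" entries in the JSON report could be assembled along these lines. The grouping logic and sample rows are assumptions for illustration; only the key names ("Country", "Number of Users", "Users") come from the report shown above.

```python
import json
from collections import defaultdict

# Made-up sample rows.
users = [
    {"name": "John Doe", "country": "United States"},
    {"name": "Jane Smith", "country": "Canada"},
    {"name": "Mike Ross", "country": "Canada"},
]

# Collect user names per country.
names_by_country = defaultdict(list)
for user in users:
    names_by_country[user["country"]].append(user["name"])

# One entry per country, largest group first.
country_wise = [
    {"Country": country, "Number of Users": len(names), "Users": ", ".join(names)}
    for country, names in sorted(
        names_by_country.items(), key=lambda item: len(item[1]), reverse=True
    )
]

report_text = json.dumps({"country_wise": country_wise}, indent=2)
print(report_text)
```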

Requirements

- Python 3.7.1+ (the minimum supported by pandas>=1.3.0)
- beautifulsoup4>=4.9.3
- tabulate>=0.8.9
- pandas>=1.3.0
- matplotlib>=3.4.3

Error Handling

The script includes error handling for common issues:

- Missing input file
- Invalid HTML structure
- File permission errors
- Data extraction failures
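
Handling for the first and third cases might look like the sketch below. The function names `load_html` and `ensure_reports_dir` are hypothetical, the UTF-8 encoding is an assumption, and the permission message in `load_html` is invented for the example; the script's actual handling may differ.

```python
import sys
from pathlib import Path


def load_html(path="index.html"):
    """Return the file's text, or None after printing an error (hypothetical helper)."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        print(f"Error: Could not find {path}", file=sys.stderr)
    except PermissionError:
        print(f"Error: No permission to read {path}", file=sys.stderr)
    except UnicodeDecodeError:
        # Assumes UTF-8 input; files in other encodings end up here.
        print("Error: Unable to parse HTML content", file=sys.stderr)
    return None


def ensure_reports_dir(path="reports"):
    """Create the output directory if needed; report failure instead of raising."""
    try:
        Path(path).mkdir(parents=True, exist_ok=True)
        return True
    except OSError:  # includes PermissionError
        print("Error: Cannot create reports directory", file=sys.stderr)
        return False


html = load_html("no_such_file.html")
print(html)  # None, with the "Could not find" message on stderr
```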

Troubleshooting

File Not Found Error

Error: Could not find index.html
Solution: Ensure index.html exists in the script directory

Parse Error

Error: Unable to parse HTML content
Solution: Verify HTML file format and encoding

Permission Error

Error: Cannot create reports directory
Solution: Check write permissions in the current directory

Contributing

- Fork the repository
- Create your feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Author

Your Name

GitHub: @Pandiyaraj
