Multi-Platform Web Scraper

This project is a web scraping tool that gathers and processes user and company data from LinkedIn, GitHub, Behance, Crunchbase, and TechCrunch. It combines several APIs and scraping techniques to extract, process, and summarize that information.

Features

LinkedIn Scraping: Extract user profiles and company details.
GitHub Integration: Identify LinkedIn usernames from GitHub profiles.
Behance Integration: Extract LinkedIn usernames from Behance profiles.
Crunchbase and TechCrunch: Scrape and summarize company details and news.
Google Search Support: Perform LinkedIn user discovery via Google search.
Logging: Detailed logging to track execution and debug issues.
Environment Variable Management: API keys are read from a .env file instead of being hard-coded in the source.
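
The GitHub and Behance integrations both come down to spotting a LinkedIn profile link in profile text (a bio, a website field, a social link). A minimal sketch of that step, assuming the link has the usual `linkedin.com/in/<username>` shape; the function name `extract_linkedin_username` and the regex are illustrative and not taken from the project's code:

```python
import re
from typing import Optional

# Matches linkedin.com/in/<username>, with an optional scheme and subdomain.
# The character class for the username is an assumption about typical profile slugs.
LINKEDIN_PROFILE_RE = re.compile(
    r"(?:https?://)?(?:[\w-]+\.)?linkedin\.com/in/([A-Za-z0-9\-_%]+)",
    re.IGNORECASE,
)


def extract_linkedin_username(text: str) -> Optional[str]:
    """Return the first LinkedIn username found in the text, or None."""
    match = LINKEDIN_PROFILE_RE.search(text)
    return match.group(1) if match else None


if __name__ == "__main__":
    bio = "Designer. Find me at https://www.linkedin.com/in/jane-doe/"
    print(extract_linkedin_username(bio))
```

Company pages (`linkedin.com/company/...`) are deliberately not matched, since only personal profiles carry a username.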

Installation

Clone this repository:

git clone https://github.com/your-username/multi-platform-web-scraper.git
cd multi-platform-web-scraper

Install the required dependencies:
```
pip install -r requirements.txt
```

Create a .env file in the project directory and add your API keys:

LINKEDIN_API_KEY=your_linkedin_api_key
LINKEDIN_API_HOST=your_linkedin_api_host
PROSPEO_API_KEY=your_prospeo_api_key
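
To check that the .env file is picked up before running the scraper, something like the following can help. It is a standard-library sketch, not the project's own loader (which may well use a package such as python-dotenv); `load_env_file` and `missing_keys` are hypothetical helper names, and only simple `KEY=VALUE` lines are handled:

```python
import os
from pathlib import Path
from typing import Dict, List

# The three keys listed in the .env example above.
REQUIRED_KEYS = ("LINKEDIN_API_KEY", "LINKEDIN_API_HOST", "PROSPEO_API_KEY")


def load_env_file(path: str = ".env") -> Dict[str, str]:
    """Parse KEY=VALUE lines from a .env file and export them to os.environ."""
    values: Dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    # Variables already set in the environment take precedence over the file.
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values


def missing_keys(env: Dict[str, str]) -> List[str]:
    """Return the required keys that are absent or empty in env."""
    return [key for key in REQUIRED_KEYS if not env.get(key)]


if __name__ == "__main__":
    loaded = load_env_file()
    print("Missing keys:", missing_keys(loaded))
```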

Usage

Run the script:
```
python main.py
```
Follow the prompts to enter user information:
- First name
- Last name
- Email
- Company name
- LinkedIn username
- Behance username
- GitHub username

The script processes the input, extracts data from various platforms, and logs the results.
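
The prompt sequence above can be sketched as follows. This is an illustration of the flow, not the contents of main.py: the `UserQuery` dataclass, the `PROMPTS` table, and `collect_user_query` are hypothetical names. The input function is passed in as a parameter so the flow can be exercised without a terminal.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class UserQuery:
    """The seven fields the script asks for, in prompt order."""
    first_name: str
    last_name: str
    email: str
    company_name: str
    linkedin_username: str
    behance_username: str
    github_username: str


# (field name, prompt label) pairs, in the order listed in this README.
PROMPTS = [
    ("first_name", "First name"),
    ("last_name", "Last name"),
    ("email", "Email"),
    ("company_name", "Company name"),
    ("linkedin_username", "LinkedIn username"),
    ("behance_username", "Behance username"),
    ("github_username", "GitHub username"),
]


def collect_user_query(ask: Callable[[str], str] = input) -> UserQuery:
    """Ask each prompt in order and return the trimmed answers."""
    answers = {field: ask(f"{label}: ").strip() for field, label in PROMPTS}
    return UserQuery(**answers)


if __name__ == "__main__":
    print(collect_user_query())
```

Whether the real script accepts empty answers for optional fields is not stated here; in this sketch an empty answer is simply stored as an empty string.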

File Structure

main.py: Entry point for the script.
linkedin_scraper/: Contains modules for LinkedIn scraping and data extraction.
github/github_scraper.py: Handles GitHub data scraping.
behance_to_linkedin.py: Maps Behance profiles to LinkedIn.
linkedin_google_search.py: Performs LinkedIn discovery using Google search.
linkedin_to_cb_tc.py: Handles integration with Crunchbase and TechCrunch.
logfile.log: Log file generated during execution.
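
For orientation, the Google-based discovery in linkedin_google_search.py can be pictured as building a site-restricted query from the name and company. The sketch below only constructs the search URL; how the module actually issues the request and parses results is not documented here, so `build_linkedin_search_url` and the query format are assumptions:

```python
from urllib.parse import urlencode


def build_linkedin_search_url(first_name: str, last_name: str, company: str = "") -> str:
    """Build a Google search URL restricted to LinkedIn personal profiles."""
    # Quote the full name (and the company, if given) so Google treats each as a phrase.
    terms = ["site:linkedin.com/in", f'"{first_name} {last_name}"']
    if company:
        terms.append(f'"{company}"')
    # urlencode percent-encodes the query and joins words with '+'.
    return "https://www.google.com/search?" + urlencode({"q": " ".join(terms)})


if __name__ == "__main__":
    print(build_linkedin_search_url("Jane", "Doe", "Acme"))
```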

Logging

Logs are saved in logfile.log, with details on each step, including errors and extracted information.
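
A file-based logging setup along these lines would produce such a log. It is a sketch using Python's standard logging module; the logger name `scraper`, the format string, and the `configure_logging` helper are assumptions rather than the project's actual configuration:

```python
import logging


def configure_logging(log_path: str = "logfile.log") -> logging.Logger:
    """Return a logger that writes timestamped records to log_path."""
    logger = logging.getLogger("scraper")
    logger.setLevel(logging.INFO)
    # Avoid attaching duplicate handlers if this is called more than once.
    if not logger.handlers:
        handler = logging.FileHandler(log_path, encoding="utf-8")
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    log = configure_logging()
    log.info("Starting LinkedIn lookup for %s", "jane-doe")
    log.error("Request failed with status %d", 429)
```

Each step then calls `log.info(...)` for extracted data and `log.error(...)` or `log.exception(...)` for failures, so a run can be reconstructed from the file afterwards.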

Disclaimer

This project is intended for educational and personal use only. Scraping LinkedIn and the other platforms listed here may violate their terms of service. Use responsibly.

Happy Scraping!

Repository: emna-khemiri/Multi-Platform-Web-Scraper