This project is a robust web scraping tool designed to gather and process user and company data from platforms like LinkedIn, GitHub, Behance, Crunchbase, and TechCrunch. It integrates multiple APIs and scraping techniques to extract, process, and summarize relevant information.
- LinkedIn Scraping: Extract user profiles, company details, and other relevant data.
- GitHub Integration: Identify LinkedIn usernames from GitHub profiles.
- Behance Integration: Extract LinkedIn usernames from Behance profiles.
- Crunchbase and TechCrunch: Scrape and summarize company details and news.
- Google Search Support: Perform LinkedIn user discovery via Google search.
- Logging: Detailed logging to track execution and debug issues.
- Environment Variable Management: Secure handling of sensitive API keys using
.env
.
-
Clone this repository:
git clone https://github.com/your-username/multi-platform-web-scraper.git cd multi-platform-web-scraper
-
Install the required dependencies:
pip install -r requirements.txt
-
Create a
.env
file in the project directory and add your API keys:LINKEDIN_API_KEY=your_linkedin_api_key LINKEDIN_API_HOST=your_linkedin_api_host PROSPEO_API_KEY=your_prospeo_api_key
-
Run the script:
python main.py
-
Follow the prompts to enter user information:
- First name
- Last name
- Company name
- LinkedIn username
- Behance username
- GitHub username
-
The script processes the input, extracts data from various platforms, and logs the results.
main.py
: Entry point for the script.linkedin_scraper/
: Contains modules for LinkedIn scraping and data extraction.github/github_scraper.py
: Handles GitHub data scraping.behance_to_linkedin/
: Maps Behance profiles to LinkedIn.linkedin_google_search.py
: Performs LinkedIn discovery using Google search.linkedin_to_cb_tc.py
: Handles integration with Crunchbase and TechCrunch.logfile.log
: Log file generated during execution.
Logs are saved in logfile.log
, with details on each step, including errors and extracted information.
This project is intended for educational and personal use only. Scraping data from platforms like LinkedIn and others may violate their terms of service. Use responsibly.
Happy Scraping!